August 9, 2006Agrawal: VDAT'06 Tutorial II1 Low-Power Electronics and Systems Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and

August 9, 2006 Agrawal: VDAT'06 Tutorial II 1

Low-Power Electronics and Systems

Vishwani D. Agrawal
James J. Danaher Professor

Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849, USA

http://www.eng.auburn.edu/~vagrawal


Contents

• Introduction
• Dynamic power
  – Short circuit power
  – Reduced supply voltage operation
  – Glitch elimination
• Static (leakage) power reduction
• Low power systems
  – State encoding
  – Processor and multi-core design
• Books on low-power design


Introduction

Why is it a concern?

Power Consumption of VLSI Chips


ISSCC, Feb. 2001, Keynote

“Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second -- about the same number of calculations that the world's fastest supercomputer can perform now.

“Unfortunately, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor. . . .”

Patrick P. Gelsinger, Senior Vice President and General Manager, Digital Enterprise Group, Intel Corp.


VLSI Chip Power Density

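[Figure: power density (W/cm², logarithmic scale from 1 to 10000) versus year (1970 to 2010) for Intel processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6, compared with the power densities of a hot plate, a nuclear reactor, a rocket nozzle and the Sun's surface. Source: Intel]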


Meaning of Low-Power Design

• Design practices that reduce power consumption by at least one order of magnitude; in practice a 50% reduction is often acceptable.

• General considerations in low-power design
  – Algorithms and architectures
  – High-level and software techniques
  – Gate and circuit-level methods
  – Power estimation techniques
  – Test power


Topics in Low-Power

• Power dissipation in CMOS circuits
• Device technology
  – Low-power CMOS technologies
  – Energy recovery methods
• Circuit and gate level methods
  – Logic synthesis
  – Dynamic power reduction techniques
  – Leakage power reduction
• System level methods
  – Microprocessors
  – Arithmetic circuits
  – Low power memory technology
• Test power
• Power estimation methods and tools


Power in a CMOS Gate

[Figure: CMOS gate connected between VDD and Ground, drawing supply current iDD(t)]


Power Dissipation in CMOS Logic (0.25µ)

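$$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$$

Approximate shares of the three terms: 75% (capacitive switching), 20% (short circuit), 5% (leakage).

[Figure: CMOS gate between VDD and ground driving load capacitance CL]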


Power and Energy

• Instantaneous power (Watts)

$$P(t) = i_{DD}(t)\, V_{DD}$$

• Peak power (Watts)

$$P_{peak} = \max \{P(t)\}$$

• Average power (Watts)

$$P_{av} = \frac{1}{T} \int_0^T P(t)\, dt$$

• Energy (Joules)

$$E = \int_0^T P(t)\, dt$$


Low-Power Design Techniques

• Circuit and gate level methods

–Reduced supply voltage

–Adiabatic switching and charge recovery

–Logic design for reduced activity

–Reduced Glitches

–Transistor sizing

–Pass-transistor logic

–Pseudo-nMOS logic

–Multi-threshold gates


Low-Power Design Techniques

• Functional and architectural methods
  – Clock suppression
  – Clock frequency reduction
  – Supply voltage reduction
  – Power down
  – Algorithmic and software methods


Test Power

• Power grid on a VLSI chip is designed for certain current capacity during functional operation:
  – Average current → heat dissipation
  – Peak current → noise, ground bounce

• Problem: Tests like scan or BIST are nonfunctional and may cause higher than the functional circuit activity; a functionally good chip can fail the test.


Power Estimation Methods

• Spice: Accurate but expensive

• Logic-level
  – Event-driven simulation
  – Statistical
  – Probabilistic

• High-level: Hierarchical


Components of Power

• Dynamic
  – Signal transitions
    • Logic activity
    • Glitches
  – Short-circuit
• Static
  – Leakage

$$P_{total} = P_{dyn} + P_{stat} = P_{tran} + P_{sc} + P_{stat}$$


Power of a Transition: Ptran

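[Figure: CMOS inverter between VDD and Ground with input vi(t) and output vo(t); the on transistor is modeled as resistance Ron, the off transistor as R = large, and current ic(t) charges the load capacitance CL]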


Charging of a Capacitor

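[Figure: voltage source V charging capacitor C through resistance R; the switch closes at t = 0, i(t) is the charging current and v(t) is the capacitor voltage]

Charge on capacitor, $q(t) = C\, v(t)$

Current, $i(t) = dq(t)/dt = C\, dv(t)/dt$

$$i(t) = C \frac{dv(t)}{dt} = \frac{V - v(t)}{R}$$

$$\frac{dv(t)}{dt} = \frac{V - v(t)}{RC}$$

$$\int \frac{dv(t)}{V - v(t)} = \int \frac{dt}{RC}$$

$$\ln [V - v(t)] = \frac{-t}{RC} + A$$

Initial condition, t = 0, v(t) = 0 → A = ln V

$$v(t) = V \left[1 - \exp\left(\frac{-t}{RC}\right)\right]$$

$$i(t) = C \frac{dv(t)}{dt} = \frac{V}{R} \exp\left(\frac{-t}{RC}\right)$$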


Total Energy Per Charging Transition from Power Supply

$$E_{trans} = \int_0^\infty V\, i(t)\, dt = \int_0^\infty \frac{V^2}{R} \exp\left(\frac{-t}{RC}\right) dt = CV^2$$


Energy Dissipated per Transition in Resistance (R) of “On” Transistors

$$R \int_0^\infty i^2(t)\, dt = R\, \frac{V^2}{R^2} \int_0^\infty \exp\left(\frac{-2t}{RC}\right) dt = \frac{1}{2} CV^2$$


Energy Stored in Charged Capacitor

$$\int_0^\infty v(t)\, i(t)\, dt = \int_0^\infty V \left[1 - \exp\left(\frac{-t}{RC}\right)\right] \frac{V}{R} \exp\left(\frac{-t}{RC}\right) dt = \frac{1}{2} CV^2$$


Transition Power

• Gate output rising transition
  – Energy dissipated in pMOS transistor = $CV^2/2$
  – Energy stored in capacitor = $CV^2/2$
• Gate output falling transition
  – Energy dissipated in nMOS transistor = $CV^2/2$
• Energy dissipated per transition = $CV^2/2$
• Power dissipation:

$$P_{trans} = E_{trans}\, \alpha f_{ck} = \alpha f_{ck} CV^2/2$$

α = activity factor
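The three integrals above can be checked numerically. The following is a minimal sketch, not part of the original tutorial: it integrates the RC charging waveforms with the midpoint rule and compares the results against CV², CV²/2 and CV²/2. The component values (in particular the on-resistance R) and the function name are illustrative assumptions.

```python
import math

def rc_charging_energies(C, V, R, steps=100_000, t_end_tau=30.0):
    """Midpoint-rule integration of the RC charging waveforms.

    Returns (energy drawn from supply, energy dissipated in R,
    energy stored in C), integrated from t = 0 to t_end_tau * RC.
    """
    tau = R * C
    dt = t_end_tau * tau / steps
    e_supply = e_resistor = e_capacitor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (V / R) * math.exp(-t / tau)       # i(t) = (V/R) exp(-t/RC)
        v = V * (1.0 - math.exp(-t / tau))     # v(t) = V [1 - exp(-t/RC)]
        e_supply += V * i * dt                 # supply delivers V * i(t)
        e_resistor += R * i * i * dt           # dissipated in the on transistor
        e_capacitor += v * i * dt              # stored on the capacitor
    return e_supply, e_resistor, e_capacitor

# Assumed example values: 15 fF load, 2.5 V supply, 10 kOhm on-resistance.
C, V, R = 15e-15, 2.5, 10e3
e_sup, e_res, e_cap = rc_charging_energies(C, V, R)
cv2 = C * V * V
print(f"supply / CV^2    = {e_sup / cv2:.4f}")
print(f"resistor / CV^2  = {e_res / cv2:.4f}")
print(f"capacitor / CV^2 = {e_cap / cv2:.4f}")
```

The ratios do not depend on R: the resistance only sets how fast the transition completes, not how much energy is dissipated.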


Short Circuit Current, isc(t)

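[Figure: input Vi(t) and output Vo(t) voltage waveforms (0 to VDD) of a CMOS inverter and the short-circuit current isc(t) (amps) over time (0 to 1 ns); isc(t) flows between times tB and tE, while Vi(t) lies between VTn and VDD − VTp, and peaks at Iscmaxf. Inset: inverter between VDD and GND with input Vi(t) and output Vo(t)]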


Short-Circuit Energy per Transition

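• $E_{scf} = \int_{t_B}^{t_E} V_{DD}\, i_{sc}(t)\, dt = (t_E - t_B)\, I_{scmaxf}\, V_{DD} / 2$

• $E_{scf} = t_f\, (V_{DD} - |V_{Tp}| - V_{Tn})\, I_{scmaxf} / 2$

• $E_{scr} = t_r\, (V_{DD} - |V_{Tp}| - V_{Tn})\, I_{scmaxr} / 2$

• $E_{scf} = 0$, when $V_{DD} = |V_{Tp}| + V_{Tn}$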


Short-Circuit Power and Voltage Scaling

• Decreases and eventually becomes zero when VDD is scaled down but the threshold voltages are not scaled down.

• References:
  – M. A. Ortega and J. Figueras, “Short Circuit Power Modeling in Submicron CMOS,” PATMOS’96, Aug. 1996, pp. 147-166.
  – T. Sakurai and A. Newton, “Alpha-power Law MOSFET Model and Its Application to a CMOS Inverter,” IEEE J. Solid State Circuits, vol. 25, April 1990, pp. 584-594.


Psc and Output Capacitance

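[Figure: CMOS inverter between VDD and Ground with input vi(t) and output vo(t); Ron and R = large model the two transistors, and current ic(t) + isc(t) flows into the output node with load CL. Annotations: input fall time tf, input rise time tr, and short-circuit current vo(t)/R↑]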


isc and Output Capacitance

$$i_{sc}(t) = \frac{v_o(t)}{R_{\uparrow t_f}(t)} = \frac{V_{DD} \left[1 - \exp\left(\frac{-t}{R_{\downarrow t_f}(t)\, C}\right)\right]}{R_{\uparrow t_f}(t)}$$


iscmax and Output Capacitance

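[Figure: output voltage vo(t), the factor 1/R↑tf(t) and the resulting short-circuit current i versus time t over the input fall time tf, for small C and for large C, with the peak current iscmax marked]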


Psc, Output Rise Times, Capacitance

• For given input rise and fall times short circuit power decreases as output capacitance increases.

• Short circuit power increases with increase of input rise and fall times.

• Short circuit power is reduced if output rise and fall times are not smaller than the input rise and fall times.


Effects of Scaling Down

• 1-16% short-circuit power at 0.7 micron

• 4-37% at 0.35 micron

• 12-60% at 0.17 micron

• Reference: S. R. Vemuru and N. Steinberg, “Short Circuit Power Dissipation Estimation for CMOS Logic Gates,” IEEE Trans. on Circuits and Systems I, vol. 41, Nov. 1994, pp. 762-765.


Summary: Short-Circuit Power

• Short-circuit power is consumed by each transition (increases with input transition time).

• Reduction requires that gate output transition should not be faster than the input transition (faster gates can consume more short-circuit power).

• Increasing the output load capacitance reduces short-circuit power.

• Scaling down of supply voltage with respect to threshold voltages reduces short-circuit power.


Dynamic Power

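[Figure: CMOS inverter between VDD and Ground with input Vi and output Vo, both transistors modeled as resistances R, load capacitance CL, and short-circuit current isc]

$$\text{Dynamic Power} = C_L V_{DD}^2 / 2 + P_{sc}$$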


Dynamic Power Reduction

• Reduce power per transition
  – Reduced voltage operation: voltage scaling
  – Capacitance minimization: device sizing

• Reduce number of transitions
  – Glitch elimination


CMOS Dynamic Power

$$\text{Dynamic Power} = \sum_{\text{all gates } i} 0.5\, \alpha_i f_{clk} C_{Li} V_{DD}^2 \approx 0.5\, \alpha f_{clk} C_L V_{DD}^2 \approx \alpha_{01} f_{clk} C_L V_{DD}^2$$

where
  α = average gate activity factor
  α01 = 0.5α, average 0→1 transitions
  fclk = clock frequency
  CL = total load capacitance
  VDD = supply voltage


Example: 0.25μm CMOS Chip

• f = 500MHz

• Average capacitance = 15fF/gate

• VDD = 2.5V

• $10^6$ gates

• Power = $\alpha_{01} f C_L V_{DD}^2$

$= \alpha_{01} \times 500 \times 10^6 \times (15 \times 10^{-15} \times 10^6) \times 2.5^2$

= 46.9W, for α01 = 1.0
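The arithmetic of this example can be reproduced in a few lines. The sketch below uses the numbers from the slide; the function name and the additional activity factors 0.5 and 0.1 are assumptions added only to show how the estimate scales with α01.

```python
def dynamic_power(alpha01, f_clk, c_total, vdd):
    """Dynamic power estimate: alpha01 * f_clk * C_L * VDD^2 (watts)."""
    return alpha01 * f_clk * c_total * vdd ** 2

gates = 10 ** 6
c_total = 15e-15 * gates      # 15 fF per gate, summed over all gates
f_clk = 500e6                 # 500 MHz
vdd = 2.5                     # volts

# alpha01 = 1.0 is the slide's case; 0.5 and 0.1 are assumed for comparison.
for alpha01 in (1.0, 0.5, 0.1):
    p = dynamic_power(alpha01, f_clk, c_total, vdd)
    print(f"alpha01 = {alpha01}: {p:.1f} W")
```

Because the estimate is linear in α01, a realistic average activity well below 1.0 brings the figure down proportionally.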


Signal Activity, α

[Figure: waveforms over clock period T = 1/f; the clock signal has α01 = 1.0, and two combinational signals each have α01 = 0.5]


Reducing Dynamic Power

• Dynamic power reduction is
  – Quadratic with reduction of supply voltage
  – Linear with reduction of capacitance


0.25μm CMOS Inverter, VDD=2.5V

[Figure: two plots for a 0.25μm CMOS inverter at VDD = 2.5V. Left: Vout (V) versus Vin (V), both from 0 to 2.5. Right: gain (0 to -20) versus Vin (V) from 0 to 2.5]


0.25μm CMOS Inverter, VDD< 2.5V

[Figure: two plots of Vout (V) versus Vin (V) for a 0.25μm CMOS inverter at reduced supply voltages. Left: Vin and Vout from 0 to 2.5. Right: Vin and Vout from 0 to 0.2, with the point of gain = -1 marked]


Lower Bound on VDD

• For proper operation of gate, the magnitude of the maximum gain (for Vin = VDD/2) should be greater than 1.

• $Gain_{max} = -(1/n)\,[\exp(V_{DD}/2\Phi_T) - 1] = -1$

• n = 1.5

• $\Phi_T = kT/q$ = 26mV

• VDD = 48mV

• VDDmin > 2 to 4 times kT/q or ~100mV at room temperature (27°C)

• Ref.: J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, 2003.
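Solving the unity-gain condition for VDD gives $V_{DD} = 2\Phi_T \ln(1 + n)$. The sketch below evaluates it with the slide's values n = 1.5 and ΦT = 26mV; the function names are illustrative, and the physical constants are the standard values of the Boltzmann constant and the electron charge.

```python
import math

BOLTZMANN = 1.380649e-23        # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C

def thermal_voltage(temp_kelvin):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE

def vdd_for_unity_gain(n=1.5, phi_t=0.026):
    """Supply voltage at which (1/n)[exp(VDD / (2 phi_t)) - 1] equals 1."""
    return 2.0 * phi_t * math.log(1.0 + n)

print(f"kT/q at 27 C      = {thermal_voltage(300.15) * 1e3:.1f} mV")
print(f"VDD for |gain| = 1: {vdd_for_unity_gain() * 1e3:.1f} mV")
```

The result is about 48 millivolts, roughly twice the thermal voltage, which is consistent with the practical bound of 2 to 4 times kT/q quoted above.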


Impact of VDD on Performance

$$\text{Inverter delay} = K\, \frac{C_L V_{DD}}{(V_{DD} - V_t)^\alpha}$$

[Figure: delay (0 to 40 ns) and power (log scale) versus VDD from 0.6V to 3.0V; the delay curve rises steeply as VDD approaches Vt]


Optimum Power × Delay

$$\text{Power} \times \text{Delay},\quad PD = \text{constant} \times \frac{V_{DD}^3}{(V_{DD} - V_t)^\alpha}$$

For minimum power-delay product, $d(PD)/dV_{DD} = 0$

$$V_{DD} = \frac{3 V_t}{3 - \alpha}$$

For long channel devices, α = 2, $V_{DD} = 3V_t$

For very short channel devices, α = 1, $V_{DD} = 1.5V_t$
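As a sanity check on the closed-form optimum, the power-delay product can be minimized by a simple grid search. This is a sketch under assumed values: Vt = 0.5V and the intermediate exponent α = 1.3 are illustrative choices, not taken from the tutorial.

```python
def power_delay(vdd, vt, alpha):
    """Power-delay product up to a constant factor: VDD^3 / (VDD - Vt)^alpha."""
    return vdd ** 3 / (vdd - vt) ** alpha

def optimum_vdd_numeric(vt, alpha, steps=100_000, span=10.0):
    """Grid search for the VDD > Vt that minimizes the power-delay product."""
    best_v, best_pd = None, float("inf")
    for k in range(1, steps + 1):
        v = vt + span * vt * k / steps
        pd = power_delay(v, vt, alpha)
        if pd < best_pd:
            best_v, best_pd = v, pd
    return best_v

def optimum_vdd_closed_form(vt, alpha):
    """Closed-form optimum 3 Vt / (3 - alpha)."""
    return 3.0 * vt / (3.0 - alpha)

vt = 0.5  # assumed threshold voltage in volts
for alpha in (2.0, 1.3, 1.0):
    numeric = optimum_vdd_numeric(vt, alpha)
    closed = optimum_vdd_closed_form(vt, alpha)
    print(f"alpha = {alpha}: numeric {numeric:.3f} V, closed form {closed:.3f} V")
```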


Transistor Sizing for Performance

• Problem: If we increase W/L to make the charging or discharging of the load capacitance faster, then the increased W increases the load for the driving gate.

Cin CL


Fixed-Taper Buffer

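[Figure: chain of n inverters from Vin (input capacitance Cin) to Vout (load CL), with relative stage sizes 1, α, α², ..., α^(i-1), ..., α^(n-1)]

$$C_i = \alpha^{i-1} C_{in}$$

$$C_L = \alpha^n C_{in}$$

Delay = $t_0$

Ref.: J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, Piscataway, New Jersey: IEEE Press, 2004.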


Buffer (Cont.)

$$\alpha^n = C_L / C_{in}$$

$$n = \frac{\ln (C_L / C_{in})}{\ln \alpha}$$

ith stage delay, $t_i = \alpha t_0$, i = 1, . . . n, because each stage drives a stage α times bigger than itself.


Buffer (Cont.)

$$\text{Total delay} = \sum_{i=1}^{n} t_i = n \alpha t_0 = \frac{\ln(C_L / C_{in})\, \alpha t_0}{\ln \alpha}$$


Buffer (Cont.)

Differentiating total delay with respect to α and equating to 0, we get

$$\alpha_{opt} = e \approx 2.7$$

The optimum number of stages is

$$n_{opt} = \ln(C_L / C_{in})$$
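The effect of the taper factor on total delay can be tabulated directly from the formula above. In this sketch the capacitance ratio of 1000 and the candidate taper factors are assumed values, and n is treated as a real number exactly as in the derivation (a practical design would round it to an integer).

```python
import math

def buffer_delay(c_load, c_in, alpha, t0=1.0):
    """Total delay n * alpha * t0 with n = ln(CL/Cin) / ln(alpha)."""
    n = math.log(c_load / c_in) / math.log(alpha)
    return n * alpha * t0

def optimum_taper(c_load, c_in):
    """Optimum taper factor e and stage count ln(CL/Cin)."""
    return math.e, math.log(c_load / c_in)

c_load, c_in = 1000.0, 1.0   # assumed: load is 1000 times the input capacitance
for alpha in (2.0, math.e, 3.0, 5.0, 10.0):
    n = math.log(c_load / c_in) / math.log(alpha)
    delay = buffer_delay(c_load, c_in, alpha)
    print(f"alpha = {alpha:6.3f}: n = {n:5.2f}, delay = {delay:6.2f} t0")

alpha_opt, n_opt = optimum_taper(c_load, c_in)
print(f"optimum: alpha = {alpha_opt:.3f}, n = {n_opt:.2f}")
```

The minimum is shallow: taper factors somewhat above or below e cost only a few percent in delay, which is why larger tapers with fewer stages are often acceptable.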


Further Reading

B. S. Cherkauer and E. G. Friedman, “A Unified Design Methodology for CMOS Tapered Buffers,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 99-111, March 1995.


Logic Activity and Glitches

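[Figure: example circuit with signals numbered 1 to 7 and gate delays d = 2, d = 1, d = 1 and d = 1, illustrating logic activity and glitches]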


Glitch Power Reduction

• Design a digital circuit for minimum transient energy consumption by eliminating hazards


Theorem 1

• For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.

[Figure: when the output logic state changes, one transition is necessary; when the output logic state is unchanged, no transition is necessary]


Inertial Delay of a Gate (Inverter)

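[Figure: Vin and Vout waveforms versus time for an inverter, showing the high-to-low delay dHL and the low-to-high delay dLH]

$$d = \frac{d_{HL} + d_{LH}}{2}$$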


Theorem 2

• Given that events occur at the input of a gate with inertial delay d at times $t_1 \le \ldots \le t_n$, the number of events at the gate output cannot exceed

$$\min \left( n,\ 1 + \left\lfloor \frac{t_n - t_1}{d} \right\rfloor \right)$$

[Figure: time axis with input event times t1, t2, t3, ..., tn and the interval tn − t1]
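The bound of Theorem 2 is easy to evaluate for a given set of input event times. The helper below is a sketch (the function name and the example times are assumptions); it rounds the ratio (tn − t1)/d down to an integer, since an event count must be a whole number.

```python
import math

def max_output_events(arrival_times, d):
    """Upper bound on output events for a gate with inertial delay d > 0.

    Implements min(n, 1 + floor((t_n - t_1) / d)) for n input events.
    """
    if d <= 0:
        raise ValueError("inertial delay must be positive")
    if not arrival_times:
        return 0
    n = len(arrival_times)
    spread = max(arrival_times) - min(arrival_times)
    return min(n, 1 + math.floor(spread / d))

# Three input events spread over 3 time units, gate delay 2: at most 2 events.
print(max_output_events([0, 1, 3], d=2))
# All inputs arrive within less than one gate delay: at most 1 event (no glitch).
print(max_output_events([0.0, 0.5], d=1))
```

When the spread of arrival times is smaller than d, the bound collapses to one event, which is the minimum transient condition used on the following slides.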


Minimum Transient Design

• Minimum transient energy condition for a Boolean gate:

$$| t_i - t_j | < d$$

where $t_i$ and $t_j$ are arrival times of input events and d is the inertial delay of the gate.


Balanced Delay Method

• All input events arrive simultaneously
• Overall circuit delay not increased
• Delay buffers may have to be inserted

[Figure: example circuit with unit gate delays and inserted delay buffers that balance the path delays]


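Hazard Filter Method

• Gate delay is made greater than maximum input path delay difference
• No delay buffers needed (least transient energy)
• Overall circuit delay may increase

[Figure: the same example circuit with selected gate delays increased from 1 to 3 so that each gate filters its own hazards]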


Glitch-Free Design by Linear Programming

• Variables: gate and buffer delays

• Objective: minimize number of buffers

• Subject to: overall circuit delay

• Subject to: minimum transient condition for multi-input gate


Variables for Full-Adder

• Gate delay variables d4 . . . d12

• Buffer delay variables d15 . . . d29

Delay variables are located at the checkpoints of the circuit.

Delay variables


Objective Function

• Ideal: minimize the number of non-zero delay buffers

• Actual: minimize sum of buffer delays


Specify Critical Path Delay

[Figure: original design, with gate delays of 1 and buffer delays of 0 annotated at the checkpoints of the circuit]

Sum of delays on critical path ≤ maxdel


Multi-Input Gate Condition

[Figure: a two-input gate with delay d whose inputs arrive over paths with delays d1 and d2]

$$d_1 - d_2 \le d, \quad d_2 - d_1 \le d \quad \equiv \quad |d_1 - d_2| \le d$$
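The multi-input gate condition can be checked for a whole circuit by propagating earliest and latest arrival times in topological order. The sketch below is illustrative only: the three-gate netlist, the gate names and the delay values are hypothetical, and the check uses the non-strict form of the condition shown above. It does not solve the linear program; it only verifies a given delay assignment.

```python
def glitch_violations(gates, order):
    """Check |latest - earliest input arrival| <= gate delay for every gate.

    gates: name -> (delay, list of fanin names). Names that are not keys of
    `gates` are primary inputs, assumed to change at time 0.
    order: gate names in topological order.
    Returns a list of (gate, arrival spread, gate delay) for violating gates.
    """
    earliest, latest, violations = {}, {}, []
    for g in order:
        delay, fanins = gates[g]
        lo = min(earliest.get(f, 0.0) for f in fanins)
        hi = max(latest.get(f, 0.0) for f in fanins)
        if hi - lo > delay:
            violations.append((g, hi - lo, delay))
        earliest[g] = lo + delay
        latest[g] = hi + delay
    return violations

# Hypothetical chain: g3 sees input d at time 0 and g2 as late as time 2.
original = {"g1": (1, ["a", "b"]), "g2": (1, ["g1", "c"]), "g3": (1, ["g2", "d"])}
print(glitch_violations(original, ["g1", "g2", "g3"]))

# Hazard filter style fix: raise the delay of g3 to cover the arrival spread.
filtered = dict(original, g3=(2, ["g2", "d"]))
print(glitch_violations(filtered, ["g1", "g2", "g3"]))

# Balanced delay style fix: insert a delay buffer on input d instead.
balanced = dict(original, buf_d=(2, ["d"]), g3=(1, ["g2", "buf_d"]))
print(glitch_violations(balanced, ["g1", "g2", "buf_d", "g3"]))
```

The first call reports gate g3 as a potential glitch source; both modified circuits pass, at the cost of a slower gate in one case and an extra buffer in the other.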


Results: 1-Bit Adder

R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.


AMPL Solution: maxdel = 6

[Figure: 1-bit adder annotated with the gate and buffer delays of the AMPL solution for maxdel = 6]


AMPL Solution: maxdel = 7

[Figure: 1-bit adder annotated with the gate and buffer delays of the AMPL solution for maxdel = 7]


AMPL Solution: maxdel ≥ 11

[Figure: 1-bit adder annotated with the gate delays of the AMPL solution for maxdel ≥ 11]


Removing a Limitation

• Constraints are written by path enumeration.

• Since number of paths in a circuit can be exponential in circuit size, the formulation is infeasible for large circuits.

• Example: c880 has 6.96M constraints.

• Solution: A linear complexity method. See,
  – T. Raja, Master’s Thesis, Rutgers University, 2002.
  – T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program,” Proc. 16th International Conf. VLSI Design, 2003, pp. 527-532.


Comparison of Constraints

[Figure: number of constraints versus number of gates in circuit]


Benchmark Circuits

                                          Normalized Power
Circuit   Maxdel. (gates)   No. of Buffers   Average   Peak
C432      17                95               0.72      0.67
C432      34                66               0.62      0.60
C880      24                62               0.68      0.54
C880      48                34               0.68      0.52
C6288     47                294              0.40      0.36
C6288     94                120              0.36      0.34
c7552     43                366              0.38      0.34
c7552     86                111              0.36      0.32


c7552: 3,500-gate CMOS Circuit

[Figure: instantaneous energy (×10⁻¹⁰ Joules) versus clock cycles]

References

• R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.

• M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.

• V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10th Int’l Conf. VLSI Design, Jan. 1997, pp. 193-197.

• V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12th Int’l Conf. VLSI Design, Jan. 1999, pp. 434-439.

• M. Hsiao, E. M. Rudnick and J. H. Patel, “Effects of Delay Model in Peak Power Estimation of VLSI Circuits,” Proc. ICCAD, Nov. 1997, pp. 45-51.

• T. Raja, A Reduced Constraint Set Linear Program for Low Power Design of Digital Circuits, Master’s Thesis, Rutgers Univ., New Jersey, 2002.

• T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic Gates to Maximize Input Delay Variability,” J. of Low Power Electronics (JOLPE), vol. 2, pp. 121-128, 2006.


Static (Leakage) Power

• Dynamic
  – Signal transitions
    • Logic activity
    • Glitches
  – Short-circuit
• Static
  – Leakage


Leakage Power

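[Figure: cross-section of an nMOS transistor (n+ source and drain regions) connected between Ground and VDD through a resistance R, showing the leakage current components IG, ID, Isub, IPT and IGIDL]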


Leakage Current Components

• Subthreshold conduction, Isub

• Reverse bias pn junction conduction, ID

• Gate induced drain leakage, IGIDL due to tunneling at the gate-drain overlap

• Drain source punchthrough, IPT due to short channel and high drain-source voltage

• Gate tunneling, IG through thin oxide


Subthreshold Current

$$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH}) / nV_t\}$$

μ0: carrier surface mobility

Cox: gate oxide capacitance per unit area

L: channel length

W: gate width

Vt = kT/q: thermal voltage

n: a technology parameter


IDS for Short Channel Device

$$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH} + \eta V_{DS}) / nV_t\}$$

VDS = drain to source voltage

η: a proportionality factor
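The exponential dependence on the threshold voltage is what makes threshold scaling so costly in leakage. The sketch below evaluates the expression above; every numeric value (the μ0·Cox product, W/L, n, η and the two threshold voltages) is an assumed placeholder chosen for illustration, not data from the tutorial.

```python
import math

def subthreshold_current(vgs, vds, vth, w_over_l, mu0_cox,
                         n=1.5, eta=0.0, v_t=0.026):
    """I_sub = mu0*Cox*(W/L)*Vt^2 * exp((VGS - VTH + eta*VDS) / (n*Vt))."""
    return mu0_cox * w_over_l * v_t ** 2 * math.exp(
        (vgs - vth + eta * vds) / (n * v_t))

# Assumed placeholder parameters for an "off" transistor (VGS = 0).
params = dict(vgs=0.0, vds=1.0, w_over_l=2.0, mu0_cox=3e-4, n=1.5, eta=0.05)
i_high_vth = subthreshold_current(vth=0.4, **params)
i_low_vth = subthreshold_current(vth=0.3, **params)

print(f"leakage at VTH = 0.4 V: {i_high_vth:.3e} A")
print(f"leakage at VTH = 0.3 V: {i_low_vth:.3e} A")
print(f"ratio for a 100 mV lower threshold: {i_low_vth / i_high_vth:.1f}x")
```

With n = 1.5 at room temperature, each 100mV reduction of the threshold multiplies the subthreshold current by exp(0.1/0.039), roughly a factor of 13, independent of the other assumed parameters.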


Increased Subthreshold Leakage

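[Figure: log Isub versus gate voltage for the original device (threshold VTH) and the scaled device (lower threshold VTH’); the scaled device has a higher subthreshold current at zero gate voltage]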


Reducing Leakage Power

• Leakage power as a fraction of the total power increases as clock frequency drops. Turning supply off in unused parts can save power.

• For a gate it is a small fraction of the total power; it can be significant for very large circuits.

• Scaling down features requires lowering the threshold voltage, which increases leakage power; roughly doubles with each shrinking.

• Multiple-threshold devices are used to reduce leakage power.


Problem Statement

• Problem: To design a CMOS circuit,
  – using dual-threshold devices to globally minimize subthreshold leakage
  – using delay elements to eliminate all glitches
  – maintaining specified performance
  – allowing performance-power tradeoff

• Reference: Y. Lu and V. D. Agrawal, “Leakage and Dynamic Glitch Power Minimization Using Integer Linear Programming for Vth Assignment and Path Balancing,” Proc. PATMOS, 2005, pp. 217-226.


MILP: Mixed Integer Linear Program

$$\text{Minimize } \left\{ \sum_{\text{all gates } i} \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right] + \sum_{\text{all gates } i} \sum_{j} \Delta d_{ij} \right\}$$

where
  Xi = 1: gate i has low Vth, with leakage ILi
  Xi = 0: gate i has high Vth, with leakage IHi
  Δdij = delay inserted between gates i and j for glitch suppression

Xi ∈ {0, 1} is an integer variable; Δdij is a real variable.

ILi and IHi are constants for gate i obtained by SPICE simulation.


MILP - Constraints

Circuit delay constraint for each PO i:

$$T_i \le T_{max}$$

Tmax can be the delay of critical path or clock period specified by the circuit designer.

Glitch suppression constraint for each gate i with fanin j:

$$T_i \ge T_j + \Delta d_{i,j} + X_i D_{Li} + (1 - X_i) D_{Hi} \qquad (1)$$

$$t_i \le t_j + \Delta d_{i,j} + X_i D_{Li} + (1 - X_i) D_{Hi} \qquad (2)$$

$$X_i D_{Li} + (1 - X_i) D_{Hi} \ge T_i - t_i \qquad (3)$$

Constraints (1), (2) and (3) make sure that Ti - ti < di for each gate, so glitches are eliminated.

Ti is the latest signal arrival time at the output of gate i.

ti is the earliest signal arrival time at the output of gate i.

DLi and DHi are the delays of gate i with low and high Vth, respectively.
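For a very small circuit, the threshold assignment part of this optimization can be explored by exhaustive search. The sketch below is not the authors' MILP: it ignores the glitch suppression constraints and the inserted delays, and simply enumerates all low/high Vth assignments that meet the delay limit. The four-gate netlist and the delay and leakage numbers are hypothetical.

```python
from itertools import product

def critical_delay(gates, order, low_vth):
    """Longest path delay; primary inputs are assumed to arrive at time 0."""
    arrival = {}
    for g in order:
        d_low, d_high, _, _, fanins = gates[g]
        d = d_low if low_vth[g] else d_high
        arrival[g] = max(arrival.get(f, 0.0) for f in fanins) + d
    return max(arrival.values())

def best_assignment(gates, order, t_max):
    """Exhaustively find the Vth assignment with least leakage within t_max."""
    best = None
    for bits in product((1, 0), repeat=len(order)):
        low_vth = dict(zip(order, bits))          # X_i = 1 means low Vth
        if critical_delay(gates, order, low_vth) > t_max:
            continue
        leak = sum(gates[g][2] if low_vth[g] else gates[g][3] for g in order)
        if best is None or leak < best[0]:
            best = (leak, low_vth)
    return best

# gate -> (D_L, D_H, I_L, I_H, fanins); all numbers are assumed for illustration.
gates = {
    "g1": (1.0, 1.5, 10, 1, ["a", "b"]),
    "g2": (1.0, 1.5, 10, 1, ["g1", "c"]),
    "g3": (1.0, 1.5, 10, 1, ["c", "d"]),
    "g4": (1.0, 1.5, 10, 1, ["g2", "g3"]),
}
order = ["g1", "g2", "g3", "g4"]
t_c = critical_delay(gates, order, {g: 1 for g in order})

for t_max in (t_c, 1.25 * t_c):
    leak, assignment = best_assignment(gates, order, t_max)
    high = sorted(g for g in order if not assignment[g])
    print(f"Tmax = {t_max}: leakage = {leak}, high-Vth gates = {high}")
```

Exhaustive search grows as 2^N in the number of gates, which is why an integer linear programming formulation is used for realistic circuits.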


Power-Delay Tradeoff Example
14-Gate Full Adder (Unoptimized, Tmax = Tc)

[Figure: full adder with inputs A, B, C and outputs S, C0; all gates are low Vth and the critical path is highlighted]

Ileak = 161 pA


Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, Tmax = Tc)

[Figure: full adder with inputs A, B, C and outputs S, C0; legend: low Vth gates, high Vth gates, delay buffers (high Vth), critical path]

Ileak = 73 pA


Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, Tmax = 1.25Tc)

[Figure: full adder with inputs A, B, C and outputs S, C0; legend: low Vth gates, high Vth gates, delay buffers (high Vth), critical path]

Ileak = 16 pA


Leakage Reduction and Performance Tradeoff @ 27℃, 70nm

Ileak values are in μA; CPU s = Sun OS 5.7 CPU seconds.

                                                 -------- Tmax = Tc --------    ------ Tmax = 1.25Tc ------
Circuit  # gates  Critical path   Unoptimized    Optimized  Leakage    CPU s    Optimized  Leakage    CPU s
                  delay Tc (ns)   Ileak          Ileak      reduction           Ileak      reduction
C432     160      0.751           2.620          1.022      61.0%      0.42     0.132      95.0%      0.3
C499     182      0.391           4.293          3.464      19.3%      0.08     0.225      94.8%      1.8
C880     328      0.672           4.406          0.524      88.1%      0.24     0.153      96.5%      0.3
C1355    214      0.403           4.388          3.290      25.0%      0.1      0.294      93.3%      2.1
C1908    319      0.573           6.023          2.023      66.4%      59       0.204      96.6%      1.3
C2670    362      1.263           5.925          0.659      90.4%      0.38     0.125      97.9%      0.16
C3540    1097     1.748           15.622         0.972      93.8%      3.9      0.319      98.0%      0.74
C5315    1165     1.589           19.332         2.505      87.1%      140      0.395      98.0%      0.71
C6288    1189     2.177           23.142         6.075      73.8%      277      0.678      97.1%      7.48
C7552    1046     1.915           22.043         0.872      96.0%      1.1      0.445      98.0%      0.58


Leakage, Dynamic and Total Power Comparison @ 90℃, 70nm

All power values are in μW.

                  ------- Leakage Power -------   ------- Dynamic Power -------   -------- Total Power --------
Circuit  # gates  Pleak1*   Pleak2*   Reduction   Pdyn1*    Pdyn2*    Reduction   Ptotal1*  Ptotal2*  Reduction
C432     160      35.77     11.87     66.8%       101.0     73.3      27.4%       136.8     85.2      37.7%
C499     182      50.36     39.94     20.7%       225.7     160.3     29.0%       276.1     200.2     27.5%
C880     328      85.21     11.05     87.0%       177.3     128.0     27.8%       262.5     139.1     47.0%
C1355    214      54.12     39.96     26.3%       293.3     165.7     43.5%       347.4     205.7     40.8%
C1908    319      92.17     29.69     67.8%       254.9     197.7     22.4%       347.1     227.4     34.5%
C2670    362      115.4     11.32     90.2%       128.6     100.8     21.6%       244.0     112.1     54.1%
C3540    1097     302.8     17.98     94.1%       333.2     228.1     31.5%       636.0     246.1     61.3%
C5315    1165     421.1     49.79     88.2%       465.5     304.3     34.6%       886.6     354.1     60.1%
C6288    1189     388.5     97.17     75.0%       1691.2    405.6     76.0%       2079.7    502.8     75.8%
C7552    1046     444.4     18.75     95.8%       380.9     227.8     40.2%       825.3     246.6     70.1%

* 1: unoptimized circuits; 2: optimized circuits.


Low-Power System Design

• State encoding
  – Bus encoding
  – Finite state machine
• Clock gating
  – Flip-flop
  – Shift register
• Microprocessors
  – Single processor
  – Multi-core processor


Bus Encoding

• Example: Four bit bus
  • 0000→1110 has three transitions.
  • If bits of second pattern are inverted, then 0000→0001 will have only one transition.

• Bit-inversion encoding for N-bit bus:

[Figure: number of bit transitions after inversion encoding (vertical axis, 0 to N) versus number of bit transitions (horizontal axis, 0 to N), with N/2 marked on both axes]


Bus-Inversion Encoding Logic

[Figure: sent data passes through polarity decision logic and a bus register; the bus carries the data together with a polarity bit, from which the received data is recovered]

M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
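A behavioral model makes the polarity decision concrete. The following is a minimal sketch of bit-inversion encoding, not the circuit from the cited paper: it assumes the bus starts at all zeros, inverts a word whenever more than half of the data lines would otherwise toggle, and does not count transitions on the polarity line itself. Function names are illustrative.

```python
def bus_invert_encode(words, width):
    """Return a list of (bus value, polarity bit) for the given data words."""
    mask = (1 << width) - 1
    prev_bus = 0                       # assumed initial bus state
    encoded = []
    for w in words:
        flips = bin((prev_bus ^ w) & mask).count("1")
        if flips > width / 2:          # more than half the lines would toggle
            bus, polarity = (~w) & mask, 1
        else:
            bus, polarity = w & mask, 0
        encoded.append((bus, polarity))
        prev_bus = bus
    return encoded

def bus_invert_decode(encoded, width):
    """Recover the original words from (bus value, polarity bit) pairs."""
    mask = (1 << width) - 1
    return [(~bus) & mask if polarity else bus for bus, polarity in encoded]

def data_line_transitions(bus_values, width):
    """Count toggles on the data lines, starting from an all-zero bus."""
    mask = (1 << width) - 1
    total, prev = 0, 0
    for b in bus_values:
        total += bin((prev ^ b) & mask).count("1")
        prev = b
    return total

words = [0b1110, 0b0001, 0b1111, 0b0000]
encoded = bus_invert_encode(words, 4)
for w, (bus, pol) in zip(words, encoded):
    print(f"data {w:04b} -> bus {bus:04b}, polarity {pol}")
print("transitions without encoding:", data_line_transitions(words, 4))
print("transitions with encoding:   ",
      data_line_transitions([b for b, _ in encoded], 4))
```

The first word reproduces the slide's example: 1110 is sent as 0001 with the polarity bit set, so only one data line toggles instead of three.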


FSM State Encoding

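[Figure: the same three-state transition graph under two state encodings. Left: state codes 11, 01 and 00. Right: state codes 01, 11 and 00. State transitions carry probabilities 0.3, 0.4, 0.1 and 0.1, and the self-loops carry probabilities 0.6, 0.9 and 0.6. Transition probabilities are based on PI statistics]

Expected number of state-bit transitions:

Left encoding: 2(0.3+0.4) + 1(0.1+0.1) = 1.6

Right encoding: 1(0.3+0.4+0.1) + 2(0.1) = 1.0

State encoding can be selected using a power-based cost function.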
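The cost function used above is the sum, over all state transitions, of the transition probability times the Hamming distance between the two state codes. The sketch below reproduces both sums. The state names A, B, C and the direction of each edge are assumptions, inferred so that the outgoing probabilities of every state add up to 1 and both totals match the slide; the original figure may label them differently.

```python
def expected_bit_transitions(encoding, transitions):
    """Sum of probability * Hamming distance over all state transitions."""
    total = 0.0
    for (src, dst), prob in transitions.items():
        hamming = bin(encoding[src] ^ encoding[dst]).count("1")
        total += prob * hamming
    return total

# Assumed state transition graph; self-loops contribute zero bit transitions.
transitions = {
    ("A", "A"): 0.6, ("A", "C"): 0.3, ("A", "B"): 0.1,
    ("B", "B"): 0.9, ("B", "C"): 0.1,
    ("C", "C"): 0.6, ("C", "A"): 0.4,
}

first_encoding = {"A": 0b11, "B": 0b01, "C": 0b00}
second_encoding = {"A": 0b01, "B": 0b11, "C": 0b00}

print(f"first encoding:  {expected_bit_transitions(first_encoding, transitions):.1f}")
print(f"second encoding: {expected_bit_transitions(second_encoding, transitions):.1f}")
```

The second encoding places the two most probable transitions between codes that differ in a single bit, which is what lowers the expected switching.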


FSM: Clock-Gating

• Moore machine: Outputs depend only on the state variables.
  – If a state has a self-loop in the state transition graph (STG), then clock can be stopped whenever a self-loop is to be executed.

[Figure: STG fragment with states Si, Sj and Sk; transitions labeled Xi/Zk and Xj/Zk enter state Sk, which has a self-loop labeled Xk/Zk]

Clock can be stopped when (Xk, Sk) combination occurs.


Clock-Gating in Moore FSM

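[Figure: Moore FSM with primary inputs PI, combinational logic, flip-flops and primary outputs PO; clock activation logic and a latch gate the clock CK to the flip-flops]

L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.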


Clock-Gating in Low-Power Flip-Flop

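[Figure: D flip-flop with data input D, output Q and clock CK, in which the clock is gated by comparing D and Q]

C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.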


Reduced-Power Shift Register

[Figure: input D feeds two parallel chains of D flip-flops clocked by CK(f/2); a multiplexer combines the two chains to form the output]

Flip-flops are operated at full voltage and half the clock frequency.


Power Reduction in Processors

• Just about everything is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods


SIA Roadmap for Processors (1999)

Year 1999 2002 2005 2008 2011 2014

Feature size (nm) 180 130 100 70 50 35

Logic transistors/cm2 6.2M 18M 39M 84M 180M 390M

Clock (GHz) 1.25 2.1 3.5 6.0 10.0 16.9

Chip size (mm2) 340 430 520 620 750 900

Power supply (V) 1.8 1.5 1.2 0.9 0.6 0.5

High-perf. Power (W) 90 130 160 170 175 183

Source: http://www.semichips.org



Power Reduction Example

• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
• Reduce voltage to 1.5V, power (5.3x) = 4.9W
• Eliminate FP, power (3x) = 1.6W
• Scale 0.75→0.35μ, power (2x) = 0.8W
• Reduce clock load, power (1.3x) = 0.6W
• Reduce frequency 200→160MHz, power (1.25x) = 0.5W
• Reference: J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
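The chain of reductions is a sequence of divisions, and the first and last factors follow directly from the quadratic voltage and linear frequency dependence of dynamic power. The sketch below recomputes the running total from the factors listed above; the variable names are illustrative and the intermediate values are rounded on the slide.

```python
start_power = 26.0  # watts, Alpha 21064 at 200 MHz and 3.45 V

# (step, reduction factor) as listed above.
steps = [
    ("Reduce voltage 3.45 V -> 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]

power = start_power
for name, factor in steps:
    power /= factor
    print(f"{name:34s} /{factor:<5} -> {power:.2f} W")

# Cross-check of two factors against the dynamic power dependence C V^2 f.
voltage_factor = (3.45 / 1.5) ** 2
frequency_factor = 200 / 160
print(f"(3.45/1.5)^2 = {voltage_factor:.2f}, 200/160 = {frequency_factor:.2f}")
```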


Low-Power Datapath Architecture

• Lower supply voltage
  – This slows down circuit speed
  – Use parallel computing to gain the speed back
• Works well when threshold voltage is also lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.


A Reference Datapath

[Figure: input register, combinational logic and output register, all clocked by CK; the total switched capacitance is Cref]

Supply voltage = Vref

Total capacitance switched per cycle = Cref

Clock frequency = f

Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$


A Parallel Architecture

[Figure: the input feeds N registers, each followed by a copy of the combinational logic (Copy 1, Copy 2, ..., Copy N) clocked at f/N; an N to 1 multiplexer and an output register clocked at f produce the output; a multiphase clock generator and mux control is driven by CK]

A copy processes every Nth input, operates at reduced voltage

Supply voltage: $V_N \le V_1 = V_{ref}$

N = Deg. of parallelism


Control Signals, N = 4

CK

Phase 1

Phase 2

Phase 3

Phase 4


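Power

$$P_N = P_{proc} + P_{overhead}$$

$$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f/N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$$

$$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$$

$$P_N = [1 + \delta (N - 1)]\, C_{ref} V_N^2 f$$

$$\frac{P_N}{P_1} = [1 + \delta (N - 1)]\, \frac{V_N^2}{V_{ref}^2}$$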


Voltage vs. Speed

$$\text{Delay of a gate, } T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$$

where
  I is saturation current
  k is a technology parameter
  W/L is width to length ratio of transistor
  Vt is threshold voltage

[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, with the operating points Vref = 5V (N = 1), V2 = 2.9V (N = 2) and V3 (N = 3) marked, and Vt on the voltage axis]

Voltage reduction slows down as we get closer to Vt.


Increasing Multiprocessing

[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS with Vref = 5V, with curves for Vt = 0V (extreme case), Vt = 0.4V and Vt = 0.8V]


Extreme Cases: Vt = 0

Delay, $T \propto 1/V_{ref}$

For N processing elements, delay = NT → $V_N = V_{ref}/N$

$$\frac{P_N}{P_1} = [1 + \delta (N - 1)]\, \frac{1}{N^2} \to \frac{1}{N}$$

For negligible overhead, δ→0

$$\frac{P_N}{P_1} \approx \frac{1}{N^2}$$

For Vt > 0, power reduction is less and there will be an optimum value of N.


Example: Multiplier Core

• Specification:
  • 200MHz Clock
  • 15W dissipation @ 5V
  • Low voltage operation, VDD ≥ 1.5 volts

$$\text{Relative clock rate} = \frac{(V_{DD} - 0.5)^2}{20.25}$$

• Problem:
  • Integrate multiplier core on a SOC
  • Power budget for multiplier ~ 5W


A Multicore Design

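[Figure: the input feeds five registers, each followed by a multiplier core (Multiplier Core 1, Multiplier Core 2, ..., Multiplier Core 5) clocked at 40MHz; a 5 to 1 mux and an output register clocked at 200MHz produce the output; a multiphase clock generator and mux control is driven by the 200MHz CK]

Core clock frequency = 200/N, N should divide 200.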


How Many Cores?

• For N cores:
  • Clock frequency = 200/N MHz
  • Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ Volts
  • Assuming 10% overhead per core,

$$\text{Power dissipation} = 15\, [1 + 0.1 (N - 1)] \left( \frac{V_{DDN}}{5} \right)^2 \text{ watts}$$


Design Tradeoffs

Number of cores N   Clock (MHz)   Core supply VDDN (Volts)   Total Power (Watts)
1                   200           5.00                       15.0
2                   100           3.68                       8.94
4                   50            2.75                       5.90
5                   40            2.51                       5.29
8                   25            2.10                       4.50
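The table follows from the two expressions on the previous slide. The sketch below recomputes it; the function names are illustrative, and small differences in the last digit against the table come from rounding the supply voltage before squaring.

```python
import math

def core_supply(n):
    """Supply voltage (volts) at which a core runs at 200/N MHz."""
    return 0.5 + math.sqrt(20.25 / n)

def total_power(n, p_ref=15.0, v_ref=5.0, overhead=0.1):
    """Total power (watts) of N cores with 10% overhead per added core."""
    return p_ref * (1 + overhead * (n - 1)) * (core_supply(n) / v_ref) ** 2

print("N  clock (MHz)  VDDN (V)  power (W)")
for n in (1, 2, 4, 5, 8):
    print(f"{n:<2} {200 / n:<12.0f} {core_supply(n):<9.2f} {total_power(n):.2f}")

# Smallest listed core count that meets the ~5 W budget with VDD >= 1.5 V.
feasible = [n for n in (1, 2, 4, 5, 8)
            if total_power(n) <= 5.0 and core_supply(n) >= 1.5]
print("core counts within a 5 W budget:", feasible)
```

Of the listed options only N = 8 falls below 5W; N = 5 comes close at about 5.3W with less area, which is the kind of tradeoff the table is meant to expose.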


Pipeline Architecture

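[Figure: top, a reference design with input register and processor clocked at f; bottom, a two-stage pipeline with register, ½ Proc., register, ½ Proc., also clocked at f]

Reference design:
  Capacitance = C
  Voltage = V
  Frequency = f
  Power = $CV^2f$

Pipelined design:
  Capacitance = 1.2C
  Voltage = 0.6V
  Frequency = f
  Power = $0.432\,CV^2f$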


Approximate Trend

n-parallel proc. n-stage pipeline proc.

Capacitance nC C

Voltage V/n V/n

Frequency f/n f

Power CV2f/n2 CV2f/n2

Chip area n times 10-20% increase

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: KluwerAcademic Publishers, 1998.


Multicore Processors

[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks versus year (2000, 2004, 2008) for multicore and single core processors. Source: Computer, May 2005, p. 12]


Multicore Processors

• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.

• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.

• S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.


Cell - Cell Broadband Engine Architecture

[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]

Nine-processor chip: 192 Gflops


Cell’s Nine-Processor Chip

[Figure: Cell chip. © IEEE Spectrum, January 2006]

Eight identical processors, f = 5.6GHz (max), 44.8 Gflops


Books on Low-Power Design (1)

• L. Benini and G. De Micheli, Dynamic Power Management Design Techniques and CAD Tools, Boston: Springer, 1998.
• T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Boston: Springer, 2002.
• A. Chandrakasan and R. Brodersen, Low-Power Digital CMOS Design, Boston: Springer, 1995.
• A. Chandrakasan and R. Brodersen, Low-Power CMOS Design, New York: IEEE Press, 1998.
• J.-M. Chang and M. Pedram, Power Optimization and Synthesis at Behavioral and System Levels using Formal Methods, Boston: Springer, 1999.
• M. S. Elrabaa, I. S. Abu-Khater and M. I. Elmasry, Advanced Low-Power Digital Circuit Techniques, Boston: Springer, 1997.
• R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.
• S. Iman and M. Pedram, Logic Synthesis for Low Power VLSI Designs, Boston: Springer, 1998.
• J. B. Kuo and J.-H. Lou, Low-Voltage CMOS VLSI Circuits, New York: Wiley-Interscience, 1999.
• J. Monteiro and S. Devadas, Computer-Aided Design Techniques for Low Power Sequential Logic Circuits, Boston: Springer, 1997.
• S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Boston: Springer, 2005.
• W. Nebel and J. Mermet, Low Power Design in Deep Submicron Electronics, Boston: Springer, 1997.


Books on Low-Power Design (2)

• N. Nicolici and B. M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits, Boston: Springer, 2003.
• V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic and N. Nedovic, Digital System Clocking: High Performance and Low-Power Aspects, Wiley-IEEE, 2005.
• M. Pedram and J. M. Rabaey, Power Aware Design Methodologies, Boston: Springer, 2002.
• C. Piguet, Low-Power Electronics Design, Boca Raton, Florida: CRC Press, 2005.
• J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Boston: Springer, 1996.
• S. Roundy, P. K. Wright and J. M. Rabaey, Energy Scavenging for Wireless Sensor Networks, Boston: Springer, 2003.
• K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design, New York: Wiley-Interscience, 2000.
• E. Sánchez-Sinencio and A. G. Andreou, Low-Voltage/Low-Power Integrated Circuits and Systems – Low-Voltage Mixed-Signal Circuits, New York: IEEE Press, 1999.
• W. A. Serdijn, Low-Voltage Low-Power Analog Integrated Circuits, Boston: Springer, 1995.
• S. Sheng and R. W. Brodersen, Low-Power Wireless Communications: A Wideband CDMA System Design, Boston: Springer, 1998.
• G. Verghese and J. M. Rabaey, Low-Energy FPGAs, Boston: Springer, 2001.
• G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
• K.-S. Yeo and K. Roy, Low-Voltage Low-Power Subsystems, McGraw Hill, 2004.


Other Books Useful in Low-Power Design

• A. Chandrakasan, W. J. Bowhill and F. Fox, Design of High-Performance Microprocessor Circuits, New York: IEEE Press, 2001.

• N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts, Addison-Wesley, 2005.

• S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, New York: McGraw-Hill, 1996.

• E. Larsson, Introduction to Advanced System-on-Chip Test Design and Optimization, Springer, 2005.

• J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Second Edition, Upper Saddle River, New Jersey: Prentice-Hall, 2003.

• J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, New York: IEEE Press, 2004.
