ELEC 5970-001/6970-001 (Fall 2005), Lecture 19, 11/15/05
Special Topics in Electrical Engineering: Low-Power Design of Electronic Circuits
Power Aware Microprocessors
Vishwani D. Agrawal, James J. Danaher Professor
Department of Electrical and Computer Engineering, Auburn University
http://www.eng.auburn.edu/~vagrawal [email protected]




SIA Roadmap for Processors (1999)

Year                    1999   2002   2005   2008   2011   2014
Feature size (nm)        180    130    100     70     50     35
Logic transistors/cm²   6.2M    18M    39M    84M   180M   390M
Clock (GHz)             1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)          340    430    520    620    750    900
Power supply (V)         1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W)      90    130    160    170    175    183

Source: http://www.semichips.org
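As a rough illustration only, if one assumes that all of the listed power is dynamic power, P = CV²f, each roadmap column implies an effective switched capacitance C = P/(V²f). This sketch ignores leakage and folds the activity factor into C, so the results are back-of-the-envelope numbers derived from the table, not roadmap data:

```python
# Roadmap columns copied from the table above.
YEARS = [1999, 2002, 2005, 2008, 2011, 2014]
CLOCK_GHZ = [1.25, 2.1, 3.5, 6.0, 10.0, 16.9]
SUPPLY_V = [1.8, 1.5, 1.2, 0.9, 0.6, 0.5]
POWER_W = [90, 130, 160, 170, 175, 183]


def effective_capacitance_nf(power_w, supply_v, clock_ghz):
    """Solve P = C * V^2 * f for C (in nF).

    Assumption of this sketch: all power is dynamic, and the activity
    factor is folded into C.
    """
    c_farads = power_w / (supply_v ** 2 * clock_ghz * 1e9)
    return c_farads * 1e9


c_eff = {
    year: effective_capacitance_nf(p, v, f)
    for year, p, v, f in zip(YEARS, POWER_W, SUPPLY_V, CLOCK_GHZ)
}

for year, c in c_eff.items():
    print(f"{year}: C_eff ~ {c:.1f} nF")
```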


Power Reduction in Processors

• Just about everything is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods


SPEC CPU2000 Benchmarks

• Twelve integer (CINT2000) and 14 floating-point (CFP2000) programs.

• Each program's run time is normalized to its run time on the reference machine, a Sun Ultra 5_10 with a 300MHz processor, to obtain a SPEC ratio.

• The CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
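A minimal sketch of the summary computation. In CPU2000 the ratio is the reference time divided by the measured time, scaled by 100 so that the reference machine itself scores 100; the run times below are hypothetical, not SPEC data:

```python
import math


def spec_ratio(reference_time_s, measured_time_s):
    """CPU2000-style SPEC ratio: reference time / measured time, scaled by 100."""
    return 100.0 * reference_time_s / measured_time_s


def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid overflow in long products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical run times (seconds) for two programs; not real SPEC results.
reference_times = [1000.0, 2000.0]
measured_times = [100.0, 400.0]

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
summary = geometric_mean(ratios)
print(ratios, round(summary, 1))
```

One reason for the geometric mean: the ratio of two machines' geometric means equals the geometric mean of their per-program ratios, so the comparison does not depend on which machine is chosen as the reference.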


Reference CPU s: Sun Ultra 5_10 300MHz Processor

[Figure: bar chart (vertical axis 0 to 3500) for the reference machine, covering the CINT2000 programs (gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf) and the CFP2000 programs (wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi)]


CINT2000: 3.4GHz Pentium 4, HT Technology (D850MD Motherboard)

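[Figure: bar chart of base and optimized (peak) SPEC ratios (vertical axis 0 to 2500) for the CINT2000 programs gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf; legend: Base ratio, Opt. ratio]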

SPECint2000_base = 1341
SPECint2000 = 1389

Source: www.spec.org


Two Benchmark Results

• Baseline: a uniform configuration, not optimized for any specific program:

  • The same compiler, with the same settings and flags, is used for all benchmarks.

  • Other restrictions apply.

• Peak: the run is optimized to obtain the peak performance for each benchmark program.


CFP2000: 3.6GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)

[Figure: bar chart of base and optimized (peak) SPEC ratios (vertical axis 0 to 3000) for the CFP2000 programs wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi; legend: Base ratio, Opt. ratio]

SPECfp2000_base = 1627
SPECfp2000 = 1630

Source: www.spec.org


CINT2000: 1.7GHz Pentium 4 (D850MD Motherboard)

[Figure: bar chart of base and optimized (peak) SPEC ratios (vertical axis 0 to 1000) for the CINT2000 programs gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf; legend: Base ratio, Opt. ratio]

SPECint2000_base = 579
SPECint2000 = 588

Source: www.spec.org


CFP2000: 1.7GHz Pentium 4 (D850MD Motherboard)

[Figure: bar chart of base and optimized (peak) SPEC ratios (vertical axis 0 to 1400) for the CFP2000 programs wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi; legend: Base ratio, Opt. ratio]

SPECfp2000_base = 648
SPECfp2000 = 659

Source: www.spec.org


Energy SPEC Benchmarks

• Energy efficiency mode: besides execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{Joules consumed}}$$
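A small sketch of the per-program definition with a hypothetical measurement. Algebraically, (1/T)/E = 1/(E·T), so this metric is the reciprocal of the energy-delay product and rewards both speed and low energy:

```python
def energy_efficiency(execution_time_s, energy_j):
    """Energy efficiency = (1 / execution time) / joules consumed, in 1/(s*J)."""
    return (1.0 / execution_time_s) / energy_j


# Hypothetical measurement: a program runs for 10 s and consumes 200 J.
eff = energy_efficiency(10.0, 200.0)
print(eff)  # equals 1 / (200 J * 10 s)
```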


Energy Efficiency

• Efficiency averaged over n benchmark programs:

$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$

where Efficiency_i is the efficiency for program i.

• Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
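The averaging and normalization above can be sketched as follows; the per-program efficiencies are hypothetical values chosen only to make the arithmetic easy to follow:

```python
import math


def mean_efficiency(efficiencies):
    """Geometric mean of per-program efficiencies: (prod Efficiency_i)^(1/n)."""
    n = len(efficiencies)
    return math.exp(sum(math.log(e) for e in efficiencies) / n)


def relative_efficiency(computer_effs, reference_effs):
    """Mean efficiency of a computer divided by that of the reference computer."""
    return mean_efficiency(computer_effs) / mean_efficiency(reference_effs)


# Hypothetical per-program efficiencies in 1/(s*J); not measured data.
reference = [1e-3, 1e-3]
candidate = [2e-3, 8e-3]

print(relative_efficiency(candidate, reference))  # about 4.0
```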


SPEC2000 Relative Energy Efficiency

[Figure: bar chart of relative energy efficiency (vertical axis 0 to 6), with SPECINT2000 and SPECFP2000 bars in three groups. Legend: Pentium [email protected]/0.6GHz energy-efficient processor; Pentium [email protected] (reference); Pentium [email protected]; always max. clock; laptop adaptive clk.; min. power min. clock.]


Voltage Scaling

• Dynamic: reduce voltage and frequency during idle or low-activity periods.

• Static: clustered voltage scaling
  • Logic on non-critical paths is given a lower supply voltage.
  • A 47% power reduction with a 10% area increase has been reported.
  • M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
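Both forms of voltage scaling can be sketched with the dynamic-power model P ∝ CV²f. This is a simplified illustration, not the method of the cited paper: the operating points are hypothetical, and leakage, level-converter power, and area overhead are ignored:

```python
def dvs_relative_power(voltage_scale, frequency_scale):
    """Dynamic power relative to the nominal CV^2 f when V and f are both scaled."""
    return voltage_scale ** 2 * frequency_scale


def cvs_relative_power(low_fraction, v_high, v_low):
    """Relative dynamic power when a fraction of switched capacitance runs at v_low.

    Simplification of this sketch: level converters and area overhead are ignored.
    """
    return (1.0 - low_fraction) + low_fraction * (v_low / v_high) ** 2


# Hypothetical operating points, not taken from the cited paper.
print(dvs_relative_power(0.5, 0.5))       # halving both V and f
print(cvs_relative_power(0.5, 2.0, 1.0))  # half the logic at half the voltage
```

Because power falls with the square of voltage, dynamic scaling of V together with f gives a roughly cubic reduction in power, while static clustering only saves on the logic that can tolerate the slower low-voltage gates.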


Pipeline Gating

• A pipelined processor uses speculative execution.

• An incorrect branch prediction results in pipeline flushes and energy wasted on wrong-path instructions.

• Idea: stop fetching instructions when a branch hazard is expected:

  • If the count (M) of incorrect predictions exceeds a pre-specified number (N), suspend instruction fetch for k cycles.

• Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
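A cycle-level sketch of the gating rule as stated above. The per-cycle misprediction trace, the choice to reset M after each gating event, and the function name are assumptions of this illustration; the cited paper bases its decision on branch-confidence estimates, so its mechanism differs in detail:

```python
def fetch_enable_trace(mispredicts, n_threshold, k_cycles):
    """Per-cycle fetch-enable flags under a simple gating rule.

    mispredicts[c] is True if an incorrect prediction is detected in cycle c.
    When the running count M exceeds N, fetch is suspended for the next k
    cycles and M is reset (the reset is an assumption of this sketch).
    """
    enabled = []
    count = 0      # M: incorrect predictions seen since the last gating event
    remaining = 0  # gated cycles still to go
    for miss in mispredicts:
        enabled.append(remaining == 0)  # is fetch allowed in this cycle?
        if remaining > 0:
            remaining -= 1
        if miss:
            count += 1
            if count > n_threshold:
                remaining = k_cycles
                count = 0
    return enabled


trace = fetch_enable_trace([True, False, True, False, False, False, False],
                           n_threshold=1, k_cycles=3)
print(trace)
```

With N = 1, the second misprediction (cycle 2) pushes M past the threshold, so fetch is disabled in cycles 3 to 5 and resumes in cycle 6.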


Slack Scheduling

• Application: superscalar, out-of-order execution:

  • An instruction is executed as soon as the data and resources it needs become available.

  • A commit unit reorders the results into program order.

• Delay the execution of instructions whose results are not immediately needed.

• Example of RISC instructions:
  • add r0, r1, r2 ; (A)
  • sub r3, r4, r5 ; (B)
  • and r9, x1, r9 ; (C)
  • or r5, r9, r10 ; (D)
  • xor r2, r10, r11 ; (E)

J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
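A static sketch of what "slack" means for the five example instructions. It assumes register renaming (so only read-after-write dependences matter), unlimited issue width, and a one-cycle latency for every operation; it is an illustration of the concept, not the dynamic hardware mechanism of Casmira and Grunwald:

```python
# The five example instructions as (label, destination, sources).
# "x1" is kept exactly as written in the example.
PROGRAM = [
    ("A", "r0", ("r1", "r2")),
    ("B", "r3", ("r4", "r5")),
    ("C", "r9", ("x1", "r9")),
    ("D", "r5", ("r9", "r10")),
    ("E", "r2", ("r10", "r11")),
]


def slack(program, latency=1):
    """Cycles each instruction can be delayed without lengthening the schedule.

    Only read-after-write dependences are tracked (renaming is assumed to
    remove WAR/WAW hazards); every op takes `latency` cycles.
    """
    producer, deps = {}, {}
    for label, dest, srcs in program:
        deps[label] = [producer[s] for s in srcs if s in producer]
        producer[dest] = label

    earliest = {}
    for label, _, _ in program:  # program order is a valid topological order
        earliest[label] = max((earliest[d] + latency for d in deps[label]), default=0)
    last_start = max(earliest.values())

    succs = {label: [] for label, _, _ in program}
    for label, ds in deps.items():
        for d in ds:
            succs[d].append(label)

    latest = {}
    for label, _, _ in reversed(program):
        latest[label] = min((latest[s] - latency for s in succs[label]), default=last_start)
    return {label: latest[label] - earliest[label] for label, _, _ in program}


print(slack(PROGRAM))  # {'A': 1, 'B': 1, 'C': 0, 'D': 0, 'E': 1}
```

Under these assumptions only C → D forms the critical path, so A, B and E could each be delayed by a cycle, or sent to slower low-power execution units, without delaying completion.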


Slack Scheduling Example

[Figure: issue timing of instructions A to E under slack scheduling and under standard scheduling]


Slack Scheduling

[Figure: block diagram showing scheduling logic, a slack bit, low-power execution units, and the re-order buffer]


Parallel Architecture

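[Figure: one processor clocked at f between input and output, compared with two processors clocked at f/2 working in parallel on the same input and output, which still runs at f]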

Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f

Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = 0.396CV²f
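Substituting the parallel configuration's values into P = CV²f reproduces the 0.396 factor:

$$P_{\text{parallel}} = (2.2C)(0.6V)^2(0.5f) = (2.2 \times 0.36 \times 0.5)\,CV^2 f = 0.396\,CV^2 f$$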


Pipeline Architecture

[Figure: one processor clocked at f between input and output, compared with the same logic split into two ½-processor pipeline stages separated by registers, also clocked at f]

Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f

Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f
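The same substitution for the pipelined configuration, where the clock stays at f:

$$P_{\text{pipeline}} = (1.2C)(0.6V)^2 f = (1.2 \times 0.36)\,CV^2 f = 0.432\,CV^2 f$$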


Approximate Trend

              n-parallel proc.   n-stage pipeline proc.
Capacitance   nC                 C
Voltage       V/n                V/n
Frequency     f/n                f
Power         CV²f/n²            CV²f/n²
Chip area     n times            10-20% increase

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
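A sketch that evaluates the trend table with exact fractions. It is idealized in the same way as the table: it assumes the supply voltage can drop to V/n while still meeting timing, which cannot continue indefinitely because the supply cannot go below the transistor threshold voltage, and it ignores leakage:

```python
from fractions import Fraction


def relative_power(cap_scale, volt_scale, freq_scale):
    """Dynamic power relative to C V^2 f, given scale factors for C, V and f."""
    return cap_scale * volt_scale ** 2 * freq_scale


def parallel_trend(n):
    """n processors in parallel: capacitance nC, voltage V/n, frequency f/n."""
    return relative_power(n, Fraction(1, n), Fraction(1, n))


def pipeline_trend(n):
    """n-stage pipeline: capacitance C, voltage V/n, frequency f."""
    return relative_power(1, Fraction(1, n), 1)


for n in (1, 2, 4):
    print(n, parallel_trend(n), pipeline_trend(n))  # both equal 1/n^2

# The two-unit examples, including their extra capacitance (2.2C and 1.2C):
print(relative_power(2.2, 0.6, 0.5), relative_power(1.2, 0.6, 1.0))
```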


Clock Distribution

[Figure: clock distribution network]


Clock Power

$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where $C_L$ = total load capacitance and $\lambda$ = constant fanout at each stage of the distribution network.

The clock consumes about 40% of total processor power.


Clock Network Examples

Alpha 21064 Alpha 21164 Alpha 21264

Technology 0.75μ CMOS 0.5μ CMOS 0.35μ CMOS

Frequency (MHz) 200 300 600

Total capacitance 12.5nF

Clock load 3.25nF 3.75nF

Clock power 20W

Max. clock skew 200ps (<10%) 90ps

D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.


Power Reduction Example

• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
• Reduce voltage to 1.5V, power (5.3x) = 4.9W
• Eliminate FP, power (3x) = 1.6W
• Scale 0.75→0.35μ, power (2x) = 0.8W
• Reduce clock load, power (1.3x) = 0.6W
• Reduce frequency 200→160MHz, power (1.25x) = 0.5W
• J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
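Treating each step's factor as an independent divisor, as the slide does, the chain can be checked. The voltage factor follows from P ∝ V², since (3.45/1.5)² ≈ 5.3, and the frequency factor from P ∝ f, since 200/160 = 1.25:

```python
BASELINE_W = 26.0  # Alpha 21064 at 200MHz, 3.45V

# Each step and its power-reduction factor, as listed on the slide.
steps = [
    ("Reduce voltage 3.45V -> 1.5V", (3.45 / 1.5) ** 2),  # V^2 scaling, about 5.3x
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 200 / 160),       # P proportional to f
]

power = BASELINE_W
for name, factor in steps:
    power /= factor
    print(f"{name:<34} /{factor:4.2f} -> {power:5.2f} W")
```

The running totals come out at about 4.91, 1.64, 0.82, 0.63 and 0.50 W, matching the rounded figures on the slide.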