11/15/05 ELEC 5970-001/6970-001 Lecture 19 1
ELEC 5970-001/6970-001 (Fall 2005)
Special Topics in Electrical Engineering
Low-Power Design of Electronic Circuits
Power Aware Microprocessors
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University
http://www.eng.auburn.edu/[email protected]
SIA Roadmap for Processors (1999)
Year                      1999   2002   2005   2008   2011   2014
Feature size (nm)          180    130    100     70     50     35
Logic transistors/cm²     6.2M    18M    39M    84M   180M   390M
Clock (GHz)               1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)            340    430    520    620    750    900
Power supply (V)           1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W)        90    130    160    170    175    183
Source: http://www.semichips.org
Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods
SPEC CPU2000 Benchmarks
• Twelve integer and 14 floating-point programs, CINT2000 and CFP2000.
• Each program's run time is normalized to obtain a SPEC ratio with respect to the run time on a Sun Ultra 5_10 with a 300MHz processor.
• CINT2000 and CFP2000 summary measurements are the geometric means of SPEC ratios.
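The normalization and averaging described above can be sketched in a few lines of Python. This is only an illustration: the function names and run times are hypothetical, and the scale factor of 100 (so that the reference machine scores 100) is an assumption based on the magnitude of the scores reported on the later slides.

```python
import math


def spec_ratio(reference_seconds, measured_seconds, scale=100.0):
    """Normalize a measured run time against the reference machine's run time.

    scale=100 is an assumption: it makes the reference machine score 100.
    """
    return scale * reference_seconds / measured_seconds


def geometric_mean(values):
    """Geometric mean, computed through logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) run times in seconds for three programs.
runs = [(1400.0, 140.0), (1900.0, 95.0), (1800.0, 450.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in runs]
summary = geometric_mean(ratios)
print(ratios, round(summary, 1))
```

The geometric mean is used rather than the arithmetic mean so that no single program with a very large ratio dominates the summary figure.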
Reference CPU s: Sun Ultra 5_10 300MHz Processor
[Bar chart: reference CPU time in seconds (axis 0 to 3500) for each benchmark program.
CINT2000: gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf.
CFP2000: wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]
CINT2000: 3.4GHz Pentium 4, HT Technology (D850MD Motherboard)
[Bar chart: base ratio and opt. ratio (axis 0 to 2500) for gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf]
SPECint2000_base = 1341
SPECint2000 = 1389
Source: www.spec.org
Two Benchmark Results
• Baseline: A uniform configuration, not optimized for any specific program:
  • Same compiler, with the same settings and flags, used for all benchmarks
  • Other restrictions
• Peak: Each run is optimized to obtain the peak performance for that benchmark program.
CFP2000: 3.6GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
[Bar chart: base ratio and opt. ratio (axis 0 to 3000) for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi]
SPECfp2000_base = 1627
SPECfp2000 = 1630
Source: www.spec.org
CINT2000: 1.7GHz Pentium 4 (D850MD Motherboard)
[Bar chart: base ratio and opt. ratio (axis 0 to 1000) for gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf]
SPECint2000_base = 579
SPECint2000 = 588
Source: www.spec.org
CFP2000: 1.7GHz Pentium 4 (D850MD Motherboard)
[Bar chart: base ratio and opt. ratio (axis 0 to 1400) for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi]
SPECfp2000_base = 648
SPECfp2000 = 659
Source: www.spec.org
Energy SPEC Benchmarks
• Energy efficiency mode: Besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:
$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$
Energy Efficiency
• Efficiency averaged over n benchmark programs:
$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$
where Efficiency_i is the efficiency for program i.
• Relative efficiency:
$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
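A minimal Python sketch of the three energy-efficiency definitions follows. The function names and the measurements are hypothetical and serve only to show how the per-program efficiency, the geometric-mean average, and the relative efficiency fit together.

```python
import math


def energy_efficiency(exec_seconds, joules):
    """Per-program efficiency: (1 / execution time) / joules consumed."""
    return (1.0 / exec_seconds) / joules


def mean_efficiency(efficiencies):
    """Geometric mean of the per-program efficiencies."""
    logs = [math.log(e) for e in efficiencies]
    return math.exp(sum(logs) / len(logs))


def relative_efficiency(machine_effs, reference_effs):
    """Averaged efficiency of a machine divided by that of the reference."""
    return mean_efficiency(machine_effs) / mean_efficiency(reference_effs)


# Hypothetical (seconds, joules) measurements for two programs on two machines.
machine = [energy_efficiency(10.0, 200.0), energy_efficiency(20.0, 500.0)]
reference = [energy_efficiency(12.0, 600.0), energy_efficiency(25.0, 1500.0)]
print(relative_efficiency(machine, reference))
```

Because efficiency divides by both time and energy, a machine that is slower but draws much less power can still score above the reference.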
SPEC2000 Relative Energy Efficiency
[Bar chart: relative energy efficiency (axis 0 to 6) on SPECINT2000 and SPECFP2000 for three processors:
Pentium [email protected]/0.6GHz energy-efficient processor,
Pentium [email protected] (reference),
Pentium [email protected].
Legend: always max. clock; laptop adaptive clock; min. power, min. clock.]
Voltage Scaling
• Dynamic: Reduce voltage and frequency during idle or low-activity periods.
• Static: Clustered voltage scaling
  • Logic on non-critical paths is given a lower voltage
  • 47% power reduction with 10% area increase reported.
  • M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
Pipeline Gating
• A pipeline processor uses speculative execution.
• Incorrect branch prediction results in pipeline stalls and wasted energy.
• Idea: Stop fetching instructions if a branch hazard is expected:
  • If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend instruction fetch for some k cycles.
• Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
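The gating rule above can be sketched as a small counter-based model. This is a toy illustration, not the mechanism of the cited paper: the class and function names are hypothetical, and the choice to reset M to zero whenever fetch is suspended is an assumption, since the slide does not say when the count is cleared.

```python
class FetchGate:
    """Suspends fetch for k cycles once the misprediction count M exceeds N."""

    def __init__(self, n, k):
        self.n = n            # threshold N
        self.k = k            # number of cycles to suspend fetch
        self.m = 0            # running count M of incorrect predictions
        self.stall_left = 0   # remaining cycles of suspended fetch

    def record_misprediction(self):
        self.m += 1
        if self.m > self.n:
            self.stall_left = self.k
            self.m = 0        # assumption: restart the count after gating

    def may_fetch(self):
        """Call once per cycle; returns False while fetch is suspended."""
        if self.stall_left > 0:
            self.stall_left -= 1
            return False
        return True


def fetched_cycles(mispredict_cycles, total_cycles, n, k):
    """Return the cycles in which the front end was allowed to fetch."""
    gate = FetchGate(n, k)
    fetched = []
    for cycle in range(total_cycles):
        if gate.may_fetch():
            fetched.append(cycle)
        if cycle in mispredict_cycles:
            gate.record_misprediction()
    return fetched


# Mispredictions in cycles 1, 2 and 3 with N = 2 and k = 3.
print(fetched_cycles({1, 2, 3}, 10, n=2, k=3))
```

With N = 2, the third misprediction trips the gate, so no instructions are fetched in cycles 4 through 6.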
Slack Scheduling
• Application: Superscalar, out-of-order execution:
  • An instruction is executed as soon as the data and resources it needs become available.
  • A commit unit reorders the results.
• Delay the execution of instructions whose result is not immediately needed.
• Example of RISC instructions:
  • add r0, r1, r2; (A)
  • sub r3, r4, r5; (B)
  • and r9, x1, r9; (C)
  • or r5, r9, r10; (D)
  • xor r2, r10, r11; (E)
• J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
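One way to see which of the five instructions has slack is to compute, for each one, the earliest and latest cycle in which it can execute without lengthening the schedule. The sketch below makes several assumptions that are not stated on the slide: the last operand of each instruction is taken to be the destination, every instruction takes one cycle, and only true (read-after-write) dependences are tracked. The function names are hypothetical.

```python
# Each entry: (label, source registers, destination register).
# Assumption: the last operand is the destination, e.g. "add r0, r1, r2" writes r2.
PROGRAM = [
    ("A", ("r0", "r1"), "r2"),
    ("B", ("r3", "r4"), "r5"),
    ("C", ("r9", "x1"), "r9"),
    ("D", ("r5", "r9"), "r10"),
    ("E", ("r2", "r10"), "r11"),
]


def dependencies(program):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer = {}
    deps = {}
    for label, sources, dest in program:
        deps[label] = {last_writer[r] for r in sources if r in last_writer}
        last_writer[dest] = label
    return deps


def slack(program):
    """Slack per instruction = latest start cycle minus earliest start cycle."""
    deps = dependencies(program)
    labels = [label for label, _, _ in program]
    # Earliest cycle: one cycle after the latest producer (unit latency assumed).
    earliest = {}
    for label in labels:
        earliest[label] = max((earliest[d] + 1 for d in deps[label]), default=0)
    last_cycle = max(earliest.values())
    # Latest cycle: one cycle before the earliest consumer.
    latest = {}
    for label in reversed(labels):
        consumers = [c for c in labels if label in deps[c]]
        latest[label] = min((latest[c] - 1 for c in consumers), default=last_cycle)
    return {label: latest[label] - earliest[label] for label in labels}


print(slack(PROGRAM))
```

Under these assumptions only A has slack: its result is first read by E, so it can be delayed by one cycle (or sent to a slower, lower-power unit) without delaying E.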
Slack Scheduling Example
[Figure: instructions A to E placed in execution slots under standard scheduling and under slack scheduling]
Slack Scheduling
[Block diagram with components: scheduling logic, slack bit, low-power execution units, re-order buffer]
Parallel Architecture
[Diagram: a single processor clocked at f between input and output, compared with two processors, each clocked at f/2, between the input and an output running at f]
Single processor:
  Capacitance = C
  Voltage = V
  Frequency = f
  Power = CV²f
Two parallel processors:
  Capacitance = 2.2C
  Voltage = 0.6V
  Frequency = 0.5f
  Power = 0.396CV²f
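The parallel-architecture figure can be checked by scaling each factor of P = CV²f. The helper below is a hypothetical name introduced only for this check, and it assumes dynamic switching power is the only contribution.

```python
def relative_power(cap_factor, volt_factor, freq_factor):
    """Power relative to CV^2 f when C, V and f are each scaled by a factor."""
    return cap_factor * volt_factor ** 2 * freq_factor


# Two parallel processors: 2.2C, 0.6V, 0.5f (values from the slide).
parallel = relative_power(2.2, 0.6, 0.5)
print(f"Parallel power = {parallel:.3f} CV^2 f")
```

The quadratic dependence on voltage is what makes the trade worthwhile: capacitance more than doubles, yet the power falls to about 40% of the original.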
Pipeline Architecture
[Diagram: a processor followed by a register, clocked at f, compared with a two-stage pipeline of two half-processors (½Proc.), each followed by a register, clocked at f]
Single processor:
  Capacitance = C
  Voltage = V
  Frequency = f
  Power = CV²f
Two-stage pipeline:
  Capacitance = 1.2C
  Voltage = 0.6V
  Frequency = f
  Power = 0.432CV²f
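As a check, the pipelined power follows from substituting the scaled capacitance and voltage into P = CV²f, assuming dynamic power is the only term:

$$P_{\text{pipe}} = (1.2C)(0.6V)^2 f = 1.2 \times 0.36 \, CV^2 f = 0.432 \, CV^2 f$$

This is roughly a 57% reduction with the clock frequency, and hence the throughput, unchanged.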
Approximate Trend
               n-parallel proc.    n-stage pipeline proc.
Capacitance    nC                  C
Voltage        V/n                 V/n
Frequency      f/n                 f
Power          CV²f/n²             CV²f/n²
Chip area      n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
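Both power entries in the trend follow from substituting the scaled capacitance, voltage and frequency into P = CV²f. As a sketch under the idealized scaling listed above:

$$P_{\text{parallel}} = (nC)\left(\frac{V}{n}\right)^2 \frac{f}{n} = \frac{CV^2 f}{n^2}, \qquad P_{\text{pipeline}} = C\left(\frac{V}{n}\right)^2 f = \frac{CV^2 f}{n^2}$$

For n = 2 the ideal result is 0.25CV²f. The two-way examples given earlier (0.396CV²f and 0.432CV²f) are higher because they use 0.6V rather than V/2 and include extra capacitance (2.2C and 1.2C).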
Clock Distribution
[Figure: clock distribution network, with the clock input labeled]
Clock Power
$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$
where C_L = total load capacitance
λ = constant fanout at each stage in the distribution network
Clock consumes about 40% of total processor power.
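The clock-power sum is a finite geometric series, so it can be evaluated term by term or in closed form. The sketch below does both; the function names and the numeric inputs are hypothetical and are not taken from any processor in this lecture.

```python
def clock_power(c_load, vdd, freq, fanout, stages):
    """Sum C_L * VDD^2 * f / fanout**n for n = 0 .. stages - 1."""
    base = c_load * vdd ** 2 * freq
    return base * sum(1.0 / fanout ** n for n in range(stages))


def clock_power_closed_form(c_load, vdd, freq, fanout, stages):
    """Same quantity using the closed form of the geometric series."""
    base = c_load * vdd ** 2 * freq
    if fanout == 1:
        return base * stages
    r = 1.0 / fanout
    return base * (1.0 - r ** stages) / (1.0 - r)


# Hypothetical network: 1 nF load, 2 V supply, 500 MHz, fanout 4, 5 stages.
watts = clock_power(1e-9, 2.0, 500e6, 4, 5)
print(f"{watts:.3f} W")
```

For a large fanout the series converges quickly, so the final stage driving the load accounts for most of the clock power.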
Clock Network Examples
Alpha 21064 Alpha 21164 Alpha 21264
Technology 0.75μ CMOS 0.5μ CMOS 0.35μ CMOS
Frequency (MHz) 200 300 600
Total capacitance 12.5nF
Clock load 3.25nF 3.75nF
Clock power 20W
Max. clock skew 200ps (<10%) 90ps
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.
Power Reduction Example
• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
• Reduce voltage to 1.5V, power (5.3x) = 4.9W
• Eliminate FP, power (3x) = 1.6W
• Scale 0.75→0.35μ, power (2x) = 0.8W
• Reduce clock load, power (1.3x) = 0.6W
• Reduce frequency 200→160MHz, power (1.25x) = 0.5W
• J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
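The chain of reductions can be replayed by dividing the starting power by each factor in turn. The sketch below uses the factors exactly as listed; the variable and function names are hypothetical. It also checks that the voltage factor is consistent with the square-law dependence of dynamic power on supply voltage.

```python
# (step, reduction factor) pairs as listed on the slide.
STEPS = [
    ("Reduce voltage to 1.5V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 to 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 to 160MHz", 1.25),
]


def power_after_each_step(start_watts, steps):
    """Divide the running power by each factor and record the result."""
    watts = start_watts
    trail = []
    for name, factor in steps:
        watts /= factor
        trail.append((name, watts))
    return trail


for name, watts in power_after_each_step(26.0, STEPS):
    print(f"{name}: {watts:.1f} W")

# The voltage factor matches (3.45 / 1.5)^2, and the frequency factor 200 / 160.
print(round((3.45 / 1.5) ** 2, 1), 200 / 160)
```

Rounded to one decimal place, the running totals reproduce the 4.9W, 1.6W, 0.8W, 0.6W and 0.5W figures listed above.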