Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis Kim Hazelwood September 29, 2006


Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis

Kim Hazelwood

September 29, 2006


Modern Computing Challenges

• Performance

• Power

– Energy consumption, max instantaneous power, di/dt

• Temperature

– Total heat output, “hot spots”

• Reliability

– Neutron strikes, alpha particles, MTBF, design flaws

• Approaches: Circuit, microarchitecture, compiler

• Constraint: Fixed HW-SW interface (e.g., x86)


Typical Approaches

• Optimize using SW or HW techniques in isolation

• Performance

– SW: Compile-time optimizations

– HW: Architectural improvements, VLSI technology

• Reliability: Code/data duplication (HW or SW)

• Power & Temperature

– HW control mechanisms

– Profile + recompile cycle

[Figure: separate HW and SW boxes.]


Modern Design Constraints

Compilers – “Compile once, run anywhere”

– Cannot ship “MS Office for 1Q05 batch of Pentium-4 3GHz, > 1GB RAM, BrandX power supply, located in high altitudes…”

Microarchitecture – Limited window of application knowledge (past must predict the future)

VLSI – Guaranteed correctness, reliability

We currently must optimize for the common case (but must design for the worst case)


The Power of Virtualization

• A HW-SW interface layer

[Figure: a Binary Modifier layered between SW Applications and HW. Initially: x86 on both sides of the layer. Eventually: separate SWI and HWI.]


Dynamic Binary Modification

• Creates a modified code image at run time

[Figure: cycle through EXE, Transform, Code Cache, Execute, and Profile.]

Examples:

• Dynamo (HP)

• DAISY/BOA (IBM)

• CMS (Transmeta)

• Mojo (Microsoft)

• Strata (UVa)

• Pin (Intel)
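The transform, code cache, execute, profile cycle on this slide can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how Pin, Dynamo, or any other listed system is implemented: the `CodeCache` class, the list-of-strings block format, and the `add_marker` transform are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical block format: a list of instruction mnemonics.
Block = List[str]


class CodeCache:
    """Toy code cache: transform each block once, then reuse the copy."""

    def __init__(self, transform: Callable[[Block], Block]) -> None:
        self.transform = transform
        self.cache: Dict[int, Block] = {}       # start address -> modified block
        self.exec_counts: Dict[int, int] = {}   # profile: executions per block

    def fetch(self, addr: int, original: Dict[int, Block]) -> Block:
        # Transform on first use only; later fetches hit the cache.
        if addr not in self.cache:
            self.cache[addr] = self.transform(original[addr])
        # Profile every execution of the cached block.
        self.exec_counts[addr] = self.exec_counts.get(addr, 0) + 1
        return self.cache[addr]


def add_marker(block: Block) -> Block:
    """Example transform: prepend an instrumentation marker."""
    return ["<count>"] + block


# Hypothetical two-block program, with the first block executed twice.
program = {0x100: ["add", "load"], 0x200: ["store"]}
code_cache = CodeCache(add_marker)
for address in (0x100, 0x200, 0x100):
    code_cache.fetch(address, program)
```

The original program image is never changed; only the cached copy is modified, which is what lets the modified image be built at run time.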


Dynamic Instrumentation Demo

• Pin

– Four architectures – IA32, EM64T, IPF, XScale

– Four OSes – Linux, FreeBSD, MacOS, Windows

– http://rogue.colorado.edu/pin/


Dynamic Optimization Demo

• DynamoRIO

– Windows and Linux for IA32

– http://www.cag.lcs.mit.edu/dynamorio/


Dynamic Binary Modification

• Creates a modified code image at run time

• Always triggered by software events … until now



Tortola: Symbiotic Optimization

• Enable HW/SW Communication

[Figure: a Binary Modifier layered between SW Applications and HW.]


Simulation Methodology

• SimpleScalar 4.0 for x86

• Wattch 1.02 power extensions

• Pin dynamic instrumentation system (x86/Linux version)

[Figure: the three layers mapped to tools. SW Application: Benchmarks. Binary Modifier: Pin. HW: Wattch & SimpleScalar/x86.]


Tortola Applications

• Combine global program information with run-time feedback

– System-specific power usage

– Application-specific heat anomalies

– Workload/input specific performance optimization

• Reduce hardware complexity

– No more backwards compatibility warts

– Fix bugs after shipment

– Reduce time to market for new architectures

• One such application: The di/dt problem


The Di/dt Problem

• Voltage stability is important for reliability, performance

• Low-power techniques have a negative side effect: current variation

• Dips (undershoots) in supply voltage – can cause incorrect values to be calculated or stored

• Spikes (overshoots) in supply voltage – can cause reliability problems


The Di/dt Problem

• ITRS cites noise management as a Grand Challenge for the 5-10 year time frame

• Several trends are aggravating the issue:

– Voltage is scaling down with technology

– Current draw is increasing

– Package impedance is not scaling as quickly

– Aggressive clock gating causes large swings in processor current draw (di/dt)


Di/dt Solutions

Software: Compiler Optimizations

MicroArch: Sensor/Actuator Mechanisms

Circuit-Level: Decoupling capacitors; more Vdd/Gnd pins on package

Co-Designed MicroArch & SW Binary Modifier


Sensor-Actuator Mechanisms

• On-chip voltage sensors detect abnormally high/low voltage levels

• On-chip actuator then attempts to quickly raise/lower the processor’s current draw

– Phantom firing: increases current (at the expense of power)

– Resource throttling: reduces current (at the expense of performance)


Detecting Imminent Emergencies

[Figure: operating voltage range around a 1V supply, with levels marked at 1.05V, 1.03V, 1V, 0.97V, and 0.95V and regions labeled Control Threshold, Soft Emergency, and Hard Emergency.]
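The sensor/actuator idea and the voltage levels in the figure can be combined into a small decision rule. This is a hedged sketch, not the hardware's actual logic. It assumes that the 0.97V/1.03V levels mark the control threshold where a soft emergency begins and that 0.95V/1.05V bound the hard-emergency region; the figure residue does not confirm that mapping. The function names `classify` and `actuator_response` are hypothetical.

```python
NOMINAL_V = 1.00     # nominal supply voltage from the figure
SOFT_MARGIN = 0.03   # assumption: control threshold at 0.97V / 1.03V
HARD_MARGIN = 0.05   # assumption: hard emergency beyond 0.95V / 1.05V


def classify(voltage: float) -> str:
    """Classify a sensed supply voltage by its distance from nominal."""
    deviation = abs(voltage - NOMINAL_V)
    if deviation > HARD_MARGIN:
        return "hard emergency"
    if deviation > SOFT_MARGIN:
        return "soft emergency"
    return "normal"


def actuator_response(voltage: float) -> str:
    """Pick the actuator action described on the sensor/actuator slide."""
    if classify(voltage) == "normal":
        return "none"
    # Low voltage: reduce current draw (costs performance).
    # High voltage: increase current draw (costs power).
    if voltage < NOMINAL_V:
        return "resource throttling"
    return "phantom firing"
```

For example, a reading of 0.96V would be treated as a soft emergency answered with resource throttling, while 1.04V would trigger phantom firing.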


Targeting Mid-Frequency Di/dt

• Problematic: wide current spike

• Worst case: pulse at the resonant frequency

[Figure: two plots of Supply Voltage (V) and Processor Current (A) versus Time (Cycles), one labeled 20 cycles and one labeled 60 cycles, with Minimum Voltage and Maximum Voltage marked. From: Joseph et al., HPCA-9.]
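Why a pulse train at the resonant frequency is the worst case can be seen with a toy second-order model of the power supply. The sketch below is an illustration under assumptions, not the model used in the talk or in Joseph et al.: the 60-cycle resonant period, the damping ratio of 0.1, the unit current pulses, and the function name `peak_to_peak_voltage` are all assumed values and names, and the voltage is in arbitrary units.

```python
import math


def peak_to_peak_voltage(pulse_period: int,
                         resonant_period: int = 60,  # assumed, in cycles
                         damping: float = 0.1,       # assumed damping ratio
                         cycles: int = 3000) -> float:
    """Return the steady-state voltage swing for a square-wave current.

    Toy model: x'' = -2*damping*w0*x' - w0^2 * (x + current), where x is
    the supply-voltage deviation, so a constant current settles at
    x = -current (a steady droop).
    """
    w0 = 2.0 * math.pi / resonant_period
    x, dx = 0.0, 0.0
    lowest, highest = math.inf, -math.inf
    for n in range(cycles):
        # 50% duty cycle: high current for the first half of each period.
        current = 1.0 if (n % pulse_period) < pulse_period // 2 else 0.0
        accel = -2.0 * damping * w0 * dx - w0 * w0 * (x + current)
        dx += accel          # semi-implicit Euler step with dt = 1 cycle
        x += dx
        if n >= cycles // 2:  # measure only after the start-up transient
            lowest, highest = min(lowest, x), max(highest, x)
    return highest - lowest


at_resonance = peak_to_peak_voltage(pulse_period=60)
faster = peak_to_peak_voltage(pulse_period=20)
slower = peak_to_peak_voltage(pulse_period=120)
```

In this model the swing at the resonant period is larger than for pulses that repeat either faster or slower, because each new pulse arrives in phase with the ringing left by the previous one.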


A Di/dt Stressmark

BEGIN_LOOP:
    …
    ldt $f1, ($4)
    divt $f1, $f2, $f3
    divt $f3, $f2, $f3
    stt $f3, 8($4)
    ldq $7, 8($4)
    cmovne $31, $7, $3
    stq $3, ($4)
    stq $3, ($4)
    stq $3, ($4)
    …
    stq $3, ($4)
    …
    JUMP BEGIN_LOOP

[Annotations on the code: one section labeled Sequential / Low Current and one section labeled Parallel / High Current.]

But… the actuator engages on every loop iteration, degrading performance

Why not correct the problem in the code?


Proposed Solution

• Leverage our additional software layer to supplement existing solutions

• Microarchitecture provides feedback to our software-based virtual layer

[Figure: in SW, the Binary Modifier in the virtual layer (VL) turns the Executable into an Altered Executable; in HW, the Microprocessor includes Sensor+Actuator extensions.]


Required Investigations

• Characterizing emergencies

– How often do we see di/dt emergency loops?

• Communication between the microarchitecture and the virtual layer

– What information should be passed to virtual layer during an emergency?

• Fixing di/dt via binary modification

– Will existing techniques help?

– New algorithms?


Static vs. Dynamic Instances

Data suggests modifying a few code sequences will eliminate many voltage emergencies

[Figure: number of emergencies per benchmark, Distinct vs. Total, on a log scale from 1 to 1,000,000. Benchmarks: gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf, wupwise, swim, art, mgrid, applu, mesa, galgel, equake, facerec, sixtrack, ammp, apsi.]
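The distinct-versus-total comparison can be computed from a trace of emergency locations. The sketch below assumes a hypothetical trace format, one program counter per observed emergency, and uses made-up addresses and counts purely to show the calculation; none of the numbers come from the talk's data.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_emergencies(pcs: Iterable[int]) -> Tuple[int, int, float]:
    """Return (distinct, total, share of the single hottest location)."""
    counts = Counter(pcs)
    total = sum(counts.values())
    distinct = len(counts)
    top_share = counts.most_common(1)[0][1] / total if total else 0.0
    return distinct, total, top_share


# Made-up trace: three code locations responsible for 1000 emergencies.
trace = [0x4000] * 900 + [0x4A10] * 90 + [0x5B20] * 10
distinct, total, top_share = summarize_emergencies(trace)
```

When the total is much larger than the distinct count, as in this made-up trace, modifying only the hottest few code sequences removes most of the dynamic emergencies.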


Possible Compiler Optimizations

• Our goal is to

– Smooth out current profile, or

– Knock pulses off of the resonant frequency

• Some existing options

– Software pipelining, code motion, instruction padding

[Figure: the Binary Modifier applies optimizations to the Executable, producing an Altered Executable that runs on the Microprocessor with Sensor+Actuator extensions.]


Loop Unrolling & SW Pipelining

[Figure: current profiles over Iteration=1, 2, 3. Problematic loop: AABB repeated every iteration. Software pipelining smooths the profile. Unrolled loop: AAAABBBB; loop unrolling disrupts the resonance pulse.]
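The effect of unrolling on the pulse period can be checked with a toy string model of the loop body, where each letter stands for one phase of roughly constant current. This is only a sketch under assumptions: the helper names `unroll_and_regroup` and `period` are hypothetical, the regrouping ignores data dependences that a real binary modifier would have to respect, and it assumes phase A is the high-current phase.

```python
def unroll_and_regroup(body: str, factor: int) -> str:
    """Unroll the body `factor` times, then group like phases together.

    Sorting the phase letters is a stand-in for instruction scheduling;
    a real transformation must preserve data dependences.
    """
    unrolled = body * factor
    return "".join(sorted(unrolled))


def period(waveform: str) -> int:
    """Smallest p such that the waveform repeats every p slots."""
    n = len(waveform)
    for p in range(1, n + 1):
        if all(waveform[i] == waveform[i % p] for i in range(n)):
            return p
    return n


original_body = "AABB"
unrolled_body = unroll_and_regroup(original_body, 2)

original_period = period(original_body * 4)   # four iterations
unrolled_period = period(unrolled_body * 2)   # same work, two iterations
```

Here the unrolled body is AAAABBBB and the current pulse repeats every 8 slots instead of every 4, so its fundamental frequency is halved. If the original loop happened to pulse at the resonant frequency, the unrolled version no longer does.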


Unrolling the Di/dt Stressmark

[Figure: supply voltage, on an axis from 0.97V to 1.02V, Before Loop Unrolling (phases H and L) and After Loop Unrolling (phases H1, H2, L1, L2).]


Summary

• Symbiotic program optimization is a powerful approach

• The di/dt problem – well suited for a symbiotic solution

• The Tortola design can also target power reduction, temperature reduction, reliability, etc.

http://www.tortolaproject.com/