Reliability-aware Thermal Management for Hard Real-time Applications on Multi-core Processors

Vinay Hanumaiah¹ and Sarma Vrudhula²
¹Electrical Engineering, Arizona State University
²Computer Science Engineering, Arizona State University



DTM and Reliability

• High temperature greatly degrades reliability, through:
  • high peak temperatures
  • a large number of thermal cycles
• Each 10–15 °C increase in temperature reduces reliability by half
• Multi-cores have large temporal and spatial thermal variations
  • higher thermal gradients cause greater reliability degradation
  • DTM must be invoked more often
• DTM (dynamic thermal management) allows complex objectives and fine-grained control
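The halving rule above can be turned into a quick scaling factor. This is a minimal sketch, assuming reliability is treated as a single lifetime-like factor that halves every `halving_step_c` degrees; the default of 10 °C is the more pessimistic end of the 10–15 °C range, and the function name is hypothetical.

```python
def relative_reliability(delta_t_c: float, halving_step_c: float = 10.0) -> float:
    """Reliability factor relative to a reference temperature.

    Rule-of-thumb assumption: reliability halves for every
    `halving_step_c` degrees Celsius of temperature increase.
    """
    if halving_step_c <= 0:
        raise ValueError("halving_step_c must be positive")
    return 2.0 ** (-delta_t_c / halving_step_c)


if __name__ == "__main__":
    # A hotspot running 20 deg C above the reference, at both ends of the range.
    for step in (10.0, 15.0):
        print(f"step={step} deg C -> factor {relative_reliability(20.0, step):.3f}")
```

A 20 °C hotspot therefore keeps about a quarter of the reference reliability under the 10 °C rule and roughly 0.40 under the 15 °C rule.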


Related Work


• Effects of temperature on reliability
  • Coskun: SIGMETRICS’07
  • Lu: IEEE Micro’05
• Minimizing peak temperature under deadline constraints
  • Chantem: DATE’08 (many-core, task allocation)
  • Jayaseelan: ICCAD’08 (single-core, task sequence)
• Maximizing throughput
  • Wang: ECRTS’06 (thermal, timing, single-core)
  • Murali: CODES’07 (thermal, no deadlines, many-core)


What is our Contribution?


Determine the optimal speed profile for a many-core processor that
• minimizes peak temperature
• satisfies task deadlines
  • while considering start times
• includes the dependence of leakage power on temperature


Power and Thermal Model


[Figure: full HotSpot model vs. simplified thermal model]

Simplified thermal model:
• ignores lateral resistances
• ignores die capacitances
• lumped package
• < 6% loss in accuracy
• required to make the analysis tractable analytically
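A minimal sketch of how such a simplified model can be simulated, under assumptions read from the bullets above: each core i connects to one lumped package node through a vertical resistance R_i (no lateral resistances), die capacitances are dropped so a core sits R_i·P_i above the package at every instant, and the package node has capacitance C_pkg and resistance R_pkg to ambient. The function and parameter names are hypothetical, and any numbers used with it are illustrative, not taken from the experiments.

```python
from typing import List, Sequence


def simulate_core_temperatures(
    powers: Sequence[Sequence[float]],
    r_core: Sequence[float],
    r_pkg: float,
    c_pkg: float,
    t_amb: float,
    t_pkg0: float,
    dt: float,
) -> List[List[float]]:
    """Core temperatures at the start of each step (hypothetical sketch).

    powers[k][i] is the power (W) of core i during step k.
    Assumed model:
      T_i(t)            = T_pkg(t) + R_i * P_i(t)        (no die capacitance)
      C_pkg * dT_pkg/dt = sum_i P_i(t) - (T_pkg - T_amb) / R_pkg
    The package equation is integrated with forward Euler, so keep dt
    well below the package time constant R_pkg * C_pkg.
    """
    t_pkg = t_pkg0
    history: List[List[float]] = []
    for step_powers in powers:
        # Cores have no thermal mass here: each sits R_i * P_i above the package.
        history.append([t_pkg + r * p for r, p in zip(r_core, step_powers)])
        # Single lumped package node heated by the total core power.
        heat_in = sum(step_powers)
        heat_out = (t_pkg - t_amb) / r_pkg
        t_pkg += dt * (heat_in - heat_out) / c_pkg
    return history
```

If `t_pkg0` equals T_amb + R_pkg·ΣP_i, the package starts at steady state and every row of the result is identical; starting from ambient instead shows the package warming up step by step.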


Problem Formulation


Objective: Find the core speed profiles that minimize peak temperature

Given: n tasks (instruction lengths, power profiles), n cores, RC thermal model

Constraints: start times and deadlines

Assumptions:
• independent, non-identical threads
• one thread per core
• simplified thermal model
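The formulation above can be written compactly as follows. This is a sketch with assumed notation: task i runs on core i with start time a_i, deadline d_i and W_i instructions; s_i(t) is its speed in instructions per second, bounded by s_max; P_i(s_i, T_i) is the core power, whose leakage part depends on temperature; and the thermal constraints follow the simplified model (vertical resistance R_i to a lumped package node with C_pkg and R_pkg to ambient T_amb). The exact formulation used by the authors may differ.

```latex
\begin{aligned}
\min_{s_1(\cdot),\dots,s_n(\cdot)} \quad & \max_{1 \le i \le n} \; \max_{t} \; T_i(t) \\
\text{s.t.} \quad & \int_{a_i}^{d_i} s_i(t)\,dt = W_i, \qquad s_i(t) = 0 \ \text{for } t \notin [a_i, d_i], \\
& C_{\mathrm{pkg}}\,\frac{dT_{\mathrm{pkg}}(t)}{dt} = \sum_{i=1}^{n} P_i\big(s_i(t), T_i(t)\big) - \frac{T_{\mathrm{pkg}}(t) - T_{\mathrm{amb}}}{R_{\mathrm{pkg}}}, \\
& T_i(t) = T_{\mathrm{pkg}}(t) + R_i\,P_i\big(s_i(t), T_i(t)\big), \\
& 0 \le s_i(t) \le s_{\max}, \qquad i = 1, \dots, n.
\end{aligned}
```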


Solution Outline

• Step 1 – Find the parametric optimal speed profile [Hanumaiah:DATE’09]
  • fixed maximum temperature
  • no deadlines
• Step 2 – Find the parameters of Step 1 for every slot
  • to satisfy task deadlines for a given initial package temperature


Solution Outline - contd


• Step 3 – For every slot
  • find the initial package temperature that satisfies start times
  • also determine the global minimum peak temperature


Step 1: Fixed max. temp., no deadlines


Step 2: Fixed max. temp., with deadlines


Need for Step 2


Step 2: Fixed max. temp., with deadlines

• Find the total power P_T corresponding to T_pkg
• Find the optimal speed profile for the critical task
• Determine T_pkg over the slot
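Under a lumped-package assumption (package capacitance C_pkg, resistance R_pkg to ambient T_amb), the package equation can be inverted: a chosen T_pkg trajectory fixes the total power P_T = C_pkg·dT_pkg/dt + (T_pkg − T_amb)/R_pkg. The sketch below approximates the derivative with a forward difference over each interval; the model form and all names are assumptions for illustration.

```python
from typing import List, Sequence


def total_power_profile(
    t_pkg: Sequence[float],
    dt: float,
    r_pkg: float,
    c_pkg: float,
    t_amb: float,
) -> List[float]:
    """Total power P_T needed in each interval to follow a T_pkg trajectory.

    t_pkg[k] is the package temperature at the start of interval k.
    Assumes the lumped package model
        C_pkg * dT_pkg/dt = P_T - (T_pkg - T_amb) / R_pkg
    with dT_pkg/dt approximated by a forward difference.
    """
    profile: List[float] = []
    for t_now, t_next in zip(t_pkg, t_pkg[1:]):
        stored = c_pkg * (t_next - t_now) / dt    # power that heats the package
        to_ambient = (t_now - t_amb) / r_pkg      # power flowing out to ambient
        profile.append(stored + to_ambient)
    return profile
```

A flat trajectory only needs the power that flows to ambient, while a rising one also pays for the heat stored in the package.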


Step 2: Power allocation scheme

• Let t_sched = unit scheduling interval
• Determine approx. dT_pkg(t_sched)/dt
• Find the corresponding P_T(t_sched)
• P_T(t_sched) ← P_T(t_sched) − P_critical(t_sched)
• Sort tasks by nearest deadline
• Allocate the max. power P_max,i(t_sched) to the task with the earliest deadline
• P_T(t_sched) ← P_T(t_sched) − P_max,i(t_sched)
• Continue until P_T(t_sched) = 0
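The allocation steps above amount to a greedy, earliest-deadline-first split of the remaining power budget within one interval. A minimal sketch, assuming P_T(t_sched) and the critical task's power are already known and each task's maximum power P_max,i(t_sched) is given; the function name and tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple


def allocate_power(
    p_total: float,
    p_critical: float,
    tasks: List[Tuple[str, float, float]],
) -> Dict[str, float]:
    """Greedy power split for one scheduling interval t_sched.

    p_total    -- total power budget P_T(t_sched) for the interval
    p_critical -- power already reserved for the critical task
    tasks      -- (name, deadline, p_max) for the remaining tasks, where
                  p_max is the task's maximum power P_max,i(t_sched)
    """
    budget = p_total - p_critical  # P_T <- P_T - P_critical
    allocation: Dict[str, float] = {}
    # Serve tasks in order of nearest deadline.
    for name, _deadline, p_max in sorted(tasks, key=lambda task: task[1]):
        # Give the task its maximum power, or whatever is left (never negative).
        share = min(p_max, max(budget, 0.0))
        allocation[name] = share
        budget -= share  # P_T <- P_T - P_max,i
    return allocation
```

Tasks reached after the budget hits zero receive 0 W, which mirrors "continue until P_T(t_sched) = 0".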


Step 3: Satisfy Start Times

• The number of instructions completed in each slot is monotonic
  • in the initial package temperature of the slot
  • in the maximum temperature
• Hence the problem can be solved optimally as a quasiconcave (monotonic) optimization
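Because completed instructions grow monotonically with the maximum temperature, the smallest maximum temperature that still completes a required amount of work can be bracketed by simple bisection. The sketch below only illustrates that monotone structure; it is a stand-in, not the quasiconcave optimization procedure used by the authors, and `completed_work` is a hypothetical callable (for example, a model returning the instructions completed in a slot for a given maximum temperature).

```python
from typing import Callable, Optional


def min_feasible_temperature(
    completed_work: Callable[[float], float],
    required: float,
    t_lo: float,
    t_hi: float,
    tol: float = 1e-3,
) -> Optional[float]:
    """Smallest T_max in [t_lo, t_hi] (within tol) with completed_work(T_max) >= required.

    Relies on completed_work being nondecreasing in T_max.
    """
    if completed_work(t_hi) < required:
        return None  # infeasible even at the highest allowed temperature
    if completed_work(t_lo) >= required:
        return t_lo
    # Invariant: t_lo is infeasible, t_hi is feasible.
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if completed_work(mid) >= required:
            t_hi = mid
        else:
            t_lo = mid
    return t_hi
```

Returning the feasible end of the bracket keeps the answer on the safe side: the reported temperature always completes the required work.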


Experimental Setup

• Multi-core version of the Alpha 21264

• HotSpot – thermal model, PTScalar – power model

• SPEC benchmarks

• Dynamic power – 230 W, leakage power – 60 W

• Scheduling interval – 10 ms


Trade-off: Peak Temperature vs Deadlines

[Figure panels: relaxed deadlines; tight deadlines]


Optimal Policy vs Min. Makespan Policy

[Figure panels: optimal policy (relaxed deadlines); min. makespan policy]


Discretization of Optimal Policy

[Figure panels: continuous version; discrete version (8 speeds)]
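The slide compares a continuous speed profile with a discrete one using 8 speeds. One common way to map a continuous speed onto a finite set of levels, offered here as an assumption rather than as the authors' discretization, is to run at the two neighboring levels for complementary fractions of the slot so that the slot completes the same number of instructions. `split_speed` and the example levels are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def split_speed(
    target: float, levels: Sequence[float], interval: float
) -> List[Tuple[float, float]]:
    """Approximate a continuous speed with the two neighboring discrete levels.

    Returns (speed, time) pairs whose total time is `interval` and whose
    speed-weighted time equals target * interval (same work in the slot).
    """
    ordered = sorted(levels)
    if not ordered or not ordered[0] <= target <= ordered[-1]:
        raise ValueError("target speed outside the available levels")
    k = bisect.bisect_left(ordered, target)
    if ordered[k] == target:
        return [(target, interval)]  # an exact level is available
    low, high = ordered[k - 1], ordered[k]
    frac_high = (target - low) / (high - low)  # fraction of time at the higher level
    return [(low, interval * (1.0 - frac_high)), (high, interval * frac_high)]
```

For example, a target of 1.8 with levels 1.0, 1.5, 2.0, 2.5 and 3.0 runs 40% of the slot at 1.5 and 60% at 2.0, which preserves the work; the temperature and power of the split schedule can still differ slightly from the continuous one.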


Summary

• Proposed a reliability-aware transient speed policy that
  • minimizes peak temperature
  • satisfies task deadlines and start times
  • includes accurate power and thermal models
  • gives the optimal trade-off of peak temperature with deadlines
• Incorporated into the Magma simulator
  • fast, accurate thermal-aware architectural simulator
  • available as open source at http://vrudhula.lab.asu.edu/magma/
