PRACTICAL DYNAMIC THERMAL MANAGEMENT ON INTEL DESKTOP COMPUTER

Guanglei Liu
Department of Electrical and Computer Engineering
Florida International University
July 12, 2012

Major Professor: Dr. Gang Quan

Thermal Design Challenges

Figure from Intel Microprocessor Technology Lab, 2011

Number of transistors keeps increasing

• Nearly 40 billion transistors are integrated into a single die [Mizunuma, ICCAD 2009]

More complex architectures are being built

• An 80-core single-chip processor has been demonstrated by Intel [Vangal, ISSCC 2007]

Environmental concerns

• In the U.S., 46% of electricity is generated from fossil fuels.

Electricity bill

• U.S. data centers: 120 billion kilowatt-hours in 2012
• 9 billion dollars, 15% of all energy in the U.S.

High transistor density increases power density

High power density raises on-chip temperatures and causes thermal issues

Source: Environmental Protection Agency (EPA) Report

Thermal Issues

Increase package/cooling costs

• $1-3 per watt [Skadron, ISCA 2003]
• In data centers, each watt spent on computing requires ½-1 watt for cooling [Brill, 2007]

Affect reliability

• Up to a 50% reduction in device lifespan for every 10°C increase [Yeo, DAC 2008]

Degrade performance

• 10-15% more circuit delay for each 15°C increase [Santarini, EDN 2005]

Crash the computing system

• The processor's self-protection mechanism automatically shuts it down to avoid physical damage [Rohou, WFDO 1999]

Increase leakage power consumption

• Raising the temperature from 65°C to 110°C can increase the leakage power of ICs by 38% [Santarini, EDN 2005]
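The reliability bullet is a compounding rule of thumb. The Python sketch below shows how quickly it adds up, assuming the worst case of the quoted rule (a full 50% lifespan loss per 10°C) and assuming the loss compounds multiplicatively; the function name is hypothetical.

```python
def worst_case_lifespan_factor(delta_t_c: float, halving_step_c: float = 10.0) -> float:
    """Fraction of the original device lifespan left after a temperature rise of
    delta_t_c degrees C, assuming every halving_step_c degrees halves the lifespan
    (the worst case of the 50%-per-10°C rule, applied multiplicatively)."""
    if delta_t_c < 0:
        raise ValueError("only temperature increases are modeled")
    return 0.5 ** (delta_t_c / halving_step_c)
```

Under these assumptions a 30°C rise leaves 0.5³ = 12.5% of the original lifespan.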

Computing system cooling solutions

Mechanical Cooling Solution

Air-cooling (e.g. fan + heat sink)

• Cooling accounts for 51% of the overall server power budget [Lefurgy, COM 2003]

• Noise level increases by 10 dB when fan speed increases by 50% [Lyon, STMMS 2004]

Liquid-cooling

• High-density liquids can absorb 3,500 times more heat than air [Chu, DMR 2004]

High cooling cost

Dynamic Thermal Management (DTM)

• Dynamic voltage and frequency scaling (DVFS) technique [Kim, HPCA 2008]

• Task migration [Lim, QED 2002]

• Clock gating [Gunther, ITJ 2001]

• Fetch toggling [Brooks, HPCA 2001]

Sacrifice system performance
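Why DVFS trades performance for temperature can be seen from the standard CMOS dynamic-power relation, which the slide does not state. Here α is the switching activity factor, C the switched capacitance, V_dd the supply voltage, and f the clock frequency; the cubic scaling assumes the supply voltage is lowered in proportion to frequency, which real processors only approximate:

```latex
P_{\text{dyn}} = \alpha \, C \, V_{dd}^{2} \, f,
\qquad V_{dd} \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3}
```

Under this idealization, halving f cuts dynamic power to about one eighth, but it also roughly halves the instruction throughput of a CPU-bound task, which is the performance sacrifice noted for DTM.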

Related Theoretical Work

Our research goal: to develop a practical hardware platform that enables us to investigate the limitations of existing theoretical work, and to develop practical and effective DTM techniques that address those limitations

These theoretical works are derived from simplified mathematical thermal models and idealized assumptions

Thermal-aware throughput maximization

[Chantem et al., ISLPED 2009][Zhang et al., ICCAD 2007][Chatha et al., DAC 2010]

Peak temperature minimization

[Chaturvedi et al., ASPDAC 2011][Liu et al., RTAS 2010][Qiu et al., ICESS 2010]

Overall energy reduction under peak temperature constraints

[Bao et al., DATE 2010][Andrei et al., DAC 2009][Huang et al., DATE 2011]

Real-time guarantee under peak temperature constraint

[Chaturvedi et al., CIT 2010][Wang et al., RTS 2006][Huang et al., RTSS 2009]

Major contributions

• Practical hardware platform [SouthEast 2011]: Intel i5 quad core; Linux operating system

• Thermal management validation [SUSCOM 2012]: DTM techniques vs. air cooling; DTM vs. DPM algorithms; validation of fundamental DTM principles

• Reactive DTM, single-core [GreenCom 2012]: limitations of theoretical works; non-constant sampling period; thermal profiling analysis

• Proactive DTM algorithm, multi-core [DATE 2012] [ASP 2012]: neighbor-aware temperature prediction; algorithm for multi-core platforms with task migration

Practical Hardware Platform

• Dell Precision T1500 workstation with an Intel i5 quad-core processor, running Linux kernel version 2.6.23

• CoreTemp driver: reads the on-chip thermal sensors (temperature capturing)

• lm-sensors tool: monitors system information (temperature, fan speed, and voltage values)

• cpufreq module: DVFS technique with 12 different speed levels

• fancontrol shell script: manually adjusts the fan speed

• CPU_affinity module: task migration, moving processes between cores

• SPEC CPU2000 benchmark: integer and floating-point workloads

• Fluke current clamp and multimeter: power measurement of cooling and CPU power consumption
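The monitoring and actuation paths of the platform map onto standard Linux interfaces. The Python sketch below is an assumption-laden illustration, not the original tooling: it assumes the mainline sysfs layout (hwmon `temp*_input` files in millidegrees Celsius, and cpufreq's `scaling_available_frequencies`, `scaling_governor` and `scaling_setspeed` files) plus Python's `os.sched_setaffinity`. On older kernels such as 2.6.23 the coretemp files may live under the platform device instead (for example `/sys/devices/platform/coretemp.0/`), `scaling_setspeed` only takes effect under the `userspace` governor, writes need root, and all helper names are hypothetical.

```python
import glob
import os
from typing import Dict, List

# Assumed mainline sysfs roots; older kernels may expose coretemp elsewhere.
HWMON_ROOT = "/sys/class/hwmon"
CPU_ROOT = "/sys/devices/system/cpu"


def read_core_temps(hwmon_root: str = HWMON_ROOT) -> Dict[str, float]:
    """Read every hwmon temperature input, keyed by its label (deg C)."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            millideg = int(f.read().strip())  # sysfs reports millidegrees Celsius
        label_path = path[: -len("_input")] + "_label"
        if os.path.exists(label_path):
            with open(label_path) as f:
                label = f.read().strip()
        else:
            label = os.path.basename(path)  # no label file: fall back to the file name
        temps[label] = millideg / 1000.0
    return temps


def available_speeds(cpu: int, cpu_root: str = CPU_ROOT) -> List[int]:
    """Speed levels (kHz) offered by the cpufreq driver, highest first."""
    path = os.path.join(cpu_root, f"cpu{cpu}", "cpufreq", "scaling_available_frequencies")
    with open(path) as f:
        return sorted((int(x) for x in f.read().split()), reverse=True)


def set_speed(cpu: int, khz: int, cpu_root: str = CPU_ROOT) -> None:
    """DVFS: pin a core to one speed via the userspace governor (needs root)."""
    base = os.path.join(cpu_root, f"cpu{cpu}", "cpufreq")
    with open(os.path.join(base, "scaling_governor"), "w") as f:
        f.write("userspace")
    with open(os.path.join(base, "scaling_setspeed"), "w") as f:
        f.write(str(khz))


def migrate_task(pid: int, core: int) -> None:
    """Task migration: restrict process `pid` to a single core (Linux only)."""
    os.sched_setaffinity(pid, {core})
```

On a real machine the helpers would be called with their default roots; pointing the `*_root` arguments at a scratch directory exercises the parsing without root access.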

Our Approach

Enhanced reactive DTM (ERDTM)

• Build a temperature vs. speed lookup table: run the benchmarks at different speed levels and collect the corresponding peak temperatures

• Offline thermal profiling analysis
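The offline profiling step can be sketched as a nested table of peak temperatures indexed by benchmark and speed level. In this hypothetical Python sketch, `run_at` stands in for running a benchmark at a fixed speed and returning its observed peak temperature; it is not a real tool. The selection rule (the highest speed whose profiled peak stays at or below the threshold) is an assumption about how such a table would be consulted, not the documented ERDTM policy.

```python
from typing import Callable, Dict, Iterable, Optional

LookupTable = Dict[str, Dict[int, float]]  # benchmark -> {speed level: peak temp (deg C)}


def build_lookup_table(
    benchmarks: Iterable[str],
    speeds: Iterable[int],
    run_at: Callable[[str, int], float],
) -> LookupTable:
    """Offline profiling: record the peak temperature of each benchmark at each speed."""
    speeds = list(speeds)  # allow any iterable to be reused for every benchmark
    return {b: {s: run_at(b, s) for s in speeds} for b in benchmarks}


def max_safe_speed(table: LookupTable, benchmark: str, threshold: float) -> Optional[int]:
    """Highest profiled speed whose peak temperature stays at or below the threshold."""
    safe = [s for s, peak in table[benchmark].items() if peak <= threshold]
    return max(safe) if safe else None  # None: no profiled speed is safe
```

Returning `None` when no level is safe leaves the fallback (for example, the lowest speed) to the caller rather than hiding it in the lookup.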

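Buffer zone and safe region

• Safe region: T < T_safe
• Buffer zone: T_safe ≤ T ≤ T_THRESHOLD

where T_safe = T_THRESHOLD − ΔT_max, and ΔT_max is the maximum possible temperature increment (4°C).

[Figure: temperature vs. time, showing the safe region below T_safe and the buffer zone between T_safe and T_THRESHOLD]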
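The two regions amount to a simple classification of each temperature sample. This Python sketch assumes the buffer zone spans the maximum possible temperature increment directly below the threshold, so T_safe = T_THRESHOLD − ΔT_max. It uses the 55°C threshold and the 4°C increment from the experiments, and the function and label names are hypothetical.

```python
def safe_limit(t_threshold: float, delta_max: float) -> float:
    """T_safe: the threshold minus the maximum possible temperature increment."""
    return t_threshold - delta_max


def classify(temp: float, t_threshold: float, delta_max: float) -> str:
    """Label a temperature sample as 'safe', 'buffer' or 'violation'."""
    if temp > t_threshold:
        return "violation"
    if temp >= safe_limit(t_threshold, delta_max):
        return "buffer"
    return "safe"
```

With these values T_safe is 51°C, so a 52.5°C sample falls in the buffer zone. In that zone the next increment could reach the threshold, so a reactive policy presumably has to act there rather than wait for a violation.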

Experimental results

Experiment setup

• Four identical tasks are assigned to the four cores to emulate a single-core environment
• The temperature threshold is 55°C
• The frequency lookup table is constructed offline

Number of violations

• FSDTM algorithm: 87
• VS-DTM algorithm: 12
• ERDTM algorithm: 0

[Figure: DTM algorithm performance evaluation. Throughput of FSDTM, VS-DTM, and ERDTM on the SPEC CPU2000 benchmarks galgel, ammp, lucas, equake, vpr, gcc, parser, and crafty]

ERDTM average throughput improvement is 8.1%

Neighbor-aware temperature prediction

Our neighbor-aware prediction combines two factors, each weighted by a coefficient obtained offline from training data:

• Individual increment factor: the processor's own temperature increment
• Neighbor increment factor: heat transferred from neighboring processors

Training process

• Run the tasks and record temperature information
• Apply least-squares estimation to obtain the weights
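The slide's prediction formula did not survive extraction, so the sketch below assumes the simplest form consistent with its description: a linear combination ΔT ≈ w_ind·x_ind + w_nb·x_nb of the individual and neighbor increment factors, with both weights fitted offline by least squares over recorded training samples. Adding the predicted increment to the current temperature is also an assumption. Names are hypothetical; the sketch solves the 2×2 normal equations directly to stay dependency-free.

```python
from typing import Iterable, Tuple

Sample = Tuple[float, float, float]  # (individual factor, neighbor factor, measured increment)


def fit_weights(samples: Iterable[Sample]) -> Tuple[float, float]:
    """Least-squares weights for dT ~ w_ind * x_ind + w_nb * x_nb (no intercept)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x_ind, x_nb, dt in samples:
        s11 += x_ind * x_ind
        s12 += x_ind * x_nb
        s22 += x_nb * x_nb
        b1 += x_ind * dt
        b2 += x_nb * dt
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("training data cannot separate the two weights")
    # Solve the 2x2 normal equations [s11 s12; s12 s22] w = [b1; b2] by Cramer's rule.
    w_ind = (b1 * s22 - s12 * b2) / det
    w_nb = (s11 * b2 - s12 * b1) / det
    return w_ind, w_nb


def predict_next(temp_now: float, weights: Tuple[float, float], x_ind: float, x_nb: float) -> float:
    """Predicted temperature = current temperature + weighted increments."""
    w_ind, w_nb = weights
    return temp_now + w_ind * x_ind + w_nb * x_nb
```

Fitting offline keeps the on-line cost of a prediction to two multiplications per core, which matters when the prediction runs at every sampling point.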

Neighbor-aware Task Migration

Conventional approach: always migrate the task from the hottest core to the coolest core.

NADTM Algorithm

• Predict a thermal emergency
• Migrate the task
• Apply the DVFS technique

• Heat factor: evaluates the processor's hotness
• Increasing factor: evaluates the temperature increment

Our migration strategy: choose the migration candidate with the minimum
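The NADTM flow (predict an emergency, try migration, fall back to DVFS) can be sketched as a single decision step. The slide's exact migration metric is not recoverable, so this Python sketch assumes a hypothetical score equal to the sum of the heat factor and the increasing factor. `Core`, its fields, and both functions are illustrative names, and `conventional_target` shows the hottest-to-coolest rule for comparison.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Core:
    core_id: int
    temp: float               # current sensor reading (deg C)
    predicted: float          # predicted temperature at the next sample (deg C)
    heat_factor: float        # hypothetical measure of how hot the core is
    increasing_factor: float  # hypothetical measure of its expected temperature increment


def conventional_target(cores: List[Core]) -> Tuple[int, int]:
    """Conventional rule: move the task from the hottest core to the coolest core."""
    hottest = max(cores, key=lambda c: c.temp)
    coolest = min(cores, key=lambda c: c.temp)
    return hottest.core_id, coolest.core_id


def nadtm_step(cores: List[Core], threshold: float) -> tuple:
    """One decision: ('none',), ('migrate', src, dst) or ('dvfs', core_id)."""
    at_risk = [c for c in cores if c.predicted > threshold]
    if not at_risk:
        return ("none",)  # no thermal emergency predicted
    src = max(at_risk, key=lambda c: c.predicted)
    candidates = [c for c in cores if c.predicted <= threshold]
    if candidates:
        # Assumed score: pick the candidate with the smallest combined factor.
        dst = min(candidates, key=lambda c: c.heat_factor + c.increasing_factor)
        return ("migrate", src.core_id, dst.core_id)
    return ("dvfs", src.core_id)  # nowhere cool enough: slow the hot core down
```

Acting on predicted rather than current temperatures is what makes the step proactive, and a candidate that is currently coolest can still lose to one with a smaller expected increment.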

Performance analysis

Single-task and multiple-task cases

• The NADTM algorithm effectively keeps the temperature under the threshold, with a small temperature oscillation of 1°C

• An average overall throughput improvement of 3.6%

• An average overall throughput improvement of 5.8%

Thank You for Your Attention !

Journals

1. Guanglei Liu, M. Fan, G. Quan, M. Qiu, “On-Line Predictive Thermal Management under Peak Temperature Constraints for Practical Multi-core Platforms”, Journal of Low Power Electronics (ASP) (under review), 2012.

2. Guanglei Liu, G. Quan, M. Qiu, “Practical Dynamic Thermal Management on An Intel Desktop Computer”, Embedded Software Design, Journal of Sustainable Computing (SUSCOM) (under review), 2012.

3. H. Huang, V. Chaturvedi, Guanglei Liu, G. Quan, “Leakage Aware Scheduling On Maximum Temperature Minimization For Periodic Hard Real-Time Systems”, Journal of Low Power Electronics (ASP), 2012.

Peer Reviewed Conferences

1. Guanglei Liu, M. Fan, G. Quan, “Neighbor-Aware Dynamic Thermal Management for Multi-core Platform”, The 15th Design, Automation, and Test in Europe (DATE 2012), Dresden, Germany, March 12-16, 2012.

2. Guanglei Liu, G. Quan, M. Qiu, “The Practical On-line Scheduling for Throughput Maximization on Intel Desktop Platform under the Maximum Temperature Constraint”, The 2011 IEEE/ACM Green Computing and Communications (GreenCom 2011), Sichuan, China, August 4-5, 2011.

3. Guanglei Liu, G. Quan, “Thermal Aware Scheduling on an Intel Desktop Computer”, IEEE SouthEast Conference (SouthEast 2011), Nashville, Tennessee, March 17-20, 2011.

4. Guanglei Liu, J. Fan, “Framework for Statistical Analysis of Homogeneous Multi-core Power Grid Networks”, IEEE 8th International Conference on ASIC (ASICON 2009), Changsha, China, October 20-23, 2009.

5. C. Liu, J. Tan, R. Chen, Guanglei Liu, J. Fan, “Thermal Aware Clocktree Optimization in Nanometer VLSI Systems Considering Temperature Variations”, IEEE 40th Southeastern Symposium on System Theory (SSST 2008), New Orleans, LA, March 17-18, 2008.