Thermal Aware Data Management in Cloud based Data Centers Ling Liu College of Computing Georgia Institute of Technology SEEDM workshop, May 2-3, 2011

Thermal Aware Data Management in Cloud based Data Centers

Ling Liu

College of Computing

Georgia Institute of Technology

NSF SEEDM workshop, May 2-3, 2011

Thermal aware Computing Era

• Power density increases
– Circuit density increases by a factor of 3 every 2 years
– Energy efficiency increases by a factor of 2 every 2 years
– Effective power density increases by a factor of 1.5 every 2 years

[Kenneth Brill: The Invisible Crisis in the Data Center]
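The three growth factors quoted on this slide appear to be related by a simple ratio: a 3x rise in circuit density, offset by a 2x gain in energy efficiency, leaves a 1.5x rise in effective power density per two-year period. The sketch below checks that arithmetic and compounds it over a hypothetical ten-year horizon. The ratio relationship, the horizon, and the assumption that the factors stay constant are illustrative and not stated in the slides.

```python
# Growth factors per two-year period, as quoted on the slide.
circuit_density_factor = 3.0
energy_efficiency_factor = 2.0

# Assumption: effective power density grows as (density gain) / (efficiency gain).
power_density_factor = circuit_density_factor / energy_efficiency_factor


def compounded_growth(factor_per_period, years, period_years=2):
    """Compound a per-period growth factor over a number of years."""
    return factor_per_period ** (years / period_years)


# Hypothetical ten-year horizon, i.e. five two-year periods.
ten_year_growth = compounded_growth(power_density_factor, years=10)
print(power_density_factor, ten_year_growth)
```

Under these assumptions, power density would grow by roughly a factor of 7.6 in a decade, which is the kind of trend that motivates thermal awareness.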

• Maintenance/TCO rising
– Data center TCO doubles every three years
– Three-year cost of electricity exceeds the purchase cost of the server
– Virtualization/consolidation is a one-time, short-term solution

[Uptime Institute]

• Thermal management accounts for an increasing portion of expenses
– Thermal-aware computing and management solutions are becoming prominent
– Increasing need for thermal awareness

[VarsamopoulosGupta 2008]

Thermal aware Task Scheduling in Data Centers

• Given a total task C, how should it be divided among N server nodes so that the computation finishes with minimal cooling energy cost?

• Self-interference and cross-interference raise the temperature of the inlet air and should be minimized

• Environment interference (room temperature) is not critical

• Task scheduling in the spatial domain

[VarsamopoulosGupta 2008]
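One way to picture the spatial scheduling question is a greedy heuristic over an assumed linear interference model. The sketch below is not the method of the cited work. It assumes a hypothetical matrix `D` in which `D[i][j]` is the inlet temperature rise at node i per unit of work placed on node j (diagonal entries model self-interference, off-diagonal entries cross-interference), and it uses the peak inlet rise as a stand-in for cooling cost. The names `D`, `capacity`, and `greedy_assign` are all illustrative.

```python
def peak_inlet_rise(D, loads):
    """Largest inlet temperature rise under the assumed model rise = D * loads."""
    n = len(loads)
    return max(sum(D[i][j] * loads[j] for j in range(n)) for i in range(n))


def greedy_assign(D, total_units, capacity):
    """Place work one unit at a time on the node that keeps the peak rise lowest.

    D           -- hypothetical interference matrix (n x n list of lists)
    total_units -- the total task C, expressed in integer work units
    capacity    -- maximum number of units any single node may take
    """
    n = len(D)
    loads = [0] * n
    for _ in range(total_units):
        best_node, best_peak = None, float("inf")
        for node in range(n):
            if loads[node] >= capacity:
                continue  # node is full
            loads[node] += 1  # tentatively place one unit
            peak = peak_inlet_rise(D, loads)
            loads[node] -= 1  # undo the tentative placement
            if peak < best_peak:
                best_node, best_peak = node, peak
        if best_node is None:
            raise ValueError("total task exceeds cluster capacity")
        loads[best_node] += 1
    return loads


# Made-up two-node example: each node heats itself more than its neighbour.
D_example = [[0.5, 0.1], [0.1, 0.5]]
print(greedy_assign(D_example, total_units=4, capacity=3))
```

A greedy pass like this is only a heuristic and can miss the optimum. It is shown because it makes the role of self-interference and cross-interference in the placement decision explicit.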

Cooling Cost aware Scheduling

[VarsamopoulosGupta 2008]

Energy Saving by Dynamic Load Distribution

Increasing the range of changes in the rack heat load

• A heat load distribution of [30 kW, 5 kW, 5 kW, 20 kW] in the case study needs only 1.7 m/s (9,726 CFM) of cooling air flow

• This is 19% less than the 2.1 m/s that the uniform [15, 15, 15, 15] kW distribution needs

• This could save ~$189,000 annually in typical real-world data centers

Temperature contours around racks: [15, 15, 15, 15] kW with 2.1 m/s vs. [30, 5, 5, 20] kW with 1.7 m/s

[Yogendra Joshi, Georgia Tech/CERCS]
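The 19% figure follows directly from the two air velocities quoted in the case study. The short check below uses only numbers from the slide; the variable names are illustrative.

```python
# Rack heat loads (kW) and required cooling air velocities (m/s) from the case study.
uniform_loads_kw = [15, 15, 15, 15]
skewed_loads_kw = [30, 5, 5, 20]
uniform_velocity = 2.1
skewed_velocity = 1.7

# Both layouts dissipate the same total heat, so the comparison is like for like.
total_uniform_kw = sum(uniform_loads_kw)
total_skewed_kw = sum(skewed_loads_kw)

# Relative reduction in required air velocity.
reduction = (uniform_velocity - skewed_velocity) / uniform_velocity
print(f"Total load: {total_uniform_kw} kW vs {total_skewed_kw} kW")
print(f"Airflow reduction: {reduction:.1%}")  # prints: Airflow reduction: 19.0%
```

Both layouts carry 60 kW in total, so the saving comes from where the heat is placed rather than from doing less work.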

Think Globally, Act Locally

Numerically

Run simulations for a range of velocities.

Make a server heat load-inlet T variation matrix over servers S1, S2, ..., Sn. Axis labels: change in max. inlet T of servers; unit change in server loads.

Experimentally

Vary the heat loads sequentially at servers for a chosen unit cell and monitor the max. server inlet T.

Advantage:

The experimental approach does not require running simulations for different velocities.

Modifications:

Blocks of servers that have the same effect, or no effect, on the inlet T can be identified.

• This gives insight into the sparsity of the matrix.

• This reduces the computational work.
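The sequential procedure can be sketched as a finite-difference estimate: perturb one server's load at a time and record how every inlet temperature responds. The sketch assumes that entry `A[i][j]` is the change in inlet T at server i per unit change in load at server j, which is one reading of the matrix described here. `measure_inlet_temps` is a hypothetical callback standing in for either a simulation run or a physical measurement, and `toy_room` is a made-up linear model used only to exercise the code.

```python
def estimate_matrix(measure_inlet_temps, base_loads, delta=1.0):
    """Estimate the load-to-inlet-temperature matrix by perturbing one server at a time.

    measure_inlet_temps -- hypothetical callback: list of loads -> list of inlet temps
    base_loads          -- operating point around which the response is measured
    delta               -- size of the load perturbation (one "unit change")
    """
    n = len(base_loads)
    base_temps = measure_inlet_temps(list(base_loads))
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        perturbed = list(base_loads)
        perturbed[j] += delta  # vary the load of server j only
        temps = measure_inlet_temps(perturbed)
        for i in range(n):
            A[i][j] = (temps[i] - base_temps[i]) / delta
    return A


def toy_room(loads):
    """Made-up linear room model, for demonstration only."""
    true_matrix = [[0.50, 0.10, 0.00],
                   [0.20, 0.40, 0.00],
                   [0.00, 0.00, 0.30]]
    supply_temp = 15.0  # illustrative supply air temperature
    return [supply_temp + sum(row[j] * loads[j] for j in range(len(loads)))
            for row in true_matrix]


A_est = estimate_matrix(toy_room, base_loads=[5.0, 5.0, 5.0])
for row in A_est:
    print([round(x, 3) for x in row])
```

In the toy model the third server neither affects nor is affected by the other two, so its row and column are zero outside the diagonal. Blocks like this are what make the matrix sparse and cut the number of runs needed.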

A Matrix

$$\max \sum_{i=1}^{n} l_i$$

$$\text{s.t.}\quad A\,l + T \le T_{cr}$$

$$l_{min} \le l \le l_{max}$$

Where,

$l_i$: server i load

$l_{min}$: minimum load (startup)

$l_{max}$: max. load (full utilization)

$T_{cr}$: max. inlet T allowed by ASHRAE

[Yogendra Joshi, Georgia Tech/CERCS]
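To make the role of the A matrix concrete, the sketch below computes how much total load a given load shape can carry before any inlet temperature reaches a limit. It is not the full optimization over all load vectors. It assumes a linear model in which inlet temperature equals a baseline plus `A` times the load vector, it ignores the minimum-load bound, and every number in the example is made up. The names `max_total_load`, `base_temps`, and `t_limit` are illustrative.

```python
def max_total_load(A, shape, base_temps, t_limit, l_max):
    """Largest total load along a fixed load shape that keeps all inlet temps <= t_limit.

    A          -- matrix of inlet temperature change per unit server load (assumed linear)
    shape      -- relative load per server; needs at least one positive entry
    base_temps -- inlet temperatures at zero load (hypothetical baseline)
    t_limit    -- maximum allowed inlet temperature
    l_max      -- maximum load of a single server
    """
    n = len(shape)
    if not any(s > 0 for s in shape):
        raise ValueError("shape needs at least one positive entry")
    # Temperature rise at each server per unit of the scaling factor.
    rise = [sum(A[i][j] * shape[j] for j in range(n)) for i in range(n)]
    # Start from the per-server capacity bound, then tighten with thermal bounds.
    scale = min(l_max / s for s in shape if s > 0)
    for i in range(n):
        if rise[i] > 0:
            scale = min(scale, (t_limit - base_temps[i]) / rise[i])
    scale = max(scale, 0.0)
    return scale * sum(shape)


# Made-up example: server 0 recirculates heat strongly, server 1 barely at all.
A_demo = [[2.0, 0.2], [0.2, 0.5]]
uniform = max_total_load(A_demo, [1, 1], [20.0, 20.0], t_limit=30.0, l_max=10.0)
skewed = max_total_load(A_demo, [1, 3], [20.0, 20.0], t_limit=30.0, l_max=10.0)
print(uniform, skewed)
```

Comparing the two calls shows the effect the slides report: shifting load toward servers with weak recirculation raises the heat the room can dissipate at the same inlet limit.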

68% increase in allowed heat dissipation (for the same CRAC velocity)

37.5% decrease in facilities energy consumption (for the same heat dissipation)

An Example

[Figure: max. inlet T at servers (K), vertical axis from 288 to 328 K, plotted per server for four cases: AILM with 0.8-7.5 kW server range (A rack and B rack) and uniform 5 kW server load (A rack and B rack), with the safe temperature limit marked. Total data center load dissipation: 298 kW and 297 kW. V_CRAC = 5 m/s.]

[Yogendra Joshi, Georgia Tech/CERCS]

Pertinence of Thermal Maps in Data Center Management

• Given an equipment utilization layout, find the temperature around the room

• Create a collection of thermal maps or a function to “predict” thermal behavior of a task assignment

• Use the collection to decide on job placement (temporally and spatially)

[VarsamopoulosGupta 2008]
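A collection of thermal maps can be pictured as a lookup table from utilization layouts to recorded temperatures. The sketch below assumes a hypothetical dictionary `thermal_maps` keyed by a tuple of per-node utilizations, with each value a list of temperatures observed around the room. It handles only the spatial choice (which node gets the job), and the function name and data are illustrative.

```python
def choose_placement(thermal_maps, current_layout, job_load):
    """Pick the node whose resulting layout has the coolest recorded hot spot.

    thermal_maps   -- hypothetical dict: layout tuple -> list of room temperatures
    current_layout -- current per-node utilization
    job_load       -- utilization added by the new job
    Returns (node, hot_spot_temperature), or None if no candidate layout is recorded.
    """
    best = None
    for node in range(len(current_layout)):
        candidate = list(current_layout)
        candidate[node] += job_load
        key = tuple(candidate)
        if key not in thermal_maps:
            continue  # no recorded map for this layout
        hot_spot = max(thermal_maps[key])
        if best is None or hot_spot < best[1]:
            best = (node, hot_spot)
    return best


# Made-up collection covering the two layouts reachable from an idle two-node room.
maps = {
    (1, 0): [25.0, 22.0],
    (0, 1): [23.0, 24.0],
}
print(choose_placement(maps, (0, 0), 1))
```

A predictive function, the other option named on the slide, would replace the dictionary lookup with a model evaluated on the candidate layout, which avoids gaps where no map was recorded.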

Thermal-aware Data Management

[Adapted from VarsamopoulosGupta 2008]

Thermal aware data management

• Task profiling
– CPU utilization, I/O activity etc.
• Equipment power profiling
– CPU consumption, disk consumption etc.
• Heat recirculation modeling
• Task management technologies

Need for a comprehensive research framework