Green Computing: Energy Consumption Optimized Service Hosting

Walter Binder, University of Lugano, Switzerland

Niranjan Suri, IHMC, Florida, USA

2009-01-26

Motivation

• Data centers are becoming ubiquitous

Large installations of computer systems

Providing critical services

• Data centers are big power consumers

Continuously operating computers, regardless of the load

Cooling

Reducing Power Consumption

• Green Grid consortium advocates data center design and management to improve energy efficiency

• Right-sizing data centers at design time

• Energy-efficient cooling

• Virtualization (multiple servers on same physical machine)

• Processor power saving (e.g., clock rate depending on load)

• Powering down unused machines

Computers with dedicated roles (e.g., computers performing backups)

Our Approach

• Load on machines varies over time

• Turn off machines that are not needed and restart them as the load requires

• Problems

Load is distributed over multiple machines

Load reduction is typically also distributed across multiple machines

Need to consolidate load on a subset of machines in order to free up machines that can be turned off

• Goal: Minimum number of machines running

• Constraint: QoS must be ensured

Service-Level Agreements (SLAs) must not be violated

Example

[Figure: load (0% to 100%) on compute servers A, B, …, n in three panels: Heavy Load; Light Load (Evenly Distributed); Light Load (Consolidated), where the idle servers are shut down]

Service Types

• Hosting environment may offer multiple service types

• Service type consists of

Service interface

SLA defining QoS parameters

• SLA parameters specified according to a common ontology

WS-Agreement, WSLA, SLAng, etc.

Here: Single QoS parameter: Response time

Stateless versus Stateful Services

• Stateless service:

Requests are independent

After completing all pending requests, a stateless service may be stopped

• Stateful service:

Requests in one session may depend on prior requests in the same session

Sessions may be explicitly terminated by clients, or expire after some period of inactivity

After termination of all sessions, a stateful service may be stopped

Hosting Environment (1)

• Dedicated machines for three different purposes:

File servers

• Provide all data sources

Compute servers

• Execute service requests

Dispatchers

• Receive service requests and choose compute servers to handle them

• Decide on shutdown and restart of compute servers

• Dispatchers and file servers are continuously running

• Only idle compute servers may be shut down

Hosting Environment (2)

[Figure: clients send requests to the dispatcher; the dispatcher dispatches them to the compute servers; the compute servers access data on the file servers]

Hosting Environment (3)

• Heterogeneous environment

Machines have different computing resources

• Dynamically changing environment

New machines may be added

Cores may fail

• Compute servers may host any number of service types, and a service type may be hosted by any number of compute servers

• Compute servers are ranked according to energy efficiency

Node Manager

• Each compute server runs a Node Manager component

• Monitors idle time and average response time for each service type

• Communicates measurements to dispatcher

• Handles server shutdown upon request from dispatcher

• Notifies dispatcher upon startup

Shutdown of Compute Servers

• Dispatcher notifies Node Manager on compute server to prepare for shutdown

• No further service requests are dispatched to the compute server

• Node Manager waits for

Completion of all previously accepted requests

Termination of all active sessions

• Alternative: Migration of sessions
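The waiting step above can be sketched as a small counter guarded by a condition variable. This is only an illustration: the class name `DrainTracker` and its methods are hypothetical, and the slides do not say how the Node Manager tracks requests and sessions.

```python
import threading
from typing import Optional


class DrainTracker:
    """Hypothetical helper: counts in-flight requests and open sessions."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.pending_requests = 0
        self.active_sessions = 0
        self.draining = False

    def request_started(self) -> None:
        with self._cond:
            self.pending_requests += 1

    def request_done(self) -> None:
        with self._cond:
            self.pending_requests -= 1
            self._cond.notify_all()

    def session_opened(self) -> None:
        with self._cond:
            self.active_sessions += 1

    def session_closed(self) -> None:
        # Covers both explicit termination and expiry after inactivity.
        with self._cond:
            self.active_sessions -= 1
            self._cond.notify_all()

    def prepare_shutdown(self, timeout: Optional[float] = None) -> bool:
        """Block until nothing is pending; return False if the timeout expired."""
        with self._cond:
            self.draining = True
            return self._cond.wait_for(
                lambda: self.pending_requests == 0 and self.active_sessions == 0,
                timeout=timeout,
            )
```

With no timeout the call blocks until the last request completes and the last session ends; a timeout lets the caller report back that the server is not yet ready to shut down.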

Shutdown Options

• Complete shutdown

No power consumption

Ensures clean state upon restart (e.g., no memory leaks)

Slow restart

• Hibernation

No power consumption

Memory saved on persistent storage

Resume by reloading memory snapshot

• Standby

Reduced power consumption

Processor stopped, but memory remains active

Fast restart

Restart of Compute Servers

• Wake on LAN

• Magic packet is broadcast to LAN

Special header: 0xFF repeated 6 times

MAC address of the machine to restart, repeated 16 times

• Dispatcher initiates compute server restart

• Node Manager notifies dispatcher of completed restart

• Dispatcher needs to know MAC addresses of all compute servers
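A minimal sketch of how a dispatcher might build and send the magic packet. The payload layout (six 0xFF bytes followed by 16 repetitions of the target MAC address) is the standard Wake-on-LAN format; the function names, the broadcast address, and UDP port 9 are assumptions chosen for illustration, not details from the slides.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN payload for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Header of six 0xFF bytes, then the target MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Because the packet is addressed by MAC, the dispatcher would keep a table of MAC addresses for all compute servers and call `wake` with the entry of the server it wants to restart.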

Service Dispatch: Definitions

• n compute servers <s1,…,sn>

• Sorted according to energy efficiency

sx is more energy efficient than sy if x < y

• In each configuration

s1 … sr are running (1 ≤ r ≤ n)

sr+1 … sn are shut down (or in the process of shutting down)

• pT(i): probability that a request for service type T is dispatched to si

Service Dispatch upon Request

• Take a random number z (0 ≤ z ≤ 1; uniform distribution)

• Choose sc such that

c = min { i : 1 ≤ i ≤ n and z ≤ pT(1) + … + pT(i) }

• Related to lottery scheduling

Tickets instead of probabilities
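The selection rule above amounts to walking the cumulative sum of the probabilities until it reaches z. The following sketch assumes the probabilities for one service type T are held in a list where `probs[i - 1]` stands for pT(i); the function name `choose_server` is hypothetical. It deviates from the formula in one detail, flagged in the comments: servers with probability zero are skipped so that z = 0 cannot select one.

```python
import random
from typing import Optional, Sequence


def choose_server(probs: Sequence[float], z: Optional[float] = None) -> int:
    """Return the 1-based index c of the compute server chosen for a request."""
    if z is None:
        z = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    for i, p in enumerate(probs, start=1):
        cumulative += p
        # Deviation from the slide formula: require p > 0 so that a server
        # that is shut down (probability zero) is never chosen when z == 0.
        if p > 0 and z <= cumulative:
            return i
    # Rounding can leave the total slightly below 1; fall back to the
    # last server that has a non-zero probability.
    for i in range(len(probs), 0, -1):
        if probs[i - 1] > 0:
            return i
    raise ValueError("all dispatch probabilities are zero")
```

Passing `z` explicitly makes the choice reproducible, for example `choose_server([0.5, 0.3, 0.2], z=0.6)` selects the second server because the cumulative sum first reaches 0.6 there.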

Update of Probabilities (1)

• In regular intervals, dispatcher obtains monitoring data from Node Managers of running compute servers

• If si had idle time and si had no problem meeting the SLAs:

Increase load on si, reduce load on sr

pT(r) := pT(r) – Δp

pT(i) := pT(i) + Δp

• If r > 1 and pT(r) = 0 for all service types T, initiate shutdown of sr
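The consolidation step can be sketched as follows. Several details are assumptions made for illustration: the probabilities are stored per service type in a dictionary of lists where `p[T][k - 1]` stands for pT(k), the shift is applied to every service type, the step Δp defaults to 0.05, and the step is capped so that pT(r) never becomes negative. The function name `consolidate` is hypothetical.

```python
from typing import Dict, List


def consolidate(p: Dict[str, List[float]], r: int, i: int,
                delta: float = 0.05) -> bool:
    """Shift up to `delta` of each service type's probability from s_r to s_i.

    Call this only when s_i reported idle time and met all its SLAs.
    Returns True when s_r no longer receives requests and may be shut down.
    """
    if not 1 <= i < r:
        return False  # s_i must be a running server other than s_r
    for probs in p.values():
        step = min(delta, probs[r - 1])  # cap so pT(r) stops exactly at zero
        probs[r - 1] -= step
        probs[i - 1] += step
    return r > 1 and all(probs[r - 1] == 0 for probs in p.values())
```

Called once per monitoring interval, this gradually moves load from the least energy-efficient running server to more efficient ones; once it returns True the dispatcher can initiate the shutdown of s_r and decrement r.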

Update of Probabilities (2)

• If compute server si violates the SLA for a service type T (overload situation):

First try to find a running compute server sk (1 ≤ k ≤ r) that has idle time and met the SLAs of all service types

• Balance load between si and sk

• pT(i) := pT(i) – Δp

• pT(k) := pT(k) + Δp

If there is no such compute server sk, initiate restart of sr+1
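A sketch of the overload rule above for a single service type T. The data layout is assumed: `probs[k - 1]` stands for pT(k), and `idle` and `sla_ok` are sets of 1-based server indices taken from the latest monitoring data. Searching candidates from the most energy-efficient server upward, the default Δp, and the function name `handle_overload` are likewise assumptions not stated on the slides.

```python
from typing import List, Set, Tuple


def handle_overload(probs: List[float], r: int, i: int,
                    idle: Set[int], sla_ok: Set[int],
                    delta: float = 0.05) -> Tuple[str, int]:
    """React to s_i violating the SLA of service type T.

    Returns ("shift", k) if load was moved to running server s_k,
    ("restart", r + 1) if s_{r+1} should be restarted, or
    ("saturated", i) if all n servers are already running.
    """
    for k in range(1, r + 1):  # assumption: try most efficient servers first
        if k != i and k in idle and k in sla_ok:
            step = min(delta, probs[i - 1])  # never drive pT(i) below zero
            probs[i - 1] -= step
            probs[k - 1] += step
            return ("shift", k)
    if r < len(probs):
        return ("restart", r + 1)
    return ("saturated", i)
```

The "saturated" outcome is not covered by the slides; it is included here only so that the function has a defined result when r = n.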

Future Work (1)

• Testbed and evaluation

Main evaluation metric: Energy savings for given workloads

Service performance must be modeled

Traces of service execution in data centers needed

• Migration of sessions

Reduces the time for preparing shutdown

• Complex optimization criteria

Minimize number of service types hosted on the same compute server

Consider estimated shutdown preparation time when choosing the compute server to shut down

Future Work (2)

• Distribution and replication

Service dispatcher must not become bottleneck

• Fault tolerance

Dispatcher must detect compute server failures

Dispatcher must not become single point of failure

• Sudden load fluctuations

Shutting down machines increases vulnerability to denial-of-service attacks

Conclusions

• Data centers are growing and consume huge amounts of electrical energy

• Energy can be saved by powering down unused machines according to the current load

• Requires consolidation of services on a subset of the available machines

• Probabilistic approach to energy-consumption-aware load balancing