
Renewable and Cooling Aware Workload Management for Sustainable Data Centers∗

Zhenhua Liu$, Yuan Chen†, Cullen Bash†, Adam Wierman$, Daniel Gmach†, Zhikui Wang†, Manish Marwah†, Chris Hyser†

$California Institute of Technology, Pasadena, CA, USA, E-mail: {zhenhua, adamw}@caltech.edu
†HP Labs, Palo Alto, CA, USA, E-mail: [email protected]

ABSTRACT

Recently, the demand for data center computing has surged, increasing the total energy footprint of data centers worldwide. Data centers typically comprise three subsystems: IT equipment provides services to customers; power infrastructure supports the IT and cooling equipment; and the cooling infrastructure removes heat generated by these subsystems. This work presents a novel approach to model the energy flows in a data center and optimize its operation. Traditionally, supply-side constraints such as energy or cooling availability were treated independently from IT workload management. This work reduces electricity cost and environmental impact using a holistic approach that integrates renewable supply, dynamic pricing, and cooling supply, including chiller and outside air cooling, with IT workload planning to improve the overall sustainability of data center operations. Specifically, we first predict renewable energy as well as IT demand. Then we use these predictions to generate an IT workload management plan that schedules IT workload and allocates IT resources within a data center according to time-varying power supply and cooling efficiency. We have implemented and evaluated our approach using traces from real data centers and production systems. The results demonstrate that our approach can reduce both the recurring power costs and the use of non-renewable energy by as much as 60% compared to existing techniques, while still meeting the Service Level Agreements.

Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General

Keywords
sustainable data center, renewable energy, demand shaping, scheduling, cooling optimization

1. INTRODUCTION

Data centers are emerging as the “factories” of this generation. A single data center requires a considerable amount of electricity, and data centers are proliferating worldwide as a result of increased demand for IT applications and services. As a result, concerns about the growth in energy usage and emissions have led to broad interest in curbing data centers' energy consumption. These concerns have spurred research efforts in both industry and academia. Emerging solutions include the incorporation of on-site renewable energy supplies, as in Apple's new North Carolina data center, and alternative cooling supplies, as in Yahoo's New York data center. The problem addressed by this paper is how to use these resources most effectively during the operation of data centers.

∗This work was done during Zhenhua Liu's internship at HP Labs. Zhenhua Liu and Adam Wierman are partly supported by NSF grant CNS 0846025 and DoE grant DE-EE0002890.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS'12, June 11-15, 2012, London, England, UK.
Copyright 2012 ACM 978-1-4503-1097-0/12/06 ...$10.00.

Most of the efforts toward this goal focus on improving the efficiency of one of the three major data center silos: (i) IT, (ii) cooling, and (iii) power. Significant progress has been made in optimizing the energy efficiency of each of the three silos, enabling sizeable reductions in data center energy usage, e.g., [1, 2, 3, 4, 5, 6, 7, 8]; however, the integration of these silos is an important next step. To this end, a second generation of solutions has begun to emerge. This work focuses on the integration of different silos [9, 10, 11, 12]. An example is the dynamic thermal management of air-conditioners based on load at the IT rack level [9, 13]. However, to this point, supply-side constraints such as renewable energy and cooling availability are largely treated independently from workload management such as scheduling. In particular, current workload management is not designed to take advantage of time variations in renewable energy availability and cooling efficiencies. The work in [14] integrates power capping and consolidation with renewable energy, but it does not shift workloads to align power demand with renewable supply.

The potential of integrated, dynamic approaches has been realized in some other domains; e.g., cooling management solutions for buildings that predict weather and power prices to dynamically adapt the cooling control have been proposed [15]. The goal of this paper is to start to realize this potential in data centers. In particular, the potential of an integrated approach can be seen from the following three observations:

First, most data centers support a range of IT workloads, including both critical interactive applications that run 24x7, such as Internet services, and delay-tolerant, batch-style applications such as scientific applications, financial analysis, and image processing, which we refer to as batch workloads or batch jobs. Generally, batch workloads can be scheduled to run anytime as long as they finish before their deadlines. This enables significant flexibility for workload management.

Second, the availability and cost of the power supply, e.g., renewable energy supply and electricity price, is often dynamic over time, and so dynamic control of the supply mix can help reduce CO2 emissions and offset costs. Thus, thoughtful workload management can have a great impact on energy usage and costs by scheduling batch workloads in a manner that follows the renewable availability.

Third, many data centers nowadays are cooled by multiple means through a cooling micro grid combining traditional mechanical chillers, airside economizers, and waterside economizers.


[Figure 1: Sustainable Data Center. The power infrastructure (grid power, wind power, PV power, energy storage) and the cooling micro grid (outside air economizer, water economizer, chiller plant, CRACs) supply IT power and cooling power to the IT equipment hosting interactive workloads and batch jobs.]

Within a micro grid, each cooling approach has a different efficiency and capacity that depends on the IT workload, the cooling generation mechanism, and external conditions including outside air temperature and humidity, and may vary with the time of day. This provides opportunities to optimize cooling cost by “shaping” IT demand according to time-varying cooling efficiency and capacity.

The three observations above highlight that there is considerable potential for integrated management of the IT, cooling, and power subsystems of data centers. Providing such an integrated solution is the goal of this work. Specifically, we provide a novel workload scheduling and capacity management approach that integrates energy supply (renewable energy supply, dynamic energy pricing) and cooling supply (chiller cooling, outside air cooling) into IT workload management to improve the overall energy efficiency and reduce the carbon footprint of data center operations.

A key component of our approach is demand shifting, which schedules batch workloads and allocates IT resources within a data center according to the availability of renewable energy supply and the efficiency of cooling. This is a complex optimization problem due to the dynamism in the supply and demand and the interaction between them. To see this, note that given the lower electricity price and outside air temperature at night, batch jobs should be scheduled to run at night; however, because more renewable energy such as solar is available around noon, we should do more work during the day to reduce the electricity bill and environmental impact.

At the core of our design is a model of the costs within the data center, which is used to formulate a constrained convex optimization problem. The workload planner solves this optimization to determine the optimal demand shifting. Optimization-based workload management has been popular in the research community recently, e.g., [16, 17, 18, 19, 3, 11, 20, 21]. The key contributions of the formulation considered here compared to the prior literature are (i) the addition of a detailed cost model and optimization of the cooling component of the data center, which is typically ignored in previous designs; (ii) the consideration of both interactive and batch workloads; and (iii) the derivation of important structural properties of the optimal solutions to the optimization.

In order to validate our integrated design, we have implemented a prototype of our approach for a data center that includes solar power and outside air cooling. Using our implementation, we perform a number of experiments on a real testbed to highlight the practicality of the approach (Section 5). In addition to validating our design, our experiments are centered on providing insights into the following questions:

(1) How much benefit (reduced electricity bill and environmental impact) can be obtained from our renewable and cooling-aware workload management planning?

(2) Is net-zero¹ grid power consumption achievable?

¹ By “net-zero” we mean that the total energy usage over a fixed period is less than or equal to the local total renewable generation during that period. Note that this does not mean that no power from the grid is used during this period.

[Figure 2: One week renewable generation. Solar and wind power output (kW) over 168 hours.]

(3) Which renewable source is more valuable? What is the optimal renewable portfolio?

2. SUSTAINABLE DATA CENTER OVERVIEW

Figure 1 depicts an architectural overview of a sustainable data center. The IT equipment includes servers, storage, and networking switches that support applications and services hosted in the data center. The power infrastructure generates and delivers power for the IT equipment and cooling facility through a power micro grid that integrates grid power, local renewable generation such as photovoltaic (PV) and wind, and energy storage. The cooling infrastructure provides, delivers, and distributes the cooling resources to extract the heat from the IT equipment. In this example, the cooling capacity is delivered to the data center through the Computer Room Air Conditioning (CRAC) units from the cooling micro grid that consists of an air economizer, a water economizer, and a traditional chiller plant. We discuss these three key subsystems in detail in the following sections.

2.1 Power Infrastructure

Although renewable energy is in general more sustainable than grid power, the supply is often time varying in a manner that depends on the source of power, the location of the power generators, and the local weather conditions. Figure 2 shows the power generated from a 130kW PV installation for an HP data center and a nearby 100kW wind turbine in California, respectively. The PV generation shows regular variation while that from the wind is much less predictable. How to manage these supplies is a big challenge for the application of renewable energy in a sustainable data center.

Despite the use of renewable energy, data centers must still rely on non-renewable energy, including grid power and on-site energy storage, due to availability concerns. Grid power can be purchased at either a pre-defined fixed rate or an on-demand time-varying rate, and Figure 3 shows an example of a time-varying electricity price over 24 hours. There might be an additional charge for the peak demand.

Local energy storage technologies can be used to store and smooth out the supply of power for a data center. A variety of technologies are available [22], including flywheels, batteries, and other systems. Each has its costs, advantages, and disadvantages. Energy storage is still quite expensive and there is power loss associated with energy conversion and charge/discharge. Hence, it is critical to maximize the use of the renewable energy that is generated on site. An ideal scenario is to maximize the use of renewable energy while minimizing the use of storage.

2.2 Cooling Supply

Due to the ever-increasing power density of IT equipment in today's data centers, a tremendous amount of electricity is used by the cooling infrastructure.


[Figure 3: One week real-time electricity price (cents/kWh) over 168 hours.]

According to [23], a significant amount of data center power (up to 1/3) goes to the cooling system, including CRAC units, pumps, the chiller plant, and cooling towers.

Much work has been done to improve cooling efficiency through, e.g., smart facility design and real-time control and optimization [8, 7]. Traditional data centers use chillers to cool down the returned hot water from CRACs via mechanical refrigeration cycles, since they can provide high cooling capacity continuously. However, compressors within the chillers consume a large amount of power [24, 25]. Recently, “chiller-less” cooling technologies have been adopted to remove or reduce the dependency on mechanical chillers. In the case of water-side economizers, the returned hot water is cooled down by components such as dry coolers or evaporative cooling towers. The cooling capacity may also be generated from cold water from seas or lakes. In the case of air economizers, cold outside air may be introduced, after filtering and/or humidification/de-humidification, to cool down the IT equipment directly while hot air is rejected into the environment.

However, these so-called “free” cooling approaches are actually not free [24]. First, there is still a non-negligible energy cost associated with these approaches; e.g., blowers driving outside air through the data center need to work against air flow resistance and therefore consume power. Second, the efficiency of these approaches is much more strongly affected by environmental conditions, such as ambient air temperature and humidity, than that of traditional approaches based on mechanical chillers. The cooling efficiency and capacity of the economizers can vary widely with the time of day, the season of the year, and the geographical location of the data center. These approaches are usually complemented by more stable cooling resources such as chillers, which provides opportunities to optimize the cooling power usage by “shaping” IT demand according to cooling efficiencies.

2.3 IT Workload

There are many different workloads in a data center. Most of them fit into two classes: interactive, and non-interactive or batch. The interactive workloads, such as Internet services or business transactional applications, typically run 24x7 and process user requests, which have to be completed within a certain time (response time), usually within a second. Non-interactive batch jobs, such as scientific applications, financial analysis, and image processing, are often delay tolerant and can be scheduled to run anytime as long as progress can be made and the jobs finish before the deadline (completion time). This deadline is much more flexible (several hours to multiple days) than that of interactive workloads. This provides great optimization opportunities for workload management to “shape” non-interactive batch workloads based on the varying renewable energy and cooling supply.

Interactive workloads are characterized by stochastic properties for request arrival, service demand, and Service Level Agreements (SLAs, e.g., thresholds on average response time or percentile delay). Figure 4 shows a 7-day normalized CPU usage trace for a popular photo sharing and storage web service, which has more than 85 million registered users in 22 countries.

[Figure 4: One week interactive workload. Normalized CPU usage over 168 hours.]

We can see that the workload shows significant variability and exhibits a clear diurnal pattern, which is typical of data center interactive workloads.

Batch jobs are defined in terms of total resource demand (e.g., CPU hours), starting time, completion time, and maximum resource consumption (e.g., a single-threaded program can use up to 1 CPU). Conceptually, a batch job can run at anytime on many different servers as long as it finishes before the specified completion time. Our integrated management approach exploits this flexibility to make use of renewable energy and efficient cooling when available.
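To make these workload inputs concrete, the sketch below shows one way they could be represented in code. It is an illustrative Python rendering of the parameters defined above; the class and field names are ours, not taken from the prototype.

```python
from dataclasses import dataclass

@dataclass
class InteractiveWorkload:
    """Interactive workload i: per-timeslot arrival rates plus an SLA target."""
    arrival_rate: list   # lambda_i(t), requests/sec for each timeslot t
    service_rate: float  # mu_i, requests/sec one unit of CPU capacity can serve
    rt_target: float     # SLA target on response time (seconds)

@dataclass
class BatchJobClass:
    """Class j of batch jobs, deferrable anywhere within [start, deadline]."""
    total_demand: float      # B_j, total CPU-hours required
    max_parallel: float      # MP_j, maximum capacity usable in one timeslot
    start: int               # S_j, earliest timeslot the class may run
    deadline: int            # E_j, timeslot by which the work must finish
    revenue_per_unit: float  # R_j, revenue per unit of completed work
```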

3. MODELING AND OPTIMIZATION

As discussed above, time variations in renewable energy supply availability and cooling efficiencies provide both opportunities and challenges for managing IT workloads in data centers. In this section, we present a novel design for renewable and cooling aware workload management that exploits the opportunities available to improve the sustainability of data centers. In particular, we formulate an optimization problem for adapting the workload scheduling and capacity allocation to the varying supply from the power and cooling infrastructure.

3.1 Optimizing the cooling substructure

We first derive the optimal cooling substructure when multiple cooling approaches are available in the substructure. We consider two cooling approaches: outside air cooling, which supplies most of the cooling capacity, and cooling through mechanical chillers, which guarantees availability of cooling capacity. By exploiting the heterogeneity of the efficiency and cost of the two approaches, we represent the minimum cooling power of the substructure as a function of the IT heat load.

In the following discussion, we define the cooling coefficient as the cooling power divided by the IT power, representing the cooling efficiency. By cooling capacity we mean how much heat the cooling system can extract from the IT equipment and reject into the environment. In the case of outside air cooling, the cold air from outside is assumed to be pushed into the return ends of the CRAC units while the hot air from the outlets of the server racks is exhausted to the ambient environment through ducts.

Outside Air Cooling

The energy usage of outside air cooling is mainly the power consumed by blowers, which can be approximated as a cubic function of the blower speed [26, 24]. We assume that the capacity of the outside air cooling is under tight control, e.g., through blower speed tuning, to avoid over-provisioning. Then the outside air capacity is equal to the IT heat load at steady state when the latter does not exceed the total air cooling capacity. Based on basic heat transfer theory [27], the cooling capacity is proportional to the air volume flow rate. The air volume flow rate is approximately proportional to the blower speed according to the general fan laws [26, 24]. Therefore, the outside air cooling power can be defined as a function of the IT power d as $f_a(d) = k d^3$, $0 \le d \le \bar{d}$, $k > 0$, which is a convex function.


[Figure 5: Cooling coefficient comparison for (a) outside air cooling and (b) chiller cooling, as a function of IT power (0-50 kW) at outside air temperatures of 20, 25, and 30 Celsius. For conversion, 20°C = 68°F, 25°C = 77°F, 30°C = 86°F.]

The parameter k depends on the temperature difference $(t_{RA} - t_{OA})$, based again on heat transfer theory, where $t_{OA}$ is the outside air temperature (OAT) and $t_{RA}$ is the temperature of the (hot) exhaust air from the IT racks. The maximum capacity of this cooling system can be modeled as $\bar{d} = C(t_{RA} - t_{OA})$. The parameter $C > 0$ is the maximum capacity of the air, which is proportional to the maximal outside air mass flow rate when the blowers run at the highest speed. As one example, Figure 5(a) shows the cooling coefficient for an outside air cooling system (assuming the exhaust air temperature is 35°C/95°F) under different outside air temperatures.

Chilled Water Cooling

First-principle models of chilled water cooling systems, including the chiller plant, cooling towers, pumps, and heat exchangers, are complicated [27, 13, 25]. In this paper, we consider an empirical chiller efficiency model that was built on actual measurements of an operational chiller [25]. Figure 5(b) shows the cooling coefficient of the chiller. Different from outside air cooling, the chiller cooling coefficient does not change much with OAT, and its variation over different IT loads is much smaller than that under outside air cooling. In the following analysis, the chiller power consumption is approximated as $f_c(d) = \gamma d$, where d is again the IT power and $\gamma > 0$ is a constant depending on the chiller. Our analysis also applies to the case of an arbitrary strictly increasing and convex chiller cooling function [28].

Cooling optimization

As shown in Figure 5, the efficiency of outside air cooling is more sensitive to the IT load and the OAT than is that of chiller cooling. Furthermore, the cost of outside air cooling is higher than that of the chiller when the IT load exceeds a certain value, because its power increases very fast (super-linearly) as the IT power increases, in particular for high ambient temperatures. The heterogeneous cooling efficiencies of the two approaches and their varying properties with air temperature and heat load provide opportunities to optimize the cooling cost by drawing the proper cooling capacity from each cooling supply, as we discuss below, or by manipulating the heat load through demand shaping, as we show in later sections.

For a given IT power d and outside air temperature $t_{OA}$, there exists an optimal cooling capacity allocation between outside air cooling and chiller cooling. Assume the cooling capacities provided by the chiller and outside air are $d_1$ and $d_2$ respectively ($d_1 = d - d_2$). From the cooling models introduced above, the optimal cooling power consumption is

$$c(d) = \min_{d_2 \in [0, \bar{d}]} \; \gamma (d - d_2)^+ + k d_2^3 \qquad (1)$$

This can be solved analytically, which yields

$$d_2^* = \begin{cases} d & \text{if } d \le d_s \\ d_s & \text{otherwise} \end{cases}$$

where $d_s = \min\{\sqrt{\gamma/(3k)}, \bar{d}\}$, and the optimal chiller cooling capacity is $d_1^* = d - d_2^*$.

[Figure 6: Optimal cooling power (kW) as a function of IT power (0-50 kW) at outside air temperatures of 20, 25, and 30 Celsius.]

Hence,

$$c(d) = \begin{cases} k d^3 & \text{if } d \le d_s \\ k d_s^3 + \gamma (d - d_s) & \text{otherwise} \end{cases} \qquad (2)$$

is the cooling power of the optimal substructure, which is used for the optimization in later sections. Figure 6 illustrates the relationship between cooling power and IT load for different ambient temperatures. We see that the cooling power is a convex function of IT power, and is higher for hotter outside air. We also prove that the optimal c(d) is a convex function of d in [28], which is important for using it as a component of the overall optimization for workload management.
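The split in (1)-(2) is a closed-form computation. The following Python sketch, with purely illustrative parameter values, evaluates the optimal cooling power and the corresponding outside air/chiller allocation under the assumptions of Section 3.1:

```python
import math

def optimal_cooling(d, k, gamma, d_max):
    """Minimum cooling power for IT power d, following eqs. (1)-(2).

    k      -- outside-air blower coefficient (power = k * capacity^3)
    gamma  -- chiller coefficient (power = gamma * capacity)
    d_max  -- maximum outside-air cooling capacity, C * (t_RA - t_OA)
    Returns (cooling_power, outside_air_capacity, chiller_capacity).
    """
    # Threshold where the marginal cost of outside air (3*k*d2^2) reaches
    # the chiller's marginal cost gamma, capped by the outside-air capacity.
    d_s = min(math.sqrt(gamma / (3 * k)), d_max)
    d2 = d if d <= d_s else d_s      # outside-air share d2*
    d1 = d - d2                      # remaining heat load handled by the chiller
    power = k * d2 ** 3 + gamma * d1
    return power, d2, d1

# Example with illustrative parameters: a 40 kW IT load on a warm day.
print(optimal_cooling(d=40.0, k=1e-4, gamma=0.25, d_max=60.0))
```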

3.2 System Model

We consider a discrete-time model whose timeslot matches the timescale at which the capacity provisioning and scheduling decisions can be updated. There is a (possibly long) time period we are interested in, {1, 2, ..., T}. In practice, T could be a day and a timeslot length could be 1 hour. The management period can be either static, e.g., performing the scheduling every day for the execution of the next day, or dynamic, e.g., creating a new plan if the old schedule differs too much from the actual supply and demand. The goal of the workload management is, at each time t, to:

(i) make the scheduling decision for each batch job;

(ii) choose the energy storage usage;

(iii) optimize the cooling infrastructure.

We assume the renewable supply at time t is r(t), which may be a mix of different renewables, such as wind and PV solar. We denote the grid power price at time t by p(t) and assume p(t) > 0 without loss of generality. If the price is negative at some time t, we simply use the full capacity in that timeslot and only need to make capacity decisions for the other timeslots; see [28] for more details. To model energy storage, we denote the energy storage level at time t by es(t), with initial value es(0), and the discharge/charge at time t by e(t), where positive and negative values mean discharge and charge, respectively. Also, there is a loss rate ρ ∈ [0, 1] for energy storage. We therefore have the relation es(t + 1) = ρ(es(t) − e(t)) between successive timeslots, and we require 0 ≤ es(t) ≤ ES, ∀t, where ES is the energy storage capacity. Though there are more complex energy storage models [29], they are beyond the scope of this paper.
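As a small illustration of this supply model, the sketch below (all numbers made up) applies the storage recursion and computes the resulting grid draw (d(t) + c(t) − r(t) − e(t))⁺ per timeslot:

```python
def simulate_supply(d, c, r, e, rho=0.95, es0=30.0, ES=100.0):
    """Track storage level and grid power over T timeslots.

    d, c, r, e -- per-timeslot IT power, cooling power, renewable supply,
                  and storage discharge (+) / charge (-).
    """
    es, grid = es0, []
    for t in range(len(d)):
        draw = max(d[t] + c[t] - r[t] - e[t], 0.0)  # (.)^+ : grid supplies the rest
        grid.append(draw)
        es = rho * (es - e[t])                      # storage update with loss rate rho
        assert 0.0 <= es <= ES, "storage level out of bounds"
    return grid

# One illustrative 4-slot horizon (kW / kWh scale).
print(simulate_supply(d=[50, 60, 70, 55], c=[5, 8, 10, 6],
                      r=[0, 40, 90, 20], e=[10, -5, -20, 15]))
```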

Assume that there are I interactive workloads. For interactive workload i, the arrival rate at time t is λ_i(t), the mean service rate is μ_i, and the target performance metric (e.g., average delay or 95th percentile delay) is rt_i. In order to satisfy these targets, we need to allocate interactive workload i IT capacity a_i(t) at time t. Here a_i(t) is derived from analytic models (e.g., M/GI/1/PS, M/M/k) or system measurements as a function of λ_i(t); because performance metrics generally improve as the capacity allocated to the workload increases, there is a sharp threshold for a_i(t). Note that our solution is quite general and does not depend on a particular model.

Assume there are J classes of batch jobs. Class j batch jobs have total demand B_j, maximum parallelization MP_j, starting time S_j, and deadline E_j. Let b_j(t) denote the amount of capacity allocated to class j jobs at time t.


We have $0 \le b_j(t) \le MP_j, \forall t$ and $\sum_t b_j(t) \le B_j$. Given the above definitions, the total IT demand at time t is given by

$$d(t) = \sum_i a_i(t) + \sum_j b_j(t). \qquad (3)$$

When taking into consideration the server power model, we can further transform d(t) into power demand, as in Section 4.1. We assume the total IT capacity is D, so 0 ≤ d(t) ≤ D, ∀t. Note that here d(t) is not constant, but instead time varying as a result of dynamic capacity provisioning.

3.3 Cost and Revenue Model

The cost of a data center includes both capital and operating costs. Our model focuses on the operational electricity cost. Meanwhile, by servicing the batch jobs, the data center can obtain revenue. We model the data center cost by combining the energy cost and the revenue from batch jobs. Note that, to simplify the model, we do not include the switching costs associated with cycling servers in and out of power-saving modes; however, the approach of [3] provides a natural way to incorporate such costs if desired.

To capture the variation of the energy cost over time, we let g(t, d(t), e(t)) denote the energy cost of the data center at time t given the IT power d(t), the optimal cooling power, the renewable generation, the electricity price, and the energy storage usage e(t). For any t, we assume that g(t, d(t), e(t)) is non-decreasing in d(t), non-increasing in e(t), and jointly convex in d(t) and e(t).

This formulation is quite general, and captures, for example, the common charging plan of a fixed price per kWh plus an additional “demand charge” for the peak of the average power used over a sliding 15 minute window [30], in which case the energy cost function consists of two parts:

$$p(t)\,\big(d(t) + c(d(t)) - r(t) - e(t)\big)^+$$

and

$$p_{peak}\,\Big(\max_t \big(d(t) + c(d(t)) - r(t) - e(t)\big)^+\Big),$$

where p(t) is the fixed/variable electricity price per kWh, and p_peak is the peak demand charging rate. We could also include a sell-back mechanism and other charging policies. Additionally, this formulation can capture a wide range of models for server power consumption, e.g., energy costs as an affine function of the load, see [1], or as a polynomial function of the speed, see [4, 31].
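For example, under this two-part charging plan, the cost of a given plan could be evaluated as in the following sketch (all rates and loads are hypothetical):

```python
def energy_cost(d, c, r, e, price, p_peak):
    """Energy charge plus peak-demand charge for one planning horizon.

    price  -- per-kWh price p(t) for each timeslot
    p_peak -- charge per kW applied to the maximum grid draw
    """
    draws = [max(d[t] + c[t] - r[t] - e[t], 0.0) for t in range(len(d))]
    energy_charge = sum(price[t] * draws[t] for t in range(len(d)))
    demand_charge = p_peak * max(draws)
    return energy_charge + demand_charge

# Hypothetical 3-slot example: cheap night power, expensive afternoon peak.
print(energy_cost(d=[40, 80, 60], c=[4, 12, 8], r=[0, 70, 10],
                  e=[0, 0, 0], price=[0.03, 0.06, 0.05], p_peak=10.0))
```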

We model only the variable component of the revenue², which comes from the batch jobs that are chosen to be run. Specifically, the data center gets revenue R(b), where b is the matrix consisting of b_j(t), ∀j, ∀t. In this paper, we focus on the following simple revenue function

$$R(b) = \sum_j R_j \Big( \sum_{t \in [S_j, E_j]} b_j(t) \Big),$$

where R_j is the per-job revenue and $\sum_{t \in [S_j, E_j]} b_j(t)$ captures the amount of class j jobs finished before their deadlines.

² Revenue is also derived from the interactive workload, but for the purposes of workload management the amount of revenue from this workload is fixed.

3.4 Optimization Problem

We are now ready to formulate the renewable and cooling aware workload management optimization problem. Our optimization problem takes as input the renewable supply r(t), electricity price p(t), optimal cooling substructure c(d(t)), IT workload demand a_i(t), B_j and related information (the starting time S_j, deadline E_j, and maximum parallelization MP_j), IT capacity D, energy storage capacity ES and loss rate ρ, and generates an optimal schedule for each timeslot for batch jobs b_j(t) and energy storage usage e(t), according to the availability of renewable power and cooling supply, such that specified SLAs (e.g., deadlines) and operational goals (e.g., minimizing operational costs) are satisfied.

This is captured by the following optimization problem:

$$\begin{aligned}
\min_{b,e}\quad & \sum_t g(t, d(t), e(t)) - \sum_j R_j \Big( \sum_{t \in [S_j, E_j]} b_j(t) \Big) && (4a)\\
\text{s.t.}\quad & \sum_t b_j(t) \le B_j, \quad \forall j && (4b)\\
& es(t+1) = \rho\,(es(t) - e(t)), \quad \forall t && (4c)\\
& 0 \le b_j(t) \le MP_j, \quad \forall j, \forall t && (4d)\\
& 0 \le d(t) \le D, \quad \forall t && (4e)\\
& 0 \le es(t) \le ES, \quad \forall t && (4f)
\end{aligned}$$

Here d(t) is given by (3). Constraint (4b) means that the amount of served batch jobs cannot exceed the total demand; it could become $\sum_t b_j(t) = B_j$ if finishing all class j batch jobs is required. Constraint (4c) updates the energy storage level at each timeslot. We also incorporate constraints on maximum parallelization (4d), IT capacity (4e), and energy storage capacity (4f). We may have other constraints, such as a “net zero” constraint that the total energy consumed be less than the total renewable generation within [1, T], i.e., $\sum_t (d(t) + c(d(t))) \le \sum_t r(t)$. Note that, though highly detailed, this formulation does ignore some important concerns of data center design, e.g., reliability and availability. Such issues are beyond the scope of this paper; nevertheless, our designs merge nicely with proposals such as [32] for these goals.

In this paper, we restrict our focus from optimization (4a) to (5a), but the analysis can be easily extended to other convex cost functions.

$$\begin{aligned}
\min_{b,e}\quad & \sum_t p(t) \big( d(t) + c(d(t)) - r(t) - e(t) \big)^+ - \sum_j R_j \Big( \sum_{t \in [S_j, E_j]} b_j(t) \Big) && (5a)\\
\text{s.t.}\quad & \sum_t b_j(t) \le B_j, \quad \forall j && (5b)\\
& es(t+1) = \rho\,(es(t) - e(t)), \quad \forall t && (5c)\\
& 0 \le b_j(t) \le MP_j, \quad \forall j, \forall t && (5d)\\
& 0 \le d(t) \le D, \quad \forall t && (5e)\\
& 0 \le es(t) \le ES, \quad \forall t && (5f)
\end{aligned}$$

Note that this optimization problem is jointly convex in b_j(t) and e(t) and can therefore be solved efficiently.
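The prototype solves this problem with Matlab CVX (Section 4.1). Purely as an illustration, the sketch below expresses an instance of problem (5) in cvxpy, a Python analogue of CVX, with the cooling split of Section 3.1 modeled through an explicit outside-air variable. All data arrays are placeholder values, not the paper's traces:

```python
import numpy as np
import cvxpy as cp

# Illustrative data for a T-slot horizon and J batch classes.
T, J = 24, 2
price = 0.04 + 0.02 * np.random.rand(T)      # p(t), $/kWh
renew = 100 * np.clip(np.sin(np.pi * (np.arange(T) - 6) / 12), 0, None)  # r(t), kW
a_int = 40 + 10 * np.random.rand(T)          # total interactive demand, kW
B  = np.array([300.0, 200.0])                # B_j, total batch demand per class
MP = np.array([60.0, 40.0])                  # MP_j, max per-slot allocation
R  = np.array([0.10, 0.08])                  # R_j, revenue per unit completed
D, ES, rho = 200.0, 50.0, 0.95               # IT capacity, storage size, loss rate
k_air, gamma, d_max = 1e-4, 0.25, 80.0       # cooling parameters from Section 3.1

b  = cp.Variable((J, T), nonneg=True)        # batch allocation b_j(t)
e  = cp.Variable(T)                          # storage discharge(+)/charge(-)
es = cp.Variable(T + 1)                      # storage level
d2 = cp.Variable(T, nonneg=True)             # outside-air cooling capacity

d = a_int + cp.sum(b, axis=0)                                # total IT power d(t)
cooling = gamma * (d - d2) + k_air * cp.power(d2, 3)         # convex cooling power
grid = cp.pos(d + cooling - renew - e)                       # (.)^+ grid draw
objective = cp.Minimize(price @ grid - R @ cp.sum(b, axis=1))

constraints = [cp.sum(b, axis=1) <= B,              # (5b)
               b <= np.outer(MP, np.ones(T)),       # (5d)
               d <= D,                              # (5e)
               es[0] == 0.5 * ES,
               es[1:] == rho * (es[:-1] - e),       # (5c)
               es >= 0, es <= ES,                   # (5f)
               d2 <= d, d2 <= d_max]                # cooling split limits

cp.Problem(objective, constraints).solve()
print("planned grid energy:", float(cp.sum(grid).value))
```

In the prototype, the same kind of formulation is written in CVX and solved once per planning period to produce the hourly plan.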

Given the significant amount of prior work approaching data center workload management via convex optimization [16, 17, 18, 19, 3, 11, 20], it is important to note the key difference between our formulation and prior work: our formulation is the first, to our knowledge, to incorporate renewable generation, storage, an optimized cooling micro grid, and batch job scheduling with consideration of both price diversity and temperature diversity. Prior formulations have included only one or two of these features. This “universal” inclusion is what allows us to consider truly integrated workload management.

3.5 Properties of the optimal workload management

The usefulness of the workload management optimization described above depends on more than just the ability to solve the optimization quickly. In particular, the solutions must be practical if they are to be adopted in actual data centers.

In this section, we provide characterizations of the optimal solutions to the workload management optimization, which highlight that the structure of the optimal solutions facilitates implementation.


Specifically, one might worry that the optimal solutions require highly complex scheduling of the batch jobs, which could be impractical. For example, if a plan schedules too many jobs at a time, it may not be practical because there is often an upper limit on how many workloads can be hosted in a physical server. The results below show that such concerns are unwarranted.

Energy usage and cost

Although it is easily seen that the workload management optimization problem has at least one optimal solution³, in general, the optimal solution is not unique. Thus, one may worry that the optimal solutions might have very different properties with respect to energy usage and cost, which would make capacity planning difficult. However, it turns out that the optimal solution, though not unique, has nice properties with respect to energy usage and cost.

In particular, we prove that all optimal solutions use the same amount of power from the grid at all times. Thus, though the scheduling of batch jobs and the usage of energy storage might be very different, the aggregate grid power usage is always the same. This is a nice feature when considering capacity planning of the power system. Formally, this is summarized by the following theorem, proved in [28].

Theorem 1. For the simplified energy cost model (5a), suppose the optimal cooling power c(d) is strictly convex in d. Then the energy usage from the grid, (d(t) + c(d(t)) − r(t) − e(t))⁺, at each time t is common across all optimal solutions.

Though Theorem 1 considers a general setting, it is not general enough to include the optimal cooling substructure discussed in Section 3.1, which consists of a strictly convex section followed by a linear section (while in practice, the chiller power is usually strictly convex in IT power and satisfies the requirement of Theorem 1). However, for this setting, a slightly weaker result still holds: the marginal cost of power during each timeslot is common across all optimal solutions. This is particularly useful because it provides the data center operator with a benchmark for evaluating which batch jobs are worth executing, i.e., which provide revenue larger than the marginal cost they would incur. Formally, we have the following theorem, proved in [28].

Theorem 2. For the simplified energy cost model (5a), suppose c(d) is given by (2). Then the marginal cost of power, ∂(p(t)(d(t) + c(d(t)) − r(t) − e(t))⁺)/∂d(t), at each time t is common across all optimal solutions.

Complexity of the schedule for batch jobs

A goal of this work is to develop an efficient, implementable solution. One practical consideration is the complexity of the schedule for batch jobs. Specifically, a schedule must not be too “fragmented”, i.e., have many batch jobs being run at the same time and batch jobs being split across a large number of time slots. This is particularly important in virtualized server environments, because we often need to allocate a large amount of memory for each virtual machine, and the number of virtual machines sharing a server is often limited by the memory available to virtual machines even if the CPUs can be well shared. Additionally, there is always overhead associated with hosting virtual machines. If we run too many virtual machines on the same server at the same time, the CPU, memory, I/O, and performance can be affected. Finally, with more virtual machines, live migrations and consolidations during runtime management can affect system performance.

³ This can be seen by applying Weierstrass' theorem [33], since the objective function is continuous and the feasible set is a compact subset of R^n.

[Figure 7: System Architecture. The PV Power Forecaster (using historical power traces, weather, and configuration data) and the IT Workload Forecaster (using historical workload traces) feed forecasted PV power and forecasted workload into the Capacity and Workload Planner, together with temperature, electricity price, IT capacity, cooling parameters, SLAs, and operational goals. The planner outputs a workload schedule and resource provisioning plan to the Runtime Workload Manager.]

However, it turns out that one need not worry about an overly “fragmented” schedule, since there always exists a “simple” optimal schedule. Formally, we have the following theorem, which is proved in [28].

Theorem 3. There exists an optimal solution to the workload management problem in which at most (T + J − 1) of the b_j(t) are neither 0 nor MP_j.

Informally, this result says that there is a simple optimal solution that uses at most (T + J − 1) more timeslots, or corresponding active virtual machines, in total than any other solution finishing the same number of jobs. Thus, on average, for each class of batch job per timeslot, we run at most (T + J − 1)/(TJ) more active virtual machines than any other plan finishing the same amount of batch jobs. If the number of batch job classes is not large, on average, we run at most one more virtual machine per slot. In our experiments in Section 5, the simplest optimal solution uses only 4% more virtual machines. Though Theorem 3 does not guarantee that every optimal solution is simple, the proof is constructive. Thus, it provides an approach that allows one to transform an optimal solution into the simplest optimal solution.

In addition to Theorem 3, there are two other properties of the optimal solution that highlight its simplicity. We state these without proof due to space constraints. First, when multiple classes of batch jobs are served in the same timeslot, all of them except possibly the one with the lowest revenue are finished. Second, in every timeslot, the lowest marginal revenue of a batch job that is served is still larger than the marginal cost of power from Theorem 2.

4. SYSTEM PROTOTYPE

We have designed and implemented a supply-aware workload and capacity management prototype in a production data center based on the description in the previous section. The data center is equipped with on-site PV power generation and outside air cooling. The prototype includes workload and capacity planning, runtime workload scheduling and resource allocation, and renewable generation and IT workload demand prediction. Figure 7 depicts the system architecture. The predictors use historical traces of supply and demand information to predict the available power of the on-site PV installation and the expected interactive workload demand. The capacity planner takes the predicted energy supply and cooling information as inputs and generates an optimal capacity allocation scheme for each workload. Finally, the runtime workload manager executes the workload plan.

The remainder of this section provides more details on each component of the system: capacity and workload planning, PV and workload prediction, and runtime workload management.


4.1 Capacity and Workload Planner

The data center has mixed energy sources: on-site photovoltaic (PV) generation tied to grid power. The cooling infrastructure has a cooling micro grid, including outside air cooling and chiller cooling. The data center hosts interactive applications and batch jobs. There are SLAs associated with the workloads. Though multiple IT resources can be used by IT workloads, we focus on CPU resource management in this implementation. We use a virtualized server environment where different workloads can share resources on the same physical servers.

The planner takes the following inputs: power supply (time-varying PV power and grid power price data), interactive workload demand (time-varying user request rates, response time targets), batch job resource demands (CPU hours, arrival time, deadline, and revenue of each batch job), IT configuration information (number of servers, server idle and peak power, capacity), cooling configuration parameters (blower capacity, chiller cooling efficiency), and operational goals. We use the optimization (5a) in Section 3.4 with the following additional details.

We first determine the IT resource demand of interactive workload i using the M/GI/1/PS model, which gives 1/(μ_i − λ_i(t)/a_i(t)) ≤ rt_i. Thus, the minimum CPU capacity needed is a_i(t) = λ_i(t)/(μ_i − 1/rt_i), which is a linear function of the arrival rate λ_i(t). We estimate μ_i through real measurements and set the response time requirement rt_i according to the SLAs. While the model is not perfect for real-world data center workloads, it provides a good approximation. Although important, performance modeling is not the focus of this paper. The resulting average CPU utilization of interactive workload i is 1 − 1/(μ_i rt_i); therefore its actual CPU usage at time t is a_i(t)(1 − 1/(μ_i rt_i)), and the remaining a_i(t)/(μ_i rt_i) capacity can be shared by batch jobs. For a batch job j, assume that at time t it shares n_ji(t) ≥ 0 CPU resource with interactive workload i and uses an additional n_j(t) ≥ 0 CPU resource by itself; then its total CPU usage at time t is b_j(t) = Σ_i n_ji(t) + n_j(t), which is used to update Constraints (5b) and (5d). We have an additional constraint on the CPU capacity that can be shared: Σ_j n_ji(t) ≤ a_i(t)/(μ_i rt_i). Assume the data center has D CPU capacity in total, so the IT capacity constraint becomes Σ_i a_i(t) + Σ_j n_j(t) ≤ D. Although our optimization (5a) in Section 3.4 can be used to handle IT workload with multi-dimensional demand, e.g., CPU and memory, here we restrict our attention to CPU-bound workloads.
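This per-slot capacity sizing reduces to a one-line formula per workload; a small Python sketch with hypothetical inputs:

```python
def interactive_capacity(lam, mu, rt):
    """Minimum CPU capacity a_i(t) so that the M/GI/1/PS mean response
    time 1 / (mu - lam / a) stays within the SLA target rt."""
    if mu <= 1.0 / rt:
        raise ValueError("SLA unachievable: need mu > 1/rt")
    return lam / (mu - 1.0 / rt)

# Example: 120 req/s arrivals, one CPU serves 10 req/s, 0.5 s response target.
a = interactive_capacity(lam=120.0, mu=10.0, rt=0.5)
sharable = a * (1.0 / (10.0 * 0.5))   # the a_i(t)/(mu_i*rt_i) share batch jobs can reuse
print(round(a, 2), round(sharable, 2))
```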

The next step is to estimate the IT power consumption, which can be done based on the average CPU utilization:

$$P_{server}(u) = P_i + (P_b - P_i)\, u,$$

where u is the average CPU utilization across all servers, and P_i and P_b are the power consumed by the server in the idle and fully utilized states, respectively. This simple model has proved very useful and accurate in modeling power consumption since other components' activities are either static or correlate well with CPU activity [1]. Assuming each server has Q CPU capacity, using the above model, we estimate the IT power as follows⁴:

$$d(t) = \sum_i \frac{a_i(t)}{Q}\big(P_i + (P_b - P_i)\, u_i\big) + \frac{\sum_j n_j(t)}{Q}\, P_b, \quad \text{where } u_i = 1 - \frac{1}{\mu_i\, rt_i} + \frac{\sum_j n_{ji}(t)}{a_i(t)}.$$

The cooling power can be derived from the IT power according to the cooling model (2) described in Section 3.1.

⁴ Since the number of servers used by an interactive workload or a class of batch jobs is usually large in data centers, we treat it as continuous.

[Figure 8: PV prediction. Actual versus predicted PV generation (kW) over one week (168 hours).]

By solving the cost optimization problem (5a), we then obtain a detailed capacity plan, including, at each time t, the capacity allocated to each class j of batch jobs b_j(t) (from n_ji(t) and n_j(t)) and to each interactive workload i, a_i(t), the energy storage usage e(t), as well as the optimal cooling configuration (i.e., the capacity for outside air cooling and chiller cooling).

It follows from Section 3.4 that this problem is a convex optimization problem, and hence there exist efficient algorithms to solve it. For example, disciplined convex programming [34] can be used. Under this approach, convex functions and sets are built up from a small set of rules from convex analysis, starting from a base library of convex functions and sets. Constraints and objectives that are expressed using these rules are automatically transformed to a canonical form and solved. In our prototype, the algorithm is implemented using Matlab CVX [34], a modeling system for convex optimization.

We then utilize the Best Fit Decreasing (BFD) method [35] to decide how to place and consolidate the workloads at each timeslot. More advanced techniques exist for optimizing the workload placement [5], but they are outside this paper's scope.
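BFD itself is a standard bin-packing heuristic; the following generic Python sketch (not the prototype's code) illustrates the placement step for per-VM CPU demands:

```python
def best_fit_decreasing(demands, capacity):
    """Place CPU demands onto the fewest servers of a given capacity.

    Demands are considered in decreasing order; each is assigned to the
    server with the least remaining capacity that still fits it, opening
    a new server when none fits. Returns per-server assignment lists.
    """
    servers = []  # each entry: [remaining_capacity, [assigned demands]]
    for demand in sorted(demands, reverse=True):
        candidates = [s for s in servers if s[0] >= demand]
        if candidates:
            best = min(candidates, key=lambda s: s[0])  # tightest fit
        else:
            best = [capacity, []]
            servers.append(best)
        best[0] -= demand
        best[1].append(demand)
    return [s[1] for s in servers]

# Example: place per-VM CPU demands onto servers with 8 CPUs each.
print(best_fit_decreasing([4.0, 3.5, 3.0, 2.5, 2.0, 1.5], capacity=8.0))
```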

4.2 PV Power Forecaster

A variety of methods have been used for fine-grained energy prediction, mostly using classical auto-regressive techniques [36, 37]. However, most of this work does not explicitly use the associated weather conditions as a basis for modeling. The work in [38] considered the impact of the weather conditions explicitly and used an SVM classifier in conjunction with an RBF kernel to predict solar irradiation. We use a similar approach for PV prediction in our prototype implementation. In order to predict PV power generation for the next day, we use a k-nearest neighbor (k-NN) based algorithm. The prediction is done at the granularity of one-hour time periods. The basic idea is to search for the most “similar” days in the recent past (using one week worked well here⁵) and use the generation during those days to estimate the generation for the target hour. The similarity between two days is determined using features such as ambient temperature, humidity, cloud cover, visibility, and sunrise/sunset times on those days. In particular, the algorithm uses weighted k-NN, where the PV prediction for hour t on the next day is given by

$$y_t = \frac{\sum_{i \in N_k(x_t, D)} y_i / d(x_i, x_t)}{\sum_{i \in N_k(x_t, D)} 1 / d(x_i, x_t)},$$

where y_t is the predicted PV output at hour t, x_t is the feature vector (e.g., temperature, cloud cover) for the target hour t obtained from the weather forecast, y_i is the actual PV output for neighbor i, x_i is the corresponding feature vector, d is the distance metric function, and N_k(x_t, D) are the k nearest neighbors of x_t in D. k is chosen based on cross-validation of historical data.
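A minimal Python sketch of this weighted k-NN predictor, with hypothetical features and history, is shown below:

```python
import math

def knn_pv_forecast(target_features, history, k=3):
    """Weighted k-NN prediction of PV output for one hour.

    history         -- list of (feature_vector, pv_output) pairs from past hours
    target_features -- forecast-derived features (temperature, cloud cover, ...)
    Weights are inverse distances, as in the formula above.
    """
    def dist(x, y):
        # Euclidean distance with a small offset to avoid division by zero.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))) + 1e-9
    neighbors = sorted(history, key=lambda h: dist(h[0], target_features))[:k]
    weights = [1.0 / dist(x, target_features) for x, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Example: features are (temperature in C, cloud cover fraction); outputs in kW.
past = [((24.0, 0.1), 110.0), ((22.0, 0.5), 70.0),
        ((18.0, 0.9), 25.0), ((25.0, 0.2), 100.0)]
print(round(knn_pv_forecast((23.0, 0.3), past, k=3), 1))
```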

Figure 8 shows the predicted and actual values for the PV supply of the data center for one week in September 2011. The average prediction errors vary from 5% to 20%. The prediction accuracy depends on the occurrence of similar weather conditions in the recent past and the accuracy of the weather forecast.

⁵ If available, data from past years could also be used.


[Figure 9: Workload analysis and prediction. (a) FFT analysis: periodogram power versus period (hours), with a pronounced peak at a period of 24 hours. (b) Workload prediction: predicted versus actual normalized CPU demand over 24 hours.]

The results in [28] show that a ballpark approximation is sufficient for planning purposes, and our system can tolerate prediction errors in this range.

4.3 IT Workload Forecaster

In order to perform the planning, we need knowledge about the IT demand: both the stochastic properties of the interactive applications and the total resource demand of the batch jobs. Though there is large variability in workload demands, workloads often exhibit clear short-term and long-term patterns. To predict the resource demand (e.g., CPU resource) for interactive applications, we first perform a periodicity analysis of the historical workload traces to reveal the length of a pattern or a sequence of patterns that appears periodically. The Fast Fourier Transform (FFT) can be used to find the periodogram of the time-series data. Figure 9(a) plots the time series and the periodogram for 25 work days of a real CPU demand trace from an SAP application. From this we derive the periods of the most prominent patterns or sequences of patterns. For this example, the peak at 24 hours in the periodogram indicates that it has a strong daily pattern (period of 24 hours). In fact, most interactive workloads exhibit prominent daily patterns. An auto-regressive model is then used to capture both the long-term and short-term patterns. The model estimates w(d, t), the demand at time t on day d, based on the demand of the previous N days as

$$w(d, t) = \sum_{i=1}^{N} a_i \cdot w(d - i, t) + c.$$

The parameters are calibrated using historical data.
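One way to calibrate such a model is ordinary least squares over the historical trace; the sketch below assumes numpy and a synthetic trace, and is illustrative rather than the prototype's implementation:

```python
import numpy as np

def fit_ar_daily(demand, N=3):
    """Fit w(d,t) = sum_i a_i * w(d-i,t) + c by least squares.

    demand -- 2-D array, demand[d, t] = load at hour t on day d.
    Returns (a, c) where a holds the N day-lag coefficients.
    """
    days, hours = demand.shape
    X, y = [], []
    for d in range(N, days):
        for t in range(hours):
            X.append([demand[d - i, t] for i in range(1, N + 1)] + [1.0])
            y.append(demand[d, t])
    coef, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
    return coef[:-1], coef[-1]

def predict_day(demand, a, c):
    """Predict the next day's hourly demand from the last len(a) days."""
    recent = demand[-len(a):][::-1]        # most recent day first
    return np.clip(a @ recent + c, 0, None)

# Illustrative synthetic trace: 14 days with a diurnal pattern plus noise.
rng = np.random.default_rng(0)
base = 0.5 + 0.4 * np.sin(np.linspace(0, 2 * np.pi, 24))
trace = np.clip(base + 0.05 * rng.standard_normal((14, 24)), 0, 1)
a, c = fit_ar_daily(trace, N=3)
print(predict_day(trace, a, c).round(2))
```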

We evaluate the workload prediction algorithm with several real demand traces. The results for a Web application trace are shown in Figure 9(b). The average prediction errors are around 20%. If we can use the previous M time points of the same day for the prediction, we could further reduce the error rate.

The total resource demand (e.g., CPU hours) of batch jobs can be obtained from users, from historical data, or through offline benchmarking [39]. As with supply prediction, a ballpark approximation is good enough, as shown in [28].

4.4 Runtime Workload Manager

The runtime workload manager schedules workloads and allocates CPU resources according to the plan generated by the planner. We implement a prototype in a KVM-based virtualized server environment [40]. Our current implementation uses a KVM/QEMU hypervisor along with control groups (Cgroups), a new Linux feature, to perform resource allocation and workload management [40]. In particular, it executes the following tasks according to the plan: (1) create and start virtual machines hosting batch jobs; (2) adjust the resource allocation (e.g., CPU shares or number of virtual CPUs) of each virtual machine; (3) migrate and consolidate virtual machines via live migration. The workload manager assigns a higher priority to virtual machines running interactive workloads than to virtual machines running batch jobs via Cgroups. This guarantees that resources are available as needed by interactive applications, while excess resources can be used by the batch jobs, improving server utilization.
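As one illustration of task (2), relative CPU weights can be set by writing to the cgroup v1 filesystem. The group names and the mount path below are hypothetical, and a production manager would typically go through the hypervisor's management interface (e.g., libvirt) rather than writing these files directly:

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup/cpu"   # assumed cgroup v1 CPU controller mount

def set_cpu_shares(group, shares):
    """Assign the named cgroup a relative CPU weight (cgroup v1 default: 1024)."""
    path = os.path.join(CGROUP_ROOT, group, "cpu.shares")
    with open(path, "w") as f:
        f.write(str(shares))

# Hypothetical groups: favor interactive VMs over batch VMs so batch jobs
# only consume CPU cycles the interactive workloads leave idle.
if __name__ == "__main__":
    set_cpu_shares("interactive_vms", 4096)
    set_cpu_shares("batch_vms", 512)
```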

5. EVALUATION

To highlight the benefits of our design for renewable and cooling aware workload management, we perform a mixture of numerical simulations and experiments on a real testbed. We first present trace-based simulation results in Section 5.1, and then the experimental results from the real testbed implementation in Section 5.2.

5.1 Case Studies

We begin by discussing evaluations of our workload and capacity planning using numerical simulations. We use traces from real data centers. In particular, we obtain PV supply, interactive IT workload, and cooling data from real data center traces. The renewable energy and cooling data are from measurements of a data center in California. The data center is equipped with a 130kW PV panel array and a cooling system consisting of outside air cooling and chiller cooling. We use the real-time electricity price of the data center location, obtained from [41]. The total IT capacity is 500 servers (100kW). The interactive workload is a popular web service application with more than 85 million registered users in 22 countries. The trace contains average CPU utilization and memory usage as recorded every 5 minutes. Additionally, we assume that there are a number of batch jobs. Half of them are submitted at midnight and the other half are submitted around noon. The total demand ratio between the interactive workload and batch jobs is 1:1.5. The interactive workload is deemed critical and its resource demand must be met, while the batch jobs can be rescheduled as long as they finish before their deadlines. The plan period is 24 hours and the capacity planner creates a plan for the next 24 hours at midnight based on renewable supply and cooling information as well as the interactive demand. The plan includes hourly capacity allocations for each workload. We assume perfect knowledge of the workload demand and renewable supply; the results in [28] validate that our solution works well with prediction errors and different mixes of interactive and batch workloads.

Here we explore: (i) How valuable is renewable and cooling aware workload management? (ii) Is net zero possible under renewable and cooling aware workload management? (iii) What portfolio of renewable sources is best?

How valuable is renewable and cooling aware workload management?

We start with the key question of this paper: how much energy cost and CO2 savings does renewable and cooling aware workload management provide? In this study, we assume half of the batch jobs must be finished before noon and the other half before midnight. We compare the following four approaches: (i) Optimal, which integrates supply and cooling information and uses our optimization algorithm to schedule batch jobs; (ii) Night, which schedules batch jobs at night to avoid interfering with critical workloads and to take advantage of idle machines (a widely used solution in practice); (iii) Best Effort (BE), which runs batch jobs immediately upon arrival and uses all available IT capacity to finish them as quickly as possible; (iv) Flat, which runs batch jobs at a constant rate within their deadline period.
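
For concreteness, the three baselines can be sketched as simple hourly schedules; the function names, the hourly-slot formulation, and the spare-capacity model are illustrative assumptions, and Optimal (which requires the full optimization) is not shown.

    import numpy as np

    def night_schedule(total_work, night=(0, 6), hours=24):
        # Spread all batch work evenly over the night window.
        s = np.zeros(hours)
        s[night[0]:night[1]] = total_work / (night[1] - night[0])
        return s

    def best_effort_schedule(total_work, arrival, spare_capacity):
        # Use all spare IT capacity from the arrival hour until the work is done.
        s, remaining = np.zeros(len(spare_capacity)), total_work
        for h in range(arrival, len(spare_capacity)):
            s[h] = min(spare_capacity[h], remaining)
            remaining -= s[h]
        return s

    def flat_schedule(total_work, arrival, deadline, hours=24):
        # Run at a constant rate between arrival and deadline.
        s = np.zeros(hours)
        s[arrival:deadline] = total_work / (deadline - arrival)
        return s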

Figure 10 shows the detailed schedule and power consumption for each approach, including IT power (batch and interactive workloads), cooling power, and supply. As shown in the figure, compared with the other approaches, Optimal reshapes the batch job demand to fully utilize the renewable supply, and uses non-renewable energy, if necessary, to complete the batch jobs within the 24-hour period. These additional batch jobs are scheduled to run between 3am and 6am or between 11pm and midnight, when the outside air cooling is most efficient and/or the electricity price is lower.


[Figure 10: Power cost minimization while finishing all jobs. Panels (a) Optimal, (b) Night, (c) Best Effort (BE), and (d) Flat plot power (kW) over the 24-hour horizon for the interactive workload, batch jobs, and cooling power, against the IT capacity and renewable supply; panel (e) compares grid and renewable energy usage (kWh); panel (f) compares normalized energy cost and CO2 emission.]

As a result, our solution reduces grid power consumption by 39%-63% compared to the other approaches (Figure 10(e)). Though not visible in the figure, the optimal solution does consume slightly more total power (up to 2%) because of the low cooling efficiency around noon. Figure 10(f) shows the average recurring electricity cost and CO2 emission per job: under the Optimal schedule, the energy cost per job is reduced by 53%-83% and the CO2 emission per job by 39%-63%. The adaptation of workload management to renewable availability is clear in Figure 10. Less clear is the importance of managing the workload in a manner that is "cooling aware".

The importance of cooling aware scheduling: As discussed in Sections 2.2 and 3.1, the cooling efficiency and capacity of a cooling supply often vary over time; this is particularly true for outside air cooling. One important component of our solution is to schedule workloads taking the time-varying cooling efficiency and capacity into account. To understand the benefits of cooling integration, we compare our optimal solution, shown in Figure 10(a), with two solutions that are renewable aware but handle cooling differently: (i) Cooling-oblivious ignores cooling power and considers IT power only; (ii) Static-cooling uses a static cooling efficiency (assuming cooling power is 30% of IT power) to estimate the cooling power from the IT power and incorporates this estimate into workload scheduling.
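
The cooling-power models being compared (Optimal's time-varying model and the two alternatives) can be summarized as follows; the hourly lookup table standing in for our measured, time-varying cooling efficiency is an assumption for illustration, not the model of Sections 2.2 and 3.1.

    def cooling_power(it_power, hour, mode, measured_overhead):
        # measured_overhead[hour]: assumed lookup of cooling power per unit of
        # IT power; in the paper this depends on IT demand and outside air
        # temperature.
        if mode == "cooling-oblivious":
            return 0.0                    # ignore cooling power entirely
        if mode == "static-cooling":
            return 0.30 * it_power        # fixed 30% of IT power
        return measured_overhead[hour] * it_power   # time-varying efficiency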

Figures 11(a) and 11(b) show the Cooling-oblivious and Static-cooling schedules, respectively. As shown in the figures, both schedules integrate renewable energy into scheduling and run most batch jobs when renewables are available. However, because they do not capture the cooling power accurately, they cannot fully optimize the workload schedule. In particular, by ignoring the cooling power, Cooling-oblivious underestimates the power demand, runs more jobs than the available PV supply can support, and hence uses more grid power during the day; this is also less cost-efficient because the electricity price peaks at that time. Conversely, by overestimating the cooling power demand, Static-cooling fails to fully utilize the renewable supply, which also results in inefficiency.

[Figure 11: Benefit of cooling integration. Panels (a) Cooling-oblivious and (b) Static-cooling plot the resulting schedules (power in kW over 24 hours) against the IT capacity and renewable supply; panel (c) compares grid and renewable energy usage (kWh); panel (d) compares normalized energy cost and CO2 emission against Optimal.]

Figures 11(c) and 11(d) compare the total power usage and the energy efficiency, i.e., the normalized energy cost and carbon emission per batch job, of these two approaches and Optimal. By accurately modeling the cooling efficiency and adapting to its time variations, our solution reduces the energy cost by 20%-38% and the CO2 emission by 4%-28%.

The importance of optimizing the cooling micro-grid: Optimizing the workload schedule based on cooling efficiency is only one aspect of our cooling integration. Another important aspect is to optimize the cooling micro-grid, i.e., to draw the proper amount of cooling capacity from each cooling resource. This is important because different cooling supplies exhibit different cooling efficiencies as the IT demand and external conditions such as the OAT change. Our solution takes this into account and optimizes cooling capacity management in addition to IT workload management.

We compare: (i) Optimal, which adjusts the cooling capacity for outside air cooling and chiller cooling based on the dynamic cooling efficiency, determined by the IT demand and outside air temperature; (ii) Binary Outside Air (BOA), which uses outside air cooling at its full capacity unless the OAT exceeds a threshold (25°C) or the interactive demand is too low (less than 10% of the IT capacity), in which case it does not use outside air at all; (iii) Chiller only, which uses chiller cooling only. All three solutions are renewable and cooling aware and schedule workload according to the renewable supply and cooling efficiency, and they finish the same number of batch jobs. The difference is how they manage cooling resources and capacity.
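
A minimal sketch of the BOA rule as described above is given below; the function signature, names, and units are illustrative.

    def boa_outside_air(oat, interactive_demand, it_capacity,
                        oa_full_capacity, oat_threshold=25.0):
        # Outside-air cooling is all-or-nothing: fall back to the chiller when
        # the outside air is too warm or the interactive load is too small.
        if oat > oat_threshold or interactive_demand < 0.10 * it_capacity:
            return 0.0
        return oa_full_capacity   # otherwise run the air-side economizer fully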

Figure 12 shows the cooling capacity from outside air cooling and chiller cooling for these three solutions. As shown in the figure, Optimal uses outside air alone only at night, when it is most efficient, and combines outside air cooling and chiller cooling at other times. In particular, our solution uses less outside air cooling and more chiller cooling between 1pm and 4pm, when outside air cooling is less efficient due to the high IT demand and outside air temperature. In contrast, BOA runs outside air at full capacity except for the hours around noon, when the outside air is less efficient. Figures 12(g) and 12(h) compare the cooling power, energy cost, and CO2 emission of the three approaches. By optimizing the cooling substructure, our solution reduces the cooling power by 66% compared to BOA and 48% compared to Chiller only.

Is net zero energy consumption possible with renewable and cooling aware workload management?

Now we switch our goal from minimizing the cost incurred by the data center to minimizing its environmental impact. Net zero is often used to describe a building with zero net energy consumption and zero carbon emission annually.


[Figure 12: Benefit of cooling optimization. Panels (a), (c), and (e) show the Optimal, BOA, and Chiller only plans (power in kW over 24 hours); panels (b), (d), and (f) show the corresponding cooling capacity split between outside air and chiller (kW); panel (g) compares total cooling energy (kWh); panel (h) compares normalized energy cost and CO2 emission.]

Recently, researchers have envisioned how net zero building concepts can be extended to the data center space to create a net zero data center, whose total power consumption is less than or equal to the total power supplied from renewables. We explore whether net zero is possible with renewable and cooling aware workload management in data centers, and how much it would cost.

By adding a net-zero constraint (i.e., total power consumption ≤ total renewable supply) to our optimization problem, our capacity planner can generate a net-zero schedule. Figure 13(a) shows our solution (Net-zero1) for achieving the net zero operation goal. Similar to the optimal solution shown in Figure 10(a), Net-zero1 optimally schedules batch jobs to take advantage of the renewable supply; however, batch jobs are executed only when renewable energy is available and without exceeding the total renewable generation, so some are allowed to remain unfinished during this 24-hour period. In this case, about 40% of the batch jobs are delayed to a future time when a renewable energy surplus may exist. Additionally, some renewable energy is reserved to offset the non-renewable energy used at night for interactive workloads. This excess renewable power is reserved in the afternoon, when the outside air temperature peaks, thus minimizing the energy required for cooling.
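
To illustrate how the net-zero constraint enters the optimization, the sketch below uses cvxpy, a Python analogue of the CVX toolbox [34]; the constant cooling overhead, trace file names, and capacity numbers are placeholders, whereas the actual planner uses the time-varying cooling model and cost objective described earlier.

    import cvxpy as cp
    import numpy as np

    renewable = np.loadtxt("pv_trace.csv")             # hypothetical hourly PV supply (kW)
    interactive = np.loadtxt("interactive_power.csv")  # hypothetical interactive IT power (kW)
    T = len(renewable)

    overhead = 1.3        # illustrative constant cooling overhead factor
    capacity = 100.0      # IT power capacity (kW)
    batch_total = 600.0   # total batch demand over the horizon (kWh)

    b = cp.Variable(T, nonneg=True)              # batch IT power in each hour
    total_power = overhead * (interactive + b)   # facility power, affine in b

    constraints = [
        interactive + b <= capacity,               # IT capacity limit
        cp.sum(b) <= batch_total,                  # cannot exceed submitted work
        cp.sum(total_power) <= cp.sum(renewable),  # net zero over the horizon
    ]
    # Finish as much batch work as possible while staying net zero; any
    # unfinished work is deferred, like the ~40% of jobs delayed by Net-zero1.
    cp.Problem(cp.Maximize(cp.sum(b)), constraints).solve()
    print("batch work completed (kWh):", float(b.value.sum()))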

A key to achieving net zero is energy storage. By maximizing direct use of renewables, Net-zero1 reduces the dependency on storage and hence the capital cost. To understand this benefit, we compare Net-zero1 with another schedule, Net-zero2, which runs the same amount of batch work but distributes it evenly over the 24 hours, as shown in Figure 13(b).

[Figure 13: Net zero energy. Panels (a) Net-zero1 and (b) Net-zero2 plot power (kW) over 24 hours for the interactive workload, batch jobs, and cooling power, against the IT capacity and renewable supply.]

[Figure 14: Optimal renewable portfolio. Grid power usage (kWh) as a function of the solar ratio for storage sizes of 0, 100 kWh, and 200 kWh: (a) with turning off unused servers, (b) without turning off unused servers.]

Both approaches achieve the net-zero goal, but Net-zero2 uses 287% more grid power than our solution Net-zero1. As a result, the energy storage sizes required by Net-zero1 and Net-zero2 are 82 kWh and 330 kWh, respectively. Using an estimated cost of $400/kWh [22], this difference in storage capacity translates into $99,200 of additional energy storage expenditure for Net-zero2.

What portfolio of renewable sources is best?

To this point, we have focused on PV solar as the sole source of renewable energy. Since wind energy is becoming increasingly cost-effective, a sustainable data center will likely use both solar and wind to some degree. The question that emerges is which is the better source from the perspective of optimizing data center energy efficiency and, more generally, what the optimal portfolio of renewable sources is.

We conduct a study using the wind and solar traces depicted in Figure 2. Assuming an average renewable supply of 200 kW, we vary the mix of solar and wind in the renewable supply. For each mix, we use our capacity management optimization algorithm to generate an optimal workload schedule, and we compare the resulting non-renewable power consumption for two cases: with and without turning off unused servers. Figure 14 shows the results as a function of the percentage of solar for different storage capacities. As shown in the figure, the optimal portfolio contains more solar than wind because solar is less volatile and its supply aligns better with IT demand.
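
The portfolio sweep can be sketched as follows; holding the demand trace fixed and counting unmet demand as grid energy are simplifying assumptions, since the actual study re-runs the capacity planner for each mix and storage size.

    import numpy as np

    def grid_energy_for_mix(solar, wind, demand, solar_ratio, avg_supply=200.0):
        # Blend normalized solar and wind traces so the mix averages avg_supply kW,
        # then count the energy that must still be drawn from the grid.
        supply = avg_supply * (solar_ratio * solar / solar.mean()
                               + (1.0 - solar_ratio) * wind / wind.mean())
        return float(np.maximum(demand - supply, 0.0).sum())

    # Example sweep over the solar fraction (hourly traces assumed):
    # for r in np.linspace(0.0, 1.0, 11):
    #     print(r, grid_energy_for_mix(solar, wind, demand, r))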

However, wind energy is still an important component, and a small percentage of wind helps improve efficiency. For example, the optimal portfolio without storage consists of about 60% solar and 40% wind. When unused servers are not turned off, wind becomes more valuable because of the significant power consumed at night even when server utilization is low.

In summary, solar is a better source for local data centers in sunny areas such as Palo Alto, and a small addition of wind can help improve energy efficiency. The optimal portfolio varies across locations. Additionally, recent work has shown that the value of wind increases significantly when geographically diverse data centers are considered [17, 20].

5.2 Experimental Results on a Real Testbed

The case studies described in the previous section highlight the theoretical benefits of our approach over existing solutions.


To verify our claims and to ensure that we have a practical and robust solution, we experimentally evaluate our prototype implementation on a real data center testbed and contrast it with a current workload management approach.

5.2.1 Experiment Setup

Our testbed consists of four high-end servers (each with two 12-core 1.8 GHz processors and 64 GB of memory) and the following workloads: one interactive Web application and six batch applications. Each server runs Scientific Linux, and each workload runs inside a KVM virtual machine. The interactive application is a single-tier web application running multiple Apache web servers; the batch jobs are sysbench [42] instances with different resource demands. httperf [43] is used to replay the workload demand traces in the form of CGI requests, and each request is directed to one of the web servers. The PV, cooling, and interactive workload traces used in the case study are scaled to the testbed capacity. We measure power consumption via the servers' built-in management interfaces, collect CPU consumption through system monitoring, and obtain web server response times from Apache logs.

5.2.2 Experiment Results

We compare two approaches: (i) Optimal, our optimal design, and (ii) Night, which runs batch jobs at night. For each plan, the runtime workload manager dynamically starts the batch jobs, allocates resources to both interactive and batch workloads, and turns servers on or off according to the plan. Figures 15(a) and 15(b) show the CPU allocation of our optimal solution and the actual CPU consumption measured in the experiment, respectively. The results show that the actual resource usage closely follows the plan generated by the capacity planner; the results for Night are similar. We further compare the predicted power usage in the plan with the actual power consumption measured in the experiment for both approaches. From Figures 15(c) and 15(d), we see that the actual power usage closely matches the plan.

We then compare the power consumption and performance of the two approaches. Figure 16(a) shows the power consumption. Optimal does more work during the day, when the renewable energy source is available: Night uses additional servers from midnight to 6am to run batch jobs, while our solution starts batch jobs around noon to take advantage of renewable energy. Compared with Night, our approach reduces grid power usage by 48%. One thing worth mentioning is that the total power is not quite proportional to the total CPU utilization, because of the large idle component of server power. This is most noticeable when the number of servers is small, as in the experimental results of Figures 15(b) and 15(c). As the total number of servers increases, the impact of idle power decreases and Optimal will save even more grid power.
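
For intuition, the commonly used linear server power model (e.g., [1]) makes the idle-power effect explicit; the wattage numbers below are illustrative, not measurements from our testbed.

    def server_power(utilization, p_idle=200.0, p_peak=400.0):
        # Power grows linearly from p_idle watts (0% CPU) to p_peak (100% CPU).
        # When p_idle is a large fraction of p_peak, total power tracks the
        # number of powered-on servers more closely than their utilization.
        return p_idle + (p_peak - p_idle) * utilization

    # Example: 20% vs. 80% utilization differs by only 120 W per server,
    # while turning an idle server off saves the full 200 W.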

One reason that batch jobs are traditionally scheduled at night is to avoid interfering with interactive workloads; our approach runs more jobs during the day, when the web server demand is high. To understand the impact of this on performance, we compare the average response time of the web servers. The results for the two approaches are almost identical: 153.0 ms for Optimal, compared to 156.0 ms for Night (see [28] for more details). This is because both solutions satisfy the web server demand, and the Linux KVM/Cgroups scheduler is preemptive, enabling CPU resources to be effectively virtualized and shared among virtual machines [40]. In particular, assigning a much higher priority to the virtual machines hosting the web servers guarantees that resources are available as needed by the web servers.

In summary, the experimental results demonstrate that (i) the optimization-based workload management scheme translates effectively into a prototype implementation, and (ii) compared with a traditional workload management solution, Optimal significantly reduces the use of grid power without degrading the performance of the critical workloads.

[Figure 15: Comparison of plan and experimental results. Panels (a) CPU allocation (Optimal) and (b) measured CPU consumption (Optimal) show interactive and batch utilization over 24 hours; panels (c) and (d) compare planned and measured power (W) for Optimal and Night, respectively.]

[Figure 16: Comparison of Optimal and Night. Panel (a) plots the power consumption (W) of Optimal and Night over 24 hours against the renewable supply; panel (b) compares total grid and renewable energy consumption (kWh).]

6. CONCLUDING REMARKS

Our goal in this paper is to provide an integrated workload management system for data centers that takes advantage of the efficiency gains possible by shifting demand in a way that exploits time variations in electricity price, the availability of renewable energy, and the efficiency of cooling. There are two key points we would like to highlight about our design.

First, a key feature of the design is the integration of the three main data center silos: cooling, power, and IT. Though a considerable amount of work exists on optimizing the efficiency of each individually, there is little work that provides an integrated solution for all three. Our case studies illustrate that the potential gains from an integrated approach are significant, and our prototype illustrates that these gains are attainable. In both cases, we have taken care to use measurements from a real data center and traces of real applications to ensure that our experiments are meaningful.

Second, it is important to point out that our approach uses a mix of implementation, modeling, and theory. At the core of our design is a cost optimization that is solved by the workload manager. Care has been taken in designing and solving this optimization so that the solution is "practical" (see the characterization theorems in Section 3.5). Building an implementation around this optimization requires significant measurement and modeling of the cooling substructure, and the incorporation of predictors for workload demand and PV supply. We see one role of this paper as a proof of concept for the wide variety of "optimization-based designs" recently proposed, e.g., [16, 17, 18, 19, 3, 11, 20].

There are a number of future directions building on this work, including integrating reliability and a more detailed study of the role of storage.


But perhaps the most exciting direction is the potential to apply renewable and cooling aware workload management across geographically diverse data centers, as opposed to the local workload management considered here. As illustrated in [17, 20], geographical diversity can be a significant aid in handling the intermittency of renewable sources and electricity price fluctuations. For example, the fact that fluctuations in wind energy are nearly uncorrelated across significantly distant locations means that wind can provide a nearly constant baseline supply of energy if workload can be adapted geographically. Another future direction we would like to highlight is understanding the impact of workload management on the capacity planning of a data center, e.g., the size of the renewable infrastructure, the capacity of the IT and cooling infrastructure, and the capital expense, in order to minimize the Total Cost of Ownership (TCO).

Acknowledgements

We are grateful to many members of the Sustainable Ecosystem Research Group at HP Labs. Chandrakant Patel has provided great support and guidance at various stages of this work. Niru Kumari provided valuable information on chiller cooling models. Martin Arlitt, Amip Shah, Sergey Blagodurov, and Alan McReynolds offered helpful feedback. We also thank the anonymous reviewers and our shepherd, Christopher Stewart, for their valuable comments and help.

7. REFERENCES

[1] X. Fan, W.-D. Weber, and L. A. Barroso, "Power provisioning for a warehouse-sized computer," in Proc. of ACM ISCA, 2007.
[2] A. Gandhi, M. Harchol-Balter, R. Das, and C. Lefurgy, "Optimal power allocation in server farms," in Proc. of ACM Sigmetrics, 2009.
[3] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska, "Dynamic right-sizing for power-proportional data centers," in Proc. of INFOCOM, 2011.
[4] A. Wierman, L. L. H. Andrew, and A. Tang, "Power-aware speed scaling in processor sharing systems," in Proc. of INFOCOM, 2009.
[5] J. Choi, S. Govindan, B. Urgaonkar, and A. Sivasubramaniam, "Power consumption prediction and power-aware packing in consolidated environments," IEEE Transactions on Computers, vol. 59, no. 12, 2010.
[6] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu, "No "power" struggles: Coordinated multi-level power management for the data center," in Proc. of ASPLOS, 2008.
[7] Z. Wang, A. McReynolds, C. Felix, C. Bash, and C. Hoover, "Kratos: Automated management of cooling capacity in data centers with adaptive vent tiles," in Proc. of IMECE, 2009.
[8] C. E. Bash, C. D. Patel, and R. K. Sharma, "Dynamic thermal management of air-cooled data centers," in Proc. of ITHERM, 2006.
[9] W. Huang, M. Allen-Ware, J. Carter, E. Elnozahy, H. Hamann, T. Keller, C. Lefurgy, J. Li, K. Rajamani, and J. Rubio, "TAPO: Thermal-aware power optimization techniques for servers and data centers," in Proc. of IGCC, 2011.
[10] J. Moore, J. Chase, P. Ranganathan, and R. Sharma, "Making scheduling cool: Temperature-aware workload placement in data centers," in Proc. of USENIX ATC, 2005.
[11] E. Pakbaznia and M. Pedram, "Minimizing data center cooling and server power costs," in Proc. of ISLPED, 2009.
[12] Y. Chen, D. Gmach, C. Hyser, Z. Wang, C. Bash, C. Hoover, and S. Singhal, "Integrated management of application performance, power and cooling in data centers," in Proc. of NOMS, 2010.
[13] T. Breen, E. Walsh, J. Punch, C. Bash, and A. Shah, "From chip to cooling tower data center modeling: Influence of server inlet temperature and temperature rise across cabinet," Journal of Electronic Packaging, vol. 133, no. 1, 2011.
[14] D. Gmach, J. Rolia, C. Bash, Y. Chen, T. Christian, A. Shah, R. Sharma, and Z. Wang, "Capacity planning and power management to exploit sustainable energy," in Proc. of CNSM, 2010.
[15] "Clean urban energy: Turn buildings into batteries," 2011.
[16] K. Le, O. Bilgir, R. Bianchini, M. Martonosi, and T. D. Nguyen, "Capping the brown energy consumption of internet services at low cost," in Proc. of IGCC, 2010.
[17] Z. Liu, M. Lin, A. Wierman, S. H. Low, and L. L. H. Andrew, "Greening geographical load balancing," in Proc. of ACM Sigmetrics, 2011.
[18] L. Rao, X. Liu, L. Xie, and W. Liu, "Minimizing electricity cost: Optimization of distributed internet data centers in a multi-electricity-market environment," in Proc. of INFOCOM, 2010.
[19] P. Wendell, J. W. Jiang, M. J. Freedman, and J. Rexford, "Donar: Decentralized server selection for cloud services," in Proc. of ACM Sigcomm, 2010.
[20] Z. Liu, M. Lin, A. Wierman, S. H. Low, and L. L. H. Andrew, "Geographical load balancing with renewables," in Proc. of ACM GreenMetrics, 2011.
[21] M. Lin, Z. Liu, A. Wierman, and L. L. H. Andrew, "Online algorithms for geographical load balancing." http://www.cs.caltech.edu/~zliu2/onlineGLB.pdf.
[22] Electricity Energy Storage Technology Options: A White Paper Primer on Applications, Costs and Benefits, 2010.
[23] "Report to Congress on server and data center energy efficiency," 2007.
[24] R. Zhou, Z. Wang, A. McReynolds, C. Bash, T. Christian, and R. Shih, "Optimization and control of cooling microgrids for data centers," in Proc. of ITherm, 2012.
[25] C. Patel, R. Sharma, C. Bash, and A. Beitelmal, "Energy flow in the information technology stack," in Proc. of IMECE, 2006.
[26] F. Bleier, Fan Handbook: Selection, Application and Design. New York: McGraw-Hill, 1997.
[27] J. Holman, Heat Transfer, 8th ed. New York: McGraw-Hill, 1997.
[28] Z. Liu, Y. Chen, C. Bash, A. Wierman, D. Gmach, Z. Wang, M. Marwah, and C. Hyser, "Renewable and cooling aware workload management for sustainable data centers," Tech. Rep. HPL-2012-73, HP Labs, April 2012.
[29] R. Urgaonkar, B. Urgaonkar, M. Neely, and A. Sivasubramaniam, "Optimal power cost management using stored energy in data centers," in Proc. of ACM Sigmetrics, 2011.
[30] S. Ong, P. Denholm, and E. Doris, "The impacts of commercial electric utility rate structure elements on the economics of photovoltaic systems," Tech. Rep. NREL/TP-6A2-46782, National Renewable Energy Laboratory, 2010.
[31] L. L. H. Andrew, M. Lin, and A. Wierman, "Optimality, fairness and robustness in speed scaling designs," in Proc. of ACM Sigmetrics, 2010.
[32] E. Thereska, A. Donnelly, and D. Narayanan, "Sierra: a power-proportional, distributed storage system," Tech. Rep. MSR-TR-2009-153, Microsoft Research, 2009.
[33] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
[34] http://cvxr.com/cvx/.
[35] V. Vazirani, Approximation Algorithms. Springer, 2003.
[36] D. R. Cox, "Prediction by exponentially weighted moving averages and related methods," 1961.
[37] A. Kansal, J. Hsu, S. Zahedi, and M. B. Srivastava, "Power management in energy harvesting sensor networks," 2007.
[38] N. Sharma, P. Sharma, D. Irwin, and P. Shenoy, "Predicting solar generation from weather forecasts using machine learning," in Proc. of SmartGridComm, 2011.
[39] Y. Becerra, D. Carrera, and E. Ayguade, "Batch job profiling and adaptive profile enforcement for virtualized environments," in Proc. of ICPDNP, 2009.
[40] "KVM: Kernel-based Virtual Machine." http://www.redhat.com/f/pdf/rhev/DOC-KVM.pdf.
[41] http://www.ferc.gov/market-oversight/mkt-electric.
[42] "SysBench: a system performance benchmark." http://sysbench.sourceforge.net/.
[43] http://www.hpl.hp.com/research/linux/httperf/.
