15
Cluster Comput (2008) 11: 213–227 DOI 10.1007/s10586-008-0060-0 Autonomic resource management in virtualized data centers using fuzzy logic-based approaches Jing Xu · Ming Zhao · José Fortes · Robert Carpenter · Mazin Yousif Received: 31 May 2008 / Accepted: 16 June 2008 / Published online: 16 July 2008 © Springer Science+Business Media, LLC 2008 Abstract Data centers, as resource providers, are expected to deliver on performance guarantees while optimizing re- source utilization to reduce cost. Virtualization techniques provide the opportunity of consolidating multiple separately managed containers of virtual resources on underutilized physical servers. A key challenge that comes with virtualiza- tion is the simultaneous on-demand provisioning of shared physical resources to virtual containers and the manage- ment of their capacities to meet service-quality targets at the least cost. This paper proposes a two-level resource manage- ment system to dynamically allocate resources to individ- ual virtual containers. It uses local controllers at the virtual- container level and a global controller at the resource-pool level. An important advantage of this two-level control ar- chitecture is that it allows independent controller designs for separately optimizing the performance of applications and the use of resources. Autonomic resource allocation is re- alized through the interaction of the local and global con- trollers. A novelty of the local controller designs is their use of fuzzy logic-based approaches to efficiently and robustly J. Xu ( ) · M. Zhao · J. Fortes University of Florida, Gainesville, FL, USA e-mail: [email protected]fl.edu M. Zhao e-mail: [email protected]fl.edu J. Fortes e-mail: [email protected]fl.edu R. Carpenter · M. Yousif Intel Corporation, Hillsboro, OR, USA R. Carpenter e-mail: [email protected] M. Yousif e-mail: [email protected] deal with the complexity and uncertainties of dynamically changing workloads and resource usage. The global con- troller determines the resource allocation based on a pro- posed profit model, with the goal of maximizing the to- tal profit of the data center. Experimental results obtained through a prototype implementation demonstrate that, for the scenarios under consideration, the proposed resource management system can significantly reduce resource con- sumption while still achieving application performance tar- gets. Keywords Autonomic computing · Fuzzy modeling · Resource management · Virtualization · Two-level control · Data centers 1 Introduction Today’s data centers host a variety of business-critical ap- plications such as web hosting, e-commerce sites and en- terprise systems on shared hardware platforms. The need to manage multiple applications in a shared infrastructure cre- ates the challenge (and also the opportunity) of on-demand resource provisioning and allocation in response to time- varying workloads. Ideally, data centers should provide a “pay-per-use” approach, i.e., hosted applications or applica- tion providers would deliver services to their customers us- ing hardware resources provided by data centers that charge based on the resources consumed (instead of charging for resources needed to meet peak demands). To realize this in a cost-effective manner, the data center must provide flexi- ble and manageable execution environments that are special- ized for each application without compromising its ability to share resources among applications and delivering to them the necessary performance, security and isolation.

Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

Cluster Comput (2008) 11: 213–227DOI 10.1007/s10586-008-0060-0

Autonomic resource management in virtualized data centers usingfuzzy logic-based approaches

Jing Xu · Ming Zhao · José Fortes · Robert Carpenter ·Mazin Yousif

Received: 31 May 2008 / Accepted: 16 June 2008 / Published online: 16 July 2008© Springer Science+Business Media, LLC 2008

Abstract Data centers, as resource providers, are expectedto deliver on performance guarantees while optimizing re-source utilization to reduce cost. Virtualization techniquesprovide the opportunity of consolidating multiple separatelymanaged containers of virtual resources on underutilizedphysical servers. A key challenge that comes with virtualiza-tion is the simultaneous on-demand provisioning of sharedphysical resources to virtual containers and the manage-ment of their capacities to meet service-quality targets at theleast cost. This paper proposes a two-level resource manage-ment system to dynamically allocate resources to individ-ual virtual containers. It uses local controllers at the virtual-container level and a global controller at the resource-poollevel. An important advantage of this two-level control ar-chitecture is that it allows independent controller designs forseparately optimizing the performance of applications andthe use of resources. Autonomic resource allocation is re-alized through the interaction of the local and global con-trollers. A novelty of the local controller designs is their useof fuzzy logic-based approaches to efficiently and robustly

J. Xu (�) · M. Zhao · J. FortesUniversity of Florida, Gainesville, FL, USAe-mail: [email protected]

M. Zhaoe-mail: [email protected]

J. Fortese-mail: [email protected]

R. Carpenter · M. YousifIntel Corporation, Hillsboro, OR, USA

R. Carpentere-mail: [email protected]

M. Yousife-mail: [email protected]

deal with the complexity and uncertainties of dynamicallychanging workloads and resource usage. The global con-troller determines the resource allocation based on a pro-posed profit model, with the goal of maximizing the to-tal profit of the data center. Experimental results obtainedthrough a prototype implementation demonstrate that, forthe scenarios under consideration, the proposed resourcemanagement system can significantly reduce resource con-sumption while still achieving application performance tar-gets.

Keywords Autonomic computing · Fuzzy modeling ·Resource management · Virtualization · Two-level control ·Data centers

1 Introduction

Today’s data centers host a variety of business-critical ap-plications such as web hosting, e-commerce sites and en-terprise systems on shared hardware platforms. The need tomanage multiple applications in a shared infrastructure cre-ates the challenge (and also the opportunity) of on-demandresource provisioning and allocation in response to time-varying workloads. Ideally, data centers should provide a“pay-per-use” approach, i.e., hosted applications or applica-tion providers would deliver services to their customers us-ing hardware resources provided by data centers that chargebased on the resources consumed (instead of charging forresources needed to meet peak demands). To realize this ina cost-effective manner, the data center must provide flexi-ble and manageable execution environments that are special-ized for each application without compromising its ability toshare resources among applications and delivering to themthe necessary performance, security and isolation.

Page 2: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

214 Cluster Comput (2008) 11: 213–227

Virtualization is key to this vision, by allowing physicalservers to be carved into multiple virtual resource contain-ers, and enabling a virtualized data center where applica-tions are hosted and managed in their dedicated virtualizedcontainers. In particular, virtual machines (e.g., [2, 7, 15]),which provide strong isolation, security and customizabil-ity, can be dynamically created to serve as virtual containers.The management of these containers, e.g., lifecycle manage-ment and resource allocation, can be conducted through theinterface provided by the virtualization platform.

Applications served by a data center are usually business-critical applications with Quality-of-Service (QoS) require-ments. The resource allocation needs to not only guaranteethat a virtual container always has enough resources to meetits application’s performance goals, but also prevent over-provisioning in order to reduce cost and allow the concurrenthosting of more applications. Static allocation approachesthat consider a fixed set of applications and resources cannotbe used because of changing workload mixes, and solutionsthat only consider behavior of individual applications fail tocapture the competition for shared resources by virtualizedcontainers.

This paper presents a two-level resource managementsystem that enables automatic and adaptive resource allo-cation in accordance with Service Level Agreements (SLA)specifying dynamic tradeoffs of service quality and cost. Re-source management in a data center is decoupled at two lev-els: virtual containers and resource pools. The key to cost-effective resource allocation is the ability to efficiently findthe minimum amount of resources that an application needsto meet the desired QoS. In each virtual container hosting anapplication, a local controller is responsible for determiningthe amount of resources needed by the application and mak-ing resource requests accordingly. A global controller re-sponds to the local controllers’ requests by dynamically al-locating resources across multiple virtual containers hostedon the same physical resources. It controls the resource al-locations in a way that maximizes the data center’s profit.

Two fuzzy-logic-based approaches—fuzzy modeling andfuzzy prediction—are proposed for use by local controllersin automatically learning of runtime behavior of virtual con-tainers under dynamically changing workloads. One of theadvantages of fuzzy approaches is that they do not requireprior knowledge or a mathematical model of the system be-ing managed. They typically do not need time-consumingtraining, which makes them suitable for real-time control.Moreover, the approaches are robust with respect to noisydata and have the ability to adapt to changes very quickly.A prototype of the proposed two-level resource manage-ment system has been deployed on a virtualized data cen-ter testbed. Typical e-business applications with syntheticworkloads and real-world traces were used to evaluate accu-racy of the fuzzy-logic-based approaches employed by the

local controller and the efficiency of resource allocation de-termined by the global controller. The results show that theproposed system can effectively allocate resources to virtualcontainers under dynamically changing workloads, and sig-nificantly reduce resource consumption while still achievingthe desired application performance.

The rest of this paper is organized as follows. Section 2provides an overview of the two-level resource managementsystem. Sections 3 and 4 describe in detail the designs of thelocal controller and the global controller. Section 5 presentsan evaluation of the prototype. Section 6 examines relatedwork and Sect. 7 concludes the paper.

2 Two-level autonomic resource control

A data center, illustrated in Fig. 1, serves a number of appli-cations. Each delivers a distinct service to its customers us-ing (virtual) resources provided by its dedicated container,which is the virtual machine that hosts the application. Thedata center allocates the physical resources to each virtualcontainer based on its application’s resource needs.

Application SLAs between an application provider and itscustomers state the quality of service providers promised tothe clients. To achieve performance isolation and guaran-tee an application SLA independently of the load on othercontainers, a local resource controller is employed in eachvirtual container to estimate the resources needed by the ap-plication’s workload and to make resource requests to theglobal controller. By doing so, the local controller mini-mizes leasing costs by avoiding over-provisioning for theapplication running on the container. Resource SLAs be-tween application providers and the data center owner spec-ify both the cost of rental resources and the penalty duein case the data center fails to deliver resources needed byapplication providers. The underlying assumption is that ifthe data center does not allocate enough physical resourcesrequested by the local controller resulting in its applica-tion’s SLA violation, the data center provider will be pe-nalized. The global controller makes allocation decisionsamong competing requests, trying to avoid violations of re-source SLAs.

This two-level resource control system is preferred overthe more obvious centralized approach in which all the con-trol functions are implemented at one centralized location.Since local containers are independent of each other, hetero-geneous local controllers’ implementations are possible. Allof the internal complexities of control functions in virtualcontainers are compressed by local controllers into straight-forward resource requests, which specify the amount of re-sources needed. The system handles two different types ofoptimizations independently. The local controller tries tominimize the resource consumed by the virtual container

Page 3: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

Cluster Comput (2008) 11: 213–227 215

Fig. 1 “Pay-as-you-go” datacenter with virtual resourcecontainers to host applications,and a two-level controllerarchitecture to allocate physicalresources to containers

to reduce the resource cost while still satisfying applicationSLAs of its clients. The global controller seeks to maximizeits own profit, which is the revenue received from allocat-ing its resources among virtual containers minus the cost ofpenalties incurred from resource SLA violations. The con-troller design is applicable to manage any type of resourcesuch as CPU, memory and I/O bandwidth, while provision-ing and managing CPU cycles are of particular interest inthis paper and tested in more detail. The following sectionsexplain our approach to the design of the local and globalcontrollers.

3 Local resource controller

Interaction between the local and global controllers enablesa virtual container to augment its resources in response toincreased workload, and to reduce its resources when theyare no longer needed. The main task of the local controlleris to estimate the set of resources needed by an applicationrunning in the container. Our approach to the design of sucha controller is based on fuzzy logic theory, as discussed next.

3.1 Background

Fuzzy logic [21] is a tool to deal with uncertain, imprecise,or qualitative decision-making problems. Unlike Booleanlogic, where an element x either belongs or does not belongto a set A, in fuzzy logic the membership of x in a fuzzy setF has a degree value (called fuzzy value) in a continuousinterval between 0 and 1 representing the extent to which x

belongs to F . Fuzzy sets are defined by membership func-tions that map set elements into the interval [0,1].

One of the most important applications of fuzzy logicis the design of fuzzy rule-based systems. These systemsuse “IF-THEN” rules (also called fuzzy rules) whose an-tecedents and consequents use fuzzy-logic statements to rep-resent the knowledge or control strategies of the system. Thecollection of fuzzy rules is called a rule base. There are manyapproaches to the construction of fuzzy rules, for example,by capturing expert experience or system operator’s controlactions. The approach taken for the design of our systemis to learn fuzzy rules using online monitoring information,making it a so-called self-organizing fuzzy system.

Fig. 2 The monitored runtime behavior of virtual container

The process of formulating the mapping from inputs tooutputs using fuzzy rules is called the fuzzy inference (FIS)mechanism. Since fuzzy rules use fuzzy sets and their asso-ciated membership functions to describe system variables,two functions are necessary for translating between numericvalues and fuzzy values. The process of translating input val-ues into one or more fuzzy sets is called fuzzification. De-fuzzification is the inverse transformation which derives asingle numeric value that best represents the inferred fuzzyvalues of the output variable.

3.2 Virtual container run-time behavior

To determine the resource needs of an application hosted ina virtual container, the local controller needs to learn the be-havior of the virtual container under dynamically changingworkloads. Figure 2 shows the abstracted inputs and out-puts of a virtual container that hosts a running application.The virtual container receives the application workload fromits clients, and utilizes the physical resources provided bythe data center resource pool to process the workload. Theachieved QoS of the application depends on the amount ofallocated resources and the incoming workload. The infor-mation about the application’s workload, its received perfor-mance and its virtual container’s resource consumption aremonitored by the system sensors as Fig. 2 illustrates. Thelocal controller adaptively adjusts the resources requestedfrom the global controller, in order to achieve desired QoSwith the minimum cost. Depending on what information isavailable from the system, two approaches are proposed forestimating resource needs: (1) fuzzy modeling to character-ize the relationship between workload and resource use and(2) fuzzy prediction to determine a mapping from observa-tions of recent resource usage to future resource needs.

Page 4: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

216 Cluster Comput (2008) 11: 213–227

Fig. 3 Fuzzy modeling andinference functions in a localresource controller

3.3 Fuzzy-modeling approach

The first approach uses fuzzy logic to model the behaviorof a virtual container by automatically learning the relation-ship between workload and the corresponding resource con-sumption when the desired QoS is achieved. It requires thesystem to periodically monitor the application workload andtheir resource usage, which are then used as an input-outputdata pair for generating fuzzy rules. Figure 3 illustrates thekey functions for fuzzy modeling in the local controller. Thedata monitored by the sensors are first processed by the fil-tering and clustering functions. The modeling function con-structs fuzzy IF-THEN rules using the produced data clus-ters and keeps them in a rule base. The fuzzy model’s pa-rameters determined by the calculated centers and ranges ofdata clusters are also stored in a database. Once the fuzzymodel is built up, the fuzzy inference functions periodicallyprocess the fuzzy rules kept in the knowledge base to deter-mine the resource needs based on the currently monitoredworkload. The rest of this section explains these functionsin detail.

Data monitoring and filtering The sensors periodicallymeasure the application workload w(t), its performancep(t), and the resource usage r(t) of a virtual container. Fora typical data center application, its workload can usuallybe described by the rate and mixture of the requests. Forinstance, a Web server’s workload can be characterized bythe HTTP request rate as well as the ratio of static Web-content requests to dynamic ones. The performance metricsare often directly taken from the SLA, e.g. the throughput(number of completed transactions per second) and/or aver-age service response time.

The metrics for resource utilization are associated withthe different types of consumed physical resources, includ-ing CPU utilization, used memory size, disk storage, diskI/O rate and network bandwidth. However, an application’svirtual resource usage (the values collected inside of the vir-tual container) does not necessarily represent its physical re-source consumption. For example, an application’s networkI/O consumes not only the physical network bandwidth, but

also the physical CPU cycles. In the proposed approach, anapplication’s resource usage is obtained by directly moni-toring the physical resource consumption of its virtual con-tainer. This is sensible because in the envisioned data centera virtual container is dedicated to an application.

The measured workload and its resource usage composethe input-output data used for system modeling. A sequenceof input-output data pairs (w(t), r(t)) produced by the sen-sors at constant time intervals (20 seconds in our experi-ments) is filtered based on the corresponding performancemeasurements p(t). The filtering policy is such that a datapair measured at time t is kept or filtered out depending onwhether the performance measured at the same time satis-fies the SLA or not, respectively. Performance is satisfactoryonly if the resource capacity allocated to the virtual con-tainer at time t is sufficient for the given SLA. In this case,the monitored resource utilization represents the actual re-source needs, and thus the data pair can be used for modellearning. On the contrary, an SLA violation indicates thatthe allocated resources are not enough to achieve the SLAtarget. In this case, the resource consumption is capped bythe allocated capacity so that the monitored values are lessthan the desired resource demands and cannot be used infuzzy modeling.

Data clustering and fuzzy rule construction The filtereddata pairs are used for building fuzzy models which char-acterzing the relationship between the application workloadand resource consumption. In order to avoid generating alarge number of fuzzy rules, the data are clustered to pro-duce a concise representation of the system’s behavior. Sev-eral classic clustering algorithms can be used, e.g. hierarchi-cal and k-means clustering. In the proposed local controllerdesign, subtractive clustering [5] is chosen for its speed androbustness.

This clustering method assumes that each data point is apotential cluster center and chooses the data center based onthe density of surrounding data points. The algorithm selectsthe data point with the highest density as the first clustercenter and then removes all data points in the vicinity of thefirst cluster center in order to determine the next data cluster

Page 5: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

Cluster Comput (2008) 11: 213–227 217

and its center location. This process continues until all thedata are within the radius of a cluster center. The variableradius represents a cluster center’s range of influence in eachof the data dimensions, assuming that the data fall within aunit hyperbox. Setting small radius values generally resultsin finding a large number of small clusters. This value is setto 0.5 in the local controller’s implementation.

Since each cluster exemplifies a characteristic of systembehavior, it can be used as the basis of a fuzzy rule thatdescribes system behavior. If n data clusters are formed, n

rules can be produced in which the ith rule is expressed as:

IF input w(t) is in cluster i,

THEN output u(t) is in cluster i.

Each cluster specifies a fuzzy set with its membership func-tions determined by the cluster center and range. Using the

Gaussian membership function, μi(x) = e− (w−ci )

2

2σ2i , the cen-

ter of the membership function ci equals the center of clus-ter i and the weight of membership function σi equals theradius of that cluster.

The model described by the above fuzzy rules is calledzero-order Sugeno-type fuzzy model [14]. The modeling ac-curacy can be improved significantly by using the first-orderSugeno model, in which the output of each rule is a linearfunction of the input variables. The rules are rewritten asfollows,

IF input w(t) is in cluster i, THEN output u(t) = aw + b,

where the parameters a and b in the linear equations areestimated by the least-squares method.

Fuzzy inference Once the fuzzy model relating workloadto resource consumption is learned from the selected work-load to resource usage measurements, it can be used in arule-based fuzzy inference module which, given the appli-cation’s workload, produces the estimated application’s re-source demand for the virtual container.

The fuzzy inference module consists of four basic func-tions as illustrated in Fig. 3. The knowledge base includesa database consisting of membership functions of the fuzzysets and a rule base where the fuzzy rules are specified. Inthe fuzzification function, the input w(t) measured from thesensor is mapped to input fuzzy sets using the membershipfunctions. A decision-making unit, called the fuzzy infer-ence engine, infers from input fuzzy sets to output fuzzysets according to the rules stored in the knowledge base.The defuzzification function aggregates the fuzzy outputsand converts them to a numeric output. The final output isthe weighted average of all rule outputs with the weight ofith rule being the membership of the input in cluster i.

In summary, using fuzzy modeling and fuzzy inferenceshown in Fig. 3, the local controller estimates the resourceneeds for the current workload measured by the sensor, andsends requests to the global controller to either ask for moreresources if the current allocation is not sufficient to satisfythe SLA or to withdraw resources when no longer needed.

Adaptive modeling The discussion so far has only con-sidered offline model learning from collected data. As theworkload or system conditions change, the model describ-ing the system’s behavior needs to capture the changes ac-cordingly. The adaptive modeling is employed by the localcontroller in which the model is repeatedly updated basedon online monitored information.

The clustering function takes new data into considerationas soon as they arrive and keeps updating, so that up-to-dateclusters are always provided for the modeling. Whenever thedata clusters are updated, the parameters of the membershipfunctions are changed accordingly in the database. If a newcluster is added, a corresponding rule is then added into therule base; and similarly, if a cluster no longer exists, the ruleassociated with it is removed from the rule base.

In the case when the allocated resources are insufficientfor the workload, the monitored data become disqualifiedand are filtered out because of the performance degradation.The shortage of qualified data will hurt the model’s learn-ing speed and quality. To avoid this situation, whenever thefilter function detects that the percentage of qualified datais less than 50% during a time window T (set to 5 minutesin the prototype), the controller asks for an additional pre-defined percentage (10% is used in the prototype) of currentresource allocation from the global controller to improve theapplication’s performance back to the desired level.

3.4 Fuzzy-prediction approach

The fuzzy-modeling-based approach described above auto-matically builds a mapping from the application workload tothe corresponding resource needs for the desired QoS. Thisapproach is applicable only when the application workloadcan be characterized and monitored. However, data cen-ters can host a variety of applications which are very dif-ferent from each other so that there is no standard way ofmeasuring workloads. In some cases it is hard to describean application workload using a few metrics like requestrate. The second proposed approach—fuzzy-prediction—only requires information about the resource usage (e.g.,CPU utilization), which is easy to obtain by monitoringsystem-level metrics. The basic idea is to determine futureresource needs on the basis of observations of past resourceusage using fuzzy system.

Page 6: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

218 Cluster Comput (2008) 11: 213–227

Fig. 4 Fuzzy predictionfunctions in a local controller

Fig. 5 An example of three-input two-output data sequences for fuzzyrule construction

Fuzzy rule construction The fuzzy prediction system, il-lustrated in Fig. 4, has some components that are similar tothose used in the fuzzy modeling approach. The fuzzy rulesrepresenting a mapping from input space to output spaceare generated from the monitored data and stored in therule base. The fuzzy inference system processes the learnedfuzzy rules to forecast future resource demands based onthe current system observations. Let r(t) (t = 1,2,3, . . .)be the measured resource usage at sampling time t . Theproblem addressed by the fuzzy prediction system can beformulated as: at time t , given the latest m measurementsr(t), r(t − 1), . . . , r(t − m + 1) as the inputs, determine theresource needs at future times r(t +1), r(t +2), . . . , r(t +n)

as the outputs (m and n are the number of inputs and outputsfor a fuzzy rule, respectively).

A fuzzy rule is generated from an input-output data pair,whose components are subsequences of the successive re-source usage measurements with the input subsequence pre-ceding the output subsequences. Figure 5 shows an exampleof three-input two-output (m = 3 and n = 2 in this case)fuzzy rules. To translate input-output pairs into fuzzy rules,the first step is to divide the input and output spaces into sub-domains representing by fuzzy sets. Assuming that the inputand output ranges are normalized to [0,1], each space is di-vided into 2N + 1 domains, denoted by R1,R2, . . . ,R2N+1,each assigned a fuzzy membership function. Figure 6 gives

Fig. 6 An example of dividing the input/output space into 11 fuzzydomains and the corresponding membership functions

an example where the input/output space is divided into 11fuzzy sets (N = 5 is used in our prototype) with triangularmembership functions.

The next step is to assign a given data point to the fuzzyset with the highest membership degree using the member-ship functions described above. For example, input i1 is con-sidered to be R5 and output o1 is considered to be R8 inFig. 6. Finally, a fuzzy rule is constructed from a pair ofinput-output data as follows,

Rule i: IF i1 is Ri1 and i2 is Ri2, . . . , and im is Rim,

THEN o1 is Ro1, and o2 is Ro2 , . . . , and on is Ron.

Therefore, every sequence of m + n consecutive resourceusage measurements can be used to generate a fuzzy rulewhich maps the input space (i1, i2, . . . , im) representing theprevious system state to the output space (o1, o2, . . . , on)representing the more recent or current state.

Fuzzy rule update A fuzzy rule is generated at every sam-pling time and each rule with m-input and n-output repre-sents a point in the (m + n)-dimensional rule space. If everyrule is stored in the rule base the memory requirements will

Page 7: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

Cluster Comput (2008) 11: 213–227 219

Fig. 7 The fuzzy-rule updating procedure

be excessive, and it is probable that there would be conflict-ing rules which have the same IF part but a different THENpart. The first problem is solved by partitioning input-outputspaces into a finite number of domains as described aboveso that at most one rule is stored in the rule base for eachdomain. The number of rules increases as new input-outputdata are collected, but it never exceeds the maximum num-ber of domains partitioned in the rule space.

To overcome the second problem, when updating the rulebase a reliability index is computed for each rule as Ji = thenumber of occurrences of rule i. Whenever a rule is gener-ated, the system scans all the rules stored in the rule base.If there is a matching rule (i.e., a rule in the same domain),the value of J is increased by 1. Otherwise, the new rule isadded to the rule base and J is initialized to 1. Figure 7 il-lustrates the procedure for updating rules. If there exist con-flicting rules, which one takes effect is determined by thevalue of the reliability index. The rule with the highest relia-bility index is activated, indicating that the active fuzzy ruleappears more frequently than the other conflicting rules. Ifconflicting rules have the same value of reliability index, theone that appeared most recently is activated.

Fuzzy inference Given the latest resource usage measure-ments as inputs, as Fig. 4 shows, the fuzzy inference engineprocesses the active fuzzy rules in the rule base to determinethe outputs which consist of the future resource usage. Ini-tially, there is no rule in the rule base. After the first m + n

measurements are obtained, the first rule is generated andstored in the rule base. Afterwards, at each sampling point,a rule is constructed and the rule base is updated follow-

ing the updating procedure illustrated in Fig. 7. This updat-ing procedure makes the proposed fuzzy prediction capableof self-learning the resource usage behavior of the managedvirtual container.

Compared with the fuzzy-modeling approach, both meth-ods essentially “learn” from the input and output historyto build a mapping and can adaptively update the mappingwhen new data are available so that it can reflect systemchanges very quickly. No prior knowledge or mathematicalmodel of the system is required and they both are a one-pass build-up procedure that does not need iterative time-consuming training. The difference between the two ap-proaches is that the fuzzy modeling approach maps work-load to resource consumption, while the fuzzy predictionmaps the observations of recent resource usage to the futureresource needs.

4 Global resource controller

Each individual local controller tries to minimize the re-source cost by only requesting the resources necessary formeeting the application SLA. The global controller receivesrequests for physical resources from the local controllersand allocates the resources among them as required. It seeksto make allocations that maximize the data center’s profit,which is the revenue received by allocating the physical re-sources among virtual containers minus the penalties due toviolations of resource SLA.

The global controller makes allocation decision based onthe received requests and currently available resources inthe data center. If the requested resources are allocated, theapplication providers are charged for the resources they re-ceive. Otherwise, the data center has to pay certain penaltiesfor the unsatisfied requests. The resource price and penal-ties specified in the resource SLA are negotiated betweenthe data center owner and application providers. To simplifythe problem, it is assumed that the revenue linearly increaseswith the amount of allocated resources and is bounded by thepoint of requested amount. The penalty also has a linear re-lationship with the amount of unsatisfied resources (shownin Fig. 8).

Without loss of generality, this paper considers a singleresource type and a single allocation period. Suppose that K

virtual containers are concurrently active in the data cen-ter. Let reqk denote the resources requested from virtualcontainer k, and alck be the amount of resources grantedto it by the global controller. The data center receives rev-enue of revk for every allocated resource unit over an al-location period. But if the global controller cannot satisfythe request reqk , the data center pays a penalty of pnlkper unit of unmet resource demand, according to the re-source SLA. Each resource allocation decision made by the

Page 8: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

220 Cluster Comput (2008) 11: 213–227

Fig. 8 The revenue and penaltyfunction with respect toallocated resources

global controller is expressed as a resource allocation vec-tor (alc1, alc2, . . . , alcK ), and the total profit obtained by thedata center for a time period is

profit(alc1, alc2, . . . , alcK)

=K∑

k=1

[revkalck − pnlk(reqk − alck)] (1)

s.t. 0 ≤ alck ≤ reqk, C =K∑

k=1

alck ≤ A

where C is the total amount of resources allocated to thevirtual containers, and A is the total resource capacity avail-able in the data center. The profit equation can be rewrittenas follows, for one allocation period t ,

profit(alc1, alc2, . . . , alcK)

=K∑

k=1

(revk + pnlk)alck −K∑

k=1

pnlkreqk (2)

(revk + pnlk) is defined as the profit rate for virtual con-tainer k. Assuming that the global controller can allocateany resource fraction to the virtual containers, a greedy al-gorithm that allocates resources in the order of decreasingprofit rates is an optimal allocation (this is similar to the caseof a fractional knapsack problem [10]).

To optimize profit over multiple time periods, the allo-cation decision has to be repeated. Equation (3) defines acumulative profit which is the discounted sum under a dis-counting factor γ over a time horizon T . The factor modelsthe fact that future profit is worth less than current profitbecause of the uncertainty in the future, for T allocation pe-riods,

T∑

t=1

γ tprofitt =T∑

t=1

K∑

k=1

γ t [revkalctk − pnlk(reqk − alctk)](3)

Based on the above profit model, a greedy strategy that max-imizes the total profit for every period is still optimal be-cause the allocation decision for current period does not af-fect the future periods.

5 Experimental evaluation

This section summarizes the experimental evaluation of theproposed two-level control system for dynamic resourceallocation in a data center environment with time-varyingworkloads. Section 5.2 discusses the experiments that eval-uate the ability of the local controller to track the resourceneeds of changing workloads. Section 5.3 considers themaximal profit approach (1) discussed in Sect. 4 when theglobal controller must allocate limited resources among sev-eral competing virtual containers.

5.1 Experimental setup

Data center testbed The testbed is deployed on a 16-CPUIBM x336 based cluster that provides virtual containersfor applications, and several workload-generating clients.VMware ESX Server 3.0.1 is installed in each cluster nodeequipped with dual hyperthreaded Intel Xeon 3.2 GHz CPUsand 4 GB memory. Virtual machines are created on theservers and used as virtual containers to host applications.The clients are placed on VMware-Server-1.0.0-based vir-tual machines, hosted on another cluster of 32 dual-2.4 GHzhyperthreaded Intel Xeon nodes. Web-based workloads aregenerated by the clients and issued to the applications acrossa Gigabit Ethernet network.

Application and workload The Java Pet Store [26] waschosen to represent a typical e-business application. It im-plements an online store that allows users to browse andmake orders, and managers to manage orders, suppliers andinventory. It is a reference application that has been devel-oped on various Java EE platforms. Synthetic HTTP work-loads that mimic the key aspects of real-world workloads arecreated with various client sessions issued by httperf [11].Each individual session consists of a sequence of requestsgenerated by repeating and mixing the following customeractions: go to the storefront, sign in, browse products, addsome products to shopping cart, and checkout. Two key pa-rameters are adjusted to vary a session’s workload on the ap-plication: the user’s think time (the time between two con-secutive requests) can be changed to generate different re-quest rates; the ratio of dynamic requests (e.g., sign in, check

Page 9: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

Cluster Comput (2008) 11: 213–227 221

out and search product) to static requests (e.g., browse sta-tic Web pages and view images) can be varied in order tochange the workload characteristics. A Perl program wasdeveloped to create different workloads and drive httperf toissue the requests.

Traces collected from ’98 World Cup sites are also usedin the experiments to represent real-world workloads. Thelogs provided by an Internet repository [25] consist of about1.3 million requests made to the World Cup Web site be-tween April 30, 1998 and July 26, 1998. A real-time logreplayer [24] is used to generate workloads using the trace.

Global/local controller prototype The virtual containersare monitored and controlled through the Web-service-basedmanagement interface provided by VMware ESX Server. Itallows the allocation of a server’s physical resources amongits hosted virtual machines (e.g. setting the minimum, maxi-mum and proportional resource shares of a virtual machine),and also provides the real-time monitoring of a virtual ma-chine’s resource utilization.

The proposed two-level controllers are implemented inJava, running along with the virtual containers. Every vir-tual machine has a local controller to manage the virtualcontainer it provides, and the ESX cluster has a global con-troller to manage the shared resources for the virtual con-tainers hosted on it. The sensors, also developed in Java,monitor the workload (request rate and mixture), the appli-cation throughput (reply rate), and the resource consumption(CPU usage). The monitoring period is set to 20 seconds.Because the concerned workloads are mostly CPU inten-sive, the experiments focus on the utilization and allocationof CPU resources.

5.2 Local controller validation

The first set of experiments is to validate whether the lo-cal controller can accurately estimate resource needs usingfuzzy-modeling and fuzzy-prediction approaches under dy-namically changing workload.

5.2.1 Fuzzy-modeling approach

In the first experiment, the workload generator issues ses-sions to the Pet Store every 10 seconds. These sessions onlycontain requests for static Web content with a user think-time ranging from 0.1 to 1 second, and each session lastsaround 1 minute. The generated sessions are divided intogroups and the average user think-time of each group is de-creasing, hence increasing the request rate among groups.This setup emulates the presence of bursts in real-worldworkloads. The entire experiment lasts for 4000 seconds.

Because the workloads used in this experiment consist ofonly static Web-content requests, the CPU usage is highly

Fig. 9 Fuzzy models learned from the workload with static Web re-quests at the beginning (model1) and the end (model2) of the experi-ment

correlated with the request rate, which is then used as theonly metric to characterize the workload. In this case, theinput and output to fuzzy modeling are the request rate andCPU usage measurements. The first 50 pairs of data pointscollected from the sensors are used to initialize the learningof the fuzzy model. Afterwards, the model is continuouslyupdated every 200 seconds (during which 10 new data pointsbecome available from the sensors). Figure 9 illustrates themodels learned from the monitored data at the beginningand the end of the experiment, which shows an approximatelinear relationship between the request rate and CPU usage1

in the range of 0 to 100 requests/second.The local controller continuously estimates the CPU de-

mand based on the current workload and the latest learnedfuzzy model, and dynamically adjusts the CPU requests tothe global controller. Because the available resources aresufficient in this experiment, the global controller alwaysallocates to the virtual container the exact amount of CPUrequested by the local controller. To prove the accuracy ofthe fuzzy modeling, the same experiment is repeated onthe virtual container which is statically allocated with alarge amount of CPU (3.2 GHz) in order to obtain the idealthroughput for the same workload.

The throughputs from these two runs are compared inFig. 10, indicating that the actual throughput obtained byusing the local controller is almost identical to the idealthroughput. Compared to the static allocation of CPU withthe peak value (overprovision based on the highest load),the dynamic approach saves about 55% of CPU cycles oth-erwise needed for this experiment. This confirms that theonline fuzzy modeling can accurately learn the relationshipbetween the workload and resource demand, and effectivelyguide the resource allocation for the virtual container.

In the second experiment, the workloads are generatedsimilarly to the previous one, except that requests for dy-namic Web content are also considered. Every group of ses-sions differs not only in the request rate but also in the pro-

1In VMware ESX Server, the amount of CPU allocation and usage canbe expressed in CPU frequencies (Hz).

Page 10: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

222 Cluster Comput (2008) 11: 213–227

Fig. 10 Comparison of the throughput achieved by using local con-troller and the ideal throughput for the workload with static Web re-quests

Fig. 11 The surface of the fuzzy model learned from the workloadwith dynamic Web requests

portion of dynamic requests in the workload: the ratio of dy-namic to static requests grows from 0 to 1 across the groups.Servicing dynamic Web content requires processing by theapplication server and database, and thus typically consumesmore resources than servicing static Web content. If the localcontroller still uses request rate as the only metric for repre-senting workload, the resulting model cannot truly describethe actual relationship between workload and resource de-mand. The experiment results (observed but not shown here)also confirm that the throughput achieved by using such amodel is much worse than the ideal throughput for the work-load.

In contrast, using both the request rate and dynamic/staticrequest ratio to characterize the workload, a 3D fuzzy modelcan be constructed to describe the relationship betweenworkload and resource demand more accurately. Figure 11shows the surface of the model learned at the end of the ex-periment. One of the advantages of fuzzy modeling demon-strated by the above experiments is that fuzzy models arewell-suited for learning non-linear and complex relation-ships.

Figure 12 compares the application’s throughput to theideal throughput obtainable for the workload. The graphshows that the throughput achieved is again very close to itsideal level (the difference is under 6%). It is also noticeablethat when the workload is high the difference becomes rela-tively larger. This is because of the delay between the change

Fig. 12 Comparison of the throughput achieved by using local con-troller and the ideal throughput for the workload with dynamic Webrequests

Fig. 13 Fuzzy model learned from the trace-based workload at thebeginning (model1) and the end (model2) of the experiment

of workload and resource allocation, which is largely due tothe granularity of the online monitoring and control. Whenthe workload is heavy, this delay causes the application’sthroughput to fluctuate a little around the ideal one. How-ever, the overall error is still very low. About 33% of CPUcycles are saved by this dynamic allocation, compared to afixed allocation where overprovision is based on the highestload.

In the third experiment, the 1998 World Cup Web sitetrace collected on May 31 from 5am to 5pm (local timein Paris) is used to generate workload, and it is played at12X speedup to enhance its intensity. All the documentsrequested by the trace are created by the log replayer toolbased on the sizes recorded in the trace. Because only staticWeb pages are requested in the trace replaying, the workloadis characterized by the request rate.

During the experiment, the first 30 measurements ofworkload and CPU consumption are used to initialize thefuzzy model. After that, the model is updated every 200seconds. Figure 13 illustrates the model learned at the be-ginning and the end of the experiment. Figure 14 showsthat the application’s throughput achieved by using the localcontroller is close to the ideal throughput obtainable for theworkload (the difference is within 5%), indicating the effec-

Page 11: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

Cluster Comput (2008) 11: 213–227 223

Fig. 14 Comparison of the throughput achieved by using local con-troller and the ideal throughput for the trace-based workload

Fig. 15 Comparison of the throughput achieved by dynamic allocationusing fuzzy prediction approach and the ideal throughput with maximalallocation

tiveness of the fuzzy-modeling approach under real-worldworkloads. The dynamic allocation uses less than 30% ofthe CPU cycles used by a static approach that allocates max-imum CPU fraction based on the highest workload.

5.2.2 Fuzzy-prediction approach

Similar to the previous experiment, three days of web tracesfrom World Cup 98 Web site are chosen to generate work-load and played at 24X speedup to reduce experiment du-ration. During the experiment, only the CPU utilization ofthe virtual container is measured and fed to local controllerevery 20 seconds. The local controller applies the proposedfuzzy-prediction approach to estimate resource needs foreach allocation interval (set to one minute in the experi-ment).

The first 50 measurements are used to initialize the fuzzyrules. Then the rule base is updated whenever the new CPUusage measurement is available (every 20 seconds). Forevery minute, the local controller estimates the CPU usagefor the next minute based on the fuzzy rules learned from thepervious observations and then sends requests to the globalcontroller. The global controller adjusts the CPU allocation

Fig. 16 The CPU allocated to the virtual container by using local andglobal controller

as required. Figure 15 shows the resulting throughput for thetrace-workload by using fuzzy prediction and the through-put achieved by using maximal allocation (3.2 GHz). Theresults are very close and the differences between them areless than 1% on average. Figure 16 plots the CPU allocatedto the virtual container during the experiment and about 44%resources can be saved using the dynamic allocation com-pared with maximal allocation.

Comparing to the fuzzy-modeling approach, fuzzy pre-diction can produce accurate short-term resource estima-tion for local controllers. Fuzzy modeling applies clusteringtechniques to provide concise data presentation, resulting insmaller rule base (less than ten fuzzy rules during experi-ments) than the one generated from the fuzzy-prediction ap-proach (about several tens of fuzzy rules in the experiments).

5.3 Global controller validation

The last set of experiments investigates the allocation deci-sions of the global controller among multiple virtual con-tainers. Two virtual containers (VC1, VC2) running on thesame server node compete for the available CPU cycles (re-stricted to 1 GHz in the experiment). Both containers hostthe Java Pet Store application and two clients issue requestsfor static Web content to them. VC1 serves a fixed work-load, which has a constant request rate of 30 requests/sec;while VC2 receives an increasing workload with a requestrate rising from 10 up to 60 requests/sec.

The local controllers of these two containers employ thefuzzy modeling approach to dynamically estimate their CPUdemands for the workloads. The amounts of resources re-quested by the two local controllers during the experimentare plotted in Fig. 17. The local controller of VC1 requestsaround 500 MHz of CPU throughout the entire experiment;while VC2 increases its CPU request from about 200 MHzto more than 800 MHz as its workload grows.

When the CPU needed by VC2 goes beyond 500 MHz,the global controller responds to the resource shortage by

Page 12: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

224 Cluster Comput (2008) 11: 213–227

Fig. 17 CPU requests from two virtual containers (VC1, VC2) thatshare limited resources

Fig. 18 CPU allocation that favors VC2

Fig. 19 CPU allocation that favors VC1

reducing the CPU allocation to VC1 or VC2 or both. Theallocation policy of the global controller is to maximize itsprofit by applying the greedy algorithm discussed in Sect. 4.Two scenarios are considered in the experiments. In the firstcase, the profit rate of VC2 is higher than VC1; therefore,the global controller decides to satisfy the resource requestsfrom VC2 by reducing the allocation for VC1 whenever aCPU shortage happens. Figure 18 shows the actual CPU al-locations for the two containers throughout the experiment.The second case considers the opposite situation where VC1has a higher profit rate and thus is favored in the resource al-location. In this case, VC2 suffers from a resource shortagewhen the global controller cannot allocate enough resourcesfor both containers (Fig. 19). The greedy strategy to resourceallocation using the simplified profit model can be theoreti-

cally proved to be the optimal solution to maximize the totalprofit.

6 Related work

To the best of our knowledge there is no prior work using afuzzy modeling approach to data center resource manage-ment. The following briefly summarizes other work withsome common elements with this paper’s approach.

Rule-based systems: This approach uses a set of event-condition-action rules (defined by system experts) that aretriggered when some precondition is satisfied (e.g., whensome metrics exceed a predefined threshold). For example,the HP-UX Workload Manager [23] allows the relative CPUutilization of a resource partition to be controlled within auser-specified range, and the approach of Rolia et al. [12]observes resource utilization (consumption) by an applica-tion workload and uses some “fixed” threshold to decidewhether current allocation is sufficient or not for the work-load. With the growing complexity of systems, even expertsare finding it difficult to define thresholds and corrective ac-tions for all possible system states.

Control theory: Approaches based on control theoryhave been applied to resource management to achieve per-formance guarantees. Most of the work assumes a linearrelationship between the QoS parameters and the controlparameters, and involves a training phase with a given work-load to perform system identification. Typically, control pa-rameters must be specified or configured offline and on aper-workload basis. Abdelzaher et al. [1] investigated thisapproach for QoS adaptation in Web servers. In [19, 22],a nonlinear relation between response time and CPU allo-cation to a Web server is studied, and a bimodal model isused to switch between underload and overload operatingregions. To deal with time-varying workloads, more recentwork applies adaptive control theory, in which models areautomatically adapted to changes using online system iden-tification.

Model-based: Previous research efforts [4, 8, 13, 18, 20]have been trying to model computer systems from differentperspectives. Bennani et al. [3] predicts the response timeand throughput for both online and batch workloads usingmulticlass open queueing networks. Liu et al. [9] uses ARmodels to map CPU entitlement percentage to the mean re-sponse time with a fixed workload. Chandra et al. [4] mod-els the resource using a time-domain queueing model whichrelates the resource requirements to its workload. Some ofthese approaches make simplifying assumptions such as us-ing a single queue to model the whole system, which mayfail to capture complexities of the relationship between ap-plication workload and resource usage. Some models arevalidated only using simulations.

Page 13: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

Cluster Comput (2008) 11: 213–227 225

Reinforcement learning (RL): Tesauro [16] proposed touse reinforcement learning for autonomic resource allo-cation. Compared with our fuzzy-logic-based approaches,both can automatically learn from past experiences with-out an explicit performance model. However, RL usuallyuses lookup table to store the information it obtained fromtraining data. The size of table increases exponentially withthe number of state variables, causing the scalability issue.Fuzzy rule based system keeps its knowledge more effi-ciently in the form of fuzzy rules and fuzzy membershipfunctions. RL may also have a long training time due tothe absence of domain knowledge or good heuristics, whilethe construction of fuzzy rule base in our approach does notrequire time-consuming training. In [17], the authors pro-posed to use a hybrid RL method combining RL and queuingmodels, in which RL trains offline on data collected while aqueuing model policy controls the system to avoid perfor-mance degradation in live online training.

Fuzzy control: Diao et al. [6] proposed a profit-orientedfeedback control system for maximizing SLA profits in Webserver systems. The control system applies fuzzy control toautomate the admission control decisions in a way that bal-ances the loss of revenue due to rejected work against thepenalties incurred if admitted work has excessive responsetime.

The proposed resource management system in this paperdiffers from the prior work in the following aspects:

• The resource control functions are separated between re-source provider and application provider, which makesthe design of data center resource management more flex-ible and robust. Each local controller tries to maximize itsprofits by requesting “just enough” resources for satisfy-ing application SLAs as well as reducing unnecessary re-source cost. The global controller takes into account thetradeoff between revenue obtained from satisfied resourcerequests and cost from violations of resource SLAs.

• Fuzzy-logic-based approaches provide a generic ap-proach to representing the relationship between systemvariables. It can be easily applied to any type of applica-tions hosted in virtual containers. This approach makesno underlying assumption of the workload characteris-tics, and can learn any type of relationship very fast. Es-pecially, the fuzzy system can efficiently model the non-linear system with dynamically changing operating con-ditions.

• The resource management process is automatic withoutany human intervention. The fuzzy rules are automati-cally learned from online monitoring data and the knowl-edge base is updated continuously as new data arrives,enabling the system to capture transient or unexpectedworkload changes.

7 Conclusions and future work

This paper presents a flexible two-level resource manage-ment system that is able to provide high quality of servicewith much lower resource allocation costs than worst-caseprovisioning. At the application level, to enable the localcontroller to accurately estimate the resource demands fordifferent workloads, two fuzzy logic based methods—fuzzymodeling and fuzzy prediction—are proposed to guide re-source allocation based on online measurements. Both ap-proaches have an adaptive learning ability, thus requiring noprior knowledge about the system or the application. Specif-ically, the fuzzy modeling approach characterizes the map-ping from the workloads and the corresponding resourcerequirements, while the fuzzy prediction builds a mappingfrom current resource usage to future resource needs. In thiscontext, “adaptive” means that the mapping can be easily up-dated when new information is available in order to adapt tothe system changes and reflect the most recent system con-ditions. The global controller at the resource-pool level triesto find the optimal resource allocation based on the proposedprofit model, towards maximizing the total profit of the datacenter.

Our approach, in conjunction with virtualization tech-niques, can provide application isolation and performanceguarantees in the presence of changing workloads by dy-namically allocating resources at fine time granularity,which results in high utilization and low cost as well. Theproposed resource management system is implemented on avirtualized data center testbed and evaluated using applica-tions that are representative of e-business and Web-contentdelivery scenarios. Both synthetic and real-world Web work-loads are used to evaluate the effectiveness of the approach.The experimental results show that the system can signifi-cantly reduce resource cost while still guaranteeing applica-tion QoS in various scenarios.

Future papers will consider the cost of virtual machinemigration (i.e., reassignment of a virtual machine to an-other physical host due to resource shortage in the currenthost). The profit model described in (3) does not includethe migration cost (i.e., the cost is assumed to be zero).This assumption is reasonable in the scenario where thedata center’s physical resources reside in one location andthey have adequate network bandwidth so that the migra-tion could take place in less than a few seconds. However,in the case of scale-out data centers where the migrationsmay happen between different locations, the migration delaycould be significant. The cost due to the resource overheadand reallocation delay should be incorporated into the profitmodel, which makes the profit optimization much harder.The global controller has to determine not only the quantityof physical resources allocated to each virtual container, butalso the location of those virtual containers (i.e., which phys-ical server is assigned to host the virtual container) at each

Page 14: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

226 Cluster Comput (2008) 11: 213–227

allocation period. The global controller’s allocation vectorextends to ((alct1; loct1), (alct2; loct2), . . . , (alctK ; loctK))

in which alctk and loctk represent the amount of the phys-ical resources assigned to the kth virtual container and itslocation at time t . If lock is changed from time t to t + 1,it indicates that the virtual container k is moved from onehost to another. The cost of resource overhead and potentialperformance loss caused by migration should be deductedfrom the data center’s profit which is then redefined as, forone allocation period t ,

profitt ((alct1; loct1), (alct2; loct2), . . . , (alctK ; loctK))

=K∑

k=1

[revtkalctk − pnlk(reqk − alck)

(4)− migr(loctk − loc(t−1)k)]

migr(loctk − loc(t−1)k) ={

0 if loctk = loc(t−1)k

migr otherwise

where migr is the cost of moving a virtual container fromone host to another. Clearly, the assertion that an allocationdecision for a given period won’t affect the future profit isno longer valid and the global controller has to make betterdecisions than merely maximizing its profit for the currentperiod. Considering the effect on future periods, the cumu-lative profit over a time horizon T defined in (3) is rewrittenas follows, for T allocation periods,

T∑

t=1

γ tprofitt =T∑

t=1

K∑

k=1

γ t [revkalctk − pnlk(reqk − alctk)

− migr(loctk − loc(t−1)k)] (5)

To maximize the profit using the above model is much morecomplicated than using (3) because each allocation deci-sion will affect the future decisions and the resulting profitas well. Moreover, the above profit formula requires theinformation about the future resource requests alctk (t =2, . . . , T ), indicating that the local controllers have to pro-vide n-step ahead forecast of resource needs. How to incor-porate forecasting component into the local controllers andhow to solve the optimization problem (5) in the global con-troller are the subject of ongoing work.

Acknowledgements This work was supported in part by an In-tel grant (IT R Council), National Science Foundation grant CNS-0540304, the BellSouth Foundation, equipment awards from DURIPand IBM and software donations from VMware. Any opinions, find-ings and conclusions or recommendations expressed in this materialare those of the authors and do not necessarily reflect the views ofthe National Science Foundation, BellSouth Foundation, Intel, IBM,or VMware.

References

1. Abdelzaher, T., Shin, K.G., Bhatti, N.: Performance guarantees forweb server end-systems: a control-theoretical approach. In: IEEETrans. Parallel Distrib. Syst. 13(1) (2002)

2. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A.,Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtual-ization. In: Proc. of the ACM Symposium on Operating SystemsPrinciples (SOSP), October 2003

3. Bennani, M.N., Menascé, D.A.: Resource allocation for auto-nomic data centers using analytic performance models. In: Proc.of 2nd IEEE International Conference on Autonomic Computing(ICAC) (2005)

4. Chandra, A., Gong, W., Shenoy, P.: Dynamic resource allocationfor shared data centers using online measurements. In: Proc. ofIEEE International Workshop on Quality of Service (IWQoS),June 2003

5. Chiu, S.: Fuzzy model identification based on cluster estimation.J. Intell. Fuzzy Syst. 2(3) (1994)

6. Diao, Y., Hellerstein, J.L., Parekh, S.: Using fuzzy control to max-imize profits in service level management. IBM Syst. J. 41(3)(2002)

7. Dike, J.: A user-mode port of the Linux kernel. In: Proc. of 4thAnnual Linux Showcase & Conference (ALS 2000) (2000)

8. Doyle, R., Chase, J., Asad, O., Jin, W., Vahdat, A.: Model-basedresource provisioning in a web service utility. In: Proc. of the 4thConference on USENIX Symposium on Internet Technologies andSystems, March 2003

9. Liu, X., Zhu, X., Singhal, S., Arlitt, M.: Adaptive entitlementcontrol of resource containers on shared servers. In: Proc. of 9thIFIP/IEEE International Symposium on Integrated Network Man-agement, May 2005

10. Martello, S., Toth, P.: Knapsack Problems: Algorithms and Com-puter Implementations. Wiley, New York (1990)

11. Mosberger, D., Jin, T.: httperf: a tool for measuring web serverperformance. Perform. Eval. Rev. 26(3) (1998)

12. Rolia, J., Cherkasova, L., McCarthy, C.: Configuring workloadmanager control parameters for resource pools. In: Proc. of 10thIEEE/IFIP Network Operations and Management Symposium(NOMS), April 2006

13. Sha, L., Liu, X., Lu, Y., Abdelzaher, T.: Queueing model basednetwork server performance control. In: Proc. of the 23rd IEEEReal-Time Systems Symposium (RTSS’02) (2002)

14. Sugeno, M., Yasukawa, T.: A fuzzy-logic-based approach to qual-itative modeling. IEEE Trans. Fuzzy Syst. 1(1) (1993)

15. Sugerman, J., Venkitachalam, G., Lim, B.: Virtualizing I/O de-vices on VMware workstation’s hosted virtual machine monitor.In: Proc. of 2001 USENIX Annual Technical Conference, June2001

16. Tesauro, G.: Online resource allocation using decompositional re-inforcement learning. In: Proc. of the Twentieth National Confer-ence on Artificial Intelligence (AAAI-05), July 2005

17. Tesauro, G., Jong, N., Das, R., Bennani, M.: On the use of hybridreinforcement learning for autonomic resource allocation. Clust.Comput. 10(3) (2007)

18. Urgaonkar, B., Pacifici, G., Shenoy, P., Spreitzer, M., Tantawi, A.:An analytical model for multi-tier internet services and its applica-tions. In: Proc. of ACM Sigmetrics Conference (SIGMETRICS),Jun 2005

19. Wang, Z., Zhu, X., Singhal, S.: Utilization and SLO-based con-trol for dynamic sizing of resource partitions. In: Proc. of 16thIFIP/IEEE Distributed Systems: Operations and Management(DSOM), October 2005

20. Xu, W., Zhu, X., Singhal, S., Wang, Z.: Predictive control for dy-namic resource allocation in enterprise data centers. In: Proc. of2006 IEEE/IFIP Network Operations & Management Symposium,April 2006

Page 15: Autonomic resource management in virtualized data centers ...visa.lab.asu.edu/web/wp-content/uploads/2015/12/cluster-computing-09.pdfers, and enabling a virtualized data center where

Cluster Comput (2008) 11: 213–227 227

21. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965)22. Zhu, X., Wang, Z., Singhal, S.: Utility-driven workload manage-

ment using nested control design. In: Proc. of American ControlConference (ACC), June 2006

23. HP-UX Workload Manager, http://docs.hp.com/en/5990-8153/ch05s12.html

24. See http://www.cs.virginia.edu/~rz5b/software/software.htm25. See http://ita.ee.lbl.gov/html/contrib/WorldCup.html26. See https://blueprints.dev.java.net/petstore/

Jing Xu is a PhD candidate in theDepartment of Electrical and Com-puter Engineering and a member ofthe Advance Computing and Infor-mation Systems Laboratory at theUniversity of Florida. She receivedthe degree of B.E. from Universityof Science and Technology of Chinain 2002. Her research interests are inthe areas of distributed/grid comput-ing, autonomic computing, and vir-tualization.

Ming Zhao is a PhD candidate inthe Department of Electrical andComputer Engineering and a mem-ber of the Advance Computing andInformation Systems Laboratory atthe University of Florida. He re-ceived the degrees of BE and MEfrom Tsinghua University. His re-search interests are in the areas ofdistributed computing, autonomiccomputing, and virtualization. Heis a student member of IEEE andACM.

José Fortes (S’80–M’83–SM’92–F’99) received the Ph.D. degreein electrical engineering from theUniversity of Southern California,Los Angeles, in 1984. From 1984to 2001, he was on the faculty ofthe School of Electrical and Com-puter Engineering, Purdue Univer-sity, West Lafayette, IN. In 2001,he joined both the Department ofElectrical and Computer Engineer-ing and the Department of Com-puter and Information Science andEngineering, University of Florida,Gainesville, as a Professor and Bell-

South Eminent Scholar. At the University of Florida he founded anddirects the Advanced Computing and Information Systems (ACIS) lab-oratory and the NSF Industry-University Cooperative Center on Au-tonomic Computing. His current research interests are in the areas ofdistributed computing and autonomic computing. Dr. Fortes was a Dis-tinguished Visitor of the IEEE Computer Society from 1991 to 1995.

Robert Carpenter joined Intel in1997 after a 20-year career as an at-torney, retiring as district attorneyof Greene County, NY in 1996. Formany years, he taught mathemat-ics at Bard College and later lawat Albany Law School. After serv-ing as the Business Process Automa-tion architect within IT, he was theIT architect for SOE [Service Ori-ented Enterprise] working closelywith the platform groups sharingtime between Beta SI and IT Re-search. Robert Carpenter was de-ceased on November 2nd, 2007.

Mazin Yousif is a Chief SystemsArchitect at Numonyx Corporation.Prior to that, Mazin was a PrincipalEngineer and Director in the Corpo-rate Technology Group of Intel Cor-poration in Hillsboro, OR, where heled a team that focused on platformprovisioning & virtualization to en-able platform autonomics & Capac-ity on Demand (CoD) in a Scale-out Environment. Mazin was alsoone the principal architects definingthe InfiniBand Architecture, duringwhich he chaired the Management

Working Group (MgtWG) in the InfiniBand Trade Association (IBTA).From 1993 to 1995, he was an assistant professor in the ComputerScience Department at Louisiana Tech University. Mazin worked forIBM’s xSeries Server Division in Research Triangle Park (RTP), NC,from 1995–2000. Mazin also served as Adjunct Professor at severaluniversities including the Oregon Graduate Institute (OGI), Duke Uni-versity and North Carolina State University.Mazin finished his Master and Ph.D. degrees from the PennsylvaniaState University in 1987 and 1992, respectively.Mazin’s research interests include Computer Architecture, Autonomicand Grid Computing, Clustered Architectures, workload characteriza-tion and workload-driven platform architectures He has published 50+articles in his areas of research. Mazin chaired the program commit-tee of several conferences and workshops and has been in the programcommittees of many others. Mazin is in the advisory board of the Jour-nal of Pervasive Computing and Communications (JPCC), and is aneditor in Cluster Computing, the Journal of Networks, Software Toolsand Applications. He is also in the Advisory board of ERCIM. Mazinis an IEEE senior member.