
ACCRS: autonomic based cloud computing resource scaling


Cluster Comput
DOI 10.1007/s10586-016-0682-6

ACCRS: autonomic based cloud computing resource scaling

Ziad A. Al-Sharif 1 · Yaser Jararweh 2 · Ahmad Al-Dahoud 2 · Luay M. Alawneh 1

Received: 12 June 2016 / Accepted: 2 November 2016
© Springer Science+Business Media New York 2016

Abstract A cloud computing model gives cloud service providers the ability to retain multiple workloads on a single physical system. However, efficient resource provisioning and possible system fault management in the cloud can be a challenge. Early fault detection can provide room to recover from potential faults before impacting QoS. Current static techniques of fault management in computing systems are not satisfactory enough to safeguard the QoS requested by cloud users. Thus, new smart techniques are needed. This paper presents the ACCRS framework for cloud computing infrastructures to advance the system's utilization level, reduce cost and power consumption, and fulfill SLAs. The ACCRS framework employs the basic components of Autonomic Computing, which include state monitoring, planning, decision making, fault prediction, detection, and root cause analysis for recovery actions, to improve the system's reliability, availability, and utilization level by scaling resources in response to changes in the cloud system state.

✉ Ziad A. Al-Sharif
[email protected]

Yaser Jararweh
[email protected]

Luay M. Alawneh
[email protected]

1 Software Engineering Department, Jordan University of Science and Technology, Irbid 22110, Jordan

2 Computer Science Department, Jordan University of Science and Technology, Irbid 22110, Jordan

Keywords Cloud computing · Resource scaling · Autonomic computing · Quality of service · Energy efficiency

1 Introduction

A Cloud Service Provider (CSP) provides and maintains on-demand computing services to users with an acceptable Quality of Service (QoS). Cloud Users (CUs) are relieved of system maintenance, resource provisioning, and service continuity, which become the obligations of CSPs. This permits CUs to focus on advancing their business without wasting time on system-related issues. Furthermore, the cloud computing model gives CSPs the ability to operate multiple workloads on a single physical system, which tremendously reduces cost and power consumption and increases resource utilization. However, this model incurs a number of challenges, some of which relate to efficient resource provisioning (scheduling) and potential system faults that may cause service interruption and consequently affect a CSP's profit, market share, and reputation. Service Level Agreements (SLAs) constrain CSPs to ensure service availability, reliability, and continuity. In a cloud system, the number of possible faults creates a critical challenge for CSPs and their SLAs. Early fault detection gives CSPs room to recover from faults before QoS is impacted.

On the other hand, Autonomic Computing in Cloud (ACC) is the cloud system's ability to manage itself given high-level objectives [1–4]. Cloud computing models are growing too large, complex, and costly to be managed manually, yet workloads and environment conditions tend to change rapidly. Thus, autonomic decisions and actions are needed. The goal is to make cloud computing systems and their applications capable of managing themselves with minimum human interference. Thus, ACC tries to ensure the system's survivability: the system's ability to maintain near-optimal performance with minimum resources, protect itself from all types of attacks, recover from faults, and react to changes in the environment by automatically reconfiguring its resources. Environment changes can be internal, such as excessive CPU utilization and high power consumption, and/or external, such as attacks and spikes in the incoming workloads, any of which can impact the system's equilibrium [5]. Thus, the system must be able to modify itself to counter the effects of environmental changes and maintain its equilibrium. The changes are analyzed to determine whether any of the essential variables exceeds its viability limits. If so, a predefined plan is triggered to determine the proper changes to inject into the current behavior of the system such that it accommodates these changes and returns the system to a stable state within the new environment.

However, current static techniques for fault management in computing systems are not satisfactory enough [6–10] to handle ACC and provide the QoS requested by CUs. Thus, new smart yet comprehensive procedures are needed. This paper presents the on-demand Autonomic Cloud Computing Resource Scaling (ACCRS) framework. The ACCRS framework employs resource provisioning in cloud environments and dynamically scales cloud resources based on available system resources, utilization level, and SLAs. It improves cloud system reliability and availability by applying proactive fault detection techniques to prevent fault occurrence, where possible, and by employing reactive recovery techniques when faults occur. ACCRS provides a mechanism whereby changes in the cloud system's essential variables (i.e., performance, power, faults, security, etc.) trigger changes to the behavior of the computing system such that the system is brought back into equilibrium with respect to the environment.

ACCRS helps CSPs in several ways. First, it reduces costs and power consumption by increasing the utilization levels and efficiency of equipment and facilities. Second, it creates dynamic network policies that allow dealing with larger workloads and higher demands while maintaining reliability and availability. Finally, it increases the velocity at which IT can respond to business needs and satisfies the variety of these needs. The ACCRS framework is composed of resource scaling to optimize operational cost, root cause analysis of faults, and early fault detection and prevention with a fast recovery to the system's normal state, which is identified as a safe zone.

The rest of this paper presents and evaluates the ACCRS framework. Section 2 highlights some of the related research. Section 3 presents the details of the ACCRS framework. Section 4 presents the results that are generated using CloudExp. Finally, Section 5 concludes our findings and presents our planned future work.

2 Related work

Various researchers have tackled the utilization of a cloud system from different perspectives. Buyya et al. [11] presented a data analytics workflow engine, a prototype for dynamic resource provisioning. This system monitors the system workflow and calls the resource manager to increase or decrease cloud resources. The Aneka system [12] presents a resource provisioning algorithm based on SLA orientation, comparing the time needed to accomplish the current jobs with the proposed SLA timeline. The Coasters system [13] provides uniform resource provisioning and access for clouds and grids. It aims to build a uniform system to access a multi-service system, such as a grid or a cloud, and its main goal is to meet both usability and performance goals. Lee et al. [14] presented an optimal cloud resource provisioning (OCRP) algorithm that tries to minimize both under-provisioning and over-provisioning under demand and price uncertainty in cloud computing environments. Dejun et al. [15] presented a resource provisioning approach for web applications in the cloud. Their approach relies on performance profiling, where each tier consists of multiple hosts running the same application. Scarce [16] deploys an agent at the server side that is responsible for managing the resources and checking the system's health. A Federated Cloud Environment [17] is a collection of IaaS providers that interact with each other in order to minimize the cost of resource provisioning.

The authors in [18] presented a resource provisioning approach that provides users with control over the resource manager. Limited look-ahead control (LLC) [19] proposed a prediction algorithm that solves resource provisioning using model predictive control techniques. In [20], the authors presented guided redundant submission (GRS) for high performance distributed computing (HPDC), which applies resource provisioning to slot allocation. Panda et al. [21] proposed several task scheduling algorithms for heterogeneous multi-cloud systems. The authors in [22] presented resource provisioning using virtual machine multiplexing, which defines performance measurement and resource provisioning through SLAs. Finally, the authors in [23] presented dynamic resource provisioning using a multi-agent based technique.

An autonomic system can be a collection of autonomic components, which manage their internal behaviors and their relationships with others in accordance with a set of predefined policies. An autonomic computing system has properties [24] such as self-optimizing, self-protecting, self-configuring, and self-healing. It should be noted that an autonomic computing system addresses these issues in an integrated manner rather than treating them in isolation. Consequently, the system design paradigm takes a holistic approach that can integrate all these attributes seamlessly and efficiently. In our work, we mainly focus on the self-configuring property of a cloud system. The ACCRS framework aims to provide the optimal resource allocation to satisfy users' SLAs, reduce power consumption, and increase resource efficiency (a higher utilization level).

Algorithm 1 System State Analysis and Decision Making
1: procedure SSA-DMA
2:   WorkloadType ← SSA-WCA
3:   NeededHosts ← SSA-Predict
4:
5:   ▷ Check system for potential faults (i.e., hardware failure)
6:   while WorkloadType ∈ SafeZone ∧ FAULT == FALSE do
7:     DoNothing
8:   while WorkloadType ∉ SafeZone ∧ FAULT == FALSE do
9:     if WorkloadType == LIGHT then
10:      CloudState ← UU
11:      Decrease resources based on number of NeededHosts
12:    if WorkloadType == HEAVY then
13:      CloudState ← OU
14:      if Resources == Available then
15:        Increase resources based on number of NeededHosts
16:      else
17:        Add-to-WaitingQueue
18:  if FAULT == TRUE then
19:    ▷ Apply RCA techniques and identify faulty resources (i.e., hardware ID)
20:    if WorkloadType == LIGHT then
21:      CloudState ← UUF
22:      Decrease/increase resources based on number of NeededHosts
23:      Migrate obstructed VMs
24:    else if WorkloadType == HEAVY then
25:      CloudState ← OUF
26:      if Resources == Available then
27:        Increase resources based on number of NeededHosts
28:        Replace infected hosts
29:        Migrate obstructed VMs
30:      else
31:        Add-to-WaitingQueue

Algorithm 2 System State Analysis and Workload Classification
1: procedure SSA-WCA
2:   cpu ← current level of CPU
3:   ram ← current level of RAM
4:   bw ← current level of bandwidth
5:
6:   ▷ CPU, RAM, and bandwidth thresholds are 80%, 86%, and 63%, respectively
7:   if cpu > 80% ∨ ram > 86% ∨ bw > 63% then return HEAVY
8:   else if cpu < 70% ∨ ram < 70% ∨ bw < 50% then return LIGHT
9:   else return SafeZone
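The classification rule of Algorithm 2 is compact enough to sketch directly. The following Python rendering is ours, not the authors' implementation; the function and constant names are illustrative, while the thresholds are the experimentally derived values stated in the paper:

```python
# Thresholds reported in the paper (Sect. 3.2): a workload is HEAVY when
# CPU > 80%, RAM > 86%, or bandwidth > 63% utilization; LIGHT when
# CPU < 70%, RAM < 70%, or bandwidth < 50%; otherwise it is in the safe zone.
HEAVY_CPU, HEAVY_RAM, HEAVY_BW = 0.80, 0.86, 0.63
LIGHT_CPU, LIGHT_RAM, LIGHT_BW = 0.70, 0.70, 0.50

def classify_workload(cpu: float, ram: float, bw: float) -> str:
    """Classify utilization levels (fractions in 0.0-1.0)."""
    if cpu > HEAVY_CPU or ram > HEAVY_RAM or bw > HEAVY_BW:
        return "HEAVY"       # over-utilized: scale up
    if cpu < LIGHT_CPU or ram < LIGHT_RAM or bw < LIGHT_BW:
        return "LIGHT"       # under-utilized: scale down
    return "SAFE_ZONE"       # roughly the 70-80% band: do nothing
```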

Fig. 1 The ACCRS framework

3 ACCRS framework

Figure 1 depicts the ACCRS framework and its major components. These components are explained below.

3.1 System state monitoring (SSM)

This component is responsible for recording CPU, RAM, and bandwidth utilization levels, system throughput, and power consumption data.
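A minimal sketch of the kind of sample the SSM could record follows; the paper does not define a data format, so the field names and the stub values are our assumptions:

```python
from dataclasses import dataclass
import time

@dataclass
class StateSample:
    """One monitoring sample of the metrics listed above (illustrative)."""
    timestamp: float    # seconds since epoch
    cpu_util: float     # CPU utilization, 0.0-1.0
    ram_util: float     # RAM utilization, 0.0-1.0
    bw_util: float      # bandwidth utilization, 0.0-1.0
    throughput: int     # e.g., tasks completed in the sampling interval
    power_watts: float  # measured power consumption

def sample_stub() -> StateSample:
    # In a real deployment these values would come from hypervisor counters;
    # here they are fixed placeholders.
    return StateSample(time.time(), 0.75, 0.72, 0.55, 40, 31000.0)
```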

3.2 System state analyses and decision making algorithm (SSA-DMA)

This component processes the data collected by the SSM component above. It uses Algorithm 1 to make the correct decision. The data is first checked for any possible hardware failure by measuring the system throughput. We assume that the system input must match the system output based on the cloud system flow and the number of active VMs (i.e., 50 VMs). In case of hardware failure, we use the Root Cause Analysis (RCA) algorithm [25,26] to find the fault origin (i.e., host ID). RCA is used to identify and replace the failed hosts. In an error-free state, we monitor the system utilization level to identify the workload intensity (Fig. 2).

Algorithm 2 presents the Workload Classification Algorithm (WCA) that identifies the system's workload as heavy or light weight. WCA detects a heavy workload by measuring the utilization levels of the system's RAM, CPU, and bandwidth and checking whether any of these attributes breaches the 86, 80, and 63% thresholds, respectively; these thresholds were identified experimentally. A system in a faulty state or under a heavy workload needs resource scaling, i.e., an increased number of hosts, in order to return to the safe zone, which is between 70 and 80% utilization. It is reported by IBM that the ideal system utilization level for near-optimal power consumption is 75% [27]. The light workload state is considered when system utilization is below 70%. A system with a lightweight workload consumes unnecessary power, as in the case of a heavy workload, but with a low utilization level and low throughput. By decreasing the number of hosts to an optimal number, we can achieve a better system utilization level within the system's safe zone. This also optimizes the power consumption and system performance without impacting the system's reliability, availability, and end-users' SLAs.

Fig. 2 System's safe zone [5]

ACCRS identifies a cloud system in one of five states:

1. Safe zone: the system performs properly and its utilization level is within 70–80%, which means the system has an optimal utilization level, power consumption, and QoS (the SLA is maintained).

2. Under-utilization (UU): occurs when the system encounters a light workload. The power consumption for this workload is high while its throughput is low.

3. Under-utilization with fault (UUF): occurs when the system encounters a light workload with some faulty resource (i.e., a hardware failure).

4. Over-utilization (OU): occurs when the system encounters a heavy workload that puts the system utilization and throughput at a high level. Increasing the system utilization further will cause the system to drop incoming workloads or delay them in a waiting queue.

5. Over-utilization with fault (OUF): occurs when the system encounters a heavy workload with some faulty resources (i.e., a hardware failure).
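The five states above are a function of the workload class and a fault flag, as Algorithm 1 traverses them. They could be encoded as follows; the state and function names are ours, and the handling of a fault while utilization is inside the safe band is our assumption, since the paper does not spell that case out:

```python
from enum import Enum

class CloudState(Enum):
    SAFE_ZONE = "safe zone"
    UU = "under-utilization"
    UUF = "under-utilization with fault"
    OU = "over-utilization"
    OUF = "over-utilization with fault"

def cloud_state(workload: str, fault: bool) -> CloudState:
    """Map a workload class (HEAVY / LIGHT / SAFE_ZONE) and a fault flag
    to one of the five ACCRS states described above."""
    if workload == "SAFE_ZONE" and not fault:
        return CloudState.SAFE_ZONE
    if workload == "LIGHT":
        return CloudState.UUF if fault else CloudState.UU
    if workload == "HEAVY":
        return CloudState.OUF if fault else CloudState.OU
    # Fault while in the safe band: treat as under-utilization-with-fault
    # so that RCA and recovery still run (our assumption).
    return CloudState.UUF
```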

3.3 Host-level resource scaling (H-LRS)

This component has a Global Cloud Manager (GCM) that performs resource scaling based on the best cluster configuration determined by the previous component (the SSA-DMA). Experimentally, and based on a normal cloud system configuration, we found that one host under a heavy workload can deal with 5–6 VMs. Consequently, these results are used to predict the amount of resources needed when increasing or decreasing the number of hosts. The resource scaling process decides whether to scale resources up or down in order to reach near-optimal utilization and power consumption while maintaining a high level of QoS.
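Putting the pieces together, the H-LRS cycle can be sketched as a monitor–classify–scale loop. The callables below stand in for the components of Sects. 3.1–3.3; this is an illustration of the control flow, not the authors' code:

```python
def accrs_cycle(get_metrics, classify, predict_hosts, scale_to):
    """One iteration of an ACCRS-style control loop (illustrative).

    get_metrics()   -> (cpu, ram, bw) utilization levels, 0.0-1.0 (SSM)
    classify(...)   -> "HEAVY", "LIGHT", or "SAFE_ZONE" (Algorithm 2)
    predict_hosts() -> target number of hosts (Algorithm 3)
    scale_to(n)     -> actuator that scales the cluster to n hosts
    """
    cpu, ram, bw = get_metrics()
    state = classify(cpu, ram, bw)
    if state == "SAFE_ZONE":
        return state                 # within the safe zone: do nothing
    scale_to(predict_hosts())        # scale up (HEAVY) or down (LIGHT)
    return state
```

In a deployment this function would run periodically; the safe-zone branch makes the loop idle when no action is needed, mirroring the DoNothing case of Algorithm 1.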

Continuous system monitoring is required to collect the system's utilization-level information. The main objective is to keep the system within its safe zone as much as possible. If the system is out of its safe zone, ACCRS tries to return it by scaling system resources up or down. Furthermore, a two-phase prediction algorithm is used to predict the optimal number of hosts needed; see Algorithm 3. Through our experimentation, we found that for light and heavy workloads we can predict the number of hosts to remove or add in order to save power, increase utilization, and prevent the system from dropping or even delaying user requests. The system specification contains the current number of hosts (HN), the RAM per host (HR), the CPU per host (HC), and the safe zone threshold, which is found based on the number of VMs (VMN), the maximum memory (VMR) and CPU (VMC) specifications that each VM could have, and the safe zone boundaries.

– Phase 1 Lines 13–16 of Algorithm 3 produce a prediction of how many resources are needed for the current workload. These equations are experimentally formulated. Their values are used in the second phase to produce the near-optimal number of hosts needed to serve the current workload within an acceptably high utilization level.

Algorithm 3 Host-Level Resource Prediction (H-LRP)
1: procedure SSA-Predict
2:
3:   ▷ Host specifications
4:   HN ← total number of hosts
5:   HR ← amount of RAM per host
6:   HC ← CPU capacity per host
7:
8:   ▷ VM specifications
9:   VMN ← total number of VMs
10:  VMR ← amount of RAM per VM
11:  VMC ← CPU capacity per VM
12:
13:  RAMNeededHosts ← HN × HR × SafeZoneR
14:  CPUNeededHosts ← HN × HC × SafeZoneC
15:  RAMNeededVMs ← VMN × VMR × SafeZoneR
16:  CPUNeededVMs ← VMN × VMC × SafeZoneC
17:
18:  ▷ Needed RAM
19:  RAMUsagePrediction ← RAMNeededVMs ÷ RAMNeededHosts
20:
21:  ▷ Needed CPU
22:  CPUUsagePrediction ← CPUNeededVMs ÷ CPUNeededHosts
23:
24:  ▷ Predicted number of hosts
25:  #NeededHosts ← ⌈max(RAMUsagePrediction × HN, CPUUsagePrediction × HN)⌉
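A direct Python transcription of Algorithm 3 might look as follows. The function and parameter names are ours; the safe-zone factors default to the RAM and CPU thresholds reported in the paper:

```python
import math

def predict_hosts(h_n, h_ram, h_cpu, vm_n, vm_ram, vm_cpu,
                  safezone_ram=0.86, safezone_cpu=0.80):
    """Two-phase host prediction following Algorithm 3 (H-LRP).

    Phase 1: capacity available across hosts vs. capacity demanded by VMs,
    both scaled by the safe-zone thresholds (lines 13-16).
    Phase 2: take the more constraining of the RAM and CPU ratios and
    round up to a whole number of hosts (lines 19, 22, 25).
    """
    ram_hosts = h_n * h_ram * safezone_ram   # usable RAM across hosts
    cpu_hosts = h_n * h_cpu * safezone_cpu   # usable CPU across hosts
    ram_vms = vm_n * vm_ram * safezone_ram   # RAM demanded by the VMs
    cpu_vms = vm_n * vm_cpu * safezone_cpu   # CPU demanded by the VMs

    ram_ratio = ram_vms / ram_hosts          # line 19
    cpu_ratio = cpu_vms / cpu_hosts          # line 22
    return math.ceil(max(ram_ratio, cpu_ratio) * h_n)  # line 25
```

For example, with 10 hosts and 35 VMs, and assuming (our assumptions, not the paper's) that a dual-core 5000 MIPS host totals 10,000 MIPS and that an average VM holds 0.75 GB of RAM, `predict_hosts(10, 4.5, 10000, 35, 0.75, 2000)` predicts 7 hosts, i.e., 5 VMs per host, consistent with the reported 5–6 VMs per host.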


Fig. 3 ACCRS: multi-level resource scaling

– Phase 2 ACCRS predicts the number of hosts based on the maximum (see line 25 of Algorithm 3) of the two values produced by lines 19 and 22 of Algorithm 3. Our experimental results show that the system's operational safe zone corresponds to an approximate utilization between 70 and 80%. Moreover, the upper border of the safe zone is about 86, 80, and 63% utilization for RAM, CPU, and bandwidth, respectively.

3.4 VM-level resource scaling (VM-LRS)

Unlike the H-LRS and its GCM, this component has a Local Cloud Manager (LCM) that performs resource scaling based on the utilization level of the VM itself. Its main objective is to reconfigure (i.e., scale) the VMs' specifications in order to cope with dynamic changes in the workload. For example, a VM might have a light workload while it is configured to handle a heavy one. By dynamically changing the VMs' resource configurations, ACCRS adds more VMs to the same active host without the need to employ new hosts. As shown in Fig. 3, the GCM administers the total system flow whereas the LCM is responsible for monitoring the VMs and the cloud user's demand (the utilization level on the VM itself). Algorithm 4 presents the VM-LRS resource scaling, which decides which VM will be scaled down in order to free more resources. This approach increases the system throughput without the need for extra new resources, with only a slight increase in power consumption.

Algorithm 4 VM-Level Resource Scaling
1: procedure VM-LRS
2:   for each VMi ∈ VMs do
3:     if actual utilization of VMi > 65% then
4:       if utilization ∈ 80–86% of reserved resources then
5:         Do not scale this VM
6:       else
7:         Decrease VMi reserved resources
8:     else
9:       Decrease VMi reserved resources
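The per-VM decision of Algorithm 4 can be sketched as below. The 65% trigger and the 80–86% keep-band come from the algorithm; the shrink rule in the final line is our assumption, since the paper does not specify how much to decrease a reservation:

```python
def scale_vm_reservation(actual_util: float, reserved: float) -> float:
    """Sketch of the VM-LRS decision in Algorithm 4 (names are ours).

    actual_util: utilization of the VM's reserved resources, 0.0-1.0.
    reserved:    currently reserved resource units for the VM.
    Returns the new reservation.
    """
    if actual_util > 0.65 and 0.80 <= actual_util <= 0.86:
        return reserved  # already near the safe band: do not scale this VM
    # Otherwise the VM holds more than it uses: shrink the reservation
    # toward the observed demand, never below half of the current
    # reservation (both the target of 80% and the floor are assumptions).
    return max(reserved * actual_util / 0.80, reserved * 0.5)
```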

4 Experimental and simulated results

4.1 Experimental setup

CloudExp [28] is built on top of CloudSim, a cloud computing modeling and simulation tool [29]. CloudExp is a rich, comprehensive, yet simple, easy-to-use, and efficient cloud computing modeling and simulation toolkit. It adds new features such as support for different types of workloads. Users can build a cloud infrastructure and customize all its aspects, from the host processing nodes to the network topology. It also allows users to integrate SLAs and other business aspects, and it includes an extensive workload generator capable of accurately representing real-world cloud applications. It allows users to comprehend the different cloud system components and their roles in the whole system. Users can modify various components and their parameters, run simulations, and analyze results. Hence, CloudExp's modular design allows users to integrate new components and extend existing ones easily and effectively.

We based our workload on the Rain workload generator [30]. Each user (or task) is assigned to a certain generator and a thread that is executed in the assigned VM. When the thread finishes executing, it generates a summary (e.g., status, execution results, etc.). The experiments ran for 48 h with a space-shared VM allocation policy. Table 1 shows the system setup specifications used in the experiments. Our experimental setup consists of 10 hosts with fixed resources. The ACCRS prediction algorithm can predict the near-optimal number of hosts that can run the current workload. Algorithms 1, 2, and 3 allow us to predict the number of hosts needed in order to minimize power consumption and reach the optimal utilization level by returning the system to its safe zone.

Table 1 System's specifications

Component  RAM                    CPU (MIPS)        Bandwidth
Host       4.5 GB                 5000 (dual core)  1,000,000
VM         Random (512–1024) MB   2000 (1 core)     Random (100,000–150,000)

4.2 Results

The efficacy of the ACCRS framework has been simulated entirely in CloudExp [28]. The experiments consist of creating multiple cloud environments to simulate heavy and light workloads. The setup contains several types of workload patterns tested on stable and unstable environments (with faults). A stochastic model is applied to simulate different scenarios of cloud environments, users, and user requests (workloads). Different types of cloud environments are used. Figure 4 presents the system's normal flow with an increasing workload: the RAM, CPU, and bandwidth utilization and the power consumption increase as the workload increases over time. Figure 5 presents the cloud system's behavior with workloads changing over time. The purpose of this experiment is to show how the system utilization and power consumption increase or decrease with the workload intensity.

Fig. 4 System state monitoring with increasing workloads

Fig. 5 System state monitoring with random workloads

Fig. 6 Increasing demand on cloud (system utilization with resource scaling)

Fig. 7 Increasing demand on cloud (power consumption with resource scaling)

4.3 Host-level resource scaling (H-LRS)

Figure 6 presents the system with an increasing workload and a fixed number of hosts. The system starts dropping VMs when it reaches 86% RAM, 80% CPU, and 63% bandwidth utilization. This may indicate a possible fault or an over-utilization state. Consequently, the system state diverges from the safe zone and ACCRS needs to scale up resources. By increasing the number of hosts and redistributing the workloads, the system returns to its normal state (safe zone). On the other hand, Fig. 7 shows the corresponding increase in power consumption, a trade-off that we are willing to pay in order to prevent any possible SLA violation.

Fig. 8 RAM utilization level for the under-utilization state (with resource scaling)

Fig. 9 CPU utilization level for the under-utilization state (with resource scaling)

Figures 8 and 9 present the RAM and CPU utilization levels in the under-utilization state with a light workload. When the hosts' allocated resources exceed what the current workload needs, the system works perfectly but with a low utilization level and high power consumption. In this case, ACCRS scales down the system resources (i.e., the number of hosts) in order to increase the utilization level and reduce the power consumption. Figure 10 presents the bandwidth utilization level when dealing with a light workload. ACCRS reduces the power consumption by removing hosts (i.e., scaling down) and redistributing the workloads among the remaining hosts.

Approximately, we can save up to 8% in power consumption by running 9 out of 10 hosts, and up to 22% by running 8 out of 10 hosts for the same workload. These figures show that no further scaling down of resources can be made once 8 hosts are reached; at that point, system utilization is near its maximum. Any further resource scaling would diverge the system from its safe zone, unless there is a change in the workload.

Fig. 10 BW utilization level for the under-utilization state (with resource scaling)

Fig. 11 Fault injection in a stable environment

In Fig. 11, we injected faults (i.e., infected hosts at time frames 9–11). The throughput then started to go down while the utilization level went up. Faults are injected by reducing the system's RAM by 75, 50, and 25% in successive time frames. After the infected host is located, it is replaced by a new one in order to bring the system's state back to normal. As shown in Fig. 11, in time frames 13–18 the system is back to its normal state.

In our experimental setup, the system can handle up to 50 VMs. In another experiment, we used 60 VMs with the same system specifications as above. In this experiment, the system needed to queue the extra 10 VMs until resources became available. Table 2 presents the time and power consumption after scaling up resources versus queuing the extra VMs. The ACCRS framework provides a 20.5% power saving compared with the queuing technique. It also reduces the execution time by 5.2%.

Table 2 Resource scaling versus queuing

            ACCRS   Queuing   Gain (%)
Power (W)   42100   52300     20.5
Time (min)  260     285       5.2

Fig. 12 Power consumption before and after the VM-LRS is applied

Fig. 13 System throughput before and after the VM-LRS is applied

4.4 VM-level resource scaling (VM-LRS)

Figure 12 shows the RAM and CPU allocated to a user on one host and their actual usage by a single VM created on that host. The VM is configured with about 30% more resources than it actually needs. By reconfiguring the VM's allocated resources, we gain extra resources to create new VMs on the same host. Figure 13 presents the new utilization levels after applying ACCRS's LCM, which brings the VM's utilization levels near the reserved utilization level.

5 Conclusion and future work

This paper presented the on-demand Autonomic Cloud Computing Resource Scaling (ACCRS) framework for cloud computing infrastructures. By applying global and local ACCRS to a cloud system, we can increase system availability and reliability, reduce power consumption, and keep the utilization rate within a safe zone. ACCRS scales the number of physical hosts and VMs up or down and provides early detection of hardware failures. This keeps the system running within a safe zone of near-optimal power consumption and performance. One drawback of this work is the use of simulation models to conduct the experiments and the evaluation. As future work, we aim to apply the ACCRS approach to a real cloud infrastructure, which will enable us to compare our simulated results with ones generated in a real environment.

References

1. Parashar, M., Hariri, S.: Autonomic Computing: Concepts, Infrastructure, and Applications. CRC Press (2006)

2. Al-Dahoud, A., Al-Sharif, Z., Alawneh, L., Jararweh, Y.: Autonomic cloud computing resource scaling. In: 4th International IBM Cloud Academy Conference (ICACON 2016), University of Alberta, Edmonton, Canada, IBM (2016)

3. Jararweh, Y., Al-Ayyoub, M., Darabseh, A., Benkhelifa, E., Vouk, M., Rindos, A.: Software defined cloud: survey, system and evaluation. Future Gener. Comput. Syst. 58, 56–74 (2016)

4. Darabseh, A., Al-Ayyoub, M., Jararweh, Y., Benkhelifa, E., Vouk, M., Rindos, A.: SDDC: a software defined datacenter experimental framework. In: Future Internet of Things and Cloud (FiCloud), 2015 3rd International Conference on, pp. 189–194 (2015)

5. Jararweh, Y.: Autonomic Programming Paradigm for High Performance Computing. PhD thesis, University of Arizona, Tucson (2010). AAI3423763

6. Dai, Y., Xiang, Y., Zhang, G.: Self-healing and hybrid diagnosis in cloud computing. In: Cloud Computing, pp. 45–56. Springer (2009)

7. Bhaduri, K., Das, K., Matthews, B.L.: Detecting abnormal machine characteristics in cloud infrastructures. In: Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pp. 137–144, IEEE (2011)

8. Alhosban, A., Hashmi, K., Malik, Z., Medjahed, B.: Self-healing framework for cloud-based services. In: Computer Systems and Applications (AICCSA), 2013 ACS International Conference on, pp. 1–7, IEEE (2013)

9. Buyya, R., Ramamohanarao, K., Leckie, C., Calheiros, R.N., Dastjerdi, A.V., Versteeg, S.: Big data analytics-enhanced cloud computing: challenges, architectural elements, and future directions. arXiv preprint, arXiv:1510.06486 (2015)

10. Islam, S., Keung, J., Lee, K., Liu, A.: An empirical study into adaptive resource provisioning in the cloud. In: IEEE International Conference on Utility and Cloud Computing (UCC 2010), p. 8 (2010)

11. Buyya, R., Garg, S.K., Calheiros, R.N.: SLA-oriented resource provisioning for cloud computing: challenges, architecture, and solutions. In: Cloud and Service Computing (CSC), 2011 International Conference on, pp. 1–10, IEEE (2011)


12. Vecchiola, C., Chu, X., Buyya, R.: Aneka: a software platform for .NET-based cloud computing. High Speed Larg. Scale Sci. Comput. 18, 267–295 (2009)

13. Hategan, M., Wozniak, J., Maheshwari, K.: Coasters: uniform resource provisioning and access for clouds and grids. In: Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on, pp. 114–121, IEEE (2011)

14. Chaisiri, S., Lee, B.-S., Niyato, D.: Optimization of resource provisioning cost in cloud computing. IEEE Trans. Serv. Comput. 5(2), 164–177 (2012)

15. Dejun, J., Pierre, G., Chi, C.-H.: Resource provisioning of web applications in heterogeneous clouds. In: Proceedings of the 2nd USENIX Conference on Web Application Development, pp. 5–5, USENIX Association (2011)

16. Bonvin, N., Papaioannou, T.G., Aberer, K.: Autonomic SLA-driven provisioning for cloud applications. In: Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 434–443, IEEE Computer Society (2011)

17. Toosi, A.N., Calheiros, R.N., Thulasiram, R.K., Buyya, R.: Resource provisioning policies to increase IaaS provider's profit in a federated cloud environment. In: High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference on, pp. 279–287, IEEE (2011)

18. Juve, G., Deelman, E.: Resource provisioning options for large-scale scientific workflows. In: eScience, 2008. eScience'08. IEEE Fourth International Conference on, pp. 608–613, IEEE (2008)

19. Kusic, D., Kephart, J.O., Hanson, J.E., Kandasamy, N., Jiang, G.: Power and performance management of virtualized computing environments via lookahead control. Clust. Comput. 12(1), 1–15 (2009)

20. Kee, Y.-S., Kesselman, C.: Grid resource abstraction, virtualization, and provisioning for time-targeted applications. In: Cluster Computing and the Grid, 2008. CCGRID'08. 8th IEEE International Symposium on, pp. 324–331, IEEE (2008)

21. Panda, S.K., Jana, P.K.: Efficient task scheduling algorithms for heterogeneous multi-cloud environment. J. Supercomput. 71(4), 1505–1533 (2015)

22. Meng, X., Isci, C., Kephart, J., Zhang, L., Bouillet, E., Pendarakis, D.: Efficient resource provisioning in compute clouds via VM multiplexing. In: Proceedings of the 7th International Conference on Autonomic Computing, pp. 11–20, ACM (2010)

23. Al-Ayyoub, M., Jararweh, Y., Daraghmeh, M., Althebyan, Q.: Multi-agent based dynamic resource provisioning and monitoring for cloud computing systems infrastructure. Clust. Comput. 18(2), 919–932 (2015)

24. Nurmi, D., Wolski, R., Grzegorczyk, C., Obertelli, G., Soman, S., Youseff, L., Zagorodnov, D.: The Eucalyptus open-source cloud-computing system. In: Cluster Computing and the Grid, 2009. CCGRID'09. 9th IEEE/ACM International Symposium on, pp. 124–131, IEEE (2009)

25. Bhaumik, S.K.: Root cause analysis in engineering failures. Trans. Indian Inst. Metals 63(2), 297–299 (2010)

26. Zhu, Q., Tung, T., Xie, Q.: Automatic fault diagnosis in cloud infrastructure. In: Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on, vol. 1, pp. 467–474, IEEE (2013)

27. IBM: Business strategy for cloud providers. http://www.itworldcanada.com/archive/Documents/whitepaper/ITW157B_BusinessStretegyForCloudProviders.pdf (2009). Accessed 1 June 2016

28. Jararweh, Y., Jarrah, M., Alshara, Z., Alsaleh, M., Al-Ayyoub, M.: CloudExp: a comprehensive cloud computing experimental framework. Simul. Model. Pract. Theory 49, 180–192 (2014)

29. Calheiros, R.N., Ranjan, R., Beloglazov, A., De Rose, C.A., Buyya, R.: CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw. Pract. Exp. 41(1), 23–50 (2011)

30. Beitch, A., Liu, B., Yung, T., Griffith, R., Fox, A., Patterson, D.A.: Rain: A Workload Generation Toolkit for Cloud Computing Applications. University of California, Tech. Rep. UCB/EECS-2010-14 (2010)

Ziad A. Al-Sharif is currently an assistant professor at Jordan University of Science and Technology, Irbid, Jordan. He joined the Department of Software Engineering in February of 2010. Dr. Al-Sharif received his Ph.D. degree in Computer Science in December of 2009 from the University of Idaho, USA. He also received his M.S. degree in Computer Science in August of 2005 from New Mexico State University, USA. His research interests are in digital forensics, cloud computing, software engineering, and collaborative virtual environments.

Yaser Jararweh received his Ph.D. in Computer Engineering from the University of Arizona in 2010. He is currently an associate professor of Computer Science at Jordan University of Science and Technology, Jordan. He has co-authored about seventy technical papers in established journals and conferences in fields related to cloud computing, HPC, SDN and Big Data. He was one of the TPC Co-Chairs of the IEEE Globecom 2013 International Workshop on Cloud Computing Systems, Networks, and Applications (CCSNA). He is a steering committee member for CCSNA 2014 and CCSNA 2015 with ICC. He was the General Co-Chair of the IEEE International Workshop on Software Defined Systems, SDS-2014 and SDS 2015. He also chairs many IEEE events such as ICICS, SNAMS, BDSN, IoTSMS and many others. Dr. Jararweh served as a guest editor for many special issues in different established journals. Also, he is the steering committee chair of the IBM Cloud Academy Conference.

Ahmad Al-Dahoud is a Ph.D. student at the University of Bradford, UK. He received his M.S. degree in computer science from Jordan University of Science and Technology (JUST). His research interests include cloud and autonomic computing.


Luay M. Alawneh is an assistant professor in the Department of Software Engineering at Jordan University of Science and Technology, Irbid, Jordan. His research interests are software engineering, software maintenance and evolution, and high performance computing systems. Alawneh received a Ph.D. in electrical and computer engineering from Concordia University.
