Predicting Web Service Levels During VM Live Migrations · I )Identify and quantify e ects on (web) service levels I )Find most in uencing utilization metrics Experiment I Two servers,

Predicting Web Service LevelsDuring VM Live Migrations

Helmut Hlavacs and Thomas TreutnerResearch Group Entertainment Computing, University of Vienna, Austria

Introduction

I VM live migration important for energy efficiencyI Enables us to establish energy efficient target distribution of VMsI Supposedly no perceivable service downtime while live migratingI Live migration is resource intensive (iterative page copying)I Experiments: Influence on service levels while migrating?I Modelling: Predict service levels based on utilization?

Scenario

I Virtualized data center, static consolidation (P2V)I Provisioning for peak load, still bad energy efficiencyI E.g., 9-5-cycles, very low utilization at nightI Live migration enables dynamic consolidationI But: Seldom used, fear of possible side effectsI⇒ Identify and quantify effects on (web) service levelsI⇒ Find most influencing utilization metrics

Experiment

I Two servers, a single VM, migrating forth and backI VM disk image on central node (Gbit, open-iscsi)I qemu-kvm VM: Linux, Apache2, PHP5, MediaWikiI SQL VM and load generation on an extra nodesI Logging utilization of servers and VM, more than 100 variablesI Rise load from 50 to 600 concurrent virtual users and backI Migrate every 15min, track response time of last 5minI Maximum allowed response time: 1s

Data Overview

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 20 40 60 80 100 120 140 160 180 200

CP

U U

tiliz

atio

n (

VM

no

rme

d)

Interval

0

1

2

3

4

5

6

0 20 40 60 80 100 120 140 160 180 200

Lo

ad

5 A

ve

rag

e

Interval

0

5000

10000

15000

20000

25000

06 12 18 00 06 12 18 00 06 12

Re

sp

on

se

Tim

e [

ms]

Hours

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

0 20 40 60 80 100 120 140 160 180 200

Se

rvic

e L

eve

l R

atio

Interval

0

1

2

3

4

5

6

7

8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Lo

ad

5 A

ve

rag

e

CPU Utilization (VM normed)

Measured

M/M/1

M/D/1

M/G/1, c2B=0.42

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

0 1 2 3 4 5 6

Se

rvic

e L

eve

l

Load5 Average

MeasuredTheoretic

I We can interpret the UNIX load as approximation to Q, the average numberof jobs in a Markovian M/M/1 queue, and the VM’s CPU utilization as ρ,

the system utilization: QM/M/1 = ρ2

1−ρI UNIX load is exponentially averaged by definition and the service times are

not necessarily exactly exponentially distributed: QM/G/1 = ρ2

1−ρ ·(1+c2

B)

2

I For deterministic service times c2B = 0, resulting in QM/D/1 = ρ2

2(1−ρ)

I Simple linear regression delivers the coefficient c2B = 0.42

I P(T ≤ x) = FT(x) = 1− e−µ(1−ρ)x, the theoretical probability that theresponse time T is lower than or equal to a limit x for a given service rate µ

Service Levels for Low/Medium/High Workload Scenarios

I Low (top right): Slightly increasedresponse times during live migration,seldom response time violations

I Medium (bottom left): SLA ratiogenerally satisfies the 97% limit

I High (bottom right): Often andheavy violations, unacceptable lowservice levels, typically decreased by20-25% percentage points

0

2

4

6

8

10

12

14

16

0.85

0.9

0.95

1

Respo

nse T

ime

[s]

SL

A R

atio

Time

HTTP Response TimeMigration started

Migration finishedSLA Ratio

Maximum allowed response time

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

0.95

0.96

0.97

0.98

0.99

1

Respo

nse T

ime

[s]

SL

A R

atio

Time




0

5

10

15

20

0.5

0.6

0.7

0.8

0.9

1

Respo

nse T

ime

[s]

SL

A R

atio

Time




Model Selection

I Stepwise model selection:Akaike Information Criterion

I Finds trade-off between number ofparameters (model size) andgoodness of fit (model quality)

I For comparison: Exhaustiveall-subsets-regression (LEAPS)

I LEAPS: Find best of all possiblemodels for given range of model size

I Computationally intensive even ifnumber of variables is limited

I R2Adj increases slightly with increased

model complexity. UNIX load5

contributes R2Adj of ∼ 90%

0.9

0.91

0.92

0.93

0.94

0.95

0.96

0.97

1 2 3 4 5 6 7 8 9 10 11 12

R2 A

dj

Model Size

Best value

Most Influencing Model ParametersVariable Meaning Estimate Std. Error Pr(> |t|)Intercept 2.395e+00 5.069e-01 3.00e-06

wp01 load5 VM UNIX load5 -1.871e-02 2.627e-03 3.67e-12

wp01 swapUsed VM swap used -7.656e-07 8.809e-08 < 2e-16

wp01 residentSize SQ Suared amount of resident memory used by

the qemu-kvm process.

-4.652e-14 1.652e-14 0.00506

src host cpu proc s Tasks created/s, source host. -2.475e+00 9.166e-01 0.00716

src host cpu proc s SQ 1.091e+00 4.133e-01 0.00856

wp01 cpu util vmnorm SQ -1.328e-01 3.187e-02 3.63e-05

wp01 cpu util vmnorm CPU util measured inside VM 9.517e-02 2.316e-02 4.64e-05

wp01 load5 SQ Squared UNIX load5 of the VM. -1.140e-03 1.918e-04 5.22e-09

wp01 freeMemRatio SQ Squared ratio of free memory inside the VM. 1.976e-02 9.462e-03 0.03727

Conclusions

I Impact of live migration on SL depends on amount of workloadI Tighter SLAs can be fulfilled during low and medium workloadI Migrating during high load causes massive decrease of service levelI Service level variance during a live migration to 90% predictable

using only a single variable, the UNIX load5 average, models with 12variables can explain 95% of variance

I Systems using live migration as a mechanism to realize a more energy efficienttarget distribution and have service level targets need to consider the UNIX

load average, but typical hypervisors do not collect/export this informationI Hypervisors should be extended to export load information (cf. free memory)

Future Work

I Influence of additional VMs (idle, utilized, mixed)I Linux and qemu-kvm: Kernel Samepage Merging (KSM)I Database VM migration, currently taboo due to potentially severe influenceI qemu-kvm parameters: Bandwidth limits, maximum allowed downtimeI Predict migration delay, energy consumption, service downtime

Created with LATEXbeamerposter http://www-i6.informatik.rwth-aachen.de/~dreuw/latexbeamerposter.php

http://cs.univie.ac.at/research/research-groups/entertainment-computing/ <firstname>.<lastname>@univie.ac.at

http://www-i6.informatik.rwth-aachen.de/~dreuw/latexbeamerposter.php

Documents

Predicting Web Service Levels During VM Live Migrations · I )Identify and quantify e ects on (web) service levels I )Find most in uencing utilization metrics Experiment I Two servers,