Predicting Web Service Levels During VM Live Migrations

5th International DMTF Academic Alliance Workshop on Systems
and Virtualization Management: Standards and the Cloud

Helmut Hlavacs, Thomas Treutner
24 October 2011
Table of Contents
1 Scenario
2 Experiment
3 Data overview
4 Model selection
5 Conclusions
Abstract
• VM live migration important for energy efficiency
• Establish energy-efficient target distribution of VMs
• No perceivable downtime while live migrating?
• Live migration is resource intensive (iterative page copying)
• Experiments: influence on service levels while migrating?
• Modelling: predict service levels based on utilization?
Scenario
• Virtualized data center, static consolidation (P2V)
• Provisioning for peak load, but still poor energy efficiency
• E.g., 9-to-5 cycles: very low utilization at night
• Live migration enables dynamic consolidation
• But: Seldom used, fear of possible side effects
• ⇒ Identify and quantify effects on (web) service levels
Experiment
• 2 servers, 1 VM, migrating forth and back
• VM disk image on central node (Gbit, open-iscsi)
• qemu-kvm VM: Linux, Apache2, PHP5, MediaWiki
• SQL VM and load generation on extra nodes
• Logging utilization of servers and VM, >100 variables
• Raise load from 50 to 600 concurrent virtual users and back
• Migrate every 15min, track response time of last 5min (SLA=1s)
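The bookkeeping behind the last bullet — a service-level ratio over the last five minutes against a 1 s SLA — can be sketched as follows. This is a minimal illustration; the function name and sample data are made up, not the authors' tooling.

```python
# Minimal sketch: service-level ratio over a sliding window, as described
# above (last 5 min of response times against a 1 s SLA). Illustrative only.

def service_level_ratio(samples, now_s, window_s=300, sla_ms=1000.0):
    """Fraction of requests in the last window_s seconds that met the SLA.

    samples: list of (timestamp_s, response_ms) pairs.
    """
    recent = [ms for (t, ms) in samples if now_s - window_s <= t <= now_s]
    if not recent:
        return 1.0  # no traffic in the window: treat as fully compliant
    return sum(1 for ms in recent if ms <= sla_ms) / len(recent)

samples = [(0, 200.0), (60, 900.0), (120, 1500.0), (180, 400.0)]
print(service_level_ratio(samples, now_s=200))  # → 0.75 (3 of 4 under 1 s)
```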
Data overview
CPU utilization
[Plot: CPU utilization (VM normed) over measurement intervals]

Figure: Effect of increasing/decreasing HTTP workload in equal steps on the VM's CPU utilization.
Load average
[Plot: load5 average over measurement intervals]

Figure: Effect of increasing/decreasing HTTP workload in equal steps on the VM's UNIX load average.
Markov Queues
• UNIX load as approximation to Q, the average number of jobs in an M/M/1 queue; CPU utilization as ρ, the system utilization
• Q_{M/M/1} = ρ² / (1 − ρ)
• UNIX load is exponentially averaged by definition, and service times are not necessarily exactly exponentially distributed, so a systematic deviation from the theoretical solution is to be expected.
• Q_{M/G/1} = ρ² / (1 − ρ) · (1 + c_B²) / 2, where c_B is the coefficient of variation of the service time
• Exponentially distributed service times: c_B² = 1
• Deterministic service times: c_B² = 0
• Simple linear regression: c_B² = 0.42
[Plot: load5 average vs. CPU utilization (VM normed); curves: Measured, M/M/1, M/D/1, M/G/1 with c_B² = 0.42]

Figure: Queuing issues increase rapidly at 60-70% CPU utilization. The measured data follow the trend of the theoretical solution very closely.
Response Time
[Plot: HTTP response time in ms over hours]

Figure: HTTP response times are massively increased during phases with higher workload.
Service Level
[Plot: service-level ratio over measurement intervals]

Figure: Service levels decrease significantly during phases of high workload. The maximum allowed response time is 1 s, which is extremely conservative.
Load average vs Service Level
[Plot: service level vs. load5 average; curves: Measured, Theoretic]

Figure: A higher load average is an indicator for queuing issues, which cause decreased service-level ratios. The measured data follow the theoretical solution closely for higher load values.
Response Time: Low Load
[Plot: HTTP response time and SLA ratio over time; markers: migration started/finished, maximum allowed response time]

Figure: Influence of live migration on HTTP response time and service level during low load.
Response Time: Medium Load
[Plot: HTTP response time and SLA ratio over time; markers: migration started/finished, maximum allowed response time]

Figure: Influence of live migration on HTTP response time and service level during medium load.
Response Time: High Load
[Plot: HTTP response time and SLA ratio over time; markers: migration started/finished, maximum allowed response time]

Figure: Influence of live migration on HTTP response time and service level during high load.
Model selection
Stepwise: AIC
• Akaike Information Criterion (AIC); lower values are better
• AIC = 2k − 2 ln(L)
• k: the number of independent parameters of a model
• L: the maximum likelihood
• Stepwise model selection approach
• Find a trade-off between the number of parameters (model size) and goodness of fit (model quality)
• Start with the empty model; in each step a variable can be added or removed, until the AIC cannot be decreased further.
• Does not guarantee a global optimum, but typically gives a near-optimum result within reasonable time.
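The steps above can be sketched as a greedy add/remove loop. The authors presumably used a standard statistics package (e.g. R's step()); the version below is an illustrative Python reimplementation with synthetic data, using the Gaussian AIC for an OLS fit.

```python
import numpy as np

def aic(y, X):
    """Gaussian AIC (up to an additive constant) of an OLS fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * X.shape[1]

def stepwise_aic(y, X):
    """Greedily add/remove columns of X until the AIC stops decreasing."""
    n, p = X.shape
    chosen = []                 # start with the empty model
    ones = np.ones((n, 1))      # intercept-only design
    best = aic(y, ones)
    improved = True
    while improved:
        improved = False
        for j in range(p):
            # candidate move: add an unused column, or drop a chosen one
            cand = chosen + [j] if j not in chosen else [k for k in chosen if k != j]
            design = np.hstack([ones, X[:, cand]]) if cand else ones
            score = aic(y, design)
            if score < best - 1e-9:
                best, chosen, improved = score, cand, True
    return chosen, best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.normal(size=200)
chosen, best = stepwise_aic(y, X)
print(sorted(chosen))  # the signal columns 0 and 3 should be selected
```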
Exhaustive: LEAPS
• Comparison with the stepwise AIC approach
• Exhaustive all-subsets-regression
• Find best of all possible models for given range of model size
• Computationally intensive even if the number of variables is limited
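All-subsets regression can be sketched in a few lines; again this is an illustration with synthetic data, not the paper's pipeline (the R package leaps uses a branch-and-bound search rather than the brute force shown here).

```python
import itertools
import numpy as np

def adj_r2(y, X):
    """Adjusted R^2 of an OLS fit; X already includes the intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))

def best_subsets(y, X, max_size):
    """For each model size, fit every subset and keep the best adjusted R^2."""
    n, p = X.shape
    ones = np.ones((n, 1))
    best = {}
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(p), size):
            score = adj_r2(y, np.hstack([ones, X[:, cols]]))
            if size not in best or score > best[size][1]:
                best[size] = (cols, score)
    return best

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X[:, 1] + 0.5 * X[:, 4] + 0.2 * rng.normal(size=100)
best = best_subsets(y, X, 3)
for size, (cols, score) in best.items():
    print(size, cols, round(score, 3))
```

The exhaustive loop fits 2^p − 1 models in the worst case, which is why the slide notes it is expensive even for a limited number of variables.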
AIC
[Plot: AIC value per selection step, with the final AIC value marked]

Figure: The AIC value quickly improves during the first steps and then slowly converges to the final value, thus allowing a trade-off between model complexity and precision.
AIC
[Plot: standard deviation of residuals per selection step, with the best value marked]

Figure: The residuals' standard deviation quickly improves during the first steps and then slowly converges to the final value, when using stepwise AIC.
AIC
Figure: QQ plot of standardized residuals for the model produced by stepwise AIC.
AIC
Figure: There is no visible correlation between the residuals' variance and the time series.
LEAPS
[Plot: adjusted R² vs. model size, with the best value marked]

Figure: Higher model complexity yields better model quality when using LEAPS, but the increase is rather small.
LEAPS
[Plot: standard deviation of residuals vs. model size, with the best value marked]

Figure: Higher model complexity yields a smaller regression error when using LEAPS.
AIC: Most influencing variables of final model
Variable                   Meaning                                       Estimate     Std. Error   Pr(>|t|)
(Intercept)                                                               2.395e+00   5.069e-01    3.00e-06
wp01 load5                 VM UNIX load5                                 -1.871e-02   2.627e-03    3.67e-12
wp01 swapUsed              VM swap used                                  -7.656e-07   8.809e-08    < 2e-16
wp01 residentSize SQ       Squared resident memory of the
                           qemu-kvm process                              -4.652e-14   1.652e-14    0.00506
src host cpu proc s        Tasks created/s, source host                  -2.475e+00   9.166e-01    0.00716
src host cpu proc s SQ                                                    1.091e+00   4.133e-01    0.00856
wp01 cpu util vmnorm SQ                                                  -1.328e-01   3.187e-02    3.63e-05
wp01 cpu util vmnorm       CPU utilization measured inside the VM         9.517e-02   2.316e-02    4.64e-05
wp01 load5 SQ              Squared UNIX load5 of the VM                  -1.140e-03   1.918e-04    5.22e-09
wp01 freeMemRatio SQ       Squared ratio of free memory inside the VM     1.976e-02   9.462e-03    0.03727

Table: The most influencing and significant variables of the final model produced by stepwise AIC.
Conclusions
Qualitative
1 Impact of live migration on SL depends on amount of workload
2 Tighter SLAs can be fulfilled for low workload situations
3 High workload: SL decreases massively
4 ⇒ Live migrate only while the VM is under at most medium load
5 Avoid multiple live migrations within short time
6 ⇒ Inertia effect for recently migrated VMs
7 Avoid senseless flip-flop migrations and stabilize service level
Quantitative
1 SL variance during a live migration is 90% predictable using only
a single variable, the UNIX load5 average
2 Models with 12 variables can explain 95% of variance
3 Models selected by AIC/LEAPS are of comparable quality and
robust to statistical tests
4 Stepwise AIC is much faster, so LEAPS is not really necessary
Technical
1 Systems using live migration as a mechanism to realize a more
energy-efficient target distribution need to consider the UNIX
load average, at least for web servers and comparable services
2 Typical hypervisors do not collect and export this information
3 Needs to be done by VM introspection
4 Related efforts by qemu-kvm and libvirt developers to
pass-through the VMs’ memory utilization (free mem)
5 Should be extended to export load information
Future Work
1 Influence of additional VMs
• idle
• utilized
• and a mixed scenario
2 Linux and qemu-kvm: Kernel Samepage Merging (KSM)
3 Database VM migration
4 Bandwidth limits, maximum allowed downtime
5 Migration delay, energy consumption, service downtime
Q&A
Thank you for your attention!