Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Predicting Web Service Levels During VM Live Migrations
Predicting Web Service Levels During VM Live
Migrations
5th International DMTF Academic Alliance Workshop on Systems
and Virtualization Management: Standards and the Cloud
Helmut Hlavacs, Thomas Treutner
24. 10. 2011
1 / 31
Predicting Web Service Levels During VM Live Migrations
Table of Contents
1 Scenario
2 Experiment
3 Data overview
4 Model selection
5 Conclusions
2 / 31
Predicting Web Service Levels During VM Live Migrations
Abstract
• VM live migration important for energy efficiency• Establish energy efficient target distribution of VMs
• No perceivable downtime while live migrating?
• Live migration is resource intensive (iterative page copying)
• Experiments: Influence on service levels while migrating?• Modelling: Predict service levels based on utilization?
3 / 31
Predicting Web Service Levels During VM Live Migrations
Scenario
Scenario
Scenario
• Virtualized data center, static consolidation (P2V)
• Provisioning for peak load, still bad energy efficiency
• E.g., 9-5-cycles, very low utilization at night
• Live migration enables dynamic consolidation
• But: Seldom used, fear of possible side effects
• ⇒ Identify and quantify effects on (web) service levels
4 / 31
Predicting Web Service Levels During VM Live Migrations
Scenario
Scenario
5 / 31
Predicting Web Service Levels During VM Live Migrations
Experiment
Experiment
Experiment
• 2 servers, 1 VM, migrating forth and back
• VM disk image on central node (Gbit, open-iscsi)
• qemu-kvm VM: Linux, Apache2, PHP5, MediaWiki
• SQL VM and load generation on an extra nodes
• Logging utilization of servers and VM, >100 variables
• Rise load from 50 to 600 concurrent virtual users and back
• Migrate every 15min, track response time of last 5min (SLA=1s)
6 / 31
Predicting Web Service Levels During VM Live Migrations
Experiment
Experiment
7 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
CPU utilization
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 20 40 60 80 100 120 140 160 180 200
CP
U U
tiliz
atio
n (
VM
no
rme
d)
Interval
Abbildung: Effect of increasing/decreasing HTTP workload in equal steps on
the VM’s CPU utilization
8 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
Load average
0
1
2
3
4
5
6
0 20 40 60 80 100 120 140 160 180 200
Lo
ad
5 A
vera
ge
Interval
Abbildung: Effect of increasing/decreasing HTTP workload in equal steps on
the VM’s UNIX load average.
9 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
Markov Queues
• UNIX load as approximation to Q, the average number of jobs
in an M/M/1 queue, CPU utilization as ρ, the system utilization
• QM/M/1 = ρ2
1−ρ
• UNIX load exponentially averaged by definition, service times
are not necessarily exactly exponentially distributed, a
systematic deviation from the theoretical solution is expectable.
• QM/G/1 = ρ2
1−ρ · (1+c2
B)2
, cB: coefficient of service time variation
• Exponentially distributed service times: c2
B = 1
• Deterministic service times: c2
B = 0
• Simple linear regression: c2
B = 0.42
10 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
0
1
2
3
4
5
6
7
8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Lo
ad
5 A
vera
ge
CPU Utilization (VM normed)
Measured
M/M/1
M/D/1
M/G/1, c2B=0.42
Abbildung: Queuing issues are rapidly increasing at 60-70% CPU utilization.
The measured data follow the trend of the theoretical solution very closely.
11 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
Response Time
0
5000
10000
15000
20000
25000
06 12 18 00 06 12 18 00 06 12
Re
spo
nse
Tim
e [
ms]
Hours
Abbildung: HTTP response times are massively increased during phases
with higher workload.
12 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
Service Level
0.82
0.84
0.86
0.88
0.9
0.92
0.94
0.96
0.98
1
0 20 40 60 80 100 120 140 160 180 200
Se
rvic
e L
eve
l R
atio
Interval
Abbildung: Service levels decrease significantly during phases of high
workload. The maximum allowed response time is 1 s, which is extremely
conservative.13 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
Load average vs Service Level
0.82
0.84
0.86
0.88
0.9
0.92
0.94
0.96
0.98
1
0 1 2 3 4 5 6
Se
rvic
e L
eve
l
Load5 Average
MeasuredTheoretic
Abbildung: Higher load average is an indicator for queuing issues which are
causing decreased service level ratios. The measured data follow the
theoretical solution closely for higher load values.14 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
Response Time: Low Load
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
0.95 0.96 0.97 0.98 0.99 1
Resp
onse
Tim
e [s]
SLA
Ratio
Time
HTTP Response TimeMigration started
Migration finishedSLA Ratio
Maximum allowed response time
Abbildung: Influence of live migration on HTTP response time and service
level during low load.
15 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
Response Time: Medium Load
0
2
4
6
8
10
12
14
16
0.85
0.9
0.95
1
Resp
onse
Tim
e [s]
SLA
Ratio
Time
HTTP Response TimeMigration started
Migration finishedSLA Ratio
Maximum allowed response time
Abbildung: Influence of Live Migration on HTTP Response Time and service
level during medium load.
16 / 31
Predicting Web Service Levels During VM Live Migrations
Data overview
Response Time: High Load
0
5
10
15
20
0.5
0.6
0.7
0.8
0.9
1
Resp
onse
Tim
e [s]
SLA
Ratio
Time
HTTP Response TimeMigration started
Migration finishedSLA Ratio
Maximum allowed response time
Abbildung: Influence of Live Migration on HTTP Response Time and service
level during high load.
17 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
Model selection
Stepwise: AIC
• Akaike Information Criterion (AIC), lower values are better
• AIC = 2k − 2 ln(L)• k : the number of independent parameters of a model
• L: the maximum likelihood
• Stepwise model selection approach
• Find trade-off between number of parameters (model size) and
goodness of fit (model quality)
• Start with empty model, in each step a variable can be
added/removed, until the AIC can not be decreased further.
• Does not guarantee to find a global optimum, but typically gives a
near-optimum result within reasonable time.
18 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
Model selection
Exhaustive: LEAPS
• Comparison with the stepwise AIC approach
• Exhaustive all-subsets-regression
• Find best of all possible models for given range of model size
• Computationally intensive even if number of variables is limited
19 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
AIC
-4550
-4500
-4450
-4400
-4350
-4300
-4250
-4200
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
AIC
va
lue
Step
AIC value in current stepFinal AIC value
Abbildung: The AIC value quickly improves during the first steps and then
slowly converges to the final value, thus allowing a trade-off between model
complexity and precision.20 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
AIC
0.013
0.014
0.015
0.016
0.017
0.018
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Sta
nd
ard
De
via
tion
of
Re
sid
ua
ls
Step
Best value
Abbildung: The residuals’ standard deviation quickly improves during the first
steps and then slowly converges to the final value, when using stepwise AIC.
21 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
AIC
Abbildung: QQ-Plot of standardized residuals for the model produced by
stepwise AIC.
22 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
AIC
Abbildung: There is no visible correlation between the residuals’ variance
and the time series.
23 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
LEAPS
0.9
0.91
0.92
0.93
0.94
0.95
0.96
0.97
1 2 3 4 5 6 7 8 9 10 11 12
R2 A
dj
Model Size
Best value
Abbildung: Higher model complexity yields better model quality when using
LEAPS, but the increase is rather small.
24 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
LEAPS
0.015
0.016
0.017
0.018
0.019
0.02
0.021
0.022
0.023
0.024
0.025
1 2 3 4 5 6 7 8 9 10 11 12
Sta
nd
ard
De
via
tio
n o
f R
esid
ua
ls
Model Size
Best value
Abbildung: Higher model complexity yields better regression error when
using LEAPS.
25 / 31
Predicting Web Service Levels During VM Live Migrations
Model selection
AIC: Most influencing variables of final model
Variable Meaning Estimate Std. Error Pr(> |t|)Intercept 2.395e+00 5.069e-01 3.00e-06
wp01 load5 VM UNIX load5 -1.871e-02 2.627e-03 3.67e-12
wp01 swapUsed VM swap used -7.656e-07 8.809e-08 < 2e-16
wp01 residentSize SQ The squared amount of res-
ident memory used by the
qemu-kvm process.
-4.652e-14 1.652e-14 0.00506
src host cpu proc s Tasks created/s, source
host.
-2.475e+00 9.166e-01 0.00716
src host cpu proc s SQ 1.091e+00 4.133e-01 0.00856
wp01 cpu util vmnorm SQ -1.328e-01 3.187e-02 3.63e-05
wp01 cpu util vmnorm CPU util measured inside
VM
9.517e-02 2.316e-02 4.64e-05
wp01 load5 SQ The squared UNIX load5of the VM.
-1.140e-03 1.918e-04 5.22e-09
wp01 freeMemRatio SQ The squared ratio of free
memory inside the VM.
1.976e-02 9.462e-03 0.03727
Tabelle: The most influencing and significant variables of the final model
produced by stepwise AIC.
26 / 31
Predicting Web Service Levels During VM Live Migrations
Conclusions
Conclusions
Qualitative
1 Impact of live migration on SL depends on amount of workload
2 Tighter SLAs can be fulfilled for low workload situations
3 High workload: SL decreases massively
4 ⇒ Live migrating when VM has only medium load
5 Avoid multiple live migrations within short time
6 ⇒ Inertia effect for recently migrated VMs
7 Avoid senseless flip-flop migrations and stabilize service level
27 / 31
Predicting Web Service Levels During VM Live Migrations
Conclusions
Conclusions
Quantitative
1 SL variance during a live migration to 90% predictable using only
a single variable, the UNIX load5 average.
2 Models with 12 variables can explain 95% of variance
3 Models selected by AIC/LEAPS are of comparable quality and
robust to statistical tests
4 AIC is way faster, so LEAPS is not really necessary
28 / 31
Predicting Web Service Levels During VM Live Migrations
Conclusions
Conclusions
Technical
1 Systems using live migration as a mechanism to realize a more
energy efficient target distribution need to consider the UNIXload average, at least for web servers and comparable
2 Typical hypervisors do not collect and export this information
3 Needs to be done by VM introspection
4 Related efforts by qemu-kvm and libvirt developers to
pass-through the VMs’ memory utilization (free mem)
5 Should be extended to export load information
29 / 31
Predicting Web Service Levels During VM Live Migrations
Conclusions
Conclusions
Future Work
1 Influence of additional VMs
• idle
• utilized
• and a mixed scenario
2 Linux and qemu-kvm: Kernel Samepage Merging (KSM)
3 Database VM migration
4 Bandwidth limits, maximum allowed downtime
5 Migration delay, energy consumption, service downtime
30 / 31
Predicting Web Service Levels During VM Live Migrations
Conclusions
Q&A
Thank you for your attention!
31 / 31