Autonomous Resource Provision in Virtual Data Centers
Presented by: Noha Elprince
IFIP/IEEE DANMS, 31 May 2013
2
[Figure: two resource-vs-time charts showing demand, capacity, and unused resources]
Static Data Centers vs. Dynamic
Figure: RAD Lab, UC Berkeley
Static:
• Fixed, pre-assigned resources (provisioned for peak)
• Static environment
• Manual configuration changes
Dynamic:
• Cloud elasticity ~ "pay as you go"
• Virtualized environment
• Automated, "self-service" configuration changes
3
• Under-provisioning => heavy penalty:
Ø Lost revenue
Ø Lost users
[Figure: three resource-vs-time charts (days 1-3) showing demand rising above capacity]
Figures: RAD Lab, UC Berkeley
Cloud Elasticity problems…
Cloud Elasticity …
4
• Over-provisioning => underutilization
[Figure: resource-vs-time chart showing capacity above demand, with unused resources]
Figure: RAD Lab, UC Berkeley
5
Virtualization (Cloud Foundation)
• Virtualization allows a computational resource to be partitioned into multiple isolated execution environments (VMs).
• Turning the machine into a "virtual image" gives self-immunity from:
Ø Hardware breakdowns
Ø Running out of resources
Challenge: Service Differentiation
Problem
q Over- and under-provisioning, due to the difficulty of estimating actual needs under time-varying and diverse workloads.
q Enabling service differentiation in a virtualized environment.
6
Methodology
• Develop and implement an autonomic resource management controller that:
Ø Effectively optimizes resource use by predicting current resource needs.
Ø Continuously self-tunes resources to accommodate load variations and enforce service differentiation during resource allocation.
• Test the proposed prototype on real traces.
7
Motivation
• Help data centers manage resources effectively.
• Promote cloud computing adoption (more cloud users => less expensive).
• Optimize resources (green IT!).
8
Related Work
v Approaches for autonomic resource management:
– Utility-based self-optimizing approach
– Model-based approach based on performance modeling
– Machine learning approach
– Fuzzy logic approach
9
Proposed Solution Architecture: Sys. Modeling
10
[Figure: proposed solution architecture; model output r(t+1)]
I. System Modeling : Data set
11
• Idea: learn from successful jobs (normal termination, fulfilling the client's anticipated performance).
• A real computing-center trace from Los Alamos National Laboratory (LANL).
• LANL is a United States Department of Energy (DOE) national laboratory.
• LANL conducts multidisciplinary research in fields such as national security, space exploration, renewable energy, medicine, nanotechnology, and supercomputing.
• System: 1024-node Connection Machine CM-5 from Thinking Machines
• Jobs: 201,387; Duration: 2 years
12
v Feature Selection
• Use stepwise regression to sort the variables, keeping the more influential ones in the model.
• Results: out of 18 features, 5 were selected:
=> run_time, wait_time, Avg_cpu_time, used_mem, status
v Filter
• Remove jobs with status = unsuccessful (failed/aborted).
• Discard records that have average_cpu_time_used <= 0 and used_mem <= 0.
v Data Cleaning
• Normalize data to remove noise.
I. System Modeling : Data Preprocessing
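As a rough illustration, the filter and normalization steps above could look like the sketch below. The record layout and field names are assumptions based on the selected features; the actual LANL trace format differs.

```python
# Sketch of the preprocessing steps: filter out unsuccessful or
# invalid jobs, then min-max normalize each numeric feature.
# Field names mirror the slides; the record format is an assumption.

def preprocess(records):
    """Keep successful jobs with positive resource readings,
    then normalize each numeric feature to [0, 1]."""
    numeric = ("run_time", "wait_time", "avg_cpu_time", "used_mem")
    # Filter: drop failed/aborted jobs and non-positive readings.
    kept = [r for r in records
            if r["status"] == "success"
            and r["avg_cpu_time"] > 0 and r["used_mem"] > 0]
    # Normalize each numeric feature independently (min-max scaling).
    for f in numeric:
        vals = [r[f] for r in kept]
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0   # guard against constant columns
        for r in kept:
            r[f] = (r[f] - lo) / span
    return kept
```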
13
I. System Modeling : Statistical Analysis
14
Cascaded classifiers (MISO model)
I. System Modeling : Model I/O
15
§ Linear Regression § Sugeno Fuzzy Inference System (FCM, SUB) § Regression Tree (REP-Tree) § Model Tree (M5P) § Boosting (Rep-Tree, M5P) § Bagging (Rep-Tree, M5P)
I. System Modeling : ML approaches
Why ML?
- The data is non-linear in nature.
- ML can deal with the complex nature of the data.
- ML detects dependencies between inputs and outputs efficiently.
16
Bagging vs. Boosting Classifiers
• Bagging (bootstrap aggregating) uses bootstrap sampling.
• It trains k classifiers, one on each bootstrap sample.
• Predictions are combined by a majority vote (or average) of the k learned classifiers, using equal weights.
• Boosting: weak classifiers are combined into a final strong classifier.
• After a weak learner is added, the data is reweighted:
Ø misclassified examples => gain weight
Ø correctly classified examples => lose weight
• Thus future learners focus more on the data that previous weak learners misclassified.
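The bagging mechanics above can be sketched minimally as follows, with a trivial "sample mean" learner standing in for the REP-Tree/M5P base learners used in the slides:

```python
import random

# Minimal bootstrap-aggregating sketch for regression. Each base
# "learner" here is just the mean of its bootstrap sample; a real
# ensemble would train a tree (REP-Tree / M5P) per sample instead.

def bootstrap_sample(data, rng):
    """Sample len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]

def bag_train(data, k, seed=0):
    """Train k base learners, one per bootstrap sample."""
    rng = random.Random(seed)
    return [sum(bootstrap_sample(data, rng)) / len(data)
            for _ in range(k)]

def bag_predict(models):
    """Combine the k learners with equal weights (simple average)."""
    return sum(models) / len(models)
```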
II. Res. Predictor
17
v The client requests hosting of a specific application type with a pre-specified response time.
v An initial resource estimate is generated.
v Predictions are made as clients arrive at the data center.
18
Classifier          Type   RMSE     MAE      RAE      CC
Linear Reg.         C1     0.0024   0.0008   50.33%   0.70
                    C2     0.0023   0.0001   57.29%   0.71
                    C3     0.0026   0.0003   58.15%   0.98
Sugeno FIS (SUB)    C1     0.0021   0.0009   44.89%   0.66
                    C2     0.0012   0.0002   51.06%   0.66
                    C3     0.0011   0.0002   53.93%   0.85
Boosting (M5P)      C1     0.0020   0.0006   34.59%   0.80
                    C2     0.0018   0.0007   39.20%   0.84
                    C3     0.0003   0.0001   10.99%   0.99
Bagging Tree (M5P)  C1     0.0018   0.0005   32.57%   0.84
                    C2     0.0017   0.0007   36.38%   0.84
                    C3     0.0003   0.0001   11.82%   0.99
Validation: Perf. Measures for different prediction models
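For reference, the four reported measures can be computed as below. This is a sketch using the standard definitions; it is an assumption that the toolkit used for the experiments follows exactly these conventions.

```python
import math

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rae(y, yhat):
    """Relative absolute error: total absolute error relative to
    always predicting the mean of y, as a percentage."""
    mean_y = sum(y) / len(y)
    return 100.0 * sum(abs(a - b) for a, b in zip(y, yhat)) \
        / sum(abs(a - mean_y) for a in y)

def cc(y, yhat):
    """Pearson correlation coefficient between actual and predicted."""
    my, mh = sum(y) / len(y), sum(yhat) / len(yhat)
    cov = sum((a - my) * (b - mh) for a, b in zip(y, yhat))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sh = math.sqrt(sum((b - mh) ** 2 for b in yhat))
    return cov / (sy * sh)
```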
19
Resource Predictor: Learning Time Comparison
III. Resource Allocator
20
1. The Resource Allocator initially allocates resources (based on the prediction model).
2. It checks the error resulting from the tuner.
3. The tuner calculates the normalized error in resource allocation:
RespTimeError(k) = (RespTime_ref(k) - RespTime_obs(k)) / RespTime_ref(k)
4. The allocator takes the feedback from the tuner (ResAdjustment) and sends a command to the VC in the VM with the appropriate decision.
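The feedback step can be expressed directly in code. This sketch combines the normalized-error equation above with the adjustment formula ResAdjust = ResDir x ResWt x VCres from the tuner slides; the argument names and units are assumptions.

```python
# Allocator feedback sketch: normalized response-time error and the
# resulting resource adjustment. Positive error => response time is
# better than the reference; negative => SLA target is being missed.

def resp_time_error(ref, obs):
    """Normalized response-time error at step k:
    (reference - observed) / reference."""
    return (ref - obs) / ref

def res_adjust(res_dir, res_wt, vc_res):
    """Resource adjustment sent to the virtual container (VC):
    ResDir in [-1, +1], scaled by a weight and the VC's resources."""
    return res_dir * res_wt * vc_res
```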
21
IV. Resource Tuner : Rule-Based Fuzzy System
i/ps: RespTimeError, ClientClass, Status
o/p: ResDirection
ResController (Mamdani)
22
IV. Resource Tuner: Rule-Based Fuzzy System
Client Class \ RespTimeError   Low    Medium       High
Gold                           SUL    SDM / SUM    SDH / SUH
Silver                         SDL    SDM / SUM    SDH / SUH
Bronze                         SDL    SDM          SDH / SUH
(each cell: over-provision / under-provision action; SU = speed up, SD = step down; L/M/H = low/medium/high)
- Total # rules: 18
- The membership grades of each attribute (high, medium, low) are adjusted by experts in the datacenter.
- ResDir:
• reflects a percentage of the resource that should be utilized in the VC (ResAdjust = ResDir x ResWt x VCres)
• ranges over [-1, +1] with MFs (low, med, high) for:
Ø speed up (+ve side)
Ø step down (-ve side)
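The tuner's decision logic can be sketched as a crisp rule-table lookup. The real controller is a Mamdani fuzzy system with graded memberships and defuzzification, both omitted here; the rule entries follow the rule table on this slide, and the key/action spellings are illustrative.

```python
# Crisp approximation of the fuzzy tuner's rule base.
# Actions: SU* = speed up, SD* = step down; suffix L/M/H = low/med/high.
RULES = {
    # (client_class, error_level, status) -> ResDirection action
    ("gold",   "low",    "under"): "SUL",
    ("gold",   "medium", "under"): "SUM",
    ("gold",   "medium", "over"):  "SDM",
    ("gold",   "high",   "under"): "SUH",
    ("gold",   "high",   "over"):  "SDH",
    ("silver", "low",    "over"):  "SDL",
    ("silver", "medium", "under"): "SUM",
    ("silver", "medium", "over"):  "SDM",
    ("silver", "high",   "under"): "SUH",
    ("silver", "high",   "over"):  "SDH",
    ("bronze", "low",    "over"):  "SDL",
    ("bronze", "medium", "over"):  "SDM",
    ("bronze", "high",   "under"): "SUH",
    ("bronze", "high",   "over"):  "SDH",
}

def res_direction(client_class, error_level, status):
    """Return the tuner's action, or no action when no rule fires
    (e.g. bronze / medium error / under-provision, as in the
    validation examples)."""
    return RULES.get((client_class, error_level, status), "noAction")
```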
V. Adaptive Learning
23
New incoming data is fed into the prediction model in different ways (depending on the prediction model used):
- Directly, via clustering (if clustering is used, as in TS-FIS) => online learning
- OR it is stored in the database until a certain threshold is reached; then an ECA rule fires, initiating re-modeling => offline learning
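The offline path can be sketched as an event-condition-action (ECA) rule: buffer new records, and trigger re-modeling once a threshold is crossed. The class name, threshold value, and retrain hook are illustrative assumptions.

```python
# ECA-style trigger for offline re-modeling:
#   event     = a new record arrives
#   condition = buffer size reaches the threshold
#   action    = fire the re-modeling callback, then clear the buffer

class EcaRetrainer:
    def __init__(self, threshold, retrain):
        self.threshold = threshold   # condition: buffer size limit
        self.retrain = retrain       # action: re-modeling callback
        self.buffer = []

    def on_new_record(self, record):
        """Event handler: store the record; re-model when full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.threshold:
            self.retrain(self.buffer)
            self.buffer = []
```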
V. Adaptive Learning: update Rules in Fuzzy Tuner FIS
24
Rule Editor
25
Resource Tuner validation - Example
(Rule table repeated from the Resource Tuner slide: Client Class x RespTimeError -> over-/under-provision actions.)
Method: Testing cases using the fuzzy rule viewer.
26
I/ps: RespTimeError : medium , client class : Gold and status: underprovision O/p: ResDirection : SUM (speed up medium)
Resource Tuner Validation - Example
RespTimeError= 0.5 ClientClass= 0.9 status= 0.2 ResDirection = 0.5
27
I/ps: RespTimeError: medium, Client class: Silver, Status : underprovision o/p: ResDirection is SUM (speed up medium)
Resource Tuner Validation - Example
RespTimeError= 0.5 ClientClass= 0.5 status= 0.2 ResDirection = 0.5
28
I/ps: RespTimeError : medium , client class : bronze , status: underprovision o/p: ResDirection : noAction
Resource Tuner Validation - Example
RespTimeError= 0.5 ClientClass= 0.19 status= 0.2 ResDirection = 0.01
Conclusions
• The proposed ML model predicts the right amount of resources (bagging/boosting is promising).
• The fuzzy tuner:
- Accommodates any deviation in workload characteristics.
- Enforces service differentiation.
• Adaptive learning guarantees an up-to-date model that lowers future SLA violations.
29
Questions ?
30