Net-Centric Software and Systems I/UCRC
Copyright © 2011 NSF Net-Centric I/UCRC. All Rights Reserved.
High-Confidence SLA Assurance for Cloud Computing Systems and Services
Project Lead: Farokh B. Bastani, I-Ling Yen, Krishna Kavi, and Jeff Tian
Date: April 7, 2011
Problem Description
• Emerging cloud computing paradigm enables
  – On-demand access to storage, computing, software, and physical resources
  – Integrated capabilities of a large spectrum of networked services and resources for realizing tasks that are far beyond current practices
  – Need SLAs to enhance cloud system usability and dependability
• Existing SLA (service level agreement) research: Siloed
  – SLA model: Considers the agreement for each QoS aspect independently
  – Client perspective
    • Need to establish SLAs one service at a time, lacking an end-to-end approach for client tasks that require composing multiple services/resources
    • Considers individual QoS aspects independently, not potential tradeoffs
  – Provider perspective
    • Each provider operates independently and lacks a collaborative concept to globally achieve high SLA assurance while maximizing resource utilization
  – No satisfactory solutions to security issues across all layers
• Challenges: Develop a comprehensive SLA model and supporting environment
Proposed Solution
[Figure: Providers 1 through N, each with Local QoS Monitoring, Resource Management, and Admission Control connected by a feedback loop (diagram labels: SR, RR, RS, S). A Service Composer requests an SLA for the first service, then an SLA for the second service, and so on, and may fail to get agreement.]
Integrated SLA Monitoring
- Agent based distributed monitoring and behavior integration
- Rule based approach: formalize SLAs as rules, events as facts, and use reasoning to derive the violation situations
- Consider fuzzy violation decision models
- Across providers and resource types
- Proactive SLA assurance (recovery)
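The rule based approach with a fuzzy violation decision can be sketched as follows. This is only an illustration under stated assumptions: the `SlaRule` type, the `tolerance` band, and the linear membership function are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class SlaRule:
    """Hypothetical SLA rule: a metric must stay at or below a limit."""
    metric: str       # e.g. "latency_ms"
    limit: float      # agreed upper bound
    tolerance: float  # width of the fuzzy band above the limit (assumption)


def violation_degree(rule, value):
    """Return a degree from 0.0 (satisfied) to 1.0 (clearly violated)."""
    if value <= rule.limit:
        return 0.0
    if value >= rule.limit + rule.tolerance:
        return 1.0
    # Linear ramp inside the fuzzy band (one possible membership function).
    return (value - rule.limit) / rule.tolerance


def evaluate(rules, events):
    """Match event facts (provider, metric, value) against the SLA rules."""
    findings = []
    for provider, metric, value in events:
        for rule in rules:
            if rule.metric == metric:
                degree = violation_degree(rule, value)
                if degree > 0.0:
                    findings.append((provider, metric, degree))
    return findings


rules = [SlaRule("latency_ms", limit=200.0, tolerance=100.0)]
events = [
    ("provider1", "latency_ms", 150.0),
    ("provider2", "latency_ms", 250.0),
    ("provider3", "latency_ms", 400.0),
]
findings = evaluate(rules, events)
```

Here provider1 satisfies the rule, provider2 falls halfway into the fuzzy band, and provider3 is a clear violation. Events from several providers pass through the same rule set, which is one simple way to read "across providers and resource types".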
- Perform end-to-end QoS analysis before establishing SLAs
  - May need reservations to avoid new failures
- Consider QoS aspects holistically and directly determine the configuration parameters to fully control tradeoffs
  - Improve SLA model to support holistic SLA
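A minimal sketch of an end-to-end QoS check before any SLA is established. It assumes a purely sequential composition in which latencies add and availabilities multiply (independent failures); the `Offer` type and both QoS attributes are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical QoS offer from one provider in the composition."""
    provider: str
    latency_ms: float
    availability: float  # probability in [0, 1]


def end_to_end(offers):
    """Sequential composition: latencies add, availabilities multiply."""
    total_latency = sum(o.latency_ms for o in offers)
    availability = 1.0
    for o in offers:
        availability *= o.availability
    return total_latency, availability


def feasible(offers, max_latency_ms, min_availability):
    """Check the client's end-to-end requirement against the whole chain."""
    latency, availability = end_to_end(offers)
    return latency <= max_latency_ms and availability >= min_availability


offers = [Offer("p1", 100.0, 0.99), Offer("p2", 150.0, 0.99)]
ok_loose = feasible(offers, max_latency_ms=300.0, min_availability=0.98)
ok_strict = feasible(offers, max_latency_ms=300.0, min_availability=0.99)
```

Each offer meets 0.99 availability on its own, yet the chain does not. That gap is why the check is done on the composition rather than one service at a time.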
Improved SLA protocol:
1. First determine with which providers and levels of QoS
2. Then preliminarily check the possibility of getting the SLAs
3. Finally establish the SLAs
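The three protocol steps can be sketched as a check-then-commit negotiation. This is an assumption-laden illustration: the `Provider` class, its `tentative_hold`, `release`, and `commit` methods, and the use of integer resource units are hypothetical. The plan passed to `negotiate` stands for the outcome of step 1.

```python
class Provider:
    """Hypothetical provider that tracks free and tentatively held units."""

    def __init__(self, name, capacity):
        self.name = name
        self.free = capacity
        self.held = 0

    def tentative_hold(self, units):
        """Step 2: preliminary check, holding resources if they are available."""
        if units > self.free:
            return False
        self.free -= units
        self.held += units
        return True

    def release(self, units):
        """Undo a tentative hold when the overall negotiation fails."""
        self.held -= units
        self.free += units

    def commit(self, units):
        """Step 3: turn the tentative hold into an established SLA."""
        self.held -= units


def negotiate(plan):
    """plan: list of (provider, units) pairs chosen in step 1."""
    held = []
    for provider, units in plan:
        if provider.tentative_hold(units):
            held.append((provider, units))
        else:
            # One provider cannot agree: roll back so nothing is left half-established.
            for p, u in held:
                p.release(u)
            return False
    for provider, units in held:
        provider.commit(units)
    return True


p1, p2 = Provider("p1", 10), Provider("p2", 2)
first_try = negotiate([(p1, 5), (p2, 3)])   # p2 lacks capacity
second_try = negotiate([(p1, 5), (p2, 2)])  # fits on both providers
```

Holding resources during the preliminary check is one way to avoid the failure mode where the first SLA is signed and a later one cannot be obtained.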
At each provider:
- Consider strict & flexible SLAs
- Develop optimal resource management and admission control schemes
  - Formulation: optimization problem with the objective of maximizing the gain, given task completion rewards and violation penalties and the available resources
  - Admit only if positive gain
- Local monitoring and online reconfiguration
  - Ensure SLAs are satisfied if resources are sufficient; if not, adjust resource decisions
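The gain-based admission rule can be illustrated with a small sketch. It is a greedy heuristic, not the optimal scheme the project aims for, and the `Request` fields, the `violation_prob` estimate, and the expected-gain formula are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical task request seen by a provider's admission control."""
    name: str
    demand: int            # resource units needed (assumed > 0)
    reward: float          # paid on task completion within the SLA
    penalty: float         # paid by the provider on SLA violation
    violation_prob: float  # assumed estimate of violating the SLA


def expected_gain(req):
    """Expected reward minus expected penalty for one request."""
    return (1.0 - req.violation_prob) * req.reward - req.violation_prob * req.penalty


def admit(requests, capacity):
    """Greedy admission: best expected gain per resource unit first."""
    admitted = []
    remaining = capacity
    ranked = sorted(requests, key=lambda r: expected_gain(r) / r.demand, reverse=True)
    for req in ranked:
        # Admit only if the gain is positive and the resources are available.
        if expected_gain(req) > 0 and req.demand <= remaining:
            admitted.append(req.name)
            remaining -= req.demand
    return admitted


requests = [
    Request("a", demand=4, reward=10.0, penalty=20.0, violation_prob=0.1),
    Request("b", demand=4, reward=10.0, penalty=50.0, violation_prob=0.5),
    Request("c", demand=8, reward=12.0, penalty=5.0, violation_prob=0.2),
]
admitted = admit(requests, capacity=10)
```

Request b is rejected because its expected gain is negative, and c is rejected because it no longer fits after a is admitted. An exact formulation would treat this as a constrained optimization (for example, a knapsack-style problem) rather than a greedy pass.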
Probabilistic SLAs to collaboratively get backup resources under failure or extreme load
Form cloud community
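One way to reason about probabilistic SLAs for backup resources is sketched below. It assumes each community member honors a backup request independently with a stated probability; the function names and the independence assumption are hypothetical and not part of the project description.

```python
def backup_success_probability(commit_probs):
    """Probability that at least one community member supplies backup resources."""
    p_all_fail = 1.0
    for p in commit_probs:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail


def backups_needed(commit_prob, target):
    """Smallest number of equally reliable members needed to reach the target."""
    if not 0.0 < commit_prob <= 1.0:
        raise ValueError("commit_prob must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    count, p_fail = 0, 1.0
    while 1.0 - p_fail < target:
        count += 1
        p_fail *= (1.0 - commit_prob)
    return count


two_members = backup_success_probability([0.5, 0.5])
members_for_90 = backups_needed(commit_prob=0.5, target=0.9)
```

Under these assumptions, several modest probabilistic commitments combine into a much stronger guarantee, which is the intuition behind forming a cloud community.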
2011 New Project Summary
High-Confidence SLA Assurance for Cloud Computing Systems and Services
Tasks:
1. Comprehensive model of cloud SLAs considering correlations of QoS aspects and end-to-end QoS requirements
2. Integrated SLA monitoring approach across providers and resource types
3. Optimal adaptive strategies for assuring SLAs under normal and failure situations
4. Method of assessing system-level SLAs based on component-level SLAs
5. Layered collaborative approach for optimally achieving global SLA assurance by leveraging resources from multiple cloud domains
Research Goals:
1. Improved SLA models and protocols to facilitate highly dependable and practically usable cloud computing
2. Optimal supporting environment for SLA assurance considering end-to-end QoS and QoS tradeoffs and achieving local as well as global monitoring, resource management, and admission control
Benefits to Industry Partners:
1. Advanced cloud technologies to meet specified SLAs with a high degree of confidence in spite of multiple failures
2. Enable cloud computing to be used for critical applications, including health-care systems, emergency response systems, defense systems, and transportation systems
Project Schedule:
[Gantt chart: months A M J J A S O N D (2011) and J F M A (2012); task bars not recoverable]
Task 1: SLA model
Task 2: Integrated SLA monitoring
Task 3: Optimal adaptive SLA assurance