© 2010 IBM Corporation
Cloud Performance Considerations
Dr. Stefan Pappe - Distinguished Engineer - Leader Cloud Service Specialty Area
Dr. Curtis Hrischuk - Cloud Performance Leader
IBM Global Technology Services
Disclaimer
This document represents the authors' views and opinions.
It does not necessarily represent IBM's position or strategies.
Agenda
§ Why cloud computing
§ What is cloud computing
§ What are the business perspectives
§ What is different about the cloud
§ Open questions
IT Costs are Increasing
§ Costs to manage systems have doubled since 2000
§ Costs to power and cool systems have doubled since 2000
§ Devices accessing data over networks are doubling every 2.5 years
§ Bandwidth consumed is doubling every 1.5 years
§ Data is doubling every 18 months (1)
§ Server processing capacity is doubling every 3 years (2)
§ 10G Ethernet ports will triple over the next 5 years
[Chart: spending in US$B (axis $0 to $300) and installed base in M units (axis 0 to 50), with series for new server spending, server mgmt and admin costs, and power and cooling costs]
Source: IDC, 2008
(1) WW TB Capacity Shipped on Enterprise Disk Storage Systems
(2) Server processing consumption doubles every 3 years
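The doubling periods quoted on this slide are easier to compare once converted to annual growth rates, using rate = 2^(1/T) - 1 for a doubling period of T years. The sketch below is only an illustration of that arithmetic; the function names and the 5-year horizon are assumptions, not part of the original deck.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Annual growth rate implied by a doubling period given in years."""
    return 2 ** (1.0 / doubling_years) - 1.0


def growth_factor(doubling_years: float, horizon_years: float) -> float:
    """Multiplicative growth over horizon_years for a given doubling period."""
    return 2 ** (horizon_years / doubling_years)


# Doubling periods (in years) as quoted on the slide.
DOUBLING_YEARS = {
    "devices on networks": 2.5,
    "bandwidth consumed": 1.5,
    "data": 1.5,
    "server processing capacity": 3.0,
}

if __name__ == "__main__":
    for name, years in DOUBLING_YEARS.items():
        rate = annual_growth_rate(years)
        # The 5-year horizon is an arbitrary choice for illustration.
        factor = growth_factor(years, horizon_years=5)
        print(f"{name}: {rate:.0%} per year, x{factor:.1f} over 5 years")
```

Under this arithmetic, data doubling every 18 months corresponds to roughly 59% growth per year, while server processing capacity doubling every 3 years corresponds to roughly 26% per year.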
What’s Driving Cloud Computing?
1. Cost Reduction
   1. Efficiency: virtual resources for hardware utilization (memory, disk, machines)
   2. Sharing of hardware/maintenance: multitenancy for cost reduction
   3. Automation: automate mundane tasks
   4. Commodity hardware for most public clouds
   – Cloud: Highly virtualized with many users sharing the same hardware
2. Technology Maturity Cycle
   1. New: Wow, it works!
   2. Commercialization: Will it make money long term?
   3. “Good enough”: Functionality is “good enough” for the majority of users. Users have a lower tolerance for poor ease of use, care less about the technical details, etc.
   4. Standardization: If users don’t care about technical details, we can standardize and virtualize.
   5. Business: Focus higher in the solution stack
   – Cloud: Companies that are moving to the cloud are focusing on their business, not technology.
3. Payment model: Pay per use to lower the bar to adoption
   1. Pay up front for all required capital
   2. Finance terms (deferred financial cost)
   3. Pay per use (for public cloud)
   – Cloud: Pay per use with immediate time to value
What is Different about the Cloud
Data center
• Customers buy hw and sw
• 10s to 100s of hw servers
• Servers are in silos
• Enterprise applications
• Few failures
• Heterogeneous hw
Grid
• Customers buy hw and sw
• 100s to 1000s of hw servers
• Shared servers
• Mostly batch apps
• Need to account for failures
• Homogeneous hw
Cloud
• Customers rent hw and sw
• 1000s to 10,000s of hw servers
• Elastic capacity (+/- servers)
• Enterprise and other apps
• Constant failures
• Commodity hw
• Quality of Experience (QoE) is very important to customers
• Users run on virtualized hw
“By 2012, one out of five businesses will own no IT assets at all.” Gartner, 01/18/2010, http://www.gartner.com/it/page.jsp?id=1278413
Is Performance Important to the Success of the Cloud
§ Five of the 10 obstacles and opportunities for cloud computing are related to quality-of-service aspects such as availability, performance, capacity or scalability.
§ Obstacle # 1 “Availability of service” discusses availability risks for cloud computing as a result of e.g. programming errors, overload of common services or Distributed Denial of Service (DDoS) attacks
§ Obstacle # 4 “Data transfer bottlenecks” discusses the growing data intensity of applications and how this impacts data transfer rates and costs in the cloud
§ Obstacle # 5 “Performance unpredictability” discusses performance risks caused by e.g. inefficiencies in I/O sharing and by high performance computing
§ Obstacle # 6 “Scalable storage” discusses the difficulties of applying cloud computing to solutions requiring highly scalable persistent storage
§ Obstacle # 8 “Scaling quickly” discusses the difficulties of quickly scaling up and down in response to load without violating service level agreements.
From "Above the Clouds: A Berkeley View of Cloud Computing."
IBM offers highly integrated cloud solutions for different client requirements regarding workloads, service levels and delivery models
Workloads determine type and fit of Cloud Services
[Chart: workloads positioned on two axes, gain (low gain to high gain) and pain (low pain to high pain)]
Service Level expectations require different Cloud Management Services
[Diagram: management services grouped by tier (Tier 1, Tier 2, Tier 3, Tier 4, ITSM Run Tier)]
• Availability
• Redundancy
• Monitoring
• End to End Process Mgmt
• Core Infrastructure Services
• Server Management
• Storage Management
• Security, Patch, Risk
• Problem/Change
• Audit Checking
• Software License Mgmt
• Application Management
• Compliance Checking
• MW and DBMS Services
• Network Connectivity
• Help Desk
• Business Continuity
Different Cloud Delivery Models accommodate different needs regarding architectural control, operations and asset ownership
[Diagram: five delivery models, labelled Delivery Model 1 to Delivery Model 5]
• Private Cloud: enterprise data center
• Managed Private Cloud: enterprise data center, IBM operated
• Hosted Private Cloud: IBM owned and operated
• Shared Cloud Services: Enterprise A, Enterprise B, Enterprise C
• Public Cloud Services: User A, User B, User C, User D, User E
What are the Layers in the Cloud
Infrastructure as a Service
• Servers
• Networking
• Storage
• Data Center Fabric
• Shared virtualized, dynamic provisioning
Platform as a Service
• High Volume Transactions
• Middleware
• Database
• Web 2.0 Application Runtime
• Java Runtime
• Development Tooling
Software as a Service
• Collaboration
• Business Processes
• CRM/ERP/HR
• Industry Applications
Is the Cloud More Complex: Virtualization
§ Multiple hardware and software queues in a normal server
§ Virtualization adds two new queues (guest OS and hypervisor), forming a network of software queues
§ Memory and disk space are fixed resources that are shared even more
§ Hypervisor must cap resource usage to prevent starvation and provide QoS guarantees
[Diagram: a normal server stack (Application, Application server, JVM, Operating System), each layer with a queue, next to a virtualized host where three guest stacks (Application, Application server, JVM, Guest OS) run on a Hypervisor and Operating System; the Guest OS and Hypervisor layers are each marked "new queue"]
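To make the effect of the two extra queues concrete, the following rough sketch treats each software layer as an independent M/M/1 station in series and sums the per-station response times. This is a textbook simplification used here only for illustration; the service rates are made-up numbers, and real layered systems have feedback that this model ignores.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time (waiting + service) of a single M/M/1 station."""
    if arrival_rate >= service_rate:
        raise ValueError("station is saturated: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)


def tandem_response_time(arrival_rate: float, service_rates: list) -> float:
    """Mean response time of a request passing through stations in series."""
    return sum(mm1_response_time(arrival_rate, mu) for mu in service_rates)


# Hypothetical service rates in requests per second for each layer.
NATIVE_LAYERS = {
    "operating system": 1000.0,
    "JVM": 800.0,
    "application server": 500.0,
    "application": 300.0,
}
# Virtualization adds two more queues on top of the same layers.
VIRTUAL_LAYERS = {"hypervisor": 2000.0, "guest OS": 1000.0, **NATIVE_LAYERS}

if __name__ == "__main__":
    load = 200.0  # requests per second, an arbitrary example load
    native = tandem_response_time(load, list(NATIVE_LAYERS.values()))
    virtual = tandem_response_time(load, list(VIRTUAL_LAYERS.values()))
    print(f"native: {native * 1000:.2f} ms, virtualized: {virtual * 1000:.2f} ms")
```

Even in this idealized model every added queue contributes its own delay, and the contribution grows sharply as the load approaches the capacity of the slowest layer.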
Is the Cloud More Complex: Scale Out and Network Functions
§ Network is a critical resource for persistent storage, input and output traffic
§ Network attached storage is a shared pool of multiple storage pods
[Diagram: three hosts, each with an Operating System and Hypervisor running three guest stacks (Guest OS, JVM, Application server, Application), connected to two Network Attached Storage units; two further elements are marked "new queue"]
Is the Cloud More Complex: Virtual Machine Mobility
§ VMs leave, appear, move, grow
§ Workload classes appear, change, move, go away
§ VMs have different processor power sizes
[Diagram: the same three hosts (Operating System, Hypervisor, three guest stacks each) and two Network Attached Storage units, plus one additional guest stack (Guest OS, JVM, Application server, Application) shown apart from the hosts]
IBM CloudBurst Appliance
[Figure: rack layout and wiring diagram of the appliance. Labelled components include BladeCenter chassis with HS22 blade servers, power supplies (PS1 to PS4), fans, management modules and midplane; an x3650 M2 management node; DS3400 storage (controllers A and B) with EXP3000 expansion units; 1U Gb Ethernet switches; 24-port 1 Gbps Ethernet switches; 10-port FC switch modules (bays 3 and 4); 10G switch modules (bays 1 and 2); PDUs; keyboard and monitor; and the connection to the customer network]
§ Compute, Network, and Storage resources are integrated into the appliance
How is Cloud Performance Analysis Done
§ Dynamic modeling is required to characterize non-locality due to feedback between layered subsystems
– Classical queuing theory is not that helpful
– Discrete event simulation approaches are needed
[Diagram: servers connected through switches to the NAS; backpressure makes a NAS bottleneck show up at the servers]
A bottleneck at the NAS may slow the execution at the server due to backpressure caused by a feedback chain through the network!
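The backpressure effect can be reproduced with a very small event-time simulation. The sketch below is an assumption-laden toy, not the authors' model: one server feeds one NAS with a finite request buffer, service times are deterministic, and a job stays on the server (blocked) until the NAS has room for it.

```python
def mean_server_occupancy(n_jobs: int, server_time: float,
                          nas_time: float, nas_capacity: int) -> float:
    """Mean time a job holds the server (service time plus blocked time).

    Jobs are always waiting at the server (saturated load). A finished job
    may only leave the server when the NAS holds fewer than nas_capacity
    jobs, counting the one in service.
    """
    leave_nas = []          # time at which job i leaves the NAS
    server_free_at = 0.0    # time at which the server can start the next job
    total_occupancy = 0.0
    for i in range(n_jobs):
        start = server_free_at
        done = start + server_time
        # Blocking: wait until job (i - nas_capacity) has left the NAS.
        if i >= nas_capacity:
            done = max(done, leave_nas[i - nas_capacity])
        server_free_at = done
        # The NAS serves jobs one at a time in arrival order.
        nas_start = max(done, leave_nas[i - 1]) if i > 0 else done
        leave_nas.append(nas_start + nas_time)
        total_occupancy += done - start
    return total_occupancy / n_jobs


if __name__ == "__main__":
    fast = mean_server_occupancy(200, server_time=1.0, nas_time=0.5, nas_capacity=2)
    slow = mean_server_occupancy(200, server_time=1.0, nas_time=4.0, nas_capacity=2)
    print(f"fast NAS: {fast:.2f} time units per job on the server")
    print(f"slow NAS: {slow:.2f} time units per job on the server")
```

With a slow NAS the server appears roughly four times slower even though its own service time is unchanged, which is the sense in which the bottleneck "shows up at the servers".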
The Cloud Performance Challenge
§ Quality of Experience (QoE) depends upon (hybrid) cloud service performance
– Excellent QoE accelerates adoption and is a functional requirement
– QoE crosses boundaries of internet, network, system, application performance and resilience
§ Competitive pressure will require competitive performance from all vendors to keep customers
– IaaS and PaaS paradigms allow customers to move (e.g., price, QoE, etc.)
– e.g., Amazon EC2 and IBM Compute Cloud can run the same software
– QoS and SLAs are an important differentiator
§ Performance of the cloud will evolve to near real-time business
– Communication needs are near real-time for correctness
– Complex event processing needs to be done quickly to be useful
§ Cloud computing is a new paradigm which will have new performance challenges
– It incorporates prior component performance challenges too
– Hybrid clouds expand this further (e.g., network hops / latency)
– Customer expectations will require education
“Great engineering comes from creating predictable results at predictable costs… If you’re not measuring, you’re not engineering” – Rico Mariani, Chief Architect of Visual Studio, Microsoft Corp.
Open Question: Comparing Cloud Performance
§ Today, it can’t be compared!
§ There aren’t any industry-defined benchmarks because the workload classes vary greatly and have dynamic lifetimes
§ And a benchmark needs to include cost and availability as key factors
§ Perhaps a benchmark framework is needed that workloads are plugged into?
§ Perhaps a meta-benchmark analysis is needed to provide a score?
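One way to picture such a meta-benchmark score is sketched below. It is purely hypothetical: the formula (geometric mean of per-workload speedups, scaled by availability and divided by hourly cost), the function name and the sample numbers are all assumptions made for illustration, not an existing industry benchmark.

```python
import math


def meta_score(speedups: dict, availability: float, cost_per_hour: float) -> float:
    """Hypothetical composite score for one cloud offering.

    speedups: workload name -> throughput relative to a reference system.
    availability: fraction of time the service was usable (0 to 1).
    cost_per_hour: price of the configuration under test.
    """
    if not speedups:
        raise ValueError("at least one workload result is required")
    if cost_per_hour <= 0:
        raise ValueError("cost_per_hour must be positive")
    # Geometric mean keeps one very fast workload from dominating the score.
    geo_mean = math.prod(speedups.values()) ** (1.0 / len(speedups))
    return geo_mean * availability / cost_per_hour


if __name__ == "__main__":
    # Made-up results for two plugged-in workloads on a made-up offering.
    results = {"web transactions": 2.0, "batch analytics": 8.0}
    print(meta_score(results, availability=0.99, cost_per_hour=2.0))
```

New workload classes can be plugged in by adding entries to the results dictionary, which reflects the framework idea raised in the bullets above; the choice of weighting remains an open design question.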
Open Question: Central storage vs. Local Disks vs. Combination vs. New …
Central shared storage (SAN or NFS)
(+) Provisioning is fast
(+) Live migration is supported
(-) VM disk I/O is slow due to disk and network contention
[Diagram: a host machine (Guest OS, Host OS, Hypervisor) whose root disk and data disk sit in the virtual disk store on central shared storage; images (Image 1, Image 2) are copied from the image repository on the same shared storage]
Local disks
(+) VM disk I/O is fast
(-) Provisioning is slow due to network image copying
(-) Live migration is not supported
[Diagram: a host machine (Guest OS, Host OS, Hypervisor) with local root and data disks; images (Image 1, Image 2) are copied from the image repository on a repository server]
How to combine these approaches for the best performance?
Open Question: Optimal Approaches for Bin Packing and Moving VMs
“When deploying services in a cloud, a balance must be found between performance and capacity of the service, and the memory available on nodes. This is further complicated if the number of replicas of an application is limited, for instance by the available number of licenses. The analysis of interference between services must scale to large numbers of host nodes, applications, replicas of applications, and classes of users. This paper combines a multi-dimensional packing heuristic and network flow optimization to satisfy simultaneous constraints on throughputs, processor utilizations, memory availability and license availability, at a minimum cost and with a minimum of host processors.”
Jim Zhanwen Li, John Chinneck, Murray Woodside, and Marin Litoiu. 2009. Deployment of Services in a Cloud Subject to Memory and License Constraints. In Proceedings of the 2009 IEEE International Conference on Cloud Computing (CLOUD '09).
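For orientation, a generic two-dimensional first-fit-decreasing heuristic is sketched below. It is not the method of the cited paper, which combines a packing heuristic with network flow optimization and also handles throughput and license constraints; this is only a minimal packing example, and the VM sizes and host capacities are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    cpu_free: int
    mem_free: int
    vms: list = field(default_factory=list)


def first_fit_decreasing(vms, host_cpu, host_mem):
    """Place (name, cpu, mem) VMs on as few identical hosts as the heuristic finds."""
    # Largest VMs first, ranked by their dominant normalized resource demand.
    ordered = sorted(vms, key=lambda vm: max(vm[1] / host_cpu, vm[2] / host_mem),
                     reverse=True)
    hosts = []
    for name, cpu, mem in ordered:
        if cpu > host_cpu or mem > host_mem:
            raise ValueError(f"VM {name} does not fit on an empty host")
        for host in hosts:
            if host.cpu_free >= cpu and host.mem_free >= mem:
                break  # first host with enough room in both dimensions
        else:
            host = Host(host_cpu, host_mem)  # open a new host
            hosts.append(host)
        host.cpu_free -= cpu
        host.mem_free -= mem
        host.vms.append(name)
    return hosts


if __name__ == "__main__":
    # Hypothetical VMs: (name, CPU cores, memory in GB).
    demo = [("a", 4, 8), ("b", 2, 4), ("c", 2, 4), ("d", 6, 12)]
    for index, host in enumerate(first_fit_decreasing(demo, host_cpu=8, host_mem=16)):
        print(index, host.vms, host.cpu_free, host.mem_free)
```

First-fit decreasing is fast but not optimal in general, and it ignores the cost of moving VMs that are already running, which is part of what makes this an open question.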
Open Question: Performance Fault Diagnosis and Analysis
§ Intermittent backpressure causes lower level hw and/or sw to slow down
§ The problem may appear to move if it is caused by a VM and the VM moves
§ The problem may appear to move if it is caused by a VM and the problem VM dies
§ The problem may appear to move if it is caused by a VM and the problem VM starts up
§ The problem may appear to move if it is caused by hw and the VM moves
§ Several VMs may show the same symptom separated in space and time
§ What data and how much to monitor, with 10^4 to 10^5 elements
§ Expert systems / analytics are needed to help in the identification of problems
§ Extend analysis to predict hw failures before they occur
Open Question: The CAP Theorem and Performance
§ Three properties of shared-data, distributed systems
1. Consistency: once an update is made, all observers see it
2. Availability: all database transactions should be processed accurately and promptly
3. Tolerance: tolerant to network Partitions
§ CAP Theorem
– Only two properties can be achieved at any time
– Network partitions are a given in distributed systems
– Have to pick one between consistency and availability
§ How will distributed architectures change to optimize for each pair of properties?
– Eventual consistency, non-relational databases?
Gilbert, Seth, and Nancy Lynch. “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.” ACM SIGACT News, v. 33 issue 2, 2002, p. 51-59.
For a general description see: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
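The availability-over-consistency choice can be illustrated with a toy pair of replicas that keep accepting writes during a partition and reconcile afterwards. The sketch assumes a last-write-wins policy based on caller-supplied timestamps; that policy, the class and the function names are illustrative assumptions, and real systems use a range of reconciliation schemes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    store: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        """Accept a write locally, keeping the newest timestamp per key."""
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def reconcile(a: Replica, b: Replica) -> None:
    """After the partition heals, converge both replicas (last write wins)."""
    for key in set(a.store) | set(b.store):
        candidates = [r.store[key] for r in (a, b) if key in r.store]
        winner = max(candidates, key=lambda entry: entry[0])
        a.store[key] = winner
        b.store[key] = winner


if __name__ == "__main__":
    east, west = Replica("east"), Replica("west")
    # During a partition both sides stay available and accept writes.
    east.write("cart", "book", timestamp=1)
    west.write("cart", "book+pen", timestamp=2)
    print(east.read("cart"), west.read("cart"))  # replicas disagree for now
    reconcile(east, west)
    print(east.read("cart"), west.read("cart"))  # replicas agree again
```

Between the partition and the reconciliation, readers can observe different values on different replicas; that window of inconsistency is the price paid for staying available.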
Selected Performance Activities – IBM Cloud
[Diagram: cloud reference architecture with three roles. Cloud Service Consumer: Consumer In-house IT, Cloud Service Integration Tools. Cloud Service Provider: Cloud Services; Common Cloud Management Platform with OSS – Operational Support Services and BSS – Business Support Services; Virtualized Infrastructure – Server, Storage, Network, Facilities; Security & Resiliency. Cloud Service Developer: Service Development Tools]
e.g. Service Activation
• process optimization
e.g. Provisioning
• image copy
• instance creation
• partitioning
e.g. Run Time Performance
• Integration of storage, hypervisor, network components
• Dedicated nodes
Disclaimer
© Copyright IBM Corporation 2010. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.