
© 2010 IBM Corporation

Cloud Performance Considerations

Dr. Stefan Pappe, Distinguished Engineer, Leader Cloud Service Specialty Area

Dr. Curtis Hrischuk, Cloud Performance Leader

IBM Global Technology Services


Disclaimer

This document represents the authors' views and opinions.

It does not necessarily represent IBM's position or strategies.


Agenda

§ Why cloud computing

§ What is cloud computing

§ What are the business perspectives

§ What is different about the cloud

§ Open questions


IT Costs are Increasing

§  Costs to manage systems have doubled since 2000

§  Costs to power and cool systems have doubled since 2000

§  Devices accessing data over networks are doubling every 2.5 years

§  Bandwidth consumed is doubling every 1.5 years

§  Data is doubling every 18 months¹

§  Server processing capacity is doubling every 3 years²

§  10G Ethernet ports will triple over the next 5 years

[Chart: Spending (US$B), split into new server spending, server management and administration costs, and power and cooling costs, alongside the installed base (M units)]

Source: IDC, 2008

¹ WW TB capacity shipped on enterprise disk storage systems

² Server processing consumption doubles every 3 years
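The growth rates above are easier to compare once they are expressed in a common unit. The following is a minimal sketch, assuming smooth exponential growth, that converts a doubling (or tripling) period into an annual growth factor and into a total multiplier over a planning horizon. The function names and the 6-year horizon are illustrative only.

```python
import math


def annual_factor(multiple: float, period_years: float) -> float:
    """Annual growth factor implied by growing `multiple`-fold every `period_years`."""
    return multiple ** (1.0 / period_years)


def growth_over(multiple: float, period_years: float, horizon_years: float) -> float:
    """Total growth over `horizon_years`, assuming smooth exponential growth."""
    return multiple ** (horizon_years / period_years)


def doubling_time(factor_per_year: float) -> float:
    """Years needed to double at a constant annual growth factor (> 1)."""
    return math.log(2.0) / math.log(factor_per_year)


if __name__ == "__main__":
    # (multiple, period in years) as quoted on the slide.
    trends = {
        "Data": (2.0, 1.5),
        "Network devices": (2.0, 2.5),
        "Server capacity": (2.0, 3.0),
        "10G Ethernet ports": (3.0, 5.0),
    }
    for name, (multiple, period) in trends.items():
        per_year = annual_factor(multiple, period)
        print(f"{name}: {100 * (per_year - 1):.0f}% per year, "
              f"x{growth_over(multiple, period, 6.0):.1f} over 6 years")
```

With these inputs, data grows about 59% per year (16x over six years). Server capacity grows about 26% per year (4x over six years).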


What’s Driving Cloud Computing?

1.  Cost reduction
    1.  Efficiency: virtual resources for better hardware utilization (memory, disk, machines)
    2.  Sharing of hardware/maintenance: multitenancy for cost reduction
    3.  Automation: automate mundane tasks
    4.  Commodity hardware for most public clouds
    –  Cloud: highly virtualized, with many users sharing the same hardware

2.  Technology maturity cycle
    1.  New: Wow, it works!
    2.  Commercialization: Will it make money long term?
    3.  “Good enough”: Functionality is “good enough” for the majority of users. Users have a lower tolerance for poor ease of use, care less about the technical details, etc.
    4.  Standardization: If users don’t care about technical details, we can standardize and virtualize.
    5.  Business: Focus higher in the solution stack
    –  Cloud: Companies that are moving to the cloud are focusing on their business, not technology.

3.  Payment model: pay per use to lower the barrier to adoption
    1.  Pay up front for all required capital
    2.  Finance terms (deferred financial cost)
    3.  Pay per use (for public cloud)
    –  Cloud: Pay per use with immediate time to value


Is Cloud Computing Growing?

[Charts: Mind share; Market share]


What is Different about the Cloud

Data center
•  Customers buy hw and sw
•  10s to 100s of hw servers
•  Servers are in silos
•  Enterprise applications
•  Few failures
•  Heterogeneous hw

Grid
•  Customers buy hw and sw
•  100s to 1,000s of hw servers
•  Shared servers
•  Mostly batch apps
•  Need to account for failures
•  Homogeneous hw

Cloud
•  Customers rent hw and sw
•  1,000s to 10,000s of hw servers
•  Elastic capacity (+/- servers)
•  Enterprise and other apps
•  Constant failures
•  Commodity hw
•  Quality of Experience (QoE) is very important to customers
•  Users run on virtualized hw

“By 2012, one out of five businesses will own no IT assets at all.” Gartner 01/18/2010 http://www.gartner.com/it/page.jsp?id=1278413


Is Performance Important to the Success of the Cloud?

§  Five of the 10 obstacles and opportunities for cloud computing are related to quality-of-service aspects such as availability, performance, capacity, or scalability.

§  Obstacle #1, “Availability of service,” discusses availability risks for cloud computing that result from, e.g., programming errors, overload of common services, or Distributed Denial of Service (DDoS) attacks.

§  Obstacle #4, “Data transfer bottlenecks,” discusses the growing data intensity of applications and how this affects data transfer rates and costs in the cloud.

§  Obstacle #5, “Performance unpredictability,” discusses performance risks caused by, e.g., inefficient I/O sharing between VMs and the difficulty of scheduling VMs for high-performance computing workloads.

§  Obstacle #6, “Scalable storage,” discusses the difficulties of applying cloud computing to solutions that require highly scalable persistent storage.

§  Obstacle #8, “Scaling quickly,” discusses the difficulties of scaling up and down quickly in response to load without violating service level agreements.

From “Above the Clouds: A Berkeley View of Cloud Computing” (Armbrust et al., UC Berkeley, 2009).
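Obstacle #4 is easy to quantify, because transfer time is data volume divided by effective bandwidth. The following is a minimal sketch, assuming a sustained link rate and an optional efficiency factor for protocol overhead and contention. The 10 TB volume, the link rates, and the name `transfer_days` are illustrative inputs, not measurements.

```python
def transfer_days(size_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move `size_bytes` over a link of `link_bits_per_s`.

    `efficiency` (0 < efficiency <= 1) is an assumed fraction of the nominal
    rate actually achieved after protocol overhead and contention."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    TB = 10**12
    for rate_mbps in (20, 100, 1_000):
        days = transfer_days(10 * TB, rate_mbps * 10**6)
        print(f"10 TB at {rate_mbps} Mbit/s: {days:.1f} days")
```

At 20 Mbit/s, 10 TB takes about 46 days. Even at 1 Gbit/s it takes close to a day. This is why the Berkeley report discusses physically shipping disks as an alternative for bulk data.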


IBM offers highly integrated cloud solutions for different client requirements regarding workloads, service levels and delivery models

Workloads determine type and fit of Cloud Services

[Chart: workloads positioned by gain (low to high) and pain (low to high)]

Service Level expectations require different Cloud Management Services

Service tiers: Tier 1, Tier 2, Tier 3, Tier 4, ITSM Run Tier
•  Availability
•  Redundancy
•  Monitoring
•  End to End Process Mgmt
•  Core Infrastructure Services
•  Server Management
•  Storage Management
•  Security, Patch, Risk
•  Problem/Change
•  Audit Checking
•  Software License Mgmt
•  Application Management
•  Compliance Checking
•  MW and DBMS Services
•  Network Connectivity
•  Help Desk
•  Business Continuity

Different Cloud Delivery Models accommodate different needs regarding architectural control, operations and asset ownership

Delivery models shown (Delivery Model 1 to Delivery Model 5):
•  Private Cloud (enterprise data center)
•  Managed Private Cloud (enterprise data center, IBM operated)
•  Hosted Private Cloud (IBM owned and operated)
•  Public Cloud Services (Users A to E)
•  Shared Cloud Services (Enterprises A to C)


What are the Layers in the Cloud?

Software as a Service
•  Collaboration
•  Business Processes
•  CRM/ERP/HR
•  Industry Applications

Platform as a Service
•  Middleware
•  Database
•  Web 2.0 Application Runtime
•  Java Runtime
•  Development Tooling
•  High Volume Transactions

Infrastructure as a Service
•  Servers
•  Networking
•  Storage
•  Data Center Fabric
•  Shared, virtualized, dynamic provisioning


Is the Cloud More Complex: Virtualization

§  A normal server already has multiple hardware and software queues

§  Virtualization adds two new queues (guest OS and hypervisor), creating a network of software queues

§  Memory and disk space are fixed resources that are now shared even more widely

§  The hypervisor must cap resource usage to prevent starvation and to provide QoS guarantees

[Diagram: a conventional server stack (Operating System, JVM, Application server, Application) with a queue at each layer, next to a virtualized host that runs three guest stacks (Guest OS, JVM, Application server, Application) on a hypervisor; the hypervisor and guest OS layers are marked as new queues]
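The effect of the two extra queues can be estimated with classical open queueing-network arithmetic. The following is a minimal sketch, assuming each layer behaves as an independent M/M/1 queue in series. Under that assumption, a layer with service demand S at arrival rate λ has utilization U = λS and contributes a residence time of S / (1 − U). The per-layer demands are made-up numbers, and the model ignores feedback between layers.

```python
from typing import Dict


def response_time(arrival_rate: float, demands: Dict[str, float]) -> float:
    """End-to-end response time of a series of M/M/1 queues.

    Each layer with service demand S (seconds per request) at utilization
    U = arrival_rate * S contributes a residence time of S / (1 - U)."""
    total = 0.0
    for layer, demand in demands.items():
        utilization = arrival_rate * demand
        if utilization >= 1.0:
            raise ValueError(f"layer {layer!r} is saturated (U={utilization:.2f})")
        total += demand / (1.0 - utilization)
    return total


# Hypothetical per-request service demands in seconds.
BARE_METAL = {"os": 0.002, "jvm": 0.003, "app_server": 0.004, "application": 0.005}
VIRTUALIZED = {**BARE_METAL, "guest_os": 0.002, "hypervisor": 0.003}

if __name__ == "__main__":
    for rate in (50.0, 100.0, 150.0):  # requests per second
        bare = response_time(rate, BARE_METAL)
        virt = response_time(rate, VIRTUALIZED)
        print(f"{rate:5.0f} req/s: bare metal {1000 * bare:.1f} ms, "
              f"virtualized {1000 * virt:.1f} ms")
```

Each added layer contributes a term that grows non-linearly with load, so the virtualization overhead is larger at high utilization than at low utilization.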


Is the Cloud More Complex: Scale Out and Network Functions

§  The network is a critical resource for persistent storage and for input and output traffic

§  Network attached storage (NAS) is a shared pool of multiple storage pods

[Diagram: three virtualized hosts (Operating System, Hypervisor, and three Guest OS / JVM / Application server / Application stacks each) connected to two Network Attached Storage pools; the network paths to storage are marked as new queues]


Is the Cloud More Complex: Virtual Machine Mobility

§  VMs leave, appear, move, and grow

§  Workload classes appear, change, move, and go away

§  VMs come in different sizes of processor capacity

[Diagram: three virtualized hosts (Operating System, Hypervisor, and Guest OS / JVM / Application server / Application stacks) sharing two Network Attached Storage pools, with one guest stack shown moving between hosts]
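VM mobility has a performance cost of its own. The following is a minimal sketch of iterative pre-copy live migration. It assumes a constant page-dirtying rate and a fixed migration bandwidth, which is a common simplified model and not a description of any specific hypervisor. The names `precopy_migration`, `stop_threshold_mb`, and `max_rounds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MigrationEstimate:
    total_seconds: float     # start of migration until the VM resumes on the target
    downtime_seconds: float  # final stop-and-copy pause
    rounds: int              # live pre-copy rounds performed


def precopy_migration(memory_mb: float, bandwidth_mb_s: float, dirty_mb_s: float,
                      stop_threshold_mb: float = 64.0, max_rounds: int = 30) -> MigrationEstimate:
    """Estimate iterative pre-copy live migration under a constant dirty rate.

    Round 0 copies all memory while the VM keeps running; each later round
    re-sends the pages dirtied during the previous round. Once the dirty set is
    at most `stop_threshold_mb` (or `max_rounds` is reached), the VM is paused
    and the remainder is copied (stop-and-copy)."""
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    to_send = memory_mb
    elapsed = 0.0
    rounds = 0
    while rounds < max_rounds and to_send > stop_threshold_mb:
        round_time = to_send / bandwidth_mb_s
        elapsed += round_time
        # Pages dirtied while this round was copying; never more than all of memory.
        to_send = min(memory_mb, dirty_mb_s * round_time)
        rounds += 1
    downtime = to_send / bandwidth_mb_s
    return MigrationEstimate(elapsed + downtime, downtime, rounds)


if __name__ == "__main__":
    for dirty in (0.0, 20.0, 60.0, 150.0):  # MB/s of memory being rewritten
        est = precopy_migration(memory_mb=4096, bandwidth_mb_s=120, dirty_mb_s=dirty)
        print(f"dirty {dirty:5.1f} MB/s: total {est.total_seconds:6.1f} s, "
              f"downtime {est.downtime_seconds:5.2f} s, rounds {est.rounds}")
```

When the dirty rate approaches or exceeds the migration bandwidth, the rounds stop converging. Downtime then approaches a full memory copy, which is one reason a busy VM is harder to move than an idle one.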


IBM CloudBurst Appliance

[Diagram: IBM CloudBurst rack layout, including a BladeCenter chassis with HS22 blade servers, an x3650 M2 management node, a DS3400 storage controller (controllers A and B) with EXP3000 expansion units, 1U Gb Ethernet switches, 24-port 1 Gbps Ethernet switches, 10-port Fibre Channel switch modules, 10G switch modules, PDUs, a keyboard/monitor panel, and a connection to the customer network]

§  Compute, Network, and Storage resources are integrated into the appliance


How is Cloud Performance Analysis Done?

§  Dynamic modeling is required to characterize the non-local effects of feedback between layered subsystems

–  Classical queuing theory is of limited help

–  Discrete event simulation approaches are needed

[Diagram: Servers → Switches → NAS, with backpressure propagating from the NAS back to the servers]

A bottleneck at the NAS may slow execution at the servers because of backpressure caused by a feedback chain through the network: the NAS bottleneck shows up at the servers.
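The backpressure effect can be reproduced with a very small discrete-event model. The following is a minimal sketch that uses an event-time recursion, which is equivalent to a discrete-event simulation for this FIFO line. The model has a server stage that always has work, a NAS stage with a finite queue, and blocking-after-service: a finished request cannot leave the server until the NAS has a free slot. The two-stage topology, the buffer size, and the name `simulate_backpressure` are assumptions chosen for illustration.

```python
import random
from typing import Sequence, Tuple


def simulate_backpressure(server_times: Sequence[float], nas_times: Sequence[float],
                          buffer_size: int) -> Tuple[float, float]:
    """Two-stage line (server -> NAS) with a finite NAS queue and blocking-after-service.

    Returns (server throughput in jobs per time unit, fraction of time the
    server spends blocked waiting for room at the NAS)."""
    n_jobs = len(server_times)
    if n_jobs == 0 or len(nas_times) != n_jobs or buffer_size < 0:
        raise ValueError("need equal-length, non-empty inputs and buffer_size >= 0")
    capacity = buffer_size + 1     # waiting slots plus the job in service at the NAS
    released = [0.0] * n_jobs      # time job n leaves the server for the NAS queue
    nas_done = [0.0] * n_jobs      # time job n completes at the NAS
    blocked = 0.0
    for n in range(n_jobs):
        # Saturated source: the server starts job n once job n-1 has been released.
        start = released[n - 1] if n > 0 else 0.0
        finished = start + server_times[n]
        # Backpressure: job n may only enter the NAS queue once job n-capacity
        # has completed there, i.e. once a slot is free.
        slot_free = nas_done[n - capacity] if n >= capacity else 0.0
        released[n] = max(finished, slot_free)
        blocked += released[n] - finished
        # The NAS serves one job at a time in FIFO order.
        nas_start = max(released[n], nas_done[n - 1] if n > 0 else 0.0)
        nas_done[n] = nas_start + nas_times[n]
    horizon = released[-1]
    return n_jobs / horizon, blocked / horizon


if __name__ == "__main__":
    rng = random.Random(1)
    n = 50_000
    server = [rng.expovariate(1 / 1.0) for _ in range(n)]  # mean service time 1.0
    for nas_mean in (0.5, 0.9, 1.2, 2.0):
        nas = [rng.expovariate(1 / nas_mean) for _ in range(n)]
        throughput, blocked = simulate_backpressure(server, nas, buffer_size=4)
        print(f"NAS mean {nas_mean}: server throughput {throughput:.2f}, "
              f"server blocked {100 * blocked:.0f}% of the time")
```

A slow NAS shows up as lower server throughput and high server blocked time, even though the server itself is no slower. From the server's side this looks like a server problem, which is the misattribution the slide warns about.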


The Cloud Performance Challenge

§  Quality of Experience (QoE) depends upon (hybrid) cloud service performance

–  Excellent QoE accelerates adoption and is a functional requirement

–  QoE crosses the boundaries of internet, network, system, and application performance and resilience

§  Competitive pressure will require competitive performance from all vendors to keep customers

–  IaaS and PaaS paradigms allow customers to move (e.g., for price or QoE)

–  For example, Amazon EC2 and IBM Compute Cloud can run the same software

–  QoS and SLAs are an important differentiator

§  Performance of the cloud will evolve toward near real-time business

–  Communication needs are near real-time for correctness

–  Complex event processing needs to be done quickly to be useful

“Great engineering comes from creating predictable results at predictable costs… If you’re not measuring, you’re not engineering.” – Rico Mariani, Chief Architect of Visual Studio, Microsoft Corp.

§  Cloud computing is a new paradigm that will bring new performance challenges

–  It incorporates prior component performance challenges too

–  Hybrid clouds expand this further (e.g., network hops / latency)

–  Customer expectations will require education
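Following the “if you’re not measuring” point, SLAs are usually stated against latency percentiles rather than averages. The following is a minimal sketch that uses the nearest-rank percentile definition, which is one of several in use. The SLA thresholds and sample data are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_sla(latencies_ms: Sequence[float], pct: float, limit_ms: float) -> bool:
    """True if the pct-th percentile latency is within the (hypothetical) SLA limit."""
    return percentile(latencies_ms, pct) <= limit_ms


if __name__ == "__main__":
    # Hypothetical response times (ms): mostly fast, with a slow tail.
    latencies = [40.0] * 90 + [120.0] * 8 + [900.0] * 2
    print("p50:", percentile(latencies, 50), "ms")
    print("p95:", percentile(latencies, 95), "ms")
    print("SLA 'p95 <= 200 ms' met:", meets_sla(latencies, 95, 200.0))
    print("SLA 'p99 <= 200 ms' met:", meets_sla(latencies, 99, 200.0))
```

The same data passes a p95 target but fails a p99 target. This is why the percentile written into an SLA matters as much as the threshold.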


Open Question: Comparing Cloud Performance

§  Today, it can’t be done!

§  There are no industry-defined benchmarks, because workload classes vary greatly and have dynamic lifetimes

§  A benchmark would also need to include cost and availability as key factors

§  Perhaps a benchmark framework is needed into which workloads can be plugged?

§  Perhaps a meta-benchmark analysis is needed to provide a score?
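One possible shape for such a meta-benchmark score folds performance, cost, and availability into a single number. It does this by taking the geometric mean of per-metric ratios against a reference cloud. The following is a minimal sketch; the metric names and values are hypothetical, and this is not an established cloud benchmark.

```python
import math
from typing import Dict, Set


def composite_score(measured: Dict[str, float], reference: Dict[str, float],
                    lower_is_better: Set[str]) -> float:
    """Geometric mean of per-metric ratios against a reference cloud.

    Each ratio is oriented so that a value above 1 means "better than the
    reference"; the result is therefore 1.0 for a cloud identical to it."""
    if not reference:
        raise ValueError("reference must contain at least one metric")
    log_sum = 0.0
    for metric, ref in reference.items():
        value = measured[metric]
        if value <= 0 or ref <= 0:
            raise ValueError(f"{metric}: values must be positive")
        ratio = ref / value if metric in lower_is_better else value / ref
        log_sum += math.log(ratio)
    return math.exp(log_sum / len(reference))


if __name__ == "__main__":
    # Hypothetical metrics for a reference cloud and a candidate.
    reference = {"throughput_rps": 1000.0, "cost_per_hour": 2.0, "downtime_min_per_month": 40.0}
    candidate = {"throughput_rps": 1200.0, "cost_per_hour": 2.5, "downtime_min_per_month": 20.0}
    lower = {"cost_per_hour", "downtime_min_per_month"}
    print(f"composite score: {composite_score(candidate, reference, lower):.3f}")
```

A geometric mean is used because the ranking it produces does not depend on which cloud is chosen as the reference. SPEC CPU combines its per-benchmark ratios the same way. Weighting the metrics differently for different workload classes would be a natural extension.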


Open Question: Central storage vs. Local Disks vs Combination vs New …

Central shared storage (SAN or NFS)
+  Provisioning is fast
+  Live migration is supported
−  VM disk I/O is slow due to disk and network contention

[Diagram: a host machine (Hypervisor, Host OS, Guest OS) using central shared storage that holds both the image repository (Image 1, Image 2) and the virtual disk store (root disk, data disk); images are copied within the shared storage]

Local disks
+  VM disk I/O is fast
−  Provisioning is slow because images are copied over the network
−  Live migration is not supported

[Diagram: a host machine (Hypervisor, Host OS, Guest OS) with local root and data disks; images (Image 1, Image 2) are copied from the image repository on a repository server]

How can these approaches be combined for the best performance?
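The trade-off above can be put into rough numbers. With local disks, provisioning time grows with the number of images pushed over the repository link. With central storage, each busy VM's share of storage throughput shrinks as more VMs use it. The following is a deliberately crude sketch that assumes even sharing and ignores copy-on-write cloning, caching, and striping. All parameters and function names are hypothetical.

```python
def local_disk_provisioning_s(n_vms: int, image_gb: float, repo_link_gbit_s: float) -> float:
    """Seconds to provision n VMs when each image is copied over one shared
    repository link (local-disk model); copies are assumed to share the link evenly."""
    return n_vms * image_gb * 8 / repo_link_gbit_s


def central_io_per_vm_mb_s(n_busy_vms: int, storage_mb_s: float) -> float:
    """Per-VM disk throughput when n busy VMs share one central store evenly."""
    if n_busy_vms <= 0:
        raise ValueError("need at least one busy VM")
    return storage_mb_s / n_busy_vms


if __name__ == "__main__":
    # Hypothetical: 10 GB images, 1 Gbit/s repository link, 800 MB/s central store.
    for n in (1, 10, 50):
        print(f"{n:3d} VMs: local-disk provisioning "
              f"{local_disk_provisioning_s(n, 10.0, 1.0):6.0f} s; "
              f"central storage {central_io_per_vm_mb_s(n, 800.0):6.1f} MB/s per VM")
```

A combined design would try to keep provisioning on the central store and move steady-state disk I/O onto local disks. The two functions show which cost each half of that split is trying to avoid.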


Open Question: Optimal Approaches for Bin Packing and Moving VMs

“When deploying services in a cloud, a balance must be found between performance and capacity of the service, and the memory available on nodes. This is further complicated if the number of replicas of an application is limited, for instance by the available number of licenses. The analysis of interference between services must scale to large numbers of host nodes, applications, replicas of applications, and classes of users. This paper combines a multi-dimensional packing heuristic and network flow optimization to satisfy simultaneous constraints on throughputs, processor utilizations, memory availability and license availability, at a minimum cost and with a minimum of host processors.”

Jim Zhanwen Li, John Chinneck, Murray Woodside, and Marin Litoiu. 2009. Deployment of Services in a Cloud Subject to Memory and License Constraints. In Proceedings of the 2009 IEEE International Conference on Cloud Computing (CLOUD '09).
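As a point of comparison for the packing side of this problem, the following is a plain two-dimensional first-fit-decreasing heuristic over CPU cores and memory. It is a generic textbook heuristic, not the combined packing and network-flow method of the cited paper. It ignores throughput and license constraints, and the VM and host sizes are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, int]  # (cpu_cores, memory_gb)


def first_fit_decreasing(vms: Sequence[Vector], capacity: Vector) -> List[List[int]]:
    """Place VMs on identical hosts with a 2-D first-fit-decreasing heuristic.

    VMs are sorted by their largest normalized dimension, then each VM goes
    to the first open host with room in every dimension; a new host is
    opened when none fits. Returns a list of hosts, each a list of VM indices."""
    cap_cpu, cap_mem = capacity
    for cpu, mem in vms:
        if cpu > cap_cpu or mem > cap_mem:
            raise ValueError(f"VM ({cpu}, {mem}) does not fit on an empty host")
    order = sorted(range(len(vms)),
                   key=lambda i: max(vms[i][0] / cap_cpu, vms[i][1] / cap_mem),
                   reverse=True)
    hosts: List[List[int]] = []
    used: List[List[int]] = []  # [cpu, mem] already allocated on each host
    for i in order:
        cpu, mem = vms[i]
        for h, (u_cpu, u_mem) in enumerate(used):
            if u_cpu + cpu <= cap_cpu and u_mem + mem <= cap_mem:
                hosts[h].append(i)
                used[h][0] += cpu
                used[h][1] += mem
                break
        else:
            hosts.append([i])
            used.append([cpu, mem])
    return hosts


if __name__ == "__main__":
    vms = [(4, 8), (4, 8), (2, 24), (6, 4), (2, 16)]
    print(first_fit_decreasing(vms, capacity=(8, 32)))
```

Sorting by the largest normalized dimension lets memory-heavy and CPU-heavy VMs be placed first and then complemented on the same host. Moving VMs later (rebalancing) would need a separate step that this sketch does not attempt.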


Open Question: Performance Fault Diagnosis and Analysis

§  Intermittent backpressure causes lower-level hw and/or sw to slow down

§  The problem may appear to move if it is caused by a VM and the VM moves

§  The problem may appear to move if it is caused by a VM and the problem VM dies

§  The problem may appear to move if it is caused by a VM and the problem VM starts up

§  The problem may appear to move if it is caused by hw and the VM moves

§  Several VMs may show the same symptom separated in space and time

§  What data to monitor, and how much, with 10⁴ to 10⁵ elements?

§  Expert systems / analytics are needed to help identify problems

§  Extend the analysis to predict hw failures before they occur
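As a first analytic screen over 10⁴ to 10⁵ monitored elements, the following is a minimal sketch that flags VMs whose metric is far from the fleet median. It uses the modified z-score of Iglewicz and Hoaglin, which relies on the median and median absolute deviation (MAD) and is therefore robust to the outliers it is looking for. The threshold 3.5 is that method's conventional default. The VM names and latencies are hypothetical. The check works on a single snapshot, so it cannot by itself follow a problem that moves with a VM.

```python
import statistics
from typing import Dict, List


def outliers(metric_by_vm: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """VMs whose metric is an outlier by the modified z-score
    0.6745 * (x - median) / MAD (Iglewicz and Hoaglin)."""
    if not metric_by_vm:
        return []
    values = list(metric_by_vm.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: most VMs report the same value, so flag any deviation.
        return [vm for vm, v in metric_by_vm.items() if v != med]
    return [vm for vm, v in metric_by_vm.items()
            if abs(0.6745 * (v - med) / mad) > threshold]


if __name__ == "__main__":
    # Hypothetical p95 disk latencies (ms) from one monitoring snapshot.
    snapshot = {f"vm{i:03d}": 10.0 + (i % 5) for i in range(1000)}
    snapshot["vm042"] = 85.0
    snapshot["vm777"] = 60.0
    print(outliers(snapshot))
```

Running the same screen on successive snapshots, and joining the results with VM placement history, is one way to tell a VM-caused problem that moves with the VM from a host-caused one that stays put.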


Open Question: The CAP Theorem and Performance

§  Three properties of shared-data, distributed systems
    1.  Consistency: once an update is made, all observers see it
    2.  Availability: every request to a non-failing node receives a prompt response
    3.  Partition tolerance: the system tolerates network partitions

§  CAP theorem
    –  Only two of the three properties can be achieved at the same time
    –  Network partitions are a given in distributed systems
    –  So one has to choose between consistency and availability

§  How will distributed architectures change to optimize for each pair of properties?
    –  Eventual consistency, non-relational databases?

Gilbert, Seth, and Nancy Lynch. “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.” ACM SIGACT News, v. 33, issue 2, 2002, pp. 51-59.

For a general description see: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
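Tunable quorums in Dynamo-style replicated stores are one concrete place where the consistency/availability choice shows up in practice. This technique is brought in here for illustration and is not covered on the slide. With N replicas, a write waits for W acknowledgements and a read consults R replicas. The following is a minimal sketch, assuming strict quorums over a fixed replica set, that brute-forces the standard condition: R + W > N guarantees that every read quorum overlaps every write quorum.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every read quorum of size r share at least one
    replica with every write quorum of size w, among n replicas?"""
    replicas = range(n)
    write_quorums = [set(q) for q in combinations(replicas, w)]
    return all(set(rq) & wq
               for rq in combinations(replicas, r)
               for wq in write_quorums)


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (1, 3), (2, 2), (3, 1)]:
        print(f"N={n}, R={r}, W={w}: overlap guaranteed = "
              f"{quorums_always_overlap(n, r, w)}, R + W > N = {r + w > n}")
```

Choosing R + W ≤ N lets reads and writes complete with fewer reachable replicas, which helps availability and latency during a partition. The cost is that a read may miss the latest write, which is the eventual-consistency end of the trade-off.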


Selected Performance Activities – IBM Cloud

[Diagram: cloud reference architecture showing the Cloud Service Consumer, Cloud Service Provider, and Cloud Service Developer roles, with Consumer In-house IT, Cloud Service Integration Tools, Cloud Services, the Common Cloud Management Platform (OSS – Operational Support Services; BSS – Business Support Services), Virtualized Infrastructure – Server, Storage, Network, Facilities, Security & Resiliency, and Service Development Tools]

•  e.g., Service Activation: process optimization
•  e.g., Provisioning: image copy, instance creation, partitioning
•  e.g., Run Time Performance: integration of storage, hypervisor, and network components; dedicated nodes

Disclaimer

© Copyright IBM Corporation 2010. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.