Managing Performance in the Cloud
TheDevMgr
BACKGROUND
Cloud History
In the nineties…
• Desktop internet computing
• Shift from local to centralised computing
• Software was cheap and hardware was expensive.
The carefree noughty days
• Shift from desktop to mobile
• The cloud is born
• Bezos and his book company start to shape the future.
The twenty-tens
• Shift from centralised to distributed computing
• Commoditisation of computing (PAYG)
• Anything-as-a-Service (XaaS).
THE CLOUD
What is it?
Service Models
• XaaS: Anything
• SaaS: Software
• PaaS: Platform
• IaaS: Infrastructure
Infrastructure (IaaS)
• Outsource hardware to support operations
  – Storage, servers, networking components
• Service provider owns and hosts equipment
• Service provider responsible for management & maintenance.
Platform (PaaS)
• Paradigm for delivering operating systems and associated services over the Internet
• No downloads or installation
• Google App Engine, Microsoft Windows Azure, Heroku & Force.com.
Software (SaaS)
• Software distribution model in which applications are hosted by a vendor or service provider
• Made available to customers over the Internet
• Salesforce.com and many, many more.
Deployment Models
Private, Public, Hybrid
Private Cloud
• “Virtualised” infrastructure operated for a single organisation (single tenant)
• Hosted internally or externally
• Managed internally or by a third party
• Can be secured to meet compliance
• More expensive, less flexible.
Public Cloud
• Service provider makes resources available to the general public over the Internet
  – Compute, Storage, O/S, Applications
• May be free or pay-per-usage model
• Fast deployment, short commitments
• Shared services, less control.
Hybrid
• Core platform on private cloud
• Burstable capability into public cloud
• Brings best of both private and public
• Brings problems of both private and public.
THE COST OF POOR CLOUD PERFORMANCE
Financial and customer satisfaction
Cost
• Compuware survey suggests large business losses can exceed £500k due to poor cloud performance
• 57% of European IT Directors believe that they can’t manage cloud application performance
• You still have to deliver 2-second response times.
Performance
• 50% of ops teams have suffered more than one P-1 performance issue in the cloud
• 33% experience a P-1 issue every month
• 60% of incidents took more than 2 hours to resolve
• Good luck webops (cloudops).
Source: AppDynamics
COMMON PERFORMANCE CHALLENGES
Traditional and new problems
Performance Challenges
• Traditional
  • Connectivity
    – Bandwidth / Latency
  • Bottlenecks
    – CPU, IO, Database
• Contemporary
  • Bigger scale
    – More stuff
  • Shared infrastructure
    – Not your stuff (entirely).
Traditional
• Connectivity
  • Latency, jitter & packet loss
  • Bandwidth limitations
  • Users demand fast access to data
• Bottlenecks
  • Will still occur!
  • Virtualised hardware
    – Host contention
    – Storage.
Contemporary
• Bigger Scale
  • 10s, 100s, 1,000s, 10,000s of servers – VM sprawl
  • Dynamically allocated physical resource
  • Over-provisioning
  • Hidden billing costs
• Shared Resources
  • Room for one more?
  • Deal with other people’s problems
    – DDoS, general stupidity?
    – Mi casa es tu casa.
• Elasticity
  – Planned (scheduled/controlled scaling)
  – Unplanned (auto-scaling)
• Global distribution
  – Data Centres
  – Data
• Less Control.
Paradigm Shift
Data location still matters!
CLOUD EXPERIENCES
Stories from the trenches
INFRASTRUCTURE-AS-A-SERVICE
IaaS
Application
• Adactus Food Ordering Platform
• Transacts > 7 million orders & > $100M USD a year
  – 30% of daily orders taken in 1 hour
• Adopted as eCommerce platform for Pizza Hut and KFC globally.
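To put the peak-hour figure in perspective, here is a rough back-of-the-envelope sketch. It uses only the two numbers quoted above and assumes, purely for illustration, that orders are spread evenly across 365 days; real traffic will be lumpier, so treat the results as lower bounds.

```python
# Figures quoted on the slide; both are "more than" values, so results are lower bounds.
ORDERS_PER_YEAR = 7_000_000
PEAK_HOUR_SHARE = 0.30  # share of a day's orders taken in the busiest hour

# Assumption (not from the slides): orders are spread evenly across 365 days.
orders_per_day = ORDERS_PER_YEAR / 365

# Orders arriving during the busiest hour, and the rate that implies.
peak_hour_orders = orders_per_day * PEAK_HOUR_SHARE
peak_orders_per_second = peak_hour_orders / 3600

# Rate if the same day's orders arrived evenly over 24 hours.
flat_orders_per_second = orders_per_day / 86_400

# How much harder the platform works in the peak hour than on average.
peak_to_average = peak_orders_per_second / flat_orders_per_second

print(f"peak hour: {peak_orders_per_second:.2f} orders/s, "
      f"{peak_to_average:.1f}x the daily average")
```

The ratio matters more than the absolute rate: capacity sized for the daily average would be several times too small for the peak hour, which is the case for burstable, on-demand capacity.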
Platform
• Private
  • Global instances all deployed on private clouds
  • VMware ESX Hosts
    – V-Web’s
  • Dedicated / Non-Virtualised SQL
• Public
  • Rackspace public cloud
  • On-Demand
    – Load Balancers
    – Web Servers
    – SQL Servers
  • High-scale, high-volume.
Challenges
• Big Scale
  – A lot more to manage
• Virtual Platform
  – Contention
• End-to-End Application Performance Management.
Solutions
• Cloud-centric APM
  – AppDynamics
  – CloudKick (now Rackspace APM)
  – Rightscale
• Automated Operations
  – Chef, Puppet (SysOps)
  – CloudFoundry, OpenShift (App LifeCycle)
  – Heroku, AppFog (NoOps?)
PLATFORM-AS-A-SERVICE
PaaS
Application
• Adactus Pulse
• Claims management solution for the insurance industry delivered as SaaS
• Processed over a million claims
• Deployed for ISS and Aviva.
Platform
• Deployed into Windows Azure Platform
  – Web Roles
  – Worker Roles
  – SQL Azure
  – SQL Azure Reporting Services
• Upgrade of traditional ASP.NET application
• Continuous Deployment Process.
Challenges
• Disproving the “shared resource” impact
  – Is it the infrastructure?
• Database performance is a black-box
  – Limitations and more limitations
• Getting performance data is hard work
  – Not easy to access, dispersed everywhere
• Baseline performance is not linear.
Baseline Performance
Large variances in baseline performance.
Windows Azure is more consistent.
Solutions
• Instrumentation is king
  – Aspect Orientation (AOP)
    • Gibraltar
  – Does your provider offer a Performance API?
• Dedicated Cloud (Azure) Tools
  • Dynatrace
  • Cerebrata
• You must automate
  – Deployment (and everything else!)
  – Consistency is key.
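As an illustration of aspect-style instrumentation, the sketch below wraps a function in a timing decorator so that call durations are captured without touching the business logic. It is a minimal Python sketch, not the tooling used on the project: `TIMINGS`, `timed`, `load_claim` and `summary` are all hypothetical names, and a real system would ship the samples to an APM or logging pipeline rather than hold them in memory.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store; stands in for an APM or log pipeline.
TIMINGS = defaultdict(list)


def timed(name=None):
    """Record the wall-clock duration of every call to the wrapped function."""
    def decorator(func):
        label = name or func.__qualname__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so failed calls are timed too.
                TIMINGS[label].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("claims.load")
def load_claim(claim_id):
    # Hypothetical stand-in for a database or service call.
    time.sleep(0.01)
    return {"id": claim_id}


def summary(label):
    """Summarise the samples recorded under one label."""
    samples = TIMINGS[label]
    return {
        "count": len(samples),
        "mean_s": sum(samples) / len(samples),
        "max_s": max(samples),
    }
```

Calling `load_claim(1)` a few times and then `summary("claims.load")` gives a count, mean and maximum per operation. Because the timing lives in the decorator, it can be applied consistently across the data-access layer, which helps when baseline performance varies between runs.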
DATABASE-AS-A-SERVICE
DaaS
Overview
• Service provider takes responsibility for installing and maintaining the database.
• Amazon (MySQL)
• Microsoft SQL Azure
• Google App Engine Datastore
• CouchDB, MongoDB.
Challenges
• Most service providers are having performance issues (even Google!)
• Database is a (performance) black-box
  – You will find limitations
• Need to handle transient connections
  – Your database will be there, but not always.
Solutions
• Do as much tuning outside of the cloud as possible
• Instrument your data access
• DB sharding becomes viable and easy
• Build connection resiliency into your data-framework.
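Connection resiliency usually comes down to retrying operations that fail for transient reasons. The sketch below shows one common approach, exponential backoff with jitter, in Python. It is an illustration under stated assumptions rather than the framework described in the talk: `TransientDatabaseError` and `with_retries` are hypothetical names, and a real data layer would map its driver's timeout, throttling and dropped-connection errors onto the retryable type.

```python
import random
import time


class TransientDatabaseError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, dropped connections)."""


def with_retries(operation, attempts=4, base_delay=0.2, max_delay=5.0, sleep=time.sleep):
    """Call operation(); on a transient error, wait and retry with backoff plus jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            # Delay ceiling doubles after each failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries out so clients do not reconnect in lockstep.
            sleep(random.uniform(0, delay))
```

A caller would wrap each query, for example `with_retries(lambda: run_query(sql))`, where `run_query` is whatever function executes the statement. Only errors known to be transient are retried; anything else propagates immediately so that genuine bugs are not hidden behind retries.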
Caution
• On-premise databases
  – Are you sure?
• You might be about to create your own data storm?
  – Too much on-premise data
  – Too little bandwidth.
SOFTWARE-AS-A-SERVICE
SaaS
Overview
• Adactus Pulse
  – Delivered on a SaaS Model
• We consume SaaS (heavily)
  – CRM, Performance, Google Apps, Wiki, Bug Tracking, Testing, Accounting, Planning & Forecasting, Document Management, CMS, Exception Handling, Business Intelligence, Deployment, APM, Collaboration, HRM, ERP and more.
Challenges
• Consumer
  • Good news
    – Performance is out of your control!
  • Bad news
    – Performance is out of your control!
• Provider
  • Expectations are high!
    – Response times
  • Performance is still king!
    – Competitors
    – Repeat use.
Real User Monitoring
• Consumer
  • It’s your new best friend
  • Get to know your SLA
    – It’s your new best friend
  • Simple rules
    – Be the first to know
    – Get your money back
• Provider
  • It’s your new best friend
  • You will live & die by your SLAs
  • Simple rules
    – Be the first to know
    – Tell your customers.
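"Be the first to know" can be sketched as a check of real-user response times against a target. The Python below is a hypothetical illustration: it takes a list of response times in seconds, as a RUM tool might export them, and compares a percentile with the 2-second target mentioned earlier. The function names, the choice of the 95th percentile and the nearest-rank method are assumptions made for this example, not something prescribed by any particular SLA.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def check_response_sla(samples_s, target_s=2.0, pct=95):
    """Compare the pct-th percentile of real-user response times with the target."""
    observed = percentile(samples_s, pct)
    return {
        "percentile": pct,
        "observed_s": observed,
        "target_s": target_s,
        "breached": observed > target_s,
    }
```

With 18 fast samples and two slow ones, for example `check_response_sla([0.4] * 18 + [2.5, 3.0])`, the 95th percentile lands on the 2.5-second sample and the check reports a breach. A percentile is used rather than a mean because a handful of very slow requests can hide behind a healthy average.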
Monitoring
XaaS
• SaaS: RUM
• PaaS: Instrumentation
• IaaS: APM
BEYOND PERFORMANCE
Stories from the trenches
Service-Level-Agreements
• Critical element for both provider and consumer
• Don’t waste time on detailed numerical service level agreements
• SLAs need to be based on end-user experience.
Service-Level-Agreements
1. Establish system availability
2. Establish system response time
3. Establish error resolution time
4. Establish a fail over window for disaster recovery
5. Ensure that you can get your data back.
Service-Level-Agreements
• IaaS
  – The O/S is your responsibility
    • Managed Cloud Platforms are available
• PaaS
  – SLAs stop at the O/S
    • Your application still remains your responsibility
• SaaS
  – Know your SLA inside out. It’s your responsibility.
Disaster Recovery
• It’s hard in the cloud
• DR strategies are still emerging
• Bandwidth & network capacity limits
• Security is still a concern.
Disaster Recovery
• There isn’t a single blueprint
• Identify critical resources and recovery methods
• Architect for redundancy
• Back up to/from and restore to/from the cloud
• Most cloud SLAs promise > 99.5% availability
  – 99.5% still allows about 3 hours, 39 minutes downtime per month.
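The downtime that an availability percentage permits is simple arithmetic, sketched below in Python. The only assumption is the length of the period: an average month is taken as 365 × 24 / 12 = 730 hours, and the function names are invented for this example.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # average month: 730 hours


def allowed_downtime_minutes(availability_pct, period_hours=HOURS_PER_MONTH):
    """Minutes of downtime permitted in the period at the given availability."""
    return (100 - availability_pct) / 100 * period_hours * 60


def as_hours_minutes(minutes):
    """Format a duration in minutes as hours and minutes, rounded to the minute."""
    total = round(minutes)
    return f"{total // 60} h {total % 60} min"


print(as_hours_minutes(allowed_downtime_minutes(99.5)))  # prints "3 h 39 min"
```

For 99.5% availability over an average month this gives 0.005 × 730 hours = 3.65 hours, or about 3 hours 39 minutes. Passing `period_hours=HOURS_PER_YEAR` gives the yearly allowance instead, which is useful when the SLA is measured annually.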
THANK YOU. QUESTIONS?
That’s all folks!