DESCRIPTION
Building cross-region and cross-cloud high availability into your app: a real-life use case by GigaSpaces. Nati Shalom, Founder & CTO, GigaSpaces. Achieving high levels of availability and disaster recovery in a cloud environment requires the implementation of patterns and practices that introduce redundancy through multi-zone, multi-region, and multi-cloud deployments. As we move towards implementing higher availability, we cannot escape the direct increase in the accidental complexity of the deployment architecture resulting from lack of cloud portability and deployment lifecycle automation. We present how high availability and disaster recovery were achieved in reality by using the Cloudify open source framework on top of AWS. This approach applies not just to AWS but also to other public clouds and to private cloud environments such as Eucalyptus. The resulting reference architecture provides portable PostgreSQL replication and disaster recovery as well as application tier scalability across zones, regions, and public/private clouds through a unified deployment workflow.
Protect your app from Outages
Nati Shalom, CTO, GigaSpaces
@natishalom
May 2013
AGENDA
• AWS and outages
• Outage impact
• Disaster Recovery – it’s all about redundancy!
• Cloudify as a solution for redundancy
• Demo with Cloudify on EC2
® Copyright 2013 GigaSpaces Ltd. All Rights Reserved
AWS USAGE
Managing Big Data on the Cloud
• AWS – around 0.5M servers
• Facebook – less than 0.1M servers
• Google – around 1M servers
THE OUTAGE PROBLEM
OUTAGE – APRIL 21, 2011
OUTAGE - JUNE 29, 2012
OUTAGE - OCTOBER 22, 2012
OUTAGE - CHRISTMAS EVE 2012
NOT ONLY AMAZON
28 December 2012 - some owners of Microsoft's XBox 360 gaming console were unable to access some of their cloud-based storage files.
26 July 2012 - Service for Microsoft’s Windows Azure Europe region went down for more than two hours
29 February 2012 - The ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland, Chicago, and San Antonio.
THAT’S WHAT YOU EXPECT?
• 99% - 3.65 days downtime per year
• 99.9% - 8.76 hours downtime per year
• 99.99% - 53 minutes downtime per year
• 99.999% - 5.26 minutes downtime per year
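The downtime figures on this slide follow from simple arithmetic on a 365-day year. A minimal sketch of that calculation is shown here; the function name and the choice to ignore leap years are assumptions made for illustration, not part of the original deck.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def yearly_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    hours = yearly_downtime_hours(target)
    print(f"{target}% -> {hours:.3f} hours = {hours * 60:.2f} minutes per year")
```

Each extra "nine" divides the downtime budget by ten, which is why the step from 99.9% to 99.99% takes the budget from hours to minutes.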
OUTAGE IMPACT – DESIGN FOR FAILURES
Outage could cost…
• $89K per hour for Amadeus
• $225K per hour for PayPal!
DISASTER RECOVERY
MULTI CLOUD
PREPARE FOR DISASTER RECOVERY
• Dedicated expert for DR architecture
• Define target recovery time & point
• Assume every tier can fail
• Use monitoring and alerts
• Document your operational processes
CHAOS MONKEY
It’s all about REDUNDANCY!
CLONE YOUR ENVIRONMENT
CLONE YOUR DATA
• RDS Read Replica
• More to come…
Automating your DR Processes
Leverage Existing Automation Frameworks
• Configuration Centric
• App Centric (PaaS)
CLONE YOUR ENV - HOW DOES IT WORK?
BUILT IN SUPPORT FOR MANAGING DATA IN THE CLOUD
• Real Time: Storm, Elastic Caching (XAP)
• Relational DB Clusters: MySQL, PostgreSQL
• NoSQL Clusters: MongoDB, Cassandra, Couchbase, ElasticSearch
• Hadoop: Hadoop (Hive, Pig, ..), ZooKeeper
Real Life Scenario
VERIFI (CURRENT) DEPLOYMENT ARCHITECTURE
[Diagram: a single availability region (US-West: Oregon). Internet traffic enters through mod_cluster; mod_cluster, JBoss, PostgreSQL and Cassandra run on EC2 instances, with two data volumes attached. The deployment uses 4 recipes.]
TARGET ARCHITECTURE
[Diagram: two availability regions, US-West (Oregon) and US-East (Virginia), each running mod_cluster, JBoss, Postgres and Cassandra on EC2 instances with data volumes. Internet traffic enters the US-West region. The Postgres master in US-West replicates to the Postgres slave in US-East.]
Bootstrap two EC2 clouds in different regions and install the “verifi” application on each. The second cloud uses a slightly modified (extended) Postgres recipe that acts as a slave, and it runs no app servers. When the primary zone fails, the second cloud spins up instances of the app servers and turns its data instance into the master, then bootstraps another “slave” cloud in another zone.
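One way to picture "the same application plus overrides" is a base deployment description with a small set of changes merged on top for the standby cloud. The sketch below uses plain Python dictionaries; the keys, instance counts, and the `with_overrides` helper are hypothetical and are not Cloudify recipe syntax. Only the region pairing (Oregon as primary, Virginia as standby) and the slave role with no app servers come from the slides.

```python
from copy import deepcopy

# Hypothetical description of the primary deployment (not Cloudify syntax).
PRIMARY = {
    "region": "us-west-2",  # US-West (Oregon)
    "services": {
        "mod_cluster": {"instances": 1},
        "jboss": {"instances": 2},  # instance counts are assumptions
        "postgres": {"instances": 1, "role": "master"},
        "cassandra": {"instances": 1},
    },
}


def with_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides merged in recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Standby cloud: same application, Postgres as slave, no running app servers.
STANDBY = with_overrides(PRIMARY, {
    "region": "us-east-1",  # US-East (Virginia)
    "services": {
        "jboss": {"instances": 0},
        "postgres": {"role": "slave", "replicate_from": "us-west-2"},
    },
})

print(STANDBY["services"]["postgres"])
```

Keeping the standby as a diff against the primary means the two descriptions cannot drift apart silently; anything not overridden is inherited unchanged.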
FAILOVER SCENARIO
• Cloud #1: Region (US-West Oregon), app servers + PostgreSQL
• Cloud #2: Region (US-East Virginia), PostgreSQL, liveness poll against cloud #1
• Cloud #3: Region (US-West California), PostgreSQL, liveness poll against cloud #2

0. Upon initial deployment, the primary deployment of the application is bootstrapped onto cloud #1. Another, slightly modified application recipe is bootstrapped as cloud #2, which polls cloud #1 for failure and acts as a PostgreSQL db slave.
1. Region failure occurs.
2. Turn Postgres slave into master, start app server instances*
3. Bootstrap another cloud in a different region using the same application recipe used to bootstrap cloud #2 above*
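The failover flow on this slide (poll cloud #1, then promote the Postgres slave, start the app servers, and bootstrap a replacement standby) can be sketched as a small control loop. This is an illustrative sketch only: the function and parameter names are hypothetical, the actions are passed in as callables rather than calling any real Cloudify or AWS API, and the "three consecutive missed polls" threshold is an assumption.

```python
import time
from typing import Callable


def watch_and_fail_over(
    primary_is_alive: Callable[[], bool],
    promote_to_master: Callable[[], None],
    start_app_servers: Callable[[], None],
    bootstrap_new_standby: Callable[[], None],
    max_missed_polls: int = 3,           # assumed threshold
    poll_interval_seconds: float = 10.0,  # assumed interval
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll the primary cloud; after enough consecutive misses, take over."""
    missed = 0
    while missed < max_missed_polls:
        if primary_is_alive():
            missed = 0  # a successful poll resets the counter
        else:
            missed += 1
        if missed < max_missed_polls:
            sleep(poll_interval_seconds)
    # Region failure assumed: run the takeover steps in order.
    promote_to_master()
    start_app_servers()
    bootstrap_new_standby()


# Dry run with fake probes and no real waiting.
answers = iter([True, False, False, False])
log = []
watch_and_fail_over(
    primary_is_alive=lambda: next(answers),
    promote_to_master=lambda: log.append("promote"),
    start_app_servers=lambda: log.append("start_apps"),
    bootstrap_new_standby=lambda: log.append("bootstrap_standby"),
    sleep=lambda seconds: None,
)
print(log)
```

Requiring several consecutive failed polls before acting is a guard against promoting the slave on a transient network blip, which would otherwise leave two masters.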
NEXT STEPS
Availability increases from zones to regions to clouds:
• Across AWS zones: 1 application + overrides, 1 cloud driver
• Across AWS regions: 1 application + overrides, 1 cloud driver
• Across clouds (AWS, Rackspace, Azure…etc): 1 application + overrides, several cloud drivers
Supported by Verifi phase #1
EVOLUTION PATH
[Chart: availability plotted against complexity. Both rise along the path multi instance, multi zone, multi region, multi cloud/provider.]
SUMMARY
• AWS and outages
• Outage impact
• Disaster Recovery – it’s all about redundancy!
• Cloning your environment – app stack
• Cloning your DB – Replication
• Cloudify as a solution for Redundancy
• Use recipes to work on any cloud
• Fast and customized data replication
• Demo with Cloudify on EC2
Thank You!
@natishalom
QUESTIONS & ANSWERS