Architecting for High Availability
Sajee Mathew, Solutions Architect
What is High Availability?
• Availability: Percentage of time an application operates during its work cycle
• Loss of availability is known as an outage or downtime
– App is offline, unreachable, or partially available
– App is slow to use
– Planned and unplanned
• Goal
– No downtime
– Always available
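The definition above can be turned into simple arithmetic. The sketch below is illustrative only: the function names and the one-year period of 365 days are assumptions, not part of the original slides.

```python
def availability_percent(uptime_hours: float, total_hours: float) -> float:
    """Percentage of the work cycle during which the application operated."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * uptime_hours / total_hours


def allowed_downtime_hours(target_percent: float, total_hours: float = 365 * 24) -> float:
    """Downtime budget for a target availability (default period: one 365-day year)."""
    return total_hours * (1.0 - target_percent / 100.0)


# A 99.9% target leaves roughly 8.76 hours of downtime per year.
print(round(allowed_downtime_hours(99.9), 2))
```

Read this way, "no downtime" is the limit case: each extra nine in the target cuts the yearly downtime budget by a factor of ten.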
Availability is related to
• Scalability
– Ability of an application to accommodate growth without changing design
– If app cannot scale, availability may be impacted
– Scalability doesn’t guarantee availability
• Fault Tolerance
– Built-in redundancy so apps can continue functioning when components fail
– Fault tolerance is crucial to HA
• Disaster Recovery
– The process, policies, and procedures related to restoring service after a catastrophic event
• AWS democratizes High Availability
– Multiple servers, isolated redundant data centers, regions across the globe, FT services, etc.
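Why redundancy matters for availability can be shown with a short calculation. This is a sketch under a strong assumption that is not stated in the slides: components fail independently of each other. The function names are hypothetical.

```python
from math import prod


def serial_availability(components: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(components)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of a group that works as long as at least one copy is up."""
    return 1.0 - (1.0 - single) ** copies


# Two 99% components in series are less available than either one alone,
# while two redundant 99% copies are more available than either one alone.
print(serial_availability([0.99, 0.99]))
print(redundant_availability(0.99, 2))
```

Under the independence assumption, chaining components lowers availability and duplicating them raises it, which is the intuition behind spreading servers across isolated data centers.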
AWS GLOBAL INFRASTRUCTURE
Regions
• US-EAST (Virginia)
• US-WEST (N. California)
• US-WEST (Oregon)
• AWS GovCloud (US)
• SOUTH AMERICA (Sao Paulo)
• EU-WEST (Ireland)
• ASIA PAC (Singapore)
• ASIA PAC (Tokyo)
• ASIA PAC (Sydney)
Availability Zones
[Map of the regions above, showing the Availability Zones in each]
AWS BUILDING BLOCKS
Inherently Highly Available and Fault Tolerant Services
• Amazon S3
• Amazon DynamoDB
• Amazon CloudFront
• Amazon Route 53
• Elastic Load Balancing
• Amazon SQS
• Amazon SNS
• Amazon SES
• Amazon SWF
• …
Highly Available with the right architecture
• Amazon EC2
• Amazon EBS
• Amazon RDS
• Amazon VPC
1. DESIGN FOR FAILURE
2. MULTIPLE AVAILABILITY ZONES
3. SCALING
4. SELF-HEALING
5. LOOSE COUPLING
LET’S BUILD A
HIGHLY AVAILABLE SYSTEM
#1 DESIGN FOR FAILURE
●○○○○
« Everything fails all the time »
Werner Vogels
CTO of Amazon
AVOID SINGLE POINTS OF FAILURE
ASSUME EVERYTHING FAILS,
AND WORK BACKWARDS
YOUR GOAL
Applications should continue to function
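One way to read "assume everything fails, and work backwards" at the code level is to never depend on a single copy of anything. The sketch below is a hypothetical illustration, not an AWS API: `call_with_failover`, `broken`, and `healthy` are invented names, and the replicas are plain Python callables standing in for redundant instances.

```python
from typing import Callable, Sequence


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each redundant replica in turn and return the first successful answer."""
    errors = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # any failure just moves us on to the next replica
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


def broken() -> str:
    raise ConnectionError("instance unreachable")


def healthy() -> str:
    return "ok"


print(call_with_failover([broken, healthy]))  # prints "ok"
```

The application keeps functioning as long as one replica answers; only the loss of every replica surfaces as an error.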
AMAZON EBS ELASTIC BLOCK STORE
AMAZON ELB ELASTIC LOAD BALANCING
#2 MULTIPLE AVAILABILITY ZONES
●●○○○
AMAZON RDS
MULTI-AZ
AMAZON ELB AND
MULTIPLE AZs
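The idea of a load balancer in front of instances in several Availability Zones can be sketched as a small simulation. This is not how Elastic Load Balancing is implemented; the `Instance` and `RoundRobinBalancer` classes, the instance names, and the zone names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Instance:
    name: str
    zone: str
    healthy: bool = True


class RoundRobinBalancer:
    """Rotates through instances and skips any that are marked unhealthy."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._ring = cycle(self._instances)

    def route(self) -> Instance:
        # Look at each instance at most once per request.
        for _ in range(len(self._instances)):
            candidate = next(self._ring)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy instances in any zone")


fleet = [Instance("web-1", "zone-a"), Instance("web-2", "zone-b")]
balancer = RoundRobinBalancer(fleet)

# Simulate the loss of an entire zone.
for inst in fleet:
    if inst.zone == "zone-a":
        inst.healthy = False

served_by = [balancer.route().name for _ in range(3)]
print(served_by)  # ['web-2', 'web-2', 'web-2']
```

With instances in only one zone, the same outage would leave nothing to route to.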
#3 SCALING
●●●○○
AUTO SCALING SCALE UP/DOWN EC2 CAPACITY
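A scaling decision can be sketched as a pure function. The proportional rule, the 50% CPU target, and the size limits below are assumptions chosen for illustration; they are not the algorithm or defaults that Auto Scaling itself uses.

```python
import math


def desired_capacity(current: int, cpu_percent: float, target_cpu: float = 50.0,
                     min_size: int = 2, max_size: int = 10) -> int:
    """Scale the fleet in proportion to load, clamped to the group's size limits."""
    wanted = math.ceil(current * cpu_percent / target_cpu)
    return max(min_size, min(max_size, wanted))


print(desired_capacity(current=4, cpu_percent=90.0))  # scale up: 8
print(desired_capacity(current=4, cpu_percent=10.0))  # scale down, held at min_size: 2
```

Keeping `min_size` at two or more means scaling down never removes the redundancy that the earlier principles rely on.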
#4 SELF-HEALING
●●●●○
HEALTH CHECKS + AUTO SCALING = SELF-HEALING
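The combination of health checks and a fixed desired capacity can be sketched as a reconciliation step. Everything here is a simulation under assumptions: instances are plain dictionaries, and `launch` and `heal` are hypothetical helpers rather than AWS calls.

```python
from itertools import count

_ids = count(1)


def launch() -> dict:
    """Start a new (simulated) instance with a fresh id."""
    return {"id": f"i-{next(_ids)}", "healthy": True}


def heal(fleet: list, desired: int) -> list:
    """Drop instances that failed their health check and launch replacements."""
    survivors = [inst for inst in fleet if inst["healthy"]]
    while len(survivors) < desired:
        survivors.append(launch())
    return survivors


fleet = [launch() for _ in range(3)]
fleet[1]["healthy"] = False  # the health check on i-2 fails
fleet = heal(fleet, desired=3)
print([inst["id"] for inst in fleet])  # ['i-1', 'i-3', 'i-4']
```

No operator is involved: the failed instance is replaced because the fleet no longer matches its desired size.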
AMAZON S3 STATIC WEBSITE
+ AMAZON ROUTE 53
WEIGHTED RESOLUTION
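Weighted resolution between a primary site and a static fallback hosted on Amazon S3 can be sketched as a weighted random choice. This only simulates the idea and does not call Route 53; the host names and the `resolve` function are hypothetical.

```python
import random


def resolve(records: list[tuple[str, int]], rng: random.Random) -> str:
    """Pick one target at random, in proportion to its weight."""
    targets = [target for target, _ in records]
    weights = [weight for _, weight in records]
    return rng.choices(targets, weights=weights, k=1)[0]


rng = random.Random(0)

# Normal operation: all traffic goes to the primary site.
normal = [("primary.example.com", 100), ("static-fallback.example.com", 0)]
# Primary is down: the weights are flipped so the static site answers instead.
failover = [("primary.example.com", 0), ("static-fallback.example.com", 100)]

print({resolve(normal, rng) for _ in range(100)})
print({resolve(failover, rng) for _ in range(100)})
```

A target with weight 0 is never chosen here, so flipping the weights moves all traffic to the static site.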
#5 LOOSE COUPLING
●●●●●
BUILD LOOSELY COUPLED SYSTEMS
The more loosely they are coupled, the bigger they scale
and the more fault tolerant they get…
AMAZON SQS SIMPLE QUEUE SERVICE
RECEIVE | TRANSCODE | PUBLISH & NOTIFY
VISIBILITY TIMEOUT
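The effect of a visibility timeout can be sketched with an in-memory queue. This is a simulation, not the Amazon SQS API: the `VisibilityQueue` class, the 30-second timeout, and the message body are assumptions, and time is passed in explicitly so the example is deterministic.

```python
class VisibilityQueue:
    """A received message is hidden for a while and reappears unless it is deleted."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time at which it is visible again]
        self._next_id = 0

    def send(self, body: str) -> int:
        self._next_id += 1
        self._messages[self._next_id] = [body, 0.0]
        return self._next_id

    def receive(self, now: float):
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout  # hide it from other workers
                return msg_id, entry[0]
        return None

    def delete(self, msg_id: int) -> None:
        self._messages.pop(msg_id, None)


queue = VisibilityQueue(visibility_timeout=30.0)
queue.send("transcode video-42")

first = queue.receive(now=0.0)    # worker A takes the message, then crashes
hidden = queue.receive(now=10.0)  # still invisible, so nobody else processes it
retry = queue.receive(now=30.0)   # timeout expired: worker B receives it
queue.delete(retry[0])            # worker B finishes and deletes the message
print(first, hidden, retry)
```

Because worker A never deleted the message, the work is not lost when A fails; it simply becomes visible again for another worker.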
CLOUDWATCH METRICS FOR AMAZON SQS
+ AUTO SCALING
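Scaling a worker fleet from queue depth can be sketched as follows. The input would typically come from a CloudWatch metric such as `ApproximateNumberOfMessagesVisible`; the sizing rule, the five-minute drain target, and the worker limits are assumptions for illustration.

```python
import math


def workers_needed(queue_depth: int, per_worker_per_minute: int,
                   drain_minutes: int = 5, min_workers: int = 1,
                   max_workers: int = 20) -> int:
    """Workers required to drain the backlog within drain_minutes, within limits."""
    capacity_per_worker = per_worker_per_minute * drain_minutes
    wanted = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(max_workers, wanted))


print(workers_needed(queue_depth=500, per_worker_per_minute=10))   # 10
print(workers_needed(queue_depth=0, per_worker_per_minute=10))     # 1
print(workers_needed(queue_depth=5000, per_worker_per_minute=10))  # 20
```

Because producers and workers only share the queue, the worker fleet can grow or shrink without the producers noticing.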
1. DESIGN FOR FAILURE
2. MULTIPLE AVAILABILITY ZONES
3. SCALING
4. SELF-HEALING
5. LOOSE COUPLING
YOUR GOAL
Applications should continue to function
IT’S ALL ABOUT CHOICE
BALANCE COST & HIGH AVAILABILITY
AWS ARCHITECTURE CENTER http://aws.amazon.com/architecture
AWS TECHNICAL ARTICLES http://aws.amazon.com/articles
AWS BLOG http://aws.typepad.com
AWS PODCAST http://aws.amazon.com/podcast