53 Minutes or Less - Architecting For Failure In The Cloud
Ben Andersen-Waine
53 Minutes?
99.99%
Availability (%)   Downtime per year   Downtime per month   Downtime per week
90                 36.5 days           72 hours             16.8 hours
99                 3.65 days           7.2 hours            1.68 hours
99.9               8.76 hours          43.8 min             10.1 min
99.99              52.56 min           4.38 min             1.01 min
Adapted From: https://en.wikipedia.org/wiki/High_availability
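The figures in the table can be reproduced with a few lines of arithmetic. The sketch below assumes a 365-day year, a month of one twelfth of a year, and a 7-day week, which is what the table's numbers imply; the function name is illustrative only.

```python
# Allowed downtime for a given availability target.
# Assumed period lengths: 365-day year, month = 1/12 of a year, 7-day week.
HOURS_PER_PERIOD = {
    "year": 365 * 24,
    "month": 365 * 24 / 12,
    "week": 7 * 24,
}


def allowed_downtime_minutes(availability_percent: float, period: str) -> float:
    """Return the minutes of downtime permitted in one period."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_PERIOD[period] * 60


if __name__ == "__main__":
    for target in (90, 99, 99.9, 99.99):
        row = ", ".join(
            f"{period}: {allowed_downtime_minutes(target, period):.2f} min"
            for period in HOURS_PER_PERIOD
        )
        print(f"{target}% -> {row}")
```

At 99.99% the yearly budget works out to roughly 52.56 minutes, which is where the "53 minutes" in the title comes from.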
Architecting For Failure?
Who are you?
1) You have some kind of web application / service
2) You are using an IaaS cloud provider
3) The service needs to be “highly available”
Infrastructure
• Regions & Availability Zones
• Autoscaling
• Multi Region
Regions And Availability Zones
“Each region is a separate geographic area. Each region has multiple, isolated locations known as Availability Zones. Amazon EC2 provides you the ability to place resources, such as instances, and data in multiple locations. Resources aren't replicated across regions unless you do so specifically.”
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
http://aws.amazon.com/about-aws/global-infrastructure/
Auto Scaling
“Auto Scaling helps you maintain application availability and allows you to scale your Amazon EC2 capacity up or down automatically according to conditions you define.”
https://aws.amazon.com/autoscaling/
Auto Scaling
• Instance metrics (useful for containers)
• Load balancer health check (useful for web apps on EC2)
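As a rough illustration of "conditions you define", the sketch below turns an instance metric into a desired group size. It is not the AWS API: the function, the CPU thresholds, and the minimum of two instances (assumed so the group can span two Availability Zones) are all hypothetical.

```python
def desired_capacity(
    current: int,
    avg_cpu_percent: float,
    scale_out_above: float = 70.0,  # assumed threshold, not an AWS default
    scale_in_below: float = 30.0,   # assumed threshold, not an AWS default
    minimum: int = 2,               # assumed: one instance in each of two zones
    maximum: int = 10,
) -> int:
    """Return the instance count the group should move towards."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1
    elif avg_cpu_percent < scale_in_below:
        target = current - 1
    else:
        target = current
    # Never leave the configured bounds, whatever the metric says.
    return max(minimum, min(maximum, target))
```

A real policy would also apply a cooldown between changes so that a single noisy reading does not cause the group to flap.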
Multi Region
Devops
One day I had this fantasy of starting a certification service for operations. The certification assessment would consist of a colleague and I turning up at the corporate data center and setting about critical production servers with a baseball bat, a chainsaw, and a water pistol. The assessment would be based on how long it would take for the operations team to get all the applications up and running again.
http://martinfowler.com/bliki/PhoenixServer.html
Immutable Infrastructure
Devops
• Environment Creation
• Releasing
• Secret Management
• Service Discovery
Environment Creation
• Vendor tools (AWS CloudFormation / Google Cloud Deployment Manager)
• Third-party solutions (Terraform, Ansible)
Immutable Infrastructure
http://martinfowler.com/bliki/SnowflakeServer.html
Configuration changes are regularly needed to tweak the environment so that it runs efficiently and communicates properly with other systems. This requires some mix of command-line invocations, jumping between GUI screens, and editing text files.
The result is a unique snowflake - good for a ski resort, bad for a data center.
Releases: Build An Artifact
• Build A VM (AWS AMI / GCE image)
• Use Containers
Releases: Building A VM
Releases: Building A Container
Releases: Canaries
http://martinfowler.com/bliki/CanaryRelease.html
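One common way to run a canary release is to send a small, stable slice of users to the new version. The sketch below assumes routing by user ID; the function name and the hashing scheme are illustrative choices, not something prescribed by the linked article.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically decide whether this user sees the canary version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < canary_percent
```

Because the bucket is derived from the ID, a given user stays on the same version for the whole rollout, and raising `canary_percent` only ever adds users to the canary group.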
Releases: Blue / Green Deploy
https://cloudnative.io/blog/2015/02/the-dos-and-donts-of-bluegreen-deployment/
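The core of a blue/green deploy is a single switch that only flips once the idle environment is healthy. The sketch below models that switch in memory; the class, its method names, and the health-check callables are hypothetical stand-ins for a load balancer or DNS change.

```python
from typing import Callable, Dict


class BlueGreenRouter:
    """Tracks which of two environments receives live traffic."""

    def __init__(self, environments: Dict[str, Callable[[], bool]], live: str):
        if len(environments) != 2 or live not in environments:
            raise ValueError("expected exactly two environments, one of them live")
        self.environments = environments  # name -> health check callable
        self.live = live

    def idle(self) -> str:
        """Name of the environment that is not serving traffic."""
        return next(name for name in self.environments if name != self.live)

    def switch(self) -> str:
        """Point traffic at the idle environment if it passes its health check."""
        candidate = self.idle()
        if not self.environments[candidate]():
            raise RuntimeError(
                f"{candidate} failed its health check; staying on {self.live}"
            )
        self.live = candidate
        return self.live
```

Calling `switch()` a second time points traffic back at the previous environment, which is what makes rollback cheap in this model.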
Service Discovery
https://www.nginx.com/blog/service-discovery-in-a-microservices-architecture/
Service Discovery
• https://github.com/coreos/etcd
• https://www.consul.io/
• https://zookeeper.apache.org/
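Tools like these keep a registry of live service instances that clients can query. The sketch below is a toy in-memory version with heartbeat expiry, purely to show the idea; it does not use or mirror the API of etcd, Consul, or ZooKeeper, and the addresses in the usage note are made up.

```python
import time
from typing import Dict, List, Optional, Tuple


class ServiceRegistry:
    """In-memory registry: instances stay listed while they keep heartbeating."""

    def __init__(self, ttl_seconds: float = 10.0):
        self.ttl_seconds = ttl_seconds
        # (service name, address) -> time of last heartbeat
        self._entries: Dict[Tuple[str, str], float] = {}

    def heartbeat(self, service: str, address: str, now: Optional[float] = None) -> None:
        """Register an instance or refresh its entry."""
        self._entries[(service, address)] = time.monotonic() if now is None else now

    def lookup(self, service: str, now: Optional[float] = None) -> List[str]:
        """Return addresses whose last heartbeat is within the TTL."""
        now = time.monotonic() if now is None else now
        return sorted(
            address
            for (name, address), seen in self._entries.items()
            if name == service and now - seen <= self.ttl_seconds
        )
```

For example, after `registry.heartbeat("api", "10.0.0.1:80")`, a call to `registry.lookup("api")` returns that address until the TTL passes without another heartbeat, so an instance that dies simply drops out of the results.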
Secrets
• Use a secret keeper or vault
• Use environment variables
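For the environment-variable approach, a minimal sketch is to read each secret at startup and fail fast if it is missing, so nothing sensitive lives in the code or the built artifact. The helper and the variable name in the usage note are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name: str) -> str:
    """Return the secret stored in environment variable `name`, or fail loudly."""
    value = os.environ.get(name)
    if not value:
        # Report only the variable name; never log secret values.
        raise MissingSecretError(f"required secret {name} is not set")
    return value
```

A service would call something like `require_secret("DB_PASSWORD")` once during startup, so a misconfigured environment stops the release immediately rather than failing on the first request.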
Secrets
• https://www.vaultproject.io/
• https://square.github.io/keywhiz/
Software Development
General Best Practice
• Write tests (preferably first)
• Continuously integrate
• Write documentation
Problem: Services Go Away
Circuit Breaking
http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html
Circuit Breaking
Available solutions:
• https://github.com/Netflix/Hystrix
• https://github.com/ejsmont-artur/php-circuit-breaker
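To show the idea behind these libraries, here is a minimal circuit breaker sketch in Python. It is not Hystrix's implementation or API: the class, the default threshold, and the timeout are assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the remote service."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,    # assumed default
        reset_timeout: float = 30.0,   # assumed default, in seconds
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate `CircuitOpenError` they can turn into a fallback response, instead of tying up threads waiting on a dependency that has gone away.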
Problem: Spiky Workloads
Queue Based Load Levelling
https://msdn.microsoft.com/en-gb/library/dn589783.aspx
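A small in-process sketch of queue-based load levelling: the producer enqueues a burst as fast as it arrives, while a single worker drains the queue at its own pace. The standard-library `queue.Queue` stands in for a real message queue, and the function name and `max_pending` parameter are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_load_levelled(
    burst: Iterable, handle: Callable, max_pending: int = 100
) -> List:
    """Buffer a burst of tasks in a queue and process them with one worker."""
    tasks: queue.Queue = queue.Queue(maxsize=max_pending)
    results: List = []

    def worker() -> None:
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work (so tasks must not be None)
                break
            results.append(handle(item))

    consumer = threading.Thread(target=worker)
    consumer.start()
    for item in burst:
        tasks.put(item)  # blocks only when the buffer is full (back-pressure)
    tasks.put(None)
    consumer.join()
    return results
```

The service behind the queue only ever sees work at the rate the worker pulls it, regardless of how spiky the incoming traffic is.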
Priority Queue
https://msdn.microsoft.com/en-gb/library/dn589794.aspx
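The priority queue pattern can be sketched with the standard-library `heapq` module. In this illustration a lower number means higher priority and messages of equal priority keep their arrival order; both conventions, and the class itself, are assumptions rather than anything the linked pattern mandates.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class PriorityMessageQueue:
    """Delivers the most urgent message first (lower number = more urgent)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        # Monotonic counter breaks ties so equal priorities stay first-in, first-out.
        self._counter = itertools.count()

    def send(self, message: Any, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def receive(self) -> Any:
        """Pop the most urgent message; raises IndexError if the queue is empty."""
        _priority, _order, message = heapq.heappop(self._heap)
        return message

    def __len__(self) -> int:
        return len(self._heap)
```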
Competing Consumers
https://msdn.microsoft.com/en-gb/library/dn568101.aspx
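In the competing consumers pattern, several workers pull from the same queue so that each message is handled by exactly one of them. The sketch below uses threads and `queue.Queue` as a stand-in for separate processes reading a shared message queue; the function name and default consumer count are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable, List


def process_with_consumers(
    messages: Iterable, handle: Callable, consumer_count: int = 3
) -> List:
    """Let several consumers compete for messages on one shared queue."""
    inbox: queue.Queue = queue.Queue()
    for message in messages:
        inbox.put(message)

    results: List = []
    lock = threading.Lock()

    def consumer() -> None:
        while True:
            try:
                message = inbox.get_nowait()  # each message goes to one consumer
            except queue.Empty:
                return  # queue drained, this consumer stops
            outcome = handle(message)
            with lock:
                results.append(outcome)

    workers = [threading.Thread(target=consumer) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Results come back in whatever order the consumers finish, so anything downstream should not rely on message ordering.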
Monitoring / SLAs
SLA - Service Level Agreement
http://www.nkarten.com/handbook.pdf
Monitoring
Obligatory Meme
The Simian Army
http://techblog.netflix.com/2011/07/netflix-simian-army.html
https://github.com/Netflix/SimianArmy/
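The central idea of this kind of tooling, deliberately killing instances to prove the system survives, can be sketched as a victim picker. This is not the SimianArmy code or its API: the function, the group and instance names, and the rule of sparing single-instance groups are assumptions for illustration.

```python
import random
from typing import Dict, List


def pick_victims(groups: Dict[str, List[str]], rng: random.Random) -> Dict[str, str]:
    """Choose one instance to terminate in each group that can afford to lose one."""
    victims: Dict[str, str] = {}
    for group, instances in sorted(groups.items()):
        if len(instances) > 1:  # assumption: never remove a group's last instance
            victims[group] = rng.choice(sorted(instances))
    return victims
```

Given `{"web": ["i-1", "i-2", "i-3"], "db": ["i-9"]}` (made-up IDs) and `random.Random()`, this selects one of the three web instances and leaves the single database instance alone; a real run would then hand the chosen IDs to the provider's terminate call.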
Final Thoughts
Questions