Seven Steps to a Peaceful Life on AWS Andrew Shieh SmugMug @shandrew Philip Jacob Stackdriver @whirlycott Friday, November 15, 13

DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

DESCRIPTION

(Presented by Stackdriver) Key decisions related to architecture, tools, processes, and even team composition can have a dramatic effect on the human effort required to operate distributed applications on AWS. If you make the wrong decisions in these areas, you spend your days, nights, weekends, and vacations dealing with issues and noise. If you make the right decisions, you and your team can focus on building customer value, and your time away from work is spent… not working. Stackdriver and SmugMug describe the seven most important practices that world-class operations teams employ to minimize operational overhead, highlighting real-world examples to illustrate the importance of each.

Page 1: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Seven Steps to a Peaceful Life on AWS

Andrew Shieh SmugMug @shandrew

Philip Jacob Stackdriver @whirlycott

Friday, November 15, 13

Page 4: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Stuff we have in common

✓ Years of AWS experience
✓ Success and failure with many lessons learned
✓ Both using Stackdriver for infrastructure monitoring
✓ Lots of data
✓ Philosophically aligned on how to run on AWS
  ‣ Superheroes

Page 6: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

[Figure: curve of CLOUD HYPE plotted against TIME. Labels on the curve: Lure of Elasticity, Peak of Expectations, Transition to Distributed Systems, Operational Enlightenment, DevOps Nirvana]

Page 7: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

STEPS

Page 9: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

1: Apply lean production principles

Page 10: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Release all the time: continuous improvement

Page 11: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Make it frictionless

Page 12: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

$ stack deploy

Page 14: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

2: Choose the right instance type

Page 15: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Factors to Consider

• CPU
• Network
• Disk I/O
• Workload
• Cost

Tools to help you decide

• vmstat
• iostat
• sar
• R
• Excel
• Stackdriver + agent
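
The factors above can be turned into a simple selection rule: measure the workload's peak CPU, network, and disk I/O, then pick the cheapest instance type that covers all three with some headroom. The sketch below is only an illustration of that idea. The catalog names, capacities, prices, and the 30% headroom are hypothetical placeholders, not real EC2 specifications or anything stated in the talk.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str            # hypothetical label, not a real EC2 type
    vcpus: int
    network_mbps: int
    disk_iops: int
    hourly_cost: float   # placeholder price, not a real AWS price


@dataclass(frozen=True)
class Workload:
    # Peak demand, e.g. as observed with vmstat, iostat, or sar.
    vcpus: float
    network_mbps: float
    disk_iops: float


def cheapest_fit(catalog: Iterable[InstanceType],
                 workload: Workload,
                 headroom: float = 1.3) -> Optional[InstanceType]:
    """Return the cheapest type covering peak demand times headroom, or None."""
    candidates = [
        t for t in catalog
        if t.vcpus >= workload.vcpus * headroom
        and t.network_mbps >= workload.network_mbps * headroom
        and t.disk_iops >= workload.disk_iops * headroom
    ]
    return min(candidates, key=lambda t: t.hourly_cost, default=None)


# Entirely made-up catalog for illustration.
CATALOG = [
    InstanceType("general-small", 1, 250, 500, 0.05),
    InstanceType("general-large", 4, 500, 2000, 0.20),
    InstanceType("compute-large", 8, 500, 2000, 0.30),
    InstanceType("storage-large", 4, 1000, 20000, 0.45),
]

if __name__ == "__main__":
    web = Workload(vcpus=2.5, network_mbps=200, disk_iops=1500)
    choice = cheapest_fit(CATALOG, web)
    print(choice.name if choice else "nothing fits")
```

In practice the inputs would come from measured utilization rather than estimates, which is where the tools listed above fit in.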

Page 16: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Distribution of EC2 Instance Usage

[Bar chart; values paired with labels in chart order]

m1.large      21%
m1.small      20%
m1.medium     12%
c1.medium     11%
c1.xlarge      9%
t1.micro       7%
m1.xlarge      7%
m2.xlarge      3%
m2.2xlarge     2%
m2.4xlarge     2%
m3.xlarge      2%
m3.2xlarge     1%
cc2.8xlarge    1%
hi1.4xlarge    0%
cg1.4xlarge    0%
hs1.8xlarge    0%
cc1.4xlarge    0%
cr1.8xlarge    0%

Page 17: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

+ EC2

Page 18: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

3: Use configuration management

Page 20: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

4: Choose the right monitoring solution

Page 22: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

• Rapid Setup
• Full-stack
• AWS Integration
• Intelligent
• Cluster-aware

Page 23: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

5: Design effective alerting policies

Page 24: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Simple rules for confidently waking up ops@ at 3am

1. Something had better be broken (or close to it) for the customer

2. The broken thing should be as obvious as possible

3. It should be clear what action I can take to make the situation better

Customers seeing huge spike in 5XX errors

Code deploy to web cluster one hour ago

Revert!
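
The three rules and the 5XX example can be sketched as a small paging decision. This is a minimal illustration, not the speakers' alerting policy or any Stackdriver API: the `Signal` and `Page` types, the 5% error threshold, the two-hour deploy window, and the fallback action text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signal:
    error_rate_5xx: float                   # fraction of customer requests returning 5XX
    minutes_since_deploy: Optional[float]   # None when no recent deploy is known


@dataclass(frozen=True)
class Page:
    what_is_broken: str     # rule 2: make the broken thing obvious
    suggested_action: str   # rule 3: say what the responder can do


def evaluate(signal: Signal,
             error_threshold: float = 0.05,
             deploy_window_min: float = 120) -> Optional[Page]:
    """Return a Page only when customers are affected; otherwise stay quiet."""
    # Rule 1: nothing is broken for the customer, so nobody is woken up.
    if signal.error_rate_5xx < error_threshold:
        return None

    what = f"Customers seeing 5XX on {signal.error_rate_5xx:.0%} of requests"

    recent = (signal.minutes_since_deploy is not None
              and signal.minutes_since_deploy <= deploy_window_min)
    if recent:
        action = (f"Code deployed to web cluster "
                  f"{signal.minutes_since_deploy:.0f} min ago: revert")
    else:
        # Hypothetical fallback; a real policy would point at a runbook.
        action = "No recent deploy found: follow the 5XX runbook"
    return Page(what, action)


if __name__ == "__main__":
    print(evaluate(Signal(error_rate_5xx=0.20, minutes_since_deploy=60)))
    print(evaluate(Signal(error_rate_5xx=0.01, minutes_since_deploy=60)))
```

The point of the design is that the page itself carries the correlation (errors plus recent deploy), so the person woken at 3am does not have to rediscover it.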

Page 25: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

6: Architect for high availability

Page 26: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

• Elastic Load Balancing
• Amazon RDS
• Apache ZooKeeper

Page 27: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

[Architecture diagram. Labels: Cloud Integration, System Agents, Custom Metrics; DNS; Elastic Load Balancing w/ haproxy; Load Balancing 1, 2 ... n; Data Ingestion; Cell-1, Cell-2 ... Cell-n, each with GW, MQ, AI, F; S3; Archival; Online Analysis; Serving; Workers; Agents; API; Q 1, 2, 3 ... n; Cassandra; Batch; Aggregation, Correlation, Trending; Web/Mobile UI; Anomaly; Health]

Localized failure

Identical dimensions

Easy to reason

Network partitions ok
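
One way to get localized failure from identical cells is to pin each customer to exactly one cell, so that losing a cell affects only the customers assigned to it. The slides do not say how traffic is assigned to cells, so the sketch below is an assumption for illustration only: `cell_for` and `group_by_cell` are hypothetical helpers, and hashing the customer ID is just one possible assignment scheme.

```python
import hashlib
from typing import Dict, Iterable, List


def cell_for(customer_id: str, cell_count: int) -> int:
    """Map a customer to a cell number in 1..cell_count, stably across runs."""
    if cell_count < 1:
        raise ValueError("cell_count must be at least 1")
    # A cryptographic hash is used only for its stable, well-spread output.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cell_count + 1


def group_by_cell(customer_ids: Iterable[str],
                  cell_count: int) -> Dict[int, List[str]]:
    """Group customers by their assigned cell; every cell gets an entry."""
    cells: Dict[int, List[str]] = {n: [] for n in range(1, cell_count + 1)}
    for customer_id in customer_ids:
        cells[cell_for(customer_id, cell_count)].append(customer_id)
    return cells


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(100)]
    cells = group_by_cell(customers, cell_count=4)
    failed_cell = 2
    print(f"Cell-{failed_cell} down: {len(cells[failed_cell])} of "
          f"{len(customers)} customers affected")
```

A plain modulo like this remaps most customers when the number of cells changes; a real deployment would more likely keep an explicit assignment table or use consistent hashing.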

Page 28: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Handling failure

Avoid it

Mask it

Minimize it

Recover quickly

Cluster AZ Region

Resilience

Tolerance

Page 29: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

7: Think holistically about quality assurance

Page 30: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

AUTOSCALING +
AUTOMATION +
CONTINUOUS INTEGRATION +
DEVOPS GOVERNANCE +
ELASTICITY +
PROGRAMMABLE INFRASTRUCTURE =
CONSTANT CHANGE

Page 31: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

You cannot pre-test every change

So

You need to be really good at detecting issues

Very quickly

Page 32: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Monitoring is a key part of quality assurance for dynamic systems

But monitoring tools need to be intelligent

• Distributed sensors
• Cloud-aware
• Anomaly detection
• Synthetic transactions
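
As a rough illustration of what anomaly detection can mean for a single metric, the sketch below flags a new sample that sits far from the recent mean, measured in standard deviations. This is a generic z-score check chosen for brevity, not the method used by Stackdriver or described in the talk; the function name `is_anomalous` and the threshold of 3.0 are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float],
                 latest: float,
                 threshold: float = 3.0) -> bool:
    """Return True when latest is more than threshold std devs from the mean."""
    # With fewer than two samples there is no spread to compare against.
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


if __name__ == "__main__":
    latency_ms = [100, 102, 98, 101, 99]
    for sample in (101, 150):
        verdict = "anomalous" if is_anomalous(latency_ms, sample) else "normal"
        print(f"{sample} ms is {verdict}")
```

A check like this needs no hand-tuned static threshold per host, which matters when instances come and go constantly.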

Page 33: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

• Training

• Recommended reading:

• Systemantics (aka The Systems Bible)

• High Scalability (http://highscalability.com/)

• James Hamilton’s blog (http://perspectives.mvdirona.com/)

Page 34: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Visit us at http://www.smugmug.com/

Page 35: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Visit us at booth 315!
