Transcript
Page 1: DevOps Nirvana: Seven Steps to a Peaceful Life on AWS (ARC210) | AWS re:Invent 2013

Seven Steps to a Peaceful Life on AWS

Andrew Shieh, SmugMug

@shandrew

Philip Jacob, Stackdriver

@whirlycott

Friday, November 15, 13

Stuff we have in common

✓ Years of AWS experience
✓ Success and failure, with many lessons learned
✓ Both using Stackdriver for infrastructure monitoring
✓ Lots of data
✓ Philosophically aligned on how to run on AWS
‣ Superheroes

[Figure: cloud hype curve, CLOUD HYPE plotted against TIME. Stages labeled: Lure of Elasticity, Peak of Expectations, Transition to Distributed Systems, Operational Enlightenment, DevOps Nirvana.]

STEPS

1: Apply lean production principles

Release all the time: continuous improvement

Make it frictionless

$ stack deploy
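The slide reduces the whole release to one command. As a minimal sketch of that idea, assuming a hypothetical `deploy` helper and made-up step names (this is not the actual `stack` tool), a single entry point can run every release step in order and stop at the first failure:

```python
from typing import Callable, List, Tuple

# Hypothetical step type: a name plus a function returning True on success.
Step = Tuple[str, Callable[[], bool]]


def deploy(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run each release step in order, stopping at the first failure.

    Returns (succeeded, log) so the caller can report progress.
    """
    log: List[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            # Stop immediately so a broken build never reaches production.
            return False, log
    return True, log


if __name__ == "__main__":
    # Hypothetical pipeline; real steps would call build and test tools.
    pipeline: List[Step] = [
        ("build", lambda: True),
        ("test", lambda: True),
        ("push artifact", lambda: True),
        ("rolling restart", lambda: True),
    ]
    succeeded, messages = deploy(pipeline)
    print("\n".join(messages))
    print("deployed" if succeeded else "aborted")
```

Stopping at the first failing step keeps the one-command path safe to run often, which is what makes releasing all the time practical.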

2: Choose the right instance type

Factors to Consider

CPU
Network
Disk I/O
Workload
Cost

Tools to help you decide

vmstat
iostat
sar
R
Excel
Stackdriver + agent
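One way to combine these measurements is to size against a high percentile of observed load rather than the average. The sketch below assumes CPU-busy samples collected with a tool such as vmstat (roughly 100 minus its `id` column), a hypothetical list of candidate options with made-up names, vCPU counts and prices, and an arbitrary 30% headroom. It picks the cheapest option whose vCPUs cover p95 demand:

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceOption:
    name: str           # hypothetical label; substitute a real instance type
    vcpus: int
    hourly_cost: float  # illustrative price per hour


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def pick_instance(cpu_busy_pct: List[float], current_vcpus: int,
                  options: List[InstanceOption],
                  headroom: float = 1.3) -> Optional[InstanceOption]:
    """Cheapest option whose vCPUs cover p95 CPU demand plus headroom.

    cpu_busy_pct holds CPU busy percentages sampled on the current instance.
    Returns None when no single option is big enough.
    """
    p95 = percentile(cpu_busy_pct, 95)
    needed_vcpus = current_vcpus * (p95 / 100) * headroom
    fits = [o for o in options if o.vcpus >= needed_vcpus]
    return min(fits, key=lambda o: o.hourly_cost) if fits else None


if __name__ == "__main__":
    # Hypothetical candidates; substitute real instance types and prices.
    candidates = [
        InstanceOption("type-small", 2, 0.10),
        InstanceOption("type-medium", 4, 0.20),
        InstanceOption("type-large", 8, 0.40),
    ]
    samples = [22.0, 35.0, 41.0, 30.0, 28.0, 55.0, 38.0, 33.0, 29.0, 47.0]
    best = pick_instance(samples, current_vcpus=4, options=candidates)
    print(best.name if best else "nothing fits; scale out instead")
```

Sizing on p95 rather than the mean avoids choosing an instance that only looks adequate because quiet hours pull the average down. The same pattern extends to network and disk I/O by adding those capacities to the option record.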

Distribution of EC2 Instance Usage

m1.large: 21%
m1.small: 20%
m1.medium: 12%
c1.medium: 11%
c1.xlarge: 9%
t1.micro: 7%
m1.xlarge: 7%
m2.xlarge: 3%
m2.2xlarge: 2%
m2.4xlarge: 2%
m3.xlarge: 2%
m3.2xlarge: 1%
cc2.8xlarge: 1%
hi1.4xlarge: 0%
cg1.4xlarge: 0%
hs1.8xlarge: 0%
cc1.4xlarge: 0%
cr1.8xlarge: 0%

+ EC2

3: Use configuration management
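Configuration management tools describe the desired state of a machine and converge toward it idempotently. A minimal sketch of that convergence loop, assuming package-name-to-version maps are the only resource type (the `plan` and `converge` names and the package versions are hypothetical):

```python
from typing import Dict, List


def plan(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """List the actions needed to move `actual` to `desired`.

    Keys are package names, values are versions. Packages that are not
    declared in `desired` are left untouched in this sketch.
    """
    actions: List[str] = []
    for name, version in sorted(desired.items()):
        current = actual.get(name)
        if current is None:
            actions.append(f"install {name}={version}")
        elif current != version:
            actions.append(f"upgrade {name} {current}->{version}")
    return actions


def converge(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Apply the plan to an in-memory state and return what was done.

    A real tool would run the package manager here; this sketch simply
    records the new versions, so a second run finds nothing to do.
    """
    actions = plan(desired, actual)
    actual.update(desired)
    return actions


if __name__ == "__main__":
    # Hypothetical host state and desired state.
    host = {"nginx": "1.2.0"}
    wanted = {"nginx": "1.4.3", "ntp": "4.2.6"}
    print(converge(wanted, host))  # first run: one upgrade, one install
    print(converge(wanted, host))  # second run: no actions
```

Running `converge` a second time returns no actions. That idempotence is what makes it safe to apply the same configuration repeatedly to every instance in a fleet, including ones launched by autoscaling.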

4: Choose the right monitoring solution

Rapid Setup
Full-stack
AWS Integration
Intelligent
Cluster-aware
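Cluster-aware monitoring means treating a group of interchangeable instances as one unit rather than inspecting them host by host. A minimal sketch, assuming each sample already carries a cluster label (for example, one derived from an instance tag); the tuple layout and the `cluster_view` name are hypothetical:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Set, Tuple

# Hypothetical sample layout: (instance_id, cluster_label, cpu_percent).
Sample = Tuple[str, str, float]


def cluster_view(samples: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Roll per-instance CPU samples up into per-cluster aggregates."""
    values: Dict[str, List[float]] = defaultdict(list)
    members: Dict[str, Set[str]] = defaultdict(set)
    for instance_id, cluster, cpu in samples:
        values[cluster].append(cpu)
        members[cluster].add(instance_id)  # count each instance once
    return {
        cluster: {
            "instances": float(len(members[cluster])),
            "avg_cpu": mean(vals),
            "max_cpu": max(vals),
        }
        for cluster, vals in values.items()
    }


if __name__ == "__main__":
    readings = [
        ("i-0001", "web", 40.0),
        ("i-0002", "web", 60.0),
        ("i-0003", "db", 80.0),
    ]
    for name, stats in cluster_view(readings).items():
        print(name, stats)
```

Grouping by label rather than instance ID means the view survives instances being replaced, which is the normal state of affairs with autoscaling.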

5: Design effective alerting policies

Simple rules for confidently waking up ops@ at 3am

1. Something had better be broken (or close to it) for the customer

2. The broken thing should be as obvious as possible

3. It should be clear what action I can take to make the situation better

For example:

Customers are seeing a huge spike in 5XX errors

Code was deployed to the web cluster one hour ago

Revert!
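The three rules can be encoded directly in an alert check. This is a sketch under stated assumptions: the 5% error-rate threshold, the two-hour deploy lookback, and the `check_5xx` and `Deploy` names are hypothetical choices, not values from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deploy:
    cluster: str
    at: datetime


def check_5xx(requests: int, errors_5xx: int, now: datetime,
              deploys: List[Deploy], threshold: float = 0.05,
              lookback: timedelta = timedelta(hours=2)) -> Optional[str]:
    """Return a page-worthy message, or None if nobody should be woken up."""
    if requests == 0:
        return None
    rate = errors_5xx / requests
    # Rule 1: something must be broken (or close to it) for the customer.
    if rate < threshold:
        return None
    # Rule 2: state the broken thing as plainly as possible.
    message = f"Customers seeing {rate:.1%} 5XX errors"
    # Rule 3: point at an action; a recent deploy is the first suspect.
    recent = [d for d in deploys if now - lookback <= d.at <= now]
    if recent:
        last = max(recent, key=lambda d: d.at)
        minutes = int((now - last.at).total_seconds() // 60)
        message += f"; deploy to {last.cluster} {minutes} min ago: consider reverting"
    return message


if __name__ == "__main__":
    now = datetime(2013, 11, 15, 3, 0)
    history = [Deploy("web", now - timedelta(hours=1))]
    print(check_5xx(requests=1000, errors_5xx=200, now=now, deploys=history))
```

Measuring the customer-facing error rate, rather than any single host metric, is what keeps the 3am page tied to rule 1.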

6: Architect for high availability

Elastic Load Balancing
Amazon RDS
Apache ZooKeeper
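Elastic Load Balancing decides whether an instance stays in service from runs of consecutive health-check results measured against healthy and unhealthy thresholds. A minimal sketch of that threshold logic, with hypothetical class names and illustrative threshold values rather than ELB's defaults:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Backend:
    name: str
    healthy: bool = True
    streak: int = 0  # consecutive checks that disagree with `healthy`


class HealthChecker:
    """Flip a backend's state only after a run of consecutive results."""

    def __init__(self, names: List[str], unhealthy_after: int = 2,
                 healthy_after: int = 3) -> None:
        # Threshold values are illustrative assumptions.
        self.backends: Dict[str, Backend] = {n: Backend(n) for n in names}
        self.unhealthy_after = unhealthy_after
        self.healthy_after = healthy_after

    def record(self, name: str, passed: bool) -> None:
        backend = self.backends[name]
        if passed == backend.healthy:
            backend.streak = 0  # result agrees with current state
            return
        backend.streak += 1
        limit = self.healthy_after if passed else self.unhealthy_after
        if backend.streak >= limit:
            backend.healthy = passed
            backend.streak = 0

    def in_service(self) -> List[str]:
        """Names of backends that should currently receive traffic."""
        return [n for n, b in self.backends.items() if b.healthy]


if __name__ == "__main__":
    checker = HealthChecker(["web-1", "web-2"])
    for result in (False, False):
        checker.record("web-1", result)
    print(checker.in_service())
```

Requiring several consecutive results in either direction keeps a single slow response from pulling an instance out of rotation, and keeps a flapping instance out until it has been stable for a while.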

[Figure: cell-based architecture diagram. Labeled components: DNS; Load Balancing 1 through n (Elastic Load Balancing w/ haproxy); Cell-1 through Cell-n, each containing GW, MQ, AI and F; Data Ingestion from Cloud Integration, System Agents and Custom Metrics; Workers, Agents and API tiers; Q 1 through n; Cassandra; S3; Archival; Online Analysis; Batch; Aggregation, Correlation and Trending; Anomaly; Health; Serving; Web/Mobile UI.]

Localized failure
Identical dimensions
Easy to reason about
Network partitions OK
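A cell layout keeps failure localized only if each customer is pinned to exactly one cell. A minimal sketch of that routing, assuming customers are identified by a string ID and cells are numbered 0 to n-1 (both assumptions; `cell_for` and `affected_customers` are hypothetical names):

```python
import hashlib
from typing import List, Set


def cell_for(customer_id: str, num_cells: int) -> int:
    """Pin a customer to one of `num_cells` identical cells (0-based)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


def affected_customers(customers: List[str], num_cells: int,
                       failed_cells: Set[int]) -> List[str]:
    """Customers whose cell is down; everyone else keeps working."""
    return [c for c in customers if cell_for(c, num_cells) in failed_cells]


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(10)]
    down = affected_customers(customers, num_cells=4, failed_cells={2})
    print(f"{len(down)} of {len(customers)} customers affected by cell 2")
```

SHA-256 is used instead of Python's built-in `hash()` because string hashing is randomized per process, so separate routers would disagree about placement. Plain modulo reassigns most customers when the number of cells changes; an explicit lookup table or consistent hashing avoids that.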

Handling failure

Avoid it

Mask it

Minimize it

Recover quickly

Cluster AZ Region

Resilience

Tolerance
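Masking a failure and recovering quickly can be combined at the call site: retry with exponential backoff, then fall back to a degraded answer. A sketch with hypothetical names; catching every `Exception` is for brevity, and real code would retry only errors known to be transient:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(primary: Callable[[], T], fallback: Callable[[], T],
                       attempts: int = 3, base_delay: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Try `primary` up to `attempts` times, then return `fallback()`.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:  # broad for brevity; retry only transient errors
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: mask the failure with a degraded answer.
    return fallback()


if __name__ == "__main__":
    def read_live_price() -> str:
        raise ConnectionError("pricing service unavailable")  # simulated outage

    print(call_with_fallback(read_live_price, lambda: "cached price", base_delay=0.01))
```

Injecting `sleep` keeps the backoff testable without real waiting. Growing the delay between attempts avoids hammering a dependency that is already struggling.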

7: Think holistically about quality assurance

AUTOSCALING
+ AUTOMATION
+ CONTINUOUS INTEGRATION
+ DEVOPS GOVERNANCE
+ ELASTICITY
+ PROGRAMMABLE INFRASTRUCTURE
= CONSTANT CHANGE

You cannot pre-test every change.

So you need to be really good at detecting issues, very quickly.

Monitoring is a key part of quality assurance for dynamic systems

But monitoring tools need to be intelligent

Distributed sensors
Cloud-aware
Anomaly detection
Synthetic transactions
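Anomaly detection can start far simpler than what a monitoring product ships. As a stand-in technique (a rolling z-score, not the algorithm any particular tool uses), flag a point when it sits more than `z_threshold` standard deviations from the mean of the preceding `window` points. The window size and threshold here are illustrative:

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


def anomalies(values: Iterable[float], window: int = 20,
              z_threshold: float = 3.0) -> List[int]:
    """Return indexes of points far from the trailing window's mean.

    A point is flagged when |x - mean| > z_threshold * stddev of the
    previous `window` points. Points before the window fills are skipped.
    A perfectly flat history has stddev 0, so any change from it is flagged.
    """
    history: Deque[float] = deque(maxlen=window)
    flagged: List[int] = []
    for i, x in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if abs(x - mu) > z_threshold * sigma:
                flagged.append(i)
        history.append(x)
    return flagged


if __name__ == "__main__":
    latency_ms = [10.0, 11.0] * 10 + [50.0] + [10.0, 11.0] * 5
    print(anomalies(latency_ms))  # flags the spike at index 20
```

Comparing each point only with its recent past lets the baseline drift with normal load changes, which matters when the fleet itself is constantly changing.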

• Training
• Recommended reading:

• Systemantics (aka The Systems Bible)

• High Scalability (http://highscalability.com/)

• James Hamilton’s blog (http://perspectives.mvdirona.com/)

Visit us at http://www.smugmug.com/

Visit us at booth 315!
