Lessons Learned Managing Large AWS Environments

DESCRIPTION

How do you optimize management of 500+ AWS servers? In this presentation I share my experiences using Amazon Web Services, covering techniques for web scale. Learn how to optimize your costs, handle security, automate, and prepare for failure.

Lessons learned managing large AWS Environments

Ronald Bradford

http://ronaldbradford.com @RonaldBradford

2013.06

EffectiveMySQL.com - Performance, Scalability & Business Continuity

SCOPE

Consulting experiences with AWS

Several different clients

Largest - 500+ servers

Some 40-50+ servers

Some 2-5 servers

LAMP/RoR/RDS/Windows

ABOUT MySELF

Enterprise Data Architecture

24 years with RDBMS - 13 years with MySQL

Using AWS 4+ years

Published author - 4 books

Accomplished presenter - 8 years

Work as an independent MySQL consultant

Ronald BRADFORD

Covering

1. Products

2. Cost

3. Web Scale

4. Security

5. Instrumentation

6. Failure

1. AWS Products & Ecosystem

ABOUT AWS

Many, many products and features

EC2, S3, EBS, ELB, RDS, EMR, VPC, CDN, SWF, SQS, SES, SNS, IAM, ...

Mechanical Turk

Flexible Payments Service (FPS)

AMAZON WEB SERVICES

30+

AWS CONSOLE

May 2013 Aug 2012

Announcements

Product Announcements

Pricing Changes

New instance types

New features (e.g. IOPS)

New Products (e.g. Redshift/ OpsWorks)

Examples in presentation

http://aws.amazon.com/about-aws/newsletters/

ECOSYSTEM

AWS Marketplace

https://aws.amazon.com/marketplace/

Over 800

Product growth

When I started

No RDS, In-memory Cache, DynamoDB, Glacier

No Elastic Beanstalk, OpsWorks

No management console

2. AWS Costs

operating cost

Are you monitoring your costs?

Daily

Hourly

Operating Cost

https://github.com/ronaldbradford/aws

$ ec2_cost.sh

$29,000 p.m.
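
A monthly figure like this is the hourly rate of every running instance, summed and multiplied out over the month. The following is a minimal sketch of that arithmetic, not the actual ec2_cost.sh script. The fleet size is hypothetical, the 730 hours per month is an assumed average, and only the m1.large rate is taken from this deck.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

def monthly_cost(fleet, hourly_rates):
    """Sum instance_count * hourly_rate * hours for each instance type."""
    total = Decimal("0")
    for instance_type, count in fleet.items():
        total += count * hourly_rates[instance_type] * HOURS_PER_MONTH
    return total

# Hypothetical fleet; only the m1.large on-demand rate comes from this deck.
rates = {"m1.large": Decimal("0.24")}
fleet = {"m1.large": 100}
print(monthly_cost(fleet, rates))  # 100 instances * $0.24 * 730 hours
```

Running the same calculation daily or hourly makes an unexpected jump in spend visible long before the invoice arrives.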

Your Money

What is AWS costing you?

Instance types/sizes

Cost options

http://aws.amazon.com/ec2/instance-types

http://aws.amazon.com/ec2/pricing

Instance Types

General-purpose

Compute-optimized

Memory-optimized

Storage-optimized

GPU

Instance Prices

Large Instance (m1.large)

On Demand   $0.24      Per hour investment
Reserved    $0.136 *   + Annual contract (+$0.043)
Spot        $0.03+ *   Can be terminated (budget)

40% saving

up to 80+% saving

Was $0.32 til 11/19/2012

Was $0.26 til 1/16/2013

Light/Medium/Heavy utilization
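
The saving percentages can be checked against the hourly prices on this slide. This is a small sketch under one assumption: the $0.136 reserved figure is treated as the effective hourly rate. The slide does not spell out how the annual contract amount is amortized, so the reserved result is approximate.

```python
def saving_pct(on_demand, other):
    """Percentage saved versus the on-demand hourly rate."""
    return round((on_demand - other) / on_demand * 100, 1)

ON_DEMAND = 0.24  # m1.large on-demand price from the slide
RESERVED = 0.136  # assumption: treated as the effective hourly rate
SPOT = 0.03       # lowest spot price shown on the slide

print("reserved saving %:", saving_pct(ON_DEMAND, RESERVED))
print("spot saving %:", saving_pct(ON_DEMAND, SPOT))
```

The spot figure only holds while the spot price stays at its floor, which the spot history slides show is not guaranteed.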

SPOT EXAMPLE

One hour (24 cents)

1 x Large - Reserved

7.5G, 4 CPUs, 850G

8 x Large - Spot

or

1 x Eight Extra Large - Spot (cc2.8xlarge)

60G, 88 CPUs, 3.4T, 10Gb NIC

price has changed 3 times in 8 months
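
The "8 x Large" figure is the hourly budget divided by the spot price. A quick sketch in whole cents, which avoids floating point rounding; the 10 cent case is taken from the spot history and shows how fast the count shrinks when the price moves.

```python
def instances_for_budget(budget_cents, price_cents):
    """How many instances one hour of budget buys at a given hourly price."""
    return budget_cents // price_cents

print(instances_for_budget(24, 3))   # spot at 3 cents
print(instances_for_budget(24, 10))  # spot at 10 cents
```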

SPOT HISTORY

$ ec2-describe-spot-price-history -t m1.large -d Linux/UNIX
SPOTINSTANCEPRICE 0.030000 2013-05-28T17:20:41-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.100000 2013-05-28T17:07:02-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.030000 2013-05-28T16:37:51-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.100000 2013-05-28T16:31:03-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.030000 2013-05-28T16:24:48-0500 m1.large Linux/UNIX us-east-1d
SPOTINSTANCEPRICE 0.030000 2013-05-28T16:24:48-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.100000 2013-05-28T16:15:03-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.060000 2013-05-28T16:08:34-0500 m1.large Linux/UNIX us-east-1d
SPOTINSTANCEPRICE 0.030000 2013-05-28T16:01:59-0500 m1.large Linux/UNIX us-east-1b
SPOTINSTANCEPRICE 0.240000 2013-05-28T15:55:12-0500 m1.large Linux/UNIX us-east-1b
SPOTINSTANCEPRICE 0.030000 2013-05-28T15:48:32-0500 m1.large Linux/UNIX us-east-1b
SPOTINSTANCEPRICE 0.030000 2013-05-28T15:42:07-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.045000 2013-05-28T15:35:47-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.050000 2013-05-28T15:35:47-0500 m1.large Linux/UNIX us-east-1b
SPOTINSTANCEPRICE 0.400000 2013-05-28T15:29:15-0500 m1.large Linux/UNIX us-east-1b
SPOTINSTANCEPRICE 0.260000 2013-05-28T15:22:47-0500 m1.large Linux/UNIX us-east-1b
SPOTINSTANCEPRICE 0.030000 2013-05-28T15:16:01-0500 m1.large Linux/UNIX us-east-1d
SPOTINSTANCEPRICE 0.030000 2013-05-28T15:16:01-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.026000 2013-05-28T15:09:30-0500 m1.large Linux/UNIX us-east-1a

2013

3c to 10c Zone A

3c to 40c Zone B
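
The per-zone ranges can be derived from the command output rather than read off by eye. A minimal sketch, assuming the six-column SPOTINSTANCEPRICE layout shown in the 2013 output; the sample is a handful of rows copied from it, and the function name is illustrative.

```python
from collections import defaultdict

SAMPLE = """\
SPOTINSTANCEPRICE 0.030000 2013-05-28T17:20:41-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.100000 2013-05-28T17:07:02-0500 m1.large Linux/UNIX us-east-1a
SPOTINSTANCEPRICE 0.240000 2013-05-28T15:55:12-0500 m1.large Linux/UNIX us-east-1b
SPOTINSTANCEPRICE 0.400000 2013-05-28T15:29:15-0500 m1.large Linux/UNIX us-east-1b
SPOTINSTANCEPRICE 0.030000 2013-05-28T15:16:01-0500 m1.large Linux/UNIX us-east-1d
"""

def price_range_by_zone(text):
    """Return {zone: (lowest_price, highest_price)} from spot history output."""
    prices = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not a well-formed price row.
        if len(fields) != 6 or fields[0] != "SPOTINSTANCEPRICE":
            continue
        prices[fields[5]].append(float(fields[1]))
    return {zone: (min(p), max(p)) for zone, p in prices.items()}

for zone, (low, high) in sorted(price_range_by_zone(SAMPLE).items()):
    print(zone, low, high)
```

Feeding this a longer history is one way to drive the spot graphing and email alerts mentioned later in the deck.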

SPOT HISTORY

$ ec2-describe-spot-price-history -t m1.large -d Linux/UNIX
0.0260 2012-09-27T09:45:46-0800 m1.large Linux/UNIX us-east-1b
0.0260 2012-09-27T09:45:46-0800 m1.large Linux/UNIX us-east-1d
0.0290 2012-09-27T09:38:37-0800 m1.large Linux/UNIX us-east-1b
0.0370 2012-09-27T09:38:37-0800 m1.large Linux/UNIX us-east-1d
0.0600 2012-09-27T09:31:29-0800 m1.large Linux/UNIX us-east-1b
0.1700 2012-09-27T09:31:29-0800 m1.large Linux/UNIX us-east-1d
0.1600 2012-09-27T09:24:20-0800 m1.large Linux/UNIX us-east-1d
0.0600 2012-09-27T09:17:11-0800 m1.large Linux/UNIX us-east-1b
0.0900 2012-09-27T09:17:11-0800 m1.large Linux/UNIX us-east-1d
0.0260 2012-09-27T09:09:55-0800 m1.large Linux/UNIX us-east-1c
0.0260 2012-09-27T09:09:55-0800 m1.large Linux/UNIX us-east-1b

2012

2.6c to 17c (1/2 of 34c)

One AZ only

Using SPOTS

Is your volume predictable?

Splitting on-demand/spot instances

Can work be done asynchronously?

i.e. can be queued

Is work restartable?

WARNING: Not for general workloads
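
"Queued" and "restartable" together mean that a job interrupted by a spot termination can simply go back on the queue. The sketch below simulates that in memory; the Interrupted exception and the handler are hypothetical stand-ins, and a real setup would use a durable queue such as SQS rather than a Python deque.

```python
from collections import deque

class Interrupted(Exception):
    """Stands in for a spot instance being terminated mid-job."""

def drain(queue, handler):
    """Process jobs; a job interrupted mid-run goes back on the queue."""
    done = []
    while queue:
        job = queue.popleft()
        try:
            handler(job)
        except Interrupted:
            queue.append(job)  # restartable: retry later
            continue
        done.append(job)
    return done

# Hypothetical handler: the first attempt at job "b" is interrupted.
attempts = {}

def handler(job):
    attempts[job] = attempts.get(job, 0) + 1
    if job == "b" and attempts[job] == 1:
        raise Interrupted()

print(drain(deque(["a", "b", "c"]), handler))
```

This only works when a job can safely run twice from the start, which is why spot capacity is not suited to general workloads.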

Instance sizes

Evaluating the right instance size

What is your bottleneck?

Developing a tool to recommend savings

TRUSTED ADVISOR

AWS now offers Trusted Advisor

Recommendations to save money

Improve performance

Close security problems

http://aws.amazon.com/premiumsupport/trustedadvisor/

COST SAVINGS

Other players

http://www.newvem.com/

http://www.cloudyn.com/

OTHER COST SAVINGS

CDN - Cloudfront

Bandwidth

Reduce response size (e.g. 10%)

Storage

old EBS snapshots

Remove unused instances

http://aws.amazon.com/cloudfront/

NEW: Announced 1/9/2013 CloudWatch Alarm Actions
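
Finding old EBS snapshots is an age filter over the snapshot inventory. A minimal sketch, assuming the inventory has already been fetched and reduced to an id and a start time; the dictionary keys, the sample ids and the 30 day threshold are all hypothetical.

```python
from datetime import datetime, timedelta

def old_snapshots(snapshots, now, max_age_days=30):
    """Return ids of snapshots started more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    return [s["id"] for s in snapshots if s["started"] < cutoff]

# Hypothetical inventory; in practice this would come from the EC2 API or CLI.
snapshots = [
    {"id": "snap-old", "started": datetime(2013, 1, 1)},
    {"id": "snap-new", "started": datetime(2013, 5, 20)},
]
print(old_snapshots(snapshots, now=datetime(2013, 6, 1)))
```

Reviewing the list before deleting anything is prudent, since a snapshot may back an AMI that is still in use.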

3. Web Scale (hint: no humans)

ABOUT WEB SCALE

GUI = #FAIL

CLI is necessary

Manual CLI use is slow

Automation is crucial

Parallel
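
At 500+ servers, running one command per host in sequence is too slow, so the automation has to fan out. A sketch of that idea using a thread pool from the Python standard library; the check function and host names are hypothetical placeholders for whatever a real script would do over SSH or the AWS API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_all(hosts, action, max_workers=20):
    """Apply action(host) to every host concurrently; return {host: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(action, hosts)  # results come back in host order
        return dict(zip(hosts, results))

# Hypothetical action; a real one might call ssh or an AWS API.
def check(host):
    return "ok:" + host

hosts = ["web%03d" % n for n in range(1, 6)]
print(run_on_all(hosts, check))
```

Threads suit this because the work is dominated by waiting on the network rather than by CPU.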

AWS CLIs

Different for EC2, ELB, RDS etc

Updated frequently (i.e. monthly)

$ git clone https://github.com/ronaldbradford/aws.git
$ cd aws/scripts
$ ./aws_cli_configure.sh

Simple helper

RTFM

http://aws.amazon.com/archives/Amazon-EC2

Identifiers

Access Key ID

Secret Access Key

X.509 Certificates (2 of)

Private (*) & Public

AWS Account ID

Canonical User ID

https://portal.aws.amazon.com/gp/aws/securityCredentials

CLI Examples

Launch Script

Demand/Spot or switch between

Verify SSH

Verify MySQL

Verify replication in sync

Add to ELB
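
The launch steps form a gate: a new server joins the load balancer only after every verification passes. The sketch below is not the author's launch script. It covers only the SSH and MySQL reachability checks, using the default ports 22 and 3306, and leaves the replication check and the ELB registration out.

```python
import socket

def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def verify(host, checks):
    """Run named checks in order; return the first failing name, or None."""
    for name, check in checks:
        if not check(host):
            return name
    return None

# Default SSH and MySQL ports; replication and ELB steps are omitted here.
CHECKS = [
    ("ssh", lambda h: port_open(h, 22)),
    ("mysql", lambda h: port_open(h, 3306)),
]
```

A caller would register the host with the load balancer only when verify(host, CHECKS) returns None, and log the failing step otherwise.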

CLI Examples

Audit Script

Consolidates information

Parallel operations

Unused EC2/EBS etc

Feeds reporting

ELB/EC2 usage

CLI EXAMPLES

Others

Cost Measurement

Cloning (optimizes scale-up)

Move servers between load balancers

Spot History graphing

Spot History email alerts

4. AWS Security

SECURITY

Do not give away the front door keys

Do not open all the windows

SECURITY OPTIONS

Keypairs

Security groups

Virtual Private Cloud (VPC)

Identity and Access Management (IAM)

Multi-factor authentication

Learn the different benefits

http://aws.amazon.com/mfa/

SECURITY TIPS

Restrict open access to port 80/443

Jump box

Restrict IP Access

Additional authentication

Per user SSH authentication

Do not use keypair
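
The first tip can be audited mechanically: flag any rule that is open to the whole internet on a port other than 80 or 443. A sketch over a simplified, hypothetical rule format; real security group rules carry protocols and port ranges, which this ignores.

```python
PUBLIC_PORTS = {80, 443}  # the only ports the tip allows to be world-open

def world_open_violations(rules):
    """Return rules open to 0.0.0.0/0 on any port outside PUBLIC_PORTS."""
    return [
        r for r in rules
        if r["cidr"] == "0.0.0.0/0" and r["port"] not in PUBLIC_PORTS
    ]

# Hypothetical, simplified rules for illustration only.
rules = [
    {"port": 80, "cidr": "0.0.0.0/0"},
    {"port": 22, "cidr": "0.0.0.0/0"},
    {"port": 22, "cidr": "203.0.113.10/32"},
]
print(world_open_violations(rules))
```

Here the world-open SSH rule is flagged, while SSH restricted to a single jump box address passes.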

products

Many Others (AWS Summit 2013)

Cloudaware

Enstratius

AlertLogic

Dome9

SafeNet

5. Instrumentation

Instrumentation

What is important to you?

All server stats

Sampling issues

Deceiving averages (frequency)

REQUESTS PER SEC

5 second averages, not 1 minute sample

https://github.com/ronaldbradford/reqstat

-1,500 RPS
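
Why the sampling window matters can be shown with synthetic numbers. This sketch does not use reqstat; the traffic is invented: a steady 100 requests per second with one 5 second burst.

```python
def window_averages(per_second, window):
    """Average requests/sec over consecutive windows of `window` seconds."""
    return [
        sum(per_second[i:i + window]) / window
        for i in range(0, len(per_second), window)
    ]

# Synthetic minute: steady 100 RPS with a 5-second burst of 1,300 RPS.
minute = [100] * 30 + [1300] * 5 + [100] * 25

print(window_averages(minute, 60))      # one number for the whole minute
print(max(window_averages(minute, 5)))  # the burst is visible
```

The one minute average reports 200 RPS, while the 5 second view shows the 1,300 RPS peak the servers actually had to absorb.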

outliers

I care about these
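
Outliers vanish in a mean but stand out in a high percentile or the maximum. A small sketch with invented latencies; the p99 here is a simple rank-based approximation, not a statistically rigorous estimator.

```python
import statistics

def summarize(latencies_ms):
    """Compare the mean with a rough 99th percentile and the maximum."""
    ordered = sorted(latencies_ms)
    # Simple rank-based p99 using integer arithmetic.
    p99_index = max(0, (len(ordered) * 99) // 100 - 1)
    return {
        "mean": statistics.mean(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }

# Synthetic: 98 fast requests and two slow ones.
samples = [10] * 98 + [2000, 3000]
print(summarize(samples))
```

The mean stays under 60 ms even though two requests took seconds, which is exactly the experience the affected users remember.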

TESTING

End to end testing critical

Network latency

ELB performance

products

AWS CloudWatch

Many Others (AWS Summit 2013)

Datadog

Boundary

CopperEgg

AppDynamics

What features matter?

6. Failure

FAILURE

Instances fail

Outages occur

AWS scheduled reboots

Be prepared

Chaos Monkey

http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html
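
The idea behind a failure drill is to pick an instance at random, take it away, and confirm the service still has enough capacity. The sketch below only simulates the selection and the capacity check. It is not Netflix's Chaos Monkey, and the instance ids and the min_healthy threshold are hypothetical.

```python
import random

def pick_victim(instances, rng):
    """Choose one instance at random to terminate in a failure drill."""
    return rng.choice(instances)

def survives(instances, victim, min_healthy):
    """Would the group still have enough members without the victim?"""
    return len([i for i in instances if i != victim]) >= min_healthy

# Hypothetical instance ids; a real drill would terminate via the AWS API.
group = ["i-aaa", "i-bbb", "i-ccc"]
victim = pick_victim(group, random.Random())
print(victim, survives(group, victim, min_healthy=2))
```

Running the capacity check before terminating anything keeps the drill from becoming the outage it is meant to rehearse.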

CONCLUSION

Cost Management (saving money)

CLI automation

Instrumentation (inc business metrics)

Distribute your application & data

Disaster is inevitable

AWS for FREE

http://aws.amazon.com/free/

Free EC2 t1.micro for a year

Free RDS t1.micro for a year

S3, DynamoDB, SimpleDB, +++

http://effectiveMySQL.com

Ronald Bradford