© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
CPN211 - Reducing Cost and Maximizing Efficiency: Tightening the Belt on AWS
Tom Johnston - Business Development Manager, Amazon Web Services
Sean Simpson - Director of Operations, Stitcher, Inc.
Kingsley Wood - Business Development Manager, Amazon Web Services
Ashay Padwal - CTO, Vserv.mobi
November 15, 2013
Introductions and Outline
• Tom Johnston (AWS)
Reducing Cost and Spending Smart
• Sean Simpson (Stitcher)
Moving to AWS – A Story
• Kingsley Wood (AWS)
Maximizing Efficiency and Cost Optimization
• Ashay Padwal (vServ.mobi)
A Spot Case Study
Reducing Cost and Spending Smart
Tom Johnston – Business Development Manager, AWS
Fundamentals
• Explicit Objectives
• Match Instances with Workloads
• Match Scale & Use with Demand
• Match Purchasing with Utilization
• Governance Matters
Objectives
AWS provides you the ability to match your architecture to your objectives
Instance types
Start
Choose an instance that best meets your basic requirements
Match memory & virtual cores
Tune
Change instance size up or down based upon monitoring
Use CloudWatch & Trusted Advisor to assess
[Diagram: the instance PUTs metrics (free memory, free CPU, free HDD, …, custom metrics, …) to Amazon CloudWatch at 1-min intervals; CloudWatch keeps them for 2 weeks and can raise an alarm.]
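The custom-metric flow in the diagram can be sketched in Python. This is a minimal sketch that assumes boto3 as the client library. The metric names, the `Custom/InstanceHealth` namespace, and the helper names are hypothetical. `build_metric_data` only assembles the `MetricData` list that boto3's `put_metric_data` accepts, so it runs without credentials. `publish` would be called once a minute (for example from cron) to match the 1-minute interval.

```python
from datetime import datetime, timezone


def build_metric_data(instance_id, free_memory_mb, free_cpu_pct, free_disk_pct, timestamp=None):
    """Package one sample of hypothetical 'free resource' metrics for CloudWatch."""
    ts = timestamp or datetime.now(timezone.utc)
    dimensions = [{"Name": "InstanceId", "Value": instance_id}]
    readings = [
        ("FreeMemory", free_memory_mb, "Megabytes"),
        ("FreeCPU", free_cpu_pct, "Percent"),
        ("FreeDisk", free_disk_pct, "Percent"),
    ]
    return [
        {"MetricName": name, "Dimensions": dimensions, "Timestamp": ts,
         "Value": float(value), "Unit": unit}
        for name, value, unit in readings
    ]


def publish(metric_data, namespace="Custom/InstanceHealth"):
    # Imported lazily so the builder above works without boto3 installed.
    import boto3
    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=metric_data)


sample = build_metric_data("i-12345678", free_memory_mb=512, free_cpu_pct=40, free_disk_pct=75)
print([m["MetricName"] for m in sample])
```

A CloudWatch alarm on, for example, `FreeMemory` can then drive the up/down sizing decision in the Tune step.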
Know your usage
[Chart: instance families positioned by processing ability (horizontal, more processing to the right) and memory in GB (vertical, more memory upward): M1, M3, High-CPU, High-Mem, Cluster Compute, C3, High Storage, High I/O, High-Mem Cluster Compute.]
Instance types: Roll-Out
Run multiple instances in multiple Availability Zones
Choose your metric; optimize for the metric
Cost per unit of work per instance (size)
• Workload A: optimal on 4 x m1.xlarge
• Workload B: optimal on 10 x m1.medium
• Workload C: optimal on 2 x m3.2xlarge
100 concurrent jobs on 10 x m1.large @ $0.26/hr = $0.026/job
vs.
300 concurrent jobs on 10 x m3.xlarge @ $0.58/hr = $0.019/job
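The per-job arithmetic above can be checked with a few lines of Python. This sketch assumes each job occupies its slot for a full hour, which is what the slide's per-job figures imply. The function name is illustrative.

```python
def cost_per_job(instance_count: int, hourly_rate: float, concurrent_jobs: int) -> float:
    """Hourly fleet cost divided by the jobs it runs concurrently in that hour."""
    return instance_count * hourly_rate / concurrent_jobs


options = {
    "10 x m1.large": cost_per_job(10, 0.26, 100),
    "10 x m3.xlarge": cost_per_job(10, 0.58, 300),
}
for name, cost in options.items():
    print(f"{name}: ${cost:.3f} per job")

# The pricier instance wins on cost per job because it packs more work per hour.
print("Cheaper per job:", min(options, key=options.get))
```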
Think workload density
Don’t just focus on instance hourly rate
[Figure: server load by hour of day (0 to 24) against the capacity of 1 server. Traditional capacity required covers the daily peak around the clock, while elastic capacity adds servers in "1 server for 8 hours" blocks only as load requires them: a 1/3rd saving.]
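The daily-load comparison can be sketched by counting server-hours. The load profile below is hypothetical. It is chosen so that 8 hours need 1 server, 8 need 3, and 8 need 2, which reproduces the slide's 1/3 saving. Real savings depend on the shape of your own curve.

```python
import math


def elastic_server_hours(hourly_load, capacity_per_server):
    """Each hour runs just enough servers for that hour's load."""
    return sum(math.ceil(load / capacity_per_server) for load in hourly_load)


def traditional_server_hours(hourly_load, capacity_per_server):
    """Provision for the daily peak and keep that fleet running every hour."""
    peak_servers = math.ceil(max(hourly_load) / capacity_per_server)
    return peak_servers * len(hourly_load)


def elastic_saving(hourly_load, capacity_per_server):
    elastic = elastic_server_hours(hourly_load, capacity_per_server)
    traditional = traditional_server_hours(hourly_load, capacity_per_server)
    return 1 - elastic / traditional


# Hypothetical 24-hour load, in the same units as one server's capacity (100).
load = [80] * 8 + [250] * 8 + [150] * 8
print(elastic_server_hours(load, 100), traditional_server_hours(load, 100))
print(f"saving: {elastic_saving(load, 100):.0%}")
```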
[Figure: instance count (0 to 6) by day of month (0 to 30) for monthly, predictable peak processing. Traditional capacity required holds the peak all month, while elastic capacity runs the extra instances only during the peak: 75% savings.]
On-demand instances
Unix/Linux instances start at $0.02/hour
Pay as you go for compute power
Low cost and flexibility
Pay only for what you use, no up-front commitments or long-term contracts
Use Cases:
Applications with short-term, spiky, or unpredictable workloads
Application development or testing
Reserved instances
1- or 3-year terms
Pay low up-front fee, receive significant hourly discount
Low Cost / Predictability
Helps ensure compute capacity is available when needed
Use Cases:
Applications with steady state or predictable usage
Applications that require reserved capacity, including disaster recovery
Heavy utilization RI
> 80% utilization: lower costs up to 58%
Use Cases: Databases, Large Scale HPC, Always-on infrastructure, Baseline
Medium utilization RI
41-79% utilization: lower costs up to 49%
Use Cases: Web applications, many heavy processing tasks, running much of the time
Light utilization RI
15-40% utilization: lower costs up to 34%
Use Cases: Disaster Recovery, Weekly / Monthly reporting, Elastic MapReduce
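Why each RI type suits a utilization band can be seen by comparing 1-year costs. This is a sketch with HYPOTHETICAL prices that are not actual AWS rates. It assumes the 2013 billing model, in which Light and Medium RIs charge their hourly rate only when the instance runs and Heavy RIs charge it for every hour of the term.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass(frozen=True)
class PricingOption:
    name: str
    upfront: float          # one-time fee for a 1-year term
    hourly: float           # hourly rate
    bills_every_hour: bool  # True for Heavy Utilization RIs

    def annual_cost(self, utilization: float) -> float:
        """1-year cost for an instance running `utilization` (0..1) of the year."""
        billed_hours = HOURS_PER_YEAR if self.bills_every_hour else HOURS_PER_YEAR * utilization
        return self.upfront + self.hourly * billed_hours


# HYPOTHETICAL prices chosen for illustration only.
OPTIONS = [
    PricingOption("On-Demand", 0.0, 0.24, False),
    PricingOption("Light RI", 130.0, 0.14, False),
    PricingOption("Medium RI", 300.0, 0.09, False),
    PricingOption("Heavy RI", 490.0, 0.05, True),
]


def cheapest(utilization: float) -> PricingOption:
    return min(OPTIONS, key=lambda option: option.annual_cost(utilization))


for u in (0.10, 0.30, 0.60, 0.95):
    best = cheapest(u)
    print(f"{u:.0%} utilization -> {best.name} (${best.annual_cost(u):,.2f}/yr)")
```

With these made-up prices the crossovers fall near 15%, 39%, and 80%, roughly matching the bands above. Real prices move the crossover points.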
[Chart: Best RI for Utilization. Cost ($0 to $18,000) of Heavy, Medium, and Light utilization RIs compared with On-Demand.]
Optimizing costs with RIs
[Chart: instance count (0 to 14) by hour (1 to 24), with the demand curve split into layers covered by Heavy utilization RIs, Medium utilization RIs, Light utilization RIs, and On Demand instances.]
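The layering in the chart can be sketched by slicing the demand curve into horizontal one-instance layers. Each layer is assigned the purchase option whose utilization band from the slides it falls into. This assumes the daily pattern repeats all year, so daily utilization stands in for utilization over the term. The demand curve is hypothetical.

```python
def tier_for(utilization: float) -> str:
    """Purchase option for one layer, using the utilization bands on the slides.

    The slides leave small gaps (40-41%, 79-80%); this sketch resolves them arbitrarily.
    """
    if utilization > 0.80:
        return "Heavy"
    if utilization >= 0.41:
        return "Medium"
    if utilization >= 0.15:
        return "Light"
    return "On-Demand"


def plan_layers(hourly_instances: list[int]) -> dict[str, int]:
    """Count how many one-instance layers of the demand curve fall in each tier.

    Layer k is busy in every hour where demand >= k, so its utilization is the
    fraction of hours that reach that height.
    """
    hours = len(hourly_instances)
    plan: dict[str, int] = {}
    for k in range(1, max(hourly_instances) + 1):
        utilization = sum(1 for n in hourly_instances if n >= k) / hours
        tier = tier_for(utilization)
        plan[tier] = plan.get(tier, 0) + 1
    return plan


# Hypothetical 24-hour demand curve (instances needed per hour).
demand = [4] * 6 + [8] * 6 + [12] * 5 + [14] + [6] * 6
print(plan_layers(demand))
```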
Spot instances
Bid on unused EC2 capacity
Spot Price based on supply/demand, determined automatically
Cost / Large Scale, dynamic workload handling
Use Cases:
Applications with flexible start and end times
Applications only feasible at very low compute prices
Governance Matters
• Who can create and launch instances?
• Who checks that only needed instances are
running?
• Have specific policies
• Use AWS tools such as IAM to help enforce
them
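One way IAM can enforce a "who can launch what" policy is sketched below. This Python generates a policy document that denies `ec2:RunInstances` for any instance type outside an approved list. The approved types and the `Sid` are hypothetical. A Deny statement grants nothing on its own, so users still need a separate Allow. Check the current IAM documentation for the condition keys your account supports.

```python
import json


def instance_type_policy(allowed_types):
    """Hypothetical guardrail: deny launching EC2 instance types not on the approved list."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # With multiple values, StringNotEquals matches only if the
                # requested type equals none of them.
                "Condition": {"StringNotEquals": {"ec2:InstanceType": sorted(allowed_types)}},
            }
        ],
    }


policy = instance_type_policy(["m3.medium", "m3.large"])
print(json.dumps(policy, indent=2))
```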
Checklist
• Identify your goals
• Understand your workload & match to instances
• Scale up and down with demand
• Align purchasing methods & utilization
• Have governance appropriate to your goals
• Change in goals & workload will drive change in
use of AWS
Moving to AWS – A Story
Sean Simpson
Director of Operations - Stitcher, Inc.
What is Stitcher?
• Stitcher is to news and talk radio what Pandora
is to music
• Stitcher is a content aggregator
• Stitcher is an on-demand service
• Stitcher is deployed on mobile, CE, and
automotive platforms
Stitcher by the Numbers
• 12 million downloads
• 20,000+ shows
• Over 1 million hours of listening weekly
• Over 100 TB outbound data monthly
With Growth Comes Pain
• DRBD database locked us into hardware
• Sublease of colocation facility restricted our
access to our servers
• Server leases and purchases constrained our
architecture
• Growth inhibited by human, server, and vendor
resources
What options did we consider?
• Move to another colocation facility
• Move to a cloud provider
• Move to a hybrid colocation/cloud provider
Why we chose Amazon Web Services
• Familiarity
– Already using Amazon Simple Storage Service for our RSS feeds
– Already experimenting with Amazon Elastic Compute Cloud
– Recently implemented Amazon Simple Queue Service
• Flexibility / Scalability
– Ability to adjust resources quickly in our production environment
– Ability to create any number of environments
– Ability to design servers as we wanted with respect to operating systems, systems software, etc.
• Cost
– Cost matches usage
– Bandwidth savings when using Amazon CloudFront as our CDN
– Many resources to assist in optimization
– Put simply, we got our solution for the lowest quote
• Documentation & Customer Service
– Knowledgeable solutions architects
– “Right-level” documentation
– Quick response to our needs
Architecting Change
• Ask yourself: What are we trying to achieve?
• Know yourself, know your systems
• Consider industry best practices (but don’t
blindly follow them)
• Read the documentation
Use Puppet or Chef
• Configuration management tools are both
enabling and liberating
• Build, destroy, and build again
• Write once, build many
• Nuances between node types are managed with
clearly written rules
• Naming conventions are your friend
Our Architecture
Looks nice, but what does it do?
• High Availability
• Scalability
• Security
• Performance
• Cost effectiveness
The Results – Database connections/sec: Before 225, After 450
The Results – GetStationPlaylist(): Before 0.75, After 0.1
The Results – Maximum throughput: Before 5,000, After 20,000
The Results – Downtime: Before 1,200, After 15
Cost Optimization Results
• Twice the results for the same money
How we save money
• Reserved instances
• Appropriate instance types
• CloudFront CDN
• Rapid reorganization using the API
• Monitor utilization
• Load test
• Housecleaning
On Deck Cost Savings
• Spot instances for processing tasks
• Auto Scaling
• In-app optimizations
• Instance type tuning
Parting Advice
• Architect for 10X
• Take the time to get it right the first time (or at
least, close enough)
• Plan on continuous evolution of systems
Maximizing Efficiency and Cost Optimization
Kingsley Wood – Business Development Manager, AWS
Considerations
• Offloading – reduce footprint
• Utilization – your biggest lever
• Managed Services – leverage RDS, SQS, SES
• Consolidated Billing – pooling resources
• Flexible Evolution – continually revisit
• Spot Instances – think big, new possibilities
OFFLOAD all static content
• reduce your compute demand and costs
• improve end-user experience
• increase reliability and durability
+
ENTIRE SITE via CloudFront
• minimize client-server chatter (keep it at the edge)
• reduce server-database traffic (cache the common calls)
• speed up mobile app response (persistent connections)
Real World Example
Standard Setup
• 4 x Medium Instances: $485
• AWS Data Transfer 1 TB: $194
• Total = $679
Optimized
• 1 x Medium Instance: $121
• CloudFront Data 1 TB: $168
• CloudFront Requests: $1.89
• Total = $291
57% Lower Cost + 6X Faster
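The totals and the 57% figure follow from the line items above. The optimized total is actually $290.89, which the slide rounds to $291. This sketch only checks the cost arithmetic and does not model the 6X speed claim.

```python
standard = {"4 x Medium instances": 485.00, "AWS data transfer, 1 TB": 194.00}
optimized = {
    "1 x Medium instance": 121.00,
    "CloudFront data, 1 TB": 168.00,
    "CloudFront requests": 1.89,
}


def total(line_items):
    return round(sum(line_items.values()), 2)


standard_total = total(standard)
optimized_total = total(optimized)
saving = 1 - optimized_total / standard_total
print(f"standard ${standard_total:.2f}, optimized ${optimized_total:.2f}, saving {saving:.0%}")
```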
Offloading Tips
• Leverage S3, CloudFront, Route 53
• Eliminate repeated calls (edge and data cache)
• Static website hosting on S3: no web server at all!
• Minimize your EC2 and database footprint: stand up Read Replicas for variable loads
Utilization and Auto-Scaling: Granularity
More small instances vs. fewer large instances
29 Large @ $0.32/hr = $9.28
59 Small @ $0.08/hr = $4.72
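Why finer granularity helps can be sketched by rounding demand up to whole instances. The first two lines reproduce the slide's fleet costs. The second part uses a hypothetical demand of 9 capacity units and assumes a large provides 4 units and a small 1, mirroring the 4:1 price ratio. The coarse fleet strands 3 units of unused capacity, while the fine-grained fleet strands none.

```python
import math


def fleet_cost_per_hour(instance_count, hourly_rate):
    return round(instance_count * hourly_rate, 2)


def instances_for_demand(demand_units, units_per_instance):
    """Whole instances needed; coarser instances strand more unused capacity."""
    return math.ceil(demand_units / units_per_instance)


# Slide figures.
print(fleet_cost_per_hour(29, 0.32), fleet_cost_per_hour(59, 0.08))

# Hypothetical demand of 9 units; a large = 4 units, a small = 1 unit (assumption).
large = instances_for_demand(9, 4)
small = instances_for_demand(9, 1)
print(large, fleet_cost_per_hour(large, 0.32))
print(small, fleet_cost_per_hour(small, 0.08))
```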
Utilization – Trigger Actions by Event
Leverage CloudWatch to collect and measure metrics
Buuuk for Singapore Press Holdings (SPH)
The Straits Times Mobile App
REAL-TIME reaction response
• notification of pending News Flash (with audible alarm)
• on-demand ramp up of capacity (6 mins)
• subscriber alert push delivered
• mass response traffic handled (followed by ramp down)
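Event-triggered scaling of this kind is often driven by alarm-style rules. A rule fires when a metric breaches a threshold for N consecutive periods, and then a scaling step follows. The sketch below illustrates that pattern with hypothetical thresholds, step sizes, and class names. It is not Buuuk's implementation. Clearing the window after each firing acts as a crude cooldown.

```python
from collections import deque


class ThresholdAlarm:
    """Fires when a metric breaches a threshold for `periods` consecutive samples."""

    def __init__(self, threshold, periods, above=True):
        self.threshold = threshold
        self.above = above
        self.window = deque(maxlen=periods)

    def observe(self, value):
        breached = value > self.threshold if self.above else value < self.threshold
        self.window.append(breached)
        if len(self.window) == self.window.maxlen and all(self.window):
            self.window.clear()  # crude cooldown: require a fresh run of breaches
            return True
        return False


def simulate(cpu_series, capacity=2, min_size=2, max_size=20):
    """Return fleet size after each 1-minute CPU sample (hypothetical policy)."""
    high = ThresholdAlarm(70, periods=3)             # scale out fast
    low = ThresholdAlarm(30, periods=5, above=False)  # scale in slowly
    history = []
    for cpu in cpu_series:
        scale_out = high.observe(cpu)
        scale_in = low.observe(cpu)
        if scale_out:
            capacity = min(capacity + 2, max_size)
        elif scale_in:
            capacity = max(capacity - 1, min_size)
        history.append(capacity)
    return history


print(simulate([50, 80, 85, 90, 95, 20, 20, 20, 20, 20]))
```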
Architecture
Amazon Web Services provides services and infrastructure to build reliable, fault-tolerant, and highly available systems in the cloud.
These qualities are designed into our services in two ways: some are handled for you without any special action on your part, and others are features that you must use explicitly and correctly.
Managed Services Reduce:
Managed Services
• Elastic Load Balancing
• Amazon Relational Database Service (RDS)
• Amazon Simple Queue Service (SQS)
• Amazon Simple Email Service (SES)
• Amazon Elastic MapReduce
• Amazon ElastiCache
• Amazon Simple Notification Service (SNS)
[Diagram: Elastic Load Balancing ($0.028 per hour) in front of web servers across Availability Zones VS an EC2 small instance + software load balancer ($0.08 per hour).]
[Diagram: a producer and consumers connected by an SQS queue ($0.50 per 1,000,000 requests, i.e. $0.0000005 per request) VS an EC2 small instance + software queue ($0.08 per hour).]
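The two comparisons can be turned into monthly numbers. This is a sketch that assumes a 720-hour month and uses only the rates shown. ELB also bills per GB processed, and self-managed instances carry operational effort; neither is modeled. The break-even function shows how many SQS requests per month cost the same as running the queue yourself.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month

ELB_PER_HOUR = 0.028
SMALL_INSTANCE_PER_HOUR = 0.08
SQS_PER_REQUEST = 0.50 / 1_000_000


def monthly_elb():
    # Hourly charge only; the per-GB processing charge is not modeled here.
    return ELB_PER_HOUR * HOURS_PER_MONTH


def monthly_self_managed(instances=1):
    # One instance is a single point of failure; use instances=2 for a redundant pair.
    return SMALL_INSTANCE_PER_HOUR * HOURS_PER_MONTH * instances


def monthly_sqs(requests):
    return requests * SQS_PER_REQUEST


def sqs_break_even_requests(instances=1):
    """Monthly request volume at which SQS costs as much as a self-managed queue."""
    return monthly_self_managed(instances) / SQS_PER_REQUEST


print(f"ELB ${monthly_elb():.2f}/month vs software LB ${monthly_self_managed():.2f}/month")
print(f"SQS break-even: {sqs_break_even_requests():,.0f} requests/month")
```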
Consolidated Billing
RI Purchases to grow a Resource Pool
[Chart: instance count (0 to 35) by month (1 to 12); RI purchases by accounts A to E accumulate into a shared Reserved Instance Pool.]
Tiered Pricing
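The value of pooling can be sketched with a simplified bill. It assumes all usage is the same instance type, platform, and Availability Zone, so any reserved hour can cover any linked account's usage. It also assumes reserved hours are paid whether used or not, as with Heavy Utilization RIs. The accounts, hours, and rates are hypothetical.

```python
def ri_cost(usage_hours, reserved_hours, ri_rate, on_demand_rate):
    """Reserved hours are paid whether used or not; usage beyond them is on-demand."""
    overflow = max(0, usage_hours - reserved_hours)
    return reserved_hours * ri_rate + overflow * on_demand_rate


def separate_bills(usage, reserved, ri_rate, on_demand_rate):
    """Without pooling, each account can only use its own reservations."""
    accounts = set(usage) | set(reserved)
    return sum(
        ri_cost(usage.get(a, 0), reserved.get(a, 0), ri_rate, on_demand_rate)
        for a in accounts
    )


def consolidated_bill(usage, reserved, ri_rate, on_demand_rate):
    """With consolidated billing, linked accounts share one reservation pool."""
    return ri_cost(sum(usage.values()), sum(reserved.values()), ri_rate, on_demand_rate)


usage = {"A": 100, "B": 700}     # instance-hours used (hypothetical)
reserved = {"A": 500, "B": 200}  # reserved instance-hours owned (hypothetical)
print(separate_bills(usage, reserved, 0.10, 0.24))
print(consolidated_bill(usage, reserved, 0.10, 0.24))
```

Account A's idle reservations cover account B's overflow, so the pooled bill avoids paying on-demand rates for hours that are already reserved.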
Flexibility: Take advantage!
Architecture vs. Gardening
• STOP/START
• size changes
• new instance types
• vary capacity
• rearrange, etc.
What are Spot Instances?
Value Pricing
• Up to 92% discount
Elastic
• Capacity not otherwise available
Minimum Commitment
• Commit to 1 hour
Tradeoff
• Potential for interruption
Key Points about Spot
• Spare capacity – supply and demand
• Be prepared for no availability at times
• Be willing to accept and deal with interruption
• Far greater potential scale, starting at 5X default instance limits
• Massive possible capacity = new ideas…
Consider 2 Time-to-Value Scenarios
1) Value of results quickly diminishes (e.g., engineering simulations)
2) Value of result stable until deadline (e.g., analytics before an M&A deal)
Spot Applications
Ideal Applications
Batch Processing
Time-Delayable
Fault-Tolerant or Restartable
Compute-Intensive
Horizontally Scalable
Stateless Worker Nodes
Region and AZ Independent
Uses Deployment Automation
Less Ideal Applications
Interactive
Strict/Tight SLA for Completion
Expensive to Handle Terminations
Data-Intensive
In-Memory Scaling
Long-Running Worker Nodes
Requires a Single AZ
Manually Launched and Managed
Spot Advice and Tips
• Don’t build your reliability ENTIRELY on spot
– vServ.mobi: exceptional and smart architecture
• With time flexibility, different approaches:
– delayed results, lower cost
– spend less, quicker answers
• Ask different questions:
– with enormous capacity, what is now possible?
Look at the World Differently
• Order of magnitude more capacity
• New experiments enabled = innovation!
• Lucky Oyster – recommendation exchange
• Prototyping a new search technology idea (using Common Crawl)
• 3.4 billion web pages > 1 TB of data > Index of 400 million entities
• “The cost? About $100... in about 14 hours”
A Spot Case Study
Ashay Padwal
Co-Founder & CTO – vServ.mobi
GLOBAL INNOVATION FOCUSED
Award Winning Mobile Ad Exchange across Emerging Markets
31 Bn Ad Requests / Month
Over 200 Mn Unique Users / Month
10% SOUTH AMERICA
7% NORTH AMERICA
11% EUROPE
33% INDIA
14% MIDDLE EAST & AFRICA
11% REST OF ASIA
14% SE ASIA
Infrastructure: Requirements & Challenges
1. Requirement: Self Serve for Publisher On-boarding & Exit
Challenge: No Capacity Planning; Extreme Scalability
2. Requirement: Start Up
Challenge: No Capex, no Lock-in
3. Requirement: Least Latency & High Availability
Challenge: Suite of services – Compute, Load Balancing, DNS, CDN, Storage, Multiple DCs per location
4. Requirement: Global Setup management with small team
Challenge: Availability across Regions with extensive APIs
Infrastructure: Solution
1. AWS
2. AWS
3. EC2 & ELB – Multi-AZ; Route53, CloudFront, S3
4. US East, US West, Europe, South America, Asia
For Middle East, we host in Turkey
For Africa, we host in South Africa
Deployment Overview
Ad Delivery Setup
Now What? Reduce Cost without impacting Performance
• AWS is pretty cost-effective. But we were greedy!
• Saving more meant more money for other areas in our
business.
• We walked in the opposite direction... and it worked!
• We use spot instances in production extensively.
• Sounds risky? Yes, but if you architect your system correctly, you should be safe.
What we did
1. Selected the right instance type
– Use CloudWatch for CPU & memory usage
– Load test
2. Designed our servers to be self-sufficient and perishable
– Business logic & DB on the same server
– Transaction logs written to EBS
– Auto setup on server
– Data collection module
3. Built a custom scaling solution
– Add/remove instances by checking present traffic & predicting traffic in the immediate future
– Based on trending of spot prices, either try launching spot or fall back to on-demand instances
– Remove servers only when they are 45-55 minutes into their current billed hour
– Track spot prices to shift to on-demand
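The scaling rules in step 3 can be sketched as a few decision functions. This is a minimal illustration under stated assumptions, not vServ's code. Several pieces are hypothetical: the naive linear traffic predictor, the 1.2 headroom factor, and the "within 80% of the bid and rising" test for falling back to on-demand. The 45-55 minute removal window reflects the hourly EC2 billing of the time, where terminating a server near the end of a billed hour wastes little of it.

```python
import math
from datetime import datetime, timedelta


def minutes_into_billed_hour(launched_at, now):
    return ((now - launched_at).total_seconds() / 60) % 60


def removable(launched_at, now, window=(45, 55)):
    """Only retire a server late in an hour that has already been paid for."""
    return window[0] <= minutes_into_billed_hour(launched_at, now) <= window[1]


def predict_next(traffic):
    """Hypothetical predictor: linear extrapolation from the last two samples."""
    if len(traffic) < 2:
        return traffic[-1]
    return max(0, traffic[-1] + (traffic[-1] - traffic[-2]))


def servers_needed(traffic, per_server, headroom=1.2):
    demand = max(traffic[-1], predict_next(traffic))
    return max(1, math.ceil(demand * headroom / per_server))


def choose_market(spot_history, bid):
    """Launch on spot unless the price is at our bid, or rising and close to it."""
    latest = spot_history[-1]
    rising = len(spot_history) >= 2 and latest > spot_history[0]
    if latest >= bid or (rising and latest >= 0.8 * bid):
        return "on-demand"
    return "spot"


t0 = datetime(2013, 11, 15, 10, 0)
print(removable(t0, t0 + timedelta(minutes=50)), servers_needed([1000, 1200], per_server=500))
print(choose_market([0.020, 0.021, 0.022], bid=0.05), choose_market([0.02, 0.03, 0.045], bid=0.05))
```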
What AWS did
1. Reduced pricing for EC2 (On Demand & Reserved) and S3
2. Cheap archival system: Glacier
3. Pre-warming of Load Balancer (ELB)
4. AMI movement across regions
5. ELB with equal distribution of traffic across instances spread in any Availability Zone
THANK YOU!
Ashay Padwal, CTO & Co-Founder
Closing – Key Takeaways
• Re-evaluate, revisit and re:Invent
– Evolve along with AWS
• Leverage
– Managed Services, CloudWatch
• Stay up to date
– RI modifications, Trusted Advisor
• AWS Blog: aws.typepad.com