Upload
bob-brown
View
129
Download
8
Tags:
Embed Size (px)
Citation preview
Top 6 CostsEc2, RDS, SQS, S3, Support, Data Xfer
Also in ProductionDynamoDB, Elasticache, EMR, Lambda
On OccasionRedshift, Aurora, Kinesis, Data Pipeline
Of CourseIAM, Cloudfront, Route53, CloudTrail, SNS,
CWPlanning to Use
EFS, EC2 Container Service
We are “All In” on AWS
Some Useful Services to Gain VisibilityAWS Cost ExplorerNetflix ICE (via Teevity)CloudYN, Cloudability, Cloudcheckr,
CloudHealthTechAWS Billing and Detailed Billing CVS FilesCustom
Where are we?
Teevity is still building Teevity and welcomes any user that wants to register to go to http://teevity.com – they can register for free. More users provides data to help may Teevity better.
Teevity does not compete with the OSS version of Ice. They are building on top of it and around it (adding things to make it better) . The plan is to release a large and rich use-case oriented documentation on both NetflixOSS/Ice and Teevity in the coming month (http://docs.teevity.com)
Teevity plans to release a version on the AWS Marketplace a called "Teevity Incognito" so users can have their own instances.
Most Graphs are from Teevity
** Important **- This bill includes all charges except credits and refunds.- The first day of the month always has additional costs (support and reservations).- The time zone is UTC- The most recent day is always a partial result (delayed by at least a few hours). Date Amount Spent Running Total---------- ------------ -------------1970.01.01 5940 59402014.05.01 13366 193062014.05.02 2998 223042014.05.03 3152 254562014.05.04 2993 284502014.05.05 3078 315292014.05.06 2377 339072014.05.07 2505 364122014.05.08 2528 389412014.05.09 2572 415142014.05.10 2473 439872014.05.11 2562 46550
Differing Results
1970: Reservation Purchases, 5/1: Includes Monthly Reservation Cost
Amortized/Not AmortizedNew Services not IncludedSupport Included/Not IncludedDelayed ReportingReport Handling ErrorsConsolidation by Time ErrorsRefund/Credit HandlingTimeZone
Used Billing Invoice for AccuracyUsed Other Reports for Trends/ComparisonLet Accounting Sort out Amortization
Differing Results
Tags are your friendTag by
StackEnvironmentApplication
Scripts based on tagCost Control and Management Reports by
Tag
Identify Everything
We saw the savings in turning off instancesWrote a script to turn off and on dailyComplaints about unavailability to work
Work from homeWork late/earlyData Loss from Instance Shutdowns
Went from 14 hours off to 3 hours offNeeded a way allow developers to start stop
instances
Culture plays an issue
usage: listASGs.py [-h] [-v] [-e ENVIRONMENT] [-n NAME] [-r REGION] [-a {suspend,resume,set,start,stop,store}] [-w] [-c CAPACITY] [--excludes EXCLUDES] [-k]
List Autoscaling Groups and Act on themoptional arguments: -h, --help show this help message and exit -v, --verbose Up the displayed messages or provide more detail -e ENVIRONMENT, --environment ENVIRONMENT Set the environment variable for the filter. You can chose 'all' as well as dev/qa/prd/ops/int/... -n NAME, --name NAME Set the base stack name for the filter. Default is everything -r REGION, --region REGION Set the region. Default is everything -a {suspend,resume,set,start,stop,store}, --action {suspend,resume,set,start,stop,store} Determines the action for the script to take -w, --html Print output in HTML format rather than text -c CAPACITY, --capacity CAPACITY Specifies the value for capacity. Enter as '#/#/#' in min, desired, max order --excludes EXCLUDES Enter a regular expression to exclude matchnames -k, --kind Display the underlaying Instance Type
Developed script “listASGs.py”
Instances should be tied to an ASGAll instances MUST be tagged“Invalid” instances should be shut down
automaticallySimian Army
Janitor MonkeyGraffiti MonkeySecurity MonkeyConformity MonkeyDoctor MonkeyChaos Monkey, Chaos Gorilla
Orphan identification script
Shutdown Abandoned instances
i-bdf03614 --> ansible-xyz-AnsibleBob (m3.medium)i-46d4e196 --> edda-ops1-netflixEdda (m3.xlarge)i-1a5550c9 --> emr-prd1-CORE (m1.medium) [SPOT]i-1cd2a2b4 --> emr-prd1-CORE (m3.xlarge) [SPOT]i-36d3a39e --> emr-prd1-CORE (m3.xlarge) [SPOT]i-37d3a39f --> emr-prd1-CORE (m3.xlarge) [SPOT]i-38d3a390 --> emr-prd1-CORE (m3.xlarge) [SPOT]i-41dca992 --> emr-prd1-CORE (m1.medium) [SPOT]i-1dd3a3b5 --> emr-prd1-MASTER (m3.xlarge)i-215550f2 --> emr-prd1-MASTER (m1.medium)i-69dca9ba --> emr-prd1-MASTER (m1.medium)i-48fbcc9b --> emr-prd1-TASK (m1.medium) [SPOT]i-62fccbb1 --> emr-prd1-TASK (m1.medium) [SPOT]i-83f9ce50 --> emr-prd1-TASK (m1.medium) [SPOT]i-84fccb57 --> emr-prd1-TASK (m1.medium) [SPOT]i-86f9ce55 --> emr-prd1-TASK (m1.medium) [SPOT]i-8afccb59 --> emr-prd1-TASK (m1.medium) [SPOT]i-8df9ce5e --> emr-prd1-TASK (m1.medium) [SPOT]i-8ef9ce5d --> emr-prd1-TASK (m1.medium) [SPOT]i-aafbcc79 --> emr-prd1-TASK (m1.medium) [SPOT]i-acfbcc7f --> emr-prd1-TASK (m1.medium) [SPOT]i-1058a8fe --> experts-beta-experts (c3.xlarge)i-f00f1b01 --> ftp-ops1-ftp (m3.medium)i-a6106f75 --> gene-gene-gene (c3.2xlarge) [SPOT]i-f1c5510b --> internal-access1a-bubblewrapp (m3.medium)i-97f6676a --> jenkins-ops1-jenkins (c3.large)i-945ada7d --> lamp-dev-lamptest (m3.medium)i-8fd25f66 --> logstash-ops-logstash (m3.xlarge) [SPOT]i-8c8c8ca3 --> nissolr-prd1-nissolrStandAlone-Cloud1 (i2.xlarge)i-028f8f2d --> nissolr-prd1-nissolrStandAlone-Cloud2 (i2.xlarge)i-2be6e504 --> nissolr-prd1-nissolrStandAlone-Cloud3 (i2.xlarge)i-67d59ab1 --> nissolr-prd1-zookeeper1 (t1.micro)i-afcbac80 --> recommend-dev2-recommend (m3.medium)
Daily List of Orphans
DynamoDB Dynamic Script
Manual Adjustment
Automated Adjustment
https://aws.amazon.com/blogs/aws/auto-scale-dynamodb-with-dynamic-dynamodb/
Refactor the Code for Efficiencies
SQS Buffering/Batching (1:10)Long Polling
http://genekrevets.com/2015/07/23/gutting-amazon-web-services-bills-sqs-part-1/
Unattached Volumes can easily growYou can view unattached volumes by runningthe AWS cli command:
aws ec2 describe-volumes –output text | grep available
us-east-1a False 20 snap-5c4b92de available vol-f44096be standard
us-east-1a False 20 snap-5c4b92de available vol-b04a9cfa standard
us-east-1a False 60 20 snap-bf8db125 available vol-baae0c54 gp2
us-east-1a False 1200 400 snap-4629e4de available vol-5360fdbd gp2
us-east-1e False 48 16 snap-e49eb646 available vol-6c918e74 gp2
Review Unattached Volumes
Idle Load Balancers
ASG Shutdown does NOT
Eliminate Load Balancer C
osts
Rewrote CFN to handle Not a
lways
Using Load Balancer.
Consider Spots/Fleet with Backup
http://www.appneta.com/blog/aws-spot-instances/
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
Initially, Multiple ASGs with Minimum On Demand
Discovered Spots stay up for long periodsMove all in to Spots with OnDemand BackupSwitching to Fleet with OnDemand BackupOnDemand Backup (Spots)
Two Minute Warning FlagSeparate ASG for On Demand is updated
Spots
Unused Reservations...
Machine Zone VPC Cnt AUI NUI HUI MUI LUI ORI Diff1 Diff2 OD Rate Price Loss OverPay Save
------------ ------------ --- --- --- --- --- --- --- --- ----- ----- ------- ------ --------- --------- --------
c3.large us-east-1a 4 6 -2 -2 0.105 $ 156.24
c3.large us-east-1e 10 8 2 2 0.105 $ 156.24
c3.xlarge us-east-1a 3 3 0 0 0.210
c3.xlarge us-east-1e 2 2 0 0 0.210
i2.xlarge us-east-1c 3 3 3 0.853 $ 1903.90
m3.large us-east-1a 2 2 1 0 -1 0.140 $ 106.16
m3.large us-east-1e 2 2 2 0 0.140
m3.medium us-east-1a 7 4 2 5 1 0.070 $ 52.08
m3.medium us-east-1a yes 1 1 1 0 0.070
m3.medium us-east-1c 2 1 2 1 0.070 $ 52.08
m3.medium us-east-1e 7 2 4 3 1 0.070 $ 52.08
*m3.xlarge us-east-1c 1 *** *** 0.280 0.0321 $ 23.88 $ 179.44
*m3.xlarge us-east-1e 6 *** *** 0.280 0.0321 $ 143.29 $1106.63
*t1.micro us-east-1a 283 *** *** 0.020 0.0031 $ 652.71 $3558.33
Unused Reservations (Our script)
What can we do?Transfer between Availability ZonesTransfer within a FamilyModify Instance Type to match reservationMove to Spot or Fleet
Unused Reservations
DetailsSpecify how AWS is responsible
Unable to View EMRsOnly Site Admin Root Accounts can see all EMRsDid log tickets to help resolve but no answerAmazon recommends not using root accounts
Detailed steps of process to discoverWork with your account representative
Credit full amount requested, $21,560
Request a Refund
1. Set up Standards (Multiple Accounts, Tagging Names)2. Gain Visibility – Get a tool to visualize Costs and Assets3. Tag Assets (Use CloudFormation, Scripts, Graffiti Monkey)4. Turn off Unused Instances (We started with QA/Dev)5. Use ASGs to turn off instances when less traffic6. Buy EC2 Reservations, not once a year, monthly. Try to use
fewer instance families7. Give Developers a way to Easily Turn On/Off ASGs/Instances8. Set Rules - must have tags, must be tied to an ASG9. Use Simian Army (Janitor Monkey) to automatically handle
cleanup10. Evaluate Price/Time/Need for Failover (Multi-AZ, Instances
across Regions, Geography)
Recap
11. Take advantage of drop in prices with Amazon12. Use the DynamoDB Dynamic Script to manage Read/Write
Capacity13. Understand how you are charged and refactor code as
needed14. Use SQS batch requests15. Use SQS long polling16. Buy non-EC2 Reservations - DynamoDB, RDS, Elasticache,
Redshift17. Consolidate Instances (RDS, EC2, Elasticache)18. Put alarms in place, pay attention to the Data19. Where appropriate, ask Amazon for a Refund20. Right Size Instances (Low Usage/Memory to Smaller
Instances), Avoid overprovisioning
Recap
21. Turn off Detailed Cloud Watch Monitoring if Not Needed22. Consider moving Cloud Watch Linux Data to cheaper
service (Librato, Self Hosted Graphite, etc)23. Look at Trusted Advisor Reports24. Delete Unattached Volumes25. Right Size Low Utilization (CPU/Memory) instances, move
to smaller instances26. Consider moving legacy instances to current instance types
(more powerful and at a lower cost)27. Modify Setup to convert Unneeded Load Balancers28. Convert to Spot and/or Fleet Instances (Bidding Strategies)29. Monitor Unused Reservations30. Move cloudwatch alarms/tracking elsewhere
Recap
31. Optimize Cloudfront (do you need to be close to all of the edges?)32. Move into VPC33. Use Placement34. Use Docker, Consolidate Containers to fewer instances35. Pay attentions to EIPs36. Know/Understand your EMR usage and expectations37. Pay attention to Data Transfer costs38. Use the Right Storage: S3, Normal or Reduced Redundancy,
Glacier, AutoDelete Policies, etc.39. Leverage Services (CloudSearch, DynamoDB, Lambda,
ElastiCache, etc)40. Set Termination by ASG to be "Closest to Instance Hour“ (Saves
10-15%)41. Use “burstable” instances when appropriate (when it’s good you
can save 20-50% going from m3.medium or c3.large to t2.medium)
Recap
Incremental Fixes, Rome wasn’t built in a dayReview Data PeriodicallyEngage Developers in the process(es)Create a culture of cost awarenessHave the users of the resource own some of the
responsibility for costsGet some cost data visibility to stakeholders dailyCustomize cost data for stakeholder’s needsCost isn’t everything, get metrics that compare to
subscribers, pageviews, customers, api calls, urls processed. Increased usage means increased costs and if traffic means revenue, that could be very good.
Have a Plan