@pas256 https://cloudnative.io/
Auto Scaling Groups
Advanced AWS meetup
Peter Sankauskas, Founder of CloudNative
@pas256
Daily life
• More users
• Higher costs
• More logs
• More data
• New engineers
• More instances
• Increased deployment frequency
• Reduce costs
• Eliminate deployment risks
• Boss
• Deadline
Your Goal
• Sleep
• Reliable
• Social life
• Sleep
• Uptime
• Time with family
• Sleep
“Don’t hate the Pager, hate the game”
– PagerDuty
Old world
[Chart: instances running vs. used capacity; 70% wasted]
Auto Scaling Group
• Your assistant in the cloud
• First level support
• Automation
[Chart: used capacity]
Auto Scaling Group
• Capacity: minimum, maximum, desired
• Access: ELB
• Policies
• Where:
• Availability Zones
• VPC Subnets
[Diagram: an ASG with a Launch Config, three Scaling Policies, and three Scheduled Actions]
{
  "Type" : "AWS::AutoScaling::AutoScalingGroup",
  "Properties" : {
    "AvailabilityZones": [ String, ... ],
    "Cooldown": String,
    "DesiredCapacity": String,
    "HealthCheckGracePeriod": Integer,
    "HealthCheckType": String,
    "LaunchConfigurationName": String,
    "LoadBalancerNames": [ String, ... ],
    "MaxSize": String,
    "MetricsCollection": [ MetricsCollection, ... ],
    "MinSize": String,
    "NotificationConfiguration": NotificationConfiguration,
    "PlacementGroup": String,
    "Tags": [ Auto Scaling Tag, ... ],
    "TerminationPolicies": [ String, ... ],
    "VPCZoneIdentifier": [ String, ... ]
  }
}
Launch Configuration
• Every ASG needs a Launch Configuration
• Describes what an individual EC2 instance looks like
• AMI
• Instance type
• Security groups
{
  "Type" : "AWS::AutoScaling::LaunchConfiguration",
  "Properties" : {
    "AssociatePublicIpAddress": Boolean,
    "BlockDeviceMappings": [ BlockDeviceMapping, ... ],
    "EbsOptimized": Boolean,
    "IamInstanceProfile": String,
    "ImageId": String,
    "InstanceMonitoring": Boolean,
    "InstanceType": String,
    "KernelId": String,
    "KeyName": String,
    "RamDiskId": String,
    "SecurityGroups": [ SecurityGroup, ... ],
    "SpotPrice": String,
    "UserData": String
  }
}
Scaling Plans
1. Fixed
2. Manual
3. Scheduled
4. Dynamic
Fixed
• Ensure a fixed number of instances is always running
• Set MinSize = MaxSize
• Examples
• Any “master” service
• Zookeeper - 3 nodes across 3 AZs
• Cassandra
[Chart: used capacity]
# One Asgard instance - troposphere example
from troposphere import FindInMap, Ref, Template
import troposphere.autoscaling as asg

t = Template()
# asgardInstanceProfile, asgardInstanceSecurityGroup and the "AWSRegion2AMI"
# mapping are assumed to be added to the template elsewhere (not shown here).

launchConfig = t.add_resource(asg.LaunchConfiguration("launchConf",
    AssociatePublicIpAddress=True,
    IamInstanceProfile=Ref(asgardInstanceProfile),
    ImageId=FindInMap("AWSRegion2AMI", Ref("AWS::Region"), "AMI"),
    InstanceType="m3.medium",
    KeyName="admin",
    SecurityGroups=[Ref(asgardInstanceSecurityGroup)],
))

# MinSize == MaxSize keeps exactly one instance running at all times
asgardASG = t.add_resource(asg.AutoScalingGroup("asgardASG",
    Tags=[asg.Tag("Name", "Asgard", True)],
    Cooldown="120",
    MinSize="1",
    MaxSize="1",
    AvailabilityZones=["us-west-2a","us-west-2b"],
    VPCZoneIdentifier=["subnet-c46c6982","subnet-8133f6e4"],
    LaunchConfigurationName=Ref(launchConfig),
))
Manual Scaling
• Use API to change capacity on demand
SetDesiredCapacity
• AutoScalingGroupName = my-asg
• DesiredCapacity = 20
[Chart: used capacity]
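A minimal sketch of assembling the SetDesiredCapacity parameters listed above. The helper name `set_desired_capacity_request` is hypothetical; the dictionary keys follow the API parameter names, and the result could be passed as keyword arguments to an SDK client, assuming one is configured.

```python
def set_desired_capacity_request(group_name, desired, honor_cooldown=False):
    """Build the parameters for a SetDesiredCapacity call (hypothetical helper)."""
    if desired < 0:
        raise ValueError("DesiredCapacity must not be negative")
    return {
        "AutoScalingGroupName": group_name,
        "DesiredCapacity": desired,
        "HonorCooldown": honor_cooldown,
    }


# The values from the slide: scale my-asg to 20 instances on demand
request = set_desired_capacity_request("my-asg", 20)
print(request)
```

The desired capacity still has to fall between the group's MinSize and MaxSize, so a real call can be rejected even when the request is well formed.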
Scheduled
• At this time, set capacity to X
• Each ScheduledAction must have a unique start time
• Guaranteed order of execution within same ASG
[Chart: used capacity]
Specific date and time
PutScheduledUpdateGroupAction
• ScheduledActionName = ScaleOut
• AutoScalingGroupName = my-asg
• DesiredCapacity = 3
• StartTime = “2013-05-12T08:00:00Z”
Recurring schedule
PutScheduledUpdateGroupAction
• ScheduledActionName = Scaleout-schedule-year
• AutoScalingGroupName = my-asg
• DesiredCapacity = 3
• Recurrence = “30 0 1 1,6,12 0”
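The Recurrence value is a five-field cron expression. As a rough sketch, the fields can be labeled like this; `describe_recurrence` and the field names are illustrative, not part of any AWS API.

```python
# Standard five-field cron layout, left to right
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_recurrence(expression):
    """Split a cron expression into named fields (illustrative helper)."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected 5 cron fields, got %d" % len(parts))
    return dict(zip(CRON_FIELDS, parts))


fields = describe_recurrence("30 0 1 1,6,12 0")
months = [int(m) for m in fields["month"].split(",")]
print(fields)
print(months)
```

Read this way, the example on the slide fires at 00:30, with the month field restricting it to January, June and December.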
Dynamic Scaling
• Best Utilization
• Lowest Cost
[Chart: used capacity]
Trigger: CloudWatch Alarm
• Metrics
• CPU Utilization
• Network in/out
• Size of queue (SQS)
• Anything you put into CloudWatch
• Set the Alarm Action to the ARN of the ScalingPolicy
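A sketch of what such an alarm definition might look like as a parameter dictionary. The helper name, the alarm name, the 70% threshold, the period settings and the policy ARN are all placeholders chosen for illustration; the keys follow the CloudWatch PutMetricAlarm parameter names.

```python
def cpu_alarm_request(group_name, policy_arn, threshold=70.0):
    """Build PutMetricAlarm parameters for a CPU alarm (hypothetical helper)."""
    return {
        "AlarmName": group_name + "-high-cpu",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Statistic": "Average",
        # Aggregate the metric across every instance in the ASG
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Period": 300,            # seconds per data point (assumed value)
        "EvaluationPeriods": 2,   # breaches needed before firing (assumed value)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # The alarm action is the ARN of the ScalingPolicy to trigger
        "AlarmActions": [policy_arn],
    }


# Placeholder ARN, not a real policy
EXAMPLE_POLICY_ARN = "arn:aws:autoscaling:us-west-2:123456789012:scalingPolicy:EXAMPLE"
alarm = cpu_alarm_request("my-asg", EXAMPLE_POLICY_ARN)
print(alarm["AlarmName"], alarm["AlarmActions"])
```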
Action: ScalingPolicy
• Adjustment Types
• Change by number
• E.g. Scale Out: Add 2 more instances
• E.g. Scale In: Remove 1 instance
• Exact
• E.g. Scale Out: Have exactly 8 instances
• Percentage
• E.g. Scale Out: Add 25% more instances
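The three adjustment types can be sketched as a small function that computes the new capacity. This is a simplified model, not the service's implementation: the function name is made up, the rounding of percentage changes follows my reading of the documented behavior (fractions below one instance count as one, larger values are truncated toward zero), and the result is clamped to the group's minimum and maximum.

```python
import math


def apply_adjustment(current, adjustment_type, value, min_size, max_size):
    """Return the capacity after applying one scaling adjustment (sketch)."""
    if adjustment_type == "ChangeInCapacity":
        target = current + value
    elif adjustment_type == "ExactCapacity":
        target = value
    elif adjustment_type == "PercentChangeInCapacity":
        change = current * value / 100.0
        if 0 < change < 1:
            change = 1            # small increases still add one instance
        elif -1 < change < 0:
            change = -1           # small decreases still remove one instance
        else:
            change = math.trunc(change)
        target = current + change
    else:
        raise ValueError("unknown adjustment type: %s" % adjustment_type)
    # Never go outside the group's configured bounds
    return max(min_size, min(max_size, target))


print(apply_adjustment(4, "ChangeInCapacity", 2, 1, 10))          # add 2 more
print(apply_adjustment(4, "ChangeInCapacity", -1, 1, 10))         # remove 1
print(apply_adjustment(4, "ExactCapacity", 8, 1, 10))             # exactly 8
print(apply_adjustment(8, "PercentChangeInCapacity", 25, 1, 20))  # add 25%
```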
Cooldown
• After a ScalingPolicy has been fired, wait X seconds before performing any other actions.
• Manual Scaling: SetDesiredCapacity
• HonorCooldown = True/False
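A toy model of the cooldown, assuming time is passed in as plain seconds. `CooldownGate` is a made-up class for illustration; it only shows how honoring or ignoring the cooldown changes whether a scaling request goes through.

```python
class CooldownGate:
    """Blocks scaling activity for a fixed period after the last one (toy model)."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self.last_activity = None

    def try_scale(self, now, honor_cooldown=True):
        in_cooldown = (
            self.last_activity is not None
            and now - self.last_activity < self.cooldown_seconds
        )
        if honor_cooldown and in_cooldown:
            return False  # still cooling down, request is skipped
        self.last_activity = now
        return True


gate = CooldownGate(120)
results = [
    gate.try_scale(0),                          # first action fires
    gate.try_scale(60),                         # blocked, inside cooldown
    gate.try_scale(60, honor_cooldown=False),   # manual override fires
    gate.try_scale(100),                        # blocked, cooldown restarted at 60
    gate.try_scale(200),                        # cooldown has expired
]
print(results)
```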
Load Balancing
• Put an ELB in front of the instances in your ASG
• Set when creating the ASG
• Zero effort in adding and removing instances
• Additional health check options
Health Checks
• By default, ASG uses EC2 Status Checks
• If you have an ELB, you can use the same ELB health checks
• HTTP:80/healthcheck
• HTTP 200 response is the only thing that is considered healthy
• E.g. return something else while the app is loading
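One way an application might expose such an endpoint, sketched with the Python standard library. The `APP_READY` flag, the choice of 503 while loading and the default port 8080 are assumptions for illustration; the only part taken from the slide is that 200 on /healthcheck means healthy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Flipped to True by the application once startup work has finished (assumed)
APP_READY = {"value": False}


def health_status(ready):
    """200 when ready; anything else (here 503) is treated as unhealthy."""
    return 200 if ready else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthcheck":
            status = health_status(APP_READY["value"])
        else:
            status = 404
        self.send_response(status)
        self.end_headers()


def serve(port=8080):
    """Run the health endpoint; binding to port 80 usually needs root."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint; set `APP_READY["value"] = True` once the app has finished loading so the ELB starts sending traffic.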
Termination Policy
• OldestInstance
• NewestInstance
• OldestLaunchConfiguration
• ClosestToNextInstanceHour
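A simplified sketch of the first two policies, choosing by launch time only. The function, the instance IDs and the dates are made up; the real service considers more than this (for example, balance across Availability Zones), and the other two policies need data this sketch does not model.

```python
from datetime import datetime


def pick_instance_to_terminate(instances, policy):
    """Choose one instance by launch time (sketch of two policies only)."""
    if policy == "OldestInstance":
        return min(instances, key=lambda inst: inst["launch_time"])
    if policy == "NewestInstance":
        return max(instances, key=lambda inst: inst["launch_time"])
    raise NotImplementedError(policy)


# Hypothetical fleet
fleet = [
    {"id": "i-aaa", "launch_time": datetime(2014, 1, 1)},
    {"id": "i-bbb", "launch_time": datetime(2014, 3, 1)},
    {"id": "i-ccc", "launch_time": datetime(2014, 2, 1)},
]

oldest = pick_instance_to_terminate(fleet, "OldestInstance")
newest = pick_instance_to_terminate(fleet, "NewestInstance")
print(oldest["id"], newest["id"])
```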
Requirements for Dynamic Scaling
• Stateless application
• Configuration must be 100% automated
• Tools understand dynamic environments
• Config management
• Monitoring
• Log aggregation
Migration
• Create an ASG or LaunchConfiguration from an already running instance
• Put that instance in the ASG
{
  "Type" : "AWS::AutoScaling::AutoScalingGroup",
  "Properties" : {
    "AvailabilityZones" : [ String, ... ],
    "Cooldown" : String,
    "DesiredCapacity" : String,
    "HealthCheckGracePeriod" : Integer,
    "HealthCheckType" : String,
    "InstanceId" : String,
    "LaunchConfigurationName" : String,
    "LoadBalancerNames" : [ String, ... ],
    "MaxSize" : String,
    "MetricsCollection" : [ MetricsCollection, ... ],
    "MinSize" : String,
    "NotificationConfiguration" : NotificationConfiguration,
    "PlacementGroup" : String,
    "Tags" : [ Auto Scaling Tag, ... ],
    "TerminationPolicies" : [ String, ... ],
    "VPCZoneIdentifier" : [ String, ... ]
  }
}
{
  "Type" : "AWS::AutoScaling::LaunchConfiguration",
  "Properties" : {
    "AssociatePublicIpAddress" : Boolean,
    "BlockDeviceMappings" : [ BlockDeviceMapping, ... ],
    "EbsOptimized" : Boolean,
    "IamInstanceProfile" : String,
    "ImageId" : String,
    "InstanceId" : String,
    "InstanceMonitoring" : Boolean,
    "InstanceType" : String,
    "KernelId" : String,
    "KeyName" : String,
    "RamDiskId" : String,
    "SecurityGroups" : [ SecurityGroup, ... ],
    "SpotPrice" : String,
    "UserData" : String
  }
}
# Instance Configuration - Self healing NAT - troposphere
from troposphere import Base64, Join, Ref
import troposphere.autoscaling as asg

# t (the Template), natSecurityGroup and natInstanceProfile are assumed to be
# defined elsewhere (not shown here).
natLaunchConfig = t.add_resource(asg.LaunchConfiguration(
    "natLaunchConfig",
    AssociatePublicIpAddress=True,
    InstanceType="t1.micro",
    ImageId="ami-f032acc0",
    SecurityGroups=[Ref(natSecurityGroup)],
    IamInstanceProfile=Ref(natInstanceProfile),
    UserData=Base64(Join("\n", [
        "#!/bin/bash",
        "yum update -y",
        # Look up this instance's ID and region from instance metadata
        "instanceId=`/opt/aws/bin/ec2-metadata -i | cut -f2 -d' '`",
        "region=`/opt/aws/bin/ec2-metadata -z | cut -f2 -d' ' | sed '$s/.$//'`",
        # Find the VPC and its main route table
        "vpcId=`aws ec2 describe-instances --instance-ids $instanceId --region $region --query 'Reservations[*].Instances[*].VpcId' --output text`",
        """rtbId=`aws ec2 describe-route-tables --region $region --filters "[{\\"Name\\":\\"vpc-id\\",\\"Values\\":[\\"$vpcId\\"]},{\\"Name\\":\\"association.main\\",\\"Values\\":[\\"true\\"]}]" --query RouteTables[*].RouteTableId --output text`""",
        # Disable the source/destination check so the instance can forward traffic
        """aws ec2 modify-instance-attribute --instance-id $instanceId --source-dest-check '{"Value": false}' --region $region --output table""",
        # Point the default route at this instance: replace it if it exists, otherwise create it
        "aws ec2 replace-route --route-table-id $rtbId --destination-cidr-block 0.0.0.0/0 --instance-id $instanceId --region $region --output table",
        "aws ec2 create-route --route-table-id $rtbId --destination-cidr-block 0.0.0.0/0 --instance-id $instanceId --region $region --output table"
    ]))
))
UserData and cloud-init
• Inside LaunchConfiguration
• Set UserData script to be run by cloud-init
• If you are using Chef, this is what you will do
• More details:
• Watch Episode #4 on Answers for AWS
Baking AMIs
• Raw: Do everything on boot
• Fully Baked: Immutable infrastructure
• Half-Baked: Anything in-between
http://answersforaws.com/blog/2013/11/half-baked/
Deploy Changes
• Option 1: Change AMI or User Data in LaunchConfiguration
• NOTE: This has no immediate outcome
• Only affects newly launched instances
• Revisit Termination Policy
• You need to terminate existing instances so that new ones come up with the changes
Deploy Changes
• Option 2: Create a completely new stack
• Use CloudFormation (or whatever) to create a new ASG, LaunchConfig, ScalingPolicies, ELB, Security Group, VPC, Subnets, etc
• Overkill
• If you have high traffic, the new ELB will not be pre-scaled and will not handle the load
• Need to contact AWS TAM
Blue/Green Deployment
Or is it a red/black deployment… or is it an A/B deployment?
• Option 3:
• Reuse existing infrastructure including the same ELB
• Create a new ASG and LaunchConfig
• Switch traffic at the ELB from old ASG to new ASG
“It’s not about how fast you can deploy, it is about how fast you can rollback”
– Peter Sankauskas… just now
Canary Deployment
• Very similar to blue/green deployment
• New ASG and LaunchConfig
• Add traffic to only 1 instance in the new ASG
• Then 2 instances
• Up to 100%
• Both versions running side by side
• Roll off traffic from old ASG instances
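A back-of-the-envelope sketch of how much traffic the new version sees at each step, assuming the ELB spreads requests evenly across all registered instances. The function name and the instance counts are invented for illustration.

```python
def canary_share(new_instances, old_instances):
    """Fraction of traffic reaching the new version under even distribution."""
    total = new_instances + old_instances
    if total == 0:
        raise ValueError("no instances registered")
    return new_instances / total


# (new, old) instance counts as the rollout progresses (hypothetical numbers)
steps = [(1, 9), (2, 9), (9, 9), (9, 0)]
shares = [canary_share(new, old) for new, old in steps]
for (new, old), share in zip(steps, shares):
    print("new=%d old=%d -> %.0f%% of traffic" % (new, old, share * 100))
```

The last step, with the old instances removed, is the point where the new ASG carries 100% of the traffic.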
Running multiple versions
• DB schema changes are on a different schedule to code deployments
• mcfunley (Etsy): “We deploy schema changes once per week. The code always works against both versions of the schema. We never take downtime for schema changes. We avoid data loss by doing soft deletes as much as we can.”
• Deploy features dark
• Use Feature Flags
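A minimal sketch of a feature flag guarding a dark-deployed code path. The flag name, the `FLAGS` dictionary and the `checkout` function are hypothetical; in practice the flags would come from configuration rather than a module-level constant.

```python
# Flag store; new code ships switched off ("dark") by default
FLAGS = {"new_checkout": False}


def is_enabled(flag, flags=FLAGS):
    """Unknown flags count as disabled."""
    return flags.get(flag, False)


def checkout(cart_total, flags=FLAGS):
    # Both code paths are deployed; the flag decides which one runs
    if is_enabled("new_checkout", flags):
        return "new checkout: %.2f" % cart_total
    return "old checkout: %.2f" % cart_total


print(checkout(10))
print(checkout(10, {"new_checkout": True}))
```

Turning the feature on, or back off, is then a configuration change rather than a deployment.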
Tools
• Baking AMIs
• Packer - Hashicorp
• Aminator - Netflix
• CloudNative
• Deployment
• Asgard - Netflix
• CloudNative
New World
• Automation expert
• Stateless, independently scalable apps
• Allergic to manual labor
• Embrace your laziness
• Auto Scaling Groups provide:
• Zero-effort scaling
• Fault-tolerance
• Increase reliability & uptime
• Decrease cost