LearnBop Blue/Green Deployments October 2015

Page 1: LearnBop Blue Green AWS Deployments - October 2015

LearnBop Blue/Green Deployments

October 2015

Page 2: LearnBop Blue Green AWS Deployments - October 2015

whoami

Page 3: LearnBop Blue Green AWS Deployments - October 2015

utcnow

CTO at

www.learnbop.com

Algorithmic individual tutoring tuned by veteran teachers

Common Core and state standards supported

Currently enjoyed in schools. Sign up to be notified when the parent-led version is live: http://go.learnbop.com/amazon-parents

Page 4: LearnBop Blue Green AWS Deployments - October 2015

Common sample architecture

Page 5: LearnBop Blue Green AWS Deployments - October 2015

General release good practices

Continuous integration - build, test, etc.

Scripted environment creation/update (ideally in source control)

Scripted “one-click” deploy

New code, APIs AND database schema should be backwards compatible

Page 6: LearnBop Blue Green AWS Deployments - October 2015

Why not rolling releases?

Not immutable infrastructure
❖ Opportunities for config creep
❖ Rollback risks - Code-only releases are likely easy. What if you patch the OS, update a few libraries, etc.?

Complexity of tracking version state, whether manually or automatically

Some big change will require new servers/environment anyway

Page 7: LearnBop Blue Green AWS Deployments - October 2015

Why blue/green?

Immutable infrastructure
❖ Ensures your environment build process is up to date each release
❖ Old environment is guaranteed untouched if rollback or comparison needed

Rollback is FAST

Same process for minor or major changes (OS updates? no problem)

One button to spin up & deploy, plus one button to shift traffic. Traffic goes to either old or new, with no complex in-between risk.

Page 8: LearnBop Blue Green AWS Deployments - October 2015

Swap CNAMEs to the rescue?

Page 10: LearnBop Blue Green AWS Deployments - October 2015

Web Request Path - Round 1

Page 11: LearnBop Blue Green AWS Deployments - October 2015

Web Request Path - Round 1

Maybe in 1993…

GET / HTTP/1.0

Page 13: LearnBop Blue Green AWS Deployments - October 2015

Web Request Path - Round 2

GET / HTTP/1.1

Page 14: LearnBop Blue Green AWS Deployments - October 2015

Web Request Path - Round 3

Page 15: LearnBop Blue Green AWS Deployments - October 2015

Web Request Path - Round 4

Page 16: LearnBop Blue Green AWS Deployments - October 2015

Web Request Path - Round 5

Page 17: LearnBop Blue Green AWS Deployments - October 2015

Web Request Path - Round 6

Page 18: LearnBop Blue Green AWS Deployments - October 2015

Swap CNAMEs to the rescue?

Page 19: LearnBop Blue Green AWS Deployments - October 2015

Swap CNAME worst case

Bad Scenario 1 - Users stuck on old pre-swap version longer than a few minutes

A user actively clicking on the site will keep the HTTP keep-alive sockets active and won’t get a chance to check DNS again

Browser and OS DNS caches can keep the old value longer than a minimal DNS TTL

Some DNS servers or apps may be configured/misconfigured with an abnormally high TTL

Bad Scenario 2 - Users stuck on old pre-swap version INDEFINITELY

Long polling, websockets, notification refresh will keep re-using the same HTTP keep-alive socket

It never goes back to a DNS server to get a new address as long as they don’t lose internet access/close the browser

I’ve seen it happen for 12+ hours
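The keep-alive behavior behind Scenarios 1 and 2 can be seen with a small local experiment. This is a sketch using only the Python standard library; the local server, handler, and function name are illustrative and not from the deck. It sends several requests over one HTTP/1.1 connection and records the client-side port each time. The port never changes, meaning the same socket is reused and the client has no reason to resolve the hostname again.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def client_ports_for_requests(n):
    """Send n GETs over one HTTPConnection; return the client port used each time."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    ports = []
    try:
        for _ in range(n):
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can be reused
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


if __name__ == "__main__":
    # All entries are the same port: one socket, one address lookup.
    print(client_ports_for_requests(3))
```

A browser behaves the same way at larger scale: as long as requests keep flowing on the open socket, a changed CNAME is never consulted.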

Page 20: LearnBop Blue Green AWS Deployments - October 2015

Swap CNAME worst case

Bad Scenario 3 - Semi-permanent stale data

CDN caches old version of file during your swap

Browser gets old file with Cache-Control: max-age=3600 and caches it for a YEAR

Emergency Workarounds

Tell your users to clear cache (not a great move for public websites)

Change your cachebuster ?build= # and re-publish

Disable CDN

Bad Scenario 4 - User requests going from old → new → old servers

Request hits one bank of DNS servers and gets new IP

Next request hits a different bank of DNS servers and gets old IP

Could send new form data to old server backend...
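The "change your cachebuster" workaround amounts to rewriting asset URLs so the CDN and browsers see a new cache key. A minimal sketch, assuming assets are referenced with a `build` query parameter as the slide suggests; the function name and example URLs are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_build(url, build):
    """Return url with its build= query parameter set to the given build number."""
    parts = urlsplit(url)
    # Drop any existing build= value, keep every other parameter in order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "build"]
    query.append(("build", str(build)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_build("https://cdn.example.com/app.js?build=41", 42))
    print(with_build("/static/site.css", 7))
```

Because the URL changes, a stale copy cached under the old build number is simply never requested again.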

Page 22: LearnBop Blue Green AWS Deployments - October 2015

How do we know what version users are hitting?

Page 23: LearnBop Blue Green AWS Deployments - October 2015

How do we know what version CDN is hitting?

Page 24: LearnBop Blue Green AWS Deployments - October 2015

Discarded Alternatives

Try to reuse ELB OR put servers in a 3rd ELB (not in blue or green env)
❖ Complex to manage which servers should be in and out
❖ If using Elastic Beanstalk and auto-scaling, complex to manage new servers or putting in servers

Trick Beanstalk into switching the ELB it’s using (swap ELB for pre and post)
❖ Error: Tag keys starting with ‘aws:’ are reserved for internal use

Swap CNAMEs first and then put new nodes in both new and old ELB. Remove old nodes from old ELB after
❖ Not bad but still need to leave old ELB up in case of old DNS
❖ Pre-rollback testing hard as old nodes are not reachable

Page 25: LearnBop Blue Green AWS Deployments - October 2015

Final Solution Attributes

Only possible relatively recently with new AWS attach/detach ELB to AutoScaling Group (ASG) feature out June 11th - see blog post

Fully scripted and one click (bash script run through RunDeck)

Rollback is as simple as running it again to swap back

No CNAME/DNS changes!

Old environment not hit more than 3 minutes after new servers come online

No one hitting new server has any risk of future request hitting old server (unless you roll back)

Page 26: LearnBop Blue Green AWS Deployments - October 2015

Final Solution Environment Setup

Environment work

Initial state: Beanstalk application with two environments running and green (staging and production)

Create two new ELBs outside of Elastic Beanstalk (PROD and STAGING)

Attach STAGING ELB to staging (pre-swap to prod) Autoscaling Group

CNAME dualstack DNS name of STAGING ELB to your staging web site address

Attach PROD ELB to production Autoscaling Group

CNAME dualstack DNS name of PROD ELB to your production site address

Ensure Connection Draining is enabled on all four ELBs with a timeout of 120 seconds

Ensure application sets a session type cookie on EVERY request

Create an ELB application controlled session stickiness cookie policy
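Part of this setup can be scripted. The deck's own tooling is a bash script; the sketch below uses Python instead, written against the classic ELB and Auto Scaling client methods in boto3. Clients are passed in as arguments so the functions can be exercised without AWS access. The function names, the policy name `app-sticky`, and the cookie name are assumptions for illustration, not values from the deck.

```python
def configure_swap_elb(elb, name, cookie_name,
                       policy_name="app-sticky", drain_seconds=120):
    """Enable connection draining and create an app-cookie stickiness policy.

    elb is expected to behave like boto3.client("elb") (classic ELB).
    policy_name and cookie_name are hypothetical; use your own.
    """
    elb.modify_load_balancer_attributes(
        LoadBalancerName=name,
        LoadBalancerAttributes={
            "ConnectionDraining": {"Enabled": True, "Timeout": drain_seconds}
        },
    )
    # The policy is only created here; it is enabled on the listeners
    # during the swap itself.
    elb.create_app_cookie_stickiness_policy(
        LoadBalancerName=name, PolicyName=policy_name, CookieName=cookie_name
    )


def attach_elb(autoscaling, asg_name, elb_name):
    """Attach an ELB to an Auto Scaling Group.

    autoscaling is expected to behave like boto3.client("autoscaling").
    """
    autoscaling.attach_load_balancers(
        AutoScalingGroupName=asg_name, LoadBalancerNames=[elb_name]
    )


# Real use (needs boto3 and AWS credentials):
#   import boto3
#   elb = boto3.client("elb")
#   configure_swap_elb(elb, "PROD", cookie_name="session")
#   attach_elb(boto3.client("autoscaling"), "my-prod-asg", "PROD")
```

The slide asks for draining on all four ELBs, so `configure_swap_elb` would be called once per load balancer, skipping the stickiness policy where it is not needed.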

Page 27: LearnBop Blue Green AWS Deployments - October 2015

Final Solution Steps - Sanity Checks

First Do No Harm! Lots of sanity checks before proceeding.

1. Confirm two environments exist in application and one has the PROD ELB attached to its ASG and the other has the STAGING ELB attached to its ASG.

2. Confirm both environments are Health: Green
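These two checks could be expressed as a single guard that refuses to continue unless the state is exactly as expected. A sketch over plain dictionaries; the input shape (`name`, `health`, `elbs`) is an assumption for illustration, and in practice it would be filled from the Elastic Beanstalk and Auto Scaling APIs:

```python
def find_swap_pair(envs, prod_elb="PROD", staging_elb="STAGING"):
    """Return (production_env_name, staging_env_name) or raise RuntimeError.

    envs: list of dicts with keys "name", "health", and "elbs"
    (the ELB names attached to that environment's ASG).
    """
    if len(envs) != 2:
        raise RuntimeError(f"expected exactly 2 environments, found {len(envs)}")

    not_green = [e["name"] for e in envs if e["health"] != "Green"]
    if not_green:
        raise RuntimeError(f"environments not Green: {not_green}")

    prod = [e for e in envs
            if prod_elb in e["elbs"] and staging_elb not in e["elbs"]]
    staging = [e for e in envs
               if staging_elb in e["elbs"] and prod_elb not in e["elbs"]]
    if len(prod) != 1 or len(staging) != 1:
        raise RuntimeError("PROD and STAGING ELBs are not split one per environment")

    return prod[0]["name"], staging[0]["name"]


if __name__ == "__main__":
    envs = [
        {"name": "blue", "health": "Green", "elbs": ["PROD", "awseb-blue"]},
        {"name": "green", "health": "Green", "elbs": ["STAGING", "awseb-green"]},
    ]
    print(find_swap_pair(envs))
```

Raising instead of returning a flag keeps a one-click script from continuing past a failed check.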

Page 28: LearnBop Blue Green AWS Deployments - October 2015

Final Solution Steps

1. Enable ELB application sticky cookie policy on PROD ELB (both HTTP and HTTPS if applicable! - avoid users hitting new servers then old)

2. Set PROD ELB Connection Idle Timeout to 20 seconds (to close connection and thwart WebSockets, Long Polling, HTTP keep-alive)

3. Attach PROD ELB to new code environment ASG (loop until complete)

4. Detach PROD ELB from old code environment ASG (loop until complete)

5. Disable ELB application sticky cookie policy on PROD ELB

6. Set PROD ELB Connection Idle Timeout back to 60 seconds

7. Attach STAGING ELB to old code environment ASG (loop until complete)

8. Detach STAGING ELB from new code environment ASG (loop until complete)

9. Flag old code environment for termination (separate script 2 hours later)

10. Flag deployment successful in 3rd party tools/monitoring

Rollback, if needed, is running the same script again
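Steps 1 through 8 can be sketched in code. The deck's real implementation is a bash script run through RunDeck; this Python version is only an illustration using boto3-style classic ELB and Auto Scaling calls, with the clients passed in. The listener ports, polling intervals, and function names are assumptions, and steps 9 and 10 are left out because they depend on separate tooling.

```python
import time

PROD, STAGING = "PROD", "STAGING"  # ELB names from the setup slide


def wait_for(asg, asg_name, elb_name, attached, poll_seconds=5, max_polls=120):
    """Loop until elb_name is in service on asg_name, or fully gone from it."""
    for _ in range(max_polls):
        resp = asg.describe_load_balancers(AutoScalingGroupName=asg_name)
        state = {lb["LoadBalancerName"]: lb["State"]
                 for lb in resp["LoadBalancers"]}.get(elb_name)
        if attached and state == "InService":
            return
        if not attached and state in (None, "Removed"):
            return
        time.sleep(poll_seconds)
    raise TimeoutError(f"{elb_name} on {asg_name} did not settle")


def set_idle_timeout(elb, seconds):
    elb.modify_load_balancer_attributes(
        LoadBalancerName=PROD,
        LoadBalancerAttributes={"ConnectionSettings": {"IdleTimeout": seconds}},
    )


def set_sticky(elb, ports, policy_names):
    # Caution: this call replaces the listener's entire policy list.
    for port in ports:
        elb.set_load_balancer_policies_of_listener(
            LoadBalancerName=PROD, LoadBalancerPort=port, PolicyNames=policy_names
        )


def move(asg, elb_name, to_asg, from_asg, poll_seconds):
    """Attach elb_name to to_asg, then detach it from from_asg."""
    asg.attach_load_balancers(AutoScalingGroupName=to_asg,
                              LoadBalancerNames=[elb_name])
    wait_for(asg, to_asg, elb_name, True, poll_seconds)
    asg.detach_load_balancers(AutoScalingGroupName=from_asg,
                              LoadBalancerNames=[elb_name])
    wait_for(asg, from_asg, elb_name, False, poll_seconds)


def swap(elb, asg, new_asg, old_asg, policy, ports=(80, 443), poll_seconds=5):
    set_sticky(elb, ports, [policy])              # step 1
    set_idle_timeout(elb, 20)                     # step 2
    move(asg, PROD, new_asg, old_asg, poll_seconds)     # steps 3-4
    set_sticky(elb, ports, [])                    # step 5
    set_idle_timeout(elb, 60)                     # step 6
    move(asg, STAGING, old_asg, new_asg, poll_seconds)  # steps 7-8
```

Calling `swap` again with `new_asg` and `old_asg` exchanged reverses the move, which mirrors the "run the same script" rollback.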

Page 29: LearnBop Blue Green AWS Deployments - October 2015

Q&A / Thank you!

Always Be Shipping!

Email: [email protected]: alec1a

Slide Deck (posted by Sunday, Oct 4th): http://tinyurl.com/bluegreen2015

LearnBop for Parents: http://go.learnbop.com/amazon-parents