Release Often Release Safely Sergejus Barinovas (@sergejusb) http://sergejus.blogas.lt



Description: The Kung Fu of releasing often but safely for high-load systems


Page 1: Release Often Release Safely

Release Often Release Safely

Sergejus Barinovas (@sergejusb)

http://sergejus.blogas.lt

Page 2

This is not a theoretical presentation

Page 3

This presentation is based on real-life experience

Page 4

Successful software workflow

You released software

You got customers

You got even more customers

Your software cannot go down

Page 5

Dilemma: Innovative or Stable?

Innovative: frequent (bi-weekly) releases of new features; higher risk of bugs and downtime

Stable: higher uptime and better customer perception; seasonal releases of new features

Page 6

We wanted both …

… to be innovative and agile while staying as stable as possible

Page 7

Stability in our terms

99.999% uptime for serving ads

2 datacenters + clouds

500 M requests / day
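
These targets imply a tight budget. The numbers below come from plain arithmetic, not from our tooling. 99.999% uptime leaves roughly 5.26 minutes of downtime per year, which is under a second per day. 500 M requests per day averages about 5,800 requests per second before any peaks. The function names are illustrative only.

```python
# Back-of-the-envelope numbers behind the stability targets (illustrative only).
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY  # ignores leap years


def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Maximum downtime allowed in a period for a given availability, e.g. 0.99999."""
    return (1.0 - availability) * period_minutes


def average_rps(requests_per_day: int) -> float:
    """Average requests per second; real traffic has peaks above this."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    per_year = downtime_budget_minutes(0.99999, MINUTES_PER_YEAR)
    per_day_s = downtime_budget_minutes(0.99999, MINUTES_PER_DAY) * 60
    print(f"99.999% uptime allows ~{per_year:.2f} min/year (~{per_day_s:.2f} s/day) of downtime")
    print(f"500 M requests/day is ~{average_rps(500_000_000):.0f} requests/s on average")
```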

Page 8

Let’s learn the Kung Fu of releasing often and safely

Page 9

Challenges we ha(d/ve)

Detect issues in production as soon as possible

Test new features in production while minimizing the impact on customers

Roll out new features in a controlled manner

Page 10

Detect issues in production ASAP

Monitoring

Choose your monitoring system carefully: it took us about 1 year (Zabbix). First, list all your possible monitoring use cases.

Prepare your software for monitoring: logging is a must-have! Performance / SLA counters help you measure and understand your software better. Create a clear baseline to compare against after releases.
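
Here is a minimal sketch of comparing post-release counters against a pre-release baseline. It assumes that response-time samples plus request and error counts have already been collected for two comparable time windows. `Snapshot`, `regressions` and the tolerances are hypothetical choices for illustration. The tolerances are a 20% rise in p95 latency and 0.1 percentage points of extra error rate. None of this is a Zabbix feature or a production threshold.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Snapshot:
    """Counters collected over one time window (hypothetical structure)."""
    response_times_ms: list[float]
    requests: int
    errors: int

    def p95(self) -> float:
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        return quantiles(self.response_times_ms, n=20)[-1]

    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def regressions(baseline: Snapshot, current: Snapshot,
                max_latency_increase: float = 0.20,
                max_error_rate_increase: float = 0.001) -> list[str]:
    """List the ways the post-release window is worse than the baseline."""
    findings = []
    base_p95, cur_p95 = baseline.p95(), current.p95()
    if cur_p95 > base_p95 * (1 + max_latency_increase):
        findings.append(f"p95 latency {cur_p95:.1f} ms vs baseline {base_p95:.1f} ms")
    if current.error_rate() > baseline.error_rate() + max_error_rate_increase:
        findings.append(
            f"error rate {current.error_rate():.4f} vs baseline {baseline.error_rate():.4f}")
    return findings
```

The two windows should cover the same time of day and similar traffic. Otherwise, normal daily swings show up as regressions.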

Page 11

Detect issues in production ASAP

Automated functional tests

Designed to detect end-user issues, unlike unit and integration tests

Cover UI / business logic

Still not as many as we want (Selenium UI / C#); unifying the automated QA tests is an ongoing process

Run after each release and on a periodic basis: very important if you have more than one server, and a huge time saver if the tests are repetitive
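
This sketch runs the same functional checks against every server after a release. It assumes each check can be expressed as "fetch this path, then inspect the body". The server names, the paths and `run_functional_checks` are hypothetical. The code does not reproduce the Selenium UI / C# suite. The `fetch` function is injected so the suite can hit either real servers or stubs.

```python
from typing import Callable

# A check receives the response body for one path and says whether it looks right.
Check = Callable[[str], bool]


def run_functional_checks(servers: list[str],
                          checks: dict[str, Check],
                          fetch: Callable[[str], str]) -> dict[str, list[str]]:
    """Run every check against every server and return the failing paths per server.

    `fetch` is injected so the same suite can call real servers over HTTP
    or stubs in a test; any exception it raises counts as a failure.
    """
    failures: dict[str, list[str]] = {}
    for server in servers:
        for path, check in checks.items():
            try:
                ok = check(fetch(f"http://{server}{path}"))
            except Exception:
                ok = False  # unreachable server or broken page: report it, keep going
            if not ok:
                failures.setdefault(server, []).append(path)
    return failures
```

Against real servers, `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=5).read().decode()`. Scheduling the same call periodically turns the suite into functional monitoring.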

Page 12

Though unit tests help find bugs during coding, they are even more vital as the software evolves!

Page 13

Test new features in production

Even an ideal staging environment is not the same as the production environment

Before starting to roll out a new feature, it is important to check its:

Resource consumption: CPU / RAM / HDD / IO / network

Performance impact on existing functionality: response times / SLA

Stability: errors / memory leaks
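
One of the stability checks above, spotting a memory leak, can be approximated another way. Fit a straight line to periodic memory readings from the server running the new feature, then flag a steady upward slope. This least-squares trend check is a swapped-in illustration, not a method from the talk. `looks_like_leak` and the threshold of 1 MB per sample are hypothetical.

```python
def slope(samples: list[float]) -> float:
    """Least-squares slope of evenly spaced samples, in units per sample interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def looks_like_leak(memory_mb: list[float], max_growth_mb_per_sample: float = 1.0) -> bool:
    """Flag a server whose memory keeps trending upward across the window.

    Short spikes move the fitted line much less than steady growth does.
    """
    return slope(memory_mb) > max_growth_mb_per_sample
```

Caches warming up also grow memory at first, so compare the result with servers that do not run the feature before drawing conclusions.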

Page 14

Test new features in production

Use Case #1:

Safely rollout new feature that integrates into core data collection pipeline

Page 15

Test new features in production

Dark releases

Work best with brand-new features

Release the new feature to one or several servers

The new feature gets real load but is not available to customers

Have an automated rollback package ready in case something goes wrong
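
Here is a minimal sketch of the dark-release idea inside a single request handler. It assumes the new feature can run on the same request without side effects visible to customers. The customer always receives the existing result. On servers where the feature is enabled, the new code path also runs on real traffic, and its failures are only logged. `with_dark_feature` and its parameters are hypothetical names.

```python
import logging
from typing import Callable, TypeVar

Req = TypeVar("Req")
Resp = TypeVar("Resp")

log = logging.getLogger("dark_release")


def with_dark_feature(current: Callable[[Req], Resp],
                      dark: Callable[[Req], object],
                      enabled_on_this_server: bool) -> Callable[[Req], Resp]:
    """Wrap a handler so a new code path also receives real traffic.

    The customer always gets the result of `current`; the dark path's
    result is discarded and its exceptions are logged, never propagated.
    """
    def handler(request: Req) -> Resp:
        response = current(request)
        if enabled_on_this_server:
            try:
                dark(request)  # exercised with real load, output never returned
            except Exception:
                log.exception("dark feature failed for request %r", request)
        return response
    return handler
```

This version calls the dark path synchronously, so its latency is added to every request. On a latency-sensitive server, the dark path would more likely run asynchronously or on a copy of the traffic.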

Page 16

Test new features in production

Dark release notes from our release plan

Release Date | Release Type | Team | Project/Product | Release Notes
2011.08.03 | Dark | RnD | Topic Modelling | Final part of the Topic Model Storage dark release. Changes to pullTransactions procedure on all Collect servers. Enabled for Danish, Swedish and English languages.
2011.08.02 | Dark | RnD | Topic Modelling | Part 2 of the Topic Model Storage dark release. Changes to pullTransactions procedure on Collect2 server. Enabled for Danish language only.
2011.08.01 | Dark | RnD | Topic Modelling | Part 1 of the Topic Model Storage dark release. SQL part of Administration and Collect servers (apart from pullTransactions procedure, this will be in part 2). Windows service part of Proc03, including integration with Amazon.

Page 17

Test new features in production

Use Case #2:

Safely migrate to the new SQL connection pooling mechanism

Page 18

Test new features in production

Feature flags and switches

Work both for brand-new features and for updates

A feature can be switched on / off at any time

if (FeatureEnabled) then …

if (UseNewLogic) then … else …

Can affect existing customers

Possible to test each server one by one by switching the feature on / off
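
The `if (UseNewLogic) then … else …` pattern can be written as a runtime switch, sketched here for the connection-pooling use case. `FeatureSwitches`, `open_connection` and the string return values are hypothetical stand-ins. A real implementation would load switch values from configuration and return actual connections.

```python
import threading


class FeatureSwitches:
    """Named on/off switches that can be flipped at runtime (hypothetical API).

    A real service would load these from configuration and refresh them;
    here they are set directly.
    """

    def __init__(self, defaults: dict[str, bool]):
        self._lock = threading.Lock()
        self._values = dict(defaults)

    def is_enabled(self, name: str) -> bool:
        with self._lock:
            return self._values.get(name, False)  # unknown switches count as off

    def set(self, name: str, value: bool) -> None:
        with self._lock:
            self._values[name] = value


def open_connection(switches: FeatureSwitches) -> str:
    # Mirrors the slide's "if (UseNewLogic) then … else …"; the strings stand in
    # for connections from the new and old pooling mechanisms.
    if switches.is_enabled("UseNewLogic"):
        return "connection from new pool"
    return "connection from old pool"
```

Flipping the switch one server at a time matches the one-by-one testing described above.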

Page 19

Test new features in production

Use Case #3:

Safely migrate to the brand-new intelligent targeting subsystem

Page 20

Test new features in production

Valves

Very similar to switches

A feature can receive anywhere from 0% to 100% of real load

Very handy for gradually rolling out new features on each server one by one

So far they have helped us a lot, though they require extra development effort
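
A valve can be sketched as a deterministic percentage gate. Hash a stable key, such as a user ID, into one of 100 buckets, and let the request through if its bucket is below the current percentage. Hashing instead of random sampling is an assumption made for this sketch. It keeps each user on the same side of the valve, so raising the percentage only ever adds users. `Valve` and `bucket` are hypothetical names.

```python
import hashlib


def bucket(key: str, buckets: int = 100) -> int:
    """Map a stable key (e.g. a user or request id) to a bucket in [0, buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


class Valve:
    """Lets a percentage (0-100) of traffic through to a new feature (hypothetical API)."""

    def __init__(self, name: str, percent: int = 0):
        self.name = name
        self.set_percent(percent)

    def set_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.percent = percent

    def is_open_for(self, key: str) -> bool:
        # Salting with the valve name keeps different valves independent of each other.
        return bucket(f"{self.name}:{key}") < self.percent
```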

Page 21

Test new features in production

Caveats we have run into so far

Make sure you can turn features on / off without affecting connected users

Create a simple interface to display the current status of all switches and valves on each affected server

Secure access to switches and valves
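
The sketch below shows one possible approach to the first two caveats, and it is an assumption rather than the approach described in the talk. Switch values are snapshotted when a user session starts, so flipping a switch affects only new sessions. Each server also exposes a plain-text status dump. `SwitchBoard`, `Session` and the status format are hypothetical. Securing access is not covered; the status page and any toggle endpoint would still need authentication.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A connected user's session; feature decisions are frozen when it starts."""
    user_id: str
    features: dict[str, bool] = field(default_factory=dict)


class SwitchBoard:
    """Per-server switches whose changes only reach new sessions (hypothetical API)."""

    def __init__(self, server: str, switches: dict[str, bool]):
        self.server = server
        self.switches = dict(switches)

    def set(self, name: str, on: bool) -> None:
        self.switches[name] = on

    def start_session(self, user_id: str) -> Session:
        # Copy the current values so flipping a switch later does not change
        # behaviour for users who are already connected.
        return Session(user_id, dict(self.switches))

    def status(self) -> str:
        """One line per switch, suitable for a simple status page."""
        return "\n".join(f"{self.server} {name}={'on' if on else 'off'}"
                         for name, on in sorted(self.switches.items()))
```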

Page 22

Controlling the roll-out of a new feature

Switches and valves enable very smooth and controlled roll-out

Partial roll-out to different datacenters / clouds

Different datacenters / clouds have different versions of the feature released

Redirect all traffic to the new or the old version of the feature
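
A per-datacenter roll-out can be sketched as a small routing table. It maps each datacenter or cloud name to the feature version it serves. A global override then redirects all traffic to one version, for example back to the old one. `VersionRouter`, the datacenter names and the "old"/"new" labels are hypothetical.

```python
from typing import Optional


class VersionRouter:
    """Chooses which feature version each datacenter / cloud serves (hypothetical API)."""

    def __init__(self, per_datacenter: dict[str, str], default: str = "old"):
        self.per_datacenter = dict(per_datacenter)
        self.default = default
        self.override: Optional[str] = None  # when set, every datacenter gets this version

    def version_for(self, datacenter: str) -> str:
        if self.override is not None:
            return self.override
        return self.per_datacenter.get(datacenter, self.default)

    def redirect_all(self, version: Optional[str]) -> None:
        """Send all traffic to one version (e.g. "old" to roll back); None clears it."""
        self.override = version
```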

Page 23

Controlling the roll-out of a new feature

Future research: application-level load balancing

A load balancer can act as a switch / valve without us having to program the load distribution logic ourselves

Ability to automatically redirect users to the new version of the application while preserving the old one
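
To illustrate a load balancer acting as a valve, the sketch below uses smooth weighted round-robin, the algorithm nginx uses for weighted upstream servers. It splits requests between an old and a new application version. The pool names and weights are hypothetical. Shifting traffic then means changing the weights, not touching application code.

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round-robin over named backends (e.g. "old" and "new" pools).

    Each pick adds every backend's weight to its running score, chooses the
    highest score, then subtracts the total weight from the chosen one.
    """

    def __init__(self, weights: dict[str, int]):
        if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
            raise ValueError("weights must be non-negative with a positive total")
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}

    def pick(self) -> str:
        total = sum(self.weights.values())
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen
```

With weights of 9 for "old" and 1 for "new", every block of 10 picks contains exactly one "new", spread out instead of bunched together.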

Page 24

Summary

A monitoring system is very important, but your software must be prepared for monitoring

Automated functional tests act as functional monitoring of your software

Switches and valves are a very powerful concept for testing in production and for roll-outs, but they require extra development and maintenance time

Dark releases and partial roll-outs are the most cost-effective safety mechanisms

Page 25

Thanks! Questions?

Sergejus Barinovas (@sergejusb)

http://sergejus.blogas.lt