Ways to minimise performance risks in continuous delivery


INTRODUCTION

OBJECTIVE

Put working software into production as quickly as possible, whilst minimising the risk of load-related problems:

• Bad response times

• Lack of capacity

• Availability too low

• Excessive system resource use

Within the context of websites.

TRADITIONAL APPROACH

Load testing through simulation

http://www.flickr.com/photos/danramarch/4423023837

DECIDE WHAT TO TEST

• Focus on the busiest instant

• Model the most-hit functionality

• Extrapolate to expected load

• Look at production traffic, or attempt an educated guess
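The extrapolation step can be sketched in a few lines, assuming request timestamps have already been parsed out of a production access log (the `busiest_second` helper and the growth factor are illustrative, not part of any particular tool):

```python
from collections import Counter

def busiest_second(timestamps, growth_factor=1.5):
    """Find the busiest instant in production traffic and
    extrapolate it to the load a test should generate."""
    hits = Counter(timestamps)             # requests per second
    second, peak = hits.most_common(1)[0]  # the busiest instant
    return second, round(peak * growth_factor)

# Timestamps as they might come out of an access log (one per request).
ts = ["19:37:40"] * 14 + ["19:37:41"] * 95 + ["19:37:42"] * 27
when, target = busiest_second(ts, growth_factor=2.0)
print(when, target)   # -> 19:37:41 190
```

The growth factor is where the "educated guess" comes in: how much busier do you expect the peak to get before the next test run?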

DECIDE ON SCOPE

Component test → Chain test → Full environment test

Moving towards a full environment test increases:

• Test coverage

• Level of certainty

• Number of systems

• Amount of work

SET UP TEST DATA

• Usually starts as a copy from production

• Or educated guess what people will enter

• Render anonymous

• Make tests deterministic

• Synchronise between all systems

http://www.flickr.com/photos/22168167@N00/3889737939/
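Rendering test data anonymous while keeping tests deterministic and synchronised between systems can be done with salted hashing: the same production value always maps to the same token everywhere it appears. A minimal sketch (the salt and token format are illustrative):

```python
import hashlib

def anonymise(value, salt="test-data-v1"):
    """Deterministically pseudonymise a production value. The same input
    always yields the same token, so tests stay repeatable and every
    system that applies the same salt stays in sync."""
    digest = hashlib.sha256((salt + value).encode()).hexdigest()[:12]
    return f"user-{digest}"

# The same customer becomes the same token in every system.
assert anonymise("alice@example.com") == anonymise("alice@example.com")
assert anonymise("alice@example.com") != anonymise("bob@example.com")
```

Changing the salt regenerates the whole data set, which fits the weekend test-data refresh mentioned later.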

DECIDE ON STRATEGY

One or more of:

• Scalability test

• Stress test

• Endurance test

• Regression test

• Resilience test

http://www.flickr.com/photos/timjoyfamily/5935279962/

DECIDE ON TEST DURATION

(which is tricky)

http://www.flickr.com/photos/wwarby/3297205226

PROVIDE HARDWARE

http://www.flickr.com/photos/s_w_ellis/2681151694/

Copy of production?

Only one copy?

Virtualisation?

Sharing between teams?

INTEGRATE INTO PIPELINE

Unit test → Functional integration test → Load test

Very fast → Fast → Takes longer

PERMANENT LOAD TESTING

Daytime: constant load, teams inspect impact of changes

Nighttime: Endurance test

Weekends: refresh test data

http://www.flickr.com/photos/renaissancechambara/5106171956/

RESPONSE TIME

• DNS lookup (www.xebia.com)

• SSL handshake

• Time to first byte + loading HTML

• Time to render

• Time to document complete

• Parse times

• Blocking client code

• Browser CPU use

• Bandwidth

• # connections to a single host

http://www.webpagetest.org/result/130522_FG_10SC/1/details/

IMPACT OF THE BROWSER

www.browserscope.org

CLEAR REQUIREMENTS

Response time

Intention: Users get a response quickly so that they are happy and spend more money.

Stakeholder: Marketing dept.

Scale: 95th percentile of “document complete” response times, in seconds, measured over one minute.

Metric: Page load times as reported by our RUM tool.

Fail: 10   Now: 3.5   Goal: 1

Inspired by Tom Gilb, Competitive Engineering

WebPageTest: first view + repeat view (median of 3)

95th percentile response times from access logs
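The Scale above (95th percentile over a one-minute window) is straightforward to compute from access-log samples. A minimal nearest-rank sketch; the sample values and thresholds are illustrative:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# "Document complete" times (seconds) collected over one minute.
minute = [0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 3.5]
p95 = percentile(minute, 95)
goal, fail = 1.0, 10.0
# p95 = 3.5 here: misses the goal of 1 s but stays clear of the fail level of 10 s.
print(p95, p95 <= goal, p95 < fail)
```

Note how a single slow outlier dominates the 95th percentile even when the average looks healthy, which is exactly why the requirement uses a percentile rather than a mean.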

ADJUST REQUIREMENTS DUE TO LACK OF REAL BROWSERS

Pros of a separate test environment:

• Playground to test changes

• No impact on real users

• Less pressure

Cons:

• More work

• Guesswork and extrapolation

• Can take a significant amount of time

• More hardware

THINGS WILL BREAK...

... in spite of your best efforts

http://www.flickr.com/photos/jmarty/1239950166/

SO INSTEAD WE SHOULD FOCUS ON FAST RECOVERY

http://www.flickr.com/photos/19107136@N02/8386567228/

“MTTR is more important than MTBF*”

John Allspaw

* for most types of F

[Chart: 99th percentile response time (s), 0 to 2.0, plotted against test duration]

MTBF LEADS TO FUD

Time →

TTR = TTD + find cause (RCA) + write & test fix + compile + build, deploy & test + deploy + validate

What shortens each step:

• TTD: monitoring, alerts

• Finding the cause and fixing it: skills, organisation, culture, maintainability, a simple architecture, fast workstations, good tooling, being able to quickly test locally, automation

• Build and deploy: a fast build server, efficient tests, monitoring, automation, a flexible architecture

DEMING FEEDBACK LOOPS

Plan

Do

Study

Act

OODA LOOPS

Observe

Orient

Decide

Act

AVOID TEST-ONLY MEASUREMENTS

SIMPLE ARCHITECTURE

THE ONLY THING THAT MATTERS IS WHAT HAPPENS IN PRODUCTION

Everything else is an assumption.

DEPLOYING CHANGES

http://www.flickr.com/photos/39463459@N08/5083733600

BLUE-GREEN DEPLOYMENTS

[Diagram: Amazon Route 53 routes traffic to one of two identical stacks, each an Elastic Load Balancer with its own instances: one running version n, the other version n+1]
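The blue-green switch can be sketched as an atomic pointer swap between two environments. This is a toy model; in the deck's setup the swap is done at the Route 53 / load-balancer level, and all names here are illustrative:

```python
class BlueGreenRouter:
    """Toy stand-in for the DNS/load-balancer switch: all traffic goes to
    the live environment; the idle one receives the new version and is
    swapped in atomically once it passes its checks."""

    def __init__(self, blue, green):
        self.envs = {"blue": blue, "green": green}
        self.live = "blue"

    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version):
        self.envs[self.idle()] = version   # upgrade only the idle environment

    def switch(self):
        self.live = self.idle()            # instant cut-over; old env kept for rollback

    def serve(self):
        return self.envs[self.live]

router = BlueGreenRouter(blue="v1", green="v1")
router.deploy("v2")                 # green now runs v2, users still see v1
assert router.serve() == "v1"
router.switch()                     # cut over
print(router.serve())               # -> v2; calling switch() again is the rollback
```

The point of the pattern is that both rollout and rollback are the same cheap operation, which is what makes fast recovery possible.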

DARK LAUNCHING

Web page → DB

Web page → DB + Weather SP (the new service is exercised by real traffic, but its results are not yet shown to users)
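A minimal sketch of the dark-launch call pattern, with hypothetical `db` and `weather_sp` callables standing in for the real backends:

```python
def broken_weather_sp():
    raise RuntimeError("SP down")   # the dark-launched service misbehaves

def render_page(db, weather_sp=None):
    """Serve the page from the DB as before. If a dark-launched service is
    wired in, call it with real traffic but ignore the result, so its
    behaviour under production load is visible without any user impact."""
    content = db()
    if weather_sp is not None:
        try:
            weather_sp()            # result would only be measured and logged
        except Exception:
            pass                    # a failing dark call must never break the page
    return content

page = render_page(db=lambda: "<html>shop</html>", weather_sp=broken_weather_sp)
print(page)   # -> <html>shop</html>, unaffected by the failing dark call
```

In a real system the dark call's timings and errors would feed the monitoring discussed later, which is the whole point of the exercise.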

FEATURE TOGGLES
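A feature toggle in its simplest form is a runtime flag guarding a code path, so code can be deployed dark and enabled without a redeploy. A sketch; the `TOGGLES` store and checkout functions are hypothetical:

```python
TOGGLES = {"new-checkout": False}   # hypothetical flag store

def old_checkout(cart):
    return f"old:{sum(cart)}"

def new_checkout(cart):
    return f"new:{sum(cart)}"

def checkout(cart):
    """Route to the new implementation only when the toggle is on."""
    if TOGGLES.get("new-checkout"):
        return new_checkout(cart)
    return old_checkout(cart)

print(checkout([10, 5]))           # -> old:15, new code deployed but dormant
TOGGLES["new-checkout"] = True     # flip at runtime, no redeploy needed
print(checkout([10, 5]))           # -> new:15
```

Flipping the flag back is the instant rollback, which again favours MTTR over trying to prevent every failure.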

CANARY RELEASING

0% → 100%
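Moving from 0% to 100% needs a stable way to decide which users see the canary. One common approach, sketched here with an illustrative CRC32 bucketing scheme, is to hash each user into a fixed bucket and compare it with the current rollout percentage:

```python
import zlib

def in_canary(user_id, rollout_percent):
    """Stable bucketing: hash the user id into 0-99 and compare with the
    rollout percentage, so each user consistently sees old or new."""
    bucket = zlib.crc32(user_id.encode()) % 100
    return bucket < rollout_percent

users = [f"user-{n}" for n in range(1000)]
for pct in (0, 5, 50, 100):
    share = sum(in_canary(u, pct) for u in users) / len(users)
    # 0% routes nobody and 100% routes everybody; intermediate
    # percentages land close to pct for a large user population.
    print(pct, round(share * 100))
```

Because the bucket is derived from the user id rather than chosen at random per request, ramping 0% → 100% never flips a user back and forth between versions.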

PRODUCTION-IMMUNE SYSTEMS

CONTROLLED LOAD TESTING

[Diagram: Amazon Route 53 → Elastic Load Balancer → instances, backed by an RDS DB instance with a read replica]

MONITORING

http://www.flickr.com/photos/smieyetracking/5609671098/

Technical metrics: CPU use, memory use, TPS, response times, etc.

Process metrics: # bugs, MTTR, MTTD, time from idea to live on site, etc.

Business metrics: revenue, # unique visitors, etc.

MEASURE IMPACT OF CHANGES

tail -f access_log | alstat.pl -i10 -n10 -stt

  Hits  Hits%   TPS  AvgTmTk  TTmTk%  AvgRSize  RSize%
2013-06-04 19:37:40 (08)
    14   0.1%   1.4    1.652    5.7%      2691    0.2%  POST  200  /login.do
    14   0.1%   1.4    0.918    3.2%      3739    0.3%  GET   200  /home.do
    14   0.1%   1.4    0.879    3.1%      3185    0.2%  POST  200  /order.do
     7   0.1%   0.7    0.807    1.4%      1974    0.1%  POST  200  /account.do
     4   0.0%   0.4    0.735    0.7%      3228    0.1%  GET   200  /products.do
     5   0.0%   0.5    0.697    0.9%       969    0.0%  POST  200  /settings.do
     9   0.1%   0.9    0.687    1.5%      1827    0.1%  POST  200  /changeorder.do
    27   0.2%   2.7    0.649    4.3%      2997    0.4%  POST  200  /newpasswd.do
    15   0.1%   1.5    0.580    2.2%      2488    0.2%  GET   200  /offer.do
    95   0.9%   9.5    0.520   12.2%      4801    2.3%  GET   200  /search.do

MEASURE LATENCY

Avg. response times front end vs backend

Number of calls
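A rough way to split average front-end response time into backend time and front-end overhead, assuming you know the average backend call time and the number of backend calls per page (all numbers here are illustrative):

```python
def latency_split(frontend_avg_ms, backend_avg_ms, backend_calls_per_page):
    """Rough split of a page's average response time: how much is spent
    waiting on backend calls versus everything the front end adds on top
    (rendering, templating, its own CPU time)."""
    backend_share = backend_avg_ms * backend_calls_per_page
    frontend_overhead = frontend_avg_ms - backend_share
    return backend_share, frontend_overhead

backend, front = latency_split(frontend_avg_ms=900,
                               backend_avg_ms=120,
                               backend_calls_per_page=5)
print(backend, front)   # -> 600 300
```

When the backend share dominates, tuning the front end is wasted effort, and vice versa; measuring both sides is what tells you where to look after a change.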

SMALL DEPLOYMENTS

http://www.flickr.com/photos/rbulmahn/4925464931/

GO/NO-GO MEETINGS

• What are the biggest fears?

• How can we measure this?

• What can be done if it does happen?

RETROSPECTIVES

How can we prevent a failure from happening again?

How can we detect it earlier?

Was there only one root cause?

http://www.flickr.com/photos/katerha/8380451137

INTRODUCE OUTAGES

Chaos monkey

Game day exercises

http://www.flickr.com/photos/frostnova/440551442/

CULTURE

• Dev and Ops work together on providing information.

• Assumptions are dangerous; try to eliminate as many as possible.

• Small changes are easier to fix than large ones.

• Deploy during office hours so everyone is available in case problems happen.

• All information, including business metrics, should be accessible to everyone.

CLAMS

Culture

Lean

Automation

Measurement

Sharing

SIMPLE, FLEXIBLE ARCHITECTURE

• If the site goes down often, its architecture is probably at fault

• Avoid fragile systems

• Resilience is key

• Scalable (redundancy is not waste)

• Prefer many small systems over a few large ones

• State is a “hot brick”

CHANGES FOR THE BUSINESS

• Accept pushing smaller changes.

• Continuous delivery vs continuous deployment.

• Share data.

CONCLUSION

Work on your ability to respond to failure. Trying to prevent failure can slow you down and make you focus on the wrong things.

Keep assumptions clearly separated from facts. Make your decisions based on evidence.

Measure everything, including the impact of changes to the business.

Find your own compromise: try permanent load testing first and learn from that.

QUESTIONS?

athomas@xebia.com

@a32an

www.xebia.com

blog.xebia.com

(we’re hiring)
