A DevOps Story from the trenches

ScotSoft · 11/05/2018

Martin Hinshelwood | @MrHinsh

[email protected] | http://nkdagility.com/blog

© 2013 – 2017 naked Agility Limited All Rights Reserved


Diego Lo Giudice and Dave West, Forrester

February 2011

Transforming Application Delivery

Firms today experience a much higher velocity of business change. Market opportunities appear or dissolve in months or weeks instead of years.


This is the story of:


Microsoft uses Team Services


Developer Division


Faster Value Delivery

Increase flow of business value

Shorten cycle times

Reduce re-work costs


Schedule

Code Test & Stabilize Code Test & Stabilize

Beta RTM


Feedback

Planning

Customer feedback: we should change the way a feature works. We didn’t get it quite right…

… but we’re booked solid already.


S1 S2 S3 S4 S5 Stabilization S6

Story: Sprint 1-5

A

B


Now

2 years

3 weeks


Deliver more value to customers

Faster responses to customers and market changes

Improved engineering satisfaction

2x productivity increase

Features Delivered per Year

https://www.visualstudio.com/en-us/articles/news/features-timeline

• 2012: 22
• 2013: 58
• 2014: 65
• 2015: 111
• 2016: 262
• 2017: 249


Organization

Roles

Teams

Cadence

Taxonomy

Plan

Practices

Guiding Principles

Alignment

Autonomy

“Let’s try to give our teams three things…. Autonomy, Mastery, Purpose”


Alignment: Every team and business tracks scenarios and features consistently.

Autonomy: Every team chooses how to manage stories and/or tasks.

Taxonomy & Staying Aligned


Planning

• Sprint: 3 weeks, Confident (95%)
• Plan: 3 sprints, Thoughtful (90%)
• Season: 6 months, Hopeful (80%)
• Epic: 18 months, Aspirational (60%)

Teams are responsible for the detail

Leadership is responsible for the big picture


Scenarios

Features

Stories

Tasks

Aligned Autonomy

Alignment: The big picture in light of our business goals

Autonomy: The detail about what we’ll deliver to achieve our business goals

[Timeline: Sprint 68, Sprint 69, Sprint 70; each sprint spans Week 1, Week 2, Week 3]

Sprint Planning Done!


What we accomplished

[Timeline: Sprint 68, Sprint 69, Sprint 70; each sprint spans Week 1, Week 2, Week 3]

The sprint plan


Sprint Mails

• Value delivered during the sprint
• Video demonstrating the value
• What the team is planning to accomplish in the next sprint


It’s not 2 years, but…

• Updates were large

• Months apart

• Lots of problems!

Timeline, 4/1/2010 to 4/23/2012:

• 5/3/2010: TFS 2010 RTM
• 4/23/2011: Service Deployment
• 8/5/2011: Service Update
• 9/26/2011: //BUILD 2011
• 12/7/2011: Service Update
• 1/30/2012: Service Update
• 2/20/2012: Service Update
• 3/12/2012: Service Update
• 4/2/2012: Service Update


Organization Chart… before

Program Management Development Testing

Operations


Organization Chart

Program Management Engineering

Operations

Engineering

Program Management is responsible for WHAT we’re building, and WHY we’re building it.

Engineering is responsible for HOW we’re building it, and that we’re building it with QUALITY.


Teams

Program Management Engineering


[Timeline: Sprint 68, Sprint 69, Sprint 70; each sprint spans Week 1, Week 2, Week 3]

Deployment

Sprint Planning Done!

If it’s bad, YOU wake up


But we have many teams


Everyone creates a branch…

Week 1 Week 2 Week 3


Writes a lot of code…

Week 1 Week 2 Week 3


It needs to come together…

Week 1 Week 2 Week 3


Merge Debt

Week 1 Week 2 Week 3


Branching


Maintaining enterprise rigor


Branching


Quality and Testing


Quality - Before

Code Test & Stabilize Code Test & Stabilize

Beta RTM

Planning

Code

Complete


Quality - After


Everything is going to production


Feature Flags

ON

OFF
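The slides show flags only as an ON/OFF switch. As a minimal sketch of the idea (not the VSTS implementation), code for a new feature can ship to production behind a flag that starts OFF and is turned ON per user or for everyone. The class `FeatureFlags`, the flag name `new-dashboard`, and the function `render_dashboard` are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store; a real service would persist this."""

    enabled_for_all: set[str] = field(default_factory=set)
    enabled_for_users: dict[str, set[str]] = field(default_factory=dict)

    def turn_on(self, flag: str, user: str | None = None) -> None:
        # With no user given, the flag is ON for everyone.
        if user is None:
            self.enabled_for_all.add(flag)
        else:
            self.enabled_for_users.setdefault(flag, set()).add(user)

    def turn_off(self, flag: str) -> None:
        # Turning a flag OFF hides the feature without a redeployment.
        self.enabled_for_all.discard(flag)
        self.enabled_for_users.pop(flag, None)

    def is_on(self, flag: str, user: str) -> bool:
        return (
            flag in self.enabled_for_all
            or user in self.enabled_for_users.get(flag, set())
        )


def render_dashboard(flags: FeatureFlags, user: str) -> str:
    # Both code paths are deployed; the flag decides which one a user sees.
    if flags.is_on("new-dashboard", user):
        return "new dashboard"
    return "old dashboard"
```

In this sketch the deployed code never changes when a flag flips; only the flag store does, which is what lets unfinished work go to production safely.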


Multiple scale units enable canary testing

• VSO SU0: San Antonio
• VSO SU1: Chicago
• VSO SU7: Australia
• Shared Platform Services: San Antonio
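One way to read the scale-unit slide is as a staged rollout: deploy to one scale unit first, check its health, and only then continue. The sketch below assumes that reading. The rollout order (SU0 first), the `deploy` and `is_healthy` callbacks, and the stop-on-failure rule are assumptions for illustration, not details stated in the slides.

```python
from typing import Callable

# Scale unit names come from the slide; the ordering is an assumption.
SCALE_UNITS = ["SU0 (San Antonio)", "SU1 (Chicago)", "SU7 (Australia)"]


def rollout(
    build: str,
    deploy: Callable[[str, str], None],
    is_healthy: Callable[[str], bool],
) -> list[str]:
    """Deploy one scale unit at a time and stop at the first unhealthy one.

    Returns the scale units that received the build.
    """
    deployed = []
    for unit in SCALE_UNITS:
        deploy(unit, build)
        deployed.append(unit)
        if not is_healthy(unit):
            # The canary caught a problem: later scale units never see it.
            break
    return deployed
```

If the first scale unit reports unhealthy, the function returns a single-element list, so the blast radius of a bad build is limited to the canary.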


There’s no place like production!


Health Dashboards


Customer Intelligence, Business Intelligence, Operational Intelligence

Gather everything

Dashboard DevOps Debug Experiments


Getting the availability model right

Experience: Coverage too narrow as service footprint grows

Experience: Loses sensitivity as command volumes grow

Experience: Emphasizes individual customer impact

[Chart: Sept 25th 2013 LSI, 9/25/13 2:24 PM to 10:48 PM. Series: FailedExecutionCount, SlowExecutionCount, Start, End, Availability (ID4 - Activity Only), Availability (Current). Count axis from -200 to 1600; availability axis from 0.8 to 1.]
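The difference between an availability measure that "loses sensitivity as command volumes grow" and one that "emphasizes individual customer impact" can be illustrated with two simple formulas. Both formulas below are assumptions made for this sketch and are not the actual VSTS availability model; the `Command` record and its fields are hypothetical, loosely named after the failed and slow execution counts in the chart.

```python
from dataclasses import dataclass


@dataclass
class Command:
    user: str
    failed: bool = False
    slow: bool = False


def command_availability(commands: list[Command]) -> float:
    """Share of commands that were neither failed nor slow."""
    if not commands:
        return 1.0
    bad = sum(1 for c in commands if c.failed or c.slow)
    return 1 - bad / len(commands)


def user_availability(commands: list[Command]) -> float:
    """Share of active users who saw no failed or slow command."""
    users = {c.user for c in commands}
    if not users:
        return 1.0
    impacted = {c.user for c in commands if c.failed or c.slow}
    return 1 - len(impacted) / len(users)
```

With one heavy user issuing many successful commands and a handful of users each hitting a single failure, the command-based figure stays close to 1 while the user-based figure drops sharply, which is the effect the three "Experience" lines describe.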


Alerting is key to fast detection

Every alert must be actionable and represent a real issue with the system.

Alerts should create a sense of urgency; false alerts dilute that

Redundant alerts for the same issue

Needed to set the right thresholds and tune them often

Stateless alerts contributed to further noise
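To make the point about stateless and redundant alerts concrete, here is a minimal sketch of a stateful alert tracker that notifies only when an issue opens or resolves, instead of on every failing check. The class `AlertState`, the `issue_key` convention, and the notification strings are hypothetical; the slides do not describe how the real health model stores state.

```python
class AlertState:
    """Tracks which issues are open so repeated signals do not re-notify."""

    def __init__(self) -> None:
        self._open = set()
        self.notifications = []

    def observe(self, issue_key: str, firing: bool) -> None:
        if firing and issue_key not in self._open:
            # First time we see this issue: one actionable notification.
            self._open.add(issue_key)
            self.notifications.append(f"OPEN {issue_key}")
        elif not firing and issue_key in self._open:
            # The issue cleared: close it so a recurrence alerts again.
            self._open.remove(issue_key)
            self.notifications.append(f"RESOLVED {issue_key}")
        # Any other combination is a repeat of a known state: stay quiet.
```

A stateless checker would emit one alert per failing observation; here three consecutive failures for the same key produce a single OPEN notification.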


Health model in action

• 3 errors for memory and performance
• All 3 related to the same code defect
• APM component mapped to feature team
• Auto-dialer engaged Global DRI

Eliminated alert noise: from ~928 alerts per week to ~22, and reduced DRI escalations by ~56%


VSTS Scorecard


Service Availability & Health Metrics

[Scorecard charts, each marked DRAFT: Time to Detect (% of incidents), Time to Mitigate (% of incidents), Error By Source, Incidents by Severity (incident count), User Impact Minutes During Incidents [TFS Only] (user minutes). Callouts 1 to 4 on the charts correspond to the numbered notes.]

1. TFS Availability is on an improving trend. No Sev0/Sev1 LSIs for July.

2. App Insights switched from synthetic availability to real-user experience in Ibiza portal. A high volume of SEV-2 LSIs (72) contributed to customer impact in addition to intermittent UX errors. (UX fixes applied on 8/11 that improve availability)

3. App Insights was impacted by 3 long-running LSIs related to ES maintenance, Ibiza updates and an Azure Storage outage.

4. TFS Service attainment (SLO) improved significantly MoM with focus on minimizing failed/slow commands and reviewing in weekly LiveSite reviews


Service status


RCA (Root Cause Analysis) transparency


Summary


Martin Hinshelwood

[email protected]

• Hear about our journey: http://aka.ms/engineeringstories

• Learn how we deploy VSTS: https://blogs.msdn.microsoft.com/devops/2017/04/25/how-we-use-rm-part-1/

• Follow our ongoing journey and VSTS updates: https://blogs.msdn.microsoft.com/bharry/

• Use VSTS

• http://visualstudio.com/team-services


Connect With Martin Hinshelwood:

+44 7811 164 522

@MrHinsh

[email protected]

https://nkdAgility.com/blog

https://nkdagility.net/ScrumDayLondon2018-Enterprise


Starting with what is most important/most pain, go from there

Designing metrics is as hard as designing features

Baking it into the review culture, from top to bottom: cadence is the heartbeat and spurs activity


Changing the test portfolio balance

Tests should be written at the lowest level possible

Write once, run anywhere, including on the production system

Product is designed for testability

Test code is product code, only reliable tests survive

Testing infrastructure is a shared service


Agenda


Feedback - Before

Code Test & Stabilize Code Test & Stabilize

Beta RTM

Planning

??


Feedback - After

? ? ? ? ? ? ? ? ? ? ? ? ? ? ?


Agenda


Staying connected

Chat Chat Chat Chat Chat Chat

Every 3 sprints we sit down with the team for a “chat”


• What’s next on your backlog?

• How are you doing with regards to debt?

• Any issues?

Team “Chats”

Version Control


Sprint mails

Plan Accomplished