11/05/2018
© 2013 – 2017 naked Agility Limited All Rights Reserved
A DevOps Story from the trenches
Martin Hinshelwood | @MrHinsh
[email protected] | http://nkdagility.com/blog
“Firms today experience a much higher velocity of business change. Market opportunities appear or dissolve in months or weeks instead of years.”
Diego Lo Giudice and Dave West, Forrester, “Transforming Application Delivery”, February 2011
This is the story of:
Microsoft uses Team Services
Developer Division
Faster Value Delivery
Increase flow of business value
Shorten cycle times
Reduce re-work costs
Schedule
Code Test & Stabilize Code Test & Stabilize
Beta RTM
Feedback
Planning
Customer feedback – we should change the way a feature works. We didn’t get it quite right…
… but we’re booked solid already.
S1 S2 S3 S4 S5 Stabilization S6
Story: Sprint 1-5
A
B
Now
2 years
3 weeks
Deliver more value to customers
Faster responses to customers and market changes
Improved engineering satisfaction
2x productivity increase
Features Delivered per Year
[Chart: features delivered per year, 2012 to 2017; data labels 22, 58, 65, 111, 262, 249]
https://www.visualstudio.com/en-us/articles/news/features-timeline
Organization
Roles
Teams
Cadence
Taxonomy
Plan
Practices
Guiding Principles
Alignment
Autonomy
“Let’s try to give our teams three things…. Autonomy, Mastery, Purpose”
Alignment: Every team and business tracks scenarios and features consistently.
Autonomy: Every team chooses how to manage stories and/or tasks.
Taxonomy & Staying Aligned
Planning
Epic: 18 months, Aspirational (60%)
Season: 6 months, Hopeful (80%)
Plan: 3 sprints, Thoughtful (90%)
Sprint: 3 weeks, Confident (95%)
Leadership is responsible for the big picture
Teams are responsible for the detail
Scenarios
Features
Stories
Tasks
Aligned Autonomy
Alignment: The big picture in light of our business goals
Autonomy: The detail about what we’ll deliver to achieve our business goals
[Diagram: three-week sprint timeline for Sprint 68, Sprint 69 and Sprint 70, marked “Sprint Planning” and “Done!”]
[Diagram: three-week sprint timeline for Sprint 68, Sprint 69 and Sprint 70, marked “What we accomplished” and “The sprint plan”]
Sprint Mails
• Value delivered during the sprint
• Video demonstrating the value
• What the team is planning to accomplish in the next sprint
It’s not 2 years, but…
• Updates were large
• Months apart
• Lots of problems!
Timeline, 4/1/2010 to 4/23/2012:
5/3/2010: TFS 2010 RTM
4/23/2011: Service Deployment
8/5/2011: Service Update
9/26/2011: //BUILD 2011
12/7/2011: Service Update
1/30/2012: Service Update
2/20/2012: Service Update
3/12/2012: Service Update
4/2/2012: Service Update
Organization Chart… before
Program Management Development Testing
Operations
Organization Chart
Program Management Engineering
Operations
Engineering
Program Management is responsible for WHAT we’re building, and WHY we’re building it.
Engineering is responsible for HOW we’re building it, and that we’re building it with QUALITY.
Teams
Program Management Engineering
[Diagram: three-week sprint timeline for Sprint 68, Sprint 69 and Sprint 70, marked “Sprint Planning”, “Done!” and “Deployment”]
If it’s bad, YOU wake up
But we have many teams
Everyone creates a branch…
Writes a lot of code…
It needs to come together…
Merge Debt
Branching
Maintaining enterprise rigor
Branching
Quality and Testing
Quality - Before
Phases: Planning, Code, Test & Stabilize, Code, Test & Stabilize
Milestones: Code Complete, Beta, RTM
Quality - After
Everything is going to production
Feature Flags
ON
OFF
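One way to read the ON/OFF slides: every change ships to production, and a runtime flag decides who actually sees it. The sketch below is only an illustration under stated assumptions. The `FeatureFlags` class, the `new-dashboard` flag and the account names are hypothetical and are not taken from the talk or from how VSTS implements flags.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store; a real service would persist this."""
    enabled_globally: set = field(default_factory=set)
    enabled_for: dict = field(default_factory=dict)  # flag name -> set of accounts

    def turn_on(self, flag, account=None):
        # With no account given, the flag is switched ON for everyone.
        if account is None:
            self.enabled_globally.add(flag)
        else:
            self.enabled_for.setdefault(flag, set()).add(account)

    def turn_off(self, flag):
        # Switching OFF hides the feature again without a new deployment.
        self.enabled_globally.discard(flag)
        self.enabled_for.pop(flag, None)

    def is_on(self, flag, account):
        return (flag in self.enabled_globally
                or account in self.enabled_for.get(flag, set()))


def render_dashboard(flags, account):
    # Both code paths are deployed; the flag picks one at runtime.
    if flags.is_on("new-dashboard", account):
        return "new dashboard"
    return "old dashboard"


flags = FeatureFlags()
flags.turn_on("new-dashboard", account="early-adopter")
```

Here only the `early-adopter` account gets the new path; calling `flags.turn_on("new-dashboard")` would expose it to everyone, and `flags.turn_off("new-dashboard")` would hide it again.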
Multiple scale units enable canary testing
VSO SU0: San Antonio
VSO SU1: Chicago
VSO SU7: Australia
Shared Platform Services: San Antonio
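With several scale units, a build can go to one unit first and only move on if that unit stays healthy. The following is a rough sketch of that idea, not the actual VSTS deployment process. The scale unit names come from the slide, while the rollout order and the `deploy` and `is_healthy` callbacks are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# Scale unit names come from the slide; the rollout order is an assumption.
SCALE_UNITS = ("VSO SU0", "VSO SU1", "VSO SU7")


def canary_rollout(build: str,
                   deploy: Callable[[str, str], None],
                   is_healthy: Callable[[str], bool],
                   units: Sequence[str] = SCALE_UNITS) -> List[str]:
    """Deploy to one scale unit at a time and stop at the first unhealthy unit.

    Returns the units that received the build and stayed healthy.
    """
    completed = []
    for unit in units:
        deploy(build, unit)
        if not is_healthy(unit):
            # Halt here so the problem stays contained to this unit.
            break
        completed.append(unit)
    return completed
```

A caller supplies its own `deploy` and `is_healthy` functions; if the second unit reports unhealthy, the third never receives the build.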
There’s no place like production!
Health Dashboards
Customer Intelligence, Business Intelligence, Operational Intelligence
Gather everything
Dashboard, DevOps, Debug, Experiments
Getting the availability model right
Experience: Coverage too narrow as service footprint grows
Experience: Loses sensitivity as command volumes grow
Experience: Emphasizes individual customer impact
[Chart: Sept 25th 2013 LSI, 2:24 PM to 10:48 PM. Series: FailedExecutionCount, SlowExecutionCount, Start, End, Availability (ID4 - Activity Only), Availability (Current). Axes: execution count (-200 to 1600) and availability (0.8 to 1).]
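The slide does not define the availability formulas, so the following is only an assumed illustration of why a plain ratio of good commands "loses sensitivity as command volumes grow", and how counting per customer and per minute brings individual impact back into view. The record fields (`account`, `minute`, `ok`) and both function names are hypothetical.

```python
from collections import defaultdict


def command_availability(commands):
    """Share of all commands that were neither failed nor slow."""
    good = sum(1 for c in commands if c["ok"])
    return good / len(commands)


def account_minute_availability(commands):
    """Share of (account, minute) pairs with no failed or slow command."""
    bad = defaultdict(bool)
    for c in commands:
        key = (c["account"], c["minute"])
        bad[key] = bad[key] or not c["ok"]
    good_pairs = sum(1 for is_bad in bad.values() if not is_bad)
    return good_pairs / len(bad)


# Assumed data: one large account issues 990 good commands in a minute,
# while every one of a small account's 10 commands fails.
commands = (
    [{"account": "big", "minute": 0, "ok": True}] * 990
    + [{"account": "small", "minute": 0, "ok": False}] * 10
)

print(command_availability(commands))
print(account_minute_availability(commands))
```

In this made-up data the command ratio stays at 99% although one of the two customers is completely down, whereas the per-customer measure drops to 50%.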
Alerting is key to fast detection
Every alert must be actionable and represent a real issue with the system.
Alerts should create a sense of urgency – false alerts dilute that
Redundant alerts for the same issue
Needed to set the right thresholds and tune often
Stateless alerts contributed to further noise
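The points about redundant and stateless alerts can be sketched as a small state machine: remember which components are already known to be unhealthy and only alert on the transition. This is a minimal sketch under assumptions. The `Signal` and `StatefulAlerter` names are hypothetical and do not describe the team's actual alerting system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    component: str  # hypothetical: the service component that was checked
    healthy: bool


class StatefulAlerter:
    """Raise an alert only when a component changes from healthy to unhealthy."""

    def __init__(self):
        self._unhealthy = set()

    def process(self, signal: Signal) -> bool:
        """Return True if this signal should raise a new alert."""
        if signal.healthy:
            # Recovery clears the state so a later failure alerts again.
            self._unhealthy.discard(signal.component)
            return False
        if signal.component in self._unhealthy:
            # Already alerted for this ongoing issue; stay quiet.
            return False
        self._unhealthy.add(signal.component)
        return True
```

A stateless checker would alert on every failing signal; this version alerts once per incident and again only after the component has recovered and failed anew.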
Health model in action
• 3 errors for memory and performance
• All 3 related to same code defect
• APM component mapped to feature team
• Auto-dialer engaged Global DRI
Eliminated alert noise: ~928 alerts per week to ~22, and reduced DRI escalations by ~56%
VSTS Scorecard
Service Availability & Health Metrics
[Charts, all marked DRAFT: Time to Detect, Time to Mitigate, Error By Source, Incidents by Severity, User Impact Minutes During Incidents [TFS Only]. Axis labels: % of Incidents, Incident Count, User Minutes. Callouts 1 to 4 on the charts correspond to the numbered notes.]
1. TFS Availability is on an improving trend. No Sev0/Sev1 LSIs for July.
2. App Insights switched from synthetic availability to real-user experience in Ibiza portal. A high volume of SEV-2 LSIs (72) contributed to customer impact in addition to intermittent UX errors. (UX fixes applied on 8/11 improve availability.)
3. App Insights was impacted by 3 long-running LSIs related to ES maintenance, Ibiza updates and an Azure Storage outage.
4. TFS Service attainment (SLO) improved significantly MoM with focus on minimizing failed/slow commands and reviewing in weekly LiveSite reviews.
Service status
RCA (Root Cause Analysis) transparency
Summary
Martin Hinshelwood
• Hear about our journey: http://aka.ms/engineeringstories
• Learn how we deploy VSTS: https://blogs.msdn.microsoft.com/devops/2017/04/25/how-we-use-rm-part-1/
• Follow our ongoing journey and VSTS updates: https://blogs.msdn.microsoft.com/bharry/
• Use VSTS: http://visualstudio.com/team-services
Connect With Martin Hinshelwood:
+44 7811 164 522
@MrHinsh
https://nkdAgility.com/blog
https://nkdagility.net/ScrumDayLondon2018-Enterprise
Start with what is most important or most painful, and go from there
Designing metrics is as hard as designing features
Bake it into the review culture, from top to bottom; cadence is the heartbeat that spurs activity
Changing the test portfolio balance
Tests should be written at the lowest level possible
Write once, run anywhere, including the production system
Product is designed for testability
Test code is product code, only reliable tests survive
Testing infrastructure is a shared service
Agenda
Feedback - Before
Code Test & Stabilize Code Test & Stabilize
Beta RTM
Planning
??
Feedback - After
? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Staying connected
Every 3 sprints we sit down with the team for a “chat”
Team “Chats”
Version Control
• What’s next on your backlog?
• How are you doing with regards to debt?
• Any issues?
Sprint mails
Plan Accomplished