94
Practical Operability Techniques for Teams Matthew Skelton Conflux confluxdigital.net / @ConfluxHQ Continuous Lifecycle London, 17 May 2018

confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Practical Operability Techniques for Teams

Matthew SkeltonConflux

confluxdigital.net / @ConfluxHQ

Continuous Lifecycle London, 17 May 2018

Page 2: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Practical operabilityWhy do we need a focus on operability?5 practical operability techniques to try

2

Page 3: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

use modern logging, Run Book dialogue sheets, endpoint

healthchecks, correlation IDs, and user personas as

team collaboration techniques3

Page 4: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

About meMatthew Skelton, Conflux

@matthewpskelton

matthewskelton.net

Leeds, UK

4

assemblyconf.com

Page 5: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Team Guide to Software OperabilityMatthew Skelton & Rob Thatcher

skeltonthatcher.com/publications

Download a free sample chapter

5

Page 6: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

You?Software Developer

Tester / QA

DevOps Engineer

Team Leader

Head of Department

6

Page 7: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Practical Operability Techniques

for Teams

7

Page 8: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Operability

making software work well in Production

8

Page 9: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Operability

9

Page 10: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

“But can’t we just give those things to a DevOp?”

10

Page 11: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

“But can’t we just give those things to the

DevOp?”

11

Page 12: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Operability is a shared concern

#BizDevTestSecOps12

Page 13: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Operability is a shared concern

#BizDevTestSecOps13

Page 14: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

14

Page 15: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

15

A self-managed Kubernetes cluster near you

Page 16: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Operability is a shared concern

#BizDevTestSecOps16

Page 17: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Practical operability techniques1. Modern logging2. Run Book dialogue sheets3. Endpoint healthchecks4. Correlation IDs5. Lightweight User Personas

17

Page 18: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lack of observability for distributed systems

18

Page 19: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

19

Logging with Event IDs

Page 20: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Modern logging w/ Event IDsDistinct application states

No “logorrhoea” (!)

Distributed tracing via logs

Build a shared understanding

20

Page 21: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

search by event

Event ID

{Delivered, InTransit, Arrived}

21

Page 22: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

transaction trace

Correlation ID

612999958…

22

Page 23: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

How many distinct event types (state transitions) in

your application?

23

Page 24: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

24

Page 25: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

represent distinct states

25

Page 26: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

enum

Human-readable sets: unique values, sparse,

immutable

C#, Java, Python, node(Ruby, PHP, …)

26

Page 27: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

public enum EventID { // Badly-initialised logging data NotSet = 0, // An unrecognised event has occurred UnexpectedError = 10000, ApplicationStarted = 20000, ApplicationShutdownNoticeReceived = 20001, MessageQueued = 40000, MessagePeeked = 40001,

BasketItemAdded = 60001, BasketItemRemoved = 60002,

CreditCardDetailsSubmitted = 70001,

// ... }

27

Page 28: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

BasketItemAdded = 60001BasketItemRemoved = 60002

28

Page 29: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

log using Event IDs with a modern

‘structured logging’ library

29

Page 30: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

example:https://github.com/EqualExperts/opslogger

Sean Reilly@seanjreilly 30

Page 31: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Example: video processingOn-demand processing of TV and mobile streaming advertsAd-agency → TV broadcasterHigh throughputGlitch-free video & audio31

Page 32: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Storage I/O

Worker Job

Queue

Upload

32

Page 33: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

33

Page 34: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

34

Page 35: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Example: video processingDiscover processing bottlenecksTrigger alerts Report on KPIsTarget areas for improvement

35

Page 36: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Modern logging w/ Event IDsreduce time to detect problems

increase team engagement

increase configurability

enhance collaboration

36

Page 37: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Modern Logging: Collaborate on Event IDs and Correlation traces for better system awareness

37

Page 38: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Operational aspects not addressed, or addressed

too late in the cycle

38

Page 39: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Run Book dialogue sheets

Page 40: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Run Book dialogue sheetsChecklists for typical operational considerations

Team-friendly exploration

40

Page 41: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

runbooktemplate.infoRun Book dialogue sheets

41

Page 42: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

System characteristics

Hours of operation

During what hours does the service or system actually need to operate? Can portions or features of the system be unavailable at times if needed?

Hours of operation - core features

(e.g. 03:00-01:00 GMT+0)

Hours of operation - secondary features

(e.g. 07:00-23:00 GMT+0)

Data and processing flows

How and where does data flow through the system? What controls or triggers data flows?(e.g. mobile requests / scheduled batch jobs / inbound IoT sensor data )

… 42

Page 43: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

http://runbooktemplate.info/

43

Page 44: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

runbooktemplate.infoRun Book dialogue sheets

44

Page 45: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Run Book dialogue sheetsEarly discovery of operational requirementsInput to team backlog“Shift-left” testingAvoid operational problems

45

Page 46: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Run Book dialogue sheets: Collaborate on operational

requirements for better system awareness

46

Page 47: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

“Why has my deployment failed again?”

“Why is Pre-Prod always so flaky?”

47

Page 48: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Endpoint healthchecks

Page 49: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Endpoint healthchecksSimple HTTP check Common way to assess any service/app/componentKey operational requirement

49

Page 50: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

endpoint healthchecks

Every runnable app/service/daemon exposes /status/health

An HTTP GET to the endpoint returns:200 – "I am healthy"

500 – "I am sick" 50

Page 51: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

endpoint healthchecks

Each component is responsible for determining its own health and viability – this is very contextual

51

Page 52: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

endpoint healthchecks

Use JSON as a response type – parsable by both

machines and humans!

52

Page 53: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

53

Page 54: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

endpoint healthchecks

For databases and other non-HTTP components, run a lightweight HTTP

service in front of the component200 / 500 responses

54

Page 55: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Helper service

55

Page 56: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

https://github.com/Lugribossk/simple-dashboard

56

Page 57: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

57

Question:

What does this look like for Serverless?

¯\_(ツ)_/¯

Page 58: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Endpoint healthchecksRapid dashboarding and diagnosisReduce confusion around environment state“Fail fast” aka “learn sooner”

58

Page 59: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Endpoint healthchecks: Collaborate on component

health status for better system awareness

59

Page 60: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

“Which containers [servers] handled the

request?”

60

Page 61: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Correlation IDs

Page 62: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Correlation IDsUnique-ish identifiersTrace calls across machine & container boundariesRe-assemble HTTP request/response later

62

Page 63: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

‘Unique-ish’ identifier for each request

Passed through downstream layers

63

Page 64: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Unique-ish ID64

Page 65: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Synchronous HTTP:

X-HEADER e.g. X-trace-idX-trace-id: 348e1cf8

If header is present, pass it on

(Yes, RFC6648, but this is internal only)

65

Page 66: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Asynchonous (queues, etc.):

Message Attributes, name:value paire.g. "trace-id":"348e1cf8"

AWS SQS: SendMessage() / ReceiveMessage()

Log the Correlation ID if present

66

Page 67: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Example: OpenTracing / PCF

3 tracing elements:TraceID, SpanID, ParentSpan

"X-B3-TraceId" "X-B3-SpanId" "X-B3-ParentSpan"

67

Page 68: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Example: OpenTracing / PCF

Always log the TraceID as-isLog calling SpanID as ParentSpan

Log new SpanID

68

Page 69: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Trace

Span

ParentSpan

69

Page 70: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Correlation IDsDetect bottlenecks and unexpected interactionsIncrease transparencyLearn about the system

70

Page 71: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Correlation IDs: Collaborate on distributed tracing for better system

awareness71

Page 72: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Software is difficult to operate: poor UX for Ops.

72

Page 73: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lightweight user personas

Page 74: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lightweight User PersonasSimple characterisation of user needs for Dev/Test/OpsBased on full UX user personas but less detailed

74

Page 75: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lightweight user personas:

Ops EngineerTest Engineer

Build & Deployment EngineerService Owner

75

Page 76: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lightweight user personas:

Consider the User Experience (UX) of engineers and team members using

and working with the software

76

Page 77: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

http://www.keepitusable.com/blog/?tag=alan-cooper 77

Motivations

Goals

Frustrations

Page 78: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lightweight user personas:

What data does the User Persona need visible on a dashboard in order to make decisions rapidly & safely?

78

Page 79: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

https://www.geckoboard.com/blog/visualisation-upgrades-progressing-towards-a-more-useful-and-beautiful-dashboard/ 79

Page 80: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lightweight User PersonasEmpathise better with people from other rolesCapture missing operational requirements

80

Page 81: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lightweight User Personas: Collaborate on user needs

for better system awareness

81

Page 82: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Summary

82

Page 83: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Operability

making software work well in Production

83

Page 84: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

84

Page 85: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lack of observabilityOperational aspects not known“Why has deployment failed?”

What handled the request?Poor UX for Ops

85

Page 86: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Logging with Event IDs

use enum-based Event IDs to explore runtime behaviour and

fault conditions86

Page 87: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Run Book dialogue sheets

explore and establish operational requirements as a team, around a physical table,

together87

Page 88: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Endpoint healthchecks

HTTP 200 / 500 responses to /status/health call with JSON details – good for tools and

humans88

Page 89: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Correlation IDs

trace execution using correlation IDs:

synchronous (HTTP X-trace-id) async (SQS MessageAttribute)

89

Page 90: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Lightweight user personas

explore the UX and needs of different roles for rapid

decisions via dashboards90

Page 91: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

use modern logging, Run Book dialogue sheets, endpoint

healthchecks, correlation IDs, and user personas as

team collaboration techniques91

Page 92: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Team Guide to Software OperabilityMatthew Skelton & Rob Thatcher

skeltonthatcher.com/publications

Download a free sample chapter

92

Page 93: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

Resources•Team Guide to Software Operability by Matthew Skelton and Rob Thatcher (Skelton Thatcher Publications, 2016) http://operabilitybook.com/ •Run Book template & Run Book dialogue sheets http://runbooktemplate.info/

93

Page 94: confluxdigital.net / @ConfluxHQ Conflux Practical …continuouslifecycle.london/wp-content/uploads/2017/12/...Can portions or features of the system be unavailable at times if needed?

thank you

@ConfluxHQconfluxdigital.net