Problem Management Familiarisation Training Michael Hall Real-World IT www.real-worldit.com


© Real-World IT 2014 All rights reserved

What We Cover
• Incidents and Problems – What’s the Difference?
• Definition of Problem Management
• Difference from Incident Management
• Keys to Success
• Advantages of structured problem solving
• How we work together – the ‘Rules of Engagement’
• How we run problems - logistics
• How we run problems - structure
• Appendix
  ◦ Solving problems as a group
  ◦ Benefits to the business and IT teams
  ◦ Key Performance Indicators


Incidents and Problems – What’s the Difference?


ITIL® 2011 defines an incident as:
• An unplanned interruption to a business service,
• A reduction in the quality of a service, or
• The failure of a CI that has not yet impacted a service
(2011, Service Operations, p 72)

While a problem is defined as:
• The underlying cause of one or more incidents
(2011, Service Operations, p 97)

In other words:
• Incidents stop services being useful
• Problems are why they happen

ITIL® is a Registered Trade Mark of AXELOS Limited


So what is Problem Management?
ITIL® 2011 defines the objectives of Problem Management as:
• “Prevent problems and resulting incidents from happening
• Eliminate recurring incidents
• Minimise the impact of incidents that cannot be prevented”
(2011, Service Operations, p 97)

In other words:
• Finding the cause and
• Fixing it so it cannot happen again
• Will make a difference to stability


How is it different from Incident Management?
Incident management is focused on restoring service
• Reactive by nature
• Minimise time to restore
• Minimise business impact
• But by itself does not reduce the long term incidence of service interruptions

Problem management is focused on identifying the root cause
• Establish the real reason for the incident
• Execute a plan to fix the cause permanently


What Makes Problem Management Successful?
Keys to success:
• Good handover from Incident Mgmt
• Structured investigation methods
• Key staff collaborate on investigation
• Confirm root cause, then work out how to fix it
• Fully costed solution options
• Only proceed with fixes when approved
• Check solution really has fixed the cause
• Only close problems when definitely fixed
• Report on success – what won’t happen again

[Diagram labels: Problem Detection; Problem Investigation; Error Resolution; Review and Closure; Root Cause Confirmed]


Why Adopt Structured Problem Solving?
Use structured methods to improve
• Speed to root cause - standard approach
• Consistency - based on evidence
• Certainty that real causes are found
• Collaboration – teams know what to expect

Repeatable process* used every time:
• Define the problem precisely
• Use rapid analysis first for root cause
  ◦ Why did that object have that fault?
  ◦ Repeat until cause is clear – 4 to 6 questions
• If rapid analysis does not reveal cause, move to IS/BUT NOT and possible causes
  ◦ Identify more about the problem
  ◦ Find possible causes
  ◦ Test each logically to confirm true cause
• Decide how to fix the problem
  ◦ Develop options and choose most effective
  ◦ Confirm actions and costs with customer
• Implement the solution
• Verify the problem has been eliminated

*This is KEPNERandFOURIE, insert your preferred method here

[Chart: Average time to root cause, in days; the only values that survive from the figure are "06" and "62%"]

Clear communication to customers of what happened and why, plus how and when a permanent fix will be deployed


How we work together - the ‘Rules of Engagement’
Assign no blame
• What happened and why?
• What can be done to stop it happening again?
• Human error? No such thing!

Attend problem sessions
• Problem solving as a group of experts
• Clear statement of the problem
• Gather all the facts
• List all possible causes

Suspend Judgement
• Keep an open mind
• Assume you do not know cause
• Fit theories to facts, not the other way around

Make problem tasks a priority
• Take responsibility for your tasks
• Make your management aware
• Raise conflicts to the problem manager
• Don’t close tasks without review and approval


How we run problem investigations - logistics
Problem investigations are usually run as bridge calls. Join the bridge when asked. Who will be engaged is usually negotiated in advance with your management.

Bridge Lines
• Dial-in details will always be in the invitation

Timing
• For major problems, problem management should hold the initial call within 24 hours of service restoration
• Aim is to keep the evidence fresh in people’s minds
• As many follow-up calls as necessary are held to get to cause and then resolution

Reporting
• Problem management has a commitment to produce regular progress reports
• The first is published immediately after the first analysis call is held
• Follow-up reports published as required – regularly while root cause is being investigated, then at significant milestones until resolved
• The problem investigation team is always copied on these reports
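The 24-hour timing rule for major problems can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the 24 hours are measured in elapsed clock time from service restoration, which the slide does not specify.

```python
from datetime import datetime, timedelta

# Assumption: the 24-hour window is elapsed clock time, not business hours.
INITIAL_CALL_WINDOW = timedelta(hours=24)


def initial_call_deadline(service_restored_at: datetime) -> datetime:
    """Latest time the first problem call should be held for a major problem."""
    return service_restored_at + INITIAL_CALL_WINDOW


def initial_call_on_time(service_restored_at: datetime, call_held_at: datetime) -> bool:
    """True if the call happened after restoration and inside the window."""
    deadline = initial_call_deadline(service_restored_at)
    return service_restored_at <= call_held_at <= deadline


# Hypothetical example: service restored on a Monday afternoon.
restored = datetime(2014, 3, 3, 14, 30)
print(initial_call_deadline(restored))
print(initial_call_on_time(restored, datetime(2014, 3, 4, 9, 0)))
```

If your organisation counts the window in business hours instead, only `initial_call_deadline` would need to change.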


How we run problem investigations – structure – 1
All problem investigations have a standard agenda that we always follow. The aim is to confirm root cause as quickly as possible – ‘Root Cause Analysis’ - then to decide what the permanent fix should be – ‘Problem Resolution’ (also called ‘Error Resolution’).

Define the problem clearly
• Start from technical cause, if found during incident response. If not, determine technical cause first
• Make sure this statement is about one object with one fault
• This statement is always an event in time

Use a rapid analysis strategy first to determine root cause, using a cascade of questions:
• Ask ‘Why did that object have that fault?’
• Repeat until the underlying cause is clear (usually four to six questions does it)
• Remember that the test for a root cause is:
  ◦ ‘If I fix this, will it stop repeats of the incident?’
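The cascade of questions above can be sketched as a simple loop. This is a minimal illustration, not part of the original training material: the `Answer` type, the `rapid_analysis` function and the example answers are all hypothetical, and the limit of six questions is an assumption taken from the upper end of "four to six".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: stop after six questions, the upper end of "four to six".
MAX_QUESTIONS = 6


@dataclass
class Answer:
    cause: str               # the team's answer to the current "why" question
    fix_stops_repeats: bool  # result of the root cause test for this answer


def rapid_analysis(
    problem_statement: str,
    answers: List[Answer],
    max_questions: int = MAX_QUESTIONS,
) -> Tuple[Optional[str], List[str]]:
    """Walk the cascade; return (root cause or None, questions asked)."""
    questions: List[str] = []
    subject = problem_statement
    for answer in answers[:max_questions]:
        questions.append(f"Why did this happen: {subject}?")
        # Root cause test: 'If I fix this, will it stop repeats of the incident?'
        if answer.fix_stops_repeats:
            return answer.cause, questions
        subject = answer.cause  # the answer becomes the next thing to question
    return None, questions  # no root cause found: move to deeper analysis


# Hypothetical example, one object with one fault.
root, asked = rapid_analysis(
    "the order database ran out of disk space",
    [
        Answer("the nightly archive job did not run", False),
        Answer("the job's service account was locked", False),
        Answer("service account passwords expire with no renewal reminder", True),
    ],
)
print(root)
print(len(asked))
```

A return value of `None` is the signal to move on from rapid analysis to a more detailed method.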


How we run problem investigations – structure – 2
If rapid analysis does not reveal root cause easily, move on to an ‘IS/BUT NOT’ and possible causes analysis:
• Use the KEPNERandFOURIE method to ask 8 questions about the problem.
• For each question, also quickly brainstorm possible causes before moving to the next. Do not judge or try to filter at this point. Suspend judgement and record all suggestions.
• Select the most likely of the possible causes for verification (usually up to 4 or 5).
• Test each logically using the KEPNERandFOURIE technique
• The cause that meets all criteria is the most likely root cause
• Remember that there can also be contributing factors and multiple causes that occur together to cause the problem

Note: This structure is based on the KEPNERandFOURIE™ methodology. Substitute the methodology as required
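The idea of testing each possible cause logically against the recorded facts can be illustrated with a small sketch. This is a generic illustration only and is not the KEPNERandFOURIE technique itself: the eight questions are not reproduced here, and the `PossibleCause` type, the `check_causes` function and the example facts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PossibleCause:
    description: str
    # Maps each recorded fact to the team's judgement:
    # True if this cause is consistent with the fact.
    consistent_with: Dict[str, bool] = field(default_factory=dict)


def check_causes(
    facts: List[str], causes: List[PossibleCause]
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Separate causes that fit every fact from those that fail at least one."""
    confirmed: List[str] = []
    rejected: List[Tuple[str, List[str]]] = []
    for cause in causes:
        # A fact with no recorded judgement counts as not yet explained.
        failed = [f for f in facts if not cause.consistent_with.get(f, False)]
        if failed:
            rejected.append((cause.description, failed))
        else:
            confirmed.append(cause.description)
    return confirmed, rejected


# Hypothetical IS / BUT NOT facts gathered during the session.
facts = [
    "IS: web server A drops sessions",
    "BUT NOT: web server B, same build",
    "IS: started after Tuesday's change",
]
causes = [
    PossibleCause("config change applied to A only", {f: True for f in facts}),
    PossibleCause(
        "shared load balancer fault",
        {facts[0]: True, facts[1]: False, facts[2]: True},
    ),
]
confirmed, rejected = check_causes(facts, causes)
print(confirmed)
print(rejected)
```

More than one cause can end up in `confirmed`, which matches the point above that several causes can occur together.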


How we run problem investigations – structure – 3
Once cause is found and confirmed, you are half way there. The next step is to develop a solution and implement it.

Develop a solution for the cause or causes
• Develop options and decide the best and most cost-effective
• Obtain approval for implementation, including spend, timing and resources

Implement the solution
• Track any implementation steps and dates until the solution is in place

Verify the problem has been eliminated
• Has the solution prevented future incidents?

Report success to our customers
• Make sure people know that the problem is solved


Appendix
• Solving problems as a group
• Benefits to the business and IT teams
• Key Performance Indicators


Why solve problems as a group?
Evidence shows that group problem-solving is more effective
• More effective than individual efforts or investigations run by groups of like-minded individuals.
• Diversity in perspective and ways of thinking leads to better outcomes than even the best problem solvers achieve alone.
• Humans are ‘good at producing convincing arguments, but we are also adept at puncturing other people’s faulty reasoning’

Mix of ‘insiders and outsiders’ critical
• Too many insiders leads to uniform thinking, while too many outsiders dampens the free exchange of ideas
• Ignoring the ‘outside view’ limits our thinking
• Diverse groups generate many more interesting ideas to help solve problems

So take advantage of it!

References: Hong and Page (2004), Jones D (2012), Kahneman (2012)


What are the Business Benefits?
Problem management increases the overall stability of systems supporting business operations

Directly
• Fixing problems – finding root causes and resolving them
• Not just visible problems – potential problems as well
• Reducing the number of recurring incidents

Indirectly
• Influence design of applications and infrastructure
• Improved audit and regulatory compliance – effective governance seen to be done

Cross silo and cross regional resolutions
• Problems addressed comprehensively
  ◦ Identify all instances that could be affected
  ◦ Resolution plans with global reach

Improve visibility of problems
• Regular reporting to senior management
  ◦ Accountability for driving improvement
• KPIs in place to measure success
  ◦ Make performance visible to all


And what are the Benefits for IS Teams?
Improved stability
• Gets rid of recurring problems
• Reduces fire-fighting – more time for “value-add” work

Framework for problem solving
• Gives more confidence when attacking problems
• Makes engaging the right people much easier
• Clarifies which problems are being worked on
• And who owns each problem
• Results in happier customers and colleagues

Satisfaction
• Finds the cause without witch hunts or blame
• Quickly shares knowledge about problems for everyone’s benefit
• Makes working together across teams much easier
• Delivers a higher success rate and a faster turn around
• Automates reporting to replace tedious manual work


Measuring the difference we make – Key Performance Indicators (KPIs)

% of root cause being found:
• 95% of root causes will be found

% root cause found within 5 working days:
• Root cause will be found within 5 working days 80% of the time

Number of recurring problems:
• Zero recurring problems

% of problems resolved within agreed time frame:
• 90% of problems will be resolved in the time frame agreed by management

% reduction in incidents from agreed baseline:
• 25% reduction in incidents per period

These KPIs will go live with looser targets that will tighten as the problem management implementation progresses
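As a worked illustration, the five KPIs above could be calculated from a list of problem records along the following lines. This is a sketch under stated assumptions: the `ProblemRecord` fields, the function names and the sample figures are hypothetical, and both percentage-of-root-cause KPIs are assumed to use all problems in the period as the denominator, which the slide does not state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Targets as listed on the slide (the deck notes they start looser and tighten).
TARGETS = {
    "root_cause_found_pct": 95.0,
    "root_cause_within_5_days_pct": 80.0,
    "recurring_problems": 0,
    "resolved_in_agreed_time_pct": 90.0,
    "incident_reduction_pct": 25.0,
}


@dataclass
class ProblemRecord:
    days_to_root_cause: Optional[int]  # working days; None if not found
    resolved_in_agreed_time: bool
    recurred: bool


def pct(part: float, whole: float) -> float:
    """Percentage, returning 0.0 when there is nothing to measure."""
    return 100.0 * part / whole if whole else 0.0


def kpi_report(
    problems: List[ProblemRecord],
    incidents_baseline: int,
    incidents_this_period: int,
) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """Return (KPI values, whether each KPI meets its target)."""
    total = len(problems)
    found = [p for p in problems if p.days_to_root_cause is not None]
    within_5 = [p for p in found if p.days_to_root_cause <= 5]
    values = {
        "root_cause_found_pct": pct(len(found), total),
        # Assumption: denominator is all problems, not only those with a cause.
        "root_cause_within_5_days_pct": pct(len(within_5), total),
        "recurring_problems": sum(p.recurred for p in problems),
        "resolved_in_agreed_time_pct": pct(
            sum(p.resolved_in_agreed_time for p in problems), total
        ),
        "incident_reduction_pct": pct(
            incidents_baseline - incidents_this_period, incidents_baseline
        ),
    }
    met = {}
    for name, value in values.items():
        # Recurring problems is a ceiling; every other KPI is a floor.
        if name == "recurring_problems":
            met[name] = value <= TARGETS[name]
        else:
            met[name] = value >= TARGETS[name]
    return values, met


# Hypothetical period: 20 problems, 200 baseline incidents, 150 this period.
sample = (
    [ProblemRecord(3, True, False)] * 16
    + [ProblemRecord(8, True, False)] * 2
    + [ProblemRecord(9, False, False)]
    + [ProblemRecord(None, False, False)]
)
values, met = kpi_report(sample, incidents_baseline=200, incidents_this_period=150)
print(values)
print(all(met.values()))
```

Keeping the targets in one dictionary makes it straightforward to start with looser values and tighten them as the implementation progresses.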
