LEARNING FROM FAILURE 2 DR JOHN ROOKSBY

CS5032 Lecture 10: Learning from failure 2



IN THIS LECTURE

This lecture will focus on how accidents and serious incidents are investigated and analysed

• When do investigations happen?

• How are they conducted?

• What analytical methods are used?


INVESTIGATIONS

Accidents and failures need to be investigated.

• Investigations enable you to identify the (most likely) causes of accidents and failures

• The causes and conditions leading up to accidents and incidents are often complex, and immediate reactions may be wrong

• Investigations seek to uncover underlying causes, not just the immediate causes

• Investigations should also address how future accidents can be avoided

• In this context, investigations aim primarily to prevent future occurrences rather than to establish responsibility


INVESTIGATIONS

The basic steps of an investigation are

1. Collection phase: Evidence and facts are sought

2. Analysis phase: The evidence and facts are analysed and opinions invited from experts and other parties

3. Judgements phase: Judgements are made about the causes of an incident or accident and the associated responsibilities

4. Follow up: Recommendations should be made on how to stop similar problems happening again.

In practice, the process will be iterative.


INVESTIGATIONS

There are limitations on the investigation process. Investigations can be costly.

• We may never know all the facts. With complex systems, it can be very hard or impossible to know everything that happened in the run-up to an incident. In major accidents, sources of evidence may be damaged or lost.

• There will always be subjective views and uncertainties, especially around human actions.

Judgements need to be made about the extent to which an incident can be investigated.

Investigations often conclude with the “likely” or “probable” causes rather than a definitive version of events


WHO INVESTIGATES?

The scope and emphasis of an investigation are likely to reflect the position of the investigator.

An investigator ought to be independent.

• In practice, this can be hard to achieve.

• Some industries have an official, independent investigation organisation.

• In the event of a major incident, a ‘public enquiry’ may be used, in which the evidence and investigative process are made public and so open to scrutiny.


ANALYSIS

The analysis phase of an investigation needs to explore and evaluate often complex information.

Experts and specialists may need to be involved at this point.

There is no standard method for analysing an accident, and there is continuing debate about how this is best done.

Approaches include

• Narrative approaches

• Causal chains

• Systems approaches


NARRATIVE APPROACHES

All accident investigations will produce a narrative of some kind. Many reports are purely a narrative and a set of conclusions. A narrative is a written account of an incident or accident.

• Producing this can be non-trivial because it can be difficult to structure events, many of which may have occurred simultaneously and many of which may have ambiguities, into a linear document.

Producing a narrative is a key step in making sense of an incident

However, narrative accounts have serious limitations. It is difficult to evaluate their depth and coverage, and they tend to ‘storify’ complex events.


“ROOT CAUSE” APPROACHES

Many approaches have been developed to systematically identify the root causes of an incident. These approaches are based on the idea that the immediate events in an incident are symptoms of a much deeper problem.

Root cause analysis techniques usually express events as a chain. The chains often branch, and multiple chains can be synchronised to represent parallel events.

• Examples: MORT (Management Oversight and Risk Tree), FMEA (Failure Mode and Effects Analysis), Barrier Analysis, WBA (Why-Because Analysis)
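The idea of branching causal chains can be made concrete with a small sketch. The events and links below are entirely hypothetical, and the code is not an implementation of MORT, FMEA, or WBA; it only assumes that an incident has been recorded as a list of (cause, effect) pairs, and walks backwards from the outcome to the events that have no recorded cause.

```python
from collections import defaultdict

# Hypothetical causal links, written as (cause, effect). None of these
# events come from a real investigation.
CAUSAL_LINKS = [
    ("maintenance budget cut", "inspection skipped"),
    ("inspection skipped", "worn valve not replaced"),
    ("worn valve not replaced", "valve fails"),
    ("alarm threshold set too high", "operator not alerted"),
    ("valve fails", "tank overflows"),
    ("operator not alerted", "tank overflows"),
]


def causes_by_effect(links):
    """Index the links so each effect maps to its direct causes."""
    index = defaultdict(list)
    for cause, effect in links:
        index[effect].append(cause)
    return index


def root_causes(links, outcome):
    """Walk backwards from the outcome; return events with no recorded cause."""
    index = causes_by_effect(links)
    roots, seen, stack = set(), set(), [outcome]
    while stack:
        event = stack.pop()
        if event in seen:
            continue
        seen.add(event)
        direct = index.get(event, [])
        if not direct:
            roots.add(event)  # nothing recorded upstream of this event
        stack.extend(direct)
    return sorted(roots)


print(root_causes(CAUSAL_LINKS, "tank overflows"))
# prints ['alarm threshold set too high', 'maintenance budget cut']
```

Two chains converge on the same outcome here, so the sketch reports two candidate root causes rather than one. What counts as a "root" depends entirely on which links the investigator chose to record.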


“ROOT CAUSE” APPROACHES - LIMITATIONS

The stopping problem

• A causal chain could in theory go backwards indefinitely.

The proximity problem

• A root cause is often found to be something proximal to the accident (often a human operator).

The causation problem

• Hindsight and investigative biases frame particular actions in terms of their contributions to an outcome.

However, this does not mean that it is wrong to try to identify underlying causes

Investigations usually refer to the “likely” or “probable” root causes
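The stopping and proximity problems can be illustrated with a minimal sketch. The chain below is hypothetical, and `trace_back` and `max_steps` are names invented for this example; the point is only that the event reported as the "root cause" depends on how far back the analyst chooses to go.

```python
# Hypothetical single causal chain: each event maps to its direct cause.
CAUSE_OF = {
    "tank overflows": "valve fails",
    "valve fails": "worn valve not replaced",
    "worn valve not replaced": "inspection skipped",
    "inspection skipped": "maintenance budget cut",
}


def trace_back(outcome, max_steps):
    """Follow the chain backwards for at most max_steps links."""
    event = outcome
    for _ in range(max_steps):
        if event not in CAUSE_OF:
            break  # the recorded chain ends here; the real one may not
        event = CAUSE_OF[event]
    return event


for steps in (1, 2, 4):
    print(steps, "->", trace_back("tank overflows", steps))
# prints:
# 1 -> valve fails
# 2 -> worn valve not replaced
# 4 -> maintenance budget cut
```

Stopping after one step blames something proximal to the accident, while going further shifts attention to organisational factors. Nothing in the chain itself says which stopping point is correct.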


SYSTEMS METHODS

Systems methods for accident analysis have come into use over the last decade.

• From this perspective, accidents result from inadequate control or enforcement of safety-related constraints on the development, design, and operation of the systems.

Systems methods emphasise controls over the system itself. This recognises that no system is inherently safe, and that systems (particularly socio-technical systems) adapt and change over time.

• A key approach is STAMP (Systems-Theoretic Accident Model and Processes).


SYSTEMS METHODS

Key criticisms of systems models

• They are often used as a means of pursuing and attributing blame to high level people in an organisation

• They can turn attention too far away from the actual design and implementation of the technology


HINDSIGHT AND FORESIGHT

It is essential to learn from our mistakes, but we should not wait for accidents to happen before we try to improve the dependability of systems. How can we predict problems that may occur? How can we ensure systems are resilient to possible problems?

Several of the methods mentioned in this lecture can be used to follow through the consequences of possible problems or failures.

Predicting possible causes and consequences of failure, unless in very narrow circumstances, can involve many arbitrary decisions.
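Following through the consequences of a possible failure can be sketched as a forward search over a dependency map. The components and dependencies below are hypothetical, and this is a simplified illustration rather than any of the named methods: it assumes that a failure always propagates to every direct dependent, which is exactly the kind of arbitrary decision mentioned above.

```python
from collections import deque

# Hypothetical dependencies: each component maps to the components that
# rely on it directly.
DEPENDENTS = {
    "power supply": ["sensor", "controller"],
    "sensor": ["controller"],
    "controller": ["valve actuator", "alarm"],
    "valve actuator": [],
    "alarm": [],
}


def affected_by(failed):
    """Breadth-first search for everything downstream of a failed component."""
    affected, queue = [], deque([failed])
    seen = {failed}
    while queue:
        component = queue.popleft()
        for dependent in DEPENDENTS.get(component, []):
            if dependent not in seen:
                seen.add(dependent)
                affected.append(dependent)
                queue.append(dependent)
    return affected


print(affected_by("sensor"))
# prints ['controller', 'valve actuator', 'alarm']
```

Changing the assumed dependencies, or assuming that some components tolerate an upstream failure, would change the predicted consequences.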


COLUMBIA

http://en.wikipedia.org/wiki/Space_Shuttle_Columbia


INVESTIGATION

The Columbia Accident Investigation Board was an independent board set up to analyse the Columbia disaster.

• 13 board members and many investigators

• Investigation took around 5 months

• Cost approximately 17 million dollars

• 230-page report produced

The proximal cause was fairly clear from the outset. The investigation sought to focus on underlying causes.

• The investigation focused on organisational, historical, budgetary and political factors in the shuttle programme

• The key questions concerned the fact that foam strikes had been routinely ignored


KEY POINTS

Investigations are important for learning from failures.

Investigations often show that initial assumptions about the cause of an incident are wrong or partial. They aim to find underlying or “root” causes.

All investigations involve some sort of judgement. Investigations should be as neutral as possible, but in practice this is difficult to achieve.

There are many methods for analysing an incident or accident, and no single right way to do this.


FURTHER READING

MAIB – Marine Accident Investigation Branch

• http://www.maib.gov.uk/home/index.cfm

AAIB – Air Accidents Investigation Branch

• http://www.aaib.gov.uk/home/index.cfm

CAIB – Columbia Accident Investigation Board

• http://caib.nasa.gov/