© Erik Hollnagel, 2014
Introduction to FRAM: A Method and its Principles
Professor Erik Hollnagel
E-mail: [email protected]
University of Southern Denmark, Institute of Regional Health Research, Odense, Denmark
Region of Southern Denmark, Centre for Quality, Middelfart, Denmark
Models and methods
An analysis of something inevitably involves some assumptions about how that something happens. These assumptions correspond to a model: a simplified explanation of how something can happen and of how the ‘world’ is organised. The organisation usually implies some kind of hierarchical ordering of layers, parts, or components (structural models).
The model defines what the method can be used for, and therefore also sets the limits of the method.
The FRAM is a method to develop a representation or model of how something happens. This model can then be the basis for various kinds of analyses (reactive, proactive). A FRAM model represents the functions that are necessary and sufficient for an activity to take place: not when it goes wrong, but when it goes right.
The causality credo
Adverse outcomes happen because something has gone wrong. Adverse outcomes therefore have causes, which can be found and treated.
All accidents can be prevented (zero harm).
Find the component that failed by reasoning backwards from the final consequence.
Accidents result from a combination of active failures (unsafe acts) and latent conditions (hazards).
Find the probability that components “break”, either alone or in simple combinations.
Look for combinations of failures and latent conditions that may constitute a risk.
Accident investigation Risk analysis
Common assumptions (~ 1970)
The failure probability of elements can be analysed/described individually
The order or sequence of events is predetermined and fixed
When combinations occur they can be described as linear (tractable, non-interacting)
The influence from context/conditions is limited and quantifiable
The function of each element is bimodal (true/false, work/fail)
Systems can be decomposed into meaningful elements (components, events)
Revised assumptions - 2014
While many adverse events can be attributed to failures and malfunctions of everyday functions, many others must be understood as the result of combinations of variability of everyday performance.
Risk and safety analyses should acknowledge the importance of variability of everyday performance and how this creates conditions that may lead to both positive and adverse outcomes.
Outcomes are determined by relations rather than by factors, i.e., by performance variability rather than by failure probability.
The function of the system is not bimodal, but everyday performance is – and must be – variable.
Systems cannot be decomposed in a meaningful way (no natural elements or components)
[Figure: FRAM model in which each function is drawn as a hexagon with the six aspects I, P, C, O, R, T. Functions: Certification (FAA), Lubrication, Maintenance oversight, End-play checking, Jackscrew replacement, Jackscrew up-down movement, Horizontal stabilizer movement, Limiting stabilizer movement, Aircraft pitch control. Other labels: Mechanics, High workload, Grease, Interval approvals, Expertise, Equipment, Procedures, Aircraft design, Aircraft design knowledge, Redundant design, Aircraft, Controlled stabilizer movement, Limited stabilizer movement, Allowable end-play, Excessive end-play.]
Principles for FRAM
I. The principle of equivalence of successes and failures.
II. The principle of approximate adjustments.
III. The principle of emergence.
IV. The principle of functional resonance.
I: Equivalence of successes and failures
Failure is normally explained as a breakdown or malfunctioning of a system and/or its components. This view assumes that success and failure are of a fundamentally different nature (the ‘hypothesis of different causes’).
Resilience Engineering and Safety-II recognise that individuals and organisations must adjust to the current conditions in everything they do. Because information, resources and time are always finite, the adjustments will always be approximate.
[Figure: performance adjustments lead to either acceptable or unacceptable outcomes.]
Acceptable outcomes: performance variability (approximate adjustments) is the reason why everyday work is safe and effective.
Unacceptable outcomes: performance variability (approximate adjustments) is also the reason why things sometimes go wrong.
Actions and “errors”
1. An action is chosen to fit the current situation.
2. An action that leads to the expected outcome is seen as a correct action.
3. An action that leads to unexpected outcomes is classified as an “error”.
4. In hindsight, the alternative “correct” action is identified.
The difference can be difficult to define objectively.
[Figure: an action leading to an expected or an unexpected outcome; outcome of previous action.]
"Knowledge and error flow from the same mental sources, only success can tell one from the other." (Ernst Mach, 1838-1916)
II: Approximate adjustments
Availability of resources (time, manpower, materials, information, etc.) may be limited and uncertain.
People adjust what they do to match the situation.
Because of resource limitations, performance adjustments will always be approximate.
Performance variability is inevitable, ubiquitous, and necessary.
Performance variability is the reason why everyday work is safe and effective.
Performance variability is also the reason why things sometimes go wrong.
Efficiency-Thoroughness Trade-Off
[Figure: a balance between time & resources needed and time & resources available.]
Thoroughness: time to think. Recognising the situation. Choosing and planning.
If thoroughness dominates, there may be too little time to carry out the actions (neglect pending actions, miss new events).
Efficiency: time to do. Implementing plans. Executing actions.
If efficiency dominates, actions may be badly prepared or wrong (miss pre-conditions, look for expected results).
Some ETTO heuristics
Idiosyncratic (work related):
Looks fine
Not really important
Normally OK, no need to check
Will be checked by someone else
Has been checked by someone else
No time (or resources) to do it now
Can’t remember how to do it
We always do it this way
We must get this done
Must be ready in time
Must not use too much of X
I’ve done it millions of times before
This way is much quicker
It looks like X (so it probably is X)
Cognitive (individual):
Judgement under uncertainty
Cognitive primitives (similarity matching, frequency gambling)
Reactions to information input overload and underload
Cognitive style
Confirmation bias
Collective (organisation):
Negative reporting
Reduce redundancy
Meet “production” targets
Reduce unnecessary cost
Double-bind
Reject conflicting information
The wet floor
A mill employee slipped and fell on a wet floor and fractured his kneecap. For more than six years it had been the practice to wet down too great an area of floor space at one time and to delay unnecessarily the process of wiping up.
Slipping on the part of one or more employees was a daily occurrence. The ratio of no-injury slips to the injury was 1,800 to 1. (Heinrich, 1931)
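Read as a per-slip frequency, Heinrich's ratio shows how rarely the same everyday practice produced an injury. This assumes that every slip, injurious or not, is counted once:

```latex
P(\text{injury} \mid \text{slip}) \approx \frac{1}{1800 + 1} = \frac{1}{1801} \approx 5.6 \times 10^{-4} \;(\approx 0.056\%)
```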
III: Principle of emergence
The variability of normal performance is rarely large enough to be the cause of an accident in itself or even to constitute a malfunction. However, the variability from multiple functions may combine in unexpected ways, leading to consequences that are disproportionately large, i.e., to non-linear effects. Both failures and normal performance are emergent rather than resultant phenomena, because neither can be attributed to or explained only by referring to the (mal)functions of specific components or parts.
Socio-technical systems are intractable because they change and develop in response to conditions and demands. It is therefore impossible to know all the couplings in the system, hence impossible to anticipate more than the regular events. The couplings are mostly useful, but can also constitute a risk.
The small world problem
Stanley Milgram (1933-1984)
Travers & Milgram (1969). An experimental study of the small world problem. Sociometry, 32(4), 425-443.
What is the probability that any two persons, selected arbitrarily from a large population, will know each other, or be linked via common acquaintances?
A “target person” (Boston) and three groups of “starting persons” were selected (Nebraska: n=196; Boston: n=100; 296 in total). The target was identified by name, address, occupation, place of work, college and graduation year, military service, wife’s maiden name, and hometown. Each starter was given a document and asked to move it by mail toward the target via a first-name acquaintance, who was in turn asked to repeat the procedure.
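The question is about path length in an acquaintance network. If the whole network were known (it was not known to Milgram's participants, who forwarded the document using local judgement), the shortest chain between two people could be found with breadth-first search. The sketch below assumes such a known network; the toy network and all names in it are hypothetical.

```python
from collections import deque


def acquaintance_chain(graph, start, target):
    """Shortest chain of acquaintances from start to target (breadth-first search).

    graph maps each person to the people they know on a first-name basis.
    Returns the chain as a list of names, or None if no chain exists.
    """
    previous = {start: None}  # person -> who introduced them in the search
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == target:
            chain = []
            while person is not None:  # walk back to the starter
                chain.append(person)
                person = previous[person]
            return chain[::-1]
        for friend in graph.get(person, ()):
            if friend not in previous:
                previous[friend] = person
                queue.append(friend)
    return None


# Hypothetical acquaintance network, invented for illustration.
acquaintances = {
    "starter_nebraska": ["neighbour", "colleague"],
    "neighbour": ["starter_nebraska", "cousin_in_boston"],
    "colleague": ["starter_nebraska", "broker"],
    "cousin_in_boston": ["neighbour", "target_boston"],
    "broker": ["colleague", "target_boston"],
    "target_boston": ["cousin_in_boston", "broker"],
}
chain = acquaintance_chain(acquaintances, "starter_nebraska", "target_boston")
print(" -> ".join(chain), f"({len(chain) - 1} links)")
```

The chain found by the search is the shortest one that exists; chains produced by people forwarding mail with only local knowledge can be longer.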
Stable vs. transient causes
Final effects are (relatively) stable changes to some part of the system.
Effects are ‘real.’
Causes are assumed to be stable. Causes can be ‘found’ by backwards tracing from the effect. Causes are ‘real.’
Causes can be associated with components or functions that in some way have ‘failed.’ The ‘failure’ is either visible after the fact, or can be deduced from the facts.
Stable vs. transient causes
Final outcomes are (relatively) stable changes to some part of the system.
Effects are ‘real.’
Causes represent a pattern that existed at one point in time. But they are inferred rather than ‘found.’ Causes are ‘elusive.’
Outcomes ‘emerge’ from transient (short-lived) intersections of conditions and events.
Outcomes cannot be traced back to specific components or functions. Outcomes are emergent because the conditions that can explain them were transient.
IV: Resonance
[Figure, three panels plotted against time: natural oscillation (natural frequency, fixed amplitude); forcing function with the same frequency as the natural oscillation; natural oscillation + forcing function (resonance: same frequency but increased amplitude).]
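The resonance panels can be reproduced numerically with the textbook model of a damped, periodically forced oscillator. This is a minimal sketch under that assumption (the slides give no equation); `omega0` (natural frequency), `zeta` (damping ratio) and `f0` (forcing amplitude per unit mass) are illustrative parameters, not values from the source. It evaluates the steady-state amplitude for forcing at several frequencies.

```python
import math


def steady_state_amplitude(omega: float, omega0: float = 1.0,
                           zeta: float = 0.05, f0: float = 1.0) -> float:
    """Steady-state amplitude of x'' + 2*zeta*omega0*x' + omega0**2 * x = f0*cos(omega*t).

    Assumed textbook model (not from the slides): omega is the forcing frequency,
    omega0 the natural frequency, zeta the damping ratio, f0 the forcing amplitude
    per unit mass.
    """
    detuning = omega0 ** 2 - omega ** 2    # mismatch between forcing and natural frequency
    damping = 2.0 * zeta * omega0 * omega  # energy lost per cycle
    return f0 / math.sqrt(detuning ** 2 + damping ** 2)


# Same forcing strength, different forcing frequencies.
for omega in (0.5, 0.9, 1.0, 1.1, 2.0):
    print(f"forcing frequency {omega:.1f}: amplitude {steady_state_amplitude(omega):.2f}")
```

When the forcing frequency equals the natural frequency, the amplitude is `f0 / (2 * zeta * omega0**2)`, i.e. 10 with the defaults; lighter damping gives a higher and sharper peak.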
Stochastic resonance
[Figure, plotted against time: a weak signal below the detection threshold; random noise; the mixed signal + random noise, which crosses the detection threshold.]
Stochastic resonance is the enhanced sensitivity of a device to a weak signal that occurs when random noise is added to the mix.
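A threshold detector makes the definition concrete. The sketch below assumes a sine wave whose amplitude (0.8) stays below a detection threshold of 1.0, plus Gaussian noise with standard deviation `noise_sd`; all numbers are illustrative assumptions, not values from the slides. It counts threshold crossings separately for the positive half-cycles of the signal and for the rest.

```python
import math
import random


def threshold_crossings(noise_sd: float, threshold: float = 1.0, amplitude: float = 0.8,
                        n_samples: int = 2000, period: int = 200, seed: int = 1):
    """Count samples where (weak sine signal + Gaussian noise) exceeds the threshold.

    Returns (hits while the signal is positive, hits while it is zero or negative).
    All parameters are illustrative assumptions, not values from the slides.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible run
    hits_positive = hits_other = 0
    for t in range(n_samples):
        signal = amplitude * math.sin(2 * math.pi * t / period)
        mixed = signal + rng.gauss(0.0, noise_sd)
        if mixed > threshold:
            if signal > 0:
                hits_positive += 1
            else:
                hits_other += 1
    return hits_positive, hits_other


for sd in (0.0, 0.3, 3.0):
    print(f"noise sd {sd}: crossings (signal > 0, signal <= 0) = {threshold_crossings(sd)}")
```

Without noise the detector never fires, because the signal alone never reaches the threshold. With moderate noise (0.3) nearly all crossings fall in the positive half-cycles, so the weak signal becomes visible. With heavy noise (3.0) crossings are frequent in both halves and the signal is swamped again.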
Functional resonance
[Figure: performance variability of several functions plotted against time.]
For each function, the others constitute the environment.
Every function has a normal, weak variability.
The pooled variability of the “environment” may lead to resonance, hence to a noticeable “signal”.
Functional resonance is the detectable signal that emerges from the unintended interaction of the normal variabilities of many signals.
Tacoma Narrows Bridge
Opened July 1, 1940
Collapsed November 7, 1940
London Millennium Bridge
Opened June 10, 2000
Closed June 12, 2000. Reason: the bridge swayed severely as people walked across it.
Reopened after reconstruction, January 2002
FRAM analysis steps
0. Define the purpose of modelling and describe the situation being analysed: an event that has occurred (incident/accident), a possible future scenario (risk), or the consequences of a design/modification.
1. Identify the essential functions in the event ('foreground' functions) as they are when things go right, and characterise each using the six basic aspects.
2. Complete the FRAM model by ensuring that all defined aspects are described for at least two functions (as Output and as [Input, Precondition, Resource, Control, Time]).
3. Describe the actual/potential variability of 'foreground' functions and 'background' functions (context). Identify functional resonance based on potential/actual dependencies (couplings) among functions.
4. Propose ways to monitor and dampen performance variability (indicators, barriers, design/modification, etc.).
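Step 2 can be checked mechanically once the aspects are written down. The sketch below makes an assumption about representation and is not the FMV's file format: a model is a dict mapping each function name to its aspects (`"I"`, `"O"`, `"P"`, `"R"`, `"C"`, `"T"`), each a list of labels, and an aspect counts as described for two functions when the same label appears as the Output of one function and as I, P, R, C or T of another. The function and label names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Model = Dict[str, Dict[str, List[str]]]
NON_OUTPUT_ASPECTS = ("I", "P", "R", "C", "T")


def incomplete_aspects(model: Model) -> Dict[str, str]:
    """Report labels that are not yet described both as an Output of one
    function and as an I/P/R/C/T aspect of a different function."""
    produced_by = defaultdict(set)  # label -> functions listing it as Output
    used_by = defaultdict(set)      # label -> functions listing it as I/P/R/C/T
    for function, aspects in model.items():
        for label in aspects.get("O", []):
            produced_by[label].add(function)
        for key in NON_OUTPUT_ASPECTS:
            for label in aspects.get(key, []):
                used_by[label].add(function)

    report = {}
    for label in sorted(set(produced_by) | set(used_by)):
        producers, users = produced_by[label], used_by[label]
        if not producers:
            report[label] = "no function produces it as an Output"
        elif not users:
            report[label] = "no function uses it as I/P/R/C/T"
        elif not any(p != u for p in producers for u in users):
            report[label] = "produced and used by the same single function"
    return report


# Hypothetical two-function model.
model = {
    "Plan task": {"I": ["Work order"], "O": ["Task plan"], "R": ["Planner"]},
    "Execute task": {"I": ["Task plan"], "P": ["Area cleared"], "O": ["Task done"]},
}
for label, problem in incomplete_aspects(model).items():
    print(f"{label}: {problem}")
```

Here only "Task plan" is complete. Each reported label presumably points to a function still to be added (for instance a background function that provides "Planner" or "Area cleared") or to an aspect whose wording must be aligned with another function's.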
Identifying Functions: General
PURPOSE (conventional accident investigation): To find out what went wrong or malfunctioned (cause or root cause). Accident investigations start from the observed (adverse) outcome(s), and trace the developments backwards until an acceptable cause is found.
PURPOSE (FRAM): A FRAM analysis aims to identify how the system should have functioned (or should function) for everything to succeed (i.e., everyday performance), and to understand the variability which, alone or in combination, prevented (or may prevent) that from happening.
MODEL: A FRAM model describes a system’s functions and the potential couplings among functions. The model does not describe or depict an actual sequence of events, i.e., an accident scenario.
INSTANTIATION: An accident scenario can be the result of an instantiation of the model. The instantiation is a “map” of how functions are coupled, or may become coupled, under given (favourable or unfavourable) conditions.
Describing a FRAM function
Input (I): That which activates the function and/or is used or transformed to produce the output. Constitutes the link to upstream functions.
Output (O): That which is the result of the function. Constitutes the links to downstream functions.
Precondition (P): System conditions that must be fulfilled before a function can be carried out.
Resources (R) (execution conditions): That which is needed or consumed by the function when it is active (matter, energy, competence, software, manpower).
Control (C): That which supervises or regulates the function, e.g., plans, procedures, guidelines or other functions.
Time (T): Temporal aspects that affect how the function is carried out (constraint, resource).
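The six aspects map naturally onto a record type. The sketch below is one possible representation, not the FMV's data model; it assumes that a potential coupling exists wherever the label of one function's Output reappears verbatim as the Input, Precondition, Resource, Control or Time of another function. The example functions and labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FramFunction:
    """One FRAM function; each of the six aspects is a list of short labels."""
    name: str
    input: List[str] = field(default_factory=list)
    output: List[str] = field(default_factory=list)
    precondition: List[str] = field(default_factory=list)
    resource: List[str] = field(default_factory=list)
    control: List[str] = field(default_factory=list)
    time: List[str] = field(default_factory=list)


DOWNSTREAM_ASPECTS = ("input", "precondition", "resource", "control", "time")


def couplings(functions: List[FramFunction]) -> List[Tuple[str, str, str, str]]:
    """Potential couplings as (upstream function, label, downstream function, aspect):
    an Output label of one function that reappears as a non-output aspect of another."""
    found = []
    for up in functions:
        for label in up.output:
            for down in functions:
                if down is up:
                    continue  # only couplings between different functions
                for aspect in DOWNSTREAM_ASPECTS:
                    if label in getattr(down, aspect):
                        found.append((up.name, label, down.name, aspect))
    return found


# Hypothetical functions, invented for illustration.
plan = FramFunction("Plan maintenance", input=["Maintenance request"],
                    output=["Work schedule"], resource=["Planner"])
inspect_fn = FramFunction("Inspect component", input=["Work schedule"],
                          output=["Inspection report"], control=["Procedure"],
                          time=["Work schedule"])
for coupling in couplings([plan, inspect_fn]):
    print(coupling)
```

One Output can couple to several aspects of the same downstream function, as "Work schedule" does here (as Input and as Time).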
Describing the aspects
The aspects of a function are described using the FRAM Model Visualiser (FMV). The FMV provides a structured way of defining, editing, and revising functions.
Identifying Functions: Details
Where to begin: A FRAM analysis can in principle begin with any function. The analysis will show the need for other functions to be included, i.e., functions that are coupled or linked through various relations. FRAM defines six types of relations.
Level of description: There is no single, correct level of description. A FRAM model will typically comprise functions described on different levels.
Level of detail: If there can be significant variability in a foreground function, then it is possible to go deeper into the analysis of that function, and possibly break it down into subfunctions.
System boundary (stop rule): The analysis may go beyond the boundaries of the system as initially defined. If some background function can vary and thereby affect foreground functions “inside” the system, then it should be considered a foreground function.
Foreground/background: Functions are pragmatically labelled as being either foreground or background functions.
Foreground and background (functions)
FRAM uses a distinction between foreground and background functions, which may all affect performance variability. Foreground functions are directly associated with the activity being modelled and may vary significantly during a scenario. Background functions refer to common conditions that may vary more slowly. The distinction between foreground and background functions is relative rather than absolute. A ‘background’ function may be analysed further, and thereby becomes a ‘foreground’ function.
Both sets of functions should be calibrated as far as possible using information extracted from accident databases.
Why do functions vary?
General principle: variability of a function → variability of the output from that function.
The variability of the output can be a result of:
1. The variability of the function itself, i.e., a result of the nature of the function. This can be described as internal or endogenous variability.
2. The variability of the working environment, i.e., the conditions under which the function is carried out. This can be described as external or exogenous variability.
3. Influences from upstream functions, where the outputs from upstream functions (as input, precondition, resource, control, or time) may vary. This is variability due to coupling.
The performance of a function, hence the output, may also vary due to a combination of the three conditions: internal variability, external variability, and coupling.
Simple description of variability
In the FRAM, the variability of the output should be considered relative to its use by a downstream function. Output variability can be described in terms of timing and precision.
With regard to precision, outputs can be imprecise, precise, or acceptable – relative to a downstream function. An imprecise output is either incomplete, inaccurate, ambiguous or misleading so that it does not meet the requirements of downstream functions. A precise output corresponds to the requirements of the downstream functions. An acceptable output can be used by the downstream functions, but requires some adjustment or variability of the downstream functions. These may use additional time and resources, hence increase variability.
With regard to timing, outputs can be produced too early, on time, too late, or not at all. If there is a noticeable delay in the propagation, then the transmission of the output may be described as a function in its own right.
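The timing and precision categories can be encoded as small enums. In the sketch below, the window during which a downstream function can use an output (`window_start`, `window_end`) and the two yes/no judgements used for precision are assumptions introduced for illustration; the text above leaves these judgements to the analyst.

```python
from enum import Enum
from typing import Optional


class Timing(Enum):
    TOO_EARLY = "too early"
    ON_TIME = "on time"
    TOO_LATE = "too late"
    OMISSION = "not at all"


class Precision(Enum):
    PRECISE = "precise"
    ACCEPTABLE = "acceptable"
    IMPRECISE = "imprecise"


def classify_timing(produced_at: Optional[float], window_start: float,
                    window_end: float) -> Timing:
    """Timing of an output relative to the (assumed) window in which the
    downstream function can use it; None means the output was never produced."""
    if produced_at is None:
        return Timing.OMISSION
    if produced_at < window_start:
        return Timing.TOO_EARLY
    if produced_at > window_end:
        return Timing.TOO_LATE
    return Timing.ON_TIME


def classify_precision(meets_requirements: bool, usable_after_adjustment: bool) -> Precision:
    """Precision relative to the downstream function's requirements."""
    if meets_requirements:
        return Precision.PRECISE
    if usable_after_adjustment:
        # Usable, but the downstream function must adjust (extra time/resources).
        return Precision.ACCEPTABLE
    return Precision.IMPRECISE


print(classify_timing(12.0, window_start=5.0, window_end=10.0).value)
print(classify_precision(False, True).value)
```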
Upstream-downstream couplings
Rows (upstream output variability):
Timing: too early, on time, too late, omission
Precision: imprecise, acceptable, precise
Columns (downstream aspect reached by the output): Input, Precondition, Resource, Control, Time
A FRAM model of everyday operation
[Figure: three functions, Preparing work, Tilt lift to tilt-back position, and Move lift, each drawn with the six aspects (I, P, C, O, R, T). Aspect labels: Work planning; Competence in operation; Lift has been delivered; Tilt area clear; Platform lowered; Outriggers in folded position; Lift in tilt-back position; Instruction to move lift; Operating manual (35 pages); Lift stored in work area.]
A FRAM instantiation of the event
[Figure: the three functions Preparing work, Tilt lift to tilt-back position, and Move lift, with the aspect labels Work planning; Competence in operation; Lift has been delivered; Tilt area clear; Platform lowered; Outriggers in folded position; Lift in tilt-back position; Instruction to move lift; Operating manual (35 pages); Lift stored in work area. Annotations showing the variability in the event: Lift delivered too early; Tilting lift took too long; Monsoon (rain).]
A FRAM instantiation of the situation
[Figure: instantiation template; all labels are still marked TBD.]