Introduction to High Reliability Organizations
Ralph T. Soule
What is Reliability?
• “Reliability depends on the lack of unwanted, unanticipated, and unexplainable variance in performance”
– Eric Hollnagel, 1993
Examples of High Reliability Organizations
• Nuclear power-generation plants
• Naval aircraft carriers
• Chemical production plants
• Offshore drilling rigs
• Air traffic control systems
• Incident command teams
• Wildland firefighting crews
• Hospital ER/intensive care units
Low Reliability
Example of Low Reliability: Tenerife
• Precursors
– Small airport; overtaxed air traffic controllers speaking English as a second language; not accustomed to large planes; visibility obscured by fog
– KLM pilot in a hurry to complete the mission before his crew's duty time expired; spent most of his time running simulators as an instructor; not challenged by his crew
– Too much information passed too fast; speakers "stepping" on each other
1. The planes are parked at the end of runway 12, with the KLM in front of the Pan Am.
2. The KLM has made it to the end of runway 30 and is ready for takeoff.
3. The Pan Am has passed the first taxiway.
4. The Pan Am has passed the second taxiway.
5. The Pan Am has missed the third taxiway, where it is supposed to exit. The KLM begins its takeoff.
6. The Pan Am tries to get off the runway but is hit by the KLM. 583 people die.
Copyright ©2000 BMJ Publishing Group Ltd.
Reason, J. BMJ 2000;320:768-770
“Swiss cheese” modelof accident causation
So Why Do Smart People Do Dangerous Things?
• Excessive Workload
– Physical and cognitive effort involved in task performance
• Lack of Situation Awareness
– What's going on?
– What's likely to happen next?
– What will happen if I take a particular action?
• Excessive Stress, Fatigue, Uncertainty, etc.
– Impacts perceptual-motor performance, decision-making, etc.
So Why Do Smart People Do Dangerous Things?
• Diminished Attention– Too much to attend to at once (overload)– Too little to attend to for too long (underload)
• Poor Teamwork and Communication
– Often due to poor layout of the work space and/or poor layout of the command and communication structure
Common reasons why defensive weaknesses are not detected and repaired
• The people involved tend to forget to be afraid
• Bad events are rare in well-defended systems, so few people have direct experience of them
Common reasons why defensive weaknesses are not detected and repaired
• Production demands are immediate and attention-grabbing, whereas safe operations generate a constant – and hence relatively uninteresting – non-event outcome
– Reliability is invisible: there is nothing much to pay attention to
– If people see nothing happening, it is easy to presume nothing is happening
• If nothing is happening while they continue to operate as they have, they get fooled into thinking nothing will ever happen
Dangerous Defenses
• Defense in depth
– Redundancy makes systems more complex
• Since some types of errors are not immediately recognizable, operators/maintainers will not learn from them as readily
• Accumulated errors can allow ‘holes’ in defensive systems to line up and permit the passage of an accident trajectory
– Unforeseen common mode failures can make it possible for errors to affect more than one layer simultaneously
– Changing procedures to prevent the “last” problem can make the procedures more complex and introduce unforeseen failure modes
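The bullets above argue that redundancy protects only while the layers fail independently, and that a common-mode failure can defeat several layers at once. A minimal Monte Carlo sketch of that arithmetic, using made-up failure probabilities purely for illustration (this is not from Reason's paper):

```python
import random

# An accident trajectory passes only when the "holes" in every
# defensive layer line up at the same moment.
def accident_occurs(layer_failure_probs, rng):
    return all(rng.random() < p for p in layer_failure_probs)

def estimate_accident_rate(layer_failure_probs, trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = sum(accident_occurs(layer_failure_probs, rng) for _ in range(trials))
    return hits / trials

# An assumed common-mode cause couples the layers: when it strikes, it
# defeats all of them together, so redundancy stops multiplying protection.
def rate_with_common_mode(layer_failure_probs, common_mode_prob,
                          trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if rng.random() < common_mode_prob or accident_occurs(layer_failure_probs, rng):
            hits += 1
    return hits / trials

# Three independent layers, each failing 10% of the time, give an
# accident rate of roughly 0.1 ** 3 = 0.001; adding even a 1%
# common-mode cause dominates that figure.
print(estimate_accident_rate([0.1, 0.1, 0.1]))
print(rate_with_common_mode([0.1, 0.1, 0.1], 0.01))
```

The point of the sketch is the one the slide makes: independent layers multiply protection, while a single shared cause collapses it to roughly the common-mode rate.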
Common reasons why defensive weaknesses are not detected and repaired
• Conclusion
– If eternal vigilance is the price of freedom, then chronic unease is the price of safety
– HROs are hard to sustain when the thing about which one is uneasy either has not happened or happened a long time ago
– Accidents do not occur because people gamble and lose; they occur because people do not believe that the accident that is about to occur is possible
High Reliability
• Operators and managers of high reliability organizations assume that each day could be a bad day and act accordingly. Regarding safety information, they:
– Actively seek it
– Train and reward messengers
– Share responsibility
– Let failures lead to far-reaching reforms
– Welcome new ideas
5 Habits of Highly Reliable Organizations
• Don't be tricked by your success– Preoccupation with failure
• Defer to experts on the front line– Sensitivity to operations
• Let the unexpected circumstances provide your solution– Resilience means having a steady head
• Embrace complexity– Reluctance to simplify
• Anticipate -- but also anticipate your limits– You can prepare for unexpected events, within limits
Characteristics of HROs
• Safety oriented culture
• Operations are a team effort
• Communications are highly valued
• Always prepared for the unexpected
• Multidisciplinary review of near-misses and adverse outcomes
Characteristics of HROs
• Strong, well-defined leadership
• Safety-oriented culture
• Operations are a team effort
• Flattened hierarchy for safety
• Communications revered
• Emergencies and the unexpected are rehearsed
• Value placed on the "near miss"
• Multidisciplinary review
• Endorsement from top brass (senior management)
• Substantial budget devoted to training
How Different Organizations Handle Safety Information
• Pathologic culture: don't want to know; messengers are shot; responsibility is shirked; failure is punished or concealed; new ideas are actively discouraged
• Bureaucratic culture: may not find out; messengers are listened to... if they arrive; responsibility is compartmentalized; failures lead to local repairs; new ideas often present problems
• Generative culture: actively seek it; messengers are trained and rewarded; responsibility is shared; failures lead to far-reaching reforms; new ideas are welcomed
Tools
• Commander's Intent
– Intended to help people read your mind if they run into uncertainty about how to carry out orders under "field" conditions
– Situation (Here is what I think we face)
– Task (Here is what I think we should do)
– Intent (Here is why)
– Concerns (Here is what we should keep our eye on)
– Calibration (Now, talk to me about your questions and concerns)
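The five elements above form a fixed briefing structure. One way to make the structure checkable, sketched here with names of my own invention (none of this code comes from the slides), is to cast it as a data type so no element can be left out before orders go out:

```python
from dataclasses import dataclass

# Hypothetical illustration: the five elements of a Commander's Intent
# briefing as required fields, so an incomplete briefing cannot be built.
@dataclass
class CommandersIntent:
    situation: str    # Here is what I think we face
    task: str         # Here is what I think we should do
    intent: str       # Here is why
    concerns: str     # Here is what we should keep our eye on
    calibration: str  # Now, talk to me about your questions and concerns

    def brief(self) -> str:
        """Render the intent as a short, ordered briefing."""
        return "\n".join([
            f"Situation: {self.situation}",
            f"Task: {self.task}",
            f"Intent: {self.intent}",
            f"Concerns: {self.concerns}",
            f"Calibration: {self.calibration}",
        ])

# Example content is invented, loosely echoing the Tenerife precursors.
ci = CommandersIntent(
    situation="Fog is thickening and the tower is overloaded",
    task="Hold short of the runway until explicitly cleared",
    intent="Avoid any runway incursion while visibility is poor",
    concerns="Clipped or ambiguous radio transmissions",
    calibration="Read back every clearance and query anything unclear",
)
print(ci.brief())
```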
• Pre-mortems
– A tool for anticipating the unexpected: assume the plan has already failed, then ask what went wrong
Summary
• High reliability is a process, not a result
• Safety is a dynamic non-event
• Error management has two components:
– Error reduction
– Error containment
• Engineering a safety culture:
– Reporting culture -> providing the data for navigation
– Just culture -> trust; deciding when there is fault
– Flexible culture -> adjust
– Learning culture -> plan, do, check, act
• If eternal vigilance is the price of freedom, then chronic unease is the price of safety
References
• The Logic of Failure, Dietrich Dörner
• Intuition at Work, Gary Klein
• The Vulnerable System: An Analysis of the Tenerife Air Disaster, Karl E. Weick
• The 1996 Mount Everest climbing disaster: The breakdown of learning in teams, D. Christopher Kayes
• Into Thin Air, Jon Krakauer, Outside Online
• Managing the Risks of Organizational Accidents, James Reason
• Managing the Unexpected, Weick and Sutcliffe
• Inviting Disaster, James Chiles
• http://www.highreliability.org/