Introduction to High Reliability Organizations
Ralph T. Soule
What is Reliability?
• “Reliability depends on the lack of unwanted, unanticipated, and unexplainable variance in performance”
– Eric Hollnagel, 1993
Examples of High Reliability Organizations
• Nuclear power-generation plants
• Naval aircraft carriers
• Chemical production plants
• Offshore drilling rigs
• Air traffic control systems
• Incident command teams
• Wildland firefighting crews
• Hospital ER/intensive care units
Low Reliability
Example of Low Reliability: Tenerife
• Precursors
– Small airport; overtaxed air traffic controllers speaking English as a second language; not accustomed to large planes; visibility obscured by fog
– KLM pilot in a hurry to complete the mission before his crew's duty time expired; spent most of his time running simulators as an instructor; not challenged by his crew
– Too much information passed too fast; speakers "stepping" on each other
1. The planes are parked at the end of runway 12, with the KLM in front of the Pan Am.
2. The KLM has made it to the end of runway 30 and is ready for takeoff.
3. The Pan Am has passed the first taxiway.
4. The Pan Am has passed the second taxiway.
5. The Pan Am has missed the third taxiway, where it is supposed to exit. The KLM begins its takeoff.
6. The Pan Am tries to get off the runway but is hit by the KLM. 583 people die.
Copyright ©2000 BMJ Publishing Group Ltd.
Reason, J. BMJ 2000;320:768-770
“Swiss cheese” modelof accident causation
So Why Do Smart People Do Dangerous Things?
• Excessive Workload
– Physical and cognitive effort involved in task performance
• Lack of Situation Awareness
– What's going on?
– What's likely to happen next?
– What will happen if I take a particular action?
• Excessive Stress, Fatigue, Uncertainty, etc.
– Impacts perceptual-motor performance, decision-making, etc.
So Why Do Smart People Do Dangerous Things?
• Diminished Attention– Too much to attend to at once (overload)– Too little to attend to for too long (underload)
• Poor Teamwork and Communication
– Often due to poor layout of the work space and/or poor layout of the command and communication structure
Common reasons why defensive weaknesses are not detected and repaired
• The people involved tend to forget to be afraid
• Bad events are rare in well-defended systems, so few people have direct experience of them
Common reasons why defensive weaknesses are not detected and repaired
• Production demands are immediate and attention-grabbing, whereas safe operations generate a constant – and hence relatively uninteresting – non-event outcome
– Reliability is invisible: there is nothing much to pay attention to
– If people see nothing happening, it is easy to presume nothing is happening
• If nothing is happening while they continue to operate as they have, they get fooled into thinking nothing will ever happen
Dangerous Defenses
• Defense in depth
– Redundancy makes systems more complex
• Since some types of errors are not immediately recognizable, operators/maintainers will not learn from them as readily
• Accumulated errors can allow ‘holes’ in defensive systems to line up and permit the passage of an accident trajectory
– Unforeseen common mode failures can make it possible for errors to affect more than one layer simultaneously
– Changing procedures to prevent the “last” problem can make the procedures more complex and introduce unforeseen failure modes
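The bullets above argue that redundancy protects only while the layers fail independently, and that a common-mode failure can defeat several layers at once. A minimal Monte Carlo sketch of that arithmetic, using made-up failure probabilities purely for illustration (this is not from Reason's paper):

```python
import random

# An accident trajectory passes only when the "holes" in every
# defensive layer line up at the same moment.
def accident_occurs(layer_failure_probs, rng):
    return all(rng.random() < p for p in layer_failure_probs)

def estimate_accident_rate(layer_failure_probs, trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = sum(accident_occurs(layer_failure_probs, rng) for _ in range(trials))
    return hits / trials

# An assumed common-mode cause couples the layers: when it strikes, it
# defeats all of them together, so redundancy stops multiplying protection.
def rate_with_common_mode(layer_failure_probs, common_mode_prob,
                          trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if rng.random() < common_mode_prob or accident_occurs(layer_failure_probs, rng):
            hits += 1
    return hits / trials

# Three independent layers, each failing 10% of the time, give an
# accident rate of roughly 0.1 ** 3 = 0.001; adding even a 1%
# common-mode cause dominates that figure.
print(estimate_accident_rate([0.1, 0.1, 0.1]))
print(rate_with_common_mode([0.1, 0.1, 0.1], 0.01))
```

The point of the sketch is the one the slide makes: independent layers multiply protection, while a single shared cause collapses it to roughly the common-mode rate.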
Common reasons why defensive weaknesses are not detected and repaired
• Conclusion
– If eternal vigilance is the price of freedom, then chronic unease is the price of safety
– HROs are hard to sustain when the thing about which one is uneasy either has not happened or happened a long time ago
– Accidents do not occur because people gamble and lose; they occur because people do not believe that the accident that is about to occur is possible
High Reliability
• Operators and managers of high reliability organizations assume that each day could be a bad day and act accordingly. Regarding safety information, they:
– Actively seek it
– Train and reward messengers
– Share responsibility
– Let failures lead to far-reaching reforms
– Welcome new ideas
5 Habits of Highly Reliable Organizations
• Don't be tricked by your success– Preoccupation with failure
• Defer to experts on the front line– Sensitivity to operations
• Let the unexpected circumstances provide your solution– Resilience means having a steady head
• Embrace complexity– Reluctance to simplify
• Anticipate -- but also anticipate your limits– You can prepare for unexpected events, within limits
Characteristics of HROs
• Safety oriented culture
• Operations are a team effort
• Communications are highly valued
• Always prepared for the unexpected
• Multidisciplinary review of near-misses and adverse outcomes
Characteristics of HROs
• Strong, well-defined leadership
• Safety-oriented culture
• Operations are a team effort
• Flattened hierarchy for safety
• Communications revered
• Emergencies and the unexpected are rehearsed
• Value placed on the "near miss"
• Multidisciplinary review
• Endorsement from top brass (senior management)
• Substantial budget devoted to training
How Different Organizations Handle Safety Information
• Pathologic culture: don't want to know; messengers are shot; responsibility is shirked; failure is punished or concealed; new ideas are actively discouraged
• Bureaucratic culture: may not find out; messengers are listened to... if they arrive; responsibility is compartmentalized; failures lead to local repairs; new ideas often present problems
• Generative culture: actively seek it; messengers are trained and rewarded; responsibility is shared; failures lead to far-reaching reforms; new ideas are welcomed
Tools
• Commander's Intent
– Intended to help people read your mind if they run into uncertainty about how to carry out orders under "field" conditions
– Situation (Here is what I think we face)
– Task (Here is what I think we should do)
– Intent (Here is why)
– Concerns (Here is what we should keep our eye on)
– Calibration (Now, talk to me about your questions and concerns)
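The five elements above form a fixed briefing structure. One way to make the structure checkable, sketched here with names of my own invention (none of this code comes from the slides), is to cast it as a data type so no element can be left out before orders go out:

```python
from dataclasses import dataclass

# Hypothetical illustration: the five elements of a Commander's Intent
# briefing as required fields, so an incomplete briefing cannot be built.
@dataclass
class CommandersIntent:
    situation: str    # Here is what I think we face
    task: str         # Here is what I think we should do
    intent: str       # Here is why
    concerns: str     # Here is what we should keep our eye on
    calibration: str  # Now, talk to me about your questions and concerns

    def brief(self) -> str:
        """Render the intent as a short, ordered briefing."""
        return "\n".join([
            f"Situation: {self.situation}",
            f"Task: {self.task}",
            f"Intent: {self.intent}",
            f"Concerns: {self.concerns}",
            f"Calibration: {self.calibration}",
        ])

# Example content is invented, loosely echoing the Tenerife precursors.
ci = CommandersIntent(
    situation="Fog is thickening and the tower is overloaded",
    task="Hold short of the runway until explicitly cleared",
    intent="Avoid any runway incursion while visibility is poor",
    concerns="Clipped or ambiguous radio transmissions",
    calibration="Read back every clearance and query anything unclear",
)
print(ci.brief())
```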
• Pre-mortems
– A tool for anticipating the unexpected: assume the plan has already failed, then ask what went wrong
Summary
• High reliability is a process, not a result
• Safety is a dynamic non-event
• Error management has two components:
– Error reduction
– Error containment
• Engineering a safety culture:
– Reporting culture -> providing the data for navigation
– Just culture -> trust; deciding when there is fault
– Flexible culture -> adjust
– Learning culture -> plan, do, check, act
• If eternal vigilance is the price of freedom, then chronic unease is the price of safety
References
• The Logic of Failure, Dietrich Dörner
• Intuition at Work, Gary Klein
• The Vulnerable System: An Analysis of the Tenerife Air Disaster, Karl E. Weick
• The 1996 Mount Everest climbing disaster: The breakdown of learning in teams, D. Christopher Kayes
• Into Thin Air, Jon Krakauer, Outside Online
• Managing the Risks of Organizational Accidents, James Reason
• Managing the Unexpected, Weick and Sutcliffe
• Inviting Disaster, James Chiles
• http://www.highreliability.org/