ORGANISATIONS AND DEPENDABILITY 1
DR JOHN ROOKSBY
IN THIS LECTURE…
High Reliability Organisations
These are organisations that are able to achieve high reliability from complex, critical systems
• This lecture will cover five of the key qualities said to be held by these organisations
• This lecture will use nuclear powered carriers as an example of a High Reliability Organisation, and NASA at the time of the Columbia disaster as an example of an unreliable organisation
NORMAL ACCIDENTS
Normal Accidents (1984), by Charles Perrow, introduced the idea that failures are normal in complex systems. Perrow argued that serious failures are likely when there is:
• Interactive complexity: The presence of unfamiliar, unplanned and unexpected sequences of events in a system that are not visible or immediately comprehensible
• Tight coupling: The presence of interdependent components. Tight coupling will make a system more prone to cascading errors.
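As a rough illustration of why tight coupling makes a system more prone to cascading errors, the sketch below propagates a single failure through a map of dependencies. The component names, and the worst-case rule that a component fails whenever something it depends on fails, are assumptions made for illustration; they are not taken from Perrow.

```python
from collections import deque


def cascade(dependents, initial_failure):
    """Return the set of components that end up failed.

    `dependents` maps a component to the components that directly depend
    on it. Assumption: a component fails as soon as anything it depends
    on has failed (a deliberately pessimistic, tightly coupled rule).
    """
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in failed:
                failed.add(nxt)
                queue.append(nxt)
    return failed


# Hypothetical systems with the same four components.
tightly_coupled = {"pump": ["cooling"], "cooling": ["reactor"], "reactor": ["power"]}
loosely_coupled = {"pump": [], "cooling": [], "reactor": [], "power": []}

print(sorted(cascade(tightly_coupled, "pump")))  # ['cooling', 'power', 'pump', 'reactor']
print(sorted(cascade(loosely_coupled, "pump")))  # ['pump']
```

The same initial fault takes down every component in the tightly coupled map, but stays local when the components are independent.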
So complex, tightly coupled systems shouldn’t be built?
HRO researchers argue that some complex, tightly coupled systems are far more dependable than others – because of the way they are managed
PRINCIPLES
High Reliability Organisations | Low Reliability Organisations
Focus on failure | Focus on success
Focus on reliability | Focus on efficiency
Reluctant to simplify | Rely on simplicity
Dynamic hierarchies | Inflexible hierarchy
De-centralised decision making | Centralised decision making
Open information | Hide information
Multiple perspectives | Single perspectives
Are committed to resilience | Are on “automatic pilot”
NUCLEAR POWERED CARRIERS
Complex, high risk socio-technical systems
• Multiple (mechanical and digital) systems
• Dangerous objects (aircraft, fuel, and explosives) in close proximity. Aircraft taking off and landing in 48-60 second intervals.
• 6000 crew. Several different kinds of aircraft, multiple squadrons. All work interdependently and must be coordinated.
• Carriers are 24 stories high and carry enough fuel for 15 years. 2000 telephones. 3,360 compartments and spaces
NUCLEAR POWERED CARRIERS
High risk
• Nuclear reactor accidents
• Fire, flooding, grounding, collision
• Fuel and weapons explosions
• Mistaken identification of friends and foes
• High risks both to crew and a much larger public
High reliability
• Low “crunch rates”
• Comparatively few major accidents
COLUMBIA DISASTER
Feb 1st 2003 - Columbia disintegrates during re-entry into the earth’s atmosphere
The thermal protection system had been damaged during launch when a large piece of foam insulation broke off the main propellant tank and hit the shuttle
• Known problem.
• The majority of shuttle launches had included foam strikes, but nothing had been done about the design
• They were aware the foam had struck the wing, but it was not treated as serious
• Engineers' concerns were not listened to
NASA
NASA had repeated similar failings
• The Challenger disaster, 28th Jan 1986 (Mission STS 51-L)
• The Columbia disaster, 1st Feb 2003 (Mission STS-107)
Many of the failings were the result of deep-rooted organisational problems
NASA strived to implement HRO principles
FIVE PRINCIPLES
High Reliability Organisations | Low Reliability Organisations
Focus on failure | Focus on success
Focus on reliability | Focus on efficiency
Reluctant to simplify | Rely on simplicity
Dynamic hierarchies | Inflexible hierarchy
De-centralised decision making | Centralised decision making
Share information | Hide information
Multiple perspectives | Single perspectives
Are committed to resilience | Are on “automatic pilot”
1. RELIABILITY OVER EFFICIENCY
High Reliability Organisations give reliability precedence over efficiency
• Decisions are made on the grounds of reliability first and then efficiency
• Efficiency initiatives are treated with scepticism
1. RELIABILITY OVER EFFICIENCY
High Reliability Organisations do the following:
• Managers regularly talk to staff to familiarise themselves with how they do their work and why.
• Organisations develop safety measures as well as financial measures, and include these in employee evaluations
• Organisations assign value to the avoidance of accidents
• High redundancy despite cost
• Cautious actions when necessary despite cost
• Carriers have to persuade Congress that enormous amounts of redundancy (in jobs, communication structures, parts) are necessary, and that enormous amounts of training are necessary
• Constant training despite cost. Commanding officers demand that carriers have regular sea exercises, that they are not just kept in port
NASA prioritised efficiency over reliability
• In the 1990s NASA faced drastic cuts and became overly concerned with pleasing Congress. NASA initiated the Faster, Better, Cheaper strategy in the mid-90s and wanted to stick to a strict schedule.
• With STS-107 they worried that the time needed to analyse the foam strike would delay the next mission. They didn’t want to change the next mission’s objectives to a rescue mission.
• Positioning the shuttle over Hawaii so that images could be taken was seen as time consuming and costly
2. PREOCCUPATION WITH FAILURE
High Reliability Organisations are preoccupied with failure (They do not focus on success)
• Workers need to be heedful of the possibility of failure
• Failures are understood to be normal (but unacceptable)
• Know there can be unexpected failure modes, even in common activities
2. PREOCCUPATION WITH FAILURE
High Reliability Organisations address failure by
• Constant training of all people (simulations, apprenticing, practice)
• Using incident reporting
• Designing in extensive redundancy
• Maintaining contingencies for critical operations
• Requiring proofs that something is safe, not that it is unsafe
• On carriers there is constant tracking of issues around malfunctioning, defective and substandard equipment. These are acted on by training crew to overcome problems and by pressuring vendors to make improvements
• Extensive redundancy (overlapping jobs, multiple channels and centres of communications, spare parts, multiple sources for decision making).
• Example: if an aircraft’s landing gear warning light comes on, the spotter, commander and pilot all work together to establish what the issue is.
• Multiple contingencies are maintained. Example: There will always be multiple options for how to land the plane (or for the pilot to escape).
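One way to see why extensive redundancy can be worth its cost is a simple reliability calculation. The following is a sketch under an assumption not made in the lecture: that the n redundant channels fail independently, each working with probability r. The symbols r, n and R_n are introduced here purely for illustration.

```latex
R_n = 1 - (1 - r)^n,
\qquad \text{e.g. } r = 0.9,\ n = 3:\quad R_3 = 1 - 0.1^3 = 0.999
```

Independence is a strong assumption. In a tightly coupled system a single event can disable several channels at once, so the real benefit of redundancy may be smaller than this figure suggests.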
• Foam had been shed on 65 of 79 missions prior to STS-107. NASA repeatedly resolved to do something about this and yet nothing happened.
• After the foam strike, engineers who raised concerns were asked to prove it posed a danger rather than prove it didn’t.
• There was no sustained effort to acquire images of the shuttle, or to share them internally
• A shuttle was available for a rescue mission but this was never actually considered.
3. SHARING THE BIG PICTURE
High Reliability Organisations want everyone to know the whole picture
• If people are narrowly focused they will act only in their own interest.
• People need to maintain awareness of other people and events around the organisation
3. SHARING THE BIG PICTURE
High Reliability Organisations
• Train people broadly
• Educate people about overarching objectives, and set statements of purpose
• Give people access to information on what is happening elsewhere
• Clearly specify how people and teams fit into the whole
• Carriers maintain awareness through many communication devices of multiple kinds, and have multiple centres of communication, each with direct access to information and each vigilant.
• Carriers have well articulated hierarchies
• Deck hands are motivated because they are treated as core parts of teams
• People are rotated through different jobs. Top personnel are rotated to a different position every 90 days.
• NASA employees had little understanding of the overall organisation and its internal processes
• A team with the correct expertise was set up to assess the foam strike damage, but its objectives were fuzzy and it had no direct connection to management
• The team was not given the appropriate official category, “Tiger Team”
• The investigators did not know the process for requesting images, and were rebuked when they tried because they did not have the authority to request them or the correct approval
4. RELUCTANCE TO SIMPLIFY
All organisations have to simplify and abstract, to filter out unnecessary information (particularly for getting “big pictures”)
But High Reliability Organisations
• Use labels and categories as little as possible as they stop you from looking further into details and events.
• Continually rework labels and categories
• Listen to wisdom, but with scepticism
• Do not focus on information that supports expectations, but focus on that which doesn’t fit or disconfirms desires
• There are clear responsibilities and tasks, but in practice the crew are constantly negotiating, communicating and interacting
• If there is a problem with an aircraft, multiple people take multiple views.
• NASA management narrowed the foam strike down to a ‘tile incident’, because management had expertise in tiles. It was in fact a reinforced carbon-carbon (RCC) panel incident.
• The assessment of the damage was done using simulation software called ‘Crater’.
• This software was designed for simulating small projectiles, but the foam debris was 640 times larger than the data used to calibrate Crater.
• Crater was not understood by NASA, and the simulation was actually run and interpreted outside the organisation.
• The simulation was only run twice. The people who ran it did not think it was very useful, but did not communicate this well.
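The Crater example is a case of a model being used far outside the range of data it was calibrated on. The sketch below shows one generic way a tool could make that visible to whoever reads its output. The function name, the units and the calibration bounds are hypothetical and do not describe how Crater actually worked; only the factor of 640 comes from the lecture.

```python
def check_calibration(value, calibrated_min, calibrated_max):
    """Report whether `value` lies inside a model's calibrated range.

    Returns (in_range, ratio), where ratio is value / calibrated_max,
    i.e. how many times larger the input is than the largest case the
    model was calibrated on.
    """
    if calibrated_min <= 0 or calibrated_max < calibrated_min:
        raise ValueError("calibration bounds must be positive and ordered")
    ratio = value / calibrated_max
    in_range = calibrated_min <= value <= calibrated_max
    return in_range, ratio


# Hypothetical units: the largest calibrated projectile has size 1.0,
# and the debris is 640 times larger, as stated in the lecture.
in_range, ratio = check_calibration(640.0, calibrated_min=0.1, calibrated_max=1.0)
if not in_range:
    print(f"Input is {ratio:g}x the largest calibrated case; treat the result as unvalidated")
```

A check like this does not make the prediction any better. It only stops the simplification from hiding the fact that the result is an extrapolation.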
5. MIGRATION OF DECISION MAKING
High Reliability Organisations migrate decision making as far down the organisation as possible
• Decisions are not made by one central authority. Decisions need to be made where there is expertise. This helps decisions to be made quickly and correctly
5. MIGRATION OF DECISION MAKING
In order to defer to expertise:
• Decision making ability is migrated to the lowest appropriate levels
• People are trained in making decisions and are given the right resources to do so
• There is recognition of skill levels and legitimacy throughout the organisation, and people are trusted
• There is hierarchy, but decision making is pushed to the extremes. For example if there is debris on the runway, whoever spots it can halt operations and have it cleared
• Rank is not treated as an issue here
NASA Mission STS-107
• Decision making was centralised among managers, who ignored the expert opinions of engineers
• Authority was required for decisions to be made
• Example: When images were requested, the organisation worried about the rank of the requestor
KEY POINTS
• Organisational approaches are necessary for achieving dependable systems. Dependability is not a quality of a technology but a quality of technology-in-practice.
• Technologies are not inherently dependable, but require people to operate and manage them in ways that are dependable
• The HRO literature has identified a number of qualities of highly reliable organisations. These mainly relate to the operation of technology, although some researchers have studied software development organisations from this perspective.
READING
Article: KH Roberts (1990) Some Characteristics of One Type of High Reliability Organization. Organization Science, 1, 2: 160-76.
Book: Charles Perrow (1984) Normal Accidents: Living with High-Risk Technologies
Book chapter: Karl Weick (2005) Making Sense of Blurred Images. In W Starbuck and M Farjoun, Organization at the Limit. Blackwell Publishing