Future Trends in Process Safety
Prof. Nancy Leveson
Engineering Systems, Aeronautics and Astronautics, MIT

Leveson HUG 07.ppt

  • You've carefully thought out all the angles. You've done it a thousand times. It comes naturally to you. You know what you're doing, it's what you've been trained to do your whole life. Nothing could possibly go wrong, right?

  • Think Again

  • Topics
    • Lessons from Texas City
    • New factors in process accidents
    • Safety as a control problem
    • Conclusions

  • Leadership
    • Safety requires passionate and effective leadership
    • Tone is set at the top of the organization
    • Not just sloganeering but real commitment
      • Setting priorities
      • Adequate resources assigned
      • A designated, high-ranking leader
    • Safety and productivity are not conflicting if take a long-term view

  • Managing and Controlling Safety
    • Need clear definition of expectations, responsibilities, authority, and accountability at all levels of safety control structure
    • Entire control structure must together enforce the system safety property
    • Unsafe changes must be eliminated or controlled through system design or detected and fixed before they lead to an accident.
      • Planned changes (MOC process)
      • Unplanned changes

  • Visibility and Communication
    • Downward and upward communication
    • Requires a positive, open, trusting environment
    • Need effective measurement and monitoring of process safety performance (e.g., injury rates are not useful and are misleading)
    • Avoid culture of denial
    • If managers do not want to hear, people stop talking

  • Information and Appropriate Feedback
    • Good accident/incident investigation and follow through
    • Identification and correction of systemic causal factors.
    • Ensuring thorough reporting of incidents and near misses
    • Thorough hazard identification, analysis, and control
    • Effective process safety audit system to ensure adequate process safety performance

  • Oversight and Control
    • Results of operating experience, process hazard analyses, audits, near misses, or accident investigations must be used to improve process operations and process safety management system.
    • Address promptly and track to completion the deficiencies found during assessments, audits, inspections and incident investigation.

  • Fumbling for his recline button Ted unwittingly instigates a disaster

  • Process Safety vs. Personal Safety
    • All behavior influenced by context in which it occurs
      • Both physical and social context
    • Personal safety focuses on changing individual behavior
    • Process (system) safety focuses on design of system in which behavior occurs
    • To understand why process accidents occur and to prevent them, need to:
      • Understand current context (system design)
      • Create a design that effectively ensures safety

  • The Enemies of Safety
    • Complacency
    • Arrogance
    • Ignorance

  • Factors in Complacency
    • Discounting risk
    • Over-relying on redundancy
    • Unrealistic risk assessment
    • Ignoring low-probability, high-consequence events
    • Assuming risk decreases over time
    • Ignoring warning signs

  • Topics
    • Lessons from Texas City
    • New factors in process accidents
      • New technology
      • System accidents
      • New types of human error
    • Safety as a control problem
    • Conclusions

  • Accident with No Component Failures

  • Types of Accidents
    • Component Failure Accidents
      • Single or multiple component failures
      • Usually assume random failure
    • System Accidents
      • Arise in interactions among components
      • Related to interactive complexity and tight coupling
      • Exacerbated by introduction of computers and software

  • Safety vs. Reliability
    • Safety and reliability are NOT the same
      • Sometimes increasing one can even decrease the other.
      • Making all the components highly reliable will have no impact on system accidents.
    • For relatively simple, electro-mechanical systems with primarily component failure accidents, reliability engineering can increase safety
    • For complex systems, need something more

  • Humans in Process Safety
    • Usually define human error as deviation from normative procedures, but operators always deviate from standard procedures
      • Normative vs. effective procedures
      • Sometimes violation of rules has prevented accidents
    • Cannot effectively model human behavior by decomposing it into individual decisions and acts and studying it in isolation from
      • Physical and social context
      • Value system in which it takes place
      • Dynamic work process

  • Less successful actions are natural part of search by operators for optimal performance

  • New Operator Roles and Errors
    • High tech automation changing cognitive demands on operators
      • Supervising rather than directly monitoring
      • Doing more cognitively complex decision-making
      • Dealing with complex, mode-rich systems
      • Increasing need for cooperation and communication
    • Human-factors experts complaining about technology-centered automation
      • Designers focus on technical issues, not on supporting operator tasks
      • Leads to clumsy automation
    • Errors are changing, e.g., errors of omission vs. commission

  • Impacts on System Design
    • Design for error tolerance
    • Alarm management (managing by exception)
    • Matching tasks to human characteristics
    • Design to reduce human errors
    • Providing information and feedback
    • Training and maintaining skills

  • Topics
    • Lessons from Texas City
    • New factors in process accidents
    • Safety as a control problem
      • New approaches to hazard analysis
      • Design for safety
      • Risk analysis and management
    • Conclusions

  • STAMP: A Systems Model of Accident Causality
    • Systems-Theoretic Accident Model and Processes
    • Safety treated as a control problem, not a failure problem
    • Accidents are not simply an event or chain of events
      • Involve a complex, dynamic process
      • Arise from interactions among humans, machines and the environment

  • A Broad View of Control
    • Does not imply need for a controller
      • Component failures and dysfunctional interactions may be controlled through design (e.g., redundancy, interlocks, fail-safe design) or through process
        • Manufacturing processes and procedures
        • Maintenance processes
        • Operations
    • Does imply the need to enforce safety constraints in some way

  • STAMP (2)
    • Safety is an emergent property that arises when system components interact with each other within a larger environment
    • A set of safety constraints related to behavior of system components enforces that property
    • Accidents occur when interactions among system components violate those constraints
    • Goal of process (system) safety engineering is to identify the safety constraints and enforce them in the system design

  • Example Safety Constraints
    • Build safety in by enforcing constraints on behavior
    • Controller contributes to accidents not by failing but by:
      • Not enforcing safety-related constraints on behavior
      • Commanding behavior that violates safety constraints

    System Safety Constraint: Water must be flowing into reflux condenser whenever catalyst is added to reactor

    Software (Controller) Safety Constraint: Software must always open water valve before catalyst valve
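The software constraint on this slide can be pictured as an ordering check inside the controller. The following is a minimal sketch only, not the control software from the talk: the class, method, and exception names are hypothetical, and it checks the commanded valve state rather than measured water flow, so it enforces the software constraint but does not by itself guarantee the system constraint.

```python
from dataclasses import dataclass


class ConstraintViolation(Exception):
    """Raised when a command would violate the valve-ordering constraint."""


@dataclass
class BatchReactorController:
    # Hypothetical controller state; names are illustrative only.
    water_valve_open: bool = False
    catalyst_valve_open: bool = False

    def open_water_valve(self) -> None:
        self.water_valve_open = True

    def open_catalyst_valve(self) -> None:
        # Software constraint: the water valve must already be open.
        if not self.water_valve_open:
            raise ConstraintViolation("open water valve before catalyst valve")
        self.catalyst_valve_open = True

    def close_catalyst_valve(self) -> None:
        self.catalyst_valve_open = False

    def close_water_valve(self) -> None:
        # Assumed mirror-image rule: do not stop water while catalyst may flow.
        if self.catalyst_valve_open:
            raise ConstraintViolation("close catalyst valve before water valve")
        self.water_valve_open = False
```

In this sketch the controller refuses the unsafe command instead of silently reordering it, so a violation attempt stays visible to whoever issued it.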

  • STAMP (3)
    • Systems are not static
      • A socio-technical system is a dynamic process continually adapting to achieve its ends and to react to changes in itself and its environment
      • Systems and organizations migrate toward accidents (states of high risk) under cost and productivity pressures in an aggressive, competitive environment
    • Preventing accidents requires designing a control structure to enforce constraints on system behavior and adaptation that ensures safety

  • Example Control Structure

  • Controlling and managing dynamic systems requires visibility and feedback

    [Figure: control loop. Labels: Controller, Model of Process, Controlled Process, Control Actions, Feedback]

  • Relationship Between Safety and Process Models
    • Accidents occur when models do not match process and
      • Incorrect control commands given
      • Correct ones not given
      • Correct commands given at wrong time (too early, too late)
      • Control stops too soon
    • (Note the relationship to system accidents)

  • Relationship Between Safety and Process Models (2)
    • How do they become inconsistent?
      • Wrong from beginning
      • Missing or incorrect feedback
      • Not updated correctly
      • Time lags not accounted for
    • Resulting in
      • Uncontrolled disturbances
      • Unhandled process states
      • Inadvertently commanding system into a hazardous state
      • Unhandled or incorrectly handled system component failures
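To make the "missing feedback" case concrete, here is a small simulation sketch. Everything in it is an assumption for illustration (a tank-filling process, the class names, the step size and capacity); it is not taken from the talk. The controller acts on its model of the tank, and when feedback is lost the model goes stale and the controller commands the process into a hazardous state without any component failing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tank:
    """The controlled process: a tank with a true level (hypothetical units)."""
    level: float = 0.0
    capacity: float = 10.0

    def add(self, amount: float) -> None:
        self.level += amount

    @property
    def overflowed(self) -> bool:
        return self.level > self.capacity


@dataclass
class FillController:
    """Decides using its model of the tank, not the tank itself."""
    believed_level: float = 0.0
    step: float = 2.0

    def update_model(self, measured_level: Optional[float]) -> None:
        # Missing feedback (None) leaves the model unchanged, so it goes stale.
        if measured_level is not None:
            self.believed_level = measured_level

    def command(self, capacity: float) -> float:
        # Fill only if the model says there is room for a full step.
        return self.step if self.believed_level + self.step <= capacity else 0.0


def run(feedback_works: bool, cycles: int = 8) -> Tank:
    """Run the control loop for a number of cycles and return the tank."""
    tank, ctrl = Tank(), FillController()
    for _ in range(cycles):
        tank.add(ctrl.command(tank.capacity))
        ctrl.update_model(tank.level if feedback_works else None)
    return tank
```

With working feedback the loop stops filling at capacity; with feedback missing, the same controller logic overfills the tank, which matches the slide's "inadvertently commanding system into a hazardous state".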

  • Modeling Accidents Using STAMP
    • Two types of models are used:
      • Static safety control structure
      • Behavioral dynamics (system dynamics)
        • Dynamic processes behind change in the safety control structure, i.e., why it may change (e.g., degrade) over time

  • Simplified System Dynamics Model of Columbia Accident

  • Uses for STAMP
    • Basis for new, more powerful hazard analysis techniques (STPA)
    • Safety-driven design
    • More comprehensive accident/incident investigation and root cause analysis
    • Organizational and cultural risk analysis
    • Defining safety metrics and performance audits
    • Designing and evaluating potential policy and structural improvements
    • Identifying leading indicators of increasing risk (canary in the coal mine)
    • New risk management tools
    • New holistic approaches to security

  • STAMP-Based Hazard Analysis (STPA)
    • Supports a safety-driven design process where
      • Hazard analysis influences and shapes early design decisions
      • Hazard analysis iterated and refined as design evolves
    • Goals (same as any hazard analysis)
      • Identification of system hazards and related safety constraints necessary to ensure acceptable risk
      • Accumulation of information about how hazards can be violated, which is used to eliminate, reduce and control hazards in system design, development, manufacturing, and operations

  • STPA (2)
    • STPA process
      • Starts with identifying system requirements and design constraints necessary to maintain safety.
      • Then STPA assists in
        • Top-down refinement into requirements and safety constraints on individual components.
        • Identifying scenarios in which safety constraints can be violated.
        • Using results to eliminate or control hazards in design, operations, etc.
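One way to picture the scenario-identification step is to cross each control action with the four kinds of inadequate control listed on the "Relationship Between Safety and Process Models" slide, producing prompts for an analyst to review. This is a sketch under that assumption, not the STPA procedure as defined by the author; the control actions are hypothetical ones for the batch reactor example, and the function name is illustrative.

```python
from itertools import product

# The four kinds of inadequate control named on the process-model slide.
INADEQUATE_CONTROL = (
    "incorrect command given",
    "correct command not given",
    "correct command given at wrong time (too early, too late)",
    "control stops too soon",
)

# Hypothetical control actions for the batch reactor example.
CONTROL_ACTIONS = ("open water valve", "open catalyst valve")


def candidate_scenarios(actions, categories):
    """Return every (action, category) pair as a prompt for analyst review."""
    return list(product(actions, categories))


for action, category in candidate_scenarios(CONTROL_ACTIONS, INADEQUATE_CONTROL):
    print(f"{action}: {category}")
```

Each pair is only a question to ask (for example, "open catalyst valve: given too early"); deciding whether it can actually lead to a hazard still requires the analysis described on the slide.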

  • Copyright Nancy Leveson, Aug. 2006

  • Comparison of STPA with Traditional HA Techniques
    • Top-down (vs bottom-up like FMECA)
    • Considers more than just component failure and failure events (includes these but more general)
    • Guidance in doing analysis (vs. FTA)
    • Handles dysfunctional interactions and system accidents, software, management, etc.

  • Comparisons (2)
    • Concrete model (not just in head)
    • Not physical structure (HAZOP) but control (functional) structure
    • General model of inadequate control (based on control theory)
      • HAZOP guidewords based on model of accidents being caused by deviations in system variables
      • Includes HAZOP model but more general
    • Fault trees concentrate on component failures, miss system accidents

  • Risk Analysis and Risk Management

    [Plot: Effectiveness and Credibility of ITA vs. Time]

  • [Plot: System Technical Risk vs. Time]

  • Identifying Lagging vs. Leading Indicators
    • Number of waivers issued good indicator for risk in Space Shuttle operations but lags rapid increase in risk
    [Plot: x-axis Time]

  • No. of incidents under investigation a better leading indicator
    [Plot: x-axis Time]

  • Managing Tradeoffs Among Risks
    • Good risk management requires understanding tradeoffs among
      • Schedule
      • Cost
      • Performance
      • Safety

  • Example: Schedule Pressure and Safety Priority
    • Overly aggressive schedule enforcement has little effect on completion time (

  • Conclusions
    • Future needs for safety in the process industry:
      • Differentiation between process safety and personal (occupational) safety
      • Improved safety culture management
      • New approaches to handle
        • Advanced technology (particularly digital technology)
        • System accidents and complexity
        • New types of human error
    • Using a control-based (vs. failure-based) model of causality expands our power to prevent process accidents

    My personal view of the accident and accidents in general. Although the Baker Panel report findings represent a consensus view, each member has their own prioritization of the importance of the individual findings and their own view of accidents in general. So let me tell you a little about my background to set the context for what you will hear.

    2. System accidents, HMI and computers in control, human errors
    4. Hazard analysis and STPA (batch reactor), design for safety (precedence), risk analysis and management

    Instead of focusing on deficiencies at BP, want to look at what can be learned from our findings and what general principles were violated. Thus focus more on what need to do to do it right rather than on what was wrong at BP (although they clearly are just two sides of the coin).

    Sincere management concern is top factor in organizations that have fewer accidents and losses.

    Management info system is second most important factor in organizations that have accidents. Three aspects: collection, analysis, dissemination and use. Management by goals in high-pressure industries (such as offshore oil drilling) encourages an image of super-performance and creates a tendency to cover up past mistakes. Corporate learning requires formal or informal mechanisms to observe, record, retrieve past collective experience, including mistakes. Requires delegation of responsibility for capturing info, rewards or at least not punishment, a system for creating and handling incident/accident reports, comprehensive procedures for analyzing incidents and identifying causal factors, and procedures for using reports and generating corrective actions.

    Not always done. Can develop tremendous backlogs. Becomes standard operating procedure.

    Often treat accidents as a chain of events and end up blaming operators or humans close to the actual events. But human behavior always occurs in a context. And humans will always make mistakes. Need to create context in which humans less likely to do the wrong thing. Systems approach to safety (vs. reliability approach).

    Confusion between personal and process safety was a major cause of accident, in my view. Measuring and controlling wrong thing, e.g., days without an accident is not a process safety measurement. Couldn't find culture survey with process safety questions (and data for comparison). Had to write our own.

    SUBSAFE

    Complacency factors:

    Discounting risk: a human tendency. When people attempt to predict risk, they explicitly or implicitly multiply events with low probability, assuming independence, and come out with impossibly low numbers, when in fact the events are dependent. Called the Titanic Coincidence.

    Titanic Effect: explains the fact that major accidents are often preceded by a belief that they cannot happen. Magnitude of disasters decreases to the extent that people believe that disasters are possible and plan to prevent them or to minimize their effects. Costs of taking action in advance to prevent are inconsequential when measured against losses that may ensue if no action taken.

    Over-relying on redundancy: many accidents result of common cause failures in redundant systems. Paradox: providing redundancy may lead to the complacency that defeats the redundancy.

    Unrealistic risk assessment: ignores factors that not able to quantify or just make up numbers.

    Ignoring high-consequence, low-probability events

    Assuming risk decreases over time

    Ignoring warning signs

    New technology, particularly digital technology

    2. System accidents, HMI and computers in control, human errors
    4. Hazard analysis and STPA (batch reactor), design for safety (precedence), risk analysis and management
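The "discounting risk" and "over-relying on redundancy" notes above can be illustrated with a little arithmetic. The sketch below compares the probability that two redundant channels both fail under an independence assumption with a simple beta-factor style approximation for common-cause failure. The beta-factor form, the function names, and the numbers used with it are assumptions chosen for illustration; none of them come from the talk or from any real plant data.

```python
def p_both_fail_independent(p: float) -> float:
    """Probability that two redundant channels both fail, assuming independence."""
    return p * p


def p_both_fail_common_cause(p: float, beta: float) -> float:
    """Same probability when a fraction beta of each channel's failures is shared.

    Beta-factor style approximation (an assumption, not from the talk):
    a shared cause fails both channels with probability beta * p, and
    otherwise each channel fails on its own with probability (1 - beta) * p.
    """
    common = beta * p
    independent_part = (1 - beta) * p
    return common + (1 - common) * independent_part ** 2


if __name__ == "__main__":
    p, beta = 1e-3, 0.1  # made-up illustrative values
    print(p_both_fail_independent(p))
    print(p_both_fail_common_cause(p, beta))
```

With these made-up values, a 10% shared-cause fraction makes the joint failure roughly a hundred times more likely than the multiplied-out independent figure, which is the kind of gap the note describes as "impossibly low numbers".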

    I'll talk about what that something more can be a little later.

    2. System accidents, HMI and computers in control, human errors
    4. Hazard analysis and STPA (batch reactor), design for safety (precedence), risk analysis and management

    Not just management of change for planned changes but also migration (changes) in system due to natural factors

    Starting from this view of accidents as a control problem, can model and analyze safety. Two types of models used.

    Takeaways: 1- Overly aggressive schedule enforcement has little effect on completion time (