
Special Topics and Other Techniques

This chapter briefly reviews a selection of important topics and techniques not presented elsewhere in this volume. Detailed discussion of these topics and techniques is beyond the scope of this volume because of their complexity, stage of development, and/or utility. These topics are domino effects (Section 6.1), unavailability analysis of protective systems (Section 6.2), reliability analysis of programmable electronic systems (Section 6.3), and other techniques (Section 6.4).

6.1. Domino Effects

6.1.1. Background

PURPOSE
Often, an incident that starts in one item may affect nearby items (e.g., vessels containing hazardous materials) by thermal radiation, blast, or fragment impact. Examples of "domino effect" incidents are the LPG sphere BLEVE disaster at Feyzin in France (Lees, 1980) and the Mexico City LPG explosion (Pietersen, 1985); domino incidents sometimes occur in the explosives industry (e.g., in nitroglycerine manufacturing). The purpose of domino analysis is to predict the occurrence of such incidents, so that the CPQRA does not underestimate the consequences or frequencies of major incidents. Domino analysis is also used to evaluate equipment separation to minimize the potential for incident propagation.

PHILOSOPHY
Domino incidents may be analyzed by either of two approaches:

• Increase the consequences of a given incident at fixed frequency, to allow for larger consequences due to domino effects (modifying the outcome in an event tree context).

• Increase the failure frequency of a given incident at fixed consequences, to allow for the domino effect contribution (treating the domino as an external event in a fault tree context).

Simultaneous increase of frequencies and consequences of incidents may be required in some cases. However, one must not overestimate the overall risk, because the consequence and frequency adjustments may not be mutually exclusive (i.e., the same domino incident may be counted twice). Analyzing domino effects can expand CPQRA work substantially, as any given incident may cause many domino incidents, most of which are small. Only those domino incidents that lead to a significant escalation of consequences or frequency of occurrence are normally analyzed. Criteria for including domino incidents are usually arbitrary. For example, one may decide that an escalation in consequence or frequency of 50% or greater requires consideration of domino events.
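The screening criterion in the preceding paragraph can be expressed as a short check. The following Python sketch assumes the 50% escalation threshold given as an example in the text; the function names and the candidate incidents and frequencies are hypothetical.

```python
# Sketch of the escalation screening criterion. The 50% threshold is the
# example value from the text; incident names and values are hypothetical.


def is_significant(base: float, with_domino: float, threshold: float = 0.5) -> bool:
    """True if domino effects raise a frequency or consequence measure by
    at least `threshold` (0.5 means a 50% escalation)."""
    if base <= 0:
        raise ValueError("base value must be positive")
    return (with_domino - base) / base >= threshold


def screen(candidates: dict[str, tuple[float, float]], threshold: float = 0.5) -> list[str]:
    """Return the names of candidate domino incidents that meet the criterion.

    Each value is (base measure, measure including domino effects).
    """
    return [
        name
        for name, (base, escalated) in candidates.items()
        if is_significant(base, escalated, threshold)
    ]


# Hypothetical candidates: frequencies per year without and with domino effects.
candidates = {
    "LPG sphere BLEVE -> ammonia sphere failure": (1.0e-6, 1.8e-6),
    "Pump seal fire -> pipe rack leak": (2.0e-4, 2.2e-4),
}

for name in screen(candidates):
    print("analyze:", name)
```

Only the first candidate (an 80% escalation) passes; the second (10%) would normally be dropped from further domino analysis.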

APPLICATION
Despite its potential importance, few published risk assessments address the subject explicitly; most treat domino events only by checking equipment separation distances. An early risk assessment for Canvey Island (HSE, 1978) did consider domino events and discussed their interrelationships with nearby units and/or plants.

6.1.2. Description

DESCRIPTION OF TECHNIQUE
As noted earlier, domino incidents may be considered in two domains: either as an increased frequency of occurrence of a specified consequence (fault tree context) or as an increased consequence occurring at the original frequency (event tree context). In general, these two approaches are not mutually exclusive; they are essentially iterative, and the incidents they generate are added to the list of incidents used for total risk determination (Table 1.2). An initial analysis is used to identify primary incidents and their effect zones. Effect zones should be defined in terms of effect intensities sufficient to cause equipment failure rather than fatality or injury. As equipment failures require greater effect intensities (Section 2.2), effect zones for these will be considerably smaller than those for fatality or injury. Several companies use this domino effect analysis to assist in layout design for particularly hazardous items (large inventories of flammable or toxic materials).

Increasing the frequency of occurrence of identified major incidents was the approach employed by the HSE (1978) for Canvey in the treatment of domino effects. In fault tree terms, extra external events (Section 3.3.3) are added to account for domino mechanisms leading to the top event. For example, BLEVE incidents are frequently caused by leaks leading to pool or jet fires. Such leaks substantially increase the frequency of BLEVE incidents but do not contribute much to the consequences.

The other approach is to increase the consequences of a given incident without modifying its frequency. In a CPQRA this would be done by adding extra outcomes to the event tree analysis (Section 3.2.2). Rather than terminating the event tree at a single outcome (e.g., UVCE), the event tree would be continued with extra nodes to account for additional domino outcomes (e.g., BLEVE and toxic gas release from a nearby vessel). The consequence analysis would consider both events. This procedure would continue iteratively until no significant additional domino incidents were generated.
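A minimal Python sketch of this iterative event tree extension follows. It assumes, for simplicity, that each outcome can escalate to at most one further domino outcome, so each added node has two branches (domino occurs or does not); the escalation table, the frequencies, and the significance cut-off are all hypothetical.

```python
# Hypothetical escalation table: each outcome may trigger at most one further
# domino outcome, with the given conditional probability.
ESCALATION: dict[str, tuple[str, float]] = {
    "LPG sphere UVCE": ("Ammonia sphere failure", 0.9),
    "Ammonia sphere failure": ("Adjacent LPG bullet BLEVE", 0.05),
}


def extend_event_tree(outcome: str, frequency: float, cutoff: float = 1e-7):
    """Extend an event tree branch node by node with domino outcomes.

    Returns mutually exclusive end states as (sequence of outcomes, frequency per year).
    Extension stops when no escalation is defined, an outcome would repeat,
    or the domino branch frequency falls below `cutoff`.
    """
    end_states = []
    sequence = (outcome,)
    freq = frequency
    while True:
        step = ESCALATION.get(sequence[-1])
        if step is None:
            end_states.append((sequence, freq))
            return end_states
        secondary, prob = step
        if secondary in sequence or freq * prob < cutoff:
            # Domino branch not significant: terminate the tree at this outcome.
            end_states.append((sequence, freq))
            return end_states
        # Branch where the domino does not occur ends here.
        end_states.append((sequence, freq * (1.0 - prob)))
        # Branch where it does occur is continued with an extra node.
        sequence = sequence + (secondary,)
        freq *= prob


for seq, f in extend_event_tree("LPG sphere UVCE", 1.0e-5):
    print(" + ".join(seq), f"{f:.2e} /yr")
```

Because each node splits the branch frequency into mutually exclusive end states, the end-state frequencies always sum to the primary outcome frequency, which helps guard against double counting.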

LOGIC DIAGRAM
A logic diagram showing the methodology of the two approaches is given in Figure 6.1. A necessary input is the likelihood of vessel or pipework failure.

THEORETICAL FOUNDATION
Domino analysis is an extension of hazard identification. However, analysis should not commence until the primary hazards have been identified and consequence calculations completed. Once identified, domino incidents are treated using conventional CPQRA methodology.

INPUT REQUIREMENTS AND AVAILABILITY
The technique requires that a plot layout be available, showing the orientation of and separations between equipment items. Consequence zones sufficient to cause domino failures and the frequencies of the primary incidents must be determined. These latter inputs are available from the normal techniques of CPQRA discussed in Chapters 2 and 3. The layout information may not be available at the earliest stages of a design. Thus, to perform domino analysis at this stage one may have to await this information or assume separation distances based on existing practices and codes.

OUTPUT
The output is a modified list of consequences and frequencies for each incident, which is integrated into the risk calculations as outlined in Chapter 4.

FIGURE 6.1. Logic diagram for domino incident investigation. [Steps: identify hazardous incidents (HEP Guidelines); perform initial consequence analysis (Section 2.1); determine vessels/pipes subject to domino effects (Section 2.2); amend the incident list either in an event tree context (increase the magnitude of incidents at existing frequencies) or in a fault tree context (use existing incidents and increase frequencies); revise the incident consequence/frequency list.]

SIMPLIFIED APPROACHES
The selection of only those domino incidents that lead to a significant escalation of incident consequences or frequencies is a simplified approach.

6.1.3. Sample Problem

Consider two adjacent spheres, one for LPG and one for ammonia. A risk analysis for these, in addition to the initial analysis of single-sphere failure cases, can be extended by domino incident analysis to consider failure cases involving both spheres.

An extract from the event tree for the LPG sphere alone might terminate at outcomes such as BLEVE or UVCE, at frequencies $F_8$ and $F_{11}$, respectively. This analysis omits potential failure of the nearby ammonia sphere due to blast overpressure, heat radiation, or fragment impact.

Separate consequence calculations for the LPG sphere BLEVE and UVCE incidents show that overpressure effects from either incident could lead to collapse and failure of the ammonia sphere (with probabilities of 0.3 and 0.9, respectively). The consequence of LPG sphere failure should therefore be increased to include the possible effects of an ammonia release, incorporating the likelihood of ammonia sphere failure caused by the domino effects.

Alternatively, the frequency of total collapse of the ammonia sphere, which might have been included in the original analysis at a frequency $F_A$, should be increased:

$$F_{A,\mathrm{modified}} = F_A + 0.3\,F_8 + 0.9\,F_{11}$$
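The modified frequency is simple arithmetic; the Python sketch below evaluates it. The values of $F_A$, $F_8$, and $F_{11}$ are not given in the example, so the numbers used here are hypothetical, as is the function name.

```python
def modified_collapse_frequency(f_a, f_bleve, f_uvce, p_bleve=0.3, p_uvce=0.9):
    """Frequency of total ammonia sphere collapse including domino contributions.

    f_a: original collapse frequency (F_A); f_bleve, f_uvce: frequencies of the
    LPG sphere BLEVE (F_8) and UVCE (F_11) outcomes; p_bleve, p_uvce: probabilities
    that each outcome fails the ammonia sphere (0.3 and 0.9 in the example).
    """
    return f_a + p_bleve * f_bleve + p_uvce * f_uvce


# Hypothetical frequencies per year.
f_mod = modified_collapse_frequency(1.0e-6, 2.0e-6, 1.0e-5)
print(f"{f_mod:.3g} /yr")
```

With these illustrative inputs the UVCE term dominates, raising the collapse frequency by roughly an order of magnitude.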

6.1.4. Discussion

STRENGTHS AND WEAKNESSES
The strength of domino analysis is that the frequencies and consequences developed in the CPQRA are appropriate and not underestimated. Also, the largest incidents, such as those involving multiple vessels, which may not have been identified in the initial hazard identification stage, are adequately considered. It is particularly important to consider domino effects for congested layouts. The primary weakness of domino analysis is the uncertainty of the probabilities assigned to the failures of equipment predicted to be involved in a domino effect.

IDENTIFICATION AND TREATMENT OF POSSIBLE UNCERTAINTIES
The technique depends on the validity and accuracy of effects models (Section 2.3) and on the judgment of the analyst in recognizing potential interactions. The effect necessary to involve nearby equipment is difficult to determine precisely. Where possible, accident records should be consulted to determine whether domino effects have occurred previously for the type of primary failure being considered.

UTILITY
The technique is relatively simple in concept, but it can involve substantial work to apply rigorously. Modification of the frequency of large event consequences can be applied. Domino analysis requires the same skills as defined in Chapters 2 and 3 for CPQRA.

RESOURCES NEEDED
The resources needed are similar to those required for Chapters 2 and 3. If layout verification is the purpose of the analysis, using the appropriate consequence analysis to determine effect zones would be adequate. If a risk estimate is necessary, both consequence and frequency models will be necessary. This might increase the effort of the CPQRA by 10-20%.

AVAILABLE COMPUTER CODE
No known computer codes specifically address the issue of domino incidents.

6.2. Unavailability Analysis of Protective Systems

6.2.1. Background

PURPOSE
Protective systems, such as pressure vessel relief valves, function to prevent or mitigate the occurrence of incidents. This section shows how to determine the probability that a protective system will be in a failed state when a demand on that system occurs.

TECHNOLOGY
Many processes, as well as individual pieces of equipment, are protected by specific protective systems. Protection can be achieved by various means, but is usually of two types: protective systems that relieve excessive energy (e.g., relief valves) and those that isolate the energy (e.g., alarm and shutdown systems). Protective systems, unlike many other process systems, can fail in two distinct ways:

1. Protective systems can fail in a manner such that the failure is revealed (e.g., a rupture disc that fails prematurely due to cyclic fatigue at the normal operating condition of the process equipment). If the discharge from the system has been designed properly, no hazard will result; but there could be economic implications such as lost and off-spec product and process downtime.

2. Protective systems can fail to function on demand, allowing design conditions to be exceeded. In this failure pathway, the failure is unrevealed until the demand occurs (e.g., a high-pressure switch fails to shut down a pump).

Although revealed failures of a protective system are important in considering both safety and reliability, unrevealed failures are of particular concern because they remain hidden until a demand occurs.

The failure to function of particular systems for long periods of mission time is of major importance. A parameter that is very useful when considering unrevealed failures in protective systems is the unavailability, also called the probability of failure on demand or the fractional dead time (FDT). These terms are synonymous in this context. This parameter is a probability: the average fraction of time that the protective system is unavailable to perform its assigned function. FDT will be used to identify this parameter in the remainder of the section.

APPLICATIONS
Knowledge of the FDT of a protective component or system can be used in many ways. Such information can provide direct input into a fault tree (Section 3.2.1), especially those fault trees using gate-by-gate techniques. Deviations that take the process out of control create "demands" on protective systems. If the frequency of demands on a protective system [the demand rate (D)] is known and the FDT of the protective system is known, then the resulting "hazard or incident rate" (H) can be calculated. For low demand rates and small FDTs, the hazard or incident rate can be obtained by direct multiplication of the demand rate and the FDT:

$$H = D \times \mathrm{FDT} \tag{6.2.1}$$

where H is the hazard or incident rate per year, D is the demand rate per year, and FDT is the fractional dead time.
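Eq. (6.2.1) can be wrapped together with the validity limits that apply to these simplified calculations (FDT below 0.1, and demand rate times proof test interval below 0.1). The Python sketch below is illustrative; the function name and the example demand rate, FDT, and test interval are hypothetical.

```python
import warnings


def hazard_rate(demand_rate, fdt, test_interval=None):
    """Hazard or incident rate H = D x FDT, Eq. (6.2.1).

    demand_rate: D, demands per year; fdt: fractional dead time (a probability);
    test_interval: optional proof test interval in years, used for the D*T < 0.1 check.
    """
    if not 0.0 <= fdt <= 1.0:
        raise ValueError("FDT is a probability and must lie between 0 and 1")
    if fdt >= 0.1:
        warnings.warn("FDT >= 0.1: the simple product may be inaccurate")
    if test_interval is not None and demand_rate * test_interval >= 0.1:
        warnings.warn("D*T >= 0.1: the simple product may be inaccurate")
    return demand_rate * fdt


# Hypothetical: 0.05 demands/yr, FDT = 0.01, proof tested once a year.
h = hazard_rate(0.05, 0.01, test_interval=1.0)
print(f"H = {h:.1e} per yr")
```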

This analysis approach can also be used to optimize the design of a protective system to produce the lowest practicable FDT, or an FDT low enough to achieve a particular target criterion. Lawley and Kletz (1975) discuss the application of these techniques to high-pressure trip systems. The Fault Tree Handbook (Roberts et al., 1981) provides other examples.

6.2.2. Description

DESCRIPTION OF THE TECHNIQUE
Detailed texts on reliability engineering (Henley and Kumamoto, 1981; Smith, 1985) provide a broad introduction to the analysis of protective system unavailability. Appendix E of this volume provides additional relevant definitions and formulas.

Consider a protective system, subject to unrevealed failures, consisting of a single component. At any time, the component can be in a failed or an operational state. There are only two opportunities to determine whether the protective system is working:

1. If a demand occurs and the protective system has to operate.
2. If a proof test is used to check the system on a routine basis.

Obviously, if a protective system is never proof tested, the system will continue to degrade until it fails, and the probability of failure on demand will increase as a function of time. However, if a proof test is used to detect failures and "repair" the system to working order, the probability of failure on demand is significantly reduced.

The fractional dead time (FDT) of a single-component protective system due to component failure is therefore a function of both the failure rate of the component ($\lambda$) and the proof test interval ($T$).

The failure rate dictates, on average, how often failures occur. If failures are assumed to occur randomly at any time during a proof test interval, then, averaged over a large number of failures, a failure will occur halfway through the proof test interval. In any time increment, the fraction of time the system will be down is therefore approximately $\lambda(T/2)$. In this expression, $T/2$ represents the average down time per failure and $\lambda$ the average number of failures per unit time.

This gives the expression

$$\mathrm{FDT}_c = \tfrac{1}{2}\lambda T \tag{6.2.2}$$

where $\mathrm{FDT}_c$ is the fractional dead time due to component failure. Eq. (6.2.2) is valid only when $\lambda T < 0.1$.

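Fractional dead time may be more accurately expressed as

$$\mathrm{FDT}_c = \frac{1}{T}\int_0^T P(t)\,dt \tag{6.2.3}$$

where $P(t)$ is the probability that the component has failed by time $t$ after the last proof test. If $P(t) = \lambda t$, Eq. (6.2.2) results. If $P(t) = 1 - e^{-\lambda t}$, the following equation results:

$$\mathrm{FDT}_c = 1 - \frac{1}{\lambda T}\left(1 - e^{-\lambda T}\right) \tag{6.2.4}$$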
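To see where the approximation of Eq. (6.2.2) breaks down, the Python sketch below evaluates Eq. (6.2.2), the exponential form of Eq. (6.2.4), and a direct numerical integration of Eq. (6.2.3). The function names and the illustrative values of $\lambda T$ are assumptions for demonstration only.

```python
import math


def fdt_linear(lam, T):
    """Eq. (6.2.2): FDT_c = lam*T/2, valid for lam*T < 0.1."""
    return 0.5 * lam * T


def fdt_exponential(lam, T):
    """Eq. (6.2.4): exact average of P(t) = 1 - exp(-lam*t) over [0, T]."""
    x = lam * T
    return 1.0 + math.expm1(-x) / x  # expm1(-x) = exp(-x) - 1, accurate for small x


def fdt_numeric(lam, T, steps=10_000):
    """Eq. (6.2.3) evaluated with the trapezoidal rule for P(t) = 1 - exp(-lam*t)."""
    h = T / steps
    p = [-math.expm1(-lam * i * h) for i in range(steps + 1)]
    return h * (sum(p) - 0.5 * (p[0] + p[-1])) / T


for lam_T in (0.01, 0.1, 1.0):
    lam, T = lam_T / 8760.0, 8760.0  # failure rate per hr, yearly proof test
    print(f"lam*T = {lam_T:<5} approx = {fdt_linear(lam, T):.4f}  "
          f"exact = {fdt_exponential(lam, T):.4f}")
```

At $\lambda T = 0.01$ the two forms agree to well under 1%; at $\lambda T = 0.1$ the linear form overstates FDT by a few percent; at $\lambda T = 1$ it gives 0.5 against an exact value of about 0.37.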

Clearly, the smaller the FDT of the protective system, the smaller the chance that the protective system will be in a failed state when a demand occurs.

From Eq. (6.2.2), $\mathrm{FDT}_c$ can be reduced by

1. reducing the proof test interval ($T$), or
2. reducing the failure rate ($\lambda$) of the component.

From Eq. (6.2.2) it is apparent that FDT is directly proportional to T: if T is halved, FDT is halved. However, there are limits to this approach. In many instances a proof test will be done while the plant is operating, and the test will require the protective system to be off-line while it is conducted. The time that the system is off-line must be included as part of the total FDT, and this contribution can become significant if proof testing is very frequent. This component of the total FDT is expressed as

$$\mathrm{FDT}_t = \frac{t_t}{T} \tag{6.2.5}$$

where $\mathrm{FDT}_t$ is the fractional dead time due to the duration of on-line testing and $t_t$ is the average on-line time to conduct a proof test.
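The trade-off between Eqs. (6.2.2) and (6.2.5) can be made explicit. Setting the derivative of $\lambda T/2 + t_t/T$ with respect to $T$ to zero gives $T^* = \sqrt{2t_t/\lambda}$ and a minimum combined FDT of $\sqrt{2\lambda t_t}$. This is a derivation under the same approximations ($\lambda T < 0.1$, all other FDT contributions ignored), not a result quoted from the references; the Python sketch below checks it with hypothetical values of $\lambda$ and $t_t$.

```python
import math


def fdt_component_plus_testing(lam, T, t_test):
    """FDT_c + FDT_t from Eqs. (6.2.2) and (6.2.5); rates per hr, times in hr."""
    return 0.5 * lam * T + t_test / T


def optimal_test_interval(lam, t_test):
    """Interval minimizing lam*T/2 + t_test/T: setting the derivative
    lam/2 - t_test/T**2 to zero gives T* = sqrt(2*t_test/lam)."""
    return math.sqrt(2.0 * t_test / lam)


# Hypothetical: lam = 1e-5 per hr, 2 hr off-line per proof test.
lam, t_test = 1.0e-5, 2.0
T_opt = optimal_test_interval(lam, t_test)
print(f"T* = {T_opt:.0f} hr, minimum FDT = "
      f"{fdt_component_plus_testing(lam, T_opt, t_test):.2e}")
```

Testing more often than $T^*$ increases the total because the off-line time begins to dominate; testing less often lets undetected failures dominate.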

In addition, each time the system undergoes a proof test there is a probability, although usually very small, that the system will be returned to service in a failed state. This FDT can be expressed as

$$\mathrm{FDT}_{et} = \frac{1}{T} \times P_{et} \times T = P_{et}$$

where $\mathrm{FDT}_{et}$ is the fractional dead time due to human error that leaves the protective component or system in a failed state after proof testing; $1/T$ is the number of proof tests per unit time; $P_{et}$ is the probability of human error per proof test; and $T$ is the proof test interval.

$\mathrm{FDT}_{et}$ is therefore equivalent to the human error probability of leaving the system in a fail-dangerous state following the proof test.

There is also a component of FDT from leaving the protective component or system in a failed state after repair. This FDT can be expressed as

$$\mathrm{FDT}_{er} = \left(\lambda_{fd}T + \tfrac{1}{2}\lambda_{fs}T\right)P_{er} \tag{6.2.6}$$

where
$\lambda_{fd}$ = fail-to-danger failure rate
$\lambda_{fs}$ = fail-safe failure rate
$T$ = proof test interval
$P_{er}$ = probability of human error during the repair operation.

In Eq. (6.2.6), it is assumed that the proof test interval is reestablished after each component or system failure. Dangerous failures will not be revealed until the next proof test, while fail-safe events will, on average, occur halfway through the proof test period.

Finally, failures of the components of a system may not always be independent. Therefore, there can be a contribution to the system fractional dead time due to common cause failure, which can be expressed as $\mathrm{FDT}_{cc}$.

The unavailability contribution that is due to common cause failures often dominates the unavailability of redundant systems. This topic is addressed in more detail in Section 3.3.1.

Total fractional dead time can be estimated, without significant error, as the sum of the individual fractional dead time contributions when each is less than 0.1. When applying this approximation, ensure that the sum of the fractional dead times produces a total fractional dead time less than 1.0, and preferably less than 0.1.

The total fractional dead time ($\mathrm{FDT}_T$) is, therefore, the sum of the fractional dead times for component or system failure ($\mathrm{FDT}_c$), on-line testing duration ($\mathrm{FDT}_t$), human error in proof testing ($\mathrm{FDT}_{et}$) and in repairing ($\mathrm{FDT}_{er}$) the system, and common cause failures ($\mathrm{FDT}_{cc}$):

$$\mathrm{FDT}_T = \mathrm{FDT}_c + \mathrm{FDT}_t + \mathrm{FDT}_{et} + \mathrm{FDT}_{er} + \mathrm{FDT}_{cc}$$
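The contributions above can be collected in one function. The Python sketch below assumes failure rates per hour and times in hours, takes $\mathrm{FDT}_{cc}$ as a direct input because no formula is given for it here, and uses hypothetical parameter values. It raises an error if the sum exceeds 1 and warns if it reaches 0.1, following the limits on the additive approximation.

```python
import warnings


def total_fdt(lam, T, t_test, p_et, lam_fd, lam_fs, p_er, fdt_cc=0.0):
    """Total fractional dead time as the sum of its contributions.

    lam: component failure rate (per hr); T: proof test interval (hr);
    t_test: on-line test duration (hr); p_et: human error probability per test;
    lam_fd, lam_fs: fail-to-danger and fail-safe rates (per hr);
    p_er: human error probability per repair; fdt_cc: common cause FDT (given).
    """
    parts = {
        "component, Eq. (6.2.2)": 0.5 * lam * T,
        "on-line testing, Eq. (6.2.5)": t_test / T,
        "test error": p_et,
        "repair error, Eq. (6.2.6)": (lam_fd * T + 0.5 * lam_fs * T) * p_er,
        "common cause": fdt_cc,
    }
    total = sum(parts.values())
    if total > 1.0:
        raise ValueError("total FDT exceeds 1: the additive approximation has broken down")
    if total >= 0.1:
        warnings.warn("total FDT >= 0.1: the additive approximation may be inaccurate")
    return total, parts


# Hypothetical single-channel trip, tested yearly.
total, parts = total_fdt(lam=2e-6, T=8760.0, t_test=4.0, p_et=1e-3,
                         lam_fd=2e-6, lam_fs=5e-6, p_er=1e-2, fdt_cc=1e-4)
for name, value in parts.items():
    print(f"{name:<30} {value:.2e}")
print(f"{'total':<30} {total:.3e}")
```

With these illustrative inputs the component term is the largest single contribution, consistent with the observation that it usually dominates single-signal-path systems.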

The fractional dead time due to component failure ($\mathrm{FDT}_c$) usually dominates the total fractional dead time in single-signal-path systems, but the other effects should also be considered.

Another approach that can be used to reduce the fractional dead time of a protective system is the use of redundancy. When using this approach, it is important to reduce the potential for common cause failures as much as possible at the design stage.

If the fractional dead time for component or system failure is expressed per Eq. (6.2.3) as

$$\mathrm{FDT}_c = \frac{1}{T}\int_0^T P(t)\,dt$$

and if it is assumed that there are two identical redundant components, so that

$$P(t) = (\lambda t)(\lambda t)$$

and if it is further assumed that the components are tested at the same time, then substituting this $P(t)$ into Eq. (6.2.3) and integrating yields the following approximation:

$$\mathrm{FDT} = \frac{\lambda^2 T^2}{3} \tag{6.2.8}$$

which applies to a redundant system in which both components must be in a failed state for the protective system to be failed.

A more general approximation for calculating the FDT of redundant systems is

$$\mathrm{FDT} = \left[\frac{n!}{r!\,(n-r)!}\right]\frac{(\lambda T)^r}{r+1} \tag{6.2.9}$$

where
$m$ = number of components or systems that must work to be sure protection occurs on demand
$n$ = number of redundant components or systems
$r = n - m + 1$

Consider a simple 1-out-of-2 redundant system (e.g., a vessel with two relief valves on-line, each sized for all overpressure demands), for which

$$m = 1,\quad n = 2,\quad r = 2$$

Substituting these values into Eq. (6.2.9) gives

$$\mathrm{FDT} = \left[\frac{2!}{2!\,0!}\right]\frac{(\lambda T)^2}{2+1} = \frac{\lambda^2 T^2}{3}$$

which is equivalent to Eq. (6.2.8). Table 6.1 provides FDT expressions for various combinations of m-out-of-n systems.

Because of common cause failures, there are lower limits to the FDT achievable by redundancy alone. These limits will vary depending on the type and complexity of the protective system, but it is usually very difficult to achieve a total FDT of less than $10^{-4}$.
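Eq. (6.2.9) is straightforward to evaluate for any m-out-of-n arrangement. The Python sketch below assumes identical components tested simultaneously and ignores common cause failure, so results much below $10^{-4}$ should be treated with the caution noted above; the function name, failure rate, and test interval are hypothetical.

```python
from math import comb


def fdt_m_out_of_n(m, n, lam, T):
    """Eq. (6.2.9): FDT of an m-out-of-n system of identical components,
    all proof tested together at interval T, with r = n - m + 1.
    Common cause failure is not included."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    r = n - m + 1  # number of components that must fail to defeat the system
    return comb(n, r) * (lam * T) ** r / (r + 1)


lam, T = 1.0e-6, 8760.0  # hypothetical failure rate (per hr), yearly test
for n in (1, 2, 3):
    for m in range(1, n + 1):
        print(f"{m}-out-of-{n}: FDT = {fdt_m_out_of_n(m, n, lam, T):.2e}")
```

For m = 1, n = 2 the function reproduces Eq. (6.2.8), and for m = 2, n = 3 it gives $\lambda^2T^2$, the expression used in the sample problem.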

LOGIC DIAGRAM
A simplified logic diagram for the assessment of protective system unavailability, when used in conjunction with a hazard rate target, is given in Figure 6.2. In practice, it is sometimes difficult to maintain total independence between the demands and the protective system.

TECHNOLOGICAL FOUNDATION
The mathematical methods employed in the analysis of protective system unavailability are those widely developed and used in the field of reliability engineering. Further details, derivations, and discussions are given in texts by Henley and Kumamoto (1981), Smith (1985), and Lees (1980).

TABLE 6.1. Fractional Dead Time Expressions for System Combinations

m n r Failure rates equivalent Failure rates not equivalent

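FIGURE 6.2. Logic diagram for the assessment of protective system unavailability. [Flow: start analysis; develop a fault tree for hazardous demands (failure rate and human reliability data) and calculate the system demand rate, D; develop a fault tree for the protective system configuration (failure rate data, human reliability data, proof test interval; consider common cause failure) and calculate the FDT for the protective system; calculate the hazard rate, H = D × FDT; if satisfactory, accept the design; otherwise improve the protective system (reduce the proof test interval, add additional protection, provide redundancy) or reduce demand rates.]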

INPUT REQUIREMENTS AND AVAILABILITY
Generally, the following information is required:

• protective system configuration
• maintenance programs and policies (to aid in establishing proof test intervals)
• failure rate data for protective system components
• system demand data
• FDT target or hazard rate target (if specified)

System demand data are normally calculated using fault tree analysis (Section 3.2.1). Information relating to failure rate data sources is given in Chapter 5. The source and context of any quoted data should be checked prior to their use, to ensure their relevance to a particular application. The remainder of the information should be available (already existing in an operating plant or as developed in an engineering design project).

OUTPUT
The output of this analysis is the total FDT of the protective system. This is used to assess the adequacy of the design and to establish appropriate inspection and maintenance programs.

SIMPLIFIED APPROACHES
The calculations described in this section are simplified approximations. The assumption that $\lambda T < 0.1$ should be verified whenever Eq. (6.2.2) is used. To calculate a hazard rate, $DT$ should also be less than 0.1. A more rigorous model can be found in other references (e.g., Lees, 1982a).


6.2.3. Sample Problem

The sample problem calculates the FDT for the system shown in Figure 6.3. FDT contributions from on-line test duration, human error in testing, repairing and restoring the system, and common cause failure are not considered in this analysis. This system consists of three independent pressure sensors activating a trip valve via a 2-out-of-3 voting system (i.e., failure of any one sensor is overridden by the two remaining correct signals). Assume, in this example, that the logic operator is perfect (i.e., its failure rate is negligible relative to the other components in the system).

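FDT of Trip Valve:
Assume: failure rate, $\lambda = 2.1 \times 10^{-6}$/hr
Test interval, $T = 8760$ hr (one year)

$$\mathrm{FDT}_{\text{valve}} = \tfrac{1}{2}\lambda T = \tfrac{1}{2}\left(2.1\times10^{-6}/\text{hr}\right)(8760\ \text{hr}) = 9.2\times10^{-3}$$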

FIGURE 6.3. Block diagram for the sample problem. [Three pressure sensors on the pressure vessel feed a 2-out-of-3 voting logic operator, which actuates the final actuator device, a trip valve.]

FDT of Pressure Sensors:
Assume: failure rate, $\lambda = 5.2 \times 10^{-6}$/hr
Test interval, $T = 4380$ hr (one-half year)

$$\mathrm{FDT}_{\text{sensors}} = \lambda^2 T^2 = \left(5.2\times10^{-6}/\text{hr}\right)^2(4380\ \text{hr})^2 = 5.2\times10^{-4}$$

(Table 6.1, for a system requiring two units out of three to operate.)

FDT of Protective System:

$$\mathrm{FDT}_{\text{total}} = \mathrm{FDT}_{\text{valve}} + \mathrm{FDT}_{\text{sensors}} = 9.2\times10^{-3} + 5.2\times10^{-4} = 9.7\times10^{-3}$$
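The arithmetic of the sample problem can be reproduced in a few lines of Python, using the stated failure rates and test intervals and the same assumption of a perfect logic operator; the variable names are illustrative.

```python
# Sample problem: rates per hr, intervals in hr.
lam_valve, T_valve = 2.1e-6, 8760.0     # trip valve, tested yearly
lam_sensor, T_sensor = 5.2e-6, 4380.0   # each pressure sensor, tested half-yearly

fdt_valve = 0.5 * lam_valve * T_valve        # single component, Eq. (6.2.2)
fdt_sensors = (lam_sensor * T_sensor) ** 2   # 2-out-of-3 voting, Table 6.1
fdt_total = fdt_valve + fdt_sensors          # logic operator assumed perfect

print(f"valve {fdt_valve:.2e}, sensors {fdt_sensors:.2e}, total {fdt_total:.2e}")
# prints: valve 9.20e-03, sensors 5.19e-04, total 9.72e-03
```

The valve term dominates, so shortening the valve test interval or adding a second trip valve would reduce the total FDT more than further improvement of the sensors.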

This is approximately a 1% probability of failure on demand. If the demand rate on this protective system were known, the hazard rate could be calculated directly from Eq. (6.2.1).

6.2.4. Discussion

STRENGTHS AND WEAKNESSES
As a means of assessing the design of a protective system, this approach is simple and straightforward and can provide valuable insight. The impacts of protective systems in fault trees whose top events are hazard rates are not as easily assessed using minimal cut set methods (Appendix D), because the specific rate and probability information is not retained in the methodology. However, minimal cut sets can be used successfully to construct fault trees for protective system unavailability alone.

IDENTIFICATION AND TREATMENT OF POSSIBLE ERRORS
The type of analysis introduced in this section can be made more rigorous, but this is not always justified given the assumptions built into these models and the quality of the available repair and failure rate data.

A significant source of uncertainty is common cause failure, especially where redundant systems are evaluated. Smith (1985) discusses the causes of and defenses against common cause failures, and also details various methods by which common cause failures can be accounted for in system reliability predictions. However, there are very few firm data with which to confirm or disprove the models being used for common cause analysis.

When performing FDT calculations, it is important to remember that FDT is a probability and must therefore always be less than or equal to 1.

UTILITY
For the analysis of the availability of complex interlock systems, a specialist in instrumented systems and fault tree analysis may be required.

RESOURCES NEEDED
A simple system could take approximately 1 day to examine, while a moderately complex system might require 1 man-week for a risk analyst.

AVAILABLE COMPUTER CODES
Most of the computer codes listed in Section 3.2.1 can be applied to this analysis.

6.3. Reliability Analysis of Programmable Electronic Systems

6.3.1. Background

Most process plants employ automatic control systems to improve plant safety, efficiency, and product quality. In the past, a process control system consisted of sensors located in process lines, with signals transmitted to either pneumatic or electronic panel-mounted controllers and on to in-line control valves. Plant operators could assess the state of each unit operation by observing the instruments mounted in a control panel near the equipment or in a "central" control room. In addition to the "control loops" used for regulation of continuous process parameters (e.g., flow, pressure, level, temperature), a control system often contained two other types of automatic control functions: sequence controllers and process interlock systems. Sequence controllers set the control valve positions according to a time-based program as long as process conditions, which were being monitored by a process interlock system, were within an acceptable range. Sequence controllers (often mechanical timers with many user-set switches) were widely used for batch process control and, more generally, for plant start-up and shutdown control. Process interlocks (large numbers of signal trip units and mechanical relays for execution of logic functions) were used extensively in batch plants. In addition, these interlock systems were often separated from the continuous control hardware and identified as the plant safety shutdown system.

Microprocessor-based digital control systems, introduced to the petrochemical instrument marketplace in 1975, were a pioneering effort using microcomputer chips, cathode-ray tubes, and digital communication technology for petrochemical plant control applications. The equipment was designed to replace the then state-of-the-art combination of minicomputer and electronic panel board instruments. Each manufacturer's product consisted of a number of computing modules that individually were limited in function but were easily integrated into a single control system to provide the display and control functions required for plant control. In a parallel development, intelligent modules to perform the tasks of sequential control and process logic solution, often identified as programmable logic controllers (PLCs), were first used in the automotive industry and then integrated into the DDCS. The total computer-based plant control system is identified as a programmable electronic system (PES) in this text. Increasingly, hard-wired logic has been replaced in the chemical industry with PLCs, and these modules have been integrated into general-purpose PESs to provide both regulatory and interlock plant control functions.

Control strategy complexity, the programmable nature of the controller modules, and integration of multiple control functions into one large system make PES reliability difficult to assess. Many companies have recognized the importance of PES reliability and have developed guidelines, primarily using qualitative approaches. Some approaches to reliability analysis of these programmable control systems have considered the logic solvers (the software-driven units) as "black boxes" in which a specific input will produce a specific output with no loss of reliability. This approach has proven inadequate, since the modular structure of a PES allows many equipment configurations to accomplish the same control functions. The reliability of a PES cannot be analyzed without applying methods that incorporate more detailed information about the configuration and environment of the total control system. A worst-case analysis could assume that the "black box" malfunctions. For catastrophic outcomes this conservative assumption may be appropriate, although a PES usually has fairly good reliability.

PURPOSE
The purpose of this section is to introduce quantitative methods for the analysis of PESs, primarily as they are used as protective systems. The methodology closely follows the techniques presented in Section 6.2.

TECHNOLOGY
Methodologies for both qualitative and quantitative analysis of PESs used for safety systems have been described in recent publications (Balls et al., 1986). Work sponsored by the HSE (1987) and the Norwegian Foundation for Scientific and Industrial Research (Hokstad and Bodsberg, 1987) makes effective use of reliability block diagrams (Section 6.4.7) to determine the probability that a safety monitoring system will function when, and only when, needed. Signal flow diagrams are used to help identify "critical paths" of safety system components that must be functioning for the PES to carry out a specific task, and well-known approximation methods based on Kinetic Tree Theory (Appendix D) are applied to these block diagram models to calculate total safety system availability and unavailability.

APPLICATIONS
The techniques presented in this section can, in principle, be applied to any PES. The results of the analysis, which can be obtained in the form of a fractional dead time (FDT), can provide direct input into fault trees (Section 3.2.1). The analysis techniques can also be used to compare the reliability characteristics of alternative PES configurations to produce an optimal configuration that meets specific system reliability requirements.

6.3.2. Description

INTRODUCTION
Several disciplines are required to design an emergency shutdown system that achieves a required level of safety and reliability for a manufacturing facility. An existing facility or a preliminary plant design is the starting point for this analysis. The logic diagram of Figure 6.4 illustrates the design process. Each identified activity is discussed briefly here, and references are provided to literature sources or to other sections of the CPQRA Guidelines for more detailed information. A more complete explanation of the considerations required to analyze the reliability of both continuous control and safety monitoring PESs is provided in the HSE (1987) and Hokstad and Bodsberg (1987) reports. Both references present complete and complementary analyses of PES control systems.

FIGURE 6.4. PES safety system development logic diagram for using a quantitative design methodology. [Flowchart boxes: failure rate data; human reliability data; criteria; PES system failure rate data; develop fault tree for hazardous demands; calculate demand rate; define safety interlock requirements for PES; calculate target unavailability for PES; define PES architecture; define system test methods and frequency; calculate PES system FDT; calculate PES system unreliability; decisions "Does PES system meet target unavailability?" and "Is PES system reliability adequate?"; design complete.]

ANALYZE SYSTEM

Develop Fault Trees for Hazardous Demands. Fault trees can be developed to identify failures or deviations that could generate hazardous process conditions. These qualitative fault trees give some insight into where protective systems would be most beneficial.

Calculate Demand Rate. A demand rate ($D$), in units of reciprocal time, can be determined from quantitative analysis of the fault trees using equipment failure data and human reliability data.

Define Safety Interlock Requirements for PES. Process designers can now take two steps. First, they define a set of process sensors (typically flow, pressure, level, temperature, and composition measurements) that detect the potential hazards and sound alarms. Second, they specify the shutdown systems that are actuated to bring plant operations to a safe state.

Some of the hazardous events developed in the fault trees may have demands that are considered "critical." Company policy may dictate that safety systems designed for these events remain external to a PES. Conversely, policy may require particular measures (e.g., voting systems) to permit the use of a PES. This subject is still receiving a great deal of attention, and there is no clear-cut industry practice.

Calculate Target Unavailability for PES. Once the application for the PES has been established, configuration of the system can begin. If a quantitative target has been established, a level of average unavailability or FDT can be set. Conceptually, from Eq. (6.2.1), this can be expressed as

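$$\text{FDT} = \frac{H}{D} \qquad (6.3.1)$$

where $H$ is the target hazard rate and $D$ is the demand rate, both in the same units of reciprocal time.

Consequently, the FDT of the PES must be no greater than the value calculated from Eq. (6.3.1) if the required target availability of the PES is to be met.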
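Eq. (6.3.1) is simple enough to sketch directly. The Python below is a minimal illustration. The function name `target_fdt` is our own. The hazard and demand rates in the usage lines are hypothetical numbers chosen only to show the arithmetic.

```python
def target_fdt(hazard_rate: float, demand_rate: float) -> float:
    """Maximum allowable fractional dead time from Eq. (6.3.1): FDT = H / D.

    Both rates must be in the same reciprocal-time units (e.g., per year).
    """
    if demand_rate <= 0:
        raise ValueError("demand rate must be positive")
    return hazard_rate / demand_rate


# Hypothetical inputs: tolerable hazard rate 1e-3 per year, 0.5 demands per year.
fdt_max = target_fdt(hazard_rate=1e-3, demand_rate=0.5)
print(f"PES FDT must not exceed {fdt_max:.4f}")
```

With these assumed inputs, the PES would need an FDT no greater than 0.002.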

Other approaches to determining an acceptable FDT are possible (company standards, cost-benefit analysis, etc.).

Define PES Architecture. The initial definition of the PES modules and their interconnection (referred to as the system architecture) takes the general form shown in Figure 6.5. The modular system may have single, double, or triple elements at each functional stage of the safety system. These stages are the sensor, input channel, logic solver, communication to the remaining PES, output channel, and process manipulator. This structure gives the designer the option of 1 to n paths for identifying and correcting an abnormal plant condition.

Two types of failure may occur in each signal path of the safety system:

• A revealed fault, in which the path calls for and executes a process shutdown when no plant hazard warrants it. This is referred to as a fail-safe event.
• An unrevealed failure, in which the safety system remains in service but is unable to take the necessary action when a hazard does occur. This is referred to as a fail-to-danger fault.

These two types of failure have different consequences for operational reliability. They must be considered separately in the design and reliability analysis of the safety system. The fail-safe and fail-to-danger probabilities of the computer-based modules are not equal, and different techniques are required to avoid the negative aspects of each type of module failure within the PES.

The safety system is required to operate on demand. From a reliability analysis point of view, this is a fundamentally different requirement from that of the continuous control loops that can also be part of the PES. A failure in a PES continuous control module quickly becomes apparent because a process control loop is dynamic. If redundant or fault-tolerant control module configurations are used for these loops, a fault will be quickly recognized and can often be repaired without interrupting plant operation. In contrast, a failure of a module associated with a monitoring safety system will be detected only when a diagnostic test is performed or when a hazardous event occurs.

Define System Test Methods and Frequency. A dangerous failure may be discovered by a periodic test of the installed system or by an automatic self-diagnostic feature of the PES. Failures within the logic solver (Figure 6.5) are normally detected by an external watchdog timer (WDT). In some machines, extensive internal diagnostics also detect them, and these should be included in a reliability analysis. HSE (1987) suggests that a PLC with a WDT and the following periodically executed internal diagnostics:

• checksums of the program stored in ROM
• tests of read/write memory
• tests of internal registers
• tests of arithmetic and logic functions

is able to self-identify 98% of the fail-to-danger faults within this part of the PES.
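The sketch below shows what a 98% self-identification figure implies by splitting a logic-solver failure rate into revealed and unrevealed parts. It makes two assumptions. First, it uses the typical 100 × 10^-6/hr logic-solver rate from Table 6.3. Second, it treats every unrevealed failure as fail-to-danger, which is the same simplification used in the sample problem later in this section. `split_by_coverage` is a hypothetical helper.

```python
def split_by_coverage(failure_rate: float, coverage: float) -> tuple[float, float]:
    """Split a module failure rate into (detected, undetected) parts.

    coverage is the assumed fraction of faults revealed by the watchdog timer
    and internal self-diagnostics (0.90 or 0.98 in Table 6.3).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    detected = failure_rate * coverage
    return detected, failure_rate - detected


lam_m = 100e-6  # logic solver, failures/hr (typical value, Table 6.3)
for cov in (0.90, 0.98):
    detected, undetected = split_by_coverage(lam_m, cov)
    print(f"coverage {cov:.0%}: undetected {undetected * 1e6:.1f} x 1e-6/hr")
```

Raising coverage from 90% to 98% cuts the unrevealed part of the logic-solver rate by a factor of five.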

Proof testing of the PES safety system requires careful planning. Two provisions are essential: written test procedures, and bypass switches that permit proof testing of complete signal paths without disrupting plant operation. The test procedures can be designed as part of the safety system logic, with appropriate alarms when safety functions are removed from service. This design minimizes the potential for human mistakes during proof testing and maintenance. An unavailability analysis of a system must include both the frequency of proof testing and the probability that the tester will leave the system in an inoperative mode (Section 6.2).

FIGURE 6.5. PES block diagram model. [Block diagram: sensors, input channels, programmable electronics (logic solvers M1, M2, and M3, with diagnostics, watchdog timer, and power supply), output channels, and actuator devices, joined by communications links.]

CALCULATE SYSTEM FDT

Quantification of PES Module Failure Rates. Failure rates of the various modules ($\lambda_m$) described in Figure 6.5 may be calculated using standard reliability engineering techniques and generic component failure rate data. The calculation proceeds in three steps:

1. Count the number of like components in each module.
2. Select the typical failure rate of each component type from a data bank and multiply it by the number of components.
3. Sum the failure rates of all constituent parts to obtain the module failure rate.

Military Standardization Handbook MIL-HDBK-217E (1986) provides component failure rate data at reference conditions, as a function of component type and quality. Manufacturers often have historical data, obtained from the module repair records of installed equipment, that can be used to validate calculated $\lambda_m$ values. Jansen (1980) applied this technique to a DCS basic controller. The results for the power supply unit and the total multicard controller are shown in Table 6.2.
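The parts-count procedure just described can be sketched in a few lines of Python. The data are the basic controller power supply unit entries from Table 6.2, in failures per 10^6 hr. `module_failure_rate` is an illustrative name, not part of any published tool.

```python
# Parts-count estimate of a module failure rate: sum over component types of
# (number used) x (failure rate per component). Rates in failures per 10^6 hr.
POWER_SUPPLY_PARTS = [
    # (component type, number used, failure rate per component)
    ("Power transformer", 1, 0.5),
    ("Tank capacitor, paper/oil", 3, 0.5),
    ("Power rectifier", 3, 0.4),
    ("Discharge resistor", 1, 0.1),
    ("Reservoir capacitor, aluminum foil", 7, 0.5),
    ("Relay", 1, 0.7),
    ("Neon indicator", 1, 0.2),
    ("LED indicator", 1, 2.0),
    ("Wired connections", 40, 0.005),
    ("Power junction assembly", 1, 0.03),
]


def module_failure_rate(parts):
    """Return the module failure rate as the sum of count x per-component rate."""
    return sum(count * rate for _name, count, rate in parts)


lam_m = module_failure_rate(POWER_SUPPLY_PARTS)
print(f"Power supply unit: {lam_m:.2f} failures/10^6 hr")
```

The sum reproduces the 9.93 f/10^6 hr subassembly total given in Table 6.2.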

Failure rates of PES modules depend on three factors:

• the quality of the electrical components selected by the manufacturer
• the amount of burn-in time applied to the modules before installation in the plant
• the environmental conditions of the installed PES equipment (temperature and humidity control, cleanliness of the atmosphere, level of vibration, limited electrical loads, etc., are all important)

A quantitative assessment of PES reliability therefore requires assumptions about the quality of the environment that will be maintained for the life of the control equipment. The calculated or vendor-supplied component failure rates ($\lambda_m$) must be modified when environmental conditions vary from reference values. Jansen (1980) suggests the following adjustments to calculated failure rates:

Environmental condition   Failure rate multiplier   Typical location
Mild                      0.5                       Control room, laboratory
Average                   1.0                       Industrial plant, ground level
Demanding                 2.0                       Chemical process, shipboard
Mobile                    4.0                       Containers on road, rail
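Applying a multiplier from this table is a single multiplication. The sketch below assumes that the reference conditions behind a calculated rate correspond to the "Average" row (multiplier 1.0). The 209.4 f/10^6 hr input is the basic controller total from Table 6.2, and `adjusted_rate` is a hypothetical helper.

```python
# Environmental multipliers suggested by Jansen (1980), as tabulated above.
ENV_MULTIPLIER = {
    "mild": 0.5,       # control room, laboratory
    "average": 1.0,    # industrial plant, ground level (assumed reference)
    "demanding": 2.0,  # chemical process, shipboard
    "mobile": 4.0,     # containers on road, rail
}


def adjusted_rate(reference_rate: float, environment: str) -> float:
    """Scale a reference-condition failure rate by the environmental multiplier."""
    try:
        return reference_rate * ENV_MULTIPLIER[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None


# Basic controller unit from Table 6.2: 209.4 failures/10^6 hr.
for env in ENV_MULTIPLIER:
    print(f"{env:>9}: {adjusted_rate(209.4, env):6.1f} failures/10^6 hr")
```

A controller installed in a chemical process environment would thus be assessed at roughly twice its reference failure rate.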

Ranges of $\lambda_m$ for instruments and PES components are widely reported in the literature (Lees, 1976; HSE, 1987; Balls et al., 1987; Hokstad and Bodsberg, 1987; etc.). Estimates of characteristic failure modes, and analytical methods to include in the reliability analysis, are discussed in the HSE (pp. 43-45), Balls, and Hokstad references. Selected reliability parameters are listed in Table 6.3. Along with instrument and PES module failure rates, this table includes typical PES reliability factors that permit calculation of:

• the beneficial effect of the internal self-diagnostics and WDTs often found in PLCs, and
• the harmful effect of common mode failures.

Signal Path Failure Rates. The modular design of a PE safety system allows calculation of a signal path failure rate ($\lambda_p$) from reliability data describing each of the constituent modules. Failure rates of signal paths can be calculated from the following equation:

$$\lambda_p = \lambda_{m1} + \lambda_{m2} + \cdots + \lambda_{mx} \quad \text{[failures/hr]} \qquad (6.3.2)$$

The path failure rate is obtained by summing the failure rates of only those modules that have a direct influence on activating a shutdown. Each PES module has more than one failure mode. A meaningful analytical evaluation therefore requires expanding each $\lambda_m$ into several additive parts that are relevant to the analysis.
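Eq. (6.3.2) can be applied separately to each failure mode, as sketched below. The module rates and fail-to-danger fractions are the Table 6.3 figures used in the sample problem later in this section. Two elements are assumptions made for illustration: the two-mode split (every failure is either fail-to-danger or fail-safe) and the `Module` and `path_rates` names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One PES module on the signal path (hypothetical structure)."""
    name: str
    rate: float         # total failure rate, failures/hr
    frac_danger: float  # fraction of failures that are fail-to-danger


def path_rates(modules):
    """Eq. (6.3.2) applied per failure mode.

    Returns (fail-to-danger, fail-safe) path failure rates in failures/hr.
    Every failure not counted as fail-to-danger is treated as fail-safe.
    """
    danger = sum(m.rate * m.frac_danger for m in modules)
    safe = sum(m.rate * (1.0 - m.frac_danger) for m in modules)
    return danger, safe


# Sample problem signal path, using Table 6.3 figures.
PATH = [
    Module("pressure sensor", 20e-6, 0.33),
    Module("input module", 1e-6, 0.50),
    Module("programmable electronics", 100e-6, 0.02),
    Module("output module", 1e-6, 0.50),
    Module("trip actuator", 10e-6, 0.33),
]

lam_pd, lam_ps = path_rates(PATH)
print(f"fail-to-danger: {lam_pd * 1e6:.1f} x 1e-6/hr")
print(f"fail-safe:      {lam_ps * 1e6:.1f} x 1e-6/hr")
```

The two sums, about 12.9 and 119 × 10^-6/hr, are the fail-to-danger and fail-safe path rates used in the sample problem.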

Calculation of FDT and System Reliability. FDT calculations (Section 6.2) can be carried out to determine whether an acceptable FDT can be obtained initially for a single-path PES using appropriate proof testing intervals. The PES signal path between an abnormal process event and the corrective actuator shown in Figure 6.5 contains several modules. As a result, the fail-to-danger path failure rate ($\lambda_p$) can be significant. There are two concerns in this analysis:

• The first relates to the FDT itself. The failure rate of a single-path PES may be too high for a reasonable proof testing interval.
• The second relates to the reliability of the PES. Because many modules are required, there are many opportunities for spurious failures of the protective system. The resulting unplanned shutdowns and start-ups are undesirable from a process efficiency standpoint, and they may also create additional hazards during restarts.

Unreliability of the PES is normally expressed in failures per unit time. Summing the fail-safe failure rates of the individual components estimates the frequency of spurious safety system trips. A spurious trip is one caused by a false detection of a dangerous condition due to component failure, rather than by an actual dangerous condition.

TABLE 6.2. Failure Rate Calculation for a DCS Basic Controller Power Supply Unit and Basic Controller Unit

                                                  Failure rate, f/10^6 hr
Type component                       Number used  Per comp.    Total

Basic controller power supply unit
  Power transformer                        1        0.5        0.5
  Tank capacitor, paper/oil                3        0.5        1.5
  Power rectifier                          3        0.4        1.2
  Discharge resistor                       1        0.1        0.1
  Reservoir capacitor, aluminum foil       7        0.5        3.5
  Relay                                    1        0.7        0.7
  Neon indicator                           1        0.2        0.2
  LED indicator                            1        2.0        2.0
  Wired connections                       40        0.005      0.2
  Power junction assembly                  1        0.03       0.03
  Total subassembly failure rate: 9.93 f/10^6 hr

Basic controller unit
  Mux, ADC, memory, CPU and regulator      1      128.9      128.9
  Output card                              2       21.5       43.0
  Terminal panel                           1       14.5       14.5
  Back plane and wiring                    1       10.9       10.9
  Power supply                             1        9.9        9.9
  Cables (set)                             1        2.0        2.0
  Other items                              1        0.2        0.2
  Basic controller failure rate: 209.4 f/10^6 hr

TABLE 6.3. Guide Figures for PES Reliability Analysis

Event or factor                                       Units        Min    Typ    Max
Sensor failure
  Temperature, flow                                   f/10^6 hr             10
  Analyzer                                            f/10^6 hr            200
Transmitter failure
  Pressure, level, flow                               f/10^6 hr             20
  Temperature                                         f/10^6 hr             30
  Analyzer                                            f/10^6 hr            100
Signal trip units
  Electronic and pressure switch on instrument air    f/10^6 hr             10
Valves
  Solenoid and process block                          f/10^6 hr             10
Sensor, transmitter, and valve
  Failure mode fail-to-danger                         %                     33
Single analog or digital I/O channel failure          f/10^6 hr     0.3    1.0    3.0
  Failure mode, high value                            %                     50
  Failure mode, low value                             %                     50
Programmable electronics (logic solver)
  Fails to danger                                     f/10^6 hr      30    100    300
  PE failures detected by external watchdog timer     %                     90
  PE failures detected by external WDT and
    extensive internal self-diagnostics               %                     98
Common cause failure, β-factor
  Identical signal path modules                       %               3     10     30
  Diverse signal path modules                         %             0.3      1      3
Probability of incorrect test/repair
  Leaving system in FTD mode                          %               1      5     10

To achieve low FDT and high PES reliability, redundant-path and fault-tolerant PESs have been developed.

• Redundant PESs. Redundant PE safety systems have two or more parallel signal paths, in which the signal from each path is used to determine when a shutdown action is required. The number and characteristics of the signal paths, plus the logic implemented, determine the reliability of the total safety system. The signal paths remain independent until they converge in the logic unit.

• Fault-Tolerant PESs. These PESs have modules connected in multiple signal paths, with internal management logic that ensures operation of each functional step from input to output signal. When a PES component or module fails, the fault-tolerant system is able to identify the failure, communicate the need for maintenance, and maintain reliable operation during repair. Two-out-of-three voting logic is typical in these applications. If three or more sensors and process manipulators are included with a fault-tolerant PES, the total safety system may become fault tolerant.

The petrochemical industry is tending toward the use of redundant and fault-tolerant PE safety systems. Care must be exercised in the design of these systems, however, to avoid introducing sources of failure that are common to parallel signal paths (Section 3.3.1). Potential common cause failures in these systems can come from:

• power supply faults
• signal and chassis ground faults
• logic software bugs not found in commissioning
• common system design faults.

Any analysis of a PES that takes advantage of redundant signal paths should include consideration of common cause failure elements.
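One common way to quantify this effect is the β-factor model listed in Table 6.3. In this model, a fraction β of each path's fail-to-danger rate is assumed to disable all parallel paths at once. The sketch below uses the two-path expression ½βλ_pd·T that appears in Part 2 of the sample problem later in this section. The path rate and test interval are the sample problem values, and `common_cause_fdt` is a hypothetical name.

```python
def common_cause_fdt(beta: float, lam_pd: float, test_interval: float) -> float:
    """Common cause FDT contribution for two parallel signal paths.

    Beta-factor model: a fraction beta of the fail-to-danger rate lam_pd
    (failures/hr) disables both paths at once. These failures stay hidden
    until the next proof test, so averaged over the interval T (hr) they
    contribute 1/2 * beta * lam_pd * T.
    """
    return 0.5 * beta * lam_pd * test_interval


lam_pd = 12.9e-6  # failures/hr per signal path (sample problem value)
T = 2160.0        # hr between proof tests (sample problem value)
for label, beta in (("identical modules", 0.10), ("diverse modules", 0.01)):
    print(f"{label:<17}: FDT_cc = {common_cause_fdt(beta, lam_pd, T):.5f}")
```

With the typical β values from Table 6.3 (10% for identical modules versus 1% for diverse ones), moving to diverse signal path modules cuts this contribution by a factor of ten.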

LOGIC DIAGRAM
Figure 6.4 illustrates the quantitative design process for the development of a PES to meet system unavailability and unreliability targets.

THEORETICAL FOUNDATIONS
The mathematical methods employed in the analysis of PESs are those widely developed and used in the field of reliability engineering. A recent publication by the HSE (1987) is a very useful reference that gives a more complete account of the application of reliability engineering principles to PESs.

INPUT REQUIREMENTS AND AVAILABILITY
Generally, the following information will be required:

• protective system philosophy
• PES configuration
• maintenance programs and policies

• failure rate data for the PE components (both spurious failure and fail-to-danger)

• unavailability and reliability targets (if available).

OUTPUT
The output of this analysis is a calculated value for the PES FDT and unreliability.

SIMPLIFIED APPROACHES
The calculations in this section are simplified approaches. Consideration should be given to all possible contributors to FDT, including common cause failures when redundant systems are required.


6.3.3. Sample Problem

A reactor has one of its raw materials supplied by a positive displacement feed pump. If the reactor is overfilled, overpressurization results, and reactor material is released through the relief system. A high-pressure switch trips the pump if the pressure gets too high. The basic system consists of three parts:

• a pressure sensor
• PES modules, consisting of an input module, programmable electronics, and an output module
• a signal trip unit

The block diagram in Figure 6.6 shows the system signal path:

Part 1. For this configuration, using failure rate data from Table 6.3, determine

a. fractional dead time, assuming that
• proof testing occurs four times per year
• downtime due to proof test is insignificant
• downtime due to any repair is insignificant
• the probability of the system being left in a failed state, either during a proof test or during repair, is 0.01
b. frequency of spurious failures for this system.

Part 2. Repeat the above calculations for a fully redundant system, using the assumptions in Part 1 plus a common cause β-factor of 0.1.

[Block diagram: pressure sensor ($\lambda_s$), input module ($\lambda_i$), programmable electronics ($\lambda_m$), output module ($\lambda_o$), and trip actuator ($\lambda_v$), connected in series.]

FIGURE 6.6. Reactor overpressure protective system block diagram.


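Solution to Part 1:

a. The fail-to-danger failure rate of the total signal path ($\lambda_{pd}$) can be expressed as

$$\lambda_{pd} = \lambda_s + \lambda_i + \lambda_m + \lambda_o + \lambda_v$$

where each term is the fail-to-danger portion of the following module failure rates:

$\lambda_s$ = failure rate of sensor
$\lambda_i$ = failure rate of input module
$\lambda_m$ = failure rate of programmable electronics
$\lambda_o$ = failure rate of output module
$\lambda_v$ = failure rate of trip actuator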

From Table 6.3 for fail-to-danger failure modes

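$$
\begin{aligned}
\lambda_s &= (20 \times 10^{-6}/\text{hr})(0.33) \\
\lambda_i &= (1 \times 10^{-6}/\text{hr})(0.50) \\
\lambda_m &= (100 \times 10^{-6}/\text{hr})(0.02) \\
\lambda_o &= (1 \times 10^{-6}/\text{hr})(0.50) \\
\lambda_v &= (10 \times 10^{-6}/\text{hr})(0.33)
\end{aligned}
$$

Therefore, $\lambda_{pd} = 12.9 \times 10^{-6}$/hr. The fail-safe failure rate of the signal path can be expressed as

$$\lambda_{ps} = \lambda_s + \lambda_i + \lambda_m + \lambda_o + \lambda_v$$

where each term is now the fail-safe portion of the corresponding module failure rate.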

From Table 6.3 for fail-safe failure rates

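$$
\begin{aligned}
\lambda_s &= (20 \times 10^{-6}/\text{hr})(0.67) \\
\lambda_i &= (1 \times 10^{-6}/\text{hr})(0.50) \\
\lambda_m &= (100 \times 10^{-6}/\text{hr})(0.98) \\
\lambda_o &= (1 \times 10^{-6}/\text{hr})(0.50) \\
\lambda_v &= (10 \times 10^{-6}/\text{hr})(0.67)
\end{aligned}
$$

Therefore, $\lambda_{ps} = 119 \times 10^{-6}$/hr. Eq. (6.2.7) is applied with the following terms and values:

• $\text{FDT}_c$ is the contribution of unrevealed failures between proof tests.
• $\text{FDT}_{et}$ and $\text{FDT}_{er}$ are the contributions of the system being left in a failed state after a proof test or a repair, respectively.
• The proof test interval is $T = 2160$ hr.
• $P_{et} = P_{er} = 0.01$.

$$\text{FDT}_t = \text{FDT}_c + \text{FDT}_{et} + \text{FDT}_{er}$$

$$\text{FDT}_t = \tfrac{1}{2}\lambda_{pd}T + P_{et} + \left[\left(\lambda_{pd} + \tfrac{1}{2}\lambda_{ps}\right)T\right]P_{er}$$

$$\text{FDT}_t = \tfrac{1}{2}(12.9 \times 10^{-6}/\text{hr})(2160\ \text{hr}) + 0.01 + \left[\left(12.9 \times 10^{-6}/\text{hr} + \tfrac{1}{2} \times 119 \times 10^{-6}/\text{hr}\right)(2160\ \text{hr})\right](0.01)$$

$$\text{FDT}_t = 0.014 + 0.01 + 0.002 = 0.026$$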

b. The answer to Part 1b is simply the frequency of spurious failures ($\lambda_{ps}$), which is equal to $119 \times 10^{-6}$/hr, or approximately one trip per year.
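The Part 1 arithmetic can be checked with the sketch below, which codes the three-term FDT expression used in Part 1. `fdt_single` is an illustrative name, the inputs are the sample problem values, and 8760 hr per year is assumed for the trip frequency. Carried without intermediate rounding, $\text{FDT}_t$ comes to about 0.0255; adding the rounded terms gives the 0.026 quoted above.

```python
HOURS_PER_YEAR = 8760.0  # assumed 365-day year


def fdt_single(lam_pd, lam_ps, test_interval, p_test_error, p_repair_error):
    """Single-path FDT in the three-term form used in Part 1.

    FDT_c  = 1/2 * lam_pd * T                    hidden failures between proof tests
    FDT_et = P_et                                system left failed after a proof test
    FDT_er = (lam_pd + 1/2 * lam_ps) * T * P_er  system left failed after a repair
    Rates are in failures/hr and T is in hr.
    """
    fdt_c = 0.5 * lam_pd * test_interval
    fdt_et = p_test_error
    fdt_er = (lam_pd + 0.5 * lam_ps) * test_interval * p_repair_error
    return fdt_c + fdt_et + fdt_er


lam_pd, lam_ps = 12.9e-6, 119e-6  # failures/hr, from Part 1
fdt = fdt_single(lam_pd, lam_ps, test_interval=2160.0,
                 p_test_error=0.01, p_repair_error=0.01)
print(f"FDT_t = {fdt:.4f}")
print(f"spurious trips = {lam_ps * HOURS_PER_YEAR:.2f} per year")
```

The test-error term alone (0.01) contributes about 40% of the total, which is why test procedures matter as much as hardware failure rates.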

Solution to Part 2:

a. From Part 1,

$\lambda_{pd}$ for signal path 1 = $12.9 \times 10^{-6}$/hr

$\lambda_{pd}$ for signal path 2 = $12.9 \times 10^{-6}$/hr

From Eq. (6.2.7)

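$$\text{FDT}_t = \text{FDT}_c + \text{FDT}_{et} + \text{FDT}_{er} + \text{FDT}_{cc}$$

$$\text{FDT}_c = \tfrac{1}{3}\lambda_{pd}^2 T^2$$

For $\text{FDT}_{et}$, the probability of human error during proof testing is probably not independent between the two signal paths. For this example, the fractional dead time due to proof testing is therefore calculated using a common cause factor $\beta$ of 0.1:

$$\text{FDT}_{et} = P_{et}\,\beta$$

$$\text{FDT}_{er} = \left[\tfrac{1}{3}\lambda_{ps}^2 T^2 + \tfrac{1}{3}\lambda_{pd}^2 T^2\right]P_{er}$$

$$\text{FDT}_{cc} = \tfrac{1}{2}\beta\lambda_{pd}T$$

Substituting the approximate numerical values in each of the above expressions produces

$$
\begin{aligned}
\text{FDT}_t ={}& \tfrac{1}{3}(12.9 \times 10^{-6}/\text{hr})^2(2160\ \text{hr})^2 + (0.01)(0.1) \\
&+ \left[\tfrac{1}{3}(119 \times 10^{-6}/\text{hr})^2(2160\ \text{hr})^2 + \tfrac{1}{3}(12.9 \times 10^{-6}/\text{hr})^2(2160\ \text{hr})^2\right](0.01) \\
&+ \left[\tfrac{1}{2}(12.9 \times 10^{-6}/\text{hr})(2160\ \text{hr})\right](0.1)
\end{aligned}
$$

$$\text{FDT}_t = 0.00026 + 0.001 + 0.00023 + 0.00139 = 0.0029$$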

b. The frequency of spurious failures is simply

$$2\lambda_{ps} = 238 \times 10^{-6}/\text{hr}$$

or approximately two trips per year.
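The Part 2 calculation is sketched below. It codes the four FDT terms used for the redundant system, with per-path rates from Part 1. `fdt_redundant` is an illustrative name, and 8760 hr per year is assumed for the trip frequency. Unrounded, the repair-error term comes to about 0.00022 rather than the 0.00023 quoted above, and the total is about 0.0029, matching the result above.

```python
HOURS_PER_YEAR = 8760.0  # assumed 365-day year


def fdt_redundant(lam_pd, lam_ps, test_interval, p_test_error, p_repair_error, beta):
    """Two-path (redundant) FDT in the four-term form used in Part 2.

    Rates are per signal path in failures/hr; T is the proof test interval in hr.
    """
    x_d = lam_pd * test_interval  # fail-to-danger rate x T, per path
    x_s = lam_ps * test_interval  # fail-safe rate x T, per path
    fdt_c = x_d ** 2 / 3.0                                       # both paths failed
    fdt_et = p_test_error * beta                                 # common test error
    fdt_er = (x_s ** 2 / 3.0 + x_d ** 2 / 3.0) * p_repair_error  # repair error
    fdt_cc = 0.5 * beta * x_d                                    # common cause failures
    return fdt_c + fdt_et + fdt_er + fdt_cc


lam_pd, lam_ps = 12.9e-6, 119e-6  # failures/hr per path, from Part 1
fdt = fdt_redundant(lam_pd, lam_ps, 2160.0, 0.01, 0.01, beta=0.1)
print(f"FDT_t = {fdt:.4f}")
print(f"spurious trips = {2 * lam_ps * HOURS_PER_YEAR:.2f} per year")
```

Comparing the individual terms shows that the common cause and test-error contributions (about 0.0014 and 0.001) dominate, while the independent double-failure term is an order of magnitude smaller.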

Results from the sample problem are:

                      FDT_t     Frequency of spurious failures
Single system         0.026     1 per year
Redundant system      0.0029    2 per year

The redundant system improved the FDT by an order of magnitude. Common cause failures and human error in proof testing are now the major contributors to the remaining FDT. The redundant system, however, has doubled the number of spurious failures, which may not be acceptable. If an improvement in spurious failure rates is desired, a fault-tolerant PES with full two-out-of-three logic should be considered. Its $\text{FDT}_t$ will increase slightly compared with the redundant system.

6.3.4. Discussion

STRENGTHS AND WEAKNESSES
The quantitative assessment methodology outlined in this section is an important advance over previous "black box" and simple unavailability calculations based on composite equipment failure rate estimates. The methodology combines three elements to estimate the reliability of a specific installation:

• PES component unavailability
• common mode effects in redundant control system structures
• the probability that maintenance of the safety system will contribute to an ultimate fail-to-danger event

The main weaknesses of this approach stem from:

• the lack of adequate instrument and PES module failure rate data, plus the subjective nature of techniques for analyzing common cause and maintenance effects on system FDTs

• the complexity of the analysis (which is difficult to simplify much while still retaining physical significance)

• the depth of PES knowledge needed by the assessor.

UTILITY
The quantitative analysis of PESs is not a well-developed tool, primarily because the technology of PESs (DDCSs and PLCs) is relatively young. The technology of system components continues to change and mature. Typically, a PE safety system installed 5 years ago will now include replacement modules containing components that did not exist then.

Manufacturers change PES hardware and software as products are enhanced and when design faults are identified in established products. PES maintenance consists of removing modules in which failures have been diagnosed and returning these units to the manufacturer for replacement. The functional reliability of an installed DCS therefore depends on the owner's attention to two things: keeping the installed equipment in step with the supplier's latest product revision, and testing the safety logic whenever modified modules are installed. This managed upgrade of installed digital systems is essential to maintaining the integrity and reliability of PESs. The quantified reliability of the system is thus a dynamic parameter, greatly influenced by site management.

Reliability parameters determined from a quantified analysis of a PES permit

• meaningful comparisons to be made among the large number of possible PES configurations that can be envisioned for any real plant application, and

• identification of the truly significant factors (number of independent signal paths, proof test interval, quality of maintenance, and common cause limitations) in achieving a high-integrity safety system.

6.4. Other Techniques

There are many other techniques that complement CPQRA. Fault and event tree analyses are the key techniques, but brief reviews of some other techniques that have been found useful in CPQRA are provided here, along with references for further details. The other techniques include:

• MORT (Section 6.4.1)
• IFAL (Section 6.4.2)
• Hazard Warning Structure (Section 6.4.3)
• Markov Processes (Section 6.4.4)
• Monte Carlo Techniques (Section 6.4.5)
• GO Methods (Section 6.4.6)
• Reliability Block Diagrams (Section 6.4.7)
• Cause-Consequence Analysis (Section 6.4.8)
• Multiple Failure/Error Analysis (Section 6.4.9)
