OTC 18504

High-Integrity Protection Systems (HIPS): Methods and Tools for Efficient Safety Integrity Levels Analysis and Calculations
Jean-Pierre Signoret, Total

Copyright 2007, Offshore Technology Conference

This paper was prepared for presentation at the 2007 Offshore Technology Conference held in Houston, Texas, U.S.A., 30 April–3 May 2007.

This paper was selected for presentation by an OTC Program Committee following review of information contained in an abstract submitted by the author(s). Contents of the paper, as presented, have not been reviewed by the Offshore Technology Conference and are subject to correction by the author(s). The material, as presented, does not necessarily reflect any position of the Offshore Technology Conference, its officers, or members. Papers presented at OTC are subject to publication review by Sponsor Society Committees of the Offshore Technology Conference. Electronic reproduction, distribution, or storage of any part of this paper for commercial purposes without the written consent of the Offshore Technology Conference is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of where and by whom the paper was presented. Write Librarian, OTC, P.O. Box 833836, Richardson, TX 75083-3836, U.S.A., fax 01-972-952-9435.

Abstract
This paper shows how to deal properly with "Safety Integrity Levels" (SIL) as per IEC 61508 [1] and IEC 61511 [2] for "High Integrity Protection Systems" (HIPS), which are more and more extensively used in the oil industry to replace traditional protection systems. While IEC 61508/61511 are rather efficient from an organizational point of view, some difficulties unfortunately exist at the definition and calculation levels. The formulae proposed in part 6 of IEC 61508 are, for example, not really tractable for actual industrial systems. This paper describes the probabilistic methods and tools that we have developed in our company to overcome these difficulties. Three main conventional methods are investigated: "Fault Trees", which, when properly handled, are very efficient for low demand topside HIPS; the Markovian approach, which is interesting but tractable only for very small systems; and Monte Carlo simulation on behavioural models (Petri nets or the AltaRica Data Flow formal language), which is efficient in all cases. Results are given on simple examples in order to show the principles of the various approaches. It is interesting to notice that using these approaches is simpler than what is proposed in the standards. Therefore, until the publication of an updated version improving IEC 61508 part 6, it seems better to replace it by sound conventional methods and tools adapted to SIL calculations for production systems. We have begun to disseminate these approaches to our contractors.

Introduction
In the oil industry, the traditional protection systems defined in API 14C are more and more often replaced by safety instrumented systems: the so-called HIPS (High Integrity Protection Systems). Therefore, according to the IEC 61508 and IEC 61511 standards, their SILs (Safety Integrity Levels) shall be calculated.

Unfortunately, some difficulties arise when using the above standards [3, 4]. They often remain ignored by those who perform SIL studies, and the main ones are the following:
1. insufficient failure taxonomy and definitions,
2. handling of test and maintenance procedures,
3. introduction of the Safe Failure Fraction (SFF), which is not a relevant concept,
4. Probability of Failure on Demand (PFD) and Probability of Failure per Hour (PFH) calculations.

After briefly presenting the first three problems, the fourth one will be detailed in more depth to show what we have done to cope with the various SIL assessment problems encountered in the oil industry:
1. topside HIPS, easily tested and maintained,
2. subsea HIPS, difficult to test and maintain,
3. preventive HIPS.

According to the standards, topside and subsea HIPS are so-called "low demand mode" safety instrumented systems (SIS), while preventive HIPS are so-called "continuous mode" SIS. This paper mainly focuses on methods and tools devoted to low demand mode HIPS.

Failure taxonomy
In the IEC 61508 and 61511 standards, failures are split into dangerous or safe, and detected or undetected. This is a little different from the classical failure taxonomy:
- safe versus unsafe,
- revealed versus hidden,
- time dependent versus on demand.

While the dangerous failure definition is very similar to the classical unsafe failure (i.e. a failure which tends to inhibit the safety function), this is not the case for the safe failure. In the standards, a safe failure is simply a failure which is not dangerous, whereas in the traditional approach it is a failure which tends to anticipate the safety action (e.g. a spurious trip).

The classification "detected versus undetected" of the standards is similar to "revealed versus hidden". The problem is that users reading the standards too quickly have assumed that revealed failures can be straightforwardly assimilated with safe failures. Of course, this is generally not true.

Of the third class of failures, only the time dependent failures are recognized by the standards. True "on demand failures" are completely ignored and, even worse, are hidden behind the term Probability of Failure on Demand (PFD), which encompasses only time dependent failures occurring during the test interval. This is a big problem, as on demand failures, which are likely to arise each time a demand (including a test) changes the state of some items (e.g. rupture of the spring of a relay, blockage of a valve, ...), cannot be detected by any test.

Low demand versus continuous demand modes
The standards identify two modes of functioning: SIS working in low demand mode of operation, and SIS working in high demand or continuous mode. Calculation of the so-called Probability of Failure on Demand (PFD) is required for the former, while calculation of the so-called Probability of Failure per Hour (PFH) is required for the latter.

When the demand frequency is low compared to the test frequency, a failure occurring during the test interval is likely to be detected and repaired before a demand occurs. As the SIS behaves almost independently of the demands from the Equipment Under Control (EUC), the accident frequency is approximately equal to the SIS "average unavailability" multiplied by the demand frequency. Then, the PFD of the standards is simply the traditional unavailability of the classical approach.

When the demand frequency increases until it becomes of the same order as, or higher than, the test frequency, there is almost no chance of detecting and repairing a failure before a demand occurs. If the demand becomes continuous, this chance even drops to 0. Then, an accident occurs as soon as the SIS fails, and the probability of an accident over [0, T] is equal to the unreliability F(T) of the SIS. Contrary to the above, this cannot be directly assimilated to the PFH of the standards, and it is more difficult to find a sound equivalent in the classical approach. The simplest way may be to consider PFH = F(T)/T.

Then, apart from the use of a new name for a classical parameter, there is no problem for low demand mode, as a classical approach may be used. This is more difficult for high demand or continuous mode, where the standards introduce the new PFH concept, which has no clear mathematical definition. Therefore, it is a good idea to come back to the sound probabilistic concepts of unavailability and reliability when assessing the SIL of safety instrumented systems, and this is what we do in our company when dealing with our HIPS.
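The two approximations above can be illustrated with a few lines of Python. This is a minimal sketch, not a method taken from the standards: the function names are hypothetical, the numbers are purely illustrative, and the continuous mode case assumes, for simplicity only, a single non-repaired SIS with a constant failure rate, so that F(T) = 1 - exp(-λT).

```python
import math


def accident_frequency_low_demand(pfd_avg, demand_frequency):
    """Low demand mode: the SIS state is (almost) independent of the demands,
    so the accident frequency is approximately PFDavg x demand frequency."""
    return pfd_avg * demand_frequency


def pfh_continuous(failure_rate, horizon_h):
    """Continuous mode, using PFH ~ F(T)/T.

    Illustrative assumption: a single non-repaired SIS with a constant
    failure rate, so that F(T) = 1 - exp(-rate * T)."""
    unreliability = 1.0 - math.exp(-failure_rate * horizon_h)
    return unreliability / horizon_h


if __name__ == "__main__":
    # Hypothetical numbers: PFDavg of 5e-3 and one demand every ten years.
    print(f"{accident_frequency_low_demand(5e-3, 0.1):.1e} accidents per year")
    # Hypothetical numbers: failure rate of 1e-7 per hour over ten years.
    print(f"PFH ~ {pfh_continuous(1e-7, 87_600.0):.3e} per hour")
```

With these hypothetical inputs the sketch gives about 5·10^-4 accidents per year in low demand mode, and a PFH of about 9.96·10^-8 per hour, slightly below the failure rate because F(T) grows less than linearly with T.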

Models and tools
Generalities
Probabilistic calculations are described in part 6 of IEC 61508, which gives a list of simplified formulae for some particular cases and describes some examples of combining several components to model systems.

Unfortunately, neither the method used to establish the formulae nor the underlying hypotheses under which they are valid are provided. This would not be a problem, as part 6 is only informative, its content is not intended to cope with all the problems encountered, and there is no obligation to use it. The problem arises because, instead of considering this part as simple information, a lot of users use it as if it were normative. They trust that they just have to apply it to obtain relevant results and, even worse, some providers have developed software based on it. Then everybody, without the slightest idea of what a probability may be, allows himself to perform SIL calculations... This is very dangerous indeed!

What is presented in part 6 does not reflect the state of the art in probabilistic calculations for industrial systems. It is not really a method of analysis, and the underlying mathematical background and hypotheses are not clearly stated. Using the formulae without understanding these hypotheses is likely to produce non-conservative results, and this is not acceptable from a safety point of view.

Three years ago, we noticed that the SIL studies from our contractors were very poor and diagnosed that the "common cause failure" was part 6 of IEC 61508. This is why we decided to develop a sound methodology from the methods and tools in use in house since the early eighties:
- the fault tree approach, because it is a method widely used by most of our reliability contractors,
- the Markovian approach, because it is sometimes known by our contractors,
- behavioural modelling (Petri nets or the AltaRica DF language) and Monte Carlo simulation, because it solves all the difficulties encountered.

    Single component analysis

For a single component with a dangerous undetected failure rate λ and a test interval τ, IEC 61508 part 6 gives the traditional, widely used formula:

$$\mathrm{PFD}_{avg} \approx \frac{\lambda\tau}{2} \qquad (1)$$

This very simple formula is valid only when the underlying hypothesis is met: the EUC is stopped during both tests and maintenance. Unfortunately, this is almost never true for actual industrial systems, for which the use of formula (1) leads to non-conservative results... In fact, a lot of other parameters have to be taken into consideration to properly model components as actually used in industry. For example:

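- μ: repair rate,
- γ: on demand failure probability,
- θ: test staggering,
- π: test duration.

With the above parameters, PFDavg becomes:

$$\mathrm{PFD}_{avg} \approx \frac{\lambda\tau}{2} + \frac{\lambda}{\mu} + \frac{\gamma}{\mu\tau} + \frac{\pi}{\tau} \qquad (2)$$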

This is more complex than formula (1)! Test staggering has no effect on the average, but other parameters may be considered, such as test coverage or human errors. A thorough analysis of the component is needed to identify which parameters to handle in the actual study.
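Formulae (1) and (2) can be compared term by term with the sketch below. The function names are hypothetical and the input values (λ = 5·10^-5 h^-1, τ = 4380 h, μ = 0.05 h^-1, γ = 0.01, π = 10 h) are illustrative assumptions; in particular, pairing μ with 0.05 and γ with 0.01 is our choice. Test staggering θ is left out since, as stated above, it has no effect on the average.

```python
def pfd_avg_formula1(lam, tau):
    """Formula (1): dangerous undetected failures only (EUC stopped during
    tests and repairs)."""
    return lam * tau / 2.0


def pfd_avg_formula2(lam, tau, mu, gamma, pi):
    """Formula (2), written term by term."""
    hidden = lam * tau / 2.0        # failure hidden until the next test
    repair = lam / mu               # failure found at a test, then under repair
    on_demand = gamma / (mu * tau)  # failure caused by the test itself, then repaired
    offline = pi / tau              # component unavailable while tested off line
    return hidden + repair + on_demand + offline


if __name__ == "__main__":
    # Illustrative values only (hypothetical pairing of mu and gamma).
    params = dict(lam=5e-5, tau=4380.0, mu=0.05, gamma=0.01, pi=10.0)
    f1 = pfd_avg_formula1(params["lam"], params["tau"])
    f2 = pfd_avg_formula2(**params)
    print(f"formula (1): {f1:.4f}   formula (2): {f2:.4f}")
```

For these values formula (1) gives 0.1095 and formula (2) about 0.1128; the off-line test term π/τ (about 2.3·10^-3) is the largest of the added contributions.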

Figure 1: PFD(t) of a single component
[Figure: PFD(t) versus time from 0 to 8000, with the SIL0 and SIL1 zones and PFDavg = 8.12·10^-2 marked; parameter values shown: 5·10^-5 h^-1, 0.01, 0.05, 4380 h, 2190 h, 10 h.]

As shown on figure 1, PFDavg is not a good representation of the component behaviour, because its unavailability PFD(t) is a time dependent saw-tooth curve which may spread over several SIL zones. On figure 1, 29% of the time is spent in "SIL0" whereas PFDavg gives SIL1. While an average is a very good aggregated parameter for a cloud of points, it may give misleading indications for continuous curves. On the figure above, 3.5 months are spent in "SIL0" before each test.
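The gap between PFDavg and the time actually spent in "SIL0" can be reproduced with a simplified saw-tooth model. This is a minimal sketch under stated assumptions: only the λ term is modelled (no γ, μ or π effects), the first test is assumed to occur at θ = 2190 h and then every τ = 4380 h, and the curve is observed over one year. These parameter assignments are our reading of figure 1, not values confirmed by the text, and the function names are hypothetical.

```python
import math

SIL1_UPPER = 1e-1  # low demand mode: SIL1 covers 1e-2 <= PFDavg < 1e-1


def pfd_sawtooth(t, lam, tau, theta):
    """Instantaneous unavailability PFD(t) of a periodically tested component.

    Only dangerous undetected failures (rate lam, per hour) are modelled. The
    first test happens at theta, then every tau hours; tests and repairs are
    assumed instantaneous and perfect (as good as new after each test)."""
    age = t if t < theta else (t - theta) % tau  # hours since the last test
    return 1.0 - math.exp(-lam * age)


def average_and_share_above(lam, tau, theta, horizon, threshold=SIL1_UPPER, dt=1.0):
    """Midpoint-rule average of PFD(t) over [0, horizon], and the fraction of
    that period during which PFD(t) >= threshold (i.e. worse than SIL1)."""
    n = int(horizon / dt)
    values = [pfd_sawtooth((i + 0.5) * dt, lam, tau, theta) for i in range(n)]
    mean = sum(values) / n
    share = sum(1 for v in values if v >= threshold) / n
    return mean, share


if __name__ == "__main__":
    mean, share = average_and_share_above(lam=5e-5, tau=4380.0, theta=2190.0, horizon=8760.0)
    print(f"PFDavg = {mean:.2e}, share of time in 'SIL0' = {share:.0%}")
```

With these assumptions the average comes out around 7.7·10^-2, i.e. within the SIL1 band, while roughly 28% of the year is spent with PFD(t) ≥ 10^-1. The numbers do not match figure 1 exactly because γ, μ and π are ignored, but the qualitative point is the same.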

Figure 2 shows in detail what happens when a test is performed. The jump corresponds to the on demand failure (probability γ) due to the test itself. After that, the test is performed and its duration is π. At the end of the test, there are two possibilities: either the component is available, or it is in a revealed failure state (unavailable). The competition between these two situations gives the decreasing part of the curve. It reaches its minimum around the MTTR (i.e. 1/μ) and then increases again, as shown on figure 1.

    Figure 2 : Detail of the test zone

On figure 2 the component remains available for its safety function during the tests, but if it were tested off line it would be unavailable over the whole test duration, and a contribution π/τ would be added to PFDavg, as shown in formula (2). This may be the main contributor to PFDavg, and it is obviously forgotten when using formula (1). Of course, methods and tools are needed to draw the previous curves, and this is what we are going to explain now.

    Fault Tree (FT) approach

Most of our HIPS are HIPPS (High Integrity Pressure Protection Systems) operating in low demand mode, and PFDavg has to be calculated according to the standards. The fault tree approach, designed precisely for unavailability calculations, seems to be the right tool to do that. Nevertheless, this shall be done cautiously, because it works only if the leaves (i.e. the failure events) are independent. A strong warning shall be given here: the PFDavg of individual leaves cannot be combined through a FT to calculate the PFDavg of the top event. Formulae like (1) or (2) shall not be used directly in fault tree calculations, even if this is a common practice implemented in some FT software packages, which has misled their users into making bad calculations. This is very dangerous, as the results become more and more non-conservative as the fault tolerance increases (and higher SILs are targeted).

Fortunately, PFDavg can be very easily assessed just by averaging the instantaneous unavailability PFD(t) of the top event over the relevant period [0, T]. As shown on figure 3, this may be done just by using the instantaneous unavailabilities PFDi(t) of each leaf [5].
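The warning above can be checked numerically on the simplest redundant case: an AND gate of n identical, independent leaves tested at the same time (i.e. a 1oon architecture). The sketch below, with hypothetical function names and illustrative values λ = 2·10^-6 h^-1 and τ = 8760 h, compares the average of the top-event PFD(t) with the shortcut of combining the leaves' PFDavg.

```python
import math


def leaf_pfd(t, lam, tau):
    """Instantaneous PFD_i(t) of a leaf tested and restored at 0, tau, 2*tau, ..."""
    return 1.0 - math.exp(-lam * (t % tau))


def leaf_pfd_avg(lam, tau):
    """Exact average of leaf_pfd over one test interval."""
    x = lam * tau
    return 1.0 - (1.0 - math.exp(-x)) / x


def and_gate_pfd_avg(n, lam, tau, steps=100_000):
    """Right way: average over [0, tau] of the top-event PFD(t), where the top
    event is an AND gate of n independent leaves tested at the same time."""
    dt = tau / steps
    total = sum(leaf_pfd((i + 0.5) * dt, lam, tau) ** n for i in range(steps))
    return total / steps


def and_gate_from_leaf_averages(n, lam, tau):
    """Shortcut warned against in the text: combining the leaves' PFDavg."""
    return leaf_pfd_avg(lam, tau) ** n


if __name__ == "__main__":
    lam, tau = 2e-6, 8760.0  # illustrative values
    for n in (1, 2, 3):
        right = and_gate_pfd_avg(n, lam, tau)
        shortcut = and_gate_from_leaf_averages(n, lam, tau)
        print(f"{n} leaves: right {right:.3e}, shortcut {shortcut:.3e}, "
              f"ratio {right / shortcut:.2f}")
```

The ratio is 1 for a single leaf, about 1.33 for two leaves and about 2 for three: for small λτ the right average tends to (λτ)^n/(n+1) while the shortcut gives (λτ/2)^n, so the shortcut becomes more and more non-conservative as the fault tolerance increases.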

Various sources of dependencies have to be considered:
- limited number of repair teams. This is generally negligible for safety systems, which are reliable and have priority for repair (i.e. the probability of having to repair 2 safety failures at the same time is low),
- repair at the second failure. This is a strong dependency which cannot be managed by FT,
- reliability calculation. This induces strong dependencies between all the leaves and, except in particular cases, FT are not able to perform genuine reliability calculations.

Therefore, in the oil industry, the fault tree approach is mainly efficient for low demand topside HIPS. It must be used cautiously for preventive topside HIPS and shall be discarded for subsea HIPS.

    Figure 3: Example of Fault tree

Figure 3 illustrates a very simple system made of 3 identical components working in 2oo3. Only the dangerous undetected failure rate (λ) and the test interval (τ) have been modelled, as this is enough to draw two important conclusions:
- the difference between PFDavg and the maximum PFD is big (about 2.5 times),
- the equivalent failure rate of the system is obviously not constant between tests.
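The maximum-to-average gap for the 2oo3 system can be estimated with the sketch below. It assumes (our reading of figure 3, not stated in the text) that each channel fails independently at rate λ = 10^-4 per hour and that a common cause failure of rate βλ, with β = 10%, disables all three channels at once; all failures are hidden until the common test every τ = 1000 h, which restores everything. Function names are hypothetical.

```python
import math


def pfd_top_2oo3(t, lam, tau, beta):
    """Instantaneous PFD(t) of a 2oo3 system whose three identical channels
    are tested together every tau hours.

    Assumption (ours): independent channel failures at rate lam, plus a
    common cause failure at rate beta*lam disabling all three channels; all
    failures stay hidden until the next test, which restores everything."""
    age = t % tau
    p = 1.0 - math.exp(-lam * age)               # one given channel failed
    two_or_more = 3 * p**2 - 2 * p**3            # at least 2 of 3 channels failed
    q = 1.0 - math.exp(-beta * lam * age)        # common cause failure occurred
    return 1.0 - (1.0 - q) * (1.0 - two_or_more)  # top event: CCF or >= 2 channels


def max_and_mean(lam, tau, beta, steps=10_000):
    """Maximum and midpoint-rule average of PFD(t) over one test interval."""
    dt = tau / steps
    values = [pfd_top_2oo3((i + 0.5) * dt, lam, tau, beta) for i in range(steps)]
    return max(values), sum(values) / steps


if __name__ == "__main__":
    mx, mean = max_and_mean(lam=1e-4, tau=1000.0, beta=0.10)
    print(f"max {mx:.2e}, mean {mean:.2e}, ratio {mx / mean:.1f}")
```

Under these assumptions the sketch gives a maximum of about 3.5·10^-2 and an average of about 1.4·10^-2, a ratio of roughly 2.5.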

On figure 3, the three components are tested at the same time, but it is interesting to see what happens when the tests are staggered.

    Figure 4: Test staggering effect

As shown on this figure, staggering the tests makes both PFDavg and PFDmax decrease. This is due to two different effects:
- the maximum decreases because the tests are more homogeneously distributed over time,
- the average decreases because the test frequency of the common cause failures (CCF) has been multiplied by three.

Therefore, staggering the tests is an efficient way to improve PFDavg (i.e. the SIL), to reduce the spread of the saw-tooth curve and to diminish the impact of common cause failures. This very important characteristic is completely missed by IEC 61508 part 6. All that we have presented above has been introduced in a special SIL menu of ARALIA Workshop, which is the software used in our office.
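The effect of staggering can be explored by shifting the test dates of the three channels by τ/3 and 2τ/3. The sketch below assumes, as a simplification of ours, independent channel failures at λ = 10^-4 per hour, a common cause failure at βλ with β = 10%, and τ = 1000 h; the common cause failure is assumed to be revealed and repaired by any channel test, which is how its test frequency is multiplied by three. Function names are hypothetical.

```python
import math


def top_pfd(t, lam, tau, beta, offsets):
    """Instantaneous PFD(t) of a 2oo3 system in periodic steady state.

    Channel i is tested (and restored) at offsets[i] + k*tau. Assumptions
    (ours): independent channel failures at rate lam, plus a common cause
    failure at rate beta*lam which is revealed by any channel test."""
    ages = [(t - off) % tau for off in offsets]     # time since each channel's last test
    p1, p2, p3 = (1.0 - math.exp(-lam * a) for a in ages)
    two_or_more = p1 * p2 + p1 * p3 + p2 * p3 - 2.0 * p1 * p2 * p3
    q = 1.0 - math.exp(-beta * lam * min(ages))     # CCF since the most recent test
    return 1.0 - (1.0 - q) * (1.0 - two_or_more)


def max_and_mean(lam, tau, beta, offsets, steps=30_000):
    """Maximum and average of PFD(t) over one full test cycle [0, tau]."""
    dt = tau / steps
    values = [top_pfd((i + 0.5) * dt, lam, tau, beta, offsets) for i in range(steps)]
    return max(values), sum(values) / steps


if __name__ == "__main__":
    lam, tau, beta = 1e-4, 1000.0, 0.10
    cases = {
        "simultaneous": (0.0, 0.0, 0.0),
        "staggered": (0.0, tau / 3, 2 * tau / 3),
    }
    for name, offsets in cases.items():
        mx, mean = max_and_mean(lam, tau, beta, offsets)
        print(f"{name:>12}: max {mx:.2e}, mean {mean:.2e}")
```

Both the maximum and the mean of PFD(t) are lower in the staggered case; with these assumptions the maximum drops from about 3.5·10^-2 to about 1.4·10^-2.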

[Figure 3 panels: components 1, 2, 3 and CCF (β = 10%) combined in a 2oo3 gate; λ = 1·10^-4, τ = 1000; top event PFD(t): Max 3.5·10^-2, Mean 1.4·10^-2.]
[Figure 4 panels: the same system with staggered tests; top event PFD(t): Max 1.4·10^-2, Mean 7.3·10^-3.]
[Figure 2 panel: PFD(t) versus time, from 6000 to 7200.]

token (small black circle) in the various places (represented by circles). It is currently running:
- from this state the sensor may fail by itself (λ) or through a common cause failure (message ?DCC received from another sub PN),
- when failed, it enters a waiting-for-detection state,
- the failure is detected only when a rig reaches the location above the subsea platform and performs a test (a token arrives in the place Rig),
- when the failure is detected, it has to wait to be repaired until a rig is available to do so (message ?StR),
- then the repair is started and, when it is finished, the sensor becomes available again.

    Figure 8: PN of a subsea sensor

Figure 9 shows an example of sub Petri nets as they are actually input into the Petri net module of our GRIF software package, which implements generalized stochastic Petri nets enhanced by the use of predicates and assertions. This is a very powerful tool that we use both for our RAM (Reliability, Availability and Maintainability) and our SIL calculations.

    Figure 9: PN with predicates and assertions

When using such sub Petri nets, it is rather easy to build, step by step, the Petri net modelling the behaviour of a whole safety instrumented system like the one on figure 6. Of course, the results obtained in this way give curves which are less smooth than those obtained by analytical means (FT, Markov), because only a few, well-chosen points can be calculated. Anyway, figure 10 is similar to figure 7. On this curve the 90% confidence bounds of the simulation have been represented, and we can see that the Monte Carlo simulation is rather accurate. It has to be noted that this curve has been drawn only to assess the maximum PFD. The PFDavg, which is straightforwardly calculated just by estimating the time spent in the failed state, gives the same results as the fault tree and Markovian approaches.
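The sensor behaviour described above can be mimicked by a plain discrete-event Monte Carlo simulation which estimates PFDavg as the time spent in the failed state, together with a confidence interval. This is a minimal sketch and not the GRIF Petri net module: common cause failures (message ?DCC) are not modelled, rig visits are assumed to take place at fixed calendar dates, the wait for a rig (message ?StR) is assumed to be a fixed mobilisation delay, and all names and numbers are hypothetical. The 90% interval uses a normal approximation (1.645 standard errors).

```python
import math
import random
import statistics


def simulate_history(lam, rig_interval, mobilisation, repair, horizon, rng):
    """One history of a subsea sensor; returns the fraction of [0, horizon]
    spent in a failed state. Hypothetical, simplified model (no CCF):
    - the sensor fails at constant rate lam (per hour),
    - the failure stays hidden until the next rig visit (every rig_interval h),
    - the repair then waits `mobilisation` hours for a rig and lasts `repair` hours,
    - after repair the sensor is as good as new."""
    t, down = 0.0, 0.0
    while t < horizon:
        failure = t + rng.expovariate(lam)
        if failure >= horizon:
            break
        detection = math.ceil(failure / rig_interval) * rig_interval
        restored = detection + mobilisation + repair
        down += min(restored, horizon) - failure
        t = restored
    return down / horizon


def estimate_pfd_avg(n_histories, seed=1, **model):
    """Monte Carlo estimate of PFDavg with a normal-approximation 90% interval."""
    rng = random.Random(seed)
    samples = [simulate_history(rng=rng, **model) for _ in range(n_histories)]
    mean = statistics.fmean(samples)
    half_width = 1.645 * statistics.stdev(samples) / math.sqrt(n_histories)
    return mean, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    mean, (low, high) = estimate_pfd_avg(
        5_000, lam=2e-6, rig_interval=8760.0, mobilisation=720.0,
        repair=48.0, horizon=20 * 8760.0,
    )
    print(f"PFDavg ~ {mean:.2e} (90% interval {low:.2e} to {high:.2e})")
```

With zero mobilisation and repair times, the estimate converges towards the analytical value 1 - (1 - exp(-λτ))/(λτ) of a single periodically tested component (τ being the rig interval), which provides a simple check of the simulation.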

    Figure 10: Results from Monte Carlo Simulation

The above approach is very powerful for SIL calculations but, unfortunately, some analysts are reluctant to handle PN (especially when they think that using the simplistic calculations of IEC 61508 part 6 is sufficient!). This is why, five years ago, we developed a tool allowing Petri nets to be hidden behind a reliability block diagram (RBD), thanks to the use of libraries of pre-established sub models [9]. Then we developed a library of periodically tested components to use this tool for SIL calculations. The principle is very simple:
- building a model like the RBD on figure 6,
- attributing the relevant sub PN model to each module by picking it from the library,
- launching the calculations to obtain the results.

The overall Petri net is automatically generated and calculated, and it is not even necessary to have heard about PN to use this tool! Of course, it is always possible to look at the generated PN if we want to. Used on the simple HIPS example, this leads to exactly the same results as presented on figure 10.

Conclusion
The problems encountered when using the IEC 61508 and IEC 61511 standards may easily be overcome, provided that relevant methods and tools are used. Fortunately, these methods and tools exist, and some of them began to be developed a long time ago, in the early eighties and even in the seventies. Our company has adapted several reliability software packages, ARALIA Workshop, GRIF and COMBAVA, to perform SIL calculations according to what is presented in this paper. This constitutes a powerful set of tools able to manage any SIL calculation on sound bases, and we have begun disseminating them to our contractors.

It is rather interesting to notice that, most of the time, it is easier to perform rigorous calculations by using the right methods and tools than to try to apply ad hoc formulae like those presented in the standards. Therefore, until the publication of updated versions of the standards improving IEC 61508 part 6, it seems better to forget it and replace it by more accurate and efficient methods like those presented in this paper, which would serve the purpose of SIL calculations for oil production systems.

References
1. IEC 61508: "Functional safety of electrical/electronic/programmable electronic safety-related systems. Parts 1-7" (1998, 2000).
2. IEC 61511: "Functional safety - Safety instrumented systems for the process industry sector. Parts 1-3" (2003).
3. Signoret, J-P.: "Managing risks in HIPS by making SIL calculations effective". Proceedings of the seminar IQPC 2006, Aberdeen, Great Britain (2006).
4. Dutuit, Y., Innal, F., Rauzy, A., Signoret, J-P.: "An attempt to understand better and apply some recommendations of IEC 61508 standard". Proceedings of the international seminar ESReDA, Trondheim, Norway (2006).
5. Rauzy, A., Dutuit, Y., Signoret, J-P.: "Assessment of safety integrity levels with fault trees". Proceedings of the international conference ESREL, Estoril, Portugal (2006).
6. Signoret, J-P.: "Modeling the behavior of complex industrial systems with stochastic Petri nets". Proceedings of the international conference ESREL 1998, Trondheim, Norway (1998).
7. Dutuit, Y., Signoret, J-P.: "Tutorial on dynamic system modelling by using stochastic Petri nets and Monte Carlo simulation". Presented at the international conferences Konbin03, Gdansk, Poland, and ESREL 2003, Maastricht, the Netherlands (2003).
8. Arnold, A., Griffault, A., Rauzy, A., Point, G.: "The AltaRica language and its semantics". Fundamenta Informaticae, 34 (2000) 109.
9. Signoret, J-P., Chabot, J-L., Hutinet, T.: "Hiding a stochastic Petri net behind a reliability block diagram". Proceedings of the international conference ESREL, Lyon, France (2002).