Otc 18504

8/10/2019 Otc 18504

1/6

Copyright 2007, Offshore Technology Conference

This paper was prepared for presentation at the 2007 Offshore Technology Conference held in Houston, Texas, U.S.A., 30 April-3 May 2007.

This paper was selected for presentation by an OTC Program Committee following review of information contained in an abstract submitted by the author(s). Contents of the paper, as presented, have not been reviewed by the Offshore Technology Conference and are subject to correction by the author(s). The material, as presented, does not necessarily reflect any position of the Offshore Technology Conference, its officers, or members. Papers presented at OTC are subject to publication review by Sponsor Society Committees of the Offshore Technology Conference. Electronic reproduction, distribution, or storage of any part of this paper for commercial purposes without the written consent of the Offshore Technology Conference is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of where and by whom the paper was presented. Write Librarian, OTC, P.O. Box 833836, Richardson, TX 75083-3836, U.S.A., fax 01-972-952-9435.

Abstract

This paper shows how to deal properly with "Safety Integrity Levels" (SIL) as per IEC 61508 [1] and IEC 61511 [2] for "High Integrity Protection Systems" (HIPS), which are more and more extensively used in the oil industry to replace traditional protection systems. Although IEC 61508 and IEC 61511 are rather efficient from an organizational point of view, some difficulties unfortunately exist at the definition and calculation levels. The formulae proposed in part 6 of IEC 61508 are, for example, not really tractable for actual industrial systems. This paper describes the probabilistic methods and tools that we have developed in our company to overcome these difficulties. Three main conventional methods are investigated: "Fault Trees", which, when properly handled, are very efficient for low demand topside HIPS; the Markovian approach, which is interesting but tractable only for very small systems; and Monte Carlo simulation on behavioural models (Petri nets or the AltaRica Data Flow formal language), which is efficient in all cases. Results are given on simple examples in order to show the principles of the various approaches. It is interesting to notice that using those approaches is simpler than what is proposed in the standards. Therefore, until the publication of an updated version improving IEC 61508 part 6, it seems better to replace it with sound conventional methods and tools adapted to SIL calculations for production systems. We have begun to disseminate these approaches to our contractors.

Introduction

In the oil industry, the traditional protection systems defined in API 14C are more and more often replaced by safety instrumented systems: the so-called HIPS (High Integrity Protection Systems). Therefore, according to the IEC 61508 and IEC 61511 standards, their SILs (Safety Integrity Levels) shall be calculated.

Unfortunately, some difficulties arise when using these standards [3, 4]. They often remain ignored by those who perform SIL studies, and the main ones are the following:

1. insufficient failure taxonomy and definitions,
2. handling of test and maintenance procedures,
3. introduction of the Safe Failure Fraction (SFF), which is not a relevant concept,
4. Probability of Failure on Demand (PFD) and Probability of Failure per Hour (PFH) calculations.

After briefly presenting the first three problems, the fourth one will be detailed in more depth to show what we have done to cope with the various SIL assessment problems encountered in the oil industry:

1. topside HIPS, easily tested and maintained,
2. subsea HIPS, difficult to test and maintain,
3. preventive HIPS.

According to the standards, topside and subsea HIPS are so-called "low demand mode" safety instrumented systems (SIS), while preventive HIPS are so-called "continuous mode" SIS. This paper is mainly focused on methods and tools devoted to low demand mode HIPS.

Failure taxonomy

In the IEC 61508 and IEC 61511 standards, failures are split into dangerous or safe and detected or undetected. This is a little different from the classical failure taxonomy:

• safe versus unsafe,
• revealed versus hidden,
• time dependent versus on demand.

If the dangerous failure definition is very similar to the classical unsafe failure (i.e. a failure which tends to inhibit the safety function), this is not the case for the safe failure. In the standards, it is simply a failure which is not dangerous, whereas in the traditional approach it is a failure which tends to anticipate the safety action.

The "detected versus undetected" classification of the standards is similar to "revealed versus hidden". The problem is that users reading the standards too quickly think that they can straightforwardly equate revealed failures with safe failures. Of course, this is generally not true.

Among the third class of failures, only the time dependent failures are recognized by the standards. The true "on demand failures" are completely ignored and, even worse, are hidden behind the term Probability of Failure on Demand (PFD), which encompasses only time dependent failures occurring during the test interval. This is a big problem, as those failures, which are likely to arise each time a demand (including tests) produces a change in the states of some items (e.g. rupture of the spring of a relay, blockage of a valve, ...), cannot be detected by any test.

OTC 18504

High-Integrity Protection Systems (HIPS): Methods and Tools for Efficient Safety Integrity Levels Analysis and Calculations
Jean-Pierre Signoret, Total

Low demand versus continuous demand modes

The standards identify two modes of functioning: SIS working in low demand mode of operation and SIS working in high demand or continuous modes. The calculation of the so-called Probability of Failure on Demand (PFD) is required for the former, while the calculation of the so-called Probability of Failure per Hour (PFH) is required for the latter.

When the demand frequency is low compared to the test frequency, a failure occurring during the test interval is likely to be detected and repaired before the occurrence of a demand. As the SIS behaves almost independently of the demand from the Equipment Under Control (EUC), the probability of accident per unit of time is equal to the SIS "average unavailability" multiplied by the demand frequency. The PFD of the standards is then simply the traditional unavailability of the classical approach.

When the demand frequency increases until it becomes of the same order as, or higher than, the test frequency, there is almost no chance to detect and repair a failure before a demand occurs. If the demand becomes continuous, this chance even goes to 0. Then, an accident occurs as soon as the SIS fails, and the probability of accident is equal to the unreliability F(T) of the SIS over [0, T]. Contrary to the above, this cannot be directly equated with the PFH as per the standards, and it is more difficult to find a sound equivalent in the classical approach. The simplest way may be to consider PFH = F(T)/T.

Then, apart from the use of a new name for a classical parameter, there is no problem for the low demand mode, as a classical approach may be used. This is more difficult for the high demand or continuous mode, where the standards introduce the new PFH concept, which has no clear mathematical definition. Therefore, it is a good idea to come back to the sound probabilistic concepts of unavailability and reliability when assessing the SIL of safety instrumented systems, and this is what we do in our company when dealing with our HIPS.
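The two regimes discussed above can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the function names and the numerical values are hypothetical, and a constant failure rate without repair is assumed in order to evaluate F(T).

```python
import math


def accident_rate_low_demand(pfd_avg: float, demand_rate: float) -> float:
    """Low demand mode: accidents per hour = average unavailability x demand frequency."""
    return pfd_avg * demand_rate


def unreliability(failure_rate: float, horizon: float) -> float:
    """F(T) over [0, T], assuming a constant failure rate and no repair."""
    return 1.0 - math.exp(-failure_rate * horizon)


def pfh_equivalent(failure_rate: float, horizon: float) -> float:
    """Continuous mode: the equivalent suggested in the text, PFH = F(T)/T."""
    return unreliability(failure_rate, horizon) / horizon


# Hypothetical values, chosen only for illustration.
low_demand = accident_rate_low_demand(pfd_avg=1.0e-2, demand_rate=1.0e-4)
continuous = pfh_equivalent(failure_rate=1.0e-6, horizon=8760.0)

print(f"low demand accident rate : {low_demand:.3e} per hour")
print(f"continuous mode F(T)/T   : {continuous:.3e} per hour")
```

With a constant failure rate, F(T)/T stays slightly below the failure rate itself, which is why the two quantities should not be confused when T is long.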

Models and tools

Generalities

Probabilistic calculations are described in part 6 of IEC 61508, which gives a list of simplified formulae for some particular cases and describes some examples of combining several components to model systems. Unfortunately, neither the method used to establish the formulae nor the underlying hypotheses under which they are valid are provided. This would not be a problem in itself, as part 6 is only informative: its content is not intended to cope with all the problems encountered and there is no obligation to use it. The problem arises because, instead of considering this part as simple information, a lot of users use it as if it were normative. They trust that they just have to apply it to obtain relevant results and, even worse, some providers have developed software based on it. Then everybody, without the slightest idea of what a probability may be, feels entitled to perform SIL calculations ... This is very dangerous indeed!

What is presented in part 6 does not reflect the state of the art in probabilistic calculations for industrial systems. It is not really a method of analysis, and the underlying mathematical background and hypotheses are not clearly stated. Using the formulae without understanding the hypotheses is likely to produce non-conservative results, and this is not acceptable from a safety point of view.

Three years ago, we noticed that the SIL studies from our contractors were very poor and diagnosed that the common cause failure was part 6 of IEC 61508. This is why we decided to develop a sound methodology from the methods and tools in use in house since the early eighties:

• the fault tree approach, because it is a method widely used by most of our reliability contractors,
• the Markovian approach, because it is sometimes known by our contractors,
• behavioural modelling (Petri nets or the AltaRica DF language) and Monte Carlo simulation, because it solves all the difficulties encountered.

Single component analysis

For a single component with a dangerous undetected failure rate λ and a test interval τ, IEC 61508 part 6 gives the traditional, widely used formula:

$$\mathrm{PFD_{avg}} \approx \lambda\tau/2 \qquad (1)$$

This very simple formula is valid only when the underlying hypothesis is met: EUC stopped during both tests and maintenance. Unfortunately, this is almost never true for actual industrial systems, for which the use of formula 1 leads to non-conservative results ... In fact, a lot of other parameters have to be taken into consideration to properly model components as actually used in industry. For example:

• μ: repair rate,
• γ: on demand failure probability,
• θ: test staggering,
• π: test duration.

With the above parameters, PFDavg becomes:

$$\mathrm{PFD_{avg}} \approx \lambda\tau/2 + \lambda/\mu + \gamma/(\mu\tau) + \pi/\tau \qquad (2)$$

This is more complex than formula 1! Test staggering has no effect on the average, but other parameters may be considered, such as test coverage or human errors. A thorough analysis of the component is needed to identify which parameters to handle according to the actual study.
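To make the difference between the two formulae concrete, here is a minimal sketch that evaluates both. The symbols follow the usual convention (λ failure rate, τ test interval, μ repair rate, γ on demand failure probability, π test duration); the function names, the `tested_off_line` switch and the numerical values are assumptions introduced for illustration only.

```python
def pfd_avg_formula_1(lam: float, tau: float) -> float:
    """Formula 1: undetected failures only."""
    return lam * tau / 2.0


def pfd_avg_formula_2(lam: float, tau: float, mu: float, gamma: float,
                      pi: float, tested_off_line: bool = True) -> float:
    """Formula 2: adds repair time, on demand failures and test duration."""
    undetected = lam * tau / 2.0          # failure waiting for the next test
    under_repair = lam / mu               # repair of failures found by the tests
    on_demand = gamma / (mu * tau)        # failures caused by the test itself
    testing = pi / tau if tested_off_line else 0.0  # unavailable while tested
    return undetected + under_repair + on_demand + testing


# Hypothetical component: yearly test, 24 h mean repair time, 4 h test.
lam, tau, mu, gamma, pi = 2.0e-6, 8760.0, 1.0 / 24.0, 1.0e-3, 4.0

f1 = pfd_avg_formula_1(lam, tau)
f2 = pfd_avg_formula_2(lam, tau, mu, gamma, pi)
print(f"formula 1: {f1:.4e}")
print(f"formula 2: {f2:.4e}")
```

With these illustrative values, the test duration term is the largest of the three additional contributions, which is consistent with the remark that it may dominate when the component is tested off line.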

Figure 1: PFD(t) of a single component

[Figure 1: saw-tooth curve of PFD(t) versus time over 0 to 8000 h, with the SIL0 and SIL1 zones and the PFDavg level marked. Legend: λ = 5·10⁻⁵ h⁻¹, μ = 0.01, γ = 0.05, τ = 4380 h, θ = 2190 h, π = 10 h; PFDavg = 8.12·10⁻².]

As shown on figure 1, PFDavg is not a good representation of the component behaviour because its unavailability PFD(t) is a time dependent saw-tooth curve which may spread over several SIL zones. On figure 1, 29% of the time is spent in "SIL0" whereas PFDavg gives SIL1. While an average is a very good aggregated parameter for a cloud of dots, it may give misleading indications for continuous curves. On the figure above, 3.5 months are spent in "SIL0" before each of the tests.
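The share of time spent in each SIL zone can be checked with a few lines of code. The sketch below is a simplification and not the tool used for figure 1: it assumes perfect, instantaneous tests and repairs (γ, μ and π are ignored), interprets the test staggering θ as the date of the first test, and uses the low demand SIL bands of IEC 61508. Because the repair and on demand contributions are left out, its average is expected to come out slightly below the PFDavg quoted on the figure.

```python
import math


def pfd_sawtooth(t: float, lam: float, tau: float, theta: float) -> float:
    """PFD(t) with tests at theta, theta + tau, ... (instantaneous, perfect)."""
    age = t if t < theta else (t - theta) % tau  # time since the last test
    return 1.0 - math.exp(-lam * age)


def sil_zone(pfd: float) -> int:
    """Low demand SIL band for a given PFD value (0 means no SIL reached)."""
    if pfd >= 1e-1:
        return 0
    if pfd >= 1e-2:
        return 1
    if pfd >= 1e-3:
        return 2
    if pfd >= 1e-4:
        return 3
    return 4  # SIL4 band or better


def scan(lam: float, tau: float, theta: float, horizon: float, step: float = 1.0):
    """Return (time average of PFD(t), share of time spent in the SIL0 zone)."""
    n = int(horizon / step)
    values = [pfd_sawtooth((k + 0.5) * step, lam, tau, theta) for k in range(n)]
    mean = sum(values) / n
    share_sil0 = sum(1 for v in values if sil_zone(v) == 0) / n
    return mean, share_sil0


# Values read from the legend of figure 1.
mean, share_sil0 = scan(lam=5e-5, tau=4380.0, theta=2190.0, horizon=8000.0)
print(f"average PFD: {mean:.3e} (SIL{sil_zone(mean)})")
print(f"share of time in SIL0: {share_sil0:.0%}")
```

The average falls in the SIL1 band although a substantial part of the period is spent above 10⁻¹, which is the point made in the text.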

Figure 2 shows in detail what happens when a test is performed. The jump corresponds to the on demand failure γ due to the test itself. After that, the test is performed and its duration is π. At the end of the test, there are two possibilities: either the component is available or it is in a revealed failure state (unavailable). The competition between these two situations gives the decreasing part of the curve. It reaches its minimum after about the MTTR (i.e. 1/μ) and after that increases again, as shown on figure 1.

Figure 2: Detail of the test zone

On figure 2, the component remains available for its safety function during the tests, but if it were tested off line it would be unavailable over the whole test duration and a contribution π/τ would be added to PFDavg, as shown in formula 2. This may be the main contributor to PFDavg, and it is obviously forgotten when using formula 1. Of course, methods and tools are needed to draw the previous curves, and this is what we are going to explain now.

Fault Tree (FT) approach

Most of our HIPS are HIPPS (High Integrity Pressure Protection Systems) operating in low demand mode, and PFDavg has to be calculated according to the standards. The fault tree approach, designed precisely for unavailability calculations, seems to be the right tool to do that. Nevertheless, this shall be done cautiously because it works only if the leaves (i.e. the failure events) are independent. A strong warning shall be given here: the PFDavg of individual leaves cannot be combined through a FT to calculate the PFDavg of the top event. Formulae like 1 or 2 shall not be used directly in fault tree calculations, even if this is a common practice implemented in some FT software packages, which mislead their users into performing bad calculations. This is very dangerous, as the results become more and more non-conservative as fault tolerance increases (and higher SILs are targeted).

Fortunately, PFDavg can be very easily assessed just by averaging the instantaneous unavailability PFD(t) of the top event over the relevant period [0, T]. As shown on figure 3, this may be done just by using the instantaneous unavailabilities PFDi(t) of each leaf [5].
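The warning above can be checked on a small case. The following sketch compares, for three identical and independent components in 2oo3, the average of the top event PFD(t) with the result of combining the leaf averages λτ/2 through the same logic. It is an illustration only: common cause failures are left out, tests are assumed perfect, instantaneous and simultaneous, and the function names and values are hypothetical.

```python
import math


def leaf_pfd(t: float, lam: float, tau: float) -> float:
    """Instantaneous unavailability of a leaf tested every tau."""
    return 1.0 - math.exp(-lam * (t % tau))


def two_out_of_three_failed(q1: float, q2: float, q3: float) -> float:
    """Probability that at least two of three independent leaves are failed."""
    return q1 * q2 + q1 * q3 + q2 * q3 - 2.0 * q1 * q2 * q3


def top_pfd_avg(lam: float, tau: float, horizon: float, steps: int = 50000) -> float:
    """Correct way: average the top event PFD(t) over [0, horizon]."""
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        q = leaf_pfd((k + 0.5) * dt, lam, tau)
        total += two_out_of_three_failed(q, q, q)
    return total / steps


def naive_pfd_avg(lam: float, tau: float) -> float:
    """Wrong way: push the leaf averages lam*tau/2 through the tree."""
    q_avg = lam * tau / 2.0
    return two_out_of_three_failed(q_avg, q_avg, q_avg)


correct = top_pfd_avg(lam=1.0e-4, tau=1000.0, horizon=5000.0)
naive = naive_pfd_avg(lam=1.0e-4, tau=1000.0)
print(f"average of top PFD(t)      : {correct:.3e}")
print(f"combination of leaf averages: {naive:.3e}")
```

The combination of averages underestimates the result because the average of a product of time dependent unavailabilities is larger than the product of their averages when they rise together.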

Various sources of dependencies have to be considered:

• limited number of repair teams. This is generally negligible for safety systems, which are reliable and have priority for repair (i.e. the probability of having to repair two safety failures at the same time is low),
• repair at the second failure. This is a strong dependency which cannot be managed by FT,
• reliability calculation. This induces strong dependencies between all the leaves and, except in particular cases, FT are not able to perform genuine reliability calculations.

Therefore, in the oil industry, the fault tree approach is mainly efficient for low demand topside HIPS. It must be used cautiously for preventive topside HIPS and shall be discarded for subsea HIPS.

Figure 3: Example of fault tree

Figure 3 illustrates a very simple system made of 3 identical components working in 2oo3. Only the dangerous undetected failure rate (λ) and the test interval (τ) have been modelled, as this is enough to draw two important conclusions:

• the difference between PFDavg and the maximum PFD is big (about 2.5 times),
• the equivalent failure rate of the system is obviously not constant between tests.

On figure 3, the three components are tested at the same time, but it is interesting to see what happens when tests are staggered.

Figure 4: Test staggering effect

As shown on this figure, staggering the tests makes both PFDavg and PFDmax decrease. This is due to two different effects:

• the maximum decreases because the tests are more homogeneously distributed over time,
• the average decreases because the test frequency of the common cause failures (CCF) has been multiplied by three.

Therefore, staggering the tests is a very good way to improve PFDavg (i.e. SIL), to reduce the spread of the saw-tooth curve and to diminish the impact of common cause failures. This very important characteristic is completely missed by IEC 61508 part 6. Everything presented above has been introduced in a special SIL menu of ARALIA Workshop, which is the software used in our office.
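The staggering effect can be reproduced with a short numerical sketch. The model below is an assumption and not the ARALIA Workshop implementation: each component keeps the failure rate λ, the common cause failure is a separate leaf with rate βλ that is revealed by any test, and tests are perfect and instantaneous. The parameter values are those read from the figure legends (λ = 1·10⁻⁴, τ = 1000, β = 10%).

```python
import math

LAM, TAU, BETA, HORIZON = 1.0e-4, 1000.0, 0.10, 5000.0


def age(t: float, period: float, offset: float = 0.0) -> float:
    """Time since the last test, for tests at offset, offset + period, ..."""
    return t if t < offset else (t - offset) % period


def top_pfd(t: float, offsets, ccf_period: float) -> float:
    """PFD(t) of the top event: 2oo3 logic OR common cause failure leaf."""
    q1, q2, q3 = (1.0 - math.exp(-LAM * age(t, TAU, o)) for o in offsets)
    koon = q1 * q2 + q1 * q3 + q2 * q3 - 2.0 * q1 * q2 * q3
    q_ccf = 1.0 - math.exp(-BETA * LAM * age(t, ccf_period))
    return 1.0 - (1.0 - koon) * (1.0 - q_ccf)


def mean_and_max(offsets, ccf_period: float, steps: int = 50000):
    """Time average and maximum of the top event PFD(t) over [0, HORIZON]."""
    dt = HORIZON / steps
    values = [top_pfd((k + 0.5) * dt, offsets, ccf_period) for k in range(steps)]
    return sum(values) / steps, max(values)


# Simultaneous tests: the CCF leaf is tested once per interval.
mean_sim, max_sim = mean_and_max((0.0, 0.0, 0.0), TAU)
# Staggered tests: one component tested every TAU/3, so the CCF leaf is too.
mean_stag, max_stag = mean_and_max((0.0, TAU / 3.0, 2.0 * TAU / 3.0), TAU / 3.0)

print(f"simultaneous: mean {mean_sim:.2e}, max {max_sim:.2e}")
print(f"staggered   : mean {mean_stag:.2e}, max {max_stag:.2e}")
```

Under these assumptions the values come out close to the maxima and means quoted on figures 3 and 4, and both the mean and the maximum decrease when the tests are staggered.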

[Figure 3: fault tree with top event TOP, a 2oo3 gate over components 1, 2 and 3, and a common cause failure (CCF) leaf; PFD(t) of each leaf and of the top event plotted versus time over 0 to 5000. Legend: λ = 1·10⁻⁴, τ = 1000, β = 10%. Top event: Max 3.5·10⁻², Mean 1.4·10⁻².]

[Figure 4: same fault tree and legend with staggered tests. Top event: Max 1.4·10⁻², Mean 7.3·10⁻³.]

[Figure 2: PFD(t) versus time, detail over 6000 to 7200 h.]


token (small black circle) in the various places (represented by circles). It is currently running:

• from this state, the sensor may fail by itself (λ) or through a common cause failure (message ?DCC received from another sub PN),
• when failed, it enters a waiting-for-detection state,
• the failure is detected only when a rig reaches the location above the subsea platform and performs a test (a token arrives in the place Rig),
• when the failure is detected, it has to wait to be repaired until a rig is available to do that (message ?StR),
• then the repair is started and, when finished, the sensor becomes available again.

Figure 8: PN of a subsea sensor
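The life cycle described above can be played as a token game. The sketch below is a deliberately simplified, untimed Petri net and not the GRIF model: the messages (?DCC, ?StR) are replaced by a hypothetical shared place `Rig`, the place and transition names are loosely taken from the figure labels, and the `PetriNet` class is introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    inputs: tuple   # places from which one token is consumed
    outputs: tuple  # places to which one token is added


class PetriNet:
    def __init__(self, marking, transitions):
        self.marking = dict(marking)
        self.transitions = {t.name: t for t in transitions}

    def enabled(self, name: str) -> bool:
        """A transition is enabled when every input place holds a token."""
        return all(self.marking.get(p, 0) > 0 for p in self.transitions[name].inputs)

    def fire(self, name: str) -> None:
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        t = self.transitions[name]
        for p in t.inputs:
            self.marking[p] -= 1
        for p in t.outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


def subsea_sensor_net() -> PetriNet:
    transitions = [
        Transition("Failure", ("Running",), ("Waiting",)),
        Transition("RigArrives", ("RigAway",), ("Rig",)),
        # The rig token is consumed and put back: the rig only performs a test.
        Transition("Detection", ("Waiting", "Rig"), ("Detected", "Rig")),
        Transition("StartRepair", ("Detected", "Rig"), ("Repair",)),
        Transition("EndOfRepair", ("Repair",), ("Running", "RigAway")),
    ]
    return PetriNet({"Running": 1, "RigAway": 1}, transitions)


net = subsea_sensor_net()
net.fire("Failure")
detectable_without_rig = net.enabled("Detection")  # no rig on location yet
for name in ("RigArrives", "Detection", "StartRepair", "EndOfRepair"):
    net.fire(name)
print(net.marking)
```

A stochastic version would attach a delay law to each transition (for example an exponential law for the failure and a deterministic one for the rig arrival) and draw the firing dates at random.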

Figure 9 shows an example of sub Petri nets as they are actually input into the Petri net module of our GRIF software package, which implements generalized stochastic Petri nets enhanced by the use of predicates and assertions. This is a very powerful tool that we use for both our RAM (Reliability, Availability and Maintainability) and SIL calculations.

Figure 9: PN with predicates and assertions

When using such sub Petri nets, it is rather easy to build step by step the Petri net modelling the behaviour of a whole safety instrumented system like the one on figure 6. Of course, the results obtained in this way give curves which are less smooth than those obtained by analytical methods (FT, Markov), because only a few well-chosen points can be calculated. Anyway, figure 10 is similar to figure 7. On this curve, the 90% confidence bounds of the simulation have been represented, and we can see that the Monte Carlo simulation is rather accurate. It has to be noted that this curve has been drawn only to assess the maximum PFD. The PFDavg, which is straightforwardly calculated just by estimating the time spent in the failed state, gives the same results as the fault tree and Markovian approaches.

Figure 10: Results from Monte Carlo Simulation
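The principle of estimating PFDavg from the time spent in the failed state can be sketched on a single periodically tested component. This is only an illustration under stated assumptions, not the GRIF engine: failures follow an exponential law, each test reveals and instantly repairs a failure, and the function names, seed and number of histories are hypothetical. The 90% confidence half-width uses the normal approximation (factor 1.645).

```python
import math
import random


def simulate_downtime(lam: float, tau: float, horizon: float,
                      rng: random.Random) -> float:
    """One history: hours spent failed, with failures repaired at each test."""
    down, start = 0.0, 0.0
    while start < horizon:
        end = min(start + tau, horizon)
        t_fail = start + rng.expovariate(lam)  # component as good as new at start
        if t_fail < end:
            down += end - t_fail               # failed until the next test
        start = end
    return down


def monte_carlo_pfd_avg(lam: float, tau: float, horizon: float,
                        histories: int, seed: int = 12345):
    """Return (estimate of PFDavg, half-width of the 90% confidence interval)."""
    rng = random.Random(seed)
    fractions = [simulate_downtime(lam, tau, horizon, rng) / horizon
                 for _ in range(histories)]
    mean = sum(fractions) / histories
    variance = sum((f - mean) ** 2 for f in fractions) / (histories - 1)
    return mean, 1.645 * math.sqrt(variance / histories)


def analytic_pfd_avg(lam: float, tau: float) -> float:
    """Exact time average of 1 - exp(-lam*t) over one test interval."""
    x = lam * tau
    return 1.0 - (1.0 - math.exp(-x)) / x


estimate, half_width = monte_carlo_pfd_avg(1.0e-4, 1000.0, 5000.0, histories=20000)
exact = analytic_pfd_avg(1.0e-4, 1000.0)
print(f"Monte Carlo: {estimate:.4e} +/- {half_width:.1e} (90%)")
print(f"analytical : {exact:.4e}")
```

Fixing the seed makes the run reproducible; increasing the number of histories narrows the confidence interval roughly as the inverse square root of that number.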

The above approach is very powerful for SIL calculations but, unfortunately, some analysts are reluctant to handle PN (especially when they think that using the simplistic calculations of IEC 61508 part 6 is sufficient!). This is why, five years ago, we developed a tool that hides Petri nets behind reliability block diagrams (RBD) thanks to the use of libraries of pre-established sub-models [9]. We then developed a library of periodically tested components in order to use this tool for SIL calculations. The principle is very simple:

• building a model like the RBD on figure 6,
• attributing the relevant sub PN model to each module by picking it from the library,
• launching the calculations to obtain the results.

The overall Petri net is automatically generated and calculated, and it is not even necessary to have heard about PN to use this tool! Of course, it is always possible to look at the generated PN if desired. Used on the simple HIPS example, this leads to exactly the same results as presented on figure 10.

Conclusion

The problems encountered when using the IEC 61508 and IEC 61511 standards may be easily overcome provided that relevant methods and tools are used. Fortunately, these methods and tools exist, and the development of some of them began a long time ago, in the early eighties and even in the seventies. Our company has adapted several reliability software packages (ARALIA Workshop, GRIF and COMBAVA) to perform SIL calculations according to what is presented in this paper. This constitutes a powerful set of tools able to manage any SIL calculation on sound bases, and we have begun their dissemination to our contractors.

It is rather interesting to notice that, most of the time, it is easier to perform rigorous calculations by using the right methods and tools than to try to apply ad hoc formulae like those presented in the standards. Therefore, until the publication of updated versions of the standards improving IEC 61508 part 6, it seems better to set it aside and replace it with more accurate and efficient methods like those presented in this paper, which serve the purpose of SIL calculations for oil production systems.

[Figures 8 and 9, label residue: places Running, Waiting, Failure detected, Repair, Rig on location, Rig, W, D, WaitR; transitions Failure, Detection, Start Rep., End of Rep; messages ?DCC, DCC, ?StR, ?EoR; assertions !nbF=nbF+1, !nbF=nbF-1.]

[Figure 10, legend: PFDavg, PFD(t).]

References

1. IEC 61508: "Functional safety of electrical/electronic/programmable electronic safety-related systems. Parts 1-7" (1998, 2000).

2. IEC 61511: "Functional safety. Safety instrumented systems for the process sector. Parts 1-3" (2003).

3. Signoret, J-P.: "Managing risks in HIPS by making SIL calculations effective". Published in the proceedings of the seminar IQPC 2006, Aberdeen, Great Britain (2006).

4. Dutuit, Y., Innal, F., Rauzy, A., Signoret, J-P.: "An attempt to understand better and apply some recommendations of IEC 61508 standard". Published in the proceedings of the international seminar ESReDA, Trondheim, Norway (2006).

5. Rauzy, A., Dutuit, Y., Signoret, J-P.: "Assessment of safety integrity levels with fault trees". Published in the proceedings of the international conference ESREL, Estoril, Portugal (2006).

6. Signoret, J-P.: "Modeling the behavior of complex industrial systems with stochastic Petri nets". Published in the proceedings of the international conference ESREL 1998, Trondheim, Norway (1998).

7. Dutuit, Y., Signoret, J-P.: "Tutorial on dynamic system modelling by using stochastic Petri nets and Monte Carlo simulation". Presented at the international conferences Konbin03, Gdansk, Poland and ESREL 2003, Maastricht, the Netherlands (2003).

8. Arnold, A., Griffault, A., Rauzy, A., Point, G.: "The AltaRica language and its semantics". Fundamenta Informaticae, 34 (2000) 109.

9. Signoret, J-P., Chabot, J-L., Hutinet, T.: "Hiding a stochastic Petri net behind a reliability block diagram". Published in the proceedings of the international conference ESREL, Lyon, France (2002).
