
Dependent Failures

An Example

Component unreliability: Uc = 0.2

How many components should be added in parallel to achieve a system unreliability Us lower than 10^-10?

Us = (Uc)^n

n = 14 → Us ≈ 1.6∙10^-10
n = 15 → Us ≈ 0.3∙10^-10

An Example: dependent failures

Component unreliability: Uc = 0.2

How many components should be added in parallel to achieve a system unreliability Us lower than 10^-10?

Assuming independence: Us = (Uc)^n

n = 14 → Us ≈ 1.6∙10^-10
n = 15 → Us ≈ 0.3∙10^-10

Now suppose all the redundant components are fed by a single power supply with unreliability UPS = 10^-5. A power-supply failure fails the whole system, so Us ≥ UPS = 10^-5 no matter how many components are added.

Ignoring dependent failures → gross underestimation of risk!
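A minimal numerical sketch in Python, assuming (as the slide's diagram suggests) that the shared power supply acts in series with the redundant set of components: the system unreliability saturates at UPS regardless of n.

```python
# A minimal sketch: the shared power supply is assumed to be in series with
# the redundant set of components, so redundancy cannot push the system
# unreliability below U_PS.

U_C = 0.2      # unreliability of a single redundant component
U_PS = 1e-5    # unreliability of the shared power supply (dependent failure)

for n in (14, 15, 20, 50):
    u_independent = U_C ** n                      # independence assumed
    u_with_ps = 1 - (1 - U_PS) * (1 - U_C ** n)   # power supply OR all components fail
    print(f"n={n:2d}  independent={u_independent:.2e}  with shared PS={u_with_ps:.2e}")
```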

Motivation of dependent-failure analysis

All modern technological systems are highly redundant, yet they still fail because of dependent failures: dependent failures can defeat redundant protective barriers and thus contribute significantly to risk.

Quantifying this contribution is therefore necessary to avoid a gross underestimation of risk.

The modeling of this kind of failure is still a critical issue in PSA (Probabilistic Safety Assessment).

Definition of dependent failures

[Diagram: two components A and B fed by a common power supply]

The failure events A and B are said to be dependent if:

P(A ∩ B) ≠ P(A) ∙ P(B)

Definition of dependent failures

The failure events A and B are said to be dependent if:

P(A ∩ B) ≠ P(A) ∙ P(B)

P(A|B) > P(A) and P(B|A) > P(B)  →  positive dependence
P(A|B) < P(A) and P(B|A) < P(B)  →  negative dependence

General Classification

i. Common Cause Failures (CCF): multiple failures that result directly from a common or shared root cause, e.g.:
   - extreme environmental conditions
   - failure of a piece of hardware external to the system
   - human error (operational or maintenance)
   Example: the 1975 fire at the Browns Ferry Nuclear Power Plant.

ii. Cascading Failures: several components share a common load → the failure of one component may increase the load on the remaining ones → increased likelihood of failure.
   Example: the 2003 Northeast Blackout.

Dependent-failure Analysis

• Traditional techniques (e.g., FMECA)

• Dedicated analysis

Protection from dependent failures

• Barriers (physical impediments that tend to confine and/or restrict a potentially damaging condition, e.g., a dam against a tsunami)

• Personnel training (ensure that procedures are followed in all operating conditions)

• Quality control (ensure that the product conforms to the design and that its operation and maintenance follow the approved procedures and norms)

• Monitoring, testing and inspection (including dedicated tests performed on redundant components following observed failures)

• Diversity (equipment diversity in terms of manufacturer, functional diversity in terms of the physical principle of operation)

Types of probabilistic dependence

• 1: Common cause initiating events (external events, e.g. fires, floods, earthquakes, loss of off-site power, aircraft crashes, gas clouds)

• 2: Intersystem dependences (the conditional probability of failure of a given system along an accident sequence depends on the success or failure of the systems that precede it in the sequence)
  ▪ 2.A: Functional: system 2 functions only if system 1 fails
  ▪ 2.B: Shared-equipment dependences: components in different systems fed by the same electrical bus
  ▪ 2.C: Physical interactions: the failure of one system to provide cooling results in an excessive temperature, which causes the failure of a set of sensors
  ▪ 2.D: Human-interaction dependences: the operator turns off a system after failing to correctly diagnose the conditions of the plant

• 3: Intercomponent dependences
  ▪ same cases as the intersystem dependences, but on the scale of the component

Methods for dependent-failure analysis

• Explicit methods

Involve the identification and treatment of specific root causes of dependent failures at the system level, in the event and fault-tree logic.

• Implicit methods

Multiple failure events, for which no clear root cause event can be identified and treated explicitly, can be modeled using implicit, parametric models (typically at the component level).


Explicit methods

1. Common Cause initiating events

• External events (earthquakes, fires, floods, …) are treated explicitly as initiating events in the risk analysis.


2. Intersystem dependences

• Two safety systems S1 and S2 are expected to intervene upon the occurrence of an initiating event (IE)

[Figure: generic event tree for the initiating event IE with headings S1 and S2]

2.A: Functional dependences


System 2 is Not Needed (NN) unless system 1 fails

2.B Shared equipment

Method of the ‘event trees with boundary conditions’

• 1. Develop the event tree: set the failures of components A and F as independent ET headings

Method of the ‘event trees with boundary conditions’

• 2. To evaluate the probabilities, develop the conditional fault trees

  ▪ Example 1: compute P(System 1 fails | A fails, F operates)

    Boundary conditions: P(A) = 1, P(F) = 0

    Conditional minimal cut sets of system 1: {C}, {B}, {DE}

Method of the ‘event trees with boundary conditions’

• 2. To evaluate the probabilities, develop the conditional fault trees

  ▪ Example 2: compute P(System 1 fails | A operates, F operates)

    Boundary conditions: P(A) = 0, P(F) = 0

    Conditional minimal cut sets of system 1: {C}, {DE}

Method of the ‘fault tree link’

The fault trees of systems S1 and S2 are linked together, thus developing a single large fault tree for each accident sequence.

• Sequence γ = “S1 fails and S2 operates”:

  γ = S1 ∩ S̄2 = (A∙B ∪ C ∪ D∙E ∪ F) ∩ (Ā∙F̄∙Ḡ)
    = Ā∙F̄∙Ḡ∙A∙B ∪ Ā∙F̄∙Ḡ∙C ∪ Ā∙F̄∙Ḡ∙D∙E ∪ Ā∙F̄∙Ḡ∙F
    = Ā∙F̄∙Ḡ∙C ∪ Ā∙F̄∙Ḡ∙D∙E

  P(γ) ≅ P(Ā∙F̄∙Ḡ∙C) + P(Ā∙F̄∙Ḡ∙D∙E)

Exercise 1

• Compute the probability of the sequence γ = “S1 fails and S2 operates” using the method of the event tree with boundary conditions

2.B Shared equipment: comments

• Methods:
  ▪ Event tree with boundary conditions (the analyst must explicitly recognize the shared-equipment dependence)
  ▪ Fault tree link (the shared-equipment dependence is automatically accounted for)

• Correctly applied, the two methods give the same results

2.C Physical interactions

• Treated similarly to functional dependences

• Example: system S2 can operate only if system S1 operates successfully. When system S1 fails, a physical interaction takes place which inhibits system S2 → the sequence “S1 fails and S2 operates” is impossible

3. Intercomponent dependences (common cause failures)

• Parallel system of two components A and B (fault tree top event: SYSTEM FAILS)

• Each component can fail from an independent cause (A′, B′) or from the common cause D:  A = A′ ∪ D,  B = B′ ∪ D

• Minimal cut sets:
  ▪ without common causes of failure: {A′B′}
  ▪ with common causes of failure: {A′B′}, {D}

Numerical example (Parallel)

• Component failure probabilities (including the common cause D):

  P(A) = P(A′ ∪ D) = 10^-3
  P(B) = P(B′ ∪ D) = 10^-3

[Table: system failure probability with and without the common cause contribution]

• Even a small probability of common cause failure can result in a large (relative) increase of the system failure probability!
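The sketch below reproduces the comparison numerically. The split of P(A) = P(B) = 10^-3 into an independent part and a common cause part P(D) is assumed here purely for illustration; only the totals are given on the slide.

```python
# A minimal sketch for the parallel case. The split between the independent
# contributions P(A'), P(B') and the common cause P(D) is an assumption made
# here for illustration; the slide only gives P(A) = P(B) = 1e-3.

P_D = 1e-4                 # assumed common cause probability
P_Ai = P_Bi = 9e-4         # independent contributions, so P(A) = P(A' or D) ~ 1e-3

# Ignoring the common cause: single cut set {A B}
p_sys_independent = 1e-3 * 1e-3

# With the common cause: cut sets {A'B'} and {D} (rare event approximation)
p_sys_ccf = P_Ai * P_Bi + P_D

print(f"independent model: {p_sys_independent:.1e}")   # 1.0e-06
print(f"with common cause: {p_sys_ccf:.2e}")           # ~1.0e-04 -> ~100x larger
```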

3. Intercomponent dependences

• Series system of two components B and C, each of which can fail from an independent cause (B′, C′) or from the common cause E

• Minimal cut sets:
  ▪ without common causes of failure: {B}, {C}
  ▪ with common causes of failure: {B′}, {C′}, {E}

Numerical example (Series)

• Even a large probability of common cause failure results in only a small (relative) decrease of the computed system failure probability!
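A corresponding sketch for the series case; again, the split between the independent contributions and the common cause E is assumed, since the slide gives no numbers.

```python
# A minimal sketch for the series case, with an assumed split between the
# independent contributions and the common cause E.

P_E = 5e-4                 # assumed (relatively large) common cause probability
P_Bi = P_Ci = 5e-4         # independent contributions, so P(B) = P(C) ~ 1e-3

# Independence assumed: cut sets {B}, {C} (rare event approximation)
p_sys_independent = (P_Bi + P_E) + (P_Ci + P_E)

# Common cause made explicit: cut sets {B'}, {C'}, {E}
p_sys_ccf = P_Bi + P_Ci + P_E

print(f"independent model: {p_sys_independent:.1e}")   # 2.0e-03
print(f"with common cause: {p_sys_ccf:.1e}")           # 1.5e-03 -> modest decrease
```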

3. Intercomponent dependences

Neglecting the causes of dependent failures (i.e., assuming independence of the component unavailabilities) leads to:

▪ optimistic predictions of system availability for components in the same mcs (i.e., in parallel)

▪ conservative predictions of system availability for components in different mcs (i.e., in series)

Implicit methods

Multiple failure events, for which no clear root cause event can be identified and treated explicitly, can be modeled using implicit, parametric models.

Square root method (1) [Reactor Safety Study, WASH-1400]

• Parallel system of 2 components A and B: the system unavailability is U(A,B) = P(A ∩ B)

• Upper bound: P(A ∩ B) ≤ P(A) and P(A ∩ B) ≤ P(B), hence

  P(A ∩ B) ≤ min[P(A), P(B)]     (1)

• If A and B are positively dependent, P(A|B) ≥ P(A), so

  P(A ∩ B) = P(A|B)∙P(B) ≥ P(A)∙P(B)     (2)

• Combining (1) + (2):

  P_L = P(A)∙P(B) ≤ P(A ∩ B) ≤ min[P(A), P(B)] = P_U

Square root method (2)

  P_L = P(A)∙P(B) ≤ P(A ∩ B) ≤ min[P(A), P(B)] = P_U

• Estimate P(A ∩ B) by the geometric average of P_L and P_U (no proven theoretical foundation):

  P(A ∩ B) ≈ P_M = √(P_L ∙ P_U)

Exercise 2

• System of n identical components in parallel

• Unavailability of the single component at time t: Uc = 10^-2

• Estimate the system unavailability at time t using the square root method

Exercise 2 (solution)

• System of n identical components in parallel, single-component unavailability Uc = 10^-2 at time t

• Lower bound (independence): P_L = ∏_{i=1}^{n} P(A_i) = (Uc)^n

• Upper bound: P_U = min[P(A_1), P(A_2), …, P(A_n)] = Uc

• Square root estimate:

  Us(t) ≈ P_M = √(P_L ∙ P_U) = √((Uc)^n ∙ Uc) = (Uc)^((n+1)/2)

Example (2)

• Square root method: Us = (Uc)^((n+1)/2)

• Independence assumption: Us = (Uc)^n

The difference increases as the number of components n increases!
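A short sketch of the square root estimate for this exercise (Uc = 10^-2 as given; the values of n are arbitrary).

```python
# A minimal sketch of the square-root method for Exercise 2:
# n identical components in parallel, each with unavailability Uc = 1e-2.

import math

Uc = 1e-2

def square_root_estimate(n: int) -> float:
    p_lower = Uc ** n                     # independence bound: product of the unavailabilities
    p_upper = Uc                          # min of the single-component unavailabilities
    return math.sqrt(p_lower * p_upper)   # geometric mean = Uc**((n+1)/2)

for n in (2, 3, 5):
    print(f"n={n}: independent={Uc**n:.1e}  square-root={square_root_estimate(n):.1e}")
```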

A methodological framework for Common Cause Failures (CCF) analysis

CCF in PSA

i. System logic model development

ii. Identification of common-cause component groups

iii. Common-cause modeling and data analysis

iv. System quantification and interpretation of results



System logic model development



Identification of common-cause component groups

OBJECTIVES:

- Identify the groups of components potentially involved in dependent failures and thus to be included in the CCF analysis

- Prioritize the groups for the best allocation of resources in the subsequent analysis

DEFINITION OF COMMON CAUSE COMPONENT GROUP:

“a group of similar or identical components that have a significant likelihood of experiencing a common cause event”

Qualitative and quantitative screenings are needed to discriminate the most important component groups to be included in the analysis!

Qualitative screening

• Check-list:
  - similarity of component type
  - similarity of component use
  - similarity of component manufacturer
  - similarity of component internal conditions (pressure, temperature, chemistry)
  - similarity of component boundaries and system interfaces
  - similarity of component location name and/or code
  - similarity of component external environmental conditions (humidity, temperature, pressure)
  - similarity of component initial conditions and operating characteristics (standby, operating)
  - similarity of component testing procedures and characteristics
  - similarity of component maintenance procedures and characteristics

• Practical guidelines to be followed in the assignment of component groups:
  - Identical components providing redundancy in the system should always be assigned to a common cause group
  - Diverse redundant components which have piece parts that are identically redundant should not be assumed fully independent in spite of their diversity
  - The susceptibility of a group of components to CCFs depends not only on their degree of similarity but also on the existence (or lack) of defensive measures (barriers) against CCFs

Quantitative Screening

• A complete quantitative common cause analysis, except that a conservative and very simple quantification model is used. The following steps are carried out:

  ▪ The fault trees are modified to explicitly include, for each common cause group, a single CCF basic event that fails all members of the group (e.g., a basic event C_ABC if components A, B and C are in the same common cause group)

  ▪ The fault trees are solved to obtain the minimal cut sets

  ▪ Numerical values for the probabilities of the CCF basic events are estimated with the beta factor model (conservative regardless of the number of components in the CCF basic event):

    P(C_ABC) = β∙P(A), with β = 0.1 for screening
    P(A) = total failure probability in the absence of common causes

  ▪ The common cause failure events which are found to contribute little to the overall system failure probability are screened out


Common cause failure modeling and data analysis

OBJECTIVE:

complete the system quantification by incorporating the effects of common cause events for those component groups that survive the screening

STEPS:

1. Definition of common cause basic events

2. Selection of implicit probability models for common cause basic events

3. Data classification and screening

4. Parameter estimation


Definition of common cause basic events (an example)

A_I = failure of component A from independent causes

Component level → common cause impact level (each component basic event becomes a sub-tree)
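As an illustration of the sub-tree expansion, the usual decomposition for a component A belonging to a common cause group {A, B, C} is shown below; the event names C_AB, C_AC, C_ABC follow the standard notation and are not spelled out in the extracted slide.

```latex
% Total failure of component A in the common cause group {A, B, C}:
% independent failure of A plus every common cause event involving A
A_T = A_I \cup C_{AB} \cup C_{AC} \cup C_{ABC}
```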


Selection of implicit probability models for common cause basic events

• Possible classification (according to how multiple failures are modeled to occur):

  ▪ Shock models: e.g. the binomial failure rate model, which assumes that the system is subject to common cause ‘shocks’ occurring at a certain rate

  ▪ Non-shock models:
    - Direct models: use the probabilities of the common cause events directly (e.g. the basic parameter model)
    - Indirect models: estimate the probabilities of the common cause events through the introduction of other parameters (e.g. the β-factor model)

The basic parameter model

• Non-shock, direct model

• Assumptions:
  ▪ Rare event approximation
  ▪ The probabilities of similar events involving similar types of components are the same
  ▪ The probability of failure of any given basic event within a common cause component group depends only on the number of components in that basic event, not on which specific components they are (symmetry assumption)

The basic parameter model: example of the 2-out-of-3 system

• Q_k = probability of a basic event involving k specific components

• Total failure probability of one component, Q_t = P(A) = P(B) = P(C):

  Q_t = Q_1 + 2Q_2 + Q_3

• Probability of failure of the 2-out-of-3 system (rare event approximation):

  Q_S = 3Q_1^2 + 3Q_2 + Q_3

• 3 probabilities to be estimated: Q_1, Q_2, Q_3
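A minimal sketch of these two formulas; the numerical values of Q_1, Q_2, Q_3 are assumed for illustration only.

```python
# A minimal sketch of the basic parameter model for the 2-out-of-3 example.
# The numerical values of Q1, Q2, Q3 are assumed here for illustration only.

Q1, Q2, Q3 = 1e-3, 1e-4, 5e-5   # assumed basic event probabilities

# Total failure probability of one component (group of m = 3):
Qt = Q1 + 2 * Q2 + Q3

# 2-out-of-3 system failure probability (rare event approximation):
Qs = 3 * Q1**2 + 3 * Q2 + Q3

print(f"Qt = {Qt:.2e}")   # 1.25e-03
print(f"Qs = {Qs:.2e}")   # 3.53e-04
```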

The basic parameter model: generalization to a common cause group of m components

• Total probability of failure of a component in a common cause group of m components:

  Q_t = Σ_{k=1}^{m} C(m−1, k−1) ∙ Q_k

  where C(m−1, k−1) is the number of different ways in which a component can fail together with (k−1) other components in a group of m similar components

• Criticality of the method: the data needed to estimate all the Q_k are normally not available

  → models with more assumptions but less stringent requirements on the data

The β-factor model

• Assumption: when a common cause failure occurs, all m components in the group fail

• The total failure probability of a component therefore has only two contributions, the independent one Q_I and the common cause one Q_m:

  Q_t = Q_I + Q_m

• β factor:

  β = Q_m / Q_t = Q_m / (Q_I + Q_m)

  so that  Q_m = β∙Q_t  and  Q_I = (1−β)∙Q_t

The β-factor model: example of the 2-out-of-3 system

• Basic parameter model:  Q_S = 3Q_1^2 + 3Q_2 + Q_3

• β-factor model:  Q_1 = (1−β)∙Q_t,  Q_2 = 0,  Q_3 = β∙Q_t

  → Q_S = 3(1−β)^2∙Q_t^2 + β∙Q_t

• Notice:

  ▪ All units are assumed to fail when a CCF occurs → conservative predictions

  ▪ Parameters to be estimated from data: β, Q_t

  ▪ Time-dependent failure probabilities (constant failure rates, with λ_t = λ_I + λ_m, λ_m = β∙λ_t, λ_I = (1−β)∙λ_t):

    Q_m(t) = 1 − e^(−β∙λ_t∙t)
    Q_I(t) = 1 − e^(−(1−β)∙λ_t∙t)
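A small sketch comparing the 2-out-of-3 failure probability under independence and under the β-factor model; Q_t and β are assumed values, not taken from the slides.

```python
# A minimal sketch comparing the 2-out-of-3 system failure probability with and
# without the beta-factor CCF contribution. Qt and beta are assumed values.

Qt = 0.02      # assumed total failure probability of one component
beta = 0.1     # assumed beta factor

# Independence assumed (Q1 = Qt, Q2 = Q3 = 0):
Qs_independent = 3 * Qt**2

# Beta-factor model: Q1 = (1-beta)*Qt, Q2 = 0, Q3 = beta*Qt
Qs_beta = 3 * ((1 - beta) * Qt)**2 + beta * Qt

print(f"independence : {Qs_independent:.2e}")   # 1.20e-03
print(f"beta factor  : {Qs_beta:.2e}")          # ~3.0e-03 (dominated by beta*Qt)
```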

Example

• A parallel structure of n identical components, each with constant failure rate λ

• Components are non-repairable; the reliability of a single component is:

  R(t) = e^(−λ∙t)

• An external event can cause the simultaneous failure of all components in the system; β is the fraction of the total failure rate of a component attributable to the external event

• β-factor model → the external event is modeled as a hypothetical component C in series with the rest of the system

• The reliability of a component with respect to its independent failures only is:

  R_I(t) = e^(−(1−β)∙λ∙t)
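Combining the two pieces (the hypothetical series ‘common cause’ component, failing at rate βλ, and the n-fold parallel structure of the independent parts) gives the system reliability below. This closing expression does not appear in the extracted text, but it follows directly from the β-factor construction described above.

```latex
% System reliability under the beta-factor model:
% series common cause "component" x parallel structure of n independent parts
R_S(t) = e^{-\beta \lambda t}\,\left[\,1 - \left(1 - e^{-(1-\beta)\lambda t}\right)^{n}\right]
```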

Binomial failure rate (BFR) model

• System composed of m identical components:

  ➢ each component can fail at random times, independently of the others, with failure rate λ

  ➢ a common cause shock can hit the system, with occurrence rate μ

  ➢ whenever a shock occurs, each of the m components fails with probability p, independently of the states of the other components (for p = 1 the model reduces to the β-factor model)

• The number I of individual components failing as a consequence of a shock is binomially distributed with parameters m and p:

  P(I = i) = C(m, i) ∙ p^i ∙ (1−p)^(m−i),  i = 0, 1, …, m

Binomial failure rate (BFR) model

• Additional assumptions:

  ➢ shocks and individual failures occur independently of each other

  ➢ all failures are immediately discovered and repaired, with negligible repair time

• Rate of failure events involving exactly 1 unit in a common cause group of multiplicity m (individual failures plus single failures caused by shocks):

  λ_1 = m∙λ + μ∙m∙p∙(1−p)^(m−1)

• Rate of failure events involving exactly i units (2 ≤ i ≤ m):

  λ_i = μ∙C(m, i)∙p^i∙(1−p)^(m−i)

• Rate of failure events involving more than 1 unit:

  λ_+ = Σ_{i=2}^{m} λ_i = μ∙[1 − (1−p)^m − m∙p∙(1−p)^(m−1)]

• Parameters to be estimated from data: λ, μ and p
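A minimal sketch of these BFR event rates for a group of m components; λ, μ, p and m are assumed values.

```python
# A minimal sketch of the BFR event rates for a group of m components.
# lam (individual failure rate), mu (shock rate) and p are assumed values.

from math import comb

lam, mu, p, m = 1e-3, 1e-4, 0.3, 3   # assumed parameters

def rate_exactly(i: int) -> float:
    """Rate of failure events in which exactly i of the m units fail."""
    if i == 1:
        return m * lam + mu * m * p * (1 - p) ** (m - 1)
    return mu * comb(m, i) * p**i * (1 - p) ** (m - i)

rate_multiple = sum(rate_exactly(i) for i in range(2, m + 1))
print(f"lambda_1 = {rate_exactly(1):.3e}")
print(f"lambda_+ = {rate_multiple:.3e}")
# lambda_+ equals mu * (1 - (1-p)**m - m*p*(1-p)**(m-1))
```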


Data classification and screening

• The data sources available are typically of two kinds:

  ▪ generic raw data (NUREG, …)

  ▪ plant-specific data (rarity of common cause failures + limited experience of the specific plant → very limited)

• A binary impact vector is built for each event that has occurred in a group of size m:

  I = [I_0, I_1, …, I_m]

  e.g., 2 components have failed in a group of size 3:  I = [0, 0, 1, 0]

• When the event descriptions are not clear, the classification of the event requires establishing hypotheses representing different interpretations of the event; probabilities are associated with the hypotheses.

Data classification and screening

• Average impact vector (no longer binary) for a given event:

  Ī = [P_0, P_1, …, P_m]

• From several events, compute n_k = the total number of events involving the failure of k similar components in the group:

  n_k = Σ_j P_k(j)   (sum over the events j)
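A short sketch of the n_k computation from average impact vectors; the three example events are invented for illustration.

```python
# A minimal sketch: average impact vectors for a group of size m = 3.
# The event data below are invented for illustration.

impact_vectors = [
    [0.0, 0.0, 1.0, 0.0],   # event 1: 2 components failed (no interpretation doubt)
    [0.0, 0.5, 0.5, 0.0],   # event 2: 1 or 2 failures, equally likely hypotheses
    [0.0, 0.0, 0.2, 0.8],   # event 3: probably all 3 components failed
]

m = 3
# n_k = sum over events j of P_k(j): expected number of events with k failures
n = [sum(vec[k] for vec in impact_vectors) for k in range(m + 1)]
for k, nk in enumerate(n):
    print(f"n_{k} = {nk:.1f}")
```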


Parameter estimation

• From the impact vectors → the number of events n_k in which k = 1, 2, …, m components failed, and (possibly) the single-component failure probability Q_t

• What can be estimated:

  ▪ the basic event probabilities directly (within the basic parameter model), e.g. for a safety system with N demands:

    Q_k ≅ n_k / N_k

    where N_k is the number of demands in which a common cause failure of k components could have been observed

  ▪ the parameters of the common cause failure models (β factor, BFR)

Beta-factor estimation for a two-train redundant standby safety system tested for failures

• Available recorded evidence:

  ▪ n_1 = number of failures of a single component

  ▪ n_2 = number of failures of both components

  ▪ N = number of single-component demands to start

  ▪ N_2 = number of tests for common cause failures

  N and N_2 depend on the type of tests!

• Surveillance testing strategy I: both components are tested at the same time

Beta-factor estimation for a two-train redundant standby safety system tested for failures (TEST STRATEGY I)

Tests = demands to start

            Day 0   Day 15   Day 30   Day 45   Day 60
  Comp. 1    S       S        F        F        S
  Comp. 2    S       F        S        F        S
  N          2       4        6        8        10
  N2         1       2        3        4        5
  n1         0       1        2        2        2
  n2         0       0        0        1        1

  N_2 = N / 2

  Q_2 = n_2 / N_2 = 2n_2 / N

  β = Q_2 / Q_t = Q_2 / (Q_1 + Q_2) = (2n_2/N) / (n_1/N + 2n_2/N) = 2n_2 / (n_1 + 2n_2)

Beta-factor estimation for a two-train redundant standby safety system tested for failures (TEST STRATEGY II)

• Surveillance testing strategy II: the components are tested at staggered intervals; if there is a failure, the second component is tested immediately. (N_2 is known; N is linked to N_2.)

Tests = demands to start

            Day 0     Day 15    Day 30   Day 45   Day 60
  Comp. 1    S         S         F        F        S
  Comp. 2    NO TEST   NO TEST   S        F        NO TEST
  N          1         2         4        6        7
  N2         1         2         3        4        5
  n1         0         0         1        1        1
  n2         0         0         0        1        1

Beta-factor estimation for a two-train redundant standby safety system tested for failures (TEST STRATEGY II)

• Link between N and N_2:

  N = N_2 + n_1 + n_2

  where N = number of single-component demands to start, N_2 = number of tests for common cause failures, n_1 = number of failures of a single component, n_2 = number of failures involving both components

• Since n_1 ≪ N and n_2 ≪ N:

  Q_2 = n_2 / N_2 ≅ n_2 / N

  β = Q_2 / Q_t = Q_2 / (Q_1 + Q_2) ≅ (n_2/N) / (n_1/N + n_2/N) ≅ n_2 / (n_1 + n_2)

• The estimates of β are based on the assumptions made on the testing strategy
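The two estimators can be checked against the cumulative counts at day 60 of the two tables above (a minimal sketch).

```python
# A minimal sketch: beta-factor estimates from the two surveillance test tables
# above (cumulative counts at day 60).

# Strategy I: both components tested at the same time
n1_I, n2_I = 2, 1
beta_I = 2 * n2_I / (n1_I + 2 * n2_I)          # = 2*1 / (2 + 2) = 0.5

# Strategy II: staggered testing (second component tested only after a failure)
n1_II, n2_II = 1, 1
beta_II = n2_II / (n1_II + n2_II)              # = 1 / 2 = 0.5

print(f"beta (strategy I)  = {beta_I:.2f}")
print(f"beta (strategy II) = {beta_II:.2f}")
```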

Binomial failure rate model: parameter estimation

• Parameters to be estimated: p, λ, μ

• μ cannot be estimated directly from an observed shock count because:

  ▪ shocks that do not cause any failure are not observable

  ▪ single failures caused by common cause shocks may not be distinguishable from single independent failures

• Data available for the estimation:

  ▪ n_i = number of observations of i concurrent failures

  ▪ n_+ = Σ_{i=2}^{m} n_i = number of observations of dependent failures of any multiplicity

Binomial failure rate model: parameter estimation

• Method for the estimation: maximum likelihood

• For a given observation time T, the variables N_1 and N_+ have Poisson distributions with parameters λ_1∙T and λ_+∙T, respectively. Maximizing the likelihoods P_1 and P_+ gives:

  λ̂_1 = n_1 / T    and    λ̂_+ = n_+ / T

Binomial failure rate model: parameter estimation

• The likelihood P_m of the observed multiplicities follows a multinomial distribution: the estimate of p that maximizes P_m is obtained from the total number of units failing in the dependent-failure events (the resulting relation is valid for m > 2)

• From λ_1, λ_+ and p it is then possible to estimate λ and μ

EXAMPLE: Auxiliary Feedwater System (AFWS) of a nuclear Pressurized Water Reactor


TRAIN ≡ UNIT

EXAMPLE: Auxiliary Feedwater System (AFWS) of a nuclear Pressurized Water Reactor

[Schematic: a tank feeds the pump units M, T and D, which supply the steam generators (SGs)]

DATA FROM US PWR PLANTS:

• different numbers of units

• different types of units (M, T, D)

Instances of multiple failures in PWR auxiliary feedwater systems

[Table of recorded multiple-failure events (including a plant with 2 units); an asterisk marks the events in which the whole system failed: 6 events]

• N_e = number of multiple-failure events = 11

• N_c = number of unit failures in the multiple-failure events = 24

EXAMPLE: Auxiliary Feedwater System (AFWS) of a nuclear Pressurized Water Reactor – analysis of the data at the system level

• Assumption: one complete (i.e., all units) system demand for each calendar month

  → T = number of multiple-unit system demands (with the corresponding numbers of two-unit and three-unit system demands)

• From the real data one obtains:

  ▪ the system per-demand failure probability

  ▪ the two-component per-demand failure probability (1-out-of-2)

  ▪ the three-component per-demand failure probability (1-out-of-3)

EXAMPLE: Auxiliary Feedwater System (AFWS) of a nuclear Pressurized Water Reactor – data at the unit level

• Quantities extracted from the data:

  ▪ n_i = number of events involving the failure of i units

  ▪ n_+ = number of multiple (dependent) failure events

  ▪ S = number of unit failures in the multiple (dependent) failure events

  ▪ n_1 = number of single-unit failures

  ▪ N = number of single-unit demands

Example: Beta-factor model

• Q_S = Q_{1-2}, the per-demand probability of failure-to-start for a 1-out-of-2 system, is:

  Q_{1-2} = [(1−β)∙Q_t]^2 + β∙Q_t

• with:

  β̂ = Q_m / (Q_m + Q_i) = (S/T) / (S/T + n_1/T) = S / (S + n_1) = (2n_2 + 3n_3) / (n_1 + 2n_2 + 3n_3) = 24 / (24 + 68) ≈ 0.26

  Q_t = per-demand failure probability of a single unit = (n_1 + 2n_2 + 3n_3) / N = (68 + 24) / 4682 ≈ 0.02

• From data: 8.4∙10^-3

Example: Beta-factor model

• Q_S = Q_{1-3}, the per-demand probability of failure-to-start for a 1-out-of-3 system, is:

  Q_{1-3} = [(1−β)∙Q_t]^3 + β∙Q_t

• With the values estimated from the data (β̂ ≈ 0.26, Q_t ≈ 0.02), the independent-failure term is negligible, so Q_{1-3} ≈ β∙Q_t
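A minimal sketch reproducing the AFWS β-factor numbers (n_1 = 68, S = 24, N = 4682 from the slides; the 1-out-of-2 and 1-out-of-3 formulas are the β-factor expressions assumed above).

```python
# A minimal sketch of the beta-factor estimates for the AFWS example,
# using the unit-level counts reported in the slides.

n1 = 68     # single-unit failures
S = 24      # unit failures occurring in multiple (dependent) failure events
N = 4682    # single-unit demands

beta = S / (S + n1)        # 24 / 92 ~= 0.26
Qt = (n1 + S) / N          # 92 / 4682 ~= 0.02

Q_1oo2 = ((1 - beta) * Qt) ** 2 + beta * Qt
Q_1oo3 = ((1 - beta) * Qt) ** 3 + beta * Qt

print(f"beta = {beta:.2f}, Qt = {Qt:.3f}")
print(f"Q(1-out-of-2) = {Q_1oo2:.2e}")   # ~5.3e-03 (compare with 8.4e-03 from data)
print(f"Q(1-out-of-3) = {Q_1oo3:.2e}")   # ~5.1e-03 (independent term negligible)
```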

Example: Binomial failure-rate model

• n_1 = 68 = number of observed single failures in multi-unit systems

• n_+ = 11 = number of observed multiple failures

• T = 1641 = number of system demands

  λ̂_1 = n_1 / T = 68 / 1641 ≈ 0.0414

  λ̂_+ = n_+ / T = 11 / 1641 ≈ 0.0067
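A one-line check of the two rate estimates from the recorded counts (n_1 = 68, n_+ = 11, T = 1641).

```python
# Poisson maximum-likelihood estimates of the single- and multiple-failure rates.
n1, n_plus, T = 68, 11, 1641   # counts and number of system demands from the slides

lambda_1 = n1 / T          # ~0.0414 per demand
lambda_plus = n_plus / T   # ~0.0067 per demand
print(f"lambda_1 = {lambda_1:.4f}, lambda_+ = {lambda_plus:.4f}")
```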

Example: Binomial failure-rate model

• Estimation of p (for the 3-unit systems):

  ▪ n_+ = 7 multiple-failure events observed in 3-unit systems

  ▪ S = 16 = total number of units failing in those 7 multiple-failure events

Example: Binomial failure-rate model

[Slide: three-component per-demand failure probability; rates of dependent failures of multiplicity i; comparison of Q^(2)_{1-3} with Q_{1-3} from the data]

Discussion and comparison of the Beta-factor and Binomial-Failure-Rate models

[Comparison table; one of the two models is marked as recommended]