Reliability Engineering Lec Notes #6

7/30/2019 Reliability Engineering Lec Notes #6


SYSTEM RELIABILITY MODELING AND PREDICTION - STATIC METHODS

Reliability Block Diagrams

A reliability block diagram is a graphical procedure which describes the system

operation in terms of successful "signal" transmission between the system units.

[Figure: reliability block diagrams for a Two Unit Series System (units 1 and 2 in line) and a Two Unit Active-Parallel System (units 1 and 2 side by side)]

Consider a system which consists of two units both of which must function for the system

to function (series system). Assume component failures are statistically independent and

let

A1 : Unit#1 functions at time t; P(A1) = R1(t)

A2 : Unit#2 functions at time t; P(A2) = R2(t)

A : system functions at time t; P(A) = R(t)

Then

P(A) = P(A1A2) = P(A1)P(A2)

=> R(t) = R1(t)R2(t)

If the system functions when either Unit#1 or Unit#2 functions (active-parallel system),

then

P(A) = P(A1 + A2) = P(A1) + P(A2) - P(A1)P(A2)

=> R(t) = R1(t) + R2(t) - R1(t)R2(t)

If the units are identical with constant failure rate $\lambda$, then

$$R_{\mathrm{series}}(t) = e^{-2\lambda t}$$

$$R_{\mathrm{parallel}}(t) = 2e^{-\lambda t} - e^{-2\lambda t}$$

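[Figure: R(t) versus $\lambda t$ for $0 \le \lambda t \le 2$, showing $R(t) = e^{-2\lambda t}$ (dash-dot, series), $R(t) = 2e^{-\lambda t} - e^{-2\lambda t}$ (solid, active-parallel) and $R(t) = e^{-\lambda t}$ (dotted, single unit)]

[Figure: F(t) versus $\lambda t$ for $0 \le \lambda t \le 2$, showing $F(t) = 1 - 2e^{-\lambda t} + e^{-2\lambda t}$ (solid, active-parallel), $F(t) = 1 - e^{-\lambda t}$ (dotted, single unit) and $F(t) = 1 - e^{-2\lambda t}$ (dash-dot, series)]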

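In general, for a series system of N units,

$$R(t) = \prod_{n=1}^{N} R_n(t),$$

for an active-parallel system of N units,

$$R(t) = 1 - \prod_{n=1}^{N} [1 - R_n(t)],$$

and for an M-out-of-N (good) system with N identical units,

$$R^{(M)}_{N}(t) = \sum_{n=M}^{N} \frac{N!}{n!\,(N-n)!}\, [R(t)]^{n} [1 - R(t)]^{N-n},$$

where R(t) on the right-hand side is the reliability of a single unit.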
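The three general formulas can be evaluated directly. The following is a minimal sketch in Python; the function names and the unit reliability value 0.9 are illustrative assumptions, not part of the notes.

```python
from math import comb, prod

def series_reliability(unit_reliabilities):
    """All units must work: product of the unit reliabilities."""
    return prod(unit_reliabilities)

def parallel_reliability(unit_reliabilities):
    """At least one unit must work: 1 - product of the unit unreliabilities."""
    return 1.0 - prod(1.0 - r for r in unit_reliabilities)

def m_out_of_n_reliability(m, n, r):
    """At least m of n identical units (each with reliability r) must work."""
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))

if __name__ == "__main__":
    r = 0.9  # assumed unit reliability at some fixed time t
    print("series, 2 units:  ", series_reliability([r, r]))
    print("parallel, 2 units:", parallel_reliability([r, r]))
    print("2-out-of-3:       ", m_out_of_n_reliability(2, 3, r))
```

Note that M = N reproduces the series result and M = 1 the active-parallel result for identical units, which is a quick consistency check on the M-out-of-N formula.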



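For a system with N identical, randomly failing units (failure rate $\lambda$)

$$\mathrm{MTTF}_{\mathrm{series}} = \int_0^\infty dt\, e^{-N\lambda t} = \frac{1}{N\lambda}$$

and, using the substitution $y = 1 - e^{-\lambda t}$,

$$\mathrm{MTTF}_{\mathrm{parallel}} = \int_0^\infty dt\, \{1 - [1 - e^{-\lambda t}]^{N}\} = \frac{1}{\lambda}\int_0^1 dy\, \frac{1 - y^{N}}{1 - y} = \frac{1}{\lambda}\int_0^1 dy \sum_{n=0}^{N-1} y^{n} = \frac{1}{\lambda}\sum_{n=0}^{N-1} \frac{1}{n+1} = \frac{1}{\lambda}\sum_{n=1}^{N} \frac{1}{n}$$

$$\mathrm{MTTF}_{M\text{-out-of-}N} = \int_0^\infty dt \sum_{n=M}^{N} \frac{N!}{n!\,(N-n)!}\, e^{-n\lambda t} [1 - e^{-\lambda t}]^{N-n}$$

$$= \sum_{n=M}^{N} \frac{N!}{n!\,(N-n)!} \int_0^\infty dt\, e^{-n\lambda t} [1 - e^{-\lambda t}]^{N-n}$$

$$= \frac{1}{\lambda} \sum_{n=M}^{N} \frac{N!}{n!\,(N-n)!} \int_0^1 dy\, (1-y)^{n-1} y^{N-n}$$

Let

$$I_n = \int_0^1 dy\, (1-y)^{n-1} y^{N-n}.$$

Then, integrating by parts,

$$I_n = \frac{n-1}{N-n+1} \int_0^1 dy\, (1-y)^{n-2} y^{N-n+1} = \frac{n-1}{N-n+1}\, I_{n-1} \quad \text{with} \quad I_1 = \int_0^1 dy\, y^{N-1} = \frac{1}{N}$$

$$\Rightarrow I_n = \frac{(n-1)!}{N(N-1)\cdots(N-n+1)} = \frac{(n-1)!\,(N-n)!}{N!}$$

$$\Rightarrow \mathrm{MTTF}_{M\text{-out-of-}N} = \frac{1}{\lambda} \sum_{n=M}^{N} \frac{1}{n}$$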
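The closed-form MTTF results can be checked numerically. The sketch below integrates the M-out-of-N reliability function with a simple trapezoidal rule and compares it with $(1/\lambda)\sum_{n=M}^{N} 1/n$; the function names, the truncation of the integral at $40/\lambda$ and the step count are assumptions made for illustration.

```python
from math import comb, exp

def mttf_closed_form(m, n, lam):
    """(1/lam) * sum_{k=m}^{n} 1/k for an M-out-of-N system of identical units."""
    return sum(1.0 / k for k in range(m, n + 1)) / lam

def reliability(m, n, lam, t):
    """M-out-of-N reliability with unit reliability exp(-lam*t)."""
    r = exp(-lam * t)
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))

def mttf_numeric(m, n, lam, steps=20_000):
    """Trapezoidal integral of R(t) on [0, 40/lam]; the upper limit is a truncation."""
    t_max = 40.0 / lam
    h = t_max / steps
    total = 0.5 * (reliability(m, n, lam, 0.0) + reliability(m, n, lam, t_max))
    total += sum(reliability(m, n, lam, i * h) for i in range(1, steps))
    return total * h

if __name__ == "__main__":
    lam = 1.0  # assumed failure rate, arbitrary time units
    for m, n in [(3, 3), (1, 3), (2, 3)]:  # series, active-parallel, 2-out-of-3
        print(m, n, mttf_closed_form(m, n, lam), mttf_numeric(m, n, lam))
```

Here M = N corresponds to the series case, $1/(N\lambda)$, and M = 1 to the active-parallel case, $(1/\lambda)\sum_{n=1}^{N} 1/n$.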



Example 1

A system consists of 7 units connected as shown in the following reliability block diagram. Units 1 through 4 are different (with 2, 3, and 4 in active-parallel) and 3 units of type 5 constitute a 2-out-of-3 system. If Ri(t) (i = 1, 2, . . . , 5) denotes the reliability function of each unit as a function of time, find the reliability function for the system.

[Figure: reliability block diagram for Example 1, showing units 1, 2, 3 and 4 and three units of type 5]

Solution

$$R_{234}(t) = R_2(t) + R_3(t) + R_4(t) - R_2(t)R_3(t) - R_2(t)R_4(t) - R_3(t)R_4(t) + R_2(t)R_3(t)R_4(t)$$

$$R_{1234}(t) = R_1(t) + R_{234}(t) - R_1(t)R_{234}(t)$$

Also, since for an M-out-of-N (good) system with identical units

$$R^{(M)}_{N}(t) = \sum_{n=M}^{N} \frac{N!}{n!\,(N-n)!}\, [R(t)]^{n} [1 - R(t)]^{N-n},$$

we have

$$R^{(2)}_{3}(t) = 3R_5^{2}(t)[1 - R_5(t)] + R_5^{3}(t) = 3R_5^{2}(t) - 2R_5^{3}(t)$$

and the reliability function for the system is

$$R_{\mathrm{sys}}(t) = R_{1234}(t)\, R^{(2)}_{3}(t)$$
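The composition used in this Solution can be written as a short function. This is only a sketch that mirrors the equations as given above (the block diagram itself is not reproduced here); the function name and the numerical unit reliabilities are assumptions.

```python
def example1_system_reliability(r1, r2, r3, r4, r5):
    """Evaluate R_sys = R_1234 * R_(2-out-of-3) at one instant of time."""
    # Units 2, 3 and 4 in active parallel.
    r234 = 1.0 - (1.0 - r2) * (1.0 - r3) * (1.0 - r4)
    # Combination with unit 1, exactly as written in the Solution.
    r1234 = r1 + r234 - r1 * r234
    # 2-out-of-3 block built from identical type-5 units.
    r_2oo3 = 3.0 * r5**2 - 2.0 * r5**3
    return r1234 * r_2oo3

if __name__ == "__main__":
    # Assumed unit reliabilities at some fixed time t.
    print(example1_system_reliability(0.9, 0.9, 0.9, 0.9, 0.9))
```

Writing the parallel block as one minus the product of unreliabilities is equivalent to the expanded seven-term expression for R234(t) and is less error-prone to type.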



Example 2

Find R(t) and the MTTF for the system whose reliability block diagram is given below. In calculating the MTTF, assume all components are identical and fail randomly with failure rate $\lambda$.

[Figure: reliability block diagram for Example 2, with units 1, 2, 3, 4 and 5]

Solution

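With $\bar{4}$ denoting "unit 4 failed" and $\bar{R}_4 = 1 - R_4$,

$$R_{\mathrm{sys}} = R_4\, R(\mathrm{sys}|4) + \bar{R}_4\, R(\mathrm{sys}|\bar{4})$$

$$R(\mathrm{sys}|4) = 1 - (1 - R_2)(1 - R_3)(1 - R_5)$$

$$R(\mathrm{sys}|\bar{4}) = R_1 (R_2 + R_3 - R_2 R_3)$$

$$R_{\mathrm{sys}} = R_4 [1 - (1 - R_2)(1 - R_3)(1 - R_5)] + (1 - R_4) R_1 (R_2 + R_3 - R_2 R_3)$$

If all components are identical and fail randomly with failure rate $\lambda$,

$$R_{\mathrm{sys}}(t) = e^{-\lambda t}[1 - (1 - e^{-\lambda t})^{3}] + (1 - e^{-\lambda t})\, e^{-\lambda t} (2e^{-\lambda t} - e^{-2\lambda t})$$

$$\Rightarrow R_{\mathrm{sys}}(t) = 5e^{-2\lambda t} - 6e^{-3\lambda t} + 2e^{-4\lambda t}$$

$$\Rightarrow \mathrm{MTTF} = \int_0^\infty dt\, R_{\mathrm{sys}}(t) = \frac{1}{\lambda}$$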

We could have obtained the same results by choosing the "keystone element" as unit 3,

i.e.

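$$R_{\mathrm{sys}} = R_3\, R(\mathrm{sys}|3) + \bar{R}_3\, R(\mathrm{sys}|\bar{3})$$

$$R(\mathrm{sys}|3) = R_1 + R_4 - R_1 R_4$$

[Figure: Reliability block diagram for Example 2 with component 3 failed, showing units 1, 2, 4 and 5]

$$R(\mathrm{sys}|\bar{3}) = R_2\, R(\mathrm{sys}|2\bar{3}) + \bar{R}_2\, R(\mathrm{sys}|\bar{2}\bar{3})$$

$$= R_2 (R_1 + R_4 - R_1 R_4) + \bar{R}_2 R_4 R_5$$

$$\Rightarrow R_{\mathrm{sys}} = R_3 (R_1 + R_4 - R_1 R_4) + (1 - R_3)[R_2 (R_1 + R_4 - R_1 R_4) + (1 - R_2) R_4 R_5]$$

$$= R_4 [1 - (1 - R_2)(1 - R_3)(1 - R_5)] + (1 - R_4) R_1 (R_2 + R_3 - R_2 R_3)$$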

Note that if the link were not present,

$$R_L = R_4 R_5 + R_1 (R_2 + R_3 - R_2 R_3)(1 - R_4 R_5)$$

and for all components identical and failing randomly with failure rate $\lambda$,

$$R_L(t) = 3e^{-2\lambda t} - e^{-3\lambda t} - 2e^{-4\lambda t} + e^{-5\lambda t}$$

$$\Rightarrow \mathrm{MTTF}_L = \int_0^\infty dt\, R_L(t) = \frac{13}{15\lambda}$$

which shows that the presence of the link improves the system reliability.
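The results of Example 2 can be cross-checked by brute-force enumeration of all $2^5$ component states. The sketch below is not the keystone-element method used above; it is an independent check. The minimal path sets are inferred from the conditional reliabilities in the Solution (an assumption, since the block diagram is not reproduced here), and all names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Minimal path sets inferred from R(sys|4), R(sys|not 4) and R_L (assumption).
PATHS_WITH_LINK = [{1, 2}, {1, 3}, {4, 2}, {4, 3}, {4, 5}]
PATHS_WITHOUT_LINK = [{1, 2}, {1, 3}, {4, 5}]

def reliability_polynomial(paths, n_units=5):
    """Return {k: c_k} with R_sys = sum_k c_k * x**k, where x = exp(-lam*t).

    Every unit works with the same probability x. Each working state with w
    units up contributes x**w * (1 - x)**(n - w), which is expanded binomially.
    """
    coeffs = {}
    for state in product([0, 1], repeat=n_units):
        working = {i + 1 for i, up in enumerate(state) if up}
        if not any(path <= working for path in paths):
            continue  # no complete path: the system is failed in this state
        w = len(working)
        f = n_units - w
        for j in range(f + 1):
            coeffs[w + j] = coeffs.get(w + j, 0) + comb(f, j) * (-1) ** j
    return {k: c for k, c in sorted(coeffs.items()) if c != 0}

def mttf_times_lambda(coeffs):
    """Integrate sum_k c_k * exp(-k*lam*t) over t >= 0: lam * MTTF = sum_k c_k / k."""
    return sum(Fraction(c, k) for k, c in coeffs.items())

if __name__ == "__main__":
    for label, paths in [("with link", PATHS_WITH_LINK), ("without link", PATHS_WITHOUT_LINK)]:
        poly = reliability_polynomial(paths)
        print(label, poly, "lam*MTTF =", mttf_times_lambda(poly))
```

Under these assumptions the enumeration reproduces the coefficients 5, -6, 2 and 3, -1, -2, 1 and the values $\lambda\,\mathrm{MTTF} = 1$ and $13/15$.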

Failure Modes and Effects Analysis (FMEA)

The FMEA was first developed by the aerospace industry in the mid-1960s. The FMEA

- describes inherent causes of events that lead to system failure,
- determines their consequences, and
- devises methods to minimize their occurrence or recurrence.

There are basically two types of FMEA:

Design FMEA is used to evaluate the failure modes and their effects for a product before it is released to production and is normally applied at the component and subsystem levels. Its objectives are:

- Identify failure modes and rank them according to their effect on the product performance.
- Identify design actions to eliminate potential failure modes or reduce the occurrence of the respective failures.
- Document the rationale behind product design changes.

Process FMEA is used to analyze manufacturing and assembly processes. Its objectives are to identify:

- failure modes that can be associated with manufacturing and assembly process deficiencies,
- highly critical process characteristics that may cause the occurrence of particular failure modes,
- sources of manufacturing/assembly process variations.

An example of FMEA for transportation applications (using the SAE J1739 FMEA Procedure) is given below. The design controls are:

1. Prevent the failure cause/mechanism or mode from occurring, or reduce the rate of occurrence.

2. Detect the failure cause/mechanism and lead to corrective actions.

3. Detect the failure mode.


RPN: Risk Priority Number

[Table: Occurrence Rating Scale]

[Table: Severity Rating Scale]

[Table: Detection Rating Scale]
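The three rating scales feed into the Risk Priority Number. The sketch below assumes the conventional definition RPN = severity x occurrence x detection with each rating on a 1 to 10 scale; neither the definition nor the scale range is stated in the text above, and the failure modes and ratings are made up purely for illustration.

```python
def risk_priority_number(severity, occurrence, detection):
    """Conventional RPN: product of the three ratings (each assumed to be 1..10)."""
    ratings = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, rating in ratings:
        if not 1 <= rating <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10, got {rating}")
    return severity * occurrence * detection

# Hypothetical failure modes: (severity, occurrence, detection), illustrative only.
failure_modes = {
    "seal leak": (7, 4, 3),
    "connector corrosion": (5, 6, 6),
    "sensor drift": (4, 3, 8),
}

# Rank the failure modes from highest to lowest RPN.
ranked = sorted(
    failure_modes,
    key=lambda mode: risk_priority_number(*failure_modes[mode]),
    reverse=True,
)

if __name__ == "__main__":
    for mode in ranked:
        print(mode, risk_priority_number(*failure_modes[mode]))
```

A ranking of this kind only orders failure modes for attention; it carries none of the probabilistic information discussed in the limitations that follow.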

Some limitations of FMEA:

- Limited insight into probabilistic system behavior.
- FMEA is performed for only 1 failure at a time. There may be multiple failure modes with comparable likelihoods.
- Limited insight into the functional relationships between components.
- Time element in system operation cannot be represented.



Fault Tree/Event Tree Methodology

Fault-trees are logic diagrams that link primary or secondary faults (Basic Events) to

an undesirable event (Top Event).

Example 1

Construct a fault-tree with Top Event "Circuit breaker does not open upon demand" for

the system below:

[Figure: circuit breaker trip system with Control Circuit A, Control Circuit B, Relay A, Relay B, Trip Coil and Circuit Breaker]

Solution

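[Figure: fault tree, reproduced here as an outline; gate labels a, b, c, d are those used in Example 3]

Circuit Breaker Does Not Open (OR gate a)
    Circuit Breaker Mechanism Fails Closed
    Voltage Present Across the Trip Coil (AND gate b)
        Relay A Contacts Stay Closed (OR gate c)
            Relay A Fails Closed
            Control Circuit A Fails On
        Relay B Contacts Stay Closed (OR gate d)
            Relay B Fails Closed
            Control Circuit B Fails On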



Example 2

Construct a fault-tree with Top Event "Latch does not trip" for the system below:

[Figure: latch system with Hydraulic Control A, Actuator A, Hydraulic Control B, Actuator B and Linkage]

Solution

[Figure: fault tree with gates a, b, c, d, reproduced here as an outline]

Latch Does Not Trip (gate a)
    Linkage Fails Extended
    Actuators Fail to Retract (gate b)
        Actuator A Fails to Retract (gate c)
            Actuator A Fails Extended
            Hydraulic Control A Fails Extended
        Actuator B Fails to Retract (gate d)
            Hydraulic Control B Fails Extended
            Actuator B Fails Extended



Example 3

For the system of Example 1 find an expression which yields the probability of Top

Event occurrence in terms of the probability of basic event occurrence.

Solution

Let

A: Circuit breaker mechanism fails closed

B: Relay A fails closed

C: Control circuit A fails on

D: Relay B fails closed

E: Control circuit B fails on

[Figure: fault tree of Example 1 with Top Event at gate a, intermediate gates b, c, d and basic events A, B, C, D, E]

Then

a = A + b,   b = cd,   c = B + C,   d = D + E

which gives

a = A + b = A + cd = A + (B + C)(D + E).



From the rules of Boolean Algebra (or Event Algebra) given in Appendix B:

A + (B + C)(D + E) = A + B (D + E) + C(D + E) (Distributive Law)

= A + BD + BE+ CD + CE (Distributive Law)

Each of A, BD, BE, CD, CE is called a cut set (in this case also a minimal cut set). Then

P(a) = P[A] + P[BD] + P[CD] + P[BE] + P[CE] - P[BCE]
       - P[CD(BE + CE)] - P[BD(CD + BE + CE)] - P[A(BD + CD + BE + CE)]

(using the Commutative and Idempotent Laws)

     = P[A] + P[BD] - P[ABD] + P[CD] - P[ACD] - P[BCD] +
       P[ABCD] + P[BE] - P[ABE] + P[CE] - P[ACE] - P[BCE] +
       P[ABCE] - P[BDE] + P[ABDE] - P[CDE] + P[ACDE] +
       P[BCDE] - P[ABCDE]

(using the Associative, Distributive and Idempotent Laws).

It is often reasonable to assume that P[A], P[BD], P[CD], P[BE] and P[CE] are much larger than the other probabilities (i.e. the rare event approximation), which implies that the Top Event probability is approximately the sum of the minimal cut set probabilities, i.e.

P(a) ≈ P[A] + P[BD] + P[BE] + P[CD] + P[CE].
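The quality of the rare event approximation can be examined numerically. The sketch below compares it with the exact Top Event probability obtained by enumerating all basic-event states; it assumes statistically independent basic events, and the probability value 0.01 is an arbitrary illustrative choice.

```python
from itertools import product

MINIMAL_CUT_SETS = ["A", "BD", "BE", "CD", "CE"]

def exact_top_probability(p):
    """Sum the probabilities of all states in which at least one cut set occurs."""
    names = sorted(p)
    total = 0.0
    for state in product([False, True], repeat=len(names)):
        occurred = {name for name, flag in zip(names, state) if flag}
        if any(set(cut_set) <= occurred for cut_set in MINIMAL_CUT_SETS):
            weight = 1.0
            for name, flag in zip(names, state):
                weight *= p[name] if flag else 1.0 - p[name]
            total += weight
    return total

def rare_event_top_probability(p):
    """Sum of the minimal cut set probabilities (independent basic events)."""
    total = 0.0
    for cut_set in MINIMAL_CUT_SETS:
        term = 1.0
        for name in cut_set:
            term *= p[name]
        total += term
    return total

if __name__ == "__main__":
    p = {name: 0.01 for name in "ABCDE"}  # assumed basic event probabilities
    print("exact:     ", exact_top_probability(p))
    print("rare event:", rare_event_top_probability(p))
```

Since the approximation drops only the correction terms of the inclusion-exclusion expansion, it is an upper bound on the exact value, and the gap shrinks as the basic event probabilities decrease.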

Statistical Importance

Statistical importance is a measure of the significance of a given basic event to the Top Event. If X is the event of interest, then one definition of statistical importance (Im) is

Im = Pr(Minimal Cut Sets Containing X) / Pr(Top Event)

Example 4

If P(A) = 0.001/demand, P(B) = P(D) = 0.001/demand and P(C) = P(E) = 0.005/demand in Example 3, use the rare event approximation to identify the component that needs the most frequent inspection to prevent the Top Event "Circuit breaker does not open upon demand".

Solution

This component can be identified as the one with the highest statistical importance to the Top Event. Then, using the rare event approximation from Example 3,

Im(A) = P(A) / {P[A] + P[BD] + P[BE] + P[CD] + P[CE]}
      = 0.001 / [0.001 + (0.001)^2 + 2(0.001)(0.005) + (0.005)^2] = 0.9653

Im(B) = {P(BD) + P(BE)} / {P[A] + P[BD] + P[BE] + P[CD] + P[CE]}
      = [(0.001)^2 + (0.001)(0.005)] / [0.001 + (0.001)^2 + 2(0.001)(0.005) + (0.005)^2] = 0.0058

Im(C) = {P(CD) + P(CE)} / {P[A] + P[BD] + P[BE] + P[CD] + P[CE]}
      = [(0.001)(0.005) + (0.005)^2] / [0.001 + (0.001)^2 + 2(0.001)(0.005) + (0.005)^2] = 0.0290

Im(D) = {P(BD) + P(CD)} / {P[A] + P[BD] + P[BE] + P[CD] + P[CE]}
      = [(0.001)^2 + (0.001)(0.005)] / [0.001 + (0.001)^2 + 2(0.001)(0.005) + (0.005)^2] = 0.0058

Im(E) = {P(BE) + P(CE)} / {P[A] + P[BD] + P[BE] + P[CD] + P[CE]}
      = [(0.001)(0.005) + (0.005)^2] / [0.001 + (0.001)^2 + 2(0.001)(0.005) + (0.005)^2] = 0.0290

The results show that the circuit breaker mechanism should be inspected most frequently. Note that we have assumed that the events B, C, D, E are statistically independent, as per the given data.
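The importance calculation is easy to automate for any list of minimal cut sets. The following is a minimal sketch using the data of this example, the rare event approximation for the Top Event probability and the same independence assumption; the function and variable names are illustrative.

```python
MINIMAL_CUT_SETS = ["A", "BD", "BE", "CD", "CE"]
P = {"A": 0.001, "B": 0.001, "C": 0.005, "D": 0.001, "E": 0.005}

def cut_set_probability(cut_set, p):
    """Product of the basic event probabilities (independence assumed)."""
    result = 1.0
    for event in cut_set:
        result *= p[event]
    return result

def statistical_importance(event, p, cut_sets=MINIMAL_CUT_SETS):
    """Pr(minimal cut sets containing event) / Pr(Top Event), rare event approximation."""
    top = sum(cut_set_probability(cs, p) for cs in cut_sets)
    containing = sum(cut_set_probability(cs, p) for cs in cut_sets if event in cs)
    return containing / top

importance = {event: statistical_importance(event, P) for event in P}

if __name__ == "__main__":
    for event, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
        print(event, round(value, 4))
```

Sorting by importance puts A first, followed by C and E, then B and D, in agreement with the hand calculation.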



Root Cause Analysis

Root causes are the most basic causes that can be reasonably identified by experts and can be corrected so as to minimize their recurrence. Several structured techniques are used for root cause analysis, including change analysis, barrier analysis, events and causal factors analysis, tree diagrams, management oversight and risk tree analysis (MORT) and fishbone diagrams. Some other less structured approaches are process control charts, trend analyses and Pareto diagrams. Root cause analysis consists of three steps:

1. Determine events and causal factors

2. Code and document root causes

3. Generate recommendations

An example using the tree approach to Step 1 is given below. Step 2 consists of following each path to the top event to determine its relevance for the particular incident (e.g. by asking "if not?"). Once root causes are identified, corrective and preventive recommendations are made. For more information on root cause analysis see Ref.[6].

[Figure: tree diagram, reproduced here as an outline]

Aerosol Inhalation While Spray Painting
    Personnel
        Poor work practices
        Inattention
        Lack of supervision
    Procedures
        No written procedures
        Verbal instructions unclear
        Work procedure inadequate
    Material or Equipment
        Defective mask
        Inadequate ventilation

Statistically Dependent Failures

Statistically dependent failures are defined as events in which the probability of each failure is dependent on the occurrence of other failures. In general, statistically dependent failures are handled using Markov models, which we will discuss in Dynamic Methods. However, in systems with redundant identical components, static techniques may be used. We will illustrate the $\beta$-factor method for a 2-component parallel system. For generalization of the $\beta$-factor method and other methods see Ref.[2].

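Consider a 2-component parallel system where each component can individually fail with rate $\lambda_R$ or fail due to common cause (e.g. loss of power) with rate $\lambda_C$. Then the reliability function for the system is

$$R(t) = e^{-\lambda_C t}\left[1 - \left(1 - e^{-\lambda_R t}\right)^{2}\right] = e^{-\lambda_C t}\left[2e^{-\lambda_R t} - e^{-2\lambda_R t}\right]$$

Let

$$\beta = \frac{\lambda_C}{\lambda_C + \lambda_R} \equiv \frac{\lambda_C}{\lambda}.$$

Then $\lambda_C = \beta\lambda$ and $\lambda_R = (1 - \beta)\lambda$ and

$$R(t) = e^{-\beta\lambda t}\left[2e^{-(1-\beta)\lambda t} - e^{-2(1-\beta)\lambda t}\right] = 2e^{-\lambda t} - e^{-(2-\beta)\lambda t} = e^{-\lambda t}\left[2 - e^{-(1-\beta)\lambda t}\right].$$

The $\beta$-factor method assumes that $\lambda t$ is small enough that

$$e^{-\lambda t} \approx 1 - \lambda t$$

and

$$e^{-(1-\beta)\lambda t} \approx 1 - (1 - \beta)\lambda t.$$

Then

$$R(t) \approx 1 - \beta\lambda t - (1 - \beta)(\lambda t)^{2}$$

or

$$F(t) = 1 - R(t) \approx \beta\lambda t + (1 - \beta)(\lambda t)^{2}.$$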

Note that since $\beta$ can be interpreted as the probability that a component failure is due to the common cause event, the first term gives the probability of system failure due to the common cause event and the second term gives the probability of system failure due to the non-common cause failure of the components.
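To get a feel for when the small-$\lambda t$ approximation is adequate, the exact reliability function can be compared with the approximate F(t) numerically. The sketch below does this; the function names and the values of $\lambda$, $\beta$ and t are assumptions chosen only for illustration.

```python
from math import exp

def reliability_exact(t, lam, beta):
    """R(t) = exp(-lam_C t) * [2 exp(-lam_R t) - exp(-2 lam_R t)]."""
    lam_c = beta * lam          # common cause failure rate
    lam_r = (1.0 - beta) * lam  # individual (non-common cause) failure rate
    return exp(-lam_c * t) * (2.0 * exp(-lam_r * t) - exp(-2.0 * lam_r * t))

def unreliability_beta_factor(t, lam, beta):
    """Approximation F(t) ~ beta*lam*t + (1 - beta)*(lam*t)**2, valid for small lam*t."""
    x = lam * t
    return beta * x + (1.0 - beta) * x**2

if __name__ == "__main__":
    lam, beta = 1.0e-4, 0.1  # assumed total failure rate (per hour) and beta
    for t in (10.0, 100.0, 1000.0):
        exact = 1.0 - reliability_exact(t, lam, beta)
        approx = unreliability_beta_factor(t, lam, beta)
        print(f"t = {t:7.1f}  exact F = {exact:.6e}  approx F = {approx:.6e}")
```

For small $\lambda t$ the common cause term $\beta\lambda t$ dominates F(t) even for modest $\beta$, because the independent-failure term is second order in $\lambda t$.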

New Static Methods

While the fault-tree/event-tree approach is perhaps the most commonly used technique for system reliability modeling, construction of fault-trees is difficult when the system operation involves control loop action. Some alternative techniques that have been proposed include influence diagrams, directed graphs (digraphs) and the GO-FLOW methodology. Since the digraph approach can be used to simplify fault-tree construction, we will illustrate this technique through a simple example. For influence diagrams see Ref.[7] and for the GO-FLOW methodology see the Supplementary Material under Course Notes on the web.

Consider the pressure tank system shown below. The switch is normally closed and the motor drives a pump which feeds air into the tank. The air is discharged through the discharge valve at periodic intervals. A timer set to these intervals opens the contacts before an overpressure condition occurs and pumping stops. If the timer contacts fail closed, the operator observes from the pressure gauge that the tank pressure is high and manually opens the switch. There are 2 control loops:

Loop 1: Tank, pressure gauge, operator, switch.

Loop 2: Tank, relief valve.

[Figure: Pressure Tank System]

[Figure: Digraph for the Pressure Tank System]

The digraph is a tool to describe the cause-effect relationships between system components and variables. A digraph consists of nodes, which represent the system variables and components, and edges, which connect the nodes. The numbers represent the direction and the qualitative magnitude of the gains between the variables. The gains multiply at the nodes. For example, if tank pressure Ptank increases, the gauge pressure Pgauge increases (+1 into Pgauge), which alerts the Operator (+1), who then opens the switch (+1) and reduces the current Iswitch to the switch (-1 × +1 = -1). With the switch open, current Ipump to the pump motor decreases (-1 × +1 = -1) and tank pressure stops increasing (-1 × +1 = -1). Thus, if everything works as designed, an increase in Ptank leads to a decrease through the action of the feedback loop. The fault tree is constructed by considering the events that cause the loops to lead to the top event.

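[Figure: fault tree for the top event "Pressure Tank Rupture", with intermediate events "Zero Gain through Loop 1" and "Zero Gain through Loop 2" and basic events OF, SFC, GS, TFC and RVS]

TFC: Timer Contacts Fail Closed
GS: Gauge Stuck
RVS: Relief Valve Stuck
OF: Operator Fails
SFC: Switch Fails Closed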
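The gain bookkeeping for Loop 1 described in the digraph discussion can be sketched in a few lines. The edge list below follows the sequence of gains given in the text; the node names and the representation of a failed element as a zero-gain edge are assumptions made for illustration (the latter is one reading of the "Zero Gain through Loop 1" event in the fault tree).

```python
# Edges of Loop 1 as (source, target, gain), in the order described in the text.
LOOP_1 = [
    ("Ptank", "Pgauge", +1),
    ("Pgauge", "Operator", +1),
    ("Operator", "Switch", +1),
    ("Switch", "Iswitch", -1),
    ("Iswitch", "Ipump", +1),
    ("Ipump", "Ptank", +1),
]

def cumulative_gains(edges):
    """Multiply the gains along the path, recording the running product at each node."""
    gains = []
    gain = 1
    for _source, target, edge_gain in edges:
        gain *= edge_gain
        gains.append((target, gain))
    return gains

def loop_gain(edges):
    """Overall gain around the loop: negative means corrective feedback."""
    return cumulative_gains(edges)[-1][1]

if __name__ == "__main__":
    print(cumulative_gains(LOOP_1))
    print("loop gain, all working:", loop_gain(LOOP_1))
    # Hypothetical failure: a stuck gauge no longer responds to tank pressure.
    gauge_stuck = [("Ptank", "Pgauge", 0)] + LOOP_1[1:]
    print("loop gain, gauge stuck:", loop_gain(gauge_stuck))
```

With all elements working the loop gain is -1, so a pressure rise is counteracted; any single zero-gain edge makes the loop gain zero and the loop no longer protects the tank.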
