Copyright 2004 Koren & Krishna ECE655/Krishna Part.2 .1

C. M. Krishna
Fall 2006

UNIVERSITY OF MASSACHUSETTS
Dept. of Electrical & Computer Engineering

Fault Tolerant Computing
ECE 655

Part 2
Canonical Structures

Canonical Structures

♦Larger structures can be constructed out of the individual components

♦Complex structures can be constructed out of some basic structures

♦We will assume statistical independence between failures in the individual components

♦The basic structures are
∗A Series System
∗A Parallel System
∗An M out of N System

A Series System

♦A Series System - a set of components connected so that the failure of any one component causes the entire system to fail

Reliability of a Series System

♦Reliability of a series system, Rs(t), is the product of the reliabilities of its N modules

♦Ri(t) is the reliability of component i

♦$R_s(t) = \prod_{i=1}^{N} R_i(t)$

Series System - Constant Failure Rates

♦If every module i has a constant failure rate λi, then

♦$R_i(t) = e^{-\lambda_i t}$

♦$R_s(t) = e^{-\lambda_s t} = e^{-\sum_i \lambda_i t}$

♦λs = Σλi is the constant failure rate of the series system

♦Mean Time To Failure of a series system -

$MTTF_s = 1/\lambda_s = 1/\sum_i \lambda_i$
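The series-system formulas can be checked numerically. The following is a minimal Python sketch, assuming independent modules with constant failure rates; the function names and the sample rates are illustrative and do not come from the slides.

```python
import math

def series_reliability(rates, t):
    """R_s(t) = exp(-(sum of lambda_i) * t) for independent modules in series."""
    return math.exp(-sum(rates) * t)

def series_mttf(rates):
    """MTTF_s = 1 / (sum of lambda_i)."""
    return 1.0 / sum(rates)

# Illustrative failure rates per hour (assumed values, not from the slides)
rates = [1e-3, 2e-3, 5e-4]
print(series_reliability(rates, 100.0))
print(series_mttf(rates))
```

Multiplying the individual reliabilities exp(-λi t) gives the same value as `series_reliability`, which is the product rule stated on the previous slide.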

A Parallel System

♦A Parallel System - a set of modules connected so that all the modules must fail before the system fails

Reliability of a Parallel System

♦Rp(t) - reliability of a parallel system

♦$1 - R_p(t) = \prod_{i=1}^{N} (1 - R_i(t))$

♦$R_p(t) = 1 - \prod_{i=1}^{N} (1 - R_i(t))$

Parallel System - Constant Failure Rates

♦Module i has a constant failure rate, λi

♦$R_i(t) = e^{-\lambda_i t}$ ; $R_p(t) = 1 - \prod_{i=1}^{N} (1 - e^{-\lambda_i t})$

♦Example - a parallel system with two modules

$R_p(t) = e^{-\lambda_1 t} + e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2) t}$

♦MTTF of a parallel system in which all N modules have the same λ

$MTTF_p = \sum_{i=1}^{N} 1/(i\lambda)$
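As a quick check of the parallel-system formulas, here is a small Python sketch. It assumes independent modules with constant failure rates; the function names are hypothetical.

```python
import math

def parallel_reliability(rates, t):
    """R_p(t) = 1 - product over modules of (1 - exp(-lambda_i * t))."""
    all_failed = 1.0
    for lam in rates:
        all_failed *= 1.0 - math.exp(-lam * t)
    return 1.0 - all_failed

def parallel_mttf_identical(lam, n):
    """MTTF_p = sum_{i=1..n} 1/(i * lambda) for n modules with the same lambda."""
    return sum(1.0 / (i * lam) for i in range(1, n + 1))

# Two identical modules give 1.5 times the MTTF of a single module
print(parallel_mttf_identical(1.0, 2))
```

The harmonic sum grows slowly: each added module contributes less to the MTTF than the one before it.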

M out of N Systems

♦An M-of-N system is one which consists of N identical components, with failure occurring if fewer than M components are still functional

♦Best-known example - The Triplex (TMR)

♦Three identical components whose outputs are voted on. This is a 2-of-3 system: as long as a majority of the processors produce correct results, the system will be functional

Reliability of M out of N Systems

♦N identical components

♦R(t) - reliability of an individual component

♦The reliability of the system is the probability that N-M or fewer components have failed

♦$R_{M\_of\_N}(t) = \sum_{i=0}^{N-M} C(N,i)\,(1-R(t))^{i}\,R(t)^{N-i}$

♦where C(N,i) = N! / [i! (N-i)!]
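The M-of-N sum translates directly into code. This is a minimal sketch in Python, assuming independent, identical components; `m_of_n_reliability` is an illustrative name, and `r` stands for R(t) evaluated at the time of interest.

```python
from math import comb

def m_of_n_reliability(m, n, r):
    """Probability that at most n - m of the n components have failed."""
    return sum(comb(n, i) * (1 - r) ** i * r ** (n - i)
               for i in range(n - m + 1))

print(m_of_n_reliability(2, 3, 0.9))  # 2-of-3 (triplex) with r = 0.9
```

Setting M = N recovers the series system (r to the power N), and M = 1 recovers the parallel system of N identical modules.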

Correlated Failures in M of N Systems

♦Statistical independence of failures in components - key to the high reliability

♦Correlated failure can greatly diminish reliability

♦Example: Pcor - probability that the entire system suffers a global failure

♦$R_{M\_of\_N\_cor}(t) = (1-P_{cor}) \sum_{i=0}^{N-M} C(N,i)\,(1-R(t))^{i}\,R(t)^{N-i}$
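To see how strongly a small Pcor can affect the result, the sketch below compares the unreliability of a triplex with and without the global-failure factor. The numbers are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
from math import comb

def m_of_n_reliability(m, n, r):
    """Independent-failure M-of-N reliability."""
    return sum(comb(n, i) * (1 - r) ** i * r ** (n - i)
               for i in range(n - m + 1))

def m_of_n_reliability_correlated(m, n, r, p_cor):
    """Same system, scaled by the probability (1 - p_cor) of no global failure."""
    return (1 - p_cor) * m_of_n_reliability(m, n, r)

r, p_cor = 0.999, 1e-4  # assumed values for illustration
independent_unrel = 1 - m_of_n_reliability(2, 3, r)
correlated_unrel = 1 - m_of_n_reliability_correlated(2, 3, r, p_cor)
print(independent_unrel, correlated_unrel)
```

With these assumed values, the failure probability is governed almost entirely by Pcor rather than by independent component failures.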

M out of N Systems - Modes of Correlation

♦If system is not designed carefully, the correlated failure factor can dominate the overall failure probability

♦Different modes of correlation among components exist - not necessarily a global failure

♦Correlated failure rates are extremely difficult to estimate

♦From now on we will deal with statistically independent failures in components

TMR - Triple Modular Redundant Cluster

♦TMR - perhaps the most important M-of-N system

♦M=2, N=3 - system good if at least two components are operational

♦A voter picks the majority output

♦Voter can fail - reliability Rvot(t)

♦$R_{tmr}(t) = R_{vot}(t) \sum_{i=0}^{1} C(3,i)\,(1-R(t))^{i}\,R(t)^{3-i} = R_{vot}(t)\,(3R^{2}(t) - 2R^{3}(t))$

TMR - Constant Failure Rates

♦$R(t) = e^{-\lambda t}$

♦No voter failures - Rvot(t)=1

♦$R_{tmr}(t) = 3e^{-2\lambda t} - 2e^{-3\lambda t}$

♦$MTTF_{tmr} = \int_0^{\infty} R_{tmr}(t)\,dt = 5/(6\lambda) < 1/\lambda = MTTF_{simplex}$

NMR - N-Modular Redundant Cluster

♦M-of-N cluster with N odd and M = (N+1)/2

♦Voter failure rate negligible - Rvot(t)=1

♦Below R=0.5 - redundancy becomes a disadvantage

♦Usually R >> 0.5 - triplex offers significant reliability gains
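The crossover at R = 0.5 can be observed numerically. The sketch below assumes a perfect voter (Rvot = 1) and independent module failures; `nmr_reliability` is an illustrative name and the sample values of R are arbitrary.

```python
from math import comb

def nmr_reliability(n, r):
    """Majority-voted N-modular redundancy with a perfect voter; n must be odd."""
    m = (n + 1) // 2
    return sum(comb(n, i) * (1 - r) ** i * r ** (n - i)
               for i in range(n - m + 1))

for r in (0.3, 0.5, 0.9):  # illustrative module reliabilities
    print(r, nmr_reliability(3, r), nmr_reliability(5, r))
```

For R above 0.5, adding modules raises the cluster reliability above that of a single module; for R below 0.5, it lowers it, and more modules make it worse.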

TMR - Compensating Faults

♦Conservative assumption - every failure of the voter will lead to an erroneous output and any failure of two modules is fatal

♦Counter Example - one module produces a permanent logical 1 and a second module has a permanent logical 0 - the TMR will function properly regarding this bit

♦A similar situation may arise regarding certain faults within the voter circuit

♦Another example - non-overlapping faults - one module has a faulty adder and another module has a faulty multiplier

♦If the circuits are disjoint, they are unlikely to generate wrong outputs simultaneously

Voters

♦A voter receives inputs X1, X2,...,XN from an M-of-N cluster and generates a representative output

♦Simplest voter - bit-by-bit comparison of the outputs producing the majority vote

♦This only works when all functional processors generate outputs that match bit by bit
∗Processors must be identical and use the same software

♦Otherwise - two correct outputs can diverge slightly, in the lower significant bits

Plurality Voting

♦We declare two outputs X and Y as practically identical if |X-Y| < δ for some specified δ

♦A k-plurality voter looks for a set of at least k practically identical outputs, and picks any of them (or their median) as the representative

♦Example - δ = 0.1, five outputs

♦1.10, 1.11, 1.32, 1.49, 3.00

♦The subset {1.10, 1.11} would be selected by a 2-plurality voter
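One possible way to implement a k-plurality voter is sketched below in Python. Two choices here are assumptions rather than something the slides specify: a group counts only if all of its members are pairwise within δ of one another, and when several groups qualify the largest one is used. The function name is hypothetical.

```python
from statistics import median

def k_plurality_vote(outputs, k, delta):
    """Return (group, representative) for the largest group of outputs that are
    pairwise within delta, or None if no group has at least k members."""
    vals = sorted(outputs)
    best = []
    for i in range(len(vals)):
        j = i
        # In sorted order, a window is pairwise-close iff max - min < delta
        while j + 1 < len(vals) and vals[j + 1] - vals[i] < delta:
            j += 1
        if j + 1 - i > len(best):
            best = vals[i:j + 1]
    if len(best) < k:
        return None
    return best, median(best)

result = k_plurality_vote([1.10, 1.11, 1.32, 1.49, 3.00], k=2, delta=0.1)
print(result)
```

On the five outputs from the example, the selected group is {1.10, 1.11}; asking for k = 3 yields no group, so the voter would have to signal failure.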

Duplex Systems

♦Both processors execute the same task
∗If outputs are in agreement - result is assumed to be correct
∗If results are different - we cannot identify the failed processor
∗Higher-level software has to decide how the failure is to be handled

♦This can be done using one of several methods

First Method - Acceptance Tests

♦Acceptance Test - a range check of each processor's output

♦Example - the pressure in a boiler must be in some known range

♦We use semantic information of the task to predict which values of output indicate an error

♦How should the acceptance range be picked?

Acceptance Test - Sensitivity Vs. Specificity

♦Narrow acceptance range: high probability of identifying an incorrect output, but also a high probability that a correct output will be misidentified as erroneous (false positive)

♦Wide acceptance range: low probability of both

♦Sensitivity - the probability that the test will recognize an erroneous output as such

♦Specificity - the probability that the test will identify a correct output as such

♦Narrow range - high sensitivity but low specificity

♦Wide range - low sensitivity but high specificity
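The trade-off between the two measures can be illustrated with a range check applied to labelled samples. The readings below are made up for illustration (a hypothetical boiler-pressure data set), and `range_check_stats` is not a name from the slides.

```python
def range_check_stats(samples, low, high):
    """samples: iterable of (value, is_correct) pairs.
    Returns (sensitivity, specificity) of the test low <= value <= high."""
    rejected_bad = total_bad = accepted_good = total_good = 0
    for value, is_correct in samples:
        accepted = low <= value <= high
        if is_correct:
            total_good += 1
            accepted_good += accepted      # correct output passed as correct
        else:
            total_bad += 1
            rejected_bad += not accepted   # erroneous output flagged as such
    return rejected_bad / total_bad, accepted_good / total_good

# Hypothetical readings: (value, output was actually correct)
samples = [(95, True), (100, True), (105, True), (110, True), (118, True),
           (60, False), (108, False), (125, False), (300, False)]
narrow = range_check_stats(samples, 98, 112)
wide = range_check_stats(samples, 50, 200)
print(narrow, wide)
```

On this made-up data the narrow range catches more erroneous outputs but rejects some correct ones, while the wide range does the opposite.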

Second Method - Testing

♦Both processors are subjected to some test

♦The processor which fails the test is identified as faulty

♦Real-life tests are never perfect

♦Test Coverage - same as test sensitivity - the probability that the test can identify a faulty processor as such

♦Test Transparency - the complement of the test coverage - the probability that the test passes a faulty processor as good

Third Method - Forward Recovery

♦Use a third processor to repeat the computation carried out by the duplex

♦If only one of the three processors is faulty, then the one that disagrees is the faulty one

♦It is possible to use a combination of these methods

♦Acceptance test - quickest to run but often the least sensitive

Duplex Reliability

♦Two active identical processors with reliability R(t) each

♦Lifetime of duplex - the time until both processors fail

♦C - Coverage Factor - the probability that a faulty processor will be correctly diagnosed, identified and disconnected

♦Rduplex(t) - the reliability of the duplex system:

♦$R_{duplex}(t) = R_{comp}(t)\,[R^{2}(t) + 2C\,R(t)(1-R(t))]$

♦Rcomp(t) - reliability of comparator

Duplex - Constant Failure Rates

♦Each processor has a constant failure rate λ

♦Ideal comparator - Rcomp(t)=1

♦Duplex reliability -

♦$R_{duplex}(t) = e^{-2\lambda t} + 2C\,e^{-\lambda t}(1-e^{-\lambda t})$

♦$MTTF_{duplex} = 1/(2\lambda) + C/\lambda$
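A short Python sketch of the duplex formulas follows, assuming an ideal comparator and constant failure rates; the function names are illustrative. The two limiting cases of the coverage factor give a useful sanity check.

```python
import math

def duplex_reliability(lam, c, t):
    """R_duplex(t) = e^{-2 lam t} + 2 C e^{-lam t} (1 - e^{-lam t})."""
    e = math.exp(-lam * t)
    return e * e + 2.0 * c * e * (1.0 - e)

def duplex_mttf(lam, c):
    """MTTF_duplex = 1/(2 lam) + C/lam."""
    return 1.0 / (2.0 * lam) + c / lam

print(duplex_mttf(1.0, 0.0))  # no coverage: behaves like two modules in series
print(duplex_mttf(1.0, 1.0))  # perfect coverage: like two modules in parallel
```

With C = 0 the MTTF equals 1/(2λ), the series value, and with C = 1 it equals 3/(2λ), the value for a two-module parallel system.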

Duplex with Redundancy

♦Duplex with two active identical processors and an unlimited number of spares

♦When a processor fails, the failure is detected with probability Pd and a new processor replaces the one that failed

♦Probability that this process will result in failure of the entire duplex - 1-Ps

♦Induction process is instantaneous, spares are always functional

Duplex with Redundancy - Model

♦Each processor has a constant failure rate λ

♦Lifetime of a processor has an Exponential distribution with parameter λ

♦Time between two consecutive failures of the same logical processor is Exponentially distributed with parameter λ

♦M(t) - number of failures in one logical processor during the time interval [0,t]

♦N(t) - number of failures in the duplex system during the time interval [0,t]

Duplex with Redundancy - Distribution of M(t)

♦∆t - a small interval of time so that the probability of more than one failure occurring in ∆t is negligible

♦M(t+ ∆t )=n if either M(t)=n-1 and a fault occurred during ∆t, or M(t)=n and no fault occurred during ∆t

♦$\mathrm{Prob}[M(t+\Delta t)=n] = \mathrm{Prob}[M(t)=n-1]\,\lambda\Delta t + \mathrm{Prob}[M(t)=n]\,(1-\lambda\Delta t)$

♦This results in the differential equation

♦$d\,\mathrm{Prob}[M(t)=n]/dt = -\lambda\,\mathrm{Prob}[M(t)=n] + \lambda\,\mathrm{Prob}[M(t)=n-1]$

♦Prob[M(0)=n]=0 for n≥1 and Prob[M(0)=0]=1

♦The solution is

$\mathrm{Prob}[M(t)=n] = e^{-\lambda t}(\lambda t)^{n}/n!$ for n=0,1,2,...

♦M(t) has a Poisson distribution with the parameter λt

Duplex with Redundancy - Reliability Calculation

♦Duplex has two processors - failure rate of system is 2λ

♦Comparator failure rate - negligible

♦Probability of n failures in duplex in [0,t] -

$\mathrm{Prob}[N(t)=n] = e^{-2\lambda t}(2\lambda t)^{n}/n!$ for n=0,1,2,...

♦For the duplex not to fail, each of these failures must be detected and successfully replaced - probability C = Pd Ps; for n failures - $C^{n}$

Duplex - Reliability

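$R_{duplex}(t) = \sum_{n=0}^{\infty} \mathrm{Prob}[n \text{ failures}]\,C^{n}$

$= \sum_{n=0}^{\infty} e^{-2\lambda t}(2\lambda t)^{n}\,C^{n}/n!$

$= e^{-2\lambda t} \sum_{n=0}^{\infty} (2\lambda t C)^{n}/n!$

$= e^{-2\lambda t}\,e^{2\lambda t C}$

$R_{duplex}(t) = e^{-2\lambda(1-C)t}$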

Duplex Reliability - Alternative Derivation

♦Individual processors fail at rate λ

♦Rate of failures in the duplex is 2λ

♦Probability C of each failure to be successfully dealt with, and 1-C to cause duplex failure

♦Failures that crash the duplex occur with rate 2λ(1-C)

♦The reliability of the system is $e^{-2\lambda(1-C)t}$

More Complex Systems

♦NMR systems in which failing processors are identified and replaced from an infinite pool of spares - similar calculation

♦Finite set of spares - the summation in the reliability derivation is capped at that number of spares, rather than going to infinity

♦Other variations of duplex systems -
∗One processor is active while the second is a standby spare
∗Processors can be repaired when they become faulty

♦Combinatorial arguments may be insufficient for reliability calculation in more complex systems

♦If failure rates are constant, we can use Markov Models for reliability calculations
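The finite-spares case mentioned above can be sketched by truncating the Poisson sum from the duplex derivation. This follows the capping rule exactly as stated on the slide; the function names and the parameter values are assumptions for illustration.

```python
import math

def duplex_reliability_with_spares(lam, c, t, spares):
    """Sum_{n=0..spares} e^{-2 lam t} (2 lam t C)^n / n!  (sum capped at `spares`)."""
    x = 2.0 * lam * t * c
    term = math.exp(-2.0 * lam * t)  # n = 0 term
    total = 0.0
    for n in range(spares + 1):
        total += term
        term *= x / (n + 1)          # next term: multiply by x / (n + 1)
    return total

def duplex_reliability_unlimited(lam, c, t):
    """Closed form for an unlimited pool of spares: e^{-2 lam (1 - C) t}."""
    return math.exp(-2.0 * lam * (1.0 - c) * t)

lam, c, t = 0.01, 0.95, 100.0  # assumed values
for s in (0, 1, 2, 5, 60):
    print(s, duplex_reliability_with_spares(lam, c, t, s))
print(duplex_reliability_unlimited(lam, c, t))
```

As the number of spares grows, the capped sum approaches the closed-form value for an unlimited pool, and with these values most of the gain comes from the first few spares.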

Markov Chains - Introduction

♦Markov Models provide a structured approach for the derivation of the reliability of complex systems

♦A Markov Chain is a stochastic process X(t) - an infinite sequence of random variables indexed by time t, with a special probabilistic structure

♦For a stochastic process to be a Markov Chain, its future behavior must depend only on its present state, and not on any past state

♦X(t+s) depends on X(t), but given X(t), X(t+s) does not depend on any X(τ) for τ < t

♦If X(t)=i - the chain is in state i at time t

♦We deal only with Markov Chains with continuous time (0 ≤ t < ∞) and discrete state (X(t)=0,1,2,…)

Markov Chain - Probabilistic Interpretation

♦$P(X(t+s)=j \mid X(t)=i, X(\tau)=k) = P(X(t+s)=j \mid X(t)=i)$ for τ<t

♦Once the chain moves into state i, it stays there for a length of time which is Exponentially distributed with parameter λi - it has a constant rate λi of leaving state i

♦The probability that when leaving state i the chain will move to state j (with j≠i) - Pij

♦Transition rate from state i to state j - λij = Pij λi

♦$\sum_{j\neq i} P_{ij} = 1$ ; $\sum_{j\neq i} \lambda_{ij} = \lambda_i$

State Probabilities

♦Pi(t) - probability that the process is in state i at time t, given it started at state i0 at time 0

♦Differential equations for Pi(t), (i=0,1,2,...) -

♦For given time instant t, state i and a very small interval of time ∆t, the chain can be in state i at time t+∆t in one of the following cases:
∗It was in state i at time t and has not moved during the interval ∆t - probability Pi(t)(1-λi∆t)
∗It was at some state j at time t (j≠i) and moved from j to i during the interval ∆t - probability Pj(t) λji ∆t
∗No more than one transition if ∆t is small enough

♦$P_i(t+\Delta t) = P_i(t)(1-\lambda_i \Delta t) + \sum_{j\neq i} P_j(t)\,\lambda_{ji}\,\Delta t$

Differential Equations for Pi(t)

♦$dP_i(t)/dt = -\lambda_i P_i(t) + \sum_{j\neq i} \lambda_{ji} P_j(t)$

♦Since $\lambda_i = \sum_{j\neq i} \lambda_{ij}$

♦$dP_i(t)/dt = -\sum_{j\neq i} \lambda_{ij} P_i(t) + \sum_{j\neq i} \lambda_{ji} P_j(t)$

♦This (for i=0,1,2,...) can now be solved, using the initial conditions

♦Pi0(0)=1 and Pi(0)=0 for i ≠ i0

Duplex with a Standby

♦Example: One active processor and one standby spare, which is connected when the active unit fails

♦Constant failure rate λ of an active processor

♦C - coverage factor - probability that a failure of the active processor is correctly detected and the spare processor is successfully connected

♦The Markov chain - [figure: state-transition diagram over states 2, 1, 0]

Differential Equations for Duplex with Standby

♦dP2(t)/dt = -λ P2(t)

♦dP1(t)/dt = λC P2(t) - λ P1(t)

♦dP0(t)/dt = λ(1-C) P2(t) + λ P1(t)

♦Initial conditions:

♦P2(0)=1, P1(0)=P0(0)=0

$dP_i(t)/dt = -P_i(t) \sum_{j\neq i} \lambda_{ij} + \sum_{j\neq i} \lambda_{ji} P_j(t)$

Reliability of Duplex with Standby

♦Solution of differential equations:

♦P2(t) = exp(-λt)

♦P1(t) = Cλt exp(-λt)

♦P0(t) = 1-P2(t)-P1(t)

♦Rsystem(t) = 1-P0(t) = P2(t)+P1(t) = exp(-λt) + Cλt exp(-λt)

♦Exercise - derive this expression using combinatorial arguments
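Independently of the combinatorial exercise, the closed-form reliability can be cross-checked by integrating the three differential equations numerically. The sketch below uses a hand-written fourth-order Runge-Kutta step in plain Python; the integrator, the function names and the parameter values are all illustrative choices, not part of the slides.

```python
import math

def standby_rhs(p, lam, c):
    """Right-hand sides of dP2/dt, dP1/dt, dP0/dt for the standby model."""
    p2, p1, p0 = p
    return (-lam * p2,
            lam * c * p2 - lam * p1,
            lam * (1 - c) * p2 + lam * p1)

def rk4_solve(lam, c, t_end, steps=2000):
    """Integrate from P2(0)=1, P1(0)=P0(0)=0 to t_end with classical RK4."""
    h = t_end / steps
    p = (1.0, 0.0, 0.0)
    for _ in range(steps):
        k1 = standby_rhs(p, lam, c)
        k2 = standby_rhs(tuple(x + 0.5 * h * k for x, k in zip(p, k1)), lam, c)
        k3 = standby_rhs(tuple(x + 0.5 * h * k for x, k in zip(p, k2)), lam, c)
        k4 = standby_rhs(tuple(x + h * k for x, k in zip(p, k3)), lam, c)
        p = tuple(x + h / 6.0 * (a + 2 * b + 2 * d + e)
                  for x, a, b, d, e in zip(p, k1, k2, k3, k4))
    return p

def standby_reliability_closed_form(lam, c, t):
    return math.exp(-lam * t) + c * lam * t * math.exp(-lam * t)

lam, c, t = 1.0, 0.9, 2.0  # assumed values
p2, p1, p0 = rk4_solve(lam, c, t)
print(p2 + p1, standby_reliability_closed_form(lam, c, t))
```

The same integrator works for any small Markov model once its right-hand sides are written down, which is handy when no closed form is available.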

Duplex with Repair

♦Two active processors: each with failure rate λ and repair rate µ

♦The Markov model - [figure: state-transition diagram over states 2, 1, 0]

♦The differential equations -

♦dP2(t)/dt = -2λ P2(t) + µ P1(t)

♦dP1(t)/dt = 2λ P2(t) + 2µ P0(t) - (λ+µ) P1(t)

♦dP0(t)/dt = λ P1(t) - 2µ P0(t)

♦Initial conditions -

♦P2(0)=1, P1(0)=P0(0)=0

$dP_i(t)/dt = -P_i(t) \sum_{j\neq i} \lambda_{ij} + \sum_{j\neq i} \lambda_{ji} P_j(t)$

Duplex with Repair - State Probabilities

♦The solution to the differential equations -

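♦$P_2(t) = \frac{\mu^2}{(\lambda+\mu)^2} + \frac{2\lambda\mu}{(\lambda+\mu)^2}\,e^{-(\lambda+\mu)t} + \frac{\lambda^2}{(\lambda+\mu)^2}\,e^{-2(\lambda+\mu)t}$

♦$P_1(t) = \frac{2\lambda\mu}{(\lambda+\mu)^2} + \frac{2\lambda(\lambda-\mu)}{(\lambda+\mu)^2}\,e^{-(\lambda+\mu)t} - \frac{2\lambda^2}{(\lambda+\mu)^2}\,e^{-2(\lambda+\mu)t}$

♦P0(t) = 1-P2(t)-P1(t)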

Different Measures

♦In systems without repair, mainly the reliability measure is of significance

♦With repair all three - reliability, availability and point availability - are meaningful

♦Point Availability - Ap(t) = Prob{The system is operational at time t} = 1-P0(t)

♦Reliability - R(t) = Prob{The system is operational during [0,t]} - can be calculated by removing the transition from state 0 to state 1 and solving the resulting new differential equations - R(t)=1-P0(t)

♦Availability - A(t) - average proportion of time during the time interval [0,t] that the system is operational - most relevant in systems with repair

Steady State Availability

♦We calculate the steady state availability - A(∞) (or A) - the proportion of time in the long run that the system is operational

♦We first calculate the steady-state probabilities -P2(∞), P1(∞), and P0(∞) (or P2,P1,P0)

♦These steady-state probabilities can be calculated by either of two methods:
∗letting t approach ∞ in Pi(t)
∗setting dPi(t)/dt=0 (i=0,1,2) and solving the linear equations for Pi, using the relationship P2+P1+P0=1

♦ A=1-P0

Duplex with Repair - Steady State

♦Steady state probabilities -

♦$P_2 = \mu^2/(\lambda+\mu)^2$

♦$P_1 = 2\lambda\mu/(\lambda+\mu)^2$

♦$P_0 = \lambda^2/(\lambda+\mu)^2$

♦Steady state availability -

♦$A = A(\infty) = P_2 + P_1 = 1 - P_0 = (\mu^2 + 2\lambda\mu)/(\lambda+\mu)^2 = 1 - \lambda^2/(\lambda+\mu)^2$
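The steady-state values can be verified by substituting them into the right-hand sides of the duplex-with-repair differential equations, which should all vanish. The Python sketch below does this; the function names and the sample rates are assumptions for illustration.

```python
def duplex_repair_steady_state(lam, mu):
    """Steady-state (P2, P1, P0) for the duplex with repair."""
    d = (lam + mu) ** 2
    return mu * mu / d, 2.0 * lam * mu / d, lam * lam / d

def duplex_repair_rhs(p2, p1, p0, lam, mu):
    """Right-hand sides of dP2/dt, dP1/dt, dP0/dt."""
    return (-2.0 * lam * p2 + mu * p1,
            2.0 * lam * p2 + 2.0 * mu * p0 - (lam + mu) * p1,
            lam * p1 - 2.0 * mu * p0)

lam, mu = 0.001, 0.1  # assumed failure and repair rates per hour
p2, p1, p0 = duplex_repair_steady_state(lam, mu)
availability = 1.0 - p0
print(p2, p1, p0, availability)
print(duplex_repair_rhs(p2, p1, p0, lam, mu))  # all three should be ~0
```

With repair much faster than failure, as in these assumed rates, P0 is on the order of (λ/µ) squared, so the steady-state availability is very close to 1.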