Chapter 1
Introduction
Since the Enlightenment, technology has shaped our lives to an ever greater extent. Machines
no longer merely supplement human effort; in some sectors they have almost replaced it. This
requires us to develop newer, more complex and more sophisticated technical systems to cater to
the needs of contemporary postmodern, post-industrial society. Today we are surrounded by more
powerful systems than ever, and new ones are constantly being designed and developed.
Technology aims at making our lives smoother, though it also ends up making them more complex.
The impact of the failure or mismanagement of a power generation and distribution system in a
major city, the malfunction of an air traffic control system at an international airport,
miscommunication in today's internet systems or the breakdown of a nuclear power plant is simply
frightening. There is too much at stake in terms of cost, human life and national security to
take any risks with such devices, and we cannot afford any malfunction, even an accidental one.
Hence, if systems are to be reliable, components with a high level of reliability must be used;
this is a constant in science and technology. Our knowledge about the universe and all that it
contains is constantly being upgraded, and technology, being simply "a marriage of money and
scientific principle", keeps becoming old, obsolete, financially unviable and almost redundant.
As a consequence, reliability is essential at all stages of modern engineering processes,
including design, manufacturing, distribution and operation. Reliability is a vital factor in the
efficiency of any equipment.
The concept of reliability has long been known. It has grown out of the demands of
modern technology, and particularly out of the experience with complex military systems
during World War II. The automation and complexity of these military systems created
problems of maintenance and repair, and a great deal of work was reported in the field
of reliability, which led to reliability being treated as a technical parameter. Davis (1952)
discussed failure data and goodness of fit tests for various competing failure distributions.
Epstein and Sobel (1955) and Epstein (1958) worked in the field of life testing with the
assumption of exponential distribution. After these studies, the exponential failure time
distribution acquired a unique position in life testing and reliability analysis.
To handle general distributions, techniques such as imbedded Markov chains and
phase techniques were widely used. Another technique, the
'Inclusion of Supplementary Variable(s)', developed by Cox in 1955, was employed by Garg
(1963) for the first time in reliability evaluation problems. This idea was further developed
by Busacott (1971) and Chow (1973), who studied the reliability of some redundant
systems with repair.
Mine and Kaiwal (1979) enhanced the system reliability by assigning priority repair
disciplines. Kontoleon (1980) determined the reliability of an r-successive-out-of-n: F
system. Chung (1982) studied some stress-strength reliability models. Kumar (1982) presented a
recursive algorithm to evaluate the reliability of a consecutive-k-out-of-n: F system. Mandaltsis
and Ontolen (1987) studied the overall reliability determination of computer networks with
hierarchical routing strategies. Sharma (1991) gave the corrected bounds for reliability when
strength and stress distributions are known. Ksir and Boushaba (1993) gave the reliability bounds
and direct computation of the reliability of a consecutive k-out-of-n : F system with Markov
dependence. Rajamanickam and Chandeasekhar (1997) gave the reliability measures for two-unit
systems with a dependent structure for failure and repair times. Wang et al. (2002) modeled the
bathtub shape hazard rate function in terms of reliability. Yam et al. (2003) proposed a method
for evaluation of reliability indices for a repairable circular consecutive k-out-of-n: F system. Li
and Chen (2004) discussed the aging properties of the residual life length of k-out-of-n system
with independent but non-identical components. Peiravi (2008) gave the estimation of expected
lifetime and reliability during burn in field operation using Markov chain Monte Carlo
simulations. Taneja (2009) developed a reliability model for a system with conditional warranty
and various types of repair/replacement. Pathak and Joshi (2009, 2011) optimized the reliability
modeling of MEMS devices. Hu et al. (2010) developed the system reliability prediction model
based on evidential reasoning algorithm with non-linear optimization. Lakshminarayana and
Kumar (2013) optimized the reliability of integrated reliability model using dynamic
programming and failure modes effects and gave criticality analysis. Mi et al. (2013) studied the
reliability analysis of multi-state system with common cause failure based on Bayesian networks.
Besides reliability, various researchers have evaluated other measures for complex
systems consisting of one, two or more units, introducing various concepts. The concept
of availability has been widely discussed in the literature,
and the main contributors are Barlow and Hunter (1960), Gaver (1963), Sandler (1963), Myers et
al. (1964), Barlow and Proschan (1965), Rau (1970), Beran (1974) and Arndt and Franken
(1977). Srinivasan and Gopalan (1973) concentrated on regenerative point technique. Nakagawa
and Osaki (1975) considered stochastic behavior of a two-unit priority standby redundant system
with repair. Gopalan et al. (1975) discussed the availability and reliability of one-server
two-unit systems with imperfect switchover. Nakagawa (1976) considered the replacement
of the unit at a certain level of damage while Arora (1977). Dhillon (1980) discussed the
availability analysis of systems with two types of repair facilities. Nakagawa (1980) studied the
optimum inspection policy for a standby unit by taking a standby electric generator as an
example. Yamashen (1980) worked on a multistate system with several failure modes and a cold
standby unit. Murari and Maruthachalam (1981) studied the working of two-unit parallel systems with
periods of working and rest. Ramamurthy and Jaiswal (1982) studied the time to failure for a
two-dissimilar-unit cold standby system with allowed down time. Murari and Goyal (1983)
studied a two-unit cold standby system with two types of repair facility. Murari and
Maruthachalam (1984) considered a two unit system with two different interlinking in two
different periods. Goyal (1984) studied a two-unit cold standby system with two types of repair
facilities. Goel and Gupta (1984) discussed the stochastic behaviour of a two-unit standby
system with better utilization of units. Dharmadkikari and Gupta (1985) studied the stochastic
behaviour of 1-out-of-2:G warm standby repairable system. Goel et al. (1985) dealt with a two-
unit cold standby system with two types of operation and repair. Goel et al. (1986) obtained the
reliability analysis of a system with preventive maintenance and two types of repair. Murari and
Al-Ali (1986) analysed the reliability of a system subject to random shocks and preventive
maintenance. Gazit and Malek (1988) considered the fault tolerance capabilities in multistage
network based multi computer systems. Guo Tong De (1989) studied stochastic behavior of a
system with preparation for repair. Mahmoud (1989) worked on two-unit system with two types
of failure and preventive maintenance. Gopalan and Muralidharan (1991) analysed a system with
online preventive maintenance and repair. Kumar (1995) made a comparative study for standby
redundancy at system and component levels. Since some failures are caused by human
errors such as misinterpretation of instruments, wrong actions, maintenance errors, lack of good
knowledge of jobs or the environment, and poor training or skills of operating personnel, Yang
and Dhillon (1995) stochastically analysed a general standby system with constant human error.
Dekker (1996) presented a review and analysis of applications of maintenance models.
Mokkaddis et al. (1997) analysed a two-unit warm standby system subject to
degradation. Attahiru and Zhao (1998) studied the stochastic analysis of a repairable system with
three-units and repair facilities. Wang (2002) carried out a survey of maintenance policies of
deteriorating systems. Apeland and Scarf (2003) wrote about subjective approach to modeling
inspection maintenance. Barros et al. (2003) studied optimization of replacement times using
imperfect information. Attardi and Pulcini (2005) wrote about a new model for repairable
systems with bounded failure intensity. Glazebrook et al. (2005) proposed index policies for the
maintenance of a collection of machines by a set of repairmen. Chien and Sheu (2006) presented
an extended optimal age-replacement policy with minimal repair of a system subject to shocks.
Karamatsoukis and Kyriakidis (2009) wrote about optimal maintenance of a production-
inventory system with idle periods. Brick and Uchoa (2009) discussed a facility location and
installation of resources model for level of repair analysis. Kuzin (2010) studied the vibration
reliability and endurance of a centrifuge for separating suspensions.
Needless to say, cost/profit is also a very important aspect of the systems for
which reliability and maintenance analyses are carried out. This aspect
received attention in the 1980s, and since then a lot of work on analysing cost/profit for
such systems has appeared in the reliability literature. Goel and Gupta (1985) dealt with cost analysis of a
two-unit priority standby system with imperfect switch and arbitrary distributions. Murari et al.
(1985) worked out cost analysis of two-unit warm standby system with regular repairman and
patience time. Mokaddis et al. (1989) gave the profit analysis of two-unit priority system with
administrative delay in repair. Gopalan et al. (1991) carried out the cost analysis of a system
subject to on-line preventive maintenance and repair. Tuteja and Taneja (1991, 92, 93)
investigated reliability and profit analysis of two-unit standby system introducing the concepts of
two identical repairmen, minor repair, partial failure and random inspection. Goel et al. (1992)
gave the idea of random change of operative unit. Rander et al. (1991, 92) discussed a system
with major and minor failures and preparation time in case of major failure and a system with
imperfect assistant repairman and perfect master repairman. Gupta et al. (1993) dealt with the
profit analysis of a two-unit priority standby system subject to degradation and random shocks.
Singh and Mishra (1994) evaluated profit for a two-unit standby system with two operating
modes. Rander et al. (1994) investigated the cost analysis of two dissimilar cold standby systems
with preventive maintenance and replacement of standby. Pandey and Jacob (1995) gave the cost
analysis and MTTF of a three state standby complex system under common cause and human
failures. Gupta et al. (1997) dealt with the analysis of a system with three non-identical units
(Super-priority, priority and ordinary) with arbitrary distributions. Tuteja et al. (1999) discussed
a two server system with regular repairman who is not always available. Rizwan and Taneja
(2000) analysed the profit of a system with perfect repair at partial failure or complete failure.
Sehgal (2000) studied some reliability models with partial failure, accidents and various types of
repair. Singh et al. (2001) wrote on a two unit warm standby system with accident and various
types of repair. Siwach et al. (2001) studied two-unit cold standby system with instruction and
accident. Tuteja et al. (2001) carried out the reliability and profit analysis of a two unit cold
standby system with partial failure and two types of repairman. Tuteja et al. (2001) carried out
the cost benefit analysis of a system where operation and sometimes repair of main unit depends
on sub unit. Taneja et al. (2001) discussed a system with two types of repairman wherein the
expert repairman may not always be available. Sindhu and Gupta (2002) evaluated reliability and
profit of a two-unit cold standby system with regular and visiting repairman. Gupta and Taneja
(2003) evaluated the expected for a system with rest period, patience time and various types of
repair. Taneja and Nanda (2003) incorporated the idea of adopting one of the two repair policies-
repeat repair policy or resume repair policy by the expert repairman after the try made by the
ordinary repairman. Nanda et al. (2003) discussed the reliability properties of reverse residual
lifetime. Goyal and Gupta (2005) gave the reliability and economic analysis of a two-unit cold
standby system with three types of repair policy and replacement. Bai and Pham (2006)
discussed the cost analysis of renewable full-service warranties for multi-component
systems, while Chelbi and Rezg (2006) wrote on analysis of a production/inventory system with
randomly failing production unit subjected to a minimum required availability level.
These studies, while carrying out the analysis through graphs and other means, used
assumed values of failure, repair and other rates; that is, real data on these rates were
not taken into consideration. Taneja et al. (2004) collected the real data on failure and repair
rates of 232 programmable logic controllers (PLC) and studied a single unit PLC considering the
four types of failure. Taneja (2005) discussed reliability and profit analysis of a system which
consists of one main unit (used for manufacturing) and two PLCs (used for controlling). Initially,
one of the PLCs is operative and the other is hot standby. Bhupender and Taneja (2007) gave the
reliability and profit evaluation of a PLC hot standby system based on a master slave concept and
two types of repair facilities. Taneja et al. (2007) carried out the profit evaluation of a two-out-
of-three unit system for an ash handling plant wherein situation of system failure did not arise.
Zuhair and Rizwan (2007) studied the reliability analysis of a two unit system. Minocha (2007)
discussed the profit evaluation of some reliability models for technological systems. Goyal et al.
(2009) studied the reliability and profit evaluation of a 2-unit cold standby system working in a
sugar mill with operating and rest periods. Mathew et al. (2009) analysed the profit evaluation of
a single unit CC plant with scheduled maintenance. Rizwan et al. (2010) discussed the reliability
analysis of a hot standby industrial system. Goyal et al. (2010) made comparative study on the
basis of profits between two models for sulphated juice pump system working seasonally and
having different configurations. Mathew et al. (2011) discussed the reliability analysis of an
identical two-unit parallel CC plant system operative with full installed capacity. Kumar and
Bhatia (2011) studied the impact of ignored faults on reliability and availability of centrifuge
system that undergoes periodic rest. Kumar and Kapoor (2012) examined the cost-benefit
analysis of a base transceiver system considering hardware/software faults and congestion of
calls. Zhang et al. (2012) developed a reliability model and optimized maintenance of the diesel
system in locomotives.
We propose to study the reliability and economic analysis of some models on Gas turbine
power plants considering different situations depending on the variation in demand and power
production capacity of the system, gathering information on failure times, repair times, etc. from
Gas turbine power plants.
Let us now discuss some basic concepts related to our work:
Concept of Reliability
An introduction to reliability involves a rich blend of basic concepts and practical problems
from the real world. In the widest sense, the word 'reliability' can be viewed as 're' and
'liability', which simply means that it is the liability, not once but again and again, of the
designer, manufacturer, inspector, vendor and user, and of all those who are involved with the
system in any way, to make it reliable. The concept of reliability has been interpreted in many
different ways in numerous works, of which a few are listed below:
(i) Reliability is the integral of the distribution of probabilities of failure free operation from
the instant of switch on to the first failure.
(ii) Reliability is the probability that the device will operate without failure for a given time
under given operating conditions.
(iii) The reliability of a system is its capacity for failure-free operation for a definite period
of time under the given operating conditions, with minimum time lost for repair and
preventive maintenance.
(iv) The reliability of equipment is arbitrarily assumed to be the equipment capacity to
maintain given properties under specified conditions and for a given period of time.
Many definitions of reliability have been given by various engineers and mathematicians, but
the one most widely accepted by contemporary reliability authorities is that given by the
Electronics Industries Association (EIA), U.S.A., which states:
"Reliability is the probability of a device performing its purpose adequately for
the period of time intended under the operating conditions encountered."
This definition breaks down into four basic parts, which are discussed below:
a) Probability
It provides the numerical input for the assessment of reliability and also the first
index of system adequacy. Thus, a statement that the probability of an item functioning
for 60 hours is 0.9 indicates that, on average, in 90 cases out of 100 the item would be
expected to function for the full period of 60 hours.
b) Adequate Performance
An assessment of adequate performance is a matter of engineering appraisal and
appreciation. It requires a detailed investigation of the modes of failure of each
component and of the system. It is impossible to specify a single adequate reliability
level, as this will obviously vary with the system and the associated consequences of failure.
c) Time
Time is the most important factor in the assessment of reliability, since it
represents a measure of the period during which one can expect a certain level of
performance from an item. For example, a mission time may be 2 minutes, and the
reliability is then required to be computed for that period.
d) Operating Conditions
The reliability of an item must always be specified in relation to its operating
conditions, because if these vary, so will the numerical value which is used to express
reliability. For example, if a brake reliability test for an automobile is conducted over
roads which require infrequent stops, the brake usage will be infrequent. Experience has
shown that environmental conditions such as temperature, humidity, pressure, shock,
vibration, voltage, acceleration, acoustics, torque, corrosive atmosphere, gravity etc. have
definite effects on the performance of the item. If one of these conditions changes beyond
given limits, the item may fail.
In order to develop reliability as a model that helps designers and to predict reliability at
the design stage, the main methods of improving reliability are:
(a) Reduce the complexity of the equipment to the minimum essential for the required operation.
(b) Increase the reliability of the components in the system.
(c) Introduce redundant (parallel or standby) units/components.
(d) Use a service facility.
Quantitatively, the reliability of a device at time t is the probability that it will not fail
in a given environment before time t. If T is a random variable representing the time till
failure of the device, starting from an initial operable condition at t = 0, then the
reliability R(t) of the device is given by
$$R(t) = P[T > t] = 1 - P[T \le t] = 1 - F(t).$$
In terms of the probability density function (pdf) of T, namely f(t), we get
$$R(t) = \int_t^{\infty} f(x)\,dx.$$
The reliability R(t) or probability of survival has the properties,
(i) R(0) = 1 since the device is assumed to be operable at t = 0.
(ii) R(∞) = 0 since no device can work forever without failure.
(iii) R(t) is a non-increasing function of t taking values between 0 and 1.
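The two expressions for R(t) and its listed properties can be checked numerically. The sketch below is only an illustration: it assumes an exponential lifetime with a hypothetical failure rate, and the function names are ours, not part of any library.

```python
import math


def reliability_exponential(t, failure_rate):
    """R(t) = 1 - F(t) for an exponential lifetime (an assumed distribution)."""
    cdf = 1.0 - math.exp(-failure_rate * t)  # F(t) = P[T <= t]
    return 1.0 - cdf


def reliability_by_integration(t, failure_rate, steps=100_000):
    """Approximate R(t) = integral of f(x) over (t, infinity) by the midpoint rule.

    The infinite upper limit is truncated 50 mean lifetimes beyond t,
    where the remaining tail of the density is negligible.
    """
    upper = t + 50.0 / failure_rate
    width = (upper - t) / steps
    total = 0.0
    for k in range(steps):
        x = t + (k + 0.5) * width
        total += failure_rate * math.exp(-failure_rate * x) * width
    return total


if __name__ == "__main__":
    rate = 0.001  # hypothetical failure rate per hour
    print(reliability_exponential(60.0, rate))
    print(reliability_by_integration(60.0, rate))
```

Both functions should agree closely for any t, and the first returns 1 at t = 0 and decreases as t grows, in line with properties (i) to (iii).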
In describing the reliability of a given system, it is necessary to specify
(i) the failure process,
(ii) the system configuration that describes how the system is connected and the rules of
operation and
(iii) the state in which the system is defined to be failed.
To enhance the reliability of components/systems, one needs to assess their reliability
and other related measures. Furthermore, the system concept extends to service systems and
supply chain systems, for which reliability and accuracy are important goals.
Reliability can also be improved by redundancy, which will be discussed later in this
chapter.
Instantaneous Hazard Rate (or Failure Rate)
It is defined as the conditional probability that the system fails during the time interval
(t, t + Δt] given that it was operating during (0, t].
Let r(t) Δt = probability that the device has a lifetime between t and t + Δt, given that it
has functioned up to time t. Then
$$r(t)\,\Delta t = P[t < T \le t + \Delta t \mid T > t] = \frac{P[t < T \le t + \Delta t]}{P[T > t]} = \frac{P[T \le t + \Delta t] - P[T \le t]}{P[T > t]}$$
$$= \frac{[1 - R(t + \Delta t)] - [1 - R(t)]}{R(t)} = \frac{R(t) - R(t + \Delta t)}{R(t)}.$$
Now, the instantaneous failure rate or hazard rate r(t) at time t is defined as
$$r(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t\, R(t)} = -\frac{R'(t)}{R(t)} = \frac{f(t)}{R(t)},$$
where f(t) is the p.d.f. of the device lifetime.
It can be seen that
$$F(t) = \int_0^t f(u)\,du, \qquad R(t) = \exp\left[-\int_0^t r(u)\,du\right],$$
$$f(t) = r(t)\exp\left[-\int_0^t r(u)\,du\right].$$
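The limit that defines r(t) can be illustrated numerically. The sketch below assumes, purely for illustration, a Weibull lifetime with hypothetical shape and scale parameters, and compares the difference quotient [R(t) - R(t + Δt)] / (Δt R(t)) with the closed form f(t)/R(t). The function names are ours.

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t/scale)^shape); the Weibull form is an assumed example."""
    return math.exp(-((t / scale) ** shape))


def weibull_pdf(t, shape, scale):
    """f(t) = -R'(t) for the Weibull lifetime above."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_reliability(t, shape, scale)


def hazard_exact(t, shape, scale):
    """r(t) = f(t) / R(t)."""
    return weibull_pdf(t, shape, scale) / weibull_reliability(t, shape, scale)


def hazard_by_difference(t, shape, scale, dt=1e-6):
    """[R(t) - R(t + dt)] / (dt * R(t)), the quotient whose limit defines r(t)."""
    r_now = weibull_reliability(t, shape, scale)
    r_later = weibull_reliability(t + dt, shape, scale)
    return (r_now - r_later) / (dt * r_now)


if __name__ == "__main__":
    # Hypothetical parameters: shape 2, scale 100 hours, evaluated at t = 50 hours.
    print(hazard_exact(50.0, 2.0, 100.0))
    print(hazard_by_difference(50.0, 2.0, 100.0))
```

For a small Δt the two values should agree to several decimal places, which is what the limit in the definition asserts.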
Reliability Modeling
For theoretical study of reliability and for conducting the effective and efficient reliability
analysis of applied problems, we need mathematical models. The correct choice of a
mathematical model which represents the essential features of an applied problem is of vital
importance for applying the theory of reliability successfully. Mathematical models may be
either deterministic or probabilistic (stochastic). If the effect of any change in the
system can be predicted with certainty, the model is said to be deterministic. In practice,
there is uncertainty in any prediction. This uncertainty can be accommodated by introducing a
random variable having some probability distribution in place of an ordinary mathematical
variable. Such a model is known as a stochastic model.
To compute the reliability of an item, it is necessary to define what constitutes failure.
For an item, it is important to list the properties that it must possess in the course of its
usage. A deviation of the properties from the prescribed condition is considered a fault, and
a state of fault is known as 'failure'. An item is considered to have failed under one of the
following conditions:
(i) When it becomes completely inoperable due to any reason.
(ii) When it is still operable, but is no longer able to perform as required, for example, a
12-volt battery providing 3 volts instead of 12.
(iii) When a sudden serious deterioration makes the item unsafe for its further use.
(iv) When the item is operative but provides the wrong result (unwanted operations).
Most systems pass through three phases with different failure rates during their
lifetime, described as follows:
(i) Initial Failure
In the beginning, a high failure rate may be experienced owing to defective design or
manufacturing of a unit/system, which is the significant cause of failure in this phase. These
failures may be eliminated by operating the item for several hours and replacing the failed
components with tested, good components. The concept of warranty of an item is based on initial
failure.
(ii) Random Failure (Chance Failure)
Once the defects found in the first phase are corrected, the failure rate drops to a
steady-state level for some period of time. During this second phase of the life cycle, we
experience a constant failure rate, and any failure is due to chance. This is called the
'useful life period' of the item. The effect of this type of failure can be minimized by
duplicating the components (also referred to as redundancy).
(iii) Wear-Out Failure
At the final phase of operation, the failure rate rises again as the system suffers from the
cumulative effect of dust, vibration, abuse, temperature extremes and many other environmental
maladies; that is, the system begins to wear out. The effect of 'wear and tear' can be reduced
by proper maintenance of the item.
All three phases of failure are shown in Fig. 1.1. The curve shown in the
figure is known as the "bathtub curve", and each of its phases can be represented by a Weibull
distribution with a suitable shape parameter. From the actuarial point of view, the failure
phenomenon in an item is closely analogous to the mortality or death phenomenon in a human
being, as shown below:
Phase   Cause for System Failure                                         Cause for Human Death
1       Original defect (defective design, manufacturing and assembly)   Birth defect
2       Random (chance)                                                  Accident
3       Wear and tear (wear-out)                                         Age factor
[Figure: frequency of failure plotted against failure time, showing the early failure period, the random failure period and the wear-out failure period]
Fig. 1.1
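The link between the three phases and the Weibull distribution can be sketched as follows. The Weibull hazard rate is (β/η)(t/η)^(β-1), which decreases for shape β < 1, is constant for β = 1 and increases for β > 1. The parameter values and function names below are hypothetical and chosen only for illustration.

```python
def weibull_hazard(t, shape, scale):
    """Hazard rate r(t) = (shape/scale) * (t/scale)^(shape-1) of a Weibull lifetime."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def hazard_trend(shape, scale, t_early=10.0, t_late=20.0):
    """Classify the hazard as decreasing, constant or increasing between two instants."""
    early = weibull_hazard(t_early, shape, scale)
    late = weibull_hazard(t_late, shape, scale)
    if late < early:
        return "decreasing"
    if late > early:
        return "increasing"
    return "constant"


if __name__ == "__main__":
    # Hypothetical shape parameters, one per phase of the bathtub curve.
    for label, shape in [("initial failure", 0.5), ("random failure", 1.0), ("wear-out failure", 3.0)]:
        print(label, hazard_trend(shape, scale=100.0))
```

In this sketch the initial failure phase corresponds to a shape parameter below 1, the useful life period to a shape parameter of 1 (the exponential case) and the wear-out phase to a shape parameter above 1.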
System Configurations
By a system, we mean an arbitrary device made up of parts and components assembled to
perform a certain function. The reliabilities of the components are assumed to be known, which
helps predict the reliability of the whole system. The reliabilities of all the subsystems
together determine what is known as system reliability, so it is important that the system
structure be known. Various system structures have been considered, as follows:
a) Series Configuration
A system having n units is said to have a series configuration if the failure of an arbitrary
unit (say the ith unit) causes the entire system to fail. Examples of series configurations are:
i) An aircraft electronic system consisting mainly of a sensor subsystem, a guidance
subsystem, a computer subsystem and a fire control subsystem. This system can
operate successfully only if all of these operate simultaneously.
ii) Traditional Deepawali or Christmas glow bulbs, where if one bulb fails the whole string
fails.
The block diagram of a series system configuration is shown in Fig. 1.2.
Fig. 1.2. Series Configuration
Let $R_i(t)$ be the reliability of the ith component. If the units fail independently, the
system reliability is given by
$$R(t) = \Pr[T > t] = \Pr[\min(T_1, T_2, T_3, \ldots, T_n) > t] = \prod_{i=1}^{n} P[T_i > t] = \prod_{i=1}^{n} R_i(t),$$
where $T_i$ is the lifetime of the ith unit of the system.
The system hazard rate, therefore, is
$$r(t) = \sum_{i=1}^{n} r_i(t),$$
where $r_i(t)$ is the instantaneous failure rate of the ith unit.
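As a small numerical illustration of the series formulas, the sketch below multiplies unit reliabilities and adds unit hazard rates. It assumes independent units with constant (exponential) failure rates; the rates and function names are hypothetical.

```python
import math


def series_reliability(unit_reliabilities):
    """R(t) of a series system: the product of the unit reliabilities R_i(t)."""
    return math.prod(unit_reliabilities)


def series_hazard(unit_hazards):
    """r(t) of a series system: the sum of the unit hazard rates r_i(t)."""
    return sum(unit_hazards)


if __name__ == "__main__":
    rates = [0.001, 0.002, 0.003]  # hypothetical constant failure rates per hour
    t = 100.0
    units = [math.exp(-rate * t) for rate in rates]
    print(series_reliability(units))             # product of the R_i(t)
    print(math.exp(-series_hazard(rates) * t))   # same value via the summed hazard
```

With constant rates the two printed values coincide, since the product of exp(-λ_i t) equals exp(-t Σ λ_i).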
[Block diagram: Unit 1, Unit 2, Unit 3, ..., Unit n connected in series]
b) Parallel Configuration
In this configuration, all the units in a system are connected in parallel, i.e. the system
fails only when all of its units fail. An example is a four-engined aircraft
which is still able to fly with only two engines working. A block diagram representing a
parallel configuration is shown in Fig. 1.3.
Fig.1.3. Parallel Configuration
Let $R_i(t)$ and $T_i$ be the reliability at time t and the lifetime of the ith unit,
respectively. Then the system reliability is given by
$$R(t) = \Pr(T > t) = \Pr[\max(T_1, T_2, T_3, \ldots, T_n) > t]$$
$$= 1 - P[\max(T_1, T_2, T_3, \ldots, T_n) \le t]$$
$$= 1 - P[T_1 \le t, T_2 \le t, T_3 \le t, \ldots, T_n \le t].$$
If the units function independently, then
$$R(t) = 1 - [1 - R_1(t)][1 - R_2(t)][1 - R_3(t)] \cdots [1 - R_n(t)] = 1 - \prod_{i=1}^{n} [1 - R_i(t)].$$
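The parallel formula can be sketched in a few lines. The example assumes independent units; the reliabilities used are hypothetical and the function name is ours.

```python
import math


def parallel_reliability(unit_reliabilities):
    """R(t) of a parallel system: 1 minus the product of the unit unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in unit_reliabilities)


if __name__ == "__main__":
    # Two hypothetical units, each with reliability 0.9 at the time of interest.
    print(parallel_reliability([0.9, 0.9]))
```

Adding a unit in parallel can only raise the system reliability, because each extra factor [1 - R_i(t)] lies between 0 and 1.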
c) Standby Redundant Configuration
Redundancy is a means of improving the reliability of a system. In a redundant system, more
units are made available than are necessary. There are two types of redundancy:
(a) Active Redundancy
(b) Passive Redundancy
(a) Active Redundancy
In this type of redundancy, the standby unit has a positive probability of failure even when
it is not in operation. This may happen due to the effect of temperature, environmental
conditions etc. Active redundancy can further be classified as hot redundancy and warm
redundancy:
(i) If the off-line unit can fail and is loaded in exactly the same way as the operating unit,
it is called a hot standby unit.
(ii) If the off-line unit can fail but carries a diminished load, it is called a warm standby
unit. The probability of failure of a warm standby unit is less than that of an operative unit.
(b) Passive or Cold Standby Redundancy
This is the form of redundancy in which the off-line unit cannot fail and is completely
unloaded.
The reliability R(t) of an n-unit standby system at any time instant t is given by
$$R(t) = P\left[\sum_{i=1}^{n} T_i > t\right],$$
where $T_i$ is the lifetime of the ith unit and all the n units are independent.
Fig.1.4. Standby redundant configuration
[Block diagram: Unit 1, Unit 2, ..., Unit n arranged as standby units]
A standby system functions as long as one of the units is available for the task on hand.
A block diagram of such a system is shown as in Fig. 1.4.
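The standby formula has a closed form in one simple special case, which we use here only as an illustration and not as the model studied later. Assuming cold standby, perfect switching and n identical units with exponential lifetimes of rate λ, the sum of the lifetimes follows an Erlang distribution, so R(t) = e^(-λt) Σ_{j=0}^{n-1} (λt)^j / j!. The sketch compares this with a direct simulation of P[T_1 + ... + T_n > t]; all names and parameter values are hypothetical.

```python
import math
import random


def cold_standby_reliability(t, n_units, failure_rate):
    """P[T_1 + ... + T_n > t] for identical exponential units (Erlang survival function)."""
    x = failure_rate * t
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(n_units))


def simulated_standby_reliability(t, n_units, failure_rate, samples=50_000, seed=1):
    """Monte Carlo estimate of the same probability, used as a cross-check."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(samples):
        total_life = sum(rng.expovariate(failure_rate) for _ in range(n_units))
        if total_life > t:
            survived += 1
    return survived / samples


if __name__ == "__main__":
    # Hypothetical case: 2 units, failure rate 0.01 per hour, mission time 100 hours.
    print(cold_standby_reliability(100.0, 2, 0.01))
    print(simulated_standby_reliability(100.0, 2, 0.01))
```

The simulated value should lie close to the closed form, with the gap shrinking as the number of samples grows.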
d) k-out-of-n Configuration
In many problems the system operates if at least k out of n units function, e.g. a bridge
supported by n cables, k of which are necessary to support the maximum load. If the n units
are identical and independent, each with reliability $e^{-\lambda t}$ (an exponential lifetime
with failure rate λ), then the system reliability becomes
$$R(t) = \sum_{i=k}^{n} {}^{n}C_{i}\, e^{-i\lambda t}\,(1 - e^{-\lambda t})^{n-i}.$$
There exist many other configurations, such as series-parallel, parallel-series and mixed
parallel, which are used in industry.
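The k-out-of-n sum is a binomial tail and is easy to evaluate directly. The sketch below takes the common unit reliability p as input (for exponential units one would pass p = exp(-λt)); the numerical values and the function name are hypothetical.

```python
from math import comb, exp


def k_out_of_n_reliability(k, n, unit_reliability):
    """Probability that at least k of n independent, identical units are working."""
    p = unit_reliability
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical bridge with 3 cables, 2 of which must hold; failure rate 0.001 per hour.
    p = exp(-0.001 * 100.0)
    print(k_out_of_n_reliability(2, 3, p))
```

Setting k = n recovers the series system and k = 1 recovers the parallel system, which gives a quick consistency check on the formula.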
Stochastic Processes
A stochastic process is a family of random variables indexed by a parameter set and taking
values in another set known as the state space. Both the parameter set and the state space can
be either discrete or continuous.
In a stochastic process {X(t), t ∈ T}, X(t) is the state of the process, t is the
parameter (generally taken to be time) and T is the index set. If T is a countable set such as
T = {0, 1, 2, 3, …}, the stochastic process is said to be a discrete parameter process, and if
T = {t : −∞ < t < ∞} or T = {t : t ≥ 0}, it is said to be a continuous parameter process.
The state space is classified as discrete or continuous according to whether it is countable or
consists of an interval on the real line. In the present study, we deal with discrete state
space, continuous parameter stochastic processes.
Markov Process
A stochastic process is known as a Markov process if its future development is completely
determined by the present state and is independent of the way in which the present state has
been reached. If {X(t), t ∈ T} is a stochastic process such that, given the value of X(s), the
values of X(t), t > s, do not depend on the values of X(u), u < s, i.e. for t > s and every
state i,
$$\Pr[X(t) = i \mid X(u),\ 0 \le u \le s] = \Pr[X(t) = i \mid X(s)],$$
then the process {X(t), t ∈ T} is a Markov process.
Stochastic processes which do not possess the Markovian property are said to be
non-Markovian.
Markov Chain
A Markov process with discrete state space is said to be a Markov chain.
Mathematically, a stochastic process $\{X_n;\ n = 0, 1, 2, \ldots\}$ is called a Markov chain
if, for $j, k, j_1, j_2, \ldots, j_{n-1} \in N$,
$$\Pr[X_n = k \mid X_{n-1} = j, X_{n-2} = j_1, \ldots, X_0 = j_{n-1}] = \Pr[X_n = k \mid X_{n-1} = j] = p_{jk}.$$
If the transition probabilities $p_{jk}$ are independent of n, the Markov chain is said to be
homogeneous, and if they depend on n, the chain is said to be non-homogeneous.
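For a homogeneous chain, the state distribution after n steps is obtained by applying the same transition probabilities p_jk repeatedly. The sketch below uses a hypothetical two-state chain (operating and failed) with made-up probabilities; the function names are ours.

```python
def step(distribution, transition_matrix):
    """One step of a homogeneous Markov chain: new_k = sum over j of old_j * p_jk."""
    n = len(transition_matrix)
    return [sum(distribution[j] * transition_matrix[j][k] for j in range(n)) for k in range(n)]


def distribution_after(steps, initial, transition_matrix):
    """State distribution after the given number of steps."""
    current = list(initial)
    for _ in range(steps):
        current = step(current, transition_matrix)
    return current


# Hypothetical two-state chain: state 0 = operating, state 1 = failed.
P = [[0.9, 0.1],
     [0.6, 0.4]]

if __name__ == "__main__":
    print(distribution_after(1, [1.0, 0.0], P))
    print(distribution_after(50, [1.0, 0.0], P))
```

After many steps the distribution of this particular chain settles near (6/7, 1/7), the solution of π = πP, regardless of the starting state.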
Renewal Process
Suppose we have a repairable system which starts operation at t = 0. If $X_1$ denotes the
time to first failure and $Y_1$ denotes the time from first failure to next system operation
(after repair), then $t_1 = X_1 + Y_1$ denotes the time of first renewal. Similarly, if $X_2$
denotes the time from first renewal to second failure and $Y_2$ denotes the time from second
failure to second renewal, then $t_2 = X_2 + Y_2$ and the time of second renewal is
$t_1 + t_2$. In general, $t_i = X_i + Y_i$ (the inter-arrival time) is the time between the
(i−1)th and ith renewals, for i = 1, 2, 3, …. If we define
$$S_0 = 0, \qquad S_n = t_1 + t_2 + \cdots + t_n = \text{epoch of the nth renewal},$$
and N(t) = number of renewals during (0, t],
then the process {N(t), t > 0} is called a renewal process.
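The counting process N(t) can be simulated directly from its definition. The sketch below assumes, for illustration only, exponential failure and repair times with hypothetical means, and counts the renewal epochs S_n that fall in (0, t]. By the elementary renewal theorem, N(t)/t approaches 1/(E[X] + E[Y]) for large t.

```python
import random


def count_renewals(horizon, mean_up_time, mean_repair_time, rng):
    """N(horizon): number of completed failure-plus-repair cycles in (0, horizon]."""
    epoch = 0.0    # S_n, the epoch of the latest renewal
    renewals = 0   # N(t)
    while True:
        up_time = rng.expovariate(1.0 / mean_up_time)          # X_i
        repair_time = rng.expovariate(1.0 / mean_repair_time)  # Y_i
        epoch += up_time + repair_time                          # S_n = S_(n-1) + t_i
        if epoch > horizon:
            return renewals
        renewals += 1


if __name__ == "__main__":
    rng = random.Random(7)
    horizon = 1_000_000.0
    n = count_renewals(horizon, mean_up_time=100.0, mean_repair_time=10.0, rng=rng)
    print(n / horizon, 1.0 / 110.0)  # observed renewal rate against its long-run limit
```

Any other lifetime or repair distribution could be substituted for the exponential draws without changing the counting logic.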
Markov Renewal Process
Let the states of a process be denoted by the set E = {0, 1, 2, …}, and let the transitions
of the process occur at epochs $t_0\,(= 0), t_1, t_2, \ldots, t_n$ $(t_n < t_{n+1})$. If
$$\Pr\{X_{n+1} = k,\ t_{n+1} - t_n \le t \mid X_0 = i_0, \ldots, X_n = i_n;\ t_0, t_1, \ldots, t_n\} = \Pr\{X_{n+1} = k,\ t_{n+1} - t_n \le t \mid X_n = i_n\},$$
then $\{X_n, t_n\}$, n = 0, 1, 2, …, constitutes a Markov renewal process with state space E.
Semi-Markov Process
In the above, if we assume that the process is time homogeneous, i.e.
\[ \Pr\{X_{n+1} = j,\ t_{n+1} - t_n \le t \mid X_n = i\} = Q_{ij}(t), \quad i, j \in E, \]
is independent of n, then there exist limiting transition probabilities
\[ p_{ij} = \lim_{t \to \infty} Q_{ij}(t) = \Pr\{X_{n+1} = j \mid X_n = i\}. \]
Then {Xn, n = 0, 1, 2,…} constitutes a Markov chain with state space E and transition probability
matrix (t.p.m) is given by
P = [pij].
The continuous parameter stochastic process Y(t) with state space E defined by
Y(t) = Xn, tn < t < tn+1
is called a semi-Markov process.
In other words, a semi-Markov process is a process in which the transition from
one state to another is governed by the transition probabilities of a Markov process, but the time
spent in each state before a transition occurs is a random variable depending upon the last
transition made. Thus, at transition instants, the semi-Markov process behaves just like a Markov
process. However, the times at which transitions occur are governed by a different probability
mechanism.
Regenerative Process
Regenerative stochastic process was defined by Smith (1955) and has been crucial in the
analysis of complex systems. In this, we take a time point at which the system history prior to
the time point is irrelevant to the system conditions. These points are called regeneration points.
Let X(t) be the state of the system at epoch t. If t1, t2, … are the epochs at which the process
probabilistically restarts, then these epochs are called regenerative epochs and the process {X(t),
t = t1, t2…} is called regenerative process.
Supplementary Variable Technique
It was developed by Cox (1955); the process is made Markovian by introducing
some supplementary variables. This technique can briefly be explained as follows:
Consider a complex system in which repair times follow a general distribution. At a
particular instant t, the system can be either in the operational state or in the failed state. If the
system is in the failed state at time t, the probability of transition to the operable state cannot be
determined unless the elapsed repair time at time t is specified. A supplementary variable,
say x, representing the elapsed repair time of the failed unit is therefore introduced, and the state
probability is defined as the probability that at time t the system is in the failed state and the
elapsed repair time lies in the interval (x, x + ∆). Thus the process becomes Markovian in nature.
It is to be noted that such a supplementary variable automatically disappears at the solution stage.
Transforms and Convolutions
(a) Laplace Transform
Let f(t) be a function of a positive real variable t. Then the Laplace transform (L.T.) of
f(t) is defined as
\[ L[f(t)] = f^*(s) = \int_0^\infty e^{-st} f(t)\,dt \]
for the range of values of s for which the integral exists. Here, f(t) is called the inverse
Laplace transform of $f^*(s)$ and we write $f(t) = L^{-1}\{f^*(s)\}$. The following are some important
properties of the Laplace transform:
(i) $L\left[\sum_{i=1}^{n} c_i f_i(t)\right] = \sum_{i=1}^{n} c_i f_i^*(s)$
(ii) $L[t^n f(t)] = (-1)^n \dfrac{d^n f^*(s)}{ds^n}$
(iii) $L\left[\int_0^t f(u)\,du\right] = L[F(t)] = \dfrac{f^*(s)}{s}$
(iv) $\lim_{t \to 0} f(t) = \lim_{s \to \infty} s f^*(s)$ (initial value theorem)
(v) $\lim_{t \to \infty} f(t) = \lim_{s \to 0} s f^*(s)$ (final value theorem)
(vi) $\lim_{s \to 0} f^*(s) = 1$ if $f^*(s)$ is the L.T. of a p.d.f.
(b) Laplace Stieltjes Transform
Let X be a non-negative random variable with distribution function
\[ F(x) = \Pr[X \le x]; \]
then the Laplace Stieltjes transform (L.S.T.) of F(x) is defined, for s > 0, by
\[ F^{**}(s) = \int_0^\infty e^{-sx}\,dF(x). \]
Therefore, we have
\[ F^{**}(s) = \int_0^\infty e^{-sx} f(x)\,dx = f^*(s), \]
where $f(x) = \dfrac{dF(x)}{dx}$.
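As a worked illustration of the definitions above, consider the exponential density, chosen here only as a convenient example. Its transform satisfies property (vi), and property (ii) with n = 1, evaluated at s = 0, returns the mean of the distribution.

```latex
% Worked example (illustrative): f(x) = \lambda e^{-\lambda x}, x \ge 0
\begin{align*}
F^{**}(s) = f^*(s) &= \int_0^\infty e^{-sx}\,\lambda e^{-\lambda x}\,dx
                    = \frac{\lambda}{\lambda + s}, \\
\lim_{s \to 0} f^*(s) &= 1, \\
-\frac{d f^*(s)}{ds}\bigg|_{s=0} &= \frac{\lambda}{(\lambda + s)^2}\bigg|_{s=0}
                                  = \frac{1}{\lambda}
                                  = \int_0^\infty x f(x)\,dx .
\end{align*}
```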
Convolution
Let f(t) and g(t) be two real-valued, non-negative, continuous functions of t. Then the
integral
\[ \int_0^t f(t-u)\,g(u)\,du = \int_0^t g(t-u)\,f(u)\,du = f(t) \ast g(t) = L^{-1}[f^*(s)\,g^*(s)] \]
is called the Laplace convolution of the functions f(t) and g(t).
If F(t) and G(t) are two real-valued distribution functions defined for $t \ge 0$, the resulting
convolution is again a distribution function and the integral
\[ \int_0^t F(t-u)\,dG(u) = \int_0^t G(t-u)\,dF(u) = F(t) \star G(t) \]
is known as the Stieltjes convolution of F(t) and G(t).
First Passage Time
Suppose that a system starts with a state j, then time taken to reach a given state k for the
first time from state j is called first passage time. In general, first passage time is a measure of
how long it takes to reach a given state from another state.
Mean Sojourn Time in a State
The expected time spent by the system in a particular state before transiting to any other
state is known as the mean sojourn time or mean survival time in that state. If $T_i$ is the sojourn time
in state i, then the mean sojourn time in state i is
\[ \mu_i = \int_0^\infty P(T_i > t)\,dt. \]
Mean Time to System Failure (MTSF)
No system can operate in the same manner indefinitely, nor can it operate for an infinitely
long time, owing to the aging of components or other reasons. One must, therefore, be interested
in a measure representing the lifetime of the system so as to avoid sudden failure. Such a measure is the
Mean Time to System Failure (MTSF), which corresponds to the average duration between
successive system failures. This measure is defined as the expected time for which the system is
in operation before it completely fails.
Suppose the reliability function for a system is given by R(t) = 1 − F(t), where F(t)
is the failure time distribution function and f(t) = dF(t)/dt is the failure time density function.
The mean time to system failure is given by
\[ \begin{aligned}
\text{MTSF} &= \int_0^\infty t f(t)\,dt = -\int_0^\infty t\,\frac{dR(t)}{dt}\,dt \\
            &= -\big[\,t R(t)\,\big]_0^\infty + \int_0^\infty R(t)\,dt \\
            &= \int_0^\infty R(t)\,dt = \lim_{s \to 0} R^*(s).
\end{aligned} \]
Let $\phi_0(t)$ be the cumulative distribution function of the first passage time from the initial state to a
failed state; then
\[ R^*(s) = \frac{1 - \phi_0^{**}(s)}{s}. \]
Thus, we have
\[ \text{MTSF} = \lim_{s \to 0} \frac{1 - \phi_0^{**}(s)}{s}. \]
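The relation MTSF = ∫ R(t) dt can be checked numerically. The sketch below is only an illustration: it assumes exponential unit lifetimes with a hypothetical rate `lam`, and compares a single unit, R(t) = exp(−λt), with a two-unit parallel system without repair, R(t) = 2exp(−λt) − exp(−2λt). Neither system is one of the models of the thesis.

```python
import math

def mtsf(reliability, upper, steps=100_000):
    """Approximate the integral of R(t) over [0, upper] by the trapezoidal rule."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for k in range(1, steps):
        total += reliability(k * h)
    return total * h

lam = 0.01  # hypothetical failure rate per hour

# Single unit: the exact value is 1 / lam = 100 hours.
single = mtsf(lambda t: math.exp(-lam * t), upper=3000.0)

# Two identical units in parallel, no repair: the exact value is 3 / (2 * lam) = 150 hours.
parallel = mtsf(lambda t: 2 * math.exp(-lam * t) - math.exp(-2 * lam * t), upper=3000.0)
```

The upper limit of 3000 hours stands in for infinity; at that point both reliability functions are negligibly small, so the truncation error is far below the tolerance of interest.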
Availability
When a system becomes unavailable due to breakdowns, it is put back into operation through
proper repairs. One is therefore concerned with availability as much as with reliability, because
of the additional costs and inconvenience incurred when the system is not available. The differences
between the measures reliability and availability are as follows:
(i) The reliability is an interval function while the availability is a point function
describing the behaviour of the system at a specified epoch.
(ii) The reliability function precludes the failure of the system during the interval under
consideration, while availability function does not impose any such restriction on the
behaviour of the system.
We may categorize availability as :
(i) Instantaneous (Point wise) Availability
This is the probability that the system will be able to operate within the tolerances at a
given instant of time and is also called operational readiness.
Let X(t) = 1, if the system is operable at time t; and X(t) = 0, when it is not operable. The
availability A(t) of the system at time t is given by
A(t) = P[X(t) = 1| X(0) = 1].
Hence, X (t) is a binary variable having values 1 and 0, respectively for the operation and
non-operation of the system at an instant t.
(ii) Average (Interval) Availability
It is the expected fraction of a given interval of time that the system will be able to
operate within tolerances. It is also called the efficiency of the system and its limiting value is the
inherent availability.
Suppose the given interval of time is (0, T]. Then the interval availability H(0, T] = A(T) for
this interval is given by
\[ A(T) = \frac{1}{T} \int_0^T A(t)\,dt. \]
(iii) Steady State (Limiting Interval) Availability
It is defined as the probability that, in the long run, the system operates satisfactorily.
To obtain the steady state availability, we simply compute
\[ A(\infty) = \lim_{T \to \infty} H(0, T] = \lim_{T \to \infty} A(T). \]
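The three kinds of availability can be compared on the simplest repairable system. The sketch below assumes a single unit with constant failure rate `lam` and constant repair rate `mu`, starting in the up state; for this special case the point availability has the well-known closed form μ/(λ+μ) + λ/(λ+μ)·exp(−(λ+μ)t). The rates and function names are hypothetical, and this one-unit model is not one of the gas turbine models studied in the thesis.

```python
import math

def point_availability(t, lam, mu):
    """A(t) for a one-unit system with constant failure rate lam and repair rate mu."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)

def interval_availability(T, lam, mu):
    """(1/T) * integral of A(t) over (0, T], integrated in closed form."""
    total = lam + mu
    return mu / total + lam * (1.0 - math.exp(-total * T)) / (total * total * T)

def steady_state_availability(lam, mu):
    """Limit of both measures as the time horizon grows."""
    return mu / (lam + mu)

lam, mu = 0.1, 0.9  # hypothetical rates per hour
a_point = point_availability(5.0, lam, mu)
a_interval = interval_availability(5.0, lam, mu)
a_limit = steady_state_availability(lam, mu)
```

For a system that starts in the up state, the interval availability over (0, T] lies above the point availability at T, and both decrease towards the steady state value.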
Maintainability
Maintainability is associated with a system under repair. It is the probability that the
system will be restored to operational effectiveness within a specified time when the
maintenance action is taken in accordance with prescribed conditions. Maintenance is one of the
effective ways of increasing the reliability of a system. Maintenance of a system is of two types:
(i) Preventive maintenance (PM)
(ii) Corrective maintenance (CM)
PM includes actions such as lubrications, replacement of a nut or a screw or some part of
the system, refueling, cleaning, etc., while CM involves minor repairs that may crop up between
inspections.
On failure of a unit, it is sent to a repair facility, if available, otherwise it queues up for
repair. There may be two types of repair policies as follows:
(i) Repeat Repair Policy
Due to certain reason the repair of a failed unit has to be stopped. When the repair is
begun again, it is started all over again.
(ii) Resume Repair Policy
The repair of a failed component is terminated before completion due to one reason or the
other. When it begins again, it is started from the stage where it was prior to the termination of
the repair.
Busy Period
Let B(t) be the probability that a repairman is busy with the system in the interval (0, t].
Then in the long run, the total fraction of time for which a repairman is busy is given by
\[ B = \lim_{t \to \infty} B(t). \]
Down Period
Let D(t) be the probability that the system is down due to unavailability of the required
number of operable units for the system in the interval (0, t]. Then in the long run, the total
fraction of time for which the system is down is given by
\[ D = \lim_{t \to \infty} D(t). \]
Expected Number of Visits by the Repairman to the System
Let V(t) be a random variable representing the number of times a repairman has visited
the system in the interval (0, t]; then the expected number of visits by the repairman to the system
in (0, t] is E[V(t)] and, in the long run, the expected number of visits per unit time is given by
\[ V = \lim_{t \to \infty} \frac{E[V(t)]}{t}. \]
Profit Analysis
No organization can survive for long without a minimum financial return on its investment.
Therefore, profit analysis is an important aspect in the field of reliability. The profit of a system
depends upon various factors, for instance, the production cost, the cost of maintenance and spares,
failure rates, the repairman employed, the cost of calling the repairman, etc. Availability of the
system leads to revenue, whereas the busy period of the repairman for inspection, the busy period of
the repairman for repair, the number of visits by the repairman and the down time of the system lead
to costs/losses.
The profit is the excess of revenue over the cost of production. The profit function takes the
form:
P(t) = Expected revenue in (0, t] − expected total cost in (0, t]
In general, the optimal policies can more easily be derived for an infinite time span than
for a finite span. The profit per unit time is expressed as
\[ \lim_{t \to \infty} \frac{P(t)}{t}, \]
i.e. profit per unit time = total revenue per unit time − total cost per unit time.
Let us, for example, consider a system which involves only the following costs:
C0 = revenue per unit up time of the system.
C1 = cost per unit time for which the repairman is busy.
C2 = cost per visit of the repairman.
C3 = cost per unit down time.
Let A = the total fraction of time for which the system is up.
B = the total fraction of time for which the repairman is busy.
V = expected number of visits of the repairman.
D = expected down time of the system
Then the expected profit in steady-state is given by
\[ P = C_0 A - C_1 B - C_2 V - C_3 D. \]
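The steady-state profit expression translates directly into a small function. The sketch below uses the symbols defined above; the numerical values of the measures and costs are hypothetical and serve only to show the arithmetic.

```python
def steady_state_profit(A, B, V, D, C0, C1, C2, C3):
    """Expected profit per unit time: P = C0*A - C1*B - C2*V - C3*D."""
    return C0 * A - C1 * B - C2 * V - C3 * D

# Hypothetical measures of system effectiveness and costs (illustrative only).
profit = steady_state_profit(A=0.95, B=0.04, V=0.01, D=0.05,
                             C0=1000.0, C1=200.0, C2=50.0, C3=300.0)
# 1000*0.95 - 200*0.04 - 50*0.01 - 300*0.05 = 950 - 8 - 0.5 - 15 = 926.5
```

Evaluating the function over a range of one rate or cost while holding the others fixed gives the kind of profit curve used for graphical comparison.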
Let us now discuss some important continuous distributions which are used for
failure/repair times of various systems/components.
Some Important Continuous Distributions
Data on fatigue failure of materials and life length of systems/components are fitted to
variety of distributions. However, failure/repair times of the systems/components usually follow
one of the following distributions:
Exponential Distribution
A continuous random variable having the range $0 \le t < \infty$ is said to have an exponential
distribution if it has a probability density function of the form
\[ f(t) = \begin{cases} \lambda e^{-\lambda t}, & 0 \le t < \infty \\ 0, & t < 0 \end{cases} \]
where $\lambda$ is a positive constant. The corresponding distribution function is
\[ F(t) = \begin{cases} 1 - e^{-\lambda t}, & 0 \le t < \infty \\ 0, & t < 0. \end{cases} \]
The hazard rate $\lambda$ is constant. The Laplace transform of the p.d.f. of the exponential
distribution is $\lambda/(\lambda + s)$.
The exponential distribution plays an important role in reliability studies. Besides a number
of mathematical properties, it has a very important property known as the 'memoryless property'.
For example, an electric fuse (assuming it cannot melt partially) has a failure life distribution that is
practically unchanged as long as it has not yet failed.
Weibull Distribution
A Weibull distribution has the density function defined by
\[ f(x) = a x^{b} \exp\left(-\frac{a x^{b+1}}{b+1}\right), \quad x \ge 0. \]
Its distribution function is
\[ F(x) = 1 - \exp\left(-\frac{a x^{b+1}}{b+1}\right), \quad x \ge 0, \]
where a and b are positive constants and are known as "scale" and "shape" parameters
respectively.
It is evident that the exponential and Rayleigh distributions are the special cases of the
two-parameter Weibull distribution when b = 0 and b = 1 respectively.
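The two special cases can be checked numerically. The sketch below assumes the parameterization f(x) = a·x^b·exp(−a·x^(b+1)/(b+1)), which is the form consistent with the statement that b = 0 gives the exponential and b = 1 the Rayleigh distribution; the function names and parameter values are hypothetical.

```python
import math

def weibull_pdf(x, a, b):
    """Density a * x**b * exp(-a * x**(b+1) / (b+1)) for x >= 0, else 0."""
    if x < 0:
        return 0.0
    return a * x**b * math.exp(-a * x**(b + 1) / (b + 1))

def weibull_cdf(x, a, b):
    """Distribution function 1 - exp(-a * x**(b+1) / (b+1)) for x >= 0, else 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-a * x**(b + 1) / (b + 1))

a, x = 0.5, 2.0  # hypothetical scale parameter and evaluation point
exponential_case = weibull_pdf(x, a, b=0)  # equals a * exp(-a * x)
rayleigh_case = weibull_pdf(x, a, b=1)     # equals a * x * exp(-a * x**2 / 2)
```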
Normal Distribution
The normal distribution is a two-parameter distribution of a continuous random variable whose
probability density has the form:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < \infty. \]
The constants $\mu$ and $\sigma > 0$ are arbitrary and represent the mean and standard deviation of the
random variable.
This is the most important probability distribution for use in statistics. In reliability work
it is mostly used as a limiting form for binomial and Poisson distributions.
The Lognormal Distribution
If the random variable T, the time to failure, has a lognormal distribution, the logarithm
of T has a normal distribution. This is a very useful relationship in working with the lognormal
distribution. The density function for the lognormal is
\[ f(t) = \frac{1}{\sqrt{2\pi}\, s\, t} \exp\left\{-\frac{1}{2s^2}\left[\ln\frac{t}{t_{med}}\right]^2\right\}, \quad t \ge 0, \]
where the parameter s is a shape parameter and $t_{med}$, the location parameter, is the median time
to failure.
The distribution is defined for only positive values of t and is therefore more appropriate
than the normal as a failure distribution. Like the Weibull distribution, the lognormal can take on
a variety of shapes. It is frequently the case that data that fit a Weibull distribution will also fit a
lognormal distribution.
The mean, variance, and mode of the lognormal are
\[ \text{MTTF} = t_{med} \exp(s^2/2), \]
\[ \sigma^2 = t_{med}^2 \exp(s^2)\left[\exp(s^2) - 1\right], \]
\[ t_{mode} = \frac{t_{med}}{\exp(s^2)}. \]
To compute failure probabilities, the lognormal's relationship to the normal is utilized.
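The lognormal summary measures and the link to the normal distribution can be sketched as follows. Since ln T is normal with mean ln t_med and standard deviation s, the failure probability is F(t) = Φ(ln(t/t_med)/s), where Φ is the standard normal distribution function; here Φ is evaluated through `math.erf`. The parameter values and function names are hypothetical.

```python
import math

def lognormal_summary(t_med, s):
    """Return (mean, variance, mode) of a lognormal with median t_med and shape s."""
    mean = t_med * math.exp(s * s / 2)
    variance = t_med**2 * math.exp(s * s) * (math.exp(s * s) - 1)
    mode = t_med / math.exp(s * s)
    return mean, variance, mode

def lognormal_cdf(t, t_med, s):
    """Failure probability F(t) = Phi(ln(t / t_med) / s) for t > 0."""
    z = math.log(t / t_med) / s
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

t_med, s = 100.0, 0.5  # hypothetical median life (hours) and shape parameter
mean, variance, mode = lognormal_summary(t_med, s)
prob_fail_by_median = lognormal_cdf(t_med, t_med, s)  # 0.5 by definition of the median
```

The ordering mode < median < mean reflects the right skew of the lognormal distribution.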
Goodness-of-Fit Tests
For the selection of a theoretical distribution, a statistical test for goodness of fit is
performed. Such a test compares a null hypothesis (H0) with an alternative hypothesis (H1)
having the following form:
H0: The failure times came from the specified distribution.
H1: The failure times did not come from the specified distribution.
The test consists of computing a statistic based on the sample of failure times. This
statistic is then compared with a critical value obtained from a table of such values. Generally, if
the test statistic is less than the critical value, the null hypothesis (H0) is accepted; otherwise, the
alternative hypothesis (H1) is accepted.
There are two types of goodness-of-fit tests: general tests and specific tests. A general
test is applicable to fitting more than one theoretical distribution, and a specific test is tailored to
a single distribution. When available, specific tests will be more powerful (have a higher
probability of correctly rejecting a distribution) than general tests.
Here, we shall discuss specific tests for the exponential, Weibull, normal, and lognormal
failure distributions.
Bartlett's Test for Exponential Distribution
This test is applied to test the hypothesis
H0 : failure times are exponential
against HA : failure times are not exponential.
The test statistic is
\[ B = \frac{2r\left[\log\left(\dfrac{1}{r}\sum_{i=1}^{r} t_i\right) - \dfrac{1}{r}\sum_{i=1}^{r} \log t_i\right]}{1 + (r+1)/(6r)}, \]
where $t_i$ is the ith time to failure and r is the number of failures. It follows the Chi-square
distribution with r − 1 degrees of freedom. For the level of significance $\alpha$, if B lies between the
critical values $\chi^2_{(1-\alpha/2,\ r-1)}$ and $\chi^2_{(\alpha/2,\ r-1)}$, then the
null hypothesis is accepted and we can say that the failure times follow the exponential distribution.
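A minimal sketch of the computation of B is given below. It assumes the usual form of Bartlett's statistic, B = 2r[log(mean of t_i) − mean of log t_i] / [1 + (r+1)/(6r)], and uses hypothetical failure times; the chi-square critical values are not computed here and would be read from tables.

```python
import math

def bartlett_statistic(times):
    """Bartlett's statistic for testing exponentiality of the failure times."""
    r = len(times)
    mean_time = sum(times) / r
    mean_log = sum(math.log(t) for t in times) / r
    return 2 * r * (math.log(mean_time) - mean_log) / (1 + (r + 1) / (6 * r))

# Hypothetical failure times in hours (illustrative only).
failure_times = [12.0, 35.0, 48.0, 77.0, 102.0, 150.0, 210.0, 330.0]
B = bartlett_statistic(failure_times)
```

Because the arithmetic mean is never smaller than the geometric mean, B is non-negative, and it is zero only when all the failure times are equal.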
Mann’s Test for Weibull Distribution
A specific test for the Weibull failure distribution is a test developed by Mann, Schafer,
and Singpurwalla. The hypotheses are
H0: The failure times are Weibull.
H1: The failure times are not Weibull.
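The test statistic is
\[ M = \frac{k_1 \sum_{i=k_1+1}^{r-1} \left[(\ln t_{i+1} - \ln t_i)/M_i\right]}
            {k_2 \sum_{i=1}^{k_1} \left[(\ln t_{i+1} - \ln t_i)/M_i\right]} \]
where
\[ k_1 = \left\lfloor \frac{r}{2} \right\rfloor, \qquad k_2 = \left\lfloor \frac{r-1}{2} \right\rfloor, \qquad
   M_i = Z_{i+1} - Z_i, \qquad Z_i = \ln\left[-\ln\left(1 - \frac{i - 0.5}{n + 0.25}\right)\right], \]
and $\lfloor x \rfloor$ is the integer portion of the number x. $M_i$ is an approximation. If $M > F_{crit}$, then H1 is
accepted. Values for $F_{crit}$ may be obtained from tables of the F-distribution if one lets the number
of degrees of freedom for the numerator be $2k_2$ and the number of degrees of freedom for the
denominator be $2k_1$.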
This test is for the two-parameter Weibull distribution. Therefore, if the alternative
hypothesis is accepted, the three-parameter Weibull as well as other distributions should be
considered. Observe that the data must be rank-ordered for the test statistic to be computed.
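The computation of Mann's statistic can be sketched as below. The sketch assumes the form M = k1·Σ_{i=k1+1}^{r−1}[(ln t_{i+1} − ln t_i)/M_i] / (k2·Σ_{i=1}^{k1}[(ln t_{i+1} − ln t_i)/M_i]) with k1 = ⌊r/2⌋, k2 = ⌊(r−1)/2⌋, M_i = Z_{i+1} − Z_i and Z_i = ln[−ln(1 − (i − 0.5)/(n + 0.25))], where r is the number of failures and n the sample size. Taking n = r by default (a complete sample) is an assumption of this sketch, and the data are hypothetical.

```python
import math

def mann_statistic(times, n=None):
    """Mann's test statistic M for the two-parameter Weibull distribution."""
    t = sorted(times)                  # the data must be rank-ordered
    r = len(t)                         # number of failures
    n = r if n is None else n          # sample size; n = r assumes a complete sample
    k1 = r // 2
    k2 = (r - 1) // 2
    # z[i - 1] holds Z_i for i = 1, ..., r
    z = [math.log(-math.log(1 - (i - 0.5) / (n + 0.25))) for i in range(1, r + 1)]
    # terms[i - 1] holds (ln t_{i+1} - ln t_i) / M_i for i = 1, ..., r - 1
    terms = [(math.log(t[i]) - math.log(t[i - 1])) / (z[i] - z[i - 1])
             for i in range(1, r)]
    numerator = k1 * sum(terms[k1:])    # i = k1 + 1, ..., r - 1
    denominator = k2 * sum(terms[:k1])  # i = 1, ..., k1
    return numerator / denominator

# Hypothetical failure times in hours (illustrative only); needs r >= 3 and distinct values.
failure_times = [32.0, 51.0, 74.0, 98.0, 120.0, 155.0, 190.0, 260.0, 340.0, 470.0]
M = mann_statistic(failure_times)
```

The value of M would then be compared with the critical value of the F-distribution with 2·k2 and 2·k1 degrees of freedom, taken from tables.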
Kolmogorov-Smirnov Test for Normal and Lognormal Distributions
A goodness-of-fit test for use with the normal distribution when the parameters are
estimated is a version of the Kolmogorov-Smirnov test developed by H.W. Lilliefors. It
compares the empirical cumulative distribution function with the normal cumulative distribution
function. The hypotheses are
H0: The failure times are normal.
H1: The failure times are not normal.
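The test statistic is $D_n = \max\{D_1, D_2\}$, where
\[ D_1 = \max_{1 \le i \le n} \left\{ \Phi\left(\frac{t_i - \bar{t}}{s}\right) - \frac{i-1}{n} \right\}, \qquad
   D_2 = \max_{1 \le i \le n} \left\{ \frac{i}{n} - \Phi\left(\frac{t_i - \bar{t}}{s}\right) \right\}, \]
\[ \bar{t} = \sum_{i=1}^{n} \frac{t_i}{n}, \qquad s^2 = \frac{\sum_{i=1}^{n} (t_i - \bar{t})^2}{n-1}, \]
and $\Phi$ denotes the standard normal cumulative distribution function.
If $D_n < D_{crit}$, then accept H0; if $D_n > D_{crit}$, then accept H1. This test is appropriate for complete
samples only.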
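The statistic D_n can be computed as in the following sketch, which assumes D1 = max{Φ((t_i − t̄)/s) − (i−1)/n} and D2 = max{i/n − Φ((t_i − t̄)/s)} over the ordered sample, with Φ the standard normal distribution function evaluated through `math.erf`. The data are hypothetical, and the critical value D_crit would be taken from Lilliefors' tables.

```python
import math

def lilliefors_statistic(times):
    """Kolmogorov-Smirnov statistic D_n with mean and s estimated from the sample."""
    t = sorted(times)
    n = len(t)
    mean = sum(t) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in t) / (n - 1))
    # Standard normal c.d.f. evaluated at the standardized observations.
    phi = [0.5 * (1.0 + math.erf((x - mean) / (s * math.sqrt(2.0)))) for x in t]
    # Index i below is zero-based, so i / n is (rank - 1) / n and (i + 1) / n is rank / n.
    d1 = max(phi[i] - i / n for i in range(n))
    d2 = max((i + 1) / n - phi[i] for i in range(n))
    return max(d1, d2)

# Hypothetical failure times in hours (illustrative only).
failure_times = [91.0, 104.0, 112.0, 118.0, 125.0, 131.0, 140.0, 152.0, 167.0, 190.0]
D_n = lilliefors_statistic(failure_times)
```

For lognormal data, the same function can be applied to the logarithms of the failure times, since the logarithm of a lognormal variable is normal.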
Glimpses of the Thesis
The present thesis entitled "Reliability and Economic Analysis of Some Models on
Gas Turbine Power Plants" is an attempt to develop the reliability models on Gas turbine
power plants considering different situations depending on the variation in demand and power
production capacity of the system, gathering information on failure times, repair times, etc. from
Gas turbine power plants, with the following objectives:
(i) To obtain the reliability, the expressions for the mean time to system failure and for
various other measures of the system effectiveness.
(ii) To discuss the economic analysis of the system using various measures of system
effectiveness.
(iii) To know the behaviour of the MTSF and the profit function graphically with respect to
various rates, costs, revenue, etc.
(iv) To make comparison between the models for the systems working in Gas Turbine plants
studied under different situations/considerations and to identify which and when one
model is better than the other.
The methodology used for the analysis is as follows:
Data/information on failure times, repair times, various costs, etc. has been gathered
visiting some Gas turbine power plants. Then some models have been developed on the basis of
the situations existing in the plants visited and some proposed situations. Reliability, Mean Time
to System Failure (MTSF) and various other measures of the system effectiveness have been
obtained by making use of semi-Markov processes and regenerative point technique. Expression
for the profit has been obtained for each of the models discussed using the obtained measures of
the system effectiveness. Computer programs using BASIC / C language were developed for
evaluating various measures of the system effectiveness and hence the profit for particular cases,
that is, for some numerical values of various rates, costs, revenue, probabilities, etc. taken on the
basis of the data/information gathered from the plants visited and assuming values for other
parameters for which the information was not provided. Then various graphs have been plotted
for the MTSF, availability, and the profit with respect to various rates, costs, revenue,
probabilities, etc. using MS Excel. Comparative study, so far as the profitability of the system
under different situations is concerned, has also been made among the models studied. The
techniques/methods used for deriving the expressions for various measures of system
effectiveness include the Laplace/Laplace Stieltjes transforms and convolutions, Cramer's rule
for solving a system of equations, etc.
The present study is covered in the seven chapters of the thesis and is summarized as
follows:
Chapter 1 is introductory in nature. Origin, history and development of reliability are
covered in this chapter. It also discusses the fundamental concepts and definitions related to the
work done in the thesis to make the thesis sufficient in itself.
Chapter 2 presents the information gathered on failures and repairs of the systems
working in Gas turbine plants visited by the author. Estimates of mean failure/repair/inspection
times and hence the failure/repair/inspection rates are obtained on the basis of the information
gathered from the plants. Various costs and probabilities have also been estimated
from the gathered information. These estimated values have been used in the subsequent chapters
for making the graphical study and giving useful interpretations.
In Chapter 3, a reliability model is developed for a gas turbine power plant comprising
one gas and one steam turbine wherein scheduled inspection is done at regular intervals of time
for maintenance. Initially, both the units i.e. the gas turbine as well as the steam turbine are
operative. On failure of the gas turbine, system goes to down state, whereas on failure of the
steam turbine, the system may be kept in the up state with only gas turbine working or put to
down state according as the buyer of the power so generated is ready to pay higher amount or
not. When only the gas turbine is operative and the steam turbine is failed, this type of working
of the system is called working in the Single Cycle; whereas when both the units are operative
then it is called the Combined Cycle. Three types of scheduled inspection, that is, minor, path
and major inspection are done in this order at regular intervals of times for maintenance.
Chapter 4 investigates a model for a gas turbine power plant comprising one gas and one
steam turbine wherein random inspection is carried out instead of scheduled inspection to detect
which one of the three types of maintenance (Minor, Path or Major) needs to be done. Initially,
both the units i.e. the gas turbine as well as the steam turbine are operative. On failure of the gas
turbine, system goes to down state, whereas on failure of the steam turbine, the system may be
kept in the up state with only gas turbine working or put to down state according as the buyer of
the power so generated is ready to pay higher amount or not. When only the gas turbine is
operative and the steam turbine is failed, this type of working of the system is called working in
the Single Cycle; whereas when both the units are operative then it is called the Combined
Cycle. Inspection is done at random points of time which reveals as to which one of the three
types of maintenance is required and accordingly that type of maintenance is done.
In Chapter 5, the reliability and cost-benefit analysis of a gas turbine power plant
comprising two gas turbines and one steam turbine wherein scheduled inspection is done at
regular intervals of time for maintenance is examined. Initially, all the three units i.e. two gas
turbines as well as one steam turbine are operative and the system is considered as to work at full
capacity. On failure of one of the gas turbines with steam turbine working, the system works at
reduced capacity. If both the gas turbines get failed, the system goes to down state; whereas on
failure of the steam turbine, the system may be kept in the up state with one of the gas turbines
working or put to down state according as the buyer of the power so generated is ready to pay
higher amount or not and this is working in single cycle. Three types of scheduled inspection,
that is, minor, path and major inspection are done in this order at regular intervals of times for
maintenance.
Chapter 6 studies the reliability and cost-benefit analysis of a gas turbine power plant
generating system comprising two gas turbines and one steam turbine wherein random inspection
is carried out instead of scheduled inspection to detect which one of the three types of
maintenance (Minor, Path or Major) needs to be done. Initially, all the three units i.e. two gas
turbines as well as one steam turbine are operative and the system is considered as to work at full
capacity. On failure of one of the gas turbines with steam turbine working, the system works at
reduced capacity. If both the gas turbines get failed, the system goes to down state; whereas on
failure of the steam turbine, the system may be kept in the up state with one of the gas turbines
working or put to down state according as the buyer of the power so generated is ready to pay
higher amount or not and this is working in single cycle. Inspection is done at random points of
time which reveals as to which one of the three types of maintenance is required and accordingly
that type of maintenance is done.
In Chapter 7, the comparative study of the models studied in the preceding chapters is
made on the basis of profits evaluated for them. The logic behind the comparative study is that
no model can be best in every situation. One model may be better for a situation whereas it may
be worse for some other situation and hence the comparative study becomes more important.
Comparative analysis has been done plotting the graphs for profits of two models at a time and
also for the profits of all the studied models at a time. Interesting interpretations have been made
on the basis of the graphs which help decide which and when one model is better than the other.
In each of the four Chapters 3-6, use of semi-Markov processes and regenerative point
technique has been made for analyzing the models discussed in the thesis. Various measures of
system effectiveness such as MTSF, steady-state availability at full capacity (all the turbines
working), at reduced capacity (one of the two gas turbines and one steam turbine working) and in
single cycle (only one gas turbine working and steam turbine not working), busy period analysis
of the repair facility for repair/inspection, expected down time, expected number of visits, and
the expected profit incurred to the system have been obtained. Graphical study for particular
cases is also made for each of the models and various interesting interpretations have been made.
----- o -----