
Chapter 1

Introduction

Since the Enlightenment, technology has shaped our lives to an ever greater extent. Machines not only supplement human effort; in some sectors they have almost replaced it. This scenario requires us to develop newer, more complex and more sophisticated technical systems to cater to the needs of contemporary postmodern, post-industrial society. Today we are surrounded by more powerful systems than ever, and new ones are continuously being designed and developed. Such technology is aimed at making our lives smoother, though it also ends up making them more complex. The impact of the failure or mismanagement of a power generation and distribution system in a major city, the malfunction of an air traffic control system at an international airport, miscommunication in today's internet systems or the breakdown of a nuclear power plant is simply frightening. There is too much at stake in terms of cost, human life and national security to take any risks with such devices, and we cannot afford any malfunction, even an accidental one. Hence, if systems are to be reliable, components of high reliability must be used; this is a constant in science and technology. Our knowledge about the universe and all that it contains is constantly being upgraded, and technology, being simply “a marriage of money and scientific principle”, keeps becoming old, obsolete, financially unviable and almost redundant. As a consequence, reliability must be considered at all stages of modern engineering processes, including design, manufacturing, distribution and operation. Reliability is a vital factor in the efficiency of any equipment.

The concept of reliability has long been known. It grew out of the demands of modern technology and, in particular, out of the experience with complex military systems during World War II. The automation and complexity of these systems gave rise to problems of maintenance and repair, and the large body of work that followed led to reliability being treated as a technical parameter. Davis (1952) discussed failure data and goodness-of-fit tests for various competing failure distributions. Epstein and Sobel (1955) and Epstein (1958) worked on life testing under the assumption of an exponential distribution. After these studies, the exponential failure time distribution acquired a unique position in life testing and reliability analysis.

To deal with general distributions, techniques such as imbedded Markov chains and phase techniques were widely used. Another technique, the 'Inclusion of Supplementary Variable(s)', developed by Cox in 1955, was first employed in reliability evaluation problems by Garg (1963). This idea was further developed by Buzacott (1971) and Chow (1973), who studied the reliability of some redundant systems with repair.

Mine and Kaiwal (1979) enhanced the system reliability by assigning priority repair

disciplines. Kontoleon (1980) determined the reliability of r-successive-out-of-n: F

system. Chung (1982) studied some stress-strength reliability models. Kumar (1982) presented a

recursive algorithm to evaluate the reliability of a consecutive-k-out-of-n: F system. Mandaltsis

and Kontoleon (1987) studied the overall reliability determination of computer networks with

hierarchical routing strategies. Sharma (1991) gave the corrected bounds for reliability when

strength and stress distributions are known. Ksir and Boushaba (1993) gave the reliability bounds

and direct computation of the reliability of a consecutive k-out-of-n : F system with Markov

dependence. Rajamanickam and Chandeasekhar (1997) gave the reliability measures for two-unit

systems with a dependent structure for failure and repair times. Wang et al. (2002) modeled the

bathtub shape hazard rate function in terms of reliability. Yam et al. (2003) proposed a method

for the evaluation of reliability indices for a repairable circular consecutive k-out-of-n: F system. Li

and Chen (2004) discussed the aging properties of the residual life length of k-out-of-n system

with independent but non-identical components. Peiravi (2008) gave the estimation of expected

lifetime and reliability during burn-in and field operation using Markov chain Monte Carlo

simulations. Taneja (2009) developed a reliability model for a system with conditional warranty

and various types of repair/replacement. Pathak and Joshi (2009, 2011) optimized the reliability

modeling of MEMS devices. Hu et al. (2010) developed the system reliability prediction model

based on evidential reasoning algorithm with non-linear optimization. Lakshminarayana and

Kumar (2013) optimized an integrated reliability model using dynamic programming and failure modes, effects and criticality analysis. Mi et al. (2013) studied the

reliability analysis of multi-state system with common cause failure based on Bayesian networks.

Besides reliability, various researchers have evaluated other measures for complex systems consisting of one, two or more units, introducing various concepts along the way. The concept of availability has been widely discussed in the literature; the main contributors are Barlow and Hunter (1960), Gaver (1963), Sandler (1963), Myers et

al. (1964), Barlow and Proschan (1965), Rau (1970), Beran (1974) and Arndt and Franken

(1977). Srinivasan and Gopalan (1973) concentrated on regenerative point technique. Nakagawa

and Osaki (1975) considered stochastic behavior of a two-unit priority standby redundant system

with repair. Gopalan et al. (1975) discussed the availability and reliability of one-server

two-unit systems with imperfect switchover. Nakagawa (1976) considered the replacement

of the unit at a certain level of damage while Arora (1977). Dhillon (1980) discussed the

availability analysis of systems with two types of repair facilities. Nakagawa (1980) studied the

optimum inspection policy for a standby unit by taking a standby electric generator as an

example. Yamashen (1980) worked on a multistate system with several failure modes and a cold standby

unit. Murari and Maruthachalam (1981) studied the working of two unit parallel systems with

periods of working and rest. Ramamurthy and Jaiswal (1982) studied the time to failure for a

two-dissimilar-unit cold standby system with allowed down time. Murari and Goyal (1983)

studied a two-unit cold standby system with two types of repair facility. Murari and

Maruthachalam (1984) considered a two unit system with two different interlinking in two

different periods. Goyal (1984) studied a two-unit cold standby system with two types of repair

facilities. Goel and Gupta (1984) discussed the stochastic behaviour of a two-unit standby

system with better utilization of units. Dharmadkikari and Gupta (1985) studied the stochastic

behaviour of 1-out-of-2:G warm standby repairable system. Goel et al. (1985) dealt with a two-

unit cold standby system with two types of operation and repair. Goel et al. (1986) obtained the

reliability analysis of a system with preventive maintenance and two types of repair. Murari and

Al-Ali (1986) analysed the reliability of a system subject to random shocks and preventive

maintenance. Gazit and Malek (1988) considered the fault tolerance capabilities in multistage

network based multi computer systems. Guo Tong De (1989) studied stochastic behavior of a

system with preparation for repair. Mahmoud (1989) worked on two-unit system with two types

of failure and preventive maintenance. Gopalan and Muralidharan (1991) analysed a system with


online preventive maintenance and repair. Kumar (1995) made a comparative study for standby

redundancy at system and component levels. Since some failures are caused by human errors such as misinterpretation of instruments, wrong actions, maintenance errors, inadequate knowledge of the job or the environment, and poor training or skills of operating personnel, Yang and Dhillon (1995) carried out a stochastic analysis of a general standby system with constant human error.

Dekker (1996) reviewed and analysed applications of maintenance models. Mokaddis et al. (1997) analysed a two-unit warm standby system subject to

degradation. Attahiru and Zhao (1998) studied the stochastic analysis of a repairable system with

three-units and repair facilities. Wang (2002) carried out a survey of maintenance policies of

deteriorating systems. Apeland and Scarf (2003) wrote about subjective approach to modeling

inspection maintenance. Barros et al. (2003) studied optimization of replacement times using

imperfect information. Attardi and Pulcini (2005) wrote about a new model for repairable

systems with bounded failure intensity. Glazebrook et al. (2005) proposed index policies for the

maintenance of a collection of machines by a set of repairmen. Chien and Sheu (2006) presented

an extended optimal age-replacement policy with minimal repair of a system subject to shocks.

Karamatsoukis and Kyriakidis (2009) wrote about optimal maintenance of a production-

inventory system with idle periods. Brick and Uchoa (2009) discussed a facility location and

installation of resources model for level of repair analysis. Kuzin (2010) studied the vibration

reliability and endurance of a centrifuge for separating suspensions.

The cost/profit aspect is also very important for systems whose reliability and maintenance are being studied. This aspect received attention in the 1980s, and since then a great deal of work on cost/profit analysis of such systems has appeared in the reliability literature. Goel and Gupta (1985) dealt with cost analysis of a

two-unit priority standby system with imperfect switch and arbitrary distributions. Murari et al.

(1985) worked out cost analysis of two-unit warm standby system with regular repairman and

patience time. Mokaddis et al. (1989) gave the profit analysis of two-unit priority system with

administrative delay in repair. Gopalan et al. (1991) carried out the cost analysis of a system

subject to on-line preventive maintenance and repair. Tuteja and Taneja (1991, 1992, 1993)

investigated reliability and profit analysis of two-unit standby system introducing the concepts of

two identical repairmen, minor repair, partial failure and random inspection. Goel et al. (1992)

gave the idea of random change of operative unit. Rander et al. (1991, 1992) discussed a system

with major and minor failures and preparation time in case of major failure, and a system with

imperfect assistant repairman and perfect master repairman. Gupta et al. (1993) dealt with the

profit analysis of a two-unit priority standby system subject to degradation and random shocks.

Singh and Mishra (1994) evaluated profit for a two-unit standby system with two operating

modes. Rander et al. (1994) investigated the cost analysis of two dissimilar cold standby systems

with preventive maintenance and replacement of standby. Pandey and Jacob (1995) gave the cost

analysis and MTTF of a three state standby complex system under common cause and human

failures. Gupta et al. (1997) dealt with the analysis of a system with three non-identical units

(Super-priority, priority and ordinary) with arbitrary distributions. Tuteja et al. (1999) discussed

a two server system with regular repairman who is not always available. Rizwan and Taneja

(2000) analysed the profit of a system with perfect repair at partial failure or complete failure.

Sehgal (2000) studied some reliability models with partial failure, accidents and various types of

repair. Singh et al. (2001) wrote on a two unit warm standby system with accident and various

types of repair. Siwach et al. (2001) studied two-unit cold standby system with instruction and

accident. Tuteja et al. (2001) carried out the reliability and profit analysis of a two unit cold

standby system with partial failure and two types of repairman. Tuteja et al. (2001) carried out

the cost benefit analysis of a system where operation and sometimes repair of main unit depends

on sub unit. Taneja et al. (2001) discussed a system with two types of repairman wherein the

expert repairman may not always be available. Sindhu and Gupta (2002) evaluated reliability and

profit of a two-unit cold standby system with regular and visiting repairman. Gupta and Taneja

(2003) evaluated the expected for a system with rest period, patience time and various types of

repair. Taneja and Nanda (2003) incorporated the idea of the expert repairman adopting one of two repair policies, the repeat repair policy or the resume repair policy, after an attempt by the

ordinary repairman. Nanda et al. (2003) discussed the reliability properties of reverse residual

lifetime. Goyal and Gupta (2005) gave the reliability and economic analysis of a two-unit cold

standby system with three types of repair policy and replacement. Bai and Pham (2006)

discussed the cost analysis of renewable full-service warranties for multi-component

systems, while Chelbi and Rezg (2006) wrote on analysis of a production/inventory system with

randomly failing production unit subjected to a minimum required availability level.

These studies, while carrying out their analyses through graphs and other means, used assumed values of failure, repair and other rates, i.e. real data on these rates were not taken into consideration. Taneja et al. (2004) collected the real data on failure and repair

rates of 232 programmable logic controllers (PLC) and studied a single unit PLC considering the

four types of failure. Taneja (2005) discussed reliability and profit analysis of a system which

consists of one main unit (used for manufacturing) and two PLCs (used for controlling). Initially,

one of the PLCs is operative and the other is hot standby. Bhupender and Taneja (2007) gave the

reliability and profit evaluation of a PLC hot standby system based on a master slave concept and

two types of repair facilities. Taneja et al. (2007) carried out the profit evaluation of a two-out-

of-three unit system for an ash handling plant wherein situation of system failure did not arise.

Zuhair and Rizwan (2007) studied the reliability analysis of a two unit system. Minocha (2007)

discussed the profit evaluation of some reliability models for technological systems. Goyal et al.

(2009) studied the reliability and profit evaluation of a 2-unit cold standby system working in a

sugar mill with operating and rest periods. Mathew et al. (2009) analysed the profit evaluation of

a single unit CC plant with scheduled maintenance. Rizwan et al. (2010) discussed the reliability

analysis of a hot standby industrial system. Goyal et al. (2010) made a comparative study on the

basis of profits between two models for sulphated juice pump system working seasonally and

having different configurations. Mathew et al. (2011) discussed the reliability analysis of an

identical two-unit parallel CC plant system operative with full installed capacity. Kumar and

Bhatia (2011) studied the impact of ignored faults on reliability and availability of centrifuge

system that undergoes periodic rest. Kumar and Kapoor (2012) examined the cost-benefit

analysis of a base transceiver system considering hardware/software faults and congestion of

calls. Zhang et al. (2012) developed a reliability model and optimized maintenance of the diesel

system in locomotives.

We propose to study the reliability and economic analysis of some models on Gas turbine

power plants considering different situations depending on the variation in demand and power

production capacity of the system, gathering information on failure times, repair times, etc. from

Gas turbine power plants.

Let us now discuss some basic concepts related to our work:


Concept of Reliability

An introduction to reliability contains a rich blend of basic concepts and practical problems from the real world. In the widest sense, the word 'reliability' can be read as 're' plus 'liability': the liability, not once but again and again, of everyone involved with the system in any way, from the designer, manufacturer, inspector and vendor to the user, to make it reliable. The concept of reliability has been interpreted in many different ways in numerous works, a few of which are listed below:

(i) Reliability is the integral of the distribution of probabilities of failure free operation from

the instant of switch on to the first failure.

(ii) Reliability is the probability that the device will operate without failure for a given time

under given operating conditions.

(iii) The reliability of a system is its capacity for failure-free operation for a definite period of time under the given operating conditions, with minimum time lost for repair and preventive maintenance.

(iv) The reliability of equipment is arbitrarily assumed to be the equipment capacity to

maintain given properties under specified conditions and for a given period of time.

Many definitions of reliability have been given by various engineers and mathematicians, but the one most widely accepted by contemporary reliability authorities is that of the Electronics Industries Association (EIA), U.S.A., which states:

“Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions encountered.”

This definition breaks down into four basic parts which are discussed below:

a) Probability

It provides the numerical input for the assessment of reliability and also the first index of system adequacy. Thus, a statement that the probability of an item functioning for 60 hours is 0.9 indicates that the item would be expected to function for a period of 60 hours only 90 times out of 100.

b) Adequate Performance

An assessment of adequate performance is a matter of engineering appraisal and appreciation. It requires a detailed investigation of the modes of failure for each component and the system. It is impossible to specify a single adequate reliability level, as this will obviously vary with the system and the associated consequences of failure.

c) Time

Time is the most important factor in the assessment of reliability, since it represents a measure of the period during which one can expect a certain level of performance from an item. For example, the mission time for which the reliability is required to be computed may be 2 minutes.

d) Operating Conditions

The reliability of an item must always be specified in relation to its operating conditions, because if these vary, so will the numerical value used to express reliability. For example, if a brake reliability test for an automobile is conducted over roads which require infrequent stops, the brake usage will be infrequent. Experience has shown that environmental conditions such as temperature, humidity, pressure, shock, vibration, voltage, acceleration, acoustics, torque, corrosive atmosphere, gravity etc. have definite effects on the performance of the item. If one of these conditions changes beyond given limits, the item may fail.

To help designers build reliability into a system and predict it at the design stage, the main methods of improving reliability are:

(a) Reduce the complexity of the equipment to the minimum essential for the required operation.

(b) Increase the reliability of the components in the system.

(c) Introduce standby units/components (in parallel or series), and

(d) Use a service facility.

Quantitatively, the reliability of a device at time t is the probability that it will not fail in a given environment before time t. If T is a random variable representing the time till failure of the device, starting with an initial operable condition at t = 0, then the reliability R(t) of the device is given by

\[ R(t) = P[T > t] = 1 - P[T \le t] = 1 - F(t). \]

In terms of the probability density function (pdf) of T, namely f(t), we get

\[ R(t) = \int_t^{\infty} f(x)\,dx. \]

The reliability R(t) or probability of survival has the properties,

(i) R(0) = 1 since the device is assumed to be operable at t = 0.

(ii) R(∞) = 0 since no device can work forever without failure.

(iii) R(t) is a non-increasing function of t with values between 0 and 1.
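
The two expressions for R(t) above can be checked numerically. The following is a minimal sketch that assumes an exponential lifetime with an illustrative rate lam = 0.1; the distribution, the rate and the function names are assumptions made for this example and are not taken from the text. It evaluates R(t) once as 1 - F(t) and once by integrating the pdf from t onwards, and the two values should agree closely.

```python
import math

def reliability_exponential(t, lam):
    """R(t) = P[T > t] = 1 - F(t) for an exponential lifetime with rate lam."""
    cdf = 1.0 - math.exp(-lam * t)      # F(t) = P[T <= t]
    return 1.0 - cdf

def reliability_by_integration(t, lam, upper=200.0, steps=20_000):
    """Approximate R(t) = integral of f(x) dx from t to infinity.

    The infinite upper limit is truncated at `upper` (an assumption that is
    harmless here because the exponential tail beyond it is negligible) and
    the integral is evaluated with the trapezoidal rule.
    """
    def pdf(x):
        return lam * math.exp(-lam * x)

    h = (upper - t) / steps
    total = 0.5 * (pdf(t) + pdf(upper))
    for k in range(1, steps):
        total += pdf(t + k * h)
    return total * h

lam = 0.1                                    # illustrative failure rate
r_closed = reliability_exponential(5.0, lam)
r_numeric = reliability_by_integration(5.0, lam)
print(r_closed, r_numeric)
```

Evaluating reliability_exponential at t = 0 and at increasing values of t also illustrates properties (i) and (iii).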

In describing the reliability of a given system, it is necessary to specify

(i) the failure process,

(ii) the system configuration that describes how the system is connected and the rules of

operation and

(iii) the state in which the system is defined to be failed.

To enhance the reliability of components/systems, one needs to assess their reliability and other related measures. Furthermore, the system concept extends to service systems and supply chain systems, for which reliability and accuracy are important goals. Reliability can also be improved by redundancy, which will be discussed later in this chapter.


Instantaneous Hazard Rate (or Failure Rate)

It is defined as the conditional probability that the system fails during the time interval (t, t + Δt] given that it was operating during (0, t].

Let r(t)Δt = probability that the device has life time between t and t + Δt, given that it has functioned up to time t. Then

\[
\begin{aligned}
r(t)\,\Delta t &= P[t < T \le t + \Delta t \mid T > t] \\
&= \frac{P[t < T \le t + \Delta t]}{P[T > t]} \\
&= \frac{P[T \le t + \Delta t] - P[T \le t]}{P[T > t]} \\
&= \frac{[1 - R(t + \Delta t)] - [1 - R(t)]}{R(t)} = -\frac{R(t + \Delta t) - R(t)}{R(t)}.
\end{aligned}
\]

Now, the instantaneous failure rate or hazard rate r(t) at time t is defined as

\[ r(t) = \lim_{\Delta t \to 0} \left[ -\frac{R(t + \Delta t) - R(t)}{\Delta t \, R(t)} \right] = -\frac{R'(t)}{R(t)} = \frac{f(t)}{R(t)}, \]

where f(t) is the p.d.f. of the device life time.

It can be seen that

\[ F(t) = \int_0^t f(u)\,du, \qquad R(t) = \exp\left[-\int_0^t r(u)\,du\right], \]

\[ f(t) = r(t)\,\exp\left[-\int_0^t r(u)\,du\right]. \]
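
The relations between f(t), R(t) and r(t) can be illustrated with a short sketch. It assumes a Weibull lifetime with shape 2 and scale 10; these parameter values and the function names are illustrative assumptions only. The hazard rate is computed as f(t)/R(t), and R(t) is then recovered from the hazard rate through the exponential of its integral.

```python
import math

def weibull_pdf(t, shape, scale):
    """f(t) for a Weibull lifetime."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_reliability(t, shape, scale):
    """R(t) = exp[-(t/scale)^shape] for a Weibull lifetime."""
    return math.exp(-((t / scale) ** shape))

def hazard_rate(t, shape, scale):
    """r(t) = f(t) / R(t)."""
    return weibull_pdf(t, shape, scale) / weibull_reliability(t, shape, scale)

def reliability_from_hazard(t, shape, scale, steps=10_000):
    """R(t) = exp[-integral_0^t r(u) du], using the midpoint rule."""
    h = t / steps
    integral = h * sum(hazard_rate((k + 0.5) * h, shape, scale)
                       for k in range(steps))
    return math.exp(-integral)

shape, scale = 2.0, 10.0                      # illustrative values
direct = weibull_reliability(5.0, shape, scale)
via_hazard = reliability_from_hazard(5.0, shape, scale)
print(direct, via_hazard)
```

With shape 2 the hazard rate grows linearly in t, so the midpoint rule integrates it essentially exactly and the two values of R(5) coincide up to rounding.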

Reliability Modeling

For the theoretical study of reliability and for effective and efficient reliability analysis of applied problems, we need mathematical models. The correct choice of a mathematical model which represents the essential features of an applied problem is of vital importance for applying the theory of reliability successfully. Mathematical models may be either deterministic or probabilistic (stochastic). If the effect of any change in the system can be predicted with certainty, the model is said to be deterministic. In practice, any prediction involves uncertainty. This uncertainty can be accommodated by introducing a random variable having some probability distribution in place of a deterministic variable. Such a model is known as a stochastic model.

To compute the reliability of an item, it is first necessary to define what constitutes its failure. For an item, it is important to list the properties that it must possess in the course of its usage. A deviation in the properties from the prescribed condition is considered a fault, and a state of fault is known as 'failure'. An item is considered to have failed under one of the following conditions:

(i) When it becomes completely inoperable due to any reason.

(ii) When it is still operable, but is no longer able to perform as required, for example, a 12-volt battery providing 3 volts instead of 12.

(iii) When a sudden serious deterioration makes the item unsafe for its further use.

(iv) When the item is operative but provides the wrong result (unwanted operations).

Most systems pass through three phases with different failure rates during their life time, described as follows:

(i) Initial Failure

In the beginning, defective design or manufacturing of a unit/system may lead to a high failure rate and is the significant cause of failure. These failures may be eliminated by operating the item for several hours and replacing the failed components with tested, good components. The concept of warranty of an item is based on initial failure.

(ii) Random Failure (Chance Failure)

Once the defects found in the first phase are corrected, the failure rate drops to a steady-state level for some period of time. During this second phase of the life cycle we experience a constant failure rate, and any failure is due to chance. This is called the 'useful life period' of the item. The effect of this type of failure can be minimized by duplicating the components (also referred to as redundancy).

(iii) Wear-Out Failure

At the final phase of operation, the failure rate rises again as the system suffers from the cumulative effect of dust, vibration, abuse, temperature extremes and many other environmental maladies; that is, the system begins to wear out. The effect of 'wear and tear' can be reduced by proper maintenance of the item.

[Figure: frequency of failure plotted against failure time, showing the early failure period, the random failure period and the wear-out failure period.]

All three phases of failure are shown in Fig. 1.1. The curve shown in the figure is known as the “Bath Tub Curve” and can be represented by the Weibull distribution. From the actuarial point of view, the failure phenomenon in an item is closely analogous to the mortality phenomenon in human beings, as shown below:

Phase   Cause for System Failure                                         Cause for Human Death

1       Original defect (defective design, manufacturing and assembly)   Birth defect

2       Random (chance)                                                  Accident

3       Wear and tear (wear-out)                                         Age factor

Fig. 1.1


System Configurations

By a system, we mean an arbitrary device made up of parts and components assembled to perform a certain function. The reliabilities of these components are assumed to be known, which helps predict the reliability of the whole system. The reliabilities of all the subsystems taken together determine what is known as the system reliability. For this purpose the system structure must be known. The following system structures have been considered:

a) Series Configuration

A system having n units is said to have a series configuration if the failure of an arbitrary unit (say the ith unit) causes the failure of the entire system. Examples of series configurations are:

i) The aircraft electronic system consists of mainly a sensor subsystem, a guidance

subsystem, computer subsystem and the fire control subsystem. This system can only

operate successfully if all these operate simultaneously.

ii) Traditional Deepawali or Christmas decorative bulbs, where if one bulb fails the whole string fails. The block diagram of a series system configuration is shown as follows:

Fig. 1.2. Series Configuration

Let Ri(t) be the reliability of the ith component. Then, assuming the units fail independently, the system reliability is given by

\[ R(t) = \Pr[T > t] = \Pr[\min(T_1, T_2, T_3, \ldots, T_n) > t] = \prod_{i=1}^{n} P[T_i > t] = \prod_{i=1}^{n} R_i(t), \]

where Ti is the life time of the ith unit of the system.

The system hazard rate, therefore, is

\[ r(t) = \sum_{i=1}^{n} r_i(t), \]

where ri(t) is the instantaneous failure rate of the ith unit.
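
As a small illustration of the series formulas, the sketch below assumes three independent units with constant failure rates; the rates, the time point and the function names are hypothetical values chosen for the example. For such units the product of the unit reliabilities equals the exponential of minus the summed rates times t, which is what the hazard-rate formula predicts.

```python
import math

def series_reliability(unit_reliabilities):
    """R(t) = product of R_i(t) for independent units in series."""
    return math.prod(unit_reliabilities)

def series_hazard(unit_hazards):
    """r(t) = sum of r_i(t) for units in series."""
    return sum(unit_hazards)

rates = [0.01, 0.02, 0.005]     # illustrative constant failure rates per hour
t = 10.0                        # illustrative mission time in hours

unit_r = [math.exp(-lam * t) for lam in rates]   # R_i(t) = exp(-lam_i * t)
r_system = series_reliability(unit_r)
h_system = series_hazard(rates)
print(r_system, math.exp(-h_system * t))
```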


b) Parallel Configuration

In this configuration, all the units in a system are connected in parallel, i.e. the system fails only when all of its units fail. An example is a four-engined aircraft which is still able to fly with only two engines working. A block diagram representing a parallel configuration is shown in Fig. 1.3.

Fig.1.3. Parallel Configuration

Suppose Ri(t) and Ti are the reliability and the life time of the ith unit, respectively. Then the system reliability is given by

\[
\begin{aligned}
R(t) = \Pr[T > t] &= \Pr[\max(T_1, T_2, T_3, \ldots, T_n) > t] \\
&= 1 - P[\max(T_1, T_2, T_3, \ldots, T_n) \le t] \\
&= 1 - P[T_1 \le t,\ T_2 \le t,\ T_3 \le t,\ \ldots,\ T_n \le t].
\end{aligned}
\]

If the units function independently, then

\[ R(t) = 1 - [1 - R_1(t)]\,[1 - R_2(t)]\,[1 - R_3(t)] \cdots [1 - R_n(t)] = 1 - \prod_{i=1}^{n} [1 - R_i(t)]. \]
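
The parallel formula for independent units can be sketched in a few lines. The unit reliabilities used here are illustrative numbers and the function name is an assumption made for the example.

```python
def parallel_reliability(unit_reliabilities):
    """R(t) = 1 - product of [1 - R_i(t)] for independent units in parallel."""
    all_fail = 1.0
    for r in unit_reliabilities:
        all_fail *= (1.0 - r)      # probability that every unit so far has failed
    return 1.0 - all_fail

two_units = parallel_reliability([0.9, 0.8])   # illustrative unit reliabilities
print(two_units)
```

Adding a unit in parallel can only raise the system reliability, whereas adding a unit in series can only lower it.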

c) Standby Redundant Configuration

Redundancy is a means of improving the reliability of a system. In a redundant system, more units are made available than are strictly necessary. There are two types of redundancy:

(a) Active Redundancy

(b) Passive redundancy


(a) Active Redundancy

In this type of redundancy, a standby unit has a positive probability of failure even when it is not in operation. This may happen due to the effect of temperature, environmental conditions etc. Active redundancy can further be classified as hot redundancy and warm redundancy:

(i) If the off-line unit can fail and is loaded in exactly the same way as the operating unit, it is called a hot standby unit.

(ii) If the off-line unit can fail but carries a diminished load, it is called a warm standby unit. The probability of failure of a warm standby unit is less than that of an operative unit.

(b) Passive or Cold Standby Redundancy

This is that form of redundancy in which the off-line unit cannot fail and is completely

unloaded.

The reliability R(t) of an n-unit standby system at any time instant t is given by

\[ R(t) = P\left[\sum_{i=1}^{n} T_i > t\right], \]

where Ti is the life time of the ith unit and all the n units are independent.

Fig.1.4. Standby redundant configuration


A standby system functions as long as one of the units is available for the task on hand.

A block diagram of such a system is shown as in Fig. 1.4.
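
For a concrete instance of the standby formula, the sketch below makes additional assumptions that are not stated in the text: the n units are identical, each has an exponential life time with rate lam, standby units cannot fail (cold standby) and switching is instantaneous and perfect. Under these assumptions the sum of the n life times follows an Erlang distribution, which gives a closed form for R(t). The function name and the numbers are illustrative.

```python
import math

def cold_standby_reliability(n, lam, t):
    """P[T_1 + ... + T_n > t] for n i.i.d. exponential(lam) life times.

    The sum is Erlang(n, lam), so
    R(t) = exp(-lam*t) * sum_{j=0}^{n-1} (lam*t)^j / j!.
    """
    x = lam * t
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(n))

lam, t = 0.1, 10.0                       # illustrative rate and mission time
single = cold_standby_reliability(1, lam, t)
with_one_standby = cold_standby_reliability(2, lam, t)
print(single, with_one_standby)
```

With these numbers one cold standby unit roughly doubles the reliability of a single unit at t = 10.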

(d) k-out-of-n configuration

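In many problems the system operates if at least k out of its n units function, e.g. a bridge supported by n cables, k of which are necessary to support the maximum load. If the n units are independent and identical, each with the same reliability e^{-λt} (λ being the constant failure rate of a unit), then the system reliability becomes

\[ R(t) = \sum_{i=k}^{n} \binom{n}{i}\, e^{-i\lambda t}\, \left(1 - e^{-\lambda t}\right)^{n-i}. \]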

There exist many other configurations, such as series-parallel, parallel-series and mixed parallel, which are used in industry.
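
The k-out-of-n formula can be evaluated directly. The sketch below assumes independent, identical units with a constant failure rate lam; the function name and the numerical values are illustrative assumptions. Setting k = 1 reproduces the parallel configuration and k = n the series configuration, which is a useful check.

```python
import math

def k_out_of_n_reliability(k, n, lam, t):
    """Probability that at least k of n identical exponential units survive to t."""
    p = math.exp(-lam * t)          # reliability of a single unit at time t
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))

lam, t = 0.1, 5.0                                  # illustrative values
p = math.exp(-lam * t)
parallel_case = k_out_of_n_reliability(1, 3, lam, t)   # 1-out-of-3
series_case = k_out_of_n_reliability(3, 3, lam, t)     # 3-out-of-3
two_of_three = k_out_of_n_reliability(2, 3, lam, t)
print(parallel_case, two_of_three, series_case)
```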

Stochastic Processes

A stochastic process is a family of random variables indexed by a parameter set and taking values in another set known as the state space. Both the parameter set and the state space can be either discrete or continuous.

In a stochastic process {X(t), t ∈ T}, X(t) is the state of the process, t is the parameter (generally taken to be time) and T is the index set. If T is a countable set such as T = {0, 1, 2, 3, …}, the stochastic process is said to be a discrete parameter process, and if T = {t : −∞ < t < ∞} or T = {t : t ≥ 0}, it is said to be a continuous parameter process. The state space is classified as discrete or continuous according to whether it is countable or consists of an interval on the real line. In the present study, we deal with discrete state space, continuous parameter stochastic processes.

Markov Process

A stochastic process is known as a Markov process if its future development is completely determined by the present state and is independent of the way in which the present state was reached. If {X(t), t ∈ T} is a stochastic process such that, given the value of X(s), the values of X(t), t > s, do not depend on the values of X(u), u < s, i.e. for t > s and any state i,

\[ \Pr[X(t) = i \mid X(u),\ 0 \le u \le s] = \Pr[X(t) = i \mid X(s)], \]

then the process {X(t), t ∈ T} is a Markov process.


Stochastic processes which do not possess the Markovian property are said to be non-

Markovian.

Markov Chain

A Markov process with discrete state space is said to be a Markov chain. Mathematically, a stochastic process {Xn ; n = 0, 1, 2, …} is called a Markov chain if, for j, k, j1, j2, …, jn−1 ∈ N,

\[ \Pr[X_n = k \mid X_{n-1} = j,\ X_{n-2} = j_1,\ \ldots,\ X_0 = j_{n-1}] = \Pr[X_n = k \mid X_{n-1} = j] = p_{jk}. \]

If the transition probabilities pjk are independent of n, the Markov chain is said to be homogeneous, and if they depend on n the chain is said to be non-homogeneous.
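
A homogeneous Markov chain is fully described by its matrix of transition probabilities pjk. The sketch below uses a hypothetical two-state chain (state 0 read as "operating", state 1 as "failed"); the matrix entries and the function names are assumptions chosen for illustration. It propagates an initial distribution through repeated transitions.

```python
def step(distribution, P):
    """One transition: new_k = sum over j of distribution_j * p_jk."""
    size = len(P)
    return [sum(distribution[j] * P[j][k] for j in range(size))
            for k in range(size)]

def distribution_after(n, initial, P):
    """State distribution after n transitions of a homogeneous chain."""
    dist = list(initial)
    for _ in range(n):
        dist = step(dist, P)
    return dist

# Hypothetical transition probabilities; each row sums to 1.
P = [[0.9, 0.1],
     [0.6, 0.4]]

two_steps = distribution_after(2, [1.0, 0.0], P)
long_run = distribution_after(50, [1.0, 0.0], P)
print(two_steps, long_run)
```

Because the same matrix P is applied at every step, the chain in this sketch is homogeneous; a non-homogeneous chain would need a different matrix for each n.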

Renewal Process

Suppose we have a repairable system which starts operation at $t = 0$. If $X_1$ denotes the time to first failure and $Y_1$ denotes the time from first failure to next system operation (after repair), then $t_1 = X_1 + Y_1$ denotes the time of first renewal. Similarly, if $X_2$ denotes the time from first renewal to second failure and $Y_2$ denotes the time from second failure to second renewal, then $t_2 = X_2 + Y_2$ and the time of second renewal is $t_1 + t_2$. In general, $t_i = X_i + Y_i$ (the inter-arrival time) is the time between the $(i-1)$th and $i$th renewals, for $i = 1, 2, 3, \ldots$. If we define

\[ S_0 = 0, \qquad S_n = t_1 + t_2 + \cdots + t_n = \text{epoch of the } n\text{th renewal}, \]

and $N(t)$ = number of renewals during $(0, t]$, then the process $\{N(t),\ t > 0\}$ is called a renewal process.
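The construction above can be sketched in a few lines of Python. The failure and repair times used here are hypothetical figures chosen only for illustration, and the function names are assumptions.

```python
from bisect import bisect_right
from itertools import accumulate


def renewal_epochs(up_times, repair_times):
    """Return [S_1, S_2, ...], where S_n = t_1 + ... + t_n and t_i = X_i + Y_i."""
    cycles = [x + y for x, y in zip(up_times, repair_times)]
    return list(accumulate(cycles))


def renewals_by(t, epochs):
    """N(t): number of renewal epochs S_n lying in (0, t]."""
    return bisect_right(epochs, t)


# Hypothetical data (assumed): three failure/repair cycles.
X = [3.0, 2.0, 4.0]   # times to failure X_1, X_2, X_3
Y = [1.0, 1.0, 2.0]   # repair times Y_1, Y_2, Y_3
S = renewal_epochs(X, Y)
print(S, renewals_by(5.0, S), renewals_by(7.0, S))
```

Since the interval (0, t] is closed on the right, a renewal occurring exactly at t is counted, which is why `bisect_right` is used.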

Markov Renewal Process

Let the states of a process be denoted by the set $E = \{0, 1, 2, \ldots\}$, and let the transitions of the process occur at epochs $t_0\ (= 0), t_1, t_2, \ldots, t_n$ $(t_n < t_{n+1})$. If

\[ \Pr\{X_{n+1} = k,\ t_{n+1} - t_n \le t \mid X_0 = i_0, \ldots, X_n = i_n;\ t_0, t_1, \ldots, t_n\} = \Pr\{X_{n+1} = k,\ t_{n+1} - t_n \le t \mid X_n = i_n\}, \]

then $\{X_n, t_n\}$, $n = 0, 1, 2, \ldots$, constitutes a Markov renewal process with state space $E$.

Semi-Markov Process

In the above, if we assume that the process is time homogeneous, i.e.

\[ \Pr\{X_{n+1} = j,\ t_{n+1} - t_n \le t \mid X_n = i\} = Q_{ij}(t), \qquad i, j \in E, \]

is independent of $n$, then there exist limiting transition probabilities

\[ p_{ij} = \lim_{t \to \infty} Q_{ij}(t) = \Pr\{X_{n+1} = j \mid X_n = i\}. \]

Then $\{X_n,\ n = 0, 1, 2, \ldots\}$ constitutes a Markov chain with state space $E$ and transition probability matrix (t.p.m.) given by

\[ P = [p_{ij}]. \]

The continuous parameter stochastic process $Y(t)$ with state space $E$ defined by

\[ Y(t) = X_n, \qquad t_n \le t < t_{n+1}, \]

is called a semi-Markov process.

In other words, a semi-Markov process is a process in which the transition from one state to another is governed by the transition probabilities of a Markov process, but the time spent in each state before a transition occurs is a random variable depending upon the last transition made. Thus, at the transition instants, the semi-Markov process behaves just like a Markov process. However, the times at which transitions occur are governed by a different probability mechanism.

Regenerative Process

The regenerative stochastic process was defined by Smith (1955) and has been crucial in the analysis of complex systems. Here we consider time points at which the history of the system prior to the time point is irrelevant to the system conditions from that point onwards. These points are called regeneration points. Let $X(t)$ be the state of the system at epoch $t$. If $t_1, t_2, \ldots$ are the epochs at which the process probabilistically restarts, then these epochs are called regenerative epochs and the process $\{X(t),\ t = t_1, t_2, \ldots\}$ is called a regenerative process.

Supplementary Variable Technique

This technique was developed by Cox (1955); in it, the process is made Markovian by introducing one or more supplementary variables. The technique can briefly be explained as under:

Consider a complex system in which repair times follow a general distribution. At a particular instant $t$, the system can be either in the operational state or in the failed state. If the system is in the failed state at time $t$, the probability of transition to the operable state cannot be determined unless the elapsed repair time at that time $t$ is specified. A supplementary variable, say $x$, representing the elapsed repair time of the failed unit is therefore introduced, and the state probability is defined as the probability that at time $t$ the system is in the failed state and the elapsed repair time lies in the interval $(x, x + \Delta)$. Thus the process becomes Markovian in nature. It is to be noted that such a supplementary variable automatically disappears at the solution stage.

Transforms and Convolutions

(a) Laplace Transform

Let $f(t)$ be a function of a positive real variable $t$. Then the Laplace transform (L.T.) of $f(t)$ is defined as

\[ L[f(t)] = f^*(s) = \int_0^\infty e^{-st} f(t)\,dt \]

for the range of the values of $s$ for which the integral exists. Here, $f(t)$ is called the inverse Laplace transform of $f^*(s)$ and we write $f(t) = L^{-1}\{f^*(s)\}$. The following are some important properties of the Laplace transform:

(i) $L\left[\sum_{i=1}^{n} c_i f_i(t)\right] = \sum_{i=1}^{n} c_i f_i^*(s)$

(ii) $L[t^n f(t)] = (-1)^n \dfrac{d^n f^*(s)}{ds^n}$

(iii) $L\left[\int_0^t f(u)\,du\right] = L[F(t)] = \dfrac{f^*(s)}{s}$

(iv) $\lim_{t \to 0} f(t) = \lim_{s \to \infty} s f^*(s)$ (initial value theorem)

(v) $\lim_{t \to \infty} f(t) = \lim_{s \to 0} s f^*(s)$ (final value theorem)

(vi) $\lim_{s \to 0} f^*(s) = 1$ if $f^*(s)$ is the L.T. of a p.d.f.
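The definition and property (vi) can be checked numerically. The sketch below is only an illustration under stated assumptions: it approximates the Laplace transform by composite Simpson's rule on a truncated range, and it uses an exponential density with an assumed rate of 2, for which the transform is known to be $\lambda/(\lambda + s)$. The function name and the truncation point are hypothetical choices.

```python
import math


def laplace_transform(f, s, upper=50.0, steps=20000):
    """Approximate the integral of exp(-s*t) * f(t) over [0, upper].

    Composite Simpson's rule with an even number of steps; the tail beyond
    `upper` is assumed to be negligible for the function supplied.
    """
    h = upper / steps
    total = f(0.0) + math.exp(-s * upper) * f(upper)
    for k in range(1, steps):
        t = k * h
        weight = 4 if k % 2 else 2
        total += weight * math.exp(-s * t) * f(t)
    return total * h / 3.0


lam = 2.0  # assumed rate of the exponential density


def density(t):
    """Exponential p.d.f. with rate lam."""
    return lam * math.exp(-lam * t)


print(laplace_transform(density, 1.0), lam / (lam + 1.0))  # numerical vs closed form
print(laplace_transform(density, 1e-9))                    # close to 1, property (vi)
```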

(b) Laplace Stieltjes Transform

Let X be a non-negative random variable with distribution function

\[ F(x) = \Pr[X \le x]; \]

then the Laplace Stieltjes transform (L.S.T.) of $F(x)$ is defined, for $s > 0$, by

\[ F^{**}(s) = \int_0^\infty e^{-sx}\,dF(x). \]

Therefore, we have

\[ F^{**}(s) = \int_0^\infty e^{-sx} f(x)\,dx = f^*(s), \]

where $f(x) = \dfrac{dF(x)}{dx}$.

Convolution

Let $f(t)$ and $g(t)$ be two real valued non-negative continuous functions of $t$; then the integral

\[ \int_0^t f(t-u)\, g(u)\,du = \int_0^t g(t-u)\, f(u)\,du = f(t) \ast g(t) = L^{-1}[f^*(s)\, g^*(s)] \]

is called the Laplace convolution of the functions $f(t)$ and $g(t)$.

If $F(t)$ and $G(t)$ are two real valued distribution functions defined for $t \ge 0$, the resulting convolution is again a distribution function, and the integral

\[ \int_0^t F(t-u)\,dG(u) = \int_0^t G(t-u)\,dF(u) = F(t) \star G(t) \]

is known as the Stieltjes convolution of $F(t)$ and $G(t)$.

First Passage Time

Suppose that a system starts in a state j; then the time taken to reach a given state k for the first time from state j is called the first passage time. In general, the first passage time is a measure of how long it takes to reach a given state from another state.

Mean Sojourn Time in a State

The expected time taken by the system in a particular state before transiting to any other state is known as the mean sojourn time or mean survival time in that state. If $T_i$ is the sojourn time in state $i$, then the mean sojourn time in state $i$ is

\[ \mu_i = \int_0^\infty P(T_i > t)\,dt. \]

Mean Time to System Failure (MTSF)

No system can operate indefinitely in the same manner, owing to the aging of components and other causes. One must, therefore, be interested in a measure representing the lifetime of the system so as to avoid sudden failure. Such a measure is the Mean Time to System Failure (MTSF), which corresponds to the average duration between successive system failures. This measure is defined as the expected time for which the system is in operation before it completely fails.

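Suppose the reliability function for a system is given by $R(t) = 1 - F(t)$, where $F(t)$ is the failure time distribution function and $f(t) = dF(t)/dt$ is the failure time density function. The mean time to system failure is given by

\[ \text{MTSF} = \int_0^\infty t f(t)\,dt = -\int_0^\infty t\, \frac{dR(t)}{dt}\,dt = -\bigl[ t R(t) \bigr]_0^\infty + \int_0^\infty R(t)\,dt = \int_0^\infty R(t)\,dt = \lim_{s \to 0} R^*(s), \]

where the bracketed term vanishes provided $t R(t) \to 0$ as $t \to \infty$.

Let $\phi_0(t)$ be the cumulative distribution function of the first passage time from the initial state to a failed state; then

\[ R^*(s) = \frac{1 - \phi_0^{**}(s)}{s}. \]

Thus, we have

\[ \text{MTSF} = \lim_{s \to 0} \frac{1 - \phi_0^{**}(s)}{s}. \]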
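The two routes to the MTSF given above can be compared on a simple case. The sketch below assumes, purely for illustration, a non-repairable system of two identical units in parallel, each with constant failure rate 0.5, for which $R(t) = 2e^{-\lambda t} - e^{-2\lambda t}$ and the MTSF is known to be $3/(2\lambda)$. This model, the rate and all function names are assumptions and are not among the models studied in the thesis.

```python
import math


def reliability(t, lam):
    """R(t) for two identical units in parallel, each with failure rate lam."""
    return 2.0 * math.exp(-lam * t) - math.exp(-2.0 * lam * t)


def mtsf_by_integration(lam, upper=200.0, steps=200000):
    """Trapezoidal approximation of the integral of R(t) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(upper, lam))
    total += sum(reliability(k * h, lam) for k in range(1, steps))
    return total * h


def first_passage_lst(s, lam):
    """L.S.T. of the first passage time to the failed state for the same model."""
    return 2.0 * lam / (s + lam) - 2.0 * lam / (s + 2.0 * lam)


def mtsf_by_transform(lam, s=1e-6):
    """(1 - L.S.T.(s)) / s at a small s, standing in for the limit s -> 0."""
    return (1.0 - first_passage_lst(s, lam)) / s


lam = 0.5  # assumed failure rate per unit time
print(mtsf_by_integration(lam), mtsf_by_transform(lam), 3.0 / (2.0 * lam))
```

In the analytical work the limit is taken exactly; evaluating the ratio at a small positive s is only a numerical stand-in for that limit.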

Availability

When a system becomes unavailable owing to breakdowns, it is put back into operation after proper repair. In practice, one is as much concerned with availability as with reliability, because of the additional costs and inconvenience incurred when the system is not available. The differences between the measures reliability and availability are given as follows:

(i) The reliability is an interval function while the availability is a point function

describing the behaviour of the system at a specified epoch.

(ii) The reliability function precludes the failure of the system during the interval under

consideration, while availability function does not impose any such restriction on the

behaviour of the system.

We may categorize availability as :

(i) Instantaneous (Point wise) Availability

This is the probability that the system will be able to operate within the tolerances at a

given instant of time and is also called operational readiness.

Let X(t) = 1, if the system is operable at time t; and X(t) = 0, when it is not operable. The

availability A(t) of the system at time t is given by

A(t) = P[X(t) = 1| X(0) = 1].

Here, X(t) is a binary variable taking the values 1 and 0, respectively, for the operation and non-operation of the system at an instant t.

(ii) Average (Interval) Availability

It is the expected fraction of a given interval of time that the system will be able to

operate within tolerances. It is also called the efficiency of the system and its limiting value is the

inherent availability.

Suppose the given interval of time is $(0, T]$. Then the interval availability $H(0, T]$ for this interval is given by

\[ H(0, T] = \frac{1}{T} \int_0^T A(t)\,dt. \]

(iii) Steady State (Limiting Interval) Availability

It is defined as the probability that, in the long run, the system operates satisfactorily. To obtain the steady state availability, we simply compute

\[ A(\infty) = \lim_{T \to \infty} H(0, T] = \lim_{T \to \infty} A(T). \]
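The three categories of availability can be illustrated on the simplest repairable model. The sketch below assumes a single unit with constant failure rate lam and constant repair rate mu that is operable at t = 0, for which the point availability has the well-known closed form $A(t) = \mu/(\lambda+\mu) + [\lambda/(\lambda+\mu)]\,e^{-(\lambda+\mu)t}$. The model, the numerical rates and the function names are assumptions introduced only for this example.

```python
import math


def point_availability(t, lam, mu):
    """A(t) for one unit with failure rate lam and repair rate mu, up at t = 0."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def interval_availability(T, lam, mu):
    """(1/T) * integral of A(t) over (0, T], written in closed form."""
    total = lam + mu
    return mu / total + lam * (1.0 - math.exp(-total * T)) / (total ** 2 * T)


def steady_state_availability(lam, mu):
    """Common limit of the point and interval availability."""
    return mu / (lam + mu)


lam, mu = 0.01, 0.5  # assumed rates per hour
print(point_availability(10.0, lam, mu))
print(interval_availability(10.0, lam, mu))
print(steady_state_availability(lam, mu))
```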

Maintainability

Maintainability is associated with a system under repair. It is the probability that the

system will be restored to operational effectiveness within a specified time when the

maintenance action is taken in accordance with prescribed conditions. Maintenance is one of the

effective ways of increasing the reliability of a system. Maintenance of a system is of two types:

(i) Preventive maintenance (PM)

(ii) Corrective maintenance (CM)

PM includes actions such as lubrications, replacement of a nut or a screw or some part of

the system, refueling, cleaning, etc., while CM involves minor repairs that may crop up between

inspections.

On failure of a unit, it is sent to a repair facility, if available, otherwise it queues up for

repair. There may be two types of repair policies as follows:

(i) Repeat Repair Policy

For some reason, the repair of a failed unit has to be stopped. When the repair is begun again, it is started afresh.

(ii) Resume Repair Policy

The repair of a failed component is terminated before completion due to one reason or the

other. When it begins again, it is started from the stage where it was prior to the termination of

the repair.

Busy Period

Let B(t) be the probability that a repairman is busy with the system in the interval (0,t].

Then in the long run, the total fraction of time for which a repairman is busy, is given by

\[ B = \lim_{t \to \infty} B(t). \]

Down Period

Let D(t) be the probability that system is down due to unavailability of the required

number of operable units for the system in the interval (0,t]. Then in the long run, the total

fraction of time for which the system is down, is given by

\[ D = \lim_{t \to \infty} D(t). \]

Expected Number of Visits by the Repairman to the System

Let V(t) be a random variable representing the number of times a repairman has visited

the system in the interval (0,t] then the expected number of visits by the repairman to the system

in (0,t] is E[V(t)] and in the long run, the expected number of visits per unit time is given by

\[ V = \lim_{t \to \infty} \frac{E[V(t)]}{t}. \]

Profit Analysis

No organization can survive for long without a minimum financial return on its investment. Therefore, profit analysis is an important aspect in the field of reliability. The profit of a system depends upon various factors, for instance, the costs of production, maintenance and spares, the failure rates, the repairmen employed, the cost of calling the repairman, etc. Availability of the system leads to revenue, whereas the busy period of the repairman for inspection, the busy period of the repairman for repair, the number of visits by the repairman and the down time of the system lead to costs/losses.

The profit is the excess of revenue over the cost of production. The profit function takes the form:

\[ P(t) = \text{expected revenue in } (0, t] - \text{expected total cost in } (0, t]. \]

In general, the optimal policies can more easily be derived for an infinite time span as compared to a finite span. The profit per unit time is expressed as

\[ \lim_{t \to \infty} \frac{P(t)}{t}, \]

i.e. profit per unit time = total revenue per unit time − total cost per unit time.

Let us, for example, consider a system which involves only the following revenue and costs:

C0 = revenue per unit up time of the system.

C1 = cost per unit time for which the repairman is busy.

C2 = cost per visit of the repairman.

C3 = cost per unit down time.

Let A = the total fraction of time for which the system is up.

B = the total fraction of time for which the repairman is busy.

V = expected number of visits of the repairman per unit time.

D = the total fraction of time for which the system is down.

Then the expected profit per unit time in steady state is given by

\[ P = C_0 A - C_1 B - C_2 V - C_3 D. \]
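The steady-state profit expression translates directly into code. In the sketch below the function name and all numerical values of the costs and of the measures A, B, V and D are hypothetical; they are not the estimates obtained from the plants visited.

```python
def expected_profit(c0, c1, c2, c3, a, b, v, d):
    """Steady-state profit per unit time: P = C0*A - C1*B - C2*V - C3*D."""
    return c0 * a - c1 * b - c2 * v - c3 * d


# Hypothetical values (assumed): revenue/costs and long-run measures.
profit = expected_profit(c0=1000.0, c1=100.0, c2=50.0, c3=200.0,
                         a=0.9, b=0.2, v=0.05, d=0.1)
print(profit)
```

Keeping the measures as arguments makes it straightforward to tabulate the profit against any one rate or cost while the others are held fixed, which is how the graphical studies described later are organised.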

Let us now discuss some important continuous distributions which are used for

failure/repair times of various systems/components.

Some Important Continuous Distributions

Data on fatigue failure of materials and life length of systems/components are fitted to

variety of distributions. However, failure/repair times of the systems/components usually follow

one of the following distributions:

Exponential Distribution

A continuous random variable having the range $0 \le t < \infty$ is said to have an exponential distribution if it has the probability density function of the form

\[ f(t) = \begin{cases} \lambda e^{-\lambda t}, & 0 \le t < \infty, \\ 0, & t < 0, \end{cases} \]

where $\lambda$ is a positive constant. The corresponding distribution function is

\[ F(t) = \begin{cases} 1 - e^{-\lambda t}, & 0 \le t < \infty, \\ 0, & t < 0. \end{cases} \]

The hazard rate $\lambda$ is constant. The Laplace transform of the p.d.f. of the exponential distribution is $\lambda/(\lambda + s)$.

The exponential distribution plays an important role in reliability studies. Besides a number of mathematical properties, it has a very important property known as the 'memoryless property'. For example, the future life distribution of an electric fuse (assuming it cannot melt partially) is practically unchanged as long as it has not yet failed.

Weibull Distribution

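A Weibull distribution has the density function defined by

\[ f(x) = a x^{b} \exp\left( -\frac{a x^{b+1}}{b+1} \right), \qquad x \ge 0. \]

Its distribution function is

\[ F(x) = 1 - \exp\left( -\frac{a x^{b+1}}{b+1} \right), \qquad x \ge 0, \]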

where a and b are positive constants and are known as “scale” and “shape” parameters

respectively.

It is evident that the exponential and Rayleigh distributions are the special cases of the

two-parameter Weibull distribution when b = 0 and b = 1 respectively.
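The two special cases can be verified numerically. The sketch below assumes the parametrisation in which the hazard rate is $a x^{b}$, so that $f(x) = a x^{b} \exp[-a x^{b+1}/(b+1)]$; the function names and the numerical values are hypothetical.

```python
import math


def weibull_pdf(x, a, b):
    """Density a * x**b * exp(-a * x**(b+1) / (b+1)) for x >= 0."""
    return a * x ** b * math.exp(-a * x ** (b + 1) / (b + 1))


def weibull_cdf(x, a, b):
    """Distribution function 1 - exp(-a * x**(b+1) / (b+1)) for x >= 0."""
    return 1.0 - math.exp(-a * x ** (b + 1) / (b + 1))


a, x = 0.3, 2.0  # assumed scale parameter and evaluation point

# b = 0 reduces to the exponential distribution with rate a.
print(weibull_cdf(x, a, 0), 1.0 - math.exp(-a * x))

# b = 1 reduces to the Rayleigh density a * x * exp(-a * x**2 / 2).
print(weibull_pdf(x, a, 1), a * x * math.exp(-a * x ** 2 / 2.0))
```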

Normal Distribution

Normal distribution is a two-parameter distribution of a continuous random variable whose probability density function has the form

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right], \qquad -\infty < x < \infty. \]

The constants $\mu$ and $\sigma > 0$ are arbitrary and represent the mean and standard deviation of the random variable.

This is the most important probability distribution for use in statistics. In reliability work

it is mostly used as a limiting form for binomial and Poisson distributions.

The Lognormal Distribution

If the random variable T, the time to failure, has a lognormal distribution, the logarithm

of T has a normal distribution. This is a very useful relationship in working with the lognormal

distribution. The density function for the lognormal is

\[ f(t) = \frac{1}{\sqrt{2\pi}\, s\, t} \exp\left[ -\frac{1}{2s^2} \left( \ln \frac{t}{t_{med}} \right)^2 \right], \qquad t \ge 0, \]

where the parameter $s$ is a shape parameter and $t_{med}$, the location parameter, is the median time to failure.

The distribution is defined for only positive values of t and is therefore more appropriate

than the normal as a failure distribution. Like the Weibull distribution, the lognormal can take on

a variety of shapes. It is frequently the case that data that fit a Weibull distribution will also fit a

lognormal distribution.

The mean, variance, and mode of the lognormal are

\[ \text{MTTF} = t_{med} \exp(s^2/2), \]

\[ \sigma^2 = t_{med}^2 \exp(s^2) \left[ \exp(s^2) - 1 \right], \]

\[ t_{mode} = \frac{t_{med}}{\exp(s^2)}. \]

To compute failure probabilities, the lognormal's relationship to the normal is utilized.
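As a brief illustration, the following sketch evaluates the mean, variance and mode of a lognormal distribution and computes a failure probability through the normal distribution, using $\Pr(T \le t) = \Phi[\ln(t/t_{med})/s]$ with $\Phi$ the standard normal distribution function. The function names and the parameter values (a median life of 1000 hours and shape parameter 0.5) are assumptions made only for this example.

```python
import math


def lognormal_measures(t_med, s):
    """Return (mean, variance, mode) for median t_med and shape parameter s."""
    mean = t_med * math.exp(s ** 2 / 2.0)
    variance = t_med ** 2 * math.exp(s ** 2) * (math.exp(s ** 2) - 1.0)
    mode = t_med / math.exp(s ** 2)
    return mean, variance, mode


def lognormal_cdf(t, t_med, s):
    """Pr(T <= t), obtained from the standard normal c.d.f. of ln(t / t_med) / s."""
    z = math.log(t / t_med) / s
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


t_med, s = 1000.0, 0.5  # assumed median life (hours) and shape parameter
print(lognormal_measures(t_med, s))
print(lognormal_cdf(500.0, t_med, s))  # probability of failure before 500 hours
```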

Goodness-of-Fit Tests

For the selection of a theoretical distribution, a statistical test for goodness of fit is

performed. Such a test compares a null hypothesis (H0) with an alternative hypothesis (H1)

having the following form:

H0: The failure times came from the specified distribution.

H1: The failure times did not come from the specified distribution.

The test consists of computing a statistic based on the sample of failure times. This

statistic is then compared with a critical value obtained from a table of such values. Generally, if

the test statistic is less than the critical value, the null hypothesis (H0) is accepted; otherwise, the

alternative hypothesis (H1) is accepted.

There are two types of goodness-of-fit tests: general tests and specific tests. A general

test is applicable to fitting more than one theoretical distribution, and a specific test is tailored to

a single distribution. When available, specific tests will be more powerful (have a higher

probability of correctly rejecting a distribution) than general tests.

Here, we shall discuss specific tests for the exponential, Weibull, normal, and lognormal

failure distributions.

Bartlett's Test for Exponential Distribution

This test is applied to test the hypothesis

H0: failure times are exponential

against HA: failure times are not exponential.

The test statistic is

\[ B = \frac{2r \left[ \log\left( \dfrac{1}{r} \sum_{i=1}^{r} t_i \right) - \dfrac{1}{r} \sum_{i=1}^{r} \log t_i \right]}{1 + (r+1)/(6r)}, \]

where $t_i$ is the $i$th time to failure and $r$ is the number of failures. Under H0, $B$ follows a chi-square distribution with $r - 1$ degrees of freedom. For the level of significance $\alpha$, if $\chi^2_{(1-\alpha/2,\ r-1)} < B < \chi^2_{(\alpha/2,\ r-1)}$, then the null hypothesis is accepted and we can say that the failure times follow an exponential distribution.
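A minimal sketch of the computation of this statistic is given below. It assumes that the logarithms are natural logarithms, and the failure times shown are hypothetical; the critical values would still have to be read from a chi-square table with r − 1 degrees of freedom.

```python
import math


def bartlett_statistic(times):
    """Bartlett's statistic B for a list of r positive failure times."""
    r = len(times)
    log_of_mean = math.log(sum(times) / r)
    mean_of_logs = sum(math.log(t) for t in times) / r
    return 2.0 * r * (log_of_mean - mean_of_logs) / (1.0 + (r + 1) / (6.0 * r))


# Hypothetical failure times in hours (assumed data).
sample = [10.0, 50.0, 120.0, 400.0, 900.0]
print(bartlett_statistic(sample), "with", len(sample) - 1, "degrees of freedom")
```

Since the arithmetic mean is never smaller than the geometric mean, B is non-negative, and it equals zero only when all the failure times coincide.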

Mann’s Test for Weibull Distribution

A specific test for the Weibull failure distribution is a test developed by Mann, Schafer, and Singpurwalla. The hypotheses are

H0: The failure times are Weibull.

H1: The failure times are not Weibull.

The test statistic is

\[ M = \frac{k_1 \sum_{i=k_1+1}^{r-1} \left[ (\ln t_{i+1} - \ln t_i)/M_i \right]}{k_2 \sum_{i=1}^{k_1} \left[ (\ln t_{i+1} - \ln t_i)/M_i \right]}, \]

where

\[ k_1 = \left\lfloor \frac{r}{2} \right\rfloor, \qquad k_2 = \left\lfloor \frac{r-1}{2} \right\rfloor, \qquad M_i = Z_{i+1} - Z_i, \qquad Z_i = \ln\left[ -\ln\left( 1 - \frac{i - 0.5}{n + 0.25} \right) \right], \]

and $\lfloor x \rfloor$ is the integer portion of the number $x$. $M_i$ is an approximation. If $M > F_{crit}$, then H1 is accepted. Values for $F_{crit}$ may be obtained from tables of the F-distribution if one lets the number of degrees of freedom for the numerator be $2k_2$ and the number of degrees of freedom for the denominator be $2k_1$.

This test is for the two-parameter Weibull distribution. Therefore, if the alternative

hypothesis is accepted, the three-parameter Weibull as well as other distributions should be

considered. Observe that the data must be rank-ordered for the test statistic to be computed.
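The computation of the statistic, including the rank-ordering step, can be sketched as follows. The sketch assumes a complete sample unless the sample size n is supplied separately, distinct failure times, and at least three failures; the data and the function name are hypothetical.

```python
import math


def mann_statistic(times, n=None):
    """Return (M, numerator d.f., denominator d.f.) for Mann's test.

    `times` holds the r observed failure times; `n` is the sample size and
    defaults to r (complete sample, an assumption of this sketch).
    """
    t = sorted(times)                 # rank-ordered times t_1 <= ... <= t_r
    r = len(t)
    n = r if n is None else n
    k1 = r // 2
    k2 = (r - 1) // 2

    def z(i):
        """Z_i for the 1-based rank i."""
        return math.log(-math.log(1.0 - (i - 0.5) / (n + 0.25)))

    def term(i):
        """(ln t_{i+1} - ln t_i) / M_i for the 1-based rank i."""
        m_i = z(i + 1) - z(i)
        return (math.log(t[i]) - math.log(t[i - 1])) / m_i

    numerator = k1 * sum(term(i) for i in range(k1 + 1, r))     # i = k1+1, ..., r-1
    denominator = k2 * sum(term(i) for i in range(1, k1 + 1))   # i = 1, ..., k1
    return numerator / denominator, 2 * k2, 2 * k1


# Hypothetical failure times in hours (assumed data).
sample = [120.0, 10.0, 35.0, 300.0, 50.0, 95.0, 20.0, 160.0, 70.0, 210.0]
print(mann_statistic(sample))
```

The value of M returned would then be compared with the tabulated F value for the two degrees of freedom returned alongside it.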

Kolmogorov-Smirnov Test for Normal and Lognormal Distributions

A goodness-of-fit test for use with the normal distribution when the parameters are

estimated is a version of the Kolmogorov-Smirnov test developed by H.W. Lilliefors. It

compares the empirical cumulative distribution function with the normal cumulative distribution

function. The hypotheses are

H0: The failure times are normal.

H1: The failure times are not normal.

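The test statistic is $D_n = \max\{D_1, D_2\}$, where

\[ D_1 = \max_{1 \le i \le n} \left\{ \Phi\left( \frac{t_i - \bar{t}}{s} \right) - \frac{i-1}{n} \right\}, \qquad D_2 = \max_{1 \le i \le n} \left\{ \frac{i}{n} - \Phi\left( \frac{t_i - \bar{t}}{s} \right) \right\}, \]

\[ \bar{t} = \sum_{i=1}^{n} \frac{t_i}{n}, \qquad s^2 = \frac{\sum_{i=1}^{n} (t_i - \bar{t})^2}{n - 1}, \]

and $\Phi$ denotes the standard normal cumulative distribution function. If $D_n < D_{crit}$, then accept H0; if $D_n > D_{crit}$, then accept H1. This test is appropriate for complete samples only.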
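The statistic can be computed as in the sketch below, which evaluates the standard normal distribution function through the error function. The data and the function names are hypothetical, and the critical value would still have to be taken from a table for this test.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_normal_statistic(times):
    """D_n = max(D_1, D_2), with mean and standard deviation estimated from the sample."""
    t = sorted(times)
    n = len(t)
    mean = sum(t) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in t) / (n - 1))
    fitted = [normal_cdf((x - mean) / s) for x in t]
    d1 = max(f - (i - 1) / n for i, f in enumerate(fitted, start=1))
    d2 = max(i / n - f for i, f in enumerate(fitted, start=1))
    return max(d1, d2)


# Hypothetical complete sample of failure times (assumed data).
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(ks_normal_statistic(sample))
```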

Glimpses of the Thesis

The present thesis entitled "Reliability and Economic Analysis of Some Models on

Gas Turbine Power Plants" is an attempt to develop the reliability models on Gas turbine

power plants considering different situations depending on the variation in demand and power

production capacity of the system, gathering information on failure times, repair times, etc. from

Gas turbine power plants, with the following objectives:

(i) To obtain the reliability, the expressions for the mean time to system failure and for various other measures of the system effectiveness.

(ii) To discuss the economic analysis of the system using various measures of system effectiveness.

(iii) To study the behaviour of the MTSF and the profit function graphically with respect to various rates, costs, revenue, etc.

(iv) To make a comparison between the models for the systems working in Gas Turbine plants studied under different situations/considerations and to identify which model is better than the other, and when.

The methodology used for the analysis is as follows:

Data/information on failure times, repair times, various costs, etc. has been gathered by visiting some Gas turbine power plants. Then some models have been developed on the basis of

the situations existing in the plants visited and some proposed situations. Reliability, Mean Time

to System Failure (MTSF) and various other measures of the system effectiveness have been

obtained by making use of semi-Markov processes and regenerative point technique. Expression

for the profit has been obtained for each of the models discussed using the obtained measures of

the system effectiveness. Computer programs using BASIC / C language were developed for

evaluating various measures of the system effectiveness and hence the profit for particular cases,

that is, for some numerical values of various rates, costs, revenue, probabilities, etc. taken on the

basis of the data/information gathered from the plants visited and assuming values for other

parameters for which the information was not provided. Then various graphs have been plotted

for the MTSF, availability, and the profit with respect to various rates, costs, revenue,

probabilities, etc. using MS Excel. Comparative study, so far as the profitability of the system

under different situations is concerned, has also been made among the models studied. The

techniques/ methods used for deriving the expressions for various measures of system

effectiveness include the Laplace/ Laplace Stieltjes Transforms and convolutions, Cramer's rule

for solving a system of equations, etc.

The present study is covered in the seven chapters of the thesis and is summarized as

follows:

Chapter 1 is introductory in nature. Origin, history and development of reliability are

covered in this chapter. It also discusses the fundamental concepts and definitions related to the

work done in the thesis to make the thesis sufficient in itself.

Chapter 2 presents the information gathered on failures and repairs of the systems

working in Gas turbine plants visited by the author. Estimates of mean failure/repair/inspection

times and hence the failure/repair/inspection rates are obtained on the basis of the information

gathered from the plants. Various costs and probabilities have also been estimated

from the gathered information. These estimated values have been used in the subsequent chapters

for making the graphical study and giving useful interpretations.

In Chapter 3, a reliability model is developed for a gas turbine power plant comprising

one gas and one steam turbine wherein scheduled inspection is done at regular intervals of time

for maintenance. Initially, both the units i.e. the gas turbine as well as the steam turbine are

operative. On failure of the gas turbine, system goes to down state, whereas on failure of the

steam turbine, the system may be kept in the up state with only gas turbine working or put to

down state according as the buyer of the power so generated is ready to pay higher amount or

not. When only the gas turbine is operative and the steam turbine is failed, this type of working

of the system is called working in the Single Cycle; whereas when both the units are operative

then it is called the Combined Cycle. Three types of scheduled inspection, that is, minor, path

and major inspection are done in this order at regular intervals of times for maintenance.

Chapter 4 investigates a model for a gas turbine power plant comprising one gas and one

steam turbine wherein random inspection is carried out instead of scheduled inspection to detect

which one of the three types of maintenance (Minor, Path or Major) needs to be done. Initially,

both the units i.e. the gas turbine as well as the steam turbine are operative. On failure of the gas

turbine, system goes to down state, whereas on failure of the steam turbine, the system may be

kept in the up state with only gas turbine working or put to down state according as the buyer of

the power so generated is ready to pay higher amount or not. When only the gas turbine is

operative and the steam turbine is failed, this type of working of the system is called working in

the Single Cycle; whereas when both the units are operative then it is called the Combined

Cycle. Inspection is done at random points of time which reveals as to which one of the three

types of maintenance is required and accordingly that type of maintenance is done.

In Chapter 5, the reliability and cost-benefit analysis of a gas turbine power plant

comprising two gas turbines and one steam turbine wherein scheduled inspection is done at

regular intervals of time for maintenance is examined. Initially, all the three units i.e. two gas

turbines as well as one steam turbine are operative and the system is considered to work at full

capacity. On failure of one of the gas turbines with steam turbine working, the system works at

reduced capacity. If both the gas turbines fail, the system goes to down state; whereas on

failure of the steam turbine, the system may be kept in the up state with one of the gas turbines

working or put to down state according as the buyer of the power so generated is ready to pay

higher amount or not and this is working in single cycle. Three types of scheduled inspection,

that is, minor, path and major inspection are done in this order at regular intervals of times for

maintenance.

Chapter 6 studies the reliability and cost-benefit analysis of a gas turbine power plant

generating system comprising two gas turbines and one steam turbine wherein random inspection

is carried out instead of scheduled inspection to detect which one of the three types of

maintenance (Minor, Path or Major) needs to be done. Initially, all the three units i.e. two gas

turbines as well as one steam turbine are operative and the system is considered to work at full

capacity. On failure of one of the gas turbines with steam turbine working, the system works at

reduced capacity. If both the gas turbines fail, the system goes to down state; whereas on

failure of the steam turbine, the system may be kept in the up state with one of the gas turbines

working or put to down state according as the buyer of the power so generated is ready to pay

higher amount or not and this is working in single cycle. Inspection is done at random points of

time which reveals as to which one of the three types of maintenance is required and accordingly

that type of maintenance is done.

In Chapter 7, the comparative study of the models studied in the preceding chapters is

made on the basis of profits evaluated for them. The logic behind the comparative study is that

no model can be best in every situation. One model may be better for a situation whereas it may

be worse for some other situation and hence the comparative study becomes more important.

Comparative analysis has been done plotting the graphs for profits of two models at a time and

also for the profits of all the studied models at a time. Interesting interpretations have been made

on the basis of the graphs which help decide which and when one model is better than the other.

In each of the four Chapters 3-6, use of semi-Markov processes and regenerative point

technique has been made for analyzing the models discussed in the thesis. Various measures of

system effectiveness such as MTSF, steady-state availability at full capacity (all the turbines

working), at reduced capacity (one of the two gas turbines and one steam turbine working) and in

single cycle (only one gas turbine working and steam turbine not working), busy period analysis

of the repair facility for repair/inspection, expected down time, expected number of visits, and

the expected profit incurred to the system have been obtained. Graphical study for particular

cases is also made for each of the models and various interesting interpretations have been made.

----- o -----
