
Research Article

Fault Detection and Diagnosis for Gas Turbines Based on a Kernelized Information Entropy Model

The Scientific World Journal, Volume 2014, Article ID 617162, 13 pages, http://dx.doi.org/10.1155/2014/617162

Weiying Wang,1,2 Zhiqiang Xu,1,2 Rui Tang,2 Shuying Li,2 and Wei Wu3

1 College of Power and Energy Engineering, Harbin Engineering University, Harbin 150001, China
2 Harbin Marine Boiler & Turbine Research Institute, Harbin 150036, China
3 Harbin Institute of Technology, Harbin 150001, China

Correspondence should be addressed to Zhiqiang Xu; xuzhiqiang@hit.edu.cn

Received 28 April 2014; Accepted 19 June 2014; Published 28 August 2014

Academic Editor: Xibei Yang

Copyright © 2014 Weiying Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Gas turbines are considered one of the most important devices in power engineering and have been widely used in power generation, airplanes, naval ships, and oil drilling platforms. However, in most cases they are monitored with no operator on duty. It is highly desirable to develop techniques and systems to remotely monitor their conditions and analyze their faults. In this work, we introduce a remote system for online condition monitoring and fault diagnosis of gas turbines on offshore oil well drilling platforms based on a kernelized information entropy model. Shannon information entropy is generalized for measuring the uniformity of exhaust temperatures, which reflects the overall state of the gas path of a gas turbine. In addition, we also extend the entropy to compute the information quantity of features in kernel spaces, which helps to select the informative features for a certain recognition task. Finally, we introduce the information entropy based decision tree algorithm to extract rules from fault samples. The experiments on some real-world data show the effectiveness of the proposed algorithms.

1. Introduction

Gas turbines, mechanical systems operating on a thermodynamic cycle, usually with air as the working fluid, are considered one of the most important devices in power engineering: the air is compressed, mixed with fuel, and burnt in a combustor, and the generated hot gas is expanded through a turbine to generate power, which is used for driving the compressor and for overcoming external loads. Gas turbines play an increasingly important role in the domains of mechanical drives in the oil and gas sectors, electricity generation in the power sector, and propulsion systems in the aerospace and marine sectors.

Safety and economy are always two fundamentally important factors in designing, producing, and operating gas turbine systems. Once a malfunction occurs to a gas turbine, a serious accident, even a disaster, may take place. It was reported that about 25 accidents take place every year due to jet engine malfunctions. In 1989, 111 people were killed in a plane crash due to an engine fault. Although great progress has been made in recent years in the area of condition monitoring and fault diagnosis, how to predict and detect malfunctions is still an open problem for these complex systems. In some cases, such as offshore oil well drilling platforms, the main power system is self-monitoring with no operator on duty, so reliability and stability are of critical importance to these systems. There are hundreds of offshore platforms in China with gas turbines providing electricity and power. There is an urgent requirement to design and develop online remote monitoring and health management techniques for these systems.

More than two hundred sensors are installed in each gas turbine to monitor its state. The data gathered by these sensors reflects the state and trend of the system. If we build a center to monitor two hundred gas turbine systems, we have to watch the data coming from more than forty thousand sensors. Obviously, it is infeasible to analyze them manually. Techniques of intelligent data analysis have therefore been employed in gas turbine monitoring and diagnosis. In 2007, Wang et al. designed a conceptual system for remote monitoring and fault diagnosis of gas turbine-based power generation systems [1]. In 2008, Donat et al. discussed the issue of data visualization, data reduction,


and ensemble learning for intelligent fault diagnosis in gas turbine engines [2]. In 2009, Li and Nilkitsaranont described a prognostic approach to estimating the remaining useful life of gas turbine engines before their next major overhaul, based on a combined regression technique with both linear and quadratic models [3]. In the same year, Bassily et al. proposed a technique which assesses whether or not the multivariate autocovariance functions of two independently sampled signals coincide, in order to detect faults in a gas turbine [4]. In 2010, Young et al. presented an offline fault diagnosis method for industrial gas turbines in a steady state using Bayesian data analysis; the authors employed multiple Bayesian models via model averaging to improve the performance of the resulting system [5]. In 2011, Yu et al. designed a sensor fault diagnosis technique for micro-gas turbine engines based on wavelet entropy, where wavelet decomposition was utilized to decompose the signal at different scales, and then the instantaneous wavelet energy entropy and instantaneous wavelet singular entropy were computed based on wavelet entropy theory [6].

In recent years, signal processing and data mining techniques have been combined to extract knowledge and build models for fault diagnosis. In 2012, Wu et al. studied the issue of bearing fault diagnosis based on multiscale permutation entropy and support vector machines [7]. In 2013, they designed a technique for defect diagnostics based on multiscale analysis and support vector machines [8]. Nozari et al. presented a model-based robust fault detection and isolation method with a hybrid structure, where time-delay multilayer perceptron models, local linear neurofuzzy models, and a linear model tree were used in the system [9]. Sarkar et al. [10] designed symbolic dynamic filtering by optimally partitioning sensor observations; the objective is to reduce the effects of sensor noise level variation and magnify the system fault signatures. Feature extraction and pattern classification are used for fault detection in aircraft gas turbine engines.

Entropy is a fundamental concept in the domains of information theory and thermodynamics. It was first defined as a measure of progress towards thermodynamic equilibrium; it was then introduced in information theory by Shannon [11] as a measure of the amount of information that is missing before reception. The concept became popular in both domains [12–16], and it is now widely used in machine learning and data-driven modeling [17, 18]. In 2011, a new measure called the maximal information coefficient was reported; this function can be used to discover the association between two random variables [19]. However, it cannot be used to compute the relevance between feature sets.

In this work, we develop techniques to detect abnormality and analyze faults based on a generalized information entropy model. Moreover, we also describe a system for state monitoring of gas turbines on offshore oil well drilling platforms. First, we describe a system developed for remote and online condition monitoring and fault diagnosis of gas turbines installed on oil drilling platforms. As a vast amount of historical records is gathered in this system, it is an urgent task to design algorithms for automatically detecting abnormality in the data online and analyzing the data to obtain the causes and sources of faults. Due to the complexity of gas turbine systems, we focus on the gas-path subsystem in this work. The entropy function is employed to measure the uniformity of exhaust temperatures, which is a key factor reflecting the health of the gas path of a gas turbine and also its performance. Then we extract features from the healthy and abnormal records. An extended information entropy model is introduced to evaluate the quality of these features for selecting informative attributes. Finally, the selected features are used to build models for automatic fault recognition, where support vector machines [20] and C4.5 are considered. Real-world data are collected to show the effectiveness of the proposed techniques.

The remainder of the work is organized as follows. Section 2 describes the architecture of the remote monitoring and fault diagnosis center for gas turbines installed on oil drilling platforms. Section 3 designs an algorithm for detecting abnormality in the exhaust temperatures. Then we extract features from the exhaust temperature data and select informative ones, based on evaluating the information bottleneck with the extended information entropy, in Section 4. Support vector machines and C4.5 are introduced for building fault diagnosis models in Section 5; numerical experiments are also described in that section. Finally, conclusions and future work are given in Section 6.

2. Framework of the Remote Monitoring and Fault Diagnosis Center for Gas Turbines

Gas turbines are widely used as power and electric power sources. The structure of a general gas turbine is presented in Figure 1. This system transforms chemical energy into thermal power, then mechanical energy, and finally electric energy. Gas turbines are usually considered the hearts of a lot of mechanical systems.

As offshore oil well drilling platforms are usually unattended, an online and remote state monitoring system is very useful in this area, since it can help find abnormality before serious faults occur. However, the sensor data cannot be sent to a center over a ground-based internet connection; the data can only be transmitted via telecommunication satellite, which was too expensive in the past but is now affordable.

The system consists of four subsystems: the data acquisition and local monitoring subsystem (DALM), the data communication subsystem (DAC), the data management subsystem (DMS), and the intelligent diagnosis system (IDS). The first subsystem gathers the outputs from different sensors and checks whether there is any abnormality in the system. The second one packs the acquired data and transmits them to the monitoring center; users in the center can also send a message to this subsystem to ask for some special data if an abnormality or fault occurs. The data management subsystem stores the historical information as well as fault data and fault cases. A data compression algorithm is embedded in the system: as most of the historical data are useless for the final analysis, they are compressed and removed to save storage space. Finally, the IDS watches the alarm information from different unit assemblies and starts the corresponding module to analyze the related information. This system gives some decisions and explains how each decision has been made. The structure of the system is shown in Figure 2.

Figure 1: Prototype structure of a gas turbine (a single-shaft gas turbine for power generation applications, e.g., the Titan 130), showing the compressor, combustor, turbine, exhaust, and output shaft with gearbox.

One of the webpages of the system is given in Figure 3, where we can see the rose figure of the exhaust temperatures; some statistical parameters varying with time are also presented.

3. Abnormality Detection in Exhaust Temperatures Based on Information Entropy

Exhaust temperature is one of the most critical parameters in a gas turbine, as excessive turbine temperatures may lead to life reduction or catastrophic failures. In the current generation of machines, temperatures at the combustor discharge are too high for the type of instrumentation available. Exhaust temperature is also used as an indicator of turbine inlet temperature.

As the temperature profile out of a gas turbine is not uniform, a number of probes help pinpoint disturbances or malfunctions in the gas turbine by highlighting shifts in the temperature profile. Thus, there are usually a set of thermometers fixed on the exhaust. If the system is operating normally, all the thermometers give similar outputs; however, if a fault occurs in some component of the turbine, different temperatures will be observed. The uniformity of the exhaust temperatures reflects the state of the system, so we should develop an index to measure this uniformity. In this work we consider the entropy function, for it is widely used in measuring the uniformity of random variables. However, to the best of our knowledge, this function has not been used in this domain.

Assume that there are $n$ thermometers and their outputs are $T_i$, $i = 1, \ldots, n$. Then we define the uniformity of these outputs as
$$E(T) = -\sum_{i=1}^{n} \frac{T_i}{T} \log_2 \frac{T_i}{T}, \tag{1}$$
where $T = \sum_{j} T_j$. As $T_i \ge 0$, we define $0 \log 0 = 0$.

Obviously, we have $\log_2 n \ge E(T) \ge 0$. $E(T) = \log_2 n$ if and only if $T_1 = T_2 = \cdots = T_n$; in this case all the thermometers produce the same output, so the uniformity of the sensors is maximal. In the other extreme case, if $T_1 = \cdots = T_{i-1} = T_{i+1} = \cdots = T_n = 0$ and $T_i = T$, then $E(T) = 0$.

It is notable that the value of the entropy is independent of the absolute values of the thermometer readings; it depends only on the distribution of the temperatures. The entropy is maximal if all the thermometers output the same value.
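As an illustration, the following minimal Python sketch computes the uniformity index of equation (1) for one scan of the 13 thermometers (the readings and the function name are our own illustrative choices, not values from the paper):

import numpy as np

def exhaust_uniformity(temps):
    # Uniformity E(T) of equation (1): E(T) = -sum_i (T_i/T) log2(T_i/T), with T = sum_j T_j
    temps = np.asarray(temps, dtype=float)
    p = temps / temps.sum()
    p = p[p > 0]                      # convention: 0 * log2(0) = 0
    return -np.sum(p * np.log2(p))

readings = [512.0, 508.0, 515.0, 510.0, 495.0, 505.0, 511.0,
            509.0, 502.0, 507.0, 513.0, 506.0, 504.0]
print(exhaust_uniformity(readings))   # close to log2(13) ≈ 3.70 when nearly uniform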

Now we show two sets of real exhaust temperatures measured on an oil well drilling platform, where 13 thermometers are fixed. In the first set, the gas turbine starts from a certain time point, then runs for several minutes, and finally the system stops.

Observing the curves in Figure 4, we can see that the 13 thermometers give almost the same outputs at the beginning. In fact, the outputs equal the room temperature in this case, as shown in Figure 6(a). Thus the entropy reaches its peak value.

Some typical samples are presented in Figure 6, where the temperature distributions around the exhaust at time points $t = 5, 130, 250, 400$, and $500$ are given. Obviously, the distributions at $t = 130, 250$, and $400$ are not desirable; it can be inferred that some abnormality has occurred in the system. The entropy of the temperature distribution is given in Figure 5.

Figure 2: Structure of the remote system for condition monitoring and fault analysis (oil drilling platform side: gas turbine, data acquisition, local monitoring, and satellite signal generator behind inner and outer firewalls; diagnosis center side: satellite signal receiver, online database subsystem, intelligent diagnosis subsystem, human-system interface, video conference system, and Internet clients).

Figure 3: A typical monitoring webpage of the subsystem.

Figure 4: Exhaust temperatures from a set of thermometers (temperature versus time).

Figure 5: Uniformity (information entropy) of the temperatures versus time (the red dashed line is the ideal case; the blue line is the real case).

Figure 6: Samples of the exhaust temperature distribution at different times: (a) rose map of exhaust temperatures at t = 5; (b) at t = 130; (c) at t = 250; (d) at t = 400; (e) at t = 500.

Another example is given in Figures 7 to 9. In this example there is a significant difference between the outputs of the 13 thermometers even when the gas turbine is not running, as shown in Figure 9(a). Thus the entropy of the temperature distribution is a little lower than in the ideal case, as shown in Figure 8. Some representative samples are also given in Figure 9.

Considering the above examples, we can see that the entropy function is an effective measure of uniformity and can be used to reflect the uniformity of the exhaust temperatures. If the uniformity is less than a threshold, some fault has possibly occurred in the gas path of the gas turbine. Thus the entropy function is used as an index of the health of the gas path.
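A minimal sketch of such a health check might look as follows; the relative threshold of 0.98 is an illustrative assumption only, not a value reported in the paper:

import numpy as np

def gas_path_alarm(temps, rel_threshold=0.98):
    # Flag a possible gas-path fault when the uniformity of equation (1)
    # drops below rel_threshold * log2(n).
    temps = np.asarray(temps, dtype=float)
    p = temps / temps.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    return entropy < rel_threshold * np.log2(len(temps))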

4. Fault Feature Quality Evaluation with Generalized Entropy

The above section gives an approach to detecting abnormality in the exhaust temperature distribution. However, the entropy function cannot distinguish what kind of fault has occurred in the system, although it detects the abnormality.

Figure 7: Exhaust temperatures from another set of thermometers (temperature versus time).

Figure 8: Entropy of the temperature distribution versus time (the red dashed line is the ideal case; the blue line is the real case).

In order to analyze why the temperature distribution is not uniform, we should develop algorithms to recognize the fault.

Before training an intelligent model, we should construct some features and select the most informative subsets to represent different faults. In this section we discuss this issue.

Intuitively, we know that the temperatures of all thermometers reflect the state of the system. Besides, the temperature difference between neighboring thermometers also indicates the source of faults, which is considered as space-neighboring information. Moreover, the temperature change of a thermometer over time also gives hints for studying the faults, which can be viewed as time-neighboring information. In fact, the inlet temperature $T_0$ is also an important factor. In summary, we can use the exhaust temperatures and their neighboring information along time and space to recognize different faults. If there are $n$ ($n = 13$ in our system) thermometers, we can form a feature vector to describe the state of the exhaust system as
$$F = \{T_0, T_1, T_2, \ldots, T_n, T_1 - T_2, T_2 - T_3, \ldots, T_n - T_1, T'_1, T'_2, \ldots, T'_n\}, \tag{2}$$
where $T'_i = T_i(j) - T_i(j-1)$ and $T_i(j)$ is the temperature of the $i$th thermometer at time $j$.

Apart from the above features, we can also construct other attributes to reflect the conditions of the gas turbine. In this work we consider a gas turbine with 13 thermometers around the exhaust, so we finally form a 40-attribute vector.
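A minimal sketch of assembling this 40-dimensional vector for one time step follows; the array layout and function name are our own assumptions, since the paper only specifies the feature set of equation (2):

import numpy as np

def exhaust_features(T_prev, T_curr, T0):
    # T_prev, T_curr: the n = 13 exhaust temperatures at times j-1 and j; T0: inlet temperature.
    # Returns [T0, T_1..T_n, T_1-T_2, ..., T_n-T_1, T'_1..T'_n], i.e., 1 + 13 + 13 + 13 = 40 values.
    T_curr = np.asarray(T_curr, dtype=float)
    T_prev = np.asarray(T_prev, dtype=float)
    space_diff = T_curr - np.roll(T_curr, -1)   # neighboring differences, closing with T_n - T_1
    time_diff = T_curr - T_prev                 # T'_i = T_i(j) - T_i(j-1)
    return np.concatenate(([T0], T_curr, space_diff, time_diff))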

There are some questions: whether all the extracted features are useful for the final modeling, and how we can evaluate the features and find the most informative ones. In fact, there are a number of measures for estimating feature quality, such as dependency in rough set theory [21], consistency [22], mutual information in information theory [23], and classification margin in statistical learning theory [24]. However, all these measures are computed in the original input space, while effective classification techniques usually implement a nonlinear mapping of the original space to a feature space by a kernel function. In this case, we require a new measure to reflect the classification information of the feature space. We now extend the traditional information entropy to measure it.

Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$. As to classification learning, each training sample $x_i$ is associated with a decision $y_i$. As to an arbitrary subset $F' \subseteq F$ and a kernel function $K$, we can calculate a kernel matrix
$$K = \begin{bmatrix} k_{11} & \cdots & k_{1m} \\ \vdots & \ddots & \vdots \\ k_{m1} & \cdots & k_{mm} \end{bmatrix}, \tag{3}$$
where $k_{ij} = k(x_i, x_j)$. The Gaussian function is a representative kernel function:
$$k_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma}\right). \tag{4}$$

A number of kernel functions have the properties (1) $k_{ij} \in [0, 1]$ and (2) $k_{ij} = k_{ji}$.

The kernel matrix plays a bottleneck role in kernel based learning [25]: all the information that a classification algorithm can use is hidden in this matrix. At the same time, we can also calculate a decision kernel matrix
$$D = \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mm} \end{bmatrix}, \tag{5}$$
where $d_{ij} = 1$ if $y_i = y_j$ and $d_{ij} = 0$ otherwise. In fact, the matrix $D$ is a matching kernel.
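Such a matching kernel can be computed directly from the class labels; a minimal sketch:

import numpy as np

def matching_kernel(y):
    # Equation (5): d_ij = 1 if y_i == y_j, else 0.
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)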

Definition 1. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$; $F' \subseteq F$; and $K$ is a kernel matrix over $U$ in terms of $F'$. Then the entropy of $F'$ is defined as
$$E(K) = -\frac{1}{m} \sum_{i=1}^{m} \log_2 \frac{K_i}{m}, \tag{6}$$
where $K_i = \sum_{j=1}^{m} k_{ij}$.

As to the above entropy function, if we use the Gaussian function as the kernel, we have $\log_2 m \ge E(K) \ge 0$; $E(K) = 0$ if and only if $k_{ij} = 1$ for all $i, j$; and $E(K) = \log_2 m$ if and only if $k_{ij} = 0$ for $i \neq j$.

Figure 9: Samples of the exhaust temperature distribution at different moments: (a) rose map of exhaust temperatures at t = 500; (b) at t = 758; (c) at t = 820; (d) at t = 1220.

$E(K) = 0$ means that no pair of samples can be distinguished with the current features, while $E(K) = \log_2 m$ means that any pair of samples is different from each other, so they can be distinguished. These are two extreme cases. In real-world applications, part of the samples can be discerned with the available features while others cannot; in this case the entropy function takes values in the interval $[0, \log_2 m]$.

Moreover, it is easy to show that if $K_1 \subseteq K_2$, then $E(K_1) \ge E(K_2)$, where $K_1 \subseteq K_2$ means $K_1(x_i, x_j) \le K_2(x_i, x_j)$ for all $i, j$.

Definition 2. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$; $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$; and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Then the joint entropy of $F_1$ and $F_2$ is defined as
$$E(K_1, K_2) = E(K) = -\frac{1}{m} \sum_{i=1}^{m} \log_2 \frac{K_i}{m}, \tag{7}$$
where $K_i = \sum_{j=1}^{m} k_{ij}$.

As to the Gaussian function, $K(x_i, x_j) = K_1(x_i, x_j) \times K_2(x_i, x_j)$. Thus $K \subseteq K_1$ and $K \subseteq K_2$; in this case, $E(K) \ge E(K_1)$ and $E(K) \ge E(K_2)$.

Definition 3. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$; $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$; and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Knowing $F_1$, the conditional entropy of $F_2$ is defined as
$$E(K_2 \mid K_1) = E(K) - E(K_1). \tag{8}$$

As to the Gaussian kernel, $E(K) \ge E(K_1)$ and $E(K) \ge E(K_2)$, so $E(K_1 \mid K_2) \ge 0$ and $E(K_2 \mid K_1) \ge 0$.

Definition 4. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$; $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$; and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Then the mutual information of $K_1$ and $K_2$ is defined as
$$\mathrm{MI}(K_1, K_2) = E(K_1) + E(K_2) - E(K). \tag{9}$$

As to the Gaussian kernel, $\mathrm{MI}(K_1, K_2) = \mathrm{MI}(K_2, K_1)$. If $K_1 \subseteq K_2$, we have $\mathrm{MI}(K_1, K_2) = E(K_2)$, and if $K_2 \subseteq K_1$, we have $\mathrm{MI}(K_1, K_2) = E(K_1)$.

Figure 10: Fuzzy dependency between a single feature and the decision (dependency versus feature index).

Figure 11: Kernelized mutual information between a single feature and the decision (mutual information versus feature index).

Please note that if $F_1 \subseteq F_2$, we have $K_2 \subseteq K_1$. However, $K_2 \subseteq K_1$ does not imply $F_1 \subseteq F_2$.

Definition 5. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$; $F' \subseteq F$; $K$ is a kernel matrix over $U$ in terms of $F'$; and $D$ is the kernel matrix computed with the decision. Then the significance of the feature subset $F'$ related to the decision is defined as
$$\mathrm{MI}(K, D) = E(K) + E(D) - E(K, D). \tag{10}$$

$\mathrm{MI}(K, D)$ measures the importance of the feature subset $F'$ in the kernel space for distinguishing different classes. It can be understood as a kernelized version of the Shannon information entropy that is widely used in feature evaluation and selection. In fact, it is easy to derive the equivalence between this entropy function and Shannon entropy under the condition that the attributes are discrete and the matching kernel is used.
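A minimal sketch of the feature-significance measure of equation (10), combining the pieces sketched above; here the joint entropy E(K, D) is computed from the elementwise product K * D, following the product rule the paper states for Gaussian kernels, which is an interpretation on our part:

import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / sigma)

def kernel_entropy(K):
    m = K.shape[0]
    return -np.mean(np.log2(K.sum(axis=1) / m))

def feature_significance(X_subset, y, sigma=1.0):
    # MI(K, D) = E(K) + E(D) - E(K, D), equation (10).
    K = gaussian_kernel_matrix(np.asarray(X_subset, dtype=float), sigma)
    D = (np.asarray(y)[:, None] == np.asarray(y)[None, :]).astype(float)   # matching kernel
    KD = K * D                                                             # joint kernel
    return kernel_entropy(K) + kernel_entropy(D) - kernel_entropy(KD)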

Now we show an example in gas turbine fault diagnosis. We collect 3581 samples from two sets of gas turbine systems; 1440 samples are healthy and the others belong to four kinds of faults: load rejection, sensor fault, fuel switching, and salt spray corrosion. The numbers of samples are 45, 588, 71, and 1437, respectively. Thirteen thermometers are installed in the exhaust. According to the approach described above, we form a 40-dimensional vector to represent the state of the exhaust.

Figure 12: Fuzzy dependency between the selected features and the decision (Features 5, 37, 2, and 3 are selected sequentially).

Figure 13: Kernelized mutual information between the selected features and the decision (Features 39, 31, 38, and 40 are selected sequentially).

Obviously, the classification task is not understandable in such a high-dimensional space. Moreover, some features may be redundant for classification learning, which may confuse the learning algorithm and reduce modeling performance. So it is a key preprocessing step to select a necessary and sufficient subset.

Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26–28]. Fuzzy dependency can be understood as the average distance from the samples to their nearest neighbors belonging to different classes, while the kernelized mutual information reflects the relevance between the features and the decision in the kernel space.

Comparing Figures 10 and 11, a significant difference is observed. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38; however, Feature 39 gets the largest mutual information, and Feature 2 is the second one. Thus, different feature evaluation functions lead to completely different results.

Figures 10 and 11 present the significance of single features. In applications, we should combine a set of features, so we consider a greedy search strategy: starting from an empty set, the best features are added one by one.

Figure 14: Scatter plots in 2D spaces spanned by the feature pairs selected by fuzzy dependency.

In each round we select the feature that produces the largest significance increment with respect to the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added. If the selected features are sufficient for classification, these two functions remain invariant when any new attribute is added, so we can stop the algorithm if the increment of significance is less than a given threshold. The significances of the selected feature subsets are shown in Figures 12 and 13, respectively.
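A minimal sketch of this greedy forward selection, reusing the feature_significance function sketched in Section 4 (the stopping threshold eps is an illustrative assumption):

import numpy as np

def greedy_select(X, y, significance, eps=1e-3):
    # Add, in each round, the feature with the largest significance increment;
    # stop when the increment falls below eps.
    n = X.shape[1]
    selected, best_sig = [], 0.0
    while len(selected) < n:
        gains = [significance(X[:, selected + [f]], y) - best_sig
                 if f not in selected else -np.inf for f in range(n)]
        f_best = int(np.argmax(gains))
        if gains[f_best] < eps:
            break
        selected.append(f_best)
        best_sig += gains[f_best]
    return selected

For example, greedy_select(X, y, feature_significance) returns a list of 0-based column indices of the selected features (Figures 12 and 13 report the selected features in 1-based numbering).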

In order to show the effectiveness of the algorithm, we give the scatter plots in the 2D spaces spanned by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information, as shown in Figures 14 to 16. As to fuzzy dependency, we select Features 5, 37, 2, and 3; then there are 4 × 4 = 16 combinations of feature pairs. The subplot in the $i$th row and $j$th column of Figure 14 gives the scatter of samples in the 2D space spanned by the $i$th and the $j$th selected features.

Observing the second subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which leads to difficulty in learning classification models.

However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples of the five classes can be separated with some linear models, which also brings benefit for learning a simple classification model.

Comparing Figures 15 and 16, we can find that in the feature spaces selected by Shannon mutual information the different classes overlap or get entangled, which leads to bad classification performance.

5. Diagnosis Modeling with an Information Entropy Based Decision Tree Algorithm

After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model. Generalization capability and interpretability are the two most important criteria in evaluating an algorithm. As to fault diagnosis, a domain expert usually accepts a model which is consistent with his common knowledge.

Figure 15: Scatter plots in 2D spaces spanned by the feature pairs selected by kernelized mutual information.

Thus, he expects the model to be understandable; otherwise he will not trust the outputs of the model. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.

Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are techniques for training an understandable classification model; the learned model can be transformed into a set of rules. All these algorithms build a decision tree from training samples. They start from a root node and select one of the features, with cuts, to divide the samples into different branches according to their feature values. This procedure is conducted iteratively until a branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function for selecting attributes or cuts: CART adopts the Gini and Twoing splitting rules, ID3 uses information gain, and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes, compared with ID3. Competent performance is usually observed with C4.5 in real-world applications compared with some popular algorithms, including SVM and Bayesian networks. In this work we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.

Decision tree algorithm C4.5
Input: a set of training samples $U = \{x_1, x_2, \ldots, x_m\}$ with features $F = \{f_1, f_2, \ldots, f_n\}$; stopping criterion $\tau$.
Output: decision tree $T$.
(1) Check the sample set.
(2) For each attribute $f$, compute the normalized information gain ratio from splitting on $f$.
(3) Let $f_{\mathrm{best}}$ be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on $f_{\mathrm{best}}$.
(5) Recurse on the sublists obtained by splitting on $f_{\mathrm{best}}$ and add those nodes as children of the node, until the stopping criterion $\tau$ is satisfied.
(6) Output $T$.
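C4.5 itself is not packaged in common Python libraries; as a rough stand-in for experimentation, a CART-style tree with the entropy criterion from scikit-learn can be trained on the selected features. This is a substitute we suggest for illustration, not the exact algorithm used in the paper, and the data below are random placeholders:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.random.rand(200, 4)                      # placeholder for the selected features
y = np.random.randint(1, 6, size=200)           # placeholder for the five classes

tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5)
tree.fit(X, y)
print(export_text(tree, feature_names=["F39", "F31", "F38", "F40"]))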

Figure 16: Scatter plots in 2D spaces spanned by the feature pairs selected by Shannon mutual information.

Figure 17: Decision tree trained on the features selected with fuzzy rough sets (root node F2, with internal nodes F37 and F3).

We input the two datasets into C4.5 and build the following two decision trees: Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second dataset. The two trees are given in Figures 17 and 18, respectively.

Figure 18: Decision tree trained on the features selected with kernelized mutual information (root node F39, with internal nodes F38 and F40).

Going from the root node to a leaf node along a branch, a rule is extracted from the tree. As to the first tree, we can get five decision rules:

(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.

As to the second decision tree, we can also obtain the following rules:

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
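For illustration, the five rules of the second tree translate directly into code (a sketch assuming, as in Figure 18, feature values normalized to [0, 1]):

def classify_by_rules(F39, F40, F38):
    # Rule set read off the decision tree of Figure 18.
    if F39 > 0.45:
        return 4 if F38 > 0.80 else 1      # rules (1) and (2)
    if F39 > 0.17:
        return 2                           # rule (3)
    return 5 if F40 > 0.42 else 3          # rules (4) and (5)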

We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model is as high as 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the models may become more and more complex.

6. Conclusions and Future Work

Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, because such systems are self-monitoring with no operator on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used for measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are twofold. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy for evaluating the significance of attributes in kernelized feature spaces: we compute the relevance between a kernel matrix induced by a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are obtained.

Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not yet been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.

Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is partially supported by the National Natural Science Foundation of China under Grants 61222210 and 61105054.

References

[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.

[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.

[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.

[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.

[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.

[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.

[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.

[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.

[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.

[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.

[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.

[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.

[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.

[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.

[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.

[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.

[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.

[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.

[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.

[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.

[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.

[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.

[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.

[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.

[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.

[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.

[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article Fault Detection and Diagnosis for Gas ...downloads.hindawi.com/journals/tswj/2014/617162.pdf · Research Article Fault Detection and Diagnosis for Gas Turbines Based

2 The Scientific World Journal

and ensemble learning for intelligent fault diagnosis in gasturbine engines [2] In 2009 Li and Nilkitsaranont describeda prognostic approach to estimating the remaining useful lifeof gas turbine engines before their next major overhaul basedon a combined regression technique with both linear andquadratic models [3] In the same year Bassily et al proposeda technique which assessed whether or not the multivariateautocovariance functions of two independently sampledsignals coincide to detect faults in a gas turbine [4] In 2010Young et al presented an offline fault diagnosis method forindustrial gas turbines in a steady-state using Bayesian dataanalysis The authors employed multiple Bayesian modelsvia model averaging for improving the performance of theresulted system [5] In 2011 Yu et al designed a sensorfault diagnosis technique for Micro-Gas Turbine Enginebased on wavelet entropy where wavelet decomposition wasutilized to decompose the signal in different scales and thenthe instantaneous wavelet energy entropy and instantaneouswavelet singular entropy are computed based on the previouswavelet entropy theory [6]

In recent years signal processing and data miningtechniques are combined to extract knowledge and buildmodels for fault diagnosis In 2012 Wu et al studied theissue of bearing fault diagnosis based on multiscale per-mutation entropy and support vector machine [7] In 2013they designed a technique for defecting diagnostics basedon multiscale analysis and support vector machines [8]Nozari et al presented a model-based robust fault detectionand isolation method with a hybrid structure where time-delay multilayer perceptron models local linear neurofuzzymodels and linear model tree were used in the system [9]Sarkar et al [10] designed symbolic dynamic filtering byoptimally partitioning sensor observation and the objectiveis to reduce the effects of sensor noise level variation andmagnify the system fault signatures Feature extraction andpattern classification are used for fault detection in aircraftgas turbine engines

Entropy is a fundamental concept in the domains ofinformation theory and thermodynamics It was first definedto be a measure of progressing towards thermodynamicequilibrium then it was introduced in information theory byShannon [11] as a measure of the amount of information thatis missing before receptionThis concept gets popular in bothdomains [12ndash16] Now it is widely used in machine learningand data driven modeling [17 18] In 2011 a new measure-ment called maximal information coefficient was reportedThis function can be used to discover the association betweentwo random variables [19] However it cannot be used tocompute the relevance between feature sets

In this work we will develop techniques to detect abnor-mality and analyze faults based on a generalized informationentropy model Moreover we also describe a system forstate monitoring of gas turbines on offshore oil well drillingplatforms First we will describe a system developed forremote and online condition monitoring and fault diagnosisof gas turbines installed on oil drilling platforms As vastamount of historical records is gathered in this system it isan urgent task to design algorithms for automatically onlinedetecting abnormality of the data and analyze the data to

obtain the causes and sources of faults Due to the complexityof gas turbine systems we focus on the gas-path subsystemin this workThe function of entropy is employed to measurethe uniformity of exhaust temperatures which is a key factorreflecting the health of the gas path of a gas turbine and alsoreflecting the performance of the gas turbineThenwe extractfeatures from the healthy and abnormal records An extendedinformation entropy model is introduced to evaluate thequality of these features for selecting informative attributesFinally the selected features are used to build models forautomatic fault recognition where support vector machines[20] and C45 are considered Real-world data are collectedto show the effectiveness of the proposed techniques

The remainder of the work is organized as followsSection 2 describes the architecture of the remotemonitoringand fault diagnosis center for gas turbines installed on theoil drilling platforms Section 3 designs an algorithm fordetecting abnormality of the exhaust temperatures Thenwe extract features from the exhaust temperature data andselect informative ones based on evaluating the informationbottlenecks with extend information entropy in Section 4Support vector machines and C45 are introduced for build-ing fault diagnosis models in Section 5 In addition numer-ical experiments are also described in this section Finallyconclusions and future work are given in Section 6

2 Framework of Remote Monitoring andFault Diagnosis Center for Gas Turbine

Gas turbines are widely used as power and electric powersources The structure of a general gas turbine is presentedin Figure 1 This system transforms chemical energy intothermal power then mechanical energy and finally electricenergy Gas turbines are usually considered as the hearts of alot of mechanical systems

As the offshore oil well drilling platforms are usuallyunattended an online and remote state monitoring systemis much useful in this area which can help find abnormalitybefore serious faults occur However the sensor data cannotbe sent into a center with ground based internetThe data canonly be transmitted via telecommunication satellite whichwas too expensive in the past Now this is available

Figure 1: Prototype structure of a gas turbine (the Titan 130, a single-shaft gas turbine for power generation applications, showing the compressor, combustor, turbine, exhaust, and the output shaft and gearbox).

The system consists of four subsystems: the data acquisition and local monitoring subsystem (DALM), the data communication subsystem (DAC), the data management subsystem (DMS), and the intelligent diagnosis subsystem (IDS). The first subsystem gathers the outputs from different sensors and checks whether there is any abnormality in the system. The second one packs the acquired data and transmits them to the monitoring center; users in the center can also send a message to this subsystem to ask for some special data if an abnormality or fault occurs. The data management subsystem stores the historic information as well as fault data and fault cases. A data compression algorithm is embedded in the system: as most of the historic data are useless for the final analysis, they are compressed and removed to save storage space. Finally, the IDS watches the alarm information from different unit assemblies and starts the corresponding module to analyze the related information. This system gives some decisions and explains how each decision has been made. The structure of the system is shown in Figure 2.

One of the webpages of the system is given in Figure 3, where we can see the rose figure of the exhaust temperatures; some statistical parameters varying with time are also presented.

3. Abnormality Detection in Exhaust Temperatures Based on Information Entropy

Exhaust temperature is one of the most critical parameters in a gas turbine, as excessive turbine temperatures may lead to life reduction or catastrophic failures. In the current generation of machines, temperatures at the combustor discharge are too high for the type of instrumentation available. Exhaust temperature is also used as an indicator of turbine inlet temperature.

As the temperature profile out of a gas turbine is not uniform, a number of probes will help pinpoint disturbances or malfunctions in the gas turbine by highlighting the shifts in the temperature profile. Thus, there are usually a set of thermometers fixed on the exhaust. If the system is operating normally, all the thermometers give similar outputs. However, if a fault occurs in some components of the turbine, different temperatures will be observed. The uniformity of exhaust temperatures therefore reflects the state of the system, so we should develop an index to measure the uniformity of the exhaust temperatures. In this work, we consider the entropy function, for it is widely used in measuring the uniformity of random variables. However, to the best of our knowledge, this function has not been used in this domain.

Assume that there are $n$ thermometers and their outputs are $T_i$, $i = 1, \ldots, n$, respectively. Then we define the uniformity of these outputs as

$$E(T) = -\sum_{i=1}^{n} \frac{T_i}{T} \log_2 \frac{T_i}{T}, \quad (1)$$

where $T = \sum_j T_j$. As $T_i \ge 0$, we define $0 \log 0 = 0$.

Obviously, we have $\log_2 n \ge E(T) \ge 0$. $E(T) = \log_2 n$ if and only if $T_1 = T_2 = \cdots = T_n$; in this case all the thermometers produce the same output, so the uniformity of the sensors is maximal. In the other extreme case, if $T_1 = T_2 = \cdots = T_{i-1} = T_{i+1} = \cdots = T_n = 0$ and $T_i = T$, then $E(T) = 0$.

It is notable that the value of the entropy is independent of the absolute values of the thermometer readings; it depends only on the distribution of the temperatures. The entropy is maximal if all the thermometers output the same value.
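As an illustration only, the following minimal sketch computes the uniformity index of Eq. (1) from a vector of thermometer readings; the function name and the NumPy tooling are our own choices, not part of the monitoring system described here.

```python
import numpy as np

def exhaust_temperature_entropy(T):
    """Uniformity index E(T) of Eq. (1): normalize the readings to
    p_i = T_i / sum(T) and take the base-2 Shannon entropy, with the
    convention 0 * log2(0) = 0."""
    T = np.asarray(T, dtype=float)
    p = T / T.sum()
    safe = np.where(p > 0, p, 1.0)        # log2(1) = 0, so zero terms vanish
    return float(-(p * np.log2(safe)).sum())

# Thirteen identical readings give the maximal value log2(13) ~ 3.70,
# while a single hot probe among twelve zero readings gives 0.
print(exhaust_temperature_entropy([450.0] * 13))
print(exhaust_temperature_entropy([450.0] + [0.0] * 12))
```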

Now we show two sets of real exhaust temperatures measured on an oil well drilling platform, where 13 thermometers are fixed. In the first set, the gas turbine starts at a certain time point, runs for several minutes, and finally the system stops.

Observing the curves in Figure 4, we can see that the 13 thermometers give almost the same outputs at the beginning. In fact, the outputs are the room temperature in this case, as shown in Figure 6(a). Thus, the entropy reaches its peak value.

Some typical samples are presented in Figure 6, where the temperature distributions around the exhaust at time points t = 5, 130, 250, 400, and 500 are given. Obviously, the distributions at t = 130, 250, and 400 are not desirable; it can be derived that some abnormality occurs in the system. The entropy of the temperature distribution is given in Figure 5.


Figure 2: Structure of the remote system for condition monitoring and fault analysis (on the oil drilling platform: gas turbine, data acquisition, local monitoring, and inner/outer firewalls linked via the Telstar satellite; in the diagnosis center: satellite signal receiver, firewalls, online database subsystem, intelligent diagnosis subsystem, human-system interface, video conference system, and Internet clients).

Figure 3: A typical webpage for monitoring of the subsystem.

Figure 4: Exhaust temperatures from a set of thermometers (temperature versus time).

Figure 5: Uniformity of the temperatures (the red dashed line is the ideal case; the blue line is the real case).


Figure 6: Samples of the temperature distribution at different times. (a) Rose map of exhaust temperatures at t = 5; (b) t = 130; (c) t = 250; (d) t = 400; (e) t = 500.

Another example is given in Figures 7 to 9. In this example, there is a significant difference between the outputs of the 13 thermometers even when the gas turbine is not running, as shown in Figure 9(a). Thus, the entropy of the temperature distribution is a little lower than that of the ideal case, as shown in Figure 8. Besides, some representative samples are also given in Figure 9.

Considering the above examples, we can see that the entropy function is an effective measurement of uniformity. It can be used to reflect the uniformity of exhaust temperatures.

If the uniformity is less than a threshold, some faults have possibly occurred in the gas path of the gas turbine. Thus, the entropy function is used as an index of the health of the gas path.

Figure 7: Exhaust temperatures from another set of thermometers (temperature versus time).

Figure 8: Entropy of the temperature distribution, where the red dashed line is the ideal case and the blue one is the real case.

4. Fault Feature Quality Evaluation with Generalized Entropy

The above section gives an approach to detecting abnormality in the exhaust temperature distribution. However, the entropy function cannot distinguish what kind of fault occurs in the system, although it detects the abnormality. In order to analyze why the temperature distribution is not uniform, we should develop some algorithms to recognize the fault.

Before training an intelligent model, we should construct some features and select the most informative subsets to represent different faults. In this section, we discuss this issue.

Intuitively, we know that the temperatures of all thermometers reflect the state of the system. Besides, the temperature differences between neighboring thermometers also indicate the source of faults; these are considered as spatial neighboring information. Moreover, the temperature change of a thermometer over time necessarily gives hints for studying the faults; this can be viewed as temporal neighboring information. In fact, the inlet temperature $T_0$ is also an important factor. In summary, we can use the exhaust temperatures and their neighboring information along time and space to recognize different faults. If there are $n$ ($n = 13$ in our system) thermometers, we can form a feature vector to describe the state of the exhaust system as

$$F = \{T_0, T_1, T_2, \ldots, T_n, T_1 - T_2, T_2 - T_3, \ldots, T_n - T_1, T'_1, T'_2, \ldots, T'_n\}, \quad (2)$$

where $T'_i = T_i(j) - T_i(j-1)$ and $T_i(j)$ is the temperature at time $j$ of the $i$th thermometer.

Apart from the above features, we can also construct other attributes to reflect the conditions of the gas turbine. In this work, we consider a gas turbine with 13 thermometers around the exhaust, so we finally form a 40-attribute vector.
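A small sketch of how the 40-dimensional vector of Eq. (2) could be assembled from two consecutive scans of the 13 probes; the function name and argument layout are assumptions made for illustration.

```python
import numpy as np

def exhaust_feature_vector(T0, T_curr, T_prev):
    """Build the feature vector of Eq. (2) for n = 13 probes: the inlet
    temperature T0, the 13 current readings, the 13 differences between
    neighboring probes (spatial information), and the 13 changes since the
    previous scan (temporal information), 40 attributes in total."""
    T_curr = np.asarray(T_curr, dtype=float)
    T_prev = np.asarray(T_prev, dtype=float)
    spatial = T_curr - np.roll(T_curr, -1)      # T_1 - T_2, ..., T_n - T_1
    temporal = T_curr - T_prev                  # T'_i = T_i(j) - T_i(j - 1)
    return np.concatenate(([T0], T_curr, spatial, temporal))
```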

There are some questions: whether all the extracted features are useful for the final modeling, and how we can evaluate the features and find the most informative ones. In fact, there are a number of measures to estimate feature quality, such as dependency in rough set theory [21], consistency [22], mutual information in information theory [23], and classification margin in statistical learning theory [24]. However, all these measures are computed in the original input space, while effective classification techniques usually implement a nonlinear mapping of the original space to a feature space by a kernel function. In this case, we require a new measure to reflect the classification information of the feature space. Now we extend the traditional information entropy to measure it.

Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$. As to classification learning, each training sample $x_i$ is associated with a decision $y_i$. As to an arbitrary subset $F' \subseteq F$ and a kernel function $k$, we can calculate a kernel matrix

$$K = \begin{bmatrix} k_{11} & \cdots & k_{1m} \\ \vdots & \ddots & \vdots \\ k_{m1} & \cdots & k_{mm} \end{bmatrix}, \quad (3)$$

where $k_{ij} = k(x_i, x_j)$. The Gaussian function is a representative kernel function:

$$k_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma}\right). \quad (4)$$

A number of kernel functions have the properties (1) $k_{ij} \in [0, 1]$ and (2) $k_{ij} = k_{ji}$.

The kernel matrix plays a bottleneck role in kernel based learning [25]: all the information that a classification algorithm can use is hidden in this matrix. At the same time, we can also calculate a decision kernel matrix as

$$D = \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mm} \end{bmatrix}, \quad (5)$$

where $d_{ij} = 1$ if $y_i = y_j$ and otherwise $d_{ij} = 0$. In fact, the matrix $D$ is a matching kernel.

Definition 1. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$, $F' \subseteq F$, and $K$ is a kernel matrix over $U$ in terms of $F'$. Then the entropy of $F'$ is defined as

$$E(K) = -\frac{1}{m}\sum_{i=1}^{m} \log_2 \frac{K_i}{m}, \quad (6)$$

where $K_i = \sum_{j=1}^{m} k_{ij}$.
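The quantities of Eqs. (3)–(6) are straightforward to compute; the sketch below is a minimal NumPy version, assuming the Gaussian kernel of Eq. (4), and the function names are our own rather than the paper's.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Kernel matrix of Eq. (3) with the Gaussian function of Eq. (4):
    k_ij = exp(-||x_i - x_j||^2 / sigma)."""
    X = np.asarray(X, dtype=float)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / sigma)

def decision_kernel(y):
    """Matching kernel D of Eq. (5): d_ij = 1 if y_i = y_j, else 0."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def kernel_entropy(K):
    """Kernelized entropy E(K) of Eq. (6), with K_i the i-th row sum."""
    m = K.shape[0]
    Ki = K.sum(axis=1)
    return float(-np.mean(np.log2(Ki / m)))
```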

As to the above entropy function, if we use the Gaussian function as the kernel, we have $\log_2 m \ge E(K) \ge 0$; $E(K) = 0$ if and only if $k_{ij} = 1$, $\forall i, j$; and $E(K) = \log_2 m$ if and only if $k_{ij} = 0$ for $i \ne j$. $E(K) = 0$ means that no pair of samples can be distinguished with the current features, while $E(K) = \log_2 m$ means that any pair of samples is different from each other, so they can be distinguished. These are the two extreme cases. In real-world applications, part of the samples can be discerned with the available features while others cannot; in this case, the entropy function takes values in the interval $[0, \log_2 m]$.

Figure 9: Samples of the temperature distribution at different moments. (a) Rose map of exhaust temperatures at t = 500; (b) t = 758; (c) t = 820; (d) t = 1220.

Moreover, it is easy to show that if $K_1 \subseteq K_2$, then $E(K_1) \ge E(K_2)$, where $K_1 \subseteq K_2$ means $K_1(x_i, x_j) \le K_2(x_i, x_j)$, $\forall i, j$.

Definition 2. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$; $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$; and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Then the joint entropy of $F_1$ and $F_2$ is defined as

$$E(K_1, K_2) = E(K) = -\frac{1}{m}\sum_{i=1}^{m} \log_2 \frac{K_i}{m}, \quad (7)$$

where $K_i = \sum_{j=1}^{m} k_{ij}$.

As to the Gaussian function, $K(x_i, x_j) = K_1(x_i, x_j) \times K_2(x_i, x_j)$. Thus $K \subseteq K_1$ and $K \subseteq K_2$. In this case, $E(K) \ge E(K_1)$ and $E(K) \ge E(K_2)$.

Definition 3. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$. One has $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$; and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Knowing $F_1$, the conditional entropy of $F_2$ is defined as

$$E(K_2 \mid K_1) = E(K) - E(K_1). \quad (8)$$

As to the Gaussian kernel, $E(K) \ge E(K_1)$ and $E(K) \ge E(K_2)$, so $E(K_1 \mid K_2) \ge 0$ and $E(K_2 \mid K_1) \ge 0$.

Definition 4. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$. One has $F_1, F_2 \subseteq F$; $K_1$ and $K_2$ are two kernel matrices induced by $F_1$ and $F_2$; and $K$ is a new kernel matrix computed with $F_1 \cup F_2$. Then the mutual information of $K_1$ and $K_2$ is defined as

$$\mathrm{MI}(K_1, K_2) = E(K_1) + E(K_2) - E(K). \quad (9)$$

As to the Gaussian kernel, $\mathrm{MI}(K_1, K_2) = \mathrm{MI}(K_2, K_1)$. If $K_1 \subseteq K_2$, we have $\mathrm{MI}(K_1, K_2) = E(K_2)$, and if $K_2 \subseteq K_1$, we have $\mathrm{MI}(K_1, K_2) = E(K_1)$.


Figure 10: Fuzzy dependency between a single feature and the decision (dependency versus feature index).

Figure 11: Kernelized mutual information between a single feature and the decision (mutual information versus feature index).

Please note that if $F_1 \subseteq F_2$, we have $K_2 \subseteq K_1$. However, $K_2 \subseteq K_1$ does not imply $F_1 \subseteq F_2$.

Definition 5. Given a set of samples $U = \{x_1, x_2, \ldots, x_m\}$, each sample is described with $n$ features $F = \{f_1, f_2, \ldots, f_n\}$; $F' \subseteq F$; $K$ is a kernel matrix over $U$ in terms of $F'$; and $D$ is the kernel matrix computed with the decision. Then the significance of the feature subset $F'$ related to the decision is defined as

$$\mathrm{MI}(K, D) = E(K) + E(D) - E(K, D). \quad (10)$$

$\mathrm{MI}(K, D)$ measures the importance of the feature subset $F'$ in the kernel space for distinguishing different classes. It can be understood as a kernelized version of Shannon mutual information, which is widely used in feature evaluation and selection. In fact, it is easy to derive the equivalence between this entropy function and the Shannon entropy under the condition that the attributes are discrete and the matching kernel is used.
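A sketch of the significance measure of Eq. (10), reusing the gaussian_kernel, decision_kernel, and kernel_entropy helpers from the sketch above. The joint matrix for $E(K, D)$ is taken here as the element-wise product of $K$ and $D$, by analogy with the Gaussian case in Definition 2; this is our reading, not an explicit statement of the paper.

```python
def kernelized_mutual_information(X_subset, y, sigma=1.0):
    """Feature significance MI(K, D) = E(K) + E(D) - E(K, D) of Eq. (10);
    X_subset contains only the columns of the evaluated feature subset F'."""
    K = gaussian_kernel(X_subset, sigma)
    D = decision_kernel(y)
    joint = kernel_entropy(K * D)          # element-wise product as joint matrix
    return kernel_entropy(K) + kernel_entropy(D) - joint
```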

Now we show an example in gas turbine fault diagnosis. We collect 3581 samples from two sets of gas turbine systems; 1440 samples are healthy, and the others belong to four kinds of faults: load rejection, sensor fault, fuel switching, and salt spray corrosion. The numbers of samples are 45, 588, 71, and 1437, respectively. Thirteen thermometers are installed in the exhaust. According to the approach described above, we form a 40-dimensional vector to represent the state of the exhaust.

Figure 12: Fuzzy dependency between the selected features and the decision (Features 5, 37, 2, and 3 are selected sequentially).

Figure 13: Kernelized mutual information between the selected features and the decision (Features 39, 31, 38, and 40 are selected sequentially).

Obviously, the classification task is not understandable in such a high dimensional space. Moreover, some features may be redundant for classification learning, which may confuse the learning algorithm and reduce modeling performance. So it is a key preprocessing step to select the necessary and sufficient subsets.

Here we compare the fuzzy rough set based feature evaluation algorithm with the proposed kernelized mutual information. Fuzzy dependency has been widely discussed and applied in feature selection and attribute reduction in recent years [26–28]. Fuzzy dependency can be understood as the average distance between the samples and their nearest neighbors belonging to different classes, while the kernelized mutual information reflects the relevance between the features and the decision in the kernel space.

Comparing Figures 10 and 11, a significant difference is obtained. As to fuzzy rough sets, Feature 5 produces the largest dependency, followed by Feature 38. However, Feature 39 gets the largest mutual information, and Feature 2 is the second one. Thus, different feature evaluation functions lead to completely different results.

Figures 10 and 11 present the significance of single features. In applications, we should combine a set of features. Now we consider a greedy search strategy: starting from an empty set, the best features are added one by one. In each round, we select the feature which produces the largest significance increment with respect to the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically if new attributes are added. If the selected features are sufficient for classification, these two functions keep invariant when any new attribute is added. So we can stop the algorithm if the increment of significance is less than a given threshold. The significances of the selected feature subsets are shown in Figures 12 and 13, respectively.

Figure 14: Scatter plots in 2D spaces expanded with feature pairs selected by fuzzy dependency.
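The greedy search described above can be written in a few lines; the sketch below drives it with the kernelized mutual information and reuses the helpers defined earlier in this section. The stopping threshold and the Gaussian width are free parameters, and the whole block is an illustrative assumption rather than the original implementation.

```python
import numpy as np

def greedy_feature_selection(X, y, sigma=1.0, threshold=1e-3):
    """Forward selection: start from an empty set and, in each round, add the
    feature giving the largest increment of MI(K, D); stop when the increment
    falls below `threshold`."""
    m, n = X.shape
    selected, last_mi = [], 0.0
    while len(selected) < n:
        best_feat, best_mi = None, last_mi
        for f in range(n):
            if f in selected:
                continue
            mi = kernelized_mutual_information(X[:, selected + [f]], y, sigma)
            if mi > best_mi:
                best_feat, best_mi = f, mi
        if best_feat is None or best_mi - last_mi < threshold:
            break
        selected.append(best_feat)
        last_mi = best_mi
    return selected
```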

In order to show the effectiveness of the algorithm, we give the scatter plots in 2D spaces, as shown in Figures 14 to 16, which are expanded by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information. As to fuzzy dependency, we select Features 5, 37, 2, and 3; then there are 4 × 4 = 16 combinations of feature pairs. The subplot in the i-th row and j-th column of Figure 14 gives the scatter of samples in the 2D space expanded by the i-th selected feature and the j-th selected feature.

Observing the 2nd subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which leads to difficulty in learning classification models.

However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples of the five classes can be separated with some linear models, which also brings benefit for learning a simple classification model.

Comparing Figures 15 and 16, we can find that different classes overlap or get entangled in the feature spaces selected by Shannon mutual information, which leads to bad classification performance.

5. Diagnosis Modeling with an Information Entropy Based Decision Tree Algorithm

After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model. Generalization capability and interpretability are the two most important criteria in evaluating an algorithm. As to fault diagnosis, a domain expert usually accepts a model which is consistent with his common knowledge. Thus, he expects the model to be understandable; otherwise he will not believe the outputs of the model. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.

Figure 15: Scatter plots in 2D spaces expanded with feature pairs selected by kernelized mutual information.

Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are such techniques for training an understandable classification model. The learned model can be transformed into a set of rules. All these algorithms build a decision tree from training samples: they start from a root node and select one of the features to divide the samples, with cuts, into different branches according to their feature values. This procedure is conducted iteratively until the branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function for selecting attributes or cuts: in CART, the splitting rules GINI and Twoing are adopted, while ID3 uses the information gain and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes, compared with ID3. Competent performance is usually observed with C4.5 in real-world applications compared with some popular algorithms, including SVM and Bayesian nets. In this work, we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.

Decision tree algorithm C4.5
Input: a set of training samples $U = \{x_1, x_2, \ldots, x_m\}$ with features $F = \{f_1, f_2, \ldots, f_n\}$; stopping criterion $\tau$.
Output: decision tree $T$.
(1) Check the sample set for the base cases.
(2) For each attribute $f$, compute the normalized information gain ratio from splitting on $f$.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best and add those nodes as children of the node, until the stopping criterion $\tau$ is satisfied.
(6) Output $T$.
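For readers who want to reproduce such a tree, the following sketch uses scikit-learn; this is an assumption on tooling, and scikit-learn grows an optimized CART-style tree whose entropy criterion uses the information gain rather than C4.5's gain ratio, so it only approximates the algorithm above. Here X_sel is assumed to be the matrix of the four selected features scaled to [0, 1], and y the five class labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-in data with four selected features; replace with the real
# (X_sel, y) prepared as in Section 4.
rng = np.random.default_rng(0)
X_sel = rng.random((300, 4))
y = rng.integers(1, 6, size=300)

tree = DecisionTreeClassifier(criterion="entropy",   # information-gain style splits
                              max_depth=3,           # keep the tree small and readable
                              min_samples_leaf=20)
tree.fit(X_sel, y)

# Print the learned tree as nested if-then rules, analogous to Figures 17 and 18.
print(export_text(tree, feature_names=["F39", "F31", "F38", "F40"]))
```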


Figure 16: Scatter plots in 2D spaces expanded with feature pairs selected by Shannon mutual information.

Figure 17: Decision tree trained on the features selected with fuzzy rough sets (the root splits on F2 at 0.50, with further splits on F37, F2, and F3).

We input the data sets into C4.5 and build the following two decision trees: Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second dataset. The two trees are given in Figures 17 and 18, respectively.

Figure 18: Decision tree trained on the features selected with kernelized mutual information (the root splits on F39 at 0.17, with further splits on F39, F40, and F38).

We go from the root node to a leaf node along a branch, and then a piece of rule is extracted from the tree. As to the first tree, we can get five decision rules:

(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;


(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;

(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;

(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;

(5) if F2 ≤ 0.18, then the decision is Class 2.

As to the second decision tree, we can also obtain the following rules:

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;

(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;

(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;

(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;

(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
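To emphasize how directly such rules can be deployed, the five rules of the second tree can be encoded as a tiny classifier; the thresholds are the ones read off Figure 18, and the function name is a hypothetical choice of ours.

```python
def classify_by_rules(F39, F38, F40):
    """Apply the five rules extracted from the tree in Figure 18 and return
    the predicted class label (1 to 5)."""
    if F39 > 0.45:
        return 4 if F38 > 0.80 else 1
    if F39 > 0.17:                     # 0.17 < F39 <= 0.45
        return 2
    return 5 if F40 > 0.42 else 3      # F39 <= 0.17

# Example: F39 = 0.50, F38 = 0.90, F40 = 0.30 falls into Class 4.
print(classify_by_rules(0.50, 0.90, 0.30))
```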

We can see that the derived decision trees are rather simple, and from each of them five pieces of rules can be extracted. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model is as high as 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the models may become more and more complex.

6. Conclusions and Future Works

Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are self-monitoring without man on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used in measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are two parts. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments also show its effectiveness. Second, we extend Shannon entropy for evaluating the significance of attributes in kernelized feature spaces; we compute the relevance between a kernel matrix induced with a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are derived.

Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.

Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is partially supported by the National Natural Foundation under Grants 61222210 and 61105054.

References

[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection - theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article Fault Detection and Diagnosis for Gas ...downloads.hindawi.com/journals/tswj/2014/617162.pdf · Research Article Fault Detection and Diagnosis for Gas Turbines Based

The Scientific World Journal 3

Exhaust

Turbine

Combustor

Compressor

Output shaft and gearbox

Titan 130

Single shaft gas turbine forpower generation applications

Figure 1 Prototype structure of a gas turbine

some decision and explains how the decision has been madeThe structure of the system is shown in Figure 2

One of the webpages of the system is given in Figure 3where we can see the rose figure of exhaust temperaturesand some statistical parameters varying with time are alsopresented

3 Abnormality Detection inExhaust Temperatures Based onInformation Entropy

Exhaust temperature is one of the most critical parameters ina gas turbine as excessive turbine temperatures may lead tolife reduction or catastrophic failures In the current gener-ation of machines temperatures at the combustor dischargeare too high for the type of instrumentation available Exhausttemperature is also used as an indicator of turbine inlettemperature

As the temperature profile out of a gas turbine is notuniform a number of probes will help pinpoint disturbancesor malfunctions in the gas turbine by highlighting the shiftsin the temperature profile Thus there are usually a set ofthermometers fixed on the exhaust If the system is normallyoperating all the thermometers give similar outputs How-ever if a fault occurs to some components of the turbinedifferent temperatures will be observed The uniformity ofexhaust temperatures reflects the state of the system So weshould develop an index to measure the uniformity of theexhaust temperatures In this work we consider the entropyfunction for it is widely used in measuring uniformity ofrandomvariables However to the best of our knowledge thisfunction has not been used in this domain

Assume that there are 119899 thermometers and their outputsare 119879119894 119894 = 1 119899 respectively Then we define the unifor-

mity of these outputs as

119864 (119879) = minus

119899

sum

119894=1

119879119894

119879

log2

119879119894

119879

(1)

where 119879 = sum119895119879119895 As 119879

119894ge 0 we define 0 log 0 = 0

Obviously we have log2119899 ge 119864(119879) ge 0119864 (119879) = log

2119899 if and

only if 1198791= 1198792= sdot sdot sdot = 119879

119899 In this case all the thermometers

produce the same output So the uniformity of the sensorsis maximal In another extreme case if 119879

1= 1198792= 119879119894minus1

=

119879119894+1

sdot sdot sdot = 119879119899= 0 and 119879

119894= 119879 then 119864 (119879) = 0

It is notable that the value of entropy is independentof the values of thermometers while it depends on thedistribution of the temperatures The entropy is maximal ifall the thermometers output the same values

Now we show two sets of real exhaust temperatures mea-sured on an oil well drilling platform where 13 thermometersare fixed In the first set the gas turbine starts from a timepoint and then runs for several minutes finally the systemstops

Observing the curves in Figure 4 we can see that the13 thermometers give the almost the same outputs at thebeginning In fact the outputs are the room temperature inthis case as shown in Figure 6(a) Thus the entropy reachesthe peak value

Some typical samples are presented in Figure 6 wherethe temperature distributions around the exhaust at timepoints 119905 = 5130250400 and 500 are given Obviously thedistributions at 119905 = 130250 and 400 are not desirable It canbe derived that some abnormality occurs to the system Theentropy of temperature distribution is given in Figure 5

4 The Scientific World Journal

Local monitoring Data acquiring Inner firewallOuter firewall

TelstarSatellite signal generator Satellite signal receiver

Inner firewall Outer firewall Internet

Online database subsystem

Video conference system

Internet

Client

Client

Client

Intelligent diagnosis subsystem

Human-system interface

Data communication

Oil drilling platforms

Gas turbine

Diagnosis center

Figure 2 Structure of the remote system of condition monitoring and fault analysis

Figure 3 A typical webpage for monitoring of the subsystem

0 50 100 150 200 250 300 350 400 450 500

0

100

200

300

400

500

600

700

Time

Tem

pera

ture

Figure 4 Exhaust temperatures from a set of thermometers

0 50 100 150 200 250 300 350 400 450 500

Time

255

26

265

27

275

28

Info

rmat

ion

entro

py

Figure 5 Uniformity of the temperatures (red dash line is the idealcase blue line is the real case)

The Scientific World Journal 5

20

40

30

210

60

240

90

270

120

300

150

330

180 0

(a) Rose map of exhaust temperatures at 119905 = 5

20

40

60

80

100

30

210

60

240

90

270

120

300

150

330

180 0

(b) Rose map of exhaust temperatures at 119905 =130

30

210

60

240

90

270

120

300

150

330

180 0

400

500

300

200

100

(c) Rose map of exhaust temperatures at 119905 =250

30

50

210

60

240

90

270

120

100

150

300

150

330

180 0

(d) Rose map of exhaust temperatures at 119905 =400

100

80

60

40

20

30

210

60

240

90

270

120

300

150

330

180 0

(e) Rose map of exhaust temperatures at 119905 =500

Figure 6 Samples of temperature distribution in different times

Another example is also given in Figures 7 to 9 In thisexample there is significant difference between the outputsof 13 thermometers even when the gas turbine is not runningjust as shown in Figure 9(a)Thus the entropy of temperaturedistribution is a little lower than the ideal case as shown inFigure 8 Besides some representative samples are also givenin Figure 9

Considering the above examples we can see that the func-tion of entropy is an effective measurement of uniformity Itcan be used to reflect the uniformity of exhaust temperatures

If the uniformity is less than a threshold some faults possiblyoccur to the gas path of the gas turbine Thus the entropyfunction is used as an index of the health of the gas path

4 Fault Feature Quality Evaluation withGeneralized Entropy

The above section gives an approach to detecting the abnor-mality in the exhaust temperature distribution However thefunction of entropy cannot distinguish what kind of faults

6 The Scientific World Journal

0 500 1000 1500

0

100

200

300

400

500

600

700

Time

Tem

pera

ture

Figure 7 Exhaust temperatures from another set of thermometers

0 500 1000 1500

Time

245

25

255

26

265

27

275

28

Info

rmat

ion

entro

py

Figure 8 Entropy of the temperature distribution where the reddash line is the ideal case and the blue one is the real case

occurs to the system although it detects abnormality In orderto analyze why the temperature distribution is not uniformwe should develop some algorithms to recognize the fault

Before training an intelligent model we should constructsome features and select the most informative subsets torepresent different faults In this section we will discuss thisissue

Intuitively we know that the temperatures of all ther-mometers reflect the state of the system Besides the tem-perature difference between neighboring thermometers alsoindicates the source of faults which are considered as spaceneighboring information Moreover we know the temper-ature change of a thermometer necessarily gives hints tostudy the faults which can be viewed as time neighboringinformation In fact the inlet temperature119879

0is also an impor-

tant factor In summary we can use exhaust temperaturesand their neighboring information along time and space torecognize different faults If there are 119899 (119899 = 13 in our system)thermometers we can form a feature vector to describe thestate of the exhaust system as

119865 = 1198790 1198791 1198792 119879

119899 1198791minus 1198792 1198792minus 1198793

119879119899minus 1198791 1198791015840

1 1198791015840

2 119879

1015840

119899

(2)

where 1198791015840119894= 119879119894(119895) minus 119879

119894(119895 minus 1) 119879

119894(119895) is the temperature at time

119895 of the 119894th thermometerApart from the above features we can also construct other

attributes to reflect the conditions of the gas turbine In thiswork we consider a gas turbine with 13 thermometers aroundthe exhaust So we can form a 40-attribute vector finally

There are some questions whether all the extractedfeatures are useful for finalmodeling and howwe can evaluatethe features and find the most informative features In factthere are a number of measures to estimate feature qualitysuch as dependency in the rough set theory [21] consistency[22] mutual information in the information theory [23] andclassification margin in the statistical learning theory [24]However all these measures are computed in the originalinput space while the effective classification techniquesusually implement a nonlinear mapping of the original spaceto a feature space by a kernel function In this case we requirea new measure to reflect the classification information ofthe feature space Now we extend the traditional informationentropy to measure it

Given a set of samples 119880 = 1199091 1199092 119909

119898 each sample

is described with 119899 features 119865 = 1198911 1198912 119891

119899 As to

classification learning each training sample 119909119894is associated

with a decision 119910119894 As to an arbitrary subset 1198651015840 sube 119865 and a

kernel function119870 we can calculate a kernel matrix

119870 =[

[

[

11989611

1198961119898

d

1198961198981

119896119898119898

]

]

]

(3)

where 119896119894119895= 119896(119909

119894 119909119895) The Gaussian function is a representa-

tive kernel function

119896119894119895= exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120590

) (4)

A number of kernel functions have the properties(1) 119896119894119895isin [0 1] (2) 119896

119894119895= 119896119895119894

Kernel matrix plays a bottleneck role in kernel basedlearning [25] All the information that a classification algo-rithm can use is hidden in this matrix In the same time wecan also calculate a decision kernel matrix as

119863 =[

[

[

11988911

1198891119898

d

1198891198981

119889119898119898

]

]

]

(5)

where 119889119894119895= 1 if 119910

119894= 119910119895 otherwise 119889

119894119895= 0 In fact the matrix

119863 is a matching kernel

Definition 1 Given a set of samples119880 = 1199091 1199092 119909

119898 each

sample is described with 119899 features 119865 = 1198911 1198912 119891

119899 1198651015840 sube

119865119870 is a kernelmatrix over119880 in terms of1198651015840Then the entropyof 1198651015840 is defined as

119864 (119870) = minus

1

119898

119898

sum

119894=1

log2

119870119894

119898

(6)

where119870119894= sum119898

119895=1119896119894119895

As to the above entropy function if we use Gaussianfunction as the kernel we have log

2119898 ge 119864 (119870) ge 0 119864 (119870) = 0

if and only if 119896119894119895= 1 forall119894 119895 119864 (119870) = log

2119898 if and only if 119896

119894119895= 0

119894 = 119895 119864 (119870) = 0 means that any pair of samples cannot be

The Scientific World Journal 7

30

210

60

15

10

5

240

90

270

120

300

150

330

180 0

(a) Rose map of exhaust temperatures at 119905 = 500

30

210

800

600

400

200

60

240

90

270

120

300

150

330

180 0

(b) Rose map of exhaust temperatures at 119905 = 758

30

210

250

200

150

100

50

60

240

90

270

120

300

150

330

180 0

(c) Rose map of exhaust temperatures at 119905 = 820

30

210

60

240

90

270

120

300

150

330

180 0

400

300

200

100

(d) Rose map of exhaust temperatures at 119905 =1220

Figure 9 Samples of temperature distribution in different moments

distinguished with the current features while 119864 (119870) = log2119898

means any pair of samples is different from each other Sothey can be distinguished These are two extreme cases Inreal-world applications part of samples can be discernedwiththe available features while others are not In this case theentropy function takes value in the interval [0 log

2119898]

Moreover it is easy to show that if 1198701sube 1198702 119864(1198701) ge

119864(1198702) where119870

1sube 1198702means119870

1(119909119894 119909119895) le 1198702(119909119894 119909119895) forall119894 119895

Definition 2 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

1198651 1198652sube 119865 119870

1and 119870

2are two kernel matrices induced by 119865

1

and 1198652119870 is a new function computed with 119865

1cup 1198652 Then the

joint entropy of 1198651and 119865

2is defined as

119864 (1198701 1198702) = 119864 (119870) = minus

1

119898

119898

sum

119894=1

log2

119870119894

119898

(7)

where119870119894= sum119898

119895=1119896119894119895

As to the Gaussian function 119870(119909119894 119909119895) = 119870

1(119909119894 119909119895) times

1198702(119909119894 119909119895) Thus 119870 sube 119870

1and 119870 sube 119870

2 In this case 119864(119870) ge

119864(1198701) and 119864(119870) ge 119864(119870

2)

Definition 3 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912

119891119899One has 119865

1 1198652sube 119865 119870

1and 119870

2are two kernel matrices

induced by 1198651and 119865

2 119870 is a new kernel function computed

with 1198651cup 1198652 Knowning 119865

1 the condition entropy of 119865

2is

defined as

119864 (1198701| 1198702) = 119864 (119870) minus 119864 (119870

1) (8)

As to the Gaussian kernel 119864(119870) ge 119864(1198701) and 119864(119870) ge 119864(119870

2)

so 119864(1198701| 1198702) ge 0 and 119864(119870

2| 1198701) ge 0

Definition 4 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

One has 1198651 1198652

sube 119865 1198701and 119870

2are two kernel matrices

induced by 1198651and 119865

2 119870 is a new kernel function computed

with 1198651cup 1198652 Then the mutual information of 119870

1and 119870

2is

defined as

MI (1198701 1198702) = 119864 (119870

1) + 119864 (119870

2) minus 119864 (119870) (9)

As to Gaussian kernel MI(1198701 1198702) = MI(119870

2 1198701) If 119870

1sube

1198702 we have MI(119870

1 1198702) = 119864(119870

2) and if 119870

2sube 1198701 we have

MI(1198701 1198702) = 119864(119870

1)

8 The Scientific World Journal

0 5 10 15 20 25 30 35 40 450

005

01

015

02

025

03

035

04

Feature index

Dep

ende

ncy

Figure 10 Fuzzy dependency between a single feature and decision

0 5 10 15 20 25 30 35 40 45

Feature index

Dep

ende

ncy

0

02

04

06

08

1

12

14

Figure 11 Kernelized mutual information between a single featureand decision

Please note that if 1198651sube 1198652 we have 119870

2sube 1198701 However

1198702sube 1198701does not mean 119865

1sube 1198652

Definition 5 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

1198651015840sube 119865119870 is a kernelmatrix over119880 in terms of1198651015840 and119863 is the

kernel matrix computed with the decision Then the featuresignificance 1198651015840 related to the decision is defined as

MI (119870119863) = 119864 (119870) + 119864 (119863) minus 119864 (119870119863) (10)

MI (119870119863) measures the importance of feature subset 1198651015840in the kernel space to distinguish different classes It can beunderstood as a kernelized version of Shannon informationentropy which is widely used feature evaluation selectionIn fact it is easy to derive the equivalence between thisentropy function and Shannon entropy in the condition thatthe attributes are discrete and the matching kernel is used

Now we show an example in gas turbine fault diagnosisWe collect 3581 samples from two sets of gas turbine systems1440 samples are healthy and the others belong to four kindsof faults load rejection sensor fault fuel switching and saltspray corrosion The numbers of samples are 45 588 71 and1437 respectively Thirteen thermometers are installed in theexhaust According to the approach described above we forma 40-dimensional vector to represent the state of the exhaust

1 2 3 4

0

02

04

06

08

1

Selected feature number

Fuzz

y de

pend

ency

03475

09006

0999609969

Figure 12 Fuzzy dependency between the selected features anddecisions (Features 5 37 2 and 3 are selected sequentially)

0

02

04

06

08

1

12

14

16

18

Kern

eliz

ed m

utua

l inf

orm

atio

n

1267

161215771510

1 2 3 4

Number of selected features

Figure 13 Kernelized mutual information between the selectedfeatures and decisions (Features 39 31 38 and 40 are selectedsequentially)

Obviously the classification task is not understandable insuch high dimensional space Moreover some features maybe redundant for classification learning which may confusethe learning algorithm and reduce modeling performanceSo it is a key preprocessing step to select the necessary andsufficient subsets

Here we compare the fuzzy rough set based featureevaluation algorithm with the proposed kernelized mutualinformation Fuzzy dependency has been widely discussedand applied in feature selection and attribute reduction theseyears [26ndash28] Fuzzy dependency can be understood as theaverage distance from the samples and their nearest neighborbelonging to different classes while the kernelized mutualinformation reflects the relevance between features anddecision in the kernel space

Comparing Figures 10 and 11 significant difference isobtained As to fuzzy rough sets Feature 5 produces thelargest dependency and then Feature 38 However Feature39 gets the largest mutual information and Feature 2 is thesecond one Thus different feature evaluation functions willlead to completely different results

Figures 10 and 11 present the significance of singlefeatures In applications we should combine a set of featuresNow we consider a greedy search strategy Starting from anempty set and the best features are added one by one In

The Scientific World Journal 9

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 14 Scatter plots in 2D space expended with feature pairs selected by fuzzy dependency

each round we select a feature which produces the largestsignificance increment with the selected subset Both fuzzydependency and kernelized mutual information increasemonotonically if new attributes are added If the selectedfeatures are sufficient for classification these two functionswill keep invariant by adding any new attributes So we canstop the algorithm if the increment of significance is less thana given threshold The significances of the selected featuresubset are shown in Figures 12 and 13 respectively

In order to show the effectiveness of the algorithm wegive the scatter plots in 2D spaces as shown in Figures 14 to16 which are expended by the feature pairs selected by fuzzydependency kernelized mutual information and Shannonmutual information As to fuzzy dependency we selectFeatures 5 37 2 and 3Then there are 4times4 = 16 combinationsof feature pairs The subplot in the 119894th row and 119895th column inFigure 14 gives the scatters of samples in 2D space expandedby the 119894th selected feature and the 119895th selected feature

Observing the 2nd subplots in the first row of Figure 14we can find that the classification task is nonlinear The firstclass is dispersed and the third class is also located at different

regions which leads to the difficulty in learning classificationmodels

However in the corresponding subplot of Figure 15 wecan see that each class is relatively compact which leads to asmall intraclass distanceMoreover the samples in five classescan be classified with some linear models which also bringbenefit for learning a simple classification model

Comparing Figures 15 and 16 we can find that differentclasses are overlapped in feature spaces selected by Shannonmutual information or get entangled which leads to the badclassification performance

5 Diagnosis Modeling with InformationEntropy Based Decision Tree Algorithm

After selecting the informative features we now go to clas-sification modeling There are a great number of learningalgorithms for building a classificationmodel Generalizationcapability and interpretability are the two most importantcriteria in evaluating an algorithm As to fault diagnosis adomain expert usually accepts a model which is consistent

10 The Scientific World Journal

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 15 Scatter in 2D space expended with feature pairs selected by kernelized mutual information

with his common knowledge Thus he expects the modelis understandable otherwise he will not believe the outputsof the model In addition if the model is understandable adomain expert can adapt it according to his prior knowledgewhich makes the model suitable for different diagnosisobjects

Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are such techniques for training an understandable classification model. The learned model can be transformed into a set of rules. All these algorithms build a decision tree from training samples. They start from a root node and select one of the features to divide the samples, with cuts, into different branches according to their feature values. This procedure is iteratively conducted until the branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function for selecting attributes or cuts. In CART, the splitting rules GINI and Twoing are adopted, while ID3 uses information gain and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes compared with ID3. Competent performance is usually observed with C4.5 in real-world applications compared with some popular algorithms, including SVM and Bayesian nets. In this work, we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.

Decision tree algorithm C4.5.
Input: a set of training samples U = {x_1, x_2, ..., x_m} with features F = {f_1, f_2, ..., f_n}; stopping criterion τ.
Output: decision tree T.
(1) Check the sample set for base cases;
(2) for each attribute f, compute the normalized information gain ratio from splitting on f;
(3) let f_best be the attribute with the highest normalized information gain ratio;
(4) create a decision node that splits on f_best;
(5) recurse on the sublists obtained by splitting on f_best and add those nodes as children of the node, until the stopping criterion τ is satisfied;
(6) output T.
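As a reference for the splitting criterion in step (2), a small sketch of the information gain ratio for a binary numerical cut x <= c is given below. It is our own illustration rather than the exact routine used in C4.5; the midpoint candidate cuts are an assumption, and x and y are taken to be NumPy arrays of attribute values and class labels.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(x, y, cut):
    """Information gain ratio of the binary split x <= cut on one numerical attribute."""
    left = x <= cut
    n, n_left = len(y), int(left.sum())
    if n_left == 0 or n_left == n:   # degenerate split carries no information
        return 0.0
    p = n_left / n
    gain = entropy(y) - p * entropy(y[left]) - (1 - p) * entropy(y[~left])
    split_info = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return gain / split_info

def best_cut(x, y):
    """Scan the midpoints between consecutive attribute values for the best cut."""
    values = np.unique(x)
    if values.size < 2:
        return None, 0.0
    cuts = (values[:-1] + values[1:]) / 2.0
    scores = [gain_ratio(x, y, c) for c in cuts]
    i = int(np.argmax(scores))
    return float(cuts[i]), scores[i]
```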


Figure 16: Scatter plots in 2D space expanded with feature pairs selected by Shannon mutual information.

Figure 17: Decision tree trained on the features selected with fuzzy rough sets.

We input the two datasets into C4.5 and build the following two decision trees. Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second dataset. The two trees are given in Figures 17 and 18, respectively.

Figure 18: Decision tree trained on the features selected with kernelized mutual information.

We start from the root node and follow a branch to a leaf node; each such path yields a rule extracted from the tree. As to the first tree, we can get five decision rules:

(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;


(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.

As to the second decision tree, we can also obtain the following rules (an executable sketch of both rule sets is given after this list):

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
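To emphasize how directly such trees translate into executable logic, the two rule sets can be written as a few lines of Python. Class labels are returned as the integers 1 to 5, and the feature values are assumed to be normalized to [0, 1] as in the scatter plots; this is our own transcription of the rules, not code from the diagnosis system.

```python
def classify_first_tree(F2, F3, F37):
    """Rules of the tree built on Features 5, 37, 2, and 3 (Figure 17)."""
    if F2 > 0.50:
        return 4 if F37 > 0.49 else 1
    if F2 > 0.18:                 # i.e., 0.18 < F2 <= 0.50
        return 5 if F3 > 0.41 else 3
    return 2                      # F2 <= 0.18

def classify_second_tree(F38, F39, F40):
    """Rules of the tree built on Features 39, 31, 38, and 40 (Figure 18)."""
    if F39 > 0.45:
        return 4 if F38 > 0.80 else 1
    if F39 > 0.17:                # i.e., 0.17 < F39 <= 0.45
        return 2
    return 5 if F40 > 0.42 else 3
```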

We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model is as high as 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored. In that case, the model may become more and more complex.

6. Conclusions and Future Work

Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are self-monitoring without man on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used in measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of this work are twofold. First, we introduce the entropy function to measure the uniformity of exhaust temperatures. The measurement is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy for evaluating the significance of attributes in kernelized feature spaces. We compute the relevance between a kernel matrix induced with a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are derived.

Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.

Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is partially supported by the National Natural Foundation under Grants 61222210 and 61105054.

References

[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.


Page 4: Research Article Fault Detection and Diagnosis for Gas ...downloads.hindawi.com/journals/tswj/2014/617162.pdf · Research Article Fault Detection and Diagnosis for Gas Turbines Based

4 The Scientific World Journal

Local monitoring Data acquiring Inner firewallOuter firewall

TelstarSatellite signal generator Satellite signal receiver

Inner firewall Outer firewall Internet

Online database subsystem

Video conference system

Internet

Client

Client

Client

Intelligent diagnosis subsystem

Human-system interface

Data communication

Oil drilling platforms

Gas turbine

Diagnosis center

Figure 2 Structure of the remote system of condition monitoring and fault analysis

Figure 3 A typical webpage for monitoring of the subsystem

0 50 100 150 200 250 300 350 400 450 500

0

100

200

300

400

500

600

700

Time

Tem

pera

ture

Figure 4 Exhaust temperatures from a set of thermometers

0 50 100 150 200 250 300 350 400 450 500

Time

255

26

265

27

275

28

Info

rmat

ion

entro

py

Figure 5 Uniformity of the temperatures (red dash line is the idealcase blue line is the real case)

The Scientific World Journal 5

20

40

30

210

60

240

90

270

120

300

150

330

180 0

(a) Rose map of exhaust temperatures at 119905 = 5

20

40

60

80

100

30

210

60

240

90

270

120

300

150

330

180 0

(b) Rose map of exhaust temperatures at 119905 =130

30

210

60

240

90

270

120

300

150

330

180 0

400

500

300

200

100

(c) Rose map of exhaust temperatures at 119905 =250

30

50

210

60

240

90

270

120

100

150

300

150

330

180 0

(d) Rose map of exhaust temperatures at 119905 =400

100

80

60

40

20

30

210

60

240

90

270

120

300

150

330

180 0

(e) Rose map of exhaust temperatures at 119905 =500

Figure 6 Samples of temperature distribution in different times

Another example is also given in Figures 7 to 9 In thisexample there is significant difference between the outputsof 13 thermometers even when the gas turbine is not runningjust as shown in Figure 9(a)Thus the entropy of temperaturedistribution is a little lower than the ideal case as shown inFigure 8 Besides some representative samples are also givenin Figure 9

Considering the above examples we can see that the func-tion of entropy is an effective measurement of uniformity Itcan be used to reflect the uniformity of exhaust temperatures

If the uniformity is less than a threshold some faults possiblyoccur to the gas path of the gas turbine Thus the entropyfunction is used as an index of the health of the gas path

4 Fault Feature Quality Evaluation withGeneralized Entropy

The above section gives an approach to detecting the abnor-mality in the exhaust temperature distribution However thefunction of entropy cannot distinguish what kind of faults

6 The Scientific World Journal

0 500 1000 1500

0

100

200

300

400

500

600

700

Time

Tem

pera

ture

Figure 7 Exhaust temperatures from another set of thermometers

0 500 1000 1500

Time

245

25

255

26

265

27

275

28

Info

rmat

ion

entro

py

Figure 8 Entropy of the temperature distribution where the reddash line is the ideal case and the blue one is the real case

occurs to the system although it detects abnormality In orderto analyze why the temperature distribution is not uniformwe should develop some algorithms to recognize the fault

Before training an intelligent model we should constructsome features and select the most informative subsets torepresent different faults In this section we will discuss thisissue

Intuitively we know that the temperatures of all ther-mometers reflect the state of the system Besides the tem-perature difference between neighboring thermometers alsoindicates the source of faults which are considered as spaceneighboring information Moreover we know the temper-ature change of a thermometer necessarily gives hints tostudy the faults which can be viewed as time neighboringinformation In fact the inlet temperature119879

0is also an impor-

tant factor In summary we can use exhaust temperaturesand their neighboring information along time and space torecognize different faults If there are 119899 (119899 = 13 in our system)thermometers we can form a feature vector to describe thestate of the exhaust system as

119865 = 1198790 1198791 1198792 119879

119899 1198791minus 1198792 1198792minus 1198793

119879119899minus 1198791 1198791015840

1 1198791015840

2 119879

1015840

119899

(2)

where 1198791015840119894= 119879119894(119895) minus 119879

119894(119895 minus 1) 119879

119894(119895) is the temperature at time

119895 of the 119894th thermometerApart from the above features we can also construct other

attributes to reflect the conditions of the gas turbine In thiswork we consider a gas turbine with 13 thermometers aroundthe exhaust So we can form a 40-attribute vector finally

There are some questions whether all the extractedfeatures are useful for finalmodeling and howwe can evaluatethe features and find the most informative features In factthere are a number of measures to estimate feature qualitysuch as dependency in the rough set theory [21] consistency[22] mutual information in the information theory [23] andclassification margin in the statistical learning theory [24]However all these measures are computed in the originalinput space while the effective classification techniquesusually implement a nonlinear mapping of the original spaceto a feature space by a kernel function In this case we requirea new measure to reflect the classification information ofthe feature space Now we extend the traditional informationentropy to measure it

Given a set of samples 119880 = 1199091 1199092 119909

119898 each sample

is described with 119899 features 119865 = 1198911 1198912 119891

119899 As to

classification learning each training sample 119909119894is associated

with a decision 119910119894 As to an arbitrary subset 1198651015840 sube 119865 and a

kernel function119870 we can calculate a kernel matrix

119870 =[

[

[

11989611

1198961119898

d

1198961198981

119896119898119898

]

]

]

(3)

where 119896119894119895= 119896(119909

119894 119909119895) The Gaussian function is a representa-

tive kernel function

119896119894119895= exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120590

) (4)

A number of kernel functions have the properties(1) 119896119894119895isin [0 1] (2) 119896

119894119895= 119896119895119894

Kernel matrix plays a bottleneck role in kernel basedlearning [25] All the information that a classification algo-rithm can use is hidden in this matrix In the same time wecan also calculate a decision kernel matrix as

119863 =[

[

[

11988911

1198891119898

d

1198891198981

119889119898119898

]

]

]

(5)

where 119889119894119895= 1 if 119910

119894= 119910119895 otherwise 119889

119894119895= 0 In fact the matrix

119863 is a matching kernel

Definition 1 Given a set of samples119880 = 1199091 1199092 119909

119898 each

sample is described with 119899 features 119865 = 1198911 1198912 119891

119899 1198651015840 sube

119865119870 is a kernelmatrix over119880 in terms of1198651015840Then the entropyof 1198651015840 is defined as

119864 (119870) = minus

1

119898

119898

sum

119894=1

log2

119870119894

119898

(6)

where119870119894= sum119898

119895=1119896119894119895

As to the above entropy function if we use Gaussianfunction as the kernel we have log

2119898 ge 119864 (119870) ge 0 119864 (119870) = 0

if and only if 119896119894119895= 1 forall119894 119895 119864 (119870) = log

2119898 if and only if 119896

119894119895= 0

119894 = 119895 119864 (119870) = 0 means that any pair of samples cannot be

The Scientific World Journal 7

30

210

60

15

10

5

240

90

270

120

300

150

330

180 0

(a) Rose map of exhaust temperatures at 119905 = 500

30

210

800

600

400

200

60

240

90

270

120

300

150

330

180 0

(b) Rose map of exhaust temperatures at 119905 = 758

30

210

250

200

150

100

50

60

240

90

270

120

300

150

330

180 0

(c) Rose map of exhaust temperatures at 119905 = 820

30

210

60

240

90

270

120

300

150

330

180 0

400

300

200

100

(d) Rose map of exhaust temperatures at 119905 =1220

Figure 9 Samples of temperature distribution in different moments

distinguished with the current features while 119864 (119870) = log2119898

means any pair of samples is different from each other Sothey can be distinguished These are two extreme cases Inreal-world applications part of samples can be discernedwiththe available features while others are not In this case theentropy function takes value in the interval [0 log

2119898]

Moreover it is easy to show that if 1198701sube 1198702 119864(1198701) ge

119864(1198702) where119870

1sube 1198702means119870

1(119909119894 119909119895) le 1198702(119909119894 119909119895) forall119894 119895

Definition 2 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

1198651 1198652sube 119865 119870

1and 119870

2are two kernel matrices induced by 119865

1

and 1198652119870 is a new function computed with 119865

1cup 1198652 Then the

joint entropy of 1198651and 119865

2is defined as

119864 (1198701 1198702) = 119864 (119870) = minus

1

119898

119898

sum

119894=1

log2

119870119894

119898

(7)

where119870119894= sum119898

119895=1119896119894119895

As to the Gaussian function 119870(119909119894 119909119895) = 119870

1(119909119894 119909119895) times

1198702(119909119894 119909119895) Thus 119870 sube 119870

1and 119870 sube 119870

2 In this case 119864(119870) ge

119864(1198701) and 119864(119870) ge 119864(119870

2)

Definition 3 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912

119891119899One has 119865

1 1198652sube 119865 119870

1and 119870

2are two kernel matrices

induced by 1198651and 119865

2 119870 is a new kernel function computed

with 1198651cup 1198652 Knowning 119865

1 the condition entropy of 119865

2is

defined as

119864 (1198701| 1198702) = 119864 (119870) minus 119864 (119870

1) (8)

As to the Gaussian kernel 119864(119870) ge 119864(1198701) and 119864(119870) ge 119864(119870

2)

so 119864(1198701| 1198702) ge 0 and 119864(119870

2| 1198701) ge 0

Definition 4 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

One has 1198651 1198652

sube 119865 1198701and 119870

2are two kernel matrices

induced by 1198651and 119865

2 119870 is a new kernel function computed

with 1198651cup 1198652 Then the mutual information of 119870

1and 119870

2is

defined as

MI (1198701 1198702) = 119864 (119870

1) + 119864 (119870

2) minus 119864 (119870) (9)

As to Gaussian kernel MI(1198701 1198702) = MI(119870

2 1198701) If 119870

1sube

1198702 we have MI(119870

1 1198702) = 119864(119870

2) and if 119870

2sube 1198701 we have

MI(1198701 1198702) = 119864(119870

1)

8 The Scientific World Journal

0 5 10 15 20 25 30 35 40 450

005

01

015

02

025

03

035

04

Feature index

Dep

ende

ncy

Figure 10 Fuzzy dependency between a single feature and decision

0 5 10 15 20 25 30 35 40 45

Feature index

Dep

ende

ncy

0

02

04

06

08

1

12

14

Figure 11 Kernelized mutual information between a single featureand decision

Please note that if 1198651sube 1198652 we have 119870

2sube 1198701 However

1198702sube 1198701does not mean 119865

1sube 1198652

Definition 5 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

1198651015840sube 119865119870 is a kernelmatrix over119880 in terms of1198651015840 and119863 is the

kernel matrix computed with the decision Then the featuresignificance 1198651015840 related to the decision is defined as

MI (119870119863) = 119864 (119870) + 119864 (119863) minus 119864 (119870119863) (10)

MI (119870119863) measures the importance of feature subset 1198651015840in the kernel space to distinguish different classes It can beunderstood as a kernelized version of Shannon informationentropy which is widely used feature evaluation selectionIn fact it is easy to derive the equivalence between thisentropy function and Shannon entropy in the condition thatthe attributes are discrete and the matching kernel is used

Now we show an example in gas turbine fault diagnosisWe collect 3581 samples from two sets of gas turbine systems1440 samples are healthy and the others belong to four kindsof faults load rejection sensor fault fuel switching and saltspray corrosion The numbers of samples are 45 588 71 and1437 respectively Thirteen thermometers are installed in theexhaust According to the approach described above we forma 40-dimensional vector to represent the state of the exhaust

1 2 3 4

0

02

04

06

08

1

Selected feature number

Fuzz

y de

pend

ency

03475

09006

0999609969

Figure 12 Fuzzy dependency between the selected features anddecisions (Features 5 37 2 and 3 are selected sequentially)

0

02

04

06

08

1

12

14

16

18

Kern

eliz

ed m

utua

l inf

orm

atio

n

1267

161215771510

1 2 3 4

Number of selected features

Figure 13 Kernelized mutual information between the selectedfeatures and decisions (Features 39 31 38 and 40 are selectedsequentially)

Obviously the classification task is not understandable insuch high dimensional space Moreover some features maybe redundant for classification learning which may confusethe learning algorithm and reduce modeling performanceSo it is a key preprocessing step to select the necessary andsufficient subsets

Here we compare the fuzzy rough set based featureevaluation algorithm with the proposed kernelized mutualinformation Fuzzy dependency has been widely discussedand applied in feature selection and attribute reduction theseyears [26ndash28] Fuzzy dependency can be understood as theaverage distance from the samples and their nearest neighborbelonging to different classes while the kernelized mutualinformation reflects the relevance between features anddecision in the kernel space

Comparing Figures 10 and 11 significant difference isobtained As to fuzzy rough sets Feature 5 produces thelargest dependency and then Feature 38 However Feature39 gets the largest mutual information and Feature 2 is thesecond one Thus different feature evaluation functions willlead to completely different results

Figures 10 and 11 present the significance of singlefeatures In applications we should combine a set of featuresNow we consider a greedy search strategy Starting from anempty set and the best features are added one by one In

The Scientific World Journal 9

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 14 Scatter plots in 2D space expended with feature pairs selected by fuzzy dependency

each round we select a feature which produces the largestsignificance increment with the selected subset Both fuzzydependency and kernelized mutual information increasemonotonically if new attributes are added If the selectedfeatures are sufficient for classification these two functionswill keep invariant by adding any new attributes So we canstop the algorithm if the increment of significance is less thana given threshold The significances of the selected featuresubset are shown in Figures 12 and 13 respectively

In order to show the effectiveness of the algorithm wegive the scatter plots in 2D spaces as shown in Figures 14 to16 which are expended by the feature pairs selected by fuzzydependency kernelized mutual information and Shannonmutual information As to fuzzy dependency we selectFeatures 5 37 2 and 3Then there are 4times4 = 16 combinationsof feature pairs The subplot in the 119894th row and 119895th column inFigure 14 gives the scatters of samples in 2D space expandedby the 119894th selected feature and the 119895th selected feature

Observing the 2nd subplots in the first row of Figure 14we can find that the classification task is nonlinear The firstclass is dispersed and the third class is also located at different

regions which leads to the difficulty in learning classificationmodels

However in the corresponding subplot of Figure 15 wecan see that each class is relatively compact which leads to asmall intraclass distanceMoreover the samples in five classescan be classified with some linear models which also bringbenefit for learning a simple classification model

Comparing Figures 15 and 16 we can find that differentclasses are overlapped in feature spaces selected by Shannonmutual information or get entangled which leads to the badclassification performance

5 Diagnosis Modeling with InformationEntropy Based Decision Tree Algorithm

After selecting the informative features we now go to clas-sification modeling There are a great number of learningalgorithms for building a classificationmodel Generalizationcapability and interpretability are the two most importantcriteria in evaluating an algorithm As to fault diagnosis adomain expert usually accepts a model which is consistent

10 The Scientific World Journal

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 15 Scatter in 2D space expended with feature pairs selected by kernelized mutual information

with his common knowledge Thus he expects the modelis understandable otherwise he will not believe the outputsof the model In addition if the model is understandable adomain expert can adapt it according to his prior knowledgewhich makes the model suitable for different diagnosisobjects

Decision tree algorithms including CART [29] ID3[17] and C45 [18] are such techniques for training anunderstandable classification model The learned model canbe transformed into a set of rules All these algorithms builda decision tree from training samples They start from a rootnode and select one of the features to divide the samples withcuts into different branches according to their feature valuesThis procedure is interactively conducted until the branch ispure or a stopping criterion is satisfiedThe key difference liesin the evaluation function in selecting attributes or cuts InCART splitting rules GINI and Twoing are adopted whileID3 uses information gain and C45 takes information gainratioMoreover C45 can deal with numerical attributes com-pared with ID3 Competent performance is usually observedwith C45 in real-world applications compared with somepopular algorithms including SVM and Baysian net In this

work we introduce C45 to train classification models Thepseudocode of C45 is formulated as follows

Decision tree algorithm C45Input a set of training samples 119880 = 119909

1 1199092 119909

119898

with features 119865 = 1198911 1198912 119891

119899

Stopping criterion 120591Output decision tree 119879

(1) Check for sample set(2) For each attribute 119891 compute the normalized infor-

mation gain ratio from splitting on 119886(3) Let f best be the attribute with the highest normalized

information gain(4) Create a decision node that splits on f best(5) Recurse on the sublists obtained by splitting on

f best and add those nodes as children of node untilstopping criterion 120591 is satisfied

(6) Output 119879

The Scientific World Journal 11

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 16 Scatter in 2D space expended with feature pairs selected by Shannon mutual information

F2

le050gt050

F37 F2

le041

Class 2

gt018

Class 5

le049

Class 1

gt049

Class 4 F3

Class 3

gt041

le018

Figure 17 Decision tree trained on the features selected with fuzzyrough sets

We input the data sets into C45 and build the followingtwo decision trees Features 5 37 2 and 3 are included in thefirst dataset and Features 39 31 38 and 40 are selected in thesecond dataset The two trees are given in Figures 17 and 18respectively

F39le017gt017

F39 F40

le042

Class 3

gt042

Class 5

le045

Class 2

gt045

F38le080gt080

Class 1Class 4

Figure 18 Decision tree trained on the features selected withkernelized mutual information

We start from the root node to a leaf node along thebranch and then a piece of rule is extracted from the treeAs to the first tree we can get five decision rules

(1) if F2 gt 050 and F37 gt 049 then the decision is Class4

12 The Scientific World Journal

(2) if F2 gt 050 and F37 le 049 then the decision is Class1

(3) if 018 lt F2 le 050 and F3 gt 041 then the decision isClass 5

(4) if 018 lt F2 le 050 and F3 le 041 then the decision isClass 3

(5) if F2 le 018 then the decision is Class 2

As to the second decision tree we can also obtain somerules as

(1) if F39 gt 045 and F38 gt 080 then the decision is Class4

(2) if F39 gt 045 and F38 le 080 then the decision is Class1

(3) if 017 lt F39 le 045 then the decision is Class 2(4) if F39 le 017 and F40 gt 042 then the decision is Class

5(5) if F39 le 017 and F40 le 042 then the decision is Class

3

We can see the derived decision trees are rather simpleand each can extract five pieces of rules It is very easy fordomain experts to understand the rules and even revise therules As the classification task is a little simple the accuracyof each model is high to 97 As new samples and faults arerecorded by the systemmore andmore complex tasksmay bestored In that case the model may become more and morecomplex

6 Conclusions and Future Works

Automatic fault detection and diagnosis are highly desirablein some industries such as offshore oil well drilling plat-forms for such systems are self-monitoring without manon duty In this work we design an intelligent abnormalitydetection and fault recognition technique for the exhaustsystem of gas turbines based on information entropy whichis used in measuring the uniformity of exhaust temperaturesevaluating the significance of features in kernel spaces andselecting splitting nodes for constructing decision trees Themain contributions of the work are two parts First weintroduce the entropy function to measure the uniformity ofexhaust temperatures The measurement is easy to computeand understand Numerical experiments also show its effec-tiveness Second we extend Shannon entropy for evaluatingthe significance of attributes in kernelized feature spaces Wecompute the relevance between a kernel matrix induced witha set of attributes and the matrix computed with the decisionvariable Some numerical experiments are also presentedGood results are derived

Although this work gives an effective framework forautomatic fault detection and recognition the proposedtechnique is not tested on large-scale real tasks We havedeveloped a remote state monitoring and fault diagnosis sys-tem Large scale data are flooding into the center In thefuture we will improve these techniques and develop areliable diagnosis system

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is partially supported by National Natural Founda-tion under Grants 61222210 and 61105054

References

[1] C Wang L Xu and W Peng ldquoConceptual design of remotemonitoring and fault diagnosis systemsrdquo Information Systemsvol 32 no 7 pp 996ndash1004 2007

[2] W Donat K ChoiW An S Singh and K Pattipati ldquoData visu-alization data reduction and classifier fusion for intelligent faultdiagnosis in gas turbine enginesrdquo Journal of Engineering for GasTurbines and Power vol 130 no 4 Article ID 041602 2008

[3] Y G Li and P Nilkitsaranont ldquoGas turbine performance prog-nostic for condition-based maintenancerdquo Applied Energy vol86 no 10 pp 2152ndash2161 2009

[4] H Bassily R Lund and JWagner ldquoFault detection inmultivari-ate signals with applications to gas turbinesrdquo IEEE Transactionson Signal Processing vol 57 no 3 pp 835ndash842 2009

[5] K Young D Lee V Vitali and Y Ming ldquoA fault diagnosismethod for industrial gas turbines using bayesian data analysisrdquoJournal of Engineering for Gas Turbines and Power vol 132Article ID 041602 2010

[6] B Yu D Liu and T Zhang ldquoFault diagnosis for micro-gasturbine engine sensors via wavelet entropyrdquo Sensors vol 11 no10 pp 9928ndash9941 2011

[7] S Wu P Wu C Wu J Ding and C Wang ldquoBearing faultdiagnosis based onmultiscale permutation entropy and supportvector machinerdquo Entropy vol 14 no 8 pp 1343ndash1356 2012

[8] S Wu C Wu T Wu and C Wang ldquoMulti-scale analysis basedball bearing defect diagnostics using Mahalanobis distance andsupport vector machinerdquo Entropy vol 15 no 2 pp 416ndash4332013

[9] H A Nozari M A Shoorehdeli S Simani andH D BanadakildquoModel-based robust fault detection and isolation of an indus-trial gas turbine prototype using soft computing techniquesrdquoNeurocomputing vol 91 pp 29ndash47 2012

[10] S Sarkar X Jin and A Ray ldquoData-driven fault detection inaircraft engines with noisy sensor measurementsrdquo Journal ofEngineering for Gas Turbines and Power vol 133 no 8 ArticleID 081602 10 pages 2011

[11] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 pp 379ndash423 1948

[12] S M Pincus ldquoApproximate entropy as a measure of systemcomplexityrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 88 no 6 pp 2297ndash2301 1991

[13] L I Kuncheva and C J Whitaker ldquoMeasures of diversity inclassifier ensembles and their relationship with the ensembleaccuracyrdquoMachine Learning vol 51 no 2 pp 181ndash207 2003

[14] I Csiszar ldquoAxiomatic characterizations of information mea-suresrdquo Entropy vol 10 no 3 pp 261ndash273 2008

[15] M Zanin L Zunino O A Rosso and D Papo ldquoPermutationentropy and itsmain biomedical and econophysics applicationsa reviewrdquo Entropy vol 14 no 8 pp 1553ndash1577 2012

The Scientific World Journal 13

[16] C Wang and H Shen ldquoInformation theory in scientific visual-izationrdquo Entropy vol 13 pp 254ndash273 2011

[17] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986

[18] J QuinlanC45 Programs forMachine Learning Morgan Kauf-mann 1993

[19] D N Reshef Y A Reshef H K Finucane et al ldquoDetectingnovel associations in large data setsrdquo Science vol 334 no 6062pp 1518ndash1524 2011

[20] C J C Burges ldquoA tutorial on support vector machines forpattern recognitionrdquo Data Mining and Knowledge Discoveryvol 2 no 2 pp 121ndash167 1998

[21] QHu D YuW Pedrycz andD Chen ldquoKernelized fuzzy roughsets and their applicationsrdquo IEEE Transactions on Knowledgeand Data Engineering vol 23 no 11 pp 1649ndash1667 2011

[22] Q Hu H Zhao Z Xie and D Yu ldquoConsistency based attributereductionrdquo in Advances in Knowledge Discovery and DataMining Z-H Zhou H Li and Q Yang Eds vol 4426 ofLecture Notes in Computer Science pp 96ndash107 Springer BerlinGermany 2007

[23] Q Hu D Yu and Z Xie ldquoInformation-preserving hybrid datareduction based on fuzzy-rough techniquesrdquo Pattern Recogni-tion Letters vol 27 no 5 pp 414ndash423 2006

[24] R Gilad-Bachrach A Navot and N Tishby ldquoMargin basedfeature selectionmdashtheory and algorithmsrdquo in Proceedings of the21th International Conference on Machine Learning (ICML rsquo04)pp 337ndash344 July 2004

[25] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press 2004

[26] Q Hu Z Xie andD Yu ldquoHybrid attribute reduction based on anovel fuzzy-roughmodel and information granulationrdquo PatternRecognition vol 40 no 12 pp 3509ndash3521 2007

[27] R Jensen and Q Shen ldquoNew approaches to fuzzy-rough featureselectionrdquo IEEE Transactions on Fuzzy Systems vol 17 no 4 pp824ndash838 2009

[28] S Zhao E C C Tsang and D Chen ldquoThe model of fuzzyvariable precision rough setsrdquo IEEE Transactions on FuzzySystems vol 17 no 2 pp 451ndash467 2009

[29] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees Wadsworth amp BrooksColeAdvanced Books amp Software Monterey Calif USA 1984

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article Fault Detection and Diagnosis for Gas ...downloads.hindawi.com/journals/tswj/2014/617162.pdf · Research Article Fault Detection and Diagnosis for Gas Turbines Based

The Scientific World Journal 5

20

40

30

210

60

240

90

270

120

300

150

330

180 0

(a) Rose map of exhaust temperatures at 119905 = 5

20

40

60

80

100

30

210

60

240

90

270

120

300

150

330

180 0

(b) Rose map of exhaust temperatures at 119905 =130

30

210

60

240

90

270

120

300

150

330

180 0

400

500

300

200

100

(c) Rose map of exhaust temperatures at 119905 =250

30

50

210

60

240

90

270

120

100

150

300

150

330

180 0

(d) Rose map of exhaust temperatures at 119905 =400

100

80

60

40

20

30

210

60

240

90

270

120

300

150

330

180 0

(e) Rose map of exhaust temperatures at 119905 =500

Figure 6 Samples of temperature distribution in different times

Another example is also given in Figures 7 to 9 In thisexample there is significant difference between the outputsof 13 thermometers even when the gas turbine is not runningjust as shown in Figure 9(a)Thus the entropy of temperaturedistribution is a little lower than the ideal case as shown inFigure 8 Besides some representative samples are also givenin Figure 9

Considering the above examples we can see that the func-tion of entropy is an effective measurement of uniformity Itcan be used to reflect the uniformity of exhaust temperatures

If the uniformity is less than a threshold some faults possiblyoccur to the gas path of the gas turbine Thus the entropyfunction is used as an index of the health of the gas path

4 Fault Feature Quality Evaluation withGeneralized Entropy

The above section gives an approach to detecting the abnor-mality in the exhaust temperature distribution However thefunction of entropy cannot distinguish what kind of faults

6 The Scientific World Journal

0 500 1000 1500

0

100

200

300

400

500

600

700

Time

Tem

pera

ture

Figure 7 Exhaust temperatures from another set of thermometers

0 500 1000 1500

Time

245

25

255

26

265

27

275

28

Info

rmat

ion

entro

py

Figure 8 Entropy of the temperature distribution where the reddash line is the ideal case and the blue one is the real case

occurs to the system although it detects abnormality In orderto analyze why the temperature distribution is not uniformwe should develop some algorithms to recognize the fault

Before training an intelligent model we should constructsome features and select the most informative subsets torepresent different faults In this section we will discuss thisissue

Intuitively we know that the temperatures of all ther-mometers reflect the state of the system Besides the tem-perature difference between neighboring thermometers alsoindicates the source of faults which are considered as spaceneighboring information Moreover we know the temper-ature change of a thermometer necessarily gives hints tostudy the faults which can be viewed as time neighboringinformation In fact the inlet temperature119879

0is also an impor-

tant factor In summary we can use exhaust temperaturesand their neighboring information along time and space torecognize different faults If there are 119899 (119899 = 13 in our system)thermometers we can form a feature vector to describe thestate of the exhaust system as

119865 = 1198790 1198791 1198792 119879

119899 1198791minus 1198792 1198792minus 1198793

119879119899minus 1198791 1198791015840

1 1198791015840

2 119879

1015840

119899

(2)

where 1198791015840119894= 119879119894(119895) minus 119879

119894(119895 minus 1) 119879

119894(119895) is the temperature at time

119895 of the 119894th thermometerApart from the above features we can also construct other

attributes to reflect the conditions of the gas turbine In thiswork we consider a gas turbine with 13 thermometers aroundthe exhaust So we can form a 40-attribute vector finally

There are some questions whether all the extractedfeatures are useful for finalmodeling and howwe can evaluatethe features and find the most informative features In factthere are a number of measures to estimate feature qualitysuch as dependency in the rough set theory [21] consistency[22] mutual information in the information theory [23] andclassification margin in the statistical learning theory [24]However all these measures are computed in the originalinput space while the effective classification techniquesusually implement a nonlinear mapping of the original spaceto a feature space by a kernel function In this case we requirea new measure to reflect the classification information ofthe feature space Now we extend the traditional informationentropy to measure it

Given a set of samples 119880 = 1199091 1199092 119909

119898 each sample

is described with 119899 features 119865 = 1198911 1198912 119891

119899 As to

classification learning each training sample 119909119894is associated

with a decision 119910119894 As to an arbitrary subset 1198651015840 sube 119865 and a

kernel function119870 we can calculate a kernel matrix

119870 =[

[

[

11989611

1198961119898

d

1198961198981

119896119898119898

]

]

]

(3)

where 119896119894119895= 119896(119909

119894 119909119895) The Gaussian function is a representa-

tive kernel function

119896119894119895= exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120590

) (4)

A number of kernel functions have the properties(1) 119896119894119895isin [0 1] (2) 119896

119894119895= 119896119895119894

Kernel matrix plays a bottleneck role in kernel basedlearning [25] All the information that a classification algo-rithm can use is hidden in this matrix In the same time wecan also calculate a decision kernel matrix as

119863 =[

[

[

11988911

1198891119898

d

1198891198981

119889119898119898

]

]

]

(5)

where 119889119894119895= 1 if 119910

119894= 119910119895 otherwise 119889

119894119895= 0 In fact the matrix

119863 is a matching kernel

Definition 1 Given a set of samples119880 = 1199091 1199092 119909

119898 each

sample is described with 119899 features 119865 = 1198911 1198912 119891

119899 1198651015840 sube

119865119870 is a kernelmatrix over119880 in terms of1198651015840Then the entropyof 1198651015840 is defined as

119864 (119870) = minus

1

119898

119898

sum

119894=1

log2

119870119894

119898

(6)

where119870119894= sum119898

119895=1119896119894119895

As to the above entropy function if we use Gaussianfunction as the kernel we have log

2119898 ge 119864 (119870) ge 0 119864 (119870) = 0

if and only if 119896119894119895= 1 forall119894 119895 119864 (119870) = log

2119898 if and only if 119896

119894119895= 0

119894 = 119895 119864 (119870) = 0 means that any pair of samples cannot be

The Scientific World Journal 7

30

210

60

15

10

5

240

90

270

120

300

150

330

180 0

(a) Rose map of exhaust temperatures at 119905 = 500

30

210

800

600

400

200

60

240

90

270

120

300

150

330

180 0

(b) Rose map of exhaust temperatures at 119905 = 758

30

210

250

200

150

100

50

60

240

90

270

120

300

150

330

180 0

(c) Rose map of exhaust temperatures at 119905 = 820

30

210

60

240

90

270

120

300

150

330

180 0

400

300

200

100

(d) Rose map of exhaust temperatures at 119905 =1220

Figure 9 Samples of temperature distribution in different moments

distinguished with the current features while 119864 (119870) = log2119898

means any pair of samples is different from each other Sothey can be distinguished These are two extreme cases Inreal-world applications part of samples can be discernedwiththe available features while others are not In this case theentropy function takes value in the interval [0 log

2119898]

Moreover it is easy to show that if 1198701sube 1198702 119864(1198701) ge

119864(1198702) where119870

1sube 1198702means119870

1(119909119894 119909119895) le 1198702(119909119894 119909119895) forall119894 119895

Definition 2 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

1198651 1198652sube 119865 119870

1and 119870

2are two kernel matrices induced by 119865

1

and 1198652119870 is a new function computed with 119865

1cup 1198652 Then the

joint entropy of 1198651and 119865

2is defined as

119864 (1198701 1198702) = 119864 (119870) = minus

1

119898

119898

sum

119894=1

log2

119870119894

119898

(7)

where119870119894= sum119898

119895=1119896119894119895

As to the Gaussian function 119870(119909119894 119909119895) = 119870

1(119909119894 119909119895) times

1198702(119909119894 119909119895) Thus 119870 sube 119870

1and 119870 sube 119870

2 In this case 119864(119870) ge

119864(1198701) and 119864(119870) ge 119864(119870

2)

Definition 3 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912

119891119899One has 119865

1 1198652sube 119865 119870

1and 119870

2are two kernel matrices

induced by 1198651and 119865

2 119870 is a new kernel function computed

with 1198651cup 1198652 Knowning 119865

1 the condition entropy of 119865

2is

defined as

119864 (1198701| 1198702) = 119864 (119870) minus 119864 (119870

1) (8)

As to the Gaussian kernel 119864(119870) ge 119864(1198701) and 119864(119870) ge 119864(119870

2)

so 119864(1198701| 1198702) ge 0 and 119864(119870

2| 1198701) ge 0

Definition 4 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

One has 1198651 1198652

sube 119865 1198701and 119870

2are two kernel matrices

induced by 1198651and 119865

2 119870 is a new kernel function computed

with 1198651cup 1198652 Then the mutual information of 119870

1and 119870

2is

defined as

MI (1198701 1198702) = 119864 (119870

1) + 119864 (119870

2) minus 119864 (119870) (9)

As to Gaussian kernel MI(1198701 1198702) = MI(119870

2 1198701) If 119870

1sube

1198702 we have MI(119870

1 1198702) = 119864(119870

2) and if 119870

2sube 1198701 we have

MI(1198701 1198702) = 119864(119870

1)

8 The Scientific World Journal

0 5 10 15 20 25 30 35 40 450

005

01

015

02

025

03

035

04

Feature index

Dep

ende

ncy

Figure 10 Fuzzy dependency between a single feature and decision

0 5 10 15 20 25 30 35 40 45

Feature index

Dep

ende

ncy

0

02

04

06

08

1

12

14

Figure 11 Kernelized mutual information between a single featureand decision

Please note that if 1198651sube 1198652 we have 119870

2sube 1198701 However

1198702sube 1198701does not mean 119865

1sube 1198652

Definition 5 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

1198651015840sube 119865119870 is a kernelmatrix over119880 in terms of1198651015840 and119863 is the

kernel matrix computed with the decision Then the featuresignificance 1198651015840 related to the decision is defined as

MI (119870119863) = 119864 (119870) + 119864 (119863) minus 119864 (119870119863) (10)

MI (119870119863) measures the importance of feature subset 1198651015840in the kernel space to distinguish different classes It can beunderstood as a kernelized version of Shannon informationentropy which is widely used feature evaluation selectionIn fact it is easy to derive the equivalence between thisentropy function and Shannon entropy in the condition thatthe attributes are discrete and the matching kernel is used

Now we show an example in gas turbine fault diagnosisWe collect 3581 samples from two sets of gas turbine systems1440 samples are healthy and the others belong to four kindsof faults load rejection sensor fault fuel switching and saltspray corrosion The numbers of samples are 45 588 71 and1437 respectively Thirteen thermometers are installed in theexhaust According to the approach described above we forma 40-dimensional vector to represent the state of the exhaust

1 2 3 4

0

02

04

06

08

1

Selected feature number

Fuzz

y de

pend

ency

03475

09006

0999609969

Figure 12 Fuzzy dependency between the selected features anddecisions (Features 5 37 2 and 3 are selected sequentially)

0

02

04

06

08

1

12

14

16

18

Kern

eliz

ed m

utua

l inf

orm

atio

n

1267

161215771510

1 2 3 4

Number of selected features

Figure 13 Kernelized mutual information between the selectedfeatures and decisions (Features 39 31 38 and 40 are selectedsequentially)

Obviously the classification task is not understandable insuch high dimensional space Moreover some features maybe redundant for classification learning which may confusethe learning algorithm and reduce modeling performanceSo it is a key preprocessing step to select the necessary andsufficient subsets

Here we compare the fuzzy rough set based featureevaluation algorithm with the proposed kernelized mutualinformation Fuzzy dependency has been widely discussedand applied in feature selection and attribute reduction theseyears [26ndash28] Fuzzy dependency can be understood as theaverage distance from the samples and their nearest neighborbelonging to different classes while the kernelized mutualinformation reflects the relevance between features anddecision in the kernel space

Comparing Figures 10 and 11 significant difference isobtained As to fuzzy rough sets Feature 5 produces thelargest dependency and then Feature 38 However Feature39 gets the largest mutual information and Feature 2 is thesecond one Thus different feature evaluation functions willlead to completely different results

Figures 10 and 11 present the significance of singlefeatures In applications we should combine a set of featuresNow we consider a greedy search strategy Starting from anempty set and the best features are added one by one In

The Scientific World Journal 9

Figure 14: Scatter plots in 2D spaces spanned by the feature pairs selected by fuzzy dependency.

In each round, we select the feature that produces the largest significance increment with respect to the selected subset. Both fuzzy dependency and kernelized mutual information increase monotonically as new attributes are added. If the selected features are sufficient for classification, these two functions remain invariant when any new attribute is added. So we can stop the algorithm when the increment of significance is less than a given threshold. The significance values of the selected feature subsets are shown in Figures 12 and 13, respectively.
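A minimal sketch of this greedy forward search with the kernelized mutual information might look as follows; it reuses the helper functions sketched earlier (gaussian_kernel, matching_kernel, kernelized_mi), the parameter names are ours, and it is not the authors' implementation.

```python
import numpy as np

def greedy_select(X, y, sigma=1.0, eps=1e-3, max_features=None):
    """Forward selection: in each round add the feature giving the largest
    increment of MI(K, D); stop when the increment drops below eps."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    D = matching_kernel(y)                     # decision kernel matrix
    selected, last_mi = [], 0.0
    limit = max_features if max_features is not None else n
    while len(selected) < limit:
        best_f, best_mi = None, last_mi
        for f in range(n):
            if f in selected:
                continue
            K = gaussian_kernel(X[:, selected + [f]], sigma)
            mi = kernelized_mi(K, D)
            if mi > best_mi:
                best_f, best_mi = f, mi
        if best_f is None or best_mi - last_mi < eps:
            break                              # no useful increment left
        selected.append(best_f)
        last_mi = best_mi
    return selected
```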

In order to show the effectiveness of the algorithm, we give the scatter plots in 2D spaces, as shown in Figures 14 to 16, which are spanned by the feature pairs selected by fuzzy dependency, kernelized mutual information, and Shannon mutual information. As to fuzzy dependency, we select Features 5, 37, 2, and 3. Then there are 4 × 4 = 16 combinations of feature pairs. The subplot in the $i$th row and $j$th column of Figure 14 gives the scatter of samples in the 2D space spanned by the $i$th selected feature and the $j$th selected feature.

Observing the second subplot in the first row of Figure 14, we can find that the classification task is nonlinear. The first class is dispersed, and the third class is also located in different regions, which makes learning classification models difficult.

However, in the corresponding subplot of Figure 15, we can see that each class is relatively compact, which leads to a small intraclass distance. Moreover, the samples in the five classes can be separated with some linear models, which also brings benefit for learning a simple classification model.

Comparing Figures 15 and 16, we can find that different classes overlap or get entangled in the feature spaces selected by Shannon mutual information, which leads to bad classification performance.

5. Diagnosis Modeling with Information Entropy Based Decision Tree Algorithm

After selecting the informative features, we now turn to classification modeling. There are a great number of learning algorithms for building a classification model. Generalization capability and interpretability are the two most important criteria in evaluating an algorithm. As to fault diagnosis, a domain expert usually accepts a model that is consistent with his common knowledge.


Figure 15: Scatter plots in 2D spaces spanned by the feature pairs selected by kernelized mutual information.

Thus he expects the model to be understandable; otherwise, he will not believe the outputs of the model. In addition, if the model is understandable, a domain expert can adapt it according to his prior knowledge, which makes the model suitable for different diagnosis objects.

Decision tree algorithms, including CART [29], ID3 [17], and C4.5 [18], are such techniques for training an understandable classification model. The learned model can be transformed into a set of rules. All these algorithms build a decision tree from training samples. They start from a root node and select one of the features to divide the samples, with cuts, into different branches according to their feature values. This procedure is conducted recursively until the branch is pure or a stopping criterion is satisfied. The key difference lies in the evaluation function for selecting attributes or cuts. In CART, the splitting rules Gini and Twoing are adopted, while ID3 uses information gain and C4.5 takes the information gain ratio. Moreover, C4.5 can deal with numerical attributes, compared with ID3. Competitive performance is usually observed with C4.5 in real-world applications compared with some popular algorithms, including SVM and Bayesian networks.

In this work, we introduce C4.5 to train classification models. The pseudocode of C4.5 is formulated as follows.

Decision tree algorithm C4.5.
Input: a set of training samples $U = \{x_1, x_2, \ldots, x_m\}$ with features $F = \{f_1, f_2, \ldots, f_n\}$; stopping criterion $\tau$.
Output: decision tree $T$.
(1) Check the sample set for the base cases.
(2) For each attribute $f$, compute the normalized information gain ratio from splitting on $f$.
(3) Let $f_{\mathrm{best}}$ be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on $f_{\mathrm{best}}$.
(5) Recurse on the sublists obtained by splitting on $f_{\mathrm{best}}$, and add those nodes as children of the node, until the stopping criterion $\tau$ is satisfied.
(6) Output $T$.
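As an illustration of step (2), a minimal sketch of the normalized information gain ratio for a binary cut on a numerical attribute could look as follows; the function names are ours, and this is not the original implementation.

```python
import numpy as np
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(values, labels, cut):
    """Information gain ratio of splitting a numerical attribute at `cut`."""
    values, labels = np.asarray(values), np.asarray(labels)
    left, right = labels[values <= cut], labels[values > cut]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    m = float(len(labels))
    gain = (shannon_entropy(labels)
            - (len(left) / m) * shannon_entropy(left)
            - (len(right) / m) * shannon_entropy(right))
    p = np.array([len(left) / m, len(right) / m])
    split_info = -np.sum(p * np.log2(p))       # normalizing term of C4.5
    return gain / split_info if split_info > 0 else 0.0
```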


Figure 16: Scatter plots in 2D spaces spanned by the feature pairs selected by Shannon mutual information.

Figure 17: Decision tree trained on the features selected with fuzzy rough sets.

We input the datasets into C4.5 and build the following two decision trees: Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are included in the second dataset. The two trees are given in Figures 17 and 18, respectively.

Figure 18: Decision tree trained on the features selected with kernelized mutual information.

We go from the root node to a leaf node along each branch, and a rule is extracted from the tree. As to the first tree, we can get five decision rules:

(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.

As to the second decision tree, we can also obtain five rules:

(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.

We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model is as high as 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the model may become more and more complex.
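For illustration, the five rules of the first tree can be applied directly as a small function; the thresholds are those read from Figure 17, and the function name is ours.

```python
def classify_exhaust_state(F2, F37, F3):
    """Apply the five rules extracted from the decision tree in Figure 17."""
    if F2 > 0.50:
        return 4 if F37 > 0.49 else 1      # rules (1) and (2)
    if F2 > 0.18:                          # 0.18 < F2 <= 0.50
        return 5 if F3 > 0.41 else 3       # rules (3) and (4)
    return 2                               # rule (5): F2 <= 0.18
```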

6. Conclusions and Future Works

Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are self-monitoring without man on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used in measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are twofold. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments also show its effectiveness. Second, we extend Shannon entropy for evaluating the significance of attributes in kernelized feature spaces; we compute the relevance between a kernel matrix induced with a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are derived.

Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.

Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is partially supported by the National Natural Foundation under Grants 61222210 and 61105054.

References

[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection - theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.




Page 7: Research Article Fault Detection and Diagnosis for Gas ...downloads.hindawi.com/journals/tswj/2014/617162.pdf · Research Article Fault Detection and Diagnosis for Gas Turbines Based

The Scientific World Journal 7

30

210

60

15

10

5

240

90

270

120

300

150

330

180 0

(a) Rose map of exhaust temperatures at 119905 = 500

30

210

800

600

400

200

60

240

90

270

120

300

150

330

180 0

(b) Rose map of exhaust temperatures at 119905 = 758

30

210

250

200

150

100

50

60

240

90

270

120

300

150

330

180 0

(c) Rose map of exhaust temperatures at 119905 = 820

30

210

60

240

90

270

120

300

150

330

180 0

400

300

200

100

(d) Rose map of exhaust temperatures at 119905 =1220

Figure 9 Samples of temperature distribution in different moments

distinguished with the current features while 119864 (119870) = log2119898

means any pair of samples is different from each other Sothey can be distinguished These are two extreme cases Inreal-world applications part of samples can be discernedwiththe available features while others are not In this case theentropy function takes value in the interval [0 log

2119898]

Moreover it is easy to show that if 1198701sube 1198702 119864(1198701) ge

119864(1198702) where119870

1sube 1198702means119870

1(119909119894 119909119895) le 1198702(119909119894 119909119895) forall119894 119895

Definition 2 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

1198651 1198652sube 119865 119870

1and 119870

2are two kernel matrices induced by 119865

1

and 1198652119870 is a new function computed with 119865

1cup 1198652 Then the

joint entropy of 1198651and 119865

2is defined as

119864 (1198701 1198702) = 119864 (119870) = minus

1

119898

119898

sum

119894=1

log2

119870119894

119898

(7)

where119870119894= sum119898

119895=1119896119894119895

As to the Gaussian function 119870(119909119894 119909119895) = 119870

1(119909119894 119909119895) times

1198702(119909119894 119909119895) Thus 119870 sube 119870

1and 119870 sube 119870

2 In this case 119864(119870) ge

119864(1198701) and 119864(119870) ge 119864(119870

2)

Definition 3 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912

119891119899One has 119865

1 1198652sube 119865 119870

1and 119870

2are two kernel matrices

induced by 1198651and 119865

2 119870 is a new kernel function computed

with 1198651cup 1198652 Knowning 119865

1 the condition entropy of 119865

2is

defined as

119864 (1198701| 1198702) = 119864 (119870) minus 119864 (119870

1) (8)

As to the Gaussian kernel 119864(119870) ge 119864(1198701) and 119864(119870) ge 119864(119870

2)

so 119864(1198701| 1198702) ge 0 and 119864(119870

2| 1198701) ge 0

Definition 4 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

One has 1198651 1198652

sube 119865 1198701and 119870

2are two kernel matrices

induced by 1198651and 119865

2 119870 is a new kernel function computed

with 1198651cup 1198652 Then the mutual information of 119870

1and 119870

2is

defined as

MI (1198701 1198702) = 119864 (119870

1) + 119864 (119870

2) minus 119864 (119870) (9)

As to Gaussian kernel MI(1198701 1198702) = MI(119870

2 1198701) If 119870

1sube

1198702 we have MI(119870

1 1198702) = 119864(119870

2) and if 119870

2sube 1198701 we have

MI(1198701 1198702) = 119864(119870

1)

8 The Scientific World Journal

0 5 10 15 20 25 30 35 40 450

005

01

015

02

025

03

035

04

Feature index

Dep

ende

ncy

Figure 10 Fuzzy dependency between a single feature and decision

0 5 10 15 20 25 30 35 40 45

Feature index

Dep

ende

ncy

0

02

04

06

08

1

12

14

Figure 11 Kernelized mutual information between a single featureand decision

Please note that if 1198651sube 1198652 we have 119870

2sube 1198701 However

1198702sube 1198701does not mean 119865

1sube 1198652

Definition 5 Given a set of samples 119880 = 1199091 1199092 119909

119898

each sample is described with 119899 features 119865 = 1198911 1198912 119891

119899

1198651015840sube 119865119870 is a kernelmatrix over119880 in terms of1198651015840 and119863 is the

kernel matrix computed with the decision Then the featuresignificance 1198651015840 related to the decision is defined as

MI (119870119863) = 119864 (119870) + 119864 (119863) minus 119864 (119870119863) (10)

MI (119870119863) measures the importance of feature subset 1198651015840in the kernel space to distinguish different classes It can beunderstood as a kernelized version of Shannon informationentropy which is widely used feature evaluation selectionIn fact it is easy to derive the equivalence between thisentropy function and Shannon entropy in the condition thatthe attributes are discrete and the matching kernel is used

Now we show an example in gas turbine fault diagnosisWe collect 3581 samples from two sets of gas turbine systems1440 samples are healthy and the others belong to four kindsof faults load rejection sensor fault fuel switching and saltspray corrosion The numbers of samples are 45 588 71 and1437 respectively Thirteen thermometers are installed in theexhaust According to the approach described above we forma 40-dimensional vector to represent the state of the exhaust

1 2 3 4

0

02

04

06

08

1

Selected feature number

Fuzz

y de

pend

ency

03475

09006

0999609969

Figure 12 Fuzzy dependency between the selected features anddecisions (Features 5 37 2 and 3 are selected sequentially)

0

02

04

06

08

1

12

14

16

18

Kern

eliz

ed m

utua

l inf

orm

atio

n

1267

161215771510

1 2 3 4

Number of selected features

Figure 13 Kernelized mutual information between the selectedfeatures and decisions (Features 39 31 38 and 40 are selectedsequentially)

Obviously the classification task is not understandable insuch high dimensional space Moreover some features maybe redundant for classification learning which may confusethe learning algorithm and reduce modeling performanceSo it is a key preprocessing step to select the necessary andsufficient subsets

Here we compare the fuzzy rough set based featureevaluation algorithm with the proposed kernelized mutualinformation Fuzzy dependency has been widely discussedand applied in feature selection and attribute reduction theseyears [26ndash28] Fuzzy dependency can be understood as theaverage distance from the samples and their nearest neighborbelonging to different classes while the kernelized mutualinformation reflects the relevance between features anddecision in the kernel space

Comparing Figures 10 and 11 significant difference isobtained As to fuzzy rough sets Feature 5 produces thelargest dependency and then Feature 38 However Feature39 gets the largest mutual information and Feature 2 is thesecond one Thus different feature evaluation functions willlead to completely different results

Figures 10 and 11 present the significance of singlefeatures In applications we should combine a set of featuresNow we consider a greedy search strategy Starting from anempty set and the best features are added one by one In

The Scientific World Journal 9

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 14 Scatter plots in 2D space expended with feature pairs selected by fuzzy dependency

each round we select a feature which produces the largestsignificance increment with the selected subset Both fuzzydependency and kernelized mutual information increasemonotonically if new attributes are added If the selectedfeatures are sufficient for classification these two functionswill keep invariant by adding any new attributes So we canstop the algorithm if the increment of significance is less thana given threshold The significances of the selected featuresubset are shown in Figures 12 and 13 respectively

In order to show the effectiveness of the algorithm wegive the scatter plots in 2D spaces as shown in Figures 14 to16 which are expended by the feature pairs selected by fuzzydependency kernelized mutual information and Shannonmutual information As to fuzzy dependency we selectFeatures 5 37 2 and 3Then there are 4times4 = 16 combinationsof feature pairs The subplot in the 119894th row and 119895th column inFigure 14 gives the scatters of samples in 2D space expandedby the 119894th selected feature and the 119895th selected feature

Observing the 2nd subplots in the first row of Figure 14we can find that the classification task is nonlinear The firstclass is dispersed and the third class is also located at different

regions which leads to the difficulty in learning classificationmodels

However in the corresponding subplot of Figure 15 wecan see that each class is relatively compact which leads to asmall intraclass distanceMoreover the samples in five classescan be classified with some linear models which also bringbenefit for learning a simple classification model

Comparing Figures 15 and 16 we can find that differentclasses are overlapped in feature spaces selected by Shannonmutual information or get entangled which leads to the badclassification performance

5 Diagnosis Modeling with InformationEntropy Based Decision Tree Algorithm

After selecting the informative features we now go to clas-sification modeling There are a great number of learningalgorithms for building a classificationmodel Generalizationcapability and interpretability are the two most importantcriteria in evaluating an algorithm As to fault diagnosis adomain expert usually accepts a model which is consistent

10 The Scientific World Journal

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 15 Scatter in 2D space expended with feature pairs selected by kernelized mutual information

with his common knowledge Thus he expects the modelis understandable otherwise he will not believe the outputsof the model In addition if the model is understandable adomain expert can adapt it according to his prior knowledgewhich makes the model suitable for different diagnosisobjects

Decision tree algorithms including CART [29] ID3[17] and C45 [18] are such techniques for training anunderstandable classification model The learned model canbe transformed into a set of rules All these algorithms builda decision tree from training samples They start from a rootnode and select one of the features to divide the samples withcuts into different branches according to their feature valuesThis procedure is interactively conducted until the branch ispure or a stopping criterion is satisfiedThe key difference liesin the evaluation function in selecting attributes or cuts InCART splitting rules GINI and Twoing are adopted whileID3 uses information gain and C45 takes information gainratioMoreover C45 can deal with numerical attributes com-pared with ID3 Competent performance is usually observedwith C45 in real-world applications compared with somepopular algorithms including SVM and Baysian net In this

work we introduce C45 to train classification models Thepseudocode of C45 is formulated as follows

Decision tree algorithm C4.5.
Input: a set of training samples U = {x1, x2, . . . , xm} with features F = {f1, f2, . . . , fn}; a stopping criterion τ.
Output: a decision tree T.
(1) Check the sample set against the base cases.
(2) For each attribute f, compute the normalized information gain ratio from splitting on f.
(3) Let f_best be the attribute with the highest normalized information gain ratio.
(4) Create a decision node that splits on f_best.
(5) Recurse on the sublists obtained by splitting on f_best and add those nodes as children of the node, until the stopping criterion τ is satisfied.
(6) Output T.
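In practice, an off-the-shelf decision tree learner can play the role of this pseudocode. The sketch below uses scikit-learn's DecisionTreeClassifier, which implements an optimized CART rather than C4.5 proper, but with the entropy criterion and a small depth it yields comparably compact, readable trees; the data and the parameter values here are stand-ins, not those of the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data: in practice X_sel would hold the selected exhaust features and
# y the fault classes 1..5; random values merely make the sketch runnable.
rng = np.random.default_rng(0)
X_sel = rng.random((500, 4))
y = rng.integers(1, 6, size=500)

clf = DecisionTreeClassifier(criterion="entropy",  # information-gain style splits
                             max_depth=3,          # keep the tree small and readable
                             min_samples_leaf=20)
clf.fit(X_sel, y)

# Print the tree as nested threshold rules, analogous to Figures 17 and 18.
print(export_text(clf, feature_names=["F5", "F37", "F2", "F3"]))
```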

Figure 16: Scatter plots in the 2D spaces spanned by the feature pairs selected by Shannon mutual information (axes show normalized feature values in [0, 1]; markers denote Classes 1–5).

Figure 17: Decision tree trained on the features selected with fuzzy rough sets (root split on F2, further splits on F37 and F3; leaves are Classes 1–5).

We input the two data sets into C4.5 and build the following two decision trees: Features 5, 37, 2, and 3 are included in the first dataset, and Features 39, 31, 38, and 40 are selected in the second dataset. The two trees are given in Figures 17 and 18, respectively.

Figure 18: Decision tree trained on the features selected with kernelized mutual information (root split on F39, further splits on F38 and F40; leaves are Classes 1–5).

Starting from the root node and following a branch down to a leaf node, a rule is extracted from the tree. As to the first tree, we can get five decision rules:

(1) if F2 > 0.50 and F37 > 0.49, then the decision is Class 4;
(2) if F2 > 0.50 and F37 ≤ 0.49, then the decision is Class 1;
(3) if 0.18 < F2 ≤ 0.50 and F3 > 0.41, then the decision is Class 5;
(4) if 0.18 < F2 ≤ 0.50 and F3 ≤ 0.41, then the decision is Class 3;
(5) if F2 ≤ 0.18, then the decision is Class 2.

As to the second decision tree, we can also obtain five rules:
(1) if F39 > 0.45 and F38 > 0.80, then the decision is Class 4;
(2) if F39 > 0.45 and F38 ≤ 0.80, then the decision is Class 1;
(3) if 0.17 < F39 ≤ 0.45, then the decision is Class 2;
(4) if F39 ≤ 0.17 and F40 > 0.42, then the decision is Class 5;
(5) if F39 ≤ 0.17 and F40 ≤ 0.42, then the decision is Class 3.
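Rules of this form translate directly into executable code. The following function is a hypothetical encoding of the five rules read off the second tree, assuming F38, F39, and F40 are the normalized feature values of a sample.

```python
def classify_by_rules(F38, F39, F40):
    """Apply the five rules extracted from the tree in Figure 18."""
    if F39 > 0.45:
        return 4 if F38 > 0.80 else 1   # rules (1) and (2)
    if F39 > 0.17:                      # i.e. 0.17 < F39 <= 0.45
        return 2                        # rule (3)
    return 5 if F40 > 0.42 else 3       # rules (4) and (5)
```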

We can see that the derived decision trees are rather simple, and five rules can be extracted from each. It is very easy for domain experts to understand the rules and even revise them. As the classification task is relatively simple, the accuracy of each model is as high as 97%. As new samples and faults are recorded by the system, more and more complex tasks may be stored; in that case, the model may become more and more complex.

6. Conclusions and Future Works

Automatic fault detection and diagnosis are highly desirable in some industries, such as offshore oil well drilling platforms, for such systems are self-monitoring without man on duty. In this work, we design an intelligent abnormality detection and fault recognition technique for the exhaust system of gas turbines based on information entropy, which is used in measuring the uniformity of exhaust temperatures, evaluating the significance of features in kernel spaces, and selecting splitting nodes for constructing decision trees. The main contributions of the work are two parts. First, we introduce the entropy function to measure the uniformity of exhaust temperatures; the measurement is easy to compute and understand, and numerical experiments show its effectiveness. Second, we extend Shannon entropy for evaluating the significance of attributes in kernelized feature spaces: we compute the relevance between a kernel matrix induced with a set of attributes and the matrix computed with the decision variable. Some numerical experiments are also presented, and good results are derived.

Although this work gives an effective framework for automatic fault detection and recognition, the proposed technique has not been tested on large-scale real tasks. We have developed a remote state monitoring and fault diagnosis system, and large-scale data are flooding into the center. In the future, we will improve these techniques and develop a reliable diagnosis system.

Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is partially supported by the National Natural Science Foundation under Grants 61222210 and 61105054.

References

[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.
[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.
[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.
[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.
[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.
[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.
[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.
[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.
[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.
[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.
[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.
[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.
[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.
[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection - theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.
[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.



Page 9: Research Article Fault Detection and Diagnosis for Gas ...downloads.hindawi.com/journals/tswj/2014/617162.pdf · Research Article Fault Detection and Diagnosis for Gas Turbines Based

The Scientific World Journal 9

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 14 Scatter plots in 2D space expended with feature pairs selected by fuzzy dependency

each round we select a feature which produces the largestsignificance increment with the selected subset Both fuzzydependency and kernelized mutual information increasemonotonically if new attributes are added If the selectedfeatures are sufficient for classification these two functionswill keep invariant by adding any new attributes So we canstop the algorithm if the increment of significance is less thana given threshold The significances of the selected featuresubset are shown in Figures 12 and 13 respectively

In order to show the effectiveness of the algorithm wegive the scatter plots in 2D spaces as shown in Figures 14 to16 which are expended by the feature pairs selected by fuzzydependency kernelized mutual information and Shannonmutual information As to fuzzy dependency we selectFeatures 5 37 2 and 3Then there are 4times4 = 16 combinationsof feature pairs The subplot in the 119894th row and 119895th column inFigure 14 gives the scatters of samples in 2D space expandedby the 119894th selected feature and the 119895th selected feature

Observing the 2nd subplots in the first row of Figure 14we can find that the classification task is nonlinear The firstclass is dispersed and the third class is also located at different

regions which leads to the difficulty in learning classificationmodels

However in the corresponding subplot of Figure 15 wecan see that each class is relatively compact which leads to asmall intraclass distanceMoreover the samples in five classescan be classified with some linear models which also bringbenefit for learning a simple classification model

Comparing Figures 15 and 16 we can find that differentclasses are overlapped in feature spaces selected by Shannonmutual information or get entangled which leads to the badclassification performance

5 Diagnosis Modeling with InformationEntropy Based Decision Tree Algorithm

After selecting the informative features we now go to clas-sification modeling There are a great number of learningalgorithms for building a classificationmodel Generalizationcapability and interpretability are the two most importantcriteria in evaluating an algorithm As to fault diagnosis adomain expert usually accepts a model which is consistent

10 The Scientific World Journal

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 15 Scatter in 2D space expended with feature pairs selected by kernelized mutual information

with his common knowledge Thus he expects the modelis understandable otherwise he will not believe the outputsof the model In addition if the model is understandable adomain expert can adapt it according to his prior knowledgewhich makes the model suitable for different diagnosisobjects

Decision tree algorithms including CART [29] ID3[17] and C45 [18] are such techniques for training anunderstandable classification model The learned model canbe transformed into a set of rules All these algorithms builda decision tree from training samples They start from a rootnode and select one of the features to divide the samples withcuts into different branches according to their feature valuesThis procedure is interactively conducted until the branch ispure or a stopping criterion is satisfiedThe key difference liesin the evaluation function in selecting attributes or cuts InCART splitting rules GINI and Twoing are adopted whileID3 uses information gain and C45 takes information gainratioMoreover C45 can deal with numerical attributes com-pared with ID3 Competent performance is usually observedwith C45 in real-world applications compared with somepopular algorithms including SVM and Baysian net In this

work we introduce C45 to train classification models Thepseudocode of C45 is formulated as follows

Decision tree algorithm C45Input a set of training samples 119880 = 119909

1 1199092 119909

119898

with features 119865 = 1198911 1198912 119891

119899

Stopping criterion 120591Output decision tree 119879

(1) Check for sample set(2) For each attribute 119891 compute the normalized infor-

mation gain ratio from splitting on 119886(3) Let f best be the attribute with the highest normalized

information gain(4) Create a decision node that splits on f best(5) Recurse on the sublists obtained by splitting on

f best and add those nodes as children of node untilstopping criterion 120591 is satisfied

(6) Output 119879

The Scientific World Journal 11

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 16 Scatter in 2D space expended with feature pairs selected by Shannon mutual information

F2

le050gt050

F37 F2

le041

Class 2

gt018

Class 5

le049

Class 1

gt049

Class 4 F3

Class 3

gt041

le018

Figure 17 Decision tree trained on the features selected with fuzzyrough sets

We input the data sets into C45 and build the followingtwo decision trees Features 5 37 2 and 3 are included in thefirst dataset and Features 39 31 38 and 40 are selected in thesecond dataset The two trees are given in Figures 17 and 18respectively

F39le017gt017

F39 F40

le042

Class 3

gt042

Class 5

le045

Class 2

gt045

F38le080gt080

Class 1Class 4

Figure 18 Decision tree trained on the features selected withkernelized mutual information

We start from the root node to a leaf node along thebranch and then a piece of rule is extracted from the treeAs to the first tree we can get five decision rules

(1) if F2 gt 050 and F37 gt 049 then the decision is Class4

12 The Scientific World Journal

(2) if F2 gt 050 and F37 le 049 then the decision is Class1

(3) if 018 lt F2 le 050 and F3 gt 041 then the decision isClass 5

(4) if 018 lt F2 le 050 and F3 le 041 then the decision isClass 3

(5) if F2 le 018 then the decision is Class 2

As to the second decision tree we can also obtain somerules as

(1) if F39 gt 045 and F38 gt 080 then the decision is Class4

(2) if F39 gt 045 and F38 le 080 then the decision is Class1

(3) if 017 lt F39 le 045 then the decision is Class 2(4) if F39 le 017 and F40 gt 042 then the decision is Class

5(5) if F39 le 017 and F40 le 042 then the decision is Class

3

We can see the derived decision trees are rather simpleand each can extract five pieces of rules It is very easy fordomain experts to understand the rules and even revise therules As the classification task is a little simple the accuracyof each model is high to 97 As new samples and faults arerecorded by the systemmore andmore complex tasksmay bestored In that case the model may become more and morecomplex

6 Conclusions and Future Works

Automatic fault detection and diagnosis are highly desirablein some industries such as offshore oil well drilling plat-forms for such systems are self-monitoring without manon duty In this work we design an intelligent abnormalitydetection and fault recognition technique for the exhaustsystem of gas turbines based on information entropy whichis used in measuring the uniformity of exhaust temperaturesevaluating the significance of features in kernel spaces andselecting splitting nodes for constructing decision trees Themain contributions of the work are two parts First weintroduce the entropy function to measure the uniformity ofexhaust temperatures The measurement is easy to computeand understand Numerical experiments also show its effec-tiveness Second we extend Shannon entropy for evaluatingthe significance of attributes in kernelized feature spaces Wecompute the relevance between a kernel matrix induced witha set of attributes and the matrix computed with the decisionvariable Some numerical experiments are also presentedGood results are derived

Although this work gives an effective framework forautomatic fault detection and recognition the proposedtechnique is not tested on large-scale real tasks We havedeveloped a remote state monitoring and fault diagnosis sys-tem Large scale data are flooding into the center In thefuture we will improve these techniques and develop areliable diagnosis system

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is partially supported by National Natural Founda-tion under Grants 61222210 and 61105054

References

[1] C Wang L Xu and W Peng ldquoConceptual design of remotemonitoring and fault diagnosis systemsrdquo Information Systemsvol 32 no 7 pp 996ndash1004 2007

[2] W Donat K ChoiW An S Singh and K Pattipati ldquoData visu-alization data reduction and classifier fusion for intelligent faultdiagnosis in gas turbine enginesrdquo Journal of Engineering for GasTurbines and Power vol 130 no 4 Article ID 041602 2008

[3] Y G Li and P Nilkitsaranont ldquoGas turbine performance prog-nostic for condition-based maintenancerdquo Applied Energy vol86 no 10 pp 2152ndash2161 2009

[4] H Bassily R Lund and JWagner ldquoFault detection inmultivari-ate signals with applications to gas turbinesrdquo IEEE Transactionson Signal Processing vol 57 no 3 pp 835ndash842 2009

[5] K Young D Lee V Vitali and Y Ming ldquoA fault diagnosismethod for industrial gas turbines using bayesian data analysisrdquoJournal of Engineering for Gas Turbines and Power vol 132Article ID 041602 2010

[6] B Yu D Liu and T Zhang ldquoFault diagnosis for micro-gasturbine engine sensors via wavelet entropyrdquo Sensors vol 11 no10 pp 9928ndash9941 2011

[7] S Wu P Wu C Wu J Ding and C Wang ldquoBearing faultdiagnosis based onmultiscale permutation entropy and supportvector machinerdquo Entropy vol 14 no 8 pp 1343ndash1356 2012

[8] S Wu C Wu T Wu and C Wang ldquoMulti-scale analysis basedball bearing defect diagnostics using Mahalanobis distance andsupport vector machinerdquo Entropy vol 15 no 2 pp 416ndash4332013

[9] H A Nozari M A Shoorehdeli S Simani andH D BanadakildquoModel-based robust fault detection and isolation of an indus-trial gas turbine prototype using soft computing techniquesrdquoNeurocomputing vol 91 pp 29ndash47 2012

[10] S Sarkar X Jin and A Ray ldquoData-driven fault detection inaircraft engines with noisy sensor measurementsrdquo Journal ofEngineering for Gas Turbines and Power vol 133 no 8 ArticleID 081602 10 pages 2011

[11] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 pp 379ndash423 1948

[12] S M Pincus ldquoApproximate entropy as a measure of systemcomplexityrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 88 no 6 pp 2297ndash2301 1991

[13] L I Kuncheva and C J Whitaker ldquoMeasures of diversity inclassifier ensembles and their relationship with the ensembleaccuracyrdquoMachine Learning vol 51 no 2 pp 181ndash207 2003

[14] I Csiszar ldquoAxiomatic characterizations of information mea-suresrdquo Entropy vol 10 no 3 pp 261ndash273 2008

[15] M Zanin L Zunino O A Rosso and D Papo ldquoPermutationentropy and itsmain biomedical and econophysics applicationsa reviewrdquo Entropy vol 14 no 8 pp 1553ndash1577 2012

The Scientific World Journal 13

[16] C Wang and H Shen ldquoInformation theory in scientific visual-izationrdquo Entropy vol 13 pp 254ndash273 2011

[17] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986

[18] J QuinlanC45 Programs forMachine Learning Morgan Kauf-mann 1993

[19] D N Reshef Y A Reshef H K Finucane et al ldquoDetectingnovel associations in large data setsrdquo Science vol 334 no 6062pp 1518ndash1524 2011

[20] C J C Burges ldquoA tutorial on support vector machines forpattern recognitionrdquo Data Mining and Knowledge Discoveryvol 2 no 2 pp 121ndash167 1998

[21] QHu D YuW Pedrycz andD Chen ldquoKernelized fuzzy roughsets and their applicationsrdquo IEEE Transactions on Knowledgeand Data Engineering vol 23 no 11 pp 1649ndash1667 2011

[22] Q Hu H Zhao Z Xie and D Yu ldquoConsistency based attributereductionrdquo in Advances in Knowledge Discovery and DataMining Z-H Zhou H Li and Q Yang Eds vol 4426 ofLecture Notes in Computer Science pp 96ndash107 Springer BerlinGermany 2007

[23] Q Hu D Yu and Z Xie ldquoInformation-preserving hybrid datareduction based on fuzzy-rough techniquesrdquo Pattern Recogni-tion Letters vol 27 no 5 pp 414ndash423 2006

[24] R Gilad-Bachrach A Navot and N Tishby ldquoMargin basedfeature selectionmdashtheory and algorithmsrdquo in Proceedings of the21th International Conference on Machine Learning (ICML rsquo04)pp 337ndash344 July 2004

[25] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press 2004

[26] Q Hu Z Xie andD Yu ldquoHybrid attribute reduction based on anovel fuzzy-roughmodel and information granulationrdquo PatternRecognition vol 40 no 12 pp 3509ndash3521 2007

[27] R Jensen and Q Shen ldquoNew approaches to fuzzy-rough featureselectionrdquo IEEE Transactions on Fuzzy Systems vol 17 no 4 pp824ndash838 2009

[28] S Zhao E C C Tsang and D Chen ldquoThe model of fuzzyvariable precision rough setsrdquo IEEE Transactions on FuzzySystems vol 17 no 2 pp 451ndash467 2009

[29] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees Wadsworth amp BrooksColeAdvanced Books amp Software Monterey Calif USA 1984

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article Fault Detection and Diagnosis for Gas ...downloads.hindawi.com/journals/tswj/2014/617162.pdf · Research Article Fault Detection and Diagnosis for Gas Turbines Based

10 The Scientific World Journal

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 15 Scatter in 2D space expended with feature pairs selected by kernelized mutual information

with his common knowledge Thus he expects the modelis understandable otherwise he will not believe the outputsof the model In addition if the model is understandable adomain expert can adapt it according to his prior knowledgewhich makes the model suitable for different diagnosisobjects

Decision tree algorithms including CART [29] ID3[17] and C45 [18] are such techniques for training anunderstandable classification model The learned model canbe transformed into a set of rules All these algorithms builda decision tree from training samples They start from a rootnode and select one of the features to divide the samples withcuts into different branches according to their feature valuesThis procedure is interactively conducted until the branch ispure or a stopping criterion is satisfiedThe key difference liesin the evaluation function in selecting attributes or cuts InCART splitting rules GINI and Twoing are adopted whileID3 uses information gain and C45 takes information gainratioMoreover C45 can deal with numerical attributes com-pared with ID3 Competent performance is usually observedwith C45 in real-world applications compared with somepopular algorithms including SVM and Baysian net In this

work we introduce C45 to train classification models Thepseudocode of C45 is formulated as follows

Decision tree algorithm C45Input a set of training samples 119880 = 119909

1 1199092 119909

119898

with features 119865 = 1198911 1198912 119891

119899

Stopping criterion 120591Output decision tree 119879

(1) Check for sample set(2) For each attribute 119891 compute the normalized infor-

mation gain ratio from splitting on 119886(3) Let f best be the attribute with the highest normalized

information gain(4) Create a decision node that splits on f best(5) Recurse on the sublists obtained by splitting on

f best and add those nodes as children of node untilstopping criterion 120591 is satisfied

(6) Output 119879

The Scientific World Journal 11

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

0 05 1

0

05

1

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

1

2

3

4

5

Figure 16 Scatter in 2D space expended with feature pairs selected by Shannon mutual information

F2

le050gt050

F37 F2

le041

Class 2

gt018

Class 5

le049

Class 1

gt049

Class 4 F3

Class 3

gt041

le018

Figure 17 Decision tree trained on the features selected with fuzzyrough sets

We input the data sets into C45 and build the followingtwo decision trees Features 5 37 2 and 3 are included in thefirst dataset and Features 39 31 38 and 40 are selected in thesecond dataset The two trees are given in Figures 17 and 18respectively

F39le017gt017

F39 F40

le042

Class 3

gt042

Class 5

le045

Class 2

gt045

F38le080gt080

Class 1Class 4

Figure 18 Decision tree trained on the features selected withkernelized mutual information

We start from the root node to a leaf node along thebranch and then a piece of rule is extracted from the treeAs to the first tree we can get five decision rules

(1) if F2 gt 050 and F37 gt 049 then the decision is Class4

12 The Scientific World Journal

(2) if F2 gt 050 and F37 le 049 then the decision is Class1

(3) if 018 lt F2 le 050 and F3 gt 041 then the decision isClass 5

(4) if 018 lt F2 le 050 and F3 le 041 then the decision isClass 3

(5) if F2 le 018 then the decision is Class 2

As to the second decision tree we can also obtain somerules as

(1) if F39 gt 045 and F38 gt 080 then the decision is Class4

(2) if F39 gt 045 and F38 le 080 then the decision is Class1

(3) if 017 lt F39 le 045 then the decision is Class 2(4) if F39 le 017 and F40 gt 042 then the decision is Class

5(5) if F39 le 017 and F40 le 042 then the decision is Class

3

We can see the derived decision trees are rather simpleand each can extract five pieces of rules It is very easy fordomain experts to understand the rules and even revise therules As the classification task is a little simple the accuracyof each model is high to 97 As new samples and faults arerecorded by the systemmore andmore complex tasksmay bestored In that case the model may become more and morecomplex

6 Conclusions and Future Works

Automatic fault detection and diagnosis are highly desirablein some industries such as offshore oil well drilling plat-forms for such systems are self-monitoring without manon duty In this work we design an intelligent abnormalitydetection and fault recognition technique for the exhaustsystem of gas turbines based on information entropy whichis used in measuring the uniformity of exhaust temperaturesevaluating the significance of features in kernel spaces andselecting splitting nodes for constructing decision trees Themain contributions of the work are two parts First weintroduce the entropy function to measure the uniformity ofexhaust temperatures The measurement is easy to computeand understand Numerical experiments also show its effec-tiveness Second we extend Shannon entropy for evaluatingthe significance of attributes in kernelized feature spaces Wecompute the relevance between a kernel matrix induced witha set of attributes and the matrix computed with the decisionvariable Some numerical experiments are also presentedGood results are derived

Although this work gives an effective framework forautomatic fault detection and recognition the proposedtechnique is not tested on large-scale real tasks We havedeveloped a remote state monitoring and fault diagnosis sys-tem Large scale data are flooding into the center In thefuture we will improve these techniques and develop areliable diagnosis system

Conflict of Interests

The authors declare that they have no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is partially supported by National Natural Founda-tion under Grants 61222210 and 61105054

References

[1] C. Wang, L. Xu, and W. Peng, "Conceptual design of remote monitoring and fault diagnosis systems," Information Systems, vol. 32, no. 7, pp. 996–1004, 2007.

[2] W. Donat, K. Choi, W. An, S. Singh, and K. Pattipati, "Data visualization, data reduction and classifier fusion for intelligent fault diagnosis in gas turbine engines," Journal of Engineering for Gas Turbines and Power, vol. 130, no. 4, Article ID 041602, 2008.

[3] Y. G. Li and P. Nilkitsaranont, "Gas turbine performance prognostic for condition-based maintenance," Applied Energy, vol. 86, no. 10, pp. 2152–2161, 2009.

[4] H. Bassily, R. Lund, and J. Wagner, "Fault detection in multivariate signals with applications to gas turbines," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.

[5] K. Young, D. Lee, V. Vitali, and Y. Ming, "A fault diagnosis method for industrial gas turbines using Bayesian data analysis," Journal of Engineering for Gas Turbines and Power, vol. 132, Article ID 041602, 2010.

[6] B. Yu, D. Liu, and T. Zhang, "Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy," Sensors, vol. 11, no. 10, pp. 9928–9941, 2011.

[7] S. Wu, P. Wu, C. Wu, J. Ding, and C. Wang, "Bearing fault diagnosis based on multiscale permutation entropy and support vector machine," Entropy, vol. 14, no. 8, pp. 1343–1356, 2012.

[8] S. Wu, C. Wu, T. Wu, and C. Wang, "Multi-scale analysis based ball bearing defect diagnostics using Mahalanobis distance and support vector machine," Entropy, vol. 15, no. 2, pp. 416–433, 2013.

[9] H. A. Nozari, M. A. Shoorehdeli, S. Simani, and H. D. Banadaki, "Model-based robust fault detection and isolation of an industrial gas turbine prototype using soft computing techniques," Neurocomputing, vol. 91, pp. 29–47, 2012.

[10] S. Sarkar, X. Jin, and A. Ray, "Data-driven fault detection in aircraft engines with noisy sensor measurements," Journal of Engineering for Gas Turbines and Power, vol. 133, no. 8, Article ID 081602, 10 pages, 2011.

[11] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.

[12] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proceedings of the National Academy of Sciences of the United States of America, vol. 88, no. 6, pp. 2297–2301, 1991.

[13] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.

[14] I. Csiszar, "Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp. 261–273, 2008.

[15] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553–1577, 2012.

[16] C. Wang and H. Shen, "Information theory in scientific visualization," Entropy, vol. 13, pp. 254–273, 2011.

[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[18] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.

[19] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.

[20] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

[21] Q. Hu, D. Yu, W. Pedrycz, and D. Chen, "Kernelized fuzzy rough sets and their applications," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1649–1667, 2011.

[22] Q. Hu, H. Zhao, Z. Xie, and D. Yu, "Consistency based attribute reduction," in Advances in Knowledge Discovery and Data Mining, Z.-H. Zhou, H. Li, and Q. Yang, Eds., vol. 4426 of Lecture Notes in Computer Science, pp. 96–107, Springer, Berlin, Germany, 2007.

[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.

[24] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: theory and algorithms," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.

[25] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.

[26] Q. Hu, Z. Xie, and D. Yu, "Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation," Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.

[27] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.

[28] S. Zhao, E. C. C. Tsang, and D. Chen, "The model of fuzzy variable precision rough sets," IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 451–467, 2009.

[29] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.




