Reconstruction based fault prognosis for continuous processes

Control Engineering Practice 18 (2010) 1211–1219



journal homepage: www.elsevier.com/locate/conengprac


Gang Li a, S. Joe Qin b,*, Yindong Ji c, Donghua Zhou a,*

a Department of Automation, TNList, Tsinghua University, Beijing 100084, PR China

b The Mork Family Department of Chemical Engineering and Materials Science, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA

c RIIT, Tsinghua University, Beijing 100084, PR China

Article info

Article history:

Received 11 August 2009

Accepted 25 May 2010

Available online 20 June 2010

Keywords:

Multivariate fault prognosis

Principal component analysis

Reconstruction based estimation

Vector AR model

Wavelet based denoising

0967-0661/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.

doi:10.1016/j.conengprac.2010.05.012

A brief version of the paper was presented at IFAC Safeprocess Symposium, June 30–July 3, 2009, Barcelona.

* Corresponding authors. Tel.: +1 213 740 0317 (S.J. Qin); tel.: +86 010 62794461 (D. Zhou).

E-mail addresses: [email protected] (S.J. Qin), zdh@tsinghua.edu.cn (D. Zhou).

Abstract

In this paper, a multivariate fault prognosis approach for continuous processes with hidden faults is

proposed based on statistical process monitoring methods and multivariate time series prediction. It is

assumed that the fault is a slowly time-varying autocorrelated process and can be completely

reconstructed. Fault magnitude is estimated first via reconstruction, then predicted by a vector AR

model with wavelet based denoising. Given the fault direction, a new index is proposed to detect the

fault, which integrates fault detection and prognosis together. Case studies on a continuous stirred tank

reactor and the Tennessee Eastman process demonstrate the effectiveness of the proposed approaches.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Reliability, maintainability and safety of complex dynamic systems, such as the monitoring system of high speed trains, have attracted much attention in recent years (Labeau, Smidts, & Swaminathan, 2000; Saha, 2003). However, the traditional maintenance and repair strategy, the so-called ‘run-to-failure’ maintenance, is no longer adequate, as industrial systems are increasingly called on to produce at higher throughput and better quality (Jardine, Lin, & Banjevic, 2006; Kothamasu, Huang, & VerDuin, 2006). Meanwhile, the repair and inventory costs of some important equipment have become a major expense in complicated, integrated and interacting systems. To guarantee system safety and decrease maintenance cost, condition-based maintenance (CBM) was proposed (Jardine et al., 2006; Kothamasu et al., 2006). This strategy performs maintenance and repair based on an assessment of the equipment condition instead of its service time. If the degradation process of components can be predicted, a schedule for repairing and replacing components can be drawn up before components fail and systems break down. Therefore, fault prognosis, which determines whether a failure is impending and estimates how soon and how likely a failure will occur, will play an important role in condition-based maintenance and form a basis for further maintenance decisions (Jardine et al., 2006).

Li et al. (1999) and Li, Kurfess, and Liang (2000) introduced two methodologies of bearing fault prognosis based on a deterministic and a stochastic defect propagation model with vibration signal analysis. The defect propagation is represented by a mechanistic model with time-varying parameters, which offers the best bearing state prediction in the least squares sense. Oppenheimer and Loparo (2002) used a physical relationship between fault severity, machine signatures and remaining life, based on a crack growth law, to estimate the remaining useful life (RUL). Qiu, Seth, Liang, and Zhang (2002) related the natural frequency and the acceleration amplitude of a bearing system to its running time and failure lifetime by using system stiffness as a bridge. Thus, the RUL can be predicted online based on vibration measurements. Zhang and Ganesan (1997) applied a multivariate trending analysis based on self-organizing neural networks for fault prognosis. Wang, Golnaraghi, and Ismail (2004) compared the performance of recurrent neural networks and neural-fuzzy inference systems in predicting the fault damage propagation trend.

In the aforementioned research, it is assumed that the fault process can be observed directly. However, in many cases it is difficult or costly to observe fault or degradation processes directly. Wang and Vachtsevanos (2002) used wavelet neural networks as a virtual sensor to produce the fault evaluation information of a bearing crack fault and then performed fault prognosis with a dynamic wavelet neural network (DWNN) predictor. Chelidze and Cusumano (2004) considered a hierarchical dynamical system consisting of a directly observable subsystem coupled to a slowly time-varying hidden fault process. Both structures were known beforehand and used to build a new observer for the hidden state, which was then recursively estimated to predict the RUL. Zhang et al. (2005) utilized principal signal features, extracted by principal component analysis (PCA), to generate a component health/degradation index using a hidden Markov model (HMM) for further RUL prediction. Xu, Ji, and Zhou (2008) introduced a real-time reliability prediction method for a dynamic system that suffers from a hidden degradation process. This method can also be applied to fault prognosis. Phelps, Willett, Kirubarajan, and Brideau (2007) proposed using failure probabilities of binary random signals of sensors to track the system health, which avoids predicting essentially unbounded parameters of a real-valued system. The main idea behind these methods is to monitor the hidden fault process from observations, and then use either mechanical or statistical prediction models to predict the RUL. However, model-based methods need accurate model knowledge (Chelidze & Cusumano, 2004; Xu et al., 2008), which may be difficult and time-consuming to obtain for many industrial processes. HMM-based methods assume that system states are finite and discrete (Zhang et al., 2005), which makes them more suitable for discrete processes than for continuous processes.

Approaches based on statistical process monitoring (SPM), which is one of the most active research areas in process control, have been studied actively over the last two decades (Qin, 2003). They are much easier to apply to real processes than model-based and knowledge-based methods because of their data-based nature. Instead of a causal model or a mechanistic model, they use an empirical correlation model built from normal operating data to monitor the system and isolate faults with the fault direction extracted from historical faulty data (Valle, Qin, & Piovoso, 2001). These approaches have found wide application in fault detection and diagnosis of different industrial processes, including chemicals, polymers and microelectronics. If they are combined with prediction technologies, a fault prognosis approach with the above benefits can be developed. Until now, little effort has been put into the study of fault prognosis based on these approaches. Chen and McAvoy (1998) developed a predictive monitoring approach for batch processes based on multi-way principal component analysis. Juricek, Seborg, and Larimore (2001) proposed a predictive monitoring approach for a dynamic linear model with a disturbance of known type based on the evaluation of Hotelling's $T^2$ statistic.

In this paper, a data-based multivariate fault prognosis approach for continuous processes is proposed. The fault magnitude is estimated based on fault reconstruction. Then, a vector auto-regressive (AR) model with wavelet based denoising is adopted to model the fault process and predict the RUL. The remainder of this paper is organized as follows. Section 2 provides the problem formulation of multivariate fault prognosis for continuous processes. Section 3 introduces the algorithms for fault estimation from the observations based on fault reconstruction. A new fault detection index is proposed to integrate fault detection and prognosis. In Section 4, a wavelet-based denoising procedure is used for preprocessing. Then, a vector AR model is adopted to describe the fault process. The trained model is used to predict the RUL. Section 5 shows the feasibility of the approach with two case studies. Conclusions are given in the last section of the paper.

2. Problem formulation

Let $x \in \mathbb{R}^m$ denote a sample vector of $m$ sensors, and let $x^*$ denote the sample vector under normal operating conditions. If a fault $F$ occurs, the sample vector $x$ is related to the fault $F$ as follows (Dunia & Qin, 1998b):

$$x_k = x_k^* + N f_k \qquad (1)$$

where $N \in \mathbb{R}^{m \times s}$ represents the fault direction matrix related to $F$, along which the fault develops, and $f_k \in \mathbb{R}^s$ represents the fault magnitude along the $s$ directions of $N$ at time $k$. In general, $s < m$.

The matrix $N$ is assumed to be known beforehand. For multidimensional faults, $N$ is chosen to be an orthonormal matrix. For a unidimensional fault, $N$ reduces to a vector with unit norm (Dunia & Qin, 1998b). Although there are no state equations here, $N$ can represent either a process fault or a sensor fault (Dunia & Qin, 1998a), as long as the measured variables are affected by the hidden fault process. $N$ can represent both the simple and the complex faults classified by Yoon and MacGregor (2001). If faulty data are available for a type of fault, the fault direction matrix can be extracted from the faulty data directly. Examples are given for continuous processes (Valle et al., 2001) and batch processes (Yue & Qin, 2001).

It is a necessary condition for fault prognosis that the hidden fault process is a slowly time-varying autocorrelated process. Note that $f_k$ changes over time as the actual fault process develops. Thus $f_k$ is an autocorrelated time series.

3. Fault estimation via reconstruction

As the normal sample vector $x^*$ cannot be obtained, the fault process $f_k$ cannot be observed directly. To estimate $f_k$, a data-based process model is required. Effective data-based models exist in the statistical process monitoring area, including principal component analysis (PCA), partial least squares (PLS) and their variants. In this section, a PCA model is used to build a variable correlation model.

3.1. PCA modeling

Let $x \in \mathbb{R}^m$ denote a sample vector of $m$ sensors. Assuming there are $n$ samples for each sensor, the data matrix $X \in \mathbb{R}^{n \times m}$ consists of $n$ samples, with each row representing a sample and each column representing a sensor. Correlation-based PCA decomposes $X$ into two parts (see Qin, 2003):

$$X = \hat{X} + \tilde{X} = T P^T + \tilde{X} \qquad (2)$$

where the columns of $X$ are zero-centered and scaled to unit variance, $\hat{X}$ represents the modeled variations of $X$ and $\tilde{X}$ the residual variations. The score and loading matrices, $T \in \mathbb{R}^{n \times A}$ and $P \in \mathbb{R}^{m \times A}$, respectively, can be obtained from the eigenvalue decomposition of the sample covariance matrix, where $A$ is the number of significant principal components retained (typically $A < m$), such that $T = XP$ (Valle, Li, & Qin, 1999).

After the decomposition, the variable space is divided into two orthogonal subspaces: the principal component subspace (PCS), $S_p = \mathrm{span}\{P\}$, and the residual subspace (RS), $S_r = \mathrm{span}\{I - PP^T\}$. A sample vector $x_k$ can then be projected onto the PCS and the RS, respectively:

$$x_k = \hat{x}_k + \tilde{x}_k \qquad (3)$$

$$\hat{x}_k = P P^T x_k \equiv C x_k \in S_p \qquad (4)$$

$$\tilde{x}_k = (I - P P^T) x_k = (I - C) x_k \in S_r \qquad (5)$$

Note that $\hat{x}_k^T \tilde{x}_k = 0$, $\dim(S_p) = A$ and $\dim(S_r) = m - A$.
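The decomposition in Eqs. (2)–(5) can be illustrated with a minimal NumPy sketch. The function names `fit_pca` and `project` and the synthetic four-sensor data are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def fit_pca(X, A):
    """Correlation-based PCA: return loadings P (m x A), eigenvalues, mean and std."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                      # zero-mean, unit-variance columns
    R = Xs.T @ Xs / (Xs.shape[0] - 1)          # sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs[:, :A], eigvals, mean, std

def project(x, P):
    """Split a scaled sample x into its PCS part C x and its RS part (I - C) x."""
    C = P @ P.T
    x_hat = C @ x
    x_tilde = x - x_hat
    return x_hat, x_tilde

# Hypothetical data: 4 sensors driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(500, 4))

P, eigvals, mean, std = fit_pca(X, A=2)
x = (X[0] - mean) / std
x_hat, x_tilde = project(x, P)
```

The two projections add back to the scaled sample and are orthogonal to each other, which is the property stated after Eq. (5).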


3.2. Fault detection indices

Under normal operating conditions, the variable correlation does not change and the main variation occurs in the PCS. When a fault occurs or an abnormal situation takes place, the variable correlation changes, resulting in an increase of the sample projection onto the RS. A typical statistic for detecting abnormal conditions is the squared prediction error (SPE):

$$\mathrm{SPE}(x_k) \equiv \|\tilde{x}_k\|^2 = \|(I - C) x_k\|^2 \qquad (6)$$

The process is considered normal if

$$\mathrm{SPE} \le \delta_\alpha^2 \qquad (7)$$

where $\delta_\alpha^2$ is the upper control limit for SPE with confidence level $\alpha$. Jackson and Mudholkar (1979) developed an expression for $\delta_\alpha^2$, which has found wide acceptance.

Some other fault detection indices, such as Hotelling's $T^2$, have been proposed and summarized in a unified form of quadratic indices (Qin, 2003). In many cases, the SPE index is preferred over other indices (Dunia & Qin, 1998b). Moreover, similar results can be derived based on the other indices. For the remainder of the paper, fault reconstruction and fault prognosis are performed based on the SPE index.
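As a hedged illustration of Eqs. (6) and (7), the sketch below computes the SPE of a scaled sample and a control limit using the commonly quoted form of the Jackson and Mudholkar (1979) approximation, which depends on the eigenvalues discarded by the PCA model. The function names are hypothetical, and the paper itself does not reproduce the formula, so the expression in `spe_limit` should be read as an assumption about what is meant.

```python
import numpy as np
from statistics import NormalDist

def spe(x, P):
    """Squared prediction error of a scaled sample x for loadings P, Eq. (6)."""
    residual = x - P @ (P.T @ x)
    return float(residual @ residual)

def spe_limit(residual_eigvals, alpha=0.99):
    """Jackson-Mudholkar style control limit from the discarded eigenvalues.

    alpha is the confidence level; theta_i is the sum of the i-th powers of
    the residual eigenvalues.
    """
    lam = np.asarray(residual_eigvals, dtype=float)
    theta1, theta2, theta3 = (float(np.sum(lam ** i)) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(alpha)      # standard normal quantile
    term = (c_alpha * np.sqrt(2.0 * theta2 * h0 ** 2) / theta1
            + 1.0 + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return float(theta1 * term ** (1.0 / h0))

# Toy check: one retained direction (first axis) in a 2-variable space.
P_demo = np.array([[1.0], [0.0]])
spe_in_pcs = spe(np.array([3.0, 0.0]), P_demo)   # sample lies in the PCS
spe_in_rs = spe(np.array([0.0, 2.0]), P_demo)    # sample lies in the RS
limit_99 = spe_limit([1.0], alpha=0.99)
limit_95 = spe_limit([1.0], alpha=0.95)
```

With a single unit residual eigenvalue the limit comes out close to the corresponding chi-square quantile with one degree of freedom, which is a useful sanity check on the approximation.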

3.3. Fault estimation via reconstruction

As $f_k$ cannot be measured directly, fault reconstruction is needed. The objective of fault reconstruction is to estimate the normal sample $x_k^*$ by eliminating the effect of fault $F$ as much as possible. Let $z_k$ denote the reconstruction of $x_k^*$ from $x_k$; then $z_k$ can be calculated as follows:

$$z_k = x_k - N f_k \qquad (8)$$

The purpose of reconstruction is to find $f_k$ such that the reconstructed SPE is minimized:

$$\mathrm{SPE}(z_k) = \|\tilde{z}_k\|^2 = \|\tilde{x}_k - \tilde{N} f_k\|^2 \qquad (9)$$

where $\tilde{N} = (I - C) N$ denotes the projection of $N$ onto the RS. The optimal solution of this problem gives an estimate of $f_k$ (Qin, 2003):

$$\hat{f}_k = \tilde{N}^+ \tilde{x}_k = \tilde{N}^+ x_k \qquad (10)$$

where $\tilde{N}^+$ represents the Moore–Penrose pseudoinverse of $\tilde{N}$, and $\hat{f}_k$ is the estimate of $f_k$.

There may exist many kinds of faults in a process, denoted by the fault direction matrices $N_i$ $(i = 1, \ldots, l)$. Fault identification can be used to find the actual fault $i$. For the sake of simplicity, after fault identification, the actual fault direction is denoted by $N$.

Dunia and Qin (1998b) studied the reconstructability of multidimensional faults and pointed out that $\hat{f}_k$ can be uniquely calculated if and only if the fault can be completely reconstructed. One form of the necessary and sufficient condition for complete reconstructability is that $\tilde{N}$ has full column rank, which means that the rank of $N$ is not reduced when it is projected onto the RS. It has also been proven that, under this condition, the estimate is unbiased and the variance of the estimation error is constant.
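The reconstruction-based estimate of Eq. (10), together with the full-column-rank condition for complete reconstructability, can be sketched as follows. The function name `estimate_fault` and the three-variable toy example are illustrative assumptions, not part of the original study.

```python
import numpy as np

def estimate_fault(x, P, N):
    """Return f_hat = pinv(N_tilde) @ x and the reconstruction z = x - N f_hat.

    x : scaled sample (m,), P : PCA loadings (m, A), N : fault directions (m, s).
    """
    m = P.shape[0]
    N_tilde = (np.eye(m) - P @ P.T) @ N        # fault directions projected onto the RS
    if np.linalg.matrix_rank(N_tilde) < N.shape[1]:
        raise ValueError("fault is not completely reconstructable")
    f_hat = np.linalg.pinv(N_tilde) @ x        # Eq. (10)
    return f_hat, x - N @ f_hat                # Eq. (8) with f_k replaced by f_hat

# Toy example: the PCS is the first axis, the fault acts along the third axis.
P_demo = np.array([[1.0], [0.0], [0.0]])
N_demo = np.array([[0.0], [0.0], [1.0]])
x_faulty = np.array([1.0, 0.0, 2.5])           # normal part [1, 0, 0] plus fault of size 2.5
f_hat, z = estimate_fault(x_faulty, P_demo, N_demo)
```

In this toy case the fault direction lies entirely in the residual subspace, so the estimate recovers the fault magnitude exactly and the reconstructed sample has zero SPE. A fault direction lying entirely in the PCS would be rejected by the rank check.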

3.4. An alternative fault detection index

In PCA based methods, the SPE index is often preferred as the fault detection index. However, if the type of fault, i.e. the fault direction matrix, is known beforehand, an alternative fault detection index based on $\hat{f}_k$ can be proposed to integrate fault detection and prognosis.

Under normal operating conditions, $f = 0$. Thus,

$$E(\hat{f}) = E(\tilde{N}^+ x^*) = 0 \qquad (11)$$

$$\mathrm{Cov}(\hat{f}) = \mathrm{Cov}(\tilde{N}^+ x^*) = \tilde{N}^+ R \tilde{N}^{+T} \equiv R_f \qquad (12)$$

where $R$ denotes the covariance matrix of $x^*$. Assuming the sample vector $x^*$ follows a multivariate normal distribution, one can define the Mahalanobis distance of $\hat{f}$

$$D_f = \hat{f}^T R_f^{-1} \hat{f} \sim \frac{s(n^2 - 1)}{n(n - s)} F_{s,n-s} \qquad (13)$$

where $F_{s,n-s}$ is an $F$ distribution with $s$ and $n - s$ degrees of freedom, $s$ is the dimension of $f$, and $n$ is the number of normal samples. For a given significance level $\beta$, the process is considered normal if

$$D_f \le D_{f,\beta} \equiv \frac{s(n^2 - 1)}{n(n - s)} F_{s,n-s;\beta} \qquad (14)$$
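A minimal sketch of the index in Eqs. (12)–(14) is given below. To keep the example free of extra dependencies, the upper quantile of the $F$ distribution is passed in by the caller; it could come from statistical tables or from a routine such as `scipy.stats.f.ppf(1 - beta, s, n - s)`. The function names and the numerical values in the example are assumptions for illustration.

```python
import numpy as np

def fault_cov(N_tilde_pinv, R):
    """Covariance R_f of the fault estimate under normal operation, Eq. (12)."""
    return N_tilde_pinv @ R @ N_tilde_pinv.T

def d_f(f_hat, R_f):
    """Mahalanobis distance of the fault estimate, Eq. (13)."""
    f_hat = np.atleast_1d(f_hat)
    return float(f_hat @ np.linalg.solve(R_f, f_hat))

def d_f_limit(s, n, f_quantile):
    """Control limit of Eq. (14); f_quantile is the upper quantile of F(s, n - s)."""
    return s * (n ** 2 - 1) / (n * (n - s)) * f_quantile

# Illustrative numbers only: one fault direction, 1000 normal samples, and an
# assumed table value of about 6.66 for the 1% upper quantile of F(1, 999).
R_f_demo = np.array([[0.25]])
index_value = d_f(np.array([2.0]), R_f_demo)
limit = d_f_limit(s=1, n=1000, f_quantile=6.66)
fault_detected = index_value > limit
```

Because $D_f$ is a function of $\hat{f}$ alone, the same quantity that is later predicted for prognosis also drives detection.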

4. Fault prognosis using wavelets and vector AR models

Two assumptions are made for the fault prognosis discussed in this paper:

Assumption 1. The fault process is a slowly time-varying autocorrelated process.

Assumption 2. The fault can be completely reconstructed.

If these conditions are satisfied, the fault estimate $\hat{f}_k$ can be obtained, reflecting how serious the fault currently is. Then, a vector AR model is applied to predict $\hat{f}_k$ in this section.

4.1. Wavelet based denoising technology

According to Assumption 1, the fault information is located mainly in the low-frequency part of $\hat{f}_k$, while the high-frequency part generally represents noise and/or unknown disturbances. For fault evaluation purposes, it is necessary to extract the low-frequency trend from the initial fault estimate.

There are many signal denoising techniques in the time–frequency analysis area. Among these methods, wavelets have found wide use for noise removal in a variety of fields. This is because they can represent deterministic features in a small number of large coefficients, while stochastic noise affects all wavelet coefficients according to its power spectrum. Thus, the deterministic fault trend (at lower frequencies) and the stochastic noise or disturbance (at higher frequencies) can be separated by an appropriate threshold. Therefore, a wavelet based denoising technique is applied in this subsection to remove the noise and extract the fault evolution trend. In most cases, however, wavelet denoising of real-time signals is carried out via off-line processing. Here, an online wavelet denoising method using a moving window is adopted to denoise each direction of the multidimensional fault as follows (Xia, Meng, Qian, & Wang, 2007):

$$[y_{k-n_w+1}(i), \ldots, y_k(i)]^T = T_i^{-1} H_i T_i [\hat{f}_{k-n_w+1}(i), \ldots, \hat{f}_k(i)]^T \qquad (15)$$

where $y_k(i)$ $(i = 1, \ldots, s)$ represents the fault evolution trend of the $i$th direction, $\hat{f}_k(i)$ $(i = 1, \ldots, s)$ is the $i$th element of $\hat{f}_k$, and $n_w$ is the length of the moving window. $T_i$ is the wavelet coefficient matrix for the fault in the $i$th direction, and $H_i$ is the diagonal filtering matrix for the fault in the $i$th direction.

The low-frequency part of $\hat{f}_k(i)$ will be extracted efficiently if $H_i$ and $T_i$ are chosen properly. In practice, however, one selects the wavelet basis function rather than specifying $H_i$ and $T_i$ directly, as discussed in the first case study. Besides the wavelet basis, several other factors may affect wavelet based denoising, including the decomposition depth, the window width and so on (Xia et al., 2007). The denoising effect therefore changes with the combination of these parameters, which may in turn affect the prediction model.
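The moving-window scheme of Eq. (15) can be sketched for a single fault direction as follows. This is only one possible realization under stated assumptions: a Haar basis implemented by hand, soft thresholding of all detail coefficients with the universal threshold, a noise level estimated from the median absolute value of the finest detail coefficients, and raw values kept until the first full window is available. None of these implementation details is prescribed by the paper, and the function names are hypothetical.

```python
import numpy as np

def haar_decompose(signal, depth):
    """Multi-level orthonormal Haar transform: (approximation, [details, finest first])."""
    approx, details = np.asarray(signal, dtype=float), []
    for _ in range(depth):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_reconstruct(approx, details):
    """Inverse of haar_decompose."""
    for detail in reversed(details):           # start from the coarsest level
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def denoise_window(window, depth):
    """Soft-threshold the detail coefficients of one window and reconstruct it."""
    approx, details = haar_decompose(window, depth)
    sigma = np.median(np.abs(details[0])) / 0.6745          # assumed noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(window)))        # universal threshold
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    return haar_reconstruct(approx, shrunk)

def online_denoise(f_hat, n_w=64, depth=4):
    """Denoise each new sample using only the n_w most recent values."""
    if n_w % (2 ** depth) != 0:
        raise ValueError("n_w must be divisible by 2**depth")
    f_hat = np.asarray(f_hat, dtype=float)
    y = f_hat.copy()                           # first n_w - 1 values are left unfiltered
    for k in range(n_w - 1, f_hat.size):
        y[k] = denoise_window(f_hat[k - n_w + 1:k + 1], depth)[-1]
    return y
```

Only the last value of each denoised window is kept, so the filter is causal and can run online; the price is that the estimate at time $k$ cannot benefit from later samples.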

4.2. Fault prognosis using vector AR models

Vector AR (VAR) models have found wide use in describing autocorrelated multivariate processes. Thus, a VAR model of order $b$ is used to model the fault process (Deng, 2003):

$$y_k = \sum_{i=1}^{b} A_i y_{k-i} + e_k \qquad (16)$$

where $y_k \in \mathbb{R}^s$ is the fault trend and $A_i \in \mathbb{R}^{s \times s}$ $(i = 1, \ldots, b)$ are the model parameters. $e_k \in \mathbb{R}^s$ is the modeling residual, which is often assumed to be zero mean and i.i.d. Let

$$\Theta = [A_1, \ldots, A_b]^T \qquad (17)$$

$$u_k = \begin{pmatrix} y_{k-1} \\ \vdots \\ y_{k-b} \end{pmatrix} \qquad (18)$$

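The multivariate recursive least squares (MRLS) algorithm can be used to estimate $\Theta$ iteratively (Deng, 2003):

$$\begin{cases} \hat{\Theta}_{k+1} = \hat{\Theta}_k + P_{k+1} u_{k+1} [y_{k+1}^T - u_{k+1}^T \hat{\Theta}_k] \\ P_{k+1} = P_k - \dfrac{[P_k u_{k+1}][P_k u_{k+1}]^T}{1 + u_{k+1}^T P_k u_{k+1}} \end{cases} \qquad (19)$$

where $\hat{\Theta}_0 = 0$, $P_0 = \gamma I$, and $\gamma \gg 0$ is an arbitrary large positive number.

The model order $b$ of the VAR model can be determined by Akaike's information criterion (AIC) (De Waele & Broersen, 2003). Then, a training set is used to train model (16) with the algorithm in (19). Once the one-step-ahead predictor has been obtained, multi-step predictions of $y_k$ can be calculated iteratively:

$$\hat{y}_{k+p} = \hat{\Theta}^T u_{k+p} \qquad (20)$$

where $\hat{\Theta}$ is the estimate obtained from the training set and $u_{k+p}^T = [y_{k+p-1}^T, \ldots, y_{k+p-b}^T]$, in which values that have not yet been observed are replaced by their predictions.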
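The recursion in Eq. (19) and the iterated prediction in Eq. (20) can be sketched as below. The function names, the default value of `gamma` and the noise-free two-dimensional VAR(1) test signal are assumptions chosen for illustration; they are not the settings used in the case studies.

```python
import numpy as np

def fit_var_rls(Y, b, gamma=1e6):
    """Estimate Theta = [A_1, ..., A_b]^T by recursive least squares, Eq. (19).

    Y has one row per time step (n, s); returns Theta with shape (b * s, s).
    """
    Y = np.asarray(Y, dtype=float)
    n, s = Y.shape
    theta = np.zeros((b * s, s))
    P = gamma * np.eye(b * s)
    for k in range(b, n):
        u = Y[k - b:k][::-1].ravel()           # regressor [y_{k-1}; ...; y_{k-b}], Eq. (18)
        Pu = P @ u
        P = P - np.outer(Pu, Pu) / (1.0 + u @ Pu)
        theta = theta + np.outer(P @ u, Y[k] - u @ theta)
    return theta

def predict_var(theta, recent, steps):
    """Iterate the one-step predictor, Eq. (20); recent holds the latest rows, oldest first."""
    recent = np.asarray(recent, dtype=float)
    b = theta.shape[0] // recent.shape[1]
    hist = [row for row in recent]
    out = []
    for _ in range(steps):
        u = np.concatenate(hist[-1:-b - 1:-1]) # newest value first
        y_next = theta.T @ u
        hist.append(y_next)                    # predictions feed later regressors
        out.append(y_next)
    return np.array(out)

# Hypothetical noise-free VAR(1) signal used to check that the recursion recovers A_1.
A1 = np.array([[0.9, 0.1], [0.0, 0.8]])
Y = [np.array([1.0, 1.0])]
for _ in range(59):
    Y.append(A1 @ Y[-1])
Y = np.array(Y)
theta_hat = fit_var_rls(Y, b=1)
forecast = predict_var(theta_hat, Y[-1:], steps=3)
```

A large `gamma` makes the initial estimate uninformative, so the result is close to the batch least squares solution after a few samples.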

Define the mean square prediction error (MSPE) as follows:

$$\mathrm{MSPE}_p(x) = \frac{1}{L} \sum_{k=1}^{L} (x_{k+p} - \hat{x}_{k+p})^2 \qquad (21)$$

where $x_k$ is a one-dimensional time series with truncated length $L$, and $\hat{x}_{k+p}$ represents the $p$-step-ahead prediction of $x$. MSPE can be used for evaluating the predictor (20).
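For a scalar series, Eq. (21) can be evaluated as in the sketch below, which iterates a given AR predictor $p$ steps ahead from every admissible starting point. The function name `mspe_p` and the two toy series are assumptions for illustration.

```python
import numpy as np

def mspe_p(x, coeffs, p):
    """MSPE of the p-step-ahead AR prediction made from data up to each time k.

    coeffs = (a_1, ..., a_b) defines x_hat[k+1] = a_1 x[k] + ... + a_b x[k-b+1].
    """
    x = np.asarray(x, dtype=float)
    b = len(coeffs)
    errors = []
    for k in range(b - 1, len(x) - p):
        hist = list(x[k - b + 1:k + 1])        # the b most recent observations
        for _ in range(p):
            hist.append(sum(c * hist[-1 - i] for i, c in enumerate(coeffs)))
        errors.append(x[k + p] - hist[-1])
    return float(np.mean(np.square(errors)))

# A ramp predicted by simple persistence (a_1 = 1) is off by 2 after two steps.
ramp_error = mspe_p([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], coeffs=[1.0], p=2)

# A series generated exactly by an AR(1) model is predicted without error.
exact_error = mspe_p(0.9 ** np.arange(20), coeffs=[0.9], p=3)
```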

Besides VAR models, other modeling approaches can also be used for the prediction of multivariate time series. Choosing a proper predictive model depends on the fault evaluation characteristics. For example, it is better to use a VAR model for linear autocorrelation, a grey model for an exponential trend, and neural networks or support vector machines for a typical nonlinear autocorrelation (Heng, Zhang, Tan, & Mathew, 2009).

Vector AR models are widely used for multivariate time series modeling and prediction when the process is linearly autocorrelated. As the fault process is assumed to be slowly varying, it can be treated as linear over a short prediction horizon. Therefore, it is reasonable to use a vector AR model. Furthermore, the modeling algorithm for the vector AR model is recursive and adaptive, which makes it robust against modeling errors.

4.3. Remaining useful life prediction

When the fault process develops, the fault magnitude grows. There exists a control limit for fault prognosis, denoted by $f_{\max}$. When the fault is still tolerable,

$$\|f_k\| < f_{\max} \qquad (22)$$

The remaining useful life (RUL) at time $k$ is defined using the predictions $\hat{y}_{k+p}$:

$$\mathrm{RUL}(k) = \min\{p : \|\hat{y}_{k+p}\| \ge f_{\max}\} \qquad (23)$$

The upper limit $f_{\max}$ depends not only on technological factors, such as the effect of the fault on products and the reliability of the system, but also on economic factors, such as the repair cost and the stock of spare parts. Therefore, instead of being derived from statistical distributions, $f_{\max}$ should be specified manually on the basis of process knowledge.
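Eq. (23) amounts to a first-crossing search over the predicted trend. A minimal sketch, assuming the multi-step predictions are already available as a sequence and that `None` is returned when the limit is not reached within the prediction horizon (a convention chosen here, not stated in the paper):

```python
import numpy as np

def remaining_useful_life(predictions, f_max):
    """Smallest p with ||y_hat[k+p]|| >= f_max, where predictions[p - 1] is y_hat[k+p].

    Returns None if the limit is not reached within the supplied horizon.
    """
    for p, y_hat in enumerate(predictions, start=1):
        if np.linalg.norm(np.atleast_1d(y_hat)) >= f_max:
            return p
    return None

# Hypothetical two-dimensional predictions with norms 1.0, 1.3 and about 1.52.
preds = [[1.0, 0.0], [1.2, 0.5], [1.4, 0.6]]
rul = remaining_useful_life(preds, f_max=1.45)
rul_none = remaining_useful_life(preds, f_max=2.0)
```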

4.4. Summary on the approach

The proposed approach is summarized in Fig. 1. In the figure, each block represents a procedure of the approach, and the arrows describe the dependence among different blocks, e.g. the PCA model is built from the historical normal data. Firstly, the historical normal data are used to build a PCA model (‘PCA’ block), which describes the correlations among all measured variables under normal operating conditions. If faulty data of some kinds are collected beforehand, the residual fault directions can be extracted for these faults (‘Residual fault direction’ block). Given a real-time measurement with a known fault, the fault can be detected based on the prebuilt PCA model (‘Fault detection’ block). The fault directions for known types are combined to identify the fault type (‘Fault identification’ block) and further estimate the fault magnitude (‘Fault estimation’ block). Subsequently, the multivariate fault estimate is processed with noise removal (‘Denoise’ block) and then predicted by a vector AR model (‘Fault prediction’ block). Lastly, the RUL can be calculated (‘RUL prediction’ block).

5. Case studies

In this section, two case studies, on a CSTR and on the TEP, respectively, are considered to show the effectiveness of the proposed multivariate fault prognosis approach. The new fault detection index and the choice of parameters in the denoising procedure are also considered.

5.1. Case study on CSTR

Firstly, a case study on a continuous stirred tank reactor (CSTR) (Zhou & Ye, 2000, p. 312) with feedback control is used to show the application of fault prognosis in detail.

5.1.1. Process description

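The CSTR process can be described by the following differential equations:

$$\frac{dC_A}{dt} = \frac{q}{V}(C_{Af} - C_A) - k_0 \exp\left(-\frac{E}{RT}\right) C_A + v_1 \qquad (24a)$$

$$\frac{dT}{dt} = \frac{q}{V}(T_f - T) + \frac{-\Delta H}{\rho C_p} k_0 \exp\left(-\frac{E}{RT}\right) C_A + \frac{UA}{V \rho C_p}(T_c - T) + v_2 \qquad (24b)$$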

where $C_A$ is the outlet concentration, $T$ is the reaction temperature, $T_c$ is the temperature of the cooling water, $q$ is the inlet flow rate of the reactant, $C_{Af}$ is the inlet reactant concentration, $T_f$ is the inlet reactant temperature, and $v_1, v_2$ are independent system noise processes with $v_i(k) \sim N(0, \sigma_{v_i}^2)$. The other variables are simulation parameters, which are constant during the process. In the simulation, $C_A$ and $T$ are the controlled variables with nominal values, and $T_c$ and $q$ are chosen as manipulated variables with feedback from the control errors. It is assumed that there is a slowly increasing fault process in $C_{Af}$, and that $C_{Af}$ and $T_f$ are not measurable. The observed process variables affected by the fault process are $C_A$, $T$, $q$ and $T_c$. Measurement noise is added to the measurable variables: $x_{meas}(k) = x(k) + e(k)$, where $x_{meas}$ is the measurement and $e(k) \sim N(0, \sigma_e^2)$. The negative feedback inputs are added to $[q, T_c]^T$ through a PID controller with transfer function $K_2 (K_1 + T_d s + T_I / s) e$, where $e = [C_A - C_A^*, T - T^*]^T$ is the residual vector between the measurements and the nominal values. All the system parameters and conditions are listed in Table 1.

Fig. 1. The process chart of fault prognosis approach. (Blocks: Historical normal data, PCA model, Historical faulty data, Residual fault direction, Realtime measurements, Fault detection, Fault identification, Fault estimation, Denoise, Fault prediction, RUL prediction.)

Table 1. Parameters and conditions in CSTR simulation.

Simulation parameters: $-\Delta H = 17835.821$ J/mol, $\rho = 1000$ g/L, $E/R = 5360$ K, $V = 100$ L, $UA = 11950$ J/(min K), $k_0 = \exp(13.4)$ min$^{-1}$, $C_p = 0.239$ J/(g K)

Initial conditions: $q = 100$ L/min, $T_c = 419$ K, $T_f = 400$ K, $C_{Af} = 1$ mol/L

Nominal values: $C_A^* = 0.2$ mol/L, $T^* = 446$ K

Controller information: $K_1 = 1$, $T_d = 0.1$, $T_I = 10$, $K_2 = [5, 1; 1, 2]$

Noise conditions: $\sigma_{v_1}^2 = 0.01$, $\sigma_{v_2}^2 = 0.01$, $\sigma_e^2(T) = 0.005$, $\sigma_e^2(C_A) = 1\mathrm{e}{-5}$, $\sigma_e^2(q) = \sigma_e^2(T_c) = 1\mathrm{e}{-6}$

Fig. 2. Measurements of process variables under the hidden fault process. (Panels: $C_A$ (mol/L), $T$ (K), $q$ (L/min) and $T_c$ (K) versus time (min).)

Fig. 3. Fault detection by SPE and $D_f$. (Panels: SPE with its control limit, detection point 455; $D_f$ with its control limit, detection point 431; both versus time (min).)
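As a quick consistency check on Eqs. (24a) and (24b), the noise-free right-hand sides can be evaluated at the nominal operating point using the values in Table 1. The sketch below omits the noise terms and the PID controller, and the function name `cstr_rhs` is an assumption; it is not a reproduction of the authors' simulation.

```python
import math

# Parameter values taken from Table 1.
NEG_DELTA_H = 17835.821   # -Delta H, J/mol
RHO = 1000.0              # g/L
E_OVER_R = 5360.0         # K
V = 100.0                 # L
UA = 11950.0              # J/(min K)
K0 = math.exp(13.4)       # 1/min
CP = 0.239                # J/(g K)

def cstr_rhs(c_a, temp, q=100.0, t_c=419.0, t_f=400.0, c_af=1.0):
    """Noise-free right-hand sides of Eqs. (24a) and (24b): (dC_A/dt, dT/dt)."""
    rate = K0 * math.exp(-E_OVER_R / temp) * c_a
    d_ca = q / V * (c_af - c_a) - rate
    d_temp = (q / V * (t_f - temp)
              + NEG_DELTA_H / (RHO * CP) * rate
              + UA / (V * RHO * CP) * (t_c - temp))
    return d_ca, d_temp

# At the nominal point both derivatives are close to zero (near steady state).
d_ca_nom, d_temp_nom = cstr_rhs(0.2, 446.0)

# A higher feed concentration, as in the fault scenario, pushes dC_A/dt upward.
d_ca_fault, _ = cstr_rhs(0.2, 446.0, c_af=1.2)
```

Both derivatives are small at $C_A^* = 0.2$ mol/L and $T^* = 446$ K, which suggests that the nominal values in Table 1 are approximately a steady state for the listed inputs.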

5.1.2. Fault detection using SPE and Df

After mean-centering all variables and scaling them to unit variance, a PCA model is built from 1000 normal observations. Based on the cumulative percent variance (CPV) criterion (Valle et al., 1999), three principal components are kept, i.e. $A = 3$. One thousand faulty observations under the condition $C_{Af} = 1.2$ mol/L are used to extract the residual fault direction matrix $\tilde{N}$ according to Valle et al. (2001). As the residual subspace $S_r$ has only one degree of freedom ($m - A = 1$), the fault dimension is one, i.e. $s = 1$. The reduced fault direction vector $\tilde{N}$ is listed below:

$$\tilde{N} = [-0.0189, -0.0048, -0.7052, 0.7088]^T$$

A set of faulty data consisting of 1000 samples with the following hidden fault process is used for fault prognosis:

$$C_{Af}(k) = \begin{cases} 1 \ \text{mol/L}, & k < 300 \\ 1 + (k - 300)/1500 \ \text{mol/L}, & k \ge 300 \end{cases} \qquad (25)$$
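Two small pieces of the setup above lend themselves to a sketch: choosing the number of components by cumulative percent variance, and the ramp fault profile of Eq. (25). The 95% threshold and the eigenvalues in the example are assumptions for illustration, since the paper does not report the threshold or the eigenvalues it used.

```python
import numpy as np

def select_by_cpv(eigvals, threshold=0.95):
    """Smallest number of components whose cumulative variance share reaches threshold."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cpv = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cpv, threshold) + 1)

def c_af(k):
    """Hidden fault profile of Eq. (25), in mol/L, at sample k."""
    return 1.0 if k < 300 else 1.0 + (k - 300) / 1500.0

# Hypothetical eigenvalues of a 4-variable correlation matrix.
n_components = select_by_cpv([2.0, 1.0, 0.9, 0.1])

# Feed concentration at the end of the 1000-sample faulty data set.
final_c_af = c_af(1000)
```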

The measured process variables affected by the hidden fault process, Eq. (25), are plotted in Fig. 2. Using PCA based methods, the fault can be detected by the SPE index. Given the fault direction $N$, fault detection can also be performed with $D_f$. Fig. 3 shows the fault detection results using SPE and $D_f$, respectively. The control limit for the SPE index is given by Jackson and Mudholkar (1979), while that for $D_f$ is calculated by (14).

Fig. 4. Reconstruction based fault estimation and wavelet based denoising. (Curves: $\hat{f}_k$, $y_k$ and $f_{\max}$; start point 455; fault magnitude versus time (min).)

Fig. 5. Prediction error for training and testing samples. (Prediction error versus time (min).)


From Fig. 3, the detection times given by SPE and $D_f$ are 455 and 431, respectively, which indicates that $D_f$ may detect the fault process more sensitively than the SPE index. This is because more information about the fault, i.e. the fault direction, is used in $D_f$. Furthermore, the detection index $D_f$ is a function of $\hat{f}_k$, which integrates fault detection and prognosis. To sum up, when the fault type can be predetermined and the fault direction is known beforehand, the index $D_f$ is preferred. Otherwise, the SPE index is preferred, and a diagnosis procedure is required before performing fault prognosis.

There is a delay of about 150 min for detection, as the fault process starts at 300 min. This is because the CSTR process is a dynamic process, and the fault process varies very slowly. When the fault is first introduced, its effect on the process is too small to change SPE or $D_f$. As the fault grows larger, the effect can be detected and then predicted.

5.1.3. Reconstruction based fault estimation

The fault vector is estimated based on reconstruction, as shown in Fig. 4. When the fault is detected by SPE or $D_f$, the fault prognosis starts. The prognosis control limit $f_{\max}$, which is set using expert knowledge, determines when to end the prognosis. In this simulation, $f_{\max}$ is chosen as 1.45 to facilitate the presentation of the RUL prediction.

5.1.4. Wavelet based denoising

The denoising procedure is applied to the fault estimate after the fault is detected, as shown in Fig. 4. In wavelet based online denoising, the discrete wavelet transform, computed with the Mallat algorithm, is performed inside a moving window of length $n_w$. The data are first decomposed into $n_d$ layers, computing the scaling coefficients and wavelet coefficients for each layer. Then, the wavelet coefficients are shrunk by soft thresholding with the universal threshold. Lastly, the signal is reconstructed from the processed wavelet coefficients and the scaling coefficients.

The main parameters in this algorithm are $n_w$, $n_d$ and the wavelet basis function. In general, several wavelet bases are suitable for denoising, such as Haar wavelets, Daubechies wavelets (dbN) and Symlet wavelets (symN). Selecting the proper wavelet basis depends on the noise properties. Further, according to the application conditions of the wavelet multi-resolution algorithm, $n_w$ is preferred to have the form $2^L$ ($L > n_d$). In this subsection, the Haar wavelet function is used, with $n_d = 4$ and $n_w = 64$. Parameter selection of the wavelet basis, $n_d$ and $n_w$ will be studied at the end of this simulation.

5.1.5. Fault prognosis using VAR models

A model order of 3 was determined using AIC. An autoregressive model of order 3 is built from about 300 samples of $y_k$, while about 200 other samples of $y_k$ are used for prognosis. A one-step predictor is identified by the MRLS algorithm:

$$\hat{y}_k = 1.438 y_{k-1} - 0.143 y_{k-2} - 0.295 y_{k-3} \qquad (26)$$

The prediction error for training and testing samples is plotted in Fig. 5. The MSPE of the testing data using this model is 1.37e-5. Using the one-step-ahead predictor (26) to predict $y_{k+p}$ iteratively, the RUL is predicted online. Fig. 6 depicts the RUL prediction when the fault grows towards the prognosis control limit. The dashed line represents the real RUL, which is calculated by subtracting the current time from the time when the fault reaches the prognosis control limit. The maximum number of prediction steps is set to 50, which should be determined by an expert in practice. When the fault grows close to the control limit, the predicted RUL drops below 50 steps, which gives an early warning. Fig. 6 shows that the RUL prediction is effective, which helps to prepare spares beforehand and improves system safety significantly.

Fig. 6. RUL prediction result. (Curves: predicted RUL and real RUL; RUL (min) versus time (min).)

Table 2. MSPE based on wavelet denoising with different parameters.

Parameters                 MSPE_1 of y_k    Model order b    Time cost (s)
sym8, N = 3, n_w = 32      1.98e-4          3                2.000
sym8, N = 3, n_w = 64      1.97e-4          3                2.063
sym8, N = 3, n_w = 128     1.99e-4          3                2.062
sym8, N = 4, n_w = 64      4.10e-5          3                2.328
sym8, N = 5, n_w = 64      3.26e-5          3                2.640
sym8, N = 6, n_w = 64      2.64e-5          3                2.969
sym4, N = 4, n_w = 64      6.25e-5          3                2.313
db4, N = 4, n_w = 64       7.79e-5          3                2.281
db8, N = 4, n_w = 64       1.09e-4          3                2.406
haar, N = 4, n_w = 64      1.37e-5          4                1.937

Fig. 7. SPE and reconstructed SPE with different fault dimensions. (Panels: SPE, RSPE1, RSPE2 and RSPE3 versus sample index.)

Fig. 8. Fault detection for TEP using SPE and $D_f$ index. (Panels: SPE with its control limit, detection point 202; $D_f$ with its control limit, detection point 203; both versus sample index.)
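The online RUL computation described above can be sketched by iterating the scalar predictor of Eq. (26) until the trend reaches $f_{\max} = 1.45$ or the 50-step horizon is exhausted. The function name `predict_rul`, the use of `None` for "no crossing within the horizon" and the sample trend values are assumptions for illustration.

```python
COEFFS = (1.438, -0.143, -0.295)   # coefficients of Eq. (26)
F_MAX = 1.45                       # prognosis control limit used in the simulation
MAX_STEPS = 50                     # maximum number of prediction steps

def predict_rul(recent, coeffs=COEFFS, f_max=F_MAX, max_steps=MAX_STEPS):
    """Iterate the AR predictor and return the first step p at which |y_hat| >= f_max.

    recent holds the latest denoised trend values, oldest first (at least
    len(coeffs) of them). Returns None if the limit is not reached in time.
    """
    hist = list(recent)
    for p in range(1, max_steps + 1):
        y_next = sum(c * hist[-1 - i] for i, c in enumerate(coeffs))
        if abs(y_next) >= f_max:
            return p
        hist.append(y_next)        # predictions feed the following steps
    return None

# Hypothetical trend values just below the limit: the next prediction crosses it.
rul_near_limit = predict_rul([1.40, 1.42, 1.44])

# A small, flat trend stays flat because the coefficients sum to one.
rul_far = predict_rul([0.1, 0.1, 0.1])
```

Since the three coefficients of Eq. (26) sum to one, a constant trend is predicted to remain constant, so a warning is raised only when the recent values show growth towards the limit.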

5.1.6. Parameter selection in denoising procedure

In this subsection, the parameter selection in wavelet based denoising is studied based on MSPE. Table 2 lists the MSPE of $y_k$ using wavelet based denoising with different parameters, where the model order is the order of the VAR model and the time cost is how long the denoising procedure takes. From Table 2, it is observed that when the decomposition depth $N$ increases, the MSPE decreases and the time cost rises. It is also seen that the window length $n_w$ has a limited impact on the resulting VAR model. Different kinds of wavelet basis are compared, and the Haar wavelet is the most suitable in this simulation.

5.2. Case study on TEP

In this subsection, the effectiveness of the proposed methods on multidimensional faults is investigated by application to the Tennessee Eastman process (TEP).

5.2.1. Process description

TEP was created by the Eastman Chemical Company to provide a realistic industrial process for evaluating process control and monitoring methods (Downs & Vogel, 1993). A detailed description of the process can be found in Chiang, Russell, and Braatz (2001). TEP has been widely used as a benchmark process for evaluating process monitoring methods such as PCA, PLS, and Fisher discriminant analysis (FDA) (Chiang, Russell, & Braatz, 2000; Lee, Han, & Yoon, 2004). TEP contains two blocks of variables: the XMV block of 12 manipulated variables and the XMEAS block of 41 measured variables. Process measurements are sampled with an interval of 3 min. Nineteen composition measurements are sampled with time delays that vary from 6 to 15 min; they are not included in the process variables. There are 15 known faults in TEP, among which fault 13 is a slow drift in the reaction kinetics and is therefore suitable for fault prognosis.

5.2.2. Fault subspace extraction and estimation

In this study, 22 process measurements and 11 manipulated variables, i.e. XMEAS(1–22) and XMV(1–11), are chosen as X. Firstly, a PCA model is built from 480 normal samples. The samples are centered to zero mean and scaled to unit variance. Based on the cumulative percent variance captured by the PCA model, the number of principal components is selected as 23. Then, the fault subspace is extracted using 480 faulty samples under fault 13. According to the method of Valle et al. (2001), the dimension of the fault vector is 3. The reconstructed SPE data for historical faulty samples with different numbers of fault dimensions are plotted in Fig. 7. It is

Fig. 9. Fault estimation: magnitude and each dimension. (Panels: $|f_k|$, $f_k(1)$, $f_k(2)$ and $f_k(3)$ versus sample index.)

Fig. 10. Fault prediction using a VAR model directly. Solid line represents $f_k$, dotted line represents $f_{k+1}$, dashed line represents $f_{k+10}$. (Panels: 1-step and 10-step ahead fault prediction of $|f_k|$, $f_k(1)$, $f_k(2)$ and $f_k(3)$ versus sample index.)

Fig. 11. Fault prediction based on wavelet based denoising. Solid line represents $y_k$, dotted line represents $y_{k+1}$, dashed line represents $y_{k+10}$. (Panels: 1-step and 10-step ahead fault prediction of $|y_k|$, $y_k(1)$, $y_k(2)$ and $y_k(3)$ versus sample index.)


observed that the reconstructed SPE data with three-dimensional fault directions drop below the fault detection limit. The reduced fault direction matrix is $\tilde{\Xi} \in \mathbb{R}^{33 \times 3}$.
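The preprocessing and the choice of the number of principal components described in this subsection can be illustrated with a short sketch. The helper names and the toy numbers are hypothetical, and the CPV threshold that leads to 23 components is not given in the text, so the threshold below is only a placeholder.

```python
from typing import List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Center one variable to zero mean and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((v - mean) ** 2 for v in column) / (n - 1)
    std = variance ** 0.5
    return [(v - mean) / std for v in column]


def num_components_by_cpv(eigenvalues: Sequence[float], cpv_threshold: float) -> int:
    """Smallest number of leading eigenvalues of the correlation matrix
    whose cumulative percent variance reaches cpv_threshold (0..1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, eigenvalue in enumerate(ordered, start=1):
        running += eigenvalue
        if running / total >= cpv_threshold:
            return count
    return len(ordered)


# Toy example with four variables (not TEP data).
scaled = standardize([1.0, 2.0, 3.0])
n_pc = num_components_by_cpv([4.0, 2.0, 1.0, 1.0], cpv_threshold=0.75)
```

In the TEP study the same rule would be applied to the 33 eigenvalues of the correlation matrix of the 480 normal samples.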

Following that, a set of 960 testing samples under fault 13 is used for prognosis. In the simulation, the fault is introduced at the 160th sample. The fault is detected at samples 202 and 203 using SPE and $D_f$, respectively, as shown in Fig. 8. There is a delay of about 40 samples, for a reason similar to that in the CSTR case. After the fault is detected, the fault vector $f_k$ is estimated based on fault reconstruction. The magnitude and each dimension of $f_k$ are plotted in Fig. 9.

5.2.3. Fault prediction using wavelets and VAR models

A VAR model is used to predict the fault vector $f_k \in \mathbb{R}^3$ directly. The model order $b$ is one according to AIC. The estimated model parameter is

\[
A_1 =
\begin{bmatrix}
0.9879 & 0.0150 & -0.0483 \\
0.0168 & 0.9170 & 0.0265 \\
0.0447 & -0.0791 & 1.0042
\end{bmatrix}
\]

Fig. 10 shows the one-step-ahead and 10-step-ahead prediction results based on the direct prognosis. To reduce the prediction error, a wavelet denoising procedure is adopted. Similar to the CSTR case study, the Haar wavelet is used, the number of decomposition layers is set to three, and $n_w = 64$. Then a VAR model of order two is used for prognosis. The estimated model parameter is

\[
[A_1, A_2] =
\begin{bmatrix}
1.7849 & -0.0041 & 0.1016 & -0.7956 & 0.0210 & -0.1126 \\
0.1849 & 1.7159 & -0.0877 & -0.1780 & -0.7470 & 0.1039 \\
-0.0502 & -0.1870 & 1.8959 & 0.0542 & 0.1833 & -0.9001
\end{bmatrix}
\]
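As an illustration of how the estimated coefficient matrices are used, the sketch below iterates a VAR model for a given number of steps. It assumes a model without intercept term, i.e. $y_k = \sum_i A_i y_{k-i}$, which is an assumption made here for brevity; the function names are hypothetical, while the matrices are the estimates reported above.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]

# Estimated parameters reported in the text.
A1_DIRECT: Matrix = [
    [0.9879, 0.0150, -0.0483],
    [0.0168, 0.9170, 0.0265],
    [0.0447, -0.0791, 1.0042],
]
A1_DENOISED: Matrix = [
    [1.7849, -0.0041, 0.1016],
    [0.1849, 1.7159, -0.0877],
    [-0.0502, -0.1870, 1.8959],
]
A2_DENOISED: Matrix = [
    [-0.7956, 0.0210, -0.1126],
    [-0.1780, -0.7470, 0.1039],
    [0.0542, 0.1833, -0.9001],
]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for small dense matrices."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def var_predict(coeffs: Sequence[Matrix], history: Sequence[Vector], steps: int) -> Vector:
    """Iterate the one-step predictor `steps` times.

    coeffs is [A_1, ..., A_b]; history holds past vectors, oldest first.
    Assumes a VAR model without intercept."""
    order = len(coeffs)
    if len(history) < order:
        raise ValueError("history must contain at least as many vectors as lags")
    window = [list(v) for v in history[-order:]]
    for _ in range(steps):
        nxt = [0.0] * len(window[-1])
        for lag, a in enumerate(coeffs, start=1):
            contribution = matvec(a, window[-lag])
            nxt = [x + c for x, c in zip(nxt, contribution)]
        window = (window + [nxt])[-order:]   # keep only the last `order` vectors
    return window[-1]


# One-step and 10-step ahead predictions from an arbitrary example state.
f_now = [1.0, 0.0, 0.0]
f_next = var_predict([A1_DIRECT], [f_now], steps=1)
f_ten = var_predict([A1_DIRECT], [f_now], steps=10)
y_next = var_predict([A1_DENOISED, A2_DENOISED], [[0.0, 0.0, 0.0], f_now], steps=1)
```

Multi-step prediction feeds each predicted vector back into the model, so one-step errors accumulate with the horizon, which is why the 10-step errors are larger than the one-step errors.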

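Table 3
Mean square prediction error in different cases.

| MSPE | 1st dim | 2nd dim | 3rd dim |
|---|---|---|---|
| $\mathrm{MSPE}_1$ of $f_k$ | 0.3359 | 0.2914 | 0.1645 |
| $\mathrm{MSPE}_{10}$ of $f_k$ | 5.3513 | 4.4192 | 5.5680 |
| $\mathrm{MSPE}_1$ of $y_k$ | 0.0750 | 0.0934 | 0.0155 |
| $\mathrm{MSPE}_{10}$ of $y_k$ | 3.0318 | 2.8529 | 0.7090 |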


Fig. 11 shows the one-step-ahead and 10-step-ahead predictions of the denoised fault trend $y_k$. The prediction errors of the direct prognosis and the denoising based prognosis are listed in Table 3. It is observed that wavelet denoising reduces the prediction error effectively in all three dimensions, which is particularly important for multi-step prediction.
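The comparison in Table 3 can be quantified with a small sketch. The function names are illustrative; `mspe` assumes the usual definition of the mean square prediction error per dimension, and the relative reductions are computed directly from the values in Table 3. Note that the two pairs of rows refer to different signals ($f_k$ and the denoised $y_k$), so the ratios only restate the comparison made in the table.

```python
from typing import List, Sequence


def mspe(actual: Sequence[Sequence[float]], predicted: Sequence[Sequence[float]]) -> List[float]:
    """Mean square prediction error for each dimension of a vector series."""
    n = len(actual)
    dims = len(actual[0])
    return [
        sum((actual[k][d] - predicted[k][d]) ** 2 for k in range(n)) / n
        for d in range(dims)
    ]


def relative_reduction(before: Sequence[float], after: Sequence[float]) -> List[float]:
    """Fraction by which the error shrinks in each dimension."""
    return [1.0 - a / b for a, b in zip(after, before)]


# Values copied from Table 3 (1st, 2nd and 3rd dimension).
MSPE1_DIRECT = [0.3359, 0.2914, 0.1645]
MSPE10_DIRECT = [5.3513, 4.4192, 5.5680]
MSPE1_DENOISED = [0.0750, 0.0934, 0.0155]
MSPE10_DENOISED = [3.0318, 2.8529, 0.7090]

reduction_1 = relative_reduction(MSPE1_DIRECT, MSPE1_DENOISED)
reduction_10 = relative_reduction(MSPE10_DIRECT, MSPE10_DENOISED)

# Toy check of mspe on a two-sample, two-dimensional series.
toy = mspe([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 4.0]])
```

With these numbers the one-step error drops by roughly 68% to 91% and the 10-step error by roughly 35% to 87%, depending on the dimension.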

6. Conclusions

This paper considers a multivariate fault prognosis method for continuous processes with a hidden fault process. The method uses the fault description widely accepted in statistical process monitoring, which can represent many types of faults. The assumptions of fault reconstructability are given for multidimensional faults. The fault magnitude is estimated based on reconstruction and is then predicted by a vector AR model with wavelet based denoising. Given the fault direction, a new fault index is proposed to integrate detection and prognosis, which is observed to have the same sensitivity to the fault as the SPE index. The case studies on a CSTR and the Tennessee Eastman process demonstrate the effectiveness of the proposed approaches. Parameters in the denoising procedure may affect the prediction model and should be chosen properly. The concept of remaining useful life prediction is also illustrated in the paper.

Acknowledgements

This work was supported by the National 973 project under Grants 2010CB731800 and 2009CB320602 and by the NSFC under Grants 60721003, 60736026 and 60931160440. S. Joe Qin acknowledges the financial support from the Changjiang Professorship by the Ministry of Education of PR China.

References

Chelidze, D., & Cusumano, J. (2004). A dynamical systems approach to failure prognosis. Journal of Vibration and Acoustics, 126, 2–8.

Chen, G., & McAvoy, T. (1998). Predictive on-line monitoring of continuous processes. Journal of Process Control, 8, 409–420.

Chiang, L. H., Russell, E., & Braatz, R. D. (2001). Fault detection and diagnosis in industrial systems. London: Springer.

Chiang, L. H., Russell, E. L., & Braatz, R. D. (2000). Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemometrics and Intelligent Laboratory Systems, 50(2), 243–252.

De Waele, S., & Broersen, P. (2003). Order selection for vector autoregressive models. IEEE Transactions on Signal Processing, 51(2), 427–433.

Deng, Z. L. (2003). Self-tuning filtering theory with applications: Modern time series analysis method. Harbin: Press of Harbin Institute of Technology.

Downs, J. J., & Vogel, E. F. (1993). A plant-wide industrial process control problem. Computers and Chemical Engineering, 17(3), 245–255.

Dunia, R., & Qin, S. J. (1998a). Joint diagnosis of process and sensor faults using principal component analysis. Control Engineering Practice, 6(4), 457–469.

Dunia, R., & Qin, S. J. (1998b). Subspace approach to multidimensional identification and reconstruction. AIChE Journal, 44(8), 1813–1831.

Heng, A., Zhang, S., Tan, A., & Mathew, J. (2009). Rotating machinery prognostics: State of the art, challenges and opportunities. Mechanical Systems and Signal Processing, 23(3), 724–739.

Jackson, J. E., & Mudholkar, G. S. (1979). Control procedures for residuals associated with principal component analysis. Technometrics, 21(3), 341–349.

Jardine, A. K. S., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 20(7), 1483–1510.

Juricek, B. C., Seborg, D. E., & Larimore, W. E. (2001). Predictive monitoring for abnormal situation management. Journal of Process Control, 11(2), 111–128.

Kothamasu, R., Huang, S. H., & VerDuin, W. H. (2006). System health monitoring and prognostics: A review of current paradigms and practices. The International Journal of Advanced Manufacturing Technology, 28(9), 1012–1024.

Labeau, P. E., Smidts, C., & Swaminathan, S. (2000). Dynamic reliability: Towards an integrated platform for probabilistic risk assessment. Reliability Engineering and System Safety, 68(3), 219–254.

Lee, G., Han, C. H., & Yoon, E. S. (2004). Multiple-fault diagnosis of the Tennessee Eastman process based on system decomposition and dynamic PLS. Industrial and Engineering Chemistry Research, 43(25), 8037–8048.

Li, Y., Billington, S., Zhang, C., Kurfess, T., Danyluk, S., & Liang, S. (1999). Adaptive prognostics for rolling element bearing condition. Mechanical Systems and Signal Processing, 13(1), 103–113.

Li, Y., Kurfess, T. R., & Liang, S. Y. (2000). Stochastic prognostics for rolling element bearings. Mechanical Systems and Signal Processing, 14(5), 747–762.

Oppenheimer, C. H., & Loparo, K. A. (2002). Physically based diagnosis and prognosis of cracked rotor shafts. In Component and systems diagnostics, prognostics and health management II, proceedings of SPIE, Vol. 4733 (pp. 122–132). Bellingham: SPIE.

Phelps, E., Willett, P., Kirubarajan, T., & Brideau, C. (2007). Predicting time to failure using the IMM and excitable tests. IEEE Transactions on Systems, Man and Cybernetics, Part A, 37(5), 630–642.

Qin, S. J. (2003). Statistical process monitoring: Basics and beyond. Journal of Chemometrics, 17(8–9), 480–502.

Qiu, J., Seth, B. B., Liang, S. Y., & Zhang, C. (2002). Damage mechanics approach for bearing lifetime prognostics. Mechanical Systems and Signal Processing, 16(5), 817–829.

Saha, T. K. (2003). Review of modern diagnostic techniques for assessing insulation condition in aged transformers. IEEE Transactions on Dielectrics and Electrical Insulation, 10(5), 903–917.

Valle, S., Li, W., & Qin, S. (1999). Selection of the number of principal components: The variance of the reconstruction error criterion with a comparison to other methods. Industrial and Engineering Chemistry Research, 38(11), 4389–4401.

Valle, S., Qin, S. J., & Piovoso, M. J. (2001). Extracting fault subspaces for fault identification of a polyester film process. In American control conference, Vol. 6 (pp. 4466–4471).

Wang, P., & Vachtsevanos, G. (2002). Fault prognostics using dynamic wavelet neural networks. AI EDAM, 15(04), 349–365.

Wang, W. Q., Golnaraghi, M. F., & Ismail, F. (2004). Prognosis of machine health condition using neuro-fuzzy systems. Mechanical Systems and Signal Processing, 18(4), 813–831.

Xia, R., Meng, K., Qian, F., & Wang, Z. L. (2007). Online wavelet denoising via a moving window. Acta Automatica Sinica, 33(9), 897–901.

Xu, Z. G., Ji, Y. D., & Zhou, D. H. (2008). Real-time reliability prediction for a dynamic system based on the hidden degradation process identification. IEEE Transactions on Reliability, 57(2), 230–242.

Yoon, S., & MacGregor, J. F. (2001). Fault diagnosis with multivariate statistical models part I: Using steady state fault signatures. Journal of Process Control, 11(4), 387–400.

Yue, H. H., & Qin, S. J. (2001). Reconstruction-based fault identification using a combined index. Industrial and Engineering Chemistry Research, 40(20), 4403–4414.

Zhang, S., & Ganesan, R. (1997). Multivariable trend analysis using neural networks for intelligent diagnostics of rotating machinery. Journal of Engineering for Gas Turbines and Power, 119, 378–384.

Zhang, X., Xu, R., Kwan, C., Liang, S. Y., Xie, Q., & Haynes, L. (2005). An integrated approach to bearing fault diagnostics and prognostics. In Proceedings of the 2005 American Control Conference, 2005 (pp. 2750–2755).

Zhou, D. H., & Ye, Y. Z. (2000). Modern fault diagnosis and fault tolerant control. Beijing: Tsinghua University Press.
