Fault Detection and Localization with Neural Principal Component Analysis
Khaled O.¹, Hedi D.¹, Lotfi N.¹, Messaoud H.¹ and Zineb S.-Abazi²
¹Ecole Nationale d'Ingénieurs de Monastir (ATSI), Route de Kairouan CP 5000, Monastir, Tunisie
²Laboratoire des Sciences pour la Conception, l'Optimisation et la Production de Grenoble (G-SCOP), 46 avenue Félix Viallet, 38031 Grenoble Cedex 1, France
18th Mediterranean Conference on Control & Automation, Congress Palace Hotel, Marrakech, Morocco, June 23-25, 2010
978-1-4244-8092-0/10/$26.00 ©2010 IEEE
Abstract—This paper presents a fault detection and diagnosis method based on Neural Non Linear Principal Component Analysis (NNLPCA) and Partial Least Squares (PLS). The method is applied to a manufacturing system, and the NNLPCA approach is used to estimate the non linear components. The NNLPCA model helps to estimate the prediction error and to define data classes with and without faults. The classes associated with faulty data are isolated by applying PLS-2. Faults are detected with the SPE (square prediction error) statistic and located by calculating contributions.
Keywords—Fault diagnosis, Neural Principal Component
Analysis, Partial Least Square, PLS-2, NIPALS algorithm.
I. INTRODUCTION
The aim of a cigarette manufacturing process is to control
and respect the interval constraint on the weight of the
manufactured unit.
In fact, from a quality point of view, a very heavy cigarette is difficult to pull, while a light cigarette gives consumers the impression of being cheated, presenting less garnished ends which empty easily. From a cost point of view, excess tobacco in a cigarette is considered a loss and can cause machine stops due to stuffing, as in pudding making. Each stop of the machine systematically generates a certain quantity of rejection (cigarettes being formed are discarded). Usually, a manufacturing system may include several kinds of sensors, but information processing and decision making using the data acquired by all these sensors are very difficult problems for a cigarette manufacturing process. Sensor faults (biases or drifts) can affect the behavior of the cigarette manufacturing process to the point of damaging the production.
A cigarette manufacturing process is a non linear, non stationary, multivariable complex system. For this kind of industrial process, a precise mathematical model may be difficult to obtain because of the complexity and the high dimensionality of the process.
To meet this objective, one can consider modeling approaches based on data-driven techniques such as Principal Component Analysis (PCA), linear or non linear [14]. PCA is a statistical method which exploits linear correlations while reducing the dimension of the variables.
PCA has often been used for its capacity to capture the linear relations between the variables of a system at steady state. However, this approach shows limitations when treating measurement data from industrial processes, which generally present nonlinear characteristics.
With linear PCA, we cannot capture the non linear part of the relations between the variables in the data. We therefore need nonlinear PCA methods to exploit this non linearity between the parameters for the monitoring of the system.
PCA has been particularly important and widely used for detecting and locating faults when monitoring industrial processes [1], [2]. However, most industrial processes have non linear behaviors, and when linear PCA is used, significant non linear information is lost. Non linear PCA is an extension of linear PCA: Non Linear Principal Component Analysis (NLPCA) aims at extracting both linear and non linear relations by projecting data on curves and surfaces.
The NLPCA method based on an artificial neural network generates an important non linear part during the training procedure [5].
Therefore, this paper proposes a method associating NNLPCA with PLS-2, which is composed of three parts. Firstly, a data analysis allows isolating normal and abnormal data clusters (data with and without faults). Secondly, fault visualization in the 2-D principal component space is carried out by performing PLS-2. Finally, faults are located by calculating contributions.
We begin with a brief review of the PCA method and of neural networks. Then, our approach is presented. The final part of this paper is reserved for an industrial application using real data from a cigarette manufacturing process.
II. NON LINEAR PCA BASED ON NEURAL NETWORKS
Many NLPCA methods have been proposed. We
distinguish here two methods. The first one is based on
principal curves, proposed by Hastie [3]; however, this approach cannot be used directly for diagnosis. The second method of NLPCA is based on a five-layer neural network proposed by Kramer [4], which has recently been used for fault detection. We have chosen the latter method.
The NLPCA is applied using a five-layer neural network composed of layers connected in series: the output of layer k is the input of layer k+1. The network is composed of five layers [7]: an input layer and an output layer, both with m neurons; a first hidden layer for coding and a third one for decoding, both based on a non linear transfer function (sigmoid); and a second hidden layer, called the bottleneck layer.
In the first hidden layer, the transfer function is the sigmoid function defined as follows:

\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (1)
Fig. 1. Optimal sigmoid five-layer neural network for extracting the non linear component t.
The sigmoid neural network contains three hidden layers between the input and output variables. The first hidden layer uses the non linear sigmoid transfer function \sigma [8]. The function h_j^k realizes a projection of the input variables towards the first hidden layer (coding layer), and is expressed by:

h_j^k = \sigma\!\left( \sum_{i=1}^{m} w_{ij}^k x_i + b_j^k \right) \qquad (2)
w_{ij}^k and b_j^k represent respectively the weights and biases, which are adjustable parameters optimized using a conjugate gradient algorithm.
Whereas, the function \varphi(x) = x is the identity transfer function of the second hidden layer (bottleneck layer):

t_j = F(x) = \varphi\!\left( \sum_{i=1}^{r} w_{ij}^k h_i^k + b_j^k \right) = \sum_{i=1}^{r} w_{ij}^k h_i^k + b_j^k \qquad (3)
Then, we have a function h_j^t which projects the outputs t_j towards the last hidden layer (decoding layer):

h_j^t = \sigma\!\left( w_j^t t_j + b_j^t \right) \qquad (4)
A final transfer function, the identity function \varphi(x) = x, projects h_j^t towards the output \hat{x} of dimension m:

\hat{x}_i = H(x) = \sum_{j=1}^{r} w_{ij}^t h_j^t + b_i^t \qquad (5)
In our application, we have used the multiple PCA (MPCA) to determine the number of neurons in the bottleneck layer. The weights and biases are optimized using the gradient backpropagation algorithm to minimize the error between the network input x_i and the output \hat{x}_i:

E = \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2 \qquad (6)
Non linear principal component analysis is used to estimate the output of the neural network and to calculate the non linear principal components.
The choice of the number of neurons in the bottleneck layer is based on reducing the data dimension step by step from m to s. At each step, the fraction of information lost to the residual space can be kept small; this is the principal idea of multiple PCA (MPCA). For more details, the reader can consult [9], [10]. We are thus interested in determining the number of components to retain in the model, and we use the multiple PCA tool for that purpose.
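As an illustration of this section, the coding-bottleneck-decoding structure of equations (2)-(5), trained by minimizing the error (6), can be sketched in plain NumPy. This is a minimal sketch under stated assumptions: plain batch gradient descent replaces the conjugate gradient optimizer mentioned above, and the layer sizes and the synthetic data are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class NNLPCA:
    """Five-layer autoassociative network (after Kramer [4]): m -> r -> s -> r -> m,
    sigmoid on the coding/decoding layers, identity on bottleneck and output."""
    def __init__(self, m, r, s, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (m, r)); self.b1 = np.zeros(r)
        self.W2 = rng.normal(0, 0.5, (r, s)); self.b2 = np.zeros(s)
        self.W3 = rng.normal(0, 0.5, (s, r)); self.b3 = np.zeros(r)
        self.W4 = rng.normal(0, 0.5, (r, m)); self.b4 = np.zeros(m)
        self.lr = lr

    def forward(self, X):
        h1 = sigmoid(X @ self.W1 + self.b1)   # coding layer, eq. (2)
        t  = h1 @ self.W2 + self.b2           # bottleneck scores, eq. (3)
        h3 = sigmoid(t @ self.W3 + self.b3)   # decoding layer, eq. (4)
        Xh = h3 @ self.W4 + self.b4           # reconstruction, eq. (5)
        return h1, t, h3, Xh

    def fit(self, X, epochs=3000):
        n = len(X)
        for _ in range(epochs):
            h1, t, h3, Xh = self.forward(X)
            # gradients of E = sum (x - x_hat)^2, eq. (6), averaged over n
            d4 = 2.0 * (Xh - X) / n
            d3 = (d4 @ self.W4.T) * h3 * (1 - h3)
            d2 = d3 @ self.W3.T
            d1 = (d2 @ self.W2.T) * h1 * (1 - h1)
            self.W4 -= self.lr * h3.T @ d4; self.b4 -= self.lr * d4.sum(0)
            self.W3 -= self.lr * t.T  @ d3; self.b3 -= self.lr * d3.sum(0)
            self.W2 -= self.lr * h1.T @ d2; self.b2 -= self.lr * d2.sum(0)
            self.W1 -= self.lr * X.T  @ d1; self.b1 -= self.lr * d1.sum(0)
        return self

    def spe(self, X):
        Xh = self.forward(X)[3]
        return ((X - Xh) ** 2).sum(axis=1)    # SPE per observation, eq. (7)

# Illustrative use on synthetic data lying near a one-dimensional curve
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 200)
X = np.column_stack([u, u ** 2, np.sin(np.pi * u)])
X = (X - X.mean(0)) / X.std(0)                # centred and reduced data
net = NNLPCA(m=3, r=4, s=1).fit(X)
spe = net.spe(X)
```

After training, the mean SPE should drop well below the variance of the standardized data, which is what the SPE statistic of Section III exploits for detection.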
III. NNLPCA-BASED MONITORING APPROACH
A. Detecting Faults
Fault detection can be applied after training the five-layer neural network with a sigmoid transfer function. Using NNLPCA, faults can be detected with the quadratic error SPE(k) (square prediction error), also known as the Q statistic:

SPE(k) = e(k)^T e(k) = \sum_{i=1}^{n} \left( x_i(k) - \hat{x}_i(k) \right)^2 \qquad (7)

The process is considered faulty at the instant k if:

SPE(k) > \delta_\alpha^2 \qquad (8)

where \delta_\alpha is the confidence threshold of the SPE.
With

\theta_i = \sum_{j=l+1}^{m} \lambda_j^i, \qquad i = 1, 2,

where the \lambda_j are the eigenvalues of the covariance matrix \Sigma, the threshold is computed as
\delta_\alpha^2 = g \, \chi_{h,\alpha}^2 \quad \text{with} \quad g = \frac{\theta_2}{\theta_1} \quad \text{and} \quad h = \frac{\theta_1^2}{\theta_2} \qquad (9)
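As a worked sketch, the threshold of equation (9) can be computed from the residual eigenvalues of the covariance matrix. The code below assumes SciPy's chi-square quantile function (which accepts fractional degrees of freedom h); the eigenvalues used in the example are the ones reported later in Section IV.

```python
import numpy as np
from scipy import stats

def spe_threshold(eigvals, l, alpha=0.95):
    """SPE confidence limit delta_alpha^2 of eq. (9):
    theta_i = sum of lambda_j^i over the m - l residual eigenvalues (i = 1, 2),
    g = theta_2 / theta_1, h = theta_1^2 / theta_2, delta^2 = g * chi2_{h,alpha}."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1][l:]  # residual part
    theta1 = lam.sum()
    theta2 = (lam ** 2).sum()
    g = theta2 / theta1
    h = theta1 ** 2 / theta2
    return g * stats.chi2.ppf(alpha, h)   # chi-square quantile, fractional df

# Eigenvalues of the covariance matrix of the six process variables (Section IV),
# with l = 2 retained components
eig = [2.0824, 1.5967, 0.9885, 0.6371, 0.4551, 0.2401]
delta2 = spe_threshold(eig, l=2, alpha=0.95)
```

An observation whose SPE exceeds `delta2` is declared faulty, as in equation (8).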
In this paper, we propose in Fig. 2 a new fault detection method based on three steps: data pre-analysis, fault visualization, and fault diagnosis. From the SPE and a detection threshold, it is possible to detect a fault: comparing the SPE with the threshold separates the data into normal and abnormal classes. The first step of the approach visualizes the number of classes in the data using the NIPALS algorithm of PLS-2; this NIPALS (or PLS-class) algorithm has been performed with the PLS-2 implementation of the PLS-Toolbox [12]. The data are then partitioned with k-means clustering to isolate the different classes. In the next step, PLS-2 is applied to obtain a clear fault visualization in the 2-D PCA space. Finally, faults are diagnosed by contribution plots for the faulty class.
Fig. 2. The proposed method: data pre-analysis, class visualization, and fault diagnosis.
B. Fault Visualization
Partial Least Squares Discriminant Analysis (PLS-DA) is a dimension reduction technique which maximizes the covariance between the predictor (independent) matrix X and the predicted (dependent) matrix Y of the different variables. When the matrix Y is selected for the PLS-DA, p is the number of fault classes. Each fault class contains respectively n_1, n_2, ..., n_p observations for each variable in the classes c_1, c_2, ..., c_p. The p classes are stored in the data matrix X \in \mathbb{R}^{n \times m}, and two methods, PLS1 and PLS2, can be used to predict the Y model. The PLS-DA is defined as the PLS-2 regression of X. The predicted block Y \in \mathbb{R}^{n \times p} in PLS-2 is defined as:
Y = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
where each column of Y corresponds to a class. We can then consider PLS-DA as an approach modeling a set of binary variables from the explanatory variables X. Each element of Y takes the value 1 or 0; the first n_1 elements of column 1 of Y are set to 1, which indicates that the corresponding rows of the matrix belong to fault class 1.
The matrix X is decomposed into the score matrix T \in \mathbb{R}^{n \times a} and the loading matrix P \in \mathbb{R}^{m \times a} plus the residual matrix E \in \mathbb{R}^{n \times m}, where a is the number of PLS-DA components:

X = T P^T + E \qquad (10)

In PLS-2, the matrix Y is decomposed into the score matrix U \in \mathbb{R}^{n \times a} and the loading matrix W \in \mathbb{R}^{p \times a} plus the residual matrix F^* \in \mathbb{R}^{n \times p}:

Y = U W^T + F^* \qquad (11)

The estimated matrix Y relates to the matrix X through the principal components of the matrix T:

Y = T B W^T + F \qquad (12)

where F is the prediction error matrix. The matrix B is determined by maximizing the singular value decomposition (SVD) of F [13].
The NIPALS algorithm is used to extract the predicted matrix Y. We have the following expression:

y_i = T_i B_i q_i + f_i \qquad (13)

where y_i \in \mathbb{R}^n is the i-th column of Y, T_i \in \mathbb{R}^{n \times a} is the score matrix, B_i \in \mathbb{R}^{a \times a} is the regression matrix, q_i \in \mathbb{R}^a is the eigenvector, and f_i \in \mathbb{R}^n is the prediction error vector. The estimated eigenvalues and eigenvectors project all the data, with or without faults, into the 2-D PLS-2 space. The PLS-2 discrimination helps us separate between classes with and without faults.
In this work, k-means clustering is used to isolate the different classes of data. K-means is a partitioning method which divides the samples of the data set into mutually exclusive clusters [14]. Unlike hierarchical clustering methods, k-means does not create a tree structure to describe the groupings in the data set, but rather creates a single level of clusters. Compared to hierarchical methods, k-means is more effective for clustering large amounts of data.
C. Locating through Contribution Calculation
The calculation of contributions is an approach used for locating faults: the variable with the highest contribution is considered at fault. In the case of the SPE, the contribution cont_j^{SPE}(k) of the j-th variable at the instant k is defined by the following equation:

cont_j^{SPE}(k) = \left( x_j(k) - \hat{x}_j(k) \right)^2 \qquad (14)
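Equation (14) reduces to a per-variable squared residual. A minimal sketch follows; the sample and its reconstruction are hypothetical values, with a bias placed on the third variable:

```python
import numpy as np

def spe_contributions(x, x_hat):
    """Contribution of each variable to the SPE at instant k, eq. (14)."""
    return (np.asarray(x, float) - np.asarray(x_hat, float)) ** 2

# Hypothetical measured sample and its NNLPCA reconstruction (6 process variables)
x     = np.array([0.1, -0.2, 2.5, 0.0, 0.3, -0.1])
x_hat = np.array([0.1, -0.1, 0.4, 0.1, 0.2, -0.2])

cont = spe_contributions(x, x_hat)
faulty = int(np.argmax(cont))   # index of the variable with the highest contribution
```

The variable with the largest contribution (here the third one) is declared faulty, exactly as done for the PR variable in Section IV.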
IV. EXPERIMENTS
The diagnosis method presented previously has been validated in simulation on a real system: a cigarette manufacturing process. The results presented below are organized in four parts. In the first section, we describe the process. In the second section, we determine the number of non linear principal components which are then used, in the third section, in the bottleneck layer for the neural network training. We present in the fourth section the results obtained for fault detection, visualization and localization.
A. Process description
The system of transforming and producing cigarettes is
described in a textual and functional manner [11]. The
problem of respecting the interval constraint on the weight
of the manufactured unit is then posed.
The weight of a cigarette is a function of many significant
parameters of quality: the modulus ( )E , the dampness rate
( )DR , the compactness ( )c and the pulling resistance ( )PR
(Fig.3).
For a normal functioning of the process, the value of each parameter should lie in a given interval:
- W: the weight of the cigarette, with W \in [W_{min}, W_{max}], expressed in g.
- E: the modulus of the cigarette, with E \in [E_{min}, E_{max}], expressed in g/m^3.
- DR: the dampness rate of the cigarette, with DR \in [DR_{min}, DR_{max}].
- c: the compactness of tobacco, with c \in [c_{min}, c_{max}].
- PR: the pulling resistance, with PR \in [PR_{min}, PR_{max}].
- d: the density of tobacco, with d \in [d_{min}, d_{max}].
Thus, we seek to identify and locate the failing sources of parameter drifts, to prevent the negative consequences which would affect all these factors.
Fig. 3. Tobacco manufacturing system.
B. Determining the Number of non Linear Principal
Components
For the cigarette manufacturing process, we have at our disposal a database of measurements carried out over three months.
The measurements of the process variables are collected in a matrix X \in \mathbb{R}^{N \times m}, where m is the number of variables: DR, d, PR, E, W and c are respectively the dampness rate of tobacco, the density, the pulling resistance, the modulus, the weight and the compactness of the cigarette. N is the number of observations for each variable. All the data are centred and reduced, so the new data matrix is standardized.
The data covariance matrix is:

\Sigma = \begin{bmatrix}
 1.0000 & -0.4190 & -0.0425 &  0.3933 &  0.2901 &  0.4704 \\
-0.4190 &  1.0000 & -0.1318 & -0.5096 & -0.1007 & -0.2639 \\
-0.0425 & -0.1318 &  1.0000 & -0.1157 & -0.3397 &  0.2340 \\
 0.3933 & -0.5096 & -0.1157 &  1.0000 &  0.2058 & -0.1242 \\
 0.2901 & -0.1007 & -0.3397 &  0.2058 &  1.0000 & -0.1769 \\
 0.4704 & -0.2639 &  0.2340 & -0.1242 & -0.1769 &  1.0000
\end{bmatrix}
The matrices of eigenvectors and eigenvalues are given by:

P = \begin{bmatrix}
-0.5593 &  0.4746 & -0.0043 &  0.3534 &  0.0777 &  0.5753 \\
 0.2528 &  0.7097 &  0.0060 &  0.3530 & -0.1344 & -0.5383 \\
 0.0287 &  0.2518 &  0.6800 & -0.3627 &  0.5839 & -0.0308 \\
 0.4854 &  0.4048 & -0.2143 & -0.5122 & -0.2416 &  0.4836 \\
 0.2599 & -0.1610 &  0.6802 &  0.3256 & -0.5195 &  0.2606 \\
 0.5651 & -0.1342 & -0.1703 &  0.5005 &  0.5539 &  0.2765
\end{bmatrix}

\lambda = \mathrm{diag}(0.2401,\; 0.4551,\; 0.6371,\; 0.9885,\; 1.5967,\; 2.0824)
The reduction of the data dimension from m to s is determined by applying MPCA. The more stages are applied, and the more time is consumed, the more satisfactory the result; thus, the number of stages must be chosen.
In the first stage, we calculate the matrix of eigenvalues \lambda' and the matrix of eigenvectors P', which are identical to those of PCA. We have chosen P = [p_1, p_2, p_3, p_4, p_5] as the first loading of principal components, and we calculate the new output matrix X'. Then we calculate the matrices \lambda'' and P''. At the end of the MPCA algorithm, the matrix P'''' is:
P'''' = \begin{bmatrix}
-0.4108 & -0.1730 \\
 0.0108 & -0.1391 \\
-0.0744 &  0.9707 \\
 0.6144 & -0.0661 \\
-0.6694 & -0.0646
\end{bmatrix}
Our input matrix of process variables is represented in
Fig. 4.
Fig. 4. The original input data
The output matrix after applying the MPCA algorithm is
illustrated in Fig. 5.
Fig. 5. The data matrix after the 6-5-4-3-2 MPCA reduction.
Then, the number of non linear principal components to retain in the model is \ell = 2.
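The stepwise 6-5-4-3-2 reduction described above can be sketched as repeated single-dimension drops. This is a sketch of the MPCA idea only, under the assumption that at each stage the smallest-eigenvalue direction is discarded; it is not the exact algorithm of [9], [10]:

```python
import numpy as np

def mpca_reduce(X, target=2):
    """Stepwise MPCA sketch: repeatedly drop the direction carrying the smallest
    eigenvalue of the covariance matrix until `target` dimensions remain
    (here 6 -> 5 -> 4 -> 3 -> 2)."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # centred and reduced data
    while X.shape[1] > target:
        cov = np.cov(X, rowvar=False)
        _, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
        X = X @ vecs[:, 1:]                    # discard the smallest direction
    return X

# Illustrative run on random data with six variables
rng = np.random.default_rng(0)
Z = mpca_reduce(rng.normal(size=(100, 6)))
```

The number of columns of the final output matrix gives the number of non linear principal components to keep in the bottleneck layer.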
C. Neural Network Training
For the network training, we have used the gradient backpropagation algorithm. After optimizing the number of neurons in the first and third hidden layers (four neurons each), we have obtained good training results for the five-layer neural network. The selection of training parameters for a multi-layer neural network is not easy: at some point, the training must be redone many times to get a good result at the output. We train the five-layer neural network until it captures the variance of the input X in the estimated output X̂. We then present in Fig. 6 the error between the input X and the output X̂.
Fig. 6. Error between the input X and the estimated output X̂
D. Fault detection
Fig. 7 represents the results of the simulation of a defect
affecting the pulling resistance variable PR from the instant
k .
Fig. 7. Detecting a defect affecting the pulling resistance variable PR
using neural network-based NLPCA.
The evolution of the SPE shows the existence of three regions, which are presented in Fig. 7.
E. Fault Visualization
The temporal information at the beginning and the end of each region lets us project each region separately into the 2-D PCA space. Although the data are high-dimensional, it is possible to project the fault classes into a low-dimensional space using dimension reduction techniques such as PCA. The within-class and between-class scatter matrices are calculated similarly from the PCA-based score plot. The k-means clustering method is used in conjunction with a PCA-based SPE chart to classify the historical data into normal and abnormal operating regions.
Fig. 8. K-means classified clusters in the PCA score space.
Fig. 8 presents region 2 in the space of non linear principal components, where the class estimation has shown that region 2 is isolated. Isolating this region, associated with the prediction error, proves that it is at fault.
F. Localization by Calculating Contributions
The result of applying the contribution analysis, where the fault affecting the pulling resistance variable PR appears, is illustrated in Fig. 9. We have calculated the partial contributions of the faulty class 2.
Fig. 9. Contributions of the different variables in class 2.
The pulling resistance variable PR, having a higher contribution than the other variables, is considered the faulty variable.
V. CONCLUSION
In this paper, we have been interested in the sensor fault detection and diagnosis method proposed in [12].
Linear PCA is not adapted to the study of non linear systems, so we have proposed an approach based on NNLPCA, in which PCA is associated with a neural network. Our contribution is to extend this method to the processing of data which present a non linear behavior.
It is to be noted that the presented data are real production data of an existing workshop. The proposed methodology is therefore validated on a large set of data, and it shows an interesting industrial efficiency for the considered case study. However, we intend to improve the quality of the detection by improving the modeling quality; to do this, we propose to develop a model-based approach [6]. The results obtained make it possible to validate the method applied for detecting and localizing the considered sensor faults.
REFERENCES
[1] L. Erickson, J. Ohansson, N. Kettaneh, and S. Wold, "Multi- and Megavariate Data Analysis: Principles and Applications," Umetrics Academy, 1991.
[2] M. F. Haraket, "Détection et localisation de défauts par analyse en composantes principales," Doctorat de l'Institut National Polytechnique de Lorraine, 2005.
[3] T. Hastie and W. Stuetzle, "Principal curves," Journal of the American Statistical Association, vol. 84, no. 406, pp. 502-516, 1989.
[4] M. A. Kramer, "Nonlinear principal component analysis using autoassociative neural networks," AIChE Journal, vol. 37, no. 2, pp. 233-243, 1991.
[5] D. W. Shifei, X. Jia, X. Xu and L. Zhang, "PCA-based Elman neural network algorithm," Springer-Verlag Berlin Heidelberg, pp. 315-321, 2008.
[6] J. Zhang, E. B. Martin and A. J. Morris, "Process monitoring using non-linear statistical techniques," Chemical Engineering Journal, pp. 181-189, 1997.
[7] C. Zang and M. Imregun, "Structural damage detection using artificial neural networks and measured FRF data reduced via principal component projection," Journal of Sound and Vibration, vol. 242, no. 5, pp. 813-827, 2001.
[8] Z. Haslinda and T. D. T. Thao, "A principal component approach in diagnosing poor control loop performance," Proceedings of the World Congress on Engineering and Computer Science (WCECS 2007), San Francisco, October 24-26, 2007.
[9] K. Yeung and W. Ruzzo, "Principal component analysis for clustering gene expression data," Bioinformatics, vol. 17, no. 9, pp. 763-774, 2001.
[10] H. Geao, W. Hong, J. Cui and X. Yonghong, "Optimization of principal component analysis in feature extraction," Proceedings of the 2007 IEEE International Conference on Mechatronics and Automation, Harbin, China, August 5-8, 2007.
[11] H. Dhouibi, "Utilisation des réseaux de Petri à intervalles pour la régulation d'une qualité : application à une manufacture de tabac," Thèse de Doctorat, Ecole Centrale de Lille, France, 2005.
[12] L. H. Chiang, E. L. Russell and R. D. Braatz, Fault Detection and Diagnosis in Industrial Systems, 2nd ed., Springer, 2001, pp. 71-84.
[13] J. M. Amigo, C. Raven, N. B. Gallagher and R. Bro, "A comparison of a common approach to partial least squares-discriminant analysis and classical least squares in hyperspectral imaging," International Journal of Pharmaceutics, vol. 373, pp. 179-182, 2009.
[14] Q. P. He, S. J. Qin and J. Wang, "A new fault diagnosis method using fault directions in Fisher discriminant analysis," AIChE Journal, vol. 51, no. 2, pp. 555-571, 2005.