Fault Detection and Localization with Neural Principal Component Analysis
Khaled O.¹, Hedi D.¹, Lotfi N.¹, Messaoud H.¹ and Zineb S.-Abazi²
¹Ecole Nationale d'Ingénieurs de Monastir (ATSI), Route de Kairouan CP 5000, Monastir, Tunisie
²Laboratoire des Sciences pour la Conception, l'Optimisation et la Production de Grenoble (G-SCOP), 46 avenue Félix Viallet, 38031 Grenoble Cedex 1, France
18th Mediterranean Conference on Control & Automation, Congress Palace Hotel, Marrakech, Morocco, June 23-25, 2010
978-1-4244-8092-0/10/$26.00 ©2010 IEEE
Abstract—This paper presents a fault detection and diagnosis method based on Neural Non Linear Principal Component Analysis (NNLPCA) and Partial Least Squares (PLS). The method is applied to a manufacturing system, and the NNLPCA approach is used to estimate the non linear components. The NNLPCA model helps to estimate the prediction error and to define data classes with and without faults. The classes associated with faulty data are isolated by applying PLS-2. Faults are detected with the SPE (square prediction error) statistic and located by calculating contributions.
Keywords—Fault diagnosis, Neural Principal Component
Analysis, Partial Least Square, PLS-2, NIPALS algorithm.
I. INTRODUCTION
The aim of a cigarette manufacturing process is to control
and respect the interval constraint on the weight of the
manufactured unit.
In fact, from a quality point of view, a very heavy cigarette is difficult to pull, while a light cigarette gives consumers the impression of being cheated, presenting less garnished ends which empty easily. From a cost point of view, excess tobacco in a cigarette is considered a loss and can cause machine stops due to stuffing, as in pudding making. Each stop of the machine systematically generates a certain quantity of rejection (cigarettes being formed are discarded). Usually, a manufacturing system may include several kinds of sensors, but information processing and decision making using the data acquired by all these sensors are very difficult problems for a cigarette manufacturing process. Sensor faults (biases or drifts) can affect the behavior of the cigarette manufacturing process to the point of damaging the production.
A cigarette manufacturing process is a non linear, non stationary, multivariable complex system. For this kind of industrial process, a precise mathematical model may be difficult to obtain because of the complexity and the high dimensionality of the process.
To meet this objective, one can consider modeling approaches based on data-driven techniques such as Principal Component Analysis (PCA), linear or non linear [14]. PCA is a statistical method which exploits linear correlations while reducing the dimension of the variables.
PCA has often been used for its capacity to capture the linear relations between the variables of a system at steady state. However, this approach shows limitations when treating measurement data from industrial processes, which generally present nonlinear characteristics.
With linear PCA, we cannot capture the non linear part of the relations between the variables in the data. We therefore need nonlinear PCA methods to exploit this non linearity between the parameters for the monitoring of the system.
PCA has been particularly important and widely used for detecting and locating faults when monitoring industrial processes [1], [2]. However, most industrial processes have non linear behaviors, and when linear PCA is used, significant non linear information is lost. Non linear PCA is an extension of linear PCA: Non Linear Principal Component Analysis (NLPCA) aims at extracting both linear and non linear relations by projecting data on curves and surfaces.
The NLPCA method based on an artificial neural network generates an important non linear part during the training procedure [5].
Therefore, this paper proposes a method associating NNLPCA with PLS-2, which is composed of three parts. Firstly, a data analysis allows isolating normal and abnormal data clusters (data with and without faults). Secondly, fault visualization in the 2-D principal component space is carried out by performing PLS-2. Finally, faults are located by calculating contributions.
We begin with a brief review of the PCA method and of neural networks. Then, our approach is presented. The final part of this paper is reserved for an industrial application using real data from a cigarette manufacturing process.
II. NON LINEAR PCA BASED ON NEURAL NETWORKS
Many NLPCA methods have been proposed. We
distinguish here two methods. The first one is based on
principal curves, proposed by Hastie [3]; however, this approach cannot be used directly for diagnosis. The second method of NLPCA is based on a five-layer neural network proposed by Kramer [4], which has recently been used for fault detection. We have chosen the latter method.
The NLPCA is applied using a five-layer neural network composed of layers connected in series: the output of layer k is the input of layer k+1. The network is composed of five layers [7]: an input layer and an output layer, both with m neurons; a first hidden layer for coding and a third one for decoding, both based on a non linear transfer function (sigmoid); and a second hidden layer, called the bottleneck layer.
In the first hidden layer, the transfer function is the sigmoid function defined as follows:

\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (1)
Fig. 1. Optimal sigmoid five-layer neural network for extracting the non linear component t.
The sigmoid neural network contains three hidden layers between the input and output variables. The first hidden layer uses the non linear sigmoid transfer function \sigma [8]. The function h_j^k realizes a projection of the input variables towards the first hidden layer (coding layer), and is expressed by:

h_j^k = \sigma\!\left( \sum_{i=1}^{m} w_{ij}^k x_i + b_j^k \right) \qquad (2)
w_{ij}^k and b_j^k represent respectively the weights and biases, which are adjustable parameters optimized using a conjugate gradient algorithm.
Whereas, the function \varphi(x) = x is the identity transfer function of the second hidden layer (bottleneck layer):

t_j = F(x) = \varphi\!\left( \sum_{i=1}^{r} w_{ij}^k h_i^k + b_j^k \right) = \sum_{i=1}^{r} w_{ij}^k h_i^k + b_j^k \qquad (3)
Then, we have a function h_j^t which projects the outputs t_j towards the last hidden layer (decoding layer):

h_j^t = \sigma\!\left( w_j^t t_j + b_j^t \right) \qquad (4)
A final transfer function, the identity function \varphi(x) = x, projects h_j^t towards the output \hat{x} of dimension m:

\hat{x}_i = H(x) = \sum_{j=1}^{r} w_{ij}^t h_j^t + b_i^t \qquad (5)
In our application, we have used the multiple PCA (MPCA) to determine the number of neurons in the bottleneck layer. The weights and biases are optimized using the gradient backpropagation algorithm to minimize the error between the network input x_i and the output \hat{x}_i:

E = \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2 \qquad (6)
Non linear principal component analysis is used to estimate the output of the neural network and to calculate the non linear principal components.
The choice of the number of neurons in the bottleneck layer is based on reducing the data dimension step by step from m to s. At each step, the fraction of information lost to the residual space can be kept small; this is the principal idea of multiple PCA (MPCA). For more details, the reader can consult [9], [10]. We are thus interested in determining the number of components to retain in the model, and we use the multiple PCA tool for that purpose.
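As an illustration of this section, the coding-bottleneck-decoding structure of equations (2)-(5), trained by minimizing the error (6), can be sketched in plain NumPy. This is a minimal sketch under stated assumptions: plain batch gradient descent replaces the conjugate gradient optimizer mentioned above, and the layer sizes and the synthetic data are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class NNLPCA:
    """Five-layer autoassociative network (after Kramer [4]): m -> r -> s -> r -> m,
    sigmoid on the coding/decoding layers, identity on bottleneck and output."""
    def __init__(self, m, r, s, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (m, r)); self.b1 = np.zeros(r)
        self.W2 = rng.normal(0, 0.5, (r, s)); self.b2 = np.zeros(s)
        self.W3 = rng.normal(0, 0.5, (s, r)); self.b3 = np.zeros(r)
        self.W4 = rng.normal(0, 0.5, (r, m)); self.b4 = np.zeros(m)
        self.lr = lr

    def forward(self, X):
        h1 = sigmoid(X @ self.W1 + self.b1)   # coding layer, eq. (2)
        t  = h1 @ self.W2 + self.b2           # bottleneck scores, eq. (3)
        h3 = sigmoid(t @ self.W3 + self.b3)   # decoding layer, eq. (4)
        Xh = h3 @ self.W4 + self.b4           # reconstruction, eq. (5)
        return h1, t, h3, Xh

    def fit(self, X, epochs=3000):
        n = len(X)
        for _ in range(epochs):
            h1, t, h3, Xh = self.forward(X)
            # gradients of E = sum (x - x_hat)^2, eq. (6), averaged over n
            d4 = 2.0 * (Xh - X) / n
            d3 = (d4 @ self.W4.T) * h3 * (1 - h3)
            d2 = d3 @ self.W3.T
            d1 = (d2 @ self.W2.T) * h1 * (1 - h1)
            self.W4 -= self.lr * h3.T @ d4; self.b4 -= self.lr * d4.sum(0)
            self.W3 -= self.lr * t.T  @ d3; self.b3 -= self.lr * d3.sum(0)
            self.W2 -= self.lr * h1.T @ d2; self.b2 -= self.lr * d2.sum(0)
            self.W1 -= self.lr * X.T  @ d1; self.b1 -= self.lr * d1.sum(0)
        return self

    def spe(self, X):
        Xh = self.forward(X)[3]
        return ((X - Xh) ** 2).sum(axis=1)    # SPE per observation, eq. (7)

# Illustrative use on synthetic data lying near a one-dimensional curve
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 200)
X = np.column_stack([u, u ** 2, np.sin(np.pi * u)])
X = (X - X.mean(0)) / X.std(0)                # centred and reduced data
net = NNLPCA(m=3, r=4, s=1).fit(X)
spe = net.spe(X)
```

After training, the mean SPE should drop well below the variance of the standardized data, which is what the SPE statistic of Section III exploits for detection.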
III. NNLPCA-BASED MONITORING APPROACH
A. Detecting Faults
Fault detection can be applied after training the five-layer neural network with a sigmoid transfer function. Using NNLPCA, faults can be detected with the quadratic error SPE(k) (square prediction error), also known as the Q statistic:

SPE(k) = e(k)^T e(k) = \sum_{i=1}^{n} \left( x_i(k) - \hat{x}_i(k) \right)^2 \qquad (7)

The process is considered faulty at the instant k if:

SPE(k) > \delta_\alpha^2 \qquad (8)

where \delta_\alpha is the confidence threshold of the SPE.
With

\theta_i = \sum_{j=l+1}^{m} \lambda_j^i, \qquad i = 1, 2,

where the \lambda_j are the eigenvalues of the covariance matrix \Sigma, the threshold is computed as
\delta_\alpha^2 = g \, \chi_{h,\alpha}^2 \quad \text{with} \quad g = \frac{\theta_2}{\theta_1} \quad \text{and} \quad h = \frac{\theta_1^2}{\theta_2} \qquad (9)
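As a worked sketch, the threshold of equation (9) can be computed from the residual eigenvalues of the covariance matrix. The code below assumes SciPy's chi-square quantile function (which accepts fractional degrees of freedom h); the eigenvalues used in the example are the ones reported later in Section IV.

```python
import numpy as np
from scipy import stats

def spe_threshold(eigvals, l, alpha=0.95):
    """SPE confidence limit delta_alpha^2 of eq. (9):
    theta_i = sum of lambda_j^i over the m - l residual eigenvalues (i = 1, 2),
    g = theta_2 / theta_1, h = theta_1^2 / theta_2, delta^2 = g * chi2_{h,alpha}."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1][l:]  # residual part
    theta1 = lam.sum()
    theta2 = (lam ** 2).sum()
    g = theta2 / theta1
    h = theta1 ** 2 / theta2
    return g * stats.chi2.ppf(alpha, h)   # chi-square quantile, fractional df

# Eigenvalues of the covariance matrix of the six process variables (Section IV),
# with l = 2 retained components
eig = [2.0824, 1.5967, 0.9885, 0.6371, 0.4551, 0.2401]
delta2 = spe_threshold(eig, l=2, alpha=0.95)
```

An observation whose SPE exceeds `delta2` is declared faulty, as in equation (8).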
In this paper, we propose in Fig. 2 a new fault detection method based on three steps: data pre-analysis, fault visualization, and fault diagnosis. From the SPE and a detection threshold, it is possible to detect a fault: comparing the SPE with the threshold separates the data into normal and abnormal classes. The first step of the approach visualizes the number of classes in the data using the NIPALS algorithm of PLS-2; this NIPALS (or PLS-class) algorithm has been performed with the PLS-2 implementation of the PLS-Toolbox [12]. The data are then partitioned with k-means clustering to isolate the different classes. In the next step, PLS-2 is applied to obtain a clear fault visualization in the 2-D PCA space. Finally, faults are diagnosed by contribution plots for the faulty class.
Fig. 2. The proposed method: data pre-analysis, class visualization, and fault diagnosis.
B. Fault Visualization
Partial Least Squares Discriminant Analysis (PLS-DA) is a dimension reduction technique which maximizes the covariance between the predictor (independent) matrix X and the predicted (dependent) matrix Y of the different variables. When the matrix Y is selected for the PLS-DA, p is the number of fault classes. Each fault class contains respectively n_1, n_2, ..., n_p observations for each variable in the classes c_1, c_2, ..., c_p. The p classes are stored in the data matrix X \in \mathbb{R}^{n \times m}, and two methods, PLS1 and PLS2, can be used to predict the Y model. The PLS-DA is defined as the PLS-2 regression of X. The predicted block Y \in \mathbb{R}^{n \times p} in PLS-2 is defined as:
Y = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
where each column of Y corresponds to a class. We can then consider PLS-DA as an approach modeling a set of binary variables from the explanatory variables X. Each element of Y takes the value 1 or 0; the first n_1 elements of column 1 of Y are set to 1, which indicates that the corresponding rows of the matrix belong to fault class 1.
The matrix X is decomposed into the score matrix T \in \mathbb{R}^{n \times a} and the loading matrix P \in \mathbb{R}^{m \times a} plus the residual matrix E \in \mathbb{R}^{n \times m}, where a is the number of PLS-DA components:

X = T P^T + E \qquad (10)

In PLS-2, the matrix Y is decomposed into the score matrix U \in \mathbb{R}^{n \times a} and the loading matrix W \in \mathbb{R}^{p \times a} plus the residual matrix F^* \in \mathbb{R}^{n \times p}:

Y = U W^T + F^* \qquad (11)

The estimated matrix Y relates to the matrix X through the principal components of the matrix T:

Y = T B W^T + F \qquad (12)

where F is the prediction error matrix. The matrix B is determined by maximizing the singular value decomposition (SVD) of F [13].
The NIPALS algorithm is used to extract the predicted matrix Y. We have the following expression:

y_i = T_i B_i q_i + f_i \qquad (13)

where y_i \in \mathbb{R}^n is the i-th column of Y, T_i \in \mathbb{R}^{n \times a} is the score matrix, B_i \in \mathbb{R}^{a \times a} is the regression matrix, q_i \in \mathbb{R}^a is the eigenvector, and f_i \in \mathbb{R}^n is the prediction error vector. The estimated eigenvalues and eigenvectors project all the data, with or without faults, into the 2-D PLS-2 space. The PLS-2 discrimination helps us separate between classes with and without faults.
In this work, k-means clustering is used to isolate the different classes of data. K-means is a partitioning method which divides the samples of the data set into mutually exclusive clusters [14]. Unlike hierarchical clustering methods, k-means does not create a tree structure to describe the groupings in the data set, but rather creates a single level of clusters. Compared to hierarchical methods, k-means is more effective for clustering large amounts of data.
C. Locating through Contribution Calculation
The calculation of contributions is an approach used for locating faults: the variable with the highest contribution is considered at fault. In the case of the SPE, the contribution cont_j^{SPE}(k) of the j-th variable at the instant k is defined by the following equation:

cont_j^{SPE}(k) = \left( x_j(k) - \hat{x}_j(k) \right)^2 \qquad (14)
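Equation (14) reduces to a per-variable squared residual. A minimal sketch follows; the sample and its reconstruction are hypothetical values, with a bias placed on the third variable:

```python
import numpy as np

def spe_contributions(x, x_hat):
    """Contribution of each variable to the SPE at instant k, eq. (14)."""
    return (np.asarray(x, float) - np.asarray(x_hat, float)) ** 2

# Hypothetical measured sample and its NNLPCA reconstruction (6 process variables)
x     = np.array([0.1, -0.2, 2.5, 0.0, 0.3, -0.1])
x_hat = np.array([0.1, -0.1, 0.4, 0.1, 0.2, -0.2])

cont = spe_contributions(x, x_hat)
faulty = int(np.argmax(cont))   # index of the variable with the highest contribution
```

The variable with the largest contribution (here the third one) is declared faulty, exactly as done for the PR variable in Section IV.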
IV. EXPERIMENTS
The diagnosis method presented previously has been validated in simulation on a real system: a cigarette manufacturing process. The results presented below are organized in four parts. In the first section, we describe the process. In the second section, we determine the number of non linear principal components which are then used, in the third section, in the bottleneck layer for the neural network training. We present in the fourth section the results obtained for fault detection, visualization and localization.
A. Process description
The system of transforming and producing cigarettes is
described in a textual and functional manner [11]. The
problem of respecting the interval constraint on the weight
of the manufactured unit is then posed.
The weight of a cigarette is a function of many significant
parameters of quality: the modulus ( )E , the dampness rate
( )DR , the compactness ( )c and the pulling resistance ( )PR
(Fig.3).
For a normal functioning of the process, the value of each parameter should lie in a given interval:
- W: the weight of the cigarette, with W \in [W_{min}, W_{max}], expressed in g.
- E: the modulus of the cigarette, with E \in [E_{min}, E_{max}], expressed in g/m^3.
- DR: the dampness rate of the cigarette, with DR \in [DR_{min}, DR_{max}].
- c: the compactness of tobacco, with c \in [c_{min}, c_{max}].
- PR: the pulling resistance, with PR \in [PR_{min}, PR_{max}].
- d: the density of tobacco, with d \in [d_{min}, d_{max}].
Thus, we seek to identify and locate the failing sources of parameter drifts, to prevent the negative consequences which would affect all these factors.
Fig. 3. Tobacco manufacturing system.
B. Determining the Number of non Linear Principal
Components
For the cigarette manufacturing process, we have at our disposal a database of measurements carried out over three months.
The measurements of the process variables are collected in a matrix X \in \mathbb{R}^{N \times m}, where m is the number of variables: DR, d, PR, E, W and c are respectively the dampness rate of tobacco, the density, the pulling resistance, the modulus, the weight and the compactness of the cigarette. N is the number of observations for each variable. All the data are centred and reduced, so the new data matrix is standardized.
The data covariance matrix is:

\Sigma = \begin{bmatrix}
 1.0000 & -0.4190 & -0.0425 &  0.3933 &  0.2901 &  0.4704 \\
-0.4190 &  1.0000 & -0.1318 & -0.5096 & -0.1007 & -0.2639 \\
-0.0425 & -0.1318 &  1.0000 & -0.1157 & -0.3397 &  0.2340 \\
 0.3933 & -0.5096 & -0.1157 &  1.0000 &  0.2058 & -0.1242 \\
 0.2901 & -0.1007 & -0.3397 &  0.2058 &  1.0000 & -0.1769 \\
 0.4704 & -0.2639 &  0.2340 & -0.1242 & -0.1769 &  1.0000
\end{bmatrix}
The matrices of eigenvectors and eigenvalues are given by:

P = \begin{bmatrix}
-0.5593 &  0.4746 & -0.0043 &  0.3534 &  0.0777 &  0.5753 \\
 0.2528 &  0.7097 &  0.0060 &  0.3530 & -0.1344 & -0.5383 \\
 0.0287 &  0.2518 &  0.6800 & -0.3627 &  0.5839 & -0.0308 \\
 0.4854 &  0.4048 & -0.2143 & -0.5122 & -0.2416 &  0.4836 \\
 0.2599 & -0.1610 &  0.6802 &  0.3256 & -0.5195 &  0.2606 \\
 0.5651 & -0.1342 & -0.1703 &  0.5005 &  0.5539 &  0.2765
\end{bmatrix}

\lambda = \mathrm{diag}(0.2401,\; 0.4551,\; 0.6371,\; 0.9885,\; 1.5967,\; 2.0824)
The reduction of the data dimension from m to s is determined by applying MPCA. The more stages are applied, and the more time is consumed, the more satisfactory the result; thus, the number of stages must be chosen.
In the first stage, we calculate the matrix of eigenvalues \lambda' and the matrix of eigenvectors P', which are identical to those of PCA. We have chosen P = [p_1, p_2, p_3, p_4, p_5] as the first loading of principal components, and we calculate the new output matrix X'. Then we calculate the matrices \lambda'' and P''. At the end of the MPCA algorithm, the matrix P'''' is:
P'''' = \begin{bmatrix}
-0.4108 & -0.1730 \\
 0.0108 & -0.1391 \\
-0.0744 &  0.9707 \\
 0.6144 & -0.0661 \\
-0.6694 & -0.0646
\end{bmatrix}
Our input matrix of process variables is represented in
Fig. 4.
Fig. 4. The original input data
The output matrix after applying the MPCA algorithm is
illustrated in Fig. 5.
Fig. 5. The data matrix after the 6-5-4-3-2 MPCA reduction.
Then, the number of non linear principal components to retain in the model is \ell = 2.
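The stepwise 6-5-4-3-2 reduction described above can be sketched as repeated single-dimension drops. This is a sketch of the MPCA idea only, under the assumption that at each stage the smallest-eigenvalue direction is discarded; it is not the exact algorithm of [9], [10]:

```python
import numpy as np

def mpca_reduce(X, target=2):
    """Stepwise MPCA sketch: repeatedly drop the direction carrying the smallest
    eigenvalue of the covariance matrix until `target` dimensions remain
    (here 6 -> 5 -> 4 -> 3 -> 2)."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # centred and reduced data
    while X.shape[1] > target:
        cov = np.cov(X, rowvar=False)
        _, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
        X = X @ vecs[:, 1:]                    # discard the smallest direction
    return X

# Illustrative run on random data with six variables
rng = np.random.default_rng(0)
Z = mpca_reduce(rng.normal(size=(100, 6)))
```

The number of columns of the final output matrix gives the number of non linear principal components to keep in the bottleneck layer.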
C. Neural Network Training
For the network training, we have used the gradient backpropagation algorithm. After optimizing the number of neurons in the first and third hidden layers (four neurons each), we have obtained good training results for the five-layer neural network. The selection of training parameters for a multi-layer neural network is not easy: at some point, the training must be redone many times to get a good result at the output. We train the five-layer neural network until it captures the variance of the input X in the estimated output X̂. We then present in Fig. 6 the error between the input X and the output X̂.
Fig. 6. Error between the input X and the estimated output X̂
D. Fault detection
Fig. 7 represents the results of the simulation of a defect
affecting the pulling resistance variable PR from the instant
k .
Fig. 7. Detecting a defect affecting the pulling resistance variable PR
using neural network-based NLPCA.
The evolution of the SPE shows the existence of three regions, which are presented in Fig. 7.
E. Fault Visualization
The temporal information at the beginning and the end of each region lets us project each region separately into the 2-D PCA space. Although the data are high-dimensional, it is possible to project the fault classes into a low-dimensional space using dimension reduction techniques such as PCA. The within-class and between-class scatter matrices are calculated similarly from the PCA-based score plot. The k-means clustering method is used in conjunction with a PCA-based SPE chart to classify the historical data into normal and abnormal operating regions.
Fig. 8. K-means classified clusters in the PCA score space.
Fig. 8 presents region 2 in the space of non linear principal components, where the class estimation has shown that region 2 is isolated. Isolating this region, associated with the prediction error, proves that it is at fault.
F. Localization by Calculating Contributions
The result of applying the contribution analysis, where the fault affecting the pulling resistance variable PR appears, is illustrated in Fig. 9. We have calculated the partial contributions of the faulty class 2.
Fig. 9. Contributions of the different variables in class 2.
The pulling resistance variable PR, having a higher contribution than the other variables, is considered the faulty variable.
V. CONCLUSION
In this paper, we have been interested in the sensor fault detection and diagnosis method proposed in [12].
Linear PCA is not adapted to the study of non linear systems, so we have proposed an approach based on NNLPCA, in which PCA is associated with a neural network. Our contribution is to extend this method to the processing of data which present a non linear behavior.
It is to be noted that the presented data are real production data of an existing workshop. The proposed methodology is therefore validated on a large set of data, and it shows an interesting industrial efficiency for the considered case study. However, we intend to improve the quality of the detection by improving the modeling quality; to do this, we propose to develop a model-based approach [6]. The results obtained make it possible to validate the method applied for detecting and localizing the considered sensor faults.
REFERENCES
[1] L. Erickson, J. Ohansson, N. Kettaneh, and S. Wold, "Multi- and Megavariate Data Analysis: Principles and Applications," Umetrics Academy, 1991.
[2] M. F. Haraket, "Détection et localisation de défauts par analyse en composantes principales," Doctorat de l'Institut National Polytechnique de Lorraine, 2005.
[3] T. Hastie and W. Stuetzle, "Principal curves," Journal of the American Statistical Association, vol. 84, no. 406, pp. 502-516, 1989.
[4] M. A. Kramer, "Nonlinear principal component analysis using autoassociative neural networks," AIChE Journal, vol. 37, no. 2, pp. 233-243, 1991.
[5] D. W. Shifei, X. Jia, X. Xu and L. Zhang, "PCA-based Elman neural network algorithm," Springer-Verlag Berlin Heidelberg, pp. 315-321, 2008.
[6] J. Zhang, E. B. Martin and A. J. Morris, "Process monitoring using non-linear statistical techniques," Chemical Engineering Journal, pp. 181-189, 1997.
[7] C. Zang and M. Imregun, "Structural damage detection using artificial neural networks and measured FRF data reduced via principal component projection," Journal of Sound and Vibration, vol. 242, no. 5, pp. 813-827, 2001.
[8] Z. Haslinda and T. D. T. Thao, "A principal component approach in diagnosing poor control loop performance," Proceedings of the World Congress on Engineering and Computer Science (WCECS 2007), San Francisco, October 24-26, 2007.
[9] K. Yeung and W. Ruzzo, "Principal component analysis for clustering gene expression data," Bioinformatics, vol. 17, no. 9, pp. 763-774, 2001.
[10] H. Geao, W. Hong, J. Cui and X. Yonghong, "Optimization of principal component analysis in feature extraction," Proceedings of the 2007 IEEE International Conference on Mechatronics and Automation, Harbin, China, August 5-8, 2007.
[11] H. Dhouibi, "Utilisation des réseaux de Petri à intervalles pour la régulation d'une qualité : application à une manufacture de tabac," Thèse de Doctorat, Ecole Centrale de Lille, France, 2005.
[12] L. H. Chiang, E. L. Russell and R. D. Braatz, Fault Detection and Diagnosis in Industrial Systems, 2nd ed., Springer, 2001, pp. 71-84.
[13] J. M. Amigo, C. Raven, N. B. Gallagher and R. Bro, "A comparison of a common approach to partial least squares-discriminant analysis and classical least squares in hyperspectral imaging," International Journal of Pharmaceutics, vol. 373, pp. 179-182, 2009.
[14] Q. P. He, S. J. Qin and J. Wang, "A new fault diagnosis method using fault directions in Fisher discriminant analysis," AIChE Journal, vol. 51, no. 2, pp. 555-571, 2005.