Reliability and Novelty Detection with Multivariate Gaussian
Técnicas de Classificação e Reconhecimento de Padrões (2014/1) 1
Abstract - This paper proposes techniques to improve the reliability of novelty detection with PCA (Principal Component Analysis). Using PCA as a tool to detect novelty in aperiodic time series gave satisfactory results, as described in the first paper of this class. However, for some curves with smooth novelty, the quadratic error is very close to that of curves without novelty, which makes the threshold hard to set. This problem decreases the reliability of novelty detection, because in a real process curves with smooth novelty could be useful to predict faults in the plant's automation system. Non-periodic curves of this kind can have different shapes and still contain no novelty.
Key Words - Reliability, PCA, Threshold.
I. INTRODUCTION
Novelty detection with PCA in non-periodic time series was explored in the first paper of this class [1]. The method was considered a tool with good potential to detect novelty in systems that generate aperiodic time series signals. Although applying PCA to novelty detection in aperiodic time series gave satisfactory results, for some curves with smooth novelty the detection was impaired because the quadratic error is sometimes very close to that of curves without novelty. We conclude that the threshold definition is very important to improve the reliability of novelty detection; this issue is explored in this paper.
Paper submitted in May 2014. Emerson Maurício de Almeida Alves is a student in the course “Técnicas de Classificação e Reconhecimento de Padrões”, taught by Professor Doctor Antônio de Pádua Braga in the Programa de Pós Graduação em Engenharia Elétrica of the Universidade Federal de Minas Gerais, Belo Horizonte - MG (e-mail: [email protected]).
To apply PCA to novelty detection, the time series must be transformed from the time domain to the phase domain [2]. After the transformation, each variable is considered a component, and time defines the position of the component. We can then create a component matrix in which each row is a curve (a sample of the time series under analysis) and each column is a component, as in matrix (1). Time defines the size of the window. The window size, or number of components, for time series was the topic of the paper by Modenesi [3]. In this approach the window size is not so important, because it is determined by the process: a short time creates overshoot and a long time creates a delay in the load system.
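\[
M = \begin{bmatrix}
x_1(t) & x_1(t+1) & \cdots & x_1(t+J-1) \\
x_2(t) & x_2(t+1) & \cdots & x_2(t+J-1) \\
\vdots & \vdots & & \vdots \\
x_i(t) & x_i(t+1) & \cdots & x_i(t+J-1)
\end{bmatrix}
\tag{1}
\]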
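As an illustration of how a matrix like (1) might be assembled, the sketch below cuts windows of J samples out of a single weight signal. The function name `window_matrix`, the list of cycle start indices `starts`, and the toy data are assumptions made for this example; the paper does not say how the start of each dosing cycle is located.

```python
def window_matrix(series, starts, J):
    """Return the matrix as a list of rows; row i holds series[t_i : t_i + J]."""
    M = []
    for t in starts:
        # Each window must lie entirely inside the recorded signal.
        if t < 0 or t + J > len(series):
            raise ValueError(f"window starting at {t} does not fit in the series")
        M.append(list(series[t:t + J]))
    return M


# Toy weight signal with two dosing cycles (hypothetical values).
weights = [50, 48, 45, 41, 50, 47, 43, 40, 38, 50]
M = window_matrix(weights, starts=[0, 4], J=4)
```

Each row of `M` is one curve and each column is one component, which is the layout that the later multivariate model also relies on.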
PCA selects the principal components that explain all curves within a defined confidence interval. To detect novelty with PCA, a model must be created from a matrix that contains only curves without novelty, so that PCA selects the components that explain the curves of that matrix. These components are the basis of the model. To obtain a result, the model is used to reconstruct first the matrix without novelty curves and then a matrix that contains novelty curves.
The result of this method is obtained by calculating the quadratic error of the difference between each matrix and its reconstruction. Normally the reconstruction error of the matrix without novelty is smaller than the reconstruction error of the matrix with novelty. A threshold is defined from this result, and novelty is detected when the error is larger than the threshold. Curves with small novelty can make the threshold hard to define because of the overlap in variability: in Figure 1, curve 1 is easily detected, but curve 3 is not, because its novelty is smooth.
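In symbols, the reconstruction test described above can be sketched as follows. The notation is not from the original and is assumed here: \bar{x} is the mean of the training curves, P_k holds the k retained principal components as columns, and \tau is the threshold.

\[
\hat{x} = \bar{x} + P_k P_k^{\top} (x - \bar{x}), \qquad
e(x) = \lVert x - \hat{x} \rVert^2, \qquad
\text{novelty if } e(x) > \tau
\]

The difficulty discussed in this paper is that, for smooth novelty, e(x) falls inside the range of errors of curves without novelty, so no single \tau separates the two cleanly.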
Figure 1 – Dosing curves, different novelty degree.
Improvement of the Reliability in Novelty Detection using PCA in No Periodic Time
Series
Emerson Maurício de Almeida Alves, PPGEE, UFMG.
The analysis used real data from an industrial process, a raw material load system for a silicon furnace. The system is composed of dosing devices; the signal generated by the charcoal dosing device was analyzed. The goal of the load system is to dose material onto a conveyor belt to make a wafer with other materials in stoichiometric balance [4], as shown in Figure 2.
Figure 2 – Load system, dosing in conveyor belt [4].
The dosing process is neither continuous nor periodic. In addition, the initial weight and the amount to be dosed can change in each cycle. The system stores the weight every 0.5 seconds, and the curves of the dosing system can be obtained from these data. The inconstant dynamics of the process can be seen in the plot in Figure 3.
Figure 3 – Dosing curves of the process in 8 hour interval.
PCA was a satisfactory technique to detect novelty in these process curves. However, because the dosing device changes behavior during the process, it creates variability in the curves that generates great uncertainty when defining a threshold to detect novelty. This variability between curves is normal and is part of the dynamics of the process.
II. METHOD
A. Curve Extraction
Given the problem with the definition of the threshold, the idea was to create groups of curves with small variability. This required a method of curve extraction that would allow some metric to be used to join curves with small variability.
Curve extraction is a technique used to create a standard curve in order to reduce the error in the residues. In a paper on lightning impulse test techniques [5], curve extraction is used on waveforms that produce upward deviations from the central lines of the recorded waveforms in the wave front area and downward deviations from the recorded waveforms in the latter half of the wave tail. This may lead to an error in the isolation of a residual curve and bring about irrational results [5].
In the first paper [1], an attempt was made to form two groups of curves. The classification criterion was visual, and group 1 and group 2 were used to train PCA (Figure 4).
Figure 4 – dosing curves, visual classification.
After separation, both groups were tested with PCA. In the first step, the number of components needed to explain all curves without novelty was determined for group 1 and group 2. In the second step, two curves with novelty were inserted in both groups. After training on each group and defining the accuracy required to explain it, PCA creates the eigenvectors for the chosen number of components. The third step is to reconstruct, for both groups, the training curves and the curves with novelty; the method is the analysis of the residues.
The result of this method is the sum of the quadratic error of the residues. This point is important, because with a threshold it is possible to check whether the group of curves contains a curve with novelty.
When both groups were tested with curves with smooth novelty, defining the threshold was very difficult, because their values are very close to those of the curves without novelty.
B. Linear Regression
Since visual classification did not give good results or improve the reliability of novelty detection, a line was created as a base from which to measure the distance of each curve. This line was created by linear regression on the average of all curves without novelty.
After the base line was defined, the residues were calculated as the distance between each curve and the base line. This distance was calculated at all 40 points, or components.
In the last step, an attempt was made to join curves whose residues had small variability.
This method has the same problem of determining the threshold for joining curves whose residues have small variability. It was therefore not suitable to solve the threshold problem in PCA.
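The base line and residues described in this subsection can be sketched as below. This is a minimal illustration under our own assumptions, not the code used in the study: the helper names `fit_line` and `residues_to_baseline` are hypothetical, the regression is an ordinary least-squares line over the point index, and the toy curves have 4 points instead of 40.

```python
def fit_line(y):
    """Ordinary least-squares line y ~ a + b*k over k = 0 .. len(y)-1."""
    n = len(y)
    k_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((k - k_mean) ** 2 for k in range(n))
    sxy = sum((k - k_mean) * (v - y_mean) for k, v in enumerate(y))
    b = sxy / sxx
    a = y_mean - b * k_mean
    return a, b


def residues_to_baseline(curves):
    """Residue of every curve, point by point, against the regression line
    fitted to the average of all curves."""
    n = len(curves[0])
    mean_curve = [sum(c[k] for c in curves) / len(curves) for k in range(n)]
    a, b = fit_line(mean_curve)
    baseline = [a + b * k for k in range(n)]
    return [[c[k] - baseline[k] for k in range(n)] for c in curves]


# Two toy curves without novelty (hypothetical values).
curves = [[10.0, 8.0, 6.0, 4.0], [12.0, 10.0, 8.0, 6.0]]
res = residues_to_baseline(curves)
```

Grouping would then compare the spread of each row of `res`, which is where the same threshold problem reappears.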
C. Nonlinear Least Square
Another attempt used nonlinear least squares to find a model that generates a curve close to the curves of the industrial process. The idea was then to create groups of curves at a small distance from this model curve.
Recent research and new paradigms in mathematics, engineering, and science assume nonlinear signal models consisting of a union of subspaces instead of a single subspace. These models have been used in sampling and reconstruction of signals with finite rate of innovation, the Generalized Principal Component Analysis and the subspace segmentation problem in computer vision, and problems related to sparsity, compressed sensing, and dictionary design [6].
Several attempts were made with this method to search for the best nonlinear model, but this was very complex because of the inconstancy of the process dynamics. The same difficulty in defining a threshold to join curves with small variability was found.
With this method, the idea of creating groups to minimize the disturbance that variability causes in PCA, when testing novelty detection on curves with smooth or small novelty, was abandoned.
D. Multivariate Gaussian Model
The multivariate Gaussian model was proposed to use the variability itself as a tool to detect novelty in curves with smooth novelty. This variability disturbed the PCA model when detecting novelty in curves with smooth novelty, but when each point is considered a random variable that is correlated with the other points, the multivariate Gaussian model can be applied. This concept is detailed in the seminar presented by David in class [7].
In a paper that tries to answer the question “Why the ubiquitous use and success of the Gaussian distribution law?”, Kiseon Kim and Georgy Shevlyakov [8] highlight the role of Gaussian models in signal processing, based on the optimal property of the Gaussian distribution of minimizing Fisher information over the class of distributions with a bounded variance.
The curves without novelty have a window of 40 points, or components, and there is variability at each point. Each point is therefore considered a random variable, and there is correlation between points. This correlation arises from the dosing of the raw material: while the system is creating the load in the process, the weight in the device keeps decreasing.
The matrix used in PCA (1) is also used to create the multivariate model, where each row is a curve of the process and each column is a point, that is, a variable. The multivariate model is composed of a centroid, which is the mean vector of the 40 variables, and a covariance matrix.
Because the number of curves, or samples, is sometimes smaller than the number of points, the covariance matrix was estimated through regularized covariance, which is more robust when the number of observations is smaller than the number of points. This calculation used the R [9] package “corpcor” [10] and its function cov.shrink.
The method consists of creating a Gaussian model from the matrix of curves without novelty and then calculating, for each sample, its distance to the model centroid. This distance then guides the choice of the threshold to detect novelty.
The distance used is the Mahalanobis distance.
The Mahalanobis distance is a descriptive statistic that
provides a relative measure of a data point's distance (residual)
from a common point. It is a unitless measure introduced by P.
C. Mahalanobis in 1936 [11]. The Mahalanobis distance is
used to identify and gauge similarity of an unknown sample
set to a known one. It differs from Euclidean distance in that it
takes into account the correlations of the data set and is scale-
invariant [12].
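For reference, the usual definition of the Mahalanobis distance of a curve x from the model centroid is given below. The symbols are ours: x and \mu would be vectors of 40 points, and \Sigma the regularized 40 x 40 covariance estimate.

\[
d_M(x) = \sqrt{(x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)}
\]

When \Sigma is the identity matrix, this reduces to the Euclidean distance between x and \mu.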
The first step is a leave-one-out validation with the curves without novelty. The goal is to calculate the distance of each curve to the centroid of a model built from all the other curves. After this validation, a vector with all the distances is created. Figure 5 shows R code using this concept.
Figure 5 – Code leave-one-out in R in multivariate Gaussian model.
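The R code of Figure 5 is not reproduced in this text. As a stand-in, the following Python sketch shows one way the leave-one-out distances could be computed. It is an illustration under stated assumptions, not the author's code: all function names are hypothetical, and the covariance is shrunk toward its diagonal with a fixed weight `lam`, a simplification of what cov.shrink does (as we understand it, cov.shrink chooses its shrinkage intensity from the data). Every point is assumed to have non-zero variance across curves.

```python
import math


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def shrunk_covariance(rows, lam=0.2):
    """Sample covariance with off-diagonal terms scaled by (1 - lam).
    lam is a fixed, assumed weight; it is NOT estimated from the data."""
    n, p = len(rows), len(rows[0])
    mu = mean_vector(rows)
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return [[cov[a][b] if a == b else (1 - lam) * cov[a][b]
             for b in range(p)] for a in range(p)]


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (M[i][n] - tail) / M[i][i]
    return y


def mahalanobis(x, mu, cov):
    d = [xi - mi for xi, mi in zip(x, mu)]
    return math.sqrt(sum(di * yi for di, yi in zip(d, solve(cov, d))))


def leave_one_out_distances(rows, lam=0.2):
    """Distance of each curve to a model built from all the other curves."""
    dists = []
    for i, x in enumerate(rows):
        rest = rows[:i] + rows[i + 1:]
        dists.append(mahalanobis(x, mean_vector(rest),
                                 shrunk_covariance(rest, lam)))
    return dists


# Toy data: 5 curves with 2 points each (hypothetical values).
curves = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
loo = leave_one_out_distances(curves)
```

The list `loo` plays the role of the vector of distances mentioned above, with one entry per curve without novelty.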
The second step is to find a threshold value for a given quantile, using the distances calculated in step 1, and then to calculate the Mahalanobis distance of the curve with novelty to the centroid of the Gaussian model. The threshold value is determined from the distribution of the distances of each curve to the model centroid, considering the following confidence intervals: 85%, 95% and 100%. All calculations were made in R. Figure 6 shows the code for the calculation.
Figure 6 – Multivariate Gaussian model and threshold calculus.
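The code of Figure 6 is likewise not available in this text. The sketch below illustrates only the thresholding idea: given distances of curves without novelty, take empirical quantiles at the 85%, 95% and 100% levels and flag a curve whose distance exceeds the chosen one. The function names and the toy distances are assumptions; the quantile rule is linear interpolation between order statistics, which is the rule used by R's default quantile type.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def thresholds(distances, levels=(0.85, 0.95, 1.00)):
    """One threshold per confidence level, taken from the training distances."""
    return {q: quantile(distances, q) for q in levels}


def is_novelty(distance, threshold):
    return distance > threshold


# Hypothetical distances of curves without novelty.
train_dist = [1.0, 2.0, 3.0, 4.0, 5.0]
t = thresholds(train_dist)

# A curve at distance 4.5 passes the 85% threshold but not the 95% one.
flag_85 = is_novelty(4.5, t[0.85])
flag_95 = is_novelty(4.5, t[0.95])
```

A lower level gives a more sensitive detector at the cost of more false alarms on normal curves, which is the trade-off the choice of confidence interval controls.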
Figure 7 shows the plot of the 30 curves without novelty. These curves are used in the multivariate Gaussian model to calculate the centroid and the distance of each sample.
Figure 7 – Curves without novelty to create a multivariate model.
Curves with novelty are shown in the plot in Figure 8. The continuous line is the average of the 30 curves without novelty. The curve with circle marks has the strongest novelty; the other two curves have smooth novelty.
Figure 8 – Curves with novelty and line average.
In Figure 9, the novelty curves are plotted together with the curves without novelty. One novelty curve has a stronger spike, while in the other two the novelty is smooth. This makes the novelty difficult to detect. Another detail is the variability: in the curves with smooth novelty, the rest of the curve stays in the center of the other curves. When the rest of the curve stays in the center of all curves, the variability of the curves without novelty impairs the detection.
Figure 9 – Curves without novelty together with novelty curves.
The Mahalanobis distance of each curve to the model centroid is shown in the plot in Figure 10. The legend in the same plot gives the confidence intervals, and the lines represent the threshold calculated for each defined quantile.
Figure 10 – Mahalanobis distance and threshold defined by quantiles.
III. EXPERIMENT
The experiment with the complete group of curves was carried out using PCA and the multivariate Gaussian model.
A. Novelty detection with PCA
The PCA experiment used the R function prcomp(m[]), which applies PCA to matrix m[] so that all components and their variability can be analyzed. Applying the summary command to the result of prcomp(m[]) shows how many components explain the curves and with what confidence interval.
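Assuming that the confidence interval mentioned here corresponds to what summary reports as the cumulative proportion of variance, it can be written as below for the first k of p components, where \sigma_j is the standard deviation of component j (the sdev element of a prcomp result). The symbols C_k, k and p are ours.

\[
C_k = \frac{\sum_{j=1}^{k} \sigma_j^2}{\sum_{j=1}^{p} \sigma_j^2}
\]

Under this reading, the number of components needed is the smallest k for which C_k reaches the chosen level.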
The sensitivity of PCA in detecting novelty in a curve with smooth novelty was compared using a matrix with 30 curves without novelty and a second matrix in which one randomly chosen curve without novelty was removed and a curve with smooth novelty was inserted. PCA was applied to both matrices. Figure 11 shows the number of components that explain all curves: because of the variability, 4 components are needed to explain both matrices, and the only difference is the degree to which the first and second most important components explain them. This behavior makes it complicated to define a threshold.
Figure 11 - Novelty detection with PCA.
B. Novelty Detection with Multivariate Gaussian Model
The multivariate Gaussian model used the same matrix of curves without novelty as the PCA detection in the previous topic. In this method, the Gaussian model was created from the matrix of curves without novelty. The distance of each curve was then calculated, and the distance threshold for each confidence interval was determined from all the distances. After these steps, the distance of each curve with novelty was calculated.
Figure 12 shows a bar graph of the distance of each curve with smooth novelty. The red line marked “t” is the threshold determined by the confidence interval shown on the x axis. In this plot each bar represents one curve, and the novelty becomes smoother from left to right.
Figure 12 – Detection result with Multivariate Gaussian model.
IV. CONCLUSION
To improve the reliability of novelty detection with PCA, several techniques were tried: curve extraction, nonlinear least squares and residues of linear regression. The goal of these methods was to remove the effect of the variability that decreases the capacity of PCA to detect novelty in curves with smooth novelty. These methods did not perform well. A method that does not use PCA was then tried: the multivariate Gaussian model. This method considers each component a random variable that is correlated with the next point or component. Using the Mahalanobis distance as its metric, it gave an excellent improvement in the reliability of novelty detection.
Curve extraction tried to create groups of curves with small variability, but it increased the complexity of joining these curves, and it was also difficult to find a good method to define a threshold that could detect novelty in every group created.
Nonlinear least squares aimed to create a model curve, measure the distance between each curve and the model, and so generate residues. A threshold to detect novelty is then defined from the squared residues. The difficulty was to define an equation that approximates the curves without novelty: the equation for this model was complex because of the great variety of shapes of the curves considered to be without novelty. The method was efficient, but the need to adjust the model to specific situations of the process created a complexity that led to the method being dropped.
Linear regression was another attempt to improve the detection threshold. The idea is similar to nonlinear least squares: a regression generates a line from the average values of the curves without novelty. But this method was not able to improve novelty detection in curves with small or smooth novelty.
The multivariate Gaussian model was the method with the best results. With this method it is possible to choose the confidence interval at which novelty is detected. This is valuable for novelty detection in the process, because intermittent faults generate smooth novelty of varying intensity in the curves. The threshold can then be set to identify that a device, sensor or system is beginning to fail, mainly in the case of intermittent faults.
The conclusion is specific to the case studied. The suggestion is to set a threshold and carry out a statistical analysis of the efficiency of novelty detection in the process, so that the threshold to use for each problem can be defined on the basis of statistical results.
Several analyses were made in this paper to improve the reliability of novelty detection. We conclude that PCA gives good results in novelty detection, but not when the novelty is smooth, because the threshold is difficult to define. PCA is therefore not recommended for smooth novelty.
The multivariate Gaussian model was the method suited to detecting small or smooth novelty in nonlinear time series curves, specifically in this process. With this method it was possible not only to improve the reliability but also to set thresholds to detect different kinds of novelty. The conclusion is that this technique has good prospects for use in industrial processes to detect novelty.
V. BIBLIOGRAPHY
[1] Alves, A. Emerson. Detecção de Novidades com PCA em Séries Temporais não Periódicas. Paper 1 of the pattern recognition course. Universidade Federal de Minas Gerais. 04/2014.
[2] Pimentel, M. A. F., Clifton, D. A., Clifton, L., Tarassenko, L. A review of novelty detection. Signal Processing 99 (2014) 215–249. Elsevier.
[3] Modenesi, P. André, Braga, P. Antônio. Analysis of Time Series Novelty Detection Strategies for Synthetic and Real Data. Neural Process Lett. Springer Science+Business Media, LLC. 2009.
[4] Alves, M. A Emerson et al. “Vantagens do Desenvolvimento Interno de Sistemas de Automação e Controle: O caso da Liasa”. Congresso Brasileiro de Automática. 2012.
[5] S. Okabe et al. Basic Study of Fitting Method for Base Curve Extraction in Lightning Impulse Test Techniques. 1070-9878/10/$25.00 © 2010 IEEE.
[6] Aldroubi A., Zaringhalam Kourosh. Nonlinear Least Squares in.Acta Springer Science+Business Media B.V. 2009.
[7] Pinto David. Modelos Finitos de Mistura de Gaussiana. Seminário de Reconhecimento de Padrões. PPGEE, UFMG. 2014.
[8] Kiseon Kim and Georgy Shevlyakov. Why Gaussianity? IEEE Signal Processing Magazine [1053-5888], vol. 25, no. 2, pp. 102-113, 2008.
[9] R free software environment for statistical computing, version 3.0.2 (Frisbee Sailing). Available at: http://www.r-project.org/
[10] Juliane et al. ‘corpcor’: Efficient Estimation of Covariance and (Partial) Correlation. 2013.
[11] Mahalanobis, Prasanta Chandra (1936). "On the generalised distance in statistics". Proceedings of the National Institute of Sciences of India 2 (1): 49–55. Retrieved 2012-05-03.
[12] Mahalanobis distance. Wikipedia. http://en.wikipedia.org/wiki/Mahalanobis_distance