Improvement of the Reliability in Novelty Detection using PCA in Non-Periodic Time Series

Técnicas de Classificação e Reconhecimento de Padrões (2014/1)

Abstract - This paper proposes techniques to improve the reliability of novelty detection with PCA (Principal Component Analysis). Using PCA as a tool to detect novelty in aperiodic time series gave satisfactory results, as described in the first paper of this class. However, for some curves with smooth novelty, the quadratic error is very close to that of curves without novelty, which makes the threshold hard to set. This decreases the reliability of novelty detection, because in a real process curves with smooth novelty could be useful for predicting faults in the automation system of the plant. Non-periodic curves of this kind can have different shapes and still contain no novelty.

Key Words - Reliability, PCA, Threshold.

I. INTRODUCTION

Novelty detection with PCA in non-periodic time series was explored in the first paper of this class [1]. The method was considered a tool with good potential to detect novelty in systems that generate aperiodic time series signals. Although applying PCA to novelty detection in aperiodic time series gave satisfactory results, detection was impaired for some curves with smooth novelty, because their quadratic error is sometimes very close to that of curves without novelty. We can therefore conclude that the definition of the threshold is very important for improving the reliability of novelty detection; this issue is explored in this paper.

To apply PCA to novelty detection, it is necessary to transform the time series from the time domain to the phase domain [2]. After the transformation, each variable is considered a component, and time defines the position of the component. We can then build a component matrix in which each column is a component and each row is one curve, that is, one sample of the time series to be analyzed, as in matrix (1). Time defines the size of the window. The size of the window, or number of components, for a time series was a topic in the paper by Modenesi [3]. In this approach the window size is not so important, because it is determined by the process: a short time creates overshoot and a long time creates a delay in the load system.

Paper submitted in May 2014. Emerson Maurício de Almeida Alves is a student in the course “Técnicas de Classificação e Reconhecimento de Padrões”, taught by Professor Doctor Antônio de Pádua Braga, in the Programa de Pós Graduação em Engenharia Elétrica of the Universidade Federal de Minas Gerais, Belo Horizonte - MG (e-mail: [email protected]).

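$$
M = \begin{bmatrix}
x_1(t) & x_1(t+1) & \cdots & x_1(t+J-1) \\
x_2(t) & x_2(t+1) & \cdots & x_2(t+J-1) \\
\vdots & \vdots & \ddots & \vdots \\
x_i(t) & x_i(t+1) & \cdots & x_i(t+J-1)
\end{bmatrix} \qquad (1)
$$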
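As an illustration of how matrix (1) can be assembled, the following minimal Python sketch stacks one window of J consecutive samples per curve. The function name `window_matrix`, the start index `t` and the sample values are hypothetical and are not taken from the paper.

```python
def window_matrix(curves, J, t=0):
    """Build the matrix of equation (1): one row per curve, J samples per row."""
    rows = []
    for x in curves:
        if len(x) < t + J:
            raise ValueError("curve is shorter than the requested window")
        rows.append(list(x[t:t + J]))  # x(t), x(t+1), ..., x(t+J-1)
    return rows

# Hypothetical weight readings, one list per dosing cycle.
curves = [
    [50.0, 48.1, 46.0, 44.2, 42.1],
    [52.0, 50.2, 48.0, 45.9, 44.0],
    [49.5, 47.4, 45.6, 43.5, 41.7],
]
M = window_matrix(curves, J=4)
for row in M:
    print(row)
```

Each row of `M` is one curve and each column is one component, so the number of columns equals the window size J.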

PCA selects the principal components that explain all curves within a defined confidence interval. To detect novelty with PCA, it is necessary to create a model from a matrix that contains only curves without novelty, so that PCA selects the components that explain the curves of that matrix. These components are the basis of the model. To obtain a result, the model is used to reconstruct first the matrix of curves without novelty and then the matrix that contains curves with novelty.

The result of this method is the quadratic error of the difference between each original matrix and its reconstruction. Normally, the reconstruction error for the matrix without novelty is smaller than the reconstruction error for the matrix with novelty. A threshold is defined from this result, and novelty is detected when the error is larger than the threshold. Curves with small novelty can make the threshold hard to define, because their errors overlap with the normal variability. In Figure 1, curve 1 is easily detected, but curve 3 is not, because its novelty is smooth.

Figure 1 – Dosing curves with different degrees of novelty.
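The reconstruction-error procedure described above can be sketched as follows. This is a minimal Python/NumPy illustration, not the paper's R implementation: the synthetic curves, the 95% explained-variance level and the choice of the largest training error as threshold are all assumptions made for the example.

```python
import numpy as np

def fit_pca(X, var_explained=0.95):
    """Fit PCA on the rows of X; keep the fewest components reaching var_explained."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(ratio, var_explained) + 1)
    return mean, Vt[:k]

def squared_error(X, mean, components):
    """Sum of squared reconstruction residuals for each row of X."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
# Synthetic stand-in for novelty-free curves: decreasing weight plus noise.
normal = np.array([100 - (60 + rng.normal(0, 2)) * t + rng.normal(0, 0.3, 40)
                   for _ in range(30)])
novel = 100 - 60 * t
novel[18:22] += 8.0  # injected spike, playing the role of a strong novelty

mean, comps = fit_pca(normal, 0.95)
threshold = squared_error(normal, mean, comps).max()  # assumed threshold rule
novel_err = squared_error(novel[None, :], mean, comps)[0]
print(novel_err > threshold)
```

With a strong spike the error clearly exceeds the threshold; the difficulty discussed in this paper arises when the injected deviation is small enough for its error to fall inside the range of the training errors.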


Emerson Maurício de Almeida Alves, PPGEE, UFMG.


The analysis used real data from an industrial process, a raw material load system for a silicon furnace. The system is composed of dosing devices, and the signal generated by the charcoal dosing device was analyzed. The goal of the load system is to dose material onto a conveyor belt to form a wafer with the other materials in stoichiometric balance [4], as shown in Figure 2.

Figure 2 – Load system, dosing in conveyor belt [4].

The dosing process is neither continuous nor periodic; in addition, the initial weight and the amount to be dosed can change in each cycle. The system stores the weight value every 0.5 seconds, and the dosing curves are obtained from these data. The irregular dynamics of the process can be seen in the graph in Figure 3.

Figure 3 – Dosing curves of the process in an 8-hour interval.

PCA was a satisfactory technique for detecting novelty in these process curves. However, because the dosing device changes its behavior during the process, it creates variability in the curves that brings great uncertainty to the definition of a threshold for detecting novelty. This variability between curves is normal and is part of the dynamics of the process.

II. METHOD

A. Curve Extraction

Given the problem of defining the threshold, the concept was to try to create groups of curves with small variability. The idea was to find a curve extraction method that would make it possible to use some metric to join curves with small variability.

Curve extraction is a technique used to create a standard curve in order to reduce the error in the residues. In a paper about lightning impulse test techniques [5], curve extraction is applied to waveforms that produce upward deviations from the central lines of the recorded waveforms in the wave front area and downward deviations from the recorded waveforms in the latter half of the wave tail. This may lead to an error in the isolation of a residual curve and bring about irrational results [5].

In the first paper [1], an attempt was made to form two groups of curves. The classification criterion was visual, and group 1 and group 2 were used to train PCA (Figure 4).

Figure 4 – Dosing curves, visual classification.

After the separation, both groups were tested with PCA. In the first step, the number of components sufficient to explain all curves without novelty was determined for group 1 and group 2. In the second step, two curves with novelty were inserted into both groups. Given the training groups and the accuracy required to explain them, PCA creates the eigenvectors for the chosen number of components. The third step is to reconstruct, for both groups, the training curves and the curves with novelty; the method is the analysis of the residues.

The result of this method is the sum of the quadratic errors of the residues. This point is important, because with a threshold on this value it is possible to check whether the group of curves contains a curve with novelty.

In the tests of both groups with curves with smooth novelty, defining the threshold was very difficult, because their values are very close to the values of the curves without novelty.

B. Linear Regression

Since visual classification did not give good results or improve the reliability of novelty detection, an attempt was made to create a line to serve as a base for measuring the distance to each curve. This line was created by linear regression on the average of all curves without novelty.

After defining the base line, the idea was to calculate the residues, given by the distance between each curve and the base line. This distance was calculated at all 40 points, or components.

In the last step, an attempt was made to join curves whose residues had small variability.

This method has the same problem of determining a threshold for joining curves whose residues have small variability. It was therefore not suitable for solving the threshold problem in PCA.
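The base line and residues described in this subsection can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names and the four-point curves are hypothetical, whereas the paper used 40 points per curve.

```python
def fit_line(y):
    """Ordinary least-squares line through the points (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def residuals_to_baseline(curves):
    """Residual of every curve against the line fitted to the point-wise average."""
    n_points = len(curves[0])
    average = [sum(c[i] for c in curves) / len(curves) for i in range(n_points)]
    slope, intercept = fit_line(average)
    baseline = [intercept + slope * i for i in range(n_points)]
    return [[c[i] - baseline[i] for i in range(n_points)] for c in curves]

# Hypothetical novelty-free curves (4 points each to keep the example small).
curves = [
    [10.0, 8.0, 6.0, 4.0],
    [12.0, 10.0, 8.0, 6.0],
]
res = residuals_to_baseline(curves)
print(res)
```

Grouping curves by the variability of these residues still requires a cut-off value, which is the same threshold problem noted above.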

C. Nonlinear Least Square

Another attempt used nonlinear least squares to find a model that generates a curve close to the curves of the industrial process. The idea was then to create groups of curves at small distances from this model curve.

Recent research and new paradigms in mathematics, engineering, and science assume nonlinear signal models consisting of a union of subspaces instead of a single subspace. These models have been used in sampling and reconstruction of signals with finite rate of innovation, the Generalized Principal Component Analysis and the subspace segmentation problem in computer vision, and problems related to sparsity, compressed sensing, and dictionary design [6].

Several attempts were made with this method to search for the best nonlinear model, but this proved very complex because of the irregular dynamics of the process. The same difficulty in defining a threshold for joining curves with small variability was found.

With this method, the idea of creating groups to minimize the disturbance that variability causes in PCA, when testing novelty detection on curves with smooth or small novelty, was abandoned.

D. Multivariate Gaussian Model

A multivariate Gaussian model was proposed to turn the variability problem into a tool for detecting novelty in curves with smooth novelty. This variability disturbs the PCA model when detecting novelty in such curves, but if each point is considered a random variable that is correlated with the other points, a multivariate Gaussian model can be applied. This concept is detailed in the seminar presented by David in class [7].

In a paper that tries to answer the question “Why the ubiquitous use and success of the Gaussian distribution law?”, Kiseon Kim and Georgy Shevlyakov [8] highlight the role of Gaussian models in signal processing, based on the optimal property of the Gaussian distribution of minimizing Fisher information over the class of distributions with a bounded variance.

The curves without novelty have a window of 40 points, or components, and there is variability at each point. Each point is therefore considered a random variable, and there is correlation between points. This correlation comes from the dosing of the raw material: while the system is creating the load, the weight in the device keeps decreasing.

The matrix used in PCA (1) is also used to create the multivariate model: each row is a curve of the process and each column is a point, that is, a variable. The multivariate model is composed of a centroid, which is the mean vector of the 40 variables, and a covariance matrix.

Because the number of curves, or samples, is sometimes smaller than the number of points, the covariance matrix was estimated through regularized covariance, which is more robust when the number of observations is smaller than the number of points. This calculation used the R [9] package “corpcor” [10] and its function cov.shrink.
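To show why regularization matters when there are fewer curves than points, the sketch below shrinks a sample covariance toward its diagonal in Python/NumPy. It is a simplified stand-in, not a reimplementation of cov.shrink: the shrinkage intensity `lam` is fixed by hand here (an assumption), whereas the R package estimates it from the data, and the random data are purely illustrative.

```python
import numpy as np

def shrink_covariance(X, lam=0.2):
    """Shrink the sample covariance of X toward its diagonal.

    lam is a fixed, hand-picked intensity (an assumption for this sketch).
    """
    S = np.cov(X, rowvar=False)       # sample covariance; columns are variables
    target = np.diag(np.diag(S))      # keep the variances, drop the covariances
    return (1.0 - lam) * S + lam * target

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 40))         # fewer samples (10) than points (40)
S = np.cov(X, rowvar=False)
S_shrunk = shrink_covariance(X, lam=0.2)
print(np.linalg.matrix_rank(S), np.linalg.matrix_rank(S_shrunk))
```

The plain sample covariance is rank-deficient and cannot be inverted, while the shrunk estimate has full rank, which is what the Mahalanobis distance needs.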

The method consists of creating a Gaussian model from the matrix of curves without novelty and then calculating, for each sample, its distance to the model centroid. The distances then guide the choice of the threshold for detecting novelty.

The distance is calculated with the Mahalanobis distance. The Mahalanobis distance is a descriptive statistic that provides a relative measure of a data point's distance (residual) from a common point. It is a unitless measure introduced by P. C. Mahalanobis in 1936 [11]. The Mahalanobis distance is used to identify and gauge similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant [12].
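The difference from the Euclidean distance can be seen in a small Python/NumPy sketch. The function name and the two-variable covariance matrix are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def mahalanobis(x, centroid, cov):
    """Mahalanobis distance of x from centroid under covariance cov."""
    diff = np.asarray(x, dtype=float) - centroid
    # Solve cov @ z = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

centroid = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],   # first variable has variance 4
                [0.0, 1.0]])  # second variable has variance 1
a = mahalanobis([2.0, 0.0], centroid, cov)
b = mahalanobis([0.0, 2.0], centroid, cov)
print(a, b)
```

Both points lie at Euclidean distance 2 from the centroid, but the point along the high-variance axis gets Mahalanobis distance 1 and the point along the low-variance axis gets 2, because each deviation is measured relative to the spread of its variable.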

The first step is a leave-one-out validation with the curves without novelty. The goal is to calculate the distance of each curve to the centroid of a model built from the other curves. After this validation, a vector with all distances is created. Figure 5 shows R code using this concept.


Figure 5 – Leave-one-out code in R for the multivariate Gaussian model.
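The R code of Figure 5 is not reproduced in this text. As a rough Python/NumPy sketch of the leave-one-out step, assuming a fixed diagonal shrinkage (`lam`) in place of cov.shrink and synthetic curves in place of the process data:

```python
import numpy as np

def loo_distances(X, lam=0.2):
    """Leave-one-out Mahalanobis distance of each row of X.

    For every curve, the centroid and covariance are estimated from the
    remaining curves only. lam is a fixed shrinkage intensity toward the
    diagonal (an assumption of this sketch).
    """
    n = X.shape[0]
    distances = np.empty(n)
    for i in range(n):
        rest = np.delete(X, i, axis=0)             # all curves except curve i
        centroid = rest.mean(axis=0)
        S = np.cov(rest, rowvar=False)
        S = (1.0 - lam) * S + lam * np.diag(np.diag(S))
        diff = X[i] - centroid
        distances[i] = np.sqrt(diff @ np.linalg.solve(S, diff))
    return distances

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 40)
# Synthetic stand-in for 30 novelty-free curves of 40 points each.
X = np.array([100 - (60 + rng.normal(0, 2)) * t + rng.normal(0, 0.3, 40)
              for _ in range(30)])
d = loo_distances(X)
print(d.round(2))
```

Leaving each curve out of its own model keeps the distances from being optimistically small, so the resulting vector is a fairer basis for choosing a threshold.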

The second step is to find a threshold value for a given quantile, using the distances calculated in step 1, and then to calculate the Mahalanobis distance of the curve with novelty to the centroid of the Gaussian model. The threshold value is determined from the distribution of the distances between each curve and the model centroid, considering the following confidence intervals: 85%, 95% and 100%. All calculations were made in R. Figure 6 shows the code for these calculations.

Figure 6 – Multivariate Gaussian model and threshold calculation.
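The quantile-based threshold of the second step can be illustrated with plain Python. The distances below are invented for the example, and the quantile is computed by linear interpolation between order statistics, which is assumed here to be an acceptable stand-in for the calculation in Figure 6.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

# Hypothetical leave-one-out distances of novelty-free curves.
loo = [4.1, 5.0, 5.3, 5.8, 6.0, 6.2, 6.9, 7.4, 8.0, 9.5, 12.0]
thresholds = {q: quantile(loo, q) for q in (0.85, 0.95, 1.00)}

candidate = 10.2  # hypothetical distance of a curve under test
flags = {q: candidate > thr for q, thr in thresholds.items()}
print(thresholds)
print(flags)
```

In this example the candidate curve is flagged as novelty at the 85% level but not at 95% or 100%, which shows how the chosen confidence level trades sensitivity to smooth novelty against false alarms.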

Figure 7 shows the graph of the 30 curves without novelty; these curves are used in the multivariate Gaussian model to calculate the centroid and the distance of each sample.

Figure 7 – Curves without novelty to create a multivariate model.

Curves with novelty are shown in the graph in Figure 8. The continuous line is the average of the 30 curves without novelty. The curve with circle markers has the strongest novelty; the other two curves have smooth novelty.

Figure 8 – Curves with novelty and line average.

In Figure 9, the novelty curves are plotted together with the curves without novelty. One novelty curve has a stronger spike, while in the other two the novelty is smooth. This makes novelty hard to detect. Another detail is the variability: apart from the novelty itself, the curves with smooth novelty stay in the middle of the other curves. When the rest of the curve stays in the middle of all curves, the variability of the curves without novelty impairs detection.


Figure 9 – Curves without novelty together with novelty curves.

The Mahalanobis distance of each curve to the centroid of the model is shown in the graph in Figure 10. The legend of the graph gives the confidence intervals, and the lines represent the threshold calculated for each defined quantile.

Figure 10 – Mahalanobis distance and threshold defined by quantiles.

III. EXPERIMENT

The experiment with the complete group of curves was carried out using PCA and the multivariate Gaussian model.

A. Novelty detection with PCA

The PCA experiment used the R function prcomp(m[]), which applies PCA to the matrix m[] so that all components and their variability can be analyzed. Applying the summary command to the result of prcomp(m[]) shows how many components explain the curves and with what confidence interval.
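The cumulative proportion of variance that this summary reports can be approximated in Python/NumPy as below. The sketch assumes centered, unscaled data, which is the default behavior of prcomp; the function names and the tiny matrix are hypothetical.

```python
import numpy as np

def cumulative_variance(X):
    """Cumulative proportion of variance explained by successive components."""
    centered = X - X.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    var = s**2 / (X.shape[0] - 1)   # variances of the principal components
    return np.cumsum(var) / np.sum(var)

def components_needed(X, level):
    """Smallest number of components whose cumulative proportion reaches level."""
    return int(np.searchsorted(cumulative_variance(X), level) + 1)

# Tiny illustrative matrix: every row is a multiple of the same pattern,
# so a single component explains essentially all of the variance.
X = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, 2.0])
print(cumulative_variance(X))
print(components_needed(X, 0.95))
```

Running the same two functions on the matrix with and without an inserted novelty curve gives the kind of comparison discussed in this subsection.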

The sensitivity of PCA in detecting novelty in a curve with smooth novelty was assessed with a matrix of 30 curves without novelty; afterwards, one curve without novelty was removed at random and a curve with smooth novelty was inserted. PCA was applied to both matrices. Figure 11 shows the number of components that explain all curves: because of the variability, 4 components are needed to explain both matrices, and the only difference is the degree to which the first and second most important components explain the curves. This behavior makes it complicated to define a threshold.

Figure 11 - Novelty detection with PCA.

B. Novelty Detection with Multivariate Gaussian Model

The multivariate Gaussian model used the same matrix of curves without novelty as the PCA detection in the previous topic. The Gaussian model was created from this matrix. The distance of each curve was then calculated, and from all distances the distance threshold was determined for each confidence interval. After these steps, the distance of each curve with novelty was calculated.

Figure 12 shows a bar graph of the distance of each curve with smooth novelty. The red line marked “t” is the threshold determined by the confidence interval shown on the x axis. Each bar represents one curve, and the novelty becomes smoother from left to right.

Figure 12 – Detection result with Multivariate Gaussian model.


IV. CONCLUSION

To improve the reliability of novelty detection with PCA, several techniques were tried: curve extraction, nonlinear least squares, and residues of linear regression. The goal of these methods was to remove the effect of the variability that decreases the capacity of PCA to detect novelty in curves with smooth novelty. These methods did not perform well. A method that does not use PCA was then tried: the multivariate Gaussian model. This method considers each component a random variable that is correlated with the next point, or component. Using the Mahalanobis distance as its metric, it gave an excellent improvement in the reliability of novelty detection.

Curve extraction tried to create groups of curves with small variability, but it increased the complexity of joining these curves, and it was also difficult to find a good method for defining a threshold that could detect novelty in every group created.

The goal of nonlinear least squares was to create a model curve, measure the distance between each curve and the model, and so generate residues. A threshold to detect novelty would then be defined from the squared residues. The difficulty was to define an equation that approximates the curves without novelty: such an equation was complex because of the many different shapes of curves that are considered to be without novelty. The method was efficient, but the need to adjust the model to specific situations of the process created a complexity that led to the method being dropped.

Linear regression was another method tried to improve the detection threshold. The idea is similar to nonlinear least squares: a regression generates a line from the average values of the curves without novelty. However, this method was not able to improve the detection of novelty in curves with small or smooth novelty.

The multivariate Gaussian model was the method with the best results. With this method it is possible to choose the confidence interval at which novelty is detected. This is useful for novelty detection in the process, because there are intermittent faults that generate smooth novelty of different intensities in the curves. The threshold can then be set to identify that a device, sensor or system is beginning to fail, mainly in the case of intermittent faults.

The conclusion is specific to the case studied. The suggestion is to set a threshold and carry out a statistical analysis of the efficiency of novelty detection in the process, so that the threshold to use for each problem can be defined on the basis of statistical results.

In this paper, several analyses were made to improve the reliability of novelty detection. The conclusion is that PCA gives good results in novelty detection, but it is not so good when the novelty is smooth, because the threshold is difficult to define. PCA is therefore not recommended for smooth novelty.

The multivariate Gaussian model was a suitable method for detecting small or smooth novelty in nonlinear time series curves, specifically in this process. With this method it was possible not only to improve the reliability but also to set a threshold for detecting different kinds of novelty. The conclusion is that this technique has good prospects for use in industrial processes to detect novelty.

V. BIBLIOGRAPHY

[1] Alves, A. Emerson. Detecção de Novidades com PCA em Séries Temporais não Periódicas. Artigo 1, da disciplina de reconhecimento de padrões. Universidade Federal de Minas Gerais. 04/2014.

[2] Pimentel, M. A. F.; Clifton, D. A.; Clifton, L.; Tarassenko, L. A review of novelty detection. Signal Processing 99 (2014) 215–249. Elsevier.

[3] Modenesi, P. André. Braga P Antônio. Analysis of Time Series Novelty Detection Strategies for Synthetic and Real Data. Neural Process Lett. Springer Science+Business Media, LLC. 2009.

[4] Alves, M. A Emerson et al. “Vantagens do Desenvolvimento Interno de Sistemas de Automação e Controle: O caso da Liasa ”. Congresso Brasileiro de Automática. 2012.

[5] S. Okabe et al.: Basic Study of Fitting Method for Base Curve Extraction in Lightning Impulse Test Techniques. 1070-9878/10/$25.00 © 2010 IEEE.

[6] Aldroubi A. Zaringhalam Kourosh. Nonlinear Least Squares in.Acta Springer Science+Business Media B.V. 2009.

[7] Pinto David. Modelos Finitos de Mistura de Gaussiana. Seminário de Reconhecimento de Padrões. PPGEE, UFMG . 2014.

[8] Kiseon Kim and Georgy Shevlyakov. Why Gaussianity? IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 102–113, 2008.

[9] R free software environment for statistical computing, version 3.0.2 (Frisbee Sailing). Available at: http://www.r-project.org/

[10] Juliane et al. ‘corpcor’: Efficient Estimation of Covariance and (Partial) Correlation. 2013.

[11] Mahalanobis, Prasanta Chandra (1936). "On the generalised distance in statistics". Proceedings of the National Institute of Sciences of India 2 (1): 49–55.

[12] Mahalanobis distance. Wikipedia. http://en.wikipedia.org/wiki/Mahalanobis_distance
