


Statistical Process Monitoring of Time-Dependent Data

Bart De Ketelaere∗, Tiago Rato†, Eric Schmitt‡, Mia Hubert§

Abstract

During the last decades, we evolved from measuring few process variables at sparse intervals to a situation in which a multitude of variables are measured at high speed. This evidently provides opportunities for extracting more information from processes and to pinpoint out-of-control situations, but transforming the large data streams into valuable information is still a challenging task. In this contribution we will focus on the analysis of time-dependent processes, since this is the scenario most often encountered in practice, due to high-rate sampling systems and the natural behavior of many real-life applications. The modelling and monitoring challenges that Statistical Process Monitoring (SPM) techniques face in this situation will be described and possible routes will be provided. Simulation results as well as a real-life dataset will be used throughout the paper.

Keywords: statistical process monitoring, time-dependent data, autocorrelation, nonstationarity, principal component analysis, cointegration

∗Bart De Ketelaere is research manager at the Division of Mechatronics, Biostatistics and Sensors (MeBioS) of the KU Leuven. His email is [email protected]. He is the corresponding author. His affiliation is: Department of Biosystems, Division MeBioS, KU Leuven, Kasteelpark Arenberg 30, B-3001 Heverlee, Belgium.

†Tiago Rato is a researcher at the Division of Mechatronics, Biostatistics and Sensors (MeBioS) of the KU Leuven. His email is [email protected]. His affiliation is: Department of Biosystems, Division MeBioS, KU Leuven, Kasteelpark Arenberg 30, B-3001 Heverlee, Belgium.

‡Eric Schmitt is a doctoral student in the Department of Mathematics, KU Leuven. His email is [email protected]. His affiliation is: Department of Mathematics, KU Leuven, Celestijnenlaan 200B, B-3001 Heverlee, Belgium.

§Mia Hubert is full professor in the Department of Mathematics, KU Leuven. Her email is [email protected]. Her affiliation is: Department of Mathematics, KU Leuven, Celestijnenlaan 200B, B-3001 Heverlee, Belgium.

1 Introduction

Contemporary processes are typically highly automated, with in-line sensor technologies that produce vast amounts of data in a short period of time being the common situation. The result is the availability of large process streams that often display autocorrelation because of the fast sampling schemes relative to the process dynamics (i.e. inertial elements defining the settling time of the process). Additionally, in a substantial part of those real-life processes nonstationarity is an important factor. This scenario of multivariate, time-dependent data is one of the most challenging encountered in Statistical Process Monitoring (SPM), but it is often overlooked, although the separate fields of multivariate SPM and SPM for autocorrelated data have received more attention during the last decade (Woodall and Montgomery, 2014; Bersimis et al., 2007; Ferrer, 2007, 2014). In this contribution, research directions for monitoring such complex processes are highlighted.

The first direction is the latent variable approach. The idea is to project the data onto a lower-dimensional subspace, and to analyse these projected data (Kourti, 2005). This is especially rewarding when high cross-correlations are present, so that large datasets can be projected onto a low number of underlying variables. Such high correlations are increasingly common because of the use of spectral sensors (Near Infrared (NIR) spectroscopy, mass spectrometry, frequency spectra, amongst others). In those situations as many as thousands of data points are collected for each sample. Principal Component Analysis (PCA) is a method that has shown success in coping with this setting and naturally leads to the Hotelling's T² and the Q control charts, which monitor the distance of a data point to the mean in the model space, and the distance between the original point and its projection, respectively. PCA and the derived control charts are not suitable for monitoring time-dependent data in their basic form, but extensions have been proposed in the literature (e.g. Ku et al., 1995; Wikstrom et al., 1998; Li et al., 2000; Wang et al., 2005; Kruger and Xie, 2012) to address autocorrelation as well as nonstationarity. De Ketelaere et al. (2015) recently provided a literature review of different extensions to PCA that allow for monitoring time-dependent data. They divide those methods into non-adaptive and adaptive methods. The non-adaptive methods share the property that a model is built on a historical (calibration) dataset, and this model is then used to monitor data as they are acquired. Besides classical PCA monitoring, Dynamic PCA (DPCA) and Dynamic PCA with Decorrelated Residuals (DPCA-DR), due to Rato and Reis (2013b), also belong to the non-adaptive methods. Because the model parameters are not adapted throughout the monitoring process, these methods are typically more suited for monitoring processes where stationarity is assumed. When this assumption does not hold, it is better to adapt the model parameters to describe the new situation.
Amongst the adaptive methods, Recursive PCA (RPCA) and Moving Window PCA (MWPCA) are the best-known extensions to PCA (Li et al., 2000; Wang et al., 2005). In this paper, we will discuss the use of PCA, DPCA, RPCA and MWPCA for time-dependent processes, and will focus mainly on the ability of these methods and the derived control charts to describe such data. We will also touch briefly upon a similar approach advocated in Wikstrom et al. (1998), where the use of classical PCA in combination with multivariate time series modeling of the scores is described.

A second direction comes from a completely different field, econometrics, and is to date largely unexplored in the SPM literature. In econometrics the situation of time-dependency and nonstationarity is omnipresent, and investigating the relation between different data series is essential, but it is compromised by the fact that classical tests of association, e.g. based on the t-test, are not valid under nonstationarity (Phillips, 1986). Because of this, alternative tests have been proposed, and approaches to model the dependency between multiple series have been developed. A large part of this work is due to Granger and Newbold (1974), who developed the concept of cointegration, which will be the basis of the monitoring approach that we will advocate. As far as the authors are aware, using cointegration in an SPM setting was only mentioned in Chen et al. (2009) and De Ketelaere et al. (2011). Chen et al. (2009) conclude in their work that the cointegration testing method can be a useful methodology for engineering system condition monitoring and fault diagnosis, typically in systems under closed-loop control. De Ketelaere et al. (2011) also mention the potential merit of cointegration, but did not elaborate on this topic.

The goal of this discussion paper is thus to describe SPM methodologies for processes that are time-dependent. We will describe their basic working principles, apply them to typical datasets and discuss their strengths and weaknesses. Based on these, new directions of research will be proposed.


2 PCA-based methods for SPM of time-dependent processes

In this section we provide a brief overview of the modeling stages of typical PCA-based methodologies. A particular focus is given to the parametrization problems (selection of lags and forgetting parameters) and modeling assumptions. The impact of such factors on modeling performance will then be assessed in Section 2.2.

2.1 Algorithms

2.1.1 Static PCA

Principal component analysis defines a linear relationship between the original variables of a dataset, mapping them to a set of uncorrelated variables. In general, static PCA assumes that an (n × p) data matrix X_{n,p} = [x_1, ..., x_n]′ is observed. The sample mean of this dataset can be calculated as x̄ = (1/n) X′_{n,p} 1_n and its sample covariance matrix as S = 1/(n−1) (X_{n,p} − 1_n x̄′)′ (X_{n,p} − 1_n x̄′), where 1_n = [1, 1, ..., 1]′ is a vector of length n. PCA modeling proceeds by decomposing the sample covariance matrix as S = P Λ P′, where P is the p × p loading matrix, containing columnwise the eigenvectors of S, and Λ = diag(λ_1, λ_2, ..., λ_p) has the respective eigenvalues in descending order. Afterwards, each p-dimensional vector x is transformed into a score vector y = P′(x − x̄).

In many cases, using k < p of the components still results in a good model. The k-dimensional scores are y_k = P′_k (x − x̄), where P_k contains only the first k columns of P. Many methods exist to select the number of components to retain (see e.g. Valle et al. (1999) and Jolliffe (2002)). This study uses the Cumulative Percentage of Variance (CPV), which measures the amount of variation captured by the first k latent variables:

    CPV(k) = (∑_{j=1}^{k} λ_j / ∑_{j=1}^{p} λ_j) · 100%,    (1)

and k is selected such that the CPV is greater than a given threshold.

PCA control charts are based on the Hotelling's T² statistic and the Q statistic (a.k.a. Squared Prediction Error, SPE). For any p-dimensional vector x, the Hotelling's T² is

    T² = (x − x̄)′ P_k Λ_k^{−1} P′_k (x − x̄) = y′_k Λ_k^{−1} y_k,    (2)

where Λ_k = diag(λ_1, λ_2, ..., λ_k) is the diagonal matrix consisting of the k largest eigenvalues of S. The Q statistic is defined as

    Q = (x − x̄)′ (I − P_k P′_k) (x − x̄) = ||x − x̂||²,    (3)

with x̂ = x̄ + P_k P′_k (x − x̄).

If the number of observations is large, then assuming temporal independence and multivariate normality of the scores, the 100(1 − α)% control limit for Hotelling's T² is approximately the (1 − α) percentile of the χ² distribution with k degrees of freedom; thus T²_α ≈ χ²_k(α).

A number of approximations exist to set the control limit of the Q statistic (e.g. Jackson and Mudholkar (1979), Box (1954)). We will use that of Box (1954), which shows that the Q statistic is approximately distributed as a scaled χ²-distribution with h degrees of freedom, denoted as gχ²_h. Provided that all eigenvalues of S are available, the parameters are given by:

    θ_i = ∑_{j=k+1}^{p} λ_j^i  for i = 1, 2;   g = θ_2/θ_1;   h = θ_1²/θ_2.    (4)

The control limit for the Q statistic, Q_α, is then taken as the (1 − α) quantile of the gχ²_h distribution.
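The static PCA chart setup above can be sketched in a few lines of NumPy/SciPy. This is an illustrative implementation written for this discussion, not code from the paper; the simulated calibration data and the helper name t2_q are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative calibration data: n observations of p correlated variables
# (the paper itself uses 50 variables driven by 5 latent ones).
n, p = 2000, 10
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.05 * rng.normal(size=(n, p))

# PCA via eigendecomposition of the sample covariance matrix S = P Lambda P'
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]            # eigenvalues in descending order
lam, P = eigvals[order], eigvecs[:, order]

# Retain k components by the CPV criterion of Eq. (1)
cpv = np.cumsum(lam) / lam.sum() * 100
k = int(np.searchsorted(cpv, 95.0) + 1)
Pk, lam_k = P[:, :k], lam[:k]

def t2_q(x):
    """Hotelling's T2 and Q statistics of Eqs. (2)-(3) for one observation."""
    d = x - xbar
    y = Pk.T @ d                             # k-dimensional scores
    t2 = y @ (y / lam_k)
    resid = d - Pk @ y                       # residual outside the model space
    return t2, resid @ resid

# Control limits: T2 via chi2_k, Q via Box's g*chi2_h approximation, Eq. (4)
alpha = 0.005
t2_lim = stats.chi2.ppf(1 - alpha, df=k)
theta1, theta2 = lam[k:].sum(), (lam[k:] ** 2).sum()
g, h = theta2 / theta1, theta1 ** 2 / theta2
q_lim = g * stats.chi2.ppf(1 - alpha, df=h)
```

An observation is flagged when its T² exceeds t2_lim or its Q exceeds q_lim.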

2.1.2 Dynamic PCA

Dynamic PCA extends static PCA to autocorrelated, multivariate systems (Ku et al., 1995). DPCA works on the principle that in addition to the current observed variables, the respective lagged values up to a proper order, l, can also be included in the PCA model. Therefore, DPCA applies PCA to an augmented dataset, X(l), constructed of lagged replicates of the original variables:

    X(l) = [X(t), X(t − 1), ..., X(t − l)].    (5)

Here X(t − j) denotes the data matrix X shifted j times into the past (i.e., with j lags). Due to its construction, DPCA implicitly fits an autoregressive model to the data. For instance, an AR(1) process will be modeled if lagged values up to order one are included in the model input, i.e., X(1) = [X(t), X(t − 1)].

Ku et al. (1995) provide an algorithm to specify the number of lags which adds an order to the lag structure, evaluates whether this brings any new linear relationship to the model, and keeps it if it does. Experiments demonstrate that the number indicated by this method is often too low to model the process. More recently, Rato and Reis (2013a) detail an approach for selecting the number of lags by variable, allowing for a more refined model of the process being monitored. In this approach the appropriate number of lags is selected for each variable separately following a step-forward procedure that explores the connection between small singular values and linear relationships in the data. In each stage, variables are tested, one at a time, for the inclusion of one lag, and the lagged variable that leads to the smallest singular value is kept in the model. The procedure is repeated until a maximum number of lags is attained and the best combination of lags is then selected through an optimization function. Both lag selection algorithms were considered in this study, but results will only be discussed for the latter approach.

After building a DPCA model, the Hotelling's T² and Q statistics are computed for an extended lagged vector that contains the current observation and its appropriate past values. The theoretical expressions for the control limits are analogous to those of static PCA and thus rely on the same i.i.d. assumptions.
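The construction of the augmented matrix in Eq. (5) can be sketched as follows; the helper name lagged_matrix is an illustration for this discussion, not part of the paper.

```python
import numpy as np

def lagged_matrix(X, l):
    """Build the augmented matrix X(l) = [X(t), X(t-1), ..., X(t-l)] of Eq. (5).

    Row t of the result stacks the observation at time t with its l previous
    observations, so n - l complete rows remain; static PCA is then applied
    to this matrix to obtain the DPCA model.
    """
    n = X.shape[0]
    return np.hstack([X[l - j : n - j] for j in range(l + 1)])
```

With l = 1, each row contains the current observation followed by the one-lag-old observation, implicitly fitting an AR(1) structure.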

2.1.3 Recursive PCA

If the stationarity assumptions of non-adaptive PCA models, such as those described above, are violated, then model parameter estimates obtained during the calibration phase may not be appropriate for future monitoring. Recursive PCA with a forgetting factor (RPCA; Li et al., 2000) incorporates new observations and exponentially downweights old ones to update the mean and covariance matrix used in PCA.

Define the estimated mean and covariance of the observations up to time t as x̄_t and S_t. Then at time t + 1 the T² and Q statistics are evaluated for the new observation x_{t+1} = x(t + 1) = [x_1(t + 1), ..., x_p(t + 1)]′. If both values do not exceed their cut-off value, the data matrix X_{t,p} is augmented with observation x_{t+1} as X_{t+1,p} = [X′_{t,p}  x_{t+1}]′. Next, the model parameters are updated by means of a forgetting factor 0 ≤ η ≤ 1. Denoting n_t as the total number of observations measured until time t, the updated mean is defined as:

    x̄_{t+1} = (1 − (n_t/(n_t + 1)) η) x_{t+1} + (n_t/(n_t + 1)) η x̄_t,    (6)

and the updated covariance matrix is defined as:

    S_{t+1} = (1 − (n_t/(n_t + 1)) η) (x_{t+1} − x̄_{t+1})(x_{t+1} − x̄_{t+1})′ + (n_t/(n_t + 1)) η S_t.    (7)

This is equivalent to computing a weighted mean and covariance of X_{t+1} where older values are downweighted exponentially. Using a forgetting factor η < 1 allows RPCA to automatically give lower weight to older observations. As η → 1, the model forgets older observations more slowly. The eigenvectors of S_{t+1} are then used to obtain a loading matrix P_{t+1}. Once the new value of k is determined (e.g. through a recalculation of the CPV) and the new eigenvalues calculated, the control limits of the Hotelling's T² and Q statistics can be updated according to the formulas described earlier. Because the model updates throughout time for each new in-control point, it is essential that the method has a high power for detecting out-of-control (OOC) points. If not, those OOC points are included in the model updating scheme.
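The recursive update of Eqs. (6)-(7) can be sketched as below; the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def rpca_update(xbar, S, x_new, n_t, eta):
    """One recursive update of the mean and covariance, Eqs. (6)-(7).

    eta is the forgetting factor (0 <= eta <= 1). With eta = 1 and a growing
    n_t this reproduces the ordinary running mean and covariance; eta < 1
    downweights older observations exponentially.
    """
    w = n_t / (n_t + 1) * eta                 # weight kept by the old estimates
    xbar_new = (1 - w) * x_new + w * xbar     # Eq. (6)
    d = x_new - xbar_new
    S_new = (1 - w) * np.outer(d, d) + w * S  # Eq. (7)
    return xbar_new, S_new
```

In a monitoring loop, this update is applied only when the new observation's T² and Q statistics fall below their current limits; the loadings and limits are then refreshed from an eigendecomposition of S_new.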

2.1.4 Moving Window PCA

MWPCA updates at each time point while restricting the observations used in the estimations to those which fall within a specified window of time (Wang et al., 2005; Kruger and Xie, 2012). With each new observation, this window excludes the oldest observation and includes the observation from the previous time period. Thus, for window size H, the data matrix at time t is X_t = [x_{t−H+1}, x_{t−H+2}, ..., x_t]′, and at time t + 1 it is X_{t+1} = [x_{t−H+2}, x_{t−H+3}, ..., x_{t+1}]′. The updated x̄_{t+1} and S_{t+1} can then be calculated using the observations in the new window. While completely recalculating the parameters for each new window is straightforward, and intuitively appealing, methods have been developed to improve on computational speed (see for example Wang et al. (2005) and Jyh-Cheng (2010)). As was the case for RPCA, the model is not updated when an observation is determined to be out of control, and again the same control limits are used as described in the PCA section.
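The windowed scheme can be sketched with the straightforward recompute-from-scratch variant mentioned above; the window length and helper name are illustrative assumptions, and faster up/down-dating schemes exist as the text notes.

```python
import numpy as np
from collections import deque

H = 100                        # hypothetical window length
window = deque(maxlen=H)       # the oldest observation drops out automatically

def mwpca_step(x_new, in_control=True):
    """Add an in-control observation and refit the window mean/covariance."""
    if in_control:             # OOC observations do not update the model
        window.append(x_new)
    W = np.asarray(window)
    xbar = W.mean(axis=0)
    S = np.cov(W, rowvar=False) if len(W) > 1 else None
    return xbar, S
```

The returned covariance is eigendecomposed exactly as in static PCA to refresh the loadings and control limits at each step.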

2.2 Simulation Studies

Typically, monitoring methods are evaluated for their fault detection in the literature, but good fault detection is predicated on a good model of the process and a correct definition of the related control limits. However, the monitoring approaches described in Section 2.1 do not ensure that an appropriate model is obtained for the broad range of process dynamics that are typical of real-life applications. Therefore, in this section we evaluate their validity by investigating the modelling accuracy of the PCA-based methods on AR(1) and ARI(1,1) processes. The AR(1) is chosen as it is a widely encountered process dynamic in modern processes, and its integrated form (ARI) is used as its nonstationary counterpart. Following convention (e.g. Burnham et al. (1999); Choi et al. (2006)) we generate data at the subspace level so that we can explicitly control the features monitored by the PCA-based models. To obtain each observation at time t we began by generating five latent variables, y_t, according to the equation of the desired AR(1) or ARI(1,1) process. For all processes, we introduce variation into the process dynamics through ε_t ∼ N(0_5, 0.01 I_5), where I_5 is the 5 × 5 identity matrix. These are then transformed into a 50-dimensional dataset of measurements computed as

    x_t = P_0 y_t + e_t,    (8)

where P_0 is a 50 × 5 matrix with orthogonal columns, randomly generated once and kept constant for all simulation runs. The e_t are 50 × 1 vectors of white noise errors, distributed as N(0_50, 0.000025 I_50), that simulate measurement noise, as is done, for instance, in Ku et al. (1995) and Lakshminarayanan et al. (1997). The e_t can be seen as the error at the sensor level, and are set to a small value here under the assumption that sensors are typically reliable. For all methods and simulations, an arbitrary but common CPV threshold of 95% is used.

The AR process is investigated because it is a particularly relevant process type, given that the high sampling rate of many contemporary sensors inherently introduces (positive) autocorrelation into the data. Besides being a common process type in real-life situations, AR processes have a natural relevance for studying the properties of DPCA.

The AR(1) process is defined as (Box et al., 1994):

    y_t = φ y_{t−1} + ε_t,    (9)

where y_t are the serial observations of the underlying latent model (y_t in Eq. (8)) and φ is the AR coefficient. We consider values of φ equal to 0, 0.1, 0.3, 0.5, 0.7 and 0.9, with larger values of the parameter corresponding to stronger autocorrelation. Setting φ = 0 gives us a process with i.i.d. observations, which is the reference condition for which the assumptions of PCA and the theoretical control limits defined above are valid.

The ARI(1,1) process is defined as (Box et al., 1994):

    y_t = y_{t−1} + φ(y_{t−1} − y_{t−2}) + ε_t.    (10)

Here the φ values are the same as for the AR(1) case. When φ = 0 this process is simply an integrated process, I(1), or random walk. The ARI(1,1) process is considered because the adaptive methods were designed to address nonstationary processes.
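The data-generating scheme of Eqs. (8)-(10) can be sketched as below. The function name is an assumption for this discussion, and unlike the paper's setup the loading matrix P_0 is regenerated per call rather than held fixed across all runs; note that the ARI(1,1) recursion is equivalent to an AR(1) on the first differences, which the cumulative sum exploits.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, phi, integrate=False, k=5, p=50):
    """Latent AR(1) (Eq. (9)) or ARI(1,1) (Eq. (10)) series mapped to p
    observed variables through Eq. (8)."""
    eps = rng.normal(scale=np.sqrt(0.01), size=(n, k))       # eps_t ~ N(0, 0.01 I)
    y = np.zeros((n, k))
    for t in range(1, n):
        y[t] = phi * y[t - 1] + eps[t]                       # AR(1) recursion
    if integrate:
        y = np.cumsum(y, axis=0)                             # differences follow the AR(1)
    P0, _ = np.linalg.qr(rng.normal(size=(p, k)))            # orthogonal columns
    e = rng.normal(scale=np.sqrt(0.000025), size=(n, p))     # sensor-level noise
    return y @ P0.T + e                                      # Eq. (8)
```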

Figure 1 depicts the AR(1) and ARI(1,1) processes with φ values of 0.1 and 0.9. In general, when φ increases, the variance of x_t increases (by a factor (1 − φ²)^{−1} in the case of the AR(1)), thus decreasing the relative effect of the noise. The unit root introduced in the nonstationary processes also has a marked influence on the total signal variance, being even more pronounced than the effect of φ. Because of both effects, different scaling factors were required in Figure 1 to visualize the typical behavior of the different simulation settings. Moving from left to right and top to bottom, the scaling factors used were 10, 5, 0.5 and 0.025. As a result, while the ARI(1,1) with φ = 0.9 appears to be relatively well behaved, its own scale is much greater than that of the other processes.

To assess the performance of the PCA-based models and corresponding control limits, 100 replicates of normal operating conditions (NOC) were generated. Each of these replicates is composed of 7000 NOC observations, divided into a calibration (first 6000 observations) and test (last 1000 observations) dataset. Models were specified for each replicate using the respective calibration dataset and their performances were subsequently assessed on the contiguous test dataset. False detection rates (FDR) were computed for each replicate as

Figure 1: Plots of the rescaled AR(1) and ARI(1,1) processes with φ values of 0.1 and 0.9.

the number of observations above the theoretical control limit divided by the total number of observations in the test phase. Therefore, for each process type and φ, 100 FDRs were obtained. The distribution of the observed FDRs is then considered as a measurement of the models' performance. For DPCA, RPCA and MWPCA additional parameters need to be chosen, such as the number of lags l, the forgetting parameter η and the window length H. In order to do so, an additional calibration dataset with 5000 NOC observations was generated for each combination of process type (AR(1) or ARI(1,1)) and φ value. The number of lags used by the DPCA method was selected using the method of Rato and Reis (2013a).

Although the selection of the additional parameters for adaptive methods is critical to their proper implementation, this topic is not well covered in the literature. We based their choice on evaluating a range of possible values and assessing their appropriateness. The minimum and maximum values for η and H are [0.9, 0.9999] and [50, 2500], respectively. For each process type, models with these candidate parametrizations were applied to the additional calibration dataset, after splitting it into two equal parts. The first part is used to set up the models with the given values of η and H, and the second part is then used to calculate the Sum of Squared Prediction Errors (SSPE) following the suggestions of Schmitt et al. (2014). The η and H values that minimize the SSPE are then employed for process modelling. This approach can be thought of as a generalization of choosing the weighting factor in an EWMA control chart (Montgomery, 2008).
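The tuning idea could be sketched as follows for the forgetting factor η. This is a loose interpretation written for this discussion: the SSPE is assumed here to be the accumulated Q-type reconstruction residual on the second half of the extra calibration set, and the exact criterion of Schmitt et al. (2014) may differ in detail; the function name is hypothetical.

```python
import numpy as np

def sspe(X_fit, X_val, eta, k=5):
    """Score one candidate eta: run the recursive mean/covariance updates of
    Eqs. (6)-(7) over X_fit, then accumulate squared PCA reconstruction
    errors over X_val while continuing to adapt."""
    xbar = X_fit[0].astype(float)
    S = np.eye(X_fit.shape[1]) * 1e-6          # small initial covariance
    n_t = 1

    def update(x):
        nonlocal xbar, S, n_t
        w = n_t / (n_t + 1) * eta
        xbar = (1 - w) * x + w * xbar
        d = x - xbar
        S = (1 - w) * np.outer(d, d) + w * S
        n_t += 1

    for x in X_fit[1:]:
        update(x)
    total = 0.0
    for x in X_val:
        lam, P = np.linalg.eigh(S)
        Pk = P[:, np.argsort(lam)[::-1][:k]]   # k leading loadings
        d = x - xbar
        total += float(d @ d - (Pk.T @ d) @ (Pk.T @ d))  # squared residual
        update(x)                                        # keep adapting
    return total

# Candidate grid within the paper's range [0.9, 0.9999], e.g.:
# best_eta = min([0.9, 0.99, 0.999, 0.9999], key=lambda e: sspe(Xa, Xb, e))
```

The same grid-and-score loop applies to the window length H for MWPCA.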

2.2.1 Simulation Results: AR(1)

For each of the simulation settings we considered, the parameter selection procedure explained above resulted in the parametrization given in Table 1. One trend apparent in this table is that the number of retained latent variables tends to decrease to the correct number, five, as φ increases. The fact that the i.i.d. case did not lead to the underlying five latent variables is mainly due to the relatively large influence of the noise in these stationary cases and the chosen CPV value of 95%. The influence of the noise through e_t is lowered when the autocorrelation increases (see Figure 1), explaining why, for higher values of φ, the correct number of latent variables is extracted.

The impact of the dynamic features of the data is also visible in the lag selection procedures. In particular, the Ku et al. (1995) method selects zero lags for all AR(1) processes, except for φ = 0.9, which has one lag (results not shown for the sake of brevity). Thus, this lag selection methodology finds that the dynamic relationships are not significant when the process exhibits moderate dynamics. This result is in line with the findings of Rato and Reis (2013a), who also concluded that the Ku et al. (1995) method has a tendency to underestimate the true dynamics of the data. As mentioned before, to overcome this issue, we present results for DPCA where the lag selection procedure of Rato and Reis (2013a) is implemented. In this approach the number of lags is not necessarily the same for all variables. For the case studies considered, the maximum number of lags was consistently set as one, while the effective lag of each variable varied between zero and one. This means that some variables do not require any lag in order to describe the process data. Since DPCA also models the cross-correlation structure of the original as well as the lagged variables, the exclusion of redundant lags leads to more parsimonious models. It is noted, however, that in the i.i.d. case (φ = 0), the Rato and Reis (2013a) method on average adds one lag, which is undesired. This happens because the optimization algorithm assesses the modelling improvements of consecutive lag structures, and since the zero-lag scenario is the first possible structure, it cannot be compared against a previous reference. Subsequently, the lowest feasible lag structure has at least one lag. Nevertheless, this situation can be avoided through further analysis of the data and the decision graphs produced by the algorithm.

For the adaptive models, the selected forgetting parameters are all high, indicating that nonstationarity is not dominant in the data. Most of the values are at their upper bound, except for the large φ cases. This does not come as a surprise, since in case of large φ the process mean does deviate from 0 for longer time periods, and the adaptive models try to capture these (random) dynamics by forgetting older observations faster.

Table 1: Parameter settings for monitoring methods in the AR(1) processes. Ranges are given for variable parameters, with the most frequent value in brackets.

        PCA   DPCA            RPCA               MWPCA
  φ     k     k    Lags       k        η         k        H
  0     7     14   0-1 (1)    7-8 (7)  0.9999    6-8 (7)  2500
  0.1   8     14   0-1 (1)    6-8 (7)  0.9982    6-8 (7)  2500
  0.3   6     12   0-1 (1)    6-7 (7)  0.9999    6-7 (7)  2469
  0.5   5     10   0-1 (1)    5        0.9999    5-6 (5)  2496
  0.7   5     10   0-1 (1)    5        0.9998    5        1775
  0.9   5     7    0-1 (1)    5        0.9981    5        886

Monitoring was performed on each of the AR(1) settings and the false detection rates (FDR) of the Hotelling's T² and Q statistics were recorded. The desired overall FDR is set at 1%, and since we have no knowledge about the correlation between the T² and Q statistics, it is assumed to be zero and the Bonferroni correction is applied such that α_{T²} = α_Q = FDR/2 = 0.005. Boxplots of the FDRs for the Hotelling's T² and the Q statistics are presented in Figures 2-5, as a function of the autocorrelation parameter φ.

Across the results, we see that the effect of autocorrelation on the modelling properties of the PCA-based methodologies is not strong, except for high values of φ. This effect has a greater influence on the Hotelling's T² statistic dynamics, since the original autocorrelation of the data is directly translated to the scores, which ultimately compromises the reliability of the theoretical control limits (Kruger et al., 2004; Vanhatalo and Kulahci, 2015). Although the observed false detection rate of the Hotelling's T² statistic is generally within expectation, we see that the dispersion of the FDR values increases as the autocorrelation increases. This is a direct result of the inherent dynamics of the Hotelling's T² statistic, since autocorrelation increases the probability of having consecutive measurements with similar values. Thus, for replicates where the process experiences sustained deviations from the model (i.e., high values of Hotelling's T²), the false detection rate is higher than specified, while the converse happens when the process runs close to the model; in that case, the Hotelling's T² statistic exhibits consecutive, low values. This results in more variable detection performance, even though the average FDR is close to the desired value. Although the FDRs obtained for the T² are generally in line with expectations, extensions to the PCA framework can produce some additional improvement. This was the idea behind the approach of Wikstrom et al. (1998), which applies ARIMA modelling to the scores. We indeed observed a modest decrease in this dispersion for high values of φ, but since the Wikstrom approach does not consider the residual space, it cannot solve the problem seen with the Q statistic.

On the other hand, since the Q statistic is related to the model residuals, it should be serially decorrelated as long as the appropriate number of latent variables is retained. This in turn should lead to good monitoring performance. We observe that this is the case for low values of φ, since reasonable FDRs are obtained. For larger values of φ the models also produced serially decorrelated residuals. However, the scores subspace does not accurately explain the dynamic characteristics of the data, causing the residuals to be greater than expected, which leads to a higher FDR than the target. While extensions based on classical time series methods, such as that of Wikstrom et al. (1998), are applicable to the T² statistic as mentioned above, the number of variables used to calculate the Q statistic can be extremely large, and therefore beyond the capacity of such methods.

By using the correct number of lags in the DPCA model, an elimination, or at least a reduction, of some of the misspecifications is expected, resulting in improved modelling and monitoring performance. However, even though the DPCA approach is able to follow the process dynamics more closely, it still produces monitoring statistics (especially the Hotelling's T²) with dynamic characteristics. This indicates that DPCA is prone to the same deficiencies identified earlier for PCA, which essentially lead to misspecified control limits.

The adaptive methods show similar results to the non-adaptive PCA and DPCA methods. The reason for this is the very large forgetting parameters η and window sizes H presented in Table 1. These cause the methods to forget only very slowly. These large values indicate that nonstationarity is not a major issue for this process, which in fact is correct. For MWPCA, in the case of φ = 0.9, the window size H was substantially smaller, causing the FDR dispersion to be substantially larger for the Hotelling's T² statistic.

2.2.2 Simulation Results: ARI(1,1)

Next, we consider the ARI(1,1) process, again setting the target FDR equal to 1%. The ARI(1,1)



Figure 2: False detection rates of the T² (top) and Q statistics (bottom) of PCA on the AR(1) process with ϕ ranging from 0 to 0.9. The value of αT² = αQ = 0.005.

process poses a greater monitoring challenge for PCA and DPCA than the AR(1) process because of the apparent nonstationarity (see Figure 1). This is expected since neither of these methods is designed to cope with nonstationary behavior. Experiments confirmed that FDRs typically reach 100% for these methods regardless of the value of ϕ, so results are not shown. This is caused by the fact that the data used to build the models are not representative for new data encountered in the test


Figure 3: False detection rates of the T² (top) and Q statistics (bottom) of DPCA on the AR(1) process with ϕ ranging from 0 to 0.9. The value of αT² = αQ = 0.005.

dataset. The approach of Wikstrom et al. (1998) fails as well when an ARMA model is fitted to the T² statistics, because they are also nonstationary, so that differencing of the scores is required. Furthermore, the nonstationarity around the PCA model remains unexplained by the Wikstrom approach.
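For reference, the two simulated process types can be generated as follows (a minimal sketch under our own parameter choices; the ARI(1,1) series is simply the integration, i.e. the cumulative sum, of an AR(1) series):

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, phi):
    """AR(1): x[t] = phi * x[t-1] + white noise (stationary for |phi| < 1)."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def ari11(n, phi):
    """ARI(1,1): integration of an AR(1) series, hence nonstationary."""
    return np.cumsum(ar1(n, phi))

x, y = ar1(2000, 0.5), ari11(2000, 0.5)
# the AR(1) series fluctuates around zero, the integrated series wanders off
print(x.std(), y.std())
```

A model fitted to one stretch of the ARI(1,1) series quickly becomes unrepresentative of later data, which is why fixed PCA and DPCA models reach FDRs of 100% here.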

The model parametrizations of the adaptive methods are shown in Table 2. We can see that, in general, as ϕ increases, the forgetting factor decreases, although there are deviations from this pattern. This was to be expected since those situations are dominated by the strongest nonstationarity, as demonstrated in Figure 1, and it is in line with the observation for the AR(1) case. RPCA and MWPCA are expected to perform acceptably in this setting since they are able to adapt to process changes. However, both methods produce unacceptable results across all values of ϕ when the theoretical control limits are used in combination with α = 0.01. Changing the forgetting factors to improve results did not lead to consistently on-target performance.

Figure 4: False detection rates of the T² (top) and Q statistics (bottom) of RPCA on the AR(1) process with ϕ ranging from 0 to 0.9. The value of αT² = αQ = 0.005.

Figure 5: False detection rates of the T² (top) and Q statistics (bottom) of MWPCA on the AR(1) process with ϕ ranging from 0 to 0.9. The value of αT² = αQ = 0.005.

The reason for this poor behavior is twofold. First, when applying RPCA and MWPCA, the models are only updated when a new point is considered in control. When the forgetting factor is not chosen ideally, or when the dynamics of the underlying process change, the adaptive methods can fail to follow those dynamics, leading the model


Table 2: Parameter settings for monitoring methods in the ARI(1,1) processes. Ranges are given for variable parameters, with the most frequent value in brackets.

         RPCA               MWPCA
ϕ    k        η        k        H
0    3-5 (4)  0.9981   3-5 (4)  2500
0.1  3-5 (4)  0.9986   3-5 (4)  1400
0.3  3-5 (4)  0.9972   4-5 (4)  700
0.5  3-5 (4)  0.9955   3-5 (4)  450
0.7  3-5 (4)  0.9800   3-5 (4)  250
0.9  3-4 (4)  0.9500   2-4 (3)  100

to consider a large portion of the data to be out of control. As stressed before, the right choice of the forgetting factor and its eventual updating to account for changing dynamics is important, but references are scarce (e.g. Choi et al. (2006)) and the topic deserves further attention.

The second reason for the excessive FDR comes from the fact that the underlying assumptions of the analytical Hotelling's T² and Q limits defined earlier, applied here with α set at 1%, do not hold. This misspecification causes the control limits to be too tight, so that a substantial number of observations are considered outlying. This in turn prevents the model from being updated, since such OOC points are not used for adapting the model. As advocated in Rato et al. (2015), in such cases it is better to tune the α value such that the desired FDR is obtained (for other examples of approaches for adjusting the limits, see e.g. Ramaker et al. (2005), Camacho et al. (2009), and van Sprang et al. (2002)). In Figures 6 and 7, we consider smaller values of αT² and αQ (see Table 3), resulting in higher control limits but acceptable FDR rates. These α values were determined manually by dividing a fixed reference data set in two parts, fitting a model to the first part (5000 observations), and assessing its performance on the second part (15000 observations). Then, the selected α values were applied to all 100 simulation runs. The variable performance in the simulations shows that this approach, while generally effective, does not result in models that generalize

to all of the realizations of the process encountered in the simulations. The figures demonstrate that for a substantial number of the simulation runs the FDR is far from the target, with values that reach 100%. The reason for those extreme cases is that models are only updated when a new point is considered in control. If points are considered OOC at the start of the monitoring phase, the model does not update, increasing the probability that later measurements will be considered OOC as well when the process deviates further from the model because of the nonstationarity. Table 3 lists the median FDR values since they are not clearly visible in the boxplots. For the T² values, the median is actually zero, meaning that no NOC points were considered to be outlying. This illustrates that tuning α for the Hotelling's T² statistic is not working adequately and improvements are needed. This is also visualized in Figure 6, where indeed the T² control limit is too high because of the substantial autocorrelation present in the Hotelling's T² statistics.

For the Q statistic, the medians are in line with

expectations, meaning that if the method is actively monitoring, the tuned control limit for the Q statistic is adequate. This is illustrated in Figure 8, where the Q statistic shows a random behavior in periods of similar limits, and where the amount of adaptation of the control limit in periods of higher or lower residuals is acceptable.

As mentioned above, the tuned α values in Table 3 were selected by trial and error adjustment on a validation dataset because no better approach is currently available. From these results we can observe a direct relationship between the value of ϕ and the values of α needed to obtain an FDR that is in line with expectations: the higher ϕ, the smaller the α values must be. The cause of this relationship is that increasing ϕ increases the process variance, and this increased process variance is not accounted for in the theoretical limits. Therefore, for processes with only moderate nonstationarity, the problems we observe in this simulation may not arise to the same extent, and the results listed here can


be considered as extremes (worst case scenario). We note that the interquartile range does not show a clear trend, except that it is typically larger for the T²-statistic than for the Q-statistic. It might be that even further refined model parametrizations or additional simulation runs are required for a more obvious pattern to emerge.
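The manual α search described above effectively chooses the control limit as an empirical quantile of the monitoring statistic on an in-control validation stretch. A sketch of that tuning step (using a synthetic chi-square stand-in for the T² or Q values; names and numbers are ours):

```python
import numpy as np

def tune_limit(stat_validation, target_fdr):
    """Empirical (1 - target_fdr) quantile of the monitoring statistic,
    computed on in-control validation data, used as control limit."""
    return np.quantile(stat_validation, 1.0 - target_fdr)

rng = np.random.default_rng(2)
# stand-ins for a statistic on the validation part and on new NOC data
stat_val = rng.chisquare(df=4, size=15000)
stat_new = rng.chisquare(df=4, size=15000)

limit = tune_limit(stat_val, target_fdr=0.005)
fdr = np.mean(stat_new > limit)
print(f"tuned limit: {limit:.2f}, observed FDR: {fdr:.4f}")
```

This works when the statistic's distribution is stable between validation and monitoring; under nonstationarity that distribution drifts, which is exactly why some runs above still end far from the target.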


Figure 6: False detection rates of the T² (top) and Q statistics (bottom) of RPCA on the ARI(1,1) process with ϕ ranging from 0 to 0.9, using the tuned values of α in Table 3.

A visual appreciation of the monitoring behavior of RPCA applied to an ARI(1,1) process with ϕ = 0.9 and tuned α values is given in Figure 8.

Table 3: Tuned α values for monitoring methods in the ARI(1,1) processes with an intended overall FDR of 0.01 (FDR_T² = FDR_Q = 0.005). The observed FDRs are summarized by their median and interquartile range (in parentheses).

            RPCA                      MWPCA
ϕ    Stat.  α          FDR           α          FDR
0    T²     2×10⁻²     0 (0.013)     5×10⁻⁴     0 (0.012)
     Q      10⁻³       0.003 (0.004) 5×10⁻⁴     0.002 (0.019)
0.1  T²     2×10⁻²     0 (0.018)     1.5×10⁻⁴   0 (0.011)
     Q      1.5×10⁻³   0.003 (0.005) 10⁻⁴       0.003 (0.017)
0.3  T²     2×10⁻²     0 (0.023)     5×10⁻⁵     0 (0.009)
     Q      1.5×10⁻³   0.005 (0.007) 5×10⁻⁵     0.003 (0.013)
0.5  T²     1.5×10⁻²   0 (0.019)     6×10⁻⁵     0 (0.006)
     Q      10⁻³       0.005 (0.007) 10⁻⁵       0.003 (0.009)
0.7  T²     1.2×10⁻²   0 (0.026)     2×10⁻⁵     0 (0.012)
     Q      1.5×10⁻⁵   0.008 (0.005) 10⁻⁷       0.004 (0.003)
0.9  T²     1.7×10⁻²   0 (0.021)     1.5×10⁻⁶   0 (0.018)
     Q      5×10⁻⁶     0.004 (0.012) 10⁻⁹       0.006 (0.005)



Figure 7: False detection rates of the T² (top) and Q statistics (bottom) of MWPCA on the ARI(1,1) process with ϕ ranging from 0 to 0.9, using the tuned values of α in Table 3.

From this figure we conclude that the monitoring statistics still show evidence that the model is not completely explaining the structure of the process, which is especially visible in the T² statistic. In fact, the recursive nature of RPCA does allow it to cover the simple case of nonstationarity, but it does not seem to cope with the AR component that is added to it.

Even though this parametrization reduces the detection of small faults, the model is adapting and actively monitoring. Therefore, if the process of interest displays large faults, these methods may still be suitable for monitoring.
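The core of such an adaptive scheme is an exponentially weighted update of the mean and covariance with forgetting factor η, applied only when the new point is declared in control. A simplified sketch (our own, not the exact RPCA recursion from the literature):

```python
import numpy as np

class RecursiveCov:
    """Exponentially weighted mean/covariance with forgetting factor eta;
    out-of-control points are excluded from the update."""
    def __init__(self, x0, eta):
        self.eta = eta
        self.mu = x0.mean(0)
        self.cov = np.cov(x0, rowvar=False)

    def update(self, x, in_control):
        if not in_control:              # OOC points do not adapt the model
            return
        d = x - self.mu
        self.mu = self.eta * self.mu + (1 - self.eta) * x
        self.cov = self.eta * self.cov + (1 - self.eta) * np.outer(d, d)

rng = np.random.default_rng(3)
model = RecursiveCov(rng.normal(size=(500, 3)), eta=0.998)
for t in range(1000):
    x = rng.normal(size=3) + 0.002 * t  # slowly drifting process mean
    model.update(x, in_control=True)
print(model.mu)                         # follows the drift, with some lag
```

The lag behind the drift scales with η/(1−η): this is the trade-off behind the forgetting factors in Table 2, and it also shows why a model that stops updating (all points flagged OOC) diverges from a nonstationary process.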


Figure 8: RPCA-based control chart for the ARI(1,1) process with ϕ equal to 0.9. The value of αT² = 1.7×10⁻² and αQ = 5×10⁻⁶. The control statistics are shown on a log10 scale to improve readability.

2.3 Discussion and future perspectives

Based on simulation results we covered different forms of time-dependency in process monitoring, focusing on the simple yet challenging cases of an AR(1) and an ARI(1,1) process, because those types of dynamics are believed to be often present in modern process data.

The results of the AR(1) simulations demonstrate that under moderate dynamics, all of the studied PCA-based methodologies have a similar, acceptable modelling performance. This happens because the optimal parameters of DPCA, RPCA and MWPCA tend to reduce them to static PCA. It is also visible that when process dynamics become more relevant (say, for ϕ ≥ 0.7) the models tend to deviate more from expectation, with especially the Hotelling's T² statistic being less reliable, confirming recent results from Vanhatalo and Kulahci (2015). However, simple AR(1) dynamics do not severely compromise the modelling capabilities


of the procedures, which still produce false detection rates within expectation.

On the other hand, the ARI(1,1) simulations showed that PCA and DPCA cannot cope with nonstationarity. RPCA and MWPCA, the adaptive methods that are devised for handling nonstationarity, do allow for modeling such data, but plugging the classical values for α into the control limits for the Hotelling's T² and Q statistic resulted in FDR values that were unacceptably high. Since no literature is available for defining those control limits under the nonstationarity assumption, it was proposed to relax the control limits for both statistics by searching for α values that result in acceptable FDRs, so that these models could continue to adapt to the time-varying process and might be able to detect severe faults. The observation that there is a clear link between the process dynamics and the monitoring method capable of handling such data is partly in line with the results of Camacho et al. (2009), who acknowledge that it is important to reflect the time-varying nature of the process in the model of the SPM method used. Interestingly, Camacho et al. (2009) mention that besides the process dynamics, also the fault type to be detected is important when deciding on the best monitoring method. This is ultimately true, and fault detection is probably the most important aspect when considering SPM methods. As an example, Rato et al. (2015) concluded that the capability of the adaptive methods to detect ramp faults is highly dependent on the forgetting factor chosen, which should therefore be considered carefully.

The forms of nonstationarity introduced into the simulations are extreme cases, as can be seen from Figure 1. Performance may be better on processes with less severe forms of nonstationarity, such as processes showing mild nonstationarity or simple drifts due to sensor aging. In those cases, the proposed extensions to PCA might be able to capture the drifts and thus describe the data adequately. In contrast, in order to turn these models into valid and powerful monitoring schemes, work is required into a proper definition of the control limits connected to the methods. We believe that this direction of research is highly relevant. In the case of more complex nonstationary behavior, work is also required on the modeling aspect, as we demonstrated for the ARI(1,1) case with high ϕ values, where deviating FDR values were noted even after adapting α.

Although nonstationarity is present in a wide range of processes, the use of adaptive models is still limited, especially in the multivariate case. The moderate results we have shown are only a partial explanation. From the practice side, the lack of intuition with the methods on the part of the process owners themselves (often engineers) is a barrier as well. From that perspective, it could be advantageous to translate the parameter choice of e.g. RPCA into a selection procedure based on parameters engineers are used to. More specifically, engineers typically have a good idea of the process dynamics in terms of the in-control frequency spectrum, i.e. the speed of change which is typical of those processes. This behavior can be visualized through the Power Spectral Density (PSD) of the process, denoting the power of the signal as a function of frequency (Oppenheim and Schafer, 1975). Typically, only the slow dynamics are proper to the underlying process, so that the SPM scheme should apply a low pass filtering of the data. In essence, the cut-off frequency determining which frequencies do (not) pass the SPM model is well related to the AR and/or MA terms. That said, we feel that bridging the gap between the engineering and statistical reasoning could help the implementation of adaptive SPM methods. This implementation issue is the last barrier: software to cope with such multivariate SPM models is not widespread, and is required to translate advanced multivariate SPM from a pure research field into a practical solution.
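The PSD intuition can be illustrated with a basic periodogram: for a slow process (here an AR(1) with ϕ close to 1, our own stand-in for real process dynamics), the power concentrates at low frequencies, which is the band the SPM model should capture:

```python
import numpy as np

def psd(signal, fs=1.0):
    """Periodogram estimate of the Power Spectral Density."""
    n = len(signal)
    spec = np.abs(np.fft.rfft(signal - signal.mean()))**2 / (fs * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec

rng = np.random.default_rng(4)
x = np.zeros(4096)
for t in range(1, 4096):                 # slow dynamics: AR(1), phi = 0.95
    x[t] = 0.95 * x[t - 1] + rng.normal()

freqs, spec = psd(x)
low, high = spec[1:100].mean(), spec[-100:].mean()
print(f"low-frequency / high-frequency power: {low / high:.0f}")
```

A cut-off frequency separating the low-frequency band (process dynamics) from the flat high-frequency band (noise) relates directly to the AR and/or MA terms of the process, which is the translation to engineering intuition suggested above.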


3 Cointegration and Error Correction Models for SPM

Above, we discussed extensions to PCA as plausible solutions to handle multivariate data series showing time-dependent behavior. Besides this obvious choice, we will present a different, general approach which has not been fully explored in an SPM setting. The approach we will advocate has its roots in econometrics, a discipline where nonstationary time series are a frequent issue.

3.1 Illustrating examples

Granger, a Nobel Prize winner for his work on nonstationary time series and causality, showed that when stationarity cannot be assumed, correlations amongst variables are often spurious ("spurious regressions") and the asymptotic behavior of classical tests does not hold (Granger and Newbold, 1974). This dictates the need for alternatives. The behavior of spurious correlations is demonstrated using 1000 realizations of a simple i.i.d. process consisting of 1000 data points, versus 1000 realizations of a random walk of 1000 data points, which is actually the integration of the i.i.d. process. We will denote the i.i.d. process by I(0), and the random walk by I(1), since it is a first order integrated (I) process having no autoregressive (AR) nor moving average (MA) term. Both processes are visualized in Figure 9.

Evidently, both the i.i.d. processes and the random walks are purely driven by randomness, and any correlation between the different realizations is random. However, when we plot the sample correlations between the I(0) series on the one side and the I(1) series on the other side, we see that they behave very differently (Figure 10). For the i.i.d. case, 5.5% of the correlations are significant at α = 0.05, completely within expectation. For the random walks this percentage is much higher: 91% of the correlations are

Figure 9: 1000 white noise (i.i.d.) sequences of length 1000 (left); 1000 random walk sequences of length 1000 (right).

significantly different from zero. This simple example shows that classical statistical tests do not hold in the presence of nonstationarity, and alternatives are needed. Granger and co-workers developed tests to check whether the correlation between nonstationary series is spurious or not. If it is, no statistical analysis is meaningful; but if the series are causally related, several techniques have been proposed to analyze their relation under the assumption of nonstationarity. This last point is interesting because in reality we most often acquire signals which all relate to different characteristics of a given process, so that they are expected to be causally related (namely, through the underlying process).
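The spurious regression effect is easy to reproduce. The sketch below (our own; it uses the normal-approximation threshold |r| > 1.96/√n for 5% significance instead of an exact test) contrasts i.i.d. sequences with random walks:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 1000, 1000
thresh = 1.96 / np.sqrt(n)        # approx. 5% two-sided threshold for r

def frac_significant(make_series):
    """Fraction of independent pairs whose sample correlation exceeds thresh."""
    hits = 0
    for _ in range(reps):
        r = np.corrcoef(make_series(), make_series())[0, 1]
        hits += abs(r) > thresh
    return hits / reps

iid = lambda: rng.normal(size=n)                 # I(0)
walk = lambda: np.cumsum(rng.normal(size=n))     # I(1): integrated I(0)

frac_iid, frac_walk = frac_significant(iid), frac_significant(walk)
print("i.i.d.:      ", frac_iid)     # close to the nominal 0.05
print("random walk: ", frac_walk)    # far above 0.05
```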

Figure 10: Pearson correlation coefficients for the I(0) (top) and the I(1) (bottom) situation.


An example of such nonstationary series from an industrial application is depicted in Figure 11, where seven temperature profiles are shown. Those temperatures are used to monitor the bearings of a rotating machine. Because of the variable load of the machine, the fluctuating ambient temperature and the wind speed, the temperatures vary widely as a function of time. The large drop nearly halfway through the series is related to a stand-still and restart of the machine. Given those fluctuations, which are observable but uncontrollable and unpredictable, it is expected that the temperature will rise when problems occur with the bearings. This is typically seen as short increases due to temporary blocking, and those are most often much smaller than the observed temperature variation under normal operation. Such data are challenging because of the nonstationarity, and because they do not allow one to simply subtract an average trajectory, as is often done in batch processes.

Figure 11: Machine temperature change over time at 7 locations.

In the following paragraphs, we will briefly introduce tests for (Granger) causality and the basic form of cointegration, a framework to analyse the relation between nonstationary series. We will use the practical case of the temperature profiles to show the merits of this approach. As the goal is only to set the scene without going into depth, we will explain the general methodology based on two temperature series, and will provide guidelines for generalization later on.

3.2 Cointegration basics

Suppose we have a process for which two time series are acquired, xt and yt, and suppose they are integrated of order 1, I(1). The regression of yt on xt,

yt = c0 + c1 xt + εt,        (11)

will yield high correlations, but as discussed before those might be spurious. If εt is also I(1), the OLS estimators become inconsistent (Phillips, 1986). However, when the above regression yields stationary residuals, the OLS estimators are consistent, and we say that xt and yt are cointegrated, so that further analysis makes sense. Formally, we want to test

H0: xt and yt are not cointegrated,
H1: xt and yt are cointegrated.        (12)

Under H0, the absence of cointegration between xt and yt is assumed, so if we reject it we have evidence of cointegration. Only in that case does estimation of the relation given in Eq. 11 make sense. This relation is called the equilibrium, or long run relation (Engle and Granger, 1987). The approach of fitting the equilibrium relation and checking for stationarity of the residuals is often referred to as the Engle-Granger approach (Engle and Granger, 1987). The check for the absence of a unit root in the residuals can be performed using the Augmented Dickey-Fuller (ADF) test (Fuller, 1976; Phillips, 1987). The ADF test is illustrated for two random walks (Figure 12) from the earlier example displaying a high correlation (r = 0.74). Applying the ADF test to the residuals of the regression between both random walks results in not rejecting the null hypothesis (p > 0.05), and thus in the conclusion that the series are not cointegrated.

If we now take two temperature profiles from the industrial dataset, which clearly display nonstationarity, we observe a high correlation as well (r = 0.92). However, performing the ADF test


Figure 12: Example of two random walks displaying a high correlation (left), and the regression of Series 2 on Series 1 (right).

rejects the null hypothesis (p < 0.05), implying that both series are cointegrated, and that the long term relation between both series makes sense. This relation is visualized in Figure 13 and, although it succeeds in describing the global relation, at a shorter time span the series seem to diverge and converge again. In fact, the two series xt and yt being cointegrated means that they are attracted towards the long term equilibrium given in Eq. 11 (Maddala and Kim, 2003). The long term relationship reflects the general behavior of the machine temperature as a function of load and ambient temperature ("steady state"), whereas the short term behavior is typical of the transient behavior. When the load of the machine is altered, the temperature will start changing, but this change depends, among other things, on the exact location in the machine and the loading type.

However, the stationary residuals of this long term relation are not sufficient for monitoring the process and detecting the out of control points caused by the blocking of the bearings. If xt and yt are cointegrated, an Error Correction Model (ECM) is useful for describing the short term dynamics between them. This ECM is given by

Δyt = α + β Δxt + γ(yt−1 − c0 − c1 xt−1) + δt.        (13)

Figure 13: Long term relationship between the two temperature channels.

The parameter γ measures the speed of convergence of the two series towards the long term equilibrium, and should have a negative value to be interpretable (indeed, a negative value means that the series are attracted towards the equilibrium). α, β and γ are parameters to be estimated, and δt is the error term, which is assumed to be i.i.d. Fitting this ECM is typically performed in two steps. First, c0 and c1 are estimated using the long term relation of Eq. 11. Then, their estimates are plugged into Eq. 13 to yield α, β and γ. Fitting the ECM on the temperature dataset, as shown in Figure 14, results in an almost perfect fit of xt and yt, so that the residuals are expected to be highly informative. Formally checking those residuals leads to the conclusion that they are i.i.d. After fitting the ECM on the given dataset, it was tested on a separate dataset. The residuals for this validation set are shown in Figure 15. They are clearly stationary (ADF, p < 0.001) and the fault states which occurred after 250, 1,200, 4,950, 5,700 and 5,900 samples are easily discernible using simple control charts.
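The two-step Engle-Granger/ECM procedure can be sketched on synthetic data standing in for the two temperature series (all coefficients and names below are our own; in practice the stationarity of the residuals e would additionally be checked with an ADF test, e.g. statsmodels.tsa.stattools.adfuller):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
# two cointegrated I(1) series driven by a common random walk
common = np.cumsum(rng.normal(size=n))
x = common + rng.normal(scale=0.5, size=n)
y = 2.0 + 1.5 * common + rng.normal(scale=0.5, size=n)

# step 1: long run relation y_t = c0 + c1 x_t + e_t, estimated by OLS
A = np.column_stack([np.ones(n), x])
c0, c1 = np.linalg.lstsq(A, y, rcond=None)[0]
e = y - (c0 + c1 * x)              # stationary if x and y are cointegrated

# step 2: ECM on the differenced series, as in Eq. 13
dy, dx = np.diff(y), np.diff(x)
B = np.column_stack([np.ones(n - 1), dx, e[:-1]])
alpha, beta, gamma = np.linalg.lstsq(B, dy, rcond=None)[0]
print(f"c1 = {c1:.2f}, gamma = {gamma:.2f} (negative: pull to equilibrium)")
```

On real data, the residuals δt of this fit are the quantity to chart, as in Figure 15.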

3.3 Discussion and future perspectives

For the simple example of a bivariate, nonstationary series, we used cointegration to turn the data into a simple, univariate, stationary series of residuals that is easy to monitor. As such, the example proves the potential of cointegration and


Figure 14: The temperature dataset: short term relationship through the Error Correction Model. A zoomed view is added to show the difference between the data and the estimates.

Figure 15: Two temperature series for validating the model (left). Residuals of the ECM for the validation set, with fault states clearly visible (right).

the related models to cope with nonstationarity. However, to turn this simple example into a robust solution for contemporary processes, where the dimensionality is often substantially higher, additional research is required. We list here some extensions, issues and potential avenues for further research.

Although we used only the bivariate example, the cointegration framework extends well beyond this and is capable of modelling multivariate time series. In that case there is possibly more than one cointegration equation that needs to be estimated. Johansen (1991) developed a test to determine in a sequential way the number of cointegration equations that describe the data (the Johansen test). When the multivariate series are cointegrated, the ECM we discussed before can be expanded to a Vector ECM (VECM), much like an AR process is expanded to a VAR model. In fact, the VECM is much like a VAR model for the series in differences (first derivative), but with the addition of the cointegration equations that are found significant using the Johansen test.

A potential issue arises when we want to apply the cointegration principle to multivariate processes where a substantial number of variables are measured. In such cases, the number of parameters is often prohibitive and can even be larger than the number of samples taken. This is further complicated if the cross-correlation between the observed variables is high, so that the effective sample size is further reduced. This situation is becoming increasingly common, for instance in cases where spectral data (near infrared, mass spectrometry, vibration spectra, ...) are acquired. It has inspired several researchers to develop adapted techniques, which are most often based on penalization (e.g. the lasso (Song and Bickel, 2011)). An appealing approach to tackle the dimensionality issue is the use of Principal Component Analysis in combination with cointegration. Pena and Poncela (2006) described the use of PCA in estimating VAR models and tested it successfully on a real-life dataset, although it only featured seven variables, providing a compelling case for an extension of the VECM in this direction.

Another potential disadvantage of the cointegration assumptions is the fact that the parameters of the long term relationship dictated by the cointegration equation (Eq. 11) do not change over time, and neither do the parameters of the ECM. If this assumption does not hold, the residuals of the cointegration approach will no longer be stationary and monitoring is impeded. Hansen and Johansen (1999) described tests for the evaluation of parameter constancy in cointegrated vector autoregressive models, and proposed two different ways of re-estimating the parameters of the model. This


approach seems to be relevant for industrial applications, but lacks the ability to cope with high-dimensional signals. This leads to the logical suggestion of combining the adaptive PCA models we have described above with the cointegration approach of this section.

4 Conclusions

In this paper we have highlighted the variety of challenges posed by time-dependent processes. In Section 2, we showed that monitoring high-dimensional processes with autocorrelation can be successfully achieved using PCA-based methods. However, for the ARI(1,1) processes, we found that even methods which are purportedly designed to address nonstationarity (RPCA and MWPCA) have difficulties in specifying a suitable model for that process type. Extensions to RPCA and MWPCA exist in the literature (for examples, see De Ketelaere et al. (2015)) that may offer improvements over the basic implementations. However, these methods have not been thoroughly compared in the literature, so it is difficult to recommend one in particular.

In Section 3, we proposed that one potential step towards accurate modeling of nonstationary processes could come from approaches that make use of the concept of cointegration. When appropriate, these approaches may result in a more valid model of the processes than adaptive methods like RPCA or MWPCA. However, classical methods for cointegrated data, such as the VECM, do not scale well to the high-dimensional setting. For this reason, further research on the integration of cointegration methodology into latent variable-based methods may be useful.
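A toy illustration of why latent-variable methods are natural partners for cointegration: when many observed variables share a few common stochastic trends, PCA on the raw data concentrates the nonstationarity in the leading components, while the trailing components, which correspond to cointegrating directions, remain stationary and low-variance. The sketch below uses simulated data with an assumed mixing structure; it is illustrative only, not the estimator of Peña and Poncela (2006).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 2000, 10, 2   # samples, observed variables, common stochastic trends

# p nonstationary variables driven by only r common random-walk trends
trends = np.cumsum(rng.standard_normal((n, r)), axis=0)
mixing = rng.standard_normal((r, p))
X = trends @ mixing + 0.5 * rng.standard_normal((n, p))

# PCA via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
scores = Xc @ Vt.T

print(f"variance captured by the first {r} PCs: {explained[:r].sum():.3f}")
print(f"std of the trailing PC (a cointegrating direction): {scores[:, -1].std():.2f}")
```

A cointegration analysis (e.g. a Johansen-type test, or a VECM) could then be carried out in the r-dimensional score space instead of the p-dimensional observation space, which is the kind of combination argued for above.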

References

Bersimis, S., Psarakis, S., and Panaretos, J. (2007). "Multivariate statistical process control charts: an overview". Quality and Reliability Engineering International, 23(5), pp. 517–543.

Box, G. E. P. (1954). "Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems, I. Effect of Inequality of Variance in the One-Way Classification". The Annals of Mathematical Statistics, 25(2), pp. 290–302.

Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994). Time Series Analysis: Forecasting and Control. Prentice-Hall, New Jersey, 3rd edition.

Burnham, A. J., MacGregor, J. F., and Viveros, R. (1999). "Latent variable multivariate regression modeling". Chemometrics and Intelligent Laboratory Systems, 48(2), pp. 167–180.

Camacho, J., Picó, J., and Ferrer, A. (2009). "The best approaches in the on-line monitoring of batch processes based on PCA: Does the modelling structure matter?". Analytica Chimica Acta, 642(1-2), pp. 59–68.

Chen, Q., Kruger, U., and Leung, A. Y. T. (2009). "Cointegration Testing Method for Monitoring Nonstationary Processes". Industrial & Engineering Chemistry Research, 48(7), pp. 3533–3543.

Choi, S., Martin, E., Morris, A., and Lee, I.-B. (2006). "Adaptive Multivariate Statistical Process Control for Monitoring Time-Varying Processes". Industrial & Engineering Chemistry Research, 45, pp. 3108–3118.

De Ketelaere, B., Hubert, M., and Schmitt, E. (2015). "Overview of PCA-based statistical process monitoring methods for time-dependent, high-dimensional data". Journal of Quality Technology.

De Ketelaere, B., Mertens, K., Mathijs, F., Diaz, D., and Baerdemaeker, J. (2011). "Nonstationarity in statistical process control: issues, cases, ideas". Applied Stochastic Models in Business and Industry, 27(4), pp. 367–376.

Engle, R. F. and Granger, C. W. J. (1987). "Co-Integration and Error Correction: Representation, Estimation, and Testing". Econometrica, 55(2), pp. 251–276.

Ferrer, A. (2007). "Multivariate Statistical Process Control Based on Principal Component Analysis (MSPC-PCA): Some Reflections and a Case Study in an Autobody Assembly Process". Quality Engineering, 19(4), pp. 311–325.

Ferrer, A. (2014). "Latent Structures-Based Multivariate Statistical Process Control: A Paradigm Shift". Quality Engineering, 26(1), pp. 72–91.

Fuller, W. A. (1976). Introduction to Statistical Time Series. John Wiley and Sons, New York.

Granger, C. and Newbold, P. (1974). "Spurious regressions in econometrics". Journal of Econometrics, 2, pp. 111–120.

Hansen, H. and Johansen, S. (1999). "Some tests for parameter constancy in cointegrated VAR-models". Econometrics Journal, 2(2), pp. 306–333.

Jackson, J. E. and Mudholkar, G. S. (1979). "Control Procedures for Residuals Associated With Principal Component Analysis". Technometrics, 21(3), pp. 341–349.

Johansen, S. (1991). "Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models". Econometrica, 59(6), pp. 1551–1580.

Jolliffe, I. (2002). Principal Component Analysis. Springer, New York, 2nd edition.

Jyh-Cheng, J. (2010). "Adaptive process monitoring using efficient recursive PCA and moving window PCA algorithms". Journal of the Taiwan Institute of Chemical Engineers, 44, pp. 475–481.

Kourti, T. (2005). "Application of latent variable methods to process control and multivariate statistical process control in industry". International Journal of Adaptive Control and Signal Processing, 19(4), pp. 213–246.

Kruger, U. and Xie, L. (2012). Advances in Statistical Monitoring of Complex Multivariate Processes: with Applications in Industrial Process Control. John Wiley and Sons Ltd.

Kruger, U., Zhou, Y., and Irwin, G. W. (2004). "Improved principal component monitoring of large-scale processes". Journal of Process Control, 14(8), pp. 879–888.

Ku, W., Storer, R. H., and Georgakis, C. (1995). "Disturbance detection and isolation by dynamic principal component analysis". Chemometrics and Intelligent Laboratory Systems, 30(1), pp. 179–196.

Lakshminarayanan, S., Shah, S. L., and Nandakumar, K. (1997). "Modeling and Control of Multivariable Processes: Dynamic PLS Approach". AIChE Journal, 43(9), pp. 2307–2322.

Li, W., Yue, H. H., Valle-Cervantes, S., and Qin, S. J. (2000). "Recursive PCA for adaptive process monitoring". Journal of Process Control, 10(5), pp. 471–486.

Maddala, G. and Kim, I. (2003). Unit Roots, Cointegration, and Structural Change. Cambridge University Press, UK, 5th edition.

Montgomery, D. (2008). Introduction to Statistical Quality Control. Wiley Desktop Editions Series. Wiley.

Oppenheim, A. V. and Schafer, R. W. (1975). Digital Signal Processing. Prentice-Hall, New York.

Peña, D. and Poncela, P. (2006). "Dimension Reduction in Multivariate Time Series". Statistics for Industry and Technology, chapter 28, pp. 433–458. Birkhäuser, Boston.

Phillips, P. (1986). "Understanding Spurious Regressions in Econometrics". Journal of Econometrics, 33, pp. 311–340.

Phillips, P. (1987). "Time series regression with unit roots". Econometrica, 55, pp. 277–301.

Ramaker, H.-J., van Sprang, E. N., Westerhuis, J. A., and Smilde, A. K. (2005). "Fault detection properties of global, local and time evolving models for batch process monitoring". Journal of Process Control, 15(7), pp. 799–805.

Rato, T., Schmitt, E., De Ketelaere, B., Hubert, M., and Reis, M. (2015). "A Systematic Comparison of PCA-based Statistical Process Monitoring Methods for High-dimensional, Time-dependent Processes". AIChE Journal, accepted.

Rato, T. J. and Reis, M. S. (2013a). "Defining the structure of DPCA models and its impact on process monitoring and prediction activities". Chemometrics and Intelligent Laboratory Systems, 125, pp. 74–86.

Rato, T. J. and Reis, M. S. (2013b). "Fault detection in the Tennessee Eastman benchmark process using dynamic principal components analysis based on decorrelated residuals (DPCA-DR)". Chemometrics and Intelligent Laboratory Systems, 125, pp. 101–108.

Schmitt, E., De Ketelaere, B., Rato, T., and Reis, M. (2014). "The Challenges of PCA-Based Statistical Process Monitoring: An Overview and Solutions". ENBIS, Linz, Austria, 21-25 September 2014.

Song, S. and Bickel, P. J. (2011). "Large vector auto regressions". arXiv preprint arXiv:1106.3915v1.

Valle, S., Li, W., and Qin, S. J. (1999). "Selection of the Number of Principal Components: The Variance of the Reconstruction Error Criterion with a Comparison to Other Methods". Industrial & Engineering Chemistry Research, 38(11), pp. 4389–4401.

van Sprang, E. N., Ramaker, H.-J., Westerhuis, J. A., Gurden, S. P., and Smilde, A. K. (2002). "Critical evaluation of approaches for on-line batch process monitoring". Chemical Engineering Science, 57(18), pp. 3979–3991.

Vanhatalo, E. and Kulahci, M. (2015). "The effect of autocorrelation on the Hotelling T² control chart". Quality and Reliability Engineering International, doi:10.1002/qre.1717.

Wang, X., Kruger, U., and Irwin, G. W. (2005). "Process Monitoring Approach Using Fast Moving Window PCA". Industrial & Engineering Chemistry Research, 44(15), pp. 5691–5702.

Wikström, C., Albano, C., Eriksson, L., Johansson, E., Sandberg, M., Kettaneh-Wold, N., Fridén, H., Rännar, S., Nordahl, A., and Wold, S. (1998). "Multivariate process and quality monitoring applied to an electrolysis process: Part II. Multivariate time-series analysis of lagged latent variables". Chemometrics and Intelligent Laboratory Systems, 42, pp. 233–240.

Woodall, W. and Montgomery, D. (2014). "Some Current Directions in the Theory and Application of Statistical Process Monitoring". Journal of Quality Technology, 46(1), pp. 78–94.
