
Multivariate process monitoring and fault identification using multiple decision tree classifiers


International Journal of Production Research. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tprs20

Multivariate process monitoring and fault identification using multiple decision tree classifiers
Shuguang He (a), Gang Alan Wang (b), Min Zhang (a) & Deborah F. Cook (b)
(a) School of Management, Tianjin University, Tianjin, P.R. China
(b) Department of Business Information Technology, Pamplin College of Business, Virginia Tech, Blacksburg, USA
Published online: 03 Apr 2013.

To cite this article: Shuguang He, Gang Alan Wang, Min Zhang & Deborah F. Cook (2013): Multivariate process monitoring and fault identification using multiple decision tree classifiers, International Journal of Production Research, DOI: 10.1080/00207543.2013.774474

To link to this article: http://dx.doi.org/10.1080/00207543.2013.774474


Multivariate process monitoring and fault identification using multiple decision tree classifiers

Shuguang Hea, Gang Alan Wangb, Min Zhanga* and Deborah F. Cookb

aSchool of Management, Tianjin University, Tianjin, P.R. China; bDepartment of Business Information Technology, Pamplin College of Business, Virginia Tech, Blacksburg, USA

(Received 13 February 2012; final version received 31 January 2013)

Machine learning algorithms, such as decision tree (DT) classifiers, have been applied to automated process monitoring and fault identification in manufacturing processes. However, current DT-based process control models employ a single DT classifier for both mean shift detection and fault identification. As many manufacturing processes use automated data collection for multiple process parameters, a single DT classifier would have to handle a large number of classes, and previous research shows that a large number of classes can degrade the accuracy of a DT multiclass classifier. In this study we propose a new process monitoring model that uses multiple DT classifiers, each handling a small number of classes. Moreover, we not only detect mean shifts but also identify the process variability levels that may cause out-of-control signals. Experimental results show that the proposed model achieves satisfactory performance in process monitoring and fault identification under various parameter settings, and it achieves better ARL performance than the baseline method based on a single DT classifier.

Keywords: multivariate analysis; process monitoring; fault identification; decision tree; statistical process control

1. Introduction

Statistical process control (SPC) is widely used to detect the presence of special causes of variation in a manufacturing or service process by identifying shifts in the process parameters. A modern manufacturing process commonly involves multiple correlated quality variables (Yang and Trewn 2004). Process monitoring with multiple correlated variables is referred to as multivariate quality control (Montgomery 2007) or multivariate statistical process control (MSPC). Hotelling's T2 control chart is one of the earliest multivariate control charts for detecting mean shifts (Hotelling 1947) and is the most commonly used MSPC chart. However, the T2 chart has been found to be insensitive to small shifts (Lowry et al. 1992). Other MSPC techniques, such as multivariate exponentially weighted moving average (MEWMA) charts (Lowry et al. 1992; Kramer and Schmid 1997; Prabhu and Runger 1997; Hawkins, Choi, and Lee 2007; Reynolds, Marion, and Kim 2005; Zou and Tsung 2008) and multivariate cumulative sum (MCUSUM) charts (Woodall and Ncube 1985; Crosier 1988; Pignatiello and Runger 1990; Runger and Testik 2004), are multivariate extensions of their univariate counterparts and are known to be more sensitive to small and medium mean shifts than the T2 chart. Although these control charts can successfully identify process shifts, they do not provide any information on the causes of the shifts, and such information is essential for improved process control.

The procedure of finding the causes of out-of-control signals is referred to as MSPC diagnosis or fault identification. Guh and Shiue (2008) identified two types of fault identification techniques, namely statistics-based and learning-based methods. Some statistics-based methods, including principal component analysis (PCA) (Fuchs and Benjamini 1994; Jackson 1985, 1991), multi-way PCA (Nomikos and MacGregor 1994) and multi-way kernel PCA (Lee, Yoo, and Lee 2004), decompose MSPC statistics into various independent components and identify the components most responsible for the out-of-control signals. Other statistics-based techniques, such as the eigenspace comparison method (Jin and Zhou), the Mason-Young-Tracy (MYT) method (Mason, Tracy, and Young 1995, 1997) and step-down analysis (Sullivan et al. 2007), try to identify the source that is most likely responsible for the shifts. However, the implementation of these techniques often involves advanced and complex statistical procedures (Guh and Shiue 2008) and requires extensive computation (Bersimis, Psarakis, and Panaretos 2007).

*Corresponding author. Email: [email protected]


With automated data collection and analysis techniques widely adopted in manufacturing processes, it has become increasingly popular to apply machine-learning algorithms to automated process monitoring and fault identification. Artificial neural networks (ANN) and decision tree (DT) algorithms are the two most commonly used machine-learning algorithms. ANN has been extensively used to detect mean and variance shifts in univariate processes (Zorriassatine and Tannock 1998). Although it can effectively monitor and diagnose processes in a multivariate context (Hou, Liu, and Lin 2003; Niaki et al. 2005; Guh 2007; Yu, Xi, and Zhou 2008; Yu and Xi 2009), ANN is subject to two major criticisms (Warner and Misra 1996). First, the convergence to a solution can be very slow and depends upon the neural network's initial condition. Second, the parameters in a trained ANN model are often difficult to interpret. The DT learning algorithm provides a better way of presenting classification rules, and the application of DT in SPC has also been examined in recent years. In a univariate scenario, a DT-based approach outperformed an ANN-based approach for a control chart pattern recognition problem in terms of recognition accuracy and average run length (ARL) (Guh 2005; Guh and Shiue 2005). Guh and Shiue (2008) proposed a DT-based model for multivariate process monitoring and fault identification. Their experiments showed that the model was more efficient than an ANN-based approach for detecting mean shifts and diagnosing the causes of the shifts in multivariate control charts. They implemented a single DT multiclass classifier to identify multiple classes, including the in-control status as well as the causes of out-of-control signals (i.e. shift patterns). However, a study on multiclass classification showed that the more classes a multiclass classifier handles, the more likely the classifier's prediction accuracy is to degrade (Li, Zhang, and Ogihara 2004). The performance of Guh and Shiue's model is expected to degrade even more in a multivariate process with many process quality characteristics because the number of classes increases exponentially. Moreover, their study assumed that the variance-covariance matrix of the quality characteristics remains constant over time. Although this assumption simplifies the problem, the variance-covariance matrix is likely to change over time in practice.

In this study we propose a new MSPC model for multivariate process monitoring and fault identification. Our contributions to the MSPC field are twofold. First, we propose to use a combination of DT classifiers, each handling a small number of classes (no more than three). When the number of process characteristics increases, only the number of DT classifiers increases; the performance of our model does not degrade because each DT classifier still handles a small number of classes. Our second contribution lies in the removal of the constant variance-covariance assumption by allowing the variances of the process characteristics to change. Removing this restriction results in a more realistic representation of manufacturing conditions, although it also greatly elevates the complexity of a DT-based model because the number of shift patterns increases significantly.

The rest of this paper is organised as follows. We review the single DT-based model for multivariate process monitoring and fault identification in Section 2. We propose a new model based on multiple DT classifiers in Section 3. In Section 4 we discuss empirical evaluations of our model and report on a sensitivity analysis using various sample sizes and correlation coefficient settings. In Section 5 we report the results of a generalising study in which our model is applied to contexts different from those used in model training, and we compare the performance of our model with that of the single DT-based model. Finally, we conclude our study and suggest future research directions in Section 6.

2. Related work

Guh and Shiue (2008) proposed a DT learning-based model for online detection of mean shifts in multivariate control charts. The model consists of two major modules, data pre-processing and DT learning and classifying, and rests on three assumptions. First, the process variables are assumed to follow a multivariate normal distribution with known parameters. Second, the covariance matrix is held constant. Lastly, a sequence of individual observations is available from the process. In this section we briefly introduce the two modules in Guh and Shiue's model and the motivation of our study.

The data pre-processing module implements standardisation and coding to transform the process data into a form suitable for DT learning and classifying. Standardisation transforms process data into a constant range because a DT can analyse only a certain range of input data. Coding is then implemented to reduce the effect of common cause variation (noise) by dividing the value range of a process variable into a number of zones of equal width. The intention is to filter small random variations and to yield a smaller DT classifier. The output of the pre-processing module is a set of data vectors, each of which contains an input vector and an output class label that can be used for DT learning and classifying. When p process quality variables are observed, the output from the DT model involves 3^p classes because each variable has three possible shift statuses: no shift, an upward shift and a downward shift. With supervised learning, the DT learning module employs the C4.5 algorithm to create a decision tree from a set of learning examples. The decision tree can then be used to classify future process observations into one of these classes.
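As a concrete illustration of this pre-processing step, the minimal sketch below standardises raw observations and maps them to equal-width zone codes. The zone width and clipping limit are illustrative assumptions; Guh and Shiue (2008) do not prescribe specific values here.

```python
import numpy as np

def standardise_and_code(x, mu0, sigma0, zone_width=0.5, z_limit=3.0):
    """Standardise raw observations and code them into equal-width zones.

    x       : (n, p) array of raw observations
    mu0     : (p,) in-control mean vector
    sigma0  : (p,) in-control standard deviations
    zone_width, z_limit : assumed coding parameters (not specified in the paper)
    """
    z = (x - mu0) / sigma0                        # standardisation
    z = np.clip(z, -z_limit, z_limit)             # keep values in a constant range
    codes = np.floor(z / zone_width).astype(int)  # equal-width zone coding
    return codes

# Example: two correlated quality variables, five observations
rng = np.random.default_rng(0)
raw = rng.multivariate_normal([10.0, 5.0], [[4.0, 1.0], [1.0, 1.0]], size=5)
print(standardise_and_code(raw, mu0=np.array([10.0, 5.0]), sigma0=np.array([2.0, 1.0])))
```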


As described earlier, the classification accuracy of a DT multiclass classifier degrades as the number of classes increases. In a multivariate process the number of classes increases exponentially, which would dramatically degrade the classification accuracy of the Guh and Shiue model. Moreover, the primary focus of Guh and Shiue's model is on detecting the mean shifts of process variables, and their assumption of a constant covariance matrix may oversimplify the effect of process variability on multivariate process monitoring. As pointed out in previous studies (Djauhari 2005; Montgomery 2007), process variability is an additional vital consideration in process monitoring and fault identification. Therefore, the need to improve the performance of the DT multiclass classifier and to monitor process variability motivates this research effort.

3. A new model for multivariate process monitoring and fault identification using multiple DT classifiers

Our proposed model is designed for a multivariate manufacturing process where either grouped or individual observations are collected. The model is built on three assumptions. Firstly, when a process is in control, its quality variables are assumed to follow a multivariate normal distribution with known parameters, namely a mean vector μ0 and a variance-covariance matrix Σ0. The parameters can be deemed known when the in-control samples are large enough to estimate them accurately. Secondly, we assume that the correlation coefficients between the quality variables remain constant while the variances of the variables can shift upward; the practical implication is that upward shifts in variances are more common than downward shifts in manufacturing processes. Lastly, we only consider abrupt shifts, where the quality variables before and after a shift can reasonably be modelled as independently and identically distributed variables (Sullivan et al. 2007).

3.1 DT learning

DT learning is a tree-based predictive model that maps samples to their correct classes (Quinlan 1986, 1987). A DT classifier has a tree structure that represents classification rules. There are three types of nodes in a DT classifier: a root node, non-terminal nodes and leaf nodes. The root node represents all training cases, a non-terminal node represents a subset of the training cases, and a leaf node represents a subset of the training cases with a specific classification label. Both root and non-terminal nodes include an attribute value test that divides training cases into two or more subsets based on the test result. Each path from the root node to a leaf node represents a classification rule. The main advantages of DT learning are its simplicity and efficiency: it can deal with a large amount of high-dimensional data with high computational efficiency, and the classification results are easy to understand and interpret. In addition, DT learning algorithms have the ability to solve nonlinear classification problems like ours (Quinlan 1996).

The procedure of building a DT classifier includes two steps: tree construction and pruning. The construction step is a recursive process beginning with the root node. A DT is built by dividing training cases into subsets based on an attribute value test; the process is repeated on each subset recursively, and the recursion stops when the subset at a node has a single classification label. The pruning step simplifies a DT by removing redundant branches in order to avoid overfitting. C4.5 (Quinlan 1996) is the most widely used algorithm for supervised DT learning. C5.0 (RuleQuest 2009) is an improved version of C4.5 with many advantages, including higher computing efficiency, more efficient memory usage and less complex decision trees. New functions such as boosting and variable misclassification costs are also introduced in C5.0.
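The experiments in this paper use C5.0, which is a commercial tool. As a minimal, hedged sketch of the construct-then-prune procedure described above, the following uses scikit-learn's CART implementation with cost-complexity pruning as a stand-in; the synthetic data, feature names and pruning parameter are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training set: rows are feature vectors, labels are process classes
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 1.0).astype(int)  # synthetic 2-class label

# Grow the tree, then prune it (cost-complexity pruning plays the role of the
# pruning step described above; C5.0 itself is not used in this sketch)
clf = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

# Each root-to-leaf path corresponds to one classification rule
print(export_text(clf, feature_names=["f1", "f2", "f3", "f4"]))
```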

3.2 The proposed model framework

Our proposed model consists of 2p + 1 DT classifiers: one for process monitoring, p for mean shift identification, denoted DTM1, DTM2, ..., DTMp, and p for variance shift identification, denoted DTS1, DTS2, ..., DTSp. There are three major modules, illustrated in Figure 1: pre-processing, process monitoring and fault identification. We provide a brief introduction to each module and discuss the details in the following sections.

3.2.1 Preprocessing

The learning and testing processes of the DT classifiers require training data with various mean and variance shift patterns and shift magnitudes. However, the observations collected in a manufacturing process are often inadequate because they do not cover all mean and variance shift patterns and shift magnitudes of interest. We assume that the quality variables in a multivariate manufacturing process follow a multivariate normal distribution with known mean vector and variance-covariance matrix. Historical process observations, if available in a large amount, can provide a relatively


accurate estimation of the distribution parameters of the quality variables. Given the estimated distribution statistics, we can generate random multivariate process data with various mean and variance shift patterns and designed magnitudes.

3.2.2 Process monitoring

This module is designed to detect out-of-control conditions in a manufacturing process. We propose a DT classifier, denoted DT1, to differentiate out-of-control data from in-control data. In the generated training data, in-control and out-of-control samples have class labels of 0 and 1, respectively. The trained DT1 classifier can then be used to classify future samples with unknown class labels for the purpose of process monitoring.

3.2.3 Fault identification

When an out-of-control condition is detected, we use the fault identification module to identify the potentially shifted variables. p DT classifiers (DTM1, DTM2, ..., DTMp) are specifically designed to identify the mean shifts (in x1, x2, ..., xp, respectively) that may occur in the p quality variables of a multivariate process. Similarly, p DT classifiers

Figure 1. The framework of the proposed multiple DT classifiers based model.

4 S. He et al.

Dow

nloa

ded

by [

Uni

vers

ity N

orth

Car

olin

a -

Cha

pel H

ill]

at 1

7:00

13

May

201

3

Page 6: Multivariate process monitoring and fault identification using multiple decision tree classifiers

(DTS1, DTS2, ..., DTSp) are designed to identify the variance shifts (in s1^2, s2^2, ..., sp^2, respectively) in the p variables.

We use the generated out-of-control instances associated with various mean and variance shift patterns and shift magnitudes to train the DT classifiers. The trained classifiers can therefore be used to classify a future out-of-control sample with an unknown cause into a particular mean and variance shift pattern.
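To make the division of labour among the 2p + 1 classifiers concrete, the sketch below shows how one sample's feature vector could be routed through a trained monitoring classifier and the mean/variance shift classifiers. The function and argument names are illustrative assumptions; it presumes classifiers with a scikit-learn-style predict method and the class codings of Table 2.

```python
import numpy as np

def monitor_and_diagnose(feature_vector, dt1, dtm_list, dts_list):
    """Apply the 2p+1 classifier scheme to one sample feature vector.

    dt1      : trained process-monitoring classifier (0 = in control, 1 = out of control)
    dtm_list : p trained mean-shift classifiers (0 = none, 1 = downward, 2 = upward)
    dts_list : p trained variance-shift classifiers (0 = none, 1 = upward)
    """
    v = np.asarray(feature_vector).reshape(1, -1)
    if dt1.predict(v)[0] == 0:
        return {"status": "in-control"}
    return {
        "status": "out-of-control",
        "mean_shifts": [int(dtm.predict(v)[0]) for dtm in dtm_list],
        "variance_shifts": [int(dts.predict(v)[0]) for dts in dts_list],
    }
```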

3.3 Generating random multivariate process data

The DT classifiers in our model need to be trained before we can apply the resulting classification rule set to online process control and fault identification. Similar to the work by Guh and Shiue (2008), we use a Monte Carlo simulation approach to generate random multivariate process samples for DT learning and testing. Let the in-control mean vector and variance-covariance matrix be μ0 and Σ0, respectively. We can generate a training data set Xi = (xi1, xi2, ..., xip), i = 1, 2, ..., n, assuming that the process variables follow a multivariate normal distribution. The generated data can be standardised as follows:

y_{ij} = (x_{ij} - \mu_{0j}) / \sigma_j, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p.   (1)

Standardisation is necessary in order to transform the original data into a constant range suitable for DT learning and classification (Guh and Shiue 2008). The transformed variables Y follow a normal distribution N(0, Σ_Y), where the variance of each variable in Y is 1. The standardised in-control mean vector becomes μ0 = 0, and the covariance σij equals the correlation coefficient ρij of variable i and variable j.

We assume that the in-control multivariate vectors follow a normal distribution N(0, Σ0). When a shift occurs at time t, the process data after time t are assumed to follow a normal distribution N(0 + Δμt, Σt) with shifted parameters, where Δμt = [λx1 λx2 ... λxp] and

\Sigma_t =
\begin{bmatrix}
\lambda_{s1}^2 & \lambda_{s1}\lambda_{s2}\rho_{12} & \cdots & \lambda_{s1}\lambda_{sp}\rho_{1p} \\
\lambda_{s1}\lambda_{s2}\rho_{12} & \lambda_{s2}^2 & \cdots & \lambda_{s2}\lambda_{sp}\rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{s1}\lambda_{sp}\rho_{1p} & \lambda_{s2}\lambda_{sp}\rho_{2p} & \cdots & \lambda_{sp}^2
\end{bmatrix},

where λxi and λsi (i = 1, 2, ..., p) denote the mean shift magnitudes and variance shift magnitudes of the process variables, respectively. Therefore, we can generate DT learning samples from the two distributions with various mean and variance shift patterns and shift magnitudes. Based on the desired quality specifications, we can specify f1 mean shift magnitudes that divide the interval [-3.0, -1.0] ∪ {0.0} ∪ [1.0, 3.0] into f1 - 1 zones, where the zones within [-3.0, -1.0] ∪ [1.0, 3.0] have approximately equal width. Small mean shift magnitudes between -1.0 and 1.0 are ignored because small shifts are usually trivial and would not cause serious consequences in practice (Guh and Shiue 2008).
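As a minimal sketch of this generation step, assuming NumPy and the standardised representation above (so the in-control variances equal 1), one sample with a given mean/variance shift pattern could be drawn as follows; the shift magnitudes and correlation matrix in the example are arbitrary.

```python
import numpy as np

def generate_shifted_sample(lam_x, lam_s, corr, m, rng):
    """Draw one sample of m standardised observations under a given shift pattern.

    lam_x : (p,) mean shift magnitudes (0 = no shift)
    lam_s : (p,) standard-deviation shift magnitudes (1 = no shift)
    corr  : (p, p) in-control correlation matrix of the standardised variables
    """
    lam_x, lam_s = np.asarray(lam_x, float), np.asarray(lam_s, float)
    # Shifted covariance: sigma_ij = lam_si * lam_sj * rho_ij (diagonal = lam_si^2)
    sigma_t = np.outer(lam_s, lam_s) * corr
    return rng.multivariate_normal(mean=lam_x, cov=sigma_t, size=m)

rng = np.random.default_rng(42)
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
sample = generate_shifted_sample(lam_x=[1.4, 0.0], lam_s=[1.0, 1.65], corr=corr, m=10, rng=rng)
print(sample.shape)   # (10, 2)
```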

For example, if we consider seven mean shift magnitudes, they will be (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0). Similarly, we can specify f2 variance shift magnitudes of interest that divide the interval [1.0, 3.0] into f2 - 1 zones of approximately equal width. Combining the mean and variance shift magnitudes of interest, there are in total f1^p x f2^p shift patterns, including one in-control situation (λxi = 0 and λsi = 1). Similar to the work of Guh and Shiue (2008), we classify the range of shift magnitudes into three categories: small shifts, moderate shifts and large shifts (Table 1).

Following the above procedure, we can generate N = N1 + (f1^p x f2^p - 1)N2 samples, where N1 is the number of in-control samples and N2 is the number of out-of-control samples for each combination. Each sample, denoted as a matrix

Table 1. Coding for shift magnitudes.

Coding               Mean shift                                                          Variance shift
Small shifts (S)     |λxi| <= 1.7, for i = 1, 2, ..., p                                  |λsi| <= 1.65, for i = 1, 2, ..., p
Moderate shifts (M)  1.7 < |λxi| <= 2.4 and |λxj| <= 2.4, for i = 1, ..., p and j != i   1.65 < |λsi| <= 2.3 and |λsj| <= 2.3, for i = 1, ..., p and j != i
Large shifts (L)     At least one |λxi| > 2.4                                            At least one |λsi| > 2.3


Xi, whose jth row is (xij1, xij2, ..., xijp) for i = 1, 2, ..., N and j = 1, 2, ..., m, consists of m p-dimensional observation vectors. The mean vector and variance-covariance matrix of the ith sample can be calculated as X̄i and Si.

Existing process monitoring techniques rely on distance measures to differentiate out-of-control conditions from in-control ones. Conventional statistics-based approaches such as the χ2, MCUSUM and MEWMA charts use the statistical distance (Guh and Shiue 2008), also known as the Mahalanobis distance. On the other hand, Guh and Shiue (2008) found that the performance of learning-based approaches is more strongly related to the Euclidean distance than to the statistical distance (Guh 2007). In this study we consider both distance measures in our DT classifiers. The Mahalanobis distance, denoted D_i^2, measures the distance between an out-of-control mean vector X̄i and the in-control mean vector μ0. The Euclidean distance, denoted Ei, measures the distance between an out-of-control variance vector [s_{i1}^2 s_{i2}^2 ... s_{ip}^2] and the in-control variance vector, which consists of the main diagonal elements of Σ0.

D_i^2 = (\bar{X}_i - \mu_0)\,\Sigma_0^{-1}\,(\bar{X}_i - \mu_0)^{T}   (2)

E_i = \Bigl( \sum_{j=1}^{p} (s_{ij}^2 - 1)^2 \Bigr)^{1/2}   (3)

We can now calculate a feature vector Vi = [x̄i1 x̄i2 ... x̄ip s_{i1}^2 s_{i2}^2 ... s_{ip}^2 D_i^2 Ei] for each sample. Our DT classifiers rely on these features to detect out-of-control signals and diagnose shifted variables.

We use average run length (ARL) and classification accuracy (CR) to evaluate the performance of the process monitoring classifier DT1. CR alone is used to evaluate the performance of the fault identification classifiers DTMi (i = 1, 2, ..., p) and DTSi (i = 1, 2, ..., p). Both ARL and CR have been commonly used as performance measures for learning-based control chart pattern recognition problems (Guh 2005; Guh and Shiue 2005). ARL0 is the in-control ARL, i.e. the average number of observations before a control chart signals an out-of-control condition when the process is actually in control (a Type I error). ARL1 is the out-of-control ARL, calculated as the average number of observations needed for a control chart to signal an out-of-control condition when the process is indeed out of control. A good multivariate process monitoring procedure is expected to have a large ARL0 and a small ARL1. CR is computed as the ratio of the number of correctly classified samples to the total number of samples; a higher CR indicates better fault identification performance. Table 2 summarises the class scheme used by our DT classifiers.
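A minimal sketch of this feature extraction, assuming NumPy and a standardised sample as generated above (so the in-control variances equal 1), might look as follows; the function name is illustrative.

```python
import numpy as np

def feature_vector(sample, mu0, sigma0):
    """Compute V_i = [xbar_1..xbar_p, s2_1..s2_p, D^2, E] for one standardised sample.

    sample : (m, p) array of standardised observations
    mu0    : (p,) in-control mean vector (zeros after standardisation)
    sigma0 : (p, p) in-control covariance matrix of the standardised variables
    """
    xbar = sample.mean(axis=0)
    s2 = sample.var(axis=0, ddof=1)                     # sample variances
    diff = xbar - mu0
    d2 = float(diff @ np.linalg.inv(sigma0) @ diff)     # statistical distance, Eq. (2)
    e = float(np.sqrt(np.sum((s2 - 1.0) ** 2)))         # distance to unit variances, Eq. (3)
    return np.concatenate([xbar, s2, [d2, e]])
```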

4. Numerical experiments

In this section we report the performance of our proposed model for a bivariate case (p = 2) and a trivariate case (p = 3).

4.1 A bivariate case study (p = 2)

We used the C5.0 DT learning algorithm in this work. C5.0 uses a misclassification cost matrix that defines the cost of misclassifying one state into another. We arbitrarily defined the misclassification cost matrices shown in Figure 2 for the DT classifiers in order to penalise Type I errors; the costs are for demonstration purposes only. The cost matrix

Table 2. The class scheme of our DT classifiers.

Process monitoring                Coding
In-control                        0
Out-of-control                    1

Fault diagnosis: Mean shifts      Coding
No mean shift                     0
Downward mean shift               1
Upward mean shift                 2

Fault diagnosis: Variance shifts  Coding
No variance shift                 0
Upward variance shift             1


for DT1 can also be used to adjust the in-control performance of the proposed model. In practice, the costs can be determined based on the real costs of misclassification.
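scikit-learn's trees do not accept a C5.0-style cost matrix directly; as a rough, hedged substitute, asymmetric class weights can be used to penalise false alarms, as in the sketch below. The weights are illustrative assumptions, not the values from Figure 2.

```python
from sklearn.tree import DecisionTreeClassifier

# Penalise classifying an in-control sample (label 0) as out-of-control (label 1)
# more heavily than the reverse; the weights below are illustrative only.
dt1 = DecisionTreeClassifier(class_weight={0: 5.0, 1: 1.0}, random_state=0)
```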

Similar to the approach of Guh and Shiue (2008), we first examined the problem for p = 2. We arbitrarily chose f1 = 13 mean shift magnitudes of interest for λx1, λx2, ..., λxp: (-3.0, -2.6, -2.2, -1.8, -1.4, -1.0, 0.0, 1.0, 1.4, 1.8, 2.2, 2.6, 3.0). We also chose f2 = 4 variance shift magnitudes of interest for λs1, λs2, ..., λsp: (1, 1.65, 2.3, 3.0). We generated 64,060 training samples (N1 + (13^2 x 4^2 - 1) x N2) by arbitrarily setting N1 =

Figure 2. The misclassification cost matrices used in DT learning.

Table 3. Rules learned by DT1 when m = 10 and ρ = 0.5.

Rule ID  Rule antecedent                                                                Rule conclusion  Number of cases  Correct ratio (%)
1        D2 <= 0.936 and E <= 3.297 and s2^2 <= 2.235 and s1^2 <= 2.072                 0                9568             98.9
2        D2 <= 0.583 and E <= 3.297 and s1^2 > 2.072 and s1^2 <= 2.684 and s2^2 <= 2.235  0              242              93.4
3        D2 <= 0.506 and E <= 3.297 and s2^2 > 2.235 and s1^2 > 0.692                   0                141              92.2
4        D2 > 0.936 and D2 <= 1.161 and E <= 1.606                                      0                61               75.4
5        D2 > 0.304 and D2 <= 0.936 and E > 3.297 and E <= 4.059 and s2^2 > 1.445       1                72               87.5
6        D2 <= 0.936 and E > 4.059                                                      1                991              99.7
7        D2 > 1.161 and D2 <= 1.676 and E <= 2.231                                      1                131              77.9
8        D2 > 0.936 and D2 <= 1.676 and E > 2.231                                       1                1469             99.8
9        D2 > 1.676                                                                     1                51,166           100.0

Table 4. DT1 testing performance with different sample sizes and correlation coefficients.

Correlation coefficient  Sample size  Errors in in-control samples  Errors in out-of-control samples  ARL0    ARL1    CR (%)
ρ = 0.1                  5            161                           1226                              62.11   1.0232  97.83
                         10           44                            232                               227.27  1.0043  99.57
                         20           15                            64                                666.67  1.0012  99.88
ρ = 0.3                  5            199                           1136                              50.25   1.0215  97.92
                         10           51                            281                               196.08  1.0052  99.48
                         20           27                            66                                370.37  1.0012  99.85
ρ = 0.5                  5            228                           1090                              43.86   1.0206  97.94
                         10           49                            224                               204.08  1.0042  99.57
                         20           20                            48                                500.00  1.0009  99.89
ρ = 0.7                  5            201                           1002                              49.75   1.0189  98.12
                         10           43                            265                               232.56  1.0049  99.52
                         20           11                            49                                909.09  1.0009  99.91
ρ = 0.9                  5            133                           665                               75.19   1.0125  98.75
                         10           38                            178                               263.16  1.0033  99.66
                         20           26                            33                                384.62  1.0006  99.91


Table 5. DT1 testing performance (CR) with different shift patterns for p = 2.

Mean   Variance   Sample size = 5                         | Sample size = 10                        | Sample size = 20
shift  shift      ρ=0.1  ρ=0.3  ρ=0.5  ρ=0.7  ρ=0.9       | ρ=0.1  ρ=0.3  ρ=0.5  ρ=0.7  ρ=0.9       | ρ=0.1   ρ=0.3   ρ=0.5   ρ=0.7   ρ=0.9
N      N          98.39  98.01  97.72  97.99  98.67       | 99.56  99.49  99.51  99.57  99.62       | 99.85   99.73   99.80   99.89   99.74
N      S          46.67  30.00  43.33  43.33  45.00       | 50.00  43.33  46.67  41.67  50.00       | 66.67   60.00   55.00   66.67   78.33
N      M          79.00  75.00  75.00  72.00  83.00       | 90.00  90.00  89.00  88.00  96.00       | 98.00   99.00   100.00  98.00   99.00
N      L          91.43  92.14  91.43  91.43  94.29       | 99.29  98.57  99.29  95.71  98.57       | 100.00  100.00  100.00  100.00  100.00
S      N          67.08  71.04  72.08  77.29  82.71       | 88.33  84.58  88.54  88.96  92.08       | 96.67   97.50   99.58   97.50   98.33
S      S          78.61  80.63  84.79  85.42  91.81       | 93.26  93.13  94.93  94.65  95.21       | 98.26   98.33   98.82   99.31   99.44
S      M          92.04  91.63  91.88  93.13  96.58       | 99.21  98.50  98.75  98.50  99.63       | 99.96   99.79   99.96   99.92   99.92
S      L          96.64  96.76  97.17  97.32  98.54       | 99.76  99.67  99.94  99.79  99.79       | 100.00  100.00  99.97   100.00  100.00
M      N          95.71  98.13  97.50  97.41  97.41       | 99.91  100.00 100.00 99.38  99.73       | 100.00  100.00  100.00  100.00  100.00
M      S          96.99  97.44  97.50  97.56  98.10       | 99.82  99.88  99.70  99.67  99.73       | 100.00  100.00  100.00  99.94   99.97
M      M          97.91  98.13  98.25  98.30  98.73       | 99.95  99.88  99.89  99.75  99.93       | 100.00  100.00  100.00  99.98   100.00
M      L          99.07  99.22  98.84  99.11  99.43       | 99.99  99.96  99.99  99.94  99.97       | 100.00  100.00  100.00  100.00  100.00
L      N          99.83  100.00 99.94  100.00 100.00      | 100.00 100.00 100.00 100.00 100.00      | 100.00  100.00  100.00  100.00  100.00
L      S          99.87  99.89  99.75  99.70  99.70       | 100.00 100.00 99.98  100.00 99.98       | 100.00  100.00  100.00  100.00  100.00
L      M          99.88  99.69  99.74  99.61  99.76       | 100.00 99.99  99.99  99.98  100.00      | 100.00  100.00  100.00  100.00  100.00
L      L          99.75  99.81  99.69  99.77  99.76       | 100.00 100.00 99.99  100.00 100.00      | 100.00  100.00  100.00  100.00  100.00

Notes: Bold numbers >= 80%. Shift sizes: N, none; S, small; M, moderate; L, large.


Table 6. The CR (%) performance of classifiers DTM1, DTM2, DTS1 and DTS2 for p = 2 and m = 10.

Mean   Variance   ρ = 0.1                   | ρ = 0.3                   | ρ = 0.5                   | ρ = 0.7                   | ρ = 0.9
shift  shift      DTM1  DTM2  DTS1  DTS2    | DTM1  DTM2  DTS1  DTS2    | DTM1  DTM2  DTS1  DTS2    | DTM1  DTM2  DTS1  DTS2    | DTM1  DTM2  DTS1  DTS2
N      S          93.3  85.0  65.0  76.7    | 85.0  75.0  75.0  80.0    | 80.0  71.7  75.0  76.7    | 81.7  80.0  86.7  76.7    | 73.3  53.3  85.0  86.7
N      M          80.0  73.0  87.0  90.0    | 75.0  69.0  90.0  89.0    | 73.0  61.0  91.0  91.0    | 69.0  57.0  92.0  89.0    | 78.0  58.0  96.0  92.0
N      L          70.7  58.6  95.7  88.6    | 55.0  65.7  91.4  93.6    | 56.4  50.0  93.6  93.6    | 62.1  50.0  95.7  92.1    | 62.9  47.1  94.3  92.9
S      N          89.2  94.4  97.1  96.7    | 94.6  92.5  94.2  92.9    | 92.5  93.3  96.0  94.0    | 92.1  95.2  93.5  91.0    | 90.0  96.5  92.3  93.5
S      S          82.6  85.7  77.6  76.7    | 87.5  85.3  82.6  77.7    | 85.7  88.3  78.2  79.7    | 85.8  88.6  81.1  82.2    | 84.9  87.7  87.6  86.0
S      M          75.7  81.0  90.2  90.3    | 81.1  79.8  91.5  90.9    | 81.0  83.0  89.3  90.9    | 80.8  81.8  91.3  91.0    | 77.5  82.8  92.7  92.1
S      L          71.3  74.6  93.8  93.3    | 74.4  72.9  94.1  94.0    | 74.1  77.1  93.7  93.4    | 75.1  77.4  93.8  94.7    | 71.8  76.4  94.6  94.5
M      N          95.9  97.2  96.1  97.5    | 98.4  98.0  95.1  94.2    | 97.2  98.5  96.0  95.4    | 96.7  98.8  92.1  91.7    | 96.8  98.3  93.5  93.7
M      S          93.5  94.9  77.4  76.2    | 95.1  94.4  80.8  80.0    | 95.0  95.6  79.3  80.0    | 94.7  95.8  84.1  84.6    | 94.3  96.4  87.4  87.4
M      M          89.4  90.5  89.2  89.5    | 91.6  91.1  90.9  90.3    | 90.9  91.9  89.7  90.3    | 91.9  92.8  92.6  92.0    | 91.2  92.7  93.3  92.7
M      L          86.2  86.9  94.1  94.0    | 87.5  87.0  94.6  94.0    | 87.7  89.8  93.6  92.8    | 88.3  89.4  94.5  95.0    | 87.3  89.5  94.9  95.0
L      N          97.4  98.0  96.1  94.6    | 98.8  98.4  94.1  93.2    | 98.6  98.2  95.6  94.2    | 98.5  98.9  93.3  93.4    | 98.6  99.3  92.1  93.5
L      S          95.5  96.5  77.1  77.3    | 97.2  96.5  80.2  80.2    | 96.8  97.0  79.5  80.3    | 97.0  97.4  83.0  83.0    | 97.1  98.1  86.3  86.7
L      M          94.4  94.8  89.6  90.1    | 95.0  94.5  91.6  90.5    | 94.9  95.5  89.7  90.6    | 94.8  96.3  92.1  91.6    | 95.9  96.7  93.7  92.9
L      L          91.9  92.5  93.9  94.1    | 92.9  92.6  94.7  94.0    | 92.8  93.2  93.7  93.1    | 92.9  94.2  94.3  94.7    | 94.1  94.9  94.6  94.7

Notes: Shift sizes: N, none; S, small; M, moderate; L, large.


10,000 and N2 = 20. In addition, we generated another 64,060 samples with N1 = 10,000 and N2 = 20 for model testing. We trained the DT1 classifier using training samples with different sample sizes (m = 5, 10, 20). The larger the sample size, the more accurately the process parameters can be estimated; thus, the performance of process control is expected to improve with larger sample sizes. However, a large sample size increases the data collection cost, the computational cost and the time required to detect shifts. The three sample sizes that we chose are for illustration purposes only; in practice they should be selected based on the requirements of process control in each scenario. We set the correlation coefficient ρ to 0.1, 0.3, 0.5, 0.7 or 0.9. As an example, Table 3 shows the classification rules learned by DT1 when m = 10 and ρ = 0.5. For simplicity, rules covering fewer than 20 cases are omitted. There are four rules with class label 0 (i.e. cases classified as in control) that have imperfect CR values (i.e. less than 100%), which shows that some out-of-control cases were misclassified as in control by the DT1 classifier. On the other hand, there are five rules with class label 1 (i.e. cases classified as out of control) that achieve good CR values; only two of these rules have CR values smaller than 90%, and they are based on only a few cases.
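For clarity, the size of the generated training set follows directly from the shift-pattern count:

N = N_1 + \bigl(f_1^{\,p} f_2^{\,p} - 1\bigr) N_2 = 10{,}000 + \bigl(13^{2} \times 4^{2} - 1\bigr) \times 20 = 10{,}000 + 2703 \times 20 = 64{,}060.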

We used the generated testing samples to examine the performance of DT1. Table 4 summarises the results of DT1 testing. We can observe that the correlation coefficient value had little effect on DT1's performance, which suggests that DT1 may work well in scenarios where the correlation varies from very weak to very strong. We also examined the effect of sample size on DT1's performance. ARL0 increased as the sample size m increased, meaning that the probability of Type I errors decreased with an increasing sample size. Although all ARL1 values were close to 1 for the different sample sizes, a slight decreasing trend can be observed as m increases. CR was also positively related to the sample size. Not surprisingly, CR was best (at least 99.66%) with the largest sample size (m = 20). We can also see from Table 4 that the number of misclassified cases decreases as the sample size increases, and CR was already satisfactory (>99%) when the sample size was 10.

An increasing ARL0 means a decreasing Type I error for DT1. However, a large m increases the sampling cost. ARL0 = 200 is often used in the MSPC field, which corresponds to a Type I error of α = 0.005. Based on Table 4, the ARL0 values are close to 200 when the sample size is 10. Therefore, we set the sample size to 10 in the rest of the experiments.
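The correspondence between the in-control ARL and the Type I error rate is the standard one:

\mathrm{ARL}_0 = \frac{1}{\alpha} \quad\Longrightarrow\quad \alpha = \frac{1}{200} = 0.005.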

We also examined DT1's testing performance with regard to mean and variance shift magnitudes as well as sample sizes (Table 5). DT1 performed better with moderate and large shifts. The correct ratios were much lower in the cases with small mean/variance shifts (bold in Table 5) than in the other cases. In general, the CR values show an improving trend when the mean and/or variance shift magnitudes increase. When medium or large mean shifts took place, the CR values were greater than 95.71% even with the smallest sample size (m = 5). Increased sample sizes were needed to maintain high CR values when the mean shift magnitude decreased. For example, when the mean shift magnitude changed from large to medium, the sample size needed to be at least 10 in order to maintain CR results comparable to those with a sample size of 5. When the mean shift magnitude changed from medium to small and the variance shift magnitude was medium or large, the sample size needed to be at least 10 in order to obtain CR values comparable to those with m = 5. When there were no or only small variance shifts and small mean shifts, the sample size needed to be 20

Table 7. DT1 testing performance (CR) with different shift patterns for p = 3 (sample size = 10).

Mean shift magnitude  Variance shift magnitude  ρ=0.1   ρ=0.3   ρ=0.5   ρ=0.7   ρ=0.9
N                     N                         99.83   99.82   99.77   99.85   99.93
N                     S                         43.57   42.14   42.14   41.43   28.57
N                     M                         88.57   85.71   84.76   82.86   80.00
N                     L                         97.50   97.14   97.50   97.14   95.71
S                     N                         90.00   90.73   91.85   95.16   97.10
S                     S                         95.61   96.26   96.09   97.05   98.27
S                     M                         99.24   99.29   99.21   99.19   99.45
S                     L                         99.91   99.92   99.90   99.88   99.90
M                     N                         99.95   99.92   99.80   99.82   99.92
M                     S                         99.94   99.93   99.89   99.79   99.88
M                     M                         99.97   99.97   99.96   99.91   99.95
M                     L                         99.99   99.99   99.99   99.98   99.99
L                     N                         100.00  100.00  100.00  100.00  100.00
L                     S                         100.00  100.00  100.00  100.00  100.00
L                     M                         100.00  100.00  100.00  100.00  100.00
L                     L                         100.00  100.00  100.00  100.00  100.00

Notes: Shift sizes: N, none; S, small; M, moderate; L, large.


Table 8. The CR (%) performance of classifiers DTM1-3 and DTS1-3 for p = 3 and m = 10.

Mean   Variance   ρ = 0.1                                       | ρ = 0.5                                       | ρ = 0.9
shift  shift      DTM1   DTM2   DTM3   DTS1   DTS2   DTS3       | DTM1   DTM2   DTM3   DTS1   DTS2   DTS3       | DTM1   DTM2   DTM3   DTS1   DTS2   DTS3
N      S          86.43  82.86  77.86  75.00  71.43  69.29      | 88.57  85.71  84.29  75.00  67.14  67.86      | 85.00  86.43  87.14  86.43  82.86  77.86
N      M          84.29  94.29  92.86  70.95  54.76  52.86      | 87.62  89.05  90.95  71.90  55.71  53.33      | 90.95  95.24  93.81  84.29  94.29  92.86
N      L          77.86  94.29  93.57  72.50  44.29  45.71      | 87.14  93.93  93.21  71.43  46.43  49.29      | 88.21  97.86  95.71  77.86  94.29  93.57
S      N          93.39  94.11  93.63  95.97  95.56  95.56      | 90.24  91.77  91.29  94.76  94.76  95.48      | 93.31  93.55  91.69  93.39  94.11  93.63
S      S          81.44  82.90  82.42  88.13  88.13  88.53      | 83.15  83.56  82.19  87.40  88.17  88.80      | 85.55  85.89  86.41  81.44  82.90  82.42
S      M          89.45  89.76  89.67  81.77  82.75  82.34      | 90.25  90.27  89.64  81.65  82.09  83.51      | 94.95  94.97  94.93  89.45  89.76  89.67
S      L          92.79  92.61  92.84  77.97  78.95  78.55      | 92.59  92.51  92.47  77.31  77.94  78.47      | 94.81  94.98  94.55  92.79  92.61  92.84
M      N          93.46  93.21  92.95  97.52  97.70  97.45      | 91.11  91.49  91.46  97.55  98.13  98.21      | 92.68  93.21  92.05  93.46  93.21  92.95
M      S          81.96  82.02  82.29  94.30  94.53  94.47      | 83.61  83.49  83.29  94.04  94.77  94.87      | 86.40  86.35  86.36  81.96  82.02  82.29
M      M          89.30  89.19  89.56  91.46  91.46  91.35      | 90.06  90.03  89.69  91.42  91.91  91.86      | 95.19  95.02  95.17  89.30  89.19  89.56
M      L          92.76  92.58  92.72  88.57  88.72  88.48      | 92.68  92.64  92.54  88.75  89.28  89.32      | 94.73  94.97  94.57  92.76  92.58  92.72
L      N          93.76  93.28  93.33  98.51  98.76  98.64      | 91.09  91.12  91.81  99.11  99.20  99.03      | 92.31  93.21  92.41  93.76  93.28  93.33
L      S          82.44  82.10  81.98  96.74  96.66  96.75      | 83.63  83.74  83.55  97.23  97.35  97.28      | 86.27  86.35  86.23  82.44  82.10  81.98
L      M          89.56  89.20  89.37  94.77  94.83  94.77      | 90.32  90.25  89.98  95.82  95.92  95.78      | 95.16  94.99  95.06  89.56  89.20  89.37
L      L          92.93  92.69  92.67  93.01  92.92  93.00      | 92.80  92.80  92.61  94.28  94.50  94.34      | 94.74  94.92  94.63  92.93  92.69  92.67

Notes: Shift sizes: N, none; S, small; M, moderate; L, large.


in order to achieve comparable CR performance. The reason could be that, in the estimation of the normal distribution parameters, the variance estimator may be less accurate than the mean estimator because of rounding errors (Higham 2002).

Additionally, we analysed the performance of the four fault identification classifiers DTM1, DTM2, DTS1 and DTS2 (Table 6). We only considered a sample size of 10 because it achieved acceptable CR values. The classifiers performed quite well at identifying mean shifts without a change in variance, and the correct ratios of DTM1, DTM2, DTS1 and DTS2 were 86% and higher for all combinations of moderate and large shifts in mean and variance. These performance levels illustrate the classification ability of the model. As expected, CR degrades with smaller shift sizes because smaller shifts are more difficult to identify. The proposed model has acceptable performance for detecting both mean and variance shifts; its performance is further illustrated later in the paper, where the proposed method is compared with the reported performance of an existing model.

4.2 A trivariate case study (p = 3)

In a process with three quality characteristics, the number of shift patterns f1^p x f2^p is much higher than in a bivariate process. To reduce computational complexity, we considered fewer mean shift magnitudes (f1 = 7) and variance shift magnitudes (f2 = 3) in this study, where the mean shift magnitudes and variance shift magnitudes of interest were (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0) and (1.0, 2.0, 3.0), respectively. In total there were 7^3 x 3^3 - 1 = 9260 mean and variance shift patterns. We set N1 = 30,000 and N2 = 20, so N1 + (7^3 x 3^3 - 1)N2 = 215,200 samples were used to train the DT classifiers. In performance testing, we also used the shift categories (i.e. small, moderate and large shifts) discussed in Section 4.1. For simplicity, we only conducted the experiment with a fixed sample size m = 10. Table 7 presents the CR values of DT1 on the testing dataset for ρ = 0.1, 0.3, 0.5, 0.7 and 0.9. Owing to space limitations, we report the CR values of DTM1-3 and DTS1-3 only for ρ = 0.1, 0.5 and 0.9 in Table 8. We can observe that the conclusions for p = 3 are similar to those of the bivariate case study discussed in Section 4.1. DT1 performed better with moderate and large shifts, and the CR values showed an improving trend when the mean and/or variance shift magnitudes increased. The correct ratios of DTM1-3 and DTS1-3 were at least 86% for all shift patterns consisting of moderate and large shifts in mean and variance. However, for those conditions where there were moderate or large variance shifts with no or small mean shifts, the correct ratios of DTS1-3 were smaller than those of DTS1-2 in Table 6. This suggests that the classification ability of the DT model for variance shift identification may deteriorate as the number of variables increases from p = 2 to p = 3. This is an important concern calling for further investigation.

5. Generalising experiments

The experiments in the previous section have two potential problems. First, the proposed model was trained and tested using the same shift patterns and shift magnitudes. In practice, however, the shift magnitudes are often unknown and unpredictable, and a change in shift magnitudes would be expected to affect the performance of the proposed model. Second, the proposed model uses subgroup data instead of individual observations. Although the normality assumption holds better for subgroup data, it is questionable whether the proposed model is robust for processes where independent individual observations are collected. Therefore, we conducted two generalising experiments. First, we evaluated the performance of our model using subgroup data with shift magnitudes different from those used in model training. Our second experiment evaluated the performance of the model using individual observations pre-processed via a moving window approach.

5.1 A generalising experiment using group data with new shift magnitudes

Previous results show that the correlation coefficient has little impact on the performance of the proposed model. Therefore, we arbitrarily set ρ to 0.5, a medium level. The in-control samples had the following distribution parameters:

\mu_0 = [0 \;\; 0] \quad\text{and}\quad \Sigma_0 = \begin{bmatrix} 1.0 & 0.5 \\ 0.5 & 1.0 \end{bmatrix}.

We set both the number of in-control testing samples (N1) and the number of out-of-control samples (N2) to 400, and the sample size was set to 10. Table 9 shows the misclassifications and CR values corresponding to the new shift magnitudes. These shift sizes were arbitrarily chosen to test the ability of the model to identify shifts of different sizes than those used for training.

The process-monitoring module (DT1) in general achieved satisfactory results. The average number of Type I errors (i.e. classifying in-control samples as out-of-control instances) is 2.07. Type II errors occurred in only 11 out of


27 shift patterns, and most of those had fewer than 10 errors (i.e. at least 97.5% CR). A large number of Type II errors occurred only when there were small variance shifts and no mean shifts.

The performance of the fault identification modules is also satisfactory. The classifiers for detecting mean shifts, DTM1 and DTM2, achieved at least 80% CR in most of the 27 shift patterns, except those where only moderate or large variance shifts existed. The classifiers for identifying variance shifts, DTS1 and DTS2, performed well for moderate and large shifts, with CR values of at least 80%. Only 27 of the 108 fault identification results have CR values smaller than 80%. Furthermore, the results show that it is very difficult to identify variance shifts when λs1 or λs2 is equal to or smaller than 1.5.

The testing shows that our proposed model still performs well when the shift magnitudes in testing differ from those used in model training. Although the performance measures degrade slightly in comparison with the previous results, a larger sample size would be expected to improve the performance of our model.

5.2 A generalising experiment using individual data

We followed the moving window approach described in Guh and Shiue (2008) to group individual observations into grouped observations. Each moving window includes the current observation and the previous m - 1 observations to form a sample of size m, giving a series of overlapping samples. Suppose we have N such samples in a multivariate process; each sample is denoted Xi = [xij1 xij2], where i = 1, 2, ..., N and j = 1, 2, ..., m. The mean vector and covariance matrix of each sample can be calculated as in Section 3.3, and a new feature vector for the DT classifiers is Vi = (x̄i1, x̄i2, s_{i1}^2, s_{i2}^2, D_i^2, Ei), for i = 1, 2, ..., N.

In this case we also set m to 10 so that the results are comparable to the previous experiment, where the group size was 10. We generated N1 = 400 + m - 1 = 409 in-control samples and N2 = 400 out-of-control samples for each shift pattern in Table 10. Using the moving window approach with a window size of 10, we obtain 800 grouped samples of size 10. Among these 800 grouped samples, the first 400 contain in-control observations only, while the

Table 9. The performance testing of our proposed model using grouped data with sample size = 10.

                  λx1     λx2     λs1    λs2    DT1 errors (in-control)  DT1 errors (out-of-control)  DTM1    DTM2    DTS1   DTS2   (CR values, %)
Small shifts      1.20    1.30    1.50   1.65   3                        26                           89.50   94.75   51.75  66.00
                  -1.50   1.10    1.60   1.30   2                        0                            92.25   95.75   68.75  36.75
                  1.69    -1.20   1.00   1.65   4                        0                            100.00  85.75   96.75  73.75
                  0.00    0.00    1.50   1.65   2                        163                          77.50   61.50   52.75  78.50
                  0.00    0.00    1.60   1.20   1                        259                          74.75   81.25   65.25  31.00
                  0.00    0.00    1.00   1.65   3                        222                          96.00   65.00   97.25  80.00
                  1.20    1.30    1.00   1.00   1                        43                           96.75   99.50   94.00  94.00
                  -1.50   1.10    1.00   1.00   1                        0                            99.25   98.50   94.75  94.00
                  1.69    -1.20   1.00   1.00   1                        0                            100.00  96.75   95.75  96.25
Moderate shifts   1.75    2.20    1.75   2.00   3                        1                            98.50   100.00  75.50  90.25
                  -1.80   1.80    2.00   1.75   1                        0                            95.00   99.75   86.25  75.00
                  1.80    -1.80   1.75   1.75   0                        0                            99.00   99.25   75.00  72.75
                  0.00    0.00    1.75   2.00   3                        67                           73.50   59.50   72.75  90.50
                  0.00    0.00    2.00   1.75   1                        70                           63.75   61.50   87.75  82.25
                  0.00    0.00    1.83   2.15   3                        35                           71.75   57.25   76.25  95.00
                  1.75    2.20    1.00   1.00   1                        0                            100.00  100.00  97.00  94.75
                  -1.80   1.80    1.00   1.00   3                        0                            100.00  100.00  97.00  93.25
                  1.80    -1.80   1.00   1.00   3                        0                            100.00  100.00  95.75  95.75
Large shifts      2.41    2.90    2.50   2.80   3                        0                            99.75   99.75   96.50  99.00
                  2.90    -2.41   2.35   2.79   2                        0                            100.00  98.50   96.50  98.00
                  -2.90   1.20    1.00   2.80   3                        0                            100.00  89.50   96.50  99.75
                  0.00    0.00    2.25   2.80   0                        14                           61.75   46.75   91.00  98.75
                  0.00    0.00    1.00   2.80   2                        13                           94.25   45.75   98.25  99.00
                  0.00    0.00    2.40   1.00   3                        44                           52.00   87.50   96.00  93.50
                  2.41    2.90    1.00   1.00   3                        0                            100.00  100.00  95.75  94.25
                  2.90    -2.41   1.00   1.00   3                        0                            100.00  100.00  95.25  94.00
                  -2.90   1.20    1.00   1.00   1                        0                            100.00  99.25   97.25  94.75


remaining 400 grouped samples are out-of-control samples, of which the first nine contain both in-control and out-of-control observations. In total there were 400 in-control grouped samples and 400 out-of-control grouped samples. We used the DT classifiers trained on group data (Section 4.1) to analyse the testing datasets (see Table 10).

In process monitoring classification (DT1), the average number of Type I errors is 1.67, meaning that our model can achieve an ARL0 of 239.99. The maximum number of Type II errors was 237, so the maximum ARL1 of our model is 2.45. Only 10 out of 27 shift patterns had more than 10 Type II errors, and Type II errors mostly occurred when only variance shifts were present. In fault identification (DTM1, DTM2, DTS1 and DTS2), 84 out of 108 (78%) CR values were greater than 90%, and 88% of the CR values were greater than 80% (bold in Table 10). Among the CR values less than 90%, 16 (67%) occurred when there were only variance shifts. The testing results show that the proposed model can still achieve satisfactory performance when applied to process control based on individual observations.

In their work, Guh and Shiue (2008) compared their method with the traditional MSPC charts, such as the χ2 chart, MCUSUM chart, MC1 chart and MEWMA chart, and showed that their model outperforms the traditional charts. Consequently, we need only compare the ARL performance of our model with that reported by Guh and Shiue (2008). In their study, Guh and Shiue considered only mean shifts, with no consideration of variance shifts. They chose a moving window size of 10 in order to maintain an expected in-control ARL. To make the results comparable, we also set our moving window size to 10. The proposed model is trained with group data of sample size 10 considering both mean and variance shifts and is tested in situations where only mean shifts exist (see Table 11). The proposed model achieved an ARL0 of 216.04 over 1000 simulations, compared with 192.00 achieved by Guh and Shiue's approach. Most out-of-control ARL values of the proposed model were smaller than those of Guh and Shiue's model. For small shifts, the ARL1 values of Guh and Shiue's model are somewhat smaller than those of our proposed model; this is because the training dataset used in the proposed model contains both mean shifts and variance shifts. Overall, Table 11 shows that our model outperforms Guh and Shiue's model.

Table 10. Model testing results using individual observations.

                  λx1     λx2     λs1    λs2    DT1 errors (in-control)  DT1 errors (out-of-control)  DTM1    DTM2    DTS1   DTS2   (correct ratio, %)
Small shifts      1.20    1.30    1.50   1.65   2                        25                           93.50   94.75   54.00  68.25
                  -1.50   1.10    1.60   1.30   3                        5                            94.00   94.50   60.75  39.25
                  1.69    -1.20   1.00   1.65   1                        2                            98.50   90.75   96.75  74.25
                  0.00    0.00    1.50   1.65   3                        179                          81.75   62.50   59.00  80.00
                  0.00    0.00    1.60   1.20   2                        237                          66.50   84.50   62.00  22.00
                  0.00    0.00    1.00   1.65   3                        210                          93.75   61.25   97.00  81.75
                  1.20    1.30    1.00   1.00   0                        62                           94.50   99.50   94.00  94.50
                  -1.50   1.10    1.00   1.00   3                        5                            99.00   97.75   94.00  96.50
                  1.69    -1.20   1.00   1.00   2                        1                            99.25   93.00   91.25  99.75
Moderate shifts   1.75    2.20    1.75   2.00   1                        5                            96.50   99.50   77.50  93.50
                  -1.80   1.80    2.00   1.75   3                        0                            96.25   98.75   96.25  80.25
                  1.80    -1.80   1.75   1.75   3                        2                            97.50   97.50   73.75  80.75
                  0.00    0.00    1.75   2.00   1                        48                           72.25   60.25   78.00  92.75
                  0.00    0.00    2.00   1.75   2                        65                           58.25   65.75   86.75  76.75
                  0.00    0.00    1.83   2.15   1                        32                           72.00   50.00   83.25  93.00
                  1.75    2.20    1.00   1.00   0                        0                            99.50   100.00  93.50  91.75
                  -1.80   1.80    1.00   1.00   2                        4                            99.00   99.25   90.50  91.75
                  1.80    -1.80   1.00   1.00   0                        1                            99.75   99.25   88.75  96.75
Large shifts      2.41    2.90    2.50   2.80   3                        1                            98.00   98.75   95.75  100.00
                  2.90    -2.41   2.35   2.79   2                        1                            99.75   99.25   96.75  95.50
                  -2.90   1.20    1.00   2.80   0                        1                            99.75   82.75   95.25  97.25
                  0.00    0.00    2.25   2.80   3                        3                            53.75   44.50   92.75  99.50
                  0.00    0.00    1.00   2.80   0                        31                           94.75   46.50   92.00  99.75
                  0.00    0.00    2.40   1.00   3                        34                           47.50   85.25   95.50  94.00
                  2.41    2.90    1.00   1.00   0                        1                            100.00  100.00  96.50  90.50
                  2.90    -2.41   1.00   1.00   2                        2                            99.50   99.00   92.00  92.50
                  -2.90   1.20    1.00   1.00   0                        1                            100.00  98.50   82.25  97.25


6. Conclusions and future research directions

In this research we proposed a multivariate process monitoring and fault identification model using DT learning techniques. Previous MSPC studies primarily focus on mean shifts and assume constant variances, even though variances may not remain constant in practice. In this study we considered both mean shifts and variance shifts. We also improved the use of DT learning for detecting out-of-control shifts and identifying the causes of the shifts. Instead of having one DT classifier handle both process monitoring and fault identification, we used 2p + 1 DT classifiers in the proposed model: one for process monitoring, p for mean shift identification, and p for variance shift identification. Each classifier in our model is either a binary or a ternary DT classifier. Our experiments showed that the classifiers performed well under different correlation coefficients. Our model also achieved satisfactory performance when the shift patterns in the testing data differed from those in the training data, and it outperformed Guh and Shiue's model for multivariate process monitoring.
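A minimal sketch of this 2p + 1 layout for p = 2 is given below; it uses scikit-learn decision trees as a stand-in for the C5.0-style learner employed in this paper, and the label coding and helper names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

p = 2  # number of quality characteristics

def build_ensemble(X, y_monitor, y_mean, y_var):
    """X: window features; y_monitor: in/out-of-control labels;
    y_mean[i] and y_var[i]: per-variable shift labels for variable i."""
    dt1 = DecisionTreeClassifier().fit(X, y_monitor)                       # process monitoring
    dtm = [DecisionTreeClassifier().fit(X, y_mean[i]) for i in range(p)]   # mean-shift identification
    dts = [DecisionTreeClassifier().fit(X, y_var[i]) for i in range(p)]    # variance-shift identification
    return dt1, dtm, dts

def diagnose(model, features):
    dt1, dtm, dts = model
    x = np.asarray(features).reshape(1, -1)
    if dt1.predict(x)[0] == 0:                  # 0 = in control (assumed coding)
        return "in control"
    return {"mean shifts": [int(m.predict(x)[0]) for m in dtm],
            "variance shifts": [int(s.predict(x)[0]) for s in dts]}
```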

Further research may make improvements in two directions. First, the constant correlation coefficient assumption made in this work could be relaxed. Second, as p increases, the number of possible mean and variance shift patterns, and hence the number of samples required for DT training, grows very quickly. Methods for optimising the DT training set are therefore needed, so that the DT classifiers can be trained with a small number of samples while still attaining a specified predictive accuracy.

Table 11. Comparison of the results between the proposed model and Guh and Shiue’s model.

                    Proposed model           Guh and Shiue's model
kx1       kx2       ARL        Average       ARL        Average

 0.000     0.000    216.043    216.043       192.000    192.000
-0.900    -0.828      4.060      4.620         3.470      4.140
-0.800    -0.920      3.940                    3.743
-0.700    -0.969      4.390                    4.020
 0.000     0.866      4.910                    5.841
 0.700     0.969      4.840                    4.406
 0.800     0.920      4.930                    3.842
 0.900     0.828      5.250                    3.677
-1.400    -1.167      1.220      1.210         2.293      2.610
-1.200    -1.380      1.220                    2.230
-1.000    -1.468      1.170                    2.430
 0.000     1.299      1.170                    4.254
 1.000     1.468      1.230                    2.550
 1.200     1.380      1.220                    2.277
 1.400     1.167      1.220                    2.253
-2.000    -1.015      1.000      1.000         1.547      1.800
-1.500    -1.896      1.010                    1.250
-1.000     1.000      1.000                    2.853
 0.000     1.732      1.000                    2.517
 1.000     2.000      1.000                    1.673
 1.500     1.896      1.010                    1.260
 2.000     1.015      1.000                    1.470
-2.500    -1.269      1.000      1.000         1.477      1.710
-1.500    -2.482      1.000                    1.327
-1.000     1.484      1.000                    2.252
 0.000     2.165      1.000                    2.187
 1.000     2.484      1.000                    1.848
 1.500     2.482      1.000                    1.370
 2.500     1.269      1.000                    1.473
-3.000    -1.523      1.000      1.000         1.297      1.590
-2.000    -2.937      1.000                    1.050
-1.000     1.949      1.000                    2.300
 0.000     2.598      1.000                    2.073
 1.000     2.950      1.000                    1.983
 2.000     2.937      1.000                    1.140
 3.000     1.523      1.000                    1.277


Acknowledgement

The authors would like to thank the editor and referees for their helpful comments, which improved the final version of this paper significantly. This research was supported by the Natural Science Foundation of China under Grants 71002105, 70931004, and 70802043.

References

Bersimis, S., S. Psarakis, and J. Panaretos. 2007. "Multivariate Statistical Process Control Charts: An Overview." Quality and Reliability Engineering International 23 (5): 517–543.
Crosier, R. B. 1988. "Multivariate Generalizations of Cumulative Sum Quality-Control Schemes." Technometrics 30 (3): 291–303.
Djauhari, M. A. 2005. "Improved Monitoring of Multivariate Process Variability." Journal of Quality Technology 37 (1): 32–39.
Fuchs, C., and Y. Benjamini. 1994. "Multivariate Profile Charts for Statistical Process Control." Technometrics 36 (2): 182–195.
Guh, R.-S. 2005. "A Hybrid Learning-Based Model for On-Line Detection and Analysis of Control Chart Patterns." Computers & Industrial Engineering 49 (1): 35–62.
Guh, R.-S. 2007. "On-Line Identification and Quantification of Mean Shifts in Bivariate Processes Using a Neural Network-Based Approach." Quality and Reliability Engineering International 23: 367–385.
Guh, R.-S., and Y.-R. Shiue. 2005. "On-Line Identification of Control Chart Patterns Using Self-Organizing Approaches." International Journal of Production Research 43 (6): 1225–1254.
Guh, R.-S., and Y.-R. Shiue. 2008. "An Effective Application of Decision Tree Learning for On-Line Detection of Mean Shifts in Multivariate Control Charts." Computers & Industrial Engineering 55 (2): 475–493.
Hawkins, D. M., S. W. Choi, and S. H. Lee. 2007. "A General Multivariate Exponentially Weighted Moving-Average Control Chart." Journal of Quality Technology 39 (2): 118–125.
Higham, N. 2002. Accuracy and Stability of Numerical Algorithms. 2nd ed. Philadelphia, PA: SIAM.
Hotelling, H. 1947. "Multivariate Quality Control." In Techniques of Statistical Analysis, edited by C. Eisenhart, M. W. Hastay, and W. A. Wallis. New York, NY: McGraw-Hill.
Hou, T. H., W.-L. Liu, and L. Lin. 2003. "Intelligent Remote Monitoring and Diagnosis of Manufacturing Processes Using an Integrated Approach of Neural Networks and Rough Sets." Journal of Intelligent Manufacturing 14 (2): 239–253.
Jackson, J. E. 1985. "Multivariate Quality Control." Communications in Statistics – Theory and Methods 14 (11): 2657–2688.
Jackson, J. E. 1991. A User's Guide to Principal Components. New York, NY: Wiley.
Jin, N., and S. Zhou. 2006. "Data-Driven Variation Source Identification for Manufacturing Process Using the Eigenspace Comparison Method." Naval Research Logistics 53: 383–396.
Kramer, H., and L. V. Schmid. 1997. "EWMA Charts for Multivariate Time Series." Sequential Analysis 16: 131–154.
Lee, J. M., C. K. Yoo, and I. B. Lee. 2004. "Fault Detection of Batch Processes Using Multiway Kernel Principal Component Analysis." Computers & Chemical Engineering 28 (9): 1837–1847.
Li, T., C. Zhang, and M. Ogihara. 2004. "A Comparative Study of Feature Selection and Multiclass Classification Methods for Tissue Classification Based on Gene Expression." Bioinformatics 20 (15): 2429–2437.
Lowry, C. A., W. H. Woodall, C. W. Champ, and S. E. Rigdon. 1992. "A Multivariate Exponentially Weighted Moving Average Control Chart." Technometrics 34 (1): 46–53.
Mason, R. L., N. D. Tracy, and J. C. Young. 1995. "Decomposition of T2 for Multivariate Control Chart Interpretation." Journal of Quality Technology 27 (2): 99–108.
Mason, R. L., N. D. Tracy, and J. C. Young. 1997. "A Practical Approach for Interpreting Multivariate T2 Control Chart Signals." Journal of Quality Technology 29 (4): 396–406.
Montgomery, D. C. 2007. Introduction to Statistical Quality Control. Hoboken, NJ: John Wiley & Sons.
Niaki, S. T. A., and B. Abbasi. 2005. "Fault Diagnosis in Multivariate Control Charts Using Artificial Neural Networks." Quality and Reliability Engineering International 21 (8): 825–840.
Nomikos, P., and J. F. MacGregor. 1994. "Monitoring Batch Processes Using Multiway Principal Component Analysis." American Institute of Chemical Engineers Journal 40 (8): 1361–1375.
Pignatiello, J. J., Jr., and G. C. Runger. 1990. "Comparisons of Multivariate CUSUM Charts." Journal of Quality Technology 22 (3): 173–186.
Prabhu, S. S., and G. C. Runger. 1997. "Designing a Multivariate EWMA Control Chart." Journal of Quality Technology 29 (1): 8–15.
Quinlan, J. 1986. "Induction of Decision Trees." Machine Learning 1 (1): 81–106.
Quinlan, J. 1987. "Simplifying Decision Trees." International Journal of Man-Machine Studies 27 (3): 221–234.
Quinlan, J. R. 1996. "Improved Use of Continuous Attributes in C4.5." Journal of Artificial Intelligence Research 4: 77–90.
Reynolds, M. R., Jr., and K. Kim. 2005. "Multivariate Monitoring of the Process Mean Vector with Sequential Sampling." Journal of Quality Technology 37 (2): 149–162.
RuleQuest Research. 2009. "Is See5/C5.0 Better Than C4.5?" [online]. Available from: http://www.rulequest.com/see5-comparison.html [Accessed 13 February 2012].


Runger, G. C., and M. C. Testik. 2004. "Multivariate Extensions to Cumulative Sum Control Charts." Quality and Reliability Engineering International 20: 587–606.
Sullivan, J. H., Z. G. Stoumbos, R. L. Mason, and J. C. Young. 2007. "Step-Down Analysis for Changes in the Covariance Matrix and Other Parameters." Journal of Quality Technology 39 (1): 66–84.
Warner, B., and M. Misra. 1996. "Understanding Neural Networks as Statistical Tools." The American Statistician 50 (4): 284–293.
Woodall, W. H., and M. M. Ncube. 1985. "Multivariate CUSUM Quality-Control Procedures." Technometrics 27 (3): 285–292.
Yang, K., and J. Trewn. 2004. Multivariate Statistical Methods in Quality Management. New York, NY: McGraw-Hill Professional.
Yu, J. B., and L. F. Xi. 2009. "A Neural Network Ensemble-Based Model for On-Line Monitoring and Diagnosis of Out-of-Control Signals in Multivariate Manufacturing Processes." Expert Systems with Applications 36 (1): 909–921.
Yu, J. B., L. F. Xi, and X. Zhou. 2008. "Intelligent Monitoring and Diagnosis of Manufacturing Processes Using an Integrated Approach of KBANN and GA." Computers in Industry 59 (5): 489–501.
Zorriassatine, F., and J. Tannock. 1998. "A Review of Neural Networks for Statistical Process Control." Journal of Intelligent Manufacturing 9 (3): 209–224.
Zou, C. L., and F. G. Tsung. 2008. "Directional MEWMA Schemes for Multistage Process Monitoring and Diagnosis." Journal of Quality Technology 40 (4): 407–427.
