
MONITORING OF AUTOCORRELATED BATCH PRODUCTION PROCESSES THROUGH A GAUSSIAN PROCESS APPROACH

By

Mu′men Hasan Rababah

Advisor

Dr. Hussam A. Alshraideh

Co-Advisor

Dr. Tarek H. Al-Hawari

Thesis submitted in partial fulfillment of the requirements for the degree of

M.Sc. in Industrial Engineering

At

The Faculty of Graduate Studies

Jordan University of Science and Technology

May, 2016




By

Muˈmen Hasan Rababah

……………………… Signature of Author

Signature and Date Committee Member

Dr. Hussam A. Alshraideh (Chairman) ………………………

Dr. Tarek H. Al-Hawari (Co-Advisor) ………………………

Dr. Mohammed S. Obeidat (Member) ………………………

Dr. Yazan K. Migdadi (External Examiner) ………………………

May, 2016

AUTHORIZATION

We, the undersigned, undertake to grant Jordan University of Science and Technology the freedom to publish the content of this thesis, such that the intellectual property rights of the master's thesis belong to the University in accordance with the laws, regulations, and instructions concerning intellectual property and patents.

Main Advisor: Dr. Hussam Alshraideh. Signature and date: ………………………
Co-Advisor: Dr. Tarek Al-Hawari. Signature and date: ………………………
Student: Mu′men Rababah. University ID number and signature: 20143029004 ………………………

DEDICATION

To the memory of my first teacher, my father, and my first friend, my brother Mohammed. I miss you every day…

To my mother: thanks are not enough for what you have done for me. May Allah help me to be a good son to you…

To my brothers, Mu’ath, Mu’nis, Mustafa, and AbdAlraheem…

To my sisters, Roa’ and Enas: I love you all so much, and many thanks for your support…

To my lovely friends and family, thanks for everything…

Audai, Abdullah, Saba’, Reham, Heba, Ansam, Mahmoud, Osama, and Khalid: sharing your moments with me was the best thing that happened during these two years. Special thanks to all of you…

ACKNOWLEDGMENT

“Glory to thy Lord, the Lord of Honour and Power! (He is free) from what they ascribe (to Him)! (180) And Peace on the messengers! (181) And Praise to Allah, the Lord and Cherisher of the Worlds. (182)” Quran [37:180-182]

First of all, I thank ALLAH for helping and guiding me through my life, and prayer and peace be upon the Prophet Muhammad.

Secondly, I would like to thank Dr. Hussam Alshraideh for everything; it was an honor to be one of your students. Many thanks also to Dr. Tarek Al-Hawari. Thank you both for your patience and support.

Finally, many thanks to my family, friends, colleagues, and everyone who supported me. May ALLAH bless you all.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES
LIST OF ABBREVIATIONS
ABSTRACT
Chapter One: Introduction
1.1 Overview
1.2 Problem Statement
1.3 Study Objectives
1.4 Significance of Work
1.5 Thesis organization
Chapter Two: Literature Review
Chapter Three: Gaussian Processes
3.1 Gaussian Processes Definition
3.2 Covariance Functions
3.3 Dependent Gaussian Process
3.4 Gaussian Process Applications
3.4.1 Gaussian Process for Regression And Prediction
3.4.2 Gaussian Process for Classification
3.5 Gaussian Process Fitting
Chapter Four: Research Methodology
4.1 Data Collection
4.2 Modeling Approach
Chapter Five: Data, Analysis, and Results
5.1 Simulated Data Method
5.2 Penicillin Fermentation Process Data
5.3 Wafer Data Set
5.3.1 Preparation Data for Testing
5.3.1.1 Fix Batch Length
5.3.1.2 Time Series Profiles Alignment
5.3.2 Wafer Results
Chapter Six: Conclusions and Future Work
References
Appendices
Abstract in Arabic Language

LIST OF FIGURES

Figure Description
1.1 Functional response example for biomass concentration in fed-batch penicillin production
2.1 Principal components for 2-dimensions
2.2 The upper and lower bold lines indicate two time series, and the thin lines between them indicate the distance: (a) Euclidean; (b) alignment
3.1 Dependent outputs: the model is represented by solid lines and the true functions by dotted lines; the two top panels are for an independent model and the others are for a dependent model
3.2 Left: the November 2010 closing price for the S&P; right: the GP prediction for the collected data
4.1 Methodology flow chart
5.1 Simulated data profiles
5.2 The autocorrelation charts for the simulated data of each sensor
5.3 The correlation matrix between simulated sensors data set
5.4 The autocorrelation charts for the residuals of each sensor
5.5 Penicillin fermentation process
5.6 Penicillin production simulator inputs
5.7 Penicillin simulation outputs
5.8 Penicillin testing data profiles: (a) Biomass concentration (g/L), (b) Penicillin concentration (g/L), (c) Carbon dioxide concentration (g/L), (d) Generated heat (kcal)
5.9 The autocorrelation charts for the penicillin data of each sensor
5.10 The correlation matrix between penicillin sensors data set
5.11 Wafer data profiles for each sensor respectively
5.12 The autocorrelation charts for the wafer data of each sensor
5.13 The correlation matrix between wafer sensors data set
5.14 Sensor 6 shows three different batches with variation in length
5.15 Sensor 6 with three different batches having the same length, length = 100 time units
5.16 Radio frequency forward power data without alignment
5.17 Radio frequency forward power data after alignment
5.18 Wafer production process stages

LIST OF TABLES

Table Description
3.1 Comparison between Gaussian process and Gaussian distribution
3.2 Summary of several commonly-used covariance functions
5.1 Simulated data results for type I error rate and ARL
5.2 Penicillin dataset accuracy results
5.3 Wafer data results

LIST OF APPENDICES

Appendix Description
A UCL code
B Get residuals code
C Profiles alignment code

LIST OF ABBREVIATIONS

Abbreviations Description
ARL Average Run Length
CUSUM Cumulative Sum
DTW Dynamic Time Warping
GMM Gaussian Mixture Model
GP Gaussian Process
LCL Lower Control Limit
MLE Maximum Likelihood Estimate
MPCA Multi-Way Principal Component Analysis
MPLS Multi-Way Partial Least Squares Analysis
MSPC Multivariate Statistical Process Control
PCA Principal Component Analysis
PLS Partial Least Squares
RGP Recursive Gaussian Process
SE Squared Exponential
UCL Upper Control Limit

ABSTRACT

Minimizing the number of nonconforming items is a goal in every industry. To achieve this goal, statistical process monitoring and control tools are used: process variables are monitored to detect deviations from normal operating conditions. In many industrial processes, such as batch production processes, multiple process variables must be monitored because they play a key role in the quality of the final product. Several monitoring tools in the literature deal with multiple process variables, but few of them deal with autocorrelated variables. Existing tools for monitoring autocorrelated process variables have two main drawbacks. First, they are run off-line; that is, monitoring is performed at the end of the production cycle, and hence no corrective actions can be taken. Second, these tools rely on dimensionality reduction techniques to address computational complexity.

In this thesis, the monitoring of multi-variable processes, including batch production processes, is considered. A Gaussian Process (GP) based modeling approach is proposed. The proposed approach takes into consideration both the correlation between process variables and the autocorrelation within the readings of each variable.

Two simulated data sets and one real data set were used to validate the proposed modeling approach. In the first data set, data were simulated to mimic a real production process with three variables of interest. A web-based simulator for batch production of penicillin was used to generate observations for the second data set. For the real data set, data from a wafer etching process were considered. Model performance was assessed through the Average Run Length (ARL) and the accuracy in predicting out-of-control batches. With regard to the ARL, the proposed modeling approach gave results similar to those of Shewhart-type control charts, which are the optimal case for independent process data. Depending on the assumed type I error rate and the number of training instances, accuracy values ranging from 92% to 98% were achieved. The results indicate that the proposed modeling approach performs well enough to be used for monitoring multi-variable processes. In addition, the proposed approach overcomes the drawbacks of other tools in that it can be used as an on-line monitoring tool and requires no dimensionality reduction.


Chapter One: Introduction

1.1 Overview

Process monitoring is essential in every industry to cut unnecessary quality costs. Following Shewhart's model, the causes of variability or poor quality in any process are classified into two types: assignable causes and common causes (Runger, 2002). Assignable causes are sources of variation that are external to the process and do not represent normal operating conditions, while common causes are sources of error that are a natural part of the process (Montgomery, 2013). Detecting and eliminating assignable causes reduces process variability and consequently cuts the cost of poor quality. Hence, this is the main objective of any process monitoring technique (Runger, 2002).

In most industrial processes, more than one process parameter plays a key role in the quality of the final product. Monitoring the quality of the final product requires online monitoring of those key process parameters through statistical techniques. Such a monitoring scheme is referred to in the literature as Multivariate Statistical Process Control (MSPC).

Control charts are a basic process monitoring tool and are considered among the most important tools for quality improvement. The x̄ and R charts are examples of control charts used to monitor independent observations when the quality characteristic is measured on a continuous scale (Montgomery, 2013). Another example is the Hotelling T² control chart (Hotelling, 1947), one of the most common charts for MSPC, which is used later in this thesis; it is valid if there is no autocorrelation within the observations of each variable. Basic process monitoring tools such as control charts assume that the observed data are free of autocorrelation, that is, that data points observed at nearby time points are independent or uncorrelated. This assumption is widely violated in industrial processes, especially in the chemical industries. Several techniques have been proposed in the literature to deal with autocorrelated processes. Autoregressive Integrated Moving Average (ARIMA) models are a family of statistical models commonly used to model autocorrelation within time series data.
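To make the notion of autocorrelation concrete, the following minimal sketch simulates a first-order autoregressive series, the simplest member of the ARIMA family, and compares its lag-1 sample autocorrelation with that of independent noise. The coefficient phi = 0.9, the series length, the seed, and the function names are illustrative assumptions and are not taken from the data sets used in this thesis.

```python
import random


def simulate_ar1(n, phi, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise e[t]."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def sample_autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return numer / denom


correlated = simulate_ar1(2000, phi=0.9)    # strongly autocorrelated series
independent = simulate_ar1(2000, phi=0.0)   # phi = 0 reduces to white noise

r_correlated = sample_autocorrelation(correlated, 1)
r_independent = sample_autocorrelation(independent, 1)
print(round(r_correlated, 2), round(r_independent, 2))
```

The first value should be close to phi and the second close to zero, which is the situation a Shewhart-type chart implicitly assumes.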

Batch production processes are common in industry; they are used in the pharmaceutical industry, paint production, and many other sectors. In such processes, a large amount of a product, called a batch, is processed at the same time. Batch processing times are usually long, and hence real-time monitoring is needed to minimize the number of poor-quality batches. End-of-line monitoring of batch processes is of limited use because poor-quality batches are, in most cases, sold at a discounted price, sent back for reprocessing, or scrapped. In each case an extra cost is incurred. This cost can be avoided through online monitoring, in which remedial actions are taken before the end of the production cycle once an out-of-control batch is anticipated (Diwekar, 2014).

Two types of monitoring approaches are used to check the quality of a produced batch. The first, as mentioned above, is end-of-line (off-line) monitoring, which is done after the process is finished. Data are collected over the whole batch length and then used to decide whether the batch is in-control or out-of-control. The major disadvantage of off-line monitoring is that no corrective action can be taken to fix the process if a batch is out-of-control. On-line monitoring approaches, on the other hand, check the batch quality while the batch is being processed. The data available from the start of the process up to the current monitoring time are used to make a decision about the process. If the batch is in-control, it is left unchanged; if it is out-of-control, corrective actions can be applied, where possible, to return the process to the in-control state (Gunther, 2007).

Process parameters in batch production processes are observed over the production time and form a whole time series, known in the literature as a functional response (Holling, 1959). Figure 1.1 depicts an example of a functional response: biomass concentration in a fed-batch penicillin production process. Biomass concentration is a key production parameter that controls the amount and quality of the penicillin produced. Note that each batch of penicillin is typically processed over a 400-hour production period. This long processing time creates a strong need for online monitoring of penicillin production; otherwise, the whole batch processing time is wasted if the batch is out-of-control.

Figure 1.1: Functional response example for biomass concentration in fed-batch penicillin

production (Birol et al., 2002).

Monitoring batch production processes through traditional control charts is inappropriate because of the nature of the observed process data. As shown in Figure 1.1, the observed process data are highly autocorrelated, and hence Shewhart-based control charts are not appropriate for such data.

Here, we propose a monitoring approach for batch production processes based on the Gaussian Process Model. When only one variable of interest is observed over the production time, batch process data are considered replicated univariate time series data. We propose extending the approach of Alshraideh and Khatatbeh (2014) to the case of replicated, multivariate time series data.

1.2 Problem Statement

On-line process monitoring is essential in every industry to cut the cost of out-of-control products. When a process has multiple variables to be monitored, multivariate statistical process monitoring tools are advised. Most of these tools assume no correlation among the sequential observations of each process variable. A few tools account for this autocorrelation, but they are run off-line, which makes it difficult to apply corrective actions. Hence, a multivariate statistical process monitoring tool that accounts for autocorrelation and can be run on-line is needed.

1.3 Study Objectives

1. Identifying a batch production process and collecting the related and required information.
2. Fitting a Gaussian Process Model to the collected training data, then calculating the residuals to obtain the upper and lower control limits (UCL and LCL) of the T² control chart. Once the chart is ready for testing, the residuals of the testing data are used to calculate the T² statistic for each observation. Finally, the obtained T² values are compared against the control limits to check the quality of the batch.
3. Constructing an on-line process monitoring tool that considers both within-profile and between-profile correlation.
4. Avoiding the dimensionality reduction techniques used to date, so that all variable readings are included in constructing the monitoring tool.
5. Helping manufacturers by providing a monitoring tool that gives an early alarm, so that corrective actions can be taken before the process is finished.
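The residual-based T² check described in the second objective can be sketched as follows for two variables. This is an illustration under stated assumptions, not the UCL code of Appendix A: the residual values are made up, the helper names are hypothetical, and the upper limit uses the large-sample chi-square approximation, which for two variables has the closed form -2 ln(alpha).

```python
import math


def mean_and_covariance(rows):
    """Sample mean vector and 2x2 sample covariance of two-column data."""
    m = len(rows)
    mu = [sum(r[j] for r in rows) / m for j in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                cov[i][j] += d[i] * d[j] / (m - 1)
    return mu, cov


def t2_statistic(e, mu, cov):
    """Hotelling T^2 = (e - mu)' S^{-1} (e - mu) for a 2-vector e."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = [e[0] - mu[0], e[1] - mu[1]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


def chi2_ucl_two_variables(alpha):
    """Upper alpha quantile of the chi-square distribution with 2 d.o.f.

    With 2 degrees of freedom the survival function is exp(-x / 2),
    so the quantile is -2 * ln(alpha). Assumed large-sample limit.
    """
    return -2.0 * math.log(alpha)


# Hypothetical in-control training residuals for two sensors.
train = [(0.1, -0.2), (-0.3, 0.1), (0.2, 0.3), (0.0, -0.1),
         (-0.1, 0.2), (0.3, -0.3), (-0.2, 0.0), (0.0, 0.0)]
mu, cov = mean_and_covariance(train)
ucl = chi2_ucl_two_variables(alpha=0.01)

# Hypothetical residuals of two new observations.
for e in [(0.1, 0.1), (1.5, -1.5)]:
    t2 = t2_statistic(e, mu, cov)
    print(e, round(t2, 2), "out-of-control" if t2 > ucl else "in-control")
```

Because T² is non-negative, only the upper limit is active in this sketch. For small training samples an F-distribution-based limit is normally preferred over the chi-square approximation.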

1.4 Significance of Work

The proposed approach differs from other approaches found in the literature in two respects:

1- Approaches proposed in the literature rely on dimensionality reduction techniques, namely Principal Component Analysis (PCA) and Partial Least Squares (PLS) (Nomikos and MacGregor, 1994), in which some information might be lost. Here, the proposed modelling approach takes into account all available information, and no dimensionality reduction is performed.

2- A control chart with upper and lower limits is provided based on observed in-control data. This makes the approach suitable for on-line monitoring, as it is less computationally expensive than other approaches.

1.5 Thesis organization

This thesis is organized as follows. Chapter one provides a brief introduction to the thesis and outlines the significance of the study. Chapter two reviews previous research related to this study and explains methods previously used for multivariate statistical process control and on-line batch monitoring. Chapter three reviews the Gaussian Process. Chapter four describes the methodology followed to develop the proposed monitoring approach. The experimental results are presented in chapter five. Finally, conclusions and recommendations for future work are provided in chapter six.

Chapter Two: Literature Review

The literature on MSPC, autocorrelated processes, on-line batch process monitoring, and the GP Model is reviewed in this chapter, as these topics describe the development of on-line monitoring of batch processes.

MSPC was introduced by Hotelling (1947), who developed the T² control chart for monitoring processes in which more than one process variable is of concern. The T² control chart, which is a Shewhart-type chart, became widely used in this field and is the cornerstone of many related process monitoring methods (Umit and Cigdem, 2001). Lucas (1985) discussed the Multivariate Cumulative Sum (CUSUM) chart, which is used to detect small process changes. Many studies in the literature have discussed the uses and development of MSPC (Montgomery, 2013; Martin et al., 1996; Mason et al., 1997).

Basic process monitoring tools, such as control charts, assume that the observed data are free of autocorrelation. Several techniques have been proposed in the literature to deal with autocorrelated processes. These include ARIMA models, hidden Markov chain models, and the GP model proposed by Alshraideh and Khatatbeh (2014). Autocorrelated data arise frequently in batch production processes, where process parameters are usually recorded continuously through sensors over the whole batch production period. An example of such a process is given by Olszewski (2001), where six process parameters are monitored during the etching process in semiconductor wafer production.

As described above, many researchers have discussed MSPC and how to deal with autocorrelation within collected process data. Monitoring of batch production processes has also been considered by several authors, from two points of view. Some authors treat the monitoring of batch processes in a manner similar to classical process quality control: causes of variability are divided into common and special causes, and the aim is to detect the special causes that lead to an out-of-control process. Other researchers treat the problem as classification of multivariate time series data: for each batch, a set of variables is observed over the batch production time, and the quality of the batch is the known class label.

One of the original studies in the field of on-line batch monitoring is that of Nomikos and MacGregor (1994), who proposed Multi-Way Principal Component Analysis (MPCA) to deal with batch process data. In their approach, independent principal components are extracted from the observed process data, and process monitoring is performed using the Hotelling T² statistic. Their approach relies on dimensionality reduction through MPCA, in which some information might be lost, and all the information from a data set of batches is folded into a few matrices. To incorporate available data about the quality of observed batches, Nomikos and MacGregor (1995) considered the use of Multi-Way Partial Least Squares (MPLS). Both MPCA and MPLS assume Gaussian and stationary process data.

Two assumptions were made when these approaches were developed. The first concerns the validity of the approach: it is valid only if the reference data set is representative of the process operation. Consequently, if something changes in the process, a new reference data set that reflects the change must be built and the method reapplied. The second is that the events one wishes to detect must be observable: an event that does not affect the measurements cannot be detected by any monitoring procedure.

PCA is a statistical procedure that transforms a set of correlated observations by first centring the data and then constructing new axes along which the variables are uncorrelated, as illustrated in Figure 2.1. Each new axis is called a principal component. The number of principal components needed to analyze the data is at most the number of original variables; usually only the components that account for most of the variability in the process are retained.

Figure 2.1: Principal components for 2-dimensions (Montgomery, 2013).
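The centring and rotation steps of PCA can be sketched for the two-dimensional case pictured in Figure 2.1. The example below is a minimal illustration that uses the closed-form rotation angle for a 2×2 covariance matrix; the data points and the function name pca_2d are hypothetical and serve only to show that the resulting scores are uncorrelated.

```python
import math


def pca_2d(points):
    """Principal components of two-dimensional data via the 2x2 covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(p[0] - mx, p[1] - my) for p in points]
    a = sum(u * u for u, _ in centred) / (n - 1)   # var(x)
    c = sum(v * v for _, v in centred) / (n - 1)   # var(y)
    b = sum(u * v for u, v in centred) / (n - 1)   # cov(x, y)
    # Rotation angle that diagonalises the symmetric matrix [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))       # direction of largest variance
    pc2 = (-math.sin(theta), math.cos(theta))      # orthogonal direction
    scores = [(u * pc1[0] + v * pc1[1], u * pc2[0] + v * pc2[1])
              for u, v in centred]
    return pc1, pc2, scores


# Hypothetical correlated measurements of two variables.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
pc1, pc2, scores = pca_2d(points)

n = len(scores)
var1 = sum(s[0] ** 2 for s in scores) / (n - 1)
var2 = sum(s[1] ** 2 for s in scores) / (n - 1)
cov12 = sum(s[0] * s[1] for s in scores) / (n - 1)
print(pc1, round(var1, 3), round(var2, 3), round(cov12, 6))
```

Keeping only the first score would be the dimensionality reduction step; the variance carried by the second score is the information that is then discarded.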

To overcome the limitations of the MPCA and MPLS approaches, Yoo et al. (2004) proposed an approach based on Independent Component Analysis (ICA), which does not require the data to be stationary or normally distributed. Garcia-Munoz, Kourti, and MacGregor (2004) used the missing data option offered by Nomikos and MacGregor (1995), because of its accuracy in estimating the data of a newly observed batch, with some updates to the distance equation of Hotelling's statistic.

Recent studies have addressed on-line batch monitoring using techniques different from those discussed above. Le Zhou, Chen, and Song (2015) proposed an approach based on the Recursive Gaussian Process (RGP) regression model. This approach is suited to processes with very long batches, in which many observations are collected; for such processes, the amount of batch data available at the beginning of monitoring is very limited relative to the batch length. A GP model is used at the initial stage of monitoring to build a relation between the data variables, and the proposed technique is then used to monitor new batches. However, this approach is applied to a single response only.

In some cases, where the data are collected from a complex process, the multivariate Gaussian distribution is not applicable. Chen and Zhang (2010) used a Gaussian mixture model (GMM) to estimate the probability density function (pdf) and improve on-line monitoring of batch processes. Their results indicated that this approach can reduce the percentage of false alarms. A limitation of the approach is that it becomes computationally infeasible when the dimensionality is high.

A subspace-aided approach was constructed by Yin et al. (2011) to deal with processes that exhibit dynamic behaviour and random disturbances. However, because normalization is a basic step of multivariate statistical process control, this approach cannot be implemented when the process variables have a wide operating range.

Spatio-temporal statistics can be used here, since the observed data come from batch processes. In spatio-temporal statistics the interest is in where and when the data were collected, that is, in checking whether the location or the time of collection has a significant effect. The GP is a spatial stochastic process that has been developed and used extensively in the literature as a prediction and interpolation tool (Alshraideh and del Castillo, 2013; Chen and Zhang, 2010; Le Zhou et al., 2015). Gaussian Process Models were used by Banerjee et al. (2004) for spatio-temporal problems, specifically for analyzing geostatistical data sets.

Batch processes can be monitored using the GP Model, since there is autocorrelation within the collected data. Alshraideh and Khatatbeh (2014) recently proposed a GP-based control chart for monitoring univariate correlated process data. They assume a continuous production process in which a variable of interest is observed over time, and the goal is to detect deviations in the process due to assignable causes. In their method, the distance matrix is constructed first, and the model parameters are then estimated by Maximum Likelihood Estimation (MLE). The fitted model is used to predict the response at the next time point, t+1. To build the control chart, the upper and lower limits (µ ± Lσ) are determined from the model residuals.
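The last step, turning model residuals into limits of the form µ ± Lσ and checking new observations against them, can be sketched as below. The residual values, the predictions, the choice L = 3, and the function names are illustrative assumptions; in the approach described above the predictions would come from the fitted GP model, which is not reproduced here.

```python
import statistics


def control_limits(residuals, L=3.0):
    """Return (LCL, UCL) = mean -/+ L * standard deviation of the residuals."""
    centre = statistics.mean(residuals)
    sigma = statistics.stdev(residuals)
    return centre - L * sigma, centre + L * sigma


def monitor(observed, predicted, lcl, ucl):
    """Flag each time point whose residual falls outside the limits."""
    return [not (lcl <= y - y_hat <= ucl)
            for y, y_hat in zip(observed, predicted)]


# Hypothetical in-control residuals (observed minus one-step-ahead prediction).
in_control_residuals = [0.2, -0.1, 0.0, 0.1, -0.2, 0.1, -0.1, 0.0]
lcl, ucl = control_limits(in_control_residuals, L=3.0)

# Hypothetical new observations and their model predictions.
observed = [10.1, 10.0, 11.0]
predicted = [10.0, 10.1, 10.2]
flags = monitor(observed, predicted, lcl, ucl)
print(round(lcl, 3), round(ucl, 3), flags)
```

Only the third point, whose residual of about 0.8 lies well outside the limits, is flagged as a possible assignable cause.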

A detailed review of time series classification techniques is provided by Baydogan (2012). Such techniques include Nearest Neighbor (NN) classifiers, which classify a profile by comparing it with the closest training profile in the feature space. Several problems commonly arise when analyzing time series data sets; time series alignment is one of them. In classification, the distance between two series can be computed either pointwise, using the Euclidean distance, or after aligning the series, before their similarity is assessed. Figure 2.2 illustrates the difference: in panel (a) two time series are compared using the Euclidean distance, while in panel (b) they are aligned using the Dynamic Time Warping (DTW) technique. Time series can be aligned by selecting a single profile as a reference for the others. One approach that can be used for the alignment is DTW, which provides a measure of the similarity of time series that is independent of certain non-linear variations in the time dimension.

Figure 2.2: The upper and lower bold lines indicate two time series, and the thin lines between them indicate the distance: (a) Euclidean; (b) alignment by DTW (Baydogan, 2012).
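The contrast between pointwise comparison and DTW alignment can be illustrated with the standard dynamic programming recursion for the DTW distance. This is a minimal sketch with made-up profiles and an absolute-difference local cost; it is not the profile alignment code of Appendix C.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # b[j-1] matched again
                                    cost[i][j - 1],      # a[i-1] matched again
                                    cost[i - 1][j - 1])  # advance both series
    return cost[n][m]


# Hypothetical profiles: the second is the first delayed by one time step.
reference = [0.0, 1.0, 2.0, 1.0, 0.0]
delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]

pointwise = sum(abs(x - y) for x, y in zip(reference, delayed))
aligned = dtw_distance(reference, delayed)
print(pointwise, aligned)
```

The pointwise comparison penalizes the delay, whereas DTW absorbs it by matching the repeated starting value to a single point of the reference profile.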

These classification methods assume univariate time series data, so a multivariate series must be broken into several univariate series. For multivariate time series data, ensemble methods have been proposed to address this issue.

Chapter Three: Gaussian Processes

The approach proposed in this thesis is based on the GP Model, so the GP is described here in some detail. This chapter is organized as follows: an introduction to the GP, the dependent GP, and GP applications.

The GP is a well-known class of probability distributions over functions. It has long been used as a prediction and interpolation method, especially for time series prediction (Osborne and Roberts, 2007). GPs have been used in geostatistics since the 1970s, where prediction is known as kriging and the inputs to the process are two or three spatial dimensions.

3.1 Gaussian Processes Definition

Many books and articles have defined and described the GP. It can be defined as follows: when observations come from a continuous domain (time or space) and each point of the domain is associated with a normally distributed random variable, the process is a GP if every finite collection of those random variables has a multivariate normal (Gaussian) distribution. The distribution of a GP is the joint distribution of all those random variables. A GP can be seen as a generalization of the multivariate normal distribution to a very large, in the limit infinite, number of dimensions. Each point (observation) must be normally distributed.

When the observations {X_t, t ∈ T} come from a stochastic process such that, for any distinct values t_1, …, t_k ∈ T, the random vector X = (X_{t_1}, …, X_{t_k})′ has a multivariate normal distribution, then:

$$X \sim \mathrm{MVN}\big(\mu(t),\, k(x_i, x_j)\big) \qquad (3.1)$$

where μ(t) = E[X_t] is the mean function, which yields an n×1 mean vector, and k(x_i, x_j) is the covariance function, which yields the n×n covariance matrix Σ. A GP is fully determined by its mean and covariance functions. The GP generalizes the Gaussian distribution; a short comparison between the two is given in Table 3.1. The Gaussian pdf of the vector X is:

$$f_X(x) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x-\mu)'\, \Sigma^{-1} (x-\mu)\right) \qquad (3.2)$$

Table 3.1: Comparison between Gaussian process and Gaussian distribution (Kuss and Rasmussen, 2006).

Gaussian Process | Gaussian Distribution
Distribution over functions | Distribution over vectors
Specified by a mean function and a covariance function: f ∼ MVN(μ(t), k(x_i, x_j)) | Specified by a mean vector and a covariance matrix: X ∼ MVN(μ, Σ)
Indexed by the argument x of f(x) | Indexed by the position i of the random variable x_i

The mean and covariance functions should be chosen carefully to define the GP. A zero mean function is widely used, whereas the covariance function can take several forms, which are discussed in the next section.

3.2 Covariance Functions

Covariance functions play a significant role in defining the behaviour of the process. As noted above, the mean function is set to zero most of the time; in this case, the covariance function fully determines the process behaviour.

One important process property is stationarity, which means that the process mean and covariance do not change over time; it is one of the most important concepts in time series analysis. The covariance function also defines the isotropy of the process, that is, uniformity in all directions: the process is isotropic when the covariance function is a function only of ||x − x′|| (the distance between x and x′). A process is said to be homogeneous if it is both stationary and isotropic (Grimmett and Stirzaker, 2001). The covariance function also defines the smoothness and periodicity of the process (Barber, 2012).

A Gaussian time series is stationary if:

1. μ(t) = E[X_t] = μ is independent of t.
2. k(t+h, t) = Cov(X_{t+h}, X_t) is independent of t for all h.

For a GP, stationarity means that:

1. X_t ∼ N(μ, k(0)) for all t.
2. The covariance matrix of (X_{t+h}, X_t)′, for all t and h, is:

$$\begin{bmatrix} k(0) & k(h) \\ k(h) & k(0) \end{bmatrix}$$

The covariance function builds the covariance matrix: k_{ij} is the (i, j) entry of the covariance matrix and k(x_i, x_j) is the covariance function, which characterizes the correlation between the data points.

$$k_{ij} = k(x_i, x_j) \qquad (3.3)$$

The properties of an autocovariance function k(·) are:

1. k(0) ≥ 0.

2. |k(h)| ≤ k(0) for all h.

3. k(h) = k(-h).

4. The covariance matrix must be positive semi-definite.

Covariance functions take several forms (Williams and Rasmussen, 2006); the most commonly used functions are briefly described below:

1. Constant covariance function:

k(x, x') = \sigma_0^2 \qquad (3.4)

The constant covariance function is stationary and degenerate (a covariance function is degenerate when it has only a finite number of non-zero eigenvalues).

2. Linear covariance function:

k(x, x') = \sigma_0^2 \, x \, x' \qquad (3.5)

This function is based on a simple linear regression model, f(x) = βx with β ~ N(0, σ_0^2). It is nonstationary and degenerate.

3. Squared Exponential (SE) covariance function:

k(x, x') = \exp\left(-\frac{r^2}{2\ell^2}\right) \qquad (3.6)

Where:

- ℓ: The characteristic length-scale.

- r: The distance between x and xˈ, r = ||x − xˈ||.


This is one of the most widely used covariance functions. It is stationary and isotropic. It is also infinitely differentiable, which means that a GP with this kernel has mean square derivatives of all orders and is therefore very smooth. It corresponds to projecting the input data into a high-dimensional feature space (Snelson, 2007).

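4. Matérn class covariance function:

k(x, x') = \frac{1}{2^{\nu-1}\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, r}{\ell}\right)^{\nu} K_{\nu}\left(\frac{\sqrt{2\nu}\, r}{\ell}\right) \qquad (3.7)

Where:

- Γ(ν): The gamma function evaluated at ν.

- K_ν: The modified Bessel function of order ν.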

The order ν controls the roughness of the random functions. This kernel is used in many applications, such as geostatistics and spatial statistics, where it defines the statistical covariance between measurements made at two points. It is stationary, since the kernel depends only on the distance between the points, and it is isotropic if the distance is Euclidean.

The Matérn class becomes simple when ν is a half-integer: ν = p + 1/2, where p is a non-negative integer. In this case the kernel is a product of an exponential and a polynomial of order p.

5. Rational quadratic covariance function:

k(x, x') = \left(1 + \frac{r^2}{2\alpha\ell^2}\right)^{-\alpha} \qquad (3.8)

With α, ℓ > 0, this function can be seen as a scale mixture of SE kernels with different characteristic length-scales. It is used frequently in spatial statistics and image analysis. It is stationary and, like the Matérn class, it is isotropic if the distance is Euclidean.

6. Polynomial covariance function:

k(x, x') = (x \cdot x' + \sigma_0^2)^p \qquad (3.9)

This function is not preferred for real-world problems, as it implies a high covariance for distant inputs. The reverse situation is more commonly seen, i.e. nearby inputs produce outputs that are close to each other. This often leads to inferior prediction performance for polynomial regression compared with other kernels. It is a nonstationary and degenerate kernel.

7. Exponential covariance function:

k(x, x') = \exp\left(-\frac{r}{\ell}\right) \qquad (3.10)

Setting ν = 1/2 in the Matérn class gives the exponential covariance function. It is stationary and nondegenerate.

8. γ-exponential covariance function:

k(x, x') = \exp\left(-\left(\frac{r}{\ell}\right)^{\gamma}\right) \qquad (3.11)

Both the exponential and the SE functions are special cases of the γ-exponential function, which has a similar number of parameters to the Matérn class. Like the exponential function, it is stationary and nondegenerate.
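The stationary kernels listed above are simple to evaluate as functions of the distance r. The sketch below is illustrative only: the function names are hypothetical, and the Matérn kernel is shown only for ν = 3/2 (p = 1), where Equation 3.7 reduces to the well-known closed form of an exponential times a first-order polynomial. It also checks the special-case relationships mentioned in the text.

```python
import numpy as np

def squared_exponential(r, ell=1.0):
    return np.exp(-r**2 / (2.0 * ell**2))                     # Equation 3.6

def exponential(r, ell=1.0):
    return np.exp(-r / ell)                                   # Equation 3.10

def gamma_exponential(r, ell=1.0, gamma=1.0):
    return np.exp(-(r / ell) ** gamma)                        # Equation 3.11

def rational_quadratic(r, ell=1.0, alpha=1.0):
    return (1.0 + r**2 / (2.0 * alpha * ell**2)) ** (-alpha)  # Equation 3.8

def matern_three_halves(r, ell=1.0):
    # Closed form of Equation 3.7 for nu = 3/2: exponential times a
    # first-order polynomial in r.
    s = np.sqrt(3.0) * r / ell
    return (1.0 + s) * np.exp(-s)

r = np.linspace(0.0, 3.0, 7)

# gamma = 1 recovers the exponential kernel.
same_as_exponential = np.allclose(gamma_exponential(r, gamma=1.0), exponential(r))
# gamma = 2 recovers the SE kernel with length-scale ell / sqrt(2).
same_as_se = np.allclose(gamma_exponential(r, ell=1.0, gamma=2.0),
                         squared_exponential(r, ell=1.0 / np.sqrt(2.0)))
# A very large alpha makes the rational quadratic kernel approach the SE kernel.
rq_near_se = np.allclose(rational_quadratic(r, alpha=1e6), squared_exponential(r), atol=1e-4)

print(same_as_exponential, same_as_se, rq_near_se)
```

Note that the SE case of the γ-exponential kernel holds up to a rescaling of the length-scale, because Equation 3.6 carries a factor of 2 in the denominator that Equation 3.11 does not.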

Table 3.2 summarizes the covariance functions as follows:

Table 3.2: Summary of several commonly-used covariance functions.

Kernel | Expression | Properties
Constant | \sigma_0^2 | Stationary, degenerate
Linear | \sigma_0^2 \, x \, x' | Nonstationary, degenerate
Squared Exponential | \exp\left(-\frac{r^2}{2\ell^2}\right) | Stationary, nondegenerate
Matérn class | \frac{1}{2^{\nu-1}\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, r}{\ell}\right)^{\nu} K_{\nu}\left(\frac{\sqrt{2\nu}\, r}{\ell}\right) | Stationary, nondegenerate
Polynomial | (x \cdot x' + \sigma_0^2)^p | Nonstationary, degenerate
Exponential | \exp\left(-\frac{r}{\ell}\right) | Stationary, nondegenerate
γ-Exponential | \exp\left(-\left(\frac{r}{\ell}\right)^{\gamma}\right) | Stationary, nondegenerate
Rational quadratic | \left(1 + \frac{r^2}{2\alpha\ell^2}\right)^{-\alpha} | Stationary, nondegenerate

Predicting financial markets is one example of a GP application. Figure 3.1a shows the November closing prices of the Standard & Poor's 500 (S&P) as points, while Figure 3.1b shows the GP prediction for these observations, assuming a zero mean and using the Matérn covariance function.


Figure 3.1: Left: the November 2010 closing prices of the S&P. Right: the GP prediction for the collected data (McDuff, 2010).

3.3 Dependent Gaussian Process

When modelling a process whose observations are dependent over space, over time, or over both, the GP is one of the preferred and most widely used modelling techniques.

GP models are mostly applied to a single output variable. Multiple outputs are usually handled by building an independent model for each output, as in the multi-kriging method (Boyle & Frean, 2004). Consider the example in Figure 3.2, which shows two coupled outputs: output 1 is described in detail, whereas the observations of output 2 are sparse. If we treat the outputs as independent, we cannot exploit their similarity; with a dependent model, predictions about output 2 are made using what we learn from both outputs 1 and 2.

Multiple processes can be handled by inferring convolution kernels instead of

covariance functions. This makes it easy to construct the required positive definite

covariance matrices for covarying outputs.


Figure 3.2: Dependent outputs. The model is represented by the solid lines and the true functions by the dotted lines; the two top panels are for an independent model, and the other two are for a dependent model (Boyle and Frean, 2004).

3.4 Gaussian Process Applications

3.4.1 Gaussian Process for Regression and Prediction

The Gaussian process is a powerful tool for regression and holds an important place in probability theory. Regression is a general statistical problem, but GPs have been used widely beyond the regression model and have been developed for use in classification (Williams and Barber, 1998; Kuss and Rasmussen, 2005).

Suppose we have a data set D consisting of N input vectors x_1, ..., x_N and continuous outputs y_1, ..., y_N, and assume that the outputs are noisy observations of a function f(x). The objective of the regression model is to estimate f(x) from D. A GP regression model is a probabilistic Bayesian model: a GP defines a probability distribution over functions, p(f), which can be used as a Bayesian prior for the regression.


Within a class of kernels, hyperparameters control properties such as the length-scale. The GP expresses a prior belief about the function f(x) being modelled. A noise model is defined to link the data to f(x), and regression then becomes a matter of Bayesian inference.

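GP regression relies on the fact that any finite set of training and test observations is jointly Gaussian distributed. Suppose y_1 and y_2 are outputs of a stochastic process with a joint normal distribution:

\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right) \qquad (3.12)

If Σ_22 is nonsingular, the conditional mean of y_1 given y_2 is:

\mu_{y_1|y_2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (y_2 - \mu_2) \qquad (3.13)

The conditional covariance matrix is:

\Sigma_{y_1|y_2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \qquad (3.14)

Then, for any new observation y_* to be predicted for output y:

\begin{bmatrix} y \\ y_* \end{bmatrix} \sim N_{n+1}\left( \begin{bmatrix} \mu \\ \mu \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \sigma^2 \end{bmatrix} \right) \qquad (3.15)

With n training observations and n_* test observations, the cross-covariance matrix is n × n_*, with the covariance function evaluated at every pair of training and test observations. The value of y_* is then determined from the test location (Xˈ), the training outputs y, the training inputs X, and the covariance function.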
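The conditional mean and covariance formulas (Equations 3.13 and 3.14) translate directly into a few lines of linear algebra. The following is a minimal sketch, assuming a squared exponential kernel, a constant mean `mu`, and a small noise variance added to the training block; the function names, the noise term, and the sine-curve data are all illustrative assumptions rather than part of the thesis implementation. Here the test outputs play the role of y_1 and the training outputs the role of y_2.

```python
import numpy as np

def se_kernel(a, b, ell=1.0):
    """Squared exponential covariance between two sets of 1-D inputs."""
    r = np.abs(a[:, None] - b[None, :])
    return np.exp(-r**2 / (2.0 * ell**2))

def gp_predict(x_train, y_train, x_test, ell=1.0, noise_var=1e-2, mu=0.0):
    """Conditional mean and covariance of the test outputs given the training outputs."""
    S22 = se_kernel(x_train, x_train, ell) + noise_var * np.eye(len(x_train))  # training block
    S12 = se_kernel(x_test, x_train, ell)                                      # test/training block
    S11 = se_kernel(x_test, x_test, ell)                                       # test block
    # Solve linear systems instead of forming the explicit inverse of S22.
    mean = mu + S12 @ np.linalg.solve(S22, y_train - mu)   # Equation 3.13
    cov = S11 - S12 @ np.linalg.solve(S22, S12.T)          # Equation 3.14
    return mean, cov

# Hypothetical training data: a noise-free sine curve.
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([1.3, 2.6, 4.1])

mean, cov = gp_predict(x_train, y_train, x_test)
print(mean)
print(np.diag(cov))
```

Solving with `np.linalg.solve` rather than inverting Σ_22 explicitly is a common numerical choice; a Cholesky factorization would serve the same purpose for larger training sets.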

3.4.2 Gaussian Process for Classification

GPs have also been applied to classification problems. The GP classifier is a very effective nonparametric classification technique based on a Bayesian methodology. This technique was developed in the geostatistics field. In a classification problem the outputs are discrete, for example binary y = ±1. The classification model uses a GP prior p(f) for a function f(x):

p(y = +1 | f(x)) = \sigma(f(x)) \qquad (3.16)

In classification, y_i ∈ {−1, 1} and p(y_i | x_i) = σ(f(x_i)), where σ is a sigmoid transformation. The marginal likelihood is the integral ∫ P(y|f) P(f|X, θ) df. The integrand is a product of sigmoids multiplied by a Gaussian, so the integral is intractable. Recall that in the regression task the likelihood was Gaussian, which made the integration tractable. Thus, the posterior cannot be found analytically, and an approximation must be employed to obtain an approximate posterior (Kuss and Rasmussen, 2006).
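To illustrate why an approximation is needed, the one-dimensional version of such an integral (a sigmoid averaged over a Gaussian) can be estimated numerically. The sketch below uses plain Monte Carlo sampling with a logistic sigmoid; this is only an illustration of the integral, not one of the approximation schemes studied by Kuss and Rasmussen, and the function names and the Gaussian parameters are assumptions.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, one common choice for the transformation sigma."""
    return 1.0 / (1.0 + np.exp(-z))

def predictive_probability(f_mean, f_var, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the integral of sigma(f) N(f; f_mean, f_var) df."""
    rng = np.random.default_rng(seed)
    f = rng.normal(f_mean, np.sqrt(f_var), size=n_samples)
    return float(sigmoid(f).mean())

p_zero = predictive_probability(0.0, 1.0)   # latent function centred on the decision boundary
p_pos = predictive_probability(2.0, 1.0)    # latent function centred on the positive side
print(p_zero, p_pos)
```

By symmetry the first estimate should be close to 0.5, and the second is pulled below σ(2) because the uncertainty in f is averaged through the sigmoid.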

3.5 Gaussian Process Fitting

A GP can be fitted by many methods; maximum likelihood estimation (MLE) and the variogram are two common ones. Once a proper covariance function has been selected according to the collected data and the process properties, the GP model is fitted to estimate the covariance function parameters and the mean of the observations. The most common method for this is MLE, which requires a mathematical expression of the data, known as the likelihood function, at the start of the estimation process:

L(\theta; x_1, \ldots, x_n) = f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \qquad (3.17)

This function is maximized to obtain estimates of the required parameters. The variogram method is commonly used for spatial statistics problems, where theoretical variograms are fitted by least squares to an isotropic experimental variogram.
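As a rough illustration of maximum likelihood fitting for a GP, the sketch below estimates a single length-scale by evaluating the log-likelihood on a grid. For a GP the joint density f(x_1, ..., x_n | θ) is multivariate normal, so the log-likelihood is evaluated jointly here rather than as a product of independent terms. The squared exponential kernel, the fixed noise variance, the simulated data, and the grid search (standing in for a numerical optimiser) are all assumptions made for the example.

```python
import numpy as np

def se_cov(t, ell, noise_var=1e-2):
    """Squared exponential covariance matrix plus a small noise term on the diagonal."""
    r = np.abs(t[:, None] - t[None, :])
    return np.exp(-r**2 / (2.0 * ell**2)) + noise_var * np.eye(len(t))

def gaussian_log_likelihood(y, K):
    """Log of the zero-mean multivariate normal density of y with covariance K."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)                 # L^{-1} y, so alpha @ alpha = y' K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))    # log |K|
    return -0.5 * (alpha @ alpha + log_det + len(y) * np.log(2.0 * np.pi))

# Hypothetical data: one draw from a GP with a known length-scale.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 60)
true_ell = 1.5
y = np.linalg.cholesky(se_cov(t, true_ell)) @ rng.standard_normal(len(t))

# Crude grid search over the length-scale.
grid = np.linspace(0.2, 5.0, 49)
log_lik = np.array([gaussian_log_likelihood(y, se_cov(t, ell)) for ell in grid])
ell_hat = grid[np.argmax(log_lik)]
print(ell_hat)
```

In practice a gradient-based optimiser over all hyperparameters would replace the grid, but the quantity being maximized is the same.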

For more details about Gaussian process the following references are useful:

(Banerjee et al., 2004; Von, 2006; Fang et al, 2006; Davis, 2014; Abrahamsen, 1997;

Rasmussen and Williams, 2006).


Chapter Four: Research Methodology

The methodology followed to develop our new multivariate process control approach is described in detail in this chapter.

Establishing a control chart goes through two main phases. In Phase I the control limits of the chart are specified; the data in this phase are known as training data and must be in-control according to the process specifications. Phase II starts once the control chart from Phase I is ready to be implemented; the data sets used to check the performance of the approach contain both in-control and out-of-control observations. Figure 4.1 shows the methodology flow chart. The proposed approach is coded as a set of MATLAB m-files.

4.1 Data Collection

Three data sets are used to check the power of our new approach. The first is a simulated data set of three uncorrelated time series generated using MATLAB. The second was collected from a web-based program that simulates fed-batch penicillin production. The third is a real-life data set collected during the etching process at a semiconductor fabrication institute.

4.2 Modeling Approach

The proposed approach consists of the following five main steps:

Step 1: Fitting a Gaussian Process model for phase I data

In this step, we fit a GP model to the observations of each sensor using the in-control (training) data, from which the mean and the covariance structure can be predicted using Equations 3.13 and 3.14, respectively. The Euclidean distance in the time domain is measured between every pair of observations, and the distances are gathered into a symmetric matrix whose diagonal elements are zero. Next, the variogram function is used to compute the spatial correlation between observations.
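The distance matrix and an empirical variogram of the kind described in this step can be sketched as follows. This is an illustrative Python version, not the thesis's MATLAB code: the classical semivariance estimator (half the mean squared difference of all pairs separated by a given lag) is assumed, and the sine-shaped profile is hypothetical sample input.

```python
import numpy as np

def distance_matrix(t):
    """Pairwise Euclidean distances in the time domain (symmetric, zero diagonal)."""
    return np.abs(t[:, None] - t[None, :])

def empirical_variogram(t, y, lags):
    """Classical semivariance estimate at each lag in `lags`."""
    D = distance_matrix(t)
    sq_diff = (y[:, None] - y[None, :]) ** 2
    gamma = []
    for h in lags:
        mask = np.isclose(D, h)                  # all pairs separated by lag h
        gamma.append(0.5 * sq_diff[mask].mean())
    return np.array(gamma)

t = np.arange(50, dtype=float)
y = np.sin(0.2 * t)                              # hypothetical smooth sensor profile
D = distance_matrix(t)
gamma = empirical_variogram(t, y, lags=np.arange(1, 6))
print(gamma)
```

For a smooth profile such as this one the semivariance grows with the lag, reflecting that observations close in time are more alike than observations far apart.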

Step 2: Obtaining the residuals for the training data

Using the mean and covariance structure obtained in step 1, we obtain the residuals R for the training data, such that

R = Y - \hat{Y} \qquad (4.1)

where Y is the observed data point and \hat{Y} is the predicted observation according to the mean and covariance structures found.

Step 3: Find the control limits

In this step, we use the residuals obtained in the previous step to find the control limits through simulation, based on the type I error rate α. The mean and the covariance of the training data residuals are calculated first. The UCL is then found through sampling such that the type I error rate (α) is satisfied, while the LCL is zero. Simulation is needed because the exact distribution of the residuals is unknown.

Step 4: Calculate T² statistic for testing data

For the testing data, the following T² statistic is calculated at each sampling point:

T^2 = R \, \Sigma^{-1} R' \qquad (4.2)

where:

- R = Y − \hat{Y} is the residuals vector at that point, obtained through multivariate normal theory.

- Σ is the covariance matrix of the residuals of all sensors.

Step 5: Compare T² for test data against the control limits

The T² statistic obtained for the test data is compared with the control limits obtained in step 3. If T² is greater than the upper control limit, a signal is issued indicating an out-of-control point and, hence, an out-of-control batch.
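Steps 3 to 5 can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the thesis's MATLAB implementation: the simulated residuals are drawn from a multivariate normal distribution with the training mean and covariance, the UCL is taken as the empirical (1 − α) quantile of the simulated T² values, and the training and test residuals are hypothetical.

```python
import numpy as np

def t2_statistic(R, cov_inv):
    """T^2 = R cov^{-1} R' (Equation 4.2), evaluated for each row of R."""
    R = np.atleast_2d(R)
    return np.einsum("ij,jk,ik->i", R, cov_inv, R)

def simulated_ucl(train_resid, alpha, n_sim=20_000, seed=0):
    """Upper control limit as the (1 - alpha) quantile of simulated T^2 values."""
    rng = np.random.default_rng(seed)
    mean = train_resid.mean(axis=0)
    cov = np.cov(train_resid, rowvar=False)
    cov_inv = np.linalg.inv(cov)
    sims = rng.multivariate_normal(mean, cov, size=n_sim)   # assumed sampling distribution
    ucl = float(np.quantile(t2_statistic(sims, cov_inv), 1.0 - alpha))
    return ucl, cov_inv

rng = np.random.default_rng(42)
train_resid = rng.normal(size=(500, 3))           # hypothetical in-control residuals, 3 sensors
ucl, cov_inv = simulated_ucl(train_resid, alpha=0.0027)

# One typical residual vector and one clearly shifted residual vector.
test_resid = np.array([[0.0, 0.0, 0.0], [6.0, 6.0, 6.0]])
signals = t2_statistic(test_resid, cov_inv) > ucl
print(ucl, signals)
```

A batch would then be declared out-of-control as soon as any of its sampling points signals, as described in step 5.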

Figure 4.1: Methodology flow chart.


Chapter Five: Data, Analysis, and Results

This chapter describes in detail the data used to validate the proposed model, together with the analyses performed and the results obtained for each data set. As mentioned before, three data sets were used to validate the proposed model; they are described in Sections 5.1 through 5.3, respectively.

5.1 Simulated Data Method

MATLAB simulation code was written to generate data for testing the proposed approach in phase II. It assumes a process with three time series outputs, which are readings from three different sensors, as shown in Figure 5.1. The functions used are:

f_1(x) = 10 (1 - \exp(-0.05x)) \qquad (5.1)

f_2(x) = 10 \sin(1.8x) \qquad (5.2)

f_3(x) = 10 (\exp(-0.1x) + \sin(x)) \qquad (5.3)


Figure 5.1: Simulated data profiles plotted against time unit: (a) f1(x), (b) f2(x), and (c) f3(x).

We ran the simulation 50,000 times, giving a collection of 50,000 time series profiles for each output. Runs were made at different values of α (0.05, 0.01, and 0.0027), two batch lengths (50 and 100 time units), and different shifts of the first sensor mean (0, 0.5, 1, 1.5, 2, 2.5, and 3). Errors were added to the simulated data such that:

E \sim MVN(0, \Sigma) \qquad (5.4)

where Σ is the covariance matrix of the added error terms. The covariance matrix Σ is assumed to have a Kronecker product structure of the between-sensor covariance and the within-sensor covariance, such that:

\Sigma = \Sigma_b \otimes \Sigma_w \qquad (5.5)

with Σ_b representing the between-sensor covariance and Σ_w representing the within-profile covariance. The following values were assumed for Σ_b and Σ_w:

\Sigma_b = \begin{bmatrix} 0.95 & 0.665 & 0.475 \\ 0.665 & 0.95 & 0.38 \\ 0.475 & 0.38 & 0.95 \end{bmatrix}


and

(\Sigma_w)_{ij} = \psi + \kappa e^{-\phi d_{ij}} \qquad (5.6)

with ψ = 0.01, κ = 0.05, φ = 10, and d_ij the Euclidean distance in time between observation i and observation j of the same profile.
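The data-generating scheme of Equations 5.1 to 5.6 can be reproduced approximately as below. This is a Python sketch rather than the thesis's MATLAB code; the time grid starting at 1, the additive form of the mean shift on the first sensor, and the function name `simulate_batch` are assumptions.

```python
import numpy as np

def simulate_batch(n=50, shift=0.0, psi=0.01, kappa=0.05, phi=10.0, seed=0):
    """One simulated batch: three sensor profiles with Kronecker-structured errors."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1, dtype=float)            # time units (assumed to start at 1)
    means = np.vstack([
        10.0 * (1.0 - np.exp(-0.05 * x)) + shift,   # Equation 5.1, with an optional mean shift
        10.0 * np.sin(1.8 * x),                     # Equation 5.2
        10.0 * (np.exp(-0.1 * x) + np.sin(x)),      # Equation 5.3
    ])
    sigma_b = np.array([[0.95, 0.665, 0.475],
                        [0.665, 0.95, 0.38],
                        [0.475, 0.38, 0.95]])
    d = np.abs(x[:, None] - x[None, :])             # distances in time within a profile
    sigma_w = psi + kappa * np.exp(-phi * d)        # Equation 5.6
    sigma = np.kron(sigma_b, sigma_w)               # Equation 5.5, sensor-major ordering
    # Draw E ~ MVN(0, sigma) through the Cholesky factor (Equation 5.4).
    errors = np.linalg.cholesky(sigma) @ rng.standard_normal(3 * n)
    return means + errors.reshape(3, n), sigma

Y, sigma = simulate_batch(n=50, shift=1.0)
print(Y.shape, sigma.shape)
```

With `np.kron(sigma_b, sigma_w)` the error vector is ordered sensor by sensor, which is why it is reshaped to three rows of n time points before being added to the mean profiles.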

The collected data were checked for autocorrelation within the data of each sensor and for correlation between the sensors' data sets. Figure 5.2 shows that there is high autocorrelation within each sensor, while Figure 5.3 shows that there is correlation between the sensors.
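An autocorrelation check of this kind can be sketched with the usual sample autocorrelation function. The example below is illustrative: it compares a smooth profile with white noise, and uses the common approximate 95% band of ±1.96/√n for an uncorrelated series; the function name and the two sample series are assumptions.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    y = np.asarray(y, dtype=float)
    centred = y - y.mean()
    denom = np.sum(centred**2)
    return np.array([np.sum(centred[: len(y) - k] * centred[k:]) / denom
                     for k in range(max_lag + 1)])

t = np.arange(100, dtype=float)
profile = 10.0 * (1.0 - np.exp(-0.05 * t))               # smooth profile, strongly autocorrelated
noise = np.random.default_rng(0).standard_normal(100)    # white noise for comparison

acf_profile = sample_acf(profile, 5)
acf_noise = sample_acf(noise, 5)
bound = 1.96 / np.sqrt(len(t))                           # approximate 95% band for white noise
print(acf_profile[1], acf_noise[1], bound)
```

Autocorrelations that stay well outside the band, as for the smooth profile, indicate that a chart assuming independent observations should not be applied to the raw data.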

Figure 5.2: The autocorrelation charts for the simulated data of each sensor.


Figure 5.3: The correlation matrix between simulated sensors data sets.

After the residuals were calculated, we checked the autocorrelation of the residuals for each sensor. Figure 5.4 shows that there is no autocorrelation within each sensor, so the T² chart can be used.


Figure 5.4: The autocorrelation charts for the residuals of each sensor.

The purpose of this data set is to check the performance of the proposed approach compared with the traditional Shewhart control charts. The results for the probability of detecting a shift (δ) in the mean structure and for the Average Run Length (ARL) are summarized in Table 5.1.

Table 5.1: Simulated data results for type I error rate and ARL.

Shift (δ) | Type I error rate (α) | Probability of detection (batch length 50) | ARL (batch length 50) | Probability of detection (batch length 100) | ARL (batch length 100)
0 | 0.0027 | 0.00364 | 274.7253 | 0.00396 | 252.5253
0 | 0.01 | 0.01178 | 84.88964 | 0.01276 | 78.36991
0 | 0.05 | 0.04582 | 21.82453 | 0.03198 | 31.26954
0.5 | 0.0027 | 0.00768 | 130.2083 | 0.00794 | 125.9446
0.5 | 0.01 | 0.02214 | 45.16712 | 0.02644 | 37.82148
0.5 | 0.05 | 0.09828 | 10.17501 | 0.10224 | 9.780908
1 | 0.0027 | 0.02874 | 34.79471 | 0.0359 | 27.85515
1 | 0.01 | 0.07774 | 12.86339 | 0.0932 | 10.72961
1 | 0.05 | 0.24798 | 4.032583 | 0.28222 | 3.543335
1.5 | 0.0027 | 0.10878 | 9.192866 | 0.12494 | 8.003842
1.5 | 0.01 | 0.25422 | 3.933601 | 0.2859 | 3.497726
1.5 | 0.05 | 0.60026 | 1.665945 | 0.67718 | 1.476712
2 | 0.0027 | 0.3509 | 2.849815 | 0.44996 | 2.22242
2 | 0.01 | 0.5744 | 1.740947 | 0.69736 | 1.43398
2 | 0.05 | 0.91858 | 1.088637 | 0.961 | 1.040583
2.5 | 0.0027 | 0.71908 | 1.390666 | 0.87266 | 1.145922
2.5 | 0.01 | 0.93074 | 1.074414 | 0.979 | 1.02145
2.5 | 0.05 | 0.99796 | 1.002044 | 0.9999 | 1.0001
3 | 0.0027 | 0.9841 | 1.016157 | 0.99446 | 1.005571
3 | 0.01 | 0.99858 | 1.001422 | 0.99996 | 1.00004
3 | 0.05 | 1 | 1 | 1 | 1
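The ARL values in Table 5.1 are the reciprocals of the corresponding detection probabilities, which can be checked directly. The short sketch below assumes that each sampling point signals independently with the same probability; the function name is illustrative.

```python
def average_run_length(p_signal):
    """Expected number of points until the first signal, when each point
    signals independently with probability p_signal."""
    return 1.0 / p_signal

arl_observed = average_run_length(0.00364)   # no shift, batch length 50, alpha = 0.0027
arl_nominal = average_run_length(0.0027)     # what an exact rate of 0.0027 would give
print(round(arl_observed, 4), round(arl_nominal, 2))
```

Comparing the two values shows that, with no shift, the observed signal rate is somewhat above the nominal α, so the in-control ARL is shorter than the nominal one.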

We compared the ARL results of the proposed approach with those of individuals control charts, such as the MR chart, which are used to monitor variables data when rational subgroups are impractical. No significant difference was observed when the mean was shifted, although the proposed approach gave a higher false alarm rate. Hence the approach is suitable for application to other simulated and real data sets.

5.2 Penicillin Fermentation Process Data

PenSim v2.0 is a web-based simulator designed by the Illinois Institute of Technology to simulate the concentrations of different variables during the penicillin production process (http://simulator.iit.edu/web/pensim/simul.html); these variables include heat generation, CO2, oxygen, and penicillin concentration. The process runs in two main stages: the preparation and loading stage, and the operation stage. Our interest is in checking the batch quality during the second stage. The process flow chart is shown in Figure 5.5.

Figure 5.5: Penicillin fermentation process (Birol et al., 2001).


In Phase I, 50 batches were simulated to collect training data. These batches were in-control according to the initial inputs of the process; for each run we set the initial conditions as in Figure 5.6. The simulation output includes 16 profiles, which are shown in Figure 5.7.

Figure 5.6: Penicillin production simulator inputs.


Figure 5.7: Penicillin simulation outputs (Birol et al., 2001).


Our interest is in deciding whether a batch is in-control or out-of-control, and for that we do not need all 16 profiles. Biomass Concentration (g/L), Penicillin Concentration (g/L), Carbon Dioxide Concentration (g/L), and Generated Heat (kcal) are the most important output variables for drawing a conclusion about batch quality. A batch is in-control if all four profiles are within the control limits; if any one of them is out-of-control, the whole batch is out-of-control.

After the completion of Phase I, a total of 500 batches were collected for Phase II (the testing phase). These comprise 100 in-control batches, labelled 1, and 400 out-of-control batches, labelled 0, which were obtained by changing the substrate feed flow rate to values from 0.041 to 0.0418 L/h. Of the 400 out-of-control batches, about 300 were collected at a substrate feed flow rate of 0.041 L/h; for this group the out-of-control decision is easy because of the clear difference from the normal batches. The other 100 batches were collected at a substrate feed flow rate of 0.0418 L/h, which is close to the normal condition of 0.0426 L/h; these batches are used to check the accuracy of the approach, since they are out-of-control but close to the in-control region. Figure 5.8 shows the test data.

Figures 5.9 and 5.10 show the autocorrelation charts for the data within each sensor and the correlation matrix between sensors, respectively.


Figure 5.8: Penicillin testing data profiles, (a) Biomass concentration (g/L), (b) Penicillin

Concentration (g/L), (c) Carbon Dioxide Concentration (g/L), (d) Generated Heat (kcal).

Figure 5.9: The autocorrelation charts for the penicillin data of each sensor.

[Plot axes for panels (a) to (d): time (0 to 100) on the horizontal axis; the recoverable vertical-axis labels are biomass concentration (g/l), penicillin concentration (g/l), and carbon dioxide concentration (g/l).]

Figure 5.10: The correlation matrix between penicillin sensors data sets.

For this data set we judge the quality of the approach by its accuracy, which is the percentage of batches whose labels (0 or 1) before testing match the labels assigned by our approach. For example, batch number 50 is out-of-control, so its actual label is 0; after testing, its label is 1 if the batch is within the control limits and 0 otherwise. The accuracy measures how many batch labels are kept the same. The accuracy results at different α values (0.05, 0.01, and 0.0027) are listed in Table 5.2.
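The labelling rule and the accuracy measure described above can be written compactly. The following is a small illustration with made-up T² values and a made-up UCL; the function names and the numbers are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

def batch_labels(t2_per_batch, ucl):
    """Label a batch 1 (in-control) if no T^2 value exceeds the UCL, otherwise 0."""
    return np.array([int(np.all(np.asarray(t2) <= ucl)) for t2 in t2_per_batch])

def accuracy(actual, predicted):
    """Fraction of batches whose predicted label matches the actual label."""
    return float(np.mean(np.asarray(actual) == np.asarray(predicted)))

# Four hypothetical batches, each with three T^2 values.
t2_per_batch = [[1.2, 3.4, 2.0], [0.5, 15.1, 2.2], [4.0, 4.1, 3.9], [20.0, 1.0, 1.0]]
actual = [1, 0, 0, 0]

predicted = batch_labels(t2_per_batch, ucl=14.0)
acc = accuracy(actual, predicted)
print(predicted, acc)
```

In this toy case the third batch is truly out-of-control but stays below the limit, so three of the four labels match.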

Table 5.2: Penicillin dataset accuracy results.

Type I error rate (𝜶) 0.05 0.01 0.0027

Accuracy 0.9650 0.9860 0.9860


According to the results, we can conclude that the accuracy of the approach is excellent. As discussed before, some of the out-of-control batches are close to the in-control batches; nevertheless, the accuracy at each α value was acceptable and the approach performs well. The accuracy decreases at α = 0.05, where the interval between the LCL and the UCL is narrower.

5.3 Wafer Data Set

In this section, a multivariate time series data set provided by Olszewski (2001) is used. The data were collected from an etching process in semiconductor wafer production using vacuum-chamber sensors. Readings from six sensors that represent production parameters are recorded, along with whether the batch quality conforms to standards. Figure 5.11 shows each sensor profile. Each batch is classified as normal or abnormal.

Each batch contains six parameters, where the values of each parameter were recorded by a single sensor. These parameters are:

1. Radio frequency forward power.

2. Radio frequency reflected power.

3. Chamber pressure.

4. 405 nanometer (nm) emission.

5. 520 nanometer (nm) emission.

6. Direct current bias.

The applied electrical power is measured by the radio frequency forward power and the radio frequency reflected power; the third parameter gives the chamber pressure; the 405 and 520 nanometer emissions measure the intensity of the light emitted by the plasma; and the sixth parameter detects the potential difference within the tool for the direct electrical current.


Figure 5.11: Wafer data profiles for each sensor.

[Plot axes: the recoverable panels show chamber pressure against time unit (0 to 100) and 405 nanometer (nm) emission against time (0 to 100).]

The Wafer data set contains 298 training batches and 896 testing batches; the results of applying our approach to this data set are discussed next. The autocorrelation within each sensor's data and the correlation between sensors are shown in Figures 5.12 and 5.13. One problem in this data set is the inconsistency in batch length, shown in Figure 5.14. Another problem is related to the alignment of the time series profiles. In the next section we solve these problems to prepare the data for use by this approach.

Figure 5.12: The autocorrelation charts for the wafer data of each sensor.


Figure 5.13: The correlation matrix between wafer sensors data sets.

5.3.1 Data Preparation for Testing

5.3.1.1 Fix Batch Length

This step resamples the time series data to a specified length using the predefined MATLAB function “resample”. For this function we select the time series profile whose length we want to change and then choose the length to which the data should be shrunk or stretched. Figure 5.15 shows profiles for batches having the same batch length. The batch length was set to 100 time units: if a batch was shorter than 100 time units, its profiles were stretched to match that length without changing the features of each profile, and if it was longer than 100 time units, its profiles were compressed to that length, also without any change in the profile features.
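The idea of stretching or compressing a profile to a fixed length can be illustrated with simple linear interpolation. This sketch is a stand-in for, not a reproduction of, MATLAB's resample function, which behaves differently in detail; the function name `resample_profile` and the two sine-shaped profiles are assumptions.

```python
import numpy as np

def resample_profile(y, target_length=100):
    """Stretch or compress a 1-D profile to target_length points by linear interpolation."""
    y = np.asarray(y, dtype=float)
    old_grid = np.linspace(0.0, 1.0, len(y))          # original samples on a unit interval
    new_grid = np.linspace(0.0, 1.0, target_length)   # desired samples on the same interval
    return np.interp(new_grid, old_grid, y)

# Two hypothetical batches of the same shape but different lengths.
short = np.sin(np.linspace(0.0, np.pi, 80))
long = np.sin(np.linspace(0.0, np.pi, 150))

short_100 = resample_profile(short)
long_100 = resample_profile(long)
print(len(short_100), len(long_100))
```

After resampling, both profiles have 100 points and nearly identical shapes, which is the property needed before the batches can be compared point by point.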

Figure 5.14: Sensor 6 shows three different batches with variation in length.

Figure 5.15: Sensor 6 with three different batches having the same length, length=100

time unit.

[Plot axes: radio frequency forward power against time, with the time axis running from 0 to 200 in the first plot and from 0 to 100 in the second.]

5.3.1.2 Time Series Profiles Alignment

As discussed before, the output variable profiles of the batches in the wafer data set are not aligned, as shown in Figure 5.16. In this case we cannot implement any monitoring technique, since we are not able to calculate the correlation between observations.

Figure 5.16: Radio frequency forward power data without alignment.

An alignment technique is applied here to align the batch profiles to a given reference profile, so that the correlation can be calculated. A batch is first selected as the reference for the alignment process; each time series is then shifted by the lag corresponding to its maximum cross-correlation with the reference. Figure 5.17 shows sensor 1 from the wafer data set after alignment: batch number 33 was selected as the reference after fixing its length to 100 time units, and the remaining profiles were shifted to match its profile.
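The lag-based alignment described above can be sketched as follows. This is an illustrative Python version under stated assumptions: the profiles are mean-centred before the cross-correlation is computed, samples shifted in from the edge repeat the nearest edge value, and the Gaussian-pulse profiles and function names are hypothetical.

```python
import numpy as np

def best_lag(reference, profile):
    """Lag (in samples) at which the cross-correlation with the reference is largest."""
    ref = reference - reference.mean()
    pro = profile - profile.mean()
    xcorr = np.correlate(pro, ref, mode="full")
    # Output index i of the full cross-correlation corresponds to lag i - (len(ref) - 1).
    return int(np.argmax(xcorr)) - (len(ref) - 1)

def shift_profile(profile, lag):
    """Shift left by `lag` samples (right if negative), repeating the edge value."""
    n = len(profile)
    out = np.empty_like(profile)
    if lag > 0:
        out[: n - lag] = profile[lag:]
        out[n - lag:] = profile[-1]
    elif lag < 0:
        out[-lag:] = profile[: n + lag]
        out[: -lag] = profile[0]
    else:
        out[:] = profile
    return out

t = np.arange(100, dtype=float)
reference = np.exp(-0.5 * ((t - 40.0) / 5.0) ** 2)   # hypothetical reference batch
delayed = np.exp(-0.5 * ((t - 47.0) / 5.0) ** 2)     # same shape, delayed by 7 time units

lag = best_lag(reference, delayed)
aligned = shift_profile(delayed, lag)
print(lag)
```

Padding with the edge value instead of wrapping the profile around avoids moving the end of a batch to its beginning, which would distort the profile features.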

[Plot axes: radio frequency forward power against time (0 to 100).]

From the description of the wafer production process, we found that the process runs in two stages, and that the sensor readings in the period between these stages are almost zero, possibly because production is stopped. We therefore cut the profile curves into periods. The most important period for checking batch quality runs from the beginning of stage two until the end. Figure 5.18 shows the production stages.

Figure 5.17: Radio frequency forward power data after alignment.

[Plot axes: radio frequency forward power against time (0 to 100).]

Figure 5.18: Wafer production process stages.

5.3.2 Wafer Results

The wafer data set results are summarized in the same way as the penicillin data set results. The accuracy was calculated at different α values (0.0027, 0.01, and 0.05) and for different numbers of training profiles (100, 150, 200, and 250). In addition to the accuracy, we calculated the percentages of False Positive (FP) and False Negative (FN) batches. Table 5.3 summarizes the obtained results.
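False positive and false negative percentages can be computed from the actual and predicted batch labels as sketched below. The thesis does not spell out its exact definitions, so this example assumes that a false positive is an in-control batch (label 1) flagged as out-of-control, that a false negative is an out-of-control batch (label 0) that is not flagged, and that each rate is taken relative to the size of its own class. The labels are made up.

```python
import numpy as np

def false_rates(actual, predicted):
    """Assumed definitions: FP = in-control batch flagged, FN = out-of-control batch missed."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    fp = np.sum((actual == 1) & (predicted == 0))
    fn = np.sum((actual == 0) & (predicted == 1))
    fp_rate = fp / np.sum(actual == 1)    # relative to the in-control batches
    fn_rate = fn / np.sum(actual == 0)    # relative to the out-of-control batches
    return float(fp_rate), float(fn_rate)

actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
fp_rate, fn_rate = false_rates(actual, predicted)
print(fp_rate, fn_rate)
```

Under a different convention for which class counts as positive, the two rates would simply swap roles.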

[Plot axes: radio frequency forward power against time (0 to 120), with the regions labelled Stage 1 and Stage 2.]

Table 5.3: Wafer data results.

Number of training profiles | Type I error rate (α) | Accuracy | FP | FN
100 | 0.0027 | 0.9241 | 21.26% | 5.79%
100 | 0.01 | 0.9196 | 4.72% | 10.03%
100 | 0.05 | 0.9205 | 12.60% | 7.34%
150 | 0.0027 | 0.9234 | 5.51% | 7.96%
150 | 0.01 | 0.9128 | 4.72% | 9.27%
150 | 0.05 | 0.8822 | 3.94% | 12.87%
200 | 0.0027 | 0.9145 | 26.77% | 5.88%
200 | 0.01 | 0.9135 | 18.11% | 7.27%
200 | 0.05 | 0.9044 | 6.30% | 10.03%
250 | 0.0027 | 0.7405 | 11.81% | 28.15%
250 | 0.01 | 0.7172 | 6.30% | 31.70%
250 | 0.05 | 0.6441 | 7.26% | 40.76%

According to the results, our new approach is valid for monitoring batch processes with multiple outputs; the FP and FN percentages were acceptable for the different numbers of training profiles and the different values of the type I error rate.


Chapter Six: Conclusions and Future Work

In most production processes, several output variables must be monitored and controlled in order to obtain products that conform to specifications. Multivariate Statistical Process Control (MSPC) is a set of tools and techniques used to monitor such multivariate production processes. Early detection of the assignable causes behind out-of-control products is always preferable, as it cuts down the cost of poor-quality products. Early detection of such causes can be achieved through on-line process quality monitoring. In on-line monitoring systems, process variables are monitored in real production time, and alarm signals are issued to indicate a deviation from normal operating conditions. Since monitoring is in real production time, corrective actions are taken directly to bring the process back to its normal operating conditions, thereby avoiding the production of further nonconforming products.

Several MSPC tools are found in the literature. Some of these tools, such as the T² and MVEWMA control charts, assume no correlation within the observations of each process variable. Tools that allow for autocorrelated observations rely on dimensionality reduction techniques and mostly deal with batch production processes, a type of production process in which several process parameters are monitored, generating autocorrelated data. In this thesis, we proposed an MSPC approach based on Gaussian Process models for on-line monitoring of multivariate autocorrelated processes. The modeling approach takes into account both the correlation within each sensor's readings and the correlation among sensors (process variables). It also uses no dimensionality reduction technique and hence reduces computation time compared to other MSPC tools.

To illustrate the proposed approach, three different data sets were used. A simulated data set consisting of three sensor readings was used to ensure that our approach is not significantly different from Shewhart charts. According to the ARL results, the approach is good enough to be used for real data sets, as it provides similar performance to Shewhart-type control charts for independent process data. The second set came from a web-based simulator for penicillin production; the accuracy results were excellent and support the results obtained from the previous data set. Finally, we used a data set collected from a semiconductor fabrication institute for wafer production using an etching process. In addition to the accuracy, false positive and false negative results were obtained; all the results, at different combinations of type I error rate and number of training profiles, indicated that the approach is valid for monitoring production processes with multiple outputs.

Computational time is one of the problems faced when implementing this approach in MATLAB: it increases with the number of training profiles, since more time is needed to fit the GP model. Another limitation is that the approach works well only under the assumption that each time series profile is a smooth curve.

As future work, another method could be sought that reduces the computation time for data of any batch length; the curve smoothness issue might also be addressed by some time series techniques.


References

Abrahamsen, P., 1997. A review of Gaussian random fields and correlation functions.

Norsk Regnesentral/Norwegian Computing Center.

Alshraideh, H. & Del Castillo, E., 2013. Gaussian Process Modeling and Optimization of

Profile Response Experiments. Quality and Reliability Engineering International, 34(4),

pp. 449-462.

Alshraideh, H. & Khatatbeh, E., 2014. A Gaussian process control chart for monitoring

autocorrelated process data. Journal of Quality Technology, 46(4), p. 317.

Banerjee, S., Carlin, B. P. & Gelfand, A. E., 2004. Hierarchical modeling and analysis for

spatial data. Crc Press.

Baydogan, M. G., 2012. Modeling Time Series Data for Supervised Learning. (Doctoral

dissertation, Arizona State University).

Birol, G., Ündey, C. & & Cinar, A., 2002. A modular simulation package for fed-batch

fermentation: penicillin production. Computers & Chemical Engineering, 26(11), pp.

1553-1565.

Boyle, P. & Frean, M., 2004. Dependent gaussian processes. In Advances in neural

information processing systems, pp. 217-224.

Camacho, J. & Picó, J., 2006. Online monitoring of batch processes using multi-phase

principal component analysis. Journal of Process Control, 16(10), pp. 1021-1035.

Chen, T. & Zhang, J., 2010. On-line multivariate statistical monitoring of batch processes

using Gaussian mixture model. Computers & chemical engineering, 34(4), pp. 500-507.

Diwekar, U., 2014. Batch Processing: Modeling and Design. CRC Press.

51

Fang, K. T., Li, R. & Sudjianto, A., 2006. Design and Modeling for Computer

Experiments. Crc press: s.n.

García-Muñoz, S., Kourti, T. & MacGregor, J. F., 2004. Model predictive monitoring for

batch processes. Industrial & engineering chemistry research, 43(18), pp. 5929-5941.

Grimmett, G. & Stirzaker, D., 2001. Probability and random processes. Oxford university

press.

Gunther, J. C., 2007. Process monitoring in fed-batch bioprocesses. ProQuest.

Holling, C. S., 1959. The components of predation as revealed by a study of small-

mammal predation of the European pine sawfly. The Canadian Entomologist, 90(05), pp.

293-320.

Hotelling, H., 1947. Multivariate Quality Control Illustrated by Air Testing of Sample

Bombsights. C.Eisenhart et. al., pp. 111-184.

Kuss, M. & Rasmussen, C. E., 2006. Assessing approximations for Gaussian process

classification. In Advances in Neural Information Processing Systems, pp. 699-706.

Martin, E. B., Morris, A. J. & Zhang, J., 1996. Process performance monitoring using

multivariate statistical process control. In Control Theory and Applications. IEE

Proceedings - Control Theory and Applications, 143(2), pp. 132-144.

Mason, R. L., Tracy, N. D. & Young, J. C., 1997. A practical approach for interpreting

multivariate T2 control chart signals.. Journal of Quality Technology, 29(4), pp. 396-406.

McDuff, D., 2010. Gaussian Processes, s.l.: MIT Media Lab, Tech Rep.

52

Montgomery, D. C., 2013. Introduction to Statistical Quality. 7th ed. Singapore: John

Wiley and Sons.

Nomikos, P. & MacGregor, J. F., 1994. Monitoring batch processes using multiway

principal component analysis. AIChE Journal, 40(8), pp. 1361-1375.

Nomikos, P. & MacGregor, J. F., 1995. Multi-way partial least squares in monitoring batch

processes. Chemometrics and intelligent laboratory systems, 30(1), pp. 97-108.

Olszewski, R. T., 2001. Generalized feature extraction for structural pattern recognition in

time-series data. CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF

COMPUTER SCIENCE.

Osborne, M. A. & Roberts, S., 2007. Gaussian processes for prediction, Department of

Engineering Science, University of Oxford, Tech. Rep.

Runger, G. C., 2002. Assignable Causes and Autocorrelation: Control Charts on

Observations or Residuals?. Journal of Quality Technology, 34(2), p. 165–170.

Snelson, E. L., 2007. Flexible and efficient Gaussian process models for machine learning.

s.l.:(Doctoral dissertation, University of London).

Technology, I. I. o., n.d. A web-based program for dynamic simulation of fed-batch

penicillin production. [Online]

Available at: http://simulator.iit.edu/web/pensim/simul.html.

Umit, F. & Cigdem, A., 2001. Multivariate Quality Control: A Historical Perspective.

Yilditz Technical University, pp. 54-65.

Williams, C. K. & Rasmussen, C. E., 2006. the MIT Press. The MIT Press.

53

Yin, S., Ding, S. X., Abandan Sari, A. H. & Hao, H., 2013. Data-driven monitoring for

stochastic systems and its application on batch process. International Journal of Systems

Science, 44(7), pp. 1366-1376.

Yoo, C. K., Lee, J. M., Vanrolleghem, P. A. & Lee, I. B., 2004. On-line monitoring of

batch processes using multiway independent component analysis. Chemometrics and

Intelligent Laboratory Systems, 71(2), pp. 151-163.

Zhou, L., Chen, J. & Song, Z., 2015. Recursive Gaussian Process Regression Model for

Adaptive Quality Monitoring in Batch Processes. Mathematical Problems in Engineering.

54

Appendices

Appendix A: UCL code

function [UCLopt]=findUCL(ResTrain,L,alpha)
% Find the UCL by simulation such that the type I error equals alpha

% Simulate nSample profiles of length L from the fitted residual distribution
nSample=200000;
mu=mean(ResTrain);
sigma=cov(ResTrain);
sigma_inv=inv(sigma);
r=mvnrnd(mu,sigma,nSample*L);

% T^2 score of every simulated point
Tscores=zeros(nSample*L,1);
for i=1:nSample*L
    Tscores(i)=r(i,:)*sigma_inv*r(i,:)';
end
Tscores=reshape(Tscores,nSample,L);

% Set optimization options
options = optimset('Algorithm','sqp','Display','iter','MaxIter',10000, ...
    'MaxFunEvals',10000,'TolFun',1e-12,'TolX',1e-12);

% Search for the UCL between 0 and the largest simulated score
% (optUCL is the objective function; it is not listed in this appendix)
[UCLopt]=fmincon(@(UCL) optUCL(UCL,Tscores,alpha,L,nSample), ...
    max(max(Tscores))/2,[],[],[],[],0.0,max(max(Tscores)),[],options);
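The helper `optUCL` called above is not listed in this appendix. As a rough sketch of what the search appears to target, and assuming the goal is a profile-wise false alarm rate of `alpha` (a simulated profile signals if any of its `L` T² scores exceeds the limit), a comparable limit can be read off directly as an empirical quantile of the per-profile maxima. The function name `quantileUCL` is hypothetical, the code is not the thesis implementation, and it uses base MATLAB only.

```matlab
function UCL = quantileUCL(Tscores, alpha)
% Sketch only (not the thesis code): empirical UCL for a profile-wise
% false alarm rate of at most alpha.
% Tscores : nSample-by-L matrix of simulated in-control T^2 scores,
%           one row per simulated profile (as built in findUCL).
% alpha   : target type I error rate per profile, 0 < alpha < 1.

nSample = size(Tscores, 1);

% A simulated profile raises a false alarm when its largest score
% exceeds the limit, so only the sorted row maxima matter.
profileMax = sort(max(Tscores, [], 2));

% Smallest observed maximum that leaves at most alpha*nSample rows above it.
idx = min(max(ceil((1 - alpha) * nSample), 1), nSample);
UCL = profileMax(idx);
end
```

Called as `UCL = quantileUCL(Tscores, alpha)` after the `reshape` step in `findUCL`, this gives a limit for which `mean(any(Tscores > UCL, 2))` is at most `alpha` on the simulated scores, without running an optimizer.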

Appendix B: Get residuals code

function [ResTrain,Res,InControlT,PredClassT,T2Train,T2Test] = ...
    getResidualsT2(data,Test,nSensors,nProfiles,CovModel,maxd,alpha)
% Fits a GP model to each sensor's training data, then computes the
% residuals (actual-predicted) for the training and testing profiles
% and the corresponding T^2 statistics.
% NO correction for alpha value is done here (use otimalAlpha)
% Relies on variogram, variogramfit and predictGP, which are not listed
% here, and on findUCL (Appendix A).
% Note: the indexing below assumes that training and testing profiles
% have the same length (n2 == n4).

% Reshape the data so that each column of y holds one sensor's readings
[n1 n2]=size(data);
y=reshape(data',n2*nProfiles,nSensors);

% Define Residuals matrix
[n3 n4]=size(Test);
Res=ones(n4*n3/nSensors,nSensors);
ResTrain=ones(n2*n1/nSensors,nSensors);

% Construct GPs (independent GPs, one per sensor) and make predictions
x=repmat([1:1:n2]',nProfiles,1);
nugget_initial=100;

for i=1:nSensors
    display(sprintf('Fitting GP model for sensor %d data \n',i));
    v = variogram([x zeros(size(x))],y(:,i),'plotit',false,'maxdist',maxd);
    [dum,dum,dum,vstruct] = variogramfit(v.distance,v.val,[],[],[], ...
        'model',CovModel,'nugget',nugget_initial,'plotit',false);
    mu0=mean(y(:,i));
    [Zhat(i,:) Zvar(i,:)]=predictGP(x,y(:,i), ...
        [vstruct.nugget vstruct.sill vstruct.range mu0],n2,maxd);

    % Get Residuals for TRAINING data
    temp=reshape(y(:,i),n2,n1/nSensors)';
    ind=1;
    for j=1:n1/nSensors
        display(sprintf('Getting residuals for profile %d of sensor %d of training data\n',j,i));
        for k=1:n2
            % Predict point k from the fitted profile and points 1..k-1
            ytrain=[Zhat(i,:) temp(j,1:k)];
            dist=squareform(pdist([1:n4,1:k]'));
            dist(dist>maxd)=inf;
            sigma=vstruct.nugget.*eye(size(dist))+vstruct.sill.*exp(-vstruct.range.*dist);
            sigma11_inv=sigma(1:end-1,1:end-1)\eye(length(ytrain)-1);
            %sigma12=sigma(1:end-1,end);
            sigma21=sigma(end,1:end-1);
            %sigma22=sigma(end,end);
            mu=mu0+sigma21*sigma11_inv*(ytrain(1:end-1)'-repmat(mu0,n4+k-1,1));
            %var=sigma22-sigma21*sigma11_inv*sigma12;
            ResTrain(ind,i)=ytrain(end)-mu;
            ind=ind+1;
        end
    end % End residuals for TRAINING

    % Reshape test data
    yTest=reshape(Test',n4*n3/nSensors,nSensors);

    % Get Residuals for TESTING data
    temp=reshape(yTest(:,i),n4,n3/nSensors)';
    ind=1;
    for j=1:n3/nSensors
        display(sprintf('Getting residuals for profile %d of sensor %d of testing data\n',j,i));
        for k=1:n4
            ytrain=[Zhat(i,:) temp(j,1:k)];
            dist=squareform(pdist([1:n4,1:k]'));
            dist(dist>maxd)=inf;
            sigma=vstruct.nugget.*eye(size(dist))+vstruct.sill.*exp(-vstruct.range.*dist);
            sigma11_inv=sigma(1:end-1,1:end-1)\eye(length(ytrain)-1);
            %sigma12=sigma(1:end-1,end);
            sigma21=sigma(end,1:end-1);
            %sigma22=sigma(end,end);
            mu=mu0+sigma21*sigma11_inv*(ytrain(1:end-1)'-repmat(mu0,n4+k-1,1));
            %var=sigma22-sigma21*sigma11_inv*sigma12;
            Res(ind,i)=ytrain(end)-mu;
            ind=ind+1;
        end
    end % End residuals for TESTING
end

% Calculate T^2 values for training and testing data
sig_inv=inv(cov(ResTrain));

T2Train=zeros(n2*n1/nSensors,1);
for k=1:n2*n1/nSensors
    T2Train(k)=ResTrain(k,:)*sig_inv*ResTrain(k,:)';
end

T2Test=zeros(n3*n4/nSensors,1);
for k=1:n3*n4/nSensors
    T2Test(k)=Res(k,:)*sig_inv*Res(k,:)';
end

% Find control limits of Chi-squared chart
display(sprintf('Optimizing UCL...'));
[UCL]=findUCL(ResTrain,n2,alpha);
display(sprintf('Optimal UCL found = %0.4f\n',UCL));

% A test profile is classified as in control only if all of its points are
InControlT=(T2Test<=UCL);
PredClassT=ones(n3/nSensors,1);
for j=1:n3/nSensors
    PredClassT(j)=(sum(InControlT((j-1)*n2+1:j*n2))==n2);
end
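For reference, the inner loops above appear to implement the standard Gaussian conditional mean followed by a Hotelling-type statistic. The notation below is ours rather than the thesis's and is only a reading of the code: $\tau^2$ stands for `vstruct.nugget`, $\sigma^2$ for `vstruct.sill`, $\phi$ for `vstruct.range`, $d_{ab}$ for the distance between time indices $a$ and $b$, $\mathbf{y}_{1:m-1}$ for `ytrain(1:end-1)`, $y_m$ for `ytrain(end)`, and $\mathbf{r}$ for one row of the residual matrix (one entry per sensor).

```latex
\begin{align*}
\Sigma_{ab} &= \tau^2\,\delta_{ab} + \sigma^2 \exp(-\phi\, d_{ab}),
  \qquad \text{with } \exp(-\phi\, d_{ab}) \text{ set to } 0 \text{ when } d_{ab} > \texttt{maxd},\\
\hat{y}_m &= \mu_0 + \Sigma_{21}\,\Sigma_{11}^{-1}\,\bigl(\mathbf{y}_{1:m-1} - \mu_0\mathbf{1}\bigr),\\
r_m &= y_m - \hat{y}_m,\\
T^2 &= \mathbf{r}^{\top} S^{-1}\,\mathbf{r}, \qquad S = \operatorname{cov}(\texttt{ResTrain}).
\end{align*}
```

A point signals when its $T^2$ value exceeds the UCL returned by `findUCL`, and a test profile is classified as in control only if none of its points signal.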

Appendix C: Profiles alignment code

function [alTS,shift,corr] = alignTS(ideal,TS)
% This program tries to align each time series (row of TS) to a given
% ideal profile by shifting it by the lag corresponding to the maximum
% cross correlation. Positions vacated by the shift are left as zeros.
[n,l]=size(TS);
alTS = zeros(n,l);
maxlag = 30;
for i = 1:n
    % Normalized cross correlation for lags -maxlag..maxlag
    [c,lag]=xcorr(ideal,TS(i,:),maxlag,'coeff');
    [corr(i),I]=max(c);
    shift(i)=lag(I);
    if shift(i)>0
        % Shift the series to the right
        alTS(i,shift(i)+1:l) = TS(i,1:l-shift(i));
    elseif shift(i)<0
        % Shift the series to the left
        alTS(i,1:l+shift(i)) = TS(i,1-shift(i):l);
    else
        alTS(i,:)=TS(i,:);
    end
end

Monitoring of Autocorrelated Batch Production Processes Through a Gaussian Process Approach

Prepared by: Mu'men Hasan Farhan Rababah

Abstract

Reducing the number of nonconforming products to the minimum possible is a goal for any industry. To achieve this goal, statistical process monitoring and control tools are used: the variables of the manufacturing process are monitored to detect any deviation from normal operating conditions. In most industries, for example in batch production processes, more than one variable is monitored, since all of them contribute to determining the quality of the final product. Many previous studies have addressed various methods for dealing with multivariate processes, but few of them considered the presence of correlation among these variables. The tools currently available for monitoring processes with correlated variables have two main shortcomings. The first is that they are not used for monitoring during production, so judging the product requires the manufacturing run to finish before a decision is made; consequently, no corrective action can be taken on the production process before it ends. The second shortcoming is that these tools use dimensionality reduction methods to resolve the problem of computational complexity.

In this thesis, the monitoring of multivariate processes, as well as batch production processes, is addressed, and a modeling approach based on the Gaussian process is proposed. The proposed approach takes into account the correlation among process variables and the correlation within the readings of each of these variables.

Two data sets obtained from simulation and a third set of real manufacturing data were used to evaluate the proposed modeling approach. The first data set represented a simulation of a real manufacturing process containing three variables, while a website for simulating penicillin batch production was used to obtain the second data set. The real data set came from a wafer etching process for semiconductor manufacturing. We then evaluated the performance of the proposed model by computing the average run length (ARL) and the accuracy in predicting out-of-control batches. Regarding the average run length results, the proposed modeling approach is similar to Shewhart-type control charts, which is the best that can be achieved when there is no correlation among the process data. Depending on the values of the type I error rate and the number of training cases, the accuracy values achieved ranged between 92% and 98%. In view of the results obtained, we find that the proposed modeling approach performs well and can be applied to monitor multivariate processes. In addition, the proposed approach overcomes the shortcomings found in previous research, in that it can be used to monitor processes during production and involves no dimensionality reduction.
