MONITORING OF AUTOCORRELATED BATCH
PRODUCTION PROCESSES THROUGH A
GAUSSIAN PROCESS APPROACH
By
Mu′men Hasan Rababah
Advisor
Dr. Hussam A. Alshraideh
Co-Advisor
Dr. Tarek H. Al-Hawari
Thesis submitted in partial fulfillment of the requirements for the degree of
M.Sc. in Industrial Engineering
At
The Faculty of Graduate Studies
Jordan University of Science and Technology
May, 2016
MONITORING OF AUTOCORRELATED BATCH
PRODUCTION PROCESSES THROUGH A GAUSSIAN
PROCESS APPROACH
By
Muˈmen Hasan Rababah
Signature of Author ………………………
Committee Member Signature and Date
Dr. Hussam A. Alshraideh (Chairman) ………………………
Dr. Tarek H. Al-Hawari (Co-Advisor) ………………………
Dr. Mohammed S. Obeidat (Member) ………………………
Dr. Yazan K. Migdadi (External Examiner) ………………………
May, 2016
Authorization
We, the undersigned, undertake to grant Jordan University of Science and Technology the freedom to publish the content of this thesis, such that the intellectual property rights of the master's thesis belong to the University in accordance with the laws, regulations, and instructions related to intellectual property and patents.
Main Advisor: Dr. Hussam Alshraideh Signature and Date ........................
Co-Advisor: Dr. Tarek Al-Hawari Signature and Date ........................
Student: Mu'men Rababah University ID and Signature: 20143029004 .........................
DEDICATION
To the memory of my first teacher, my father, and my first
friend, my brother Mohammed. I miss you every day…
To my mother: thanks are not enough for what you have done for
me; may Allah help me to be a good son to you…
To my brothers, Mu’ath, Mu’nis, Mustafa, and AbdAlraheem…
To my sisters, Roa’ and Enas: I love you all so much, and many
thanks for your support…
To my lovely friends and family, thanks for everything…
Audai, Abdullah, Saba’, Reham, Heba, Ansam, Mahmoud,
Osama, and Khalid: sharing your moments with me was the best
thing that happened during these two years; special thanks to
all of you…
ACKNOWLEDGMENT
“Glory to thy Lord, the Lord of Honour and Power! (He is free) from what
they ascribe (to Him)! (180) And Peace on the messengers! (181) And
Praise to Allah, the Lord and Cherisher of the Worlds. (182)” Quran [37:180-182]
First of all, I thank ALLAH for helping and guiding me through my life, and
prayer and peace be upon the Prophet Muhammad, peace be upon him.
Secondly, I would like to thank Dr. Hussam Alshraideh for everything; it was an honor
for me to be one of your students. Many thanks also to Dr. Tarek Al-Hawari; thank you
both for your patience and support.
Finally, many thanks to my family, friends, colleagues, and everyone who
supported me. May ALLAH bless you all.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES
LIST OF ABBREVIATIONS
ABSTRACT
Chapter One: Introduction
1.1 Overview
1.2 Problem Statement
1.3 Study Objectives
1.4 Significance of Work
1.5 Thesis organization
Chapter Two: Literature Review
Chapter Three: Gaussian Processes
3.1 Gaussian Processes Definition
3.2 Covariance Functions
3.3 Dependent Gaussian Process
3.4 Gaussian Process Applications
3.4.1 Gaussian Process for Regression and Prediction
3.4.2 Gaussian Process for Classification
3.5 Gaussian Process Fitting
Chapter Four: Research Methodology
4.1 Data Collection
4.2 Modeling Approach
Chapter Five: Data, Analysis, and Results
5.1 Simulated Data Method
5.2 Penicillin Fermentation Process Data
5.3 Wafer Data Set
5.3.1 Preparation Data for Testing
5.3.1.1 Fix Batch Length
5.3.1.2 Time Series Profiles Alignment
5.3.2 Wafer Results
Chapter Six: Conclusions and Future Work
References
Appendices
Abstract in Arabic Language
LIST OF FIGURES
Figure Description
1.1 Functional response example for biomass concentration in fed-batch penicillin production
2.1 Principal components for 2 dimensions
2.2 The upper and lower bold lines indicate two time series, and the small lines between them indicate the distance: (a) Euclidean, (b) alignment by DTW
3.1 Dependent outputs: the model is represented by solid lines and the true functions by dotted lines; the two top panels are for an independent model and the others are for a dependent model
3.2 Left: the November 2010 closing price of the S&P; right: the GP prediction for the collected data
4.1 Methodology flow chart
5.1 Simulated data profiles
5.2 The autocorrelation charts for the simulated data of each sensor
5.3 The correlation matrix of the simulated sensor data set
5.4 The autocorrelation charts for the residuals of each sensor
5.5 Penicillin fermentation process
5.6 Penicillin production simulator inputs
5.7 Penicillin simulation outputs
5.8 Penicillin testing data profiles: (a) biomass concentration (g/L), (b) penicillin concentration (g/L), (c) carbon dioxide concentration (g/L), (d) generated heat (kcal)
5.9 The autocorrelation charts for the penicillin data of each sensor
5.10 The correlation matrix of the penicillin sensor data set
5.11 Wafer data profiles for each sensor
5.12 The autocorrelation charts for the wafer data of each sensor
5.13 The correlation matrix of the wafer sensor data set
5.14 Sensor 6 showing three different batches with variation in length
5.15 Sensor 6 with three different batches having the same length (length = 100 time units)
5.16 Radio frequency forward power data without alignment
5.17 Radio frequency forward power data after alignment
5.18 Wafer production process stages
LIST OF TABLES
Table Description
3.1 Comparison between Gaussian process and Gaussian distribution
3.2 Summary of several commonly-used covariance functions
5.1 Simulated data results for type I error rate and ARL
5.2 Penicillin dataset accuracy results
5.3 Wafer data results
LIST OF APPENDICES
Appendix Description
A UCL code
B Get residuals code
C Profiles alignment code
LIST OF ABBREVIATIONS
Abbreviations Description
ARL Average Run Length
CUSUM Cumulative Sum
DTW Dynamic Time Warping
GMM Gaussian Mixture Model
GP Gaussian Process
LCL Lower Control Limit
MLE Maximum Likelihood Estimate
MPCA Multi-Way Principal Component Analysis
MPLS Multi-Way Partial Least Square Analysis
MSPC Multivariate Statistical Process Control
PCA Principal Component Analysis
PLS Partial Least Square
RGP Recursive Gaussian Process
SE Squared Exponential
UCL Upper Control Limit
ABSTRACT
MONITORING OF AUTOCORRELATED BATCH
PRODUCTION PROCESSES THROUGH A GAUSSIAN
PROCESS APPROACH
By
Muˈmen Hasan Rababah
Achieving the minimum number of nonconforming items is always the goal in any
industry. To achieve this goal, statistical process monitoring and control tools are used,
in which process variables are monitored to detect any deviations from normal operating
conditions. In many industrial processes, such as batch production processes, multiple
process variables must be monitored as they play a key role in the quality of the final
product. Several monitoring tools found in the literature deal with the case of multiple
process variables, but only a few of them deal with the case of autocorrelated variables.
Existing tools used to monitor autocorrelated process variables have two main
drawbacks. First, they are run off-line; that is, monitoring is performed at the end of the
production cycle, and hence no corrective actions can be made. Second, these tools rely on
dimensionality reduction techniques to address computational complexity issues.
In this thesis, the monitoring of multi-variable processes, including batch production
processes, is considered. A Gaussian Process (GP) based modeling approach is proposed.
The proposed approach takes into consideration both the correlation between process
variables and the autocorrelation within each variable's readings.
Two simulated data sets and one real data set were used to validate the proposed modeling
approach. In the first data set, data were simulated to mimic a real production process with
three variables of interest. A web-based simulator for batch production of penicillin was
used to generate observations for the second data set. For the real data set, data from a wafer
etching process were considered. Model performance was assessed through the Average Run
Length (ARL) and the accuracy in predicting out-of-control batches. Regarding the ARL, the
proposed modeling approach had results similar to those of Shewhart-type control charts,
the optimal case for independent process data. Depending on the assumed type I error rate
and the number of training instances, accuracy values ranging from 92% to 98% were
achieved. The obtained results indicate that the proposed modeling approach performs well
for monitoring multi-variable processes. In addition, the proposed approach overcomes the
drawbacks of other tools in that it can be used as an on-line monitoring tool and no
dimensionality reduction is performed.
Chapter One: Introduction
1.1 Overview
Process monitoring is essential in every industry to cut down unnecessary quality
costs. Following the Shewhart model, causes of variability or poor quality in any process are
classified into two types: assignable causes and common causes (Runger, 2002). Assignable
causes are sources of variation external to the process that do not represent normal
operating conditions, while common causes are error sources present as a natural part of
the process (Montgomery, 2013). Detecting and eliminating assignable causes reduces
process variability and consequently cuts down the cost of poor quality; this is the main
objective of any process monitoring technique (Runger, 2002).
In most industrial processes, more than one process parameter plays a key role in
the quality of the final product. Monitoring the final product quality therefore requires
online monitoring of those key process parameters through statistical techniques. Such a
monitoring scheme is referred to in the literature as Multivariate Statistical Process Control (MSPC).
Control charts are a basic process monitoring tool and are considered one of the most
powerful tools for quality improvement. X̄ and R charts are examples of control charts used
to monitor independent observations of variables (Montgomery, 2013). Another example is
the Hotelling T² control chart (Hotelling, 1947), one of the most common charts for MSPC,
which is used later in this thesis; it is valid only if there is no autocorrelation within the
observations of each variable. Basic process monitoring tools such as control charts assume
that observed data are free of autocorrelation, that is, data points observed at nearby time
points are independent or uncorrelated. This assumption is widely violated in many
industrial processes, especially in the chemical industry. Several techniques have been
proposed in the literature to deal with autocorrelated processes. Autoregressive Integrated
Moving Average (ARIMA) models are a class of statistical models commonly used to model
autocorrelation within time series data.
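To make the independence assumption concrete, the following minimal Python sketch (an illustration only, not part of the thesis method) simulates a first-order autoregressive series, AR(1), the simplest member of the ARIMA family, and compares its lag-1 sample autocorrelation with that of independent noise. The function names `simulate_ar1` and `sample_acf` and the coefficient φ = 0.8 are hypothetical choices for this example.

```python
import numpy as np

def simulate_ar1(n, phi, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, size=n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1.0 - phi**2)  # start from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    xc = np.asarray(x, dtype=float)
    xc = xc - xc.mean()
    return np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc)

independent = simulate_ar1(5000, phi=0.0, seed=1)     # phi = 0: independent noise
autocorrelated = simulate_ar1(5000, phi=0.8, seed=1)  # strongly autocorrelated series
print("lag-1 ACF, independent:   ", round(sample_acf(independent, 1), 2))
print("lag-1 ACF, autocorrelated:", round(sample_acf(autocorrelated, 1), 2))
```

With φ = 0 the lag-1 autocorrelation is close to zero, whereas with φ = 0.8 it is close to 0.8. In the second case neighbouring observations are far from independent, which is exactly the situation in which limits derived under independence become misleading.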
Batch production processes are common in industry; they are used in the
pharmaceutical industry, paint production, and many other sectors. In such processes, a
large amount of a certain product, called a batch, is usually processed at the same time.
Batch processing times are usually long, and hence real-time monitoring of such processes
is needed to minimize the number of poor-quality batches. End-of-line monitoring of batch
processes is of limited use, because in most cases poor-quality batches are sold at a
discounted price, sent back for reprocessing, or scrapped. In any of these cases an extra
cost is incurred, which can be eliminated through online monitoring, where remedial actions
can be taken before the end of the production cycle once an out-of-control batch is
anticipated (Diwekar, 2014).
Two types of batch process monitoring approaches are used to check the quality
of a produced batch. The first type, as mentioned above, is end-of-line (off-line)
monitoring, which is done after the process is finished: data are collected along the batch
length and then used to decide whether the batch is in-control or out-of-control. The major
disadvantage of off-line monitoring is that no corrective action can be taken to fix the
process if a batch is out-of-control. On the other hand, on-line monitoring approaches check
the batch quality while it is being processed, using the data available from the start of the
process up to the monitoring time to make a decision about the process. If the batch is
judged in-control, it continues without any changes; if it is out-of-control, corrective
actions can be applied, where possible, to return the process to the in-control state
(Gunther, 2007).
Process parameters in batch production processes are observed over the production
time, forming a complete time series known in the literature as a functional response
(Holling, 1959). Figure 1.1 depicts an example of a functional response representing
biomass concentration in a fed-batch penicillin production process. Biomass concentration
is a key production parameter that controls the amount and quality of the penicillin produced.
Each batch of penicillin is typically processed over a 400-hour production period. This
long processing time underlines the need for online monitoring of penicillin production;
otherwise, the whole batch processing time is wasted if the batch turns out to be
out-of-control.
Figure 1.1: Functional response example for biomass concentration in fed-batch penicillin
production (Birol et al., 2002).
Monitoring batch production processes through traditional control charts is
inappropriate due to the nature of the observed process data. As shown in Figure 1.1,
observed process data are highly autocorrelated, and hence Shewhart-based control charts
are not appropriate for such data.
Here, we propose a monitoring approach for batch production processes based on
a Gaussian Process (GP) model. When only one variable of interest is observed over the
production time, batch process data are considered replicated univariate time series data.
We propose extending the approach of Alshraideh and Khatatbeh (2014) to the case of
replicated and multivariate time series data.
1.2 Problem Statement
On-line process monitoring is essential in every industry to cut down the cost of
out-of-control products. When a process has multiple variables to be monitored, the use of
multivariate statistical process monitoring tools is advised. Most of these tools assume no
correlation within the sequential observations of each process variable. A few tools
consider this autocorrelation, but unfortunately they are run off-line, making it difficult to
apply any corrective actions. Hence, a multivariate statistical process monitoring tool that
can be run on-line is needed.
1.3 Study Objectives
1. Identifying a batch production process and collecting the related and required data.
2. Fitting a Gaussian Process model to the collected training data, then calculating
the residuals to obtain the upper and lower control limits (UCL and LCL) for the T²
control chart. Once the chart is ready, the residuals of the testing data are used to
calculate the T² statistic for each observation; finally, the obtained T² values are
compared against the control limits to check the quality of that batch.
3. Constructing an on-line process monitoring tool that considers both within-profile and
between-profile correlation.
4. Avoiding the dimensionality reduction techniques used to date, so that all variable
readings are included in constructing the monitoring tool.
5. Helping manufacturers with a monitoring tool that gives an early alarm, so that
corrective actions can be taken before the process is finished.
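Objective 2 can be illustrated with a minimal sketch of the T² step alone, assuming the GP residuals are already available as an (observations × variables) array. The residuals below are simulated stand-ins, and the UCL is taken as the empirical (1 − α) quantile of the training T² values; both are assumptions for illustration, and the limit calculation in Appendix A may differ. Because T² is non-negative, the sketch checks only the upper limit. The helper names `t2_statistic` and `fit_t2_chart` are hypothetical.

```python
import numpy as np

def t2_statistic(residuals, mean, cov_inv):
    """Hotelling T^2 = (r - mean)' S^{-1} (r - mean) for each row r."""
    D = np.atleast_2d(residuals) - mean
    return np.einsum("ij,jk,ik->i", D, cov_inv, D)

def fit_t2_chart(train_residuals, alpha=0.01):
    """Estimate the in-control mean, inverse covariance, and an empirical UCL
    from residuals of in-control training data (rows = observations)."""
    R = np.asarray(train_residuals, dtype=float)
    mean = R.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(R, rowvar=False))
    t2_train = t2_statistic(R, mean, cov_inv)
    ucl = np.quantile(t2_train, 1.0 - alpha)  # empirical (1 - alpha) quantile
    return mean, cov_inv, ucl

rng = np.random.default_rng(42)
train = rng.normal(size=(2000, 3))       # hypothetical in-control residuals, 3 sensors
mean, cov_inv, ucl = fit_t2_chart(train, alpha=0.01)

test = rng.normal(size=(50, 3))
test[30:] += np.array([2.0, -2.0, 1.5])  # hypothetical shift from observation 30 on
t2_test = t2_statistic(test, mean, cov_inv)
alarms = np.flatnonzero(t2_test > ucl)
print("UCL:", round(float(ucl), 2), "| alarms at observations:", alarms)
```

An empirical quantile avoids any distributional assumption about the residuals, at the cost of needing enough in-control training observations to estimate a tail quantile reliably.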
1.4 Significance of Work
The proposed approach will differ from other approaches found in the literature
in two aspects:
1- Approaches proposed in the literature rely on dimensionality reduction techniques such
as Principal Component Analysis (PCA) and Partial Least Squares (PLS) (Nomikos and
MacGregor, 1994), where some information might be lost. Here, the proposed modelling
approach takes into account all available information, and no dimensionality reduction will
be performed.
2- A control chart with upper and lower limits will be provided based on observed
in-control data, which makes the approach suitable for on-line monitoring, as it will be less
computationally expensive than other approaches.
1.5 Thesis organization
This thesis is organized as follows. Chapter one provides a brief introduction to the
thesis and outlines the significance of the study. Chapter two provides an overview of
previous research related to this study and explains some previous methods used for
multivariate statistical process control and on-line batch monitoring. Chapter three reviews
Gaussian processes. Chapter four describes the methodology followed to develop the
proposed monitoring approach. The experimental results are presented in chapter five.
Finally, conclusions and recommendations for future work are provided in chapter six.
Chapter Two: Literature Review
Literature on MSPC, autocorrelated processes, on-line batch process monitoring, and
the GP model is reviewed in this chapter, as these topics underpin the development of
on-line monitoring of batch processes.
MSPC was introduced by Hotelling (1947), who developed the T² control chart for
monitoring processes in which more than one process variable is of concern. The T²
control chart, a Shewhart-type chart, became widely used in this field and the cornerstone
of many related process monitoring methods (Umit and Cigdem, 2001). One notable study,
by Lucas (1985), discussed the multivariate Cumulative Sum (CUSUM), which is used to
detect small process shifts. Many studies in the literature discuss the uses and development
of MSPC (Montgomery, 2013; Martin et al., 1996; Mason et al., 1997).
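For reference, the standard form of the Hotelling statistic for an observation vector x of p variables is given below (a textbook form, e.g. Montgomery, 2013, not a formula specific to this thesis). When the in-control mean μ and covariance Σ are known, T² follows a chi-square distribution with p degrees of freedom, which gives the upper limit shown; when they are replaced by the sample estimates x̄ and S, limits based on the F or beta distribution are normally used instead.

$$T^2 = (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \sim \chi^2_{p}, \qquad \mathrm{UCL} = \chi^2_{\alpha,\,p}$$

$$T^2 = (\mathbf{x} - \bar{\mathbf{x}})^{\top} \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) \quad \text{(estimated parameters)}$$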
Basic process monitoring tools, such as control charts, assume that observed data are
free of autocorrelation. Several techniques have been proposed in the literature to deal
with autocorrelated processes. These include ARIMA models, hidden Markov chain models,
and the GP model proposed by Alshraideh and Khatatbeh (2014). Autocorrelated data arise
frequently in batch production processes, where process parameters are usually recorded
continuously through sensors over the whole batch production period. An example of such
processes is given by Olszewski (2001), where six process parameters are monitored during
the etching process in semiconductor wafer production.
As described above, many researchers have discussed MSPC and how to deal with
autocorrelation within collected process data. Monitoring of batch production processes
has been considered by several authors, and studies found in the literature approach this
problem from two points of view. Several authors have considered monitoring of batch
processes in a manner similar to classical process quality control, where causes of
variability are defined as common and special causes and the aim is to detect the special
causes that lead to an out-of-control process. Other researchers have treated the problem
as classification of multivariate time series data, where for each batch a set of variables is
observed over the batch production time and the quality of the batch (the class) is known.
One of the original studies in the field of on-line batch monitoring is that of
Nomikos and MacGregor (1994), who proposed Multi-Way Principal Component Analysis
(MPCA) to deal with batch process data. In their approach, independent principal
components are extracted from the observed process data, and process monitoring is
performed using the Hotelling T² statistic. Their approach relies on dimensionality
reduction through MPCA, where some information might be lost, as all the information
from a data set of batches is folded into a few matrices. To incorporate available data about
the quality of observed batches, Nomikos and MacGregor (1995) considered the use of
Multi-Way Partial Least Squares (MPLS). Both MPCA and MPLS assume Gaussian and
stationary process data.
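MPCA is commonly described as unfolding the three-way batch array (batches × variables × time points) into a two-way matrix with one row per batch and then applying ordinary PCA to that matrix. The following is a minimal Python sketch of that idea; the time-major column ordering, the autoscaling, and the SVD route are illustrative assumptions rather than a reproduction of Nomikos and MacGregor's exact procedure, and `unfold_batchwise` and `mpca_scores` are hypothetical names.

```python
import numpy as np

def unfold_batchwise(data):
    """Batch-wise unfolding: (I batches, J variables, K times) -> I x (K*J).

    Each row holds one complete batch; the J variable readings at time 0 come
    first, then those at time 1, and so on (time-major ordering).
    """
    I, J, K = data.shape
    return data.transpose(0, 2, 1).reshape(I, K * J)

def mpca_scores(data, n_components=2):
    """Unfold, autoscale each column, and return PCA scores and loadings."""
    X = unfold_batchwise(np.asarray(data, dtype=float))
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt = principal directions
    loadings = Vt[:n_components]
    return X @ loadings.T, loadings

rng = np.random.default_rng(7)
batches = rng.standard_normal((20, 3, 50))  # hypothetical: 20 batches, 3 sensors, 50 time points
scores, loadings = mpca_scores(batches, n_components=2)
print(unfold_batchwise(batches).shape, scores.shape, loadings.shape)
```

Each batch ends up as a single point in the low-dimensional score space, which is where the T² statistic is then computed; this is also where the information loss mentioned above occurs, since only the retained components are monitored.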
Two assumptions were made when these approaches were developed. The first
concerns the validity of the approach: it is valid only if the reference data set is
representative of the process operation. In addition, if something changes in the process, a
new reference data set reflecting the change must be built and the method reapplied. The
second assumption is that the events one wishes to detect must be observable; an event that
does not affect the measurements cannot be detected by any monitoring procedure.
PCA is a statistical procedure that transforms a set of correlated observations into
uncorrelated variables. As Figure 2.1 illustrates, the data are first shifted so that they are
centred, and new axes are then built that eliminate the correlation between the variables.
Each new axis is called a principal component. The number of principal components
retained for the analysis is equal to or less than the number of variables; the retained
components are those that account for the largest share of the variability in the process.
Figure 2.1: Principal components for 2-dimensions (Montgomery, 2013).
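The three PCA steps described above (centring, building new axes, projecting) can be sketched in a few lines of Python for two correlated variables, as in Figure 2.1. The simulated data and the use of an eigendecomposition of the sample covariance matrix are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated variables (hypothetical data)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=500)
X = np.column_stack([x1, x2])

# Step 1: shift the data so that each variable is centred at zero
Xc = X - X.mean(axis=0)
# Step 2: the new axes are the eigenvectors of the sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]  # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
# Step 3: project the data onto the new axes (principal component scores)
scores = Xc @ eigvecs

explained = eigvals / eigvals.sum()
print("share of variance per component:", np.round(explained, 3))
print("correlation between scores:", np.round(np.corrcoef(scores, rowvar=False)[0, 1], 6))
```

The first component carries most of the variance, and the scores on the new axes are uncorrelated, which is the property that PCA-based monitoring exploits.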
To overcome the limitations of the MPCA and MPLS approaches, Yoo et
al. (2004) proposed an approach based on Independent Component Analysis (ICA), in which
stationarity and normality of the data are not required. Garcia-Munoz, Kourti, and
MacGregor (2004) used the missing-data option offered by Nomikos and MacGregor (1995),
because of its accuracy in estimating the data of a newly observed batch, together with
some updates to the distance equation of Hotelling's statistic.
Recent studies have addressed on-line batch monitoring using techniques that differ
from those discussed above. In one of them, Le Zhou, Chen and Song (2015) proposed an
approach based on a Recursive Gaussian Process (RGP) regression model. This approach is
intended for processes with very long batches, in which many observations are collected;
for such processes, the amount of batch data available at the beginning of monitoring is very
limited. A GP model is used at the initial stage of monitoring to build a relation between the
data variables, and the proposed technique is then used to monitor new batches. However,
this approach applies to a single response only.
For some complex processes, the collected data do not follow a multivariate
Gaussian distribution. Chen and Zhang (2010) used a Gaussian Mixture Model (GMM) to
improve on-line monitoring of batch processes and to estimate the probability density
function (pdf). Their results showed that this approach reduces the percentage of false
alarms. A limitation of this approach is that it becomes computationally impractical when
the dimensionality is high.
A subspace-aided approach was constructed by Yin et al. (2011) to deal with
processes that show dynamic behaviour and random disturbances. However, because it
relies on the normalization step, a basic procedure of multivariate statistical process
control, this approach cannot be implemented when process variables have a wide operating range.
Spatio-temporal statistics can be used here, since the observed data come from
batch processes. In spatio-temporal statistics, the interest is in where and when the data
were collected, in order to check whether the location or the time of data collection has a
significant effect. GP is a spatial stochastic process that has been developed and used
extensively in the literature as a prediction and interpolation tool (Alshraideh and del
Castillo, 2013; Chen and Zhang, 2010; Le Zhou et al., 2015). GP models were used by
Banerjee et al. (2004) for spatio-temporal problems, namely for analyzing geostatistical
data sets.
Batch processes can be monitored using a GP model, since the collected data are
autocorrelated. Alshraideh and Khatatbeh (2014) recently proposed a control chart for
monitoring univariate correlated process data. Their approach assumes a continuous
production process in which a variable of interest is observed over time, and the goal is to
detect any deviations in the process due to assignable causes. The method is based on the
GP model: after the distance matrix is constructed, Maximum Likelihood Estimation (MLE)
is applied to estimate the model parameters, and the response at the next time point, t+1, is
then predicted. To build a control chart, the upper and lower limits (µ ± Lσ) are determined
from the model residuals.
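The residual-based chart described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Alshraideh and Khatatbeh's implementation: it uses a zero-mean GP with a squared exponential covariance, estimates only the length-scale by maximizing the likelihood over a small grid (variance and noise fixed), predicts each point from a moving window of the previous 30 observations, and sets the limits at the residual mean ± Lσ with L = 3. All function names, the window size, and the simulated profiles are hypothetical.

```python
import numpy as np

def se_kernel(t1, t2, length_scale, variance):
    """Squared exponential covariance between two sets of time points."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def neg_log_likelihood(y, t, length_scale, variance, noise):
    """Negative log marginal likelihood of a zero-mean GP with Gaussian noise."""
    K = se_kernel(t, t, length_scale, variance) + noise * np.eye(t.size)
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # K^{-1} y
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(chol))) + 0.5 * t.size * np.log(2 * np.pi)

def fit_length_scale(y, t, grid, variance, noise):
    """Crude MLE: pick the length-scale on a grid; variance and noise stay fixed."""
    nll = [neg_log_likelihood(y, t, ls, variance, noise) for ls in grid]
    return grid[int(np.argmin(nll))]

def one_step_residuals(y, t, length_scale, variance, noise, window=30):
    """Residual y[i] - prediction, predicting y[i] from the previous `window` points."""
    res = []
    for i in range(1, y.size):
        lo = max(0, i - window)
        tp, yp = t[lo:i], y[lo:i]
        K = se_kernel(tp, tp, length_scale, variance) + noise * np.eye(tp.size)
        k_star = se_kernel(t[i:i + 1], tp, length_scale, variance)
        pred = (k_star @ np.linalg.solve(K, yp))[0]  # GP posterior mean at t[i]
        res.append(y[i] - pred)
    return np.array(res)

rng = np.random.default_rng(3)
t = np.arange(200, dtype=float)
variance, noise = 1.0, 0.01

# Phase I: in-control training profile (hypothetical data)
train = np.sin(0.2 * t) + 0.1 * rng.standard_normal(t.size)
ls = fit_length_scale(train[:100], t[:100], np.array([2.0, 5.0, 10.0]), variance, noise)
r = one_step_residuals(train, t, ls, variance, noise)
L = 3.0
center, sigma = r.mean(), r.std(ddof=1)
lcl, ucl = center - L * sigma, center + L * sigma

# Phase II: new profile with a hypothetical step shift of +1 from time 150 on
new = np.sin(0.2 * t) + 0.1 * rng.standard_normal(t.size)
new[150:] += 1.0
r_new = one_step_residuals(new, t, ls, variance, noise)
alarm_times = np.flatnonzero((r_new < lcl) | (r_new > ucl)) + 1  # residual k belongs to time k + 1
print("limits:", round(lcl, 3), round(ucl, 3), "| alarms at:", alarm_times)
```

In this example the one-step-ahead residual at time 150 falls outside the limits, because the prediction is based only on the unshifted history, while the smooth in-control variation is largely absorbed by the GP prediction. The moving window keeps each prediction cheap, which matters for on-line use.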
A detailed review of time series classification techniques is provided by Baydogan
(2012). Such techniques include Nearest Neighbor (NN) classifiers, which classify an object
by comparing its profile with the closest training profile in the feature space. Several
problems commonly arise when analyzing time series data sets, and time series alignment is
one of them. In the classification process, the distance between series may be computed
point by point with the Euclidean distance or after the series are aligned. Figure 2.2 shows
this difference: in (a) two time series are compared using the Euclidean distance, while in
(b) they are aligned using the Dynamic Time Warping (DTW) technique before their
similarity is assessed. Time series data can be aligned by selecting a single profile as a
reference to which the other profiles are aligned. One approach that can be used for
alignment is DTW, which measures the similarity of time series independently of certain
non-linear variations in the time dimension.
Figure 2.2: The upper and lower bold lines indicate two time series, and the small lines
between them indicate the distance: (a) Euclidean, (b) alignment by DTW (Baydogan,
2012).
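The DTW idea can be sketched with the classic dynamic-programming recursion. This minimal Python version uses the absolute difference as the local cost and no warping-window constraint; both are assumptions, and practical implementations often add a window or use squared differences. The function names are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # cheapest way to reach (i, j): diagonal match, or repeat a point of either series
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

def euclidean_distance(a, b):
    """Point-by-point distance; requires series of equal length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 1, 2, 1, 0, 0]  # same shape, shifted one step in time
print(euclidean_distance(x, y), dtw_distance(x, y))
```

For these two shifted profiles the Euclidean distance is 2.0 while the DTW distance is 0, because DTW aligns the shifted peak before comparing values.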
These classification methods assume univariate time series data, so multivariate series
must be broken into several univariate series. For multivariate time series data, ensemble
methods have been proposed to solve this issue.
Chapter Three: Gaussian Processes
The approach proposed in this thesis is based on the GP model, so GPs are reviewed
here in some detail. This chapter is organized as follows: an introduction to GPs,
covariance functions, dependent GPs, and GP applications.
A GP is a well-known class of probability distributions over functions. GPs have long
been used as a prediction and interpolation method, especially for time series prediction
(Osborne and Roberts, 2007), and have been used in geostatistics since the 1970s. Kriging is
the term used for prediction in geostatistics, where the inputs to the process are two or
three spatial dimensions.
3.1 Gaussian Processes Definition
Many books and articles have defined and described GPs. When observations come
from a continuous domain (time or space), the process is a GP if each point is associated
with a normally distributed random variable and every finite collection of those random
variables has a multivariate normal (Gaussian) distribution. The distribution of a GP is the
joint distribution of all those random variables, so a GP can be seen as an infinite-dimensional
generalization of the multivariate normal distribution.
Formally, a stochastic process {Xt, t ∈ T} is a GP if, for any n distinct values
t1, …, tn ∈ T, the random vector X = (Xt1, …, Xtn)′ has a multivariate normal distribution:
$$\mathbf{X} \sim \mathrm{MVN}\big(\mu(t),\, k(x_i, x_j)\big) \qquad (3.1)$$
where μ(t) = E[Xt] is the mean function, which gives an n × 1 mean vector, and k(xi, xj) is
the covariance function, which gives the n × n covariance matrix Σ. A GP is fully
determined by its mean and covariance functions. A GP generalizes the Gaussian
distribution; a short comparison between the two is given in Table 3.1. The Gaussian pdf
of the vector X is:
$$f_{\mathbf{X}}(\mathbf{x}) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) \qquad (3.2)$$
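Because a GP is only ever evaluated at a finite set of inputs, working with it reduces to the multivariate normal in Eq. (3.2). The sketch below evaluates the logarithm of Eq. (3.2) through a Cholesky factor and draws sample functions from a zero-mean GP. The squared exponential covariance with unit length-scale (introduced in Section 3.2), the small diagonal "jitter" added for numerical stability, and the function names are assumptions for illustration.

```python
import numpy as np

def mvn_logpdf(x, mu, cov):
    """Logarithm of the Gaussian pdf in Eq. (3.2), computed via a Cholesky factor."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = x.size
    chol = np.linalg.cholesky(cov)                 # cov = chol @ chol.T
    z = np.linalg.solve(chol, x - mu)              # z'z = (x - mu)' cov^{-1} (x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))  # log |cov|
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)

def sample_gp_prior(t, mean_fn, cov_fn, n_samples=3, jitter=1e-6, seed=0):
    """Draw sample functions of a GP evaluated at the finite inputs t."""
    mu = mean_fn(t)
    K = cov_fn(t[:, None], t[None, :]) + jitter * np.eye(t.size)
    chol = np.linalg.cholesky(K)
    rng = np.random.default_rng(seed)
    # mu + chol @ N(0, I) has covariance chol @ chol.T = K
    return mu + (chol @ rng.standard_normal((t.size, n_samples))).T

def zero_mean(t):
    return np.zeros_like(t)

def se_kernel(a, b, length_scale=1.0):
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)

t = np.linspace(0.0, 10.0, 25)
samples = sample_gp_prior(t, zero_mean, se_kernel)  # 3 sample paths, shape (3, 25)
K = se_kernel(t[:, None], t[None, :]) + 1e-6 * np.eye(t.size)
log_densities = [mvn_logpdf(s, zero_mean(t), K) for s in samples]
```

Working with the Cholesky factor rather than forming Σ⁻¹ and |Σ| directly is the usual choice, because both quantities come out of one stable factorization.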
Table 3.1: Comparison between Gaussian process and Gaussian distribution (Kuss and
Rasmussen, 2006).
Gaussian Process | Gaussian Distribution
Distribution over functions | Distribution over vectors
Defined by a mean function and a covariance function: f ∼ GP(μ(t), k(xi, xj)) | Defined by a mean and a covariance: X ∼ MVN(μ, Σ)
Indexed by the argument x of f(x) | Indexed by the position of the random variable xi
The mean and covariance functions must be chosen carefully to define the GP. A zero
mean function is widely used, whereas the covariance function has several common forms,
which are discussed in the next section.
3.2 Covariance Functions
Covariance functions play a significant role in defining the process behaviour. As
described above, the mean function is set to zero most of the time; in this case, the
covariance function fully determines the process behaviour.
One important process property is stationarity, meaning that the process mean and
covariance do not change over time; it is one of the key concepts in time series analysis.
Process isotropy, uniformity in all directions, can also be defined through the covariance
function: the process is isotropic when the covariance function is a function only of
||x − x′||, the distance between x and x′. A process is homogeneous if it is both stationary
and isotropic (Grimmett and Stirzaker, 2001). The covariance function also defines the
smoothness and periodicity of the process (Barber, 2012).
A Gaussian time series is stationary if:
1. μ(t) = E[Xt] = μ is independent of t.
2. k(t+h, t) = Cov(Xt+h, Xt) is independent of t for all h.
Equivalently, a GP is stationary if:
1. Xt ∼ N(μ, k(0)) for all t.
2. The covariance matrix of (Xt+h, Xt)′ is, for all t and h:
$$\begin{bmatrix} k(0) & k(h) \\ k(h) & k(0) \end{bmatrix}$$
The covariance function builds the covariance matrix: each entry kij of the covariance
matrix is given by the covariance function k(xi, xj), which characterizes the correlation
between the data points:
$$k_{ij} = k(x_i, x_j) \qquad (3.3)$$
The properties of an autocovariance function k(·) are:
1. k(0) ≥ 0.
2. |k(h)| ≤ k(0) for all h.
3. k(h) = k(-h).
4. The covariance matrix must be positive semi-definite.
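These properties, together with stationarity condition 2 above, can be checked numerically for a concrete covariance matrix. The sketch below builds K from Eq. (3.3) using the squared exponential function of Eq. (3.6) with an arbitrarily chosen length-scale of 2; it is an illustration only, and `se_cov` is a hypothetical helper name.

```python
import numpy as np

def se_cov(h, length_scale=2.0, variance=1.0):
    """Squared exponential autocovariance k(h), a function of the lag h only."""
    h = np.asarray(h, dtype=float)
    return variance * np.exp(-0.5 * (h / length_scale) ** 2)

t = np.arange(10, dtype=float)
K = se_cov(t[:, None] - t[None, :])  # k_ij = k(x_i, x_j), Eq. (3.3)

lags = np.arange(-9, 10)
k0 = se_cov(0.0)
checks = {
    "1. k(0) >= 0": bool(k0 >= 0),
    "2. |k(h)| <= k(0)": bool(np.all(np.abs(se_cov(lags)) <= k0)),
    "3. k(h) = k(-h)": bool(np.allclose(se_cov(lags), se_cov(-lags))),
    "4. K positive semi-definite": bool(np.linalg.eigvalsh(K).min() > -1e-10),
    "stationary (constant diagonals)": bool(np.allclose(K[1:, 1:], K[:-1, :-1])),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Because this kernel depends only on the lag, K is constant along each diagonal (a Toeplitz matrix), which is the matrix form of stationarity condition 2. The small negative tolerance in the eigenvalue check only absorbs floating-point rounding.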
Covariance functions take several forms (Williams and Rasmussen, 2006); the most
commonly used ones are briefly described below:
1. Constant covariance function:
$$k(x, x') = \sigma_0^2 \qquad (3.4)$$
The constant covariance function is stationary and degenerate (a covariance function is
degenerate when it has only a finite number of non-zero eigenvalues).
2. Linear covariance function:
$$k(x, x') = \sigma_0^2\, x\, x' \qquad (3.5)$$
This function is based on a simple linear regression model, f(x) = βx with
β ∼ N(0, σ0²). It is a nonstationary and degenerate function.
3. Squared exponential (SE) covariance function:
$$k(x, x') = \exp\left(-\frac{r^2}{2\ell^2}\right) \tag{3.6}$$
where:
- ℓ: the characteristic length-scale.
- r: the distance between x and x′, $r = \lVert x - x' \rVert$.
This is one of the most widely used covariance functions. It is stationary and isotropic, and it is infinitely differentiable, meaning that a GP with this kernel has mean-square derivatives of all orders and is therefore very smooth. It corresponds to projecting the input data into an infinite-dimensional feature space (Snelson, 2007).
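A short sketch of how the length-scale ℓ in Eq. (3.6) sets the distance over which outputs stay correlated; the chosen length-scales and distances are arbitrary illustrative values.

```python
import math


def squared_exponential(r, length_scale):
    """Eq. (3.6): k(r) = exp(-r^2 / (2 * l^2)) with r = ||x - x'||."""
    return math.exp(-(r ** 2) / (2.0 * length_scale ** 2))


# Correlation between two inputs at distance r, for three length-scales.
for ell in (0.5, 1.0, 2.0):
    row = [round(squared_exponential(r, ell), 4) for r in (0.0, 0.5, 1.0, 2.0)]
    print(f"length-scale {ell}: {row}")
```

At r = ℓ the correlation is always exp(−1/2) ≈ 0.61, whatever ℓ is, so ℓ acts as the distance unit of the kernel.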
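4. Matérn class covariance function:
$$k(x, x') = \frac{1}{2^{\nu-1}\Gamma(\nu)}\left(\frac{\sqrt{2\nu}}{\ell}r\right)^{\nu} K_{\nu}\left(\frac{\sqrt{2\nu}}{\ell}r\right) \tag{3.7}$$
where:
- Γ(ν): the gamma function evaluated at ν.
- $K_\nu$: the modified Bessel function of the second kind of order ν.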
The order ν controls the roughness of the random functions. This kernel is used in many applications, such as geostatistics and spatial statistics, to define the statistical covariance between measurements made at two points. It is stationary, since it depends only on the distance between the points, and it is isotropic if that distance is Euclidean.
The Matérn kernel simplifies when ν is a half-integer, ν = p + 1/2 with p a non-negative integer; the kernel is then the product of an exponential and a polynomial of order p.
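For half-integer ν = p + 1/2, the Bessel-function form of Eq. (3.7) reduces to a closed form. The sketch below uses that standard closed-form expression instead of evaluating $K_\nu$ directly; p = 0 reproduces the exponential kernel of Eq. (3.10), and p = 1, 2 give the commonly used ν = 3/2 and ν = 5/2 kernels.

```python
import math


def matern_half_integer(r, length_scale, p):
    """Matérn kernel of Eq. (3.7) for nu = p + 1/2 (p = 0, 1, 2, ...):
    an exponential times a polynomial of order p in r."""
    nu = p + 0.5
    s = r / length_scale
    poly = sum(
        math.factorial(p + i) / (math.factorial(i) * math.factorial(p - i))
        * (math.sqrt(8.0 * nu) * s) ** (p - i)
        for i in range(p + 1)
    )
    return math.exp(-math.sqrt(2.0 * nu) * s) * math.factorial(p) / math.factorial(2 * p) * poly


for p in (0, 1, 2):
    values = [round(matern_half_integer(r, 1.0, p), 4) for r in (0.0, 0.5, 1.0, 2.0)]
    print(f"nu = {p}.5: {values}")
```

Larger p gives a slower initial decay near r = 0, which corresponds to smoother sample functions.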
5. Rational quadratic covariance function:
$$k(x, x') = \left(1 + \frac{r^2}{2\alpha\ell^2}\right)^{-\alpha} \tag{3.8}$$
With α, ℓ > 0, this kernel can be seen as a scale mixture of SE kernels with different characteristic length-scales. It is used frequently in spatial statistics and image analysis. It is stationary and, like the Matérn class, isotropic if the distance is Euclidean.
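The scale-mixture interpretation implies that the rational quadratic kernel approaches the SE kernel with the same ℓ as α → ∞. This sketch checks that numerically at one distance; the values of r, ℓ, and α are illustrative.

```python
import math


def rational_quadratic(r, length_scale, alpha):
    """Eq. (3.8): k(r) = (1 + r^2 / (2 * alpha * l^2))^(-alpha)."""
    return (1.0 + r ** 2 / (2.0 * alpha * length_scale ** 2)) ** (-alpha)


def squared_exponential(r, length_scale):
    """Eq. (3.6)."""
    return math.exp(-(r ** 2) / (2.0 * length_scale ** 2))


r, ell = 1.5, 1.0
for alpha in (0.5, 2.0, 10.0, 1000.0):
    gap = abs(rational_quadratic(r, ell, alpha) - squared_exponential(r, ell))
    print(f"alpha = {alpha:>7}: |RQ - SE| = {gap:.2e}")
```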
6. Polynomial covariance function:
$$k(x, x') = (x \cdot x' + \sigma_0^2)^p \tag{3.9}$$
This function is not preferred for real-world problems because it implies a high covariance between distant inputs, whereas the opposite is commonly observed: nearby inputs produce outputs that are close to each other. This often leads to inferior prediction performance of polynomial regression compared with other kernels. It is nonstationary and degenerate.
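To make the remark about distant inputs concrete, this sketch evaluates the polynomial kernel of Eq. (3.9) (scalar inputs, with σ0² = 1 and p = 2 assumed) and the SE kernel for a fixed input x = 1 and increasingly distant x′.

```python
import math


def polynomial_kernel(x, x_prime, sigma0_sq=1.0, p=2):
    """Eq. (3.9) for scalar inputs: k(x, x') = (x * x' + sigma0^2)^p."""
    return (x * x_prime + sigma0_sq) ** p


def squared_exponential(x, x_prime, length_scale=1.0):
    """Eq. (3.6) for scalar inputs."""
    return math.exp(-(x - x_prime) ** 2 / (2.0 * length_scale ** 2))


x = 1.0
for xp in (1.0, 2.0, 5.0, 10.0):
    print(f"x' = {xp:>4}: polynomial = {polynomial_kernel(x, xp):8.1f}, "
          f"SE = {squared_exponential(x, xp):.2e}")
```

The polynomial covariance grows as x′ moves away from x, while the SE covariance decays towards zero.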
7. Exponential covariance function:
$$k(x, x') = \exp\left(-\frac{r}{\ell}\right) \tag{3.10}$$
Setting ν = 1/2 in the Matérn class gives the exponential covariance function. It is stationary and nondegenerate.
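8. γ-exponential covariance function:
$$k(x, x') = \exp\left(-\left(\frac{r}{\ell}\right)^{\gamma}\right) \tag{3.11}$$
Both the exponential and SE kernels are special cases of the γ-exponential kernel, obtained with γ = 1 and γ = 2, respectively (the latter with length-scale ℓ/√2 in Eq. 3.6); the kernel is valid for 0 < γ ≤ 2. This function has a similar number of parameters to the Matérn class. Like the exponential function, it is stationary and nondegenerate.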
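A numerical check of the two special cases: γ = 1 gives Eq. (3.10), and γ = 2 gives the SE kernel of Eq. (3.6) with length-scale ℓ/√2, since exp(−(r/ℓ)²) = exp(−r²/(2(ℓ/√2)²)). The argument check enforces the usual validity range 0 < γ ≤ 2; r and ℓ are illustrative values.

```python
import math


def gamma_exponential(r, length_scale, gamma):
    """Eq. (3.11): k(r) = exp(-(r / l)^gamma); a valid kernel for 0 < gamma <= 2."""
    if not 0.0 < gamma <= 2.0:
        raise ValueError("gamma must lie in (0, 2]")
    return math.exp(-((r / length_scale) ** gamma))


def squared_exponential(r, length_scale):
    """Eq. (3.6)."""
    return math.exp(-(r ** 2) / (2.0 * length_scale ** 2))


r, ell = 0.8, 1.2
print(gamma_exponential(r, ell, 1.0), math.exp(-r / ell))                          # exponential kernel
print(gamma_exponential(r, ell, 2.0), squared_exponential(r, ell / math.sqrt(2)))  # SE with l / sqrt(2)
```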
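Table 3.2 summarizes these covariance functions.
Table 3.2: Summary of several commonly used covariance functions.

| Kernel | Expression | Properties |
|---|---|---|
| Constant | $\sigma_0^2$ | Stationary, degenerate |
| Linear | $\sigma_0^2 x x'$ | Nonstationary, degenerate |
| Squared exponential | $\exp\left(-\frac{r^2}{2\ell^2}\right)$ | Stationary, nondegenerate |
| Matérn class | $\frac{1}{2^{\nu-1}\Gamma(\nu)}\left(\frac{\sqrt{2\nu}}{\ell}r\right)^{\nu} K_{\nu}\left(\frac{\sqrt{2\nu}}{\ell}r\right)$ | Stationary, nondegenerate |
| Polynomial | $(x \cdot x' + \sigma_0^2)^p$ | Nonstationary, degenerate |
| Exponential | $\exp\left(-\frac{r}{\ell}\right)$ | Stationary, nondegenerate |
| γ-exponential | $\exp\left(-\left(\frac{r}{\ell}\right)^{\gamma}\right)$ | Stationary, nondegenerate |
| Rational quadratic | $\left(1 + \frac{r^2}{2\alpha\ell^2}\right)^{-\alpha}$ | Stationary, nondegenerate |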
Financial market prediction is one application of GPs. Figure 3.1a shows the November 2010 closing prices of the Standard & Poor's 500 (S&P) index as points, and Figure 3.1b shows the GP prediction for these observations, assuming a zero mean and a Matérn covariance function.
Figure 3.1: (a, left) November 2010 closing prices of the S&P 500; (b, right) GP prediction for the collected data (McDuff, 2010).
3.3 Dependent Gaussian Process
When modelling a process whose observations are dependent over space, time, or both, the GP is one of the preferred and most widely used modelling techniques.
GPs are mostly applied to a single output variable. Multiple outputs are usually handled by building an independent model for each output, as in the multi-kriging method (Boyle & Frean, 2004). Consider the example of two coupled outputs shown in Figure 3.2: output 1 is densely observed, whereas output 2 is only sparsely observed. If the outputs are treated as independent, their similarity cannot be exploited; a dependent model instead predicts output 2 using what is learned from both outputs 1 and 2.
Multiple dependent processes can be handled by inferring convolution kernels instead of covariance functions, which makes it easy to construct the positive definite covariance matrices required for covarying outputs.
Figure 3.2: Dependent outputs. Solid lines represent the model and dotted lines the true functions; the top two panels show an independent model and the remaining panels a dependent model (Boyle and Frean, 2004).
3.4 Gaussian Process Applications
3.4.1 Gaussian Process for Regression and Prediction
The Gaussian process is a powerful regression tool and holds an important place in probability theory. Regression is a general statistical problem. GPs have also been used widely beyond regression and have been extended to classification (Williams and Barber, 1998; Kuss and Rasmussen, 2005).
Consider a data set D consisting of N input vectors $x_1, \dots, x_N$ and continuous outputs $y_1, \dots, y_N$, where the outputs are assumed to be noisy observations of a function f(x). The objective of the regression model is to estimate f(x) from D. A GP regression model is a probabilistic Bayesian model: a GP defines a probability distribution over functions, p(f), which can be used as a Bayesian prior for the regression.
Within a class of kernels, hyperparameters control properties such as the length-scale. The GP expresses a prior belief about the function f(x) being modelled, a noise model links the data to f(x), and regression then becomes a matter of Bayesian inference.
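Because any finite set of training and test observations is jointly Gaussian distributed, the Gaussian process can be used for regression. If $y_1$ and $y_2$ are outputs of a stochastic process with the joint normal distribution
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right) \tag{3.12}$$
where $\Sigma_{22}$ is nonsingular, then the conditional distribution of $y_1$ given $y_2$ is normal with mean
$$\mu_{y_1|y_2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y_2 - \mu_2) \tag{3.13}$$
and covariance matrix
$$\Sigma_{y_1|y_2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \tag{3.14}$$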
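For scalar $y_1$ and $y_2$, Eqs. (3.13) and (3.14) reduce to simple arithmetic. In the illustrative example below (zero means, unit variances, and covariance 0.8, all assumed), observing $y_2 = 1.5$ moves the mean of $y_1$ to 0.8 × 1.5 = 1.2 and shrinks its variance from 1 to 1 − 0.8² = 0.36.

```python
def condition_bivariate(mu1, mu2, s11, s12, s22, y2):
    """Eqs. (3.13)-(3.14) for scalar y1, y2 (Sigma_21 = Sigma_12): distribution of y1 given y2."""
    if s22 <= 0.0:
        raise ValueError("Sigma_22 must be nonsingular (positive for a scalar)")
    cond_mean = mu1 + s12 / s22 * (y2 - mu2)
    cond_var = s11 - s12 * s12 / s22
    return cond_mean, cond_var


mean, var = condition_bivariate(0.0, 0.0, 1.0, 0.8, 1.0, 1.5)
print(mean, var)
```

When $\Sigma_{12} = 0$ the observation of $y_2$ carries no information and the conditional distribution equals the marginal one.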
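For a new observation $y_*$ to be predicted from the observed outputs $\mathbf{y}$, the joint distribution is
$$\begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix} \sim N_{n+1}\left(\begin{bmatrix} \boldsymbol{\mu} \\ \mu \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \sigma^2 \end{bmatrix}\right) \tag{3.15}$$
so that Eqs. (3.13) and (3.14), with the roles of the two blocks exchanged, give the predictive mean and variance of $y_*$. With n training observations and n test observations, the n × n cross-covariance matrix is obtained by evaluating the covariance function at all pairs of test and training observations. The value of $y_*$ can then be determined from the test locations (X′), the training outputs y, the training inputs X, and the covariance function.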
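The following Python sketch applies Eqs. (3.13) and (3.14) to predict a new output from training data. It assumes a zero mean function, the SE kernel with unit signal variance, and a small noise variance (`noise_var`) added to the diagonal for numerical stability; these choices and all helper names are ours, not the thesis's.

```python
import math


def se_kernel(x, x_prime, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance (Eq. 3.6) scaled by a signal variance."""
    return signal_var * math.exp(-(x - x_prime) ** 2 / (2.0 * length_scale ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, y, x_star, noise_var=1e-4, **kernel_args):
    """Predictive mean and variance at x_star for a zero-mean GP (Eqs. 3.13-3.14)."""
    K = [[se_kernel(a, b, **kernel_args) + (noise_var if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    k_star = [se_kernel(x_star, b, **kernel_args) for b in X]  # cross-covariances with y_*
    L = cholesky(K)
    alpha = solve_cholesky(L, y)   # K^{-1} (y - 0)
    v = solve_cholesky(L, k_star)  # K^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = se_kernel(x_star, x_star, **kernel_args) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var


X = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [math.sin(x) for x in X]
print(gp_predict(X, y, 2.5))
```

Near the training inputs the predictive variance is close to `noise_var`; far from them the prediction reverts to the prior mean 0 with variance 1.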
3.4.2 Gaussian Process for Classification
GPs have also been applied to classification problems. GP classification is an effective nonparametric classification technique based on a Bayesian methodology; the technique originated in the geostatistics field. In a classification problem the outputs are discrete, for example binary, $y = \pm 1$. The classification model places a GP prior $p(f)$ on a latent function $f(x)$ and sets
$$p(y = +1 \mid f(x)) = \sigma(f(x)) \tag{3.16}$$
where σ is a sigmoid transformation; for $y_i \in \{-1, 1\}$ and a symmetric sigmoid this gives $p(y_i \mid f(x_i)) = \sigma(y_i f(x_i))$. The marginal likelihood is the integral $\int p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f} \mid X, \theta)\, d\mathbf{f}$, whose integrand is a product of sigmoids multiplied by a Gaussian; this integral is therefore analytically intractable. In the regression task, by contrast, the likelihood is Gaussian, which makes the integration tractable. Consequently, the posterior cannot be found analytically, and an approximation must be employed to obtain an approximate posterior (Kuss and Rasmussen, 2006).
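Even for a single test point, the predictive class probability $\int \sigma(f)\, N(f \mid \mu, s^2)\, df$ has no elementary closed form. The sketch below evaluates it by numerical quadrature and compares it with a well-known closed-form approximation, $\sigma(\mu / \sqrt{1 + \pi s^2 / 8})$; this approximation is introduced here for illustration and is not necessarily one of the methods assessed by Kuss and Rasmussen (2006). A logistic sigmoid is assumed.

```python
import math


def sigmoid(f):
    """Numerically stable logistic sigmoid."""
    if f >= 0.0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)
    return z / (1.0 + z)


def predictive_prob_numeric(mu, var, n=4001, width=10.0):
    """E[sigmoid(f)] for f ~ N(mu, var) by the trapezoidal rule on mu +/- width * sd."""
    sd = math.sqrt(var)
    a = mu - width * sd
    h = 2.0 * width * sd / (n - 1)
    total = 0.0
    for i in range(n):
        f = a + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        density = math.exp(-0.5 * ((f - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * sigmoid(f) * density
    return total * h


def predictive_prob_probit_approx(mu, var):
    """Closed-form approximation sigmoid(mu / sqrt(1 + pi * var / 8))."""
    return sigmoid(mu / math.sqrt(1.0 + math.pi * var / 8.0))


print(predictive_prob_numeric(1.0, 4.0), predictive_prob_probit_approx(1.0, 4.0))
```

Latent uncertainty pulls the class probability towards 0.5: with μ = 1 and s² = 4 both values lie well below σ(1).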
3.5 Gaussian Process Fitting
A GP model can be fitted by several methods, two common ones being maximum likelihood estimation (MLE) and the variogram method. After a covariance function appropriate for the collected data and the process properties has been selected, the GP model is fitted to estimate the covariance function parameters and the mean of the observations. The most common method for this task is MLE, which requires a mathematical expression of the data, the likelihood function, to be specified at the beginning of the estimation process:
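$$L(\theta; x_1, \dots, x_n) = f(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \tag{3.17}$$
The parameter estimates are obtained by maximizing this function. The product form on the right holds for independent observations; for a GP, whose observations are correlated, $f(x_1, \dots, x_n \mid \theta)$ is the joint multivariate normal density. The variogram method, commonly used for spatial statistics problems, fits a theoretical variogram to an isotropic experimental variogram by least squares.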
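A minimal sketch of maximum likelihood fitting for a GP, assuming a zero mean, the SE kernel with unit signal variance, and a fixed noise variance of 0.01. Because GP observations are correlated, it evaluates the joint multivariate normal log-density rather than a product of marginals, and it maximizes over a coarse grid of length-scales instead of using a numerical optimizer (a simplification on our part).

```python
import math


def log_likelihood(X, y, length_scale, signal_var=1.0, noise_var=1e-2):
    """Log of the joint normal density f(y | theta) of a zero-mean GP with an SE kernel."""
    n = len(X)
    K = [[signal_var * math.exp(-(a - b) ** 2 / (2.0 * length_scale ** 2))
          + (noise_var if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    # Cholesky factor, K = L L^T.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution z = L^{-1} y, so that y^T K^{-1} y = z^T z.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * sum(zi * zi for zi in z) - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


X = [0.5 * i for i in range(21)]
y = [math.sin(x) for x in X]
grid = [0.1, 0.3, 1.0, 3.0, 10.0]
best = max(grid, key=lambda ell: log_likelihood(X, y, ell))
print("length-scale maximizing the likelihood on the grid:", best)
```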
For more details about Gaussian processes, the following references are useful: (Banerjee et al., 2004; Von, 2006; Fang et al., 2006; Davis, 2014; Abrahamsen, 1997; Rasmussen and Williams, 2006).
Chapter Four: Research Methodology
This chapter describes in detail the methodology followed to develop the new multivariate process control approach.
Establishing a control chart involves two main phases. In Phase I, the chart control limits are specified using training data, which must be in control according to the process specifications. Phase II starts once the control chart from Phase I is ready to be implemented; the data sets used in this phase to check the performance of the approach contain both in-control and out-of-control observations. Figure 4.1 shows the methodology flow chart. The proposed approach is coded as a set of MATLAB m-files.
4.1 Data Collection
Three data sets are used to test the power of the new approach: the first is a simulated data set of three uncorrelated time series generated in MATLAB; the second was collected from a web-based program that simulates fed-batch penicillin production; and the third is a real-life data set collected during the etching process at a semiconductor fabrication institute.
4.2 Modeling Approach
The proposed approach consists of the following five main steps:
Step 1: Fitting a Gaussian Process model for phase I data
In this step, a GP model is fitted to the observations of each sensor using the in-control (training) data, and the mean and covariance structure are predicted using Equations 3.13 and 3.14, respectively. The Euclidean distance in the time domain is computed between every pair of observations, and the distances are gathered into a symmetric matrix with zeros on the diagonal. Next, the variogram function is used to compute the spatial correlation between observations.
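Step 1 does not state which variogram estimator is used. As an assumption, the sketch below builds the symmetric time-distance matrix described above and computes the classical (Matheron) empirical semivariogram, $\hat{\gamma}(h) = \sum (y_i - y_j)^2 / (2N(h))$ over the $N(h)$ pairs at lag h, for equally spaced sampling times.

```python
def distance_matrix(times):
    """Symmetric matrix of Euclidean distances in time, with zeros on the diagonal."""
    return [[abs(ti - tj) for tj in times] for ti in times]


def empirical_semivariogram(times, values, lags, tol=1e-9):
    """Matheron estimator: gamma(h) = sum((y_i - y_j)^2) / (2 N(h)) over pairs at distance h."""
    D = distance_matrix(times)
    n = len(times)
    result = {}
    for h in lags:
        squared_diffs = [(values[i] - values[j]) ** 2
                         for i in range(n) for j in range(i + 1, n)
                         if abs(D[i][j] - h) < tol]
        result[h] = (sum(squared_diffs) / (2 * len(squared_diffs))
                     if squared_diffs else float("nan"))
    return result


times = [float(t) for t in range(10)]
values = [float(t % 2) for t in range(10)]  # alternating 0, 1, 0, 1, ...
print(empirical_semivariogram(times, values, [1.0, 2.0, 3.0]))
```

A theoretical variogram would then be fitted to these empirical values by least squares, as described in Section 3.5.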
Step 2: Obtaining the residuals for the training data
Using the mean and covariance structure obtained in Step 1, the residuals R of the training data are obtained as
$$R = Y - \hat{Y} \tag{4.1}$$
where Y is the observed data point and $\hat{Y}$ is the observation predicted from the fitted mean and covariance structures.
Step 3: Find the control limits
The residuals obtained in the previous step are used to obtain the control limits, for a given type I error rate α, through simulation. The mean and covariance of the training-data residuals are calculated, and the UCL is found by sampling such that the type I error rate equals α; the LCL is zero. Simulation is needed because the exact distribution of the residuals is unknown.
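A sketch of obtaining the UCL by simulation: the UCL is taken as the empirical (1 − α) quantile of T² values computed on simulated in-control residuals. For simplicity it assumes three sensors whose residuals have already been whitened (Σ = I), so T² is a sum of squared standard normal draws; the seed and number of draws are arbitrary.

```python
import random


def empirical_ucl(t2_values, alpha):
    """UCL = empirical (1 - alpha) quantile of in-control T^2 values (the LCL is 0)."""
    ordered = sorted(t2_values)
    k = min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))
    return ordered[k]


def simulate_in_control_t2(n_sensors, n_draws, seed=1):
    """T^2 = R Sigma^{-1} R' for simulated residual vectors, assuming the residuals
    have already been whitened (Sigma = I), so T^2 is a sum of squared N(0, 1) draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_sensors)) for _ in range(n_draws)]


t2 = simulate_in_control_t2(n_sensors=3, n_draws=100_000)
ucl = empirical_ucl(t2, alpha=0.05)
print(round(ucl, 3))
```

Under this assumption T² follows a χ² distribution with 3 degrees of freedom, whose exact 0.95 quantile is about 7.815, so the simulated UCL should be close to that value.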
Step 4: Calculate T2 statistic for testing data
For the testing data, the following $T^2$ statistic is calculated at each sampling point:
$$T^2 = R\,\Sigma^{-1}R' \tag{4.2}$$
where:
- $R = Y - \hat{Y}$ is the residual vector at that point, obtained through multivariate normal theory.
- Σ is the covariance matrix of the residuals of all sensors.
Step 5: Compare T2 for test data against the control limits
The T² statistic of the test data is compared with the control limits obtained in Step 3. If T² exceeds the UCL, a signal is issued indicating an out-of-control point and, hence, an out-of-control batch.
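Steps 4 and 5 can be sketched as follows: compute T² = RΣ⁻¹R′ at every sampling point by solving a linear system (rather than inverting Σ explicitly) and flag the batch as out of control as soon as any point exceeds the UCL. The helper names are hypothetical, and the example covariance reuses the between-sensor matrix Σb of Chapter Five only for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def t2_statistic(residual, cov):
    """Eq. (4.2): T^2 = R Sigma^{-1} R' for one sampling point (R is a row vector)."""
    w = solve(cov, residual)  # Sigma^{-1} R'
    return sum(r * wi for r, wi in zip(residual, w))


def batch_is_out_of_control(residuals, cov, ucl):
    """A batch signals if T^2 exceeds the UCL at any sampling point (the LCL is 0)."""
    return any(t2_statistic(r, cov) > ucl for r in residuals)


sigma_b = [[0.95, 0.665, 0.475], [0.665, 0.95, 0.38], [0.475, 0.38, 0.95]]
batch = [[0.1, 0.2, 0.0], [0.3, -0.1, 0.2], [3.0, -3.0, 3.0]]  # one residual row per time point
print([round(t2_statistic(r, sigma_b), 3) for r in batch])
print(batch_is_out_of_control(batch, sigma_b, ucl=7.815))
```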
Figure 4.1: Methodology flow chart.
Chapter Five: Data, Analysis, and Results
This chapter describes in detail the data used to validate the proposed model, together with the analyses performed and the results obtained for each data set. As mentioned before, three data sets were used; they are described in Sections 5.1 through 5.3, respectively.
5.1 Simulated Data Method
A MATLAB simulation code was written to generate data for testing the proposed approach in Phase II. It assumes a process with three time series outputs, which are readings from three different sensors, as shown in Figure 5.1. The functions used are:
$$f_1(x) = 10\left(1 - e^{-0.05x}\right) \tag{5.1}$$
$$f_2(x) = 10\sin(1.8x) \tag{5.2}$$
$$f_3(x) = 10\left(e^{-0.1x} + \sin(x)\right) \tag{5.3}$$
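The three mean profiles of Eqs. (5.1) to (5.3) can be generated as in the sketch below. The time grid x = 1, ..., batch length and the way the shift δ is applied (as a raw offset added to sensor 1) are our assumptions; the errors of Eq. (5.4) would be added on top of these profiles.

```python
import math


def f1(x):
    return 10.0 * (1.0 - math.exp(-0.05 * x))        # Eq. (5.1)


def f2(x):
    return 10.0 * math.sin(1.8 * x)                   # Eq. (5.2)


def f3(x):
    return 10.0 * (math.exp(-0.1 * x) + math.sin(x))  # Eq. (5.3)


def mean_profiles(batch_length, shift=0.0):
    """Noise-free profiles of the three sensors at x = 1, ..., batch_length.
    `shift` is added to sensor 1 only, as a raw mean offset (assumed scaling)."""
    xs = range(1, batch_length + 1)
    return ([f1(x) + shift for x in xs], [f2(x) for x in xs], [f3(x) for x in xs])


profiles = mean_profiles(100, shift=1.0)
print([round(v, 3) for v in profiles[0][:5]])
```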
Figure 5.1: Simulated data profiles: (a) f1(x), (b) f2(x), and (c) f3(x).
The simulation was run 50,000 times for each combination of type I error rate α (0.05, 0.01, and 0.0027), batch length (50 and 100 time units), and shift in the mean of the first sensor (0, 0.5, 1, 1.5, 2, 2.5, and 3), giving a collection of 50,000 time series profiles for each output. Errors were added to the simulated data such that
$$E \sim MVN(0, \Sigma) \tag{5.4}$$
where Σ is the covariance matrix of the added error terms. Σ is assumed to have a Kronecker product structure combining the between-sensor covariance and the within-sensor covariance:
$$\Sigma = \Sigma_b \otimes \Sigma_w \tag{5.5}$$
where $\Sigma_b$ represents the between-sensor covariance and $\Sigma_w$ the within-profile covariance. The following values were assumed for $\Sigma_b$ and $\Sigma_w$:
$$\Sigma_b = \begin{bmatrix} 0.95 & 0.665 & 0.475 \\ 0.665 & 0.95 & 0.38 \\ 0.475 & 0.38 & 0.95 \end{bmatrix}$$
and
$$(\Sigma_w)_{ij} = \psi + \kappa e^{-\phi d_{ij}} \tag{5.6}$$
with ψ = 0.01, κ = 0.05, φ = 10, and $d_{ij}$ the Euclidean distance in time between observations i and j of the same profile.
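A sketch of the error model of Eqs. (5.4) to (5.6) with the stated parameter values. It builds Σw for a given time grid, forms the Kronecker product Σb ⊗ Σw, and draws one error vector using the identity chol(Σb ⊗ Σw) = chol(Σb) ⊗ chol(Σw). The sensor-major ordering of the error vector and the integer time grid are our assumptions.

```python
import math
import random

SIGMA_B = [[0.95, 0.665, 0.475],
           [0.665, 0.95, 0.38],
           [0.475, 0.38, 0.95]]


def within_profile_cov(times, psi=0.01, kappa=0.05, phi=10.0):
    """Eq. (5.6): (Sigma_w)_ij = psi + kappa * exp(-phi * d_ij) with d_ij = |t_i - t_j|."""
    return [[psi + kappa * math.exp(-phi * abs(ti - tj)) for tj in times] for ti in times]


def kron(A, B):
    """Kronecker product A (x) B for matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def cholesky(A):
    """Lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_errors(times, seed=0):
    """One draw of E ~ MVN(0, Sigma_b (x) Sigma_w), Eqs. (5.4)-(5.5),
    using chol(Sigma_b (x) Sigma_w) = chol(Sigma_b) (x) chol(Sigma_w)."""
    L = kron(cholesky(SIGMA_B), cholesky(within_profile_cov(times)))
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    e = [sum(l_ij * z_j for l_ij, z_j in zip(row, z)) for row in L]
    n = len(times)
    return [e[s * n:(s + 1) * n] for s in range(len(SIGMA_B))]  # one error profile per sensor


errors = sample_errors([float(t) for t in range(1, 51)])
print([round(v, 3) for v in errors[0][:5]])
```

Because φ = 10 makes $e^{-\phi d_{ij}}$ negligible for time points one unit apart, Σw is close to ψ times a matrix of ones plus κ on the diagonal.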
The collected data were checked for autocorrelation within the data of each sensor and for correlation between the sensors' data sets. Figure 5.2 shows high autocorrelation within each sensor, and Figure 5.3 shows correlation between the sensors.
Figure 5.2: The autocorrelation charts for the simulated data of each sensor.
Figure 5.3: The correlation matrix between simulated sensors data sets.
After calculating the residuals, we checked their autocorrelation for each sensor. Figure 5.4 shows no autocorrelation in the residuals of any sensor, so the T² chart can be applied to them.
Figure 5.4: The autocorrelation charts for the residuals of each sensor.
The purpose of this data set is to compare the performance of the proposed approach with that of traditional Shewhart control charts. The probability of detecting a shift δ in the mean structure and the Average Run Length (ARL) are summarized in Table 5.1.
Table 5.1: Simulated data results for type I error rate and ARL.

| Shift (δ) | Type I error rate (α) | Batch length 50: probability of detection | Batch length 50: ARL | Batch length 100: probability of detection | Batch length 100: ARL |
|---|---|---|---|---|---|
| 0 | 0.0027 | 0.00364 | 274.7253 | 0.00396 | 252.5253 |
| 0 | 0.01 | 0.01178 | 84.88964 | 0.01276 | 78.36991 |
| 0 | 0.05 | 0.04582 | 21.82453 | 0.03198 | 31.26954 |
| 0.5 | 0.0027 | 0.00768 | 130.2083 | 0.00794 | 125.9446 |
| 0.5 | 0.01 | 0.02214 | 45.16712 | 0.02644 | 37.82148 |
| 0.5 | 0.05 | 0.09828 | 10.17501 | 0.10224 | 9.780908 |
| 1 | 0.0027 | 0.02874 | 34.79471 | 0.0359 | 27.85515 |
| 1 | 0.01 | 0.07774 | 12.86339 | 0.0932 | 10.72961 |
| 1 | 0.05 | 0.24798 | 4.032583 | 0.28222 | 3.543335 |
| 1.5 | 0.0027 | 0.10878 | 9.192866 | 0.12494 | 8.003842 |
| 1.5 | 0.01 | 0.25422 | 3.933601 | 0.2859 | 3.497726 |
| 1.5 | 0.05 | 0.60026 | 1.665945 | 0.67718 | 1.476712 |
| 2 | 0.0027 | 0.3509 | 2.849815 | 0.44996 | 2.22242 |
| 2 | 0.01 | 0.5744 | 1.740947 | 0.69736 | 1.43398 |
| 2 | 0.05 | 0.91858 | 1.088637 | 0.961 | 1.040583 |
| 2.5 | 0.0027 | 0.71908 | 1.390666 | 0.87266 | 1.145922 |
| 2.5 | 0.01 | 0.93074 | 1.074414 | 0.979 | 1.02145 |
| 2.5 | 0.05 | 0.99796 | 1.002044 | 0.9999 | 1.0001 |
| 3 | 0.0027 | 0.9841 | 1.016157 | 0.99446 | 1.005571 |
| 3 | 0.01 | 0.99858 | 1.001422 | 0.99996 | 1.00004 |
| 3 | 0.05 | 1 | 1 | 1 | 1 |
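Table 5.1 reports the ARL alongside the detection probability p, and the reported values coincide with ARL = 1/p, the mean of a geometric run length when each batch signals independently with probability p. The sketch below recomputes a few entries copied from the table (batch length 50).

```python
def average_run_length(prob_signal):
    """ARL = 1 / p, where p is the per-batch probability of a signal (geometric run length)."""
    if not 0.0 < prob_signal <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / prob_signal


# (probability of detection, reported ARL) pairs copied from Table 5.1, batch length 50.
rows = [(0.00364, 274.7253), (0.01178, 84.88964), (0.04582, 21.82453),
        (0.3509, 2.849815), (0.9841, 1.016157)]
for p, reported in rows:
    print(p, round(average_run_length(p), 4), reported)
```

At δ = 0 the "probability of detection" is the empirical false alarm rate, so the corresponding ARL is the in-control ARL.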
Comparing the ARL results of the proposed approach with those of individual control charts, such as the MR chart, which are used to monitor variables data when rational subgroups are impractical, shows no significant difference under shifts in the mean, although the proposed approach produces a somewhat higher false alarm rate. Hence, the approach can be applied to other simulated and real data sets.
5.2 Penicillin Fermentation Process Data
PenSim v2.0, a web-based simulator designed by the Illinois Institute of Technology (http://simulator.iit.edu/web/pensim/simul.html), simulates different process variables during penicillin production, such as heat generation and the CO2, oxygen, and penicillin concentrations. The process consists of two main stages: the preparation and loading stage and the operation stage. Our interest is in checking the batch quality during the second stage. The process flow chart is shown in Figure 5.5.
Figure 5.5: Penicillin fermentation process (Birol et al., 2001).
In Phase I, 50 batches were simulated to collect training data. These batches were in control according to the initial process inputs, which were set at each run as shown in Figure 5.6. The simulation output includes the 16 profiles shown in Figure 5.7.
Figure 5.6: Penicillin production simulator inputs.
Figure 5.7: Penicillin simulation outputs (Birol et al., 2001).
Since our interest is only in whether the batch is in control or out of control, not all 16 profiles are needed to make a decision about the batch. Biomass concentration (g/L), penicillin concentration (g/L), carbon dioxide concentration (g/L), and generated heat (kcal) are the most important output variables for drawing conclusions about batch quality. A batch is in control if all four profiles are within the control limits; if any one of them is out of control, the whole batch is out of control.
After Phase I was completed, a total of 500 batches were collected for Phase II (the testing phase): 100 in-control batches, labelled 1, and 400 out-of-control batches, labelled 0, generated by changing the substrate feed flow rate to 0.041 or 0.0418 L/h. About 300 of the out-of-control batches were collected at a substrate feed flow rate of 0.041 L/h; for this group, the out-of-control decision is easy because these batches differ clearly from normal batches. The other 100 out-of-control batches were collected at a substrate feed flow rate of 0.0418 L/h, which is close to the normal value of 0.0426 L/h; these batches are out of control but close to the in-control region, so they are used to check the accuracy of the approach. Figure 5.8 shows the test data.
Figures 5.9 and 5.10 show, respectively, the autocorrelation charts for the data of each sensor and the correlation matrix between sensors.
Figure 5.8: Penicillin testing data profiles, (a) Biomass concentration (g/L), (b) Penicillin
Concentration (g/L), (c) Carbon Dioxide Concentration (g/L), (d) Generated Heat (kcal).
Figure 5.9: The autocorrelation charts for the penicillin data of each sensor.
Figure 5.10: The correlation matrix between penicillin sensors data sets.
For this data set, the quality of the approach is judged by its accuracy, defined as the proportion of batches whose true label (0 or 1) matches the label assigned by the approach. For example, batch number 50 is out of control, so its true label is 0; after testing, it is labelled 1 if it lies within the control limits and 0 otherwise. The accuracy counts how many batches keep their true labels. The accuracy results for α = 0.05, 0.01, and 0.0027 are listed in Table 5.2.
Table 5.2: Penicillin dataset accuracy results.

| Type I error rate (α) | 0.05 | 0.01 | 0.0027 |
|---|---|---|---|
| Accuracy | 0.9650 | 0.9860 | 0.9860 |
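The accuracy measure used for Table 5.2 is the fraction of batches whose assigned label matches the true label. A minimal sketch, with made-up labels rather than the thesis data:

```python
def accuracy(true_labels, predicted_labels):
    """Proportion of batches whose assigned label (1 = in control, 0 = out of control)
    matches the true label."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Hypothetical example: five batches, two of them mislabelled by the chart.
print(accuracy([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```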
These results indicate that the accuracy of the approach is excellent: although some of the out-of-control batches are close to the in-control batches, as discussed before, the accuracy at every α value is high. The accuracy decreases at α = 0.05 because a larger type I error rate lowers the UCL, narrowing the distance between the UCL and the LCL, which is expected to produce more false alarms among the in-control batches.
5.3 Wafer Data Set
This section uses a multivariate time series data set provided by Olszewski (2001), collected with vacuum-chamber sensors during an etching process in semiconductor wafer production. The data set records readings from six sensors representing production parameters, along with whether the batch quality conforms to standards; each batch is classified as normal or abnormal. Figure 5.11 shows each sensor profile.
Each batch contains six parameters, each recorded by a single sensor:
1. Radio frequency forward power.
2. Radio frequency reflected power.
3. Chamber pressure.
4. 405 nanometer (nm) emission.
5. 520 nanometer (nm) emission.
6. Direct current bias.
The radio frequency forward and reflected power measure the applied electrical power; the chamber pressure is given by the third parameter; the 405 nm and 520 nm emissions measure the intensity of light emitted by the plasma; and the sixth parameter measures the direct current potential difference within the tool.
Figure 5.11: Wafer data profiles for each sensor.
The wafer data set contains 298 training batches and 896 test batches; the results of applying our approach to this data set are discussed below. The autocorrelation within each sensor's data and the correlation between sensors are shown in Figures 5.12 and 5.13, respectively. One problem with this data set is the inconsistent batch length, shown in Figure 5.14; another is the misalignment of the time series profiles. The next section describes how these problems were solved to prepare the data for this approach.
Figure 5.12: The autocorrelation charts for the wafer data of each sensor.
Figure 5.13: The correlation matrix between wafer sensors data sets.
5.3.1 Preparing Data for Testing
5.3.1.1 Fix Batch Length
This step resamples the time series data to a specified length using the predefined MATLAB function "resample": the user selects the time series profile whose length is to be changed and the target length to which the data are stretched or shrunk. The batch length was set to 100 time units: the profiles of any batch shorter than 100 time units were stretched to that length, and those of any batch longer than 100 time units were compressed to it, in both cases while preserving the features of each profile. Figure 5.15 shows profiles of batches after they were brought to the same length.
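MATLAB's `resample` performs polyphase FIR filtering; the sketch below swaps in plain linear interpolation, a simpler technique and not what `resample` does, to illustrate stretching or compressing a profile to a fixed 100-point length while keeping its first and last values.

```python
def resample_linear(profile, target_length):
    """Stretch or compress a profile to target_length points by linear interpolation
    (a simple stand-in for MATLAB's `resample`, which uses polyphase FIR filtering)."""
    n = len(profile)
    if n < 2 or target_length < 2:
        raise ValueError("need at least two input and two output points")
    out = []
    for k in range(target_length):
        pos = k * (n - 1) / (target_length - 1)  # position on the original index grid
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append(profile[i] * (1.0 - frac) + profile[i + 1] * frac)
    return out


long_batch = [float(t) for t in range(180)]  # hypothetical 180-point profile
short_batch = [0.0, 1.0, 2.0, 3.0]           # hypothetical 4-point profile
print(len(resample_linear(long_batch, 100)), resample_linear(short_batch, 7))
```

Unlike filtering-based resampling, linear interpolation does not suppress aliasing when a long profile is compressed, so sharp features may be distorted.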
Figure 5.14: Sensor 6 shows three different batches with variation in length.
Figure 5.15: Sensor 6 with three different batches having the same length (100 time units).
5.3.1.2 Time Series Profiles Alignment
As discussed before, the output variable profiles of the batches in the wafer data set are not aligned, as shown in Figure 5.16. In this case, no monitoring technique can be implemented, since the correlation between observations cannot be calculated.
Figure 5.16: Radio frequency forward power data without alignment.
An alignment technique is therefore applied to align the batch profiles to a given reference profile so that the correlation can be calculated. First, a batch is selected as the reference for the alignment process; each time series is then shifted by the lag corresponding to the maximum cross-correlation with the reference. Figure 5.17 shows sensor 1 of the wafer data set after alignment: batch number 33 was selected as the reference after its length was fixed to 100 time units, and the remaining profiles were shifted to match its profile.
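A sketch of lag-based alignment to a reference profile. The thesis does not state the cross-correlation normalisation, the maximum lag searched, or how the vacated ends are filled; here we assume a mean-removed, unnormalised cross-correlation, a user-chosen `max_lag`, and padding with the nearest available value.

```python
def cross_correlation(reference, profile, lag):
    """Sum of products of the mean-removed series, pairing reference[t] with profile[t + lag]."""
    mean_ref = sum(reference) / len(reference)
    mean_prof = sum(profile) / len(profile)
    total = 0.0
    for t in range(len(reference)):
        u = t + lag
        if 0 <= u < len(profile):
            total += (reference[t] - mean_ref) * (profile[u] - mean_prof)
    return total


def best_lag(reference, profile, max_lag):
    """Lag in [-max_lag, max_lag] with the largest cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(reference, profile, lag))


def align(reference, profile, max_lag):
    """Shift `profile` by the best lag; vacated ends repeat the nearest available value
    (the padding rule is an assumption)."""
    lag = best_lag(reference, profile, max_lag)
    n = len(profile)
    aligned = [profile[min(max(t + lag, 0), n - 1)] for t in range(len(reference))]
    return aligned, lag


reference = [0] * 10 + [1, 3, 5, 3, 1] + [0] * 10  # hypothetical reference batch
profile = [0] * 13 + [1, 3, 5, 3, 1] + [0] * 7     # same shape, 3 time units late
aligned, lag = align(reference, profile, max_lag=5)
print(lag, aligned[8:17])
```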
The description of the wafer production process indicates that it is carried out in two stages, and the sensor readings in the period between these stages are almost zero, possibly because production is paused. We therefore cut the profile curves into periods. The most important period for checking batch quality runs from the beginning of stage two until the end. Figure 5.18 shows the production stages.
Figure 5.17: Radio frequency forward power data after alignment.
Figure 5.18: Wafer production process stages.
5.3.2 Wafer Results
The wafer results are summarized in the same way as the penicillin results: the accuracy was calculated for α = 0.0027, 0.01, and 0.05 and for 100, 150, 200, and 250 training profiles. In addition to the accuracy, we calculated the percentages of false positive (FP) and false negative (FN) batches. Table 5.3 summarizes the results.
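Table 5.3: Wafer data results.

| Number of training profiles | Type I error rate (α) | Accuracy | FP | FN |
|---|---|---|---|---|
| 100 | 0.0027 | 0.9241 | 21.26% | 5.79% |
| 100 | 0.01 | 0.9196 | 4.72% | 10.03% |
| 100 | 0.05 | 0.9205 | 12.60% | 7.34% |
| 150 | 0.0027 | 0.9234 | 5.51% | 7.96% |
| 150 | 0.01 | 0.9128 | 4.72% | 9.27% |
| 150 | 0.05 | 0.8822 | 3.94% | 12.87% |
| 200 | 0.0027 | 0.9145 | 26.77% | 5.88% |
| 200 | 0.01 | 0.9135 | 18.11% | 7.27% |
| 200 | 0.05 | 0.9044 | 6.30% | 10.03% |
| 250 | 0.0027 | 0.7405 | 11.81% | 28.15% |
| 250 | 0.01 | 0.7172 | 6.30% | 31.70% |
| 250 | 0.05 | 0.6441 | 7.26% | 40.76% |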
According to these results, the new approach is valid for monitoring batch processes with multiple outputs: with 100 to 200 training profiles, the accuracy ranged from 0.88 to 0.92 for all type I error rates considered. With 250 training profiles, however, the accuracy dropped to between 0.64 and 0.74, mainly because of higher FN percentages (28.15% to 40.76%).
Chapter Six: Conclusions and Future Work
In most production processes, several output variables must be monitored and controlled to obtain products that conform to specifications. Multivariate Statistical Process Control (MSPC) is a set of tools and techniques used to monitor such multivariate production processes. Early detection of assignable causes, which lead to out-of-control products, is always preferable because it reduces the cost of poor-quality products. Such early detection can be achieved through on-line process quality monitoring, in which process variables are monitored in real production time and alarm signals are issued when a deviation from normal operating conditions occurs. Because monitoring happens in real production time, corrective actions can be taken immediately to bring the process back to its normal operating conditions, avoiding the production of further nonconforming products.
Several MSPC tools are found in the literature. Some of these tools, such as the T² and MVEWMA control charts, assume no correlation within the observations of each process variable. Tools that accommodate autocorrelated observations rely on dimensionality reduction techniques and mostly deal with batch production processes, a type of production process in which several monitored process parameters generate autocorrelated data. In this thesis, we proposed an MSPC approach based on Gaussian process models for on-line monitoring of multivariate autocorrelated processes. The modeling approach takes into account both the correlation within each sensor's readings and the correlation among sensors (process variables). It also uses no dimensionality reduction technique and hence reduces computation time compared with other MSPC tools.
To illustrate the proposed approach, three different data sets were used. A simulated data set consisting of three sensor readings was used to ensure that the approach is not significantly different from Shewhart charts; according to the ARL results, it performs similarly to Shewhart-type control charts for independent process data and is therefore good enough to be used for real data sets. The second data set came from a web-based simulator of penicillin production; the accuracy results were excellent and support the results obtained for the first data set. Finally, we used a data set of wafer production by an etching process, collected from a semiconductor fabrication institute. In addition to accuracy, false positive and false negative percentages were obtained, and the results for the different combinations of type I error rate and number of training profiles indicated that the approach is valid for monitoring production processes with multiple outputs.
Computational time was one of the problems faced when implementing this approach in MATLAB: it increases with the number of training profiles, since fitting the GP model takes longer. Another limitation is that the approach works well only under the assumption that each time series profile is a smooth curve.
As future work, methods that reduce the computation time and handle data of any batch length could be investigated; time series techniques might also relax the curve smoothness assumption.
References
Abrahamsen, P., 1997. A review of Gaussian random fields and correlation functions.
Norsk Regnesentral/Norwegian Computing Center.
Alshraideh, H. & Del Castillo, E., 2013. Gaussian Process Modeling and Optimization of Profile Response Experiments. Quality and Reliability Engineering International, 34(4), pp. 449-462.
Alshraideh, H. & Khatatbeh, E., 2014. A Gaussian process control chart for monitoring
autocorrelated process data. Journal of Quality Technology, 46(4), p. 317.
Banerjee, S., Carlin, B. P. & Gelfand, A. E., 2004. Hierarchical modeling and analysis for spatial data. CRC Press.
Baydogan, M. G., 2012. Modeling Time Series Data for Supervised Learning. (Doctoral
dissertation, Arizona State University).
Birol, G., Ündey, C. & Cinar, A., 2002. A modular simulation package for fed-batch fermentation: penicillin production. Computers & Chemical Engineering, 26(11), pp. 1553-1565.
Boyle, P. & Frean, M., 2004. Dependent Gaussian processes. In Advances in Neural Information Processing Systems, pp. 217-224.
Camacho, J. & Picó, J., 2006. Online monitoring of batch processes using multi-phase
principal component analysis. Journal of Process Control, 16(10), pp. 1021-1035.
Chen, T. & Zhang, J., 2010. On-line multivariate statistical monitoring of batch processes using Gaussian mixture model. Computers & Chemical Engineering, 34(4), pp. 500-507.
Diwekar, U., 2014. Batch Processing: Modeling and Design. CRC Press.
Fang, K. T., Li, R. & Sudjianto, A., 2006. Design and Modeling for Computer
Experiments. CRC Press.
García-Muñoz, S., Kourti, T. & MacGregor, J. F., 2004. Model predictive monitoring for
batch processes. Industrial & Engineering Chemistry Research, 43(18), pp. 5929-5941.
Grimmett, G. & Stirzaker, D., 2001. Probability and Random Processes. Oxford University
Press.
Gunther, J. C., 2007. Process monitoring in fed-batch bioprocesses. ProQuest.
Holling, C. S., 1959. The components of predation as revealed by a study of small-
mammal predation of the European pine sawfly. The Canadian Entomologist, 90(05), pp.
293-320.
Hotelling, H., 1947. Multivariate Quality Control Illustrated by Air Testing of Sample
Bombsights. In C. Eisenhart et al., eds., pp. 111-184.
Kuss, M. & Rasmussen, C. E., 2006. Assessing approximations for Gaussian process
classification. In Advances in Neural Information Processing Systems, pp. 699-706.
Martin, E. B., Morris, A. J. & Zhang, J., 1996. Process performance monitoring using
multivariate statistical process control. IEE Proceedings - Control Theory and
Applications, 143(2), pp. 132-144.
Mason, R. L., Tracy, N. D. & Young, J. C., 1997. A practical approach for interpreting
multivariate T² control chart signals. Journal of Quality Technology, 29(4), pp. 396-406.
McDuff, D., 2010. Gaussian Processes. Tech. Rep., MIT Media Lab.
Montgomery, D. C., 2013. Introduction to Statistical Quality Control. 7th ed. Singapore: John
Wiley and Sons.
Nomikos, P. & MacGregor, J. F., 1994. Monitoring batch processes using multiway
principal component analysis. AIChE Journal, 40(8), pp. 1361-1375.
Nomikos, P. & MacGregor, J. F., 1995. Multi-way partial least squares in monitoring batch
processes. Chemometrics and Intelligent Laboratory Systems, 30(1), pp. 97-108.
Olszewski, R. T., 2001. Generalized feature extraction for structural pattern recognition in
time-series data. Carnegie Mellon University, School of Computer Science, Pittsburgh, PA.
Osborne, M. A. & Roberts, S., 2007. Gaussian processes for prediction, Department of
Engineering Science, University of Oxford, Tech. Rep.
Runger, G. C., 2002. Assignable Causes and Autocorrelation: Control Charts on
Observations or Residuals? Journal of Quality Technology, 34(2), pp. 165-170.
Snelson, E. L., 2007. Flexible and efficient Gaussian process models for machine learning.
(Doctoral dissertation, University of London).
Illinois Institute of Technology, n.d. A web-based program for dynamic simulation of fed-batch
penicillin production. [Online]
Available at: http://simulator.iit.edu/web/pensim/simul.html.
Umit, F. & Cigdem, A., 2001. Multivariate Quality Control: A Historical Perspective.
Yildiz Technical University, pp. 54-65.
Williams, C. K. & Rasmussen, C. E., 2006. Gaussian Processes for Machine Learning. The MIT Press.
Yin, S., Ding, S. X., Abandan Sari, A. H. & Hao, H., 2013. Data-driven monitoring for
stochastic systems and its application on batch process. International Journal of Systems
Science, 44(7), pp. 1366-1376.
Yoo, C. K., Lee, J. M., Vanrolleghem, P. A. & Lee, I. B., 2004. On-line monitoring of
batch processes using multiway independent component analysis. Chemometrics and
Intelligent Laboratory Systems, 71(2), pp. 151-163.
Zhou, L., Chen, J. & Song, Z., 2015. Recursive Gaussian Process Regression Model for
Adaptive Quality Monitoring in Batch Processes. Mathematical Problems in Engineering.
Appendices
Appendix A: UCL code
function [UCLopt]=findUCL(ResTrain,L,alpha)
% Find the UCL through sampling that satisfies type I error = alpha
nSample=200000;
mu=mean(ResTrain);
sigma=cov(ResTrain);
sigma_inv=inv(sigma);
r=mvnrnd(mu,sigma,nSample*L);
Tscores=zeros(nSample*L,1);
for i=1:nSample*L
    Tscores(i)=r(i,:)*sigma_inv*r(i,:)';
end
Tscores=reshape(Tscores,nSample,L);
% Set optimization options
options=optimset('Algorithm','sqp','Display','iter','MaxIter',10000, ...
    'MaxFunEvals',10000,'TolFun',1e-12,'TolX',1e-12);
[UCLopt]=fmincon(@(UCL) optUCL(UCL,Tscores,alpha,L,nSample), ...
    max(max(Tscores))/2,[],[],[],[],0.0,max(max(Tscores)),[],options);
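The helper `optUCL` called above is not listed in this appendix. If the quantity being targeted is the batch-wise false-alarm probability (which matches the rule in Appendix B, where a batch is in control only when all of its points stay at or below the UCL), the same limit can be read off directly as the empirical (1 - alpha) quantile of the per-batch maximum T², with no numerical optimizer. The sketch below assumes zero-mean Gaussian residuals with a known covariance `S` and uses only base MATLAB; `S`, `nSim`, `T2max`, and `UCLq` are illustrative names, not part of the thesis code.

```matlab
% Sketch: batch-wise UCL as the empirical (1 - alpha) quantile of the
% per-batch maximum T^2. Illustrative assumptions (not from the thesis):
% zero-mean Gaussian residuals with known covariance S, independent points.
S     = [1.0 0.3; 0.3 2.0];   % hypothetical residual covariance, 2 sensors
L     = 50;                   % time points per batch
alpha = 0.05;                 % target batch-wise false-alarm probability
nSim  = 20000;                % simulated in-control batches

p = size(S, 1);
R = chol(S);                  % S = R'*R, so randn(.,p)*R has covariance S

% Simulate nSim*L residual vectors and compute T^2 = r*inv(S)*r' row by row
r  = randn(nSim*L, p) * R;
T2 = sum((r / S) .* r, 2);

% A batch signals if ANY of its L points exceeds the UCL, so the batch-wise
% statistic is the maximum T^2 within each batch (one batch per row)
T2max = max(reshape(T2, nSim, L), [], 2);

% UCL = empirical (1 - alpha) quantile of the per-batch maxima
T2sorted = sort(T2max);
UCLq = T2sorted(ceil((1 - alpha) * nSim));

% Fraction of simulated in-control batches that would raise a false alarm
falseAlarmRate = mean(T2max > UCLq);
fprintf('UCL = %.3f, batch-wise false-alarm rate = %.4f\n', UCLq, falseAlarmRate);
```

As a sanity check for these settings: with a known S and p = 2, each T² is chi-square with 2 degrees of freedom, so the batch-wise limit should land near -2*log(1 - (1 - alpha)^(1/L)), about 13.8. The quantile avoids tuning `fmincon` tolerances, at the price of Monte Carlo noise that shrinks as `nSim` grows.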
Appendix B: Get residuals code
function [ResTrain,Res,InControlT,PredClassT,T2Train,T2Test]=getResidualsT2(data,Test,nSensors,nProfiles,CovModel,maxd,alpha)
% This function fits a GP model for PCs derived from training data, fits
% the testing data and calculates the residuals (actual - predicted).
% NO correction for alpha value is done here (use optimalAlpha).
% Reshape the data into useful form for PCA
[n1,n2]=size(data);
y=reshape(data',n2*nProfiles,nSensors);
% Define residuals matrices
[n3,n4]=size(Test);
Res=ones(n4*n3/nSensors,nSensors);
ResTrain=ones(n2*n1/nSensors,nSensors);
% Construct GPs for scores (independent GPs) and make predictions
x=repmat((1:n2)',nProfiles,1);
nugget_initial=100;
for i=1:nSensors
    display(sprintf('Fitting GP model for sensor %d data \n',i));
    v=variogram([x zeros(size(x))],y(:,i),'plotit',false,'maxdist',maxd);
    [~,~,~,vstruct]=variogramfit(v.distance,v.val,[],[],[],'model',CovModel, ...
        'nugget',nugget_initial,'plotit',false);
    mu0=mean(y(:,i));
    [Zhat(i,:),Zvar(i,:)]=predictGP(x,y(:,i),[vstruct.nugget vstruct.sill vstruct.range mu0],n2,maxd);
    % Get residuals for TRAINING data
    temp=reshape(y(:,i),n2,n1/nSensors)';
    ind=1;
    for j=1:n1/nSensors
        display(sprintf('Getting residuals for profile %d of sensor %d of training data\n',j,i));
        for k=1:n2
            ytrain=[Zhat(i,:) temp(j,1:k)];
            % Zhat(i,:) has n2 points, so positions are [1:n2, 1:k]
            dist=squareform(pdist([1:n2,1:k]'));
            dist(dist>maxd)=inf;
            sigma=vstruct.nugget.*eye(size(dist))+vstruct.sill.*exp(-vstruct.range.*dist);
            sigma11_inv=sigma(1:end-1,1:end-1)\eye(length(ytrain)-1);
            %sigma12=sigma(1:end-1,end);
            sigma21=sigma(end,1:end-1);
            %sigma22=sigma(end,end);
            mu=mu0+sigma21*sigma11_inv*(ytrain(1:end-1)'-repmat(mu0,n2+k-1,1));
            %var=sigma22-sigma21*sigma11_inv*sigma12;
            ResTrain(ind,i)=ytrain(end)-mu;
            ind=ind+1;
        end
    end % End residuals for TRAINING
    % Reshape test data
    yTest=reshape(Test',n4*n3/nSensors,nSensors);
    % Get residuals for TESTING data
    temp=reshape(yTest(:,i),n4,n3/nSensors)';
    ind=1;
    for j=1:n3/nSensors
        display(sprintf('Getting residuals for profile %d of sensor %d of testing data\n',j,i));
        for k=1:n4
            ytrain=[Zhat(i,:) temp(j,1:k)];
            dist=squareform(pdist([1:n2,1:k]'));   % Zhat(i,:) has n2 points
            dist(dist>maxd)=inf;
            sigma=vstruct.nugget.*eye(size(dist))+vstruct.sill.*exp(-vstruct.range.*dist);
            sigma11_inv=sigma(1:end-1,1:end-1)\eye(length(ytrain)-1);
            %sigma12=sigma(1:end-1,end);
            sigma21=sigma(end,1:end-1);
            %sigma22=sigma(end,end);
            mu=mu0+sigma21*sigma11_inv*(ytrain(1:end-1)'-repmat(mu0,n2+k-1,1));
            %var=sigma22-sigma21*sigma11_inv*sigma12;
            Res(ind,i)=ytrain(end)-mu;
            ind=ind+1;
        end
    end % End residuals for TESTING
end
% Calculate T^2 values for training and testing data
sig_inv=inv(cov(ResTrain));
T2Train=zeros(n2*n1/nSensors,1);
for k=1:n2*n1/nSensors, T2Train(k)=ResTrain(k,:)*sig_inv*ResTrain(k,:)'; end
T2Test=zeros(n3*n4/nSensors,1);
for k=1:n3*n4/nSensors, T2Test(k)=Res(k,:)*sig_inv*Res(k,:)'; end
% Find control limits of Chi-squared chart (calibrated for test batches of n4 points)
display(sprintf('Optimizing UCL...'));
[UCL]=findUCL(ResTrain,n4,alpha);
display(sprintf('Optimal UCL found = %0.4f\n',UCL));
% A test profile is in control only if all of its n4 points are within the UCL
InControlT=(T2Test<=UCL);
PredClassT=ones(n3/nSensors,1);
for j=1:n3/nSensors, PredClassT(j)=(sum(InControlT((j-1)*n4+1:j*n4))==n4); end
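The core of the residual computation above is the Gaussian conditional (kriging) mean mu_k = mu0 + Sigma21 * inv(Sigma11) * (obs - mu0), where the conditioning vector `obs` stacks the fitted mean profile and the current profile's observations up to time k - 1, and the residual is y(k) - mu_k. The minimal sketch below reproduces that step for a single profile of a single sensor, using the covariance form written in the appendix (nugget on the diagonal plus sill*exp(-range*d), zero beyond `maxd`). It replaces `pdist`/`squareform` with implicit expansion (MATLAB R2016b or later) and solves with backslash instead of forming an inverse. The profiles and parameter values are hypothetical stand-ins, not fitted values from the thesis.

```matlab
% Sketch: one-step-ahead GP residuals for one profile of one sensor, using
% the covariance form of Appendix B: nugget*I + sill*exp(-rate*d), with
% correlation set to zero beyond maxd. All values below are hypothetical.
n      = 40;                        % batch length
t      = 1:n;                       % time index
zhat   = sin(t/6);                  % stand-in for the fitted mean profile Zhat
y      = sin(t/6) + 0.2*cos(t/3);   % stand-in for a profile being monitored
mu0    = mean(zhat);                % constant GP mean
nugget = 0.1; sill = 1.0; rate = 0.3; maxd = 30;

e = zeros(1, n);                    % residuals y(k) - mu_k
for k = 1:n
    pos = [t, 1:k]';                % time positions of [zhat, y(1:k)]
    D = abs(pos - pos');            % pairwise distances (implicit expansion)
    D(D > maxd) = inf;              % exp(-rate*inf) = 0 beyond maxd
    Sigma = nugget*eye(numel(pos)) + sill*exp(-rate*D);

    obs = [zhat, y(1:k-1)]';        % conditioning values: zhat, then y(1:k-1)
    S11 = Sigma(1:end-1, 1:end-1);  % covariance of the conditioning values
    S21 = Sigma(end, 1:end-1);      % covariance of y(k) with them
    muK = mu0 + S21 * (S11 \ (obs - mu0));   % Gaussian conditional mean
    e(k) = y(k) - muK;
end
```

Because the conditioning set grows with k, step k solves a linear system of size n + k - 1, so the per-profile cost rises quickly with batch length; stacking the residuals of all sensors row by row then gives the vectors on which the T² statistic is computed.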
Appendix C: Profiles alignment code
function [alTS,shift,corr] = alignTS(ideal,TS)
% This program tries to align each time series (row of TS) to a given ideal
% profile by shifting it by the lag corresponding to the maximum cross-correlation.
[n,l]=size(TS);
alTS=zeros(n,l);
shift=zeros(1,n);
corr=zeros(1,n);
maxlag=30;
for i=1:n
    [c,lag]=xcorr(ideal,TS(i,:),maxlag,'coeff');
    [corr(i),I]=max(c);
    shift(i)=lag(I);
    if shift(i)>0
        alTS(i,shift(i)+1:l)=TS(i,1:l-shift(i));
    elseif shift(i)<0
        alTS(i,1:l+shift(i))=TS(i,1-shift(i):l);
    else
        alTS(i,:)=TS(i,:);
    end
end
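`alignTS` relies on `xcorr` from the Signal Processing Toolbox. The sketch below shows the same idea for one profile in base MATLAB: it evaluates the cross-correlation R(m) = sum_n ideal(n+m)*ts(n) (the lag convention of `xcorr(ideal, ts)`) over -maxlag..maxlag, scales it as the `'coeff'` option does, and then applies the same zero-padded shift as the appendix code. The bump-shaped `ideal` profile and the delay of 4 samples are hypothetical test data chosen for illustration.

```matlab
% Sketch: align a profile to an ideal profile by the lag that maximizes the
% normalized cross-correlation, without the Signal Processing Toolbox.
% Lag convention follows xcorr(ideal, ts): R(m) = sum_n ideal(n+m)*ts(n).
len       = 60;
t         = 1:len;
ideal     = exp(-((t - 30)/6).^2);                   % hypothetical ideal profile
trueDelay = 4;
ts        = [zeros(1, trueDelay), ideal(1:len-trueDelay)];  % delayed copy
maxlag    = 10;

lags = -maxlag:maxlag;
R = zeros(size(lags));
for q = 1:numel(lags)
    m = lags(q);
    if m >= 0
        R(q) = sum(ideal(1+m:len) .* ts(1:len-m));
    else
        R(q) = sum(ideal(1:len+m) .* ts(1-m:len));
    end
end
R = R / sqrt(sum(ideal.^2) * sum(ts.^2));   % same scaling as xcorr 'coeff'

[bestCorr, I] = max(R);
shift = lags(I);                            % negative: ts lags behind ideal

% Shift the profile as in alignTS, zero-padding the vacated samples
alTS = zeros(1, len);
if shift > 0
    alTS(shift+1:len) = ts(1:len-shift);
elseif shift < 0
    alTS(1:len+shift) = ts(1-shift:len);
else
    alTS = ts;
end
```

For the delayed copy the peak falls at lag -4, and shifting the profile back by four samples recovers the ideal profile up to the zero-padded tail.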
Monitoring of Autocorrelated Batch Production Processes Using a Gaussian Process Approach
Prepared by: Mu'men Hasan Farhan Rababah
Abstract
Reducing the number of nonconforming products to a minimum is a goal of every industry. To achieve this goal, statistical process control methods and tools are used: the variables of a manufacturing process are monitored to detect any deviation from normal operating conditions. In most industries, batch production processes for example, more than one variable is monitored, because all of them contribute to determining the quality of the final product. Many previous studies have proposed methods for handling multivariate processes, but few of them have addressed the presence of correlation among these variables. The tools currently available for monitoring processes with correlated variables have two main shortcomings. First, they are not used for monitoring during production: judging the product requires waiting until the end of the manufacturing run before a decision is made, so no corrective action can be taken on the process before it ends. Second, these tools rely on dimensionality-reduction methods to cope with computational complexity.
This thesis addresses the monitoring of multivariate processes that are, in addition, batch production processes, and proposes a model-based approach built on Gaussian processes. The proposed approach takes into account both the correlation among the process variables and the correlation within the readings of each individual variable.
Two simulated datasets and a third dataset of real manufacturing data were used to evaluate the proposed approach. The first dataset simulates a real manufacturing process with three variables; the second was obtained from a web-based simulator of penicillin batch production; and the real dataset came from a wafer etching process in semiconductor manufacturing. The performance of the proposed model was then evaluated by computing the average run length (ARL) and the accuracy of predicting out-of-control batches. Regarding ARL, the proposed approach performs similarly to Shewhart-type control charts, which is the best that can be achieved when the process data are uncorrelated. Depending on the type I error rate and the number of training cases, the achieved accuracy ranged from 92% to 98%. Given these results, the proposed approach performs well and can be applied to monitor multivariate processes. Moreover, it overcomes the shortcomings of previous studies: it can be used to monitor processes during production, and it involves no dimensionality reduction.