Computers and Chemical Engineering 29 (2005) 1507–1522
Influence of model validation on proper selection of process models: an industrial case study
N. Hvala a,*, S. Strmcnik a, D. Sel a, S. Milanic b, B. Banko b
a J. Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
b Faculty of Electrical Engineering, Trzaska c. 25, 1000 Ljubljana, Slovenia
Received 26 July 2000; received in revised form 30 November 2004; accepted 30 November 2004
Available online 7 March 2005
Abstract
This paper considers the design and validation of a model of an industrial batch process in TiO2 production. The model will be used for flexible recipe control, which is a model-based approach used in the control and optimisation of batch processes. Because of insufficient knowledge and a lack of proper data, different process models were developed: a semi-empirical dynamic model based on chemical kinetics laws, and several experimental black-box models. In the paper the models are validated and mutually compared. Validation of the models has shown that, besides comparing the model and process output behaviour, additional measures that also consider the model input error should be introduced to validate the model properly with respect to its intended use. In our case, introducing additional measures also contributed to improving the model design procedure, so that a simple yet satisfactory black-box model was obtained despite a small amount of process data.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Model validation; Model input error; Model-based control; Optimisation; Flexible recipe control; Hydrolysis batch process
1. Introduction
Mathematical modelling is a pervasive methodology, which represents an important and steadily increasing part of almost every field of science and engineering. The range of problems addressed by using mathematical models is practically unlimited; therefore, a vast number of methods and tools have been developed to make the process of modelling simpler and more efficient. In spite of these endeavours, mathematical modelling is far from becoming routine work. On the contrary, it is still considered to be more art than science. The reason lies in the fact that modelling is an iterative design procedure in which human judgement and creativity play a decisive role.
One of the most important steps in this procedure is the evaluation of the model. Model evaluation generally consists of two stages: verification and validation. Verification concerns the consistency and accuracy of simulation programs compared with the associated mathematical models, while model validation concerns the level of agreement between mathematical descriptions and the real system under investigation (Murray-Smith, 1998). In the literature on modelling and simulation it is generally accepted that model validation is a crucial part of the model development procedure. Without validation a model is of very little use (Neelamkavil, 1987). However, it is also stated that in reality model validation is treated superficially and does not form a central element of the modelling process. As Murray-Smith (1995) notes, most application papers in journals and conference proceedings pass over questions of model validation superficially or make no mention of it at all.

* Corresponding author. Tel.: +386 1 4773 900; fax: +386 1 4773 994.
E-mail address: [email protected] (N. Hvala).
In practice, validation is usually reduced to checking the agreement between the outputs of the model and those of the real system (Qureshi, Harrison, & Wegener, 1999). This is especially true in the development of industrial process models, where process complexity, insufficient knowledge about the process and a lack of proper data lead to this kind of simplification. However, the main issue in the design of models for engineering purposes is to design a model that is valid for the intended purpose. Therefore, the primary aim of model validation should be to find out whether the model is good enough for the intended use. This task requires a much more comprehensive validation approach, in which additional methods are used to build up confidence in the considered model.

0098-1354/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compchemeng.2004.11.013
The aim of the paper is to show that using a more complete validation approach may lead to decisions in the model design procedure that differ from those obtained on the basis of a simple evaluation of the output fit. In our case, the main additional element of the validation process is the concept of the inverse model and the related input error, which provide complementary information about the usability of the model. The need for such an approach arose during the design of a model of an industrial batch process in TiO2 production. The model is intended for use within flexible recipe control (Rijnsdorp, 1991), which is a model-based approach used in the control and optimisation of batch processes. Different models were developed for this process, i.e. a semi-empirical model based on chemical kinetics laws, and several experimental (black-box) models. The models were designed based on different sources of process information, i.e. process knowledge and available process data. All the models show relatively similar quality when judged by their agreement with process output data. The paper shows that only after validating the models also in relation to their use, i.e. evaluating the model input error and performing a sensitivity analysis with respect to the control variable, was it possible to distinguish between the performances of the different model structures and to direct the model design in the proper way.
The paper is organized as follows. After a short overview of model validation methods, it describes the industrial process of batch hydrolysis and presents the concept of flexible recipes to be applied for control of this process. The main emphasis is then placed on the design and validation of different process models to be used in the control scheme. The models are evaluated and mutually compared in relation to different performance measures. The obtained results are then commented on in the discussion.
2. Model validation
The main concepts of model validation can be found in
most textbooks on modelling and simulation (e.g. Bossel,
1994; Kheir, 1988; Murray-Smith, 1995; Neelamkavil,
1987), while a survey of model validation methods has been
presented by Murray-Smith (1998). Although the available
concepts and methods are rather general, one has to be aware
that model validation is always problem dependent (Ljung,
1999; Qureshi et al., 1999).
Generally speaking, the quality of the model can be judged with respect to several features. The most important ones are model purposiveness, model falseness and model plausibility (Bohlin, 1991; Sage, 1992; Zele, Juricic, Strmcnik, & Matko, 1998). Purposiveness (usefulness) tells whether a model satisfies its purpose. Falseness is related to agreement with measurements (data) coming from the real system to be modelled (a falsified model is one that is contradicted by data). Plausibility, also referred to as conceptual validity or face validity (Qureshi et al., 1999), expresses the conformity of the model with a priori knowledge about the process. Below is a short summary of model validation concepts related to the above-defined features.
2.1. Model purposiveness
A model is always developed with a certain purpose, i.e. with the aim of solving a certain problem. Therefore, the ultimate validation of the model is to test whether the problem that motivated the modelling exercise can be solved using the obtained model (Ljung, 1999). Testing model purposiveness may often be impossible, too expensive, too time-consuming, dangerous, etc. In some cases these problems can be alleviated by testing the solution in a simulation environment based on the process model. However, such an approach only partially solves the problem and is still difficult to perform.
In practice, assessment of model plausibility and especially falsification is generally easier. Hence, falsification is often performed instead of validation of model purposiveness. However, even if a model agrees with the available data, it is not necessarily good enough for a given purpose, and such an approach may lead to the design of an inappropriate model. Unfortunately, the opposite may also be true: a model may disagree with the data and still be good enough to serve its purpose.
2.2. Model plausibility
Assessment of model plausibility is tightly related to expert judgement of whether the model is good or not. The level of plausibility, or rather the expert opinion about it, is basically related to two features of the model.
The first concerns the question of whether the model looks logical. This question concerns characteristics of the model structure (type of equations, connections between equations, etc.) and its parameters (gains, time constants, signs, etc.), and is relevant when the model is derived from first principles. If the structure and the parameters are feasible, i.e. comparable to what experts know about the real process, then confidence in the model is greater.
The second is related to the question of whether the model behaves logically. This part concerns assessment of the reaction of the model outputs (dynamics, shape, etc.) to typical events (scenarios) on the inputs. If the model reacts in different situations in accordance with the experts' expectations, then again our confidence in its validity is increased. Note that for black-box models this is the only way to assess plausibility.
Fig. 1. Computation of model output error (P, process; M, model; u, process input; n, noise; y, process output; ym, model output; eo, output error).
2.3. Model falseness
As already mentioned, falsification is the most widely used approach to the validation of models and is related to direct comparison of input–output data from the model and from the real system. However, even within this validation area the methods differ substantially in the principles applied. The basic distinction concerns what is compared and how it is compared.
2.3.1. Comparison of outputs
Comparison of model and process outputs is a standard approach that needs no explanation. For the sake of completeness, and to enable a comparison with other approaches, let us just point out that the quantity considered here is the difference between the process output and the model output, eo = y − ym, which is referred to as the output error. It is computed as shown in Fig. 1.
2.3.2. Comparison of inputs
In some applications, differences that appear in the input variables may highlight deficiencies in the model more readily than conventional comparisons based on output variables (Murray-Smith, 2000). This is especially important for control or optimisation problems in processes with multiple inputs. Different combinations of input values may result in the same (or very similar) output values, so the same output error could be produced by different input combinations. For that reason, the input error is used as another relevant measure for model validation (Gray & von Grunhagen, 1998). The input error is the difference between the process input and the output of the inverse model, the latter computed when the measured process output is applied as the input to the inverse model. The computation of the input error is shown schematically in Fig. 2.
Fig. 2. Computation of model input error (P, process; M⁻¹, inverse model; u, process input; n, noise; y, process output; um, inverse model output; ei, input error).

The basic problem with this kind of validation is evidently related to the concept of the inverse model. This concept has been used extensively in signal processing, telecommunications and control (Goodwin, 2002), where methods for the derivation and use of inverse models have also been developed. Basically, two approaches are possible: first, explicit analytical derivation, which is limited to simple, mostly linear models; and second, the more general implicit simulation and optimisation technique, which does not provide the inverse model itself but provides the solution of the inverse problem.
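The implicit approach can be illustrated with a short Python sketch, assuming SciPy is available. Instead of deriving the inverse model analytically, the inverse problem M(u) = y is solved numerically with Brent's bracketing root finder (`scipy.optimize.brentq`), and the input error is then ei = u − um, following the scheme of Fig. 2. The quadratic `model` and the search interval are hypothetical placeholders, chosen only so that the model is monotonic on the bracket; they are not a model from this paper.

```python
from scipy.optimize import brentq


def model(u):
    # Hypothetical monotonic static model y_m = M(u); not the paper's model.
    return 1.5 + 0.4 * u - 0.02 * u ** 2


def inverse_model(y_measured, lower=0.0, upper=8.0):
    # Implicit inverse: solve M(u) = y_measured for u with Brent's method.
    # Assumes M is monotonic on [lower, upper], so the root is bracketed.
    return brentq(lambda u: model(u) - y_measured, lower, upper)


def input_error(u_process, y_process):
    # e_i = u - u_m, where u_m is the inverse-model output for the measured y.
    return u_process - inverse_model(y_process)


# Example: a process run at u = 3.0 whose measured (noisy) output is 2.6.
print(round(input_error(3.0, 2.6), 4))
```

With a noise-free output the input error is zero; any mismatch between model and process shows up directly in units of the input, which is what makes this measure useful when the input is a control variable.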
2.3.3. Comparison methods
The comparison of measured and simulated data can be performed qualitatively, quantitatively, or based on statistical methods (Murray-Smith, 1998).
Qualitative approaches involve plotting the process and the corresponding model variables and visually inspecting the differences.
Quantitative methods are based on performance measures that determine the goodness of fit. In this paper the root mean square error (RMSE), Theil's inequality coefficient (TIC) (Thiel, 1970) and the relative error (REL) are used, defined by the following equations:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_i (y_i - y_{m,i})^2}{N}} \tag{1}$$

$$\mathrm{TIC} = \frac{\sqrt{\sum_i (y_i - y_{m,i})^2}}{\sqrt{\sum_i y_i^2} + \sqrt{\sum_i y_{m,i}^2}} \tag{2}$$

$$\mathrm{REL} = \sqrt{\frac{\sum_i \left( (y_i - y_{m,i})^2 / y_i^2 \right)}{N}} \tag{3}$$

Here yi represents the measured data points, ym,i the computed data points and N the number of data points. Lower values of RMSE and REL indicate better agreement between the measured and computed data. The value of TIC lies between zero and unity, with values closer to zero indicating better model validity. The measures are more appropriate for model comparison than for use in an absolute sense, although Zhou (1993), for example, suggested that values of TIC smaller than 0.3 indicate good agreement with measured data.
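A minimal NumPy sketch of the three measures follows. REL is implemented with a square root, by analogy with RMSE; that reading of the extracted formula is an assumption, as are the function names.

```python
import numpy as np


def rmse(y, y_m):
    # Eq. (1): root mean square error between measured y and computed y_m.
    y, y_m = np.asarray(y, float), np.asarray(y_m, float)
    return np.sqrt(np.mean((y - y_m) ** 2))


def tic(y, y_m):
    # Eq. (2): Theil's inequality coefficient, bounded in [0, 1].
    y, y_m = np.asarray(y, float), np.asarray(y_m, float)
    return np.sqrt(np.sum((y - y_m) ** 2)) / (
        np.sqrt(np.sum(y ** 2)) + np.sqrt(np.sum(y_m ** 2))
    )


def rel(y, y_m):
    # Eq. (3), assumed form: root mean square of the error relative to each y_i.
    y, y_m = np.asarray(y, float), np.asarray(y_m, float)
    return np.sqrt(np.mean(((y - y_m) / y) ** 2))
```

TIC reaches 1 when the model output is the exact negative of the measurement, which is why it is usually read as a relative score for comparing models rather than as an absolute criterion.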
Statistical methods apply to the comparison of the distribution of the data rather than to point-by-point comparisons. They include descriptive statistics, which deals with means, variances, correlations, etc., and inferential statistics, which considers hypothesis tests and confidence intervals (Qureshi et al., 1999). A widely used on-line validation approach is stepwise regression, where the selection of the model structure is based on the correlation coefficient R and the F-ratio. R gives a measure of the accuracy of the fit, while the F-ratio provides a measure of the confidence that can be ascribed to this fit.
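As an illustration of how R and the F-ratio can drive structure selection, the sketch below fits ordinary least squares with an intercept and performs a greedy forward stepwise search using a partial F-to-enter test. This is one common variant of stepwise regression, not necessarily the procedure used in the cited work; the entry threshold `f_enter=4.0` is a conventional rule of thumb, and the function names are hypothetical.

```python
import numpy as np


def fit_stats(X, y):
    # OLS with an intercept; returns the multiple correlation R and overall F-ratio.
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    F = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return np.sqrt(r2), F


def forward_stepwise(X, y, f_enter=4.0):
    # Greedy forward selection: add the column that most reduces the SSE,
    # as long as its partial F-to-enter exceeds f_enter.
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, m = X.shape
    selected = []
    sse_old = np.sum((y - y.mean()) ** 2)  # SSE of the intercept-only model
    while len(selected) < m:
        best_j, best_sse = None, np.inf
        for j in range(m):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = np.sum((y - A @ coef) ** 2)
            if sse < best_sse:
                best_j, best_sse = j, sse
        dof = n - len(selected) - 2  # residual dof after adding one regressor
        f_partial = (sse_old - best_sse) / (best_sse / dof)
        if f_partial < f_enter:
            break
        selected.append(best_j)
        sse_old = best_sse
    return selected
```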
2.4. Other validation methods
There are also validation methods that cannot be directly assigned to any of the three groups considered above.
One such approach is sensitivity analysis, which examines the extent of variation in predicted performance when parameters are systematically varied over some range of interest (Qureshi et al., 1999). It provides insight into the stability of the model and indicates priorities for its refinement.
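A one-at-a-time sensitivity analysis can be sketched with central finite differences. The function below returns normalised (relative) sensitivities S_j = (∂y/y)/(∂p_j/p_j), so that parameters of different magnitude can be compared; the normalisation, the 1% perturbation and the function name are assumptions made for illustration, and `f` is any scalar model output as a function of a parameter vector.

```python
import numpy as np


def relative_sensitivities(f, params, rel_step=0.01):
    # Perturb each parameter by +/- rel_step (relative) while holding the others
    # fixed, and return S_j = (dy/dp_j) * p_j / y0 from central differences.
    params = np.asarray(params, float)
    y0 = f(params)
    sens = np.empty(len(params))
    for j, p in enumerate(params):
        h = rel_step * p if p != 0 else rel_step
        up, down = params.copy(), params.copy()
        up[j] += h
        down[j] -= h
        dy_dp = (f(up) - f(down)) / (2.0 * h)
        sens[j] = dy_dp * p / y0
    return sens
```

For y = p0² p1 the relative sensitivities are 2 and 1: a 1% change in p0 moves the output about twice as much as a 1% change in p1.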
A similar but complementary approach is known as the distortion method (Cameron, Marcos, & de Prada, 1998; Murray-Smith, 1998). The approach is based on the assumption that any model output can be made to follow a measured variable by distorting the model parameters as a function of time. The better the model, the less distortion is required. If the distortion needed to match the process response lies within acceptable limits, the model is judged satisfactory.
Finally, there are some methods related to data analysis or data validity. One of the best known is based on system identification techniques and uses the concept of identifiability (Murray-Smith, 1998). Unique and reliable estimates of the parameters of a model of given structure can be obtained only if that model is identifiable. Structural unidentifiability occurs when a model has too many parameters to allow all of them to be estimated separately from the measured variables; in that case, certain parameters can only be estimated in combination with other parameters. Numerical identifiability is associated with measurement error, process noise or inadequate information in the measured data. It is especially important in the design of experiments, which should be designed so as to obtain data that are rich in information about the process.
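The identifiability idea can be checked numerically, at least locally, through the rank of the sensitivity matrix ∂y/∂θ evaluated at nominal parameter values: if its columns are linearly dependent, some parameters can only be estimated in combination. The sketch below uses forward differences and a singular-value test; the two toy models and the tolerance are hypothetical examples, and full column rank is only a local, necessary condition.

```python
import numpy as np


def sensitivity_matrix(model, theta, x, h=1e-6):
    # Column j approximates d y(x) / d theta_j by a forward difference.
    theta = np.asarray(theta, float)
    y0 = model(theta, x)
    J = np.empty((len(x), len(theta)))
    for j in range(len(theta)):
        t = theta.copy()
        t[j] += h
        J[:, j] = (model(t, x) - y0) / h
    return J


def locally_identifiable(model, theta, x, rel_tol=1e-6):
    # All parameters are separately estimable (locally) only if every singular
    # value of the sensitivity matrix is non-negligible relative to the largest.
    sv = np.linalg.svd(sensitivity_matrix(model, theta, x), compute_uv=False)
    return int(np.sum(sv > rel_tol * sv[0])) == len(theta)


product_model = lambda th, x: th[0] * th[1] * x        # only th0*th1 matters
poly_model = lambda th, x: th[0] * x + th[1] * x ** 2  # both terms distinguishable
```

For the product model the check fails, because only θ0θ1 affects the output. With the polynomial model and x held at a single value (no excitation), the check also fails, which mirrors the point about designing informative experiments.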
The methods listed above represent possible approaches to model validation, but in practice they are still not fully exploited for ascertaining model quality. This paper shows how some of these methods can be used in practice for the evaluation and improvement of a process model.
3. Problem description
3.1. Process description
The problem addressed in this paper is modelling of the hydrolysis process, which is one of 18 successive processes in the production of TiO2 pigment. Hydrolysis runs in a batch reactor (Fig. 3), where TiO2gel is formed by precipitation out of TiOSO4 solution. The reaction takes place when water and seeds are added and the solution is maintained at boiling point for an appropriate amount of time.
Fig. 3. Batch hydrolysis reactor.
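The reaction is described by the following chemical equations:

$$\mathrm{TiOSO_4 + H_2O \rightleftharpoons TiO_2 + H_2SO_4} \tag{4}$$

$$\mathrm{TiO_2} \xrightarrow{\text{nucleation}} \text{seeds} \tag{5}$$

$$\mathrm{TiO_2} + \text{seeds} \rightarrow \mathrm{TiO_2gel} \tag{6}$$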
TiO2 is an intermediate product. In the nucleation process it is transformed into seeds (5), or else it is precipitated onto already formed seeds (6). Additional seeds are added to the solution during the heating phase to accelerate precipitation. The final product is TiO2gel, and H2SO4 is the hydrolysis by-product.
Hydrolysis is one of the most crucial parts of the overall production process, since it is the first point in the production line at which particles (TiO2gel) are formed out of solution. The size of the particles has a strong influence on the quality of the final product, so the particles should be of the desired optimal size. Achieving this is not straightforward, since the process inputs are subject to uncontrollable variations, which are also reflected in variations of the process output. The size of the particles in the TiO2gel is characterised by the parameter D50, which represents the floccule diameter attributed to the highest number of floccules. On-line control (during the batch) of particle size is not possible, since D50 is measured only at the end of the batch, when a sample of the final product is examined by laboratory analysis.
3.2. Flexible recipe control
It is expected that more uniform particle size between batches could be achieved by flexible recipe control, which is a control concept introduced for batch processes (Rijnsdorp, 1991; Verwater-Lukszo, 1998). Unlike fixed recipes, which prescribe fixed instructions for running a batch, flexible recipe control adjusts the recipe parameters to account for changes in the input and operating conditions of the current batch. In this way, final product quantity and/or quality could be retained despite changing conditions in performing a batch.

Fig. 4. Flexible recipe control: optimisation of recipe parameters with the aid of a process model.
Adjustment of recipe parameters is done by optimisation (Fig. 4), where a performance index representing the economic aspects of process operation (e.g. production yield, product quality, energy consumption, etc.) is optimised by varying the recipe parameters. The value of the performance index at suggested values of the recipe parameters is computed by simulating the process behaviour using the process model.
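The optimisation loop of Fig. 4 can be sketched as follows, assuming SciPy. A hypothetical linear `predicted_d50` stands in for the process model, and the performance index combines the squared distance from the D50 target of 2.15 (the production aim stated for this process) with a small, invented cost on seeds. Two recipe parameters (seeds addition and water flow) are optimised within bounds using L-BFGS-B only to show the multi-parameter case; in the hydrolysis application only seeds addition is adjusted. All coefficients and bounds are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

D50_TARGET = 2.15  # production aim for D50 quoted for the hydrolysis process


def predicted_d50(recipe, c_tio2):
    # Hypothetical static process model with invented coefficients.
    # recipe = [seeds addition, water flow]; c_tio2 is a measured disturbance.
    u_seeds, water = recipe
    return 2.9 - 0.35 * u_seeds + 0.02 * water + 0.001 * (c_tio2 - 200.0)


def performance_index(recipe, c_tio2, seeds_cost=0.01):
    # Squared deviation from the D50 target plus a small (invented) seeds cost.
    deviation = predicted_d50(recipe, c_tio2) - D50_TARGET
    return deviation ** 2 + seeds_cost * recipe[0]


def flexible_recipe(c_tio2, x0=(2.0, 5.0), bounds=((0.5, 4.0), (0.0, 10.0))):
    # Optimise the recipe parameters within bounds for the current batch conditions.
    res = minimize(performance_index, np.asarray(x0, float), args=(c_tio2,),
                   method="L-BFGS-B", bounds=bounds)
    return res.x


recipe = flexible_recipe(c_tio2=200.0)
```

Because the seeds term carries a cost while water does not appear in it, the optimiser trades a tiny deviation from the target for less seeds; the weighting between quality and cost is exactly the kind of economic choice the performance index is meant to encode.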
In the case of the hydrolysis process, flexible recipe control is applied for control of the quality parameter D50, which is the process output variable. Variations of input concentrations are the main source of input variations, while seeds addition can be chosen as an adjustable recipe item (control variable). The construction of the process model used in flexible recipe control is described in the next section. Potential model inputs affecting D50 are the active acid concentration (AA), free acid concentration (AP) and TiO2 concentration (CTiO2), all in the input solution; the concentration (Cseeds) and volume (Vseeds) of added seeds; the temperature during the reaction; the flow of added water (w); and the steam flow after boiling (s).
The hydrolysis process will be controlled with flexible recipe control by calculating the optimal seeds addition for each batch, with the aim of producing particles of the desired optimal size. This calculation will be performed at the start of each batch and will be based on the most recently measured concentrations of the input solution, while the other operating parameters, e.g. temperature, addition of water, etc., will be set equal to those of the preceding batch.
4. Design of process models
As in many real-world applications, the design of a process model for the hydrolysis process was very much constrained by the available knowledge and data about the process. Knowledge was insufficient due to the process complexity; only some basic relations from chemical kinetics laws could be derived. On the other hand, the data about the process were also limited to two groups:
• a small set of experimental batches (23 batches) performed purposely for model identification;
• a set of 449 batches from regular production.
The batches from both groups were processed in one of the industrial hydrolysis reactors.
In the experimental batch set, some of the recipe parameters (i.e. the seeds addition and the flow of added water) were changed considerably in order to observe the consequent changes in the process output. The changes were large enough to exceed the process noise and to obtain a model valid in a wide operating region. The input solution concentrations are uncontrollable and varied in this set of batches as in normal production.
The second set represented batches from normal production. They were performed under fixed recipe control, so no special excitation of recipe parameters was imposed. Only a small number of batches in this set were performed with a change of seeds addition.
In the design of process models two types of models were developed: a semi-empirical dynamic model and several static (black-box) models.
4.1. Semi-empirical dynamic model (SDM)
A semi-empirical dynamic model (Sel, Hvala, Strmcnik, Milanic, & Suk-Lubej, 1999) was designed based on two types of knowledge: chemical kinetics laws and empirical knowledge of process experts. The model consists of two sub-models (Fig. 5):
• a dynamic model representing the curve of precipitated TiO2gel during the batch, and
• a static model mapping the precipitation rate curve into the quality parameter D50.
4.1.1. Dynamic part of the SDM model
The structure of the dynamic model was based on chemical kinetics laws that describe dynamic relations between reactant concentrations (c) and the process reaction rate (r) during the batch. The reaction (precipitation) yield is expressed as a function of time and is related to the reaction rate r(t) as follows:
$$\mathrm{yield}(t) = \int_0^t r(\tau)\,d\tau \tag{7}$$
According to chemical kinetics laws, the reaction rate for the reaction A + B → C generally depends on the concentrations of reactants A and B, and is written as

$$r(t) = k(T)\,c_A^{a}\,c_B^{b} \tag{8}$$

where a and b are empirically determined constants, and k(T) is a specific reaction rate that is modelled by the Arrhenius equation

$$k(T) = k_0\,e^{-E/T} \tag{9}$$

where T represents temperature, and k0 and E are constants.

Fig. 5. Structure of a semi-empirical dynamic model.
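Eqs. (7)–(9) translate directly into code. The sketch below, assuming NumPy, evaluates the Arrhenius rate constant and the rate law, and integrates a sampled rate r(t) with the cumulative trapezoidal rule to obtain the yield curve. Since the exponent is written as exp(−E/T), E here has the units of temperature. The function names are illustrative.

```python
import numpy as np


def arrhenius(T, k0, E):
    # Eq. (9): k(T) = k0 * exp(-E / T), with E in the same units as T.
    return k0 * np.exp(-E / T)


def reaction_rate(T, c_a, c_b, k0, E, a=1.0, b=1.0):
    # Eq. (8): r = k(T) * c_A**a * c_B**b for the reaction A + B -> C.
    return arrhenius(T, k0, E) * c_a ** a * c_b ** b


def yield_curve(t, r):
    # Eq. (7): yield(t) = integral of r from 0 to t, cumulative trapezoidal rule.
    t = np.asarray(t, float)
    r = np.asarray(r, float)
    increments = 0.5 * (r[1:] + r[:-1]) * np.diff(t)
    return np.concatenate([[0.0], np.cumsum(increments)])
```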
Hydrolysis is a reversible process. Therefore, the concentration of a particular component C depends on the rate of the reactions producing (rp) and consuming (rc) this component, and is generally given as

$$\frac{dc_C}{dt} = r_p - r_c \tag{10}$$
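Following these laws, a set of differential equations was determined for each component in the hydrolysis process. Taking into account the chemical relations (4)–(6), the model equations are the following:

$$r_{\mathrm{TiOSO_4}} = \frac{dc_{\mathrm{TiOSO_4}}}{dt} = -k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} + k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} \tag{11}$$

$$r_{\mathrm{H_2O}} = \frac{dc_{\mathrm{H_2O}}}{dt} = -k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} + k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} + h_2 u_{\mathrm{H_2O}} \tag{12}$$

$$r_{\mathrm{TiO_2}} = \frac{dc_{\mathrm{TiO_2}}}{dt} = k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} - (k_2 + k_3) c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} - k_4 c_{\mathrm{TiO_2}} c_{\mathrm{seeds}}^{h_1} + r_{\mathrm{gel}} \tag{13}$$

$$r_{\mathrm{H_2SO_4}} = \frac{dc_{\mathrm{H_2SO_4}}}{dt} = k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} - k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} \tag{14}$$

$$r_{\mathrm{TiO_2gel}} = \frac{dc_{\mathrm{TiO_2gel}}}{dt} = k_4 c_{\mathrm{TiO_2}} c_{\mathrm{seeds}}^{h_1} - r_{\mathrm{gel}} \tag{15}$$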
where the following notation is used: r is the reaction rate of the different components (subscripts distinguish between the individual components in (4)–(6)), c the component concentrations, k the parameters modelled by the Arrhenius equation (9), h the constant model parameters, and rgel the gel activity (constant). uH2O is the addition of water, which is equal to the flow of indirectly added water s during the heating phase and of directly added water w during the cooking phase. Temperature enters the model through the Arrhenius parameters k. cseeds is the concentration of seeds in TiO2gel, and depends on the reaction rate of seeds formation in the nucleation process, as well as on the volume Vseeds and concentration Cseeds of added seeds. The latter two parameters are also combined into a technological parameter named seeds addition, useeds, which is defined as the percentage of TiO2 mass in the added seeds relative to the TiO2 mass in the initial solution:

$$u_{\mathrm{seeds}} = \frac{C_{\mathrm{seeds}} V_{\mathrm{seeds}}}{C_{\mathrm{TiO_2}} V_{\mathrm{sol}}} \cdot 100\% \tag{16}$$

where CTiO2 is the TiO2 concentration in the initial solution and Vsol is the volume of the initial solution (equal for each batch).
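Eq. (16) is simple arithmetic, but when the model is used for control it is convenient to go in both directions, from a seeds volume to useeds and back. A small sketch with hypothetical function names; any consistent units work, since mass = concentration × volume appears in both numerator and denominator.

```python
def seeds_addition(c_seeds, v_seeds, c_tio2, v_sol):
    # Eq. (16): TiO2 mass in the added seeds as a percentage of the
    # TiO2 mass in the initial solution.
    return c_seeds * v_seeds / (c_tio2 * v_sol) * 100.0


def seeds_volume(u_seeds, c_seeds, c_tio2, v_sol):
    # Eq. (16) solved for V_seeds, e.g. to turn a computed u_seeds (in %)
    # into a dosing volume for the current batch.
    return u_seeds * c_tio2 * v_sol / (100.0 * c_seeds)
```

For example, Cseeds = 50, Vseeds = 2, CTiO2 = 200 and Vsol = 25 (hypothetical values in consistent units) give useeds = 2%, and `seeds_volume(2.0, 50, 200, 25)` recovers Vseeds = 2.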
Eqs. (11)–(15) were constructed with the assumption that all the chemical reactions are of first order, i.e. all the exponents are set to 1 (except h1). Initial values of the simulated process variables are determined from the initial solution concentrations. The solution of the system of differential equations is the concentration of precipitated TiO2gel over time (cTiO2gel), also denoted as yield, which can be represented by the precipitation curve (Fig. 6). The total batch duration is 5 h, with the reaction lasting approximately 1.5 h, while the model time constant is around 1 h.
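To show how a system like Eqs. (11)–(15) is simulated to obtain a precipitation curve, the following sketch integrates it with `scipy.integrate.solve_ivp`. It makes several simplifying assumptions that are not from the paper: temperature is constant (so k1 to k4 are plain numbers), cseeds and uH2O are held constant, and all parameter values and initial concentrations are invented, in arbitrary units; they are not the fitted values of Table 1.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters (arbitrary units); not the fitted values of Table 1.
PARAMS = {"k1": 0.5, "k2": 0.05, "k3": 0.0, "k4": 5.0, "h1": 1.0,
          "h2": 0.0, "r_gel": 0.0, "c_seeds": 0.1, "u_h2o": 0.0}


def hydrolysis_rhs(t, c, p):
    # States: TiOSO4, H2O, TiO2, H2SO4, TiO2gel, following Eqs. (11)-(15).
    # Simplification: constant temperature, c_seeds and u_H2O.
    c_tioso4, c_h2o, c_tio2, c_h2so4, c_gel = c
    forward = p["k1"] * c_tioso4 * c_h2o
    reverse = p["k2"] * c_tio2 * c_h2so4
    precipitation = p["k4"] * c_tio2 * p["c_seeds"] ** p["h1"]
    return [
        -forward + reverse,
        -forward + reverse + p["h2"] * p["u_h2o"],
        forward - (p["k2"] + p["k3"]) * c_tio2 * c_h2so4 - precipitation + p["r_gel"],
        forward - reverse,
        precipitation - p["r_gel"],
    ]


t_eval = np.linspace(0.0, 5.0, 101)
sol = solve_ivp(hydrolysis_rhs, (0.0, 5.0), [1.0, 5.0, 0.0, 0.0, 0.0],
                args=(PARAMS,), t_eval=t_eval, rtol=1e-8, atol=1e-10)
precipitation_curve = sol.y[4]  # c_TiO2gel over time, i.e. the "yield"
```

With k3 = 0 and rgel = 0 the total titanium (TiOSO4 + TiO2 + TiO2gel) is conserved, which is a convenient sanity check on the signs of the right-hand side.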
The parameters of the dynamic model were estimated using the experimental data set. For that purpose, experimental batches were performed with additional laboratory measurements of precipitation yield at some pre-defined times during the batch. Parameter values were obtained by a simplex optimisation method, which minimised the difference between the measured and computed yield over a set of experimental batches. As an example, Fig. 6 shows the computed precipitation curve compared with experimental data. In the set of batches used for validation, the model predicts the final yield within 5% of the measured value for 78% of batches and within 10% for all the batches.

Fig. 6. Dynamic part of the SDM model: measured (*) and computed (line) yield during an experimental batch.
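The text does not specify which simplex variant was used; assuming the Nelder–Mead downhill simplex, the estimation step can be sketched as below. A hypothetical first-order curve y_final(1 − exp(−kt)) stands in for the dynamic model, and the objective is the sum of squared yield errors over all experimental batches, each given as sample times and measured yields.

```python
import numpy as np
from scipy.optimize import minimize


def yield_model(theta, t):
    # Hypothetical first-order precipitation curve; a stand-in for the SDM
    # dynamic part, used only to illustrate the fitting procedure.
    y_final, k = theta
    return y_final * (1.0 - np.exp(-k * np.asarray(t, float)))


def total_sse(theta, batches):
    # Sum of squared differences between measured and computed yield,
    # accumulated over all batches given as (sample_times, measured_yield).
    return sum(float(np.sum((np.asarray(y, float) - yield_model(theta, t)) ** 2))
               for t, y in batches)


def fit_simplex(batches, theta0):
    # Derivative-free Nelder-Mead simplex search on the summed error.
    res = minimize(total_sse, np.asarray(theta0, float), args=(batches,),
                   method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000})
    return res.x
```

A derivative-free method is a natural fit here because, in the real model, each objective evaluation requires simulating the ODE system for every batch.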
The performance of the model was also evaluated by sensitivity analysis of the model with respect to changes of the model input variables. The model sensitivity was in accordance with a priori expert knowledge as well as with information found in the literature (Santacesaria, Tonello, Storth, Pace, & Carra, 1986). The dynamic model is most sensitive to seeds addition and temperature. Besides these two inputs, the initial TiO2 concentration and the active acid concentration also have an important influence on the precipitation curve.
4.1.2. Static part of the SDM model
The static part of the semi-empirical dynamic model was designed to map the precipitation curve into the output quality parameter D50. The structure of the static model could not be based on theoretical knowledge. Hence, it was determined by trial and error, where different segments of the precipitation curve were tested as potential model inputs. The seeds addition useeds was also considered to have a direct influence on the model output. In each step of the selection procedure, the chosen model structure was evaluated by the fit between the model and the process data. The model structure finally chosen is the following:

$$D_{50} = p_1 K_{23} + p_2 K_{23}^2 + p_3 K_{67} + p_4 K_{89} + p_5 u_{\mathrm{seeds}} + p_6\,\mathrm{RVM} + p_7\,\mathrm{RVM}^2 + p_8 K_{78}\,\mathrm{RVM}^2 + p_9 \tag{17}$$
where K denote the slopes of different parts of the precipitation curve shown in Fig. 7 (e.g. K23 denotes the slope between the two points where the yield reaches 20 and 30%), RVM is the maximum slope of the precipitation curve, and p denote the model parameters. The chosen segments represent the most informative parts related to the shape of the precipitation curve.

Fig. 7. Segments of the precipitation curve used as inputs in the static part of the SDM model.
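The inputs of Eq. (17) are derived from the precipitation curve. The sketch below, assuming NumPy, computes a segment slope such as K23 by interpolating the times at which the yield reaches the two percentage levels, and RVM as the largest finite-difference slope. It assumes the yield is given in percent and is strictly increasing, which `numpy.interp` requires when used to invert the curve; the function names are hypothetical.

```python
import numpy as np


def segment_slope(t, yield_pct, lo, hi):
    # Slope between the times at which the yield reaches lo % and hi %
    # (e.g. lo=20, hi=30 gives K23). Requires yield_pct strictly increasing.
    t_lo = np.interp(lo, yield_pct, t)
    t_hi = np.interp(hi, yield_pct, t)
    return (hi - lo) / (t_hi - t_lo)


def max_slope(t, yield_pct):
    # RVM: maximum slope of the precipitation curve (finite-difference estimate).
    return float(np.max(np.gradient(yield_pct, t)))
```

On measured data with sparse sampling, the interpolation step matters: the crossing times usually fall between samples, and a nearest-sample choice would make the slopes noticeably noisier.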
The parameters p were determined by the least-squares method. They were estimated on a selected set of batches from normal production, chosen so that the batches were equally distributed over the entire range of the measured process output. The final values of the complete model parameters are shown in Table 1.
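Because Eq. (17) is linear in p1 to p9, the least-squares estimate can be obtained with a single call to `numpy.linalg.lstsq` once the regressor matrix is built. The sketch follows the terms of Eq. (17) as written; the function names are hypothetical, and each argument is an array with one value per batch.

```python
import numpy as np


def eq17_regressors(K23, K67, K89, K78, RVM, u_seeds):
    # Columns in the order p1..p9 of Eq. (17); the column of ones carries p9.
    return np.column_stack([K23, K23 ** 2, K67, K89, u_seeds,
                            RVM, RVM ** 2, K78 * RVM ** 2, np.ones_like(K23)])


def fit_eq17(features, d50):
    # Ordinary least squares; returns the estimates of p1..p9.
    p, *_ = np.linalg.lstsq(features, d50, rcond=None)
    return p
```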
4.1.3. Validation of SDM model
Fig. 8 shows a comparison between the measured and computed D50 for the semi-empirical model validated on a set of regular batches. Two diagrams are presented. The upper diagram shows a chronological history of batches. For better presentation the results are drawn with lines, although they represent individual batches and should strictly be shown as points. The lower diagram shows the same results sorted by ascending measured D50. The upper diagram shows that the model predicts the variations of the actual process output relatively well: 72% of batches lie within 5% of the measured D50 and 94% of batches within 10%. The standard deviation of the error between the model and the process is 0.106, which is approximately the same as the standard deviation of the actual process output (0.114). It can therefore be expected that a model of this quality can be used to move the process output closer to the desired value (the production aim is to produce particles with D50 equal to 2.15, while the actual mean value is 2.07), but it cannot significantly reduce deviations around the desired value. Similar conclusions were also drawn from a more thorough analysis using 2 years of plant data (Sel et al., 1999).

Fig. 8. Measured and computed D50 for the SDM model.

Table 1
Parameters of the SDM model

Dynamic model        Static model
k10    0.0174        p1    1.7066
k20    0             p2    1.379
k30    0             p3    1.9071
k40    0.0395        p4    0.2171
E1     777.12        p5    2.1169
E2     0             p6    0.8844
E3     0             p7    1.6041
E4     1879          p8    1.4882
h1     0.638         p9    2.2523
h2     0.037
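The summary figures quoted for Fig. 8 (share of batches within 5% and 10%, and the two standard deviations) can be computed as follows. This is a sketch assuming NumPy; using the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np


def validation_summary(d50_measured, d50_model):
    # Share of batches whose relative error is within 5 % and 10 %,
    # plus the standard deviations of the model error and of the output.
    y = np.asarray(d50_measured, float)
    y_m = np.asarray(d50_model, float)
    rel_err = np.abs(y_m - y) / np.abs(y)
    return {
        "within_5pct": 100.0 * np.mean(rel_err <= 0.05),
        "within_10pct": 100.0 * np.mean(rel_err <= 0.10),
        "std_error": float(np.std(y - y_m, ddof=1)),
        "std_output": float(np.std(y, ddof=1)),
    }
```

When `std_error` is about as large as `std_output`, as with 0.106 versus 0.114 here, the model can shift the mean of D50 but explains little of the batch-to-batch scatter.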
The model was also evaluated in relation to the input error. The complete model is a multi-input single-output model, with one of the input variables (seeds addition) used as the control variable in the intended model-based control. In such a case it is especially interesting to evaluate the input error of the control variable, since errors between the actual and calculated model output are directly reflected in a wrong calculation of the control variable when the model is used for control purposes. Observing the input error of the control variable in cases where different combinations of process inputs give approximately the same process output helps to evaluate whether the control variable is included in the model in a proper way. In our case the input error is determined as the difference between the actual and the calculated seeds addition, the latter being the value at which the model output equals the measured process output. Because the SDM model has a nonlinear structure, the volume of added seeds (and consequently the seeds addition, see Eq. (16)) is determined by optimisation so that the difference between the model-computed and the measured D50 is minimised, while all the other model inputs are set to their measured values.
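For a nonlinear model the calculated input has to be found numerically. The sketch below assumes a scalar search over the seeds volume with all other inputs frozen at their measured values, and uses golden-section search on the squared output mismatch. This is only one possible optimiser, not necessarily the one used for the SDM model, and `toy_d50` is a made-up monotone stand-in for the real model.

```python
import math


def golden_section_min(fun, lo, hi, tol=1e-9):
    """Minimise a unimodal scalar function on [lo, hi] (golden-section search)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fun(c) < fun(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 0.5 * (a + b)


def calculated_input(model, y_measured, lo, hi):
    """Input value at which the model output best matches the measured output."""
    return golden_section_min(lambda r: (model(r) - y_measured) ** 2, lo, hi)


def toy_d50(v_seeds):
    """Hypothetical monotone model: more seeds give smaller particles."""
    return 2.6 - 0.9 * (1.0 - math.exp(-v_seeds / 0.5))


y_p = toy_d50(1.0)           # pretend this D50 was measured
r_p = 1.2                    # pretend this seeds volume was actually used
r_calc = calculated_input(toy_d50, y_p, 0.0, 5.0)
input_error = r_p - r_calc   # about 0.2
print(r_calc, input_error)
```

Golden-section search only needs function values, which suits a simulated model, but it assumes the mismatch has a single minimum on the search interval; a non-monotone model would need a bracketing step first.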
The input error of the SDM model is presented in Fig. 9.
For 79% of batches the calculated seeds addition is within
25% of the actual value.
4.2. Black-box linear regression models
The hydrolysis process was also modelled with black-box models, in which the output quality parameter D50 is determined as a static map from the process input variables. The main question was whether a simple static model structure can give results similar to those of the SDM model.
The same input variables were considered as potential model inputs as in the case of the semi-empirical dynamic model, except for the temperature: because of the static model structure, the temperature profile used in the SDM model was replaced by the minimum (Tmin) and maximum (Tmax) temperature. The potential regressors of the black-box model were all the above input variables, the product Cseeds·Vseeds, and their logarithms, squares and square roots.
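A minimal sketch of how such a candidate regressor set could be generated for one batch. The variable names follow the text, but the dictionary layout and the function are hypothetical, and the logarithm and square root assume strictly positive input values.

```python
import math


def candidate_regressors(inputs):
    """Expand one batch's raw inputs (name -> value) into candidate regressors:
    each variable, its logarithm, square and square root, plus Cseeds*Vseeds."""
    base = dict(inputs)
    base["Cseeds*Vseeds"] = inputs["Cseeds"] * inputs["Vseeds"]
    candidates = {}
    for name, x in base.items():
        if x <= 0:
            raise ValueError(f"{name} must be positive for log/sqrt transforms")
        candidates[name] = x
        candidates[f"log({name})"] = math.log(x)
        candidates[f"({name})^2"] = x ** 2
        candidates[f"sqrt({name})"] = math.sqrt(x)
    return candidates


# Hypothetical values for a single batch (illustration only).
batch = {"Tmin": 95.0, "Tmax": 108.0, "Vseeds": 4.0, "Cseeds": 2.0}
regs = candidate_regressors(batch)
print(len(regs))  # 20
```

With four raw inputs plus the product, each appearing in four forms, the candidate pool already has 20 columns, which is why a statistical selection step is needed when only a few batches are available.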
4.2.1. Experimental model with complete model structure (EKSC)
A straightforward approach to model structure selection is to include all independent variables in the model. The structure of the model is as follows:
$$D_{50} = \theta_0 + \theta_1 T_{min} + \theta_2 V_{seeds} + \theta_3 s + \theta_4 C_{seeds} + \theta_5 w + \theta_6 T_{max} + \theta_7 C_{TiO_2} + \theta_8 \mathrm{AP} + \theta_9 \mathrm{AA} \qquad (18)$$
Fig. 9. Measured and computed seeds addition for the SDM model.
The values of model parameters were determined on a se-
lected set of batches from regular production as in the case of
the static part of the SDM model. Parameter values are given
in Table 2.
Table 2
Parameters of the EKSC model
Parameter   Value
θ0          7.5645
θ1          3.2951 × 10^-2
θ2          4.4447 × 10^-4
θ3          3.159 × 10^-2
θ4          2.5453 × 10^-2
θ5          1.7923 × 10^-3
θ6          1.9229 × 10^-2
θ7          7.8777 × 10^-3
θ8          2.0235 × 10^-4
θ9          2.1255 × 10^-3

Also in this case, the model was evaluated in relation to both the input and the output error. Because of the linear model structure, the input error related to the seeds addition could be determined analytically. For a static linear regression model of the form
$$y = \alpha r + \beta^T u \qquad (19)$$
where r is the model input for which the input error is calculated, α is the corresponding model parameter, u is the n-vector of the remaining model inputs ($u^T = [1\ u_1\ u_2\ \cdots\ u_{n-1}]$), β is the n-vector of corresponding model parameters, y is the model output and n is the number of all model inputs, the input error associated with the variable r can be determined as
$$\Delta r = r_p - r\big|_{y=y_p,\,u=u_p} = r_p - \frac{1}{\alpha} y_p + \frac{1}{\alpha}\beta^T u_p \qquad (20)$$
where r_p, u_p and y_p are the measured values of the input and output process variables. From the model output error
$$\Delta y = y_p - y\big|_{r=r_p,\,u=u_p} = y_p - \alpha r_p - \beta^T u_p \qquad (21)$$
it can be seen that, for linear models, the model input and output errors are related as follows:
$$\Delta r = -\frac{1}{\alpha}\,\Delta y \qquad (22)$$
Eq. (22) shows that, for a time series of input and output measurements and the corresponding model computations, the dynamics of the input error are equal to the dynamics of the output error. However, a model with a small output error does not necessarily also have a small input error: the output error is scaled by 1/α, so the corresponding model parameter α determines the absolute input error of the model. In particular, if |α| is small, even a small output error becomes a large input error.
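Eqs. (19) to (22) can be checked numerically. The sketch below uses made-up values of α, β and the measurements (u_p includes the leading 1 for the constant term, as in the definition of u). It confirms that Δr = −Δy/α and that halving α doubles the input error for the same output error.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def model_output(alpha, beta, r, u):
    """Eq. (19): y = alpha * r + beta^T u, with u[0] = 1 for the constant."""
    return alpha * r + dot(beta, u)


def input_error(alpha, beta, r_p, u_p, y_p):
    """Eq. (20): measured input minus the input that reproduces y_p."""
    return r_p - (y_p - dot(beta, u_p)) / alpha


def output_error(alpha, beta, r_p, u_p, y_p):
    """Eq. (21): measured output minus model output at the measured inputs."""
    return y_p - model_output(alpha, beta, r_p, u_p)


# Hypothetical parameters and measurements.
alpha, beta = 0.5, [1.0, 0.2]
u_p, r_p, y_p = [1.0, 3.0], 2.0, 2.9

dr = input_error(alpha, beta, r_p, u_p, y_p)
dy = output_error(alpha, beta, r_p, u_p, y_p)
print(dr, -dy / alpha)  # both approximately -0.6, as Eq. (22) predicts

# Same output error, half the sensitivity: twice the input error.
dr_half = input_error(alpha / 2, beta, r_p, u_p,
                      model_output(alpha / 2, beta, r_p, u_p) + dy)
print(dr_half)  # approximately -1.2
```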
The input error associated with the seeds addition was
determined in accordance with (20) as a difference between
the actual and computed seeds addition, the latter determined
so that the model output is equal to measured D50, while all
the other model inputs are equal to measured values. The
computation is performed by first calculating the volume of
added seeds Vseeds from (18) and then determining the seeds
addition in accordance with (16).
Fig. 10. Measured and computed D50 for the EKSC model.
Fig. 11. Measured and computed seeds addition for the EKSC model.
Fig. 12. Variations of on-line adapted parameters of the EKSS model with indicated confidence limits.

Figs. 10 and 11 show the output and input error of the EKSC model, respectively. Fig. 10 shows that the EKSC model also agrees well with the process data: 77% of batches lie within 5% of the measured D50 and 97% within 10%. However, Fig. 11 shows that the EKSC model gives much worse results in relation to the input error: the calculated seeds addition is within 25% of the actual value for only 60% of batches. The poor performance with respect to the input error is caused by estimating the model parameters on regular production data. These data are not properly excited with respect to seeds addition, which results in an inadequate estimate of the corresponding model parameter.
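The effect of poor excitation on a parameter estimate can be illustrated with a one-regressor least-squares fit, whose slope standard error is s/√Sxx, where s is the residual standard deviation and Sxx the spread of the regressor around its mean. The sketch below uses synthetic data with an assumed true slope; the same noise is used in both cases and only the spread of the regressor differs between the "regular" and "experimental" data.

```python
import math
import random


def slope_and_se(x, y):
    """Least-squares slope of y on x (with intercept) and its standard error."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance estimate
    return slope, math.sqrt(s2 / sxx)


rng = random.Random(0)
noise = [rng.gauss(0.0, 0.05) for _ in range(40)]
true_slope = -0.3  # assumed, for illustration

# Regular production: the regressor barely moves. Experiments: it is varied deliberately.
x_regular = [1.0 + 0.01 * (i % 5) for i in range(40)]
x_experiment = [0.5 + 0.05 * i for i in range(40)]
y_regular = [2.0 + true_slope * x + e for x, e in zip(x_regular, noise)]
y_experiment = [2.0 + true_slope * x + e for x, e in zip(x_experiment, noise)]

b_reg, se_reg = slope_and_se(x_regular, y_regular)
b_exp, se_exp = slope_and_se(x_experiment, y_experiment)
print(f"regular:    slope {b_reg:.3f} +/- {se_reg:.3f}")
print(f"experiment: slope {b_exp:.3f} +/- {se_exp:.3f}")
```

The regular-data slope comes with a standard error roughly forty times larger, so its estimate can land far from the true value even though the fit to the output looks acceptable.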
4.2.2. Experimental model with statistical choice of process variables (EKSS)
To obtain a more relevant model structure and parameters, the black-box model was also determined by multiple linear regression and identified on experimental data, which include the necessary excitation of the recipe parameters. Stepwise regression (Weisberg, 1985) was used to choose the independent variables most related to the output variable D50. At each step of the selection procedure, t-tests were performed to decide whether to add a variable to the model, delete a variable, or exchange two variables; the t values to enter or remove a variable from the model were chosen as 1 and 2.5, respectively. At each step, the model parameters were determined by the least-squares method. The obtained model is as follows:
$$D_{50} = \theta_0 + \theta_1 T_{min} + \theta_2 V_{seeds} + \theta_3 s \qquad (23)$$
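A compact sketch of t-test-based stepwise selection in plain Python. It is a simplified variant (forward addition and backward removal only; the exchange step mentioned above is omitted). The default thresholds are placeholders rather than the values quoted in the text, and `max_steps` is a hypothetical guard against the procedure cycling between adding and removing the same variable.

```python
import math
import random


def _inverse(m):
    """Inverse of a small square matrix by Gauss-Jordan with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_values(columns, y):
    """OLS with intercept; returns t-values of the regressors (intercept dropped)."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    inv = _inverse(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(rows, y))
    s2 = rss / (n - p)  # residual variance estimate
    return [coef[i] / math.sqrt(s2 * inv[i][i]) for i in range(1, p)]


def stepwise(data, y, t_enter=2.0, t_remove=1.5, max_steps=50):
    """data: dict name -> list of values. Returns the selected variable names."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the largest |t| if it exceeds t_enter.
        best, best_t = None, t_enter
        for name in data:
            if name not in selected:
                t = ols_t_values([data[v] for v in selected + [name]], y)
                if abs(t[-1]) > best_t:
                    best, best_t = name, abs(t[-1])
        if best is not None:
            selected.append(best)
            changed = True
        # Backward step: drop the weakest selected variable if |t| < t_remove.
        if selected:
            t = ols_t_values([data[v] for v in selected], y)
            weakest = min(range(len(selected)), key=lambda i: abs(t[i]))
            if abs(t[weakest]) < t_remove:
                selected.pop(weakest)
                changed = True
        if not changed:
            break
    return selected


# Synthetic example: y depends on x1 and x2 only; x3 is irrelevant.
rng = random.Random(1)
n = 60
x1 = [rng.uniform(0, 1) for _ in range(n)]
x2 = [rng.uniform(0, 1) for _ in range(n)]
x3 = [rng.uniform(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * a - 1.5 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
sel = stepwise({"x1": x1, "x2": x2, "x3": x3}, y)
print(sel)
```

Solving the normal equations directly is fine for a handful of regressors; with many correlated candidates a QR-based solver would be numerically safer.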
Testing of model (23) on data from regular production did not give a satisfactory fit: the average model output error was about 25% larger than that of the SDM model. Hence, on-line parameter adaptation was considered as a test of whether on-line adjustment of the model parameters could improve model performance.
Model adjustment is used to compensate for an approximate model structure and to take time-varying process characteristics into account. It can also help in model validation, as explained in the description of the distortion method in Section 2. With adaptation of the model parameters, the model fits the measured output data better. If this requires considerable changes of the model parameters, the model is not of good quality; if, on the contrary, the parameter values vary only within certain limits, the model can be qualified as a good one.
Fig. 12 shows the values of the on-line adapted EKSS model parameters for a set of batches from regular production. The straight dashed lines indicate the parameters' confidence intervals, which were determined for each parameter from t statistics (Weisberg, 1985), while the parameter values themselves were initially determined on a set of experimental batches. A 95% confidence interval was used to define the constraints on the parameter values.
From Fig. 12 it can be seen that during regular production the model parameters mainly vary within the specified range, except for the parameter associated with the volume of added seeds, whose value falls far below the lower limit. This again confirms that regular plant data are not appropriate for estimating this model parameter.
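The check illustrated in Fig. 12 amounts to comparing each adapted parameter with an interval θ ± t_crit·se(θ) from the initial (experimental) fit. A minimal sketch, assuming the critical t value is supplied by the caller (for a 95% two-sided interval it depends on the residual degrees of freedom, which are not given here); the function names, parameter values and standard errors are hypothetical.

```python
def confidence_bounds(theta, se, t_crit):
    """Two-sided interval theta_i +/- t_crit * se_i for each parameter."""
    return [(th - t_crit * s, th + t_crit * s) for th, s in zip(theta, se)]


def out_of_bounds(history, bounds):
    """Map parameter index -> batch indices where the adapted value left its interval."""
    flags = {}
    for k, params in enumerate(history):
        for i, (p, (lo, hi)) in enumerate(zip(params, bounds)):
            if not lo <= p <= hi:
                flags.setdefault(i, []).append(k)
    return flags


def clip_to_bounds(params, bounds):
    """Constrain adapted parameters to their confidence intervals."""
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(params, bounds)]


# Hypothetical initial estimates, standard errors and adapted values per batch.
bounds = confidence_bounds([1.0, -0.5], [0.1, 0.05], t_crit=2.0)
history = [[1.0, -0.5], [1.1, -0.7], [1.3, -0.45]]
print(out_of_bounds(history, bounds))  # {1: [1], 0: [2]}
print(clip_to_bounds(history[1], bounds))
```

A parameter that is flagged repeatedly in the same direction, as the seeds-volume parameter in Fig. 12, points to a data or structure problem rather than to random variation.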
It should also be noted that the input variables in (23) do not include the concentrations of the input solution. Therefore, the model could not be applied in flexible recipe control, where initial deviations of the input concentrations should be compensated for by changing the seeds addition. The input solution concentrations were excluded by the statistical choice of independent variables because of the way the experimental batches were excited: in this set of batches, the excitation of some input variables is high compared with the variations of the input solution concentrations, so the latter were not found statistically significant enough to be correlated with the process output. As already mentioned, the input solution concentrations are uncontrollable and could not be purposely changed during the experiments.
4.2.3. Experimental model with on-line structure identification (EKSO)
In view of the above deficiencies of the available process data, the experimental black-box model was also designed so as to exploit the beneficial qualities of both groups of data:
• experimental batches, where high excitation of some input variables (especially seeds) ensures model validity in a wider operating region, and
• normal plant operation, where regular plant excitation (especially variations of the input solution) affects the process output and provides the information necessary for using the model in flexible recipe control.
The procedure for constructing a model with the above properties is shown in Fig. 13. The left part of the scheme shows that the design of the model structure starts with the experimental batches, from which a basic model structure is identified; it includes the statistically important model inputs selected on the experimental batches (23). The model structure is then completed with additional input variables that are statistically determined from regular data. The selection of additional input variables is repeated every 100 batches.
The parameters of this model are computed from regular plant data and are adjusted on-line every 100 batches, except for:
• the model parameter associated with the volume of added seeds, which is kept at a constant value determined from experimental batches;
• the model constant, which is optimised on-line over the last four batches.
Fig. 13. Procedure for constructing the EKSO model.

The first exception is made because the volume of added seeds is the only input of the basic model structure whose excitation in experimental batches differs from (is higher than) that in regular batches. The second exception was determined experimentally and can be explained by the dynamics of the input solution variations. Although the batches of the hydrolysis process are performed independently of one another, successive batches show some similarity in operating conditions because the input solution is of similar quality. The input solution is prepared in preceding production processes, and its characteristics change with the dynamics of these processes. While the influence of measured input parameters on the hydrolysis process can be represented directly in the hydrolysis process model, the influence of dynamic changes of unmeasured parameters can only be taken into account by adjusting the constant model parameter. Simulation showed that the optimal number of batches for adjusting the constant model parameter is four.
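If the model constant is chosen to minimise the squared output error over the last N batches while all other parameters stay fixed, the optimum has a closed form: minimising Σ_k (y_k − θ0 − g_k)², where g_k is the rest of the model output for batch k, gives θ0 = (1/N) Σ_k (y_k − g_k), the mean residual. The least-squares criterion is an assumption (the text only says the constant is optimised); N = 4 follows the text, and the class name is hypothetical.

```python
from collections import deque


class ConstantAdapter:
    """Keeps the model constant at its least-squares optimum over the last
    `window` batches: the mean of (measured D50 - rest of the model output)."""

    def __init__(self, window=4, initial=0.0):
        self.residuals = deque(maxlen=window)  # oldest batch is dropped automatically
        self.theta0 = initial

    def update(self, y_measured, rest_of_model):
        """Record one finished batch and return the re-optimised constant."""
        self.residuals.append(y_measured - rest_of_model)
        self.theta0 = sum(self.residuals) / len(self.residuals)
        return self.theta0


adapter = ConstantAdapter(window=4)
# Hypothetical (measured D50, model output without the constant) pairs.
for y, g in [(2.1, 0.5), (2.0, 0.5), (2.2, 0.5), (2.3, 0.5), (2.1, 0.4)]:
    theta0 = adapter.update(y, g)
print(round(theta0, 3))  # mean of the last four residuals
```

A short window lets the constant follow slow drifts of the unmeasured input-solution properties, at the cost of passing more batch-to-batch noise into the model.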
The structure and parameters of the EKSO model, as determined on the available set of regular plant data, are given in Table 3, except for the constant model parameter, which is adjusted on-line. Figs. 14 and 15 show the results in relation to the model output and input error, respectively. Good results have been obtained for both: 84% of batches lie within 5% of the measured D50 and 97% within 10%, and for 91% of batches the calculated seeds addition is within 25% of the actual value.
Table 3
Structure and parameters of the EKSO model (input variables and associated parameter values for each block of successive batches)

Successive batches   Input variable: associated parameter value
1–100                Vseeds: 7.999 × 10^-4; Tmin: 4.22 × 10^-3; s: 8.95 × 10^-3; CTiO2: 7.33 × 10^-3; Tmax: 5.85 × 10^-2; w: 2.02 × 10^-4
101–200              Vseeds: 7.999 × 10^-4; Tmin: 5.86 × 10^-2; s: 4.37 × 10^-2; AA: 3.733 × 10^-3; Tmax: 7.64 × 10^-2; Cseeds: 2.37 × 10^-2
201–300              Vseeds: 7.999 × 10^-4; Tmin: 5.22 × 10^-2; s: 4.37 × 10^-2; Cseeds: 1.918 × 10^-2; w: 1.352 × 10^-5
301–400              Vseeds: 7.999 × 10^-4; Tmin: 2.47 × 10^-2; s: 3.05 × 10^-2; AA: 1.077 × 10^-2; Cseeds: 2.304 × 10^-2; CTiO2: 1.67 × 10^-2

To test the sensitivity of the obtained model to the intended control variable, i.e. seeds addition, a simulation experiment was also performed in which the seeds addition was computed for batches from normal production with the aim of achieving the output quality parameter D50 equal to 2.15, the desired production goal. Such an experiment makes it possible to check whether the model has an appropriate sensitivity to the intended control variable, and to verify that the low input error is not a result of low model sensitivity to seeds addition. Because of the static linear model structure of the form (19), the optimal value of the control variable r_o for a desired value of the process output y_o can be determined analytically:
$$r_o = \frac{1}{\alpha} y_o - \frac{1}{\alpha}\beta^T u_p \qquad (24)$$
where y_o is in our case the desired value of D50, equal to 2.15, r_o is the calculated optimal volume of added seeds, from which the optimal seeds addition is determined in accordance with (16), and u_p is the vector of the remaining model inputs, as presented in Table 3.
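Eq. (24) is a one-line computation once α, β and the measured remaining inputs are known. The sketch below, with hypothetical numbers rather than the Table 3 values, computes r_o and checks that substituting it back into Eq. (19) reproduces the target. Note that dr_o/dy_o = 1/α, so α directly sets how strongly the control variable reacts to a change of target. Converting r_o into seeds addition via Eq. (16) is not shown, because that equation lies outside this section.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def optimal_input(alpha, beta, u_p, y_target):
    """Eq. (24): r_o = (y_o - beta^T u_p) / alpha."""
    return (y_target - dot(beta, u_p)) / alpha


def model_output(alpha, beta, r, u):
    """Eq. (19): y = alpha * r + beta^T u (u[0] = 1 for the constant term)."""
    return alpha * r + dot(beta, u)


# Hypothetical parameters and measured remaining inputs for one batch.
alpha, beta, u_p = 0.5, [1.0, 0.2], [1.0, 3.0]
r_o = optimal_input(alpha, beta, u_p, y_target=2.15)
print(r_o)  # approximately 1.1
print(model_output(alpha, beta, r_o, u_p))  # approximately 2.15
```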
The obtained results are shown in Fig. 16. The figure shows that the model has the sensitivity to seeds addition required for flexible recipe control, and that the adjustment of seeds addition is in the range expected from expert knowledge.
5. Discussion
In addition to visual comparison of the process and model behaviour, the models were also validated and mutually compared by computing the quantitative measures RMSE (1), TIC (2) and REL (3) explained in Section 2. These measures were used to evaluate the output and input error of the different model structures; their values are shown in Tables 4 and 5 for the output and input errors, respectively, with the best value in each column marked by an asterisk. As explained in the description of these criteria, lower values mean better model quality. The values of the RMSE criterion are also shown graphically in Fig. 17.

Table 4
Model output errors
Model   RMSE      TIC       REL
SDM     0.1061    0.0255    0.0504
EKSC    0.0899*   0.0216*   0.0427*
EKSO    0.0922    0.0225    0.0443

Table 5
Model input errors
Model   RMSE      TIC       REL
SDM     0.1042    0.0967    0.1909
EKSC    0.1657    0.1530    0.3040
EKSO    0.0934*   0.0873*   0.1702*

Fig. 14. Measured and computed D50 for the EKSO model.
Fig. 15. Measured and computed seeds addition for the EKSO model.
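Eqs. (1) to (3) are defined in Section 2 and are not reproduced here. As an assumption, the sketch below uses the common form of Theil's inequality coefficient, RMSE / (RMS(y) + RMS(ŷ)), and omits REL because its definition is not available in this section. As a rough plausibility check only, RMSE = 0.1061 with outputs around the mean D50 of 2.07 gives about 0.1061 / (2 × 2.07) ≈ 0.026 with this form, close to the SDM entry in Table 4.

```python
import math


def rmse(y, y_hat):
    """Root-mean-square error between measured y and model y_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def _rms(v):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def theil_tic(y, y_hat):
    """Theil's inequality coefficient in its common form:
    RMSE / (RMS(y) + RMS(y_hat)); 0 is a perfect fit, 1 the worst case."""
    return rmse(y, y_hat) / (_rms(y) + _rms(y_hat))


# Hypothetical measured and modelled D50 values.
y = [2.0, 2.1, 2.2]
y_hat = [2.1, 2.0, 2.2]
print(round(rmse(y, y_hat), 4), round(theil_tic(y, y_hat), 4))
```

Because TIC is normalised by the signal level, it allows the output error (D50) and the input error (seeds addition) to be compared on a common scale, whereas RMSE keeps the units of each variable.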
The different measures rank the models in the same way, i.e. the model with the lowest RMSE value also has the lowest TIC and REL values. The tables show that all the models are of approximately the same quality in relation to the output error (see also Fig. 17). The best performance in relation to the output error is observed for the EKSC model. This model is, however, not acceptable for the intended use because of its high input error. The SDM and EKSO models have low input and output errors and are the most appropriate for use in flexible recipe control. Hence, the same conclusions were reached as from observing the model behaviour qualitatively.
Fig. 16. Optimal control signal for the EKSO model (D50 aim= 2.15).
Fig. 17. The values of RMSE criterion for input and output error of different models.
For use in an on-line control scheme, the SDM model is more demanding. Its implementation requires appropriate software for simulating differential equations, and optimisation routines for determining the optimal control signal. Optimisation can be time-consuming, although in this particular case it is not a time-critical task.
The implementation of the EKSO model is much simpler.
For this model the control signal is computed analytically.
Also, the identification of the EKSO model is simpler,
although the available data about the process in our case
requires us to perform the model design procedure very
carefully.
Besides their obvious advantages, the experimental models also have some drawbacks compared with the SDM model. When the SDM model was used on regular data, no problems were encountered regarding the sensitivity of the model to seeds addition, unlike with the experimental models. This is due to the dynamic part of the model, which ensures the appropriate model dependency on seeds addition despite a low excitation signal during normal production.
6. Conclusions
In this paper, the design of models to be used in the flexible recipe control scheme for the batch hydrolysis process is presented. The model design and testing procedure has shown that process models used for control purposes should be validated not only in relation to the output error, but also in relation to the input error and the optimal control signal. Only the three measures used together provide the model validation necessary for the intended model use. The output error shows whether the designed model represents the overall process behaviour accurately enough; the input error shows how accurately the process control variable is incorporated into the model; and the optimal control signal shows whether the model has the necessary (or expected) sensitivity to the process control variable.
In our particular case, the three measures enabled us to validate different model structures: the semi-empirical dynamic model and different black-box static models. Furthermore, they helped to improve the model design procedure, so that a simple and satisfactory experimental model was constructed despite a very limited data set for model identification. This model was designed by combining experimental and regular plant data, which together provided the excitation necessary for the statistical choice of appropriate model input variables and model parameters. Black-box identification has shown that observing only the model output error would in this case have resulted in a wrong model structure, unsuitable for the intended model use.
Two of the designed models are appropriate for use in flexible recipe control of the hydrolysis process: the semi-empirical dynamic model (SDM) and one of the experimental models (EKSO). Both perform satisfactorily in relation to input and output errors, and provide the necessary sensitivity to the intended control variable. Implementing the EKSO model in the on-line control scheme is simpler, as the optimal control signal is determined analytically, whereas the SDM model requires optimisation. On the other hand, owing to its more complex model structure, the SDM model is expected to be valid and more reliable in a wider operating region.
References
Bohlin, T. (1991). Interactive system identification: Prospects and pitfalls. Springer-Verlag.
Bossel, H. (1994). Modeling and simulation. Wiesbaden: Verlag Vieweg.
Cameron, R., Marcos, R. L., & de Prada, C. (1998). Model validation of discrete transfer functions using the distortion method. Mathematical and Computer Modelling of Dynamical Systems, 4, 58–72.
Goodwin, G. C. (2002). Inverse problems with constraints. In Proceedings of the 15th triennial world congress of the international federation of automatic control.
Gray, G. J., & von Grunhagen, W. (1998). An investigation of open-loop and inverse simulation as nonlinear model validation tools for helicopter flight mechanics. Mathematical and Computer Modelling of Dynamical Systems, 4, 32–57.
Kheir, N. A. (1988). Systems modelling and computer simulation. New York: Marcel Dekker.
Ljung, L. (1999). System identification. Englewood Cliffs: Prentice-Hall, Inc.
Murray-Smith, D. J. (1995). Continuous system simulation. London: Chapman & Hall.
Murray-Smith, D. J. (1998). Methods for the external validation of continuous system simulation models: A review. Mathematical and Computer Modelling of Dynamical Systems, 4, 5–31.
Murray-Smith, D. J. (2000). The inverse simulation approach: A focused review of methods and applications. Mathematics and Computers in Simulation, 53, 239–247.
Neelamkavil, F. (1987). Computer simulation and modelling. Chichester: John Wiley & Sons.
Qureshi, M. E., Harrison, S. R., & Wegener, M. K. (1999). Validation of multicriteria analysis models. Agricultural Systems, 62, 105–116.
Rijnsdorp, J. E. (1991). Integrated process control and automation. Amsterdam: Elsevier Science.
Sage, A. P. (1992). Validation. In D. P. Atherton & P. Borne (Eds.), Concise encyclopaedia of modelling and simulation (pp. 477–488). Oxford: Pergamon Press.
Santacesaria, E., Tonello, M., Storth, G., Pace, R. C., & Carra, S. (1986). Kinetics of titanium dioxide precipitation by thermal hydrolysis. Journal of Colloid and Interface Science, 11(1), 44–53.
Sel, D., Hvala, N., Strmcnik, S., Milanic, S., & Suk-Lubej, B. (1999). Experimental testing of flexible recipe control based on a hybrid model. Control Engineering Practice, 7(10), 1191–1208.
Thiel, H. (1970). Economic forecasting and policy. Amsterdam: North-Holland.
Verwater-Lukszo, Z. (1998). A practical approach to recipe improvement and optimization in the batch processing industry. Computers in Industry, 36, 279–300.
Weisberg, S. (1985). Applied linear regression. New York: John Wiley & Sons.
Zhou, X. (1993). A new method with high confidence for validation of computer simulation models for flight systems. Chinese Journal of Systems Engineering and Electronics, 4, 43–52.
Zele, M., Juricic, ., Strmcnik, S., & Matko, D. (1998). A probabilistic measure for model purposiveness in identification for control. International Journal of Systems Science, 29, 653–662.