Model Validation on Selection of Process Models


Computers and Chemical Engineering 29 (2005) 1507–1522

Influence of model validation on proper selection of process models: an industrial case study

N. Hvala a,*, S. Strmcnik a, D. Sel a, S. Milanic b, B. Banko b

a J. Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia

b Faculty of Electrical Engineering, Trzaska c. 25, 1000 Ljubljana, Slovenia

Received 26 July 2000; received in revised form 30 November 2004; accepted 30 November 2004

Available online 7 March 2005

Abstract

This paper considers the design and validation of a model of an industrial batch process in TiO2 production. The model will be used for

flexible recipe control, which is a model-based approach used in control and optimisation of batch processes. Because of insufficient knowledge

and a lack of proper data, different process models were developed: a semi-empirical dynamic model based on chemical kinetics laws, and

several experimental black-box models. In the paper the models are validated and mutually compared. Validation of models has shown

that besides comparing the model and the process output behaviour, additional measures considering also the model input error should be

introduced for proper model validation related to the model use. In our case, introducing additional measures also contributed to improvement

of the model design procedure, so that a simple yet satisfactory black-box model was obtained despite a small amount of process data.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Model validation; Model input error; Model-based control; Optimisation; Flexible recipe control; Hydrolysis batch process

1. Introduction

Mathematical modelling is a pervasive methodology,

which represents an important and steadily increasing part of

almost every field of science and engineering. The range of

problems addressed by using mathematical models is practi-

cally unlimited, therefore a vast amount of methods and tools

have been developed trying to make the process of modelling

simpler and more efficient. In spite of these endeavours, mathematical modelling is far from becoming routine work. On the contrary, it is still considered to be more art than science. The reason lies in the fact that modelling is an iterative design

procedure in which human judgement and creativity play a

decisive role.

* Corresponding author. Tel.: +386 1 4773 900; fax: +386 1 4773 994.

E-mail address: [email protected] (N. Hvala).

One of the most important steps in this procedure is the evaluation of the model. Model evaluation generally consists of two stages: verification and validation. Verification concerns the consistency and accuracy of simulation programs

compared with the associated mathematical models, while

model validation concerns the level of agreement between

mathematical descriptions and the real system under investi-

gation (Murray-Smith, 1998). In the literature on modelling

and simulation it is generally accepted that model validation

is a crucial part of the model development procedure. Without

validation a model is of very little use (Neelamkavil, 1987).

However, it is also stated that in reality, model validation is treated in a superficial way and does not form a central element of the modelling process. As stated by Murray-Smith (1995), most application papers in journals and conference proceedings pass over questions of model validation in a superficial fashion or make no mention of it at all.

In practice, validation is usually reduced to checking the

agreement between outputs of the model and those of the

real system (Qureshi, Harrison, & Wegener, 1999). This is

especially true in the development of industrial process mod-

els, where process complexity, insufficient knowledge about

the process and a lack of proper data lead to this kind of

0098-1354/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

doi:10.1016/j.compchemeng.2004.11.013



simplifications. However, the main issue in the design of models for engineering purposes is to design a model that is

valid for the intended purpose. Therefore, the primary aim of

the model validation should be to find out whether the model

is good enough for the intended use. This task requires a

much more comprehensive validation approach in which additional methods are used to build up confidence in the considered model.

The aim of the paper is to show that using a more complete validation approach may lead to decisions in the model design procedure that differ from those obtained on the basis of a simple evaluation of the output fit. In our case

the main additional issue in the applied validation process

is using the concept of the inverse model and the related in-

put error, which provide complementary information about

the model usability. The need for such an approach was encountered during the design of a model of an industrial batch process in TiO2 production. The model is intended for use within flexible recipe control (Rijnsdorp, 1991), which is a model-based approach used in control and optimisation of batch processes. Different models were developed for this

process, i.e. a semi-empirical model based on chemical kinet-

ics laws, and several experimental (black-box) models. The

models were designed based on different sources of process

information, i.e. process knowledge and available process

data. All the models exhibit relatively similar quality when judged by their agreement with process output data. The paper shows that only after the models were also validated in relation to their use, i.e. by evaluating the model input error and performing a sensitivity analysis of the control variable, was it possible to distinguish between the performances of the different model structures and to direct the model design in the proper way.

The paper is organized as follows. After a short overview

of model validation methods, it describes the industrial pro-

cess of batch hydrolysis and presents the concept of flexible

recipes to be applied for control of this process. The main em-

phasis is then placed on the design and validation of different

process models to be used in the control scheme. The models

are evaluated and mutually compared in relation to different

performance measures. The obtained results are then com-

mented on in the discussion.

2. Model validation

The main concepts of model validation can be found in

most textbooks on modelling and simulation (e.g. Bossel,

1994; Kheir, 1988; Murray-Smith, 1995; Neelamkavil,

1987), while a survey of model validation methods has been

presented by Murray-Smith (1998). Although the available

concepts and methods are rather general, one has to be aware

that model validation is always problem dependent (Ljung,

1999; Qureshi et al., 1999).

Generally speaking, the quality of the model can be judged

with respect to several features. The most important ones

are model purposiveness, model falseness and model plau-

sibility (Bohlin, 1991; Sage, 1992; Zele, Juricic, Strmcnik,

& Matko, 1998). Purposiveness (usefulness) tells whether a

model satisfies its purpose. Falseness is related to agreement

with measurements (data) coming from the real system to be

modelled (a falsified model is one that is contradicted by data). Plausibility, also referred to as conceptual validity or face validity (Qureshi et al., 1999), expresses the conformity of the model with a priori knowledge about the process.

Below is a short summary of model validation concepts re-

lated to the above-defined features.

2.1. Model purposiveness

A model is always developed with a certain purpose, i.e.

with the aim of solving a certain problem. Therefore, the ultimate validation of the model is to test whether the problem that motivated the modelling exercise can be solved using the obtained model (Ljung, 1999). Testing model purposiveness may often be impossible, too expensive, time consuming, dangerous, etc. In some cases the mentioned problems can be

alleviated by testing the solution in a simulation environment

based on the process model. However, such an approach only

partially solves the problem and is still difficult to perform.

In practice, assessment of model plausibility and espe-

cially falsification is generally easier. Hence, it often hap-

pens that falsification is performed instead of validation of

model purposiveness. However, even if a model agrees with

the available data it may not be necessarily good enough for

a given purpose, and such an approach may lead to the design

of an inappropriate model. Unfortunately, the opposite might also be true: the model may not agree with the data and still be good enough to serve its purpose.

2.2. Model plausibility

Assessment of model plausibility is tightly related to ex-

pert judgement of whether the model is good or not. The

level of plausibility, or better said the expert opinion about it,

is basically related to two features of the model.

The first one considers the question whether the model

looks logical. This question concerns characteristics of

the model structure (type of equations, connections between

equations, etc.) and its parameters (gains, time constants,

signs, etc.), and is relevant when the model is derived from

first principles. If the structure and the parameters are feasible, which means comparable to what experts know about the real process, then the confidence in the model is greater.

The second one is related to the question whether the

model behaves logically. This part concerns assessment

of the reaction of the model outputs (dynamics, shape, etc.)

to typical events (scenarios) on the inputs. If the model reacts in different situations in accordance with the expectations of the experts, then again our confidence in its validity is increased. Note that for black-box models this is the only way to assess the plausibility.



Fig. 1. Computation of model output error (P, process; M, model; u, process input; n, noise; y, process output; ym, model output; eo, output error).

2.3. Model falseness

As already mentioned, falsification is the most widely used approach to the validation of models and is related to direct comparison of input-output data from the model and from the real system. However, even within this validation area the methods differ substantially in the principles applied. The basic distinction concerns what is compared and how it is compared.

2.3.1. Comparison of outputs

Comparison of model and process outputs is a standard

approach, which does not need explanation. For the sake of completeness and to enable a comparison with other approaches, let us just point out that the issue under consideration here is the difference between the process and the model output, which is referred to as the output error. It is computed as shown in Fig. 1.

2.3.2. Comparison of inputs

In some applications, differences that appear in the input variables may highlight deficiencies in the model more

readily than conventional comparisons based on output vari-

ables (Murray-Smith, 2000). This is especially important in

the case of control or optimisation problems for processes

with multiple inputs. It is known that different combinations

of values of input variables may result in the same (or very

similar) values of the output variables. Thus, the same output

error could be produced by different input combinations. For

that purpose, the input error is determined as another rele-

vant measure for model validation (Gray & von Grunhagen,

1998). Input error is the difference between the process in-

put and the output of the inverse model, the latter computed

when the measured process output is applied as the input

to the inverse model. The computation of the input error is

schematically shown in Fig. 2.

The basic problem concerning this kind of validation is

apparently related to the concept of the inverse model. This

concept has been extensively used in the areas of signal pro-

cessing, telecommunication, and control (Goodwin, 2002),

where also methods for derivation and use of inverse models

have been developed. Basically two approaches are possible:

first, the explicit analytical derivation, which is limited to

simple, mostly linear models, and second, the more general implicit simulation and optimisation technique, which does not provide the inverse model itself, but provides the solution of the inverse problem.

Fig. 2. Computation of model input error (P, process; M^-1, inverse model; u, process input; n, noise; y, process output; um, inverse model output; ei, input error).
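The implicit route to the input error can be illustrated with a minimal sketch. It assumes a scalar model that is continuous and monotonic in the input being inverted, so that a simple bisection search solves the inverse problem. The function names, the `toy_model` and all numbers are hypothetical and are not taken from the case study.

```python
def invert_model(model, y_measured, u_low, u_high, tol=1e-9, max_iter=200):
    """Solve model(u) = y_measured for u by bisection on [u_low, u_high].

    Assumption: the model is continuous and monotonic on the interval,
    so the measured output is reached for exactly one input value.
    """
    f_low = model(u_low) - y_measured
    f_high = model(u_high) - y_measured
    if f_low * f_high > 0:
        raise ValueError("measured output is not bracketed by the interval")
    for _ in range(max_iter):
        u_mid = 0.5 * (u_low + u_high)
        f_mid = model(u_mid) - y_measured
        if abs(f_mid) < tol or (u_high - u_low) < tol:
            return u_mid
        if f_low * f_mid <= 0:
            # The solution lies in the lower half of the interval.
            u_high, f_high = u_mid, f_mid
        else:
            # The solution lies in the upper half of the interval.
            u_low, f_low = u_mid, f_mid
    return 0.5 * (u_low + u_high)


def input_error(model, u_process, y_measured, u_low, u_high):
    """Input error e_i = u - u_m, where u_m reproduces the measured output."""
    u_m = invert_model(model, y_measured, u_low, u_high)
    return u_process - u_m


def toy_model(u):
    # Hypothetical static model, used only to exercise the functions.
    return 3.0 - 0.5 * u


e_i = input_error(toy_model, u_process=1.5, y_measured=2.2,
                  u_low=0.0, u_high=4.0)
```

For models with several inputs, the same idea applies to one input at a time while the remaining inputs are held at their measured values.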

2.3.3. Comparison methods

The comparison of measured and simulated data can be performed qualitatively, quantitatively, or based on statistical methods (Murray-Smith, 1998).

Qualitative approaches involve plotting the process and

the corresponding model variables and performing visual in-

spection of differences.

Quantitative methods are based on performance measures

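that determine goodness of fit. In this paper, the root mean square error (RMSE), Theil's inequality coefficient (TIC) (Theil, 1970), and relative error (REL) are used, which are defined by the following equations:

\[ \mathrm{RMSE} = \sqrt{\frac{\sum_i (y_i - y_{m,i})^2}{N}} \qquad (1) \]

\[ \mathrm{TIC} = \frac{\sqrt{\sum_i (y_i - y_{m,i})^2}}{\sqrt{\sum_i y_i^2} + \sqrt{\sum_i y_{m,i}^2}} \qquad (2) \]

\[ \mathrm{REL} = \sqrt{\frac{\sum_i \left( (y_i - y_{m,i})^2 / y_i^2 \right)}{N}} \qquad (3) \]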

Thereby yi represent the measured data points, ym,i the computed data points and N the number of data points. Lower values of RMSE and REL indicate better agreement between the

measured and computed data. The value of TIC lies between

zero and unity, with values closer to zero indicating better

model validity. The measures are more appropriate for model

comparison than for use in an absolute sense, although Zhou

(1993), for example, suggested that values of TIC smaller

than 0.3 indicate good agreement with measured data.
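The three measures can be computed in a few lines. The sketch below assumes the square-root forms of Eqs. (1)-(3) and measured values that are all non-zero (REL divides by them). The function names are illustrative only.

```python
import math


def rmse(y, ym):
    """Root mean square error, Eq. (1)."""
    n = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, ym)) / n)


def tic(y, ym):
    """Theil's inequality coefficient, Eq. (2); lies between 0 and 1."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, ym)))
    den = math.sqrt(sum(a * a for a in y)) + math.sqrt(sum(b * b for b in ym))
    return num / den


def rel(y, ym):
    """Relative error, Eq. (3); every measured value must be non-zero."""
    n = len(y)
    return math.sqrt(sum(((a - b) / a) ** 2 for a, b in zip(y, ym)) / n)
```

Because REL scales each residual by the measured value, it is the only one of the three that is insensitive to the magnitude of the measured variable.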

Statistical methods apply to the comparison of the distri-

bution of the data rather than to point-by-point comparisons.

They include descriptive statistics, which deals with means,

variances, correlations, etc., and inferential statistics, which considers hypothesis tests and confidence intervals (Qureshi et al., 1999). A widely used on-line validation approach is stepwise regression, where the selection of the model structure is based on the correlation coefficient R and the F-ratio. R

gives a measure of the accuracy of the fit, while the F-ratio

provides a measure of the confidence that can be ascribed to

this fit.

2.4. Other validation methods

There are also validation methods, which cannot be di-

rectly assigned to one of the considered three groups.

One such approach is sensitivity analysis. Sensitivity analysis examines the extent of variation in predicted performance when parameters are systematically varied over some range of interest (Qureshi et al., 1999). It provides insight into the stability of the model and indicates priorities for its refinement.
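A one-at-a-time sensitivity analysis can be sketched as follows. This is only one of several possible formulations: it assumes a model that takes a dictionary of parameters, perturbs each parameter by a small relative step and reports the normalised sensitivity (relative output change per relative parameter change). The `toy_model` and its values are hypothetical.

```python
def sensitivities(model, nominal, rel_step=0.01):
    """Normalised one-at-a-time sensitivities (dy/y) / (dx/x).

    Uses central differences around the nominal point; nominal values
    and the nominal model output must be non-zero.
    """
    y0 = model(nominal)
    result = {}
    for name, x0 in nominal.items():
        dx = rel_step * x0
        up = dict(nominal)
        up[name] = x0 + dx
        down = dict(nominal)
        down[name] = x0 - dx
        dy = (model(up) - model(down)) / 2.0
        result[name] = (dy / y0) / (dx / x0)
    return result


def toy_model(p):
    # Hypothetical model: linear in "gain", quadratic in "u".
    return p["gain"] * p["u"] ** 2


s = sensitivities(toy_model, {"gain": 2.0, "u": 3.0})
```

In this toy case the output is twice as sensitive to `u` as to `gain`, which is the kind of ranking that can then be compared with expert expectations.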

A similar but complementary approach is known as the distortion method (Cameron, Marcos, & de Prada, 1998; Murray-Smith, 1998). The approach is based on the assumption that any model output can be made to follow a measured variable by distorting the model parameters as a function of time. The better the model, the less distortion is required. If the distortion needed to match the process response lies within acceptable limits, the model is judged satisfactory.

Finally, there are some methods that are related to data analysis or data validity. One of the best known is based on system identification techniques and uses the concept of identifiability (Murray-Smith, 1998). Unique and reliable estimates of the parameters of a model of given structure can be obtained only if that model is identifiable. Structural unidentifiability occurs when a model has too many parameters to allow all of them to be estimated separately from the measured variables. In that case, certain parameters can only be estimated in combination with other parameters.

Numerical identifiability is associated with measurement er-

ror, process noise or inadequate information in the measured

data. It is especially important in the design of experiments;

they should be designed so as to obtain data about the process

that is rich in information.

The above listed methods represent possible approaches to model validation, but in practice they are still not fully exploited for ascertaining model quality. This paper shows how

some of these methods can be used in practice for the evalu-

ation and improvement of the process model.

3. Problem description

3.1. Process description

The problem addressed in this paper is the modelling of the hydrolysis process, which is one of 18 successive processes in

the production of TiO2 pigment. Hydrolysis runs in a batch

reactor (Fig. 3) where TiO2gel is formed by precipitation out

of TiOSO4 solution. The reaction occurs by adding water

and seeds, and maintaining the solution at boiling point for

an appropriate amount of time.

Fig. 3. Batch hydrolysis reactor.

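The reaction is described by the following chemical equations:

\[ \mathrm{TiOSO_4 + H_2O \rightleftharpoons TiO_2 + H_2SO_4} \qquad (4) \]

\[ \mathrm{TiO_2} \xrightarrow{\text{nucleation}} \mathrm{seeds} \qquad (5) \]

\[ \mathrm{TiO_2 + seeds \rightarrow TiO_{2gel}} \qquad (6) \]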

TiO2 is an intermediate product. In the nucleation process it

is transformed into seeds (5), or else it is precipitated onto

already formed seeds (6). Additional seeds are added to the

solution during the heating phase to accelerate precipitation.

The final product is the TiO2gel. H2SO4 is the hydrolysis by-

product.

Hydrolysis is one of the most crucial parts of the overall production process, since for the first time in the production

line particles (TiO2gel) are formed out of solution. The size

of particles has a strong influence on the quality of the final

product. Hence, the particles should be of the desired optimal size. Achieving this is not a straightforward task, since

the process inputs are subject to uncontrollable variations,

which are also reflected in the process output variations. The

size of particles in the TiO2gel is determined by the parameter

D50, which represents the diameter of floccules attributed to

the highest number of floccules. On-line control (during the batch) of particle size is not possible, since parameter D50 is measured only at the end of the batch, when a sample of

the final product is examined by laboratory analysis.

3.2. Flexible recipe control

It is expected that a more uniform particle size between the

batches could be achieved by flexible recipe control, which is

a control concept introduced for batch processes (Rijnsdorp,

1991; Verwater-Lukszo, 1998). Unlike fixed recipes that pre-

scribe fixed recipe instructions for running a batch, flexible

recipe control adjusts the recipe parameters to account for

changes in input and operating conditions of a current batch.

In this way, final product quantity and/or quality could be retained despite changing conditions in performing a batch. Adjustment of recipe parameters is done by optimisation (Fig. 4) where a performance index representing the economic aspects of process operation (e.g. production yield, product quality, energy consumption, etc.) is optimised by varying the recipe parameters. The value of the performance index at suggested values of recipe parameters is computed by simulating the process behaviour using the process model.

Fig. 4. Flexible recipe control: optimisation of recipe parameters with the aid of a process model.

In the case of the hydrolysis process, flexible recipe control is applied for control of quality parameter D50, which is the process output variable. Variations of input concentrations are the main source of input variations, while seeds addition can be chosen as an adjustable recipe item (control variable). The construction of the process model used in flexible recipe control is described in the next section. Potential model inputs affecting D50 are active acid concentration (AA), free acid concentration (AP) and TiO2 concentration (CTiO2), all in the input solution, concentration (Cseeds) and volume (Vseeds) of added seeds, temperature during the reaction, flow of added water (w), and steam flow after boiling (s).

Control of the hydrolysis process using flexible recipe control will be performed by calculating the optimal seeds addition for each batch with the aim of producing particles of the desired optimal size. This calculation will be performed at the start of each batch and will be based on the most recently measured concentrations of the input solution, while the other operating parameters, e.g. temperature, addition of water, etc., will be set equal to those of the preceding batch.
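The per-batch calculation can be sketched as a one-dimensional optimisation. The sketch assumes a squared deviation of the predicted D50 from its target as the performance index and a golden-section search over an admissible range of seeds addition; both choices are illustrative. The `toy_d50_model`, its coefficients, the input name `c_tio2` and the search bounds are hypothetical, while the target of 2.15 is the production aim quoted in this paper.

```python
import math


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def optimal_seeds_addition(model, measured_inputs, d50_target, u_min, u_max):
    """Seeds addition that brings the predicted D50 closest to the target."""
    def performance_index(u_seeds):
        return (model(u_seeds, measured_inputs) - d50_target) ** 2
    return golden_section_min(performance_index, u_min, u_max)


def toy_d50_model(u_seeds, inputs):
    # Hypothetical static model; coefficients are made up for illustration.
    return 2.6 - 0.4 * u_seeds + 0.002 * (inputs["c_tio2"] - 200.0)


u_opt = optimal_seeds_addition(toy_d50_model, {"c_tio2": 210.0},
                               d50_target=2.15, u_min=0.5, u_max=2.0)
```

The measured input concentrations enter only through `measured_inputs`, which mirrors the idea that they are fixed for the current batch while the seeds addition is the free recipe item.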

4. Design of process models

As in many real world applications, the design of a process model for the hydrolysis process was very much constrained by the available knowledge and data about the process. Knowledge was insufficient due to the process complexity, where only some basic relations from chemical kinetics laws could be derived. On the other hand, data about the process was limited to two groups:

- a small set of experimental batches (23 batches) performed purposely for model identification;

- a set of 449 batches from regular production.

The batches from both groups were processed in one of

the industrial hydrolysis reactors.

In the experimental batch set, some of the recipe param-

eters (i.e. the seeds addition, the flow of added water) were

subject to considerable change in order to observe the consequent changes in the process output. The changes were made large enough to exceed the process noise and to obtain a model valid in a wide operational region. The input solution

concentrations are uncontrollable and varied in this set of

batches as in normal production.

The second set represented batches from normal produc-

tion. They were performed under fixed recipe control, so that

no special excitation of recipe parameters was imposed. Only

a small number of batches in this set were performed with a

change of seeds addition.

In the design of process models two types of models were developed: a semi-empirical dynamic model and several static (black-box) models.

4.1. Semi-empirical dynamic model (SDM)

A semi-empirical dynamic model (Sel, Hvala, Strmcnik, Milanic, & Suk-Lubej, 1999) was designed based on two types of knowledge: chemical kinetics laws and empirical knowledge of process experts. The model consists of two sub-models (Fig. 5):

- a dynamic model representing the curve of precipitated TiO2gel during the batch, and

- a static model mapping the precipitation rate curve into quality parameter D50.

4.1.1. Dynamic part of the SDM model

The structure of the dynamic model was based on chemical kinetics laws that describe dynamic relations between reactant concentrations (c) and the process reaction rate (r) during the batch. Reaction (precipitation) yield is expressed as a function of time, and is related to the reaction rate r(t) in the following way:

\[ \mathrm{yield}(t) = \int_0^t r(t)\,dt \qquad (7) \]

According to chemical kinetics laws, the reaction rate for the reaction A + B → C generally depends on the concentrations of reactants A and B, and is written as

\[ r(t) = k(T)\, c_A^{a}\, c_B^{b} \qquad (8) \]




Fig. 5. Structure of a semi-empirical dynamic model.

where a and b are empirically determined constants; k(T) is a specific reaction rate that is modelled by the Arrhenius equation

\[ k(T) = k_0\, e^{-E/T} \qquad (9) \]

where T represents temperature, and k0 and E are constants.

Hydrolysis is a reversible process. Therefore, the concentration of a particular component C depends on the reaction rate of producing (rp) and consuming (rc) this component, and is generally given as

\[ \frac{dc_C}{dt} = r_p - r_c \qquad (10) \]

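Following these laws, a set of differential equations was determined for each component in the hydrolysis process. Taking into account the chemical relations (4)-(6), the model equations are the following:

\[ r_{\mathrm{TiOSO_4}} = \frac{dc_{\mathrm{TiOSO_4}}}{dt} = -k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} + k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} \qquad (11) \]

\[ r_{\mathrm{H_2O}} = \frac{dc_{\mathrm{H_2O}}}{dt} = -k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} + k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} + h_2 u_{\mathrm{H_2O}} \qquad (12) \]

\[ r_{\mathrm{TiO_2}} = \frac{dc_{\mathrm{TiO_2}}}{dt} = k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} - (k_2 + k_3)\, c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} - k_4 c_{\mathrm{TiO_2}} c_{\mathrm{seeds}}^{h_1} + r_{\mathrm{gel}} \qquad (13) \]

\[ r_{\mathrm{H_2SO_4}} = \frac{dc_{\mathrm{H_2SO_4}}}{dt} = k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} - k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} \qquad (14) \]

\[ r_{\mathrm{TiO_{2gel}}} = \frac{dc_{\mathrm{TiO_{2gel}}}}{dt} = k_4 c_{\mathrm{TiO_2}} c_{\mathrm{seeds}}^{h_1} - r_{\mathrm{gel}} \qquad (15) \]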

where the following notation is used: r is the reaction rate of different components (indexes are used to distinguish between individual components presented in (4)-(6)), c the component concentrations, k the parameters modelled by the Arrhenius equation (9), h the constant model parameters, and rgel the gel activity (constant). uH2O is the addition of water, which is equal to the flow of indirectly added water s during the heating phase, and directly added water w during the cooking phase. Temperature is included in the model in the Arrhenius model parameters k. cseeds is the concentration of seeds in TiO2gel, and depends on the reaction rate of seeds formation in the nucleation process, as well as on the volume Vseeds and concentration Cseeds of added seeds. The latter two parameters are also combined into a technological parameter named seeds addition useeds, which is defined as the percentage of TiO2 mass in added seeds compared to the TiO2 mass in the initial solution and can be written as follows:

\[ u_{\mathrm{seeds}} = \frac{C_{\mathrm{seeds}} V_{\mathrm{seeds}}}{C_{\mathrm{TiO_2}} V_{\mathrm{sol}}} \cdot 100\% \qquad (16) \]

CTiO2 is the TiO2 concentration in the initial solution, and Vsol is the volume of the initial solution (equal for each batch).

Eqs. (11)-(15) were constructed with the assumption that all the chemical reactions are of the first order, i.e. all the exponential parameters are set to 1 (except h1). Initial values of the simulated process variables are determined from the initial solution concentrations. The result of the system of differential equations is the concentration of precipitated TiO2gel in time (cTiO2gel), also denoted as yield, which can be represented by the precipitation curve (Fig. 6). The total batch duration is 5 h, with the reaction lasting approximately 1.5 h, while the model time constant is around 1 h.
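How a precipitation curve follows from Eqs. (11)-(15) can be illustrated with a short simulation sketch. It makes several simplifying assumptions that are not part of the model described here: constant temperature, a constant seeds concentration, no water addition, and a plain explicit Euler scheme. The rate constants, initial concentrations and units are invented for illustration; only the exponents h1 and h2 reuse the values listed in Table 1.

```python
import math


def arrhenius(k0, E, T):
    """Specific reaction rate in the form of Eq. (9)."""
    return k0 * math.exp(-E / T)


def simulate(c0, k, h1, h2, r_gel, u_h2o, c_seeds, t_end, dt):
    """Explicit Euler integration of Eqs. (11)-(15).

    Assumptions (illustrative only): k, c_seeds and u_h2o are constant
    during the batch. Returns the yield history and final concentrations.
    """
    c = dict(c0)
    k1, k2, k3, k4 = k
    t = 0.0
    history = [(t, c["TiO2gel"])]
    for _ in range(int(round(t_end / dt))):
        forward = k1 * c["TiOSO4"] * c["H2O"]
        backward = k2 * c["TiO2"] * c["H2SO4"]
        nucleation = k3 * c["TiO2"] * c["H2SO4"]
        precipitation = k4 * c["TiO2"] * c_seeds ** h1
        rates = {
            "TiOSO4": -forward + backward,                                   # Eq. (11)
            "H2O": -forward + backward + h2 * u_h2o,                         # Eq. (12)
            "TiO2": forward - backward - nucleation - precipitation + r_gel, # Eq. (13)
            "H2SO4": forward - backward,                                     # Eq. (14)
            "TiO2gel": precipitation - r_gel,                                # Eq. (15)
        }
        for name in c:
            c[name] += dt * rates[name]
        t += dt
        history.append((t, c["TiO2gel"]))
    return history, c


# Hypothetical initial concentrations and rate constants (arbitrary units).
c0 = {"TiOSO4": 1.0, "H2O": 10.0, "TiO2": 0.0, "H2SO4": 0.5, "TiO2gel": 0.0}
k1 = arrhenius(50.0, 2000.0, 380.0)
k4 = arrhenius(400.0, 2000.0, 380.0)
history, final = simulate(c0, (k1, 0.0, 0.0, k4), h1=0.638, h2=0.037,
                          r_gel=0.0, u_h2o=0.0, c_seeds=0.1,
                          t_end=5.0, dt=0.001)
```

With k2, k3 and rgel set to zero, titanium is only moved from TiOSO4 through TiO2 into the gel, so the sum of the three concentrations stays constant, which gives a quick consistency check on the sign conventions.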

Parameters of the dynamic model were estimated using the experimental data set. For that purpose experimental batches were performed with additional laboratory measurements of precipitation yield at some pre-defined times during the batch. Parameter values were obtained by a simplex optimisation method, by which the difference between the measured and computed yield was minimised for a set of experimental batches. As an example, Fig. 6 shows the precipitation curve for comparison with experimental data. In the set of batches used for validation, the model predicts the final yield within 5% of the measured value for 78% of batches and within 10% for all the batches.

Fig. 6. Dynamic part of the SDM model: measured (*) and computed (-) yield during an experimental batch.

The performance of the model was also evaluated by the

sensitivity analysis of the model to changes of model input

variables. The model sensitivity was in accordance with the a

priori expert knowledge as well as with information found in

the literature (Santacesaria, Tonello, Storth, Pace, & Carra,

1986). The dynamic model is most sensitive to seeds addition and temperature. Besides these two inputs, the initial TiO2 concentration and active acid concentration also have an important influence on the precipitation curve.

4.1.2. Static part of the SDM model

The static part of the semi-empirical dynamic model was

designed in order to map the precipitation curve into out-

put quality parameter D50. The structure of the static model

could not be based on theoretical knowledge. Hence, it was

determined by trial and error, where different segments of

the precipitation curve were tested as potential model inputs.

Also, the seeds addition useeds was considered to have a direct

influence on the model output. In each step of the selection procedure, the chosen model structure was evaluated by the fit between the model and the process data. The model structure

finally chosen is the following:

\[ D_{50} = p_1 K_{23} + p_2 K_{23}^2 + p_3 K_{67} + p_4 K_{89} + p_5 u_{\mathrm{seeds}} + p_6\,\mathrm{RVM} + p_7\,\mathrm{RVM}^2 + p_8 K_{78}\,\mathrm{RVM}^2 + p_9 \qquad (17) \]

where K denote the slopes of different parts of the precipitation curve shown in Fig. 7 (e.g. K23 denotes the slope between the two points where the yield reaches the values of 20 and 30%), RVM is the maximum slope of the precipitation curve, and p denote the model parameters. The chosen segments represent the most informative parts related to the shape of the precipitation curve.

Fig. 7. Segments of the precipitation curve used as inputs in the static part of the SDM model.

Parameters p were determined by the least-squares

method. They were estimated on a selected set of batches

from normal production. This batch set was chosen in such a

way that an equal distribution of batches over the entire range

of measured process output was obtained. The final values of

the complete model parameters are shown in Table 1.

4.1.3. Validation of SDM model

Fig. 8 shows a comparison between the measured and computed D50 for the semi-empirical model validated on a set of regular batches. Two diagrams are presented. The upper diagram shows a chronological history of batches. For purposes

of better presentation the results are presented with lines, al-

though they do represent individual batches and should actu-

ally be represented by points. The lower diagram shows the

same results sorted according to ascending D50 (measured).

From the upper diagram it can be seen that the model predicts

relatively well the variations of the actual process output.

Seventy-two percent of the batches lie within 5% of the measured D50 and 94% of the batches within 10%. The standard deviation of the error between the model and the process is 0.106, which is approximately the same as the standard deviation of the actual process output (0.114). It can be expected that a model of this quality can be used to move the process output closer to the desired value (the production aim is to produce particles with D50 equal to 2.15, while the actual mean value is 2.07), while it cannot significantly reduce deviations around the desired value. Similar conclusions were also drawn from a more thorough analysis where 2-year plant data was used (Sel et al., 1999).

Table 1
Parameters of the SDM model

Dynamic model          Static model
Parameter  Value       Parameter  Value
k10        0.0174      p1         1.7066
k20        0           p2         1.379
k30        0           p3         1.9071
k40        0.0395      p4         0.2171
E1         777.12      p5         2.1169
E2         0           p6         0.8844
E3         0           p7         1.6041
E4         1879        p8         1.4882
h1         0.638       p9         2.2523
h2         0.037

Fig. 8. Measured and computed D50 for the SDM model.

The model was also evaluated in relation to the input error. The complete model is a multi-input single-output model, with one of the input variables (seeds addition) used as a control variable in the intended model-based control. In such a case it is especially interesting to evaluate the input error of the control variable, since the errors between the actual and calculated model output are directly reflected in the wrong calculation of the control variable when the model is used for control purposes. Observing the input error of the control variable in cases where different combinations of process inputs give approximately the same process output helps to evaluate whether the control variable is included in the model in a proper way. The input error is in our case determined as the difference between the actual and calculated seeds addition, the latter determined as the value at which the model output is equal to the measured process output. The SDM model has a nonlinear structure. Therefore, the volume of added seeds (and consequently the seeds addition, see Eq. (16)) is determined by optimisation so that the difference between the computed and measured D50 is minimised, while all the other model inputs are equal to measured values.

The input error of the SDM model is presented in Fig. 9.

For 79% of batches the calculated seeds addition is within

25% of the actual value.
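The inversion step described above can be illustrated with a small sketch. It is not the SDM model: `toy_model`, its coefficients and the search interval are hypothetical stand-ins chosen only to make the example runnable, and golden-section search is simply one possible scalar optimiser, since the text does not name the routine that was used.

```python
import math

def toy_model(seeds, other_inputs):
    """Hypothetical nonlinear static model (NOT the SDM model of the paper)."""
    return sum(other_inputs) + 1.5 * math.exp(-0.8 * seeds)

def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimise a unimodal scalar function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0

def input_error(model, seeds_actual, other_inputs, y_measured, lo, hi):
    """Actual seeds addition minus the value at which the model matches y_measured."""
    def cost(seeds):
        return (model(seeds, other_inputs) - y_measured) ** 2
    seeds_calc = golden_section_min(cost, lo, hi)
    return seeds_actual - seeds_calc

# Illustrative batch: the measured output corresponds to seeds = 1.0 in the toy model,
# while the batch was actually run with seeds = 1.2.
others = [1.0, 0.2]
y_meas = toy_model(1.0, others)
err = input_error(toy_model, 1.2, others, y_meas, 0.0, 5.0)
print(round(err, 6))
```

All other inputs are held at their measured values while only the seeds variable is searched, which mirrors the procedure in the text. The search assumes the squared mismatch has a single minimum on the chosen interval.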

4.2. Black-box linear regression models

The hydrolysis process was also modelled by black-box models, in which the output quality parameter D50 is determined as a static map from the process input variables. The main question was whether a simple static model structure can give results similar to those of the SDM model.

The same input variables were considered as potential model inputs as in the case of the semi-empirical dynamic model, except for the temperature: because of the static model structure, the temperature profile used in the SDM model was replaced by the minimum (Tmin) and maximum (Tmax) temperature. The potential regressors of the black-box model were all the mentioned input variables, the product Cseeds Vseeds, and their logarithm, squared and square root values.

4.2.1. Experimental model with complete model structure (EKSC)

A straightforward approach to model structure selection

is a choice where all independent variables are included in

the model. The structure of the model is as follows:

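$$D_{50} = \theta_0 + \theta_1 T_{\min} + \theta_2 V_{\mathrm{seeds}} + \theta_3 s + \theta_4 C_{\mathrm{seeds}} + \theta_5 w + \theta_6 T_{\max} + \theta_7 C_{\mathrm{TiO_2}} + \theta_8 AP + \theta_9 AA \tag{18}$$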




Fig. 9. Measured and computed seeds addition for the SDM model.

The values of model parameters were determined on a se-

lected set of batches from regular production as in the case of

the static part of the SDM model. Parameter values are given

in Table 2.

Table 2
Parameters of the EKSC model

$\theta_0$   7.5645
$\theta_1$   3.2951 × 10^-2
$\theta_2$   4.4447 × 10^-4
$\theta_3$   3.159 × 10^-2
$\theta_4$   2.5453 × 10^-2
$\theta_5$   1.7923 × 10^-3
$\theta_6$   1.9229 × 10^-2
$\theta_7$   7.8777 × 10^-3
$\theta_8$   2.0235 × 10^-4
$\theta_9$   2.1255 × 10^-3

Also in this case, the model was evaluated in relation to the input and output error. Because of the linear model structure, the input error related to the seeds addition could be determined analytically. For the static linear regression model of the form

$$y = \theta_r r + \theta^T u \tag{19}$$

where $r$ is the model input for which the input error is calculated, $\theta_r$ is the corresponding model parameter, $u$ is the $n$-vector of remaining model inputs ($u^T = [\,1 \;\; u_1 \;\; u_2 \;\; \cdots \;\; u_{n-1}\,]$), $\theta$ is an $n$-vector of corresponding model parameters, $y$ is the model output and $n$ is the number of all model inputs, the input error associated with the variable $r$ can be determined as

$$\Delta r = r_p - r\big|_{y=y_p,\,u=u_p} = r_p - \frac{1}{\theta_r}\, y_p + \frac{1}{\theta_r}\,\theta^T u_p \tag{20}$$

where $r_p$, $u_p$ and $y_p$ are the measured values of the input and output process variables. From the calculation of the model output error

$$\Delta y = y_p - y\big|_{r=r_p,\,u=u_p} = y_p - \theta_r r_p - \theta^T u_p \tag{21}$$

it can be seen that in the case of linear models, the model input and output error are interrelated in the following way

$$\Delta r = -\frac{1}{\theta_r}\,\Delta y \tag{22}$$

The above expression shows that for a time series of input and output measurements, and corresponding model computations, the dynamics of the input error are equal to the dynamics of the output error. However, a model with a small output error does not necessarily also give a small input error. The corresponding model parameter $\theta_r$ determines the absolute input error of the model: the smaller its magnitude, the larger the input error produced by a given output error.
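The relation between the input and output error of a static linear model can be checked numerically. The sketch below assumes the model form of (19), with `theta_r` denoting the parameter that multiplies the input `r` and `theta` the parameters of the remaining inputs; these names and all numbers are illustrative assumptions, not plant data or estimated parameters.

```python
def model_output(theta_r, theta, r, u):
    """Static linear model: y = theta_r * r + theta^T u."""
    return theta_r * r + sum(t * x for t, x in zip(theta, u))

def output_error(theta_r, theta, r_p, u_p, y_p):
    """Measured output minus the model output at the measured inputs."""
    return y_p - model_output(theta_r, theta, r_p, u_p)

def input_error(theta_r, theta, r_p, u_p, y_p):
    """Measured input minus the input at which the model reproduces y_p."""
    r_calc = (y_p - sum(t * x for t, x in zip(theta, u_p))) / theta_r
    return r_p - r_calc

# Illustrative values only.
theta_r = -0.5
theta = [2.0, 0.1, 0.3]   # first entry is the model constant
u_p = [1.0, 4.0, 2.0]     # leading 1 multiplies the constant
r_p, y_p = 1.2, 2.5

dy = output_error(theta_r, theta, r_p, u_p, y_p)
dr = input_error(theta_r, theta, r_p, u_p, y_p)
print(dy, dr, -dy / theta_r)  # dr and -dy/theta_r agree up to rounding
```

Repeating the calculation with a smaller magnitude of `theta_r` and the same output error gives a proportionally larger input error, which is the effect discussed for a poorly estimated seeds parameter.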

The input error associated with the seeds addition was

determined in accordance with (20) as a difference between

the actual and computed seeds addition, the latter determined

so that the model output is equal to measured D50, while all

the other model inputs are equal to measured values. The

computation is performed by first calculating the volume of

added seeds Vseeds from (18) and then determining the seeds

addition in accordance with (16).




Fig. 10. Measured and computed D50 for the EKSC model.

Figs. 10 and 11 show the EKSC model output and input error, respectively. Fig. 10 shows that the EKSC model is also characterised by good agreement of the model output with process data. Seventy-seven percent of batches lie within 5% of the measured D50 and 97% of batches within 10%.

However, from Fig. 11 it can be seen that the EKSC model gives much worse results in relation to the input error. The calculated seeds addition is within 25% of the actual value for only 60% of batches. The reason for the poor model performance in terms of the input error is that the model parameters were estimated on regular production data. These data do not contain sufficient excitation of the seeds addition, which results in an inadequate estimate of the corresponding model parameter.

Fig. 11. Measured and computed seeds addition for the EKSC model.

Fig. 12. Variations of on-line adapted parameters of EKSS model with indicated confidence limits.

4.2.2. Experimental model with statistical choice of process variables (EKSS)

To obtain a more relevant model structure and parameters, the black-box model was also determined by multiple linear regression and identified on experimental data, which include the necessary excitation of recipe parameters. Stepwise regression (Weisberg, 1985) was used to choose the independent variables that are most related to the output variable D50. At each step of the selection procedure, t tests were performed to decide whether to add a variable to the model, delete a variable, or exchange two variables. The t values to enter or remove a variable from the model were chosen as 1 and 2.5, respectively. At each step, the parameters of the model were determined by the least-squares method. The obtained model is as follows:

$$D_{50} = \theta_0 + \theta_1 T_{\min} + \theta_2 V_{\mathrm{seeds}} + \theta_3 s \tag{23}$$

Testing of the above model on data from regular production did not give a satisfactory model fit. The average model output error was about 25% larger than that of the SDM model. Hence, on-line parameter adaptation was considered as a test of whether on-line adjustment of model parameters could improve model performance.

Model adjustment is used to compensate for an approx-

imate model structure and to take into account the time-

varying process characteristics. On the other hand, it can also

help in model validation as explained in the description of

distortion method in Section 2. With the adaptation of model

parameters, the model better fits the measured output data.

If that requires considerable changes of model parameters,

the model is not of a good quality. On the contrary, if the

parameter values vary within certain limits, the model can be

qualified as a good one.

Fig. 12 shows the values of the on-line adapted EKSS model parameters for a set of batches from regular production. The straight dashed lines indicate the parameter confidence intervals. The t statistic was used to determine a confidence interval for each parameter (Weisberg, 1985), while the parameter values themselves were initially determined on a set of experimental batches. A 95% confidence interval was used to determine the constraints on parameter values.
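The check described above, whether adapted parameters stay inside their confidence limits, can be expressed as a short sketch. The parameter names, trajectories, limits and the 5% threshold `max_fraction` are all hypothetical; the text judges the parameters visually from Fig. 12 and does not state a numerical threshold.

```python
def fraction_outside(values, lower, upper):
    """Share of batches for which an adapted parameter leaves [lower, upper]."""
    outside = sum(1 for v in values if v < lower or v > upper)
    return outside / len(values)

def flag_parameters(trajectories, limits, max_fraction=0.05):
    """Map each parameter name to True if it leaves its limits too often.

    max_fraction is an assumed tolerance, not a value taken from the paper.
    """
    flags = {}
    for name, values in trajectories.items():
        lower, upper = limits[name]
        flags[name] = fraction_outside(values, lower, upper) > max_fraction
    return flags

# Hypothetical adapted parameter values over four batches.
trajectories = {
    "theta_Tmin": [0.050, 0.052, 0.049, 0.051],
    "theta_Vseeds": [-0.0008, -0.0002, -0.0001, 0.0000],
}
# Hypothetical confidence limits obtained from the experimental batches.
limits = {
    "theta_Tmin": (0.045, 0.055),
    "theta_Vseeds": (-0.0010, -0.0006),
}
flags = flag_parameters(trajectories, limits)
print(flags)
```

A flagged parameter indicates that fitting regular data requires values the experimental identification considers implausible, which is the sign of an unreliable estimate in the sense of the distortion method.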




From Fig. 12 it can be seen that during regular production the model parameters mainly vary within the specified range, except for the parameter associated with the volume of added seeds, whose value falls far below the lower limit. This again confirms that regular plant data are not appropriate for estimating this model parameter.

It should also be noted that the input variables in (23) do not include the concentrations of the input solution. Therefore, the model could not be applied in flexible recipe control, where initial deviations of the input concentrations should be compensated for by changing the seeds addition. The reason why the statistical choice of independent variables excludes the input solution concentrations from the model lies in the excitation of the experimental batches. In this set of batches, the excitation of some of the input variables is high compared to the variations of the input solution concentrations. Therefore, the latter were not found to be statistically significant enough to be correlated with the process output. As already mentioned, the input solution concentrations are uncontrollable and could not be purposely changed during experiments.

4.2.3. Experimental model with on-line structure identification (EKSO)

Being aware of the above-explained deficiencies of the available process data, the experimental black-box model was also designed so as to employ the beneficial qualities of both groups of data:

• experimental batches, where high excitation of some input variables (especially seeds) ensures the model validity in a wider operating region, and
• normal plant operation, where regular plant excitation (especially variations of the input solution) affects the process output and provides the information necessary for using the model in flexible recipe control.

The procedure for constructing the model with the above properties is shown in Fig. 13. From the left part of the scheme it can be seen that the design of the model structure starts with the experimental batches, from which a basic model structure is identified. This structure includes the statistically important model inputs that were selected on the experimental batches (23). The model structure is then completed with additional input variables that are statistically determined from regular data. The procedure of selecting the additional input variables is repeated every 100 batches.

Fig. 13. Procedure for constructing the EKSO model.

The parameters of this model are computed from regular plant data and are adjusted on-line every 100 batches, except for:

• the model parameter associated with the volume of added seeds, which is kept at a constant value determined from experimental batches;
• the model constant, which is optimised on-line over the last four batches.

The first exception is used since the volume of added seeds is the only input of the basic model structure whose excitation in experimental batches is different (higher) than in regular batches. The second exception was determined experimentally, and can be explained by the dynamics of the input solution variations. Although the batches of the hydrolysis process are performed independently of one another, some similarity in the operating conditions of successive batches occurs because of the similar quality of the input solution. The input solution is prepared in preceding production processes and its characteristics change with the dynamics of these processes. While the influence of measured input parameters on the hydrolysis process can be represented directly in the hydrolysis process model, the influence of dynamic changes of unmeasured parameters can only be taken into account by adjusting the model constant. Simulation showed that the optimal number of batches for adjusting the model constant is four.
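A minimal sketch of the constant adjustment follows. It assumes a least-squares criterion, for which the optimal constant over a window of batches is simply the mean residual of the model without its constant; the text does not state which criterion was actually optimised, and the function name and data are hypothetical.

```python
def adjust_constant(measured, predicted_without_constant, window=4):
    """Least-squares model constant over the last `window` batches.

    For a model y = c + f(inputs), the c minimising the sum of squared
    errors over a set of batches is the mean of y - f(inputs).
    The squared-error criterion is an assumption of this sketch.
    """
    residuals = [y - f for y, f in zip(measured, predicted_without_constant)]
    recent = residuals[-window:]
    return sum(recent) / len(recent)

# Hypothetical D50 measurements and model outputs computed without the constant.
d50_measured = [2.0, 2.1, 2.2, 2.0, 2.1]
d50_partial = [1.0, 1.0, 1.1, 1.0, 1.0]
constant = adjust_constant(d50_measured, d50_partial)
print(round(constant, 3))
```

A short window lets the constant follow slow drifts in unmeasured input-solution properties, while a long window averages out batch-to-batch noise; the value of four batches reported in the text was found by simulation.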

The structure and parameters of the EKSO model as determined on the available set of regular plant data are given in Table 3, except for the model constant, which is adjusted on-line. Figs. 14 and 15 show the obtained results in relation to the model output and input error, respectively. It can be seen that good results have been obtained in relation to both. Eighty-four percent of batches lie within 5% of the measured D50 and 97% of batches within 10%. For 91% of batches the calculated seeds addition is within 25% of the actual value.
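The percentage figures quoted for the output and input error can be computed as in the sketch below. It assumes that "within 5%" means a relative deviation of at most 5% of the measured value; the data are invented for illustration.

```python
def share_within(measured, computed, tolerance):
    """Fraction of batches whose computed value deviates from the measured
    value by at most `tolerance` (relative, e.g. 0.05 for 5%)."""
    hits = sum(
        1 for m, c in zip(measured, computed)
        if abs(c - m) <= tolerance * abs(m)
    )
    return hits / len(measured)

# Hypothetical measured and computed D50 values for four batches.
d50_meas = [2.0, 2.0, 2.0, 2.0]
d50_comp = [2.05, 2.15, 1.95, 2.3]
print(share_within(d50_meas, d50_comp, 0.05))
print(share_within(d50_meas, d50_comp, 0.10))
```

The same function applies to the input error when the measured and calculated seeds additions are passed with a tolerance of 0.25.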

Table 3
Structure and parameters of the EKSO model

Successive batches   Model
1–100      Input variable               Vseeds          Tmin           s              CTiO2           Tmax           w
           Associated parameter value   7.999 × 10^-4   4.22 × 10^-3   8.95 × 10^-3   7.33 × 10^-3    5.85 × 10^-2   2.02 × 10^-4
101–200    Input variable               Vseeds          Tmin           s              AA              Tmax           Cseeds
           Associated parameter value   7.999 × 10^-4   5.86 × 10^-2   4.37 × 10^-2   3.733 × 10^-3   7.64 × 10^-2   2.37 × 10^-2
201–300    Input variable               Vseeds          Tmin           s              Cseeds          w
           Associated parameter value   7.999 × 10^-4   5.22 × 10^-2   4.37 × 10^-2   1.918 × 10^-2   1.352 × 10^-5
301–400    Input variable               Vseeds          Tmin           s              AA              Cseeds         CTiO2
           Associated parameter value   7.999 × 10^-4   2.47 × 10^-2   3.05 × 10^-2   1.077 × 10^-2   2.304 × 10^-2  1.67 × 10^-2

In order to test the sensitivity of the obtained model to the intended control variable, i.e. seeds addition, a simulation experiment was also performed in which the addition of seeds was computed for the batches from normal production with the aim of achieving the output quality parameter D50 equal to 2.15, which is the desired production goal. With such an experiment it is possible to check whether the model has an appropriate sensitivity to the intended control variable, and to verify that the low input error is not a result of low model sensitivity to seeds addition. Because of the static linear model structure of the form (19), the optimal value of the control variable $r_o$ for a desired value of the process output $y_o$ can be determined analytically in the following way:

$$r_o = \frac{1}{\theta_r}\, y_o - \frac{1}{\theta_r}\,\theta^T u_p \tag{24}$$

where $\theta_r$ is the parameter associated with $r$ and $\theta$ is the parameter vector of the remaining inputs in (19), $y_o$ is in our case the desired value of D50 and is equal to 2.15, $r_o$ is the calculated optimal volume of added seeds, from which the optimal seeds addition is determined in accordance with (16), and $u_p$ is the vector of remaining model inputs as presented in Table 3.

The obtained results are shown in Fig. 16. The figure shows that the model exhibits the sensitivity to seeds addition required for flexible recipe control. The adjustment of seeds addition is in the range expected from expert knowledge.
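The analytic control computation for a static linear model can be sketched as follows. The function solves the model equation for the control input at a target output; the names `theta_r`, `theta`, `u_p` and all numerical values are illustrative assumptions and are not the EKSO parameters.

```python
def optimal_input(theta_r, theta, u_p, y_target):
    """Control input at which y = theta_r * r + theta^T u_p equals y_target."""
    return (y_target - sum(t * x for t, x in zip(theta, u_p))) / theta_r

def model_output(theta_r, theta, r, u):
    """Static linear model output."""
    return theta_r * r + sum(t * x for t, x in zip(theta, u))

# Illustrative parameters and measured inputs of one batch.
theta_r = -0.5
theta = [2.0, 0.1, 0.3]   # first entry is the model constant
u_p = [1.0, 4.0, 2.0]     # leading 1 multiplies the constant

r_opt = optimal_input(theta_r, theta, u_p, 2.15)
print(round(r_opt, 6), round(model_output(theta_r, theta, r_opt, u_p), 6))
```

If the magnitude of `theta_r` were much smaller, the same output correction would require an implausibly large change of the control input, which is why the computed control signal is compared against the range expected from expert knowledge.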

5. Discussion

In addition to observing the process and model behaviour visually, the models were also validated and mutually compared by computing different quantitative measures, namely the RMSE (1), TIC (2) and REL (3) measures explained in Section 2. These measures were used to evaluate the output and input error of the different model structures. The values of these criteria with indicated best performance are shown in Tables 4 and 5 for output and input errors, respectively. As explained in the description of these criteria, low values of these measures mean better model quality. The values of the RMSE criterion are also shown graphically in Fig. 17.

Table 4
Model output errors

Model   RMSE     TIC      REL
SDM     0.1061   0.0255   0.0504
EKSC    0.0899   0.0216   0.0427
EKSO    0.0922   0.022S   0.0443

Table 5
Model input errors

Model   RMSE     TIC      REL
SDM     0.1042   0.0967   0.1909
EKSC    0.1657   0.1530   0.3040
EKSO    0.0934   0.0873   0.1702

Fig. 14. Measured and computed D50 for the EKSO model.

Fig. 15. Measured and computed seeds addition for the EKSO model.

It can be seen that the different measures rank the models in the same way, i.e. the model with the lowest RMSE value also has the lowest TIC and REL values. The tables show that all the models are of approximately the same quality in relation to the output error (see also Fig. 17). The best performance in relation to the output error is observed for the EKSC model. This model is, however, not acceptable for the intended use because of its high input error. The SDM and EKSO models have low input and output errors and are most appropriate for use in flexible recipe control. Hence, the same conclusions were reached as from observing the model behaviour qualitatively.
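Two of the measures can be sketched in their common textbook forms: the root mean square error and Theil's inequality coefficient. The exact definitions used in the paper are given in Eqs. (1) to (3) of Section 2 and may differ in detail from what is assumed here; the REL measure is not sketched because its definition is not restated in this section. The data are invented.

```python
import math

def rmse(measured, computed):
    """Root mean square error between measured and computed values."""
    n = len(measured)
    return math.sqrt(sum((m - c) ** 2 for m, c in zip(measured, computed)) / n)

def theil_inequality(measured, computed):
    """Theil's inequality coefficient in its common form (assumed here):
    sqrt(sum of squared errors) divided by
    (sqrt(sum of squared measurements) + sqrt(sum of squared model values)).
    It is 0 for a perfect fit and at most 1.
    """
    num = math.sqrt(sum((m - c) ** 2 for m, c in zip(measured, computed)))
    den = (math.sqrt(sum(m * m for m in measured))
           + math.sqrt(sum(c * c for c in computed)))
    return num / den

# Hypothetical measured and computed D50 values.
d50_meas = [2.0, 2.0]
d50_comp = [2.1, 1.9]
print(rmse(d50_meas, d50_comp), theil_inequality(d50_meas, d50_comp))
```

Applying both functions first to measured and computed D50 and then to measured and calculated seeds addition gives output-error and input-error values of the kind compared in Tables 4 and 5.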

Fig. 16. Optimal control signal for the EKSO model (D50 aim= 2.15).


Fig. 17. The values of the RMSE criterion for the input and output error of different models.

For use in an on-line control scheme, the SDM model is more demanding. Its implementation requires appropriate software for simulating differential equations, and optimisation routines for determining the optimal control signal. Optimisation can be time-consuming, although in this particular case it is not a time-critical task.

The implementation of the EKSO model is much simpler. For this model the control signal is computed analytically. The identification of the EKSO model is also simpler, although the data available about the process in our case require the model design procedure to be performed very carefully.

Besides obvious advantages, some drawbacks of the experimental models compared to the SDM model can also be noticed. When using the SDM model on regular data, no problems regarding the sensitivity of the model to seeds addition were encountered, as was the case for the experimental models. This is due to the dynamic part of the model, which ensures the appropriate model dependency on seeds addition despite a low excitation signal during normal production.

6. Conclusions

In this paper, the design of models to be used in the flexible recipe control scheme for control of the batch hydrolysis process is presented. The model design and testing procedure has shown that process models used for control purposes should be validated not only in relation to the output error, but also in relation to the input error and the optimal control signal. Only the three measures used together provide the necessary model validation in relation to model use. The output error shows whether the designed model represents the overall process behaviour accurately enough; the input error shows how accurately the process control variable is incorporated into the model; and the optimal control signal shows whether the model exhibits the necessary (or expected) sensitivity to the process control variable.

In our particular case, the three measures enabled us to validate different model structures: the semi-empirical dynamic model and different black-box static models. Furthermore, they helped to improve the model design procedure in such a way that a simple and satisfactory experimental model was constructed despite a very limited data set for model identification. Such a model was designed by combining experimental and regular plant data that together provided the necessary excitation for the statistical choice of appropriate model input variables and model parameters. Black-box identification has shown that observing only the model output error would in this case result in a wrong model structure, which is not suitable for the intended model use.

Two of the designed models are appropriate for use in flexible recipe control of the hydrolysis process: the semi-empirical dynamic model (SDM) and one of the experimental models (EKSO). Both perform satisfactorily in relation to input and output errors, and provide the necessary sensitivity to the intended control variable. The implementation of the EKSO model in the on-line control scheme is simpler, as it determines the optimal control signal analytically, while the SDM model requires optimisation. On the other hand, the SDM model is expected to be valid and more reliable in a wider operating region due to its more complex model structure.

References

Bohlin, T. (1991). Interactive system identification: Prospects and pitfalls. Springer-Verlag.

Bossel, H. (1994). Modeling and simulation. Wiesbaden: Verlag Vieweg.

Cameron, R., Marcos, R. L., & de Prada, C. (1998). Model validation of discrete transfer functions using the distortion method. Mathematical and Computer Modelling of Dynamical Systems, 4, 58–72.

Goodwin, G. C. (2002). Inverse problems with constraints. In Proceedings of the 15th triennial world congress of the international federation of automatic control.

Gray, G. J., & von Grunhagen, W. (1998). An investigation of open-loop and inverse simulation as nonlinear model validation tools for helicopter flight mechanics. Mathematical and Computer Modelling of Dynamical Systems, 4, 32–57.

Kheir, N. A. (1988). Systems modelling and computer simulation. New York: Marcel Dekker.

Ljung, L. (1999). System identification. Englewood Cliffs: Prentice-Hall, Inc.

Murray-Smith, D. J. (1995). Continuous system simulation. London: Chapman & Hall.

Murray-Smith, D. J. (1998). Methods for the external validation of continuous system simulation models: A review. Mathematical and Computer Modelling of Dynamical Systems, 4, 5–31.

Murray-Smith, D. J. (2000). The inverse simulation approach: A focused review of methods and applications. Mathematics and Computers in Simulation, 53, 239–247.

Neelamkavil, F. (1987). Computer simulation and modelling. Chichester: John Wiley & Sons.

Qureshi, M. E., Harrison, S. R., & Wegener, M. K. (1999). Validation of multicriteria analysis models. Agricultural Systems, 62, 105–116.

Rijnsdorp, J. E. (1991). Integrated process control and automation. Amsterdam: Elsevier Science.

Sage, A. P. (1992). Validation. In D. P. Atherton & P. Borne (Eds.), Concise encyclopaedia of modelling and simulation (pp. 477–488). Oxford: Pergamon Press.

Santacesaria, E., Tonello, M., Storth, G., Pace, R. C., & Carra, S. (1986). Kinetics of titanium dioxide precipitation by thermal hydrolysis. Journal of Colloid and Interface Science, 11(1), 44–53.

Sel, D., Hvala, N., Strmcnik, S., Milanic, S., & Suk-Lubej, B. (1999). Experimental testing of flexible recipe control based on a hybrid model. Control Engineering Practice, 7(10), 1191–1208.

Thiel, H. (1970). Economic forecasting and policy. Amsterdam: North-Holland.

Verwater-Lukszo, Z. (1998). A practical approach to recipe improvement and optimization in the batch processing industry. Computers in Industry, 36, 279–300.

Weisberg, S. (1985). Applied linear regression. New York: John Wiley & Sons.

Zhou, X. (1993). A new method with high confidence for validation of computer simulation models for flight systems. Chinese Journal of Systems Engineering and Electronics, 4, 43–52.

Zele, M., Juricic, ., Strmcnik, S., & Matko, D. (1998). A probabilistic measure for model purposiveness in identification for control. International Journal of Systems Science, 29, 653–662.
