Model Validation on Selection of Process Models


    Computers and Chemical Engineering 29 (2005) 1507-1522

    Influence of model validation on proper selection of process models - an industrial case study

    N. Hvala a,*, S. Strmcnik a, D. Sel a, S. Milanic b, B. Banko b

    a J. Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia

    b Faculty of Electrical Engineering, Trzaska c. 25, 1000 Ljubljana, Slovenia

    Received 26 July 2000; received in revised form 30 November 2004; accepted 30 November 2004

    Available online 7 March 2005

    Abstract

    This paper considers the design and validation of a model of an industrial batch process in TiO2 production. The model will be used for

    flexible recipe control, which is a model-based approach used in control and optimisation of batch processes. Because of insufficient knowledge

    and a lack of proper data, different process models were developed: a semi-empirical dynamic model based on chemical kinetics laws, and

    several experimental black-box models. In the paper the models are validated and mutually compared. Validation of models has shown

    that besides comparing the model and the process output behaviour, additional measures considering also the model input error should be

    introduced for proper model validation related to the model use. In our case, introducing additional measures also contributed to improvement

    of the model design procedure, so that a simple yet satisfactory black-box model was obtained despite a small amount of process data.

    © 2004 Elsevier Ltd. All rights reserved.

    Keywords: Model validation; Model input error; Model-based control; Optimisation; Flexible recipe control; Hydrolysis batch process

    1. Introduction

    Mathematical modelling is a pervasive methodology,

    which represents an important and steadily increasing part of

    almost every field of science and engineering. The range of

    problems addressed by using mathematical models is practi-

    cally unlimited, therefore a vast amount of methods and tools

    have been developed trying to make the process of modelling

    simpler and more efficient. In spite of these endeavours mathematical modelling is far from becoming routine work. On the contrary, it is still considered to be more art than science. The reason lies in the fact that modelling is an iterative design

    procedure in which human judgement and creativity play a

    decisive role.

    * Corresponding author. Tel.: +386 1 4773 900; fax: +386 1 4773 994.

    E-mail address: [email protected] (N. Hvala).

    One of the most important steps in this procedure is the evaluation of the model. Model evaluation generally consists of two stages: verification and validation. Verification concerns the consistency and accuracy of simulation programs

    compared with the associated mathematical models, while

    model validation concerns the level of agreement between

    mathematical descriptions and the real system under investi-

    gation (Murray-Smith, 1998). In the literature on modelling

    and simulation it is generally accepted that model validation

    is a crucial part of the model development procedure. Without

    validation a model is of very little use (Neelamkavil, 1987).

    However, it is also stated that in reality, model validation is

    treated in a superficial way and does not form a central element of the modelling process. As Murray-Smith (1995) states, most application papers in journals and conference proceedings pass over questions of model validation in a superficial fashion or make no mention of it at all.

    In practice, validation is usually reduced to checking the

    agreement between outputs of the model and those of the

    real system (Qureshi, Harrison, & Wegener, 1999). This is

    especially true in the development of industrial process mod-

    els, where process complexity, insufficient knowledge about

    the process and a lack of proper data lead to this kind of

    0098-1354/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

    doi:10.1016/j.compchemeng.2004.11.013


    simplifications. However, the main issue in the design of models for engineering purposes is to design a model that is

    valid for the intended purpose. Therefore, the primary aim of

    the model validation should be to find out whether the model

    is good enough for the intended use. This task requires a

    much more comprehensive validation approach in which additional methods are used to build up confidence in the considered model.

    The aim of the paper is to show that using a more complete

    validation approach may lead to decisions in the model de-

    sign procedure, which are different from those obtained on

    the basis of simple evaluation of the output fit. In our case

    the main additional issue in the applied validation process

    is using the concept of the inverse model and the related in-

    put error, which provide complementary information about

    the model usability. The need for such an approach was en-

    countered over the design of a model of an industrial batch

    process in TiO2 production. The aim of the model is to apply

    it within the flexible recipe control (Rijnsdorp, 1991), which

    is a model-based approach used in control and optimisation of batch processes. Different models were developed for this

    process, i.e. a semi-empirical model based on chemical kinet-

    ics laws, and several experimental (black-box) models. The

    models were designed based on different sources of process

    information, i.e. process knowledge and available process

    data. All the models show relatively similar quality when judged by their agreement with process output data. In the paper it

    is shown that only after validating the models also in relation

    to the model use, i.e. evaluating the model input error and

    performing sensitivity analysis of the control variable, was it

    possible to distinguish between the performances of different

    model structures, and to direct the model design in the proper way.

    The paper is organized as follows. After a short overview

    of model validation methods, it describes the industrial pro-

    cess of batch hydrolysis and presents the concept of flexible

    recipes to be applied for control of this process. The main em-

    phasis is then placed on the design and validation of different

    process models to be used in the control scheme. The models

    are evaluated and mutually compared in relation to different

    performance measures. The obtained results are then com-

    mented on in the discussion.

    2. Model validation

    The main concepts of model validation can be found in

    most textbooks on modelling and simulation (e.g. Bossel,

    1994; Kheir, 1988; Murray-Smith, 1995; Neelamkavil,

    1987), while a survey of model validation methods has been

    presented by Murray-Smith (1998). Although the available

    concepts and methods are rather general, one has to be aware

    that model validation is always problem dependent (Ljung,

    1999; Qureshi et al., 1999).

    Generally speaking, the quality of the model can be judged

    with respect to several features. The most important ones

    are model purposiveness, model falseness and model plau-

    sibility (Bohlin, 1991; Sage, 1992; Zele, Juricic, Strmcnik,

    & Matko, 1998). Purposiveness (usefulness) tells whether a

    model satisfies its purpose. Falseness is related to agreement

    with measurements (data) coming from the real system to be

    modelled (a falsified model is one that is contradicted by data). Plausibility, also referred to as conceptual validity or face validity (Qureshi et al., 1999), expresses the conformity of the model with a priori knowledge about the process.

    Below is a short summary of model validation concepts re-

    lated to the above-defined features.

    2.1. Model purposiveness

    A model is always developed with a certain purpose, i.e.

    with the aim of solving a certain problem. Therefore the ultimate

    validation of the model is to test whether the problem that

    motivated the modelling exercise can be solved using the ob-

    tained model (Ljung, 1999). Testing of model purposiveness

    might often be impossible, too expensive, time consuming, dangerous, etc. In some cases these problems can be

    alleviated by testing the solution in a simulation environment

    based on the process model. However, such an approach only

    partially solves the problem and is still difficult to perform.

    In practice, assessment of model plausibility and espe-

    cially falsification is generally easier. Hence, it often hap-

    pens that falsification is performed instead of validation of

    model purposiveness. However, even if a model agrees with

    the available data it may not be necessarily good enough for

    a given purpose, and such an approach may lead to the design

    of an inappropriate model. Unfortunately the opposite might

    also be true: the model may not agree with the data and still be good enough to serve its purpose.

    2.2. Model plausibility

    Assessment of model plausibility is tightly related to ex-

    pert judgement of whether the model is good or not. The

    level of plausibility, or better said the expert opinion about it,

    is basically related to two features of the model.

    The first one considers the question whether the model

    looks logical. This question concerns characteristics of

    the model structure (type of equations, connections between

    equations, etc.) and its parameters (gains, time constants,

    signs, etc.), and is relevant when the model is derived from

    first principles. If the structure and the parameters are feasi-

    ble, which means comparable to what experts know about the

    real process, then the confidence into the model is greater.

    The second one is related to the question whether the

    model behaves logically. This part concerns assessment

    of the reaction of the model outputs (dynamics, shape, etc.)

    to typical events (scenarios) on the inputs. If the model in

    different situation reacts in accordance with expectations of

    the experts, then again our confidence about its validity is

    increased. Note that for black box models this is the only

    way to assess the plausibility.


    Fig. 1. Computation of model output error (P, process; M, model; u, process input; n, noise; y, process output; ym, model output; eo, output error).

    2.3. Model falseness

    As already mentioned falsification is the most widely used

    approach to the validation of models and is related to direct

    comparison of input-output data from the model and from

    the real system. However, also within this validation area the

    methods differ substantially in the principles applied. The basic distinction concerns the questions of what is compared and how it is compared.

    2.3.1. Comparison of outputs

    Comparison of model and process outputs is a standard

    approach, which does not need explanation. For the sake of completeness and to enable a comparison with other approaches,

    let us just point out that the issue under consideration here

    is the difference between the process and the model output,

    which is referred to as the output error. It is computed in the

    way as shown in Fig. 1.

    2.3.2. Comparison of inputs

    In some applications, differences that appear in the input variables may highlight deficiencies in the model more

    readily than conventional comparisons based on output vari-

    ables (Murray-Smith, 2000). This is especially important in

    the case of control or optimisation problems for processes

    with multiple inputs. It is known that different combinations

    of values of input variables may result in the same (or very

    similar) values of the output variables. Thus, the same output

    error could be produced by different input combinations. For

    that purpose, the input error is determined as another rele-

    vant measure for model validation (Gray & von Grunhagen,

    1998). Input error is the difference between the process in-

    put and the output of the inverse model, the latter computed

    when the measured process output is applied as the input

    to the inverse model. The computation of the input error is

    schematically shown in Fig. 2.

    The basic problem concerning this kind of validation is

    apparently related to the concept of the inverse model. This

    concept has been extensively used in the areas of signal pro-

    cessing, telecommunication, and control (Goodwin, 2002),

    where also methods for derivation and use of inverse models

    have been developed. Basically two approaches are possible:

    first, the explicit analytical derivation, which is limited to

    simple, mostly linear models, and second, the more general

    Fig. 2. Computation of model input error (P, process; M^-1, inverse model; u, process input; n, noise; y, process output; um, inverse model output; ei, input error).

    implicit simulation and optimisation technique, which does

    not provide the inverse model itself, but provides the solution

    of the inverse problem.
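The implicit route to the input error can be illustrated with a minimal sketch. It assumes a hypothetical monotone static model `model(u, d)` with one adjustable input `u` and one measured disturbance `d`; the model form, its coefficients and the search bounds are illustrative assumptions, not taken from the paper. The inverse problem is solved numerically by bisection rather than by deriving an explicit inverse model.

```python
def model(u, d):
    """Hypothetical static model M (illustrative coefficients only)."""
    return 2.0 + 0.5 * d - 0.3 * u


def invert_model(y_target, d, lo=0.0, hi=10.0, tol=1e-9):
    """Solve model(u, d) = y_target for u by bisection (implicit inverse)."""
    f_lo = model(lo, d) - y_target
    f_hi = model(hi, d) - y_target
    if f_lo * f_hi > 0:
        raise ValueError("target output not reachable within the input bounds")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = model(mid, d) - y_target
        if f_lo * f_mid <= 0:
            hi = mid            # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)


def input_errors(u_meas, d_meas, y_meas):
    """Input error e_i = u - u_m, where u_m is the inverse-model output."""
    return [u - invert_model(y, d) for u, d, y in zip(u_meas, d_meas, y_meas)]
```

With this sketch, a small output offset is translated into the input change that the model would need to explain it, which is the quantity of interest when the model is used to compute a control input.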

    2.3.3. Comparison methods

    The comparison of measured and simulated data can be

    performed qualitatively, quantitatively, or based on statistical methods (Murray-Smith, 1998).

    Qualitative approaches involve plotting the process and

    the corresponding model variables and performing visual in-

    spection of differences.

    Quantitative methods are based on performance measures

    that determine goodness of fit. In this paper root mean square

    error (RMSE), Theil's inequality coefficient (TIC) (Thiel,

    1970), and relative error (REL) are used, which are defined

    with the following equations:

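    $$\mathrm{RMSE} = \sqrt{\frac{\sum_i (y_i - y_{m,i})^2}{N}} \qquad (1)$$

    $$\mathrm{TIC} = \frac{\sqrt{\sum_i (y_i - y_{m,i})^2}}{\sqrt{\sum_i y_i^2} + \sqrt{\sum_i y_{m,i}^2}} \qquad (2)$$

    $$\mathrm{REL} = \sqrt{\frac{\sum_i \left( (y_i - y_{m,i})^2 / y_i^2 \right)}{N}} \qquad (3)$$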

    Here $y_i$ denotes the measured data points, $y_{m,i}$ the computed data points and $N$ the number of data points. Lower values of RMSE and REL indicate better agreement between the

    measured and computed data. The value of TIC lies between

    zero and unity, with values closer to zero indicating better

    model validity. The measures are more appropriate for model

    comparison than for use in an absolute sense, although Zhou

    (1993), for example, suggested that values of TIC smaller

    than 0.3 indicate good agreement with measured data.
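As a minimal sketch, the three measures of Eqs. (1)-(3) can be computed as follows. The square-root forms are assumed here (the usual definitions of RMSE and of Theil's coefficient); the function names are illustrative.

```python
import math


def rmse(y, ym):
    """Root mean square error between measured y and computed ym."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, ym)) / len(y))


def tic(y, ym):
    """Theil's inequality coefficient; lies between 0 and 1."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, ym)))
    den = math.sqrt(sum(a * a for a in y)) + math.sqrt(sum(b * b for b in ym))
    return num / den


def rel(y, ym):
    """Relative error: errors are scaled by the measured values (all nonzero)."""
    return math.sqrt(sum(((a - b) / a) ** 2 for a, b in zip(y, ym)) / len(y))
```

Because REL divides by the measured values, it is only meaningful when none of them is zero, which holds for a quantity such as D50.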

    Statistical methods apply to the comparison of the distri-

    bution of the data rather than to point-by-point comparisons.

    They include descriptive statistics, which deals with means,

    variances, correlations, etc., and inferential statistics, which

    considers hypothesis tests and confidence intervals (Qureshi

    et al., 1999). A widely used on-line validation approach is a

    stepwise regression where the selection of the model structure


    is based on the correlation coefficient R and the F-ratio. R

    gives a measure of the accuracy of the fit, while the F-ratio

    provides a measure of the confidence that can be ascribed to

    this fit.

    2.4. Other validation methods

    There are also validation methods, which cannot be di-

    rectly assigned to one of the considered three groups.

    One of such approaches is sensitivity analysis. Sensitivity

    analysis examines the extent of variation in predicted perfor-

    mance when parameters are systematically varied over some

    range of interest (Qureshi et al., 1999). It provides the in-

    sight into stability of the model and into priority ways for its

    refinement.
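A simple way to carry out such an analysis numerically is to perturb one parameter at a time and record the relative change of the model output. The sketch below is one possible finite-difference implementation; the function name, the 1% step and the dictionary-based model interface are assumptions made for illustration.

```python
def relative_sensitivities(model, params, rel_step=0.01):
    """Relative sensitivity (dy/y)/(dp/p) of the model output to each parameter.

    model    -- callable taking a dict of parameters and returning a scalar
    params   -- dict of nominal parameter values (nonzero, nonzero output)
    rel_step -- relative perturbation applied to one parameter at a time
    """
    base = model(params)
    result = {}
    for name, value in params.items():
        perturbed = dict(params)
        perturbed[name] = value * (1.0 + rel_step)
        result[name] = ((model(perturbed) - base) / base) / rel_step
    return result
```

Parameters with the largest absolute sensitivity are natural candidates for closer attention during model refinement.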

    A similar but complementary approach is known as the distortion method (Cameron, Marcos, & de Prada, 1998; Murray-Smith, 1998). The approach is based on the assumption that

    any model output can be made to follow a measured variable

    by distorting the model parameters as a function of time. The better the model, the less distortion is required. If the

    distortion needed to match the process response lies within

    acceptable limits, the model is judged satisfactory.

    Finally, there are some methods, which are related to data

    analysis or data validity. One of the best known is based

    on system identification techniques and uses the concept of

    identifiability (Murray-Smith, 1998). Unique and reliable es-

    timates of the parameters of a model of given structure can be

    obtained only if that model is identifiable. Structural unidentifiability occurs when a model has too many parameters to allow all of them to be estimated separately from the measured variables. In that case, certain parameters can only be estimated in combination with other parameters.

    Numerical identifiability is associated with measurement er-

    ror, process noise or inadequate information in the measured

    data. It is especially important in the design of experiments;

    they should be designed so as to obtain data about the process

    that is rich in information.

    The methods listed above represent possible approaches to model validation, but in practice they are still not fully exploited

    for ascertaining the model quality. This paper shows how

    some of these methods can be used in practice for the evalu-

    ation and improvement of the process model.

    3. Problem description

    3.1. Process description

    The problem addressed in this paper is the modelling of the hydrolysis process, which is one of 18 successive processes in

    the production of TiO2 pigment. Hydrolysis runs in a batch

    reactor (Fig. 3) where TiO2gel is formed by precipitation out

    of TiOSO4 solution. The reaction occurs by adding water

    and seeds, and maintaining the solution at boiling point for

    an appropriate amount of time.

    Fig. 3. Batch hydrolysis reactor.

    The reaction is depicted by the following chemical equa-

    tions:

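    $$\mathrm{TiOSO_4 + H_2O \rightleftharpoons TiO_2 + H_2SO_4} \qquad (4)$$

    $$\mathrm{TiO_2} \xrightarrow{\text{nucleation}} \text{seeds} \qquad (5)$$

    $$\mathrm{TiO_2} + \text{seeds} \rightarrow \mathrm{TiO_2gel} \qquad (6)$$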

    TiO2 is an intermediate product. In the nucleation process it

    is transformed into seeds (5), or else it is precipitated onto

    already formed seeds (6). Additional seeds are added to the

    solution during the heating phase to accelerate precipitation.

    The final product is the TiO2gel. H2SO4 is the hydrolysis by-

    product.

    Hydrolysis is one of the most crucial parts of the overall production process, since for the first time in the production

    line particles (TiO2gel) are formed out of solution. The size

    of particles has a strong influence on the quality of the final

    product. Hence, the particles should be of the desired optimal size. Achieving this is not a straightforward task, since

    the process inputs are subject to uncontrollable variations,

    which are also reflected in the process output variations. The

    size of particles in the TiO2gel is determined by the parameter

    D50, which represents the diameter of floccules attributed to

    the highest number of floccules. On-line control (during the

    batch) of particle size is not possible, since parameter D50 is measured only at the end of the batch, when a sample of

    the final product is examined by laboratory analysis.

    3.2. Flexible recipe control

    It is expected that a more uniform particle size between the

    batches could be achieved by flexible recipe control, which is

    a control concept introduced for batch processes (Rijnsdorp,

    1991; Verwater-Lukszo, 1998). Unlike fixed recipes that pre-

    scribe fixed recipe instructions for running a batch, flexible

    recipe control adjusts the recipe parameters to account for

    changes in input and operating conditions of a current batch.

    In this way, final product quantity and/or quality could be


    Fig. 4. Flexible recipe control: optimisation of recipe parameters with the aid of a process model.

    retained despite changing conditions in performing a batch.

    Adjustment of recipe parameters is done by optimisation

    (Fig. 4) where a performance index representing the eco-

    nomic aspects of process operation (e.g. production yield,

    product quality, energy consumption, etc.) is optimised by

    varying the recipe parameters. The value of the performance

    index at suggested values of recipe parameters is computed

    by simulating the process behaviour using the process model.

    In the case of hydrolysis process, flexible recipe control

    is applied for control of quality parameter D50, which is the process output variable. Variations of input concentrations

    are the main source of input variations, while seeds addition

    can be chosen as an adjustable recipe item (control variable).

    The construction of the process model used in flexible recipe

    control is described in the next section. Potential model inputs

    affecting D50 are active acid concentration (AA), free acid

    concentration (AP) and TiO2 concentration (CTiO2), all in the

    input solution, concentration (Cseeds) and volume (Vseeds) of

    added seeds, temperature during the reaction, flow of added

    water (w), and steam flow after boiling (s).

    Control of hydrolysis process using flexible recipe control

    will be performed by calculating the optimal seeds addition

    for each batch with the aim of producing particles of the desired optimal size. This calculation will be performed at the start

    of each batch and will be based on most recently measured

    concentrations of input solution, while the other operating

    parameters, e.g. temperature, addition of water, etc., will be

    set equal to the preceding batch.
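The per-batch calculation described above can be sketched as a one-dimensional optimisation. The code below is only an illustration: `d50_model` is a hypothetical linear stand-in for the process model, its coefficients and the bounds on seeds addition are invented, and a golden-section search is used simply because it needs no external library. The target value 2.15 is the production aim quoted in the paper.

```python
def d50_model(u_seeds, c_tio2, active_acid):
    """Hypothetical stand-in for the process model (coefficients are invented)."""
    return 3.0 - 0.8 * u_seeds + 0.002 * (c_tio2 - 200.0) + 0.001 * (active_acid - 250.0)


def optimal_seeds(target, c_tio2, active_acid, lo=0.5, hi=2.0, iters=60):
    """Golden-section search for the seeds addition that minimises
    (D50_model - target)^2 for the measured input concentrations."""
    def cost(u):
        return (d50_model(u, c_tio2, active_acid) - target) ** 2

    g = (5 ** 0.5 - 1) / 2  # inverse golden ratio
    a, b = lo, hi
    c = b - g * (b - a)
    d = a + g * (b - a)
    for _ in range(iters):
        if cost(c) < cost(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
        c = b - g * (b - a)
        d = a + g * (b - a)
    return 0.5 * (a + b)


# Recipe adjustment for one batch with freshly measured concentrations.
u_opt = optimal_seeds(target=2.15, c_tio2=200.0, active_acid=250.0)
```

In this sketch the other operating parameters do not appear at all, which mirrors the statement that they are held at the values of the preceding batch.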

    4. Design of process models

    As in many real-world applications, the design of a process model for the hydrolysis process was very much constrained by the available knowledge and data about the process. Knowledge was insufficient due to the process complexity, where only some basic relations from chemical kinetics

    laws could be derived. On the other hand, data about the pro-

    cess was also limited to two groups:

    - a small set of experimental batches (23 batches) performed purposely for model identification;

    - a set of 449 batches from regular production.

    The batches from both groups were processed in one of

    the industrial hydrolysis reactors.

    In the experimental batch set, some of the recipe param-

    eters (i.e. the seeds addition, the flow of added water) were

    subject to considerable change in order to observe the con-

    sequent changes in the process output. Changes were made

    great enough to exceed the process noise and to obtain a

    model valid in a wide operational region. The input solution

    concentrations are uncontrollable and varied in this set of

    batches as in normal production.

    The second set represented batches from normal produc-

    tion. They were performed under fixed recipe control, so that

    no special excitation of recipe parameters was imposed. Only

    a small number of batches in this set were performed with a

    change of seeds addition.

    In the design of process models two types of models

    were developed: a semi-empirical dynamic model and several

    static (black-box) models.

    4.1. Semi-empirical dynamic model (SDM)

    A semi-empirical dynamic model (Sel, Hvala, Strmcnik,

    Milanic, & Suk-Lubej, 1999) was designed based on two types of knowledge: chemical kinetics laws and empirical

    knowledge of process experts. The model consists of two

    sub-models (Fig. 5):

    - a dynamic model representing the curve of precipitated TiO2gel during the batch, and

    - a static model mapping the precipitation rate curve into quality parameter D50.

    4.1.1. Dynamic part of the SDM model

    The structure of the dynamic model was based on chem-

    ical kinetics laws that describe dynamic relations between

    reactant concentrations (c) and the process reaction rate (r) during the batch. Reaction (precipitation) yield is expressed

    as a function of time, and is related to the reaction rate r(t) in

    the following way:

    $$\mathrm{yield}(t) = \int_0^t r(t)\,dt \qquad (7)$$

    According to chemical kinetics laws, the reaction rate for the

    reaction A + B → C generally depends on the concentrations

    of reactants A and B, and is written as

    $$r(t) = k(T)\, c_A^{a}\, c_B^{b} \qquad (8)$$


    Fig. 5. Structure of a semi-empirical dynamic model.

    where a and b are empirically determined constants; k(T) is a specific reaction rate that is modelled by the Arrhenius equation

    $$k(T) = k_0\, e^{-E/T} \qquad (9)$$

    where T represents temperature, and k0 and E are constants.

    Hydrolysis is a reversible process. Therefore, the concentration of a particular component C depends on the reaction

    rate of producing (rp) and consuming (rc) this component,

    and is generally given as

    $$\frac{dc_C}{dt} = r_p - r_c \qquad (10)$$

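    Following these laws, a set of differential equations was determined for each component in the hydrolysis process. Taking into account the chemical relations (4)-(6), the model equations are the following:

    $$r_{\mathrm{TiOSO_4}} = \frac{dc_{\mathrm{TiOSO_4}}}{dt} = -k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} + k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} \qquad (11)$$

    $$r_{\mathrm{H_2O}} = \frac{dc_{\mathrm{H_2O}}}{dt} = -k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} + k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} + h_2 u_{\mathrm{H_2O}} \qquad (12)$$

    $$r_{\mathrm{TiO_2}} = \frac{dc_{\mathrm{TiO_2}}}{dt} = k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} - (k_2 + k_3)\, c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} - k_4 c_{\mathrm{TiO_2}} c_{\mathrm{seeds}}^{h_1} + r_{\mathrm{gel}} \qquad (13)$$

    $$r_{\mathrm{H_2SO_4}} = \frac{dc_{\mathrm{H_2SO_4}}}{dt} = k_1 c_{\mathrm{TiOSO_4}} c_{\mathrm{H_2O}} - k_2 c_{\mathrm{TiO_2}} c_{\mathrm{H_2SO_4}} \qquad (14)$$

    $$r_{\mathrm{TiO_2gel}} = \frac{dc_{\mathrm{TiO_2gel}}}{dt} = k_4 c_{\mathrm{TiO_2}} c_{\mathrm{seeds}}^{h_1} - r_{\mathrm{gel}} \qquad (15)$$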

    where the following notation is used: r is the reaction rate

    of different components (indexes are used to distinguish between individual components presented in (4)-(6)), c the component concentrations, k the parameters modelled by the Arrhenius equation (9), h the constant model parameters,

    rgel the gel activity (constant). uH2O is the addition of water,

    which is equal to the flow of indirectly added water s during

    the heating phase, and directly added water w during the

    cooking phase. Temperature is included in the model in the

    Arrhenius model parameters k. cseeds is the concentration of

    seeds in TiO2gel, and depends on the reaction rate of seeds

    formation in the nucleation process, as well as on the volume

    Vseeds and concentration Cseeds of added seeds. The latter two

    parameters are also combined into technological parameter

    named seeds addition useeds, which is defined as the percent

    of TiO2 mass in added seeds compared to the TiO2 mass in initial solution and can be written as follows:

    $$u_{\mathrm{seeds}} = \frac{C_{\mathrm{seeds}} V_{\mathrm{seeds}}}{C_{\mathrm{TiO_2}} V_{\mathrm{sol}}} \cdot 100\% \qquad (16)$$

    CTiO2 is the TiO2 concentration in initial solution, and Vsol is

    the volume of initial solution (equal for each batch).
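As a small worked example of Eq. (16), the seeds addition can be computed directly from the four quantities involved. The numerical values below are invented for illustration; only the formula comes from the text.

```python
def seeds_addition_percent(c_seeds, v_seeds, c_tio2, v_sol):
    """Seeds addition u_seeds: TiO2 mass in the added seeds as a percentage
    of the TiO2 mass in the initial solution (Eq. (16))."""
    return c_seeds * v_seeds / (c_tio2 * v_sol) * 100.0


# Illustrative numbers only: concentrations and volumes in consistent units.
u_seeds = seeds_addition_percent(c_seeds=20.0, v_seeds=1.0, c_tio2=200.0, v_sol=10.0)
```

Since the solution volume is the same for each batch, a change in the measured TiO2 concentration shifts the seeds addition even when the added seed volume is unchanged.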

    Eqs. (11)-(15) were constructed with the assumption that

    all the chemical reactions are of the first order, i.e. all the

    exponential parameters are set to 1 (except h1). Initial val-

    ues of the simulated process variables are determined from

    the initial solution concentrations. The result of the system

    of differential equations is the concentration of precipitated

    TiO2gel in time (cTiO2gel ), also denoted as yield, which can

    be represented by the precipitation curve (Fig. 6). The total batch duration is 5 h, with the reaction lasting approximately 1.5 h, while the model time constant is around

    1 h.
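To show how a precipitation curve follows from Eqs. (11)-(15), the sketch below integrates the five balances with a fixed-step fourth-order Runge-Kutta scheme. It is a simplified illustration under stated assumptions, not the authors' implementation: the rate parameters are held constant (isothermal case, no Arrhenius dependence), the seed concentration is treated as a constant, and the values of `k1`, `k4`, `c_seeds` and the initial concentrations are invented. Only `h1`, `h2` and the zero values of `k2` and `k3` are taken from Table 1 as printed.

```python
def rates(c, k1, k2, k3, k4, h1, h2, u_h2o, r_gel, c_seeds):
    """Right-hand sides of Eqs. (11)-(15) for constant rate parameters."""
    tioso4, h2o, tio2, h2so4, gel = c
    fwd = k1 * tioso4 * h2o           # hydrolysis, forward direction
    back = k2 * tio2 * h2so4          # hydrolysis, reverse direction
    nucl = k3 * tio2 * h2so4          # nucleation term
    precip = k4 * tio2 * c_seeds ** h1  # precipitation onto seeds
    return (
        -fwd + back,
        -fwd + back + h2 * u_h2o,
        fwd - back - nucl - precip + r_gel,
        fwd - back,
        precip - r_gel,
    )


def rk4_step(f, c, dt):
    """One classical Runge-Kutta step for a tuple-valued state."""
    s1 = f(c)
    s2 = f(tuple(x + 0.5 * dt * s for x, s in zip(c, s1)))
    s3 = f(tuple(x + 0.5 * dt * s for x, s in zip(c, s2)))
    s4 = f(tuple(x + dt * s for x, s in zip(c, s3)))
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * d + e)
        for x, a, b, d, e in zip(c, s1, s2, s3, s4)
    )


def simulate(c0, params, t_end=5.0, dt=0.01):
    """Return the final state and the curve [(t, c_TiO2gel), ...]."""
    def f(c):
        return rates(c, **params)

    c = tuple(c0)
    curve = [(0.0, c[4])]
    for i in range(1, int(round(t_end / dt)) + 1):
        c = rk4_step(f, c, dt)
        curve.append((i * dt, c[4]))
    return c, curve


# Hypothetical parameter values and initial concentrations (illustration only).
PARAMS = dict(k1=0.5, k2=0.0, k3=0.0, k4=2.0, h1=0.638, h2=0.037,
              u_h2o=0.0, r_gel=0.0, c_seeds=0.02)
C0 = (1.0, 10.0, 0.0, 0.5, 0.0)  # TiOSO4, H2O, TiO2, H2SO4, TiO2gel
final_state, precipitation_curve = simulate(C0, PARAMS)
```

With `k3` set to zero, the sum of the TiOSO4, TiO2 and TiO2gel concentrations stays constant, which is a convenient sanity check on any implementation of these balances.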

    Parameters of the dynamic model were estimated using the

    experimental data set. For that purpose experimental batches

    were performed with additional laboratory measurements

    of precipitation yield at some pre-defined times during the

    batch. Parameter values were obtained by a simplex optimisation method, by which the difference between the measured and computed yield was minimised for a set of experimental


    Fig. 6. Dynamic part of the SDM model: measured (*) and computed ()

    yield during an experimental batch.

    batches. As an example, Fig. 6 shows the precipitation curve

    for comparison with experimental data. In the set of batches

    used for validation, the model predicts the final yield within

    5% of the measured value for 78% of batches and within

    10% for all the batches.

    The performance of the model was also evaluated by the

    sensitivity analysis of the model to changes of model input

    variables. The model sensitivity was in accordance with the a

    priori expert knowledge as well as with information found in

    the literature (Santacesaria, Tonello, Storth, Pace, & Carra,

    1986). The dynamic model is most sensitive to seeds addition and temperature. Besides these two inputs, the initial TiO2 concentration and the active acid concentration also have an important influence on the precipitation curve.

    4.1.2. Static part of the SDM model

    The static part of the semi-empirical dynamic model was

    designed in order to map the precipitation curve into out-

    put quality parameter D50. The structure of the static model

    could not be based on theoretical knowledge. Hence, it was

    determined by trial and error, where different segments of

    the precipitation curve were tested as potential model inputs.

    Also, the seeds addition useeds was considered to have a direct

    influence on the model output. In each step of the selection

    procedure, the chosen model structure was evaluated by the fit

    between the model and the process data. The model structure

    finally chosen is the following:

    $$D_{50} = p_1 K_{23} + p_2 K_{23}^2 + p_3 K_{67} + p_4 K_{89} + p_5 u_{\mathrm{seeds}} + p_6\,\mathrm{RVM} + p_7\,\mathrm{RVM}^2 + p_8 K_{78}\,\mathrm{RVM}^2 + p_9 \qquad (17)$$

    where K denotes the slopes of different parts of the precipitation

    curve shown in Fig. 7 (e.g. K23 denotes the slope between

    the two points where the yield reaches the value of 20 and

    30%), RVM is the maximum slope of the precipitation curve,

    and p denote the model parameters. The chosen segments

    Fig. 7. Segments of the precipitation curve used as inputs in the static part

    of SDM model.

    represent the most informative parts related to the shape of

    the precipitation curve.
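The slope features used as inputs to the static model can be extracted from a sampled precipitation curve as sketched below. The approach (linear interpolation to find when the yield crosses each level) and the function names are assumptions made for illustration; the paper does not state how the slopes were computed.

```python
def crossing_time(times, yields, level):
    """Time at which a rising yield curve first reaches `level` (linear interpolation)."""
    samples = list(zip(times, yields))
    for (t0, y0), (t1, y1) in zip(samples, samples[1:]):
        if y0 <= level <= y1 and y1 > y0:
            return t0 + (level - y0) * (t1 - t0) / (y1 - y0)
    raise ValueError("yield level not reached")


def segment_slope(times, yields, low, high):
    """Slope between the points where the yield reaches `low` and `high`,
    e.g. low=20, high=30 gives K23."""
    t_low = crossing_time(times, yields, low)
    t_high = crossing_time(times, yields, high)
    return (high - low) / (t_high - t_low)


def max_slope(times, yields):
    """Maximum slope of the sampled curve (a simple estimate of RVM)."""
    samples = list(zip(times, yields))
    return max((y1 - y0) / (t1 - t0) for (t0, y0), (t1, y1) in zip(samples, samples[1:]))


# Illustrative curve: time in hours, yield in percent.
t = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 10.0, 50.0, 90.0]
k23 = segment_slope(t, y, 20.0, 30.0)
rvm = max_slope(t, y)
```

For this illustrative curve the 20% and 30% levels are reached at 1.25 h and 1.5 h, so both `k23` and `rvm` equal 40 percent per hour.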

    Parameters p were determined by the least-squares

    method. They were estimated on a selected set of batches

    from normal production. This batch set was chosen in such a

    way that an equal distribution of batches over the entire range

    of measured process output was obtained. The final values of

    the complete model parameters are shown in Table 1.
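    The least-squares estimation described above can be illustrated with a minimal sketch. It assumes a model that is linear in

    its parameters and solves the normal equations directly. The function names, the reduced regressor set (K23, K23 squared,

    u_seeds, RVM and a constant) and the synthetic batches are illustrative assumptions, not the data or code used in the study.

```python
# Hypothetical sketch: ordinary least squares for a model that is linear in
# its parameters, solved through the normal equations (X^T X) p = X^T y.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so that the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def least_squares(regressors, outputs):
    """Return the parameters minimising the sum of squared output errors."""
    n = len(regressors[0])
    xtx = [[sum(r[i] * r[j] for r in regressors) for j in range(n)]
           for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(regressors, outputs))
           for i in range(n)]
    return solve_linear_system(xtx, xty)


def regressor_row(k23, u_seeds, rvm):
    """Illustrative subset of regressors: K23, K23^2, u_seeds, RVM, constant."""
    return [k23, k23 ** 2, u_seeds, rvm, 1.0]


# Synthetic, noise-free batches generated from known parameters (not plant data).
true_p = [1.5, -0.4, 0.8, 0.3, 0.9]
batches = [(0.5, 1.0, 2.0), (0.8, 1.2, 1.5), (1.1, 0.9, 2.5), (1.4, 1.1, 1.8),
           (0.7, 1.4, 2.2), (1.0, 0.8, 1.2), (1.3, 1.3, 2.8)]
rows = [regressor_row(*b) for b in batches]
d50 = [sum(p * x for p, x in zip(true_p, row)) for row in rows]

estimated_p = least_squares(rows, d50)
print(estimated_p)
```

    With noise-free synthetic data the estimated parameters should reproduce the generating ones; with real batches the same

    routine would return the best fit in the least-squares sense.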

    4.1.3. Validation of SDM model

    Fig. 8 shows a comparison between the measured and computed D50 for the semi-empirical model validated on a set of

    regular batches. Two diagrams are presented. The upper diagram shows a chronological history of batches. For purposes

    of better presentation the results are presented with lines, al-

    though they do represent individual batches and should actu-

    ally be represented by points. The lower diagram shows the

    same results sorted according to ascending D50 (measured).

    From the upper diagram it can be seen that the model predicts

    relatively well the variations of the actual process output.

    Seventy-two percent of the batches lie within 5% of the measured D50 and 94% of the batches within 10%. The standard

    deviation of the error between the model and the process is 0.106, which is approximately the same as the standard

    deviation of the actual process output (0.114). It can be expected that a model of this quality can be used to move the

    process output closer to the desired value (the production aim is to produce particles with D50 equal to 2.15, while the

    actual mean value is 2.07), while it cannot significantly reduce the deviations around the desired value. Similar

    conclusions were also drawn from a more thorough analysis in which 2-year plant data were used (Sel et al., 1999).

    Table 1
    Parameters of the SDM model

    Dynamic model          Static model
    k10   0.0174           p1   1.7066
    k20   0                p2   1.379
    k30   0                p3   1.9071
    k40   0.0395           p4   0.2171
    E1    777.12           p5   2.1169
    E2    0                p6   0.8844
    E3    0                p7   1.6041
    E4    1879             p8   1.4882
    h1    0.638            p9   2.2523
    h2    0.037

    Fig. 8. Measured and computed D50 for the SDM model.
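    The output-error figures quoted for the SDM model (share of batches within a relative band, and standard deviation of the

    error compared with that of the process output) can be computed as in the following sketch. The helper name

    fraction_within and the five batches are hypothetical and serve only to show the calculation.

```python
import statistics


def fraction_within(measured, computed, tolerance):
    """Share of batches whose computed value lies within +/- tolerance
    (relative to the measured value) of the measured value."""
    hits = sum(abs(c - m) <= tolerance * abs(m)
               for m, c in zip(measured, computed))
    return hits / len(measured)


# Illustrative numbers only, not plant data.
measured_d50 = [2.00, 2.10, 1.95, 2.20, 2.05]
computed_d50 = [2.04, 2.05, 2.10, 2.18, 2.07]

errors = [m - c for m, c in zip(measured_d50, computed_d50)]
within_5 = fraction_within(measured_d50, computed_d50, 0.05)
within_10 = fraction_within(measured_d50, computed_d50, 0.10)
error_std = statistics.pstdev(errors)        # spread of the model error
output_std = statistics.pstdev(measured_d50)  # spread of the process output

print(within_5, within_10, error_std, output_std)
```

    Comparing error_std with output_std mirrors the comparison made in the text: when the two are of similar size, the model

    can help to shift the mean of the output but is unlikely to reduce its scatter much.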

    The model was also evaluated in relation to the input error.

    The complete model is a multi input single output model,

    with one of the input variables (seeds addition) used as a

    control variable in the intended model-based control. In such

    a case it is especially interesting to evaluate the input error of

    the control variable, since the errors between the actual and

    calculated model output are directly reflected in the wrong

    calculation of the control variable, when the model is used

    for control purposes. Observation of the input error of the

    control variable in the cases where different combinations of process inputs give approximately the same process output

    helps to evaluate whether the control variable is included in

    the model in a proper way. The input error is in our case

    determined as a difference between the actual and calculated

    seeds addition, the latter determined as a value at which the

    model output is equal to the measured process output. The

    SDM model has a nonlinear structure. Therefore, the volume

    of added seeds (and consequently the seeds addition, see Eq.

    (16)) is determined by optimisation so that the difference

    between the computed and measured D50 is minimised, while all

    the other model inputs are equal to measured values.
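    For a nonlinear model the calculated seeds addition has to be found numerically. The sketch below assumes a one-dimensional

    golden-section search over the seeds addition and a deliberately simple stand-in model; toy_model, its coefficients and

    the search interval are hypothetical and are not the SDM model or the optimiser used in the study.

```python
import math


def toy_model(u_seeds, temperature):
    """Hypothetical nonlinear static model of D50 (a stand-in, not the SDM)."""
    return 3.0 - 0.8 * math.sqrt(u_seeds) + 0.002 * (temperature - 100.0)


def golden_section_minimise(f, lo, hi, tol=1e-9):
    """Minimise a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0


def calculated_seeds(measured_d50, temperature, lo=0.1, hi=5.0):
    """Seeds addition at which the model output matches the measured D50,
    with all other inputs kept at their measured values."""
    def objective(u):
        return (toy_model(u, temperature) - measured_d50) ** 2
    return golden_section_minimise(objective, lo, hi)


# One illustrative batch (made-up values).
actual_seeds = 1.0
measured_d50 = 2.15
temperature = 100.0

seeds_calc = calculated_seeds(measured_d50, temperature)
input_error = actual_seeds - seeds_calc  # actual minus calculated
print(seeds_calc, input_error)
```

    The search assumes that the squared mismatch has a single minimum on the chosen interval, which holds here because the

    stand-in model is monotonic in the seeds addition.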

    The input error of the SDM model is presented in Fig. 9.

    For 79% of batches the calculated seeds addition is within

    25% of the actual value.

    4.2. Black-box linear regression models

    Modelling of hydrolysis process was performed also by

    black-box models where output quality parameter D50 is de-

    termined as a static map from process input variables. The

    main question was whether a simple static model structure

    can give similar results as SDM model.

    The same input variables were considered as potential model inputs as in the case of the semi-empirical dynamic model,

    except for the temperature, where the temperature profile used in the case of the SDM model was replaced by the minimum

    (Tmin) and maximum (Tmax) temperature because of the static model structure. The potential regressors of the black-box

    model were all the mentioned input variables, the product Cseeds·Vseeds, and their logarithms, squares and square roots.

    4.2.1. Experimental model with complete model

    structure (EKSC)

    A straightforward approach to model structure selection

    is a choice where all independent variables are included in

    the model. The structure of the model is as follows:

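    $$D_{50} = \theta_0 + \theta_1 T_{min} + \theta_2 V_{seeds} + \theta_3 s + \theta_4 C_{seeds} + \theta_5 w + \theta_6 T_{max} + \theta_7 C_{TiO_2} + \theta_8 AP + \theta_9 AA \qquad (18)$$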

    Fig. 9. Measured and computed seeds addition for the SDM model.

    The values of model parameters were determined on a se-

    lected set of batches from regular production as in the case of

    the static part of the SDM model. Parameter values are given

    in Table 2.

    Table 2
    Parameters of the EKSC model

    Parameter   Value
    θ0          7.5645
    θ1          3.2951 × 10^-2
    θ2          4.4447 × 10^-4
    θ3          3.159 × 10^-2
    θ4          2.5453 × 10^-2
    θ5          1.7923 × 10^-3
    θ6          1.9229 × 10^-2
    θ7          7.8777 × 10^-3
    θ8          2.0235 × 10^-4
    θ9          2.1255 × 10^-3

    In this case too, the model was evaluated in relation to the input and output error. Because of the linear model structure, the input error related to the seeds addition could be determined analytically. For the static linear regression model of the form

    $$y = \alpha r + \beta^T u \qquad (19)$$

    where $r$ is the model input for which the input error is calculated, $\alpha$ is the corresponding model parameter, $u$ is the $n$-vector of remaining model inputs ($u^T = [1 \; u_1 \; u_2 \; \ldots \; u_{n-1}]$), $\beta$ is an $n$-vector of corresponding model parameters, $y$ is the model output and $n$ is the number of all model inputs, the input error associated with the variable $r$ can be determined as

    $$\Delta r = r_p - r|_{y=y_p,\,u=u_p} = r_p - \frac{1}{\alpha} y_p + \frac{1}{\alpha} \beta^T u_p \qquad (20)$$

    where $r_p$, $u_p$ and $y_p$ are the measured values of input and output process variables. From the calculation of the model output error

    $$\Delta y = y_p - y|_{r=r_p,\,u=u_p} = y_p - \alpha r_p - \beta^T u_p \qquad (21)$$

    it can be seen that in the case of linear models, the model input and output error are interrelated in the following way

    $$\Delta r = -\frac{1}{\alpha} \Delta y \qquad (22)$$

    The above expression shows that for a time series of input and output measurements, and corresponding model computations, the dynamics of the input error are equal to the dynamics of the output error. However, a model that exhibits a small output error does not necessarily give a small input error as well. The corresponding model parameter $\alpha$ is responsible for the absolute input error of the model.
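    The relation between input and output error for a linear model can be checked numerically. In the sketch below the model is

    written as y = alpha * r + sum(beta_i * u_i); the names alpha and beta and all numbers are assumptions chosen for

    illustration. Two hypothetical models give the same output error for one batch but very different input errors, because

    the parameter multiplying r differs.

```python
def model_output(alpha, beta, r, u):
    """Static linear model y = alpha * r + beta^T u."""
    return alpha * r + sum(b * x for b, x in zip(beta, u))


def output_error(alpha, beta, r_p, u_p, y_p):
    """Measured output minus model output at the measured inputs."""
    return y_p - model_output(alpha, beta, r_p, u_p)


def input_error(alpha, beta, r_p, u_p, y_p):
    """Measured input r_p minus the input at which the model reproduces y_p."""
    r_calc = (y_p - sum(b * x for b, x in zip(beta, u_p))) / alpha
    return r_p - r_calc


# One illustrative batch: u_p = [1, u1], where the leading 1 carries the constant.
r_p, u_p, y_p = 400.0, [1.0, 95.0], 2.15

# Two hypothetical parameter sets with the same output at this batch.
alpha_a, beta_a = 0.002, [0.35, 0.01]
alpha_b, beta_b = 0.0005, [0.95, 0.01]

dy_a = output_error(alpha_a, beta_a, r_p, u_p, y_p)
dy_b = output_error(alpha_b, beta_b, r_p, u_p, y_p)
dr_a = input_error(alpha_a, beta_a, r_p, u_p, y_p)
dr_b = input_error(alpha_b, beta_b, r_p, u_p, y_p)

print(dy_a, dr_a)  # same output error as model b ...
print(dy_b, dr_b)  # ... but a four times larger input error
```

    In both cases the input error equals minus the output error divided by alpha, so a poorly estimated (too small) alpha

    inflates the input error even when the output fit looks good.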

    The input error associated with the seeds addition was

    determined in accordance with (20) as a difference between

    the actual and computed seeds addition, the latter determined

    so that the model output is equal to measured D50, while all

    the other model inputs are equal to measured values. The

    computation is performed by first calculating the volume of

    added seeds Vseeds from (18) and then determining the seeds

    addition in accordance with (16).

    Fig. 10. Measured and computed D50 for the EKSC model.

    Figs. 10 and 11 show the EKSC model output and input

    error, respectively. Fig. 10 shows that also the EKSC model

    is characterised by a good agreement of model output with

    process data. Seventy-seven percent of the batches lie within 5% of the measured D50 and 97% of the batches within 10%.

    However, from Fig. 11 it can be seen that the EKSC model gives much worse results in relation to the input error. For only

    60% of the batches is the calculated seeds addition within 25% of the actual value. The reason for the poor model

    performance associated with the model input error is the estimation of model parameters on regular data. These data are

    not sufficiently excited in relation to seeds addition, which results in an inadequate estimate of the corresponding

    model parameter.

    Fig. 11. Measured and computed seeds addition for the EKSC model.

    Fig. 12. Variations of on-line adapted parameters of EKSS model with indicated confidence limits.

    4.2.2. Experimental model with statistical choice of

    process variables (EKSS)

    To obtain a more relevant model structure and parameters, the black-box model was also determined by multiple linear

    regression and identified on experimental data, which include the necessary excitation of recipe parameters. A stepwise

    regression (Weisberg, 1985) was used to choose the independent variables that are most related to the output variable

    D50. At each step of the selection procedure, t tests were performed to decide whether to add a variable to the model,

    delete a variable, or exchange two variables. The t values to enter or remove a variable from the model were chosen as

    1 and 2.5, respectively. At each step, the parameters of the model were determined by the least-squares method. The

    obtained model is as follows:

    $$D_{50} = \theta_0 + \theta_1 T_{min} + \theta_2 V_{seeds} + \theta_3 s \qquad (23)$$

    Testing of the above model on data from regular production

    did not give a satisfactory model fit. The average model output

    error was about 25% larger compared to the SDM model.

    Hence, on-line parameter adaptation was considered as a test

    of whether the on-line adjustment of model parameters could

    improve model performance.

    Model adjustment is used to compensate for an approx-

    imate model structure and to take into account the time-

    varying process characteristics. On the other hand, it can also

    help in model validation as explained in the description of

    distortion method in Section 2. With the adaptation of model

    parameters, the model better fits the measured output data.

    If that requires considerable changes of model parameters,

    the model is not of a good quality. On the contrary, if the

    parameter values vary within certain limits, the model can be

    qualified as a good one.

    Fig. 12 shows the values of on-line adapted EKSS model

    parameters for a set of batches from regular production. The straight dashed lines indicate the parameters' confidence

    intervals. The t statistic was used to determine the confidence intervals for each parameter (Weisberg, 1985), while the

    parameter values themselves were initially determined on a set of experimental batches. A 95% confidence interval was

    used to determine the constraints on parameter values.

    From Fig. 12 it can be seen that during regular production

    the model parameters mainly vary within the specified range,

    except for the parameter associated with the volume of added

    seeds, whose value falls far below the lower limit. This

    again confirms that regular plant data are not appropriate

    for estimating this model parameter.

    It should also be noted that the input variables in (23) do not include the concentrations of the input solution.

    Therefore, the model could not be applied in flexible recipe control, where initial deviations of the input concentrations

    should be compensated for by changing the seeds addition. The reason for the statistical choice of independent variables,

    in which the input solution concentrations are excluded from the model, lies in the excitation of the experimental batches.

    In this set of batches, the excitation of some of the input variables is high compared to the variations of the input

    solution concentrations. Therefore, the latter were not found to be statistically significant enough to be correlated with

    the process output. As already mentioned, the input solution concentrations are uncontrollable and could not be purposely

    changed during experiments.

    4.2.3. Experimental model with on-line structure

    identification (EKSO)

    Being aware of the above-explained deficiencies of the available process data, the experimental black-box model was

    also designed so as to employ the beneficial qualities of both groups of data:

    - experimental batches, where high excitation of some input variables (especially seeds) ensures the model validity in a wider operating region, and

    - normal plant operation, where regular plant excitation (especially variations of input solution) affects the process output and provides the information necessary for using the model in flexible recipe control.

    The procedure for constructing the model with the above

    properties is shown in Fig. 13. From the left part of the scheme

    it can be seen that the design of the model structure starts with

    the experimental batches, from which a basic model structure

    is identified and includes statistically important model inputs

    that were selected on experimental batches (23). The model

    structure is then completed with additional input variables

    that are statistically determined from regular data. The pro-

    cedure of selecting the additional input variables is repeated

    every 100 batches.

    The parameters of this model are computed from regular plant data and are adjusted on-line every 100 batches, except for:

    - the model parameter associated with the volume of added seeds, which is kept at a constant value determined from experimental batches;

    - the model constant, which is optimised on-line over the last four batches.

    The first exception is used since the volume of added seeds is the only input of the basic model structure whose excitation

    in experimental batches is different (higher) than in regular batches. The second exception was determined experimentally,

    and can be explained by the dynamics of the input solution variations. Although the batches of the hydrolysis process are

    performed independently of one another, some similarity in the operating conditions of successive batches occurs because

    of the similar quality of the input solution. The input solution is prepared in preceding production processes and its

    characteristics change with the dynamics of these processes. While the influence of measured input parameters on the

    hydrolysis process could be directly represented in the hydrolysis process model, the influence of dynamic changes of

    unmeasured parameters can only be taken into account by adjusting the model constant parameter. By simulation it was

    determined that the optimal number of batches for constant model parameter adjustment is four.

    Fig. 13. Procedure for constructing the EKSO model.
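    The on-line adjustment of the model constant can be sketched as follows. The sketch assumes that "optimised" means a

    least-squares fit of the constant over the window, in which case the optimum is simply the mean residual of the last four

    batches; this reading, the function names and the data are assumptions made for illustration.

```python
def partial_output(theta, inputs):
    """Model output without the constant term."""
    return sum(t * x for t, x in zip(theta, inputs))


def adapted_constant(theta, batch_inputs, batch_d50, window=4):
    """Least-squares estimate of the model constant over the last `window`
    batches: the mean of measured D50 minus the constant-free model output."""
    recent = list(zip(batch_inputs, batch_d50))[-window:]
    residuals = [d50 - partial_output(theta, x) for x, d50 in recent]
    return sum(residuals) / len(residuals)


# Hypothetical fixed parameters and five successive batches (made-up values).
theta = [0.5, -0.1]
batch_inputs = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.0], [1.0, 1.9]]

# Synthetic outputs: an unmeasured disturbance shifts the offset from 1.0 to 2.0
# after the first batch.
offsets = [1.0, 2.0, 2.0, 2.0, 2.0]
batch_d50 = [partial_output(theta, x) + c for x, c in zip(batch_inputs, offsets)]

constant = adapted_constant(theta, batch_inputs, batch_d50)
print(constant)
```

    Because only the last four batches enter the average, the constant follows slow drifts in unmeasured input-solution

    properties without being dominated by older batches.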

    The structure and parameters of the EKSO model as de-

    termined on the available set of regular plant data are given

    in Table 3, except for the constant model parameter, which is

    adjusted on-line. Figs. 14 and 15 show the obtained results in

    relation to the model output and input error, respectively. It

    can be seen that good results have been obtained in relation

    to both. Eighty-four percent of the batches lie within 5%

    of the measured D50 and 97% of the batches within 10%. For

    91% of the batches the calculated seeds addition is within 25%

    of the actual value.

    Table 3
    Structure and parameters of the EKSO model

    Successive batches   Input variable   Associated parameter value
    1-100                Vseeds           7.999 × 10^-4
                         Tmin             4.22 × 10^-3
                         s                8.95 × 10^-3
                         CTiO2            7.33 × 10^-3
                         Tmax             5.85 × 10^-2
                         w                2.02 × 10^-4
    101-200              Vseeds           7.999 × 10^-4
                         Tmin             5.86 × 10^-2
                         s                4.37 × 10^-2
                         AA               3.733 × 10^-3
                         Tmax             7.64 × 10^-2
                         Cseeds           2.37 × 10^-2
    201-300              Vseeds           7.999 × 10^-4
                         Tmin             5.22 × 10^-2
                         s                4.37 × 10^-2
                         Cseeds           1.918 × 10^-2
                         w                1.352 × 10^-5
    301-400              Vseeds           7.999 × 10^-4
                         Tmin             2.47 × 10^-2
                         s                3.05 × 10^-2
                         AA               1.077 × 10^-2
                         Cseeds           2.304 × 10^-2
                         CTiO2            1.67 × 10^-2

    In order to test the sensitivity of the obtained model to the intended control variable, i.e. seeds addition, a simulation experiment was also performed in which the addition of seeds was computed for the batches from normal production with the aim of achieving the output quality parameter D50 equal to 2.15, which is the desired production goal. With such an experiment it is possible to check whether the model has an appropriate sensitivity to the intended control variable, and to verify that the low input error is not a result of low model sensitivity to seeds addition. Because of the static linear model structure of the form (19), the optimal value of the control variable $r_o$ for a desired value of process output $y_o$ can be determined analytically in the following way:

    $$r_o = \frac{1}{\alpha} y_o - \frac{1}{\alpha} \beta^T u_p \qquad (24)$$

    where $y_o$ is in our case the desired value of D50 and is equal to 2.15, $r_o$ is the calculated optimal volume of added seeds, from which the optimal seeds addition is determined in accordance with (16), $\alpha$ is the model parameter associated with the volume of added seeds, and $u_p$ is the vector of remaining model inputs, with the corresponding parameter vector $\beta$, as presented in Table 3.

    The obtained results are shown in Fig. 16. The figure shows

    that the model expresses the necessary sensitivity to seeds

    addition required for flexible recipe control. The adjustment

    of seeds addition is in the range expected from the expert

    knowledge.
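    The analytical computation of the control variable for a static linear model can be sketched in a few lines. The model is

    again written as y = alpha * r + sum(beta_i * u_i); the parameter values and inputs below are hypothetical and are not

    taken from Table 3.

```python
def optimal_control(alpha, beta, u_p, y_target):
    """Value of the control input r at which the linear model
    y = alpha * r + beta^T u reaches y_target for the measured inputs u_p."""
    return (y_target - sum(b * x for b, x in zip(beta, u_p))) / alpha


def model_output(alpha, beta, r, u):
    return alpha * r + sum(b * x for b, x in zip(beta, u))


# Hypothetical parameters; u_p = [1, u1], where the leading 1 carries the constant.
alpha = -0.0008
beta = [3.0, -0.005]
u_p = [1.0, 96.0]
d50_target = 2.15  # desired production goal quoted in the text

r_opt = optimal_control(alpha, beta, u_p, d50_target)
print(r_opt, model_output(alpha, beta, r_opt, u_p))
```

    Substituting the computed value back into the model returns the target, which is a quick consistency check. The size of

    the adjustment scales with 1/alpha, so a model with a very small alpha would demand unrealistically large changes in

    seeds addition; this is the sensitivity check described above.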

    5. Discussion

    In addition to observing the process and model behaviour

    visually, the models were also validated and mutually com-

    pared by computing different quantitative measures repre-

    sented by RMSE (1), TIC (2) and REL measures (3) ex-

    plained in Section 2. These measures were used to evaluate

    the output and input error of different model structures. The

    values of these criteria with indicated best performance are

    shown in Tables 4 and 5 for output and input errors, respectively. The measures were used to mutually compare the different

    models. As explained in the description of these criteria, low values of these measures mean better model quality.

    The values of the RMSE criterion are also shown graphically in Fig. 17.

    Fig. 14. Measured and computed D50 for the EKSO model.

    Fig. 15. Measured and computed seeds addition for the EKSO model.

    Table 4
    Model output errors

    Model   RMSE     TIC      REL
    SDM     0.1061   0.0255   0.0504
    EKSC    0.0899   0.0216   0.0427
    EKSO    0.0922   0.022S   0.0443

    Table 5
    Model input errors

    Model   RMSE     TIC      REL
    SDM     0.1042   0.0967   0.1909
    EKSC    0.1657   0.1530   0.3040
    EKSO    0.0934   0.0873   0.1702

    It can be seen that different measures mutually qualify

    the models in the same way, i.e. the model with the lowest

    RMSE value also has the lowest TIC and REL values. The tables show that all the models are approximately of the same

    quality in relation to the output error (see also Fig. 17). The

    best performance in relation to output error is observed for

    the EKSC model. This model is, however, not acceptable for

    the intended use due to the high input error. The SDM and

    EKSO models have low input and output errors and are most

    appropriate for use in the flexible recipe control. Hence, the

    same conclusions were derived as from observing the model

    behaviour qualitatively.
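    Two of the quantitative measures can be sketched as follows. RMSE is the usual root-mean-square error. For TIC the sketch

    uses the commonly cited form of Theil's inequality coefficient; that this form coincides exactly with Eq. (2) of the paper

    is an assumption, and the REL measure is omitted because its definition is not reproduced in this section. The data are

    made up.

```python
import math


def rmse(measured, computed):
    """Root-mean-square error between measured and computed values."""
    n = len(measured)
    return math.sqrt(sum((m - c) ** 2 for m, c in zip(measured, computed)) / n)


def theil_inequality_coefficient(measured, computed):
    """Commonly cited form of Theil's inequality coefficient:
    0 means a perfect fit, 1 the worst possible fit."""
    num = math.sqrt(sum((m - c) ** 2 for m, c in zip(measured, computed)))
    den = (math.sqrt(sum(m ** 2 for m in measured))
           + math.sqrt(sum(c ** 2 for c in computed)))
    return num / den


# Illustrative values only, not plant data.
measured = [2.00, 2.10, 1.95, 2.20, 2.05]
model_a = [2.04, 2.05, 2.10, 2.18, 2.07]
model_b = [2.10, 2.00, 2.15, 2.05, 2.15]

for name, computed in (("A", model_a), ("B", model_b)):
    print(name, rmse(measured, computed),
          theil_inequality_coefficient(measured, computed))
```

    When the compared models are evaluated on the same batches, both measures rank them in the same order, which agrees with

    the observation in the text that the different criteria qualify the models in the same way.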

    Fig. 16. Optimal control signal for the EKSO model (D50 aim= 2.15).

    Fig. 17. The values of the RMSE criterion for input and output error of different models.

    For use in an on-line control scheme, the SDM model is

    more demanding. For its implementation it is necessary to use

    appropriate software for simulating differential equations,

    and optimisation routines for determining the optimal control

    signal. Optimisation can be time-consuming, although in this particular case it does not represent a time-critical task.

    The implementation of the EKSO model is much simpler.

    For this model the control signal is computed analytically.

    Also, the identification of the EKSO model is simpler,

    although the available data about the process in our case

    requires us to perform the model design procedure very

    carefully.

    Besides obvious advantages also some drawbacks of the

    experimental models compared to the SDM model can be

    noticed. When using the SDM model on regular data no

    problems regarding the sensitivity of model to seeds addition

    were encountered, as was the case for experimental models. This is due to the dynamic part of the model that ensures the

    appropriate model dependency on seeds addition despite a

    low excitation signal during normal production.

    6. Conclusions

    In this paper, the design of models to be used in the flexible

    recipe control scheme for control of the batch hydrolysis process

    is presented. The model design and testing procedure

    has shown that the process models used for control purposes

    should be validated not only in relation to the output error, but

    also in relation to the input error and optimal control signal.

    Only the three measures used together provide the necessary

    model validation in relation to model use. The output error

    shows whether the designed model represents the overall pro-

    cess behaviour accurately enough; the input error shows how

    accurately the process control variable is incorporated into

    the model; while the optimal control signal shows whether

    the model expresses the necessary (or expected) sensitivity

    of the process control variable.

    In our particular case, the three measures enabled us to validate

    different model structures: the semi-empirical dynamic

    model and different black-box static models. Furthermore,

    they helped to improve the model design procedure in such

    a way that a simple and satisfactory experimental model was

    constructed despite a very limited data set for model identifi-

    cation. Such a model was designed by combining experimen-

    tal and regular plant data that together provided the necessary

    excitation for the statistical choice of appropriate model input

    variables and model parameters. Black-box identification has shown that observing only the model output error would

    in this case result in a wrong model structure, which is not

    suitable for the intended model use.

    Two of the designed models are appropriate for use in

    flexible recipe control of the hydrolysis process: the semi-

    empirical dynamic model (SDM) and one of the experimen-

    tal models (EKSO). Both perform satisfactorily in relation to

    input and output errors, and provide the necessary sensitivity

    of the intended control variable. The implementation of the

    EKSO in the on-line control scheme is simpler, as it deter-

    mines the optimal control signal analytically, while the SDM

    model requires optimisation. On the other hand, the SDM

    model is expected to be valid and more reliable in a wider operating region due to a more complex model structure.

    References

    Bohlin, T. (1991). Interactive system identification: Prospects and pitfalls. Springer-Verlag.

    Bossel, H. (1994). Modeling and simulation. Wiesbaden: Verlag Vieweg.

    Cameron, R., Marcos, R. L., & de Prada, C. (1998). Model validation of discrete transfer functions using the distortion method. Mathematical and Computer Modelling of Dynamical Systems, 4, 58-72.

    Goodwin, G. C. (2002). Inverse problems with constraints. In Proceedings of the 15th triennial world congress of the international federation of automatic control.

    Gray, G. J., & von Grunhagen, W. (1998). An investigation of open-loop and inverse simulation as nonlinear model validation tools for helicopter flight mechanics. Mathematical and Computer Modelling of Dynamical Systems, 4, 32-57.

    Kheir, N. A. (1988). Systems modelling and computer simulation. New York: Marcel Dekker.

    Ljung, L. (1999). System identification. Englewood Cliffs: Prentice-Hall, Inc.

    Murray-Smith, D. J. (1995). Continuous system simulation. London: Chapman & Hall.

    Murray-Smith, D. J. (1998). Methods for the external validation of continuous system simulation models: A review. Mathematical and Computer Modelling of Dynamical Systems, 4, 5-31.

    Murray-Smith, D. J. (2000). The inverse simulation approach: A focused review of methods and applications. Mathematics and Computers in Simulation, 53, 239-247.

    Neelamkavil, F. (1987). Computer simulation and modelling. Chichester: John Wiley & Sons.

    Qureshi, M. E., Harrison, S. R., & Wegener, M. K. (1999). Validation of multicriteria analysis models. Agricultural Systems, 62, 105-116.

    Rijnsdorp, J. E. (1991). Integrated process control and automation. Amsterdam: Elsevier Science.

    Sage, A. P. (1992). Validation. In D. P. Atherton & P. Borne (Eds.), Concise encyclopaedia of modelling and simulation (pp. 477-488). Oxford: Pergamon Press.

    Santacesaria, E., Tonello, M., Storth, G., Pace, R. C., & Carra, S. (1986). Kinetics of titanium dioxide precipitation by thermal hydrolysis. Journal of Colloid and Interface Science, 11(1), 44-53.

    Sel, D., Hvala, N., Strmcnik, S., Milanic, S., & Suk-Lubej, B. (1999). Experimental testing of flexible recipe control based on a hybrid model. Control Engineering Practice, 7(10), 1191-1208.

    Thiel, H. (1970). Economic forecasting and policy. Amsterdam: North-Holland.

    Verwater-Lukszo, Z. (1998). A practical approach to recipe improvement and optimization in the batch processing industry. Computers in Industry, 36, 279-300.

    Weisberg, S. (1985). Applied linear regression. New York: John Wiley & Sons.

    Zhou, X. (1993). A new method with high confidence for validation of computer simulation models for flight systems. Chinese Journal of Systems Engineering and Electronics, 4, 43-52.

    Zele, M., Juricic, D., Strmcnik, S., & Matko, D. (1998). A probabilistic measure for model purposiveness in identification for control. International Journal of Systems Science, 29, 653-662.