On Establishing the Credibility of a Model for a System




    SANTOKH SINGH

    Department of Mathematics, Indian Institute of Technology, Kanpur, India

A situation is considered where a number of models can be proposed for a system: a chemical reaction, for instance. These models, based on a certain pattern of behaviour or mechanism, are claimed to be equally plausible for the given system. The model which most adequately describes the underlying phenomenon must, therefore, be chosen. A novel procedure is proposed for accomplishing this task. A versatile distance function is employed for designing experiments as well as for analysis of data for discrimination among the proposed models, so as to establish the credibility of a single model. The potential of the procedure is demonstrated through its implementation to linear and nonlinear mechanistic models, comparing as well its performance with other procedures reported in the literature.

    Keywords: design; discrimination; distance; sequential

    INTRODUCTION

Quite often more than one model can be proposed for a system. Such a situation arises quite naturally in processes where, based on different mechanisms thought to be plausible for the given system, a number of models can be postulated. For example, in the catalytic dehydrogenation of ethyl alcohol to ether over an acid ion-exchange resin, three models could be postulated for describing the reaction (Kabel and Johanson1). In another situation, five kinetic models could be proposed for the synthesis of methanol from carbon monoxide and hydrogen (Buzzi-Ferraris and Donati2). Similarly, in the dehydrogenation of 1-butene to butadiene, several models can be proposed not only on the basis of five possible reaction schemes, but also depending upon the rate-determining step in each reaction scheme (Dumez and Froment3). The necessity in all these situations is of reporting one model which could most adequately describe the reaction rate, for instance. In general, in such multi-model situations it is important to select the model which most closely predicts the observations or outcomes of the actual experimentation. Different sequential strategies have been proposed in the literature for tackling this problem of model discrimination. Hunter and Reiner4, Roth5, and Hosten and Froment6 have proposed to design an experiment in such a way that the fitted surfaces at the resulting observation would be farthest apart. These criteria are based solely on the predictions and do not take into account the errors which, because of the parameter estimation, might have accrued in the predicted values. There are criteria, however, which take into account the variances of predictions, too. But some of these have been criticized for the rationale used in developing them. For instance, Meeter et al.7 and Reilly8 have expressed their surprise over maximizing the upper bound of the expected entropy change which forms the basis of the Box-Hill9 discrimination procedure. Meeter et al.7 are doubtful whether the posterior probabilities of models, used as weights in the Box-Hill9 design criterion, effectively do their job. Siddik10, Froment and Mezaki11, Wentzheimer12, Atkinson and Cox13, and Hill14 have observed oscillating behaviour of the model probabilities which are supposed to provide a guideline for model discrimination in the Box-Hill9 method.

So far as the practical utility of these procedures is concerned, varied experiences have been reported by the investigators who have used the procedures which consider the variances of predictions. Hunter and Mezaki15 have been able to discriminate among chemical kinetic models through the Box-Hill9 procedure. On the other hand, Buzzi-Ferraris and Forzatti16 have demonstrated through an example that the Box-Hill procedure may sometimes perform very poorly in identifying the correct model. They have instead used the ratio of two estimates of the error variance as the design criterion and the test of model adequacy through an F-test as the discrimination criterion. Buzzi-Ferraris and Forzatti16 have claimed the superiority of their procedure over that of Box and Hill9, but while discriminating among five kinetic models they terminated their procedure after 30 runs by declaring four models as equivalent on statistical grounds. Keeping in view the total number (five) of the competing models, the number (four) finally selected is rather too large for reasonable discrimination.

In the present study a different approach is used to develop a discrimination procedure, free from certain flaws and shortcomings which other procedures proposed in the literature are reported to have. A statistic is proposed for the assessment of the relative adequacies of the competing models, so as to take a decision on the best model. Also, a weighted function is formulated which can be used as a criterion function for designing additional discriminating experiments. The present work also proposes a termination criterion as may be required in the sequential approach. All this is done, basically, through the statistical distance between the probability distributions associated with the rival models. The implementation of the proposed


0263-8762/98/$10.00+0.00 Institution of Chemical Engineers

    Trans IChemE, Vol 76, Part A, September 1998

discrimination procedure is demonstrated through examples, comparing as well its performance with other procedures reported in the literature.

    STATISTICAL FORMULATION OF THE PROBLEM

Let $g^{(o)}(\xi_k, \theta)$, with the set of parameter values $\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$, represent the true value of the response obtained at the input variables $\xi_k = (\xi_{1k}, \xi_{2k}, \ldots, \xi_{qk})'$. Obviously, the model $M^{(o)} : g^{(o)}$, though unknown, is entitled to be the true model for the system. Without deviating from reality, $M^{(o)}$ and `the system' will be used interchangeably in the subsequent discussion. As long as the underlying phenomenon occurs under the influence of chance causes, the response $Y_k$ can be represented by the equation involving $M^{(o)}$; namely

$$Y_k = g^{(o)}(\xi_k, \theta) + e_k. \tag{1}$$

The component $e_k$ in (1) accounts for the randomness peculiar to the system output and is assumed to be distributed normally with mean zero and variance $\sigma^2$. The representation (1) suggests that $Y_k$ is a random variable with a certain probability distribution $P_k^{(o)}$, say.

In actual practice, the realization of a true model such as $M^{(o)}$ would be a mere hypothesis. What one actually looks for is a model which could behave considerably akin to $M^{(o)}$ (i.e., to the system), if not exactly. Consider a situation where $m$ models $M^{(1)}, M^{(2)}, \ldots, M^{(m)}$ have been proposed which are claimed to be equiplausible for the given system. Under the hypothesis that model $M^{(i)} : g^{(i)}(\xi_k, \theta^{(i)})$ is correct, the response $Y_k$ can be described, alternatively, by the equation

$$Y_k = g^{(i)}(\xi_k, \theta^{(i)}) + e_k^{(i)} \tag{2}$$

involving model $M^{(i)} : g^{(i)}$, where $g^{(i)}$ represents the functional form of this model [linear or nonlinear in the parameters $\theta^{(i)} = (\theta_1^{(i)}, \theta_2^{(i)}, \ldots, \theta_{p_i}^{(i)})'$]. Under model $i$, $e_k^{(i)}$ in (2) is assumed to be normally distributed with mean zero and variance $\sigma_{i;k}^2$. The representation (2) suggests that the response $Y_k$ has an alternative probability distribution $P_k^{(i)}$, say. In the situation under study, there are $m$ representations (models), such as (2), and hence $m$ probability distributions $P_k^{(1)}, P_k^{(2)}, \ldots, P_k^{(m)}$.

    THE APPROACH

There is no denying the fact that a model, if correct for a system, must be capable of simulating such observations as would have been generated by the system itself. There is, however, always a discrepancy in a model which would, somehow, be reflected through the lack of affinity (in the statistical sense) between the observations simulated by the model and those generated by the system. This suggests that the affinity between the joint probability distributions $Q^{(o)}$ and $Q^{(i)}$, say, of the response variables $Y_1, Y_2, \ldots, Y_n$ under models $M^{(o)}$ and $M^{(i)}$, respectively, can be utilized for looking at the discrepancy of $M^{(i)}$ from $M^{(o)}$, i.e., the credibility of $M^{(i)}$ for the system. In the presence of $m$ models, the said affinity will vary from model to model, depending upon the proximity of a model to the system. A comparison of the affinities of the rival models to the system could, therefore, provide a guideline for discrimination among them.

It further follows that an experiment $\xi_{n+1}$, say, could be most informative from the discrimination point of view if the measurement $y_{n+1}$ resulting from it could render the probability distributions $P_{n+1}^{(1)}, P_{n+1}^{(2)}, \ldots, P_{n+1}^{(m)}$, under the rival models, least affine to one another, i.e., most dissimilar in the statistical sense. Therefore, for planning discriminating experiments the affinity between models, and for that reason between the pairs of corresponding probability distributions, can be utilized to formulate a design criterion.

    MODEL CREDIBILITY CRITERION

One possible measure of affinity between two distributions is the statistical distance between the corresponding probability density functions (p.d.f.s). In fact, the assessment of the credibility of a model can be made through a function which measures the distance in such a way that

(i) the distance between two distributions is a non-negative number and is actually positive if the distributions are different;
(ii) measured in either direction, the distances between two distributions are the same.

Also, keeping in view the fact that an $n$-dimensional Normal distribution can be completely specified by $n$ location and $\{n(n+1)/2\}$ orientation parameters, the dissimilarity between any two normal distributions can be judged through the disagreement between these two types of parameters. Since the random errors $e_k$ and $e_k^{(i)}$ in the model representations (1) and (2) of the response are normally distributed, the appropriate function which can incorporate such a disagreement as well as satisfy properties (i) and (ii) is

$$C(f^{(o)}, f^{(i)}) = -\log_e \int \left[ f^{(o)}(y)\, f^{(i)}(y) \right]^{1/2} dy, \tag{3}$$

where $f^{(o)}$ and $f^{(i)}$ are the p.d.f.s corresponding to the joint probability distributions $Q^{(o)}$ and $Q^{(i)}$ attributable, respectively, to the experimental observations and to the model `$i$' hypothesized to be true.
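For two univariate normal densities, the integral in (3) can be checked numerically. The following Python sketch (illustrative means and variances, not taken from the paper) approximates $C$ by trapezoidal integration and confirms properties (i) and (ii):

```python
import math

# A minimal numerical sketch of the distance (3) for two univariate normal
# p.d.f.s (illustrative parameter values, not from the paper).

def normal_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def distance_C(mu1, var1, mu2, var2, lo=-30.0, hi=30.0, n=60000):
    # Trapezoidal approximation of  -log_e  integral sqrt(f(y) g(y)) dy.
    h = (hi - lo) / n
    s = 0.0
    for k in range(n + 1):
        y = lo + k * h
        w = 0.5 if k in (0, n) else 1.0
        s += w * math.sqrt(normal_pdf(y, mu1, var1) * normal_pdf(y, mu2, var2))
    return -math.log(s * h)

c12 = distance_C(0.0, 1.0, 2.0, 1.5)
c21 = distance_C(2.0, 1.5, 0.0, 1.0)
print(c12 > 0.0)                  # property (i): positive for different p.d.f.s
print(abs(c12 - c21) < 1e-9)      # property (ii): symmetric in its arguments
print(distance_C(0.0, 1.0, 0.0, 1.0) < 1e-6)  # and zero for identical p.d.f.s
```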

    Formulation of the Credibility Criterion

As a result of conducting $n$ experiments, what one actually gets are realizations $y_1, y_2, \ldots, y_n$, each of a different random variable $Y_k$, $k = 1, 2, \ldots, n$, whose distribution depends on $\xi_k$. These data can, therefore, be considered as a single multivariate sample from the joint distribution of a large number of observations made in the course of a series of experiments. Since the fluctuations $e_k$ are distributed normally, the p.d.f. of the random vector $Y = (Y_1, Y_2, \ldots, Y_n)'$ can be written

$$f_o(y) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp[-(1/2)(y - g^{(o)})' \Sigma^{-1} (y - g^{(o)})], \tag{4}$$

where $y = (y_1, y_2, \ldots, y_n)'$, $g^{(o)} = (g_1^{(o)}, g_2^{(o)}, \ldots, g_n^{(o)})'$ and $\Sigma = \sigma^2 I_n$.

Alternatively, consider the hypothesis that $M^{(i)}$ is the correct model [equation (2)]. Assume for the time being that the parameters of this model are known. Then, as a result of $n$ predictions made from $M^{(i)}$ for the settings $\xi_1, \xi_2, \ldots, \xi_n$, what one gets are the realizations each of a different random


variable $Y_k$, $k = 1, 2, \ldots, n$, whose distribution depends on $\xi_k$ and the parameters $\theta^{(i)}$. These data can be considered as a single multivariate sample from the joint distribution of a large number of observations simulated by $M^{(i)}$. Since the random component $e_k^{(i)}$ in (2) is distributed normally with mean zero and variance $\sigma_{i;k}^2$, under the hypothesis that model `$i$' is correct, the random vector $Y$ has the alternative p.d.f.

$$f_i(y) = (2\pi)^{-n/2} |\Sigma_i|^{-1/2} \exp[-(1/2)(y - g^{(i)})' \Sigma_i^{-1} (y - g^{(i)})], \tag{5}$$

where $g^{(i)} = (g_1^{(i)}, g_2^{(i)}, \ldots, g_n^{(i)})'$ and $\Sigma_i = \mathrm{diag}\{\sigma_{i;1}^2, \sigma_{i;2}^2, \ldots, \sigma_{i;n}^2\}$. Using the p.d.f.s (4) and (5) in (3), the distance between the system and model $i$ can be written [(B.6), Appendix B]

$$C(f_o, f_i) = \frac{1}{8}\,(g^{(o)} - g^{(i)})' \left[\frac{\Sigma + \Sigma_i}{2}\right]^{-1} (g^{(o)} - g^{(i)}) + \frac{1}{4}\left[2\log_e\left|\frac{\Sigma + \Sigma_i}{2}\right| - \log_e|\Sigma| - \log_e|\Sigma_i|\right], \quad i = 1, 2, \ldots, m. \tag{6}$$
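With $\Sigma = \sigma^2 I_n$ and $\Sigma_i$ diagonal, as assumed throughout, every matrix in (6) is diagonal and the distance reduces to a sum of univariate terms. A short Python sketch of this special case (the vectors below are illustrative, not from the paper):

```python
import math

# Equation (6) specialized to diagonal covariances (Sigma = diag(var_o),
# Sigma_i = diag(var_i)); all numbers here are illustrative.

def distance_C6(g_o, g_i, var_o, var_i):
    total = 0.0
    for go, gi, vo, vi in zip(g_o, g_i, var_o, var_i):
        vbar = 0.5 * (vo + vi)   # k-th diagonal entry of (Sigma + Sigma_i)/2
        total += (go - gi) ** 2 / (8.0 * vbar)                   # location part
        total += 0.25 * (2.0 * math.log(vbar)
                         - math.log(vo) - math.log(vi))          # orientation part
    return total

g_o = [1.0, 2.0, 3.0]            # "system" mean responses
g_i = [1.1, 1.8, 3.2]            # rival model's mean responses
print(distance_C6(g_o, g_o, [1.0] * 3, [1.0] * 3))  # identical: distance 0.0
print(round(distance_C6(g_o, g_i, [1.0] * 3, [1.2, 0.9, 1.1]), 4))
```

The first (location) component vanishes when the mean responses agree, and the second (orientation) component vanishes when the variances agree, mirroring the decomposition discussed below equation (6).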

An examination of expression (6) shows that the first component defining $C$ measures the dissimilarity between the two distributions with respect to their locations, while the second distinguishes them in terms of their orientations.

If the parameters $g^{(o)}, g^{(i)}, \Sigma, \Sigma_i$ of the two distributions involved in (6) are known, then $C(f_o, f_i)$ can measure exactly the discrepancy of model $i$ from the system. In actual practice, however, these parameters are generally not known. One can, then, use their estimates and obtain rather an approximation (Anderson14) of $C$. Using (6), based on the sample of size $n$, the affinity of model `$i$' to the system can be approximated by

$$C_n^{(i)} = \frac{1}{8}\,(Y - \hat{Y}^{(i)})' \left[\frac{V + V_i}{2}\right]^{-1} (Y - \hat{Y}^{(i)}) + \frac{1}{4}\left[2\log_e\left|\frac{V + V_i}{2}\right| - \log_e|V| - \log_e|V_i|\right], \quad i = 1, 2, \ldots, m, \tag{7}$$

where $Y$ is a vector of $n$ experimental observations, $\hat{Y}^{(i)}$ a vector of response values estimated from model $M^{(i)}$, $V = s^2 I_n$, and $V_i$ is an appropriately chosen estimate of $\Sigma_i$. One possible choice of $V_i$ is suggested in the following.

Assuming that the bias $(Y_k - \hat{Y}_k^{(i)})$ in the prediction is small as compared with the error in the estimation of the model parameters $\theta^{(i)}$ as well as with the error in the measurement $Y_k$, and that these errors are statistically independent, an appropriate estimate of $\Sigma_i$ is given by the matrix $V_i = \mathrm{diag}\{\hat{\sigma}_{i;1}^2, \hat{\sigma}_{i;2}^2, \ldots, \hat{\sigma}_{i;n}^2\}$, where

$$\hat{\sigma}_{i;k}^2 = s^2\left[1 + X_k^{(i)\prime} \left(\sum_{k=1}^{n} X_k^{(i)} X_k^{(i)\prime}\right)^{-1} X_k^{(i)}\right], \qquad X_k^{(i)} = \left(x_{k1}^{(i)}, x_{k2}^{(i)}, \ldots, x_{kp_i}^{(i)}\right)', \tag{8}$$

with

$$x_{k,t}^{(i)} = \left.\frac{\partial g^{(i)}(\xi_k, \theta^{(i)})}{\partial \theta_t^{(i)}}\right|_{\theta^{(i)} = \hat{\theta}^{(i)}}, \quad t = 1, 2, \ldots, p_i \tag{9}$$

as the statistic which is sensitive to estimation of the model parameters involved.

In order to measure the relative adequacy of model `$i$', the statistic

$$D_n^{(i)} = \frac{D_{n-1}^{(i)}\, C_n^{(i)}}{\sum_{j=1}^{m} D_{n-1}^{(j)}\, C_n^{(j)}}, \quad i = 1, 2, \ldots, m \tag{10}$$

can be used, where $D_{n-1}^{(i)}$ refers to the value of this statistic for model `$i$' at the previous, $(n-1)$-th, stage and $C_n^{(i)}$ is given by (7). The inclusion of $D_{n-1}^{(i)}$ in (10) makes provision for incorporating the information gained on discrimination up to the previous stage, while the component $C_n^{(i)}$ accounts for the current status of model `$i$'. Initially, when all the models are considered equiplausible, $D_{n-1}^{(i)} = 1/m$, $i = 1, 2, \ldots, m$. However, this does not prohibit one from using different initial values of the index for one or more models in the given set, if the situation so demands, provided these values add to 1. The statistic $D_n^{(i)}$ in (10) assumes values between 0 and 1. According to the approach proposed in this work, the model with the lowest value of the index $D_n^{(*)}$, and hence the highest affinity to the system, is rated as the most credible model. In the following discussion this statistic will be referred to as the Discrimination Index (DI).
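Equations (7), (8), and (10) can be exercised together on a toy problem. The sketch below uses invented data and a hypothetical two-parameter sensitivity matrix (for simplicity the same one serves both rivals); it is not the paper's worked example. Leverages give the prediction variances of (8), the diagonal special case of (7) gives each rival's distance, and (10) updates the Discrimination Index:

```python
import math

# Toy end-to-end sketch of one assessment stage (invented data, not the
# paper's example).

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 3.1, 6.8, 12.3, 20.9]        # roughly quadratic data (illustrative)
s2 = 1.0                                 # error-variance estimate

def leverages(X):                        # X_k' (sum_k X_k X_k')^{-1} X_k, p = 2
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    return [(r[0] * (d * r[0] - b * r[1]) + r[1] * (a * r[1] - b * r[0])) / det
            for r in X]

def distance(y, y_hat, var_i):           # diagonal special case of (7)
    c = 0.0
    for yk, yh, vi in zip(y, y_hat, var_i):
        vbar = 0.5 * (s2 + vi)
        c += (yk - yh) ** 2 / (8.0 * vbar)
        c += 0.25 * (2.0 * math.log(vbar) - math.log(s2) - math.log(vi))
    return c

X = [[1.0, x] for x in xs]               # sensitivities of a 2-parameter model
var_i = [s2 * (1.0 + h) for h in leverages(X)]        # equation (8)

fits = {                                 # two hypothetical rival fits
    "line":      [-2.1 + 5.5 * x for x in xs],
    "quadratic": [1.0 + 1.1 * x + 0.98 * x * x for x in xs],
}
C = {name: distance(ys, f, var_i) for name, f in fits.items()}
D_prev = {name: 0.5 for name in fits}    # equiplausible start, D = 1/m
total = sum(D_prev[n] * C[n] for n in fits)
D = {n: D_prev[n] * C[n] / total for n in fits}       # equation (10)
print(min(D, key=D.get))                 # the quadratic fit is more credible
```

As in the text, the renormalization in (10) keeps the indices summing to one, and the model with the lowest index is the most credible.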

    DESIGN CRITERION FOR MODEL CREDIBILITY

It may happen that the given set of observations does not provide a strong basis for discriminating among the proposed models. The requirement then is of acquiring additional observations which could enhance the discriminating power of the DI. Let the models $M^{(u)}$ and $M^{(v)}$ be hypothesized to be the true alternative models. Then, following the earlier discussion, the observation $Y_{n+1}$, yet to be realized, can be assumed to be distributed according to the probability distributions $P_{n+1}^{(u)}$ and $P_{n+1}^{(v)}$ with p.d.f.s $g_{n+1}^{(u)}$ and $g_{n+1}^{(v)}$, say.

    Distance Due to Additional Observation

In order that $Y_{n+1}$ could be a discriminatory observation, the addition of this observation should result in maximum statistical distance between $P_{n+1}^{(u)}$ and $P_{n+1}^{(v)}$. In the presence of $m$ models, there are $\binom{m}{2}$ such pairwise distances. Therefore, the appropriate function measuring the distance in this case would be one which, in addition to the properties (i) and (ii) mentioned in the previous section, also meets the requirement:

(iii) the distances are not diminished even if they are measured via some third distribution, i.e., the distribution of the experimental observation $Y_{n+1}$ under $M^{(o)}$.

The function which satisfies the properties (i), (ii), and (iii) is

$$G_{u,v}(\xi_{n+1}) = \left\{1 - \int \left[g_{n+1}^{(u)}(y_{n+1})\, g_{n+1}^{(v)}(y_{n+1})\right]^{1/2} dy_{n+1}\right\}^{1/2}. \tag{11}$$

The experiment $\xi_{n+1}$, designed by maximizing the distance function $G$ in (11), would be the one which could result in an observation introducing maximum dissimilarity between $P_{n+1}^{(u)}$ and $P_{n+1}^{(v)}$, and hence between the rivals $M^{(u)}$ and $M^{(v)}$.

In order to obtain the p.d.f.s $g_{n+1}^{(i)}$, $i = u, v$ appearing in


(11), it may be recalled that under the hypothesis that $M^{(i)}$ is the correct model, the observation $Y_{n+1}$ can be represented by

$$Y_{n+1} = g^{(i)}(\xi_{n+1}, \theta^{(i)}) + e_{n+1}^{(i)},$$

where $e_{n+1}^{(i)}$ is assumed to be distributed as $N(0, \sigma^2)$. But, at the $n$-th stage, $E(Y_{n+1})$ $[= g_{n+1}^{(i)}]$, which depends on the parameters $\theta^{(i)}$ and the $(n+1)$-th setting of the input variables $\xi_{n+1}$, is not known. Thus, the knowledge about the probability distribution of the observation $Y_{n+1}$, yet to be obtained, is not complete. Therefore, the posterior p.d.f.s of $Y_{n+1}$ under the alternative models `$u$' and `$v$' will be used in developing the function $G$ in (11).

Consider the uniform (noninformative) prior distribution of $\theta^{(i)}$, which amounts to assuming that nothing is known a priori about the distribution of $\theta^{(i)}$. Then it can be shown [Appendix A] that the posterior density of $Y_{n+1}$ under model `$i$' is

$$g_{n+1}^{(i)}(y_{n+1}) = (2\pi v_i^2)^{-1/2} \exp\left[-(1/2)\left(\frac{y_{n+1} - \hat{y}_{n+1}^{(i)}}{v_i}\right)^2\right], \tag{12}$$

where $v_i = s(1 + z_i)^{1/2}$, $z_i = X_{n+1}^{(i)\prime}\,[X^{(i)\prime} X^{(i)}]^{-1} X_{n+1}^{(i)}$, $i = u, v$, with $X_{n+1}^{(i)}$ defined in (8) $[k = n+1]$ and $X^{(i)}$ an $(n \times p_i)$ matrix of the partial derivatives of the type defined in (9). The two normal densities $g_{n+1}^{(u)}$ and $g_{n+1}^{(v)}$ under models `$u$' and `$v$', respectively, as specified by (12), are used in (11) to obtain [Corollary, Appendix B]

$$G_{u,v}(\xi_{n+1}) = \left\{1 - \left[\frac{4 v_u^2 v_v^2}{(v_u^2 + v_v^2)^2}\right]^{1/4} \exp\left[-(1/4)\,\frac{(\hat{y}_{n+1}^{(u)} - \hat{y}_{n+1}^{(v)})^2}{v_u^2 + v_v^2}\right]\right\}^{1/2}. \tag{13}$$

    The Weights

In the case of $m$ models there are $\binom{m}{2}$ pairwise distances of the type (13), all of which must be maximized simultaneously. While doing so, it is also important to note that, at any stage in the sequential discrimination, a pair consisting of closer models should be given more importance in designing new experiments as compared to one with comparatively farther (in the sense of statistical distance) models. This can be done by attaching appropriate weights to the pairwise distances composing the design criterion function. One set of weights which can do this job effectively is defined as

$$w_{u,v;n} = \frac{D_n^{(u)}}{c\, D_n^{(v)}}, \quad \text{if } D_n^{(u)} \le D_n^{(v)}, \tag{14}$$

where $D_n^{(u)}$ is given by (10) and $c$ is the normalizing constant

$$c = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \frac{D_n^{(i)}}{D_n^{(j)}}, \quad \text{for } D_n^{(i)} \le D_n^{(j)}.$$

done with other discrimination procedures reported in the literature. The sequential approach is adopted in all the applications. At each stage, the current standing of the rival models is assessed through the discrimination criteria prescribed by the methods being considered in the examples. In the Box-Hill9 method, discrimination is done on the basis of the posterior probabilities of the rival models: namely,

$$P_n^{(i)} = \frac{P_{n-1}^{(i)}\, f_n^{(i)}}{\sum_{j=1}^{m} P_{n-1}^{(j)}\, f_n^{(j)}}, \quad i = 1, 2, \ldots, m,$$

where $P_n^{(i)}$ is the probability of model `$i$' at the $n$-th stage, $P_{n-1}^{(i)}$ is its prior probability, and $f_n^{(i)}$ the p.d.f. of the $n$-th observation $Y_n$ under model `$i$'. For discrimination among the rival models by the method of Buzzi-Ferraris and Forzatti16, the significance of the unexplained variance

$$S^{(i)} = \frac{\sum_{k=1}^{n} \left(y_k - \hat{y}_k^{(i)}\right)^2}{n - p_i}, \quad i = 1, 2, \ldots, m$$

is utilized, where $\hat{y}_k^{(i)}$ is the value of the response estimated from model $i$ at $\xi_k$. As proposed earlier, the discrimination achieved at each stage in the present method is assessed by the statistic defined in (10).

So far as designing new experiments is concerned, the

design criterion

$$B(\xi_{n+1}) = \max_{\xi_{n+1}} \frac{1}{2} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} P_n^{(i)} P_n^{(j)} \left\{ \frac{\left(\sigma_{i;n+1}^2 - \sigma_{j;n+1}^2\right)^2}{\left(\sigma^2 + \sigma_{i;n+1}^2\right)\left(\sigma^2 + \sigma_{j;n+1}^2\right)} + \left(\hat{y}_{n+1}^{(i)} - \hat{y}_{n+1}^{(j)}\right)^2 \left[\frac{1}{\sigma^2 + \sigma_{i;n+1}^2} + \frac{1}{\sigma^2 + \sigma_{j;n+1}^2}\right] \right\}$$

is used in the Box-Hill9 method, where $\sigma_{i;n+1}^2$ is the variance of the $(n+1)$-th value of the response predicted from model `$i$'. For designing more points in the procedure proposed by Buzzi-Ferraris and Forzatti16, the criterion function

$$T(\xi_{n+1}) = \frac{\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \left(\hat{y}_{n+1}^{(i)} - \hat{y}_{n+1}^{(j)}\right)^2}{(m-1)\left[m s^2 + \sum_{i=1}^{m} s_{i;n+1}^2\right]}$$

is maximized, subject to the constraint $T > 1$, where $s^2$ is the pooled estimate of the error variance using all the rival models. While using the present method, the criterion function $w(\xi_{n+1})$ given by (15) is maximized with respect to the input variable(s) $\xi_{n+1}$ for designing discriminating experiments, and the sequential procedure is stopped according to the termination criterion (16).
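For reference, the two rival bookkeeping rules quoted above can be sketched in a few lines of Python (all data are illustrative; normal likelihoods are assumed for the Box-Hill update):

```python
import math

# Sketches of the Box-Hill posterior-probability update and the
# Buzzi-Ferraris/Forzatti unexplained variance S^{(i)} (illustrative numbers).

def normal_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def box_hill_update(P_prev, y_new, preds, var):
    """P_n^{(i)} proportional to the prior P_{n-1}^{(i)} times the likelihood."""
    num = [p * normal_pdf(y_new, mu, var) for p, mu in zip(P_prev, preds)]
    return [x / sum(num) for x in num]

def unexplained_variance(y, y_hat, p_i):
    """S^{(i)} = residual sum of squares / (n - p_i)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / (len(y) - p_i)

P = box_hill_update([1 / 3, 1 / 3, 1 / 3], 10.2, [10.0, 11.5, 13.0], 1.0)
print(P.index(max(P)))                 # nearest prediction gains probability

y     = [1.0, 2.1, 2.9, 4.2, 5.0]
fit_a = [1.0, 2.0, 3.0, 4.0, 5.0]      # close 2-parameter fit
fit_b = [0.5, 1.5, 2.5, 3.5, 4.5]      # biased 2-parameter fit
print(unexplained_variance(y, fit_a, 2) < unexplained_variance(y, fit_b, 2))
```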

    Example 1. Discrimination Among Linear Models

The present procedure is illustrated, first, by discriminating among four linear models considered by Box and Hill9. Initially, the same set of five values of the dependent variable $Y_k$ are used as have been generated by them through the model $y = 1 + \xi + \xi^2 + e$, with $e$ distributed as $N(0, 1)$ and $\xi = 0.0, 1.0, 2.0, 3.0, 4.0$. A model is to be selected from amongst the models:

$$M^{(1)} : g^{(1)} = \theta_1^{(1)} \xi,$$
$$M^{(2)} : g^{(2)} = \theta_1^{(2)} + \theta_2^{(2)} \xi,$$
$$M^{(3)} : g^{(3)} = \theta_1^{(3)} + \theta_2^{(3)} \xi + \theta_3^{(3)} \xi^2,$$
$$M^{(4)} : g^{(4)} = \theta_1^{(4)} \xi + \theta_2^{(4)} \xi^2$$

which could describe the data most adequately.

The present procedure is used to confirm whether it could identify the correct model. Based on the constructed data, the value of the discrimination index for $M^{(3)}$ is found to


    Figure 1. Sequential procedure for identifying the most credible model.

be minimum ($D^{(3)} = 0.0209$), thereby showing that this model is the best representation of the data. It also indicates that $M^{(1)}$, with the maximum value of DI ($D^{(1)} = 0.4573$), is the poorest model. Using the same set of data, Box and Hill9 have obtained a maximum posterior probability of 0.66 for model 3, but according to their procedure this value is not large enough to declare it the best model.

In order to further test the claim made by the present method, one more value of $\xi$ is designed through the design criterion (15). The maximization of the criterion function $w$, with $\xi$ constrained to lie in the range $0.0 \le \xi \le 4.0$, results in $\xi = 4.0$. When another value of the response $Y$ corresponding to this point is appended to the initial set of data, the value of $D^{(3)}$ drops to 0.0012, while that of $D^{(1)}$ rises to 0.6540. The values 0.3058 and 0.0390 of DI for the models $M^{(2)}$ and $M^{(4)}$, too, are sufficiently higher than that for $M^{(3)}$. Thus, enough evidence is shown again in favour of $M^{(3)}$ as the best model. This claim is further strengthened when one more point, $\xi = 3.8$, is designed through the present criterion, as it has raised the level of $D^{(1)}$ to 0.7099. The progress in discrimination can be seen in Table 1. Whereas the DI value for $M^{(1)}$ keeps increasing, that for $M^{(3)}$ keeps decreasing. At the 7-th run, when the procedure is stopped according to the termination criterion (16), the value of $D^{(3)} = 0.0002$ is much less than the values $D^{(1)} = 0.7099$, $D^{(2)} = 0.2807$, and $D^{(4)} = 0.0097$. The results obtained from the present method are plotted in Figure 2, which clearly depicts that $M^{(3)}$ consistently shows up as the best model and that $M^{(4)}$ is the closest model to $M^{(3)}$.

A reference to the results obtained sequentially by Box and Hill9 [values bracketed as ( ) in Table 1] shows that two additional points designed through their criterion are not decisive; the value 0.88 which $P^{(3)}$ attains at the sixth run suddenly drops to 0.75 at the seventh run. In order to draw conclusions on the basis of a fluctuating model probability for $M^{(3)}$, two additional points have been designed by Box and Hill9. It is only at the 9-th stage that the probability for one of the models ($M^{(3)}$) rises to 0.97, which, according to them, is high enough to declare this model the most probable model.

A comparison of the two methods considered in this example shows that the Box-Hill9 method requires 4 additional observations to be able to declare $M^{(3)}$ as the best model, whereas in the present method the initial data, or at the most one additional point, suffices to draw the conclusion that $M^{(3)}$ is the best representation of the data. The termination, however, is done at the 7-th run according to the proposed criterion.
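The setup of Example 1 is easy to replicate in outline. The sketch below uses seeded pseudo-data (so the numbers differ from the paper's), and plain least squares stands in for the full credibility analysis; it simply fits the four rival models and compares their residual sums of squares:

```python
import random

# Replication sketch of Example 1's setup: responses generated from
# y = 1 + xi + xi^2 + e, e ~ N(0, 1), each rival model fitted by OLS.

def solve(A, b):                        # Gauss-Jordan with partial pivoting
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rss(xs, ys, basis):                 # least-squares residual sum of squares
    X = [[f(x) for f in basis] for x in xs]
    p = len(basis)
    A = [[sum(X[k][i] * X[k][j] for k in range(len(xs))) for j in range(p)]
         for i in range(p)]
    b = [sum(X[k][i] * ys[k] for k in range(len(xs))) for i in range(p)]
    th = solve(A, b)
    return sum((y - sum(t * v for t, v in zip(th, row))) ** 2
               for y, row in zip(ys, X))

random.seed(0)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + x + x * x + random.gauss(0.0, 1.0) for x in xs]

models = {                              # the four rival models of Example 1
    "M1": [lambda x: x],
    "M2": [lambda x: 1.0, lambda x: x],
    "M3": [lambda x: 1.0, lambda x: x, lambda x: x * x],
    "M4": [lambda x: x, lambda x: x * x],
}
fits = {name: rss(xs, ys, basis) for name, basis in models.items()}
print(min(fits, key=fits.get))          # the quadratic M3 should fit best
```

Because M3 nests all the other basis sets, its residual sum of squares is never larger, which is consistent with the paper's finding that M3 is the best representation of the data.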

    Example 2. Discrimination Among Nonlinear Models

In another discrimination problem, the data are simulated from the model

$$M^{(o)} : g = 10 + 100[1 - \exp(-0.115\,\xi)] + e,$$

where the random deviation $e$ is distributed normally with zero mean and variance $\sigma^2 = 1$. The constructed data are used to discriminate among five nonlinear models:

$$M^{(1)} : g^{(1)} = \theta_1^{(1)} + \theta_2^{(1)}\left[1 - \exp(-\theta_3^{(1)} \xi)\right],$$
$$M^{(2)} : g^{(2)} = \theta_1^{(2)} + \theta_2^{(2)}\,\frac{\theta_3^{(2)} \xi}{1 + \theta_3^{(2)} \xi},$$
$$M^{(3)} : g^{(3)} = \theta_1^{(3)} + \theta_2^{(3)} \tan^{-1}(\theta_3^{(3)} \xi),$$
$$M^{(4)} : g^{(4)} = \theta_1^{(4)} + \theta_2^{(4)} \tanh(\theta_3^{(4)} \xi),$$
$$M^{(5)} : g^{(5)} = \theta_1^{(5)} + \theta_2^{(5)} \exp\left(-\theta_3^{(5)}/\xi\right).$$

Initially, five data points are generated at $\xi = 20.0, 40.0, 60.0, 80.0, 100.0$. In this example only the present procedure is used to identify the correct model. The results are shown in Table 2. At the initial stage, the values of the discrimination index for the five models are, respectively, 0.1258, 0.2413, 0.2068, 0.1845, 0.2416. These values show that $M^{(1)}$, with the smallest value of DI, is most affine to the model which generated the data. Another revelation from these results is the high level of affinity between the models $M^{(2)}$ and $M^{(5)}$. The overall discrimination being not sufficiently sharp, one more point, $\xi_6 = 26.456$, is designed by the criterion function (15) and the response value simulated by $M^{(o)}$ is appended to the data. This time again the value of $D^{(1)}$ happens to be the smallest (0.0842) and


Table 1. Sequential discrimination by Discrimination Index and Posterior Probability: linear models.

Run k   ξk      yk        D(1)k (p(1)k)   D(2)k (p(2)k)   D(3)k (p(3)k)   D(4)k (p(4)k)*
1       0.0     1.431
2       1.0     2.575
3       2.0     7.540
4       3.0     11.765
5       4.0     20.442    0.4573 (0.00)   0.3075 (0.01)   0.0209 (0.66)   0.2143 (0.33)
6       4.0     20.692    0.6540 (0.00)   0.3058 (0.00)   0.0012 (0.88)   0.0390 (0.12)
        (0.0)   (1.837)
7       3.8     19.887    0.7099 (0.00)   0.2807 (0.00)   0.0002 (0.75)   0.0097 (0.25)
        (0.0)   (0.140)
8       (0.0)   (1.686)          (0.00)          (0.00)          (0.90)          (0.10)
9       (0.0)   (1.714)          (0.00)          (0.00)          (0.97)          (0.03)

* Values bracketed as ( ) are from Box and Hill9.

those of $D^{(2)}$ and $D^{(5)}$ still remain very close to each other, though higher than that of $D^{(1)}$. As more points, $\xi_7 = 75.890$ and $\xi_8 = 96.245$, are designed by means of the present design criterion and more observations are appended to the sample, the value of $D^{(1)}$ decreases consistently at a faster rate (from 0.0842 to 0.0070) than the values of $D^{(3)}$ and $D^{(4)}$ do, which, respectively, change from 0.1856 to 0.1524 and 0.1624 to 0.1079. On the other hand, $D^{(2)}$ and $D^{(5)}$ keep increasing and, finally, at the termination stage (8-th run) have risen to 0.3421 and 0.3906, respectively. The trends of DI for the five rival models over sequential stages can be seen in Figure 3. When the procedure is stopped at the 8-th run, it is concluded that $M^{(1)}$ is the most credible model, followed by $M^{(3)}$ and $M^{(4)}$ as the next best choices, and that $M^{(2)}$ and $M^{(5)}$ are bad models.

    Example 3. Discrimination Among Kinetic Models

In this example, the present method is compared not only with the Box-Hill9 method but also with the one proposed by Buzzi-Ferraris and Forzatti16. Consider the following nonlinear kinetic models proposed for the synthesis of methanol from carbon monoxide and hydrogen (Buzzi-Ferraris and Forzatti16):

$$M^{(1)} : g^{(1)} = \frac{\xi_1 \xi_2^2 - \xi_3/\theta_1^{(1)}}{\left\{\theta_2^{(1)} + \theta_3^{(1)} \xi_1 + \theta_4^{(1)} \xi_2 + \theta_5^{(1)} \xi_3\right\}^2},$$

$$M^{(2)} : g^{(2)} = \frac{\xi_1 \xi_2^2 - \xi_3/\theta_1^{(2)}}{\theta_2^{(2)} + \theta_3^{(2)} \xi_1 + \theta_4^{(2)} \xi_2 + \theta_5^{(2)} \xi_1 \xi_2},$$

$$M^{(3)} : g^{(3)} = \frac{\xi_1 \xi_2^2 - \xi_3/\theta_1^{(3)}}{\left\{\theta_2^{(3)} + \theta_3^{(3)} \xi_3 + \theta_4^{(3)} \xi_2 + \theta_5^{(3)} \xi_3/\xi_2\right\} \xi_2^2},$$

$$M^{(4)} : g^{(4)} = \frac{\xi_1 \xi_2^2 - \xi_3/\theta_1^{(4)}}{\left\{\theta_2^{(4)} + \theta_3^{(4)} \xi_1 + \theta_4^{(4)} \xi_3/\xi_2 + \theta_5^{(4)} \xi_3\right\}^2 \xi_2},$$

$$M^{(5)} : g^{(5)} = \frac{\xi_1 \xi_2^2 - \xi_3/\theta_1^{(5)}}{\left\{\theta_2^{(5)} + \theta_3^{(5)} \xi_2 + \theta_4^{(5)} \xi_1 \xi_2 + \theta_5^{(5)} \xi_3\right\}^2}.$$

The data constructed by Buzzi-Ferraris and Forzatti16 from the model

$$M^{(o)} : g^{(o)} = \frac{\xi_1 \xi_2^2 - \xi_3/(1.7 \times 10^{-5})}{\left\{1704 + 4.25\,\xi_2 + 0.241\,\xi_1 \xi_2 + 444.6\,\xi_3\right\}^2} + e$$

are used, where $e$ is a pseudo-random number from the normal distribution $N(0.0,\ 4.0 \times 10^{-6})$ and the input variables $\xi_1, \xi_2, \xi_3$ are restricted to lie in the ranges $15 \le \xi_1 \le 25$, $200 \le \xi_2 \le 250$, $5 \le \xi_3 \le 10$. Initially, 8 values of the response, based on a $2^3$ factorial design, are used to discriminate among the proposed models. The results from the three methods are shown in Table 3, where the values bracketed as ( ) and [ ] correspond, respectively, to the Box-Hill9 and Buzzi-Ferraris and Forzatti16 methods and are reported from the latter reference. Using the present criterion, the constructed data yield

0.1621, 0.4295, 0.1723, 0.1645, 0.0716 as the values of the discrimination index for models $M^{(1)}, M^{(2)}, M^{(3)}, M^{(4)}, M^{(5)}$, respectively. A comparison of these values not only shows the superiority of $M^{(5)}$ over the other models, but also indicates that $M^{(2)}$, with the highest value of DI, can be rated as the poorest model. With the same set of eight data points, the Box-Hill9 method produces 0.3, 0.0007, 0.04, 0.329, 0.329 as the respective posterior probabilities of the five models. Based on their discrimination criterion (model


Figure 2. Progress in discrimination among linear models over sequential stages, as assessed by the Discrimination Index.

Table 2. Sequential discrimination by Discrimination Index: nonlinear models.

Run k   ξk       yk         D(1)k    D(2)k    D(3)k    D(4)k    D(5)k
1       20.0     98.5721
2       40.0     108.086
3       60.0     108.6467
4       80.0     110.7430
5       100.0    110.7525   0.1258   0.2413   0.2068   0.1845   0.2416
6       26.456   103.2452   0.0842   0.2838   0.1856   0.1624   0.2840
7       75.890   109.4943   0.0117   0.3396   0.1641   0.1448   0.3398
8       96.245   110.4421   0.0070   0.3421   0.1524   0.1079   0.3906

probabilities), these values hardly show any distinction between the models $M^{(1)}$, $M^{(4)}$, and $M^{(5)}$. However, at this stage, the criterion does indicate that $M^{(2)}$, with $P^{(2)} = 0.0007$, is likely to be a bad model. At this stage, the conclusion drawn through the method of Buzzi-Ferraris and Forzatti16 is completely different from that of Box and Hill9: the values of the variance estimates $4.83 \times 10^{-6}$, $16.8 \times 10^{-6}$, $8.78 \times 10^{-6}$, $4.65 \times 10^{-6}$, $4.65 \times 10^{-6}$, corresponding, respectively, to the five competing models, show that all the models are adequate [Table 3].


Figure 3. Progress in discrimination among nonlinear models over sequential stages, as assessed by the Discrimination Index.

Figure 4. Affinity levels of the mechanistic models, as measured by the Discrimination Index over sequential stages.

Table 3. Sequential discrimination by Discrimination Index, Estimated Variance, and Posterior Probability: Mechanistic Models.

For each run k the rows give, in order, the Discrimination Index D(i), the posterior probability (p(i)), and the estimated variance [S(i)×10⁶] for models i = 1, ..., 5.*

Run   Input variables     Response    Discrimination Criterion
 k    ξ1    ξ2    ξ3      yk×10²      D(1)      D(2)      D(3)      D(4)      D(5)

 1    17   210    6       1.229
 2    23   210    6       1.517
 3    17   240    6       1.771
 4    23   240    6       2.088
 5    17   210    9       0.104
 6    23   210    9       1.041
 7    17   240    9       0.764
 8    23   240    9       1.097      0.1621    0.4295    0.1723    0.1645    0.0716
                                     (0.3000)  (0.0007)  (0.0400)  (0.329)   (0.329)
                                     [4.83]    [16.8]    [8.78]    [4.65]    [4.65]
 9    25   202    5       2.202      0.1175    0.5833    0.1346    0.1297    0.0349
     [25]  [230]  [5]    [2.530]     (0.293)   (1.0E-7)† (0.0030)  (0.352)   (0.353)
                                     [3.87]    [18.0]    [8.26]    [3.73]    [3.72]
10    25   250    5       2.910      0.0685    0.7414    0.0821    0.0982    0.0098
     [15]  [250]  [5]    [1.500]     (0.322)   (0.0000)  (0.0007)  (0.322)   (0.355)
                                     [5.11]    [15.0]    [6.88]    [5.35]    [5.23]
11    25   206    5       2.011      0.0389    0.8268    0.0562    0.0761    0.0020
     [25]  [250]  [10]   [1.030]     (0.408)   (0.0000)  (0.0001)  (0.2690)  (0.324)
                                     [4.68]    [19.5]    [6.81]    [5.16]    [5.06]
12    25   230   10       1.013      0.0188    0.9035    0.0317    0.0452    0.0008
     [25]  [200]  [5]    [2.221]     (0.570)   (0.0000)  (0.00003) (0.167)   (0.263)
                                     [4.13]    [20.1]    [5.96]    [4.94]    [4.67]

* Values bracketed as ( ) and [ ], including the bracketed input settings and responses, are from Buzzi-Ferraris and Forzatti16.
† 1.0E-7 = 1.0×10⁻⁷.

All the discrimination procedures are further continued. One more setting (25, 202, 5) of the input variables ξ1, ξ2, ξ3 is designed through the criterion function (15) [present method]. After one more data point corresponding to this setting is appended to the initial data, the value of D(5) drops to 0.0349 and that of D(2) rises to 0.5833. As the designing of experiments by the present method is carried further sequentially, the value of D(5) drops to 0.0098, then to 0.0020, and finally to 0.0008 at the 12th run, when the termination criterion suggests stopping. D(5) always remains the lowest of the values of DI for the rival models; thus, M(5) consistently shows up as the correct model. Similarly, a significant increase in the value of D(2), from 0.4295 at the 8th run to 0.9035 at the 12th run, clearly indicates that M(2) is the worst model. The values of DI for the remaining three models, namely M(1), M(3), and M(4), always remain below the value of D(2) and above that of D(5). However, throughout the sequential procedure M(1), M(3), and M(4) remain close to one another. The affinity levels of the rival models at different sequential stages are plotted in Figure 4, which shows the progress in discrimination.

The results in Table 3 and Table 4 show that throughout the Box-Hill sequential procedure the values of P(2) and P(3) keep decreasing (Buzzi-Ferraris and Forzatti16) and finally, at the 30th run, go as low as 0.0 [Table 4]. This shows that M(2) and M(3) are poor models. The probability for M(1) has increased to 0.926 at the 30th run. It is concluded through the Box-Hill9 method that M(1) is the best model representing the data, which were in fact generated by the model M(5). The latter is rather shown to be a poor model, as its probability reduces to 0.01 at the 30th run [Table 4]. M(4) is also shown to be a poor model, as its probability keeps decreasing and finally attains a low value of 0.06. Thus, even after 22 additional runs the Box-Hill9 procedure fails to identify the correct model.

As a requirement, more points have been designed by Buzzi-Ferraris and Forzatti16, too [Table 4]. They have declared M(2) a bad model right in the beginning, as it produces a significantly large value, 16.8×10⁻⁶, of S(2). The results obtained by them show that all the other models, M(1), M(3), M(4), and M(5), keep passing the adequacy test used as the discrimination criterion in their procedure. At the 30th run, when they stop their procedure, it is concluded that, leaving M(1) aside, all the other models are equivalent on statistical


Table 4. Sequential progress in discrimination by the Estimated Variance, runs 13 to 30.

For each run k the rows give the posterior probabilities (p(i)) and the estimated variances [S(i)×10⁶] for models i = 1, ..., 5.

Run   Input variables     Response    Discrimination Criterion
 k    ξ1    ξ2    ξ3      yk×10²      (p(1))    (p(2))   (p(3))     (p(4))   (p(5))
                                      [S(1)]    [S(2)]   [S(3)]     [S(4)]   [S(5)]

13    23   240   5.0      2.590      (0.693)   (0.0)    (7.0E-6)†  (0.103)  (0.204)
                                     [3.79]    [21.2]   [5.31]     [4.39]   [4.19]
14    25   250   5.0      3.190      (0.724)   (0.0)    (3.0E-6)   (0.10)   (0.177)
                                     [4.19]    [24.8]   [4.97]     [4.26]   [4.34]
15    15   250   5.0      1.632      (0.755)   (0.0)    (1.0E-6)   (0.09)   (0.153)
                                     [3.85]    [22.7]   [4.55]     [3.92]   [3.99]
16    19   218   6.8      1.241      (0.781)   (0.0)    (4.0E-7)   (0.09)   (0.130)
                                     [3.77]    [23.4]   [4.96]     [3.83]   [3.90]
17    21   218   6.8      1.051      (0.798)   (0.0)    (2.0E-7)   (0.09)   (0.115)
                                     [3.92]    [22.3]   [4.48]     [3.95]   [4.01]
18    19   232   6.8      1.112      (0.808)   (0.0)    (5.0E-8)   (0.09)   (0.104)
                                     [3.86]    [22.1]   [4.26]     [3.86]   [3.92]
19    21   232   6.8      1.784      (0.825)   (0.0)    (2.0E-8)   (0.08)   (0.09)
                                     [4.22]    [20.1]   [4.73]     [4.25]   [4.30]
20    19   218   8.2      0.653      (0.843)   (0.0)    (4.0E-9)   (0.08)   (0.08)
                                     [4.01]    [19.4]   [4.49]     [4.05]   [4.09]
21    21   218   8.2      1.123      (0.856)   (0.0)    (1.0E-9)   (0.08)   (0.07)
                                     [4.05]    [18.3]   [4.48]     [4.07]   [4.13]
22    19   232   8.2      0.675      (0.868)   (0.0)    (0.0)      (0.08)   (0.06)
                                     [4.15]    [18.6]   [4.41]     [4.16]   [4.22]
23    21   232   8.2      1.002      (0.877)   (0.0)    (0.0)      (0.07)   (0.05)
                                     [3.96]    [18.4]   [4.30]     [3.97]   [4.03]
24    20   225   7.5      0.905      (0.884)   (0.0)    (0.0)      (0.07)   (0.04)
                                     [3.93]    [18.2]   [4.22]     [3.94]   [3.97]
25    15   225   7.5      0.883      (0.894)   (0.0)    (0.0)      (0.07)   (0.04)
                                     [4.04]    [17.4]   [4.41]     [4.06]   [4.10]
26    25   225   7.5      1.250      (0.904)   (0.0)    (0.0)      (0.07)   (0.03)
                                     [4.14]    [17.9]   [4.60]     [4.17]   [4.21]
27    20   200   7.5      0.967      (0.913)   (0.0)    (0.0)      (0.06)   (0.02)
                                     [4.14]    [17.2]   [4.59]     [4.16]   [4.22]
28    20   250   7.5      1.480      (0.922)   (0.0)    (0.0)      (0.06)   (0.02)
                                     [4.01]    [16.5]   [4.46]     [4.03]   [4.07]
29    20   225   5.0      1.780      (0.926)   (0.0)    (0.0)      (0.06)   (0.01)
                                     [4.04]    [16.0]   [4.41]     [4.04]   [4.11]
30    20   225  10        0.850      (0.926)   (0.0)    (0.0)      (0.06)   (0.01)
                                     [4.16]    [15.4]   [4.46]     [4.14]   [4.21]

Values in the body of this table reproduced from Buzzi-Ferraris and Forzatti16.
† 7.0E-6 = 7.0×10⁻⁶.

grounds. Although their method does not reject M(5), supposed to be the true model, it fails to distinguish M(5) from M(1), M(3), and M(4). Keeping in view the total number (five) of the rival models, four is rather a large number to select as equally adequate for reasonable discrimination.

It can be seen in Table 3 that at the 12th run, when the present procedure is stopped, a reasonably sharp discrimination has already been achieved, whereas the other two procedures are inconclusive.
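The sequential scheme just compared — fit each rival model, compute a distance-based discrimination index, design the next run, and stop when the index separates the models — can be sketched in code. The sketch below is illustrative only: the two toy models, the quadratic data-generating mechanism, the maximum-disagreement design rule (a crude stand-in for the paper's design criterion (15)), and the specific discrimination index (mean univariate distance (B.8) between each model's posterior predictive density from Appendix A and a normal density centred on the observation) are all hypothetical, not the paper's exact formulations.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.05  # assumed known error standard deviation

# Two hypothetical rival models (stand-ins for the paper's five):
# each maps input settings to a least-squares design matrix.
MODELS = {
    "M1": lambda x: np.column_stack([np.ones_like(x), x]),          # linear
    "M2": lambda x: np.column_stack([np.ones_like(x), x, x ** 2]),  # quadratic
}

def true_response(x):
    # Data-generating mechanism (plays the role the correct model M(5)
    # plays in the paper's example).
    return 1.0 + 0.5 * x + 0.3 * x ** 2

def hellinger(m1, v1, m2, v2):
    # Distance between two univariate normals, as in equation (B.8).
    rho = (4 * v1 * v2 / (v1 + v2) ** 2) ** 0.25 \
        * np.exp(-0.25 * (m1 - m2) ** 2 / (v1 + v2))
    return np.sqrt(np.clip(1.0 - rho, 0.0, None))

def discrimination_index(xs, ys, model):
    # Stand-in DI: mean distance between the model's posterior predictive
    # density N(yhat_k, sigma^2 (1 + z_k)) (Appendix A) and N(y_k, sigma^2).
    F = model(xs)
    theta, *_ = np.linalg.lstsq(F, ys, rcond=None)
    z = np.einsum("kp,pq,kq->k", F, np.linalg.inv(F.T @ F), F)
    return hellinger(F @ theta, SIGMA ** 2 * (1 + z), ys, SIGMA ** 2).mean()

# Start from a small design, then sequentially add the candidate setting
# on which the rivals' fitted predictions disagree most.
xs = np.array([0.0, 0.5, 1.0])
ys = true_response(xs) + rng.normal(0.0, SIGMA, xs.size)
for _ in range(6):
    cand = np.linspace(0.0, 2.0, 41)
    fits = [m(cand) @ np.linalg.lstsq(m(xs), ys, rcond=None)[0]
            for m in MODELS.values()]
    x_next = cand[np.argmax(np.abs(fits[0] - fits[1]))]
    xs = np.append(xs, x_next)
    ys = np.append(ys, true_response(x_next) + rng.normal(0.0, SIGMA))

di = {name: discrimination_index(xs, ys, m) for name, m in MODELS.items()}
best = min(di, key=di.get)  # lowest DI -> most credible model
print(di, best)
```

As in the paper's examples, the model closest to the data-generating mechanism attains the lowest index, and the ordering of the index values rates the rivals by credibility.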

    RESULTS AND DISCUSSION

The comparison of the present procedure with those reported in the literature clearly shows that it converges much faster to the correct model. This can save not only experimental material, but also experimental effort. In fact, in certain situations this saving could be appreciable, while in others it may be an essential requirement. The effectiveness of the procedure results from its important feature that it places more emphasis on picking out potentially good models than on wasting experimental runs in detecting bad ones.

It is also important to note that the discrimination

achieved by the procedure proposed in this work is much sharper than that achieved by the other procedures. Unlike the posterior probabilities used in the Box-Hill9 procedure, the discrimination index does not show oscillating behaviour, so the sequential procedure ends more decisively.

Since the present procedure makes use of a measure of affinity (a statistical distance) as the decision criterion, the competing models can be rated according to their ability to describe the data, i.e., their credibility in explaining the underlying phenomenon. This ordering has the advantage that the next best choice can be used if, for reasons such as cost or complexity of the model, the model selected as the best cannot be used in a given situation.

The procedure proposed in this work has an additional

advantage of improving the estimates of the parameters. This happens quite naturally, as the affinity of the best model to the system keeps increasing. Thus, finally, when the procedure is terminated, the investigator has not only achieved reasonable discrimination but has also attained considerable improvement in the selected model.

The method is especially useful in kinetic modelling,

where knowledge of the rate-determining step is not available but is most desired, as it can help considerably in understanding the kinetics of the reaction. The present method can be used to establish the credibility of a single (best) model from amongst those postulated on the basis of the possible rate-determining steps in a given situation. The choice of the most credible model then suggests the corresponding step as the one which controls the rate of reaction.

The discrimination and design criteria used in the

examples are based on the assumption that the random errors are distributed normally. In fact, most measurements in nature behave according to the normal probability law, and in many other situations the validity of this assumption can be justified. Also, the normal distribution is the limit to which many other distributions approach when the sample size is large. Besides, as is desirable, by specifying the normal distribution one assumes the least possible extraneous information. However, if there is no reason to believe that this assumption holds good, the approach proposed in this work can still be used by deriving new forms of the discrimination criterion function C(f(o), f(i)) in (3) and the design criterion function Gu,v(ξn+1) in (11). In situations where one completely lacks information about the distribution of the random errors and cannot assume a distribution, what is needed is the development of a distribution-free discrimination method.

    APPENDIX A

Lemma. Let $Y_l$ be represented by model $i$ as $Y_l = g_l^{(i)}(\xi_l, \theta^{(i)}) + e_l$, where $e_l$ is distributed as $N(0, \sigma^2)$. Assume that the model function $g^{(i)}$ can be linearized about some value $\hat\theta^{(i)}$. Then the posterior distribution of $Y_l$ is

$$N\big(\hat y_l^{(i)},\; \sigma^2\big(1 + X_l^{(i)\prime}[X^{(i)\prime}X^{(i)}]^{-1}X_l^{(i)}\big)\big)$$

with $X^{(i)}$ and $X_l^{(i)}$ defined in equations (8) and (9) [text].

Proof: Let $f$ denote the conditional p.d.f. of $Y_l$ given $g_l^{(i)}$, and $h$ the p.d.f. of $g_l^{(i)}$. Then the posterior p.d.f. of $Y_l$ can be obtained from the formula

$$g_l^{(i)}(y_l) = \int f^{(i)}\big(y_l \mid g_l^{(i)}\big)\, h^{(i)}\big(g_l^{(i)}\big)\, dg_l^{(i)}. \quad (A.1)$$

Of the two densities involved in the integrand in (A.1), the first can be written

$$f_l^{(i)}\big(y_l \mid g_l^{(i)}\big) = (2\pi\sigma^2)^{-1/2} \exp\Big[-\frac{1}{2\sigma^2}\big(y_l - g_l^{(i)}\big)^2\Big], \quad i = u, v. \quad (A.2)$$

To derive the second, consider the model

$$Y_l = g_l^{(i)}(\xi_l, \theta^{(i)}) + e_l \quad (A.3)$$

and assume that the model function $g_l^{(i)}$ can be linearized in the parameter space $R_\theta$. Then from (A.3)

$$g_l^{(i)} - \hat y_l^{(i)} = X_l^{(i)\prime}\big(\theta^{(i)} - \hat\theta^{(i)}\big), \quad (A.4)$$

where $g_l^{(i)} = E^{(i)}(Y_l^{(i)})$ and $\hat y_l^{(i)} = g_l^{(i)}(\xi_l, \hat\theta^{(i)})$. The relation (A.4) suggests that the distribution of $g_l^{(i)}$ will be the same as that of $X_l^{(i)\prime}\theta^{(i)}$. The posterior distribution of $\theta^{(i)}$ must, therefore, be obtained as a first step.

Let $\tau_\theta$ denote the prior density of $\theta^{(i)}$. Then the posterior density $\rho_\theta$ of $\theta^{(i)}$ is given by

$$\rho_\theta\big(\theta^{(i)} \mid y\big) = \frac{L\big(\theta^{(i)} \mid y\big)\,\tau_\theta\big(\theta^{(i)}\big)}{\int_{R_\theta} L\big(\theta^{(i)} \mid y\big)\,\tau_\theta\big(\theta^{(i)}\big)\, d\theta^{(i)}}, \quad (A.5)$$

where $L(\theta^{(i)} \mid y)$ is the likelihood function. Based on $n$ independent observations $y = (y_1, y_2, \ldots, y_n)'$, the likelihood can be written

$$L\big(\theta^{(i)} \mid y\big) = (2\pi\sigma^2)^{-n/2} \exp\Big[-\frac{1}{2\sigma^2}\,\underline{e}^{(i)\prime}\underline{e}^{(i)}\Big], \quad (A.6)$$

where $\underline{e}^{(i)} = (e_1^{(i)}, e_2^{(i)}, \ldots, e_n^{(i)})'$ with $e_k^{(i)} = y_k - y_k^{(i)}$ representing the residual error. Linearizing the model function $g^{(i)}$ around $\hat\theta^{(i)}$, the residual error can be expressed as

$$e_k^{(i)} = y_k - \hat y_k^{(i)} - X_k^{(i)\prime}\big(\theta^{(i)} - \hat\theta^{(i)}\big),$$

that is, $e_k^{(i)} = \hat e_k^{(i)} - X_k^{(i)\prime}(\theta^{(i)} - \hat\theta^{(i)})$, where $\hat e_k^{(i)} = y_k - \hat y_k^{(i)}$ is the discrepancy between the value generated by the system and the one simulated by the model $i$ for given $\xi_k$. Also, $X^{(i)}$ is an $(n \times p_i)$ matrix of the partial derivatives $x_{k,t}^{(i)}$, $k = 1, 2, \ldots, n$; $t = 1, 2, \ldots, p_i$, defined in (9) [text]. Thus the likelihood function in (A.6) becomes

$$L\big(\theta^{(i)} \mid y\big) = (2\pi\sigma^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma^2}\big[\hat{\underline e}^{(i)} - X^{(i)}(\theta^{(i)} - \hat\theta^{(i)})\big]'\big[\hat{\underline e}^{(i)} - X^{(i)}(\theta^{(i)} - \hat\theta^{(i)})\big]\Big\},$$

where $\hat{\underline e}^{(i)}$ is an $n \times 1$ vector of the discrepancies $\hat e_k^{(i)}$. Let $\hat\theta^{(i)}$ be the maximum likelihood estimate of $\theta^{(i)}$. Then $\hat{\underline e}^{(i)\prime} X^{(i)} = 0$. This, in turn, reduces the likelihood function to the form

$$L\big(\theta^{(i)} \mid y\big) = (2\pi\sigma^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma^2}\big[\hat{\underline e}^{(i)\prime}\hat{\underline e}^{(i)} + (\theta^{(i)} - \hat\theta^{(i)})'X^{(i)\prime}X^{(i)}(\theta^{(i)} - \hat\theta^{(i)})\big]\Big\}. \quad (A.7)$$

Assume a uniform prior distribution of $\theta^{(i)}$, so that

$$\tau_\theta\big(\theta^{(i)}\big) \propto c, \quad (A.8)$$

where $c$ is a constant. Using (A.7) and (A.8) in (A.5),

$$\rho_\theta\big(\theta^{(i)} \mid y\big) = \frac{\exp\big[-\frac{1}{2\sigma^2}(\theta^{(i)} - \hat\theta^{(i)})'X^{(i)\prime}X^{(i)}(\theta^{(i)} - \hat\theta^{(i)})\big]}{\int \exp\big[-\frac{1}{2\sigma^2}(\theta^{(i)} - \hat\theta^{(i)})'X^{(i)\prime}X^{(i)}(\theta^{(i)} - \hat\theta^{(i)})\big]\, d\theta^{(i)}},$$

that is,

$$\rho_\theta\big(\theta^{(i)} \mid y\big) = (2\pi\sigma^2)^{-p_i/2}\,\big|X^{(i)\prime}X^{(i)}\big|^{1/2} \exp\Big[-\frac{1}{2\sigma^2}(\theta^{(i)} - \hat\theta^{(i)})'X^{(i)\prime}X^{(i)}(\theta^{(i)} - \hat\theta^{(i)})\Big].$$

This shows that $\theta^{(i)}$ is distributed as

$$N_{p_i}\big(\hat\theta^{(i)},\; (X^{(i)\prime}X^{(i)})^{-1}\sigma^2\big).$$

It then follows from the relation (A.4) that $g_l^{(i)}$ is distributed normally with mean $\hat y_l^{(i)}$ and variance $X_l^{(i)\prime}[X^{(i)\prime}X^{(i)}]^{-1}X_l^{(i)}\sigma^2$. The p.d.f. of $g_l^{(i)}$ can accordingly be written

$$h^{(i)}\big(g_l^{(i)}\big) = (2\pi\sigma^2 z_i)^{-1/2} \exp\Big[-\frac{1}{2\sigma^2 z_i}\big(g_l^{(i)} - \hat y_l^{(i)}\big)^2\Big], \quad (A.9)$$

where $z_i = X_l^{(i)\prime}[X^{(i)\prime}X^{(i)}]^{-1}X_l^{(i)}$.

The two densities $f_l^{(i)}$ in (A.2) and $h^{(i)}$ in (A.9) can be used in (A.1) to give the posterior p.d.f. of $Y_l$ under model $i$ as

$$g_l^{(i)}(y_l) = (2\pi v_i^2)^{-1/2} \exp\Big[-\frac{1}{2}\Big(\frac{y_l - \hat y_l^{(i)}}{v_i}\Big)^2\Big], \quad (A.10)$$

where $v_i = \sigma(1 + z_i)^{1/2}$, $i = u, v$.
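As a numerical sanity check on the lemma, the closed-form predictive variance $\sigma^2(1 + X_l'[X'X]^{-1}X_l)$ can be compared against direct simulation: draw $\theta$ from its posterior $N(\hat\theta, \sigma^2(X'X)^{-1})$ and add independent observation noise. The design matrix, prediction point, and $\hat\theta$ below are illustrative values, not taken from the paper's models.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.1
N = 200_000  # Monte-Carlo sample size

# Linearized model: X plays the role of the n x p_i matrix of partial
# derivatives in equation (9); x_l is the derivative vector at the
# prediction point (both illustrative).
X = np.column_stack([np.ones(10), np.linspace(0.0, 1.0, 10)])
x_l = np.array([1.0, 0.5])

# Closed form from the lemma: Var(Y_l) = sigma^2 (1 + x_l' (X'X)^{-1} x_l).
z = x_l @ np.linalg.inv(X.T @ X) @ x_l
var_closed = sigma ** 2 * (1.0 + z)

# Simulation: theta ~ N(theta_hat, sigma^2 (X'X)^{-1}) around an arbitrary
# theta_hat, then Y_l = x_l' theta + e with e ~ N(0, sigma^2).
theta_hat = np.array([2.0, -1.0])
cov = sigma ** 2 * np.linalg.inv(X.T @ X)
thetas = rng.multivariate_normal(theta_hat, cov, size=N)
y_draws = thetas @ x_l + rng.normal(0.0, sigma, size=N)
var_mc = y_draws.var()

print(var_closed, var_mc)  # the two variances agree to within sampling error
```

The simulated variance matches the closed form, confirming that the extra term $z_i$ accounts for parameter-estimation uncertainty on top of the observation noise $\sigma^2$.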

    APPENDIX B

Lemma. Let $N_r(\alpha_1, \Lambda_1)$ and $N_r(\alpha_2, \Lambda_2)$ be two $r$-variate normal probability distributions $P_1$ and $P_2$. Then the distance between $P_1$ and $P_2$ is given by

$$h(f_1, f_2) = \bigg[\frac{4^r\,|\Lambda_1|\,|\Lambda_2|}{|\Lambda_1 + \Lambda_2|^2}\bigg]^{1/4} \exp\Big[-\frac{1}{4}(\alpha_1 - \alpha_2)'(\Lambda_1 + \Lambda_2)^{-1}(\alpha_1 - \alpha_2)\Big]. \quad (B.1)$$

Proof: The p.d.f. of an $r$-random vector $y$, distributed as $N_r(\alpha_i, \Lambda_i)$, can be written

$$f_i(y) = \big[(2\pi)^r|\Lambda_i|\big]^{-1/2} \exp\Big[-\frac{1}{2}(y - \alpha_i)'\Lambda_i^{-1}(y - \alpha_i)\Big], \quad i = 1, 2, \quad (B.2)$$

where $\Lambda_i$ is an $r \times r$ positive definite symmetric matrix, $|\Lambda_i|$ denotes the determinant of the matrix $\Lambda_i$, $\Lambda_i^{-1}$ stands for its inverse, and $y$ is an $r$-vector of observations. Substituting for $f_1$ and $f_2$ from (B.2) in the function

$$h(f_1, f_2) = \int \big[f_1(y)\,f_2(y)\big]^{1/2}\, dy,$$

the distance between $f_1$ and $f_2$ can be written

$$h(f_1, f_2) = (2\pi)^{-r/2}\big(|\Lambda_1||\Lambda_2|\big)^{-1/4} \int \exp\Big[-\frac{1}{4}\big\{(y - \alpha_1)'\Lambda_1^{-1}(y - \alpha_1) + (y - \alpha_2)'\Lambda_2^{-1}(y - \alpha_2)\big\}\Big]\, dy. \quad (B.3)$$

Combining the two quadratic forms in the exponent of the integrand in (B.3),

$$(y - \alpha_1)'\Lambda_1^{-1}(y - \alpha_1) + (y - \alpha_2)'\Lambda_2^{-1}(y - \alpha_2) = (y - \alpha^*)'\Lambda^*(y - \alpha^*) + (\alpha_1 - \alpha_2)'(\Lambda_1 + \Lambda_2)^{-1}(\alpha_1 - \alpha_2), \quad (B.4)$$

where $\alpha^* = (\Lambda_1 + \Lambda_2)^{-1}(\Lambda_2\alpha_1 + \Lambda_1\alpha_2)$ and $\Lambda^* = \Lambda_1^{-1}(\Lambda_1 + \Lambda_2)\Lambda_2^{-1}$. Using these relations in (B.3) and recognizing the resulting integral (corresponding to the first quadratic form on the right-hand side of (B.4)) as the normal integral, the function $h$ can be written

$$h(f_1, f_2) = \bigg[\frac{4^r\,|\Lambda_1|\,|\Lambda_2|}{|\Lambda_1 + \Lambda_2|^2}\bigg]^{1/4} \exp\Big[-\frac{1}{4}(\alpha_1 - \alpha_2)'(\Lambda_1 + \Lambda_2)^{-1}(\alpha_1 - \alpha_2)\Big]. \quad (B.5)$$

Then the distance between $f_1$ and $f_2$ can also be written

$$G(f_1, f_2) = \bigg\{1 - \bigg[\frac{4^r\,|\Lambda_1|\,|\Lambda_2|}{|\Lambda_1 + \Lambda_2|^2}\bigg]^{1/4} \exp\Big[-\frac{1}{4}(\alpha_1 - \alpha_2)'(\Lambda_1 + \Lambda_2)^{-1}(\alpha_1 - \alpha_2)\Big]\bigg\}^{1/2}. \quad (B.6)$$

Taking the logarithm in (B.5) and changing sign,

$$C(f_1, f_2) = \frac{1}{8}(\alpha_1 - \alpha_2)'\Big[\frac{\Lambda_1 + \Lambda_2}{2}\Big]^{-1}(\alpha_1 - \alpha_2) + \frac{1}{4}\Big[2\log_e\Big|\frac{\Lambda_1 + \Lambda_2}{2}\Big| - \log_e|\Lambda_1| - \log_e|\Lambda_2|\Big]. \quad (B.7)$$

Corollary. When the p.d.f.s $f_1, f_2$ correspond to a univariate random variable, so that the probability distributions are $N(\alpha^{(1)}, \lambda^{(1)})$ and $N(\alpha^{(2)}, \lambda^{(2)})$, the distance between these distributions can be obtained by replacing the mean vectors $\alpha_1, \alpha_2$ by the corresponding means $\alpha^{(1)}, \alpha^{(2)}$ and the variance-covariance matrices $\Lambda_1, \Lambda_2$ by the corresponding variances $\lambda^{(1)}, \lambda^{(2)}$ in (B.6). Thus,

$$G(f_1, f_2) = \bigg\{1 - \bigg[\frac{4\lambda^{(1)}\lambda^{(2)}}{(\lambda^{(1)} + \lambda^{(2)})^2}\bigg]^{1/4} \exp\Big[-\frac{1}{4}\,\frac{(\alpha^{(1)} - \alpha^{(2)})^2}{\lambda^{(1)} + \lambda^{(2)}}\Big]\bigg\}^{1/2}. \quad (B.8)$$
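The Appendix B quantities are straightforward to evaluate numerically. A short sketch (with illustrative means and covariance matrices) implements the affinity $h$ of (B.5), the distance $G$ of (B.6), and the criterion $C$ of (B.7); the prefactor is written with the $4^r$ factor that the univariate corollary (B.8) recovers at $r = 1$.

```python
import numpy as np

def affinity(a1, L1, a2, L2):
    # h(f1, f2) of (B.5): affinity of N_r(a1, L1) and N_r(a2, L2).
    r = len(a1)
    d = a1 - a2
    S = L1 + L2
    pref = (4.0 ** r * np.linalg.det(L1) * np.linalg.det(L2)
            / np.linalg.det(S) ** 2) ** 0.25
    return pref * np.exp(-0.25 * d @ np.linalg.solve(S, d))

def distance_G(a1, L1, a2, L2):
    # G(f1, f2) of (B.6); max(..., 0) guards against rounding when h ~ 1.
    return np.sqrt(max(0.0, 1.0 - affinity(a1, L1, a2, L2)))

def criterion_C(a1, L1, a2, L2):
    # C(f1, f2) of (B.7), obtained as -log h; grows as the densities separate.
    return -np.log(affinity(a1, L1, a2, L2))

# Illustrative bivariate example.
a1, a2 = np.array([0.0, 1.0]), np.array([0.5, 1.5])
L1 = np.array([[1.0, 0.2], [0.2, 1.0]])
L2 = np.array([[0.8, 0.0], [0.0, 1.2]])

h = affinity(a1, L1, a2, L2)
print(h, distance_G(a1, L1, a2, L2), criterion_C(a1, L1, a2, L2))
```

Identical distributions give $h = 1$ and $G = C = 0$, while increasing the separation of the means drives $h$ toward 0 and $G$ toward 1, which is the behaviour the design and discrimination criteria of the text exploit.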

    REFERENCES

1. Kabel, R. L. and Johanson, L. N., 1962, AIChE J, 8: 621.
2. Buzzi-Ferraris, G. and Donati, G., 1971, Ing Chim Ital, 7: 53.
3. Dumez, F. J. and Froment, G. F., 1976, Ind Eng Chem Proc Des Dev, 15: 291.
4. Hunter, W. G. and Reiner, A. M., 1965, Designs for discriminating between two rival models, Technometrics, 7: 307-323.
5. Roth, P. M., 1965, Design of experiments for discriminating among rival models, PhD Thesis (Princeton University, USA).
6. Hosten, L. H. and Froment, G. F., 1976, Non-Bayesian sequential experimental design procedures for optimal discrimination between rival models, Proc Fourth Int Symp on Chemical Reaction Engineering, Heidelberg, I1-I13.
7. Meter, D. A., Pirie, W. and Blot, W., 1970, A comparison of two model discrimination criteria, Technometrics, 12: 457-470.
8. Reilly, P. M., 1970, Statistical methods in model discrimination, Can J Chem Eng, 48: 168-173.
9. Box, G. E. P. and Hill, W. J., 1967, Discriminating among mechanistic models, Technometrics, 9: 57-71.
10. Siddik, S. M., 1972, Kullback-Leibler information function and the sequential selection of experiments to discriminate among several linear models, PhD Thesis (University of Pennsylvania, USA).
11. Froment, G. and Mezaki, R., 1970, Sequential discrimination and estimation procedures for rate modelling in heterogeneous catalysis, Chem Eng Sci, 25: 293-300.
12. Wentzheimer, W. W., 1970, Modelling of heterogeneous catalyzed reactions using statistical experimental design analysis, PhD Thesis (University of Pennsylvania, USA).
13. Atkinson, A. C. and Cox, D. R., 1974, Planning experiments for discriminating between models, J Royal Statistical Soc, Series B, 36: 321-348.
14. Hill, P. D. H., 1978, A review of experimental design procedures for regression model discrimination, Technometrics, 20: 15-21.
15. Hunter, W. G. and Mezaki, R., 1967, An experimental design strategy for discriminating among rival mechanistic models: an application, Can J Chem Eng, 45: 247-249.
16. Buzzi-Ferraris, G. and Forzatti, P., 1983, A new sequential experimental design procedure for discriminating among rival models, Chem Eng Sci, 38: 225-232.
17. Anderson, T. W., 1984, An Introduction to Multivariate Statistical Analysis, 2nd edition (Wiley, New York).

    ADDRESS

Correspondence concerning this paper should be addressed to Dr Santokh Singh, Department of Management Science and Information Systems, Business School, Rutgers University, 94 Rockafeller Road, Piscataway, NJ 08854-8054, USA.

The manuscript was received 1 April 1997 and accepted for publication after revision 10 December 1997.
