

MODEL SELECTION FOR A MULTIRESPONSE SYSTEM

SANTOKH SINGH

RUTCOR, Rutgers University, Piscataway, New Jersey, USA

The problem of model selection in a multiresponse system is tackled through the statistical distance. Both the criterion for designing experiments capable of producing discriminatory observations and the analysis of data for discrimination use this distance. The method gives due importance to the covariance structure of the estimated responses. Information about the distribution of the model parameters is also incorporated. A sequential approach is proposed, along with a termination criterion that keeps a check on unnecessary experimentation. When applied to some kinetic modelling problems, the procedure exhibits high efficiency in picking the correct model.

Keywords: design; discrimination; distance; probability distribution; sequential

INTRODUCTION

It is not uncommon to encounter situations where a host of models are proposed rather than a single model, each capable of explaining the underlying phenomenon. The most befitting model must, therefore, be identified if the analytical tool, the mathematical model, is to be used effectively. Handling such situations is, of course, more difficult than handling those where a single model is proposed and the task of modelling can be accomplished simply by estimating the parameters involved therein. The problem is further aggravated if the underlying process happens to be a multiresponse process. In this case the data from the system come as measurements on two or more characteristics, i.e., the responses, which must be considered together if one wishes to extract the full information contained in the observations. Situations of this type arise quite naturally in processes that involve complex chemical reactions. For example, in the hydrogenolysis of benzothiophene on CoMo/γ-Al₂O₃ catalyst, the interest may be in three responses; namely, the conversion for benzothiophene and those for the reaction products dihydrobenzothiophene and ethylbenzene. This obviously represents a multiresponse (three-response) situation. Added to this is the complexity arising from the several models which could be proposed on the basis of possible mechanisms governing the reaction. In fact, for this reaction the mechanisms differ by the adsorption mode of hydrogen, such as competitive or noncompetitive and molecular or atomic. A single model based on a certain mode of adsorption must be selected which could best simulate the behaviour of the responses (conversions). The models structured for the systems involving these types of processes are usually complex and difficult to handle, especially when they are nonlinear in their parameters.

There are only a few techniques available in the literature which can be used for discriminating among the models postulated for a multiresponse system. Roth¹ was one of the first to consider the problem of model discrimination in a multivariate setup. The design criterion in his procedure is composed of the absolute divergences of the response values simulated by the rival models, weighted by the model probabilities. The criterion neglects the errors in the divergences that are likely to creep in due to estimation of the model parameters. Besides, the possibility of undue importance being given to some models through the probabilities as weights cannot be ruled out. Hosten and Froment² attempted an improvement of Roth's criterion by using variances of errors as weights. Their procedure as well as that of Roth¹, however, uses the same criterion for comparing model adequacies; namely, the posterior probability of each competing model. But, according to Atkinson³, such a criterion is not a reliable tool for making inferences about model adequacy. In an attempt to improve the procedures of Roth¹ and of Hosten and Froment², Hill and Hunter⁴ used entropy, instead of simple divergences, as the basis for model discrimination and extended the uniresponse criterion of Box and Hill⁵ to the multiresponse case. Nevertheless, the sequential method thus formulated suffers from the same drawbacks as its univariate analogue: the criterion tends to design points which are more informative about the inadequate models; the oscillating behaviour of the model probabilities is misleading [Atkinson³]; and the procedure tends to support the simpler model at the cost of better models. An improvement over the method of Hill and Hunter⁴ was also suggested by Prasad and Rao⁶. This was rather an attempt to remove the flaw in the calculation of the model probabilities, which are used in both the design and discrimination criteria of their method. They recommended the use of the expected likelihood in lieu of the point likelihood in calculating the posterior probability. The efficacy of their method, however, greatly depends on the models involved as well as on the type of data being used in the given situation. Another method which is claimed to be superior to the Hill and Hunter method is the one proposed by Buzzi-Ferraris et al.⁷. Based on the minimization of the likelihood function of the divergences under any pair of models, the procedure keeps shifting its priority between discrimination and parameter estimation.

0263–8762/99/$10.00+0.00 © Institution of Chemical Engineers

Trans IChemE, Vol 77, Part A, March 1999

In the present work, a multivariate probability distribution has been associated with each rival model which, in fact, is a natural consequence in a probabilistic phenomenon. The problem of discriminating among equally plausible models can, therefore, be looked upon as the problem of discriminating among the alternative probability distributions. This gives rise to a statistic which measures the distance between the probability distributions attributable to the system on the one hand and to each of the rival models on the other. This statistic can be used for discriminating on the basis of a given set of observations. However, if it fails to do so with the data in hand, more discriminatory observations are required to accomplish the task. A criterion is proposed for designing additional experiments specifically for that purpose. This consists of maximizing a weighted function, composed of the distances which are weighted on the basis of discrimination achieved at the previous stage. A sequential approach is proposed. Finally, the procedure is used in discriminating among some real-life models. It has been seen to result in faster and sharper discrimination as compared to the discrimination achieved through some of the other methods: Hill and Hunter⁴, Buzzi-Ferraris et al.⁷, Prasad and Rao⁶, for instance.

FORMULATION OF THE PROBLEM

A mechanistic model is an abstract formulation aimed at simulating the mechanism of the real world phenomenon. The basic step involved in building a model for a system, therefore, is to establish the true nature of its underlying process. Once this is done, expressing a reasonable faith in the model is undoubtedly an essential requirement. How closely does the model approximate the actual process? This is an important question which the modeller must answer before actually putting the model to use. The answer to this question becomes all the more important if, based on certain mechanistic principles, more than one model has been proposed for the system. A single model has to be chosen from the given lot which could best describe the underlying phenomenon.

So far as the probabilistic aspect of the problem is concerned, however controlled the conditions one may claim to have used in the experimentation, randomness is somehow imbued into the process. This introduces randomness in the nature of the responses. The observations $Y_k$ on the responses of interest are, therefore, composed of two parts; namely, their true values $g^o(\xi_k, \theta)$ and the random components $\varepsilon_k$. Thus, an $r$-response system ($r \ge 2$) can be represented by a set of $r$ algebraic equations of the form

$$Y_k = g^o(\xi_k, \theta) + \varepsilon_k, \qquad k = 1, 2, \ldots, n, \tag{1}$$

where $Y_k$ is the vector of observations acquired from the system as a result of the $k$-th setting $\xi_k$ of the input variables. The random component $\varepsilon_k$ is assumed to be distributed normally: $N_r(0, \Sigma)$, with $\Sigma$ representing the covariance matrix of errors.

On the other hand, in a situation where $m$ models are claimed capable of describing the system responses, one would have alternative multivariate representations, such as

$$Y_k = g^i_k(\xi_k, \theta^i) + \varepsilon^i_k, \qquad k = 1, 2, \ldots, n, \quad i = 1, 2, \ldots, m, \tag{2}$$

where $g^i_k$ is the set of equations comprising model $i$ and $\varepsilon^i_k$, the associated random error vector, is distributed as $N_r(0, \Sigma_{i,k})$, with $\Sigma_{i,k}$ representing the error covariance matrix under model $i$.

An immediate consequence of the representations (1) and (2) is that the random vector $Y_k$ has true probability distribution $P^o$, say, and alternative probability distribution $P^i$, say, under the hypothesis that model $i$ is correct, $i = 1, 2, \ldots, m$.

MODEL DISCRIMINATION

The Basis

However close an approximation of the type (2) one may obtain, there is always an error attributable to the inability of the model to describe the underlying mechanism of the system. Such an error is reflected in the values of the responses predicted by the model. Therefore, in order to assess the inadequacy of each of the proposed models, it makes a lot of sense to look at the statistical discrepancy between the observations acquired from the system and the response values predicted by the model in question. Since a probability distribution can describe the behaviour of the random responses, as established in the previous section, such a discrepancy can be measured by an appropriate measure of affinity between the true probability distribution $P^o$ on the one hand and the alternative probability distributions $P^i$, $i = 1, 2, \ldots, m$, on the other. If $f^o$ and $f^i$, $i = 1, 2, \ldots, m$, are the respective multivariate probability density functions (p.d.f.), then this can be done by the measure [Singh⁸]

$$C(f^o, f^i) = -\log_e \int \left[ f^o(y)\, f^i(y) \right]^{1/2} dy. \tag{3}$$

Since the errors are distributed according to an $r$-variate Normal distribution, this measure is capable of distinguishing the distributions with respect to both the $nr$ location and $nr(r+1)/2$ orientation parameters. Having decided on the measure by which the credibility of a model can be assessed, a usable form of the measure $C$ will be derived in the following.

The Discrimination Criterion

Consider a sample of $n$ multivariate observations $Y_k$ resulting from $n$ experiments. These data can be considered as a multivariate sample from the joint distribution of $n$ $r$-variates. Assuming that the sets of observations are independent from one run to another, the joint distribution of the $n$ vectors of observations $Y = (Y_1, Y_2, \ldots, Y_n)$ can be written

$$f^o(y) = \left[ (2\pi)^{nr} |\Sigma|^{n} \right]^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{k=1}^{n} (y_k - g^o_k)'\, \Sigma^{-1} (y_k - g^o_k) \right\}, \tag{4}$$

where

$$y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1r} \\ y_{21} & y_{22} & \cdots & y_{2r} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nr} \end{bmatrix} = \begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix}$$

is the data matrix. Similarly, under the hypothesis that model $i$ is correct, the $n$ vectors, each a different $r$-response vector $Y_k$, are distributed jointly according to the distribution with the p.d.f.

$$f^i(y) = \left[ (2\pi)^{nr} \prod_{k=1}^{n} |\Sigma_{i,k}| \right]^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{k=1}^{n} (y_k - g^i_k)'\, \Sigma_{i,k}^{-1} (y_k - g^i_k) \right\}. \tag{5}$$

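Using (4) and (5) in (3), the distance between model $i$ and the true model (i.e., the system) can be obtained [Appendix A]

$$C(f^o, f^i) = \frac{1}{8} \sum_{k=1}^{n} (g^o_k - g^i_k)' \left[ \frac{\Sigma + \Sigma_{i,k}}{2} \right]^{-1} (g^o_k - g^i_k) + \frac{1}{4} \sum_{k=1}^{n} \left\{ 2 \log_e \left| \frac{\Sigma + \Sigma_{i,k}}{2} \right| - \log_e |\Sigma| - \log_e |\Sigma_{i,k}| \right\}. \tag{6}$$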
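As an illustration of how equation (6) can be evaluated numerically, the following is a minimal sketch using NumPy. The function name `distance_C` and the array layout (one row per run) are assumptions made for this example, not part of the original procedure.

```python
import numpy as np

def distance_C(g_true, g_model, sigma, sigma_model):
    """Evaluate equation (6), summed over the n runs.

    g_true, g_model : (n, r) expected responses of system and rival model
    sigma           : (r, r) error covariance of the system
    sigma_model     : (n, r, r) error covariances under the rival model
    """
    g_true = np.asarray(g_true, dtype=float)
    g_model = np.asarray(g_model, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    sigma_model = np.asarray(sigma_model, dtype=float)
    _, logdet_sigma = np.linalg.slogdet(sigma)
    total = 0.0
    for k in range(g_true.shape[0]):
        diff = g_true[k] - g_model[k]
        mean_cov = 0.5 * (sigma + sigma_model[k])
        # Location term: (1/8) d' [(Sigma + Sigma_ik)/2]^{-1} d
        location = diff @ np.linalg.solve(mean_cov, diff) / 8.0
        # Orientation term: (1/4){2 log|mean| - log|Sigma| - log|Sigma_ik|}
        _, logdet_mean = np.linalg.slogdet(mean_cov)
        _, logdet_k = np.linalg.slogdet(sigma_model[k])
        total += location + (2.0 * logdet_mean - logdet_sigma - logdet_k) / 4.0
    return total
```

When the two distributions coincide the function returns zero, and for a single univariate run with unit variances and means two units apart it returns 4/8 = 0.5, which is the location term alone.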

With all the parameters known in (6), $C$ can measure exactly the discrepancy between model $i$ and the true model and hence can assess the ability of the model to describe the given system. Nevertheless, in real life situations the quantities, such as $g^o$, $g^i$, $\Sigma$, $\Sigma_{i,k}$, are seldom known. However, with $n$ sets of observations obtained from the system and $n$ sets of the corresponding response values simulated by model $i$, an approximation of $C$ can be obtained

$$C^i_n = \frac{1}{8} \sum_{k=1}^{n} (Y_k - \hat{Y}^i_k)' \left[ \frac{V + V_{i,k}}{2} \right]^{-1} (Y_k - \hat{Y}^i_k) + \frac{1}{4} \sum_{k=1}^{n} \left\{ 2 \log_e \left| \frac{V + V_{i,k}}{2} \right| - \log_e |V| - \log_e |V_{i,k}| \right\}, \tag{7}$$

where $\hat{Y}^i$ is the set of response values estimated from model $i$ and the estimated matrix $V$ is given by

$$V = \frac{\sum_{i=1}^{m} \sum_{k=1}^{n} e^i_k\, e^{i\prime}_k}{\sum_{i=1}^{m} \left( n - p_i / r \right)}, \tag{8}$$

with $r$ as the number of responses and $e^i_k$ an $r$-vector of residuals attributable to model $i$, which has $p_i$ parameters. So far as the estimate $V_{i,k}$ in equation (7) is concerned, it can be appreciated that under the hypothesis of model $i$ being correct, there are two sources of errors in the predictions made by this model; namely, the errors in the measurements $Y_k$ and those in the estimate $\hat{\theta}^i$. Both these errors contribute to the difference between the predicted values of the response vector and the eventually observed values, i.e., the measurements. More often than not, there will be some bias in the predicted $\hat{Y}^i_k$. Assuming that this bias is small compared to other errors involved and that the errors from the two sources are statistically independent, an approximation to the covariance matrix of the predictions can be obtained [Appendix B].

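$$V_{i,k} = V + X^{i\prime}_k \left( M_i + \Sigma_\theta^{-1} \right)^{-1} X^i_k, \tag{9}$$

if the distribution of $\theta^i$ is known to be $N_{p_i}(\theta^o, \Sigma_\theta)$,

$$V_{i,k} = V + X^{i\prime}_k M_i^{-1} X^i_k, \tag{10}$$

if the distribution of $\theta^i$ is not known, where

$$M_i = \sum_{k=1}^{n} X^i_k V^{-1} X^{i\prime}_k,$$

$$X^i_k = \left[ x^i_{1k}, x^i_{2k}, \ldots, x^i_{rk} \right],$$

$$x^i_{jk} = \left[ x^i_{jk1}, x^i_{jk2}, \ldots, x^i_{jkp_i} \right]',$$

$$x^i_{jkt} = \left. \frac{\partial g^i_j(\xi_k, \theta^i)}{\partial \theta^i_t} \right|_{\theta^i = \hat{\theta}^i}, \qquad j = 1, 2, \ldots, r, \quad t = 1, 2, \ldots, p_i.$$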
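The covariance estimates in equations (8) to (10) can be sketched with NumPy as follows. The function names are hypothetical, residuals are assumed to be stored as one (n, r) array per model, and each sensitivity matrix is assumed to have shape (p, r), i.e. one column per response.

```python
import numpy as np

def pooled_covariance(residuals_by_model, n_params):
    """Equation (8): residual cross-products pooled over the m models."""
    r = residuals_by_model[0].shape[1]
    cross = sum(e.T @ e for e in residuals_by_model)   # sum_i sum_k e e'
    dof = sum(e.shape[0] - p / r for e, p in zip(residuals_by_model, n_params))
    return cross / dof

def information_matrix(X_runs, V):
    """M_i = sum_k X_k V^{-1} X_k', each X_k having shape (p, r)."""
    V_inv = np.linalg.inv(V)
    return sum(X @ V_inv @ X.T for X in X_runs)

def prediction_covariance(X_k, M, V, prior_cov=None):
    """Equation (9) when a prior covariance is supplied, else equation (10)."""
    A = M if prior_cov is None else M + np.linalg.inv(prior_cov)
    return V + X_k.T @ np.linalg.solve(A, X_k)
```

Supplying a prior covariance can only shrink the parameter-uncertainty term, so the prediction covariance from (9) is never larger than the one from (10) for the same data.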

The overall picture of the discrimination achieved with $n$ sets of observations is given by the statistic [Singh⁸]

$$D^i_n = \frac{D^i_{n-1}\, C^i_n}{\sum_{j=1}^{m} D^j_{n-1}\, C^j_n}, \qquad i = 1, 2, \ldots, m, \tag{11}$$

where $C^i_n$ is substituted from (7) and $D^i_{n-1}$ is the value of this statistic at the previous stage. The statistic $D^i_n$ ($0 \le D^i_n \le 1$), comprising the (statistical) dissimilarity between the model-simulated and the system response values at the previous and the current stages, provides the measure of akinness of model $i$ to the true model. In the subsequent discussion, this statistic will be referred to as the Discrimination Index (DI). To start with, when all the rival models are assumed equally plausible, $D^i_{n-1} = 1/m$, $i = 1, 2, \ldots, m$. However, different values of $D^i_{n-1}$ can be used, initially, if for certain reasons some models can be preferred over others, provided $\sum_{i=1}^{m} D^i_{n-1} = 1$. The definition of this statistic suggests that at a given stage, the lower the value of this index for a particular model, the higher is its worthiness for the system.
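The update in equation (11) is a simple renormalization, which the following sketch makes explicit. The function name is an assumption, and the distance values in the usage lines are hypothetical numbers chosen only to show the mechanics.

```python
def update_discrimination_index(prev_index, distances):
    """Equation (11): D_n^i = D_{n-1}^i C_n^i / sum_j D_{n-1}^j C_n^j."""
    products = [d * c for d, c in zip(prev_index, distances)]
    total = sum(products)
    return [p / total for p in products]

m = 4
prior = [1.0 / m] * m            # all rival models equally plausible
C_n = [0.8, 1.7, 1.9, 1.6]       # hypothetical distances, not from the paper
print(update_discrimination_index(prior, C_n))
```

The returned values sum to one, and the model with the smallest distance receives the smallest index, in line with the interpretation that a lower DI indicates a more worthy model.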

EXPERIMENTAL DESIGN

The Basis

An adequate discrimination may not always be possible with the observations in hand. It is important, therefore, to design one or more experiments so that the resulting observations could add to the discriminatory power of DI. This can be done by maximizing the pairwise statistical dissimilarity of the rival models. In the following, the distance function is developed which can measure such a dissimilarity.

Let $Y_{n+1}$ be the random vector on which discriminatory observations are required and $f^u_{n+1}$ and $f^v_{n+1}$ be the alternative p.d.f.s of $Y_{n+1}$, under models $u$ and $v$. In order that an observation on the response vector $Y_{n+1}$ have a discriminatory power it is important to conduct the experiment at the setting $\xi_{n+1}$, such that the posterior distance, $G_{u,v}$, say, between models $u$ and $v$, attributable to the addition of the observation vector $Y_{n+1}$, is maximized.


For that purpose the distance function $G_{u,v}$ should be such that

(i) $G_{u,v}$ ($0 \le G_{u,v} \le 1$) is positive only if models $u$ and $v$ are distinct,
(ii) $G_{u,v} = G_{v,u}$, i.e., the distances are not affected by the direction in which they are measured,
(iii) $G_{u,v} \le G_{u,w} + G_{w,v}$, i.e., the distances are not diminished, even if they are measured via some third distribution: the true distribution of the system, for instance.

The function $G_{u,v}$ defined as

$$G_{u,v}(\xi_{n+1}) = \left\{ 1 - \int \left[ f^u_{n+1}(y_{n+1})\, f^v_{n+1}(y_{n+1}) \right]^{1/2} dy_{n+1} \right\}^{1/2} \tag{12}$$

satisfies the properties listed above and can, therefore, provide the posterior distance for the purpose of designing additional experiments eventually resulting in the discriminatory observation vector $Y_{n+1}$.

The Design Criterion

So far as the p.d.f.s $f^i_{n+1}$, $i = u, v$, in equation (12) are concerned, at the $n$-th stage, the knowledge about the distribution of the response vector $Y_{n+1}$ is not complete, especially because $g^i_{n+1} = E_i(Y_{n+1})$, which depends on the parameter vector $\theta^i$ and $\xi_{n+1}$, is not known. Therefore, the posterior densities of $Y_{n+1}$ under models $u$ and $v$ will be used instead and are derived in the following.

Two cases are considered, depending upon the prior knowledge about the distribution of the parameter vector $\theta^i$.

Case 1. Distribution of the model parameters is known

It may be possible that in a given situation the distribution of the parameters of the model may be known from a previous study of the given or a similar system. Assume that the prior distribution of $\theta^i$ is $N_{p_i}(\theta^o, \Sigma_\theta)$. Using this prior, it can be shown [Appendix B] that under model $i$, $Y_{n+1}$ is distributed as $N_r(\hat{Y}^i_{n+1}, Z_i)$ with $Z_i = \Sigma + X^{i\prime}_{n+1} \left( M_i + \Sigma_\theta^{-1} \right)^{-1} X^i_{n+1}$. Accordingly, the p.d.f. of $Y_{n+1}$ can be written

$$f^i_{n+1}(y_{n+1}) = \left[ (2\pi)^{r} |Z_i| \right]^{-1/2} \exp\left\{ -\frac{1}{2} (y_{n+1} - \hat{y}^i_{n+1})'\, Z_i^{-1} (y_{n+1} - \hat{y}^i_{n+1}) \right\}; \qquad i = u, v. \tag{13}$$

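Making use of (13) in (12), the distance function $G_{u,v}$ can be written from (A3) [Appendix A]

$$G_{u,v}(\xi_{n+1}) = \left\{ 1 - \left[ \frac{|4 Z_u Z_v|}{|Z_u + Z_v|^{2}} \right]^{1/4} \exp\left[ -\frac{1}{4} (\hat{y}^u_{n+1} - \hat{y}^v_{n+1})' (Z_u + Z_v)^{-1} (\hat{y}^u_{n+1} - \hat{y}^v_{n+1}) \right] \right\}^{1/2}. \tag{14}$$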

Case 2. Complete ignorance about the model parameters

More common situations are the ones where the modeller has virtually no information about the distribution of model parameters. One can then use a noninformative prior for obtaining the posterior distribution of $Y_{n+1}$. With this option, the posterior density of $Y_{n+1}$ can be written [Appendix B]

$$f^i_{n+1}(y_{n+1}) = \left[ (2\pi)^{r} |W_i| \right]^{-1/2} \exp\left\{ -\frac{1}{2} (y_{n+1} - \hat{y}^i_{n+1})'\, W_i^{-1} (y_{n+1} - \hat{y}^i_{n+1}) \right\}; \qquad i = u, v,$$

where $W_i = \Sigma + X^{i\prime}_{n+1} M_i^{-1} X^i_{n+1}$. These p.d.f.s when used in (12) give the distance function $G_{u,v}$ [equation (A3), Appendix A]

$$G_{u,v}(\xi_{n+1}) = \left\{ 1 - \left[ \frac{|4 W_u W_v|}{|W_u + W_v|^{2}} \right]^{1/4} \exp\left[ -\frac{1}{4} (\hat{y}^u_{n+1} - \hat{y}^v_{n+1})' (W_u + W_v)^{-1} (\hat{y}^u_{n+1} - \hat{y}^v_{n+1}) \right] \right\}^{1/2}. \tag{15}$$

The function $G_{u,v}$ in (14) or (15) will form the basis of the design criterion function in the sequential procedure to be formulated in the following.
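The closed form shared by (14) and (15) can be evaluated as in the sketch below, which assumes NumPy and a hypothetical function name. The covariance arguments stand for $Z_u$, $Z_v$ in Case 1 or $W_u$, $W_v$ in Case 2, and the determinant ratio is computed through log-determinants for numerical stability.

```python
import numpy as np

def hellinger_gaussian(mean_u, cov_u, mean_v, cov_v):
    """Distance (14)/(15) between two r-variate normal predictive densities."""
    mean_u = np.asarray(mean_u, dtype=float)
    mean_v = np.asarray(mean_v, dtype=float)
    cov_u = np.asarray(cov_u, dtype=float)
    cov_v = np.asarray(cov_v, dtype=float)
    diff = mean_u - mean_v
    cov_sum = cov_u + cov_v
    r = diff.size
    _, ld_u = np.linalg.slogdet(cov_u)
    _, ld_v = np.linalg.slogdet(cov_v)
    _, ld_sum = np.linalg.slogdet(cov_sum)
    # log of [ |4 Z_u Z_v| / |Z_u + Z_v|^2 ]^(1/4), using |4A| = 4^r |A|
    log_scale = 0.25 * (r * np.log(4.0) + ld_u + ld_v - 2.0 * ld_sum)
    exponent = -0.25 * (diff @ np.linalg.solve(cov_sum, diff))
    affinity = np.exp(log_scale + exponent)
    # Guard against tiny negative values caused by rounding
    return float(np.sqrt(max(0.0, 1.0 - affinity)))
```

The function is symmetric in its two models, returns zero for identical densities and stays within [0, 1], which matches properties (i) and (ii) required of $G_{u,v}$.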

SEQUENTIAL PROCEDURE FOR MODEL DISCRIMINATION

In a given situation, if the observations in hand fail to reflect a reasonable discrimination, it is advisable to augment the given sample with one or more observations acquired especially for the purpose of discrimination. As argued earlier, this can be done by conducting the experiments designed through the distance function $G_{u,v}$.

The Weights

In the presence of $m$ models all the $\binom{m}{2}$ pairwise distances $G_{u,v}$ must be taken into account. This can be reasonably done by considering the weighted average of such distances. Care must, however, be taken so that the magnitude of each weight reflects how close the models comprising the pair were at the previous stage. In other words, higher weight should be given to the distance between the closer models. One such set of weights is [Singh⁸]

$$w_{u,v;n} = \begin{cases} \dfrac{D^u_n}{c\, D^v_n}, & \text{if } D^u_n < D^v_n, \\[2ex] \dfrac{D^v_n}{c\, D^u_n}, & \text{if } D^u_n > D^v_n, \end{cases} \tag{16}$$

where $D^u_n$ is given by (11) and $c$ is the normalizing constant

$$c = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \frac{D^i_n}{D^j_n}, \qquad \text{for } D^i_n < D^j_n.$$

The weights thus designed give due weightage to the distance gained by the models in each pair at the $n$-th stage and thus accelerate the process of discrimination.
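The weights in (16) can be reproduced from the discrimination indices alone. The sketch below uses a hypothetical function name and feeds it the run-8 indices reported in Table 1. Equation (16) leaves the case of equal indices undefined, so assigning a ratio of one there is an assumption of this example.

```python
from itertools import combinations

def pair_weights(di):
    """Equation (16): smaller-to-larger DI ratio for each pair, normalized by c."""
    ratios = {}
    for u, v in combinations(range(len(di)), 2):
        low, high = sorted((di[u], di[v]))
        ratios[(u + 1, v + 1)] = low / high   # 1-based model labels
    c = sum(ratios.values())                  # normalizing constant
    return {pair: ratio / c for pair, ratio in ratios.items()}

di_run8 = [0.1341, 0.2833, 0.3088, 0.2738]   # Table 1, run 8
weights = pair_weights(di_run8)
print(max(weights, key=weights.get), round(weights[(2, 4)], 4))
```

For these inputs the largest weight falls on the pair (2, 4), at about 0.2319, which agrees with the value quoted in Example 1.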


The Design Criterion Function

The distances $G_{u,v}$ and the weights $w_{u,v;n}$ can now be combined to form the design criterion function

$$F(\xi_{n+1}) = \sum_{u=1}^{m-1} \sum_{v=u+1}^{m} w_{u,v;n}\, G_{u,v}(\xi_{n+1}), \tag{17}$$

where $G_{u,v}(\xi_{n+1})$ is substituted from (14) or (15), depending upon whether the distribution of the model parameters is known or not. The information for the purpose of discrimination can be acquired if the $(n+1)$-th experiment is conducted at the setting $\xi_{n+1}$ which maximizes the criterion function (17). The search for the maximum is made over the operability region of the input variables $\xi_{n+1}$.
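A grid search is one simple way to locate the maximum of (17) over the operability region. The sketch below is illustrative only: the parameter values and weights are hypothetical, the three model forms are taken from Example 1, and the parameter-uncertainty term of $W_i$ is ignored so that every model shares the diagonal covariance diag(0.35, 0.0023). Under that simplification the distance reduces to $\{1 - \exp(-d'\Sigma^{-1}d/8)\}^{1/2}$.

```python
import math
from itertools import product

SIGMA_DIAG = (0.35, 0.0023)   # error variances reported for Example 1

def model_1(xi, th):
    x1, x2 = xi
    den = 1 + th[2] * x1 + th[3] * x2
    return (th[0] * x1 * x2 / den, th[1] * x1 * x2 / den)

def model_3(xi, th):
    x1, x2 = xi
    return (th[0] * x1 * x2 / (1 + th[2] * x2) ** 2,
            th[1] * x1 * x2 / (1 + th[3] * x1) ** 2)

def model_4(xi, th):
    x1, x2 = xi
    return (th[0] * x1 * x2 / (1 + th[2] * x1 + th[3] * x2),
            th[1] * x1 * x2 / (1 + th[2] * x1))

def distance(pred_u, pred_v):
    """G_{u,v} when both models share the diagonal covariance SIGMA_DIAG."""
    quad = sum((a - b) ** 2 / s for a, b, s in zip(pred_u, pred_v, SIGMA_DIAG))
    return math.sqrt(1.0 - math.exp(-quad / 8.0))

def criterion(xi, models, weights):
    """Equation (17): weighted sum of pairwise distances at setting xi."""
    preds = {name: f(xi, th) for name, (f, th) in models.items()}
    return sum(w * distance(preds[u], preds[v]) for (u, v), w in weights.items())

# Hypothetical parameter estimates and weights, for illustration only.
models = {
    1: (model_1, (0.010, 0.0010, 0.001, 0.010)),
    3: (model_3, (0.012, 0.0012, 0.004, 0.004)),
    4: (model_4, (0.010, 0.0011, 0.001, 0.012)),
}
weights = {(1, 3): 0.2, (1, 4): 0.5, (3, 4): 0.3}

grid = [5.0 + 5.0 * i for i in range(11)]     # 5, 10, ..., 55
candidates = list(product(grid, grid))
best = max(candidates, key=lambda xi: criterion(xi, models, weights))
print(best, round(criterion(best, models, weights), 4))
```

A finer grid or a continuous optimizer could replace the coarse 5-unit grid; the exhaustive search is used here only because it is transparent.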

The Termination Criterion

The experiments can be sequentially designed through the criterion function (17). However, in order to avoid unnecessary experimentation a termination criterion must be provided for in the sequential procedure. As the weights used in the criterion function are indicative of the discrimination achieved at a given stage, they can also be utilized for formulating the termination criterion. Accordingly, the discrimination procedure is stopped at the stage $n$ if

$$\left| w_{u^*,u^{**};n} - w_{u^*,u^{**};n-1} \right| \le \delta, \tag{18}$$

where $u^*$ and $u^{**}$, respectively, denote the best and the second best models as identified by DI at the stage $n$, and $(n-1)$ is the previous stage. The quantity $\delta$ ($0 \le \delta \le 1$) is chosen according to the stringency required in the given situation. The complete sequential scheme for implementation of the proposed discrimination procedure is shown in Figure 1 and demonstrated through examples in which the present method is also compared with some other procedures reported in the literature.
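The stopping rule (18) can be checked against the discrimination indices of Table 1, as in the sketch below. The function names and the threshold $\delta = 0.001$ are assumptions made for illustration; the original text does not report the value of $\delta$ that was used.

```python
from itertools import combinations

def _ratio(a, b):
    return min(a, b) / max(a, b)

def weight_between(di, u, v):
    """Weight (16) for the pair at 0-based positions u and v."""
    c = sum(_ratio(di[i], di[j]) for i, j in combinations(range(len(di)), 2))
    return _ratio(di[u], di[v]) / c

def should_stop(di_now, di_prev, delta):
    """Equation (18), applied to the best and second-best models at stage n."""
    order = sorted(range(len(di_now)), key=lambda i: di_now[i])
    best, second = order[0], order[1]
    change = abs(weight_between(di_now, best, second)
                 - weight_between(di_prev, best, second))
    return change <= delta

table_1 = {
    8: [0.1341, 0.2833, 0.3088, 0.2738],
    9: [0.0604, 0.2944, 0.4462, 0.1990],
    10: [0.0110, 0.3088, 0.5451, 0.1351],
    11: [0.0065, 0.3106, 0.5842, 0.0987],
}
delta = 0.001   # hypothetical threshold
for run in (9, 10, 11):
    print(run, should_stop(table_1[run], table_1[run - 1], delta))
```

With this threshold the rule is first satisfied at run 11, the stage at which the procedure is reported to have stopped in Example 1.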

EXAMPLES

Example 1: Discrimination Among Four Kinetic Models in a Biresponse System

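In this example, the present method is implemented and compared with the method proposed by Buzzi-Ferraris et al.⁷. Consider the problem of discrimination among the following four bivariate models proposed for a chemical reaction system:

$$M^1:\quad g^1_1 = \frac{\theta^1_1 \xi_1 \xi_2}{1 + \theta^1_3 \xi_1 + \theta^1_4 \xi_2}, \qquad g^1_2 = \frac{\theta^1_2 \xi_1 \xi_2}{1 + \theta^1_3 \xi_1 + \theta^1_4 \xi_2};$$

$$M^2:\quad g^2_1 = \frac{\theta^2_1 \xi_1 \xi_2}{1 + \theta^2_3 \xi_1 + \theta^2_1 \xi_2}, \qquad g^2_2 = \frac{\theta^2_2 \xi_1 \xi_2}{(1 + \theta^2_3 \xi_1)^2};$$

$$M^3:\quad g^3_1 = \frac{\theta^3_1 \xi_1 \xi_2}{(1 + \theta^3_3 \xi_2)^2}, \qquad g^3_2 = \frac{\theta^3_2 \xi_1 \xi_2}{(1 + \theta^3_4 \xi_1)^2};$$

$$M^4:\quad g^4_1 = \frac{\theta^4_1 \xi_1 \xi_2}{1 + \theta^4_3 \xi_1 + \theta^4_4 \xi_2}, \qquad g^4_2 = \frac{\theta^4_2 \xi_1 \xi_2}{1 + \theta^4_3 \xi_1}.$$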

The model

$$M^o:\quad y_1 = \frac{0.01\, \xi_1 \xi_2}{1 + 0.001\, \xi_1 + 0.01\, \xi_2} + \varepsilon_1, \qquad y_2 = \frac{0.001\, \xi_1 \xi_2}{1 + 0.001\, \xi_1 + 0.01\, \xi_2} + \varepsilon_2$$

was assumed to be the true model for the system. The observations were generated through this model with $(\varepsilon_1, \varepsilon_2)$ as pseudo-random numbers from $N_2(0, \Sigma)$ with the covariance matrix $\Sigma = \mathrm{diag}(0.35, 0.0023)$. It is assumed that nothing is known a priori about the model parameters. Therefore, the covariance matrix required for calculation of $D^i_k$ [equation (11)] was estimated from (10). Initially, 8 observations were used in calculating the values of DI for assessment of adequacy of the given four models. The results are presented in Table 1. It can be noticed that $M^1$, with 0.1341 as the value of $D^1_8$, is closest to the true model $M^o$. On the other hand, the highest value 0.3088 of $D^3_8$ is indicative of the worst performance of $M^3$. Besides, the values 0.2833 and 0.2738 of $D^2_8$ and $D^4_8$, respectively, show the closeness between $M^2$ and $M^4$, though these prove to be inferior models as compared to $M^1$.
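The data-generating step described above can be sketched with the Python standard library. The function names and the seed are assumptions of this example. Because $\Sigma$ is diagonal, the two errors are drawn independently, and the draws will not reproduce the specific observations listed in Table 1.

```python
import random

ERROR_VARIANCES = (0.35, 0.0023)   # diagonal of Sigma in Example 1

def true_response(x1, x2):
    """Noise-free responses of the assumed true model M^o."""
    den = 1 + 0.001 * x1 + 0.01 * x2
    return 0.01 * x1 * x2 / den, 0.001 * x1 * x2 / den

def simulate(x1, x2, rng):
    """One pseudo-random biresponse observation at the setting (x1, x2)."""
    g1, g2 = true_response(x1, x2)
    e1 = rng.gauss(0.0, ERROR_VARIANCES[0] ** 0.5)
    e2 = rng.gauss(0.0, ERROR_VARIANCES[1] ** 0.5)
    return g1 + e1, g2 + e2

rng = random.Random(1999)          # arbitrary seed for reproducibility
initial_settings = [(20, 20), (30, 20), (20, 30), (30, 30),
                    (25, 25), (25, 15), (25, 35), (15, 25)]   # runs 1-8 of Table 1
observations = [simulate(x1, x2, rng) for x1, x2 in initial_settings]
```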

In order to make the distinction clearer, some additional experiments were designed. The independent variables $(\xi_1, \xi_2)$ were constrained to lie in the interval $5.0 \le \xi_1, \xi_2 \le 55.0$. The distribution of the model parameters was assumed to be unknown [Case 2]. Accordingly, the criterion function $F(\xi_{n+1})$ [equation (17)], comprising $w_{u,v;n}$ from (16) and $G_{u,v}(\xi_{n+1})$ from (15), was employed for designing discriminating experiments. It is interesting to see the efficacy of the weight function $w_{u,v;n}$ in assigning the appropriate weights for different pairs of models. At the 8th run, models $M^2$ and $M^4$ were found to be closest to each other. As expected, the weight 0.231882 assigned to this pair was the highest. In contrast, the pair consisting of the farthest models ($M^1$, $M^3$) received the lowest weight, 0.104191. Similarly, other pairs were assigned appropriate weights, depending upon the distance between the models comprising each pair. Maximization of $F$ using these weights resulted in (35.0, 25.0) as the new optimal setting for the 9th run. Correspondingly, the values (5.90, 0.71) of the responses $(Y_1, Y_2)$ resulted in a considerable decrease in the value of $D^1_9$, which dropped to 0.0604. On the other hand, $D^3_9$ rose to 0.4462. Models $M^2$ and $M^4$, too, looked further apart from each other, as can be seen in Table 1. It can also be noticed in this table that by this run the model $M^1$ had considerably diverged from $M^2$, $M^3$ and $M^4$. This information was utilized for designing the setting for the 10th run. The weight function assigned low weights, of the order of 0.0845, 0.0558 and 0.1251, to the pairs ($M^1$, $M^2$), ($M^1$, $M^3$) and ($M^1$, $M^4$), respectively, in formulating the design criterion function. Suitable weights were also assigned to other pairs. This time (54.0, 28.0) happened to be the values of the input variables, which resulted in (9.66, 0.61) as the values of the responses $(Y_1, Y_2)$. As a consequence of this observation, there was a further drop in $D^1_{10}$ to 0.011, which clearly indicated that the procedure had identified the correct model. The present procedure was, however, stopped at the 11th run according to the proposed termination criterion. The value of the discrimination index for $M^1$ at this stage had gone as low as 0.0065, thereby indicating that $M^1$ is the best model. It could also be concluded that $M^4$ with $D^4_{11} = 0.0987$ was closest to $M^1$ and that $M^3$ with $D^3_{11} = 0.5842$ should be rated as a bad model. The stage to stage progress in discrimination through the present method can be seen in Table 1. The trend of the discrimination index for different models over the sequential stages is shown in Figure 2.

For discrimination among the same set of models, Buzzi-Ferraris et al.⁷ initially used 9 data points. The results of discrimination through their method are presented in Table 2. A comparison of these results with those shown in Table 1 shows that the adequacy difference which the initial data (eight points) reflected through DI is not visible in their discrimination criterion, which is based on the $\chi^2$-test. In fact, the values 15.1, 14.7, 24.3, and 14.4 of the statistic

$$\chi^2_u = \sum_{k=1}^{n} (y_k - y^u_k)'\, \Sigma^{-1} (y_k - y^u_k)$$

used by them for the assessment of discrimination level, at the initial stage, do not show any distinction among the rival models. These values rather indicate that all the models are adequate. Addition of the responses (9.15, 0.93) corresponding to the 10th setting (55.0, 32.8) of the independent variables designed by them branded $M^3$ as an inadequate model with $\chi^2_u = 58.3$. As laid down in their procedure, this model was dropped for designing the next (11th) experiment.


Figure 1. Scheme for implementation of the sequential procedure.

In fact, $M^3$ continued to be an inadequate model throughout their sequential procedure, as can be seen from Table 2. Similarly, $M^2$, too, once dropped at the 11th stage, never got a chance to be included in the criterion function. In fact, $M^1$ and $M^4$ were the only models which participated in designing new experiments. These models continued to be close rivals until the 19th experiment, when the procedure of Buzzi-Ferraris et al.⁷ was stopped with the conclusion that $M^1$ was the correct model.

It can be clearly seen from Table 2 that this decision,which was possible after 19 runs through the procedure ofBuzzi-Ferraris et al.7, could be made after 11 runs throughthe present procedure. In addition to the faster convergenceto the true model, the present method also determines, toa large extent, the status of each rival model for itsappropriateness for the given system, as shown in Figure 2.

Example 2: Discrimination Between Three Models Proposed for Hydrogenolysis of Thiophene into Hydrogen Sulphide, Butene and Butane

The present method is applied to discriminate among models postulated for the consecutive reaction scheme

$$\mathrm{T} \longrightarrow \mathrm{B}\;(+\,\mathrm{H_2S}) \longrightarrow \mathrm{A}$$

for hydrogenolysis of thiophene into butene and the hydrogenation of butene into butane, where the symbols


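Table 1. Sequential discrimination among four biresponse models by the present procedure.

Columns: run; input variables $\xi_{1k}$, $\xi_{2k}$; responses $y_{1k}$, $y_{2k}$; discrimination index $D^{u}_{k}$ [based on $k$ observations].

| Run $k$ | $\xi_{1k}$ | $\xi_{2k}$ | $y_{1k}$ | $y_{2k}$ | $D^{1}_{k}$ | $D^{2}_{k}$ | $D^{3}_{k}$ | $D^{4}_{k}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 20 | 20 | 3.61 | 0.53 | | | | |
| 2 | 30 | 20 | 5.42 | 0.44 | | | | |
| 3 | 20 | 30 | 5.00 | 0.64 | | | | |
| 4 | 30 | 30 | 7.50 | 0.66 | | | | |
| 5 | 25 | 25 | 5.73 | 0.55 | | | | |
| 6 | 25 | 15 | 3.80 | 0.33 | | | | |
| 7 | 25 | 35 | 7.30 | 0.79 | | | | |
| 8 | 15 | 25 | 4.90 | 0.35 | 0.1341 | 0.2833 | 0.3088 | 0.2738 |
| 9 | 35 | 25 | 5.90 | 0.71 | 0.0604 | 0.2944 | 0.4462 | 0.1990 |
| 10 | 54 | 28 | 9.66 | 0.61 | 0.0110 | 0.3088 | 0.5451 | 0.1351 |
| 11 | 17 | 45 | 6.30 | 0.52 | 0.0065 | 0.3106 | 0.5842 | 0.0987 |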
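The way the discrimination index orders the rival models at each stage can be traced with a short script. This is a minimal sketch: the DI values are copied from Table 1, while the helper `rank_models` is a hypothetical name introduced only for illustration. It relies on the reading used throughout the text that a smaller DI means a model closer to the system.

```python
# Discrimination index (DI) per sequential stage, copied from Table 1.
di_by_run = {
    8:  [0.1341, 0.2833, 0.3088, 0.2738],
    9:  [0.0604, 0.2944, 0.4462, 0.1990],
    10: [0.0110, 0.3088, 0.5451, 0.1351],
    11: [0.0065, 0.3106, 0.5842, 0.0987],
}

def rank_models(di_values):
    """Return model labels ordered from smallest DI (closest) to largest."""
    order = sorted(range(len(di_values)), key=lambda u: di_values[u])
    return [f"M{u + 1}" for u in order]

for run, values in di_by_run.items():
    print(run, rank_models(values))
```

The ordering does not change from run 8 to run 11; what changes is the separation between the DI values, which is what the termination criterion monitors.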

Figure 2. Discrimination index for four biresponse models, discriminated sequentially by the present procedure.

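Table 2. Sequential discrimination among four biresponse models by the procedure of Buzzi-Ferraris et al.7.

Columns: run; input variables $\xi_{1k}$, $\xi_{2k}$; responses $y_{1k}$, $y_{2k}$; chi-square statistics $\chi^{2}_{u}$.

| Run $k$ | $\xi_{1k}$ | $\xi_{2k}$ | $y_{1k}$ | $y_{2k}$ | $\chi^{2}_{1}$ | $\chi^{2}_{2}$ | $\chi^{2}_{3}$ | $\chi^{2}_{4}$ |
|---|---|---|---|---|---|---|---|---|
| 1–9 | | | | | 15.1 | 14.7 | 24.3 | 14.4 |
| 10 | 55.0 | 32.8 | 9.15 | 0.93 | 15.9 | 26.5 | 58.3 | 15.6 |
| 11 | 55.0 | 55.0 | 13.74 | 1.34 | 18.9 | 72.2 | 68.0 | 23.8 |
| 12 | 10.6 | 55.0 | 6.00 | 0.70 | 21.8 | 81.0 | 101.0 | 29.4 |
| 13 | 16.0 | 55.0 | 8.20 | 0.84 | 21.8 | 81.5 | 151.0 | 34.8 |
| 14 | 5.0 | 5.0 | 0.45 | 0.04 | Points generated for precise estimation | | | |
| 15 | 5.0 | 55.0 | 3.27 | 0.44 | Points generated for precise estimation | | | |
| 16 | 55.0 | 5.0 | 1.47 | 1.13 | Points generated for precise estimation | | | |
| 17 | 55.0 | 55.0 | 14.00 | 1.37 | 25.1 | 114.6 | 117.3 | 36.4 |
| 18 | 10.6 | 55.0 | 5.50 | 0.55 | 33.0 | 108.0 | 185.0 | 52.8 |
| 19 | 16.0 | 55.0 | 8.40 | 0.70 | 35.7 | 119.0 | 259.0 | 67.3 |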

Values in the body of this table reproduced from Buzzi-Ferraris et al.7.

T, B, A, H, respectively, stand for thiophene, butene, n-butane, and hydrogen. According to Van Parijs and Froment9, this reaction can be described by 288 plausible Hougen-Watson reaction mechanisms. Preliminary discrimination among these models was done by them through physicochemical criteria, analysis of residuals, and statistical tests. Their study led to the selection of two rival models which basically differ with respect to the mode of adsorption of hydrogen accounted for in the reaction mechanism.

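If it is assumed that molecular hydrogen is competitively adsorbed and that the surface reaction between the reactants and the adsorbed hydrogen is the rate determining step, then the resulting mechanism leads to the biresponse model [Van Parijs and Froment9]

$$M_1:\quad \eta^{1}_{1} = \frac{\theta^{1}_{1}\theta^{1}_{2}\theta^{1}_{3}\,p_T\,p_H}{\left(1 + \theta^{1}_{2}\,p_T + \theta^{1}_{3}\,p_H + \theta^{1}_{4}\,\dfrac{p_S}{p_H}\right)^{2}}, \qquad \eta^{1}_{2} = \frac{\theta^{1}_{5}\theta^{1}_{6}\theta^{1}_{7}\,p_B\,p_H}{\left(1 + \theta^{1}_{6}\,p_B + \theta^{1}_{7}\,p_H + \theta^{1}_{8}\,p_A\right)^{2}}.$$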

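On the other hand, if adsorption of hydrogen is assumed to be atomic, though the surface reaction is still assumed to be rate determining, then the postulated biresponse model is [Van Parijs and Froment9]

$$M_2:\quad \eta^{2}_{1} = \frac{\theta^{2}_{1}\theta^{2}_{2}\left(\theta^{2}_{3}\right)^{2} p_T\,p_H}{\left(1 + \theta^{2}_{2}\,p_T + \theta^{2}_{3}\,p_H^{1/2} + \theta^{2}_{4}\,\dfrac{p_S}{p_H}\right)^{3}}, \qquad \eta^{2}_{2} = \frac{\theta^{2}_{5}\theta^{2}_{6}\left(\theta^{2}_{7}\right)^{2} p_B\,p_H}{\left(1 + \theta^{2}_{6}\,p_T + \theta^{2}_{7}\,p_H^{1/2} + \theta^{2}_{8}\,p_B\right)^{3}}.$$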

In yet another case, it can be assumed that the spillover hydrogen generated on Co9S8 reacts with MoS2 to produce slightly reduced centres, active in hydrogenation, or strongly reduced centres, active in hydrogenolysis. The kinetic model for this case can be postulated describing the rates of hydrogenolysis of thiophene and of butane formation; namely, the model [Van Parijs et al.10]

$$M_3:\quad \eta^{3}_{1} = \frac{\theta^{3}_{1}\theta^{3}_{2}\theta^{3}_{3}\,p_T\,p_H}{\left[1 + \theta^{3}_{2}\,p_T + \theta^{3}_{3}\,p_H + \theta^{3}_{4}\,\dfrac{p_S}{p_H}\left(1 + \theta^{3}_{5}\,p_H + \theta^{3}_{6}\,p_A + \theta^{3}_{7}\,p_B\right)\right]^{2}},$$

$$\eta^{3}_{2} = \frac{\theta^{3}_{8}\theta^{3}_{7}\theta^{3}_{5}\,p_B\,p_H}{\left[1 + \theta^{3}_{5}\,p_H + \theta^{3}_{6}\,p_A + \theta^{3}_{7}\,p_B + \dfrac{p_H}{\theta^{3}_{4}\,p_S}\left(1 + \theta^{3}_{2}\,p_T + \theta^{3}_{3}\,p_H\right)\right]^{2}}.$$

The parameters $\theta^{i}_{j}$, $i = 1, 2, 3$; $j = 1, 2, \ldots, 8$, in the models $M_1$, $M_2$, and $M_3$ represent kinetic constants relevant to the particular reaction mechanism used in postulating the models specified above, and $p_T$, $p_B$, $p_A$, $p_H$, $p_S$ represent, respectively, the partial pressures of T, B, A, H, and S.

According to Van Parijs and Froment9, discrimination between models $M_1$ and $M_2$ through residual analysis and other statistical tests was difficult to achieve. The present method was applied to discriminate between these close rivals and to confirm whether, instead, the interconversion model $M_3$ was the model which could adequately describe the given reaction system. Following the integral method of kinetic analysis used by Froment and Bischoff11, the conversions $x_T$ and $x_A$ were expressed as functions of the space time $W/F^{o}_{T}$ through the continuity equation

$$\frac{dF}{dW} = R,$$

where $F$ is the vector of molar flow rates, $F_T = F^{o}_{T}(1 - x_T)$ and $F_A = F^{o}_{T}\,x_A$, with $F^{o}_{T}$ the molar feed flow rate of thiophene, and $R$ is the vector of net reaction rates, $R_T = -\eta^{i}_{1}$ and $R_A = \eta^{i}_{2}$, $i = 1, 2$.

For estimating the parameters of all the rival models, initially 10 data points were generated using the bivariate normal distribution $N_2(\eta, \Sigma)$ with $\eta$ as the vector of model functions:

$$\eta^{o}_{1} = \frac{34.762383\,p_T\,p_H}{\left(1 + 5.619015\times10^{-6}\,p_T + 1.57\times10^{-1}\,p_H + 1.48\times10^{2}\,\dfrac{p_S}{p_H}\right)^{2}},$$

$$\eta^{o}_{2} = \frac{25.77390852\times10^{-6}\,p_B\,p_H}{\left(1 + 1.358189\times10^{-4}\,p_B + 1.1277828\times10^{-8}\,p_H + 1.358189\times10^{-4}\,p_A\right)^{2}},$$

and the error covariance matrix

$$\Sigma = \begin{pmatrix} 1.5\times10^{-4} & 0.3\times10^{-5} \\ 0.3\times10^{-5} & 2.3\times10^{-6} \end{pmatrix}.$$

To start with, it was assumed that all the proposed models were equidistant from the true model, so that $D^{u}_{9} = 1/3$, $u = 1, 2, 3$. Before proceeding to design new settings of the independent variables, it was important to make an assessment of the discrimination possible with the initial set of data. The values of the DI, $D^{1}_{10} = 0.3071$ and $D^{2}_{10} = 0.3143$, were slightly indicative of the difference between models $M_1$ and $M_2$, with an inclination towards $M_1$ as the correct model. $M_3$ looked a little farther (in the sense of statistical distance) from $M_o$ as compared to $M_1$ and $M_2$. In order to make this slightly visible distinction clearer, more experiments were designed through the proposed design criterion. The criterion function (17) was maximized within the range (0.0, 0.150) of $W/F^{o}_{T}$ and (2.0, 30.0) of $p_t$. The addition of one more point (25.1, 40.012) resulted in $D^{1}_{11} = 0.2511$ and $D^{2}_{11} = 0.2960$, thereby indicating that model $M_1$ was superior to models $M_2$ and $M_3$. The model $M_3$, with $D^{3}_{11} = 0.4529$, showed rather poor prospects of being selected as an adequate model. Another designed point further reduced the value of $D^{1}_{12}$ to 0.1527. The subsequent designed point reduced the value of $D^{1}_{13}$ to 0.0251 and that of $D^{2}_{13}$ to 0.1469. This showed evidence in favour of model $M_1$. The procedure was, however, stopped only at the 14th run according to the proposed termination criterion, with the conclusion that $M_1$ was the best model. On the other hand, the value of $D^{3}_{14} = 0.8280$ now clearly showed that $M_3$ was a bad model. The progress in discrimination through the sequential stages can be seen in Table 3, while the trend of DI for the competing models is shown in Figure 3.
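The data generation step of this example, in which bivariate normal errors are added to the responses of the true model, can be sketched as follows. The covariance matrix is the one quoted for this example, with its exponents read as negative powers of ten (an assumption, since the signs are not legible in every copy of the text); the function name `simulate_responses` and the noise-free conversions `eta_demo` are hypothetical and serve only as an illustration.

```python
import numpy as np

# Error covariance matrix quoted for Example 2 (exponent signs assumed negative).
SIGMA = np.array([[1.5e-4, 0.3e-5],
                  [0.3e-5, 2.3e-6]])

def simulate_responses(eta, sigma=SIGMA, seed=0):
    """Add N_2(0, sigma) errors to noise-free responses eta of shape (n, 2)."""
    rng = np.random.default_rng(seed)
    eta = np.asarray(eta, dtype=float)
    errors = rng.multivariate_normal(np.zeros(2), sigma, size=eta.shape[0])
    return eta + errors

# Hypothetical noise-free conversions (x_T, x_A) at three settings.
eta_demo = np.array([[0.19, 0.025], [0.31, 0.051], [0.50, 0.091]])
y_demo = simulate_responses(eta_demo)
print(y_demo)
```

In an actual application the rows of `eta_demo` would come from integrating the continuity equation for the true model at each setting of $p_t$ and $W/F^{o}_{T}$.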

Example 3: Discrimination between Models Postulated for the Reaction between Tetrachloroethane and Chlorine

In this example, the present procedure was implemented to identify the model most affine to the system from amongst the models proposed for the reaction between tetrachloroethane and a large excess of chlorine on activated silica gel catalyst. Six models were considered plausible for describing the reaction

$$\mathrm{C_2H_2Cl_4} \longrightarrow \mathrm{C_2HCl_5} \longrightarrow \mathrm{C_2Cl_6};$$

namely the models:

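$$M_1:\quad \eta^{1}_{1} = \theta^{1}_{1}\xi_1 \exp\left(-\theta^{1}_{3}\,\xi_1\xi_3\right), \qquad \eta^{1}_{2} = \left(\theta^{1}_{1}\xi_1 - \theta^{1}_{2}\xi_2\right)\exp\left(-\theta^{1}_{3}\,\xi_1\xi_3\right);$$

$$M_2:\quad \eta^{2}_{1} = \theta^{2}_{1}\xi_1 \exp\left(-\theta^{2}_{3}\,\xi_1\xi_2\xi_3\right), \qquad \eta^{2}_{2} = \left(\theta^{2}_{1}\xi_1 - \theta^{2}_{2}\xi_2\right)\exp\left(-\theta^{2}_{3}\,\xi_1\xi_2\xi_3\right);$$

$$M_3:\quad \eta^{3}_{1} = \theta^{3}_{1}\xi_1 \exp\left(-\theta^{3}_{3}\,\xi_1\xi_2(1 - \xi_1 - \xi_2)\xi_3\right), \qquad \eta^{3}_{2} = \left(\theta^{3}_{1}\xi_1 - \theta^{3}_{2}\xi_2\right)\exp\left(-\theta^{3}_{3}\,\xi_1\xi_2(1 - \xi_1 - \xi_2)\xi_3\right);$$

$$M_4:\quad \eta^{4}_{1} = \theta^{4}_{1}\xi_1 \exp\left(-\theta^{4}_{3}(\xi_1 + \xi_2)\xi_3\right), \qquad \eta^{4}_{2} = \left(\theta^{4}_{1}\xi_1 - \theta^{4}_{2}\xi_2\right)\exp\left(-\theta^{4}_{3}(\xi_1 + \xi_2)\xi_3\right);$$

$$M_5:\quad \eta^{5}_{1} = \theta^{5}_{1}\xi_1 \exp\left(-\theta^{5}_{3}(1 - \xi_2)\xi_3\right), \qquad \eta^{5}_{2} = \left(\theta^{5}_{1}\xi_1 - \theta^{5}_{2}\xi_2\right)\exp\left(-\theta^{5}_{3}(1 - \xi_2)\xi_3\right);$$

$$M_6:\quad \eta^{6}_{1} = \theta^{6}_{1}\xi_1 \exp\left(-\theta^{6}_{3}\,\xi_3\right), \qquad \eta^{6}_{2} = \left(\theta^{6}_{1}\xi_1 - \theta^{6}_{2}\xi_2\right)\exp\left(-\theta^{6}_{3}\,\xi_3\right).$$

The model

$$M_0:\quad \eta^{o}_{1} = 0.01\,\xi_1 \exp\left(-0.03\,\xi_1\xi_3\right), \qquad \eta^{o}_{2} = \left(0.01\,\xi_1 - 0.001\,\xi_2\right)\exp\left(-0.03\,\xi_1\xi_3\right)$$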


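Table 3. Sequential discrimination among three kinetic models proposed for hydrogenolysis of thiophene.

Columns: run; input variables $p_t$, $W/F^{o}_{T}$; responses $x_T$, $x_A$; discrimination index $D^{u}_{k}$ [based on $k$ observations].

| Run $k$ | $p_t$ | $W/F^{o}_{T}$ | $x_T$ | $x_A$ | $D^{1}_{k}$ | $D^{2}_{k}$ | $D^{3}_{k}$ |
|---|---|---|---|---|---|---|---|
| 1 | 4.0 | 6.300 | 0.188 | 0.025 | | | |
| 2 | 11.3 | 16.700 | 0.313 | 0.051 | | | |
| 3 | 17.5 | 41.576 | 0.501 | 0.091 | | | |
| 4 | 25.0 | 75.000 | 0.715 | 0.165 | | | |
| 5 | 30.0 | 112.651 | 0.792 | 0.254 | | | |
| 6 | 20.0 | 50.001 | 0.563 | 0.096 | | | |
| 7 | 23.8 | 63.125 | 0.625 | 0.098 | | | |
| 8 | 30.0 | 125.012 | 0.812 | 0.323 | | | |
| 9 | 27.8 | 75.231 | 0.729 | 0.178 | | | |
| 10 | 31.2 | 124.001 | 0.775 | 0.298 | 0.3071 | 0.3143 | 0.3786 |
| 11 | 25.1 | 40.012 | 0.545 | 0.121 | 0.2511 | 0.2960 | 0.4529 |
| 12 | 29.4 | 46.231 | 0.659 | 0.085 | 0.1527 | 0.2522 | 0.5951 |
| 13 | 20.6 | 120.902 | 0.795 | 0.288 | 0.0251 | 0.1469 | 0.8280 |
| 14 | 28.1 | 24.588 | 0.445 | 0.197 | 0.0099 | 0.0987 | 0.8914 |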

Figure 3. Discrimination index for three kinetic models proposed for hydrogenolysis of thiophene.

was considered as the true model which, according to Prasad and Rao6, closely represents the given reaction system. In order to simulate the experimental behaviour of the reacting system, the data were generated by adding error terms distributed as $N_2(0, \Sigma)$ with

$$\Sigma = \begin{pmatrix} 0.3\times10^{-3} & 0.6\times10^{-5} \\ 0.6\times10^{-5} & 0.5\times10^{-4} \end{pmatrix}.$$

The maximum number of parameters involved in a model in the given set being 3, initially 4 observations were generated through $M_0$. Besides, in order to express the same faith in each model, the value of DI was assumed to be the same for all the rival models, i.e., $D^{u}_{3} = 1/6$, $u = 1, 2, \ldots, 6$. The initial set of response values, when used in the discrimination criterion (11), showed an inclination towards $M_5$ with $D^{5}_{4} = 0.0543$. The most poorly performing model was $M_2$ with $D^{2}_{4} = 0.2963$. Another set of the variables $\xi_1, \xi_2, \xi_3$ was chosen by using the proposed design criterion function [equation (17)] with $G_{uv}$ substituted from (15). The generated response values resulted in a considerable decrease in the value of $D^{5}_{5}$, to 0.0074. The comparison of this value with the values of DI for the other models showed enough evidence in favour of $M_5$ as the model most affine to the true model [Table 4]. The procedure, however, had to be continued by one more stage, when it was stopped according to the termination criterion. The value of DI for $M_5$ had by then gone as low as 0.002. So far as the position of the other models is concerned, it can be clearly seen from Table 4 that although for models $M_1$, $M_4$, $M_6$ the values of DI kept decreasing, they could never show as much closeness to the system as $M_5$ did. However, of these models $M_4$ was found to be the closest rival of $M_5$, followed by $M_6$. In contrast, $M_2$ and $M_3$ consistently proved to be bad choices for the reaction under consideration. The progress in discrimination from one stage to another can be seen in Figure 4.

The investigation into this discrimination problem by the present procedure was carried further, using different initial distances. As $M_5$ emerged as the best model and $M_4$ as a slightly better model compared with the rest, in another implementation of the present procedure the remaining models were given a handicap advantage. This was done by assigning unequal values to DI at the initial


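Table 4. Sequential discrimination among six kinetic models proposed for the reaction between tetrachloroethane and chlorine.

Columns: run; input variables $\xi_{1k}$, $\xi_{2k}$, $\xi_{3k}$; responses $y_{1k}$, $y_{2k}$; discrimination index $D^{u}_{k}$ [based on $k$ observations].

| Run $k$ | $\xi_{1k}$ | $\xi_{2k}$ | $\xi_{3k}$ | $y_{1k}$ | $y_{2k}$ | $D^{1}_{k}$ | $D^{2}_{k}$ | $D^{3}_{k}$ | $D^{4}_{k}$ | $D^{5}_{k}$ | $D^{6}_{k}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.757 | 0.242 | 0 | 0.701 | 0.712 | | | | | | |
| 2 | 0.871 | 0.126 | 1 | 0.632 | 0.570 | | | | | | |
| 3 | 0.598 | 0.406 | 2 | 0.499 | 0.503 | 0.1666 | 0.1666 | 0.1666 | 0.1666 | 0.1666 | 0.1666 |
| 4 | 0.784 | 0.216 | 3 | 0.611 | 0.698 | 0.1534 | 0.2963 | 0.2672 | 0.1186 | 0.0543 | 0.1102 |
| 5 | 0.821 | 0.210 | 3 | 0.598 | 0.687 | 0.1425 | 0.3621 | 0.3161 | 0.0719 | 0.0074 | 0.1000 |
| 7 | 0.629 | 0.300 | 2 | 0.529 | 0.495 | 0.1391 | 0.4175 | 0.3411 | 0.0292 | 0.0020 | 0.0711 |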

Figure 4. Discrimination index for six kinetic models proposed for the reaction between tetrachloroethane and chlorine.

Figure 5. Discrimination index for six kinetic models proposed for the reaction between tetrachloroethane and chlorine (handicap advantage to some models).

stage according to their status in the discrimination done in the previous implementation. To be specific, the initial assignments to the six competing models were 0.14, 0.12, 0.12, 0.18, 0.3, 0.14. The same set of initial data was used as in the previous application. The emergence of all the models was observed for their closeness to the true model as the procedure was applied sequentially [Figure 5]. The results are presented in Table 5. It can be seen from this table that the addition of one more set of observations did not result in an appreciable difference between the values of DI. The models $M_1$, $M_4$, $M_5$ and $M_6$ were almost equidistant from the true model. More settings of the input variables $\xi_1, \xi_2, \xi_3$ were, therefore, required to be designed to make the distinction clear. In fact, this had to be carried on till the 3rd sequential stage. By then, $M_5$, with $D^{5}_{7} = 0.0029$, clearly emerged as the closest model to the true model. It was also observed that $M_4$ was the closest rival of the selected model $M_5$, while $M_2$ and $M_3$ could be declared as bad models.
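The two sets of starting values used in this example can be compared directly. The short sketch below uses only the numbers quoted in the text; the helper `describe` is a hypothetical name. It reports the sum of the starting values and lists the models whose starting DI lies below the equal share of 1/6, that is, the models that begin nominally closer to the true model.

```python
# Initial DI values in the two implementations of Example 3 (taken from the text).
equal_start = [1 / 6] * 6
handicap_start = [0.14, 0.12, 0.12, 0.18, 0.30, 0.14]

def describe(start):
    """Return the total of the starting DI values and the favoured models."""
    total = sum(start)
    # A model is "favoured" if its starting DI is below the equal share 1/len(start).
    favoured = [f"M{u + 1}" for u, d in enumerate(start) if d < 1 / len(start)]
    return total, favoured

print(describe(equal_start))
print(describe(handicap_start))
```

Both sets of starting values add up to one, and the handicap is carried by $M_4$ and $M_5$, the two models that performed best in the first implementation.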

This problem of discrimination was also considered by Prasad and Rao6, who demonstrated that Roth's1 criterion, with the point likelihood used in the calculation of the posterior probabilities, took 18 additional runs to declare $M_5$ as a slightly preferred model over $M_1$. This conclusion could be drawn from the proposed procedure more decisively with 3 additional runs in the first case (when, to start with, all the models were considered equidistant from the true model) and 4 additional runs in the second (when a handicap advantage was given to the poorer models). Prasad and Rao6 also used the Box-Hill5 procedure and compared the performance of the point likelihood with that of the expected likelihood in calculating the posterior probabilities for discriminating among these models. They could achieve reasonable discrimination on the basis of 9 additional designed experiments. In contrast, the present procedure needed only 3 additional observations for achieving reasonable discrimination and declaring $M_5$ as the best model.

CONCLUSIONS

It would not be too demanding to expect that a model should closely describe the underlying mechanism of a reaction system and thus simulate observations as if they had been generated by the system itself. It is this basic idea which has been exploited in formulating the sequential procedure proposed in this work. The procedure, therefore, has been based on the statistical similarity between the two sets of observations: the system response values and the model simulated values. The procedure covers both aspects of the problem of model selection: design and analysis of experiments. Through its application to nonlinear problems it has been seen that the procedure is not only faster, but also results in sharper discrimination. The weights used in the design criterion function are well responsive to the distance between the pairs of competing models and for that reason are highly effective, as seen in all the applications. The monitoring of the discrimination achieved at each sequential stage makes use of the information contained in the model simulated values with reference to their statistical dissimilarity from the system response values. As this is done under the stopping rule, the result is faster and sharper discrimination. At the termination stage, the procedure not only pinpoints the best model but also indicates, to a large extent, the relative closeness of each of the proposed models to the system. This information is desirable, in the sense that the models considered in a given situation are, after all, postulated on the basis of certain physical and chemical principles thought to be governing the process. For some other reason, it might be advantageous to use a model other than the best one, the second best model, for instance. The ordering of models for their closeness to the system, as the present procedure does, can be useful in such situations.

Although applied to chemical kinetic systems involving nonlinear models, the procedure can as well be applied to other multiresponse systems, even if the proposed models are linear in parameters.

The design and discrimination criteria actually used in the applications are based on the assumption of a multivariate normal distribution of errors. As this assumption is reasonable in most situations and can be validated in others, it presents no limitations to the use of the present procedure. The basic approach, however, remains valid in the case of other distributions of errors. Only in certain situations where the modeller has absolutely no reason to believe that the errors have a multivariate normal or some other distribution would the requirement be a nonparametric technique.

APPENDIX A

Lemma: Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent $r$-vectors with $Y_k$ distributed, alternatively, as $N_r(\mu_{1,k}, \Lambda_{1,k})$ and


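Table 5. Sequential discrimination among six kinetic models proposed for the reaction between tetrachloroethane and chlorine (handicap advantage to some models).

Columns: run; input variables $\xi_{1k}$, $\xi_{2k}$, $\xi_{3k}$; responses $y_{1k}$, $y_{2k}$; discrimination index $D^{u}_{k}$ [based on $k$ observations].

| Run $k$ | $\xi_{1k}$ | $\xi_{2k}$ | $\xi_{3k}$ | $y_{1k}$ | $y_{2k}$ | $D^{1}_{k}$ | $D^{2}_{k}$ | $D^{3}_{k}$ | $D^{4}_{k}$ | $D^{5}_{k}$ | $D^{6}_{k}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.757 | 0.242 | 0 | 0.701 | 0.712 | | | | | | |
| 2 | 0.871 | 0.126 | 1 | 0.632 | 0.570 | | | | | | |
| 3 | 0.598 | 0.406 | 2 | 0.499 | 0.503 | 0.1400* | 0.1200* | 0.1200* | 0.1800* | 0.3000* | 0.1400* |
| 4 | 0.784 | 0.216 | 3 | 0.611 | 0.698 | 0.1445 | 0.2571 | 0.2011 | 0.1045 | 0.1745 | 0.1183 |
| 5 | 0.512 | 0.487 | 5 | 0.469 | 0.449 | 0.1453 | 0.3472 | 0.2430 | 0.0847 | 0.0632 | 0.1166 |
| 6 | 0.681 | 0.316 | 6 | 0.503 | 0.418 | 0.1565 | 0.3609 | 0.3226 | 0.0635 | 0.0123 | 0.0842 |
| 7 | 0.463 | 0.555 | 1 | 0.422 | 0.317 | 0.1626 | 0.3967 | 0.3668 | 0.0223 | 0.0029 | 0.0487 |

*Distances assumed initially.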

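$N_r(\mu_{2,k}, \Lambda_{2,k})$. Then the distance between these distributions is given by

$$h(f_1, f_2) = \prod_{k=1}^{n}\left[\frac{4^{r}\,|\Lambda_{1,k}|\,|\Lambda_{2,k}|}{|\Lambda_{1,k} + \Lambda_{2,k}|^{2}}\right]^{1/4} \exp\left\{-\frac{1}{4}\sum_{k=1}^{n}\left(\mu_{1,k} - \mu_{2,k}\right)'\left(\Lambda_{1,k} + \Lambda_{2,k}\right)^{-1}\left(\mu_{1,k} - \mu_{2,k}\right)\right\}.$$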

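Proof: Since $Y_k$ is distributed alternatively as $N_r(\mu_{i,k}, \Lambda_{i,k})$, $i = 1, 2$, the joint density function of the $n$ independent $r$-vectors $Y = (Y_1, Y_2, \ldots, Y_n)$ can be written as

$$f_i(y) = \left[(2\pi)^{nr}\prod_{k=1}^{n}|\Lambda_{i,k}|\right]^{-1/2} \exp\left\{-\frac{1}{2}\sum_{k=1}^{n}\left(y_k - \mu_{i,k}\right)'\Lambda_{i,k}^{-1}\left(y_k - \mu_{i,k}\right)\right\}, \quad i = 1, 2, \tag{A1}$$

where $y = (y_1, y_2, \ldots, y_n)$. Using (A1) in the distance function

$$h(f_1, f_2) = \int \left[f_1(y)\,f_2(y)\right]^{1/2} dy,$$

the distance between $f_1$ and $f_2$ can be written

$$h(f_1, f_2) = (2\pi)^{-nr/2}\prod_{k=1}^{n}|\Lambda_{1,k}|^{-1/4}\,|\Lambda_{2,k}|^{-1/4} \int \exp\left\{-\frac{1}{4}\sum_{k=1}^{n}\left[\left(y_k - \mu_{1,k}\right)'\Lambda_{1,k}^{-1}\left(y_k - \mu_{1,k}\right) + \left(y_k - \mu_{2,k}\right)'\Lambda_{2,k}^{-1}\left(y_k - \mu_{2,k}\right)\right]\right\} dy. \tag{A2}$$

Combining the two quadratic forms in the integrand in (A2),

$$h(f_1, f_2) = \prod_{k=1}^{n}\left|\frac{\Lambda_k}{2}\right|^{-1/2} \prod_{k=1}^{n}|\Lambda_{1,k}|^{-1/4}\,|\Lambda_{2,k}|^{-1/4} \exp\left\{-\frac{1}{4}\sum_{k=1}^{n}\left(\mu_{1,k} - \mu_{2,k}\right)'\left(\Lambda_{1,k} + \Lambda_{2,k}\right)^{-1}\left(\mu_{1,k} - \mu_{2,k}\right)\right\}$$

$$\qquad\times \int \prod_{k=1}^{n}\left[(2\pi)^{-r/2}\left|\frac{\Lambda_k}{2}\right|^{1/2}\right] \exp\left\{-\frac{1}{2}\sum_{k=1}^{n}\left(y_k - \mu_k\right)'\left(\frac{\Lambda_k}{2}\right)\left(y_k - \mu_k\right)\right\} dy,$$

where $\mu_k = \left(\Lambda_{1,k} + \Lambda_{2,k}\right)^{-1}\left(\Lambda_{1,k}\,\mu_{2,k} + \Lambda_{2,k}\,\mu_{1,k}\right)$ and $\Lambda_k = \Lambda_{1,k}^{-1}\left(\Lambda_{1,k} + \Lambda_{2,k}\right)\Lambda_{2,k}^{-1}$. The integrand is a product of normal densities, so the integral equals one. Finally, since $|\Lambda_k/2| = 2^{-r}\,|\Lambda_{1,k} + \Lambda_{2,k}| / \left(|\Lambda_{1,k}|\,|\Lambda_{2,k}|\right)$, integration gives

$$h(f_1, f_2) = \prod_{k=1}^{n}\left[\frac{4^{r}\,|\Lambda_{1,k}|\,|\Lambda_{2,k}|}{|\Lambda_{1,k} + \Lambda_{2,k}|^{2}}\right]^{1/4} \exp\left\{-\frac{1}{4}\sum_{k=1}^{n}\left(\mu_{1,k} - \mu_{2,k}\right)'\left(\Lambda_{1,k} + \Lambda_{2,k}\right)^{-1}\left(\mu_{1,k} - \mu_{2,k}\right)\right\}. \tag{A3}$$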

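Corollary: In the case of one set of observations obtained by conducting the experiment at the settings $\xi_k$, the distance function can be written from (A3) by substituting $n = 1$:

$$h(f_1, f_2) = \left[\frac{4^{r}\,|\Lambda_{1,k}|\,|\Lambda_{2,k}|}{|\Lambda_{1,k} + \Lambda_{2,k}|^{2}}\right]^{1/4} \exp\left\{-\frac{1}{4}\left(\mu_{1,k} - \mu_{2,k}\right)'\left(\Lambda_{1,k} + \Lambda_{2,k}\right)^{-1}\left(\mu_{1,k} - \mu_{2,k}\right)\right\}.$$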
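The closed form of the lemma can be checked numerically. The sketch below is an illustration under stated assumptions: the function names `normal_affinity` and `affinity_over_runs` are hypothetical, and the determinant factor is evaluated as $|\Lambda_{1}|^{1/4}|\Lambda_{2}|^{1/4} / |(\Lambda_{1} + \Lambda_{2})/2|^{1/2}$, which is what the factor $|\Lambda_k/2|^{-1/2}|\Lambda_{1,k}|^{-1/4}|\Lambda_{2,k}|^{-1/4}$ appearing in the proof reduces to.

```python
import numpy as np

def normal_affinity(mu1, lam1, mu2, lam2):
    """Integral of sqrt(f1 * f2) for f_i = N_r(mu_i, lam_i), in closed form."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    lam1 = np.atleast_2d(np.asarray(lam1, dtype=float))
    lam2 = np.atleast_2d(np.asarray(lam2, dtype=float))
    lam_sum = lam1 + lam2
    # Determinant factor |L1|^(1/4) |L2|^(1/4) / |(L1 + L2)/2|^(1/2)
    det_factor = ((np.linalg.det(lam1) * np.linalg.det(lam2)) ** 0.25
                  / np.sqrt(np.linalg.det(lam_sum / 2.0)))
    diff = mu1 - mu2
    quad = diff @ np.linalg.solve(lam_sum, diff)  # (mu1-mu2)'(L1+L2)^{-1}(mu1-mu2)
    return float(det_factor * np.exp(-0.25 * quad))

def affinity_over_runs(mus1, lams1, mus2, lams2):
    """Product over the n independent runs, as in the lemma."""
    return float(np.prod([normal_affinity(a, b, c, d)
                          for a, b, c, d in zip(mus1, lams1, mus2, lams2)]))

# Hypothetical biresponse illustration: same covariance, shifted mean.
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
print(normal_affinity([0.0, 0.0], cov, [1.0, 0.0], cov))
```

The function returns one when the two distributions coincide and decreases towards zero as they separate, so any index built from it has to account for that orientation.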

APPENDIX B

Lemma: Let $Y_k$ be distributed as $N_r(\eta^{i}_{k}, \Sigma)$ under model $M_i: \eta^{i}(\xi_k, \theta^{i})$ and assume that the model functions $\eta^{i}$ can be linearized in the parameter space $\Theta$. Then the posterior distribution of $Y_k$ is

(i) $N_r(\hat{Y}^{i}_{k}, V_1)$, where $V_1 = \Sigma + X^{i}_{k}\left(M_i + \Sigma_{\theta}^{-1}\right)^{-1}X^{i\prime}_{k}$, if the distribution of $\theta^{i}$ is known to be $N_{p_i}(\theta^{i}_{o}, \Sigma_{\theta})$,

(ii) $N_r(\hat{Y}^{i}_{k}, V_2)$, where $V_2 = \Sigma + X^{i}_{k}\,M_i^{-1}X^{i\prime}_{k}$, if the distribution of $\theta^{i}$ is not known,

where $X^{i}_{k}$ is the matrix of partial derivatives of the model functions $\eta^{i}_{k}$ with respect to the parameter vector $\theta^{i}$.

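Proof: Let $h^{i}_{k}$ denote the posterior density of $Y_k$ under model $i$. Then the posterior p.d.f. of $Y_k$ is given by

$$h^{i}_{k}(y_k) = \int f^{i}_{k}\left(y_k / \eta^{i}_{k}\right) g^{i}\left(\eta^{i}_{k}\right) d\eta^{i}_{k}. \tag{B1}$$

Since $Y_k$ is distributed as $r$-variate normal with mean vector $\eta^{i}_{k}$ and covariance matrix $\Sigma$, the p.d.f. of $Y_k$ under model $i$ can be written

$$f^{i}_{k}\left(y_k / \eta^{i}_{k}\right) = \left[(2\pi)^{r}|\Sigma|\right]^{-1/2} \exp\left\{-\frac{1}{2}\left(y_k - \eta^{i}_{k}\right)'\Sigma^{-1}\left(y_k - \eta^{i}_{k}\right)\right\}. \tag{B2}$$

This provides the first component $f^{i}_{k}$ in the integrand in (B1). In order to obtain the second, consider the relation

$$E_i(Y_k) = \eta^{i}(\xi_k, \theta^{i}).$$

Assuming that the model functions $\eta^{i}_{k}$ can be linearized in the parameter space, one can write

$$\eta^{i}_{k} = \hat{Y}^{i}_{k} + X^{i}_{k}\left(\theta^{i} - \hat{\theta}^{i}\right), \tag{B3}$$

where $\hat{\theta}^{i}$ is an appropriately chosen estimate which can justify the linearization of the model function. The identity (B3) suggests that the distribution of $\eta^{i}_{k}$ is identical with that of $X^{i}_{k}\theta^{i}$, and hence the posterior distribution of $\theta^{i}$ must first be sought. This can be done by using the formula

$$g_{\theta}(\theta^{i}) = \frac{L(\theta^{i})\,f_{\theta}(\theta^{i})}{\int L(\theta^{i})\,f_{\theta}(\theta^{i})\,d\theta^{i}}. \tag{B4}$$

With $n$ sets of data and $\hat{\theta}^{i}$ as the maximum likelihood estimate of $\theta^{i}$, the likelihood function $L$ under model $i$ can be written

$$L(\theta^{i}) = \left[(2\pi)^{r}|\Sigma|\right]^{-n/2} \exp\left\{-\frac{1}{2}\sum_{k=1}^{n} e^{i\prime}_{k}\,\Sigma^{-1}e^{i}_{k} - \frac{1}{2}\left(\theta^{i} - \hat{\theta}^{i}\right)'M_i\left(\theta^{i} - \hat{\theta}^{i}\right)\right\}, \tag{B5}$$

where $M_i = \sum_{k=1}^{n} X^{i\prime}_{k}\,\Sigma^{-1}X^{i}_{k}$ and $e^{i}_{k} = Y_k - \eta^{i}(\xi_k, \hat{\theta}^{i})$.

So far as the choice of $f_{\theta}$ in (B4) is concerned, two cases are considered.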

So far as the choice of fh in (B4) is concerned, two casesare considered.

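Case 1: Distribution of the Model Parameters is Known

Suppose that the parameters $\theta^{i}$ are known to be distributed a priori according to the normal distribution $N_{p_i}(\theta^{i}_{o}, \Sigma_{\theta})$. Using this prior and the likelihood [equation (B5)] in (B4), the posterior density of $\theta^{i}$ can be written

$$g_{\theta}(\theta^{i}) = \frac{\exp\left\{-\frac{1}{2}Q_2(\theta^{i})\right\}}{\int \exp\left\{-\frac{1}{2}Q_2(\theta^{i})\right\} d\theta^{i}}, \tag{B6}$$

where the quadratic form $Q_2$ is given by

$$Q_2(\theta^{i}) = \left(\theta^{i} - \hat{\theta}^{i}\right)'M_i\left(\theta^{i} - \hat{\theta}^{i}\right) + \left(\theta^{i} - \theta^{i}_{o}\right)'\Sigma_{\theta}^{-1}\left(\theta^{i} - \theta^{i}_{o}\right).$$

Combining the two quadratic forms in $Q_2$ suitably and integrating out $\theta^{i}$ in the denominator of (B6), the posterior p.d.f. of $\theta^{i}$ can be obtained:

$$g_{\theta}(\theta^{i}) = (2\pi)^{-p_i/2}\left|M_i + \Sigma_{\theta}^{-1}\right|^{1/2} \exp\left\{-\frac{1}{2}\left(\theta^{i} - \bar{\theta}^{i}_{o}\right)'\left(M_i + \Sigma_{\theta}^{-1}\right)\left(\theta^{i} - \bar{\theta}^{i}_{o}\right)\right\},$$

where $\bar{\theta}^{i}_{o} = \left(M_i + \Sigma_{\theta}^{-1}\right)^{-1}\left(M_i\,\hat{\theta}^{i} + \Sigma_{\theta}^{-1}\theta^{i}_{o}\right)$. This shows that the posterior distribution of $\theta^{i}$ is $p_i$-variate normal with mean vector $\bar{\theta}^{i}_{o}$ and covariance matrix $\left(M_i + \Sigma_{\theta}^{-1}\right)^{-1}$. Further, the $r$-vector $X^{i}_{k}(\theta^{i} - \hat{\theta}^{i})$, being a linear combination of normal vectors, is an $r$-variate normal random vector and is distributed about the mean vector 0 with the covariance matrix $C_i$, given by

$$C_i = X^{i}_{k}\left(M_i + \Sigma_{\theta}^{-1}\right)^{-1}X^{i\prime}_{k}.$$

As a consequence of the identity (B3), the p.d.f. of $\eta^{i}_{k}$ can be written

$$g^{i}\left(\eta^{i}_{k}\right) = \left[(2\pi)^{r}|C_i|\right]^{-1/2} \exp\left\{-\frac{1}{2}\left(\eta^{i}_{k} - \hat{y}^{i}_{k}\right)'C_i^{-1}\left(\eta^{i}_{k} - \hat{y}^{i}_{k}\right)\right\}. \tag{B7}$$

Substituting $f^{i}_{k}$ from (B2) and $g^{i}$ from (B7) in (B1),

$$h^{i}_{k}(y_k) = (2\pi)^{-r}\left(|C_i|\,|\Sigma|\right)^{-1/2} \int \exp\left\{-\frac{1}{2}Q_1\left(\eta^{i}_{k}\right)\right\} d\eta^{i}_{k},$$

where

$$Q_1\left(\eta^{i}_{k}\right) = \left(\eta^{i}_{k} - y_k\right)'\Sigma^{-1}\left(\eta^{i}_{k} - y_k\right) + \left(\eta^{i}_{k} - \hat{y}^{i}_{k}\right)'C_i^{-1}\left(\eta^{i}_{k} - \hat{y}^{i}_{k}\right).$$

Combining the two quadratic forms in $Q_1$ appropriately and integrating, the p.d.f. $h^{i}_{k}$ can be obtained:

$$h^{i}_{k}(y_k) = \left[(2\pi)^{r}|Z_i|\right]^{-1/2} \exp\left\{-\frac{1}{2}\left(y_k - \hat{y}^{i}_{k}\right)'Z_i^{-1}\left(y_k - \hat{y}^{i}_{k}\right)\right\},$$

where $Z_i = \Sigma + C_i$. This shows that under model $i$, $Y_k$ is distributed as $N_r(\hat{Y}^{i}_{k}, Z_i)$.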

Case 2: Noninformative Prior Distribution of $\theta^{i}$

Assume a noninformative (locally uniform) prior, i.e., a prior density of $\theta^{i}$ of the type

$$f_{\theta}(\theta^{i}) \propto c. \tag{B9}$$

Using $f_{\theta}$ from (B9) and the likelihood function $L$ from (B5) in the formula (B4), the posterior density of $\theta^{i}$ can be written

$$g_{\theta}(\theta^{i}) = (2\pi)^{-p_i/2}\,|M_i|^{1/2} \exp\left\{-\frac{1}{2}\left(\theta^{i} - \hat{\theta}^{i}\right)'M_i\left(\theta^{i} - \hat{\theta}^{i}\right)\right\}.$$

This shows that the posterior distribution of $\theta^{i}$ is $p_i$-variate normal with mean vector $\hat{\theta}^{i}$ and covariance matrix $M_i^{-1}$. Using the same argument as in Case 1, it can be concluded that under model $i$ the posterior distribution of $Y_k$ is $N_r(\hat{Y}^{i}_{k}, W_i)$ with $W_i = \Sigma + X^{i}_{k}\,M_i^{-1}X^{i\prime}_{k}$.
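The covariance matrix of Case 2 is straightforward to compute once the sensitivity matrices are available. The following is a minimal sketch, assuming hypothetical sensitivity matrices `x_runs` and a hypothetical function name `predictive_covariance`; the error covariance matrix is the one quoted for Example 3, with its exponents read as negative powers of ten.

```python
import numpy as np

def predictive_covariance(x_list, sigma, k):
    """Sigma + X_k M^{-1} X_k' with M = sum over runs of X' Sigma^{-1} X."""
    sigma = np.asarray(sigma, dtype=float)
    sigma_inv = np.linalg.inv(sigma)
    m = sum(x.T @ sigma_inv @ x for x in x_list)  # information matrix M_i
    x_k = x_list[k]
    return sigma + x_k @ np.linalg.solve(m, x_k.T)

# Hypothetical sensitivity matrices: r = 2 responses, p = 2 parameters, n = 3 runs.
x_runs = [np.array([[1.0, 0.2], [0.0, 1.0]]),
          np.array([[0.5, 1.0], [1.0, 0.3]]),
          np.array([[2.0, 0.1], [0.4, 0.8]])]
sigma_demo = np.array([[0.3e-3, 0.6e-5],
                       [0.6e-5, 0.5e-4]])
v2 = predictive_covariance(x_runs, sigma_demo, k=0)
print(v2)
```

The second term is positive semidefinite, so the posterior covariance of $Y_k$ is never smaller than $\Sigma$; it shrinks towards $\Sigma$ as more runs are added to $M_i$.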

REFERENCES

1. Roth, P. M., 1965, Design of experiments for discriminating among rival models, PhD Thesis, (Princeton University, USA).

2. Hosten, L. H. and Froment, G. F., 1976, Non-Bayesian sequential experimental design procedure for optimal discrimination between rival models, Proc 4th Int Symp on Chemical Reaction Engineering, Heidelberg, I1–I13.

3. Atkinson, A. C., 1978, Posterior probabilities for choosing a regression model, Biometrika, 65: 39–48.

4. Hill, W. J. and Hunter, W. G., 1967, Design of experiments for model discrimination in multiresponse situations, Tech Rep No. 65, (Dept of Stat, Univ of Wisconsin, USA).

5. Box, G. E. P. and Hill, W. J., 1967, Discriminating among mechanistic models, Technometrics, 9: 57–71.

6. Prasad, K. B. S. and Rao, M. S., 1977, Use of expected likelihood in sequential model discrimination in multiresponse systems, Chem Eng Sci, 32: 1411–1418.

7. Buzzi-Ferraris, G., Forzatti, P., Emig, G. and Hofman, H., 1984, Sequential experimental design for model discrimination in case of multiple responses, Chem Eng Sci, 39: 81–85.

8. Singh, S., 1998, On establishing the credibility of a model for a system, Chem Eng Res Des, Trans IChemE, Part A, 76: 657–668.

9. Van Parijs, I. A. and Froment, G. F., 1986, Kinetics of hydrodesulfurization on a CoMo/γ-Al2O3 catalyst. Kinetics of hydrogenolysis of thiophene, Ind Eng Chem Prod Res Dev, 25: 431–436.

10. Van Parijs, I. A., Froment, G. F. and Delmon, B., 1984, Kinetic models for the hydrogenolysis of thiophene. A comparison of Hougen-Watson models with fixed and interconverting sites for hydrogenolysis and hydrogenation, Bull Soc Chim Belgique, 93: 823.

11. Froment, G. F. and Bischoff, K. B., 1991, Chemical Reaction Analysis and Design (Wiley, New York).

ADDRESS

Correspondence concerning this paper should be addressed to Dr Santokh Singh, RUTCOR, Rutgers University, 640 Bartholomew Road, Piscataway, NJ 08854, USA.

The manuscript was received 20 April 1998 and accepted for publication 24 September 1998.
