
Quality Technology & Quantitative Management, Vol. 6, No. 4, pp. 353-369, 2009

A Bayesian Reliability Approach to Multiple Response Optimization with Seemingly Unrelated Regression Models

John J. Peterson¹, Guillermo Mir-Quesada² and Enrique del Castillo³

¹Research Statistics Unit, GlaxoSmithKline Pharmaceuticals, King of Prussia, PA, USA
²Bioprocess Research and Development, Lilly Technical Center-North, Indianapolis, IN, USA
³Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA, USA

(Received September 2006, accepted April 2008)

    ______________________________________________________________________

Abstract: This paper presents a Bayesian predictive approach to multiresponse optimization experiments. It generalizes the work of Peterson [33] in two ways that make it more flexible for use in applications. First, a multivariate posterior predictive distribution of seemingly unrelated regression models is used to determine optimum factor levels by assessing the reliability of a desired multivariate response. It is shown that it is possible for optimal mean response surfaces to appear satisfactory yet be associated with unsatisfactory overall process reliabilities. Second, the use of a multivariate normal distribution for the vector of regression error terms is generalized to that of the (heavier tailed) multivariate t-distribution. This provides a Bayesian sensitivity analysis with regard to moderate outliers. The effect of adding design points is also considered through a preposterior analysis. The advantages of this approach are illustrated with two real examples.

Keywords: Design space, desirability function, Gibbs sampling, multivariate t-distribution, posterior predictive distribution, robust parameter design, robust regression.

    ______________________________________________________________________

    1. Introduction

Statistically designed experiments and associated response surface methods are considered effective methods for optimizing products and processes. Much has been written about experiments involving a single response, but less has been written about multiple response experiments, although they are quite prevalent. Popular statistical packages such as Design Expert and JMP allow experimenters to analyze multiple response experiments by providing procedures based upon "overlapping mean responses" or "desirability functions" of mean responses. The overlapping mean response approach provides an overlay plot of the mean response surfaces to see if there is a configuration of factor levels that simultaneously satisfies the conformance criteria of the experimenter. A listing of articles providing examples or discussion of this approach can be found in Montgomery and Bettencourt [30]. Harrington [16] first proposed an approach based upon a notion of a desirability function. The idea here is that for each response type, a function $d_i(\hat{y}_i)$, taking values on [0, 1], is created to express the desirability of the $i$th mean response as a function of the factor levels. (Here, $\hat{\mathbf{y}}$ is an $r \times 1$ vector of estimated mean responses, and $d_i(\hat{y}_i)$'s closer to 1 are more desirable.) The overall desirability, $D(\hat{\mathbf{y}})$, is a geometric mean of the individual $d_i(\hat{y}_i)$ values. Later on, Derringer and Suich [7], del Castillo et al. [6], and Kim and Lin [21] proposed modifications of Harrington's desirability function.


Another type of approach, based upon a quadratic loss $Q(\hat{\mathbf{y}})$ about a set of target values, has been discussed by Khuri and Conlon [20], Pignatiello [35], Ames et al. [1], Vining [41], and Ko et al. [22]. Some of these approaches try to model the joint predictive distribution of $\hat{\mathbf{y}}$, but do not capture the uncertainty due to the estimated variance-covariance matrix parameters. Furthermore, for the quadratic loss function and desirability function approaches (except perhaps the one due to Harrington) it may be difficult to assign values of the function to a scale that can be converted into "poor", "good", "excellent", etc. Here, a panel of experts may be required (Derringer [8]) to obtain an informative "univariate response index" (Hunter [17]) for a multiresponse optimization problem.

The overlapping mean response, desirability function, and quadratic loss function approaches have the drawback that they do not completely characterize the uncertainty associated with future multivariate responses and their associated optimization measures. The danger of this is that an experimenter may use one of these methods to get an optimal factor configuration, validate it with two or three successful runs, and then begin production. For example, suppose that the probability that a future multivariate response is satisfactory is only 0.7. Even so, the chance of getting three successful, independent validation runs is $0.7^3 = 0.343$, which can easily happen. Hunter [17] states that the variance of univariate response indices for multiresponse optimization "can be disturbing" and further study is needed to assess the influence of parameter uncertainty.

For the optimal conditions obtained by these approaches, Peterson [33] used a real-data example to show that the probability (i.e. reliability) of a good multivariate response, as measured by these optimization criteria, can be unacceptably low. Furthermore, he showed that ignoring the model parameter uncertainty can lead to reliability estimates that are too large. A practical drawback of the methodology in Peterson [33] is that the regression models were limited to the standard (normal theory) multivariate regression (SMR) model (cf. Johnson and Wichern [19]) having the same covariate structure across response types.

In this paper we generalize the applicability of the Bayesian methodology in Peterson [33] to make it more widely useful for addressing commonly occurring multivariate response surface problems. This is done by introducing a method for utilizing seemingly unrelated regression (SUR) models (Zellner [44]) where each response type has its own covariate structure. In addition to the multivariate normal distribution assumption for the vector of regression errors, we provide a further modification of this approach to handle regression-error vectors that have a multivariate t-distribution. The t-distribution modeling is useful for many typical response surface experiments from a Bayesian sensitivity analysis perspective. Many response surface designs have sample sizes sufficiently small as to make it difficult to assess normality of the residuals. Obvious outliers can be removed, but moderate outliers may be somewhat confounded with factor effects. The t-distribution allows the experimenter to vary the thickness of the outer tails of the distribution of the residual errors by varying the associated degrees of freedom parameter. These two modifications of Peterson [33] provide flexibility that may be needed for typical response surface experiments.

In this paper, we re-analyze the two examples found in Peterson [33] using the above two generalizations. In the first example, a mixture experiment, we show that use of SUR modeling can make a noticeable difference. The second example provides an illustration that the optimal conditions for the process under study provide a high likelihood of meeting specifications whether we use a normal distribution or a t-distribution with heavier tails. This distribution sensitivity analysis provides some assurance that extensive validation efforts may not be needed before implementing this optimized process.


For these examples, process optimality is measured by $p(\mathbf{x}) = \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$, where $\mathbf{Y}$ is an $r \times 1$ vector of responses, $\mathbf{x}$ is a $k \times 1$ vector of process factors, and $A$ is a specification set that describes desirable or acceptable values for $\mathbf{Y}$. The approach used in this paper models the posterior predictive distribution of $\mathbf{Y}$ given $\mathbf{x}$ to find a value of $\mathbf{x}$ that maximizes $p(\mathbf{x})$. Here, the probability of response conformance, or reliability, $p(\mathbf{x})$, is easy to interpret. However, other process reliabilities can be constructed if one wishes to employ a desirability function, $D(\mathbf{Y})$, or a quadratic loss function, $Q(\mathbf{Y})$. For example, using the posterior predictive distribution one can compute $p(\mathbf{x}) = \Pr(D(\mathbf{Y}) \ge D^* \mid \mathbf{x}, \text{data})$ or $p(\mathbf{x}) = \Pr(Q(\mathbf{Y}) \le Q^* \mid \mathbf{x}, \text{data})$ if informative values of $D^*$ or $Q^*$ are available. An illustration is given in Peterson [33]. The predictive nature of this approach also easily allows for the incorporation of noise variables to help the experimenter create a robust process. See, for example, Mir-Quesada et al. [29] and Rajagopal et al. [40].

An important application of the $p(\mathbf{x})$ function is the set of all $\mathbf{x}$-points such that $p(\mathbf{x})$ is at least some prespecified reliability value. An example of such a set appears in Peterson [33]. This type of set has applications for process capability, and in fact has been proposed by Peterson [34] for construction of a "design space", a pharmaceutical manufacturing process capability region described in the FDA document "Guidance for Industry Q8 Pharmaceutical Development" [10], available at http://www.fda.gov/cber/gdlns/ichq8pharm.htm.

    2. The Statistical Model for a Bayesian Reliability

To compute $p(\mathbf{x}) = \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$ for multiresponse process optimization, we need to obtain the posterior predictive distribution for $\mathbf{Y}$ given $\mathbf{x}$. The regression model considered here is the one that allows the experimenter to use a different (parametrically) linear model for each response type. This will allow for more flexible and accurate modeling of $\mathbf{Y}$ than one would obtain with the SMR model.

Here, $\mathbf{Y} = (Y_1, \ldots, Y_r)'$ is a vector of $r$ response types and $\mathbf{x}$ is a $k \times 1$ vector of factors that influence $\mathbf{Y}$ by way of the functions

$$Y_i = \mathbf{z}_i(\mathbf{x})'\boldsymbol{\beta}_i + e_i, \quad i = 1, \ldots, r, \qquad (1)$$

where $\boldsymbol{\beta}_i$ is a $p_i \times 1$ vector of regression model parameters and $\mathbf{z}_i(\mathbf{x})$ is a $p_i \times 1$ vector of covariates which are arbitrary functions of $\mathbf{x}$. Furthermore, $\mathbf{e} = (e_1, \ldots, e_r)'$ is a random variable with a multivariate normal distribution having mean vector $\mathbf{0}$ and variance-covariance matrix $\boldsymbol{\Sigma}$. The model in (1) has been referred to as the "seemingly unrelated regressions" (SUR) model (Zellner [43]). When $\mathbf{z}_1(\mathbf{x}) = \cdots = \mathbf{z}_r(\mathbf{x}) = \mathbf{z}(\mathbf{x})$, we obtain the SMR model.

In order to model all of the data and obtain a convenient form for estimating the regression parameters, consider the following vector-matrix form,

$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \mathbf{e}, \qquad (2)$$

where $\mathbf{Y} = [\mathbf{Y}_1', \ldots, \mathbf{Y}_r']'$, $\boldsymbol{\beta} = [\boldsymbol{\beta}_1', \ldots, \boldsymbol{\beta}_r']'$, $\mathbf{e} = [\mathbf{e}_1', \ldots, \mathbf{e}_r']'$, and $\mathbf{Z}$ is an $m \times p$ block diagonal matrix of the form $\mathrm{diag}(\mathbf{Z}_1, \ldots, \mathbf{Z}_r)$ with $p = p_1 + \cdots + p_r$. Here, $\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{in})'$, $\mathbf{e}_i = (e_{i1}, \ldots, e_{in})'$, and $\mathbf{Z}_i = [\mathbf{z}_i(\mathbf{x}_1), \ldots, \mathbf{z}_i(\mathbf{x}_n)]'$ for $i = 1, \ldots, r$.

For the SMR model under a noninformative prior (which is proportional to $|\boldsymbol{\Sigma}|^{-(r+1)/2}$), the posterior predictive density function of $\mathbf{Y}$ has the multivariate t-distribution form. See, for example, Press [36].


Simulation of a multivariate t-distribution r.v. with $v$ df can be done simply by simulating a multivariate normal r.v. and an independent chi-square r.v. with $v$ df (Johnson [18]). For the SMR model then, $p(\mathbf{x}) = \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$ can be computed directly for each $\mathbf{x}$ by Monte Carlo simulation. This was done by Peterson [33] as a way to do multiresponse surface optimization by maximizing $p(\mathbf{x})$ over the experimental region. Mir-Quesada et al. [29] extended these results for the SMR case to include noise variables. Multivariate t-distribution probabilities over hyper-rectangles can also be computed efficiently by numerical integration (Genz and Bretz [13]).

For the SUR model, no closed-form density or sampling procedure exists. However, using Gibbs sampling (Griffiths [15]) it is easy to generate random pairs of SUR model parameters from the posterior distribution of $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$. Using the SUR model in (1) it is then straightforward to simulate r.v.'s from the posterior predictive distribution of $\mathbf{Y}$ given $\mathbf{x}$.
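As a concrete illustration of the normal/chi-square construction just mentioned for the SMR case, the following minimal Python sketch draws from a multivariate t-distribution with a given location vector, scale matrix, and degrees of freedom. The function name and arguments are our own illustration, not code from the paper.

```python
import numpy as np

def rmvt(mean, scale, df, size, rng=None):
    """Draw `size` multivariate t r.v.'s by combining a multivariate normal
    draw with an independent chi-square draw (the construction in Johnson [18])."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    z = rng.multivariate_normal(np.zeros(len(mean)), scale, size=size)  # N(0, scale)
    u = rng.chisquare(df, size=size)                                    # independent chi-square, df
    return mean + z / np.sqrt(u / df)[:, None]                          # t_df(mean, scale)
```

Each returned row can then be checked against the specification set $A$ to build a Monte Carlo estimate of $p(\mathbf{x})$ for the SMR model.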

    3. Computing the Bayesian Reliability

    3.1. The SUR Model with Normally Distributed Error Terms

Before describing the Bayesian analysis, it is convenient to discuss some (conditional) maximum likelihood estimates for the SUR model. For a given $\boldsymbol{\Sigma}$, the maximum likelihood estimate (MLE) of $\boldsymbol{\beta}$ can be expressed as

$$\hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma}) = [\mathbf{Z}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_n)\mathbf{Z}]^{-1}\mathbf{Z}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_n)\mathbf{Y}, \qquad (3)$$

where $\mathbf{I}_n$ is the $n \times n$ identity matrix and $\otimes$ is the Kronecker direct product operator. The variance-covariance matrix of $\hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma})$ is $\mathrm{Var}(\hat{\boldsymbol{\beta}}) = [\mathbf{Z}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_n)\mathbf{Z}]^{-1}$.

For a given $\boldsymbol{\beta}$, the variance-covariance matrix $\boldsymbol{\Sigma}$ can be estimated by

$$\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta}) = \frac{1}{n}\sum_{j=1}^{n} \mathbf{e}_j(\boldsymbol{\beta})\,\mathbf{e}_j(\boldsymbol{\beta})', \qquad (4)$$

where $\mathbf{e}_j(\boldsymbol{\beta}) = (e_{1j}(\boldsymbol{\beta}), \ldots, e_{rj}(\boldsymbol{\beta}))'$ and $e_{ij}(\boldsymbol{\beta}) = y_{ij} - \mathbf{z}_i(\mathbf{x}_j)'\boldsymbol{\beta}_i$, $i = 1, \ldots, r$. Let $\tilde{\boldsymbol{\beta}}_i$ be the maximum likelihood estimator of $\boldsymbol{\beta}_i$ for each response type independently of the other responses, and define $\tilde{\boldsymbol{\beta}} = [\tilde{\boldsymbol{\beta}}_1', \ldots, \tilde{\boldsymbol{\beta}}_r']'$. The estimator

$$\boldsymbol{\beta}^* = [\mathbf{Z}'(\hat{\boldsymbol{\Sigma}}(\tilde{\boldsymbol{\beta}})^{-1} \otimes \mathbf{I}_n)\mathbf{Z}]^{-1}\mathbf{Z}'(\hat{\boldsymbol{\Sigma}}(\tilde{\boldsymbol{\beta}})^{-1} \otimes \mathbf{I}_n)\mathbf{Y} \qquad (5)$$

is called the two-stage Aitken estimator (Zellner [43]).

In order to compute and maximize $p(\mathbf{x}) = \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$ over the experimental region, it is important to have a relatively efficient method for approximating $p(\mathbf{x})$ by Monte Carlo simulation. The approach taken in this paper is to simulate a large number of r.v.'s from the posterior distribution of $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$, and use each $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$ value to generate a $\mathbf{Y}$ r.v. for each $\mathbf{x}$. In this way, the sample of $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$ values can be re-used for simulating $\mathbf{Y}$ values at each $\mathbf{x}$ point, instead of having to do the Gibbs sampling all over again for each $\mathbf{x}$-point.


Consider the noninformative prior for $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$ which is proportional to $|\boldsymbol{\Sigma}|^{-(r+1)/2}$ (Percy [32] and Griffiths [15]). Note that the posterior distribution of $\boldsymbol{\beta}$ given $\boldsymbol{\Sigma}$ is modeled by

$$\boldsymbol{\beta} \mid \boldsymbol{\Sigma} \sim N\!\left(\hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma}),\ [\mathbf{Z}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_n)\mathbf{Z}]^{-1}\right), \qquad (6)$$

where $\hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma})$ has the form as in (3). This follows from Srivastava and Giles [39]. Note also that the posterior distribution of $\boldsymbol{\Sigma}^{-1}$ given $\boldsymbol{\beta}$ is described by

$$\boldsymbol{\Sigma}^{-1} \mid \boldsymbol{\beta} \sim W\!\left(n,\ n^{-1}\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta})^{-1}\right), \qquad (7)$$

where $W(n, n^{-1}\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta})^{-1})$ is the Wishart distribution with $n$ df and scale parameter $n^{-1}\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta})^{-1}$, and $\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta})$ has the form as in (4). This follows from a slight modification of expression (7) in Percy [32]. Sampling values from the posterior distribution of $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$ can be done as follows using Gibbs sampling:

Step 0. Initialize the Gibbs sampling chain using $\boldsymbol{\beta} = \boldsymbol{\beta}^* + \omega\,\boldsymbol{\Sigma}_{\beta}^{*\,1/2}\mathbf{e}$, where $\boldsymbol{\beta}^*$ corresponds to (5), $\boldsymbol{\Sigma}_{\beta}^*$ corresponds to the variance-covariance form associated with (5), and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I})$. Here, $\omega$ can be used to induce a slight overdispersion as recommended by Gelman et al. [12]. In this article, $\omega = 2$ is used. This initialization is done since $\boldsymbol{\beta}$ is approximately normal with mean $\boldsymbol{\beta}^*$ and variance-covariance matrix $\boldsymbol{\Sigma}_{\beta}^*$.

Step 1. Generate a $\boldsymbol{\Sigma}^{-1}$ value as in (7) by using the most recently simulated $\boldsymbol{\beta}$ and the decomposition $\boldsymbol{\Sigma}^{-1} = \mathbf{S}^{*\prime}\,\mathbf{T}\,\mathbf{S}^*$, where $\mathbf{S}^{*\prime}\mathbf{S}^* = n^{-1}\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta})^{-1}$ and $\mathbf{T} = \sum_{i=1}^{n}\boldsymbol{\eta}_i\boldsymbol{\eta}_i'$. Here, $\boldsymbol{\eta}_1, \ldots, \boldsymbol{\eta}_n$ are iid $N(\mathbf{0}, \mathbf{I}_r)$ distributed.

Step 2. Generate a $\boldsymbol{\beta}$ value as in (6) by using the most recently simulated $\boldsymbol{\Sigma}$ and $\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma}) + \mathbf{R}'\boldsymbol{\eta}_0$, where $\mathbf{R}'\mathbf{R} = [\mathbf{Z}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_n)\mathbf{Z}]^{-1}$ and $\boldsymbol{\eta}_0$ is distributed as $N(\mathbf{0}, \mathbf{I})$.

Following Percy [32], we use a burn-in of 100 iterations for steps 1 and 2. See Geweke [14] for use of (conditionally conjugate) informative priors for $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$.
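To make Steps 1 and 2 concrete, here is a minimal Python sketch of the two-step sampler under the noninformative prior. It is our own illustration, not the authors' code: the names (`gibbs_sur_normal`, `Z_list`, `Y`) are assumptions, the Wishart draw uses the sum-of-outer-products construction described in Step 1, and equation-by-equation least squares stands in for the two-stage initialization.

```python
import numpy as np
from scipy.linalg import block_diag

def gibbs_sur_normal(Z_list, Y, n_draws=1000, burn_in=100, seed=0):
    """Two-step Gibbs sampler (Steps 1-2 of Section 3.1) for the normal-error SUR
    model under the noninformative prior proportional to |Sigma|^{-(r+1)/2}.

    Z_list : list of r design matrices, each of shape (n, p_i)
    Y      : (n, r) matrix of responses
    Returns arrays of posterior draws of the stacked beta and of Sigma.
    """
    rng = np.random.default_rng(seed)
    n, r = Y.shape
    Z = block_diag(*Z_list)              # (r*n, p) block-diagonal design, p = sum p_i
    y = Y.T.reshape(-1)                  # responses stacked as [Y_1', ..., Y_r']'
    splits = np.cumsum([Zi.shape[1] for Zi in Z_list])[:-1]

    def sigma_hat(beta):
        # Residuals e_ij = y_ij - z_i(x_j)'beta_i, then Sigma_hat = E'E / n as in (4)
        fitted = np.column_stack([Zi @ bi for Zi, bi in zip(Z_list, np.split(beta, splits))])
        E = Y - fitted
        return E.T @ E / n

    def beta_hat(Sigma_inv):
        # GLS form (3): [Z'(Sigma^{-1} (x) I_n)Z]^{-1} Z'(Sigma^{-1} (x) I_n) y
        W = np.kron(Sigma_inv, np.eye(n))
        A = Z.T @ W @ Z
        return np.linalg.solve(A, Z.T @ W @ y), A

    # Initialize with equation-by-equation least squares (in the spirit of (5))
    beta = np.concatenate([np.linalg.lstsq(Zi, Y[:, i], rcond=None)[0]
                           for i, Zi in enumerate(Z_list)])
    betas, Sigmas = [], []
    for it in range(burn_in + n_draws):
        # Step 1: Sigma^{-1} | beta ~ Wishart(n, n^{-1} Sigma_hat(beta)^{-1})
        V = np.linalg.inv(n * sigma_hat(beta))     # scale matrix n^{-1} Sigma_hat^{-1}
        L = np.linalg.cholesky(V)
        eta = rng.standard_normal((n, r))
        Sigma_inv = L @ (eta.T @ eta) @ L.T        # eta'eta ~ Wishart(n, I_r)
        # Step 2: beta | Sigma ~ N(beta_hat(Sigma), [Z'(Sigma^{-1} (x) I_n)Z]^{-1})
        mean, A = beta_hat(Sigma_inv)
        beta = rng.multivariate_normal(mean, np.linalg.inv(A))
        if it >= burn_in:
            betas.append(beta)
            Sigmas.append(np.linalg.inv(Sigma_inv))
    return np.array(betas), np.array(Sigmas)
```

The returned draws of $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$ are the quantities that get re-used at every candidate $\mathbf{x}$ in the next step.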

To compute $p(\mathbf{x})$, $N$ $\mathbf{Y}$-vectors are generated for each $\mathbf{x}$. Each simulated $\mathbf{Y}$-vector, $\mathbf{Y}^{(s)}$, is generated using

$$\mathbf{Y}^{(s)} = \begin{pmatrix} \mathbf{z}_1(\mathbf{x})'\boldsymbol{\beta}_1^{(s)} \\ \vdots \\ \mathbf{z}_r(\mathbf{x})'\boldsymbol{\beta}_r^{(s)} \end{pmatrix} + \mathbf{e}^{(s)}, \qquad (8)$$

where $(\boldsymbol{\beta}^{(s)}, \boldsymbol{\Sigma}^{(s)})$ is sampled using the Gibbs sampler and $\mathbf{e}^{(s)}$ is sampled from $N(\mathbf{0}, \boldsymbol{\Sigma}^{(s)})$, $s = 1, \ldots, N$. For each new $\mathbf{x}$-point, the same $N$ $(\boldsymbol{\beta}^{(s)}, \boldsymbol{\Sigma}^{(s)})$ pairs are used. The Bayesian reliability, $p(\mathbf{x})$, is approximated by

$$\hat{p}(\mathbf{x}) = \frac{1}{N}\sum_{s=1}^{N} I\!\left(\mathbf{Y}^{(s)} \in A\right), \qquad (9)$$

for large $N$, where $I(\cdot)$ is an indicator function.
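The following short Python sketch mirrors (8) and (9). It is an illustrative implementation under assumed helper names (`z_block`, `in_A`), not code from the paper; the key point it shows is that the same posterior draws are recycled for every candidate $\mathbf{x}$.

```python
import numpy as np

def p_hat(x, z_block, beta_draws, Sigma_draws, in_A, rng=None):
    """Monte Carlo estimate of p(x) = Pr(Y in A | x, data) as in (8)-(9).

    z_block(x)  : hypothetical helper returning the r x p block matrix whose i-th row
                  is z_i(x)' placed against the corresponding beta_i block
    beta_draws  : (N, p) stacked beta draws from the Gibbs sampler
    Sigma_draws : (N, r, r) Sigma draws
    in_A        : callable mapping a simulated length-r response to True/False
    """
    rng = np.random.default_rng() if rng is None else rng
    Zx = z_block(x)
    hits = 0
    for beta_s, Sigma_s in zip(beta_draws, Sigma_draws):
        mean = Zx @ beta_s                              # z_i(x)'beta_i^(s), i = 1..r
        y_s = rng.multivariate_normal(mean, Sigma_s)    # add e^(s) ~ N(0, Sigma^(s))
        hits += in_A(y_s)                               # indicator I(Y^(s) in A)
    return hits / len(beta_draws)
```

For example, in the mixture experiment of Section 6.1 one would take `in_A = lambda y: (y[0] <= 240) and (y[1] <= 19)` and evaluate `p_hat` over a grid on the design simplex to locate the maximizing $\mathbf{x}$.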

Percy [32] provides a similar, but three-step, Gibbs sampling procedure that generates a $(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \mathbf{Y})$ triplet for a given $\mathbf{x}$ value. However, this is not efficient for our purposes, as this Gibbs sampling procedure would have to be re-done for many $\mathbf{x}$-points in order to optimize $p(\mathbf{x})$. Percy also proposes a multivariate normal approximation to the posterior predictive distribution of $\mathbf{Y}$ given $\mathbf{x}$. However, such an approximation may not be accurate for small sample sizes. This is because one would expect the true posterior predictive distribution of $\mathbf{Y}$ given $\mathbf{x}$ to have heavier tails than a normal distribution due to model parameter uncertainty; this is indeed the case with the SMR model.

    3.2. The SUR Model with t-Distribution Error Terms

The multivariate t-distribution can be a useful generalization of the multivariate normal distribution for applied statistics (Liu and Rubin [25]). In particular, it can be a useful tool for modeling bell-shaped distributions that have heavier than normal "tails". Liu [26] illustrates the utility of using t-distribution errors for robust data analysis within the context of an SMR model. Rajagopal et al. [40] provide an example for the univariate regression case.

In this subsection, we show how to sample from the posterior predictive distribution of a SUR model with multivariate t-distribution errors. This will allow the experimenter to perform a sensitivity analysis with regard to a distribution that spans a continuum from Cauchy (df = 1) to normal (df = $\infty$). This can be useful for many typically used response surface designs that may not provide enough data to perform discriminating tests of normality. Our experience is that (if the mean response is in $A$) the Bayesian reliability $p(\mathbf{x})$ gets smaller as the df get smaller, reflecting a more disperse posterior predictive distribution for $\mathbf{Y}$ at each $\mathbf{x}$-point. If $p(\mathbf{x})$ is acceptably large for both small and large df, then our sensitivity analysis provides some confidence that we have a reliable process, provided that the residual distribution appears bell-shaped and we have found good regression models for each response type.

The SUR model with t-distribution errors has the same model form as in (1) but with the $e_i$'s replaced by $\varepsilon_i$'s, where the vector $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_r)'$ has a multivariate t-distribution with location parameter $\mathbf{0}$, scale (matrix) parameter $\boldsymbol{\Sigma}$, and df parameter $v$. The inverse of $\boldsymbol{\Sigma}$ is sometimes called the precision matrix. For details about the multivariate t-distribution see Kotz and Johnson [23]. Here, we are assuming that $v$ is known. Some authors recommend using df $v \ge 4$ (which implies three finite moments) for the t-distribution for purposes of modeling heavy-tailed errors. See, for example, Lange et al. [24], Gelman et al. [12], and Congdon [5]. The same noninformative prior, proportional to $|\boldsymbol{\Sigma}|^{-(r+1)/2}$, can be used for the t-distribution errors model. This prior is used in this article.

To do Gibbs sampling from a SUR model with t-distribution errors, first consider the following weighted SUR model

$$Y_{ij} = \mathbf{z}_i(\mathbf{x}_j)'\boldsymbol{\beta}_i + \frac{e_{ij}}{\sqrt{w_j}}, \quad i = 1, \ldots, r,\ \ j = 1, \ldots, n, \qquad (10)$$

where (10) is defined as in (1) but with an index $j$ to represent the observation number and with $e_{ij}$ replaced by $e_{ij}/\sqrt{w_j}$. Here, $\mathbf{e}_j = (e_{1j}, \ldots, e_{rj})' \sim$ iid $N(\mathbf{0}, \boldsymbol{\Sigma})$, $j = 1, \ldots, n$. Conditional on the $w_j$'s, (10) is a weighted SUR model. Note that here the weight is the same for each $r \times 1$ vector of responses, $\mathbf{Y}_j$, but different for each ($j$th) observation. If $w_j = u_j/v$ for $j = 1, \ldots, n$, where $u_1, \ldots, u_n$ are iid chi-square with $v$ df, then (unconditional on the $w_j$'s) the $r \times 1$ error vectors, $(e_{1j}/\sqrt{w_j}, \ldots, e_{rj}/\sqrt{w_j})'$, $j = 1, \ldots, n$, are iid multivariate $t$ with $v$ df, location parameter vector $\mathbf{0}$, and scale parameter matrix $\boldsymbol{\Sigma}$ (Congdon [5]). As with the normal errors SUR model, we use the noninformative prior which is proportional to $|\boldsymbol{\Sigma}|^{-(r+1)/2}$.

In order to set up the Gibbs sampling, we need to define some estimator-like functions of the data and model parameters. First we define

$$\hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma}, \mathbf{W}) = [\mathbf{Z}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{W})\mathbf{Z}]^{-1}\mathbf{Z}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{W})\mathbf{Y},$$

where $\mathbf{W} = \mathrm{diag}(w_1, \ldots, w_n)$. In addition, let

$$\mathbf{V}(\boldsymbol{\Sigma}, \mathbf{W}) = [\mathbf{Z}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{W})\mathbf{Z}]^{-1}.$$

Finally, let

$$\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta}, \mathbf{W}) = \frac{1}{n}\sum_{i=1}^{n} w_i\,(\mathbf{y}_i - \mathbf{Z}_i\boldsymbol{\beta})(\mathbf{y}_i - \mathbf{Z}_i\boldsymbol{\beta})'.$$

    The basic steps of the Gibbs sampling are as follows.

Step 0. Initialize the Gibbs sampling chain using (5) for $\boldsymbol{\beta}$. For $\mathbf{W} = \mathrm{diag}(w_1, \ldots, w_n)$, simulate the $w_i$'s, where $w_i = u_i/v$ and the $u_i$'s have independent chi-square distributions with $v$ df $(i = 1, \ldots, n)$.

Step 1. Simulate $\boldsymbol{\Sigma} \mid \boldsymbol{\beta}, \mathbf{W}$ according to $\boldsymbol{\Sigma}^{-1} \mid \boldsymbol{\beta}, \mathbf{W} \sim W(n,\ n^{-1}\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta}, \mathbf{W})^{-1})$.

Step 2. Simulate $\boldsymbol{\beta} \mid \boldsymbol{\Sigma}, \mathbf{W}$ according to $\boldsymbol{\beta} \mid \boldsymbol{\Sigma}, \mathbf{W} \sim N(\hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma}, \mathbf{W}),\ \mathbf{V}(\boldsymbol{\Sigma}, \mathbf{W}))$.

Step 3. Simulate $\mathbf{W} = \mathrm{diag}(w_1, \ldots, w_n)$ conditional on $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$ by simulating each $w_i$ independently according to a gamma distribution, $w_i \sim G(b, c_i)$, $i = 1, \ldots, n$, where $G(b, c)$ denotes a gamma distribution with density function

$$g(w; b, c) = \frac{w^{b-1}e^{-w/c}}{\Gamma(b)\,c^{b}} \quad \text{for } w > 0.$$

Here, $b = (v + r)/2$ and

$$c_i = \left[\frac{v}{2} + \frac{1}{2}(\mathbf{y}_i - \mathbf{Z}_i\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{Z}_i\boldsymbol{\beta})\right]^{-1}. \qquad (11)$$

Computing $\mathbf{Y}^{(s)}$ and $p(\mathbf{x})$ follows as in (8) and (9). If so desired, it is clear from the above Gibbs sampling steps that one can use the same (conditionally conjugate) informative priors for $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$ as for the normal errors SUR model.
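As an illustration of Step 3 only, the following minimal Python sketch draws the latent weights under (11). The function and argument names, including the per-observation covariate layout `Z_obs`, are our own assumptions rather than the paper's notation.

```python
import numpy as np

def update_weights(Y, Z_obs, beta, Sigma_inv, v, rng):
    """Step 3 sketch: draw w_1..w_n for the t-error SUR model from G(b, c_i) in (11).

    Y        : (n, r) response matrix, row i is y_i
    Z_obs    : list of n matrices, each r x p, so that Z_obs[i] @ beta is the mean of y_i
    beta     : stacked regression coefficients, length p
    Sigma_inv: r x r precision matrix (inverse of Sigma)
    v        : t-distribution degrees of freedom
    """
    n = Y.shape[0]
    r = Sigma_inv.shape[0]
    b = (v + r) / 2.0                                   # shape parameter b = (v + r)/2
    w = np.empty(n)
    for i in range(n):
        resid = Y[i] - Z_obs[i] @ beta                  # y_i - Z_i beta
        quad = resid @ Sigma_inv @ resid                # (y_i - Z_i beta)' Sigma^{-1} (y_i - Z_i beta)
        c_i = 1.0 / (v / 2.0 + 0.5 * quad)              # scale c_i as in (11)
        w[i] = rng.gamma(shape=b, scale=c_i)            # w_i ~ G(b, c_i)
    return w
```

Steps 1 and 2 then proceed exactly as in the normal-error sketch above, but with the weighted forms $\hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma}, \mathbf{W})$, $\mathbf{V}(\boldsymbol{\Sigma}, \mathbf{W})$, and $\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta}, \mathbf{W})$.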

    3.3. The Addition of Noise Variables

    One advantage of this posterior predictive approach to multiresponse optimization isthat it easily allows the experimenter to incorporate noise variables and thereby do

    robust-parameter-design process optimization. A noise variable is a factor that may be

    precisely controlled in a laboratory setting but not in actual production use. To see how

    noise variables can be incorporated, let c 1 1( ,... , ,..., )h h kx x x xx 1,...,h kx xwhere arenoise variables. Here, it is typically assumed that the ( 1,..., )jx j h k

    (0,1).N

    are scaled such

    that they are iid By simulating c1( ,..., )h kx x 1 1( ,..., ) Pr( | ,..., , data)h hp x x A x xY

    1( ,..., )hp x x

    and substituting into the simulation

    for (8), can be computed. Maximizing

    provides for a way to do robust process optimization. Details for the SMR

    case are discussed in Mir-Quesada et al. [29] and Rajagopal et al. [40]. Extension to the

    SUR case for normal ort-distributed errors is straightforward.
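A hedged sketch of this modification, reusing the layout of the `p_hat` sketch from Section 3.1 (again, the helper names are our own assumptions), simply draws fresh standard normal noise-variable values for every posterior draw before simulating the response:

```python
import numpy as np

def p_hat_noise(x_ctrl, z_block, beta_draws, Sigma_draws, in_A, n_noise, rng=None):
    """Robust-process version of (8): controllable settings x_1..x_h are fixed,
    while the noise variables x_{h+1}..x_k are re-simulated as iid N(0,1) for
    every posterior draw before the response is generated."""
    rng = np.random.default_rng() if rng is None else rng
    hits = 0
    for beta_s, Sigma_s in zip(beta_draws, Sigma_draws):
        x_noise = rng.standard_normal(n_noise)        # (x_{h+1},...,x_k)' ~ iid N(0,1)
        x_full = np.concatenate([x_ctrl, x_noise])
        mean = z_block(x_full) @ beta_s
        y_s = rng.multivariate_normal(mean, Sigma_s)
        hits += in_A(y_s)
    return hits / len(beta_draws)                      # estimates p(x_1,...,x_h)
```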


    4. Optimization of the Bayesian Reliability

If there are 2-3 controllable factors, then it is easy to maximize $p(\mathbf{x})$ by gridding over the experimental region. For a larger number of controllable factors, two other approaches are possible. One approach is to use a general optimization procedure such as can be found in Nelder and Mead [31], Price [37], or Chatterjee et al. [4]. Another approach is to create a closed-form approximate model for $p(\mathbf{x})$ using logistic regression or some other regression procedure such as a generalized additive model (Wood [42]). By creating a coarse to moderately dense grid over the experimental region, logistic regression can be applied to the $(I(\mathbf{Y}^{(s)} \in A), \mathbf{x})$ data. For example, the grid can be an $m^k$ factorial design with $m = 5$-10 points per factor, say. Since we can simulate many $(I(\mathbf{Y}^{(s)} \in A), \mathbf{x})$ pairs for each of many $\mathbf{x}$-points, it should be possible to create a good approximate closed-form model, $\tilde{p}(\mathbf{x})$, for $p(\mathbf{x})$. One can then maximize $\tilde{p}(\mathbf{x})$ using some suitable optimization procedure. See Peterson [33] for an example using the SMR model.
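One possible way to build such a closed-form surrogate, sketched here with scikit-learn (a tooling choice of ours, not the paper's), is to fit a polynomial-term logistic regression to the simulated indicator data over the grid:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def fit_p_surrogate(x_grid, indicator_draws, degree=2):
    """Fit a surrogate p~(x) by logistic regression on the binary outcomes
    I(Y^(s) in A) simulated at each grid point x.

    x_grid          : (G, k) array of grid points over the experimental region
    indicator_draws : (G, N) array of 0/1 indicators, N posterior-predictive draws per point
    """
    G, N = indicator_draws.shape
    X = np.repeat(x_grid, N, axis=0)           # one row per simulated indicator
    y = indicator_draws.reshape(-1)
    model = make_pipeline(PolynomialFeatures(degree), LogisticRegression(max_iter=1000))
    model.fit(X, y)
    return model                               # model.predict_proba(x)[:, 1] approximates p(x)
```

A generalized additive model (Wood [42]) could be substituted for the logistic regression step without changing the overall workflow.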

    5. A Preposterior Analysis

As will be seen in the next section, it may happen that standard multiresponse optimization procedures indicate that satisfactory results for the mean response surfaces are possible while the associated Bayesian reliability, $p(\mathbf{x})$, is not satisfactory. If this happens, it is because the posterior predictive distribution is too disperse or possibly even oriented in a way that causes $p(\mathbf{x})$ to be too small. One remedial possibility is to reduce the process variation or change the correlation structure in such a way as to increase $p(\mathbf{x})$. However, this may not always be possible, and in some cases difficult or costly when possible. There is another approach which will increase $p(\mathbf{x})$ to some degree, provided that the mean response surfaces provide satisfactory results. Some of the dispersion of the posterior predictive distribution is due to the uncertainty of the model parameters. This uncertainty can be reduced by increasing the sample size. Increasing the number of observations will not make $p(\mathbf{x})$ go to one, due to the uncertainty of the natural process variation itself, but it may be useful to assess how much $p(\mathbf{x})$ will increase as additional data are added.

    5.1. A Preposterior Analysis with Normally Distributed Error Terms

One way to assess how additional data might affect the posterior predictive distribution is to impute new data in such a way as to predict the effect of having additional data using the information we currently have at hand. In this paper we take two different approaches to imputing additional data for the SUR model with normally distributed error terms. The first approach is based upon single imputation, where we impute an (imaginary) additional data set that has the property that it keeps $\hat{\boldsymbol{\beta}}(\boldsymbol{\Sigma})$ and $\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta})$ in the Gibbs sampling the same. However, the additional data involves augmenting the regression design matrix and df. It is evident from (6) that the posterior distribution of $\boldsymbol{\beta}$ given $\boldsymbol{\Sigma}$ can be modified to behave as if additional data were used by augmenting the design matrix $\mathbf{Z}$ and the size of the identity matrix $\mathbf{I}_n$. Accordingly, from (7) one can see that the posterior distribution of $\boldsymbol{\Sigma}^{-1}$ given $\boldsymbol{\beta}$ can be changed to behave as if more data were added by increasing the $n$ in the Wishart distribution, both for the df and for the $n$ in the $n^{-1}$ coefficient of $\hat{\boldsymbol{\Sigma}}(\boldsymbol{\beta})^{-1}$ in (7).

The second approach is based upon (parametric bootstrap) multiple imputation. Let $(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}})$ be estimates of $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$ (such as $\boldsymbol{\beta}^*$ and $\hat{\boldsymbol{\Sigma}}$ in (5)). Using the model form in (2), we simulate new response values, $\mathbf{Y}_i^a$, from $N(\mathbf{Z}_i\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}})$, $i = n+1, \ldots, n+n_a$, and then generate $N$ realizations of $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$ conditional on the augmented data set, using the Gibbs sampling process. These realizations are then used to generate $N$ realizations of a new response variable, $\mathbf{Y}$, using (8). This whole process is repeated $m$ times to get $m$ estimates of


$\Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data}, \mathbf{Y}_{n+1}^a, \ldots, \mathbf{Y}_{n+n_a}^a, \mathbf{Z}_{n+1}, \ldots, \mathbf{Z}_{n+n_a})$. These values are then averaged to get a final estimate of

$$E_{\mathbf{Y}_{n+1}^a, \ldots, \mathbf{Y}_{n+n_a}^a \mid \text{data}}\!\left\{\Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data}, \mathbf{Y}_{n+1}^a, \ldots, \mathbf{Y}_{n+n_a}^a, \mathbf{Z}_{n+1}, \ldots, \mathbf{Z}_{n+n_a})\right\}.$$

This preposterior estimate of $\Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$ is a (parametric) bootstrap estimate of an expected value; as such it seems reasonable to use $m$ equal to 200 (Efron and Tibshirani [9]). This multiple imputation approach, though more computationally intensive, can also be used to produce a histogram of simulated realizations from the random variable $\Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data}, \mathbf{Y}_{n+1}^a, \ldots, \mathbf{Y}_{n+n_a}^a, \mathbf{Z}_{n+1}, \ldots, \mathbf{Z}_{n+n_a})$.

Note that multiple imputation here is not done by simulating from the posterior predictive distribution, but instead from $N(\mathbf{Z}_i\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}})$, $i = n+1, \ldots, n+n_a$, where $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\Sigma}}$ are point estimates of $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$, respectively. The reason for this is that, for any fixed $\mathbf{x}$-point, simulating from the posterior predictive distribution will get us nowhere. This is because multiple imputation from the posterior predictive distribution produces an estimate of

$$E_{\mathbf{Y}_{n+1}, \ldots, \mathbf{Y}_{n+n_a} \mid \mathbf{x}, \text{data}}\!\left\{\Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data}, \mathbf{Y}_{n+1}, \ldots, \mathbf{Y}_{n+n_a})\right\}. \qquad (12)$$

But (12) equals $p(\mathbf{x}) = \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$. This follows from the well known result that $E_1(E_2(Y_2 \mid Y_1)) = E(Y_2)$. One may be able to use new responses simulated from the posterior predictive distribution to compute

$$E_{\mathbf{Y}_{n+1}, \ldots, \mathbf{Y}_{n+n_a} \mid \text{data}}\!\left\{\max_{\mathbf{x} \in R}\ \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data}, \mathbf{Y}_{n+1}, \ldots, \mathbf{Y}_{n+n_a})\right\}.$$

But such a two-tiered Monte Carlo computation (with a maximization in between) could become rather burdensome.

    5.2. A Preposterior Analysis with t-Distributed Error Terms

The Gibbs sampling for the SUR model with t-distributed error terms poses a difficulty for the single imputation approach. This is due to the fact that we need to use the terms $c_i = [v/2 + (1/2)(\mathbf{y}_i - \mathbf{Z}_i\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{Z}_i\boldsymbol{\beta})]^{-1}$ in (11) in the Gibbs sampling simulations for the $w_i$ $(i = n+1, \ldots, n+n_a)$, but each $c_i$ term depends upon $\mathbf{y}_i$. As such, we need to use the more computationally intensive multiple imputation (parametric bootstrap) approach.

    Such modifications will give the experimenter an idea of how much the reliability can

    be expected to increase by reducing model uncertainty. For example, the experimenter can

forecast the effects of replicating the experiment a certain number of times. This idea is similar in spirit to the notion of a "preposterior" analysis as described by Raiffa and

    Schlaiffer [38].

    6. Examples

    6.1. A Mixture Experiment

This example involves a mixture experiment to study the surfactants and emulsification variables involved in pseudolatex formation for controlled-release drug-containing beads (Frisbee and McGinity [11]). An extreme vertices design was used to study the influence of surfactant blends on the size of the particles in the pseudolatex and the glass transition temperature of films cast from those pseudolatexes. The factors chosen were: $x_1$ = "% of Pluronic F68", $x_2$ = "% of polyoxyethylene 40 monostearate", and $x_3$ = "% of polyoxyethylene sorbitan fatty acid ester NF". The experimental design used was a modified McLean-Anderson design (McLean and Anderson [28]) with two centroid points, resulting in a sample size of eleven. The response variables measured were particle size and glass transition temperature, which are denoted here as $Y_1$ and $Y_2$, respectively. The goal of the study was to find values of $x_1$, $x_2$, and $x_3$ to minimize as best as possible both $Y_1$ and $Y_2$. Here, we choose an upper bound for $Y_1$ of 240 and an upper bound for $Y_2$ of 19. Anderson and Whitcomb [2] also analyze this data set to illustrate Design Expert's capability to map out overlapping mean response surfaces.

Frisbee and McGinity [11] and Anderson and Whitcomb [2] use an SMR model with second-order terms to model the bivariate response data. For this example, however, a severe outlier run was deleted. The resulting regression models obtained were:

$$\hat{y}_1 = 248x_1 + 272x_2 + 533x_3 - 485x_1x_3 - 424x_2x_3, \qquad (13)$$

$$\hat{y}_2 = 18.7x_1 + 14.1x_2 + 35.4x_3 - 36.7x_1x_3 - 18.0x_2x_3. \qquad (14)$$

For this paper, several different mixture-experiment regression models were fit for each response type. For $Y_2$, the Becker-type model (Becker [3]),

$$\hat{y}_2 = 18.8x_1 + 15.6x_2 + 35.4x_3 - 3.59\min(x_1, x_2) - 17.7\min(x_1, x_3) - 10.0\min(x_2, x_3), \qquad (15)$$

resulted in a mean squared error of 1.71, which is a 53% reduction over the quadratic model for $Y_2$ in (14). The adjusted $R^2$ for the model in (15) is 96.4%. It turned out that the model forms in (13) and (15) gave the best overall fits to the data. As such, these two different (SUR) model forms were chosen to model the response surfaces. The Wilks-Shapiro test for normality of the residuals for each regression model yields $p$-values greater than 0.05. Tests for multivariate normality via skewness and kurtosis (Mardia [27]) were not significant at the 5% level, although such tests would not be very sensitive for the small sample size used in this example.

    Figure 1 shows the points where the predicted mean responses associated with the

    model in (13) are less than 240. Likewise, Figure 2 shows the points where the predicted

    mean responses associated with the model in (15) are less than 19. Figure 3 shows the

    points where the predicted mean responses associated with both models in (13) and (15) are

    less than 240 and 19, respectively.

We define $A = \{(y_1, y_2): y_1 \le 240,\ y_2 \le 19\}$ and $p(\mathbf{x}) = \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$. All probabilities are computed here using $N = 1000$ simulated $\mathbf{Y}$ values for each $\mathbf{x}$ point. For this example, ten independent Gibbs sampling chains were simulated for 1000 iterations following a burn-in of 100 iterations. Each chain was thinned to take only every tenth simulation. Here, $N = 1000$ was taken as a reasonable value (Gelman et al. [12]). For binomial probabilities, an $N$ of 1000 produces a standard error of at most 0.0158 (assuming roughly independent posterior simulations). The Gelman-Rubin convergence statistics for all of the model parameters were all very good (less than 1.1 as recommended by Gelman et al. [12]). Gridding over the design simplex was done using 32,761 grid points. Using the SUR models in (13) and (15) we obtain $\max_{\mathbf{x}} p(\mathbf{x}) = p(\mathbf{x}^*) = 0.622$ at $\mathbf{x}^* = (0.81, 0, 0.19)'$. Clearly, this is no indication of a reliable process at $\mathbf{x}^*$. If the experimenter had instead used the classical SMR model (forms in (13) and (14)) to maximize $p(\mathbf{x})$ over the design simplex, then he/she would obtain $\max_{\mathbf{x}} p(\mathbf{x}) = p(\mathbf{x}^*) = 0.863$ at $\mathbf{x}^* = (0.78, 0, 0.22)'$. Hence the optimal $p(\mathbf{x})$ for the SMR model is about a 39% increase in process reliability, though still possibly unacceptable. The noticeable difference in probabilities is due to the fact that the (better fitting) model in (15), while having a smaller MSE, also has a larger mean predicted value, $\hat{y}_2$, than the model in (14) when $x_1$ is greater than 0.5. The probabilities $p(\mathbf{x})$ for the SUR model were also computed assuming that the residual errors had a t-distribution with 4 df. In this case, $\max_{\mathbf{x}} p(\mathbf{x}) = p(\mathbf{x}^*) = 0.613$ at $\mathbf{x}^* = (0.83, 0, 0.17)'$. (Gelman et al. [12] suggest a t-distribution with 4 df for doing a robust data analysis.)
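For reference, the "at most 0.0158" standard error quoted above is the worst-case binomial value at $\hat{p} = 0.5$ with $N = 1000$ roughly independent draws:

$$\operatorname{SE}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{N}} \le \sqrt{\frac{(0.5)(0.5)}{1000}} \approx 0.0158.$$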

Figure 1. The gray area is the part of the response surface associated with the model in (13) where the predicted mean response, $\hat{y}_1$, is less than or equal to 240 (ternary plot over the design simplex with vertices $x_1 = 1$, $x_2 = 1$, $x_3 = 1$).

Figure 2. The gray area is the part of the response surface associated with the model in (15) where the predicted mean response, $\hat{y}_2$, is less than or equal to 19 (ternary plot over the design simplex with vertices $x_1 = 1$, $x_2 = 1$, $x_3 = 1$).


Figure 3. The gray area is the part of the response surface associated with the models in (13) and (15) where both $\hat{y}_1 \le 240$ and $\hat{y}_2 \le 19$ (ternary plot over the design simplex with vertices $x_1 = 1$, $x_2 = 1$, $x_3 = 1$).

All of these models may indicate the need for remedial action. Such action could be of the form of reducing the process variability, decreasing the means, and/or removing uncertainty due to the unknown model parameter values. Since the first two actions may be difficult to achieve, we consider the effects of adding more replications to the experimental design by way of a preposterior analysis. To assess the effect of adding additional data, the preposterior analyses discussed in section 5 were performed. To keep the computations tractable, the same optimized $\mathbf{x}$-points associated with each model and the original data set were used. For each model, $p(\mathbf{x})$ was computed using its own optimal $\mathbf{x}$-point: $\mathbf{x}^* = (0.78, 0, 0.22)'$ for the SMR model (with normal errors), $\mathbf{x}^* = (0.81, 0, 0.19)'$ for the SUR model (with normal errors), and $\mathbf{x}^* = (0.83, 0, 0.17)'$ for the SUR model with t-distribution errors (with 4 df). For each model, the entire design matrix was replicated 2, 3, or 4 times. For the SMR model the degrees of freedom were adjusted accordingly. For the SUR model with normally distributed errors, both preposterior approaches (single and multiple imputation) discussed in section 5.1 were used. For the SUR model with the t-distribution errors, the multiple imputation approach as discussed in section 5.2 was used.

Figure 4 shows the increase in $p(\mathbf{x}^*)$ as the number of replications is increased from one to four for both the SMR and SUR models. Here, it is evident that the SMR model might lead the experimenter to believe that reduction of model parameter uncertainty by using three or four replications would provide sufficient evidence that the process has a high rate of conformance with the specifications given by the set $A$. However, the better fitting SUR models, using either normal or t-distribution errors, indicate that increasing the number of experimental replications may not validate that the process has a high rate of conformance, even with four replications. Instead, the SUR models are indicating that the experimenter must improve the process means and/or variances to obtain conformance with higher probability. The single imputation and (bootstrap) multiple imputation results for the SUR model with normal errors are reasonably close, but further research needs to be done on how these two preposterior approaches compare. Nonetheless, this shows the


importance of improved modeling, which can be achieved by generalizing from the SMR model to the more flexible SUR model.

Figure 4. Probabilities of conformance for increasing numbers of design replications (legend: SMR normal errors; SUR normal errors; SUR normal errors with bootstrap; SUR t-dist. errors with bootstrap). Reps 2-4 are the preposterior probability estimates. (Bootstrapping is not applicable for Rep=1.) For each model, $p(\mathbf{x})$ was computed using its own optimal $\mathbf{x}$-point from Rep=1.

    6.2. Optimization of an HPLC Assay

This example illustrates the optimization of an event probability $\Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$ for a high performance liquid chromatography (HPLC) assay, as originally discussed in Peterson [33]. Here there are three factors ($x_1$ = percent of isopropyl alcohol (pipa), $x_2$ = temperature (temp), and $x_3$ = pH) and four responses ($y_1$ = resolution (rs), $y_2$ = run time, $y_3$ = signal-to-noise ratio (s/n), $y_4$ = tailing). For this assay, the chemist desires to have the event

$$A = \{\mathbf{y}: y_1 \ge 1.8,\ y_2 \le 15,\ y_3 \ge 300,\ 0.75 \le y_4 \le 0.85\}, \qquad (16)$$

occur with high probability. As such, it is desired to maximize $p(\mathbf{x}) = \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$ as a function of $\mathbf{x}$.

A Box-Behnken experimental design was run, with three center points, to gather data to fit four quadratic response surfaces. For the SMR model, full second-order quadratic regression forms were used for each response.

All of the response surface models fit well, with all $R^2$ values above 99%. As in example 1, the Wilks-Shapiro test for normality of the residuals for each regression model yields $p$-values greater than 0.05, and the Mardia tests for multivariate skewness and



kurtosis were not significant at the 5% level. The factor levels were coded so that all values were between -1 and +1, with the center of the experimental region at the origin.

Some of the factor terms for the second-order response surface models were not statistically significant, so a SUR model was created from an SMR model by removing some of the non-significant terms, while still preserving model-term hierarchy. Using the STEPWISE option in SAS PROC REG, the four regression models obtained for the SUR model analysis were:

$$y_1 = \beta_0^{(1)} + \beta_1^{(1)}x_1 + \beta_2^{(1)}x_2 + \beta_{11}^{(1)}x_1^2 + \beta_{22}^{(1)}x_2^2 + \beta_{12}^{(1)}x_1x_2 + e_1,$$
$$y_2 = \beta_0^{(2)} + \beta_1^{(2)}x_1 + \beta_2^{(2)}x_2 + \beta_3^{(2)}x_3 + \beta_{11}^{(2)}x_1^2 + \beta_{22}^{(2)}x_2^2 + \beta_{12}^{(2)}x_1x_2 + e_2,$$
$$y_3 = \beta_0^{(3)} + \beta_1^{(3)}x_1 + \beta_2^{(3)}x_2 + \beta_3^{(3)}x_3 + \beta_{33}^{(3)}x_3^2 + \beta_{12}^{(3)}x_1x_2 + e_3,$$
$$y_4 = \beta_0^{(4)} + \beta_1^{(4)}x_1 + \beta_2^{(4)}x_2 + \beta_{11}^{(4)}x_1^2 + \beta_{22}^{(4)}x_2^2 + e_4. \qquad (17)$$

For comparison purposes, a sensitivity analysis involving three models was performed. The three models were:

Model 1: An SMR model using a full second-order polynomial with normally distributed errors.

Model 2: A SUR model as shown in (17) above with normally distributed errors.

Model 3: A SUR model as shown in (17) above with errors having a t-distribution with 4 df.

For Models 1-3, 10,000 Monte Carlo simulations were done, as it appears that the true underlying Bayesian probabilities were extreme (close to 1). One hundred burn-in simulations were done to get each independent simulated value. Gridding steps of 0.1 were used across the coded design space. For the SMR model (Model 1), the maximum $p(\mathbf{x})$ value is $p(\mathbf{x}^*) = 0.964$, where $\mathbf{x}^* = (73.5, 43, 0.1)'$. However, for the SUR model with normal errors (Model 2), the maximum $p(\mathbf{x})$ value is $p(\mathbf{x}^*) = 1$, where $\mathbf{x}^* = (73.5, 43, 0.1)'$ (although a neighborhood containing $\mathbf{x}^*$ also had values of $p(\mathbf{x}) = 1$). Replacing the normal errors assumption in Model 2 with t-distribution errors (with 4 df) (Model 3) produced a maximum $p(\mathbf{x})$ value of $p(\mathbf{x}^*) = 0.978$, where $\mathbf{x}^* = (74.5, 44.9, 0.06)'$.

It is interesting to note that the $p(\mathbf{x}^*)$'s for the SUR models are larger than for the SMR model. In this example, Model 2 is simply a special case of Model 1 where the non-significant regression terms are removed. Apparently, this removal of non-significant terms for Model 2 tightens up the posterior predictive distribution enough to increase the optimal $p(\mathbf{x})$ value over that of Model 1. Even the optimal $p(\mathbf{x})$ value for Model 3 is slightly larger than that for Model 1, despite the use of a residual error t-distribution with 4 df. For this example, the sensitivity analysis tells us that for all three models the worst case probability is 0.964. If this smallest reliability estimate is adequate, then we need not do a preposterior analysis to check the effects of gathering additional data.

    7. Summary

The SMR model (with normal errors) has a closed-form posterior predictive distribution allowing quick and easy computation of $p(\mathbf{x}) = \Pr(\mathbf{Y} \in A \mid \mathbf{x}, \text{data})$ or other posterior predictive metrics, as shown in Peterson [33]. However, in some cases the use of the more general SUR model will be preferable. One such case was shown in the first example (section 6.1), where the fit of one of the response types was greatly improved by a change in the basic model form. For the second example (section 6.2), all of the individual regression models had some terms that were not statistically significant. The larger posterior probability of conformance value for the SUR model over the SMR model indicates that


some further efficiency can be obtained by removing terms in some of the models that do not appear predictive.

    The preposterior analysis discussed in section 5 allows the investigator to assess the

    effect of model parameter uncertainty on the posterior predictive probability of conformance.

    If the process means are all in conformance with process specifications, then an increase in

    data will result in some increase in posterior predictive probability of conformance. If this

    predicted increase is satisfactory, then the experimenter may want to gather more data to

    confirm this. If this predicted increase is not satisfactory, then the experimenter may wish

    to take different action and consider the possibility of process modification to improve

    response means and/or variances. At this point, it is not clear in general how the single and

    multiple imputation preposterior analyses compare to each other. Further research is needed

to investigate the properties of preposterior analyses for response surface optimization.

    Useful modifications of the SUR model are possible with the addition of noise variables

and a t-distribution model for the residual errors. Further research in this area to make the variance-covariance matrix a function of the controllable factors may also prove helpful to

    experimenters.

    Acknowledgements

    We would like to thank Joseph Schaffer for a helpful discussion on the imputation

    aspects of this work as related to the preposterior analysis.

    References

1. Ames, A. E., Mattucci, N., MacDonald, S., Szonyi, G. and Hawkins, D. M. (1997). Quality loss functions for optimization across multiple response surfaces. Journal of Quality Technology, 29, 339-346.

2. Anderson, M. J. and Whitcomb, P. J. (1998). Find the most favorable formulations. Chemical Engineering Progress, April, 63-67.

3. Becker, N. G. (1968). Models for the response of a mixture. Journal of the Royal Statistical Society, Series B, 30, 349-358.

4. Chatterjee, S., Laudato, M. and Lynch, L. A. (1996). Genetic algorithms and their statistical applications: an introduction. Computational Statistics and Data Analysis, 22, 633-651.

5. Congdon, P. (2006). Bayesian Statistical Modeling, 2nd edition. John Wiley and Sons Ltd., Chichester.

6. del Castillo, E., Montgomery, D. C. and McCarville, D. R. (1996). Modified desirability functions for multiple response optimization. Journal of Quality Technology, 28, 337-345.

7. Derringer, G. and Suich, R. (1980). Simultaneous optimization of several response variables. Journal of Quality Technology, 12, 214-219.

8. Derringer, G. (1994). A balancing act: optimizing a product's properties. Quality Progress, June, 51-58.

9. Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman and Hall/CRC, Boca Raton.

10. Food and Drug Administration (2006). Guidance for Industry - Q8 Pharmaceutical Development. U.S. Department of Health and Human Services, CDER, CBER, USA.

11. Frisbee, S. E. and McGinity, J. W. (1994). Influence of nonionic surfactants on the physical and chemical properties of a biodegradable pseudolatex. European Journal of Pharmaceutics and Biopharmaceutics, 40, 355-363.

12. Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian Data Analysis, 2nd edition. Chapman and Hall/CRC, Boca Raton.

13. Genz, A. and Bretz, F. (2002). Methods for the computation of multivariate t-probabilities. Journal of Computational and Graphical Statistics, 11, 950-971.

14. Geweke, J. (2005). Contemporary Bayesian Econometrics and Statistics. John Wiley and Sons, Inc., Hoboken, NJ.

15. Griffiths, W. (2003). Bayesian inference in the seemingly unrelated regressions model. In Computer-Aided Econometrics, ed. D. E. A. Giles, Marcel Dekker, New York, 263-290.

16. Harrington, E. C. (1965). The desirability function. Industrial Quality Control, 21, 494-498.

17. Hunter, J. S. (1999). Discussion of response surface methodology: current status and future directions. Journal of Quality Technology, 31, 54-57.

18. Johnson, M. E. (1987). Multivariate Statistical Simulation. John Wiley, New York.

19. Johnson, R. A. and Wichern, D. W. (2002). Applied Multivariate Statistical Analysis, 5th edition. Prentice Hall, Englewood Cliffs.

20. Khuri, A. I. and Conlon, M. (1981). Simultaneous optimization of multiple responses represented by polynomial regression functions. Technometrics, 23, 363-375.

21. Kim, K. and Lin, D. K. J. (2000). Simultaneous optimization of mechanical properties of steel by maximizing exponential desirability functions. Journal of the Royal Statistical Society, Series C, 49, 311-325.

22. Ko, Y. H., Kim, K. J. and Jun, C. H. (2005). A new loss function-based method for multiresponse optimization. Journal of Quality Technology, 37, 50-59.

23. Kotz, S. and Johnson, R. (1985). Encyclopedia of Statistical Sciences, 6, 129-130.

24. Lange, K., Little, R. and Taylor, J. (1989). Robust statistical modeling using the t-distribution. Journal of the American Statistical Association, 84, 881-896.

25. Liu, C. and Rubin, D. B. (1995). ML estimation of the t-distribution using EM and its extensions, ECM and ECME. Statistica Sinica, 5, 19-39.

26. Liu, C. (1996). Bayesian robust multivariate linear regression with incomplete data. Journal of the American Statistical Association, 91, 1219-1227.

27. Mardia, K. V. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya B, 36, 115-128.

28. McLean, R. A. and Anderson, V. L. (1966). Extreme vertices design of mixture experiments. Technometrics, 8, 447-454.

29. Mir-Quesada, G., del Castillo, E. and Peterson, J. J. (2004). A Bayesian approach for multiple response surface optimization in the presence of noise variables. Journal of Applied Statistics, 31, 251-270.

30. Montgomery, D. C. and Bettencourt, V. M. (1977). Multiple response surface methods in computer simulation. Simulation, 29, 113-121.

31. Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7, 308-313.

32. Percy, D. F. (1992). Prediction for seemingly unrelated regressions. Journal of the Royal Statistical Society, Series B, 54, 243-252.

33. Peterson, J. J. (2004). A posterior predictive approach to multiple response surface optimization. Journal of Quality Technology, 36, 139-153.

34. Peterson, J. J. (2008). A Bayesian approach to the ICH Q8 definition of design space. Journal of Biopharmaceutical Statistics, 18, 958-974.

35. Pignatiello, Jr., J. J. (1993). Strategies for robust multiresponse quality engineering. IIE Transactions, 25, 5-15.

36. Press, S. J. (2003). Subjective and Objective Bayesian Statistics: Principles, Models, and Applications, 2nd edition. John Wiley, New York.

37. Price, W. L. (1977). A controlled random search procedure for global optimization. The Computer Journal, 20, 367-370.

38. Raiffa, H. and Schlaiffer, R. (2000). Applied Statistical Decision Theory. John Wiley, New York.

39. Srivastava, V. K. and Giles, D. E. A. (1987). Seemingly Unrelated Regression Equations Models. Marcel Dekker, New York.

40. Rajagopal, R., del Castillo, E. and Peterson, J. J. (2005). Model and distribution-robust process optimization with noise factors. Journal of Quality Technology, 37, 210-222. (Corrigendum: 38, p. 83.)

41. Vining, G. G. (1998). A compromise approach to multiresponse optimization. Journal of Quality Technology, 30, 309-313.

42. Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC, Boca Raton, FL.

43. Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests of aggregation bias. Journal of the American Statistical Association, 57, 500-509.

44. Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. John Wiley, New York.

Authors' Biographies:

John J. Peterson is a Senior Director in the Research Statistics Unit of GlaxoSmithKline Pharmaceuticals. He received his B.S. in Applied Mathematics and in Computer Science (double major) from the State University of New York at Stony Brook and his Ph.D. in statistics from The Pennsylvania State University. Dr. Peterson has over 20 years experience as a statistician in the pharmaceutical industry. His current research area is in response surface methodology as applied to pharmaceutical industry problems, including applications to "chemistry, manufacturing, and control" (CMC) and combination drug studies. Dr. Peterson is a Fellow of the American Statistical Association and a Senior Member of the American Society for Quality. He is also on the editorial boards of the Journal of Quality Technology and the journal Applied Stochastic Models in Business and Industry.

Guillermo Mir-Quesada is a Sr. Research Scientist in the Bioprocess Research and Development department at Eli Lilly and Co. He received a Ph.D. in Industrial Engineering and Operations Research from Pennsylvania State University. He has worked in the Biotech division of Eli Lilly and Co. since 2003, where he has supported the development of manufacturing processes for active pharmaceutical ingredients. He is involved in activities related to integrating Quality by Design principles in the drug development plan and assessing the capability of manufacturing processes in development.

Enrique del Castillo is a Distinguished Professor of Engineering in the Department of Industrial & Manufacturing Engineering at the Pennsylvania State University. He also holds an appointment as Professor of Statistics at PSU and directs the Engineering Statistics Laboratory. Dr. del Castillo's research interests include Engineering Statistics with particular emphasis on Response Surface Methodology and Time Series Control. An author of over 80 refereed journal papers, he is the author of the textbooks Process Optimization, a Statistical Approach (Springer, 2007) and Statistical Process Adjustment for Quality Control (Wiley, 2002), and co-editor (with B. M. Colosimo) of the book Bayesian Process Monitoring, Control, and Optimization (CRC, 2006). He is currently (2006-2009) editor-in-chief of the Journal of Quality Technology.