Please cite this article in press as: E.P. Carvalho, et al., On optimization strategies for parameter estimation in models governed by partial differential equations, Math. Comput. Simul. (2010), doi:10.1016/j.matcom.2010.07.020

Available online at www.sciencedirect.com
Mathematics and Computers in Simulation xxx (2010) xxx–xxx

Original article

On optimization strategies for parameter estimation in models governed by partial differential equations

Esdras P. Carvalho (a), Julian Martínez (b), J.M. Martínez (c,*), Feodor Pisnitchenko (d)

(a) Department of Mathematics, State University of Maringá, Brazil
(b) Chemical and Food Engineering Department, Federal University of Santa Catarina, Brazil
(c) Department of Applied Mathematics, University of Campinas, Brazil
(d) Center for Mathematics, Computation and Cognition, Federal University of ABC, Brazil

Received 23 March 2010; received in revised form 22 June 2010; accepted 19 July 2010

Abstract

Extraction problems governed by systems of partial differential equations appear in several branches of Engineering. Parameter estimation involves discretization and modeling in a finite-dimensional setting. A model that arises in the supercritical extraction area is analyzed in this paper. Numerical difficulties of the nonlinear programming formulation of the estimation process are discussed and a satisfactory procedure based on unconstrained derivative-free optimization is suggested.

© 2010 IMACS. Published by Elsevier B.V. All rights reserved.

Keywords: Parameter estimation; Nonlinear optimization; Fluid extraction models simulation

This work was supported by PRONEX-Optimization (PRONEX - CNPq/FAPERJ E-26/171.510/2006 - APQ1), FAPESP (Grants 2006/53768-0, 2005/57684-2 and 2007/08359-7) and CNPq.
* Corresponding author. E-mail addresses: [email protected] (E.P. Carvalho), [email protected] (J. Martínez), [email protected] (J.M. Martínez), [email protected] (F. Pisnitchenko).

1. Introduction

Mathematical models for Engineering problems usually take the form of systems of partial differential equations (PDE). The practical solution of these problems involves discretization and numerical computation. The solution of the discretized PDE may have different levels of agreement with its analytic counterpart. In general, analytic solutions are not known, so the validation of numerical solutions depends on interdisciplinary work involving engineers and numerical mathematicians.

Many times one has to fit mathematical and numerical models to empirical data. Instead of merely solving the (discretized) PDE, one needs to find equation parameters by means of which the PDE solution agrees with the observations. Frequently, the unknown parameters have useful physical interpretations and their values provide information on the phenomenon being analyzed.

After discretization, the problem of estimating the "correct" parameters takes the form of a finite-dimensional optimization problem. Closed-form solutions for these problems are almost always impossible to obtain.



Therefore, numerical methods are needed in order to obtain good approximations. Sometimes, publicly available nonlinear programming packages are quite efficient for this purpose. At other times, the development of special problem-oriented procedures is necessary.

In the presence of a finite-dimensional optimization problem, nonlinear programming solvers seek approximate solutions that satisfy the Karush–Kuhn–Tucker (KKT) conditions [21] as closely as possible. This means that an approximate solution must be feasible (it must satisfy the discretized PDE) and the gradient of the objective function must be a linear combination of the gradients of the constraints, to a certain extent, with the "correct sign" of the Lagrange multipliers in the case of inequality constraints. It is usually said that the approximate solution must fulfill the "feasibility" and "optimality" conditions as closely as possible. Sometimes, "optimality" is called "dual feasibility".

When the nonlinear programming solver finds an approximate solution with very high precision with respect to both feasibility and optimality, there is a temptation to claim "success". Surprisingly, in many cases, engineers do not agree with this claim. This lack of agreement may be due to many reasons, including inadequacy and incompleteness of the original mathematical model. Eventually, frank dialogue between engineers and numerical mathematicians should lead to satisfactory solutions.

A "mathematical" reason for failure in cases in which success seems to be suggested by the achievement of approximate KKT conditions is that, sometimes, high precision with respect to the feasibility and optimality indicators does not correspond to closeness between the approximate solution of the optimization problem and the true one. More precisely, a point could "almost satisfy" the constraints despite being very far from the feasible set. Usually, the engineer detects that a numerical solution is impossible because some qualitative or geometrical property fails to be fulfilled. The situation described here corresponds to the "ill-conditioning" of the set of equations (or inequalities) that describes the feasible set.

A trivial example is given below. Assume that, after discretization, the optimization problem is

Minimize (x1 + 1)²  subject to  h(x) ≡ 10⁻¹⁰ x1² + x2² = 0.   (1)

Clearly, the unique solution of (1) is x∗ = (0, 0), where the objective function value is 1. However, taking x1 = −1, x2 = 0, the objective function value is 0 and feasibility is fulfilled with precision 10⁻¹⁰ (taking, as usual, |h(x)| as the infeasibility measure). Observe that there are points with |h(x)| = 10⁻¹⁰ that are very close to the true solution, for example the point (0, 10⁻⁵). Nevertheless, a typical nonlinear programming solver could choose (−1, 0) instead of (0, 10⁻⁵) because the objective function value is much smaller at the first point than at the second. Of course, in this simple problem one can see exactly what happens, but in a complex problem the same essential phenomenon could occur and a wrong approximate solution could be chosen. Note that, rigorously speaking, there is nothing in the approximate solution (−1, 0) that makes it worse than (0, 10⁻⁵). Of course, (0, 10⁻⁵) is closer to the true solution than (−1, 0), but only because the Euclidean norm is used to evaluate distances. In an analogous real-life situation it can be imagined that the nonlinear programming solver presents the "solution" (−1, 0) to the engineer and that, hopefully, the experienced engineer detects that something is wrong and that a different numerical-mathematical direction should be taken.
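The effect can be checked numerically. The sketch below (plain Python; the helper names `h` and `f` are ours) evaluates the infeasibility measure and the objective at the two competing points:

```python
import math

# Problem (1): minimize (x1 + 1)^2 subject to h(x) = 1e-10*x1^2 + x2^2 = 0.
def h(x1, x2):
    """Constraint function of the toy problem (1)."""
    return 1e-10 * x1 ** 2 + x2 ** 2

def f(x1, x2):
    """Objective function of the toy problem (1)."""
    return (x1 + 1.0) ** 2

# Spurious point: tiny violation, zero objective, but distance 1 from the solution.
print(abs(h(-1.0, 0.0)), f(-1.0, 0.0))           # ≈1e-10 and 0.0
# Genuinely close point: same violation, distance only 1e-5 from the solution.
print(abs(h(0.0, 1e-5)), math.hypot(0.0, 1e-5))  # ≈1e-10 and 1e-05
```

A solver driven by the objective value alone has no reason to prefer the second point over the first.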

An additional potentially misleading feature inherent to the optimization model is that, usually, feasibility is much more important than optimality in real applications. Moreover, the constraints that determine whether a point is feasible have different degrees of "hardness". A constraint may represent a desirable feature and, thus, deserve to be classified as "soft". But other constraints may represent physical laws for which high precision is desirable. Finally, the most "serious" constraints are "definitions", which should be satisfied with the maximum possible precision. The systematic lack of fulfillment of "definitions" may accumulate errors across different constraints, generating completely unreliable nonfeasible solutions.


An experience in the process of estimating parameters of a real-life PDE chemical engineering problem is reported in this paper. The drawbacks of the discretization–optimization process described above have been observed in this problem and a satisfactory solution has been suggested. The paper is organized as follows. The PDE models are presented in Section 2. Discretizations are described in Section 3. A nonlinear programming approach for fitting the parameters is suggested in Section 4. In Section 5 a derivative-free optimization strategy is defined. Detailed experiments concerning this strategy are given in [6]. Some conclusions are stated in Section 6.


2. PDE models

The mass balance equations for fluid (solvent) and solid phases in supercritical fluid extraction [5] are:

∂Y/∂t + U ∂Y/∂h = ∂/∂h (DaY ∂Y/∂h) + J(X, Y)/ε,   (2)

∂X/∂t = ∂/∂r (DaX ∂X/∂r) − (J(X, Y)/(1 − ε)) (ρ/ρS).   (3)

The unknowns of the system (2) and (3) are X = X(h, r, t), the extract concentration in the solid phase, and Y = Y(h, t), the extract concentration in the fluid phase. The variables h, r and t are, respectively, the one-dimensional (space) coordinate of a column with total height equal to H, the radial coordinate of each solid particle and the time.

In (2) and (3) the following are defined:

ε: the extraction bed porosity;
ρ: the solvent density (M/L³);
ρS: the solid density (M/L³);
U: the solvent velocity (L/T), which may be written as

U = 4QCO2/(π Db² ε ρ),

where Db (L) is the diameter of the extraction column and QCO2 (M/T) is the solvent mass flow rate;
DaY: the extract diffusivity in the fluid phase (L²/T); and
DaX: the extract diffusivity in the solid phase (L²/T).
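As an illustration, the definition of U above can be evaluated directly. The numeric values below are placeholders chosen for the example, not data from the paper:

```python
import math

def solvent_velocity(q_co2, d_b, eps, rho):
    """Solvent velocity U = 4*Q_CO2 / (pi * Db**2 * eps * rho), in consistent units (L/T)."""
    return 4.0 * q_co2 / (math.pi * d_b ** 2 * eps * rho)

# Placeholder values: Q_CO2 = 1e-4 (M/T), Db = 0.02 (L), eps = 0.4, rho = 800 (M/L^3).
U = solvent_velocity(1e-4, 0.02, 0.4, 800.0)
```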

The function J(X, Y) represents the interfacial mass transfer flux. Apart from the extract concentration in both phases, J(X, Y) can involve other process parameters, such as ρ, ρS and the extract solubility in the fluid phase (Y∗).

The definition of J(X, Y) also involves adjustable parameters kXA and kYA. The following definition of J(X, Y) [30] will be used in this paper:

If X > Xk,  J(X, Y) = kYA (Y∗ − Y);   (4)

If X ≤ Xk,  J(X, Y) = kXA X (1 − Y/Y∗).   (5)
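A direct transcription of (4) and (5) — with argument names for kXA, kYA, Xk and Y∗ that are ours, not the paper's — reads:

```python
def J(X, Y, kXA, kYA, Xk, Ystar):
    """Interfacial mass-transfer flux of Eqs. (4)-(5).
    Note the discontinuity at X = Xk, discussed later in the paper."""
    if X > Xk:
        return kYA * (Ystar - Y)
    return kXA * X * (1.0 - Y / Ystar)
```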

In order to solve (2) and (3), initial conditions X(h, r, t = 0), Y(h, t = 0), boundary conditions X(h, r = 0, t), Y(h = 0, t) and (∂Y/∂h)(h = H, t) are needed. The equations must be integrated for 0 ≤ t ≤ Tf, 0 ≤ h ≤ H and 0 ≤ r ≤ R, where R is the solid particle radius.

The extraction curve is defined by

E(h, t) = QCO2 ∫₀ᵗ Y(h, s) ds,   (6)

where E(h, t) is the extract mass.

In practical terms, one has m observed values E(H, t1), . . ., E(H, tm) of the extraction curve and, using these data and the model (2) and (3), one should estimate the parameters kXA, kYA, Xk, DaX, DaY.

Three simplifications of the system (2) and (3) will be considered. In the first simplification the solid particles are assumed to be very small. Then, the term that involves the derivatives with respect to r is neglected. Therefore, the model becomes:


∂Y/∂t + U ∂Y/∂h = ∂/∂h (DaY ∂Y/∂h) + J(X, Y)/ε,   (7)

∂X/∂t = −(J(X, Y)/(1 − ε)) (ρ/ρS).   (8)


with the initial conditions:

X(h, t = 0) = X0 and [Y (h, t = 0) = Y∗ or Y (h, t = 0) = 0] (9)

and the boundary conditions

Y(h = 0, t) = 0,   ∂Y/∂h (h = H, t) = 0.   (10)

This model has four adjustable parameters (DaY, kXA, kYA, Xk).

The choice of the initial condition for Y depends on the existence of a static period before the onset of extraction. If this period is long, it may be assumed that the solvent becomes saturated, so its extract concentration coincides with the solubility Y∗. If there is no static period, it may be considered that the solvent is pure when extraction begins and, so, Y is set to zero.

A second simplification neglects the second derivatives of Y with respect to h, considering that diffusion in the fluid phase is not significant when compared with convection. Thus, the model is:

∂Y/∂t + U ∂Y/∂h = J(X, Y)/ε,   (11)

∂X/∂t = −(J(X, Y)/(1 − ε)) (ρ/ρS)   (12)

with the initial conditions (9) and the boundary condition

Y (h = 0, t) = 0. (13)

In this case the only adjustable parameters are kXA, kYA, Xk.

Finally, if one considers that the variation of Y with respect to time is negligible, the following simplification is obtained:

εU ∂Y/∂h = J(X, Y),   (14)

ρS (1 − ε) ∂X/∂t = −ρ J(X, Y),   (15)

with the initial condition

X(h, t = 0) = X0 (16)

and the boundary condition (13). Again, the adjustable parameters here are kXA, kYA, Xk.

It can be shown [30] that the model (14) and (15) with conditions (16) and (13) has the analytic solution given by the formulae below:

tCER = (X0 − Xk)(1 − ε)ρS / (Y∗ kYA ρ),   (17)

tFER(h) = tCER + (ρS/ρ) ((1 − ε)/kXA) ln[(Xk + (X0 − Xk) exp((kXA h X0)/(U ε Y∗))) / X0],   (18)

Zw(t) = (kYA Y∗ / (kXA X0)) ln{(X0 exp[(kXA (t − tCER) ρ)/((1 − ε) ρS)] − Xk) / (X0 − Xk)},   (19)

E(h, t) = QCO2 Y∗ [1 − exp(−kYA h/(εU))] t,   if 0 ≤ t ≤ tCER,   (20)

E(h, t) = QCO2 Y∗ [t − tCER exp(Zw(t) − kYA h/(εU))],   if tCER < t ≤ tFER(h),   (21)

E(h, t) = ((1 − ε) ρS h QCO2 / (ρεU)) [X0 − (Y∗Uε/(kXA h)) ln{1 + [exp((kXA h X0)/(UεY∗)) − 1] exp[(kXA ρ/(ρS (1 − ε)))(tCER − t)] (Xk/X0)}]   (22)


if t > tFER(h). Moreover,

Y(h, t) = (1/QCO2) ∂E/∂t (h, t),   (23)

X(h, t) = X0 + (kYA ρ/(ρS (1 − ε))) [E(h, t)/QCO2 − Y∗ t]   (24)

if t ≤ tFER(h), and

X(h, t) = Xk exp((ρ kXA/(ρS (1 − ε))) [(E(h, t) − E(h, tFER(h)))/(QCO2 Y∗) − (t − tFER(h))])   (25)

if t > tFER(h). The expression Y(0, t) must be read as:

Y(0, t) = lim(h→0) Y(h, t).

In this paper, we are concerned with the problem of finding the parameters that best fit the extraction curve, directly using the differential equations (7) and (8), (11) and (12), and (14) and (15) with their initial and boundary conditions. The analytic solution of (14) and (15) will be used only to test the reliability of the numerical schemes.
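For such reliability tests, a small piece of the analytic solution is already useful. The sketch below evaluates (17) and (20); the function and argument names are ours and the sample values are illustrative only, not experimental data:

```python
import math

def t_cer(X0, Xk, eps, rho_s, rho, Ystar, kYA):
    """End of the constant-extraction-rate period, Eq. (17)."""
    return (X0 - Xk) * (1.0 - eps) * rho_s / (Ystar * kYA * rho)

def extraction_cer(h, t, q_co2, Ystar, kYA, eps, U):
    """Extraction curve E(h, t) for 0 <= t <= tCER, Eq. (20)."""
    return q_co2 * Ystar * (1.0 - math.exp(-kYA * h / (eps * U))) * t

# Illustrative values: during the CER period the extraction grows linearly in t.
tc = t_cer(X0=0.3, Xk=0.1, eps=0.4, rho_s=1200.0, rho=800.0, Ystar=0.01, kYA=0.5)
E_half = extraction_cer(h=0.2, t=0.5 * tc, q_co2=1e-4, Ystar=0.01, kYA=0.5, eps=0.4, U=1e-3)
```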

3. Discretizations

Finite difference explicit discretizations of the systems (7) and (8), (11) and (12), and (14) and (15) are used. As is well known, greater stability is expected if one uses implicit or semi-implicit schemes [12,27,31], and the stability of explicit discretizations usually requires very small time steps. However, for several reasons, it was decided here that explicit discretizations are affordable. One of these reasons is the discontinuity of J(X, Y), due to which the definition of implicit schemes is rather cumbersome. On the other hand, modeling and measurement errors are, in this type of problem, much greater than the errors that come from the discretization. Finally, stability may be achieved using suitably small time steps without a significant increase of computer time.

The explicit discretization of (7) and (8) is given by:

[Y(h, t + Δt) − Y(h, t)]/Δt + U [Y(h, t) − Y(h − Δh, t)]/Δh = DaY [Y(h + Δh, t) − 2Y(h, t) + Y(h − Δh, t)]/Δh² + J(X(h, t), Y(h, t))/ε,   (26)

and

[X(h, t + Δt) − X(h, t)]/Δt = −(J(X(h, t), Y(h, t))/(1 − ε)) (ρ/ρS).   (27)

For (11) and (12) the following discretization is adopted:

[Y(h, t + Δt) − Y(h, t)]/Δt + U [Y(h + Δh, t) − Y(h − Δh, t)]/(2Δh) = J(X(h, t), Y(h, t))/ε,   (28)

and

[X(h, t + Δt) − X(h, t)]/Δt = −(J(X(h, t), Y(h, t))/(1 − ε)) (ρ/ρS).   (29)

Finally, the discretization of (14) and (15) yields:

[Y(h + Δh, t) − Y(h, t)]/Δh = J(X(h, t), Y(h, t))/(Uε)   (30)


and

[X(h, t + Δt) − X(h, t)]/Δt = −(J(X(h, t), Y(h, t))/(1 − ε)) (ρ/ρS).   (31)

The discretization steps are defined by Δh = H/nh and Δt = Tf/nt. Clearly, the unknowns on the grid may be computed recursively using the discrete equations and the discretized initial and boundary conditions. Note that only two consecutive time levels need to be stored in memory.

The theoretical extraction curve E(H, t) may be computed using Y(H, jΔt), j = 1, . . ., nt, by means of (6) and elementary numerical integration. In particular, the values of the extraction curve at the grid points 0, Δt, 2Δt, . . ., ntΔt will be:

E(H, jΔt) = QCO2 Δt [Y(H, Δt) + · · · + Y(H, jΔt)]   (32)

for j = 1, . . ., nt.

It was verified that the numerical scheme based on (30) and (31) is reliable for solving the system (14) and (15), whose analytical solution is given by (17)–(25). Exhaustive comparisons between the numerical and analytic solutions of (14) and (15) allowed the following conclusions to be reached:

1. The error (the average difference between the analytic and the numerical solution) is essentially insensitive to the space discretization. Namely, if the number of time steps is fixed, similar errors are obtained using nh = 10, 100, 1000.

2. The error is empirically proportional to Δt. Taking nt = 1000, the size of the error is one tenth of the one obtained taking nt = 100. Details may be found in [6].

3. With nt = 1000 the size of the error is clearly smaller than the size of measurement errors.
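The recursive computation just described can be sketched in a few lines. The following is our own minimal Python rendering of the scheme (28)-(29) and the sum (32), not the authors' code: the flux J is supplied by the caller, QCO2 is factored out (set to 1), and the treatment of the outflow node is our assumption, since the paper does not spell it out.

```python
import numpy as np

def simulate_extraction(J, U, eps, rho, rho_s, H, Tf, nh, nt, X0, Y0=0.0):
    """Explicit scheme (28)-(29): central difference in h, forward difference
    in t, then the rectangle rule (32) for the extraction curve E(H, t)."""
    dh, dt = H / nh, Tf / nt
    Y = np.full(nh + 1, Y0, dtype=float)   # fluid-phase concentration on the h-grid
    X = np.full(nh + 1, X0, dtype=float)   # solid-phase concentration on the h-grid
    Y[0] = 0.0                             # boundary condition (13)
    E = np.zeros(nt + 1)                   # extraction curve at h = H (Q_CO2 = 1)
    for j in range(1, nt + 1):
        Jv = np.array([J(X[i], Y[i]) for i in range(nh + 1)])
        Ynew = Y.copy()
        # interior nodes: central difference as in (28)
        Ynew[1:nh] = Y[1:nh] - U * dt * (Y[2:] - Y[:nh - 1]) / (2.0 * dh) + dt * Jv[1:nh] / eps
        # outflow node: one-sided difference (our assumption)
        Ynew[nh] = Y[nh] - U * dt * (Y[nh] - Y[nh - 1]) / dh + dt * Jv[nh] / eps
        X = X - dt * Jv / (1.0 - eps) * (rho / rho_s)   # Eq. (29)
        Y = Ynew
        Y[0] = 0.0
        E[j] = E[j - 1] + dt * Y[nh]       # rectangle rule, Eq. (32)
    return E
```

With J ≡ 0 the fluid stays clean and E vanishes identically, which gives a cheap sanity check of the recursion.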

4. Nonlinear Programming estimation procedure

For simplicity, let us concentrate on the system (28) and (29). The objective here is to find the best parameters kXA, kYA, Xk that fit the observations of the extraction curve. Eqs. (28) and (29) are considered to be constraints of a finite-dimensional optimization problem whose unknowns are X(h, t), Y(h, t) at the grid points and, in addition, the parameters kXA, kYA, Xk. In other words, this nonlinear programming problem has the form

Minimize f(x)  subject to  x ∈ Ω,   (33)

where x ∈ Rⁿ is a vector that involves the values X(h, t), Y(h, t) and the parameters kXA, kYA, Xk. The function f(x) is the sum of squares of the differences between the observed extractions and the computed extractions at the observation time steps, and the set Ω represents the system (28) and (29) together with the positivity constraints for the parameters.

For solving (33), the well-established nonlinear programming solver Algencan, described in [1] and available at www.ime.usp.br/∼egbirgin/tango, was used. The algorithm is of the augmented Lagrangian type [11,23,28]. This means that, at each (outer) iteration, one minimizes (approximately) the objective function plus a term that penalizes the violation of shifted constraints. Both penalty parameters and shifts are updated after the resolution of each subproblem. Shifts provide information on the Lagrange multipliers associated with the final solution. For solving the subproblems, Algencan uses a method for minimizing functions with bounds on the variables. The main convergence properties of Algencan are given in the following theorem [1].

Theorem 4.1. Assume that {xk} is a sequence generated by Algencan so as to solve (33) and that both f and the functions that define Ω are smooth enough. Then, every limit point of {xk} belongs to Ω or, at least, satisfies stationarity conditions for the minimization of the sum-of-squares natural infeasibility measure. Moreover, if the limit point is feasible and satisfies the "constant positive linear dependence" (CPLD) constraint qualification, then it satisfies the KKT optimality conditions.

Under additional local assumptions [1] it can be proved that the penalty parameters associated with the augmented Lagrangian procedure are bounded and that the method converges with arbitrarily fast linear speed. The CPLD condition was introduced in [26] and its properties as a constraint qualification were elucidated in [3].


Fig. 1. Extraction with errors.

CPLD is a weak constraint qualification, which means that the convergence properties stated in Theorem 4.1 are stronger than the convergence properties of algorithms that require stricter constraint qualifications, such as regularity [21] and Mangasarian–Fromovitz [13].

Moreover, in the proof of Theorem 4.1 it can be verified that, when convergence to feasible points occurs, an iterate is found such that the KKT conditions are satisfied with arbitrary precision. A precise statement of this fact may be found in [2]. Finally, if one assumes that approximate global minimizers of the augmented Lagrangian subproblems may be found with arbitrary precision, then the limit points of Algencan are global minimizers of (33) [4].

The lack of continuity of J(X, Y) deserves some comment because, in principle, it violates the smoothness assumptions required for the convergence of Algencan. In order to overcome this difficulty we may proceed as follows:

Define H(t) as the step function

H(t) = 0 if t < 0, H(t) = 1 if t ≥ 0.

Consider Hη(t) to be a smooth approximation of H(t) for a small η > 0. For example:

Hη(t) = (1/π) arctan(t/η) + 1/2.

Then, (4) and (5) may be approximately expressed in the smooth form:

Jη(X, Y) = kYA (Y∗ − Y) Hη(X − Xk) + kXA X (1 − Y/Y∗) Hη(Xk − X).
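In code, the smoothing amounts to replacing the if/else of (4)-(5) by two weighted terms. A minimal sketch (the parameter names and the default value of η are ours):

```python
import math

def H_eta(t, eta):
    """Smooth step: (1/pi)*arctan(t/eta) + 1/2, tending to H(t) as eta -> 0."""
    return math.atan(t / eta) / math.pi + 0.5

def J_eta(X, Y, kXA, kYA, Xk, Ystar, eta=1e-3):
    """Smoothed interfacial flux replacing the piecewise definition (4)-(5)."""
    return (kYA * (Ystar - Y) * H_eta(X - Xk, eta)
            + kXA * X * (1.0 - Y / Ystar) * H_eta(Xk - X, eta))
```

Away from X = Xk the smoothed flux is close to the corresponding branch of (4)-(5), while at X = Xk it is differentiable.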

As a result of the considerations reported in this section, we had very good theoretical and practical reasons to employ Algencan to solve (33). In fact, this was done for several synthetic and real-life problems with very good results, in the sense that satisfactorily feasible points were obtained with small values of the sum-of-squares objective function and, in all cases, approximate KKT conditions were satisfied. The most exciting initial result was that the fitting of the predictions to the data was almost perfect.

Then, a decision was taken to apply the optimization method to the estimation of parameters of an empirical extraction curve that, very obviously, exhibited gross measurement errors (Fig. 1). Surprisingly, Algencan again obtained a solution that was feasible with very high precision and the theoretical extraction curve almost exactly fitted the observations! Therefore, Algencan "was successful" from the point of view of finding "approximately feasible" and "almost optimal" points, and the theoretical results were corroborated, but the "solution" was completely unacceptable: there are no extraction curves with the shape found by the method. The example was similar to the very simple one presented in Section 1 of this paper: a point was found that approximately, and with high precision, fulfilled the constraints, but this point was quite far from the feasible set. Moreover, this point was preferred to the "almost feasible" points that are close to the feasible set because its objective function value was the best possible one.


As mentioned in Section 1, the constraints of this problem are "definitions", and the very small systematic violation allowed by the optimization search caused the occurrence of a spurious approximate solution.

This behavior seems to be inherent to methods that employ merit functions and admit nonfeasible iterates. We are not optimistic about the success of filter methods [7,9,29] for problems like this either. Observe that, in the trivial problem defined in Section 1 of this paper, the point (−0.99, 0) is "better" than (0, 10⁻⁵) from the filter point of view (both the objective function and the infeasibility measure are smaller). Therefore, it can be conjectured that filter methods could also be greedily trapped by points whose infeasibility is not detected by the constraint values.

5. Derivative-free feasible optimization strategy

The experience reported in Section 4 revealed that, in the process of estimating parameters for the systems considered here, the feasibility of the iterates should not be abandoned. Even very small infeasibility tolerances could be exploited by the nonlinear programming solver to find smaller functional values and unreliable (albeit "almost feasible") extraction curves. Therefore, it was decided to rely on the more intuitive process in which the only independent variables are the parameters to be estimated, and each function evaluation involves solving the discretized PDE for trial values of those parameters. Since the parameters must be non-negative, we are in the presence of a bound-constrained minimization problem. Note that, in this way, (28) and (29) are almost exactly satisfied, in the sense that the quantities that are necessary for the objective function evaluation are computed with all the precision allowed by the machine hardware, and there are no tolerances from which a greedy minimization procedure could take advantage. However, instabilities in the recursive computation of X and Y could still occur, causing severe inaccuracies in the function evaluations. Since evaluation inaccuracies potentially harm descent methods based on gradients, it was decided to employ a derivative-free algorithm.

BOBYQA, the derivative-free bound-constrained method and subroutine introduced by Powell [24,25], was chosen. BOBYQA uses, at each minimization step, quadratic approximations of the objective function that sample the domain quite reasonably. Therefore, the method does not seem to be sensitive to rather large local variations.

At each iteration, BOBYQA minimizes a quadratic function subject to bounds on the variables. The quadratic interpolates the true objective function at the current approximation to the solution and at some previous approximations. Since the quadratic that interpolates the function at the required points is not unique, BOBYQA employs sophisticated criteria to choose the one of minimal variation. If, at the trial point produced by the quadratic minimization, the objective function does not decrease, the algorithm employs a trust-region strategy in order to suggest a more reasonable trial point. When the trust region becomes very small, the algorithm stops. Up to the present, no convergence results have been proved for this derivative-free method.

The number of parameters that are necessary to fit (28) and (29) is 3 (kXA, kYA, Xk), which means that quadratic interpolation is affordable. (Success of the method in minimizing functions with more than 100 variables has been reported [25].) The method tends to find local minimizers although, as mentioned above, no theoretical convergence results are available yet. One should not expect global minimizers to be easy to find unless a good sample of initial points is used. Therefore, a multiple-start procedure was implemented, by means of which different random initial values for the parameters were chosen and employed as starting approximations to the solution.

The employment of BOBYQA only requires that the objective function be coded (SUBROUTINE CALFUN), which, in our case, involves the recursive integration of (28) and (29), followed by the computation of E(H, t) at the time grid points and the calculation of the sum of squares of the errors relative to the observed extractions.
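Since equations (28) and (29) are not reproduced in this section, the sketch below uses a placeholder mass-transfer recursion merely to show the shape of a CALFUN-style objective: integrate the model recursively for trial parameters, sample the extraction curve at the measurement times, and return the sum of squared errors. The recursion and all names are illustrative assumptions, not the paper's model.

```python
import numpy as np

def simulate(params, t_grid, dt=0.01, x0=1.0):
    """Placeholder forward model: a crude two-phase mass-transfer
    recursion standing in for equations (28)-(29).  Returns the
    cumulative extraction at each measurement time in t_grid."""
    kxa, kya, xk = params
    x, y, extracted, t = x0, 0.0, 0.0, 0.0
    curve = []
    for t_obs in t_grid:
        while t < t_obs - 1e-12:
            transfer = kxa * max(x - xk, 0.0) * dt  # solid -> fluid phase
            x -= transfer
            y += transfer
            out = kya * y * dt                      # fluid leaves the bed
            y -= out
            extracted += out
            t += dt
        curve.append(extracted)
    return np.array(curve)

def objective(params, t_grid, observed):
    # Sum of squared deviations between the computed and observed
    # extraction values: the quantity a CALFUN-like routine returns.
    r = simulate(params, t_grid) - np.asarray(observed)
    return float(r @ r)
```

Because the states are re-computed from scratch at every evaluation, any parameter vector yields a curve that satisfies the recursion to machine precision; there is no feasibility tolerance to exploit.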

Consider the objective function f that depends on the parameters to be estimated and measures the difference between observed and computed extraction values. Therefore, f = f(kXA, kYA, Xk) in (28) and (29). The natural simple constraints on the parameters are:

kXA ≥ 0, kYA ≥ 0, 0 ≤ Xk ≤ X0, DaY ≥ 0. (34)


Empirical considerations lead one to add the bounds:

kXA ≤ 5, kYA ≤ 5, DaY ≤ 5. (35)

The global optimization strategy is sketched below.

Table 1
Function p(N).

N	p(N)
10	2
20	4
30	6
40	8
50	9
60	11
70	13
80	14
90	16
100	17
100 + k	≈ 17 + 0.13k

1. Initial approximations required by BOBYQA were chosen randomly in the variation box defined by (34) and (35). N ← 0 is initialized.

2. For each problem, BOBYQA is called with 10 initial points and the number p of different local minimizers found so far is computed. N ← N + 10 is updated. If p is not greater than p(N) (see Table 1), the process is stopped. Otherwise, 10 additional initial points are tried, and so on. If a maximum of 5000 calls to BOBYQA is reached, the process is stopped. Two local minimizers are considered "equivalent" if the difference between the respective objective function values is smaller than 0.01.

3. Assume that k∗XA, k∗YA, X∗k are the parameters estimated by the process indicated above. Then the process is repeated, replacing the upper bounds (35) by:

kXA ≤ max{0.1, 2k∗XA}, kYA ≤ max{0.1, 2k∗YA}. (36)

The results of this procedure for fitting parameters in peach almond oil extraction models were quite reasonable (see [6]). By its very essence, the method always produces curves that satisfy the recursive equations (28) and (29) up to the best precision that the computer can reach and, among those curves, computes the one that best fits the data. From a chemical engineer's point of view the results are very satisfactory. All the experiments reported in [6] come from situations where measurement errors were well controlled, which means that the empirical extraction curves are reliable. Here we want to show a case in which the measurement errors were unacceptable. This can be verified by a visual inspection of the empirical extraction curve (black balls in Fig. 1). The full line in this figure represents the curve found by the procedure described in this section. As expected, this line did not fit the wrong measurements and, as a matter of fact, detected their location. Interestingly, using the Nonlinear Programming method described in Section 4, and computing the extraction curve by means of the solution X(h, t) found by this method, the model extraction values coincided almost exactly with the (wrongly) measured ones. This is the phenomenon reported and explained in Sections 1 and 4, due to which one must be extremely cautious when using both parameters and state values as independent variables of a time-evolution fitting problem.
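The multiple-start loop and the Table 1 stopping rule above can be sketched as follows. For self-containment, a crude projected finite-difference descent stands in for BOBYQA (an assumption, not the paper's solver), and the objective is a hypothetical smooth function; only the batches of 10 starts, the 0.01 equivalence test and the p(N) rule follow the procedure described in the text.

```python
import numpy as np

P_TABLE = {10: 2, 20: 4, 30: 6, 40: 8, 50: 9,
           60: 11, 70: 13, 80: 14, 90: 16, 100: 17}

def p_of_n(n):
    # Table 1, extended linearly beyond N = 100: p(100 + k) ~ 17 + 0.13k.
    return P_TABLE[n] if n <= 100 else 17.0 + 0.13 * (n - 100)

def local_min(f, x0, lo, hi, h=1e-6, step=0.1, iters=300):
    # Crude projected finite-difference gradient descent standing in
    # for BOBYQA (which is derivative-free and far more sophisticated).
    x = np.clip(np.asarray(x0, dtype=float), lo, hi)
    eye = np.eye(x.size)
    for _ in range(iters):
        g = np.array([(f(x + h * e) - f(x - h * e)) / (2.0 * h) for e in eye])
        x = np.clip(x - step * g, lo, hi)
    return f(x), x

def multistart(f, bounds, tol=0.01, max_starts=5000, seed=0):
    """Multiple-start strategy: batches of 10 random starts, stopping
    when the number of distinct local minima no longer exceeds p(N)."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    distinct, best_val, best_x, n = [], np.inf, None, 0
    while n < max_starts:
        for _ in range(10):
            val, x = local_min(f, rng.uniform(lo, hi), lo, hi)
            # Two minimizers are "equivalent" when their objective
            # values differ by less than tol.
            if all(abs(val - v) >= tol for v in distinct):
                distinct.append(val)
            if val < best_val:
                best_val, best_x = val, x
        n += 10
        if len(distinct) <= p_of_n(n):
            break
    return best_val, best_x

# Hypothetical smooth objective with a unique minimizer at (1.0, 0.5).
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 0.5) ** 2

best_val, best_x = multistart(f, [(0.0, 5.0), (0.0, 5.0)])
```

On this single-minimum toy objective, the first batch of 10 starts already yields only one distinct minimum, so the p(N) test stops the search immediately; on a genuinely multimodal fit, the loop keeps sampling until new starts stop revealing new minima.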

6. Conclusions

A parameter identification problem governed by partial differential equations, with relevant applications in the area of Chemical Engineering, was addressed. See, among others, [5,8,10,14–19,22,30].

An attempt was made to use the Nonlinear Programming approach in which both parameters and states (X(h, t), Y(h, t)) are primal variables of the optimization problem and the discretized PDEs are constraints. The results of this approach were paradoxical. The nonlinear programming solver may find "almost feasible points" with approximate fulfillment of the KKT conditions, where the sum-of-squares objective function is almost null (and, thus, optimal) at the proposed solution. However, although the approximate solution almost exactly fulfills the constraints, it could be "far" from the true feasible set. This phenomenon seems to be inherent to optimization strategies that simultaneously


consider feasibility and optimality requirements. Moreover, the phenomenon is associated with poor conditioning of the functional representation of the constraints.

Our feeling is that one should be extremely cautious when dealing with engineering models that involve "design variables" and "model (or state) variables". If the state equations are ill-conditioned, optimization methods will probably take advantage of feasibility tolerances to obtain artificially low objective function values at points that may be far from feasible. Therefore, if we want to use approaches in which design and model variables have the same status, it is advisable to employ very effective preconditioning techniques to stabilize the feasible set. In general, a lot of computer time should be saved in this way, especially if the number of design variables is small. See, for example, [32].

Some optimization researchers adopt the point of view that the objective of a nonlinear programming solver is to achieve solutions within the given tolerances for feasibility and optimality. (Many large comparisons between different algorithms following this point of view have been published.) If points that satisfy the required tolerances are not good enough, this would not be a problem of the optimization solver, but of the modeling strategy. For several reasons, we do not agree with this position. The main reason is that the situation "small constraints, large distance to the feasible set" is not rare when one deals with large-scale problems coming from discretizations. Take, for example, the discretization of y′′ = 0 with y(a) = y(b) = 1, with large b − a and a fine discretization grid. Starting from y = 0, it is easy to see that the Gauss–Seidel method, the Conjugate Gradient method and any method based on gradients converge very slowly to the solution y = 1: for many iterations, the intermediate points almost satisfy the discretized equations while their distance to the solution remains large. This is, essentially, the example used by Nesterov [20] to show that gradient methods may converge with the worst possible theoretical speed. On the other hand, the problem can be trivially solved using a direct method based on factorizations. This means that we cannot discard the possible existence of nonlinear programming algorithms that "naturally" find points whose "almost-feasibility" is linked to the distance to the feasible set instead of the constraint values. Of course, this is more a practical than a theoretical question, behind which the contradictions between fast local (Newtonian) convergence and slow global (gradient-like) convergence may be encountered.
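The y′′ = 0 example is easy to check numerically: after many Gauss–Seidel sweeps the iterate almost satisfies every discretized equation while remaining far from the solution y ≡ 1, whereas a direct tridiagonal solve is exact at once. A minimal sketch (grid size and sweep count are illustrative choices):

```python
import numpy as np

m, sweeps = 100, 200                   # interior grid points, GS sweeps
y = np.zeros(m + 2)                    # discretization of y'' = 0
y[0] = y[-1] = 1.0                     # y(a) = y(b) = 1; start from y = 0

for _ in range(sweeps):                # forward Gauss-Seidel sweeps
    for i in range(1, m + 1):
        y[i] = 0.5 * (y[i - 1] + y[i + 1])

# Each interior equation  y[i-1] - 2 y[i] + y[i+1] = 0  is almost
# satisfied, yet the iterate is still far from the solution y = 1.
max_residual = np.max(np.abs(y[:-2] - 2.0 * y[1:-1] + y[2:]))
max_error = np.max(np.abs(y[1:-1] - 1.0))

# A direct solve of the same tridiagonal system is exact immediately.
A = (np.diag(2.0 * np.ones(m))
     - np.diag(np.ones(m - 1), 1) - np.diag(np.ones(m - 1), -1))
b = np.zeros(m)
b[0] = b[-1] = 1.0                     # boundary values moved to the rhs
direct_error = np.max(np.abs(np.linalg.solve(A, b) - 1.0))
```

After 200 sweeps the largest residual of the (unscaled) discretized equations is of the order of 10⁻³, while the largest deviation from the true solution is still of order one: exactly the "small constraints, large distance to the feasible set" situation described above.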

This comment cannot be ended without answering a natural question concerning the nonlinear programming option. Assume that the problem of Fig. 1 is "solved" using the continuous optimization algorithm and that, after obtaining the approximate solution that satisfies the KKT and feasibility conditions with small constraint tolerances, the state variables from this solution are discarded, the parameters kXA, kYA, Xk are kept and X(h, t), Y(h, t) are re-computed using the recursive equations (28) and (29). What kind of extraction curve is obtained? The answer is: a curve very similar to the one computed by the feasible derivative-free procedure, shown in Fig. 1. Therefore, in practice, there seem to be no great differences between the two procedures. However, using nonlinear programming in this way is risky. The feasible derivative-free approach has the advantage that one knows what one is doing and knows the precise meaning of the optimization procedure being performed at every stage of the calculations.

Acknowledgements

We are indebted to two anonymous referees for helpful comments and encouraging words.

References

[1] R. Andreani, E.G. Birgin, J.M. Martínez, M.L. Schuverdt, On augmented Lagrangian methods with general lower-level constraints, SIAM Journal on Optimization 18 (2007) 1286–1309.

[2] R. Andreani, G. Haeser, J.M. Martínez, On sequential optimality conditions for smooth constrained optimization, Optimization, in press.

[3] R. Andreani, J.M. Martínez, M.L. Schuverdt, On the relation between the constant positive linear dependence condition and quasinormality constraint qualification, Journal of Optimization Theory and Applications 125 (2005) 473–485.

[4] E.G. Birgin, C.A. Floudas, J.M. Martínez, Global minimization using an augmented Lagrangian method with variable lower-level constraints, Mathematical Programming, in press.

[5] G. Brunner, Gas Extraction: An Introduction to Fundamentals of Supercritical Fluids and the Application to Separation Processes, Steinkopff, Darmstadt, 1994.

[6] E.P. Carvalho, F. Pisnitchenko, N. Mezzomo, S.R.S. Ferreira, J.M. Martínez, J. Martínez, Numerical simulation and parameter estimation for supercritical fluid extraction models, Tech. Rep., Department of Applied Mathematics, University of Campinas, Brazil, 2009. Available at www.ime.unicamp.br/martinez/supercritico.doc, in press.

[7] R. Fletcher, S. Leyffer, Nonlinear programming without a penalty function, Mathematical Programming 91 (2002) 239–269.


[8] F. Gaspar, T. Lu, B. Santos, B. Al-Durin, Modeling the extraction of essential oils with compressed carbon dioxide, Journal of Supercritical Fluids 25 (2003) 247–260.

[9] C.C. Gonzaga, E.W. Karas, M. Vanti, A globally convergent filter method for nonlinear programming, SIAM Journal on Optimization 14 (2003) 646–669.

[10] M. Goto, M. Sato, T. Hirose, Extraction of peppermint oil by supercritical carbon dioxide, Journal of Chemical Engineering of Japan 26 (1993) 401–409.

[11] M.R. Hestenes, Optimization Theory: The Finite Dimensional Case, Wiley, New York, 1975.

[12] R.J. LeVeque, Finite Difference Methods for Ordinary and Partial Differential Equations: Steady State and Time Dependent Problems, SIAM, Philadelphia, PA, 2007.

[13] O.L. Mangasarian, S. Fromovitz, The Fritz John necessary optimality conditions in the presence of equality and inequality constraints, Journal of Mathematical Analysis and Applications 17 (1967) 37–47.

[14] J. Martínez, Extração de óleos voláteis e outros compostos com CO2 supercrítico: desenvolvimento de uma metodologia de aumento de escala a partir da modelagem matemática do processo e avaliação dos extratos obtidos, Ph.D. thesis, Food Engineering Department, University of Campinas, Brazil, 2005.

[15] J. Martínez, J.M. Martínez, Fitting the Sovová's supercritical fluid extraction model by means of a global optimization tool, Computers and Chemical Engineering 32 (2008) 1735–1745.

[16] J. Martínez, A.R. Monteiro, P.T.V. Rosa, M.O.M. Marques, M.A.A. Meireles, Multicomponent model to describe extraction of ginger oleoresin with supercritical carbon dioxide, Industrial & Engineering Chemistry Research 42 (2003) 1057–1063.

[17] J. Martínez, P.T.V. Rosa, M.A.A. Meireles, Extraction of clove and vetiver oils with supercritical carbon dioxide: modeling and simulation, The Open Chemical Engineering Journal 1 (2007) 1–7.

[18] M.A.A. Meireles, G. Zahedi, T. Hatami, Mathematical modeling of supercritical fluid extraction for obtaining extracts from vetiver root, Journal of Supercritical Fluids 49 (2009) 23–31.

[19] N. Mezzomo, J. Martínez, S.R.S. Ferreira, Supercritical fluid extraction of peach (Prunus persica) almond oil: kinetics, mathematical modeling and scale-up, Journal of Supercritical Fluids, in press.

[20] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer, 2004.

[21] J. Nocedal, S.J. Wright, Numerical Optimization, Springer-Verlag, New York, 1999.

[22] A.V. Pekhov, G.K. Goncharenko, Ekstrakcija prjanogo syrja szhizennymi gazami, Maslozhirovaja Promyshlennost 34 (10) (1968) 26–29.

[23] M.J.D. Powell, A method for nonlinear constraints in minimization problems, in: R. Fletcher (Ed.), Optimization, Academic Press, New York, NY, 1969, pp. 283–298.

[24] M.J.D. Powell, The NEWUOA software for unconstrained minimization without derivatives, in: G. Di Pillo, M. Roma (Eds.), Large-Scale Nonlinear Optimization, Springer, 2006, pp. 255–297.

[25] M.J.D. Powell, Subroutine BOBYQA, Department of Applied Mathematics and Theoretical Physics, Cambridge University, 2009.

[26] L. Qi, Z. Wei, On the constant positive linear dependence condition and its application to SQP methods, SIAM Journal on Optimization 10 (2000) 963–981.

[27] R.D. Richtmyer, K.W. Morton, Difference Methods for Initial-Value Problems, John Wiley and Sons, New York, 1967.

[28] R.T. Rockafellar, Augmented Lagrange multiplier functions and duality in nonconvex programming, SIAM Journal on Control and Optimization 12 (1974) 268–285.

[29] C. Shen, W. Xue, D. Pu, A filter SQP algorithm without a feasibility restoration phase, Computational and Applied Mathematics 28 (2009) 167–194.

[30] H. Sovová, Rate of the vegetable oil extraction with supercritical CO2. I. Modelling of extraction curves, Chemical Engineering Science 49 (1994) 409–414.

[31] J.W. Thomas, Numerical Partial Differential Equations: Finite Difference Methods, Texts in Applied Mathematics, vol. 22, Springer-Verlag, 1995.

[32] A. Wathen, Preconditioning for PDE-constrained optimization, SIAM News 43 (2010) 8.