
Artificial Intelligence for Engineering Design, Analysis and Manufacturing (1997), 11, 357-378. Printed in the USA. Copyright © 1997 Cambridge University Press 0890-0604/97 $11.00 + .10

Multilevel modelling for engineering design optimization

THOMAS ELLMAN, JOHN KEANE, MARK SCHWABACHER, AND KE-THIA YAO
Department of Computer Science, Hill Center for Mathematical Sciences, Rutgers University, Piscataway, NJ 08855, U.S.A.

(RECEIVED June 4, 1996; ACCEPTED April 22, 1997)

Abstract

Physical systems can be modelled at many levels of approximation. The right model depends on the problem to be solved. In many cases, a combination of models will be more effective than a single model. Our research investigates this idea in the context of engineering design optimization. We present a family of strategies that use multiple models for unconstrained optimization of engineering designs. The strategies are useful when multiple approximations of an objective function can be implemented by compositional modelling techniques. We show how a compositional modelling library can be used to construct a variety of locally calibratable approximation schemes that can be incorporated into the optimization strategies. We analyze the optimization strategies and approximation schemes to formulate and prove sufficient conditions for correctness and convergence. We also report experimental tests of our methods in the domain of sailing yacht design. Our results demonstrate dramatic reductions in the CPU time required for optimization, on the problems we tested, with no significant loss in design quality.

Keywords: Design, Optimization, Approximation, Modelling

1. INTRODUCTION

Physical systems can be modelled at many levels of approximation. For example, the behavior of a sailing yacht can be modelled in terms of steady-state motion or time-dependent motion. Furthermore, the forces acting on a sailing yacht can be modelled by simple algebraic equations or by computational fluid dynamics. The right model will depend, in general, on the problem to be solved. For example, the sort of model needed to synthesize a new design may be different from the sort needed to accurately predict the performance of a single proposed design or to diagnose problems with an existing design. Even in the context of a particular problem-solving task, a single model may not suffice. Problem solving may proceed in stages, with different modelling requirements at each stage. Given a problem whose solution requires a computational model of a physical artifact, an intelligent system must choose one or more suitable models from the range of all models possible and use them in a coordinated fashion to solve the problem.

Reprint requests to: Thomas Ellman, Dept. of Computer Science, Hill Center for Mathematical Sciences, Rutgers University, Piscataway, NJ 08855, U.S.A.; E-mail: [email protected].

A number of investigators have developed knowledge-based methods for using multiple models to reason about physical systems. A technique known as "compositional modelling" has emerged as the dominant paradigm for this research (Falkenhainer & Forbus, 1991). In compositional modelling, each model is constructed from a library of model fragments. An entire space of models results from systematically enumerating combinations of model fragments. A number of methods have been proposed for using such a space of models to carry out problem-solving tasks. Most of this work has focused on problems whose solution involves simulation of a single artifact. Very little effort, if any, has gone into investigating the use of compositional modelling in a design process that involves searching through a space of artifacts.

We are investigating the use of multiple models in the specific context of computer-aided design optimization. Design optimization involves searching through a space of artifacts, which presents a special opportunity for an intelligent system to exploit multiple models of an artifact. An optimization algorithm can often utilize relatively simple models to make search control decisions and can rely on complex models only when needed to verify the optimality of a solution and satisfaction of constraints. For this reason, a combination of simple and complex models can lead to designs


as good as those resulting from a single complex model, but at far lower computational cost.

We have developed a family of optimization strategies that are based on this insight. For example, one strategy uses a simple model to get near an optimum and falls back on a complex model only during the last stage of optimization. Another strategy interleaves optimization with periodic calibration of an approximate model. A third strategy is especially useful for problems with multiple optima. This strategy uses an approximate model to find the basin of attraction of the global optimum and then uses the exact model to close in on the optimum.

Our research may be understood in the larger context of previous work in artificial intelligence. A large body of research has investigated the use of multiple levels of modelling in problem-solving activities, including the use of abstraction in planning (Sacerdoti, 1974), decomposition in design (Simon, 1981), and approximation in state-space search (Pearl, 1984), among many others. We are attempting to extend this body of research to the arena of engineering design optimization.

We begin in Section 2 by describing our testbed domain and reviewing some of the modelling choices that arise in that domain. In Section 3, we introduce our family of multimodel optimization strategies. In Section 4, we formulate and prove theorems about the correctness and convergence of these strategies. In Section 5, we present results from experimental tests of the strategies in the testbed domain. In Section 6, we discuss several areas of related work, both within and outside the artificial intelligence community. In Section 7, we summarize our contributions.

2. THE SAILING-YACHT-DESIGN PROBLEM

The domain of sailing yacht design serves as a testbed for our research on multilevel modelling. As a case study, we are attempting to reconstruct the process that led to the design of the "Stars and Stripes '87". A picture of the hull of this yacht is shown in Figure 1. The hull includes a canoe body (the main part of the hull), a keel (the appendage descending from the bottom of the canoe body), and the winglets (the appendages attached to the bottom of the keel). The "Stars and Stripes '87" won the America's Cup yacht race in 1987 (Letcher et al., 1987). Racing yachts are normally designed to meet objectives involving speed, rating, and cost, among others. Speed is defined by the time required for a

yacht to sail around a specified race course under specified wind conditions. Rating refers to a set of constraints on the hull and sails of a yacht that must be satisfied before the yacht can be admitted into a particular racing class. In our research, we have chosen to focus on objectives of speed and rating and to ignore considerations of cost. We use a Velocity Prediction Program (VPP) to evaluate yacht designs. VPP is a program that we wrote ourselves based on a commercial software package for evaluating yacht designs (Letcher, 1991). The organization of VPP is described in Figure 2. VPP takes as input a vector x = (x1,...,xn) of design parameters. In the present work, we use a five-parameter design space, including Length, Beam, and Draft (dimensions of the canoe body) and KeelHeight and WingletSpan. VPP uses these parameters to generate B-spline surfaces that represent the geometry of the yacht hull. It begins by using the hull-processing models to extract critical quantities that have an impact on the speed of the yacht, for example, wave resistance (Rw), friction resistance (Rf), effective draft (D), vertical center of gravity (Vcg), and vertical center of pressure (Zcp), among others. VPP then uses these quantities in a velocity-prediction model to set up nonlinear equations that describe the balance of forces and torques on the yacht. The velocity-prediction model uses an iterative method to solve these equations and thereby determine the "velocity polar," that is, a table giving the steady-state velocity of the yacht under various wind speeds and

Fig. 1. The "Stars and Stripes '87" hull.

Fig. 2. Velocity prediction program.


directions of heading. The race model uses the velocity polar to determine the total time to traverse the given course, assuming the given wind speed. The rating constraint is handled implicitly by the VPP program. When given a set of hull geometry parameters, VPP automatically computes the size of the largest sail that will satisfy the rating constraint. By so incorporating the rating constraint into the VPP program, we formulate the yacht design problem as an unconstrained minimization of the course time function T(x).

Numerical optimization codes offer a direct method of attempting to solve the yacht design problem. One such code is CFSQP (Lawrence et al., 1995). This code solves constrained nonlinear optimization problems: Given an objective function e(x), a collection of inequality constraints gi(x) ≤ 0 (i = 1,...,M), a collection of equality constraints hi(x) = 0 (i = 1,...,N), and a seed design s, CFSQP attempts to find a value of x that optimizes (minimizes or maximizes) e(x) while satisfying each constraint. CFSQP is a quasi-Newton method. It operates by repeatedly constructing a quadratic approximation to the Lagrangian of the functions e(x), {gi(x) | i = 1,...,M} and {hi(x) | i = 1,...,N}, in the neighborhood of a point xi, and subsequently optimizing the quadratic function to obtain a new point xi+1. The iteration process is repeated until convergence is achieved. The method is "quasi" because the (inverse) Hessian of e(x) needed to construct the quadratic approximation is not computed directly but is approximated by a series of gradients ∇e(x) generated as the iteration process moves through the design space.
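The quasi-Newton idea can be illustrated in one dimension. The sketch below is not CFSQP itself: it shows only the core trick of approximating the second derivative from successive gradients (here by a secant formula over finite-difference gradients) rather than computing it directly. The objective function is invented for illustration.

```python
def grad(e, x, h=1e-6):
    """Central finite-difference gradient of the objective e."""
    return (e(x + h) - e(x - h)) / (2.0 * h)

def quasi_newton_1d(e, x0, x1, tol=1e-8, max_iter=100):
    """Minimize e by Newton steps whose second derivative is
    approximated from the gradients seen along the iteration."""
    g0, g1 = grad(e, x0), grad(e, x1)
    for _ in range(max_iter):
        # Secant approximation to the second derivative from two gradients.
        h_approx = (g1 - g0) / (x1 - x0)
        x0, g0 = x1, g1
        x1 = x1 - g1 / h_approx  # Newton step with the approximate Hessian
        g1 = grad(e, x1)
        if abs(x1 - x0) < tol:
            break
    return x1

# Hypothetical smooth convex objective with its minimum at x = 2.
e = lambda x: (x - 2.0) ** 2 + 0.5 * (x - 2.0) ** 4
x_star = quasi_newton_1d(e, 0.0, 0.5)
```

For a convex, smooth objective such as this one, the iteration converges to the minimizer without ever evaluating a true second derivative, which is the property that makes quasi-Newton codes attractive when each evaluation is expensive.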

In our yacht application, we use CFSQP to minimize the course time function T(x). Numerical minimization of this function is difficult for several reasons. The first difficulty is the cost of evaluation. Each computation of T(x) requires about 1 h of CPU time, if one elects to use the most exact model available. A typical optimization from a single seed in a five-dimensional space using CFSQP requires about 100 evaluations of the objective function, that is, about 100 CPU h to conduct an optimization. An even greater difficulty results from the unreliability of optimization starting from a single seed. In the yacht domain, the course time function T(x) exhibits numerous pathologies, such as ridges and discontinuities, resulting from numerical discretization used in VPP, among other things. Furthermore, T(x) is unevaluable throughout large regions of the design space, for example, because the balance of force equations fail to be solvable. Our system catches such failures and arranges for extremely bad values of the objective function to be returned to CFSQP. This allows optimization to continue when such failures are encountered; however, it introduces sharp discontinuities into the objective function. These pathologies make optimization from a single seed extremely unreliable, even for finding a local optimum (Gill et al., 1981). A commonly used remedy is to conduct many optimizations from randomly selected seeds, but this approach dramatically raises the already high cost of optimization. To make reliable yacht design optimization feasible, some alternative must be found.
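The failure-handling scheme described above can be sketched as a wrapper around the simulator. Everything below is a hypothetical stand-in: toy_vpp mimics a simulator that fails outside its evaluable region, and the wrapper returns an extremely bad objective value so the optimizer can continue.

```python
BAD_VALUE = 1.0e12  # "extremely bad" course time, in arbitrary units

def safe_objective(simulate, x):
    """Evaluate the simulator, substituting a huge penalty on failure."""
    try:
        return simulate(x)
    except (ArithmeticError, ValueError):  # assumed failure modes
        return BAD_VALUE

def toy_vpp(x):
    """Hypothetical stand-in for VPP: unevaluable for x < 0."""
    if x < 0:
        raise ValueError("balance-of-forces equations unsolvable")
    return (x - 1.0) ** 2

ok = safe_objective(toy_vpp, 2.0)    # a normal evaluation
bad = safe_objective(toy_vpp, -1.0)  # a caught failure, returns BAD_VALUE
```

As the text notes, this keeps the search alive but introduces a sharp discontinuity at the boundary of the evaluable region.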

Approximation methods offer a means of overcoming the difficulties of direct optimization. Especially fruitful opportunities for approximation arise in the context of local optimization problems. Consider that, to verify a local optimum, a search algorithm need only obtain accurate evaluations of the objective function or its gradient in the neighborhood of a solution. Evaluations that occur along a path toward a solution need only be accurate enough to guide the search. Newton and quasi-Newton methods such as CFSQP exploit approximations in just this fashion. They use quadratic approximations of the objective function to guide the search along the path toward a solution. They fall back on the exact objective function to test for convergence.

Purely numerical methods such as CFSQP are not always able to construct the most cost-effective approximations available. They are limited to treating the objective function as a "black box," that is, they use it to evaluate designs, but they do not look inside to examine its internal structure. More cost-effective approximations often result from exploiting the internal structure of the objective function. For example, consider the structure of VPP shown in Figure 2. Instead of approximating the entire function T(x) (as CFSQP does), one might choose to approximate a module that computes an intermediate quantity such as wave resistance Rw or effective draft D. These two modules are especially good candidates for approximation because they both involve time-consuming computational fluid-dynamics (CFD) codes. For example, effective draft D can be computed by using a potential flow code called PMARC (Katz & Plotkin, 1991; Ashby et al., 1992). It also can be approximated by a simple algebraic formula. Likewise, wave resistance Rw can be computed by using a slender-body flow code called SLAW (Weems et al., 1994). It also can be approximated by a formula that interpolates data from wave tank tests. Such internal approximations are not available to purely numerical optimization methods. They can only be constructed by a system that has access to the internal structure of the objective function.

Consider the choice between the PMARC potential flow code and an algebraic approximation for computing effective draft D. Effective draft is a measure of the amount of drag produced by the keel as a result of the lift the keel generates. An accurate estimate of this quantity is quite important for analyzing the performance of a sailing yacht. Although PMARC is the most accurate means of estimating effective draft, it also can be estimated by using an algebraic approximation that involves the maximum keel depth S and the midship cross section area A, as follows:

D̄ = K √(S² − 2A/π).

This formula is based on an approximation that treats a sailing yacht hull as an infinitely long cylinder and treats the keel of the yacht as an infinitely thin fin protruding from the cylinder (Newman & Wu, 1973; Letcher, 1975). The constant K may be chosen to fit the algebraic model to data obtained from wave tank tests or sample runs using PMARC.


(When the formula is fit to PMARC, a value of K = 0.83 results.) Although the algebraic approximation is comparatively easy to use, its results are not as accurate as those produced by PMARC.
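The algebraic approximation and the calibration of K can be sketched as follows. The formula is the one given above; the (S, A, D) triples are hypothetical stand-ins for PMARC runs or wave-tank measurements, and the fit is an ordinary one-parameter least-squares fit.

```python
import math

def effective_draft(S, A, K):
    """Algebraic effective-draft approximation (S must satisfy S^2 > 2A/pi)."""
    return K * math.sqrt(S ** 2 - 2.0 * A / math.pi)

def fit_K(samples):
    """Least-squares fit of the single constant K to (S, A, D_measured) data."""
    num = den = 0.0
    for S, A, D in samples:
        basis = math.sqrt(S ** 2 - 2.0 * A / math.pi)
        num += D * basis
        den += basis * basis
    return num / den

# Hypothetical calibration data (S in m, A in m^2, D in m).
samples = [(4.0, 2.0, 3.1), (4.5, 2.2, 3.5), (5.0, 2.5, 3.9)]
K = fit_K(samples)
```

Because the model is linear in K, the fit reduces to a single ratio of sums; refitting K against fresh PMARC runs is therefore essentially free compared with one CFD evaluation.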

Another internal approximation involves the reuse of results from prior evaluations in the course of an optimization process. Some physical quantities may not change significantly when a design is modified. Values of such quantities can be retrieved from prior candidate designs. For example, suppose an algorithm is systematically exploring combinations of canoe bodies and keels of a sailing yacht (Figure 3). In principle, VPP must evaluate the wave resistance Rw and the effective draft D of each candidate design. Wave resistance depends mainly on the canoe body of the yacht and is not significantly influenced by the keel. When only the keel is modified, wave resistance will not change much. Instead of recomputing wave resistance for the new yacht, the system can reuse the prior value. On the other hand, effective draft depends mainly on the keel of the yacht and is not significantly influenced by the canoe body. When only the canoe body is modified, effective draft will not change much. Instead of recomputing effective draft for the new yacht, the system can reuse the prior value. In fact, the entire matrix of yachts can be evaluated by computing wave resistance for a single row and computing effective draft for a single column. By intelligently deciding when to reuse prior evaluation results, one can significantly lower the computational costs of design.

3. MULTIMODEL OPTIMIZATION STRATEGIES

A family of strategies for unconstrained multimodel optimization is presented in Figure 4. Each strategy (potentially) takes as input a seed design that represents a starting point for the design optimization process. Each returns a new design that results from using a nonlinear optimization code, such as CFSQP, in combination with exact and approximate versions of the objective function. Definitions of these functions are shown in Figure 5. The function e(x) is the so-called exact objective function. The function ē(x) is a fixed approximation of e(x). The approximation ē(x) is called fixed because the function ē itself is not modified during the optimization process. In contrast, the function ẽ(x, y) is a locally calibratable approximation, and can be modified during the optimization process.

Fig. 3. Reuse of prior results.

3.1. Compositional modelling of approximate objective functions

An approximate objective function ē(x) can be constructed from a library of model fragments by using compositional modelling techniques. Our approach to compositional modelling is illustrated in Figure 6. The evaluation function e(x) is defined by an expression that references functions F1,...,Fn. Each Fi computes an intermediate quantity needed in the computation of e(x). For example, in the yacht domain, course time T(x) is defined by an expression S(Rw(x), Rf(x), D(x), Vcg(x), Zcp(x)) that references intermediate functions Rw, Rf, D, Vcg, and Zcp, which define hull processing models (Figure 7). In general, each intermediate function Fi may be defined in several ways. Different definitions of Fi implement different approximations. For instance, effective draft can be defined by the function D(x), which uses PMARC, or by the function D̄(x), which uses the algebraic formula. Likewise, wave resistance can be defined by the function Rw(x), which uses SLAW, or by the function R̄w(x), which uses curves fitted to tank data. In general, each version of ē(x) results from selecting a tuple from the Cartesian product of the definitions of the intermediate quantity functions. For example, in the yacht domain, ē(x) can be instantiated by any of the approximations T̄1, T̄2, and T̄3 that use simplified models for computing effective draft or wave resistance.
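The Cartesian-product construction can be sketched directly. All of the intermediate functions below are hypothetical stand-ins (simple scalings rather than CFD codes), and the combining function is an invented placeholder for the course-time computation:

```python
from itertools import product

def Rw_slaw(x): return 2.0 * x        # "exact" wave resistance (stand-in)
def Rw_tank(x): return 2.1 * x        # curves fitted to tank data (stand-in)
def D_pmarc(x): return 0.8 * x        # "exact" effective draft (stand-in)
def D_algebraic(x): return 0.83 * x   # algebraic approximation (stand-in)

def make_objective(Rw, D):
    # Placeholder for the function that combines intermediate quantities
    # into a course time.
    return lambda x: Rw(x) + 1.0 / D(x)

# Library of alternative definitions for each intermediate quantity.
library = {"Rw": [Rw_slaw, Rw_tank], "D": [D_pmarc, D_algebraic]}

# Every tuple in the Cartesian product yields one version of the objective.
objectives = [make_objective(Rw, D)
              for Rw, D in product(library["Rw"], library["D"])]
```

With two definitions each for two intermediate quantities, the construction yields four objective functions; with the real library, only some of these (the T̄i of the text) would be selected for use.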

3.2. A simple two-level strategy

The simplest multimodel strategy (MLM1) is based on the idea of dividing the optimization process into two phases. The first phase optimizes the fixed approximate objective function ē(x), starting at a seed s_old, to find a design s_temp. The second phase starts at s_temp and optimizes the exact objective function e(x) to obtain a final design. Strategy MLM1 can be justified by the following rationale. First, one expects that the distance from s_temp to the final design will be less than the distance from s_old to the final design. Second, one expects the required number of objective function evaluations to decrease with the distance from the seed to the optimum. Therefore, strategy MLM1 should require fewer evaluations of the exact objective function e(x) than are


Strategy MLM1(s_old):

1. Optimize ē(x) starting at s_old to obtain s_temp.

2. Optimize e(x) starting at s_temp to obtain and return s_new.

Strategy MLM2(s_old):

• InnerOptimization(s): Return the result of optimizing ẽ_s(x) starting at s.

• LineSearch(s_old, s_new): If |s_new − s_old| ≤ δ then return s_old; otherwise, if e(s_new) is better than e(s_old) then return s_new; otherwise return LineSearch(s_old, (s_old + s_new)/2).

• Converged?(s_old, s_new): Return |(e(s_new) − e(s_old))/e(s_old)| < ε.

1. Let s_new = InnerOptimization(s_old).

2. If DoLineSearch? then let s_new = LineSearch(s_old, s_new).

3. If Converged?(s_old, s_new) then return s_new, else return MLM2(s_new).

Strategy MLM3():

1. Randomly select seed designs: {s1,..., sn}.

2. For each seed si, optimize ē(x) starting at si to obtain ti.

3. Let s_new be a member of {t1,..., tn} that is optimal according to e(x).

4. Return MLM2(s_new).

Fig. 4. Strategies for unconstrained optimization using multiple models.


needed in a strategy of directly optimizing e(x). The performance of strategy MLM1 thus will depend in part on the cost effectiveness of the approximate objective function ē(x): How close is the optimum of ē(x) to the optimum of e(x)? How much faster is ē(x) than e(x)? The performance of MLM1 also will depend in part on the properties of the numeric method (e.g., CFSQP) that is used to carry out each stage of optimization. These factors are investigated analytically in Section 4 and empirically in Section 5. It turns out that strategy MLM1 often fails to achieve dramatic computational savings. The reason is that optimization programs typically spend most of their time conducting searches very close to the final stopping point. Getting close to the stopping point, therefore, achieves little savings. These considerations motivate more sophisticated multimodel strategies.
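The two-phase structure of MLM1 can be sketched on a toy one-dimensional problem. Here minimize_1d is a crude hypothetical stand-in for a numerical code such as CFSQP, and both objectives are invented; the approximation deliberately has a slightly shifted minimum, as a cheap model typically would.

```python
def minimize_1d(f, seed, step=0.01, max_iter=10000):
    """Crude coordinate-descent stand-in for a numerical optimizer."""
    x = seed
    for _ in range(max_iter):
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            break
    return x

def exact(x):    # hypothetical exact objective, minimum at x = 1.0
    return (x - 1.0) ** 2

def approx(x):   # cheap fixed approximation with a slightly shifted minimum
    return (x - 1.1) ** 2

def mlm1(seed):
    s_temp = minimize_1d(approx, seed)   # phase 1: approximate model
    return minimize_1d(exact, s_temp)    # phase 2: exact model

s_final = mlm1(seed=5.0)
```

Phase 1 does the long march from the seed using only the cheap model; phase 2 needs only a short run of the exact model to correct the approximation's bias.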

• Design space: R^n.

• Exact objective function: e : R^n → R.

• Fixed objective approximation: ē : R^n → R.

• Locally calibratable objective approximation: ẽ : R^n × R^n → R.

• Exact internal function: Fi : R^n → R.

• Fixed internal approximation: F̄i : R^n → R.

• Locally calibratable internal approximation: F̃i : R^n × R^n → R.

Fig. 5. Definitions of exact and approximate objective functions.

3.3. Recalibration of approximate objective functions

The multimodel strategy MLM2 is based on the idea of repeatedly calibrating an approximate objective function to fit local regions of the design space. This strategy uses an objective function ẽ(x, y), whose domain is the Cartesian product of the original design space with itself. The parameter x represents a design to be evaluated. The parameter y represents a point at which calibration takes place. Strategy MLM2 consists of two nested optimization loops. The InnerOptimization procedure takes a seed point s as an argument. It begins by constructing the function ẽ_s(x), that is, the restriction of ẽ(x, y) to the subspace defined by y = s. The function ẽ_s(x) is intended to be a locally accurate approximation of the exact objective function, that is, ẽ_s(x) ≈ e(x) whenever |x − s| is small. The InnerOptimization procedure then optimizes ẽ_s(x) by using the seed x = s as a starting point. The outer loop of strategy MLM2 calls

F1 = F1 | F̄1 | F̃1 | ...

F2 = F2 | F̄2 | F̃2 | ...

...

Fn = Fn | F̄n | F̃n | ...

Fig. 6. Organization of library of models and submodels.


Exact Objective Function:

e(x) = T(x)
T(x) = S(Rw(x), Rf(x), D(x), Vcg(x), Zcp(x))

Fixed Approximation Objective Functions:

ē(x) = T̄(x)
T̄1(x) = S(Rw(x), Rf(x), D̄(x), Vcg(x), Zcp(x))
T̄2(x) = S(R̄w(x), Rf(x), D(x), Vcg(x), Zcp(x))
T̄3(x) = S(R̄w(x), Rf(x), D̄(x), Vcg(x), Zcp(x))

T Computes course time from the design parameters.

S Computes course time from the results of the hull processing modules.

Rw Computes wave resistance using SLAW.

R̄w Computes wave resistance using curves fitted to wave tank data.

D Computes effective draft using PMARC.

D̄ Computes effective draft using the algebraic formula.

Fig. 7. Definitions of yacht domain functions.

InnerOptimization repeatedly to generate a series of designs (s0, s1,..., sn,...), where s_{i+1} = InnerOptimization(s_i). The outer loop terminates when a convergence test detects that the relative improvement in the exact objective function e(x) between successive iterates has fallen below a threshold.
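The nested-loop structure of MLM2 can be sketched on a toy one-dimensional problem. All functions below are hypothetical stand-ins; the calibration shown shifts the cheap approximation so that it agrees with the exact objective at the calibration point, and the outer loop recalibrates and tests relative improvement in the exact objective.

```python
def exact(x):
    """Hypothetical expensive exact objective, minimum at x = 2.0."""
    return (x - 2.0) ** 2 + 0.05 * (x - 2.0) ** 4

def approx(x):
    """Hypothetical cheap fixed approximation (biased)."""
    return (x - 2.0) ** 2

def calibrated(x, y):
    """Shift approx so it matches exact at the calibration point y."""
    return approx(x) + (exact(y) - approx(y))

def minimize_1d(f, seed, step=0.001, max_iter=100000):
    """Crude stand-in for a numerical optimizer."""
    x = seed
    for _ in range(max_iter):
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            break
    return x

def mlm2(seed, eps=1e-9, max_outer=50):
    s_old = seed
    for _ in range(max_outer):
        # Inner loop: optimize the locally calibrated approximation.
        s_new = minimize_1d(lambda x: calibrated(x, s_old), s_old)
        # Outer loop: recalibrate; stop when exact improvement is tiny.
        if abs((exact(s_new) - exact(s_old)) / (exact(s_old) + 1e-30)) < eps:
            return s_new
        s_old = s_new
    return s_old

s_final = mlm2(seed=5.0)
```

Note that the exact objective is evaluated only once per outer iteration (for calibration and the convergence test), while all the inner-loop evaluations use the cheap calibrated model.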

The locally calibratable objective function ẽ(x, y) can be constructed from a library of model fragments with compositional modelling techniques. Examples of such constructions are shown in Figure 8. Six different types of calibratable approximations are shown. Each results from combining an approximate version (F̄i(x)) and an exact version (Fi(x)) of some intermediate function. Functions ẽ1 and ẽ2 are based

on zeroth- and first-order approximations of Fi about the calibration point. Notice that these are purely numerical approximations. They do not actually require any predefined internal approximation F̄i from the library of model fragments. Functions ẽ3 and ẽ4 are based on an assumption that F̄i has a predictable absolute error with respect to Fi. In particular, ẽ3 implements an assumption of constant absolute error in the neighborhood of the calibration point. In contrast, ẽ4 implements an assumption that the absolute error is a linear function of the design parameters. Functions ẽ5 and ẽ6 are based on an assumption that F̄i has predictable relative error with respect to Fi. In particular, ẽ5 implements an assumption of constant relative error in the neighborhood of the calibration point. In contrast, ẽ6 implements an assumption that the relative error is a linear function of the design parameters. Notice that functions ẽ1 and ẽ2 may be seen as special case versions of ẽ3 and ẽ4 by taking F̄i(x) = 0. Alternatively, they may be seen as special case versions of ẽ5 and ẽ6 by taking F̄i(x) = 1.
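The six calibration schemes can be sketched in one dimension as follows. F stands for the exact internal function and Fbar for its fixed approximation; y is the calibration point, and gradients are taken by finite differences. The concrete F and Fbar below are invented for illustration.

```python
def d(f, y, h=1e-6):
    """Central finite-difference derivative at the calibration point."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

def make_schemes(F, Fbar):
    """The six locally calibratable approximations, as (x, y) -> value."""
    return {
        1: lambda x, y: F(y),                                # 0th-order Taylor
        2: lambda x, y: F(y) + d(F, y) * (x - y),            # 1st-order Taylor
        3: lambda x, y: Fbar(x) + (F(y) - Fbar(y)),          # constant abs. error
        4: lambda x, y: Fbar(x) + (F(y) - Fbar(y))
                        + (d(F, y) - d(Fbar, y)) * (x - y),  # linear abs. error
        5: lambda x, y: Fbar(x) * (F(y) / Fbar(y)),          # constant rel. error
        6: lambda x, y: Fbar(x) * (F(y) / Fbar(y)
                        + d(lambda t: F(t) / Fbar(t), y) * (x - y)),  # linear rel. error
    }

F = lambda x: x ** 2 + 0.1 * x ** 3   # hypothetical "exact" internal function
Fbar = lambda x: x ** 2               # its fixed approximation
schemes = make_schemes(F, Fbar)
```

By construction every scheme reproduces the exact value at the calibration point itself (x = y); the schemes differ only in how they extrapolate away from it, and, consistent with the text, schemes 1 and 2 fall out of 3 and 4 when Fbar is identically 0, and out of 5 and 6 when Fbar is identically 1.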

Consider how strategy MLM2 might be instantiated in the domain of sailing yachts. Various different locally calibratable objective functions result from different choices of intermediate functions to play the role of Fi in Figure 8. For example, if Fi represents effective draft D, then the calibratable approximations ê1, ..., ê6 represent different ways of periodically recalibrating the algebraic formula for effective draft when using PMARC as a basis for recalibration. Likewise, if Fi represents wave resistance Rw, then the calibratable approximations ê1, ..., ê6 represent different ways of periodically recalibrating the curves fitted to tank data when using SLAW as a basis for recalibration. Notice that in either case each approximation êi(x,y) includes some subexpressions that depend on the calibration parameters y and not on the design parameters x. These subexpressions need

Exact Objective Function:

e(x) = G(F1(x), ..., Fi(x), ..., Fn(x))

Fixed Approximation Objective Function:

ê(x) = G(F1(x), ..., F̂i(x), ..., Fn(x))

Locally Calibratable Objective Functions:

ê(x,y) = G(F1(x), ..., Fi−1(x), F̂i(x,y), Fi+1(x), ..., Fn(x))

F̂i1(x,y) = Fi(y)

F̂i2(x,y) = Fi(y) + ∇Fi(y)·(x − y)

F̂i3(x,y) = F̂i(x) + [Fi(y) − F̂i(y)]

F̂i4(x,y) = F̂i(x) + [Fi(y) − F̂i(y)] + [∇Fi(y) − ∇F̂i(y)]·(x − y)

F̂i5(x,y) = F̂i(x)·[Fi(y)/F̂i(y)]

F̂i6(x,y) = F̂i(x)·[Fi(y)/F̂i(y) + ∇(Fi(y)/F̂i(y))·(x − y)]

Fig. 8. Exact and approximate objective functions for use in multimodel strategies.

Downloaded from https://www.cambridge.org/core. 06 May 2021 at 08:03:00, subject to the Cambridge Core terms of use.


Multilevel modelling for design optimization 363

only be computed at each calibration point s, when the InnerOptimization procedure constructs the function ê_s(x). They need not be recomputed each time ê_s(x) is evaluated inside the inner optimization. For example, in the yacht domain examples described above, the function Fi(x) (which uses PMARC or SLAW) is applied only to the calibration point s. These CFD codes need not be invoked from within the numeric optimization code (e.g., CFSQP) invoked by the inner optimization.

The behavior of strategy MLM2 is illustrated by the diagram in Figure 9. MLM2 follows a path through the space on which the locally calibratable function ê(x,y) is defined, that is, the Cartesian product of the design space and the calibration parameter space. Movement along the vertical axis corresponds to calibration or recalibration, that is, setting the calibration parameter y to the design parameter seed s and constructing the function ê_s(x). During calibration, only the calibration parameter y in ê(x,y) is changed. The design parameter x remains fixed. Calibration always moves vertically to the line defined by x = y. Movement along the horizontal axis corresponds to the numeric optimization that occurs inside the InnerOptimization procedure. During the inner optimization, only the design parameter x in ê(x,y) is changed. The calibration parameter y remains fixed. Evaluations of the locally calibratable approximation ê(x,y) occur only at points along the solid lines in the diagram. Evaluations of the exact objective function e(x) or subexpressions of e(x) need occur only in the neighborhoods of calibration points, that is, points that lie along the line y = x and are marked by circles on the diagram. Accordingly, strategy MLM2 never evaluates the exact objective function e(x) inside the execution of a numeric optimization code.

Strategy MLM2 includes a procedure LineSearch(s_old, s_new), which may be called after each invocation of InnerOptimization(s), depending on the flag DoLineSearch? spec-

[Figure: vertical axis labelled "Calibration Parameter Space", horizontal axis labelled "Design Space"; the seed solution is marked.]

Fig. 9. Path followed by interleaved optimization and calibration.

ified by the user. This procedure conducts a search in the one-dimensional space defined by the input s_old and the output s_new of InnerOptimization. It generates the series of points (t0, t1, ..., t_i, ...), where t0 = s_new and t_i = (s_old + t_{i−1})/2 for all i > 0. It terminates upon finding a point t_i that is better than s_old according to e(x), the exact objective function. In most cases, the point s_new generated by InnerOptimization will immediately meet this criterion; however, this condition is not guaranteed because the inner optimization uses only an approximation of the exact objective function. Notice that LineSearch conducts a satisficing search along the line between s_old and s_new. It does not conduct an optimizing search. This feature is motivated by a desire to minimize the use of the exact objective function e(x). Observe that the two procedures InnerOptimization and LineSearch are analogous to two computation steps that occur in optimization codes based on Newton or quasi-Newton methods: 1) constructing and optimizing a quadratic approximation of the objective function, and 2) conducting a line search between the start and end points of the quadratic optimization (Lawrence et al., 1995).
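The satisficing line search can be sketched as follows; minimization is assumed, and the max_halvings cap is an illustrative safeguard not in the original description:

```python
def line_search(s_old, s_new, e_exact, max_halvings=20):
    """Satisficing line search between s_old and s_new (sketch).

    Generates t0 = s_new, t_i = (s_old + t_{i-1})/2, and returns the
    first point that beats s_old on the exact objective.
    """
    e_old = e_exact(s_old)
    t = s_new
    for _ in range(max_halvings):
        if e_exact(t) < e_old:      # satisficing test, not optimizing
            return t
        t = (s_old + t) / 2.0       # halve the step back toward s_old
    return s_old                    # no acceptable point found
```

Because it accepts the first improving point, the routine usually costs a single exact evaluation beyond e(s_old).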

The line search procedure is especially important in applications such as the yacht domain, in which the code that evaluates the exact objective function e(x) may fail to return a value in some regions of the design space. The local approximation ê_s(x) may move the location of the unevaluable region by a small amount. The design s_new returned by InnerOptimization may actually be unevaluable with the exact objective function e(x). (In our system, a call to e(x) would return an extremely bad numerical value.) This unfortunate situation tends to arise when the optimal design is obtained by pushing the design up against physical constraints that are not explicit in the problem statement but that must be met for e(x) to be evaluable. For example, in the yacht domain, the balance of force and torque equations must be solvable to compute steady-state velocity on each leg of the race course. Near the optimum, it often happens that the equations are nearly unsolvable for one leg of the course because the design is well optimized for other legs of the course and is just barely able to sail at all on the leg that results in a near failure of the objective function. In situations like this, the LineSearch procedure serves to move s_new back into the evaluable region of the design space. In all of the experiments we report in this paper, the flag DoLineSearch? was set to True so that the LineSearch routine was invoked after each inner optimization. In experiments on an aircraft design problem and a simpler version of the yacht design problem, we obtained good results without using the LineSearch routine (Ellman et al., 1997).

A possible modification to strategy MLM2 is suggested by analogy with trust-region optimization techniques (Dennis & Schnabel, 1983). A trust-region method fits an approximation ê_s(x) to the objective function e(x) at the design point s = s_old. It then optimizes ê_s(x) over a hypersphere centered at s_old to obtain a new design s_new. The hypersphere is chosen to be a region over which ê_s(x) is a good


approximation of e(x). The optimum s_new is then used to define a direction s_new − s_old for a line search. Strategy MLM2 could be modified to use this approach. Suppose we have an estimate δ of the size of a region over which the local approximation ê_s(x) is valid. We obtain a trust-region technique by modifying the InnerOptimization routine so that it optimizes ê_s(x) subject to the constraint |x − s| ≤ δ.
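In the one-dimensional case with a convex local approximation, the constrained inner optimization reduces to clamping the unconstrained step to the trust radius; a sketch, where minimize_unconstrained and delta are illustrative stand-ins:

```python
def inner_opt_trust(s, minimize_unconstrained, delta):
    """Trust-region variant of the inner optimization (1-D sketch).

    Optimize the local approximation e_s(x) without constraints, then
    project the step onto the trust region |x - s| <= delta. For a
    convex 1-D approximation this equals constrained minimization.
    """
    x = minimize_unconstrained(s)             # unconstrained minimum of e_s
    step = x - s
    if abs(step) > delta:
        step = delta if step > 0 else -delta  # clamp to trust radius
    return s + step
```

In higher dimensions the constrained subproblem must be solved directly (e.g., by the numeric optimization code with the extra constraint), since clamping each coordinate is not equivalent.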

A number of interesting variations on strategy MLM2 arise when the locally calibratable approximation ê(x,y) involves two or more internal approximations. For example, if two internal functions are approximated and recalibrated, then ê(x,y) has the form

ê_kl(x,y) = G(F1(x), ..., F̂ik(x,y), ..., F̂jl(x,y), ..., Fn(x)).

The function ê_kl(x,y) uses scheme number k for approximating Fi and scheme number l for approximating Fj (Figure 8). Suppose, for example, that Fi is wave resistance Rw and Fj is effective draft D. Suppose further that we use ê_22(x,y), that is, we fit linear functions to Rw(x) and D(x) at calibration points. A question arises regarding how to coordinate the recalibration of each of these linear approximations. If both internal approximations are accurate over roughly the same region, it may pay to use strategy MLM2

directly and recalibrate both after each inner optimization. However, suppose that one internal approximation is valid over a wider region than the other. It may then be useful to use a doubly nested strategy in which one approximation is calibrated at the beginning of each inner optimization, and the other is calibrated at the beginning of each inner-inner optimization. On the other hand, the two internal approximations may be accurate in different directions. For example, an approximation of effective draft is likely to remain accurate when the canoe body is changed and the keel is fixed. Likewise, an approximation of wave resistance is likely to remain accurate when the keel is changed and the canoe body remains fixed. This observation suggests using a scheme for approximating each internal quantity in which the approximation is constant in one subspace and linear in the complementary subspace. The resulting strategy will behave in a manner similar to that illustrated in Figure 3.

3.4. Dealing with pathological objective functions

The last multimodel strategy, MLM3, may be seen as a combination of the simple two-level strategy MLM1 and the recalibrating strategy MLM2. It is useful when the objective function has pathologies such as discontinuities, ridges, unevaluable points, and multiple local optima: features that prevent numerical optimization codes from reliably reaching global, or even local, optima. Strategy MLM3 is based on the hope that optimization using ê(x) from multiple seeds can find a design lying in the basin of attraction of the true global optimum, even though it does not produce a true global (or local) optimum itself. MLM3 thus begins by selecting a random set of seed designs. It optimizes each by using ê(x), the fixed approximation. This results in a set of approximate optima. MLM3 then selects the approximate optimum that is best according to ê(x) and uses it as a seed for strategy MLM2.

Strategy MLM3 is particularly useful in the yacht domain. In this application, pathological features of the objective function are a major problem. Unreliable optimization due to ridges, discontinuities, and unevaluable regions can result in a dramatic loss in design quality, even greater than the losses that result from using approximations such as the algebraic formulae for effective draft and wave resistance. Consider that many optimizations using the algebraic approximations can be run in the time it takes to conduct a single optimization using the CFD codes. A computational advantage therefore results from conducting optimizations from many seeds using the algebraic formulae and then using the best result as a seed for a single optimization with the CFD codes.
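The two stages of MLM3 can be sketched for a one-dimensional design space; optimize_approx, e_approx, and run_mlm2 are illustrative stand-ins for the cheap optimizer, the fixed approximation ê(x), and the MLM2 routine:

```python
import random

def mlm3(n_seeds, bounds, optimize_approx, e_approx, run_mlm2):
    """Sketch of MLM3: multi-start with the fixed approximation,
    then one recalibrating optimization from the best candidate.
    """
    lo, hi = bounds
    seeds = [random.uniform(lo, hi) for _ in range(n_seeds)]
    # Stage 1: optimize the cheap model from every seed.
    candidates = [optimize_approx(s) for s in seeds]
    # Select the best candidate according to the cheap model.
    best = min(candidates, key=e_approx)
    # Stage 2: a single expensive recalibrating optimization (MLM2).
    return run_mlm2(best)
```

Only the final stage touches the expensive CFD-based models, which is the source of the computational advantage described above.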

3.5. Using error estimates to guide model selection

Estimates of the error of approximate models may be available from a variety of sources. For example, in the yacht domain, the error of the algebraic effective draft formula D̂ may be obtained as a byproduct of the process of fitting the constant K that appears in the formula, whether the formula is fit to data from wave tanks or to data from sample runs of PMARC. (When the formula is fit to PMARC, the root mean square error ε is 0.51 ft. A typical effective draft is about 10.5 ft.) The error in effective draft D̂(x) can be converted into an error in the course time function T(x). If the true effective draft lies in the interval [D̂ − ε, D̂ + ε], then the true course time lies in the interval [S(Rw, Rf, D̂ + ε, Vcg, Zcp), S(Rw, Rf, D̂ − ε, Vcg, Zcp)]. This conclusion is based on the fact that course time is a monotonically decreasing function of effective draft. The error in course time T(x) corresponding to an error of ε in effective draft will depend on the specific design x and on the environmental conditions, that is, race course and wind speed.

When error estimates are available, they can be used to make refinements in the multimodel optimization strategies. The first refinement involves strategy MLM3, which conducts multiple optimizations using the fixed approximation ê(x) to find a good starting point for the recalibrating strategy MLM2 (Figure 4). Strategy MLM3 uses the fixed approximation ê(x) to select among the designs {t1, ..., tn} resulting from the first stage of optimization. This criterion may not result in choosing the seed for the second stage that is best according to the exact objective e(x). An alternative approach could use the exact objective function e(x) to choose among the designs {t1, ..., tn}; however, this might be too computationally expensive. A third approach is illustrated by Figure 10. When comparing two designs t_i and t_j,

one uses the approximation ê(x) to evaluate and compare ê(t_i) and ê(t_j) and checks whether the error bars overlap. If not (as illustrated by the solid error bars), then the compar-


[Figure: vertical axis labelled "Approximate Objective Function", horizontal axis labelled "Design Space".]

Fig. 10. Use of error bars in selecting among seeds.

ison process is complete. If the error bars overlap (as illustrated by the dotted error bars), one falls back on the exact objective function e(x) to evaluate and compare e(t_i) and e(t_j). This approach uses the approximation on all of the clear-cut cases and relies on the exact function only when needed.

A complication arises when a large number of starting points is used during the first stage of strategy MLM3. It may happen that two designs t_i and t_j lie very close to the same local optimum. In such a case, their error bars will likely overlap, thereby recommending the use of the exact objective function to discriminate between them. Nevertheless, the choice between two such designs will probably not matter very much. The computation expended to evaluate them exactly would be wasted effort. A possible solution is to cluster the designs {t1, ..., tn} into equivalence classes such that two designs are in the same class whenever they are near each other in design space. One then carries out the comparison between designs by using only one representative from each class.
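The error-bar comparison can be sketched as follows, treating each approximate evaluation as a value with symmetric bars v ± err; err is an illustrative scalar error estimate, and minimization is assumed:

```python
def better_design(t_i, t_j, e_approx, e_exact, err):
    """Compare two designs using the cheap model with error bars (sketch).

    Falls back to the exact objective only when the bars overlap.
    Returns the preferred (lower-objective) design.
    """
    a, b = e_approx(t_i), e_approx(t_j)
    if abs(a - b) > 2 * err:   # bars [v-err, v+err] do not overlap
        return t_i if a < b else t_j
    return t_i if e_exact(t_i) < e_exact(t_j) else t_j
```

Clear-cut cases cost two cheap evaluations; only overlapping cases incur the two exact evaluations.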

A second refinement involves the use of error estimates to choose convergence criteria. Numeric optimization codes often test for convergence by comparing the evaluations of successive iterates. For example, one convergence criterion available in CFSQP tests whether the absolute change in the design quality is below a user-specified threshold. In the context of multilevel modelling, error estimates provide a basis for choosing such a threshold. For example, in strategy MLM1, when the first stage of optimization (using the fixed approximation ê(x)) calls a numeric optimization code, the average error of ê(x) would be a natural choice for the convergence threshold. Likewise, in strategy MLM2, when the inner optimization (using the recalibratable approximation ê(x,y)) calls a numeric optimization code, an estimate of the average error of ê(x,y) would be a natural choice for the convergence threshold. In practice, the actual choice of threshold should probably depend on both the estimated er-

ror and the costs of computing the approximate objective functions. For example, in the yacht domain, the algebraic formulae for effective draft and wave resistance are much faster than the PMARC and SLAW CFD codes. Therefore, relatively little savings in CPU time results from early termination of an initial optimization (or inner optimization) when using these algebraic formulae. In such cases, optimizations using approximate objective functions can be run to convergence levels beyond the error estimates with little sacrifice of computational resources.

3.6. Generalizing to an arbitrary number of levels

The strategies MLM1, MLM2, and MLM3 in Figure 4 can all be generalized to use an arbitrary number of models, each at a different level of approximation. For example, strategy MLM1 can be generalized from two levels to N levels by replacing the sequence of two optimization steps with a sequence of N optimization steps. Likewise, strategy MLM2 can be made to operate with N levels by replacing the InnerOptimization step with a version of MLM2 that operates with N − 1 levels. The generalization of strategy MLM3 is more complicated. One approach would optimize some number n of seed designs {s1, ..., sn} at each level to obtain n "optimal" points {t1, ..., tn}. A fraction of the optimal points would be passed down as seeds for the next level. At the lowest level, the strategy returns the result that is best according to the "exact" model.

Such multilevel strategies may or may not be useful inpractice. Suppose the overall cost of a two-level optimiza-tion is dominated by the computations that take place at thebottom level. The addition of a third level, at the top of thehierarchy, may speed up the processing that takes place atthe middle level; however, it will do little to speed up theprocessing that takes place at the bottom level. In contrast,suppose that the overall cost of a two-level optimization isdominated by the computations that take place at the toplevel. The addition of a third level, at the top of the hierar-chy, may well speed up the overall solution process.

4. THEORETICAL ANALYSIS

4.1. Analysis of the simple two-level strategy

The simple two-level optimization strategy MLM1 uses the fixed approximation ê(x) to get close to the optimum. It uses the exact objective function e(x) to move the remaining distance to the optimum. In this section, we investigate the question of when MLM1 will run faster than a strategy that simply uses e(x) in a single phase of optimization. Strategy MLM1 will require fewer evaluations of e(x) if two conditions are met. 1) The first phase of MLM1 generates an intermediate design s_temp that is closer than the seed s_old to the true optimum, that is, |s_new − s_temp| < |s_new − s_old|. 2) The number of evaluations of e(x) needed by the numeric optimization code decreases with the distance from the seed


to the optimum. The performance of MLM1 will depend in part on the order of convergence (Dahlquist & Björck, 1974) of the optimization method. Consider what happens when the method has linear convergence. Let ε_i be the distance of the ith iterate from the optimum.

ε_{i+1} ≈ k·ε_i

ε_n ≈ k^n·ε_0

n ≈ log_k(ε_n/ε_0).

The cost of carrying out an optimization to achieve a desired error of ε_n grows with the logarithm of the error ε_0 of the starting point. Suppose the initial seed s_old is a distance α from the optimum. Suppose further that the first stage of optimization results in a design s_temp at a distance β from the optimum. If we assume that the computational cost of the first stage is negligible, then the relative savings S obtained by using strategy MLM1 is given by

S ≈ [n(α) − n(β)] / n(α) = 1 − log_k(ε_n/β) / log_k(ε_n/α).

For example, to achieve a factor of two speedup, the first stage of strategy MLM1 must move to a point s_temp whose error is the geometric mean of the error of the initial point s_old and the desired error of the final design s_new, that is, β ≈ √(α·ε_n). This puts a rather stringent requirement on the fixed approximation ê(x). Now consider what happens when the optimization method has quadratic convergence:
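The factor-of-two claim can be checked numerically; savings_linear is an illustrative helper implementing the relative-savings ratio S = [n(α) − n(β)]/n(α) with n(d) = log_k(ε_n/d):

```python
import math

def savings_linear(alpha, beta, eps_n, k=0.5):
    """Relative savings S of MLM1 under linear convergence (sketch)."""
    n = lambda d: math.log(eps_n / d, k)  # iterations needed from distance d
    return (n(alpha) - n(beta)) / n(alpha)

# Moving to the geometric mean beta = sqrt(alpha * eps_n)
# halves the remaining work, i.e., S = 0.5.
alpha, eps_n = 1.0, 1e-8
beta = math.sqrt(alpha * eps_n)
```

The convergence rate k cancels out of the ratio, so the geometric-mean rule holds regardless of k.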

ε_{i+1} ≈ k·ε_i²

ε_n ≈ (1/k)·(k·ε_0)^(2^n)

n ≈ log₂[log(k·ε_n) / log(k·ε_0)].

S ≈ [n(α) − n(β)] / n(α) = (log₂[log(k·ε_n)/log(k·α)] − log₂[log(k·ε_n)/log(k·β)]) / log₂[log(k·ε_n)/log(k·α)].

For example, to achieve a factor of two speedup, if we take k = 1, the first stage of strategy MLM1 must move to a point s_temp whose log-error is the geometric mean of the log-error of the initial point s_old and the desired log-error of the final design s_new, that is, log(β) ≈ √(log(α)·log(ε_n)). This puts a somewhat less stringent requirement on the fixed approximation.

4.2. Analysis of the recalibrating strategy

Strategy MLM2 uses the recalibratable objective function ê(x,y) in the inner optimization. It uses the exact objective function e(x), or portions of e(x), only during calibration, or recalibration, in between successive inner optimizations. This section investigates several questions about the behavior of strategy MLM2: 1) Is an optimum found by MLM2 always an optimum of the exact objective function? 2) Is MLM2 guaranteed to converge? 3) Does MLM2 run faster than a strategy that simply uses e(x) in a single phase of optimization? We investigate these questions by considering an idealized version of strategy MLM2. The idealization is based on an assumption that the LineSearch routine is explicitly turned off (DoLineSearch? = False) or else that the LineSearch routine operates as a null operation, that is, it always returns its second argument, s_new, the local optimum of ê_s(x) found by the InnerOptimization routine. Later in this section, we discuss the degree to which our analysis applies when we relax this assumption.

4.2.1. Analysis of the idealized strategy

We begin with two definitions.¹ The first definition concerns how closely ê(x,y) approximates e(x). The second concerns properties of ê(x,y) itself. These definitions characterize conditions under which we can prove correctness and convergence of the idealized version of strategy MLM2.

DEFINITION 1. A function ê: ℝⁿ × ℝⁿ → ℝ is a locally calibratable approximation scheme of order p for a function e: ℝⁿ → ℝ if e(x) and ê(x,y) are continuous and p times differentiable in x, and (∀x,y ∈ ℝⁿ)(∀i ∈ [0...p]), x = y implies that ∂^i ê(x,y)/∂x^i = ∂^i e(x)/∂x^i. ∎

This definition can be understood in terms of the graph shown in Figure 9. If an approximation scheme ê(x,y) has order zero, the value of ê(x,y) will fit the value of e(x) in the subspace defined by y = x. If an approximation scheme ê(x,y) has order one, the value and gradient of ê(x,y) (with respect to x) will fit the value and gradient of e(x) in the subspace defined by y = x.

DEFINITION 2. A locally calibratable approximation scheme ê: ℝⁿ × ℝⁿ → ℝ is admissible for unconstrained minimization from a seed in [l,u] ⊂ ℝⁿ if

¹ A closed hyperrectangle with opposite corners l and u is denoted by [l,u]. An open hyperrectangle is denoted by (l,u). The gradient of a multivariate scalar function f(x) is a column vector denoted by ∂f/∂x. The Hessian of f(x) is a matrix denoted by ∂²f(x)/∂x². The mixed derivative of f(x,y) is a matrix [a_ij] with n (length of x) rows and m (length of y) columns, where a_ij = ∂²f/∂y_j∂x_i, and is denoted by ∂²f(x,y)/∂y∂x. The norm of a matrix A is denoted by |A|.


1. the function ê(x,y) is continuous and twice differentiable in x and y on [l,u] × [l,u].

2. (∀s ∈ [l,u]), ∂²ê(x,y)/∂x² is positive definite whenever x = y = s.

3. (∃c)(∀x,y ∈ [l,u]), |(∂²ê(x,y)/∂x²)⁻¹ · (∂²ê(x,y)/∂y∂x)| ≤ c < 1.

4. (∀s ∈ [l,u])(∃t ∈ (l,u)) such that t is a local minimum of ê_s(x). ∎

In the context of an admissible approximation scheme ê(x,y), we may speak of a function I: [l,u] → (l,u) such that I(s) is the unique local minimum of ê_s(x) in (l,u). The existence of such a local minimum is guaranteed by Condition 4. Its uniqueness is guaranteed by the following argument: The admissibility of ê(x,y) implies that the Hessian ∂²ê(x,y)/∂x² is positive definite on the diagonal x = y and varies continuously throughout [l,u] × [l,u]. If there were some point (x0,y0) in [l,u] × [l,u] at which the Hessian were not positive definite, then necessarily there would be some point on the line segment from (x0,x0) to (x0,y0) at which the Hessian would be singular, which would violate Condition 3 (whose bound requires the Hessian to be invertible). Therefore, the Hessian ∂²ê(x,y)/∂x² is positive definite throughout [l,u] × [l,u]. Because ∂²ê(x,s)/∂x² = ∂²ê_s(x)/∂x², the uniqueness of the local minimum of ê_s(x) is guaranteed. (This argument was taken from Kachiyan, 1997.) We may think of the function I(s) as representing the InnerOptimization routine in strategy MLM2. We may define the idealized version of strategy MLM2 to be an algorithm that accepts a seed s0 as input and repeats the iteration s_{i+1} = I(s_i) until reaching a fixed point s_f = I(s_f). Thus, we can formulate and prove theorems about the idealized procedure MLM2 in terms of the behavior of the function I(s).

THEOREM 1. Let ê(x,y) be a locally calibratable approximation scheme of order p ≥ 1 for e(x) that is admissible for unconstrained minimization from a seed in [l,u] ⊂ ℝⁿ. Let e(x) be convex on [l,u]. Then, for any s ∈ (l,u), s is a local minimum of e(x) if and only if s = I(s). ∎

Proof: Suppose s ∈ (l,u) is a local minimum of e(x). Then the derivative ∂e(x)/∂x = 0 when x = s because e(x) is continuous and differentiable. Then ∂ê(x,y)/∂x = 0 when x = y = s because ê(x,y) is a locally calibratable approximation scheme of order p ≥ 1. Then the function ê_s(x) has zero derivative at x = s. Also, ∂²ê(x,y)/∂x² is positive definite when x = y = s because ê(x,y) is admissible for minimization from a seed in [l,u]. Because ∂²ê(x,s)/∂x² = ∂²ê_s(x)/∂x², it follows that s is a local minimum of ê_s(x). Furthermore, the minimum s is unique in (l,u) since ê(x,y) is admissible for minimization from a seed in [l,u]. Therefore, s = I(s). Now suppose that s ∈ (l,u) and s = I(s). Then s is a local minimum of ê_s(x). Then the function ê_s(x) has zero derivative at x = s because ê(x,y) is continuous and differentiable. Then ∂ê(x,y)/∂x = 0 when x = y = s. Then ∂e(x)/∂x = 0 when x = s because ê(x,y) is a locally calibratable approximation scheme of order p ≥ 1. Because e(x) is convex on [l,u], it follows that s is a local minimum of e(x). ∎

When the conditions of Theorem 1 are met, every fixed point s ∈ (l,u) of I(s) is a local optimum of e(x). Likewise, any local optimum s ∈ (l,u) of e(x) is a fixed point of I(s). Because fixed-point iteration of I(s) corresponds to our idealized version of strategy MLM2, these conditions tell us when a design returned by MLM2 is guaranteed to be a local optimum for the original problem, assuming the inner optimization operates in an ideal fashion. Among other things, the conditions require that ê(x,y) fit the gradient ∇e(x) exactly at all possible calibration points x = y for x,y ∈ [l,u].

Consider how this theorem can be applied to the calibratable approximations described in Figure 8. Functions ê1(x,y), ê3(x,y), and ê5(x,y) are locally calibratable approximation schemes of order zero, that is, they fit the objective function e(x) at the calibration point. They are not guaranteed to fit the gradient ∇e(x) at the calibration point. They may fit the gradient by fortuitous accident. For example, if it happens that ∇F̂i(x) = ∇Fi(x) for all x, then ê3(x,y) will fit the gradient of e(x) at the calibration point. Likewise, if ∇F̂i(x)·[Fi(x)/F̂i(x)] = ∇Fi(x) for all x, then ê5(x,y) will fit the gradient of e(x) at the calibration point. In any case, these three functions should not normally be expected to result in exactly correct solutions when used in strategy MLM2. On the other hand, functions ê2(x,y), ê4(x,y), and ê6(x,y) are locally calibratable approximation schemes of order one, that is, they fit both the objective function e(x) and the gradient ∇e(x) at the calibration point. Assuming these functions satisfy the other admissibility conditions, they should return exactly correct solutions when used in strategy MLM2, subject to ordinary numerical errors.

THEOREM 2. Let ê(x,y) be a locally calibratable approximation scheme for e(x) that is admissible for unconstrained minimization from a seed in [l,u] ⊂ ℝⁿ. Then the function I(s) has a unique fixed point s_f ∈ (l,u), and (∀s0 ∈ [l,u]), the series (s0, s1, ..., s_i, ...) generated by the rule s_{i+1} = I(s_i) converges to s_f. ∎

Proof: By the fixed-point theorem (Conte & de Boor, 1980), there exists a fixed point s_f ∈ (l,u) of the function I(s) such that s_f is unique in (l,u) and s_f is the limit of the series (s0, s1, ..., s_i, ...) defined by s_{i+1} = I(s_i), provided three conditions are met: 1) s0 ∈ [l,u]; 2) I(s) maps [l,u] into itself; and 3) I(s) is a contraction mapping, that is, for all s1, s2 ∈ [l,u], |I(s1) − I(s2)| ≤ c|s1 − s2| for some constant c < 1. The first condition is true by assumption. The second condition is guaranteed by the admissibility of ê(x,y). The third may be demonstrated by showing that the Jacobian of I(s) exists and satisfies the relation |∂I(s)/∂s| ≤ c for some constant c < 1 for all s ∈ [l,u]. I(s) is the unique value of x in [l,u] that solves the equation γ(x,s) = 0, where γ(x,y) = ∂ê(x,y)/∂x. By the implicit function theorem (Rudin, 1964), the Jacobian of I(s) exists and is given by ∂I(s)/∂s = −(∂γ/∂x)⁻¹·(∂γ/∂y) = −[∂²ê(x,y)/∂x²]⁻¹·[∂²ê(x,y)/∂y∂x] when


x = I(s) and y = s. The admissibility of ê(x,y) guarantees that the norm of this product is less than one for all x,y ∈ [l,u]. Therefore, I(s) is a contraction mapping, satisfying the third condition of the fixed-point theorem. ∎

Our second theorem asserts conditions under which the idealized version of strategy MLM2 is guaranteed to converge. Among other things, the norm of the product of the inverse of the unmixed second derivative (in x) of ê(x,y) and the mixed second derivative of ê(x,y) must be less than one. As an illustration of this condition, consider the example of an objective function that is the sum of two terms, f(x) and g(x). Suppose that g(x) is relatively expensive to compute. One might choose an approximation scheme that fits a linear function to g(x) at each calibration point:

e(x) = f(x) + g(x)

ê(x,y) = f(x) + g(y) + g′(y)·(x − y)

∂²ê(x,y)/∂x² = f″(x)

∂²ê(x,y)/∂y∂x = g″(y)

In this example, the admissibility condition on the derivatives of ê(x,y) reduces to the following: (∀x,y ∈ [l,u]) |g″(y)| < |f″(x)|. The term g(x), which is subjected to the linear approximation, must be closer to linear than the term f(x), which is left alone.
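A numerical check of this condition, with the illustrative choices f(x) = x² and g(x) = 0.1x³ (so |g″(y)| = 0.6|y| < 2 = |f″(x)| on [−1, 1]): the inner optimization has a closed form, and fixed-point iteration contracts to the true optimum x* = 0.

```python
def inner_opt(s):
    """Minimize e_s(x) = f(x) + g(s) + g'(s)*(x - s) in closed form,
    with f(x) = x**2 and g(x) = 0.1 * x**3 (illustrative choices).
    Stationarity: f'(x) + g'(s) = 2*x + 0.3*s**2 = 0.
    """
    return -0.3 * s ** 2 / 2.0

# Because |g''| < |f''| on [-1, 1], I(s) is a contraction there, and
# repeated recalibration converges to the optimum of e(x) = f(x) + g(x).
s = 1.0
for _ in range(20):
    s = inner_opt(s)
```

Each recalibration roughly squares the distance to the optimum here, illustrating how quickly the idealized MLM2 iteration can settle once the contraction condition holds.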

As a second illustration of the convergence condition for strategy MLM2, consider how it can be applied to the calibratable approximations for the yacht domain (Figure 8). First, we rewrite ê(x,y) in the following form:

ê(x,y) = R[E(x), D(x,y)].

The function E(x) represents all of the hull processing models other than the one that computes effective draft. E(x) does not depend on the calibration parameter y, that is, it is not subject to approximation. Using the chain rule twice, we derive formulae for the derivatives appearing in the convergence condition:

∂²ê(x,y)/∂y∂x = (∂R/∂D)·(∂²D/∂y∂x) + (∂²R/∂D²)·(∂D/∂x)·(∂D/∂y) + (∂²R/∂E∂D)·(∂E/∂x)·(∂D/∂y)

∂²ê(x,y)/∂x² = (∂R/∂E)·(∂²E/∂x²) + (∂R/∂D)·(∂²D/∂x²) + (∂²R/∂E²)·(∂E/∂x)² + (∂²R/∂D²)·(∂D/∂x)² + 2·(∂²R/∂E∂D)·(∂E/∂x)·(∂D/∂x)

We can simplify these formulae if we assume that essentially all of the nonlinearity in e(x,y) appears in E(x) and D(x,y), the hull processing models. In this case, all of the second derivatives involving R vanish:

∂²e(x,y)/∂y∂x = (∂R/∂D)(∂²D/∂y∂x)

∂²e(x,y)/∂x² = (∂R/∂E)(∂²E/∂x²) + (∂R/∂D)(∂²D/∂x²)

If we further assume that e(x,y) satisfies the second admissibility condition [i.e., ∂²e(x,y)/∂x² is strictly positive definite], then the convergence condition will be satisfied if ∂²D/∂y∂x is zero. Now consider what happens when D(x,y) is implemented by each of the six locally calibratable internal approximations F_i1, ..., F_i6 in Figure 8. Notice that the mixed derivative ∂²D/∂y∂x is zero in the cases of F_i1 and F_i3. Likewise, the mixed derivative ∂²D/∂y∂x is zero in the cases of F_i2 and F_i4, provided (respectively) that F̃_i(x) is linear and that F_i(x) − F̃_i(x) is linear. In the case of F_i5, it suffices that F_i(x)/F̃_i(x) be constant. The corresponding condition in the case of F_i6 is more complicated. Our analysis thus indicates that key factors governing convergence include the relative distribution of nonlinearity in different parts of the objective function and the degree of nonlinearity in the error of the internal approximation.
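The role of the mixed derivative ∂²D/∂y∂x can be illustrated with two plausible calibration forms. The sketch below is hypothetical: the exact schemes F_i1, ..., F_i6 are defined in Figure 8 (not reproduced here), and F and Ft below are made-up stand-ins for the expensive internal quantity and its fixed approximation. An additive constant-error calibration has a mixed derivative that vanishes identically, whereas a multiplicative calibration does not unless F/Ft is constant.

```python
import math

# Hypothetical stand-ins: F plays the expensive internal quantity (e.g.,
# the role PMARC plays in the yacht domain), Ft the fixed cheap formula.
F  = lambda x: math.exp(x)
Ft = lambda x: 1.0 + x

def D_additive(x, y):
    # Additive constant-error calibration: the y-dependent correction does
    # not involve x, so d2D/dydx vanishes identically; exact at x = y.
    return Ft(x) + (F(y) - Ft(y))

def D_multiplicative(x, y):
    # Multiplicative calibration: the mixed derivative vanishes only when
    # F/Ft is constant, which fails for these stand-ins.
    return Ft(x) * F(y) / Ft(y)

def mixed_derivative(D, x, y, h=1e-5):
    # Central finite-difference estimate of d2D/dydx.
    return (D(x + h, y + h) - D(x + h, y) - D(x, y + h) + D(x, y)) / h ** 2
```

Evaluating both estimates at an arbitrary (x, y) shows the additive form's mixed derivative is exactly zero while the multiplicative form's is not.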

4.2.2. Extending the correctness and convergence theorems

Our analysis so far has applied only to the idealized version of strategy MLM2, in which the LineSearch routine is either disabled or operates as a null operation. We can extend our analysis to include the line search in the following way. Define the function L(s_old, s_new) with the following behavior. Consider the series of points (t_0, t_1, ..., t_i, ...), where t_0 = s_new and t_{i+1} = (s_old + t_i)/2, for all i ≥ 0. Let L(s_old, s_new) be t_i, where i is the smallest integer such that e(t_i) is better than e(s_old) and |t_i − s_old| > δ, if such an integer exists. Let L(s_old, s_new) be s_old otherwise. We can now analyze the behavior of strategy MLM2 by considering the series of points generated by the composition of the InnerOptimization and LineSearch routines: C(s) = L[s, I(s)].
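The halving line search L(s_old, s_new) defined above can be sketched directly. This is an illustrative implementation, not the paper's code: "better" is taken to mean a lower objective value, and a finite cutoff on the number of halvings is added for practicality.

```python
def line_search(e, s_old, s_new, delta=1e-6, max_halvings=50):
    """Sketch of the halving line search L(s_old, s_new) described above.
    Returns the first point t_i that improves on e(s_old) and lies more
    than delta away from s_old; returns s_old if no such point is found."""
    t = s_new                               # t_0 = s_new
    for _ in range(max_halvings):
        if e(t) < e(s_old) and abs(t - s_old) > delta:
            return t                        # first sufficiently distant improvement
        t = (s_old + t) / 2.0               # t_{i+1} = (s_old + t_i) / 2
    return s_old                            # no acceptable point found
```

For example, with e(s) = (s − 1)², the call line_search(e, 3.0, 0.5) accepts 0.5 immediately, while line_search(e, 1.0, 5.0) returns s_old = 1.0 because no candidate beats the optimum.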

Under the assumptions of our original correctness and convergence theorems, it is possible to prove that, for all s_0 in [l,u], the series (s_0, s_1, ..., s_i, ...) generated by the rule s_{i+1} = C(s_i) converges to a limit s_l. Notice first that s_{i+1} = (1 − h_i)s_i + h_i I(s_i), where each h_i lies in the interval [0,1]. Now consider two cases. 1) Suppose Σ_{i=0}^∞ h_i = ∞. First, we define r_i = |s_i − s_f|, where s_f is the unique fixed point of I(s) in [l,u], from our original correctness proof. We can then show that r_{i+1} ≤ a_i r_i, where a_i = 1 − (1 − c)h_i, by using simple algebra, along with the fact that s_f = I(s_f) and the fact that |I(s_i) − I(s_f)| ≤ c|s_i − s_f| for the constant c < 1, from our original convergence proof. We can then show that Π_{i=0}^∞ a_i = 0 by taking the logarithm of this product and using the fact that ln(1 − x) ≤ −x. It follows that the series (r_0, r_1, ..., r_i, ...) converges to zero, and the series (s_0, s_1, ..., s_i, ...) converges to the limit s_l = s_f. 2) Suppose Σ_{i=0}^∞ h_i < ∞. We can apply the Cauchy convergence criterion (Rudin, 1964) to this summation: for every ε > 0, there is some integer N such that Σ_{i=m+1}^n h_i < ε whenever n > m > N. Notice also that |s_{i+1} − s_i| = h_i|I(s_i) − s_i| ≤ h_i|u − l|. Using the triangle inequality, it follows that |s_n − s_m| ≤ (Σ_{i=m+1}^n h_i)|u − l|. Thus, for every ε > 0, there is some integer N such that |s_n − s_m| < ε whenever n > m > N. Applying the Cauchy criterion again, it follows that the series (s_0, s_1, ..., s_i, ...) converges to a limit s_l. (This proof was taken from Kachiyan, 1997.)

Under the assumptions of our original correctness and convergence theorems, it is also possible to prove that a point s in [l,u] is a local optimum of e(x) if and only if s is a fixed point of C(s), provided the threshold δ used in the definition of L(s_old, s_new) is set to zero. Unfortunately, the function C(s) may not be continuous. Consequently, the limit s_l of the series (s_0, s_1, ..., s_i, ...) generated by the rule s_{i+1} = C(s_i) may not be a fixed point of C(s) and thus not a local optimum of e(x). Although the composition of the InnerOptimization and LineSearch routines is guaranteed to converge to a limit, the result may not be a correct solution to the original optimization problem. We can overcome this limitation by using a more sophisticated line search algorithm, one that satisfies the Goldstein-Armijo conditions (Dennis & Schnabel, 1983). Unfortunately, a more sophisticated line search would likely result in greater use of the exact objective function e(x) and correspondingly greater computational expense.

Our analysis does allow us to make the following guarantee about the correctness of strategy MLM2, even when using our original line search routine. After the procedure terminates at a point s_l, we determine whether the condition Converged?(s_l, InnerOptimization(s_l)) is satisfied. [Because MLM2 uses Converged?(s_l, LineSearch(s_l, InnerOptimization(s_l))) as its convergence test, this condition may or may not be satisfied.] When the condition is satisfied, we have some level of assurance that s_l is close to a fixed point of the InnerOptimization function. The level of assurance depends on the tolerance ε used in the Converged? test. According to our original correctness theorem, any fixed point of InnerOptimization is also a local optimum of e(x), the exact objective function. Therefore, we have the same level of assurance that s_l is close to a correct solution to the original optimization problem.

4.2.3. Practical applicability of the theoretical analysis

For many applications, it will be difficult or impossible to establish the conditions of our theorems about the correctness and convergence of strategy MLM2. For example, in the yacht domain, the objective function is sufficiently complicated that one cannot analytically verify the conditions of the theorems. The conditions may be checked empirically by sampling points in the design space; however, sampling would incur considerable computational cost and still would not provide any absolute guarantees. For these reasons, we expect our theorems to have at most limited value for a priori prediction of correctness and convergence. The theorems may nevertheless be useful in other ways. For example, suppose one uses strategy MLM2 to solve a problem, and the algorithm does not appear to converge. Our convergence theorem suggests ways to overcome this difficulty. In particular, one might help the algorithm to converge by switching to an approximation scheme that is more likely to satisfy the condition on the mixed derivative. On the other hand, suppose the algorithm does converge, but subsequent evaluations of the objective function in the neighborhood of the solution demonstrate that the algorithm did not find a local optimum. Our correctness theorem suggests ways to overcome this difficulty. In particular, one may cure a problem with correctness by switching from an approximation scheme of order zero to an approximation scheme of order one.

4.2.4. Performance of the recalibrating strategy

The overall performance of MLM2 will depend on a variety of issues: the accuracy of e(x,y) at locations removed from calibration points, the computational cost of e(x,y) in comparison with the cost of e(x), and the computational cost of recalibrating e(x,y). Little can be said in general about these tradeoffs. Nevertheless, the asymptotic behavior of MLM2 can be characterized more precisely. In particular, the order of convergence (Dahlquist & Björck, 1974) of MLM2 is equal to the order of the lowest-order derivative of I(s) with a nonzero norm at the optimum [assuming that suitably many derivatives of I(s) are defined and continuous]. Thus, if |dI(s)/ds| > 0 at the optimum, then MLM2 will have linear convergence. It will therefore be asymptotically inferior to a single-stage strategy that uses a Newton or quasi-Newton method in combination with the exact objective function. There will be some sufficiently low error tolerance that can be achieved faster by the single-stage strategy. On the other hand, if |dI(s)/ds| = 0 at the optimum, MLM2 will have quadratic or better convergence and will be asymptotically as good as or better than the default strategy. Of course, the desired error tolerance may not be one that leads the algorithms into regions of behavior where asymptotic comparisons are relevant. Asymptotic comparisons therefore have limited value in practice.
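The distinction between linear and quadratic convergence of a fixed-point iteration s_{i+1} = I(s_i) can be seen in a toy example (the functions below are illustrative stand-ins, not the paper's I(s)): when |I'| is nonzero at the fixed point the error shrinks by a constant factor per step, and when I' vanishes there the error is roughly squared each step.

```python
# Toy illustration of convergence order for s_{i+1} = I(s_i) near s* = 0.
I_linear    = lambda s: 0.5 * s    # |I'(0)| = 0.5 > 0 -> linear convergence
I_quadratic = lambda s: s * s      # I'(0) = 0         -> quadratic convergence

def error_sequence(I, s, n):
    out = []
    for _ in range(n):
        s = I(s)
        out.append(abs(s))         # distance from the fixed point s* = 0
    return out

lin  = error_sequence(I_linear, 0.5, 4)     # halves each step: 0.25, 0.125, ...
quad = error_sequence(I_quadratic, 0.5, 4)  # squares each step: 0.25, 0.0625, ...
```

The linear sequence gains a fixed number of digits per step; the quadratic one roughly doubles its digits per step, which is why a quasi-Newton single-stage strategy eventually wins at tight tolerances.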

4.2.5. Extending the analysis to constrained optimization problems

We observe in passing one more way in which these analytic results may be generalized. The theorems are limited to problems of unconstrained optimization. They could be extended to constrained optimization in two ways: 1) recalibratable approximation of the objective function and 2) recalibratable approximation of the constraints. Our proofs of convergence and correctness might be generalized to handle such approximation methods for constrained optimization problems by using the Karush-Kuhn-Tucker conditions (Peressini et al., 1988) to characterize constrained optima of e(x) and e(x,y); however, we have not yet attempted to generalize our results in this fashion.

5. EXPERIMENTAL RESULTS

5.1. Implementation of multimodel optimization strategies

We have implemented our multimodel optimization strategies as part of the Design and Modelling/Simulation Associate (DA-MSA) (Ellman et al., 1992, 1995, 1997). The DA-MSA is a system that supports interactive construction of numerical models for simulation of engineering design artifacts. It also supports construction of design optimization strategies that use simulation models to solve design problems. Simulation models and optimization strategies are represented visually as second-order dataflow graphs. Nodes represent functions such as root finding, integration, and optimization. Arcs represent flow of data and control. The dataflow graphs are actually implemented as executable LISP functions that wrap numerical C routines (Press et al., 1986). The DA-MSA also implements compositional modelling in the following way: it maintains a library of functions that may be used to define simulation models and optimization strategies. The library is organized in the manner indicated in Figure 6. Each function in the system is (potentially) implemented in several different versions. Each version embodies a different approximation of the function.

The user begins a session with the DA-MSA by hand coding an initial simulation model and optimization strategy. The initial model includes the most accurate version of each function. The initial strategy includes only a single stage of optimization. The user subsequently modifies the initial model and strategy by using a catalog of transformations. Each transformation replaces one dataflow graph (or subgraph) with a new one. Transformations implement a variety of changes, such as substituting one version of a function for another, approximating functions, and introducing multiple stages of optimization. For example, the system includes transformations that replace a single-stage strategy with multistage strategy MLM1, MLM2, or MLM3. It also includes transformations that construct each of the locally calibratable objective functions e1(x,y),...,e6(x,y) (Figure 8). A complete description of the DA-MSA transformation system is beyond the scope of this paper. For additional information, see Ellman et al. (1995, 1997).

5.2. Setup of experiments in the yacht domain

Our experiments in the yacht domain were intended to address the following questions: 1) How do the multimodel strategies compare with two alternative strategies: a) optimization using only the "exact" objective function and b) optimization using only an approximate objective function? 2) How do the multimodel strategies MLM1, MLM2, and MLM3 compare with each other? 3) How do the calibratable approximation schemes e1(x,y),...,e6(x,y) compare with each other?

We set up our experiments by using the DA-MSA to construct several versions of strategies MLM1, MLM2, and MLM3, each instantiated in the domain of sailing yacht design. We chose the effective draft computation as a focus for experimenting with approximations in the yacht domain. The reasons for this choice are as follows: the most expensive parts of an "exact" course time computation are the PMARC (effective draft) and SLAW (wave resistance) CFD codes; however, for the class of sailing yachts that includes the "Stars and Stripes '87," the curves fitted to tank data give answers nearly as accurate as the SLAW code, but at much lower cost. Yachts in this class can be designed equally well by using SLAW or the fitted wave resistance curves. In contrast, the algebraic approximation of effective draft is not nearly as accurate as PMARC. Yachts in this class cannot be designed properly by using the algebraic approximation alone. Considerations of cost effectiveness thus led us to focus on multimodel strategies involving PMARC and the algebraic approximation of effective draft. When we constructed the fixed approximation e(x) and the locally calibratable approximations e1(x,y),...,e6(x,y), we let the PMARC effective draft model play the role of F_i(x) (the internal quantity to be approximated), and we let the algebraic formula for effective draft play the role of F̃_i(x) (the fixed internal approximation). We used forward differencing to implement the gradient computations required in approximation schemes e2(x,y), e4(x,y), and e6(x,y).

We conducted two groups of experiments. The first group was carried out by using an interpolated version of the function D(x) that computes effective draft with PMARC. The interpolated version of D(x) was constructed by doing 3^5 = 243 PMARC runs to generate a table mapping design parameters to effective draft. The table has one dimension for each of the five yacht design parameters: Length, Beam, Draft, KeelHeight, and WingletSpan. The table was then used to construct a cubic-spline interpolation function that computes effective draft for arbitrary values of these parameters. The interpolated version of D(x) was needed to experiment with strategies that use random seeds to initialize optimization. This version allowed us to conduct a relatively large number of experimental optimization runs and to average the results. The table of PMARC evaluations is necessary only as a part of our experimental apparatus. The expensive fitting process itself does not occur in any of the optimization strategies our experiments are designed to evaluate. The second group of experiments was carried out by using PMARC itself under the control of various design optimization strategies. In both groups of experiments, the necessary PMARC runs were set up by using a fully automatic panelization system (Yao & Gelsey, 1996). The panelization program was run on a Sun Microsystems SPARCstation 2. PMARC itself was run on (one processor of) a three-processor DEC Alpha 2100. Each PMARC run required approximately 1 h of total CPU time to do the panelization and run the flow code.
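The tabulation-and-interpolation setup can be sketched as follows. This is a simplified stand-in, not the experimental apparatus: effective_draft below is a made-up multilinear surrogate for a PMARC run, and multilinear (rather than cubic-spline) interpolation is used so that the sketch stays self-contained with three grid values per parameter.

```python
import itertools

# Hypothetical stand-in for one PMARC effective-draft evaluation.
def effective_draft(p):
    length, beam, draft, keel, winglet = p
    return draft + 0.1 * length * beam - 0.05 * keel * winglet

# Three grid values per design parameter, 3**5 = 243 table entries,
# mirroring the size of the experimental table.
axes = [[0.0, 0.5, 1.0]] * 5
table = {idx: effective_draft([axes[d][i] for d, i in enumerate(idx)])
         for idx in itertools.product(range(3), repeat=5)}

def interpolate(p):
    # Locate the grid cell containing p and blend its 2**5 corner values.
    lo, frac = [], []
    for d, v in enumerate(p):
        ax = axes[d]
        i = max(0, min(len(ax) - 2, sum(v >= a for a in ax[1:])))
        lo.append(i)
        frac.append((v - ax[i]) / (ax[i + 1] - ax[i]))
    total = 0.0
    for corner in itertools.product((0, 1), repeat=5):
        w = 1.0
        for d, c in enumerate(corner):
            w *= frac[d] if c else 1.0 - frac[d]
        total += w * table[tuple(lo[d] + c for d, c in enumerate(corner))]
    return total
```

Once the 243-entry table is built, interpolate is essentially free, which is what makes a large number of randomized optimization runs affordable.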


We used CFSQP to perform all of the underlying numerical optimizations in our experiments. CFSQP carried out both the first and second stages of optimization in each experiment with strategy MLM1. Likewise, CFSQP carried out the inner optimizations in each experiment with strategy MLM2. In addition, CFSQP was used to carry out all of the single-level optimizations, which we used as a baseline to measure the relative speedup obtained by our multilevel strategies. Although CFSQP is capable of solving constrained optimization problems, we used CFSQP as an unconstrained optimizer. As a formality, the CFSQP code requires us to supply upper and lower bounds on each design parameter; however, we have observed experimentally that the bounds do not actively constrain the solutions we find.

5.3. Results from interpolated PMARC

A portion of our results from experiments using the interpolated version of PMARC are found in Tables 1 and 2. These tables compare the performance of several strategies on the following two problems: design a yacht with minimum course time on an America's Cup race course sailing in a 10-kt wind and in a 20-kt wind.² The row labeled MLM1 refers to a strategy that uses two stages of optimization. Stage one uses the algebraic formula for effective draft. Stage two uses interpolated PMARC. The row labeled Pure-Algebraic refers to a single-stage strategy that uses the algebraic formula to compute effective draft. Likewise, the row labeled Pure-PMARC refers to a single-stage strategy that uses interpolated PMARC to compute effective draft. All strategies were run multiple times using 25 randomly selected seeds to initialize each run. For each strategy, we recorded the average number of PMARC evaluations per seed and the median value of the final course time. We also computed a statistic that estimates the performance of each strategy when used repeatedly with multiple seed designs. The statistic N(p,ε) represents the number of seed designs that each strategy would require to have a probability p of finding a design whose course time is within a fraction ε of the best design found for this problem. Using elementary probability theory,

N(p,ε) = log(1 − p) / log(1 − r(ε)),

where r(ε) is the probability that an optimization from a single seed results in a design within ε of the best design. The value of r(ε) for each strategy was obtained from a histogram of the final course times resulting from the 25 seeds used in the experiments. A missing entry for N(p,ε) indicates that r(ε) = 0 in the experimental data; that is, none of the seeds resulted in a design within ε of the best design. An estimate of the overall CPU time needed to come within ε of the best design results from multiplying N(p,ε) by the average number of PMARC evaluations per seed. This product is recorded in the last column of Tables 1 and 2.

Table 1. Comparison of strategies using interpolated PMARC (10-kt wind)

    Strategy          PMARC evals   Course time   Seed designs   PMARC evals
                      (per seed)    (median)      N(0.99,0.01)   (total)
    Pure-Algebraic        1.00        3.363           --             --
    Pure-PMARC           80.00        3.250           5.61         448.75
    MLM1                 24.52        3.247           2.86          70.16

Notice first that the Pure-Algebraic strategy failed to find a design within 1% of the best on any of the 25 trials on either test problem. (In the case of the Pure-Algebraic strategy, the table records the average final course time as measured by using interpolated PMARC and records the one interpolated PMARC evaluation conducted at the end of each optimization to evaluate the final design.) On the first test problem (10-kt wind), the Pure-PMARC strategy requires an average of 448.75 PMARC evaluations to have a 99% chance of coming within 1% of the best design. In contrast, the two-model, two-stage strategy MLM1 requires an average of 70.16 PMARC evaluations to achieve the same degree of reliability, that is, only about 15.63% of the CPU time, or a reduction factor of 6.40, in comparison with the Pure-PMARC strategy. On the second test problem (20-kt wind), the Pure-PMARC strategy requires an average of 632.52 PMARC evaluations to have a 99% chance of coming within 1% of the best design. In contrast, the two-model, two-stage strategy MLM1 requires an average of 191.95 PMARC evaluations to achieve the same degree of reliability, that is, only about 30.35% of the CPU time, or a reduction factor of 3.30, in comparison with the Pure-PMARC strategy.

Additional results from experiments using the interpolated version of PMARC are presented in Tables 3 and 4. These data were obtained by using the same experimental

² Our VPP program does not model tacking and therefore cannot immediately handle race courses that involve sailing directly upwind. Therefore, we modified the America's Cup course by replacing the upwind leg with one that serves to estimate the crew's tacking strategy.
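The statistic N(p,ε) is easy to compute directly. The sketch below also works backward from Table 1 to the implied per-seed success rate; the value r(0.01) = 14/25 = 0.56 for Pure-PMARC is an inference from the published N(0.99, 0.01) = 5.61, not a figure stated in the paper.

```python
import math

def n_seeds(p, r):
    """N(p, eps) from the formula above: number of independent seeds needed
    for probability p of at least one success, when a single seed succeeds
    with probability r = r(eps)."""
    return math.log(1 - p) / math.log(1 - r)

# Inferred, not stated in the paper: r(0.01) = 14/25 = 0.56 for Pure-PMARC
# reproduces Table 1's N(0.99, 0.01) ~ 5.61, and 5.61 seeds at 80.00
# evaluations per seed gives the 448.75-evaluation total.
```

For example, n_seeds(0.99, 0.56) evaluates to about 5.61, and multiplying by 80.00 evaluations per seed recovers the 448.75 total in Table 1.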

Table 2. Comparison of strategies using interpolated PMARC (20-kt wind)

    Strategy          PMARC evals   Course time   Seed designs   PMARC evals
                      (per seed)    (median)      N(0.99,0.01)   (total)
    Pure-Algebraic        1.00        2.402           --             --
    Pure-PMARC          100.88        2.363           6.27         632.52
    MLM1                 42.56        2.364           4.51         191.95


Table 3. Comparison of strategies using interpolated PMARC (10-kt wind)

    Strategy     PMARC evals   Inner opts   Course time   Seed designs   PMARC evals
                 (per seed)    (per seed)   (median)      N(0.99,0.01)   (total)
    MLM2: e1         8.64         2.40        3.570           --             --
    MLM2: e2        13.24         2.08        3.258           6.27          83.07
    MLM2: e3         2.60         2.32        3.253           4.51          11.72
    MLM2: e4        13.20         2.12        3.259           6.27          82.82
    MLM2: e5         2.72         2.44        3.257           4.51          12.26
    MLM2: e6        15.24         2.40        3.248           5.61          85.49

setup (America's Cup course, 10- and 20-kt winds, 25 seeds for each strategy). These tables compare the performance of various instantiations of MLM2, the recalibrating strategy. The rows labeled MLM2: e1,...,MLM2: e6 show the results of using strategy MLM2 instantiated with the corresponding one of the six locally calibratable approximation schemes e1(x,y),...,e6(x,y) to combine PMARC and the algebraic formula. Consider first how well each strategy performed in terms of design quality. Strategy MLM2: e1 failed to come within 1% of the best solution on all 25 trials on either test problem. All of the other strategies, MLM2: e2,...,MLM2: e6, came within 1% of the best on over half of the 25 trials on both test problems. Consider now how well each strategy performed in terms of CPU time, that is, the number of (interpolated) PMARC evaluations per seed. Strategies MLM2: e3 and MLM2: e5 were the best in this respect, each requiring about three PMARC evaluations per seed on the 10-kt-wind problem and about seven PMARC evaluations per seed on the 20-kt-wind problem. Finally, consider the number of PMARC evaluations needed by each strategy to have a 99% probability of getting within 1% of the best design. Using this measure, the best performance results from strategy MLM2: e3 (an average of 11.72 PMARC evaluations on the 10-kt-wind problem and 29.95 PMARC evaluations on the 20-kt-wind problem) and from strategy MLM2: e5 (an average of 12.26 PMARC evaluations on the 10-kt-wind problem and 36.42 PMARC evaluations on the 20-kt-wind problem). In comparison with the Pure-PMARC strategy, MLM2: e3 requires only 2.61% as much CPU time on the 10-kt-wind problem and 4.73% as much CPU time on the 20-kt-wind problem. Likewise, in comparison with the Pure-PMARC strategy, MLM2: e5 requires only 2.73% as much CPU time on the 10-kt-wind problem and 5.76% as much CPU time on the 20-kt-wind problem. Summarizing the results in Tables 3 and 4, we see that strategies MLM2: e2,...,MLM2: e6 require from 2.61% to 19.94% as much CPU time, or reduction factors ranging from 38.3 to 5.01, in comparison with the Pure-PMARC strategy.
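The CPU-time fractions quoted above follow directly from the table totals; a quick arithmetic check:

```python
# Recomputing the quoted CPU-time fractions from the table totals.
pure_pmarc = {'10kt': 448.75, '20kt': 632.52}   # Pure-PMARC totals (Tables 1-2)
e3_totals  = {'10kt': 11.72,  '20kt': 29.95}    # MLM2:e3 totals (Tables 3-4)
e5_totals  = {'10kt': 12.26,  '20kt': 36.42}    # MLM2:e5 totals (Tables 3-4)

frac_e3 = {w: e3_totals[w] / pure_pmarc[w] for w in pure_pmarc}
frac_e5 = {w: e5_totals[w] / pure_pmarc[w] for w in pure_pmarc}
# frac_e3 is about {'10kt': 0.0261, '20kt': 0.0473};
# frac_e5 is about {'10kt': 0.0273, '20kt': 0.0576}.
```

These ratios reproduce the 2.61%, 4.73%, 2.73%, and 5.76% figures in the text.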

The numbers of PMARC evaluations used by strategies MLM2: e1,...,MLM2: e6 require explanation (Tables 3 and 4). The approximation schemes e1, e3, and e5 each require one PMARC evaluation per calibration. The approximation schemes e2, e4, and e6 each require six PMARC evaluations per calibration (one plus the number of design parameters) to compute numerically the gradient of effective draft. If we multiply the average number of inner optimizations per seed by the number of PMARC evaluations needed per calibration, we get a result that is lower than the average number of PMARC evaluations per seed that were actually used. The difference is due to evaluations of PMARC that occurred during the LineSearch procedure of strategy MLM2. In some cases, line searches resulted in more PMARC evaluations than were needed for calibration.
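The "one plus the number of design parameters" evaluation count comes from forward differencing. A minimal sketch (f below is an arbitrary stand-in for the effective-draft computation):

```python
def forward_difference_gradient(f, x, h=1e-6):
    """Forward differencing as used for schemes e2, e4, and e6: one base
    evaluation plus one perturbed evaluation per design parameter, so
    n + 1 evaluations of f for n parameters."""
    count = [0]
    def counted(p):
        count[0] += 1
        return f(p)
    base = counted(x)
    grad = [(counted(x[:i] + [x[i] + h] + x[i + 1:]) - base) / h
            for i in range(len(x))]
    return grad, count[0]
```

With the five yacht design parameters, each calibration of e2, e4, or e6 therefore costs six evaluations of the expensive model.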

5.4. Results from real PMARC

Results from a set of experiments using PMARC itself are found in Tables 5 and 6. These data were generated

Table 4. Comparison of strategies using interpolated PMARC (20-kt wind)

    Strategy     PMARC evals   Inner opts   Course time   Seed designs   PMARC evals
                 (per seed)    (per seed)   (median)      N(0.99,0.01)   (total)
    MLM2: e1        12.12         3.72        2.640           --             --
    MLM2: e2        18.96         3.00        2.364           6.27         118.88
    MLM2: e3         6.64         3.80        2.363           4.51          29.95
    MLM2: e4        20.12         3.16        2.366           6.27         126.15
    MLM2: e5         7.24         4.36        2.364           5.03          36.42
    MLM2: e6        15.92         2.52        2.363           5.61          89.31


by using the same test problems. The rows labeled MLM3: e1,...,MLM3: e6 contain results of using strategy MLM3, that is, a two-stage strategy that optimizes N = 25 randomly generated seeds by using the fixed approximation e(x) and then optimizes the best result by using strategy MLM2. In each case, the first stage used the algebraic formula for effective draft. In each case, the second stage used one of the six locally calibratable approximation schemes e1(x,y),...,e6(x,y) to combine PMARC and the algebraic formula. The row labeled MLM3: Pure-PMARC contains results from a variation of MLM3. In this variation, the second stage (using MLM2) was replaced with a single optimization by using PMARC to compute effective draft. Notice that all of the strategies found designs within 1% of the best design on the 10-kt-wind problem and within 2% of the best design on the 20-kt-wind problem. Strategies MLM3: e3 and MLM3: e5 were the fastest on the 10-kt-wind problem, requiring only two PMARC evaluations, that is, 8.0% of the 25 PMARC evaluations used by strategy MLM3: Pure-PMARC. Strategy MLM3: e3 was the fastest on the 20-kt-wind problem, requiring only four PMARC evaluations, that is, 8.0% of the 50 PMARC evaluations used by strategy MLM3: Pure-PMARC. Notice also that strategy MLM3: Pure-PMARC did not return the best design on either test problem. We can explain this result in the following way. PMARC involves a great deal of numerical discretization, which introduces discontinuities and nonsmoothness into any objective function that calls PMARC. These pathologies apparently cause strategy MLM3: Pure-PMARC to stop at a false local optimum. In contrast, a locally calibrated approximation e_i(x,y) has relatively fewer pathologies because it does not call PMARC (after calibration). The relative absence of pathology apparently enables some of the recalibrating strategies to avoid getting stuck at false local optima.
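The two-stage shape of strategy MLM3 can be sketched as below. This is an illustrative skeleton, not the DA-MSA implementation: the local optimizer is a toy stand-in for CFSQP, the second stage simply re-optimizes against the exact objective in place of the recalibrating strategy MLM2, and the quadratic models are hypothetical.

```python
import random

def local_opt(f, s, step=0.5, iters=60):
    # Toy 1-D local optimizer standing in for CFSQP: pattern search
    # with step halving.
    x = float(s)
    for _ in range(iters):
        for d in (step, -step):
            while f(x + d) < f(x):
                x += d
        step /= 2.0
    return x

def mlm3(e_exact, e_fixed, seeds):
    """Sketch of strategy MLM3: stage one optimizes every seed under the
    cheap fixed approximation; stage two refines only the best stage-one
    result (here against the exact objective, standing in for MLM2)."""
    stage_one = [local_opt(e_fixed, s) for s in seeds]
    best = min(stage_one, key=e_fixed)
    return local_opt(e_exact, best)

# Toy models: the fixed approximation's optimum is slightly offset from
# the exact optimum at x = 1, mimicking a cheap-but-biased model.
e_exact = lambda x: (x - 1.0) ** 2
e_fixed = lambda x: (x - 1.2) ** 2
random.seed(0)
seeds = [random.uniform(-10.0, 10.0) for _ in range(25)]
```

Running mlm3(e_exact, e_fixed, seeds) funnels the 25 seeds through the cheap model and spends expensive evaluations only on refining the single best candidate.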

5.5. Evaluating the impact of recalibration

We conducted an additional set of experiments to evaluate the importance of recalibration of approximate objective functions. In particular, we modified the recalibrating strategies reported in Tables 3-6 so that each would terminate after a single calibration of the objective function and a single inner optimization. Results from running these modified strategies on the same test problems as described above are summarized in Tables 7 and 8 (interpolated PMARC) and Tables 9 and 10 (real PMARC). The rows labeled S correspond to the strategies that do a single calibration. The rows labeled M correspond to the strategies that perform multiple calibrations/recalibrations. (The M data were simply copied from Tables 3 and 4 and Tables 5 and 6.) First, consider the data obtained using interpolated PMARC (Tables 7 and 8). Notice that the recalibrating strategies found better designs than did the strategies that perform a single calibration, at the price of additional PMARC computations. An estimate of the number of PMARC evaluations needed to have a 99% probability of getting within 1% of the best design is found in the last column of Tables 7 and 8. According to this measure, the recalibrating strategies are comparable to the strategies that perform a single calibration, each requiring about 10-13 PMARC evaluations for the 10-kt-wind problem and about 30-38 PMARC evaluations for the 20-kt-wind problem. Now consider the data obtained by using real PMARC (Tables 9 and 10). On the 10-kt-wind problem, the same overall numbers of PMARC evaluations were used by the recalibrating strategies and the strategies performing a single calibration. In addition, the single-calibration strategies found designs that were identical to those of the corresponding recalibrating strategies, even though the recalibrating strategies carried out an additional inner optimization. This result is explained by the fact that in both cases the second inner optimization failed to make any improvement over the seed design and thus returned the seed as the solution. On the 20-kt-wind problem, recalibration made a difference. Both of the recalibrating strategies resulted in better designs than the strategies performing a single calibration, at the price of additional PMARC computations.

5.6. Conclusions to be drawn from experimental results

Our experiments support the following general conclusions about the performance of our multimodel strategies in the

Table 5. Performance of multimodel strategies using real PMARC (10-kt wind)

    Strategy             PMARC evals   Inner opts   Course time
    MLM3: Pure-PMARC         25            --          3.280
    MLM3: e1                 12             1          3.285
    MLM3: e2                 12             2          3.273
    MLM3: e3                  2             2          3.275
    MLM3: e4                  8             1          3.285
    MLM3: e5                  2             2          3.279
    MLM3: e6                  7             1          3.285

Table 6. Performance of multimodel strategies using real PMARC (20-kt wind)

    Strategy             PMARC evals   Inner opts   Course time
    MLM3: Pure-PMARC         50            --          2.376
    MLM3: e1                 15             2          2.357
    MLM3: e2                 20             3          2.372
    MLM3: e3                  4             3          2.363
    MLM3: e4                 12             2          2.363
    MLM3: e5                 15             4          2.350
    MLM3: e6                 13             2          2.380


Table 7. Comparison of single and multiple recalibration using interpolated PMARC (10-kt wind)

    Strategy        PMARC evals   Inner opts   Course time   Seed designs   PMARC evals
                    (per seed)    (per seed)   (median)      N(0.99,0.01)   (total)
    MLM2: e3 (S)        2.04         1.00        3.257           5.03          10.25
    MLM2: e3 (M)        2.60         2.32        3.253           4.51          11.72
    MLM2: e5 (S)        2.04         1.00        3.262           6.27          12.80
    MLM2: e5 (M)        2.72         2.44        3.257           4.51          12.26

sailing yacht domain. The best-performing strategies used the recalibratable approximation schemes e3 and e5. These schemes use both the algebraic formula and PMARC to compute effective draft in searching for a good design; that is, they really use multiple models. They outperformed the Pure-PMARC strategy and the approximation schemes e1 and e2, all of which use only a single model (PMARC) and fail to use the algebraic formula at all. Our results thus demonstrate the value of using multiple models in a design optimization process. Approximation schemes e3 and e5 (which assume locally constant error) also outperformed approximation schemes e4 and e6 (which assume locally linear error). This result might be surprising, considering that schemes e3 and e5 both have order zero, whereas schemes e4 and e6

both have order one. Apparently, the algebraic formula foreffective draft by itself does a reasonably good job of fit-ting the gradient of the true effective draft, as computed byPMARC. When instantiated in the yacht domain with thealgebraic formula, the approximation schemes e3 and e5 for-tuitously have order one. The computational expense in-curred in schemes e4 and e6 for fitting the gradient isapparently not justified by improved fitting accuracy. (Be-cause we used forward differencing to compute these gra-dients, we suspect that our results might have been differentif a more efficient technique for computing the gradient ofeffective draft had been available.) Finally, recalibration wassometimes important to the success of our multimodel strat-egies. On one test problem, the strategies that periodicallyrecalibrate approximations found better designs than strat-egies that simply perform a single initial calibration.
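The distinction between order-zero calibration (as in schemes like e3 and e5) and order-one calibration (as in e4 and e6) can be illustrated with a generic sketch. The models below are toy stand-ins, not the yacht codes: the constant-error scheme shifts a cheap model to agree with the expensive one at the current design, while the linear-error scheme also fits the error gradient by forward differencing, at the cost of one extra expensive evaluation per design variable.

```python
import numpy as np

def calibrate_constant(e_approx, e_exact, x0):
    """Order-zero calibration: shift the cheap model so it matches
    the expensive model exactly at the calibration point x0."""
    offset = e_exact(x0) - e_approx(x0)
    return lambda x: e_approx(x) + offset

def calibrate_linear(e_approx, e_exact, x0, h=1e-5):
    """Order-one calibration: also match the error gradient at x0,
    estimated by forward differencing. Note the extra e_exact
    evaluations this costs -- one per design variable."""
    x0 = np.asarray(x0, dtype=float)
    f0 = e_exact(x0) - e_approx(x0)
    grad = np.zeros_like(x0)
    for i in range(x0.size):
        xp = x0.copy()
        xp[i] += h
        grad[i] = ((e_exact(xp) - e_approx(xp)) - f0) / h
    return lambda x: (e_approx(np.asarray(x, float)) + f0
                      + grad @ (np.asarray(x, float) - x0))

# Toy models (hypothetical): exact = quadratic, approx = linear.
e_exact = lambda x: float(np.sum(x**2) + 1.0)
e_approx = lambda x: float(np.sum(x))
x0 = np.array([1.0, 2.0])
c0 = calibrate_constant(e_approx, e_exact, x0)
c1 = calibrate_linear(e_approx, e_exact, x0)
# Both corrected models agree with e_exact at x0 itself; the
# order-one correction also tracks e_exact better near x0.
```
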

Our experiments relied on CFSQP to carry out all of the underlying numerical optimizations required by the strategies we tested. Of course, we could have used any one of a large number of other codes for this purpose. A good survey of optimization software is found in Moré and Wright (1995). One might ask whether our results would have been different if we had used a different numerical optimization code. To address this question, consider what would happen if all optimization codes returned the same solution when started from the same seed point. In such an ideal situation, the choice of optimization code would not impact the input/output behavior of the InnerOptimization routine used in strategies MLM2 and MLM3. The choice of code also would not affect the sequence of evaluations of the exact objective function e(x) invoked by strategies MLM2 and MLM3 because e(x) is not used in the internal optimization steps of these strategies. Because our cost metric ignores evaluations of the approximate objective function e(x, y), the choice of optimization code would also not affect the overall cost of running either strategy MLM2 or strategy MLM3, as measured by the number of evaluations of e(x). Therefore, the actual algorithm used for each internal optimization would be irrelevant to the absolute performance of strategies MLM2 and MLM3. Of course, the choice of optimization code still would influence the cost of a single-level strategy, which we used as a baseline to measure the relative speedup obtained by our multilevel strategies. Furthermore, in real life, different numerical codes will return different solutions, even when started from the same seed. For this reason, we cannot guarantee that our experiments would have demonstrated the same relative improvements in performance if we had switched from CFSQP to another numerical optimization code.
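The cost-accounting argument above can be made concrete with a toy sketch. Everything here is hypothetical: a crude coordinate search stands in for CFSQP, the objectives are toy quadratics, and the loop is only an MLM2-flavoured simplification (constant-offset recalibration). The point it illustrates is that the count of exact evaluations depends only on the number of recalibration points, not on how many evaluations the inner optimizer spends on the cheap model.

```python
import numpy as np

def counted(f):
    """Wrap an objective so we can meter expensive exact evaluations,
    the cost measure used in the experiments above."""
    def g(x):
        g.evals += 1
        return f(x)
    g.evals = 0
    return g

def inner_opt(f, x0, step=0.25, iters=200):
    """Black-box inner optimizer (a crude coordinate search standing
    in for CFSQP). It sees only the cheap calibrated model."""
    x = np.asarray(x0, float).copy()
    for _ in range(iters):
        for i in range(x.size):
            for d in (step, -step):
                xt = x.copy()
                xt[i] += d
                if f(xt) < f(x):
                    x = xt
    return x

def mlm2_sketch(e_exact, e_approx, x0, rounds=3):
    """Sketch of the recalibrate/optimize loop (not the paper's exact
    algorithm): recalibrate the approximation at the current point,
    optimize the cheap model, and touch the exact model only once per
    round."""
    x = np.asarray(x0, float)
    for _ in range(rounds):
        offset = e_exact(x) - e_approx(x)   # one exact eval per round
        x = inner_opt(lambda z: e_approx(z) + offset, x)
    return x

# Toy exact/approximate objectives (hypothetical).
e_exact = counted(lambda x: float((x[0] - 1.0)**2 + (x[1] + 2.0)**2))
e_approx = lambda x: float((x[0] - 1.0)**2 + (x[1] + 2.0)**2 + 0.5 * x[0])
x_star = mlm2_sketch(e_exact, e_approx, np.array([3.0, 3.0]), rounds=3)
# e_exact.evals equals the number of recalibration rounds, however
# many approximate evaluations the inner optimizer consumed.
```
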

One could also ask whether our results would have been different if we had carried out experiments in some application domain other than yacht design. We cannot provide a

Table 8. Comparison of single and multiple recalibration using interpolated PMARC (20-kt wind)

Strategy        PMARC evals   Inner opts   Course time   Seed designs     PMARC evals
                (per seed)    (per seed)   (median)      N(0.99, 0.01)    (total)
MLM2: e3 (S)    2.24          1.00         2.412         16.78            37.59
MLM2: e3 (M)    6.64          3.80         2.363         4.51             29.95
MLM2: e5 (S)    2.20          1.00         2.415         14.02            30.84
MLM2: e5 (M)    7.24          4.36         2.364         5.03             36.42


Table 9. Comparison of single and multiple recalibrations using real PMARC (10-kt wind)

Strategy        PMARC evals   Inner opts   Course time
MLM3: e3 (S)    2             1            3.275
MLM3: e3 (M)    2             2            3.275
MLM3: e5 (S)    2             1            3.279
MLM3: e5 (M)    2             2            3.279

definite answer without actually trying our methods on some other application, but we can try to identify the features of the yacht design problem that enabled our methods to be successful. In the yacht domain, the exact objective function e(x) includes one subroutine (PMARC) that dominates the cost of evaluation. This subroutine computes a function (effective draft) that is simple enough to be approximated locally by an algebraic formula. In addition, most of the nonlinearity in the objective function e(x) results from subroutines other than the one being approximated. These facts characterize the situations in which we expect internal approximations to be most cost effective. Therefore, we expect our methods to be successful on problems that exhibit these characteristics.

6. RELATED WORK

A considerable amount of research on multilevel design optimization has been carried out by investigators working on structural design or multidisciplinary design in the aerospace community (Sobieszczanski-Sobieski, 1982; Sobieszczanski-Sobieski & Haftka, 1996). Most of this research has focused on problems in which optimization is computationally expensive due more to the presence of many design parameters than to the expense of individual evaluations of objective or constraint functions. Accordingly, such work has focused on decomposing design spaces into nearly independent subspaces. In this work, each "level" of a multilevel design process corresponds to a factor space of the original design space. In contrast, our work is focused on problems in which optimization is computationally expensive due more to the cost of evaluation than to the presence

Table 10. Comparison of single and multiple recalibrations using real PMARC (20-kt wind)

Strategy        PMARC evals   Inner opts   Course time
MLM3: e3 (S)    3             1            2.379
MLM3: e3 (M)    4             3            2.363
MLM3: e5 (S)    11            1            2.382
MLM3: e5 (M)    15            4            2.350

of many design parameters. We leave the design space fixed. In our work, each level corresponds to a model of the objective function that uses a particular combination of approximations.

A statistical approach to multilevel engineering design is presented in Osio and Amon (1996). This technique applies to problems for which a series M1, ..., Mq of computational models of increasing accuracy and cost is available to evaluate candidate designs. It operates by constructing a series Y1, ..., Yq of "surrogate models." During the ith stage of operation, the model Mi is evaluated at a set of design points. These points are chosen by a data-adaptive optimal sampling technique. The evaluations are used to construct the surrogate Yi through a Bayesian updating process in which the previous surrogate Y(i-1) is used to define a prior distribution. This approach is similar to ours in the sense that it uses cheap approximate models to select points at which more expensive accurate models will be evaluated. It differs from ours in the sense that sampling is focused on regions of the design space where errors are large and systematic, regardless of whether such regions contain good designs. In contrast, our system uses successive optimizations to focus sampling on regions of the design space that are close to locally optimal designs.

A method of constructing and optimizing approximate objective functions is outlined in Vanderplaats (1984). The method operates by moving through a design space and evaluating the exact objective function e(x) at a series (s0, s1, ..., sn) of points. It periodically uses the evaluations of these points to fit an approximate version e(x, s0, s1, ..., sn) of the objective function. Depending on the number k of free parameters in the class of fitting functions, the fitting process may either interpolate the evaluations of (s0, s1, ..., sn) exactly or do a least-squares approximation. In either case, the approximation e(x, s0, s1, ..., sn) is then used in a fast optimization process, resulting in one or more new design points and a new fitting process. This technique is similar to ours in the sense that it involves a process of interleaved fitting and optimizing of approximate objective functions. There are two main differences. Our technique is based on internal approximations rather than on approximations of the objective function as a whole. Furthermore, our technique uses multiple domain-specific models in the fitting process rather than a single domain-specific model in combination with a domain-independent fitting technique.
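The interleaved fit-and-optimize loop just described can be sketched in a few lines. This is an illustration of the idea, not Vanderplaats's code: the objective is a hypothetical 1-D toy, and the domain-independent fitting class is a least-squares quadratic, whose minimizer is available in closed form.

```python
import numpy as np

def fit_quadratic(xs, ys):
    """Least-squares fit of a x^2 + b x + c to the evaluated sample
    points -- a domain-independent fitting class."""
    A = np.vstack([np.square(xs), xs, np.ones_like(xs)]).T
    a, b, c = np.linalg.lstsq(A, ys, rcond=None)[0]
    return a, b, c

def fit_and_optimize(e_exact, seeds, rounds=4):
    """Interleaved fitting and optimizing: fit a cheap surrogate to
    all points evaluated so far, minimize the surrogate in closed
    form, evaluate the exact objective there, and refit with the
    enlarged sample."""
    xs = list(seeds)
    ys = [e_exact(x) for x in xs]
    for _ in range(rounds):
        a, b, c = fit_quadratic(np.array(xs), np.array(ys))
        x_new = -b / (2.0 * a)   # minimizer of the fitted quadratic
        xs.append(x_new)
        ys.append(e_exact(x_new))
    return xs[-1]

# Toy exact objective (hypothetical): quartic with minimum at x = 2.
e = lambda x: (x - 2.0)**4 + (x - 2.0)**2
x_opt = fit_and_optimize(e, seeds=[0.0, 1.0, 4.0])
```

Each round adds one expensive evaluation near the surrogate's minimizer, so the sample concentrates around the region of good designs.
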

Our recalibrating strategies are similar in spirit to numerical continuation methods for solving equations or optimizing functions (Allgower & Georg, 1990). A continuation method for finding a root of a function f(x) uses a continuous function g(x, p) to define a family of approximations parameterized by p in much the same manner as a locally calibratable approximation scheme e(x, y) defines a family of approximations parameterized by y. The family g(x, p) is chosen so that f(x) is the restriction of g(x, p) to the subspace defined by p = 0 and so that a solution to the equation g(x, 1) = 0 is known or easy to find. A solution to the


original problem is found by gradually changing p from 1 to 0 and repeatedly solving the equation g(x, p) = 0 by using the previous solution as a starting point. The chief difference between our recalibration method and a standard continuation method lies in how the fitting parameter is used. In a continuation method, the fitting parameter p is controlled by an independent process. In our recalibration method, the parameter y is controlled by the optimization strategy itself.
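A generic continuation sketch makes the analogy concrete. This is a textbook-style global homotopy on a hypothetical scalar equation, not the paper's recalibration code: g(x, p) = f(x) - p f(x0) has the known root x0 at p = 1 and reduces to f(x) at p = 0, and p is walked down by an independent schedule while each solution seeds the next solve.

```python
import math

def newton(g, dg, x, iters=20):
    """Plain Newton iteration for a scalar root of g."""
    for _ in range(iters):
        x = x - g(x) / dg(x)
    return x

def continuation_root(f, df, x0, steps=10):
    """Global-homotopy continuation: solve g(x, p) = f(x) - p*f(x0)
    = 0 as p steps from 1 down to 0, reusing the previous solution
    as the Newton seed. The role played by y in a locally
    calibratable scheme is played here by p, but p is driven by an
    independent schedule rather than by the optimization itself."""
    x = x0
    f0 = f(x0)
    for k in range(steps - 1, -1, -1):
        p = k / steps
        x = newton(lambda z, p=p: f(z) - p * f0, df, x)
    return x

# Toy equation (hypothetical): cos(x) - x = 0, root near 0.7390851.
f = lambda x: math.cos(x) - x
df = lambda x: -math.sin(x) - 1.0
root = continuation_root(f, df, x0=0.0)
```

Because each intermediate problem is only a small perturbation of the previous one, every Newton solve starts close to its target, which is the same reuse-of-previous-solutions economy the recalibrating strategies exploit.
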

Research on knowledge-based design optimization has been pursued by several investigators in the artificial-intelligence community. One portion of this work has focused on automating the choice of the underlying numerical algorithm and the numerical parameters to that algorithm (Orelup et al., 1988). Another portion of this work has attempted to use numerical methods in combination with rule-based inference to guide the search for an optimal design (Powell, 1990; Tong, 1990). This work is similar to ours inasmuch as we are both motivated in part by a desire to deal with pathological objective functions. In contrast to our work, however, these efforts have adhered to a paradigm in which there is a single model of the objective function, which remains fixed during the optimization process.

A method of adapting standard optimization techniques to deal with modelling error is reported in Cagan and Williams (1993). In this work, the authors developed a modification of the Karush-Kuhn-Tucker conditions for local optimality. In particular, they presented a method of modifying the objective function (for unconstrained optimization) and the Lagrangian function (for constrained optimization). Their modification allows one to include robustness of the design with respect to modelling error as part of the objective to be optimized. The method relies on human-supplied weights to decide how to balance competing concerns of optimal performance and robustness. This work is similar to our own inasmuch as it provides a means of using approximate models in an optimization process; however, it differs in focusing on the use of a single model rather than multiple models. Furthermore, it aims to improve robustness of the design rather than lowering the computation cost of the design process.

Methods of reasoning with multiple models have been investigated by the qualitative-physics community. Some of this work is similar to ours in that it has studied methods of dynamically selecting among multiple models to choose one that is well suited to the task at hand. Most of this work has considered gross relevance as the sole criterion for selecting among models (Falkenhainer & Forbus, 1991; Nayak, 1994). In this context, methods of dependency tracing suffice for deciding which model to use to answer a given query. More sophisticated methods reason about the sign of the error of approximate models (Weld, 1990; Addanki et al., 1991). In contrast to our work, considerations of numerical accuracy are not used to guide the choice of a suitable model.

Methods of using multiple models to assure the quality of numerical simulations are reported in Gelsey (1995). This work is similar to ours inasmuch as we both used the yacht domain and the PMARC code as experimental testbeds; however, this work focused on improving the reliability of simulations of individual designs. Unlike our work, it did not address the use of multiple models in an overall design process.

In previous work (Ellman et al., 1993), we developed a technique called Gradient Magnitude Model Selection (GMMS) for dynamically choosing between exact and approximate models during design optimization. GMMS uses explicit accuracy estimates to decide which model to use for each numerical comparison of evaluations that takes place during a design optimization process. The utility of GMMS is limited by the fact that it requires a special numerical optimization algorithm. It cannot be combined with an arbitrary numerical optimization code without extensive rewriting of that code. In the present work, we have attempted to formulate multimodel strategies that can be integrated more easily with arbitrary numerical optimization codes, that is, by calling such codes as black-box subroutines.

7. SUMMARY

We have developed, analyzed, and tested a family of strategies for using multiple models to optimize engineering designs. Our strategies are useful when multiple approximations of the objective function can be implemented with compositional modelling techniques. Compositional modelling is important in this context because it enables selective approximation of internal components of the objective function. We have shown how a compositional modelling library can be used to construct a variety of locally calibratable approximation schemes, each of which can be used in our family of optimization strategies. We have analyzed the schemes and strategies to formulate and prove sufficient conditions for correctness and convergence. Correctness depends in part on whether the internal approximation accurately fits the gradient of the function being approximated. Convergence depends, roughly, on the distribution of nonlinearity in the objective function and on the degree of nonlinearity in the error of the internal approximation. We also have tested our approximation schemes and optimization strategies experimentally in the domain of sailing yacht design. Our results demonstrate that our strategies achieve dramatic reductions, ranging from a factor of 5 to a factor of 38, in the CPU time required for optimization on the sailing yacht problems we tested, with no significant loss in design quality. The greatest reduction depends on the use of multiple models of internal components of the objective function. Preservation of design quality depends on the use of periodic recalibration of approximations during the optimization process.


ACKNOWLEDGMENTS

The research reported in this paper was supported by the Hypercomputing and Design Project at Rutgers University, which is sponsored by the Advanced Research Projects Agency of the Department of Defense through contract ARPA-DABT 63-93-C-0064. We have benefited from discussions with Saul Amarel, Gene Bouchard, Vasek Chvatal, Martin Fritts, Andrew Gelsey, Haym Hirsh, Leonid Khachiyan, John Letcher, Mike Meinhold, Gerry Richter, Nils Salvesen, Don Smith, Lou Steinberg, and the anonymous referees.

REFERENCES

Addanki, S., Cremonini, R., & Penberthy, S. (1991). Graphs of models. Artificial Intelligence 51, 145-178.

Allgower, E., & Georg, K. (1990). Numerical Continuation Methods. Springer-Verlag, New York.

Ashby, D.L., Dudley, M.R., Iguchi, S.K., Browne, L., & Katz, J. (1992). Potential Flow Theory and Operation Guide for the Panel Code PMARC 12.

Cagan, J., & Williams, B. (1993). First order necessary conditions for robust optimality. Proc. Design Automation Conf. ASME, Albuquerque, NM, 539-549.

Conte, S., & de Boor, C. (1980). Elementary Numerical Analysis: An Algorithmic Approach. McGraw-Hill, New York.

Dahlquist, G., & Bjorck, A. (1974). Numerical Methods. Prentice-Hall, Englewood Cliffs, NJ.

Dennis, J., & Schnabel, R. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ.

Ellman, T., Keane, J., & Schwabacher, M. (1992). The Rutgers CAP project design associate. Report No. CAP-TR-7. Department of Computer Science, Rutgers University, New Brunswick, NJ.

Ellman, T., Keane, J., & Schwabacher, M. (1993). Intelligent model selection for hillclimbing search in computer-aided design. Proc. Eleventh Natl. Conf. on Artificial Intelligence, Washington, DC, 594-599.

Ellman, T., Keane, J., Schwabacher, M., & Murala, T. (1995). A transformation system for interactive reformulation of design optimization strategies. Proc. Tenth Knowledge-Based Software Engineering Conf., Boston, MA, 44-51.

Ellman, T., Keane, J., Banerjee, A., & Armhold, G. (1997). A transformation system for interactive reformulation of design optimization strategies. Research in Engineering Design (submitted for review).

Falkenhainer, B., & Forbus, K. (1991). Compositional modelling: Finding the right model for the job. Artificial Intelligence 51, 95-144.

Gelsey, A. (1995). Intelligent automated quality control for computational simulation. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 9(5), 387-400.

Gill, P., Murray, W., & Wright, M. (1981). Practical Optimization. Academic Press, London.

Katz, J., & Plotkin, A. (1991). Low-Speed Aerodynamics: From Wing Theory to Panel Methods. McGraw-Hill, New York.

Lawrence, C., Zhou, J., & Tits, A. (1995). User's guide for CFSQP version 2.3: A C code for solving (large scale) constrained nonlinear (minimax) optimization problems, generating iterates satisfying all inequality constraints. Report No. TR-94-16r1. Institute for Systems Research, University of Maryland.

Letcher, J. (1975). Sailing hull hydrodynamics, with reanalysis of the Antiope data. Transactions of the Society of Naval Architects and Marine Engineers 83.

Letcher, J. (1991). The Aero/Hydro VPP Manual. Aero/Hydro, Inc., Southwest Harbor, ME.

Letcher, J., Marshall, J., Oliver, J., & Salvesen, N. (1987). Stars and Stripes. Scientific American 257(2), 24-32.

Moré, J.J., & Wright, S.J. (1995). Optimization Software Guide. SIAM, Philadelphia.

Nayak, P. (1994). Causal approximations. Artificial Intelligence 70, 277-334.

Newman, J., & Wu, T. (1973). A generalized slender body theory for fish-like forms. Journal of Fluid Mechanics 57(4).

Orelup, M.F., Dixon, J.R., Cohen, P.R., & Simmons, M.K. (1988). Dominic II: Meta-level control in iterative redesign. Proc. Natl. Conf. on Artificial Intelligence, pp. 25-30. MIT Press, Cambridge, MA.

Osio, G., & Amon, C. (1996). An engineering design methodology with multistage Bayesian surrogates and optimal sampling. Research in Engineering Design 8, 189-206.

Pearl, J. (1984). Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, MA.

Peressini, A., Sullivan, F., & Uhl, J. (1988). The Mathematics of Nonlinear Programming. Springer-Verlag, New York.

Powell, D. (1990). Inter-gen: A hybrid approach to engineering design optimization. Ph.D. Thesis. Rensselaer Polytechnic Institute, Department of Computer Science.

Press, W., Flannery, B., Teukolsky, S., & Vetterling, W. (1986). Numerical Recipes. Cambridge University Press, New York.

Rudin, W. (1964). Principles of Mathematical Analysis. McGraw-Hill, New York.

Sacerdoti, E.D. (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence 5, 115-135.

Simon, H. (1981). The Sciences of the Artificial. MIT Press, Cambridge, MA.

Sobieszczanski-Sobieski, J. (1982). A linear decomposition method for large optimization problems—Blueprint for development. Report No. NASA TM-83248. National Aeronautics and Space Administration.

Sobieszczanski-Sobieski, J., & Haftka, R. (1996). Multidisciplinary aerospace design optimization: Survey of recent developments. Report No. AIAA 96-0711. American Institute of Aeronautics and Astronautics.

Tong, S.S. (1990). Coupling symbolic manipulation and numerical simulation for complex engineering designs. Intelligent Mathematical Software Systems, pp. 241-252. North-Holland, New York.

Vanderplaats, G. (1984). Numerical Optimization Techniques for Engineering Design: With Applications. McGraw-Hill, New York.

Weems, K., Lin, W., & Chen, H. (1994). Calculations of viscous nonlinear free-surface flows using an interactive zonal approach. Proc. CFD Workshop of the Ship Research Institute of Japan.

Weld, D. (1990). Approximation reformulations. Proc. Eighth Natl. Conf. on Artificial Intelligence. MIT Press, Boston.

Yao, K.-T., & Gelsey, A. (1996). Intelligent automated grid generation for numerical simulations. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 10(3), 215-234.

Thomas Ellman is an Assistant Professor in the Department of Computer Science at Rutgers University. He received his Ph.D. and M.Sc. in Computer Science from Columbia University and his B.A. in Physics from Wesleyan University. His research combines artificial intelligence and software engineering to develop knowledge-based programming and problem-solving environments for engineering design and scientific computation. Part of his work is focused on program synthesis and program transformation techniques for simulation, optimization, and constraint satisfaction problems. Another part is focused on environments for setting up, carrying out, and interpreting the results of computational experiments.

John Keane is a Postdoctoral Research Associate in the Department of Computer Science at Rutgers University. He received his B.A., M.Sc., and Ph.D. degrees in Computer Science from Rutgers. His thesis work was on knowledge-based management of simulation model failure occurring in legacy codes used in design optimization. His current research includes work on developing causal models of failure in simulation models and the integrated visualization of execution and data flow in legacy codes.


Mark Schwabacher is a Postdoctoral Research Associate in the National Institute of Standards and Technology's Engineering Design Technologies Group. He has a B.A. in Computer Science, Mathematical Sciences, and Economics from Rice University and M.Sc. and Ph.D. degrees in Computer Science from Rutgers University. His Ph.D. thesis was on the use of artificial intelligence, especially machine learning, to control the numerical optimization of complex engineering designs. This work focused on the ability to use, within an optimization, legacy analysis codes that have pathologies such as numerical noise and unevaluable points. It involved collaborations with the Engineering Departments at Rutgers and with industry.

Ke-Thia Yao is a computer scientist in the Information Sciences Institute of the University of Southern California. He received his B.Sc. degree in Electrical Engineering and Computer Science from the University of California at Berkeley and his M.Sc. and Ph.D. degrees in Computer Science from Rutgers University. For his thesis, he built a spatial reasoning system in the domain of computational fluid dynamics (CFD). By reasoning about local interactions of fluid flow and geometry, the system forms global flows that approximate the solution to the underlying partial differential equation. Topological information extracted from the approximation guides the system in the automatic generation of grids for CFD simulators.
