
Struct Multidisc Optim (2007) 34:211–227
DOI 10.1007/s00158-006-0077-z

RESEARCH PAPER

An augmented Lagrangian decomposition method for quasi-separable problems in MDO

S. Tosserams · L. F. P. Etman · J. E. Rooda

Received: 14 March 2006 / Revised manuscript received: 22 August 2006 / Published online: 12 December 2006
© Springer-Verlag 2006

Abstract Several decomposition methods have been proposed for the distributed optimal design of quasi-separable problems encountered in Multidisciplinary Design Optimization (MDO). Some of these methods are known to have numerical convergence difficulties that can be explained theoretically. We propose a new decomposition algorithm for quasi-separable MDO problems. In particular, we propose a decomposed problem formulation based on the augmented Lagrangian penalty function and the block coordinate descent algorithm. The proposed solution algorithm consists of inner and outer loops. In the outer loop, the augmented Lagrangian penalty parameters are updated. In the inner loop, our method alternates between solving an optimization master problem and solving disciplinary optimization subproblems. The coordinating master problem can be solved analytically; the disciplinary subproblems can be solved using commonly available gradient-based optimization algorithms. The augmented Lagrangian decomposition method is derived such that existing proofs can be used to show convergence of the decomposition algorithm to Karush–Kuhn–Tucker points of the original problem under mild assumptions. We investigate the numerical performance of the proposed method on two example problems.

S. Tosserams (B) · L. F. P. Etman · J. E. Rooda
Department of Mechanical Engineering, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
e-mail: [email protected]

L. F. P. Etman
e-mail: [email protected]

J. E. Rooda
e-mail: [email protected]

Keywords Multidisciplinary design optimization (MDO) · Decomposition · Quasi-separable problems · Augmented Lagrangian · Block coordinate descent

1 Introduction

Multidisciplinary Design Optimization (MDO) problems are encountered in the optimal design of large-scale engineering systems that consist of a number of interacting subsystems. We consider the classical MDO problem where each subsystem represents a discipline that is concerned with one aspect in the design of the complete system (e.g., thermodynamics, structural mechanics, aerodynamics, control, etc.). The disciplinary design teams often use specialized computer codes that have been under development for many years. Bringing these legacy codes together into a single optimization problem to find the optimal system design is often impractical, undesirable, or even impossible.

Decomposition methods for distributed design aim at finding a design that is optimal for the whole system, while allowing local autonomous design making at the disciplinary subsystems. These decomposition methods exploit the structure present in MDO problems by reformulating them as a set of independent disciplinary subproblems and introducing a master problem to coordinate the subproblem solutions towards the optimal system design. Such decomposition methods with local optimization autonomy are also referred to as multilevel methods (see, e.g., Balling and Sobieszczanski-Sobieski 1996). Well-known multilevel decomposition methods include Concurrent SubSpace Optimization (Sobieszczanski-Sobieski 1988), Collaborative Optimization (CO; Braun 1996; Braun et al. 1997), Bi-Level Integrated System Synthesis (BLISS; Sobieszczanski-Sobieski et al. 2000, 2003), and the constraint margin approach of Haftka and Watson (2005), among others.

This paper focuses on the decomposition of quasi-separable problems (Haftka and Watson 2005), which are frequently encountered in MDO. The subsystems of quasi-separable problems are coupled through a number of shared variables. Most decomposition methods for quasi-separable problems follow a bi-level formulation in which the solution of the disciplinary subproblems is nested within a coordinating master problem. At each iteration of the algorithm used to solve the master problem, each of the subproblems is solved once to obtain master problem function values and gradients. These nested bi-level methods include CO, BLISS 2000, and the constraint margin approach of Haftka and Watson (2005).

Unfortunately, several of these nested bi-level decomposition methods may have master and/or subproblems that are non-smooth or whose solutions do not satisfy the Karush–Kuhn–Tucker (KKT) conditions. For example, DeMiguel and Murray (2000) and Alexandrov and Lewis (2002) showed that in CO, the KKT conditions break down at the master problem solution; Haftka and Watson (2005) note that the constraints in the master problem for their constraint margin method are non-smooth. However, satisfaction of the KKT conditions and smoothness are important requirements for the use of efficient existing optimization algorithms such as Sequential Quadratic Programming. Bi-level formulations that do not meet these requirements have to use specialized, typically inefficient optimization algorithms to solve the associated optimization problems.

To reduce the computational costs associated with the combination of inefficient algorithms and the nested bi-level formulation, the use of response surface modeling techniques has been proposed (see, e.g., Sobieszczanski-Sobieski et al. 2003 for BLISS; Sobieski and Kroo 2000 for CO; Liu et al. 2004 for the constraint margin approach of Haftka and Watson). Instead of coupling the master optimization algorithm directly to the subsystems, subsystem responses are approximated using surrogate models that are cheap to evaluate. These cheap approximations are then provided to the master problem. Creating appropriate surrogate models of subproblem responses is, however, not straightforward and may become cumbersome for increasing numbers of shared variables and for non-smooth functional behavior.

Analytical Target Cascading (ATC; Kim 2001; Michelena et al. 1999) and the penalty decomposition (PD) methods of DeMiguel and Murray (2006) are two formulations for which the master and subproblems can be shown to be smooth and whose solutions satisfy the KKT conditions (see DeMiguel and Murray 2006; Kim 2001). The conceptual difference between ATC and PD is that the PD formulation is nested similar to CO, while the ATC formulation is not nested. This means that in the PD formulation, a function evaluation (one iteration step) for the optimization master problem requires the solution of all subproblems. On the other hand, in ATC, the solution coordination strategy alternates between solving the master optimization problem and solving the optimization subproblems.

The original ATC formulation was intended for product development and typically follows an object partition along the lines of systems, subsystems, and components (see Fig. 1 for an illustration). Although proposed for product development, ATC is also applicable to classic MDO problems. The convergence theory for ATC as presented by Michelena et al. (2003) assumes a purely hierarchical system. Extensions of the ATC formulation towards non-hierarchical systems exist (see, e.g., Allison et al. 2005; Kokkolaras et al. 2002). As of yet, no convergence theory is available for such non-hierarchical formulations.

In this paper, we propose a new bi-level decomposition algorithm for non-hierarchical quasi-separable MDO problems. The method is derived such that existing convergence proofs can be used to show convergence of the decomposition algorithm to KKT points of the original problem under mild assumptions. The method is based on augmented Lagrangian relaxation and block coordinate descent, two techniques recently linked to ATC (Tosserams et al. 2006). The proposed solution coordination algorithm consists of inner and outer loops. In the outer loop, the penalty parameters are updated based on results of the inner loop. In the inner loop, the decomposed problem is solved for fixed penalty parameters by alternating between solving the master problem and the subproblems. This alternating strategy is similar to ATC, but different from many bi-level MDO methods such as CO that follow a nested formulation.

Fig. 1 Problem decomposition types

The outline of this paper is as follows. In Section 2, we decompose the quasi-separable MDO problem using the augmented Lagrangian penalty function. In Section 3, we present three solution strategies that include practical schemes for updating the penalty parameters in the outer loop. For these schemes, convergence to a KKT point of the original non-decomposed problem can be shown by combining existing results as found in nonlinear programming textbooks such as Bazaraa et al. (1993) and Bertsekas (2003). Numerical results are presented and discussed in Section 4.

2 Decomposition of quasi-separable problems

The decomposition algorithm presented in this paper is applicable to so-called quasi-separable MDO problems. The quasi-separable MDO problem with $M$ subsystems is given by:

\[
\begin{aligned}
\min_{z = [y^T, x_1^T, \ldots, x_M^T]^T} \quad & \sum_{j=1}^{M} f_j(y, x_j) \\
\text{subject to} \quad & g_j(y, x_j) \le 0, \quad j = 1, \ldots, M, \\
& h_j(y, x_j) = 0, \quad j = 1, \ldots, M,
\end{aligned} \tag{1}
\]

where the vector of design variables $z = [y^T, x_1^T, \ldots, x_M^T]^T$, $z \in \mathbb{R}^n$, consists of a number of shared variables $y \in \mathbb{R}^{n_y}$ and a number of local variables $x_j \in \mathbb{R}^{n_{x_j}}$ associated exclusively with subsystem $j$, where $n = n_y + \sum_{j=1}^{M} n_{x_j}$. The shared variables may simply be common design variables, but also analysis outputs of one subsystem that are required as inputs for other subsystems. Local objectives $f_j : \mathbb{R}^{n_j} \mapsto \mathbb{R}$ and local constraints $g_j : \mathbb{R}^{n_j} \mapsto \mathbb{R}^{m_{g_j}}$ and $h_j : \mathbb{R}^{n_j} \mapsto \mathbb{R}^{m_{h_j}}$ are associated exclusively with subsystem $j$ and may depend on the shared variables $y$ and the local variables $x_j$ of only a single subsystem $j$, such that $n_j = n_y + n_{x_j}$.

Fig. 2 Functional dependence table for four-element example quasi-separable problem (1) with shared variables

Problem (1) is here referred to as the original all-in-one (AIO) problem.
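To make the quasi-separable structure concrete, the following minimal Python sketch defines a hypothetical two-subsystem instance of problem (1). The particular objectives and values are invented for illustration and are not from the paper; each local objective depends on the shared variable y and one local variable, so the subsystems are coupled only through y.

```python
# Hypothetical two-subsystem instance of the quasi-separable AIO
# problem (1), unconstrained for simplicity. Subsystems share the
# scalar y and each owns one local variable.

def f1(y, x1):
    # local objective of subsystem 1 (illustrative)
    return (y - 1.0) ** 2 + x1 ** 2

def f2(y, x2):
    # local objective of subsystem 2 (illustrative)
    return (y + 1.0) ** 2 + (x2 - 2.0) ** 2

def aio_objective(y, x1, x2):
    # all-in-one objective: the sum of the local objectives
    return f1(y, x1) + f2(y, x2)

# With x1 = 0 and x2 = 2 the local terms vanish; the AIO objective
# reduces to (y-1)^2 + (y+1)^2, which is minimized at y = 0.
print(aio_objective(0.0, 0.0, 2.0))  # -> 2.0
```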

The quasi-separable structure is visualized in the functional dependence table (FDT) of Fig. 2 for a four-element example. Similar to Wagner (1993), we shade the (i, j) entry of the table if the function of row i depends on the variables of column j. Throughout this section, we use the functional dependence table to illustrate the effect of the proposed problem transformations on the problem structure.

The coupling of the subsystems through the shared variables y can be easily observed from the FDT of Fig. 2. Without this coupling, the problem would be directly decomposable into M smaller subproblems.

The AIO problem can be solved directly with the so-called single-level MDO methods such as the all-at-once approach, individual discipline feasible techniques, or multidisciplinary feasible algorithms. This class of MDO methods facilitates disciplinary analysis autonomy, rather than decision autonomy as obtained with multilevel decomposition methods. The reader is referred to Cramer et al. (1994) for a review of single-level formulations.

The bi-level decomposition approach presented in this paper solves the original problem by the following four steps:

1. Introduction of auxiliary variables and consistency constraints
2. Relaxation of the consistency constraints
3. Formulation of the decomposed problem
4. Solution of the decomposed problem

Here, Steps 1 through 3 are problem transformation steps, and Step 4 entails the actual solution algorithms.

Existing convergence proofs for the solution algorithms of Step 4 only apply to problems with fully separable constraint sets. The separability of constraint sets implies that the local constraint sets g_j and h_j for subspace j may only depend on variables computed at that subspace, but not on variables of other subproblems. However, the local constraint sets of the original AIO problem (1) are not separable because of the coupling through the shared variables y. To obtain a decomposed problem formulation with fully separable constraint sets, the first three problem transformation steps have to be taken first.

The problem transformations used here are very similar to those presented by DeMiguel and Murray (2006) for the Inexact Penalty Decomposition (IPD) method. The main difference is that we arrive at an alternating decomposition formulation, whereas the IPD formulation is nested. Another difference is that IPD uses a quadratic penalty function to relax the consistency constraints (Step 2). We use an augmented Lagrangian formulation instead, which for ATC has proven to improve numerical efficiency and robustness. We also observed these benefits for an augmented Lagrangian formulation of IPD, which we implemented for the examples presented in Section 4. To have a fair comparison in the results section, we will therefore compare our newly developed approach with an augmented Lagrangian formulation of the IPD method. Because the additional terms introduced by the augmented Lagrangian approach are linear, we expect that proving smoothness and satisfaction of KKT conditions of the master and subproblems requires only minor modifications to the results presented by DeMiguel and Murray (2006) for the quadratic penalty function.

The remainder of this section describes the transformation steps in detail, and in Section 3 we present the actual solution algorithm.

2.1 Step 1: introduction of auxiliary variables and consistency constraints

In the first transformation, auxiliary shared variables $y_j \in \mathbb{R}^{n_y}$ are introduced at each subsystem to separate the local constraint sets $g_j$ and $h_j$. The auxiliary shared variables $y_j$ are copies of the original shared variables $y$ and are forced to be equal by non-separable consistency constraints $c : \mathbb{R}^{(M+1) \cdot n_y} \mapsto \mathbb{R}^{m_c}$. These linear consistency constraints $c$ are defined by $c(y, y_1, \ldots, y_M) = [c_1^T, \ldots, c_M^T]^T = 0$ with:

\[
c_j(y, y_j) = y - y_j = 0, \tag{2}
\]

where $c_j : \mathbb{R}^{2 \cdot n_y} \mapsto \mathbb{R}^{n_y}$ denotes the vector of consistency constraints for subsystem $j$ and $m_c = M n_y$.

The modified AIO problem after the introduction of the auxiliary variables and consistency constraints is given by:

\[
\begin{aligned}
\min_{y, y_1, x_1, \ldots, y_M, x_M} \quad & \sum_{j=1}^{M} f_j(y_j, x_j) \\
\text{subject to} \quad & g_j(y_j, x_j) \le 0, \quad j = 1, \ldots, M, \\
& h_j(y_j, x_j) = 0, \quad j = 1, \ldots, M, \\
& c_j(y, y_j) = 0, \quad j = 1, \ldots, M.
\end{aligned} \tag{3}
\]

Fig. 3 Functional dependence table for four-element example modified AIO problem (3) with separable local constraints and non-separable consistency constraints

The solutions to the modified AIO problem (3) and the original AIO problem (1) are equal because of the consistency constraints (Theorem 4.1 of DeMiguel and Murray 2006).

The FDT of the modified AIO problem (3) is illustrated in Fig. 3, where separability of the local constraint sets can be observed, as well as the non-separability of the introduced consistency constraints. By introducing the auxiliary variables and consistency constraints, we have coupling constraints instead of coupling variables.

Other instances of the modified all-in-one problem (3) have appeared in the MDO literature. For example, Cramer et al. (1994) have used the term “All-At-Once” approach when the shared variables y contain both common design variables and analysis input and output variables. Furthermore, Alexandrov and Lewis (2002) presented the modified AIO problem as “Distributed Analysis Optimization.”

Although the local constraint sets g_j and h_j are now fully separable with respect to the design variables, the consistency constraints c are not, and are therefore the coupling constraints of the problem.
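Step 1 can be illustrated with a short Python sketch (all names and values are illustrative): each subsystem receives its own copy y_j of the shared-variable vector, and the consistency constraint (2) simply measures the difference between the master copy and the subsystem copy.

```python
# Consistency constraint c_j(y, y_j) = y - y_j of (2), written
# elementwise over plain-list vectors. A zero vector means the
# subsystem copy agrees with the master copy.

def consistency(y, y_j):
    # c_j = y - y_j, componentwise over the shared variables
    return [yi - yji for yi, yji in zip(y, y_j)]

y = [1.0, 2.0]      # master copy of the shared variables
y_1 = [1.0, 2.0]    # subsystem 1's copy (consistent)
y_2 = [0.5, 2.0]    # subsystem 2's copy (first component drifted)

print(consistency(y, y_1))  # -> [0.0, 0.0]
print(consistency(y, y_2))  # -> [0.5, 0.0]
```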

2.2 Step 2: relaxation of the consistency constraints

The second transformation is the relaxation of the non-separable consistency constraints, which leads to a problem with fully separable constraint sets. The consistency constraints are relaxed using an augmented Lagrangian penalty function $\phi : \mathbb{R}^{m_c} \mapsto \mathbb{R}$ (see, e.g., Bazaraa et al. 1993; Bertsekas 2003):

\[
\phi(c) = v^T c + \|w \circ c\|_2^2 = \sum_{j=1}^{M} \phi_j(c_j(y, y_j)) = \sum_{j=1}^{M} v_j^T (y - y_j) + \sum_{j=1}^{M} \left\| w_j \circ (y - y_j) \right\|_2^2, \tag{4}
\]

with the penalty function associated with subsystem $j$, $\phi_j : \mathbb{R}^{n_y} \mapsto \mathbb{R}$, defined by:

\[
\phi_j(c_j(y, y_j)) = v_j^T (y - y_j) + \left\| w_j \circ (y - y_j) \right\|_2^2, \tag{5}
\]

where $v = [v_1^T, \ldots, v_M^T]^T \in \mathbb{R}^{m_c}$ is the vector of Lagrange multiplier estimates for the consistency constraints and $w = [w_1^T, \ldots, w_M^T]^T \in \mathbb{R}^{m_c}$ is the vector of penalty weights, with $v_j \in \mathbb{R}^{n_y}$ and $w_j \in \mathbb{R}^{n_y}$. The symbol $\circ$ represents the Hadamard product: an entry-wise product of two vectors, such that $a \circ b = [a_1, \ldots, a_n]^T \circ [b_1, \ldots, b_n]^T = [a_1 b_1, \ldots, a_n b_n]^T$.

The resulting relaxed AIO problem is given by:

\[
\begin{aligned}
\min_{y, y_1, x_1, \ldots, y_M, x_M} \quad & \sum_{j=1}^{M} f_j(y_j, x_j) + \sum_{j=1}^{M} \phi_j(c_j(y, y_j)) \\
\text{subject to} \quad & g_j(y_j, x_j) \le 0, \quad j = 1, \ldots, M, \\
& h_j(y_j, x_j) = 0, \quad j = 1, \ldots, M.
\end{aligned} \tag{6}
\]

The solution to the relaxed problem (6) is no longer equal to the solution to the original problem (1), because a relaxation error is introduced by relaxing the coupling constraints. By appropriate selection of the Lagrange multiplier estimates v and penalty weights w, this relaxation error can be driven to zero. In fact, the algorithms we propose in Section 3 solve the decomposed problem for a sequence of penalty parameters.

The FDT of the relaxed AIO problem is illustrated in Fig. 4, where the desired full separability of the subsystem functions can clearly be observed. The figure also shows the coupling of the subsystems through the penalty terms φ_j and the master copy of the shared variables y.

Note that any penalty function can be used to relax the problem. Here, we use the augmented Lagrangian function for a number of reasons. First, the augmented Lagrangian function is continuous and also has continuous first- and second-order derivatives. Second, it avoids the ill-conditioning of the relaxed problem, which is encountered for some classes of penalty functions. Third, it is additively separable with respect to the individual consistency constraints c_j, which allows for a degree of parallelism during distributed optimization in Step 4. Finally, the augmented Lagrangian function has been extensively studied in the field of nonlinear programming, providing a large knowledge base of theory and parameter update strategies (see, e.g., Arora et al. 1991; Bertsekas 1982, 2003, for overviews).

Fig. 4 Functional dependence table for four-element example relaxed AIO problem (6) with fully separable constraint sets

After relaxation of the consistency constraints, the local constraint sets are separable with respect to the subsystem variables, as illustrated in Fig. 4, and we are ready to decompose the problem.
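As a sanity check on the notation, the subsystem penalty (5) translates directly into code. The sketch below is a pure-Python transcription with plain lists standing in for the vectors; the numeric inputs are illustrative.

```python
# Augmented Lagrangian subsystem penalty (5):
#   phi_j = v_j^T (y - y_j) + || w_j o (y - y_j) ||_2^2,
# where o denotes the Hadamard (entry-wise) product.

def phi_j(y, y_j, v_j, w_j):
    c = [yi - yji for yi, yji in zip(y, y_j)]             # c_j = y - y_j
    linear = sum(vi * ci for vi, ci in zip(v_j, c))       # v_j^T c_j
    quad = sum((wi * ci) ** 2 for wi, ci in zip(w_j, c))  # ||w_j o c_j||_2^2
    return linear + quad

# With zero multipliers and unit weights, the penalty is just the
# squared consistency violation:
print(phi_j([1.0, 2.0], [0.0, 2.0], [0.0, 0.0], [1.0, 1.0]))  # -> 1.0
```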

2.3 Step 3: formulation of the decomposed problem

The third transformation, decomposition, builds on the observation that, for fixed shared variables y, the relaxed problem can be decomposed into M independent subproblems, each associated with one of the subsystems.

The block coordinate descent algorithm we propose for Step 4 iterates between solving the relaxed AIO problem (6) for a subset of variables while holding the remaining variables fixed at their previous values. First, the relaxed problem is solved for the master copy of the shared variables $y$, while fixing the remaining variables. Second, we fix $y$ and solve $M$ independent subproblems (possibly in parallel) for the subsystem variables $(y_j, x_j)$, $j = 1, \ldots, M$.

The solution of the relaxed problem with respect to $y$ defines a master problem, $P_0$, in which only the penalty terms have to be included. The remaining functions are independent of $y$ and are therefore constant. The master problem is given by:

\[
\min_{y} \; \sum_{j=1}^{M} \phi_j(c_j(y, y_j)), \tag{7}
\]

which can be solved analytically:

\[
y^* = \arg\min_{y} \sum_{j=1}^{M} \phi_j(c_j(y, y_j)) = \frac{\displaystyle\sum_{j=1}^{M} (w_j \circ w_j \circ y_j) - \frac{1}{2} \sum_{j=1}^{M} v_j}{\displaystyle\sum_{j=1}^{M} (w_j \circ w_j)}, \tag{8}
\]

where the division is performed entry-wise.


Fig. 5 Illustration of the non-nested decomposed problem formulation with parallel subproblem solutions

Similarly, the $M$ disciplinary subproblems $P_j$ in $y_j, x_j$, $j = 1, \ldots, M$, are defined by:

\[
\begin{aligned}
\min_{y_j, x_j} \quad & f_j(y_j, x_j) + \phi_j(c_j(y, y_j)) \\
\text{subject to} \quad & g_j(y_j, x_j) \le 0, \\
& h_j(y_j, x_j) = 0.
\end{aligned} \tag{9}
\]

An illustration of the decomposed problem is given in Fig. 5.

The use of an augmented Lagrangian relaxation in combination with a block coordinate descent method is not new (see, e.g., Fortin and Glowinski 1983). However, what is novel in our approach is the use of the master problem P_0. The introduction of this master problem allows the parallel solution of the subproblems while using the block coordinate descent method. Existing block coordinate descent methods do not include the master problem and, therefore, require a sequential solution of the disciplinary subproblems. With our approach, we facilitate parallel solution of the subproblems, which is highly desirable in MDO. Since the master problem can be solved analytically, the additional costs required are very small.

2.4 Sparsity of coupling

Throughout this section, we have assumed that each of the subsystems' functions depends on all of the shared variables y. In practice, however, subsystems may depend only on a subset of these shared variables. Such sparsity is reflected in the y column of the FDT of the original AIO problem (Fig. 2), which in the case of sparsity is not completely “full.”

It is possible to reflect this coupling sparsity in the problem formulation by using binary selection matrices, similar to those introduced for ATC by Michalek and Papalambros (2005b). The $n_{y_j} \times n_y$ selection matrices $S_j$ for subsystems $j = 1, \ldots, M$ can be defined such that the multiplication $S_j y$ yields those $n_{y_j}$ components of $y$ relevant for subsystem $j$, where $n_{y_j} \le n_y$. To illustrate, assume that $y = [y_1, y_2, y_3]^T$ and that subsystem 1 only depends on $y_1$ and $y_3$, but not on $y_2$. Then $S_1 = [1\ 0\ 0;\ 0\ 0\ 1]$, and $S_1 y = [y_1, y_3]^T$.

In the case of sparsity, the auxiliary variables $y_j \in \mathbb{R}^{n_{y_j}}$ are only introduced for those components of $y$ relevant for subproblem $j$. The consistency constraints $c_j$ for subsystem $j$ are then given by $c_j = S_j y - y_j = 0$. Similarly, the penalty parameters $v_j \in \mathbb{R}^{n_{y_j}}$ and $w_j \in \mathbb{R}^{n_{y_j}}$ are only introduced for this reduced set of consistency constraints.
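The selection-matrix construction can be sketched in a few lines of Python; the example reproduces the S_1 given in the text, with illustrative numeric values for y.

```python
# Binary selection matrix S_j applied to the shared-variable
# vector y. The matrix is stored as a list of rows; each row
# picks out one component of y.

def select(S_j, y):
    # matrix-vector product S_j y, giving the components of y
    # relevant for subsystem j
    return [sum(s * yi for s, yi in zip(row, y)) for row in S_j]

# Subsystem 1 depends only on y1 and y3 (the example in the text):
S_1 = [[1, 0, 0],
       [0, 0, 1]]
y = [10.0, 20.0, 30.0]
print(select(S_1, y))  # -> [10.0, 30.0]
```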

3 Solution algorithms

We solve the decomposed problem for a sequence of penalty parameters {v^k, w^k}. These sequences are chosen such that if k → ∞, then the solution to the decomposed problem converges to the solution of the original non-decomposed problem.

The solution strategy consists of inner and outer loops, similar to the ATC algorithms presented by Michalek and Papalambros (2005a) and Tosserams et al. (2006), and the PD algorithm of DeMiguel and Murray (2006). In the inner loop, the decomposed problem is solved by means of block coordinate descent for fixed penalty parameters. In the outer loop, the penalty parameters are updated based on the solution to the inner loop problem.

3.1 Outer loop: method of multipliers

In the outer loop, the penalty parameters v and w are updated to reduce the relaxation error. This error can be reduced by two mechanisms (Bertsekas 2003):

1. Take v close to λ*.
2. Let w approach infinity.

Here, λ* is the vector of optimal Lagrange multipliers of the consistency constraints c at the solution to the modified AIO problem (3) before relaxation. Many methods are available to use these mechanisms (see, e.g., Arora et al. 1991; Bertsekas 1982, 2003).

Here we use the method of multipliers (Bazaraa et al. 1993; Bertsekas 1982, 2003) to update the penalty parameters $v$ and $w$. The Lagrange multiplier estimates $v^{k+1}$ for iteration $k+1$ are determined by the estimates $v^k$ at iteration $k$, the penalty weights $w^k$ at iteration $k$, and the value of the consistency constraints $c^k$ at the solution to the inner loop problem at iteration $k$. The method of multipliers update for the Lagrange multipliers is given by:

\[
v^{k+1} = v^k + 2 w^k \circ w^k \circ c^k. \tag{10}
\]

We increase the weights by a factor $\beta$ only when the reduction in the consistency constraint value is smaller than some fraction $\gamma$ (Bertsekas 2003). As a result, the penalty weights are only increased when the contribution of the Lagrange multiplier update (10) did not lead to a large enough reduction in the consistency constraint violation.

For the $i$-th consistency constraint $c_i$, $i = 1, \ldots, m_c$, of $c$, the associated penalty weight $w_i$ is updated as:

\[
w_i^{k+1} =
\begin{cases}
w_i^k & \text{if } |c_i^k| \le \gamma |c_i^{k-1}| \\
\beta w_i^k & \text{if } |c_i^k| > \gamma |c_i^{k-1}|
\end{cases}
\quad i = 1, \ldots, m_c, \tag{11}
\]

where $\beta > 1$ and $0 < \gamma < 1$. Typically, $\gamma = 0.25$ and $2 < \beta < 3$ are recommended to speed up convergence (Bertsekas 2003).
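The outer-loop updates (10) and (11) act entry-wise and are straightforward to transcribe. The sketch below uses the parameter values suggested in the text (γ = 0.25 and β between 2 and 3); the helper name is ours, not the paper's.

```python
# Method-of-multipliers outer-loop update, eqs. (10)-(11):
#   v_i^{k+1} = v_i^k + 2 (w_i^k)^2 c_i^k
#   w_i grows by beta only when |c_i| has not shrunk by at
#   least the factor gamma relative to the previous iteration.

def outer_update(v, w, c, c_prev, beta=2.5, gamma=0.25):
    v_new = [vi + 2.0 * wi * wi * ci for vi, wi, ci in zip(v, w, c)]
    w_new = [wi if abs(ci) <= gamma * abs(cpi) else beta * wi
             for wi, ci, cpi in zip(w, c, c_prev)]
    return v_new, w_new

v, w = outer_update([0.0], [1.0], c=[0.8], c_prev=[1.0])
print(v, w)  # -> [1.6] [2.5]  (violation shrank too little, so w grew)
```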

Convergence to local solutions of the modified AIO problem (3) has been proven for the method of multipliers algorithm under mild assumptions: local solutions must satisfy second-order sufficiency conditions, and w must be sufficiently large (see, e.g., Proposition 2.4 in Bertsekas 1982). Under the stricter assumption of convexity, the method of multipliers can be shown to converge to the globally optimal solution of the original AIO problem for any positive penalty weight, as long as the sequence of weights is non-decreasing. The weight update scheme of (11) makes sure that the weights eventually become large enough to assure convergence.

The solution procedure is terminated when two conditions are satisfied. First, the change in the maximal consistency constraint value for two consecutive outer loop iterations must be smaller than some user-defined termination tolerance $\varepsilon > 0$:

\[
\frac{|c_i^k - c_i^{k-1}|}{1 + |y_i^k|} < \varepsilon, \quad i = 1, \ldots, m_c, \tag{12}
\]

where the division by $1 + |y_i^k|$ is used for scaling purposes.

Second, the maximal consistency constraint violation must also be smaller than the tolerance $\varepsilon > 0$:

\[
\frac{|c_i^k|}{1 + |y_i^k|} < \varepsilon, \quad i = 1, \ldots, m_c. \tag{13}
\]
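The two outer-loop tests (12) and (13) can be combined into a single check, sketched below in plain Python (the function name is illustrative).

```python
# Outer-loop termination tests (12) and (13): both the scaled
# change in each consistency violation and the scaled violation
# itself must fall below the tolerance eps.

def outer_converged(c, c_prev, y, eps):
    for ci, cpi, yi in zip(c, c_prev, y):
        scale = 1.0 + abs(yi)
        if abs(ci - cpi) / scale >= eps:   # criterion (12)
            return False
        if abs(ci) / scale >= eps:         # criterion (13)
            return False
    return True

print(outer_converged([1e-5], [1.1e-5], [1.0], eps=1e-4))  # -> True
print(outer_converged([0.1], [0.1], [1.0], eps=1e-4))      # -> False
```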

For problems without a consistent solution, the second criterion will never be satisfied, and the algorithm will not terminate. In such cases, one may omit the second criterion, but at the risk of converging prematurely to a non-consistent solution because of a (locally) small reduction in c. Another option is to monitor the value of the penalty term, which goes to zero for consistent solutions. For non-consistent solutions, the value of the penalty term will go to infinity. Therefore, if the second criterion is not satisfied and the penalty term grows very large, then it is likely that the problem does not have a consistent solution.

3.2 Inner loop: block coordinate descent

The update algorithms of the outer loop require the solution to the relaxed AIO problem for fixed weights. To find the solution to the relaxed AIO problem (6) for fixed weights, we use the iterative block coordinate descent (BCD) algorithm (Bertsekas 2003). Instead of solving the relaxed AIO problem as a whole, the BCD method iterates between solving the master problem P_0 and solving the disciplinary subproblems P_1, ..., P_M in parallel. The BCD method is also known as the “nonlinear Gauss–Seidel” method (Bertsekas and Tsitsiklis 1989) or “alternating optimization” (Bezdek and Hathaway 2002; Fortin and Glowinski 1983).

Convergence to KKT points of the relaxed AIOproblem for fixed penalty parameters has been provenunder mild conditions: global solutions to subproblemsP1, . . . , PM are uniquely attained, and the objectivesf j, j = 1, . . . , M of the relaxed AIO problem are con-tinuously differentiable (Proposition 2.7.1 in Bertsekas2003).

The inner loop BCD algorithm is terminated when the relative change in the objective function value of the relaxed AIO problem for two consecutive inner loop iterations is smaller than some user-defined termination tolerance ε_inner > 0. Let F denote the objective of the relaxed problem (6); then the inner loop is terminated when:

|F^ξ − F^{ξ−1}| / (1 + |F^ξ|) < ε_inner,   (14)

where ξ denotes the inner loop iteration number. The division by 1 + |F^ξ| is used for proper scaling of the criterion for very large as well as very small objectives (Gill et al. 1981). The termination tolerance ε_inner should be smaller than the outer loop termination tolerance ε to assure sufficient accuracy of the inner loop solution. We use ε_inner = ε/100.
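The relative-change test (14) can be sketched as a one-line helper; the name and signature are ours, for illustration only.

```python
def inner_loop_terminated(F_curr, F_prev, eps_inner):
    """Relative-change test (14) on the relaxed-AIO objective F.

    The 1 + |F| denominator keeps the criterion meaningful for
    very large as well as very small objective values.
    """
    return abs(F_curr - F_prev) / (1.0 + abs(F_curr)) < eps_inner
```

For an objective of order 10^6, an absolute change of 0.01 passes a tolerance of 10^-5, whereas the same absolute change would fail for an objective of order 1.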

Fig. 6 Illustration of proposed solution algorithms

An alternative termination strategy is to use looser tolerances when the penalty parameters are still far from their optimal values. The tolerances are tightened as k increases and the penalty parameters approach their optimal values. With this strategy we do not waste costly inner loop iterations on finding a solution to the relaxed problem that is far from the optimal solution. More formally, such an inexact approach uses a different tolerance ε_inner^k for each outer loop iteration. Provided that the sequence {ε_inner^k} is nonincreasing and ε_inner^k → 0, convergence of the inner loop solutions to a KKT point of the original problem has been proven (Proposition 2.14 in Bertsekas 1982).

In this inexact nested method of multipliers (INMOM), more moderate values for β (smaller) and γ (larger) are advised for efficiency. Experiments indicate that β = 1.5 and γ = 0.5 give good results for the examples presented in Section 4.

3.3 Alternating direction method of multipliers

An extreme case of the inexact approach is to terminate the inner loop after just a single BCD iteration. This algorithm, the alternating direction method of multipliers (ADMOM), was introduced by Bertsekas and Tsitsiklis (1989). ADMOM is expected to converge faster than a nested method because no effort is put into costly inner loop iterations. Note that the convergence proof as presented in Proposition 4.2 of Bertsekas and Tsitsiklis (1989) is limited to convex problems. Although convexity is required in these proofs, experiments with non-convex geometric optimization problems in ATC have been successful (Tosserams et al. 2006).

For ADMOM, the Lagrange multiplier update (10) remains unchanged, and the penalty weights are updated by (11), but with more moderate values for β and γ. Experiments indicate that β = 1.1 and γ = 0.9 are suitable choices for ADMOM.
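A skeleton of the ADMOM loop can look as follows. This is a sketch under stated assumptions: the updates (10) and (11) are not reproduced in this section, so we assume the standard method-of-multipliers forms (v ← v + 2 w∘w∘c; w_i increased by factor β when |c_i| has not decreased by factor γ); the callables stand in for the paper's P0, P1..PM, and the consistency constraints c.

```python
def admom(solve_master, solve_subproblems, consistency, v, w,
          beta=1.1, gamma=0.9, eps=1e-3, max_outer=100):
    """Sketch of the alternating direction method of multipliers:
    one master solve and one parallel set of subproblem solves per
    outer iteration, followed by the multiplier and weight updates.
    All names are illustrative placeholders."""
    c_prev = None
    y = None
    for k in range(max_outer):
        y = solve_master(v, w)                  # P0: analytic master solution
        copies = solve_subproblems(y, v, w)     # P1..PM, solvable in parallel
        c = consistency(y, copies)              # inconsistencies c_i
        # assumed multiplier update (10): step size set by the weights
        v = [vi + 2.0 * wi * wi * ci for vi, wi, ci in zip(v, w, c)]
        # assumed weight update (11): grow w_i if c_i decreased too slowly
        if c_prev is not None:
            w = [wi if abs(ci) < gamma * abs(cpi) else beta * wi
                 for wi, ci, cpi in zip(w, c, c_prev)]
        if max(abs(ci) for ci in c) < eps:      # simplified stopping test
            break
        c_prev = c
    return y, v, w
```

On a trivially consistent toy problem (subproblems that simply accept the master targets) the loop terminates after one iteration with unchanged multipliers and weights.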

In Fig. 6, the three solution algorithms presented in this section are illustrated. The figure shows the two main parts of the algorithm: (1) coordination through the penalty updates and the solution of the master problem P0, and (2) distributed optimization by solving the subproblems P_j, j = 1, ..., M in parallel. An illustration of the nested methods (ENMOM and INMOM) is given in Fig. 6a, where the iterative inner and outer loops are visualized. The alternating direction method of multipliers (ADMOM), without the iterative inner loop, is depicted in Fig. 6b.

3.4 Selection of initial penalty weights

Although the above algorithms converge for any positive initial weight, the performance of the outer loop method of multipliers depends on the choice of the initial weight w (see, e.g., Arora et al. 1991; Bertsekas 1982, 2003). When the weights are chosen too small, the Lagrange multiplier estimates of (10) converge only slowly to λ*. With weights that are too large, the inner loop problems become hard to solve because of the ill-conditioning introduced by the large quadratic penalty terms. Moreover, the convergence speed of the inner loop BCD algorithm is adversely affected because large weights cause strong coupling between the subproblems. These increased costs for large weights have also been observed for ATC (see, e.g., Tosserams 2004; Tzevelekos et al. 2003), which uses BCD-like solution algorithms.

Bertsekas (1982) and Arora et al. (1991) suggest setting the initial weights such that the original objective function and the penalty terms are of comparable size, i.e., φ ≈ |f|. To avoid too-large weights that slow down convergence of the BCD inner loop, we propose to choose the weights such that the penalty term is a fraction α of the objective function value: φ ≈ α|f|, with 10⁻³ < α < 1. The lower bound on α is used to avoid too-small weights that may cause slow convergence of the penalty parameter updates (Bertsekas 1982).

For our experiments we initially set v = 0 and take all weights equal, w = w̄, such that φ = w̄² Σ_{j=1}^M ‖c_j‖₂². The initial weights are then selected as:

w̄ = √( α|f| / Σ_{j=1}^M ‖c_j‖₂² )   (15)

where f and c_j, j = 1, ..., M are estimates of the objective function value and the inconsistencies. For many engineering problems, a reasonable (order-of-magnitude) estimate of the objective function value at the optimum can be given. The approach assumes that the estimate of the objective is non-zero, which is often the case in engineering design. If f happens to be zero, we propose to take a non-zero "typical value" for the objective function.

Manually setting estimates for all inconsistencies c_j may be difficult and/or cumbersome. As an alternative, we propose to obtain these estimates by solving the decomposed problem for small weights w and zero Lagrange multipliers v = 0. For these weights, the penalty term will be small compared to the objective function value. As a consequence, the allowed inconsistencies will be large, and the solution of (6) will produce an estimate c_j of the size of the inconsistencies. Since the coupling between the subsystems is very weak for these settings, the inner loop BCD algorithm will converge within only a few iterations.
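Formula (15) itself reduces to a one-liner once the estimates are available; the helper below is illustrative (the name and argument layout are our assumptions).

```python
import math

def initial_weight(f_est, c_norms_sq, alpha=0.1):
    """Initial weight w-bar from (15), so that the penalty term is a
    fraction alpha of the objective: phi ~ alpha * |f|.

    f_est:       order-of-magnitude estimate of the objective value
    c_norms_sq:  estimates of ||c_j||_2^2, j = 1..M, e.g. from a
                 cheap solve with w = 1e-3 and v = 0
    """
    return math.sqrt(alpha * abs(f_est) / sum(c_norms_sq))
```

With the speed reducer numbers from Section 4.1.4 (Σ‖c_j‖₂² = 80.7, α = 0.1), estimates f = 10³ and f = 10⁴ give w̄ ≈ 1.11 and w̄ ≈ 3.52, as quoted there.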

The initial weight selection method as presented above is a heuristic and by no means a fail-safe approach for finding appropriate weights. Practical guidelines for selecting f and α are derived from numerical experiments and are presented in the next section. An alternative weight-setting approach may be to select initial weights such that the gradients (or the Hessians) of the original objective and the penalty term are of comparable size (Arora et al. 1991).

4 Numerical results

In this section, we illustrate the numerical behavior of the proposed decomposition algorithms on a number of analytical test problems. With the first test problem, Golinski's speed reducer design problem, we study the effect of the initial weight on the computational costs required to solve the decomposed problem. From these experiments, we devise guidelines for selecting the fraction parameter α and the estimate of the typical objective function value f in the initial weight selection strategy presented in the previous section.

The second example is a non-convex geometric optimization problem previously used in Analytical Target Cascading papers. Four decompositions, each different in the number of subsystems and shared variables, are compared to investigate the effect of differences in decomposition on the computational costs. Furthermore, the performance of the algorithms presented here is compared to the Inexact Penalty Decomposition method of DeMiguel and Murray (2006), both in its original quadratic penalty function formulation and in a novel augmented Lagrangian formulation.

Computational costs for the proposed solution methods are determined by the number of times the subproblems are solved (i.e., the number of times the algorithm passes through the "distributed optimization" box in Fig. 6). The remaining algorithmic steps, the updates of the penalty parameters and the solution of the master problem P0, are explicit and contribute only little to the total solution costs.

Note that we do not compare computational results for the decomposition algorithms with the all-in-one solution of the examples, since our aim is to facilitate distributed design of the (original) all-in-one problem. The examples are used to illustrate the numerical behavior of the proposed algorithms. Since the examples are small, their all-in-one solution would probably be more efficient than solution through decomposition.

4.1 Example 1: Golinski’s speed reducer

The first test problem is taken from Golinski (1970). The objective of this problem is to minimize the volume of a speed reducer (Fig. 7) subjected to stress, deflection, and geometric constraints. The design variables are the dimensions of the gear itself (x1, x2, x3) and of both shafts (x4, x6 and x5, x7).


Fig. 7 Schematic of the speed reducer

The AIO design problem for the speed reducer is defined by:

min_{z = [x1, ..., x7]^T}  F = Σ_{j=1}^7 F_j

subject to  g_gear = [g5, g6, g9, g10, g11]^T ≤ 0,
            g_shaft,1 = [g1, g3, g7]^T ≤ 0,
            g_shaft,2 = [g2, g4, g8]^T ≤ 0,
            2.6 ≤ x1 ≤ 3.6,   7.3 ≤ x5 ≤ 8.3,
            0.7 ≤ x2 ≤ 0.8,   2.9 ≤ x6 ≤ 3.9,
            17 ≤ x3 ≤ 28,     5.0 ≤ x7 ≤ 5.5,
            7.3 ≤ x4 ≤ 8.3,

where  F1 = 0.7854 x1 x2² (3.3333 x3² + 14.9335 x3 − 43.0934),
       F2 = −1.5079 x1 x6²,       F3 = −1.5079 x1 x7²,
       F4 = 7.477 x6³,            F5 = 7.477 x7³,
       F6 = 0.7854 x4 x6²,        F7 = 0.7854 x5 x7²,

       g1 = (1/(110 x6³)) √( (745 x4/(x2 x3))² + 1.69 × 10⁷ ) − 1,
       g2 = (1/(85 x7³)) √( (745 x5/(x2 x3))² + 1.575 × 10⁸ ) − 1,
       g3 = (1.5 x6 + 1.9)/x4 − 1,    g8 = 1.93 x5³/(x2 x3 x7⁴) − 1,
       g4 = (1.1 x7 + 1.9)/x5 − 1,    g9 = x2 x3/40 − 1,
       g5 = 27/(x1 x2² x3) − 1,       g10 = 5 x2/x1 − 1,
       g6 = 397.5/(x1 x2² x3²) − 1,   g11 = x1/(12 x2) − 1,
       g7 = 1.93 x4³/(x2 x3 x6⁴) − 1.                          (16)

The optimal solution to this problem, obtained with the NPSOL implementation in Tomlab (Holmström et al. 2004), is (rounded) z* = [3.50, 0.70, 17.00, 7.30, 7.72, 3.35, 5.29]^T with f(z*) = 2994. Constraints g1, g2, g4, g10 are active at the solution, as well as the lower bounds on x2, x3, and x4. The functional dependence table for the original AIO problem (16) is given in Fig. 8a.

4.1.1 Decomposed formulation

We decompose the AIO problem (16) into three subsystems. Subsystem 1 is concerned with designing the gear, while subsystems 2 and 3 are associated with the design of shafts 1 and 2, respectively. The local objective of the gear subsystem is f1 = F1, and its local constraints are g1 = [g_gear]. The local objective of subsystem 2 is f2 = F2 + F4 + F6, and the local constraints are g2 = [g_shaft,1]. For subsystem 3, the local objective is given by f3 = F3 + F5 + F7, and the local constraints are g3 = [g_shaft,2]. The functional dependence table for the relaxed AIO problem is displayed in Fig. 8b.

For this decomposition, the vector of shared variables is y = [x1, x2, x3]^T; these are the design variables associated with the gear. Subsystem 1 has no local design variables: x_1 = []. The local variables of subsystem 2 are the length and diameter of shaft 1: x_2 = [x4, x6]^T. Similarly, the local variables of subsystem 3 are the length and diameter of shaft 2: x_3 = [x5, x7]^T.

After decomposition, the optimization subproblems (9) for the three subsystems are:

Subsystem 1 (gear):
  find  x1^[1], x2^[1], x3^[1]
  min   f1 = F1 + φ(x1^[0] − x1^[1]) + φ(x2^[0] − x2^[1]) + φ(x3^[0] − x3^[1])
  s.t.  g1 = g_gear ≤ 0

Subsystem 2 (shaft 1):
  find  x1^[2], x2^[2], x3^[2], x4, x6
  min   f2 = F2 + F4 + F6 + φ(x1^[0] − x1^[2]) + φ(x2^[0] − x2^[2]) + φ(x3^[0] − x3^[2])
  s.t.  g2 = g_shaft,1 ≤ 0

Subsystem 3 (shaft 2):
  find  x1^[3], x2^[3], x3^[3], x5, x7
  min   f3 = F3 + F5 + F7 + φ(x1^[0] − x1^[3]) + φ(x2^[0] − x2^[3]) + φ(x3^[0] − x3^[3])
  s.t.  g3 = g_shaft,2 ≤ 0

where φ is the augmented Lagrangian penalty function of (4), and the bracketed top-right index denotes the subsystem at which the shared variable copy is computed (index [0] refers to the master problem).


Fig. 8 Functional dependence tables for the speed reducer design problem

4.1.2 Experiments setup

Experiments are set up to study the effect of two aspects on the convergence behavior of the nested method of multipliers, both exact (ENMOM) and inexact (INMOM), and of the alternating direction method of multipliers (ADMOM). The two aspects are the initial design and the initial penalty weight.

To study the effect of the initial design, five designs (Table 1) are selected with components randomly chosen within the variable bounds.

To study the effect of the initial penalty weights, the decomposed problem is solved for a range of values: we vary the initial weights between 10⁻¹ and 10². The update factors β and γ in (11) are set to β = 2, γ = 0.25 (ENMOM); β = 1.5, γ = 0.5 (INMOM); and β = 1.1, γ = 0.9 (ADMOM). For all experiments, the initial Lagrange multiplier estimates v are set to zero.

The outer loop termination tolerance in (12) and (13) is set to ε = 10⁻³. The constant inner loop termination tolerance for ENMOM is set to ε_inner = 10⁻⁵. The tolerance ε_inner^k for INMOM is decreased from 10⁰ to 10⁻⁵ over ten iterations, after which it remains at the constant value of 10⁻⁵:

ε_inner^k = max(10⁻⁵, 10^(−0.5k)).   (17)
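Schedule (17) is straightforward to implement; the helper name below is ours.

```python
def eps_inner(k):
    """Inexact inner-loop tolerance schedule (17): decreases from
    10^0 to 10^-5 over ten outer iterations, then stays constant."""
    return max(1e-5, 10.0 ** (-0.5 * k))
```

So the tolerance is 1 at k = 0, 0.1 at k = 2, and is clamped at 10⁻⁵ from k = 10 onwards.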

The solution procedure is also stopped when the total number of subproblem optimizations exceeds 1,000.

The master optimization problem is solved analytically using (8), and the optimization subproblems are solved with NPSOL using analytical gradients and default options.

Table 1 Initial designs for speed reducer problem (16)

Initial design    x1      x2      x3       x4      x5      x6      x7
1                 3.4980  0.7029  20.1022  7.4879  7.9088  3.1094  5.4989
2                 2.6168  0.7440  18.6323  7.9214  8.0544  3.1806  5.2636
3                 2.9124  0.7182  25.5718  8.1095  8.0233  2.9453  5.3066
4                 2.9352  0.7584  27.0450  7.8013  7.7175  3.6995  5.4453
5                 3.1134  0.7390  27.2370  7.5827  8.0699  3.6276  5.4573

Fig. 9 Example 1: computational costs required for convergence as a function of the initial penalty weight. Each line represents results from a specific initial design (see Table 1). Gray markers represent experiments that did not converge within 1,000 subproblem optimizations or that converged prematurely

4.1.3 Numerical results

Figure 9 displays the total number of subproblem optimizations required for convergence as a function of the initial penalty weight for ENMOM, INMOM, and ADMOM. Each line represents the computational costs required to reach a solution starting from one of the five initial designs. Notably, the effect of the initial design is small.

For some experiments with large initial weights, we observed premature convergence of the solution algorithms INMOM and ADMOM: the final solution estimate was still relatively far from the known optimal solution although the stopping conditions were satisfied. To filter out experiments that converged prematurely, we include only experiments for which the solution error e is smaller than 10⁻². The (scaled) solution error at the final iteration K is given by:

e = max_{i=1,...,n} |z_i* − z_i^K| / (1 + |z_i*|)   (18)

where z^K is the solution estimate¹ after K outer loop iterations. In Fig. 9, experiments that converged prematurely are represented by gray markers. Experiments that required the maximal number of subproblem optimizations (1,000) are also indicated by gray markers. Experiments that converged to a solution with an error e smaller than 10⁻² within 1,000 subproblem optimizations are indicated by black markers.

¹ The value of the shared variables y^K at iteration K is taken as the average over the original and all copies: y^K = (y^K + Σ_{j=1}^M y_j^K)/(1 + M).
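The scaled error measure (18) is a per-component worst case; a minimal sketch (function name ours):

```python
def solution_error(z_final, z_opt):
    """Scaled solution error (18): the worst per-component
    deviation of the final estimate z^K from the known optimum z*."""
    return max(abs(zo - zf) / (1.0 + abs(zo))
               for zf, zo in zip(z_final, z_opt))
```

An exact match gives e = 0; a single component off by 0.1 around a component of size 1 gives e = 0.05.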

The costs of the algorithms (Fig. 9) increase when the initial weights become smaller. This behavior is characteristic of the method of multipliers and is caused by the influence of the penalty weights on the Lagrange multiplier updates (Bertsekas 2003). In the Lagrange multiplier updates of (10), the penalty weights determine the step size of the update. A small initial weight results in a small initial step size. For very small weights, the initial step sizes are too small to force convergence. The step size is, however, consecutively increased by increasing the penalty weights according to (11). After a number of outer loop weight updates, the step size is large enough for convergence. For slightly larger weights, the initial step size is already large enough to force convergence from the first iteration onwards.

However, when initial weights are too large, the number of subproblem optimizations greatly increases for ENMOM (Fig. 9a). These extra costs are mainly caused by an increase in the coupling strength between subproblems due to the larger weights. These large weights introduce large off-block-diagonal terms in the objective Hessian of the relaxed AIO problem (6). Due to these large off-block-diagonal terms, the coupling between subproblems is increased, which slows down the inner loop BCD algorithm. This effect is especially strong in the first iteration, where the initial design is typically far away from the optimal design.

INMOM and ADMOM show premature convergence when weights become too large (gray markers): the algorithms terminate at inaccurate solutions for weights larger than 5. The premature convergence is caused by the nature of the INMOM and ADMOM algorithms (inexact inner loop) and is also observed for other test problems.

In general, the looser inner loop termination tolerances for INMOM (Fig. 9b) and ADMOM (Fig. 9c) reduce the computational costs with respect to ENMOM. Excessive computational costs in the (first few) inner loop iterations are avoided with the looser tolerances.

All three algorithms show a "critical weight" value at which a large change in behavior can be observed. For initial weights smaller than this critical value, costs are relatively low. When initial weights are larger, however, the costs become very high or the algorithm terminates before reaching an accurate solution. The critical weight also coincides with the weight for which the lowest computational cost is obtained. For this example problem, the critical weight is w* ≈ 5.

4.1.4 Initial weight selection

To investigate its behavior, the initial weight setting strategy of (15) is applied to the above example problem. First, we solve the decomposed problem with w = 10⁻³ to obtain Σ_{j=1}^M ‖c_j‖₂² = 80.7. As an estimate of the objective function value, we use three different settings: 10³, 10⁴, and 10⁵ (note that the optimal value for the speed reducer problem is 2,994). With w* = 5 and (15), we can determine the value of α that would yield w = w*: the values of α are 2.02, 0.202, and 0.020 for f = 10³, 10⁴, and 10⁵, respectively. Given the results presented in Fig. 9, an initial weight larger than the critical weight w* ≈ 5 is highly undesirable. Therefore, we suggest using a moderate value of α = 0.1 to provide a margin of safety against overestimation of the objective function value (estimates f = 10³ and f = 10⁴ yield w = 1.11 and w = 3.52, both below w* = 5).

4.2 Example 2: geometric optimization problem

The second example is a non-convex geometric optimization problem, which also appeared in earlier work on Analytical Target Cascading (Kim 2001; Michalek and Papalambros 2005a; Tosserams et al. 2006; Tzevelekos et al. 2003). Four different decompositions of the problem are presented here, which allows us to study the effect of the choice of decomposition on computational costs.

We compare the performance of our algorithms to two implementations of the Inexact Penalty Decomposition (IPD) method of DeMiguel and Murray (2006). The first implementation, IPD-QP, uses the quadratic penalty function as originally proposed by DeMiguel and Murray (2006). The second implementation, IPD-AL, is a novel variant that uses an augmented Lagrangian formulation instead of the quadratic penalty function, for fairness of comparison.

The all-in-one geometric optimization problem is given by:

min_{z1,...,z14}  f = z1² + z2²

subject to  g1 = (z3⁻² + z4²) z5⁻² − 1 ≤ 0
            g2 = (z5² + z6⁻²) z7⁻² − 1 ≤ 0
            g3 = (z8² + z9²) z11⁻² − 1 ≤ 0
            g4 = (z8⁻² + z10²) z11⁻² − 1 ≤ 0
            g5 = (z11² + z12⁻²) z13⁻² − 1 ≤ 0
            g6 = (z11² + z12²) z14⁻² − 1 ≤ 0
            h1 = (z3² + z4⁻² + z5²) z1⁻² − 1 = 0
            h2 = (z5² + z6² + z7²) z2⁻² − 1 = 0
            h3 = (z8² + z9⁻² + z10⁻² + z11²) z3⁻² − 1 = 0
            h4 = (z11² + z12² + z13² + z14²) z6⁻² − 1 = 0
            z1, ..., z14 > 0                                   (19)

Fig. 10 Example 2: decomposition details for the geometric optimization problem. Each box represents a subsystem, and the connections between the boxes represent variables shared by subsystems. The fields in each box represent the local variables, local objectives, and local constraints of the associated subsystem


Table 2 Initial designs for geometric optimization example problem (19)

Initial design  z1      z2      z3      z4      z5      z6      z7      z8      z9      z10     z11     z12     z13     z14
1               3.6745  0.9556  2.3038  0.0279  4.9149  3.6668  3.9694  3.6564  1.1721  1.9595  3.2761  2.8287  0.9295  2.4248
2               3.4366  2.1123  2.2868  1.4870  2.7634  1.8794  4.5998  0.9695  2.7439  3.1366  4.1879  3.5827  3.5032  0.5731
3               1.7306  4.2799  2.2534  0.2458  2.0004  0.0494  4.2236  4.5241  4.6579  3.4954  1.8580  2.5566  4.9135  3.3243
4               0.8302  2.4512  2.0611  3.4659  0.9939  2.0993  1.8388  2.8460  1.6760  1.9859  2.1263  3.8820  4.0332  1.8269
5               0.7781  4.0797  4.5080  3.2505  3.1260  3.7683  3.1040  3.1589  3.2777  2.0681  2.9733  2.4467  3.5178  0.7002

The unique optimal solution to this problem, obtained with NPSOL, is (rounded) z* = [2.84, 3.09, 2.36, 0.76, 0.87, 2.81, 0.94, 0.97, 0.87, 0.80, 1.30, 0.84, 1.76, 1.55]^T with f(z*) = 17.59 and all constraints active.
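As with the first example, the expressions in (19) can be sanity-checked at the rounded optimum; the script is ours, and the small residuals reflect the rounding of z*.

```python
z = [2.84, 3.09, 2.36, 0.76, 0.87, 2.81, 0.94,
     0.97, 0.87, 0.80, 1.30, 0.84, 1.76, 1.55]  # rounded z*
z1, z2, z3, z4, z5, z6, z7, z8, z9, z10, z11, z12, z13, z14 = z

f = z1**2 + z2**2                                # objective of (19)
g1 = (z3**-2 + z4**2) * z5**-2 - 1               # first inequality
h1 = (z3**2 + z4**-2 + z5**2) * z1**-2 - 1       # first equality
```

At the rounded z*, f is close to the reported 17.59, and g1 and h1 are near zero, consistent with all constraints being active.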

4.2.1 Decomposed formulation

Four decompositions are selected to illustrate their effect on the performance of the presented solution strategies. Two decompositions (1 and 3) are new, while decompositions 2 and 4 were presented by Kim (2001) and Tzevelekos et al. (2003), respectively. Details of the four decompositions are given in Fig. 10. Note that the number of shared variables increases with the decomposition index. Decompositions with a larger number of shared variables are expected to be harder to solve and therefore to require more computational effort.

4.2.2 Experiments setup

All four decompositions are solved with ENMOM, INMOM, ADMOM, and the two IPD methods for five initial designs with components randomly selected between 0 and 5 (Table 2).

We use the initial weight setting strategy with f = 10 and α = 0.1, and obtain the estimates for c_j, j = 1, ..., M by solving the decomposed problems with w = 10⁻³. The weight setting is verified for ENMOM, INMOM, and ADMOM by experiments with fixed initial weights varying between 10⁻² and 10², similar to the previous example.

The penalty update parameters β and γ in (11) are set to β = 2, γ = 0.25 (ENMOM and IPD-AL); β = 1.5, γ = 0.5 (INMOM); and β = 1.1, γ = 0.9 (ADMOM). For IPD-QP we take w^{k+1} = β w^k, with β = 2. For all experiments, the initial Lagrange multiplier estimates v are set to zero.

The algorithms are terminated when the stopping criteria (12) and (13) are satisfied, with four different values for the termination tolerance: ε = 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵. The inner loop termination tolerance for ENMOM is set to ε_inner = ε/100; for INMOM, the inner loop tolerances are reduced from 10⁰ to ε_inner^k = ε/100 over ten iterations, similar to the scheme of (17). In the IPD algorithms, the line-search parameter σ is set to 10⁻⁴. For ENMOM, INMOM, and ADMOM, the master problem is solved analytically using (8), and the subproblems for all methods are solved with NPSOL using analytical gradients and default settings.

4.2.3 Numerical results

Results for ENMOM, INMOM, and ADMOM for all four decompositions are similar to the results displayed in Fig. 9 for the speed reducer example problem. Again, the influence of the initial design on computational costs is small. Similar to the speed reducer example, a critical weight can be observed above which the performance of the algorithms rapidly decreases. Surprisingly, the critical weights for the decompositions are very similar, w* ≈ 2. Apparently, the critical weight depends mainly on the original non-decomposed AIO problem. In all four decompositions, the suggested weight setting strategy yielded initial weights below the critical value w* = 2: the computed weights were w = 1.1, 0.9, 0.7, and 0.5 for the four decompositions, respectively.

Fig. 11 Example 2: numerical results. Average number of required subproblem optimizations as a function of the solution error as defined in (18) for five solution algorithms. Markers represent experiments with ε = 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵ (markers from left to right)

Figure 11 summarizes the results for all five solution strategies for the four different termination tolerances. Each line depicts the required number of subproblem optimizations, averaged over the five initial designs for each algorithm, as a function of the final solution error, as defined in (18). The markers on each line represent experiments with ε = 10⁻², 10⁻³, 10⁻⁴, and 10⁻⁵, respectively (from left to right). The only outlier, for IPD-QP in decomposition 4 with ε = 10⁻⁵, is caused by the ill-conditioning introduced through the quadratic penalty for this method. For the remaining experiments, the average number of required function evaluations per subproblem optimization was around ten, which indicates that the influence of the ill-conditioning introduced by the quadratic penalty term is small.

Of the proposed methods, the inexact algorithms INMOM and ADMOM again outperform ENMOM with respect to subproblem optimizations and function evaluations for all decompositions. For decompositions 2, 3, and 4, ADMOM also outperforms INMOM and has the lowest overall solution cost. The advantage is gained by avoiding the iterative inner loop and performing only a single inner loop iteration.

When compared to the nested IPD algorithms, our proposed alternating augmented Lagrangian decomposition method shows similar or better performance. IPD does not benefit here from the superlinear convergence rate of its inner loop (the block coordinate descent used in our method has only a linear convergence rate). Apparently, the subproblem optimizations required for the line search step in IPD nullify this advantage for this example.

The results for all four decompositions show that the required solution costs increase as the number of shared variables becomes larger. For ADMOM the increase is small, whereas for the other methods the solution costs increase substantially. Apparently, the number of shared variables has only little influence on the performance of ADMOM.

The results of Fig. 11 advocate the use of our ADMOM variant or the nested IPD-AL method. Both have low computational costs, those of ADMOM being a factor of 2–3 lower than IPD-AL. A further advantage of the ADMOM method is that the sequence of shared "targets" y sent from the master problem to the subproblems is smoother. As an illustration, Fig. 12 gives the iteration paths for our ADMOM variant and for the IPD-AL method. The steady approach of ADMOM is due to the calculation of the targets in the master problem as a kind of weighted average. The IPD-AL master problem uses a quasi-Newton method with a back-tracking line search, which causes the less smooth iteration path. This behavior is even more pronounced for IPD-QP.

Fig. 12 Example 2, decomposition 1: iteration path history towards the final solution (y* = [z3*, z11*]^T) for the master problem shared variables y. The iteration path for IPD includes line search trial points

When comparing IPD-QP to IPD-AL, cost reductions of around 60% can be observed for the augmented Lagrangian formulation, with the benefits becoming larger for more accurate solutions. We observed the same result in previous work on ATC (Tosserams et al. 2006). Based on this observation, we expect that the augmented Lagrangian formulation may be a powerful aid in the further development of bi-level decomposition methods.

5 Conclusions and discussion

This paper presented an augmented Lagrangian decomposition method for quasi-separable MDO problems. The decomposition method is derived from available algorithms in the nonlinear programming literature. The main techniques used are an augmented Lagrangian relaxation and the block coordinate descent algorithm. Using existing convergence results, local convergence of the solution algorithms to KKT points of the original decomposed problem can be proven under mild conditions.

The alternating direction method of multipliers (ADMOM), which carries out just one master problem and one set of disciplinary subproblem optimizations in the inner loop, has proven to be robust and most efficient (about 20–60 subproblem optimizations required for solution errors of 10⁻² to 10⁻⁵). To assure that a solution obtained with ADMOM is not a result of premature convergence, the algorithm can be restarted from the current solution with a smaller initial weight (typically a factor of 10 smaller).

Compared to existing MDO decomposition methods for quasi-separable problems, the proposed method has several distinct advantages. In addition to the availability of theoretical convergence results, the formulation is such that the coordinating master problem can be solved analytically, and the disciplinary optimization subproblems can be solved using efficient gradient-based optimization algorithms. The method alternates between solving the master problem and the disciplinary subproblems, with the freedom to choose the number of iterations before the penalty parameters are updated in the outer loop.

An initial performance comparison of our proposed method and the IPD method of DeMiguel and Murray has been carried out. Although the results for the examples seem to be in favor of our proposed method, a more elaborate comparison, including other decomposition methods for quasi-separable problems, is desired to address issues such as efficiency, robustness, parameter selection, and scalability on a wider range of problems.

This paper furthermore shows that an augmented Lagrangian formulation of IPD can significantly reduce the computational costs compared to the quadratic penalty proposed originally by DeMiguel and Murray (2006). Besides the observed efficiency improvements, the augmented Lagrangian variant of IPD may provide other benefits as well, given the large body of augmented Lagrangian theory, e.g., allowing inexact master and subproblem solutions.

Acknowledgements The authors would like to thank the reviewers for their comments and suggestions, which helped to improve the paper.

References

Alexandrov NM, Lewis RM (2002) Analytical and computational aspects of collaborative optimization for multidisciplinary design. AIAA J 40(2):301–309

Allison JT, Kokkolaras M, Zawislak MR, Papalambros PY (2005) On the use of analytical target cascading and collaborative optimization for complex system design. In: Proceedings of the 6th world congress on structural and multidisciplinary optimization, Rio de Janeiro, Brazil

Arora JS, Chahande AI, Paeng J (1991) Multiplier methods for engineering optimization. Int J Numer Methods Eng 32(7):1485–1525

Balling RJ, Sobieszczanski-Sobieski J (1996) Optimization of coupled systems: a critical overview of approaches. AIAA J 34(1):6–17

Bazaraa MS, Sherali HD, Shetty CM (1993) Nonlinear programming: theory and algorithms. Wiley, New York

Bertsekas DP (1982) Constrained optimization and Lagrange multiplier methods. Academic, New York

Bertsekas DP (2003) Nonlinear programming, 2nd edn. Athena Scientific, Belmont, Massachusetts (2nd printing)

Bertsekas DP, Tsitsiklis JN (1989) Parallel and distributed computation. Prentice-Hall, Englewood Cliffs, NJ

Bezdek JC, Hathaway R (2002) Some notes on alternating optimization. Lect Notes Comput Sci 2275:288–300

Braun RD (1996) Collaborative optimization: an architecture for large-scale distributed design. Ph.D. thesis, Stanford University

Braun RD, Moore AA, Kroo IM (1997) Collaborative approach to launch vehicle design. J Spacecr Rockets 34:478–486

Cramer EJ, Dennis JE, Frank PD, Lewis RM, Shubin GR (1994) Problem formulation for multidisciplinary optimization. SIAM J Optim 4(4):754–776

DeMiguel AV, Murray W (2000) An analysis of collaborative optimization methods. In: Eighth AIAA/USAF/NASA/ISSMO symposium on multidisciplinary analysis and optimization, AIAA Paper 00-4720

DeMiguel AV, Murray W (2006) A local convergence analysis of bilevel decomposition algorithms. Optim Eng 7:99–133

Fortin M, Glowinski R (1983) Augmented Lagrangian methods: application to the numerical solution of boundary-value problems. North-Holland, Amsterdam, The Netherlands

Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic, London, UK

Golinski J (1970) Optimal synthesis problems solved by means of nonlinear programming and random methods. J Mech 5:287–309

Haftka RT, Watson LT (2005) Multidisciplinary design optimization with quasiseparable subsystems. Optim Eng 6:9–20

Holmström K, Göran AO, Edvall MM (2004) User's guide for TOMLAB 4.2. Tomlab Optimization Inc., San Diego, CA. http://tomlab.biz (website date of access: 23 November 2006)

Kim HM (2001) Target cascading in optimal system design. Ph.D. thesis, University of Michigan

Kokkolaras M, Fellini R, Kim HM, Michelena NF, Papalambros PY (2002) Extension of the target cascading formulation to the design of product families. Struct Multidiscipl Optim 24(4):293–301

Liu B, Haftka RT, Watson LT (2004) Global–local structural optimization using response surfaces of local optimization margins. Struct Multidiscipl Optim 27(5):352–359

Michalek JJ, Papalambros PY (2005a) An efficient weighting update method to achieve acceptable inconsistency deviation in analytical target cascading. ASME J Mech Des 127:206–214

Michalek JJ, Papalambros PY (2005b) Technical brief: weights, norms, and notation in analytical target cascading. ASME J Mech Des 127:499–501

Michelena N, Kim HM, Papalambros PY (1999) A system partitioning and optimization approach to target cascading. In: Proceedings of the 12th international conference on engineering design, Munich, Germany

Michelena NF, Park H, Papalambros PY (2003) Convergence properties of analytical target cascading. AIAA J 41(5):897–905



Sobieski IP, Kroo I (2000) Collaborative optimization using response surface estimation. AIAA J 38(10):1931–1938

Sobieszczanski-Sobieski J (1988) Optimization by decomposition: a step from hierarchic to non-hierarchic systems. In: 2nd NASA Air Force symposium on recent advances in multidisciplinary analysis and optimization, Hampton, Virginia, NASA-CP 3031

Sobieszczanski-Sobieski J, Agte JS, Sandusky Jr RR (2000) Bilevel integrated system synthesis. AIAA J 38(1):164–172

Sobieszczanski-Sobieski J, Altus TD, Phillips M, Sandusky Jr RR (2003) Bilevel integrated system synthesis for concurrent and distributed processing. AIAA J 41(10):1996–2003

Tosserams S (2004) Analytical target cascading: convergence improvement by subproblem post-optimality sensitivities. Master's thesis, Eindhoven University of Technology, The Netherlands, SE-420389

Tosserams S, Etman LFP, Papalambros PY, Rooda JE (2006) An augmented Lagrangian relaxation for analytical target cascading using the alternating direction method of multipliers. Struct Multidiscipl Optim 31(3):176–189. DOI 10.1007/s00158-005-0579-0

Tzevelekos N, Kokkolaras M, Papalambros PY, Hulshof MF, Etman LFP, Rooda JE (2003) An empirical local convergence study of alternative coordination schemes in analytical target cascading. In: Proceedings of the 5th world congress on structural and multidisciplinary optimization, Lido di Jesolo, Venice, Italy

Wagner TC (1993) A general decomposition methodology for optimal system design. Ph.D. thesis, University of Michigan, Ann Arbor