

Numer Algor
DOI 10.1007/s11075-013-9757-0

ORIGINAL PAPER

A model-hybrid approach for unconstrained optimization problems

Fu-Sheng Wang · Jin-Bao Jian · Chuan-Long Wang

Received: 29 December 2012 / Accepted: 6 August 2013
© Springer Science+Business Media New York 2013

Abstract In this paper, we propose a model-hybrid approach for nonlinear optimization that employs both the trust region method and the quasi-Newton method, and that can avoid repeatedly re-solving the trust region subproblem when the trial step is not acceptable. In particular, unlike traditional trust region methods, the new approach does not use a single approximate model from beginning to end, but instead adaptively employs a quadratic model or a conic model at every iteration. We show that the new algorithm preserves the strong convergence properties of trust region methods. Numerical results are also presented.

Keywords Nonlinear programming · Unconstrained optimization · Trust region methods · Approximate model · Hybrid approach

1 Introduction

In this paper, we consider the following unconstrained nonlinear programming problem:

$$\min_{x \in \mathbb{R}^n} f(x) \qquad (1.1)$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function.

This work is supported by the National Natural Science Foundation of China (11171250) and the Natural Science Foundation of Shanxi Province of China (2011011002-2).

F.-S. Wang (✉) · C.-L. Wang
Department of Mathematics, Taiyuan Normal University, Taiyuan 030012, China
e-mail: [email protected]

J.-B. Jian
College of Mathematics and Information Science, Yulin Normal University, Yulin, Guangxi 537000, China


Trust region methods are powerful optimization methods for solving problem (1.1), and many different versions based on the trust region strategy have been presented so far. The basic idea of the traditional trust region method is as follows: at each iteration, it produces an iterate xk as the approximate minimizer of a relatively simple model function within a region in which the algorithm 'trusts' that the model function behaves like f.

At present, there are mainly two approximation models. One is the quadratic model of f(xk + s) − f(xk), which takes the form

$$\varphi_k(s) = g_k^T s + \frac{1}{2}\, s^T B_k s, \qquad (1.2)$$

where s = x − xk, gk = ∇f(xk), and Bk ∈ Rn×n is a symmetric matrix that approximates the Hessian of the objective function or is chosen to be the exact Hessian ∇²f(xk). The trust region methods associated with the quadratic model (1.2) are usually called the traditional or standard trust region methods, hereafter denoted by TTRQ. Presently, most trust region methods belong to TTRQ. However, the quadratic model does not take into account information about the function value at the previous iteration, which is useful for the algorithms, and many numerical results show that quadratic model methods often produce a poor prediction of the minimizer of the objective function, with high oscillation. To improve TTRQ methods, some authors applied nonmonotone techniques to TTRQ [1–5], while others sought a kind of combination method, that is, combining trust region methods with line-search methods [6–9].

The other is the conic model of f(xk + s), which takes the form

$$\psi_k(s) = f(x_k) + \frac{g_k^T s}{1 - h_k^T s} + \frac{1}{2}\,\frac{s^T A_k s}{(1 - h_k^T s)^2}. \qquad (1.3)$$

The right-hand expression of (1.3) is the so-called conic function, first introduced to optimization by Davidon [10]; here Ak is an approximate Hessian matrix of f(x) at xk, and hk ∈ Rn is a parametric vector for collinear scaling at the kth iteration, normally called the horizontal vector. If hk = 0, the conic model reduces to a quadratic model. The trust region method associated with the conic model (1.3) is hereafter denoted by TTRC. Sorensen [11] published detailed results on a class of conic model methods and proved that a particular member of this class has Q-superlinear convergence. Ariyawansa [12] derived the collinear scaling Broyden family and established its superlinear convergence results. Sheng [13] further studied the interpolation properties of conic model methods. Sun [14] analyzed several non-quadratic model methods and pointed out that the collinear scaling method is a nonlinear scaling method with scale invariance. Gourgeon and Nocedal [15] also proposed a conic algorithm for solving optimization problems. These methods are all based on line search. Di and Sun [16] first introduced the conic model into the trust region framework and proved that the TTRC method also has strong global and superlinear convergence. In recent years, a variety of conic trust region algorithms have been proposed to improve the efficiency of the TTRC method [17–19].
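To make the two models concrete, the following is a minimal Python sketch that evaluates (1.2) and (1.3) at a trial step s; the arrays g, B, A, h stand for gk, Bk, Ak, hk, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def quadratic_model(s, g, B):
    """Quadratic model (1.2): phi_k(s) = g^T s + (1/2) s^T B s."""
    return g @ s + 0.5 * s @ (B @ s)

def conic_model(s, f_k, g, A, h):
    """Conic model (1.3): f_k + g^T s/(1 - h^T s) + (1/2) s^T A s/(1 - h^T s)^2.
    Defined only while 1 - h^T s > 0, i.e., on one side of the hyperplane."""
    gamma = 1.0 - h @ s
    if gamma <= 0.0:
        raise ValueError("step leaves the region where the conic model is defined")
    return f_k + (g @ s) / gamma + 0.5 * (s @ (A @ s)) / gamma**2
```

With h = 0 the second function reduces to f_k plus the first, mirroring the remark that the conic model generalizes the quadratic one.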


Conic model methods are a generalization of quadratic model methods and have several advantages. First, if the objective function has strongly non-quadratic behavior or its curvature changes severely, the conic model usually approximates the objective function better than the quadratic model. Second, the conic model possesses more degrees of freedom and can incorporate more information in the iterations: it satisfies four interpolation conditions on the function values and the gradient values at the current and the previous points. Using this rich interpolation information may improve the performance of the algorithms. Third, conic model methods have global and local convergence properties similar to those of quadratic model methods. However, numerical results in the literature, such as Grandinetti [20], show that conic algorithms do not exhibit any jump in quality for a general nonlinear objective function, as might be expected.

Line-search quasi-Newton methods also have an advantage over their trust-region counterparts. By employing an appropriate line-search strategy and an appropriate updating strategy for Bk, one can generate a positive definite sequence of approximations to the Hessian of f(x). In contrast, trust region methods have traditionally had to modify Bk in ways that make it a less accurate model of the Hessian of f(x) in order to maintain a positive definite sequence of approximate Hessians [6, 7, 21]. Therefore, line-search quasi-Newton methods tend to compute each iterate more quickly than comparable trust region methods. However, line-search quasi-Newton methods often require more iterations to find a minimizer of f(x) than trust region methods do.

Comparing the TTRQ method with the TTRC method: though both have strong global and local convergence properties, the former is simpler and more suitable for objective functions with good quadratic behavior, while the latter is relatively complex and has advantages for objective functions with non-quadratic behavior. For a general objective function, however, it is very difficult to prejudge whether it has quadratic or non-quadratic behavior. In particular, it is important in implementations to have knowledge of its local shape. In fact, since the initial iterate is usually arbitrary, even for the same objective function, different initial points usually lead to different local shapes being exhibited during the iteration. For example, it is well known that the objective function is very well approximated by the quadratic model in a neighborhood of the minimizer, and it is not desirable to approximate such a simple function with the more complicated conic model there. This may be one reason why the previous conic models show no jump in quality for unconstrained optimization. According to the above analysis, how to select a reasonable local approximation model of the objective function f is worth investigating.

Recently, some authors have employed adaptive, self-adaptive or nonmonotone techniques to tackle the drawbacks of traditional trust region methods. For example, Zhou and Zhang [22] proposed an adaptive trust region algorithm with the nonmonotone technique based on a simple quadratic model; Zhao and Wang [5] presented a nonmonotone self-adaptive trust region algorithm with line search based on the conic model; and Liu and Ma [4] presented a nonmonotone trust region algorithm with a new inexact line search. Numerical experiments show that these methods are effective, but they all focus on trust region methods based on a single model.


Motivated by the above ideas, in this paper we propose a model-hybrid approach that combines elements of a pure quadratic trust region method, a pure conic trust region method and a line-search quasi-Newton method, and that adaptively chooses the quadratic trust region method, the conic trust region method, or the line-search quasi-Newton method according to the current or previous information at the iterate xk. The new algorithm retains the quick convergence of trust region methods, while significantly decreasing the average cost per iteration. The new method, like most trust region methods, also puts few restrictions on the matrix Bk. In particular, the method behaves correctly even if some of the matrices are indefinite or singular.

The organization of this paper is as follows: In Section 2, we briefly describe the traditional trust region methods with the quadratic model and the conic model. In Section 3, we describe the model-hybrid approach and present a new algorithm. In Section 4, we analyze the global and local convergence under certain conditions. In Section 5, we report the numerical results. Finally, we end the paper with conclusions.

We shall use the following notation and terminology. Unless otherwise stated, the vector norm used in this paper is the Euclidean norm on Rn, and the matrix norm is the induced operator norm on Rn×n; g(x) ∈ Rn is the gradient of f evaluated at x, G(x) ∈ Rn×n is the Hessian of f evaluated at x, fk = f(xk), gk = g(xk), and Gk = G(xk).

2 Preliminaries

In this section, we summarize the traditional trust region methods. For an in-depth overview of them, see Conn et al. [23, 24]. For the TTRQ methods, the corresponding subproblem may be formally stated as follows:

$$\min \left\{ \varphi_k(s) = g_k^T s + \frac{1}{2}\, s^T B_k s \;:\; \|N_k s\| \le \delta_k \right\}, \qquad (2.1)$$

where δk is a positive scalar known as the trust region radius, ‖·‖ is any vector norm, and Nk is a scaling matrix chosen to improve the approximation to the problem. There is, however, no consensus on what choice of Nk is appropriate. Denote the local minimizer of subproblem (2.1) by sk, called a trial step; then we compute the ratio

$$\rho_k = \frac{\mathrm{ared}_k}{\mathrm{pred}_k} = \frac{f(x_k) - f(x_k + s_k)}{\varphi_k(0) - \varphi_k(s_k)}. \qquad (2.2)$$

The ratio ρk plays a very important role in trust region methods: it is used to decide whether the trial step sk is acceptable and how to adjust the trust region radius δk. If the step sk is not acceptable, one rejects it, shrinks δk and re-solves the subproblem (2.1) until it is acceptable. The following lemma, established by Powell in [25], has played a very important role in analyzing the global convergence of trust region methods.

Lemma 2.1 If s* ∈ Rn is a solution of subproblem (2.1), then

$$\varphi_k(0) - \varphi_k(s^*) \ge \frac{1}{2}\,\|g\|\min\{\delta,\ \|g\|/\|B\|\}. \qquad (2.3)$$

In fact, the global convergence theory of trust region methods only requires that the computed trial step sk satisfies

$$\varphi_k(0) - \varphi_k(s_k) \ge \beta\,\|g_k\|\min\{\delta_k,\ \|g_k\|/\|B_k\|\}, \qquad (2.4)$$

for all k, where β is a positive constant. Obviously, (2.3) is a special case of (2.4). In previous works, some approximate solutions of (2.1), such as those produced by the dogleg or double dogleg methods [21, 26], the two-dimensional subspace methods [27, 28], the truncated conjugate gradient methods [29, 30], and Newton's methods [6, 31], also satisfy (2.4). Thus, for the TTRQ methods, computing the exact solution of (2.1) is not necessary; an approximate solution satisfying the key inequality (2.4) is enough. For the above reasons, one usually takes these approximate solutions as the trial steps. The TTRQ method can be described as follows:

Algorithm 1 TTRQ Method

step 0 Given x0, B0, δ̄ > 0, δ0 ∈ (0, δ̄), 0 < γ1 < 1 < γ2, 0 < η1 < η2 < 1, ε > 0; set k := 0.

step 1 Compute fk, gk; if ‖gk‖ < ε, stop. Otherwise,

step 2 Obtain sk by solving (2.1).

step 3 Evaluate ρk by (2.2).

step 4 If ρk < η1, set δk = γ1δk and go to step 2.

step 5 Set xk+1 = xk + sk, and

$$\delta_{k+1} = \begin{cases} \min\{\gamma_2\delta_k,\ \bar\delta\}, & \text{if } \rho_k > \eta_2 \text{ and } \|s_k\| = \delta_k \\ \delta_k, & \text{otherwise} \end{cases} \qquad (2.5)$$

step 6 Update Bk to Bk+1, set k := k + 1, and go to step 1.

Typically, the parameters can be chosen as follows: η1 = 0.25, η2 = 0.75, γ1 = 0.5, γ2 = 2, and Nk = I. For TTRQ methods, the main source of computational effort, apart from the function and gradient evaluations required, is the work on the subproblem (2.1) to determine a successful trial step sk. From Algorithm 1, we can see that, when ρk < η1, the subproblem (2.1) may be re-solved several times at each iteration k until the trial step is accepted. Thus, it seems difficult to improve the efficiency only by means of solving the subproblem (2.1).
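For concreteness, here is a minimal Python sketch of Algorithm 1 with Nk = I, using the Cauchy point (the minimizer of (1.2) along −gk within the ball) as an inexpensive approximate solution of (2.1) that satisfies a decrease bound of type (2.4). This is a sketch under those assumptions, not the authors' implementation; the Bk update is left to the caller.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimizer of the quadratic model (1.2) along -g with ||s|| <= delta.
    This cheap step satisfies a sufficient-decrease bound of type (2.4)."""
    gnorm = np.linalg.norm(g)
    gBg = g @ (B @ g)
    tau = 1.0 if gBg <= 0 else min(gnorm**3 / (delta * gBg), 1.0)
    return -(tau * delta / gnorm) * g

def ttrq(f, grad, x, B, delta=1.0, delta_max=20.0, eps=1e-6,
         eta1=0.25, eta2=0.75, gamma1=0.5, gamma2=2.0, max_iter=1000):
    """Sketch of Algorithm 1 (TTRQ) with N_k = I and a Cauchy-point step."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        while True:                                  # steps 2-4: shrink until accepted
            s = cauchy_point(g, B, delta)
            pred = -(g @ s + 0.5 * s @ (B @ s))      # phi_k(0) - phi_k(s)
            rho = (f(x) - f(x + s)) / pred           # ratio (2.2)
            if rho >= eta1:
                break
            delta *= gamma1
        if rho > eta2 and np.isclose(np.linalg.norm(s), delta):
            delta = min(gamma2 * delta, delta_max)   # radius update (2.5)
        x = x + s
        # step 6: B would be updated here (e.g., by a quasi-Newton formula)
    return x
```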

The other traditional trust region methods are the TTRC methods. They can be regarded as a generalization of the TTRQ methods, replacing the quadratic model (1.2) with the conic model (1.3). Di and Sun [16] first presented a trust region method with the conic model in 1996; the trust region subproblem has the form

$$\min \left\{ \psi_k(s) = f_k + \frac{g_k^T s}{1 - h_k^T s} + \frac{1}{2}\,\frac{s^T A_k s}{(1 - h_k^T s)^2} \;:\; \|N_k s\| \le \Delta_k \right\}, \qquad (2.6)$$

where s = x − xk and Nk is a scaling matrix. In order to keep the trust region {s : ‖Nk s‖ ≤ Δk} on one side of the hyperplane {s : 1 − hkᵀs = 0}, we assume that ‖hk‖ · ‖Nk⁻¹‖ · Δk < 1. Similarly to the TTRQ methods, the key to the TTRC methods is to solve (2.6) for a successful trial step sk such that the ratio

$$r_k = \frac{\mathrm{ared}_k}{\mathrm{pred}_k} = \frac{f(x_k) - f(x_k + s_k)}{\psi_k(0) - \psi_k(s_k)} \qquad (2.7)$$

is large enough. If the trial step sk is accepted, then xk+1 = xk + sk; otherwise, the subproblem (2.6) has to be re-solved repeatedly until a satisfactory trial step is obtained.

For TTRC methods, exactly solving the subproblem (2.6) is likewise not necessary to maintain global convergence. Similarly to the TTRQ methods, it is required that the approximate solutions satisfy the convergence condition (2.4). The TTRC method can be described as follows:

Algorithm 2 TTRC Method

step 0: Given x0, A0, h0, Δ̄ > 0, Δ0 ∈ (0, Δ̄), 0 < γ1 < 1 < γ2, 0 < η1 < η2 < 1, ε > 0; set k := 0.
step 1: Compute gk; if ‖gk‖ < ε, stop.
step 2: Obtain sk by solving (2.6).
step 3: Compute rk by (2.7).
step 4: If rk < η1, set Δk = γ1Δk and go to step 2. Otherwise,
step 5: Set xk+1 = xk + sk, and

$$\Delta_{k+1} = \begin{cases} \min\{\gamma_2\Delta_k,\ \bar\Delta\}, & \text{if } r_k \ge \eta_2 \text{ and } \|s_k\| = \Delta_k; \\ \Delta_k, & \text{otherwise.} \end{cases} \qquad (2.8)$$

step 6: Update hk, Ak; set k := k + 1 and go to step 1.

In this algorithm, we usually choose Nk = I, and if ‖hk‖ · Δk ≥ 1, we set Δk = σ/‖hk‖ so that ‖hk‖ · Δk < 1, where 0 < σ < 1. Similarly to the TTRQ method, the TTRC methods may also re-solve the subproblem (2.6) several times at each iteration k until the trial step is accepted.
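A minimal sketch of this radius safeguard (names are illustrative):

```python
import numpy as np

def safeguard_radius(delta, h, sigma=0.5):
    """Shrink the conic trust region radius so that ||h_k|| * Delta_k < 1,
    keeping the region on one side of the hyperplane 1 - h^T s = 0."""
    hnorm = np.linalg.norm(h)
    if hnorm * delta >= 1.0:
        delta = sigma / hnorm
    return delta
```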


3 Algorithms

In this section, we describe a model-hybrid approach based on the trust-region framework.

Algorithm 3

step 0 Given x0, h0, δ̄ > 0, Δ̄ > 0, δ0 ∈ (0, δ̄), Δ0 ∈ (0, Δ̄), B0 = I, A0 = I; 0 < η1 < η2 < 1, 0 < γ1 < 1 < γ2; ε > 0, k := 0.

step 1 If ‖gk‖ ≤ ε, then stop.
step 2 Let vq = 0.
step 3 (TTRQ step) Solve the subproblem (2.1) to obtain a solution sk, and evaluate ρk from equation (2.2).
step 4 If ρk < η1, then go to step 6; otherwise, let vq = 1 and set xk+1 = xk + sk.
step 5 If ‖gk+1‖ ≤ ε or ‖xk+1 − xk‖ ≤ ε, stop; otherwise, update δk by (2.5) and update Bk. Set k = k + 1 and go to step 3.
step 6 Let vc = 0.
step 7 (TTRC step) Solve the subproblem (2.6) to obtain a solution sk, and evaluate rk from equation (2.7).
step 8 If rk ≥ η1, then set vc = 1, xk+1 = xk + sk; otherwise, go to step 10.
step 9 If ‖gk+1‖ ≤ ε or ‖xk+1 − xk‖ ≤ ε, stop; otherwise, update Δk by (2.8) and update Ak, hk. Set k = k + 1 and go to step 7.
step 10 If vc = 0, then compute αk by (3.2), set xk+1 = xk + αk sk, and go to step 11; otherwise, set vq = 0 and go to step 12.
step 11 If ‖gk+1‖ ≤ ε or ‖xk+1 − xk‖ ≤ ε, stop; otherwise, set δk+1 = αk‖sk‖ and update Bk; let k = k + 1 and go to step 2.
step 12 (TTRQ step) Solve the subproblem (2.1) to obtain a solution sk, and evaluate ρk from equation (2.2).
step 13 If ρk ≥ η1, then let vq = 1, xk+1 = xk + sk, and go to step 14; otherwise, go to step 15.
step 14 If ‖gk+1‖ ≤ ε or ‖xk+1 − xk‖ ≤ ε, stop; otherwise, update δk by (2.5) and update Bk; set k = k + 1 and go to step 12.
step 15 If vq = 0, then compute αk by (3.2), set xk+1 = xk + αk sk, and go to step 16; otherwise, go to step 6.
step 16 If ‖gk+1‖ ≤ ε or ‖xk+1 − xk‖ ≤ ε, stop; otherwise, set Δk+1 = αk‖sk‖ and update Ak, hk; let k = k + 1 and go to step 6.

Remark 1 In Algorithm 3, for the TTRQ steps, i.e., steps 3 and 12, we use the algorithm proposed in [6] to solve (2.1). It is proved in [6] that these approximate solutions satisfy the convergence condition (2.4) and

$$s_k^T g_k \le -\beta\,\|g_k\|\min\{\delta_k,\ \|g_k\|/\|B_k\|\}, \qquad (3.1)$$


which implies that sk is also a descent direction. For the TTRC step (step 7), we use a dogleg method proposed in [17] to solve (2.6), where it is proved that the approximate solutions also satisfy (2.4) and (3.1). Thus the solutions sk obtained in both the TTRQ step and the TTRC step can all be used as search directions. If they are not acceptable, then in order not to re-solve the corresponding subproblems repeatedly, we can perform a backtracking line search to obtain a new iterate xk+1 = xk + αk sk, where αk = μ^{jk} is a step length satisfying:

$$f(x_k) - f(x_k + \mu^j s_k) \ge -\nu\,\mu^j g_k^T s_k, \qquad (3.2)$$

where ν > 0 and μ ∈ (0, 1) are constants, and jk is the smallest integer j = 0, 1, 2, ... such that the above inequality holds. This means that steps 10 and 15 are feasible, and Algorithm 3 is well defined.
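A minimal Python sketch of the backtracking rule (3.2), assuming NumPy arrays and that s is a descent direction as guaranteed by (3.1); the parameter names are illustrative:

```python
def backtracking(f, x, s, g, nu=1e-4, mu=0.5, max_backtracks=50):
    """Armijo-type backtracking for (3.2): find the smallest j with
    f(x) - f(x + mu**j * s) >= -nu * mu**j * g^T s."""
    gTs = g @ s
    alpha = 1.0
    for _ in range(max_backtracks):
        if f(x) - f(x + alpha * s) >= -nu * alpha * gTs:
            return alpha
        alpha *= mu
    return alpha  # fall back to the last tried step length
```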

Remark 2 In Algorithm 3, the meanings of the flags vq and vc are explained as follows:

(1) The meaning of the flag vq = 0: if the objective function f appears strongly non-quadratic locally, then we set vq = 0 and jump out of the TTRQ step.

(2) The flag vq = 1 means that the objective function f appears quadratic locally; the iterate xk will then be updated using the quadratic model at least once.

(3) The meaning of the flag vc = 0: if the conic approximation model is not good for the objective function locally, then we set vc = 0 and jump out of the TTRC step.

(4) The flag vc = 1 means that the conic approximation model appears reasonable around the current iterate xk; we then proceed with the TTRC step, and xk will be updated using the conic model at least once.

Remark 3 Algorithm 3 employs several self-adaptive techniques and has the following characteristics (a control-flow sketch follows this list):

(1) In steps 3-5, we first solve a quadratic trust-region subproblem. If ρk < η1, we set the flag vq = 0. This implies that the local quadratic approximation of the objective function may not be good, and the quadratic model can be unsuitable at the current iterate xk. Then we switch from the quadratic model to the conic model. Otherwise, we keep working with the TTRQ method and set the flag vq = 1.

(2) In steps 6-9, we first solve a conic trust-region subproblem. If rk < η1, we set the flag vc = 0. This implies that the local conic approximation of the objective function may not be good, and the conic model can be unsuitable at the current iterate xk. Then we switch to the line-search method for updating the iterate. Otherwise, we keep working with the TTRC method and set the flag vc = 1.

(3) In steps 10-11, if vc = 0, both the quadratic and the conic approximation models may be unsuitable for the objective function in the current trust region. Then we simply use an Armijo line search to update the iterate. Otherwise, we switch from the conic model back to the quadratic model.


(4) In steps 12-14, if vc = 1, the iterate was produced successfully by the TTRC method, which has jumped out of its loop. Then we switch to the quadratic model.

(5) In steps 15-16, if vq = 0, the quadratic model is not suitable in the current trust region; then we use the line-search method to update the iterate and switch to the conic model again. If vq = 1, the iterate was produced successfully by the TTRQ method, which has jumped out of its loop. Then we switch to the conic model.
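The following Python sketch captures the switching logic described above for a single pass; solve_ttrq, solve_ttrc and backtracking stand for the corresponding subproblem solvers and the line search (3.2), and are assumptions for illustration, not the authors' code. It deliberately simplifies the full state machine (the alternation between steps 6-16 is omitted).

```python
def hybrid_step(x, state, solve_ttrq, solve_ttrc, backtracking, eta1=0.25):
    """One pass of the model-hybrid switching logic (sketch of Algorithm 3).
    Returns the next iterate and which model produced it."""
    s, rho = solve_ttrq(x, state)          # TTRQ step on (2.1), ratio (2.2)
    if rho >= eta1:
        return x + s, "quadratic"          # v_q = 1: keep the quadratic model
    s, r = solve_ttrc(x, state)            # v_q = 0: switch to the conic step (2.6)
    if r >= eta1:
        return x + s, "conic"              # v_c = 1: keep the conic model
    alpha = backtracking(x, s, state)      # v_c = 0: fall back to line search (3.2)
    return x + alpha * s, "line-search"
```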

4 Convergence results

In this section, we give the convergence results for the algorithm presented in the previous section. First, we define the following index sets:

S1 = {k | xk is produced by a TTRQ step},
S2 = {k | xk is produced by a TTRC step},
S3 = {k | xk is produced by a quasi-Newton step}.

To analyze the new algorithm, we make the following assumptions.

Assumption 5.1

(1) For any point x0 ∈ Rn, the level set L(x0) = {x ∈ Rn : f(x) ≤ f(x0)} is bounded.

(2) g(x) is Lipschitz continuous, and f is bounded below on the level set L(x0).

Assumption 5.2

(1) There exists a positive scalar M > 0 such that ‖Bk‖ ≤ M for all k.
(2) All approximate solutions of subproblem (2.1) satisfy the inequality (2.4).

Assumption 5.3

(1) There exist positive scalars M1 > 0, M2 > 0 such that ‖Ak‖ ≤ M1 and ‖hk‖ ≤ M2 for all k.

(2) All approximate solutions of subproblem (2.6) satisfy the inequality (2.4).

Under the above assumptions, we discuss the global convergence of the new algorithm.

Theorem 4.1 Suppose that Assumptions 5.1, 5.2 and 5.3 hold. Then Algorithm 3 either stops in a finite number of iterations or generates an infinite sequence {xk} such that

$$\liminf_{k\to\infty} \|g_k\| = 0. \qquad (4.1)$$


Proof

Case 1 The sequence {xk} has k ∈ S1 for all k. In this case, Algorithm 3 reduces to the TTRQ method (see Algorithm 1), and the convergence is well known (see [23, 24]).

Case 2 The sequence {xk} has k ∈ S2 for all k. In this case, Algorithm 3 reduces to the TTRC method (Algorithm 2), and the convergence is given in [17].

Case 3 The sequence {xk} has k ∈ S3 for all k. This case rarely occurs, and if it does, Algorithm 3 reduces to the standard quasi-Newton method; the convergence theorem is standard and omitted here (see [24]).

Case 4 For the sequence {xk}, both S1 and S3 are nonempty, but S2 is empty. In this case, Algorithm 3 reduces to the TRBT method, and the convergence is given in [6].

Case 5 For the sequence {xk}, both S2 and S3 are nonempty, but S1 is empty. In this case, Algorithm 3 reduces to the conic trust region algorithm with backtracking, and the convergence is given in [5].

Case 6 For the sequence {xk}, all sets Si (i = 1, 2, 3) are nonempty. Suppose that Algorithm 3 does not stop in a finite number of iterations.

(1) If S1 is a finite set, then S2 ∪ S3 is infinite. Thus, for all sufficiently large k, k ∈ S2 ∪ S3, which implies that Algorithm 3 turns to Case 2, Case 3, or Case 5.

(2) If S2 is a finite set, then S1 ∪ S3 is infinite. Thus, for all sufficiently large k, k ∈ S1 ∪ S3, which implies that Algorithm 3 turns to Case 1, Case 3, or Case 4.

(3) Suppose S1 and S2 are both infinite sets and S3 is a finite set. In this case, for all sufficiently large k, k ∈ S1 ∪ S2; thus we may as well suppose that S3 is empty. We proceed by contradiction. If the conclusion (4.1) is not true, then there exists a positive constant ε such that

$$\|g_k\| \ge \varepsilon \qquad (4.2)$$

for all k. This, together with Assumptions 5.2 and 5.3, implies that there exists a common constant β > 0 such that

$$\varphi_k(0) - \varphi_k(s_k) \ge \beta\|g_k\|\min\{\delta_k,\ \|g_k\|/\|B_k\|\} \ge \beta\varepsilon\min\{\delta_k,\ \varepsilon/M\}, \qquad (4.3)$$

$$s_k^T g_k \le -\beta\|g_k\|\min\{\delta_k,\ \|g_k\|/\|B_k\|\} \le -\beta\varepsilon\min\{\delta_k,\ \varepsilon/M\}, \qquad (4.4)$$

$$\psi_k(0) - \psi_k(s_k) \ge \beta\|g_k\|\min\{\Delta_k,\ \|g_k\|/\|A_k\|\} \ge \beta\varepsilon\min\{\Delta_k,\ \varepsilon/M_1\}, \qquad (4.5)$$

$$s_k^T g_k \le -\beta\|g_k\|\min\{\Delta_k,\ \|g_k\|/\|A_k\|\} \le -\beta\varepsilon\min\{\Delta_k,\ \varepsilon/M_1\}, \qquad (4.6)$$

for all k. From Algorithm 3, for every k ∈ S1 ∪ S2, we have ρk ≥ η1 and rk ≥ η1.

It follows that

$$\sum_{k=1}^{\infty}(f_k - f_{k+1}) = \sum_{k\in S_1}(f_k - f_{k+1}) + \sum_{k\in S_2}(f_k - f_{k+1})$$
$$\ge \eta_1 \sum_{k\in S_1}\bigl(\varphi_k(0) - \varphi_k(s_k)\bigr) + \eta_1 \sum_{k\in S_2}\bigl(\psi_k(0) - \psi_k(s_k)\bigr)$$
$$\ge \eta_1\beta\varepsilon \sum_{k\in S_1}\min\{\delta_k,\ \varepsilon/M\} + \eta_1\beta\varepsilon \sum_{k\in S_2}\min\{\Delta_k,\ \varepsilon/M_1\}. \qquad (4.7)$$

Since Algorithm 3 ensures that fk is monotonically decreasing, together with Assumption 5.1, we have

$$\sum_{k=1}^{\infty}(f_k - f_{k+1}) < \infty, \qquad (4.8)$$

so that

$$\lim_{k\in S_1,\ k\to\infty} \delta_k = 0, \qquad (4.9)$$

and

$$\lim_{k\in S_2,\ k\to\infty} \Delta_k = 0. \qquad (4.10)$$

On the other hand, since f(x) is twice continuously differentiable on L(x0), there exists a positive scalar M3 such that ‖∇²f(x)‖ ≤ M3 for all x ∈ L(x0).

When k ∈ S1, we obtain

$$|\mathrm{ared}_k - \mathrm{pred}_k| = \Bigl| f(x_k) - f(x_k + s_k) + g_k^T s_k + \tfrac{1}{2}\, s_k^T B_k s_k \Bigr|$$
$$\le \|s_k\|^2 \int_0^1 \|B_k - G(x_k + t s_k)\|(1-t)\,dt \le \frac{1}{2}(M + M_3)\,\delta_k^2, \qquad (4.11)$$

and for sufficiently large k

$$\mathrm{pred}_k = \varphi_k(0) - \varphi_k(s_k) \ge \beta\varepsilon\min\{\delta_k,\ \varepsilon/M\} \ge \beta\varepsilon\,\delta_k. \qquad (4.12)$$


It follows from (4.11) and (4.12) that

$$|\rho_k - 1| = \left|\frac{\mathrm{ared}_k - \mathrm{pred}_k}{\mathrm{pred}_k}\right| \le \frac{(M + M_3)\,\delta_k}{2\beta\varepsilon} \qquad (4.13)$$

holds for sufficiently large k. We have from (4.9) and (4.13) that

$$\lim_{k\in S_1,\ k\to\infty} \rho_k = 1. \qquad (4.14)$$

This suggests that ρk ≥ η2 for all sufficiently large k. By Algorithm 3, this implies that {δk}, k ∈ S1, has a positive lower bound, which contradicts (4.9).

When k ∈ S2, since Δk → 0, we have ‖sk‖ → 0. This, together with the boundedness of {hk}, shows that 1/(1 − hkᵀsk) = 1 + O(‖sk‖). Then, from the boundedness of gk and Ak, we have

$$\frac{g_k^T s_k}{1 - h_k^T s_k} = g_k^T s_k + O(\|s_k\|^2), \qquad \frac{s_k^T A_k s_k}{(1 - h_k^T s_k)^2} = s_k^T A_k s_k + o(\|s_k\|^2). \qquad (4.15)$$

Similarly to the case of k ∈ S1, we obtain

$$|\mathrm{ared}_k - \mathrm{pred}_k| = \left| f(x_k) - f(x_k + s_k) + \frac{g_k^T s_k}{1 - h_k^T s_k} + \frac{1}{2}\,\frac{s_k^T A_k s_k}{(1 - h_k^T s_k)^2} \right|$$
$$= \left| f(x_k) - f(x_k + s_k) + g_k^T s_k + \frac{1}{2}\, s_k^T A_k s_k + O(\|s_k\|^2) \right|$$
$$\le \|s_k\|^2 \int_0^1 \|A_k - G(x_k + t s_k)\|(1-t)\,dt + O(\|s_k\|^2) \le \frac{1}{2}\bigl(M_1 + M_3 + O(1)\bigr)\Delta_k^2, \qquad (4.16)$$

and for sufficiently large k,

$$\mathrm{pred}_k = \psi_k(0) - \psi_k(s_k) \ge \beta\varepsilon\min\{\Delta_k,\ \varepsilon/M_1\} \ge \beta\varepsilon\,\Delta_k. \qquad (4.17)$$

It follows from (4.16) and (4.17) that

$$|r_k - 1| = \left|\frac{\mathrm{ared}_k - \mathrm{pred}_k}{\mathrm{pred}_k}\right| \le \frac{\bigl(M_1 + M_3 + O(1)\bigr)\Delta_k}{2\beta\varepsilon} \qquad (4.18)$$

holds for sufficiently large k. We have from (4.10) and (4.18) that

$$\lim_{k\in S_2,\ k\to\infty} r_k = 1. \qquad (4.19)$$

This shows that rk ≥ η2 for all sufficiently large k. By Algorithm 3, this implies that {Δk}, k ∈ S2, has a positive lower bound, which contradicts (4.10).


(4) Suppose all sets Si (i = 1, 2, 3) are infinite. Define two subsets of S3: S31 = {k | (k − 1) ∈ S1, k ∈ S3} and S32 = {k | (k − 1) ∈ S2, k ∈ S3}; then S3 = S31 ∪ S32. Similarly to the analysis in (3), we have

$$\sum_{k=1}^{\infty}(f_k - f_{k+1}) = \sum_{k\in S_1}(f_k - f_{k+1}) + \sum_{k\in S_2}(f_k - f_{k+1}) + \sum_{k\in S_3}(f_k - f_{k+1}). \qquad (4.20)$$

The only difference between (4.20) and (4.7) is that (4.20) has an additional term. Thus, it suffices to consider the last term of equality (4.20). We have from (3.2), (4.4) and (4.6) that

$$\sum_{k\in S_3}(f_k - f_{k+1}) = \sum_{k\in S_{31}}(f_k - f_{k+1}) + \sum_{k\in S_{32}}(f_k - f_{k+1})$$
$$\ge \sum_{k\in S_{31}}\bigl(-\nu\alpha_k g_k^T s_k\bigr) + \sum_{k\in S_{32}}\bigl(-\nu\alpha_k g_k^T s_k\bigr)$$
$$\ge \nu\beta\varepsilon\Bigl[\sum_{k\in S_{31}}\alpha_k\min\{\delta_k,\ \varepsilon/M\} + \sum_{k\in S_{32}}\alpha_k\min\{\Delta_k,\ \varepsilon/M_1\}\Bigr]$$
$$\ge \nu\beta\varepsilon\Bigl[\sum_{k\in S_{31}}\alpha_k\delta_k + \sum_{k\in S_{32}}\alpha_k\Delta_k\Bigr] \ge \nu\beta\varepsilon\Bigl[\sum_{k\in S_{31}}\delta_{k+1} + \sum_{k\in S_{32}}\Delta_{k+1}\Bigr], \qquad (4.21)$$

which yields that

$$\lim_{k\in S_{31},\ k\to\infty} \delta_{k+1} = 0, \qquad \lim_{k\in S_{32},\ k\to\infty} \Delta_{k+1} = 0. \qquad (4.22)$$

On the other hand, from Lemma 3.4 given in [6], for sufficiently large k ∈ S3, δk and Δk have positive lower bounds, which contradicts (4.22), giving the result.

In order to explore the superlinear convergence, we give the following assumptions.

Assumption 5.4

(1) The sequence {xk} generated by Algorithm 3 converges to a stationary point x*.
(2) The Hessian ∇²f(x*) is positive definite.
(3) The Hessian ∇²f(x) is Lipschitz continuous in a neighborhood of x*.

Although the adaptive trust-region approach of Algorithm 3 differs from the pure trust-region methods TTRQ and TTRC, the modified method eventually reduces to the standard quasi-Newton method when the iterate is near the minimizer, so the local convergence argument is standard and the proof is omitted here.

Theorem 4.2 Suppose that Assumption 5.4 holds, Bk is the approximate Hessian of the quadratic or conic model, and

$$\lim_{k\to\infty} \frac{\|(B_k - \nabla^2 f(x^*))\, s_k\|}{\|s_k\|} = 0. \qquad (4.23)$$

Then the sequence {xk} converges to x* Q-superlinearly.
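Condition (4.23) is a Dennis–Moré-type condition. A small sketch of how one might monitor it numerically along the iterates (illustrative only; H_star stands for ∇²f(x*)):

```python
import numpy as np

def dennis_more_ratio(B, H_star, s):
    """Ratio ||(B_k - H*) s_k|| / ||s_k|| from (4.23); superlinear
    convergence requires this quantity to tend to zero."""
    return np.linalg.norm((B - H_star) @ s) / np.linalg.norm(s)
```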

5 Numerical experiments

In this section, we report some preliminary numerical experiments. Since our focus is on proposing a new model-hybrid strategy, we tested Algorithm TRQCL on a limited number of test problems, all taken from the standard collection [32]. In order to compare it with the performance of single-model methods, such as those based on the quadratic model and the conic model, we also ran the pure trust region algorithm based on the quadratic model (TTRQ), the pure trust region algorithm based on the conic model (TTRC), and the modified trust region algorithm that combines the trust region method with a backtracking line search (TR+BT). These algorithms form the basis of Algorithm TRQCL. For all algorithms, we use the same parameters, i.e., δ0 = Δ0 = 5, B0 = A0 = I, ε1 = 10⁻⁶, ε2 = 10⁻⁵, δ̄ = Δ̄ = 20, η1 = 0.25, η2 = 0.75, h0 = 0, γ1 = 0.5 and γ2 = 2. In addition, Bk and Ak are updated by the BFGS formula.
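For reference, a minimal sketch of the standard BFGS update that could be used for Bk (and analogously Ak), with the usual curvature safeguard; this is the generic BFGS formula, not the authors' exact implementation:

```python
import numpy as np

def bfgs_update(B, s, y, skip_tol=1e-8):
    """Standard BFGS update B+ = B - (B s s^T B)/(s^T B s) + (y y^T)/(y^T s),
    where s = x_{k+1} - x_k and y = g_{k+1} - g_k.
    Skips the update if the curvature condition y^T s > 0 fails,
    which preserves positive definiteness of B."""
    ys = y @ s
    if ys <= skip_tol * np.linalg.norm(s) * np.linalg.norm(y):
        return B                          # keep B positive definite
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / ys
```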

All computations were carried out in MATLAB 6.5 on a PC with a Pentium 4 CPU at 2.00 GHz, in double precision arithmetic, and all tests use the same stopping criterion ‖gk‖₂ < ε1 or ‖xk+1 − xk‖₂ < ε2, i.e., computation stops when one of the two conditions is satisfied. The numerical results are presented in Tables 1 and 2. The columns in these tables have the following meaning: Problem is the name of the test problem; N denotes the dimension of the test problem; NI denotes the total number of iterations; FAIL denotes that the method fails to converge within 1000 iterations or numerical overflow occurs. From Tables 1 and 2, we can observe that, in most cases, Algorithm TRQCL performs better than Algorithms TTRQ, TTRC, and TR+BT.

From Table 1, we can observe that the TRQCL algorithm performs much better than the others on 7 of the test problems, while it does not perform as well as the others on 4 of them; for the remaining 6 problems, the performance of all compared algorithms is evenly matched, and the difference is not significant.

From Table 2, we can see that, of all 14 test problems, there are 9 problems for which the TRQCL algorithm performs much better than the others, and only 2 problems for which the TRQCL algorithm performs unsatisfactorily; for the remaining 3 problems, the performance of all compared algorithms is very close.

To get a better picture of the results, we consider the performance profiles of the different methods with respect to the number of iterations, following the


Table 1 Numerical results

Problem    N    TTRQ NI    TTRC NI    TR+BT NI    TRQCL NI

Rosenbrock 2 9 9 15 10

Freud-Roth 2 18 13 12 10

Beal 2 14 11 14 17

Wood 4 180 40 97 183

Biggs Exp6 6 92 55 50 42

Gaussian 3 4 4 4 4

Box 3-dimensional 3 42 29 38 32

Var. dimensioned 6 15 7 15 17

Watson 6 25 35 26 27

Penalty-I 10 20 180 20 32

Penalty-II 4 FAIL FAIL 966 9

Brown-Dennis 4 56 FAIL 32 42

Trigonometric 20 45 47 44 49

Extended Ros. 14 252 136 150 72

Extended Pow. 16 137 95 47 119

Powell singular 4 42 53 45 36

Bard 3 17 16 18 18

Table 2 Numerical results

Problem    N    TTRQ NI    TTRC NI    TR+BT NI    TRQCL NI

Var. dimensioned 50 30 29 30 29

Var. dimensioned 100 38 38 33 27

Var. dimensioned 200 FAIL FAIL FAIL 24

Penalty-I 50 108 143 117 40

Penalty-I 100 218 130 114 46

Penalty-I 200 45 47 44 49

Penalty-II 50 252 130 114 70

Penalty-II 100 252 136 150 181

Trigonometric 50 43 44 44 44

Trigonometric 100 137 95 47 252

Extended Ros. 50 395 487 325 75

Extended Ros. 100 557 454 462 62

Extended Pow. 50 295 151 618 161

Extended Pow. 100 FAIL 227 272 244


approach proposed in [33]. Following the notation given in [33], the number of solvers is ns = 4 (TRQCL, TTRQ, TTRC, and TR+BT) and the number of numerical experiments is np. By ip,s we denote the number of iterations required to solve problem p by solver s. The quantity

$$r_{p,s} = \frac{i_{p,s}}{\min\{i_{p,s} : s \in \{\mathrm{TRQCL},\ \mathrm{TTRQ},\ \mathrm{TTRC},\ \mathrm{TR{+}BT}\}\}}$$

is called the performance ratio. Finally, the performance of the solver s is defined by the following cumulative distribution function

$$\rho_s(\tau) = \frac{1}{n_p}\,\mathrm{size}\{p \in P : r_{p,s} \le \tau\},$$

where τ ∈ R and P represents the set of problems. Two performance profile plots are thus obtained in Figs. 1 and 2, where Fig. 1 shows the performance profiles of the four algorithms with respect to the number of iterations for the runs in Table 1, and Fig. 2 shows the performance profiles for the runs in Tables 1 and 2.
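A minimal Python sketch of computing these performance profiles from an iteration-count matrix (rows: problems, columns: solvers), an assumption-level illustration of the Dolan–Moré procedure [33] rather than the authors' scripts:

```python
import numpy as np

def performance_profile(iters, taus):
    """iters: (n_p, n_s) array of iteration counts, np.inf marks FAIL.
    Assumes at least one solver succeeds on every problem.
    Returns rho[s][j] = fraction of problems with ratio r_{p,s} <= taus[j]."""
    best = iters.min(axis=1, keepdims=True)   # per-problem best solver
    ratios = iters / best                     # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(iters.shape[1])])
```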

It is clear from Figs. 1 and 2 that the TRQCL algorithm performs better than the others. Table 2 and Fig. 2 also suggest that the TRQCL algorithm may be effective for solving large problems, but we do not discuss this here, since an efficient implementation for large problems requires careful consideration and is the subject of future research.

Fig. 1 Performance profiles with respect to the number of iterations (runs of Table 1)


Fig. 2 Performance profiles with respect to the number of iterations (runs of Tables 1 and 2)

In summary, our computational results show that the model-hybrid approach isfeasible and competitive.

In addition, we would like to describe some limitations of the TRQCL algorithm for solving unconstrained optimization problems, even though it is theoretically well motivated. On the one hand, since two approximate models are used, the scheme becomes more complex to some extent; on the other hand, to avoid possibly re-solving the trust region subproblems several times at each iteration, the scheme adopts the technique of combining trust region methods with line-search methods, so in implementation the total number of function evaluations inevitably increases.

6 Final remarks

In this paper, we have described a model-hybrid approach that can adaptively choose the quadratic model or the conic model within a trust region scheme. When the local behavior of the objective function is more quadratic, the algorithm uses the quadratic model; otherwise, it uses the conic model. Moreover, in order to avoid solving the trust region subproblem repeatedly, a quasi-Newton step is also performed. We have shown that, unlike previous adaptive techniques that use a single model throughout the iteration, the model-hybrid approach can make the best use of the characteristics of the two models. The preliminary numerical tests show its effectiveness.


Acknowledgments The authors are grateful to the editors and the anonymous referees for their careful reading of the original draft and their detailed suggestions, which improved the paper.

References

1. Deng, N.Y., Xiao, Y., Zhou, F.J.: Nonmonotonic trust region algorithm. J. Optim. Theory Appl. 76(2), 259–285 (1993)
2. Mo, J.T., Liu, C.Y., Yan, S.C.: A nonmonotone trust region method based on nonincreasing technique of weighted average of the successive function values. J. Comput. Appl. Math. 209, 97–108 (2007)
3. Gu, N.Z., Mo, J.T.: Incorporating nonmonotone strategies into the trust region method for unconstrained optimization. Appl. Math. Comput. 55, 2158–2172 (2008)
4. Liu, J.H., Ma, C.F.: A nonmonotone trust region method with new inexact line search for unconstrained optimization. Numer. Algorithms (2012). doi:10.1007/s11075-012-9652-0
5. Zhao, X., Wang, X.Y.: A nonmonotone self-adaptive trust region algorithm with line search based on the new conic model. J. Taiyuan Univ. Sci. Tech. 31(1), 68–71 (2010)
6. Nocedal, J., Yuan, Y.X.: Combining trust-region and line-search techniques. Optimization Technology Center, Report OTC 98/04 (1998)
7. Gertz, E.M.: A quasi-Newton trust-region method. Math. Program. 100, 447–470 (2004)
8. Qu, S.J., Zhang, K.C., Wang, F.S.: A global optimization using linear relaxation for generalized geometric programming. Eur. J. Oper. Res. 190, 345–356 (2008)
9. Wang, F.S., Zhang, K.C.: A hybrid algorithm for nonlinear minimax problems. Ann. Oper. Res. 164, 167–191 (2008)
10. Davidon, W.C.: Conic approximations and colinear scaling for optimizers. SIAM J. Numer. Anal. 17(2), 268–281 (1980)
11. Sorensen, D.C.: The Q-superlinear convergence of a colinear scaling algorithm for unconstrained optimization. SIAM J. Numer. Anal. 17(1), 84–114 (1980)
12. Ariyawansa, K.A.: Deriving collinear scaling algorithms as extensions of quasi-Newton methods and the local convergence of DFP- and BFGS-related collinear scaling algorithms. Math. Program. 49, 23–48 (1990)
13. Sheng, S.: Interpolation by conic model for unconstrained optimization. Computing 54, 83–98 (1995)
14. Sun, W.Y.: Nonquadratic model optimization methods. Asia Pac. J. Oper. Res. 13, 43–63 (1996)
15. Gourgeon, H., Nocedal, J.: A conic algorithm for optimization. SIAM J. Sci. Comput. 6, 253–267 (1985)
16. Di, S., Sun, W.Y.: A trust region method for conic model to solve unconstrained optimization. Optim. Methods Softw. 6, 237–263 (1996)
17. Xu, C.X., Yang, X.Y.: Convergence of conic quasi-Newton trust region methods for unconstrained minimization. Mathematica Applicata 11, 71–76 (1998)
18. Fu, J.H., Sun, W.Y., Sampaio, R.J.B.: An adaptive approach of conic trust-region method for unconstrained optimization problems. J. Appl. Math. Comput. 19, 165–177 (2005)
19. Wang, J.Y., Ni, Q.: An algorithm for solving new trust region subproblem with conic model. Sci. China Ser. A 51, 461–473 (2008)
20. Grandinetti, L.: Some investigations in a new algorithm for nonlinear optimization based on conic models of the objective function. J. Optim. Theory Appl. 43, 1–21 (1984)
21. Powell, M.J.D.: A new algorithm for unconstrained optimization. In: Rosen, J.B., Mangasarian, O.L., Ritter, K. (eds.) Nonlinear Programming, pp. 31–66. Academic, New York (1970)
22. Zhou, Q.Y., Zhang, C.: A new nonmonotone adaptive trust region method based on simple quadratic models. J. Appl. Math. Comput. 40, 111–123 (2012)
23. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. SIAM Publications, Philadelphia (2000)
24. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
25. Powell, M.J.D.: Convergence properties of a class of minimization algorithms. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming, vol. 2, pp. 1–27. Academic, New York (1975)
26. Dennis, J.E., Mei, H.H.W.: Two new unconstrained optimization algorithms which use function and gradient values. J. Optim. Theory Appl. 28, 453–482 (1979)
27. Shultz, G.A., Schnabel, R.B., Byrd, R.H.: A family of trust-region-based algorithms for unconstrained minimization with strong global convergence. SIAM J. Numer. Anal. 22, 47–67 (1985)
28. Byrd, R.H., Schnabel, R.B., Shultz, G.A.: Approximate solution of the trust region problem by minimization over two-dimensional subspaces. Math. Program. 40, 247–263 (1988)
29. Steihaug, T.: The conjugate gradient method and trust regions in large scale optimization. SIAM J. Numer. Anal. 20, 626–637 (1983)
30. Toint, P.L.: Towards an efficient sparsity exploiting Newton method for minimization. In: Duff, I.S. (ed.) Sparse Matrices and Their Uses, pp. 57–88. Academic, London (1981)
31. Moré, J.J., Sorensen, D.C.: Computing a trust-region step. SIAM J. Sci. Stat. Comput. 4, 553–572 (1983)
32. Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7, 17–41 (1981)
33. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)