
Numer Algor
DOI 10.1007/s11075-014-9845-9

ORIGINAL PAPER

A new three-term conjugate gradient algorithm for unconstrained optimization

Neculai Andrei

Received: 24 October 2013 / Accepted: 21 February 2014
© Springer Science+Business Media New York 2014

Abstract A new three-term conjugate gradient algorithm which satisfies both the descent condition and the conjugacy condition is presented. The algorithm is obtained by minimizing the one-parameter quadratic model of the objective function in which the symmetrical approximation of the Hessian matrix satisfies the general quasi-Newton equation. The search direction is obtained by symmetrization of the iteration matrix corresponding to the solution of the quadratic model minimization. Using the general quasi-Newton equation, the search direction includes a parameter which is determined by minimizing the condition number of the iteration matrix. It is proved that this direction satisfies both the conjugacy and the descent condition. The new approximation of the minimum is obtained by the general Wolfe line search, using a by now standard acceleration technique. Under standard assumptions, for uniformly convex functions the global convergence of the algorithm is proved. The numerical experiments, using 800 large-scale unconstrained optimization test problems, show that minimization of the condition number of the iteration matrix leads to a value of the parameter in the search direction able to define a competitive three-term conjugate gradient algorithm. Numerical comparisons of this variant of the algorithm versus the known conjugate gradient algorithms ASCALCG, CONMIN, TTCG and THREECG, as well as the limited memory quasi-Newton algorithm LBFGS (m = 5) and the truncated Newton TN, show that our algorithm is indeed more efficient and more robust.

N. Andrei (✉)
Research Institute for Informatics, Center for Advanced Modeling and Optimization,
8-10, Averescu Avenue, Bucharest 1, Romania
e-mail: [email protected]

N. Andrei
Academy of Romanian Scientists, Splaiul Independenței Nr. 54, Sector 5, Bucharest, Romania


Keywords Unconstrained optimization · Three-term conjugate gradient method · Descent condition · Conjugacy condition · Spectral analysis · Numerical comparisons

1 Introduction

For solving large-scale unconstrained optimization problems

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1.1)$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, supposed to be bounded from below, the three-term conjugate gradient method is very well known in the literature. Starting from an initial guess $x_0 \in \mathbb{R}^n$, the three-term conjugate gradient method with line search generates a sequence $\{x_k\}$ as:

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (1.2)$$

where αk > 0 is obtained by line search, and the directions dk are computed as:

$$d_{k+1} = -g_{k+1} + \delta_k s_k + \eta_k y_k, \qquad d_0 = -g_0. \qquad (1.3)$$

In (1.3) $\delta_k$ and $\eta_k$ are known as the conjugate gradient parameters, $s_k = x_{k+1} - x_k$, $g_k = \nabla f(x_k)$ and $y_k = g_{k+1} - g_k$. Observe that the search direction $d_{k+1}$ is computed as a linear combination of $-g_{k+1}$, $s_k$ and $y_k$. The line search in conjugate gradient algorithms is often based on the general Wolfe conditions [33, 34]:

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho\, \alpha_k g_k^T d_k, \qquad (1.4)$$

$$g_{k+1}^T d_k \ge \sigma\, g_k^T d_k, \qquad (1.5)$$

where $d_k$ is a descent direction and $0 < \rho \le \sigma < 1$. However, for some conjugate gradient algorithms a stronger version of the Wolfe line search conditions, given by (1.4) and

$$\left| g_{k+1}^T d_k \right| \le -\sigma\, g_k^T d_k, \qquad (1.6)$$

are needed to ensure the convergence and to enhance the stability.

Different three-term conjugate gradient algorithms correspond to different choices for the scalar parameters $\delta_k$ and $\eta_k$. The papers by Beale [9], McGuire and Wolfe [21], Deng and Li [14], Dai and Yuan [13], Nazareth [26], Zhang, Zhou and Li [35, 36], Zhang, Xiao and Wei [37], Al-Bayati and Sharif [1], Cheng [10], Narushima, Yabe and Ford [23], Andrei [6–8] and Sugiki, Narushima and Yabe [32] present different versions of three-term conjugate gradient algorithms together with their properties, global convergence and numerical performances. All these three-term conjugate gradient algorithms are obtained by modifying classical conjugate gradient algorithms to satisfy both the descent and the conjugacy conditions. Generally, these three-term conjugate gradient algorithms are more efficient and more robust than classical conjugate gradient algorithms.
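As a small illustration of the line search conditions above, a check of (1.4)–(1.5), and of the strong form (1.6), can be written in a few lines; this is only a sketch, with f and grad standing for user-supplied function and gradient handles and rho, sigma playing the roles of ρ and σ.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, rho=1e-4, sigma=0.8, strong=False):
    """Check the general Wolfe conditions (1.4)-(1.5) for a trial step alpha;
    with strong=True the curvature test uses the strong form (1.6)."""
    g = grad(x)
    gd = g.dot(d)                                       # g_k^T d_k, negative for a descent direction
    sufficient_decrease = f(x + alpha * d) - f(x) <= rho * alpha * gd   # (1.4)
    new_slope = grad(x + alpha * d).dot(d)              # g_{k+1}^T d_k
    if strong:
        curvature = abs(new_slope) <= -sigma * gd       # (1.6)
    else:
        curvature = new_slope >= sigma * gd             # (1.5)
    return sufficient_decrease and curvature

# e.g. for f(x) = ||x||^2 with the steepest-descent direction:
# satisfies_wolfe(lambda x: x.dot(x), lambda x: 2.0 * x, np.ones(3), -2.0 * np.ones(3), 0.4)
```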

In this paper we suggest another way to obtain three-term conjugate gradient algorithms, by minimization of the one-parameter quadratic model of the function $f$. The idea is to consider the quadratic approximation of the function $f$ at the current point and to determine the search direction by minimization of this quadratic model. It is assumed that the symmetrical approximation of the Hessian matrix satisfies the general quasi-Newton equation, which depends on a positive parameter. The search direction is obtained by modifying the iteration matrix corresponding to the solution of the quadratic model minimization in order to be symmetric. The parameter in the search direction is determined by the minimization of the condition number of this new iteration matrix. Section 2 presents this idea, as well as the spectral analysis of the iteration matrix associated to this direction. In Section 3 the corresponding three-term conjugate gradient algorithm TTDES is presented in the context of acceleration of the iterations. Section 4 is dedicated to the global convergence analysis of the algorithm. It is shown that under the standard assumptions, for uniformly convex functions, the search direction is bounded. Section 5 includes the numerical results with TTDES and some comparisons versus the known conjugate gradient algorithms ASCALCG [2, 3], CONMIN [31], TTCG [7], THREECG [8], as well as versus LBFGS [19, 27] and TN [24], on a collection of 800 large-scale unconstrained optimization test functions. It is shown that the three-term conjugate gradient algorithm TTDES, corresponding to the minimization of the condition number of the iteration matrix by spectral analysis, is more efficient and more robust than the ASCALCG and CONMIN conjugate gradient algorithms, and it is competitive versus the TTCG and THREECG three-term conjugate gradient algorithms considered in this numerical study.

2 Minimization of the one-parameter quadratic model of the function f

Let us consider that at the $k$-th iteration an inexact Wolfe line search is executed, that is, the step length $\alpha_k$ satisfying (1.4) and (1.5) is computed. Therefore, the elements $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$ can immediately be determined. Now, let us consider the quadratic approximation of the function $f$ at $x_{k+1}$:

$$\Phi_{k+1}(d) = f_{k+1} + g_{k+1}^T d + \frac{1}{2}\, d^T B_{k+1} d, \qquad (2.1)$$

where $B_{k+1}$ is a symmetrical approximation of the Hessian $\nabla^2 f(x_{k+1})$ and $d$ is the direction to be determined. The direction $d_{k+1}$ is computed as:

$$d_{k+1} = -g_{k+1} + \beta_k s_k, \qquad (2.2)$$

where the scalar $\beta_k$ is determined as the solution of the following minimization problem:

$$\min_{\beta_k \in \mathbb{R}}\ \Phi_{k+1}(d_{k+1}). \qquad (2.3)$$

Introducing $d_{k+1}$ from (2.2) into the minimization problem (2.3), $\beta_k$ is obtained as

$$\beta_k = \frac{g_{k+1}^T B_{k+1} s_k - g_{k+1}^T s_k}{s_k^T B_{k+1} s_k}. \qquad (2.4)$$


Now, suppose that the symmetrical matrix $B_{k+1}$ is an approximation of the Hessian matrix $\nabla^2 f(x_{k+1})$ such that $B_{k+1} s_k = \omega^{-1} y_k$, with $\omega \ne 0$, known as the general quasi-Newton equation. Therefore, the parameter $\beta_k$ can be written as

$$\beta_k = \frac{g_{k+1}^T y_k - \omega\, g_{k+1}^T s_k}{y_k^T s_k}. \qquad (2.5)$$

Therefore, after some simple algebra, the search direction $d_{k+1}$ from (2.2) becomes:

$$d_{k+1} = -g_{k+1} + \frac{s_k y_k^T - \omega\, s_k s_k^T}{y_k^T s_k}\, g_{k+1}. \qquad (2.6)$$

Now, using the idea of Perry [29] we can rewrite (2.6) as

$$d_{k+1} = -Q_{k+1} g_{k+1}, \qquad (2.7)$$

where

$$Q_{k+1} = I - \frac{s_k y_k^T}{y_k^T s_k} + \omega\, \frac{s_k s_k^T}{y_k^T s_k} = I - \frac{s_k (y_k - \omega s_k)^T}{y_k^T s_k}. \qquad (2.8)$$

Remark 1 It is worth saying that, as we know, the solution of the minimization problem (2.3) is the solution of the linear algebraic system of equations:

$$B_{k+1} d = -g_{k+1}. \qquad (2.9)$$

Using the above approach, the solution of the linear algebraic system (2.9) can be expressed in closed form as $d = -Q_{k+1} g_{k+1}$, where $Q_{k+1}$ is defined by (2.8), which, as we can see, is not a symmetric matrix.

Remark 2 From (2.6) the search direction can be expressed as:

$$d_{k+1} = -g_{k+1} + \frac{y_k^T g_{k+1}}{y_k^T s_k}\, s_k - \omega\, \frac{s_k^T g_{k+1}}{y_k^T s_k}\, s_k. \qquad (2.10)$$

There are a number of choices for the parameter $\omega$ in (2.10). For example, if $\omega = y_k^T g_{k+1} / s_k^T g_{k+1}$, then the steepest-descent method is obtained. When $\omega = 0$, we get the Hestenes and Stiefel method [18]. If $\omega = 2\|y_k\|^2 / y_k^T s_k$, then the CG-DESCENT algorithm by Hager and Zhang [17] is obtained, where $d_{k+1} = -g_{k+1} + \beta_k^{HZ} s_k$ and

$$\beta_k^{HZ} = \frac{1}{y_k^T s_k} \left( y_k - 2\, \frac{\|y_k\|^2}{y_k^T s_k}\, s_k \right)^T g_{k+1}. \qquad (2.11)$$

On the other hand, if $\omega$ is a bounded positive constant ($\omega = 0.1$ for example), then the resulting method is that of Dai and Liao [12]. We see that (2.10) is a general formula for the search direction which covers many known conjugate gradient algorithms [11]. However, as we can notice, the matrix $Q_{k+1}$ in (2.8) determines a crude form of a quasi-Newton method and is not symmetric.
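To make the role of $\omega$ in (2.10) concrete, the sketch below evaluates this general direction for a given $\omega$; the particular values mentioned above ($\omega = 0$ for Hestenes–Stiefel, $\omega = 2\|y_k\|^2/y_k^T s_k$ for CG-DESCENT) are simply different arguments. The names are illustrative, not the paper's code.

```python
import numpy as np

def direction_general(g_new, s, y, omega):
    """Search direction (2.10): d = -g_{k+1} + [(y^T g - omega * s^T g)/(y^T s)] s."""
    ys = y.dot(s)                         # y_k^T s_k, positive under the Wolfe line search
    beta = (y.dot(g_new) - omega * s.dot(g_new)) / ys
    return -g_new + beta * s

# Special cases discussed in Remark 2 (illustrative):
#   direction_general(g, s, y, 0.0)                      -> Hestenes-Stiefel
#   direction_general(g, s, y, 2 * y.dot(y) / y.dot(s))  -> CG-DESCENT of Hager and Zhang
```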


Therefore, let us slightly modify $Q_{k+1}$ and consider the following symmetric matrix:

$$Q_{k+1} = I - \frac{s_k y_k^T}{y_k^T s_k} + \frac{y_k s_k^T}{y_k^T s_k} + \omega\, \frac{s_k s_k^T}{y_k^T s_k}. \qquad (2.12)$$

Using (2.12), the search direction we consider in our algorithm is:

$$d_{k+1} = -Q_{k+1} g_{k+1} = -g_{k+1} + \frac{y_k^T g_{k+1} - \omega\, s_k^T g_{k+1}}{y_k^T s_k}\, s_k - \frac{s_k^T g_{k+1}}{y_k^T s_k}\, y_k, \qquad (2.13)$$

which determines a three-term conjugate gradient algorithm.

Proposition 2.1 If $\omega > 0$ and the step length $\alpha_k$ in (1.2) is determined by the Wolfe line search (1.4) and (1.5), then the search direction (2.13) satisfies the Dai and Liao conjugacy condition $y_k^T d_{k+1} = -t_k \left(s_k^T g_{k+1}\right)$, where $t_k > 0$.

Proof By direct computation we get:

$$y_k^T d_{k+1} = -\left(\omega + \frac{\|y_k\|^2}{y_k^T s_k}\right)\left(s_k^T g_{k+1}\right) \equiv -t_k \left(s_k^T g_{k+1}\right),$$

where $t_k = \omega + \frac{\|y_k\|^2}{y_k^T s_k}$. By the Wolfe line search $y_k^T s_k > 0$, therefore $t_k > 0$.

Proposition 2.2 If $\omega > 0$ and the step length $\alpha_k$ in (1.2) is determined by the Wolfe line search (1.4) and (1.5), then the search direction (2.13) satisfies the descent condition $g_{k+1}^T d_{k+1} \le 0$.

Proof By direct computation we have:

$$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 - \omega\, \frac{\left(s_k^T g_{k+1}\right)^2}{y_k^T s_k} \le 0,$$

since $y_k^T s_k > 0$ by the Wolfe line search (1.4) and (1.5).

The search direction (2.13) satisfies the Dai and Liao conjugacy condition and it is a descent direction. To define an algorithm, the only problem we face is to specify a suitable value for the parameter $\omega$. There are some possibilities. For example, for
$$\omega = 1 + \frac{\|y_k\|^2}{y_k^T s_k}$$
our algorithm reduces to the three-term conjugate gradient algorithm THREECG [8]. On the other hand, if
$$\omega = 1 + 2\, \frac{\|y_k\|^2}{y_k^T s_k},$$
then we have the three-term conjugate gradient algorithm TTCG [7]. In this paper we determine $\omega$ by minimization of the condition number of the symmetric matrix $Q_{k+1}$.


The idea is to get a value for the parameter $\omega$ such that the search direction (2.13) is well conditioned, i.e. the condition number of the matrix $Q_{k+1}$ is at a minimum.

Theorem 2.1 Let $Q_{k+1}$ be defined by (2.12). If $\omega > 0$, then $Q_{k+1}$ is a nonsingular matrix and its eigenvalues consist of 1 (with multiplicity $n-2$), $\lambda_{k+1}^{+}$ and $\lambda_{k+1}^{-}$, where

$$\lambda_{k+1}^{+} = \frac{1}{2}\left[ (a_k + 2) + \sqrt{(a_k + 2)^2 - 4(a_k + b_k)} \right], \qquad (2.14)$$

$$\lambda_{k+1}^{-} = \frac{1}{2}\left[ (a_k + 2) - \sqrt{(a_k + 2)^2 - 4(a_k + b_k)} \right], \qquad (2.15)$$

and

$$a_k = \omega\, \frac{\|s_k\|^2}{y_k^T s_k}, \qquad b_k = \frac{\|s_k\|^2}{y_k^T s_k}\, \frac{\|y_k\|^2}{y_k^T s_k} > 1. \qquad (2.16)$$

Proof Consider

$$Q_{k+1} = I - \frac{s_k (y_k - \omega s_k)^T}{y_k^T s_k} + \frac{y_k s_k^T}{y_k^T s_k}.$$

From the fundamental algebra formula

$$\det\left(I + pq^T + uv^T\right) = \left(1 + q^T p\right)\left(1 + v^T u\right) - \left(p^T v\right)\left(q^T u\right),$$

it follows that

$$\det(Q_{k+1}) = \omega\, \frac{\|s_k\|^2}{y_k^T s_k} + \frac{\|s_k\|^2}{y_k^T s_k}\, \frac{\|y_k\|^2}{y_k^T s_k} = a_k + b_k.$$

Therefore, the matrix $Q_{k+1}$ is nonsingular. Since $Q_{k+1}\xi = \xi$ for any $\xi \in \operatorname{span}\{s_k, y_k\}^{\perp} \subset \mathbb{R}^n$, it follows that $Q_{k+1}$ has the eigenvalue 1 with multiplicity $n-2$, corresponding to the eigenvectors $\xi \in \operatorname{span}\{s_k, y_k\}^{\perp}$.

Now,

$$\operatorname{tr}(Q_{k+1}) = \operatorname{tr}\left[ I - \frac{s_k y_k^T}{y_k^T s_k} + \frac{y_k s_k^T}{y_k^T s_k} + \omega\, \frac{s_k s_k^T}{y_k^T s_k} \right] = n + \omega\, \frac{\|s_k\|^2}{y_k^T s_k} = n + a_k.$$

Therefore, by the relationships between the trace and the determinant of a matrix and its eigenvalues, it follows that the other two eigenvalues of $Q_{k+1}$ are the roots of the following quadratic polynomial

$$\lambda^2 - (a_k + 2)\lambda + (a_k + b_k) = 0. \qquad (2.17)$$

Thus, the other two eigenvalues of the symmetric matrix $Q_{k+1}$ are determined from (2.17) as (2.14) and (2.15), respectively. The inequality $b_k > 1$ follows from
$$\frac{y_k^T s_k}{\|s_k\|^2} \le \frac{\|y_k\|^2}{y_k^T s_k}.$$

Proposition 2.3 The Qk+1 matrix defined by (2.12) is a normal matrix.

Proof By direct computation we can prove that $Q_{k+1} Q_{k+1}^T = Q_{k+1}^T Q_{k+1}$.


In order to have $\lambda_{k+1}^{+}$ and $\lambda_{k+1}^{-}$ as real eigenvalues, from (2.14) and (2.15) the condition $(a_k + 2)^2 - 4(a_k + b_k) \ge 0$ must be fulfilled, from which the following estimation of the parameter $\omega$ can be determined:

$$\omega \ge \frac{2}{\|s_k\|^2}\, \sqrt{\|s_k\|^2 \|y_k\|^2 - \left(y_k^T s_k\right)^2}. \qquad (2.18)$$

Since $b_k > 1$, it follows that the estimation of $\omega$ given in (2.18) is well defined (if $\|s_k\| \ne 0$). From (2.17) we have that

$$\lambda_{k+1}^{+} + \lambda_{k+1}^{-} = a_k + 2 \ge 0, \qquad (2.19)$$

$$\lambda_{k+1}^{+} \lambda_{k+1}^{-} = a_k + b_k \ge 0. \qquad (2.20)$$

Therefore, from (2.19) and (2.20) we have that both $\lambda_{k+1}^{+}$ and $\lambda_{k+1}^{-}$ are positive eigenvalues. Since $(a_k + 2)^2 - 4(a_k + b_k) \ge 0$, from (2.14) and (2.15) we have that $\lambda_{k+1}^{+} \ge \lambda_{k+1}^{-}$.

By direct computation, from (2.14), using (2.18) we get

$$\lambda_{k+1}^{+} \ge 1 + \sqrt{b_k - 1} > 1. \qquad (2.21)$$

A simple analysis of equation (2.17) shows that 1 (the eigenvalue of $Q_{k+1}$) is not inside the interval $\left[\lambda_{k+1}^{-}, \lambda_{k+1}^{+}\right]$. Since both $\lambda_{k+1}^{+}$ and $\lambda_{k+1}^{-}$ are positive, $\lambda_{k+1}^{+} > 1$ and $\lambda_{k+1}^{+} \ge \lambda_{k+1}^{-}$, it follows that $1 \le \lambda_{k+1}^{-} \le \lambda_{k+1}^{+}$. Therefore, the maximum eigenvalue of $Q_{k+1}$ is $\lambda_{k+1}^{+}$ and its minimum eigenvalue is 1.
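The spectral structure just described is easy to check numerically. The sketch below (purely illustrative, not part of the algorithm) assembles $Q_{k+1}$ from (2.12) for random $s_k$, $y_k$ with $y_k^T s_k > 0$, takes an $\omega$ satisfying (2.18), and compares the eigenvalues returned by NumPy with 1 and the values (2.14)–(2.15).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
s = rng.standard_normal(n)
y = s + 0.5 * rng.standard_normal(n)        # chosen so that y^T s > 0
ys = y.dot(s)
assert ys > 0

# any omega satisfying (2.18) keeps the two non-unit eigenvalues real;
# here the lower bound is taken with a small extra margin
omega = 2.0 * np.sqrt(s.dot(s) * y.dot(y) - ys ** 2) / s.dot(s) + 0.5

Q = (np.eye(n) - np.outer(s, y) / ys + np.outer(y, s) / ys
     + omega * np.outer(s, s) / ys)          # the matrix Q_{k+1} of (2.12)

a = omega * s.dot(s) / ys                    # a_k from (2.16)
b = (s.dot(s) / ys) * (y.dot(y) / ys)        # b_k from (2.16)
disc = (a + 2.0) ** 2 - 4.0 * (a + b)
lam_plus = 0.5 * ((a + 2.0) + np.sqrt(disc))     # (2.14)
lam_minus = 0.5 * ((a + 2.0) - np.sqrt(disc))    # (2.15)

eigs = np.sort(np.linalg.eigvals(Q).real)
print(eigs)                    # n-2 eigenvalues equal to 1, plus lam_minus and lam_plus
print(lam_minus, lam_plus)
```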

From Proposition 2.3, $Q_{k+1}$ is a normal matrix. Therefore, the condition number of $Q_{k+1}$, $k_2(Q_{k+1})$, can be computed as in the following proposition.

Proposition 2.4 The condition number of the normal, symmetric matrix $Q_{k+1}$ is
$$k_2(Q_{k+1}) = \frac{\lambda_{k+1}^{+}}{1} = \frac{1}{2}\left[ (a_k + 2) + \sqrt{(a_k + 2)^2 - 4(a_k + b_k)} \right]. \qquad (2.22)$$
$k_2(Q_{k+1})$ attains its minimum $\sqrt{b_k - 1} + 1$ when $a_k = 2\sqrt{b_k - 1}$.

Proof Observe that $b_k > 1$. By direct computation, the minimum of (2.22) is obtained for $a_k = 2\sqrt{b_k - 1}$, for which $k_2(Q_{k+1})$ attains its minimum $\sqrt{b_k - 1} + 1$.

According to Proposition 2.4, when $a_k = 2\sqrt{b_k - 1}$ the condition number of $Q_{k+1}$ defined by (2.12) attains its minimum. Therefore, from (2.16), using this equality, a suitable choice of the parameter $\omega$ is

$$\omega = \frac{2}{\|s_k\|^2}\, \sqrt{\|s_k\|^2 \|y_k\|^2 - \left(y_k^T s_k\right)^2}. \qquad (2.23)$$

Since $b_k > 1$, it follows that $\omega$ in (2.23) is well defined (if $\|s_k\| \ne 0$). This choice of the parameter $\omega$ makes the condition number of $Q_{k+1}$ approach its minimum.


Therefore, in our algorithm the search direction is computed as in (2.13), where $\omega$ is defined in (2.23).
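As a compact summary of this section, the search direction (2.13) with $\omega$ chosen by (2.23) can be sketched as follows; the code assumes $y_k^T s_k > 0$ (guaranteed by the Wolfe line search) and clips the quantity under the square root at zero only as a numerical safeguard.

```python
import numpy as np

def ttdes_direction(g_new, s, y):
    """TTDES search direction: (2.13) with omega from (2.23)."""
    ys = y.dot(s)                                           # y_k^T s_k > 0 under the Wolfe line search
    # omega per (2.23); max(...) only guards against tiny negative values from round-off
    omega = 2.0 * np.sqrt(max(s.dot(s) * y.dot(y) - ys ** 2, 0.0)) / s.dot(s)
    delta = (y.dot(g_new) - omega * s.dot(g_new)) / ys      # delta_k of (3.2)
    eta = s.dot(g_new) / ys                                 # eta_k of (3.2)
    return -g_new + delta * s - eta * y                     # (3.1), i.e. (2.13)
```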

3 TTDES algorithm

In this section we present the algorithm TTDES, where the search direction is given by (2.13) as:

$$d_{k+1} = -g_{k+1} + \delta_k s_k - \eta_k y_k, \qquad (3.1)$$

where the parameters δk and ηk are computed as

$$\delta_k = \frac{y_k^T g_{k+1} - \omega\, s_k^T g_{k+1}}{y_k^T s_k}, \qquad \eta_k = \frac{s_k^T g_{k+1}}{y_k^T s_k}, \qquad (3.2)$$

respectively, and $\omega$ is computed as in (2.23).

We know that in conjugate gradient algorithms the search directions tend to be poorly scaled and, as a consequence, the line search must perform more function evaluations in order to obtain a suitable step length $\alpha_k$. Besides, in conjugate gradient methods the step lengths computed by means of the Wolfe line search (1.4) and (1.5) may differ from 1 in a very unpredictable manner [28]. They can be larger or smaller than 1 depending on how the problem is scaled. This is in very sharp contrast to the Newton and quasi-Newton methods, including the limited memory quasi-Newton methods, which accept the unit step length most of the time along the iterations, and therefore usually require only a few function evaluations per search direction. In this section we take advantage of this behavior of conjugate gradient algorithms and use an acceleration scheme we have presented in [5].

Basically, the acceleration scheme modifies the step length $\alpha_k$ in a multiplicative manner to improve the reduction of the function values along the iterations. As in [5], in the accelerated algorithm, instead of (1.2) the new estimation of the minimum point is computed as

$$x_{k+1} = x_k + \xi_k \alpha_k d_k, \qquad (3.3)$$

where

$$\xi_k = -\frac{a_k}{b_k}, \qquad (3.4)$$

where $a_k = \alpha_k g_k^T d_k$, $b_k = -\alpha_k (g_k - g_z)^T d_k$, $g_z = \nabla f(z)$ and $z = x_k + \alpha_k d_k$. Hence, if $b_k > 0$, then the new estimation of the solution is computed as $x_{k+1} = x_k + \xi_k \alpha_k d_k$; otherwise $x_{k+1} = x_k + \alpha_k d_k$. We have $b_k = \alpha_k (g_z - g_k)^T d_k = \alpha_k^2\, d_k^T \nabla^2 f(\bar{x}_k)\, d_k$, where $\bar{x}_k$ is a point on the line segment connecting $x_k$ and $z$. Since $\alpha_k > 0$, it follows that for convex functions $b_k \ge 0$. For uniformly convex functions we can prove the linear convergence of the acceleration scheme [5]. Therefore, taking into consideration this acceleration scheme and using the definitions of $g_k$, $s_k$ and $y_k$, the following three-term conjugate gradient algorithm can be presented.
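A minimal sketch of this acceleration step, under the notation above (grad is a placeholder gradient handle):

```python
import numpy as np

def accelerated_step(x, d, alpha, g, grad):
    """One accelerated update (3.3)-(3.4); returns the new point x_{k+1}."""
    z = x + alpha * d
    gz = grad(z)
    a = alpha * g.dot(d)               # a_k = alpha_k g_k^T d_k
    b = -alpha * (g - gz).dot(d)       # b_k = -alpha_k (g_k - g_z)^T d_k
    if b > 0.0:
        xi = -a / b                    # xi_k of (3.4)
        return x + xi * alpha * d      # accelerated update (3.3)
    return z                           # otherwise the ordinary update (1.2)
```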


TTDES algorithm

Step 1. Select a starting point $x_0 \in \operatorname{dom} f$ and compute $f_0 = f(x_0)$ and $g_0 = \nabla f(x_0)$. Select some positive values for $\rho$ and $\sigma$. Set $d_0 = -g_0$ and $k = 0$.

Step 2. Test a criterion for stopping the iterations. If the test is satisfied, then stop; otherwise continue with Step 3.

Step 3. Determine the step length $\alpha_k$ by using the Wolfe line search conditions (1.4)–(1.5).

Step 4. Compute: $z = x_k + \alpha_k d_k$, $g_z = \nabla f(z)$ and $y_k = g_k - g_z$.

Step 5. Compute: $a_k = \alpha_k g_k^T d_k$ and $b_k = -\alpha_k y_k^T d_k$.

Step 6. Acceleration scheme. If $b_k > 0$, then compute $\xi_k = -a_k / b_k$ and update the variables as $x_{k+1} = x_k + \xi_k \alpha_k d_k$; otherwise update the variables as $x_{k+1} = x_k + \alpha_k d_k$. Compute $f_{k+1}$ and $g_{k+1}$. Compute $y_k = g_{k+1} - g_k$ and $s_k = x_{k+1} - x_k$.

Step 7. Compute $\omega$ as in (2.23) and determine $\delta_k$ and $\eta_k$ as in (3.2).

Step 8. Compute the search direction as $d_{k+1} = -g_{k+1} + \delta_k s_k - \eta_k y_k$.

Step 9. Powell restart criterion. If $\left|g_{k+1}^T g_k\right| > 0.2\, \|g_{k+1}\|^2$, then set $d_{k+1} = -g_{k+1}$.

Step 10. Set $k = k + 1$ and go to Step 2.

If $f$ is bounded along the direction $d_k$, then there exists a step size $\alpha_k$ satisfying the Wolfe line search conditions (1.4) and (1.5). In our algorithm, when the Beale-Powell restart condition is satisfied, we restart the algorithm with the negative gradient $-g_{k+1}$. Under reasonable assumptions, the Wolfe line search conditions and the Beale-Powell restart criterion are sufficient to prove the global convergence of the algorithm. The first trial of the step length crucially affects the practical behavior of the algorithm. At every iteration $k \ge 1$ the starting guess for the step $\alpha_k$ in the line search is computed as $\alpha_{k-1} \|d_{k-1}\| / \|d_k\|$.
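Putting the pieces together, a rough Python transcription of Steps 1–10 might look as follows. It is an illustrative sketch, not the author's Fortran code: the Wolfe step is delegated to scipy.optimize.line_search (which enforces the strong Wolfe conditions rather than the general ones (1.4)–(1.5)), the initial step-length scaling $\alpha_{k-1}\|d_{k-1}\|/\|d_k\|$ is omitted, and a steepest-descent restart is used as a safeguard whenever $y_k^T s_k \le 0$.

```python
import numpy as np
from scipy.optimize import line_search

def ttdes(f, grad, x0, tol=1e-6, max_iter=10000, rho=1e-4, sigma=0.8):
    """Illustrative sketch of the TTDES iteration (Steps 1-10)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                                  # Step 1
    for _ in range(max_iter):
        if np.max(np.abs(g)) <= tol:                        # Step 2: ||g_k||_inf <= tol
            break
        # Step 3: Wolfe line search (SciPy uses the strong Wolfe conditions)
        alpha = line_search(f, grad, x, d, gfk=g, c1=rho, c2=sigma)[0]
        if alpha is None:
            alpha = 1e-3                                    # crude fallback if the search fails
        # Steps 4-6: acceleration scheme (3.3)-(3.4)
        z = x + alpha * d
        gz = grad(z)
        a = alpha * g.dot(d)
        b = -alpha * (g - gz).dot(d)
        x_new = x + (-a / b) * alpha * d if b > 0 else z
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        ys = y.dot(s)
        if ys > 0:
            # Steps 7-8: direction (2.13) with omega from (2.23)
            omega = 2.0 * np.sqrt(max(s.dot(s) * y.dot(y) - ys ** 2, 0.0)) / s.dot(s)
            delta = (y.dot(g_new) - omega * s.dot(g_new)) / ys
            eta = s.dot(g_new) / ys
            d_new = -g_new + delta * s - eta * y
        else:
            d_new = -g_new                                  # safeguard restart
        # Step 9: Powell restart criterion
        if abs(g_new.dot(g)) > 0.2 * g_new.dot(g_new):
            d_new = -g_new
        x, g, d = x_new, g_new, d_new                       # Step 10
    return x
```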

4 Convergence analysis

Assume that:

(i) The level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded, i.e. there exists a positive constant $B > 0$ such that $\|x\| \le B$ for all $x \in S$.

(ii) In a neighbourhood $N$ of $S$ the function $f$ is continuously differentiable and its gradient is Lipschitz continuous, i.e. there exists a constant $L > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in N$.


Under these assumptions on $f$ there exists a constant $\Gamma \ge 0$ such that $\|\nabla f(x)\| \le \Gamma$ for all $x \in S$. The assumption that the function $f$ is bounded below is weaker than the usual assumption that the level set is bounded.

Although the search directions generated by (3.1), (3.2) and (2.23) are always descent directions, to ensure convergence of the algorithm we need to constrain the choice of the step length $\alpha_k$. The following proposition shows that the Wolfe line search always gives a lower bound for the step length $\alpha_k$.

Proposition 4.1 Suppose that $d_k$ is a descent direction and the gradient $\nabla f$ satisfies the Lipschitz condition $\|\nabla f(x) - \nabla f(x_k)\| \le L \|x - x_k\|$ for all $x$ on the line segment connecting $x_k$ and $x_{k+1}$, where $L$ is a positive constant. If the line search satisfies the Wolfe conditions (1.4) and (1.5), then
$$\alpha_k \ge \frac{(1 - \sigma)\left|g_k^T d_k\right|}{L\, \|d_k\|^2}. \qquad (4.1)$$

Proof Subtracting $g_k^T d_k$ from both sides of (1.5) and using the Lipschitz continuity we get
$$(\sigma - 1)\, g_k^T d_k \le (g_{k+1} - g_k)^T d_k = y_k^T d_k \le \|y_k\|\, \|d_k\| \le \alpha_k L\, \|d_k\|^2.$$

But $d_k$ is a descent direction and $\sigma < 1$, therefore (4.1) follows immediately.

To prove the global convergence of nonlinear conjugate gradient algorithms, the Zoutendijk condition is often used. Under the Wolfe line search this condition was given by Wolfe [33, 34] and Zoutendijk [38]. The following proposition proves that in the above three-term conjugate gradient method the Zoutendijk condition holds under the general Wolfe line search (1.4) and (1.5).

Proposition 4.2 Suppose that the assumptions (i) and (ii) hold. Consider the algorithm (1.2), (3.1), (3.2) and (2.23), where $d_k$ is a descent direction and $\alpha_k$ is computed by the general Wolfe line search (1.4) and (1.5). Then

$$\sum_{k=0}^{\infty} \frac{\left(g_k^T d_k\right)^2}{\|d_k\|^2} < +\infty. \qquad (4.2)$$

Proof From (1.4) and Proposition 4.1 we get

$$f_k - f_{k+1} \ge -\rho\, \alpha_k g_k^T d_k \ge \rho\, \frac{(1 - \sigma)\left(g_k^T d_k\right)^2}{L\, \|d_k\|^2}.$$

Therefore, from assumption (i) we get the Zoutendijk condition (4.2).

In conjugate gradient algorithms the iteration can fail, in the sense that $\|g_k\| \ge \gamma > 0$ for all $k$, only if $\|d_k\| \to \infty$ sufficiently rapidly [30]. More exactly, the sequence of gradient norms $\|g_k\|$ can be bounded away from zero only if $\sum_{k \ge 0} 1/\|d_k\| < \infty$. For any conjugate gradient method with the strong Wolfe line search (1.4) and (1.6) the following general result holds [28].

Proposition 4.3 Suppose that the assumptions (i) and (ii) hold and consider any conjugate gradient algorithm (1.2) where $d_k$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe line search (1.4) and (1.6). If
$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty, \qquad (4.3)$$
then
$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (4.4)$$

For uniformly convex functions we can prove that the norm of the direction $d_{k+1}$ generated by (3.1), (3.2) and (2.23) is bounded above. Therefore, by Proposition 4.3, we can prove the following result.

Theorem 4.1 Suppose that the assumptions (i) and (ii) hold and consider the algorithm (1.2) and (3.1)–(3.2), where $\omega$ is given by (2.23), $d_k$ is a descent direction and $\alpha_k$ is computed by the strong Wolfe line search (1.4) and (1.6). Suppose that $f$ is a uniformly convex function on $S$, i.e. there exists a constant $\mu > 0$ such that

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu\, \|x - y\|^2 \qquad (4.5)$$

for all $x, y \in N$; then
$$\lim_{k \to \infty} \|g_k\| = 0. \qquad (4.6)$$

Proof From Lipschitz continuity we have $\|y_k\| \le L \|s_k\|$. On the other hand, from uniform convexity and the Cauchy inequality we have $\mu \|s_k\|^2 \le y_k^T s_k \le \|y_k\|\, \|s_k\|$, i.e. $\mu \|s_k\| \le \|y_k\|$. Therefore, for uniformly convex functions under the Wolfe line search it follows that $L \ge \mu$ (if $\|s_k\| \ne 0$). Now, from (2.23), using the Cauchy inequality, the assumptions (i) and (ii) and the above inequalities, we have

$$|\omega| \le \frac{2}{\|s_k\|^2}\, \sqrt{\|s_k\|^2 \|y_k\|^2 - \mu^2 \|s_k\|^4} = \frac{2}{\|s_k\|}\, \sqrt{\|y_k\|^2 - \mu^2 \|s_k\|^2} \le \frac{2}{\|s_k\|}\, \sqrt{L^2 \|s_k\|^2 - \mu^2 \|s_k\|^2} = 2\sqrt{L^2 - \mu^2}. \qquad (4.7)$$

From (3.2) we have

$$|\delta_k| \le \frac{\left|y_k^T g_{k+1}\right|}{\left|y_k^T s_k\right|} + |\omega|\, \frac{\left|s_k^T g_{k+1}\right|}{\left|y_k^T s_k\right|} \le \frac{\Gamma L}{\mu \|s_k\|} + 2\sqrt{L^2 - \mu^2}\, \frac{\Gamma}{\mu \|s_k\|} = \frac{\Gamma}{\mu}\left(L + 2\sqrt{L^2 - \mu^2}\right) \frac{1}{\|s_k\|}. \qquad (4.8)$$

At the same time

$$|\eta_k| = \frac{\left|s_k^T g_{k+1}\right|}{\left|y_k^T s_k\right|} \le \frac{\|s_k\|\, \|g_{k+1}\|}{\mu \|s_k\|^2} \le \frac{\Gamma}{\mu \|s_k\|}. \qquad (4.9)$$

Therefore, using (4.8) and (4.9) in (3.1) we get

$$\|d_{k+1}\| \le \|g_{k+1}\| + |\delta_k|\, \|s_k\| + |\eta_k|\, \|y_k\| \le \Gamma + \frac{2\Gamma}{\mu}\left(L + \sqrt{L^2 - \mu^2}\right), \qquad (4.10)$$


showing that (4.3) is true. By Proposition 4.3 it follows that (4.4) is true, which for uniformly convex functions is equivalent to (4.6).

5 Numerical results and discussions

In this section we report some numerical results obtained with an implementation of the TTDES algorithm. The code is written in Fortran and compiled with f77 (default compiler settings) on an Intel Pentium 4 workstation with 1.8 GHz. TTDES, like all the codes of the algorithms considered in this numerical study, uses loop unrolling to a depth of 5. We selected 80 large-scale unconstrained optimization test functions in generalized or extended form, presented in [4]. For each test function we performed ten numerical experiments with the number of variables increasing as $n = 1000, 2000, \ldots, 10000$. The algorithm implements the Wolfe line search conditions with $\rho = 0.0001$, $\sigma = 0.8$ and the same stopping criterion $\|g_k\|_\infty \le 10^{-6}$, where $\|\cdot\|_\infty$ is the maximum absolute component of a vector. In all the algorithms considered in this numerical study the maximum number of iterations is limited to 10000.

The comparisons of the algorithms are given in the following context. Let $f_i^{ALG1}$ and $f_i^{ALG2}$ be the optimal values found by ALG1 and ALG2 for problem $i = 1, \ldots, 800$, respectively. We say that, in the particular problem $i$, the performance of ALG1 was better than the performance of ALG2 if:

$$\left| f_i^{ALG1} - f_i^{ALG2} \right| < 10^{-3} \qquad (5.1)$$

and the number of iterations (#iter), or the number of function-gradient evaluations (#fg), or the CPU time of ALG1 was less than the number of iterations, or the number of function-gradient evaluations, or the CPU time corresponding to ALG2, respectively.
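Written out, this counting rule is straightforward; the sketch below takes hypothetical arrays of final objective values and of one performance metric (#iter, #fg or CPU time) for two solvers and tallies the comparison according to (5.1).

```python
import numpy as np

def count_wins(f1, f2, metric1, metric2, tol=1e-3):
    """Count, per rule (5.1), on how many problems ALG1 beats ALG2,
    ALG2 beats ALG1, or they tie, with respect to one metric."""
    f1, f2 = np.asarray(f1), np.asarray(f2)
    m1, m2 = np.asarray(metric1), np.asarray(metric2)
    comparable = np.abs(f1 - f2) < tol          # criterion (5.1)
    alg1_better = int(np.sum(comparable & (m1 < m2)))
    alg2_better = int(np.sum(comparable & (m2 < m1)))
    ties = int(np.sum(comparable & (m1 == m2)))
    return alg1_better, alg2_better, ties
```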

In the first set of numerical experiments we compare TTDES versus ASCALCG [2, 3], CONMIN [31], TTCG [7] and THREECG [8]. ASCALCG is an accelerated scaled conjugate gradient algorithm using a double update scheme embedded in the restart philosophy of Beale-Powell. The basic idea of ASCALCG is to combine the scaled memoryless BFGS method and the preconditioning technique in the frame of the conjugate gradient method. The preconditioner, which is also a scaled memoryless BFGS matrix, is reset when the Beale-Powell restart criterion holds. The parameter scaling the gradient is selected as a spectral gradient. The search direction is computed by a double quasi-Newton updating scheme. CONMIN is a composite conjugate gradient algorithm in which the same philosophy used in BFGS, of modifying the negative gradient with a positive matrix which best estimates the inverse Hessian without adding anything to storage requirements, is implemented at restarting, i.e. when the Beale-Powell restart criterion is satisfied. TTCG and THREECG are three-term conjugate gradient algorithms which mainly consist of a simple modification of the HS conjugate gradient algorithm satisfying both the descent and the conjugacy conditions. These properties are independent of the line search. The search direction in these algorithms is given by (2.13), but the parameter $\omega$ is computed in a different manner: in THREECG $\omega = 1 + \|y_k\|^2 / y_k^T s_k$, while in TTCG $\omega = 1 + 2\|y_k\|^2 / y_k^T s_k$. Also, these algorithms can be considered as a simple modification of the memoryless BFGS quasi-Newton method.

Figure 1 shows the Dolan and More [15] CPU performance profiles corresponding to TTDES versus these conjugate gradient algorithms.

In a performance profile plot, the top curve corresponds to the method that solved the most problems in a time that was within a given factor of the best time. The percentage of the test problems for which a method is the fastest is given on the left axis of the plot. The right side of the plot gives the percentage of the test problems that were successfully solved by these algorithms, respectively. Mainly, the right side is a measure of the robustness of an algorithm.

From Fig. 1, for example comparing TTDES versus ASCALCG, we see that, subject to the number of iterations, TTDES was better in 124 problems (i.e. it achieved the minimum number of iterations in 124 problems), ASCALCG was better in 508 problems and they achieved the same number of iterations in 78 problems, etc. Out of 800 problems, only for 710 problems does the criterion (5.1) hold.

From Fig. 1 we see that TTDES is more efficient than all the conjugate gradient algorithms considered in this numerical study and, in the case of ASCALCG and CONMIN, the differences are significant. Both ASCALCG and CONMIN belong to the same family of conjugate gradient algorithms based on Perry's idea, recently consolidated by Liu and Xu [20]. In [20] intensive numerical experiments with the symmetric Perry conjugate gradient algorithm (SPDOC) and the Perry conjugate gradient algorithm with restarting procedures (SPDOCCG) proved that these algorithms are competitive with SCALCG. (SCALCG is the un-accelerated variant of ASCALCG.) It is worth saying that TTDES is very close to both TTCG and THREECG. All these algorithms (TTCG, THREECG and TTDES) use the same search direction (2.13), but with different formulas for the computation of the parameter $\omega$. Notice that the spectral analysis of the iteration matrix $Q_{k+1}$ given by (2.12) does not lead us to a value of $\omega$ able to clearly define a more efficient and more robust three-term conjugate gradient algorithm. The same is true for the family of symmetric Perry conjugate gradient algorithms presented by Liu and Xu [20].

Fig. 1 TTDES versus ASCALCG, TTCG, CONMIN and THREECG

Besides the conjugate gradient methods, for large-scale unconstrained optimization two other methods can be successfully tried: the limited memory BFGS method (L-BFGS) and the discrete truncated-Newton method (TN). Both these methods use a low and predictable amount of storage, requiring only the function and its gradient values at each iterate. Both methods have been intensively tested on large problems of different types and their performance appears to be satisfactory [25]. Therefore, in the second set of numerical experiments we compare TTDES versus LBFGS (m = 5) [19, 27] and TN [24].

L-BFGS is an adaptation of the BFGS method for solving large-scale problems. In the BFGS method the search direction is computed as $d_{k+1} = -H_{k+1} g_{k+1}$, where $H_{k+1}$ is an approximation to the inverse Hessian matrix of $f$, updated as
$$H_{k+1} = V_k^T H_k V_k + \frac{s_k s_k^T}{y_k^T s_k},$$
where $V_k = I - \left(y_k s_k^T\right) / \left(y_k^T s_k\right)$. In the L-BFGS method, instead of forming the matrices $H_k$, a number of $m$ pairs of vectors $s_k$ and $y_k$ that define them implicitly are saved, as described in [19]. The numerical experience with the L-BFGS method, reported in [16], indicates that values of $m$ in the range $3 \le m \le 7$ give the best results. Therefore, in this paper we consider the value $m = 5$. The line search is performed by means of the routine CVSRCH by More and Thuente [22], which uses cubic interpolation.
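For reference, the product $H_k g_k$ in L-BFGS is typically formed implicitly by the standard two-loop recursion over the stored $(s, y)$ pairs; the sketch below is the textbook version of that recursion (assuming at least one stored pair and the common scaling $H_0 = (s^T y / y^T y) I$), not necessarily the exact code of [19].

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Two-loop recursion: returns d = -H g using the m most recent (s, y) pairs."""
    q = g.copy()
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    alphas = []
    # first loop: newest pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * s.dot(q)
        alphas.append(a)
        q -= a * y
    # initial scaling H_0 = gamma I with gamma from the newest pair
    gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1])
    r = gamma * q
    # second loop: oldest pair to newest
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * y.dot(r)
        r += (a - b) * s
    return -r
```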

The TN method is described by Nash [24]. At each outer iteration of the TN method, for the determination of the search direction $d_k$, an approximate solution of the Newton equations $\nabla^2 f(x_k) d_k = -g_k$ is found using a number of inner iterations based on a preconditioned linear conjugate gradient method. The matrix-vector products required by the inner conjugate gradient algorithm are computed by finite differencing, i.e. the Hessian matrix is not explicitly computed. Besides, the conjugate gradient inner iteration is preconditioned by a scaled two-step limited memory BFGS method with Powell's restarting strategy used to reset the preconditioner periodically. In TN the line search is performed using the strong Wolfe conditions.

Figure 2 presents the Dolan and More CPU performance profiles of TTDES versus L-BFGS (m = 5) and TN, respectively.

We see that this variant of the three-term conjugate gradient algorithm is considerably more efficient and more robust than the LBFGS (m = 5) and TN algorithms. The differences between TTDES on one side and LBFGS and TN on the other are significant.


Fig. 2 TTDES versus LBFGS (m = 5) and TN

Both LBFGS and TN are based on quasi-Newton concepts, generating a relatively complicated search direction. On the other hand, the search direction in TTDES is very simple, easy to implement and efficient for solving a large class of large-scale unconstrained optimization problems.

6 Conclusion

In this paper we have presented another three-term conjugate gradient algorithm based on a new approach which considers the minimization of the quadratic approximation of the function $f$ and the modification of the corresponding iteration matrix in order to be symmetric. The search direction is determined by the generalized quasi-Newton equation, which introduces a positive parameter $\omega$. Therefore, in this approach we get a family of three-term conjugate gradient algorithms depending on a positive parameter. In particular, for some values of this parameter we get the known three-term conjugate gradient algorithms THREECG [8] and TTCG [7]. In this paper we suggest another value for this parameter, based on the spectral analysis of the iteration matrix. For uniformly convex functions, under standard assumptions, the algorithm is globally convergent. The numerical experiments on a collection of 800 large-scale unconstrained optimization test functions prove that the TTDES algorithm, with the parameter $\omega$ determined by minimizing the condition number of the iteration matrix, is more efficient than the known conjugate gradient algorithms ASCALCG and CONMIN and is competitive with the TTCG and THREECG three-term conjugate gradient algorithms. Using the idea presented in this paper, it is worth developing new and more effective conjugate gradient algorithms for large-scale unconstrained optimization.

References

1. Al-Bayati, A.Y., Sharif, W.H.: A new three-term conjugate gradient method for unconstrained optimization. Can. J. Sci. Eng. Math. 1(5), 108–124 (2010)


2. Andrei, N.: Scaled conjugate gradient algorithms for unconstrained optimization. Comput. Optim. Appl. 38, 401–416 (2007)

3. Andrei, N.: Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. Optim. Methods Softw. 22, 561–571 (2007)

4. Andrei, N.: Another Collection of Large-scale Unconstrained Optimization Test Functions. ICI Technical Report, January 30 (2013)

5. Andrei, N.: Acceleration of conjugate gradient algorithms for unconstrained optimization. Appl. Math. Comput. 213, 361–369 (2009)

6. Andrei, N.: A modified Polak-Ribiere-Polyak conjugate gradient algorithm for unconstrained optimization. Optimization 60, 1457–1471 (2011)

7. Andrei, N.: On three-term conjugate gradient algorithms for unconstrained optimization. Appl. Math. Comput. 219, 6316–6327 (2013)

8. Andrei, N.: A simple three-term conjugate gradient algorithm for unconstrained optimization. J. Comput. Appl. Math. 241, 19–29 (2013)

9. Beale, E.M.L.: A derivation of conjugate gradients. In: Lootsma, F.A. (ed.) Numerical Methods for Nonlinear Optimization, pp. 39–43. Academic, London (1972)

10. Cheng, W.: A two-term PRP-based descent method. Numer. Funct. Anal. Optim. 28, 1217–1230 (2007)

11. Dai, Y.H., Kou, C.X.: A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM J. Optim. 23(1), 296–320 (2013)

12. Dai, Y.H., Liao, L.Z.: New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 43, 87–101 (2001)

13. Dai, Y.H., Yuan, Y.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10, 177–182 (1999)

14. Deng, N.Y., Li, Z.: Global convergence of three terms conjugate gradient methods. Optim. Methods Softw. 4, 273–282 (1995)

15. Dolan, E.D., More, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)

16. Gilbert, J.C., Lemarechal, C.: Some numerical experiments with variable storage quasi-Newton algorithms. Math. Program. 45, 407–435 (1989)

17. Hager, W.W., Zhang, H.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16, 170–192 (2005)

18. Hestenes, M.R., Stiefel, E.L.: Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand. 49, 409–436 (1952)

19. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45, 503–528 (1989)

20. Liu, D., Xu, G.: Symmetric Perry conjugate gradient method. Comput. Optim. Appl. 56, 317–341 (2013)

21. McGuire, M.F., Wolfe, P.: Evaluating a Restart Procedure for Conjugate Gradients. Report RC-4382, IBM Research Center, Yorktown Heights (1973)

22. More, J.J., Thuente, D.J.: On Linesearch Algorithms with Guaranteed Sufficient Decrease. Mathematics and Computer Science Division Preprint MCS-P153-0590. Argonne National Laboratory, Argonne, IL (1990)

23. Narushima, Y., Yabe, H., Ford, J.A.: A three-term conjugate gradient method with sufficient descent property for unconstrained optimization. SIAM J. Optim. 21, 212–230 (2011)

24. Nash, S.G.: User's Guide for TN-TNBC: Fortran Routines for Nonlinear Optimization. Report 397, Mathematical Sciences Department, The Johns Hopkins University, Baltimore

25. Nash, S.G., Nocedal, J.: A numerical study of the limited memory BFGS method and the truncated-Newton method for large scale optimization. SIAM J. Optim. 1, 358–372 (1991)

26. Nazareth, L.: A conjugate direction algorithm without line search. J. Optim. Theor. Appl. 23, 373–387 (1977)

27. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comp. 35, 773–782 (1980)

28. Nocedal, J.: Conjugate gradient methods and nonlinear optimization. In: Adams, L., Nazareth, J.L. (eds.) Linear and Nonlinear Conjugate Gradient Related Methods, pp. 9–23. SIAM (1996)

29. Perry, A.: A modified conjugate gradient algorithm. Oper. Res. Tech. Notes 26(6), 1073–1078 (1978)


30. Powell, M.J.D.: Nonconvex minimization calculations and the conjugate gradient method. Numerical Analysis (Dundee, 1983), Lecture Notes in Mathematics, vol. 1066, pp. 122–141. Springer, Berlin (1984)

31. Shanno, D.F., Phua, K.H.: Algorithm 500, Minimization of unconstrained multivariate functions. ACM Trans. Math. Soft. 2, 87–94 (1976)

32. Sugiki, K., Narushima, Y., Yabe, H.: Globally convergent three-term conjugate gradient methods that use secant conditions and generate descent search directions for unconstrained optimization. J. Optim. Theory Appl. 153(3), 733–757 (2012)

33. Wolfe, P.: Convergence conditions for ascent methods. SIAM Rev. 11, 226–235 (1968)

34. Wolfe, P.: Convergence conditions for ascent methods, (II): some corrections. SIAM Rev. 13, 185–188 (1971)

35. Zhang, L., Zhou, W., Li, D.H.: A descent modified Polak-Ribiere-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 26, 629–640 (2006)

36. Zhang, L., Zhou, W., Li, D.H.: Some descent three-term conjugate gradient methods and their global convergence. Optim. Methods Softw. 22, 697–711 (2007)

37. Zhang, J., Xiao, Y., Wei, Z.: Nonlinear conjugate gradient methods with sufficient descent condition for large-scale unconstrained optimization. Math. Probl. Eng. 2009 (2009). Article ID 243290. doi:10.1155/2009/243290

38. Zoutendijk, G.: Nonlinear programming, computational methods. In: Abadie, J. (ed.) Integer and Nonlinear Programming, pp. 38–86. North-Holland, Amsterdam (1970)