This article was downloaded by: [UZH Hauptbibliothek / Zentralbibliothek Zürich] on 10 July 2014, at 09:15. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, registered number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Optimization: A Journal of Mathematical Programming and Operations Research. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gopt20

Higher order derivatives in optimization methods. G. Corradi, University of Rome "La Sapienza", Faculty of Economics, Via Del Castro Laurenziano 9, 00161 Roma, Italy. Published online: 20 Mar 2007.

To cite this article: G. Corradi (1996) Higher order derivatives in optimization methods, Optimization: A Journal of Mathematical Programming and Operations Research, 37:1, 41-49, DOI: 10.1080/02331939608844195

To link to this article: http://dx.doi.org/10.1080/02331939608844195
Optimization, 1996, Vol. 37, pp. 41-49. © 1996 OPA (Overseas Publishers Association) Amsterdam B.V. Published in The Netherlands under license by Gordon and Breach Science Publishers SA. Reprints available directly from the publisher. Photocopying permitted by license only. Printed in Malaysia.
HIGHER ORDER DERIVATIVES IN OPTIMIZATION METHODS
G. CORRADI
University of Rome "La Sapienza", Faculty of Economics, Via Del Castro Laurenziano, 9 00161 Roma, Italy
(Received 31 March 1995; in final form 24 October 1995)
A method for unconstrained optimization which makes use of higher derivatives is presented. A convergence analysis is given, and the rate of convergence is shown to be superlinear. Numerical results are reported.
KEY WORDS: Higher order derivatives, optimization methods, superlinear convergence.
Mathematics Subject Classification 1991: Primary: 49M30; Secondary: 65K05.
1. INTRODUCTION
In this paper we consider the following problem: given f: Rⁿ → R¹, find z̄ such that

f(z̄) = min { f(z) | z ∈ Rⁿ }.   (1)

In what follows we denote by |·| the Euclidean norm on Rⁿ (or a norm on a convenient space), the symbol (·,·) denotes the Euclidean scalar product, and f⁽ʳ⁾(·) denotes the r-th derivative of f(·). In this paper we present a method for solving problem (1) based on a higher order approximation of the gradient ∇f(·) of f(·). Most of the known iterative algorithms for solving problem (1) construct a sequence of points z_i ∈ Rⁿ such that z_{i+1} = z_i + λ_i h_i, where λ_i ∈ R¹ and h_i ∈ Rⁿ. The standard methods consider a second order approximation of f(·), for which

f(z_i + h) ≈ f(z_i) + f'(z_i)h + (1/2) f''(z_i)hh,

where H(·) is the Hessian matrix of f(·), and compute a direction h_i which is a solution of the equation ∇f(z_i + h) ≈ ∇f(z_i) + H(z_i)h = 0, or ∇f(z_i) + B_i h = 0, where B_i is some approximation to H(z_i).
2. MOTIVATIONS
In this paper we consider a higher order approximation of ∇f(·), for which we compute the direction h solving the equation

∇f(z_i) + H(z_i)h + (1/2)(∂H(z_i)/∂z)hh = 0   (2)

or

∇f(z_i) + B_i h + (1/2)W_i hh = 0,   (3)
where B_i is some approximation to H(z_i) and W_i h is some approximation to (∂H(z_i)/∂z)h. In (2), (∂H(z_i)/∂z)y is such that

(∂H(z_i)/∂z)y = lim_{t→0} (1/t)[H(z_i + ty) − H(z_i)]

for every y ∈ Rⁿ. Hence (∂H(z_i)/∂z)y is an n × n matrix.
Remark 2.1: Note that if g(·) = ∇f(·), then H(·) is the Jacobian matrix of g(·). Examining the relation above component by component, we obtain, for k = 1, …, n,

[(∂H(z)/∂z)h]_k = [H_k(z)h]ᵀ,

where H_k(·) is the Hessian matrix of g_k(·), and hence we see that (∂H(z)/∂z)h is an n × n matrix whose k-th row is [H_k(z)h]ᵀ.
Remark 2.2: By remark 2.1 it follows that we can choose the matrix W_i h, which approximates (∂H(z_i)/∂z)h, so that the k-th row of W_i h is [B_iᵏ h]ᵀ, where B_iᵏ is some approximation to H_k(z_i). For example, B_iᵏ can be computed making use of a rank-two formula.
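The row structure stated in remark 2.1 can be checked numerically. The sketch below is an illustrative check, not part of the paper's algorithm: it uses the Rosenbrock function of section 5, whose Hessian H and whose gradient-component Hessians H₁, H₂ are worked out by hand, and compares a finite-difference directional derivative of H with the rows [H_k(z)h]ᵀ.

```python
import numpy as np

# Rosenbrock function f(z) = 100(z2 - z1^2)^2 + (1 - z1)^2 (problem 1 of
# section 5); its Hessian and the Hessians H1, H2 of the two gradient
# components g1, g2, all derived analytically.
def hess(z):
    z1, z2 = z
    return np.array([[1200*z1**2 - 400*z2 + 2, -400*z1],
                     [-400*z1, 200.0]])

H1 = lambda z: np.array([[2400*z[0], -400.0], [-400.0, 0.0]])
H2 = lambda z: np.array([[-400.0, 0.0], [0.0, 0.0]])

z = np.array([-1.2, 1.0])
h = np.array([0.5, -0.3])

# Directional derivative (dH(z)/dz)h by central differences
t = 1e-5
dHh = (hess(z + t*h) - hess(z - t*h)) / (2*t)

# Remark 2.1: the k-th row of (dH(z)/dz)h equals [H_k(z)h]^T
rows = np.vstack([H1(z) @ h, H2(z) @ h])
print(np.max(np.abs(dHh - rows)))  # agrees to roundoff
```

Since the Hessian entries are polynomials of degree at most two, the central difference is exact up to rounding error here.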
3. ON THE SOLUTION OF THE EQUATION G(z, h) = 0
We set
G(z, h) = ∇f(z) + H(z)h + (1/2)(∂H(z)/∂z)hh = 0   (4)
and proceed to examine the solution of (4). To this end we make use of the following theorem (implicit function theorem).
Theorem 3.1: Let S be an open subset of R^{m+n} and g: S → Rⁿ a function such that, for some p ≥ 0, g ∈ Cᵖ (of class Cᵖ) over S, and assume that ∂_y g(x, y) (the Jacobian matrix of g with respect to y) exists and is continuous on S. Let (x̄, ȳ) ∈ S be a vector such that g(x̄, ȳ) = 0 and the matrix ∂_y g(x̄, ȳ) is nonsingular. Then there exist scalars ε > 0, δ > 0 and a function Φ: S(x̄, ε) → S(ȳ, δ) (we denote by S(z, ρ) a neighborhood of z) such that Φ ∈ Cᵖ over S(x̄, ε), ȳ = Φ(x̄) and g[x, Φ(x)] = 0 for all x ∈ S(x̄, ε). The function Φ is unique in the sense that if x ∈ S(x̄, ε), y ∈ S(ȳ, δ) and g(x, y) = 0, then y = Φ(x). Furthermore, if p ≥ 1, then for all x ∈ S(x̄, ε)

Φ'(x) = −[∂_y g(x, Φ(x))]⁻¹ ∂_x g(x, Φ(x)).   (5)
Theorem 3.2: Let i e R n be a point such that V f (5) = 0. Assume that the matrix H(Z) + ( a H ( f ) / a z ) h is nonsingular for every heRn. Then a solution of (4) is (2, h) and h = 0. h = 0 is the unique solution of the system
[H(Z) + (aH(Z) /az )h]h = 0.
Furthermore if (2, h ) is a solution of (4) and d,G(i, h ) = H(5) + ( a H ( l ) / a z ) h is nonsin- gular, then for any z ~ S ( 5 , E ) the equation G(z, h) = 0 has an unique solution.
Proof: Since ∇f(z̄) = 0, (4) reduces to

[H(z̄) + (∂H(z̄)/∂z)h]h = 0.   (6)

Therefore h = 0 is a solution of (6). We now assume that ȳ ≠ 0 is a solution of (6). Since the matrix H(z̄) + (∂H(z̄)/∂z)h is nonsingular for all h, the system [H(z̄) + (∂H(z̄)/∂z)ȳ]x = 0 has the unique solution x = 0; hence there cannot exist ȳ ≠ 0 such that [H(z̄) + (∂H(z̄)/∂z)ȳ]ȳ = 0. By theorem 3.1 it follows that if (z̄, h̄) is a solution of (4) and ∂_h G(z̄, h̄) is nonsingular, then there exist Φ, S(z̄, ε), S(h̄, δ) such that Φ: S(z̄, ε) → S(h̄, δ), h̄ = Φ(z̄) and G(z, Φ(z)) = 0. Therefore for any z ∈ S(z̄, ε) the equation G(z, h) = 0 has the unique solution h = Φ(z) ∈ S(h̄, δ). □
Remark 3.1: Note that the solution of (4) for any fixed z may be obtained using the Newton direction s = −∂_h G(z, h)⁻¹ G(z, h), where ∂_h G(z, h) = H(z) + (∂H(z)/∂z)h. We have used this direction for our numerical results. We note that a possible choice for h₁, an initial approximation for solving (4), is h₁ = −H(z)⁻¹∇f(z), which represents an approximation to the solution of (4); or, if we consider equation (3) at z, then h₁ = −B⁻¹∇f(z), where B is some approximation of the Hessian matrix of f.
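The inner Newton iteration of remark 3.1 can be sketched on a scalar example, where (4) becomes a quadratic in h. The function f(x) = x⁴/4 − x and the point x = 1.2 below are arbitrary illustrative choices; h₁ = −f'(x)/f''(x) is the initial approximation suggested above.

```python
import numpy as np

# Scalar illustration: f(x) = x^4/4 - x, minimizer x* = 1
df  = lambda x: x**3 - 1      # f'(x)
d2f = lambda x: 3*x**2        # f''(x)
d3f = lambda x: 6*x           # f'''(x), playing the role of dH/dz

def higher_order_direction(x, tol=1e-12, maxit=50):
    """Solve G(x,h) = f'(x) + f''(x)h + (1/2)f'''(x)h^2 = 0 by Newton's
    method in h, started from h1 = -f'(x)/f''(x) as in remark 3.1."""
    h = -df(x) / d2f(x)
    for _ in range(maxit):
        G = df(x) + d2f(x)*h + 0.5*d3f(x)*h**2
        if abs(G) < tol:
            break
        h -= G / (d2f(x) + d3f(x)*h)   # d_h G = f''(x) + f'''(x)h
    return h

x = 1.2
h_newton = -df(x) / d2f(x)
h_cubic = higher_order_direction(x)
print(x + h_newton, x + h_cubic)
```

In this example the third-order step x + h lands closer to the minimizer x* = 1 than the plain Newton step, which illustrates the motivation of section 2.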
4. AN ALGORITHM FOR SOLVING PROBLEM (1)
Remark 4.1: Note that by the above discussion it follows that, if we consider the equation ∇f(z) = 0, then we can obtain the solution by the following process

z_{i+1} = z_i + h_i,   (7)

where h_i is a solution of the equation

G(z_i, h) = ∇f(z_i) + H(z_i)h + (1/2)(∂H(z_i)/∂z)hh = 0.

Note that process (7) is an extension of the Newton method. In fact we now prove the following theorem:
Theorem 4.1: Let ∇f(z̄) = 0 and h̄ = 0. Then there exists a neighborhood S'(z̄, ρ) of z̄ such that, if z₁ ∈ S'(z̄, ρ), the iterates given by (7) remain in S'(z̄, ρ) and converge to z̄; moreover the convergence is superlinear, hence

lim_{i→∞} |z_{i+1} − z̄| / |z_i − z̄| = 0.
Proof: By theorem 3.2 there exists Φ: S(z̄, ε) → S(h̄, δ) such that G(z, Φ(z)) = 0 and h = Φ(z). If we set Q(z) = z + Φ(z), we have Q(z̄) = z̄ + Φ(z̄) = z̄ + 0 = z̄; hence z̄ is a fixed point of Q(·). Moreover, by theorem 3.1 we obtain

Φ'(z̄) = −[∂_h G(z̄, h̄)]⁻¹ ∂_z G(z̄, h̄) = −I.

It follows that Q'(z̄) = I − I = 0. On the other hand, from process (7) it follows that z_{i+1} = z_i + Φ(z_i) = Q(z_i). The results follow from theorem 10.1.6 of Ortega-Rheinboldt, Ref. [5], p. 303. □

We now make use of the above discussion and present a new algorithm for solving problem (1). The algorithm makes use of derivatives of higher order.
ALGORITHM 4.1

Step 1: Select z₁ ∈ Rⁿ, B₁ ∈ Rⁿˣⁿ, B₁ᵏ ∈ Rⁿˣⁿ, k = 1, …, n, H₁ ∈ Rⁿˣⁿ.
Comment: Set g(·) = ∇f(·). Note that if we make use of (9), then we consider two sequences of matrices {B_i} and {W_i h}. B_i is some approximation to H(z_i), the Hessian matrix of f; W_i h (see remark 2.2) is some approximation to (∂H(z_i)/∂z)h, and the k-th row of W_i h is [B_iᵏ h]ᵀ, where B_iᵏ is some approximation to H_k(z_i), the Hessian matrix of g_k(·). In step 1, B₁ is an initialization of {B_i} and B₁ᵏ, k = 1, …, n, is, for every k, an initialization of {B_iᵏ}.
Step 2: Set i = 1.

Step 3: Compute ∇f(z_i).

Step 4: If ∇f(z_i) = 0 stop, else go to step 5.
Step 5: Compute a solution h̄_i, if one exists, of the following equation

∇f(z_i) + H(z_i)h + (1/2)(∂H(z_i)/∂z)hh = 0   (8)

or

∇f(z_i) + B_i h + (1/2)W_i hh = 0.   (9)

If (h̄_i, ∇f(z_i)) < 0 we set h_i = h̄_i − p_i∇f(z_i), where p_i ≥ 0 is to be defined.
Step 6: If equation (8) or (9) has no solution, or (h̄_i, ∇f(z_i)) ≥ 0, then we set h_i = −H_i∇f(z_i).
Comment: for our numerical results we make use of (9).

Step 7: Compute λ_i > 0 such that f(z_i + λ_i h_i) ≤ f(z_i) + σλ_i(h_i, ∇f(z_i)) (see remark 5.3).

Step 8: Set z_{i+1} = z_i + λ_i h_i.
Step 9: Compute B_{i+1}, H_{i+1}, B_{i+1}ᵏ, k = 1, …, n.
Comment: We note that the matrices B_{i+1} and B_{i+1}ᵏ are computed only if in step 5 we make use of (9).
Step 10: Set i = i + 1 and go to step 3.
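Steps 2-10 can be sketched end to end. The following is a minimal illustration, not the author's implementation: exact derivatives replace the approximations B_i and W_i h, the separable convex test function is an arbitrary choice, p_i is taken as 0, and step 7 uses a simple Armijo backtracking in place of Powell's rule (remark 5.3).

```python
import numpy as np

# Separable convex test function f(z) = sum_j [(z_j-1)^4/4 + (z_j-1)^2/2],
# minimizer z* = (1, ..., 1); all derivatives are available analytically.
f  = lambda z: np.sum((z - 1)**4 / 4 + (z - 1)**2 / 2)
g  = lambda z: (z - 1)**3 + (z - 1)          # gradient
H  = lambda z: np.diag(3*(z - 1)**2 + 1)     # Hessian
dH = lambda z, h: np.diag(6*(z - 1)*h)       # (dH(z)/dz)h

def direction(z):
    """Step 5: try to solve G(z,h) = 0 by Newton's method in h;
    fall back (step 6) to the Newton direction when it fails."""
    h = np.linalg.solve(H(z), -g(z))         # h1 of remark 3.1
    fallback = h.copy()
    for _ in range(30):
        G = g(z) + H(z) @ h + 0.5 * dH(z, h) @ h
        if np.linalg.norm(G) < 1e-12:
            return h
        J = H(z) + dH(z, h)                  # d_h G(z, h)
        if abs(np.linalg.det(J)) < 1e-12 or np.linalg.norm(h) > 1e3:
            break                            # inner iteration failed
        h = h + np.linalg.solve(J, -G)
    return fallback

z = np.array([-1.0, 3.0])
for i in range(50):                          # steps 2-10
    if np.linalg.norm(g(z)) < 1e-10:         # step 4
        break
    h = direction(z)
    if h @ g(z) >= 0:                        # not a descent direction
        h = -g(z)
    lam = 1.0                                # step 7: Armijo backtracking
    while f(z + lam*h) - f(z) > 1e-4 * lam * (h @ g(z)):
        lam *= 0.5
    z = z + lam*h                            # step 8
print(z, i)
```

Far from the minimizer, equation (8) has no real solution for this function and the step-6 fallback is taken; near the minimizer the cubic model has a root and the inner Newton iteration converges in a few steps.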
Theorem 4.2 (convergence): Consider algorithm 4.1 where we compute h_i from step 5 using equation (9), or from step 6. Assume that, if h̄_i exists, then (B_i + (1/2)W_i h̄_i) is nonsingular and |(B_i + (1/2)W_i h̄_i)⁻¹| < c₁. Further we assume H_i > 0 (positive definite) and |H_i| < c₂. Then either algorithm 4.1 constructs a finite sequence {z_i} whose last element is a critical point of f(·), or else the algorithm constructs an infinite sequence {z_i} and every accumulation point of {z_i} is a critical point of f(·).
Proof: The first part of the theorem is trivial, since the algorithm stops at a point z_i at which ∇f(z_i) = 0. For the second part of the theorem we need to show that

−(h_i, ∇f(z_i)) ≥ p(z_i)|h_i| |∇f(z_i)|,   (10)

where p: Rⁿ → R¹ is a continuous function and, if A is the set of critical points of f(·), then 0 < p(z) < 1 if z ∉ A and 0 ≤ p(z) ≤ 1 if z ∈ A. Condition (10) follows from sufficient condition (9) of theorem (8), Ref. [6], p. 46, where the constant p has been replaced by the function p(·) defined above. We only note that the conclusions of theorem (8) still remain valid. From (9) we have (in a similar way for (8)) h̄_i = −(B_i + (1/2)W_i h̄_i)⁻¹∇f(z_i), for which, from step 5,

h_i = −(B_i + (1/2)W_i h̄_i)⁻¹∇f(z_i) − p_i∇f(z_i)
    = −[(B_i + (1/2)W_i h̄_i)⁻¹ + p_i I]∇f(z_i).

It follows that

|h_i| ≤ |(B_i + (1/2)W_i h̄_i)⁻¹ + p_i I| |∇f(z_i)| ≤ (c₁ + p_i)|∇f(z_i)|.   (11)
On the other hand, for the direction of step 6 we have −(h_i, ∇f(z_i)) = (∇f(z_i), H_i∇f(z_i)) > 0, since H_i > 0. Further, |H_i| < c₂, hence

|h_i| ≤ c₂|∇f(z_i)|.   (12)

From (11) and (12) it follows that for every i

|h_i| ≤ (c₃ + p_i)|∇f(z_i)|,   (13)

where c₃ = max{c₁, c₂}. If we compare (13) and (10), then the result follows if we select a convenient p_i. An appropriate choice is p_i = const > 0 or p_i = const·|∇f(z_i)|, where const is a positive constant. □
We now prove that if λ_i is chosen by the Armijo rule, so that step 7 of algorithm 4.1 is replaced by

Step 7': Compute λ_i = βʲ, where β ∈ (0, 1) and j is the first nonnegative integer for which

f(z_i + βʲh_i) − f(z_i) ≤ σβʲ(h_i, ∇f(z_i)),

where σ ∈ (0, 1), then the sequence {z_i} converges superlinearly.
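Step 7' is the classical Armijo backtracking rule. A minimal sketch, assuming the arbitrary (but admissible) values β = 0.5 and σ = 10⁻⁴ and using problem 1 of section 5 with the steepest-descent direction:

```python
import numpy as np

def armijo(f, grad_f, z, h, beta=0.5, sigma=1e-4):
    """Step 7': lam = beta**j for the first nonnegative integer j with
    f(z + lam*h) - f(z) <= sigma*lam*(h, grad f(z))."""
    g = grad_f(z)
    lam = 1.0
    while f(z + lam*h) - f(z) > sigma * lam * (h @ g):
        lam *= beta
    return lam

# Rosenbrock (problem 1 of section 5), steepest-descent direction
f = lambda z: 100*(z[1] - z[0]**2)**2 + (1 - z[0])**2
grad = lambda z: np.array([-400*z[0]*(z[1] - z[0]**2) - 2*(1 - z[0]),
                           200*(z[1] - z[0]**2)])
z = np.array([-1.2, 1.0])
h = -grad(z)
lam = armijo(f, grad, z, h)
```

The loop always terminates when h is a descent direction, since the condition holds for all sufficiently small λ.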
Remark 4.2: Note that if z̄ is a point such that ∇f(z̄) = 0, then a solution of (9) corresponding to z̄ is h̄ = 0. It follows that, if z ∈ S(z̄, ε), then we can assume that equation (9) has a solution h̄ ∈ S(0, δ).

If we consider equation (8), remark 4.2 follows directly from theorem 3.2.
Remark 4.3: Let ∇f(z̄) = 0. Since for i > i₀, z_i ∈ S(z̄, ε), from remark 4.2 there exists h̄_i such that h̄_i = −(B_i + (1/2)W_i h̄_i)⁻¹∇f(z_i). Since |(B_i + (1/2)W_i h̄_i)⁻¹| < c₁, it follows that h̄_i → 0 as i → ∞, so we can assume that W_i h̄_i → 0. From the expression for h̄_i it then follows that, if B_i > 0 for every i, there exists i₁ such that for every i > i₁, (∇f(z_i), h̄_i) < 0. So the conditions of step 5 of algorithm 4.1 are verified.
Theorem 4.3 (superlinear convergence): Consider algorithm 4.1 where we compute h_i from step 5 making use of equation (9). Assume that z_i → z̄, ∇f(z̄) = 0, H(z̄) > 0 and that z_i ≠ z̄ for all i. Assume further that ∇f(z_i) ≠ 0 for all i, that p_i → 0 as i → ∞, and that

lim_{i→∞} |(H_i − H(z̄)⁻¹)∇f(z_i)| / |∇f(z_i)| = 0,

where H_i = B_i⁻¹. Then

lim_{i→∞} |z_{i+1} − z̄| / |z_i − z̄| = 0.
Proof: From remark 4.3 there exists i₁ such that for every i > i₁ algorithm 4.1 constructs a sequence of points

z_{i+1} = z_i + λ_i h_i = z_i − λ_i[(B_i + (1/2)W_i h̄_i)⁻¹ + p_i I]∇f(z_i)
        = z_i − λ_i D_i∇f(z_i),

where

D_i = (B_i + (1/2)W_i h̄_i)⁻¹ + p_i I.

The proof now follows making use of standard results (Bertsekas [1], proposition 1.15). □
5. NUMERICAL RESULTS
For our numerical results all the computations were done using double precision arithmetic and all gradients were obtained analytically. The termination criterion is |∇f| < ε, where ε = … for problems 1 and 2 and ε = … for problem 3. The details of the functions are given below.
1. f(z) = 100(z₂ − (z₁)²)² + (1 − z₁)²
   z_in = (−1.2, 1), z_opt = (1, 1)

2. f(z) = (z₁ + 10z₂)² + 5(z₃ − z₄)² + (z₂ − 2z₃)⁴ + 10(z₁ − z₄)⁴
   z_in = (3, −1, 0, 1), z_opt = (0, 0, 0, 0)

3. f(z) = 100(z₂ − (z₁)²)² + (1 − z₁)² + 90(z₄ − (z₃)²)² + (1 − z₃)² + 10.1(z₂ − 1)² + 10.1(z₄ − 1)² + 19.8(z₂ − 1)(z₄ − 1)
   z_in = (−3, −1, −3, −1), z_opt = (1, 1, 1, 1)
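The three test functions, transcribed directly from the formulas above (problem 1 is the Rosenbrock function, problem 2 Powell's singular function, problem 3 Wood's function, all standard test problems):

```python
def f1(z):  # problem 1 (Rosenbrock)
    return 100*(z[1] - z[0]**2)**2 + (1 - z[0])**2

def f2(z):  # problem 2 (Powell's singular function)
    return ((z[0] + 10*z[1])**2 + 5*(z[2] - z[3])**2
            + (z[1] - 2*z[2])**4 + 10*(z[0] - z[3])**4)

def f3(z):  # problem 3 (Wood's function)
    return (100*(z[1] - z[0]**2)**2 + (1 - z[0])**2
            + 90*(z[3] - z[2]**2)**2 + (1 - z[2])**2
            + 10.1*(z[1] - 1)**2 + 10.1*(z[3] - 1)**2
            + 19.8*(z[1] - 1)*(z[3] - 1))

print(f1([1, 1]), f2([0, 0, 0, 0]), f3([1, 1, 1, 1]))  # all zero
```

Each function is nonnegative and vanishes at its minimizer, which gives a quick consistency check on the transcription.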
Remark 5.1: Note that in steps 5 and 6 of algorithm 4.1 we set p_i = const > 0, where const = … for problem 1, … for problem 2 and … for problem 3.
Remark 5.2: We note that in step 9 of algorithm 4.1, when we compute B_{i+1}, B_{i+1}ᵏ and H_{i+1}, we make use of the following updating formulas (Powell [7]):

B_{i+1} = B_i − (B_iΔz_i)(B_iΔz_i)ᵀ/(Δz_i, B_iΔz_i) + q_i q_iᵀ/(Δz_i, q_i),

where q_i = θΔg_i + (1 − θ)B_iΔz_i, and θ = 1 if

(Δg_i, Δz_i) ≥ 0.2(Δz_i, B_iΔz_i),

else

θ = 0.8(Δz_i, B_iΔz_i)/((Δz_i, B_iΔz_i) − (Δz_i, Δg_i)).

In a similar way for H_{i+1}. For B_{i+1}ᵏ, k = 1, …, n, we set θ = 1, for which q_iᵏ = Δg_iᵏ, k = 1, …, n.
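A sketch of the update of remark 5.2, assuming the standard form of Powell's damped BFGS formula from Ref. [7] (the choice of θ matches the rule displayed above):

```python
import numpy as np

def damped_bfgs_update(B, dz, dg):
    """Powell's damped BFGS update: the BFGS formula applied with
    q = theta*dg + (1-theta)*B dz in place of dg, with theta chosen so
    that (dz, q) >= 0.2*(dz, B dz); this keeps B positive definite even
    when (dz, dg) <= 0."""
    Bdz = B @ dz
    zBz = dz @ Bdz
    if dz @ dg >= 0.2 * zBz:
        theta = 1.0
    else:
        theta = 0.8 * zBz / (zBz - dz @ dg)
    q = theta * dg + (1 - theta) * Bdz
    return B - np.outer(Bdz, Bdz) / zBz + np.outer(q, q) / (dz @ q)

# With (dz, dg) < 0 the undamped BFGS update would lose positive
# definiteness; the damped update does not.
B_new = damped_bfgs_update(np.eye(2), np.array([1.0, 0.0]),
                           np.array([-1.0, 0.0]))
print(np.linalg.eigvalsh(B_new))  # both eigenvalues positive
```

When θ = 1 the formula reduces to the ordinary BFGS update and satisfies the secant equation B_{i+1}Δz_i = Δg_i.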
Remark 5.3: For our numerical results, the step-length in step 7 of algorithm 4.1 is obtained making use of a method introduced by Powell in Ref. [7]. In this method we build a sequence {λ̄_i}, i ≥ 1, with λ̄₁ = 1. For i > 1 we set λ̄_i = max{δ·λ̄_{i−1}, λ̂}, where λ̂ minimizes the quadratic approximation to v(λ) = f(z_i + λh_i) and δ ∈ (0, 1). We set the step-length to λ̄_i if the condition v(λ̄_i) ≤ v(0) + σλ̄_i(h_i, ∇f(z_i)) is satisfied, with σ sufficiently small.
Remark 5.4: As noted in remark 3.1, for solving equation (9) we make use of a global method for systems of nonlinear equations (J. E. Dennis and R. B. Schnabel [4], section 6.5). Also for this method we make use of Powell's method of remark 5.3, with parameters δ₁, σ₁. If we set G(z_i, h) = ∇f(z_i) + B_i h + (1/2)W_i hh, then a solution to (9) is accepted if |G(z_i, h)| < ε₁.
Remark 5.5: Note that we obtain B_{i+1}, H_{i+1}, B_{i+1}ᵏ in step 9 of algorithm 4.1 making use of ∇f(·) and ∇g_k(·). The calculation of ∇f(·) and ∇g_k(·) can be performed in parallel. Thus in a context of parallel computation it is important to consider the number of evaluations of f(·) in step 7 and the number of simultaneous evaluations of ∇f(·) and ∇g_k(·), k = 1, …, n, in step 9. We indicate this number by ntot. In particular, if nf represents the number of evaluations of f(·) and n∇f the number of evaluations of ∇f(·), then ntot = nf + n∇f.
Remark 5.6: For our numerical results δ assumes the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6; σ assumes the value 0.1; δ₁ = 0.1; σ₁ = ε₁ = 3·10⁻³ for problem 1, ε₁ = … for problem 2 and … for problem 3. We have set the initial matrices B₁, H₁ and B₁ᵏ, k = 1, …, n, as follows: B₁ = H₁ = max{coef·f(z₁), 1}·I, where coef assumes the values 0, 1; while B₁ᵏ = diagw·I, k = 1, …, n, where diagw is a constant value.
Remark 5.7: We have considered the best ten results with respect to ntot when the parameters δ, σ, coef and diagw are changed; then we have considered the arithmetic mean, indicated by icont, of these results. So if ntot(i), i = 1, …, 10, represent the best results, we have icont = (Σᵢ₌₁¹⁰ ntot(i))/10.
The results are reported below. We only note that nF indicates that the method fails n times to terminate in a reasonable number of iterations, and that the standard method is obtained from algorithm 4.1 where steps 5 and 6 are replaced by

Step 5': Compute h_i = −B_i⁻¹∇f(z_i) = −H_i∇f(z_i),

while the step-length in step 7 is obtained as in remark 5.3.
Standard method
Problem 1. ntot = (96, 98, 100, 101, 101, 103, 103, 105, 111, 111)ᵀ, icont = 102.9, 8F
Problem 2. ntot = (94, 94, 101, 121, 125, 125, 135, 148, 158, 191)ᵀ, icont = 129.2, 7F

Problem 3. ntot = (129, 140, 141, 206, 230, 233, 282, 295, 300, 432)ᵀ, icont = 238.8, 14F
Our method
Problem 1. ntot = (37, 46, 68, 71, 74, 80, 83, 83, 88, 88)ᵀ, icont = 71.8, 0F

Problem 2. ntot = (74, 79, 82, 83, 83, 84, 85, 86, 86, 90)ᵀ, icont = 83.2, 0F

Problem 3. ntot = (71, 72, 122, 132, 134, 140, 149, 149, 160, 168)ᵀ, icont = 129.7, 5F
Remark 5.8: Note that our computational experience shows that the method described in this paper performs quite well. If we consider the parameters ntot and icont, so that we compare the numerical results in a context of parallel computation, then our method seems to be more reliable and robust than the standard method.
References
[1] Bertsekas, D. P. (1981) "Constrained Optimization and Lagrange Multiplier Methods", Academic Press, New York
[2] Corradi, G. (1992) An Algorithm for Unconstrained Optimization, Inter. J. Computer Math., 45, 123-131
[3] Corradi, G. (1994) A Note on a Method for Constrained Optimization Based on Recursive Quadratic Programming, Inter. J. Computer Math., 51, 173-180
[4] Dennis, J. E. and Schnabel, R. B. (1983) "Numerical Methods for Unconstrained Optimization and Nonlinear Equations", Prentice-Hall, Englewood Cliffs, New Jersey
[5] Ortega, J. M. and Rheinboldt, W. C. (1970) "Iterative Solution of Nonlinear Equations in Several Variables", Academic Press, New York
[6] Polak, E. (1971) "Computational Methods in Optimization: A Unified Approach", Academic Press, New York
[7] Powell, M. J. D. (1978) A Fast Algorithm for Nonlinearly Constrained Optimization Calculations, Lecture Notes in Mathematics (G. A. Watson, ed.), no. 630, Springer-Verlag, Berlin