This article was downloaded by: [UZH Hauptbibliothek / Zentralbibliothek Zürich] on 10 July 2014, at 09:15. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, registered number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Optimization: A Journal of Mathematical Programming and Operations Research. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gopt20

Higher order derivatives in optimization methods. G. Corradi, University of Rome "La Sapienza", Faculty of Economics, Via Del Castro Laurenziano 9, 00161 Roma, Italy. Published online: 20 Mar 2007.

To cite this article: G. Corradi (1996) Higher order derivatives in optimization methods, Optimization: A Journal of Mathematical Programming and Operations Research, 37:1, 41-49, DOI: 10.1080/02331939608844195

To link to this article: http://dx.doi.org/10.1080/02331939608844195
Optimization, 1996, Vol. 37, pp. 41-49. © 1996 OPA (Overseas Publishers Association) Amsterdam B.V. Published in The Netherlands under license by Gordon and Breach Science Publishers SA. Reprints available directly from the publisher. Photocopying permitted by license only. Printed in Malaysia.
HIGHER ORDER DERIVATIVES IN OPTIMIZATION METHODS
G. CORRADI
University of Rome "La Sapienza", Faculty of Economics, Via Del Castro Laurenziano, 9 00161 Roma, Italy
(Received 31 March 1995; in final form 24 October 1995)
A method for unconstrained optimization which makes use of higher derivatives is presented. A convergence analysis is given, and the rate of convergence is shown to be superlinear. Numerical results are reported.
KEY WORDS: Higher order derivatives, optimization methods, superlinear convergence.
Mathematics Subject Classification 1991: Primary: 49M30; Secondary: 65K05.
1. INTRODUCTION
In this paper we consider the following problem: given f: Rⁿ → R¹, find z̄ such that

f(z̄) = min { f(z) | z ∈ Rⁿ }.   (1)

In what follows we denote by |·| the Euclidean norm on Rⁿ (or a norm on a convenient space), the symbol (·,·) denotes the Euclidean scalar product, and f⁽ʳ⁾(·) denotes the r-th derivative of f(·). In this paper we present a method for solving problem (1) based on a higher order approximation of the gradient ∇f(·) of f(·). Most of the known iterative algorithms for solving problem (1) construct a sequence of points z_i ∈ Rⁿ such that z_{i+1} = z_i + λ_i h_i, where λ_i ∈ R¹ and h_i ∈ Rⁿ. The standard methods consider a second order approximation of f(·), for which

f(z_i + h) ≈ f(z_i) + f'(z_i)h + (1/2) f''(z_i)hh,

where H(·) is the Hessian matrix of f(·), and compute a direction h_i which is a solution of the equation ∇f(z_i + h) ≈ ∇f(z_i) + H(z_i)h = 0, or ∇f(z_i) + B_i h = 0, where B_i is some approximation to H(z_i).
2. MOTIVATIONS
In this paper we consider a higher order approximation of ∇f(·), for which we compute the direction h solving the equation

∇f(z_i) + H(z_i)h + (1/2)(∂H(z_i)/∂z)hh = 0   (2)

or

∇f(z_i) + B_i h + (1/2)W_i hh = 0,   (3)
where B_i is some approximation to H(z_i) and W_i h is some approximation to (∂H(z_i)/∂z)h. In (2), (∂H(z_i)/∂z)y is such that

(∂H(z_i)/∂z)y = lim_{t→0} (1/t)[H(z_i + ty) − H(z_i)]

for every y ∈ Rⁿ. Hence (∂H(z_i)/∂z)y is an n × n matrix.
Remark 2.1: Note that if g(·) = ∇f(·), then H(·) is the Jacobian matrix of g(·). Examining the relation above component by component, we obtain, for k = 1, …, n,

[(∂H(z)/∂z)h]_k = [H_k(z)h]ᵀ,

where H_k(·) is the Hessian matrix of g_k(·), and hence we see that (∂H(z)/∂z)h is an n × n matrix whose k-th row is [H_k(z)h]ᵀ.
Remark 2.2: By remark 2.1 it follows that we can choose the matrix W_i h, which approximates (∂H(z_i)/∂z)h, so that the k-th row of W_i h is [B_iᵏ h]ᵀ, where B_iᵏ is some approximation to H_k(z_i). For example, B_iᵏ can be computed making use of a rank-two formula.
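The row structure stated in remark 2.1 can be checked numerically. The sketch below is an illustrative check, not part of the paper's algorithm: it uses the Rosenbrock function of section 5, whose Hessian H and whose gradient-component Hessians H₁, H₂ are worked out by hand, and compares a finite-difference directional derivative of H with the rows [H_k(z)h]ᵀ.

```python
import numpy as np

# Rosenbrock function f(z) = 100(z2 - z1^2)^2 + (1 - z1)^2 (problem 1 of
# section 5); its Hessian and the Hessians H1, H2 of the two gradient
# components g1, g2, all derived analytically.
def hess(z):
    z1, z2 = z
    return np.array([[1200*z1**2 - 400*z2 + 2, -400*z1],
                     [-400*z1, 200.0]])

H1 = lambda z: np.array([[2400*z[0], -400.0], [-400.0, 0.0]])
H2 = lambda z: np.array([[-400.0, 0.0], [0.0, 0.0]])

z = np.array([-1.2, 1.0])
h = np.array([0.5, -0.3])

# Directional derivative (dH(z)/dz)h by central differences
t = 1e-5
dHh = (hess(z + t*h) - hess(z - t*h)) / (2*t)

# Remark 2.1: the k-th row of (dH(z)/dz)h equals [H_k(z)h]^T
rows = np.vstack([H1(z) @ h, H2(z) @ h])
print(np.max(np.abs(dHh - rows)))  # agrees to roundoff
```

Since the Hessian entries are polynomials of degree at most two, the central difference is exact up to rounding error here.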
3. ON THE SOLUTION OF THE EQUATION G(z, h) = 0
We set
G(z, h) = ∇f(z) + H(z)h + (1/2)(∂H(z)/∂z)hh = 0   (4)
and proceed to examine the solution of (4). To this end we make use of the following theorem (implicit function theorem).
Theorem 3.1: Let S be an open subset of R^{m+n} and g: S → Rⁿ a function such that, for some p ≥ 0, g ∈ Cᵖ (of class Cᵖ) over S, and assume that ∂_y g(x, y) (the Jacobian matrix of g with respect to y) exists and is continuous on S. Let (x̄, ȳ) ∈ S be a vector such that g(x̄, ȳ) = 0 and the matrix ∂_y g(x̄, ȳ) is nonsingular. Then there exist scalars ε > 0, δ > 0 and a function Φ: S(x̄, ε) → S(ȳ, δ) (we denote by S(z, ρ) a neighborhood of z) such that Φ ∈ Cᵖ over S(x̄, ε), ȳ = Φ(x̄) and g[x, Φ(x)] = 0 for all x ∈ S(x̄, ε). The function Φ is unique in the sense that if x ∈ S(x̄, ε), y ∈ S(ȳ, δ) and g(x, y) = 0, then y = Φ(x). Furthermore, if p ≥ 1, then for all x ∈ S(x̄, ε)

Φ'(x) = −[∂_y g(x, Φ(x))]⁻¹ ∂_x g(x, Φ(x)).   (5)
Theorem 3.2: Let i e R n be a point such that V f (5) = 0. Assume that the matrix H(Z) + ( a H ( f ) / a z ) h is nonsingular for every heRn. Then a solution of (4) is (2, h) and h = 0. h = 0 is the unique solution of the system
[H(Z) + (aH(Z) /az )h]h = 0.
Furthermore if (2, h ) is a solution of (4) and d,G(i, h ) = H(5) + ( a H ( l ) / a z ) h is nonsin- gular, then for any z ~ S ( 5 , E ) the equation G(z, h) = 0 has an unique solution.
Proof: Since ∇f(z̄) = 0, (4) reduces to

[H(z̄) + (∂H(z̄)/∂z)h]h = 0.   (6)

Therefore h = 0 is a solution of (6). We now assume that ȳ ≠ 0 is a solution of (6). Since the matrix H(z̄) + (∂H(z̄)/∂z)h is nonsingular for all h, the system [H(z̄) + (∂H(z̄)/∂z)ȳ]x = 0 has the unique solution x = 0; hence there cannot exist ȳ ≠ 0 such that [H(z̄) + (∂H(z̄)/∂z)ȳ]ȳ = 0. By theorem 3.1 it follows that if (z̄, h̄) is a solution of (4) and ∂_h G(z̄, h̄) is nonsingular, then there exist Φ, S(z̄, ε), S(h̄, δ) such that Φ: S(z̄, ε) → S(h̄, δ), h̄ = Φ(z̄) and G(z, Φ(z)) = 0. Therefore for any z ∈ S(z̄, ε) the equation G(z, h) = 0 has the unique solution h = Φ(z) ∈ S(h̄, δ). □
Remark 3.1: Note that the solution of (4) for any fixed z may be obtained using the Newton direction s = −∂_h G(z, h)⁻¹ G(z, h), where ∂_h G(z, h) = H(z) + (∂H(z)/∂z)h. We have used this direction for our numerical results. We note that a possible choice for h₁, an initial approximation for solving (4), is h₁ = −H(z)⁻¹∇f(z), which represents an approximation to the solution of (4); or, if we consider equation (3) at z, then h₁ = −B⁻¹∇f(z), where B is some approximation of the Hessian matrix of f.
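The inner Newton iteration of remark 3.1 can be sketched on a scalar example, where (4) becomes a quadratic in h. The function f(x) = x⁴/4 − x and the point x = 1.2 below are arbitrary illustrative choices; h₁ = −f'(x)/f''(x) is the initial approximation suggested above.

```python
import numpy as np

# Scalar illustration: f(x) = x^4/4 - x, minimizer x* = 1
df  = lambda x: x**3 - 1      # f'(x)
d2f = lambda x: 3*x**2        # f''(x)
d3f = lambda x: 6*x           # f'''(x), playing the role of dH/dz

def higher_order_direction(x, tol=1e-12, maxit=50):
    """Solve G(x,h) = f'(x) + f''(x)h + (1/2)f'''(x)h^2 = 0 by Newton's
    method in h, started from h1 = -f'(x)/f''(x) as in remark 3.1."""
    h = -df(x) / d2f(x)
    for _ in range(maxit):
        G = df(x) + d2f(x)*h + 0.5*d3f(x)*h**2
        if abs(G) < tol:
            break
        h -= G / (d2f(x) + d3f(x)*h)   # d_h G = f''(x) + f'''(x)h
    return h

x = 1.2
h_newton = -df(x) / d2f(x)
h_cubic = higher_order_direction(x)
print(x + h_newton, x + h_cubic)
```

In this example the third-order step x + h lands closer to the minimizer x* = 1 than the plain Newton step, which illustrates the motivation of section 2.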
4. AN ALGORITHM FOR SOLVING PROBLEM (1)
Remark 4.1: Note that by the above discussion it follows that, if we consider the equation ∇f(z) = 0, then we can obtain the solution by the following process

z_{i+1} = z_i + h_i,   (7)

where h_i is a solution of the equation

G(z_i, h) = ∇f(z_i) + H(z_i)h + (1/2)(∂H(z_i)/∂z)hh = 0.

Note that process (7) is an extension of the Newton method. In fact we now prove the following theorem:
Theorem 4.1: Let ∇f(z̄) = 0 and h̄ = 0. Then there exists a neighborhood S'(z̄, ρ) of z̄ such that, if z₁ ∈ S'(z̄, ρ), the iterates given by (7) remain in S'(z̄, ρ) and converge to z̄; moreover the convergence is superlinear, hence

lim_{i→∞} |z_{i+1} − z̄| / |z_i − z̄| = 0.
Proof: By theorem 3.2 there exists Φ: S(z̄, ε) → S(h̄, δ) such that G(z, Φ(z)) = 0 and h = Φ(z). If we set Q(z) = z + Φ(z), we have Q(z̄) = z̄ + Φ(z̄) = z̄ + 0 = z̄; hence z̄ is a fixed point of Q(·). Moreover, by theorem 3.1 we obtain

Φ'(z̄) = −[∂_h G(z̄, h̄)]⁻¹ ∂_z G(z̄, h̄) = −I.

It follows that Q'(z̄) = I − I = 0. On the other hand, from process (7) it follows that z_{i+1} = z_i + Φ(z_i) = Q(z_i). The results follow from theorem 10.1.6 of Ortega-Rheinboldt, Ref. [5], p. 303. □

We now make use of the above discussion and present a new algorithm for solving problem (1). The algorithm makes use of derivatives of higher order.
ALGORITHM 4.1

Step 1: Select z₁ ∈ Rⁿ, B₁ ∈ Rⁿˣⁿ, B₁ᵏ ∈ Rⁿˣⁿ, k = 1, …, n, H₁ ∈ Rⁿˣⁿ.
Comment: Set g(·) = ∇f(·). Note that if we make use of (9), then we consider two sequences of matrices {B_i} and {W_i h}. B_i is some approximation to H(z_i), the Hessian matrix of f; W_i h (see remark 2.2) is some approximation to (∂H(z_i)/∂z)h, and the k-th row of W_i h is [B_iᵏ h]ᵀ, where B_iᵏ is some approximation to H_k(z_i), the Hessian matrix of g_k(·). In step 1, B₁ is an initialization of {B_i} and B₁ᵏ, k = 1, …, n, is, for every k, an initialization of {B_iᵏ}.
Step 2: Set i = 1.

Step 3: Compute ∇f(z_i).

Step 4: If ∇f(z_i) = 0 stop, else go to step 5.
Step 5: Compute a solution h̄_i, if one exists, of the following equation

∇f(z_i) + H(z_i)h + (1/2)(∂H(z_i)/∂z)hh = 0   (8)

or

∇f(z_i) + B_i h + (1/2)W_i hh = 0.   (9)

If (h̄_i, ∇f(z_i)) < 0 we set h_i = h̄_i − p_i∇f(z_i), where p_i ≥ 0 is to be defined.
Step 6: If equation (8) or (9) has no solution, or (h̄_i, ∇f(z_i)) ≥ 0, then we set h_i = −H_i∇f(z_i).
Comment: for our numerical results we make use of (9).

Step 7: Compute λ_i > 0 such that f(z_i + λ_i h_i) ≤ f(z_i) + σλ_i(h_i, ∇f(z_i)) (see remark 5.3).

Step 8: Set z_{i+1} = z_i + λ_i h_i.
Step 9: Compute B_{i+1}, H_{i+1}, B_{i+1}ᵏ, k = 1, …, n.
Comment: We note that the matrices B_{i+1} and B_{i+1}ᵏ are computed only if in step 5 we make use of (9).
Step 10: Set i = i + 1 and go to step 3.
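Steps 2-10 can be sketched end to end. The following is a minimal illustration, not the author's implementation: exact derivatives replace the approximations B_i and W_i h, the separable convex test function is an arbitrary choice, p_i is taken as 0, and step 7 uses a simple Armijo backtracking in place of Powell's rule (remark 5.3).

```python
import numpy as np

# Separable convex test function f(z) = sum_j [(z_j-1)^4/4 + (z_j-1)^2/2],
# minimizer z* = (1, ..., 1); all derivatives are available analytically.
f  = lambda z: np.sum((z - 1)**4 / 4 + (z - 1)**2 / 2)
g  = lambda z: (z - 1)**3 + (z - 1)          # gradient
H  = lambda z: np.diag(3*(z - 1)**2 + 1)     # Hessian
dH = lambda z, h: np.diag(6*(z - 1)*h)       # (dH(z)/dz)h

def direction(z):
    """Step 5: try to solve G(z,h) = 0 by Newton's method in h;
    fall back (step 6) to the Newton direction when it fails."""
    h = np.linalg.solve(H(z), -g(z))         # h1 of remark 3.1
    fallback = h.copy()
    for _ in range(30):
        G = g(z) + H(z) @ h + 0.5 * dH(z, h) @ h
        if np.linalg.norm(G) < 1e-12:
            return h
        J = H(z) + dH(z, h)                  # d_h G(z, h)
        if abs(np.linalg.det(J)) < 1e-12 or np.linalg.norm(h) > 1e3:
            break                            # inner iteration failed
        h = h + np.linalg.solve(J, -G)
    return fallback

z = np.array([-1.0, 3.0])
for i in range(50):                          # steps 2-10
    if np.linalg.norm(g(z)) < 1e-10:         # step 4
        break
    h = direction(z)
    if h @ g(z) >= 0:                        # not a descent direction
        h = -g(z)
    lam = 1.0                                # step 7: Armijo backtracking
    while f(z + lam*h) - f(z) > 1e-4 * lam * (h @ g(z)):
        lam *= 0.5
    z = z + lam*h                            # step 8
print(z, i)
```

Far from the minimizer, equation (8) has no real solution for this function and the step-6 fallback is taken; near the minimizer the cubic model has a root and the inner Newton iteration converges in a few steps.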
Theorem 4.2 (convergence): Consider algorithm 4.1 where we compute h_i from step 5 using equation (9), or from step 6. Assume that, if h̄_i exists, then (B_i + (1/2)W_i h̄_i) is nonsingular and |(B_i + (1/2)W_i h̄_i)⁻¹| < c₁. Further we assume H_i > 0 (positive definite) and |H_i| < c₂. Then either algorithm 4.1 constructs a finite sequence {z_i} whose last element is a critical point of f(·), or else the algorithm constructs an infinite sequence {z_i} and every accumulation point of {z_i} is a critical point of f(·).
Proof: The first part of the theorem is trivial, since the algorithm stops at a point z_i at which ∇f(z_i) = 0. For the second part of the theorem we need to show that

−(h_i, ∇f(z_i)) ≥ p(z_i)|h_i| |∇f(z_i)|,   (10)

where p: Rⁿ → R¹ is a continuous function and, if A is the set of critical points of f(·), then 0 < p(z) < 1 if z ∉ A and 0 ≤ p(z) ≤ 1 if z ∈ A. Condition (10) follows from sufficient condition (9) of theorem (8), Ref. [6], p. 46, where the constant p has been replaced by the function p(·) defined above. We only note that the conclusions of theorem (8) still remain valid. From (9) we have (in a similar way for (8)) h̄_i = −(B_i + (1/2)W_i h̄_i)⁻¹∇f(z_i), for which, from step 5,

h_i = −(B_i + (1/2)W_i h̄_i)⁻¹∇f(z_i) − p_i∇f(z_i)
    = −[(B_i + (1/2)W_i h̄_i)⁻¹ + p_i I]∇f(z_i).

It follows that

|h_i| ≤ |(B_i + (1/2)W_i h̄_i)⁻¹ + p_i I| |∇f(z_i)| ≤ (c₁ + p_i)|∇f(z_i)|.   (11)
On the other hand, for the direction of step 6 we have −(h_i, ∇f(z_i)) = (∇f(z_i), H_i∇f(z_i)) > 0, since H_i > 0. Further, |H_i| < c₂, hence

|h_i| ≤ c₂|∇f(z_i)|.   (12)

From (11) and (12) it follows that for every i

|h_i| ≤ (c₃ + p_i)|∇f(z_i)|,   (13)

where c₃ = max{c₁, c₂}. If we compare (13) and (10), then the result follows if we select a convenient p_i. An appropriate choice is p_i = const > 0 or p_i = const·|∇f(z_i)|, where const is a positive constant. □
We now prove that if λ_i is chosen by the Armijo rule, so that step 7 of algorithm 4.1 is replaced by

Step 7': Compute λ_i = βʲ, where β ∈ (0, 1) and j is the first nonnegative integer for which

f(z_i + βʲh_i) − f(z_i) ≤ σβʲ(h_i, ∇f(z_i)),

where σ ∈ (0, 1), then the sequence {z_i} converges superlinearly.
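Step 7' is the classical Armijo backtracking rule. A minimal sketch, assuming the arbitrary (but admissible) values β = 0.5 and σ = 10⁻⁴ and using problem 1 of section 5 with the steepest-descent direction:

```python
import numpy as np

def armijo(f, grad_f, z, h, beta=0.5, sigma=1e-4):
    """Step 7': lam = beta**j for the first nonnegative integer j with
    f(z + lam*h) - f(z) <= sigma*lam*(h, grad f(z))."""
    g = grad_f(z)
    lam = 1.0
    while f(z + lam*h) - f(z) > sigma * lam * (h @ g):
        lam *= beta
    return lam

# Rosenbrock (problem 1 of section 5), steepest-descent direction
f = lambda z: 100*(z[1] - z[0]**2)**2 + (1 - z[0])**2
grad = lambda z: np.array([-400*z[0]*(z[1] - z[0]**2) - 2*(1 - z[0]),
                           200*(z[1] - z[0]**2)])
z = np.array([-1.2, 1.0])
h = -grad(z)
lam = armijo(f, grad, z, h)
```

The loop always terminates when h is a descent direction, since the condition holds for all sufficiently small λ.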
Remark 4.2: Note that if z̄ is a point such that ∇f(z̄) = 0, then a solution of (9) corresponding to z̄ is h̄ = 0. It follows that, if z ∈ S(z̄, ε), then we can assume that equation (9) has a solution h̄ ∈ S(0, δ).

If we consider equation (8), remark 4.2 follows directly from theorem 3.2.
Remark 4.3: Let ∇f(z̄) = 0. Since for i > i₀, z_i ∈ S(z̄, ε), from remark 4.2 there exists h̄_i such that h̄_i = −(B_i + (1/2)W_i h̄_i)⁻¹∇f(z_i). Since |(B_i + (1/2)W_i h̄_i)⁻¹| < c₁, it follows that h̄_i → 0 as i → ∞, so we can assume that W_i h̄_i → 0. From the expression for h̄_i it then follows that, if B_i > 0 for every i, there exists i₁ such that for every i > i₁, (∇f(z_i), h̄_i) < 0. So the conditions of step 5 of algorithm 4.1 are verified.
Theorem 4.3 (superlinear convergence): Consider algorithm 4.1 where we compute h_i from step 5 making use of equation (9). Assume that z_i → z̄, ∇f(z̄) = 0, H(z̄) > 0 and that z_i ≠ z̄ for all i. Assume further that ∇f(z_i) ≠ 0 for all i, that p_i → 0 as i → ∞, and that

lim_{i→∞} |(H_i − H(z̄)⁻¹)∇f(z_i)| / |∇f(z_i)| = 0,

where H_i = B_i⁻¹. Then

lim_{i→∞} |z_{i+1} − z̄| / |z_i − z̄| = 0.
Proof: From remark 4.3 there exists i₁ such that for every i > i₁ algorithm 4.1 constructs a sequence of points

z_{i+1} = z_i + λ_i h_i = z_i − λ_i[(B_i + (1/2)W_i h̄_i)⁻¹ + p_i I]∇f(z_i)
        = z_i − λ_i D_i∇f(z_i),

where

D_i = (B_i + (1/2)W_i h̄_i)⁻¹ + p_i I.

The proof now follows making use of standard results (Bertsekas [1], proposition 1.15). □
5. NUMERICAL RESULTS
For our numerical results all the computations were done using double precision arithmetic and all gradients were obtained analytically. The termination criterion is |∇f| < ε, where ε = … for problems 1 and 2 and ε = … for problem 3. The details of the functions are given below.
1. f(z) = 100(z₂ − (z₁)²)² + (1 − z₁)²
   z_in = (−1.2, 1), z_opt = (1, 1)

2. f(z) = (z₁ + 10z₂)² + 5(z₃ − z₄)² + (z₂ − 2z₃)⁴ + 10(z₁ − z₄)⁴
   z_in = (3, −1, 0, 1), z_opt = (0, 0, 0, 0)

3. f(z) = 100(z₂ − (z₁)²)² + (1 − z₁)² + 90(z₄ − (z₃)²)² + (1 − z₃)² + 10.1(z₂ − 1)² + 10.1(z₄ − 1)² + 19.8(z₂ − 1)(z₄ − 1)
   z_in = (−3, −1, −3, −1), z_opt = (1, 1, 1, 1)
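The three test functions, transcribed directly from the formulas above (problem 1 is the Rosenbrock function, problem 2 Powell's singular function, problem 3 Wood's function, all standard test problems):

```python
def f1(z):  # problem 1 (Rosenbrock)
    return 100*(z[1] - z[0]**2)**2 + (1 - z[0])**2

def f2(z):  # problem 2 (Powell's singular function)
    return ((z[0] + 10*z[1])**2 + 5*(z[2] - z[3])**2
            + (z[1] - 2*z[2])**4 + 10*(z[0] - z[3])**4)

def f3(z):  # problem 3 (Wood's function)
    return (100*(z[1] - z[0]**2)**2 + (1 - z[0])**2
            + 90*(z[3] - z[2]**2)**2 + (1 - z[2])**2
            + 10.1*(z[1] - 1)**2 + 10.1*(z[3] - 1)**2
            + 19.8*(z[1] - 1)*(z[3] - 1))

print(f1([1, 1]), f2([0, 0, 0, 0]), f3([1, 1, 1, 1]))  # all zero
```

Each function is nonnegative and vanishes at its minimizer, which gives a quick consistency check on the transcription.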
Remark 5.1: Note that in steps 5 and 6 of algorithm 4.1 we set p_i = const > 0, where const = … for problem 1, … for problem 2 and … for problem 3.
Remark 5.2: We note that in step 9 of algorithm 4.1, when we compute B_{i+1}, B_{i+1}ᵏ and H_{i+1}, we make use of the following updating formulas (Powell [7]):

B_{i+1} = B_i − (B_iΔz_i)(B_iΔz_i)ᵀ/(Δz_i, B_iΔz_i) + q_i q_iᵀ/(Δz_i, q_i),

where q_i = θΔg_i + (1 − θ)B_iΔz_i, and θ = 1 if

(Δg_i, Δz_i) ≥ 0.2(Δz_i, B_iΔz_i),

else

θ = 0.8(Δz_i, B_iΔz_i)/((Δz_i, B_iΔz_i) − (Δz_i, Δg_i)).

In a similar way for H_{i+1}. For B_{i+1}ᵏ, k = 1, …, n, we set θ = 1, for which q_iᵏ = Δg_iᵏ, k = 1, …, n.
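A sketch of the update of remark 5.2, assuming the standard form of Powell's damped BFGS formula from Ref. [7] (the choice of θ matches the rule displayed above):

```python
import numpy as np

def damped_bfgs_update(B, dz, dg):
    """Powell's damped BFGS update: the BFGS formula applied with
    q = theta*dg + (1-theta)*B dz in place of dg, with theta chosen so
    that (dz, q) >= 0.2*(dz, B dz); this keeps B positive definite even
    when (dz, dg) <= 0."""
    Bdz = B @ dz
    zBz = dz @ Bdz
    if dz @ dg >= 0.2 * zBz:
        theta = 1.0
    else:
        theta = 0.8 * zBz / (zBz - dz @ dg)
    q = theta * dg + (1 - theta) * Bdz
    return B - np.outer(Bdz, Bdz) / zBz + np.outer(q, q) / (dz @ q)

# With (dz, dg) < 0 the undamped BFGS update would lose positive
# definiteness; the damped update does not.
B_new = damped_bfgs_update(np.eye(2), np.array([1.0, 0.0]),
                           np.array([-1.0, 0.0]))
print(np.linalg.eigvalsh(B_new))  # both eigenvalues positive
```

When θ = 1 the formula reduces to the ordinary BFGS update and satisfies the secant equation B_{i+1}Δz_i = Δg_i.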
Remark 5.3: For our numerical results, the step-length in step 7 of algorithm 4.1 is obtained making use of a method introduced by Powell in Ref. [7]. In this method we build a sequence {λ̄_i}, i ≥ 1, with λ̄₁ = 1. For i > 1 we set λ̄_i = max{δ·λ̄_{i−1}, λ̂}, where λ̂ minimizes the quadratic approximation to v(λ) = f(z_i + λh_i) and δ ∈ (0, 1). We set the step-length to λ̄_i if the condition v(λ̄_i) ≤ v(0) + σλ̄_i(h_i, ∇f(z_i)) is satisfied, with σ sufficiently small.
Remark 5.4: As noted in remark 3.1, for solving equation (9) we make use of a global method for systems of nonlinear equations (J. E. Dennis and R. B. Schnabel [4], section 6.5). Also for this method we make use of Powell's method of remark 5.3, with parameters δ₁, σ₁. If we set G(z_i, h) = ∇f(z_i) + B_i h + (1/2)W_i hh, then a solution to (9) is accepted if |G(z_i, h)| < ε₁.
Remark 5.5: Note that we obtain B_{i+1}, H_{i+1}, B_{i+1}ᵏ in step 9 of algorithm 4.1 making use of ∇f(·) and ∇g_k(·). The calculation of ∇f(·) and ∇g_k(·) can be performed in parallel. Thus in a context of parallel computation it is important to consider the number of evaluations of f(·) in step 7 and the number of simultaneous evaluations of ∇f(·) and ∇g_k(·), k = 1, …, n, in step 9. We indicate this number by ntot. In particular, if nf represents the number of evaluations of f(·) and n∇f the number of evaluations of ∇f(·), then ntot = nf + n∇f.
Remark 5.6: For our numerical results δ assumes the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6; σ assumes the value 0.1; δ₁ = 0.1; σ₁ = ε₁ = 3·10⁻³ for problem 1, ε₁ = … for problem 2 and … for problem 3. We have set the initial matrices B₁, H₁ and B₁ᵏ, k = 1, …, n, as follows: B₁ = H₁ = max{coef·f(z₁), 1}·I, where coef assumes the values 0, 1; while B₁ᵏ = diagw·I, k = 1, …, n, where diagw is a constant value.
Remark 5.7: We have considered the best ten results with respect to ntot when the parameters δ, σ, coef and diagw are changed; then we have considered the arithmetic mean, indicated by icont, of these results. So if ntot(i), i = 1, …, 10, represent the best results, we have icont = (Σᵢ₌₁¹⁰ ntot(i))/10.
The results are reported below. We only note that nF indicates that the method fails n times to terminate in a reasonable number of iterations, and that the standard method is obtained from algorithm 4.1 where steps 5 and 6 are replaced by

Step 5': Compute h_i = −B_i⁻¹∇f(z_i) = −H_i∇f(z_i),

while the step-length in step 7 is obtained as in remark 5.3.
Standard method
Problem 1. ntot = (96, 98, 100, 101, 101, 103, 103, 105, 111, 111)ᵀ, icont = 102.9, 8F
Problem 2. ntot = (94, 94, 101, 121, 125, 125, 135, 148, 158, 191)ᵀ, icont = 129.2, 7F

Problem 3. ntot = (129, 140, 141, 206, 230, 233, 282, 295, 300, 432)ᵀ, icont = 238.8, 14F
Our method
Problem 1. ntot = (37, 46, 68, 71, 74, 80, 83, 83, 88, 88)ᵀ, icont = 71.8, 0F

Problem 2. ntot = (74, 79, 82, 83, 83, 84, 85, 86, 86, 90)ᵀ, icont = 83.2, 0F

Problem 3. ntot = (71, 72, 122, 132, 134, 140, 149, 149, 160, 168)ᵀ, icont = 129.7, 5F
Remark 5.8: Note that our computational experience shows that the method described in this paper performs quite well. If we consider the parameters ntot and icont, so that we compare the numerical results in a context of parallel computation, then our method seems to be more reliable and robust than the standard method.
References
[1] Bertsekas, D. P. (1981) "Constrained Optimization and Lagrange Multiplier Methods", Academic Press, New York
[2] Corradi, G. (1992) An Algorithm for Unconstrained Optimization, Inter. J. Computer Math., 45, 123-131
[3] Corradi, G. (1994) A Note on a Method for Constrained Optimization Based on Recursive Quadratic Programming, Inter. J. Computer Math., 51, 173-180
[4] Dennis, J. E. and Schnabel, R. B. (1983) "Numerical Methods for Unconstrained Optimization and Nonlinear Equations", Prentice-Hall, Englewood Cliffs, New Jersey
[5] Ortega, J. M. and Rheinboldt, W. C. (1970) "Iterative Solution of Nonlinear Equations in Several Variables", Academic Press, New York
[6] Polak, E. (1971) "Computational Methods in Optimization: A Unified Approach", Academic Press, New York
[7] Powell, M. J. D. (1978) A Fast Algorithm for Nonlinearly Constrained Optimization Calculations, Lecture Notes in Mathematics (G. A. Watson, ed.), no. 630, Springer-Verlag, Berlin