“On Sizing and Shifting The BFGS Update Within The Sized-Broyden Family of Secant Updates”

Richard Tapia (Joint work with H. Yabe and H.J. Martinez)

Rice University

Department of Computational and Applied Mathematics

Center for Excellence and Equity in Education

Berkeley, California

September 23, 2005

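Preliminaries

The Problem: $\min_x f(x)$, where $f: \mathbb{R}^n \to \mathbb{R}$.

“Equivalently”: $\nabla f(x) = 0$.

Time Honored Work-Horse Methods

(Cauchy 1847) Gradient Method (steepest descent): $x_+ = x - \alpha \nabla f(x)$, $\alpha > 0$.

(1700’s) Newton’s Method: $x_+ = x + s$, where $s$ solves $\nabla^2 f(x)\, s = -\nabla f(x)$ (Newton Equation).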

Characteristics

Gradient Method:
Inexpensive
Good global properties
Slow local convergence

Newton’s Method:
Expensive per iteration, $O(n^3)$
Poor global properties, excellent local properties
Fast local convergence
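
The contrast between the two work-horse methods can be seen on a small example. The sketch below is only an illustration: the 2-D quadratic, the fixed step size `alpha`, and the tolerance are made-up values, and plain Python lists are used so that nothing beyond the standard library is needed.

```python
# Illustrative comparison on f(x) = 0.5*x^T A x - b^T x in two dimensions.
# A, b, alpha and tol are assumptions chosen for this sketch only.

A = [[10.0, 0.0], [0.0, 1.0]]   # Hessian (symmetric positive definite)
b = [1.0, 1.0]

def grad(x):
    """Gradient of the quadratic: A x - b."""
    return [A[0][0]*x[0] + A[0][1]*x[1] - b[0],
            A[1][0]*x[0] + A[1][1]*x[1] - b[1]]

def solve2(M, r):
    """Solve the 2x2 linear system M s = r by Cramer's rule."""
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [(r[0]*M[1][1] - M[0][1]*r[1]) / det,
            (M[0][0]*r[1] - r[0]*M[1][0]) / det]

def norm(v):
    return (v[0]**2 + v[1]**2) ** 0.5

def gradient_method(x, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Iterate x_+ = x - alpha * grad f(x); return (x, iterations used)."""
    for k in range(max_iter):
        g = grad(x)
        if norm(g) <= tol:
            return x, k
        x = [x[0] - alpha*g[0], x[1] - alpha*g[1]]
    return x, max_iter

def newton_method(x, tol=1e-8, max_iter=50):
    """Solve Hessian * s = -grad f(x), then set x_+ = x + s."""
    for k in range(max_iter):
        g = grad(x)
        if norm(g) <= tol:
            return x, k
        s = solve2(A, [-g[0], -g[1]])
        x = [x[0] + s[0], x[1] + s[1]]
    return x, max_iter

x_gd, it_gd = gradient_method([0.0, 0.0])
x_nt, it_nt = newton_method([0.0, 0.0])
print("gradient method iterations:", it_gd)
print("Newton iterations:", it_nt)
```

On a quadratic, Newton's method needs a single linear solve, while the fixed-step gradient method needs many cheap iterations. For general n the linear solve is the $O(n^3)$ cost listed above.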

The Middle Ground and The Algorithm of Interest

Secant Methods: $x_+ = x + s$, where $B s = -\nabla f(x)$.

Secant equation: $B_+ s = y$, where $y = \nabla f(x + s) - \nabla f(x)$ ($B_+$ is the new approximation).

Remark: We view $B$ as an approximation of $\nabla^2 f(x)$.

Characteristics: Similar properties to Newton’s Method, but not as expensive per iteration: $O(n^2)$.

History/Chronology

In one dimension (n = 1) the secant equation uniquely gives $B_+ = y/s = \dfrac{f'(x+s) - f'(x)}{s}$ as an approximation to $f''(x)$.

The 1-dimension 2-point secant method was probably discovered in the middle of the 18th century. It is extremely effective and efficient and has a convergence rate of $(1+\sqrt{5})/2 \approx 1.618$ (Golden mean).
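
A minimal sketch of the one-dimensional 2-point secant method, applied to the stationarity condition $f'(x) = 0$. The test function $f(x) = x^4/4 - x$ (so $f'(x) = x^3 - 1$, minimizer $x = 1$), the two starting points, and the name `secant_minimize` are assumptions made for illustration.

```python
def secant_minimize(df, x_prev, x, tol=1e-12, max_iter=50):
    """Two-point secant iteration for df(x) = 0 in one dimension."""
    g_prev, g = df(x_prev), df(x)
    for _ in range(max_iter):
        # Stop on convergence, or when the secant slope cannot be formed.
        if abs(g) <= tol or x == x_prev or g == g_prev:
            break
        B_plus = (g - g_prev) / (x - x_prev)   # secant approximation to f''(x)
        x_prev, g_prev = x, g
        x = x - g / B_plus                     # Newton-like step with B_plus in place of f''
        g = df(x)
    return x

df = lambda x: x**3 - 1.0          # derivative of the assumed f(x) = x**4/4 - x
x_star = secant_minimize(df, 2.0, 1.5)
print(x_star)
```

Each iteration costs one new derivative evaluation, which is the source of the method's efficiency relative to Newton's method in one dimension.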

Gauss formulated a 3-point secant method for two dimensions

There was considerable research activity on (n+1)-point secant methods in the 1960’s. While these methods had good theoretical properties they were numerical failures. The iterates tend to cluster in a lower dimensional manifold, and lead to linear systems that are ill-conditioned and nearly singular. These (n+1)-point secant methods have been discarded.

The New Generation of Secant Methods (Variable Metric or Quasi-Newton Methods)

DFP (Davidon-Fletcher-Powell): Davidon (1958), Fletcher-Powell (1963)

DFP was the work-horse secant method from 1963-1970 in spite of the serious numerical flaw that the diagonal of the approximating matrices approached zero (excessively small eigenvalues). This required restarts using the identity as a Hessian approximation.

BFGS (1970) (Broyden-Fletcher-Goldfarb-Shanno)

A new secant update that does not generate excessively small eigenvalues.
BFGS has become the secant method of choice based on numerical performance.
In some cases BFGS is not effective and generates approximations with excessively large eigenvalues.

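Broyden Family of Secant Updates (1970)

Write $B_+^{BFGS}(B, s, y) = B - \dfrac{B s s^T B}{s^T B s} + \dfrac{y y^T}{s^T y}$.

Broyden Family: $B_+(\phi) = B_+^{BFGS}(B, s, y) + \phi\, v v^T$, where $\phi \in \mathbb{R}$ is a parameter and $v = \sqrt{s^T B s}\left[\dfrac{y}{s^T y} - \dfrac{B s}{s^T B s}\right]$.

$\phi = 1$: DFP (1963), promotes small eigenvalues.
$\phi = 0$: BFGS (1970), may promote large eigenvalues.
Convex class: $0 \le \phi \le 1$. Preconvex class: $\phi < 0$.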
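
The Broyden family can be sketched in a few lines. The code below assumes the textbook form $B_+(\phi) = B - \frac{Bss^TB}{s^TBs} + \frac{yy^T}{s^Ty} + \phi\, vv^T$ with $v = \sqrt{s^TBs}\,[\,y/s^Ty - Bs/s^TBs\,]$; that normalisation of $v$, the function names, and the sample data are assumptions, not taken from the slides. It checks the secant equation for several $\phi$ and compares $\phi = 1$ with DFP written out directly.

```python
# Sketch of the Broyden family in plain Python (no external packages).
# The normalisation of v is the textbook one and is an assumption here.

def matvec(M, v):
    return [sum(m*x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

def broyden_update(B, s, y, phi=0.0):
    """Return BFGS(B, s, y) + phi * v v^T; phi=0 is BFGS, phi=1 is DFP."""
    n = len(s)
    Bs = matvec(B, s)
    sBs, sy = dot(s, Bs), dot(s, y)
    if sy <= 0.0 or sBs <= 0.0:
        raise ValueError("need s^T y > 0 and s^T B s > 0")
    v = [sBs**0.5 * (y[i]/sy - Bs[i]/sBs) for i in range(n)]
    return [[B[i][j] - Bs[i]*Bs[j]/sBs + y[i]*y[j]/sy + phi*v[i]*v[j]
             for j in range(n)] for i in range(n)]

def dfp_update(B, s, y):
    """DFP written out directly (B symmetric), for comparison with phi = 1."""
    n = len(s)
    Bs = matvec(B, s)
    sBs, sy = dot(s, Bs), dot(s, y)
    return [[B[i][j] - (Bs[i]*y[j] + y[i]*Bs[j])/sy
             + (1.0 + sBs/sy)*y[i]*y[j]/sy
             for j in range(n)] for i in range(n)]

B = [[2.0, 0.0], [0.0, 1.0]]      # made-up current approximation
s, y = [1.0, 1.0], [1.0, 2.0]     # made-up step and gradient difference

residuals = {}
for phi in (-0.5, 0.0, 0.5, 1.0):
    B_plus = broyden_update(B, s, y, phi)
    # Residual of the secant equation B_+ s = y; it should vanish for every phi.
    residuals[phi] = [r - t for r, t in zip(matvec(B_plus, s), y)]
    print(phi, residuals[phi])

B_dfp = dfp_update(B, s, y)
B_phi1 = broyden_update(B, s, y, 1.0)
```

Because $v^T s = 0$, the term $\phi\, vv^T$ never disturbs the secant equation; $\phi$ only changes how the update behaves in directions other than $s$.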

Two Interesting Research Ideas That We Build On

John Dennis (1972): Notion of least change secant update.
Choose $B_+$ in the Broyden class so that $B_+$ is closest to $B$ in a weighted Frobenius norm. In this case we can explain BFGS and DFP.

Oren-Luenberger (1974) (SSVM): Size the matrix $B$ before updating.

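Terminology

Def:
(i) $A$ and $B$ are said to be relatively sized if Spectrum($A$) and Spectrum($B$) overlap, i.e. the intervals they span intersect.
(ii) $\theta \in \mathbb{R}$ sizes $B$ relative to $A$ if $\theta B$ and $A$ are relatively sized.

Proposition: $\theta$ sizes $B$ relative to $A$ if and only if there exist nonzero $u$, $v$ satisfying $\theta = \dfrac{u^T A u / u^T u}{v^T B v / v^T v}$.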

Corollary: For any $u \neq 0$, $\theta = \dfrac{u^T A u}{u^T B u}$ sizes $B$ relative to $A$.

Def: $\theta$ sizes $B$ relative to the Hessian of $f$ if there exists $x$ such that $\theta$ sizes $B$ relative to $\nabla^2 f(x)$.

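Historical Background on Sizing

1974 Oren-Luenberger (SSVM): size at each iteration with $\theta_{OL} = \dfrac{s^T y}{s^T B s}$.

Proposition: $\theta_{OL}$ sizes $B$ relative to the Hessian of $f$.

Proof: $s^T y = s^T[\nabla f(x+s) - \nabla f(x)] = \int_0^1 s^T \nabla^2 f(x + t s)\, s \, dt = s^T \nabla^2 f(x + \bar{t} s)\, s$ for some $\bar{t} \in [0, 1]$, so $\theta_{OL} = \dfrac{s^T \nabla^2 f(x + \bar{t} s)\, s}{s^T B s}$ is a ratio of the kind covered by the Corollary.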
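
A small numerical sketch of what the Oren-Luenberger factor does, taking $\theta_{OL} = s^T y / s^T B s$. The quadratic with Hessian `A`, the oversized approximation `B`, and the step `s` are made-up values; for a quadratic, $y = A s$ exactly.

```python
def matvec(M, v):
    return [sum(m*x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

A = [[1.0, 0.0], [0.0, 4.0]]        # Hessian of a quadratic; eigenvalues 1 and 4
B = [[100.0, 0.0], [0.0, 100.0]]    # badly oversized approximation
s = [1.0, 2.0]
y = matvec(A, s)                    # gradient difference along s for a quadratic

theta_ol = dot(s, y) / dot(s, matvec(B, s))
sized_B = [[theta_ol*entry for entry in row] for row in B]

# Rayleigh quotients along s for the Hessian and for the sized approximation.
rq_hessian = dot(s, matvec(A, s)) / dot(s, s)
rq_sized = dot(s, matvec(sized_B, s)) / dot(s, s)
print(theta_ol, rq_hessian, rq_sized)
```

After sizing, the two Rayleigh quotients along $s$ agree, so the sized matrix has curvature along $s$ that lies between the smallest and largest Hessian eigenvalues.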

1978 Shanno-Phua

Observation: The secant equation implies $s^T B_+ s = s^T y$, i.e. $\dfrac{s^T y}{s^T B_+ s} = 1$. Therefore all secant updates are sized relative to the Hessian of $f$.

Suggestion: Size only the initial approximation in the BFGS secant method and do so using $\theta_{OL}$.

Question? Effectiveness of $\theta_{OL}$.

Effective sizing strategy: Initial approximation only? All approximations? Selective approximations?

M. Contreras and R. Tapia (1993), “Sizing The DFP and BFGS Updates: A Numerical Study”

Proposition: If the secant method converges q-superlinearly, then $\theta_{OL}$ converges to one.

Selective sizing: size if $\theta_{OL}$ lies outside a prescribed interval about 1 (set by two positive parameters).

Contreras-Tapia Findings

The DFP update loves to be sized by $\theta_{OL}$.
Sizing at every iteration is only slightly inferior to selective sizing.
Without sizing, DFP is vastly inferior to BFGS. With selective sizing, DFP is competitive with a selectively sized BFGS.
When sizing is working, $\theta_{OL}$ converges very nicely to one.
Selective sizing for BFGS is best; sizing at each iteration is not good. BFGS does not like to be sized.
$\theta_{OL}$ is not a good fit with BFGS. It tends to size too much, especially for large dimensional problems.

New Research: Yabe-Martinez-Tapia (2004)

Premise: For BFGS, especially in higher dimensions, $B$ often has large eigenvalues (indeed by design) and this tends to give large Rayleigh quotients $s^T B s / s^T s$.

Hence $\theta_{OL} = s^T y / s^T B s$ is small and this in turn moves the sized matrix $\theta_{OL} B$ in the direction of singularity.

Follow sizing with a shift within the Broyden class to compensate for near singularity.

Sized Broyden class: $B_+(\theta, \phi) = B_+^{BFGS}(\theta B, s, y) + \phi\, v v^T$; set $\theta = \theta_{OL}$ and then find the best $\phi$.

Byrd-Nocedal (1989)

A General Measure of Goodness: $\omega(A) = \mathrm{trace}(A) - \ln(\det(A))$.

Proposition: The measure $\omega$ is globally and uniquely minimized by $A = I$ over the class of symmetric positive definite matrices.

Size and Shift Approach

Consider choices of the parameters $\theta$ and $\phi$ determined from the minimization problem

$\min_{\theta, \phi}\ \omega\left(D^{-1/2} B_+ D^{-1/2}\right)$, where $B_+ = B_+^{BFGS}(\theta B, s, y) + \phi\, v v^T$

and $D$ is a symmetric positive definite weighting matrix.

Observe that $B_+ = D$ solves this problem if $B_+$ is not restricted to the sized Broyden class.
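
The measure can be evaluated directly. The sketch below takes $\omega(A) = \mathrm{trace}(A) - \ln\det(A)$ and, for the weighted version, assumes the scaling $D^{-1/2} B_+ D^{-1/2}$. Since trace and determinant are unchanged by similarity, $\omega(D^{-1/2} B_+ D^{-1/2}) = \mathrm{trace}(D^{-1} B_+) - \ln(\det B_+ / \det D)$, so no matrix square root is needed. The 2x2 restriction, the function names, and the sample matrices are assumptions made to keep the example short.

```python
import math

def omega(A):
    """omega(A) = trace(A) - ln(det(A)) for a 2x2 symmetric positive definite A."""
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return A[0][0] + A[1][1] - math.log(det)

def omega_weighted(Bp, D):
    """omega(D^{-1/2} Bp D^{-1/2}) computed as trace(D^{-1} Bp) - ln(det Bp / det D)."""
    det_D = D[0][0]*D[1][1] - D[0][1]*D[1][0]
    det_B = Bp[0][0]*Bp[1][1] - Bp[0][1]*Bp[1][0]
    D_inv = [[D[1][1]/det_D, -D[0][1]/det_D],
             [-D[1][0]/det_D, D[0][0]/det_D]]
    trace = sum(D_inv[i][k]*Bp[k][i] for i in range(2) for k in range(2))
    return trace - math.log(det_B/det_D)

I2 = [[1.0, 0.0], [0.0, 1.0]]
samples = [[[2.0, 0.0], [0.0, 0.5]],
           [[1.0, 0.3], [0.3, 1.0]],
           [[3.0, 1.0], [1.0, 2.0]]]   # made-up positive definite matrices
D = [[4.0, 1.0], [1.0, 3.0]]           # made-up weighting matrix

print("omega(I) =", omega(I2))
print("omega(samples) =", [omega(A) for A in samples])
print("weighted, B_+ = D:", omega_weighted(D, D))
print("weighted, B_+ != D:", omega_weighted(samples[2], D))
```

In terms of the eigenvalues $\lambda_i$ of $A$, $\omega(A) = \sum_i (\lambda_i - \ln \lambda_i)$, and each term is smallest at $\lambda_i = 1$. This is why the identity is the unique minimizer, and why the weighted measure is smallest when $B_+ = D$.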

Obvious choices for D

$D = I$: Obtain member of sized Broyden class closest to the identity (gradient flavored).
$D = B$: Obtain member of sized Broyden class closest to $B$ (least-change secant flavored).
$D = \nabla^2 f(x)$: Obtain member of sized Broyden class closest to the Hessian (Newton flavored).

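Three Optimization Problems

I. Given $\theta$, find $\phi$ as solution of $\min_{\phi}\ \omega(D^{-1/2} B_+ D^{-1/2})$.
II. Given $\phi$, find $\theta$ as solution of $\min_{\theta}\ \omega(D^{-1/2} B_+ D^{-1/2})$.
III. Find $\theta$ and $\phi$ as solution of $\min_{\theta, \phi}\ \omega(D^{-1/2} B_+ D^{-1/2})$.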

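Solutions

Problem I: Given $\theta$, solve for $\phi^*$ (the closed-form expression involves $s^T B s$ and $y^T B^{-1} y$).
Observation: For $D = B$, $\theta = 1$ implies $\phi^* = 0$.
Interpretation: In least change sense $\theta = 1$ (no sizing) implies $\phi^* = 0$ (BFGS).

Problem II: Given $\phi$, solve for $\theta^*$ (the closed-form expression involves $s^T B s$, $y^T B^{-1} y$, $s^T B D^{-1} B s$, and $\mathrm{trace}(D^{-1/2} B D^{-1/2})$).
Hence $\phi = 0$ implies $\theta^* = 1$.
Interpretation: In least-change sense BFGS should not be sized.

Problem III: Find both $\theta$ and $\phi$ from the minimization problem.
Interpretation: In least change sense BFGS with no sizing is best.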
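
The least-change conclusions can be checked numerically on a toy example. The sketch below makes several assumptions that are not taken from the slides: the sized Broyden update is taken to be the BFGS update of $\theta B$ plus $\phi\, vv^T$ with $v = \sqrt{s^T(\theta B)s}\,[\,y/s^Ty - Bs/s^TBs\,]$, the measure is $\omega(D^{-1/2} B_+ D^{-1/2}) = \mathrm{trace}(D^{-1}B_+) - \ln(\det B_+/\det D)$ with $D = B$, and the data and grids are made up. It is a grid search, not the closed-form solution.

```python
import math

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

def det(M):
    return M[0][0]*M[1][1] - M[0][1]*M[1][0]

def sized_broyden(B, s, y, theta, phi):
    """BFGS update of theta*B plus phi*v*v^T (v as assumed in the lead-in)."""
    Bs = [theta*c for c in matvec(B, s)]          # (theta*B) s
    sBs, sy = dot(s, Bs), dot(s, y)
    v = [math.sqrt(sBs)*(y[i]/sy - Bs[i]/sBs) for i in range(2)]
    return [[theta*B[i][j] - Bs[i]*Bs[j]/sBs + y[i]*y[j]/sy + phi*v[i]*v[j]
             for j in range(2)] for i in range(2)]

def omega_rel(Bp, D):
    """trace(D^{-1} Bp) - ln(det Bp / det D) for 2x2 positive definite matrices."""
    d = det(D)
    D_inv = [[D[1][1]/d, -D[0][1]/d], [-D[1][0]/d, D[0][0]/d]]
    trace = sum(D_inv[i][k]*Bp[k][i] for i in range(2) for k in range(2))
    return trace - math.log(det(Bp)/d)

B = [[2.0, 0.0], [0.0, 1.0]]       # made-up current approximation
s, y = [1.0, 1.0], [1.0, 2.0]      # made-up step and gradient difference

phis = [k/20 for k in range(-20, 41)]     # phi grid on [-1, 2]
thetas = [k/20 for k in range(2, 61)]     # theta grid on [0.1, 3]

# Problem I with D = B and theta = 1: search over phi.
best_phi = min(phis, key=lambda p: omega_rel(sized_broyden(B, s, y, 1.0, p), B))
# Problem II with D = B and phi = 0: search over theta.
best_theta = min(thetas, key=lambda t: omega_rel(sized_broyden(B, s, y, t, 0.0), B))
print(best_phi, best_theta)
```

On this example the grid minimizers are $\phi = 0$ for $\theta = 1$ and $\theta = 1$ for $\phi = 0$, which agrees with the interpretation that, in the least-change sense, unsized BFGS is best.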

Numerical Experimentation

Selectively size BFGS using $\theta_{OL}$.
Shift using the solution $\phi^*$ obtained with $D = I$ (gradient flavored), $D = B$ (least-change flavored), and $D = \nabla^2 f(x)$ (Newton flavored).

Surprise

The winner is $D = I$ (gradient flavored).

Comment: There is consistency in this choice. Our sizing indicator has told us that we should size; hence BFGS is probably not best and we should shift. Either $B$ is bad, $\nabla^2 f(x)$ is bad, or there is a bad match between the two. Therefore least change $D = B$ may be dangerous and Newton $D = \nabla^2 f(x)$ may be dangerous. The choice $D = I$ prevents this faulty information from further contaminating the update; i.e. we use the member of the Broyden class which is closest to steepest descent.
