NonOpt: Non(-linear/-smooth/-convex) Optimizer
Frank E. Curtis, Lehigh University
presented at
International Conference on Continuous Optimization
Berlin, Germany
August 5, 2019
References
- F. E. Curtis and X. Que.
  An Adaptive Gradient Sampling Algorithm for Nonsmooth Optimization.
  Optimization Methods and Software, 28(6):1302–1324, 2013.
- F. E. Curtis and X. Que.
  A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees.
  Mathematical Programming Computation, 7(4):399–428, 2015.
- F. E. Curtis, D. P. Robinson, and B. Zhou.
  A Self-Correcting Variable-Metric Algorithm Framework for Nonsmooth Optimization.
  IMA Journal of Numerical Analysis, 10.1093/imanum/drz008, 2019.
Outline
Motivation
Algorithm
NonOpt
Summary
Why?!
Do we need complicated “non” optimization software?
- Many problems are nonlinear, nonsmooth, and nonconvex,
- ... but people say these can be solved with simple algorithms.
For example, the trend is to analyze/use:
- the subgradient method
- BFGS with a line search
Nonsmooth Rosenbrock
  f(x) = 8 |x_1^2 − x_2| + (1 − x_1)^2
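For reference, a minimal sketch of this objective and one subgradient choice (hand-derived; the handling of the kink is an arbitrary but valid selection, not taken from the slides):

#include <array>
#include <cmath>

// Nonsmooth Rosenbrock: f(x) = 8|x_1^2 - x_2| + (1 - x_1)^2.
double f(const std::array<double, 2>& x) {
  return 8.0 * std::fabs(x[0] * x[0] - x[1]) + (1.0 - x[0]) * (1.0 - x[0]);
}

// One subgradient of f at x: equals the gradient wherever r = x_1^2 - x_2 != 0,
// and picks sign(r) = 1 arbitrarily on the kink r = 0.
std::array<double, 2> subgradient(const std::array<double, 2>& x) {
  const double r = x[0] * x[0] - x[1];
  const double sgn = (r < 0.0) ? -1.0 : 1.0;
  return {16.0 * sgn * x[0] - 2.0 * (1.0 - x[0]), -8.0 * sgn};
}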
NonOpt
NonOpt is an open-source C++ software package to solve

  min_{x ∈ R^n} f(x),

where f : R^n → R is
- locally Lipschitz,
- continuously differentiable over a full-measure subset of R^n,
- (potentially) nonsmooth, and
- (potentially) nonconvex.†
The code is designed to be extensible.
- Ideas borrowed from Ipopt. (Thanks, Andreas!)
- Additional features (e.g., constraints) forthcoming.
† Theory requires f to be weakly semismooth.
Central tenets of NonOpt
Quasi-Newton methods are surprisingly effective for “non” optimization.
- Self-correcting BFGS update
- Offers exactly the inequalities needed for convergence
- Careful radii updates; Curtis, Robinson, & Zhou, 2019
Quasi-Newton must be guided with cutting planes and/or gradient sampling.
- Point sets are critical for nonsmooth optimization.
- QP subproblems need to be solved.
- IMPORTANT: Specialized QP solvers, gradient aggregation, and inexact subproblem solutions mean that the added per-iteration cost can be negligible compared to “simple” algorithms.
Search direction computation
At x_k ∈ R^n, a smooth optimization algorithm computes d_k ← x_k^* − x_k, where

  x_k^* ∈ arg min_{x ∈ R^n}  f(x_k) + ∇f(x_k)^T (x − x_k) + (1/2)(x − x_k)^T H_k (x − x_k)
          s.t. ‖x − x_k‖ ≤ δ_k.

For nonsmooth f, with sets of points, scalars, and (sub)gradients

  {x_{k,j}}_{j=1}^m,  {f_{k,j}}_{j=1}^m,  and  {g_{k,j}}_{j=1}^m,

NonOpt solves the primal subproblem

  min_{x ∈ R^n}  ( max_{j ∈ {1,...,m}} { f_{k,j} + g_{k,j}^T (x − x_{k,j}) } + (1/2)(x − x_k)^T H_k (x − x_k) )
  s.t. ‖x − x_k‖ ≤ δ_k.    (P)
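To make (P) concrete, here is a minimal sketch (not NonOpt's internal code) of evaluating the objective of (P) at a trial point x; H_k is taken to be the identity for brevity, and all names are illustrative.

#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Objective of subproblem (P) at a trial point x: the piecewise-linear model
// max_j { f_{k,j} + g_{k,j}^T (x - x_{k,j}) } plus the quadratic term
// (1/2)(x - x_k)^T H_k (x - x_k), with H_k = I in this sketch.
double model_value(const std::vector<double>& x, const std::vector<double>& xk,
                   const std::vector<std::vector<double>>& points,      // x_{k,j}
                   const std::vector<double>& values,                   // f_{k,j}
                   const std::vector<std::vector<double>>& gradients) { // g_{k,j}
  const std::size_t n = x.size();
  double max_term = -std::numeric_limits<double>::infinity();
  for (std::size_t j = 0; j < points.size(); ++j) {
    double term = values[j];
    for (std::size_t i = 0; i < n; ++i)
      term += gradients[j][i] * (x[i] - points[j][i]);
    max_term = std::max(max_term, term);
  }
  double quad = 0.0;  // (1/2)||x - x_k||_2^2 since H_k = I here
  for (std::size_t i = 0; i < n; ++i)
    quad += 0.5 * (x[i] - xk[i]) * (x[i] - xk[i]);
  return max_term + quad;
}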
Examples
Multiple types of algorithms involve subproblems of the form (P).
Bundle methods: (Schramm & Zowe, 1992)
- {x_{k,j}} from previous/current iterates and trial points
- {f_{k,j}} with f_{k,j} = min{ f(x_{k,j}), f(x_k) + g_{k,j}^T (x_{k,j} − x_k) − c ‖x_{k,j} − x_k‖_2^2 }
- {g_{k,j}} with g_{k,j} ∈ ∂f(x_{k,j})
Gradient sampling methods: (Burke, Lewis, & Overton, 2005)
- {x_{k,j}} from the current iterate and randomly sampled points
- {f_{k,j}} with f_{k,j} = f(x_k)
- {g_{k,j}} with g_{k,j} = ∇f(x_{k,j})
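For instance, a minimal sketch of the bundle-style values f_{k,j} above (the downshift constant c and all names are illustrative):

#include <algorithm>
#include <cstddef>
#include <vector>

// Bundle-style value for point x_{k,j}: the true objective value, capped by a
// linearization at x_k that is "downshifted" by c * ||x_{k,j} - x_k||_2^2.
double bundle_value(double f_at_xkj, double f_at_xk,
                    const std::vector<double>& g_kj,
                    const std::vector<double>& x_kj,
                    const std::vector<double>& x_k, double c) {
  double lin = f_at_xk, dist_sq = 0.0;
  for (std::size_t i = 0; i < x_k.size(); ++i) {
    lin += g_kj[i] * (x_kj[i] - x_k[i]);
    dist_sq += (x_kj[i] - x_k[i]) * (x_kj[i] - x_k[i]);
  }
  return std::min(f_at_xkj, lin - c * dist_sq);
}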
Dual subproblem
The primal subproblem (P) is itself nonsmooth.
With G_k ← [g_{k,1} ··· g_{k,m}], it is typically more efficient to solve the dual

  sup_{(ω,γ) ∈ R^m_+ × R^n}  −(1/2)(G_k ω + γ)^T W_k (G_k ω + γ) + b_k^T ω − δ_k ‖γ‖_*
  s.t. 1_m^T ω = 1.    (D)

The primal solution can then be recovered by

  x_k^* ← x_k − W_k (G_k ω_k + γ_k),  where  g_k := G_k ω_k + γ_k.

NonOpt has a specialized active-set QP solver for (D) based on Kiwiel, 1985.
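A minimal sketch of that primal-recovery step (illustrative names; W_k applied as a dense matrix):

#include <cstddef>
#include <vector>

// Recover the primal solution x_k^* = x_k - W_k (G_k * omega + gamma) from a
// dual solution (omega, gamma). Columns of G are the (sub)gradients g_{k,j}.
std::vector<double> recover_primal(const std::vector<double>& xk,
                                   const std::vector<std::vector<double>>& W, // n x n
                                   const std::vector<std::vector<double>>& G, // n x m
                                   const std::vector<double>& omega,          // m
                                   const std::vector<double>& gamma) {        // n
  const std::size_t n = xk.size(), m = omega.size();
  std::vector<double> g(n, 0.0);  // g_k = G_k * omega + gamma
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < m; ++j) g[i] += G[i][j] * omega[j];
    g[i] += gamma[i];
  }
  std::vector<double> x_star(n);  // x_k - W_k * g_k
  for (std::size_t i = 0; i < n; ++i) {
    x_star[i] = xk[i];
    for (std::size_t l = 0; l < n; ++l) x_star[i] -= W[i][l] * g[l];
  }
  return x_star;
}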
Self-correcting properties of BFGS updates with {(s_k, v_k)}

Theorem 1 (Byrd & Nocedal, 1989)
Suppose that, for all k, there exist {η, θ} ⊂ R_{++} such that

  η ≤ (s_k^T v_k)/‖s_k‖_2^2  and  ‖v_k‖_2^2/(s_k^T v_k) ≤ θ.    (KEY)

Then, for any p ∈ (0, 1), there exist constants {ι, κ, λ} ⊂ R_{++} such that, for any K ≥ 2, the following relations hold for at least ⌈pK⌉ values of k ∈ {1, ..., K}:

  ι ≤ (s_k^T H_k s_k)/(‖s_k‖_2 ‖H_k s_k‖_2)  and  κ ≤ ‖H_k s_k‖_2/‖s_k‖_2 ≤ λ.

Corollary 2
Suppose the conditions of Theorem 1 hold. Then, for any p ∈ (0, 1), there exist constants {μ, ν} ⊂ R_{++} such that, for any K ≥ 2, the following relations hold for at least ⌈pK⌉ values of k ∈ {1, ..., K}:

  μ ‖g_k‖_2^2 ≤ g_k^T W_k g_k  and  ‖W_k g_k‖_2^2 ≤ ν ‖g_k‖_2^2.
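Numerically, (KEY) is a cheap test on a candidate pair; a minimal sketch (illustrative names, written to avoid divisions):

#include <cstddef>
#include <vector>

// Inner product helper.
static double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// True if (s, v) satisfies (KEY):
//   eta <= s^T v / ||s||_2^2   and   ||v||_2^2 / s^T v <= theta.
bool satisfies_key(const std::vector<double>& s, const std::vector<double>& v,
                   double eta, double theta) {
  const double sv = dot(s, v), ss = dot(s, s), vv = dot(v, v);
  return sv > 0.0 && sv >= eta * ss && vv <= theta * sv;
}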
Algorithm 1 NonOpt Algorithm Framework
1: Choose x_1 ∈ R^n.
2: Choose a symmetric positive definite W_1 ∈ R^{n×n}.
3: Choose α ∈ (0, 1).
4: for all k ∈ N := {1, 2, ...} do
5:   Solve (P)–(D) such that setting
       G_k ← [g_{k,1} ··· g_{k,m}],  s_k ← −W_k (G_k ω_k + γ_k),  and  x_{k+1} ← x_k + s_k
6:   yields (potentially after a line search)
       f(x_{k+1}) ≤ f(x_k) − (1/2) α (G_k ω_k + γ_k)^T W_k (G_k ω_k + γ_k).
7:   Choose y_k ∈ R^n.
8:   Set β_k ← min{β ∈ [0, 1] : v(β) := β s_k + (1 − β) y_k satisfies (KEY)}.
9:   Set v_k ← v(β_k).
10:  Set
       W_{k+1} ← (I − (v_k s_k^T)/(s_k^T v_k))^T W_k (I − (v_k s_k^T)/(s_k^T v_k)) + (s_k s_k^T)/(s_k^T v_k).
11: end for
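A minimal sketch of steps 8–10 under stated simplifications: a coarse grid search stands in for however β_k is actually computed, W_k is stored as a dense matrix, and all names are illustrative.

#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;  // dense, row-major

static double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Steps 8-9: smallest beta on a coarse grid such that
// v(beta) = beta*s + (1-beta)*y satisfies (KEY).
Vec damped_pair(const Vec& s, const Vec& y, double eta, double theta) {
  const int grid = 100;
  for (int t = 0; t <= grid; ++t) {
    const double beta = static_cast<double>(t) / grid;
    Vec v(s.size());
    for (std::size_t i = 0; i < s.size(); ++i)
      v[i] = beta * s[i] + (1.0 - beta) * y[i];
    const double sv = dot(s, v);
    if (sv > 0.0 && sv >= eta * dot(s, s) && dot(v, v) <= theta * sv) return v;
  }
  return s;  // fallback: no grid point satisfied (KEY)
}

// Step 10: W <- (I - v s^T/s^T v)^T W (I - v s^T/s^T v) + s s^T/s^T v,
// applied in the algebraically expanded form
//   W - (s (Wv)^T + (Wv) s^T)/s^T v + (1 + v^T W v/s^T v) s s^T/s^T v.
void bfgs_inverse_update(Mat& W, const Vec& s, const Vec& v) {
  const std::size_t n = s.size();
  const double sv = dot(s, v);
  Vec Wv(n, 0.0);  // W * v
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) Wv[i] += W[i][j] * v[j];
  const double vWv = dot(v, Wv);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      W[i][j] += -(s[i] * Wv[j] + Wv[i] * s[j]) / sv
                 + (1.0 + vWv / sv) * s[i] * s[j] / sv;
}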
Search direction computation
Algorithm 2 NonOpt Subproblem Solver
Require: {x_{k,j}}_{j=1}^m, {f_{k,j}}_{j=1}^m, {g_{k,j}}_{j=1}^m, and W_k
1: for j = m, m+1, ... do
2:   Solve (D) for (ω_{k,j}, γ_{k,j}).
3:   Set s_{k,j} ← −W_k (G_{k,j} ω_{k,j} + γ_{k,j}).
4:   Set x_{k,j+1} ← x_k + s_{k,j}.
5:   if either
       f(x_{k,j+1}) ≤ f(x_k) − (1/2) α (G_{k,j} ω_{k,j} + γ_{k,j})^T W_k (G_{k,j} ω_{k,j} + γ_{k,j})
6:   or
       ‖W_k (G_{k,j} ω_{k,j} + γ_{k,j})‖ ≤ ξ δ_k,  ‖G_{k,j} ω_{k,j} + γ_{k,j}‖ ≤ ξ δ_k,  and  ‖G_{k,j} ω_{k,j}‖ ≤ ξ δ_k
7:   then return
8:   else add x_{k,j+1} and appropriate (f_{k,j+1}, g_{k,j+1}) to the point set
9:   end if
10: end for
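A minimal sketch of the two tests in steps 5–6 (illustrative names; the caller is assumed to have formed d = G_{k,j} ω_{k,j} + γ_{k,j} and Wd = W_k d, and Euclidean norms are used where the slides leave the norm unspecified):

#include <cmath>
#include <cstddef>
#include <vector>

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Step 5: f(x_{k,j+1}) <= f(x_k) - (alpha/2) d^T W d.
bool sufficient_decrease(double f_trial, double f_k, double alpha,
                         const std::vector<double>& d,
                         const std::vector<double>& Wd) {
  return f_trial <= f_k - 0.5 * alpha * dot(d, Wd);
}

// Step 6: ||W d||, ||d||, and ||G omega|| all at most xi * delta_k.
bool near_stationary(const std::vector<double>& Wd, const std::vector<double>& d,
                     const std::vector<double>& G_omega, double xi, double delta) {
  const double tol = xi * delta;
  return std::sqrt(dot(Wd, Wd)) <= tol && std::sqrt(dot(d, d)) <= tol &&
         std::sqrt(dot(G_omega, G_omega)) <= tol;
}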
Additional features
Features being finalized before release:
- Inexact subproblem solves: primal value p_k^j and dual value q_k^j yielding
    p_k^j < 0  and  (q_k^j − q_k^0)/(p_k^j − q_k^0) ≥ ζ ∈ (0, 1)
  (see the sketch after this list)
- Gradient aggregation (new for gradient sampling!); Kiwiel, 1985
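A minimal sketch of that acceptance test (names illustrative):

// Accept an inexact subproblem solution when the predicted decrease is
// negative and the dual value q_j has closed at least a fraction zeta of the
// gap from the initial dual value q_0 to the primal value p_j. By weak
// duality q_0 <= p_j, so the ratio test can be cross-multiplied safely.
bool accept_inexact(double p_j, double q_j, double q_0, double zeta) {
  return p_j < 0.0 && q_j - q_0 >= zeta * (p_j - q_0);
}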
Basics
Low-level classes:
- Vector
- SymmetricMatrix
- Point
- PointSet
Solver classes:
- Reporter (similar to the Journalist in Ipopt)
- Options
- Quantities
- Strategies
Strategies

Strategy              Implementations
DirectionComputation  Gradient, CuttingPlane, GradientCombination
InverseHessianUpdate  BFGS
LineSearch            Backtracking, WeakWolfe
PointSetUpdate        Proximity
QPSolver              ActiveSet
SymmetricMatrix       Dense, LimitedMemory
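The table suggests an Ipopt-style strategy pattern; below is a hypothetical sketch of how such slots might be filled by name. This is not NonOpt's actual interface: every class, method, and signature here is invented for illustration.

#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical strategy interface for one slot of the table above.
class LineSearch {
 public:
  virtual ~LineSearch() = default;
  virtual double computeStepsize() = 0;  // placeholder signature
};

class Backtracking : public LineSearch {
 public:
  double computeStepsize() override { /* backtracking logic */ return 1.0; }
};

class WeakWolfe : public LineSearch {
 public:
  double computeStepsize() override { /* weak Wolfe logic */ return 1.0; }
};

// An options-driven factory: each strategy slot is chosen by name, so new
// implementations can be added without touching the solver loop.
std::unique_ptr<LineSearch> makeLineSearch(const std::string& name) {
  if (name == "Backtracking") return std::make_unique<Backtracking>();
  if (name == "WeakWolfe") return std::make_unique<WeakWolfe>();
  throw std::invalid_argument("unknown line search strategy: " + name);
}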
Nonsmooth test problems
Name                 Convex?   f(x_0)    f(x^*)
maxq                 Yes       2500.0    0.0
mxhilb               Yes       4.5       0.0
chained lq           Yes       49.0      -69.3
chained cb3 1        Yes       980.0     98.0
chained cb3 2        Yes       980.0     98.0
active faces         No        3.9       0.0
brown function 2     No        98.0      0.0
chained mifflin 2    No        232.8     -34.8
chained crescent 1   No        292.3     0.0
chained crescent 2   No        292.3     0.0
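As one concrete instance, a sketch of Chained Crescent 1 under its standard test-set definition (assumed here, not stated on the slides):

#include <algorithm>
#include <cstddef>
#include <vector>

// Chained Crescent 1 (standard definition, assumed):
//   f(x) = max{ sum_{i=1}^{n-1} ( x_i^2 + (x_{i+1} - 1)^2 + x_{i+1} - 1),
//               sum_{i=1}^{n-1} (-x_i^2 - (x_{i+1} - 1)^2 + x_{i+1} + 1) }.
// At the usual start x0 = (-1.5, 2, -1.5, 2, ...) with n = 50 this gives
// f(x0) = 292.25, matching the 2.9225e+02 in the sample output below.
double chained_crescent_1(const std::vector<double>& x) {
  double sum1 = 0.0, sum2 = 0.0;
  for (std::size_t i = 0; i + 1 < x.size(); ++i) {
    const double a = x[i] * x[i];
    const double b = (x[i + 1] - 1.0) * (x[i + 1] - 1.0);
    sum1 += a + b + x[i + 1] - 1.0;
    sum2 += -a - b + x[i + 1] + 1.0;
  }
  return std::max(sum1, sum2);
}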
Codes:
- NonOpt
- LMBM: Karmitsa/Haarala, Miettinen, & Mäkelä, 2004 & 2007
- HANSO: Lewis & Overton, 2012; Burke, Lewis, & Overton, 2005
Termination:
- NonOpt terminates for all problems with ‖G_k ω_k‖ ≤ 10^{−4}.
- LMBM terminates 9/10 times due to small objective improvement.
- HANSO terminates 5/10 times due to small objective improvement.
Numerical results, nonsmooth test problems
[Figure: iterations (left) and function evaluations (right)]
Chained Crescent 1

+--------------------------------------------------------------+
|  NonOpt = Nonsmooth Optimization Solver                      |
|  Please visit http://coral.ise.lehigh.edu/frankecurtis/nonopt |
+--------------------------------------------------------------+
Number of variables.............. : 50
Direction computation strategy... : CuttingPlane
Inverse Hessian update strategy.. : BFGS
Line search strategy............. : WeakWolfe
Point set update strategy........ : Proximity
QP solver strategy............... : ActiveSet
Symmetric matrix strategy........ : Dense
-------------------------------------------------------------------------------------------------------------------------
Iter. Objective Stat. Rad. Trust Rad. |Points| QP Error |G. Combo.| |Step| Stepsize Correction
-------------------------------------------------------------------------------------------------------------------------
0 +2.9225e+02 +7.0000e-01 +7.0000e+04 0 +0.0000e+00 +7.0000e+00 +7.0000e+00 +5.0000e-01 +0.0000e+00
1 +2.8775e+02 +7.0000e-01 +7.0000e+04 1 +0.0000e+00 +7.0000e+00 +1.7567e+00 +1.0000e+00 +0.0000e+00
...
19 +5.2006e-08 +1.0000e-04 +7.0000e+00 12 +7.7753e-16 +1.7149e-04 +8.5751e-05 +1.0000e+00 +0.0000e+00
20 +2.2457e-08 +1.0000e-04 +7.0000e+00 14 +2.2204e-16 +8.3220e-05 +4.1619e-05
EXIT: Stationary point found.
Objective........................ : 2.245677e-08
Number of iterations............. : 20
Number of inner iterations....... : 35
Number of QP iterations.......... : 16
Number of function evaluations... : 82
Number of gradient evaluations... : 35
CPU seconds...................... : 0.006477
CPU seconds in NonOpt............ : 0.005924
CPU seconds in evaluations....... : 0.000553
Numerical results, CUTE(r/st) (smooth) problems
Nonsmooth algorithms for solving difficult smooth problems?
[Figure: iterations (left) and function evaluations (right)]
Summary

NonOpt is an open-source C++ software package to solve

  min_{x ∈ R^n} f(x),

where f : R^n → R is
- locally Lipschitz,
- continuously differentiable over a full-measure subset of R^n,
- (potentially) nonsmooth, and
- (potentially) nonconvex.