NonOpt: Non(-linear/-smooth/-convex) Optimizer
Frank E. Curtis, Lehigh University
presented at
International Conference on Continuous Optimization
Berlin, Germany
August 5, 2019
References
- F. E. Curtis and X. Que.
  An Adaptive Gradient Sampling Algorithm for Nonsmooth Optimization.
  Optimization Methods and Software, 28(6):1302–1324, 2013.
- F. E. Curtis and X. Que.
  A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees.
  Mathematical Programming Computation, 7(4):399–428, 2015.
- F. E. Curtis, D. P. Robinson, and B. Zhou.
  A Self-Correcting Variable-Metric Algorithm Framework for Nonsmooth Optimization.
  IMA Journal of Numerical Analysis, 10.1093/imanum/drz008, 2019.
Outline
Motivation
Algorithm
NonOpt
Summary
Why?!
Do we need complicated “non” optimization software?
- Many problems are nonlinear, nonsmooth, and nonconvex,
- ... but people say these can be solved with simple algorithms.
For example, the trend is to analyze/use:
- the subgradient method
- BFGS with a line search
Nonsmooth Rosenbrock
  f(x) = 8 |x_1^2 − x_2| + (1 − x_1)^2
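For reference, a minimal sketch of this objective and one subgradient choice (hand-derived; the handling of the kink is an arbitrary but valid selection, not taken from the slides):

#include <array>
#include <cmath>

// Nonsmooth Rosenbrock: f(x) = 8|x_1^2 - x_2| + (1 - x_1)^2.
double f(const std::array<double, 2>& x) {
  return 8.0 * std::fabs(x[0] * x[0] - x[1]) + (1.0 - x[0]) * (1.0 - x[0]);
}

// One subgradient of f at x: equals the gradient wherever r = x_1^2 - x_2 != 0,
// and picks sign(r) = 1 arbitrarily on the kink r = 0.
std::array<double, 2> subgradient(const std::array<double, 2>& x) {
  const double r = x[0] * x[0] - x[1];
  const double sgn = (r < 0.0) ? -1.0 : 1.0;
  return {16.0 * sgn * x[0] - 2.0 * (1.0 - x[0]), -8.0 * sgn};
}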
NonOpt
NonOpt is an open-source C++ software package to solve

  min_{x ∈ R^n} f(x),

where f : R^n → R is
- locally Lipschitz,
- continuously differentiable over a full-measure subset of R^n,
- (potentially) nonsmooth, and
- (potentially) nonconvex.†
The code is designed to be extensible.
- Ideas borrowed from Ipopt. (Thanks, Andreas!)
- Additional features (e.g., constraints) forthcoming.
† Theory requires f to be weakly semismooth.
Central tenets of NonOpt
Quasi-Newton methods are surprisingly effective for “non” optimization.
- Self-correcting BFGS update
- Offers exactly the inequalities needed for convergence
- Careful radii updates; Curtis, Robinson, & Zhou, 2019
Quasi-Newton must be guided with cutting planes and/or gradient sampling.
- Point sets are critical for nonsmooth optimization.
- QP subproblems need to be solved.
- IMPORTANT: Specialized QP solvers, gradient aggregation, and inexact subproblem solutions mean that the added per-iteration cost can be negligible compared to “simple” algorithms.
Search direction computation
At x_k ∈ R^n, a smooth optimization algorithm computes d_k ← x_k^* − x_k, where

  x_k^* ∈ arg min_{x ∈ R^n}  f(x_k) + ∇f(x_k)^T (x − x_k) + (1/2)(x − x_k)^T H_k (x − x_k)
          s.t. ‖x − x_k‖ ≤ δ_k.

For nonsmooth f, with sets of points, scalars, and (sub)gradients

  {x_{k,j}}_{j=1}^m,  {f_{k,j}}_{j=1}^m,  and  {g_{k,j}}_{j=1}^m,

NonOpt solves the primal subproblem

  min_{x ∈ R^n}  ( max_{j ∈ {1,...,m}} { f_{k,j} + g_{k,j}^T (x − x_{k,j}) } + (1/2)(x − x_k)^T H_k (x − x_k) )
  s.t. ‖x − x_k‖ ≤ δ_k.    (P)
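To make (P) concrete, here is a minimal sketch (not NonOpt's internal code) of evaluating the objective of (P) at a trial point x; H_k is taken to be the identity for brevity, and all names are illustrative.

#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Objective of subproblem (P) at a trial point x: the piecewise-linear model
// max_j { f_{k,j} + g_{k,j}^T (x - x_{k,j}) } plus the quadratic term
// (1/2)(x - x_k)^T H_k (x - x_k), with H_k = I in this sketch.
double model_value(const std::vector<double>& x, const std::vector<double>& xk,
                   const std::vector<std::vector<double>>& points,      // x_{k,j}
                   const std::vector<double>& values,                   // f_{k,j}
                   const std::vector<std::vector<double>>& gradients) { // g_{k,j}
  const std::size_t n = x.size();
  double max_term = -std::numeric_limits<double>::infinity();
  for (std::size_t j = 0; j < points.size(); ++j) {
    double term = values[j];
    for (std::size_t i = 0; i < n; ++i)
      term += gradients[j][i] * (x[i] - points[j][i]);
    max_term = std::max(max_term, term);
  }
  double quad = 0.0;  // (1/2)||x - x_k||_2^2 since H_k = I here
  for (std::size_t i = 0; i < n; ++i)
    quad += 0.5 * (x[i] - xk[i]) * (x[i] - xk[i]);
  return max_term + quad;
}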
Examples
Multiple types of algorithms involve subproblems of the form (P).
Bundle methods: (Schramm & Zowe, 1992)
- {x_{k,j}} from previous/current iterates and trial points
- {f_{k,j}} with f_{k,j} = min{ f(x_{k,j}), f(x_k) + g_{k,j}^T (x_{k,j} − x_k) − c ‖x_{k,j} − x_k‖_2^2 }
- {g_{k,j}} with g_{k,j} ∈ ∂f(x_{k,j})
Gradient sampling methods: (Burke, Lewis, & Overton, 2005)
- {x_{k,j}} from the current iterate and randomly sampled points
- {f_{k,j}} with f_{k,j} = f(x_k)
- {g_{k,j}} with g_{k,j} = ∇f(x_{k,j})
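For instance, a minimal sketch of the bundle-style values f_{k,j} above (the downshift constant c and all names are illustrative):

#include <algorithm>
#include <cstddef>
#include <vector>

// Bundle-style value for point x_{k,j}: the true objective value, capped by a
// linearization at x_k that is "downshifted" by c * ||x_{k,j} - x_k||_2^2.
double bundle_value(double f_at_xkj, double f_at_xk,
                    const std::vector<double>& g_kj,
                    const std::vector<double>& x_kj,
                    const std::vector<double>& x_k, double c) {
  double lin = f_at_xk, dist_sq = 0.0;
  for (std::size_t i = 0; i < x_k.size(); ++i) {
    lin += g_kj[i] * (x_kj[i] - x_k[i]);
    dist_sq += (x_kj[i] - x_k[i]) * (x_kj[i] - x_k[i]);
  }
  return std::min(f_at_xkj, lin - c * dist_sq);
}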
Dual subproblem
The primal subproblem (P) is itself nonsmooth.
With G_k ← [g_{k,1} ··· g_{k,m}], it is typically more efficient to solve the dual

  sup_{(ω,γ) ∈ R^m_+ × R^n}  −(1/2)(G_k ω + γ)^T W_k (G_k ω + γ) + b_k^T ω − δ_k ‖γ‖_*
  s.t. 1_m^T ω = 1.    (D)

The primal solution can then be recovered by

  x_k^* ← x_k − W_k (G_k ω_k + γ_k),  where  g_k := G_k ω_k + γ_k.

NonOpt has a specialized active-set QP solver for (D) based on Kiwiel, 1985.
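A minimal sketch of that primal-recovery step (illustrative names; W_k applied as a dense matrix):

#include <cstddef>
#include <vector>

// Recover the primal solution x_k^* = x_k - W_k (G_k * omega + gamma) from a
// dual solution (omega, gamma). Columns of G are the (sub)gradients g_{k,j}.
std::vector<double> recover_primal(const std::vector<double>& xk,
                                   const std::vector<std::vector<double>>& W, // n x n
                                   const std::vector<std::vector<double>>& G, // n x m
                                   const std::vector<double>& omega,          // m
                                   const std::vector<double>& gamma) {        // n
  const std::size_t n = xk.size(), m = omega.size();
  std::vector<double> g(n, 0.0);  // g_k = G_k * omega + gamma
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < m; ++j) g[i] += G[i][j] * omega[j];
    g[i] += gamma[i];
  }
  std::vector<double> x_star(n);  // x_k - W_k * g_k
  for (std::size_t i = 0; i < n; ++i) {
    x_star[i] = xk[i];
    for (std::size_t l = 0; l < n; ++l) x_star[i] -= W[i][l] * g[l];
  }
  return x_star;
}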
Self-correcting properties of BFGS updates with {(s_k, v_k)}

Theorem 1 (Byrd & Nocedal, 1989)
Suppose that, for all k, there exist {η, θ} ⊂ R_{++} such that

  η ≤ (s_k^T v_k)/‖s_k‖_2^2  and  ‖v_k‖_2^2/(s_k^T v_k) ≤ θ.    (KEY)

Then, for any p ∈ (0, 1), there exist constants {ι, κ, λ} ⊂ R_{++} such that, for any K ≥ 2, the following relations hold for at least ⌈pK⌉ values of k ∈ {1, ..., K}:

  ι ≤ (s_k^T H_k s_k)/(‖s_k‖_2 ‖H_k s_k‖_2)  and  κ ≤ ‖H_k s_k‖_2/‖s_k‖_2 ≤ λ.

Corollary 2
Suppose the conditions of Theorem 1 hold. Then, for any p ∈ (0, 1), there exist constants {μ, ν} ⊂ R_{++} such that, for any K ≥ 2, the following relations hold for at least ⌈pK⌉ values of k ∈ {1, ..., K}:

  μ ‖g_k‖_2^2 ≤ g_k^T W_k g_k  and  ‖W_k g_k‖_2^2 ≤ ν ‖g_k‖_2^2.
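Numerically, (KEY) is a cheap test on a candidate pair; a minimal sketch (illustrative names, written to avoid divisions):

#include <cstddef>
#include <vector>

// Inner product helper.
static double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// True if (s, v) satisfies (KEY):
//   eta <= s^T v / ||s||_2^2   and   ||v||_2^2 / s^T v <= theta.
bool satisfies_key(const std::vector<double>& s, const std::vector<double>& v,
                   double eta, double theta) {
  const double sv = dot(s, v), ss = dot(s, s), vv = dot(v, v);
  return sv > 0.0 && sv >= eta * ss && vv <= theta * sv;
}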
Algorithm 1 NonOpt Algorithm Framework
1: Choose x_1 ∈ R^n.
2: Choose a symmetric positive definite W_1 ∈ R^{n×n}.
3: Choose α ∈ (0, 1).
4: for all k ∈ N := {1, 2, ...} do
5:   Solve (P)–(D) such that setting
       G_k ← [g_{k,1} ··· g_{k,m}],  s_k ← −W_k (G_k ω_k + γ_k),  and  x_{k+1} ← x_k + s_k
6:   yields (potentially after a line search)
       f(x_{k+1}) ≤ f(x_k) − (1/2) α (G_k ω_k + γ_k)^T W_k (G_k ω_k + γ_k).
7:   Choose y_k ∈ R^n.
8:   Set β_k ← min{β ∈ [0, 1] : v(β) := β s_k + (1 − β) y_k satisfies (KEY)}.
9:   Set v_k ← v(β_k).
10:  Set
       W_{k+1} ← (I − (v_k s_k^T)/(s_k^T v_k))^T W_k (I − (v_k s_k^T)/(s_k^T v_k)) + (s_k s_k^T)/(s_k^T v_k).
11: end for
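A minimal sketch of steps 8–10 under stated simplifications: a coarse grid search stands in for however β_k is actually computed, W_k is stored as a dense matrix, and all names are illustrative.

#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;  // dense, row-major

static double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Steps 8-9: smallest beta on a coarse grid such that
// v(beta) = beta*s + (1-beta)*y satisfies (KEY).
Vec damped_pair(const Vec& s, const Vec& y, double eta, double theta) {
  const int grid = 100;
  for (int t = 0; t <= grid; ++t) {
    const double beta = static_cast<double>(t) / grid;
    Vec v(s.size());
    for (std::size_t i = 0; i < s.size(); ++i)
      v[i] = beta * s[i] + (1.0 - beta) * y[i];
    const double sv = dot(s, v);
    if (sv > 0.0 && sv >= eta * dot(s, s) && dot(v, v) <= theta * sv) return v;
  }
  return s;  // fallback: no grid point satisfied (KEY)
}

// Step 10: W <- (I - v s^T/s^T v)^T W (I - v s^T/s^T v) + s s^T/s^T v,
// applied in the algebraically expanded form
//   W - (s (Wv)^T + (Wv) s^T)/s^T v + (1 + v^T W v/s^T v) s s^T/s^T v.
void bfgs_inverse_update(Mat& W, const Vec& s, const Vec& v) {
  const std::size_t n = s.size();
  const double sv = dot(s, v);
  Vec Wv(n, 0.0);  // W * v
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) Wv[i] += W[i][j] * v[j];
  const double vWv = dot(v, Wv);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      W[i][j] += -(s[i] * Wv[j] + Wv[i] * s[j]) / sv
                 + (1.0 + vWv / sv) * s[i] * s[j] / sv;
}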
Search direction computation
Algorithm 2 NonOpt Subproblem Solver
Require: {x_{k,j}}_{j=1}^m, {f_{k,j}}_{j=1}^m, {g_{k,j}}_{j=1}^m, and W_k
1: for j = m, m+1, ... do
2:   Solve (D) for (ω_{k,j}, γ_{k,j}).
3:   Set s_{k,j} ← −W_k (G_{k,j} ω_{k,j} + γ_{k,j}).
4:   Set x_{k,j+1} ← x_k + s_{k,j}.
5:   if either
       f(x_{k,j+1}) ≤ f(x_k) − (1/2) α (G_{k,j} ω_{k,j} + γ_{k,j})^T W_k (G_{k,j} ω_{k,j} + γ_{k,j})
6:   or
       ‖W_k (G_{k,j} ω_{k,j} + γ_{k,j})‖ ≤ ξ δ_k,  ‖G_{k,j} ω_{k,j} + γ_{k,j}‖ ≤ ξ δ_k,  and  ‖G_{k,j} ω_{k,j}‖ ≤ ξ δ_k
7:   then return
8:   else add x_{k,j+1} and appropriate (f_{k,j+1}, g_{k,j+1}) to the point set
9:   end if
10: end for
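A minimal sketch of the two tests in steps 5–6 (illustrative names; the caller is assumed to have formed d = G_{k,j} ω_{k,j} + γ_{k,j} and Wd = W_k d, and Euclidean norms are used where the slides leave the norm unspecified):

#include <cmath>
#include <cstddef>
#include <vector>

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Step 5: f(x_{k,j+1}) <= f(x_k) - (alpha/2) d^T W d.
bool sufficient_decrease(double f_trial, double f_k, double alpha,
                         const std::vector<double>& d,
                         const std::vector<double>& Wd) {
  return f_trial <= f_k - 0.5 * alpha * dot(d, Wd);
}

// Step 6: ||W d||, ||d||, and ||G omega|| all at most xi * delta_k.
bool near_stationary(const std::vector<double>& Wd, const std::vector<double>& d,
                     const std::vector<double>& G_omega, double xi, double delta) {
  const double tol = xi * delta;
  return std::sqrt(dot(Wd, Wd)) <= tol && std::sqrt(dot(d, d)) <= tol &&
         std::sqrt(dot(G_omega, G_omega)) <= tol;
}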
Additional features
Features being finalized before release:
- Inexact subproblem solves: primal value p_k^j and dual value q_k^j yielding
    p_k^j < 0  and  (q_k^j − q_k^0)/(p_k^j − q_k^0) ≥ ζ ∈ (0, 1)
  (see the sketch after this list)
- Gradient aggregation (new for gradient sampling!); Kiwiel, 1985
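A minimal sketch of that acceptance test (names illustrative):

// Accept an inexact subproblem solution when the predicted decrease is
// negative and the dual value q_j has closed at least a fraction zeta of the
// gap from the initial dual value q_0 to the primal value p_j. By weak
// duality q_0 <= p_j, so the ratio test can be cross-multiplied safely.
bool accept_inexact(double p_j, double q_j, double q_0, double zeta) {
  return p_j < 0.0 && q_j - q_0 >= zeta * (p_j - q_0);
}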
Basics
Low-level classes:
- Vector
- SymmetricMatrix
- Point
- PointSet
Solver classes:
- Reporter (similar to the Journalist in Ipopt)
- Options
- Quantities
- Strategies
Strategies

Strategy              Implementations
DirectionComputation  Gradient, CuttingPlane, GradientCombination
InverseHessianUpdate  BFGS
LineSearch            Backtracking, WeakWolfe
PointSetUpdate        Proximity
QPSolver              ActiveSet
SymmetricMatrix       Dense, LimitedMemory
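The table suggests an Ipopt-style strategy pattern; below is a hypothetical sketch of how such slots might be filled by name. This is not NonOpt's actual interface: every class, method, and signature here is invented for illustration.

#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical strategy interface for one slot of the table above.
class LineSearch {
 public:
  virtual ~LineSearch() = default;
  virtual double computeStepsize() = 0;  // placeholder signature
};

class Backtracking : public LineSearch {
 public:
  double computeStepsize() override { /* backtracking logic */ return 1.0; }
};

class WeakWolfe : public LineSearch {
 public:
  double computeStepsize() override { /* weak Wolfe logic */ return 1.0; }
};

// An options-driven factory: each strategy slot is chosen by name, so new
// implementations can be added without touching the solver loop.
std::unique_ptr<LineSearch> makeLineSearch(const std::string& name) {
  if (name == "Backtracking") return std::make_unique<Backtracking>();
  if (name == "WeakWolfe") return std::make_unique<WeakWolfe>();
  throw std::invalid_argument("unknown line search strategy: " + name);
}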
Nonsmooth test problems
Name                 Convex?   f(x_0)    f(x^*)
maxq                 Yes       2500.0    0.0
mxhilb               Yes       4.5       0.0
chained lq           Yes       49.0      -69.3
chained cb3 1        Yes       980.0     98.0
chained cb3 2        Yes       980.0     98.0
active faces         No        3.9       0.0
brown function 2     No        98.0      0.0
chained mifflin 2    No        232.8     -34.8
chained crescent 1   No        292.3     0.0
chained crescent 2   No        292.3     0.0
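As one concrete instance, a sketch of Chained Crescent 1 under its standard test-set definition (assumed here, not stated on the slides):

#include <algorithm>
#include <cstddef>
#include <vector>

// Chained Crescent 1 (standard definition, assumed):
//   f(x) = max{ sum_{i=1}^{n-1} ( x_i^2 + (x_{i+1} - 1)^2 + x_{i+1} - 1),
//               sum_{i=1}^{n-1} (-x_i^2 - (x_{i+1} - 1)^2 + x_{i+1} + 1) }.
// At the usual start x0 = (-1.5, 2, -1.5, 2, ...) with n = 50 this gives
// f(x0) = 292.25, matching the 2.9225e+02 in the sample output below.
double chained_crescent_1(const std::vector<double>& x) {
  double sum1 = 0.0, sum2 = 0.0;
  for (std::size_t i = 0; i + 1 < x.size(); ++i) {
    const double a = x[i] * x[i];
    const double b = (x[i + 1] - 1.0) * (x[i + 1] - 1.0);
    sum1 += a + b + x[i + 1] - 1.0;
    sum2 += -a - b + x[i + 1] + 1.0;
  }
  return std::max(sum1, sum2);
}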
Codes:
- NonOpt
- LMBM: Karmitsa/Haarala, Miettinen, & Mäkelä, 2004 & 2007
- HANSO: Lewis & Overton, 2012; Burke, Lewis, & Overton, 2005
Termination:
- NonOpt terminates for all problems with ‖G_k ω_k‖ ≤ 10^{−4}.
- LMBM terminates 9/10 times due to small objective improvement.
- HANSO terminates 5/10 times due to small objective improvement.
Numerical results, nonsmooth test problems
[Figure: iterations (left) and function evaluations (right)]
Chained Crescent 1

+--------------------------------------------------------------+
|  NonOpt = Nonsmooth Optimization Solver                      |
|  Please visit http://coral.ise.lehigh.edu/frankecurtis/nonopt |
+--------------------------------------------------------------+
Number of variables.............. : 50
Direction computation strategy... : CuttingPlane
Inverse Hessian update strategy.. : BFGS
Line search strategy............. : WeakWolfe
Point set update strategy........ : Proximity
QP solver strategy............... : ActiveSet
Symmetric matrix strategy........ : Dense
-------------------------------------------------------------------------------------------------------------------------
Iter. Objective Stat. Rad. Trust Rad. |Points| QP Error |G. Combo.| |Step| Stepsize Correction
-------------------------------------------------------------------------------------------------------------------------
0 +2.9225e+02 +7.0000e-01 +7.0000e+04 0 +0.0000e+00 +7.0000e+00 +7.0000e+00 +5.0000e-01 +0.0000e+00
1 +2.8775e+02 +7.0000e-01 +7.0000e+04 1 +0.0000e+00 +7.0000e+00 +1.7567e+00 +1.0000e+00 +0.0000e+00
...
19 +5.2006e-08 +1.0000e-04 +7.0000e+00 12 +7.7753e-16 +1.7149e-04 +8.5751e-05 +1.0000e+00 +0.0000e+00
20 +2.2457e-08 +1.0000e-04 +7.0000e+00 14 +2.2204e-16 +8.3220e-05 +4.1619e-05
EXIT: Stationary point found.
Objective........................ : 2.245677e-08
Number of iterations............. : 20
Number of inner iterations....... : 35
Number of QP iterations.......... : 16
Number of function evaluations... : 82
Number of gradient evaluations... : 35
CPU seconds...................... : 0.006477
CPU seconds in NonOpt............ : 0.005924
CPU seconds in evaluations....... : 0.000553
Numerical results, CUTE(r/st) (smooth) problems
Nonsmooth algorithms for solving difficult smooth problems?
[Figure: iterations (left) and function evaluations (right)]
Summary

NonOpt is an open-source C++ software package to solve

  min_{x ∈ R^n} f(x),

where f : R^n → R is
- locally Lipschitz,
- continuously differentiable over a full-measure subset of R^n,
- (potentially) nonsmooth, and
- (potentially) nonconvex.