Algorithms for Nonsmooth Optimization
Frank E. Curtis, Lehigh University
presented at
Center for Optimization and Statistical Learning,
Northwestern University
2 March 2018

Outline

Motivating Examples
Subdifferential Theory
Fundamental Algorithms
Nonconvex Nonsmooth Functions
General Framework

Motivating Examples

Nonsmooth optimization

In mathematical optimization, one wants to
- minimize an objective
- subject to constraints,

i.e., min_{x ∈ X} f(x).

Why nonsmooth optimization? Nonsmoothness can arise for different reasons:
- physical (phenomena can be nonsmooth): phase changes in materials
- technological (constraints impose nonsmoothness): obstacles in shape design
- methodological (nonsmoothness introduced by the solution method): decompositions; penalty formulations
- numerical (analytically smooth, but practically nonsmooth): "stiff" problems

(Bagirov, Karmitsa, Mäkelä (2014))

Data fitting

  min_{x ∈ R^n} θ(x) + ψ(x), where, e.g., θ(x) = ‖Ax − b‖_2^2 and ψ(x) = ∑_{i=1}^n φ(x_i), with

  φ_1(t) = α|t| / (1 + α|t|),
  φ_2(t) = log(α|t| + 1),
  φ_3(t) = |t|^q, or
  φ_4(t) = α − (α − |t|)_+^2 / α.

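A minimal Python sketch of this composite objective follows; the values α = 2 and q = 0.5 are illustrative choices, not ones prescribed in the talk, and φ_4 follows the reconstruction above.

    import numpy as np

    def theta(x, A, b):
        r = A @ x - b
        return r @ r  # ||Ax - b||_2^2

    def psi(x, phi):
        return sum(phi(t) for t in x)  # separable regularizer

    phi1 = lambda t, a=2.0: a * abs(t) / (1.0 + a * abs(t))
    phi2 = lambda t, a=2.0: np.log(a * abs(t) + 1.0)
    phi3 = lambda t, q=0.5: abs(t) ** q
    phi4 = lambda t, a=2.0: a - max(a - abs(t), 0.0) ** 2 / a  # as reconstructed above

    A = np.array([[1.0, 0.0], [1.0, 1.0]])
    b = np.array([1.0, 2.0])
    x = np.array([0.5, -0.5])
    print([theta(x, A, b) + psi(x, phi) for phi in (phi1, phi2, phi3, phi4)])
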
Clusterwise linear regression (CLR)

Given a dataset of pairs A := {(a_i, b_i)}_{i=1}^l, the goal of CLR is to simultaneously
- partition the dataset into k disjoint clusters, and
- find regression coefficients {(x_j, y_j)}_{j=1}^k for each cluster
in order to minimize the overall error in the fit; e.g.,

  min_{{(x_j, y_j)}} f_k({x_j, y_j}), where f_k({x_j, y_j}) = ∑_{i=1}^l min_{j ∈ {1,...,k}} |x_j^T a_i − y_j − b_i|^p.

This objective is nonconvex (though it is a difference of convex functions).

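A small sketch of evaluating f_k in Python; the synthetic data and the choice p = 1 are illustrative assumptions.

    import numpy as np

    def clr_objective(X, y, a, b, p=1):
        """X: (k, n) slopes, y: (k,) intercepts, a: (l, n) inputs, b: (l,) targets."""
        residuals = np.abs(a @ X.T - y[None, :] - b[:, None]) ** p  # (l, k)
        return residuals.min(axis=1).sum()  # inner min assigns each point a cluster

    rng = np.random.default_rng(0)
    a = rng.standard_normal((20, 3))
    b = a @ np.array([1.0, -2.0, 0.5]) + 1.0
    X, y = rng.standard_normal((2, 3)), np.zeros(2)
    print(clr_objective(X, y, a, b, p=1))
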
Decomposition

Various types of decomposition strategies introduce nonsmoothness.

- Primal decomposition can be used for

    min_{(x_1, x_2, y)} f_1(x_1, y) + f_2(x_2, y),

  where y is the complicating/linking variable; this is equivalent to

    min_y φ_1(y) + φ_2(y), where φ_1(y) := min_{x_1} f_1(x_1, y) and φ_2(y) := min_{x_2} f_2(x_2, y).

  This master problem may be nonsmooth in y.

- Dual decomposition can be used for the same problem, reformulated as

    min_{(x_1, x_2, y_1, y_2)} f_1(x_1, y_1) + f_2(x_2, y_2) s.t. y_1 = y_2.

  The Lagrangian is separable, meaning the dual function decomposes:

    g_1(λ) = inf_{(x_1, y_1)} (f_1(x_1, y_1) + λ^T y_1)
    g_2(λ) = inf_{(x_2, y_2)} (f_2(x_2, y_2) − λ^T y_2).

  The dual problem, to maximize g(λ) = g_1(λ) + g_2(λ), may be nonsmooth in λ.

Dual decomposition with constraints

Consider the nearly separable problem

  min_{(x_1,...,x_m)} ∑_{i=1}^m f_i(x_i)
  s.t. x_i ∈ X_i for all i ∈ {1,...,m}
       ∑_{i=1}^m A_i x_i ≤ b (e.g., a shared resource constraint)

where the last are complicating/linking constraints; "dualizing" leads to

  g(λ) := min_{(x_1,...,x_m)} ∑_{i=1}^m f_i(x_i) + λ^T (∑_{i=1}^m A_i x_i − b)
          s.t. x_i ∈ X_i for all i ∈ {1,...,m}.

Given λ ∈ R^m, the value g(λ) comes from solving separable problems; the dual

  max_{λ ≥ 0} g(λ)

is typically nonsmooth (and people often use poor algorithms!).

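A hedged sketch of one dual-function evaluation: to make the block problems solvable in closed form, we assume linear f_i(x_i) = c_i^T x_i and box sets X_i = [0,1]^n (these assumptions are ours, not the talk's); then ∑_i A_i x_i(λ) − b is a supergradient of the concave function g at λ, which a projected ascent method can follow.

    import numpy as np

    def dual_function(lam, cs, As, b):
        """Return g(lambda) and a supergradient, assuming f_i(x) = c_i^T x
        and X_i = [0,1]^n so each block minimization is closed form."""
        g = -lam @ b
        sg = -b.copy()
        for c, A in zip(cs, As):
            coeff = c + A.T @ lam          # block objective is coeff^T x_i
            x = (coeff < 0).astype(float)  # minimizer over the box [0,1]^n
            g += coeff @ x
            sg += A @ x                    # accumulates sum_i A_i x_i(lam) - b
        return g, sg

    # One projected (super)gradient ascent step on max_{lambda >= 0} g(lambda):
    cs = [np.array([-1.0, -2.0]), np.array([-3.0, -1.0])]
    As = [np.eye(2), np.eye(2)]
    b = np.array([1.0, 1.0])
    lam = np.zeros(2)
    g, sg = dual_function(lam, cs, As, b)
    lam = np.maximum(lam + 0.5 * sg, 0.0)
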
Control of dynamical systems

Consider the discrete-time linear dynamical system

  y_{k+1} = A y_k + B u_k   (state equation)
  z_k = C y_k               (observation equation)

Supposing we want to "design" a control such that

  u_k = X C y_k   (where X is our variable),

consider the "closed-loop system" given by

  y_{k+1} = A y_k + B u_k = A y_k + B X C y_k = (A + B X C) y_k.

A common objective is to minimize a stability measure ρ(A + BXC), which is often a function of the eigenvalues of A + BXC.

Eigenvalue optimization

[Figure: plots of ordered eigenvalues as the matrix is perturbed along a given direction.]

Other sources of nonsmooth optimization problems

- Lagrangian relaxation
- Composite optimization (e.g., penalty methods for "soft constraints")
- Parametric optimization (e.g., for model predictive control)
- Multilevel optimization

Subdifferential Theory

Derivatives

When I teach an optimization class, I always start with the same question:

  What is a derivative? (f : R → R)

The answer I get: "the slope of the tangent line."

[Figure: graph of f with its tangent line at x; slope = f′(x).]

Gradients

Then I ask:

  What is a gradient? (f : R^n → R)

The answer I get: "the direction along which the function increases at the fastest rate."

Derivative vs. gradient

So if a derivative is a magnitude (here, a slope), then why does it generalize in multiple dimensions to something that is a direction?

  (n = 1)  f′(x) = df/dx(x) = ∂f/∂x(x)

  (n ≥ 1)  ∇f(x) = [∂f/∂x_1(x), ..., ∂f/∂x_n(x)]^T

What's important: the magnitude or the direction?

Answer: The gradient is a vector in R^n, which
- has a magnitude (e.g., its 2-norm),
- can be viewed as a direction,
- and gives us a way to compute directional derivatives.

Differentiable f

How should we think about the gradient?

If f is continuously differentiable (i.e., f ∈ C^1), then ∇f(x̄) is the unique vector in the linear (Taylor) approximation of f at x̄:

  x ↦ f(x̄) + ∇f(x̄)^T (x − x̄).

[Figure: graphs of f and of its linearization at x̄. Both are graphs of functions of x!]

Differentiable and convex f

If f ∈ C^1 is convex, then

  f(x) ≥ f(x̄) + ∇f(x̄)^T (x − x̄) for all (x, x̄) ∈ R^n × R^n.

[Figure: the graph of f lies above its linearization at x̄.]

Graphs and epigraphs

There is another interpretation of a gradient that is also useful. First...

What is a graph? A set of points in R^{n+1}, namely, {(x, z) : f(x) = z}.

A related quantity, another set, is the epigraph: {(x, z) : f(x) ≤ z}.

[Figure: the graph {(x, f(x))} and, shaded above it, the epigraph.]

Differentiable and convex f

If f ∈ C^1 is convex, then, for all (x, x̄) ∈ R^n × R^n,

      f(x) ≥ f(x̄) + ∇f(x̄)^T (x − x̄)
  ⇔  f(x) − ∇f(x̄)^T x ≥ f(x̄) − ∇f(x̄)^T x̄
  ⇔  [−∇f(x̄); 1]^T [x; f(x)] ≥ [−∇f(x̄); 1]^T [x̄; f(x̄)].

Note: Given x̄, the vector [−∇f(x̄); 1] is fixed, so the inequality above involves a linear function over R^{n+1} and says that

  the value at any point [x; f(x)] in the graph is at least the value at [x̄; f(x̄)].

Linearization and supporting hyperplane for the epigraph

[Figure: the linearization x ↦ f(x̄) + ∇f(x̄)^T (x − x̄) supports the epigraph of f at the point [x̄; f(x̄)], with normal direction [−∇f(x̄); 1].]

Subgradients (convex f)

Why was that useful? We can generalize this idea when the function is not differentiable somewhere.

A vector g ∈ R^n is a subgradient of a convex f : R^n → R at x̄ ∈ R^n if

      f(x) ≥ f(x̄) + g^T (x − x̄) for all x ∈ R^n
  ⇔  [−g; 1]^T [x; f(x)] ≥ [−g; 1]^T [x̄; f(x̄)].

[Figure: at a kink point [x̄; f(x̄)], several supporting hyperplanes exist, each with normal [−g; 1].]

Subdifferentials

Theorem
If f is convex and differentiable at x̄, then ∇f(x̄) is its unique subgradient at x̄.

In general, the set of all subgradients for a convex f at x̄ is the subdifferential of f at x̄:

  ∂f(x̄) := {g ∈ R^n : g is a subgradient of f at x̄}.

From the definition, it is easily seen that

  x* is a minimizer of f if and only if 0 ∈ ∂f(x*).

What about nonconvex, nonsmooth?

We need to generalize these ideas further:
- directional derivatives,
- subgradients, and
- subdifferentials.

Let's return to this after we discuss some algorithms...

Fundamental Algorithms

A fundamental iteration

Thinking of −∇f(x_k), we have a vector that
- directs us in a direction of descent, and
- vanishes as we approach a minimizer.

Algorithm: Gradient Descent
1: Choose an initial point x_0 ∈ R^n and a stepsize α ∈ (0, 1/L].
2: for k = 0, 1, 2, ... do
3:   if ‖∇f(x_k)‖ ≈ 0, then return x_k
4:   else set x_{k+1} ← x_k − α∇f(x_k)

I call this a fundamental iteration.

Here, we suppose ∇f is Lipschitz continuous, i.e., there exists L ≥ 0 such that

  ‖∇f(x) − ∇f(x̄)‖_2 ≤ L‖x − x̄‖_2 for all (x, x̄) ∈ R^n × R^n
  ⟹ f(x) ≤ f(x̄) + ∇f(x̄)^T (x − x̄) + (L/2)‖x − x̄‖_2^2 for all (x, x̄) ∈ R^n × R^n.

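A minimal Python sketch of this fundamental iteration, assuming ∇f is L-Lipschitz; the quadratic test function, tolerance, and iteration cap are illustrative choices, not from the talk.

    import numpy as np

    def gradient_descent(grad_f, x0, L, tol=1e-8, max_iter=10_000):
        x, alpha = np.asarray(x0, dtype=float), 1.0 / L  # alpha in (0, 1/L]
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) <= tol:   # ||grad f(x_k)|| ~ 0: return x_k
                break
            x = x - alpha * g              # x_{k+1} <- x_k - alpha grad f(x_k)
        return x

    # Example: f(x) = 0.5 x^T Q x with Q = diag(1, 10), so L = 10.
    Q = np.diag([1.0, 10.0])
    x_star = gradient_descent(lambda x: Q @ x, x0=[1.0, 1.0], L=10.0)
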
Convergence of gradient descent

[Figure: at x_k, f lies below the quadratic upper bound f(x_k) + ∇f(x_k)^T (x − x_k) + (L/2)‖x − x_k‖_2^2; minimizing this bound over x gives the gradient descent step.]

Gradient descent for f

Theorem
If ∇f is Lipschitz continuous with constant L > 0 and α ∈ (0, 1/L], then

  ∑_{j=0}^∞ ‖∇f(x_j)‖_2^2 < ∞, which implies {∇f(x_j)} → 0.

Proof.
Let k ∈ N and recall that x_{k+1} − x_k = −α∇f(x_k). Then, since α ∈ (0, 1/L],

  f(x_{k+1}) ≤ f(x_k) + ∇f(x_k)^T (x_{k+1} − x_k) + (L/2)‖x_{k+1} − x_k‖_2^2
            = f(x_k) − α‖∇f(x_k)‖_2^2 + (α^2 L/2)‖∇f(x_k)‖_2^2
            = f(x_k) − α(1 − αL/2)‖∇f(x_k)‖_2^2
            ≤ f(x_k) − (α/2)‖∇f(x_k)‖_2^2.

Thus, summing over j ∈ {0, ..., k}, one finds

  ∞ > f(x_0) − f_inf ≥ f(x_0) − f(x_{k+1}) ≥ (α/2) ∑_{j=0}^k ‖∇f(x_j)‖_2^2.

Strong convexity

Now suppose that f is c-strongly convex, which means that

  f(x) ≥ f(x̄) + ∇f(x̄)^T (x − x̄) + (c/2)‖x − x̄‖_2^2 for all (x, x̄) ∈ R^n × R^n.

Important consequences of this are that
- f has a unique global minimizer, call it x* with f* := f(x*), and
- the gradient norm grows with the optimality error, in that

    2c(f(x) − f*) ≤ ‖∇f(x)‖_2^2 for all x ∈ R^n.

Strong convexity, lower bound

[Figure: at x_k, f is sandwiched between the quadratic lower bound f(x_k) + ∇f(x_k)^T (x − x_k) + (c/2)‖x − x_k‖_2^2 and the quadratic upper bound with c replaced by L.]

Gradient descent for strongly convex f

Theorem
If ∇f is Lipschitz with constant L > 0, f is c-strongly convex, and α ∈ (0, 1/L], then

  f(x_{j+1}) − f* ≤ (1 − αc)^{j+1} (f(x_0) − f*) for all j ∈ N.

Proof.
Let k ∈ N. Following the previous proof, one finds

  f(x_{k+1}) ≤ f(x_k) − (α/2)‖∇f(x_k)‖_2^2 ≤ f(x_k) − αc(f(x_k) − f*).

Subtracting f* from both sides, one finds

  f(x_{k+1}) − f* ≤ (1 − αc)(f(x_k) − f*).

Applying the result repeatedly over j ∈ {0, ..., k} yields the result.

A fundamental iteration when f is nonsmooth?

What is a fundamental iteration for nonsmooth optimization? Steepest descent!

For convex f, the directional derivative of f at x along s is

  f′(x; s) = max_{g ∈ ∂f(x)} g^T s.

Along which direction is f decreasing at the fastest rate? The solution of an optimization problem!

  min_{‖s‖_2 ≤ 1} f′(x; s) = min_{‖s‖_2 ≤ 1} max_{g ∈ ∂f(x)} g^T s
                           = max_{g ∈ ∂f(x)} min_{‖s‖_2 ≤ 1} g^T s   (von Neumann minimax theorem)
                           = max_{g ∈ ∂f(x)} (−‖g‖_2)
                           = − min_{g ∈ ∂f(x)} ‖g‖_2   ⟹ we need the minimum-norm subgradient.

Main challenge

But, typically, we can only access some g ∈ ∂f(x), not all of ∂f(x).

I would argue: there is no practical fundamental iteration for general nonsmooth optimization (no computable descent direction that vanishes near a minimizer).

What are our options? There are a few ways to design a convergent algorithm:
- algorithmically (e.g., subgradient method)
- iteratively (e.g., cutting plane / bundle methods)
- randomly (e.g., gradient sampling)

Subgradient method

Algorithm: Subgradient method (not descent)
1: Choose an initial point x_0 ∈ R^n.
2: for k = 0, 1, 2, ... do
3:   if a termination condition is satisfied, then return x_k
4:   else compute g_k ∈ ∂f(x_k), choose α_k ∈ R_{>0}, and set x_{k+1} ← x_k − α_k g_k

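A minimal sketch of the method with a classical diminishing stepsize α_k = α/k; since this is not a descent method, the best iterate seen so far is tracked. The test function f(x) = ‖x‖_1 and its subgradient selection sign(x) are illustrative assumptions.

    import numpy as np

    def subgradient_method(f, subgrad_f, x0, alpha=1.0, max_iter=1000):
        x = np.asarray(x0, dtype=float)
        x_best, f_best = x.copy(), f(x)
        for k in range(1, max_iter + 1):
            x = x - (alpha / k) * subgrad_f(x)   # x_{k+1} <- x_k - alpha_k g_k
            if f(x) < f_best:                    # not a descent method, so
                x_best, f_best = x.copy(), f(x)  # track the best iterate
        return x_best, f_best

    # Example: f(x) = ||x||_1, with sign(x) in the subdifferential at any x.
    x_best, f_best = subgradient_method(
        lambda x: np.abs(x).sum(), np.sign, x0=[1.0, -0.5], alpha=0.5)
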
Why not "subgradient descent"?

Consider

  min_{x ∈ R^2} f(x), where f(x_1, x_2) := x_1 + x_2 + max{0, x_1^2 + x_2^2 − 4}.

At x̄ = (0, −2), we have

  ∂f(x̄) = conv{[1; 1], [1; −3]},

but −[1; 1] and −[1; −3] are both directions of ascent for f from x̄! (A numerical check follows.)

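The claim above can be checked numerically; this quick sketch estimates the two directional derivatives by one-sided finite differences (the step 1e-6 is an arbitrary choice).

    import numpy as np

    def f(x):
        return x[0] + x[1] + max(0.0, x[0]**2 + x[1]**2 - 4.0)

    x = np.array([0.0, -2.0])
    t = 1e-6
    for g in (np.array([1.0, 1.0]), np.array([1.0, -3.0])):
        d = -g / np.linalg.norm(g)
        rate = (f(x + t * d) - f(x)) / t
        print(d, rate)  # both estimated rates are positive: ascent directions
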
Decreasing the distance to a solution

The objective f is not the only measure of progress.
- Given an arbitrary subgradient g_k for f at x_k, we have

    f(x) ≥ f(x_k) + g_k^T (x − x_k) for all x ∈ R^n,   (1)

  which means that all points with an objective value lower than f(x_k) lie in the halfspace

    H_k := {x ∈ R^n : g_k^T (x − x_k) ≤ 0}.

- Thus, a small step along −g_k should decrease the distance to a solution.
- (Convexity is crucial for this idea.)

"Algorithmic convergence"

Theorem
If f has a minimizer, ‖g_k‖_2 ≤ G ∈ R_{>0} for all k ∈ N, and the stepsizes satisfy

  ∑_{k=1}^∞ α_k = ∞ and ∑_{k=1}^∞ α_k^2 < ∞,   (2)

then

  lim_{k→∞} { min_{j ∈ {0,...,k}} f_j } = f*.

- An example sequence satisfying (2) is α_k = α/k for k = 1, 2, ....

Proof, lim_{k→∞} { min_{j ∈ {0,...,k}} f_j } = f*, part 1.

Let k ∈ N. By (1), the iterates satisfy

  ‖x_{k+1} − x*‖_2^2 = ‖x_k − α_k g_k − x*‖_2^2
                     = ‖x_k − x*‖_2^2 − 2α_k g_k^T (x_k − x*) + α_k^2 ‖g_k‖_2^2
                     ≤ ‖x_k − x*‖_2^2 − 2α_k (f_k − f*) + α_k^2 ‖g_k‖_2^2.

Applying this inequality recursively, we have

  0 ≤ ‖x_{k+1} − x*‖_2^2 ≤ ‖x_0 − x*‖_2^2 − 2 ∑_{j=0}^k α_j (f_j − f*) + ∑_{j=0}^k α_j^2 ‖g_j‖_2^2,

which implies that

  2 ∑_{j=0}^k α_j (f_j − f*) ≤ ‖x_0 − x*‖_2^2 + ∑_{j=0}^k α_j^2 ‖g_j‖_2^2

  ⟹ min_{j ∈ {0,...,k}} f_j − f* ≤ (‖x_0 − x*‖_2^2 + G^2 ∑_{j=0}^k α_j^2) / (2 ∑_{j=0}^k α_j).   (3)

Proof, lim_{k→∞} { min_{j ∈ {0,...,k}} f_j } = f*, part 2.

Now consider an arbitrary scalar ε > 0. By (2), there exists a nonnegative integer K such that, for all k > K,

  α_k ≤ ε/G^2 and ∑_{j=0}^k α_j ≥ (1/ε)(‖x_0 − x*‖_2^2 + G^2 ∑_{j=0}^K α_j^2).

Then, by (3), it follows that for all k > K we have

  min_{j ∈ {0,...,k}} f_j − f*
    ≤ (‖x_0 − x*‖_2^2 + G^2 ∑_{j=0}^K α_j^2) / (2 ∑_{j=0}^k α_j)
        + (G^2 ∑_{j=K+1}^k α_j^2) / (2 ∑_{j=0}^K α_j + 2 ∑_{j=K+1}^k α_j)
    ≤ (‖x_0 − x*‖_2^2 + G^2 ∑_{j=0}^K α_j^2) / ((2/ε)(‖x_0 − x*‖_2^2 + G^2 ∑_{j=0}^K α_j^2))
        + (G^2 ∑_{j=K+1}^k (ε/G^2) α_j) / (2 ∑_{j=K+1}^k α_j)
    = ε/2 + ε/2 = ε.

The result follows since ε > 0 was chosen arbitrarily.

Cutting plane method

Subgradient methods lose previously computed information in every iteration.
- Suppose, after a sequence of iterates, we have the affine underestimators

    f̂_i(x) = f(x_i) + g_i^T (x − x_i) for all i ∈ {0, ..., k}.

  [Figure: f with the cuts f(x_0) + g_0^T (x − x_0) and f(x_1) + g_1^T (x − x_1) at iterates x_0, x_1, x_2.]

- At iteration k, we can compute the next iterate by solving the master problem (see the sketch below)

    x_{k+1} ← arg min_{x ∈ X} f̂_k(x), where f̂_k(x) := max_{i ∈ {0,...,k}} (f(x_i) + g_i^T (x − x_i)).

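A hedged sketch of the method on a box X = [lo, hi]^n, solving the master problem as a linear program in (x, v) with scipy.optimize.linprog; the box X and the test problem are illustrative assumptions. For the piecewise linear f(x) = ‖x‖_1, it terminates finitely, as the next slide notes.

    import numpy as np
    from scipy.optimize import linprog

    def cutting_plane(f, subgrad_f, x0, lo, hi, tol=1e-8, max_iter=100):
        x = np.asarray(x0, dtype=float)
        n = x.size
        cuts = []
        for _ in range(max_iter):
            cuts.append((x.copy(), f(x), subgrad_f(x)))  # add the new cut
            # minimize v s.t. f(x_i) + g_i^T (x - x_i) <= v for all cuts
            c = np.r_[np.zeros(n), 1.0]
            A_ub = np.array([np.r_[g, -1.0] for (xi, fi, g) in cuts])
            b_ub = np.array([g @ xi - fi for (xi, fi, g) in cuts])
            res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                          bounds=[(lo, hi)] * n + [(None, None)])
            x, v = res.x[:n], res.x[n]       # x_{k+1} and lower bound v_{k+1}
            if f(x) - v <= tol:              # lower bound attained: optimal
                return x
        return x

    # Example: f(x) = ||x||_1 on [-2, 2]^2 with subgradient sign(x).
    x_opt = cutting_plane(lambda x: np.abs(x).sum(), np.sign,
                          x0=[1.5, -1.0], lo=-2.0, hi=2.0)
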
Cutting plane method convergence

The iterates of the cutting plane method yield lower bounds on the optimal value:

  v_{k+1} := min_{x ∈ X} f̂_k(x) ≤ min_{x ∈ X} f(x) =: f*.

Therefore, if v_{k+1} = f(x_{k+1}), then we terminate, since f(x_{k+1}) = f*.
- If f is piecewise linear, then convergence occurs in finitely many iterations!

However, in general, we have the following theorem.

Theorem
The cutting plane method yields {x_k} satisfying {f(x_k)} → f*.

Bundle method

A bundle method attempts to combine the practical advantages of a cutting plane method with the theoretical strengths of a proximal point method.
- Given x_k, consider the regularized master problem

    min_{x ∈ R^n} ( f̂_k(x) + (γ/2)‖x − x_k‖_2^2 ), where f̂_k(x) := max_{i ∈ I_k} (f(x_i) + g_i^T (x − x_i)).

  Here, I_k ⊆ {1, ..., k − 1} indicates a subset of previous iterations.
- This problem is equivalent to the quadratic optimization problem (see the sketch after this list)

    min_{(x,v) ∈ R^n × R} v + (γ/2)‖x − x_k‖_2^2
    s.t. f(x_i) + g_i^T (x − x_i) ≤ v for all i ∈ I_k.

- Only move to a "new" point when a sufficient decrease is obtained.

Convergence rate analyses are limited; O((1/ε) log(1/ε)) for strongly convex f.

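A hedged sketch of one regularized master-problem solve, using scipy.optimize.minimize with SLSQP as a stand-in QP solver; γ and the bundle contents are assumptions for illustration.

    import numpy as np
    from scipy.optimize import minimize

    def solve_master(xk, bundle, gamma):
        """bundle: list of (x_i, f_i, g_i); returns the trial point and v."""
        n = xk.size

        def obj(z):                        # z = (x, v)
            return z[n] + 0.5 * gamma * np.sum((z[:n] - xk) ** 2)

        cons = [{"type": "ineq",           # v - (f_i + g_i^T (x - x_i)) >= 0
                 "fun": lambda z, fi=fi, gi=gi, xi=xi:
                     z[n] - (fi + gi @ (z[:n] - xi))}
                for (xi, fi, gi) in bundle]
        z0 = np.r_[xk, max(fi for (_, fi, _) in bundle)]
        res = minimize(obj, z0, constraints=cons, method="SLSQP")
        return res.x[:n], res.x[n]

A full bundle method would accept the trial point only when the actual decrease f(x_k) − f(x_trial) is a sufficient fraction of the decrease predicted by the model (a serious step), and otherwise add the new cut and remain at x_k (a null step).
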
Bundle method convergence

The analysis makes use of the Moreau-Yosida regularization function

  f_γ(x) = min_{x̄ ∈ R^n} ( f(x̄) + (1/(2γ))‖x̄ − x‖_2^2 ).

Theorem
If x_k is not a minimizer, then f_γ(x_k) < f(x_k).

Theorem
For all (k, j) ∈ N × N in a bundle method,

  v_{k,j} + (1/(2γ))‖x_{k,j} − x_k‖_2^2 ≤ f_γ(x_k) < f(x_k).

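For a concrete instance of the first theorem: with f(x) = |x| on R, the minimization defining f_γ has the closed-form soft-thresholding solution, so the envelope can be evaluated directly. This worked example is ours, not from the slides.

    import numpy as np

    def moreau_envelope_abs(x, gamma):
        """f_gamma for f = |.|; the argmin is the soft-thresholding point."""
        prox = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
        return np.abs(prox) + (x - prox) ** 2 / (2.0 * gamma)

    # f_gamma(x) < f(x) = |x| at every x != 0 (the minimizer), e.g.:
    print(moreau_envelope_abs(1.0, gamma=0.5), abs(1.0))  # 0.75 < 1.0
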
Nonconvex Nonsmooth Functions

Clarke subdifferential

What if f is nonconvex and nonsmooth? What are subgradients?

We still need some structure; we assume that
- f is locally Lipschitz, and
- f is differentiable on a full-measure set D.

The Clarke subdifferential of f at x is

  ∂f(x) = conv { lim_{j→∞} ∇f(x_j) : x_j → x and x_j ∈ D },

i.e., the convex hull of limits of gradients of f at points in D converging to x.

Theorem
If f is continuously differentiable at x, then ∂f(x) = {∇f(x)}.

Differentiable, but nonsmooth

Theorem
If f is differentiable at x, then {∇f(x)} ⊆ ∂f(x) (not necessarily equal).

Considering

  f(x) = x^2 cos(1/x) if x ≠ 0, and f(0) = 0,

one finds that f′(0) = 0, yet [−1, 1] ⊆ ∂f(0).

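A quick numerical illustration of this example: along the sequences x_j → 0 chosen below, the derivative values f′(x_j) = 2 x_j cos(1/x_j) + sin(1/x_j) approach sin(t) for any fixed phase t, so the limits of gradients fill [−1, 1].

    import numpy as np

    def fprime(x):
        return 2.0 * x * np.cos(1.0 / x) + np.sin(1.0 / x)  # valid for x != 0

    # Along x_j = 1/(2 pi j + t), sin(1/x_j) equals sin(t) exactly:
    for t in (np.pi / 2, -np.pi / 2, 0.0):
        xj = 1.0 / (2.0 * np.pi * np.arange(1, 5) + t)
        print(np.round(fprime(xj), 3))  # values approach sin(t) = 1, -1, 0
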
Clarke ε-subdifferential and gradient sampling

As before, we typically cannot compute ∂f(x). It is approximated by the Clarke ε-subdifferential, namely,

  ∂_ε f(x) = conv{∂f(B(x, ε))},

which in turn can be approximated as in

  ∂_ε f(x) ≈ conv{∇f(x_k), ∇f(x_{k,1}), ..., ∇f(x_{k,m})}, where {x_{k,1}, ..., x_{k,m}} ⊂ B(x_k, ε).

In gradient sampling, we compute the minimum-norm element in

  conv{∇f(x_k), ∇f(x_{k,1}), ..., ∇f(x_{k,m})},

which is equivalent to solving

  min_{(x,v) ∈ R^n × R} v + ‖x − x_k‖_2^2
  s.t. f(x_k) + ∇f(x_{k,i})^T (x − x_k) ≤ v for all i ∈ {1, ..., m}.

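A hedged sketch of the sampling step: gradients are collected at x_k and at m nearby points, and the minimum-norm element of their convex hull is found by minimizing ‖G^T w‖^2 over the simplex. SLSQP is used as a stand-in QP solver, and the points are sampled in a box of radius ε rather than the ball, for simplicity; both choices are ours.

    import numpy as np
    from scipy.optimize import minimize

    def min_norm_subgradient(grad_f, xk, eps, m, rng=np.random.default_rng(0)):
        # Randomly sampled points are differentiable points of a locally
        # Lipschitz f with probability one.
        pts = xk + eps * rng.uniform(-1.0, 1.0, size=(m, xk.size))
        G = np.vstack([grad_f(xk)] + [grad_f(p) for p in pts])  # rows: gradients
        k = G.shape[0]

        def sqnorm(w):                      # ||sum_i w_i grad_i||^2
            v = G.T @ w
            return v @ v

        res = minimize(sqnorm, np.full(k, 1.0 / k),
                       bounds=[(0.0, None)] * k,
                       constraints=[{"type": "eq",
                                     "fun": lambda w: w.sum() - 1.0}],
                       method="SLSQP")
        return G.T @ res.x  # approx. min-norm element of conv{gradients}
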
General Framework

Popular and effective method

Despite all I've talked about, a very effective method: BFGS.

Approximate second-order information with gradient displacements:

[Figure: the displacement from x_k to x_{k+1}.]

Secant equation H_k y_k = s_k to match the gradient of f at x_k, where

  s_k := x_{k+1} − x_k and y_k := ∇f(x_{k+1}) − ∇f(x_k).

BFGS-type updates

Inverse Hessian and Hessian approximation updating formulas (requiring s_k^T v_k > 0):

  W_{k+1} ← (I − (v_k s_k^T)/(s_k^T v_k))^T W_k (I − (v_k s_k^T)/(s_k^T v_k)) + (s_k s_k^T)/(s_k^T v_k)

  H_{k+1} ← (I − (s_k s_k^T H_k)/(s_k^T H_k s_k))^T H_k (I − (s_k s_k^T H_k)/(s_k^T H_k s_k)) + (v_k v_k^T)/(s_k^T v_k)

With an appropriate technique for choosing v_k, we attain
- self-correcting properties for {H_k} and {W_k}, and
- (inverse) Hessian approximations that can be used in other algorithms.

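A direct transcription of the W_{k+1} update in Python; choosing v_k = y_k recovers the standard BFGS inverse update, while the talk's self-correcting choice of v_k is omitted here.

    import numpy as np

    def bfgs_inverse_update(W, s, v):
        """W_{k+1} from W_k, step s_k, and v_k, assuming s^T v > 0."""
        rho = 1.0 / (s @ v)
        I = np.eye(s.size)
        V = I - rho * np.outer(v, s)        # I - v s^T / (s^T v)
        return V.T @ W @ V + rho * np.outer(s, s)

    # Sanity check: the update enforces W_{k+1} v_k = s_k.
    rng = np.random.default_rng(0)
    W = np.eye(3)
    s, v = rng.standard_normal(3), rng.standard_normal(3)
    if s @ v > 0:
        print(np.allclose(bfgs_inverse_update(W, s, v) @ v, s))  # True
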
Subproblems in nonsmooth optimization algorithms

With sets of points, scalars, and (sub)gradients

  {x_{k,j}}_{j=1}^m, {f_{k,j}}_{j=1}^m, {g_{k,j}}_{j=1}^m,

nonsmooth optimization methods involve the primal subproblem

  min_{x ∈ R^n} ( max_{j ∈ {1,...,m}} {f_{k,j} + g_{k,j}^T (x − x_{k,j})} + (1/2)(x − x_k)^T H_k (x − x_k) )
  s.t. ‖x − x_k‖ ≤ δ_k,   (P)

but, with G_k ← [g_{k,1} ··· g_{k,m}], it is typically more efficient to solve the dual

  sup_{(ω,γ) ∈ R^m_+ × R^n} −(1/2)(G_k ω + γ)^T W_k (G_k ω + γ) + b_k^T ω − δ_k‖γ‖_*
  s.t. 1_m^T ω = 1.   (D)

The primal solution can then be recovered by

  x_k* ← x_k − W_k g_k, where g_k := G_k ω_k + γ_k.

Algorithm: Self-Correcting Variable-Metric Algorithm for Nonsmooth Optimization
1: Choose x_1 ∈ R^n.
2: Choose a symmetric positive definite W_1 ∈ R^{n×n}.
3: Choose α ∈ (0, 1).
4: for k = 1, 2, ... do
5:   Solve (P)-(D) such that setting
       G_k ← [g_{k,1} ··· g_{k,m}],
       s_k ← −W_k(G_k ω_k + γ_k), and
       x_{k+1} ← x_k + s_k
6:   yields
       f(x_{k+1}) ≤ f(x_k) − (α/2)(G_k ω_k + γ_k)^T W_k (G_k ω_k + γ_k).
7:   Choose v_k (details omitted, but very simple).
8:   Set
       W_{k+1} ← (I − (v_k s_k^T)/(s_k^T v_k))^T W_k (I − (v_k s_k^T)/(s_k^T v_k)) + (s_k s_k^T)/(s_k^T v_k).

Instances of the framework

Cutting plane / bundle methods
- Points added incrementally until sufficient decrease is obtained
- Finite number of additions until an accepted step

Gradient sampling methods
- Points added randomly / incrementally until sufficient decrease is obtained
- Sufficient number of iterations with "good" steps

In any case: convergence guarantees require {W_k} to be uniformly positive definite and bounded on a sufficient number of accepted steps.

C++ implementation: NonOpt

BFGS with weak Wolfe line search:

Name                Exit        ε_end      f(x_end)   #iter  #func  #grad  #subs
maxq                Stationary  +9.77e-05  +2.26e-07    450   1017    452    451
mxhilb              Stepsize    +3.13e-03  +9.26e-02    101   1886    113    102
chained lq          Stepsize    +5.00e-02  -6.93e+01    205   4754    207    206
chained cb3 1       Stepsize    +1.00e-01  +9.80e+01    347   7469    348    348
chained cb3 2       Stepsize    +1.00e-01  +9.80e+01     64   1496     69     65
active faces        Stepsize    +2.50e-02  +2.22e-16     24    672     27     25
brown function 2    Stepsize    +1.00e-01  +2.04e-05    395  17259    396    396
chained mifflin 2   Stepsize    +5.00e-02  -3.47e+01    476  10808    508    477
chained crescent 1  Stepsize    +1.00e-01  +2.18e-01     74   2278     91     75
chained crescent 2  Stepsize    +1.00e-01  +5.86e-02    313   7585    334    314

Bundle method with self-correcting properties:

Name                Exit        ε_end      f(x_end)   #iter  #func  #grad  #subs
maxq                Stationary  +9.77e-05  +1.04e-06    193    441    635    440
mxhilb              Stationary  +9.77e-05  +2.25e-05     39    338    351    137
chained lq          Stationary  +9.77e-05  -6.93e+01     29    374    398    366
chained cb3 1       Stationary  +9.77e-05  +9.80e+01     50   1038   1069   1017
chained cb3 2       Stationary  +9.77e-05  +9.80e+01     29    174    204    173
active faces        Stationary  +9.77e-05  +2.09e-02     17    387    165     32
brown function 2    Stationary  +9.77e-05  +2.49e-03    232  10094   9674   9438
chained mifflin 2   Stationary  +9.77e-05  -3.48e+01    393  24410  19493  18924
chained crescent 1  Stationary  +9.77e-05  +2.73e-04     30     66     92     59
chained crescent 2  Stationary  +9.77e-05  +4.36e-05    137   6679   6140   5997

Thanks!

NonOpt coming soon...
- Andreas could finish in a day...
- ...what has taken me 6 months on sabbatical, so
- it'll be done when he has a free day ;-)

Thanks for listening!