Analysis of Optimization Algorithms via Sum-of-Squares
Sandra S. Y. Tan, Antonios Varvitsiotis (SUTD), Vincent Y. F. Tan
Motivation
First-order optimization algorithms
Iterative algorithms using only (sub)gradient information
• Low computational complexity
• Ideal for large-scale problems with low-accuracy requirements
min_{x ∈ R^n} f(x)

GD: x_{k+1} = x_k − γ_k ∇f(x_k)

Figure 1: Gradient Descent (GD), iterates x_0, x_1, x_2, x_3, x_4 approaching the minimizer x*
Vincent Y. F. Tan 1
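The GD iteration above can be sketched numerically; a minimal example, assuming the concrete quadratic f(x) = ½ xᵀAx (an illustrative instance, not from the slides), so that ∇f(x) = Ax and x* = 0:

```python
import numpy as np

# Minimal GD sketch on the strongly convex quadratic f(x) = 0.5 x^T A x,
# whose minimizer is x* = 0. The diagonal A is an assumed toy instance.
A = np.diag([1.0, 10.0])          # eigenvalues give mu = 1, L = 10
grad = lambda x: A @ x

x = np.array([5.0, 5.0])
gamma = 1.0 / 10.0                # constant step size 1/L
for _ in range(200):
    x = x - gamma * grad(x)       # x_{k+1} = x_k - gamma * grad f(x_k)

print(np.linalg.norm(x))          # distance to x* = 0 after 200 steps
```

With these parameters the slow coordinate contracts by a factor 0.9 per step, so the iterate is driven to the minimizer geometrically.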
First-order optimization algorithms
Iterative algorithms using only (sub)gradient information
• Low computational complexity
• Ideal for large-scale problems with low-accuracy requirements
minx∈Rn
f (x)
GD: xk+1 = xk − γk∇f (xk) x 0
x 1
x 2
x 3 x 4
*
*
Figure 1: Gradient Descent (GD)
Vincent Y. F. Tan 1
Worst-case convergence bounds
Systematically derive the convergence rate of an algorithm A with respect to a function class F:

Find the minimum contraction factor t ∈ (0, 1) such that

  Performance Metric(x_{k+1}) ≤ t · Performance Metric(x_k)

holds for all f ∈ F and all {x_k}_{k≥1} generated by A

Performance metric
• Objective function accuracy f(x_k) − f(x*), also denoted f_k − f*
• Squared distance to optimality ‖x_k − x*‖²
• Squared residual gradient norm ‖∇f(x_k)‖², also denoted ‖g_k‖²
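The contraction factor can be probed empirically; a sketch, assuming random (µ, L)-quadratics f(x) = ½ xᵀAx as the test family (an assumption, since the worst case ranges over all of F):

```python
import numpy as np

# Empirically estimate the contraction factor for the metric f(x_k) - f*
# under GD with constant step gamma = 1/L, on random (mu, L)-quadratics
# f(x) = 0.5 x^T A x with minimizer x* = 0 and f* = 0.
rng = np.random.default_rng(0)
mu, L = 1.0, 10.0
gamma = 1.0 / L

worst = 0.0
for _ in range(100):
    eigs = np.concatenate(([mu, L], rng.uniform(mu, L, 3)))
    A = np.diag(eigs)
    x = rng.standard_normal(5)
    x_next = x - gamma * (A @ x)
    ratio = (0.5 * x_next @ A @ x_next) / (0.5 * x @ A @ x)
    worst = max(worst, ratio)     # empirical (f_{k+1} - f*) / (f_k - f*)

print(worst)                      # any valid contraction factor t must upper-bound this
```

For this family the ratio never exceeds max{(1 − γµ)², (1 − γL)²}, which is the shape of bound the deck derives rigorously via SOS.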
Example

L-smooth, µ-strongly convex ((µ, L)-smooth) function
• ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ ∀ x, y ∈ R^n
• f(x) − (µ/2)‖x‖² is convex

Theorem (de Klerk et al., 2017). Given a (µ, L)-smooth function, apply GD with exact line search:

  f_{k+1} − f* ≤ ((L − µ)/(L + µ))² (f_k − f*)

A sum-of-squares (SOS) proof:

  ((L − µ)/(L + µ))² (f_k − f*) − (f_{k+1} − f*) ≥ (µ/4) ( ‖q_1‖²/(1 + √(µ/L)) + ‖q_2‖²/(1 − √(µ/L)) ) ≥ 0

where q_1 and q_2 are linear functions of x*, x_k, x_{k+1}, g_k, g_{k+1}
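The theorem can be sanity-checked on a quadratic instance (an assumption; the theorem covers every (µ, L)-smooth function). For f(x) = ½ xᵀAx, exact line search has the closed form γ_k = (gᵀg)/(gᵀAg):

```python
import numpy as np

# Check f_{k+1} - f* <= ((L-mu)/(L+mu))^2 (f_k - f*) for GD with exact
# line search on an assumed (mu, L)-quadratic f(x) = 0.5 x^T A x.
rng = np.random.default_rng(1)
mu, L = 1.0, 10.0
t_star = ((L - mu) / (L + mu)) ** 2       # the theorem's contraction factor

A = np.diag(np.concatenate(([mu, L], rng.uniform(mu, L, 3))))
f = lambda x: 0.5 * x @ A @ x             # f* = 0, attained at x* = 0

x = rng.standard_normal(5)
f0 = f(x)
for _ in range(20):
    g = A @ x
    gamma = (g @ g) / (g @ A @ g)         # exact line search step
    x_new = x - gamma * g
    assert f(x_new) <= t_star * f(x) + 1e-12   # per-step contraction
    x = x_new
```

On quadratics this per-step bound is the classical Kantorovich bound for steepest descent; the SOS proof extends it to the whole (µ, L)-smooth class.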
More SOS proofs

GD with constant step size, γ ∈ (0, 2/L) (Lessard et al., 2016)

  ρ²‖x_k − x*‖² − ‖x_{k+1} − x*‖² ≥
    γ(2 − γ(L + µ))/(L − µ) · ‖g_k − µ(x_k − x*)‖²,   γ ∈ (0, 2/(L + µ)]
    γ(γ(L + µ) − 2)/(L − µ) · ‖g_k − L(x_k − x*)‖²,   γ ∈ [2/(L + µ), 2/L)

where ρ := max{|1 − γµ|, |1 − γL|}

GD with the Armijo rule

  (1 − 4µε(1 − ε)/(ηL)) (f_k − f*) − (f_{k+1} − f*) ≥ 2ε(1 − ε)/(η(L − µ)) · ‖g_k + µ(x* − x_k)‖²,

where ε ∈ (0, 1), η > 1 are the parameters of the Armijo rule
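The first regime of the constant-step inequality can be checked numerically; a sketch on an assumed family of (µ, L)-quadratics with x* = 0 and g_k = A x_k:

```python
import numpy as np

# Verify rho^2 ||x_k - x*||^2 - ||x_{k+1} - x*||^2
#   >= gamma (2 - gamma (L+mu)) / (L - mu) * ||g_k - mu (x_k - x*)||^2
# for gamma = 1/L, which lies in (0, 2/(L+mu)].
rng = np.random.default_rng(2)
mu, L = 1.0, 10.0
gamma = 1.0 / L
rho = max(abs(1 - gamma * mu), abs(1 - gamma * L))

worst_margin = np.inf
for _ in range(100):
    A = np.diag(np.concatenate(([mu, L], rng.uniform(mu, L, 3))))
    x = rng.standard_normal(5)            # x* = 0, so g_k = A x
    g = A @ x
    x_next = x - gamma * g
    lhs = rho ** 2 * (x @ x) - x_next @ x_next
    rhs = gamma * (2 - gamma * (L + mu)) / (L - mu) * np.sum((g - mu * x) ** 2)
    worst_margin = min(worst_margin, lhs - rhs)

print(worst_margin)  # should never be meaningfully negative
```

Equality is attained along the extreme eigendirections (eigenvalues µ and L), so the observed margin hugs zero, which is exactly why the certificate is tight.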
More SOS proofs: Proximal Gradient Method (PGM)

PGM with exact line search (Taylor et al., 2018)

  ((L − µ)/(L + µ))² (f_k − f*) − (f_{k+1} − f*) ≥ (1, z)ᵀ Q* (1, z)

where z = (f*, f_k, f_{k+1}, x*, x_k, x_{k+1}, …) and Q* is a PSD matrix

PGM with constant step size, γ ∈ (0, 2/L) (Taylor et al., 2018)

  ρ²‖r_k + s_k‖² − ‖r_{k+1} + s̄_{k+1}‖² ≥
    ρ²‖q_1‖² + (2 − γ(L + µ))/(γ(L − µ)) · ‖q_2‖²,   γ ∈ (0, 2/(L + µ)]
    ρ²‖q_1‖² + (γ(L + µ) − 2)/(γ(L − µ)) · ‖q_3‖²,   γ ∈ [2/(L + µ), 2/L)

where ρ := max{|1 − γµ|, |1 − γL|} and q_1, q_2, q_3 are linear functions of s_k, s̄_{k+1}, r_k, r_{k+1}
Related Work

Performance Estimation Problem (PEP)
Framework to derive worst-case bounds (Drori and Teboulle, 2014)

  max  f_N − f*
  s.t. f ∈ F
       x_{k+1} = A(x_k, f_k, g_k),  k = 0, …, N − 1
       f_0 − f* ≤ R

• GD with exact line search (de Klerk et al., 2017)
• Proximal Gradient Method (PGM) with constant step size (Taylor et al., 2018)
• Proximal Point Algorithm, Conditional Gradient Method, … (Taylor et al., 2017a)
• Formulated as an SDP (exact under certain conditions)
• Inspiration for our current work
Integral Quadratic Constraints (IQCs)

Views optimization algorithms as linear dynamical systems (LDS) with nonlinear feedback (Lessard et al., 2016):

  State:    ξ_{k+1} = A ξ_k + B u_k
  Output:   y_k = C ξ_k + D u_k
  Feedback: u_k = φ(y_k)

• For first-order methods, u_k = φ(y_k) := ∇f(y_k)
• Convergence of algorithm ≡ stability of dynamical system
• Replace φ with quadratic constraints on all instances of (y_k, u_k)
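For GD with constant step the LDS view is concrete: A = I, B = −γI, C = I, D = 0 reproduces x_{k+1} = x_k − γ∇f(x_k). A small sketch on an assumed quadratic f(x) = ½ xᵀHx:

```python
import numpy as np

# GD as a linear dynamical system with nonlinear feedback:
# xi_{k+1} = A xi_k + B u_k, y_k = C xi_k, u_k = phi(y_k) = grad f(y_k).
n, gamma = 3, 0.1
A_sys, B_sys, C_sys = np.eye(n), -gamma * np.eye(n), np.eye(n)

H = np.diag([1.0, 5.0, 10.0])            # f(x) = 0.5 x^T H x, grad f(x) = H x
phi = lambda y: H @ y                    # the feedback nonlinearity

xi = np.ones(n)                          # LDS state
x = xi.copy()                            # plain GD iterate, for comparison
for _ in range(50):
    u = phi(C_sys @ xi)                  # feedback through phi
    xi = A_sys @ xi + B_sys @ u          # linear dynamics
    x = x - gamma * (H @ x)              # plain GD
assert np.allclose(xi, x)                # the two viewpoints coincide
```

Stability of this feedback loop (the state contracting to 0, the minimizer) is exactly convergence of GD, which is what the IQC machinery certifies.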
The Sum-of-Squares Approach
SDP formulation of sum-of-squares optimization

Sum-of-squares of degree-2 polynomials. For z = (x, y) and d = 2,

  p(z) = (x² − 2)² + (x − 3y)²
       = x⁴ − 3x² − 6xy + 9y² + 4
       = (1, x, y, x², xy, y²)ᵀ Q (1, x, y, x², xy, y²)   where Q ⪰ 0

• Match coefficients of 1, x, y, x², …, y⁴ ⟹ affine constraints

  1:  4 = Q_{1,1}
  x:  0 = Q_{1,2} + Q_{2,1} = 2 Q_{1,2}

• Affine constraints + Q ⪰ 0 condition ⟹ SDP feasibility problem

  min 0  s.t.  Q ⪰ 0, affine constraints
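The Gram-matrix decomposition above can be made concrete: since p is itself a sum of two squares, one valid PSD Q is built directly from their coefficient vectors (a sketch, exhibiting one Q among many):

```python
import numpy as np

# p(z) = (x^2 - 2)^2 + (x - 3y)^2 equals m(z)^T Q m(z) with monomial
# vector m = (1, x, y, x^2, xy, y^2) and Q = a a^T + b b^T >= 0.
a = np.array([-2.0, 0, 0, 1, 0, 0])      # x^2 - 2 in the monomial basis
b = np.array([0.0, 1, -3, 0, 0, 0])      # x - 3y in the monomial basis
Q = np.outer(a, a) + np.outer(b, b)      # PSD by construction

rng = np.random.default_rng(3)
for _ in range(10):
    x, y = rng.standard_normal(2)
    m = np.array([1, x, y, x ** 2, x * y, y ** 2])
    p = x ** 4 - 3 * x ** 2 - 6 * x * y + 9 * y ** 2 + 4
    assert np.isclose(m @ Q @ m, p)      # Gram representation matches p
```

The SDP in the slide searches over all such Q satisfying the coefficient-matching constraints; any PSD solution certifies that p is a sum of squares.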
Certifying constrained nonnegativity

Closed semi-algebraic set

  K = {z : h_i(z) ≥ 0, v_j(z) = 0 ∀ i, j}

for some polynomials h_i(z), v_j(z)

"Constrained" nonnegativity. To certify that p(z) ≥ 0 for all z ∈ K (Parrilo, 2003; Lasserre, 2007), write

  p(z) = σ_0(z) + Σ_i σ_i(z) h_i(z) + Σ_j θ_j(z) v_j(z)

where σ_0(z) and the σ_i(z) are sum-of-squares (SOS) polynomials, σ(z) = Σ_k q_k²(z); on K each h_i(z) ≥ 0 and each v_j(z) = 0, so the right-hand side is nonnegative

• Fix degrees of the σ_i(z) and θ_j(z)
• Construct and solve the corresponding SDP feasibility problem
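A toy certificate of this shape (an illustrative assumption, not from the slides): p(x) = x³ is nonnegative on K = {x : h_1(x) = x ≥ 0}, certified by p = σ_0 + σ_1·h_1 with σ_0 = 0 and σ_1(x) = x², a single square and hence SOS:

```python
import numpy as np

# A minimal constrained-nonnegativity certificate: x^3 = 0 + x^2 * x,
# so x^3 >= 0 wherever h1(x) = x >= 0.
sigma0 = lambda x: 0.0                   # the SOS "free" term
sigma1 = lambda x: x ** 2                # SOS multiplier for h1
h1 = lambda x: x                         # constraint defining K

for x in np.linspace(0.0, 5.0, 11):
    p = x ** 3
    cert = sigma0(x) + sigma1(x) * h1(x)
    assert np.isclose(cert, p)           # the identity holds for all x
    assert cert >= 0.0                   # on K every term is nonnegative
```

Finding such multipliers automatically, with the degrees fixed, is precisely the SDP feasibility problem in the bullet above.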
Analyzing optimization algorithms

  min  t
  s.t. t(f_k − f*) − (f_{k+1} − f*) ≥ 0
       t ∈ (0, 1)
       f ∈ F
       x_{k+1} = A(x_k, f_k, g_k)

where f_k = f(x_k) and g_k ∈ ∂f(x_k) for all k

Vincent Y. F. Tan 10
First relaxation step

  min  t
  s.t. t(f_k − f*) − (f_{k+1} − f*) ≥ 0
       t ∈ (0, 1)
       f ∈ F
       x_{k+1} = A(x_k, f_k, g_k)

where f_k = f(x_k) and g_k ∈ ∂f(x_k) for all k

1. Identify polynomial inequalities h_i(z) ≥ 0 and equalities v_j(z) = 0 necessarily satisfied given F and A
First relaxation step

  min  t
  s.t. t(f_k − f*) − (f_{k+1} − f*) ≥ 0
       t ∈ (0, 1)
       h_i(z) ≥ 0 ∀ i,  v_j(z) = 0 ∀ j   (defining K)

where z = (f*, f_k, f_{k+1}, x*, x_k, x_{k+1}, g*, g_k, g_{k+1}) ∈ R^{6n+3}

1. Identify polynomial inequalities h_i(z) ≥ 0 and equalities v_j(z) = 0 necessarily satisfied given F and A

E.g. f is L-smooth ⟹ ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ ∀ x, y
                   ⟹ h_1(z) = L²‖x_k − x*‖² − ‖g_k − g*‖² ≥ 0
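The derived constraint h_1(z) ≥ 0 is easy to check on a concrete instance; a sketch on an assumed (µ, L)-quadratic, where ∇f(x) = Ax and the spectrum of A lies in [µ, L]:

```python
import numpy as np

# L-smoothness implies h1(z) = L^2 ||x_k - x*||^2 - ||g_k - g*||^2 >= 0.
rng = np.random.default_rng(4)
mu, L = 1.0, 10.0
A = np.diag(np.concatenate(([mu, L], rng.uniform(mu, L, 3))))   # grad f(x) = A x

x_star = np.zeros(5)                     # minimizer, so g* = 0
for _ in range(100):
    xk = rng.standard_normal(5)
    gk = A @ xk
    h1 = L ** 2 * np.sum((xk - x_star) ** 2) - np.sum(gk ** 2)
    assert h1 >= -1e-9                   # ||g_k - g*|| <= L ||x_k - x*||
```

Each such implication turns one assumption in F or A into one polynomial generator of the semi-algebraic set K.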
First relaxation step

  min  t
  s.t. t(f_k − f*) − (f_{k+1} − f*) ≥ 0  ∀ z ∈ K
       t ∈ (0, 1)

where K = {z : h_i(z) ≥ 0, v_j(z) = 0 ∀ i, j}
Second relaxation step

  min  t
  s.t. p(z) = σ_0(z) + Σ_i σ_i(z) h_i(z) + Σ_j θ_j(z) v_j(z)
       t ∈ (0, 1)

where K = {z : h_i(z) ≥ 0, v_j(z) = 0 ∀ i, j}

2. Relax the constraint that p(z) := t(f_k − f*) − (f_{k+1} − f*) is nonnegative over K with an SOS certificate
Obtaining the bound

1. Identify polynomial (in)equalities necessarily satisfied given F and A:

  f ∈ F, x_{k+1} = A(x_k, f_k, g_k)  ⟹  h_i(z) ≥ 0 ∀ i,  v_j(z) = 0 ∀ j

2. Relax the nonnegativity constraint with an SOS certificate, i.e., relax

  p(z) ≥ 0 ∀ z ∈ K   to   p(z) = σ_0(z) + Σ_i σ_i(z) h_i(z) + Σ_j θ_j(z) v_j(z)

for SOS polynomials σ_i(z) and arbitrary polynomials θ_j(z)

3. Solve the SDP multiple times for the n = 1 case and 'guess' the analytic expression of the optimal contraction factor

  t* = min {t : t ∈ (0, 1), SOS constraints}

4. Generalize this to a feasible SDP solution for the multivariate case
Example

Function class
Class of (µ, L)-smooth functions (denoted F_{µ,L}(R^n))

Theorem (F_{µ,L}-interpolability (Taylor et al., 2017b)). Given a set {(x_i, f_i, g_i)}_{i∈I}, there exists a (µ, L)-smooth function f with f_i = f(x_i) and g_i ∈ ∂f(x_i) for all i iff

  f_i − f_j − g_jᵀ(x_i − x_j) ≥ L/(2(L − µ)) [ (1/L)‖g_i − g_j‖² + µ‖x_i − x_j‖² − 2(µ/L)(g_j − g_i)ᵀ(x_j − x_i) ]   ∀ i ≠ j

E.g. if we set i = k and j = k + 1:

  f_k − f_{k+1} − g_{k+1}ᵀ(x_k − x_{k+1}) − L/(2(L − µ)) [ (1/L)‖g_k − g_{k+1}‖² + µ‖x_k − x_{k+1}‖² − 2(µ/L)(g_{k+1} − g_k)ᵀ(x_{k+1} − x_k) ] ≥ 0
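The interpolation inequality can be spot-checked on a (µ, L)-quadratic (an assumed concrete instance; the theorem characterizes the whole class):

```python
import numpy as np

# For f(x) = 0.5 x^T A x with spectrum in [mu, L], data sampled from f
# must satisfy the interpolation inequality at every pair of points.
rng = np.random.default_rng(5)
mu, L = 1.0, 10.0
A = np.diag(np.concatenate(([mu, L], rng.uniform(mu, L, 3))))
f = lambda x: 0.5 * x @ A @ x
g = lambda x: A @ x

worst = np.inf
for _ in range(100):
    xi, xj = rng.standard_normal(5), rng.standard_normal(5)
    gi, gj = g(xi), g(xj)
    lhs = f(xi) - f(xj) - gj @ (xi - xj)
    rhs = L / (2 * (L - mu)) * (
        (1 / L) * np.sum((gi - gj) ** 2) + mu * np.sum((xi - xj) ** 2)
        - 2 * (mu / L) * (gj - gi) @ (xj - xi)
    )
    worst = min(worst, lhs - rhs)        # margin of the inequality

print(worst)
```

On quadratics the margin per eigendirection is proportional to −(λ − µ)(λ − L) ≥ 0, so the inequality is tight exactly at the extreme curvatures.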
Algorithm

Gradient Descent (GD) with exact line search (ELS)

  γ_k = argmin_{γ>0} f(x_k − γ g_k)
  x_{k+1} = x_k − γ_k g_k

where g_k = ∇f(x_k)

Derived constraints

  g_{k+1}ᵀ(x_{k+1} − x_k) = 0
  g_{k+1}ᵀ g_k = 0
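Both derived constraints follow from first-order optimality of the line search and hold to round-off on a quadratic instance (an assumption used only to get a closed-form step):

```python
import numpy as np

# For f(x) = 0.5 x^T A x, exact line search gives gamma = (g^T g)/(g^T A g),
# and the successive-gradient orthogonality constraints follow.
rng = np.random.default_rng(6)
A = np.diag([1.0, 3.0, 10.0])
x = rng.standard_normal(3)

g = A @ x
gamma = (g @ g) / (g @ A @ g)            # exact line search step
x_next = x - gamma * g
g_next = A @ x_next

assert abs(g_next @ (x_next - x)) < 1e-9   # g_{k+1}^T (x_{k+1} - x_k) = 0
assert abs(g_next @ g) < 1e-9              # g_{k+1}^T g_k = 0
```

These two identities are exactly the equality generators v_j(z) = 0 fed into the SOS certificate on the next slide.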
Applying the SOS approach

Necessary constraints
• 6 F_{µ,L}-interpolability constraints corresponding to k, k + 1, * (the h_i(z) ≥ 0)
• 2 algorithmic constraints (the v_j(z) = 0):

  g_{k+1}ᵀ(x_{k+1} − x_k) = 0
  g_{k+1}ᵀ g_k = 0

The SDP

  min  t
  s.t. p(z) = σ_0(z) + Σ_i σ_i(z) h_i(z) + Σ_j θ_j(z) v_j(z)
       t ∈ (0, 1)
       σ_0(z): SOS of linear polynomials
       σ_i(z), i ≥ 1: SOS of degree 0,  θ_j(z): degree 0

With these degree choices, σ_0(z) = (1, z)ᵀ Q (1, z) for some Q ⪰ 0, the σ_i become scalars σ_i ≥ 0, and the θ_j become scalars θ_j ∈ R, so the SDP reads

  min  t
  s.t. p(z) = (1, z)ᵀ Q (1, z) + Σ_i σ_i h_i(z) + Σ_j θ_j v_j(z)
       t ∈ (0, 1)
       Q ⪰ 0
       σ_i ≥ 0, i ≥ 1,  θ_j ∈ R
Deriving the affine constraints

  t(f_k − f*) − (f_{k+1} − f*) = (1, z)ᵀ Q (1, z) + Σ_i σ_i h_i(z) + Σ_j θ_j v_j(z)

Matching coefficients for e.g. the f* term:

  1 − t = 2Q_{1,2} − σ_2 − σ_4 + σ_5 + σ_6

Constraint h_2(z) ≥ 0:

  f_k − f* − L/(2(L − µ)) [ (1/L)‖g_k‖² + µ‖x_k − x*‖² + 2(µ/L) g_kᵀ(x* − x_k) ] ≥ 0
Deriving the affine constraints

  t(f_k − f*) − (f_{k+1} − f*) = (1, z)ᵀ Q (1, z) + Σ_i σ_i h_i(z) + Σ_j θ_j v_j(z)

  1:        0 = Q_{1,1}
  f*:    1 − t = 2Q_{1,2} − σ_2 − σ_4 + σ_5 + σ_6
  f_k:       t = 2Q_{1,3} + σ_1 + σ_2 − σ_3 − σ_5
  f_{k+1}: −1 = 2Q_{1,4} − σ_1 + σ_3 + σ_4 − σ_6
  ⋮
Solving the SDP

  min  t
  s.t. p(z) = (1, z)ᵀ Q (1, z) + Σ_i σ_i h_i(z) + Σ_j θ_j v_j(z)
       t ∈ (0, 1)
       Q ⪰ 0
       σ_i ≥ 0,  θ_j ∈ R

Guessing the optimal solution

  t* = ((L − µ)/(L + µ))²,   Q* = [ 0_{4×4}  0_{4×5} ; 0_{5×4}  Q*_{5×5} ],   σ*_1 = (L − µ)/(L + µ),  …
  Q*_{5×5} (symmetric; upper triangle shown) =

  [ 2L²µ/(2(L+µ)²(L−µ))   −Lµ/(2(L+µ)²)         −Lµ/(2(L+µ)(L−µ))   Lµ/(L+µ)²            Lµ/((L+µ)(L−µ))
    ·                      Lµ(L+3µ)/(2(L+µ)²)    −Lµ/(2(L+µ))        −µ(3L+µ)/(2(L+µ)²)   −µ/(2(L+µ))
    ·                      ·                      Lµ/(2(L−µ))          µ/(2(L+µ))          −µ/(2(L−µ))
    ·                      ·                      ·                    (L+3µ)/(2(L+µ)²)     1/(2(L+µ))
    ·                      ·                      ·                    ·                    1/(2(L−µ)) ]
Multivariate case

Recall z = (f*, f_k, f_{k+1}, x*, x_k, x_{k+1}, g*, g_k, g_{k+1})

Let z = (z_0, z_1, …, z_n) where

  z_0 = (f*, f_k, f_{k+1})  and  z_ℓ = (x*(ℓ), x_k(ℓ), x_{k+1}(ℓ), g*(ℓ), g_k(ℓ), g_{k+1}(ℓ))

Require the polynomials p(z), h_i(z) and v_j(z) to satisfy

  pol(z) = pol_0(z_0) + Σ_{ℓ=1}^{n} pol_1(z_ℓ)

• Fulfilled when the polynomials are linear in the f's and in the inner products of the x's and g's, e.g.

  f_k − f* − L/(2(L − µ)) [ (1/L)‖g_k‖² + µ‖x_k − x*‖² + 2(µ/L) g_kᵀ(x* − x_k) ] ≥ 0
Multivariate case

Univariate case

  p(z) = (1, z_0, z_1)ᵀ blkdiag(0, Q_0, Q_1) (1, z_0, z_1) + Σ_i σ_i h_i(z) + Σ_j θ_j v_j(z)

Multivariate case

  p(z) = (1, z_0, z_1, …, z_n)ᵀ blkdiag(0, Q_0, Q_1 ⊗ I_n) (1, z_0, z_1, …, z_n) + Σ_i σ_i h_i(z) + Σ_j θ_j v_j(z)

SOS certificate for the univariate case ⟹ SOS certificate for the multivariate case
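The Kronecker lift works because, with z stacked variable-block-wise (all n coordinates of the first variable, then the second, and so on), the lifted quadratic form decouples across coordinates; a quick numerical check (generic symmetric Q_1, an arbitrary test instance):

```python
import numpy as np

# z^T (Q1 kron I_n) z = sum over coordinates l of z_l^T Q1 z_l,
# where z_l collects the l-th coordinate of each of the m variables.
rng = np.random.default_rng(7)
m, n = 6, 4                               # 6 per-coordinate variables, dimension n
Q1 = rng.standard_normal((m, m))
Q1 = Q1 + Q1.T                            # symmetrize for a quadratic form

Z = rng.standard_normal((m, n))           # column l is z_l
z = Z.reshape(-1)                         # row-major stack: variable-major order
lifted = z @ np.kron(Q1, np.eye(n)) @ z
per_coord = sum(Z[:, l] @ Q1 @ Z[:, l] for l in range(n))
assert np.isclose(lifted, per_coord)
```

So positive semidefiniteness of the small block Q_1, found for n = 1, immediately yields a PSD certificate for every dimension n.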
Obtaining the SOS certificate

  t*(f_k − f*) − (f_{k+1} − f*)
    (1)=  (1, z)ᵀ Q* (1, z) + Σ_i σ*_i h_i(z) + Σ_j θ*_j v_j(z)
    (2)≥  (1, z)ᵀ Q* (1, z)
    (3)=  (µ/4) ( ‖q_1‖²/(1 + √(µ/L)) + ‖q_2‖²/(1 − √(µ/L)) )

1. Verifying the solution to be feasible
2. Since σ*_i, h_i(z) ≥ 0 and v_j(z) = 0
3. Expanding the quadratic term into its SOS decomposition
Revisiting the PEP

The PEP as an SDP

  max  f_N − f*
  s.t. f ∈ F
       x_{k+1} = A(x_k, f_k, g_k),  k = 0, …, N − 1
       f_0 − f* ≤ R

• Assume a black-box model using an oracle O_f, i.e., replace f ∈ F by: ∃ f ∈ F s.t. O_f(x_i) = {f_i, g_i} ∀ i
• Identify polynomial inequalities h_i(z) ≥ 0 and equalities v_j(z) = 0 necessarily satisfied given F and A
• Linear in the f's and in the inner products of the x's and g's ⟹ Gram matrices
• Formulated as an SDP
The PEP as an SDP
max fN − f∗
s.t. hi (z) ≥ 0, ∀ivj(z) = 0, ∀jf0 − f∗ ≤ R
• Assume black-box model using oracle Of
• Identify polynomial inequalities hi (z) ≥ 0 and equalities vj(z) = 0necessarily satisfied given F and A• Linear in f ’s and in the inner products of x’s and g’s =⇒ Gram
matrices
• Formulated as an SDP
Vincent Y. F. Tan 24
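The Gram-matrix lifting in the last bullet can be illustrated numerically: the constraints involve the iterates and gradients only through their inner products, inner products of any family of vectors form a PSD matrix, and any PSD matrix factors back into vectors. A minimal sketch (the vectors and dimensions are illustrative, not from the paper):

```python
import numpy as np

# The PEP constraints are linear in the function values and in the inner
# products of the iterates/gradients, so the (dimension-dependent) vectors
# can be replaced by their Gram matrix G, constrained only to be PSD.
rng = np.random.default_rng(0)
vecs = rng.standard_normal((4, 10))          # e.g. x0 - x*, g0, g1, g2 in R^10
G = vecs @ vecs.T                            # 4x4 matrix of inner products

# Any Gram matrix is symmetric PSD ...
assert np.all(np.linalg.eigvalsh(G) >= -1e-10)

# ... and conversely any PSD matrix is the Gram matrix of some vectors,
# recovered here from an eigendecomposition (dimension = rank of G)
w, V = np.linalg.eigh(G)
recovered = V * np.sqrt(np.clip(w, 0.0, None))   # rows = recovered vectors
assert np.allclose(recovered @ recovered.T, G)
```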
Relation between PEP and SOS

(PEP)  max fN − f∗
       s.t. {xi} generated by A
            f ∈ F
            f0 − f∗ ≤ R
  ↓
N-step PEP-SDP
  ↓
1-step PEP-SDP

(SOS)  min t
       s.t. t(fk − f∗) − (fk+1 − f∗) ≥ 0,
            ∀ {xi} generated by A and f ∈ F
  ↓
Deg-d SOS-SDP
  ↓
Deg-1 SOS-SDP

1-step PEP-SDP ⇐= Dual =⇒ Deg-1 SOS-SDP
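The Deg-d SOS-SDP searches for a PSD matrix certifying polynomial nonnegativity. As a generic illustration of what such a certificate looks like (using Parrilo's classic example rather than the paper's polynomial):

```python
import numpy as np

def p(x, y):                                  # target polynomial
    return 2*x**4 + 2*x**3*y - x**2*y**2 + 5*y**4

# SOS certificate: p = m^T Q m for monomials m = (x^2, y^2, xy) and PSD Q,
# here Q = 0.5 * L^T L with the two squares' coefficient rows in L
Lmat = np.array([[2.0, -3.0, 1.0],            # 2x^2 - 3y^2 + xy
                 [0.0,  1.0, 3.0]])           # y^2 + 3xy
Q = 0.5 * Lmat.T @ Lmat
assert np.all(np.linalg.eigvalsh(Q) >= -1e-12)    # Q is PSD => p is SOS

# Check the identity p(x, y) == m^T Q m at random points
rng = np.random.default_rng(1)
for x, y in rng.standard_normal((50, 2)):
    m = np.array([x**2, y**2, x*y])
    assert np.isclose(p(x, y), m @ Q @ m)
```

Finding such a Q (rather than verifying a known one) is exactly the semidefinite feasibility problem the SOS-SDP hierarchy solves.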
Results
Summary of Results

Form of bounds: Performance metric(xk+1) ≤ t∗ Performance metric(xk)

Let ε and η denote the parameters of the Armijo rule, and ργ := max{|1 − γµ|, |1 − γL|}

Method | Step size γ   | Metric      | Rate t∗              | Known?
GD     | ELS           | fk − f∗     | ((L − µ)/(L + µ))²   | de Klerk et al. (2017)
GD     | γ ∈ (0, 2/L)  | ‖xk − x∗‖²  | ρ²γ                  | Lessard et al. (2016)
GD     | Armijo rule   | fk − f∗     | 1 − 4µε(1 − ε)/(ηL)  | New
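The fixed-step rate ρ²γ can be sanity-checked on a simple quadratic, where each GD step scales the error coordinate-wise (the quadratic and constants below are illustrative, not from the paper):

```python
import numpy as np

# rho_gamma = max{|1 - gamma*mu|, |1 - gamma*L|} from the table above,
# checked on f(x) = 0.5 x^T D x with eigenvalues of D in [mu, L] (x* = 0):
# a GD step multiplies coordinate i of the error by (1 - gamma * d_i).
mu, L, gamma = 1.0, 10.0, 0.15               # gamma in (0, 2/L)
rho = max(abs(1 - gamma * mu), abs(1 - gamma * L))

d = np.linspace(mu, L, 5)                    # eigenvalues of D
x = np.ones(5)                               # x0 - x*
for _ in range(20):
    x_next = x - gamma * d * x               # GD step, grad f(x) = D x
    assert np.sum(x_next**2) <= rho**2 * np.sum(x**2) + 1e-12
    x = x_next
```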
GD with Armijo-terminated Line Search

Our result tnew vs. known contraction factors tLY (Luenberger and Ye, 2016) and tnemi (Nemirovski, 1999):

tnew = 1 − 4ε(1 − ε)/(ηκ)

tLY = 1 − 2ε/(ηκ)

tnemi = [κ − (2 − ε⁻¹)(1 − ε)η⁻¹] / [κ + (ε⁻¹ − 1)η⁻¹]

where κ = L/µ is the condition number.

Figure 2: tnew and tLY vs. condition number κ for ε = 0.25, η = 1.001
Figure 3: tnew and tnemi vs. condition number κ for ε = 0.5, η = 1.001
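Plugging representative values into the three formulas confirms the ordering seen in the figures, with tnew the smallest (fastest) factor (the values κ = 10, η = 1.001 are illustrative):

```python
# Contraction factors from the slide; smaller t means a faster guaranteed rate
def t_new(kappa, eps, eta):
    return 1 - 4*eps*(1 - eps)/(eta*kappa)

def t_ly(kappa, eps, eta):
    return 1 - 2*eps/(eta*kappa)

def t_nemi(kappa, eps, eta):
    return (kappa - (2 - 1/eps)*(1 - eps)/eta) / (kappa + (1/eps - 1)/eta)

kappa, eta = 10.0, 1.001
assert t_new(kappa, 0.25, eta) < t_ly(kappa, 0.25, eta) < 1    # Fig. 2 regime
assert t_new(kappa, 0.5, eta) < t_nemi(kappa, 0.5, eta) < 1    # Fig. 3 regime
```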
A new function class: Composite functions

Composite convex minimization problem

min x∈Rⁿ {f(x) := a(x) + b(x)},

where a ∈ Fµ,L and b is closed, convex and proper.

Further assumptions
Assume that the proximal operator of b,

proxγb(x) := argmin y∈Rⁿ { γb(y) + (1/2)‖x − y‖² },

exists at every x.
Algorithm

Proximal gradient method (PGM) with constant step size γ > 0

xk+1 = proxγb(xk − γ∇a(xk)),

where 0 < γ ≤ 2/L.

PGM with exact line search

γk = argmin γ>0 f[proxγb(xk − γ∇a(xk))]
xk+1 = proxγb(xk − γk∇a(xk))
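A minimal sketch of PGM with constant step size, assuming the common special case b(x) = λ‖x‖₁, whose prox is coordinate-wise soft-thresholding (the lasso instance and all names here are illustrative, not from the paper):

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * ||.||_1: shrink each coordinate toward zero by tau
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pgm(grad_a, x0, gamma, lam, iters=200):
    # x_{k+1} = prox_{gamma*b}(x_k - gamma * grad a(x_k)), b = lam * ||.||_1
    x = x0
    for _ in range(iters):
        x = soft_threshold(x - gamma * grad_a(x), gamma * lam)
    return x

# Example instance: a(x) = 0.5 * ||Ax - y||^2 (smooth, strongly convex
# when A has full column rank), b(x) = 0.1 * ||x||_1
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
L_a = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad a
x_hat = pgm(lambda x: A.T @ (A @ x - y), np.zeros(5), 1.0 / L_a, lam=0.1)

# With step size 1/L the composite objective is monotonically nonincreasing
obj = lambda x: 0.5*np.linalg.norm(A @ x - y)**2 + 0.1*np.linalg.norm(x, 1)
assert obj(x_hat) <= obj(np.zeros(5))
```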
Summary of Results

Form of bounds: Performance metric(xk+1) ≤ t∗ Performance metric(xk)

Let gk denote a (sub)gradient of f at xk, and ργ := max{|1 − γµ|, |1 − γL|}

Method | Step size γ   | Metric   | Rate t∗             | Known?
PGM    | γ ∈ (0, 2/L)  | ‖gk‖²    | ρ²γ                 | Taylor et al. (2018)
PGM    | ELS           | fk − f∗  | ((L − µ)/(L + µ))²  | Taylor et al. (2018)
Conclusion

Introduced an SDP hierarchy for bounding convergence rates via SOS certificates
⇒ First level coincides with the PEP

Future work
• Other function classes
• Other algorithms
• Understanding when SOS certificates cannot be found

Links
arXiv paper: http://arxiv.org/abs/1906.04648
GitHub code: https://github.com/sandratsy/SumsOfSquares
Thank you!
References

de Klerk, E., Glineur, F., and Taylor, A. B. (2017). On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optimization Letters, 11(7):1185–1199.

Drori, Y. and Teboulle, M. (2014). Performance of first-order methods for smooth convex minimization: A novel approach. Mathematical Programming, 145(1):451–482.

Lasserre, J. (2007). A sum of squares approximation of nonnegative polynomials. SIAM Review, 49(4):651–669.

Lessard, L., Recht, B., and Packard, A. (2016). Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization, 26(1):57–95.

Luenberger, D. G. and Ye, Y. (2016). Linear and Nonlinear Programming. Springer International Publishing, 4th edition.

Nemirovski, A. (1999). Optimization II: Numerical methods for nonlinear continuous optimization. https://www2.isye.gatech.edu/~nemirovs/Lect_OptII.pdf.

Parrilo, P. A. (2003). Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming, 96(2):293–320.

Taylor, A., Hendrickx, J., and Glineur, F. (2017a). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Taylor, A. B., Hendrickx, J. M., and Glineur, F. (2017b). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1):307–345.

Taylor, A. B., Hendrickx, J. M., and Glineur, F. (2018). Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2):455–476.
Taylor, A. B., Hendrickx, J. M., and Glineur, F. (2018). Exact worst-caseconvergence rates of the proximal gradient method for compositeconvex minimization. Journal of Optimization Theory andApplications, 178(2):455–476.