
Page 1

Advanced Convex Optimization (PGMO)

Yurii Nesterov, CORE/INMA (UCL)

January 20-22, 2016 (Ecole Polytechnique, Paris)

Page 2

Structure of the course

Main goals:

Theoretical justification of efficiency of optimization methods.

No gap between theory and practice.

Part 1: Black-Box Optimization

Lecture 1. Complexity of Black-Box Optimization

Difficult problems

Lower complexity bounds for Convex Optimization

Optimal methods

Lecture 2. Second order methods. Systems of nonlinear equations

Globally convergent second-order schemes

Cubic regularization for Newton Method

Modified Gauss-Newton method

Page 15

Part 2: Structural Optimization

Lecture 3. Interior-point methods

Self-concordant functions

Self-concordant barriers

Application examples

Lecture 4. Smoothing Technique

Explicit model of objective function

Smoothing

Application examples

Lecture 5. Huge-scale optimization

Sparsity in optimization problems

Coordinate-descent schemes

Gradient methods with sublinear cost of iteration

Page 28

References

Books:

Yu. Nesterov. Introductory Lectures on Convex Optimization. Kluwer, Boston, 2004.

Yu. Nesterov, A. Nemirovskii. Interior Point Polynomial Methods in Convex Programming: Theory and Applications. SIAM, Philadelphia, 1994.

Page 29

Papers

1. Yu. Nesterov. Subgradient methods for huge-scale optimization problems. Mathematical Programming, 146(1-2), 275-297 (2014).

2. Yu. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1), 125-161 (2013).

3. Yu. Nesterov. Efficiency of coordinate-descent methods on huge-scale optimization problems. SIOPT, 22(2), 341-362 (2012).

4. Yu. Nesterov. Simple bounds for boolean quadratic problems. EUROPT Newsletters, 18, 19-23 (2009).

5. Yu. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1), 261-283 (2009).

Page 30

6. Yu. Nesterov. Accelerating the cubic regularization of Newton’s method on convex problems. Mathematical Programming, 112(1), 159-181 (2008).

7. Yu. Nesterov, J.-Ph. Vial. Confidence level solutions for stochastic programming. Automatica, 44(6), 1559-1568 (2008).

8. Yu. Nesterov. Modified Gauss-Newton scheme with worst-case guarantees for its global performance. Optimization Methods and Software, 22(3), 469-483 (2007).

9. Yu. Nesterov. Smoothing technique and its applications in semidefinite optimization. Mathematical Programming, 110(2), 245-259 (2007).

Page 31

10. Yu. Nesterov. Dual extrapolation and its application for solving variational inequalities and related problems. Mathematical Programming, 109(2-3), 319-344 (2007).

11. Yu. Nesterov, B. Polyak. Cubic regularization of Newton’s method and its global performance. Mathematical Programming, 108(1), 177-205 (2006).

12. Yu. Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim., 16(1), 235-249 (2005).

13. Yu. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming (A), 103(1), 127-152 (2005).

Page 32

Advanced Convex Optimization (PGMO 2016)

Lecture 1. Intrinsic complexity of Black-Box Optimization

Yurii Nesterov, CORE/INMA (UCL)

January 20-22, 2016 (Ecole Polytechnique, Paris)

Page 33

Outline

1 Basic NP-hard problem

2 NP-hardness of some popular problems

3 Lower complexity bounds for Global Minimization

4 Nonsmooth Convex Minimization. Subgradient scheme.

5 Smooth Convex Minimization. Lower complexity bounds

6 Methods for Smooth Minimization with Simple Constraints

Page 34

Standard Complexity Classes

Let data be coded in matrix A, and n be dimension of the problem.

Combinatorial Optimization

NP-hard problems: 2^n operations. Solvable in O(p(n)·‖A‖).

Fully polynomial approximation schemes: O(p(n)·ε^{-k}·ln^α ‖A‖).

Polynomial-time problems: O(p(n)·ln^α ‖A‖).

Continuous Optimization

Sublinear complexity: O(p(n)·ε^{-α}·‖A‖^β), α, β > 0.

Polynomial-time complexity: O(p(n)·ln(‖A‖/ε)).

Page 35

Basic NP-hard problem: Problem of stones

Given n stones of integer weights a_1, . . . , a_n, decide whether they can be divided into two parts of equal weight.

Mathematical formulation

Find a Boolean solution x_i = ±1, i = 1, . . . , n, to a single linear equation: ∑_{i=1}^n a_i x_i = 0.

Another variant: ∑_{i=2}^n a_i x_i = a_1.

NB: Solvable in O( ln n · ∑_{i=1}^n |a_i| ) operations by the FFT.
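As a side illustration (not the FFT-based approach mentioned above), a minimal Python sketch that checks the partition property by a standard pseudo-polynomial dynamic program over reachable subset sums; the sample weights are arbitrary.

```python
# Pseudo-polynomial check of the problem of stones: time polynomial in sum(|a_i|),
# although the problem is NP-hard in the bit-length of the data.
def can_split_equally(a):
    total = sum(a)
    if total % 2:                      # an odd total can never be split evenly
        return False
    reachable = {0}                    # subset sums reachable so far
    for w in a:
        reachable |= {s + w for s in reachable}
    return total // 2 in reachable

print(can_split_equally([3, 1, 1, 2, 2, 1]))  # True: {3, 2} vs {1, 1, 2, 1}
```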

Page 36

Immediate consequence: quartic polynomial

Theorem: Minimization of a quartic polynomial of n variables is NP-hard.

Proof: Consider the following function:

f(x) = ∑_{i=1}^n x_i^4 − (1/n)( ∑_{i=1}^n x_i^2 )^2 + ( ∑_{i=1}^n a_i x_i )^4 + (1 − x_1)^4.

The first part is 〈A[x]^2, [x]^2〉, where A = I − (1/n) e_n e_n^T ⪰ 0 with A e_n = 0, and ([x]^2)_i = x_i^2, i = 1, . . . , n.

Thus, f(x) = 0 iff all x_i = τ, ∑_{i=1}^n a_i x_i = 0, and x_1 = 1.

Corollary: Minimization of a convex quartic polynomial over the unit sphere is NP-hard.

Page 37

Nonlinear Optimal Control: NP-hard

Problem: min_u { f(x(1)) : x′ = g(x, u), 0 ≤ t ≤ 1, x(0) = x_0 }.

Consider g(x, u) = (1/n) x·〈x, u〉 − u.

Lemma. Let ‖x_0‖^2 = n. Then ‖x(t)‖^2 = n, 0 ≤ t ≤ 1.

Proof. Consider ḡ(x, u) = ( xx^T/‖x‖^2 − I ) u and let x′ = ḡ(x, u). Then

〈x′, x〉 = 〈( xx^T/‖x‖^2 − I ) u, x〉 = 0.

Thus, ‖x(t)‖^2 = ‖x_0‖^2. The same is true for x(t) defined by g (on the sphere ‖x‖^2 = n the two right-hand sides coincide).
Note: We have enough degrees of freedom to put x(1) at any position on the sphere.
Hence, our problem is min { f(y) : ‖y‖^2 = n }.

Page 38

Descent direction of nonsmooth nonconvex function

Consider φ(x) = (1 − 1/γ)·max_{1≤i≤n} |x_i| − min_{1≤i≤n} |x_i| + |〈a, x〉|,

where a ∈ Z^n_+ and γ := ∑_{i=1}^n a_i ≥ 1. Clearly, φ(0) = 0.

Lemma. It is NP-hard to decide whether φ(x) < 0 for some x ∈ R^n.

Proof: 1. Assume that σ ∈ R^n with σ_i = ±1 satisfies 〈a, σ〉 = 0. Then φ(σ) = −1/γ < 0.

2. Assume φ(x) < 0 and max_{1≤i≤n} |x_i| = 1. Denote δ = |〈a, x〉|.
Then |x_i| > 1 − 1/γ + δ, i = 1, . . . , n.

Denoting σ_i = sign x_i, we have σ_i x_i > 1 − 1/γ + δ. Therefore
|σ_i − x_i| = 1 − σ_i x_i < 1/γ − δ, and we conclude that

|〈a, σ〉| ≤ |〈a, x〉| + |〈a, σ − x〉| ≤ δ + γ·max_{1≤i≤n} |σ_i − x_i| < (1 − γ)δ + 1 ≤ 1.

Since a ∈ Z^n, this is possible iff 〈a, σ〉 = 0.

Page 39

Black-box optimization

Oracle: a special unit for computing function values and derivatives at test points (order 0, 1, or 2).

Analytic complexity: the number of calls of the oracle which is necessary (sufficient) for solving any problem from the class.

(Lower/upper complexity bounds.)

Solution: an ε-approximation of the minimum.

Resisting oracle: creates the worst problem instance for a particular method.

It starts from an “empty” problem.

Answers must be compatible with the description of the problem class.

The bad problem is created after the method stops.

Page 40

Bounds for Global Minimization

Problem: f* = min_x { f(x) : x ∈ B_n }, B_n = { x ∈ R^n : 0 ≤ x ≤ e_n }.

Problem class: |f(x) − f(y)| ≤ L‖x − y‖_∞ for all x, y ∈ B_n.

Oracle: f(x) (zero order).

Goal: find x ∈ B_n with f(x) − f* ≤ ε.

Theorem: N(ε) ≥ ( L/(2ε) )^n.

Proof. Divide B_n into p^n ℓ_∞-balls of radius 1/(2p).

Resisting oracle: at each test point reply f(x) = 0.
Assume N < p^n. Then there exists a ball containing no test points. Hence, we can take f* = −L/(2p), and therefore ε ≥ L/(2p).

Corollary: Uniform Grid method is worst-case optimal.
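For illustration, a minimal sketch of the uniform grid method referred to in the corollary (the test function and the parameters n, p are arbitrary choices): it spends p^n oracle calls and guarantees accuracy L/(2p) for an L-Lipschitz f in the ℓ_∞-norm.

```python
# Uniform grid method: evaluate f at the centers of p^n cells of B_n, return the best point.
import itertools
import numpy as np

def uniform_grid_min(f, n, p):
    centers = [(2 * i + 1) / (2 * p) for i in range(p)]   # cell centers in [0, 1]
    best_x, best_f = None, np.inf
    for point in itertools.product(centers, repeat=n):    # p**n oracle calls
        x = np.array(point)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

f = lambda x: np.abs(x - 0.3).max()                        # Lipschitz constant 1 in the l_inf norm
print(uniform_grid_min(f, n=2, p=10))                      # error at most 1/(2*10)
```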

Page 41

Nonsmooth Convex Minimization (NCM)

Problem: f* = min_x { f(x) : x ∈ Q }, where

Q ⊆ R^n is a convex set: x, y ∈ Q ⇒ [x, y] ⊆ Q. It is simple.

f(x) is a subdifferentiable convex function:

f(y) ≥ f(x) + 〈f′(x), y − x〉, x, y ∈ Q,

for a certain subgradient f′(x) ∈ R^n.

Oracle: f (x), f ′(x) (first order).

Solution: ε-approximation in function value.

Main inequality: 〈f ′(x), x − x∗〉 ≥ f (x)− f ∗ ≥ 0, ∀x ∈ Q.

NB: Anti-subgradient decreases the distance to the optimum.

Page 42

Computation of subgradients

Denote by ∂f (x) the subdifferential of f at x .

This is the set of all subgradients at x .

1. For f = α_1 f_1 + α_2 f_2 with α_1, α_2 > 0, we have

∂f(x) = α_1 ∂f_1(x) + α_2 ∂f_2(x).

2. For f = max{f_1, f_2}, we have

∂f(x) = Conv{ ∂f_1(x) ∪ ∂f_2(x) } at points where f_1(x) = f_2(x).
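A tiny illustration of rule 2 on an assumed toy example, f_1(x) = x and f_2(x) = −x (so f = |x|): at the kink, any convex combination of the two gradients is a valid subgradient.

```python
# Subgradient of |x| via Conv{∂f1, ∂f2} at the kink x = 0.
def subgradient_abs(x, theta=0.5):
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return theta * 1.0 + (1 - theta) * (-1.0)   # any point of [-1, 1]

# Subgradient inequality f(y) >= f(x) + g*(y - x) holds for every returned g:
x, g = 0.0, subgradient_abs(0.0, theta=0.2)
print(all(abs(y) >= abs(x) + g * (y - x) for y in [-2.0, -0.5, 0.3, 1.7]))
```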

Page 43

NCM: Lower Complexity Bounds

Let Q ≡ { ‖x‖ ≤ 2R } and x_{k+1} ∈ x_0 + Lin{ f′(x_0), . . . , f′(x_k) }.
Consider the function f_m(x) = L·max_{1≤i≤m} x_i + (µ/2)‖x‖^2 with µ = L/(R m^{1/2}).

From the problem min_τ ( Lτ + (µm/2)τ^2 ) we get

τ_* = −L/(µm) = −R/m^{1/2},  f*_m = −L^2/(2µm) = −LR/(2m^{1/2}),  ‖x_*‖^2 = m τ_*^2 = R^2.

NB: If x_0 = 0, then after k iterations we can keep x_i = 0 for i > k.

Lipschitz continuity: f_{k+1}(x_k) − f*_{k+1} ≥ −f*_{k+1} = LR/(2(k+1)^{1/2}).

Strong convexity: f_{k+1}(x_k) − f*_{k+1} ≥ −f*_{k+1} = L^2/(2(k+1)µ).

Both lower bounds are exact!

Page 44

Subgradient Method (SG)

Problem: min_{x∈Q} { f(x) : g(x) ≤ 0 },

where Q is a closed convex set, and f, g are convex, f, g ∈ C^{0,0}_L(Q).

SG: If g(x_k)/‖g′(x_k)‖ > h, then

a) x_{k+1} = π_Q( x_k − (g(x_k)/‖g′(x_k)‖^2) g′(x_k) ),

else b) x_{k+1} = π_Q( x_k − (h/‖f′(x_k)‖) f′(x_k) ).

Denote f*_N = min{ f(x_k) : 0 ≤ k ≤ N, k ∈ b) }. Let N = N_a + N_b.

Theorem: If N > (1/h^2)‖x_0 − x*‖^2, then f*_N − f* ≤ hL. (Take h = ε/L.)

Proof: Denote r_k = ‖x_k − x*‖.

a): r_{k+1}^2 − r_k^2 ≤ −2 (g(x_k)/‖g′(x_k)‖^2) 〈g′(x_k), x_k − x*〉 + g^2(x_k)/‖g′(x_k)‖^2 ≤ −h^2.

b): r_{k+1}^2 − r_k^2 ≤ −2h 〈f′(x_k), x_k − x*〉/‖f′(x_k)‖ + h^2 ≤ −(2h/L)(f(x_k) − f*) + h^2.

Thus, (2h/L) N_b (f*_N − f*) ≤ r_0^2 + h^2(N_b − N_a) = r_0^2 + h^2(2N_b − N).

Page 45

Smooth Convex Minimization (SCM)

Lipschitz-continuous gradient: ‖f′(x) − f′(y)‖ ≤ L‖x − y‖.
Geometric interpretation: for all x, y ∈ dom f we have

0 ≤ f(y) − f(x) − 〈f′(x), y − x〉 = ∫_0^1 〈f′(x + τ(y − x)) − f′(x), y − x〉 dτ ≤ (L/2)‖x − y‖^2.

Sufficient condition: 0 ⪯ f′′(x) ⪯ L·I_n, x ∈ dom f.

Equivalent definition:

f(y) ≥ f(x) + 〈f′(x), y − x〉 + (1/(2L))‖f′(x) − f′(y)‖^2.

Hint: prove first that f(x) − f* ≥ (1/(2L))‖f′(x)‖^2.

Page 46

SCM: Lower complexity bounds

Consider the family of functions (k ≤ n):

f_k(x) = ½ [ x_1^2 + ∑_{i=1}^{k−1}(x_i − x_{i+1})^2 + x_k^2 ] − x_1 ≡ ½〈A_k x, x〉 − x_1.

Let R^n_k = { x ∈ R^n : x_i = 0, i > k }. Then f_{k+p}(x) = f_k(x) for x ∈ R^n_k.

Clearly, 0 ≤ 〈A_k h, h〉 ≤ h_1^2 + ∑_{i=1}^{k−1} 2(h_i^2 + h_{i+1}^2) + h_k^2 ≤ 4‖h‖^2,

where A_k is the n×n matrix whose leading k×k block is the tridiagonal matrix

  [  2 −1            ]
  [ −1  2 −1         ]
  [    −1  2  .      ]
  [        .  .  −1  ]
  [          −1   2  ]   (k rows),

and whose remaining blocks (0_{n−k,k}, 0_{k,n−k}, 0_{n−k,n−k}) are zero.

Page 47

Hence, A_k x = e_1 has the solution x̄^k with x̄^k_i = (k+1−i)/(k+1) for 1 ≤ i ≤ k and x̄^k_i = 0 for i > k.

Thus f*_k = ½〈A_k x̄^k, x̄^k〉 − 〈e_1, x̄^k〉 = −½〈e_1, x̄^k〉 = −k/(2(k+1)), and

‖x̄^k‖^2 = ∑_{i=1}^k ( (k+1−i)/(k+1) )^2 = (1/(k+1)^2) ∑_{i=1}^k i^2 = k(2k+1)/(6(k+1)).

Let x_0 = 0 and let p ≤ n be fixed.

Lemma. If x_k ∈ L_k := Lin{ f′_p(x_0), . . . , f′_p(x_{k−1}) }, then L_k ⊆ R^n_k.

Proof: x_0 = 0 ∈ R^n_0, f′_p(0) = −e_1 ∈ R^n_1 ⇒ x_1 ∈ R^n_1, f′_p(x_1) ∈ R^n_2, etc.

Corollary 1: f_p(x_k) = f_k(x_k) ≥ f*_k.

Corollary 2: Take p = 2k+1. Then

( f_p(x_k) − f*_p ) / ( L‖x_0 − x̄^p‖^2 ) ≥ [ −k/(2(k+1)) + (2k+1)/(2(2k+2)) ] / [ (2k+1)(4k+3)/(3(k+1)) ] = 3/( 4(2k+1)(4k+3) ).

‖x_k − x̄^p‖^2 ≥ ∑_{i=k+1}^{2k+1} ( x̄^{2k+1}_i )^2 = (2k+3)(k+2)/(24(k+1)) ≥ (1/8)‖x̄^p‖^2.
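A small numerical sketch (with arbitrary n and k) that builds A_k and checks the spectral bound 0 ⪯ A_k ⪯ 4I, the minimizer x̄^k and the value f*_k = −k/(2(k+1)) derived above.

```python
import numpy as np

def A(k, n):
    """n x n matrix whose leading k x k block is tridiag(-1, 2, -1), zero elsewhere."""
    M = np.zeros((n, n))
    M[:k, :k] = 2 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    return M

n, k = 10, 4
Ak = A(k, n)
e1 = np.zeros(n); e1[0] = 1.0

eigs = np.linalg.eigvalsh(Ak)
print(eigs.min() >= -1e-12, eigs.max() <= 4 + 1e-12)   # eigenvalues of A_k lie in [0, 4]

xk = np.zeros(n)
xk[:k] = np.linalg.solve(Ak[:k, :k], e1[:k])           # minimizer restricted to R^n_k
print(xk[:k])                                          # equals (k+1-i)/(k+1), i = 1..k
print(0.5 * xk @ Ak @ xk - xk[0], -k / (2 * (k + 1)))  # f*_k = -k / (2(k+1))
```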

Page 48

Some remarks

1. The rate of convergence of any black-box gradient method applied to f ∈ C^{1,1} cannot be higher than O(1/k^2).

2. We cannot guarantee any rate of convergence in the argument.

3. Let A = LL^T and f(x) = ½〈Ax, x〉 − 〈b, x〉. Then

f(x) − f* = ½‖L^T x − d‖^2, where d = L^T x*.

Thus, the residual of the linear system L^T x = d cannot be decreased faster than at the rate O(1/k)
(provided that we are only allowed to multiply by L and L^T).

4. Optimization problems with nontrivial linear equality constraints cannot be solved faster than at the rate O(1/k).

Page 49

Methods for Smooth Minimization with Simple Constraints

Consider the problem min_x { f(x) : x ∈ Q },

where f is convex, f ∈ C^{1,1}_L(Q), and Q is a simple closed convex set (projections onto Q are available).

Gradient mapping: for M > 0 define
T_M(x) = arg min_{y∈Q} [ f(x) + 〈f′(x), y − x〉 + (M/2)‖x − y‖^2 ].

If M ≥ L, then
f(T_M(x)) ≤ f(x) + 〈f′(x), T_M(x) − x〉 + (M/2)‖x − T_M(x)‖^2.

Reduced gradient: g_M(x) = M·(x − T_M(x)).

Since 〈f′(x) + M(T_M(x) − x), y − T_M(x)〉 ≥ 0 for all y ∈ Q,

f(x) − f(T_M(x)) ≥ (M/2)‖x − T_M(x)‖^2 = (1/(2M))‖g_M(x)‖^2, (→ 0)

f(y) ≥ f(x) + 〈f′(x), T_M(x) − x〉 + 〈f′(x), y − T_M(x)〉 ≥ f(T_M(x)) − (1/(2M))‖g_M(x)‖^2 + 〈g_M(x), y − T_M(x)〉.
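A minimal sketch of the gradient mapping T_M and of the primal gradient method of the next slide, assuming a toy convex quadratic objective and Q = [0, 1]^n (so the projection is clipping); all data below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
B = rng.standard_normal((n, n))
A = B.T @ B                       # f(x) = 0.5 <Ax, x> - <b, x>, convex quadratic
b = rng.standard_normal(n)
L = np.linalg.eigvalsh(A).max()   # Lipschitz constant of the gradient

grad = lambda x: A @ x - b
proj_Q = lambda y: np.clip(y, 0.0, 1.0)          # projection onto Q = [0,1]^n

def T(x, M):
    """Gradient mapping: minimizer over Q of the linearization plus (M/2)||y - x||^2."""
    return proj_Q(x - grad(x) / M)               # for the Euclidean norm this is a projected step

f = lambda z: 0.5 * z @ A @ z - b @ z
x = np.zeros(n)
for k in range(200):                             # primal gradient method: x_{k+1} = T_L(x_k)
    x = T(x, L)
print(f(x))                                       # f decreases monotonically, O(1/k) rate
```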

Page 50

Primal Gradient Method (PGM)

Main scheme: x_0 ∈ Q, x_{k+1} = T_L(x_k), k ≥ 0.

Primal interpretation: x_{k+1} = π_Q( x_k − (1/L) f′(x_k) ).

Rate of convergence. f(x_k) − f(x_{k+1}) ≥ (1/(2L))‖g_L(x_k)‖^2.

f(T_L(x)) − f* ≤ (1/(2L))‖g_L(x)‖^2 + 〈g_L(x), T_L(x) − x*〉 ≤ (1/(2L))( ‖g_L(x)‖ + LR )^2 − (L/2)R^2.

Hence, ‖g_L(x)‖ ≥ [ 2L( f(T_L(x)) − f* ) + L^2R^2 ]^{1/2} − LR
= 2L( f(T_L(x)) − f* ) / ( [ 2L( f(T_L(x)) − f* ) + L^2R^2 ]^{1/2} + LR )
≥ (c/R)·( f(T_L(x)) − f* ).

Thus, f(x_k) − f(x_{k+1}) ≥ ( c^2/(LR^2) )·( f(x_{k+1}) − f* )^2.

Similar situation: a′(t) = −a^2(t) ⇒ a(t) ≈ 1/t.

Conclusion: PGM converges as O(1/k). This is far from the lower complexity bounds.

Page 51

Dual Gradient Method (DGM)

Model: let λ^k_i ≥ 0, i = 0, . . . , k, and S_k := ∑_{i=0}^k λ^k_i. Then

S_k f(y) ≥ L_{λ^k}(y) := ∑_{i=0}^k λ^k_i [ f(x_i) + 〈f′(x_i), y − x_i〉 ], y ∈ Q.

DGM: x_{k+1} = arg min_{y∈Q} { ψ_k(y) := L_{λ^k}(y) + (M/2)‖y − x_0‖^2 }.

Let us choose λ^k_i ≡ 1 and M = L. We prove by induction that

(∗): F*_k := ∑_{i=0}^k f(y_i) ≤ ψ*_k := min_{y∈Q} ψ_k(y)   ( ≤ (k+1)f* + (L/2)R^2 ).

1. k = 0. Then y_0 = T_L(x_0).
2. Assume (∗) holds for some k ≥ 0. Then

ψ*_{k+1} = min_{y∈Q} [ ψ_k(y) + f(x_{k+1}) + 〈f′(x_{k+1}), y − x_{k+1}〉 ]
≥ min_{y∈Q} [ ψ*_k + (L/2)‖y − x_{k+1}‖^2 + f(x_{k+1}) + 〈f′(x_{k+1}), y − x_{k+1}〉 ].

We can take y_{k+1} = T_L(x_{k+1}). Thus, (1/(k+1)) ∑_{i=0}^k f(y_i) ≤ f* + LR^2/(2(k+1)).

Page 52

Some remarks

1. The dual gradient method works with a model of the objective function.

2. The minimizing sequence {y_k} is not needed by the algorithmic scheme. We can generate it if necessary.

3. Both the primal and the dual method have the same rate of convergence O(1/k). It is not optimal.

Maybe we can combine them in order to get a better rate?

Page 53

Comparing PGM and DGM

Primal Gradient method

Monotonically improves the current state using the local model of the objective.

Interpretation: Practitioners, industry.

Dual Gradient Method

The main goal is to construct a model of the objective.

It is updated with new experience collected around the predicted test points {x_k}.

Practical verification of the advice (the points y_k) is not essential for the procedure.

Interpretation: Science.

Hint: a combination of theory and practice should give better results.

Page 54

Estimating sequences

Def. Sequences {φ_k(x)}_{k=0}^∞ and {λ_k}_{k=0}^∞, λ_k ≥ 0, are called estimating sequences if λ_k → 0 and, for all x ∈ Q and k ≥ 0,

(∗): φ_k(x) ≤ (1 − λ_k) f(x) + λ_k φ_0(x).

Lemma: If (∗∗): f(x_k) ≤ φ*_k ≡ min_{x∈Q} φ_k(x), then

f(x_k) − f* ≤ λ_k [ φ_0(x*) − f* ] → 0.

Proof. f(x_k) ≤ φ*_k = min_{x∈Q} φ_k(x) ≤ min_{x∈Q} [ (1 − λ_k)f(x) + λ_k φ_0(x) ]

≤ (1 − λ_k) f(x*) + λ_k φ_0(x*).

The rate of λ_k → 0 defines the rate of f(x_k) → f*.

Questions

How to construct the estimating sequences?

How can we ensure (∗∗)?

Page 55

Updating estimating sequences

Let φ_0(x) = (L/2)‖x − x_0‖^2, λ_0 = 1, let {y_k}_{k=0}^∞ be a sequence in Q, and let
{α_k}_{k=0}^∞ satisfy α_k ∈ (0, 1), ∑_{k=0}^∞ α_k = ∞. Then {φ_k(x)}_{k=0}^∞, {λ_k}_{k=0}^∞ defined by

λ_{k+1} = (1 − α_k) λ_k,

φ_{k+1}(x) = (1 − α_k) φ_k(x) + α_k [ f(y_k) + 〈f′(y_k), x − y_k〉 ]

are estimating sequences.
Proof: φ_0(x) ≤ (1 − λ_0) f(x) + λ_0 φ_0(x) ≡ φ_0(x).
If (∗) holds for some k ≥ 0, then

φ_{k+1}(x) ≤ (1 − α_k) φ_k(x) + α_k f(x)
= (1 − (1 − α_k)λ_k) f(x) + (1 − α_k)( φ_k(x) − (1 − λ_k) f(x) )
≤ (1 − (1 − α_k)λ_k) f(x) + (1 − α_k) λ_k φ_0(x)
= (1 − λ_{k+1}) f(x) + λ_{k+1} φ_0(x).

Page 56

Updating the points

Denote φ*_k = min_{x∈Q} φ_k(x), v_k = arg min_{x∈Q} φ_k(x). Suppose φ*_k ≥ f(x_k). Then

φ*_{k+1} = min_{x∈Q} { (1 − α_k)φ_k(x) + α_k [ f(y_k) + 〈f′(y_k), x − y_k〉 ] }

≥ min_{x∈Q} { (1 − α_k)[ φ*_k + (λ_k L/2)‖x − v_k‖^2 ] + α_k [ f(y_k) + 〈f′(y_k), x − y_k〉 ] }

≥ min_{x∈Q} { f(y_k) + ((1−α_k)λ_k L/2)‖x − v_k‖^2 + 〈f′(y_k), α_k(x − y_k) + (1 − α_k)(x_k − y_k)〉 }

( with y_k := (1 − α_k)x_k + α_k v_k = x_k + α_k(v_k − x_k) )

= min_{x∈Q} { f(y_k) + ((1−α_k)λ_k L/2)‖x − v_k‖^2 + α_k〈f′(y_k), x − v_k〉 }

= min_{ y = x_k + α_k(x − x_k), x∈Q } { f(y_k) + ((1−α_k)λ_k L/(2α_k^2))‖y − y_k‖^2 + 〈f′(y_k), y − y_k〉 }  (?) ≥ f(x_{k+1}).

Answer: α_k^2 = (1 − α_k)λ_k,  x_{k+1} = T_L(y_k).

Page 57

Optimal method

Choose v_0 = x_0 ∈ Q, λ_0 = 1, φ_0(x) = (L/2)‖x − x_0‖^2.

For k ≥ 0 iterate:

Compute α_k from α_k^2 = (1 − α_k)λ_k ≡ λ_{k+1}.

Define y_k = (1 − α_k)x_k + α_k v_k.

Compute x_{k+1} = T_L(y_k).

Set φ_{k+1}(x) = (1 − α_k)φ_k(x) + α_k [ f(y_k) + 〈f′(y_k), x − y_k〉 ].

Convergence: denote a_k = λ_k^{−1/2}. Then

a_{k+1} − a_k = ( λ_k^{1/2} − λ_{k+1}^{1/2} ) / ( λ_k^{1/2} λ_{k+1}^{1/2} ) = ( λ_k − λ_{k+1} ) / ( λ_k^{1/2} λ_{k+1}^{1/2} ( λ_k^{1/2} + λ_{k+1}^{1/2} ) ) ≥ ( λ_k − λ_{k+1} ) / ( 2 λ_k λ_{k+1}^{1/2} ) = α_k / ( 2 λ_{k+1}^{1/2} ) = 1/2.

Thus, a_k ≥ 1 + k/2. Hence, λ_k ≤ 4/(k+2)^2.
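A hedged sketch of the optimal method above for the same kind of toy instance as before (Q = [0, 1]^n, convex quadratic f). It uses the fact that φ_k keeps the form 〈s_k, x〉 + λ_k(L/2)‖x − x_0‖^2 + const, so v_k is a single projected point; all test data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
B = rng.standard_normal((n, n)); A = B.T @ B; b = rng.standard_normal(n)
L = np.linalg.eigvalsh(A).max()
grad = lambda z: A @ z - b
f = lambda z: 0.5 * z @ A @ z - b @ z
proj_Q = lambda z: np.clip(z, 0.0, 1.0)
T = lambda z: proj_Q(z - grad(z) / L)            # gradient mapping T_L

x0 = np.zeros(n)
x, lam, s = x0.copy(), 1.0, np.zeros(n)          # s accumulates the linear part of phi_k
for k in range(200):
    v = proj_Q(x0 - s / (lam * L))               # v_k = argmin_Q <s, x> + lam*(L/2)*||x - x0||^2
    alpha = (-lam + np.sqrt(lam ** 2 + 4 * lam)) / 2   # alpha^2 = (1 - alpha) * lam
    y = (1 - alpha) * x + alpha * v
    x = T(y)                                      # x_{k+1} = T_L(y_k)
    s = (1 - alpha) * s + alpha * grad(y)         # update the linear term of phi_{k+1}
    lam = (1 - alpha) * lam                       # lambda_{k+1}
print(f(x))                                       # converges at the O(1/k^2) rate
```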

Page 58

Interpretation

1. φ_k(x) accumulates all previously computed information about the objective. This is the current model of our problem.
2. v_k = arg min_{x∈Q} φ_k(x) is a prediction of the optimal strategy.

3. φ*_k = φ_k(v_k) is an estimate of the optimal value.

4. Acceleration condition: f(x_k) ≤ φ*_k. We need a firm which is at least as good as the best theoretical prediction.
5. Then we create a startup y_k = (1 − α_k)x_k + α_k v_k and allow it to work for one year.

6. Theorem: next year, its performance will be at least as good as the new theoretical prediction. And we can continue!

Acceleration result: 10 years instead of 100.

Who is in the right position to arrange step 5? Government, political institutions.

Page 59

Advanced Convex Optimization (PGMO 2016)

Lecture 2. Second-order methods. Solving systems of nonlinear equations

Yurii Nesterov, CORE/INMA (UCL)

January 20-22, 2016 (Ecole Polytechnique, Paris)

Page 60

Outline

1 Historical remarks

2 Trust region methods

3 Cubic regularization of second-order model

4 Local and global convergence

5 Accelerated Cubic Newton

6 Solving the system of nonlinear equations

7 Numerical experiments

Page 61

Historical remarks

Problem: f (x) → min : x ∈ Rn

is treated as a non-linear system f ′(x) = 0.

Newton method: xk+1 = xk − [f ′′(xk)]−1f ′(xk).

Standard objections:

The method is not always well defined (det f ′′(xk) = 0).

Possible divergence.

Possible convergence to saddle points or even to local maxima.

Chaotic global behavior.

Page 62

Pre-History (see Ortega, Rheinboldt [1970].)

Bennet [1916]: Newton’s method in general analysis.

Levenberg [1944]: Regularization. If f′′(x) ⊁ 0, then use d = G^{−1} f′(x) with G = f′′(x) + γI ≻ 0. (See also Marquardt [1963].)

Kantorovich [1948]: Proof of local quadratic convergence. Assumptions:

a) f ∈ C^3(R^n).  b) ‖f′′(x) − f′′(y)‖ ≤ L_2‖x − y‖.  c) f′′(x*) ≻ 0.  d) x_0 ≈ x*.

Global convergence: Use line search (good advice).

Global performance: Not addressed.

Page 63

Modern History (see Conn, Gould and Toint [2000])

Main idea: Trust Region Approach.

1. Using some norm ‖·‖_k, define the trust region B_k = { x ∈ R^n : ‖x − x_k‖_k ≤ ∆_k }.
2. Denote m_k(x) = f(x_k) + 〈f′(x_k), x − x_k〉 + ½〈G_k(x − x_k), x − x_k〉.
Variants: G_k = f′′(x_k), G_k = f′′(x_k) + γ_k I ⪰ 0, etc.

3. Compute the trial point x̄_k = arg min_{x∈B_k} m_k(x).

4. Compute the ratio ρ_k = ( f(x_k) − f(x̄_k) ) / ( f(x_k) − m_k(x̄_k) ).

5. Depending on ρ_k, either accept x_{k+1} = x̄_k or update the value ∆_k and repeat the steps above.

Page 64

Comment

Advantages:

More parameters ⇒ Flexibility

Convergence to a point which satisfies the second-order necessary optimality conditions:

f′(x*) = 0, f′′(x*) ⪰ 0.

Disadvantages:

Complicated strategies for parameters’ coordination.

For certain ‖ · ‖k the auxiliary problem is difficult.

Line search abilities are quite limited.

Unselective theory.

Global complexity issues are not addressed.

Page 65

Development of numerical schemes

1. Classical style:  Problem formulation ⇒ Method.
Examples:

Gradient and Newton methods in optimization.

Runge-Kutta methods for ODEs, etc.

2. Modern style:  ( Problem formulation, Problem class ) ⇒ Method.

Examples:

Non-smooth convex minimization.

Smooth minimization: min_{x∈Q} f(x), with f ∈ C^{1,1}.

Gradient mapping (Nemirovsky & Yudin, 1977):

x_+ = T(x) ≡ arg min_{y∈Q} m_1(y),

m_1(y) ≡ f(x) + 〈f′(x), y − x〉 + (L_1/2)‖y − x‖^2.

Justification: f(y) ≤ m_1(y) for all y ∈ Q.

Page 66

Using the second-order model

Problem: f(x) → min, x ∈ R^n.

Assumption: let F be an open convex set such that
‖f′′(x) − f′′(y)‖ ≤ L_2‖x − y‖ for all x, y ∈ F, and

L(x_0) = { x ∈ R^n : f(x) ≤ f(x_0) } ⊂ F.

Define
m_2(x, y) = f(x) + 〈f′(x), y − x〉 + ½〈f′′(x)(y − x), y − x〉,

m′_2(x, y) = f′(x) + f′′(x)(y − x).

Lemma 1. For any x, y ∈ F,
‖f′(y) − m′_2(x, y)‖ ≤ (1/2) L_2‖y − x‖^2,
|f(y) − m_2(x, y)| ≤ (1/6) L_2‖y − x‖^3.

Corollary: For any x and y from F,
f(y) ≤ m_2(x, y) + (1/6) L_2‖y − x‖^3.

Page 67

Cubic regularization

For M > 0 define f_M(x, y) = m_2(x, y) + (M/6)‖y − x‖^3,

T_M(x) ∈ Arg min_y f_M(x, y),

where “Arg” indicates that T_M(x) is a global minimum.

Computability: if ‖·‖ is a Euclidean norm, then T_M(x) can be computed from a convex problem.

For r ∈ D ≡ { r ∈ R : f′′(x) + (M/2) r I ⪰ 0, r ≥ 0 }, denote

v(r) = −½〈( f′′(x) + (Mr/2) I )^{−1} f′(x), f′(x)〉 − (M/12) r^3.

Lemma. For M > 0, min_{h∈R^n} f_M(x, x + h) = sup_{r∈D} v(r).

If the sup is attained at r* with f′′(x) + (M r*/2) I ≻ 0, then

h* = −( f′′(x) + (M r*/2) I )^{−1} f′(x),

where r* > 0 is the unique solution of r = ‖( f′′(x) + (Mr/2) I )^{−1} f′(x)‖.
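A minimal sketch of computing T_M(x) via the scalar equation of the Lemma (bisection on r = ‖(f″(x) + (Mr/2)I)^{−1} f′(x)‖); the Hessian H and gradient g below are synthetic illustrative data.

```python
import numpy as np

def cubic_newton_step(g, H, M, tol=1e-10):
    lam_min = np.linalg.eigvalsh(H).min()
    phi = lambda r: np.linalg.norm(np.linalg.solve(H + 0.5 * M * r * np.eye(len(g)), g))
    # keep H + (M r / 2) I positive definite and bracket the root of r - phi(r)
    lo = max(0.0, -2.0 * lam_min / M) + 1e-12
    hi = max(lo + 1.0, 1.0)
    while phi(hi) > hi:                            # grow the bracket until r >= phi(r)
        hi *= 2.0
    while hi - lo > tol:                           # phi is decreasing on the feasible domain
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > mid else (lo, mid)
    r = 0.5 * (lo + hi)
    return -np.linalg.solve(H + 0.5 * M * r * np.eye(len(g)), g)

H = np.array([[2.0, 0.0], [0.0, -1.0]])            # indefinite Hessian: plain Newton would fail
g = np.array([1.0, 1.0])
h = cubic_newton_step(g, H, M=2.0)
print(h, np.linalg.norm(h))                        # ||h|| equals the root r
```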

Page 68

Simple properties

1. Denote r_M(x) = ‖x − T_M(x)‖. Then

f′(x) + f′′(x)(T_M(x) − x) + (M r_M(x)/2)(T_M(x) − x) = 0,

f′′(x) + ½ M r_M(x) I ⪰ 0.

2. We have 〈f′(x), x − T_M(x)〉 ≥ 0 and, denoting f̄_M(x) := f_M(x, T_M(x)),

f(x) − f̄_M(x) ≥ (M/12) r_M^3(x),

r_M^2(x) ≥ (2/(L+M)) ‖f′(x)‖.

3. If M ≥ L, then f̄_M(x) ≥ f(T_M(x)).

4. f̄_M(x) ≤ min_y [ f(y) + ((L+M)/6)‖y − x‖^3 ].

Compare with the prox-method: x_+ = arg min_y [ f(y) + (M/2)‖y − x‖^2 ].

Page 69

Cubic regularization of Newton method

Consider the process x_{k+1} = T_L(x_k), k = 0, 1, . . . .
Note that f(x_{k+1}) ≤ f(x_k).

Saddle points. Let f′(x*) = 0 and f′′(x*) ⋡ 0. Then there exist ε, δ > 0 such that

‖x − x*‖ ≤ ε, f(x) ≥ f(x*) ⇒ f(T_L(x)) ≤ f(x*) − δ.

Local convergence. If L(x_0) is bounded, then the set of limit points X* ≡ lim_{k→∞} x_k ≠ ∅.

For any x* ∈ X* we have f(x*) = f*, f′(x*) = 0, f′′(x*) ⪰ 0.

Global convergence: g_k ≡ min_{1≤i≤k} ‖f′(x_i)‖ ≤ O(1/k^{2/3}).

For the gradient method we can guarantee only g_k ≤ O(1/k^{1/2}).

Local rate of convergence: quadratic.

Page 70

Global performance: Star-convex functions

Def. For any x* ∈ X*, any x ∈ F and α ∈ [0, 1] we have
f(αx* + (1 − α)x) ≤ αf(x*) + (1 − α)f(x).

Th 1. Let diam F ≤ D. Then

1. If f(x_0) − f* ≥ (3/2)LD^3, then f(x_1) − f* ≤ (1/2)LD^3.

2. If f(x_0) − f* ≤ (3/2)LD^3, then f(x_k) − f* ≤ 3LD^3 / ( 2(1 + k/3)^2 ).

Let X* be non-degenerate: f(x) − f* ≥ (γ/2)ρ^2(x, X*). Denote ω = (1/L^2)(γ/2)^3.

Th 2. Denote by k_0 the first number for which f(x_{k_0}) − f* ≤ (4/9)ω.

If k ≤ k_0, then f(x_k) − f* ≤ [ ( f(x_0) − f* )^{1/4} − (k/6)·√(2/3)·ω^{1/4} ]^4.

For k ≥ k_0 we have f(x_{k+1}) − f* ≤ (1/2)( f(x_k) − f* )·√( ( f(x_k) − f* )/ω ).

NB: The Hessian f′′(x*) can be degenerate!

Page 71

Global performance: Gradient-dominated functions

Definition. For any x ∈ F and x* ∈ X* we have

f(x) − f(x*) ≤ τ_f ‖f′(x)‖^p

with τ_f > 0 and p ∈ [1, 2] (the degree of domination).

Example 1. Convex functions:
f(x) − f* ≤ 〈f′(x), x − x*〉 ≤ R‖f′(x)‖

for ‖x − x*‖ ≤ R. Thus, p = 1, τ_f = R.

Example 2. Strongly convex functions: for all x, y ∈ R^n,

f(x) ≤ f(y) + 〈f′(y), x − y〉 + (1/(2γ))‖f′(x) − f′(y)‖^2.

Thus, f(x) − f* ≤ (1/(2γ))‖f′(x)‖^2 ⇒ p = 2, τ_f = 1/(2γ).

Page 72

Gradient dominated functions, II

Example 3. Sum of squares. Consider the system g(x) = 0 ∈ R^m, x ∈ R^n.

Assume that m ≤ n and the Jacobian J(x) = ( g′_1(x), . . . , g′_m(x) ) is uniformly non-degenerate:

σ ≡ inf_{x∈F} λ_min( J^T(x) J(x) ) > 0.

Consider the function f(x) = ∑_{i=1}^m g_i^2(x). Then

f(x) − f* ≤ (1/(2σ))‖f′(x)‖^2.

Thus, p = 2 and τ_f = 1/(2σ).

Page 73

Gradient dominated functions: rate of convergence

Theorem 3. Let p = 1. Denote ω = (2/3) L (6τ_f)^3. Let k_0 be defined by f(x_{k_0}) − f* ≤ ξ^2 ω for some ξ > 1. Then for k ≤ k_0 we have

ln( (1/ω)( f(x_k) − f* ) ) ≤ (2/3)^k · ln( (1/ω)( f(x_0) − f* ) ).

Otherwise, f(x_k) − f* ≤ ω · ξ^2 (2 + (3/2)ξ)^2 / ( 2 + (k + 3/2)·ξ )^2.

Theorem 4. Let p = 2. Denote ω = 1/( (144 L)^2 τ_f^3 ). Let k_0 be defined by f(x_{k_0}) − f* ≤ ω. Then for k ≤ k_0 we have

f(x_k) − f* ≤ ( f(x_0) − f* ) · e^{−kσ}

with σ = ω^{1/4} / ( ω^{1/4} + ( f(x_0) − f* )^{1/4} ). Otherwise,

f(x_{k+1}) − f* ≤ ω · ( ( f(x_k) − f* )/ω )^{4/3}.

NB: Superlinear convergence without a direct nondegeneracy assumption on the Hessian.

Page 74

Transformations of convex functions

Let u(x): R^n → R^n be non-degenerate. Denote by v(u) its inverse: v(u(x)) ≡ x.

Consider the function f(x) = φ(u(x)), where φ(u) is a convex function. Denote

σ = max_u { ‖v′(u)‖ : φ(u) ≤ f(x_0) },

D = max_u { ‖u − u*‖ : φ(u) ≤ f(x_0) }.

Theorem 5.
1. If f(x_0) − f* ≥ (3/2)L(σD)^3, then f(x_1) − f* ≤ (1/2)L(σD)^3.

2. If f(x_0) − f* ≤ (3/2)L(σD)^3, then f(x_k) − f* ≤ 3L(σD)^3 / ( 2(1 + k/3)^2 ).

Example.
u_1(x) = x_1, u_2(x) = x_2 + φ_1(x_1), . . . , u_n(x) = x_n + φ_{n−1}(x_1, . . . , x_{n−1}),

where the φ_i(·) are arbitrary functions.

Page 75

Accelerated Newton: Cubic prox-function

Denote d(x) = (1/3)‖x − x_0‖^3.

Lemma. The cubic prox-function is uniformly convex: for all x, y ∈ R^n,

〈d′(x) − d′(y), x − y〉 ≥ (1/2)‖x − y‖^3,

d(x) − d(y) − 〈d′(y), x − y〉 ≥ (1/6)‖x − y‖^3.

Moreover, its Hessian is Lipschitz continuous:

‖d′′(x) − d′′(y)‖ ≤ 2‖x − y‖, x, y ∈ R^n.

Remark. In our constructions we are going to use d(x) instead of the standard strongly convex prox-functions.

Page 76

Linear estimate functions (compare with 1st-order methods)

We recursively update the following sequences:

a sequence of estimate functions ψ_k(x) = ℓ_k(x) + (N/2) d(x), k ≥ 1, where the ℓ_k(x) are linear and N > 0;

a minimizing sequence {x_k}_{k=1}^∞;

a sequence of scaling parameters {A_k}_{k=1}^∞ with A_{k+1} := A_k + a_k, k ≥ 1.

These objects have to satisfy the following relations for all k ≥ 1:

(∗):  A_k f(x_k) ≤ ψ*_k ≡ min_x ψ_k(x),
      ψ_k(x) ≤ A_k f(x) + (L_2 + N/2) d(x), ∀x ∈ R^n.

(⇒ A_k ( f(x_k) − f(x*) ) ≤ (L_2 + N/2) d(x*).)

For k = 1 we can choose x_1 = T_{L_2}(x_0), ℓ_1(x) ≡ f(x_1), A_1 = 1.

Page 77

Denote v_k = arg min_x ψ_k(x).

For some a_k > 0 and M ≥ 2L_2, define

α_k = a_k/(A_k + a_k) ∈ (0, 1),

y_k = (1 − α_k)x_k + α_k v_k,

x_{k+1} = T_M(y_k),

ψ_{k+1}(x) = ψ_k(x) + a_k [ f(x_{k+1}) + 〈f′(x_{k+1}), x − x_{k+1}〉 ].

Theorem. For M = 2L_2, N = 12L_2, and a_k = (k+1)(k+2)/2, k ≥ 1, the relations (∗) hold recursively.

Corollary. For any k ≥ 1 we have f(x_k) − f(x*) ≤ 14 L_2 ‖x_0 − x*‖^3 / ( k(k+1)(k+2) ).

Page 78

Accelerated CNM

Initialization: set x_1 = T_{L_2}(x_0). Define ψ_1(x) = f(x_1) + 6L_2 · d(x).

Iteration k (k ≥ 1):  v_k = arg min_{x∈R^n} ψ_k(x),

y_k = (k/(k+3)) x_k + (3/(k+3)) v_k,   x_{k+1} = T_{2L_2}(y_k),

ψ_{k+1}(x) = ψ_k(x) + ((k+1)(k+2)/2) [ f(x_{k+1}) + 〈f′(x_{k+1}), x − x_{k+1}〉 ].

Remark:

Instead of the recursive computation of ψ_k(x), we can update only one vector:

s_1 = 0,  s_{k+1} = s_k + ((k+1)(k+2)/2) f′(x_{k+1}), k ≥ 1.

Then v_k can be computed by an explicit expression.
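A hedged sketch of the Accelerated CNM above on an assumed convex test function f(x) = (1/3)∑|x_i − c_i|^3 (Hessian Lipschitz constant L_2 = 2). The cubic step T_M is computed by the bisection of the earlier lemma, and v_k uses the explicit expression obtained by minimizing 〈s_k, x〉 + 6L_2·d(x); all test data are illustrative.

```python
import numpy as np

c = np.array([1.0, -2.0, 0.5])
L2 = 2.0
f = lambda x: (np.abs(x - c) ** 3).sum() / 3
grad = lambda x: (x - c) * np.abs(x - c)
hess = lambda x: np.diag(2 * np.abs(x - c))

def T(x, M):
    g, H, n = grad(x), hess(x), len(x)
    phi = lambda r: np.linalg.norm(np.linalg.solve(H + 0.5 * M * r * np.eye(n), g))
    lo, hi = 1e-12, 1.0
    while phi(hi) > hi:
        hi *= 2.0
    for _ in range(80):                      # bisection for r = phi(r)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > mid else (lo, mid)
    r = 0.5 * (lo + hi)
    return x - np.linalg.solve(H + 0.5 * M * r * np.eye(n), g)

x0 = np.zeros(3)
x = T(x0, L2)                                # x_1 = T_{L2}(x_0)
s = np.zeros(3)                              # the vector s_k (linear part of psi_k)
for k in range(1, 30):
    ns = np.linalg.norm(s)
    v = x0 if ns == 0 else x0 - s * np.sqrt(ns / (6 * L2)) / ns   # explicit v_k
    y = (k / (k + 3)) * x + (3 / (k + 3)) * v
    x = T(y, 2 * L2)                         # x_{k+1} = T_{2 L2}(y_k)
    s = s + (k + 1) * (k + 2) / 2 * grad(x)  # s_{k+1}
print(f(x))                                  # approaches f* = 0 at the O(1/k^3) rate
```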

Page 79

Global non-degeneracy

Standard setting: for convex f ∈ C^2(R^n), define positive constants σ_1 and L_1 such that

σ_1‖h‖^2 ≤ 〈f′′(x)h, h〉 ≤ L_1‖h‖^2

for all x, h ∈ R^n. The value γ_1(f) = σ_1/L_1 is called the condition number of f.

(Compatible with the definition in linear algebra.)

Geometric interpretation: 〈f′(x), x − x*〉 / ( ‖f′(x)‖·‖x − x*‖ ) ≥ 2√γ_1(f) / ( 1 + γ_1(f) ), x ∈ R^n.

Complexity (1st-order methods):

PGM: O( (1/γ_1(f)) · ln(1/ε) ),   FGM: O( (1/√γ_1(f)) · ln(1/ε) ).

This does not work for 2nd-order schemes: f(x_k) − f* ≤ 14 L_2 R^3 / ( k(k+1)(k+2) ).

Page 80

Global 2nd-order non-degeneracy

Assumption: for any x, y ∈ R^n, the function f ∈ C^2(R^n) satisfies the inequalities

‖f′′(x) − f′′(y)‖ ≤ L_2‖x − y‖,
〈f′(x) − f′(y), x − y〉 ≥ σ_2‖x − y‖^3,

where σ_2 > 0. We call the value γ_2(f) = σ_2/L_2 ∈ (0, 1) the 2nd-order condition number of the function f.
(Invariant w.r.t. the addition of convex quadratic functions.)

Example: γ_2(d) = 1/4.

Justification: (σ_2/3)‖x_k − x*‖^3 ≤ f(x_k) − f* ≤ 14 L_2‖x_0 − x*‖^3 / ( k(k+1)(k+2) ).

Hence, in O( 1/[γ_2(f)]^{1/3} ) iterations we halve the distance to x*.

Complexity bound (Accelerated CNM with restart): O( (1/[γ_2(f)]^{1/3}) · ln(1/ε) ) iterations.

Page 81

Open questions

1. Problem classes.

2. Lower complexity bounds and optimal methods.

3. Non-degenerate problems: geometric interpretation?

4. Complexity of strongly convex functions. (1st-order schemes?)

5. Consequences for polynomial-time methods.

Page 82

Solving systems of nonlinear equations

1. Standard Gauss-Newton method

Problem: find x ∈ R^n satisfying the system F(x) = 0 ∈ R^m.

Assumption: for all x, y ∈ R^n, ‖F′(x) − F′(y)‖ ≤ L‖x − y‖.
Gauss-Newton method: choose a merit function φ(u) ≥ 0, φ(0) = 0, u ∈ R^m.

Compute x_+ ∈ Arg min_y [ φ( F(x) + F′(x)(y − x) ) ].

Usual choice: φ(u) = ∑_{i=1}^m u_i^2. (Justification: why not?)

Remarks

Local quadratic convergence (m ≥ n, non-degeneracy andF (x∗) = 0 (?)).

If m < n, then the method is not well-defined.

No global complexity results.

Page 83

Modified Gauss-Newton method

Lemma. For all x, y ∈ R^n we have

‖F(y) − F(x) − F′(x)(y − x)‖ ≤ (L/2)‖y − x‖^2.

Corollary. Denote f(y) = ‖F(y)‖. Then

f(y) ≤ ‖F(x) + F′(x)(y − x)‖ + (L/2)‖y − x‖^2.

Modified method:

x_{k+1} = arg min_y [ ‖F(x_k) + F′(x_k)(y − x_k)‖ + (L/2)‖y − x_k‖^2 ].

Remarks

The merit function is non-smooth.

Nevertheless, f (xk+1) < f (xk) unless xk is a stationary point.

Quadratic convergence for non-degenerate solutions.

Global efficiency bounds.

Problem of finding xk+1 is convex.

Different norms in Rn and Rm can be used.
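A hedged sketch of the modified Gauss-Newton iteration above on an assumed 2×2 system; each convex subproblem is solved crudely with a derivative-free routine (a real implementation would exploit its structure), and L = 4 is a rough bound for the Lipschitz constant of F′ on the region of interest.

```python
import numpy as np
from scipy.optimize import minimize

F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1] ** 2])
J = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -2 * x[1]]])
L = 4.0

def mgn_step(x):
    # min_y ||F(x) + F'(x)(y - x)|| + (L/2)||y - x||^2
    model = lambda y: np.linalg.norm(F(x) + J(x) @ (y - x)) + 0.5 * L * np.dot(y - x, y - x)
    return minimize(model, x, method="Nelder-Mead", options={"xatol": 1e-10, "fatol": 1e-12}).x

x = np.array([2.0, 2.0])
for _ in range(30):
    x = mgn_step(x)
print(x, np.linalg.norm(F(x)))     # ||F(x_k)|| decreases monotonically (unless x_k is stationary)
```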

Page 84

Testing CNM: Chebyshev oscillator

Consider f(x) = ¼(1 − x^(1))^2 + ∑_{i=1}^{n−1} ( x^(i+1) − p_2(x^(i)) )^2, with p_2(τ) = 2τ^2 − 1.

Note that p_2 is a Chebyshev polynomial: p_k(τ) = cos(k arccos τ).

Hence, the equation of the “central path” is

x^(i+1) = p_2(x^(i)) = p_4(x^(i−1)) = · · · = p_{2^i}(x^(1)).

This is an exponential oscillation! However, all coefficients and derivatives are small.

NB: f(x) is unimodal and x* = (1, . . . , 1).

In our experiments we usually take x_0 = (−1, 1, . . . , 1).
Drawback: x_0 − 2∇f(x_0) = x*. Hence, sometimes we use x_0 = (−1, 0.9, . . . , 0.9).
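The test function written out explicitly (a direct transcription of the definition above; the dimension is an arbitrary choice):

```python
import numpy as np

def chebyshev_oscillator(x):
    p2 = 2 * x[:-1] ** 2 - 1                      # p_2 applied to x^(1), ..., x^(n-1)
    return 0.25 * (1 - x[0]) ** 2 + ((x[1:] - p2) ** 2).sum()

n = 5
x_star = np.ones(n)
x0 = np.concatenate(([-1.0], np.ones(n - 1)))     # standard starting point
x0_alt = np.concatenate(([-1.0], 0.9 * np.ones(n - 1)))
print(chebyshev_oscillator(x_star), chebyshev_oscillator(x0))   # 0.0 and 1.0
```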

Page 85

Solving the Chebyshev oscillator by CNM: ‖∇f(x)‖ ≤ 10⁻⁸

n Iter DF GNorm NumF Time (s)

2 14 7.0 · 10−19 4.2 · 10−09 18 0.032

3 33 1.1 · 10−24 7.5 · 10−12 51 0.031

4 82 1.7 · 10−20 9.3 · 10−10 148 0.047

5 207 4.5 · 10−19 1.2 · 10−09 395 0.078

6 541 1.0 · 10−17 5.6 · 10−09 1062 0.266

7 1490 1.4 · 10−18 2.9 · 10−09 2959 0.609

8 4087 2.7 · 10−17 9.1 · 10−09 8153 1.782

9 11205 1.6 · 10−16 9.6 · 10−09 22389 5.922

10 30678 2.7 · 10−15 9.6 · 10−09 61335 18.89

11 79292 7.7 · 10−14 1.0 · 10−08 158563 57.813

12 171522 9.7 · 10−13 9.9 · 10−09 343026 144.266

13 385353 1.3 · 10−11 9.9 · 10−09 770691 347.094

14 938758 2.1 · 10−11 1.0 · 10−08 1877500 1232.953

15 2203700 7.8 · 10−11 1.0 · 10−08 4407385 3204.359

Page 86: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Other methods

        Trust region        Knitro    Minos 5.5          Snopt
n       Inner     Iter      Iter      Iter      NFG      Iter#    NFG
3       129       50        30        44        120      106      78
4       431       123       80        136       309      268      204
5       1310      299       203       339       793      647      509
6       3963      722       531       871       2022     1417     1149∗
7       12672     1921      1467      2291      5404     ∗ ∗ ∗
8       40036     5234      4040      6109      14680
9       120873    13907     11062     11939     28535
10      358317    36837     29729∗    ∗ ∗ ∗
11      842368    78854     ∗ ∗ ∗
12      2121780   182261

Notation: ∗ early termination, (∗ ∗ ∗) numerical difficulties/inaccurate solution, # needs an alternative starting point.

Trust region: very reliable, but T(12) = 2577 sec (Matlab), T(n) = Const · (4.5)^n.

Page 87: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Advanced Convex Optimization (PGMO 2016)

Lecture 3. Structural Optimization: Interior-Point Methods

Yurii Nesterov, CORE/INMA (UCL)

January 20-22, 2016 (Ecole Polytechnique, Paris)

Page 88: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Outline

1 Interior-point methods: standard problem

2 Newton method

3 Self-concordant functions and barriers

4 Minimizing self-concordant functions

5 Conic optimization problems

6 Primal-dual barriers

7 Primal-dual central path and path-following methods

Page 89: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Interior Point Methods

Black-Box Methods: Main assumptions represent the bounds for the size of certain derivatives.

Example

Consider the function

f(x1, x2) = x2²/(2x1) for x1 > 0,  f(0, 0) = 0.

It is closed, convex, but discontinuous at the origin.

However, its epigraph {x ∈ R³ : x1x3 ≥ x2²} is a simple convex set:

x1 = u1 + u3, x2 = u2, x3 = u1 − u3  ⇒  u1 ≥ √(u2² + u3²).

(Lorentz cone)

Question: Can we always replace the functional components by convex sets?

Page 90: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Standard formulation

Problem: f ∗ = minx∈Q〈c , x〉,

where Q ⊂ E is a closed convex set with nonempty interior.

How can we measure the quality of x ∈ Q?

1. The residual ⟨c, x⟩ − f∗ is not very informative, since it does not depend on the position of x inside Q.

2. The boundary of a convex set can be very complicated.

3. It is easy to travel inside, provided that we keep a sufficient distance from the boundary.

Conclusion: we need a barrier function f(x):

dom f = int Q,

f(x) → ∞ as x → ∂Q.

Page 91: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Path-following method

Central path: for t > 0 define x∗(t) by tc + f′(x∗(t)) = 0
(hence x∗(t) = arg min_x [Ψt(x) := t⟨c, x⟩ + f(x)]).

Lemma. Suppose ⟨f′(x), y − x⟩ ≤ A for all x, y ∈ dom f. Then

⟨c, x∗(t) − x∗⟩ = (1/t)⟨f′(x∗(t)), x∗ − x∗(t)⟩ ≤ A/t.

Method: tk > 0, xk ≈ x∗(tk)  ⇒  tk+1 > tk, xk+1 ≈ x∗(tk+1).

For approximating x∗(tk+1), we need a powerful minimization scheme.

Main candidate: Newton Method. (Very good local convergence.)

Page 92: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Classical results on the Newton Method

Method: xk+1 = xk − [f′′(xk)]⁻¹ f′(xk).

Assume that:

f′′(x∗) ⪰ ℓ · In,   ‖f′′(x) − f′′(y)‖ ≤ M‖x − y‖ ∀x, y ∈ Rn.

The starting point x0 is close to x∗: ‖x0 − x∗‖ < r = 2ℓ/(3M).

Then ‖xk − x∗‖ < r for all k, and the Newton method converges quadratically:

‖xk+1 − x∗‖ ≤ M‖xk − x∗‖² / (2(ℓ − M‖xk − x∗‖)).

Note:

The description of the region of quadratic convergence is given in terms of the metric ⟨·, ·⟩. The resulting neighborhood changes when we choose another metric.

Page 93: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Simple observation

Let f(x) satisfy our assumptions. Consider φ(y) = f(Ay), where A is a non-degenerate (n × n)-matrix.

Lemma: Let {xk} be the sequence generated by the Newton Method for the function f. Consider the sequence {yk} generated by the Newton Method for the function φ with y0 = A⁻¹x0. Then yk = A⁻¹xk for all k ≥ 0.

Proof: Assume yk = A⁻¹xk for some k ≥ 0. Then

yk+1 = yk − [φ′′(yk)]⁻¹ φ′(yk) = yk − [Aᵀ f′′(Ayk) A]⁻¹ Aᵀ f′(Ayk)
     = A⁻¹xk − A⁻¹[f′′(xk)]⁻¹ f′(xk) = A⁻¹xk+1.

Conclusion: The method is affine invariant. Its region of quadratic convergence does not depend on the metric!

Page 94: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

What was wrong?

Old assumption: ‖f′′(x) − f′′(y)‖ ≤ M‖x − y‖.

Let f ∈ C³(Rn). Denote f′′′(x)[u] = lim_{α→0} (1/α)[f′′(x + αu) − f′′(x)]. This is a matrix!

Then the old assumption is equivalent to ‖f′′′(x)[u]‖ ≤ M‖u‖. Hence, at any point x ∈ Rn we have

(∗):  |⟨f′′′(x)[u]v, v⟩| ≤ M‖u‖ · ‖v‖²  for all u, v ∈ Rn.

Note:

The LHS of (∗) is an affine invariant directional derivative.

The norm ‖ · ‖ has nothing in common with our particular f.

However, there exists a local norm which is closely related to f. This is ‖u‖_{f′′(x)} = ⟨f′′(x)u, u⟩^{1/2}.

Let us make a similar assumption in terms of ‖ · ‖_{f′′(x)}.

Page 95: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Definition of Self-Concordant Function

Let f(x) ∈ C³(dom f) be a closed convex function with open domain. Let us fix a point x ∈ dom f and a direction u ∈ Rn.

Consider the function φ(x; t) = f(x + tu). Denote

Df(x)[u] = φ′_t(x; 0) = ⟨f′(x), u⟩,
D²f(x)[u, u] = φ′′_tt(x; 0) = ⟨f′′(x)u, u⟩ = ‖u‖²_{f′′(x)},
D³f(x)[u, u, u] = φ′′′_ttt(x; 0) = ⟨f′′′(x)[u]u, u⟩.

Def. We call the function f self-concordant if the inequality |D³f(x)[u, u, u]| ≤ 2‖u‖³_{f′′(x)} holds for any x ∈ dom f, u ∈ Rn.

Note:

We cannot expect that these functions are very common.

We hope that they are good for the Newton Method.

Page 96: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Examples

1. A linear function is s.c. since f′′(x) ≡ 0, f′′′(x) ≡ 0.

2. A convex quadratic function is s.c. (f′′′(x) ≡ 0).

3. Logarithmic barrier for a ray x > 0: f(x) = −ln x, f′(x) = −1/x, f′′(x) = 1/x², f′′′(x) = −2/x³.

4. Logarithmic barrier for a quadratic region. Consider a concave function φ(x) = α + ⟨a, x⟩ − (1/2)⟨Ax, x⟩. Define f(x) = −ln φ(x). Then

Df(x)[u] = −(1/φ(x))[⟨a, u⟩ − ⟨Ax, u⟩] =: ω1,
D²f(x)[u]² = (1/φ²(x))[⟨a, u⟩ − ⟨Ax, u⟩]² + (1/φ(x))⟨Au, u⟩,
D³f(x)[u]³ = −(2/φ³(x))[⟨a, u⟩ − ⟨Ax, u⟩]³ − (3⟨Au, u⟩/φ²(x))[⟨a, u⟩ − ⟨Ax, u⟩].

With ω2 = ⟨Au, u⟩/φ(x): D2 = ω1² + ω2, D3 = 2ω1³ + 3ω1ω2. Hence |D3| ≤ 2|D2|^{3/2}.

Page 97: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Simple properties

1. If f1, f2 are s.c. functions, then f1 + f2 is an s.c. function.

2. If f(y) is an s.c. function, then φ(x) = f(Ax + b) is also an s.c. function.

Proof: Denote y = y(x) = Ax + b, v = Au. Then

Dφ(x)[u] = ⟨f′(y(x)), Au⟩ = ⟨f′(y), v⟩,
D²φ(x)[u]² = ⟨f′′(y(x))Au, Au⟩ = ⟨f′′(y)v, v⟩,
D³φ(x)[u]³ = D³f(y(x))[Au]³ = D³f(y)[v]³.

Example: f(x) = ⟨c, x⟩ − Σ_{i=1}^m ln(ai − ‖Aix − bi‖²) is an s.c. function.

Page 98: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Main properties

Let x ∈ dom f and u ∈ Rn, u ≠ 0. For x + tu ∈ dom f, consider φ(t) = ⟨f′′(x + tu)u, u⟩^{−1/2}.

Lemma 1. For all feasible t we have |φ′(t)| ≤ 1.

Proof: Indeed, φ′(t) = −f′′′(x + tu)[u]³ / (2⟨f′′(x + tu)u, u⟩^{3/2}).

Corollary 1: dom φ contains the interval (−φ(0), φ(0)).

Proof: Since f(x + tu) → ∞ as x + tu → ∂ dom f, the same is true for ⟨f′′(x + tu)u, u⟩. Hence dom φ ≡ {t | φ(t) > 0}.

Denote ‖h‖²_x = ⟨f′′(x)h, h⟩ and W⁰(x; r) = {y ∈ Rn | ‖y − x‖_x < r}. Then

W⁰(x; r) ⊆ dom f for r < 1.

Page 99: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Main inequalities

Denote W(x; r) = {y ∈ Rn | ‖y − x‖_x < r}.

Theorem. For all x, y ∈ dom f the following inequality holds:

‖y − x‖_y ≥ ‖y − x‖_x / (1 + ‖y − x‖_x).

If ‖y − x‖_x < 1, then ‖y − x‖_y ≤ ‖y − x‖_x / (1 − ‖y − x‖_x).

Proof. 1. Let us choose u = y − x. Then

φ(1) = 1/‖y − x‖_y,  φ(0) = 1/‖y − x‖_x,

and φ(1) ≤ φ(0) + 1 in view of Lemma 1.

2. If ‖y − x‖_x < 1, then φ(0) > 1, and in view of Lemma 1, φ(1) ≥ φ(0) − 1.

Page 100: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Useful inequalities

Theorem. For any x, y ∈ dom f we have:

⟨f′(y) − f′(x), y − x⟩ ≥ ‖y − x‖²_x / (1 + ‖y − x‖_x),

f(y) ≥ f(x) + ⟨f′(x), y − x⟩ + ω(‖y − x‖_x),

where ω(t) = t − ln(1 + t).

Proof. Denote yτ = x + τ(y − x), τ ∈ [0, 1], and r = ‖y − x‖_x. Then

⟨f′(y) − f′(x), y − x⟩ = ∫₀¹ ⟨f′′(yτ)(y − x), y − x⟩ dτ = ∫₀¹ (1/τ²)‖yτ − x‖²_{yτ} dτ
  ≥ ∫₀¹ r²/(1 + τr)² dτ = r ∫₀ʳ dt/(1 + t)² = r²/(1 + r),

f(y) − f(x) − ⟨f′(x), y − x⟩ = ∫₀¹ ⟨f′(yτ) − f′(x), y − x⟩ dτ
  ≥ ∫₀¹ ‖yτ − x‖²_x / (τ(1 + ‖yτ − x‖_x)) dτ = ∫₀¹ τr²/(1 + τr) dτ = ∫₀ʳ t dt/(1 + t) = ω(r).

Page 101: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Similar inequalities

Theorem. Let x ∈ dom f and ‖y − x‖_x < 1. Then

⟨f′(y) − f′(x), y − x⟩ ≤ ‖y − x‖²_x / (1 − ‖y − x‖_x),

f(y) ≤ f(x) + ⟨f′(x), y − x⟩ + ω∗(‖y − x‖_x),

where ω∗(t) = −t − ln(1 − t).

Main Theorem: for any y ∈ W(x; r), r ∈ [0, 1), we have

(1 − r)² f′′(x) ⪯ f′′(y) ⪯ (1/(1 − r)²) f′′(x).

Corollary. For G = ∫₀¹ f′′(x + τ(y − x)) dτ, we have

(1 − r + r²/3) f′′(x) ⪯ G ⪯ (1/(1 − r)) f′′(x).

Observation: If dom f contains no straight line, then f′′(x) ≻ 0 for any x ∈ dom f. (If not, then W(x; 1) is unbounded.)

Page 102: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Minimizing the self-concordant function

Consider the problem min{f(x) | x ∈ dom f}. Assume that dom f contains no straight line.

Theorem. Let λf(x) < 1 for some x ∈ dom f. Then the solution x∗_f of this problem exists and is unique.

Proof. Indeed, for any y ∈ dom f we have:

f(y) ≥ f(x) + ⟨f′(x), y − x⟩ + ω(‖y − x‖_x)
     ≥ f(x) − ‖f′(x)‖∗_x · ‖y − x‖_x + ω(‖y − x‖_x)
     = f(x) − λf(x) · ‖y − x‖_x + ω(‖y − x‖_x).

Since ω(t) = t − ln(1 + t), the level sets are bounded ⇒ ∃ x∗_f.

It is unique since f(y) ≥ f(x∗_f) + ω(‖y − x∗_f‖_{x∗_f}) and f′′(x∗_f) is nondegenerate.

Example: f(x) = (1 − ε)x − ln x with ε ∈ (0, 1) and x = 1.

Page 103: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Damped Newton Method

Consider the following scheme: x0 ∈ dom f,

xk+1 = xk − (1/(1 + λf(xk))) [f′′(xk)]⁻¹ f′(xk).

Theorem. For any k ≥ 0 we have f(xk+1) ≤ f(xk) − ω(λf(xk)).

Proof. Denote λ = λf(xk). Then ‖xk+1 − xk‖_{xk} = λ/(1 + λ). Therefore,

f(xk+1) ≤ f(xk) + ⟨f′(xk), xk+1 − xk⟩ + ω∗(‖xk+1 − xk‖_{xk}) = f(xk) − ω(λ).

Consequence: we come to the region λf(xk) ≤ const in O(f(x0) − f∗) iterations.
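A minimal sketch of the damped Newton scheme in Python (grad and hess are assumed callables returning f′(x) and a nonsingular f′′(x); the tolerance and iteration cap are illustrative):

import numpy as np

def damped_newton(grad, hess, x0, tol=1e-9, max_iter=200):
    # x+ = x - [f''(x)]^{-1} f'(x) / (1 + lambda_f(x)),
    # where lambda_f(x) = <f'(x), [f''(x)]^{-1} f'(x)>^{1/2} is the Newton decrement.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        d = np.linalg.solve(H, g)
        lam = np.sqrt(g @ d)
        if lam < tol:
            break
        x = x - d / (1.0 + lam)
    return x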

Page 104: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Local convergence

For x close to x∗, f′(x∗) = 0, the function f(x) is almost quadratic:

f(x) ≈ f∗ + (1/2)⟨f′′(x∗)(x − x∗), x − x∗⟩.

Therefore, f(x) − f∗ ≈ (1/2)‖x − x∗‖²_{x∗} ≈ (1/2)‖x − x∗‖²_x
  ≈ (1/2)⟨f′(x), [f′′(x)]⁻¹ f′(x)⟩ =: (1/2)(‖f′(x)‖∗_x)² =: (1/2)λf²(x).

The last value is the local norm of the gradient. It is computable.

Theorem: Let x ∈ dom f and λf(x) < 1. Then the point x+ = x − [f′′(x)]⁻¹ f′(x) belongs to dom f and

λf(x+) ≤ (λf(x)/(1 − λf(x)))².

Page 105: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Proof

Denote p = x+ − x, λ = λf(x). Then ‖p‖_x = λ < 1, hence x+ ∈ dom f, and

λf(x+) ≤ (1/(1 − ‖p‖_x)) ‖f′(x+)‖∗_x = (1/(1 − λ)) ‖f′(x+)‖∗_x.

Note that f′(x+) = f′(x+) − f′(x) − f′′(x)(x+ − x) = Gp, where G = ∫₀¹ [f′′(x + τp) − f′′(x)] dτ. Therefore

(‖f′(x+)‖∗_x)² = ⟨[f′′(x)]⁻¹Gp, Gp⟩ ≤ ‖H‖² · ‖p‖²_x,

where H = [f′′(x)]^{−1/2} G [f′′(x)]^{−1/2}. In view of the Corollary,

(−λ + (1/3)λ²) f′′(x) ⪯ G ⪯ (λ/(1 − λ)) f′′(x).

Therefore ‖H‖ ≤ max{λ/(1 − λ), λ − (1/3)λ²} = λ/(1 − λ), and

λf²(x+) ≤ (1/(1 − λ)²)(‖f′(x+)‖∗_x)² ≤ λ⁴/(1 − λ)⁴.

NB: The region of quadratic convergence is λf(x) < λ̄, where λ̄/(1 − λ̄)² = 1 (λ̄ = (3 − √5)/2).

It is affine-invariant!

Page 106: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Following the central path

Consider Ψt(x) = t⟨c, x⟩ + f(x) with an s.c. function f.

For Ψt, the Newton Method has local quadratic convergence.

The region of quadratic convergence (RQC) is given by λ_{Ψt}(x) ≤ β < λ̄.

Assume we know x = x∗(t). We want to update t, t+ = t + ∆, keeping x in the RQC of the function Ψ_{t+∆}: λ_{Ψ_{t+∆}}(x) ≤ β.

Question: How large can ∆ be? Since tc + f′(x) = 0, we have:

λ_{Ψ_{t+∆}}(x) = ‖t+ c + f′(x)‖∗_x = |∆| · ‖c‖∗_x = (|∆|/t) ‖f′(x)‖∗_x ≤ β.

Conclusion: for a linear rate, we need to assume that

⟨[f′′(x)]⁻¹ f′(x), f′(x)⟩ is uniformly bounded on dom f.

Thus, we come to the definition of self-concordant barrier.

Page 107: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Definition of Self-Concordant Barrier

Let F(x) be an s.c. function. It is a ν-self-concordant barrier if

max_{u∈Rn} [2⟨F′(x), u⟩ − ⟨F′′(x)u, u⟩] ≤ ν  for all x ∈ dom F.

The value ν is called the parameter of the barrier.

If F′′(x) is non-degenerate, then ⟨F′(x), [F′′(x)]⁻¹F′(x)⟩ ≤ ν.

Another form: ⟨F′(x), u⟩² ≤ ν⟨F′′(x)u, u⟩.

Main property: ⟨F′(x), y − x⟩ ≤ ν, x, y ∈ int Q.

NB: ν is responsible for the rate of the path-following method: t+ = t ± α·t/ν^{1/2}.

Complexity: O(√ν ln(ν/ε)) iterations of the Newton method.

Calculus:
1. Affine transformations do not change ν.
2. Restriction onto a subspace can only decrease ν.
3. F = F1 + F2 ⇒ ν = ν1 + ν2.

Page 108: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Examples

1. Barrier for a ray: F(t) = −ln t, F′(t) = −1/t, F′′(t) = 1/t², ν = 1.

2. Polytope {x : ⟨ai, x⟩ ≤ bi}: F(x) = −Σ_{i=1}^m ln(bi − ⟨ai, x⟩), ν = m.

3. ℓ2-ball: F(x) = −ln(1 − ‖x‖²), D1 = ω1, D2 = ω1² + ω2, ν = 1.

4. Intersection of ellipsoids: F(x) = −Σ_{i=1}^m ln(ri² − ‖Aix − bi‖²), ν = m.

5. Epigraph {t ≥ e^x}: F(x, t) = −ln(t − e^x) − ln(ln t − x), ν = 4.

6. Universal barrier. Define the polar set P(x) = {s : ⟨s, y − x⟩ ≤ 1 ∀y ∈ Q}. Then F(x) = −ln vol_n P(x) is an O(n)-s.c. barrier for Q.

7. Lorentz cone {t ≥ ‖x‖}: F(x, t) = −ln(t² − ‖x‖²), ν = 2.

8. LMI cone {X = Xᵀ ⪰ 0}: F(X) = −ln det X, ν = n.

Page 109: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Conic minimization problems

Problem: f∗ = min{⟨c, x⟩ : Ax = b, x ∈ K}, where

A ∈ Rm×n : Rn → Rm, m < n, c ∈ Rn, b ∈ Rm;

K, int K ≠ ∅, is a closed convex pointed cone:
1. x1, x2 ∈ K ⇒ x1 + x2 ∈ K.
2. x ∈ K, τ ≥ 0 ⇒ τx ∈ K.
3. K contains no straight line.

Assumptions:

A is nondegenerate, b ≠ 0.

There is no y ∈ Rm such that c = Aᵀy.

Explanations:

If b = 0, then either f∗ = 0 or f∗ = −∞.

If ∃y ∈ Rm : c = Aᵀy, then for all x with Ax = b we have ⟨c, x⟩ = ⟨Aᵀy, x⟩ = ⟨y, Ax⟩ = ⟨b, y⟩.

Page 110: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Main Assumption:

There exists a computable ν-normal barrier F(x) for K, that is:

F(x) is a ν-self-concordant barrier for K;

F(x) is logarithmically homogeneous: ∀x ∈ int K, τ > 0,
(∗) F(τx) = F(x) − ν ln τ.

Examples:

1. Positive orthant: K = Rn₊ := {x ∈ Rn : x^(i) ≥ 0, i = 1, . . . , n}, F(x) = −Σ_{i=1}^n ln x^(i), ν = n.

2. Cone of positive semidefinite matrices: K = Sn₊ := {X ∈ Rn×n : X = Xᵀ, ⟨Xu, u⟩ ≥ 0 ∀u ∈ Rn}, F(X) = −ln det X, ν = n.

3. Second-order cone: K = Ln := {z = (x, τ) ∈ Rn+1 : τ ≥ ‖x‖}, F(z) = −ln(τ² − ‖x‖²), ν = 2.

4. Direct sums of these cones.

Page 111: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Properties of logarithmically homogeneous barriers

For any x ∈ int K and τ > 0 we have:

(1): F′(τx) = (1/τ)F′(x),   (2): F′′(τx) = (1/τ²)F′′(x),
(3): ⟨F′(x), x⟩ = −ν,       (4): F′′(x)x = −F′(x),
(5): ⟨F′′(x)x, x⟩ = ν,      (6): ⟨[F′′(x)]⁻¹F′(x), F′(x)⟩ = ν.

Proof:
1. Differentiate (∗) in x: τF′(τx) = F′(x).
2. Differentiate 1) in x: τF′′(τx) = (1/τ)F′′(x).
3. Differentiate (∗) in τ: ⟨F′(τx), x⟩ = −ν/τ. Take τ = 1 to get 3).
4. Differentiate 3) in x: F′′(x)x + F′(x) = 0.
5. Substitute 4) into 3).
6. Substitute 4) into 5).
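These identities are easy to check numerically, e.g. for F(x) = −Σ ln x^(i) on the positive orthant (ν = n); a small sketch:

import numpy as np

n = 4
x = np.random.rand(n) + 0.1
F1 = -1.0 / x                      # F'(x)
F2 = np.diag(1.0 / x ** 2)         # F''(x)

assert np.isclose(F1 @ x, -n)                        # (3): <F'(x), x> = -nu
assert np.allclose(F2 @ x, -F1)                      # (4): F''(x) x = -F'(x)
assert np.isclose(x @ (F2 @ x), n)                   # (5): <F''(x) x, x> = nu
assert np.isclose(F1 @ np.linalg.solve(F2, F1), n)   # (6): <[F''(x)]^{-1} F'(x), F'(x)> = nu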

Page 112: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Dual cones

Definition. Let K be a closed convex cone. The set K∗ = {s : ⟨s, x⟩ ≥ 0 ∀x ∈ K} is called the dual cone to K.

Theorem. If K is a proper cone, then K∗ is also proper and (K∗)∗ = K.

Proof: K∗ is closed and convex as an intersection of half-spaces. If K∗ contains a straight line {s = τ s̄, τ ∈ R}, then ⟨s̄, x⟩ = 0 ∀x ∈ K (contradiction).

For all s ∈ K∗ and x ∈ K we have ⟨s, x⟩ ≥ 0. Therefore K ⊆ (K∗)∗. If ∃u ∈ (K∗)∗ \ K, then ∃s̄ : ⟨s̄, u⟩ < ⟨s̄, x⟩ ∀x ∈ K. Hence s̄ ∈ K∗ and u ∉ (K∗)∗. Contradiction.

If int K∗ = ∅, then ∃x̄ ≠ 0 : ⟨s, x̄⟩ = 0 ∀s ∈ K∗. Therefore ±x̄ ∈ (K∗)∗ ≡ K. Contradiction.

Page 113: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Conjugate barriers

Definition. Let K be a proper cone and F(x) be a ν-s.c.b. for K. The function

F∗(s) = max{−⟨s, x⟩ − F(x) : x ∈ K}

is called the conjugate (or dual) barrier.

Main properties:

dom F∗ ≡ int K∗.

F∗(s) is a ν-normal barrier for K∗.

For any x ∈ int K and s ∈ int K∗ we have

F(x) + F∗(s) ≥ −ν ln⟨s, x⟩ − ν + ν ln ν.

Equality is attained iff s = −τF′(x) for some τ > 0.

Examples: The barriers for Rn₊, Ln and Sn₊ are self-dual.

Page 114: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Primal–Dual Problems

Primal problem: f∗ = min_x {⟨c, x⟩ : Ax = b, x ∈ K}.

Dual problem: f∗ = max_{y,s} {⟨b, y⟩ : s + Aᵀy = c, s ∈ K∗}.

Denote by FP and FD the feasible sets of the primal and dual problems.

Note:

For any x ∈ FP and (s, y) ∈ FD we have 0 ≤ ⟨s, x⟩ = ⟨c − Aᵀy, x⟩ = ⟨c, x⟩ − ⟨b, y⟩.

Therefore we always have f∗ ≥ f∗ (primal value ≥ dual value).

Page 115: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Main Assumption (we always assume this)

There exists a strictly feasible primal-dual solution (x, s, y): Ax = b, x ∈ int K, s + Aᵀy = c, s ∈ int K∗.

Primal central path: x(t) = arg min_x {t⟨c, x⟩ + F(x) : Ax = b}.

Dual central path: (s(t), y(t)) = arg min {−t⟨b, y⟩ + F∗(s) : s + Aᵀy = c}.

Primal–dual central path: (x(t), s(t), y(t)), t > 0.

Lemma. The primal-dual central path is well defined.

Proof: Note that for any x ∈ FP, using the strictly feasible dual pair (s, y),

F(x) ≥ −t⟨s, x⟩ − F∗(ts) = −t(⟨c, x⟩ − ⟨b, y⟩) − F∗(ts).

Therefore t⟨c, x⟩ + F(x) ≥ t⟨b, y⟩ − F∗(ts). Thus, x(t) exists. The proof for the dual path is symmetric.

Page 116: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Properties of primal-dual central path

Theorem. For any t > 0 we have:

⟨c, x(t)⟩ − ⟨b, y(t)⟩ = ν/t,
s(t) = −(1/t) F′(x(t)),  x(t) = −(1/t) F′∗(s(t)),
F(x(t)) + F∗(s(t)) = −ν + ν ln t.

Proof: Let us write down the optimality conditions for x(t):

tc + F′(x(t)) = Aᵀȳ(t),  Ax(t) = b.

Denote s(t) = −(1/t)F′(x(t)), y(t) = (1/t)ȳ(t). Then x(t) = −F′∗(t s(t)) = −(1/t)F′∗(s(t)).

Thus, c = s(t) + Aᵀy(t) and tb + AF′∗(s(t)) = 0.

These are the optimality conditions for the dual path. In view of uniqueness, (s(t), y(t)) is indeed the dual central path.

The rest: ⟨c, x(t)⟩ − ⟨b, y(t)⟩ = ⟨s(t), x(t)⟩ = ν/t,

F(x(t)) + F∗(s(t)) = F(x(t)) + F∗(−F′(x(t))) + ν ln t = −ν + ν ln t.

Page 117: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Remarks

Under our Main Assumption f ∗ = f∗.

The set FP ×FD is never bounded.

We have complete characterization for the duality gap〈c , x〉 − 〈b, y〉 and the barrier F (x) + F∗(s) along the centralpath.

This information forms the basis for all primal-dual schemes.

That is not for free: We assume that F∗(s) is computable.


Page 118: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Primal–dual potential

Φ(x, s) = 2ν ln⟨s, x⟩ + F(x) + F∗(s) = 2ν ln[⟨c, x⟩ − ⟨b, y⟩] + F(x) + F∗(s).

Lemma. For any (x, s, y) ∈ F⁰_PD ≡ F⁰_P × F⁰_D we have:

⟨c, x⟩ − ⟨b, y⟩ ≤ (1/ν) exp{1 + (1/ν)Φ(x, s)}.

Proof:

Φ(x, s) = 2ν ln⟨s, x⟩ + F(x) + F∗(s)
        ≥ 2ν ln⟨s, x⟩ − ν + ν ln ν − ν ln⟨s, x⟩
        = ν ln[⟨c, x⟩ − ⟨b, y⟩] − ν + ν ln ν.

Main question: What can be the rate of decrease of Φ(x , s)?

Page 119: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Decrease along the central path z(t) = (x(t), s(t), y(t))

We want to have ‖z(t + ∆t) − z(t)‖_{z(t)} ≤ 1. This is approximately |∆t| ≤ 1/‖z′(t)‖_{z(t)}.

Note that ⟨s′(t), x′(t)⟩ = 0 and

s′(t) = −(1/t)s(t) − (1/t)F′′(x(t))x′(t),
x′(t) = −(1/t)x(t) − (1/t)F′′∗(s(t))s′(t).

Therefore ⟨F′′(x(t))x′(t), x′(t)⟩ = −⟨s(t), x′(t)⟩ and ⟨F′′∗(s(t))s′(t), s′(t)⟩ = −⟨s′(t), x(t)⟩.

Hence ‖z′(t)‖²_{z(t)} = ⟨F′′(x(t))x′(t), x′(t)⟩ + ⟨F′′∗(s(t))s′(t), s′(t)⟩ = −(⟨s(t), x(t)⟩)′_t = −(ν/t)′_t = ν/t².

Thus, we can take ∆t = t/√ν.

For the potential: ∆t · (Φ(x(t), s(t)))′_t = (t/√ν) · (2ν ln(ν/t) + ν ln t)′_t = (t/√ν) · (−ν ln t)′_t = (t/√ν) · (−ν/t) = −√ν.

Page 120: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Proximity measure

Ω(x, s) = ν ln⟨s, x⟩ + F(x) + F∗(s) + ν − ν ln ν
        = ν ln[⟨c, x⟩ − ⟨b, y⟩] + F(x) + F∗(s) + ν − ν ln ν.

Properties:

Ω(x, s) ≥ 0 for all (x, s, y) ∈ F⁰_PD.

Ω(x, s) = 0 only along the central path.

The restriction of Ω(x, s) onto the hyperplane ⟨c, x⟩ − ⟨b, y⟩ = const is a convex self-concordant function.

Note: (x(t), s(t), y(t)) = arg min_{x,s,y} {F(x) + F∗(s) : Ax = b, s + Aᵀy = c, ⟨c, x⟩ − ⟨b, y⟩ = ν/t}.

Proof: z(t) is feasible and F(x(t)) + F∗(s(t)) = −ν + ν ln t. On the other hand, for any feasible (x, s, y) we have:

F(x) + F∗(s) ≥ −ν + ν ln ν − ν ln⟨s, x⟩ = −ν + ν ln ν − ν ln(ν/t) = −ν + ν ln t.

The minimum is unique since F_PD contains no straight line.

Page 121: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Primal-dual path-following scheme

If we are close to the central path, we can move along the tangent direction up to the moment Ω(x, s) ≤ β.

Then we fix ⟨c, x⟩ − ⟨b, y⟩ and go back to the central path by minimizing the barrier.

Efficiency estimate: O(√ν ln(1/ε)).

Advantages:

The tangent step is typically large.

The level β bounds the number of Newton steps of the corrector process by an absolute constant.

NB. These schemes are currently the most efficient ones for solving Linear and Quadratic Optimization problems and Linear Matrix Inequalities of moderate size.
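To make the ingredients concrete, here is a minimal primal path-following sketch in Python for min ⟨c, x⟩ subject to Ax ≤ b, with the logarithmic barrier F(x) = −Σ ln(bi − ⟨ai, x⟩) (ν = m). The update t+ = t(1 + 0.1/√ν), the single damped Newton step per update, the stopping rule and the requirement that x0 be strictly feasible are illustrative assumptions, not the exact constants of the lecture:

import numpy as np

def path_following_lp(c, A, b, x0, t0=1.0, eps=1e-8):
    m = A.shape[0]
    x, t = np.asarray(x0, dtype=float), t0
    while m / t > eps:                      # gap along the path ~ nu / t
        t *= 1.0 + 0.1 / np.sqrt(m)         # short-step increase of t
        s = b - A @ x                       # slacks, stay positive near the path
        g = t * c + A.T @ (1.0 / s)         # gradient of Psi_t(x) = t <c,x> + F(x)
        H = A.T @ ((1.0 / s ** 2)[:, None] * A)
        d = np.linalg.solve(H, g)
        lam = np.sqrt(g @ d)
        x = x - d / (1.0 + lam)             # damped Newton (corrector) step
    return x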

Page 122: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Advanced Convex Optimization (PGMO 2016)

Lecture 4. Structural Optimization: Smoothing Technique

Yurii Nesterov, CORE/INMA (UCL)

January 20-22, 2016 (Ecole Polytechnique, Paris)

Page 123: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Outline

1 Nonsmooth Optimization

2 Smoothing technique

3 Application examples

Page 124: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Nonsmooth Unconstrained Optimization

Problem: min{f(x) : x ∈ Rn} ⇒ x∗, f∗ = f(x∗), where f(x) is a nonsmooth convex function.

Subgradients: g ∈ ∂f(x) ⇔ f(y) ≥ f(x) + ⟨g, y − x⟩ ∀y ∈ Rn.

Main difficulties:

g ∈ ∂f(x) is not a descent direction at x.

g ∈ ∂f(x∗) does not imply g = 0.

Example

f(x) = max_{1≤j≤m} {⟨aj, x⟩ + bj},  ∂f(x) = Conv{aj : ⟨aj, x⟩ + bj = f(x)}.

Page 125: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Subgradient methods in Nonsmooth Optimization

Advantages

Very simple iteration scheme.

Low memory requirements.

Optimal rate of convergence (uniformly in the dimension).

Interpretation of the process.

Objections:

Low rate of convergence. (Confirmed by theory!)

No acceleration.

High sensitivity to the step-size strategy.

Page 126: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Lower complexity bounds

Nemirovsky, Yudin 1976

If f(x) is given by a local black box, it is impossible to converge faster than O(1/√k) uniformly in n. (k is the number of calls of the oracle.)

NB: Convergence is very slow.

Question: We want to find an ε-solution of the problem

min_{x∈Rn} max_{1≤j≤m} {⟨aj, x⟩ + bj}

by a gradient scheme (n and m are big). What is the worst-case complexity bound?

“Right answer” (Complexity Theory): O(1/ε²) calls of the oracle.

Our target: A gradient scheme with O(1/ε) complexity bound.

Reason for the speed up: our problem is not in a black box.

Page 127: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Complexity of Smooth Minimization

Problem: f(x) → min_x : x ∈ Rn, where f is a convex function and ‖∇f(x) − ∇f(y)‖∗ ≤ L(f)‖x − y‖ for all x, y ∈ Rn.

(For measuring gradients we use dual norms: ‖s‖∗ = max_{‖x‖=1} ⟨s, x⟩.)

Rate of convergence: the optimal method gives O(L(f)/k²).

Complexity: O(√(L(f)/ε)). The difference with O(1/ε²) is very big.

Page 128: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Smoothing the convex function

For a function f define its Fenchel conjugate: f∗(s) = max_{x∈Rn} [⟨s, x⟩ − f(x)].

It is a closed convex function with dom f∗ = Conv{f′(x) : x ∈ Rn}. Moreover, under very mild conditions (f∗(s))∗ ≡ f(x).

Define fµ(x) = max_{s∈dom f∗} [⟨s, x⟩ − f∗(s) − (µ/2)‖s‖²∗], where ‖ · ‖∗ is a Euclidean norm.

Note: f′µ(x) = sµ(x), the maximizer above, and x = f′∗(sµ(x)) + µ sµ(x). Therefore,

‖x1 − x2‖² = ‖f′∗(s1) − f′∗(s2)‖² + 2µ⟨f′∗(s1) − f′∗(s2), s1 − s2⟩ + µ²‖s1 − s2‖² ≥ µ²‖s1 − s2‖².

Thus, fµ ∈ C^{1,1}_{1/µ} and f(x) ≥ fµ(x) ≥ f(x) − µD², where D = Diam(dom f∗).

Page 129: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Main questions

1. Given a non-smooth convex f(x), can we form a computable smooth ε-approximation fε(x) with L(fε) = O(1/ε)?

If yes, we need only O(√(L(fε)/ε)) = O(1/ε) iterations.

2. Can we do this in a systematic way?

Conclusion: We need a convenient model of our problem.

Page 130: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Adjoint problem

Primal problem: Find f∗ = min_x {f(x) : x ∈ Q1}, where Q1 ⊂ E1 is convex, closed and bounded.

Objective: f(x) = f̂(x) + max_u {⟨Ax, u⟩2 − φ̂(u) : u ∈ Q2}, where

f̂(x) is differentiable and convex on Q1;

Q2 ⊂ E2 is closed, convex and bounded;

φ̂(u) is a continuous convex function on Q2;

A : E1 → E∗2 is a linear operator.

Adjoint problem: max_u {φ(u) : u ∈ Q2}, where

φ(u) = −φ̂(u) + min_x {⟨Ax, u⟩2 + f̂(x) : x ∈ Q1}.

NB: The adjoint problem is not unique!

Page 131: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Example

Consider f(x) = max_{1≤j≤m} |⟨aj, x⟩1 − bj|.

1. Q2 = E∗1, A = I, φ̂(u) ≡ f∗(u) = max_x {⟨u, x⟩1 − f(x) : x ∈ E1}
   = min_{s∈Rm} { Σ_{j=1}^m sj bj : u = Σ_{j=1}^m sj aj, Σ_{j=1}^m |sj| ≤ 1 }.

2. E2 = Rm, φ̂(u) = ⟨b, u⟩2:
   f(x) = max_{1≤j≤m} |⟨aj, x⟩1 − bj| = max_{u∈Rm} { Σ_{j=1}^m uj [⟨aj, x⟩1 − bj] : Σ_{j=1}^m |uj| ≤ 1 }.

3. E2 = R2m, φ̂(u) is linear, Q2 is a simplex:
   f(x) = max_{u∈R2m} { Σ_{j=1}^m (uj¹ − uj²)[⟨aj, x⟩1 − bj] : Σ_{j=1}^m (uj¹ + uj²) = 1, u ≥ 0 }.

NB: Increasing dim E2 decreases the complexity of the representation.

Page 132: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Smooth approximations

Prox-function: d2(u) is continuous and strongly convex on Q2:

d2(v) ≥ d2(u) + ⟨∇d2(u), v − u⟩2 + (1/2)σ2‖v − u‖²2.

Assume: d2(u0) = 0 and d2(u) ≥ 0 ∀u ∈ Q2.

Fix µ > 0, the smoothing parameter, and define

fµ(x) = max_u {⟨Ax, u⟩2 − φ̂(u) − µ d2(u) : u ∈ Q2}.

Denote by u(x) the solution of this problem.

Theorem: fµ(x) is convex and differentiable for x ∈ E1. Its gradient ∇fµ(x) = A∗u(x) is Lipschitz continuous with

L(fµ) = ‖A‖²_{1,2} / (µσ2),

where ‖A‖_{1,2} = max_{x,u} {⟨Ax, u⟩2 : ‖x‖1 = 1, ‖u‖2 = 1}.

NB: 1. For any x ∈ E1 we have f0(x) ≥ fµ(x) ≥ f0(x) − µD2, where D2 = max_u {d2(u) : u ∈ Q2}.

2. All norms are very important.

Page 133: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Optimal method

Problem: min_x {f(x) : x ∈ Q1} with f ∈ C^{1,1}(Q1).

Prox-function: a strongly convex d1(x), d1(x0) = 0, d1(x) ≥ 0, x ∈ Q1.

Gradient mapping: TL(x) = arg min_{y∈Q1} {⟨∇f(x), y − x⟩1 + (L/2)‖y − x‖²1}.

Method. For k ≥ 0 do:

1. Compute f(xk), ∇f(xk).
2. Find yk = T_{L(f)}(xk).
3. Find zk = arg min_{x∈Q1} {(L(f)/σ1) d1(x) + Σ_{i=0}^k ((i+1)/2)⟨∇f(xi), x⟩1}.
4. Set xk+1 = (2/(k+3)) zk + ((k+1)/(k+3)) yk.

Convergence: f(yk) − f(x∗) ≤ 4L(f)d1(x∗) / (σ1(k+1)²), where x∗ is the optimal solution.
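A minimal instantiation of this method in Python for the unconstrained Euclidean setup (Q1 = Rn, d1(x) = (1/2)‖x − x0‖², σ1 = 1), where both auxiliary problems have closed-form solutions; a sketch under those assumptions, not the general constrained scheme:

import numpy as np

def optimal_method(grad, L, x0, n_iters):
    x = np.array(x0, dtype=float)
    y = x.copy()
    acc = np.zeros_like(x)                  # accumulates sum_i (i+1)/2 * grad(x_i)
    for k in range(n_iters):
        g = grad(x)
        y = x - g / L                       # step 2: y_k = T_L(x_k)
        acc += 0.5 * (k + 1) * g
        z = x0 - acc / L                    # step 3: minimizer of L*d1(x) + <acc, x>
        x = 2.0 / (k + 3) * z + (k + 1.0) / (k + 3) * y   # step 4
    return y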

Page 134: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Applications

Smooth problem: minimize f̂(x) + fµ(x) over x ∈ Q1.

Lipschitz constant: Lµ = L(f̂) + ‖A‖²_{1,2}/(µσ2). Denote D1 = max_x {d1(x) : x ∈ Q1}.

Theorem: Let us choose N ≥ 1. Define

µ = µ(N) = (2‖A‖_{1,2}/(N+1)) · √(D1/(σ1σ2D2)).

After N iterations set x̄ = yN ∈ Q1 and ū = Σ_{i=0}^N (2(i+1)/((N+1)(N+2))) u(xi) ∈ Q2.

Then 0 ≤ f(x̄) − φ(ū) ≤ (4‖A‖_{1,2}/(N+1)) · √(D1D2/(σ1σ2)) + 4L(f̂)D1/(σ1(N+1)²).

Corollary. Let L(f̂) = 0. For getting an ε-solution, we choose

µ = ε/(2D2),  L = (2D2/σ2) · ‖A‖²_{1,2}/ε,  N ≥ (4‖A‖_{1,2}/ε) · √(D1D2/(σ1σ2)).

Page 135: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Example: Equilibrium in matrix games (1)

Denote ∆n = {x ∈ Rn : x ≥ 0, Σ_{i=1}^n x^(i) = 1}. Consider the problem

min_{x∈∆n} max_{u∈∆m} {⟨Ax, u⟩2 + ⟨c, x⟩1 + ⟨b, u⟩2}.

Minimization form:

min_{x∈∆n} f(x),  f(x) = ⟨c, x⟩1 + max_{1≤j≤m} [⟨aj, x⟩1 + bj],
max_{u∈∆m} φ(u),  φ(u) = ⟨b, u⟩2 + min_{1≤i≤n} [⟨âi, u⟩2 + ci],

where aj are the rows and âi are the columns of A.

1. Euclidean distance: Let us take

‖x‖²1 = Σ_{i=1}^n xi²,  ‖u‖²2 = Σ_{j=1}^m uj²,
d1(x) = (1/2)‖x − (1/n)en‖²1,  d2(u) = (1/2)‖u − (1/m)em‖²2.

Then ‖A‖_{1,2} = λ_max^{1/2}(AᵀA) and f(x̄) − φ(ū) ≤ 4λ_max^{1/2}(AᵀA)/(N+1).

Page 136: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Example: Equilibrium in matrix games (2)

2. Entropy distance. Let us choose

‖x‖1 = Σ_{i=1}^n |xi|,  d1(x) = ln n + Σ_{i=1}^n xi ln xi,
‖u‖2 = Σ_{j=1}^m |uj|,  d2(u) = ln m + Σ_{j=1}^m uj ln uj.

LM: σ1 = σ2 = 1. (Hint: ⟨d′′1(x)h, h⟩ = Σ_{i=1}^n hi²/xi, and min_{x∈∆n} Σ_{i=1}^n hi²/xi = ‖h‖²1.)

Moreover, since D1 = ln n, D2 = ln m, and

‖A‖_{1,2} = max_x {max_{1≤j≤m} |⟨aj, x⟩| : ‖x‖1 = 1} = max_{i,j} |Ai,j|,

we have f(x̄) − φ(ū) ≤ (4√(ln n ln m)/(N+1)) · max_{i,j} |Ai,j|.

NB: 1. Usually max_{i,j} |Ai,j| << λ_max^{1/2}(AᵀA).

2. We have fµ(x) = ⟨c, x⟩1 + µ ln((1/m) Σ_{j=1}^m e^{[⟨aj,x⟩+bj]/µ}).
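The entropy-smoothed objective and its gradient are cheap to evaluate; a small Python sketch (A has the vectors aj as rows; scipy's logsumexp/softmax are used only for numerical stability):

import numpy as np
from scipy.special import logsumexp, softmax

def f_mu(x, A, b, c, mu):
    # f_mu(x) = <c,x> + mu * ln( (1/m) * sum_j exp((<a_j,x> + b_j)/mu) )
    g = A @ x + b
    return c @ x + mu * (logsumexp(g / mu) - np.log(len(g)))

def grad_f_mu(x, A, b, c, mu):
    # gradient = c + A^T u(x), where u(x) is the entropy-prox maximizer (a softmax)
    u = softmax((A @ x + b) / mu)
    return c + A.T @ u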

Page 137: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Example 2: Continuous location problem

Problem: p cities with populations mj, j = 1, . . . , p, are located at points cj ∈ Rn, j = 1, . . . , p.

Goal: Construct a service center at a point x∗ which minimizes the total (population-weighted) distance from the cities to the center.

That is: Find f∗ = min_x { f(x) = Σ_{j=1}^p mj‖x − cj‖1 : ‖x‖1 ≤ r }.

Primal space: ‖x‖²1 = Σ_{i=1}^n (x^(i))², d1(x) = (1/2)‖x‖²1, σ1 = 1, D1 = (1/2)r².

Page 138: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Adjoint space: E2 = (E∗1)^p, ‖u‖²2 = Σ_{j=1}^p mj(‖uj‖∗1)²,

Q2 = {u = (u1, . . . , up) ∈ E2 : ‖uj‖∗1 ≤ 1, j = 1, . . . , p},

d2(u) = (1/2)‖u‖²2, σ2 = 1, D2 = (1/2)P,

with P ≡ Σ_{j=1}^p mj, the total size of the population.

Operator norm: ‖A‖_{1,2} = P^{1/2}.

Rate of convergence: f(x̄) − f∗ ≤ 2Pr/(N+1).

fµ(x) = Σ_{j=1}^p mj ψµ(‖x − cj‖1), where ψµ(τ) = τ²/(2µ) for τ ≤ µ, and ψµ(τ) = τ − µ/2 for µ ≤ τ.
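The function ψµ above is the Huber-type smoothing of the distance; a short sketch (centers and masses are assumed arrays of shapes (p, n) and (p,)):

import numpy as np

def psi_mu(tau, mu):
    # tau^2/(2 mu) for tau <= mu, and tau - mu/2 otherwise
    tau = np.abs(tau)
    return np.where(tau <= mu, tau ** 2 / (2.0 * mu), tau - mu / 2.0)

def f_mu_location(x, centers, masses, mu):
    # f_mu(x) = sum_j m_j * psi_mu(||x - c_j||)
    d = np.linalg.norm(x - centers, axis=1)
    return np.sum(masses * psi_mu(d, mu))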

Page 139: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Example 3: Variational inequalities (linear operator)

Consider B(w) = Bw + c : E → E∗, which is monotone: ⟨Bh, h⟩ ≥ 0 ∀h ∈ E.

Problem: Find w∗ ∈ Q : ⟨B(w∗), w − w∗⟩ ≥ 0 ∀w ∈ Q, where Q is a bounded closed convex set.

Merit function: ψ(w) = max_v {⟨B(v), w − v⟩ : v ∈ Q}.

ψ(w) is convex on E.

ψ(w) ≥ 0 for all w ∈ Q.

ψ(w) = 0 if and only if w solves the VI-problem.

⟨B(v), v⟩ is a convex function. Thus, ψ is exactly in our form.

Primal smoothing: ψµ(w) = max_v {⟨B(v), w − v⟩ − µ d2(v) : v ∈ Q}.

Dual smoothing: φµ(v) = min_w {⟨B(v), w − v⟩ + µ d1(w) : w ∈ Q}. (Looks better.)

Page 140: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Example 4: Piece-wise linear functions

1. Maximum of absolute values. Consider

min_x { f(x) = max_{1≤j≤m} |⟨aj, x⟩1 − b^(j)| : x ∈ Q1 }.

For simplicity choose ‖x‖²1 = Σ_{i=1}^n (x^(i))², d1(x) = (1/2)‖x‖²1.

It is convenient to choose E2 = R2m,

‖u‖2 = Σ_{j=1}^{2m} |u^(j)|,  d2(u) = ln(2m) + Σ_{j=1}^{2m} u^(j) ln u^(j).

Denote by A the matrix with rows aj. Then

f(x) = max_u {⟨Âx, u⟩2 − ⟨b̂, u⟩2 : u ∈ ∆2m},  where Â = [A; −A] and b̂ = [b; −b].

Page 141: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Thus, σ1 = σ2 = 1, D2 = ln(2m), D1 = (1/2)r², r = max_x {‖x‖1 : x ∈ Q1}.

Operator norm: ‖Â‖_{1,2} = max_{1≤j≤m} ‖aj‖∗1.

Complexity: 2√2 · r · max_{1≤j≤m} ‖aj‖∗1 · √(ln(2m)) · (1/ε).

Approximation: for ξ(τ) = (1/2)[e^τ + e^{−τ}] define

fµ(x) = µ ln((1/m) Σ_{j=1}^m ξ((1/µ)[⟨aj, x⟩ − b^(j)])).

Page 142: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Piece-wise linear functions: Sum of absolute values.

min_x { f(x) = Σ_{j=1}^m |⟨aj, x⟩1 − b^(j)| : x ∈ Q1 }.

Let us choose E2 = Rm, Q2 = {u ∈ Rm : |u^(j)| ≤ 1, j = 1, . . . , m}, and

d2(u) = (1/2)‖u‖²2 = (1/2) Σ_{j=1}^m ‖aj‖∗1 · (u^(j))².

Then fµ(x) = Σ_{j=1}^m ‖aj‖∗1 · ψµ(|⟨aj, x⟩1 − b^(j)| / ‖aj‖∗1),  ‖A‖²_{1,2} = P ≡ Σ_{j=1}^m ‖aj‖∗1.

On the other hand, D2 = (1/2)P and σ2 = 1. Thus, we get the following complexity bound:

(1/ε) · √(8D1/σ1) · Σ_{j=1}^m ‖aj‖∗1.

NB: The bound and the scheme allow m → ∞.

Page 143: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Computational experiments

Test problem: min_{x∈∆n} max_{u∈∆m} ⟨Ax, u⟩2.

Entries of A are uniformly distributed in [−1, 1].

Goal: Test of computational stability. Computer: 2.6GHz.

Complexity of iteration: 2mn operations.

Results for ε = 0.01 (iterations / time in seconds). Table 1

m\n     100        300        1000        3000         10000
100     808 / 0″   1011 / 0″  1112 / 3″   1314 / 12″   1415 / 44″
300     910 / 0″   1112 / 2″  1415 / 10″  1617 / 35″   1819 / 135″
1000    1112 / 2″  1213 / 8″  1415 / 32″  1718 / 115″  2020 / 451″

Number of iterations: 40 − 50% of the predicted values.

Page 144: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Results for ε = 0.001 (iterations / time in seconds). Table 2

m\n     100         300           1000          3000           10000
100     6970 / 2″   8586 / 8″     9394 / 29″    10000 / 91″    10908 / 349″
300     7778 / 8″   10101 / 27″   12424 / 97″   14242 / 313″   15656 / 1162″
1000    8788 / 30″  11010 / 105″  13030 / 339″  15757 / 1083″  18282 / 4085″

Results for ε = 0.0001 (iterations / time in seconds). Table 3

m\n     100                300            1000            3000
100     67068 / 25″        72073 / 80″    74075 / 287″    80081 / 945″
300     85086 / 89″ (42%)  92093 / 243″   101102 / 914″   112113 / 3302″
1000    97098 / 331″       100101 / 760″  116117 / 2936″  139140 / 11028″

Page 145: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Comparing the bounds

Smoothing + FGM: 2 · 4 · (mn/ε) · √(ln n ln m).

Short-step p.-f. method (n ≥ m): (7.2√n ln(1/ε)) · (m(m+1)/2) · n.

Right digits

m      n      2  3  4  5
100    100    g  g  b  b
300    300    g  g  b  b
300    1000   g  g  b  b
300    3000   g  g  =  b
300    10000  g  g  g  b
1000   1000   g  g  g  b
1000   3000   g  g  g  b
1000   10000  g  g  g  =

g – S+FGM, b – barrier method

Page 146: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Advanced Convex Optimization (PGMO 2016)

Lecture 5. Huge-scale optimization

Yurii Nesterov, CORE/INMA (UCL)

January 20-22, 2016 (Ecole Polytechnique, Paris)

Page 147: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Outline

1 Problems sizes

2 Random coordinate search

3 Confidence level of solutions

4 Sparse Optimization problems

5 Sparse updates for linear operators

6 Fast updates in computational trees

7 Simple subgradient methods

8 Application examples

Page 148: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Nonlinear Optimization: problems sizes

Class Operations Dimension Iter.Cost Memory

Small-size     All      10⁰ − 10²     n⁴ → n³      Kilobyte: 10³

Medium-size    A⁻¹      10³ − 10⁴     n³ → n²      Megabyte: 10⁶

Large-scale    Ax       10⁵ − 10⁷     n² → n       Gigabyte: 10⁹

Huge-scale     x + y    10⁸ − 10¹²    n → log n    Terabyte: 10¹²

Sources of Huge-Scale problems

Internet (New)

Telecommunications (New)

Finite-element schemes (Old)

Partial differential equations (Old)

Page 149: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Very old optimization idea: Coordinate Search

Problem: minx∈Rn

f (x) (f is convex and differentiable).

Coordinate relaxation algorithm

For k ≥ 0 iterate

1 Choose active coordinate ik .

2 Update xk+1 = xk − hk ∇_{ik} f(xk) e_{ik}, ensuring f(xk+1) ≤ f(xk). (ei is the i-th coordinate vector in Rn.)

Main advantage: Very simple implementation.

Page 150: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Possible strategies

1 Cyclic moves. (Difficult to analyze.)

2 Random choice of coordinate (Why?)

3 Choose coordinate with the maximal directional derivative.

Complexity estimate: assume ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖, x, y ∈ Rn.

Let us choose hk = 1/L. Then

f(xk) − f(xk+1) ≥ (1/(2L))|∇_{ik} f(xk)|² ≥ (1/(2nL))‖∇f(xk)‖² ≥ (1/(2nLR²))(f(xk) − f∗)².

Hence, f(xk) − f∗ ≤ 2nLR²/k, k ≥ 1. (For the Gradient Method, drop n.)

This is the only known theoretical result for CDM!

Page 151: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Criticism

Theoretical justification:

Complexity bounds are not known for most of the schemes.

The only justified scheme needs computation of the whole gradient. (Why not use the Gradient Method then?)

Computational complexity:

Fast differentiation: if the function is defined by a sequence of operations, then C(∇f) ≤ 4C(f).

Can we do anything without computing the function’s values?

Result: CDM are almost out of computational practice.

Page 152: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Google problem

Let E ∈ Rn×n be an incidence matrix of a graph. Denote e = (1, . . . , 1)ᵀ and

Ē = E · diag(Eᵀe)⁻¹.

Thus, Ēᵀe = e. Our problem is as follows:

Find x∗ ≥ 0 : Ēx∗ = x∗.

Optimization formulation:

f(x) := (1/2)‖Ēx − x‖² + (γ/2)[⟨e, x⟩ − 1]² → min_{x∈Rn}.

Huge-scale problems

Main features

The size is very big (n ≥ 107).

The data is distributed in space.

The requested parts of data are not always available.

The data may be changing in time.

Consequences

Simplest operations are expensive or infeasible:

Update of the full vector of variables.

Matrix-vector multiplication.

Computation of the objective function’s value, etc.

Page 154: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Structure of the Google Problem

Let us look at the gradient of the objective:

∇i f(x) = ⟨ai, g(x)⟩ + γ[⟨e, x⟩ − 1], i = 1, . . . , n,  g(x) = Ēx − x ∈ Rn  (Ē = (a1, . . . , an)).

Main observations:

The coordinate move x+ = x − hi ∇i f(x) ei needs O(pi) a.o. (pi is the number of nonzero elements in ai.)

The diagonal entries di := (∇²f)ii, with ∇²f := ĒᵀĒ + γeeᵀ, are available: di = γ + 1/pi.

We can use them for choosing the step sizes (hi = 1/di).

Reasonable coordinate choice strategy? Random!

Page 155: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Random coordinate descent methods (RCDM)

min_{x∈RN} f(x)  (f is convex and differentiable).

Main Assumption:

|f′i(x + hiei) − f′i(x)| ≤ Li|hi|, hi ∈ R, i = 1, . . . , N,

where ei is a coordinate vector. Then

f(x + hiei) ≤ f(x) + f′i(x)hi + (Li/2)hi²,  x ∈ RN, hi ∈ R.

Define the coordinate steps: Ti(x) := x − (1/Li) f′i(x) ei. Then

f(x) − f(Ti(x)) ≥ (1/(2Li)) [f′i(x)]², i = 1, . . . , N.

Page 156: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Random choice for coordinates

We need a special random counter Rα, α ∈ R:

Prob[i] = p^(i)_α = Li^α · [Σ_{j=1}^N Lj^α]⁻¹, i = 1, . . . , N.

Note: R0 generates the uniform distribution.

Method RCDM(α, x0)

For k ≥ 0 iterate:

1) Choose ik = Rα.

2) Update xk+1 = T_{ik}(xk).
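A minimal Python sketch of RCDM(α, x0); partial_grad(x, i) is assumed to return f′i(x), and the coordinate Lipschitz constants Li are given:

import numpy as np

def rcdm(partial_grad, L, x0, alpha=0.0, n_iters=10000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    L = np.asarray(L, dtype=float)
    p = L ** alpha
    p /= p.sum()                              # the random counter R_alpha
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        i = rng.choice(len(x), p=p)           # 1) choose i_k = R_alpha
        x[i] -= partial_grad(x, i) / L[i]     # 2) coordinate step T_i(x_k)
    return x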

Page 157: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Complexity bounds for RCDM

We need to introduce the following norms for x, g ∈ RN:

‖x‖α = [Σ_{i=1}^N Li^α [x^(i)]²]^{1/2},  ‖g‖∗α = [Σ_{i=1}^N (1/Li^α) [g^(i)]²]^{1/2}.

After k iterations, RCDM(α, x0) generates a random output xk, which depends on ξk = {i0, . . . , ik}. Denote φk = E_{ξk−1} f(xk).

Theorem. For any k ≥ 1 we have

φk − f∗ ≤ (2/k) · [Σ_{j=1}^N Lj^α] · R²_{1−α}(x0),

where Rβ(x0) = max_x {max_{x∗∈X∗} ‖x − x∗‖β : f(x) ≤ f(x0)}.

Page 158: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Interpretation

Denote Sα = Σ_{i=1}^N Li^α.

1. α = 0. Then S0 = N, and we get

φk − f∗ ≤ (2N/k) · R²1(x0).

Note

We use the metric ‖x‖²1 = Σ_{i=1}^N Li [x^(i)]².

A matrix with diagonal {Li}_{i=1}^N can have its norm equal to n.

Hence, for GM we can guarantee the same bound.

But its cost of iteration is much higher!

Page 159: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Interpretation

2. α = 1/2. Denote

D∞(x0) = max_x {max_{y∈X∗} max_{1≤i≤N} |x^(i) − y^(i)| : f(x) ≤ f(x0)}.

Then R²_{1/2}(x0) ≤ S_{1/2} D²∞(x0), and we obtain

φk − f∗ ≤ (2/k) · [Σ_{i=1}^N Li^{1/2}]² · D²∞(x0).

Note:

For first-order methods, the worst-case complexity of minimizing over a box depends on N.

Since S_{1/2} can be bounded, RCDM can be applied in situations where the usual GM fails.

Page 160: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Interpretation

3. α = 1. Then R0(x0) is the size of the initial level set in the standard Euclidean norm. Hence,

φk − f∗ ≤ (2/k) · [Σ_{i=1}^N Li] · R²0(x0) ≡ (2N/k) · [(1/N) Σ_{i=1}^N Li] · R²0(x0).

The rate of convergence of GM can be estimated as

f(xk) − f∗ ≤ (γ/k) R²0(x0),

where γ satisfies the condition f′′(x) ⪯ γ · I, x ∈ RN.

Note: the maximal eigenvalue of a symmetric matrix can reach its trace.

In the worst case, the rate of convergence of GM is the same as that of RCDM.

Page 161: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Minimizing the strongly convex functions

Theorem. Let f(x) be strongly convex with respect to ‖ · ‖_{1−α} with convexity parameter σ_{1−α} > 0. Then, for {xk} generated by RCDM(α, x0) we have

φk − φ∗ ≤ (1 − σ_{1−α}/Sα)^k (f(x0) − f∗).

Proof: Let xk be generated by RCDM after k iterations. Let us estimate the expected result of the next iteration:

f(xk) − E_{ik}(f(xk+1)) = Σ_{i=1}^N p^(i)_α · [f(xk) − f(Ti(xk))]
  ≥ Σ_{i=1}^N (p^(i)_α/(2Li)) [f′i(xk)]² = (1/(2Sα)) (‖f′(xk)‖∗_{1−α})²
  ≥ (σ_{1−α}/Sα) (f(xk) − f∗).

It remains to compute the expectation in ξ_{k−1}.

Page 162: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Confidence level of the answers

Note: We have proved that the expected values of the random f(xk) are good. Can we guarantee anything after a single run?

Confidence level: a probability β ∈ (0, 1) that some statement about the random output is correct.

Main tool: Chebyshev inequality (ξ ≥ 0): Prob[ξ ≥ T] ≤ E(ξ)/T.

Our situation: Prob[f(xk) − f∗ ≥ ε] ≤ (1/ε)[φk − f∗] ≤ 1 − β.

We need φk − f∗ ≤ ε · (1 − β). Too expensive for β → 1?

Page 163: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Regularization technique

Consider fµ(x) = f(x) + (µ/2)‖x − x0‖²_{1−α}. It is strongly convex.

Therefore, we can obtain φk − f∗µ ≤ ε · (1 − β) in O((Sα/µ) ln(1/(ε(1 − β)))) iterations.

Theorem. Define α = 1, µ = ε/(4R²0(x0)), and choose

k ≥ 1 + (8S1R²0(x0)/ε) [ln(2S1R²0(x0)/ε) + ln(1/(1 − β))].

Let xk be generated by RCDM(1, x0) as applied to fµ. Then

Prob(f(xk) − f∗ ≤ ε) ≥ β.

Note: β = 1 − 10⁻ᵖ ⇒ ln 10ᵖ = 2.3p.

Page 164: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Implementation details: Random Counter

Given the values Li, i = 1, . . . , N, generate efficiently a random i ∈ {1, . . . , N} with probabilities Prob[i = k] = Lk / Σ_{j=1}^N Lj.

Solution: a) Trivial ⇒ O(N) operations.

b) Assume N = 2ᵖ. Define p + 1 vectors Sk ∈ R^{2^{p−k}}, k = 0, . . . , p:

S^(i)_0 = Li, i = 1, . . . , N,
S^(i)_k = S^(2i)_{k−1} + S^(2i−1)_{k−1}, i = 1, . . . , 2^{p−k}, k = 1, . . . , p.

Algorithm: Make the choice in p steps, from top to bottom.

If the element i of Sk is chosen, then choose in S_{k−1} either 2i or 2i − 1 in accordance with the probabilities S^(2i)_{k−1}/S^(i)_k or S^(2i−1)_{k−1}/S^(i)_k.

Difference: for N = 2²⁰ > 10⁶ we have p = log2 N = 20.
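A minimal Python sketch of this O(log N) random counter (N is assumed to be a power of two, as on the slide):

import numpy as np

def build_counter(L):
    levels = [np.asarray(L, dtype=float)]          # S_0 = (L_1, ..., L_N)
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(prev[0::2] + prev[1::2])     # S_k^(i) = S_{k-1}^(2i-1) + S_{k-1}^(2i)
    return levels

def sample(levels, rng):
    i = 0                                          # start at the root, go top to bottom
    for k in range(len(levels) - 2, -1, -1):
        left, right = levels[k][2 * i], levels[k][2 * i + 1]
        i = 2 * i if rng.random() < left / (left + right) else 2 * i + 1
    return i                                       # Prob[i] = L_i / sum_j L_j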

Page 165: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Sparse problems

Problem: min_{x∈Q} f(x), where Q is closed and convex in RN, and f(x) = Ψ(Ax), where Ψ is a simple convex function:

Ψ(y1) ≥ Ψ(y2) + ⟨Ψ′(y2), y1 − y2⟩, y1, y2 ∈ RM,

and A : RN → RM is a sparse matrix.

Let p(x) := # of nonzeros in x. Sparsity coefficient: γ(A) := p(A)/(MN).

Example 1: Matrix-vector multiplication

Computation of the vector Ax needs p(A) operations.

The initial complexity MN is reduced by the factor γ(A).

Page 166: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Gradient Method

x0 ∈ Q, xk+1 = πQ(xk − h f′(xk)), k ≥ 0.

Main computational expenses:

Projection onto a simple set Q needs O(N) operations.

The displacement xk → xk − h f′(xk) needs O(N) operations.

f′(x) = AᵀΨ′(Ax). If Ψ is simple, then the main effort is spent on two matrix-vector multiplications: 2p(A).

Conclusion: As compared with full matrices, we accelerate by the factor γ(A).

Note: For Large- and Huge-scale problems, we often have γ(A) ≈ 10⁻⁴ . . . 10⁻⁶. Can we get more?

Page 167: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Sparse updating strategy

Main idea

After the update x+ = x + d we have y+ := Ax+ = Ax + Ad, where y = Ax is already known.

What happens if d is sparse?

Denote σ(d) = {j : d^(j) ≠ 0}. Then y+ = y + Σ_{j∈σ(d)} d^(j) · Aej.

Its complexity, κA(d) := Σ_{j∈σ(d)} p(Aej), can be VERY small!

κA(d) = M Σ_{j∈σ(d)} γ(Aej) = γ(d) · (1/p(d)) Σ_{j∈σ(d)} γ(Aej) · MN ≤ γ(d) · max_j γ(Aej) · MN.

If γ(d) ≤ c γ(A) and γ(Aej) ≤ c γ(A), then κA(d) ≤ c² · γ²(A) · MN.

Expected acceleration: (10⁻⁶)² = 10⁻¹² ⇒ 1 sec ≈ 32 000 years.
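A sketch of the sparse update of y = Ax in Python with scipy.sparse (A stored by columns; the update d is given by its nonzero indices and values):

import numpy as np
import scipy.sparse as sp

def sparse_residual_update(A_csc, y, d_idx, d_val):
    # y+ = y + sum_{j in sigma(d)} d_j * A e_j; cost = sum_{j in sigma(d)} p(A e_j)
    for j, dj in zip(d_idx, d_val):
        col = A_csc.getcol(j)                  # the sparse column A e_j
        y[col.indices] += dj * col.data
    return y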

Page 168: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

When it can work?

Simple methods: No full-vector operations! (Is it possible?)

Simple problems: Functions with sparse gradients.

Examples

1. Quadratic function f(x) = (1/2)⟨Ax, x⟩ − ⟨b, x⟩. The gradient

f′(x) = Ax − b, x ∈ RN,

is not sparse even if A is sparse.

2. Piece-wise linear function g(x) = max_{1≤i≤m} [⟨ai, x⟩ − b^(i)]. Its subgradient g′(x) = a_{i(x)}, with i(x) : g(x) = ⟨a_{i(x)}, x⟩ − b^{(i(x))}, can be sparse if a_{i(x)} is sparse!

But: We need a fast procedure for updating max-operations.

Page 169: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Fast updates in short computational trees

Def: A function f(x), x ∈ Rn, is short-tree representable if it can be computed by a short binary tree with height ≈ ln n.

Let n = 2^k and let the tree have k + 1 levels: v_{0,i} = x^(i), i = 1, . . . , n. The size of each next level is half the size of the previous one:

v_{i+1,j} = ψ_{i+1,j}(v_{i,2j−1}, v_{i,2j}), j = 1, . . . , 2^{k−i−1}, i = 0, . . . , k − 1,

where ψ_{i,j} are some bivariate functions.

[Figure: the binary computation tree with leaves v_{0,1}, . . . , v_{0,n} = x^(1), . . . , x^(n) and root v_{k,1} = f(x).]

Main advantages

Important examples (symmetric functions):

f(x) = ‖x‖p, p ≥ 1:  ψ_{i,j}(t1, t2) ≡ [|t1|^p + |t2|^p]^{1/p};

f(x) = ln(Σ_{i=1}^n e^{x^(i)}):  ψ_{i,j}(t1, t2) ≡ ln(e^{t1} + e^{t2});

f(x) = max_{1≤i≤n} x^(i):  ψ_{i,j}(t1, t2) ≡ max{t1, t2}.

The binary tree requires only n − 1 auxiliary cells.

Its value needs n − 1 applications of ψ_{i,j}(·, ·) (≡ operations).

If x+ differs from x in one entry only, then re-computing f(x+) needs only k ≡ log2 n operations.

Thus, we can have pure subgradient minimization schemes with Sublinear Iteration Cost.
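For f(x) = max_i x^(i), such a short computation tree is just a segment tree; a minimal Python sketch with n − 1 internal cells and about log2(n) operations per single-entry update:

class MaxTree:
    def __init__(self, x):
        self.n = len(x)
        self.t = [float('-inf')] * (2 * self.n)
        self.t[self.n:] = list(x)                        # leaves v_{0,i}
        for i in range(self.n - 1, 0, -1):               # internal nodes, psi = max
            self.t[i] = max(self.t[2 * i], self.t[2 * i + 1])

    def update(self, i, value):
        i += self.n
        self.t[i] = value
        while i > 1:                                      # recompute the path to the root
            i //= 2
            self.t[i] = max(self.t[2 * i], self.t[2 * i + 1])

    def value(self):
        return self.t[1]                                  # the root v_{k,1} = f(x)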

Page 171: Advanced Convex Optimization (PGMO) · Interior-point methods Self-concordant functions Self-concordant barriers Application examples Lecture 4. Smoothing Technique Explicit model

Simple subgradient methods

I. Problem: f∗ := min_{x∈Q} f(x), where

Q is closed and convex, and ‖f′(x)‖ ≤ L(f), x ∈ Q;

the optimal value f∗ is known.

Consider the following optimization scheme (B. Polyak, 1967):

x0 ∈ Q,  xk+1 = πQ(xk − ((f(xk) − f∗)/‖f′(xk)‖²) f′(xk)), k ≥ 0.

Denote f∗k = min_{0≤i≤k} f(xi). Then for any k ≥ 0 we have:

f∗k − f∗ ≤ L(f)‖x0 − πX∗(x0)‖ / (k + 1)^{1/2},

‖xk − x∗‖ ≤ ‖x0 − x∗‖, ∀x∗ ∈ X∗.


Proof:

Let us fix x* ∈ X*. Denote r_k(x*) = ‖x_k − x*‖. Since the projection π_Q is non-expansive,

    r_{k+1}^2(x*) ≤ ‖ x_k − [(f(x_k) − f*)/‖f'(x_k)‖^2]·f'(x_k) − x* ‖^2
                  = r_k^2(x*) − 2·[(f(x_k) − f*)/‖f'(x_k)‖^2]·〈f'(x_k), x_k − x*〉 + (f(x_k) − f*)^2/‖f'(x_k)‖^2
                  ≤ r_k^2(x*) − (f(x_k) − f*)^2/‖f'(x_k)‖^2      (by convexity, 〈f'(x_k), x_k − x*〉 ≥ f(x_k) − f*)
                  ≤ r_k^2(x*) − (f*_k − f*)^2/L^2(f).

Summing these inequalities over the first k + 1 iterations and taking x* = π_{X*}(x_0) gives the rate stated above.
Moreover, from this reasoning, ‖x_{k+1} − x*‖^2 ≤ ‖x_k − x*‖^2 for all x* ∈ X*.

Corollary: Assume X* has a recession direction d*. Then

    ‖x_k − π_{X*}(x_0)‖ ≤ ‖x_0 − π_{X*}(x_0)‖,    〈d*, x_k〉 ≥ 〈d*, x_0〉.

(Proof: consider x* = π_{X*}(x_0) + α·d*, α ≥ 0, and let α → ∞.)
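A minimal sketch of Polyak's scheme above in Python (the oracle routines f, f_sub and the projection proj_Q are assumed to be supplied by the user; all names are illustrative):

    import numpy as np

    def polyak_method(f, f_sub, proj_Q, x0, f_star, iters):
        """Polyak's step-size rule, assuming the optimal value f_star is known.

        f      : callable returning f(x)
        f_sub  : callable returning one subgradient f'(x)
        proj_Q : callable, Euclidean projection onto the closed convex set Q
        Returns the record point and the record value f*_k.
        """
        x = proj_Q(np.asarray(x0, dtype=float))
        x_rec, f_rec = x.copy(), f(x)
        for _ in range(iters):
            gap = f(x) - f_star
            if gap <= 0.0:                        # x is already optimal
                break
            g = f_sub(x)
            x = proj_Q(x - (gap / np.dot(g, g)) * g)
            fx = f(x)
            if fx < f_rec:                        # track the record value f*_k
                x_rec, f_rec = x.copy(), fx
        return x_rec, f_rec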


Constrained minimization (N. Shor (1964) & B. Polyak)

II. Problem:  min_{x∈Q} { f(x) : g(x) ≤ 0 },  where

    Q is closed and convex,

    f, g have uniformly bounded subgradients.

Consider the following method with a step-size parameter h > 0:

    If g(x_k) > h·‖g'(x_k)‖, then (A):  x_{k+1} = π_Q( x_k − [g(x_k)/‖g'(x_k)‖^2]·g'(x_k) ),

    else (B):  x_{k+1} = π_Q( x_k − [h/‖f'(x_k)‖]·f'(x_k) ).

Let F_k ⊆ {0, ..., k} be the set of (B)-iterations, and f*_k = min_{i∈F_k} f(x_i).

Theorem: If k > ‖x_0 − x*‖^2/h^2, then F_k ≠ ∅ and

    f*_k − f* ≤ h·L(f),    max_{i∈F_k} g(x_i) ≤ h·L(g).
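A minimal sketch of this switching scheme in Python (the oracle routines f, f_sub, g, g_sub and the projection proj_Q are assumed to be supplied by the user; names are illustrative):

    import numpy as np

    def switching_subgradient(f, f_sub, g, g_sub, proj_Q, x0, h, iters):
        """Shor/Polyak switching scheme for min { f(x) : g(x) <= 0, x in Q }.

        h : step-size parameter; the record value is taken over (B)-iterations only.
        """
        x = proj_Q(np.asarray(x0, dtype=float))
        x_rec, f_rec = None, np.inf
        for _ in range(iters):
            gx, dg = g(x), g_sub(x)
            if gx > h * np.linalg.norm(dg):
                # (A): the constraint is badly violated -- step on g
                x = proj_Q(x - (gx / np.dot(dg, dg)) * dg)
            else:
                # (B): nearly feasible -- take a short step on f, update the record
                if f(x) < f_rec:
                    x_rec, f_rec = x.copy(), f(x)
                df = f_sub(x)
                x = proj_Q(x - (h / np.linalg.norm(df)) * df)
        return x_rec, f_rec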


Computational strategies

1. Constants L(f), L(g) are known (e.g. Linear Programming)

We can take h = ε / max{L(f), L(g)}. Then we only need to decide on the number of steps N (easy: by the theorem above, any N > ‖x_0 − x*‖^2/h^2 = max{L(f), L(g)}^2·‖x_0 − x*‖^2/ε^2 will do).

Note: The standard advice is h = R/(N+1)^{1/2} (much more difficult!)

2. Constants L(f), L(g) are not known

Start from a guess.

Restart from scratch each time we see the guess is wrong.

The guess is doubled after restart.

3. Tracking the record value f*_k

Double run. Other ideas are welcome!


Application examples

Observations:

1. Very often, Large- and Huge-scale problems have repetitive sparsity patterns and/or limited connectivity.

   Social networks.
   Mobile phone networks.
   Truss topology design (local bars).
   Finite element models (2D: four neighbors, 3D: six neighbors).

2. For p-diagonal matrices, κ(A) ≤ p^2.


Nonsmooth formulation of Google Problem

Main property of the spectral radius (A ≥ 0)

If A ∈ R^{n×n}_+, then  ρ(A) = min_{x≥0} max_{1≤i≤n} (1/x^(i))·〈e_i, Ax〉.

The minimum is attained at the corresponding eigenvector.

Since ρ(E) = 1, our problem is as follows:

    f(x) := max_{1≤i≤N} [〈e_i, Ex〉 − x^(i)]  →  min over x ≥ 0.

Interpretation: maximizing the self-esteem!

Since f* = 0, we can apply Polyak's method with sparse updates.

Additional features: the optimal set X* is a convex cone. If x_0 = e, then the whole sequence is separated from zero:

    〈x*, e〉 ≤ 〈x*, x_k〉 ≤ ‖x*‖_1·‖x_k‖_∞ = 〈x*, e〉·‖x_k‖_∞.

Goal: Find x ≥ 0 such that ‖x‖_∞ ≥ 1 and f(x) ≤ ε.
(The first condition is satisfied automatically.)
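For concreteness, a small sketch of this objective and one sparse subgradient in Python (E is assumed to be stored in SciPy CSR format so that extracting a row is cheap; the tiny matrix is hypothetical data; in the actual huge-scale method the maximum is maintained incrementally by the binary tree above, and the dense evaluation below is only for illustration):

    import numpy as np
    from scipy.sparse import csr_matrix

    def google_oracle(E, x):
        """Return f(x) = max_i [ (E x)_i - x^(i) ] and one subgradient.

        A subgradient of the active linear piece is E^T e_i - e_i, where i attains
        the maximum; it is sparse whenever the i-th row of E is sparse.
        """
        r = E.dot(x) - x                       # full residual, for illustration only
        i = int(np.argmax(r))
        sub = E.getrow(i).toarray().ravel()    # row i of E, i.e. E^T e_i
        sub[i] -= 1.0
        return r[i], sub

    # Tiny example with a 3x3 column-stochastic matrix (hypothetical data)
    E = csr_matrix(np.array([[0.0, 0.5, 1.0],
                             [0.5, 0.0, 0.0],
                             [0.5, 0.5, 0.0]]))
    val, sub = google_oracle(E, np.ones(3))
    print(val)                                 # f(e) = max_i [(E e)_i - 1]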


Computational experiments: Iteration Cost

We compare Polyak's GM with sparse updates (GM_s) against the standard one (GM).

Setup: Each agent has exactly p random friends. Thus, κ(A) ≈ p^2.

Iteration cost:  GM_s ≈ p^2·log_2 N,   GM ≈ p·N.

Time for 10^4 iterations (p = 32):

    N        κ(A)    GM_s (sec)   GM (sec)
    1024     1632    3.00         2.98
    2048     1792    3.36         6.41
    4096     1888    3.75         15.11
    8192     1920    4.20         139.92
    16384    1824    4.69         408.38

Time for 10^3 iterations (p = 16):

    N        κ(A)    GM_s (sec)   GM (sec)
    131072   576     0.19         213.9
    262144   592     0.25         477.8
    524288   592     0.32         1095.5
    1048576  608     0.40         2590.8

1 sec of GM_s ≈ 100 min of GM!


Convergence of GMs : Medium Size

Let N = 131072, p = 16, κ(A) = 576, and L(f) = 0.21.

    Iterations   f − f*    Time (sec)
    1.0·10^5     0.1100    16.44
    3.0·10^5     0.0429    49.32
    6.0·10^5     0.0221    98.65
    1.1·10^6     0.0119    180.85
    2.2·10^6     0.0057    361.71
    4.1·10^6     0.0028    674.09
    7.6·10^6     0.0014    1249.54
    1.0·10^7     0.0010    1644.13

Dimension and accuracy are sufficiently high, but the time is still reasonable.


Convergence of GMs : Large Scale

Let N = 1048576, p = 8, κ(A) = 192, and L(f) = 0.21.

    Iterations   f − f*      Time (sec)
    0            2.000000    0.00
    1.0·10^5     0.546662    7.69
    4.0·10^5     0.276866    30.74
    1.0·10^6     0.137822    76.86
    2.5·10^6     0.063099    192.14
    5.1·10^6     0.032092    391.97
    9.9·10^6     0.016162    760.88
    1.5·10^7     0.010009    1183.59

Final point x*:  ‖x*‖_∞ = 2.941497,   R_0^2 := ‖x* − e‖_2^2 = 1.2·10^5.

Theoretical bound: L^2(f)·R_0^2/ε^2 = 5.3·10^7 iterations. Time for GM: ≈ 1 year!


Conclusion

1. Sparse GM is an efficient and reliable method for solving Large- and Huge-Scale problems with uniform sparsity.

2. We can also treat dense rows. Assume that the inequality 〈a, x〉 ≤ b is dense. It is equivalent to the following system (see the sketch after this list):

       y^(1) = a^(1)·x^(1),    y^(j) = y^(j−1) + a^(j)·x^(j),  j = 2, ..., n,    y^(n) ≤ b.

   We need new variables y^(j) for all nonzero coefficients of a.

   So we introduce p(a) additional variables and p(a) additional equality constraints. (No problem!)
   Hidden drawback: the above equalities are satisfied only with errors. Maybe it is not too bad?

3. A similar technique can be applied to dense columns.
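A small sketch of this reformulation in Python (the dictionary representation of constraints and the function name split_dense_row are my own illustration, not from the slides):

    def split_dense_row(a, b):
        """Replace the dense inequality <a, x> <= b by a chain of sparse constraints.

        One auxiliary variable y_k is introduced per nonzero coefficient of a.
        Each returned constraint is (coefficients, sense, rhs), where coefficients
        maps variable labels such as ('x', j) or ('y', k) to their coefficients;
        every constraint involves at most three variables, hence is sparse.
        """
        nz = [(j, aj) for j, aj in enumerate(a) if aj != 0.0]
        cons = []
        j0, a0 = nz[0]
        cons.append(({('x', j0): a0, ('y', 0): -1.0}, '==', 0.0))      # y_1 = a^(1) x^(1)
        for k, (j, aj) in enumerate(nz[1:], start=1):
            cons.append(({('y', k - 1): 1.0, ('x', j): aj, ('y', k): -1.0}, '==', 0.0))
        cons.append(({('y', len(nz) - 1): 1.0}, '<=', b))              # y_p <= b
        return cons

    # Example: a dense row with four nonzeros becomes four equalities and one inequality
    for c in split_dense_row([1.0, 2.0, 0.0, 3.0, 4.0], 10.0):
        print(c)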


Theoretical consequences

Assume that κ(A) ≈ γ^2(A)·n^2. Compare three methods:

    Sparse updates (SU). Complexity:  γ^2(A)·n^2 · (L^2 R^2/ε^2) · log n  operations.

    Smoothing technique (ST). Complexity:  γ(A)·n^2 · (LR/ε)  operations.

    Polynomial-time methods (PT). Complexity:  (γ(A)·n + n^3) · n · ln(LR/ε)  operations.

There are three possibilities:

    Low accuracy: γ(A)·LR/ε < 1. Then we choose SU.

    Moderate accuracy: 1 < γ(A)·LR/ε < n^2. We choose ST.

    High accuracy: γ(A)·LR/ε > n^2. We choose PT.

NB: For Huge-Scale problems we usually have γ(A) ≈ 1/n, so the choice reduces to comparing LR/ε with n (and with n^3).
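To make the comparison concrete, here is a hypothetical worked example (the numbers are mine, chosen only for illustration): take n = 10^6, γ(A) = 10^{-6}, and LR/ε = 10^4. Then γ(A)·LR/ε = 10^{-2} < 1, so we are in the low-accuracy regime and choose SU. Indeed, the estimates give roughly SU ≈ 10^{-12}·10^{12}·10^8·log_2 n ≈ 2·10^9 operations, ST ≈ 10^{-6}·10^{12}·10^4 = 10^{10} operations, and PT ≈ 10^{18}·10^6·ln(10^4) ≈ 10^{25} operations, so SU wins, as the low-accuracy rule predicts.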