New primal-dual subgradient methods for Convex Problems with Functional Constraints

Yurii Nesterov, CORE/INMA (UCL)

January 12, 2015 (Les Houches)



Outline

1 Constrained optimization problem

2 Lagrange multipliers

3 Dual function and dual problem

4 Augmented Lagrangian

5 Switching subgradient methods

6 Finding the dual multipliers

7 Complexity analysis


Optimization problem: simple constraints

Consider the problem:  min_{x ∈ Q} f(x),  where

Q is a closed convex set: x, y ∈ Q ⇒ [x, y] ⊆ Q,

f is a convex function, subdifferentiable on Q:

f(y) ≥ f(x) + 〈∇f(x), y − x〉,  ∀x, y ∈ Q,  ∇f(x) ∈ ∂f(x).

Optimality condition: a point x∗ ∈ Q is optimal iff

〈∇f(x∗), x − x∗〉 ≥ 0,  ∀x ∈ Q.

Interpretation: Function increases along any feasible direction.
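This setting can be illustrated numerically. A minimal sketch with a hypothetical instance (not from the slides): minimize the nonsmooth convex function f(x) = ‖x − c‖₁ over the box Q = [−1, 1]² with a projected subgradient method and the classical step sizes h_k = 1/√k.

```python
import numpy as np

# Toy instance (hypothetical, for illustration): f(x) = ||x - c||_1, Q = [-1, 1]^2.
c = np.array([2.0, -0.5])

def project(x):
    # Euclidean projection onto the box Q = [-1, 1]^2 (a closed convex set)
    return np.clip(x, -1.0, 1.0)

def subgrad(x):
    # sign(x - c) is a valid subgradient of ||x - c||_1 (np.sign gives 0 at kinks, also valid)
    return np.sign(x - c)

x = np.zeros(2)
for k in range(1, 2001):
    x = project(x - subgrad(x) / np.sqrt(k))   # step sizes h_k = 1/sqrt(k)

# x approaches the minimizer over Q: x* = (1, -0.5)
```

Note that the unconstrained minimizer c lies outside Q, so the first coordinate of the solution sits on the boundary of Q, exactly the situation where the variational inequality above (rather than ∇f(x∗) = 0) characterizes optimality.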


Optimization problem: functional constraints

Problem:  min_{x ∈ Q} { f0(x) : fi(x) ≤ 0, i = 1, . . . , m },  where

Q is a closed convex set,

all fi are convex and subdifferentiable on Q, i = 0, . . . , m:

fi(y) ≥ fi(x) + 〈∇fi(x), y − x〉,  ∀x, y ∈ Q,  ∇fi(x) ∈ ∂fi(x).

Optimality condition (KKT, 1951): a point x∗ ∈ Q is optimal iff there exist Lagrange multipliers λ∗^(i) ≥ 0, i = 1, . . . , m, such that

(1)  〈∇f0(x∗) + ∑_{i=1}^{m} λ∗^(i) ∇fi(x∗), x − x∗〉 ≥ 0,  ∀x ∈ Q,

(2)  fi(x∗) ≤ 0,  i = 1, . . . , m,  (feasibility)

(3)  λ∗^(i) fi(x∗) = 0,  i = 1, . . . , m.  (complementary slackness)
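The three conditions are easy to check numerically once a candidate pair is available. A minimal sketch on a hypothetical instance (min x₁ + x₂ over the unit disk, with Q = ℝ²), where (x∗, λ∗) is known in closed form; the code only verifies (1)–(3):

```python
import numpy as np

# Hypothetical instance (not from the slides): min x1 + x2  s.t.  x1^2 + x2^2 - 1 <= 0,
# with Q = R^2. The solution is known: x* = -(1,1)/sqrt(2), lambda* = 1/sqrt(2).
x_star = np.array([-1.0, -1.0]) / np.sqrt(2.0)
lam = 1.0 / np.sqrt(2.0)

grad_f0 = np.array([1.0, 1.0])       # gradient of the objective
grad_f1 = 2.0 * x_star               # gradient of the constraint at x*

# (1) stationarity: for Q = R^n the variational inequality reduces to a zero gradient
residual = grad_f0 + lam * grad_f1
# (2) feasibility and (3) complementary slackness
f1 = x_star @ x_star - 1.0
```

Here the constraint is active, so complementary slackness holds with λ∗ > 0 and f1(x∗) = 0; for an inactive constraint it would force the multiplier to vanish instead.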


Lagrange multipliers: interpretation

Let I ⊆ {1, . . . , m} be an arbitrary index set. Denote

fI(x) = f0(x) + ∑_{i ∈ I} λ∗^(i) fi(x),

and consider the problem

PI :  min_{x ∈ Q} { fI(x) : fi(x) ≤ 0, i ∉ I }.

Observation: in any case, x∗ is the optimal solution of problem PI.

Interpretation: the λ∗^(i) are the shadow prices for the resources (Kantorovich, 1939).

Application examples:

Traffic congestion: car flows on roads ⇔ size of queues.

Electrical networks: currents in the wires ⇔ voltage potentials, etc.

Main question: How to compute (x∗, λ∗)?
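The shadow-price interpretation can be made concrete: the multiplier of an active constraint measures how much the optimal value improves when the resource bound is relaxed. A minimal sketch on a hypothetical one-dimensional problem, min (x − 2)² subject to x ≤ b, where everything is available in closed form:

```python
# Hypothetical one-dimensional problem: min (x - 2)^2  s.t.  x <= b, with b < 2.
# The constraint is active, and 2*(x* - 2) + lam = 0 gives lam = 2*(2 - b).

def opt_value(b):
    x_star = min(2.0, b)             # clip the unconstrained minimizer x = 2 to the feasible set
    return (x_star - 2.0) ** 2

b = 1.0
lam = 2.0 * (2.0 - b)                # shadow price of the resource b

# Relaxing the resource by eps improves the optimal value by about lam * eps:
eps = 1e-6
sensitivity = (opt_value(b) - opt_value(b + eps)) / eps   # ~ lam
```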


Algebraic interpretation

Consider the Lagrangian  L(x, λ) = f0(x) + ∑_{i=1}^{m} λ^(i) fi(x).

Condition KKT(1),  〈∇f0(x∗) + ∑_{i=1}^{m} λ∗^(i) ∇fi(x∗), x − x∗〉 ≥ 0 ∀x ∈ Q,  implies

x∗ ∈ Arg min_{x ∈ Q} L(x, λ∗).

Define the dual function  φ(λ) = min_{x ∈ Q} L(x, λ),  λ ≥ 0. It is concave!

By Danskin's Theorem,  ∇φ(λ) = (f1(x(λ)), . . . , fm(x(λ))),  with  x(λ) ∈ Arg min_{x ∈ Q} L(x, λ).

Conditions KKT(2,3),  fi(x∗) ≤ 0,  λ∗^(i) fi(x∗) = 0,  i = 1, . . . , m,  imply (with x∗ = x(λ∗))

λ∗ ∈ Arg max_{λ ≥ 0} φ(λ).
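This machinery suggests the simplest dual scheme: maximize φ by projected gradient ascent, with Danskin's Theorem supplying the gradient. A minimal sketch on our own toy problem (min x₁ + x₂ subject to x₁² + x₂² ≤ 1, Q = ℝ²), chosen so that x(λ) is available in closed form; the step size 0.1 is an arbitrary choice for illustration:

```python
import numpy as np

# Toy problem (our own example): min x1 + x2  s.t.  x1^2 + x2^2 - 1 <= 0, Q = R^2.
# The Lagrangian minimizer x(lam) is available in closed form here.

def x_of_lam(lam):
    # argmin_x  x1 + x2 + lam*(x1^2 + x2^2 - 1)  for lam > 0
    return np.array([-1.0, -1.0]) / (2.0 * lam)

lam = 1.0
for _ in range(200):
    x = x_of_lam(lam)
    grad_phi = x @ x - 1.0                    # Danskin: f1(x(lam)) is the gradient of phi
    lam = max(lam + 0.1 * grad_phi, 1e-6)     # projected gradient ascent on {lam >= 0}

# lam converges to 1/sqrt(2); x(lam) converges to x* = -(1,1)/sqrt(2)
```

At the fixed point the dual gradient f1(x(λ)) vanishes, i.e. the constraint becomes active, recovering complementary slackness; the general difficulty, addressed by the methods of this talk, is that x(λ) is usually not available in closed form.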

Yu. Nesterov New primal-dual methods for functional constraints 6/19

Page 49: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Algorithmic aspects

Main idea: solve the dual problem max_{λ≥0} φ(λ) by the subgradient method:

1. Compute x(λk) and define ∇φ(λk) = (f1(x(λk)), . . . , fm(x(λk))).

2. Update λk+1 = Project_{R^m_+}(λk + hk ∇φ(λk)).

Stepsizes hk > 0 are defined in the usual way.

Main difficulties:

Each iteration is time consuming.

Unclear termination criterion.

Low rate of convergence (O(1/ε²) upper-level iterations).

Yu. Nesterov New primal-dual methods for functional constraints 7/19
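The two-step scheme above can be sketched on a toy instance (an assumption for illustration: f0(x) = x², f1(x) = 1 − x ≤ 0, Q = [0, 2], so m = 1 and the projection onto R^m_+ is max(·, 0)). A constant stepsize is used here instead of the usual diminishing rule, since the toy dual function is smooth.

```python
def x_of_lam(lam):
    # Step 1: compute x(lam_k), the minimizer of L(., lam) over Q = [0, 2].
    return min(max(lam / 2.0, 0.0), 2.0)

def dual_subgradient(lam0=0.0, h=0.5, iters=60):
    lam = lam0
    for _ in range(iters):
        g = 1.0 - x_of_lam(lam)        # grad phi(lam_k) = f1(x(lam_k))
        lam = max(lam + h * g, 0.0)    # step 2: ascent step, projected onto R_+
    return lam

lam_star = dual_subgradient()
# The iterates contract toward the optimal multiplier lam* = 2 for this instance.
```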

Page 60: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Augmented Lagrangian (1970’s) [Hestenes, Powell, Rockafellar, Polyak, Bertsekas, . . .]

Define the Augmented Lagrangian

L̂_K(x, λ) = f0(x) + (1/2K) ∑_{i=1}^m (λ^(i) + K fi(x))²_+ − (1/2K) ‖λ‖²₂, λ ∈ R^m,

where K > 0 is a penalty parameter.

Consider the dual function φ̂(λ) = min_{x∈Q} L̂_K(x, λ).

Main properties. Function φ̂ is concave. Its gradient is Lipschitz continuous with constant 1/K.

Its unconstrained maximum is attained at the optimal dual solution.

The corresponding point x̂(λ∗) is the optimal primal solution.

Hint: Check that the equation (λ^(i) + K fi(x))_+ = λ^(i), i = 1, . . . , m, is equivalent to KKT(2,3).

Yu. Nesterov New primal-dual methods for functional constraints 8/19

Page 68: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Method of Augmented Lagrangians

Note that (∇φ̂(λ))^(i) = (1/K)(λ^(i) + K fi(x̂(λ)))_+ − (1/K) λ^(i), i = 1, . . . , m.

Therefore, the usual gradient method λk+1 = λk + K∇φ̂(λk) is exactly as follows:

Method: λk+1 = (λk + K f(x̂(λk)))_+.

Advantage: Fast convergence of the dual process.

Disadvantages:

Difficult iteration.

Unclear termination.

No global complexity analysis.

Do we have an alternative?

Yu. Nesterov New primal-dual methods for functional constraints 9/19
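On a toy instance (an assumption for illustration: f0(x) = x², f1(x) = 1 − x, Q = [0, 2], m = 1), the minimizer x̂(λ) of L̂_K happens to have the closed form (λ + K)/(2 + K), which makes the multiplier update above easy to run; the helper names below are hypothetical.

```python
def x_hat(lam, K):
    # Closed-form minimizer of the augmented Lagrangian for the toy instance;
    # for lam in [0, 2+] and K = 1 this value stays inside Q = [0, 2].
    return (lam + K) / (2.0 + K)

def augmented_lagrangian(lam0=0.0, K=1.0, iters=60):
    lam = lam0
    for _ in range(iters):
        # Multiplier update: lam_{k+1} = (lam_k + K * f(x_hat(lam_k)))_+
        lam = max(lam + K * (1.0 - x_hat(lam, K)), 0.0)
    return x_hat(lam, K), lam

x_star, lam_star = augmented_lagrangian()
# For this instance the iterates contract toward x* = 1, lam* = 2.
```

With K = 1 each update is the affine contraction λ ↦ (2λ + 2)/3, so convergence to λ∗ = 2 is linear with factor 2/3, matching the "fast dual convergence" claim above.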

Page 78: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Problem formulation

Problem: f^∗ = inf_{x∈Q} {f0(x) : fi(x) ≤ 0, i = 1, . . . , m}, where

fi(x), i = 0, . . . , m, are closed convex functions on Q endowed with first-order black-box oracles,

Q ⊂ E is a bounded simple closed convex set. (We can solve some auxiliary optimization problems over Q.)

Defining the Lagrangian

L(x, λ) = f0(x) + ∑_{i=1}^m λ^(i) fi(x), x ∈ Q, λ ∈ R^m_+,

we can introduce the Lagrangian dual problem f_∗ := sup_{λ∈R^m_+} φ(λ), where φ(λ) := inf_{x∈Q} L(x, λ).

Clearly, f^∗ ≥ f_∗. Later, we will show f^∗ = f_∗ algorithmically.

Yu. Nesterov New primal-dual methods for functional constraints 10/19
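Weak duality f^∗ ≥ f_∗ can be checked numerically on a toy instance (an assumption, not from the slides): every dual value φ(λ) lower-bounds the primal optimum.

```python
def phi(lam):
    # phi(lam) = inf_{x in [0, 2]} { x^2 + lam * (1 - x) } for the assumed toy
    # instance f0(x) = x^2, f1(x) = 1 - x, whose primal optimum is f* = 1 at x = 1.
    x = min(max(lam / 2.0, 0.0), 2.0)
    return x * x + lam * (1.0 - x)

f_star = 1.0
# Every multiplier gives a lower bound; here the bound is tight at lam = 2.
assert all(phi(lam) <= f_star + 1e-12 for lam in [0.0, 0.5, 1.0, 2.0, 3.0])
assert abs(phi(2.0) - f_star) < 1e-12
```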

Page 88: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Bregman distances

Prox-function: d(·) is strongly convex on Q with parameter one:

d(y) ≥ d(x) + 〈∇d(x), y − x〉 + ½‖y − x‖², x, y ∈ Q.

Denote by x0 the prox-center of the set Q: x0 = arg min_{x ∈ Q} d(x).

Assume d(x0) = 0.

Bregman distance:

β(x, y) = d(y) − d(x) − 〈∇d(x), y − x〉, x, y ∈ Q.

Clearly, β(x, y) ≥ ½‖x − y‖² for all x, y ∈ Q.

Bregman mapping: for x ∈ Q, g ∈ E∗ and h > 0 define

Bh(x, g) = arg min_{y ∈ Q} {h〈g, y − x〉 + β(x, y)}.

Examples: Euclidean distance, entropy distance, etc.

Yu. Nesterov New primal-dual methods for functional constraints 11/19
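For the entropy distance on the standard simplex, the Bregman mapping has the familiar closed form of a mirror-descent / multiplicative-weights update. A minimal sketch (the instance data here are my own, for illustration):

```python
import numpy as np

# Bregman machinery on the simplex Q = {x >= 0, sum x = 1} with the entropy
# prox-function d(x) = sum_i x_i ln x_i + ln n (strongly convex w.r.t. the
# l1-norm; the prox-center x0 is the uniform vector, and d(x0) = 0).

def d(x):
    return np.sum(x * np.log(x)) + np.log(len(x))

def breg(x, y):
    # beta(x, y) = d(y) - d(x) - <grad d(x), y - x>
    return d(y) - d(x) - np.dot(np.log(x) + 1.0, y - x)

def bregman_map(x, g, h):
    # arg min_{y in Q} { h <g, y - x> + beta(x, y) } in closed form:
    y = x * np.exp(-h * g)
    return y / y.sum()

n = 4
x0 = np.full(n, 1.0 / n)
g = np.array([3.0, 1.0, 0.0, -1.0])
y = bregman_map(x0, g, h=0.5)

assert abs(d(x0)) < 1e-12                      # d(x0) = 0 at the prox-center
# beta(x, y) >= (1/2)||x - y||^2 (here the l1-norm, by Pinsker's inequality)
assert breg(x0, y) >= 0.5 * np.sum(np.abs(y - x0))**2 - 1e-12
print(y)  # mass shifts toward coordinates with small g
```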


Switching subgradient methods: Primal Method

Input parameter: the step size h > 0.

Initialization: Compute the prox-center x0.

Iteration k ≥ 0:

a) Define Ik = {i ∈ {1, . . . , m} : fi(xk) > h‖∇fi(xk)‖∗}.

b) If Ik = ∅, then compute xk+1 = Bh(xk, ∇f0(xk)/‖∇f0(xk)‖∗).

c) If Ik ≠ ∅, then choose an arbitrary ik ∈ Ik, define hk = fik(xk)/‖∇fik(xk)‖²∗, and compute xk+1 = Bhk(xk, ∇fik(xk)).

After t ≥ 0 iterations, define F(t) = {k ∈ {0, . . . , t} : Ik = ∅}. Denote N(t) = |F(t)|. It is possible that N(t) = 0.

Yu. Nesterov New primal-dual methods for functional constraints 12/19
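The switching rule can be sketched on a toy instance (my own example, not from the slides), using the Euclidean prox-function so that the Bregman mapping reduces to a projected subgradient step: minimize f0(x) = x₁ + x₂ over the box Q = [−1, 1]² subject to f1(x) = 0.5 − x₁ ≤ 0, with optimum x∗ = (0.5, −1) and f ∗ = −0.5.

```python
import numpy as np

def proj_Q(x):                # Euclidean Bregman mapping on the box = clipping
    return np.clip(x, -1.0, 1.0)

def f0(x): return x[0] + x[1]
def g0(x): return np.array([1.0, 1.0])
def f1(x): return 0.5 - x[0]
def g1(x): return np.array([-1.0, 0.0])

h = 0.01                      # step size of the "productive" objective steps
x = np.zeros(2)               # prox-center of the box
F = []                        # productive iterates (those with I_k empty)
for k in range(5000):
    if f1(x) > h * np.linalg.norm(g1(x)):       # I_k nonempty: constraint step
        hk = f1(x) / np.linalg.norm(g1(x))**2
        x = proj_Q(x - hk * g1(x))
    else:                                       # I_k empty: objective step
        F.append(x.copy())
        x = proj_Q(x - h * g0(x) / np.linalg.norm(g0(x)))

x_best = min(F, key=f0)
print(x_best, f0(x_best))     # close to (0.5, -1) and f* = -0.5
```

Every recorded iterate satisfies fi(xk) ≤ h‖∇fi(xk)‖∗, so feasibility is violated by at most O(h), matching the Mh bounds of the convergence result.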


Finding the dual multipliers

If N(t) > 0, define the dual multipliers as follows:

λ_t^(0) = h Σ_{k ∈ F(t)} 1/‖∇f0(xk)‖∗,

λ_t^(i) = (1/λ_t^(0)) Σ_{k ∈ Ai(t)} hk, i = 1, . . . , m,

where Ai(t) = {k ∈ {0, . . . , t} : ik = i}, 0 ≤ i ≤ m.

Denote St = Σ_{k ∈ F(t)} 1/‖∇f0(xk)‖∗. If F(t) = ∅, then we define St = 0.

To prove convergence of the switching strategy, we find an upper bound for the gap

δt = (1/St) Σ_{k ∈ F(t)} f0(xk)/‖∇f0(xk)‖∗ − φ(λt),

assuming that N(t) > 0.

Yu. Nesterov New primal-dual methods for functional constraints 13/19
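The aggregation formulas can be sketched directly, assuming we have recorded, for each iteration, whether it was productive (with ‖∇f0(xk)‖∗) or a constraint step (with ik and hk). The trajectory below is invented for illustration only:

```python
# Hypothetical recorded trajectory: ("obj", ||grad f0(x_k)||_*) for a
# productive step, ("con", (i_k, h_k)) for a constraint step on constraint i_k.
h = 0.1
m = 2
steps = [("con", (1, 0.30)), ("obj", 2.0), ("con", (2, 0.12)),
         ("obj", 1.5), ("obj", 2.5), ("con", (1, 0.05))]

S_t = sum(1.0 / val for tag, val in steps if tag == "obj")   # S_t
lam = [0.0] * (m + 1)
lam[0] = h * S_t                                             # lambda_t^(0)
for tag, val in steps:
    if tag == "con":
        i, hk = val
        lam[i] += hk / lam[0]     # (1 / lambda_t^(0)) * sum over A_i(t)

print(lam)   # nonnegative multipliers (lam[1], ..., lam[m]) for phi(lambda_t)
```

The multipliers are nonnegative by construction, so λt is dual feasible whenever N(t) > 0 (i.e., λ_t^(0) > 0).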


Convergence result

Main inequality:

λ_t^(0) δt ≤ r0(x) + ½ N(t)h² − ½ (t − N(t))h² = r0(x) − ½ t h² + N(t)h².

Denote D = max_{x ∈ Q} r0(x).

Theorem. If t ≥ (2/h²) D, then F(t) ≠ ∅. In this case δt ≤ Mh and

max_{1 ≤ i ≤ m} fi(xk) ≤ Mh, k ∈ F(t),

where M = max_{0 ≤ k ≤ t} max_{0 ≤ i ≤ m} ‖∇fi(xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ_t^(0) = 0. This is impossible for t big enough.

Finally, λ_t^(0) ≥ (h/M) N(t). Therefore, if t is big enough, then

δt ≤ N(t)h² / λ_t^(0) ≤ Mh. □

Yu. Nesterov New primal-dual methods for functional constraints 14/19
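Spelling out the two estimates of the proof (a restatement of the slide's argument, using r0(x) ≤ D):

```latex
% If F(t) = \emptyset, then N(t) = 0 and the main inequality gives
%   0 \le r_0(x) - \tfrac{1}{2} t h^2 \le D - \tfrac{1}{2} t h^2 < 0
% whenever t > \tfrac{2}{h^2} D, a contradiction; hence F(t) \neq \emptyset.
\lambda_t^{(0)} \delta_t \;\le\; r_0(x) - \tfrac{1}{2} t h^2 + N(t) h^2
\;\le\; N(t) h^2 \quad \text{for } t \ge \tfrac{2}{h^2} D,
\qquad
\delta_t \;\le\; \frac{N(t) h^2}{\lambda_t^{(0)}}
\;\le\; \frac{N(t) h^2}{\tfrac{h}{M}\, N(t)} \;=\; M h .
```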

Page 119: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Convergence result

Main inequality:

λ(0)t δt

≤ r0(x) + 12N(t)h2 − 1

2(t − N(t))h2 = r0(x)− 12 th2 + N(t)h2.

Denote D = maxx∈Q

r0(x).

Theorem. If the number t ≥ 2h2 D, then F(t) 6= ∅.

In this case δt ≤ Mh and max1≤i≤m

fi (xk) ≤ Mh, k ∈ F(t)

where M = max0≤k≤t

max0≤i≤m

‖∇fi (xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ(0)t = 0. This

is impossible for t big enough.

Finally, λ(0)t ≥ h

M N(t). Therefore, if t is big enough, then

δt ≤ N(t)h2

λ(0)t

≤ Mh. �

Yu. Nesterov New primal-dual methods for functional constraints 14/19

Page 120: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Convergence result

Main inequality:

λ(0)t δt ≤ r0(x) + 1

2N(t)h2 − 12(t − N(t))h2

= r0(x)− 12 th2 + N(t)h2.

Denote D = maxx∈Q

r0(x).

Theorem. If the number t ≥ 2h2 D, then F(t) 6= ∅.

In this case δt ≤ Mh and max1≤i≤m

fi (xk) ≤ Mh, k ∈ F(t)

where M = max0≤k≤t

max0≤i≤m

‖∇fi (xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ(0)t = 0. This

is impossible for t big enough.

Finally, λ(0)t ≥ h

M N(t). Therefore, if t is big enough, then

δt ≤ N(t)h2

λ(0)t

≤ Mh. �

Yu. Nesterov New primal-dual methods for functional constraints 14/19

Page 121: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Convergence result

Main inequality:

λ(0)t δt ≤ r0(x) + 1

2N(t)h2 − 12(t − N(t))h2 = r0(x)− 1

2 th2 + N(t)h2.

Denote D = maxx∈Q

r0(x).

Theorem. If the number t ≥ 2h2 D, then F(t) 6= ∅.

In this case δt ≤ Mh and max1≤i≤m

fi (xk) ≤ Mh, k ∈ F(t)

where M = max0≤k≤t

max0≤i≤m

‖∇fi (xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ(0)t = 0. This

is impossible for t big enough.

Finally, λ(0)t ≥ h

M N(t). Therefore, if t is big enough, then

δt ≤ N(t)h2

λ(0)t

≤ Mh. �

Yu. Nesterov New primal-dual methods for functional constraints 14/19

Page 122: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Convergence result

Main inequality:

λ(0)t δt ≤ r0(x) + 1

2N(t)h2 − 12(t − N(t))h2 = r0(x)− 1

2 th2 + N(t)h2.

Denote D = maxx∈Q

r0(x).

Theorem. If the number t ≥ 2h2 D, then F(t) 6= ∅.

In this case δt ≤ Mh and max1≤i≤m

fi (xk) ≤ Mh, k ∈ F(t)

where M = max0≤k≤t

max0≤i≤m

‖∇fi (xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ(0)t = 0. This

is impossible for t big enough.

Finally, λ(0)t ≥ h

M N(t). Therefore, if t is big enough, then

δt ≤ N(t)h2

λ(0)t

≤ Mh. �

Yu. Nesterov New primal-dual methods for functional constraints 14/19

Page 123: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Convergence result

Main inequality:

λ(0)t δt ≤ r0(x) + 1

2N(t)h2 − 12(t − N(t))h2 = r0(x)− 1

2 th2 + N(t)h2.

Denote D = maxx∈Q

r0(x).

Theorem. If the number t ≥ 2h2 D,

then F(t) 6= ∅.In this case δt ≤ Mh and max

1≤i≤mfi (xk) ≤ Mh, k ∈ F(t)

where M = max0≤k≤t

max0≤i≤m

‖∇fi (xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ(0)t = 0. This

is impossible for t big enough.

Finally, λ(0)t ≥ h

M N(t). Therefore, if t is big enough, then

δt ≤ N(t)h2

λ(0)t

≤ Mh. �

Yu. Nesterov New primal-dual methods for functional constraints 14/19

Page 124: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Convergence result

Main inequality:

λ(0)t δt ≤ r0(x) + 1

2N(t)h2 − 12(t − N(t))h2 = r0(x)− 1

2 th2 + N(t)h2.

Denote D = maxx∈Q

r0(x).

Theorem. If the number t ≥ 2h2 D, then F(t) 6= ∅.

In this case δt ≤ Mh and max1≤i≤m

fi (xk) ≤ Mh, k ∈ F(t)

where M = max0≤k≤t

max0≤i≤m

‖∇fi (xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ(0)t = 0. This

is impossible for t big enough.

Finally, λ(0)t ≥ h

M N(t). Therefore, if t is big enough, then

δt ≤ N(t)h2

λ(0)t

≤ Mh. �

Yu. Nesterov New primal-dual methods for functional constraints 14/19

Page 125: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Convergence result

Main inequality:

λ(0)t δt ≤ r0(x) + 1

2N(t)h2 − 12(t − N(t))h2 = r0(x)− 1

2 th2 + N(t)h2.

Denote D = maxx∈Q

r0(x).

Theorem. If the number t ≥ 2h2 D, then F(t) 6= ∅.

In this case δt ≤ Mh

and max1≤i≤m

fi (xk) ≤ Mh, k ∈ F(t)

where M = max0≤k≤t

max0≤i≤m

‖∇fi (xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ(0)t = 0. This

is impossible for t big enough.

Finally, λ(0)t ≥ h

M N(t). Therefore, if t is big enough, then

δt ≤ N(t)h2

λ(0)t

≤ Mh. �

Yu. Nesterov New primal-dual methods for functional constraints 14/19

Page 126: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Convergence result

Main inequality:

λ(0)t δt ≤ r0(x) + 1

2N(t)h2 − 12(t − N(t))h2 = r0(x)− 1

2 th2 + N(t)h2.

Denote D = maxx∈Q

r0(x).

Theorem. If the number t ≥ 2h2 D, then F(t) 6= ∅.

In this case δt ≤ Mh and max1≤i≤m

fi (xk) ≤ Mh, k ∈ F(t)

where M = max0≤k≤t

max0≤i≤m

‖∇fi (xk)‖∗.

Proof: If F(t) = ∅, then N(t) = 0. Consequently, λ(0)t = 0. This

is impossible for t big enough.

Finally, λ(0)t ≥ h

M N(t). Therefore, if t is big enough, then

δt ≤ N(t)h2

λ(0)t

≤ Mh. �

Yu. Nesterov New primal-dual methods for functional constraints 14/19

Dual subgradient method

Define the averaging coefficients {a_k}_{k≥0} and nondecreasing scaling coefficients {β_k}_{k≥0}. Denote A_t = ∑_{k=0}^{t} a_k.

Initialization: define ℓ_0(x) ≡ 0, x ∈ Q.

Iteration k ≥ 0:

a) Compute x_k = argmin_{x∈Q} { ℓ_k(x) + β_k d(x) }.

b) Define I_k = { i ∈ [1 : m] : f^{(i)}(x_k) ≥ ε }.

c) If I_k = ∅, then ℓ_{k+1}(x) = ℓ_k(x) + a_k [ f^{(0)}(x_k) + 〈∇f^{(0)}(x_k), x − x_k〉 ].

d) If I_k ≠ ∅, then choose an arbitrary i_k ∈ I_k and define
ℓ_{k+1}(x) = ℓ_k(x) + a_k [ f^{(i_k)}(x_k) + 〈∇f^{(i_k)}(x_k), x − x_k〉 ].

Yu. Nesterov New primal-dual methods for functional constraints 15/19
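As an illustration of the scheme above, here is a minimal Python sketch (a hypothetical toy implementation, not from the slides). It assumes Q is a box, the prox-function is d(x) = ½‖x‖², and the schedule is a_k ≡ 1, β_k = √(k+1); with these choices the subproblem argmin_{x∈Q} { ℓ_k(x) + β_k d(x) } has the closed form clip(−g/β_k), where g is the gradient of the accumulated linear model ℓ_k.

```python
import numpy as np

def dual_subgradient(g0, fs, gs, eps, T, lb=-2.0, ub=2.0, dim=2):
    """Sketch of the switching dual subgradient scheme (illustrative only).

    With d(x) = 0.5*||x||^2 and Q = [lb, ub]^dim, the subproblem
    argmin_{x in Q} { l_k(x) + beta_k * d(x) } reduces to clipping -g/beta_k,
    where g is the gradient part of the linear model l_k (its constant part
    does not affect the argmin).
    """
    g_acc = np.zeros(dim)          # gradient of the linear model l_k
    productive = []                # points x_k with I_k empty
    for k in range(T):
        beta = np.sqrt(k + 1.0)    # assumed schedule: a_k = 1, beta_k ~ sqrt(k)
        x = np.clip(-g_acc / beta, lb, ub)
        viol = [i for i, f in enumerate(fs) if f(x) >= eps]
        if viol:                   # step d): linearize a violated constraint
            g_acc += gs[viol[0]](x)
        else:                      # step c): linearize the objective
            productive.append(x)
            g_acc += g0(x)
    return np.mean(productive, axis=0)   # average over productive steps

# Toy problem: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0  on [-2, 2]^2;
# the optimum is (0.5, 0.5).
g0 = lambda x: 2.0 * x
f1 = lambda x: 1.0 - x[0] - x[1]
g1 = lambda x: np.array([-1.0, -1.0])
x_hat = dual_subgradient(g0, [f1], [g1], eps=1e-2, T=8000)
```

On this toy problem the averaged productive point approaches the optimum; since f1 is affine and every productive point satisfies f1 < ε, their average stays ε-feasible.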

Convergence result

Define A_0(t) = { k ∈ [0 : t] : I_k = ∅ }, N(t) = |A_0(t)|, and

σ_t = ∑_{k ∈ A_0(t)} a_k,   λ_t^{(i)} = (1/σ_t) ∑_{k ∈ A_i(t)} a_k,  i = 1, …, m,

where A_i(t) = { k ∈ [0 : t] : i_k = i }, 1 ≤ i ≤ m.

If N(t) > 0, then define the gap δ_t = (1/σ_t) ∑_{k ∈ F(t)} a_k f^{(0)}(x_k) − φ(λ_t).

Denote D = max_{x∈Q} d(x).

Theorem. Let all subgradients be bounded by M. Then for any t ≥ 0,

σ_t · (δ_t − ε) + A_t ε ≤ β_{t+1} D + (M²/2) ∑_{k=0}^{t} a_k² / β_k.

If A_t ε > β_{t+1} D + (M²/2) ∑_{k=0}^{t} a_k² / β_k, then λ_t^{(0)} > 0 and δ_t ≤ ε.

Example: a_t ≡ 1, β_t ≈ √t ⇒ t ≈ O(1/ε²).

Yu. Nesterov New primal-dual methods for functional constraints 16/19
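The O(1/ε²) estimate on the example line can be checked directly (a quick sanity derivation, not from the slides; it assumes the concrete schedule β_k = √(k+1), which avoids β_0 = 0):

```latex
A_t = \sum_{k=0}^{t} a_k = t+1, \qquad
\beta_{t+1} D = D\sqrt{t+2}, \qquad
\frac{M^2}{2}\sum_{k=0}^{t}\frac{a_k^2}{\beta_k}
  = \frac{M^2}{2}\sum_{k=0}^{t}\frac{1}{\sqrt{k+1}}
  \le M^2\sqrt{t+1}.
```

Hence the switching condition A_t ε > β_{t+1} D + (M²/2) ∑_k a_k²/β_k is guaranteed as soon as (t+1) ε > (D + M²) √(t+2), i.e. after t = O((D + M²)²/ε²) iterations.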

Quasi-monotone method

Initialization: define ℓ_0(x) ≡ 0, x_0 = ./, and σ_0 = 0.

Iteration t ≥ 0:

a) Set v_t = argmin_{x∈Q} { ℓ_t(x) + β_t d(x) } and I_t := { i ∈ [1 : m] : f^{(i)}(v_t) ≥ ε }.

b) If I_t ≠ ∅, then set x_{t+1} = x_t, σ_{t+1} = σ_t, and choose an arbitrary i_t ∈ I_t. Update
ℓ_{t+1}(x) = ℓ_t(x) + a_{t+1} [ f^{(i_t)}(v_t) + 〈∇f^{(i_t)}(v_t), x − v_t〉 ].

c) Otherwise, set σ_{t+1} = σ_t + a_{t+1}, τ_t = a_{t+1}/σ_{t+1}, x_{t+1} = (1 − τ_t) x_t + τ_t v_t. Update
ℓ_{t+1}(x) = ℓ_t(x) + a_{t+1} [ f^{(0)}(x_{t+1}) + 〈∇f^{(0)}(x_{t+1}), x − x_{t+1}〉 ].

The assignment x_0 = ./ indicates that x_0 has not been chosen yet.

Yu. Nesterov New primal-dual methods for functional constraints 17/19
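The distinguishing feature of this variant — the output point x_t is updated only on productive steps, as a convex combination of the minimizers v_t — can be sketched the same way (again a hypothetical toy implementation, assuming Q a box, d(x) = ½‖x‖², and a_t ≡ 1, β_t = √(t+1)):

```python
import numpy as np

def quasi_monotone(g0, fs, gs, eps, T, lb=-2.0, ub=2.0, dim=2):
    """Sketch of the quasi-monotone scheme above (illustrative only).

    x is touched only on productive steps, as a running convex combination
    of the points v_t, so once defined it is an average of eps-feasible
    points; the objective is linearized at the updated x_{t+1}, as on the
    slide.
    """
    g_acc = np.zeros(dim)     # gradient of the linear model l_t
    x, sigma = None, 0.0      # x_0 = "./": not chosen yet; sigma_0 = 0
    for t in range(T):
        beta = np.sqrt(t + 1.0)               # assumed: a_t = 1, beta_t ~ sqrt(t)
        v = np.clip(-g_acc / beta, lb, ub)    # v_t = argmin l_t(x) + beta_t*d(x)
        viol = [i for i, f in enumerate(fs) if f(v) >= eps]
        if viol:                               # step b): x, sigma unchanged
            g_acc += gs[viol[0]](v)
        else:                                  # step c): move x toward v_t
            sigma += 1.0
            tau = 1.0 / sigma                  # tau_t = a_{t+1}/sigma_{t+1}
            x = v if x is None else (1 - tau) * x + tau * v
            g_acc += g0(x)                     # linearization at x_{t+1}
    return x

# Same toy problem: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0  on [-2, 2]^2;
# the optimum is (0.5, 0.5).
g0 = lambda x: 2.0 * x
f1 = lambda x: 1.0 - x[0] - x[1]
g1 = lambda x: np.array([-1.0, -1.0])
x_hat = quasi_monotone(g0, [f1], [g1], eps=1e-2, T=8000)
```

Because f1 is affine, every x_t is a convex combination of ε-feasible points and hence ε-feasible itself, matching item 1 of the theorem on the next slide's statement.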

Page 151: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Quasi-monotone method

Initialization : Define `0(x) ≡ 0, x0 = ./, and σ0 = 0.Iteration t ≥ 0 :

a) Set vt = arg minx∈Q{`t(x) + βtd(x)}, It

def= {i ∈ [1 : m] : f (i)(vt) ≥ ε}.

b) If It 6= ∅, then set xt+1 = xt , σt+1 = σt , and choose arbitrary it ∈ It .

Update `t+1(x) = `t(x) + at+1[f (it)(vt) + 〈∇f (it)(vt), x − vt〉].

c) Otherwise, σt+1 = σt + at+1, τt = at+1

σt+1, xt+1 = (1− τt)xt + τtvt .

Update `t+1(x) = `t(x) + at+1[f (0)(xt+1) + 〈∇f (0)(xt+1), x − xt+1〉].

Operation x0 = ./ ∈ E indicates that x0 is not chosen yet.

Yu. Nesterov New primal-dual methods for functional constraints 17/19

Page 152: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Quasi-monotone method

Initialization : Define `0(x) ≡ 0, x0 = ./, and σ0 = 0.Iteration t ≥ 0 :

a) Set vt = arg minx∈Q{`t(x) + βtd(x)}, It

def= {i ∈ [1 : m] : f (i)(vt) ≥ ε}.

b) If It 6= ∅, then set xt+1 = xt , σt+1 = σt , and choose arbitrary it ∈ It .

Update `t+1(x) = `t(x) + at+1[f (it)(vt) + 〈∇f (it)(vt), x − vt〉].

c) Otherwise, σt+1 = σt + at+1, τt = at+1

σt+1, xt+1 = (1− τt)xt + τtvt .

Update `t+1(x) = `t(x) + at+1[f (0)(xt+1) + 〈∇f (0)(xt+1), x − xt+1〉].

Operation x0 = ./ ∈ E indicates that x0 is not chosen yet.

Yu. Nesterov New primal-dual methods for functional constraints 17/19

Page 153: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Quasi-monotone method

Initialization : Define `0(x) ≡ 0, x0 = ./, and σ0 = 0.Iteration t ≥ 0 :

a) Set vt = arg minx∈Q{`t(x) + βtd(x)}, It

def= {i ∈ [1 : m] : f (i)(vt) ≥ ε}.

b) If It 6= ∅, then set xt+1 = xt , σt+1 = σt , and choose arbitrary it ∈ It .

Update `t+1(x) = `t(x) + at+1[f (it)(vt) + 〈∇f (it)(vt), x − vt〉].

c) Otherwise, σt+1 = σt + at+1, τt = at+1

σt+1, xt+1 = (1− τt)xt + τtvt .

Update `t+1(x) = `t(x) + at+1[f (0)(xt+1) + 〈∇f (0)(xt+1), x − xt+1〉].

Operation x0 = ./ ∈ E indicates that x0 is not chosen yet.

Yu. Nesterov New primal-dual methods for functional constraints 17/19

Page 154: New primal-dual subgradient methods for Convex Problems ...lear.inrialpes.fr/workshop/osl2015/slides/osl2015_yurii.pdf · New primal-dual subgradient methods for Convex Problems with

Quasi-monotone method

Initialization : Define `0(x) ≡ 0, x0 = ./, and σ0 = 0.Iteration t ≥ 0 :

a) Set vt = arg minx∈Q{`t(x) + βtd(x)}, It

def= {i ∈ [1 : m] : f (i)(vt) ≥ ε}.

b) If It 6= ∅, then set xt+1 = xt , σt+1 = σt , and choose arbitrary it ∈ It .

Update `t+1(x) = `t(x) + at+1[f (it)(vt) + 〈∇f (it)(vt), x − vt〉].

c) Otherwise, σt+1 = σt + at+1, τt = at+1

σt+1, xt+1 = (1− τt)xt + τtvt .

Update `t+1(x) = `t(x) + at+1[f (0)(xt+1) + 〈∇f (0)(xt+1), x − xt+1〉].

Operation x0 = ./ ∈ E indicates that x0 is not chosen yet.

Yu. Nesterov New primal-dual methods for functional constraints 17/19

Convergence result

Theorem. 1. All points x_t ≠ ./ are ε-feasible.

2. If all subgradients are bounded by M, then for all x ∈ Q and t ≥ 0,

σ_t (f^{(0)}(x_t) − ε) + A_t ε ≤ ℓ_t(x) + β_t d(x) + (M²/2) ∑_{k=1}^{t} a_k² / β_{k−1}.

3. As soon as A_t ε > β_t D + (M²/2) ∑_{k=1}^{t} a_k² / β_{k−1}, we get σ_t > 0 and

f^{(0)}(x_t) − φ(λ_t) ≤ f^{(0)}(x_t) − (1/σ_t) min_{x∈Q} ℓ_t(x) ≤ ε.

Example: a_t ≡ 1, β_t ≈ √t ⇒ t ≈ O(1/ε²).

NB: this holds for the whole sequence {x_t}!

Yu. Nesterov New primal-dual methods for functional constraints 18/19

Conclusion

1. An optimal primal-dual solution can be approximated by simple switching subgradient schemes.

2. The approximate dual multipliers have a natural interpretation: the relative importance of the corresponding constraints during the adjustment process.

3. However, the method has an optimal worst-case efficiency estimate even if the dual optimal solution does not exist.

4. Many interesting questions remain (influence of smoothness, strong convexity, etc.).

Thank you for your attention!

Yu. Nesterov New primal-dual methods for functional constraints 19/19
