Introduction to Optimization · Constrained Optimization General constrained optimization problem: Let x2Rn, f: Rn!R, g: Rn!Rm, h: Rn!Rlﬁnd min x f(x) s.t. g(x) 0;h(x) = 0 In this

Introduction toOptimization

Constrained Optimization

Marc ToussaintU Stuttgart

Constrained Optimization

• General constrained optimization problem:Let x ∈ Rn, f : Rn → R, g : Rn → Rm, h : Rn → Rl find

minx

f(x) s.t. g(x) ≤ 0, h(x) = 0

In this lecture I’ll focus (mostly) on inequality constraints g!

• Applications– Find an optimal, non-colliding trajectory in robotics– Optimize the shape of a turbine blade, s.t. it must not break– Optimize the train schedule, s.t. consistency/possibility

2/39

• Try to some how transform the constraint problem to

a series of unconstraint problems

a single but larger unconstraint problem

another constraint problem, hopefully simpler (dual, convex)

3/39

General approaches

• Penalty & Barriers– Associate a (adaptive) penalty cost with violation of the constraint– Associate an additional “force compensating the gradient into the constraint”(augmented Lagrangian)– Associate a log barrier with a constraint, becoming∞ for violation (interiorpoint method)

• Gradient projection methods (mostly for linear contraints)– For ‘active’ constraints, project the step direction to become tangantial– When checking a step, always pull it back to the feasible region

• Lagrangian & dual methods– Rewrite the constrained problem into an unconstrained one– Or rewrite it as a (convex) dual problem

• Simplex methods (linear constraints)– Walk along the constraint boundaries

4/39

Penalties & Barries

• Convention:

A barrier is really∞ for g(x) > 0

A penalty is zero for g(x) ≤ 0 and increases with g(x) > 0

5/39

Log barrier method or Interior Point method

6/39

Log barrier method

• Instead ofminx

f(x) s.t. g(x) ≤ 0

we addressminx

f(x)− µ∑i

log(−gi(x))

7/39

Log barrier

• For µ→ 0, −µ log(−g) converges to∞[g > 0]

Notation: [boolean expression] ∈ {0, 1}

• The barriers gradient ∇− log(−g) = ∇gg pushes away from the

constraint

• Eventually we want to have a very small µ – but choosing small µmakes the barrier very non-smooth, which is bad for Gradient and 2ndorder methods

8/39

Central Path

• Every µ defines a different optimal x∗(µ)

x∗(µ) = argminx

f(x)− µ∑i

log(−gi(x))

• Each point on the path can be understood as the optimal compromiseof minimizing f(x) and a repelling force of the constraints. (Whichcorresponds to dual variables λ∗(µ).)

9/39

Log barrier method

Input: initial x ∈ Rn, functions f(x), g(x),∇f(x),∇g(x), tolerances θ, εOutput: x

1: initialize µ = 1

2: repeat3: find x← argminx f(x)− µ

∑i log(−gi(x)) with tolerance 10θ

4: decrease µ← µ/10

5: until |∆x| < θ and ∀i : gi(x) < ε

Note: See Boyd & Vandenberghe for stopping criteria based on fprecision (duality gap) and better choice of initial µ (which is called tthere).

10/39

We will revisit the log barrier method later, once we introduced theLangrangian...

11/39

Squared Penalty Method

12/39


• This is perhaps the simplest approach

• Instead ofminx

f(x) s.t. g(x) ≤ 0

we address

minx

f(x) + µ

m∑i=1

[gi(x) > 0] gi(x)2

Input: initial x ∈ Rn, functions f(x), g(x),∇f(x),∇g(x), tol. θ, εOutput: x

1: initialize µ = 1

2: repeat3: find x← argminx f(x) + µ

∑i[gi(x) > 0] g(x)2 with tolerance 10θ

4: µ← 10µ

5: until |∆x| < θ and ∀i : gi(x) < ε

13/39


• The method is ok, but will always lead to some violation of constraints

• A better idea would be to add an out-pushing gradient/force −∇gi(x)

for every constraint gi(x) > 0 that is violated.

Ideally, the out-pushing gradient mixes with −∇f(x) exactly such thatthe result becomes tangential to the constraint!

This idea leads to the augmented Lagrangian approach.

14/39


• The method is ok, but will always lead to some violation of constraints

• A better idea would be to add an out-pushing gradient/force −∇gi(x)

for every constraint gi(x) > 0 that is violated.

Ideally, the out-pushing gradient mixes with −∇f(x) exactly such thatthe result becomes tangential to the constraint!

This idea leads to the augmented Lagrangian approach.

14/39

Augmented Lagrangian

(We can introduce this is a self-contained manner, without yet defining the

“Lagrangian”)

15/39

Augmented Lagrangian (equality constraint)• We first consider an equality constraint before addressing inequalities• Instead of

minx

f(x) s.t. h(x) = 0

we address

minx

f(x) + µ

m∑i=1

hi(x)2 +∑i=1

λihi(x) (1)

• Note:– The gradient ∇hi(x) is always orthogonal to the constraint– By tuning λi we can induce a “virtual gradient” λi∇hi(x)

– The term µ∑mi=1 hi(x)2 penalizes as before

• Here is the trick:– First minimize (1) for some µ and λi– This will in general lead to a (slight) penalty µ

∑mi=1 hi(x)2

– For the next iteration, choose λi to generate exactly the gradient thatwas previously generated by the penalty 16/39

• Optimality condition after an iteration:

x′ = argminx

f(x) + µ

m∑i=1

hi(x)2 +

m∑i=1

λihi(x)

⇒ 0 = ∇f(x′) + µ

m∑i=1

2hi(x′)∇hi(x

′) +

m∑i=1

λi∇hi(x′)

• Update λ’s for the next iteration:∑i=1

λnewi ∇hi(x

′) = µ

m∑i=1

2hi(x′)∇hi(x

′) +∑i=1

λoldi ∇hi(x

′)

λnewi = λold

i + 2µhi(x′)

Input: initial x ∈ Rn, functions f(x), h(x),∇f(x),∇h(x), tol. θ, εOutput: x

1: initialize µ = 1, λi = 0


∑i hi(x)2 +

∑i λihi(x)

4: ∀i : λi ← λi + 2µhi(x′)

5: until |∆x| < θ and |hi(x)| < ε

17/39

This adaptation of λi is really elegant:

• We do not have to take the penalty limit µ→∞ but still can have exactconstraints

• If f and h were linear (∇f and ∇hi constant), the updated λi is exactlyright: In the next iteration we would exactly hit the constraint (byconstruction)

• The penalty term is like a measuring device for the necessary “virtualgradient”, which is generated by the agumentation term in the nextiteration

• The λi are very meaningful: they give the force/gradient that a constraintexerts on the solution

18/39

Augmented Lagrangian (inequality constraint)• Instead of

minx

f(x) s.t. g(x) ≤ 0

we address

minx

f(x) + µ

m∑i=1

[gi(x) ≥ 0 ∨ λi > 0] gi(x)2 +

m∑i=1

λigi(x)

• A constraint is either active or inactive:– When active (gi(x) ≥ 0 ∨ λi > 0) we aim for equality gi(x) = 0

– When inactive (gi(x) < 0 ∧ λi = 0) we don’t penalize/augment– λi are zero or positive, but never negative

Input: initial x ∈ Rn, functions f(x), g(x),∇f(x),∇g(x), tol. θ, εOutput: x

1: initialize µ = 1, λi = 0


∑i[gi(x) ≥ 0 ∨ λi > 0] gi(x)2 +

∑i λigi(x)

4: ∀i : λi ← max(λi + 2µgi(x′), 0)

5: until |∆x| < θ and gi(x) < ε

19/39

General approaches

• Penalty & Barriers– Associate a (adaptive) penalty cost with violation of the constraint– Associate an additional “force compensating the gradient into the constraint”(augmented Lagrangian)– Associate a log-barrier with a constraint, becoming∞ for violation (interiorpoint method)




20/39

The Lagrangian

21/39

The Lagrangian

• Given a constraint problem

minx

f(x) s.t. g(x) ≤ 0

we define the Lagrangian as

L(x, λ) = f(x) +

m∑i=1

λigi(x)

• The λi ≥ 0 are called dual variables or Lagrange multipliers

22/39

What’s the point of this definition?

• The Lagrangian is useful to compute optima analytically, on paper –that’s why physicist learn it early on

• The Lagrangian implies the KKT conditions of optimality

• Optima are necessarily at saddle points of the Lagrangian

• The Lagrangian implies a dual problem, which is sometimes easier tosolve than the primal

23/39

Example: Some calculus using the Lagrangian

• For x ∈ R2, what is

minxx2 s.t. x1 + x2 = 1

• Solution:

L(x, λ) = x2 + λ(x1 + x2 − 1)

0 = ∇xL(x, λ) = 2x+ λ

1

1

⇒ x1 = x2 = −λ/2

0 = ∇λL(x, λ) = x1 + x2 − 1 = −λ/2− λ/2− 1 ⇒ λ = −1

⇒ = x1 = x2 = 1/2

24/39

The “force” & KKT view on the Lagrangian

• At the optimum there must be a balance between the cost gradient−∇f(x) and the gradient of the active constraints −∇gi(x)

25/39



• Formally: for optimal x: ∇f(x) ∈ span{∇gi(x)}

• Or: for optimal x there must exist λi such that −∇f(x) = −[∑

i(−λi∇gi(x))]

• For optimal x it must hold (necessary condition): ∃λ s.t.

∇f(x) +

m∑i=1

λi∇gi(x) = 0 (“force balance”)

∀i : gi(x) ≤ 0 (primal feasibility)

∀i : λi ≥ 0 (dual feasibility)

∀i : λigi(x) = 0 (complementary)

The last condition says that λi > 0 only for active constraints.These are the Karush-Kuhn-Tucker conditions (KKT, neglectingequality constraints)

26/39



• Formally: for optimal x: ∇f(x) ∈ span{∇gi(x)}

• Or: for optimal x there must exist λi such that −∇f(x) = −[∑

i(−λi∇gi(x))]

• For optimal x it must hold (necessary condition): ∃λ s.t.

∇f(x) +

m∑i=1

λi∇gi(x) = 0 (“force balance”)

∀i : gi(x) ≤ 0 (primal feasibility)

∀i : λi ≥ 0 (dual feasibility)

∀i : λigi(x) = 0 (complementary)

The last condition says that λi > 0 only for active constraints.These are the Karush-Kuhn-Tucker conditions (KKT, neglectingequality constraints)

26/39


• The first condition (“force balance”), ∃λ s.t.

∇f(x) +

m∑i=1

λi∇gi(x) = 0

can be equivalently expressed as, ∃λ s.t.

∇xL(x, λ) = 0

• In that sense, the Lagrangian can be viewed as the “energy function”that generates (for good choice of λ) the right balance between costand constraint gradients

• This is exactly as in the augmented Lagrangian approach, where however wehave an additional (“augmented”) squared penalty that is used to tune the λi

27/39

Saddle point view on the Lagrangian

• Let’s briefly consider the equality case again:

minx

f(x) s.t. h(x) = 0

with the Lagrangian

L(x, λ) = f(x) +

m∑i=1

λihi(x)

• Note:

minxL(x, λ) ⇒ 0 = ∇xL(x, λ) ↔ force balance

maxλ

L(x, λ) ⇒ 0 = ∇λL(x, λ) = hi(x) ↔ constraint

• Optima (x∗, λ∗) are saddle points where∇xL = 0 ensures force balance and∇λL = 0 ensures the constraint 28/39

Saddle point view on the Lagrangian

• In the inequality case:

maxλ≥0

L(x, λ) =

f(x) if g(x) ≤ 0

∞ otherwise

maxλi≥0

L(x, λ)⇒

λi = 0 if g(x) < 0

0 = ∇λiL(x, λ) = gi(0) otherwise

This implies either (λi = 0 ∧ gi(x) < 0) or gi(0) = 0, which is exactlyequivalent to the KKT conditions

• Again, optima (x∗, λ∗) are saddle points whereminx L enforces force balance andmaxλ L enforces the KKT conditions

29/39

The Lagrange dual problem• We define the Lagrange dual function as

l(λ) = minxL(x, λ)

• This implies two problems

minxf(x) s.t. g(x) ≤ 0 primal problem

maxλ

l(λ) s.t. λ ≥ 0 dual problem

The dual problem is convex, even if the primal is non-convex!

• Written more symmetric:

minx

maxl≥0

L(x, λ) primal problem

maxλ≥0

minxL(x, λ) dual problem

because the maxλ≥0 L(x, λ) ensures the constraints (previous slide).30/39

The Lagrange dual problem

• The dual function is always a lower bound (for any λi ≥ 0)

l(λ) = minxL(x, λ) ≤

[minxf(x) s.t. g(x) ≤ 0

]And consequently

maxλ≥0

minxL(x, λ) ≤ min

xmaxl≥0

L(x, λ)

• We say strong duality holds iff

maxλ≥0

minxL(x, λ) = min

xmaxl≥0

L(x, λ)

• If the primal is convex, and there exist an interior point

∃x : ∀i : gi(x) < 0

(which is called Slater condition), then we have strong duality31/39

And what about algorithms?

• So far we’ve only introduced a whole lot of formalism, and seen thatthe Lagrangian sort of represents the constraint problem– minx L or ∇xL = 0 is related to the force balance– maxλ L or ∇λL = 0 is related to constraints or KKT conditions– This implies two dual problems, minx maxλ L and maxλ minx L, thesecond (dual) is a lower bound of the first (primal)

• But what are the algorithms we can get out of this?

32/39

Algorithmic implications of the Lagrangian view

• If minx L(x, λ) can be solved analytically, we can alternatively solve the(convex) dual problem.

• But more generally

Optimization problem −→ Solve KKT conditions

→ Apply standard algos for solving an equation system r(x, λ) = 0:

Newton method ∇r∆x

∆λ

= −r

This leads to primal-dual algorithms that adapt x and λ concurrently.Roughly, they use the curvature ∇2f to estimate the right λ to push outof the constraint. We will discuss this after we’ve learnt about 2ndorder methods.

33/39

Log barrier method revisited

34/39


• Log barrier method: Instead of

minx

f(x) s.t. g(x) ≤ 0

we addressminx

f(x)− µ∑i

log(−gi(x))

• For given µ the optimality condition is

∇f(x)−∑i

µ

gi(x)∇gi(x) = 0

or equivalently

∇f(x)−∑i

λi∇gi(x) = 0 , λigi(x) = −µ

These are called modified (=approximate) KKT conditions.35/39


Centering (the unconstrained minimization) in the log barrier method isequivalent to solving the modified KKT conditions.

Note also: On the central path, the duality gap is mµ:l(λ∗(µ)) = f(x∗(µ)) +

∑i λigi(x

∗(µ)) = f(x∗(µ))−mµ

36/39

Phase I: Finding a feasible initialization

37/39

Phase I: Finding a feasible initialization

• An elegant method for finding a feasible point x:

min(x,s)∈Rn+1

s s.t. ∀i : gi(x) ≤ s, s ≥ 0

or

min(x,s)∈Rn+m

m∑i=1

si s.t. ∀i : gi(x) ≤ si, si ≥ 0

38/39

General approaches

• Penalty & Barriers– Associate a (adaptive) penalty cost with violation of the constraint– Associate an additional “force compensating the gradient into the constraint”(augmented Lagrangian)– Associate a log barrier with a constraint, becoming∞ for violation (interiorpoint method)




39/39

Documents

Introduction to Optimization · Constrained Optimization General constrained optimization problem: Let x2Rn, f: Rn!R, g: Rn!Rm, h: Rn!Rlﬁnd min x f(x) s.t. g(x) 0;h(x) = 0 In this