
  • Summary of Lagrangian Theory

    • A necessary condition for constrained local minimum

    • A unified treatment for equality and inequality constraints

    • A dual view of constrained optimization

    – Constraints and variables are, in a certain sense, interchangeable

    • A tool to analytically solve many problems

  • Limitations of Lagrangian Theory

    • Works only for continuous and differentiable problems

    • Requires regularity

    • Only necessary, but not sufficient

    • Only for local minimum, not global minimum

    • Unique Lagrange multipliers may be difficult to locate

    – KKT system: a system of nonlinear equations, not necessarily easier to solve

  • We will study …

    • New computational methods for constrained optimization

    • Do not involve projection to feasible regions or finding feasible directions

    – Well-suited for nonlinear constraints

    • Most popular general-purpose constrained optimization solvers are in this class

  • Two major classes

    • Penalty methods

    – Penalize infeasibility

    – Transforms constraints

    – Non-unique penalty multipliers

    – May achieve global optimality

    • Lagrange multiplier methods

    – Based on KKT condition

    – Does not transform constraints

    – Unique Lagrange multipliers

    – Local optimality only

    • No clear winner between the two

    – There are hybrid methods

  • Penalty methods

    • Penalty methods drop the constraints and add new terms that penalize infeasibility:

    min F(x,c) = f(x) + c [ ∑i Hi(x) + ∑j Gj(x) ]

    where the scalar c > 0 is the penalty multiplier

    Hi(x): 0 if hi(x) = 0; > 0 if hi(x) ≠ 0

    Gj(x): 0 if gj(x) ≤ 0; > 0 if gj(x) > 0

  • Common penalty terms

    • l1-penalty:

    Hi(x) = |hi(x)|,  Gj(x) = max(0, gj(x))

    • Squared penalty:

    Hi(x) = |hi(x)|²,  Gj(x) = max(0, gj(x))²
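
    As a concrete illustration, these penalty forms can be written in a few lines of Python (a minimal sketch; the function names and the use of NumPy are illustrative choices, not from the slides):

```python
import numpy as np

def l1_penalty(h_vals, g_vals):
    # sum of |h_i(x)| over equalities plus max(0, g_j(x)) over inequalities
    return np.sum(np.abs(h_vals)) + np.sum(np.maximum(0.0, g_vals))

def squared_penalty(h_vals, g_vals):
    # sum of h_i(x)^2 plus max(0, g_j(x))^2; differentiable when h, g are
    return np.sum(h_vals**2) + np.sum(np.maximum(0.0, g_vals)**2)

def penalized_objective(f, h, g, c, penalty=squared_penalty):
    # F(x, c) = f(x) + c * [ sum_i H_i(x) + sum_j G_j(x) ]
    return lambda x: f(x) + c * penalty(np.atleast_1d(h(x)), np.atleast_1d(g(x)))
```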

  • Some properties

    • If a global optimum x* of the unconstrained penalty function is feasible in the original constrained model, it is a constrained global minimum

    • Squared penalty functions are differentiable when the underlying functions are differentiable

  • Exact Penalty Functions

    • A penalty function is exact if there exists a finite c > 0 such that the unconstrained penalty model F(x,c) yields an optimal solution of the original constrained model

    • An example (worked through below):

    – Min y

    – s.t. y ≥ 0
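
    A brief worked sketch of this example (writing the constraint as g(y) = −y ≤ 0):

    – l1-penalty: F(y,c) = y + c·max(0, −y); for any c > 1 the unconstrained minimum is y = 0, exactly the constrained optimum, so a finite c suffices

    – Squared penalty: F(y,c) = y + c·max(0, −y)²; the unconstrained minimum is y = −1/(2c) < 0 for every finite c, so no finite c recovers the constrained optimum and y → 0 only as c → +∞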

  • Exactness

    • Squared penalty is usually not exact, i.e., there is usually no finite c for which the penalty function yields the optimal solution

    – Letting c → +∞, the minimizers usually converge to the solution

    • l1-penalty is generally exact under mild assumptions, i.e., there exists a finite c for which the penalty function yields the optimal solution

  • A sequential strategy

    • When addressing a constrained nonlinear optimization problem by penalty methods, the multiplier c should be started at a relatively low value > 0 and increased as the computation proceeds [Ch 14, Rardin 98]

  • Sequential Unconstrained Penalty Method

    Step 0: Initialization. Choose the penalty form, an initial c > 0 that is relatively small, a starting point x0, and an increase rate β > 1. Set t = 0

    Step 1: Unconstrained optimization. Beginning from xt, solve the penalty optimization problem min F(x,c) to produce xt+1

    Step 2: Stopping. If xt+1 is feasible or sufficiently close to feasible, stop and output xt+1

    Step 3: Increase. Enlarge the penalty multiplier as c ← βc, set t = t+1, and return to Step 1
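
    A minimal Python sketch of this loop (the squared-penalty choice, tolerances, and use of scipy.optimize.minimize are illustrative assumptions, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

def sequential_penalty(f, h, g, x0, c=0.25, beta=4.0, tol=1e-6, max_outer=50):
    """Sequential unconstrained penalty method with a squared penalty."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_outer):
        # Step 1: minimize F(x, c) = f(x) + c * (sum h_i^2 + sum max(0, g_j)^2)
        F = lambda x: (f(x) + c * (np.sum(np.atleast_1d(h(x))**2)
                                   + np.sum(np.maximum(0.0, np.atleast_1d(g(x)))**2)))
        x = minimize(F, x).x
        # Step 2: stop once (nearly) feasible
        violation = max(np.max(np.abs(np.atleast_1d(h(x))), initial=0.0),
                        np.max(np.atleast_1d(g(x)), initial=0.0))
        if violation <= tol:
            break
        # Step 3: increase the penalty multiplier, c <- beta * c
        c *= beta
    return x
```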

  • A Service Desk Design Example

    • Min –(2x + 2y + 2)

    subject to x²/25 + y²/16 – 1 ≤ 0

  • Iteration trajectory

    t   c (β=4)   xt+1            g1      g2   g3

    0   1/4       (9.69, 6.20)    5.160   0    0

    1   1         (6.63, 4.24)    1.885   0    0

    2   4         (4.98, 3.18)    0.627   0    0

    3   16        (4.22, 2.74)    0.185   0    0.01

    4   64        (3.80, 2.74)    0.051   0    0.01

    5   256       (3.67, 2.74)    0.013   0    0.01

    6   1024      (3.63, 2.75)    0.003   0    0

    7   4096      (3.63, 2.75)    0.001   0    0

  • Problems of penalty methods

    • Global unconstrained optimization (expensive) is needed

    – The convergence guarantee is for global optimality, so each penalty subproblem must be solved to global optimality

    • A single penalty term can easily make the search get stuck

  • Augmented Lagrangian Function

    • A marriage of penalty and Lagrangian methods:

    L(x, λ, c) = f(x) + λ'h(x) + (c/2)||h(x)||²

    • Two mechanisms make the marriage work

    – Taking λ close to λ*

    • By the KKT conditions, x* can be a local minimum of L(x, λ*, c), assuming x* satisfies the 2nd-order conditions

    – Taking c very large

    • The (c/2)||h(x)||² term then acts as a squared penalty
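
    In code, the augmented Lagrangian is just the ordinary Lagrangian plus a squared penalty term (a minimal sketch; the signature and NumPy usage are assumptions, not from the slides):

```python
import numpy as np

def augmented_lagrangian(f, h, lam, c):
    # L(x, lam, c) = f(x) + lam' h(x) + (c/2) * ||h(x)||^2
    def L(x):
        hx = np.atleast_1d(h(x))
        return f(x) + lam @ hx + 0.5 * c * (hx @ hx)
    return L
```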

  • An Example

    Consider the two mechanisms on a concrete problem:

    – Case 1) Taking λ close to λ*

    – Case 2) Taking c very large

  • Illustrations

  • Global Convergence of the Penalty Approach

    • Assume {λk} is bounded, 0 < ck < ck+1, ck → +∞, and xk globally minimizes Lck(x, λk) over x ∈ X; then every limit point of {xk} is a global optimum of the original problem

  • Proof

    • Lck(xk, λk) ≤ Lck(x, λk), for any k and any x ∈ X

    • Lck(x*, λk) = f(x*), since h(x*) = 0

    • So, Lck(xk, λk) ≤ f(x*), for any k

    • Taking k → ∞, we have

    – limk→∞ [ f(xk) + λk'h(xk) + (ck/2)||h(xk)||² ] ≤ f(x*)

    • Since ck → +∞ and {λk} is bounded, we must have ||h(xk)|| → 0, so every limit point of {xk} is feasible

    • Since the limit point is feasible and its objective value is at most f(x*), it is a global optimum

  • Practical behavior: ill-conditioning

    • Extensive practical experience shows that the penalty method is fairly reliable

    • When it fails, it is often due to the difficulty of minimizing Lck(xk, λk)

    • The difficulty depends on the ratio of the largest eigenvalue M to the smallest eigenvalue m of the Hessian of Lck

    – e.g., for steepest descent, f(xk+1) − f* ≤ ((M − m)/(M + m))² (f(xk) − f*)

    • When ck is large, this ratio tends to be large

    – Revisit the example

    • Trade-off in increasing ck

  • How to alleviate the ill-conditioning problem

    • Ill-conditioning is caused by too large a ck

    • But global convergence requires ck → ∞

    • It would be desirable if the algorithm worked even when ck stays bounded

    • Setting λk close to λ* can achieve that

    • But how do we estimate λ*?

  • A surprising by-product

    • Since we minimize Lck(x, λk) at each k, we usually have ||∇x Lck(xk, λk)|| = 0

    – Or, in practice, ||∇x Lck(xk, λk)|| ≤ εk with εk → 0

    • Now, let's examine ∇x Lck(xk, λk):

    ∇x Lck(xk, λk) = ∇f(xk) + ∇h(xk)(λk + ck h(xk)) = ∇f(xk) + ∇h(xk) αk, where αk = λk + ck h(xk)

    Taking k → ∞: 0 = ∇f(x*) + ∇h(x*) α*

    • Conclusion: if {xk} converges to a regular point x*, then

    – limk→∞ (λk + ck h(xk)) = α* = λ*

  • The Multiplier Method

    • λk+1 = λk + ck h(xk)

    • λk will approach λ* during the search

    • The method that increases ck and updates λk using the above formula is called the multiplier method

    – A popular and reliable method
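
    A minimal Python sketch of the multiplier method for equality constraints (the update λk+1 = λk + ck h(xk) is from the slides; the stopping test, the β-growth of ck, and the use of scipy.optimize.minimize are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def multiplier_method(f, h, x0, lam0, c0=1.0, beta=4.0, tol=1e-8, max_outer=50):
    """Method of multipliers: minimize L_ck(x, lam_k) in x, then update lam and c."""
    x, lam, c = np.asarray(x0, float), np.asarray(lam0, float), c0
    for _ in range(max_outer):
        # Inner step: (approximately) minimize the augmented Lagrangian in x
        L = lambda x: (f(x) + lam @ np.atleast_1d(h(x))
                       + 0.5 * c * np.sum(np.atleast_1d(h(x))**2))
        x = minimize(L, x).x
        hx = np.atleast_1d(h(x))
        # Multiplier update: lam_{k+1} = lam_k + c_k * h(x_k)
        lam = lam + c * hx
        if np.linalg.norm(hx) <= tol:   # stop once (nearly) feasible
            break
        c *= beta                       # enlarge the penalty multiplier
    return x, lam
```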


  • Example

    • For given λk, ck, what is xk?

    • Express λk+1 in terms of λk and ck (knowing that λk+1 = λk + ck h(xk))

    • Do λk, xk converge to λ*, x*?

    • What is the effect of ck? Does ck need to go to ∞ in order to converge to x*? Any requirements on ck?

    • Minimize f(x) = ½(−x1² + x2²), subject to x1 = 1
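
    A brief worked sketch of the answers (with h(x) = x1 − 1 and ck held at a fixed value c):

    – Lc(x, λk) = ½(−x1² + x2²) + λk(x1 − 1) + (c/2)(x1 − 1)²; for c > 1 its minimizer is xk = ((c − λk)/(c − 1), 0)

    – Then h(xk) = (1 − λk)/(c − 1), so λk+1 = λk + c(1 − λk)/(c − 1), which gives λk+1 − 1 = −(λk − 1)/(c − 1)

    – Since λ* = 1 and x* = (1, 0), the error shrinks by the factor 1/(c − 1) per iteration: λk and xk converge for any fixed c > 2, so ck does not need to go to ∞

    – For c < 1 the augmented Lagrangian is unbounded below in x1, so ck does need to exceed a threshold (here c > 2 for the multiplier iteration to converge), consistent with the conclusion below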

  • Conclusion

    • ck does not need to go to ∞, but it may need to be above a certain threshold for the Lagrange multiplier algorithm to converge

  • Guidelines for Parameter Choices

    • Set λ0 close to λ*

    • ck should eventually become sufficiently large (over some threshold)

    • c0 should not be too large to avoid ill-conditioning

    • ck should not be increased too fast to the point of causing ill-conditioning early

    • ck should not be increased so slowly that λk converges poorly to λ*

    • use xk as the starting point for iteration k+1

  • Guidelines for Parameter Choices (cont'd)

    • A popular scheme:

    – moderate c0, increase by ck+1 = βck, β>1

    – If the unconstrained method is powerful in dealing with ill-conditioning (e.g., Newton's method), one may set a large β (e.g., in [5, 10])

    • Another scheme:

    – Increase ck only when the constraint violation has not decreased by a factor γ (e.g., 0.25), as sketched below
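
    A sketch of this adaptive rule, as a variant of the multiplier_method loop above (the parameter names and default values are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def multiplier_method_adaptive(f, h, x0, lam0, c0=1.0, beta=4.0, gamma=0.25,
                               tol=1e-8, max_outer=50):
    """Multiplier method where c_k grows only when feasibility stalls."""
    x, lam, c = np.asarray(x0, float), np.asarray(lam0, float), c0
    prev_viol = np.inf
    for _ in range(max_outer):
        L = lambda x: (f(x) + lam @ np.atleast_1d(h(x))
                       + 0.5 * c * np.sum(np.atleast_1d(h(x))**2))
        x = minimize(L, x).x
        hx = np.atleast_1d(h(x))
        lam = lam + c * hx                  # multiplier update
        viol = np.linalg.norm(hx)
        if viol <= tol:
            break
        if viol > gamma * prev_viol:        # violation not reduced by factor gamma
            c *= beta                       # so increase the penalty multiplier
        prev_viol = viol
    return x, lam
```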