
  • Summary of Lagrangian Theory

    • A necessary condition for constrained local minimum

    • A unified treatment for equality and inequality constraints

    • A dual view of constrained optimization

    – Constraints and variables are, in a certain sense, interchangeable

    • A tool to analytically solve many problems

  • Limitations of Lagrangian Theory

    • Works only for continuous and differentiable problems

    • Requires regularity

    • Only necessary, but not sufficient

    • Only for local minimum, not global minimum

    • Unique Lagrange multipliers may be difficult to locate

    – KKT system: a system of nonlinear equations, not necessarily easier to solve

  • We will study …

    • New computational methods for constrained optimization

    • Do not involve projection to feasible regions or finding feasible directions

    – Well-suited for nonlinear constraints

    • Most popular general-purpose constrained optimization solvers are in this class

  • Two major classes

    • Penalty methods

    – Penalize infeasibility

    – Transforms constraints

    – Non-unique penalty multipliers

    – May achieve global optimality

    • Lagrange multiplier methods

    – Based on KKT condition

    – Does not transform constraints

    – Unique Lagrange multipliers

    – Local optimality only

    • No clear winner between the two

    – There are hybrid methods

  • Penalty methods

    • Penalty methods drop the constraints and add new terms that penalize infeasibility:

    min F(x,c) = f(x) + c [ ∑i Hi(x) + ∑j Gj(x) ]

    where the scalar c > 0 is the penalty multiplier

    Hi(x): 0 if hi(x) = 0; > 0 if hi(x) ≠ 0

    Gj(x): 0 if gj(x) ≤ 0; > 0 if gj(x) > 0

  • Common penalty terms

    • l1-penalty:

    Hi(x) = |hi(x)|,  Gj(x) = max(0, gj(x))

    • Squared penalty:

    Hi(x) = |hi(x)|²,  Gj(x) = max(0, gj(x))²
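
    As a concrete illustration, these penalty forms can be written in a few lines of Python (a minimal sketch; the function names and the use of NumPy are illustrative choices, not from the slides):

```python
import numpy as np

def l1_penalty(h_vals, g_vals):
    # sum of |h_i(x)| over equalities plus max(0, g_j(x)) over inequalities
    return np.sum(np.abs(h_vals)) + np.sum(np.maximum(0.0, g_vals))

def squared_penalty(h_vals, g_vals):
    # sum of h_i(x)^2 plus max(0, g_j(x))^2; differentiable when h, g are
    return np.sum(h_vals**2) + np.sum(np.maximum(0.0, g_vals)**2)

def penalized_objective(f, h, g, c, penalty=squared_penalty):
    # F(x, c) = f(x) + c * [ sum_i H_i(x) + sum_j G_j(x) ]
    return lambda x: f(x) + c * penalty(np.atleast_1d(h(x)), np.atleast_1d(g(x)))
```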

  • Some properties

    • If a global optimum x* of the unconstrained penalty function is feasible in the original constrained model, it is a constrained global minimum

    • Squared penalty functions are differentiable when the underlying functions are differentiable

  • Exact Penalty Functions

    • A penalty function is exact if there exists a finite c > 0 such that the unconstrained penalty model F(x,c) yields an optimal solution of the original constrained model

    • An example (worked through below):

    – Min y

    – s.t. y ≥ 0
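
    A brief worked sketch of this example (writing the constraint as g(y) = −y ≤ 0):

    – l1-penalty: F(y,c) = y + c·max(0, −y); for any c > 1 the unconstrained minimum is y = 0, exactly the constrained optimum, so a finite c suffices

    – Squared penalty: F(y,c) = y + c·max(0, −y)²; the unconstrained minimum is y = −1/(2c) < 0 for every finite c, so no finite c recovers the constrained optimum and y → 0 only as c → +∞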

  • Exactness

    • Squared penalty is usually not exact, i.e., there is usually no finite c for which the penalty function yields the optimal solution

    – Letting c → +∞, the minimizers usually converge to the solution

    • l1-penalty is generally exact under mild assumptions, i.e., there exists a finite c for which the penalty function yields the optimal solution

  • A sequential strategy

    • When addressing a constrained nonlinear optimization problem by penalty methods, the multiplier c should be started at a relatively low value > 0 and increased as the computation proceeds [Ch 14, Rardin 98]

  • Sequential Unconstrained Penalty Method

    Step 0: Initialization. Choose the penalty form, an initial c > 0 that is relatively small, a starting point x0, and an increase rate β > 1. Set t = 0

    Step 1: Unconstrained optimization. Beginning from xt, solve the penalty optimization problem min F(x,c) to produce xt+1

    Step 2: Stopping. If xt+1 is feasible or sufficiently close to feasible, stop and output xt+1

    Step 3: Increase. Enlarge the penalty multiplier as c ← βc, set t = t+1, and return to Step 1
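
    A minimal Python sketch of this loop (the squared-penalty choice, tolerances, and use of scipy.optimize.minimize are illustrative assumptions, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

def sequential_penalty(f, h, g, x0, c=0.25, beta=4.0, tol=1e-6, max_outer=50):
    """Sequential unconstrained penalty method with a squared penalty."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_outer):
        # Step 1: minimize F(x, c) = f(x) + c * (sum h_i^2 + sum max(0, g_j)^2)
        F = lambda x: (f(x) + c * (np.sum(np.atleast_1d(h(x))**2)
                                   + np.sum(np.maximum(0.0, np.atleast_1d(g(x)))**2)))
        x = minimize(F, x).x
        # Step 2: stop once (nearly) feasible
        violation = max(np.max(np.abs(np.atleast_1d(h(x))), initial=0.0),
                        np.max(np.atleast_1d(g(x)), initial=0.0))
        if violation <= tol:
            break
        # Step 3: increase the penalty multiplier, c <- beta * c
        c *= beta
    return x
```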

  • A Service Desk Design Example

    • Min –(2x + 2y + 2)

    subject to x²/25 + y²/16 – 1 ≤ 0

  • Iteration trajectory

    t   c (β=4)   xt+1            g1      g2   g3

    0   1/4       (9.69, 6.20)    5.160   0    0

    1   1         (6.63, 4.24)    1.885   0    0

    2   4         (4.98, 3.18)    0.627   0    0

    3   16        (4.22, 2.74)    0.185   0    0.01

    4   64        (3.80, 2.74)    0.051   0    0.01

    5   256       (3.67, 2.74)    0.013   0    0.01

    6   1024      (3.63, 2.75)    0.003   0    0

    7   4096      (3.63, 2.75)    0.001   0    0

  • Problems of penalty methods

    • Global unconstrained optimization (expensive) is needed

    – The convergence guarantee is for global optimality, so each penalty subproblem must be solved to global optimality

    • A single penalty term can easily make the search get stuck

  • Augmented Lagrangian Function

    • A marriage of penalty and Lagrangian methods:

    L(x, λ, c) = f(x) + λ'h(x) + (c/2)||h(x)||²

    • Two mechanisms make the marriage work

    – Taking λ close to λ*

    • By the KKT conditions, x* can be a local minimum of L(x, λ*, c), assuming x* satisfies the 2nd-order conditions

    – Taking c very large

    • The (c/2)||h(x)||² term then acts as a squared penalty
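
    In code, the augmented Lagrangian is just the ordinary Lagrangian plus a squared penalty term (a minimal sketch; the signature and NumPy usage are assumptions, not from the slides):

```python
import numpy as np

def augmented_lagrangian(f, h, lam, c):
    # L(x, lam, c) = f(x) + lam' h(x) + (c/2) * ||h(x)||^2
    def L(x):
        hx = np.atleast_1d(h(x))
        return f(x) + lam @ hx + 0.5 * c * (hx @ hx)
    return L
```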

  • An Example

    Consider the two mechanisms on a concrete problem:

    – Case 1) Taking λ close to λ*

    – Case 2) Taking c very large

  • Illustrations

  • Global Convergence of the Penalty Approach

    • Assume {λk} is bounded, 0 < ck < ck+1, ck → +∞, and xk globally minimizes Lck(x, λk) over x ∈ X; then every limit point of {xk} is a global optimum of the original problem

  • Proof

    • Lck(xk, λk) ≤ Lck(x, λk), for any k and any x ∈ X

    • Lck(x*, λk) = f(x*), since h(x*) = 0

    • So, Lck(xk, λk) ≤ f(x*), for any k

    • Taking k → ∞, we have

    – limk→∞ [ f(xk) + λk'h(xk) + (ck/2)||h(xk)||² ] ≤ f(x*)

    • Since ck → +∞ and {λk} is bounded, we must have ||h(xk)|| → 0, so every limit point of {xk} is feasible

    • Since the limit point is feasible and its objective value is at most f(x*), it is a global optimum

  • Practical behavior: ill-conditioning

    • Extensive practical experience shows that the penalty method is fairly reliable

    • When it fails, it is often due to the difficulty of minimizing Lck(xk, λk)

    • The difficulty depends on the ratio of the largest eigenvalue M to the smallest eigenvalue m of the Hessian of Lck

    – e.g., for steepest descent, f(xk+1) − f* ≤ ((M − m)/(M + m))² (f(xk) − f*)

    • When ck is large, this ratio tends to be large

    – Revisit the example

    • Trade-off in increasing ck

  • How to alleviate the ill-conditioning problem

    • Ill-conditioning is caused by too large a ck

    • But global convergence requires ck → ∞

    • It would be desirable if the algorithm worked even when ck stays bounded

    • Setting λk close to λ* can achieve that

    • But how do we estimate λ*?

  • A surprising by-product

    • Since we minimize Lck(x, λk) at each k, we usually have ||∇x Lck(xk, λk)|| = 0

    – Or, in practice, ||∇x Lck(xk, λk)|| ≤ εk with εk → 0

    • Now, let's examine ∇x Lck(xk, λk):

    ∇x Lck(xk, λk) = ∇f(xk) + ∇h(xk)(λk + ck h(xk)) = ∇f(xk) + ∇h(xk) αk, where αk = λk + ck h(xk)

    Taking k → ∞: 0 = ∇f(x*) + ∇h(x*) α*

    • Conclusion: if {xk} converges to a regular point x*, then

    – limk→∞ (λk + ck h(xk)) = α* = λ*

  • The Multiplier Method

    • λk+1 = λk + ck h(xk)

    • λk will approach λ* during the search

    • The method that increases ck and updates λk using the above formula is called the multiplier method

    – A popular and reliable method
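
    A minimal Python sketch of the multiplier method for equality constraints (the update λk+1 = λk + ck h(xk) is from the slides; the stopping test, the β-growth of ck, and the use of scipy.optimize.minimize are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def multiplier_method(f, h, x0, lam0, c0=1.0, beta=4.0, tol=1e-8, max_outer=50):
    """Method of multipliers: minimize L_ck(x, lam_k) in x, then update lam and c."""
    x, lam, c = np.asarray(x0, float), np.asarray(lam0, float), c0
    for _ in range(max_outer):
        # Inner step: (approximately) minimize the augmented Lagrangian in x
        L = lambda x: (f(x) + lam @ np.atleast_1d(h(x))
                       + 0.5 * c * np.sum(np.atleast_1d(h(x))**2))
        x = minimize(L, x).x
        hx = np.atleast_1d(h(x))
        # Multiplier update: lam_{k+1} = lam_k + c_k * h(x_k)
        lam = lam + c * hx
        if np.linalg.norm(hx) <= tol:   # stop once (nearly) feasible
            break
        c *= beta                       # enlarge the penalty multiplier
    return x, lam
```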


  • Example

    • For given λk, ck, what is xk?

    • Express λk+1 in terms of λk and ck (knowing that λk+1 = λk + ck h(xk))

    • Do λk, xk converge to λ*, x*?

    • What is the effect of ck? Does ck need to go to ∞ in order to converge to x*? Any requirements on ck?

    • Minimize f(x) = ½(−x1² + x2²), subject to x1 = 1
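
    A brief worked sketch of the answers (with h(x) = x1 − 1 and ck held at a fixed value c):

    – Lc(x, λk) = ½(−x1² + x2²) + λk(x1 − 1) + (c/2)(x1 − 1)²; for c > 1 its minimizer is xk = ((c − λk)/(c − 1), 0)

    – Then h(xk) = (1 − λk)/(c − 1), so λk+1 = λk + c(1 − λk)/(c − 1), which gives λk+1 − 1 = −(λk − 1)/(c − 1)

    – Since λ* = 1 and x* = (1, 0), the error shrinks by the factor 1/(c − 1) per iteration: λk and xk converge for any fixed c > 2, so ck does not need to go to ∞

    – For c < 1 the augmented Lagrangian is unbounded below in x1, so ck does need to exceed a threshold (here c > 2 for the multiplier iteration to converge), consistent with the conclusion below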

  • Conclusion

    • ck does not need to go to ∞, but it may need to be above a certain threshold for the Lagrange multiplier algorithm to converge

  • Guidelines for Parameter Choices

    • Set λ0 close to λ*

    • ck should eventually become sufficiently large (over some threshold)

    • c0 should not be too large to avoid ill-conditioning

    • ck should not be increased too fast to the point of causing ill-conditioning early

    • ck should not be increased so slowly that λk converges poorly to λ*

    • use xk as the starting point for iteration k+1

  • Guidelines for Parameter Choices (cont'd)

    • A popular scheme:

    – moderate c0, increase by ck+1 = βck, β>1

    – If the unconstrained method is powerful in dealing with ill-conditioning (e.g., Newton's method), one may set a large β (e.g., in [5, 10])

    • Another scheme:

    – Increase ck only when the constraint violation has not decreased by a factor γ (e.g., 0.25), as sketched below
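
    A sketch of this adaptive rule, as a variant of the multiplier_method loop above (the parameter names and default values are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def multiplier_method_adaptive(f, h, x0, lam0, c0=1.0, beta=4.0, gamma=0.25,
                               tol=1e-8, max_outer=50):
    """Multiplier method where c_k grows only when feasibility stalls."""
    x, lam, c = np.asarray(x0, float), np.asarray(lam0, float), c0
    prev_viol = np.inf
    for _ in range(max_outer):
        L = lambda x: (f(x) + lam @ np.atleast_1d(h(x))
                       + 0.5 * c * np.sum(np.atleast_1d(h(x))**2))
        x = minimize(L, x).x
        hx = np.atleast_1d(h(x))
        lam = lam + c * hx                  # multiplier update
        viol = np.linalg.norm(hx)
        if viol <= tol:
            break
        if viol > gamma * prev_viol:        # violation not reduced by factor gamma
            c *= beta                       # so increase the penalty multiplier
        prev_viol = viol
    return x, lam
```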