Summary of Lagrangian Theory
• A necessary condition for constrained local minimum
• A unified treatment for equality and inequality constraints
• A dual view of constrained optimization
– Constraints and variables are in a certain sense interchangeable
• A tool to analytically solve many problems
Limitations of Lagrangian Theory
• Works only for continuous and differentiable problems
• Requires regularity
• Only necessary, but not sufficient
• Only for local minimum, not global minimum
• Unique Lagrange multipliers may be difficult to locate
– KKT system: a system of nonlinear equations, not necessarily easier to solve
We will study …
• New computational methods for constrained optimization
• Do not involve projection onto feasible regions or finding feasible directions
– Well-suited for nonlinear constraints
• Most popular general-purpose constrained optimization solvers are in this class
Two major classes
• Penalty methods
– Penalize infeasibility
– Transform constraints
– Non-unique penalty multipliers
– May achieve global optimality
• Lagrange multiplier methods
– Based on the KKT conditions
– Do not transform constraints
– Unique Lagrange multipliers
– Local optimality only
• No clear winner between the two
– There are hybrid methods
Penalty methods
• Penalty methods drop constraints and use
new terms to penalize infeasibility
min F(x,c) = f(x) + c [ ∑i Hi(x) + ∑j Gj(x) ]
where scalar c > 0 is the penalty multiplier
Hi(x): 0 if hi(x) = 0; > 0 if hi(x) ≠ 0
Gj(x): 0 if gj(x) ≤ 0; > 0 if gj(x) > 0
Common penalty terms
• l1 penalty:
|h(x)|
max(0, g(x))
• Squared penalty:
h(x)²
max(0, g(x))²
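As a small sketch in Python (the function names are ours; the constraint values hi(x) and gj(x) are passed in as numbers), the two penalty forms for one equality and one inequality constraint:

```python
# l1 and squared penalty terms for one equality constraint h(x) = 0
# and one inequality constraint g(x) <= 0 (values computed by the caller).

def l1_penalty(h_val, g_val):
    # |h(x)| + max(0, g(x))
    return abs(h_val) + max(0.0, g_val)

def squared_penalty(h_val, g_val):
    # h(x)^2 + max(0, g(x))^2
    return h_val ** 2 + max(0.0, g_val) ** 2
```

Note that a satisfied inequality (g(x) ≤ 0) contributes zero under both forms, so only infeasibility is penalized.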
Some properties
• If a global optimum x* of the unconstrained
penalty function is feasible in the original
constrained model, then it is a constrained
global minimum
• Squared penalty functions are differentiable
when the underlying functions are
differentiable
Exact Penalty Functions
• A penalty function is exact if there exists a
finite c > 0 such that the unconstrained penalty
model F(x,c) yields an optimal solution of
the original constrained model
• An example:
– Min y
– S.t. y ≥ 0
Exactness
• The squared penalty is usually not exact, i.e.
there is usually no finite choice of c for which
the penalty function yields the optimal solution
– Letting c → +∞ usually converges to the solution
• The l1 penalty is generally exact under mild
assumptions, i.e. there exists a finite c for
which the penalty function yields the
optimal solution
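A numerical check of this contrast on the earlier example (Min y, s.t. y ≥ 0), using a brute-force grid search as the unconstrained solver (a sketch; the helper name is ours):

```python
def argmin_on_grid(F, lo=-2.0, hi=2.0, n=40001):
    # Brute-force 1-D minimization on a uniform grid.
    best_y, best_v = lo, F(lo)
    for i in range(1, n):
        y = lo + (hi - lo) * i / (n - 1)
        v = F(y)
        if v < best_v:
            best_y, best_v = y, v
    return best_y

c = 10.0
# l1 penalty: F(y,c) = y + c*max(0, -y); exact for any c > 1
y_l1 = argmin_on_grid(lambda y: y + c * max(0.0, -y))
# squared penalty: F(y,c) = y + c*max(0, -y)^2; minimizer is -1/(2c),
# infeasible for every finite c, approaching 0 only as c grows
y_sq = argmin_on_grid(lambda y: y + c * max(0.0, -y) ** 2)
```

With c = 10 the l1 minimizer sits exactly at the constrained optimum y* = 0, while the squared-penalty minimizer stays at y = −1/(2c) = −0.05.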
A sequential strategy
• When addressing a constrained nonlinear
optimization problem by penalty methods,
the multiplier c should be started at a
relatively low value > 0 and increased as the
computation proceeds. [Ch 14, Rardin 98]
Sequential Unconstrained Penalty
Method
Step 0: Initialization. Choose penalty form, initial c > 0 relatively small, starting x0, and an increase rate β > 1. Set t = 0
Step 1: Unconstrained optimization. Beginning from xt, solve penalty optimization problem F(x,c) to produce xt+1
Step 2: Stopping. If xt+1 is feasible or sufficiently close to feasible, stop and output xt+1
Step 3: Increase. Enlarge the penalty multiplier as
c ← βc
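The four steps can be sketched on a toy problem, min x² s.t. x = 1 with a squared penalty, where Step 1 has the closed form x = c/(1 + c) (a sketch under that assumption, not a general solver):

```python
def sequential_penalty(c=0.25, beta=4.0, tol=1e-6, max_iter=100):
    # Toy problem: min x^2  s.t.  x = 1, with squared penalty
    #   F(x, c) = x^2 + c*(x - 1)^2, minimized at x = c/(1 + c).
    x = 0.0                        # Step 0: initial c, beta > 1, starting point
    for _ in range(max_iter):
        x = c / (1.0 + c)          # Step 1: unconstrained minimization (closed form here)
        if abs(x - 1.0) <= tol:    # Step 2: stop when sufficiently close to feasible
            return x, c
        c *= beta                  # Step 3: enlarge the penalty multiplier, c <- beta*c
    return x, c
```

The constraint violation at each step is 1/(1 + c), so the iterates approach x* = 1 only as c grows, illustrating why the squared penalty needs c → ∞.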
A Service Desk Design Example
• Min –(2x + 2y + 2)
subject to x²/25 + y²/16 – 1 ≤ 0
Iteration trajectory
t c (β=4) xt+1 g1 g2 g3
0 1/4 (9.69,6.20) 5.160 0 0
1 1 (6.63, 4.24) 1.885 0 0
2 4 (4.98,3.18) 0.627 0 0
3 16 (4.22,2.74) 0.185 0 0.01
4 64 (3.80,2.74) 0.051 0 0.01
5 256 (3.67,2.74) 0.013 0 0.01
6 1024 (3.63,2.75) 0.003 0 0
7 4096 (3.63,2.75) 0.001 0 0
Problems of penalty methods
• Global unconstrained optimization (expensive) is needed
– Only then does the result guarantee global optimality
• A single penalty term can easily cause the search to get stuck
Augmented Lagrangian Function
• A marriage of penalty and Lagrangian
L(x, λ, c) = f(x) + λ'h(x) + (c/2)||h(x)||²
• Two Mechanisms for the marriage to work
– Taking λ close to λ*
• From the KKT conditions, x* can be a local minimum
of L(x, λ*, c), assuming x* satisfies the 2nd-order conditions
– Taking c very large
• Squared penalty
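As a sketch in Python (the function name is ours; f and h are supplied by the caller, with h returning the vector of equality residuals), the augmented Lagrangian for equality constraints can be evaluated as:

```python
def augmented_lagrangian(f, h, x, lam, c):
    # L(x, lam, c) = f(x) + lam' h(x) + (c/2) * ||h(x)||^2
    hx = h(x)                                      # equality residuals h_i(x)
    return (f(x)
            + sum(l * v for l, v in zip(lam, hx))  # lam' h(x)
            + 0.5 * c * sum(v * v for v in hx))    # (c/2) ||h(x)||^2
```

For example, with f(x) = x₁², h(x) = [x₁ − 1], x = [2], λ = [3], c = 4, the three terms are 4, 3, and 2, giving L = 9.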
An Example
Consider: case 1) Taking λ close to λ*
case 2) Taking c very large
Illustrations
Global Convergence of the Penalty
Approach
• Assuming {λk} is bounded and 0 < ck < ck+1 with
ck → +∞, every limit point of {xk} is a
global optimum of the original problem
Proof
• Lck(xk, λk) ≤ Lck(x, λk), for any k and any x ∈ X
• Lck(x*, λk) = f(x*), since h(x*) = 0
• So, Lck(xk, λk) ≤ f(x*), for any k
• Taking k → ∞, we have
– limk→∞ f(xk) + λk'h(xk) + (ck/2)||h(xk)||² ≤ f(x*)
• Since ck → +∞ and {λk} is bounded, we must have ||h(xk)|| → 0
• Since every limit point is feasible, with objective value at most f(x*), it is optimal
Practical behavior: ill-conditioning
• Extensive practical experience shows that the penalty method is fairly reliable
• When it fails, it is often due to the difficulty of minimizing Lck(x, λk)
• The difficulty depends on the ratio of the largest (M) to the smallest (m) eigenvalue of the Hessian of Lck
– For gradient descent: f(xk+1) – f* ≤ ((M–m)/(M+m))² (f(xk) – f*)
• When ck is large, this ratio tends to be large
– Re-visit the example
• Trade-off in increasing ck
How to alleviate the ill-conditioning
problem
• Ill-conditioning is caused by too large a ck
• But global convergence requires ck → ∞
• It would be desirable if the algorithm could work
even when ck is bounded
• Setting λk close to λ* can achieve that
• But how to estimate λ*?
A surprising by-product
• Since we minimize Lck(x, λk) over x for each k, usually
we have ||∇x Lck(xk, λk)|| = 0
Or, in practice, ||∇x Lck(xk, λk)|| ≤ εk, εk → 0
• Now, let's examine ∇x Lck(xk, λk):
∇x Lck(xk, λk) = ∇f(xk) + ∇h(xk)(λk + ck h(xk))
= ∇f(xk) + ∇h(xk) αk, where αk := λk + ck h(xk)
Taking k → ∞: 0 = ∇f(x*) + ∇h(x*) α*
• Conclusion: if {xk} converges to x*,
– limk→∞ λk + ck h(xk) = λ*
The Multiplier Method
• λk+1 = λk + ck h(xk)
• λk will approach λ* during search
• The method that increases ck and updates λk
using the above formula is called the
multiplier method
– A popular and reliable method
Example
• For given λk, ck, what is xk?
• Express λk+1 in terms of λk and ck
(knowing that λk+1 = λk + ck h(xk))
• Do λk, xk converge to λ*, x*?
• What is the effect of ck? Does ck need to go to ∞ in
order to converge to x*? Any requirements on ck?
• Minimize f(x) = ½(–x1² + x2²)
Subject to x1 = 1
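For this example the inner minimization has a closed form: setting ∇x L = 0 gives x1 = (c – λ)/(c – 1) and x2 = 0 (valid when c > 1, so the stationary point is a minimum). A minimal sketch of the multiplier method on it, with fixed c (the function name is ours):

```python
# Multiplier method on: minimize 1/2(-x1^2 + x2^2)  s.t.  x1 = 1
# Augmented Lagrangian: L = 1/2(-x1^2 + x2^2) + lam*(x1 - 1) + (c/2)*(x1 - 1)^2
def multiplier_method(lam=0.0, c=4.0, iters=30):
    for _ in range(iters):
        x1 = (c - lam) / (c - 1.0)  # argmin of L over x (requires c > 1); x2 = 0
        lam = lam + c * (x1 - 1.0)  # update lam^{k+1} = lam^k + c*h(x^k)
    return x1, lam
```

One can check that λk+1 – 1 = –(λk – 1)/(c – 1), so the multipliers contract toward λ* = 1 exactly when c > 2: a fixed c above this threshold suffices, matching the conclusion below that ck need not go to ∞.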
Conclusion
• ck does not need to go to ∞, but may need to
be above a certain threshold, for the Lagrange
multiplier algorithm to converge.
Guidelines for Parameter Choices
• Set λ0 close to λ*
• ck should eventually become sufficiently large (over some threshold)
• c0 should not be too large to avoid ill-conditioning
• ck should not be increased too fast to the point of causing ill-conditioning early
• ck should not be increased too slowly to the extent that λ has poor convergence to λ*
• use xk as the starting point for iteration k+1
Guidelines for Parameter Choices
(cont’d)
• A popular scheme:
– moderate c0, increase by ck+1 = βck, β>1
– If the unconstrained method is powerful in
dealing with ill-conditioning (e.g. Newton’s),
may set a large β (e.g. [5, 10])
• Another scheme:
– Increase ck only when the constraint violation has
not decreased by a factor γ (e.g. 0.25).
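The second scheme can be sketched as a small update rule (a hypothetical helper; `violation` stands for ||h(xk)|| at the current and previous iterations):

```python
def update_penalty(c, violation, prev_violation, beta=4.0, gamma=0.25):
    # Increase c only when the constraint violation failed to
    # shrink by a factor gamma since the last iteration.
    if violation > gamma * prev_violation:
        return beta * c
    return c
```

This keeps c as small as the convergence of λk allows, avoiding unnecessary ill-conditioning when the multiplier updates alone are making good progress.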