Convex Optimization & Lagrange Duality
Chee Wei Tan
CS6491 Topics in Optimization and its Applications to Computer Science
Outline
• Convex optimization
• Optimality condition
• Lagrange duality
• KKT optimality condition
• Sensitivity analysis
Convex Optimization
A convex optimization problem with variables x:

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, 2, ..., m
                ai^T x = bi,  i = 1, 2, ..., p

where f0, f1, ..., fm are convex functions.

• Minimize convex objective function (or maximize concave objective function)
• Upper bound inequality constraints on convex functions (⇒ constraint set is convex)
• Equality constraints must be affine
Convex Optimization
• Epigraph form:

    minimize    t
    subject to  f0(x) − t ≤ 0
                fi(x) ≤ 0,  i = 1, 2, ..., m
                ai^T x = bi,  i = 1, 2, ..., p

• Can you reformulate the Illumination Problem in Lecture 1?
• Can you reformulate the following problem (not in convex optimization form)?

    minimize    x1^2 + x2^2
    subject to  x1/(1 + x2^2) ≤ 0
                (x1 + x2)^2 = 0

Now transformed into a convex optimization problem:

    minimize    x1^2 + x2^2
    subject to  x1 ≤ 0
                x1 + x2 = 0
Locally Optimal ⇒ Globally Optimal

Given x is locally optimal for a convex optimization problem, i.e., x is feasible and for some R > 0,

    f0(x) = inf{ f0(z) | z is feasible, ‖z − x‖2 ≤ R }

Suppose x is not globally optimal, i.e., there is a feasible y such that f0(y) < f0(x)

Then ‖y − x‖2 > R (otherwise y would contradict local optimality directly), so we can construct a point z = (1 − θ)x + θy where θ = R/(2‖y − x‖2), giving ‖z − x‖2 = R/2 < R. By convexity of the feasible set, z is feasible. By convexity of f0, we have

    f0(z) ≤ (1 − θ)f0(x) + θf0(y) < f0(x)

which contradicts local optimality of x

Therefore, there exists no feasible y such that f0(y) < f0(x)
Optimality Condition for Differentiable f0

x is optimal for a convex optimization problem iff x is feasible and for all feasible y:

    ∇f0(x)^T (y − x) ≥ 0

−∇f0(x) defines a supporting hyperplane to the feasible set at x

Unconstrained convex optimization: the condition reduces to:

    ∇f0(x) = 0

Proof: take y = x − t∇f0(x) where t ∈ R+. For small enough t, y is feasible, so ∇f0(x)^T (y − x) = −t‖∇f0(x)‖2^2 ≥ 0. Thus ∇f0(x) = 0
Unconstrained Quadratic Optimization
Minimize f0(x) = (1/2)x^T Px + q^T x + r

P is positive semidefinite, so this is a convex optimization problem

x minimizes f0 iff it satisfies the linear equation:

    ∇f0(x) = Px + q = 0

• If q ∉ R(P), there is no solution: f0 is unbounded below
• If q ∈ R(P) and P ≻ 0, there is a unique minimizer x* = −P^{-1}q
• If q ∈ R(P) and P is singular, the set of minimizers is −P†q + N(P)
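A minimal numerical sketch of the positive definite case (the matrix P and vector q below are illustrative values, not from the lecture): the optimality condition Px + q = 0 is just a linear system.

```python
import numpy as np

# Illustrative data: a symmetric positive definite P.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
q = np.array([1.0, -1.0])
r = 3.0

def f0(x):
    # Quadratic objective (1/2) x^T P x + q^T x + r
    return 0.5 * x @ P @ x + q @ x + r

# Unique minimizer x* = -P^{-1} q, obtained by solving the linear system.
x_star = np.linalg.solve(P, -q)

grad_at_min = P @ x_star + q   # should be (numerically) zero
```

Since P ≻ 0 here, the gradient vanishes only at x*, and any perturbation away from x* increases f0.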
Equality Constrained Convex Optimization
    minimize    f0(x)
    subject to  Ax = b

Assume a nonempty feasible set. Since ∇f0(x)^T (y − x) ≥ 0, ∀y : Ay = b, and every feasible y can be written as y = x + v for some v ∈ N(A), the optimality condition is:

    ∇f0(x)^T v ≥ 0,  ∀v ∈ N(A)

Since N(A) is a subspace (it contains both v and −v), this implies ∇f0(x)^T v = 0, i.e., ∇f0(x) is orthogonal to N(A). Thus ∇f0(x) ∈ R(A^T), i.e., there exists ν such that

    ∇f0(x) + A^T ν = 0

Conclusion: x is optimal iff Ax = b and ∃ν s.t. ∇f0(x) + A^T ν = 0
Duality Mentality
Bound or solve an optimization problem via a different optimization problem!

We'll develop the basic Lagrange duality theory for a general optimization problem, then specialize it to convex optimization
Lagrange Dual Function
An optimization problem in standard form:

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, 2, ..., m
                hi(x) = 0,  i = 1, 2, ..., p

Variables: x ∈ R^n. Assume nonempty feasible set

Optimal value: p*. Optimizer: x*

Idea: augment the objective with a weighted sum of the constraints

Lagrangian: L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x)

Lagrange multipliers (dual variables): λ ⪰ 0, ν

Lagrange dual function: g(λ, ν) = inf_x L(x, λ, ν)
Lower Bound on Optimal Value
Claim: g(λ, ν) ≤ p*, ∀λ ⪰ 0, ∀ν

Proof: Consider any feasible x̃:

    L(x̃, λ, ν) = f0(x̃) + Σ_{i=1}^m λi fi(x̃) + Σ_{i=1}^p νi hi(x̃) ≤ f0(x̃)

since fi(x̃) ≤ 0, λi ≥ 0, and hi(x̃) = 0

Hence, g(λ, ν) ≤ L(x̃, λ, ν) ≤ f0(x̃) for all feasible x̃

Therefore, g(λ, ν) ≤ p*
Lagrange Dual Function and Conjugate Function

• Lagrange dual function: g(λ, ν)
• Conjugate function: f*(y) = sup_{x ∈ dom f} (y^T x − f(x))

Consider linearly constrained optimization:

    minimize    f0(x)
    subject to  Ax ⪯ b
                Cx = d

    g(λ, ν) = inf_x ( f0(x) + λ^T (Ax − b) + ν^T (Cx − d) )
            = −b^T λ − d^T ν + inf_x ( f0(x) + (A^T λ + C^T ν)^T x )
            = −b^T λ − d^T ν − f0*(−A^T λ − C^T ν)
Example
We'll use the simplest version of entropy maximization as our example for the rest of this lecture on duality. Entropy maximization is an important basic problem in information theory:

    minimize    f0(x) = Σ_{i=1}^n xi log xi
    subject to  Ax ⪯ b
                1^T x = 1

Since the conjugate function of u log u is e^{y−1}, and the conjugate of a sum of terms in independent variables is the sum of the conjugates, we have

    f0*(y) = Σ_{i=1}^n e^{yi − 1}
Therefore, the dual function of entropy maximization is

    g(λ, ν) = −b^T λ − ν − e^{−ν−1} Σ_{i=1}^n e^{−ai^T λ}

where ai are the columns of A
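As a sanity check (a numerical sketch, not part of the lecture), the conjugate f*(y) = sup_u (yu − u log u) = e^{y−1} used above can be verified by brute-force maximization over a grid:

```python
import numpy as np

# Brute-force the conjugate of f(u) = u*log(u) over u > 0 and compare
# with the closed form e^{y-1} (the supremum is attained at u = e^{y-1}).
u = np.linspace(1e-6, 50.0, 2_000_000)
for y in [-1.0, 0.0, 1.0, 2.0]:
    numeric = np.max(y * u - u * np.log(u))
    exact = np.exp(y - 1.0)
    assert abs(numeric - exact) < 1e-3
```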
Lagrange Dual Problem
The lower bound from the Lagrange dual function depends on (λ, ν). What's the best lower bound that can be obtained from the Lagrange dual function?

    maximize    g(λ, ν)
    subject to  λ ⪰ 0

This is the Lagrange dual problem with dual variables (λ, ν)

Always a convex optimization problem! (The dual objective is always a concave function, since it is the infimum of a family of affine functions of (λ, ν))

Denote the optimal value of the Lagrange dual problem by d*
Weak Duality
What's the relationship between d* and p*?

Weak duality always holds (even if the primal problem is not convex):

    d* ≤ p*

Optimal duality gap:

    p* − d*

Efficient generation of lower bounds through the (convex) dual problem
Strong Duality
Strong duality (zero optimal duality gap):

    d* = p*

If strong duality holds, solving the dual is 'equivalent' to solving the primal. But strong duality does not always hold

Convexity and constraint qualifications ⇒ strong duality

A simple constraint qualification is Slater's condition: there exists a strictly feasible primal point, i.e., an x with fi(x) < 0 for all non-affine fi

Another reason why convex optimization is 'easy'
Example: Entropy Maximization
Primal optimization problem (variables x):

    minimize    f0(x) = Σ_{i=1}^n xi log xi
    subject to  Ax ⪯ b, 1^T x = 1

Dual optimization problem (variables λ, ν):

    maximize    −b^T λ − ν − e^{−ν−1} Σ_{i=1}^n e^{−ai^T λ}
    subject to  λ ⪰ 0

Analytically maximizing over the unconstrained ν ⇒ simplified dual optimization problem (variables λ), and strong duality holds:

    maximize    −b^T λ − log Σ_{i=1}^n exp(−ai^T λ)
    subject to  λ ⪰ 0
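The elimination of ν can be checked numerically (an illustrative sketch; S below stands in for Σi exp(−ai^T λ) at a fixed λ, and the value 3.7 is arbitrary): maximizing −ν − e^{−ν−1}S over ν should give −log S.

```python
import numpy as np

# max over nu of -nu - exp(-nu-1)*S is attained at nu = log(S) - 1,
# with optimal value -log(S); check by grid search.
S = 3.7
nu = np.linspace(-5.0, 5.0, 1_000_001)
numeric = np.max(-nu - np.exp(-nu - 1.0) * S)
assert abs(numeric - (-np.log(S))) < 1e-6
```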
Saddle Point Interpretation
Assume no equality constraints. We can express the primal optimal value as

    p* = inf_x sup_{λ⪰0} L(x, λ)

(for fixed x, sup_{λ⪰0} L(x, λ) equals f0(x) if x is feasible and +∞ otherwise)

By definition of the dual optimal value:

    d* = sup_{λ⪰0} inf_x L(x, λ)

Weak duality (max-min inequality):

    sup_{λ⪰0} inf_x L(x, λ) ≤ inf_x sup_{λ⪰0} L(x, λ)

Strong duality (saddle point property):

    sup_{λ⪰0} inf_x L(x, λ) = inf_x sup_{λ⪰0} L(x, λ)
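A toy saddle-point check (the problem below is my own illustrative example, not the lecture's): for minimize x^2 subject to 1 − x ≤ 0, both sides of the max-min identity equal p* = 1 (attained at x* = 1, λ* = 2).

```python
import numpy as np

# L(x, lam) = x^2 + lam*(1 - x) on a grid; compare
# sup_lam inf_x L with inf_x sup_lam L.
x = np.linspace(-3.0, 3.0, 1201)
lam = np.linspace(0.0, 6.0, 1201)
X, Lam = np.meshgrid(x, lam)          # rows index lam, columns index x
L = X**2 + Lam * (1.0 - X)

d_star = np.max(np.min(L, axis=1))    # sup_lam inf_x L(x, lam)
p_star = np.min(np.max(L, axis=0))    # inf_x sup_lam L(x, lam)
```

Here the saddle point property holds because the problem is convex and strictly feasible, so both grid values land at 1.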
Complementary Slackness

Assume strong duality holds:

    f0(x*) = g(λ*, ν*)
           = inf_x ( f0(x) + Σ_{i=1}^m λi* fi(x) + Σ_{i=1}^p νi* hi(x) )
           ≤ f0(x*) + Σ_{i=1}^m λi* fi(x*) + Σ_{i=1}^p νi* hi(x*)
           ≤ f0(x*)

So the two inequalities must hold with equality. This implies:

    λi* fi(x*) = 0,  i = 1, 2, ..., m

Complementary Slackness Property:

    λi* > 0 ⇒ fi(x*) = 0
    fi(x*) < 0 ⇒ λi* = 0
KKT Optimality Conditions
Since x* minimizes L(x, λ*, ν*) over x, we have

    ∇f0(x*) + Σ_{i=1}^m λi* ∇fi(x*) + Σ_{i=1}^p νi* ∇hi(x*) = 0

Karush-Kuhn-Tucker optimality conditions:

    fi(x*) ≤ 0,  hi(x*) = 0,  λ* ⪰ 0
    λi* fi(x*) = 0
    ∇f0(x*) + Σ_{i=1}^m λi* ∇fi(x*) + Σ_{i=1}^p νi* ∇hi(x*) = 0
KKT Optimality Conditions
• For any optimization problem (with differentiable objective and constraint functions) with strong duality, the KKT conditions are a necessary condition for primal-dual optimality
• For a convex optimization problem (with differentiable objective and constraint functions) satisfying Slater's condition, the KKT conditions are also a sufficient condition for primal-dual optimality (useful for theoretical and numerical purposes)
Example: Entropy Maximization
Primal optimization problem (variables x):

    minimize    f0(x) = Σ_{i=1}^n xi log xi
    subject to  Ax ⪯ b, 1^T x = 1

Dual optimization problem (variables λ, ν):

    maximize    −b^T λ − ν − e^{−ν−1} Σ_{i=1}^n e^{−ai^T λ}
    subject to  λ ⪰ 0

Having solved the dual problem, recover the optimal primal variables as the minimizer of

    L(x, λ*, ν*) = Σ_{i=1}^n xi log xi + λ*^T (Ax − b) + ν*(1^T x − 1)

i.e., xi* = 1/exp(ai^T λ* + ν* + 1)

If the x* above is primal feasible, it is primal optimal. Otherwise, the primal optimum is not attained
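A quick sanity check of the recovery formula in the special case with no inequality constraints, so λ* = 0 (this reduction is my own illustration, not from the lecture): stationarity gives xi* = exp(−(ν* + 1)), and the normalization 1^T x = 1 forces ν* = log n − 1, i.e., the uniform distribution with optimal value −log n.

```python
import numpy as np

# With lambda* = 0, the recovery formula x_i* = 1/exp(nu* + 1)
# combined with sum(x) = 1 gives nu* = log(n) - 1 and x_i* = 1/n.
n = 4
nu_star = np.log(n) - 1.0
x_star = np.exp(-(nu_star + 1.0)) * np.ones(n)

entropy_value = np.sum(x_star * np.log(x_star))   # f0(x*) = -log(n)
```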
Example: Maximizing Channel Capacity as Water-filling

    maximize    Σ_{i=1}^n log(1 + xi/αi)
    subject to  x ⪰ 0, 1^T x = 1

Variables: x (powers). Constants: αi > 0 (channel noise)

KKT conditions:

    x* ⪰ 0,  1^T x* = 1,  λ* ⪰ 0
    λi* xi* = 0,  −1/(αi + xi*) − λi* + ν* = 0

Since the λ* act as slack variables, the conditions reduce to

    x* ⪰ 0,  1^T x* = 1
    xi*(ν* − 1/(αi + xi*)) = 0,  ν* ≥ 1/(αi + xi*)

If ν* < 1/αi, then xi* > 0, so xi* = 1/ν* − αi. Otherwise, xi* = 0

Thus, xi* = [1/ν* − αi]+, where ν* is such that Σi xi* = 1

• Draw a picture to interpret this optimal solution
• Design an algorithm to compute this optimal solution
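One possible algorithm (a sketch of my own, not the lecture's prescribed method) bisects on the water level w = 1/ν*, since Σi [w − αi]+ is nondecreasing in w:

```python
import numpy as np

def water_filling(alpha, total=1.0, iters=100):
    """Return x with x_i = [w - alpha_i]+ and sum(x) = total,
    found by bisection on the water level w = 1/nu*."""
    alpha = np.asarray(alpha, dtype=float)
    lo, hi = 0.0, alpha.min() + total   # at w = hi the sum is already >= total
    for _ in range(iters):
        w = 0.5 * (lo + hi)
        if np.maximum(w - alpha, 0.0).sum() > total:
            hi = w
        else:
            lo = w
    return np.maximum(0.5 * (lo + hi) - alpha, 0.0)

x = water_filling([0.1, 0.5, 1.0])
```

For α = (0.1, 0.5, 1.0) the water level settles at w = 0.8, so the noisiest channel receives no power, matching the [1/ν* − αi]+ picture.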
Global Sensitivity Analysis
Perturbed optimization problem:

    minimize    f0(x)
    subject to  fi(x) ≤ ui,  i = 1, 2, ..., m
                hi(x) = vi,  i = 1, 2, ..., p

Optimal value p*(u, v) as a function of the parameters (u, v)

Assume strong duality holds and that the dual optimum is attained. For any x feasible for the perturbed problem:

    p*(0, 0) = g(λ*, ν*) ≤ f0(x) + Σi λi* fi(x) + Σi νi* hi(x)
             ≤ f0(x) + λ*^T u + ν*^T v
Minimizing over all x feasible for the perturbed problem gives

    p*(u, v) ≥ p*(0, 0) − λ*^T u − ν*^T v

• If λi* is large, tightening the ith constraint (ui < 0) will increase the optimal value greatly
• If λi* is small, loosening the ith constraint (ui > 0) will reduce the optimal value only slightly
Local Sensitivity Analysis
Assume that p*(u, v) is differentiable at (0, 0). Then:

    λi* = −∂p*(0, 0)/∂ui,  νi* = −∂p*(0, 0)/∂vi

Shadow price interpretation of the Lagrange dual variables

A small λi* means that tightening or loosening the ith constraint will not change the optimal value by much
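The shadow-price identity can be checked on a toy perturbed problem (my own example, not from the lecture): minimize x^2 subject to 1 − x ≤ u has p*(u) = (1 − u)^2 for u < 1 and multiplier λ* = 2, so −∂p*/∂u at u = 0 recovers λ*.

```python
# p*(u) for the toy problem and its numerical derivative at u = 0.
def p_star(u):
    # while the constraint is active, the optimal point is x*(u) = 1 - u
    return (1.0 - u) ** 2

eps = 1e-6
deriv = (p_star(eps) - p_star(-eps)) / (2.0 * eps)   # central difference
lam_star = -deriv
```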
Summary
• Convexity mentality. Convex optimization is 'nice' for several reasons: a local optimum is a global optimum, the optimal duality gap is zero (under mild conditions), and the KKT optimality conditions are necessary and sufficient
• Duality mentality. We can always bound the primal through the dual, and sometimes indirectly solve the primal through the dual
• Primal-dual: where is the optimum, and how sensitive is it to perturbations in the problem parameters?

Reading assignment: Sections 4.1-4.2 and 5.1-5.7 of the textbook.