Convex Optimization, Lecture 13
Today: Interior-Point (continued)
• Central Path method for SDP
• Feasibility and Phase I Methods
• From Central Path to Primal/Dual
Central Path Log Barrier Method

Access to:
• 2nd order oracle for f0, fi
• Explicit access to A, b
• Strictly feasible point x(0)

Assumptions:
• f0 convex and self-concordant
• fi convex quadratic (or linear)
• x(0) strictly feasible with fi(x(0)) < 0

Overall #Newton iterations: N = O(√m·(log 1/ε + log log 1/δ))
Overall runtime: ≈ √m·(cost per Newton step, including the ∇²-oracle evaluations)·log 1/ε

Init: Feasible x(0) and some t(0)
Do: Solve t(k)-barrier problem using Newton starting at x(k)
    x(k+1) ← x∗(t(k))
    Stop if m/t ≤ ε
    t(k+1) ← µ·t(k) (for some parameter µ > 1)
(Pictured: Arkadi Nemirovski, Yuri Nesterov, John von Neumann, Narendra Karmarkar)
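As a sanity check, the loop above can be sketched in a few lines of Python on a toy one-dimensional problem, min x² s.t. x ≥ 1 (so f0(x) = x², f1(x) = 1 − x, m = 1); the inner Newton solver and the constants t(0), µ here are illustrative choices, not from the lecture:

```python
def solve_barrier_1d(t, x, max_iter=50):
    """Damped Newton on phi(x) = x**2 - (1/t) * log(x - 1),
    the t-barrier objective for: min x**2  s.t.  f1(x) = 1 - x <= 0."""
    for _ in range(max_iter):
        g = 2.0 * x - 1.0 / (t * (x - 1.0))   # phi'(x)
        h = 2.0 + 1.0 / (t * (x - 1.0) ** 2)  # phi''(x) > 0
        step = -g / h
        while x + step <= 1.0:                # backtrack to stay strictly feasible
            step *= 0.5
        x += step
        if abs(step) < 1e-12:                 # Newton has converged
            break
    return x

def central_path(x0=2.0, t0=1.0, mu=10.0, eps=1e-8):
    """Outer loop: solve the t-barrier problem, then increase t by mu
    until the duality-gap bound m/t (here m = 1) drops below eps."""
    t, x = t0, x0
    while 1.0 / t > eps:
        x = solve_barrier_1d(t, x)  # warm start at the previous x*(t)
        t *= mu
    return x

x_star = central_path()  # approaches the true optimum x = 1
```

Note the warm start: each barrier problem is solved starting from the previous central point, which is what keeps the total Newton iteration count low.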
Optimizing with Matrix Inequalities

Fi : Rn → S^{ki} (symmetric ki × ki matrices)

min_{x∈Rn}  f0(x)
s.t.  Fi(x) ⪯ 0
      Ax = b

• Central path given by solutions to:

min_{x∈Rn}  f0(x) − (1/t) ∑i log det(−Fi(x))
s.t.  Ax = b

Optimum x∗(t), dual optimum ν∗(t); x∗(t) is strictly feasible.

Lagrangian:
L(x, Λ, ν) = f0(x) + ∑i ⟨Λi, Fi(x)⟩ + ⟨ν, Ax − b⟩

Barrier Lagrangian:
L(t)(x, ν) = f0(x) − (1/t) ∑i log det(−Fi(x)) + ⟨ν, Ax − b⟩

Optimality of (x∗(t), ν∗(t)):

0 = ∇xL(t)(x∗(t), ν∗(t)) = ∇f0(x∗(t)) + ∑i ⟨−(1/t) Fi(x∗(t))⁻¹, ∇Fi(x∗(t))⟩ + Aᵀν∗(t)

Define Λi∗(t) = −(1/t) Fi(x∗(t))⁻¹ ≻ 0. Then

∇xL(x∗(t), Λ∗(t), ν∗(t)) = ∇f0(x∗(t)) + ∑i ⟨Λi∗(t), ∇Fi(x∗(t))⟩ + Aᵀν∗(t) = 0,

so Λ∗(t), ν∗(t) is dual (strictly) feasible.

How suboptimal is x∗(t)?

g(Λ∗(t), ν∗(t)) = inf_x L(x, Λ∗(t), ν∗(t)) = L(x∗(t), Λ∗(t), ν∗(t))
  = f0(x∗(t)) − (1/t) ∑i ⟨Fi(x∗(t))⁻¹, Fi(x∗(t))⟩ + ⟨ν∗(t), Ax∗(t) − b⟩
  = f0(x∗(t)) − ∑i ki/t,

using ⟨Fi⁻¹, Fi⟩ = tr(I_{ki}) = ki and Ax∗(t) = b. Hence

f0(x∗(t)) − g(Λ∗(t), ν∗(t)) = (∑_{i=1}^m ki)/t
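The ∑i ki/t gap formula rests on the step ⟨Fi(x)⁻¹, Fi(x)⟩ = tr(I) = ki. A quick numpy check, with a random negative definite matrix standing in for Fi(x∗(t)) and arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
k, t = 5, 100.0

# A random strictly negative definite matrix, standing in for F_i(x*(t)) ≺ 0.
M = rng.standard_normal((k, k))
F = -(M @ M.T + np.eye(k))

# Dual matrix from the central-path optimality condition.
Lam = -np.linalg.inv(F) / t          # Λ_i*(t) = -(1/t) F_i(x*(t))^{-1}

assert np.all(np.linalg.eigvalsh(Lam) > 0)   # Λ_i*(t) ≻ 0
gap_term = -np.trace(Lam @ F)        # -⟨Λ, F⟩ = tr(F^{-1} F)/t = k/t
print(gap_term, k / t)               # both ≈ 0.05
```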
Optimizing with Matrix Inequalities

• An optimum x∗(t) for the t-barrier problem is (∑i ki)/t suboptimal for the constrained problem
• Central Path method:

min_{x∈Rn}  f0(x)
s.t.  Fi(x) ⪯ 0 ∈ S^{ki}
      Ax = b

min_{x∈Rn}  f0(x) − (1/t) ∑i log det(−Fi(x))
s.t.  Ax = b

Init: Feasible x(0) and some t(0)
Do: Solve t(k)-barrier problem using Newton starting at x(k)
    x(k+1) ← x∗(t(k))
    Stop if (∑i ki)/t ≤ ε
    t(k+1) ← µ·t(k) (for some parameter µ > 1)
Central Path Method for SDP

Access to:
• 2nd order oracle for f0, Fi
• Explicit access to A, b
• Strictly feasible point x(0)

Assumptions:
• f0 : Rn → R convex and self-concordant
• Fi : Rn → S^{ki} convex quadratic (or linear)
• x(0) strictly feasible with Fi(x(0)) ≺ 0

Overall #Newton iterations: N = O(√(∑i ki)·(log 1/ε + log log 1/δ))
Overall runtime: ≈ √(∑i ki)·(cost per Newton step, including the ∇²-oracle evaluations)·log 1/ε

Init: Feasible x(0) and some t(0)
Do: Solve t(k)-barrier problem using Newton starting at x(k)
    x(k+1) ← x∗(t(k))
    Stop if (∑i ki)/t ≤ ε
    t(k+1) ← µ·t(k) (for some parameter µ > 1)
Feasibility and Phase I Methods
Recall that in the Log Barrier Central Path method we need to start with a (strictly) feasible x(0). Two phases:
• Phase I : Solve feasibility problem
• Phase II : Use solution as starting point for barrier method
We can convert feasibility to an optimization problem:
(P)  Find x ∈ Rn
     s.t. fi(x) ≤ 0
          Ax = b

⇒

(P̄)  min_{x∈Rn, s∈R} s
     s.t. fi(x) ≤ s
          Ax = b
This optimization problem is always feasible: we can start from a solution of Ax(0) = b and set s = maxi fi(x(0)) (or slightly larger, so that (x(0), s) is strictly feasible).
Then we can apply the log barrier method to solve the optimization problem.
min_{x∈Rn, s∈R}  s
s.t.  fi(x) ≤ s
      Ax = b
How well do we need to optimize?
• If we find a P̄-feasible (x, s) with s < 0 ⇒ x is strictly P-feasible
• If we get an ε-suboptimal solution to P̄ with s > ε ⇒ the optimal value of P̄ is positive, so P is infeasible
• Otherwise, there could be a solution that is feasible but not strictly so
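For linear inequality constraints the starting-point construction above is a few lines of numpy (all problem data below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 4, 2, 6

# Hypothetical problem data: equality constraints Ax = b and
# linear inequalities f_i(x) = c_i^T x - d_i <= 0.
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
C = rng.standard_normal((m, n))
d = rng.standard_normal(m)

# Any solution of Ax = b serves as a phase-I starting point ...
x0 = np.linalg.lstsq(A, b, rcond=None)[0]
# ... and s0 is chosen so that (x0, s0) is strictly feasible for P-bar.
f = C @ x0 - d
s0 = f.max() + 1.0

print(np.allclose(A @ x0, b), np.all(f < s0))  # True True
```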
Can convert feasibility to optimization with matrix constraints too:

Find x ∈ Rn
s.t. fi(x) ⪯ 0
     Ax = b

⇒

min_{x∈Rn, s∈R} s
s.t. fi(x) ⪯ sI
     Ax = b
Finally, note that we can also reduce optimization to feasibility:
min f0(x)
s.t. fi(x) ≤ 0
     Ax = b

⇒

(P^s)  Find x
       s.t. fi(x) ≤ 0
            f0(x) ≤ s
            Ax = b
then do bisection search over s.
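Since P^s is feasible exactly when s is at least the optimal value p∗ of the original problem, feasibility is monotone in s and the search can be a bisection. A sketch with a stand-in oracle (a real oracle would run phase I on P^s):

```python
def bisect_over_s(feasible, lo, hi, tol=1e-6):
    """Bisection for p* = inf{ s : P^s feasible }, given a feasibility
    oracle; requires feasible(lo) == False and feasible(hi) == True."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid          # P^mid feasible: the optimum is at most mid
        else:
            lo = mid          # P^mid infeasible: the optimum exceeds mid
    return hi

# Stand-in oracle: pretend the optimal value is p* = 2, so P^s is
# feasible exactly when s >= 2.
p_star = bisect_over_s(lambda s: s >= 2.0, 0.0, 10.0)
```

Each oracle call is itself a phase-I solve, which is why reducing optimization to feasibility this way costs a logarithmic factor in 1/tol.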
From Central Path to Primal/Dual
Let us review our approach. We would like to solve the KKT conditions of (P):

(KKT)
Ax = b
fi(x) ≤ 0
λ ≥ 0
∇f0(x) + ∑i λi∇fi(x) + Aᵀν = 0
λifi(x) = 0
At each iteration we consider problem (Pt), i.e., solving:

Ax = b
∇f0(x) + ∑i (−1/(t·fi(x))) ∇fi(x) + Aᵀν = 0
And we do this by Newton: linearize w.r.t. x (and ν) around x(k).
This can be viewed as solving modified KKT:

(KKTt)
Ax = b
fi(x) ≤ 0
λ ≥ 0
∇f0(x) + ∑i λi∇fi(x) + Aᵀν = 0
λifi(x) = −1/t

Solve by:
(i) Eliminate λi = −1/(t·fi(x)), and get a problem in (x, ν)
(ii) Linearize w.r.t. (x, ν) around x(k)
Instead, in P/D we maintain both x(k) and λ(k), and linearize (KKTt) w.r.t. both x and λ around x(k) and λ(k), without first eliminating λ.
Primal-dual method
Define the residuals:
rpri(x) = Ax − b ∈ Rp

rdual(x, λ, ν) = ∇f0(x) + ∑i λi∇fi(x) + Aᵀν ∈ Rn

rcent(t)(x, λ) = (λ1f1(x) + 1/t, …, λmfm(x) + 1/t) ∈ Rm

Jointly:

r(t)(x, λ, ν) = (rpri, rdual, rcent(t)) ∈ Rp+n+m
If x, λ, ν satisfy r(t)(x, λ, ν) = 0 (and fi(x) < 0, λ > 0), then x = x∗(t), λ = λ∗(t), and ν = ν∗(t).
Therefore, at each iteration we approximately solve:

r(t)(x + ∆x, λ + ∆λ, ν + ∆ν) = 0
s.t. fi(x + ∆x) ≤ 0
     λ + ∆λ ≥ 0

This is done by linearizing w.r.t. ∆x, ∆λ (r(t) is affine in ν, so no linearization is needed there): set

r(t)(x, λ, ν + ∆ν) + ∇xr(t)(x, λ, ν)ᵀ∆x + ∇λr(t)(x, λ, ν)ᵀ∆λ = 0

and solve for (∆x, ∆λ, ∆ν).
Boils down to:
[Excerpt from Boyd & Vandenberghe, Convex Optimization, §11 Interior-point methods, p. 610:]
The first block component of rt,

rdual = ∇f0(x) + Df(x)ᵀλ + Aᵀν,

is called the dual residual, and the last block component, rpri = Ax − b, is called the primal residual. The middle block,
rcent = −diag(λ)f(x) − (1/t)1,
is the centrality residual, i.e., the residual for the modified complementarity condition. (Note: the book's rcent is the negative of the lecture's rcent(t) above; the resulting Newton step is the same.)

Now consider the Newton step for solving the nonlinear equations rt(x, λ, ν) = 0, for fixed t (without first eliminating λ, as in §11.3.4), at a point (x, λ, ν) that satisfies f(x) ≺ 0, λ ≻ 0. We will denote the current point and Newton step as
y = (x,λ, ν), ∆y = (∆x, ∆λ, ∆ν),
respectively. The Newton step is characterized by the linear equations
rt(y + ∆y) ≈ rt(y) + Drt(y)∆y = 0,
i.e., ∆y = −Drt(y)⁻¹rt(y). In terms of x, λ, and ν, we have

⎡ ∇²f0(x) + ∑_{i=1}^m λi∇²fi(x)   Df(x)ᵀ        Aᵀ ⎤ ⎡ ∆x ⎤      ⎡ rdual ⎤
⎢ −diag(λ)Df(x)                   −diag(f(x))   0  ⎥ ⎢ ∆λ ⎥ = −  ⎢ rcent ⎥        (11.54)
⎣ A                                0            0  ⎦ ⎣ ∆ν ⎦      ⎣ rpri  ⎦

The primal-dual search direction ∆ypd = (∆xpd, ∆λpd, ∆νpd) is defined as the solution of (11.54).
The primal and dual search directions are coupled, both through the coefficient matrix and the residuals. For example, the primal search direction ∆xpd depends on the current value of the dual variables λ and ν, as well as x. We note also that if x satisfies Ax = b, i.e., the primal feasibility residual rpri is zero, then we have A∆xpd = 0, so ∆xpd defines a (primal) feasible direction: for any s, x + s∆xpd will satisfy A(x + s∆xpd) = b.
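To make (11.54) concrete, here is the system assembled with numpy for a toy instance (f0(x) = ‖x‖²/2 and linear fi, so the ∇²fi terms vanish and Df(x) = C; all data is made up), together with a numerical check of the feasible-direction property just noted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, t = 4, 3, 2, 10.0

A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
C = rng.standard_normal((m, n))           # Df(x) = C for linear f_i

x = np.linalg.lstsq(A, b, rcond=None)[0]  # satisfies Ax = b, so r_pri = 0
d = C @ x + 1.0                           # makes f(x) = Cx - d = -1 < 0
f = C @ x - d
lam = np.ones(m)
nu = np.zeros(p)

r_dual = x + C.T @ lam + A.T @ nu         # grad f0(x) = x here
r_cent = -lam * f - 1.0 / t
r_pri = A @ x - b                         # ~ 0

# Coefficient matrix of (11.54); the top-left block is grad^2 f0 = I here.
KKT = np.block([
    [np.eye(n),         C.T,              A.T],
    [-np.diag(lam) @ C, -np.diag(f),      np.zeros((m, p))],
    [A,                 np.zeros((p, m)), np.zeros((p, p))],
])
dy = np.linalg.solve(KKT, -np.concatenate([r_dual, r_cent, r_pri]))
dx = dy[:n]

print(np.linalg.norm(A @ dx))  # ~ 0: dx is a primal feasible direction
```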
Comparison with barrier method search directions
The primal-dual search directions are closely related to the search directions used in the barrier method, but not quite the same. We start with the linear equations (11.54) that define the primal-dual search directions. We eliminate the variable ∆λpd, using

∆λpd = −diag(f(x))⁻¹ diag(λ)Df(x)∆xpd + diag(f(x))⁻¹rcent,

which comes from the second block of equations. Substituting this into the first block of equations gives
⎡ Hpd   Aᵀ ⎤ ⎡ ∆xpd ⎤      ⎡ rdual + Df(x)ᵀ diag(f(x))⁻¹ rcent ⎤
⎣ A     0  ⎦ ⎣ ∆νpd ⎦ = −  ⎣ rpri                               ⎦

                       = −  ⎡ ∇f0(x) + (1/t) ∑_{i=1}^m (1/(−fi(x))) ∇fi(x) + Aᵀν ⎤        (11.55)
                            ⎣ rpri                                                ⎦
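The elimination can be checked numerically on a toy instance (again f0(x) = ‖x‖²/2 with linear fi, for which Hpd = I + Cᵀ diag(λ/(−f)) C; all data is made up): the reduced system yields the same step as the full one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p, t = 4, 3, 2, 10.0

# Toy data: f0(x) = ||x||^2 / 2 (so H = I), linear f_i (so Df(x) = C).
A = rng.standard_normal((p, n))
C = rng.standard_normal((m, n))
x = rng.standard_normal(n)
d = C @ x + rng.random(m) + 0.1      # ensures f(x) = Cx - d < 0
b = A @ x + rng.standard_normal(p)   # r_pri need not be zero here
lam = rng.random(m) + 0.1            # lam > 0
nu = rng.standard_normal(p)

f = C @ x - d
r_dual = x + C.T @ lam + A.T @ nu
r_cent = -lam * f - 1.0 / t
r_pri = A @ x - b

# Full system (11.54):
KKT = np.block([
    [np.eye(n),         C.T,              A.T],
    [-np.diag(lam) @ C, -np.diag(f),      np.zeros((m, p))],
    [A,                 np.zeros((p, m)), np.zeros((p, p))],
])
dy = np.linalg.solve(KKT, -np.concatenate([r_dual, r_cent, r_pri]))
dx, dlam, dnu = dy[:n], dy[n:n + m], dy[n + m:]

# Reduced system (11.55) after eliminating dlam:
Hpd = np.eye(n) + C.T @ np.diag(lam / -f) @ C
red = np.block([[Hpd, A.T], [A, np.zeros((p, p))]])
rhs = -np.concatenate([r_dual + C.T @ (r_cent / f), r_pri])
dz = np.linalg.solve(red, rhs)
dx2, dnu2 = dz[:n], dz[n:]
dlam2 = -(lam / f) * (C @ dx2) + r_cent / f   # back-substitution for dlam
```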
Throughout, we always maintain fi(x) < 0 and λi > 0.
It follows that:
rpri(x) = 0 ⇒ x is primal feasible
rdual(x, λ, ν) = 0 ⇒ ∇xL(x, λ, ν) = 0, so x minimizes L, and
g(λ, ν) = f0(x) + ∑i λifi(x) + νᵀ(Ax − b) > −∞,

so (λ, ν) are dual feasible
If in addition we have rcent = 0, then λifi(x) = −1/t, so:

g(λ, ν) = f0(x) + ∑i λi·(−1/(λit)) + 0 = f0(x) − m/t

So the gap between (P) and (D): f0(x) − g(λ, ν) ≤ m/t
⇒ suboptimality ≤ m/t
Even if rcent ≠ 0, as long as rpri = 0 and rdual = 0, then

g(λ, ν) = f0(x) + ∑i λifi(x)

⇒ f0(x) − g(λ, ν) = −∑i λifi(x) =: η̂(x, λ)

where η̂(x, λ) > 0 is the surrogate gap, and we are η̂-suboptimal.
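On a problem where the dual function has a closed form, the identity above can be verified numerically. Here f0(x) = ‖x‖²/2 with linear inequalities fi(x) = ciᵀx − di and equality Ax = b, all data made up, with x and b chosen so that rdual = rpri = 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 5, 3, 2

C = rng.standard_normal((m, n))
A = rng.standard_normal((p, n))
lam = rng.random(m) + 0.1            # lam > 0
nu = rng.standard_normal(p)

# r_dual = x + C^T lam + A^T nu, so this x makes r_dual = 0:
w = C.T @ lam + A.T @ nu
x = -w
d = C @ x + rng.random(m) + 0.1      # ensures f(x) = Cx - d < 0
b = A @ x                            # ensures r_pri = 0

f0 = 0.5 * x @ x
# Closed form: g(lam, nu) = inf_x L(x, lam, nu) = -||w||^2/2 - lam.d - nu.b
g = -0.5 * w @ w - lam @ d - nu @ b
eta = -lam @ (C @ x - d)             # surrogate gap -sum_i lam_i f_i(x)

print(f0 - g, eta)                   # equal
```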
Primal-dual interior-point algorithm
• Start at initial x(0), λ(0), ν(0) s.t. fi(x(0)) < 0 and λ(0)i > 0
• Iterate:
  – Determine t(k): set t(k) = µ·m/η̂(x(k), λ(k))
  – Compute search direction: linearize (KKTt) at t = t(k) for x = x(k) + ∆x, λ = λ(k) + ∆λ, ν = ν(k) + ∆ν; solve to obtain ∆x(k), ∆λ(k), ∆ν(k)
  – Set step size s(k) by line search on ‖r(t)(x, λ, ν)‖, ensuring fi(x) < 0 and λi > 0
  – Update: (x(k+1), λ(k+1), ν(k+1)) = (x(k), λ(k), ν(k)) + s(k)(∆x(k), ∆λ(k), ∆ν(k))
  – Stop if: ‖rpri‖ < εfeas and ‖rdual‖ < εfeas (approx. feasible), and η̂(x(k), λ(k)) < ε
Important: x(k) need not be feasible – OK if Ax(k) ≠ b.
Also, (λ(k), ν(k)) need not be dual feasible – g(λ(k), ν(k)) can be −∞.
Advantages: single loop, no phase I
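Here is a minimal sketch of the whole algorithm on a toy QP, min ½‖x − x̄‖² s.t. x ≥ 0, whose solution is the projection max(x̄, 0). With fi(x) = −xi we have Df = −I, and there are no equality constraints, so ν and rpri drop out; the constants µ and the tolerances are illustrative choices.

```python
import numpy as np

def pd_interior_point(xbar, mu=10.0, eps=1e-6, max_iter=100):
    """Primal-dual interior-point sketch for: min 0.5*||x - xbar||^2, x >= 0.
    Here f_i(x) = -x_i, so r_dual = (x - xbar) - lam, r_cent = lam*x - 1/t."""
    n = len(xbar)
    x, lam = np.ones(n), np.ones(n)      # f_i(x) < 0 and lam > 0

    def residual(x, lam, t):
        return np.concatenate([(x - xbar) - lam, lam * x - 1.0 / t])

    for _ in range(max_iter):
        eta = lam @ x                    # surrogate gap: -sum_i lam_i f_i(x)
        if eta < eps and np.linalg.norm((x - xbar) - lam) < eps:
            break
        t = mu * n / eta                 # t^(k) = mu * m / eta_hat
        # Newton system (11.54) specialised to this problem:
        KKT = np.block([[np.eye(n), -np.eye(n)],
                        [np.diag(lam), np.diag(x)]])
        dxl = np.linalg.solve(KKT, -residual(x, lam, t))
        dx, dlam = dxl[:n], dxl[n:]
        # Backtracking line search on ||r||, keeping x > 0 and lam > 0.
        s, r0 = 1.0, np.linalg.norm(residual(x, lam, t))
        for _ in range(40):
            xn, ln = x + s * dx, lam + s * dlam
            if (np.all(xn > 0) and np.all(ln > 0)
                    and np.linalg.norm(residual(xn, ln, t)) <= (1 - 0.01 * s) * r0):
                break
            s *= 0.5
        x, lam = xn, ln
    return x, lam

x_opt, lam_opt = pd_interior_point(np.array([1.0, -2.0, 3.0]))
```

Note how the single loop interleaves the t updates with one Newton step each, instead of fully solving a barrier problem per t.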
Why no need for phase I?
We don’t need to ensure Ax = b, but we do need fi(x) < 0 and λ > 0.
We can rewrite (P ) as:
min_{x∈Rn, s∈R}  f0(x)
s.t.  fi(x) ≤ s
      Ax = b
      s = 0

Now we can start with any x(0) s.t. fi(x(0)) < ∞, then set s = maxi fi(x(0)) + 1.
If finding such x(0) is hard, we can rewrite as:

min_{x∈Rn, s∈R, x1,…,xm∈Rn}  f0(x)
s.t.  fi(xi) ≤ s
      Ax = b
      s = 0
      x = xi ∀i

Then we can find a point in the domain of each fi separately. But this uses many more variables (mn).