Lecture 16
Interior-Point Method
October 27, 2008
Outline
• Review of Self-concordance
• Overview of Newton’s Method for Equality Constrained Minimization
• Examples
• Interior-Point Method
• Inequality constrained minimization
• Logarithmic barrier function and central path
• Barrier method
• Feasibility and phase I methods
• Complexity analysis via self-concordance
Convex Optimization 1
Equality Constrained Minimization
minimize f(x)
subject to Ax = b
KKT Optimality Conditions imply that x∗ is optimal if and only if there
exists a λ∗ such that Ax∗ = b, ∇f(x∗) + ATλ∗ = 0
• Newton’s method solves the KKT conditions via the linear system

    [ ∇2f(xk)   AT ] [ dk ]       [ ∇f(xk) ]
    [    A      0  ] [ wk ]  = −  [   hk   ]
where
• Feasible point method uses hk = 0
• Infeasible point method uses hk = Axk − b
with wk being dual optimal for the minimization of the quadratic
approximation of f at xk
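As a quick numeric sketch (not part of the slides; the helper name and toy problem are illustrative), the KKT system above can be assembled and solved directly with NumPy:

```python
import numpy as np

def newton_step(hess, grad, A, h):
    """Solve the KKT system [H AT; A 0][d; w] = -[g; h]
    for the Newton direction d and the dual vector w."""
    n, p = hess.shape[0], A.shape[0]
    K = np.block([[hess, A.T],
                  [A, np.zeros((p, p))]])
    sol = np.linalg.solve(K, -np.concatenate([grad, h]))
    return sol[:n], sol[n:]

# Toy check: minimize (1/2)||x||^2 subject to x1 + x2 = 1.
A = np.array([[1.0, 1.0]])
x = np.array([1.0, 0.0])                             # feasible: Ax = b = 1
d, w = newton_step(np.eye(2), x, A, np.zeros(1))     # feasible method: h = 0
x_new = x + d     # f is quadratic, so a single full step is exact
```

Since h = 0, the step satisfies Ad = 0 and the iterate stays feasible.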
Equality Constrained Analytic Centering
minimize   f(x) = −∑_{i=1}^n ln xi
subject to Ax = b

Feasible point Newton’s method: g = ∇f(x), H = ∇2f(x)

    [ H   AT ] [ d ]     [ −g ]
    [ A   0  ] [ w ]  =  [  0 ]

with g = −[1/x1, . . . , 1/xn]T and H = diag[1/x1², . . . , 1/xn²]
• The Hessian is positive definite
• KKT matrix first row: Hd+ATw = −g ⇒ d = −H−1(g+ATw) (1)
• KKT matrix second row, Ad = 0, and Eq. (1)⇒ AH−1(g+ATw) = 0
• The matrix A has full row rank, thus AH−1AT is invertible, hence
    w = −(AH−1AT)−1AH−1g,   where H−1 = diag[x1², . . . , xn²]
• The matrix −AH−1AT is known as the Schur complement of H in the KKT matrix (for any H)
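The closed-form elimination above can be checked numerically; a minimal sketch, assuming NumPy, with illustrative data:

```python
import numpy as np

def centering_newton_step(A, x):
    """One feasible Newton step for: minimize -sum(log x) s.t. Ax = b."""
    g = -1.0 / x                      # gradient of -sum(log x)
    Hinv = np.diag(x**2)              # H^{-1} = diag[x_1^2, ..., x_n^2]
    S = A @ Hinv @ A.T                # AH^{-1}A^T (negated Schur complement)
    w = -np.linalg.solve(S, A @ Hinv @ g)
    d = -Hinv @ (g + A.T @ w)         # d = -H^{-1}(g + A^T w)
    return d, w

A = np.array([[1.0, 1.0, 2.0]])
x = np.array([0.5, 0.5, 1.0])         # strictly positive, feasible point
d, w = centering_newton_step(A, x)
# Ad = 0, so x + d stays on the affine set (and here also stays positive)
```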
Network Flow Optimization
minimize   ∑_{l=1}^n φl(xl)
subject to Ax = b

• Directed graph with n arcs and p + 1 nodes
• Variable xl: flow through arc l
• Cost φl: flow cost function for arc l, with φl′′(t) > 0
• Node-incidence matrix A ∈ R(p+1)×n defined as
    Ail = 1 if arc l originates at node i, −1 if arc l ends at node i, 0 otherwise
• Reduced node-incidence matrix A ∈ Rp×n is A with last row removed
• b ∈ Rp is (reduced) source vector
• Rank A = p when the graph is connected
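A small sketch of building the (reduced) node-incidence matrix for a toy graph; the helper name is illustrative:

```python
import numpy as np

def incidence_matrix(num_nodes, arcs):
    """Full node-incidence matrix: A[i, l] = +1 if arc l leaves node i,
    -1 if arc l enters node i, 0 otherwise."""
    A = np.zeros((num_nodes, len(arcs)))
    for l, (tail, head) in enumerate(arcs):
        A[tail, l] = 1.0
        A[head, l] = -1.0
    return A

# 3 nodes (p + 1 = 3), 3 arcs: 0->1, 1->2, 0->2
arcs = [(0, 1), (1, 2), (0, 2)]
A_full = incidence_matrix(3, arcs)
A_red = A_full[:-1, :]   # drop the last row: reduced matrix, rank p = 2
```

Each column of the full matrix sums to zero (one +1 and one −1), which is why one row is dropped to obtain full row rank.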
KKT system for infeasible Newton’s method
    [ H   AT ] [ d ]       [ g ]
    [ A   0  ] [ w ]  = −  [ h ]

where h = Ax − b is a measure of infeasibility at the current point x
• g = [φ1′(x1), . . . , φn′(xn)]T
• H = diag[φ1′′(x1), . . . , φn′′(xn)] with positive diagonal entries
• Solve via elimination:
    w = (AH−1AT)−1[h − AH−1g],   d = −H−1(g + ATw)
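The elimination formulas are cheap when H is diagonal, as here; a hedged sketch with hypothetical quadratic arc costs φl(xl) = xl²/2, so that g = x and H = I:

```python
import numpy as np

def infeasible_newton_step(A, b, x, grad, hess_diag):
    """Elimination with diagonal H = diag(hess_diag):
    w = (A H^-1 A^T)^-1 (h - A H^-1 g),  d = -H^-1 (g + A^T w)."""
    h = A @ x - b                        # infeasibility measure
    Hinv_g = grad / hess_diag
    S = (A / hess_diag) @ A.T            # A H^-1 A^T (scale columns of A)
    w = np.linalg.solve(S, h - A @ Hinv_g)
    d = -(grad + A.T @ w) / hess_diag
    return d, w

# Reduced incidence matrix of arcs 0->1, 1->2, 0->2 (node 2's row dropped).
A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
b = np.array([1.0, 0.0])                 # one unit of flow leaves node 0
x = np.zeros(3)                          # infeasible starting point
d, w = infeasible_newton_step(A, b, x, x, np.ones(3))
x_new = x + d   # quadratic objective: one full step restores feasibility
```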
Interior-Point Methods
For solving inequality constrained problems of the form
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
Ax = b
• Interior-point methods have been studied extensively since the early 1960s as a sub-class of penalty methods [penalizing the convex inequality constraints gj(x), j = 1, . . . , m]
• The log barrier is just one of the choices of penalty (convergence established by Fiacco and McCormick in 1965)
• Newton’s method combined with the log-barrier penalty was analyzed by Karmarkar in 1984
  • As a new polynomial-time algorithm for linear programming problems
• Nesterov and Nemirovskii in 1994 provided a new analysis of Karmarkar’s approach for general convex optimization problems
Inequality Constrained Minimization
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
           Ax = b

• Assumptions:
  • The functions f and gj are convex and twice continuously differentiable (their domains are open)
  • The matrix A ∈ Rp×n has rank p
  • The optimal value f∗ is finite and attained
  • Strict feasibility holds: there exists x with gj(x) < 0, j = 1, . . . , m, Ax = b, x ∈ dom f
    [when the gj are affine, it suffices that gj(x) ≤ 0, j = 1, . . . , m, Ax = b, x ∈ relint dom f]
• Hence, there is no duality gap and a dual optimal (µ∗, λ∗) ∈ Rm × Rp exists
KKT Conditions

For the inequality constrained problem:
minimize   f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
           Ax = b
Under the assumptions just stated, x∗ is primal optimal if and only if there exist µ∗ and λ∗ such that the following KKT conditions hold:
• Primal feasibility: Ax∗ = b, gj(x∗) ≤ 0 for all j
• Dual feasibility: µ∗ ⪰ 0
• Lagrangian optimality in x: ∇f(x∗) + ∑_{j=1}^m µ∗j ∇gj(x∗) + ATλ∗ = 0
• Complementary slackness: µ∗j gj(x∗) = 0 for all j
The interior-point method solves these conditions
Our focus is on the barrier-type method
Logarithmic Barrier Function

Based on a reformulation of the constrained problem via the indicator function:
minimize   f(x) + ∑_{j=1}^m I−(gj(x))
subject to Ax = b
where I− is the indicator function of the nonpositive reals:
    I−(u) = 0 when u ≤ 0, and I−(u) = ∞ otherwise
• Consider a point-wise approximation of I− by a logarithmic barrier
    φt(u) = −(1/t) ln(−u) with t > 0
• Thus: lim_{t→∞} φt(u) = 0 for u < 0, and lim_{t→∞} φt(u) = +∞ otherwise
• With t > 0, φt(u) = −(1/t) ln(−u) is a smooth approximation of I−, improving as t → ∞
• Using this approximation, we have a family of equality constrained problems
minimize   f(x) − (1/t) ∑_{j=1}^m ln(−gj(x))
subject to Ax = b
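A tiny numeric illustration (not from the slides) of how φt approaches the indicator I−:

```python
import math

def phi_t(u, t):
    """Log-barrier approximation of the indicator I_-(u); defined for u < 0."""
    return -math.log(-u) / t

# For a fixed u < 0 the penalty vanishes as t grows,
# while near the boundary (u -> 0-) it blows up for any fixed t.
print([phi_t(-0.5, t) for t in (1, 10, 100, 1000)])
print(phi_t(-1e-9, 10.0))
```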
Family of Equality Constrained Problems

Consider the problems with t > 0:
minimize   f(x) − (1/t) ∑_{j=1}^m ln(−gj(x))
subject to Ax = b
• Can be viewed as a sequence of penalized problems approximating the original problem
• The (penalty) parameter t drives the approximation accuracy: as t increases, the approximation becomes more accurate
• Each of the approximate problems is convex and differentiable
• The function φ(x) = −∑_{j=1}^m ln(−gj(x)) is referred to as the logarithmic barrier or log barrier
• Its domain is dom φ = {x ∈ Rn | gj(x) < 0, j = 1, . . . , m}
Central Path

An equivalent family of problems:
minimize   tf(x) + φ(x)          (1)
subject to Ax = b
with φ(x) = −∑_{j=1}^m ln(−gj(x))
• Problem (1) has the same minimizers as the penalized problem obtained by dividing its objective by t
• Assume a unique optimal x∗(t) exists for each t > 0. When is this true?
• The set {x∗(t) | t > 0} is referred to as the central path for problem (1)
• Each point on the central path is characterized by the following (necessary and sufficient) conditions:
  • Strict feasibility: Ax∗(t) = b and gj(x∗(t)) < 0 for all j
  • There exists λ(t) such that: t∇f(x∗(t)) + ∇φ(x∗(t)) + ATλ(t) = 0
where
    ∇φ(x) = −∑_{j=1}^m (1/gj(x)) ∇gj(x)
Important Property of Central Path

For an x with g(x) ≺ 0, we have x = x∗(t) if and only if there exists a λ(t) such that
    ∇f(x) + ∑_{j=1}^m (1/(−t gj(x))) ∇gj(x) + (1/t) ATλ(t) = 0,    Ax = b
• Therefore, x∗(t) minimizes the Lagrangian
    L(x, µ∗(t), λ∗(t)) = f(x) + ∑_{j=1}^m µ∗j(t) gj(x) + λ∗(t)T(Ax − b)
  where µ∗j(t) = −1/(t gj(x∗(t))) and λ∗(t) = λ(t)/t
• Note that L(x∗(t), µ∗(t), λ∗(t)) = f(x∗(t)) − m/t
• This confirms the intuitive idea that f(x∗(t)) → f∗ as t → ∞:
    f(x∗(t)) − m/t = L(x∗(t), µ∗(t), λ∗(t)) = q(µ∗(t), λ∗(t)) ≤ f∗
• Provides a lower estimate for f∗ (can serve as a stopping criterion)
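A worked one-dimensional example (illustrative, not from the slides) where the central path and the m/t gap can be computed by hand:

```python
# Toy problem: minimize f(x) = x subject to -x <= 0, so f* = 0 and m = 1.
# The centering problem is: minimize t*x - ln(x); setting the derivative
# t - 1/x to zero gives the central path x*(t) = 1/t.
for t in (1.0, 10.0, 100.0):
    x_t = 1.0 / t
    gap = x_t - 0.0              # f(x*(t)) - f* equals m/t exactly here
    lower_bound = x_t - 1.0 / t  # f(x*(t)) - m/t <= f*; here it equals f* = 0
    print(t, gap, lower_bound)
```

The lower bound f(x∗(t)) − m/t is tight in this example: it equals f∗ = 0 for every t.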
Interpretation via KKT Conditions

The vectors x, µ and λ are primal-dual optimal for the penalized problem (i.e., x = x∗(t)) if and only if they satisfy
• Primal feasibility: gj(x) ≤ 0, j = 1, . . . , m, Ax = b
• Dual feasibility: µ ⪰ 0
• Lagrangian optimality in x:
    ∇f(x) + ∑_{j=1}^m µj ∇gj(x) + ATλ = 0
• Complementary slackness: µTg(x) = −m/t
The difference from the KKT conditions for the original inequality constrained problem is that the last condition replaces µTg(x) = 0
The last condition can be viewed as “approximate” complementary slackness for the original problem, which converges to the “original” complementary slackness as t → ∞
KKT Conditions
Original problem:
minimize   f(x)
subject to g(x) ⪯ 0, Ax = b

x∗ is optimal iff there is (µ∗, λ∗):
• Ax∗ = b, g(x∗) ⪯ 0, µ∗ ⪰ 0
• ∇f(x∗) + ∑_{j=1}^m µ∗j ∇gj(x∗) + ATλ∗ = 0
• (µ∗)Tg(x∗) = 0

Penalized problem:
minimize   f(x) − (1/t) ∑_{j=1}^m ln(−gj(x))
subject to Ax = b

x∗t is optimal iff there is λt:
• Ax∗t = b, g(x∗t) ≺ 0
• ∇f(x∗t) + ∑_{j=1}^m (1/(−t gj(x∗t))) ∇gj(x∗t) + ATλt = 0
• Letting µjt = 1/(−t gj(x∗t)) for all j, we see that x∗t and (µt, λt) satisfy the KKT conditions for the original problem with the approximate CS condition µtTg(x∗t) = −m/t

Furthermore, f∗ ≤ f(x∗t) ≤ f∗ + m/t
Central Path for LP
minimize cTx
subject to aTj x ≤ bj, j = 1, . . . , m
• The logarithmic barrier is given by
    φ(x) = −∑_{j=1}^m ln(bj − aTj x)
• The gradient and Hessian of the barrier function are
    ∇φ(x) = ∑_{j=1}^m (1/(bj − aTj x)) aj = ATd,   with d ∈ Rm, dj = 1/(bj − aTj x)
    ∇2φ(x) = ∑_{j=1}^m (1/(bj − aTj x)²) aj aTj = AT diag[d]² A
• Since d ≻ 0 for x ∈ dom φ, the Hessian is nonsingular iff rank(A) = n
• The centrality condition reduces to: tc + ∑_{j=1}^m (1/(bj − aTj x)) aj = 0
• Interpretation: at x∗(t) on the central path, the gradient ∇φ(x∗(t)) is parallel to −c, i.e., the hyperplane cTx = cTx∗(t) is tangent to the level set of φ through x∗(t)
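The formulas ∇φ = ATd and ∇²φ = ATdiag[d]²A can be sanity-checked numerically; a sketch assuming NumPy, with illustrative data:

```python
import numpy as np

def lp_barrier(A, b, x):
    """phi(x) = -sum_j ln(b_j - a_j^T x), with its gradient and Hessian."""
    slack = b - A @ x                 # must be > 0 on dom(phi)
    d = 1.0 / slack
    val = -np.sum(np.log(slack))
    grad = A.T @ d                    # gradient: A^T d
    hess = A.T @ np.diag(d**2) @ A    # Hessian: A^T diag[d]^2 A
    return val, grad, hess

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])
x = np.array([0.2, 0.3])              # strictly feasible point
val, grad, hess = lp_barrier(A, b, x)

# Finite-difference check of the gradient formula
eps = 1e-6
fd = np.array([(lp_barrier(A, b, x + eps * e)[0] - val) / eps
               for e in np.eye(2)])
```

Here rank(A) = n = 2, so the Hessian is positive definite, consistent with the statement above.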
Barrier Method
Given a strictly feasible x, t := t0 > 0, β > 1, and a tolerance ε > 0.
Repeat
1. Centering step. Compute x∗(t) solving minAz=b {tf(z) + φ(z)}
2. Update. Set x := x∗(t).
3. Stopping criterion. Quit if m/t < ε.
4. Increase penalty. Set t := βt.
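A minimal sketch of the barrier method for an inequality-only LP (so each centering step is unconstrained Newton); the parameter values, helper names, and box example are illustrative, not prescribed by the slides:

```python
import numpy as np

def centering(A, b, c, x, t, tol=1e-8, max_iter=50):
    """Newton's method for: minimize t*c^T x - sum ln(b - Ax), feasible start."""
    for _ in range(max_iter):
        d = 1.0 / (b - A @ x)
        grad = t * c + A.T @ d
        hess = A.T @ np.diag(d**2) @ A
        step = -np.linalg.solve(hess, grad)
        if -grad @ step / 2 <= tol:          # Newton decrement squared / 2
            break
        s = 1.0
        while np.any(b - A @ (x + s * step) <= 0):   # stay strictly feasible
            s *= 0.5
        x = x + s * step
    return x

def barrier_method(A, b, c, x, t0=1.0, beta=10.0, eps=1e-5):
    t, m = t0, len(b)
    while True:
        x = centering(A, b, c, x, t)   # outer iteration (centering step)
        if m / t < eps:                # duality-gap based stopping rule
            return x
        t *= beta

# Box example: minimize x1 + x2 over 0 <= x <= 2; the optimum is the origin.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 2.0, 2.0])
c = np.array([1.0, 1.0])
x = barrier_method(A, b, c, np.array([1.0, 1.0]))
```

On termination, f(x) − f∗ ≤ m/t < ε, so the returned point is ε-suboptimal while remaining strictly inside the box.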
• It terminates with f(x) − f∗ ≤ ε [follows from f(x∗(t)) − f∗ ≤ m/t]
• Centering steps are viewed as outer iterations
• Inside a centering step, computing x∗(t) is usually done using Newton’s method starting at the current x; these are the inner iterations
• Methods of this form are also referred to as sequential minimization techniques
• The choice of β involves a trade-off: a large β means fewer outer iterations but more inner (Newton) iterations
• Choosing t large enough to have m/t < ε in one iteration causes problems at points close to the boundary of the feasible set, where the Hessian is unstable:
    ∇2φ(x) = ∑_{j=1}^m (1/gj(x)²) ∇gj(x)∇gj(x)T + ∑_{j=1}^m (1/(−gj(x))) ∇2gj(x)
Convergence Analysis: Outer Iterations
Number of outer (centering) iterations: to determine x∗(t) such that f(x∗(t)) − f∗ ≤ ε, we need K such that
    m/(β^K t0) ≤ ε
Thus, the number K of outer iterations for accuracy ε is given exactly by
    K = ⌈ ln(m/(t0 ε)) / ln β ⌉
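For concreteness, K can be computed directly; the numbers below are illustrative:

```python
import math

def outer_iterations(m, t0, eps, beta):
    """Smallest K with m / (beta^K * t0) <= eps."""
    return math.ceil(math.log(m / (t0 * eps)) / math.log(beta))

K = outer_iterations(m=10, t0=1.0, eps=1e-5, beta=20.0)
print(K)   # here K = 5: 10 / 20^5 <= 1e-5, while 10 / 20^4 > 1e-5
```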
Convergence Analysis: Initialization and Inner Iterations

To complete the analysis of the method, we need to address
• The initial step of the method:
  • Determining a strictly feasible starting point x:
      x ∈ dom f, Ax = b, gj(x) < 0, j = 1, . . . , m
• The centering step: solving subproblems of the form
    minimize   tf(z) + φ(z)
    subject to Az = b
  • The functions tf + φ must have closed level sets for all t ≥ t0
  • Analysis via self-concordance requires self-concordance of tf + φ [thus, tf + φ must be three times differentiable]
Feasibility and Phase I Methods
• The barrier method requires a strictly feasible starting point x
• When such a point is not available, the method is preceded by a
preliminary stage (part of the initialization)
• Phase I: computing a strictly feasible point or finding that the
constraints are infeasible
• The point found in Phase I serves as the starting iterate in the method
• The latter stage is referred to as Phase II of the method
• Basic approach for determining a feasible x:
  Minimizing the maximum infeasibility
• Many variations of this basic approach exist, among them:
  Minimizing the sum of infeasibilities
Feasibility: Minimizing the Maximum Infeasibility
Basic phase I approach:
minimize   s
subject to gj(x) ≤ s, j = 1, . . . , m          (2)
           Ax = b
NOTE:
• The minimization is with respect to (x, s) with s ∈ R
• The domain of f has to be taken into account [added in the constraints]
Let dom f = Rn and let f∗ now denote the optimal value of the feasibility problem (2)
• The feasibility problem is strictly feasible: apply the barrier method
• When (x, s) with s < 0 is feasible for problem (2), x is strictly feasible in the original problem
• When f∗ > 0, the original problem is infeasible
• When f∗ = 0 and attained, the original problem is feasible [but not strictly]
• When f∗ = 0 and not attained, the original problem is infeasible
Note: In practice, we cannot exactly determine that f∗ = 0; typically, the feasibility algorithm terminates with |f∗| ≤ η for some tolerance η > 0
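When the gj are affine, problem (2) is itself an LP; a hedged sketch using scipy.optimize.linprog (the helper name and the example constraints are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def phase1_max_infeasibility(G, h):
    """minimize s  s.t.  Gx - s*1 <= h, over (x, s); affine constraints only."""
    n = G.shape[1]
    c = np.zeros(n + 1)
    c[-1] = 1.0                                       # objective: s
    A_ub = np.hstack([G, -np.ones((G.shape[0], 1))])  # Gx - s <= h
    res = linprog(c, A_ub=A_ub, b_ub=h,
                  bounds=[(None, None)] * (n + 1))    # x and s are free
    return res.x[:n], res.fun                         # candidate x and s*

# Constraints 1 <= x <= 2 written as Gx <= h: feasible, so s* < 0
G = np.array([[-1.0], [1.0]])
h = np.array([-1.0, 2.0])
x, s_star = phase1_max_infeasibility(G, h)
```

Here s∗ = −1/2 < 0 at x = 3/2, certifying strict feasibility of the original constraints, as described above.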
Alternative Approach:
Minimize the Sum of Infeasibilities
minimize   1Ts
subject to gj(x) ≤ sj, j = 1, . . . , m
           Ax = b, s ⪰ 0
• For a fixed x, the optimal value of sj is max{gj(x), 0} - so the problem indeed minimizes the sum of infeasibilities
• The optimal value is f∗ = 0 and attained if and only if the original problem is feasible
• Interesting property when the original problem is infeasible:
  • The approach produces a solution that satisfies many more constraints than the basic phase I approach [minimizing the max infeasibility]
  • Thus, it identifies a larger subset of the constraints that can be satisfied
NOTE:
• However, this model produces a feasible point but NOT a strictly feasible one
Analysis of the Inner Iterations
Analysis Assumptions:
• The feasible level sets of the original problem are bounded
• The function tf + φ is closed and self-concordant for all t ≥ t0
When f and the gj are linear or quadratic, the function tf − ∑_{j=1}^m ln(−gj) is self-concordant for all t ≥ 0 (this covers LPs, QPs and QCQPs)
NOTE:
The barrier method works well whether or not self-concordance is present
Inner Iterations
The iterations within a centering step: accuracy fixed to εc
• Start with x(t), a solution to min_{Az=b} {tf(z) + φ(z)}
• Use Newton’s method and backtracking line search with parameters α = 1, βc ∈ (0, 1) and σ ∈ (0, 1/2)
• Obtain x(βt), an εc-solution to the problem min_{Az=b} {βtf(z) + φ(z)}
Recall: to minimize a closed self-concordant function f, the number of Newton iterations, with backtracking line search and starting iterate x0, is bounded by
    (f(x0) − f∗)/Γ + κ
with Γ = σβc(1 − 2σ)²/(20 − 8σ) and κ = log2 log2(1/εc)
We apply this to the function βtf(z) + φ(z)
Bound on the Number of Inner Iterations

Under the level set boundedness and self-concordance assumptions on the functions tf + φ, t ≥ t0, the number of Newton iterations within a centering step is estimated as follows:
• Under the unrealistic assumption of exact solutions: x = x∗(t) and xβ = x∗(βt) optimal for the penalized problems with penalties t and βt, resp.
• We have: # Newton steps ≤ (βtf(x) + φ(x) − βtf(xβ) − φ(xβ))/Γ + κ
with Γ = σβc(1 − 2σ)²/(20 − 8σ) and κ = log2 log2(1/εc)
Therefore, with µj = −1/(t gj(x)):
numerator = βtf(x) − βtf(xβ) + ∑_{j=1}^m ln[−βt µj gj(xβ)] − m ln β
          ≤ βtf(x) − βtf(xβ) − βt ∑_{j=1}^m µj gj(xβ) − m − m ln β
          ≤ βtf(x) − βt q(µ, λ) − m − m ln β
          = βtf(x) − βtf(x) + mβ − m − m ln β
          = m(β − 1 − ln β)
A Total Bound on Newton Iterations

Combining the two bounds, the total number of Newton iterations satisfies
    N ≤ ⌈ ln(m/(t0 ε)) / ln β ⌉ ⌈ m(β − 1 − ln β)/Γ + κ ⌉
• The bound depends strongly on the parameter β
• The function β − 1 − ln β is almost quadratic for small β, and grows linearly for large β
• Hence, to keep the number of inner iterations small, it is advisable to use a smaller β [the number of inner iterations can be large for large β]
• The bound does not depend on n and p (the size of x and the number of rows of A)
• When β = 1 + 1/√m, the bound on the total number of Newton steps becomes
    N ≤ (1/(2Γ) + κ) ⌈ √m log2(m/(t0 ε)) ⌉
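The trade-off in β can be seen by evaluating the bound numerically; all parameter values below are illustrative:

```python
import math

def total_newton_bound(m, t0, eps, beta, gamma, kappa):
    """Outer-iteration count times the per-centering Newton bound."""
    outer = math.ceil(math.log(m / (t0 * eps)) / math.log(beta))
    inner = m * (beta - 1 - math.log(beta)) / gamma + kappa
    return outer * inner

# Illustrative parameters: sigma = 0.1, beta_c = 0.5 give this Gamma
gamma = 0.1 * 0.5 * (1 - 0.2)**2 / (20 - 0.8)
kappa = math.log2(math.log2(1e10))            # eps_c = 1e-10
m, t0, eps = 100, 1.0, 1e-6

small = total_newton_bound(m, t0, eps, 1 + 1 / math.sqrt(m), gamma, kappa)
large = total_newton_bound(m, t0, eps, 50.0, gamma, kappa)
# the bound for beta = 1 + 1/sqrt(m) is far smaller than for beta = 50
```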
Pros and Cons
Advantages
• The interior-point method is successfully applied in practice
• It works very well for moderately large problems
• It can even be applied to nonconvex problems (with the risk of getting “trapped” at a local minimum)
Limitations
• High differentiability requirements on f and the gj
• The method becomes inefficient when the size of the problem is very large (n in the millions) and the Hessians are not sparse
• It is not suitable for “distributed” computation in general
Example: Image Reconstruction in PET Scan [Ben-Tal, 2005]

• The maximum likelihood model results in the convex optimization problem
    minimize_{x ≥ 0, e′x ≤ 1}  −∑_{i=1}^m yi ln( ∑_{j=1}^n pij xj )
• x is the decision vector - size n
• y models the measured data (from the PET detectors) - size m
• pij are probabilities modeling the detection of emitted positrons
• The number n of decision variables ranges from 0.5 to 3 million
• The number m of data variables ranges from 3 to 25 million
• The Hessians are not sparse
Interior Point Method can be Inefficient
• For the PET Imaging problem [Ben-Tal, 2005]
• At image resolution 64 × 64 × 64 (n = 262,144), the CPU time per Newton iteration is about 2.5 hours
• At image resolution 128 × 128 × 128 (n = 2,097,152), the CPU time per Newton iteration is more than 13 days
• This motivates a renewed interest in gradient-type methods
• The complexity of a gradient-type step is linear in n
• In practice, their accuracy is not high, BUT in large-scale problems “high accuracy” is not required
• Gradient-type methods are well suited for medium-accuracy solutions to extremely large convex problems