Lecture 16
Interior-Point Method
October 27, 2008
Outline
• Review of Self-concordance
• Overview of Newton’s Method for Equality Constrained Minimization
• Examples
• Interior-Point Method
• Inequality constrained minimization
• Logarithmic barrier function and central path
• Barrier method
• Feasibility and phase I methods
• Complexity analysis via self-concordance
Convex Optimization 1
Equality Constrained Minimization
minimize f(x)
subject to Ax = b
KKT Optimality Conditions imply that x∗ is optimal if and only if there
exists a λ∗ such that Ax∗ = b, ∇f(x∗) + ATλ∗ = 0
• Newton’s method solves the KKT conditions via the linear system

    [ ∇2f(xk)   AT ] [ dk ]       [ ∇f(xk) ]
    [    A      0  ] [ wk ]  = −  [   hk   ]
where
• Feasible point method uses hk = 0
• Infeasible point method uses hk = Axk − b
with wk being dual optimal for the minimization of the quadratic
approximation of f at xk
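As a quick numeric sketch (not part of the slides; the helper name and toy problem are illustrative), the KKT system above can be assembled and solved directly with NumPy:

```python
import numpy as np

def newton_step(hess, grad, A, h):
    """Solve the KKT system [H AT; A 0][d; w] = -[g; h]
    for the Newton direction d and the dual vector w."""
    n, p = hess.shape[0], A.shape[0]
    K = np.block([[hess, A.T],
                  [A, np.zeros((p, p))]])
    sol = np.linalg.solve(K, -np.concatenate([grad, h]))
    return sol[:n], sol[n:]

# Toy check: minimize (1/2)||x||^2 subject to x1 + x2 = 1.
A = np.array([[1.0, 1.0]])
x = np.array([1.0, 0.0])                             # feasible: Ax = b = 1
d, w = newton_step(np.eye(2), x, A, np.zeros(1))     # feasible method: h = 0
x_new = x + d     # f is quadratic, so a single full step is exact
```

Since h = 0, the step satisfies Ad = 0 and the iterate stays feasible.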
Equality Constrained Analytic Centering
minimize   f(x) = −∑_{i=1}^n ln xi
subject to Ax = b

Feasible point Newton’s method: g = ∇f(x), H = ∇2f(x)

    [ H   AT ] [ d ]     [ −g ]
    [ A   0  ] [ w ]  =  [  0 ]

with g = −[1/x1, . . . , 1/xn]T and H = diag[1/x1², . . . , 1/xn²]
• The Hessian is positive definite
• KKT matrix first row: Hd+ATw = −g ⇒ d = −H−1(g+ATw) (1)
• KKT matrix second row, Ad = 0, and Eq. (1)⇒ AH−1(g+ATw) = 0
• The matrix A has full row rank, thus AH−1AT is invertible, hence
    w = −(AH−1AT)−1AH−1g,   where H−1 = diag[x1², . . . , xn²]
• The matrix −AH−1AT is known as the Schur complement of H in the KKT matrix (for any H)
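The closed-form elimination above can be checked numerically; a minimal sketch, assuming NumPy, with illustrative data:

```python
import numpy as np

def centering_newton_step(A, x):
    """One feasible Newton step for: minimize -sum(log x) s.t. Ax = b."""
    g = -1.0 / x                      # gradient of -sum(log x)
    Hinv = np.diag(x**2)              # H^{-1} = diag[x_1^2, ..., x_n^2]
    S = A @ Hinv @ A.T                # AH^{-1}A^T (negated Schur complement)
    w = -np.linalg.solve(S, A @ Hinv @ g)
    d = -Hinv @ (g + A.T @ w)         # d = -H^{-1}(g + A^T w)
    return d, w

A = np.array([[1.0, 1.0, 2.0]])
x = np.array([0.5, 0.5, 1.0])         # strictly positive, feasible point
d, w = centering_newton_step(A, x)
# Ad = 0, so x + d stays on the affine set (and here also stays positive)
```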
Network Flow Optimization
minimize   ∑_{l=1}^n φl(xl)
subject to Ax = b

• Directed graph with n arcs and p + 1 nodes
• Variable xl: flow through arc l
• Cost φl: flow cost function for arc l, with φl′′(t) > 0
• Node-incidence matrix A ∈ R(p+1)×n defined as
    Ail = 1 if arc l originates at node i, −1 if arc l ends at node i, 0 otherwise
• Reduced node-incidence matrix A ∈ Rp×n is A with last row removed
• b ∈ Rp is (reduced) source vector
• Rank A = p when the graph is connected
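A small sketch of building the (reduced) node-incidence matrix for a toy graph; the helper name is illustrative:

```python
import numpy as np

def incidence_matrix(num_nodes, arcs):
    """Full node-incidence matrix: A[i, l] = +1 if arc l leaves node i,
    -1 if arc l enters node i, 0 otherwise."""
    A = np.zeros((num_nodes, len(arcs)))
    for l, (tail, head) in enumerate(arcs):
        A[tail, l] = 1.0
        A[head, l] = -1.0
    return A

# 3 nodes (p + 1 = 3), 3 arcs: 0->1, 1->2, 0->2
arcs = [(0, 1), (1, 2), (0, 2)]
A_full = incidence_matrix(3, arcs)
A_red = A_full[:-1, :]   # drop the last row: reduced matrix, rank p = 2
```

Each column of the full matrix sums to zero (one +1 and one −1), which is why one row is dropped to obtain full row rank.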
KKT system for infeasible Newton’s method
    [ H   AT ] [ d ]       [ g ]
    [ A   0  ] [ w ]  = −  [ h ]

where h = Ax − b is a measure of infeasibility at the current point x
• g = [φ1′(x1), . . . , φn′(xn)]T
• H = diag[φ1′′(x1), . . . , φn′′(xn)] with positive diagonal entries
• Solve via elimination:
    w = (AH−1AT)−1[h − AH−1g],   d = −H−1(g + ATw)
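The elimination formulas are cheap when H is diagonal, as here; a hedged sketch with hypothetical quadratic arc costs φl(xl) = xl²/2, so that g = x and H = I:

```python
import numpy as np

def infeasible_newton_step(A, b, x, grad, hess_diag):
    """Elimination with diagonal H = diag(hess_diag):
    w = (A H^-1 A^T)^-1 (h - A H^-1 g),  d = -H^-1 (g + A^T w)."""
    h = A @ x - b                        # infeasibility measure
    Hinv_g = grad / hess_diag
    S = (A / hess_diag) @ A.T            # A H^-1 A^T (scale columns of A)
    w = np.linalg.solve(S, h - A @ Hinv_g)
    d = -(grad + A.T @ w) / hess_diag
    return d, w

# Reduced incidence matrix of arcs 0->1, 1->2, 0->2 (node 2's row dropped).
A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
b = np.array([1.0, 0.0])                 # one unit of flow leaves node 0
x = np.zeros(3)                          # infeasible starting point
d, w = infeasible_newton_step(A, b, x, x, np.ones(3))
x_new = x + d   # quadratic objective: one full step restores feasibility
```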
Interior-Point Methods
For solving inequality constrained problems of the form
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
Ax = b
• Interior-point methods have been studied extensively since the early 1960s as a sub-class of penalty methods [penalizing the convex inequality constraints gj(x), j = 1, . . . , m]
• The log barrier is just one of the choices of penalty (convergence established by Fiacco and McCormick in 1965)
• Newton’s method combined with the log-barrier penalty was analyzed by Karmarkar in 1984
  • As a new polynomial-time algorithm for linear programming problems
• Nesterov and Nemirovskii in 1994 provided a new analysis of Karmarkar’s approach for general convex optimization problems
Inequality Constrained Minimization
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
           Ax = b

• Assumptions:
  • The functions f and gj are convex and twice continuously differentiable (their domains are open)
  • The matrix A ∈ Rp×n has rank p
  • The optimal value f∗ is finite and attained
  • Strict feasibility holds: there exists x with gj(x) < 0, j = 1, . . . , m, Ax = b, x ∈ dom f
    [when the gj are affine, it suffices that gj(x) ≤ 0, j = 1, . . . , m, Ax = b, x ∈ relint dom f]
• Hence, there is no duality gap and a dual optimal (µ∗, λ∗) ∈ Rm × Rp exists
KKT Conditions

For the inequality constrained problem:
minimize   f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
           Ax = b
Under the assumptions just stated, x∗ is primal optimal if and only if there exist µ∗ and λ∗ such that the following KKT conditions hold:
• Primal feasibility: Ax∗ = b, gj(x∗) ≤ 0 for all j
• Dual feasibility: µ∗ ⪰ 0
• Lagrangian optimality in x: ∇f(x∗) + ∑_{j=1}^m µ∗j ∇gj(x∗) + ATλ∗ = 0
• Complementary slackness: µ∗j gj(x∗) = 0 for all j
The interior-point method solves these conditions
Our focus is on the barrier-type method
Logarithmic Barrier Function

Based on a reformulation of the constrained problem via the indicator function:
minimize   f(x) + ∑_{j=1}^m I−(gj(x))
subject to Ax = b
where I− is the indicator function of the nonpositive reals:
    I−(u) = 0 when u ≤ 0, and I−(u) = ∞ otherwise
• Consider a point-wise approximation of I− by a logarithmic barrier
    φt(u) = −(1/t) ln(−u) with t > 0
• Thus: lim_{t→∞} φt(u) = 0 for u < 0, and lim_{t→∞} φt(u) = +∞ otherwise
• With t > 0, φt(u) = −(1/t) ln(−u) is a smooth approximation of I−, improving as t → ∞
• Using this approximation, we have a family of equality constrained problems
minimize   f(x) − (1/t) ∑_{j=1}^m ln(−gj(x))
subject to Ax = b
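A tiny numeric illustration (not from the slides) of how φt approaches the indicator I−:

```python
import math

def phi_t(u, t):
    """Log-barrier approximation of the indicator I_-(u); defined for u < 0."""
    return -math.log(-u) / t

# For a fixed u < 0 the penalty vanishes as t grows,
# while near the boundary (u -> 0-) it blows up for any fixed t.
print([phi_t(-0.5, t) for t in (1, 10, 100, 1000)])
print(phi_t(-1e-9, 10.0))
```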
Family of Equality Constrained Problems

Consider the problems with t > 0:
minimize   f(x) − (1/t) ∑_{j=1}^m ln(−gj(x))
subject to Ax = b
• Can be viewed as a sequence of penalized problems approximating the original problem
• The (penalty) parameter t drives the approximation accuracy: as t increases, the approximation becomes more accurate
• Each of the approximate problems is convex and differentiable
• The function φ(x) = −∑_{j=1}^m ln(−gj(x)) is referred to as the logarithmic barrier or log barrier
• Its domain is dom φ = {x ∈ Rn | gj(x) < 0, j = 1, . . . , m}
Central Path

An equivalent family of problems:
minimize   tf(x) + φ(x)          (1)
subject to Ax = b
with φ(x) = −∑_{j=1}^m ln(−gj(x))
• Problem (1) has the same minimizers as the penalized problem obtained by dividing its objective by t
• Assume a unique optimal x∗(t) exists for each t > 0. When is this true?
• The set {x∗(t) | t > 0} is referred to as the central path for problem (1)
• Each point on the central path is characterized by the following (necessary and sufficient) conditions:
  • Strict feasibility: Ax∗(t) = b and gj(x∗(t)) < 0 for all j
  • There exists λ(t) such that: t∇f(x∗(t)) + ∇φ(x∗(t)) + ATλ(t) = 0
where
    ∇φ(x) = −∑_{j=1}^m (1/gj(x)) ∇gj(x)
Important Property of Central Path

For an x with g(x) ≺ 0, we have x = x∗(t) if and only if there exists a λ(t) such that
    ∇f(x) + ∑_{j=1}^m (1/(−t gj(x))) ∇gj(x) + (1/t) ATλ(t) = 0,    Ax = b
• Therefore, x∗(t) minimizes the Lagrangian
    L(x, µ∗(t), λ∗(t)) = f(x) + ∑_{j=1}^m µ∗j(t) gj(x) + λ∗(t)T(Ax − b)
  where µ∗j(t) = −1/(t gj(x∗(t))) and λ∗(t) = λ(t)/t
• Note that L(x∗(t), µ∗(t), λ∗(t)) = f(x∗(t)) − m/t
• This confirms the intuitive idea that f(x∗(t)) → f∗ as t → ∞:
    f(x∗(t)) − m/t = L(x∗(t), µ∗(t), λ∗(t)) = q(µ∗(t), λ∗(t)) ≤ f∗
• Provides a lower estimate for f∗ (can serve as a stopping criterion)
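A worked one-dimensional example (illustrative, not from the slides) where the central path and the m/t gap can be computed by hand:

```python
# Toy problem: minimize f(x) = x subject to -x <= 0, so f* = 0 and m = 1.
# The centering problem is: minimize t*x - ln(x); setting the derivative
# t - 1/x to zero gives the central path x*(t) = 1/t.
for t in (1.0, 10.0, 100.0):
    x_t = 1.0 / t
    gap = x_t - 0.0              # f(x*(t)) - f* equals m/t exactly here
    lower_bound = x_t - 1.0 / t  # f(x*(t)) - m/t <= f*; here it equals f* = 0
    print(t, gap, lower_bound)
```

The lower bound f(x∗(t)) − m/t is tight in this example: it equals f∗ = 0 for every t.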
Interpretation via KKT Conditions

The vectors x, µ and λ are primal-dual optimal for the penalized problem (i.e., x = x∗(t)) if and only if they satisfy
• Primal feasibility: gj(x) ≤ 0, j = 1, . . . , m, Ax = b
• Dual feasibility: µ ⪰ 0
• Lagrangian optimality in x:
    ∇f(x) + ∑_{j=1}^m µj ∇gj(x) + ATλ = 0
• Complementary slackness: µTg(x) = −m/t
The difference from the KKT conditions for the original inequality constrained problem is that the last condition replaces µTg(x) = 0
The last condition can be viewed as “approximate” complementary slackness for the original problem, which converges to the “original” complementary slackness as t → ∞
KKT Conditions
Original problem:
minimize   f(x)
subject to g(x) ⪯ 0, Ax = b

x∗ is optimal iff there is (µ∗, λ∗):
• Ax∗ = b, g(x∗) ⪯ 0, µ∗ ⪰ 0
• ∇f(x∗) + ∑_{j=1}^m µ∗j ∇gj(x∗) + ATλ∗ = 0
• (µ∗)Tg(x∗) = 0

Penalized problem:
minimize   f(x) − (1/t) ∑_{j=1}^m ln(−gj(x))
subject to Ax = b

x∗t is optimal iff there is λt:
• Ax∗t = b, g(x∗t) ≺ 0
• ∇f(x∗t) + ∑_{j=1}^m (1/(−t gj(x∗t))) ∇gj(x∗t) + ATλt = 0
• Letting µjt = 1/(−t gj(x∗t)) for all j, we see that x∗t and (µt, λt) satisfy the KKT conditions for the original problem with the approximate CS condition µtTg(x∗t) = −m/t

Furthermore, f∗ ≤ f(x∗t) ≤ f∗ + m/t
Central Path for LP
minimize cTx
subject to aTj x ≤ bj, j = 1, . . . , m
• The logarithmic barrier is given by
    φ(x) = −∑_{j=1}^m ln(bj − aTj x)
• The gradient and Hessian of the barrier function are
    ∇φ(x) = ∑_{j=1}^m (1/(bj − aTj x)) aj = ATd,   with d ∈ Rm, dj = 1/(bj − aTj x)
    ∇2φ(x) = ∑_{j=1}^m (1/(bj − aTj x)²) aj aTj = AT diag[d]² A
• Since d ≻ 0 for x ∈ dom φ, the Hessian is nonsingular iff rank(A) = n
• The centrality condition reduces to: tc + ∑_{j=1}^m (1/(bj − aTj x)) aj = 0
• Interpretation: at x∗(t) on the central path, the gradient ∇φ(x∗(t)) is parallel to −c, i.e., the hyperplane cTx = cTx∗(t) is tangent to the level set of φ through x∗(t)
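The formulas ∇φ = ATd and ∇²φ = ATdiag[d]²A can be sanity-checked numerically; a sketch assuming NumPy, with illustrative data:

```python
import numpy as np

def lp_barrier(A, b, x):
    """phi(x) = -sum_j ln(b_j - a_j^T x), with its gradient and Hessian."""
    slack = b - A @ x                 # must be > 0 on dom(phi)
    d = 1.0 / slack
    val = -np.sum(np.log(slack))
    grad = A.T @ d                    # gradient: A^T d
    hess = A.T @ np.diag(d**2) @ A    # Hessian: A^T diag[d]^2 A
    return val, grad, hess

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])
x = np.array([0.2, 0.3])              # strictly feasible point
val, grad, hess = lp_barrier(A, b, x)

# Finite-difference check of the gradient formula
eps = 1e-6
fd = np.array([(lp_barrier(A, b, x + eps * e)[0] - val) / eps
               for e in np.eye(2)])
```

Here rank(A) = n = 2, so the Hessian is positive definite, consistent with the statement above.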
Barrier Method
Given a strictly feasible x, t := t0 > 0, β > 1, and a tolerance ε > 0.
Repeat
1. Centering step. Compute x∗(t) solving minAz=b {tf(z) + φ(z)}
2. Update. Set x := x∗(t).
3. Stopping criterion. Quit if m/t < ε.
4. Increase penalty. Set t := βt.
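A minimal sketch of the barrier method for an inequality-only LP (so each centering step is unconstrained Newton); the parameter values, helper names, and box example are illustrative, not prescribed by the slides:

```python
import numpy as np

def centering(A, b, c, x, t, tol=1e-8, max_iter=50):
    """Newton's method for: minimize t*c^T x - sum ln(b - Ax), feasible start."""
    for _ in range(max_iter):
        d = 1.0 / (b - A @ x)
        grad = t * c + A.T @ d
        hess = A.T @ np.diag(d**2) @ A
        step = -np.linalg.solve(hess, grad)
        if -grad @ step / 2 <= tol:          # Newton decrement squared / 2
            break
        s = 1.0
        while np.any(b - A @ (x + s * step) <= 0):   # stay strictly feasible
            s *= 0.5
        x = x + s * step
    return x

def barrier_method(A, b, c, x, t0=1.0, beta=10.0, eps=1e-5):
    t, m = t0, len(b)
    while True:
        x = centering(A, b, c, x, t)   # outer iteration (centering step)
        if m / t < eps:                # duality-gap based stopping rule
            return x
        t *= beta

# Box example: minimize x1 + x2 over 0 <= x <= 2; the optimum is the origin.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 2.0, 2.0])
c = np.array([1.0, 1.0])
x = barrier_method(A, b, c, np.array([1.0, 1.0]))
```

On termination, f(x) − f∗ ≤ m/t < ε, so the returned point is ε-suboptimal while remaining strictly inside the box.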
• It terminates with f(x) − f∗ ≤ ε [follows from f(x∗(t)) − f∗ ≤ m/t]
• Centering steps are viewed as outer iterations
• Inside a centering step, computing x∗(t) is usually done using Newton’s method starting at the current x; these are the inner iterations
• Methods of this form are also referred to as sequential minimization techniques
• The choice of β involves a trade-off: a large β means fewer outer iterations but more inner (Newton) iterations
• Choosing t large enough to have m/t < ε in one iteration causes problems at points close to the boundary of the feasible set, where the Hessian is unstable:
    ∇2φ(x) = ∑_{j=1}^m (1/gj(x)²) ∇gj(x)∇gj(x)T + ∑_{j=1}^m (1/(−gj(x))) ∇2gj(x)
Convergence Analysis: Outer Iterations
Number of outer (centering) iterations: to determine x∗(t) such that f(x∗(t)) − f∗ ≤ ε, we need K such that
    m/(β^K t0) ≤ ε
Thus, the number K of outer iterations for accuracy ε is given exactly by
    K = ⌈ ln(m/(t0 ε)) / ln β ⌉
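For concreteness, K can be computed directly; the numbers below are illustrative:

```python
import math

def outer_iterations(m, t0, eps, beta):
    """Smallest K with m / (beta^K * t0) <= eps."""
    return math.ceil(math.log(m / (t0 * eps)) / math.log(beta))

K = outer_iterations(m=10, t0=1.0, eps=1e-5, beta=20.0)
print(K)   # here K = 5: 10 / 20^5 <= 1e-5, while 10 / 20^4 > 1e-5
```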
Convergence Analysis: Initialization and Inner Iterations

To complete the analysis of the method, we need to address
• The initial step of the method:
  • Determining a strictly feasible starting point x:
      x ∈ dom f, Ax = b, gj(x) < 0, j = 1, . . . , m
• The centering step: solving subproblems of the form
    minimize   tf(z) + φ(z)
    subject to Az = b
  • The functions tf + φ must have closed level sets for all t ≥ t0
  • Analysis via self-concordance requires self-concordance of tf + φ [thus, tf + φ must be three times differentiable]
Feasibility and Phase I Methods
• The barrier method requires a strictly feasible starting point x
• When such a point is not available, the method is preceded by a
preliminary stage (part of the initialization)
• Phase I: computing a strictly feasible point or finding that the
constraints are infeasible
• The point found in Phase I serves as the starting iterate in the method
• The latter stage is referred to as Phase II of the method
• Basic approach for determining a feasible x:
  Minimizing the maximum infeasibility
• Many variations of this basic approach exist, among them:
  Minimizing the sum of infeasibilities
Feasibility: Minimizing the Maximum Infeasibility
Basic phase I approach:
minimize   s
subject to gj(x) ≤ s, j = 1, . . . , m          (2)
           Ax = b
NOTE:
• The minimization is with respect to (x, s) with s ∈ R
• The domain of f has to be taken into account [added in the constraints]
Let dom f = Rn and let f∗ now denote the optimal value of the feasibility problem (2)
• The feasibility problem is strictly feasible: apply the barrier method
• When (x, s) with s < 0 is feasible for problem (2), x is strictly feasible in the original problem
• When f∗ > 0, the original problem is infeasible
• When f∗ = 0 and attained, the original problem is feasible [but not strictly]
• When f∗ = 0 and not attained, the original problem is infeasible
Note: In practice, we cannot exactly determine that f∗ = 0; typically, the feasibility algorithm terminates with |f∗| ≤ η for some tolerance η > 0
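When the gj are affine, problem (2) is itself an LP; a hedged sketch using scipy.optimize.linprog (the helper name and the example constraints are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def phase1_max_infeasibility(G, h):
    """minimize s  s.t.  Gx - s*1 <= h, over (x, s); affine constraints only."""
    n = G.shape[1]
    c = np.zeros(n + 1)
    c[-1] = 1.0                                       # objective: s
    A_ub = np.hstack([G, -np.ones((G.shape[0], 1))])  # Gx - s <= h
    res = linprog(c, A_ub=A_ub, b_ub=h,
                  bounds=[(None, None)] * (n + 1))    # x and s are free
    return res.x[:n], res.fun                         # candidate x and s*

# Constraints 1 <= x <= 2 written as Gx <= h: feasible, so s* < 0
G = np.array([[-1.0], [1.0]])
h = np.array([-1.0, 2.0])
x, s_star = phase1_max_infeasibility(G, h)
```

Here s∗ = −1/2 < 0 at x = 3/2, certifying strict feasibility of the original constraints, as described above.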
Alternative Approach:
Minimize the Sum of Infeasibilities
minimize   1Ts
subject to gj(x) ≤ sj, j = 1, . . . , m
           Ax = b, s ⪰ 0
• For a fixed x, the optimal value of sj is max{gj(x), 0} - so the problem indeed minimizes the sum of infeasibilities
• The optimal value is f∗ = 0 and attained if and only if the original problem is feasible
• Interesting property when the original problem is infeasible:
  • The approach produces a solution that satisfies many more constraints than the basic phase I approach [minimizing the max infeasibility]
  • Thus, it identifies a larger subset of the constraints that can be satisfied
NOTE:
• However, this model produces a feasible point but NOT a strictly feasible one
Analysis of the Inner Iterations
Analysis Assumptions:
• The feasible level sets of the original problem are bounded
• The function tf + φ is closed and self-concordant for all t ≥ t0
When f and the gj are linear or quadratic, the function tf − ∑_{j=1}^m ln(−gj) is self-concordant for all t ≥ 0 (this covers LPs, QPs and QCQPs)
NOTE:
The barrier method works well whether or not self-concordance is present
Inner Iterations
The iterations within a centering step: accuracy fixed to εc
• Start with x(t), a solution to min_{Az=b} {tf(z) + φ(z)}
• Use Newton’s method and backtracking line search with parameters α = 1, βc ∈ (0, 1) and σ ∈ (0, 1/2)
• Obtain x(βt), an εc-solution to the problem min_{Az=b} {βtf(z) + φ(z)}
Recall: to minimize a closed self-concordant function f, the number of Newton iterations, with backtracking line search and starting iterate x0, is bounded by
    (f(x0) − f∗)/Γ + κ
with Γ = σβc(1 − 2σ)²/(20 − 8σ) and κ = log2 log2(1/εc)
We apply this to the function βtf(z) + φ(z)
Bound on the Number of Inner Iterations

Under the level set boundedness and self-concordance assumptions on the functions tf + φ, t ≥ t0, the number of Newton iterations within a centering step is estimated as follows:
• Under the unrealistic assumption of exact solutions: x = x∗(t) and xβ = x∗(βt) optimal for the penalized problems with penalties t and βt, resp.
• We have: # Newton steps ≤ (βtf(x) + φ(x) − βtf(xβ) − φ(xβ))/Γ + κ
with Γ = σβc(1 − 2σ)²/(20 − 8σ) and κ = log2 log2(1/εc)
Therefore, with µj = −1/(t gj(x)):
numerator = βtf(x) − βtf(xβ) + ∑_{j=1}^m ln[−βt µj gj(xβ)] − m ln β
          ≤ βtf(x) − βtf(xβ) − βt ∑_{j=1}^m µj gj(xβ) − m − m ln β
          ≤ βtf(x) − βt q(µ, λ) − m − m ln β
          = βtf(x) − βtf(x) + mβ − m − m ln β
          = m(β − 1 − ln β)
A Total Bound on Newton Iterations

Combining the two bounds, the total number of Newton iterations satisfies
    N ≤ ⌈ ln(m/(t0 ε)) / ln β ⌉ ⌈ m(β − 1 − ln β)/Γ + κ ⌉
• The bound depends strongly on the parameter β
• The function β − 1 − ln β is almost quadratic for small β, and grows linearly for large β
• Hence, to keep the number of inner iterations small, it is advisable to use a smaller β [the number of inner iterations can be large for large β]
• The bound does not depend on n and p (the size of x and the number of rows of A)
• When β = 1 + 1/√m, the bound on the total number of Newton steps becomes
    N ≤ (1/(2Γ) + κ) ⌈ √m log2(m/(t0 ε)) ⌉
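The trade-off in β can be seen by evaluating the bound numerically; all parameter values below are illustrative:

```python
import math

def total_newton_bound(m, t0, eps, beta, gamma, kappa):
    """Outer-iteration count times the per-centering Newton bound."""
    outer = math.ceil(math.log(m / (t0 * eps)) / math.log(beta))
    inner = m * (beta - 1 - math.log(beta)) / gamma + kappa
    return outer * inner

# Illustrative parameters: sigma = 0.1, beta_c = 0.5 give this Gamma
gamma = 0.1 * 0.5 * (1 - 0.2)**2 / (20 - 0.8)
kappa = math.log2(math.log2(1e10))            # eps_c = 1e-10
m, t0, eps = 100, 1.0, 1e-6

small = total_newton_bound(m, t0, eps, 1 + 1 / math.sqrt(m), gamma, kappa)
large = total_newton_bound(m, t0, eps, 50.0, gamma, kappa)
# the bound for beta = 1 + 1/sqrt(m) is far smaller than for beta = 50
```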
Pros and Cons
Advantages
• The interior-point method is successfully applied in practice
• It works very well for moderately large problems
• It can even be applied to nonconvex problems (with the risk of getting “trapped” at a local minimum)
Limitations
• High differentiability requirements on f and the gj
• The method becomes inefficient when the size of the problem is very large (n in the millions) and the Hessians are not sparse
• It is not suitable for “distributed” computation in general
Example: Image Reconstruction in PET Scan [Ben-Tal, 2005]

• The maximum likelihood model results in the convex optimization problem
    minimize_{x ≥ 0, e′x ≤ 1}  −∑_{i=1}^m yi ln( ∑_{j=1}^n pij xj )
• x is the decision vector - size n
• y models the measured data (from the PET detectors) - size m
• pij are probabilities modeling the detection of emitted positrons
• The number n of decision variables ranges from 0.5 to 3 million
• The number m of data variables ranges from 3 to 25 million
• The Hessians are not sparse
Interior Point Method can be Inefficient
• For the PET Imaging problem [Ben-Tal, 2005]
• At image resolution 64 × 64 × 64 (n = 262,144), the CPU time per Newton iteration is about 2.5 hours
• At image resolution 128 × 128 × 128 (n = 2,097,152), the CPU time per Newton iteration is more than 13 days
• This motivates a renewed interest in gradient-type methods
• The complexity of a gradient-type step is linear in n
• In practice, their accuracy is not high, BUT in large-scale problems “high accuracy” is not required
• Gradient-type methods are well suited for medium-accuracy solutions to extremely large convex problems