
    Linear Conic Programming

    Yinyu Ye

    December 2004


    Chapter 1

Introduction and Preliminaries

    1.1 Introduction

Semidefinite Programming, hereafter SDP, is a natural extension of Linear Programming (LP) that is a central decision model in Management Science and Operations Research. LP plays an extremely important role in the theory and application of Optimization. In one sense it is a continuous optimization problem of minimizing a linear objective function over a convex polyhedron; but it is also a combinatorial problem involving selecting an extreme point among a finite set of possible vertices. Businesses, large and small, use linear programming models to optimize communication systems, to schedule transportation networks, to control inventories, to adjust investments, and to maximize productivity.

In LP, the variables form a vector which is required to be nonnegative, where in SDP they are components of a matrix which is constrained to be positive semidefinite. Both of them may have linear equality constraints as well. One thing in common is that the interior-point algorithms developed in the past two decades for LP can be naturally applied to solving SDP.

Interior-point algorithms are continuous iterative algorithms. Computation experience with sophisticated procedures suggests that the number of iterations necessary grows much more slowly than the dimension grows. Furthermore, they have an established worst-case polynomial iteration bound, providing the potential for dramatic improvement in computational effectiveness.

The goal of the monograph is to provide a text book for teaching Semidefinite Programming, a modern Linear Programming decision model, and its applications in other scientific and engineering fields. One theme of the monograph is the mapping between SDP and LP, so that the reader, with knowledge of LP, can understand SDP with little effort.

The monograph is organized as follows. In Chapter 1, we discuss some necessary mathematical preliminaries. We also present several decision and optimization problems and several basic numerical procedures used throughout the text.

Chapter 2 is devoted to studying the theories and geometries of linear and matrix inequalities, convexity, and semidefinite programming. Almost all interior-point methods exploit rich geometric properties of linear and matrix inequalities, such as center, volume, potential, etc. These geometries are also helpful for teaching, learning, and research.

Chapter 3 is focused on interior-point algorithms. Here, we select two types of algorithms: the path-following algorithm and the potential reduction algorithm. Each algorithm has three forms: the primal, the dual, and the primal-dual form. We analyze the worst-case complexity bounds for them, where we use the real number computation model in our analysis because of the continuous nature of interior-point algorithms. We also compare the complexity theory with the convergence rate used in numerical analysis.

Not only has the convergence speed of SDP algorithms been significantly improved during the last decade, but also the problem domain applicable by SDP has dramatically widened. Chapters 4, 5, and 6 describe some of the SDP applications and newly established results in Engineering, Combinatorial Optimization, Robust Optimization, Euclidean Geometry Computation, etc.

Finally, we discuss major computational issues in Chapter 7. We discuss several effective implementation techniques frequently used in interior-point SDP software, such as the sparse linear system, the predictor and corrector step, and the homogeneous and self-dual formulation. We also present major difficulties and challenges faced by SDP.

    1.2 Mathematical Preliminaries

This section summarizes mathematical background material for linear algebra, linear programming, and nonlinear optimization.

    1.2.1 Basic notations

The notation described below will be followed in general. There may be some deviation where appropriate.

By $R$ we denote the set of real numbers. $R_+$ denotes the set of nonnegative real numbers, and $\mathring{R}_+$ denotes the set of positive numbers. For a natural number $n$, the symbol $R^n$ ($R^n_+$, $\mathring{R}^n_+$) denotes the set of vectors with $n$ components in $R$ ($R_+$, $\mathring{R}_+$). We call $\mathring{R}^n_+$ the interior of $R^n_+$.

The vector inequality $x \ge y$ means $x_j \ge y_j$ for $j = 1, 2, \ldots, n$. Zero represents a vector whose entries are all zeros, and $e$ represents a vector whose entries are all ones, where their dimensions may vary according to other vectors in expressions. A vector is always considered as a column vector, unless otherwise stated. Upper-case letters will be used to represent matrices, and Greek letters will typically be used to represent scalars. For convenience, we sometimes write a column vector $x$ as

$$x = (x_1;\ x_2;\ \ldots;\ x_n)$$

and a row vector as

$$x = (x_1,\ x_2,\ \ldots,\ x_n).$$

Addition of matrices and multiplication of matrices with scalars are standard. The superscript $T$ denotes transpose operation. The inner product in $R^n$ is defined as follows:

$$\langle x, y \rangle := x^T y = \sum_{j=1}^{n} x_j y_j \quad \text{for } x, y \in R^n.$$

The $l_2$ norm of a vector $x$ is given by

$$\|x\|_2 = \sqrt{x^T x},$$

and the $l_\infty$ norm is

$$\|x\|_\infty = \max\{|x_1|, |x_2|, \ldots, |x_n|\}.$$

In general, the $l_p$ norm is

$$\|x\|_p = \Big(\sum_{j=1}^{n} |x_j|^p\Big)^{1/p}, \quad p = 1, 2, \ldots$$

The dual of the $l_p$ norm, denoted by $\|\cdot\|^*$, is the $l_q$ norm, where

$$\frac{1}{p} + \frac{1}{q} = 1.$$

In this book, $\|\cdot\|$ generally represents the $l_2$ norm.

For natural numbers $m$ and $n$, $R^{m\times n}$ denotes the set of real matrices with $m$ rows and $n$ columns. For $A \in R^{m\times n}$, we assume that the row index set of $A$ is $\{1, 2, \ldots, m\}$ and the column index set is $\{1, 2, \ldots, n\}$. The $i$th row of $A$ is denoted by $a_{i.}$ and the $j$th column of $A$ is denoted by $a_{.j}$; the $i$ and $j$th component of $A$ is denoted by $a_{ij}$. If $I$ is a subset of the row index set and $J$ is a subset of the column index set, then $A_I$ denotes the submatrix of $A$ whose rows belong to $I$, $A_J$ denotes the submatrix of $A$ whose columns belong to $J$, and $A_{IJ}$ denotes the submatrix of $A$ induced by those components of $A$ whose indices belong to $I$ and $J$, respectively.

The identity matrix is denoted by $I$. The null space of $A$ is denoted $\mathcal{N}(A)$ and the range of $A$ is $\mathcal{R}(A)$. The determinant of an $n \times n$ matrix $A$ is denoted by $\det(A)$. The trace of $A$, denoted by $\operatorname{tr}(A)$, is the sum of the diagonal entries in $A$. The operator norm of $A$, denoted by $\|A\|$, is

$$\|A\|^2 := \max_{0 \ne x \in R^n} \frac{\|Ax\|^2}{\|x\|^2}.$$
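As a quick illustration, the vector norms and the operator norm above can be computed directly; the following is a small Python/NumPy sketch (the test data are arbitrary):

    import numpy as np

    x = np.array([3.0, -4.0, 12.0])

    l2 = np.sqrt(x @ x)                        # l2 norm = sqrt(x^T x)
    linf = np.max(np.abs(x))                   # l_infinity norm
    lp = lambda p: (np.abs(x) ** p).sum() ** (1.0 / p)   # general l_p norm

    print(l2, np.linalg.norm(x))               # 13.0 both ways
    print(linf, np.linalg.norm(x, np.inf))     # 12.0 both ways
    print(lp(1), np.linalg.norm(x, 1))         # 19.0 both ways

    # Operator norm of A: max ||Ax|| / ||x||, equal to the largest singular value.
    A = np.array([[1.0, 2.0], [0.0, 3.0]])
    print(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])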


For a vector $x \in R^n$, $D_x$ represents a diagonal matrix in $R^{n\times n}$ whose diagonal entries are the entries of $x$, i.e.,

$$D_x = \operatorname{diag}(x).$$

A matrix $Q \in R^{n\times n}$ is said to be positive definite (PD), denoted by $Q \succ 0$, if

$$x^T Q x > 0, \quad \text{for all } x \ne 0,$$

and positive semi-definite (PSD), denoted by $Q \succeq 0$, if

$$x^T Q x \ge 0, \quad \text{for all } x.$$

If $-Q \succ 0$, then $Q$ is called negative definite (ND), denoted by $Q \prec 0$; if $-Q \succeq 0$, then $Q$ is called negative semi-definite (NSD), denoted by $Q \preceq 0$. If $Q$ is symmetric, then its eigenvalues are all real numbers; furthermore, $Q$ is PSD if and only if all its eigenvalues are non-negative, and $Q$ is PD if and only if all its eigenvalues are positive. Given a PD matrix $Q$ we can define a $Q$-norm, $\|\cdot\|_Q$, for vector $x$ as

$$\|x\|_Q = \sqrt{x^T Q x}.$$

$\mathcal{M}^n$ denotes the space of symmetric matrices in $R^{n\times n}$. The inner product in $\mathcal{M}^n$ is defined as follows:

$$\langle X, Y \rangle := X \bullet Y = \operatorname{tr} X^T Y = \sum_{i,j} X_{i,j} Y_{i,j} \quad \text{for } X, Y \in \mathcal{M}^n.$$

This is a generalization of the vector inner product to matrices. The matrix norm associated with the inner product is called the Frobenius norm:

$$\|X\|_f = \sqrt{\operatorname{tr} X^T X}.$$
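The matrix inner product, the Frobenius norm, and the eigenvalue characterization of PSD matrices above can be checked numerically; a Python/NumPy sketch with arbitrary example matrices:

    import numpy as np

    X = np.array([[2.0, -1.0], [-1.0, 2.0]])   # symmetric
    Y = np.array([[1.0, 0.0], [0.0, 3.0]])

    inner = np.trace(X.T @ Y)                  # X • Y = tr(X^T Y)
    frob = np.sqrt(np.trace(X.T @ X))          # ||X||_f
    print(inner, np.sum(X * Y))                # 8.0 both ways
    print(frob, np.linalg.norm(X, 'fro'))      # equal

    # Q is PSD iff all eigenvalues of the symmetric Q are nonnegative,
    # and PD iff they are all positive.
    eigs = np.linalg.eigvalsh(X)               # eigvalsh: for symmetric matrices
    print(eigs, np.all(eigs >= 0))             # [1. 3.] True -> X is in fact PD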

$\mathcal{M}^n_+$ denotes the set of positive semi-definite matrices in $\mathcal{M}^n$, and $\mathring{\mathcal{M}}^n_+$ denotes the set of positive definite matrices in $\mathcal{M}^n$. We call $\mathring{\mathcal{M}}^n_+$ the interior of $\mathcal{M}^n_+$.

$\{x^k\}_0^\infty$ is an ordered sequence $x^0, x^1, x^2, \ldots, x^k, \ldots$. A sequence $\{x^k\}_0^\infty$ is convergent to $\bar{x}$, denoted $x^k \to \bar{x}$, if $\|x^k - \bar{x}\| \to 0$. A point $\bar{x}$ is a limit point of $\{x^k\}_0^\infty$ if there is a subsequence of $\{x^k\}$ convergent to $\bar{x}$.

If $g(x) \ge 0$ is a real valued function of a real nonnegative variable, the notation $g(x) = O(x)$ means that $g(x) \le \bar{c}x$ for some constant $\bar{c}$; the notation $g(x) = \Omega(x)$ means that $g(x) \ge \underline{c}x$ for some constant $\underline{c}$; the notation $g(x) = \Theta(x)$ means that $\underline{c}x \le g(x) \le \bar{c}x$. Another notation is $g(x) = o(x)$, which means that $g(x)$ goes to zero faster than $x$ does:

$$\lim_{x \to 0} \frac{g(x)}{x} = 0.$$


    1.2.2 Convex sets

If $x$ is a member of the set $\Omega$, we write $x \in \Omega$; if $y$ is not a member of $\Omega$, we write $y \notin \Omega$. The union of two sets $S$ and $T$ is denoted $S \cup T$; the intersection of them is denoted $S \cap T$. A set can be specified in the form $\Omega = \{x : P(x)\}$ as the set of all elements satisfying property $P$.

For $y \in R^n$ and $\epsilon > 0$, $B(y, \epsilon) = \{x : \|x - y\| \le \epsilon\}$ is the ball or sphere of radius $\epsilon$ with center $y$. In addition, for a positive definite matrix $Q$ of dimension $n$, $E(y, Q) = \{x : (x - y)^T Q (x - y) \le 1\}$ is called an ellipsoid. The vector $y$ is the center of $E(y, Q)$.

A set $\Omega$ is closed if $x^k \to x$, where $x^k \in \Omega$, implies $x \in \Omega$. A set $\Omega$ is open if around every point $y \in \Omega$ there is a ball that is contained in $\Omega$, i.e., there is an $\epsilon > 0$ such that $B(y, \epsilon) \subset \Omega$. A set is bounded if it is contained within a ball with finite radius. A set is compact if it is both closed and bounded. The (topological) interior of any set $\Omega$, denoted $\mathring{\Omega}$, is the set of points in $\Omega$ which are the centers of some balls contained in $\Omega$. The closure of $\Omega$, denoted $\bar{\Omega}$, is the smallest closed set containing $\Omega$. The boundary of $\Omega$ is the part of $\bar{\Omega}$ that is not in $\mathring{\Omega}$.

A set $C$ is said to be convex if for every $x^1, x^2 \in C$ and every real number $\alpha$, $0 < \alpha < 1$, the point $\alpha x^1 + (1 - \alpha)x^2 \in C$. A set $C$ is a cone if $x \in C$ implies $\alpha x \in C$ for all $\alpha > 0$. A cone that is also convex is a convex cone. For a cone $C \subset \mathcal{E}$, the dual of $C$ is the cone

$$C^* := \{y : \langle x, y \rangle \ge 0 \text{ for all } x \in C\},$$

where $\langle \cdot, \cdot \rangle$ is an inner product operation for space $\mathcal{E}$.

Example 1.1 The $n$-dimensional non-negative orthant, $R^n_+ = \{x \in R^n : x \ge 0\}$, is a convex cone. The dual of the cone is also $R^n_+$; it is self-dual.

Example 1.2 The set of all positive semi-definite matrices in $\mathcal{M}^n$, $\mathcal{M}^n_+$, is a convex cone, called the positive semi-definite matrix cone. The dual of the cone is also $\mathcal{M}^n_+$; it is self-dual.

Example 1.3 The set $\{(t; x) \in R^{n+1} : t \ge \|x\|\}$ is a convex cone in $R^{n+1}$, called the second-order cone. The dual of the cone is also the second-order cone in $R^{n+1}$; it is self-dual.

A cone $C$ is (convex) polyhedral if $C$ can be represented by

$$C = \{x : Ax \le 0\}$$

for some matrix $A$ (Figure 1.1).

Example 1.4 The non-negative orthant is a polyhedral cone, and neither the positive semi-definite matrix cone nor the second-order cone is polyhedral.
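Membership in the three cones of Examples 1.1-1.3 is easy to test numerically; a Python/NumPy sketch (the helper functions and test points are illustrative, not from the text):

    import numpy as np

    def in_orthant(x):                 # x >= 0 componentwise
        return np.all(x >= 0)

    def in_psd_cone(X):                # symmetric X with nonnegative eigenvalues
        return np.all(np.linalg.eigvalsh(X) >= 0)

    def in_second_order_cone(t, x):    # t >= ||x||_2
        return t >= np.linalg.norm(x)

    print(in_orthant(np.array([1.0, 0.0, 2.0])))               # True
    print(in_psd_cone(np.array([[1.0, 2.0], [2.0, 1.0]])))     # False: eigenvalues -1, 3
    print(in_second_order_cone(5.0, np.array([3.0, 4.0])))     # True: 5 >= 5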


A bounded polyhedron is called a polytope.

Let $P$ be a polyhedron in $R^n$. $F$ is a face of $P$ if and only if there is a vector $c$ for which $F$ is the set of points attaining $\max\{c^T x : x \in P\}$, provided this maximum is finite. A polyhedron has only finitely many faces; each face is a nonempty polyhedron.

The most important theorem about the convex set is the following separating theorem (Figure 1.3).

Theorem 1.1 (Separating hyperplane theorem) Let $C \subset \mathcal{E}$, where $\mathcal{E}$ is either $R^n$ or $\mathcal{M}^n$, be a closed convex set and let $y$ be a point exterior to $C$. Then there is a vector $a \in \mathcal{E}$ such that

$$\langle a, y \rangle < \inf_{x \in C} \langle a, x \rangle.$$

The geometric interpretation of the theorem is that, given a convex set $C$ and a point $y$ outside of $C$, there is a hyperplane containing $y$ that contains $C$ in one of its open half spaces.

Figure 1.3: Illustration of the separating hyperplane theorem; an exterior point $y$ is separated by a hyperplane from a convex set $C$.

Example 1.5 Let $C$ be a unit circle centered at the point $(1; 1)$. That is, $C = \{x \in R^2 : (x_1 - 1)^2 + (x_2 - 1)^2 \le 1\}$. If $y = (2; 0)$, $a = (-1; 1)$ is a separating hyperplane vector. If $y = (0; -1)$, $a = (0; 1)$ is a separating hyperplane vector. It is worth noting that these separating hyperplanes are not unique.
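Example 1.5 can be verified numerically: for the unit ball centered at $(1; 1)$, $\inf_{x \in C} \langle a, x \rangle$ equals $a^T (1; 1) - \|a\|$. A Python/NumPy sketch (the third vector is an extra illustration of non-uniqueness):

    import numpy as np

    center = np.array([1.0, 1.0])      # C: unit ball centered at (1, 1)

    def separates(a, y):
        # inf over C of <a, x> = <a, center> - ||a|| for the unit ball
        inf_C = a @ center - np.linalg.norm(a)
        return a @ y < inf_C

    print(separates(np.array([-1.0, 1.0]), np.array([2.0, 0.0])))   # True: -2 < -sqrt(2)
    print(separates(np.array([0.0, 1.0]), np.array([0.0, -1.0])))   # True: -1 < 0
    print(separates(np.array([-2.0, 2.0]), np.array([2.0, 0.0])))   # True: scaling a also works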

We use the notation $\mathcal{E}$ to represent either $R^n$ or $\mathcal{M}^n$, depending on the context, throughout this book, because all our decision and optimization problems take variables from one or both of these two vector spaces.

    1.2.3 Real functions

The real function $f(x)$ is said to be continuous at $x$ if $x^k \to x$ implies $f(x^k) \to f(x)$. The real function $f(x)$ is said to be continuous on set $\Omega \subset \mathcal{E}$, where recall that $\mathcal{E}$ is either $R^n$ or $\mathcal{M}^n$, if $f(x)$ is continuous at $x$ for every $x \in \Omega$.


A function $f(x)$ is called homogeneous of degree $k$ if $f(\alpha x) = \alpha^k f(x)$ for all $\alpha \ge 0$.

Example 1.6 Let $c \in \mathring{R}^n$ be given and $x \in \mathring{R}^n_+$. Then $c^T x$ is homogeneous of degree 1 and

$$P(x) = n \log(c^T x) - \sum_{j=1}^{n} \log x_j$$

is homogeneous of degree 0, where $\log$ is the natural logarithmic function. Let $C \in \mathcal{M}^n$ be given and $X \in \mathring{\mathcal{M}}^n_+$. Then $x^T C x$ is homogeneous of degree 2, $C \bullet X$ is homogeneous of degree 1, $\det(X)$ is homogeneous of degree $n$, and

$$P(X) = n \log(C \bullet X) - \log \det(X)$$

is homogeneous of degree 0.

A set of real-valued functions $f_1, f_2, \ldots, f_m$ defined on $\mathcal{E}$ can be written as a single vector function $f = (f_1, f_2, \ldots, f_m)^T \in R^m$. If $f$ has continuous partial derivatives of order $p$, we say $f \in C^p$. The gradient vector or matrix of a real-valued function $f \in C^1$ is a vector or matrix

$$\nabla f(x) = \{\partial f / \partial x_{ij}\}, \quad \text{for } i, j = 1, \ldots, n.$$

If $f \in C^2$, we define the Hessian of $f$ to be the $n^2$-dimensional symmetric matrix

$$\nabla^2 f(x) = \left\{\frac{\partial^2 f}{\partial x_{ij}\,\partial x_{kl}}\right\} \quad \text{for } i, j, k, l = 1, \ldots, n.$$

If $f = (f_1, f_2, \ldots, f_m)^T \in R^m$, the Jacobian matrix of $f$ is

$$\nabla f(x) = \begin{pmatrix} \nabla f_1(x) \\ \vdots \\ \nabla f_m(x) \end{pmatrix}.$$

$f$ is a (continuous) convex function if and only if for $0 \le \alpha \le 1$,

$$f(\alpha x + (1 - \alpha)y) \le \alpha f(x) + (1 - \alpha)f(y).$$

$f$ is a (continuous) quasi-convex function if and only if for $0 \le \alpha \le 1$,

$$f(\alpha x + (1 - \alpha)y) \le \max[f(x), f(y)].$$

Thus, a convex function is a quasi-convex function. The level set of $f$ is given by

$$L(z) = \{x : f(x) \le z\}.$$

$f$ being a quasi-convex function implies that the level set of $f$ is convex for any given $z$ (see Exercise 1.9).

A group of results that are used frequently in analysis are under the heading of Taylor's theorem or the mean-value theorem. The theorem establishes the linear and quadratic approximations of a function.


Theorem 1.2 (Taylor expansion) Let $f \in C^1$ be in a region containing the line segment $[x, y]$. Then there is an $\alpha$, $0 \le \alpha \le 1$, such that

$$f(y) = f(x) + \nabla f(\alpha x + (1 - \alpha)y)(y - x).$$

Furthermore, if $f \in C^2$ then there is an $\alpha$, $0 \le \alpha \le 1$, such that

$$f(y) = f(x) + \nabla f(x)(y - x) + (1/2)(y - x)^T \nabla^2 f(\alpha x + (1 - \alpha)y)(y - x).$$

We also have several propositions for real functions. The first indicates that the linear approximation of a convex function is an under-estimate.

Proposition 1.3 Let $f \in C^1$. Then $f$ is convex over a convex set $\Omega$ if and only if

$$f(y) \ge f(x) + \nabla f(x)(y - x)$$

for all $x, y \in \Omega$.

The following proposition states that the Hessian of a convex function is positive semi-definite.

Proposition 1.4 Let $f \in C^2$. Then $f$ is convex over a convex set $\Omega$ if and only if the Hessian matrix of $f$ is positive semi-definite throughout $\Omega$.

    1.2.4 Inequalities

There are several important inequalities that are frequently used in algorithm design and complexity analysis.

Cauchy-Schwarz: given $x, y \in R^n$, then

$$x^T y \le \|x\|\,\|y\|.$$

Arithmetic-geometric mean: given $x \in R^n_+$,

$$\frac{\sum x_j}{n} \ge \Big(\prod x_j\Big)^{1/n}.$$

Harmonic: given $x \in \mathring{R}^n_+$,

$$\Big(\sum x_j\Big)\Big(\sum 1/x_j\Big) \ge n^2.$$

Hadamard: given $A \in R^{m\times n}$ with columns $a_1, a_2, \ldots, a_n$, then

$$\sqrt{\det(A^T A)} \le \prod \|a_j\|.$$
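A numerical sanity check of the four inequalities on sample data (a Python/NumPy sketch; this illustrates, but of course does not prove, them):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(5) + 0.1            # strictly positive vector
    y = rng.standard_normal(5)
    A = rng.standard_normal((4, 3))

    # Cauchy-Schwarz: x^T y <= ||x|| ||y||
    assert x @ y <= np.linalg.norm(x) * np.linalg.norm(y)

    # Arithmetic-geometric mean: (sum x_j)/n >= (prod x_j)^{1/n}
    assert x.mean() >= np.prod(x) ** (1.0 / x.size)

    # Harmonic: (sum x_j)(sum 1/x_j) >= n^2
    assert x.sum() * (1.0 / x).sum() >= x.size ** 2

    # Hadamard: sqrt(det(A^T A)) <= product of column norms
    assert np.sqrt(np.linalg.det(A.T @ A)) <= np.prod(np.linalg.norm(A, axis=0))

    print("all four inequalities hold on this sample")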


1.3 Some Basic Decision and Optimization Problems

A decision or optimization problem has a form that is usually characterized by the decision variables and the constraints. A problem, $\mathcal{P}$, consists of two sets, data set $\mathcal{Z}_p$ and solution set $\mathcal{S}_p$. In general, $\mathcal{S}_p$ can be implicitly defined by the so-called optimality conditions. The solution set may be empty, i.e., problem $\mathcal{P}$ may have no solution.

Theorem 1.5 (Weierstrass theorem) A continuous function $f$ defined on a compact set (bounded and closed) $\Omega \subset \mathcal{E}$ has a minimizer in $\Omega$; that is, there is an $x^* \in \Omega$ such that for all $x \in \Omega$, $f(x) \ge f(x^*)$.

In what follows, we list several decision and optimization problems. More problems will be listed later when we address them.

    1.3.1 System of linear equations

Given $A \in R^{m\times n}$ and $b \in R^m$, the problem is to solve $m$ linear equations for $n$ unknowns:

$$Ax = b.$$

The data and solution sets are

$$\mathcal{Z}_p = \{A \in R^{m\times n}, b \in R^m\} \quad \text{and} \quad \mathcal{S}_p = \{x \in R^n : Ax = b\}.$$

$\mathcal{S}_p$ in this case is an affine set. Given an $x$, one can easily check to see if $x$ is in $\mathcal{S}_p$ by a matrix-vector multiplication and a vector-vector comparison. We say that a solution of this problem is easy to recognize.

To highlight the analogy with the theories of linear inequalities and linear programming, we list several well-known results of linear algebra. The first theorem provides two basic representations, the null and row spaces, of a linear subspace.

Theorem 1.6 Each linear subspace of $R^n$ is generated by finitely many vectors, and is also the intersection of finitely many linear hyperplanes; that is, for each linear subspace $L$ of $R^n$ there are matrices $A$ and $C$ such that $L = \mathcal{N}(A) = \mathcal{R}(C)$.

The following theorem was observed by Gauss. It is sometimes called the fundamental theorem of linear algebra. It gives an example of a characterization in terms of necessary and sufficient conditions, where necessity is straightforward, and sufficiency is the key of the characterization.

Theorem 1.7 Let $A \in R^{m\times n}$ and $b \in R^m$. The system $\{x : Ax = b\}$ has a solution if and only if $A^T y = 0$ implies $b^T y = 0$.

A vector $y$, with $A^T y = 0$ and $b^T y = 1$, is called an infeasibility certificate for the system $\{x : Ax = b\}$.

Example 1.7 Let $A = (1;\ 1)$ and $b = (1;\ -1)$. Then, $y = (1/2;\ -1/2)$ is an infeasibility certificate for $\{x : Ax = b\}$.
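Example 1.7 can be checked in a few lines; the Python/NumPy sketch below also confirms via least squares that $Ax = b$ has no exact solution (signs as reconstructed above):

    import numpy as np

    A = np.array([[1.0], [1.0]])       # A = (1; 1): two equations, one unknown
    b = np.array([1.0, -1.0])
    y = np.array([0.5, -0.5])

    print(A.T @ y)                     # [0.] : A^T y = 0
    print(b @ y)                       # 1.0  : b^T y = 1, certifying infeasibility

    # Least squares confirms there is no exact solution: the residual is nonzero.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(x, np.linalg.norm(A @ x - b))   # x = [0.], residual sqrt(2)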


    1.3.2 Linear least-squares problem

Given $A \in R^{m\times n}$ and $c \in R^n$, the system of equations $A^T y = c$ may be over-determined or have no solution. Such a case usually occurs when the number of equations is greater than the number of variables. Then, the problem is to find a $y \in R^m$ or $s \in \mathcal{R}(A^T)$ such that $\|A^T y - c\|$ or $\|s - c\|$ is minimized. We can write the problem in the following format:

(LS) minimize   $\|A^T y - c\|^2$
     subject to $y \in R^m$,

or

(LS) minimize   $\|s - c\|^2$
     subject to $s \in \mathcal{R}(A^T)$.

In the former format, the term $\|A^T y - c\|^2$ is called the objective function, and $y$ is called the decision variable. Since $y$ can be any point in $R^m$, we say this (optimization) problem is unconstrained. The data and solution sets are

$$\mathcal{Z}_p = \{A \in R^{m\times n}, c \in R^n\}$$

and

$$\mathcal{S}_p = \{y \in R^m : \|A^T y - c\|^2 \le \|A^T x - c\|^2 \text{ for every } x \in R^m\}.$$

Given a $y$, to see if $y \in \mathcal{S}_p$ is as difficult as solving the original problem. However, from a projection theorem in linear algebra, the solution set can be characterized and represented as

$$\mathcal{S}_p = \{y \in R^m : AA^T y = Ac\},$$

which becomes a system of linear equations and always has a solution. The vector $s = A^T y = A^T (AA^T)^+ Ac$ is the projection of $c$ onto the range of $A^T$, where $AA^T$ is called the normal matrix and $(AA^T)^+$ is called the pseudo-inverse. If $A$ has full row rank then $(AA^T)^+ = (AA^T)^{-1}$, the standard inverse of the full rank matrix $AA^T$. If $A$ is not of full rank, neither is $AA^T$, and $(AA^T)^+ AA^T x = x$ only for $x \in \mathcal{R}(A)$.

The vector $c - A^T y = (I - A^T (AA^T)^+ A)c$ is the projection of $c$ onto the null space of $A$. It is the solution of the following least-squares problem:

(LS) minimize   $\|x - c\|^2$
     subject to $x \in \mathcal{N}(A)$.

In the full rank case, both matrices $A^T (AA^T)^{-1} A$ and $I - A^T (AA^T)^{-1} A$ are called projection matrices. These symmetric matrices have several desired properties (see Exercise 1.15).
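A Python/NumPy sketch of the projection characterization just described: solving the normal equations $AA^T y = Ac$ splits $c$ into its projections onto $\mathcal{R}(A^T)$ and $\mathcal{N}(A)$ (random data; full row rank assumed):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 4))    # full row rank with probability 1
    c = rng.standard_normal(4)

    y = np.linalg.solve(A @ A.T, A @ c)   # normal equations AA^T y = Ac
    s = A.T @ y                           # projection of c onto R(A^T)
    r = c - s                             # projection of c onto N(A)

    print(np.linalg.norm(A @ r))          # ~0: r lies in the null space of A
    print(s @ r)                          # ~0: the two projections are orthogonal

    # The same s via the pseudo-inverse: s = A^T (AA^T)^+ A c
    print(np.allclose(s, A.T @ np.linalg.pinv(A @ A.T) @ A @ c))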

    1.3.3 System of linear inequalities

Given $A \in R^{m\times n}$ and $b \in R^m$, the problem is to find a solution $x \in R^n$ satisfying $Ax \le b$ or prove that the solution set is empty. The inequality problem includes other forms such as finding an $x$ that satisfies the combination of linear equations $Ax = b$ and inequalities $x \ge 0$. The data and solution sets of the latter are

$$\mathcal{Z}_p = \{A \in R^{m\times n}, b \in R^m\} \quad \text{and} \quad \mathcal{S}_p = \{x \in R^n : Ax = b, x \ge 0\}.$$

Traditionally, a point in $\mathcal{S}_p$ is called a feasible solution, and a strictly positive point in $\mathcal{S}_p$ is called a strictly feasible or interior feasible solution.

The following results are Farkas' lemma and its variants.

Theorem 1.8 (Farkas' lemma) Let $A \in R^{m\times n}$ and $b \in R^m$. Then, the system $\{x : Ax = b, x \ge 0\}$ has a feasible solution $x$ if and only if $A^T y \le 0$ implies $b^T y \le 0$.

A vector $y$, with $A^T y \le 0$ and $b^T y = 1$, is called a (primal) infeasibility certificate for the system $\{x : Ax = b, x \ge 0\}$. Geometrically, Farkas' lemma means that if a vector $b \in R^m$ does not belong to the cone generated by $a_{.1}, \ldots, a_{.n}$, then there is a hyperplane separating $b$ from $\operatorname{cone}(a_{.1}, \ldots, a_{.n})$.

Example 1.8 Let $A = (1,\ 1)$ and $b = -1$. Then, $y = -1$ is an infeasibility certificate for $\{x : Ax = b, x \ge 0\}$.

Theorem 1.9 (Farkas' lemma variant) Let $A \in R^{m\times n}$ and $c \in R^n$. Then, the system $\{y : A^T y \le c\}$ has a solution $y$ if and only if $Ax = 0$ and $x \ge 0$ imply $c^T x \ge 0$.

Again, a vector $x \ge 0$, with $Ax = 0$ and $c^T x = -1$, is called a (dual) infeasibility certificate for the system $\{y : A^T y \le c\}$.

Example 1.9 Let $A = (1,\ -1)$ and $c = (1;\ -2)$. Then, $x = (1;\ 1)$ is an infeasibility certificate for $\{y : A^T y \le c\}$.

We say $\{x : Ax = b, x \ge 0\}$ or $\{y : A^T y \le c\}$ is approximately feasible in the sense that we have an approximate solution to the equations and inequalities. In this case we can show that any certificate proving their infeasibility must have large norm. Conversely, if $\{x : Ax = b, x \ge 0\}$ or $\{y : A^T y \le c\}$ is approximately infeasible in the sense that we have an approximate certificate in Farkas' lemma, then any feasible solution must have large norm.

Example 1.10 Given $\epsilon > 0$ but small. Let $A = (1,\ 1)$ and $b = -\epsilon$. Then, $x = (0;\ 0)$ is approximately feasible for $\{x : Ax = b, x \ge 0\}$, and the infeasibility certificate $y = -1/\epsilon$ has a large norm.

Let $A = (1,\ -1)$ and $c = (1;\ -1-\epsilon)$. Then, $y = 1$ is approximately feasible for $\{y : A^T y \le c\}$, and the infeasibility certificate $x = (1/\epsilon;\ 1/\epsilon)$ has a large norm.
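The certificates in Examples 1.8 and 1.9 can be verified directly (a Python/NumPy sketch, using the signs as reconstructed above):

    import numpy as np

    # Example 1.8: A = (1, 1), b = -1; y = -1 certifies {x : Ax = b, x >= 0} empty.
    A = np.array([[1.0, 1.0]])
    b = np.array([-1.0])
    y = np.array([-1.0])
    print(A.T @ y <= 0, b @ y)         # [[True] [True]], 1.0: A^T y <= 0 and b^T y = 1

    # Example 1.9: A = (1, -1), c = (1; -2); x = (1; 1) certifies {y : A^T y <= c} empty.
    A2 = np.array([[1.0, -1.0]])       # here m = 1, n = 2
    c = np.array([1.0, -2.0])
    x = np.array([1.0, 1.0])
    print(A2 @ x, c @ x)               # [0.], -1.0: Ax = 0, x >= 0, c^T x = -1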


    1.3.4 Linear programming (LP)

Given $A \in R^{m\times n}$, $b \in R^m$ and $c, l, u \in R^n$, the linear programming (LP) problem is the following optimization problem:

minimize   $c^T x$
subject to $Ax = b,\ l \le x \le u$,

where some elements in $l$ may be $-\infty$, meaning that the associated variables are unbounded from below, and some elements in $u$ may be $\infty$, meaning that the associated variables are unbounded from above. If a variable is unbounded either from below or above, then it is called a free variable.

The standard form linear programming problem is given below, which we will use throughout this book:

(LP) minimize   $c^T x$
     subject to $Ax = b,\ x \ge 0$.

The linear function $c^T x$ is called the objective function, and $x$ is called the decision variables. In this problem, $Ax = b$ and $x \ge 0$ enforce constraints on the selection of $x$. The set $\mathcal{F}_p = \{x : Ax = b, x \ge 0\}$ is called the feasible set or feasible region. A point $x \in \mathcal{F}_p$ is called a feasible point, and a feasible point $x^*$ is called an optimal solution if $c^T x^* \le c^T x$ for all feasible points $x$. If there is a sequence $\{x^k\}$ such that $x^k$ is feasible and $c^T x^k \to -\infty$, then (LP) is said to be unbounded.

The data and solution sets for (LP), respectively, are

$$\mathcal{Z}_p = \{A \in R^{m\times n}, b \in R^m, c \in R^n\}$$

and

$$\mathcal{S}_p = \{x \in \mathcal{F}_p : c^T x \le c^T y \text{ for every } y \in \mathcal{F}_p\}.$$

Again, given an $x$, to see if $x \in \mathcal{S}_p$ is as difficult as solving the original problem. However, due to the duality theorem, we can simplify the representation of the solution set significantly.

With every (LP), another linear program, called the dual (LD), is the following problem:

(LD) maximize   $b^T y$
     subject to $A^T y + s = c,\ s \ge 0$,

where $y \in R^m$ and $s \in R^n$. The components of $s$ are called dual slacks. Denote by $\mathcal{F}_d$ the set of all $(y, s)$ that are feasible for the dual. We see that (LD) is also a linear programming problem where $y$ is a free vector.

The following theorems give us an important relation between the two problems.

Theorem 1.10 (Weak duality theorem) Let $\mathcal{F}_p$ and $\mathcal{F}_d$ be non-empty. Then,

$$c^T x \ge b^T y \quad \text{where } x \in \mathcal{F}_p,\ (y, s) \in \mathcal{F}_d.$$


This theorem shows that a feasible solution to either problem yields a bound on the value of the other problem. We call $c^T x - b^T y$ the duality gap. From this we have important results.

Theorem 1.11 (Strong duality theorem) Let $\mathcal{F}_p$ and $\mathcal{F}_d$ be non-empty. Then, $x^*$ is optimal for (LP) if and only if the following conditions hold:

i) $x^* \in \mathcal{F}_p$;

ii) there is $(y^*, s^*) \in \mathcal{F}_d$;

iii) $c^T x^* = b^T y^*$.

Theorem 1.12 (LP duality theorem) If (LP) and (LD) both have feasible solutions then both problems have optimal solutions and the optimal objective values of the objective functions are equal.

If one of (LP) or (LD) has no feasible solution, then the other is either unbounded or has no feasible solution. If one of (LP) or (LD) is unbounded then the other has no feasible solution.

The above theorems show that if a pair of feasible solutions can be found to the primal and dual problems with equal objective values, then these are both optimal. The converse is also true; there is no gap. From this condition, the solution set for (LP) and (LD) is

$$\mathcal{S}_p = \left\{(x, y, s) \in (R^n_+, R^m, R^n_+) : \begin{array}{rcl} c^T x - b^T y & = & 0 \\ Ax & = & b \\ A^T y + s & = & c \end{array}\right\}, \tag{1.1}$$

which is a system of linear inequalities and equations. Now it is easy to verify whether or not a pair $(x, y, s)$ is optimal.

For feasible $x$ and $(y, s)$, $x^T s = x^T(c - A^T y) = c^T x - b^T y$ is called the complementarity gap. If $x^T s = 0$, then we say $x$ and $s$ are complementary to each other. Since both $x$ and $s$ are nonnegative, $x^T s = 0$ implies that $x_j s_j = 0$ for all $j = 1, \ldots, n$. Thus, one equation plus nonnegativity is transformed into $n$ equations. Equations in (1.1) become

$$\begin{array}{rcl} D_x s & = & 0 \\ Ax & = & b \\ A^T y + s & = & c. \end{array} \tag{1.2}$$

This system has a total of $2n + m$ unknowns and $2n + m$ equations, including $n$ nonlinear equations.
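A sketch of verifying conditions (1.1)-(1.2) on a tiny instance, assuming SciPy's scipy.optimize.linprog with the HiGHS method, which exposes the equality-constraint duals as res.eqlin.marginals (sign conventions may vary across versions; the data are arbitrary):

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 1.0, 1.0]])    # one equality constraint, n = 3
    b = np.array([1.0])
    c = np.array([2.0, 1.0, 3.0])

    res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
    x = res.x
    y = res.eqlin.marginals            # dual variables for Ax = b
    s = c - A.T @ y                    # dual slacks

    print(x)                           # [0. 1. 0.]
    print(c @ x - b @ y)               # duality gap ~ 0
    print(x * s)                       # componentwise complementarity x_j s_j ~ 0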

The following theorem plays an important role in analyzing LP interior-point algorithms. It gives a unique partition of the LP variables in terms of complementarity.


Theorem 1.13 (Strict complementarity theorem) If (LP) and (LD) both have feasible solutions then both problems have a pair of strictly complementary solutions $x^* \ge 0$ and $s^* \ge 0$ meaning

$$X^* s^* = 0 \quad \text{and} \quad x^* + s^* > 0.$$

Moreover, the supports

$$P^* = \{j : x^*_j > 0\} \quad \text{and} \quad Z^* = \{j : s^*_j > 0\}$$

are invariant for all pairs of strictly complementary solutions.

Given (LP) or (LD), the pair of $P^*$ and $Z^*$ is called the (strict) complementarity partition. $\{x : A_{P^*} x_{P^*} = b,\ x_{P^*} \ge 0,\ x_{Z^*} = 0\}$ is called the primal optimal face, and $\{y : c_{Z^*} - A^T_{Z^*} y \ge 0,\ c_{P^*} - A^T_{P^*} y = 0\}$ is called the dual optimal face.

Select $m$ linearly independent columns, denoted by the index set $B$, from $A$. Then matrix $A_B$ is nonsingular and we may uniquely solve

$$A_B x_B = b$$

for the $m$-vector $x_B$. By setting the variables, $x_N$, of $x$ corresponding to the remaining columns of $A$ equal to zero, we obtain a solution $x$ such that

$$Ax = b.$$

Then, $x$ is said to be a (primal) basic solution to (LP) with respect to the basis $A_B$. The components of $x_B$ are called basic variables. A dual vector $y$ satisfying

$$A^T_B y = c_B$$

is said to be the corresponding dual basic solution. If a basic solution $x \ge 0$, then $x$ is called a basic feasible solution. If the dual solution is also feasible, that is

$$s = c - A^T y \ge 0,$$

then $x$ is called an optimal basic solution and $A_B$ an optimal basis. A basic feasible solution is a vertex on the boundary of the feasible region. An optimal basic solution is an optimal vertex of the feasible region.

If one or more components in $x_B$ have value zero, that basic solution $x$ is said to be (primal) degenerate. Note that in a nondegenerate basic solution the basic variables and the basis can be immediately identified from the nonzero components of the basic solution. If all components, $s_N$, in the corresponding dual slack vector $s$, except for $s_B$, are non-zero, then $y$ is said to be (dual) nondegenerate. If both primal and dual basic solutions are nondegenerate, $A_B$ is called a nondegenerate basis.

Theorem 1.14 (LP fundamental theorem) Given (LP) and (LD) where $A$ has full row rank $m$,

i) if there is a feasible solution, there is a basic feasible solution;

ii) if there is an optimal solution, there is an optimal basic solution.

The above theorem reduces the task of solving a linear program to that of searching over basic feasible solutions. By expanding upon this result, the simplex method, a finite search procedure, is derived. The simplex method proceeds from one basic feasible solution (an extreme point of the feasible region) to an adjacent one, in such a way as to continuously decrease the value of the objective function until a minimizer is reached. In contrast, interior-point algorithms will move in the interior of the feasible region and reduce the value of the objective function, hoping to by-pass many extreme points on the boundary of the region.

1.3.5 Quadratic programming (QP)

Given $Q \in R^{n\times n}$, $A \in R^{m\times n}$, $b \in R^m$ and $c \in R^n$, the quadratic programming (QP) problem is the following optimization problem:

(QP) minimize   $q(x) := (1/2)x^T Q x + c^T x$
     subject to $Ax = b,\ x \ge 0$.

We may denote the feasible set by $\mathcal{F}_p$. The data and solution sets for (QP) are

$$\mathcal{Z}_p = \{Q \in R^{n\times n}, A \in R^{m\times n}, b \in R^m, c \in R^n\}$$

and

$$\mathcal{S}_p = \{x \in \mathcal{F}_p : q(x) \le q(y) \text{ for every } y \in \mathcal{F}_p\}.$$

A feasible point $x^*$ is called a KKT point, where KKT stands for Karush-Kuhn-Tucker, if the following KKT conditions hold: there exists $(y^* \in R^m, s^* \in R^n)$ such that $(x^*, y^*, s^*)$ is feasible for the following dual problem:

(QD) maximize   $d(x, y) := b^T y - (1/2)x^T Q x$
     subject to $A^T y + s - Qx = c,\ x, s \ge 0$,

and satisfies the complementarity condition

$$(x^*)^T s^* = (1/2)(x^*)^T Q x^* + c^T x^* - \big(b^T y^* - (1/2)(x^*)^T Q x^*\big) = 0.$$

Similar to LP, we can write the KKT condition as:

$$(x, y, s) \in (R^n_+, R^m, R^n_+)$$

and

$$\begin{array}{rcl} D_x s & = & 0 \\ Ax & = & b \\ -Qx + A^T y + s & = & c. \end{array} \tag{1.3}$$

Again, this system has a total of $2n + m$ unknowns and $2n + m$ equations, including $n$ nonlinear equations.


The above condition is also called the first-order necessary condition. If $Q$ is positive semi-definite, then $x^*$ is an optimal solution for (QP) if and only if $x^*$ is a KKT point for (QP). In this case, the solution set for (QP) is characterized by a system of linear inequalities and equations. One can see that (LP) is a special case of (QP).
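A sketch of checking the KKT system (1.3) at a candidate point of a tiny convex QP (the data and the candidate multiplier are hand-built for illustration; this is not a QP solver):

    import numpy as np

    # minimize (1/2) x^T Q x + c^T x  s.t.  x1 + x2 = 1, x >= 0
    Q = np.array([[2.0, 0.0], [0.0, 2.0]])
    c = np.array([1.0, -3.0])
    A = np.array([[1.0, 1.0]])
    b = np.array([1.0])

    x = np.array([0.0, 1.0])           # candidate solution
    y = np.array([-1.0])               # multiplier for Ax = b
    s = c + Q @ x - A.T @ y            # from -Qx + A^T y + s = c

    print(A @ x - b)                   # feasibility: [0.]
    print(s)                           # dual slacks [2. 0.], both >= 0
    print(x * s)                       # complementarity x_j s_j = 0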

    1.4 Algorithms and Computations

An algorithm is a list of instructions to solve a problem. For every instance of problem $\mathcal{P}$, i.e., for every given data $Z \in \mathcal{Z}_p$, an algorithm for solving $\mathcal{P}$ either determines that $\mathcal{S}_p$ is empty or generates an output $\hat{x}$ such that $\hat{x} \in \mathcal{S}_p$ or $\hat{x}$ is close to $\mathcal{S}_p$ in certain measure. The latter $\hat{x}$ is called an approximate solution.

Let us use $\mathcal{A}_p$ to denote the collection of all possible algorithms for solving every instance in $\mathcal{P}$. Then, the (operation) complexity of an algorithm $\mathcal{A} \in \mathcal{A}_p$ for solving an instance $Z \in \mathcal{Z}_p$ is defined as the total arithmetic operations: $+$, $-$, $\times$, $/$, and comparison on real numbers. Denote it by $c_o(\mathcal{A}, Z)$. Sometimes it is convenient to define the iteration complexity, denoted by $c_i(\mathcal{A}, Z)$, where we assume that each iteration costs a polynomial number (in $m$ and $n$) of arithmetic operations. In most iterative algorithms, each iteration can be performed efficiently both sequentially and in parallel, such as solving a system of linear equations, rank-one updating the inverse of a matrix, pivoting operation of a matrix, multiplying a matrix by a vector, etc.

In the real number model, we introduce $\epsilon$, the error for an approximate solution, as a parameter. Let $c(\mathcal{A}, Z, \epsilon)$ be the total number of operations of algorithm $\mathcal{A}$ for generating an $\epsilon$-approximate solution, with a well-defined measure, to problem $\mathcal{P}$. Then,

$$c(\mathcal{A}, \epsilon) := \sup_{Z \in \mathcal{Z}_p} c(\mathcal{A}, Z, \epsilon) \le f_{\mathcal{A}}(m, n, \epsilon) \quad \text{for any } \epsilon > 0.$$

We call this complexity model error-based. One may also view an approximate solution as an exact solution to a problem near to $\mathcal{P}$ with a well-defined measure in the data space. This is the so-called backward analysis model in numerical analysis.

If $f_{\mathcal{A}}(m, n, \epsilon)$ is a polynomial in $m$, $n$, and $\log(1/\epsilon)$, then algorithm $\mathcal{A}$ is a polynomial algorithm and problem $\mathcal{P}$ is polynomially solvable. Again, if $f_{\mathcal{A}}(m, n, \epsilon)$ is independent of $\epsilon$ and polynomial in $m$ and $n$, then we say algorithm $\mathcal{A}$ is a strongly polynomial algorithm. If $f_{\mathcal{A}}(m, n, \epsilon)$ is a polynomial in $m$, $n$, and $1/\epsilon$, then algorithm $\mathcal{A}$ is a polynomial approximation scheme or pseudo-polynomial algorithm. For some optimization problems, the complexity theory can be applied to prove not only that they cannot be solved in polynomial time, but also that they do not have polynomial approximation schemes. Approximation algorithms are nonetheless widely used and accepted in practice.

Example 1.11 There is a strongly polynomial algorithm for sorting a vector in descending or ascending order, for matrix-vector multiplication, and for computing the norm of a vector.

Example 1.12 Consider the bisection method to locate a root of a continuous function $f(x) : R \to R$ within interval $[0, 1]$, where $f(0) > 0$ and $f(1) < 0$. The method calls the oracle to evaluate $f(1/2)$ (counted as one operation). If $f(1/2) > 0$, we throw away $[0, 1/2)$; if $f(1/2) < 0$, we throw away $(1/2, 1]$. Then we repeat this process on the remaining half interval. Each step of the method halves the interval that contains the root. Thus, in $\log(1/\epsilon)$ steps, we must have an approximate root whose distance to the root is less than $\epsilon$. Therefore, the bisection method is a polynomial algorithm.
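A direct transcription of the bisection method of Example 1.12 (Python sketch; the test function is an arbitrary choice satisfying $f(0) > 0 > f(1)$):

    import math

    def bisect(f, lo=0.0, hi=1.0, eps=1e-6):
        # assumes f(lo) > 0 > f(hi); each step halves the bracket
        while hi - lo > eps:
            mid = 0.5 * (lo + hi)
            if f(mid) > 0:
                lo = mid               # root lies in (mid, hi]
            else:
                hi = mid               # root lies in [lo, mid]
        return 0.5 * (lo + hi)

    f = lambda x: math.cos(3 * x)      # f(0) = 1 > 0, f(1) = cos 3 < 0
    root = bisect(f)
    print(root, math.pi / 6)           # both ~ 0.5235987; about log(1/eps) ~ 20 steps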

We have to admit that the criterion of polynomiality is somewhat controversial. Many algorithms may not be polynomial but work fine in practice. This is because polynomiality is built upon the worst-case analysis. However, this criterion generally provides a qualitative statement: if a problem is polynomially solvable, then the problem is indeed relatively easy to solve regardless of the algorithm used. Furthermore, it is ideal to develop an algorithm with both polynomiality and practical efficiency.

    1.4.1 Convergence rate

Most algorithms are iterative in nature. They generate a sequence of ever-improving points $x^0, x^1, \ldots, x^k, \ldots$ approaching the solution set. For many optimization problems and/or algorithms, the sequence will never exactly reach the solution set. One theory of iterative algorithms, referred to as local or asymptotic convergence analysis, is concerned with the rate at which the optimality error of the generated sequence converges to zero.

Obviously, if each iteration of competing algorithms requires the same amount of work, the speed of the convergence of the error reflects the speed of the algorithm. This convergence rate, although it may hold locally or asymptotically, provides evaluation and comparison of different algorithms. It has been widely used by the nonlinear optimization and numerical analysis community as an efficiency criterion. In many cases, this criterion does explain practical behavior of iterative algorithms.

Consider a sequence of real numbers $\{r_k\}$ converging to zero. One can define several notions related to the speed of convergence of such a sequence.

Definition 1.1 Let the sequence $\{r_k\}$ converge to zero. The order of convergence of $\{r_k\}$ is defined as the supremum of the nonnegative numbers $p$ satisfying

$$0 \le \limsup_{k \to \infty} \frac{|r_{k+1}|}{|r_k|^p} < \infty.$$

Definition 1.2 Let the sequence $\{r_k\}$ converge to zero such that

$$\limsup_{k \to \infty} \frac{|r_{k+1}|}{|r_k|^2} < \infty.$$

Then, the sequence is said to converge quadratically to zero.

It should be noted that the order of convergence is determined only by the properties of the sequence that hold as $k \to \infty$. In this sense we might say that the order of convergence is a measure of how good the tail of $\{r_k\}$ is. Larger values of $p$ imply faster convergence of the tail.

Definition 1.3 Let the sequence $\{r_k\}$ converge to zero such that

$$\limsup_{k \to \infty} \frac{|r_{k+1}|}{|r_k|} = \beta < 1.$$

Then, the sequence is said to converge linearly (geometrically) to zero with convergence ratio $\beta$.

1.5 Basic Computational Procedures

    1.5.1 Gaussian elimination method

Probably the best-known algorithm for solving a system of linear equations is the Gaussian elimination method. Suppose we want to solve

$$Ax = b.$$

We may assume $a_{11} \ne 0$ after some row switching, where $a_{ij}$ is the component of $A$ in row $i$ and column $j$. Then we can subtract appropriate multiples of the first equation from the other equations so as to have an equivalent system:

$$\begin{pmatrix} a_{11} & A_{1.} \\ 0 & A' \end{pmatrix} \begin{pmatrix} x_1 \\ x' \end{pmatrix} = \begin{pmatrix} b_1 \\ b' \end{pmatrix}.$$

This is a pivot step, where $a_{11}$ is called a pivot, and $A'$ is called a Schur complement. Now, recursively, we solve the system of the last $m - 1$ equations for $x'$. Substituting the solution $x'$ found into the first equation yields a value for $x_1$. The last process is called back-substitution.

In matrix form, the Gaussian elimination method transforms $A$ into the form

$$\begin{pmatrix} U & C \\ 0 & 0 \end{pmatrix},$$

where $U$ is a nonsingular, upper-triangular matrix,

$$A = L \begin{pmatrix} U & C \\ 0 & 0 \end{pmatrix},$$

and $L$ is a nonsingular, lower-triangular matrix. This is called the LU-decomposition. Sometimes, the matrix is transformed further to a form

$$\begin{pmatrix} D & C \\ 0 & 0 \end{pmatrix},$$

where $D$ is a nonsingular, diagonal matrix. This whole procedure uses about $nm^2$ arithmetic operations. Thus, it is a strongly polynomial-time algorithm.
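A sketch of the LU route in Python, assuming SciPy's lu_factor/lu_solve (which perform the elimination with row switching and the back-substitution):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[2.0, 1.0, 1.0],
                  [4.0, -6.0, 0.0],
                  [-2.0, 7.0, 2.0]])
    b = np.array([5.0, -2.0, 9.0])

    lu, piv = lu_factor(A)             # Gaussian elimination with pivoting
    x = lu_solve((lu, piv), b)         # forward/back-substitution

    print(x)                           # [1. 1. 2.]
    print(np.allclose(A @ x, b))       # True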

    1.5.2 Choleski decomposition method

Another useful method is to solve the least squares problem:

(LS) minimize $\|A^T y - c\|$.

The theory says that $y^*$ minimizes $\|A^T y - c\|$ if and only if

$$AA^T y^* = Ac.$$

So the problem is reduced to solving a system of linear equations with a symmetric positive semi-definite matrix.


One method is Choleski's decomposition. In matrix form, the method transforms $AA^T$ into the form

$$AA^T = L\Lambda L^T,$$

where $L$ is a lower-triangular matrix and $\Lambda$ is a diagonal matrix. (Such a transformation can be done in about $nm^2$ arithmetic operations as indicated in the preceding section.) $L$ is called the Choleski factor of $AA^T$. Thus, the above linear system becomes

$$L\Lambda L^T y^* = Ac,$$

and $y^*$ can be obtained by solving two triangular systems of linear equations.
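A sketch of solving the normal equations by a Choleski factorization in Python/NumPy. Note that numpy.linalg.cholesky returns the variant $AA^T = LL^T$, which absorbs $\Lambda^{1/2}$ of the $L\Lambda L^T$ form above into $L$:

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 6))    # full row rank, so AA^T is positive definite
    c = rng.standard_normal(6)

    L = np.linalg.cholesky(A @ A.T)    # AA^T = L L^T
    z = solve_triangular(L, A @ c, lower=True)    # forward substitution: L z = Ac
    y = solve_triangular(L.T, z, lower=False)     # back substitution: L^T y = z

    print(np.allclose(A @ A.T @ y, A @ c))        # True: y solves the normal equations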

    1.5.3 The Newton method

The Newton method is used to solve a system of nonlinear equations: given $f(x) : R^n \to R^n$, the problem is to solve $n$ equations for $n$ unknowns such that

$$f(x) = 0.$$

The idea behind Newton's method is to use the Taylor linear approximation at the current iterate $x^k$ and let the approximation be zero:

$$f(x) \simeq f(x^k) + \nabla f(x^k)(x - x^k) = 0.$$

The Newton method is thus defined by the following iterative formula:

$$x^{k+1} = x^k - \alpha (\nabla f(x^k))^{-1} f(x^k),$$

where scalar $\alpha \ge 0$ is called the step-size. Rarely, however, is the Jacobian matrix $\nabla f$ inverted. Generally the system of linear equations

$$\nabla f(x^k) d_x = -f(x^k)$$

is solved and $x^{k+1} = x^k + \alpha d_x$ is used. The direction vector $d_x$ is called a Newton step, which can be carried out in strongly polynomial time.

A modified or quasi-Newton method is defined by

$$x^{k+1} = x^k - \alpha M^k f(x^k),$$

where $M^k$ is an $n \times n$ symmetric matrix. In particular, if $M^k = I$, the method is called the gradient method, where $f$ is viewed as the gradient vector of a real function.

The Newton method has a superior asymptotic convergence order equal to 2 for $\|f(x^k)\|$. It is frequently used in interior-point algorithms, and is believed to be the key to their effectiveness.
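A sketch of the Newton iteration on a small two-equation system (Python/NumPy, full step $\alpha = 1$); the residual norms illustrate the quadratic convergence discussed above:

    import numpy as np

    def f(v):                          # f : R^2 -> R^2
        x, y = v
        return np.array([x**2 + y**2 - 4.0, x - y])

    def jac(v):                        # Jacobian of f
        x, y = v
        return np.array([[2*x, 2*y], [1.0, -1.0]])

    v = np.array([2.0, 1.0])
    for _ in range(6):
        dx = np.linalg.solve(jac(v), -f(v))   # Newton step: J dx = -f
        v = v + dx                            # full step (alpha = 1)
        print(np.linalg.norm(f(v)))           # residual roughly squares each iteration

    print(v)                           # ~ (sqrt(2), sqrt(2))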


    1.5.4 Solving ball-constrained linear problem

The ball-constrained linear problem has the following form:

(BP) minimize   $c^T x$
     subject to $Ax = 0,\ \|x\|^2 \le 1$,

or

(BD) minimize   $b^T y$
     subject to $\|A^T y\|^2 \le 1$.

$x^*$ minimizes (BP) if and only if there exists a $y$ such that they satisfy

$$AA^T y = Ac,$$

and if $c - A^T y \ne 0$ then

$$x^* = -(c - A^T y)/\|c - A^T y\|;$$

otherwise any feasible $x$ is a solution. The solution $y^*$ for (BD) is given as follows: Solve

$$AA^T \bar{y} = b,$$

and if $\bar{y} \ne 0$ then set

$$y^* = -\bar{y}/\|A^T \bar{y}\|;$$

otherwise any feasible $y$ is a solution. So these two problems can be reduced to solving a system of linear equations.
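A sketch implementing the recipe for (BP) above (Python/NumPy, with the signs as reconstructed; random data):

    import numpy as np

    def solve_bp(A, c):
        # minimize c^T x  s.t.  Ax = 0, ||x|| <= 1
        y = np.linalg.solve(A @ A.T, A @ c)
        r = c - A.T @ y                      # projection of c onto N(A)
        if np.linalg.norm(r) > 1e-12:
            return -r / np.linalg.norm(r)    # x* = -(c - A^T y)/||c - A^T y||
        return np.zeros_like(c)              # any feasible x is optimal

    rng = np.random.default_rng(3)
    A = rng.standard_normal((2, 5))
    c = rng.standard_normal(5)
    x = solve_bp(A, c)

    print(np.linalg.norm(A @ x), np.linalg.norm(x))  # ~0 and 1: feasible, on the ball
    r = c - A.T @ np.linalg.solve(A @ A.T, A @ c)
    print(c @ x, -np.linalg.norm(r))                 # optimal value equals -||r||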

    1.5.5 Solving ball-constrained quadratic problem

The ball-constrained quadratic problem has the following form:

(BP) minimize   $(1/2)x^T Q x + c^T x$
     subject to $Ax = 0,\ \|x\|^2 \le 1$,

or simply

(BD) minimize   $(1/2)y^T Q y + b^T y$
     subject to $\|y\|^2 \le 1$.

This problem is used by the classical trust region method for nonlinear optimization. The optimality conditions for the minimizer $y^*$ of (BD) are

$$(Q + \mu^* I)y^* = -b, \quad \mu^* \ge 0, \quad \|y^*\|^2 \le 1, \quad \mu^*(1 - \|y^*\|^2) = 0,$$

and

$$(Q + \mu^* I) \succeq 0.$$

These conditions are necessary and sufficient. This problem can be solved in polynomial time $\log(1/\epsilon)$ and $\log(\log(1/\epsilon))$ by the bisection method or a hybrid of the bisection and Newton methods, respectively. In practice, several trust region procedures have been very effective in solving this problem.

The ball-constrained quadratic problem will be used as a sub-problem by several interior-point algorithms in solving complex optimization problems. We will discuss them later in the book.
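The optimality conditions above suggest a simple bisection on the multiplier $\mu$, since $\|y(\mu)\|$ with $(Q + \mu I)y(\mu) = -b$ decreases in $\mu$. A Python/NumPy sketch (it ignores the degenerate "hard case" of trust-region theory):

    import numpy as np

    def ball_qp(Q, b, iters=60):
        # minimize (1/2) y^T Q y + b^T y  s.t.  ||y|| <= 1
        y = np.linalg.solve(Q, -b) if np.all(np.linalg.eigvalsh(Q) > 0) else None
        if y is not None and np.linalg.norm(y) <= 1:
            return y                    # interior solution, mu* = 0
        lo = max(0.0, -np.linalg.eigvalsh(Q).min())   # need Q + mu I PSD
        hi = lo + np.linalg.norm(b) + 1.0             # ||y(mu)|| < 1 for this mu
        for _ in range(iters):          # bisect on ||y(mu)|| = 1
            mu = 0.5 * (lo + hi)
            y = np.linalg.solve(Q + mu * np.eye(len(b)), -b)
            lo, hi = (mu, hi) if np.linalg.norm(y) > 1 else (lo, mu)
        return y

    Q = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite Q is allowed
    b = np.array([1.0, 1.0])
    y = ball_qp(Q, b)
    print(np.linalg.norm(y))                  # ~1: the ball constraint is active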


    1.6 Notes

The term complexity was introduced by Hartmanis and Stearns [155]. Also see Garey and Johnson [118] and Papadimitriou and Steiglitz [248]. The NP theory was due to Cook [70] and Karp [180]. The importance of P was observed by Edmonds [88].

Linear programming and the simplex method were introduced by Dantzig [73]. Other inequality problems and convexity theories can be seen in Gritzmann and Klee [141], Grotschel, Lovasz and Schrijver [142], Grunbaum [143], Rockafellar [264], and Schrijver [271]. Various complementarity problems can be found in Cottle, Pang and Stone [72]. Positive semi-definite programming, an optimization problem in nonpolyhedral cones, and its applications can be seen in Nesterov and Nemirovskii [241], Alizadeh [8], and Boyd, Ghaoui, Feron and Balakrishnan [56]. Recently, Goemans and Williamson [125] obtained several breakthrough results on approximation algorithms using positive semi-definite programming. The KKT condition for nonlinear programming was given by Karush, Kuhn and Tucker [195].

It was shown by Klee and Minty [184] that the simplex method is not a polynomial-time algorithm. The ellipsoid method, the first polynomial-time algorithm for linear programming with rational data, was proven by Khachiyan [181]; also see Bland, Goldfarb and Todd [52]. The method was devised independently by Shor [277] and by Nemirovskii and Yudin [239]. The interior-point method, another polynomial-time algorithm for linear programming, was developed by Karmarkar. It is related to the classical barrier-function method studied by Frisch [109] and Fiacco and McCormick [104]; see Gill, Murray, Saunders, Tomlin and Wright [124], and Anstreicher [21]. For a brief LP history, see the excellent article by Wright [323].

The real computation model was developed by Blum, Shub and Smale [53] and Nemirovskii and Yudin [239]. Other complexity issues in numerical optimization were discussed in Vavasis [318].

Many basic numerical procedures listed in this chapter can be found in Golub and Van Loan [133]. The ball-constrained quadratic problem and its solution methods can be seen in More [229], Sorenson [281], and Dennis and Schnable [76]. The complexity result of the ball-constrained quadratic problem was proved by Vavasis [318] and Ye [332].

    1.7 Exercises

1.1 Let $Q \in R^{n\times n}$ be a given nonsingular matrix, and $a$ and $b$ be given $R^n$ vectors. Show

$$(Q + ab^T)^{-1} = Q^{-1} - \frac{1}{1 + b^T Q^{-1} a}\, Q^{-1} a b^T Q^{-1}.$$

This formula is called the Sherman-Morrison-Woodbury formula.


1.2 Prove that the eigenvalues of all matrices $Q \in \mathcal{M}^n$ are real. Furthermore, show that $Q$ is PSD if and only if all its eigenvalues are non-negative, and $Q$ is PD if and only if all its eigenvalues are positive.

1.3 Using the ellipsoid representation in Section 1.2.2, find the matrix $Q$ and vector $y$ that describe the following ellipsoids:

1. The 3-dimensional sphere of radius 2 centered at the origin;

2. The 2-dimensional ellipsoid centered at $(1; 2)$ that passes through the points $(0; 2)$, $(1; 0)$, $(2; 2)$, and $(1; 4)$;

3. The 2-dimensional ellipsoid centered at $(1; 2)$ with axes parallel to the lines $y = x$ and $y = -x$, and passing through $(-1; 0)$, $(3; 4)$, $(0; 3)$, and $(2; 1)$.

1.4 Show that the biggest coordinate-aligned ellipsoid that is entirely contained in $R^n_+$ and has its center at $x^a \in \mathring{R}^n_+$ can be written as:

$$E(x^a) = \{x \in R^n : \|(X^a)^{-1}(x - x^a)\| \le 1\}.$$

1.5 Show that the non-negative orthant, the positive semi-definite cone, and the second-order cone are all self-dual.

1.6 Consider the convex set $C = \{x \in R^2 : (x_1 - 1)^2 + (x_2 - 1)^2 \le 1\}$ and let $y \in R^2$. Assuming $y \notin C$,

1. Find the point in $C$ that is closest to $y$;

2. Find a separating hyperplane vector as a function of $y$.

1.7 Using the idea of Exercise 1.6, prove the separating hyperplane theorem 1.1.

1.8 Given an $m \times n$ matrix $A$ and a vector $c \in R^n$, consider the function $\mathcal{B}(y) = \sum_{j=1}^{n} \log s_j$ where $s = c - A^T y > 0$. Find $\nabla \mathcal{B}(y)$ and $\nabla^2 \mathcal{B}(y)$ in terms of $s$.

Given $C \in \mathcal{M}^n$, $A_i \in \mathcal{M}^n$, $i = 1, \ldots, m$, and $b \in R^m$, consider the function $\mathcal{B}(y) := \log \det(S)$, where $S = C - \sum_{i=1}^{m} y_i A_i \succ 0$. Find $\nabla \mathcal{B}(y)$ and $\nabla^2 \mathcal{B}(y)$ in terms of $S$.

The best way to do this is to use the definition of the partial derivative

$$\frac{\partial f(y)}{\partial y_i} = \lim_{\delta \to 0} \frac{f(y_1, y_2, \ldots, y_i + \delta, \ldots, y_m) - f(y_1, y_2, \ldots, y_i, \ldots, y_m)}{\delta}.$$

    1.9 Prove that the level set of a quasi-convex function is convex.

    1.10 Prove Propositions 1.3 and 1.4 for convex functions in Section 1.2.3.


1.11 Let $f_1, \ldots, f_m$ be convex functions. Then, the functions $f(x)$ defined below are also convex:

$$f(x) = \max_{i=1,\ldots,m} f_i(x)$$

and

$$f(x) = \sum_{i=1}^{m} f_i(x).$$

    1.12 Prove the Harmonic inequality described in Section 1.2.4.

1.13 Prove Farkas' lemma 1.7 for linear equations.

    1.14 Prove the linear least-squares problem always has a solution.

1.15 Let $P = A^T(AA^T)^{-1}A$ or $P = I - A^T(AA^T)^{-1}A$. Then prove

1. $P = P^2$.

2. $P$ is positive semi-definite.

3. The eigenvalues of $P$ are either 0 or 1.

1.16 Using the separating theorem, prove Farkas' lemmas 1.8 and 1.9.

1.17 If a system $A^T y \le c$ of linear inequalities in $m$ variables has no solution, show that $A^T y \le c$ has a subsystem $(A')^T y \le c'$ of at most $m + 1$ inequalities having no solution.

    1.18 Prove the LP fundamental theorem 1.14.

1.19 If (LP) and (LD) have a nondegenerate optimal basis $A_B$, prove that the strict complementarity partition in Theorem 1.13 is

$$P^* = B.$$

1.20 If $Q$ is positive semi-definite, prove that $x^*$ is an optimal solution for (QP) if and only if $x^*$ is a KKT point for (QP).

1.21 Prove $X \bullet S \ge 0$ if both $X$ and $S$ are positive semi-definite matrices.

1.22 Prove that two positive semi-definite matrices are complementary to each other, $X \bullet S = 0$, if and only if $XS = 0$.

1.23 Let both (LP) and (LD) for a given data set $(A, b, c)$ have interior feasible points. Then consider the level set

$$\Omega(z) = \{y : c - A^T y \ge 0,\ -z + b^T y \ge 0\}$$

where $z < z^*$ and $z^*$ designates the optimal objective value. Prove that $\Omega(z)$ is bounded and has an interior for any finite $z < z^*$, even if $\mathcal{F}_d$ is unbounded.


1.24 Given an (LP) data set $(A, b, c)$ and an interior feasible point $x^0$, find the feasible direction $d_x$ ($Ad_x = 0$) that achieves the steepest decrease in the objective function.

1.25 Given an (LP) data set $(A, b, c)$ and a feasible point $(x^0, y^0, s^0) \in (\mathring{R}^n_+, R^m, \mathring{R}^n_+)$ for the primal and dual, and ignoring the nonnegativity condition, write the systems of linear equations used to calculate the Newton steps for finding points that satisfy the optimality equations (1.2) and (1.3), respectively.

1.26 Show that the optimality conditions for the minimizer $y^*$ of (BD) in Section 1.5.5:

$$(Q + \mu^* I)y^* = -b, \quad \mu^* \ge 0, \quad \|y^*\| \le 1, \quad \mu^*(1 - \|y^*\|) = 0,$$

and

$$(Q + \mu^* I) \succeq 0,$$

are necessary and sufficient.


    Chapter 2

    Semidefinite Programming

    2.0.1 Semi-definite programming (SDP)

Given $C \in \mathcal{M}^n$, $A_i \in \mathcal{M}^n$, $i = 1, 2, \ldots, m$, and $b \in R^m$, the semi-definite programming problem is to find a matrix $X \in \mathcal{M}^n$ for the optimization problem:

(SDP) inf        $C \bullet X$
      subject to $A_i \bullet X = b_i$, $i = 1, 2, \ldots, m$, $X \succeq 0$.

Recall that the $\bullet$ operation is the matrix inner product

$$A \bullet B := \operatorname{tr} A^T B.$$

The notation $X \succeq 0$ means that $X$ is a positive semi-definite matrix, and $X \succ 0$ means that $X$ is a positive definite matrix. If a point $X \succ 0$ satisfies all equations in (SDP), it is called a (primal) strictly or interior feasible solution.

The dual problem to (SDP) can be written as:

(SDD) sup        $b^T y$
      subject to $\sum_{i=1}^{m} y_i A_i + S = C$, $S \succeq 0$,

which is analogous to the dual (LD) of LP. Here $y \in R^m$ and $S \in \mathcal{M}^n$. If a point $(y, S \succ 0)$ satisfies all equations in (SDD), it is called a dual interior feasible solution.

Example 2.1 Let $P(y \in R^m) = C + \sum_{i=1}^{m} y_i A_i$, where $C$ and $A_i$, $i = 1, \ldots, m$, are given symmetric matrices. The problem of minimizing the max-eigenvalue of $P(y)$ can be cast as an (SDD) problem.

In semi-definite programming, we minimize a linear function of a matrix in the positive semi-definite matrix cone subject to affine constraints. In contrast to the positive orthant cone of linear programming, the positive semi-definite matrix cone is non-polyhedral (or non-linear), but convex. So positive semi-definite programs are convex optimization problems. Semi-definite programming unifies several standard problems, such as linear programming, quadratic programming, and convex quadratic minimization with convex quadratic constraints, and finds many applications in engineering, control, and combinatorial optimization.

From Farkas' lemma for linear programming, whenever the system $\{x : Ax = b, x \ge 0\}$ is infeasible, a vector $y$ with $A^T y \le 0$ and $b^T y = 1$ always exists; it is called a (primal) infeasibility certificate for the system. But this does not hold for matrix equations in the positive semi-definite matrix cone.

Example 2.2 Consider

$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 0 \\ 2 \end{pmatrix},$$

where we have the following matrix system

$$A_1 \bullet X = 0, \quad A_2 \bullet X = 2, \quad X \in \mathcal{M}^2_+.$$

The problem is that $\{y : y_i = A_i \bullet X,\ i = 1, 2, \ldots, m,\ X \succeq 0\}$ is not a closed set.

Similarly,

$$C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$

make $C - y_1 A_1 \succeq 0$ infeasible, but the system does not have an infeasibility certificate.

We have several theorems analogous to Farkas' lemma.

Theorem 2.1 (Farkas' lemma in SDP) Let $A_i \in \mathcal{M}^n$, $i = 1, \ldots, m$, have rank $m$ (i.e., $\sum_{i=1}^{m} y_i A_i = 0$ implies $y = 0$) and $b \in R^m$. Then, there exists a symmetric matrix $X \succ 0$ with

$$A_i \bullet X = b_i, \quad i = 1, \ldots, m,$$

if and only if $\sum_{i=1}^{m} y_i A_i \preceq 0$ and $\sum_{i=1}^{m} y_i A_i \ne 0$ imply $b^T y < 0$.

In other words, an $X \succeq 0$, $X \ne 0$, with

$$A_i \bullet X = 0, \quad i = 1, \ldots, m,$$

and $C \bullet X \le 0$ proves that $\sum_{i=1}^{m} y_i A_i \prec C$ is impossible. Note the difference between the LP and SDP.

    The weak duality theorem for SDP is identical to that of (LP) and (LD).


Corollary 2.3 (Weak duality theorem in SDP) Let $\mathcal{F}_p$ and $\mathcal{F}_d$, the feasible sets for the primal and dual, be non-empty. Then,

$$C \bullet X \ge b^T y \quad \text{where } X \in \mathcal{F}_p,\ (y, S) \in \mathcal{F}_d.$$

But we need more to make the strong duality theorem hold.

Theorem 2.4 (Strong duality theorem in SDP) Let $\mathcal{F}_p$ and $\mathcal{F}_d$ be non-empty and at least one of them have an interior. Then, $X$ is optimal for (SDP) if and only if the following conditions hold:

i) $X \in \mathcal{F}_p$;

ii) there is $(y, S) \in \mathcal{F}_d$;

iii) $C \bullet X = b^T y$ or $X \bullet S = 0$.

Again note the difference between the above theorem and the strong duality theorem for LP.

Example 2.3 The following SDP has a duality gap:

$$C = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix},$$

and

$$b = \begin{pmatrix} 0 \\ 10 \end{pmatrix}.$$
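With the signs as reconstructed above, the gap can be exhibited numerically: every primal feasible $X$ has value 0, while dual feasibility forces $y_2 = -1$ and value $-10$. A Python/NumPy sketch:

    import numpy as np

    C  = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
    A1 = np.array([[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]])
    A2 = np.array([[0., -1., 0.], [-1., 0., 0.], [0., 0., 2.]])
    b  = np.array([0., 10.])

    # Primal: A1•X = 0 forces X22 = 0, hence X12 = 0 for X PSD,
    # so A2•X = 2*X33 = 10 and C•X = 2*X12 = 0.
    X = np.diag([0., 0., 5.])
    print(np.trace(A1 @ X), np.trace(A2 @ X), np.trace(C @ X))   # 0.0 10.0 0.0

    # Dual: S = C - y1*A1 - y2*A2 PSD forces y2 = -1, so b^T y = -10.
    y = np.array([0., -1.])
    S = C - y[0] * A1 - y[1] * A2
    print(np.linalg.eigvalsh(S) >= -1e-12, b @ y)   # S is PSD, dual value -10.0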

Two positive semi-definite matrices are complementary to each other, $X \bullet S = 0$, if and only if $XS = 0$ (Exercise 1.22). From the optimality conditions, the solution set for certain (SDP) and (SDD) is

$$\mathcal{S}_p = \{X \in \mathcal{F}_p,\ (y, S) \in \mathcal{F}_d : C \bullet X - b^T y = 0\},$$

or

$$\mathcal{S}_p = \{X \in \mathcal{F}_p,\ (y, S) \in \mathcal{F}_d : X \bullet S = 0\},$$

which is a system of linear matrix inequalities and equations.

In general, we have

Theorem 2.5 (SDP duality theorem) If one of (SDP) or (SDD) has a strictly or interior feasible solution and its optimal value is finite, then the other is feasible and has the same optimal value. If one of (SDP) or (SDD) is unbounded then the other has no feasible solution.

Note that a duality gap may exist if neither (SDP) nor (SDD) has a strictly feasible point. This is in contrast to (LP) and (LD), where no duality gap exists if both are feasible.

Although semi-definite programs are much more general than linear programs, they are not much harder to solve. It has turned out that most interior-point methods for LP have been generalized to semi-definite programs. As in LP, these algorithms possess polynomial worst-case complexity under certain computation models. They also perform well in practice. We will describe such extensions later in this book.


    2.1 Analytic Center

    2.1.1 AC for polytope

Let $\Omega$ be a bounded polytope in $R^m$ represented by $n$ ($> m$) linear inequalities, i.e.,

$$\Omega = \{y \in R^m : c - A^T y \ge 0\},$$

where $A \in R^{m\times n}$ and $c \in R^n$ are given and $A$ has rank $m$. Denote the interior of $\Omega$ by

$$\mathring{\Omega} = \{y \in R^m : c - A^T y > 0\}.$$

Define

$$d(y) = \prod_{j=1}^{n} (c_j - a_{.j}^T y), \quad y \in \Omega,$$

where $a_{.j}$ is the $j$th column of $A$. Traditionally, we let $s := c - A^T y$ and call it a slack vector. Thus, the function is the product of all slack variables. Its logarithm is called the (dual) potential function,

$$\mathcal{B}(y) := \log d(y) = \sum_{j=1}^{n} \log(c_j - a_{.j}^T y) = \sum_{j=1}^{n} \log s_j, \tag{2.1}$$

and $\mathcal{B}(y)$ is the classical logarithmic barrier function. For convenience, in what follows we may write $\mathcal{B}(s)$ to replace $\mathcal{B}(y)$ where $s$ is always equal to $c - A^T y$.

Example 2.4 Let $A = (1,\ -1)$ and $c = (1;\ 1)$. Then the set $\Omega$ is the interval $[-1, 1]$. Let $A = (1,\ -1,\ 1)$ and $c = (1;\ 1;\ 1)$. Then the set $\Omega$ is also the interval $[-1, 1]$. Note that, for the two representations,

$$d(1/2) = (3/2)(1/2) = 3/4 \quad \text{and} \quad \mathcal{B}(1/2) = \log(3/4),$$

and

$$d(1/2) = (3/2)(1/2)(1/2) = 3/8 \quad \text{and} \quad \mathcal{B}(1/2) = \log(3/8).$$

The interior point, denoted by $y^a$ and $s^a = c - A^T y^a$, in $\mathring{\Omega}$ that maximizes the potential function is called the analytic center of $\Omega$, i.e.,

$$\mathcal{B}(\Omega) := \mathcal{B}(y^a, \Omega) = \max_y \log d(y, \Omega).$$

$(y^a, s^a)$ is uniquely defined, since the potential function is strictly concave in a bounded convex set. Setting $\nabla \mathcal{B}(y, \Omega) = 0$ and letting $x^a = (S^a)^{-1} e$, the analytic center $(y^a, s^a)$ together with $x^a$ satisfies the following optimality conditions:

$$\begin{array}{rcl} Xs & = & e \\ Ax & = & 0 \\ A^T y + s & = & c. \end{array} \tag{2.2}$$

Note that adding or deleting a redundant inequality changes the location of the analytic center.
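Conditions (2.2) can be reproduced numerically by maximizing $\mathcal{B}(y)$ with a damped Newton method, using $\nabla \mathcal{B}(y) = -A s^{-1}$ (componentwise inverse) and $\nabla^2 \mathcal{B}(y) = -A S^{-2} A^T$. A Python/NumPy sketch on a small box (the data are illustrative):

    import numpy as np

    # Omega = {y in R^2 : c - A^T y >= 0}: the box [-1,1]^2, analytic center at 0
    A = np.array([[1.0, -1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, -1.0]])
    c = np.array([1.0, 1.0, 1.0, 1.0])

    y = np.array([0.3, -0.2])                  # any interior starting point
    for _ in range(20):
        s = c - A.T @ y
        grad = -A @ (1.0 / s)                  # gradient of B(y) = sum log s_j
        hess = -A @ np.diag(1.0 / s**2) @ A.T  # Hessian (negative definite)
        dy = np.linalg.solve(-hess, grad)      # Newton step for maximizing B
        alpha = 1.0
        while np.any(c - A.T @ (y + alpha * dy) <= 0):
            alpha *= 0.5                       # damp the step to stay interior
        y = y + alpha * dy

    s = c - A.T @ y
    x = 1.0 / s                                # x^a = (S^a)^{-1} e
    print(y)                                   # ~ [0, 0]
    print(x * s, A @ x)                        # conditions (2.2): Xs = e, Ax = 0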


Example 2.5 Consider $\Omega = \{y \in R : y \ge 0,\ -y \ge -1\}$, which is the interval $[0, 1]$. The analytic center is $y^a = 1/2$ with $x^a = (2, 2)^T$.

Consider

$$\Omega = \{y \in R : \underbrace{y \ge 0,\ \ldots,\ y \ge 0}_{n\ \text{times}},\ -y \ge -1\},$$

which is, again, the interval $[0, 1]$ but with $y \ge 0$ copied $n$ times. The analytic center for this system is $y^a = n/(n+1)$ with $x^a = ((n+1)/n, \ldots, (n+1)/n,\ (n+1))^T$.

The analytic center can be defined when the interior is empty or equalities are presented, such as

$$\Omega = \{y \in R^m : c - A^T y \ge 0,\ By = b\}.$$

Then the analytic center is chosen on the hyperplane $\{y : By = b\}$ to maximize the product of the slack variables $s = c - A^T y$. Thus, the interior of $\Omega$ is not used in the sense that the topological interior for a set is used. Rather, it refers to the interior of the positive orthant of slack variables: $R^n_+ := \{s : s \ge 0\}$. When we say $\Omega$ has an interior, we mean that

$$\mathring{R}^n_+ \cap \{s : s = c - A^T y \text{ for some } y \text{ where } By = b\} \ne \emptyset.$$

Again, $\mathring{R}^n_+ := \{s \in R^n_+ : s > 0\}$, i.e., the interior of the orthant $R^n_+$. Thus, if $\Omega$ has only a single point $y$ with $s = c - A^T y > 0$, we still say $\mathring{\Omega}$ is not empty.

    Again

    Rn+:={s Rn+ : s >0}, i.e., the interior of the orthantRn+. Thus, if has only a single pointy with s = c ATy >0, we still say is not empty.Example 2.6 Consider the system = {x: Ax= 0, eTx= n, x 0}, whichis calledKarmarkars canonical set. Ifx = e is in thene is the analytic centerof, the intersection of the simplex{x: eTx= n, x0} and the hyperplane{x: Ax= 0} (Figure 2.1).

    2.1.2 AC for SDP

Let $\Omega$ be a bounded convex set in $R^m$ represented by a matrix inequality of dimension $n$ ($> m$), i.e.,

$$\Omega = \{y \in R^m : C - \sum_{i=1}^{m} y_i A_i \succeq 0\}.$$

Let $S = C - \sum_{i=1}^{m} y_i A_i$ and

$$\mathcal{B}(y) := \log \det(S) = \log \det\Big(C - \sum_{i=1}^{m} y_i A_i\Big). \tag{2.3}$$

The interior point, denoted by $y^a$ and $S^a = C - \sum_{i=1}^{m} y^a_i A_i$, in $\mathring{\Omega}$ that maximizes the potential function is called the analytic center of $\Omega$, i.e.,

$$\max_y\ \mathcal{B}(y).$$


Figure 2.1: Illustration of the Karmarkar (simplex) polytope and its analytic center: the simplex {x : e^T x = 3, x ≥ 0} with vertices (3,0,0), (0,3,0), (0,0,3), cut by the hyperplane Ax = 0 through the analytic center (1,1,1).

(y^a, S^a) is uniquely defined, since the potential function is strictly concave over a bounded convex region. Setting ∇B(y, Ω) = 0 and letting X^a = (S^a)^{−1}, the analytic center (y^a, S^a) together with X^a satisfies the following optimality conditions:

    XS = I
    AX = 0        (2.4)
    A^T y + S = C,

where, mirroring the LP conditions (2.2), AX := (A_1 • X; ...; A_m • X) and A^T y := Σ_{i=1}^m y_i A_i.
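In the matrix case one can proceed as in the polytope case, now ascending B(y) = log det(C − Σ_i y_i A_i). Below is a minimal gradient-ascent sketch with backtracking (a Newton variant would converge faster); all helper names and the tiny test instance are our own, not from the text.

```python
import numpy as np

def sdp_analytic_center(C, As, y0, iters=200, tol=1e-10):
    """Maximize B(y) = log det(C - sum_i y_i A_i) by gradient ascent
    with backtracking; a simple sketch."""
    S_of = lambda y: C - sum(yi * Ai for yi, Ai in zip(y, As))
    def B(y):
        S = S_of(y)
        return np.linalg.slogdet(S)[1] if np.min(np.linalg.eigvalsh(S)) > 0 else -np.inf
    y = np.atleast_1d(y0).astype(float)
    for _ in range(iters):
        Sinv = np.linalg.inv(S_of(y))
        g = -np.array([np.trace(Sinv @ Ai) for Ai in As])  # dB/dy_i = -A_i . S^{-1}
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        while t > 1e-16 and B(y + t * g) <= B(y):          # ascend, stay pos. definite
            t *= 0.5
        y = y + t * g
    return y

# tiny check: C = I, A_1 = diag(1, -1); the center is y = 0, where A_1 . X = 0
C, As = np.eye(2), [np.diag([1.0, -1.0])]
y = sdp_analytic_center(C, As, np.array([0.5]))
X = np.linalg.inv(C - y[0] * As[0])
print(y, np.trace(X @ As[0]))   # both ~ 0; XS = I holds by construction
```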

    2.2 Potential Functions for LP and SDP

We show how potential functions can be defined to solve both linear programming and semidefinite programming problems.

We assume that for a given LP data set (A, b, c), both the primal and the dual have interior feasible points. We also let z* be the optimal value of the standard form (LP) and (LD). Denote the feasible sets of (LP) and (LD) by F_p and F_d, respectively, denote F = F_p ∩ F_d, and denote the interior of F by F̊.

    2.2.1 Primal potential function for LP

Consider the level set

    Ω(z) = {y ∈ R^m : c − A^T y ≥ 0, −z + b^T y ≥ 0},    (2.5)

where z < z*. Since both (LP) and (LD) have interior feasible points for the given (A, b, c), Ω(z) is bounded and has an interior for any finite z, even if Ω := F_d is unbounded (Exercise 1.23). Clearly, Ω(z) ⊂ Ω, and if z_2 ≥ z_1, then Ω(z_2) ⊆ Ω(z_1) and the inequality −z + b^T y ≥ 0 is translated from z = z_1 to z = z_2.


From the duality theorem again, finding a point in Ω̊(z) amounts to solving a homogeneous primal problem:

    minimize   c^T x − z x_0
    s.t.       Ax − b x_0 = 0,  (x, x_0) ≥ 0.

For (x, x_0) satisfying

    Ax − b x_0 = 0,  (x, x_0) > 0,

let x̄ := x/x_0 ∈ F̊_p, i.e.,

    A x̄ = b,  x̄ > 0.

Then, the primal potential function for Ω(z) (Figure 2.2), as described in the preceding section, is

    P(x, Ω(z)) = (n + 1) log(c^T x − z x_0) − Σ_{j=0}^n log x_j
               = (n + 1) log(c^T x̄ − z) − Σ_{j=1}^n log x̄_j =: P_{n+1}(x̄, z).

The latter, P_{n+1}(x̄, z), is the Karmarkar potential function in the standard LP form with a lower bound z for z*.

Figure 2.2: Intersections of a dual feasible region and the objective hyperplane; b^T y ≥ z (the objective hyperplane) on the left and b^T y ≥ b^T y^a (the updated objective hyperplane) on the right.

One algorithm for solving (LD) is suggested in Figure 2.2. If the objective hyperplane is repeatedly translated to the analytic center, the sequence of new analytic centers will converge to an optimal solution, and the potentials of the new polytopes will decrease to −∞.

As we illustrated before, one can represent Ω(z) differently:

    Ω(z) = {y : c − A^T y ≥ 0, −z + b^T y ≥ 0, ..., −z + b^T y ≥ 0},    (2.6)

i.e., −z + b^T y ≥ 0 is copied ρ times. Geometrically, this representation does not change Ω(z), but it changes the location of its analytic center. Since the last ρ inequalities in Ω(z) are identical, they must share the same slack value and the same corresponding primal variable. Let (x, x_0) be the primal variables. Then the primal problem can be written as

    minimize   c^T x − z x_0 − ... − z x_0   (ρ copies)
    s.t.       Ax − b x_0 − ... − b x_0 = 0,  (x, x_0) ≥ 0.

Let x̄ = x/(ρ x_0) ∈ F̊_p. Then, the primal potential function for the new Ω(z) given by (2.6) is

    P(x, Ω(z)) = (n + ρ) log(c^T x − z(ρ x_0)) − Σ_{j=1}^n log x_j − ρ log x_0
               = (n + ρ) log(c^T x̄ − z) − Σ_{j=1}^n log x̄_j + ρ log ρ
               =: P_{n+ρ}(x̄, z) + ρ log ρ.

The function

    P_{n+ρ}(x, z) = (n + ρ) log(c^T x − z) − Σ_{j=1}^n log x_j    (2.7)

is an extension of the Karmarkar potential function in the standard LP form with a lower bound z for z*. It represents the volume of a coordinate-aligned ellipsoid whose intersection with A(z) contains S(z), where −z + b^T y ≥ 0 is duplicated ρ times.
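To make the bookkeeping behind (2.7) concrete, the following snippet (assuming NumPy; the function name and the random test data are ours) checks the identity P(x, Ω(z)) = P_{n+ρ}(x̄, z) + ρ log ρ derived above.

```python
import numpy as np

def P(x, z, rho, c):
    """Extended Karmarkar potential P_{n+rho}(x, z) of (2.7)."""
    n = len(x)
    return (n + rho) * np.log(c @ x - z) - np.sum(np.log(x))

rng = np.random.default_rng(0)
n, rho = 5, 3
c, x, x0 = rng.random(n) + 1, rng.random(n) + 1, 0.7
z = -1.0                                    # a lower bound with c^T x - z > 0

# potential of the homogeneous problem with -z + b^T y >= 0 copied rho times
lhs = (n + rho) * np.log(c @ x - z * (rho * x0)) \
      - np.sum(np.log(x)) - rho * np.log(x0)
rhs = P(x / (rho * x0), z, rho, c) + rho * np.log(rho)
assert np.isclose(lhs, rhs)
```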

    2.2.2 Dual potential function for LP

We can also develop a dual potential function, symmetric to the primal, for (y, s) ∈ F̊_d:

    B_{n+ρ}(y, s, z̄) = (n + ρ) log(z̄ − b^T y) − Σ_{j=1}^n log s_j,    (2.8)

where z̄ is an upper bound of z*. One can show that it represents the volume of a coordinate-aligned ellipsoid whose intersection with the affine set {x : Ax = b} contains the primal level set

    {x ∈ F_p : c^T x − z̄ ≤ 0, ..., c^T x − z̄ ≤ 0},

where c^T x − z̄ ≤ 0 is copied ρ times (Exercise 2.1). For symmetry, we may write B_{n+ρ}(y, s, z̄) simply as B_{n+ρ}(s, z̄), since we can always recover y from s using the equation A^T y = c − s.

    2.2.3 Primal-dual potential function for LP

A primal-dual potential function for linear programming will be used later. For x ∈ F̊_p and (y, s) ∈ F̊_d it is defined by

    ψ_{n+ρ}(x, s) := (n + ρ) log(x^T s) − Σ_{j=1}^n log(x_j s_j),    (2.9)

where ρ ≥ 0. We have the relation:

    ψ_{n+ρ}(x, s) = (n + ρ) log(c^T x − b^T y) − Σ_{j=1}^n log x_j − Σ_{j=1}^n log s_j
                  = P_{n+ρ}(x, b^T y) − Σ_{j=1}^n log s_j
                  = B_{n+ρ}(s, c^T x) − Σ_{j=1}^n log x_j.

Since

    ψ_{n+ρ}(x, s) = ρ log(x^T s) + ψ_n(x, s) ≥ ρ log(x^T s) + n log n,

then, for ρ > 0, ψ_{n+ρ}(x, s) → −∞ implies that x^T s → 0. More precisely, we have

    x^T s ≤ exp( (ψ_{n+ρ}(x, s) − n log n) / ρ ).

    We have the following theorem:

Theorem 2.6 Define the level set

    Ψ(δ) := {(x, y, s) ∈ F̊ : ψ_{n+ρ}(x, s) ≤ δ}.

i) Ψ(δ_1) ⊂ Ψ(δ_2) if δ_1 ≤ δ_2.


ii) Ψ̊(δ) = {(x, y, s) ∈ F̊ : ψ_{n+ρ}(x, s) < δ}.

iii) For every δ, Ψ(δ) is bounded and its closure Ψ̄(δ) has non-empty intersection with the solution set.

Later we will show that a potential reduction algorithm generates sequences {x^k, y^k, s^k} ∈ F̊ such that

    ψ_{n+√n}(x^{k+1}, y^{k+1}, s^{k+1}) ≤ ψ_{n+√n}(x^k, y^k, s^k) − .05

for k = 0, 1, 2, .... This indicates that the level sets shrink at least at a constant rate independently of m or n.

    2.2.4 Potential function for SDP

The potential functions for SDP of Section 2.0.1 are analogous to those for LP. For given data, we assume that both (SDP) and (SDD) have interior feasible points. Then, for any X ∈ F̊_p and (y, S) ∈ F̊_d, the primal potential function is defined by

    P_{n+ρ}(X, z) := (n + ρ) log(C • X − z) − log det(X),  z ≤ z*;

the dual potential function is defined by

    B_{n+ρ}(y, S, z̄) := (n + ρ) log(z̄ − b^T y) − log det(S),  z̄ ≥ z*,

where ρ ≥ 0 and z* designates the optimal objective value.

For X ∈ F̊_p and (y, S) ∈ F̊_d the primal-dual potential function for SDP is defined by

    ψ_{n+ρ}(X, S) := (n + ρ) log(X • S) − log(det(X) det(S))
                   = (n + ρ) log(C • X − b^T y) − log det(X) − log det(S)
                   = P_{n+ρ}(X, b^T y) − log det(S)
                   = B_{n+ρ}(S, C • X) − log det(X),

where ρ ≥ 0. Note that if X and S are diagonal matrices, these definitions reduce to those for LP.

Note that we still have (Exercise 2.2)

    ψ_{n+ρ}(X, S) = ρ log(X • S) + ψ_n(X, S) ≥ ρ log(X • S) + n log n.

Then, for ρ > 0, ψ_{n+ρ}(X, S) → −∞ implies that X • S → 0. More precisely, we have

    X • S ≤ exp( (ψ_{n+ρ}(X, S) − n log n) / ρ ).

    We also have the following corollary:


Corollary 2.7 Let (SDP) and (SDD) have non-empty interior and define the level set

    Ψ(δ) := {(X, y, S) ∈ F̊ : ψ_{n+ρ}(X, S) ≤ δ}.

i) Ψ(δ_1) ⊂ Ψ(δ_2) if δ_1 ≤ δ_2.

ii) Ψ̊(δ) = {(X, y, S) ∈ F̊ : ψ_{n+ρ}(X, S) < δ}.

iii) For every δ, Ψ(δ) is bounded and its closure Ψ̄(δ) has non-empty intersection with the solution set.

    2.3 Central Paths of LP and SDP

Many interior-point algorithms find a sequence of feasible points along a central path that connects the analytic center and the solution set. We now present this path, one of the most important foundations for the development of interior-point algorithms.

    2.3.1 Central path for LP

Consider a linear program in the standard form (LP) and (LD). Assume that F̊ ≠ ∅, i.e., both F̊_p ≠ ∅ and F̊_d ≠ ∅, and denote by z* the optimal objective value.

The central path can be expressed as

    C = { (x, y, s) ∈ F̊ : Xs = (x^T s / n) e }

in the primal-dual form. We also see

    C = { (x, y, s) ∈ F̊ : ψ_n(x, s) = n log n }.

For any μ > 0 one can derive the central path simply by minimizing the primal LP with a logarithmic barrier function:

    (P)  minimize   c^T x − μ Σ_{j=1}^n log x_j
         s.t.       Ax = b,  x ≥ 0.

Let x(μ) ∈ F̊_p be the (unique) minimizer of (P). Then, for some y ∈ R^m it satisfies the optimality conditions

    Xs = μe
    Ax = b          (2.10)
    A^T y + s = c.
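Given a strictly feasible pair, the central path point solving (2.10) can be computed by feasible-start Newton steps on the equation Xs = μe. The following NumPy sketch is our own illustration (the function name, damping rule, and tolerance are assumptions, not from the text); the same linear system reappears as (3.8) in Chapter 3.

```python
import numpy as np

def central_path_point(A, b, c, mu, x, y, s, iters=50):
    """Newton's method on the central-path conditions (2.10),
    starting from a strictly feasible (x, y, s); a sketch only."""
    for _ in range(iters):
        rhs = mu - x * s                      # residual of Xs = mu*e
        M = (A * (x / s)) @ A.T               # normal-equation matrix A X S^{-1} A^T
        dy = np.linalg.solve(M, -A @ (rhs / s))
        ds = -A.T @ dy                        # keeps A^T y + s = c exact
        dx = (rhs - x * ds) / s               # keeps A x = b exact
        t = 1.0                               # damping to stay interior
        while np.any(x + t * dx <= 0) or np.any(s + t * ds <= 0):
            t *= 0.5
        x, y, s = x + t * dx, y + t * dy, s + t * ds
        if np.linalg.norm(x * s - mu) <= 1e-10 * mu:
            break
    return x, y, s
```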


Consider maximizing the dual LP with the barrier function:

    (D)  maximize   b^T y + μ Σ_{j=1}^n log s_j
         s.t.       A^T y + s = c,  s ≥ 0.

Let (y(μ), s(μ)) ∈ F̊_d be the (unique) maximizer of (D). Then, for some x ∈ R^n it satisfies the optimality conditions (2.10) as well. Thus, both x(μ) and (y(μ), s(μ)) are on the central path with x(μ)^T s(μ) = nμ.

Another way to derive the central path is to consider again the dual level set Ω(z) of (2.5) for any z < z* (Figure 2.3).

Then, the analytic center (y(z), s(z)) of Ω(z) together with a unique point (x̄(z), x̄_0(z)) satisfies

    A x̄(z) − b x̄_0(z) = 0,  X̄(z) s = e,  s = c − A^T y,  and  x̄_0(z)(b^T y − z) = 1.

Let x(z) = x̄(z)/x̄_0(z). Then we have

    A x(z) = b  and  X(z) s(z) = e/x̄_0(z) = (b^T y(z) − z) e.

Thus, the point (x(z), y(z), s(z)) is on the central path with μ = b^T y(z) − z and c^T x(z) − b^T y(z) = x(z)^T s(z) = n(b^T y(z) − z) = nμ. As we proved earlier in Section 2.2, (x(z), y(z), s(z)) exists and is uniquely defined, which implies the following theorem:

Theorem 2.8 Let both (LP) and (LD) have interior feasible points for the given data set (A, b, c). Then for any 0 < μ < ∞, the central path point (x(μ), y(μ), s(μ)) exists and is unique.


Theorem 2.9 Let (x(μ), y(μ), s(μ)) be on the central path.

i) The central path point (x(μ), s(μ)) is bounded for 0 < μ ≤ μ^0 and any given 0 < μ^0 < ∞.

ii) For 0 < μ′ < μ,

    c^T x(μ′) < c^T x(μ)  and  b^T y(μ′) > b^T y(μ).

iii) (x(μ), s(μ)) converges to an optimal solution pair for (LP) and (LD). Moreover, the limit point x(0)_P is the analytic center on the primal optimal face, and the limit point s(0)_Z is the analytic center on the dual optimal face, where (P, Z) is the strict complementarity partition of the index set {1, 2, ..., n}.

Proof. Note that

    (x(μ^0) − x(μ))^T (s(μ^0) − s(μ)) = 0,

since (x(μ^0) − x(μ)) ∈ N(A) and (s(μ^0) − s(μ)) ∈ R(A^T). This can be rewritten as

    Σ_j ( s(μ^0)_j x(μ)_j + x(μ^0)_j s(μ)_j ) = n(μ^0 + μ) ≤ 2nμ^0,

or

    Σ_j ( x(μ)_j / x(μ^0)_j + s(μ)_j / s(μ^0)_j ) ≤ 2n.

Thus, x(μ) and s(μ) are bounded, which proves (i).

We leave the proof of (ii) as an exercise.

Since x(μ) and s(μ) are both bounded, they have at least one limit point, which we denote by x(0) and s(0). Let x*_P (with x*_Z = 0) and s*_Z (with s*_P = 0), respectively, be the unique analytic centers on the primal and dual optimal faces:

    {x_P : A_P x_P = b, x_P ≥ 0}  and  {s_Z : s_Z = c_Z − A_Z^T y ≥ 0, c_P − A_P^T y = 0}.

Again, we have

    Σ_j ( s*_j x(μ)_j + x*_j s(μ)_j ) = nμ,

or

    Σ_{j∈P} x*_j / x(μ)_j + Σ_{j∈Z} s*_j / s(μ)_j = n.

Thus, we have

    x(μ)_j ≥ x*_j / n > 0,  j ∈ P,

and

    s(μ)_j ≥ s*_j / n > 0,  j ∈ Z.


Theorem 2.10 Let (x, y, s) ∈ N(η) for constant 0 < η < 1.


    2.4 Notes

    The SDP with a duality gap was constructed by Freund.

    2.5 Exercises

2.1 Let (LP) and (LD) have interiors. Prove that the dual potential function B_{n+1}(y, s, z̄), where z̄ is an upper bound of z*, represents the volume of a coordinate-aligned ellipsoid whose intersection with the affine set {x : Ax = b} contains the primal level set {x ∈ F_p : c^T x ≤ z̄}.

2.2 Let X, S ∈ M^n be both positive definite. Then prove

    ψ_n(X, S) = n log(X • S) − log(det(X) det(S)) ≥ n log n.

2.3 Consider linear programming and the level set

    Ψ(δ) := {(x, y, s) ∈ F̊ : ψ_{n+ρ}(x, s) ≤ δ}.

Prove that

    Ψ(δ_1) ⊂ Ψ(δ_2) if δ_1 ≤ δ_2,

and that for every δ, Ψ(δ) is bounded and its closure Ψ̄(δ) has non-empty intersection with the solution set.

2.4 Prove (ii) of Theorem 2.9.

2.5 Prove Theorem 2.10.

2.6 Prove Corollary 2.11. Here we assume that X(μ) ≠ X(μ̄) and y(μ) ≠ y(μ̄).


    Chapter 3

    Interior-Point Algorithms

This pair of semidefinite programs can be solved in polynomial time. There are actually several polynomial algorithms. One is the primal-scaling algorithm, which uses only X to generate the iterate direction. In other words,

    (X^{k+1}, S^{k+1}) = F_p(X^k),

where F_p is the primal algorithm's iterative mapping.

Another is the dual-scaling algorithm, the analogue of the dual potential reduction algorithm for linear programming. The dual-scaling algorithm uses only S to generate the new iterate:

    (X^{k+1}, S^{k+1}) = F_d(S^k),

where F_d is the dual algorithm's iterative mapping.

The third is the primal-dual scaling algorithm, which uses both X and S to generate the new iterate:

    (X^{k+1}, S^{k+1}) = F_pd(X^k, S^k),

where F_pd is the primal-dual algorithm's iterative mapping.

All these algorithms generate primal and dual iterates simultaneously, and possess O(√n log(1/ε)) iteration complexity to yield the duality gap accuracy ε. Other scaling algorithms have been proposed in the past. For example, an SDP analogue of Dikin's affine-scaling algorithm could be very fast; however, this algorithm may not even converge.

Recall that M^n denotes the set of symmetric matrices in R^{n×n}. Let M^n_+ denote the set of positive semidefinite matrices and M̊^n_+ the set of positive definite matrices in M^n. The goal of this section is to extend interior-point algorithms to solving the positive semidefinite programming problems (SDP) and (SDD) presented in Section 2.0.1.


(SDP) and (SDD) are analogues of linear programming (LP) and (LD). In fact, as the notation suggests, (LP) and (LD) can be expressed as positive semidefinite programs by defining

    C = diag(c),  A_i = diag(a_i),  b = b,

where a_i is the ith row of matrix A. Many of the theorems and algorithms used in LP have analogues in SDP. However, while interior-point algorithms for LP are generally considered competitive with the simplex method in practice and outperform it as problems become large, interior-point methods for SDP outperform other methods on even small problems.
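A tiny NumPy illustration of this diagonal embedding (the helper name lp_as_sdp and the test data are ours):

```python
import numpy as np

def lp_as_sdp(A, b, c):
    """Embed LP data (A, b, c) into SDP data (C, A_i, b): with diagonal
    matrices, X >= 0 (psd) is equivalent to x >= 0 componentwise."""
    C = np.diag(c)
    As = [np.diag(ai) for ai in A]      # one constraint matrix per row of A
    return C, As, b

A = np.array([[1.0, 1.0, 1.0]])
b = np.array([3.0])
c = np.array([1.0, 2.0, 3.0])
C, As, b = lp_as_sdp(A, b, c)

x = np.array([1.0, 1.0, 1.0])           # LP-feasible point
X = np.diag(x)
assert np.isclose(np.trace(As[0] @ X), b[0])   # A_1 . X = a_1^T x = b_1
assert np.isclose(np.trace(C @ X), c @ x)      # C . X = c^T x
```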

Denote the primal feasible set by F_p and the dual by F_d. We assume that both F̊_p and F̊_d are nonempty. Thus, the optimal solution sets for both (SDP) and (SDD) are bounded and the central path exists; see Section 2.3. Let z* denote the optimal value and F = F_p ∩ F_d. In this section, we are interested in finding an ε-approximate solution for the SDP problem:

    C • X − b^T y = S • X ≤ ε.

For simplicity, we assume that a central path pair (X^0, y^0, S^0), which satisfies

    (X^0)^{.5} S^0 (X^0)^{.5} = μ^0 I  and  μ^0 = X^0 • S^0 / n,

is known. We will use it as our initial point throughout this section.

Let X ∈ F̊_p, (y, S) ∈ F̊_d, and z ≤ z*. Then consider the primal potential function

    P(X, z) = (n + ρ) log(C • X − z) − log det(X),

and the primal-dual potential function

    ψ(X, S) = (n + ρ) log(S • X) − log det(XS),

where ρ = √n. Let z = b^T y. Then S • X = C • X − z, and we have

    ψ(X, S) = P(X, z) − log det(S).

As in the LP case, these functions will be used to solve SDP problems. Define the ∞-norm (which is the traditional l_2 operator norm for matrices) on M^n by

    ‖X‖_∞ := max_{j∈{1,...,n}} |λ_j(X)|,

where λ_j(X) is the jth eigenvalue of X, and the Euclidean or l_2 norm (which is the traditional Frobenius norm) by

    ‖X‖ := ‖X‖_f = √(X • X) = √( Σ_{j=1}^n (λ_j(X))² ).
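In code, both norms are one-liners on the eigenvalues; the names norm_inf and norm_2 below are our own.

```python
import numpy as np

def norm_inf(X):
    """'Infinity norm' of a symmetric matrix: max |eigenvalue|
    (the l2 operator norm)."""
    return np.max(np.abs(np.linalg.eigvalsh(X)))

def norm_2(X):
    """'Euclidean norm': sqrt(X . X), i.e. the Frobenius norm."""
    return np.sqrt(np.sum(X * X))

rng = np.random.default_rng(2)
Y = rng.standard_normal((4, 4))
X = (Y + Y.T) / 2                       # symmetrize
lam = np.linalg.eigvalsh(X)
assert np.isclose(norm_inf(X), np.max(np.abs(lam)))
assert np.isclose(norm_2(X), np.sqrt(np.sum(lam**2)))
```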


We rename these norms because they are perfect analogues of the norms of vectors used in LP. Furthermore, note that, for X ∈ M^n,

    tr(X) = Σ_{j=1}^n λ_j(X)  and  det(I + X) = ∏_{j=1}^n (1 + λ_j(X)).

We first present two important lemmas; the second, for matrices, resembles the first, for vectors.

Lemma 3.1 If d ∈ R^n such that ‖d‖_∞ < 1, then

    e^T d ≥ Σ_{i=1}^n log(1 + d_i) ≥ e^T d − ‖d‖² / (2(1 − ‖d‖_∞)).

Lemma 3.2 Let X ∈ M^n and ‖X‖_∞ < 1. Then,

    tr(X) ≥ log det(I + X) ≥ tr(X) − ‖X‖² / (2(1 − ‖X‖_∞)).
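Both lemmas are easy to sanity-check numerically; the snippet below (our own test harness, assuming NumPy) verifies them on random data scaled so the ∞-norm is below one.

```python
import numpy as np

rng = np.random.default_rng(3)

# Lemma 3.1 for vectors with ||d||_inf < 1
d = rng.uniform(-0.9, 0.9, size=6)
lo = d.sum() - d @ d / (2 * (1 - np.max(np.abs(d))))
mid = np.sum(np.log1p(d))
assert lo <= mid <= d.sum()

# Lemma 3.2 for symmetric matrices with ||X||_inf < 1
Y = rng.standard_normal((5, 5))
X = (Y + Y.T) / 2
X *= 0.9 / np.max(np.abs(np.linalg.eigvalsh(X)))   # scale so ||X||_inf = 0.9
lo = np.trace(X) - np.sum(X * X) / (2 * (1 - 0.9))
mid = np.linalg.slogdet(np.eye(5) + X)[1]          # log det(I + X)
assert lo <= mid <= np.trace(X)
```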

    3.1 Potential Reduction Algorithm for LP

Let (x, y, s) ∈ F̊. Then consider the primal-dual potential function:

    ψ_{n+ρ}(x, s) = (n + ρ) log(x^T s) − Σ_{j=1}^n log(x_j s_j),

where ρ ≥ 0. Let z = b^T y; then s^T x = c^T x − z, and we have

    ψ_{n+ρ}(x, s) = P_{n+ρ}(x, z) − Σ_{j=1}^n log s_j.

Recall that when ρ = 0, ψ_{n+ρ}(x, s) is minimized along the central path. However, when ρ > 0, ψ_{n+ρ}(x, s) → −∞ means that x and s converge to the optimal face, and the descent gets steeper as ρ increases. In this section we choose ρ = √n.

The process calculates steps for x and s which guarantee a constant reduction in the primal-dual potential function. As the potential function decreases, both x and s are forced to an optimal solution pair.

Consider a pair (x^k, y^k, s^k) ∈ F̊. Fix z^k = b^T y^k; then the gradient vector of the primal potential function at x^k is

    ∇P_{n+ρ}(x^k, z^k) = ((n + ρ)/((s^k)^T x^k)) c − (X^k)^{−1} e = ((n + ρ)/(c^T x^k − z^k)) c − (X^k)^{−1} e.

We directly solve the ball-constrained linear problem for direction d_x:

    minimize   ∇P_{n+ρ}(x^k, z^k)^T d_x
    s.t.       A d_x = 0,  ‖(X^k)^{−1} d_x‖ ≤ α.


Let the minimizer be d_x. Then

    d_x = −α X^k p^k / ‖p^k‖,

where

    p^k = p(z^k) := (I − X^k A^T (A (X^k)² A^T)^{−1} A X^k) X^k ∇P_{n+ρ}(x^k, z^k).

Update

    x^{k+1} = x^k + d_x = x^k − α X^k p^k / ‖p^k‖,    (3.1)

and

    P_{n+ρ}(x^{k+1}, z^k) − P_{n+ρ}(x^k, z^k) ≤ −α ‖p^k‖ + α² / (2(1 − α)).

Here, we have used the fact that

    P_{n+ρ}(x^{k+1}, z^k) − P_{n+ρ}(x^k, z^k)
      ≤ ((n + ρ)/(c^T x^k − z^k)) c^T d_x − e^T (X^k)^{−1} d_x + ‖(X^k)^{−1} d_x‖² / (2(1 − ‖(X^k)^{−1} d_x‖_∞))
      = ∇P_{n+ρ}(x^k, z^k)^T d_x + ‖(X^k)^{−1} d_x‖² / (2(1 − ‖(X^k)^{−1} d_x‖_∞))
      = −α ‖p^k‖ + α² / (2(1 − α)).

Thus, as long as ‖p^k‖ > 0, we may choose an appropriate α such that

    P_{n+ρ}(x^{k+1}, z^k) − P_{n+ρ}(x^k, z^k) ≤ −δ

for some positive constant δ. By the relation between ψ_{n+ρ}(x, s) and P_{n+ρ}(x, z), the primal-dual potential function is also reduced. That is,

    ψ_{n+ρ}(x^{k+1}, s^k) ≤ ψ_{n+ρ}(x^k, s^k) − δ.

However, even if ‖p^k‖ is small, we will show that the primal-dual potential function can be reduced by a constant δ by increasing z^k and updating (y^k, s^k).

We focus on the expression of p^k, which can be rewritten as

    p^k = (I − X^k A^T (A (X^k)² A^T)^{−1} A X^k) ( ((n + ρ)/(c^T x^k − z^k)) X^k c − e )
        = ((n + ρ)/(c^T x^k − z^k)) X^k s(z^k) − e,    (3.2)

where

    s(z^k) = c − A^T y(z^k)    (3.3)


and

    y(z^k) = y_2 − ((c^T x^k − z^k)/(n + ρ)) y_1,
    y_1 = (A (X^k)² A^T)^{−1} b,
    y_2 = (A (X^k)² A^T)^{−1} A (X^k)² c.    (3.4)

Regarding p^k = p(z^k), we have the following lemma:

Lemma 3.3 Let

    μ^k = (x^k)^T s^k / n = (c^T x^k − z^k)/n   and   μ̄ = (x^k)^T s(z^k) / n.

If

    ‖p(z^k)‖ < min( η √(n/(n + η²)), 1 − η ),    (3.5)

then the following three inequalities hold:

    s(z^k) > 0,  ‖X^k s(z^k) − μ̄ e‖ < η μ̄,  and  μ̄ < (1 − .5η/√n) μ^k.    (3.6)


which, in view of (3.7), leads to

    ‖p(z^k)‖² ≥ ( ((n + ρ) μ̄)/(n μ^k) − 1 )² n
              ≥ ( (1 + 1/√n)(1 − .5η/√n) − 1 )² n
              ≥ ( 1 − η/2 − η/(2√n) )²
              ≥ (1 − η)².

The lemma says that, when ‖p(z^k)‖ is small, (x^k, y(z^k), s(z^k)) is in the neighborhood of the central path and b^T y(z^k) > z^k. Thus, we can increase z^k to b^T y(z^k) to cut the dual level set Ω(z^k). We have the following potential reduction theorem to evaluate the progress.

Theorem 3.4 Given (x^k, y^k, s^k) ∈ F̊, let ρ = √n and z^k = b^T y^k, let x^{k+1} be given by (3.1), and let y^{k+1} = y(z^k) in (3.4) and s^{k+1} = s(z^k) in (3.3). Then, either

    ψ_{n+ρ}(x^{k+1}, s^k) ≤ ψ_{n+ρ}(x^k, s^k) − δ

or

    ψ_{n+ρ}(x^k, s^{k+1}) ≤ ψ_{n+ρ}(x^k, s^k) − δ,

where δ > 1/20.

Proof. If (3.5) does not hold, i.e.,

    ‖p(z^k)‖ ≥ min( η √(n/(n + η²)), 1 − η ),

then

    P_{n+ρ}(x^{k+1}, z^k) − P_{n+ρ}(x^k, z^k) ≤ −α min( η √(n/(n + η²)), 1 − η ) + α² / (2(1 − α)),

hence, from the relation between P_{n+ρ} and ψ_{n+ρ},

    ψ_{n+ρ}(x^{k+1}, s^k) − ψ_{n+ρ}(x^k, s^k) ≤ −α min( η √(n/(n + η²)), 1 − η ) + α² / (2(1 − α)).

Otherwise, from Lemma 3.3 the inequalities of (3.6) hold:

i) The first of (3.6) indicates that y^{k+1} and s^{k+1} are in F̊_d.


ii) Using the second of (3.6) and applying Lemma 3.1 to the vector X^k s^{k+1}/μ̄, we have

    n log((x^k)^T s^{k+1}) − Σ_{j=1}^n log(x^k_j s^{k+1}_j)
      = n log n − Σ_{j=1}^n log(x^k_j s^{k+1}_j / μ̄)
      ≤ n log n + ‖X^k s^{k+1}/μ̄ − e‖² / (2(1 − ‖X^k s^{k+1}/μ̄ − e‖_∞))
      ≤ n log n + η² / (2(1 − η))
      ≤ n log((x^k)^T s^k) − Σ_{j=1}^n log(x^k_j s^k_j) + η² / (2(1 − η)).

iii) According to the third of (3.6), we have

    √n ( log((x^k)^T s^{k+1}) − log((x^k)^T s^k) ) = √n log(μ̄ / μ^k) ≤ −η/2.

Adding the two inequalities in ii) and iii), we have

    ψ_{n+ρ}(x^k, s^{k+1}) ≤ ψ_{n+ρ}(x^k, s^k) − η/2 + η² / (2(1 − η)).

Thus, by choosing η = .43 and α = .3 we have the desired result.

Theorem 3.4 establishes an important fact: the primal-dual potential function can be reduced by a constant no matter where x^k and y^k are. In practice, one can perform a line search to minimize the primal-dual potential function. This results in the following primal-dual potential reduction algorithm.

Algorithm 3.1 Given a central path point (x^0, y^0, s^0) ∈ F̊, let z^0 = b^T y^0. Set k := 0.

While (s^k)^T x^k ≥ ε do:

1. Compute y_1 and y_2 from (3.4).

2. If there exists z such that s(z) > 0, compute

    z̄ = arg min_z ψ_{n+ρ}(x^k, s(z)),

and if ψ_{n+ρ}(x^k, s(z̄)) < ψ_{n+ρ}(x^k, s^k), then set y^{k+1} = y(z̄), s^{k+1} = s(z̄) and z^{k+1} = b^T y^{k+1}; otherwise, y^{k+1} = y^k, s^{k+1} = s^k and z^{k+1} = z^k.


3. Let x^{k+1} = x^k − ᾱ X^k p(z^{k+1}) with

    ᾱ = arg min_{α ≥ 0} ψ_{n+ρ}( x^k − α X^k p(z^{k+1}), s^{k+1} ).

4. Let k := k + 1 and return to Step 1.
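The sketch below is one possible NumPy rendering of Algorithm 3.1 for small dense problems. It uses SciPy's bounded scalar minimizer for the two line searches; the helper names, the finite penalty constant, and the toy example are our own choices, not part of the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

BIG = 1e50  # finite "infinity" for points outside the positive orthant

def psi(x, s, rho):
    """Primal-dual potential psi_{n+rho}(x, s) of (2.9)."""
    if np.any(x <= 0) or np.any(s <= 0):
        return BIG
    n = len(x)
    return (n + rho) * np.log(x @ s) - np.sum(np.log(x * s))

def algorithm_3_1(A, b, c, x, y, s, eps=1e-6, max_iter=200):
    """A sketch of Algorithm 3.1 (primal-dual potential reduction)."""
    n = len(c)
    rho = np.sqrt(n)
    z = b @ y
    for _ in range(max_iter):
        if x @ s <= eps:
            break
        AX2AT = (A * (x * x)) @ A.T                  # A X^2 A^T
        y1 = np.linalg.solve(AX2AT, b)               # (3.4)
        y2 = np.linalg.solve(AX2AT, A @ (x * x * c))
        def s_of(zv):                                # y(z), s(z) of (3.3)-(3.4)
            yz = y2 - (c @ x - zv) / (n + rho) * y1
            return yz, c - A.T @ yz
        # Step 2: translate the objective hyperplane (increase z) if it helps
        res = minimize_scalar(lambda zv: psi(x, s_of(zv)[1], rho),
                              bounds=(z, c @ x - eps), method='bounded')
        yz, sz = s_of(res.x)
        if psi(x, sz, rho) < psi(x, s, rho):
            y, s, z = yz, sz, b @ yz
        # Step 3: primal step x - alpha*X*p(z), with p(z) from (3.2)
        _, sz = s_of(z)
        p = (n + rho) / (c @ x - z) * (x * sz) - 1.0
        res = minimize_scalar(lambda a: psi(x - a * x * p, s, rho),
                              bounds=(0.0, 1.0), method='bounded')
        x = x - res.x * x * p
    return x, y, s

# Tiny example: min x1 + 2*x2  s.t.  x1 + x2 = 2, x >= 0 (optimum z* = 2)
A = np.array([[1.0, 1.0]]); b = np.array([2.0]); c = np.array([1.0, 2.0])
x, y, s = algorithm_3_1(A, b, c, np.array([1.0, 1.0]), np.zeros(1), c.copy())
print(c @ x, b @ y)   # both approach 2
```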

The performance of the algorithm is given by the following corollary:

Corollary 3.5 Let ρ = √n. Then, Algorithm 3.1 terminates in at most O(√n log((c^T x^0 − b^T y^0)/ε)) iterations with

    c^T x^k − b^T y^k ≤ ε.

Proof. In O(√n log((x^0)^T s^0/ε)) iterations,

    −√n log((x^0)^T s^0/ε) ≥ ψ_{n+ρ}(x^k, s^k) − ψ_{n+ρ}(x^0, s^0)
                           ≥ √n log((x^k)^T s^k) + n log n − ψ_{n+ρ}(x^0, s^0)
                           = √n log( (x^k)^T s^k / (x^0)^T s^0 ),

since the initial point lies on the central path. Thus,

    √n log(c^T x^k − b^T y^k) = √n log((x^k)^T s^k) ≤ √n log ε,

i.e.,

    c^T x^k − b^T y^k = (x^k)^T s^k ≤ ε.

    3.2 Primal-Dual (Symmetric) Algorithm for LP

Another technique for solving linear programs is the symmetric primal-dual algorithm. Once we have a pair (x, y, s) ∈ F̊ with μ = x^T s/n, we can generate a new iterate x^+ and (y^+, s^+) by solving for d_x, d_y and d_s from the system of linear equations:

    S d_x + X d_s = γμe − Xs,
    A d_x = 0,                    (3.8)
    A^T d_y + d_s = 0.

Let d := (d_x, d_y, d_s). To show the dependence of d on the current pair (x, s) and the parameter γ, we write d = d(x, s, γ). Note that d_x^T d_s = −d_x^T A^T d_y = 0 here.

The system (3.8) is the Newton step starting from (x, s) which helps to find the point on the central path with duality gap γnμ; see Section 2.3.1. If γ = 0, it steps toward the optimal solution characterized by the system of equations (1.2); if γ = 1, it steps toward the central path point (x(μ), y(μ), s(μ)) characterized by the system of equations (2.10); if 0 < γ < 1, it steps toward a central path point with a smaller complementarity gap. In the algorithm presented in


this section, we choose γ = n/(n + ρ) < 1. Each iterate reduces the primal-dual potential function by at least a constant δ, as does the previous potential reduction algorithm.
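Solving (3.8) reduces to one positive definite m × m system after eliminating d_s and d_x; the sketch below (our own helper, assuming NumPy, a full-row-rank A, and a strictly feasible pair) makes this explicit.

```python
import numpy as np

def primal_dual_direction(A, x, s, gamma):
    """Solve the Newton system (3.8) for (dx, dy, ds); a sketch."""
    n = len(x)
    mu = x @ s / n
    rhs = gamma * mu - x * s                  # gamma*mu*e - Xs
    M = (A * (x / s)) @ A.T                   # A X S^{-1} A^T (pos. definite)
    dy = np.linalg.solve(M, -A @ (rhs / s))
    ds = -A.T @ dy                            # preserves A^T y + s = c
    dx = (rhs - x * ds) / s                   # then S dx + X ds = rhs, A dx = 0
    return dx, dy, ds

A = np.array([[1.0, 1.0]])
x, s = np.array([1.5, 0.5]), np.array([0.5, 1.5])
dx, dy, ds = primal_dual_direction(A, x, s, gamma=0.5)
assert np.allclose(A @ dx, 0) and abs(dx @ ds) < 1e-12   # dx in N(A), dx^T ds = 0
```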

To analyze this algorithm, we present the following lemma, whose proof is omitted.

Lemma 3.6 Let the direction d = (d_x, d_y, d_s) be generated by equation (3.8) with γ = n/(n + ρ), and let

    θ = α √(min(Xs)) / ‖(XS)^{−1/2} ( (x^T s/(n + ρ)) e − Xs )‖,    (3.9)

where α is a positive constant less than 1. Let

    x^+ = x + θ d_x,  y^+ = y + θ d_y,  and  s^+ = s + θ d_s.

Then, we have (x^+, y^+, s^+) ∈ F̊ and

    ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −α √(min(Xs)) ‖(XS)^{−1/2} ( e − ((n + ρ)/(x^T s)) Xs )‖ + α² / (2(1 − α)).

Let v = Xs. Then, we can prove the following lemma (Exercise 3.3):

Lemma 3.7 Let v ∈ R^n be a positive vector and ρ ≥ √n. Then,

    √(min(v)) ‖V^{−1/2} ( e − ((n + ρ)/(e^T v)) v )‖ ≥ √(3/4).

Combining these two lemmas we have

    ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −α √(3/4) + α² / (2(1 − α)) = −δ

for a constant δ. This result provides a competitive theoretical iteration bound, but a faster algorithm may again be implemented by conducting a line search along direction d to achieve the greatest reduction in the primal-dual potential function. This leads to:

Algorithm 3.2 Given (x^0, y^0, s^0) ∈ F̊, set ρ ≥ √n and k := 0.

While (s^k)^T x^k ≥ ε do:

1. Set (x, s) = (x^k, s^k) and γ = n/(n + ρ), and compute (d_x, d_y, d_s) from (3.8).


2. Let x^{k+1} = x^k + ᾱ d_x, y^{k+1} = y^k + ᾱ d_y, and s^{k+1} = s^k + ᾱ d_s, where

    ᾱ = arg min_{α ≥ 0} ψ_{n+ρ}(x^k + α d_x, s^k + α d_s).

3. Let k := k + 1 and return to Step 1.

Theorem 3.8 Let ρ = O(√n). Then, Algorithm 3.2 terminates in at most O(√n log((x^0)^T s^0/ε)) iterations with

    c^T x^k − b^T y^k ≤ ε.

    3.3 Potential Reduction Algorithm for SDP

Consider a pair (X^k, y^k, S^k) ∈ F̊. Fix z^k = b^T y^k; then the gradient matrix of the primal potential function at X^k is

    ∇P(X^k, z^k) = ((n + ρ)/(S^k • X^k)) C − (X^k)^{−1}.

The following corollary is an analogue of the LP result.

Corollary 3.9 Let X^k ∈ M̊^n_+ and ‖(X^k)^{−.5}(X − X^k)(X^k)^{−.5}‖_∞ < 1. Then, X ∈ M̊^n_+ and

    P(X, z^k) − P(X^k, z^k) ≤ ∇P(X^k, z^k) • (X − X^k)
      + ‖(X^k)^{−.5}(X − X^k)(X^k)^{−.5}‖² / (2(1 − ‖(X^k)^{−.5}(X − X^k)(X^k)^{−.5}‖_∞)).

Let

    A = (A_1; A_2; ...; A_m).

Then, define

    AX = (A_1 • X; A_2 • X; ...; A_m • X),

so that the constraints of (SDP) read AX = b, and

    A^T y = Σ_{i=1}^m y_i A_i.

Then, we directly solve the following ball-constrained problem:

    minimize   ∇P(X^k, z^k) • (X − X^k)
    s.t.       A(X − X^k) = 0,
               ‖(X^k)^{−.5}(X − X^k)(X^k)^{−.5}‖ ≤ α.


Let X′ = (X^k)^{−.5} X (X^k)^{−.5}. Note that for any symmetric matrices Q, T ∈ M^n and X ∈ M̊^n_+,

    Q • X^{.5} T X^{.5} = X^{.5} Q X^{.5} • T  and  ‖XQ‖ = ‖QX‖ = ‖X^{.5} Q X^{.5}‖.

Then we transform the above problem into

    minimize   (X^k)^{.5} ∇P(X^k, z^k) (X^k)^{.5} • (X′ − I)
    s.t.       A′_i • (X′ − I) = 0,  i = 1, 2, ..., m,
               ‖X′ − I‖ ≤ α,

where

    A′ = (A′_1; A′_2; ...; A′_m) := ((X^k)^{.5} A_1 (X^k)^{.5}; (X^k)^{.5} A_2 (X^k)^{.5}; ...; (X^k)^{.5} A_m (X^k)^{.5}).

Let the minimizer be X′ and let X^{k+1} = (X^k)^{.5} X′ (X^k)^{.5}. Then

    X′ − I = −α P^k / ‖P^k‖,
    X^{k+1} − X^k = −α (X^k)^{.5} P^k (X^k)^{.5} / ‖P^k‖,    (3.10)

where

    P^k = P_{A′} (X^k)^{.5} ∇P(X^k, z^k) (X^k)^{.5} = (X^k)^{.5} ∇P(X^k, z^k) (X^k)^{.5} − A′^T y^k,

or

    P^k = ((n + ρ)/(S^k • X^k)) (X^k)^{.5} (C − A^T y^k) (X^k)^{.5} − I,

and

    y^k = ((S^k • X^k)/(n + ρ)) (A′A′^T)^{−1} A′ (X^k)^{.5} ∇P(X^k, z^k) (X^k)^{.5}.

Here, P_{A′} is the projection operator onto the null space of A′, and

    A′A′^T := [ A′_i • A′_j ]_{i,j=1,...,m} ∈ M^m.

In view of Corollary 3.9 and

    ∇P(X^k, z^k) • (X^{k+1} − X^k) = −α ∇P(X^k, z^k) • (X^k)^{.5} P^k (X^k)^{.5} / ‖P^k‖
                                    = −α (X^k)^{.5} ∇P(X^k, z^k) (X^k)^{.5} • P^k / ‖P^k‖
                                    = −α ‖P^k‖² / ‖P^k‖ = −α ‖P^k‖,


we have

    P(X^{k+1}, z^k) − P(X^k, z^k) ≤ −α ‖P^k‖ + α² / (2(1 − α)).

Thus, as long as ‖P^k‖ > 0, we may choose an appropriate α such that

    P(X^{k+1}, z^k) − P(X^k, z^k) ≤ −δ

for some positive constant δ.

Now, we focus on the expression of P^k, which can be rewritten as

    P(z^k) := P^k = ((n + ρ)/(S^k • X^k)) (X^k)^{.5} S(z^k) (X^k)^{.5} − I    (3.11)

with

    S(z^k) = C − A^T y(z^k)    (3.12)

and

    y(z^k) := y^k = y_2 − ((S^k • X^k)/(n + ρ)) y_1 = y_2 − ((C • X^k − z^k)/(n + ρ)) y_1,    (3.13)

where y_1 and y_2 are given by

    y_1 = (A′A′^T)^{−1} A′ I = (A′A′^T)^{−1} b,
    y_2 = (A′A′^T)^{−1} A′ (X^k)^{.5} C (X^k)^{.5}.    (3.14)
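The scaled quantities (3.11)-(3.14) translate directly into code. The NumPy sketch below (function and variable names are ours; it assumes the A_i are symmetric and X is the current feasible positive definite iterate, so that A′_i • I = A_i • X = b_i) computes y(z^k), S(z^k), and P(z^k).

```python
import numpy as np

def sdp_direction_data(As, C, X, z, rho):
    """Compute y(z), S(z), and P(z) of (3.11)-(3.14); a sketch.
    As: list of symmetric constraint matrices A_i."""
    n = C.shape[0]
    w, Q = np.linalg.eigh(X)                           # symmetric square root of X
    Xh = Q @ np.diag(np.sqrt(w)) @ Q.T                 # X^{1/2}
    Ap = [Xh @ Ai @ Xh for Ai in As]                   # A'_i = X^{1/2} A_i X^{1/2}
    G = np.array([[np.tensordot(Ai, Aj) for Aj in Ap] for Ai in Ap])  # A'A'^T
    b = np.array([np.tensordot(Ai, X) for Ai in As])   # A'_i . I = A_i . X
    y1 = np.linalg.solve(G, b)                         # (3.14)
    y2 = np.linalg.solve(G, [np.tensordot(Ai, Xh @ C @ Xh) for Ai in Ap])
    gap = np.tensordot(C, X) - z                       # C.X - z = S^k.X^k
    y = y2 - gap / (n + rho) * y1                      # (3.13)
    S = C - sum(yi * Ai for yi, Ai in zip(y, As))      # (3.12)
    P = (n + rho) / gap * (Xh @ S @ Xh) - np.eye(n)    # (3.11)
    return y, S, P
```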

Regarding P^k = P(z^k), we have the following lemma, resembling Lemma 3.3.

Lemma 3.10 Let

    μ^k = (S^k • X^k)/n = (C • X^k − z^k)/n   and   μ̄ = (S(z^k) • X^k)/n.

If

    ‖P(z^k)‖ < min( η √(n/(n + η²)), 1 − η ),    (3.15)

then the following three inequalities hold:

    S(z^k) ≻ 0,  ‖(X^k)^{.5} S(z^k) (X^k)^{.5} − μ̄ I‖ < η μ̄,  and  μ̄ < (1 − .5η/√n) μ^k.    (3.16)


Theorem 3.11 Given X^k ∈ F̊_p and (y^k, S^k) ∈ F̊_d, let ρ = √n and z^k = b^T y^k, let X^{k+1} be given by (3.10), and let y^{k+1} = y(z^k) in (3.13) and S^{k+1} = S(z^k) in (3.12). Then, either

    ψ(X^{k+1}, S^k) ≤ ψ(X^k, S^k) − δ

or

    ψ(X^k, S^{k+1}) ≤ ψ(X^k, S^k) − δ,

where δ > 1/20.

Proof. If (3.15) does not hold, i.e.,

    ‖P(z^k)‖ ≥ min( η √(n/(n + η²)), 1 − η ),

then, since ψ(X^{k+1}, S^k) − ψ(X^k, S^k) = P(X^{k+1}, z^k) − P(X^k, z^k),

    ψ(X^{k+1}, S^k) − ψ(X^k, S^k) ≤ −α min( η √(n/(n + η²)), 1 − η ) + α² / (2(1 − α)).

Otherwise, from Lemma 3.10 the inequalities of (3.16) hold:

i) The first of (3.16) indicates that y^{k+1} and S^{k+1} are in F̊_d.

ii) Using the second of (3.16) and applying Lemma 3.2 to the matrix (X^k)^{.5} S^{k+1} (X^k)^{.5}/μ̄, we have

    n log(S^{k+1} • X^k) − log det(S^{k+1} X^k)
      = n log(S^{k+1} • X^k / μ̄) − log det( (X^k)^{.5} S^{k+1} (X^k)^{.5} / μ̄ )
      = n log n − log det( (X^k)^{.5} S^{k+1} (X^k)^{.5} / μ̄ )
      ≤ n log n + ‖(X^k)^{.5} S^{k+1} (X^k)^{.5}/μ̄ − I‖² / (2(1 − ‖(X^k)^{.5} S^{k+1} (X^k)^{.5}/μ̄ − I‖_∞))
      ≤ n log n + η² / (2(1 − η))
      ≤ n log(S^k • X^k) − log det(S^k X^k) + η² / (2(1 − η)).

iii) According to the third of (3.16), we have

    √n ( log(S^{k+1} • X^k) − log(S^k • X^k) ) = √n log(μ̄ / μ^k) ≤ −η/2.

Adding the two inequalities in ii) and iii), we have

    ψ(X^k, S^{k+1}) ≤ ψ(X^k, S^k) − η/2 + η² / (2(1 − η)).

Thus, by choosing η = .43 and α = .3 we have the desired result.


Theorem 3.11 establishes an important fact: the primal-dual potential function can be reduced by a constant no matter where X^k and y^k are.