[Book] Nonlinear Optimization (SC function)


  • 8/3/2019 nonlinear optimization SC function

    1/158

    UNIVERSITY OF WATERLOO

    Nonlinear Optimization

    E. de Klerk, C. Roos, and T. Terlaky

    Waterloo, February 22, 2006


Contents

6 Self-concordant functions
  6.1 Introduction
  6.2 Epigraphs and closed convex functions
  6.3 Definition of the self-concordance property
  6.4 Equivalent formulations of the self-concordance property
  6.5 Positive definiteness of the Hessian matrix
  6.6 Some basic inequalities
  6.7 Quadratic convergence of Newton's method
  6.8 Algorithm with full Newton steps
  6.9 Linear convergence of the damped Newton method
  6.10 Further estimates

7 Minimization of a linear function over a closed convex domain
  7.1 Introduction
  7.2 Effect of a $\mu$-update
  7.3 Estimate of $c^Tx - c^Tx^*$
  7.4 Algorithm with full Newton steps
    7.4.1 Analysis of the algorithm with full Newton steps
  7.5 Algorithm with damped Newton steps
    7.5.1 Analysis of the algorithm with damped Newton steps
  7.6 Adding equality constraints

8 Solving convex optimization problems
  8.1 Introduction
  8.2 Getting a self-concordant barrier for F
  8.3 Tools for proving self-concordancy
  8.4 Application to the functions in the table of Figure 8.1
  8.5 Application to other convex problems
    8.5.1 Entropy minimization
    8.5.2 Extended entropy minimization
    8.5.3 p-norm optimization
    8.5.4 Geometric optimization

9 Conic optimization
  9.1 Introduction
  9.2 Every optimization problem can be modelled as a conic problem
  9.3 Solution method
  9.4 Reduction to inequality system
  9.5 Interior-point condition
  9.6 Embedding into a self-dual problem
  9.7 Self-concordant barrier function for (SP)

10 Symmetric Optimization
  10.1 Self-dual problems over the standard cones
    10.1.1 On the structure of the matrix M
    10.1.2 Linear cone
    10.1.3 Second-order cone
    10.1.4 Semidefinite cone
  10.2 The general self-dual symmetric case
  10.3 Back to the general symmetric case
  10.4 What if $\tau = 0$?
  10.5 Scaling
    10.5.1 The linear case
    10.5.2 The quadratic case
    10.5.3 The semidefinite case

A Some technical lemmas

Bibliography


Chapter 6

Self-concordant functions

Thanks

Thanks are due to Gouyong Gu, Bib Silalahi, Maryam Zangiabadi, Stefan Zwam (students of the 3TU course Optimization), and Gamal Elabwabi (proof of Lemma 10.31) for some textual and conceptual remarks on earlier versions of the following chapters.

6.1 Introduction

In this chapter we introduce the notion of a self-concordant function and we derive some properties of such functions. We consider a strictly convex function $\phi : D \to \mathbb{R}$, where the domain $D$ is an open convex subset of $\mathbb{R}^n$. Our first aim is to find the minimal value of $\phi$ on its domain $D$ (if it exists).

The classical convergence analysis of Newton's method for minimizing $\phi$ has some major shortcomings. The first shortcoming is that the analysis uses quantities that are not a priori known, for example uniform lower and upper bounds for the eigenvalues of the Hessian matrix of $\phi$ on $D$. The second shortcoming is that while Newton's method is affine invariant, these quantities are not affine invariant. As a result, if we change coordinates by an affine transformation, this has in essence no effect on the behavior of Newton's method, but these quantities all change, and as a result the iteration bound changes as well.

A simple and elegant way to avoid these shortcomings was proposed by Nesterov and Nemirovski [8]. They posed an affine invariant condition on the function $\phi$, named self-concordancy. The well-known logarithmic barrier functions, which play an important role in interior-point methods for linear and convex optimization, are self-concordant (abbreviated below as SC). The analysis of Newton's method for self-concordant functions does not depend on any unknown constants. As a consequence, the iteration bound resulting from this analysis is invariant under (affine) changes of coordinates.


The aim of this section is to provide a brief introduction to the notion of self-concordancy, and to recall some results on the behavior of Newton's method when minimizing a self-concordant function.

Having dealt with this, we will consider in the next chapter the problem of minimizing a linear function over the closure of $D$, while assuming that a self-concordant function on $D$ is given. In two subsequent chapters we apply these results to general convex optimization problems and conic optimization problems, respectively.

Although we deviate from it at several places, the treatment below is based mainly on [1], [3], [5] and [7].

6.2 Epigraphs and closed convex functions

We did not deal with the notion of a closed convex function so far. We therefore start with some definitions. In this section and further on, $\phi$ always denotes a function whose domain $D$ is an open convex subset of $\mathbb{R}^n$.

Recall from Definition ?? that the epigraph of $\phi$ is the set

$$\operatorname{epi}\phi := \{(x, t) : x \in D,\ \phi(x) \le t\},$$

and that by Exercise ?? the function $\phi : D \to \mathbb{R}$ is convex if and only if $\operatorname{epi}\phi$ is a convex set.

Definition 6.1. A function $\phi : D \to \mathbb{R}$ is called closed if its epigraph is closed. If, moreover, $\phi$ is convex then $\phi$ is called a closed convex function.

We denote the closure of $D$ as $\bar D$. Then the boundary of $D$ is defined as the set $\bar D \setminus D$.

Lemma 6.2. Let $\phi : D \to \mathbb{R}$ be a closed convex function and let $x^*$ belong to the boundary of $D$. If a sequence $\{x_k\}_{k=0}^\infty$ in the domain converges to $x^*$ then $\phi(x_k) \to \infty$.

Proof. Consider the sequence $\{\phi(x_k)\}_{k=0}^\infty$. Assume that it is bounded above. Then it has a limit point $\phi^*$. Of course, we may assume that this is the unique limit point of the sequence. Therefore,

$$z_k := (x_k, \phi(x_k)) \to (x^*, \phi^*).$$

Note that $z_k$ belongs to the epigraph of $\phi$. Since $\phi$ is a closed function, $(x^*, \phi^*)$ also belongs to the epigraph. But this is a contradiction, since $x^*$ does not belong to the domain of $\phi$. □

We conclude that if the function $\phi$ is closed convex, then it has the property that $\phi(x)$ approaches infinity when $x$ approaches the boundary of the domain $D$. This


is also expressed by saying that $\phi$ is a barrier function on $D$. In fact, the following exercise makes clear that the barrier property is equivalent to the closedness property.

Exercise 6.1. Let the function $\phi : D \to \mathbb{R}$ have the property that it becomes unbounded ($+\infty$) when approaching the boundary of its open domain $D$. Then $\phi$ is closed. Prove this.

Solution: □

6.3 Definition of the self-concordance property

We want to minimize $\phi : D \to \mathbb{R}$ by using Newton's method. Recall that Newton's method is exact if $\phi$ is a quadratic function. As we will see, the self-concordancy property guarantees good behavior of Newton's method.

To start with we consider the case where $\phi$ is a univariate function. So we assume for the moment that $n = 1$, and that the domain $D$ of the convex function $\phi : D \to \mathbb{R}$ is just an open interval in $\mathbb{R}$. The third order Taylor polynomial of $\phi$ around $x \in D$ is given by

$$T_3(\alpha) = \phi(x) + \alpha\,\phi'(x) + \frac{1}{2}\alpha^2\phi''(x) + \frac{1}{6}\alpha^3\phi'''(x).$$

The self-concordance property bounds the third order term in terms of the second order term, by requiring that

$$\frac{\left(\alpha^3\phi'''(x)\right)^2}{\left(\alpha^2\phi''(x)\right)^3} = \frac{(\phi'''(x))^2}{(\phi''(x))^3}, \quad x \in D,$$

is bounded above by some (uniform) constant. According to the following definition this constant is given by $4\kappa^2$.

Definition 6.3. Let $\kappa \ge 0$. The univariate function $\phi$ is called $\kappa$-self-concordant if

$$|\phi'''(x)| \le 2\kappa\left(\phi''(x)\right)^{\frac{3}{2}}, \quad \forall x \in D. \tag{6.1}$$

Note that this definition assumes that $\phi''(x)$ is nonnegative, whence $\phi$ is convex, and moreover that $\phi$ is three times differentiable.

It is easy to verify that the property (6.1) is affine invariant. For let $\phi$ be $\kappa$-self-concordant and let $\psi$ be defined by $\psi(y) = \phi(ay + b)$, where $a \ne 0$. Then one has

$$\psi'(y) = a\,\phi'(x), \quad \psi''(y) = a^2\phi''(x), \quad \psi'''(y) = a^3\phi'''(x),$$


where $x = ay + b$. Hence it follows, due to the exponent $\frac{3}{2}$ in the definition, that $\psi$ is $\kappa$-self-concordant as well.

Now suppose that $n > 1$, so $\phi$ is a multivariate function. Then $\phi$ is called a $\kappa$-self-concordant function if its restriction to an arbitrary line in its domain is $\kappa$-self-concordant. In other words, we have the following definition.

Definition 6.4. Let $\kappa \ge 0$. The function $\phi$ is called $\kappa$-self-concordant if and only if $\phi_{x,h}(\alpha) := \phi(x + \alpha h)$ is a $\kappa$-self-concordant function of $\alpha$, for all $x \in D$ and for all $h \in \mathbb{R}^n$.

Here the domain of $\phi_{x,h}(\alpha)$ is defined in the natural way: for given $x \in D$ and $h \in \mathbb{R}^n$ it consists of all $\alpha$ such that $x + \alpha h \in D$. Note that since $D$ is an open convex subset of $\mathbb{R}^n$, the domain of $\phi_{x,h}(\alpha)$ is an open interval in $\mathbb{R}$.

We proceed by presenting some simple examples of self-concordant functions. In what follows, when it gives no rise to confusion, we will denote the function $\phi_{x,h}$ simply as $\phi$.

Example 6.5 [Linear function] Let $\phi(x) = \gamma + a^Tx$, with $\gamma \in \mathbb{R}$ and $a \in \mathbb{R}^m$. Then, for any $x \in \mathbb{R}^m$ and $h \in \mathbb{R}^m$ we have

$$\phi(\alpha) = \phi(x + \alpha h) = \gamma + a^T(x + \alpha h) = \phi(x) + \alpha\,a^Th.$$

So $\phi(\alpha)$ is linear in $\alpha$, and hence

$$\phi''(\alpha) = \phi'''(\alpha) = 0.$$

This implies that $\phi_{x,h}$ is 0-self-concordant. Since this holds for any $x \in \mathbb{R}^m$ and $h \in \mathbb{R}^m$, it follows that $\phi$ is 0-self-concordant.

Example 6.6 [Convex quadratic function] Let

$$\phi(x) = \gamma + a^Tx + \frac{1}{2}x^TAx,$$

with $\gamma$ and $a$ as before and $A = A^T$ positive semidefinite. Then, for any $x \in \mathbb{R}^m$ and $h \in \mathbb{R}^m$,

$$\phi(\alpha) = \phi(x + \alpha h) = \gamma + a^T(x + \alpha h) + \frac{1}{2}(x + \alpha h)^TA(x + \alpha h) = \phi(x) + \alpha\left(a^Th + h^TAx\right) + \frac{1}{2}\alpha^2\,h^TAh.$$

Hence

$$\phi''(\alpha) = h^TAh \ge 0, \quad \phi'''(\alpha) = 0.$$

Thus it follows that $\phi_{x,h}$ is 0-self-concordant, and since this holds for any $x \in \mathbb{R}^m$ and $h \in \mathbb{R}^m$, $\phi$ itself is also 0-self-concordant.

We may conclude from the above two examples that linear and convex quadratic functions are 0-self-concordant. This is an obvious consequence of the fact that in these cases the third derivatives are zero.

Example 6.7 Consider the convex function $\phi(x) = x^4$, with $x \in \mathbb{R}$. Then

$$\phi'(x) = 4x^3, \quad \phi''(x) = 12x^2, \quad \phi'''(x) = 24x.$$


Now we have

$$\frac{(\phi'''(x))^2}{(\phi''(x))^3} = \frac{(24x)^2}{(12x^2)^3} = \frac{1}{3x^4}.$$

Clearly the right-hand side expression is not bounded as $x \to 0$; hence in this case $\phi(x)$ is not self-concordant.
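As a quick numeric illustration (a Python sketch added here, not part of the original notes, with a helper name of our choosing), the quotient $(\phi'''(x))^2/(\phi''(x))^3 = 1/(3x^4)$ can be evaluated at points approaching $0$ to see that no uniform bound of the form $4\kappa^2$ exists:

```python
def sc_quotient_x4(x):
    """(phi'''(x))^2 / (phi''(x))^3 for phi(x) = x**4; equals 1/(3*x**4)."""
    d2 = 12 * x**2   # phi''(x)
    d3 = 24 * x      # phi'''(x)
    return d3**2 / d2**3

# The quotient blows up as x approaches 0, so no uniform bound exists.
for x in [1.0, 0.1, 0.01]:
    print(x, sc_quotient_x4(x))
```

The printed values grow without bound as $x$ shrinks, matching the closed form $1/(3x^4)$.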

Exercise 6.2. Let $k$ be an integer and $k > 1$. Prove that $\phi(x) = x^k$, where $x \in \mathbb{R}$, is $\kappa$-self-concordant for some $\kappa$ only if $k = 2$.

Solution: □

Example 6.8 Now consider the (univariate) convex function

$$\phi(x) = x^4 - \log x, \quad x > 0.$$

Then

$$\phi'(x) = 4x^3 - \frac{1}{x}, \quad \phi''(x) = 12x^2 + \frac{1}{x^2}, \quad \phi'''(x) = 24x - \frac{2}{x^3}.$$

Therefore,

$$\frac{(\phi'''(x))^2}{(\phi''(x))^3} = \frac{\left(24x - \frac{2}{x^3}\right)^2}{\left(12x^2 + \frac{1}{x^2}\right)^3} = \frac{\left(24x^4 - 2\right)^2}{\left(12x^4 + 1\right)^3} \le \frac{\left(24x^4 + 2\right)^2}{\left(12x^4 + 1\right)^3} = \frac{4}{12x^4 + 1} \le 4.$$

This proves that $\phi(x)$ is a 1-self-concordant function.

Exercise 6.3. Let $k$ be an integer and $k \ge 2$. Verify for which values of $k$ the function $\phi(x) = x^k - \log x$, with $x > 0$, is self-concordant, and in any such case find the best value for $\kappa$.

Solution: We have

$$\phi'(x) = kx^{k-1} - \frac{1}{x}, \quad \phi''(x) = k(k-1)x^{k-2} + \frac{1}{x^2}, \quad \phi'''(x) = k(k-1)(k-2)x^{k-3} - \frac{2}{x^3}.$$

Hence

$$\frac{(\phi'''(x))^2}{(\phi''(x))^3} = \frac{\left(k(k-1)(k-2)x^{k-3} - \frac{2}{x^3}\right)^2}{\left(k(k-1)x^{k-2} + \frac{1}{x^2}\right)^3} = \frac{\left(k(k-1)(k-2)x^k - 2\right)^2}{\left(k(k-1)x^k + 1\right)^3},$$

and we need to find the maximal value of this expression for $x > 0$. We have

$$4\kappa^2 = \max_{x>0}\,\frac{\left(k(k-1)(k-2)x^k - 2\right)^2}{\left(k(k-1)x^k + 1\right)^3} = \max_{y>0}\,\frac{\left(k(k-1)(k-2)y - 2\right)^2}{\left(k(k-1)y + 1\right)^3}.$$

If $k = 0$ or $k = 1$ we get $4\kappa^2 = 4$, whence $\kappa = 1$. If $k = 2$ we also get $\kappa = 1$. Thus we further assume that $k > 2$. Putting $z = k(k-1)y$ and $p = k - 2$, we get

$$4\kappa^2 = \max_{z>0}\,g(z), \qquad g(z) = \frac{(pz - 2)^2}{(z + 1)^3}.$$

Now

$$g'(z) = \frac{(pz - 2)\left(2(p + 3) - pz\right)}{(1 + z)^4},$$

and $g'(z) = 0$ if $z \in \left\{\frac{2}{p},\ \frac{2(p+3)}{p}\right\}$. One has $g\!\left(\frac{2}{p}\right) = 0$ and

$$g\!\left(\frac{2(p + 3)}{p}\right) = \frac{(2p + 4)^2}{\left(1 + \frac{2(3 + p)}{p}\right)^3} = \frac{(2k)^2}{\left(\frac{3k}{k - 2}\right)^3} = \frac{4}{27}\,\frac{(k - 2)^3}{k}.$$

Thus we obtain that if $k > 2$ then

$$\kappa = \sqrt{\frac{(k - 2)^3}{27k}};$$

this value occurs for $y = \frac{2(k+1)}{k(k-1)(k-2)}$, which means that

$$x = \sqrt[k]{\frac{2(k + 1)}{k(k - 1)(k - 2)}}.$$

(Observe that $g(z) \to 4$ as $z \downarrow 0$, so the interior maximum dominates only when $\frac{4}{27}\frac{(k-2)^3}{k} \ge 4$, i.e. when $k \ge 8$; for $2 < k < 8$ the best value remains $\kappa = 1$.)

Example 6.9 Let

$$\phi(x) = -\log x,$$

with $0 < x \in \mathbb{R}$. Then

$$\phi'(x) = -\frac{1}{x}, \quad \phi''(x) = \frac{1}{x^2}, \quad \phi'''(x) = -\frac{2}{x^3},$$

and

$$\frac{(\phi'''(x))^2}{(\phi''(x))^3} = \frac{\left(\frac{2}{x^3}\right)^2}{\left(\frac{1}{x^2}\right)^3} = 4.$$

Hence, $\phi$ is 1-self-concordant.
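The last two computations can be checked numerically. The following Python sketch (the helper `quotient` is ours) verifies that the quotient equals $4$ identically for $-\log x$, and stays below $4$ for the function $x^4 - \log x$ of Example 6.8:

```python
def quotient(d2, d3, x):
    # (phi'''(x))^2 / (phi''(x))^3 for given second and third derivatives
    return d3(x)**2 / d2(x)**3

# phi(x) = -log x:       phi'' = 1/x^2,          phi''' = -2/x^3
neg_log = (lambda x: x**-2, lambda x: -2 * x**-3)
# phi(x) = x^4 - log x:  phi'' = 12x^2 + 1/x^2,  phi''' = 24x - 2/x^3
quartic = (lambda x: 12 * x**2 + x**-2, lambda x: 24 * x - 2 * x**-3)

for i in range(1, 500):
    x = 0.01 * i
    assert abs(quotient(*neg_log, x) - 4.0) < 1e-8   # identically 4
    assert quotient(*quartic, x) <= 4.0 + 1e-12      # bounded by 4
```

Both assertions pass on the sampled grid, in agreement with the closed-form bounds.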

Example 6.10 Let

$$\phi(x) = x - \log(1 + x),$$

with $-1 < x \in \mathbb{R}$. Then

$$\phi'(x) = \frac{x}{1 + x}, \quad \phi''(x) = \frac{1}{(1 + x)^2}, \quad \phi'''(x) = -\frac{2}{(1 + x)^3}, \tag{6.2}$$

and it easily follows that $\phi$ is 1-self-concordant.

Exercise 6.4. Prove that the functions $x\log x$ and $x\log x - x$ are not self-concordant on their domain.


Solution: Defining $\phi(x) = x\log x - x$ one has

$$\phi'(x) = \log x, \quad \phi''(x) = \frac{1}{x}, \quad \phi'''(x) = -\frac{1}{x^2}.$$

It follows that

$$\frac{(\phi'''(x))^2}{(\phi''(x))^3} = \frac{\frac{1}{x^4}}{\frac{1}{x^3}} = \frac{1}{x},$$

which is not bounded above for $x > 0$. Hence $\phi(x)$ is not self-concordant. Since $x\log x$ has the same second and third derivatives as $\phi(x)$, it is also not self-concordant. □

Exercise 6.5. Consider the so-called logarithmic barrier function of the entropy function $x\log x$, which is given by

$$\phi(x) := x\log x - \log x = (x - 1)\log x, \quad 0 < x \in \mathbb{R}.$$

Show that $\phi(x)$ is 1-self-concordant.

Solution: One has

$$\phi'(x) = \frac{x - 1}{x} + \log x, \quad \phi''(x) = \frac{x + 1}{x^2}, \quad \phi'''(x) = -\frac{x + 2}{x^3}.$$

Hence, using also $x > 0$, we may write

$$\frac{(\phi'''(x))^2}{(\phi''(x))^3} = \frac{\left(\frac{x+2}{x^3}\right)^2}{\left(\frac{x+1}{x^2}\right)^3} = \frac{(x + 2)^2}{(x + 1)^3} \le \frac{(2x + 2)^2}{(x + 1)^3} = \frac{4}{x + 1} \le 4,$$

showing that $\phi$ is 1-self-concordant. □

Exercise 6.6. If $\phi$ is a $\kappa$-self-concordant function with $\kappa > 0$, then $\phi$ can be rescaled by a positive scalar so that it becomes 1-self-concordant. This follows because if $\mu$ is some positive constant then $\mu\phi$ is $\frac{\kappa}{\sqrt{\mu}}$-self-concordant. Prove this.

Solution: Due to Definition 6.4 it suffices to deal with the case that $\phi$ is a univariate function. Supposing that $\phi$ is $\kappa$-self-concordant we have

$$|\phi'''(x)| \le 2\kappa\left(\phi''(x)\right)^{\frac{3}{2}}, \quad \forall x \in D.$$

Now let $\mu$ be a positive scalar and $\psi(x) = \mu\phi(x)$ for each $x \in D$. Then

$$\psi''(x) = \mu\,\phi''(x), \quad \psi'''(x) = \mu\,\phi'''(x).$$

Hence

$$|\psi'''(x)| = \mu\,|\phi'''(x)| \le 2\kappa\mu\left(\phi''(x)\right)^{\frac{3}{2}} = \frac{2\kappa\mu}{\mu^{\frac{3}{2}}}\left(\psi''(x)\right)^{\frac{3}{2}} = \frac{2\kappa}{\sqrt{\mu}}\left(\psi''(x)\right)^{\frac{3}{2}}$$

for each $x \in D$, proving that $\psi$ is $\frac{\kappa}{\sqrt{\mu}}$-self-concordant. □
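The rescaling rule can be confirmed numerically. The sketch below (the sampling-based helper `best_kappa`, which estimates the smallest admissible $\kappa$ over a grid, is our own) checks that $-\log x$ gives $\kappa = 1$ and that scaling by $\mu = 9$ yields $\kappa = 1/3$:

```python
import math

def best_kappa(d2, d3, xs):
    # Smallest kappa with |phi'''(x)| <= 2*kappa*(phi''(x))**1.5 over the sample xs
    return max(abs(d3(x)) / (2 * d2(x)**1.5) for x in xs)

xs = [0.05 * i for i in range(1, 200)]

# phi(x) = -log x is 1-self-concordant: the ratio is exactly 1 at every x
kappa = best_kappa(lambda x: x**-2, lambda x: -2 * x**-3, xs)

# psi = mu*phi should be (kappa/sqrt(mu))-self-concordant
mu = 9.0
kappa_scaled = best_kappa(lambda x: mu * x**-2, lambda x: -2 * mu * x**-3, xs)

assert abs(kappa - 1.0) < 1e-9
assert abs(kappa_scaled - kappa / math.sqrt(mu)) < 1e-9
```

For $-\log x$ the ratio $|\phi'''|/(2(\phi'')^{3/2})$ is constant in $x$, so the grid estimate is exact here.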


6.4 Equivalent formulations of the self-concordance property

The aim of this section is to present some other characterizations of the property of self-concordance. We start by introducing some new notation. As before, we assume that $\phi : D \to \mathbb{R}$, where $D$ is an open convex subset of $\mathbb{R}^n$, and, for any $x \in D$ and $h \in \mathbb{R}^n$, we use the univariate function $\phi_{x,h}(\alpha) := \phi(x + \alpha h)$, where $\alpha$ runs through all values such that $x + \alpha h \in D$. The first three derivatives of $\phi_{x,h}(\alpha)$ with respect to $\alpha$ are given by

$$\phi_{x,h}'(\alpha) = \sum_{i=1}^n h_i\,\frac{\partial\phi(x + \alpha h)}{\partial x_i} \tag{6.3}$$

$$\phi_{x,h}''(\alpha) = \sum_{i=1}^n\sum_{j=1}^n h_i h_j\,\frac{\partial^2\phi(x + \alpha h)}{\partial x_i\,\partial x_j} \tag{6.4}$$

$$\phi_{x,h}'''(\alpha) = \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n h_i h_j h_k\,\frac{\partial^3\phi(x + \alpha h)}{\partial x_i\,\partial x_j\,\partial x_k}. \tag{6.5}$$

The formulas (6.3) and (6.4) are immediately clear from Exercise ??. The third expression is left as an exercise.

Exercise 6.7. Prove formula (6.5).

Solution: By (6.4) (see also Exercise ??) we have

$$\phi_{x,h}''(\alpha) = h^T\nabla^2\phi(x + \alpha h)\,h = \sum_{i=1}^n\sum_{j=1}^n h_i h_j\,\frac{\partial^2\phi(x + \alpha h)}{\partial x_i\,\partial x_j}.$$

Since

$$\frac{d}{d\alpha}\,\frac{\partial^2\phi(x + \alpha h)}{\partial x_i\,\partial x_j} = \sum_{k=1}^n h_k\,\frac{\partial}{\partial x_k}\frac{\partial^2\phi(x + \alpha h)}{\partial x_i\,\partial x_j} = \sum_{k=1}^n h_k\,\frac{\partial^3\phi(x + \alpha h)}{\partial x_i\,\partial x_j\,\partial x_k},$$

we obtain

$$\phi_{x,h}'''(\alpha) = \sum_{i=1}^n\sum_{j=1}^n h_i h_j \sum_{k=1}^n h_k\,\frac{\partial^3\phi(x + \alpha h)}{\partial x_i\,\partial x_j\,\partial x_k},$$

which is in agreement with (6.5). □
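The formulas (6.3)-(6.5) can be sanity-checked with finite differences. In the Python sketch below (our own test harness, using the log barrier $-\sum_i \log x_i$ of Example 6.12) the closed-form directional derivatives at $\alpha = 0$ are compared with central differences of the univariate restriction:

```python
import math

def phi(x):
    # phi(x) = -sum_i log x_i, the log barrier on the positive orthant
    return -sum(math.log(v) for v in x)

def restrict(x, h, a):
    # the univariate restriction phi_{x,h}(a) = phi(x + a*h)
    return phi([v + a * w for v, w in zip(x, h)])

x = [1.0, 2.0, 0.5]
h = [0.3, -0.1, 0.2]

# Closed-form values of (6.3)-(6.5) at a = 0 for the log barrier
d1 = sum(-w / v for v, w in zip(x, h))             # grad phi(x)[h]
d2 = sum(w**2 / v**2 for v, w in zip(x, h))        # grad^2 phi(x)[h,h]
d3 = sum(-2 * w**3 / v**3 for v, w in zip(x, h))   # grad^3 phi(x)[h,h,h]

# Central finite differences of the restriction at a = 0
e = 1e-4
fd1 = (restrict(x, h, e) - restrict(x, h, -e)) / (2 * e)
fd2 = (restrict(x, h, e) - 2 * restrict(x, h, 0) + restrict(x, h, -e)) / e**2
fd3 = (restrict(x, h, 2 * e) - 2 * restrict(x, h, e)
       + 2 * restrict(x, h, -e) - restrict(x, h, -2 * e)) / (2 * e**3)

assert abs(fd1 - d1) < 1e-6
assert abs(fd2 - d2) < 1e-5
assert abs(fd3 - d3) < 1e-3
```

The tolerances account for the truncation and rounding error of the difference quotients.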

It will become clear soon that, to verify whether $\phi$ is self-concordant, we need to know the values of the first three derivatives of $\phi_{x,h}(\alpha) = \phi(x + \alpha h)$ at $\alpha = 0$. These immediately follow from (6.3)-(6.5). To simplify notation we use shorthand notations for these values, and denote them respectively as $\nabla\phi(x)[h]$, $\nabla^2\phi(x)[h,h]$ and $\nabla^3\phi(x)[h,h,h]$. Thus we may write

$$\begin{aligned}
\phi_{x,h}'(0) &= \nabla\phi(x)[h] = h^T\nabla\phi(x)\\
\phi_{x,h}''(0) &= \nabla^2\phi(x)[h,h] = h^T\nabla^2\phi(x)\,h\\
\phi_{x,h}'''(0) &= \nabla^3\phi(x)[h,h,h] = h^T\nabla^3\phi(x)[h]\,h.
\end{aligned} \tag{6.6}$$

We now come to the main result of this section [3, Theorem 2.1].

Theorem 6.11. Let $\phi$ be three times continuously differentiable and $\kappa \ge 0$. Then the following three conditions are equivalent:

$$\nabla^3\phi(x)[h,h,h] \le 2\kappa\left(\nabla^2\phi(x)[h,h]\right)^{\frac{3}{2}}, \quad \forall x \in D,\ \forall h \in \mathbb{R}^n \tag{6.7}$$

$$\phi_{x,h}'''(\alpha) \le 2\kappa\left(\phi_{x,h}''(\alpha)\right)^{\frac{3}{2}}, \quad \forall x \in D,\ \forall h \in \mathbb{R}^n,\ \forall\alpha \in \operatorname{dom}\phi_{x,h} \tag{6.8}$$

$$\phi_{x,h}'''(0) \le 2\kappa\left(\phi_{x,h}''(0)\right)^{\frac{3}{2}}, \quad \forall x \in D,\ \forall h \in \mathbb{R}^n. \tag{6.9}$$

Proof. The equivalence of (6.7) and (6.9) is a direct consequence of (6.6). Obviously (6.8) implies (6.9), because $0 \in \operatorname{dom}\phi_{x,h}$. On the other hand, by replacing $x$ by $x + \alpha h$ in (6.9) we obtain (6.8). Hence the proof is complete. □

Remark: In the literature most authors use instead of (6.7) the apparently stronger condition

$$\left|\nabla^3\phi(x)[h,h,h]\right| \le 2\kappa\left(\nabla^2\phi(x)[h,h]\right)^{\frac{3}{2}}, \quad \forall x \in D,\ \forall h \in \mathbb{R}^n. \tag{6.10}$$

But this is not needed, since the left-hand side in (6.7) changes sign when replacing $h$ by $-h$, whereas the right-hand side does not change. Thus it becomes clear that (6.10) and (6.7) are equivalent.

Note that (6.8) just states that $\phi$ is $\kappa$-self-concordant, by Definition 6.4. We will say that $\phi$ is self-concordant, without specifying $\kappa$, if $\phi$ is $\kappa$-self-concordant for some $\kappa \ge 0$. In general (6.9) makes it much simpler to prove that a function is self-concordant. By Theorem 6.11 this will be the case if and only if the quotient

$$\frac{\left(\nabla^3\phi(x)[h,h,h]\right)^2}{\left(\nabla^2\phi(x)[h,h]\right)^3} \tag{6.11}$$

is bounded above by $4\kappa^2$ when $x$ runs through the domain of $\phi$ and $h$ through all vectors in $\mathbb{R}^n$. Note that the condition for $\kappa$-self-concordancy is homogeneous in $h$: if it holds for some $h$ then it holds for every scalar multiple of $h$.

The $\kappa$-self-concordancy condition bounds the third order term in terms of the second order term in the Taylor expansion. Hence, if it is satisfied, the second order Taylor expansion locally provides a good quadratic approximation


of $\phi(x)$. The latter property makes Newton's method behave well on self-concordant functions. This will be shown later on.

Recall that the definition of the $\kappa$-self-concordance property applies to every three times differentiable convex function with an open domain. Keeping this in mind, we can already give two more examples of self-concordant multivariate functions.

Example 6.12 By way of example consider the multivariate function

$$\phi(x) := -\sum_{i=1}^n \log x_i,$$

with $0 < x \in \mathbb{R}^n$. Then, with $e$ denoting the all-one vector,

$$\frac{\partial\phi(x)}{\partial x_i} = -\frac{1}{x_i}, \quad \frac{\partial^2\phi(x)}{\partial x_i^2} = \frac{1}{x_i^2}, \quad \frac{\partial^3\phi(x)}{\partial x_i^3} = -\frac{2}{x_i^3},$$

and all other second and third order partial derivatives are zero. Hence we have for any $h \in \mathbb{R}^n$

$$\nabla^2\phi(x)[h,h] = \sum_{i=1}^n \frac{h_i^2}{x_i^2}, \qquad \nabla^3\phi(x)[h,h,h] = -\sum_{i=1}^n \frac{2h_i^3}{x_i^3}.$$

Hence, putting $\xi_i := \frac{h_i}{x_i}$, we get

$$\nabla^2\phi(x)[h,h] = \sum_{i=1}^n \xi_i^2, \qquad \nabla^3\phi(x)[h,h,h] = -2\sum_{i=1}^n \xi_i^3.$$

Applying the inequality

$$\left|\sum_{i=1}^n \xi_i^3\right| \le \sum_{i=1}^n |\xi_i|^3 \le \left(\sum_{i=1}^n \xi_i^2\right)^{\frac{3}{2}} \tag{6.12}$$

we obtain

$$\left|\nabla^3\phi(x)[h,h,h]\right| \le 2\left(\nabla^2\phi(x)[h,h]\right)^{\frac{3}{2}},$$

thus proving that $\phi$ is 1-self-concordant.
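The final inequality can be probed numerically. The following sketch (the helper `forms` is ours) samples random points $x > 0$ and directions $h$ and checks $|\nabla^3\phi(x)[h,h,h]| \le 2\left(\nabla^2\phi(x)[h,h]\right)^{3/2}$ for the log barrier:

```python
import random

random.seed(42)

def forms(x, h):
    # grad^2 phi[h,h] and grad^3 phi[h,h,h] for phi(x) = -sum_i log x_i
    q = sum(w**2 / v**2 for v, w in zip(x, h))
    c = -2 * sum(w**3 / v**3 for v, w in zip(x, h))
    return q, c

# |grad^3 phi[h,h,h]| <= 2*(grad^2 phi[h,h])^{3/2}, i.e. kappa = 1, cf. (6.12)
for _ in range(1000):
    x = [random.uniform(0.1, 5.0) for _ in range(5)]
    h = [random.uniform(-1.0, 1.0) for _ in range(5)]
    q, c = forms(x, h)
    assert abs(c) <= 2 * q**1.5 + 1e-9
```

Random sampling of course proves nothing; it merely illustrates the bound established by (6.12).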

Example 6.13 With $\phi$ as defined in Example 6.10 we now consider

$$\psi(x) := \sum_{i=1}^n \phi(x_i),$$

with $-e < x \in \mathbb{R}^n$. Letting $h \in \mathbb{R}^n$, we prove that $\psi_{x,h}(\alpha) := \psi(x + \alpha h)$ is 1-self-concordant. This goes as follows. One has

$$\psi_{x,h}(\alpha) = \sum_{i=1}^n \phi(x_i + \alpha h_i).$$

Hence

$$\psi_{x,h}'(\alpha) = \sum_{i=1}^n h_i\,\phi'(x_i + \alpha h_i), \quad \psi_{x,h}''(\alpha) = \sum_{i=1}^n h_i^2\,\phi''(x_i + \alpha h_i), \quad \psi_{x,h}'''(\alpha) = \sum_{i=1}^n h_i^3\,\phi'''(x_i + \alpha h_i).$$

So we have, also using (6.2),

$$\psi_{x,h}''(0) = \sum_{i=1}^n h_i^2\,\phi''(x_i) = \sum_{i=1}^n \frac{h_i^2}{(1 + x_i)^2}, \qquad \psi_{x,h}'''(0) = \sum_{i=1}^n h_i^3\,\phi'''(x_i) = -\sum_{i=1}^n \frac{2h_i^3}{(1 + x_i)^3}.$$

Putting $\xi_i := \frac{h_i}{1 + x_i}$ we thus have

$$\psi_{x,h}''(0) = \sum_{i=1}^n \xi_i^2, \qquad \psi_{x,h}'''(0) = -2\sum_{i=1}^n \xi_i^3.$$

It remains to show that $\left|\psi_{x,h}'''(0)\right| \le 2\left(\psi_{x,h}''(0)\right)^{\frac{3}{2}}$. But this follows from (6.12). Hence the proof is complete.


In what follows we use the following notations:

$$g(x) := \nabla\phi(x), \quad x \in D,$$

and

$$H(x) := \nabla^2\phi(x), \quad x \in D.$$

As we will see in the next section, under a very weak assumption the matrix $H(x)$ is always positive definite. As a consequence it defines an inner product, according to

$$\langle v, w\rangle_x := v^TH(x)\,w, \quad v, w \in \mathbb{R}^n. \tag{6.13}$$

The induced norm is denoted as $\|\cdot\|_x$. So we have

$$\|v\|_x := \sqrt{v^TH(x)\,v}, \quad v \in \mathbb{R}^n. \tag{6.14}$$

Of course, this norm depends on $x \in D$. We call it the local Hessian norm of $v$ at $x \in D$. Using this notation, the inequality (6.7) can be written as

$$\nabla^3\phi(x)[h,h,h] \le 2\kappa\,\|h\|_x^3.$$

We conclude this section with the following characterization of the self-concordance property.

Lemma 6.14. A three times differentiable closed convex function $\phi$ with open domain $D$ is $\kappa$-self-concordant if and only if

$$\left|\nabla^3\phi(x)[h_1,h_2,h_3]\right| \le 2\kappa\,\|h_1\|_x\,\|h_2\|_x\,\|h_3\|_x$$

holds for any $x \in D$ and all $h_1, h_2, h_3 \in \mathbb{R}^n$.

Proof. This statement follows from a general property of trilinear forms. For the proof we refer to Lemma A.2 in the Appendix. □

We conclude this section with one more characterization of self-concordance, leaving the proof to the reader.

Exercise 6.8. Given a three times differentiable convex function $\phi$ as before, and with $\phi_{x,h}$ defined as usual, one has that $\phi$ is $\kappa$-self-concordant if and only if

$$\left|\frac{d}{d\alpha}\left(\frac{1}{\sqrt{\phi_{x,h}''(\alpha)}}\right)\right| \le \kappa, \quad \forall x \in D,\ h \in \mathbb{R}^n,\ \alpha \in \operatorname{dom}\phi_{x,h}. \tag{6.15}$$

Prove this.

Solution: One has

$$\frac{d}{d\alpha}\left(\frac{1}{\sqrt{\phi_{x,h}''(\alpha)}}\right) = -\frac{\phi_{x,h}'''(\alpha)}{2\left(\phi_{x,h}''(\alpha)\right)^{\frac{3}{2}}}.$$

Hence, (6.15) is equivalent to (6.8), which completes the proof. □

6.5 Positive definiteness of the Hessian matrix

In this section we deal with an interesting and important consequence of Lemma 6.2. Before dealing with it we introduce a useful function. Let $x \in D$ and $0 \ne d \in \mathbb{R}^n$ be such that $x + d \in D$. Fixing $v$, we define for $0 \le \alpha \le 1$

$$q(\alpha) := v^TH(x + \alpha d)\,v = \|v\|_{x+\alpha d}^2. \tag{6.16}$$

Then $q(\alpha)$ is nonnegative and continuously differentiable. The derivative with respect to $\alpha$ is given by

$$q'(\alpha) = v^T\nabla^3\phi(x + \alpha d)[d]\,v = \nabla^3\phi(x + \alpha d)[d, v, v].$$

Using Lemma 6.14 we obtain

$$|q'(\alpha)| = \left|\nabla^3\phi(x + \alpha d)[d, v, v]\right| \le 2\kappa\,\|d\|_{x+\alpha d}\,\|v\|_{x+\alpha d}^2 = 2\kappa\,\|d\|_{x+\alpha d}\,q(\alpha).$$

If $q(\alpha) > 0$ this implies

$$\left|\frac{d\log q(\alpha)}{d\alpha}\right| = \frac{|q'(\alpha)|}{q(\alpha)} \le 2\kappa\,\|d\|_{x+\alpha d}. \tag{6.17}$$

In the special case where $v = d$ we have $\|d\|_{x+\alpha d} = q(\alpha)^{\frac{1}{2}}$, and hence we then have

$$|q'(\alpha)| \le 2\kappa\,q(\alpha)^{\frac{3}{2}}. \tag{6.18}$$

If $q(\alpha) > 0$ this implies

$$\left|\frac{d}{d\alpha}\left(\frac{1}{\sqrt{q(\alpha)}}\right)\right| = \frac{|q'(\alpha)|}{2\,q(\alpha)^{\frac{3}{2}}} \le \kappa. \tag{6.19}$$

Theorem 6.15. Let the closed convex function $\phi$ with open domain $D$ be $\kappa$-self-concordant. If $D$ does not contain a straight line then the Hessian $\nabla^2\phi(x)$ is positive definite at any $x \in D$.

Proof. Suppose that $H(x)$ is not positive definite for some $x \in D$. Then there exists a nonzero vector $d \in \mathbb{R}^n$ such that $d^TH(x)\,d = 0$ or, equivalently, $\|d\|_x = 0$. Let $q(\alpha) := \|d\|_{x+\alpha d}^2$, just as in (6.16) with $v = d$. Then $q(0) = 0$ and $q(\alpha)$ is nonnegative and continuously differentiable. Now (6.18) gives $|q'(\alpha)| \le 2\kappa\,q(\alpha)^{\frac{3}{2}}$.

We claim that this implies $q(\alpha) = 0$ for every $\alpha \ge 0$ such that $x + \alpha d \in D$. This is a consequence of the following claim.

Claim: Let $I = [0, a)$ for some $a > 0$ and $q : I \to \mathbb{R}_+$. If $q(0) = 0$ and $|q'(\alpha)| \le 2\kappa\,q(\alpha)^{\frac{3}{2}}$ for every $\alpha \in I$ then $q(\alpha) = 0$ for every $\alpha \in I$.

Proof.¹ Assume $q(\alpha_1) > 0$ for some $\alpha_1 \in I$. Let

$$\alpha_0 := \min\left\{\bar\alpha : q(\alpha) > 0,\ \forall\alpha \in (\bar\alpha, \alpha_1]\right\}.$$

Since $q$ is continuous and $q(0) = 0$, we have $0 \le \alpha_0 < \alpha_1$ and $q(\alpha_0) = 0$. Now define

$$h(t) := \frac{1}{\sqrt{q(\alpha_1 - t)}}, \quad t \in [0, \alpha_1 - \alpha_0).$$

Then, since $\alpha_1 - t \in (\alpha_0, \alpha_1]$, the definition of $\alpha_0$ implies that $h(t)$ is well defined and positive. Note that $h(t)$ goes to $\infty$ if $t$ approaches $\alpha_1 - \alpha_0$. On the other hand we have

$$h'(t) = \frac{1}{2}\,\frac{q'(\alpha_1 - t)}{q(\alpha_1 - t)^{\frac{3}{2}}} \le \frac{1}{2}\,\frac{2\kappa\,q(\alpha_1 - t)^{\frac{3}{2}}}{q(\alpha_1 - t)^{\frac{3}{2}}} = \kappa,$$

and hence $h(t) \le h(0) + \kappa t$ for all $t \in [0, \alpha_1 - \alpha_0)$. Since $h(0) + \kappa t$ remains bounded when $t$ approaches $\alpha_1 - \alpha_0$, we have a contradiction. Hence the claim is proved. □

Thus we have shown that $q(\alpha) = 0$ for every $\alpha \ge 0$ such that $x + \alpha d \in D$. This implies that $\phi(x + \alpha d)$ is linear in $\alpha$, because for some $\beta$ with $0 \le \beta \le \alpha$ we have

$$\phi(x + \alpha d) = \phi(x) + \alpha\,d^Tg(x) + \frac{1}{2}\alpha^2 q(\beta) = \phi(x) + \alpha\,d^Tg(x).$$

Since $D$ does not contain a straight line, there exists an $\bar\alpha$ such that $x + \bar\alpha d$ belongs to the boundary of $D$. We may assume that $\bar\alpha > 0$ (else replace $d$ by $-d$). Since $\lim_{\alpha\uparrow\bar\alpha}\phi(x + \alpha d) = \phi(x) + \bar\alpha\,d^Tg(x)$, which is finite, this gives a conflict with the barrier property of $\phi$ on $D$. Thus the proof is complete. □

Corollary 6.16. If $\phi$ is closed and self-concordant, and $D$ does not contain a line, then $\phi(x)$ has at most one minimizer.

From now on it will be assumed that the hypothesis of Theorem 6.15 is satisfied. So the domain $D$ does not contain a straight line. As a consequence we have

$$\forall x \in D,\ h \in \mathbb{R}^n:\quad \|h\|_x = 0 \iff h = 0.$$

¹ This proof is due to Ir. P. Sonneveld and Dr. A. Almendral.


6.6 Some basic inequalities

From now on we assume that $\phi$ is strictly convex. By Theorem 6.15 this is the case if $\phi$ is closed and self-concordant, and $D$ does not contain a line. The Newton step at $x$ is given by

$$\Delta x = -H(x)^{-1}g(x). \tag{6.20}$$

Suppose that $x^*$ is a minimizer of $\phi(x)$ on $D$. A basic question is how we can measure the distance from $x$ to $x^*$. One obvious measure for the distance is the Euclidean norm $\|x - x^*\|$. But $x^*$ is unknown! So this measure cannot be computed without knowing the minimizer. Therefore we might use instead the Euclidean norm of $\Delta x$, i.e., $\|\Delta x\|$, which vanishes only if $x = x^*$. However, instead of the Euclidean norm we use the local Hessian norm $\|\Delta x\|_x$ of $\Delta x$ at $x$, as introduced in Section 6.4, to measure the distance from $x$ to $x^*$. In what follows we denote this quantity by $\lambda(x)$. Thus we have

$$\lambda(x) := \|\Delta x\|_x = \sqrt{\Delta x^TH(x)\,\Delta x} = \sqrt{g(x)^TH(x)^{-1}g(x)}. \tag{6.21}$$

Exercise 6.9. If the full Newton step at $x \in D$ is feasible, i.e., if $x + \Delta x \in D$, then we have

$$\phi(x + \Delta x) \ge \phi(x) - \lambda(x)^2.$$

Prove this.

Solution: Since $\phi$ is convex, we have $\phi(x + \Delta x) \ge \phi(x) + \Delta x^Tg(x)$. Using (6.20) and (6.21) we get $\Delta x^Tg(x) = -g(x)^TH(x)^{-1}g(x) = -\lambda(x)^2$. □
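Because $H(x)$ is diagonal for separable functions, $\lambda(x)$ is cheap to evaluate. The sketch below computes (6.21) for the illustrative test function $\phi(x) = \sum_i (c_i x_i - \log x_i)$ (our own choice, not from the text) and confirms that $\lambda$ vanishes exactly at the minimizer:

```python
import math

def newton_decrement(x, c):
    # lambda(x) for phi(x) = sum_i (c_i*x_i - log x_i) on the positive orthant.
    # Here g_i = c_i - 1/x_i and H is diagonal with H_ii = 1/x_i^2, so
    # lambda(x)^2 = g^T H^{-1} g = sum_i (g_i * x_i)^2.
    g = [ci - 1.0 / xi for ci, xi in zip(c, x)]
    return math.sqrt(sum((gi * xi)**2 for gi, xi in zip(g, x)))

c = [1.0, 2.0, 4.0]
x_star = [1.0 / ci for ci in c]   # the gradient vanishes here

assert newton_decrement(x_star, c) == 0.0       # lambda = 0 only at the minimizer
assert newton_decrement([1.0, 1.0, 1.0], c) > 0
```

Note that $\lambda(x)$ is computable without knowing $x^*$, which is exactly the point of the measure.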

Lemma 6.17. Let $x \in D$, $\alpha \in \mathbb{R}_+$ and $d \in \mathbb{R}^n$ such that $x + \alpha d \in D$. Then

$$\frac{\|d\|_x}{1 + \alpha\kappa\|d\|_x} \le \|d\|_{x+\alpha d} \le \frac{\|d\|_x}{1 - \alpha\kappa\|d\|_x};$$

the left inequality holds for all $\alpha$ such that $1 + \alpha\kappa\|d\|_x > 0$ and the right for all $\alpha$ such that $1 - \alpha\kappa\|d\|_x > 0$.

Proof. Let $q(\alpha) := \|d\|_{x+\alpha d}^2$, just as in (6.16) with $v = d$. Then, from (6.19),

$$\left|\frac{d}{d\alpha}\left(\frac{1}{\sqrt{q(\alpha)}}\right)\right| = \frac{|q'(\alpha)|}{2\,q(\alpha)^{\frac{3}{2}}} \le \kappa.$$

Consequently, if $x + \alpha d \in D$ then

$$q(0)^{-\frac{1}{2}} - \alpha\kappa \le q(\alpha)^{-\frac{1}{2}} \le q(0)^{-\frac{1}{2}} + \alpha\kappa.$$

Since $q(0)^{-\frac{1}{2}} = \frac{1}{\|d\|_x}$ and $q(\alpha)^{-\frac{1}{2}} = \frac{1}{\|d\|_{x+\alpha d}}$, this gives

$$\frac{1}{\|d\|_x} - \alpha\kappa \le \frac{1}{\|d\|_{x+\alpha d}} \le \frac{1}{\|d\|_x} + \alpha\kappa,$$

or, equivalently,

$$\frac{1 - \alpha\kappa\|d\|_x}{\|d\|_x} \le \frac{1}{\|d\|_{x+\alpha d}} \le \frac{1 + \alpha\kappa\|d\|_x}{\|d\|_x}.$$

Hence, if $1 + \alpha\kappa\|d\|_x > 0$ we obtain

$$\frac{\|d\|_x}{1 + \alpha\kappa\|d\|_x} \le \|d\|_{x+\alpha d},$$

and if $1 - \alpha\kappa\|d\|_x > 0$ we obtain

$$\|d\|_{x+\alpha d} \le \frac{\|d\|_x}{1 - \alpha\kappa\|d\|_x},$$

proving the lemma. □

Exercise 6.10. With $h \in \mathbb{R}^n$ fixed, define

$$\eta(\alpha) := \frac{1}{\|h\|_{x+\alpha h}}.$$

Then

$$\eta'(\alpha) = -\frac{\nabla^3\phi(x + \alpha h)[h,h,h]}{2\left(\nabla^2\phi(x + \alpha h)[h,h]\right)^{\frac{3}{2}}},$$

and hence $|\eta'(\alpha)| \le \kappa$. Derive Lemma 6.17 from this.

Solution: □

Lemma 6.18. Let $x$ and $d$ be such that $x \in D$, $x + d \in D$ and $\kappa\|d\|_x < 1$. Then we have, for any nonzero $v \in \mathbb{R}^n$,

$$(1 - \kappa\|d\|_x)\,\|v\|_x \le \|v\|_{x+d} \le \frac{\|v\|_x}{1 - \kappa\|d\|_x}. \tag{6.22}$$

Proof. Let $q(\alpha) := \|v\|_{x+\alpha d}^2$, just as in (6.16). Then $q(0) = \|v\|_x^2$ and $q(1) = \|v\|_{x+d}^2$. Hence we may write

$$\log\frac{\|v\|_{x+d}}{\|v\|_x} = \frac{1}{2}\log\frac{q(1)}{q(0)} = \frac{1}{2}\left(\log q(1) - \log q(0)\right) = \frac{1}{2}\int_0^1 \frac{d\log q(\alpha)}{d\alpha}\,d\alpha.$$

By (6.17) we have $\left|\frac{d\log q(\alpha)}{d\alpha}\right| \le 2\kappa\,\|d\|_{x+\alpha d}$. Also using Lemma 6.17, this implies

$$\log\frac{\|v\|_{x+d}}{\|v\|_x} \le \int_0^1 \frac{\kappa\,\|d\|_x}{1 - \alpha\kappa\|d\|_x}\,d\alpha = -\log\left(1 - \alpha\kappa\|d\|_x\right)\Big|_{\alpha=0}^{1} = \log\frac{1}{1 - \kappa\|d\|_x}$$

and

$$\log\frac{\|v\|_{x+d}}{\|v\|_x} \ge -\int_0^1 \frac{\kappa\,\|d\|_x}{1 - \alpha\kappa\|d\|_x}\,d\alpha = \log\left(1 - \kappa\|d\|_x\right).$$

Since the $\log$ function is monotonically increasing, we obtain from the above inequalities that

$$1 - \kappa\|d\|_x \le \frac{\|v\|_{x+d}}{\|v\|_x} \le \frac{1}{1 - \kappa\|d\|_x}.$$

This proves the lemma. □

Exercise 6.11. If $x$ and $d$ are such that $x \in D$, $x + d \in D$ and $\kappa\|d\|_x < 1$, then

$$(1 - \kappa\|d\|_x)^2\,H(x) \preceq H(x + d) \preceq \frac{H(x)}{(1 - \kappa\|d\|_x)^2}.$$

Derive this from Lemma 6.18.

Solution: □

Lemma 6.19. Let $x \in D$ and $d \in \mathbb{R}^n$. If $\kappa\|d\|_x < 1$ then $x + d \in D$.

Proof. Since $\kappa\|d\|_x < 1$, we have from Lemma 6.18 that $H(x + \alpha d)$ is bounded for all $0 \le \alpha \le 1$, and thus $\phi(x + \alpha d)$ is bounded. On the other hand, $\phi$ takes infinite values on the boundary of the feasible set, by Lemma 6.2. As a consequence we must have $x + d \in D$. □

6.7 Quadratic convergence of Newton's method

In Example ?? we have already seen an example where Newton's method converges quadratically fast if the method starts in the neighborhood of the minimizer. In this section we show that, in case the function that is minimized is self-concordant, this behavior of Newton's method can be quantified nicely. More precisely, we can accurately specify a region around the minimizer where Newton's method is quadratically convergent, by means of our proximity measure $\lambda(x)$.


Let

$$x^+ := x + \Delta x$$

denote the iterate after the Newton step at $x$. Recall that the Newton step at $x$ is given by

$$\Delta x = -H(x)^{-1}g(x),$$

where $H(x)$ and $g(x)$ are the Hessian matrix and the gradient of $\phi(x)$, respectively.

Recall from (6.21) that we measure the distance from $x$ to the minimizer $x^*$ of $\phi(x)$ by the quantity

$$\lambda(x) = \|\Delta x\|_x = \sqrt{g(x)^TH(x)^{-1}g(x)}.$$

Note that if $x = x^*$ then $g(x) = 0$ and hence $\lambda(x) = 0$, whereas in all other cases $\lambda(x)$ will be positive.

The Newton step at $x^+$ is denoted as $\Delta x^+$. After the Newton step we have

$$\lambda(x^+) = \|\Delta x^+\|_{x^+} = \left\|H(x^+)^{-1}g(x^+)\right\|_{x^+} = \sqrt{g(x^+)^TH(x^+)^{-1}g(x^+)}.$$

We are now ready to prove our first main result on the behavior of Newton's method on self-concordant functions.

    Theorem 6.20. If (x) 1

    then x+ is feasible. Moreover, if (x) < 1

    then

    (x+)

    (x)

    1 (x)2

    .

Proof. The feasibility of $x^+$ follows from Lemma 6.19, since $\|\Delta x\|_x = \lambda(x) \le 1/\kappa$. To prove the second statement in the theorem we denote the Newton step at $x^+$ shortly as $v$. So

$$v := -H(x^+)^{-1}g(x^+).$$

For $0 \le \alpha \le 1$ we consider the function

$$k(\alpha) := v^T g(x + \alpha\Delta x) - (1-\alpha)\,v^T g(x).$$

Note that $k(0) = 0$ and

$$k(1) = v^T g(x^+) = -g(x^+)^T H(x^+)^{-1} g(x^+) = -\lambda(x^+)^2.$$

Taking the derivative of $k$ with respect to $\alpha$ we get, also using $H(x)\Delta x = -g(x)$,

$$k'(\alpha) = v^T H(x+\alpha\Delta x)\,\Delta x + v^T g(x) = v^T\left(H(x+\alpha\Delta x) - H(x)\right)\Delta x.$$

By Exercise 6.11,

$$H(x+\alpha\Delta x) - H(x) \preceq \left(\frac{1}{(1-\alpha\kappa\|\Delta x\|_x)^2} - 1\right)H(x).$$


Now applying the generalized Cauchy inequality in the Appendix (Lemma A.1) we get

$$\left|v^T\left(H(x+\alpha\Delta x) - H(x)\right)\Delta x\right| \le \left(\frac{1}{(1-\alpha\kappa\|\Delta x\|_x)^2} - 1\right)\|v\|_x\,\|\Delta x\|_x.$$

Hence, combining the above results, and using $\|\Delta x\|_x = \lambda(x)$, we may write

$$|k'(\alpha)| \le \left(\frac{1}{(1-\alpha\kappa\lambda(x))^2} - 1\right)\|v\|_x\,\lambda(x).$$

Therefore, since $k(0) = 0$,

$$|k(1)| \le \lambda(x)\,\|v\|_x \int_0^1 \left(\frac{1}{(1-\alpha\kappa\lambda(x))^2} - 1\right)d\alpha = \frac{\kappa\,\|v\|_x\,\lambda(x)^2}{1-\kappa\lambda(x)}.$$

Since $v$ is the Newton step at $x^+$, we have, by Lemma 6.18,

$$\|v\|_x \le \frac{\|v\|_{x^+}}{1-\kappa\|\Delta x\|_x} = \frac{\lambda(x^+)}{1-\kappa\lambda(x)}.$$

Since $|k(1)| = \lambda(x^+)^2$, it follows by substitution that

$$\lambda(x^+)^2 = |k(1)| \le \frac{\lambda(x^+)}{1-\kappa\lambda(x)} \cdot \frac{\kappa\lambda(x)^2}{1-\kappa\lambda(x)}.$$

Dividing both sides by $\lambda(x^+)$ the theorem follows. $\Box$

Corollary 6.21. If $\kappa\lambda(x) \le \frac12\left(3 - \sqrt{5}\right) \approx 0.3820$ then $x^+$ is feasible and $\lambda(x^+) \le \lambda(x)$.

Corollary 6.22. If $\kappa\lambda(x) \le \frac13$ then $x^+$ is feasible and $\lambda(x^+) \le \frac{9\kappa}{4}\,\lambda(x)^2$, or equivalently,

$$\kappa\lambda(x^+) \le \left(\frac32\,\kappa\lambda(x)\right)^2.$$

    6.8 Algorithm with full Newton steps

Assuming that we know a point $x \in D$ with $\kappa\lambda(x) \le \frac13$, we can easily obtain a point $x \in D$ such that $\lambda(x) \le \varepsilon$, for prescribed $\varepsilon > 0$, with the algorithm in Figure 6.1. In this algorithm we use the notation introduced before. So $\Delta x$ denotes the Newton step with respect to $\Phi$ at $x$, given by (6.20), and $\lambda(x)$ is the value of the proximity measure at $x$ as given by (6.21). We assume that $\Phi$ is not linear or quadratic. Then $\kappa > 0$. Due to Exercise 6.6 we may always assume that $\kappa \ge \frac49$. We will assume this from now on.


Input:
  An accuracy parameter $\varepsilon \in (0, 1)$;
  $x \in D$ such that $\kappa\lambda(x) \le \frac13$.
while $\lambda(x) > \varepsilon$ do
  $x := x + \Delta x$
endwhile

Figure 6.1. Algorithm with full Newton steps

The following theorem gives an upper bound for the number of iterations required by the algorithm.

Theorem 6.23. Let $x \in D$ and $\kappa\lambda(x) \le \frac13$. Then the algorithm with full Newton steps requires at most

$$\left\lceil \log_2\left(3.4761\,\log\frac{1}{\kappa\varepsilon}\right)\right\rceil$$

iterations. The output is a point $x \in D$ such that $\lambda(x) \le \varepsilon$.

Proof. Let $x^0 \in D$ be such that $\kappa\lambda(x^0) \le \frac13$. Starting at $x^0$ we repeatedly apply full Newton steps until the $k$-th iterate, denoted as $x^k$, satisfies $\lambda(x^k) \le \varepsilon$, where $\varepsilon > 0$ is the prescribed accuracy parameter. We can estimate the required number of Newton steps by using Corollary 6.22. To simplify notation we define for the moment $\lambda_0 = \kappa\lambda(x^0)$ and $\gamma = \frac32$. Note that $\gamma\lambda_0 \le \frac12$. It then follows that

$$\kappa\lambda(x^k) \le \left(\gamma\,\kappa\lambda(x^{k-1})\right)^2 \le \gamma^2\left(\gamma\,\kappa\lambda(x^{k-2})\right)^4 \le \cdots \le \gamma^{2+4+\cdots+2^k}\,\lambda_0^{2^k}.$$

This gives

$$\kappa\lambda(x^k) \le \gamma^{2^{k+1}-2}\,\lambda_0^{2^k} = \gamma^{-2}\left(\gamma^2\lambda_0\right)^{2^k} \le \left(\gamma^2\lambda_0\right)^{2^k}.$$

Using the definition of $\gamma$ and $\lambda_0 \le \frac13$ we obtain

$$\gamma^2\lambda_0 \le \left(\frac32\right)^2 \frac13 = \frac34.$$

Hence, we certainly have $\lambda(x^k) \le \varepsilon$ if $\left(\frac34\right)^{2^k} \le \kappa\varepsilon$. Taking logarithms at both sides this reduces to

$$2^k \log\frac34 \le \log(\kappa\varepsilon).$$


Dividing by $\log\frac34$ (which is negative!) we get $2^k \ge \frac{\log(\kappa\varepsilon)}{\log\frac34}$, or, equivalently, $k \ge \log_2\frac{\log(\kappa\varepsilon)}{\log\frac34}$. Thus we find that after no more than

$$\left\lceil\log_2\frac{\log(\kappa\varepsilon)}{\log\frac34}\right\rceil = \left\lceil\log_2\left(-3.4761\,\log(\kappa\varepsilon)\right)\right\rceil = \left\lceil\log_2\left(3.4761\,\log\frac{1}{\kappa\varepsilon}\right)\right\rceil$$

iterations the process will stop and the output will be an $x \in D$ such that $\lambda(x) \le \varepsilon$. $\Box$

6.9 Linear convergence of the damped Newton method

In this section we consider the case where $x \in D$ lies outside the region where the Newton process is quadratically convergent. More precisely, we assume that $\kappa\lambda(x) > \frac13$. In that case we perform a damped Newton step, with damping factor $\alpha$, and the new iterate is given by

$$x^+ = x + \alpha\,\Delta x.$$

In the algorithm of Figure 6.2 we use $\alpha = 1/(1 + \kappa\lambda(x))$ as a default step size.

Input:
  $x \in D$ such that $\kappa\lambda(x) > \frac13$.
while $\kappa\lambda(x) > \frac13$ do
  $\alpha := \frac{1}{1+\kappa\lambda(x)}$
  $x := x + \alpha\,\Delta x$
endwhile

Figure 6.2. Algorithm with damped Newton steps

In the next theorem we use the function

$$\omega(t) := t - \log(1+t), \quad t > -1. \qquad (6.23)$$

Note that this is a strictly convex nonnegative function, which is minimal at $t = 0$, with $\omega(0) = 0$. The theorem shows that with an appropriate choice of $\alpha$ we can guarantee a fixed decrease in $\Phi$ after the step.

    20

  • 8/3/2019 nonlinear optimization SC function

    26/158

Theorem 6.24. Let $x \in D$ and $\lambda := \lambda(x)$. If $\alpha := \frac{1}{1+\kappa\lambda}$ then

$$\Phi(x) - \Phi(x + \alpha\Delta x) \ge \frac{\omega(\kappa\lambda)}{\kappa^2}.$$

Proof. Define

$$\varphi(\alpha) := \Phi(x) - \Phi(x + \alpha\Delta x).$$

Then

$$\varphi'(\alpha) = -g(x+\alpha\Delta x)^T\Delta x,$$
$$\varphi''(\alpha) = -\Delta x^T H(x+\alpha\Delta x)\,\Delta x = -\nabla^2\Phi(x+\alpha\Delta x)[\Delta x, \Delta x],$$
$$\varphi'''(\alpha) = -\nabla^3\Phi(x+\alpha\Delta x)[\Delta x, \Delta x, \Delta x].$$

Now using that $\Phi$ is $\kappa$-self-concordant we deduce from the last expression that

$$|\varphi'''(\alpha)| \le 2\kappa\,\|\Delta x\|^3_{x+\alpha\Delta x}.$$

Hence, also using Lemma 6.17,

$$|\varphi'''(\alpha)| \le \frac{2\kappa\,\|\Delta x\|_x^3}{(1-\alpha\kappa\|\Delta x\|_x)^3} = \frac{2\kappa\lambda^3}{(1-\alpha\kappa\lambda)^3}.$$

This information on the third derivative of $\varphi(\alpha)$ is used to prove the theorem, by integrating three times. By integrating once we obtain

$$\varphi''(\alpha) \ge \varphi''(0) - \int_0^\alpha \frac{2\kappa\lambda^3}{(1-\tau\kappa\lambda)^3}\,d\tau = \varphi''(0) - \left.\frac{\lambda^2}{(1-\tau\kappa\lambda)^2}\right|_{\tau=0}^{\tau=\alpha} = \varphi''(0) - \frac{\lambda^2}{(1-\alpha\kappa\lambda)^2} + \lambda^2.$$

Since $\varphi''(0) = -\nabla^2\Phi(x)[\Delta x, \Delta x] = -\lambda^2$, we obtain

$$\varphi''(\alpha) \ge -\frac{\lambda^2}{(1-\alpha\kappa\lambda)^2}.$$

By integrating once more we derive an estimate for $\varphi'(\alpha)$:

$$\varphi'(\alpha) \ge \varphi'(0) - \int_0^\alpha \frac{\lambda^2}{(1-\tau\kappa\lambda)^2}\,d\tau = \varphi'(0) - \left.\frac{\lambda}{\kappa(1-\tau\kappa\lambda)}\right|_{\tau=0}^{\tau=\alpha} = \varphi'(0) - \frac{\lambda}{\kappa(1-\alpha\kappa\lambda)} + \frac{\lambda}{\kappa}.$$

Since $\varphi'(0) = -g(x)^T\Delta x = \Delta x^T H(x)\Delta x = \lambda^2$, we obtain

$$\varphi'(\alpha) \ge -\frac{\lambda}{\kappa(1-\alpha\kappa\lambda)} + \frac{\lambda}{\kappa} + \lambda^2.$$

Finally, in the same way we derive an estimate for $\varphi(\alpha)$. Using that $\varphi(0) = 0$ we have

$$\varphi(\alpha) \ge \int_0^\alpha \left(-\frac{\lambda}{\kappa(1-\tau\kappa\lambda)} + \frac{\lambda}{\kappa} + \lambda^2\right)d\tau = \frac{1}{\kappa^2}\log\left(1-\alpha\kappa\lambda\right) + \frac{\alpha\lambda}{\kappa} + \alpha\lambda^2.$$


One may easily verify that the last expression is maximal for $\alpha = a := \frac{1}{1+\kappa\lambda}$. Substitution of this value yields, using $\frac{a\lambda}{\kappa} + a\lambda^2 = \frac{\lambda}{\kappa}$,

$$\varphi(a) \ge \frac{1}{\kappa^2}\log\frac{1}{1+\kappa\lambda} + \frac{\lambda}{\kappa} = \frac{1}{\kappa^2}\left(\kappa\lambda - \log(1+\kappa\lambda)\right) = \frac{\omega(\kappa\lambda)}{\kappa^2},$$

which is the desired inequality. $\Box$

Since $\omega(t)$ is monotonically increasing for positive $t$, and $\kappa\lambda > \frac13$, the following result is an immediate consequence of Theorem 6.24.

Corollary 6.25. If $\kappa\lambda(x) > \frac13$ then $x^+$ is feasible and

$$\Phi(x) - \Phi(x^+) \ge \frac{1}{\kappa^2}\,\omega\!\left(\frac13\right) = \frac{0.0457\ldots}{\kappa^2} > \frac{1}{22\kappa^2}.$$

The next result is an obvious consequence of this corollary.

Theorem 6.26. Let $x \in D$ and $\kappa\lambda(x) > \frac13$. If $x^*$ denotes the minimizer of $\Phi(x)$, then the algorithm with damped Newton steps requires at most

$$\left\lceil 22\kappa^2\left(\Phi(x^0) - \Phi(x^*)\right)\right\rceil$$

iterations. The output is a point $x \in D$ such that $\kappa\lambda(x) \le \frac13$.

In order to obtain a solution such that $\lambda(x) \le \varepsilon$, after the algorithm with damped Newton steps we can proceed with full Newton steps. Due to Theorem 6.23 and Theorem 6.24 we can obtain such a solution after a total of at most

$$22\kappa^2\left(\Phi(x^0) - \Phi(x^*)\right) + \log_2\left(3.4761\,\log\frac{1}{\kappa\varepsilon}\right) \qquad (6.24)$$

iterations. Note the drawback of the above iteration bound: usually we have no a priori knowledge of $\Phi(x^*)$ and the bound cannot be calculated at the start of the algorithm. But in many cases we can derive a good estimate for $\Phi(x^0) - \Phi(x^*)$, and then we obtain an upper bound for the number of iterations before starting the optimization process.

Example 6.27 Consider the function $\varphi: (-1, \infty) \to \mathbb{R}$ defined by

$$\varphi(x) = x - \log(1+x), \quad x > -1.$$

We established earlier that $\varphi$ is 1-self-concordant, in Example 6.10. One has

$$\varphi'(x) = \frac{x}{1+x}, \quad \varphi''(x) = \frac{1}{(1+x)^2}.$$


Therefore,

$$\lambda(x) = \sqrt{\frac{\varphi'(x)^2}{\varphi''(x)}} = \sqrt{x^2} = |x|.$$

Note that $x = 0$ is the unique minimizer. The Newton step at $x$ is given by

$$\Delta x = -\frac{\varphi'(x)}{\varphi''(x)} = -x(1+x),$$

and a full Newton step yields

$$x^+ = x - x(1+x) = -x^2.$$

The Newton step is feasible only if $-x^2 > -1$, i.e., only if $\lambda(x) < 1$. Note that the theory guarantees feasibility in that case. Moreover, if the Newton step is feasible then $\lambda(x^+) = \lambda(x)^2$, which is better than the theoretical result of Theorem 6.20.

When we take a damped Newton step, with the default step size $\alpha = \frac{1}{1+\lambda(x)}$, the next iterate is given by

$$x^+ = x - \frac{x(1+x)}{1+|x|} = \begin{cases} 0 & \text{if } x > 0, \\[1ex] \dfrac{-2x^2}{1-x} & \text{if } x < 0. \end{cases}$$

Thus we find in this example that the damped Newton step is exact if $x > 0$. Also, if $-1 < x < 0$ then

$$\frac{-2x^2}{1-x} < -x^2,$$

and hence the full Newton step then performs better than the damped Newton step. Finally observe that if we apply Newton's method until $\lambda(x) \le \varepsilon$ then the output is a point $x$ such that $|x| \le \varepsilon$.

Example 6.28 We now consider the function $\Phi(x)$ introduced in Example 6.13:

$$\Phi(x) := \sum_{i=1}^n \varphi(x_i) = \sum_{i=1}^n \left(x_i - \log(1+x_i)\right),$$

with $x \in \mathbb{R}^n$, $x > -e$, where $e$ denotes the all-one vector. The gradient and Hessian of $\Phi$ are

$$g(x) = \left(\frac{x_1}{1+x_1};\ \ldots;\ \frac{x_n}{1+x_n}\right) = \frac{x}{e+x},$$

$$H(x) = \mathrm{diag}\left(\frac{1}{(1+x_1)^2};\ \ldots;\ \frac{1}{(1+x_n)^2}\right) = \mathrm{diag}\left(\frac{e}{(e+x)^2}\right),$$

where the vector operations are understood componentwise.

We already established that $\Phi$ is 1-self-concordant. One has

$$\lambda(x) = \sqrt{g(x)^T H(x)^{-1} g(x)} = \sqrt{\sum_{i=1}^n x_i^2} = \|x\|.$$

This implies that $x = 0$ is the unique minimizer. The Newton step at $x$ is given by

$$\Delta x = -H(x)^{-1}g(x) = -x(e+x),$$

and a full Newton step yields

$$x^+ = x - x(e+x) = -x^2.$$

The Newton step is feasible only if $-x^2 > -e$, i.e., if $x^2 < e$; this certainly holds if $\lambda(x) < 1$. Note that the theory guarantees feasibility only in that case. Moreover, if the Newton step is feasible then

$$\lambda(x^+) = \left\|x^2\right\| \le \|x\|_\infty\,\|x\| \le \lambda(x)^2,$$

and this is again better than the theoretical result of Theorem 6.20.

When we take a damped Newton step, with the default step size $\alpha = \frac{1}{1+\lambda(x)}$, the next iterate is given by

$$x^+ = x - \frac{x(e+x)}{1+\|x\|}.$$

If we apply Newton's method until $\lambda(x) \le \varepsilon$ then the output is an $x$ such that $\|x\| \le \varepsilon$.


Example 6.29 Consider the (univariate) function $f: (0, \infty) \to \mathbb{R}$ defined by

$$f(x) = x\log x - \log x, \quad x > 0.$$

This is the logarithmic barrier function of the entropy function $x\log x$. It can easily be shown that $f$ is 1-self-concordant. One has

$$f'(x) = \frac{x-1}{x} + \log x, \quad f''(x) = \frac{x+1}{x^2}.$$

Therefore,

$$\lambda(x) = \sqrt{\frac{f'(x)^2}{f''(x)}} = \sqrt{\frac{(x-1+x\log x)^2}{1+x}} = \frac{|x-1+x\log x|}{\sqrt{1+x}}.$$

Note that $\lambda(1) = 0$, which implies that $x = 1$ is the unique minimizer. The Newton step at any $x > 0$ is given by

$$\Delta x = -\frac{f'(x)}{f''(x)} = -\frac{x\left(x-1+x\log x\right)}{1+x}.$$

So a full Newton step yields

$$x^+ = x - \frac{x\left(x-1+x\log x\right)}{1+x} = \frac{x\left(2 - x\log x\right)}{1+x}.$$

When we take a damped Newton step, with the default step size $\alpha = \frac{1}{1+\lambda(x)}$, the next iterate is given by

$$x^+ = x - \frac{x\left(x-1+x\log x\right)}{(1+x)\left(1+\lambda(x)\right)}.$$

We conclude this example with a numerical experiment. If we start at $x = 10$ we get as output the figures in the following tableau. In this tableau $k$ denotes the iteration number, $x_k$ the $k$-th iterate, $\lambda(x_k)$ is the proximity value at $x_k$, and $\alpha_k$ the step size used in the $(k+1)$-th iteration.

k    x_k    f(x_k)    λ(x_k)    α_k

    0 10.00000000000000 20.72326583694642 9.65615737513337 0.09384245791391

    1 7.26783221086343 12.43198234403589 7.19322142387618 0.12205211457924

    2 5.04872746432087 6.55544129967853 4.97000287092924 0.16750410705319

    3 3.33976698811526 2.82152744553701 3.05643368252612 0.24652196443090

    4 2.13180419256384 0.85674030296950 1.55140872104182 0.39194033937129

    5 1.39932346194914 0.13416824208214 0.56132642454284 0.64048105782415

    6 1.07453881397326 0.00535871156275 0.10538523300312 1.00000000000000

    7 0.99591735745291 0.00001670208774 0.00577372342963 1.00000000000000

    8 0.99998748482804 0.00000000015663 0.00001769912592 1.00000000000000

    9 0.99999999988253 0.00000000000000 0.00000000016613 1.00000000000000

    If we start at x = 0.1 the output becomes

k    x_k    f(x_k)    λ(x_k)    α_k

    0 0.10000000000000 2.07232658369464 1.07765920479347 0.48131088953032

    1 0.14945506622819 1.61668135596306 1.05829223631865 0.48583965986703

    2 0.22112932596124 1.17532173793649 1.00679545093710 0.49830688998873

3 0.32152237588997 0.76986051286674 0.90755746327638 0.52423060340338
4 0.45458940014373 0.42998027395695 0.74937259761986 0.57163351098592

    5 0.61604926491198 0.18599661844608 0.53678522950535 0.65070901307522

    6 0.78531752299982 0.05188170346324 0.30270971353625 1.00000000000000

    7 0.96323307457328 0.00137728412903 0.05199249905660 1.00000000000000

    8 0.99897567517041 0.00000104977911 0.00144861398705 1.00000000000000

    9 0.99999921284500 0.00000000000062 0.00000111320527 1.00000000000000

    10 0.99999999999954 0.00000000000000 0.00000000000066 1.00000000000000


Observe that in both cases the number of iterations is significantly less than the theoretical bound in (6.24).
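The experiment is easy to reproduce. The sketch below (illustration code, under the assumption—consistent with the step sizes in the tableaus—that a full step is taken as soon as $\lambda(x) \le \frac13$) reruns the iteration from $x = 10$:

```python
import math

def fp(x):  return (x - 1 + x * math.log(x)) / x          # f'(x)
def fpp(x): return (x + 1) / x ** 2                       # f''(x)
def lam(x): return abs(x - 1 + x * math.log(x)) / math.sqrt(1 + x)

def newton(x, tol=1e-8):
    k = 0
    while lam(x) > tol:
        alpha = 1.0 if lam(x) <= 1 / 3 else 1.0 / (1.0 + lam(x))
        x += alpha * (-fp(x) / fpp(x))                    # (damped) Newton step
        k += 1
    return x, k

x, k = newton(10.0)
print(x, k)   # converges to the minimizer x* = 1 in 9 iterations, as in the tableau
```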

    6.10 Further estimates

In the above analysis we found an upper bound for the number of iterations that the algorithm needs to yield a feasible point $x$ such that $\lambda(x) \le \varepsilon$. But what can be said about $\Phi(x) - \Phi(x^*)$, and what about $\|x - x^*\|$? The aim of this section is to provide answers to these questions.

    We start with the following lemma.

Lemma 6.30. Let $x \in D$ and $d \in \mathbb{R}^n$ such that $x + d \in D$. Then

$$\frac{\|d\|_x^2}{1+\kappa\|d\|_x} \le d^T\left(g(x+d) - g(x)\right) \le \frac{\|d\|_x^2}{1-\kappa\|d\|_x}, \qquad (6.25)$$

$$\frac{\omega\!\left(\kappa\|d\|_x\right)}{\kappa^2} \le \Phi(x+d) - \Phi(x) - d^Tg(x) \le \frac{\omega_*\!\left(\kappa\|d\|_x\right)}{\kappa^2}, \qquad (6.26)$$

where $\omega_*(t) := -t - \log(1-t)$ for $t < 1$. In the right-hand side inequalities it is assumed that $\kappa\|d\|_x < 1$.

Proof. We have

$$d^T\left(g(x+d) - g(x)\right) = \int_0^1 d^T H(x+\alpha d)\,d\;d\alpha = \int_0^1 \|d\|^2_{x+\alpha d}\,d\alpha.$$

Using Lemma 6.17 we may write

$$\frac{\|d\|_x^2}{1+\kappa\|d\|_x} = \int_0^1 \frac{\|d\|_x^2}{(1+\alpha\kappa\|d\|_x)^2}\,d\alpha \le \int_0^1 \|d\|^2_{x+\alpha d}\,d\alpha \le \int_0^1 \frac{\|d\|_x^2}{(1-\alpha\kappa\|d\|_x)^2}\,d\alpha = \frac{\|d\|_x^2}{1-\kappa\|d\|_x}.$$

From this the inequalities in (6.25) immediately follow. To obtain the inequalities in (6.26) we write

$$\Phi(x+d) - \Phi(x) - d^Tg(x) = \int_0^1 d^T\left(g(x+\alpha d) - g(x)\right)d\alpha.$$

Now using the inequalities in (6.25), applied with $\alpha d$ in the role of $d$, we obtain

$$\int_0^1 d^T\left(g(x+\alpha d) - g(x)\right)d\alpha \le \int_0^1 \frac{\alpha\|d\|_x^2}{1-\alpha\kappa\|d\|_x}\,d\alpha = \frac{-\kappa\|d\|_x - \log\left(1-\kappa\|d\|_x\right)}{\kappa^2} = \frac{\omega_*\!\left(\kappa\|d\|_x\right)}{\kappa^2}$$


and

$$\int_0^1 d^T\left(g(x+\alpha d) - g(x)\right)d\alpha \ge \int_0^1 \frac{\alpha\|d\|_x^2}{1+\alpha\kappa\|d\|_x}\,d\alpha = \frac{\kappa\|d\|_x - \log\left(1+\kappa\|d\|_x\right)}{\kappa^2} = \frac{\omega\!\left(\kappa\|d\|_x\right)}{\kappa^2}.$$

This completes the proof. $\Box$
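For the 1-self-concordant function $\varphi(x) = x - \log(1+x)$ of Example 6.27 (so $\kappa = 1$ and $\|d\|_x = |d|/(1+x)$) these bounds can be checked directly; as it happens, one side of each bound is attained with equality for this particular function, so a tiny tolerance is allowed for rounding. A sketch (illustration code, not from the text):

```python
import math

phi = lambda x: x - math.log(1 + x)
g   = lambda x: x / (1 + x)
H   = lambda x: 1.0 / (1 + x) ** 2
omega      = lambda t: t - math.log(1 + t)
omega_star = lambda t: -t - math.log(1 - t)

tol = 1e-12
for x, d in [(0.5, 0.3), (0.5, -0.3), (2.0, 0.9)]:
    delta = abs(d) * math.sqrt(H(x))                    # ||d||_x = |d|/(1+x)
    lhs = d * (g(x + d) - g(x))
    assert delta**2 / (1 + delta) - tol <= lhs <= delta**2 / (1 - delta) + tol  # (6.25)
    mid = phi(x + d) - phi(x) - d * g(x)
    assert omega(delta) - tol <= mid <= omega_star(delta) + tol                 # (6.26)
print("Lemma 6.30 bounds verified")
```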

As usual, for each $x \in D$, $\lambda(x) = \|\Delta x\|_x$, with $\Delta x$ denoting the Newton step at $x$. We now prove that if $\kappa\lambda(x) < 1$ for some $x \in D$ then $\Phi$ must have a minimizer.

Note that this surprising result expresses that some local condition on $\Phi$ provides us with a global property, namely the existence of a minimizer.

Theorem 6.31. Let $\kappa\lambda(x) < 1$ for some $x \in D$. Then $\Phi$ has a unique minimizer $x^*$ in $D$.

Proof. The proof is based on the observation that the level set

$$\mathcal{L} := \{y \in D : \Phi(y) \le \Phi(x)\},$$

with $x$ as given in the theorem, is compact. This can be seen as follows. Let $y \in D$. Writing $y = x + d$, with $d \in \mathbb{R}^n$, Lemma 6.30 implies the inequality

$$\Phi(y) \ge \Phi(x) + d^Tg(x) + \frac{\omega(\kappa\|d\|_x)}{\kappa^2} = \Phi(x) - d^TH(x)\Delta x + \frac{\omega(\kappa\|d\|_x)}{\kappa^2},$$

where we used that, by definition, the Newton step $\Delta x$ at $x$ satisfies $H(x)\Delta x = -g(x)$. Since

$$d^TH(x)\Delta x \le \|d\|_x\,\|\Delta x\|_x = \|d\|_x\,\lambda(x)$$

we thus have

$$\Phi(y) \ge \Phi(x) - \|d\|_x\,\lambda(x) + \frac{\omega(\kappa\|d\|_x)}{\kappa^2}.$$

Now let $y = x + d \in \mathcal{L}$. Then $\Phi(y) \le \Phi(x)$, whence we obtain

$$-\|d\|_x\,\lambda(x) + \frac{\omega(\kappa\|d\|_x)}{\kappa^2} \le 0,$$

which implies

$$\frac{\omega(\kappa\|d\|_x)}{\kappa\|d\|_x} \le \kappa\lambda(x) < 1. \qquad (6.27)$$

Putting $\tau := \kappa\|d\|_x$, one may easily verify that $\omega(\tau)/\tau$ is monotonically increasing for $\tau > 0$ and goes to 1 if $\tau \to \infty$. Therefore, since $\kappa\lambda(x) < 1$, we may conclude from (6.27) that $\|d\|_x$ cannot be arbitrarily large. In other words, $\|d\|_x$ is bounded above. This means that the set of vectors $d$ such that $x + d \in \mathcal{L}$ is bounded. This


implies that the level set $\mathcal{L}$ itself is bounded. Since this set is also closed, the set $\mathcal{L}$ is compact. Hence $\Phi$ has a minimal value on $\mathcal{L}$, and this value is attained at some $x^* \in \mathcal{L}$. Since $\Phi$ is convex, $x^*$ is a (global) minimizer of $\Phi$, and by Corollary 6.16, this minimizer is unique. $\Box$

Example 6.32 Consider

$$\Phi(x) := -\sum_{i=1}^n \log x_i,$$

with $x \in \mathbb{R}^n$, $x > 0$. We established in Example 6.12 that $\Phi$ is 1-self-concordant, and the first and second order derivatives are given by

$$g(x) = \nabla\Phi(x) = -\frac{e}{x}, \quad H(x) = \nabla^2\Phi(x) = \mathrm{diag}\left(\frac{e}{x^2}\right).$$

Therefore,

$$\lambda(x) = \sqrt{g(x)^T H(x)^{-1} g(x)} = \sqrt{\sum_{i=1}^n 1} = \sqrt{n}.$$

Since thus $\kappa\lambda(x) = \sqrt{n} \ge 1$ for all $x$, we conclude from this that $\Phi$ has no minimizer (cf. Exercise 6.12).

    The next example shows that the result of Theorem 6.31 is sharp.

Example 6.33 With $\mu \ge 0$ fixed, consider the function $f_\mu: (0, \infty) \to \mathbb{R}$ defined by

$$f_\mu(x) = \mu x - \log x, \quad x > 0.$$

This function is 1-self-concordant. One has

$$f_\mu'(x) = \mu - \frac{1}{x}, \quad f_\mu''(x) = \frac{1}{x^2}.$$

Therefore,

$$\lambda(x) = \sqrt{x^2\left(\mu - \frac1x\right)^2} = |\mu x - 1|.$$

Thus, for $\mu = 0$ we have $\lambda(x) = 1$ for each $x > 0$. Since $f_0(x) = -\log x$, $f_0(x)$ has no minimizer. On the other hand, if $\mu > 0$ then $\lambda(\frac1\mu) = 0 < 1$ and $x = \frac1\mu$ is a minimizer.

Exercise 6.12. Let $\kappa\lambda(x) \ge 1$ for all $x \in D$. Then $\Phi$ is unbounded (from below) and hence has no minimizer in $D$. Prove this. (Hint: use Theorem 6.24.)

Solution: $\Box$

Exercise 6.13. Let $\kappa\lambda(x) \ge 1$ for all $x \in D$. Then $D$ is unbounded. Prove this.

Solution: $\Box$

    The proof of the next theorem requires the result of the following exercise.


Exercise 6.14. For $s < 1$ one has${}^2$

$$\omega_*(s) = \sup_{t > -1}\,\{st - \omega(t)\},$$

whence

$$\omega_*(s) + \omega(t) \ge st, \quad s < 1,\ t > -1. \qquad (6.28)$$

Prove this.

Solution: $\Box$

Theorem 6.34. Let $x \in D$ be such that $\kappa\lambda(x) < 1$ and let $x^*$ denote the unique minimizer of $\Phi$. Then, with $\lambda := \lambda(x)$,

$$\frac{\omega(\kappa\lambda)}{\kappa^2} \le \Phi(x) - \Phi(x^*) \le \frac{\omega_*(\kappa\lambda)}{\kappa^2} \qquad (6.29)$$

$$\frac{\omega'(\kappa\lambda)}{\kappa} = \frac{\lambda}{1+\kappa\lambda} \le \|x - x^*\|_x \le \frac{\lambda}{1-\kappa\lambda} = \frac{\omega_*'(\kappa\lambda)}{\kappa}. \qquad (6.30)$$

Proof. The left inequality in (6.29) follows from Theorem 6.24, because $\Phi$ is minimal at $x^*$. Furthermore, from (6.26) in Lemma 6.30, with $d = x^* - x$, we get the right inequality in (6.29):

$$\Phi(x) - \Phi(x^*) \le -d^Tg(x) - \frac{\omega(\kappa\|d\|_x)}{\kappa^2} \le \|d\|_x\,\lambda - \frac{\omega(\kappa\|d\|_x)}{\kappa^2} = \frac{1}{\kappa^2}\left(\kappa\|d\|_x \cdot \kappa\lambda - \omega(\kappa\|d\|_x)\right) \le \frac{\omega_*(\kappa\lambda)}{\kappa^2},$$

where the second inequality holds since

$$\left|d^Tg(x)\right| = \left|d^TH(x)\Delta x\right| \le \|d\|_x\,\|\Delta x\|_x = \|d\|_x\,\lambda(x) = \|d\|_x\,\lambda, \qquad (6.31)$$

and the last inequality follows from (6.28) in Exercise 6.14.

For the proof of (6.30) we first derive from (6.31) and inequality (6.25) in Lemma 6.30 that

$$\frac{\|d\|_x^2}{1+\kappa\|d\|_x} \le d^T\left(g(x^*) - g(x)\right) = -d^Tg(x) \le \|d\|_x\,\lambda,$$

${}^2$This property of $\omega_*$ means that $\omega_*(t)$ is the so-called conjugate function of $\omega(t)$.


where we used that $g(x^*) = 0$. Dividing by $\|d\|_x$ we get

$$\frac{\|d\|_x}{1+\kappa\|d\|_x} \le \lambda,$$

which gives rise to the right inequality in (6.30), since it follows now that

$$\|d\|_x \le \frac{\lambda}{1-\kappa\lambda}.$$

Note that the left inequality in (6.30) is trivial if $\kappa\|d\|_x \ge 1$, because then $\|d\|_x \ge 1/\kappa$, whereas $\frac{\lambda}{1+\kappa\lambda} < \frac1\kappa$. Thus we may assume that $1 - \kappa\|d\|_x > 0$. For $0 \le \alpha \le 1$, consider (with $d = x^* - x$ as before)

$$k(\alpha) := \left(g(x) - g(x + \alpha d)\right)^T H(x)^{-1} g(x).$$

One has $k(0) = 0$ and $k(1) = \left(g(x) - g(x^*)\right)^T H(x)^{-1} g(x) = \lambda(x)^2 = \lambda^2$. From Exercise 6.11 and the Cauchy inequality we get

$$k'(\alpha) = -d^T H(x+\alpha d)\,H(x)^{-1}g(x) = d^T H(x+\alpha d)\,\Delta x \le \frac{\|d\|_x\,\lambda(x)}{(1-\alpha\kappa\|d\|_x)^2}.$$

Hence we have

$$\lambda^2 = k(1) \le \int_0^1 \frac{\|d\|_x\,\lambda}{(1-\alpha\kappa\|d\|_x)^2}\,d\alpha = \frac{\|d\|_x\,\lambda}{1-\kappa\|d\|_x}.$$

After dividing both sides by $\lambda$ this implies

$$\|d\|_x \ge \frac{\lambda}{1+\kappa\lambda}.$$

Thus the proof is complete. $\Box$
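For Example 6.27 (where $\kappa = 1$, $x^* = 0$, $\lambda(x) = |x|$, and $\|x - x^*\|_x = |x|/(1+x)$) the two-sided bounds of Theorem 6.34 can be verified directly; one side of each bound happens to be attained with equality for this function, so a small tolerance is used. A sketch (illustration code, not from the text):

```python
import math

phi = lambda x: x - math.log(1 + x)      # Example 6.27: kappa = 1, minimizer x* = 0
omega      = lambda t: t - math.log(1 + t)
omega_star = lambda t: -t - math.log(1 - t)

tol = 1e-12
for x in (0.6, -0.4, 0.25, -0.15):
    l = abs(x)                           # lambda(x) = |x|
    gap = phi(x) - phi(0.0)
    assert omega(l) - tol <= gap <= omega_star(l) + tol            # (6.29)
    dist = abs(x) / (1 + x)              # ||x - x*||_x
    assert l / (1 + l) - tol <= dist <= l / (1 - l) + tol          # (6.30)
print("Theorem 6.34 bounds verified")
```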


Chapter 7

Minimization of a linear function over a closed convex domain

    7.1 Introduction

In this chapter we consider the problem of minimizing a linear function over a closed convex domain $\overline{D}$:

$$(P) \quad \min\left\{c^Tx : x \in \overline{D}\right\}.$$

We assume that we have a self-concordant barrier function $\Phi: D \to \mathbb{R}$, where $D = \mathrm{int}\,\overline{D}$, and also that $H(x) = \nabla^2\Phi(x)$ is positive definite for every $x \in D$.

For each $\mu > 0$ we define

$$\Phi_\mu(x) := \frac{c^Tx}{\mu} + \Phi(x), \quad x \in D,$$

and we consider the problem

$$(P_\mu) \quad \inf\left\{\Phi_\mu(x) : x \in D\right\}.$$

We denote the gradient and Hessian matrix of $\Phi_\mu(x)$ as $g_\mu(x)$ and $H_\mu(x)$, respectively. Then we may write

$$g_\mu(x) := \nabla\Phi_\mu(x) = \frac{c}{\mu} + \nabla\Phi(x) = \frac{c}{\mu} + g(x) \qquad (7.1)$$

and

$$H_\mu(x) := \nabla^2\Phi_\mu(x) = \nabla^2\Phi(x) = H(x). \qquad (7.2)$$

An immediate consequence of (7.2) is

$$\nabla^3\Phi_\mu(x) = \nabla^3\Phi(x).$$

So it becomes clear that the second and third derivatives of $\Phi_\mu(x)$ coincide with the second and third derivatives of $\Phi(x)$, and do not depend on $\mu$. Assuming that $\Phi(x)$ is $\kappa$-self-concordant, it follows that $\Phi_\mu(x)$ is $\kappa$-self-concordant as well.


The minimizer of $\Phi_\mu(x)$, if it exists, is denoted as $x(\mu)$. When $\mu$ runs through all positive numbers then $x(\mu)$ runs through the so-called central path of (P). We expect that $x(\mu)$ converges to an optimal solution of (P) when $\mu$ approaches 0, since then the linear term in the objective function of $(P_\mu)$ dominates the remaining part. Therefore, our aim is to use the central path as a guideline to the (set of) optimal solution(s) of (P). This approach is likely to be feasible, because since $\Phi_\mu(x)$ is self-concordant its minimizer can be computed efficiently.

The Newton step at $x \in D$ with respect to $\Phi_\mu(x)$ is given by

$$\Delta x = -H(x)^{-1}g_\mu(x).$$

Just as in the previous chapter we measure the distance of $x \in D$ to the $\mu$-center $x(\mu)$ by the local norm of $\Delta x$. So for this purpose we use the quantity $\lambda_\mu(x)$ defined by

$$\lambda_\mu(x) = \|\Delta x\|_x = \sqrt{\Delta x^T H(x)\,\Delta x} = \sqrt{g_\mu(x)^T H(x)^{-1} g_\mu(x)} = \left\|g_\mu(x)\right\|_{H^{-1}}.$$

Before presenting the algorithm we need to deal with two issues. First, when is $\mu$ small enough? We want to have the guarantee that the algorithm generates a feasible point whose objective value deviates no more than $\varepsilon$ from the optimal value, where $\varepsilon > 0$ is some prescribed accuracy parameter. Second, we need to know what the effect of an update of $\mu$ is on our proximity measure $\lambda_\mu(x)$. We start with the second issue.

7.2 Effect of a $\mu$-update

Let $\lambda := \lambda_\mu(x)$ and $\mu^+ = (1-\theta)\mu$. Our aim is to estimate $\lambda_{\mu^+}(x)$. We have

$$g_{\mu^+}(x) = \frac{c}{\mu^+} + g(x) = \frac{c}{(1-\theta)\mu} + g(x) = \frac{1}{1-\theta}\left(\frac{c}{\mu} + g(x) - \theta g(x)\right) = \frac{1}{1-\theta}\left(g_\mu(x) - \theta g(x)\right).$$

Hence, denoting $H(x)$ shortly as $H$, we may write

$$\lambda_{\mu^+}(x) = \frac{1}{1-\theta}\left\|g_\mu(x) - \theta g(x)\right\|_{H^{-1}} \le \frac{1}{1-\theta}\left(\left\|g_\mu(x)\right\|_{H^{-1}} + \theta\left\|g(x)\right\|_{H^{-1}}\right) = \frac{1}{1-\theta}\left(\lambda_\mu(x) + \theta\lambda(x)\right). \qquad (7.3)$$

At present we have no means to obtain an upper bound for the quantity $\lambda(x) = \|g(x)\|_{H^{-1}}$. Therefore, we use the following definition.


Definition 7.1. Let $\nu \ge 0$. The self-concordant function $\Phi$ is called a $\nu$-barrier if

$$\lambda(x)^2 = \left\|g(x)\right\|^2_{H^{-1}} \le \nu, \quad \forall x \in D. \qquad (7.4)$$

An immediate consequence of this definition and (7.3) is the following lemma, which requires no further proof.

Lemma 7.2. If $\Phi$ is a self-concordant $\nu$-barrier then

$$\lambda_{\mu^+}(x) \le \frac{\lambda_\mu(x) + \theta\sqrt{\nu}}{1-\theta}.$$

In what follows we shall say that $\Phi$ is a $\nu$-barrier function if it satisfies (7.4). If $\Phi$ is also $\kappa$-self-concordant then we say that $\Phi$ is a $(\kappa, \nu)$-self-concordant barrier function (SCB).

Before proceeding we prove the next theorem, which provides some other characterizations of the $\nu$-barrier property that will be used later on.

Theorem 7.3. Let $\Phi$ be twice continuously differentiable and $\nu \ge 0$. Then the following three conditions are equivalent.

$$\lambda(x)^2 \le \nu, \quad \forall x \in D \qquad (7.5)$$

$$\varphi_{x,h}'(\alpha)^2 \le \nu\,\varphi_{x,h}''(\alpha), \quad \forall x \in D,\ h \in \mathbb{R}^n,\ \alpha \in \mathrm{dom}\,\varphi_{x,h} \qquad (7.6)$$

$$\varphi_{x,h}'(0)^2 \le \nu\,\varphi_{x,h}''(0), \quad \forall x \in D,\ h \in \mathbb{R}^n. \qquad (7.7)$$

Proof. Obviously (7.6) implies (7.7), because $0 \in \mathrm{dom}\,\varphi_{x,h}$. On the other hand, by replacing $x$ by $x + \alpha h$ in (7.7) we obtain (7.6). Hence it follows that (7.6) and (7.7) are equivalent. It remains to prove the equivalence of these statements with (7.5). This requires a little more effort.

From the definition (6.21) of $\lambda(x)$ it follows that (7.5) can be equivalently stated as

$$\|\Delta x\|_x^2 = g(x)^T H(x)^{-1} g(x) \le \nu, \quad \forall x \in D. \qquad (7.8)$$

Assuming that this holds, and using $H(x)\Delta x = -g(x)$, we may write

$$\varphi_{x,h}'(0)^2 = \left(g(x)^Th\right)^2 = \left(\left(H(x)\Delta x\right)^T h\right)^2 = \left(\Delta x^T H(x)\,h\right)^2 = \left(\langle\Delta x, h\rangle_x\right)^2$$
$$\le \|\Delta x\|_x^2\,\|h\|_x^2 \quad \text{(using the Cauchy-Schwarz inequality)}$$
$$\le \nu\,\|h\|_x^2 \quad \text{(by (7.8))}$$
$$= \nu\,\varphi_{x,h}''(0) \quad \text{(by (6.6))}.$$


This shows that (7.8) implies (7.7). It remains to prove the converse implication. Assuming that (7.7) holds, we write

$$\|\Delta x\|_x^2 = g(x)^T H(x)^{-1} g(x) = -g(x)^T\Delta x = -\varphi_{x,\Delta x}'(0) \quad \text{(by (6.6))}$$
$$\le \sqrt{\nu}\,\sqrt{\varphi_{x,\Delta x}''(0)} \quad \text{(by (7.7))}$$
$$= \sqrt{\nu}\,\sqrt{\Delta x^T H(x)\,\Delta x} \quad \text{(by (6.6))}$$
$$= \sqrt{\nu}\,\|\Delta x\|_x \quad \text{(by (6.14))}.$$

This implies $\|\Delta x\|_x \le \sqrt{\nu}$, which is equivalent to (7.8). Hence the proof is complete. $\Box$

In the next three exercises the reader is invited to compute $\kappa$ and $\nu$ for barrier functions of three important closed convex sets, namely the nonnegative orthant in $\mathbb{R}^n$, the Lorentz cone $\mathcal{L}^n_+$ and the semidefinite cone $\mathcal{S}^n_+$. These are given by

the nonnegative orthant:

$$\mathbb{R}^n_+ = \left\{x \in \mathbb{R}^n : x \ge 0\right\};$$

the Lorentz cone:${}^3$

$$\mathcal{L}^n_+ = \left\{x \in \mathbb{R}^n : x_n \ge \sqrt{\sum_{i=1}^{n-1} x_i^2}\right\};$$

the semidefinite cone:

$$\mathcal{S}^n_+ = \left\{A \in \mathbb{R}^{n\times n} : A = A^T,\ x^TAx \ge 0,\ \forall x \in \mathbb{R}^n\right\}.$$

Although $\mathcal{S}^n_+$ is defined as a set of matrices it fits in our framework if one realizes that we have a one-to-one correspondence between $n \times n$ matrices and vectors in $\mathbb{R}^{n^2}$, namely by associating to every $n \times n$ matrix the concatenation of its columns, in their natural order.

Exercise 7.1. Prove that $-\sum_{i=1}^n \log x_i$ is a $(1, n)$-SCB for $\mathbb{R}^n_+$.

Solution: Consider

$$\Phi(x) := -\sum_{i=1}^n \log x_i, \quad x \in \mathbb{R}^n,\ x > 0.$$

${}^3$The cone is named after the Dutch physicist Hendrik Antoon Lorentz who, together with his student Pieter Zeeman, received the Nobel Prize in 1902 for their work on the so-called Zeeman effect. The cone is also called quadratic cone, or second order cone, or ice-cream cone.


Denoting the all-one vector as $e$, the first and second order derivatives are given by

$$g(x) = \nabla\Phi(x) = -\frac{e}{x}, \quad H(x) = \nabla^2\Phi(x) = \mathrm{diag}\left(\frac{e}{x^2}\right).$$

One easily verifies that $\Phi$ is 1-self-concordant. Moreover,

$$\lambda(x)^2 = g(x)^T H(x)^{-1} g(x) = \frac{e^T}{x}\,\mathrm{diag}\left(x^2\right)\frac{e}{x} = e^Te = n.$$

This shows that $\Phi(x)$ is an $n$-barrier for $\mathbb{R}^n_+$. Note that it follows that there is no $x \in \mathrm{int}\,\mathbb{R}^n_+ = \mathbb{R}^n_{++}$ such that $\lambda(x) < 1/\kappa = 1$. This is in agreement with the fact that $\Phi$ has no minimizer. $\Box$
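The remarkable point that $\lambda(x)^2 = n$ independently of $x$ is easy to confirm numerically. A sketch (illustration code with arbitrary random positive points, not from the text):

```python
import random

def lam_sq(x):
    # g(x) = -e/x and H(x) = diag(e/x^2), so H(x)^{-1} = diag(x^2) and
    # g^T H^{-1} g = sum_i (1/x_i)^2 * x_i^2 = n for every x > 0
    return sum((-1.0 / t) ** 2 * (t * t) for t in x)

random.seed(1)
for n in (1, 3, 7):
    x = [random.uniform(0.1, 10.0) for _ in range(n)]
    assert abs(lam_sq(x) - n) < 1e-9
print("lambda(x)^2 = n for the log barrier, independent of x")
```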

Exercise 7.2. Prove that $-\log\left(x_n^2 - \|x_{1:n-1}\|^2\right)$ is a $(1, 2)$-SCB for $\mathcal{L}^n_+$.

Solution: It is convenient to use the following representation of the cone $\mathcal{L}^n_+$:

$$\mathcal{L}^n_+ = \left\{(x, t) \in \mathbb{R}^n : \|x\| \le t\right\}.$$

Denote

$$\Phi(x, t) = -\log\left(t^2 - \|x\|^2\right), \quad (x, t) \in \mathrm{int}\,\mathcal{L}^n_+.$$

Let $(x, t) \in \mathrm{int}\,\mathcal{L}^n_+$, and $(h, \tau)$ a nonzero vector in $\mathbb{R}^n$. Denote $\sigma(\alpha) = (t+\alpha\tau)^2 - \|x + \alpha h\|^2$. Note that $\sigma(\alpha)$ is quadratic in $\alpha$. So $\sigma'''(\alpha) = 0$. We need to compare the second and third derivative with respect to $\alpha$ of the function

$$\varphi(\alpha) := \Phi(x + \alpha h,\ t + \alpha\tau) = -\log\sigma(\alpha)$$

at $\alpha = 0$. Some tedious, but straightforward calculations yield (using $\sigma''' = 0$)

$$\sigma(0) = t^2 - \|x\|^2, \quad \sigma'(0) = 2\left(t\tau - h^Tx\right), \quad \sigma''(0) = 2\left(\tau^2 - \|h\|^2\right),$$

$$\varphi'(0) = -\frac{\sigma'(0)}{\sigma(0)}, \quad \varphi''(0) = \frac{\sigma'(0)^2 - \sigma(0)\,\sigma''(0)}{\sigma(0)^2}, \quad \varphi'''(0) = -\frac{2\sigma'(0)^3 - 3\sigma(0)\,\sigma'(0)\,\sigma''(0)}{\sigma(0)^3}.$$

The inequality $\varphi'(0)^2 \le 2\varphi''(0)$ is equivalent to $2\sigma(0)\sigma''(0) \le \sigma'(0)^2$, which holds if and only if

$$\left(\tau^2 - \|h\|^2\right)\left(t^2 - \|x\|^2\right) \le \left(t\tau - h^Tx\right)^2, \quad \forall (h, \tau) \in \mathbb{R}^n.$$

This certainly holds if $\tau^2 - \|h\|^2 \le 0$. So we may assume $\tau^2 - \|h\|^2 > 0$. The above inequality is homogeneous in $(h, \tau)$. Thus we may take $\tau = 1$, and for a similar reason also $t = 1$. The worst case occurs if $h^Tx = \|h\|\,\|x\|$, whence the inequality reduces to

$$\left(1 - \|h\|^2\right)\left(1 - \|x\|^2\right) \le \left(1 - \|h\|\,\|x\|\right)^2, \quad h \in \mathbb{R}^{n-1},$$

whose validity can be easily checked. Finally, we need to check self-concordance. We have

$$\frac{\varphi'''(0)^2}{\varphi''(0)^3} = \frac{\left(2\sigma'(0)^3 - 3\sigma(0)\,\sigma'(0)\,\sigma''(0)\right)^2}{\left(\sigma'(0)^2 - \sigma(0)\,\sigma''(0)\right)^3}.$$


Putting $\rho = \frac{\sigma(0)\,\sigma''(0)}{\sigma'(0)^2}$ we write

$$\frac{\varphi'''(0)^2}{\varphi''(0)^3} = \frac{(2 - 3\rho)^2}{(1-\rho)^3}.$$

Since $2\sigma(0)\sigma''(0) \le \sigma'(0)^2$ we have $\rho \le \frac12$. Using this one may easily check that when $\rho \le \frac12$ the right hand side expression is maximal for $\rho = 0$. Hence we obtain

$$\frac{\varphi'''(0)^2}{\varphi''(0)^3} \le 4.$$

So we may conclude that $\Phi(x, t)$ is 1-self-concordant. $\Box$

Exercise 7.3. Prove that $-\log\det X$ is a $(1, n)$-SCB for $\mathcal{S}^n_+$.

Solution: Recall that

$$\mathcal{S}^n_+ = \left\{X \in \mathbb{R}^{n\times n} : X = X^T,\ x^TXx \ge 0,\ \forall x \in \mathbb{R}^n\right\}.$$

Let

$$\Phi(X) := -\log\det X, \quad X \in \mathrm{int}\,\mathcal{S}^n_+.$$

For $X \in \mathrm{int}\,\mathcal{S}^n_+$ and $Y \in \mathcal{S}^n$ we consider

$$\varphi(\alpha) := \Phi(X + \alpha Y) = -\log\det(X + \alpha Y).$$

We may write

$$\varphi(\alpha) = -\log\det\left(X^{\frac12}\left(I + \alpha X^{-\frac12}YX^{-\frac12}\right)X^{\frac12}\right) = -\log\det X - \log\det\left(I + \alpha X^{-\frac12}YX^{-\frac12}\right) = -\log\det X - \sum_{i=1}^n \log\left(1 + \alpha\lambda_i\right),$$

where the $\lambda_i$ are the eigenvalues of $X^{-\frac12}YX^{-\frac12}$. Taking derivatives with respect to $\alpha$ we get

$$\varphi'(\alpha) = -\sum_{i=1}^n \frac{\lambda_i}{1+\alpha\lambda_i}, \quad \varphi''(\alpha) = \sum_{i=1}^n \frac{\lambda_i^2}{(1+\alpha\lambda_i)^2}, \quad \varphi'''(\alpha) = -2\sum_{i=1}^n \frac{\lambda_i^3}{(1+\alpha\lambda_i)^3}.$$

Hence we have

$$\frac{\varphi'(0)^2}{\varphi''(0)} = \frac{\left(\sum_{i=1}^n \lambda_i\right)^2}{\sum_{i=1}^n \lambda_i^2} \le n, \qquad \frac{\varphi'''(0)^2}{\varphi''(0)^3} = \frac{4\left(\sum_{i=1}^n \lambda_i^3\right)^2}{\left(\sum_{i=1}^n \lambda_i^2\right)^3} \le 4,$$

proving that $\Phi(X)$ is a 1-self-concordant $n$-barrier for the semidefinite cone $\mathcal{S}^n_+$. $\Box$


Exercise 7.4. Show that each of the above self-concordant functions is logarithmically homogeneous. Denoting the function as $\Phi$, this means that there exists $\bar\nu \in \mathbb{R}$ such that

$$\Phi(tx) = \Phi(x) - \bar\nu\,\log t, \quad x \in D,\ t > 0. \qquad (7.9)$$

Also show that if a self-concordant function $\Phi$ satisfies (7.9) then it is a $\nu$-barrier with $\nu = \bar\nu$.

Solution: The barrier function for $\mathbb{R}^n_+$ is

$$\Phi(x) := -\sum_{i=1}^n \log x_i, \quad x \in \mathbb{R}^n,\ x > 0.$$

Hence we have

$$\Phi(tx) = -\sum_{i=1}^n \log(tx_i) = -\sum_{i=1}^n \left(\log t + \log x_i\right) = \Phi(x) - n\log t,$$

so (7.9) holds with $\bar\nu = n$. The barrier function for $\mathcal{L}^n_+$ is

$$\Phi(x, s) = -\log\left(s^2 - \|x\|^2\right), \quad (x, s) \in \mathrm{int}\,\mathcal{L}^n_+.$$

Hence

$$\Phi(tx, ts) = -\log\left(t^2s^2 - t^2\|x\|^2\right) = -\log t^2 - \log\left(s^2 - \|x\|^2\right) = \Phi(x, s) - 2\log t,$$

so (7.9) holds with $\bar\nu = 2$. Finally, the barrier function for $\mathcal{S}^n_+$ is

$$\Phi(X) := -\log\det X, \quad X \in \mathrm{int}\,\mathcal{S}^n_+.$$

Hence

$$\Phi(tX) = -\log\det(tX) = -\log\left(t^n\det X\right) = \Phi(X) - n\log t,$$

so (7.9) holds with $\bar\nu = n$. This proves the first statement in the exercise. Note that in all cases we found that $\bar\nu = \nu$. We next show that this holds in general. Differentiating (7.9) with respect to $t$ gives

$$\nabla\Phi(tx)^Tx = -\frac{\bar\nu}{t}.$$

For $t = 1$ this gives

$$\nabla\Phi(x)^Tx = -\bar\nu.$$

Differentiating once more, now with respect to $x$, yields

$$\nabla\Phi(x) + \nabla^2\Phi(x)\,x = 0.$$

Hence, since $\nabla\Phi(x) = g(x)$, $\nabla^2\Phi(x) = H(x)$ and $H(x)\Delta x = -g(x)$, we obtain

$$g(x)^Tx = -\bar\nu, \quad \Delta x = x.$$

Therefore,

$$\|\Delta x\|_x^2 = x^T H(x)\,x = -g(x)^Tx = \bar\nu.$$


It follows that $\Phi(x)$ satisfies (7.8) with $\nu = \bar\nu$. By Theorem 7.3 (or better: by its proof) it follows that $\Phi(x)$ is a $\bar\nu$-barrier. It is worth pointing out that we have equality in (7.8), and hence also in (7.5), for all $x \in D$. $\Box$

Before proceeding to the next section, we introduce the so-called Dikin ellipsoid at $x$, and using this we give a new characterization of our proximity measure $\lambda(x)$.

Definition 7.4. For any $x \in D$ the Dikin ellipsoid at $x$ is defined by

$$\mathcal{E}_x := \left\{d \in \mathbb{R}^n : \|d\|_x \le 1\right\}.$$

Lemma 7.5. For any $x \in D$ one has

$$\max_d\left\{-d^Tg(x) : d \in \mathcal{E}_x\right\} = \lambda(x).$$

Proof. Due to Definition 7.4 the maximization problem in the lemma can be reformulated as

$$\max_d\left\{-d^Tg(x) : d^TH(x)\,d \le 1\right\}.$$

If $g(x) = 0$ then the lemma is obviously true, because then $\lambda(x) = 0$. So we may assume that $g(x) \ne 0$ and $\lambda(x) \ne 0$. In that case any optimal solution $d$ will certainly satisfy $d^TH(x)d = 1$. Hence, if $d$ is optimal then

$$-g(x) = \rho\,H(x)\,d, \quad \rho \in \mathbb{R},$$

where $\rho$ is a Lagrange multiplier. This implies $d = -\frac1\rho H(x)^{-1}g(x) = \frac1\rho\Delta x$. Now $d^TH(x)d = 1$ implies $\Delta x^TH(x)\Delta x = \rho^2$. Since we also have $\Delta x^TH(x)\Delta x = \lambda(x)^2$, it follows that $\rho = \pm\lambda(x)$, and for the maximizer we get

$$d = \frac{\Delta x}{\lambda(x)},$$

whence, using $H(x)\Delta x = -g(x)$,

$$-d^Tg(x) = -\frac{g(x)^T\Delta x}{\lambda(x)} = \frac{\Delta x^TH(x)\,\Delta x}{\lambda(x)} = \frac{\lambda(x)^2}{\lambda(x)} = \lambda(x),$$

proving the lemma. $\Box$

    For future use we also state the following result.

Lemma 7.6. If $\Phi$ is a self-concordant $\nu$-barrier then we have

$$\left(d^Tg(x)\right)^2 \le \nu\; d^TH(x)\,d, \quad \forall d \in \mathbb{R}^n,\ x \in D.$$


Proof. The inequality in the lemma is homogeneous in $d$. Hence we may assume that $d^TH(x)d = 1$. Now Lemma 7.5 (applied to both $d$ and $-d$) implies that $|d^Tg(x)| \le \lambda(x)$. Hence we obtain $\left(d^Tg(x)\right)^2 \le \lambda(x)^2 \le \nu$, where the last inequality is Definition 7.1. This implies the lemma. $\Box$

Exercise 7.5. If $\Phi$ is a self-concordant $\nu$-barrier then

$$g(x)\,g(x)^T \preceq \nu\,H(x), \quad \forall x \in D.$$

Derive this from Lemma 7.6.

Solution: The statement in Lemma 7.6 can be written as

$$d^T g(x)\,g(x)^T d \le \nu\; d^TH(x)\,d, \quad \forall d \in \mathbb{R}^n,\ x \in D,$$

which implies that $g(x)\,g(x)^T \preceq \nu\,H(x)$. $\Box$

Exercise 7.6. Prove that if $\Phi$ is self-concordant then $\Phi$ is a $\nu$-barrier if and only if

$$\left(\nabla\Phi(x)[h]\right)^2 \le \nu\,\nabla^2\Phi(x)[h, h], \quad \forall x \in D,\ h \in \mathbb{R}^n. \qquad (7.10)$$

Solution: Since $\nabla\Phi(x)[h] = \varphi_{x,h}'(0)$ and $\nabla^2\Phi(x)[h, h] = \varphi_{x,h}''(0)$, this is an obvious consequence of (7.7). $\Box$

We conclude this section with one more characterization of the $\nu$-barrier property, leaving the proof to the reader.

Exercise 7.7. Let $\Phi$ be a self-concordant function and $\varphi_{x,h}$ defined as usual. Then $\Phi$ has the $\nu$-barrier property if and only if

$$\frac{d}{d\alpha}\left(\frac{-1}{\varphi_{x,h}'(\alpha)}\right) \ge \frac{1}{\nu}, \quad \forall x \in D,\ h \in \mathbb{R}^n,\ \alpha \in \mathrm{dom}\,\varphi_{x,h}. \qquad (7.11)$$

Prove this.

Solution: One has

$$\frac{d}{d\alpha}\left(\frac{-1}{\varphi_{x,h}'(\alpha)}\right) = \frac{\varphi_{x,h}''(\alpha)}{\varphi_{x,h}'(\alpha)^2}.$$

Hence, (7.11) is equivalent to (7.6), which completes the proof. $\Box$

Assuming that (P) has an optimal solution x*, we proceed with estimating the objective value cᵀx in terms of μ and λ_μ(x). This is the subject of the next section.

7.3 Estimate of cᵀx − cᵀx*

For the analysis of our algorithm we will need some more lemmas.

Lemma 7.7. Let φ be a self-concordant ν-barrier, x ∈ D and x + d ∈ D. Then dᵀg(x) ≤ ν.

Proof. Consider the function

    q(α) = dᵀg(x + αd),  α ∈ [0, 1).

Observe that q(0) = dᵀg(x). So we need to show that q(0) ≤ ν. If q(0) ≤ 0 there is nothing to prove. Therefore, assume that q(0) > 0. Since φ is a ν-barrier, we have by Lemma 7.6, for any α ∈ [0, 1),

    q′(α) = dᵀH(x + αd)d ≥ (1/ν)(dᵀg(x + αd))² = (1/ν)(q(α))².

Therefore, q(α) is increasing and hence positive for α ∈ [0, 1]. Therefore, we may write

    1/ν = ∫₀¹ (1/ν) dα ≤ ∫₀¹ q′(α)/(q(α))² dα = [ −1/q(α) ]₀¹ = 1/q(0) − 1/q(1) ≤ 1/q(0),

whence q(0) ≤ ν, proving the lemma. □

7.4 Algorithm with full Newton steps

The algorithm is described in Figure 7.1.

Input:
    an accuracy parameter ε > 0;
    a proximity parameter τ ∈ (0, 1);
    an update parameter θ, 0 < θ < 1;
    x⁰ ∈ D and μ⁰ > 0 such that λ_{μ⁰}(x⁰) ≤ τ.
begin
    x := x⁰; μ := μ⁰;
    while νμ > ε do
        μ := (1 − θ)μ;
        x := x + Δx;
    endwhile
end

Figure 7.1. Algorithm with full Newton steps

The number of iterations of the algorithm is completely determined by θ, μ⁰ and ε, according to the following lemma.

Lemma 7.10. The number of iterations of the algorithm does not exceed the number

    (1/θ) log(νμ⁰/ε).

Proof. The algorithm stops when νμ ≤ ε. After the k-th iteration we have μ = (1 − θ)ᵏμ⁰, where μ⁰ denotes the initial value of μ. Hence the algorithm will have stopped if k satisfies

    (1 − θ)ᵏ νμ⁰ ≤ ε.

Taking logarithms on both sides this gives

    k log(1 − θ) ≤ log(ε/(νμ⁰)).

Since log(1 − θ) < 0 this is equivalent to

    k ≥ (1 / (−log(1 − θ))) log(νμ⁰/ε).

Since −log(1 − θ) ≥ θ, this certainly holds if

    k ≥ (1/θ) log(νμ⁰/ε),

which implies the lemma. □
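The bound of Lemma 7.10 can be checked against a direct simulation of the μ-updates; a sketch with illustrative parameters (all values are ours):

```python
import math

theta, nu, mu0, eps = 0.1, 5.0, 1.0, 1e-6   # illustrative parameters

# Simulate the mu-updates of the algorithm and count iterations.
mu, iterations = mu0, 0
while nu * mu > eps:
    mu *= 1.0 - theta
    iterations += 1

# The bound of Lemma 7.10.
bound = math.ceil((1.0 / theta) * math.log(nu * mu0 / eps))
```

Here 147 iterations are performed against a bound of 155; the slack comes from the inequality −log(1 − θ) ≥ θ used in the proof.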

7.4.1 Analysis of the algorithm with full Newton steps

In this section we prove the following theorem.

Theorem 7.11. If τ = 1/9 and θ = 1/(2 + 8√ν), then the algorithm with full Newton steps is well-defined and requires not more than

    2(1 + 4√ν) log(νμ⁰/ε)

iterations. The output is a point x ∈ D such that

    cᵀx ≤ cᵀx* + ε( 1 + (1 + 9√ν)/(72ν) ),

where x* denotes an optimal solution of (P).

Proof. We need to find values of τ and θ that make the algorithm well-defined. At the start of the first iteration we have x = x⁰ ∈ D and μ = μ⁰ such that λ_μ(x) ≤ τ. When the barrier parameter is updated to μ⁺ = (1 − θ)μ, Lemma 7.2 gives

    λ_{μ⁺}(x) ≤ ( λ_μ(x) + θ√ν ) / (1 − θ).    (7.12)

Then after the Newton step, the new iterate is x⁺ = x + Δx and

    λ_{μ⁺}(x⁺) ≤ ( λ_{μ⁺}(x) / (1 − λ_{μ⁺}(x)) )².    (7.13)

The algorithm is well defined if we choose τ and θ such that λ_{μ⁺}(x⁺) ≤ τ. To get the lowest iteration bound, we need at the same time to maximize θ. From (7.13) we deduce that λ_{μ⁺}(x⁺) ≤ τ certainly holds if

    λ_{μ⁺}(x) / (1 − λ_{μ⁺}(x)) ≤ √τ,

which is equivalent to

    λ_{μ⁺}(x) ≤ √τ / (1 + √τ).    (7.14)

According to (7.12), this, and hence λ_{μ⁺}(x⁺) ≤ τ, will hold if

    ( τ + θ√ν ) / (1 − θ) ≤ √τ / (1 + √τ).

This leads us to the following condition on θ:

    1/θ ≥ ( √ν + √τ/(1 + √τ) ) / ( √τ/(1 + √τ) − τ ).

Replacing √τ by δ this leads to

    1/θ ≥ ( δ + (1 + δ)√ν ) / ( δ(1 − δ − δ²) ) = 1/(1 − δ − δ²) + ( (1 + δ) / (δ(1 − δ − δ²)) ) √ν.

The question is which value of δ ∈ (0, 1) yields the largest possible value for θ, because this value will minimize the iteration bound of Lemma 7.10. Of course, this value depends on the so-called complexity number √ν. Some elementary analysis makes clear that the optimal value of δ lies between 0.27 and 0.30. Table 7.1 shows for some values of δ the smallest possible value of 1/θ. Probably the best value is δ ≈ 0.29715, with 1/θ = 1.62723 + 7.10323√ν, since this has the smallest possible coefficient of the complexity number (which may be large in practice). To simplify the presentation we will work with θ = 1/(2 + 8√ν), which is a lower bound for the largest possible value of θ. This is compatible with δ = 1/3, which gives τ = 1/9. Thus we have justified the choice of the values of τ and θ in the theorem.

    value of δ        1/θ

    0.27              1.52184 + 7.15828 √ν
    0.28              1.55860 + 7.12504 √ν
    0.29              1.59770 + 7.10701 √ν
    0.30              1.63934 + 7.10383 √ν
    0.31              1.68379 + 7.11535 √ν
    0.32              1.73130 + 7.14162 √ν
    0.33              1.78221 + 7.18286 √ν
    0.34              1.83688 + 7.23949 √ν
    0.35              1.89573 + 7.31212 √ν
    0.36              1.95925 + 7.40160 √ν

    Table 7.1. Some values of 1/θ as a function of δ.
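The entries of Table 7.1 can be reproduced from the condition 1/θ ≥ (δ + (1 + δ)√ν)/(δ(1 − δ − δ²)); a sketch (the helper name is ours):

```python
def one_over_theta(delta):
    """Constant term and coefficient of sqrt(nu) in the smallest admissible 1/theta."""
    denom = delta * (1.0 - delta - delta * delta)
    # delta/denom = 1/(1 - delta - delta^2); (1 + delta)/denom multiplies sqrt(nu)
    return delta / denom, (1.0 + delta) / denom

rows = {d / 100.0: one_over_theta(d / 100.0) for d in range(27, 37)}
```

For instance, `rows[0.30]` returns approximately (1.63934, 7.10383), matching the table.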

Now that θ is given, the iteration bound is immediate from Lemma 7.10. The last statement in the theorem is implied by Lemma 7.9, because at termination of the algorithm we have 9λ_μ(x) ≤ 1 and νμ ≤ ε. Hence, denoting λ = λ_μ(x), Lemma 7.9 implies that

    cᵀx ≤ cᵀx* + μ( ν + λ(λ + √ν)/(1 − λ) )
        ≤ cᵀx* + ε( 1 + (1/9)(1/9 + √ν) / ((8/9)ν) )
        = cᵀx* + ε( 1 + (1 + 9√ν)/(72ν) ).

This completes the proof. □
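The arithmetic in the last step can be verified numerically; a sketch (ν and ε are illustrative, and `lemma_7_9_bound` encodes the estimate μ(ν + λ(λ + √ν)/(1 − λ)) quoted from Lemma 7.9 above), evaluated at the extreme values λ = 1/9 and νμ = ε, where the two sides coincide:

```python
import math

nu, eps = 4.0, 1e-3               # illustrative values
mu = eps / nu                     # termination: nu * mu = eps
lam = 1.0 / 9.0                   # termination: 9 * lambda_mu(x) <= 1

lemma_7_9_bound = mu * (nu + lam * (lam + math.sqrt(nu)) / (1.0 - lam))
theorem_bound = eps * (1.0 + (1.0 + 9.0 * math.sqrt(nu)) / (72.0 * nu))
```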

7.5 Algorithm with damped Newton steps

The method that we considered in the previous sections is in practice rather slow. This is due to the fact that the barrier update parameter θ is rather small. For example, in the case of linear optimization the set D is the intersection of ℝⁿ₊ and an affine space {x : Ax = b}, for some A and b. From Exercise 7.1 we know that the logarithmic barrier function −Σᵢ₌₁ⁿ log xᵢ is a 1-self-concordant n-barrier for ℝⁿ₊. In that case we have κ = 1 and ν = n, and hence the value of θ is given by θ = 5/(9 + 36√n). Assuming μ⁰ = 1 in Theorem 7.11, this leads to the iteration bound

    2(1 + 4√n) log(n/ε) = O( √n log(n/ε) ),

which is up till now the best known bound for linear optimization.
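To make this concrete, here is a minimal numerical sketch of the full-Newton step method of Figure 7.1 for min{cᵀx : x ≥ 0} with the logarithmic barrier (all data and names are ours; for this barrier g(x) = c/μ − x⁻¹ and H(x) = diag(x⁻²) componentwise, so the Newton step and λ_μ(x) have closed forms):

```python
import math

c = [1.0, 2.0]                      # illustrative objective coefficients
nu = float(len(c))                  # -sum(log x_i) is an n-barrier, so nu = n
eps = 1e-4
theta = 5.0 / (9.0 + 36.0 * math.sqrt(nu))

mu = 1.0
x = [mu / ci for ci in c]           # start on the central path: lambda_mu(x) = 0

def decrement(x, mu):
    # lambda_mu(x)^2 = g^T H^{-1} g = sum_i (x_i c_i / mu - 1)^2 for this barrier
    return math.sqrt(sum((xi * ci / mu - 1.0) ** 2 for xi, ci in zip(x, c)))

while nu * mu > eps:
    mu *= 1.0 - theta               # mu-update
    # Full Newton step: x+ = x - H^{-1} g = 2 x_i - x_i^2 c_i / mu componentwise.
    x = [2.0 * xi - xi * xi * ci / mu for xi, ci in zip(x, c)]
    assert all(xi > 0 for xi in x)          # iterates stay strictly feasible
    assert decrement(x, mu) <= 1.0 / 9.0    # proximity tau = 1/9 is maintained

objective = sum(ci * xi for ci, xi in zip(c, x))
```

At termination νμ ≤ ε, every iterate satisfies λ_μ(x) ≤ 1/9, and the final objective value (here with optimal value 0) meets the accuracy estimate of Theorem 7.11.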

In practice one is tempted to accelerate the algorithm by taking larger values of θ. But this is not justified by the theory, and in fact it may cause the algorithm to fail, because the full Newton step may then yield an infeasible point. However, by damping the Newton step we can keep the iterates feasible. In this section we investigate the resulting method, which is in practice much faster than the full-Newton step method. So we consider in this section the case where θ is some fixed constant in the interval (0, 1), for example θ = 0.5 or θ = 0.99, and where the new iterate is obtained from

    x⁺ = x + αΔx,

where Δx is the Newton step at x and where α is the so-called damping factor, which is also taken from the interval (0, 1), but which has to be carefully chosen. The algorithm is described in Figure 7.2. We refer to the first while-loop in the

Input:
    a proximity parameter τ = 1/(3κ);
    an accuracy parameter ε > 0;
    an update parameter θ, 0 < θ < 1;
    x⁰ ∈ D and μ⁰ > 0 such that λ_{μ⁰}(x⁰) ≤ τ.
begin
    x := x⁰; μ := μ⁰;
    while νμ > ε do
        μ := (1 − θ)μ;
        while λ_μ(x) > τ do
            α := 1/(1 + κλ_μ(x));
            x := x + αΔx;
        endwhile
    endwhile
end

Figure 7.2. Algorithm with damped Newton steps

algorithm as the outer loop and to the second while-loop as the inner loop. Each execution of the outer loop is called an outer iteration and each execution of the inner loop an inner iteration. The main task in the analysis of the algorithm is to derive an upper bound for the number of iterations in the inner loop, because the number of outer iterations follows from Lemma 7.10.
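The inner loop can be sketched numerically for the logarithmic-barrier example used above (illustrative data and names; here κ = 1, so τ = 1/3 and α = 1/(1 + λ_μ(x)), and the per-step decrease guarantee below is the one from Theorem 6.24):

```python
import math

c = [1.0, 2.0]        # illustrative objective
mu = 1.0
x = [5.0, 3.0]        # feasible, but far from the central path point x(mu) = mu/c

def phi_mu(x):
    return sum(ci * xi for ci, xi in zip(c, x)) / mu - sum(math.log(xi) for xi in x)

def decrement(x):
    return math.sqrt(sum((xi * ci / mu - 1.0) ** 2 for xi, ci in zip(x, c)))

tau = 1.0 / 3.0       # proximity parameter for kappa = 1
inner_iterations = 0
while decrement(x) > tau:
    lam = decrement(x)
    alpha = 1.0 / (1.0 + lam)       # damping factor, kappa = 1
    before = phi_mu(x)
    # Newton step for the log barrier: dx_i = x_i - x_i^2 c_i / mu.
    x = [xi + alpha * (xi - xi * xi * ci / mu) for xi, ci in zip(x, c)]
    inner_iterations += 1
    assert all(xi > 0 for xi in x)              # damped steps keep feasibility
    # Guaranteed decrease: at least omega(1/3) = 1/3 - log(4/3) > 1/22 per step.
    assert before - phi_mu(x) > 1.0 / 22.0
```

After a few damped steps the iterate re-enters the region λ_μ(x) ≤ τ, and each step decreases φ_μ by well over the guaranteed 1/22.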

7.5.1 Analysis of the algorithm with damped Newton steps

As we will see, in the analysis of the algorithm many results can be used that we already obtained in the analysis of the algorithm for minimizing a self-concordant function with damped Newton steps, in Section 6.9.

Due to the choice of the damping factor α in the algorithm, Theorem 6.24 implies that in each inner iteration the decrease in the value of φ_μ satisfies

    φ_μ(x) − φ_μ(x + αΔx) ≥ ω(κλ_μ(x)) / κ²,

where ω(t) = t − log(1 + t). Since during each inner iteration λ_μ(x) > τ and τ = 1/(3κ), we obtain

    φ_μ(x) − φ_μ(x + αΔx) ≥ ω(1/3)/κ² = (1/3 − log(4/3))/κ² = 0.0457…/κ² > 1/(22κ²).

Thus we see that each inner iteration decreases the value of φ_μ by at least 1/(22κ²). This implies that we can easily find an upper bound for the number of inner iterations during one outer iteration if we know the difference between the values of φ_μ at the start and at the end of one outer iteration. Since φ_{μ⁺}(x) is minimal at x = x(μ⁺), this difference is not larger than

    φ_{μ⁺}(x) − φ_{μ⁺}(x(μ⁺)),

where x denotes the iterate at the start of an outer iteration and μ⁺ = (1 − θ)μ the value of the barrier parameter after the μ-update.

The proofs of the next two lemmas follow similar arguments as used in the proof of Theorem 2.2 in [4].

Lemma 7.12. Let 0 < μ. Then we have

    dφ_μ(x(μ))/dμ = −cᵀx(μ)/μ² = g(x(μ))ᵀx(μ)/μ.

Proof. Denoting the derivative of x(μ) with respect to μ as x′(μ), we may write

    dφ_μ(x(μ))/dμ = d/dμ ( cᵀx(μ)/μ + φ(x(μ)) ) = −cᵀx(μ)/μ² + cᵀx′(μ)/μ + g(x(μ))ᵀx′(μ).

The definition of x(μ), as minimizer of φ_μ(x), implies

    g(x(μ)) = −c/μ.

Hence we obtain

    cᵀx′(μ)/μ + g(x(μ))ᵀx′(μ) = 0,

whence

    dφ_μ(x(μ))/dμ = −cᵀx(μ)/μ²,

which implies the lemma. □
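Lemma 7.12 can be checked on a one-dimensional instance, where the central path is explicit; a sketch (c, μ and the step size h are illustrative): for min{cx : x > 0} with barrier −log x one has x(μ) = μ/c, so φ_μ(x(μ)) = 1 − log(μ/c).

```python
import math

c = 3.0                       # illustrative cost coefficient

def phi_on_path(mu):
    x = mu / c                # x(mu) minimizes phi_mu(x) = c*x/mu - log x over x > 0
    return c * x / mu - math.log(x)

mu, h = 2.0, 1e-6
numeric = (phi_on_path(mu + h) - phi_on_path(mu - h)) / (2.0 * h)
predicted = -c * (mu / c) / mu**2     # Lemma 7.12: -c^T x(mu) / mu^2 = -1/mu here
```

The central finite difference agrees with the predicted derivative −1/μ.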

Lemma 7.13. Let x ∈ D, λ_μ(x) ≤ 1/(3κ) and μ⁺ = (1 − θ)μ. Then we have

    φ_{μ⁺}(x) − φ_{μ⁺}(x(μ⁺)) ≤ 1/(13κ²) + νθ/(1 − θ).

Proof. Fixing x ∈ D, we define

    Ψ(μ) = φ_μ(x) − φ_μ(x(μ)).

Then we need to find an upper bound for Ψ(μ⁺). According to the Mean Value Theorem there exists a μ̄ ∈ (μ⁺, μ) such that

    Ψ(μ⁺) = Ψ(μ) + Ψ′(μ̄)(μ⁺ − μ).    (7.15)

Let us first consider Ψ′(μ̄). We have

    Ψ′(μ̄) = dφ_{μ̄}(x)/dμ̄ − dφ_{μ̄}(x(μ̄))/dμ̄ = −cᵀx/μ̄² − dφ_{μ̄}(x(μ̄))/dμ̄.    (7.16)

Using Lemma 7.12 we get

    Ψ′(μ̄) = −cᵀx/μ̄² + cᵀx(μ̄)/μ̄² = cᵀ(x(μ̄) − x)/μ̄² = g(x(μ̄))ᵀ(x − x(μ̄))/μ̄.

Now applying Lemma 7.7 twice, with d = x − x(μ̄) and d = x(μ̄) − x respectively, we obtain

    |Ψ′(μ̄)| ≤ ν/μ̄.

Hence, since μ̄ ∈ (μ⁺, μ), we get

    |Ψ′(μ̄)| ≤ ν/μ⁺.

Substitution into (7.15) yields

    Ψ(μ⁺) ≤ Ψ(μ) + (ν/μ⁺)(μ − μ⁺) = Ψ(μ) + νθ/(1 − θ).

In other words,

    φ_{μ⁺}(x) − φ_{μ⁺}(x(μ⁺)) ≤ φ_μ(x) − φ_μ(x(μ)) + νθ/(1 − θ).

Since λ_μ(x) ≤ 1/(3κ), we derive from Theorem 6.34 that

    φ_μ(x) − φ_μ(x(μ)) ≤ ( log(3/2) − 1/3 )/κ² = 0.0721318…/κ² ≤ 1/(13κ²).

Combining the last two inequalities proves the lemma. □

Let us compare the iteration bounds of the two algorithms. For given θ, μ⁰ and ε > 0 these bounds are given by

    2(1 + 4√ν) log(νμ⁰/ε)

and

    22κ²( 1/(13κ²) + νθ/(1 − θ) )(1/θ) log(νμ⁰/ε) = ( 22/(13θ) + 22κ²ν/(1 − θ) ) log(νμ⁰/ε),

respectively. Neglecting the factor log(νμ⁰/ε), we see that the first bound is O(√ν). On the other hand, when assuming θ = Θ(1), the second bound is O(κ²ν). This shows that from a theoretical point of view the full-Newton step method is more efficient than the damped-Newton step method. In practice, however, the converse holds. This phenomenon has become known as the irony of interior-point methods [10, page 51].
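The comparison can be made concrete; a sketch with illustrative parameters (a 1-self-concordant barrier, κ = 1):

```python
import math

kappa, nu, mu0, eps, theta = 1.0, 100.0, 1.0, 1e-6, 0.5   # illustrative

log_factor = math.log(nu * mu0 / eps)
full_newton_bound = 2.0 * (1.0 + 4.0 * math.sqrt(nu)) * log_factor
damped_newton_bound = (22.0 / (13.0 * theta)
                       + 22.0 * kappa**2 * nu / (1.0 - theta)) * log_factor
```

For these values the full-Newton bound is roughly 1.5 thousand iterations, against roughly 80 thousand for the damped bound, illustrating the O(√ν) versus O(κ²ν) gap.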

Also note that in both cases the quantity κ√ν is solely responsible for the iteration bound, or complexity, of the algorithm. That is why we followed [3] and called this the complexity number of φ.

Exercise 7.10. Verify that the complexity number κ√ν does not change if we scale φ(x) by a positive scalar.

Solution: By Lemma 8.2 below (cf. also Exercise 6.6), scaling φ by λ > 0 yields a (κ/√λ, λν)-SCB, and (κ/√λ)·√(λν) = κ√ν. □

7.6 Adding equality constraints

In many cases the vector x of variables in (P) not only has to belong to D but also has to satisfy a system of equality constraints. The problem then becomes

    (P′) min { cᵀx : Ax = b, x ∈ D }.

We assume that A is an m × n matrix with rank(A) = m. This problem can be solved without much extra effort. The search direction has to be designed such that feasibility is maintained. Given a feasible x we take as search direction Δx the direction that minimizes the second-order Taylor polynomial at x subject to the condition AΔx = 0. Thus we consider the problem

    min { φ(x) + Δxᵀg(x) + ½ ΔxᵀH(x)Δx : AΔx = 0 }.

This gives rise to the system

    H(x)Δx + g(x) = Aᵀy,  AΔx = 0,

whence, denoting H(x) as H,

    Δx = H⁻¹Aᵀ( AH⁻¹Aᵀ )⁻¹ AH⁻¹g(x) − H⁻¹g(x)

or, equivalently,

    H^{1/2}Δx = −( I − H^{−1/2}Aᵀ( AH⁻¹Aᵀ )⁻¹ AH^{−1/2} ) H^{−1/2}g(x) = −P_{AH^{−1/2}} H^{−1/2}g(x),

where P_{AH^{−1/2}} denotes the orthogonal projection onto the null space of AH^{−1/2}. Note that if the system Ax = b is void, i.e., A = 0 and b = 0, then Δx is just the old direction.

Denoting the feasible region of (P′) as P and its interior as P⁰, one easily understands that the restriction φ_P of φ to P is a κ-self-concordant ν-barrier for P⁰. Moreover, Δx as above is precisely the Newton direction for φ_P at x ∈ P⁰. Hence, essentially the same full-Newton step method and damped-Newton step method as before can be used to solve the above problem in polynomial time. More efficient methods can be obtained by using different schemes, like adaptive μ-updates, a predictor-corrector method, etc. We do not work this out further.
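Rather than forming the projection P_{AH^{−1/2}} explicitly, one can solve the defining system HΔx + g = Aᵀy, AΔx = 0 directly as one linear system; a sketch with illustrative data (the tiny `solve` routine and all numbers are ours; H is diagonal, as for the logarithmic barrier):

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][k] * z[k] for k in range(r + 1, n))) / A[r][r]
    return z

# Illustrative data: H = diag(1/x_i^2) as for the log barrier, one equality row.
x = [1.0, 2.0, 4.0]
H = [1.0 / xi**2 for xi in x]
a = [1.0, 1.0, 1.0]              # A = [1 1 1]: the step keeps sum(x_i) unchanged
g = [0.5, -1.0, 0.25]            # illustrative gradient

# Block system [H  A^T; A  0] (dx, w) = (-g, 0); with this sign the multiplier
# in H dx + g = A^T y is y = -w.
M = [[H[0], 0.0, 0.0, a[0]],
     [0.0, H[1], 0.0, a[1]],
     [0.0, 0.0, H[2], a[2]],
     [a[0], a[1], a[2], 0.0]]
sol = solve(M, [-g[0], -g[1], -g[2], 0.0])
dx, y = sol[:3], -sol[3]

feasibility = sum(ai * di for ai, di in zip(a, dx))          # A dx, should be 0
stationarity = [H[i] * dx[i] + g[i] - a[i] * y for i in range(3)]
```

Both residuals vanish, so Δx satisfies AΔx = 0 and HΔx + g = Aᵀy as required.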

Exercise 7.11. Verify that the restriction φ_P of φ to P is a κ-self-concordant ν-barrier for P⁰.

Solution: For x ∈ P⁰ and any h in the null space of A, the restricted function satisfies (φ_P)_{x,h} = φ_{x,h}, so the inequalities defining κ-self-concordance and the ν-barrier property carry over from φ to φ_P. □

Chapter 8

Solving convex optimization problems

8.1 Introduction

In this chapter we show how the results of the previous chapter can be used to solve a convex optimization problem of the form

    inf { f(x) : x ∈ F },

where

    F = { x ∈ ℝⁿ : gⱼ(x) ≤ 0, 1 ≤ j ≤ m }.    (8.1)

It will be assumed that the functions f(x) and gⱼ(x) are continuously differentiable.

First we observe that without loss of generality we may assume that f(x) is a linear function, because we can introduce a new variable τ, add the constraint f(x) − τ ≤ 0 to the problem and then minimize τ. So, assuming that f(x) = cᵀx for some suitable vector c, we may instead consider the following problem:

    (CP) inf { cᵀx : x ∈ F }    (8.2)

with F as defined in (8.1). The Lagrange-Wolfe dual of (CP) is given by

    (CD) sup { cᵀx + Σⱼ₌₁ᵐ yⱼgⱼ(x) : Σⱼ₌₁ᵐ yⱼ∇gⱼ(x) = −c, yⱼ ≥ 0, j = 1, …, m }.

We will furthermore assume that (CP) satisfies the Slater condition, namely

    F⁰ := int F ≠ ∅.

It is clear from the previous chapter that this problem is tractable from a computational point of view whenever we have a self-concordant barrier function for the

interior F⁰ of the domain of (CP). The aim of this chapter is to show that in many cases such a self-concordant barrier function can be obtained, but that this often requires a reformulation of the constraint functions in (CP).

8.2 Getting a self-concordant barrier for F

As we show in this section, a self-concordant barrier for F⁰ can be easily obtained if we know self-concordant barriers for the subsets of ℝⁿ determined by the constraint functions. Let us define

    Fⱼ := { x ∈ ℝⁿ : gⱼ(x) ≤ 0 },    (8.3)

and suppose that φⱼ(x) is a self-concordant barrier function for Fⱼ. Then

    ψ(x) = Σⱼ₌₁ᵐ φⱼ(x)    (8.4)

is a self-concordant barrier function for F. This follows from the following lemma.

Lemma 8.1. If φⱼ is a κⱼ-self-concordant νⱼ-barrier for Fⱼ, where j ∈ {1, …, m}, then ψ, as just defined, is a κ-self-concordant ν-barrier for F = ∩ⱼ₌₁ᵐ Fⱼ, with

    κ = max_{1≤j≤m} κⱼ,  ν = Σⱼ₌₁ᵐ νⱼ.

Proof. The lemma is obvious if m = 1. Below we prove the lemma for m = 2. The extension to larger values of m is straightforward, by induction on m. So suppose m = 2 and ψ(x) = φ₁(x) + φ₂(x), where x ∈ F₁ ∩ F₂. Now using Lemma ?? we may write, for any h ∈ ℝⁿ,

    |∇³ψ(x)[h,h,h]| / (∇²ψ(x)[h,h])^{3/2}
        = |∇³φ₁(x)[h,h,h] + ∇³φ₂(x)[h,h,h]| / (∇²φ₁(x)[h,h] + ∇²φ₂(x)[h,h])^{3/2}
        ≤ ( 2κ₁σ₁^{3/2} + 2κ₂σ₂^{3/2} ) / (σ₁ + σ₂)^{3/2},

where σᵢ = ∇²φᵢ(x)[h,h], with i ∈ {1, 2}. The last expression is homogeneous in [σ₁; σ₂]. So, since σᵢ ≥ 0, we may assume σ₁ + σ₂ = 1. With σ = σ₁, the last expression then gets the form

    2κ₁σ^{3/2} + 2κ₂(1 − σ)^{3/2},  σ ∈ [0, 1].

This function is convex in σ, and hence its maximal value occurs either for σ = 0 or for σ = 1. Hence the maximal value is given by 2 max{κ₁, κ₂}. This proves the statement on κ.

On the other hand, by Lemma 7.6,

    (∇ψ(x)[h])² / ∇²ψ(x)[h,h]
        = (∇φ₁(x)[h] + ∇φ₂(x)[h])² / (∇²φ₁(x)[h,h] + ∇²φ₂(x)[h,h])
        ≤ ( ν₁^{1/2}σ₁^{1/2} + ν₂^{1/2}σ₂^{1/2} )² / (σ₁ + σ₂)
        ≤ ν₁ + ν₂.

The last inequality is due to the Cauchy-Schwarz inequality, since σ₁ + σ₂ = 1. Thus the lemma has been proved. □

Recall from Theorem 7.11 and Theorem 7.14 that the complexity of the algorithms that we considered in the previous chapter is an increasing function of κ√ν. Therefore, it is quite important to get an SCB with κ√ν as small as possible. In this respect it is worth recalling from Exercise 6.6 that a positive multiple of a self-concordant function is again a self-concordant function. The next lemma shows that a similar statement holds for self-concordant barrier functions.

Lemma 8.2. Let φ be a κ-self-concordant ν-barrier (or shortly a (κ, ν)-SCB) and λ ∈ ℝ, λ > 0. Then λφ is a (κ/√λ, λν)-SCB.

Proof. According to the definitions of κ and ν we have

    |∇³φ(x)[h,h,h]|² ≤ 4κ² (∇²φ(x)[h,h])³,  (∇φ(x)[h])² ≤ ν ∇²φ(x)[h,h],  ∀ x ∈ D, ∀ h ∈ ℝⁿ.

Denoting ψ = λφ, this implies

    |∇³ψ(x)[h,h,h]|² / (∇²ψ(x)[h,h])³ = λ² |∇³φ(x)[h,h,h]|² / ( λ³ (∇²φ(x)[h,h])³ ) ≤ 4κ²/λ

and

    (∇ψ(x)[h])² / ∇²ψ(x)[h,h] = λ² (∇φ(x)[h])² / ( λ ∇²φ(x)[h,h] ) ≤ λν,  ∀ x ∈ D, ∀ h ∈ ℝⁿ.

This proves the statement in the lemma. □

It thus follows that instead of the SCB in (8.4) we can also use the SCB that is obtained by multiplying each term φⱼ(x) in the sum defining ψ(x) by a positive multiplier λⱼ. This yields

    ψ(x) = Σⱼ₌₁ᵐ λⱼφⱼ(x),  λⱼ > 0,

which is a (κ, ν)-SCB with κ = maxⱼ κⱼ/√λⱼ and ν = Σⱼ₌₁ᵐ λⱼνⱼ. So, it might be useful to use (positive) multipliers λⱼ that minimize κ√ν. It has been argued that the optimal choice is λⱼ = κⱼ², which gives κ = 1 and ν = Σⱼ₌₁ᵐ κⱼ²νⱼ [3, page 48-49]. Later on (cf. Lemma 9.10) we show that there exist positive multipliers λⱼ such that

    κ√ν = √( Σⱼ₌₁ᵐ κⱼ²νⱼ ),

and that this is the best possible (i.e., minimal) value for κ√ν.
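The effect of the multipliers is easy to check numerically; a sketch (the parameter pairs are illustrative, and the helper name is ours):

```python
import math

pairs = [(1.0, 1.0), (2.0, 1.0)]    # illustrative (kappa_j, nu_j) pairs

def compound(multipliers):
    """(kappa, nu) of sum_j lambda_j phi_j, combining Lemmas 8.1 and 8.2."""
    kappa = max(kj / math.sqrt(lj) for (kj, _), lj in zip(pairs, multipliers))
    nu = sum(lj * nj for (_, nj), lj in zip(pairs, multipliers))
    return kappa, nu

k_opt, nu_opt = compound([kj ** 2 for kj, _ in pairs])   # lambda_j = kappa_j^2
k_uni, nu_uni = compound([1.0, 1.0])                     # unit multipliers

complexity_opt = k_opt * math.sqrt(nu_opt)   # = sqrt(sum kappa_j^2 nu_j)
complexity_uni = k_uni * math.sqrt(nu_uni)
```

For this data the choice λⱼ = κⱼ² gives κ = 1 and complexity number √5 ≈ 2.24, versus 2√2 ≈ 2.83 for unit multipliers.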

In many problems one or more constraints have the form

    Σᵢ₌₁ᵖ hᵢ(x) ≤ t,    (8.5)

where also t is a decision variable in the problem. In such cases it is often convenient to replace this constraint by the equivalent system of inequalities

    hᵢ(x) ≤ tᵢ  (i = 1, …, p),  Σᵢ₌₁ᵖ tᵢ ≤ t.    (8.6)

Exercise 8.1. Prove that x and t satisfy (8.5) if and only if there exist tᵢ (i = 1, …, p) such that x and t satisfy (8.6).

Solution: It is obvious that (8.6) implies (8.5). On the other hand, if x and t satisfy (8.5), then defining tᵢ = hᵢ(x), also (8.6) is satisfied. □

Exercise 8.2. Prove that

    −log( t − Σᵢ₌₁ᵖ tᵢ )

is a (1, 1)-self-concordant barrier function for the linear constraint Σᵢ₌₁ᵖ tᵢ ≤ t in (8.6).

Solution: Denoting x̄ = (t; t₁, …, tₚ) and g(x̄) = t − Σᵢ₌₁ᵖ tᵢ, and letting h̄ = (h₀; h₁, …, hₚ), we define

    φ(α) := −log( t + αh₀ − Σᵢ₌₁ᵖ (tᵢ + αhᵢ) ) = −log g(x̄ + αh̄).

Since g is linear, g(x̄ + αh̄) = g(x̄) + αg(h̄), and

    φ′(α) = −g(h̄)/g(x̄ + αh̄),  φ″(α) = g(h̄)²/g(x̄ + αh̄)²,  φ‴(α) = −2g(h̄)³/g(x̄ + αh̄)³,

and hence

    φ′(0) = −g(h̄)/g(x̄),  φ″(0) = g(h̄)²/g(x̄)²,  φ‴(0) = −2g(h̄)³/g(x̄)³.

Therefore,

    |φ‴(0)| / φ″(0)^{3/2} = 2,  φ′(0)² / φ″(0) = 1,

proving the claim. □
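The two ratios evaluated in the solution above can be checked at a concrete point; a sketch (the values of g(x̄) and g(h̄) are illustrative):

```python
# Along a line, psi = -log g becomes phi(a) = -log(A + B*a), with A = g(xbar) > 0
# and B = g(hbar); its derivatives at a = 0 are elementary.
A, B = 2.0, -0.7                  # illustrative values of g(xbar) and g(hbar)

d1 = -B / A                       # phi'(0)
d2 = (B / A) ** 2                 # phi''(0)
d3 = -2.0 * (B / A) ** 3          # phi'''(0)

sc_ratio = abs(d3) / d2 ** 1.5    # should equal 2  (kappa = 1)
barrier_ratio = d1 ** 2 / d2      # should equal 1  (nu = 1)
```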

        f(x)                           ψ(x, t)                           κ    ν

    1   −log x,  x > 0                 −log(t + log x) − log x           1    2
    2   eˣ                             −log(log t − x) − log t           1    2
    3   x log x,  x > 0                −log(t − x log x) − log x         1    2
    4   1/x,  x > 0                    −log(tx − 1),  x > 0              1    2
    5   xᵖ,  x > 0,  p ≥ 1             −log(t^{1/p} − x) − log t         1    2
    6   −xᵖ,  x > 0,  0 ≤ p ≤ 1        −log(xᵖ + t) − log x              1    2
    7   |x|ᵖ,  p ≥ 1                   −log(t^{2/p} − x²) − 2 log t      1    4

    Figure 8.1. Self-concordant barriers for some 2-dimensional sets

It follows from Lemma 8.1 that if we know (κᵢ, νᵢ)-SCBs ψᵢ(x, t) for the respective epigraphs of the hᵢ(x) (1 ≤ i ≤ p), then

    Σᵢ₌₁ᵖ ψᵢ(x, tᵢ) − log( t − Σᵢ₌₁ᵖ tᵢ )

is a (κ, ν)-SCB with κ = max{1, maxᵢ κᵢ} and ν = 1 + Σᵢ₌₁ᵖ νᵢ.

8.3 Tools for proving self-concordancy

Recall that the epigraph of a function f : D → ℝ is defined by

    epi(f) = { (x, t) : x ∈ D, f(x) ≤ t },

and that f is a convex function if and only if epi(f) is a convex set (cf. Exercise ??). In many cases the domain of a problem is described by inequalities of the form f(x) ≤ t. The domain is then (part of) the intersection of the epigraphs of convex functions. The table in Figure 8.1 shows some examples of self-concordant barriers for 2-dimensional sets that are epigraphs of simple convex functions [5, page 22].

One easily verifies that in all cases one has f(x) < t if and only if (x, t) belongs to the domain of ψ(x, t). However, to prove that the functions ψ(x, t) in this table are SCBs, with the given values of κ and ν, is a nontrivial and tedious task. T

    One easily verifies that in all cases one has f(x) < t if and only if (x, t) belongsto the domain of (x, t). However, to prove that the functions (x, t) in this tableare SCBs, with the given values of and , is a nontrivial and tedious task. T