Nonconvex Opt

8/10/2019 Nonconvex Opt

1/48

Nonconvex Optimization forCommunication Systems

Mung Chiang

Electrical Engineering DepartmentPrinceton University, Princeton, NJ 08544, [email protected]

Summary. Convex optimization has provided both a powerful tool and an intrigu-ing mentality to the analysis and design of communication systems over the lastfew years. A main challenge today is on nonconvex problems in these application.This paper presents an overview of some of the important nonconvex optimization

problems in point-to-point and networked communication systems. Three typical ap-plications are covered: Internet congestion control through nonconcave network util-ity maximization, wireless network power control through geometric and sigmoidalprogramming, and DSL spectrum management through distributed nonconvex op-timization. A variety of nonconvex optimization techniques are showcased: fromstandard dual relaxation to sum-of-squares programming through successive SDPrelaxation, signomial programming through successive GP relaxation, and leveragingthe specific structures in problems for efficient and distributed heuristics.

Key words: Nonconvex optimization, Geometric programming, Semidefi-nite programming, Sum of squares, Duality, Network utility maximization,TCP/IP, Wireless network, Power control.

1 Introduction

There has been two major waves in the history of optimization theory: thefirst started with linear programming and simplex method in late 1940s, andthe second with convex optimization and interior point method in late 1980s.Each has been followed by a transforming period of appreciation-applicationcycle: as more people appreciate the use of LP/convex optimization, morelook for their formulations in various applications; then more work on its the-ory, efficient algorithms and softwares; the more powerful the tools become;then more people appreciate its usage. Communication systems benefit sig-nificantly from both waves, including multicommodity flow solutions (e.g.,

Bellman Ford algorithm) from LP, and basic network utility maximizationand robust transceiver design from convex optimization.


2/48


3/48

Sp ecial Volume 3

functions, NUM substantially expands the scope of the classical LP-basedNetwork Flow Problems.

Consider a communication network withLlinks, each with a fixed capacityofcl bps, and Ssources (i.e., end users), each transmitting at a source rateofxs bps. Each source s emits one flow, using a fixed set L(s) of links in its

path, and has a utility function Us(xs). Each link l is shared by a set S(l)of sources. Network Utility Maximization (NUM), in its basic version, is thefollowing problem of maximizing the total utility of the network

sUs(xs),

over the source ratesx, subject to linear flow constraintss:lL(s) xs cl for

all links l:maximize

sUs(xs)

subject tosS(l) xs cl, l,

x 0(1)

where the variables are x RS.There are many nice properties of the basic NUM model due to several

simplifying assumptions of the utility functions and flow constraints, whichprovide the mathematical tractability of problem (1) but also limit its ap-

plicability. In particular, the utility functions {Us} are often assumed to beincreasing and strictly concave functions.

Assuming thatUs(xs) becomes concave for large enoughxs is reasonable,because the law of diminishing marginal utility eventually will be effective.However,Us may not be concave throughout its domain. In his seminal paperpublished a decade ago, Shenker [45] differentiated inelastic network trafficfrom elastic traffic. Utility functions for elastic traffic were modeled as strictlyconcave functions. Whileinelasticflows with nonconcave utility functions rep-resent important applications in practice, they have received little attentionand rate allocation among them have scarcely any mathematical foundation,except three recent publications [28, 12, 15] (see also earlier work in [54, 29, 30]related to the approach in [28])

In this section, we investigate the extension of the basic NUM to max-

imization of nonconcave utilities, as in the approach of [15]. We provide acentralized algorithm for off-line analysis and establishment of a performancebenchmark for nonconcave utility maximization when the utility function is apolynomial or signomial. Based on the semialgebraic approach to polynomialoptimization, we employ convex sum-of-squares (SOS) relaxations solved bya sequence of semidefinite programs (SDP), to obtain increasingly tighter up-per bounds on total achievable utility for polynomial utilities. Surprisingly, inall our experiments, a very low order and often a minimal order relaxationyields not just a bound on attainable network utility, but the globally maxi-mized network utility. When the bound is exact, which can be proved usinga sufficient test, we can also recover a globally optimal rate allocation.


4/48

4 Chiang: Nonconvex Optimization for Communication Systems

Canonical distributed algorithm

A reason that the the assumption of utility functions concavity is upheld inalmost all papers on NUM is that it leads to three highly desirable mathe-matical properties of the basic NUM:

It is a convex optimization problem, therefore the global minimum can becomputed (at least in centralized algorithms) in worst-case polynomial-time complexity [5].

Strong duality holds for (1) and its Lagrange dual problem. Zero dualitygap enables a dual approach to solve (1).

Minimization of a separable objective function over linear constraints canbe conducted by distributed algorithms based on the dual approach.

Indeed, the basic NUM (1) is such a nice optimization problem thatits theoretical and computational properties have been well studied since the1960s in the field of monotropic programming, e.g., as summarized in [41].For network rate allocation problems, a dual-based distributed algorithm hasbeen widely studied (e.g., in [24, 32]), and is summarized below.

Zero duality gap for (1) states that the solving the Lagrange dual problemis equivalent to solving the primal problem (1). The Lagrange dual problemis readily derived. We first form the Lagrangian of (1):

L(x,) =s

Us(xs) +l

l

cl

sS(l)

xs

wherel 0 is the Lagrange multiplier (link congestion price) associated withthe linear flow constraint on link l. Additivity of total utility and linearityof flow constraints lead to a Lagrangian dual decomposition into individualsource terms:

L(x,) =s

Us(xs) lL(s)

l

xs+l

cll

=s

Ls(xs, s) +l

cll

where s =

lL(s) l. For each source s, Ls(xs, s) = Us(xs) sxs only

depends on local xs and the link prices l on those links used by source s.The Lagrange dual functiong() is defined as the maximized L(x,) over

x. This net utility maximization obviously can be conducted distributivelyby the each source, as long as the aggregate link price s =

lL(s) l is

available to source s, where source s maximizes a strictly concave functionLs(xs, s) overxs for a given s:

xs(s) = argmax[Us(xs)

sxs] , s. (2)


5/48

Sp ecial Volume 5

The Lagrange dual problem is

minimize g() = L(x(),)subject to 0

(3)

where the optimization variable is

. Any algorithms that find a pair of primal-dual variables (x,) that satisfy the KKT optimality condition would solve(1) and its dual problem (23). One possibility is a distributed, iterative sub-gradient method, which updates the dual variables to solve the dual problem(23):

l(t + 1) =

l(t) (t)

cl

sS(l)

xs(s(t))

+

, l (4)

wheret is the iteration number and (t)> 0 are step sizes. Certain choices ofstep sizes, such as (t) =0/t, 0 > 0, guarantee that the sequence of dualvariables (t) will converge to the dual optimal as t . The primalvariable x((t)) will also converge to the primal optimal variable x. For aprimal problem that is a convex optimization, the convergence is towards theglobal optimum.

The sequence of the pair of algorithmic steps (2,4) forms a canonical dis-tributed algorithm that globally solves network utility optimization problem(1) and the dual (23) and computes the optimal rates x and link prices .

Nonconcave Network Utility Maximization

It is known that for many multimedia applications, user satisfaction mayassume non-concave shape as a function of the allocated rate. For example, theutility for streaming applications is better described by a sigmoidal function:with a convex part at low rate and a concave part at high rate, and a singleinflexion pointx0 (withUs(x

0) = 0) separating the two parts. The concavity

assumption onUsis also related to the elasticity assumption on rate demandsby users. When demands for xs are not perfectly elastic, Us(xs) may not beconcave.

Suppose we remove the critical assumption that {Us} are concave func-tions, and allow them to be any nonlinear functions. The resulting NUMbecomes nonconvex optimization and significantly harder to be analyzed andsolved, even by centralized computational methods. In particular, a local op-timum may not be a global optimum and the duality gap can be strictlypositive. The standard distributive algorithms that solve the dual problemmay produce infeasible or suboptimal rate allocation.

Despite such difficulties, there have been two very recent publications ondistributed algorithm for nonconcave utility maximization. In [28], a self-regulation heuristic is proposed to avoid the resulting oscillation in rate

allocation and shown to converges to an optimal rate allocation asymptot-ically when the proportion of nonconcave utility sources vanishes. In [12], a


6/48


0 2 4 6 8 10 120

0.5

1

1.5

2

2.5

3

x

U(x

)

Fig. 1. Some examples of utility functions Us(xs): it can be concave or sigmoidalas shown in the graph, or any general nonconcave function. If the bottleneck linkcapacity used by the source is small enough, i.e., if the dotted vertical line is pushedto the left, a sigmoidal utility function effectively becomes a convex utility function.

set of sufficient conditions and necessary conditions is presented under whichthe canonical distributed algorithm still converges to the globally optimal so-lution. However, these conditions may not hold in many cases. These twoapproaches illustrate the choice between admission control and capacity plan-ning to deal with nonconvexity (see also the discussion in [23]). But neitherapproach provides a theoretically polynomial-time and practically efficient al-gorithm (distributed or centralized) for nonconcave utility maximization.

In this section, we remove the concavity assumption on utility functions,thus turning NUM into a nonconvex optimization problem with a strictly pos-itive duality gap. Such problems in general are NP hard, thus extremely un-likely to be polynomial-time solvable even by centralized computations. Using

a family of convex semidefinite programming (SDP) relaxations based on thesum-of-squares (SOS) relaxation and the Positivstellensatz Theorem in realalgebraic geometry, we apply a centralized computational method to boundthe total network utility in polynomial-time. A surprising result is that for allthe examples we have tried, wherever we could verify the result, the tightestpossible bound (i.e., the globally optimal solution) of NUM with nonconcaveutilities is computed with a very low order relaxation. This efficient numer-ical method for off-line analysis also provides the benchmark for distributedheuristics.

These three different approaches: proposing distributed but suboptimalheuristics (for sigmoidal utilities) in [28], determining optimality conditionsfor the canonical distributed algorithm to converge globally (for all nonlinearutilities) in [12], and proposing efficient but centralized method to computethe global optimum (for a wide class of utilities that can be transformed into


7/48

Sp ecial Volume 7

polynomial utilities) in [15] and this section, are complementary in the studyof distributed rate allocation by nonconcave NUM.

2.2 Global maximization of nonconcave network utility

Sum-of-squares method

We would like to bound the maximum network utility byin polynomial timeand search for a tight bound. Had there been no link capacity constraints,maximizing a polynomial is already an NP hard problem, but can be relaxedinto a SDP [46]. This is because testing if the following bounding inequalityholds p(x), where p(x) is a polynomial of degree d in n variables, isequivalent to testing the positivity ofp(x), which can be relaxed into testingif p(x) can be written as a sum of squares (SOS): p(x) =

ri=1 qi(x)

2 forsome polynomialsqi, where the degree ofqi is less than or equal to d/2. Thisis referred to as the SOS relaxation. If a polynomial can be written as a sumof squares, it must be non-negative, but not vice versa. Conditions underwhich this relaxation is tight were studied since Hilbert. Determining if asum of squares decomposition exists can be formulated as an SDP feasibilityproblem, thus polynomial-time solvable.

Constrained nonconcave NUM can be relaxed by a generalization of theLagrange duality theory, which involves nonlinearcombinations of the con-straints instead of linear combinations in the standard duality theory. The keyresult is the Positivstellensatz, due to Stengle [48], in real algebraic geometry,which states that for a system of polynomial inequalities, either there existsa solution in Rn or there exists a polynomial which is a certificate that nosolution exists. This infeasibility certificate is recently shown to be also com-putable by an SDP of sufficient size [38, 37], a process that is referred to asthe sum-of-squares method and automated by the software SOSTOOLS [39]initiated by Parrilo in 2000. For a complete theory and many applications of

SOS methods, see [38] and references therein.Furthermore, the bound itself can become an optimization variable inthe SDP and can be directly minimized. A nested family of SDP relaxations,each indexed by the degree of the certificate polynomial, is guaranteed toproduce the exact global maximum. Of course, given the problem is NP hard,it is not surprising that the worst-case degree of certificate (thus the numberof SDP relaxations needed) is exponential in the number of variables. Whatis interesting is the observation that in applying SOSTOOLS to nonconcaveutility maximization, a very low order, often the minimum order relaxationalready produces the globally optimal solution.

Application of SOS method to nonconcave NUM

Using sum-of-squares and the Positivstellensatz, we set up the following prob-lem whose objective value converges to the optimal value of problem (1), where


8/48


{Ui} are now general polynomials, as the degree of the polynomials involvedis increased.

minimize subject to

s Us(xs)

l l(x)(cl

sS(l)xs)

j,k jk(x)(cj

sS(j) xs)(ck

sS(k) xs)

. . . 12...n(x)(c1 sS(1) xs) . . . (cn

sS(n) xs)

is SOS,l(x), jk(x), . . . , 12...n(x) are SOS.

(5)

The optimization variables are and all of the coefficients in polynomialsl(x), jk(x), . . . , 12...n(x). Note that x isnotan optimization variable; theconstraints hold for all x, therefore imposing constraints on the coefficients.This formulation uses Schmudgens representation of positive polynomialsover compact sets [44].1 Two alternative representations are discussed in [15].

Let D be the degree of the expression in the first constraint in (5). Werefer to problem (5) as the SOS relaxation of order D for the constrained

NUM. For a fixed D , the problem can be solved via SDP. As D is increased,the expression includes more terms, the corresponding SDP becomes larger,and the relaxation gives tighter bounds. An important property of this nestedfamily of relaxations is guaranteed convergence of the bound to the globalmaximum.

Regarding the choice of degree D for each level of relaxation, clearly apolynomial of odd degree cannot be SOS, so we need to consider only thecases where the expression has even degree. Therefore, the degree of the firstnon-trivial relaxation is the largest even number greater than or equal todegree of

sUs(xs), and the degree is increased by 2 for the next level.

A key question now becomes: How do we find out, after solving an SOSrelaxation, if the bound happens to be exact? Fortunately, there is a sufficienttest that can reveal this, using the properties of the SDP and its dual solu-tion. In [19, 26], a parallel set of relaxations, equivalent to the SOS ones, isdeveloped in the dual framework. The dual of checking the nonnegativity ofa polynomial over a semi-algebraic set turns out to be finding a sequence ofmomentsthat represent a probability measure with support in that set. Tobe a valid set of moments, the sequence should form a positive semidefinitemoment matrix. Then, each level of relaxation fixes the size of this matrix,i.e., considers moments up a certain order, and therefore solves an SDP. Thisis equivalent to fixing the order of the polynomials appearing in SOS relax-

1 Schmudgens representation applies when

Us(xs) is strictly positive onthe feasible set. Therefore the convergence is asymptotic in theory, however inpractice finite convergence is observed most of the time. If we were to use StenglesPositivstellensatz, we would have finite convergence but could not have as an

optimization variable and at each relaxation level would have to use a bisectionon . For computational convenience, we choose Schmudgens form.


9/48

Sp ecial Volume 9

ations. The sufficient rank test checks a rank condition on this moment matrixand recovers (one or several) optimal x, as discussed in [19].

In summary, we have the following Algorithm for centralized computationof a globally optimal rate allocation to nonconcave utility maximization, wherethe utility functions can be written as or converted into polynomials.

Algorithm 1. Sum-of-squares for nonconcave utility maximiza-tion.

1) Formulate the relaxed problem (5) for a given degree D.2) Use SDP to solve the Dth order relaxation, which can be conducted

using SOSTOOLS [39].3) If the resulting dual SDP solution satisfies the sufficient rank condition,

theDth order optimizer (D) is the globally optimal network utility, and acorrespondingx can be obtained 2.

4) IncreaseD to D + 2, i.e., the next higher order relaxation, and repeat.

In the following subsection, we give examples of the application of SOSrelaxation to the nonconcave NUM. We also apply the above sufficient test tocheck if the bound is exact, and if so, we recover the optimum rate allocationx that achieve this tightest bound.

2.3 Numerical Examples and Sigmoidal Utilities

Polynomial utility examples

First, consider quadratic utilities, i.e., Us(xs) = x2s as a simple case to startwith (this can be useful, for example, when the bottleneck link capacity limitssources to their convex region of a sigmoidal utility). We present examplesthat are typical, in our experience, of the performance of the relaxations.

Example 1. A small illustrative example. Consider the simple 2 link, 3

user network shown in Figure 2, with c = [1, 2]. The optimization problem is

x1

x2 x3

c1 c2

Fig. 2. Network topology for example 1.

2 Otherwise, (D) may still be the globally optimal network utility but is onlyprovably an upper bound.


10/48


maximize

s x2s

subject tox1+ x2 1x1+ x3 2x1, x2, x3 0.

(6)

The first level relaxation with D = 2 is

minimize subject to (x21+ x

22+ x

23) 1(x1 x2+ 1) 2(x1

x3+ 2) 3x1 4x2 5x3 6(x1 x2+ 1)(x1 x3+ 2) 7x1(x1 x2+ 1) 8x2(x1x2+ 1) 9x3(x1 x2+ 1) 10x1(x1 x3+ 2)11x2(x1 x3+ 2) 12x3(x1 x3+ 2)13x1x2 14x1x3 15x2x3 is SOS,i 0, i= 1, . . . , 15.

(7)

The first constraint above can be written as xTQx for x = [1, x1, x2, x3]T

and an appropriate Q. For example, the (1,1) entry which is the constantterm reads 1 22 26, the (2,1) entry, coefficient ofx1, reads 1 +2 3+ 36 7 210, and so on. The expression is SOS if and only ifQ 0. The optimal is 5, which is achieved by, e.g., 1 = 1, 2 = 2, 3 =1, 8 = 1, 10 = 1, 12 = 1, 13 = 1, 14 = 2 and the rest of thei equal tozero. Using the sufficient test (or, in this example, by inspection) we find theoptimal ratesx0 = [0, 1, 2].

In this example, many of the i could be chosen to be zero. This meansnot all product terms appearing in (7) are needed in constructing the SOSpolynomial. Such information is valuable from the decentralization point ofview, and can help determine to what extent our bound can be calculated ina distributed manner. This is a challenging topic for future work.

Example 2. Larger tree topology. As a larger example, consider the net-

work shown in Figure 2.3 with 7 links. There are 9 users, with the followingrouting table that lists the links on each users path.

x1 x2 x3 x4 x5 x6 x7 x8 x91,2 1,2,4 2,3 4,5 2,4 6,5,7 5,6 7 5

Forc = [5, 10, 4, 3, 7, 3, 5], we obtain the bound = 116 with D = 2,which turns out to be globally optimal, and the globally optimal rate vectorcan be recovered:x0 = [5, 0, 4, 0, 1, 0, 0, 5, 7]. In this example, exhaustivesearch is too computationally intensive, and the sufficient condition test playsan important role in proving the bound was exact and in recovering x0.

Example 3. Large m-hop ring topology. Consider a ring network with nnodes, n users andn links where each users flow starts from a node and goesclockwise through the next m links, as shown in Figure 2.3 for n = 6, m = 2.

As a large example, with n = 25, m = 2 and capacities chosen randomlyfor a uniform distribution on [0, 10], using relaxation of order D = 2 we


11/48

Special Volume 11

c1 c2

c3

c4

c5

c6

c7


obtain the exact bound = 321.11 and recover an optimal rate allocation.For n = 30, m= 2, and capacities randomly chosen from [0, 15], it turns outthat D = 2 relaxation yields the exact bound 816.95 and a globally optimalrate allocation.

c1

c2

c3

c4

c5

c6


Sigmoidal utility examples

Now consider sigmoidal utilities in a standard form:

Us(xs) = 1

1 + e(asxs+bs),

where {as, bs} are constant integers. Even though these sigmoidal functionsare not polynomials, we show the problem can be cast as one with polynomialcost and constraints, with a change of variables.

Example 4.Sigmoidal utility.Consider the simple 2 link, 3 user exampleshown in Figure 2 for as= 1 and bs = 5.

The NUM problem is to


12/48


maximizes

11+e(xs5)

subject tox1+ x2 c1x1+ x3 c2x 0.

(8)

Letys= 1

1+e(xs5) , thenxs = log( 1ys 1)+ 5. Substituting forx1, x2 in

the first constraint, arranging terms and taking exponentials, then multiplyingthe sides by y1y2 (note thaty1, y2 > 0), we get

(1 y1)(1 y2) e(10c1)y1y2,

which is polynomial in the new variables y. This applies to all capacity con-straints, and the non-negativity constraints for xs translate to ys

11+e5 .

Therefore the whole problem can be written in polynomial form, and SOSmethods apply. This transformation renders the problem polynomial for gen-eral sigmoidal utility functions, with anyas andbs.

We present some numerical results, using a small illustrative example. HereSOS relaxations of order 4 (D = 4) were used. For c1 = 4, c2 = 8, we find= 1.228, which turns out to be a global optimum, with x0 = [0, 4, 8] as theoptimal rate vector. Forc1 = 9, c2 = 10, we find= 1.982 and x0 = [0, 9, 10].Now place a weight of 2 on y1, while the otherys have weight one, we obtain= 1.982 and x0= [9, 0, 1].

In general, ifas= 1 for some s, however, the degree of the polynomials inthe transformed problem may be very high. If we write the general problemas

maximize

s1

1+e(asxs+bs)

subject tosS(l) xs cl, l,

x 0,(9)

each capacity constraint after transformation will be

s(1 ys)rlsk=sak

exp(s as(cl+

s rls/asbs))

s yrls

k=s aks ,

where rls = 1 if l L(s) and equals 0 otherwise. Since the product of theas appears in the exponents, as > 1 significantly increases the degree of thepolynomials appearing in the problem and hence the dimension of the SDPin the SOS method.

It is therefore also useful to consider alternative representations of sig-moidal functions such as the following rational function:

Us(xs) = xnsa+ xns

,

where the inflection point is x0 = (a(n1)n+1 )1/n and the slope at the inflection

point isUs(x0) = n14n (

n+1a(n1) )1/n. Letys= Us(xs), the NUM problem in this

case is equivalent to


13/48

Special Volume 13

maximize

s yssubject toxns ysx

ns ays= 0

sS(l) xs cl, lx 0

(10)

which again can be accommodated in the SOS method and be solved byAlgorithm 1.

The benefit of this choice of utility function is that the largest degree ofthe polynomials in the problem is n+ 1, therefore growing linearly with n.The disadvantage compared to the exponential form for sigmoidal functionsis that the location of the inflection point and the slope at that point cannotbe set independently.

2.4 Alternative representations for convex relaxations tononconcave NUM

The SOS relaxation we used in the last two sections is based on Schmudgensrepresentation for positive polynomials over compact sets described by other

polynomials. We now briefly discuss two other representations of relevance tothe NUM, that are interesting from both theoretical (e.g., interpretation) andcomputational points of view.

LP relaxation

Exploiting linearity of the constraints in NUM and with the additional as-sumption of nonempty interior for the feasible set (which holds for NUM), wecan use Handelmans representation [18] and refine the Positivstellensatz con-dition to obtain the following convex relaxation of nonconcave NUM problem:

maximizesubject to

s Us(xs) =

NL

Ll=1

(cl sS(l) xs)

l , x

0, ,

(11)

where the optimization variables are and, and denotes an ordered setof integers{l}.

FixingD wherel l D, and equating the coefficients on the two sides

of the equality in (11), yields a linear program (LP). (Note that there are noSOS terms, therefore no semidefiniteness conditions.) As before, increasingthe degree D gives higher order relaxations and a tighter bound.

We provide a pricing interpretation for problem (11). First, normalize eachcapacity constraint as 1 ul(x) 0, where ul(x) =sS(l) xs/cl. We caninterpretul(x) as link usage, or the probability that link l is used at any givenpoint in time. Then, in (11), we have terms linear in u such as l(1 ul(x)),


14/48


in which l has a similar interpretation as in concave NUM, as the price ofusing link l. We also have product terms such as jk(1 uj(x))(1 uk(x)),wherejkuj(x)uk(x) indicates the probability of simultaneoususage of links

j and k, for links whose usage probabilities are independent (e.g., they do notshare any flows). Products of more terms can be interpreted similarly.

While the above price interpretation is not complete and does not justify allthe terms appearing in (11) (e.g., powers of the constraints; product terms forlinks with shared flows), it does provide some useful intuition: this relaxationresults in a pricing scheme that provides better incentives for the users toobserve the constraints, by putting additional reward (since the correspondingterm adds positively to the utility) for simultaneously keeping two links free.Such incentive helps tighten the upper bound and eventually achieve a feasible(and optimal) allocation.

This relaxation is computationally attractive since we need to solve anLPs instead of the previous SDPs at each level. However, significantly morelevels may be required [27].

Relaxation with no product terms

Putinar [40] showed that a polynomial positive over a compact set 3 canbe represented as an SOS-combination of the constraints. This yields thefollowing convex relaxation for nonconcave NUM problem:

maximize subject to

s Us(xs) =

Ll=1 l(x)(cl

sS(l) xs), x

(x) is SOS,

(12)

where the optimization variables are the coefficients in l(x). Similar to theSOS relaxation (5), fixing the order D of the expression in (12) results in an

SDP. This relaxation has the nice property that no product terms appear: therelaxation becomes exact with a high enoughD without the need of productterms. However, this degree might be much higher than what the previousSOS method requires.

2.5 Concluding Remarks and Future Directions

We consider the NUM problem in the presence of inelastic flows, i.e., flowswith nonconcave utilities. Despite its practical importance, this problem hasnot been studied widely, mainly due to the fact it is a nonconvex problem.There has been no effective mechanism, centralized or distributed, to com-pute the globally optimal rate allocation for nonconcave utility maximization

3

with an extra assumption that always holds for linear constraints as in NUMproblems


15/48

Special Volume 15

problems in networks. This limitation has made performance assessment anddesign of networks that include inelastic flows very difficult.

To address this problem, we employed convex SOS relaxations, solved by asequence of SDPs, to obtain high quality, increasingly tighter upper bounds ontotal achievable utility. In practice, the performance of our SOSTOOLS-based

algorithm was surprisingly good, and bounds obtained using a polynomial-time (and indeed a low-order and often minimal order) relaxation were foundto be exact, achieving the global optimum of nonconcave NUM problems.Furthermore, a dual-based sufficient test, if successful, detects the exactnessof the bound, in which case the optimal rate allocation can also be recovered.This performance of the proposed algorithm brings up a fundamental questionon whether there is any particular property or structure in nonconcave NUMthat makes it especially suitable for SOS relaxations.

We further examined the use of two more specialized polynomial repre-sentations, one that uses products of constraints with constant multipliers,resulting in LP relaxations; and at the other end of spectrum, one that uses alinear combination of constraints with SOS multipliers. We expect these re-laxations to give higher order certificates, thus their potential computationalbenefits need to be examined further. We also show they admit economicsinterpretations (e.g., prices, incentives) that provide some insight on how theSOS relaxations work in the framework of link congestion pricing for the si-multaneous usage of multiple links.

An important research issue to be further investigated is decentraliza-tion methods for rate allocation among sources with nonconcave utilities. Theproposed algorithm here is not easy to decentralize, given the products of theconstraints or polynomial multipliers that destroy the separable structure ofthe problem. However, when relaxations become exact, the sparsity patternof the coefficients can provide information about partially decentralized com-putation of optimal rates. For example, if after solving the NUM off-line, weobtain an exact bound, then if the coefficient of the cross-term xixj turns out

to be zero, it means users i and j do not need to communicate to each otherto find their optimal rates. An interesting next step in this area of research isto investigate distributed version of the proposed algorithm through limitedmessage passing among clusters of network nodes and links.

3 Wireless Network Power Control

3.1 Introduction

Due to the broadcast nature of radio transmission, data rates and other Qual-ity of Service (QoS) in a wireless network are affected by interference. This isparticularly important in CDMA systems where users transmit at the same

time over the same frequency bands and their spreading codes are not per-fectly orthogonal. Transmit power control is often used to tackle this problem


16/48


of signal interference. We study how to optimize over the transmit powers tocreate the optimal set of Signal-to-Interference Ratios (SIR) on wireless links.Optimality here can be with respect to a variety of objectives, such as maxi-mizing a system-wide efficiency metric (e.g., the total system throughput), ormaximizing a Quality of Service (QoS) metric for a user in the highest QoS

class, or maximizing a QoS metric for the user with the minimum QoS metricvalue (i.e., a maxmin optimization).

While the objective represents a system-wide goal to be optimized, individ-ual users QoS requirements also need to be satisfied. Any power allocationmust therefore be constrained by a feasible set formed by these minimumrequirements from the users. Such a constrained optimization captures thetradeoff between user-centric constraints and some network-centric objective.Because a higher power level from one transmitter increases the interferencelevels at other receivers, there may not be any feasible power allocation tosatisfy the requirements from all the users. Sometimes an existing set of re-quirements can be satisfied, but when a new user is admitted into the system,there exists no more feasible power control solutions, or the maximized objec-tive is reduced due to the tightening of the constraint set, leading to the needfor admission control and admission pricing, respectively.

Because many QoS metrics are nonlinear functions of SIR, which is in turna nonlinear (and neither convex nor concave) function of transmit powers, ingeneral power control optimization or feasibility problems are difficult non-linear optimization problems that may appear to be NP-hard problems. Thissection shows that, when SIR is much larger than 0dB, a class of nonlinearoptimization called Geometric Programming (GP) can be used to efficientlycompute the globally optimal power control in many of these problems, andefficiently determine the feasibility of user requirements by returning either afeasible (and indeed optimal) set of powers or a certificate of infeasibility. Thisalso leads to an effective admission control and admission pricing method.

The key observation is that despite the apparentnonconvexity, through log

change of variable the GP technique turns these constrained optimization ofpower control into convex optimization, which is intrinsically tractable despiteits nonlinearity in objective and constraints. However, when SIR is comparableto or below 0dB, the power control problems are truly nonconvex with noefficient and global solution methods. In this case, we present a heuristic thatis provably convergent and empirically almost always compute the globallyoptimal power allocation by solving a sequence of GPs through the approachof successive convex approximations.

The GP approach reveals the hidden convexity structure, which impliesefficient solution methods and the global optimality of any local optimum inpower control problems with nonlinear objective functions. It clearly differ-entiates the tractable formulations in high-SIR regime from the intractableones in low-SIR regime. Power control by GP is applicable to formulations

in both cellular networks with single-hop transmission between mobile usersand base stations, and ad hoc networks with mulithop transmission among


17/48

Special Volume 17

the nodes, as illustrated through several numerical examples in this section.Traditionally, GP is solved by centralized computation through the highly ef-ficient interior point methods. In this section we present a new result on howGP can be solved distributively with message passing, which has independentvalue to general maximization of coupled objective, and applies it to power

control problems with a further reduction of message passing overhead byleveraging the specific structures of power control problems.

3.2 Geometric Programming

GP is a class of nonlinear, nonconvex optimization problems with many usefultheoretical and computational properties. It was invented in 1967 by Duffin,Peterson, and Zener [14], and much of the developments by early 1980s wassummarized in [1]. Since a GP can be turned into a convex optimizationproblem, a local optimum is also a global optimum, Lagrange duality gapis zero under mild conditions, and a global optimum can be computed veryefficiently. Numerical efficiency holds both in theory and in practice: interior

point methods applied to GP have provably polynomial time complexity [35],and are very fast in practice with high-quality software downloadable fromthe Internet (e.g., the MOSEK package). Convexity and duality propertiesof GP are well understood, and large-scale, robust numerical solvers for GPare available. Furthermore, special structures in GP and its Lagrange dualproblem lead to distributed algorithms, physical interpretations, and compu-tational acceleration beyond the generic results for convex optimization. Adetailed tutorial of GP and comprehensive survey of its recent applications tocommunication systems can be found in [10]. This subsection contains a briefintroduction of GP terminology.

There are two equivalent forms of GP: standard form and convex form.The first is a constrained optimization of a type of function called posynomial,and the second form is obtained from the first through a logarithmic change

of variable.We first define a monomial as a function f :Rn++ R:

f(x) = dxa(1)

1 xa(2)

2 . . . xa(n)

n

where the multiplicative constant d 0 and the exponential constantsa(j) R, j = 1, 2, . . . , n. A sum of monomials, indexed by k below, is called aposynomial:

f(x) =Kk=1

dkxa(1)

k

1 xa(2)

k

2 . . . xa(n)

kn .

wheredk 0, k = 1, 2, . . . , K , and a(j)k R, j = 1, 2, . . . , n , k= 1, 2, . . . , K .

For example, 2x1 x0.52 + 3x1x

1003 is a posynomial inx, x1 x2 is not a posyn-

omial, andx1/x2 is a monomial, thus also a posynomial.


18/48


Minimizing a posynomial subject to posynomial upper bound inequalityconstraints and monomial equality constraints is called GP in standard form:

minimize f0(x)subject tofi(x) 1, i= 1, 2, . . . , m ,

hl(x) = 1, l= 1, 2, . . . , M

(13)

where fi, i= 0, 1, . . . , m , are posynomials: fi(x) =Kik=1dikx

a(1)ik

1 xa(2)ik

2 . . . xa(n)ikn ,

andhl, l= 1, 2, . . . , M are monomials: hl(x) =dlxa(1)l

1 xa(2)l

2 . . . xa(n)ln .

GP in standard form is not a convex optimization problem, because posyn-omials are not convex functions. However, with a logarithmic change of thevariables and multiplicative constants: yi = log xi, bik = log dik, bl = log dl,and a logarithmic change of the functions values, we can turn it into thefollowing equivalent problem in y:

minimize p0(y) = logK0k=1exp(a

T0ky + b0k)

subject topi(y) = log

Kik=1exp(a

Tiky+ bik) 0, i= 1, 2, . . . , m ,

ql(y) =aTl y + bl= 0, l= 1, 2, . . . , M .

(14)

This is referred to as GP in convex form, which is a convex optimizationproblem since it can be verified that the log-sum-exp function is convex [5].

In summary, GP is a nonlinear, nonconvex optimization problem that canbe transformed into a nonlinear, convex problem. GP in standard form canbe used to formulate network resource allocation problems with nonlinear ob-

jectives under nonlinear QoS constraints. The basic idea is that resources areoften allocated proportional to some parameters, and when resource alloca-tions are optimized over these parameters, we are maximizing an invertedposynomial subject to lower bounds on other inverted posynomials, which areequivalent to GP in standard form.

SP/GP, SOS/SDP

Note that, although posynomial seems to be a non-convex function, it becomesa convex function after the log transformation, as shown in an example inFigure 5. Compared to the (constrained or unconstrained) minimization ofa polynomial, the minimization of a posynomial in GP relaxes the integerconstraint on the exponential constants but imposes a positivity constraint onthe multiplicative constants and variables. There is a sharp contrast betweenthese two problems: polynomial minimization is NP-hard, but GP can beturned into convex optimization with provably polynomial-time algorithmsfor a global optimum.

In an extension of GP called Signomial Programming to be discussed laterin this section, the restriction of non-negative multiplicative constants is re-moved. This results in a general class of nonlinear and truly non-convex prob-

lems that is simultaneously a generalization of GP and polynomial minimiza-tion over the positive quadrant, as summarized in the comparison Table 1.


19/48

Special Volume 19

05

10

0

5

10

0

20

40

60

80

100

120

Y

X

Function

02

4

02

40.5

1

1.5

2

2.5

3

3.5

4

4.5

5

AB

Function

Fig. 5. A bi-variate posynomial before (left graph) and after (right graph) the logtransformation. A non-convex function is turned into a convex one.

GP PMoP SPc R+ R R

a(j) R Z+ R

xj R++ R++ R++

Table 1.Comparison of GP, constrained polynomial minimization over the positivequadrant (PMoP), and Signomial Programming (SP). All three types of problemsminimize a sum of monomials subject to upper bound inequality constraints on sums

of monomials, but have different definitions of monomial: c

jxa

(j)

j , as shown in thetable. GP is known to be polynomial-time solvable, but PMoP and SP are not.

The objective function of Signomial Programming can be formulated asminimizing a ratio between two posynomials, which is not a posynomial (sinceposynomials are closed under positive multiplication and addition but notdivision). As shown in Figure 6, a ratio between two posynomials is a non-convex function both before and after the log transformation. Although itdoes not seem likely that Signomial Programming can be turned into a con-vex optimization problem, there are heuristics to solve it through a sequenceof GP relaxations. However, due to the absence of algebraic structures foundin polynomials, such methods for Signomial Programming currently lack atheoretical foundation of convergence to global optimality. This is in contrast

to the sum-of-squares method [38], which uses a nested family of SDP relax-


20/48


ations to solve constrained polynomial minimization problems as explained inthe last section.

0

5

10

0

5

10

20

0

20

40

60

XY

Function

0

12

3

0

1

2

3

2

2.5

3

3.5

AB

Function

Fig. 6.Ratio between two bi-variate posynomials before (left graph) and after (rightgraph) the log transformation. It is a non-convex function in both cases.

3.3 Power Control by Geometric Programming: Convex Case

Various schemes for power control, centralized or distributed, have been ex-tensively studied since 1990s based on different transmission models and ap-

plication needs,e.g., in [2, 16, 34, 43, 50, 55]. This subsection summarizes thenew approach of formulating power control problems through GP. The key ad-vantage is that globally optimal power allocations can be efficiently computedfor a variety of nonlinearsystem-wide objectives and user QoS constraints,even when these nonlinear problems appearto be nonconvex optimization.

Basic model

Consider a wireless (cellular or multihop) network with n logical transmit-ter/receiver pairs. Transmit powers are denoted asP1, . . . , P n. In the cellularuplink case, all logical receivers may reside in the same physical receiver, i.e.,the base station. In the multihop case, since the transmission environment

can be different on the links comprising an end-to-end path, power controlschemes must consider each link along a flows path.


21/48

Special Volume 21

Under Rayleigh fading, the power received from transmitter j at receiveri is given by GijFijPj where Gij 0 represents the path gain (it may alsoencompass antenna gain and coding gain) that is often modeled as propor-tional to dij , where dij denotes distance, is the power fall-off factor, andFijmodels Rayleigh fading and are independent and exponentially distributed

with unit mean. The distribution of the received power from transmitter j atreceiver i is then exponential with mean valueE [GijFijPj ] = GijPj . The SIRfor the receiver on logical linki is:

SIRi= PiGiiFiiNj=i PjGijFij+ ni

(15)

whereni is the noise power for receiveri.The constellation size Mused by a link can be closely approximated for

MQAM modulations as follows: M = 1 + 1ln(2BER)SIR, where BER is the

bit error rate and 1, 2 are constants that depend on the modulation type.Defining K = 1ln(2BER) leads to an expression of the data rate Ri on the

ith link as a function of the SIR: Ri = 1

T log

2(1 + KSIRi), which can be

approximated as

Ri = 1

T log2(KSIRi) (16)

when KSIR is much larger than 1. This approximation is reasonable eitherwhen the signal level is much higher than the interference level or, in CDMAsystems, when the spreading gain is large. For notational simplicity in the restof this section, we redefine Gii as K times the original Gii, thus absorbingconstantKinto the definition of SIR.

The aggregate data rate for the system can then be written as

Rsystem=

iRi =

1

T log2

i

SIRi

.

So in the high SIR regime, aggregate data rate maximization is equivalentto maximizing a product of SIR. The system throughput is the aggregatedata rate supportable by the system given a set of users with specified QoSrequirements.

Outage probability is another important QoS parameter for reliable com-munication in wireless networks. A channel outage is declared and packetslost when the received SIR falls below a given threshold SIRth, often com-puted from the BER requirement. Most systems are interference dominatedand the thermal noise is relatively small, thus the ith link outage probabilityis

Po,i= Prob{SIRi SIRth}

=Prob{GiiFiiPi SIRthj=i

GijFijPj}.


22/48


The outage probability can be expressed as Po,i = 1 j=i

1

1+SIRthGijPj

GiiPi

[25], which means that the upper bound Po,i Po,i,maxcan be written as anupper bound on a posynomial in P:

j=i

1 +

SIRthGijPjGiiPi

1

1 Po,i,max . (17)

Cellular wireless networks

We first present how GP-based power control applies to cellular wireless net-works with one-hop transmission fromNusers to a base station. These resultsextend the scope of power control by the classical solution in CDMA systemsthat equalizes SIRs, and those by the iterative algorithms (e.g., in [2, 16, 34])that minimize total power (a linear objective function) subject to SIR con-straints.

We start the discussion on the suite of power control problem formula-tions with a simple objective function and basic constraints. The following

constrained problem of maximizing the SIR of a particular useri is a GP:

maximize Ri(P)subject toRi(P) Ri,min, i,

Pi1Gi1 = Pi2Gi2,0 Pi Pi,max, i.

The first constraint, equivalent to SIRi SIRi,min, sets a floor on the SIRof other users and protects these users from user i increasing her transmitpower excessively. The second constraint reflects the classical power controlcriterion in solving the near-far problem in CDMA systems: the expectedreceived power from one transmitteri1 must equal that from another i2. Thethird constraint is regulatory or system limitations on transmit powers. All

constraints can be verified to be inequality upper bounds on posynomials intransmit power vectorP.

Alternatively, we can use GP to maximize the minimum rate among allusers. The maxmin fairness objective:

maximizeP mini

{Ri}

can be accommodated in GP-based power control because it can be turnedinto equivalently maximizing an auxiliary variable t such that SIRi(P) exp(t), i, which has posynomial objective and constraints in (P, t).

Example 5. A small illustrative example. A simple system comprisedof five users is used for a numerical example. The five users are spaced atdistancesdof 1, 5, 10, 15, and 20 units from the base station. The power fall-off

factor= 4. Each user has a maximum power constraint ofPmax= 0.5mW.The noise power is 0.5W for all users. The SIR of all users, other than the


23/48

Special Volume 23

user we are optimizing for, must be greater than a common threshold SIR level. In different experiments, is varied to observe the effect on the optimizedusers SIR. This is done independently for the near user at d = 1, a mediumdistance user atd = 15, and the far user at d = 20. The results are plotted inFigure 7.

5 0 5 1020

15

10

5

0

5

10

15

20

Threshold SIR (dB)

OptimizedSIR

(dB)

Optimized SIR vs. Threshold SIR

nearmediumfar

Fig. 7. Constrained optimization of power control in a cellular network (Example5).

Several interesting effects are illustrated. First, when the required thresh-old SIR in the constraints is sufficiently high, there is no feasible power controlsolution. At moderate threshold SIR, asis decreased, the optimized SIR ini-tially increases rapidly. This is because it is allowed to increase its own powerby the sum of the power reductions in the four other users, and the noise isrelatively insignificant. At low threshold SIR, the noise becomes more signifi-cant and the power trade-off from the other users less significant, so the curvestarts to bend over. Eventually, the optimized user reaches its upper boundon power and cannot utilize the excess power allowed by the lower thresholdSIR for other users. This is exhibited by the transition from a sharp bend inthe curve to a much shallower sloped curve.

We now proceed to show that GP can also be applied to the problemformulations with an overall system objective of total system throughput,under both user data rate constraints and outage probability constraints.

The following constrained problem of maximizing system throughput is aGP:


24/48


maximize Rsystem(P)subject toRi(P) Ri,min, i,

Po,i(P) Po,i,max, i,0 Pi Pi,max, i

(18)

where the optimization variables are the transmit powers P. The objective isequivalent to minimizing the posynomial

i ISRi, where ISR is

1SIR

. Each ISRis a posynomial in P and the product of posynomials is again a posynomial.The first constraint is from the data rate demand Ri,min by each user. Thesecond constraint represents the outage probability upper bounds Po,i,max.These inequality constraints put upper bounds on posynomials ofP, as canbe readily verified through (16) and (17). Thus (18) is indeed a GP, andefficiently solvable for global optimality.

There are several obvious variations of problem (18) that can be solvedby GP, e.g., we can lower bound Rsystem as a constraint and maximize Rifor a particular user i, or have a total power

i Pi constraint or objective

function.The ob jective function to be maximized can also be generalized to a

weighted sum of data rates:iwiRi where w 0 is a given weight vector.This is still a GP because maximizing

iwilog SIRi is equivalent to maximiz-

ing logiSIR

wii , which is in turn equivalent to minimizing

i ISR

wii . Now use

auxiliary variables {ti}, and minimizei twii over the original constraints in

(18) plus the additional constraints ISRi ti for all i. This is readily verifiedto be a GP in (x, t), and is equivalent to the original problem.

Generalizing the above discussions and observing that high-SIR assump-tion is needed for GP formulation only when there are sumsof log(1 + SIR)in the optimization problem, we have the following summary.

Proposition 1 In the high-SIR regime, any combination of objectives (A)-(E) and constraints (a)-(e) in Table 2 (pick any one of the objectives and anysubset of the constraints) is a power control optimization problem that can be

solved by GP, i.e., can be transformed into a convex optimization with effi-cient algorithms to compute the globally optimal power vector. When objectives(C)-(D) or constraints (c)-(d) do not appear, the power control optimizationproblem can be solved by GP in any SIR regime.

In addition to efficient computation of the globally optimal power allo-cation with nonlinear objectives and constraints, GP can also be used foradmission control based on feasibility study described in [10], and for deter-mining which QoS constraint is a performance bottleneck, i.e., met tightly atthe optimal power allocation 4.

4 This is because most GP solution algorithms solve both the primal GP and itsLagrange dual problem, and by complementary slackness condition, a resourceconstraint is tight at optimal power allocation when the corresponding optimal

dual variable is non-zero.


25/48

Special Volume 25

Table 2. Suite of Power Control Optimization solvable by GP

Objective Function Constraints

(A) Maximize Ri (specific user) (a) Ri Ri,min (rate constraint)

(B) Maximize miniRi (worst-case user) (b) Pi1Gi1= Pi2Gi2 (near-far constraint)

(C) Maximize

iRi (total throughput) (c)

iRi Rsystem,min (total throughput constraint)(D) Maximize

iwiRi (weighted rate sum) (d) Po,i Po,i,max (outage prob. constraint)

(E) Minimize

iPi (total power) (e) 0 Pi Pi,max (power constraint)

Extensions

In wireless multihop networks, system throughput may be measured either byend-to-end transport layer utilities or by link layer aggregate throughput. GPapplication to the first approach has appeared in [9], and those to the secondapproach in [10]. Furthermore, delay and buffer overflow properties can alsobe accommodated in the constraints or objective function of GP-based power

control.

3.4 Power Control by Geometric Programming: Non-convex Case

If we maximize the total throughput Rsystem in the medium to low SIR case,i.e., when SIR is not much larger than 0dB, the approximation of log(1+ SIR)as log SIRdoes not hold. UnlikeSIR, which is an inverted posynomial, 1+SIRisnot an inverted posynomial. Instead, 11+SIR is aratiobetween two posynomials:

f(P)

g(P) =

j=iGijPj+ nijGijPj+ ni

. (19)

Minimizing or upper bounding a ratio between two posynomials belongs to

a truly nonconvex class of problems known as Complementary GP [1, 10] thatis an intrinsically intractable NP-hard problem. An equivalent generalizationof GP is Signomial Programming (SP) [1, 10]: minimizing a signomial subjectto upper bound inequality constraints on signomials, where a signomial s(x) isa sum of monomials, possibly with negativemultiplicative coefficients:s(x) =Ni=1 cigi(x) wherec R

N andgi(x) are monomials 5.

Successive convex approximation method

Consider the following nonconvex problem:

5 An SP can always be converted into a Complementary GP, because an inequalityin SP, which can be written as fi1(x)fi2(x) 1,wherefi1, fi2are posynomials,

is equivalent to an inequality fi1(x)1+fi2(x) 1 in Complementary GP.


26/48


minimize f0(x)subject tofi(x) 1, i= 1, 2, . . . , m ,

(20)

wheref0 is convex without loss of generality6, but thefi(x)s,iare noncon-

vex. Since directly solving this problem is NP-hard, we want to solve it by

a series of approximations fi(x) fi(x), x, each of which can be optimallysolved in an easy way. It is known [33] that if the approximations satisfy thefollowing three properties, then the solutions of this series of approximationsconverge to a point satisfying the necessary optimality Karush-Kuhn-Tucker(KKT) conditions of the original problem:

(1)fi(x) fi(x) for all x,(2)fi(x0) = fi(x0) where x0 is the optimal solution of the approximated

problem in the previous iteration,(3)fi(x0) = fi(x0).The following algorithm describes the generic successive approximation

approach.Given a method to approximate fi(x) with fi(x) ,i, around somepoint of interest x0, the following algorithm provides the output of a vectorthat satisfies the KKT conditions of the original problem.

Algorithm 2.Successive approximation to a nonconvex problem.1) Choose an initial feasible pointx(0) and setk = 1.2) Form an approximated problem of (20) based on the previous point

x(k1).3) Solve the k-th approximated problem to obtain x(k).4) Incrementk and go to step 2 until convergence to a stationary point.

Single condensation method. Complementary GPs involve upper bounds onthe ratio of posynomials as in (19); they can be turned into GPs by approx-imating the denominator of the ratio of posynomials, g(x), with a monomialg(x), but leaving the numerator f(x) as a posynomial.

Lemma 1 Letg(x) =i ui(x) be a posynomial. Then

g(x) g(x) =i

ui(x)

i

i. (21)

If, in addition,i = ui(x0)/g(x0), i,for any fixed positivex0, theng(x0) =g(x0), andg(x) is the best local monomial approximation to g(x) nearx0 inthe sense of first order Taylor approximation.

Proof. The arithmetic-geometric mean inequality states thati ivi

i vii ,

wherev 0 and 0, 1T = 1. Letting ui = ivi, we can write this basic

6 If f0 is nonconvex, we can move the objective function to theconstraint by introducing auxiliary scalar variable t and writingminimizet subject to the additional constraint f0(x) t 0.


27/48

Special Volume 27

inequality asi ui i

uii

i. The inequality becomes an equality if we

leti= ui/i ui, i,which satisfies the condition that 0 and1

T = 1.

It can be readily verified that the best local monomial approximation ofg(x)nearx0 is g(x).

Proposition 2 The approximation of a ratio of posynomialsf(x)/g(x) withf(x)/g(x) where g(x) is the monomial approximation of g(x) using thearithmetic-geometric mean approximation of Lemma 1 satisfies the three con-ditions for the convergence of the successive approximation method.

Proof. Conditions (1) and (2) are clearly satisfied since g(x) g(x) andg(x0) = g(x0) (Lemma 1). Condition (3) is easily verified by taking derivativesofg (x) and g(x).

Double condensation method. Another choice of approximation is to makea double monomial approximation for both the denominator and numeratorin (19). However, in order to satisfy the three conditions for the convergenceof the successive approximation method, a monomial approximation for the

numeratorf(x) should satisfy f(x) f(x).

Applications to power control

Figure 8 shows a block diagram of the approach of GP-based power controlfor general SIR regime. In the high SIR regime, we need to solve onlyoneGP.In the medium to low SIR regimes, we solve truly nonconvex power controlproblems that cannot be turned into convex formulation through a seriesofGPs.

(High SIR)OriginalProblem Solve1 GP

(Mediumto

Low SIR)

OriginalProblem

SPComplementary

GP (Condensed)Solve1 GP

Fig. 8. GP-based power control for general SIR regime.

GP-based power control problems in the medium to low SIR regimes be-come SP (or, equivalently, Complementary GP), which can be solved by thesingle or double condensation method. We focus on the single condensation

method here. Consider a representative problem formulation of maximizingtotal system throughput in a cellular wireless network subject to user rate


28/48


and outage probability constraints in problem (18), which can be explicitlywritten out as:

minimizeN

i=11

1+SIRisubject to (2TRi,min 1) 1

SIRi 1, i= 1, . . . , N ,

(SIRth)N1(1 Po,i,max)Nj=i G

ijPjGiiPi 1, i= 1, . . . , N ,Pi(Pi,max)

1 1, i= 1, . . . , N .

(22)

All the constraints are posynomials. However, the objective is not a posyn-omial, but a ratio between two posynomials as in (19). This power controlproblem can be solved by the condensation method by solving a series ofGPs. Specifically, we have the following single-condensation algorithm:

Algorithm 3.Single condensation GP power control.1) Evaluate the denominator posynomial of the objective function in (22)

with the givenP.2) Compute for each term i in this posynomial,

i= value ofith term in posynomialvalue of posynomial

.

3) Condense the denominator posynomial of the (22) objective functioninto a monomial using (21) with weights i.

4) Solve the resulting GP using an interior point method.5) Go to step 1 using P of step 4.6) Terminate the kth loop if P(k) P(k1) where is the error

tolerance for exit condition.

As condensing the objective in the above problem gives us an underesti-mate of the objective value, each GP in the condensation iteration loop triesto improve the accuracy of the approximation to a particular minimum in the

original feasible region. All three conditions for convergence are satisfied, andthe algorithm is convergent. Empirically through extensive numerical experi-ments, we observe that it almost always computes the globally optimal powerallocation.

Example 6.Single condensation example. We consider a wireless cellularnetwork with 3 users. Let T = 106s, Gii = 1.5, and generate Gij , i = j,as independent random variables uniformly distributed between 0 and 0.3.Threshold SIR is SIRth = 10dB, and minimal data rate requirements are100 kbps, 600 kbps and 1000 kbps for logical links 1, 2 and 3 respectively.Maximal outage probabilities are 0.01 for all links, and maximal transmitpowers are 3mW, 4mW and 5mW for link 1, 2 and 3, respectively. For eachinstance of SP power control (22), we pick a random initial feasible powervector P uniformly between 0 and Pmax. Figure 9 compares the maximized

total network throughput achieved over five hundred sets of experiments withdifferent initial vectors. With the (single) condensation method, SP converges


29/48

Special Volume 29

to different optima over the entire set of experiments, achieving (or comingvery close to) the global optimum at 5290 bps 96% of the time and a localoptimum at 5060 bps 4% of the time. The average number of GP iterationsrequired by the condensation method over the same set of experiments is 15if an extremely tight exit condition is picked for SP condensation iteration:

= 1 1010. This average can be substantially reduced by using a larger ,e.g., increasing to 1 102 requires on average only 4 GPs.

0 50 100 150 200 250 300 350 400 450 5005050

5100

5150

5200

5250

5300

Experiment index

Totalsystemt

hroughputachieved

ptimized total system throughput

Fig. 9. Maximized total system throughput achieved by the (single) condensationmethod for 500 different initial feasible vectors (Example 6). Each point representsa different experiment with a different initial power vector.

We have thus far discussed a power control problem (22) where the objec-tive function needs to be condensed. The method is also applicable if someconstraint functions are signomials and need to be condensed [51].

3.5 Distributed Implementation

A limitation for GP-based power control in ad hoc networks without basestations is the need for centralized computation (e.g., by interior point meth-ods). The GP formulations of power control problems can also be solved bya new method of distributed algorithm for GP. The basic idea is that eachuser solves its own local optimization problem and the coupling among usersis taken care of by message passing among the users. Interestingly, the spe-cial structure of coupling for the problem at hand (all coupling among thelogical links can be lumped together using interference terms) allows one to


30/48


further reduce the amount of message passing among the users. Specifically,we use a dual decomposition method to decompose a GP into smaller sub-problems whose solutions are jointly and iteratively coordinated by the useof dual variables. The key step is to introduce auxiliary variables and to addextra equality constraints, thus transferring the coupling in the objective to

coupling in the constraints, which can be solved by introducing consistencypricing (in contrast to congestion pricing). We illustrate this idea throughan unconstrained GP followed by an application of the technique to powercontrol.

Distributed algorithm for GP

Suppose we have the following unconstrained standard form GP in x 0:

minimizei fi(xi, {xj}jI(i)) (23)

where xi denotes the local variable of the ith user, {xj}jI(i) denote thecoupled variables from other users, andfiis either a monomial or posynomial.Making a change of variableyi= log xi, i, in the original problem, we obtain

minimizei fi(e

yi , {eyj}jI(i)).

We now rewrite the problem by introducing auxiliary variables yij for thecoupled arguments and additional equality constraints to enforce consistency:

minimizei fi(e

yi , {eyij}jI(i))subject toyij=yj, j I(i), i.

(24)

Each ith user controls the local variables (yi, {yij}jI(i)). Next, the Lagrangianof (24) is formed as

L({yi}, {yij}; {ij}) =i

fi(eyi

, {eyij

}jI(i)) +i

jI(i)

ij(yj yij)

=i

Li(yi, {yij}; {ij})

where

Li(yi, {yij}; {ij}) = fi(eyi, {eyij}jI(i))+

j:iI(j)

ji

yijI(i)

ijyij. (25)

The minimization of the Lagrangian with respect to the primal variables({yi}, {yij}) can be done simultaneously and distributively by each user inparallel. In the more general case where the original problem (23) is con-

strained, the additional constraints can be included in the minimization ateach Li.


31/48

Special Volume 31

In addition, the following master Lagrange dual problem has to be solvedto obtain the optimal dual variables or consistency prices {ij}:

max{ij}

g({ij}) (26)

whereg({ij}) =

i

minyi,{yij}

Li(yi, {yij}; {ij}).

Note that the transformed primal problem (24) is convex with zero dualitygap; hence the Lagrange dual problem indeed solves the original standard GPproblem. A simple way to solve the maximization in (26) is with the followingsubgradient update for the consistency prices:

ij(t+ 1) = ij(t) + (t)(yj(t) yij(t)). (27)

Appropriate choice of the stepsize (t)> 0, e.g.,(t) =0/tfor some constant0 > 0, leads to convergence of the dual algorithm.

Summarizing, the ith user has to: i) minimize the function Li in (25)

involving only local variables, upon receiving the updated dual variables{ji, j : i I(j)}, and ii) update the local consistency prices {ij , j I(i)}with (27), and broadcast the updated prices to the coupled users.

Applications to power control

As an illustrative example, we maximize the total system throughput in thehigh SIR regime with constraints local to each user. If we directly applied thedistributed approach described in the last subsection, the resulting algorithmwould require knowledge by each user of the interfering channels and inter-fering transmit powers, which would translate into a large amount of messagepassing. To obtain a practical distributed solution, we can leverage the struc-

tures of power control problems at hand, and instead keep a local copy ofeach of the effective received powers PRij = GijPj . Again using problem (18)as an example formulation and assuming high SIR, we can write the problemas following (after the log change of variable):

minimize

i log

G1ii exp(Pi)

j=iexp(PRij ) +

2

subject to PRij =Gij+ Pj ,

Constraints local to each user, e.g., (a),(d) and (e) in Table (2).(28)

The partial Lagrangian is

L=i

logG1ii exp(Pi)j=i

exp(PRij ) + 2+

ij=i

ij PRij Gij+ Pj ,(29)


32/48


and the local ith Lagrangian function in (29) is distributed to the ith user,from which the dual decomposition method can be used to determine the op-timal power allocation P. The distributed power control algorithm is sum-marized as follows.

Algorithm 4.Distributed power allocation update to maximize Rsystem.At each iteration t:1) Theith user receives the term

j=i ji(t)

involving the dual variables

from the interfering users by message passing and minimizes the following local

Lagrangian with respect to Pi(t),

PRij (t)j

subject to the local constraints:

Li

Pi(t),

PRij (t)j

; {ij(t)}j

= log

G1ii exp(Pi(t))

j=iexp(

PRij (t)) + 2

+j=i ij

PRij (t)

j=i ji(t)

Pi(t).

2) The ith user estimates the effective received power from each of theinterfering usersPR

ij(t) = GijPj(t) for j =i, updates the dual variable by

ij(t+ 1) =ij(t) + (0/t)

PRij (t) log GijPj(t)

, (30)

and then broadcast them by message passing to all interfering users in thesystem.

Example 7. Distributed GP power control. We apply the distributed al-gorithm to solve the above power control problem for three logical links withGij = 0.2, i = j, Gii = 1, i, maximal transmit powers of 6mW, 7mW and7mW for link 1, 2 and 3 respectively. Figure 10 shows the convergence of thedual objective function towards the globally optimal total throughput of thenetwork. Figure 11 shows the convergence of the two auxiliary variables in

link 1 and 3 towards the optimal solutions.

3.6 Concluding Remarks and Future Directions

Power control problems with nonlinear objective and constraints may seem tobe difficult, NP-hard problems to solve for global optimality. However, whenSIR is much larger than 0dB, GP can be used to turn these problems intointrinsically tractable convex formulations, accommodating a variety of pos-sible combinations of objective and constraint functions involving data rate,delay, and outage probability. Then interior point algorithms can efficientlycompute the globally optimal power allocation even for a large network. Feasi-bility analysis of GP naturally lead to admission control and pricing schemes.When the high SIR approximation cannot be made, these power control prob-lems become SP and may be solved by the heuristic of condensation method


33/48

Special Volume 33

0 50 100 150 2000.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2x 10

4

Iteration

Dual objective function

Fig. 10. Convergence of the dual objective function through distributed algorithm(Example 7).

0 50 100 150 20010

8

6

4

2

0

2

Iteration

Consistency of the auxiliary variables

log(P2)

log(PR

12/G

12)

log(PR

32/G

32)

Fig. 11. Convergence of the consistency constraints through distributed algorithm(Example 7).

through a series of GPs. Distributed optimal algorithms for GP-based powercontrol in multihop networks can also be carried out through message passing.

Several interesting research issues remain to be further explored, in par-ticular, reduction of SP solution complexity (e.g., by using high-SIR approx-imation to obtain the initial power vector and by solving the series of GPsonly approximately except the last GP), and combination of SP solution and

distributed algorithm for distributed power control in low SIR regime.


34/48


4 DSL Spectrum Management

4.1 Introduction

Digital Subscriber Line (DSL) technologies transform traditional voice-band

copper channels into high bandwidth data pipes, which are capable of deliver-ing data rates up to several Mbps per twisted-pair over a distance of about 10kft. The major obstacle for performance improvement in modern DSL systems(e.g., ADSL and VDSL) is the crosstalk, which is the interference generatedbetween different lines in the same binder. The crosstalk is typically 10-20dB larger than the background noise, and direct crosstalk cancellation (e.g.,[6, 17]) may not be feasible in many cases due to the complexity issues or asa result of unbundling. To mitigate the detriments caused by crosstalk,staticspectrum management that mandates spectrum mask or flat power backoffacross all frequencies (i.e., tones) has been implemented in the current sys-tem.

Dynamic spectrum management (DSM) techniques, on the other hand,can significantly improves data rates over the current practice of static spec-

trum management. Within the current capability of the DSL modems, eachmodem has the capability to shape its own power spectrum density (PSD)across different tones, but can only treat crosstalk as background noise (i.e.,no signal level coordination, such as vector transmission or iterative decod-ing, is allowed), and each modem is inherently a single-input-single-outputcommunication system. The objective would be to optimize the PSD of allusers on all tones (i.e., continuous power loading or discrete bit loading), suchthat they are compatible with each other and the system performance (e.g.,weighted rate sum as discussed below) is maximized.

Compared to power control in wireless networks treated in the last section,the channel gains are not time-varying in DSL systems, but the problem di-mension increases tremendously because there are many tones (or frequencycarriers) over which transmission takes place. Nonconvexity still remains amajor technical challenge, and high SIR approximation in general cannot bemade. However, utilizing the specific structures of the problem (e.g., the chan-nel gain values), an efficient and distributed heuristics is shown to performclose to the optimum in many realistic DSL network scenarios.

This section develops, analyzes, and simulates the new algorithm for spec-trum management in frequency selective interference channels for DSL, calledAutonomous Spectrum Management (ASB). It is autonomous (distributed al-gorithm across the users without explicit information exchange) with linear-complexity, while provably convergent and comes close to the globally optimalrate region in practice, thus overcoming bottlenecks in the state-of-the-art al-gorithms in DSM, such as IW, OSB, and ISB summarized below.

Let Kbe the number of tones and N the number of users (lines). The

iterative waterfilling(IW) algorithm [56] is among one of the first DSM algo-rithms proposed. In IW, each user views any crosstalk experienced as additive


35/48

Special Volume 35

Gaussian noise, and seeks to maximize its data rate by waterfilling over theaggregated noise plus interference. No information exchange is needed amongusers, and all the actions are completely autonomous. IW leads to an greatperformance over the static approach, and enjoys a low complexity that islinear in N. However, the greedy nature of IW leads to a performance far

from optimal in the near-far scenarios such as mixed CO/RT deployment andupstream VDSL.

To address this, an optimal spectrum balancing(OSB) algorithm [8] hasbeen proposed, which finds the best possible spectrum management solutionunder the current capabilities of the DSL modems. OSB avoids the selfishbehaviors of individual users by aiming at the maximization of a total weightedsum of users rates, which corresponds to a boundary point of the achievablerate region. On the other hand, OSB has a high computation complexity thatis exponential in N, which quickly leads to intractability when N is largerthan 6. Moreover, it is a completely centralized algorithm where a spectrummanagement center at the central office needs to know the global information(i.e., all the noise PSDs and crosstalk channel gains in the same binder) toperform the algorithm.

As an improvement to the OSB algorithm, an iterative spectrum balancing(ISB) algorithm [7] has been proposed, which is based on a weighted sumrate maximization similar as OSB. Different from OSB, ISB performs theoptimization iteratively through users, which leads to a quadratic complexityinN.Closely to optimal performance can be achieved by the ISB algorithm inmost cases. However, each user still needs to know the global information as inOSB, thus ISB is still a centralized algorithm and considered to be impracticalin many cases.

This section presents the ASB algorithm [21], which further reduce thecomplexity from ISB algorithm, and achieves close optimal performance sim-ilar as ISB and OSB. The basic idea is to use the concept of reference lineto mimic a typical victim line in the current binder. By setting the power

spectrum level to protect the reference line, a good balance between selfishand global maximizations can be achieved. The ASB algorithm enjoys a linearcomplexity inNandK, and can be implemented in a completely autonomousway. We prove the convergence of ASB for both 2-user andN-user case, underboth sequential and parallel updates.

Table 3 compares various aspects of different DSM algorithms. Utilizingthe structures of the DSL problem, in particular, the lack of channel variationand user mobility, is the key to provide a linear complexity, distributed, con-vergent, and almost optimal solution to this coupled nonconvex optimizationproblem.

4.2 System Model

Using the notation as in [8, 7], we consider a DSL bundle with N = {1,...,N}modems (i.e., lines, users) and K = {1,...,K} tones. Assume discrete multi-


36/48


Table 3. Comparison of different DSM algorithms

Algorithm Operation Complexity Performance Reference

IW Autonomous O (KN) Suboptimal [56]

OSB Centralized O KeN

Optimal [8]ISB Centralized O

KN

2 Near optimal [7]

ASB Autonomous O (KN) Near optimal [21]

tone (DMT) modulation is employed by all modems, transmission can bemodeled independently on each tone as

yk = Hkxk+ zk.

The vector xk ={xnk , n N } contains transmitted signals on tone k, wherexnk is the signal transmitted onto line n at tone k. yk and zk have similarstructures.yk is the vector of received signals on tone k. zk is the vector ofadditive noise on tone k and contains thermal noise, alien crosstalk, single-

carrier modems, radio frequency interference etc. Hk = [hn,mk ]n,mN is theN N channel transfer matrix on tone k, where hn,mk is the channel fromTX m to RX n on tone k . The diagonal elements ofHk contains the direct-channels whilst the off-diagonal elements contain the crosstalk channels. We

denote the transmit power spectrum density (PSD) snk =E

|xnk |2

. In last

sections notation for single-carrier systems, we would have snk =Pn, k. Forconvenience we denote the vector containing the PSD of user n on all tonesassn = {snk , k K} . We denote DMT symbol rate asfs.

Assume that each modem treats interference from other modems as noise.When the number of interfering modems is large, the interference can bewell approximated by a Gaussian distribution. Under this assumption theachievable bit loading of user n on tonek is

bnk = log

1 +

1

snkm=n

n,mk s

mk +

nk

, (31)

where n,mk = |hn,mk |

2/ |hn,nk |

2is the normalized crosstalk channel gain, and

nk is the noise power density normalized by the direct channel gain |hn,nk |

2.

Heredenotes the SINR-gap to capacity, which is a function of the desiredBER, coding gain and noise margin [49]. Without loss of generality, we assume= 1. The data rate on linen is thus

Rn =fskK

bnk . (32)

Each modemn is typically subject to a total power constraintPn, due to thelimitations on each modems analog frontend.


37/48

Special Volume 37kK

snk Pn. (33)

4.3 Spectrum Management Problem Formulation

The spectrum management problem is defined as follows

maximize R1

subject toRn Rn,target, n >1kK s

nk P

n, n.(34)

Here Rn,target the target rate constraint of user n. In other words, wetry to maximize the achievable rate of user 1, under the condition that allother users achieve their target rates Rn,target. The mutual interference in(31) causes Problem (34) to be coupled across users on each tone, and theindividual total power constraint causes Problem (34) to be coupled acrosstones as well. Moreover, the objective function in Problem (34) is non-convexdue to the coupling of interference, and the convexity of the rate region cannot be guaranteed in general.

However, it has been shown in [57] that the duality gap between the dualgap of Problem (34) goes to zero when the number of tones Kgets large (e.g.,for VDSL), thus Problem (34) can be solved by dual decomposition method,which brings the complexity as a function ofKdown to linear. Moreover, afrequency-sharing property ensures the rate region is convex with large enoughK, and each boundary point of the boundary point of the rate region can beachieved by a weighted rate maximization as following [8]:

maximize R1 +n>1 w

nRn

subject tokK s

nk P

n, n N, (35)

such that the nonnegative weight coefficient wn is adjusted to ensure that the

target rate constraint of user n is met. Without loss of generality, here wedefinew1 = 1. By changing the rate constraintsRn,target for usersn >1 (orequivalently, changing the weight coefficients, wn forn >1), every boundarypoint of the convex rate region can be tracked.

We observe that at the optimal solutions of (34), each user chooses aPSD level that leads to a good balance of maximization of her own rateand minimization of the damages he causes to the other users. To accuratelycalculate the latter, the user needs to know the global information of the noisePSDs and crosstalk channel gains. However, if we aim at a less aggressiveobjective and only require each user give enough protection to the other usersin the binder while maximization her own rate, then global information maynot be needed. Indeed, we can introduce the concept of a reference line,a virtual line that represents a typical victim in the current binder. Theninstead of solving (34), each user tries to maximize the achievable data rate


38/48


on the reference line, subject to its own data rate and total power constraint.Define the rate of the reference line to user n as

Rn,ref =

kKbnk =

kKlog

1 +

sknks

nk + k

.

The coefficients {sk,k,nk , k, n} are parameters of the reference line andcan be obtained from field measurement. They represent the conditions of atypical victim user in an interference channel (here a binder of DSL lines),and are known to the users a priori. They can be further updated on a muchslower timescale through channel measurement data. User n then wants tosolve the following problem local to itself:

maximize Rn,ref

subject toRn Rn,target,kK s

nk P

n.(36)

By using Lagragnian relaxation on the rate target constraint in Problem

(36) with a weight coefficient (dual variable) wn

, the relaxed version of (36)is

maximize wnRn + Rn,ref

subject tokK s

nk P

n. (37)

The weight coefficientwn needs to be adjusted to enforce the rate constraint.

4.4 ASB Algorithms

We first introduce the basic version of the ASB algorithm (ASB-I), where eachusernchooses the PSD sn to solve (36) ,and updates the weight coefficient wn

to enforce the target rate constraint. Then we introduce a variation of the ASBalgorithm (ASB-II) that enjoys even lower lower computational complexityand provable convergence.

ASB-I

For each user n, replacing the original optimization (36) with the Lagrangedual problem

maxn0,

kKsnkPn

kK

maxsnk

Jnk

wn, n, snk , snk

, (38)

whereJnk

wn, n, snk , snk

= wnbnk + b

nk

nsnk . (39)

By introducing the dual variable n, we decouple (36) into several smaller

subproblem, one for each tone. And define Jnk as user ns objective functionon tone k . The optimal PSD that maximizes Jnk for givenw

n andn is


39/48

Special Volume 39

sn,Ik

wn, n, snk

= arg maxsnk[0,Pn]

Jnk

wn, n, snk , snk

, (40)

which can be found by solving the first order condition, Jnk wn, n, snk , s

nk

/snk =0, which leads to

wn

sn,Ik +m=n

n,mk s

mk +

nk

nk sk

sk+ nksn,Ik + k

nks

n,Ik + k

n = 0.(41)

Note that (41) can be simplified into a cubic equation which has threesolutions. The optimal PSD can be found by substituting these three solutionsback to the objective function Jnk

wn, n, snk , s

nk ,

, as well as checking theboundary solutions snk = 0 and s

nk = P

n, and pick the one that yields thelargest value ofJnk .

The user then updates n to enforce the power constraint, and updateswn to enforce the target rate constraint. The complete algorithm is given asfollows,

Documents

Nonconvex Opt