18
Optimal resource allocation in multi-class networks with user-specified utility functions q Suresh Kalyanasundaram a , Edwin K.P. Chong b , Ness B. Shroff c, * a Global Telecom Solutions Sector, Motorola, Arlington Heights, IL 60004, USA b Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523, USA c School of Electrical and Computer Engineering, Purdue University, 1285 Electrical Engineering Building, West Lafayette, IN 47907, USA Received 18 April 2001; accepted 17 September 2001 Responsible Editor: E. Knightly Abstract In this work, we consider the problem of resource allocation in multi-class networks, where users specify the value they attach to obtaining different amounts of resource by means of a utility function. We develop a resource allocation scheme that maximizes the average aggregate utility per unit time. We formulate this resource allocation problem as a Markov decision process. We present numerical results that illustrate that our scheme performs better than the greedy resource allocation policy. We also discuss the implications of deliberate lying by users about their utility functions and develop a pricing scheme that prevents such lying. Ó 2001 Elsevier Science B.V. All rights reserved. Keywords: Resource allocation; Pricing; Markov decision processes; Utility functions; Greedy scheme; Complete partitioning 1. Introduction This work is motivated by two emerging trends in the networking area. One is the move towards multi-service networks, where a single network infrastructure is expected to support a variety of applications, such as data, voice, and video [1]. Secondly, applications are increasingly able to adapt to a wide range of resource amounts allo- cated to it by the network [2]. For example, phones can be equipped with multi-rate vocoders that can guarantee intelligible reception at the rates at which these vocoders operate. Scalable video compression schemes ensure successful playback at the receiver even at reduced bandwidth alloca- tions [3]. Our aim in this work is to look at the resource allocation problem in this above scenario of multi-service networks and elastic applications. For the successful operation of a multi-service network, several resources need to be allocated appropriately. The resources that are of interest in this context are bandwidth, buffer space, delay, Computer Networks 38 (2002) 613–630 www.elsevier.com/locate/comnet q This research was supported in part by the National Science Foundation through grants NCR-9624525, CDA-9617388, ECS-9501652, and ANI-9805441, and by the US Army Research Office through grant DAAH04-95-1-0246. * Corresponding author. Tel.: +1-765-494-3471; fax: +1-765- 494-3358. E-mail addresses: [email protected] (S. Kalyanasundaram), [email protected] (E.K.P. Chong), shroff@dynamo.ecn.purdue.edu, shroff@ecn.purdue. edu (N.B. Shroff). 1389-1286/01/$ - see front matter Ó 2001 Elsevier Science B.V. All rights reserved. PII:S1389-1286(01)00275-4

Optimal resource allocation in multi-class networks with user-specified utility functions

Embed Size (px)

Citation preview

Page 1: Optimal resource allocation in multi-class networks with user-specified utility functions

Optimal resource allocation in multi-class networks withuser-specified utility functionsq

Suresh Kalyanasundarama, Edwin K.P. Chongb, Ness B. Shroff c,*

a Global Telecom Solutions Sector, Motorola, Arlington Heights, IL 60004, USAb Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523, USA

c School of Electrical and Computer Engineering, Purdue University, 1285 Electrical Engineering Building, West Lafayette,

IN 47907, USA

Received 18 April 2001; accepted 17 September 2001

Responsible Editor: E. Knightly

Abstract

In this work, we consider the problem of resource allocation in multi-class networks, where users specify the value

they attach to obtaining different amounts of resource by means of a utility function. We develop a resource allocation

scheme that maximizes the average aggregate utility per unit time. We formulate this resource allocation problem as a

Markov decision process. We present numerical results that illustrate that our scheme performs better than the greedy

resource allocation policy. We also discuss the implications of deliberate lying by users about their utility functions and

develop a pricing scheme that prevents such lying. � 2001 Elsevier Science B.V. All rights reserved.

Keywords: Resource allocation; Pricing; Markov decision processes; Utility functions; Greedy scheme; Complete partitioning

1. Introduction

This work is motivated by two emerging trendsin the networking area. One is the move towardsmulti-service networks, where a single networkinfrastructure is expected to support a variety

of applications, such as data, voice, and video[1]. Secondly, applications are increasingly able toadapt to a wide range of resource amounts allo-cated to it by the network [2]. For example, phonescan be equipped with multi-rate vocoders thatcan guarantee intelligible reception at the ratesat which these vocoders operate. Scalable videocompression schemes ensure successful playbackat the receiver even at reduced bandwidth alloca-tions [3]. Our aim in this work is to look at theresource allocation problem in this above scenarioof multi-service networks and elastic applications.For the successful operation of a multi-service

network, several resources need to be allocatedappropriately. The resources that are of interest inthis context are bandwidth, buffer space, delay,

Computer Networks 38 (2002) 613–630

www.elsevier.com/locate/comnet

qThis research was supported in part by the National Science

Foundation through grants NCR-9624525, CDA-9617388,

ECS-9501652, and ANI-9805441, and by the US Army

Research Office through grant DAAH04-95-1-0246.* Corresponding author. Tel.: +1-765-494-3471; fax: +1-765-

494-3358.

E-mail addresses: [email protected]

(S. Kalyanasundaram), [email protected] (E.K.P.

Chong), [email protected], [email protected].

edu (N.B. Shroff).

1389-1286/01/$ - see front matter � 2001 Elsevier Science B.V. All rights reserved.

PII: S1389-1286 (01 )00275-4

Page 2: Optimal resource allocation in multi-class networks with user-specified utility functions

etc., which may or may not be allocated indepen-dently. In this work, we are mainly concerned withthe allocation of bandwidth as a resource. Whilemany of the ideas can be carried over to othertypes of resources as well, this is outside the scopeof what we wish to address in this work. Therefore,we use the terms ‘‘resource’’ and ‘‘bandwidth’’interchangeably in this work unless specifiedotherwise. In this work, we first present a solutionto the problem of resource allocation on a singlelink and then show how it can be extended to thecase of multiple links.We now elaborate on the two emerging trends

described above and discuss the implications theyhave for the resource allocation problem. With theneed for multimedia services to be supported overthe same network infrastructure, the resource al-locator should consider the relative importance ofa certain quantity of resource for various appli-cations. In other words, the resource allocatorshould determine the application that would derivethe maximum benefit from a certain quantity ofresource, and then make a decision based on thisinformation. For example, in general, delay-sen-sitive applications such as voice and video willbenefit more from a certain quantity of bandwidthallocation than delay-tolerant file-transfer appli-cations. The idea of utility functions from micro-economics captures this notion of importance ofthe quantity of resource allocated to the applica-tion very well [4]. We use a similar concept in ourwork, and the users specify the utility or value theyattach to different quantities of resource using autility function. We assume that the resource al-locator knows the utility function of users at thetime of resource allocation. The most likely way ofachieving this is to have users communicate theirutility functions to the resource allocator at thetime of call setup. The resource allocator then al-locates resources based on the objective of maxi-mizing the aggregate average utility obtained perunit time. Maximization of aggregate utility isrecognized as an important objective in resourceallocation problems [5]. The objective of maxi-mizing aggregate utility (as opposed to maximizingrevenue) would be meaningful in private networksthat are owned and operated by organizations fortheir own use. Even in the case of public networks,

the potential loss of ‘‘goodwill’’ with customersmight motivate operators to use the objective ofmaximizing aggregate utility. While other objec-tives may also be appropriate, we believe thatmaximizing aggregate utility is an important oneto consider. Some of the ideas in this work wouldalso readily extend to other objectives.The second emerging trend of elastic appli-

cations means that the resource allocator hassignificant freedom in deciding the amount of re-source that can be allocated to an application orconnection. If the applications were non-elastic, asin the case of traditional voice telephony, thenetwork can only decide whether or not the con-nection can be admitted. This is known as calladmission control. But with elastic applications,besides doing call admission control, the networkcan also decide the amount of resource to allocateto an admitted call. The notion of utility functionsbecomes much richer and more useful for elasticapplications. As we will see, the utility function ofa non-elastic application reduces to the case of a‘‘step function’’ because the application can gain-fully use exactly a certain quantity of resource.While applications are elastic, this does not

necessarily mean that the resource allocated to agiven connection or call can be altered during thecourse of the connection. We believe reallocating adifferent amount of resource during the lifetime ofa connection will result in excessive overhead. Wealso believe that users will not like a frequentchange in the quality of reception, which couldresult from frequent resource reallocations. There-fore, in this paper, we require that the resource isallocated for the entire lifetime of a connectionand that resource reallocations are not allowed.More research is needed to determine if realloca-tions can be done during a connection’s lifetimeand how users perceive such reallocations. If re-allocations can be done, the frequency of reallo-cations that users are willing to tolerate also needsto be determined. This issue will not be addressedin this paper.Our approach to resource allocation in the

above context is to formulate the problem as aMarkov decision process (MDP). MDPs are usedin dynamic probabilistic systems to make sequen-tial decisions that optimize an appropriate objec-

614 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630

Page 3: Optimal resource allocation in multi-class networks with user-specified utility functions

tive [6–8]. In MDPs, the system can be in a finitenumber of states, and the decision maker canchoose from several actions in each of those states.Once an action has been chosen in each state, thesystem evolves as a continuous-time Markovchain 1 and accumulates reward with time. Therate of reward accrual depends on the state of thesystem and the choice of control action in thatstate. In other words, associated with each state(and possibly depending on the control actionchosen in that state), there is a reward accrual rate.This reward accumulates additively with time.In MDPs, the aim is to determine a policy (orequivalently, an action for each state) that opti-mizes the chosen objective. The chosen objectivecould either be a discounted total reward or anaverage per-unit-time reward. In the discountedtotal reward criterion, future rewards are dis-counted by a certain discount factor, which cap-tures the notion of future rewards being lessimportant than those accrued at the present time.In average reward criterion problems, the objectiveis to determine a policy that maximizes the averageper-unit-time reward. (In our case, we wish tomaximize the chosen objective because we asso-ciate ‘‘rewards’’ with each state. Alternatively,some authors associate ‘‘costs’’ with each state andminimize the chosen objective [9].)While there has been a lot of work in the area of

static resource allocation [5,10–14], where thenumber of users in the system is assumed fixed, ourwork is different in that we consider the dynamiccase, where the state of the system changes withuser arrivals and departures. For example, in Ref.[13], the authors discuss Pareto optimality andNash equilibria for the case of a fixed number ofusers with known utility functions. Similarly, inRef. [14] the authors consider the static scenario ofa fixed number of elastic applications character-ized by their minimum and peak rate require-ments. In this framework, they obtain distributedimplementations that converge to the optimalNash bargaining solution. In our work, we exploitthe knowledge of the arrival and departure pattern

to make ‘‘informed’’ resource allocation decisions.We also implicitly take into consideration ‘‘op-portunity costs’’ associated with decisions. Thepapers on static resource allocation cited aboveassume that whenever a call arrival or call depar-ture occurs, resource is reallocated according totheir static resource allocation algorithm again. Aswe elaborate later, our model does not allow re-source reallocations to already existing users.Therefore, our scheme considers the future effectsof its current decisions while allocating resources.MDPs have been successfully applied for admis-sion control problems [15–18], but their applica-tion to resource allocation problems is new. (Seethe recent survey article [19].)Recently, there has been work by Paschalidis

and Tsitsiklis on a dynamic pricing framework forrevenue and welfare maximization [20]. In thiswork the authors determine the optimal prices thatmaximize the long-run average revenue or welfareof the network. However, unlike our work whichexplicitly considers the case of elastic applicationswith the same user capable of operating over arange of bandwidth allocations, their work doesnot consider this scenario. Moreover, in Ref. [20],control of the users’ access to the network isachieved indirectly through the pricing mechanismwith the network admitting all users whose utilityfrom the call admission is higher than the price aslong as there is enough unused capacity. In con-trast, we allow the network to have explicit controlof its resources, and the network can determinewhether or not to admit a call and the exactamount of resource to be allocated to the newlyarriving call. While we also consider pricing in ourwork in Section 5, we use our pricing frameworkto ensure that users reveal their true utility func-tions rather than as a mechanism to control users’resource allocation amounts.The rest of the paper is structured as follows. In

Section 2, we formulate the single-link resourceallocation problem as a continuous-time MDP. InSection 3, we discuss a few issues related to ourresource allocation solution, such as incorporationof constraints on the call blocking probability,offline computation, complexity reduction, andextension of our scheme to the multiple-links case.In Section 4, we present sample numerical results

1 Our description here is for the case of continuous-time

MDPs. A similar description may also be given for the case of

discrete-time MDPs.

S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630 615

Page 4: Optimal resource allocation in multi-class networks with user-specified utility functions

and compare the performance of our optimal re-source allocation scheme with that of a greedyscheme that takes a shortsighted view to the re-source allocation problem. In Section 5, we discussthe implications of deliberate lying by users abouttheir utility functions, and develop a pricing mech-anism that prevents such behavior by users.

2. MDP formulation of resource allocation

In this section, we formulate the resource allo-cation problem as an MDP. Efficient methods existto solve such problems, namely value iteration,policy iteration, and linear programming [6–8]. Inthe interest of space, we merely formulate theproblem as an MDP and refer the reader to thereferences cited above for solution techniques.We first describe the problem scenario. In this

section, we consider a single link with a finite ca-pacity that supports a variety of applications andconsider the allocation of bandwidth to calls orconnections. We extend our resource allocationscheme to the case of multiple links in Section 3.6.This problem of bandwidth allocation is relevantto both circuit-switched and packet-switched net-works. In packet-switched networks, the notion ofbandwidth allocation hides the packet-level dy-namics and simplifies analysis [21]. We assume thatallocations can only be made in multiples of acertain minimum quantity, say, D bandwidth units(BWUs). This assumption is not very restrictivefrom a practical viewpoint because it is true formany slotted systems. Besides, we can make D assmall as we wish. Also, without loss of generality,we assume that D ¼ 1 BWU. We make this as-sumption to reduce notational complexity. We letthe capacity of the channel be C BWUs. Connec-tions or calls can be allocated any integer amountof resource between zero and C BWUs.Utility functions give the amount of utility or

benefit or gain that the user or application obtainsfor different amounts of allocated resource. It isa measure of how ‘‘happy’’ or ‘‘satisfied’’ a user orapplication is at receiving a certain amount of al-location. Economists use ‘utils’ as units of theutility values. Typically, if resources are infinitelydivisible, the utility function would be a mapping

from the interval ½0;C� to the non-negative realsRþ. But, in our case, it suffices if the utility func-tion specifies the per-unit-time utility obtained forall feasible allocations. Note that the only feasibleallocations are the integers between zero and C.We now introduce the concept of utility functionsin this limited context where only a finite numberof allocations are possible. The domain of theutility function is the set of all integers betweenzero and C. We let UðiÞ for i ¼ 0; 1; 2; . . . ;C de-note the value of the utility function for an allo-cation of i BWUs. The value of UðiÞ gives theutility obtained by the application for an alloca-tion of i BWUs for one second and has unitsof utils/s. Thus, a utility function U for the pur-pose of resource allocation is a vector ½Uð0Þ;Uð1Þ; . . . ;UðCÞ�. We further assume that the per-unit-time utility (or utils per second) that the ap-plication or user receives has to be a multiple of acertain other minimum quantity, say, d utils/s.Again, without loss of generality, we assume thatd ¼ 1 to avoid needless notational complexity. Wealso require that the utility values stay uniformlybounded. In other words, there exists a Umax suchthat UðiÞ6Umax for i ¼ 0; 1; . . . ;C for all users.Utility functions also satisfy the condition

UðiÞP 0 for i ¼ 0; 1; 2; . . . ;C:The above condition states that applications orusers cannot obtain a negative utility from a pos-itive quantity of allocated resource. Because utilityfunctions are non-negative, bounded, and integervalued, we have

UðiÞ 2 f0; 1; 2; . . . ;Umaxg for i ¼ 0; 1; 2; . . . ;C;ð1Þ

where we assume Umax is an integer. It is alsonatural to assume that

Uð0Þ ¼ 0; ð2Þwhere Eq. (2) simply states that users cannot havea non-zero utility for zero resource allocation.Besides these properties, there are two others

that many utility functions satisfy. However, ourproblem formulation does not depend on whetheror not the following conditions are met. First, theutility function is expected to be non-decreasing,i.e.,

616 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630

Page 5: Optimal resource allocation in multi-class networks with user-specified utility functions

UðiÞ6Uðiþ 1Þ for i ¼ 0; 1; . . . ;C 1:This is due to the fact that applications cannot beexpected to derive more benefit from a smalleramount of resource allocation than from a largeramount of allocation. Secondly, many exampleutility functions satisfy the following concavityproperty, which is a consequence of the ‘‘law ofdiminishing returns’’:

UðjÞ Uðj 1ÞPUðiÞ Uði 1Þ for all j < i:

The above equation says that for higher resourceallocations the utility increments obtained arelower. Applications, typically, find the most usefor the initial allocations, and as the allocation isincreased, the benefits begin to diminish. Someutility functions do not have the concavity prop-erty. (See Ref. [2] and the example of the multi-ratevocoder below.) Therefore, our scheme is designedto work with non-concave utility functions as well.A very elastic application can gainfully use

every additional amount of allocated resource, i.e.,UðiÞ > Uði 1Þ for i ¼ 1; 2; . . . ;C. File-transferapplications, which are very elastic, can be ex-pected to have such utility functions. If a phone isequipped with a multi-rate vocoder that can op-erate at rates r1, r2, and r3 with r1 < r2 < r36C,then its utility function will be such that Uðr3Þ >Uðr2Þ > Uðr1Þ, and UðiÞ for all other values ofallocated resource will be given as follows:

UðiÞ ¼

0 for 06 i < r1;Uðr1Þ for r16 i < r2;Uðr2Þ for r26 i < r3;Uðr3Þ for r36 i6C:

8>>><>>>:

ð3Þ

A non-elastic application that can only operateat a rate of r BWUs has a step utility function ofthe following form:

UðiÞ ¼ 0 for i < r;UðrÞ for r6 i6C:

We let U denote the set of all possible utilityfunctions U . Note that the set of all allowed util-ity functions is limited by the conditions that autility function has to satisfy, i.e.,

U ¼ fU ¼ ½Uð0Þ;Uð1Þ; . . . ;UðCÞ�:U satisfies Eqs: ð1Þ and ð2Þg:

The set U has only a finite number of utilityfunctions in it. It can be verified that the number ofelements in U is no greater than ðUmax þ 1ÞC,where Umax is the greatest utility that a user canobtain for any resource allocation amount. We letM denote the total number of elements in the setU. We think of all users having the same utilityfunction U 2 U as belonging to a class. We alsouniquely number the utility functions in U fromone through M, and use the notation Uk ¼½Ukð0Þ;Ukð1Þ; . . . ;UkðCÞ� to denote the kth suchutility function. Note that once we specify the in-dex of the utility function k, the quantities UkðiÞget fixed for all i ¼ 0; 1; 2; . . . ;C.We assume that class-k users arrive according

to an independent Poisson process with rate kk,and that the call holding duration of a class-k callallocated i BWUs is independent and identicallydistributed (i.i.d.) exponential with mean 1=lki. Bymaking the call holding duration depend on theallocation, we account for the scenario where usershave a fixed amount of data to send resulting inusers having a longer call holding duration forsmaller allocations than for larger allocations.Given the above scenario, the operation of thenetwork is as follows. When a new user arrives,say, belonging to class k, the network allocates it iBWUs of resource, where 06 i6C. (When i ¼ 0,the network blocks the newly arriving call.)The network then starts accruing additional util-ity (or reward) at the rate of UkðiÞ utils/s untilthe call gets completed. The resource allocationproblem is to determine i, given the state of thesystem, Uk, kk, and lki for k ¼ 1; 2; . . . ;Mand i ¼ 1; 2; . . . ;C, that maximizes the averageaggregate utility per unit time of the system. Byformulating the problem as an MDP, we de-termine what appropriate resource allocation de-cisions should be made to achieve such anobjective.We denote the state of the system by a matrix N

with M rows and C columns. The term on the kthrow and the ith column of the matrix N is denotedby nki, which is the number of class-k users allo-cated i BWUs. Let

nk ¼XCi¼1

nki for k ¼ 1; 2; . . . ;M ð4Þ

S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630 617

Page 6: Optimal resource allocation in multi-class networks with user-specified utility functions

denote the number of active class-k users in thesystem. The finite capacity (C) of the link impliesthat any valid state of the system has to satisfy thecondition

XMk¼1

XCi¼1

inki 6C:

The state space S is the set of all possible states inthe system, which can be written as follows:

S ¼ N

(¼ ½nki�: 16 k6M ; 16 i6C; nki 2 Zþ

andXMk¼1

XCi¼1

inki 6C

);

where Zþ is the set of all non-negative integers.Any nki for 16 k6M and 16 i6C can only takeinteger values between zero and bC=ic, where bxc isthe greatest integer not exceeding x. Therefore, thenumber of values that each nki can take is finite.This implies that the state space S itself has a finitenumber of elements in it.In each state of the system, the resource allo-

cator has a choice of several actions. An actionspecifies the amount of resource to be allocated toa new call arrival of each call class. If a call arrivaloccurs, the resource allocator allocates the amountof resources specified by the chosen action in thatstate, and the system moves to the correspondingnext state. If a call departure occurs in that state,then the system moves to the corresponding nextstate. In either case, the whole process gets re-peated again in the new system state. The objectivein our resource allocation problem is to determinethe optimal action for each state that maximizesthe average utility per unit time.We denote an action in state N 2 S by an M-

vector u ¼ ½u1; u2; . . . ; uM � such that uk 2 f0; 1;2; . . . ;Cg. The value of uk denotes the number ofunits of resource that a newly arriving class-k useris to be allocated. (When uk ¼ 0, the resource al-locator rejects newly arriving class-k users, and thestate of the system does not change.) Clearly, thenumber of allowed actions in a state depends onthe state of the system. For example, in a state thatalready has a non-zero amount of resource in use,

a newly arriving user cannot be allocated C BWUsof resource. The set of all allowed actions in stateN , known as the action space, can be written asfollows:

KN ¼ u: uk 2 f0; 1; . . . ;Cg andXMk¼1

XCi¼1

inki

(

þmaxfu1; u2; . . . ; uMg6C

);

where the nki for 16 k6M and 16 i6C define thematrix N . The action space in a state is the set ofall possible actions that does not exceed the ca-pacity of the channel, and it follows that the actionspace KN for N 2 S has only a finite number ofelements in it. If a new class-k call arrives in stateN 2 S and u is the chosen action, then it is allo-cated uk units of resource. The system then movesto the appropriate next state. If there is a callcompletion, the amount of resource used by theterminating call gets released for future call arrivals.In our MDP formulation of the resource allo-

cation problem, we now specify the reward rate ineach state N 2 S when action u 2 KN is chosen.The reward rate is

RuN ¼

XMk¼1

XCi¼1

nkiUkðiÞ utils=s; ð5Þ

which is the sum of the utility values of each activeuser in the system corresponding to the amount ofresources allocated to it. Note that the definitionof reward in Eq. (5) is independent of the choice ofaction u in that state.To complete our MDP formulation of the re-

source allocation problem, we now specify thetransition rates of the system. We denote thetransition rate from state N 2 S to state Q 2 Swhen action u 2 KN is chosen by au

N ;Q. Let N ¼½nki� and Q ¼ ½qki�. For ease of description, we alsodefine dki to be an M C matrix with zeros in allbut the ðk; iÞth entry where it has a one. Thetransition rates are then given as follows:

auN ;Q ¼

kk if uk 6¼ 0 and Q ¼ N þ dkuk ;

nkilki if nki 6¼ 0 and Q ¼ N dki;

0 otherwise:

8><>:

618 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630

Page 7: Optimal resource allocation in multi-class networks with user-specified utility functions

Given that the action in state N admits class-kcalls, i.e., uk 6¼ 0, the system moves to a state withone additional class-k call at the rate of kk, whichis the call arrival rate of class-k calls. If a newclass-k call arrives, then the new call arrival is al-located uk BWUs, and hence the number of activeclass-k users allocated uk BWUs increases by one.With nki active class-k users allocated i BWUs, thesystem moves to the state with one less class-k callallocated i units of resource at the rate of nkilki. Inother words, given that action u is chosen in stateN , the system stays in the same state for an ex-ponentially distributed duration of time with mean½PM

k¼1 ðkkIfuk 6¼0g þPC

i¼1 nkilkiÞ�1, where nk is as

defined in Eq. (4), and I is the indicator functiongiven by

Ifuk 6¼0g ¼1 if uk 6¼ 0;0 if uk ¼ 0:

(

When the system makes a transition, it movesto state N þ dkuk with probability

kkIfuk 6¼0gPMj¼1 kjIfuj 6¼0g þ

PCi¼1 njilji

� ;

and to state N dki with probability

nkilkiPMj¼1 kjIfuj 6¼0g þ

PCi¼1 njilji

� :

It follows from the above formulation that wehave a continuous-time MDP to solve because thetransition rates and the reward rates depend onlyon the current state and the chosen action in thatstate. Note that an MDP is completely character-ized by its state and action spaces, reward rates,and transition rates. This continuous-time MDPformulation with average reward optimizationcriterion has finite state space and finite actionspace. Efficient methods exist to solve such prob-lems, namely value iteration, policy iteration, andlinear programming [6–8]. For the sake of com-pleteness, we briefly describe one particular solu-tion technique, namely linear programming, thatwas also used in Section 4 to obtain the numericalresults. To obtain the optimal policy we solve thefollowing linear program:

maximizefxu

N:N 2 S and u2KN g

XN 2 S

Xu2KN

RuNx

uN ;

subject toXQ 2 S

Xu2KN

xuNauN ;Q

XQ 2 S

Xu2KQ

auQ;Nx

uQ ¼ 0 for N 2 S;

XN 2 S

Xu2KN

xuN ¼ 1; xuN P 0 for N 2 S:

ð6ÞOnce the optimal xu�N have been obtained, the op-timal action in each state is obtained according tothe following procedure. Let S� be the set

S� ¼ N 2 S : xu�N�

> 0 for some u 2 KN

: ð7Þ

Then the optimal action in state N 2 S� is deter-mined as follows. For N 2 S�, action v is chosenwith probability xv�N =

Pu2KN

xu�N , and for N 62 S�,an arbitrary action v 2 KN is chosen with proba-bility one. Note that it follows from Ref. [22] thatthere exists an optimal policy such that xu�N > 0 forexactly one action u 2 KN for each state N 2 S�.An important special case of our formulation

above is the admission control problem for net-works with non-elastic applications. Such prob-lems have been the subject of several papers[15–18]. When applications are non-elastic, ourresource allocation problem reduces to the ad-mission control problem, where the network eitherrejects the call or admits the call and allocates itthe exact amount of resource it needs. The solutiondeveloped here can be used for the admissioncontrol problem as well.

3. Discussion

In this section, we discuss a few aspects relatedto the MDP formulation of the resource allocationproblem developed in the last section.

3.1. Offline computation

The computation of the optimal action associ-ated with each state (or the optimal policy) can bedone beforehand and need not be done each time acall arrival or call departure occurs. The computeddecisions may be stored in the form of a table, and

S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630 619

Page 8: Optimal resource allocation in multi-class networks with user-specified utility functions

a table look-up can be done each time the systemreaches a new state. Thus there is little real-timecomputational overhead involved in the imple-mentation of the scheme. However, if the call ar-rival rates and the call holding durations vary withtime, the optimal decisions may need to be re-computed as needed based on updated values ofthese quantities. (Numerical results in Section 4show that the optimal decision depends on thevalue of the call arrival rates and call holdingdurations of call classes.)

3.2. Uncertainty in transition rates

As we show in Section 4, the optimal policydepends on the values of the call arrival rates andcall holding durations of call classes. But the re-source allocator may not have exact knowledge ofthese quantities beforehand. However, it is rea-sonable to expect the resource allocator to know arange over which these quantities take values.Given such a range of possible values for the callarrival rates and the call holding durations, themethods described in Refs. [23–25] can be used toperform sensitivity analysis of the chosen policyand to obtain max–min and max–max optimalpolicies. Sensitivity analysis of a policy is per-formed to obtain the range of values that the av-erage per-unit-time utility can take given the rangeof values that the call arrival rates and the callholding durations take. Max–min optimal policy,on the other hand, is a policy that maximizes theworst-case average reward per unit time. In otherwords, using the max–min optimal policy, the re-source allocator can maximize the average utilityper unit time assuming that the transition rates willget chosen in a manner that minimizes this quan-tity.

3.3. Model reduction techniques

While our scheme can be applied to networkscenarios with a large number of classes and highcapacity values, computing the optimal solution insuch cases may involve solving MDPs with largestate and action spaces. In this section, we describea few model reduction techniques that can be used

to reduce the computational complexity involvedin obtaining the optimal solution. We classify themodel reduction techniques in this section intothose that yield the optimal policy and those thatcan be used to obtain approximately optimalpolicies.We first describe a model reduction technique

such that the solution to the reduced model stillyields the optimal policy. We make a simple ob-servation that can result in a significant reductionin the state and action spaces of practical problemsof interest. Note that if a certain utility functionUk is such that Ukðiþ 1Þ ¼ UkðiÞ, then the networkobtains no additional amount of utility by allo-cating that user iþ 1 BWUs instead of i BWUs ofresource. Therefore, we can eliminate those statesand actions that correspond to allocating class-kusers iþ 1 BWUs. For example, the only alloca-tions that are meaningful for the multi-rate vo-coder whose utility function is given by Eq. (3) arer1, r2, and r3. We can define the set of all actionsthat can be eliminated in state N 2 S due to theabove observation as follows:

LN ¼ u 2 KN : uk > 0 and UkðukÞ ¼ Ukðuk 1Þg:fSimilarly, we can define the set of states that can beeliminated due to the above observation as fol-lows:

A ¼ N 2 S : nkif > 0 and UkðiÞ ¼ Ukði 1Þfor iP 1g:

Based on the above observation, we can now de-fine our reduced state space as

S0 ¼ S n A;where the notation S n A is used to denote the setof all elements in S but not in A. Similarly, thereduced action space in state N 2 S0 can be writtenas

K 0N ¼ KN n LN :

We believe that the above state and action spacereduction will reduce the computational complex-ity involved in obtaining the solution to practicalproblems of interest.We now describe model reduction techniques

that yield approximately optimal policies. Because,

620 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630

Page 9: Optimal resource allocation in multi-class networks with user-specified utility functions

we compromise on the optimality of the policieswe seek, the reduction in complexity can be quitesubstantial. Generic model reduction techniques[26,27] can be used to obtain approximate solu-tions that reduce the computational complexityinvolved in solving MDPs with large state andaction spaces. Neuro-dynamic programming [9]techniques have recently been successfully appliedto admission control problems to obtain approxi-mate solutions [28].We next describe the optimal complete parti-

tioning policy, which is an approximately optimalpolicy. The optimal complete partitioning policywould set aside portions of the bandwidth exclu-sively for users of different classes (similar to thecomplete partitioning admission control policy[29]). Let there be partitions P1; P2; . . . ; PM of thecapacity of the channel C such that

XMk¼1

Pk ¼ C;

with Pk P 0 being the capacity of the channel thatis exclusively reserved for use by class-k calls. Be-cause we allow only integer allocations to users, Pkfor k ¼ 1; 2; . . . ;M have to be integers. The aim ofthe optimal complete partitioning policy is to de-termine the values P �

1 ; P�2 ; . . . ; P

�M such that the

average aggregate utility is maximized. Becausethe set of complete partitioning policies are asubset of the set of all policies we considered inSection 2, the average aggregate utility obtainedfrom the optimal complete partitioning policy canat most be equal to that of the optimal policyobtained in Section 2.Within each partition Pk reserved for class-k

calls the optimal choice of allocations can be de-termined by solving MDPs that have much smallerstate and action spaces. Let R�

kðPkÞ for k ¼1; 2; . . . ;M be the optimal average aggregate utilityfrom class-k calls when the capacity set aside forclass-k calls is Pk. The value for R�

kðPkÞ is obtainedfrom the solution to the MDP that is formulated inthe same manner as in Section 2 but with a singlecall class and with the capacity being equal to Pk.The optimal complete partitioning policy can thenbe cast in the following form:

maximizeP1;P2;...;PM

XMk¼1

R�kðPkÞ;

subject toXMk¼1

Pk ¼ C;

Pk P 0;

Pk integer:

The solution to the above problem can be obtainedby a dynamic program similar to that in Ref. [29].If we let fkðyÞ be the average aggregate utility ofclasses 1 through k with y as the capacity available,then the dynamic programming equations are

f1ðyÞ ¼ R�1ðyÞ for 06 y6C;

fkðyÞ ¼ maximize06 s6 y

ðR�kðsÞ þ fk1ðy sÞÞ

for 06 y6C and k ¼ 2; 3; . . . ;M :

The solution to the above dynamic program wouldgive the optimal partition ½P �

1 ; P�2 ; . . . ; P

�M �.

The performance of the optimal complete par-titioning policy should be compared with thatof the optimal policy over a range of parametervalues to determine the loss in utility that can beexpected from using the optimal complete parti-tioning policy. Another strategy would be to usethe policy that performs best between the optimalcomplete partitioning policy and the greedy policythat is explained in Section 4. In Section 4, wecompare the performance of the optimal completepartitioning policy with that of the optimal policy.In our example, we find that the performance ofthe optimal complete partitioning policy is fairlyclose to that of the optimal policy. While we havediscussed some complexity reduction techniques inthis section, more work is needed to study the ef-fectiveness of these in practical problems of inter-est and to develop techniques to reduce thecomplexity even further.

3.4. Maximization of utility with constraints

Our resource allocation scheme was developedwith the objective of maximizing utility. In certainsituations, maximization of utility will not be theonly objective of the network. For example, in

S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630 621

Page 10: Optimal resource allocation in multi-class networks with user-specified utility functions

admission control problems with non-elastic ap-plications, several authors consider constraintsplaced on the call blocking probabilities experi-enced by the call classes [15,17,18]. In Ref. [15], theauthors consider the scenario where each call classhas a maximum call blocking probability that itcan tolerate. Therefore, they design an admissioncontrol algorithm that maximizes the throughputof the network subject to the constraints placed onthe call blocking probabilities of each call class.The viewpoint in Refs. [17,18] is that differentlevels of pricing will require that the call blockingprobabilities of call classes be related in a certainmanner to justify the difference in prices. The au-thors treat the problem of maximizing throughputand maintaining the desired relations between callblocking probabilities as a multi-objective opti-mization problem.Motivated by the same considerations as in the

above papers, we envision a scenario where theresource allocation needs to consider multipleobjectives. Constraints similar to those in Refs.[15,17,18] can be incorporated in our MDP for-mulation. For example, if each call class has amaximum call blocking probability of Dk, 16k6M that it can tolerate, then the optimal deci-sion is obtained from the solution of the followingmodified linear program:

maximizefxu

N:N 2 S and u2KN g

XN 2 S

Xu2KN

RuNx

uN ;

subject toXQ 2 S

Xu2KN

xuNauN ;Q

XQ 2 S

Xu2KQ

auQ;Nx

uQ ¼ 0 for N 2 S;

XN 2 S

Xu2KN

xuN ¼ 1; xuN P 0 for N 2 S:

ð8ÞXN 2 S

Xu2 BN ðkÞ

xuN 6Dk for 16 k6M ; ð9Þ

where BNðkÞ is the following set:

BNðkÞ ¼ fu ¼ ½u1; u2; . . . ; uM �: u 2 KN and uk ¼ 0g:

The set BNðkÞ is the set of all actions in state Nthat reject class-k calls, and Eq. (9) is the long-runaverage duration of time spent in those states ofthe system that reject class-k calls. We note that xuN

represents the long-run fraction of time the systemspends in state N with u as the chosen action. Oncethe optimal xu�N have been obtained from the abovelinear program, the optimal action in each state isobtained according to the procedure described inSection 2. Using the result in Ref. [22] it can beshown that the optimal solution obtained in such amanner will require randomization in at most Mstates.

3.5. Extension to general call holding times

In our work, we assume that call arrivals arePoisson and that call holding durations are expo-nentially distributed. While exponential call hold-ing durations are justified for traditional voiceusers, there is growing evidence that it may beunsuited for other types of applications [30]. Ourwork here is a first step towards addressing theresource allocation problem with general callholding time distributions. We believe that ourwork can be extended to more general call holdingdistributions. Our motivation is previous workthat shows the insensitivity of the steady statedistribution of the number of calls belonging toeach call class to the call holding time distributionfor coordinate convex admission control policies[31,32]. Thus, if the optimal resource allocationpolicy obtained by our method turns out to becoordinate convex, then that policy will continueto be optimal among all coordinate convex policiesfor any general call holding distribution with thesame mean as the original exponential call holdingduration. Our policy will continue to be optimalamong all coordinate convex policies for anygeneral call holding distribution with the samemean as the original exponential call holding du-ration whenever the optimal policy itself is coor-dinate convex. However, we cannot guarantee theoptimality of our resource allocation policy amongall resource allocation policies when the callholding durations are generally distributed. Toobtain the optimal policy among all policies withgeneral call holding distributions, more research isneeded on Markov regenerative decision processes[33]. The usefulness of such processes stems fromthe fact that, when call holding durations aregenerally distributed, the Markov property still

622 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630

Page 11: Optimal resource allocation in multi-class networks with user-specified utility functions

holds at those instants when the system becomesempty. Note, however, that the Markov propertyno longer holds at call arrival and call terminationinstants.

3.6. Multi-link resource allocation

In this section, we extend our single-link re-source allocation solution to the case of multiplelinks with pre-computed routes. The objective ofthe resource allocation is to maximize the aggre-gate utility per unit time of the entire network. Theutility functions now denote the amount of utilitythat the user obtains from all possible end-to-endresource allocations. If a user obtains differentamounts of allocation over different links throughwhich it traverses, then the utility that the userobtains corresponds to that obtained from theminimum of those allocations. We consider anetwork with a finite set L of links, and we as-sume that the routes for each user are pre-com-puted and fixed. Because the number of links in thenetwork is finite, so are the number of possible(cycle-free) routes. For the multi-link case, twousers are said to belong to the same class only ifthose users have identical utility functions (as de-fined in Section 2) and identical routes. Because wehave a finite number of possible utility functionsand finite number of routes, we have a finitenumber of possible classes. We denote the totalnumber of classes of users by M 0. We assume thatthe route information for user classes is stored in amatrix c such that

ckj ¼

1 if class-k users are routedthrough link j;

0 if class-k users are not routedthrough link j:

8>><>>: ð10Þ

We denote the capacity of link j by Cj. As in thesingle-link case, we assume that class-k users arriveaccording to a Poisson process with rate kk andthat the call holding duration of a class-k call al-located i BWUs is independent and exponentiallydistributed with a mean of 1=lki. For the multi-link case, states are again denoted by a matrix Nwith M 0 rows and maxfCj: j 2 Lg columns. We let

Cmax ¼ maxfCj: j 2 Lg;

where we assume that Cmax is an integer. The termon the kth row and the ith column of the matrix Nis denoted by nki, which is the number of class-kusers allocated i BWUs. Note that the user is al-located i BWUs over all the links through whichthe class-k user traverses. The state space S for themulti-link case can be written as

S ¼ N

8<: ¼ ½nki� : nki 2 Zþ and

XM 0

k¼1

XCmaxi¼1

inkickj 6 Cj for j 2 L

9=;:

Note that any valid state has to be such that thelink capacity of none of the links is exceeded. Theckj in the above equation ensures that the sum-mation is carried out only over those classes ofusers that are routed through link j. Actions aredenoted by an M 0 vector u ¼ ½u1; u2; . . . ; uM 0 � suchthat uk 2 f0; 1; . . . ;Cmaxg. As before, the value ofuk denotes the number of units of resource to beallocated to a class-k user. The action space cannow be written as follows:

KN ¼ u :XM 0

k¼1

XCmaxi¼1

inkickj

8<:

þmaxfu1c1j; u2c2j; . . . ; uMcMjg6 Cj for all j 2 L

9=;:

The reward rates and the transition rates are de-fined as in Section 2. The problem can then besolved by any of the methods used to solve MDPs[6–8].

4. Numerical results

In this section, we present numerical results forthe single-link case to illustrate our resource allo-cation scheme. In our first example the capacityof the channel is 4 BWUs. There are two classesof calls with their utility functions given byU1 ¼ ½1; 2; 2; 2� and U2 ¼ ½0; 3; 3; 3�. Calls belong-ing to class 1 are elastic and can use 1 or 2 BWUsgainfully with varying utilities associated with it.Class-2 calls, however, are non-elastic and can

S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630 623

Page 12: Optimal resource allocation in multi-class networks with user-specified utility functions

only use 2 BWUs. Any amount of allocation above2 BWUs does not result in an increase in utility,and any amount of allocation below 2 BWUs re-sults in zero utility for class-2 calls. The callholding durations of both classes of calls are as-sumed i.i.d. exponential with mean 60 s.The optimal action in each state of the system

for the above example is given in Table 1. Weconsider three different scenarios by varying thecall arrival rate of class-2 calls. We put an ‘x’ forthe optimal action in those states that are ‘‘do notcare’’ states. These states do not belong to S� asdefined in Eq. (7). Thus any arbitrary action canbe chosen in such states. From the table, it can beseen that such states are transient states corre-sponding to the chosen policy. Those states thatare not listed in the table are ‘‘do not care’’ statesfor all the three scenarios considered. The resultsin the table are intuitively appealing. For example,as we increase the call arrival rate of class-2 calls,the optimal resource allocation policy tends tofavor class-2 calls over class-1 calls. This is to beexpected because allocating 2 BWUs to a class-2call results in utility being accrued at the rate of 1.5utils/s per BWU, which is higher than that ob-tained for class-1 calls for any amount of alloca-

tion, and also due to the total capacity being aneven number.To further illustrate our scheme, we consider

a second example. As before, we have two callclasses with the channel capacity being 4 BWUs.But now the utility function for class-1 calls is½1; 1; 4; 4� and that of class-2 calls is ½0; 3; 3; 3�. Callholding durations of both call classes are assumedi.i.d. exponential with a mean of 60 s. Table 2 givesthe optimal actions for the corresponding statesfor three different scenarios.In Fig. 1, we plot the average utility per unit

time obtained from our utility maximizing scheme.The capacity of the channel is 20 BWUs, and thereare two call classes with utility functions ½1; 1; 4;4; . . . ; 4� and ½0; 3; 3; . . . ; 3�, respectively. The figureis for a fixed k2 value of 0.05 calls/s, and the callholding durations of both classes of calls areassumed i.i.d. exponential with mean 60 s. Wecompare the performance of our scheme with thegreedy scheme. The greedy scheme allocates allincoming calls that amount of resource of the

Table 1

Optimal actions at the corresponding states for three different

scenarios

Statek1 ¼ 0:5,k2 ¼ 0:01

k1 ¼ 0:5,k2 ¼ 0:05

k1 ¼ 0:5,k2 ¼ 0:1

0 0 0 00 0 0 0

� �(2,2) (2,2) (0,2)

0 0 0 00 1 0 0

� �(2,2) (0,2) (0,2)

0 0 0 00 2 0 0

� �(0,0) (0,0) (0,0)

0 1 0 00 0 0 0

� �(2,2) (0,2) x

0 1 0 00 1 0 0

� �(0,0) (0,0) x

0 2 0 00 0 0 0

� �(0,0) x x

The utility function of class-1 calls is ½1; 2; 2; 2� and that of class-2 calls is ½0; 3; 3; 3�.

Table 2

Optimal actions at the corresponding states for three different

scenarios

State k1 ¼ 0:1,k2 ¼ 0:02

k1 ¼ 0:1,k2 ¼ 0:03

k1 ¼ 0:1,k2 ¼ 0:3

0 0 0 00 0 0 0

� �(3,2) (3,2) (0,2)

0 0 0 00 1 0 0

� �(1,2) (0,2) (0,2)

0 0 0 00 2 0 0

� �(0,0) (0,0) (0,0)

1 0 0 00 0 0 0

� �(3,0) (3,2) x

1 0 0 00 1 0 0

� �(1,0) (1,0) x

2 0 0 00 0 0 0

� �(0,2) (0,2) x

2 0 0 00 1 0 0

� �(0,0) (0,0) x

0 0 1 00 0 0 0

� �(1,0) (1,0) x

1 0 1 00 0 0 0

� �(0,0) (0,0) x

The utility function of class-1 calls is ½1; 1; 4; 4� and that of class-2 calls is ½0; 3; 3; 3�.

624 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630

Page 13: Optimal resource allocation in multi-class networks with user-specified utility functions

unused capacity that maximizes the utility ob-tained from the incoming call. To exactly charac-terize the action chosen by the greedy policy instate N 2 S, we define the set FN of all feasibleallocations in state N as follows:

FN ¼ x 2 Zþ: x:þXMk¼1

XCi¼1

inki 6C

):

(

With the above definition, the kth component ofthe action chosen by the greedy scheme is

ukðNÞ ¼ min x 2 FN : UkðxÞf PUkðyÞfor all y 2 FNg:

The greedy scheme clearly takes a shortsightedview and tries to maximize the utility that can beobtained from the first arriving call. As can be seenfrom Fig. 1, the greedy scheme results in loweraverage utility per unit time than the optimalutility maximizing scheme. We also find that theperformance of the greedy scheme deteriorates aswe increase the call arrival rate of class-1 calls.This is because the greedy scheme wastes moreresources on class-1 calls as k1 increases. As re-marked in Section 3.3, we find that the perfor-mance of the optimal complete partitioning policyis very close to that of the optimal policy. Webelieve that the optimal complete partitioningpolicy and the complete sharing policy will com-

plement each other very well, with the completepartitioning policy performing well when thecomplete sharing policy does not, and vice versa.In this section the numerical examples are meantfor the purposes of illustration. It should be noted,however, that our scheme can also be applied tonetwork scenarios including a large number ofclasses and high capacity values.

5. Pricing mechanism

In this section, we discuss the implications of‘‘selfish behavior’’ by users and recommend pric-ing as a means of preventing such behavior. Weshow that such ‘‘selfish behavior’’ by users resultsin poor performance of the network, and we de-velop a pricing mechanism that prevents individualusers from deliberately modifying their utilityfunctions. Our resource allocation scheme requiresknowledge of the utility functions of all incomingusers of the network. The natural way to obtainsuch information is for each incoming user tocommunicate to the resource allocator its utilityfunction for the requested connection or call. Butthis allows a user to deliberately communicate amodified utility function that will optimize its ownallocation. Clearly, such a situation is undesirableand should be avoided.To illustrate the consequences of deliberate

modification of the utility functions by users, let usconsider the scenario where there are two classesof calls. Let the capacity of the channel be4 BWUs. Let the utility function of class-1 calls be½1; 2; 2; 2�, and let that of class-2 calls be ½0; 3; 3; 3�.Now let us assume that all class-1 calls commu-nicate their utility function to be ½0; 3; 3; 3� whileclass-2 calls communicate their true utility func-tion. Class-1 users communicate such a modifiedutility function because they are aware that in-flating their utility function in this manner willincrease their resource allocation. We set the meancall holding duration of both classes of calls at 60s. We fix the call arrival rate of class-2 calls at 0.1calls/s and allow the call arrival rate of class-1 callsto vary between 0.1 and 1 call/s. The resource al-locator makes resource allocation decisions as-suming that class-1 calls have the utility function

Fig. 1. Comparison of the average utility per unit time obtained

for the greedy, utility maximizing, and optimal complete par-

titioning schemes. The utility function of class-1 calls is

½1; 1; 4; 4; . . . ; 4� and that of class-2 calls is ½0; 3; 3; . . . ; 3�.

S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630 625

Page 14: Optimal resource allocation in multi-class networks with user-specified utility functions

½0; 3; 3; 3�. In Fig. 2, we plot the average utility perunit time as a function of the call arrival rate ofclass-1 calls. We plot the utility the network thinksit is obtaining from its resource allocation decision(perceived utility), the utility that can be achievedif the network knew the actual utility of class-1calls (achievable utility), and the utility that is ac-tually achieved by using the modified utility func-tion of class-1 users (achieved utility). Clearly, theachieved utility is far below what is achievable. Toavoid this situation, there should be disincentivesto prevent users from lying about their utilityfunctions. One possibility is to introduce a pricingmechanism that is appropriately designed so thatusers do not gain from deliberately lying abouttheir utility functions. We now develop such apricing mechanism.Let pk denote the price per unit time that a

class-k user is charged. Thus if a class-k user staysin the system for T seconds, then the user ischarged pkT dollars. Such a pricing scheme is al-location independent because the price does notdepend on the amount of resource allocated.(We will show later that charging users differentamounts for different allocations does not result inany additional flexibility in inducing users to re-veal their true utility functions.) We also let dkðiÞbe the probability that a class-k user is allocated i

BWUs. With these notations, the expected rate ofreturn for a class-k user conveying its true utilityfunction Uk can be written asXCi¼1

UkðiÞdkðiÞ pk;

where we assume that the utility functions are alsoexpressed in units of $/s. On the other hand, if theuser were to communicate an alternate utilityfunction U l instead of its true utility function Uk,then the expected reward accrual rate would be

XCi¼1

UkðiÞdlðiÞ pl:

In this case, the resource allocator decides the al-location amount and the price to be charged as-suming that the user’s utility function is U l.Our objective in using pricing is to ensure that

there is no incentive for users to lie about theirutility functions. If the prices fpkg are determinedin a manner that satisfies the following conditions,then a user with utility function Uk will reveal itstrue utility function:XCi¼1

UkðiÞdkðiÞ pk PXCi¼1

UkðiÞdlðiÞ pl

for all l 6¼ k: ð11ÞIn other words, if the condition in Eq. (11) issatisfied for all k, then, among all the utilityfunctions that a user can convey, the expected re-ward rate is highest for its true utility function. Onthe left hand side of Eq. (11) is the expected rewardrate for a class-k user revealing its true utilityfunction, and on the right hand side is the expectedreward rate for a class-k user conveying its utilityfunction to be U l. Note that the user’s allocationand price are determined by the utility functionthat the user reveals. Such a pricing mechanismthat forces users to reveal their true utility func-tions is called an incentive-compatible pricingscheme [34,35].The incentive-compatible pricing problem,

therefore, is to choose a set of prices fpkg thatsatisfies Eq. (11) for each k 2 f1; 2; . . . ;Mg, whichis a linear feasibility problem. We have the fol-lowing necessary condition for the incentive-com-patible pricing problem.

Fig. 2. Consequence of users lying about their utility functions.

In this case, class-1 users communicate their utility function as

½0; 3; 3; 3� instead of the actual ½1; 2; 2; 2�. Class-2 users com-municate their real utility function ½0; 3; 3; 3�.

626 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630

Page 15: Optimal resource allocation in multi-class networks with user-specified utility functions

XCi¼1

ðUkðiÞ UlðiÞÞðdlðiÞ dkðiÞÞ6 0

for every pair k; l: ð12Þ

The condition in Eq. (12) is obtained by observingthat for a class-k user to reveal its true utilityfunction, we need

XCi¼1

UkðiÞdkðiÞ pk PXCi¼1

UkðiÞdlðiÞ pl;

and for a class-l user to reveal its true utilityfunction we need

XCi¼1

UlðiÞdlðiÞ pl PXCi¼1

UlðiÞdkðiÞ pk:

Adding these two equations we get the necessarycondition in Eq. (12). The necessary condition inEq. (12) has the interpretation that a class-k usershould not be able to gain more by revealing itselfto be a class-l user than a class-l user itself gains byrevealing its true utility function instead of Uk.From the necessary condition in Eq. (12), it fol-lows that an incentive-compatible pricing schememay not always exist. It can also be shown that theconditions in Eq. (12) are not sufficient for theincentive-compatible pricing problem.We now show that there is no loss of generality

in restricting attention to allocation-independentpricing schemes. Towards this end, we defineallocation-dependent pricing schemes and ran-domized allocation-dependent pricing schemes. Apricing scheme is said to be allocation dependent ifthe price per unit time for a user depends on theamount of resource allocated to it. For the allo-cation-dependent pricing scheme, we let pkðiÞ de-note the price per unit time for a class-k userallocated i BWUs. If the per-unit-time price for auser is determined from a probability distribu-tion that depends on the class of the user and theamount of resource allocated, then we call it arandomized allocation-dependent pricing scheme.

Proposition 1. If there exists an allocation-de-pendent pricing scheme that guarantees incentivecompatibility, then there exists an allocation-in-dependent pricing scheme that guarantees incentive

compatibility. Similarly, if there exists a ran-domized allocation-dependent incentive-compatiblepricing scheme, then there exists an allocation-in-dependent incentive-compatible pricing scheme.

Proof. Let fpkðiÞg be the set of allocation-depen-dent prices that guarantees incentive compatibility.This implies that the following conditions hold foreach k:

XCi¼1

ðUkðiÞ pkðiÞÞdkðiÞPXCi¼1

ðUkðiÞ plðiÞÞdlðiÞ

for all l 6¼ k: ð13Þ

By choosing the allocation-independent price pk tobe

PCi¼1 pkðiÞdkðiÞ for each k we see that an allo-

cation-independent price that guarantees incentivecompatibility exists whenever there exists an allo-cation-dependent pricing scheme that guaranteesincentive compatibility. Similarly, we can showthat if a randomized allocation-dependent in-centive-compatible pricing scheme exists, then sodoes an allocation-independent incentive-compat-ible pricing scheme. �

The solution to the incentive-compatible pricingproblem depends on the values of dkðiÞ. We canobtain values of dkðiÞ by summing the steady-stateprobabilities of the system being in those statesthat allocate class-k users i BWUs. This is becausethe state of the system, as observed by a newlyarriving user in steady state, has a probabilitydistribution identical to that of the steady-statedistribution of the system. For instance, in theexample in Table 1 for the case where k1 ¼ 0:5 andk2 ¼ 0:05, we can obtain the values of d1ð2Þ andd2ð2Þ to be 0.0120 and 0.3952, respectively. Thesevalues represent the sum of the steady-stateprobabilities of the system being in states that al-locate 2 BWUs to the respective user classes. But,if a group of users decide to collude and lie abouttheir utility functions, these values would be in-accurate because the steady-state distribution ismodified as a result of the modification in the callarrival rates. For the example discussed above, tosimplify the problem, we make the assumptionthat U1 ¼ ½1; 2; 2; 2� and U2 ¼ ½0; 3; 3; 3� are theonly possible utility functions. Thus users can only

S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630 627

Page 16: Optimal resource allocation in multi-class networks with user-specified utility functions

choose one of U1 or U2 as their utility functions.Then, under the no-collusion assumption, it fol-lows from Eq. (11) that any pricing scheme thatsatisfies the condition 0:74246 p2 p16 1:1136will be an incentive-compatible scheme, where wehave assumed that the units on the utility functionsare $/s.In addition to the incentive compatibility con-

straint, it may be desired that a pricing mechanismshould ensure that the expected return to users behigher than the price. Without such a condition,users may not be interested in the service at all.This condition can easily be incorporated by im-posing the following additional constraint.

XCi¼1

UkðiÞdkðiÞP pk for each k: ð14Þ

Incorporating the condition in Eq. (14) in the ex-ample discussed above, we have that p1 and p2should satisfy the additional conditions that p160:024 and p26 1:1856.

6. Conclusions

We developed an optimal resource allocationscheme for elastic applications in multi-class net-works. We formulated the resource allocationproblem as a continuous-time MDP. In our ap-proach, users specify a utility function that quan-tifies their benefit corresponding to differentresource allocation amounts. Using numerical re-sults, we showed that the optimal scheme results insignificant improvement in the average aggregateutility when compared to the greedy resourceallocation scheme. To reduce the computationalcomplexity involved in obtaining the optimalscheme, we discussed two types of model reductiontechniques. In the first type, the policy obtained isstill optimal, and in the second type, the policywould only be approximately optimal but with ahigher reduction in the computation complexity.We studied the implications of users lying abouttheir utility functions and developed a pricingmechanism that prevents users from doing so. Wealso discussed other aspects related to our scheme,such as incorporating constraints on the call

blocking probabilities, offline computation, andextension to multiple links and general call holdingtime distributions.

Acknowledgements

The authors wish to thank Mike Needham andSteve Gilbert of Motorola Labs for useful discus-sions on the subject of this paper.

References

[1] J. Liebeherr, Multimedia networks: issues and challenges,

Computer 28 (4) (1995) 68–69.

[2] S. Shenker, Fundamental design issues for the future

Internet, IEEE Journal on Selected Areas in Communica-

tions 13 (7) (1995) 1176–1188.

[3] N. Davies, J. Finney, A. Friday, A. Scott, Supporting

adaptive video applications in mobile environments, IEEE

Communications Magazine 36 (6) (1998) 138–143.

[4] H.R. Varian, Microeconomic Analysis, W.W. Norton,

New York, 1992.

[5] F.P. Kelly, A.K. Maulloo, D.K.H. Tan, Rate control for

communication networks: shadow prices proportional

fairness and stability, Journal of Operations Research

Society 49 (3) (1998) 237–252. Available from <http://

www.statslab.cam.ac.uk/�frank/PAPERS>.[6] H. Mine, S. Osaki, Markovian Decision Processes, Else-

vier, Amsterdam, 1970.

[7] S.M. Ross, Applied Probability Models with Optimization

Applications, Holden-Day, San Francisco, CA, 1970.

[8] M.L. Puterman, Markov Decision Processes: Discrete

Stochastic Dynamic Programming, Wiley, New York,

1994.

[9] D.P. Bertsekas, J.N. Tsitsiklis, Neuro-dynamic Program-

ming, Athena Scientific, Belmont, MA, 1996.

[10] J. Sairamesh, D. Ferguson, Y. Yemini, An approach to

pricing, optimal allocation and quality of service provi-

sioning in high-speed packet networks, in: IEEE INFO-

COM’95, vol. 3, 1995, pp. 1111–1119.

[11] J.F. Kurose, R. Simha, A microeconomic approach to

optimal resource allocation in distributed computer sys-

tems, IEEE Transactions on Computers 38 (5) (1989) 705–

717.

[12] S.H. Low, D.E. Lapsley, Optimization flow control. I:

Basic algorithm and convergence, IEEE/ACM Transac-

tions on Networking 7 (6) (1999) 861–875.

[13] K. Park, M. Sitharam, S. Chen, Quality of service

provision in noncooperative networks with diverse user

requirements, Decision Support Systems, Special issue on

Information and Computation Economics, 28 (2000) 101–

122.

628 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630

Page 17: Optimal resource allocation in multi-class networks with user-specified utility functions

[14] H. Yaiche, R.R. Mazumdar, C. Rosenberg, A game

theoretic framework for bandwidth allocation and pricing

in broadband networks, IEEE/ACM Transactions on

Networking 8 (5) (2000) 667–678.

[15] J.M. Hyman, A.A. Lazar, G. Pacifici, A separation

principle between scheduling and admission control for

broadband switching, IEEE Journal on Selected Areas in

Communications 11 (4) (1993) 605–616.

[16] K.W. Ross, D.H.K. Tsang, Optimal circuit access policies

in an ISDN environment: A Markov decision approach,

IEEE Transactions on Communications 37 (9) (1989)

934–939.

[17] S. Kalyanasundaram, E.K.P. Chong, N.B. Shroff, Admis-

sion control schemes to provide class-level QoS in multi-

class networks, in: Proceedings of the 38th Annual Allerton

Conference on Control Communication and Computing,

vol. 1, 2000, pp. 113–122.

[18] S. Kalyanasundaram, E.K.P. Chong, N.B. Shroff, Admis-

sion control schemes to provide class-level QoS in multi-

class networks, Computer Networks 35 (2001) 307–326.

[19] E. Altman, Applications of Markov decision processes in

communication networks––A survey, Tech. Rep., INRIA,

2000. Available from www.inria.fr/RRRT/RR-3984.html.

[20] I.C. Paschalidis, J.N. Tsitsiklis, Congestion-dependent

pricing of network services, IEEE/ACM Transactions on

Networking 8 (2) (2000) 171–184.

[21] G. de Veciana, C. Courcoubetis, J. Walrand, Decoupling

bandwidths: A decomposition approach to resource man-

agement in networks, in: IEEE INFOCOM’94, vol. 2,

Toronto, Canada, 1994, pp. 446–474. Available from

http://walrandpc.eecs.berkeley.edu/Pubs.html.

[22] K.W. Ross, Randomized and past-dependent policies for

Markov decision processes with multiple constraints,

Operations Research 37 (3) (1989) 474–477.

[23] R. Givan, S. Leach, T. Dean, Bounded parameter Markov

decision processes, in: Proceedings of the Fourth European

Conference on Planning, 1997, pp. 234–246. Available

from <http://dynamo.ecn.purdue.edu/�givan>.[24] J.K. Satia, R.E. Lave Jr., Markovian decision processes

with uncertain transition probabilities, Operations Re-

search 21 (1973) 728–740.

[25] S. Kalyanasundaram, E.K.P. Chong, N.B. Shroff, Markov

decision processes with uncertain transition rates: Sensi-

tivity and robustness, Tech. Rep., Purdue University, 1999.

[26] T. Dean, R. Givan, S. Leach, Model reduction techniques

for computing approximately optimal solutions for Mar-

kov decision processes, in: Uncertainty in AI, 1997.

Available from <http://dynamo.ecn.purdue.edu/�givan>.[27] T. Dean, R. Givan, Model minimization in Markov

decision processes, in: Proceedings of the National

Conference on Artificial Intelligence, 1997, pp. 106–

111. Available from <http://dynamo.ecn.purdue.edu/�givan>.

[28] P. Marbach, O. Mihatsch, J.N. Tsitsiklis, Call admission

control and routing in intergrated service networks using

neuro-dynamic programming, IEEE Journal on Selected

Areas in Communications 18 (2) (2000) 197–208.

[29] K.W. Ross, D.H.K. Tsang, The stochastic knapsack

problem, IEEE Transactions on Communications 37 (7)

(1989) 740–747.

[30] Y. Fang, I. Chlamtac, Y.-B. Lin, Modeling PCS networks

under general call holding time and residence time distri-

butions, IEEE/ACM Transactions on Networking 5 (6)

(1997) 893–905.

[31] J.S. Kaufman, Blocking in a shared resource environment,

IEEE Transactions on Communications 29 (10) (1981)

1474–1481.

[32] J.-F.P. Labourdette, G.W. Hart, Blocking probabilities in

multitraffic loss systems: Insensitivity asymptotic behavior

and approximations, IEEE Transactions on Communica-

tions 40 (8) (1992) 1355–1366.

[33] A. Pfening, M. Telek, Optimal control of Markov regen-

erative processes, IEEE International Conference on Sys-

tems Man and Cybernetics 1 (1998) 663–668.

[34] R.M. Bradford, Pricing, routing, and incentive compati-

bility in multiserver queues, European Journal of Opera-

tional Research 89 (2) (1996) 226–236.

[35] Y.A. Korilis, A. Orda, Incentive compatible pricing

strategies for QoS routing, in: IEEE INFOCOM’99, vol.

2, New York, March 1999, pp. 891–899.

Suresh Kalyanasundaram received hisbachelor degree in Electrical andElectronics Engineering and mastersdegree in Physics from Birla Instituteof Technology and Science, Pilani,India, in 1996 and his Ph.D. degreefrom Purdue University in April 2000.Since then he has been working forMotorola. His research interests are inthe area of performance analysis ofwired and wireless networks.

Edwin K.P. Chong received theB.E.(Hons.) degree with first classhonors from the University of Adela-ide, South Australia, in 1987; and theM.A. and Ph.D. degrees in 1989 and1991, respectively, both from Prince-ton University, where he held an IBMFellowship. He joined the School ofElectrical and Computer Engineeringat Purdue University in 1991, where hewas named a University FacultyScholar in 1999, and was promoted toProfessor in 2001. Since August 2001,he has been a Professor of Electrical

and Computer Engineering at Colorado State University. Hiscurrent interests are in communication networks and optimi-zation methods. He co-authored the recent book, An Intro-duction to Optimization, 2nd Edition, Wiley-Interscience, 2001.He was on the editorial board of the IEEE Transactionson Automatic Control, and is currently an editor for ComputerNetworks. He is an IEEE Control Systems Society distin-guished lecturer. He received the NSF Career Award in 1995and the ASEE Frederick Emmons Terman Award in 1998.

S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630 629

Page 18: Optimal resource allocation in multi-class networks with user-specified utility functions

Ness B. Shroff received his Ph.D. de-gree from Columbia University, NY,in 1994. He is currently an associateprofessor in the School of Electricaland Computer Engineering at PurdueUniversity. His research interests spanthe areas of wireless and wirelinecommunication networks. He isespecially interested in fundamentalproblems involving the design, perfor-mance, scheduling, pricing, and con-trol of these networks. His research is

funded by numerous companies and federal and state agencies.Dr. Shroff is an editor for IEEE/ACM Transactions on Net-working and Computer Networks, and past editor of IEEECommunications Letters. He was the conference chair for the14th Annual IEEE Computer Communications Workshop (inEstes Park, CO, Oct. 1999) and program co-chair for theSymposium on High-Speed Networks, Globecom 2001 (SanFrancisco, CA, Nov. 2000). He is also the technical program co-chair for IEEE INFOCOM’03. He received the NSF Careeraward in 1996.

630 S. Kalyanasundaram et al. / Computer Networks 38 (2002) 613–630