Systems & Control Letters 58 (2009) 91–93

journal homepage: www.elsevier.com/locate/sysconle

Cooperative dynamics and Wardrop equilibria

Vivek S. Borkar ∗

School of Technology and Computer Science, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India

Article info

Article history:
Received 19 November 2007
Received in revised form 26 August 2008
Accepted 26 August 2008
Available online 19 September 2008

Keywords:
Monotone dynamics
Wardrop equilibria
Network games
Utility functions
Stochastically stable equilibria

Abstract

A simple dynamics capturing user behavior in a network is shown to lead to equilibria that can be viewed as approximate, generalized Wardrop equilibria.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

This work considers a network game with a large number of users, whereby each individual user is perforce a ‘small user’. The user on entering the network sees several alternative routes and picks one according to a rule which places a high probability on the utility-maximizing route(s), and a smaller probability on others, depending on their utility. The utility in turn depends on the congestion level. We show that this leads to approximate Wardrop equilibria, i.e., to steady state behavior wherein the traffic concentrates on routes that maximize utility. An accompanying dynamic pricing scheme is also analyzed and shown to converge to a stable price profile in an appropriate sense.

The next section describes the user dynamics and its equilibrium behavior. The correspondence with Wardrop equilibria is established in Section 3. Section 4 describes the dynamic pricing scheme. See [1,6] for some earlier work with a similar flavor.

2. The dynamics

∗ Tel.: +91 22 22782293; fax: +91 22 22782299. E-mail address: [email protected].

0167-6911/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.sysconle.2008.08.006

Consider a network of links wherein users arrive at a ‘source node’ and choose one of $M$ alternative routes. With each route $i$, we associate a random variable $X_i(n) \stackrel{\mathrm{def}}{=}$ the number of users already on the route at time $n$, and another random variable $Y_i(n)$ that corresponds to an instantaneous average utility perceived by a newly arriving customer at time $n$ who is contemplating joining route $i$. Specifically, we suppose that this is given by $Y_i(n) = U_i(X_i(n))$ for a prescribed function $U_i(\cdot)$. We assume that:

1. $U_i(\cdot) \geq 0$ are differentiable and monotonically decreasing, i.e., the utility decreases with congestion. (For example, it could be a function of a path length, a price charged on the path, and the congestion, with the dependence on the first two subsumed in the subscript $i$.)

2. The different $U_i$’s have identical growth properties, i.e.,

$$U_i(x) = \Theta(U_j(x)) \quad \text{as } \|x\| \to \infty, \; i \neq j. \tag{1}$$

A newly arriving user joins route $i$ with probability

$$\frac{U_i(X_i(n))^{\alpha}}{\sum_j U_j(X_j(n))^{\alpha}} \tag{2}$$

for some $\alpha \geq 1$. At time $n$, a small random fraction $\xi_i(n) \in [0, 1]$ of users on route $i$ complete their journey. We assume that $\{\xi_n\}$ are i.i.d. and each $\xi_n$ is independent of the history up to time $n$, with mean $\mu > 0$.

Finally, the customers arrive according to an i.i.d. process $\{\zeta(n)\}$ with mean $\lambda$. More general arrival processes, such as ergodic Markov chains, can also be handled in our framework; the above condition is merely to keep matters simple. We shall assume that $\mu$ is small and $\lambda < \mu M$ (which ensures stability). We shall be interested in the $\mu \downarrow 0$ limit with $\lambda/\mu$ kept constant. This captures the spirit of the ‘small user’ assumption.
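As an illustration of the selection rule (2), the following minimal Python sketch computes the route probabilities for given congestion levels. The function name `route_probabilities` and the two example utilities of the form c/(1 + x) are assumptions made for this sketch; the model itself only requires the utilities to be nonnegative, differentiable and decreasing.

```python
def route_probabilities(loads, utilities, alpha):
    """Rule (2): route i is chosen with probability proportional to U_i(X_i)^alpha."""
    weights = [u(x) ** alpha for u, x in zip(utilities, loads)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical decreasing utilities (not from the paper).
utilities = [lambda x: 1.0 / (1.0 + x), lambda x: 2.0 / (1.0 + x)]

# Both routes carry the same load, so the route with the larger utility is favored.
probs = route_probabilities([3.0, 3.0], utilities, alpha=2.0)
print(probs)  # approximately [0.2, 0.8]
```

Raising `alpha` in this sketch sharpens the preference for the higher-utility route, which is the mechanism exploited in Section 3.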

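Let $\gamma_i(n) \in [0, 1]$ denote the fraction of new arrivals at time $n$ that join the $i$-th route. The dynamics of $\{X_i(n)\}$ is thus given by

$$\begin{aligned}
X_i(n+1) &= X_i(n) + \gamma_i(n+1)\zeta(n+1) - \xi_i(n+1)X_i(n) \\
&= X_i(n) + \mu\left[\frac{\lambda}{\mu}\,\frac{U_i(X_i(n))^{\alpha}}{\sum_j U_j(X_j(n))^{\alpha}} - X_i(n)\right] + \mu M_i(n+1),
\end{aligned} \tag{3}$$

where

$$M_i(n+1) \stackrel{\mathrm{def}}{=} \frac{1}{\mu}\bigl(\gamma_i(n+1)\zeta(n+1) - \xi_i(n+1)X_i(n)\bigr) - \frac{1}{\mu}\left[\lambda\,\frac{U_i(X_i(n))^{\alpha}}{\sum_j U_j(X_j(n))^{\alpha}} - \mu X_i(n)\right]$$

for $n \geq 0$. For each $i$, $\{M_i(n)\}$ is seen to be a martingale difference sequence w.r.t. the filtration $\mathcal{F}_n \stackrel{\mathrm{def}}{=} \sigma(X_i(m), \xi_i(m), \zeta(m), \gamma_i(m), m \leq n, 1 \leq i \leq M)$. Furthermore, it is assumed to satisfy the condition

$$E\bigl[|M_i(n+1)|^2 \mid \mathcal{F}_n\bigr] \leq K(1 + \|X(n)\|^2) \quad \forall\, i, n. \tag{4}$$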
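To make recursion (3) concrete, here is a rough simulation sketch in Python. Several ingredients are assumptions introduced only for illustration and are not specified in the text: the arrivals are drawn uniformly on [0, 2λ], the departing fractions uniformly on [0, 2µ], the arrivals are split exactly in proportion to rule (2), and all routes share the hypothetical utility 1/(1 + x).

```python
import random


def simulate_loads(num_steps, lam, mu, alpha, utilities, seed=0):
    """Iterate recursion (3) with illustrative (assumed) noise distributions."""
    rng = random.Random(seed)
    loads = [0.0] * len(utilities)
    for _ in range(num_steps):
        weights = [u(x) ** alpha for u, x in zip(utilities, loads)]
        total = sum(weights)
        # Assumption: arrivals are split exactly in proportion to rule (2).
        gamma = [w / total for w in weights]
        # Assumption: arrivals are uniform on [0, 2*lam], so their mean is lam.
        zeta = rng.uniform(0.0, 2.0 * lam)
        # Assumption: departing fractions are uniform on [0, 2*mu] (needs 2*mu <= 1).
        departing = [rng.uniform(0.0, 2.0 * mu) for _ in loads]
        loads = [x + g * zeta - d * x for x, g, d in zip(loads, gamma, departing)]
    return loads


# Hypothetical setting with M = 3 identical routes and lam < mu * M.
utilities = [lambda x: 1.0 / (1.0 + x)] * 3
final_loads = simulate_loads(5000, lam=0.02, mu=0.01, alpha=2.0, utilities=utilities)
print(final_loads, sum(final_loads))
```

Under these assumptions the total load should hover around λ/µ = 2, since summing (3) over the routes gives a mean drift of λ − µ Σ X_i(n).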

Dynamics (3) can be viewed as a constant stepsize ($= \mu$) stochastic approximation algorithm. By standard results from the ‘o.d.e.’ approach to these (see, e.g., [2]; these require (4)), it follows that its asymptotic behavior can be analyzed by looking at the o.d.e.

$$\dot{x}_i(t) = h_i(x(t)) - x_i(t) = \frac{\lambda}{\mu}\,\frac{U_i(x_i(t))^{\alpha}}{\sum_j U_j(x_j(t))^{\alpha}} - x_i(t) \tag{5}$$

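for $1 \leq i \leq M$, $t \geq 0$. Let $h(\cdot) = [h_1(\cdot), \ldots, h_M(\cdot)]^T$. Note that for $i \neq j$,

$$\frac{\partial h_i(x)}{\partial x_j} = \frac{\lambda}{\mu}\,\frac{-\alpha\, U_i(x_i)^{\alpha}\, U_j'(x_j)\, U_j(x_j)^{\alpha-1}}{\bigl(\sum_k U_k(x_k)^{\alpha}\bigr)^2} > 0$$

under our hypotheses. Thus (5) is a cooperative o.d.e. in the sense of [5], a special case of ‘monotone dynamics’ [7]. Since

$$x_i(t) = e^{-t}x_i(0) + \int_0^t e^{-(t-s)}h_i(x(s))\,ds, \quad 1 \leq i \leq M, \; t \geq 0,$$

and $|h_i(\cdot)|$ are bounded, the trajectories $x(\cdot)$ remain bounded. By the results of [5], we then have: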

Lemma 1. For initial conditions belonging to an open dense set, $x(\cdot)$ converges to the set $H_\alpha \stackrel{\mathrm{def}}{=} \{x : h(x) = x\}$ of equilibria of (5).
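The convergence asserted in Lemma 1 can be observed numerically. The sketch below integrates the o.d.e. (5) with a forward Euler scheme for a two-route example and checks that the end point nearly satisfies h(x) = x. The utilities, the value λ/µ = 3 (passed as `rho`), the step size and the function names are all assumptions of this sketch, and Euler integration is merely a convenient choice, not something prescribed by the text.

```python
def h(x, utilities, alpha, rho):
    """The map h in (5); rho stands for lambda/mu."""
    weights = [u(xi) ** alpha for u, xi in zip(utilities, x)]
    total = sum(weights)
    return [rho * w / total for w in weights]


def integrate(x0, utilities, alpha, rho, dt=0.01, steps=5000):
    """Forward Euler integration of dx/dt = h(x) - x."""
    x = list(x0)
    for _ in range(steps):
        hx = h(x, utilities, alpha, rho)
        x = [xi + dt * (hi - xi) for xi, hi in zip(x, hx)]
    return x


# Hypothetical decreasing utilities (not from the paper).
utilities = [lambda x: 1.0 / (1.0 + x), lambda x: 2.0 / (1.0 + x)]
x_star = integrate([0.0, 0.0], utilities, alpha=2.0, rho=3.0)

# Distance of the end point from the fixed-point set {x : h(x) = x}.
residual = max(abs(hi - xi) for hi, xi in zip(h(x_star, utilities, 2.0, 3.0), x_star))
print(x_star, residual)
```

Because the components of h sum to λ/µ, the total load along any trajectory of (5) relaxes to λ/µ, which the sketch reproduces.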

We shall also consider the ‘scaled limit’ of (5) given by

$$\dot{x}(t) = h_\infty(x(t)) - x(t),$$

with

$$h_\infty(x) \stackrel{\mathrm{def}}{=} \lim_{a \uparrow \infty} \frac{h(ax)}{a} \equiv 0$$

in view of (1). This o.d.e. has the origin as the globally asymptotically stable equilibrium. By Theorem 2.1(ii) of [2], it then follows that

$$\sup_n E\bigl[\|X(n)\|^2\bigr] < \infty. \tag{6}$$

Note that $\{X(n)\}$ is a Markov chain which, in view of (6), will have at least one invariant distribution. This prompts the following assumption:

(*) There exists an open dense set $D$ of $\mathbb{R}^M$ such that for all initial conditions in $D$, the trajectories of (3) converge to $H_\alpha$; and furthermore, any invariant distribution of $\{X(n)\}$ assigns zero mass to $D^c$.

Intuitively, $D^c$ should comprise all points excluded in Lemma 1. For $\varepsilon > 0$, let $H^\varepsilon_\alpha \stackrel{\mathrm{def}}{=} \{x : \inf_{y \in H_\alpha} \|x - y\| < \varepsilon\}$ denote the $\varepsilon$-neighborhood of $H_\alpha$. We can argue, as in the proof of Theorem 1 of [4], the following:

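Theorem 1. For any $\varepsilon, \delta \in (0, 1)$,

$$\limsup_{n \uparrow \infty} P\bigl(X(n) \notin H^\varepsilon_\alpha\bigr) \leq \delta + O\!\left(\frac{\mu}{\varepsilon}\right).$$

That is, if $\mu$ is ‘small’, then $\{X(n)\}$ asymptotically concentrates near $H_\alpha$ with high probability.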

3. Wardrop equilibria

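Let

$$H_\infty \stackrel{\mathrm{def}}{=} \Bigl\{x = [x_1, \ldots, x_M] : U_i(x_i) \neq \max_j U_j(x_j) \Rightarrow x_i = 0 \text{ and } U_k(x_k), U_\ell(x_\ell) = \max_j U_j(x_j) \Rightarrow x_k = x_\ell\Bigr\}.$$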

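Theorem 2. As $\alpha \uparrow \infty$, $H_\alpha \to H_\infty$ in the sense that, if $x^{\alpha} \in H_\alpha$ $\forall \alpha$ and $x^{\alpha} \to x^{\infty}$ along a subsequence as $\alpha \uparrow \infty$, then $x^{\infty} \in H_\infty$.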

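Proof. Let $S(x) \stackrel{\mathrm{def}}{=} \{i : U_i(x_i) = \max_j U_j(x_j)\}$, $x = [x_1, \ldots, x_M]$. As $\alpha \uparrow \infty$,

$$\frac{U_i(x_i)^{\alpha}}{\sum_j U_j(x_j)^{\alpha}} \to \begin{cases} \dfrac{1}{|S(x)|}, & i \in S(x), \\ 0, & i \notin S(x). \end{cases}$$

The claim follows. □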
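The limit used in the proof can be checked numerically. In the hedged sketch below, `selection_weights` is a hypothetical helper that evaluates the ratio in the proof for a fixed vector of utility values; dividing by the largest value first is a numerical-stability device added here (it leaves the ratio unchanged) and assumes that the largest value is positive.

```python
def selection_weights(utility_values, alpha):
    """U_i^alpha / sum_j U_j^alpha, rescaled by the maximum to avoid overflow."""
    top = max(utility_values)  # assumed positive
    scaled = [(u / top) ** alpha for u in utility_values]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical utility values; routes 0 and 2 tie for the maximum, so |S(x)| = 2.
values = [2.0, 1.5, 2.0, 0.5]
for alpha in (1, 10, 100, 1000):
    print(alpha, selection_weights(values, alpha))

limit = selection_weights(values, 1000)
```

For large `alpha` the weights approach 1/2 on the two maximizing routes and 0 elsewhere, matching the two cases in the proof.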

The set $H_\infty$ corresponds to an equal allocation of flow to the utility-maximizing routes, with zero allocation to the rest. This corresponds to the notion of Wardrop equilibria from transportation theory [8]. What underlies the above argument is our choice of route selection probabilities so as to favor the maximizer(s) of utility. This is the so-called ‘softmax’ in the neural network literature. What it achieves is a smooth approximation to the indicator of the maximizer(s), which makes the resulting o.d.e. well-posed.

Remarks. 1. One can also view our model as a scenario where users use the utility-maximizing routes with high probability, but not with certainty, and assign a small probability to the non-optimal choices due to imperfect information, bounded rationality, etc. As this ‘error’ shrinks to zero, their behavior converges to Wardrop equilibria. Thus Wardrop equilibria are ‘stochastically stable’ in the sense of Foster and Young [3].

2. If there are several source-destination pairs and there is a possibility of overlap between routes, the ‘cooperative’ nature of the dynamics may break down. The above model will continue to be a reasonable approximation, however, in the ‘large population limit’ when applied to a segment of traffic corresponding to a single source-destination pair whose effect on the rest of the traffic is negligible. That is, one treats the rest of the traffic as stationary environmental effects which will get averaged over by the stochastic approximation dynamics. We omit the details.

4. Dynamic pricing

Next consider the case when the utilities explicitly depend on a price associated with each route, which is also dynamically adjusted. That is, the utility for joining the $i$-th route at time $n$ is $U_i(X_i(n), p_i(n))$, where $p_i(n)$ is the price for joining the $i$-th route at time $n$. A simple price adjustment scheme would be

$$p_i(n+1) = (1 - a)\,p_i(n) + a\kappa X_i(n), \tag{7}$$

where $\kappa > 0$ is a prescribed scalar which we take to be 1 for the sake of simplicity. That is, the price is a convex combination of the previous price and a number proportional to the average congestion. (More generally, $\kappa X_i(n)$ above could be replaced by $f(X_i(n))$ for a nonnegative increasing $f$ with a similar analysis.) The constant $a > 0$ is a prescribed stepsize. We assume that $a \ll \mu$, implying, in particular, that the price adjustment is on a slower timescale than (3). Stability of iterates is not an issue here, because it follows from the stability (i.e., bounded second moments) of $\{X(n)\}$, which in turn follows as before. Stochastic approximation theory then allows us to treat $\{X(n)\}$ as a so-called ‘Markov noise’ and consider the limiting o.d.e.

$$\dot{p}_i(t) = \kappa\, x_i(p(t)) - p_i(t), \quad 1 \leq i \leq M. \tag{8}$$

Here $x_i(p)$ is the stationary average of the congestion in route $i$ if the price profile is frozen at $p = [p_1, \ldots, p_M]$. Consider the following assumption:

(†) $x(p)$ is differentiable and satisfies the ‘monotonicity’ condition:

$$\langle x(p) - x(p'),\, p - p' \rangle < 0 \tag{9}$$

if $p \neq p'$.

That is, a higher price profile leads to lower congestion. From (3), we know that $x_i(p) \leq \lambda/\mu$ for all $i, p$. Thus $x(\cdot)$ has a bounded range and therefore by Brouwer’s fixed point theorem, it has at least one fixed point $p^*$. Then for $p(\cdot) = [p_1(\cdot), \ldots, p_M(\cdot)]^T$ given by (8),

$$\begin{aligned}
\frac{d}{dt}\|p(t) - p^*\|^2 &= 2\langle p(t) - p^*,\, x(p(t)) - p(t) \rangle \\
&= 2\langle p(t) - p^*,\, x(p(t)) - x(p^*) \rangle - 2\|p(t) - p^*\|^2 \\
&< 0
\end{aligned}$$

for $p(t) \neq p^*$. Thus $\|p(t) - p^*\|^2$ serves as a Liapunov function, leading to $p(t) \to p^*$. In particular, if $p'$ were another fixed point, $p(t) \equiv p'$ would satisfy this, forcing $p' = p^*$. Hence $p^*$ is the unique fixed point of $x(\cdot)$ and a globally asymptotically stable equilibrium for the o.d.e. (8). By Theorem 2.3 of [2], we then have:

Theorem 3. The iterates $p(n) = [p_1(n), \ldots, p_M(n)]$ satisfy: for any $\varepsilon > 0$,

$$\limsup_{n \uparrow \infty} P\bigl(\|p(n) - p^*\| > \varepsilon\bigr) \leq O(a).$$
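The Liapunov argument behind Theorem 3 can be illustrated with a noiseless version of the price iteration (7) with κ = 1, in which X_i(n) is replaced by a stationary average x_i(p(n)). The congestion map c_i/(1 + p_i) used below is purely hypothetical (chosen because it satisfies the monotonicity condition (9) for nonnegative prices and has a closed-form fixed point); the function names, the constants `capacities` and the stepsize are likewise assumptions of this sketch.

```python
import math


def stationary_congestion(prices, capacities):
    """Hypothetical x(p): congestion c_i / (1 + p_i), decreasing in the own price."""
    return [c / (1.0 + p) for c, p in zip(capacities, prices)]


def iterate_prices(p0, capacities, a=0.05, steps=400):
    """Noiseless form of (7) with kappa = 1: p <- (1 - a) p + a x(p)."""
    prices = list(p0)
    history = [list(prices)]
    for _ in range(steps):
        x = stationary_congestion(prices, capacities)
        prices = [(1.0 - a) * p + a * xi for p, xi in zip(prices, x)]
        history.append(list(prices))
    return prices, history


capacities = [2.0, 6.0]
# Fixed point of p = c / (1 + p), i.e. the positive root of p^2 + p - c = 0.
p_star = [(-1.0 + math.sqrt(1.0 + 4.0 * c)) / 2.0 for c in capacities]
p_final, history = iterate_prices([0.0, 0.0], capacities)

# Squared distance to p_star along the iteration (the Liapunov function in the text).
lyapunov = [sum((p - q) ** 2 for p, q in zip(ps, p_star)) for ps in history]
print(p_final, p_star, lyapunov[0], lyapunov[-1])
```

In this example the squared distance to the fixed point shrinks along the iteration, mirroring the continuous-time computation; the stochastic iterates of (7) would only concentrate near the fixed point in the sense of Theorem 3.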

That is, the prices stabilize to a fixed profile $p^*$. An alternative (weaker) assumption is:

(††) $x_i(p)$ is a continuously differentiable function of $p$ which is increasing in $p_j$, $j \neq i$.

Phenomenologically, (††) is a reasonable assumption to make, for increasing $p_i$ (say) would drive users away from route $i$ to other routes. Under this assumption, (8) is a cooperative o.d.e. and an analysis along the lines of the preceding section is possible, with analogous conclusions. Here, however, the equilibrium need not be unique and all we can say is that the price will concentrate near the set of equilibria with high probability.

Acknowledgements

The author’s work was supported in part by a J.C. Bose Fellowship from the Dept. of Science and Technology, Govt. of India. This work was also presented at the 13th International Symposium on Dynamic Games, Wroclaw, Poland, June 29 – July 3, 2008.

References

[1] E. Altman, N. Shimkin, Individual equilibrium and learning in processor sharing systems, Operations Research 46 (1998) 776–784.

[2] V.S. Borkar, S.P. Meyn, The O.D.E. method of convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim. 38 (2000) 447–469.

[3] D.P. Foster, H.P. Young, Stochastic evolutionary dynamics, Theoretical Population Biology 38 (1990) 219–232.

[4] D. Garg, V.S. Borkar, D. Manjunath, Network pricing for QoS: A ‘regulation’ approach, in: E.H. Abed (Ed.), Advances in Communication Networks, Control and Transportation Systems, Birkhäuser, Boston, 2005, pp. 137–157.

[5] M.W. Hirsch, Systems of differential equations that are competitive or cooperative II: Convergence almost everywhere, SIAM J. Math. Anal. 16 (1985) 423–439.

[6] W.H. Sandholm, Evolution and learning in games: Overview, in: S.N. Durlauf, L.E. Blume (Eds.), The New Palgrave Dictionary of Economics, 2nd edition, Palgrave Macmillan, 2008.

[7] H.L. Smith, Monotone Dynamical Systems, American Math. Soc., Providence, R.I., 1995.

[8] J.G. Wardrop, Some theoretical aspects of road traffic research, in: Proc. Inst. Civil Engg. (Part 2), 1952, pp. 325–378.