
Directed Graphs, Hamiltonicity and Doubly Stochastic Matrices∗

Vivek S. Borkar,¹ Vladimir Ejov,² Jerzy A. Filar²

¹ School of Technology and Computer Science, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 40005, India; e-mail: [email protected]

² School of Mathematics, The University of South Australia, Mawson Lakes, SA 5095, Australia; e-mail: [email protected]; [email protected]

Received 6 December 2002; revised 5 October 2003; accepted 29 May 2004.
DOI 10.1002/rsa.20034.
Published online 3 August 2004 in Wiley InterScience (www.interscience.wiley.com).

ABSTRACT: We consider the Hamiltonian cycle problem embedded in singularly perturbed (controlled) Markov chains. We also consider a functional on the space of stationary policies of the process that consists of the (1,1)-entry of the fundamental matrices of the Markov chains induced by the same policies. In particular, we focus on the subset of these policies that induce doubly stochastic probability transition matrices, which we refer to as the "doubly stochastic policies." We show that when the perturbation parameter ε is sufficiently small the minimum of this functional over the space of the doubly stochastic policies is attained very close to a Hamiltonian cycle, provided that the graph is Hamiltonian. We also derive precise analytical expressions for the elements of the fundamental matrix that lend themselves to probabilistic interpretation as well as asymptotic expressions for the first diagonal element, for a variety of deterministic policies that are of special interest, including those that correspond to Hamiltonian cycles. © 2004 Wiley Periodicals, Inc. Random Struct. Alg., 25, 376–395, 2004

Keywords: Hamiltonian cycle; controlled Markov chains; optimal policy; singular perturbation

1. INTRODUCTION

This paper is a continuation of a line of research [10, 1, 8, 4, 7] which aims to exploit the tools of controlled Markov decision chains (MDPs)¹ to study the properties of a famous problem

Correspondence to: J. Filar
∗ This work is supported by an Australian Research Council grant.
¹ The acronym MDP stems from the alternative name of Markov decision processes.
© 2004 Wiley Periodicals, Inc.


of combinatorial optimization: the Hamiltonian Cycle Problem (HCP). More specifically, the present paper provides a partial answer to the open problem posed both in Filar and Liu [9] and Ejov, Filar, and Nguyen [5]. In these papers it was shown that Hamiltonian cycles of a graph can be characterized as the minimizers of a functional based on the fundamental matrices of Markov chains induced by deterministic policies in a suitably perturbed MDP, provided that the value of the perturbation parameter ε is sufficiently small. Furthermore, it was conjectured that the minimum over the space of deterministic policies would also constitute the minimum over the space of all stationary policies.

While the above conjecture remains open under the perturbation studied in [5] and [9], in the present paper we prove that under another linear, but symmetric, singular perturbation the same element of the fundamental matrix is minimized in a neighborhood of a Hamiltonian cycle over the space of all stationary policies that induce doubly stochastic probability transition matrices.

In the course of our analysis we derive formulae for all elements of the fundamental matrices of Markov chains induced by a range of policies that are of special interest. These formulae lend themselves to natural probabilistic interpretations involving the so-called "first hitting times" of states of these chains. Furthermore, we derive asymptotic characterizations of the (1,1)-element of the fundamental matrices for classes of deterministic policies that form a partition of the space of all deterministic policies. In the process we make use of classical results on MDPs due to Blackwell [2]. We conclude with a novel reformulation of the Hamiltonian cycle problem as a specially structured nonlinear programming problem.

In this paper, we consider the following version of the Hamiltonian cycle problem: given a directed graph, find a simple cycle that contains all vertices of the graph (a Hamiltonian cycle (HC)) or prove that no HC exists. With respect to this property—Hamiltonicity—graphs possessing an HC are called Hamiltonian.² Next we shall briefly differentiate between our approach and some of the best-known related lines of research.

Many of the successful classical approaches of discrete optimisation focus on solving a linear programming "relaxation" followed by heuristics that prevent the formation of subcycles. In our approach, we embed a given graph in a singularly perturbed MDP in such a way that we can identify Hamiltonian cycles and subcycles with exhaustive and nonexhaustive ergodic classes of induced Markov chains, respectively. Indirectly, this allows us to exploit a formidable array of properties of Markov chains, including those of the corresponding fundamental matrices. It should be mentioned that algorithmic approaches to the HCP based on embeddings of the problem in Markov decision processes are also beginning to emerge (e.g., see Andramonov et al. [1] and Ejov, Filar, and Gondzio [6]).

Note that this approach is essentially different from that adopted in the study of random graphs, where an underlying random mechanism is used to generate a graph (e.g., see Karp's seminal paper [11]). In our approach, the graph that is to be studied is given and fixed, but a controller can choose arcs according to a probability distribution, and with a small probability (due to a perturbation) an arc may take you to a node other than its "head." Of course, random graphs have played an important role in the study of Hamiltonicity; a striking result to quote is that of Robinson and Wormald [14], who showed that, with high probability, k-regular graphs are Hamiltonian for k ≥ 3.

² The name of the problem owes to the fact that Sir William Hamilton investigated the existence of such cycles on the dodecahedron graph [17].


More precisely, our dynamic, stochastic approach to the HCP considers a moving object tracing out a directed path on the graph G with its movement "controlled" by a function f mapping the set of nodes V = V(G) = {1, 2, . . . , s} of G into the set of arcs A = A(G) of G. We think of this set of nodes as the state space of a controlled Markov chain Γ = Γ(G) where, for each state/node i, the action space A(i) := {a | (i, a) ∈ A} is in one-to-one correspondence with the set of arcs emanating from that node, or, equivalently, with the set of endpoints of those arcs.

Illustration. Consider the complete graph G5 on five nodes (with no self-loops) and think of the nodes as the states of an MDP, denoted by Γ, and of the arcs emanating from a given node as actions available at that state. In a natural way the Hamiltonian cycle c1 : 1 → 2 → 3 → 4 → 5 → 1 corresponds to the "deterministic control" f1 : {1, 2, 3, 4, 5} → {2, 3, 4, 5, 1}, where f1(2) = 3 corresponds to the controller choosing arc (2, 3) in state 2 with probability 1. The Markov chain induced by f1 is given by the "zero-one" transition matrix P(f1), which, clearly, is irreducible. On the other hand, the union of two subcycles 1 → 2 → 3 → 1 and 4 → 5 → 4 corresponds to the control f2 : {1, 2, 3, 4, 5} → {2, 3, 1, 5, 4}, which identifies the Markov chain transition matrix P(f2) (see below) containing two distinct ergodic classes. This leads to a natural embedding of the Hamiltonian cycle problem in a Markov decision problem Γ. The latter MDP has a multichain ergodic structure which considerably complicates the analysis. However, this multichain structure can be "disguised"—but not completely lost—with the help of a "singular perturbation". For instance, we could easily replace P(f2) with Pε(f2):

P(f2) =
⎡ 0  1  0  0  0 ⎤
⎢ 0  0  1  0  0 ⎥
⎢ 1  0  0  0  0 ⎥
⎢ 0  0  0  0  1 ⎥
⎣ 0  0  0  1  0 ⎦

and

Pε(f2) =
⎡ ε        1 − 4ε   ε        ε        ε      ⎤
⎢ ε        ε        1 − 4ε   ε        ε      ⎥
⎢ 1 − 4ε   ε        ε        ε        ε      ⎥
⎢ ε        ε        ε        ε        1 − 4ε ⎥
⎣ ε        ε        ε        1 − 4ε   ε      ⎦ .

The above perturbation is singular because it altered the ergodic structure of P(f2) by changing it to an irreducible (completely ergodic) Markov chain Pε(f2). Furthermore, it leads to a doubly stochastic matrix whose rows and columns both add to 1. This has been achieved as follows: each row and column of the unperturbed n × n matrix (a permutation matrix) has exactly one 1 and the rest zeros. We replace the 1's by 1 − (n − 1)ε for some 0 < ε < 1/(n − 1) and the zeros by ε to get the doubly stochastic perturbation of the matrix.
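To make the construction concrete, here is a minimal sketch in Python/NumPy (our illustration, not code from the paper; the helper name perturb, the value of ε, and the 0-based node indexing are ours):

```python
import numpy as np

def perturb(P0, eps):
    """Doubly stochastic perturbation of an n x n permutation matrix P0:
    each 1 becomes 1 - (n-1)*eps and each 0 becomes eps."""
    n = P0.shape[0]
    assert 0 < eps < 1.0 / (n - 1)
    return P0 * (1 - (n - 1) * eps) + (1 - P0) * eps

# P(f2): the union of the subcycles 1 -> 2 -> 3 -> 1 and 4 -> 5 -> 4
P0 = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 3)]:  # nodes 0..4
    P0[i, j] = 1.0

P_eps = perturb(P0, 0.01)
# both row and column sums equal 1, i.e. P_eps is doubly stochastic
print(P_eps.sum(axis=1), P_eps.sum(axis=0))
```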

Next, we introduce the notion of a Markov chain induced by a stationary randomized policy (also called a randomized strategy or a randomized control). The latter can be defined by an s × s stochastic matrix f with entries representing probabilities f(i, a) of choosing a possible action a at a particular state i whenever this state is visited. Of course, f(i, a) = 0 whenever a ∉ A(i). Randomized policies compose the strategy space FS. The discrete nature of the HCP focuses our attention on special paths which our moving object can trace out in G. These paths correspond to the subspace FD ⊂ FS of deterministic policies arising


when the controller at every fixed state chooses some particular action with probability 1 whenever this state is visited (f1 and f2 are instances of the latter). To illustrate these definitions, consider the simple case where fλ is obtained from the strictly deterministic policy f2 by the "controller" deciding to randomize at node 4 by choosing the arcs (4, 5) and (4, 3) with probabilities f(4, 5) = 1 − λ and f(4, 3) = λ, respectively. The transition probability matrix of the resulting policy fλ is given by

Pε(fλ) =
⎡ ε        1 − 4ε   ε             ε        ε                ⎤
⎢ ε        ε        1 − 4ε        ε        ε                ⎥
⎢ 1 − 4ε   ε        ε             ε        ε                ⎥
⎢ ε        ε        λ − 5ελ + ε   ε        5ελ − λ − 4ε + 1 ⎥
⎣ ε        ε        ε             1 − 4ε   ε                ⎦ .

As λ ranges from 0 to 1, the Markov chain ranges from the one induced by f2 to one induced by another deterministic policy. An attractive feature of an irreducible Markov chain P is the simple structure of its Cesàro-limit (or stationary distribution) matrix P∗. It consists of an identical row-vector q representing the unique solution of the linear system of equations qP = q, q1 = 1, where 1 is an s-dimensional column vector with unity in every entry.
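As a small numerical aside (our sketch, not from the paper), q can be obtained by solving the consistent overdetermined system qP = q, q1 = 1 in one least-squares call:

```python
import numpy as np

def stationary(P):
    """Unique solution q of qP = q, q.1 = 1 for an irreducible stochastic P."""
    s = P.shape[0]
    A = np.vstack([(P - np.eye(s)).T, np.ones((1, s))])  # rows: q(P - I) = 0
    b = np.zeros(s + 1)
    b[-1] = 1.0                                          # normalization q.1 = 1
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q

# for any doubly stochastic matrix, q is the uniform vector (1/s, ..., 1/s)
```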

2. THE FUNDAMENTAL MATRIX

We begin by deriving certain characterizations of the elements of the fundamental matrix of a Markov chain. We consider a Markov chain {Xn} on a finite state space S = {1, 2, . . . , s}, with a transition matrix P = [[p(i, j)]], assumed to be irreducible. Let π = [π1, . . . , πs] denote its unique invariant probability vector and define its stationary distribution matrix by

P∗ = lim_{N↑∞} (1/(N + 1)) Σ_{n=0}^{N} P^n =
⎡ π ⎤
⎢ π ⎥
⎢ ⋮ ⎥
⎣ π ⎦ .

In this section we shall derive probabilistic expressions for the elements of the fundamental matrix G = [[g(i, j)]] = (I − P + P∗)⁻¹, I being the s × s identity matrix. Let

r = [1, 0, . . . , 0]^T ∈ R^s,  1 = [1, 1, . . . , 1]^T ∈ R^s,

and define

w = G r,  z = r^T G.

These then satisfy the equations

(I − P + P∗)w = r,

i.e.,

w − Pw + (Σ_i πiwi) 1 = r,   (1)

and

z(I − P + P∗) = r^T,

i.e.,

z − zP + (Σ_i zi) π = r^T,   (2)

respectively. We begin with (1).

Lemma 1. The solution of (1) satisfies: Σ_i πiwi = π1.

Proof. This follows by left-multiplying (1) by P∗ and using the fact P∗P = P∗ = PP∗.
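A quick numerical check of Lemma 1 (our sketch; the perturbed standard Hamiltonian cycle used as the test chain, and the 0-based indexing, are assumptions of the illustration):

```python
import numpy as np

s, eps = 5, 0.01
# perturbed standard Hamiltonian cycle 1 -> 2 -> ... -> s -> 1 (nodes 0..s-1)
P = np.full((s, s), eps)
for i in range(s):
    P[i, (i + 1) % s] = 1 - (s - 1) * eps

pi = np.full(s, 1.0 / s)                   # doubly stochastic => uniform pi
P_star = np.tile(pi, (s, 1))               # every row of P* equals pi
G = np.linalg.inv(np.eye(s) - P + P_star)  # fundamental matrix
r = np.zeros(s); r[0] = 1.0
w = G @ r
print(pi @ w, pi[0])                       # Lemma 1: sum_i pi_i w_i = pi_1
```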

Let Ei[·] denote the expectation when X0 = i, and let τi = min{n > 0 : Xn = i} be the first hitting time of i (first return time if X0 = i) for i ∈ S. The next lemma follows from the results of Section 4.5 of [12]. We provide a simple proof to make our account self-contained.

Lemma 2. The elements, {wi}, of the solution of (1) are given by

w1 = (1/2) · E1[τ1(τ1 + 1)]/(E1[τ1])²,   (3)

wj = (1/2) · E1[τ1(τ1 + 1)]/(E1[τ1])² − Ej[τ1]/E1[τ1],  j ≠ 1.   (4)

Proof. Let Mn = Σ_{m=1}^{n} (w_{Xm} − Σ_j p(X_{m−1}, j)wj), n ≥ 1. Then {Mn} is a martingale w.r.t. the family of σ-fields σ(Xm, m ≤ n), n ≥ 1. Let X0 = i. Then by the optional sampling theorem ([3], p. 45),

Ei[M_{τ1∧n}] = Ei[ Σ_{m=1}^{τ1∧n} (w_{Xm} − Σ_j p(X_{m−1}, j)wj) ] = 0

for i ∈ S, n ≥ 1. Letting n ↑ ∞, by the dominated convergence theorem (in view of Ei[τ1] < ∞), we have

Ei[ Σ_{m=1}^{τ1} (w_{Xm} − Σ_j p(X_{m−1}, j)wj) ] = 0.

Since X_{τ1} = 1 and X0 = i, this leads to

w1 − Ei[ Σ_{m=0}^{τ1−1} (Σ_j p(Xm, j)wj − w_{Xm}) ] − wi = 0.   (5)


The Xm-th row of (1) yields

w_{Xm} − Σ_j p(Xm, j)wj = I{Xm = 1} − π1,

where we also use Lemma 1. Thus (5) is the same as

wi = w1 + Ei[ Σ_{m=0}^{τ1−1} (I{Xm = 1} − π1) ].   (6)

For i ∈ S, i ≠ 1, we have Xm ≠ 1 for m < τ1, and hence

wi = −π1Ei[τ1] + w1,   (7)

whereas for i = 1, (6) reduces to the tautology w1 = w1 by virtue of Theorem 5.3.2, p. 96, of [3], which in particular says that π1E1[τ1] = 1. Since Σ_i πiwi = π1, multiplying (7) by πi and summing, we have

w1 − π1 Σ_{j≠1} πjEj[τ1] = π1.

But π1 = E1[τ1]⁻¹. Thus

w1 − π1 Σ_j πjEj[τ1] = 0,   (8)

or

w1 = π1 Σ_j πjEj[τ1] = π1 · E1[ Σ_{m=0}^{τ1−1} E_{Xm}[τ1] ]/E1[τ1] = E1[ Σ_{m=0}^{τ1−1} E_{Xm}[τ1] ]/(E1[τ1])² = (1/2) · E1[τ1(τ1 + 1)]/(E1[τ1])²,

where the second equality follows from Theorem 5.3.4, p. 101, of [3], the third from the fact that π1 = E1[τ1]⁻¹, and the last from the occupation measure identity of [13], p. 74. This establishes (3). Now (4) follows from (3) and (7).

Let p_m(i, j) = P(Xm = j | X0 = i) for i, j ∈ S and m ≥ 1. We have an alternative expression for w in the aperiodic case:

Lemma 3. If in addition to irreducibility we assume that P is aperiodic, then

w = Σ_{m=0}^{∞} (P^m r − π1 1) + π1 1,

that is,

wi = Σ_{m=0}^{∞} (p_m(i, 1) − π1) + π1.   (9)

Proof. By iterating (1),

w = Σ_{m=0}^{n−1} (P^m r − π1 1) + P^n w  →(as n ↑ ∞)  Σ_{m=0}^{∞} (P^m r − π1 1) + (Σ_i πiwi) 1 = Σ_{m=0}^{∞} (P^m r − π1 1) + π1 1.

Similarly, from (2) we have for the aperiodic case the following representation.

Lemma 4. If P is irreducible and aperiodic, then the solution of (2) can be expressed as

z = Σ_{m=0}^{∞} (r^T − π) P^m + π,

i.e.,

zi = πi + Σ_{m=0}^{∞} (p_m(1, i) − πi).   (10)

In case we drop the aperiodicity assumption, arguments analogous to the above lead to the following variants:

w = lim_{n→∞} (1/n) Σ_{m=0}^{n−1} Σ_{ℓ=0}^{m−1} (P^ℓ r − π1 1) + π1 1,

i.e.,

wi = lim_{n→∞} (1/n) Σ_{m=0}^{n−1} Σ_{ℓ=0}^{m−1} (p_ℓ(i, 1) − π1) + π1;   (11)

z = lim_{n→∞} (1/n) Σ_{m=0}^{n−1} Σ_{ℓ=0}^{m−1} (r^T P^ℓ − π) + π,

i.e.,

zi = lim_{n→∞} (1/n) Σ_{m=0}^{n−1} Σ_{ℓ=0}^{m−1} (p_ℓ(1, i) − πi) + πi.   (12)


Note that z1 = w1, as expected. Comparing (3), (4), (11), (12), we have

z1 = (1/2) · E1[τ1(τ1 + 1)]/(E1[τ1])²,   (13)

zi = (1/2) · Ei[τi(τi + 1)]/(Ei[τi])² − E1[τi]/Ei[τi],  i ≠ 1.   (14)

Remark 1. It is worth noting that Lemma 2 can be derived under the weaker condition that P is unichain with state 1 accessible with probability 1 from all other states. This makes state 1 a recurrent state, but that may not be the case for all other states.

Collecting together the results of Lemma 2, (13), and (14), we have:

Theorem 1. If P is irreducible, the elements of the fundamental matrix G = [[g(i, j)]] are given by

g(i, i) = (1/2) · Ei[τi(τi + 1)]/(Ei[τi])²,

g(i, j) = (1/2) · Ej[τj(τj + 1)]/(Ej[τj])² − Ei[τj]/Ej[τj],  i ≠ j.

We conclude this section with a useful observation which is the counterpart of Lemma 1 for (2).

Lemma 5. The elements of the solution of (2) sum to 1. That is, Σ_i zi = 1.

Proof. This follows immediately on multiplying (2) on the right by 1.

In particular, (2) reduces to

z − zP + π = r^T.   (15)

3. DOUBLY STOCHASTIC PERTURBATIONS OF GRAPHS

A stochastic matrix P is called doubly stochastic if the transposed matrix P^T is also stochastic (i.e., the row sums of P and the column sums of P all add up to 1). A doubly stochastic deterministic policy is one that induces a doubly stochastic probability transition matrix when units are inserted in place of arcs selected by that policy and zeroes in all other places. Hence a Markov chain induced by such a policy has a probability transition matrix that is a permutation matrix. The doubly stochastic perturbation of a policy (deterministic or randomized) is achieved by passing to a singularly perturbed MDP that is obtained from the original MDP generated by the graph G by introducing perturbed transition probabilities {pε(j|i, a) | i, a, j = 1, . . . , s} defined by the rule

pε(j|i, a) := { 1 − (s − 1)ε,  if a = j and (i, a) is an arc of G,
              { ε,             if a ≠ j.


For any randomized policy f the corresponding s × s probability transition matrix Pε(f) has entries defined by the rule (for 0 < ε < 1/s²)

Pε(f)(i, j) := Σ_{a=1}^{s} pε(j|i, a) · f(i, a).
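A sketch of this rule in code (ours; the arc list encodes the graph of the Illustration in Section 1 with nodes renumbered 0, . . . , 4, and fλ randomizes at node 4):

```python
import numpy as np

def P_eps(f, arcs, s, eps):
    """P_eps(f)(i, j) = sum_a p_eps(j | i, a) f(i, a), where p_eps(j|i,a)
    equals 1-(s-1)*eps if j == a and (i, a) is an arc, and eps otherwise."""
    P = np.zeros((s, s))
    for (i, a) in arcs:
        row = np.full(s, eps)
        row[a] = 1 - (s - 1) * eps
        P[i] += f[i, a] * row
    return P

s, eps, lam = 5, 0.01, 0.3
arcs = [(0, 1), (1, 2), (2, 0), (3, 4), (3, 2), (4, 3)]
f = np.zeros((s, s))
for (i, a) in [(0, 1), (1, 2), (2, 0), (4, 3)]:
    f[i, a] = 1.0
f[3, 4], f[3, 2] = 1 - lam, lam           # randomization at node 4
P = P_eps(f, arcs, s, eps)
# row 4 matches [e, e, lam - 5*e*lam + e, e, 5*e*lam - lam - 4*e + 1], e = eps
print(P[3])
```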

Denote by Dε the convex set of doubly stochastic matrices obtained by taking the closed convex hull of the finite set Dε_e corresponding to the doubly stochastic deterministic policies. We also write Dε_e as the disjoint union Dε_d ∪ Dε_h, where Dε_h corresponds to the Hamiltonian cycles and Dε_d to disjoint unions of short cycles that cover the graph. (Double stochasticity eliminates any other possibilities.) The expression for the stationary distribution of a doubly stochastic policy implies that Ei[τi] = πi⁻¹ = s for all i ∈ S. This allows us to simplify the expressions for w1 = z1 derived in the previous section. We will at times use the superscript in wε_1 to emphasize the dependence of w1 on ε.

Lemma 6. For a policy inducing a doubly stochastic probability transition matrix we have

w1 = (1/s²) Σ_i Ei[τ1] = (1/2) · (s + 1)/s + (1/(2s²)) E1[(τ1 − s)²].   (16)

Furthermore, for policies corresponding to Dε_h, the above reduces to

wε_1 = (1/2) · (s + 1)/s + O(ε).

Proof. We have from (8)

w1 = π1 Σ_i πiEi[τ1] = (1/s²) Σ_i Ei[τ1],

since πi = 1/s ∀i. Alternatively,

w1 = (1/2) · E1[τ1(τ1 + 1)]/(E1[τ1])²
   = (1/2) · ((E1[τ1])² + E1[τ1])/(E1[τ1])² + (1/2) · E1[(τ1 − E1[τ1])²]/(E1[τ1])²
   = (1/2) · (s + 1)/s + (1/(2s²)) E1[(τ1 − s)²],

since E1[τ1] = s. For the last claim, it suffices to show that E[(τ1 − s)²] = O(ε). To see this, call a transition whose probability is ε an ε-transition. Then the probability that the Markov chain makes at least one ε-transition in s steps is at most s(s − 1)ε. Thus

P(τ1 ≠ s) ≤ s(s − 1)ε.


Furthermore, to have τ1 > ks for k ≥ 1, the chain must make at least one ε-transition in each block of s consecutive steps. Thus

P(τ1 > ks) ≤ [s(s − 1)ε]^k.

Hence

E[(τ1 − s)²] = Σ_{j≥1} j² P(|τ1 − s| = j)
             ≤ s³(s − 1)ε + Σ_{j≥1} j² P(τ1 = j + s)
             ≤ s³(s − 1)ε + Σ_{k≥1} [(k + 1)s]² P(τ1 > ks)
             ≤ s³(s − 1)ε + Σ_{k≥1} [(k + 1)s]² [s(s − 1)ε]^k
             = O(ε).
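A small simulation sketch (ours; the chain, seed, and trial count are assumptions of the illustration) shows the empirical E1[(τ1 − s)²] shrinking roughly linearly in ε for the perturbed standard HC, in line with the estimate above:

```python
import numpy as np

rng = np.random.default_rng(0)
s, trials = 5, 50_000
for eps in (0.01, 0.001):
    P = np.full((s, s), eps)
    for i in range(s):
        P[i, (i + 1) % s] = 1 - (s - 1) * eps
    total = 0.0
    for _ in range(trials):
        x, tau = 0, 0
        while True:                      # simulate one return time tau_1
            x = rng.choice(s, p=P[x])
            tau += 1
            if x == 0:
                break
        total += (tau - s) ** 2
    print(eps, total / trials)           # roughly proportional to eps
```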

For ε > 0, let Pε ∈ Dε_d be a doubly stochastic matrix that is a perturbation of a permutation matrix P0 = lim_{ε↓0} Pε.

Lemma 7. For Pε ∈ Dε_d, wε_1 → ∞ as ε ↓ 0.

Proof. Let i ∈ S lie in a short cycle of P0 not containing 1. Then a chain starting at i must make an ε-transition before ever hitting 1. Thus if ζ denotes the first time it makes an ε-transition,

Ei[τ1] ≥ Ei[ζ] = Σ_{m≥1} m(s − 1)ε(1 − (s − 1)ε)^{m−1} = 1/[(s − 1)ε].

The claim follows from (16).

Corollary 1. For sufficiently small ε > 0, all minima of wε_1 on Dε_e are attained on Dε_h.

Proof. The claim follows immediately from the preceding two lemmas.

Define the matrix norm ||A||∞ = max_{i,j} |a(i, j)| for a square matrix A = [[a(i, j)]]. Call P ∈ Dε a perturbation of a Hamiltonian cycle if there exists a P̂ ∈ Dε_h such that ||P − P̂||∞ < ε0 for a prescribed ε0 > 0. Let Dε_p denote the set of such P. Note that this depends on the choice of ε0. Choose a C > 0 so that

Cε > s³(s − 1)ε + Σ_{k≥1} [(k + 1)s]² [s(s − 1)ε]^k.


Theorem 2. For sufficiently small ε > 0, all minima of wε_1 on Dε are attained on Dε_p.

The proof uses the following lemma. In what follows we shall assume "ε sufficiently small." This will be refined from time to time by adding constraints on ε as needed. Since we had assumed that ε < 1/s², we also have that (s − 1)ε < 1/2.

Lemma 8. For a policy corresponding to P ∈ Dε\Dε_p,

wε_1 ≥ (1/2) · (s + 1)/s + Cε.

Proof. By (16),

wε_1 = (1/2) · (s + 1)/s + (1/(2s²)) E1[(τ1 − s)²].

Thus it suffices to prove that for P ∈ Dε\Dε_p,

E1[(τ1 − s)²] ≥ 2s²Cε.   (17)

Suppose not. P is a finite convex combination of elements of Dε_e. Suppose the convex combination puts a weight qk on an element of Dε_e that has a short cycle of length k < s containing 1. Then we must have

qk^s (1 − (s − 1)ε)^s (s − k)² ≤ qk^k (1 − (s − 1)ε)^k (s − k)² ≤ E1[(τ1 − s)²] < 2s²Cε.

Taking the sth root of the above we obtain, since 1 − (s − 1)ε > 1/2,

(2s²Cε)^{1/s} > (s − k)^{2/s} qk (1 − (s − 1)ε) ≥ (1/2)(s − k)^{2/s} qk.

Thus,

q̂ := Σ_{k=1}^{s−1} qk < (s² 2^{s+1} Cε)^{1/s} Σ_{i=1}^{s−1} 1/i^{2/s} =: C1 ε^{1/s}.

We assume that C1ε^{1/s} < 1. It follows that a weight of at least 1 − C1ε^{1/s} is put on elements of Dε_h. We claim that a strict convex combination of transition matrices in Dε_h with weight of at least q̂ > 0 on two distinct elements thereof will in fact have a short cycle, containing node 1, with edge weights of at least q̂(1 − (s − 1)ε). To see this, let P′ be one such matrix, putting weights of, at least, q̂ on P1, P2 ∈ Dε_h corresponding to the Hamiltonian cycles h1, h2, respectively. Order the nodes along h1, h2, respectively, as {1, x1, x2, . . . , x_{s−1}} and {1, y1, y2, . . . , y_{s−1}}. Let i = min{j > 1 : xj ≠ yj}. Then xi = yk for some k > i. Hence {1, . . . , xi, y_{k+1}, . . . , y_{s−1}, 1} defines a short cycle that contains node 1, with a probability assigned to each edge of at least q̂(1 − (s − 1)ε). Argue as above to conclude that

2s²Cε > [q̂(1 − (s − 1)ε)]^k (s − k)² ≥ [q̂(1 − (s − 1)ε)]^s ≥ (q̂/2)^s.

Hence,

q̂ < (s² 2^{s+1} Cε)^{1/s}.


Note that the cardinality of Dε_h is at most s!. Let

C2 = (s!/2)(s² 2^{s+1} C)^{1/s}.

Suppose (C1 + C2)ε^{1/s} < 1. It then follows that a weight of at least 1 − (C1 + C2)ε^{1/s} sits on a single element of Dε_h. Choose ε such that

(C1 + C2)ε^{1/s} < ε0.

This implies that there exists a P̂ ∈ Dε_h such that ||P − P̂||∞ < ε0. Then P ∈ Dε_p, a contradiction. Thus (17) holds.

Proof of Theorem 2. The claim now follows in view of our choice of C and the explicit upper bound on E[(τ1 − s)²] for P ∈ Dε_h derived in the proof of Lemma 6.

Note that, given any P ∈ Dε_p, one can find the associated Hamiltonian cycle simply by following the transitions, one per node, whose probabilities are strictly less than 1 by an amount that is at most O(ε). This motivates the nonlinear programming formulation of the Hamiltonian cycle problem given in the last section.
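In code, this extraction step can be sketched as follows (our illustration; node 1 is index 0): starting from node 1, repeatedly follow the most probable transition, which for P ∈ Dε_p traces the underlying Hamiltonian cycle.

```python
import numpy as np

def extract_cycle(P):
    """Follow the largest transition probability out of each node, starting
    at node 0; for a small doubly stochastic perturbation of a Hamiltonian
    cycle this recovers the cycle (Hamiltonian iff len == P.shape[0])."""
    cycle, node = [0], int(np.argmax(P[0]))
    while node != 0:
        cycle.append(node)
        node = int(np.argmax(P[node]))
    return cycle
```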

The foregoing uses rather liberal estimates for the various quantities involved. While these suffice for "proof of concept," their practical utility may be limited. Better estimates are possible by using asymptotic expansions. This will be the theme of the next section.

4. ASYMPTOTIC EXPRESSIONS

In this section, we derive asymptotic expressions for some of the quantities of interest to us, which further strengthen some of the results above and may be of independent interest. In order to distinguish an HC from among other deterministic policies in D, we use the (1, 1)-entry of the fundamental matrix, introduced in the previous section, to define the L-function, as in [9] and [6]:

L(Pε(f )) := [G(Pε(f ))](1, 1).

Note that for any P ∈ Dε, the unique stationary distribution is π = [1/s, 1/s, . . . , 1/s]. This simple expression for the stationary distribution enables us to derive closed form expressions for the L-function for both types, viz., Dε_h and Dε_d, of deterministic policies.

Lemma 9. For f ∈ Dε_h the value L(Pε(f)) does not depend on the particular Hamiltonian cycle represented by f and equals

L(Pε(f)) = L(Pε(fHC)) = 1/s + (s²ε − 1 + (1 − sε)^s)/(s²ε(1 − (1 − sε)^s)),

with fHC being the standard HC 1 → 2 → 3 → · · · → s → 1. In other words, the L-function is the constant function on Dε_h with the value given above.

Proof. In Blackwell's notation [2],

L(Pε(fHC)) = πε(fHC)1 + y1,

where πε(fHC) is the stationary distribution vector [1/s, 1/s, . . . , 1/s] (so that its first entry πε(fHC)1 equals 1/s) and the vector y := [y1, y2, . . . , ys]^T satisfies the system

πε(fHC) y = 0,
(I − Pε(fHC)) y = r − [πε(fHC)1, πε(fHC)1, . . . , πε(fHC)1]^T   (18)

for r = [1, 0, . . . , 0]^T and

Pε(fHC) =
⎡ ε             1 − (s − 1)ε   ε             ε             . . .   ε             ⎤
⎢ ε             ε             1 − (s − 1)ε   ε             . . .   ε             ⎥
⎢ ε             ε             ε             1 − (s − 1)ε   . . .   ε             ⎥
⎢ . . .         . . .         . . .         . . .         . . .   . . .         ⎥
⎢ ε             ε             ε             ε             . . .   1 − (s − 1)ε  ⎥
⎣ 1 − (s − 1)ε  ε             ε             ε             . . .   ε             ⎦ .

The first equation in (18) implies

y1 + · · · + ys = 0.   (19)

Consider system (18) in the form Ay = b and consider (19) as the first equation of (18), corresponding to the row A1. Adding εA1 to every consecutive row Ai, i = 2, . . . , s + 1, we obtain the matrix of (18) in the augmented form:

⎡ 1           1           1           . . .   1   1           │  0        ⎤
⎢ 1           −(1 − sε)   0           . . .   0   0           │  1 − 1/s  ⎥
⎢ 0           1           −(1 − sε)   . . .   0   0           │  −1/s     ⎥
⎢ 0           0           1           . . .   0   0           │  −1/s     ⎥
⎢ . . .       . . .       . . .       . . .   . . .           │  . . .    ⎥
⎢ 0           0           0           . . .   1   −(1 − sε)   │  −1/s     ⎥
⎣ −(1 − sε)   0           0           . . .   0   1           │  −1/s     ⎦ .

Starting with the bottom equation one derives:

ys = (1 − sε)y1 − 1/s.

The second equation then implies:

y_{s−1} = (1 − sε)ys − 1/s = (1 − sε)²y1 − (1/s)(1 + (1 − sε)),

then

y_{s−2} = (1 − sε)³y1 − (1/s)(1 + (1 − sε) + (1 − sε)²),

and so on:

y_{s−k} = (1 − sε)^{k+1}y1 − (1/s)(1 + (1 − sε) + · · · + (1 − sε)^k),

until the third top equation reads for y2:

y2 = (1 − sε)^{s−1}y1 − (1/s)(1 + (1 − sε) + · · · + (1 − sε)^{s−2}).

Hence, all components yi, i = 1, . . . , s, are expressed in terms of y1. The value of y1 we obtain from the top equation:

y1(1 + (1 − sε) + · · · + (1 − sε)^{s−1}) = (1/s)(1 + S1 + · · · + S_{s−3} + S_{s−2}),

where Sj := Σ_{k=0}^{j} (1 − sε)^k. Since

S0 = 1 = (1 − (1 − sε))/(sε),  S1 = 2 − sε = (1 − (1 − sε)²)/(sε),
S2 = (1 − (1 − sε)³)/(sε),  . . . ,  S_{s−2} = (1 − (1 − sε)^{s−1})/(sε),

the sum

Σ_{j=0}^{s−2} Sj = (1/(sε)) ((s − 1) − Σ_{k=1}^{s−1} (1 − sε)^k)
               = (1/(sε)) (s − 1 − ((1 − sε) − (1 − sε)^s)/(sε))
               = (1/(s²ε²)) (s²ε − 1 + (1 − sε)^s).

It now follows that

y1 = (1/s) Σ_{j=0}^{s−2} Sj / S_{s−1} = [(1/(s³ε²))(s²ε − 1 + (1 − sε)^s)] / [(1/(sε))(1 − (1 − sε)^s)] = (s²ε − 1 + (1 − sε)^s)/(s²ε(1 − (1 − sε)^s)).

And hence,

L(Pε(fHC)) = πε(fHC)1 + y1 = 1/s + (s²ε − 1 + (1 − sε)^s)/(s²ε(1 − (1 − sε)^s)).

The series expansion of the above form of L(Pε(fHC)) leads to the following:

Corollary 2. For a policy fHC defining a Hamiltonian cycle,

lim_{ε→0} L(Pε(fHC)) = 1/2 + 1/(2s).

Moreover, the initial terms of the Taylor expansion of L(Pε(fHC)) are

L(Pε(fHC)) = (1/2) · (1 + s)/s + ((s² − 1)/12) ε + ((s(s² − 1))/24) ε² + O(ε³).


Example 1. The matrix Pε(fHC) for s = 4 has the form

Pε(fHC) =
⎡ ε        1 − 3ε   ε        ε      ⎤
⎢ ε        ε        1 − 3ε   ε      ⎥
⎢ ε        ε        ε        1 − 3ε ⎥
⎣ 1 − 3ε   ε        ε        ε      ⎦ .

The corresponding matrix

I − Pε(fHC) + P∗ε(fHC) =
⎡ 5/4 − ε     −3/4 + 3ε   1/4 − ε     1/4 − ε    ⎤
⎢ 1/4 − ε     5/4 − ε     −3/4 + 3ε   1/4 − ε    ⎥
⎢ 1/4 − ε     1/4 − ε     5/4 − ε     −3/4 + 3ε  ⎥
⎣ −3/4 + 3ε   1/4 − ε     1/4 − ε     5/4 − ε    ⎦ .

And, hence, the fundamental matrix is

(I − Pε(fHC) + P∗ε(fHC))⁻¹ =
⎡ a  b  c  d ⎤
⎢ d  a  b  c ⎥
⎢ c  d  a  b ⎥
⎣ b  c  d  a ⎦ ,

where, with the common denominator D(ε) = 8(6ε − 1 − 16ε² + 16ε³),

a = (−5 + 20ε − 40ε² + 32ε³)/D(ε),
b = (−3 + 20ε − 40ε² + 32ε³)/D(ε),
c = (−1 + 12ε − 40ε² + 32ε³)/D(ε),
d = (1 − 4ε − 8ε² + 32ε³)/D(ε).

The left-hand corner value of the fundamental matrix, a = (1/8)(−5 + 20ε − 40ε² + 32ε³)/(6ε − 1 − 16ε² + 16ε³), agrees with the formula in Lemma 9.
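The closed form of Lemma 9 and the limit in Corollary 2 are easy to confirm numerically; the sketch below (ours) compares the formula with a direct computation of the (1,1)-entry for several sizes s:

```python
import numpy as np

def L_direct(s, eps):
    """(1,1)-entry of the fundamental matrix of the perturbed standard HC."""
    P = np.full((s, s), eps)
    for i in range(s):
        P[i, (i + 1) % s] = 1 - (s - 1) * eps
    G = np.linalg.inv(np.eye(s) - P + np.full((s, s), 1.0 / s))
    return G[0, 0]

def L_lemma9(s, eps):
    q = (1 - s * eps) ** s
    return 1.0 / s + (s**2 * eps - 1 + q) / (s**2 * eps * (1 - q))

for s, eps in [(4, 0.01), (7, 0.001), (10, 0.0001)]:
    assert abs(L_direct(s, eps) - L_lemma9(s, eps)) < 1e-9
    # Corollary 2: L -> 1/2 + 1/(2s) as eps -> 0
    print(s, L_lemma9(s, eps), 0.5 + 0.5 / s)
```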

Let now a policy gℓ ∈ Dε_e correspond to ℓ + 1 disjoint cycles of the sizes k0, k1, . . . , kℓ, with Σ_{j=0}^{ℓ} kj = s.

Lemma 10. The value L(Pε(gℓ)) depends only on ε and the size k0 of the cycle of gℓ that contains the home node, and equals

L(Pε(gℓ)) = 1/s + (1/s) · (s − k0)/(1 − (1 − sε)^{k0}) + (1/(s²ε)) · (k0 sε − 1 + (1 − sε)^{k0})/(1 − (1 − sε)^{k0}).

Proof. To avoid messy indexed notations, we demonstrate our calculations for the case ℓ = 1, in which situation gℓ represents two independent cycles

1 → 2 → · · · → k → 1  and  k + 1 → k + 2 → · · · → s → k + 1.

As π(gℓ) = [1/s, 1/s, . . . , 1/s], the first equation of the system analogous to (18) implies as before:

y1 + · · · + ys = 0.


The reduced second equation of such a system leads to the following expressions for yj, j = 1, . . . , k, via y1:

yk = (1 − sε)y1 − 1/s,
y_{k−1} = (1 − sε)yk − 1/s = (1 − sε)²y1 − (1/s)(1 + (1 − sε)),
y_{k−2} = (1 − sε)³y1 − (1/s)(1 + (1 − sε) + (1 − sε)²), . . . ,
y_{k−j} = (1 − sε)^{j+1}y1 − (1/s)(1 + (1 − sε) + · · · + (1 − sε)^j), . . . ,
y2 = (1 − sε)^{k−1}y1 − (1/s)(1 + (1 − sε) + · · · + (1 − sε)^{k−2}).

Hence,

y1 + y2 + · · · + yk = y1(1 + (1 − sε) + · · · + (1 − sε)^{k−1}) − (1/s)(1 + S1 + · · · + S_{k−2}) = y1 S_{k−1} − (1/s) Σ_{j=0}^{k−2} Sj.

The defining equations for yj, j = k + 1, . . . , s, read:

ys = (1 − sε)y_{k+1} − 1/s,
y_{s−1} = (1 − sε)ys − 1/s = (1 − sε)²y_{k+1} − (1/s)(1 + (1 − sε)),
y_{s−2} = (1 − sε)³y_{k+1} − (1/s)(1 + (1 − sε) + (1 − sε)²), . . . ,
y_{s−j} = (1 − sε)^{j+1}y_{k+1} − (1/s)Sj, . . . ,
y_{k+2} = (1 − sε)^{s−k−1}y_{k+1} − (1/s)S_{s−k−2},

and

y_{k+1} = (1 − sε)y_{k+2} − 1/s,

where the notation Sj is the same as in the proof of the previous lemma. The expressions for y_{k+1} and y_{k+2} further imply

y_{k+1} = (1 − sε)^{s−k}y_{k+1} − (1/s)S_{s−k−1},

so

y_{k+1}(1 − (1 − sε)^{s−k}) = −(1/s) · (1 − (1 − sε)^{s−k})/(sε);

therefore,

y_{k+1} = −1/(s²ε),


and so, in succession,

ys = (1 − sε)y_{k+1} − 1/s = −1/(s²ε) = y_{k+1},
y_{s−1} = (1 − sε)ys − 1/s = −1/(s²ε), . . . ,
y_{k+2} = (1 − sε)y_{k+3} − 1/s = −1/(s²ε).

Hence, the sum

y_{k+1} + y_{k+2} + · · · + ys = −(s − k)/(s²ε).

Combining both sums y1 + y2 + · · · + yk and y_{k+1} + y_{k+2} + · · · + ys and equating the entire sum Σ_{j=1}^{s} yj to 0, we derive the defining equation for y1:

y1 S_{k−1} = (s − k)/(s²ε) + (1/s) Σ_{j=0}^{k−2} Sj.

Since for every m ≥ 2 the sum

Σ_{j=0}^{m−2} Sj = (1/(sε)) (m − 1 − ((1 − sε) − (1 − sε)^m)/(sε)) = (1/(s²ε²)) (msε − 1 + (1 − sε)^m),

then, after multiplying through by sε (recall S_{k−1} = (1 − (1 − sε)^k)/(sε)), the above equation for y1 reads

y1(1 − (1 − sε)^k) = (s − k)/s + (1/(s²ε))(ksε − 1 + (1 − sε)^k).

Hence

y1 = (1/s) · (s − k)/(1 − (1 − sε)^k) + (1/(s²ε)) · (ksε − 1 + (1 − sε)^k)/(1 − (1 − sε)^k).

The value of L(Pε(gℓ)) is now obtained by adding 1/s to y1. The generalization to the situation where there are more than two cycles is straightforward.

Corollary 3. For a policy gℓ ∈ Dε_d,

lim_{ε→0} L(Pε(gℓ)) = ∞.

Moreover, the functional L(Pε(gℓ)) has a pole of order 1 at ε = 0, and the initial terms of its Laurent expansion (writing k = k0 for the length of the cycle containing the home node) are

L(Pε(gℓ)) = ((s − k)/(s²k)) ε⁻¹ + (2k + ks − s)/(2ks) + (s(k − 1)(1 + k)/(12k)) ε + (s²(k − 1)(1 + k)/(24k)) ε² + O(ε³).


Example 2. For the same size s = 4 as in Example 1, the perturbed policy Pε(g(2,2)) that represents two cycles of size 2 has the form:

Pε(g(2,2)) =
⎡ ε        1 − 3ε   ε        ε      ⎤
⎢ 1 − 3ε   ε        ε        ε      ⎥
⎢ ε        ε        ε        1 − 3ε ⎥
⎣ ε        ε        1 − 3ε   ε      ⎦ .

The corresponding matrix

I − Pε(g(2,2)) + P∗ε(g(2,2)) =
⎡ 5/4 − ε     −3/4 + 3ε   1/4 − ε     1/4 − ε    ⎤
⎢ −3/4 + 3ε   5/4 − ε     1/4 − ε     1/4 − ε    ⎥
⎢ 1/4 − ε     1/4 − ε     5/4 − ε     −3/4 + 3ε  ⎥
⎣ 1/4 − ε     1/4 − ε     −3/4 + 3ε   5/4 − ε    ⎦ .

So, the fundamental matrix is

(I − Pε(g(2,2)) + P∗ε(g(2,2)))⁻¹ =
⎡ A  B  C  C ⎤
⎢ B  A  C  C ⎥
⎢ C  C  A  B ⎥
⎣ C  C  B  A ⎦ ,

where

A = (8ε² − 6ε − 1)/(16ε(−1 + 2ε)),  B = (8ε² + 2ε − 1)/(16ε(−1 + 2ε)),  C = (4ε − 1)/(16ε).

The left-hand corner element, A = (8ε² − 6ε − 1)/(16ε(−1 + 2ε)), clearly agrees with the value given by the formula in Lemma 10.
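Analogously, the formula of Lemma 10 can be checked against a direct computation (our sketch; perm encodes the two 2-cycles of Example 2 on nodes 0, . . . , 3):

```python
import numpy as np

def L_lemma10(s, k0, eps):
    q = (1 - s * eps) ** k0
    return (1.0 / s + (s - k0) / (s * (1 - q))
            + (k0 * s * eps - 1 + q) / (s**2 * eps * (1 - q)))

s, eps = 4, 0.01
perm = [1, 0, 3, 2]                      # cycles (1 2) and (3 4), 0-indexed
P = np.full((s, s), eps)
for i in range(s):
    P[i, perm[i]] = 1 - (s - 1) * eps
G = np.linalg.inv(np.eye(s) - P + np.full((s, s), 1.0 / s))
assert abs(G[0, 0] - L_lemma10(s, k0=2, eps=eps)) < 1e-9
```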

5. OPTIMIZATION PROBLEMS

The foregoing discussion shows that the Hamiltonian cycle problem is equivalent to the existence of a feasible solution (w, f) to the following system of equations:

(a) wi − Σ_{j,a} f(i, a) pε(j|i, a) wj + 1/s = δ_{i,1} ∀i,
(b) Σ_i wi = 1,
(c) f(i, a) ≥ 0, Σ_b f(i, b) = 1 ∀i, a,
(d) Σ_{i,a} f(i, a) pε(j|i, a) = 1 ∀j,
(e) w1 = L(Pε(fHC)) ≥ wj ≥ L(Pε(fHC)) − 1 ∀j ≠ 1.

Equation (a) is a restatement of (1) for a doubly stochastic P, and (b) that of Lemma 1, in view of the fact that πi = 1/s ∀i. Equation (c) identifies f as a randomized policy and (d) identifies Pε(f) as being doubly stochastic, as warranted by (a), (b). Note that (a) alone is the Poisson equation for the transition matrix Pε(f) and "cost" δ_{i,1}, which fixes w up to an additive constant. Equation (b) then renders it unique by forcing it to coincide with (I − P + P∗)⁻¹ r for P = Pε(f). It is (e) which forces f to correspond to a Hamiltonian cycle. The inequalities in (e) follow from the expressions in Theorem 1 and the observation that Ei[τ1] is within O(ε) of some j, 1 ≤ j < s, for a perturbation of a Hamiltonian cycle.


A “dual” feasibility problem can be stated in terms of (z, f) instead of (w, f) as

(a′) zi − Σ_{j,a} f(j, a) pε(i|j, a) zj + 1/s = δ_{i,1} ∀i,
(b′) Σ_i zi = 1,
(c′) f(i, a) ≥ 0, Σ_b f(i, b) = 1 ∀i, a,
(d′) Σ_{i,a} f(i, a) pε(j|i, a) = 1 ∀j,
(e′) z1 = L(Pε(fHC)) ≥ zj ≥ L(Pε(fHC)) − 1 ∀j ≠ 1.

Here (a′) is the Poisson equation for the transition matrix Pε(f)^T [also stochastic, due to the double stochasticity of Pε(f)] and cost δ_{i,1}, and (b′) follows from Lemma 5. The rationale for (a′)–(e′) is thus similar to that for (a)–(e). The inequalities in (e′) ensure a priori that the search domain for the optimization problem introduced below is bounded.

Setting x(i, a) = zi f(i, a) ∀i, a, one may rewrite (a′), (b′), (e′), respectively, as

(i) Σ_a x(i, a) − Σ_{j,a} x(j, a) pε(i|j, a) + 1/s = δ_{i,1} ∀i,
(ii) Σ_{i,a} x(i, a) = 1,
(iii) Σ_a x(1, a) = L(Pε(fHC)) ≥ Σ_a x(i, a) ≥ L(Pε(fHC)) − 1 ∀i ≠ 1.

These must be supplemented with (c′), (d′) along with the "consistency condition"

(iv) x(i, a) = (Σ_b x(i, b)) f(i, a) ∀i, a.

Condition (iv) captures the nonlinearity. This suggests the nonlinear program:

Minimize Σ_{i,a} (x(i, a) − (Σ_b x(i, b)) f(i, a))²,

subject to (i), (ii), (iii), (c′), (d′).

Note that the constraints are linear and the objective function is separately convex in [[x(i, a)]] and [[f(i, a)]]. Its global minimum value of zero is attained for f ∼ fHC if ε is sufficiently small, whenever a Hamiltonian cycle exists.
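As a sanity check rather than a full solver run, the sketch below (ours; the perturbed standard HC on the complete graph, and all variable names, are assumptions of the illustration) verifies that x(i, a) = zi f(i, a) satisfies constraints (i) and (ii) and drives the objective to zero:

```python
import numpy as np

s, eps = 5, 1e-4
f = np.zeros((s, s))                     # f_HC: at node i choose arc (i, i+1 mod s)
for i in range(s):
    f[i, (i + 1) % s] = 1.0
P = np.full((s, s), eps)                 # = P_eps(f_HC) on the complete graph
for i in range(s):
    P[i, (i + 1) % s] = 1 - (s - 1) * eps

G = np.linalg.inv(np.eye(s) - P + np.full((s, s), 1.0 / s))
r = np.zeros(s); r[0] = 1.0
z = r @ G                                # solves (2); z[0] = L(P_eps(f_HC))
x = z[:, None] * f                       # x(i, a) = z_i f(i, a)

row = x.sum(axis=1)                      # sum_a x(i, a), equals z here
print(np.allclose(row - row @ P + 1.0 / s, r))   # constraint (i) holds
print(abs(x.sum() - 1.0) < 1e-12)                # constraint (ii) holds
print(((x - row[:, None] * f) ** 2).sum())       # objective (iv) equals 0
```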

ACKNOWLEDGEMENT

The authors wish to acknowledge many valuable suggestions made by two expert (anonymous) referees. In particular, the present proofs of Lemmas 6 and 7 and Theorem 2 are along the lines suggested by one of the referees.

REFERENCES

[1] M. Andramonov, J. A. Filar, A. Rubinov, and P. Pardalos. "Hamiltonian cycle problem via Markov chains and min-type approaches." Approximation and complexity in numerical optimization: Continuous and discrete problems, Ed. P. M. Pardalos, 31–47. Kluwer Academic, Dordrecht, 2000.

[2] D. Blackwell. Discrete dynamic programming, Ann Math Statist 33 (1962), 719–726.

[3] V. S. Borkar. Probability theory: An advanced course, Springer-Verlag, New York, 1995.


[4] M. Chen and J. A. Filar. "Hamiltonian cycles, quadratic programming and ranking of extreme points," Recent advances in global optimization, Eds., C. A. Floudas and P. M. Pardalos, 32–49. Princeton University Press, Princeton, NJ, 1992.

[5] V. Ejov, J. A. Filar and M. Nguyen. Hamiltonian cycles and singularly perturbed Markov chains, Math Oper Res 29(1) (2004), 114–131.

[6] V. Ejov, J. A. Filar and J. M. Gondzio. An interior point heuristic for the Hamiltonian cycle problem via Markov decision processes, Global Optim (2004), to appear.

[7] E. Feinberg. Constrained discounted Markov decision processes with Hamiltonian cycles, Math Oper Res 25 (2000), 130–140.

[8] J. A. Filar and J.-B. Lasserre. A non-standard branch and bound method for the Hamiltonian cycle problem, ANZIAM J 42(E) (2000), 556–577.

[9] J. A. Filar and K. Liu. "Hamiltonian cycle problem and singularly perturbed decision process," Statistics, probability and game theory: Papers in honor of David Blackwell, IMS Lecture Notes - Monograph Series, USA, 30 (1996), 44–63.

[10] J. A. Filar and D. Krass. Hamiltonian cycles and Markov chains, Math Oper Res 19 (1994), 223–227.

[11] R. Karp. Probabilistic analysis of partitioning algorithms for the travelling-salesman problem in the plane, Math Oper Res 2(3) (1977), 209–224.

[12] J. G. Kemeny and J. L. Snell. Finite Markov chains, Van Nostrand, Princeton, NJ, 1960.

[13] J. W. Pitman. Occupation measures for Markov chains, Adv Appl Probab 9 (1977), 69–86.

[14] R. Robinson and N. Wormald. Almost all regular graphs are Hamiltonian, Random Structures Algorithms 5(2) (1994), 363–374.

[15] R. J. Wilson. Introduction to graph theory, Longman Scientific and Technical, Harlow, United Kingdom, 1996.