
Journal of Algorithms 35, 17-49 (2000)
doi:10.1006/jagm.1999.1071, available online at http://www.idealibrary.com

On Markov Chains for Independent Sets¹

Martin Dyer and Catherine Greenhill

School of Computer Studies, University of Leeds, Leeds LS2 9JT, United Kingdom

Received December 20, 1997

Random independent sets in graphs arise, for example, in statistical physics, in the hard-core model of a gas. In 1997, Luby and Vigoda described a rapidly mixing Markov chain for independent sets, which we refer to as the Luby–Vigoda chain. A new rapidly mixing Markov chain for independent sets is defined in this paper. Using path coupling, we obtain a polynomial upper bound for the mixing time of the new chain for a certain range of values of the parameter λ. This range is wider than the range for which the mixing time of the Luby–Vigoda chain is known to be polynomially bounded. Moreover, the upper bound on the mixing time of the new chain is always smaller than the best known upper bound on the mixing time of the Luby–Vigoda chain for larger values of λ (unless the maximum degree Δ of the graph is 4). An extension of the chain to independent sets in hypergraphs is described. This chain gives an efficient method for approximately counting the number of independent sets of hypergraphs with maximum degree two, or with maximum degree three and maximum edge size three. Finally, we describe a method which allows one, under certain circumstances, to deduce the rapid mixing of one Markov chain from the rapid mixing of another, with the same state space and stationary distribution. This method is applied to two Markov chains for independent sets, a simple insert/delete chain and the new chain, to show that the insert/delete chain is rapidly mixing for a wider range of values of λ than was previously known. © 2000 Academic Press

1. INTRODUCTION

An independent set in a graph G is a subset of vertices, no two of which are adjacent. In statistical physics, independent sets arise in the hard-core model of a gas, where a vertex is a possible site for a particle and adjacent sites cannot be simultaneously occupied. The probability π(X) that a particular configuration X is observed is proportional to λ^|X| for some positive parameter λ. The normalizing factor, Z(λ), is called the partition

¹ Supported by ESPRIT Working Group RAND2.


0196-6774/00 $35.00
Copyright © 2000 by Academic Press
All rights of reproduction in any form reserved.


function of the system. The two main tasks are evaluating the partition function and sampling according to the probability distribution π. Both of these problems can be solved in polynomial time using the Markov chain Monte Carlo method whenever a rapidly mixing Markov chain for independent sets is available (see, for example, [18]).

The simplest Markov chain for independent sets is the so-called insert/delete chain. This chain was only known to be rapidly mixing for small values of λ. A slightly more complicated chain was proposed by Luby and Vigoda [21]. We refer to this chain as the Luby–Vigoda chain. A rapidly mixing Markov chain for independent sets of fixed size was introduced by Bubley and Dyer [3]; in this paper we concentrate on results relating to the set I(G) of all independent sets in the given graph.

In [21], Luby and Vigoda used the coupling method to establish the rapid mixing of their chain. Here we state tighter bounds on the mixing time of the Luby–Vigoda chain which can be established using the path coupling method [3]. Then a new Markov chain for independent sets is defined. This chain was independently discovered by Luby, Mitzenmacher, and Vigoda, as mentioned in [20].

The new chain is an improvement on the Luby–Vigoda chain in two areas. The upper bound on the mixing time of the new chain is polynomial for a wider range of values of λ than the best known upper bound on the mixing time of the Luby–Vigoda chain. Moreover, the new chain has better time bounds than the Luby–Vigoda chain for "large" values of λ (unless the maximum degree Δ of the graph is 4). We do not know whether this apparent improvement is an artifact of the analysis, or whether the new chain really is more rapidly mixing (or rapidly mixing for a wider class of graphs) than the Luby–Vigoda chain.

To compare the time bounds of the new chain and the Luby–Vigoda chain, we calculate the ratio of the best known upper bounds of the respective mixing times. In fact we show that this ratio tends to infinity as λ increases. The Luby–Vigoda chain has better time bounds on Δ-regular graphs for small values of λ, but never by more than a factor of two. While comparing best known upper bounds on the mixing times is not entirely satisfactory (we cannot claim, for example, that one chain mixes more rapidly than the other), in the context of Markov chain Monte Carlo algorithms it is the upper bound which determines the number of steps of the chain to be simulated per sample.

Like the Luby–Vigoda chain, the new chain can be used to approximately count independent sets in graphs of degree at most four, and we give better time bounds for this than those given in [21]. Specifically, the time bound given in [21] for this task is O(n³ log(n)), while our chain has time bound O(n log(n)) if the graph has maximum degree three, and


O(n² log(n)) if the graph has maximum degree four, where n is the number of vertices of the graph.

The new chain is easily extended to act on independent sets of hypergraphs. The upper bound on the mixing time increases with the size of the largest edge in the hypergraph. We show that the new chain can be used to approximately count (in polynomial time) independent sets in hypergraphs where either each vertex is contained in at most two edges, or each vertex is contained in at most three edges and each edge contains at most three vertices. The former problem is related to the problem of approximately counting edge covers of a graph. The bound on the mixing time of the new chain is Õ(n²) smaller than the bound on the mixing time of the only previously available Markov chain for edge covers in graphs (where the Õ(·) notation hides factors of log(n)). As far as we are aware, the Markov chain presented here is the first which can be used to approximately count the number of independent sets in hypergraphs of maximum degree three, maximum edge size three. We show that the problem of counting independent sets in graphs with maximum degree 3 is #P-complete, and that it is #P-complete to count independent sets in hypergraphs where every edge has at most three vertices and each vertex is contained in at most three edges.

To conclude, we show how the rapid mixing of one Markov chain can, under certain conditions, be used to deduce the rapid mixing of another Markov chain with the same state space and stationary distribution. This method is useful in the following situation. Suppose that a known Markov chain for a given state space is rapidly mixing. If a simpler Markov chain exists with the same state space and stationary distribution, one might prefer to work with this simple chain when performing simulations (such as in a Markov chain Monte Carlo algorithm). If the specified conditions are met, we can deduce that the simple chain is also rapidly mixing.

The method involves relating the spectral gaps of the two chains, using a linear relationship between the entries of the transition matrices of the chains. It gives a simple alternative to the approaches described in [7-9, 25]. We illustrate our method using the insert/delete chain and the new chain. This allows us to demonstrate that the simple insert/delete chain is rapidly mixing for a much wider range of values of λ than was previously known.

The plan of the paper is as follows. In the remainder of the Introduction we review the path coupling method. In Section 2 we introduce some notation and describe the insert/delete chain and the Luby–Vigoda chain. The new chain is described in Section 3, and a proof by path coupling is given to show that it is rapidly mixing for a wider range of values of λ than the Luby–Vigoda chain is known to be. An extension of the new chain to independent sets of hypergraphs is described in Section 4. Classes of


hypergraphs for which this chain mixes rapidly when λ = 1 are discussed. In Section 5 we develop and apply the new method referred to above which allows us, under certain conditions, to deduce the rapid mixing of one Markov chain from the rapid mixing of another with the same state space and stationary distribution.²

1.1. A Review of Path Coupling

Let Ω be a finite set and let M be a Markov chain with state space Ω, transition matrix P, and unique stationary distribution π. In order for a Markov chain to be useful for almost uniform sampling or approximate counting, it must converge quickly toward its stationary distribution π. We make this notion more precise below. If the initial state of the Markov chain is x then the distribution of the chain at time t is given by P_x^t(y) = P^t(x, y). The total variation distance of the Markov chain from π at time t, with initial state x, is defined by

    d_TV(P_x^t, π) = (1/2) Σ_{y ∈ Ω} |P^t(x, y) − π(y)|.

Following Aldous [1], let τ_x(ε) denote the least value T such that d_TV(P_x^t, π) ≤ ε for all t ≥ T. The mixing time of M, denoted by τ(ε), is defined by τ(ε) = max{τ_x(ε) : x ∈ Ω}. A Markov chain will be said to be rapidly mixing if the mixing time is bounded above by some polynomial in n and log(ε⁻¹), where n is a measure of the size of the elements of Ω. (Throughout this paper all logarithms are to base e.) Sometimes it is convenient to specify an initial distribution ψ for the Markov chain, rather than an initial state. Then the total variation distance at time t is given by

    d_TV(P_ψ^t, π) = (1/2) Σ_{y ∈ Ω} |Σ_{z ∈ Ω} ψ(z) P^t(z, y) − π(y)|.
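These definitions are easy to check numerically on a toy example. The sketch below is our own illustration (the three-state transition matrix is made up, and is not one of the chains studied in this paper); it computes d_TV and the mixing time τ(ε) by iterating the chain from every start state:

```python
import itertools

# A small illustrative transition matrix (not from the paper); rows are
# states, entries are transition probabilities.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]  # stationary distribution: pi P = pi

def step(dist, P):
    """One step of the chain on distributions: dist -> dist P."""
    n = len(P)
    return [sum(dist[z] * P[z][y] for z in range(n)) for y in range(n)]

def tv_distance(p, q):
    """d_TV(p, q) = (1/2) * sum_y |p(y) - q(y)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mixing_time(P, pi, eps):
    """Least T such that d_TV(P_x^t, pi) <= eps for every start state x
    and all t >= T."""
    n = len(P)
    # One distribution per initial state, each started as a point mass.
    dists = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in itertools.count():
        if max(tv_distance(d, pi) for d in dists) <= eps:
            return t
        dists = [step(d, P) for d in dists]
```

Because max_x d_TV(P_x^t, π) is non-increasing in t for a time-homogeneous chain, the first time the worst-case distance drops below ε is already the mixing time.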

² Note added in proof: Since submitting this paper, we have learnt that Luby and Vigoda [20] have analyzed the insert/delete chain, using path coupling with respect to a specially defined metric. They have shown that the insert/delete chain is rapidly mixing for the same range of values as the new chain described in this paper. (The proof in [20] only considers triangle-free graphs, and the proof for arbitrary graphs uses coupling.) The upper bound on the mixing time that they obtain (in the triangle-free case) is a constant times larger than the upper bound on the mixing time that we prove for the new chain. For example, suppose that the maximum degree of the graph is 4. As λ tends to 1, the upper bound on the mixing time of the Markov chain given in [20] is almost three times bigger than the upper bound on the mixing time of the chain described in this paper.


One can easily prove that

    d_TV(P_ψ^t, π) ≤ max{d_TV(P_x^t, π) : x ∈ Ω}        (1)

for all initial distributions ψ.

There are relatively few methods available to prove that a Markov chain is rapidly mixing. One such method is coupling. A coupling for M is a stochastic process (X_t, Y_t) on Ω × Ω such that each of (X_t), (Y_t), considered marginally, is a faithful copy of M. The Coupling Lemma (see, for example, Aldous [1]) states that the total variation distance of M at time t is bounded above by Prob[X_t ≠ Y_t], the probability that the process has not coupled. The difficulty in applying the Coupling Lemma lies in obtaining an upper bound for this probability. In the path coupling method, introduced by Bubley and Dyer [3], one need only define and analyze a coupling on a subset S of Ω × Ω. Choosing the set S carefully can considerably simplify the arguments involved in proving rapid mixing by coupling. The path coupling method is described in the next theorem, taken from [12]. We use the term path to describe a sequence of elements of the state space, which need not necessarily be a sequence of possible transitions of the Markov chain.

THEOREM 1.1. Let δ be an integer-valued metric defined on Ω × Ω which takes values in {0, . . . , D}. Let S be a subset of Ω × Ω such that for all (X_t, Y_t) ∈ Ω × Ω there exists a path

    X_t = Z_0, Z_1, . . . , Z_r = Y_t

between X_t and Y_t such that (Z_l, Z_{l+1}) ∈ S for 0 ≤ l < r and

    Σ_{l=0}^{r−1} δ(Z_l, Z_{l+1}) = δ(X_t, Y_t).

Define a coupling (X, Y) → (X′, Y′) of the Markov chain M on all pairs (X, Y) ∈ S. Suppose that there exists β ≤ 1 such that E[δ(X′, Y′)] ≤ β δ(X, Y) for all (X, Y) ∈ S. If β < 1 then the mixing time τ(ε) of M satisfies

    τ(ε) ≤ log(Dε⁻¹) / (1 − β).

If β = 1 and there exists α > 0 such that Prob[δ(X_{t+1}, Y_{t+1}) ≠ δ(X_t, Y_t)] ≥ α for all t, then

    τ(ε) ≤ ⌈eD²/α⌉ ⌈log(ε⁻¹)⌉.


Remark 1.1. In many applications the set S is defined by

    S = {(X, Y) ∈ Ω × Ω : δ(X, Y) = 1}.

Here one need only define and analyze a coupling on pairs at distance 1 apart.

2. KNOWN MARKOV CHAINS FOR INDEPENDENT SETS IN GRAPHS

Let G = (V, E) be a graph. A subset X of V is an independent set if {v, w} ∉ E for all v, w ∈ X. Let I(G) be the set of all independent sets in a given graph G and let λ be a positive number. The partition function Z = Z(λ) is defined by

    Z = Z(λ) = Σ_{X ∈ I(G)} λ^|X|.

The function π defined by

    π(X) = λ^|X| / Z

is a probability measure on I(G). The two main tasks are to approximately evaluate the partition function Z(λ) and to approximately sample from I(G) according to the distribution π. When λ = 1 the task of approximately evaluating Z(λ) is equivalent to approximately counting I(G). Note that it is NP-hard to compute Z(λ) to within any polynomial factor whenever λ > c/Δ for some constant c > 0 (unless NP = RP). For a proof of this result see [21, Theorem 4].

For the remainder of the paper we will assume that the maximum degree Δ of the graph G is at least 3. This assumption is justified by the following theorem, the proof of which is easy and omitted.

THEOREM 2.1. Let G be a graph with maximum degree Δ and let I(G) be the set of all independent sets in G. Suppose that 0 ≤ Δ ≤ 2. Then we can evaluate Z(λ) exactly in polynomial time for all λ > 0. Moreover there exists a polynomial-time procedure for sampling from I(G) according to the distribution π.
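Behind Theorem 2.1 is the observation that when Δ ≤ 2 every connected component of G is an isolated vertex, a path, or a cycle, and Z(λ) factors over components. A minimal sketch of the component computations (our own illustration; the function names are ours, the recurrences are the standard ones for independence polynomials, and the paper's proof is omitted):

```python
def z_path(k, lam):
    """Z(lambda) for a path on k vertices: p_k = p_{k-1} + lam * p_{k-2},
    obtained by conditioning on whether the last vertex is in the
    independent set (p_0 = 1 for the empty path)."""
    p_prev, p = 1.0, 1.0
    for _ in range(k):
        p_prev, p = p, p + lam * p_prev
    return p

def z_cycle(k, lam):
    """Z(lambda) for a cycle on k >= 3 vertices: condition on one fixed
    vertex v.  If v is out, the rest is a path on k-1 vertices; if v is
    in (weight lam), both of its neighbours are out and the rest is a
    path on k-3 vertices."""
    return z_path(k - 1, lam) + lam * z_path(k - 3, lam)
```

Z(λ) for the whole graph is then the product of these values over its components, with a factor 1 + λ for each isolated vertex.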

For Δ ≥ 3 the tasks of approximately evaluating Z(λ) and approximately sampling from I(G) according to π can be performed using a rapidly mixing Markov chain with state space I(G) and stationary distribution π. For a description of how this can be achieved, see for example [18].


The simplest Markov chain on I(G) which converges to the stationary distribution π is the so-called insert/delete chain. If X_t is the state at time t then the state at time t + 1 is determined by the following procedure:

(i) choose a vertex v uniformly at random from V,
(ii) (Delete) if v ∈ X_t then let X_{t+1} = X_t \ {v} with probability 1/(1 + λ),
     (Insert) if v ∉ X_t and v has no neighbors in X_t then let X_{t+1} = X_t ∪ {v} with probability λ/(1 + λ),
     otherwise let X_{t+1} = X_t.
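One transition of the insert/delete chain can be sketched as follows (an illustrative implementation of the procedure above; the adjacency-dict representation and the function name are our own choices):

```python
import random

def insert_delete_step(X, adj, lam, rng=random):
    """One transition of the insert/delete chain.
    X: current independent set (a frozenset of vertices);
    adj: dict mapping each vertex to the set of its neighbours;
    lam: the parameter lambda > 0."""
    v = rng.choice(sorted(adj))                    # (i) uniform random vertex
    if v in X:
        if rng.random() < 1.0 / (1.0 + lam):       # delete
            return X - {v}
    elif not (adj[v] & X):
        if rng.random() < lam / (1.0 + lam):       # insert
            return X | {v}
    return X                                       # otherwise stay put
```

Starting from the empty set (which is always independent) and iterating this step simulates the chain; every state visited remains an independent set.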

This chain is easily shown to be ergodic with stationary distribution equal to π. Given X, Y ∈ I(G), let H(X, Y) denote the Hamming distance between X and Y, which equals |X \ Y| + |Y \ X|. A bound on the mixing time of the insert/delete chain is stated below. This bound is obtained using the path coupling method on states at distance 1 apart. The details are omitted.

THEOREM 2.2. Let G be a graph with maximum degree Δ. The insert/delete Markov chain is rapidly mixing for λ ≤ 1/(Δ − 1). When λ < 1/(Δ − 1) the mixing time τ(ε) of the insert/delete chain satisfies

    τ(ε) ≤ ((1 + λ) / (1 − λ(Δ − 1))) n log(nε⁻¹).

When λ = 1/(Δ − 1) the mixing time satisfies

    τ(ε) ≤ ⌈2n²e(1 + λ)(log n + 1)⌉ ⌈log(ε⁻¹)⌉.

Remark 2.1. Given the above bound on the mixing time of the insert/delete chain, we can only guarantee rapid mixing at λ = 1 when the input graph has maximum degree 2. Since |I(G)| can be calculated exactly for these graphs (see Theorem 2.1), this would suggest that the insert/delete chain is not useful in terms of approximate counting. However, the results of Section 5 below show that the insert/delete chain is rapidly mixing for a wider range of values of λ than stated above. In particular, we show that the insert/delete chain can be used to approximately count independent sets in graphs with maximum degree at most four.³

³ Note added in proof: Since submission of this paper, Luby and Vigoda [20] have proved rapid mixing of the insert/delete chain for the same range of values of λ as we consider in Section 5, but with a better bound on the mixing time.


In [21], Luby and Vigoda proposed the following Markov chain on state space I(G), which we shall denote by LV(I(G)). If the Markov chain is at state X_t at time t, the state at time t + 1 is determined by the following procedure:

(i) choose an edge e = {v, w} from E uniformly at random,
(ii) with probability λ/(1 + 2λ) let X′ = (X_t ∪ {v}) \ {w},
     with probability λ/(1 + 2λ) let X′ = (X_t ∪ {w}) \ {v},
     with probability 1/(1 + 2λ) let X′ = X_t \ {v, w},
(iii) if X′ ∈ I(G) then let X_{t+1} = X′, else let X_{t+1} = X_t.
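A transition of the Luby–Vigoda chain can be sketched in the same style (again an illustrative implementation; the representation choices are ours). Steps (ii) and (iii) propose a move on the chosen edge and then reject it if the result is not independent:

```python
import random

def luby_vigoda_step(X, edges, adj, lam, rng=random):
    """One transition of the Luby-Vigoda chain.
    X: current independent set (frozenset); edges: list of edges as
    (v, w) tuples; adj: vertex -> set of neighbours; lam: lambda > 0."""
    v, w = rng.choice(edges)                       # (i) uniform random edge
    r = rng.random()
    if r < lam / (1.0 + 2.0 * lam):                # (ii) keep v, drop w
        Xp = (X | {v}) - {w}
    elif r < 2.0 * lam / (1.0 + 2.0 * lam):        #      keep w, drop v
        Xp = (X | {w}) - {v}
    else:                                          #      drop both
        Xp = X - {v, w}
    # (iii) accept only if the proposal is still an independent set
    if all(not (adj[u] & Xp) for u in Xp):
        return Xp
    return X
```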

The Luby–Vigoda chain is ergodic with stationary distribution π. In [21], the chain is shown to be rapidly mixing for Δ ≥ 4, λ < 1/(Δ − 3), using the coupling method. A bound for the mixing time of the Luby–Vigoda chain is given below which is an improvement on that stated in [21]. The Luby–Vigoda chain is also rapidly mixing when Δ = 3, λ ≤ 1, as stated in [21]. A bound for the mixing time of the chain in this case is stated below. Both bounds were obtained using path coupling on pairs at distance 1 apart. The details are omitted, but note that the bounds improve those given in [21] by a factor of Θ(n).

THEOREM 2.3. Let G be a graph with maximum degree Δ and minimum degree δ. The Markov chain LV(I(G)) is rapidly mixing for Δ = 3, λ ≤ 1 and for Δ ≥ 4, λ ≤ 1/(Δ − 3). When Δ = 3 and λ < 1 the mixing time satisfies

    τ_LV(ε) ≤ ((1 + 2λ) / (δ(1 − λ))) |E| log(nε⁻¹).

When Δ ≥ 4 and λ < 1/(Δ − 3) the mixing time τ_LV(ε) satisfies

    τ_LV(ε) ≤ ((1 + 2λ) / (δ(1 − λ(Δ − 3)))) |E| log(nε⁻¹).

When (a) Δ = 3 and λ = 1, or (b) Δ ≥ 4 and λ = 1/(Δ − 3), the mixing time satisfies

    τ_LV(ε) ≤ ⌈2ne|E|δ⁻¹(1 + 2λ)(log n + 1)⌉ ⌈log(ε⁻¹)⌉.

Remark 2.2. The Luby–Vigoda chain gives an efficient algorithm for approximately counting independent sets in graphs with maximum degree at most four.

Remark 2.3. In all but the boundary cases (i.e., when Δ ≥ 4 and λ = 1/(Δ − 3), or Δ = 3 and λ = 1), the bound on the mixing time of the Markov chain LV(I(G)) is proportional to |E|/δ, where δ is the minimum degree of the graph G. Therefore the chain may be Δ times


more rapidly mixing for a Δ-regular graph than for another graph with maximum degree Δ. It is possible to improve this situation by embedding G into a Δ-regular multigraph G′ = (V, E′), adding self-loop edges at vertices with degree less than Δ. A Markov chain on I(G′) can be defined which is rapidly mixing for λ ≤ 1/(Δ − 3) when Δ > 3, and for λ ≤ 1 if Δ = 3. Moreover, the mixing time of this chain can be bounded above by the upper bound established in Theorem 2.3 for the Luby–Vigoda chain on regular graphs, in the following situations: when Δ ≥ 4 and 1/(2Δ − 5) ≤ λ ≤ 1/(Δ − 3), or when Δ = 3 and 1/3 ≤ λ ≤ 1. The details are omitted.

Remark 2.4. Recently, Dyer et al. [13] have given a moderate value of λ, above which approximate counting is impossible unless RP = NP. Moreover, they show that, for Δ ≥ 6, no random walk can converge in polynomial time if it adds to, or deletes from, the independent set only a "small" number of vertices. Thus, for random walks of the type described here, the issue of rapid mixing is only open for Δ = 5.

3. A NEW MARKOV CHAIN FOR INDEPENDENT SETS IN GRAPHS

In this section we define a new chain M(I(G)) with state space I(G), the set of all independent sets in the graph G. The new Markov chain can perform the moves of the insert/delete chain, and can also perform a new kind of move, called a drag move. If X_t is the state at time t then the state at time t + 1 is determined by the following procedure:

(i) choose a vertex v uniformly at random from V,
(ii) (Delete) if v ∈ X_t then let X_{t+1} = X_t \ {v} with probability 1/(1 + λ),
     (Insert) if v ∉ X_t and v has no neighbors in X_t then let X_{t+1} = X_t ∪ {v} with probability λ/(1 + λ),
     (Drag) if v ∉ X_t and v has a unique neighbor u in X_t then let X_{t+1} = (X_t ∪ {v}) \ {u} with probability λ/(4(1 + λ)),
     otherwise let X_{t+1} = X_t.
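A transition of the new chain extends the insert/delete step with the drag move (an illustrative sketch in the same style as before; the representation choices are ours):

```python
import random

def new_chain_step(X, adj, lam, rng=random):
    """One transition of the new chain M(I(G)).
    X: current independent set (frozenset); adj: vertex -> set of
    neighbours; lam: lambda > 0."""
    v = rng.choice(sorted(adj))                    # (i) uniform random vertex
    if v in X:
        if rng.random() < 1.0 / (1.0 + lam):       # delete
            return X - {v}
    else:
        nbrs_in_X = adj[v] & X
        if not nbrs_in_X:
            if rng.random() < lam / (1.0 + lam):   # insert
                return X | {v}
        elif len(nbrs_in_X) == 1:
            if rng.random() < lam / (4.0 * (1.0 + lam)):  # drag
                (u,) = nbrs_in_X
                return (X | {v}) - {u}
    return X                                       # otherwise stay put
```

Note that a drag move always yields an independent set: the unique neighbor u of v in X_t is removed, and v has no other neighbors in X_t.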

This chain is easily shown to be ergodic with stationary distribution π. Recall the definition of the line graph L(H) of a graph H, with a vertex for every edge in H and edges in L(H) between adjacent edges in H. An independent set in L(H) corresponds to a matching in H: that is, a set of edges no two of which are adjacent. The Markov chain M(I(G)) defined


above corresponds almost exactly to the Markov chain on matchings described in [18, p. 495] (in the matchings chain, a drag move is performed with probability 1).

The set of transitions which this Markov chain can perform is identical to the set of transitions performed by the Luby–Vigoda chain. However, the probabilities with which these transitions are performed can be very different. For example, the Luby–Vigoda chain will insert the vertex v with probability which depends on the degree of v. The probability of insertion in the new chain is independent of the degree of the vertex.

We prove below that M(I(G)) is rapidly mixing for λ ≤ 2/(Δ − 2) using the path coupling method on pairs at Hamming distance 1 apart. This result improves upon that of Luby–Vigoda when Δ = 3 or Δ ≥ 5. A comparison of the best known upper bounds for the mixing times of the two chains is given at the end of this section.

THEOREM 3.1. Let G be a graph with maximum degree Δ. The Markov chain M(I(G)) is rapidly mixing for λ ≤ 2/(Δ − 2). When λ < 2/(Δ − 2) the mixing time τ(ε) satisfies

    τ(ε) ≤ (2(1 + λ) / (2 − λ(Δ − 2))) n log(nε⁻¹).

When λ = 2/(Δ − 2) the mixing time satisfies

    τ(ε) ≤ ⌈2n²e(1 + λ)(log n + 1)⌉ ⌈log(ε⁻¹)⌉.

Proof. Let (X_t, Y_t) ∈ I(G) × I(G) be given. We can certainly construct a path

    X_t = Z_0, . . . , Z_r = Y_t

between X_t and Y_t such that H(Z_i, Z_{i+1}) = 1 for 0 ≤ i < r and Z_i ∈ I(G) for 0 ≤ i ≤ r, where r = H(X_t, Y_t). The path may be defined by removing each element of X_t \ Y_t one by one until the independent set X_t ∩ Y_t is obtained, then by adding each element of Y_t \ X_t in turn. Therefore it suffices to define a coupling on elements (X, Y) such that H(X, Y) = 1. Let X and Y be independent sets which differ just at a vertex v with degree d. Without loss of generality, assume that v ∈ X \ Y. We now define the coupling on (X, Y). Choose a vertex w ∈ V uniformly at random. Suppose first that w does not satisfy any of the following conditions:

(i) w = v,
(ii) {w, v} ∈ E and w has no neighbor in Y,
(iii) {w, v} ∈ E and w has a unique neighbor in Y.


Then perform the following procedure:

• if w ∈ X then let (X′, Y′) = (X \ {w}, Y \ {w}) with probability 1/(1 + λ),
• if w ∉ X and w has no neighbor in X then let (X′, Y′) = (X ∪ {w}, Y ∪ {w}) with probability λ/(1 + λ),
• if w ∉ X and w has a unique neighbor u in X then let

    (X′, Y′) = ((X ∪ {w}) \ {u}, (Y ∪ {w}) \ {u})

with probability λ/(4(1 + λ)),
• otherwise let (X′, Y′) = (X, Y).

It follows from the fact that w does not satisfy Conditions (i)-(iii) that this procedure defines a coupling of M(I(G)) and that H(X′, Y′) = 1 with probability 1. The coupling in the special cases will now be described and the contribution made to E[H(X′, Y′)] − 1 will be calculated.

If w = v then let (X′, Y′) = (X, X) with probability λ/(1 + λ), otherwise let (X′, Y′) = (Y, Y). Here H(X′, Y′) = 0 with probability 1.

Suppose next that w satisfies Condition (ii). Define the coupling here as follows:

• with probability λ/(4(1 + λ)) let (X′, Y′) = (Y ∪ {w}, Y ∪ {w}),
• with probability 3λ/(4(1 + λ)) let (X′, Y′) = (X, Y ∪ {w}),
• otherwise let (X′, Y′) = (X, Y).

Here H(X′, Y′) = 0 with probability λ/(4(1 + λ)) and H(X′, Y′) = 2 with probability 3λ/(4(1 + λ)); otherwise H(X′, Y′) = 1.

Finally suppose that w satisfies Condition (iii). If u is the unique neighbor of w which is an element of Y then let (X′, Y′) = (X, (Y ∪ {w}) \ {u}) with probability λ/(4(1 + λ)); otherwise let (X′, Y′) = (X, Y). Here H(X′, Y′) = 3 with probability λ/(4(1 + λ)) and H(X′, Y′) = 1 otherwise.

Let d′ be the number of neighbors w of v which have neighbors in the independent set Y. Then d − d′ elements of V satisfy Condition (ii) and at most d′ elements of V satisfy Condition (iii). Combining these calculations, we obtain

    E[H(X′, Y′)] ≤ 1 + (1/n)(−1 + (d − d′)λ/(2(1 + λ)) + d′λ/(2(1 + λ)))
                 = 1 + (1/n)(−1 + dλ/(2(1 + λ)))
                 ≤ 1 + (1/n)(−1 + Δλ/(2(1 + λ))).        (2)


Let β be defined by

    β = 1 − (2 − λ(Δ − 2)) / (2n(1 + λ)).
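The rearrangement connecting the right-hand side of (2) with this value of β, filled in here for completeness, is:

```latex
1 + \frac{1}{n}\Bigl(-1 + \frac{\Delta\lambda}{2(1+\lambda)}\Bigr)
  = 1 - \frac{2(1+\lambda) - \Delta\lambda}{2n(1+\lambda)}
  = 1 - \frac{2 - \lambda(\Delta - 2)}{2n(1+\lambda)}
  = \beta .
```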

Then (2) states that E[H(X′, Y′)] ≤ β. Now β < 1 whenever λ < 2/(Δ − 2). When λ < 2/(Δ − 2) the chain M(I(G)) is rapidly mixing with mixing time τ(ε) given by

    τ(ε) ≤ (2(1 + λ) / (2 − λ(Δ − 2))) n log(nε⁻¹),

by Theorem 1.1. When λ = 2/(Δ − 2) we must estimate the probability that the Hamming distance changes in any given step. Rather than apply the second part of Theorem 1.1 directly, we will derive an improved bound. Suppose that H(X, Y) = i. Then there exist i vertices v such that v ∈ X \ Y or v ∈ Y \ X. With probability 1/(1 + λ) we will delete v from one independent set, decreasing the Hamming distance between X and Y. Therefore the probability that the Hamming distance changes is at least i/(n(1 + λ)). We estimate the expected time that it takes a coupling started at (X, Y) to decrease the Hamming distance to i − 1. By [19, Lemma 4], the expected time to reach Hamming distance i − 1 is at most

    (n(1 + λ)/i)(2(n − i + 1) − 1) ≤ 2n²(1 + λ)/i.

Therefore the expected time to couple is bounded above by

    2n²(1 + λ) Σ_{i=1}^{n} 1/i,

which is less than 2n²(1 + λ)(log n + 1). It follows using Markov's inequality that the probability that we have not coupled after T = ⌈2n²e(1 + λ)(log n + 1)⌉ steps is at most e⁻¹. If we perform s independent coupling trials of T steps then the probability that we have not coupled at the end of these sT steps is at most e⁻ˢ. It follows that the mixing time satisfies

    τ(ε) ≤ ⌈2n²e(1 + λ)(log n + 1)⌉ ⌈log(ε⁻¹)⌉,

as stated.

Remark 3.1. The probability that M(I(G)) performs a given drag move is λ/(4(1 + λ)). We could instead have defined a family of Markov chains by letting this drag probability be p = p(λ). However, it is easy to


show that our chosen value of p minimizes the upper bound on the mixing time, for all values of λ.

Remark 3.2. Like the Luby–Vigoda chain, the new chain gives an efficient algorithm for approximately counting independent sets in graphs with maximum degree at most four. The upper bound for the mixing time for this task is Θ(n) times smaller than that given in [21] if Δ = 4, and Θ(n²) times smaller if Δ = 3.

3.1. Comparing Upper Bounds on Mixing Times

We conclude this section by comparing our improved upper bounds for the mixing time of the Luby–Vigoda chain LV(I(G)) and the upper bounds for the mixing time of the new chain M(I(G)), as given in Theorems 2.3 and 3.1. In both cases, the upper bound given is the best known upper bound for the mixing time of the chain. First let us make a general definition.

DEFINITION 3.1. Let M_i be a Markov chain on a state space Ω with mixing time τ_i(ε), and suppose that B_i(ε) is the best known upper bound on τ_i(ε), for i = 1, 2. If B_1(ε) < B_2(ε) then M_1 is said to be UB-superior to M_2 (here "UB" stands for "upper bound").

Suppose that M_1 is UB-superior to M_2. Of course, it does not necessarily follow that τ_1(ε) < τ_2(ε), since the upper bound B_2(ε) could be far too high. However, in a Markov chain Monte Carlo setting, the Markov chains M_i are used to perform approximate sampling from the underlying state space Ω. In order to ensure that the final state is drawn from a distribution which is within ε of stationary in total variation distance, the Markov chain M_i must be simulated for B_i(ε) steps, as this is the best known upper bound on the mixing time. Thus the approximate sampling algorithm based on M_1 is faster than the approximate sampling algorithm based on M_2 (and similarly for approximate counting).

In some other contexts, the concept of UB-superiority may be less useful. For example, suppose that the Markov chain is being used in a perfect sampling algorithm such as Fill's algorithm [14] or coupling from the past [24]. Here, the running time of the perfect sampling algorithm depends on the mixing time of the Markov chain, rather than on any upper bound for the mixing time.

Now let us return to the comparison of the upper bounds for the new Markov chain and the Luby–Vigoda chain for independent sets. Let $r(\lambda)$ be the ratio of the upper bounds, with the bound for $\tau_{\mathcal{LV}}(\varepsilon)$ as the numerator and the bound for $\tau_{\mathcal{M}}(\varepsilon)$ as the denominator. If $r(\lambda) > 1$ then the new chain is UB-superior, while if $r(\lambda) < 1$ then the Luby–Vigoda chain is UB-superior. For convenience we restrict ourselves to the case that $G$ is $\Delta$-regular. Note however that, by Remark 2.3, the comparison is also valid for nonregular graphs if ($\Delta \ge 4$, $1/(2\Delta - 4) \le \lambda \le 1/(\Delta - 3)$) or ($\Delta = 3$, $1/3 \le \lambda \le 1$). A summary of the results for $\lambda \le 1$, $\Delta = 3$; $\lambda \le 1$, $\Delta = 4$; and $\lambda \le 1/(\Delta - 3)$, $\Delta \ge 5$ is given in Fig. 1.

In all cases, the ratio $r(\lambda)$ is a monotonic increasing function of $\lambda$. Hence, as $\lambda$ increases, we may move from values where the Luby–Vigoda chain is UB-superior to values where the new chain is UB-superior. Moreover, $r(\lambda) \to 1/2$ as $\lambda \to 0$. This shows that the upper bound on the mixing time of the new chain is at most twice as large as the upper bound on the mixing time of the Luby–Vigoda chain. In contrast, if $\Delta \ne 4$ then $r(\lambda)$ tends to infinity as $\lambda$ grows. This corresponds to situations where the upper bound on the mixing time of the Luby–Vigoda chain is arbitrarily larger than the upper bound on the mixing time of the new chain. Note that, in applications, it is generally the larger values of $\lambda$ which are of more interest.

First suppose that $\Delta = 3$ and $\lambda \le 1$. Then the ratio $r(\lambda)$ is given by

$$r(\lambda) = \frac{(1+2\lambda)(2-\lambda)}{4(1+\lambda)(1-\lambda)} = \frac{2+3\lambda-2\lambda^2}{4-4\lambda^2}.$$

FIG. 1. The ratio $r$ of the upper bounds for the mixing times, $\mathcal{LV} : \mathcal{M}$, for various values of $\Delta$.


The Luby–Vigoda chain is UB-superior if $0 < \lambda < 1/2$, and the new chain is UB-superior if $1/2 < \lambda \le 1$. As $\lambda$ tends to 1 the ratio tends to infinity. For example, when $\lambda = 1/4$ the ratio is $7/10$, when $\lambda = 3/4$ the ratio is $25/14$, and when $\lambda = 99/100$ the ratio is $15049/398$. When $\lambda = 1$ the upper bound for the Luby–Vigoda chain is $\Theta(n)$ times bigger than the upper bound for the new chain. When $1 < \lambda < 2$ the new chain is rapidly mixing but it is not known whether the Luby–Vigoda chain is rapidly mixing.

Next consider the case $\Delta = 4$. If $\lambda < 1$ then the ratio $r(\lambda)$ is given by

$$r(\lambda) = \frac{1+2\lambda}{2+2\lambda},$$

which is less than 1 for all values of $\lambda < 1$. Therefore the Luby–Vigoda chain is UB-superior when $\Delta = 4$ for all $\lambda < 1$. The ratio tends to $3/4$ as $\lambda$ tends to 1. When $\lambda = 1$ the ratio $r(\lambda)$ is given by

$$r(\lambda) = \frac{12n^2e\,\lceil\log(n+1)\rceil}{4n^2e\,(\log(n)+1)},$$

which is approximately equal to three for large values of $n$. This suggests that the upper bound on the mixing time of the Luby–Vigoda chain is roughly three times as large as the upper bound on the mixing time of the new chain in the boundary case $\lambda = 1$.

Finally suppose that $\Delta \ge 5$. If $\lambda < 1/(\Delta-3)$ then the ratio $r(\lambda)$ of the upper bounds satisfies

$$r(\lambda) = \frac{(1+2\lambda)\bigl(2-(\Delta-2)\lambda\bigr)}{4(1+\lambda)\bigl(1-(\Delta-3)\lambda\bigr)}.$$

It is not difficult to check that $r(\lambda) < 1$ when $\lambda \le 2/(3\Delta-8)$, and $r(\lambda) > 1$ when $\lambda \ge 2/(3\Delta-10)$. Therefore there exists $\lambda_0$ with $2/(3\Delta-8) \le \lambda_0 \le 2/(3\Delta-10)$ such that the Luby–Vigoda chain is UB-superior for $0 < \lambda < \lambda_0$ and the new chain is UB-superior for $\lambda_0 < \lambda \le 1/(\Delta-3)$. Note that the latter interval is nonempty, as $\Delta \ge 5$. As $\lambda$ tends to $1/(\Delta-3)$ the ratio tends to infinity. For example, suppose that $\Delta = 8$. When $\lambda = 1/9$ the ratio $r$ equals $33/40$, when $\lambda = 1/7$ the ratio is $9/8$, and when $\lambda = 24/125$ the ratio is $9169/1490$. If $\lambda = 1/(\Delta-3)$ then the upper bound on the mixing time of the Luby–Vigoda chain is $\Theta(n)$ times larger than the upper bound on the mixing time of the new chain. When $1/(\Delta-3) < \lambda \le 2/(\Delta-2)$ the chain $\mathcal{M}(\mathcal{I}(G))$ is known to be rapidly mixing but the Luby–Vigoda chain has no direct proof of rapid mixing.
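The general formula, the $\Delta = 8$ values, and the crossover brackets can all be checked with exact rational arithmetic (an illustrative script; the function name is ours):

```python
from fractions import Fraction

def ratio(delta, lam):
    """Ratio of the Luby-Vigoda upper bound to the new-chain upper bound
    for Delta-regular graphs, Delta >= 5 and lambda < 1/(Delta-3)."""
    lam = Fraction(lam)
    return ((1 + 2 * lam) * (2 - (delta - 2) * lam)) / \
           (4 * (1 + lam) * (1 - (delta - 3) * lam))

# The three sample values quoted for Delta = 8.
print(ratio(8, Fraction(1, 9)))     # 33/40
print(ratio(8, Fraction(1, 7)))     # 9/8
print(ratio(8, Fraction(24, 125)))  # 9169/1490

# Crossover brackets: r <= 1 at lambda = 2/(3*Delta-8) and
# r >= 1 at lambda = 2/(3*Delta-10), for a range of degrees.
for delta in range(5, 12):
    assert ratio(delta, Fraction(2, 3 * delta - 8)) < 1
    assert ratio(delta, Fraction(2, 3 * delta - 10)) > 1
```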


4. A MARKOV CHAIN FOR INDEPENDENT SETS IN HYPERGRAPHS

In this section it is shown that the Markov chain $\mathcal{M}(\mathcal{I}(G))$ can be easily extended to operate on the set of all independent sets of a hypergraph. As we shall see, this gives an efficient method for approximately counting the number of independent sets of two classes of hypergraphs: those with maximum degree two, and those with maximum degree three and maximum edge size three.

A hypergraph $G = (V, E)$ consists of a vertex set $V$ and a set $E$ of edges, where each edge is a subset of $V$. We will insist that every edge contains at least two elements. Let

$$m = \max\{|e| : e \in E\},$$

the size of the largest edge. Then the hypergraph $G$ is a graph if and only if $m = 2$. Say that $w$ is a neighbor of $v$ if there exists an edge which contains both $v$ and $w$. We use the degree function $d$ defined by

$$d(v) = |\{e \in E : v \in e\}|$$

for all $v \in V$. Let the maximum degree $\Delta$ of $G$ be defined by

$$\Delta = \max\{d(v) : v \in V\}.$$

An independent set in the hypergraph $G$ is a subset of the vertex set, no subset of which is an edge in $E$. Consider a map $X : V \to \{0, 1\}$. An edge $e$ is said to be a flaw in $X$ if and only if $X(v) = 1$ for all $v \in e$. Let $f(X)$ be the number of flaws in the map $X$. An independent set corresponds to a map $X : V \to \{0, 1\}$ with no flaws, where the vertex $v$ belongs to the independent set if and only if $X(v) = 1$. Let $\mathcal{I}(G)$ be the set of all independent sets in $G$, defined by

$$\mathcal{I}(G) = \{X : V \to \{0, 1\} \mid f(X) = 0\}.$$

If $v \in e$ and $X(v) = 0$ but $X(u) = 1$ for all $u \in e \setminus \{v\}$, then $v$ is said to be critical for the edge $e$ in $X$. In this situation $e$ is said to be a critical edge.

Let $\lambda$ be a positive parameter. Just as in the graph case, we use the distribution $\pi$ on $\mathcal{I}(G)$ where $\pi(X)$ is proportional to $\lambda^{|X|}$. We now give a definition of the Markov chain $\mathcal{M}(\mathcal{I}(G))$ which agrees with the chain described in Section 3 whenever $G$ is a graph. For ease of notation, let $p = (m-1)\lambda/(2m(1+\lambda))$. If the chain is in state $X_t$ at time $t$, the next state $X_{t+1}$ is determined according to the following procedure:

(i) choose a vertex $v \in V$ uniformly at random,
(ii) if $v \in X_t$ then let $X_{t+1} = X_t \setminus \{v\}$ with probability $1/(1+\lambda)$;
• if $v \notin X_t$ and $v$ is not critical in $X_t$ for any edge then let $X_{t+1} = X_t \cup \{v\}$ with probability $\lambda/(1+\lambda)$;
• if $v \notin X_t$ and $v$ is critical in $X_t$ for a unique edge $e$ then with probability $p$ choose $w \in e \setminus \{v\}$ uniformly at random and let $X_{t+1} = (X_t \cup \{v\}) \setminus \{w\}$;
• otherwise let $X_{t+1} = X_t$.
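The transition procedure can be rendered in code as follows (an illustrative sketch, not the authors' implementation; the toy hypergraph at the end is ours):

```python
import random

def critical_edges(X, E, v):
    """Edges e for which v is critical in X: v lies in e, v is outside X,
    and every other vertex of e lies in X."""
    return [e for e in E
            if v in e and v not in X and all(u in X for u in e if u != v)]

def step(X, V, E, lam, rng):
    """One transition of the hypergraph chain described above (a sketch).
    X is a set of vertices, V a list of vertices, E a list of frozensets."""
    m = max(len(e) for e in E)
    p = (m - 1) * lam / (2 * m * (1 + lam))   # drag-move probability
    v = rng.choice(V)
    if v in X:
        return X - {v} if rng.random() < 1 / (1 + lam) else X   # delete
    crit = critical_edges(X, E, v)
    if not crit:
        return X | {v} if rng.random() < lam / (1 + lam) else X  # insert
    if len(crit) == 1 and rng.random() < p:
        w = rng.choice(sorted(crit[0] - {v}))
        return (X | {v}) - {w}                                   # drag
    return X

# A toy 3-uniform hypergraph with maximum degree two.
rng = random.Random(42)
V = [0, 1, 2, 3]
E = [frozenset({0, 1, 2}), frozenset({1, 2, 3})]
X = set()
for _ in range(1000):
    X = step(X, V, E, 0.8, rng)
    assert not any(e <= X for e in E)   # the state stays independent
```

Here $m = 3$ and $\Delta = 2$, so $\lambda = 0.8$ lies well inside the rapid-mixing regime $\lambda \le m/((m-1)\Delta - m) = 3$.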

This chain is easily shown to be ergodic. Let $P$ be the transition matrix of $\mathcal{M}(\mathcal{I}(G))$. If $Y = X \cup \{v\}$ then $P(X, Y) = \lambda P(Y, X)$ and hence $\pi(X)P(X, Y) = \pi(Y)P(Y, X)$. Suppose that $v \notin X$ and $v$ is critical in $X$ for a unique edge $e$. Let $w \in e \setminus \{v\}$ and let $Y = (X \cup \{v\}) \setminus \{w\}$. Then $P(X, Y) = p/(n(|e|-1))$. Now $w$ is critical for $e$ in $Y$. The fact that $X$ has no flaws implies that $w$ is not critical for any other edge in $Y$. Therefore $P(Y, X) = P(X, Y)$ and so $\pi(X)P(X, Y) = \pi(Y)P(Y, X)$. These observations show that $\mathcal{M}(\mathcal{I}(G))$ is reversible with respect to $\pi$, so $\pi$ is the stationary distribution of $\mathcal{M}(\mathcal{I}(G))$.

The following theorem proves that the Markov chain $\mathcal{M}(\mathcal{I}(G))$ is rapidly mixing for

$$\lambda \le \frac{m}{(m-1)\Delta - m},$$

using the path coupling method. When $m = 2$ both the chain, and the result, are identical to those of Theorem 3.1.

THEOREM 4.1. Let $G$ be a hypergraph with maximum degree $\Delta$ and maximum edge size $m$. The Markov chain $\mathcal{M}$ is rapidly mixing for $\lambda \le m/((m-1)\Delta - m)$. When $\lambda < m/((m-1)\Delta - m)$ the mixing time $\tau(\varepsilon)$ satisfies

$$\tau(\varepsilon) \le \frac{m(1+\lambda)}{m - \bigl((m-1)\Delta - m\bigr)\lambda}\, n \log(n\varepsilon^{-1}).$$

When $\lambda = m/((m-1)\Delta - m)$ the mixing time satisfies

$$\tau(\varepsilon) \le 2en^2(1+\lambda)\bigl(\log(n)+1\bigr)\bigl\lceil\log(\varepsilon^{-1})\bigr\rceil.$$

Proof. As in Theorem 3.1, we can couple the pair $(X_t, Y_t)$ of independent sets along a path of length $H(X_t, Y_t)$, where consecutive elements of the path are at Hamming distance 1 apart. Let $X$ and $Y$ be independent sets which differ just at a vertex $v$ with degree $d$. Without loss of generality, assume that $v \in X \setminus Y$. We now define the coupling on $(X, Y)$.


Choose a vertex $w \in V$ uniformly at random. Suppose first that $w$ does not satisfy any of the following conditions:

(i) $w = v$;
(ii) $w$ is a neighbor of $v$ and there exists an edge $e$ such that $\{v, w\} \subseteq e$ and $e$ is the only edge for which $w$ is critical in $X$;
(iii) $w$ is a neighbor of $v$ which is critical in $X$ for more than one edge but is critical in $Y$ for a unique edge $e$.

Then perform the following procedure:

• If $w \in X$ then let $(X', Y') = (X \setminus \{w\}, Y \setminus \{w\})$ with probability $1/(1+\lambda)$;
• if $w \notin X$ and $w$ is not critical in $X$ for any edge then let

$$(X', Y') = (X \cup \{w\},\, Y \cup \{w\})$$

with probability $\lambda/(1+\lambda)$;
• if $w \notin X$ and $w$ is critical in $X$ for a unique edge $e$ then with probability $p$ choose $u \in e \setminus \{w\}$ uniformly at random and let

$$(X', Y') = \bigl((X \cup \{w\}) \setminus \{u\},\; (Y \cup \{w\}) \setminus \{u\}\bigr);$$

• otherwise let $(X', Y') = (X, Y)$.

It follows from the fact that $w$ does not satisfy any of Conditions (i)–(iii) that this procedure is indeed a coupling for $\mathcal{M}(\mathcal{I}(G))$ and that $H(X', Y') = 1$ with probability 1. The coupling in the special cases will now be described and the contribution made to $\mathrm{E}[H(X', Y')] - 1$ will be calculated.

If $w = v$ then let $(X', Y') = (X, X)$ with probability $\lambda/(1+\lambda)$; otherwise let $(X', Y') = (Y, Y)$. Here $H(X', Y') = 0$ with probability 1.

Suppose next that $w$ satisfies Condition (ii). Define the coupling here as follows:

• with probability $p$ choose $u \in e \setminus \{w\}$ uniformly at random and let

$$(X', Y') = \bigl((X \cup \{w\}) \setminus \{u\},\; Y \cup \{w\}\bigr);$$

• with probability $\lambda/(1+\lambda) - p$ let $(X', Y') = (X, Y \cup \{w\})$;
• otherwise let $(X', Y') = (X, Y)$.

Now $H(X', Y') = 0$ with probability $p/(|e|-1)$, as this occurs if and only if $v$ is the element of $e \setminus \{w\}$ chosen uniformly at random. For all other elements of $e \setminus \{w\}$ we obtain $H(X', Y') = 2$. With probability $\lambda/(1+\lambda) - p$ we let $(X', Y') = (X, Y \cup \{w\})$, giving $H(X', Y') = 2$. With probability $1/(1+\lambda)$ there is no change in the Hamming distance. Therefore, in this case,

$$\begin{aligned}
\mathrm{E}[H(X', Y')] - 1 &= -\frac{p}{|e|-1} + \frac{p(|e|-2)}{|e|-1} + \left(\frac{\lambda}{1+\lambda} - p\right)\\
&= \frac{\lambda}{1+\lambda} - \frac{2p}{|e|-1}\\
&\le \frac{\lambda}{1+\lambda} - \frac{2p}{m-1}\\
&= 2p,
\end{aligned}$$

using the definitions of $m$ and $p$.

Finally suppose that $w$ satisfies Condition (iii). Then $w$ is critical in $X$ for at least one edge containing $v$ and for exactly one edge $e$ not containing $v$. With probability $p$ choose $u \in e \setminus \{w\}$ uniformly at random and let

$$(X', Y') = \bigl(X,\; (Y \cup \{w\}) \setminus \{u\}\bigr);$$

otherwise let $(X', Y') = (X, Y)$. Since $v \notin e$ it follows that $H(X, (Y \cup \{w\}) \setminus \{u\}) = 3$ for all $u \in e \setminus \{w\}$. Hence $H(X', Y') = 3$ with probability $p$ and $H(X', Y') = 1$ otherwise.

Let $d'$ be the number of vertices $w \in V$ which satisfy Condition (ii) and let $d''$ be the number of vertices $w \in V$ which satisfy Condition (iii). Now any vertex $w$ which satisfies Condition (ii) or (iii) is critical in $X$ for some edge $e$ which contains $v$. Moreover, each such edge can contribute at most one such vertex $w$. It follows that $d' + d'' \le d$. Combining these calculations we obtain

$$\begin{aligned}
\mathrm{E}[H(X', Y')] - 1 &\le \frac{1}{n}\bigl(2pd' + 2pd'' - 1\bigr)\\
&\le \frac{1}{n}\bigl(2pd - 1\bigr)\\
&\le \frac{1}{n}\bigl(2p\Delta - 1\bigr)\\
&= \frac{1}{n}\left(\frac{(m-1)\Delta\lambda}{m(1+\lambda)} - 1\right). \qquad (3)
\end{aligned}$$


Let $\beta$ be defined by

$$\beta = 1 - \frac{m - \bigl((m-1)\Delta - m\bigr)\lambda}{nm(1+\lambda)}.$$

Then (3) can be rearranged to give $\mathrm{E}[H(X', Y')] \le \beta$. Now $\beta < 1$ whenever $\lambda < m/((m-1)\Delta - m)$. When $\lambda < m/((m-1)\Delta - m)$ the chain $\mathcal{M}$ is rapidly mixing with mixing time $\tau(\varepsilon)$ given by

$$\tau(\varepsilon) \le \frac{m(1+\lambda)}{m - \bigl((m-1)\Delta - m\bigr)\lambda}\, n \log(n\varepsilon^{-1}),$$

by the first part of Theorem 1.1. When $\lambda = m/((m-1)\Delta - m)$ we proceed as in the proof of Theorem 3.1. Suppose that $H(X, Y) = i$ for two independent sets $X$, $Y$. Then for $i$ choices of vertices we can ensure that $H(X', Y') = i - 1$ with probability at least $1/(1+\lambda)$. This implies that the mixing time satisfies

$$\tau(\varepsilon) \le 2en^2(1+\lambda)\bigl(\log(n)+1\bigr)\bigl\lceil\log(\varepsilon^{-1})\bigr\rceil,$$

as stated.

We have just shown that the Markov chain $\mathcal{M}(\mathcal{I}(G))$, when extended to act on hypergraphs, is rapidly mixing for $\lambda \le m/((m-1)\Delta - m)$. Hence this chain gives an efficient algorithm for approximately counting independent sets whenever $(m-1)\Delta \le 2m$; that is, for

(i) $\Delta \in \{3, 4\}$, $m = 2$;
(ii) $\Delta \le 2$, $m \ge 2$;
(iii) $\Delta = 3$, $m = 3$.

In (i) we find graphs with maximum degree 4, as mentioned in Remark 3.2. We discuss the remaining cases below.

First consider a hypergraph $H = (V, E)$ with maximum degree two and maximum edge size $m$. Let $n = |V|$ and $q = |E|$. Then $q \le n$. Now $H$ is the dual hypergraph of a graph $G = (V_G, E_G)$ with maximum degree $m$, where $|V_G| = q$ and $|E_G| = n$. (The dual hypergraph of a hypergraph is obtained by transposing the incidence matrix, i.e., the $(0, 1)$-matrix with rows corresponding to vertices, columns corresponding to edges, and a 1 entry in the $(v, e)$ position if and only if $v \in e$.) An independent set in $H$ corresponds to a subset $X \subseteq E_G$ which satisfies the following condition: for every vertex $v \in V_G$ there exists an edge $e \in E_G$ such that $v \in e$ and $e \notin X$. The complement $E_G \setminus X$ of $X$ is an edge cover of the graph $G$.
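The duality and the edge-cover correspondence can be illustrated by brute force on a toy instance (here the triangle, which happens to be self-dual; all names in this sketch are ours):

```python
from itertools import combinations

# A toy hypergraph H with maximum degree two: vertices 0,1,2 and edges
# e0 = {0,1}, e1 = {0,2}, e2 = {1,2}.
inc = [[1, 1, 0],        # rows: vertices of H; columns: edges of H
       [1, 0, 1],
       [0, 1, 1]]

n = len(inc)             # |V(H)| = number of edges of the dual graph G
q = len(inc[0])          # |E(H)| = number of vertices of G

H_edges = [frozenset(v for v in range(n) if inc[v][j]) for j in range(q)]

# The dual graph G: transpose the incidence matrix, so G-vertex j is
# incident to G-edge v exactly when inc[v][j] == 1.
dual = [list(col) for col in zip(*inc)]

def subsets(k):
    for r in range(k + 1):
        for s in combinations(range(k), r):
            yield set(s)

# Independent sets of H: subsets of V(H) containing no edge of H.
ind_sets = [X for X in subsets(n) if not any(e <= X for e in H_edges)]

# Edge covers of G: sets of G-edges meeting every G-vertex.
covers = [F for F in subsets(n)
          if all(any(dual[j][v] for v in F) for j in range(q))]

# The complementation bijection from the text.
assert sorted(map(sorted, covers)) == \
       sorted(sorted(set(range(n)) - X) for X in ind_sets)
```

On this instance both families have size four, matching the bijection between independent sets of $H$ and complements of edge covers of $G$.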

Counting edge covers in graphs is a $\#$P-complete problem (for a description of the decision version of the problem see [15], and for a sketch of the proof of $\#$P-completeness see [2]). In [2] a rapidly mixing Markov chain for a generalization of this problem is described. The mixing time of this chain is $O(n^2(n+q)\log(\varepsilon^{-1}))$, while the mixing time of $\mathcal{M}(\mathcal{I}(G))$ is bounded above by $mn\log(n\varepsilon^{-1})$. Therefore approximately counting edge covers in a graph can be done $\Theta(n^2)$ times faster using a Markov chain Monte Carlo algorithm based on the Markov chain $\mathcal{M}(\mathcal{I}(G))$ (see Section 3.1).

Now consider the problem of counting independent sets in hypergraphs with maximum degree 3 and maximum edge size 3. We now sketch a proof that counting independent sets in graphs of maximum degree 3 is $\#$P-complete (for full details see [16, 11]).

THEOREM 4.2. Let $\#$INDEPENDENT-SETS(3) denote the following counting problem. An instance is a graph with maximum degree three, and the problem is to calculate the number of independent sets in that graph. Then $\#$INDEPENDENT-SETS(3) is $\#$P-complete.

Sketch of proof. The problem of counting the number of independent sets in graphs with maximum degree $\Delta$ is $\#$P-complete for $\Delta \ge 4$, as follows from [29] or [26, Theorem 3.1]. We give a polynomial-time reduction from this problem to $\#$INDEPENDENT-SETS(3). Let $G = (V_G, E_G)$ be a graph with maximum degree $\Delta$ where $\Delta > 3$. We construct a graph $H = (V, E)$ with maximum degree three and edge bipartition $E = \bar{E} \cup E'$ by replacing every vertex $v \in V_G$ of degree $d$ greater than three by a chain of new vertices $v_1, \ldots, v_{d-2}$, each of degree three. Each neighbor of $v$ in $G$ is joined to exactly one $v_j$ by an edge in $E'$, while the edge $\{v_j, v_{j+1}\}$ belongs to $\bar{E}$ for $1 \le j < d-2$. All other edges in $G$ are placed in $E'$. Clearly $H$ can be formed from $G$ in polynomial time. Let $m = |\bar{E}|$. For $0 \le t \le m$, $0 \le i \le t$ let $n_{t,i}$ denote the number of subsets $X$ of $V$ such that

(i) if $e \in E'$ then $e \not\subseteq X$;
(ii) $|\{e \in \bar{E} : e \cap X = \emptyset\}| = i$;
(iii) $|\{e \in \bar{E} : e \subseteq X\}| = t - i$.

It is not difficult to see that $|\mathcal{I}(G)| = \sum_{i=0}^{m} n_{m,i}$.

Denote by $C_r$ the chain graph with $r$ edges, where $r \ge 1$. Let $f_r$ denote the $r$th Fibonacci number, defined inductively by $f_1 = 1$, $f_2 = 1$ and $f_{r+2} = f_{r+1} + f_r$ for $r \ge 1$. Then $|\mathcal{I}(C_r)| = f_{r+3}$. For $1 \le r \le m+1$ form the graph $H_r$, which is the $r$-stretch of $H$ with respect to $\bar{E}$. That is, form $H_r$ from $H$ by replacing every edge in $\bar{E}$ by a copy of $C_r$. The family of graphs $\{H_r : 1 \le r \le m+1\}$ can be formed from $H$ in polynomial time. By counting the number of independent sets in $C_r$ which contain both, exactly one, or neither of the endpoints, we obtain an equation of the form

$$|\mathcal{I}(H_r)| = \sum_{t=0}^{m} \sum_{i=0}^{t} n_{t,i}\, f_{r+1}^{\,i}\, f_{r-1}^{\,t-i}\, f_r^{\,m-t}.$$

This can be written as a polynomial in $x_r$, where $x_r = f_r/f_{r+1}$. Since these quotients are distinct (see, for example, [30, Lemma A.1]), standard interpolation techniques ensure that the coefficients of this polynomial can be found in polynomial time. Therefore we can evaluate the polynomial at $x_r = 0$, giving the value

$$\sum_{i=0}^{m} n_{m,i} = |\mathcal{I}(G)|.$$

This completes the polynomial-time reduction from $\#$INDEPENDENT-SETS(4) to $\#$INDEPENDENT-SETS(3), proving that the latter problem is $\#$P-complete.
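The Fibonacci count used in the proof, $|\mathcal{I}(C_r)| = f_{r+3}$, is easy to confirm by brute force (an illustrative check):

```python
from itertools import product

def ind_sets_of_path(k):
    """Number of independent sets of a path on k vertices, by enumeration."""
    count = 0
    for x in product((0, 1), repeat=k):
        if all(not (x[i] and x[i + 1]) for i in range(k - 1)):
            count += 1
    return count

def fib(r):
    """f_1 = f_2 = 1, f_{r+2} = f_{r+1} + f_r."""
    a, b = 1, 1
    for _ in range(r - 1):
        a, b = b, a + b
    return a

# C_r, the chain with r edges, is a path on r+1 vertices and has
# f_{r+3} independent sets.
for r in range(1, 10):
    assert ind_sets_of_path(r + 1) == fib(r + 3)
```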

The set of independent sets in a given hypergraph corresponds in a natural way to the set of satisfying assignments of a certain SAT instance (for background on SAT see, for example, [23, Section 4.2]). A SAT instance is said to be monotone if no variable appears negated. Using notation borrowed from Roth [26], denote by $l$-$k$MON the class of all monotone SAT instances where each variable appears in at most $l$ clauses and each clause contains at most $k$ literals. Let $G$ be a hypergraph where all edges have at most $m$ vertices and all vertices appear in at most $\Delta$ edges. Let each vertex correspond to a variable (denoted by the same symbol) and for each $e \in E$ form a clause $C_e$ as follows. If $e = \{v_1, \ldots, v_d\}$ then let

$$C_e = \bigvee_{i=1}^{d} \bar{v}_i.$$

Then

$$\bigwedge_{e \in E} C_e \qquad (4)$$

is a SAT instance with $|\mathcal{I}(G)|$ satisfying assignments. Replace the variable $\bar{v}_i$ by $w_i$ for each variable $v_i$. This gives an instance of $\Delta$-$m$MON. The above construction can be reversed, showing that the problem of finding the number of independent sets in hypergraphs with maximum degree at most $\Delta$ and maximum edge size at most $m$ is equivalent to the counting problem $\#\Delta$-$m$MON. Thus Theorem 4.2 shows that the counting problem $\#$3-2MON is $\#$P-complete. It follows immediately that $\#$3-3MON


is also $\#$P-complete. As far as we are aware, the Markov chain $\mathcal{M}(\mathcal{I}(G))$ is the first which can be used to approximately count the number of satisfying assignments of a 3-3MON instance.
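The translation between independent sets and monotone SAT can be checked by brute force on a small hypergraph (an illustrative sketch; the instance is ours):

```python
from itertools import product

# A toy hypergraph: vertices 0..3, edges of size at most three.
edges = [frozenset({0, 1, 2}), frozenset({1, 3}), frozenset({2, 3})]
n = 4

# Monotone instance over the w-variables (w_i plays the role of the
# negated vertex variable): each edge e yields the clause OR_{i in e} w_i.
def satisfies(w):
    return all(any(w[i] for i in e) for e in edges)

sat_count = sum(satisfies(w) for w in product((0, 1), repeat=n))

# Direct count of independent sets: no edge lies entirely inside X.
ind_count = sum(
    not any(e <= {i for i in range(n) if x[i]} for e in edges)
    for x in product((0, 1), repeat=n)
)

assert sat_count == ind_count
```

The assignment $w_i = 1 - x_i$ matches "vertex $i$ is outside the independent set", so the two counts agree by construction.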

Remark 4.1. It is not difficult to show further that the problem of counting independent sets in 3-regular graphs is $\#$P-complete. Moreover, it is possible to show that the problem of counting independent sets in hypergraphs with maximum degree three is $\#$P-complete even when every edge has size three. This rules out the (admittedly rather unlikely) possibility that the counting problem $\#$3-3MON is $\#$P-complete only because of the possible presence of edges of size two. For full details of these proofs see [11, 16].

5. COMPARING MARKOV CHAINS WITH THE SAME STATIONARY DISTRIBUTION

As remarked in the Introduction, there are relatively few methods available for investigating the mixing time of Markov chains. In this paper we have used path coupling. Some alternative approaches have been developed which involve the eigenvalues of the Markov chain (see, for example, [10, 28]). Let $\lambda_1$ denote the second eigenvalue of the Markov chain $\mathcal{M}$. There is a well-known inequality relating the mixing time of a Markov chain and the so-called spectral gap $1 - \lambda_1$ of the chain. The difficulty in applying this inequality lies in obtaining a bound for the spectral gap of $\mathcal{M}$. The bound on the mixing time obtained in this manner involves a factor related to the stationary distribution $\pi$ of $\mathcal{M}$. This often results in a lack of tightness in the bound on the mixing time.

In this section we consider two Markov chains with the same state space and stationary distribution, one of which is known to be rapidly mixing. For the first result we assume that a certain linear relationship exists between the entries of the transition matrices of these chains. We prove that this implies a linear relationship between the spectral gaps of the chains. In the second result the mixing rate of a Markov chain is used to provide an upper bound for the second eigenvalue of the chain. Combining these results, we show that the mixing time of one of the chains can be bounded in terms of the mixing time of the other, using the relationship between the spectral gaps. In certain circumstances this allows us to deduce that the other chain is also rapidly mixing. The bound obtained on the mixing time of the other chain will be larger than the bound for the mixing time of the first chain. This is partly due to the factor related to $\pi$ incurred through the use of the spectral gap bound, as mentioned above. At the end of the section we apply our method to two Markov chains for independent sets:


the simple insert/delete chain described in Section 2, and the new chain $\mathcal{M}(\mathcal{I}(G))$ defined in Section 3.

Our approach is quite simple, but gives similar results to those in [7–9, 25], which provide alternative methods for relating the spectral gaps of two Markov chains. We consider only Markov chains with the same stationary distribution, but in this case the conditions of [8, Theorem 2.1] and [9, Lemma 3.3] are met whenever our requirement holds, that is, when the transition matrices of the two chains are linearly related. While our approach may yield worse constants than those produced by [8, Theorem 2.1] or [9, Lemma 3.3], it has the advantage of being extremely simple to apply. In [25] an alternative method of bounding the mixing time of one chain, using the relationship between the spectral gaps of two chains, is given. The method we present here improves the bound on the mixing time by a constant factor, assuming the same value of a certain parameter.

First we introduce some notation. Let $\mathcal{M}$ be an irreducible Markov chain with transition matrix $P$ and state space $\Omega$. Assume that $P$ is reversible with respect to some probability distribution $\pi$ such that $\pi(x) > 0$ for all $x \in \Omega$. The eigenvalues $\lambda_0, \ldots, \lambda_{N-1}$ of $P$ are real numbers which satisfy

$$1 = \lambda_0 > \lambda_1 \ge \cdots \ge \lambda_{N-1} > -1,$$

where $N = |\Omega|$ (see, for example, [8]). For convenience, we can assume that all the eigenvalues are nonnegative by adding a holding probability of $1/2$ to every state of $\mathcal{M}$. This amounts to replacing $P$ by $(I + P)/2$ and will double the mixing time. For the remainder of the section we assume that all the eigenvalues are nonnegative. The second largest eigenvalue $\lambda_1$ appears in the following well-known bound on the variation distance of $\mathcal{M}$ at time $t$ with initial state $x$:

$$d_{\mathrm{TV}}(P_x^t, \pi) \le \frac{\lambda_1^t}{2}\sqrt{\frac{1 - \pi(x)}{\pi(x)}}.$$

For a proof see, for example, [10, Proposition 3]. It follows that the mixing time $\tau(\varepsilon)$ of $\mathcal{M}$ satisfies

$$\tau(\varepsilon) \le \frac{\log\bigl(\pi^*(2\varepsilon)^{-1}\bigr)}{1 - \lambda_1}, \qquad (5)$$

where

$$\pi^* = \max\left\{\sqrt{\frac{1 - \pi(x)}{\pi(x)}} : x \in \Omega\right\}. \qquad (6)$$


It follows from (5) that $\mathcal{M}$ is rapidly mixing if both $\log(\pi^*)$ and $(1 - \lambda_1)^{-1}$ are bounded above by polynomials in $n$, where $n$ is a measure of the size of the elements of $\Omega$. This result is of theoretical importance, but of little practical value for a given chain unless a good bound on the spectral gap of $\mathcal{M}$ can be obtained.

Suppose now that $\mathcal{M}_1$ and $\mathcal{M}_2$ are two Markov chains on the same state space $\Omega$ which are reversible with respect to the same stationary distribution $\pi$. The first result of this section shows how a linear relationship between the entries of the transition matrices of $\mathcal{M}_1$ and $\mathcal{M}_2$ gives rise to a linear relationship between the spectral gaps of the two chains.$^4$

THEOREM 5.1. Let $\mathcal{M}_1$ and $\mathcal{M}_2$ be Markov chains with state space $\Omega$ and stationary distribution $\pi$. Let $P_i$ denote the transition matrix of $\mathcal{M}_i$ and let $\lambda_1(P_i)$ be the second eigenvalue of $P_i$, for $i = 1, 2$. Suppose that there exists $\alpha > 0$ such that $P_1 \ge \alpha P_2$. Then

$$1 - \lambda_1(P_1) \ge \alpha\bigl(1 - \lambda_1(P_2)\bigr). \qquad (7)$$

Proof. Define the matrix $P_3$ by

$$P_3 = \frac{P_1 - \alpha P_2}{1 - \alpha}.$$

Then every entry in $P_3$ is nonnegative and $P_3 e = e$, where $e$ is the column vector with every entry equal to 1. Therefore $P_3$ is the transition matrix of a Markov chain on $\Omega$. Moreover $P_3$ is reversible with respect to $\pi$. Let $D$ be the diagonal matrix with $(j, j)$th entry equal to $\sqrt{\pi_j}$ for $1 \le j \le |\Omega|$. Let $R_i = D P_i D^{-1}$ for $1 \le i \le 3$. Since $P_i$ is reversible with respect to $\pi$, the matrix $R_i$ is symmetric for $1 \le i \le 3$. Also $\lambda_1(R_i) = \lambda_1(P_i)$ for $1 \le i \le 3$. The usual min–max characterization of eigenvalues of symmetric matrices states that

$$\lambda_1(R_i) = \max\{x^{\mathrm{T}} R_i x : x \cdot De = 0,\ \|x\|_2 = 1\}$$

for $1 \le i \le 3$ (see, for example, [6, Theorem 1.9.1]); here $De$ is the common top eigenvector of the $R_i$. Since $R_1 = (1-\alpha)R_3 + \alpha R_2$ it follows that

$$\begin{aligned}
\lambda_1(R_1) &= \max\{(1-\alpha)\, x^{\mathrm{T}} R_3 x + \alpha\, x^{\mathrm{T}} R_2 x : x \cdot De = 0,\ \|x\|_2 = 1\}\\
&\le (1-\alpha)\max\{x^{\mathrm{T}} R_3 x : x \cdot De = 0,\ \|x\|_2 = 1\}\\
&\qquad{} + \alpha \max\{x^{\mathrm{T}} R_2 x : x \cdot De = 0,\ \|x\|_2 = 1\}\\
&= (1-\alpha)\lambda_1(R_3) + \alpha\lambda_1(R_2)\\
&\le 1 - \alpha + \alpha\lambda_1(R_2).
\end{aligned}$$

This proves the theorem.

$^4$ Note added in proof: Since submitting this paper, we have learned that this result is implied by the results of Caracciolo et al. [5, Theorems A.1, A.3].
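For two-state chains the spectral gap is available in closed form, so inequality (7) can be sanity-checked directly (a toy illustration, not a substitute for the proof):

```python
# Two reversible 2-state chains with the same (uniform) stationary
# distribution. For a stochastic matrix [[1-a, a], [b, 1-b]] the
# eigenvalues are 1 and 1-a-b, so the spectral gap is exactly a+b.
def gap(P):
    return P[0][1] + P[1][0]

P1 = [[0.50, 0.50], [0.50, 0.50]]   # gap 1
P2 = [[0.75, 0.25], [0.25, 0.75]]   # gap 1/2

# Largest alpha with P1 >= alpha * P2 entrywise.
alpha = min(P1[i][j] / P2[i][j] for i in range(2) for j in range(2))

assert all(P1[i][j] >= alpha * P2[i][j] - 1e-12
           for i in range(2) for j in range(2))
assert gap(P1) >= alpha * gap(P2)   # here 1 >= (2/3)(1/2)
```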


Remark 5.1. Suppose that $\mathcal{M}_1$ and $\mathcal{M}_2$ are two Markov chains which satisfy the conditions of Theorem 5.1. Then $\mathcal{M}_1$ will perform the same transition as $\mathcal{M}_2$ with probability at least $\alpha$, irrespective of the current state. This provides some intuition as to why rapid mixing of $\mathcal{M}_2$ may in certain circumstances imply that $\mathcal{M}_1$ is also rapidly mixing.

The second result of this section is a straightforward calculation, showing how the second eigenvalue of a Markov chain may be bounded above using its mixing time.

THEOREM 5.2. Let $\mathcal{M}$ be an irreducible Markov chain with mixing time $\tau(\varepsilon)$ and second eigenvalue $\lambda_1$. Then $\lambda_1 \le e^{-1/T}$ where $T = \tau(e^{-1})$.

Proof. Consider a maximal coupling of $\mathcal{M}$ with an arbitrary pair of initial states. After performing $T$ steps of the coupling, the probability that the process has not coupled is at most $e^{-1}$. If we run $s$ independent coupling trials of length $T$ then the probability that we have not coupled after these $sT$ steps is at most $e^{-s}$. By the Coupling Lemma it follows that

$$d_{\mathrm{TV}}(P_x^t, \pi) \le e^{-t/T} \qquad (8)$$

for $t \in \{T, 2T, 3T, \ldots\}$ and for all initial states $x$. We now construct a particular initial distribution $\psi$. Let $v$ be a left eigenvector of $P$ corresponding to the eigenvalue $\lambda_1$, normalized so that $\|v\|_2 = 1$, and let $e$ be the column vector with each entry equal to 1. Now $e$ is a right eigenvector corresponding to the eigenvalue $\lambda_0$. It follows that $v \cdot e = 0$. Let $\delta = \min\{\pi(x) : x \in \Omega\}$. Then $\delta > 0$ as $\mathcal{M}$ is irreducible. Finally let $\psi$ be the vector defined by $\psi = \pi + \delta v$. Then every component of $\psi$ is nonnegative and $\psi \cdot e = \pi \cdot e = 1$. Therefore $\psi$ is a distribution and

$$d_{\mathrm{TV}}(\psi P^t, \pi) = \tfrac{1}{2}\|\psi P^t - \pi\|_1 \ge \tfrac{1}{2}\|\psi P^t - \pi\|_2 = \tfrac{\delta}{2}\lambda_1^t.$$

It follows from (1) and (8) that

$$d_{\mathrm{TV}}(\psi P^t, \pi) \le e^{-t/T}$$

for $t \in \{T, 2T, 3T, \ldots\}$. Therefore

$$\lambda_1 \le (2/\delta)^{1/t}\, e^{-1/T}$$

for these values of $t$. Letting $t$ tend to infinity along this sequence we obtain $\lambda_1 \le e^{-1/T}$, as required.

Whenever two Markov chains $\mathcal{M}_1$, $\mathcal{M}_2$ satisfy the requirements of Theorem 5.1, the following theorem gives a relationship between the mixing times of the chains.


THEOREM 5.3. Let $P_1$, $P_2$ be transition matrices of irreducible Markov chains $\mathcal{M}_1$, $\mathcal{M}_2$ with state space $\Omega$ and stationary distribution $\pi$. Assume that all eigenvalues of $\mathcal{M}_i$ are positive, for $i = 1, 2$. Let $\pi^*$ be as defined in (6) and let $\tau_i(\varepsilon)$ denote the mixing time of $\mathcal{M}_i$ for $i = 1, 2$. Suppose that there exists $\alpha > 0$ such that $P_1 \ge \alpha P_2$. Then

$$\tau_1(\varepsilon) \le 2\alpha^{-1}\tau_2(e^{-1})\log\bigl(\pi^*(2\varepsilon)^{-1}\bigr). \qquad (9)$$

Proof. Let $\lambda_1(P_i)$ denote the second eigenvalue of $P_i$ for $i = 1, 2$ and let $T = \tau_2(e^{-1})$. Then

$$1 - \lambda_1(P_1) \ge \alpha\bigl(1 - \lambda_1(P_2)\bigr) \ge \alpha\bigl(1 - e^{-1/T}\bigr)$$

by Theorems 5.1 and 5.2. Using (5), it follows that

$$\tau_1(\varepsilon) \le \frac{\log\bigl(\pi^*(2\varepsilon)^{-1}\bigr)}{\alpha\bigl(1 - e^{-1/T}\bigr)}. \qquad (10)$$

The inequality

$$e^s\left(1 - \frac{s^2}{r}\right) \le \left(1 + \frac{s}{r}\right)^r$$

is satisfied for all $r \ge 1$ and $|s| \le r$ (see, for example, [22, p. 435]). Taking $r = 2$ and $s = -1/T$ we obtain

$$e^{-1/T} \le \frac{4T^2 - 4T + 1}{4T^2 - 2}.$$

Therefore

$$\bigl(1 - e^{-1/T}\bigr)^{-1} \le \frac{4T^2 - 2}{4T - 3} \le 2T.$$

Substituting this into (10) we obtain

$$\tau_1(\varepsilon) \le 2\alpha^{-1}\tau_2(e^{-1})\log\bigl(\pi^*(2\varepsilon)^{-1}\bigr),$$

as required.

The following corollary is immediate.

COROLLARY 5.1. In the situation described in Theorem 5.3, let $n$ be a measure of the size of the elements of $\Omega$. Suppose that $\mathcal{M}_2$ is rapidly mixing. If $\alpha^{-1}$ and $\log(\pi^*)$ are each bounded above by a polynomial in $n$, then $\mathcal{M}_1$ is also rapidly mixing.


Remark 5.2. The bound on the mixing time of $\mathcal{M}_1$ obtained in Theorem 5.3 may be significantly larger than the known bound for $\tau_2(\varepsilon)$. One reason for this is the $\alpha^{-1}$ factor. This factor is large if the probability that $\mathcal{M}_1$ acts like $\mathcal{M}_2$ is small (see Remark 5.1). Another factor involved in the bound for $\tau_1(\varepsilon)$ is $\log(\pi^*)$. If $|\Omega|$ is approximately equal to $k^n$ for some $k$ then $\log(\pi^*) \approx (n \log k)/2$. This increases the bound on the mixing time by at least one extra factor of $n$.

Remark 5.3. Let $\mathcal{M}_1$, $\mathcal{M}_2$ be two Markov chains with the same state space $\Omega$ and stationary distribution $\pi$. Suppose that (7) holds for some $\alpha > 0$. In [25, Proposition 4], Randall and Tetali give the bound

$$\tau_1(\varepsilon) \le 4\alpha^{-1}\tau_2(\varepsilon)\,\frac{\log\bigl((\hat{\pi}\varepsilon)^{-1}\bigr)}{\log\bigl((2\varepsilon)^{-1}\bigr)},$$

where $\hat{\pi} = \min\{\pi(x) : x \in \Omega\}$. By setting $\varepsilon = e^{-1}$, we can compare this upper bound on $\tau_1(e^{-1})$ with the upper bound given in (9). (Note that $\tau(e^{-1})$ is sometimes taken as the definition of the mixing time of a chain.) It is not difficult to show that the ratio of the Randall–Tetali upper bound to the new upper bound is given by

$$\frac{\mathrm{RT}}{\mathrm{DG}} = \frac{4\alpha^{-1}\tau_2(e^{-1})\log(e\hat{\pi}^{-1})/\log(e/2)}{2\alpha^{-1}\tau_2(e^{-1})\log(\pi^* e/2)} = \frac{2\bigl(1 + \log(\hat{\pi}^{-1})\bigr)}{(1 - \log 2)\bigl(\log(\pi^*) + 1 - \log 2\bigr)} \ge \frac{4\bigl(1 + \log(\hat{\pi}^{-1})\bigr)}{(1 - \log 2)\bigl(\log(\hat{\pi}^{-1}) + 2 - 2\log 2\bigr)},$$

since $(\pi^*)^2 \le \hat{\pi}^{-1}$. Assuming that $|\Omega|$ grows with $n$, we see that $\log(\hat{\pi}^{-1})$ also grows with $n$ and so dominates the constant terms. Hence this ratio tends to $4/(1 - \log 2)$ as $n$ tends to infinity. Thus the bound given by (9) is at least 8 times better than the Randall–Tetali bound, for the same value of $\alpha$, for large values of $n$.

We now illustrate this method with reference to the insert-delete chain described in Section 2 and the chain M(I(G)) introduced in Section 3. Both chains have state space I(G), the set of independent sets of a graph G. The stationary distribution π of both chains is given by π(X) = λ^{|X|}/Z for some positive parameter λ. We shall not compare the chain M(I(G)) directly with the insert-delete chain. One reason for this is the fact that a transition of M(I(G)) may involve two vertices of G, while each transition of the insert-delete chain involves only one vertex. Another reason is that we must ensure that all the eigenvalues of the chains are positive. Therefore we consider the two-step insert-delete chain M₁(I(G)) with a holding probability of 1/2 at each step. A transition of M₁(I(G)) from current state X consists of performing the following procedure twice:

With probability 1/2 do nothing; otherwise

(i) choose v ∈ V uniformly at random,
(ii) let X′ = X ∪ {v} with probability λ/(1 + λ), otherwise let X′ = X \ {v},
(iii) if X′ ∈ I(G) then move to X′, else stay at X.

Similarly, let M₂(I(G)) be the Markov chain obtained from M(I(G)) by adding a holding probability of 1/2 to every state. We prove below that the entries of the transition matrices of these chains satisfy a linear relationship.

THEOREM 5.4. Let M_i(I(G)) be the Markov chains with state space I(G) and stationary distribution π, as described above. Let P_i be the transition matrix of M_i(I(G)) for i = 1, 2. Then P₁ ≥ βP₂, where

    β = min{1/2, 4/((1 + λ)n)}.

Proof. We need only verify the inequality P₁ ≥ βP₂ for the nonzero entries of P₂. Now M₁(I(G)) has a holding probability of 1/2 at every state. Therefore

    P₁(X, X) ≥ 1/2 ≥ βP₂(X, X)

for all X ∈ I(G). Suppose next that H(X, Y) = 1. We can assume without loss of generality that |Y| = |X| + 1, so P₂(X, Y) = λ/(2n(1 + λ)).

The probability that M₁(I(G)) makes the same transition is given by the probability that the chain performs the correct insertion in one step and does nothing in the other step. Since the order in which the steps are performed is immaterial, it follows that

    P₁(X, Y) ≥ λ/(2n(1 + λ)) = P₂(X, Y) ≥ βP₂(X, Y)

for these pairs X, Y. Finally, suppose that Y = (X ∪ {v}) \ {w} for some edge {v, w}. Then P₂(X, Y) = λ/(8n(1 + λ)). The two-step chain can make this transition by deleting w in its first step and inserting v in its second, so

    P₁(X, Y) ≥ (1/(2n(1 + λ))) · (λ/(2n(1 + λ))) ≥ βP₂(X, Y).

This proves the theorem.
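As a sanity check on the two lower bounds used in this proof (this is a reconstruction for illustration, not part of the original argument), one can build the sub-step transition matrix of M₁ by brute-force enumeration on a small graph and square it to obtain the two-step matrix P₁. All names and the choice of graph are assumptions.

```python
from itertools import combinations

def independent_sets(n, edges):
    """All independent sets of the graph ({0,...,n-1}, edges), as frozensets."""
    sets = []
    for r in range(n + 1):
        for comb in combinations(range(n), r):
            s = frozenset(comb)
            if all(not (u in s and v in s) for u, v in edges):
                sets.append(s)
    return sets

def substep_matrix(n, edges, lam):
    """One sub-step of M1: hold w.p. 1/2, else a heat-bath insert/delete at
    a uniformly random vertex, rejecting moves that break independence."""
    states = independent_sets(n, edges)
    idx = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    p_ins = lam / (1.0 + lam)
    for i, X in enumerate(states):
        P[i][i] += 0.5                                     # explicit holding
        for v in range(n):
            P[i][idx.get(X | {v}, i)] += 0.5 / n * p_ins         # insert
            P[i][idx[X - {v}]] += 0.5 / n * (1.0 - p_ins)        # delete
    return states, P

def matmul(A, B):
    """Square-matrix product; matmul(P, P) is the two-step chain M1."""
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]
```

The insertion bound λ/(2n(1 + λ)) and the deletion-then-insertion product bound can then be tested entrywise, along with stationarity of π.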


Since M₁ and M₂ satisfy the requirements of Theorem 5.1, we can apply Theorem 5.3 to relate their mixing times. This implies that the insert-delete chain is rapidly mixing for λ ≤ 2/(Δ − 2), as described below.

THEOREM 5.5. Let G be a graph with maximum degree Δ and let λ be a positive parameter. The insert-delete chain with state space I(G) is rapidly mixing for λ ≤ 2/(Δ − 2).

Proof. Consider the Markov chains M_i(I(G)) defined above, for i = 1, 2. Let π denote their stationary distribution and let β be as defined in Theorem 5.4. Let τ_i(ε) denote the mixing time of M_i, for i = 1, 2. By Theorems 5.3 and 5.4, the mixing times are related as follows:

    τ₁(ε) ≤ 2β⁻¹ τ₂(e⁻¹) log(π̂ (2ε)⁻¹).    (11)

Ž Ž ..By definition, the Markov chain MM II G is obtained by adding a2Ž Ž ..holding probability of 1�2 to each state of the chain MM II G described

Ž Ž ..in Section 3. Hence, by Theorem 3.1, the chain MM II G is rapidly mixing2Ž . Ž .for � � 2� � � 2 . The mixing time � � bounded above by twice the2

bound given in Theorem 3.1. In order to apply Corollary 5.1, we must showŽ � . �1that both log � and � are bounded above by a polynomial in n.

Now π(X) = λ^{|X|}/Z, where λ is a positive parameter. Therefore the quantity π̂, defined in (6), is given by

    π̂ = √(λ⁻ⁿ Z − 1)  if λ < 1,
    π̂ = √(Z − 1)      if λ ≥ 1.

In both cases,

    log(π̂) ≤ (n/2)(log(λ̄) + 1),    (12)

where λ̄ = max{λ, λ⁻¹}. This quantity is linear in n. Moreover, the constant β defined in Theorem 5.4 satisfies

    β⁻¹ = max{2, (1 + λ)n/4},    (13)

which is either constant or linear in n. Thus, by Corollary 5.1, the Markov chain M₁ is rapidly mixing for λ ≤ 2/(Δ − 2).

By definition, the transitions of M₁(I(G)) consist of two steps of the simple insert-delete chain, with a holding probability of 1/2 added at each step. The simple insert-delete chain (without the holding probabilities) has the same mixing time as M₁(I(G)), since the factor-2 increase incurred by adding a holding probability of 1/2 is canceled by taking two steps per transition. Thus the insert-delete chain is rapidly mixing for λ ≤ 2/(Δ − 2), with mixing time τ₁(ε).
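The bound (12) on log(π̂) can be illustrated by exact enumeration on a small graph. The sketch below assumes the case definition of π̂ given above; the function names and the chosen graph are not from the paper.

```python
from itertools import combinations
from math import log, sqrt

def partition_function(n, edges, lam):
    """Z = sum of lam**|X| over independent sets X, by brute-force enumeration."""
    Z = 0.0
    for r in range(n + 1):
        for comb in combinations(range(n), r):
            s = set(comb)
            if all(not (u in s and v in s) for u, v in edges):
                Z += lam ** r
    return Z

def pihat(n, edges, lam):
    """The quantity pihat of (6), per the case definition above:
    sqrt(lam**(-n) * Z - 1) if lam < 1, else sqrt(Z - 1)."""
    Z = partition_function(n, edges, lam)
    return sqrt(lam ** (-n) * Z - 1.0) if lam < 1 else sqrt(Z - 1.0)
```

For any fixed λ, log(π̂) computed this way stays below the linear-in-n bound (n/2)(log(λ̄) + 1).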


Let us compare this result with Theorem 2.2. We may conclude that the insert-delete chain is rapidly mixing for a wider range of λ than was previously known. In particular, the insert-delete chain can be used to approximately count independent sets of graphs with maximum degree at most four. However, extending the range of λ for which rapid mixing of the insert-delete chain is known using this method comes at a cost. Ignoring constant factors, the upper bound for the mixing time of the insert-delete chain provided by Theorem 2.2 is O(n log n) for λ < 1/(Δ − 1) and O(n²) for λ = 1/(Δ − 1). Using Theorem 3.1, it follows that the upper bound given by (11) is O(n³ log n) when λ < 2/(Δ − 2), and O(n⁴) when λ = 2/(Δ − 2). A factor of n is introduced by each of log(π̂) and β⁻¹, as can be seen from (12) and (13), respectively. This illustrates the warning given in Remark 5.2.

This technique has also been applied to the problem of counting graph colorings. In [4], a Markov chain was defined with state space Ω_k(G), the set of all proper k-colorings of the given graph G. Using a computer-assisted proof, this chain was shown to be rapidly mixing for k = 5 when G has maximum degree 3. The result was extended to show rapid mixing for k = 7 when G is a triangle-free 4-regular graph. Previously, other Markov chains for graph colorings were only known to be rapidly mixing for k ≥ 2Δ, including the simple Markov chain discovered independently by Jerrum [17] and by Salas and Sokal [27]. Using the comparison technique described in this section, we can conclude that the Jerrum/Salas-Sokal chain is also rapidly mixing for k = 5 when G has maximum degree 3, and for k = 7 when G is a triangle-free 4-regular graph.

Remark 5.4. Let P_LV be the transition matrix of the Luby-Vigoda chain and let P_DG be the transition matrix of the new chain, as introduced in Section 3. It is not difficult to show that there exists a constant β, independent of n, such that P_LV ≥ βP_DG. Therefore, applying the comparison technique described in this section, we can conclude that the Luby-Vigoda chain is also rapidly mixing for λ ≤ 2/(Δ − 2), and that the mixing time of the Luby-Vigoda chain is at most O(n) times greater than the mixing time of the new chain in the range 1/(Δ − 3) < λ ≤ 2/(Δ − 2). However, this O(n) factor is almost certainly just an artifact of the analysis.

ACKNOWLEDGMENTS

The authors thank Mark Jerrum, Alan Sokal, and David Wilson for their helpful comments.


REFERENCES

1. D. Aldous, Random walks on finite groups and rapidly mixing Markov chains, in "Séminaire de Probabilités XVII 1981/1982" (A. Dold and B. Eckmann, Eds.), Lecture Notes in Mathematics, Vol. 986, pp. 243-297, Springer-Verlag, New York, 1983.

2. R. Bubley and M. Dyer, Graph orientations with no sink and an approximation for a hard case of #SAT, in "8th Annual Symposium on Discrete Algorithms," pp. 248-257, ACM/SIAM, New York/Philadelphia, 1997.

3. R. Bubley and M. Dyer, Path coupling: A technique for proving rapid mixing in Markov chains, in "38th Annual Symposium on Foundations of Computer Science," pp. 223-231, IEEE, Los Alamitos, 1997.

4. R. Bubley, M. Dyer, C. Greenhill, and M. Jerrum, On approximately counting colourings of small degree graphs, SIAM J. Comput. 29 (1999), 387-400.

5. S. Caracciolo, A. Pelissetto, and A. Sokal, Nonlocal Monte Carlo algorithm for self-avoiding walks with fixed endpoints, J. Statist. Phys. 60 (1990), 1-53.

6. F. Chatelin, "Eigenvalues of Matrices," Wiley, Chichester, 1993.

7. F. R. K. Chung and R. L. Graham, Random walks on generating sets for finite groups, Electron. J. Combin. (1997), 14.

8. P. Diaconis and L. Saloff-Coste, Comparison theorems for reversible Markov chains, Ann. Appl. Probab. 3 (1993), 696-730.

9. P. Diaconis and L. Saloff-Coste, Logarithmic Sobolev inequalities for finite Markov chains, Ann. Appl. Probab. 6 (1996), 695-750.

10. P. Diaconis and D. Stroock, Geometric bounds for eigenvalues of Markov chains, Ann. Appl. Probab. 1 (1991), 36-61.

11. M. Dyer and C. Greenhill, "Some #P-completeness Proofs for Colourings and Independent Sets," Tech. Rep. 97.47, School of Computer Studies, University of Leeds, 1997.

12. M. Dyer and C. Greenhill, A more rapidly mixing Markov chain for graph colourings, Random Structures Algorithms 13 (1998), 285-317.

13. M. E. Dyer, A. M. Frieze, and M. R. Jerrum, On counting independent sets in sparse graphs, in "40th Annual Symposium on Foundations of Computer Science," pp. 210-217, IEEE, Los Alamitos, 1999.

14. J. A. Fill, An interruptible algorithm for perfect sampling via Markov chains, Ann. Appl. Probab. 8 (1998), 131-162.

15. M. R. Garey and D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-completeness," Freeman, San Francisco, 1979.

16. C. Greenhill, The complexity of counting colourings and independent sets in sparse graphs and hypergraphs, Comput. Complexity (to appear).

17. M. Jerrum, A very simple algorithm for estimating the number of k-colorings of a low-degree graph, Random Structures Algorithms 7 (1995), 157-165.

18. M. Jerrum and A. Sinclair, The Markov chain Monte Carlo method: An approach to approximate counting and integration, in "Approximation Algorithms for NP-Hard Problems" (D. Hochbaum, Ed.), pp. 482-520, PWS Publishing, Boston, 1996.

19. M. Luby, D. Randall, and A. Sinclair, Markov chain algorithms for planar lattice structures (extended abstract), in "36th Annual Symposium on Foundations of Computer Science," pp. 150-159, IEEE, Los Alamitos, 1995.

20. M. Luby and E. Vigoda, Fast convergence of the Glauber dynamics for sampling independent sets, Random Structures Algorithms 15 (1999), 229-241.

21. M. Luby and E. Vigoda, Approximately counting up to four, in "Twenty-Ninth Annual Symposium on Theory of Computing," pp. 682-687, ACM, New York, 1997.


22. R. Motwani and P. Raghavan, "Randomized Algorithms," Cambridge University Press, Cambridge, 1995.

23. C. H. Papadimitriou, "Computational Complexity," Addison-Wesley, Reading, MA, 1994.

24. J. G. Propp and D. B. Wilson, Exact sampling with coupled Markov chains and applications to statistical mechanics, Random Structures Algorithms 9 (1996), 223-252.

25. D. Randall and P. Tetali, Analyzing Glauber dynamics by comparison of Markov chains, in "Third Latin American Symposium on Theoretical Informatics," Lecture Notes in Computer Science, Vol. 1380, pp. 292-304, Springer, Campinas, Brazil, 1998.

26. D. Roth, On the hardness of approximate reasoning, Artificial Intelligence 82 (1996), 273-302.

27. J. Salas and A. D. Sokal, Absence of phase transition for antiferromagnetic Potts models via the Dobrushin uniqueness theorem, J. Statist. Phys. 86 (1997), 551-579.

28. A. Sinclair, Improved bounds for mixing rates of Markov chains and multicommodity flow, Combin. Probab. Comput. 1 (1992), 351-370.

29. S. Vadhan, "The Complexity of Counting," undergraduate thesis, Harvard University, 1995.

30. S. P. Vadhan, The complexity of counting in sparse, regular, and planar graphs, preprint, May 1997, available at http://www-math.mit.edu/~salil/.