
Escape dynamics and equilibria selection by iterative cycle decomposition


Journal of Mathematical Economics 46 (2010) 1015–1029. doi:10.1016/j.jmateco.2009.11.014


Zhiwei Cui a,b, Jian Zhai c,∗

a Department of Mathematics, Zhejiang University, Hangzhou 310027, PR China
b School of Economics and Management, Beijing University of Aeronautics & Astronautics, Beijing 100191, PR China
c Center of Neuroscience and Informatics, Department of Mathematics, Zhejiang University, Hangzhou 310027, PR China

✩ The paper was presented at the Fifth General Equilibrium Theory Workshop, Asia (GETA2008), Xiamen.
∗ Corresponding author. E-mail addresses: [email protected] (Z. Cui), [email protected] (J. Zhai).


Article history: Received 26 July 2008; received in revised form 6 November 2009; accepted 25 November 2009; available online 3 December 2009.

JEL classification: C72, C73

Keywords: Adaptive learning dynamics in games; Escape dynamics; Iterative cycle decomposition; Stochastic stability

Abstract

This paper explores the medium-run behaviour of boundedly rational players in repeatedly played games when they occasionally experiment or make mistakes. The formal analysis introduces a hierarchical structure of limit sets to characterize the most probable medium-run behaviour over gradually increasing time intervals. The paper refines the notion of stochastic stability and offers a precise measure of the speed at which stochastically stable equilibria occur. Finally, the paper applies the results to a 3 × 3 symmetric game of Young (1993).


1. Introduction

In the game theory literature, an important question is: how will players coordinate their actions on a Nash equilibrium? To answer this question, the literature considers the adaptive dynamics of boundedly rational players in repeatedly played games (Fudenberg and Levine, 1998, 2009; Young, 1998). These dynamics are referred to as adaptive learning dynamics. A convincing explanation is that Nash equilibria correspond to the limit sets, or long-run outcomes, of the adaptive learning dynamics.

Since the seminal papers of Foster and Young (1990), Kandori et al. (1993) and Young (1993), it has become common practice to introduce random noise into the adaptive learning dynamics and examine the stochastic stability of the limit sets. According to the theory of Markov chains, stochastically stable equilibria are those equilibria that are observed most of the time provided that the amount of noise is sufficiently small. Stochastically stable equilibria of the adaptive learning dynamics can explain the evolution of social conventions (Young, 1993, 1998, 2006).

In the existing literature, stochastically stable equilibria are identified by finding the minimum-cost tree. This method, developed by Kandori et al. (1993), Young (1993) and Kandori and Rob (1995), explores the long-run behaviour of the perturbed learning dynamics. However, it misses the analysis of medium-run behaviour, which is certainly essential when stochastically stable equilibria emerge at a low speed (Ellison, 1993, 2000; Young, 2006).

Applying the theory of large deviations, this paper studies the medium-run behaviour of the perturbed learning dynamics when the noise is vanishing. To depict the most probable medium-run behaviour over gradually increasing time intervals, the limit sets of the unperturbed learning dynamics are iteratively decomposed into a hierarchy of cycles. With the cycle decomposition, the notion of stochastic stability is refined, and a precise measure is offered to evaluate the speed at which stochastically stable equilibria occur.

The main idea of the cycle decomposition is as follows. A collection of limit sets is a cycle of rank 1 when two conditions are satisfied. First, any two limit sets are accessible via transitions between the limit sets in this collection; each transition consists of the most probable exit path followed by the convergence of the unperturbed learning dynamics. Second, if the initial state is the terminal point of the most probable exit path from the attraction basin of a limit set in this collection, the unperturbed learning dynamics only converge to the limit sets within this collection. Thus, as long as the initial state belongs to the attraction basin of this collection of limit sets, the perturbed learning dynamics will traverse the limit sets in the collection many times before entering the attraction basin of other limit sets as the amount of noise goes to zero.

The decomposition technique was informally proposed by Nöldeke and Samuelson (1993) and Samuelson (1994), and it has been applied extensively to the identification of stochastically stable equilibria in models with random noise, e.g., Ely (2002) and Goyal and Vega-Redondo (2005). In these papers, the limit sets are decomposed into components. Specifically, a component is a collection of limit sets satisfying the following conditions.² (i) Any two limit sets are accessible through single-mutation transitions between the limit sets in the collection. (ii) The collection contains the single-mutation neighborhoods of each limit set in the collection. An important result of the component decomposition is that either all elements of a component are stochastically stable or none is. To determine stochastically stable equilibria, it is sufficient to consider the adaptive dynamics on the collection of components. These facts substantially simplify the analysis. By considering general escape dynamics, this paper extends the component decomposition to the cycle decomposition. With the iterative decomposition of limit sets, "components" and so on, we not only gain deeper insight into long-run stochastic stability, but also investigate the medium-run behaviour.³

Moreover, the cycle decomposition has been used to study perturbed diffusion Markov processes in a continuous-time framework (Freidlin and Wentzell, 1984; Hwang and Sheu, 1990), simulated annealing (Chiang and Chow, 1989, 1998) and stochastic stability with time-dependent mutations (Chen and Chow, 2001).

To our knowledge, Ellison (2000) is the first to systematically consider the medium-run behaviour of adaptive learning dynamics. He introduces two new measures, the radius and the (modified) coradius, to characterize the persistence and the attraction of a collection of limit sets, respectively. Building upon his original work, this paper establishes a hierarchical structure of the limit sets. This paper offers a necessary and sufficient condition to identify stochastically stable equilibria and a precise measure of the speed at which stochastically stable equilibria occur, while Ellison (2000) only provides a sufficient condition and an upper bound on the speed. It is necessary to mention the use of graph theory in setting up our model, an idea owing to Beggs (2005), which shows that graphs can be utilized to calculate the expected waiting time.

This paper considers cases in which the perturbed learning dynamics correspond to a family of irreducible Markov chains. For example, a fixed 2 × 2 game is repeatedly played by boundedly rational players. In each period, each player adopts the strategy that is the optimal response to the distribution of strategies chosen by the other player in the most recent periods; occasionally players experiment with a randomly selected strategy. Formally, the adaptive learning dynamics considered in this paper are formulated as Markov chains consisting of the following elements:

1. A finite set S denoting the state space.
2. A Markov transition matrix P on S, referred to as the unperturbed one.
3. A family of Markov transition matrices P(ε) on S indexed by a parameter ε ∈ [0, ε̄), ε̄ ∈ (0, 1), satisfying:

• P(ε) is irreducible for any ε ∈ (0, ε̄);
• P(ε) is continuous in ε with P(0) = P;
• there exists a unique cost function c : S × S → R_+ ∪ {0, +∞} such that for any s_1, s_2 ∈ S,

0 < lim_{ε→0} P_{s_1,s_2}(ε) / ε^{c(s_1,s_2)} < +∞  if c(s_1, s_2) < +∞;
P_{s_1,s_2}(ε) = 0 for any sufficiently small ε  if c(s_1, s_2) = +∞.

Here P_{s_1,s_2}(ε) is the probability that state s_1 is followed by state s_2 given that the amount of noise is ε. Ellison (2000) provides insightful and comprehensive interpretations of each element of this reduced-form framework; we omit further discussion here.
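To make the reduced-form conditions concrete, the following minimal sketch (our own toy example, not part of the paper; all names are illustrative) builds a family P(ε) on a three-state space in which every positive transition probability has a well-defined cost exponent, P(0) = P has three absorbing limit sets, and P(ε) is irreducible for small ε > 0:

```python
import numpy as np

def P(eps):
    """A toy family of perturbed transition matrices on S = {0, 1, 2}.

    Each off-diagonal entry behaves like a constant times eps**c(s1, s2),
    so the cost function is read off from the exponents:
    c(0,1) = 1, c(0,2) = 2, c(1,0) = 1, c(2,0) = 1,
    and c(1,2) = c(2,1) = +infinity (those entries vanish identically).
    """
    return np.array([
        [1.0 - eps - eps**2, eps,       eps**2   ],
        [eps,                1.0 - eps, 0.0      ],
        [eps,                0.0,       1.0 - eps],
    ])

# P(0) = I: the unperturbed chain has the three limit sets {0}, {1}, {2}.
assert np.allclose(P(0.0), np.eye(3))
# For small eps > 0 every state communicates with every other via state 0,
# so P(eps) is irreducible, and P(eps) -> P(0) continuously.
assert P(0.01)[0, 1] > 0 and P(0.01)[1, 0] > 0 and P(0.01)[2, 0] > 0
```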

² For a detailed discussion, see Definitions 2 and 3 in Nöldeke and Samuelson (1993) or Definitions 3 and 4 in Samuelson (1994).
³ Where no confusion arises, we use the terminology of Nöldeke and Samuelson (1993).


The remainder of the paper is organized as follows. Section 2 introduces notation and basic definitions, and employs the theory of large deviations to analyze the escape dynamics from the attraction basin of a limit set (0-cycle). By decomposing limit sets into a hierarchy of cycles, Section 3 studies the medium-run behaviour over larger time intervals. Section 4 focuses on the analysis of stochastic stability. Section 5 applies the results to a 3 × 3 symmetric game of Young (1993). Finally, Section 6 concludes. Proofs are contained in Appendix A.

2. Notation and basic escape dynamics

2.1. Notation and basic definitions

Let {X_n}_{n∈ℕ} and {X^ε_n}_{n∈ℕ} be, respectively, the unperturbed and the perturbed Markov chains with respect to (S, P) and (S, P(ε)). 𝒦 = {K_i : 1 ≤ i ≤ i_0} denotes the set of all recurrent communication classes, or the collection of all limit sets.⁴ Suppose that i_0 = |𝒦| > 1, where |𝒦| is the number of elements of 𝒦. For any subset E of I = {1, 2, …, i_0}, D(∪_{i∈E} K_i) denotes the attraction basin of ∪_{i∈E} K_i: for any s ∈ S, s ∈ D(∪_{i∈E} K_i) if two conditions are satisfied. First, there exists an integer T_0 > 0 such that Σ_{s′ ∈ ∪_{i∈E} K_i} (P^{T_0})_{s,s′} > 0, where for any T ∈ ℕ \ {0}, P^T is the T-step transition matrix. Second, for any i ∈ I \ E and any integer T > 0, Σ_{s′ ∈ K_i} (P^T)_{s,s′} = 0. The first condition implies convergence to ∪_{i∈E} K_i; the second means that {X_n}_{n∈ℕ} cannot reach a recurrent communication class belonging to {K_i : i ∈ I \ E}. In addition, D̄(∪_{i∈E} K_i) denotes the set of elements satisfying the first condition alone. For any subset W of S and ε ∈ (0, ε̄), let τ^ε(W) = inf{n > 0 : X^ε_n ∉ W} be the first time that {X^ε_n}_{n∈ℕ} leaves W.

Now a couple of basic definitions are given.

⁴ For any K ∈ 𝒦, K corresponds to a limit set of the unperturbed learning dynamics in games. Young (1993) gives a more detailed discussion of recurrent communication classes.

Definition 2.1. (path) For any X, Y ⊂ S satisfying X ≠ ∅, Y ≠ ∅ and X ∩ Y = ∅, a path from X to Y is a finite sequence of distinct states (s_0, s_1, …, s_t) such that

• s_0 ∈ X, s_i ∉ X ∪ Y for 1 ≤ i ≤ t − 1, and s_t ∈ Y;
• s_i is accessible from s_{i−1} for 1 ≤ i ≤ t; that is, c(s_{i−1}, s_i) < +∞, where c(·, ·) is the cost function given by the model specifications in Section 1.

Let l denote a path and L(X, Y) the set of all paths from X to Y. For a path l = (s_0, s_1, …, s_t), the cost c(l) is defined by c(l) = Σ_{0≤i≤t−1} c(s_i, s_{i+1}).
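As a quick check on this definition, the two-line sketch below (our own illustration; the dictionary c is a stand-in for the model's cost function, reusing the toy example above) evaluates c(l) for a path:

```python
import math

def path_cost(path, c):
    """c(l) = sum of one-step costs c(s_i, s_{i+1}) along the path l."""
    return sum(c[path[i]][path[i + 1]] for i in range(len(path) - 1))

# Toy cost function of the previous sketch (infinite cost = impossible step).
c = {0: {1: 1, 2: 2}, 1: {0: 1, 2: math.inf}, 2: {0: 1, 1: math.inf}}
assert path_cost((1, 0, 2), c) == 3      # c(1,0) + c(0,2) = 1 + 2
assert path_cost((1, 2), c) == math.inf  # the direct step 1 -> 2 is blocked
```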

Definition 2.2. (W-graph) Let W be a subset of S. A graph consisting of arrows s_1 → s_2, s_1, s_2 ∈ S, is called a W-graph if

• there is no arrow starting from an element of W;
• for any s ∈ S \ W, there exists a unique path leading from s into W.

The second condition in Definition 2.2 is equivalent to:

• each element of S \ W is the initial point of exactly one arrow;
• there is no cycle in the graph.

Denote by G(W) the set of all W-graphs. The cost of a graph g is defined by c(g) = Σ_{(s_1→s_2)∈g} c(s_1, s_2). Further, for any W ⊂ S′ ⊂ S, G_{S′}(W) denotes the set of W-graphs over the set S′.

Definition 2.3. (graphs G_{s_1,s_2}(W)) For any W ⊂ S, s_1 ∈ S \ W and s_2 ∈ W, G_{s_1,s_2}(W) is the set of W-graphs in which there exists a path from s_1 to s_2.

For any W ⊂ S and s ∈ S \ W, G(s ↛ W) denotes the union of G(W ∪ {s}) and ∪_{s′ ∈ S\W, s′ ≠ s} G_{s′,s}(W ∪ {s}).

The remainder of this section considers the escape dynamics from D(K) for any K ∈ 𝒦.
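Since W-graphs and their minimum costs drive everything that follows, a brute-force sketch may help fix ideas. The code below (ours, only practical for very small state spaces) enumerates all W-graphs, i.e., assignments of exactly one out-arrow to each state outside W such that every such state leads into W, and returns the minimum cost:

```python
import itertools, math

def min_wgraph_cost(S, W, c):
    """Minimum of c(g) over all W-graphs g, by brute-force enumeration.

    S is the state space, W the target subset, and c[s1].get(s2, inf)
    the one-step cost; a W-graph assigns one successor to every state
    in S \\ W so that following arrows always leads into W (no cycles).
    """
    def reaches_W(s, succ):
        seen = set()
        while s not in W:
            if s in seen:
                return False  # trapped in a cycle outside W
            seen.add(s)
            s = succ[s]
        return True

    free = [s for s in S if s not in W]
    best = math.inf
    for choice in itertools.product(S, repeat=len(free)):
        succ = dict(zip(free, choice))
        if any(s == t for s, t in succ.items()):
            continue  # no arrow from a state to itself
        if all(reaches_W(s, succ) for s in free):
            cost = sum(c[s].get(t, math.inf) for s, t in succ.items())
            best = min(best, cost)
    return best

# With the toy costs used above, the cheapest {2}-graph is 0 -> 2, 1 -> 0.
c = {0: {1: 1, 2: 2}, 1: {0: 1}, 2: {0: 1}}
assert min_wgraph_cost([0, 1, 2], {2}, c) == 3
```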

2.2. Exit point and exit time

Two useful propositions concerning the exit point from D(K) are given.

Proposition 2.1. (exit point) For any K ∈ 𝒦, let

B(D(K)) = {s′ ∈ S \ D(K) : ∃ s ∈ K s.t. min_{g ∈ G_{s,s′}(S\D(K))} c(g) = min_{g ∈ G(S\D(K))} c(g)}.

Then, for any s ∈ D(K),

lim_{ε→0} P{X^ε_{τ^ε(D(K))} ∈ B(D(K)) | X^ε_0 = s} = 1.

Proposition 2.2. (exit probability) For any s_1, s_2 ∈ D(K) and s′ ∈ B(D(K)),

lim_{ε→0} P{X^ε_{τ^ε(D(K))} = s′ | X^ε_0 = s_1} = lim_{ε→0} P{X^ε_{τ^ε(D(K))} = s′ | X^ε_0 = s_2} > 0.

Propositions 2.1 and 2.2 are direct corollaries of Lemma 3.3 in Freidlin and Wentzell (1984, Chap. 6). Note that any two states within the recurrent communication class K are accessible from each other along a zero-cost path, and D(K) is the attraction basin of K; these facts play an important role in the derivation of Propositions 2.1 and 2.2. Proposition 2.1 asserts that the most probable exit point from D(K) is contained in the set B(D(K)).⁵ Proposition 2.2 implies that the limit distribution of exit probabilities on B(D(K)) is invariant with respect to the initial state in D(K).⁶

The expectation of the exit time τ^ε(D(K)) can also be estimated.

Proposition 2.3. (exit time) For any K ∈ 𝒦 and s ∈ D(K),

0 < lim_{ε→0} E[τ^ε(D(K)) | X^ε_0 = s] / ε^{−R(K)} < ∞,

where R(K) = min_{g ∈ G(S\D(K))} c(g).

Proposition 2.3 is an immediate result of Lemma 3.4 in Freidlin and Wentzell (1984, Chap. 6) and gives the convergence rate of the expected exit time. To obtain Proposition 2.3, it suffices to show that min_{g ∈ G(s ↛ S\D(K))} c(g) = 0 for any s ∈ D(K), which follows immediately from the definition of D(K).

The following corollary of Propositions 2.2 and 2.3 offers an alternative expression for R(K).

Corollary 2.1. For any K ∈ 𝒦,

R(K) = min_{l ∈ L(K, S\D(K))} c(l) = min_{l ∈ L(K, B(D(K)))} c(l).

R(K) coincides with the radius of K defined in Ellison (2000); in this paper, it is the radius of the 0-cycle K.

2.3. Exit path

We have already discussed the most probable exit point and the expected exit time of {X^ε_n}_{n∈ℕ} escaping from D(K). The next question is: what can be said about the exit path from D(K)? The objective of this subsection is to characterize the most probable exit path.

First, two pieces of notation are introduced:

θ^ε_K = sup{n < τ^ε(D(K)) : X^ε_n ∈ K}

and

l_K(X^ε) = (X^ε_{θ^ε_K}, X^ε_{θ^ε_K+1}, …, X^ε_{τ^ε(D(K))}).

l_K(X^ε) is the exit path of the perturbed Markov chain {X^ε_n}_{n∈ℕ} leaving D(K). The following theorem details the most probable exit path.

Theorem 2.1. (exit path) Suppose that B(D(K)) is a singleton and there exists a unique path l_K ∈ L(K, S \ D(K)) attaining R(K). Then, for any s ∈ D(K),

lim_{ε→0} P{l_K(X^ε) ≠ l_K | X^ε_0 = s} = 0.

Theorem 2.1 suggests that, after escaping from K, the trajectory of the Markov chain {X^ε_n}_{n∈ℕ} before leaving D(K) is almost surely l_K when the amount of noise ε is small enough. If B(D(K)) is not a singleton or there exist multiple extremal paths, similar results hold: in these cases, each extremal path is a most probable exit path with strictly positive probability. Theorem 2.1 is a discrete-time, finite-state version of Theorem 2.3 in Freidlin and Wentzell (1984, Chap. 4).

In summary, the escape dynamics from D(K) leave D(K) along the most probable exit path l_K and arrive at the most probable exit point in B(D(K)), and the expected exit time is of order ε^{−c(l_K)}.

⁵ B(D(K)) can be regarded as the boundary of D(K).
⁶ This distribution depends on the details of the random noise (Fudenberg and Imhof, 2006).


3. Iterative cycle decomposition

This section explores the medium-run behaviour of the perturbed Markov chains {X^ε_n}_{n∈ℕ} over larger time intervals. To portray the most probable medium-run behaviour over gradually increasing time intervals, we shall iteratively decompose 𝒦, the collection of all limit sets, into a hierarchy of cycles.

3.1. Cycle decomposition

For any E ⊂ I and s ∈ ∪_{i∈E} K_i, 𝒮_{∪_{i∈E}K_i}(s) denotes the set of the most probable exit points from D(∪_{i∈E} K_i) as ε → 0, given that X^ε_0 = s. From Lemma 3.3 in Freidlin and Wentzell (1984, Chap. 6), for any s′ ∈ S \ D(∪_{i∈E} K_i), s′ ∈ 𝒮_{∪_{i∈E}K_i}(s) if there exists a G_{s,s′}(S \ D(∪_{i∈E} K_i))-graph g such that

c(g) = min_{g′ ∈ G(S\D(∪_{i∈E}K_i))} c(g′).

From Propositions 2.1 and 2.2, for any K ∈ 𝒦 and s ∈ K, 𝒮_K(s) = B(D(K)). That is, for any K ∈ 𝒦, 𝒮_K(s) is invariant with respect to the specific value of s ∈ K, so the following definition is unambiguous.

Definition 3.1. A subset λ of 𝒦 is a cycle of rank 1 if

(1) for any K, K′ ∈ λ with K ∩ K′ = ∅, there exists a sequence of distinct limit sets (K_{j_0}, …, K_{j_t}) ∈ λ × ⋯ × λ such that K_{j_0} = K, K_{j_t} = K′ and, for any i with 0 ≤ i ≤ t − 1 and any s ∈ K_{j_i}, 𝒮_{K_{j_i}}(s) ∩ D(K_{j_{i+1}}) ≠ ∅, where D(·) is given in the first paragraph of Section 2.1;
(2) for any K ∈ λ, L(K) ⊆ λ, where for any E ⊂ I,

L(∪_{i∈E} K_i) = {K ∈ 𝒦 : ∃ s ∈ ∪_{i∈E} K_i s.t. 𝒮_{∪_{i∈E}K_i}(s) ∩ D(K) ≠ ∅}.

The first condition requires that any two elements of λ be accessible through transitions between the elements of λ; each transition consists of the most probable escape dynamics followed by the convergence of the unperturbed learning dynamics. The second condition means that the unperturbed adaptive learning dynamics only converge to one of the limit sets within λ whenever the initial state is a most probable exit point from D(K) for some K ∈ λ.

Definition 3.1 is an extension of Definition 4 in Nöldeke and Samuelson (1993). Nöldeke and Samuelson (1993) only focus on the single-mutation case, while Definition 3.1 considers the most probable escape dynamics, which may involve multiple mutations (see the application in Section 5). On the other hand, with regard to the definition of a cycle, one common feature of the existing mathematics literature is that it examines perturbed diffusion Markov processes in a continuous-time framework, and the analysis is simplified by reducing the original processes to Markov chains on the collection of limit sets (Freidlin and Wentzell, 1984; Hwang and Sheu, 1990). Each state of these Markov chains is a recurrent communication class which corresponds to a limit set of the original unperturbed process.⁷ Definition 3.1 naturally generalizes the ideas in Freidlin and Wentzell (1984) and Hwang and Sheu (1990) by taking the details of the escape dynamics into account.⁸

We iteratively decompose 𝒦 into a hierarchy of cycles as follows. Cycles of rank 0 are the recurrent communication classes. According to Definition 3.1, 𝒦 is decomposed into cycles of rank 1. For any K ∈ 𝒦, let λ¹(K) be the cycle of rank 1 containing K: if there exists a subset λ of 𝒦 satisfying Definition 3.1 with K ∈ λ, then λ¹(K) = λ; otherwise, λ¹(K) = {K}. Repeat this operation on the elements of 𝒦 \ λ¹(K), and so on, until all elements of 𝒦 are exhausted. Let C¹ be the set of all cycles of rank 1. Any two cycles λ¹_i, λ¹_j ∈ C¹ either are disjoint or coincide.

We continue the decomposition by induction. Assume that the cycles of rank k − 1 (briefly, (k − 1)-cycles) have already been constructed for some k ≥ 2. Any Λ^{k−1} ∈ C^{k−1} is a set of (k − 2)-cycles equipped with a cyclic order. For simplicity, denote the set of elements constituting a cycle by the same symbol as the cycle itself. To determine the k-cycle Λ^k(Λ^{k−1}) containing Λ^{k−1} for any Λ^{k−1} ∈ C^{k−1}, it suffices to substitute Λ^{k−1} for K and C^{k−1} for 𝒦 in Definition 3.1. Following the same procedure as in the previous paragraph, the set C^{k−1} is decomposed into k-cycles.
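As an illustration of the first decomposition step, the sketch below (our own, with a hypothetical input encoding) groups limit sets into cycles of rank 1 once the sets L(K_i) of Definition 3.1 are known: a strongly connected component of the most-probable-transition graph that is closed under these transitions forms a 1-cycle, and every other limit set remains a singleton.

```python
def rank1_cycles(succ):
    """Cycles of rank 1 from the most-probable-transition graph.

    succ[i] is the set L(K_i): the limit sets whose basins the most
    probable escape from D(K_i) can enter.  A strongly connected
    component that is closed under succ satisfies both conditions of
    Definition 3.1; every other limit set stays a singleton cycle.
    """
    nodes = sorted(succ)
    reach = {i: set(succ[i]) for i in nodes}
    changed = True
    while changed:                        # transitive closure, tiny graphs
        changed = False
        for i in nodes:
            expanded = set(reach[i])
            for j in reach[i]:
                expanded |= reach[j]
            if expanded != reach[i]:
                reach[i], changed = expanded, True
    cycles, done = [], set()
    for i in nodes:
        if i in done:
            continue
        scc = {j for j in nodes if j in reach[i] and i in reach[j]} | {i}
        if all(succ[j] <= scc for j in scc):   # closed: condition (2)
            cycles.append(frozenset(scc))
            done |= scc
        else:
            cycles.append(frozenset({i}))
            done.add(i)
    return cycles

# Young's game in Section 5: the most probable escapes run A -> B,
# B -> A and C -> B, giving the two 1-cycles {A, B} and {C}.
out = rank1_cycles({'A': {'B'}, 'B': {'A'}, 'C': {'B'}})
assert sorted(map(sorted, out)) == [['A', 'B'], ['C']]
```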

Eventually, for some positive integer k, all (k − 1)-cycles join into one k-cycle; let σ(X^ε) be the minimum such k.

Remark. As in Propositions 2.1 and 2.2, for any Λ^k ∈ C^k and s ∈ Λ^k, k ≥ 1, 𝒮_{Λ^k}(s) specifies the same set equipped with the same limit distribution of positive exit probabilities. These facts guarantee that the iterative decomposition described above is unambiguous.

The cycle decomposition completely determines the most probable order in which the perturbed Markov chains {X^ε_n}_{n∈ℕ} traverse all recurrent communication classes, or all limit sets, as the amount of noise ε tends to zero. Section 5 will demonstrate this point.

⁷ Usually, the original unperturbed process is an ordinary differential equation.
⁸ Nöldeke and Samuelson (1993) and Samuelson (1994) do not pay attention to the escape dynamics or the medium-run behaviour.


The following theorem estimates the expectation of the exit time τ^ε(D(Λ^k)) for any k ≥ 1.

Theorem 3.1. (exit time of a cycle) Consider a k-cycle Λ^k with k ≥ 1. For any Λ^{k−1} ∈ Λ^k, let

γ(Λ^{k−1}) = min_{g ∈ G_{D(Λ^{k−1})∪(S\D(Λ^k))}(S\D(Λ^k))} c(g) − min_{g ∈ G(S\D(Λ^{k−1}))} c(g)

and

R(Λ^k) = max_{Λ^{k−1} ∈ Λ^k} R(Λ^{k−1}) + min_{Λ^{k−1} ∈ Λ^k} γ(Λ^{k−1}).

Then, for any s ∈ D(Λ^k),

0 < lim_{ε→0} E(τ^ε(D(Λ^k)) | X^ε_0 = s) / ε^{−R(Λ^k)} < ∞.

For any k ≥ 1, R(Λ^k) is defined as the radius of the k-cycle Λ^k. Theorem 3.1 gives a concise expression for R(Λ^k), which depends on the escape dynamics of the (k − 1)-cycles constituting Λ^k. The formula in Theorem 3.1 is simpler and more intuitive than the one in Theorem 6.2 of Freidlin and Wentzell (1984, Chap. 6). In addition, the radii can be calculated iteratively.
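In code, the formula of Theorem 3.1 is a one-liner; the sketch below (ours) anticipates the numbers computed for the 1-cycle {A, B} in Section 5, where R(A) = 1, R(B) = 2 and the escape increments are γ(A) = 1 and γ(B) = 1:

```python
def cycle_radius(sub_radii, gammas):
    """R(cycle) = max radius of its sub-cycles + min escape increment gamma."""
    return max(sub_radii) + min(gammas)

# The 1-cycle {A, B} of Section 5: R = max{1, 2} + min{2 - 1, 3 - 2} = 3.
assert cycle_radius([1, 2], [2 - 1, 3 - 2]) == 3
```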

The following corollary is an immediate result of Theorem 3.1.

Corollary 3.1. Let Λ^k be a k-cycle, k ∈ ℕ. Then

R(Λ^k) < R(Λ^{k+1}(Λ^k)) < R(Λ^{k+2}(Λ^{k+1}(Λ^k))) < ⋯.

Corollary 3.1 is intuitive. It implies that the cycles of higher ranks have larger radii.

3.2. Escape dynamics from D(Λ^k) (k ≥ 1)

This subsection investigates the escape dynamics from D(Λ^k) for k ≥ 1. The estimation of the expected exit time from D(Λ^k) was given in Theorem 3.1, and the most probable exit point and exit path can be studied in the same way as in Propositions 2.1 and 2.2 and Theorem 2.1 for the 0-cycle. We therefore focus only on the most probable last (k − 1)-cycle visited before leaving D(Λ^k).

Proposition 3.1. (last visited (k − 1)-cycle) For any Λ^{k−1} ∈ Λ^k and s ∈ D(Λ^k),

lim_{ε→0} P{X^ε_{τ^ε(D(Λ^k))−1} ∈ D(Λ^{k−1}) | X^ε_0 = s} > 0

is equivalent to

γ(Λ^{k−1}) = min_{Λ̃^{k−1} ∈ Λ^k} γ(Λ̃^{k−1}).

Proposition 3.1 characterizes the most probable last visited (k − 1)-cycle before leaving D(Λ^k). Its proof follows the same procedure as the proofs of Propositions 2.1 and 2.2 and Theorem 3.1.

4. Stochastic stability

This section applies the cycle decomposition to studying stochastic stability. We shall refine the notion of stochasticstability and evaluate the speed at which stochastically stable equilibria emerge.

First, we estimate the time spent by the perturbed Markov chains in the attraction basins of cycles.

Theorem 4.1. Consider a k-cycle Λ^k with k ≥ 0. For any α > 0 and s ∈ D(Λ^k),

lim_{ε→0} P{ 1/ε^{R(Λ^k)−α} < τ^ε(D(Λ^k)) < 1/ε^{R(Λ^k)+α} | X^ε_0 = s } = 1.

Theorem 4.1 is an analog of Lemma 2.2 in Hwang and Sheu (1990). It provides a sharp estimate of τ^ε(D(Λ^k)) when ε is sufficiently small: R(Λ^k) is a precise measure of the persistence of the set D(Λ^k), and a better measure than the radius in Ellison (2000) (see the application in Section 5). A feasible direction for extending the results in Ellison (2000) is thus to improve the radius.


4.1. Selection of stochastically stable equilibria

As a result of irreducibility, for any ε with 0 < ε < ε̄, there exists a unique invariant distribution π^ε for {X^ε_n}_{n∈ℕ}. From Lemma 3.1 in Freidlin and Wentzell (1984, Chap. 6) and the model specifications in Section 1, π^ε has a unique limit as ε → 0. A state s ∈ S is stochastically stable if lim_{ε→0} π^ε(s) > 0.

Applying Lemma 3.1 in Freidlin and Wentzell (1984, Chap. 6) again, for any K ∈ 𝒦 the convergence rate of π^ε(s) as ε → 0 takes the same value for all s ∈ K. It is therefore sufficient to study the convergence rate of π^ε(K) as ε → 0.

For any K ∈ 𝒦, there exists a unique sequence of cycles λ^l(K) ∈ C^l, 0 ≤ l ≤ σ(X^ε), such that

K = λ^0(K) ∈ λ^1(K) ∈ ⋯ ∈ λ^{σ(X^ε)}(K).

The following theorem gives the convergence rate of π^ε(K) as ε → 0 for any K ∈ 𝒦.

Theorem 4.2. (convergence rate) For any K ∈ 𝒦,

0 < lim_{ε→0} π^ε(K) / ε^{h(K)} < +∞,

where

h(K) = Σ_{l=0}^{σ(X^ε)−1} [ max_{Λ^l ∈ λ^{l+1}(K)} R(Λ^l) − R(λ^l(K)) ].

Theorem 4.2 implies that S* = {K ∈ 𝒦 : h(K) = 0} is the collection of all stochastically stable equilibria. The convergence rate of π^ε(K) as ε → 0 is precisely measured by h(K), which is calculated from the radii of cycles.

Suppose that the Markov chains (S, P(ε)), 0 ≤ ε < ε̄, and (S, P) satisfy the conditions prescribed in Section 1. Through the cycle decomposition and Theorem 4.2, we can build an algorithm to identify stochastically stable equilibria as follows:

Step 1. Decompose 𝒦 into the hierarchy of cycles and calculate the radii.
Step 2. Set k = σ(X^ε) and S^{σ(X^ε)} = C^{σ(X^ε)}.
Step 3. Determine the set

S^{k−1} = {Λ^{k−1}_0 ∈ C^{k−1} : Λ^k(Λ^{k−1}_0) ∈ S^k and R(Λ^{k−1}_0) = max_{Λ^{k−1} ∈ Λ^k(Λ^{k−1}_0)} R(Λ^{k−1})}.

Step 4. If k = 1, the process ends and S^0 is the collection of all stochastically stable equilibria; otherwise, set k = k − 1 and return to Step 3.

With this algorithm, we do not have to rely on tree constructions on S to find stochastically stable equilibria; a sketch implementation is given below. The algorithm also has further implications. One implication is that stochastically stable equilibria always belong to the cycles with the largest radii; in other words, the cycles containing stochastically stable equilibria are the most persistent. Thus, this paper substantially extends the result reported in Ellison (2000).
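The following sketch (ours; the dictionary encoding of the hierarchy is hypothetical) implements Steps 2–4 once Step 1 has produced the hierarchy and the radii, and reproduces the selection of B in the game of Section 5:

```python
def stochastically_stable(children, radius, top):
    """Descend the cycle hierarchy, keeping at each level the sub-cycles
    of maximal radius inside every surviving cycle (Steps 2-4).

    children[c] lists the sub-cycles composing cycle c (empty for a
    0-cycle), radius[c] is R(c), and top is the cycle of highest rank.
    Returns the 0-cycles K with h(K) = 0.
    """
    survivors = [top]
    while children[survivors[0]]:          # stop once 0-cycles are reached
        next_level = []
        for c in survivors:
            rmax = max(radius[d] for d in children[c])
            next_level += [d for d in children[c] if radius[d] == rmax]
        survivors = next_level
    return survivors

# Section 5: the 2-cycle splits into {A, B} (radius 3) and {C} (radius 1);
# {A, B} splits into A (radius 1) and B (radius 2).  B is selected.
children = {'top': ['AB', 'C1'], 'AB': ['A', 'B'], 'C1': ['C'],
            'A': [], 'B': [], 'C': []}
radius = {'AB': 3, 'C1': 1, 'A': 1, 'B': 2, 'C': 1}
assert stochastically_stable(children, radius, 'top') == ['B']
```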

4.2. Waiting time for equilibria selection

According to the theory of Markov chains, the perturbed learning dynamics {X^ε_n}_{n∈ℕ} will stay in D(S*) most of the time as the amount of noise ε goes to zero. But how fast does {X^ε_n}_{n∈ℕ} reach D(S*)? Ellison (1993) is the first to consider this problem, in a learning model in games. He shows that, for a large population, global interaction makes the learning model enter D(S*) more slowly than local interaction does. Further, Ellison (2000) analyses more general models, using a new measure, the (modified) coradius, to characterize the attraction of a collection of limit sets and to give an upper bound on the speed at which stochastically stable equilibria occur. This subsection aims to measure precisely the speed at which the perturbed learning dynamics {X^ε_n}_{n∈ℕ} arrive at D(S*).

A couple of notations are given first. For any k ≥ 1, ϒ(Λ^k) denotes the second largest number among R(Λ^{k−1}) for all Λ^{k−1} ∈ Λ^k.⁹ Let

ϒ = max_{K ∈ S*} max_{1≤l≤σ(X^ε)} ϒ(λ^l(K))  and  η = inf{n : X^ε_n ∈ D(S*)}.

η is the waiting time before the first arrival at D(S*).

⁹ For a singleton, the second largest number is defined to be zero.


From the cycle decomposition and Theorem 4.1, the following proposition is obtained.

Proposition 4.1. (waiting time) For any s ∈ S and α > 0,

lim_{ε→0} P{ η < 1/ε^{ϒ+α} | X^ε_0 = s } = 1.

Moreover, there exists a state s_0 ∈ S such that

lim_{ε→0} P{ 1/ε^{ϒ−α} < η < 1/ε^{ϒ+α} | X^ε_0 = s_0 } = 1.

The first equation says that ϒ is an upper bound for the speed at which stochastically stable equilibria emerge as ε → 0. The second implies that the upper bound ϒ can be attained; that is, no number strictly less than ϒ can bound the speed at which stochastically stable equilibria arise as ε → 0. Therefore, ϒ exactly describes how fast the perturbed Markov chains {X^ε_n}_{n∈ℕ} reach D(S*) as ε → 0. It is no larger than the (modified) coradius of S* (Ellison, 2000); in fact, the inequality is strict for a large class of dynamics, e.g., the application in Section 5.

5. Application

In this section, we apply the above results to a 3 × 3 symmetric game of Young (1993). Moreover, we employ this game to illustrate the relationship between the cycle decomposition and the methodology adopted in Ellison (2000).

5.1. The model of Young (1993)

Consider a two-player, 3 × 3 symmetric game with payoff matrix

        A       B       C
A    (2, 2)  (2, 1)  (0, 0)
B    (1, 2)  (3, 3)  (3, 0)
C    (0, 0)  (0, 3)  (4, 4)

The payoff functions are

u_A = 2(p_A + p_B),  u_B = 3 − 2p_A,  u_C = 4(1 − p_A − p_B),

where u_i, i ∈ {A, B, C}, is the payoff of adopting the pure strategy i, and (p_A, p_B, 1 − p_A − p_B) denotes the other player's mixed strategy, with p_A and p_B the weights on the pure strategies A and B, respectively. The outcomes (A, A), (B, B) and (C, C) constitute three strict pure-strategy Nash equilibria.

The game is repeatedly played by boundedly rational players. In each period t = 0, 1, 2, …, two players are randomly drawn from a large but finite population of individuals. The strategy profile (a_1(t), a_2(t)) ∈ {A, B, C} × {A, B, C} chosen by the two players is called the play at time t. Let a(t) = (a_1(t), a_2(t)) and h_m(t) = (a(t − m), a(t − m + 1), …, a(t − 1)); h_m(t) is the history in period t with memory m.

Suppose that the first m plays are randomly selected. In each period t, t ≥ m, each player inspects k plays drawn without replacement from h_m(t), where k, 1 ≤ k ≤ m, is an integer, and then chooses an optimal strategy, namely a best response to the empirical distribution of the other player's strategy in these k plays. Assume that the inspection processes are independent across players and periods. The adaptive learning dynamics define a Markov chain {X_t}_{t∈ℕ} with X_t = h_m(t + m) for any t ∈ ℕ.
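For readers who want to experiment, a minimal simulation of this adaptive play (our own sketch, not the authors' code; parameter names are illustrative and ties in the best reply are broken arbitrarily) looks as follows:

```python
import random

def adaptive_play(m=9, k=3, eps=0.01, T=50_000, seed=0):
    """One run of Young's adaptive play for the 3 x 3 game above.

    history keeps the last m plays; each period both players best-reply
    to a size-k sample of the opponent's actions in the history, and
    with probability eps they experiment uniformly over {A, B, C}.
    """
    rng = random.Random(seed)
    S = ['A', 'B', 'C']
    history = [(rng.choice(S), rng.choice(S)) for _ in range(m)]
    for _ in range(T):
        play = []
        for me in (0, 1):
            if rng.random() < eps:                  # experimentation
                play.append(rng.choice(S))
                continue
            sample = rng.sample([h[1 - me] for h in history], k)
            pA = sample.count('A') / k
            pB = sample.count('B') / k
            u = {'A': 2 * (pA + pB), 'B': 3 - 2 * pA, 'C': 4 * (1 - pA - pB)}
            play.append(max(u, key=u.get))          # arbitrary tie-breaking
        history = history[1:] + [tuple(play)]
    return history

# With small eps a long run typically ends locked into the convention B,
# i.e., the final history is m copies of ('B', 'B'), in line with the
# stochastic stability result derived below.
```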

According to Theorem 1 in Young (1993), if m ≥ 3k, the adaptive learning dynamics {X_t}_{t∈ℕ} converge to one of the limit states A, B or C, where A, B and C denote the histories consisting of m repetitions of the plays (A, A), (B, B) and (C, C), respectively.

As in Young (1993), the perturbed learning dynamics are generated as follows. Rather than always following the best-reply dynamics, each player independently experiments with probability ε in each period. When a player experiments, each strategy in {A, B, C} is selected with equal probability 1/3.¹⁰ Thus, the perturbed learning dynamics form a family of irreducible Markov chains. For any ε ∈ (0, 1), {X^ε_t}_{t∈ℕ} has a unique invariant distribution π^ε, which almost surely concentrates around a particular convention (the stochastically stable equilibrium) as the experimentation probability ε vanishes. For simplicity, suppose that k = 3 and m ≥ 9.

¹⁰ For a more general discussion of the experimentation process, see Young (1993). As in Young (1993), this section only considers homogeneous noise, but the analysis carries over to state-dependent noise (Bergin and Lipman, 1996).

5.2. The cycle decomposition

Since the game is symmetric, it does not matter which player experiments and which player reacts; assume that the row player experiments with a randomly selected strategy. To make the derivation easy to follow, we divide it into three steps. In the first step, the row player experiments in succession; the column player reacts after observing the most recent three plays. In the second step, the row player reacts after inspecting the most recent three plays; the column player reacts again after inspecting the row player's experimental plays. In the final step, both players take an optimal strategy in response to the most recent three plays. The first step describes the most probable escape dynamics; the last two steps correspond to the unperturbed learning dynamics. For brevity, this section records only the row player's plays.

The cycle decomposition proceeds in the following steps:

• Cycles of rank 0. A, B and C are the three 0-cycles.
• Cycles of rank 1. Writing (A, …, A)_j for j consecutive A plays by the row player (and similarly for B and C), the most probable exit paths from the attraction basins of the 0-cycles are:

A = (A, …, A)_m → ((A, …, A)_{m−1}, C) ∈ D(B),
B = (B, …, B)_m → ((B, …, B)_{m−1}, A) → ((B, …, B)_{m−2}, A, A) ∈ D(A),
C = (C, …, C)_m → ((C, …, C)_{m−1}, B) ∈ D(B),

where '→' denotes one experiment by the row player. As an illustration, the escape dynamics from D(A) are given in Appendix A.

From Corollary 2.1, R(A) = 1, R(B) = 2 and R(C) = 1. According to the decomposition process in Section 3.1, the three 0-cycles form two 1-cycles: {A, B} and {C}.

• Cycles of rank 2. The most probable exit paths from the 1-cycle {A, B} are

A = (A, …, A)_m → ((A, …, A)_{m−1}, C) → ((A, …, A)_{m−2}, C, C) ∈ D(C)

and

B = (B, …, B)_m → ((B, …, B)_{m−1}, C) → ((B, …, B)_{m−2}, C, C) → ((B, …, B)_{m−3}, C, C, C) ∈ D(C).

From Theorem 3.1,

R({A, B}) = max{1, 2} + min{2 − 1, 3 − 2} = 3.

[Fig. 1. The cycle decomposition.]

The two 1-cycles {A, B} and {C} form the unique 2-cycle, which completes the cycle decomposition, as illustrated in Fig. 1. According to the cycle decomposition and Theorem 4.2,

h(A) = 1,  h(B) = 0,  and  h(C) = 2;

that is,

0 < lim_{ε→0} π^ε(A)/ε < ∞,  lim_{ε→0} π^ε(B) = 1,  and  0 < lim_{ε→0} π^ε(C)/ε² < ∞.

We obtain the stochastic stability of the convention B as well as the convergence rates of π^ε(A) and π^ε(C).¹¹
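These values of h can be checked mechanically from the radii alone; here is a small sketch (ours) of the formula in Theorem 4.2:

```python
def h(path_radii, sibling_radii):
    """h(K) = sum over levels l of (max radius among the sibling
    l-cycles) - R(lambda^l(K)), as in Theorem 4.2."""
    return sum(max(sibs) - r for r, sibs in zip(path_radii, sibling_radii))

# Level-0 radii: R(A) = 1, R(B) = 2, R(C) = 1; level-1 radii:
# R({A, B}) = 3, R({C}) = 1.
assert h([1, 3], [[1, 2], [3, 1]]) == 1   # h(A)
assert h([2, 3], [[1, 2], [3, 1]]) == 0   # h(B)
assert h([1, 1], [[1], [3, 1]]) == 2      # h(C)
```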

Further,

ϒ = max{1, 1} = 1.

As a result of Proposition 4.1, the speed at which the convention B emerges is 1; in detail, the waiting time before the perturbed learning dynamics {X^ε_t}_{t∈ℕ} enter D(B) is no more than O(ε^{−1}) as ε tends to zero.

The game also offers an opportunity to exhibit the relationship between the results in this paper and the measures in Ellison (2000). To avoid confusion, a subscript E is attached to the notation of Ellison (2000), e.g., the radius R_E(·). If Ω = {A, B}, then according to the formula in Ellison (2000, p. 23),

R_E(Ω) = 2.

That is, R_E(Ω) < R(Ω). The reason is that Ellison (2000) misses the analysis of the medium-run behaviour over the limit sets composing Ω. In addition, according to the formula in Ellison (2000, p. 28), it is straightforward to see that

CR*_E(B) = 3,

where CR*_E(B) is the modified coradius of the stochastically stable equilibrium B. Notably, CR*_E(B) > ϒ; in other words, compared with the modified coradius, ϒ offers a more exact measure of the speed at which the stochastically stable equilibrium B arises.

Finally, we describe how the perturbed learning dynamics traverse the conventions A, B and C as the probability of experimentation ε tends to zero. Without loss of generality, assume that the initial state is in D(A). First, the process enters A after finitely many steps. For t < O(ε^{−1+α}), it stays in D(A), where α > 0 is a sufficiently small constant. During O(ε^{−1−α}) < t < O(ε^{−2+α}), it exits from D(A) along the most probable exit path and enters D(B). Then, during O(ε^{−2−α}) < t < O(ε^{−3+α}), it escapes from D(B) into D(A) and from D(A) into D(B), and so on; this process of moving back and forth between A and B forms the 1-cycle {A, B}. When t > O(ε^{−3−α}), it exits from D({A, B}) along the most probable exit path and enters D(C). After staying there for a time no longer than O(ε^{−1−α}), it exits from D(C) along the most probable exit path and enters D(B). Then another round trip starts.

¹¹ The results in Ellison (2000) depend on the choice of Ω, the collection of limit sets. If Ω = {A, B}, Theorem 1 or Theorem 2 in Ellison (2000) applies and the stochastically stable equilibrium is contained in Ω; if Ω = {B}, the results in Ellison (2000) lack power. The cycle decomposition is not troubled by this problem.


6. Conclusion

This paper has explored the medium-run, or sublimit, behaviour of adaptive learning dynamics in games. Building on the most probable medium-run behaviour over gradually increasing time intervals, a hierarchical structure of cycles has been introduced. The cycle decomposition specifies the most probable order in which the perturbed learning dynamics traverse all limit sets as the amount of noise goes to zero. When the speed at which stochastically stable equilibria occur is very low, the cycle decomposition describes which equilibria will be observed in the first million or trillion periods once the initial state is known and the noise vanishes. In other words, the cycle decomposition fully explores the dynamic behaviour over the intermediate equilibria before the emergence of stochastically stable equilibria.

In addition, this paper contributes to the existing literature through the study of stochastic stability. Among the cycles of the same rank, it is always most difficult to escape from the attraction basins of the cycles containing stochastically stable equilibria. The cycle decomposition accurately estimates the expected exit time from a particular collection of limit sets, and it exactly measures the speed at which stochastically stable equilibria occur. Both results have extensive applications in the dynamic analysis of economic phenomena.

Acknowledgements

We are indebted to Professor Chenghu Ma and an anonymous referee for very useful comments and suggestions. We arealso grateful to Xuan Liu, Yan Wen, Jin Zhang and Lei Zu for helpful discussions and comments. This work is partly supportedby NSFC Nos. 10571157 and 10871176.

Appendix A.

Proof of Theorem 2.1. First of all, we state a lemma from Catoni and Cerf (1997), which is indispensable for the proof.

Lemma 1. (expected number of visits before exit) Assume that the Markov chain (S, Q) is irreducible. Then, for any nonempty subset W of S and s_1, s_2 ∈ S \ W,

Σ_{n=0}^{+∞} P{X_n = s_2, τ_W > n | X_0 = s_1} = [ Σ_{g ∈ G_{s_1,s_2}(W∪{s_2})} π(g) ] / [ Σ_{g ∈ G(W)} π(g) ],

where τ_W = inf{n > 0 : X_n ∈ W} and, for any graph g oriented on S, π(g) = Π_{(s→s′)∈g} Q_{s,s′}.¹²

For any ε ∈ (0, ε̄) and l = (s_0, s_1, …, s_t) ∈ L(K, S \ D(K)),

P{l_K(X^ε) = (s_0, s_1, …, s_t) | X^ε_0 = s}
  = Σ_{k=0}^{+∞} P{l_K(X^ε) = (s_0, s_1, …, s_t), θ^ε_K = k, X^ε_{θ^ε_K} = s_0 | X^ε_0 = s}
  = Σ_{k=0}^{+∞} P{l_K(X^ε) = (s_0, s_1, …, s_t), θ^ε_K = 0 | X^ε_0 = s_0} × P{X^ε_k = s_0, k < τ^ε(D(K)) | X^ε_0 = s}
  = ( Σ_{k=0}^{+∞} P{X^ε_k = s_0, k < τ^ε(D(K)) | X^ε_0 = s} ) × P{l_K(X^ε) = (s_0, s_1, …, s_t), θ^ε_K = 0 | X^ε_0 = s_0},

where the second equality follows from the Markov property.

¹² If s_1 = s_2, G_{s_1,s_2}(W ∪ {s_2}) is equal to G(W ∪ {s_2}).


From the definition of D(K), the accessibility between the elements of K, and Lemma 1,

0 < lim_{ε→0} P{l_K(X^ε) = (s_0, s_1, …, s_t) | X^ε_0 = s} / ε^{c(l)−R(K)} < ∞.

Theorem 2.1 is an immediate consequence of the above equation. □

Proof of Theorem 3.1. We prove the theorem by induction on k. According to Lemma 3.4 in Freidlin and Wentzell (1984, Chap. 6), for any s ∈ D(Λ^k),

E(τ^ε(D(Λ^k)) | X^ε_0 = s) = [ Σ_{g ∈ G(s ↛ S\D(Λ^k))} π^ε(g) ] / [ Σ_{g ∈ G(S\D(Λ^k))} π^ε(g) ],

where for any graph g oriented on S, π^ε(g) = Π_{(s_1→s_2)∈g} P_{s_1,s_2}(ε). According to the model specifications in Section 1, for any s ∈ D(Λ^k),

0 < lim_{ε→0} E(τ^ε(D(Λ^k)) | X^ε_0 = s) / ε^{−CR(s,Λ^k)} < ∞,

where CR(s, Λ^k) = min_{g ∈ G(S\D(Λ^k))} c(g) − min_{g ∈ G(s ↛ S\D(Λ^k))} c(g).

From the first condition in Definition 3.1,

min_{g ∈ G(S\D(Λ^k))} c(g)
  = min_{Λ^{k−1} ∈ Λ^k} [ min_{g ∈ G_{D(Λ^k)}(D(Λ^{k−1}))} c(g) + min_{g ∈ G_{D(Λ^{k−1})∪(S\D(Λ^k))}(S\D(Λ^k))} c(g) ]
  = min_{Λ^{k−1} ∈ Λ^k} [ Σ_{Λ̃^{k−1} ∈ Λ^k} min_{g ∈ G(S\D(Λ̃^{k−1}))} c(g) − min_{g ∈ G(S\D(Λ^{k−1}))} c(g)
      + min_{g ∈ G_{D(Λ^{k−1})∪(S\D(Λ^k))}(S\D(Λ^k))} c(g) ]
  = min_{Λ^{k−1} ∈ Λ^k} [ min_{g ∈ G_{D(Λ^{k−1})∪(S\D(Λ^k))}(S\D(Λ^k))} c(g) − min_{g ∈ G(S\D(Λ^{k−1}))} c(g) ]
      + Σ_{Λ^{k−1} ∈ Λ^k} min_{g ∈ G(S\D(Λ^{k−1}))} c(g).    (1)

Assume that s ∈ D(K_1), where K_1 ∈ Λ^k. For any s′ ∈ K_1,

min_{g ∈ G(s ↛ S\D(Λ^k))} c(g) = min_{g ∈ G(s′ ↛ S\D(Λ^k))} c(g)
  = min{ min_{g ∈ G(K_1∪(S\D(Λ^k)))} c(g), min_{K ∈ Λ^k\{K_1}} min_{g ∈ G(K∪(S\D(Λ^k)))} c(g) }
  = min_{K ∈ Λ^k} min_{g ∈ G(K∪(S\D(Λ^k)))} c(g) = min_{K ∈ Λ^k} min_{g ∈ G_{D(Λ^k)}(K)} c(g).

Thus, for any s ∈ D(Λ^k),

CR(s, Λ^k) = min_{g ∈ G(S\D(Λ^k))} c(g) − min_{K ∈ Λ^k} min_{g ∈ G_{D(Λ^k)}(K)} c(g).


In other words, CR(s, Λ^k) takes the same value for all s ∈ D(Λ^k). Further,

min_{K ∈ Λ^k} min_{g ∈ G_{D(Λ^k)}(K)} c(g)
  = min_{Λ^{k−1} ∈ Λ^k} [ min_{g ∈ G_{D(Λ^k)}(D(Λ^{k−1}))} c(g) + min_{K ∈ Λ^{k−1}} min_{g ∈ G_{D(Λ^{k−1})}(K)} c(g) ]
  = min_{Λ^{k−1} ∈ Λ^k} [ min_{K ∈ Λ^{k−1}} min_{g ∈ G_{D(Λ^{k−1})}(K)} c(g) − min_{g ∈ G(S\D(Λ^{k−1}))} c(g) ]
      + Σ_{Λ^{k−1} ∈ Λ^k} min_{g ∈ G(S\D(Λ^{k−1}))} c(g)
  = −max_{Λ^{k−1} ∈ Λ^k} [ min_{g ∈ G(S\D(Λ^{k−1}))} c(g) − min_{K ∈ Λ^{k−1}} min_{g ∈ G_{D(Λ^{k−1})}(K)} c(g) ]
      + Σ_{Λ^{k−1} ∈ Λ^k} min_{g ∈ G(S\D(Λ^{k−1}))} c(g)
  = Σ_{Λ^{k−1} ∈ Λ^k} min_{g ∈ G(S\D(Λ^{k−1}))} c(g) − max_{Λ^{k−1} ∈ Λ^k} R(Λ^{k−1}).    (2)

The case k = 1 follows from Proposition 2.3; the cases k ≥ 2 are obtained by induction. As a result of Eqs. (1) and (2), for any s ∈ D(Λ^k),

CR(s, Λ^k) = min_{Λ^{k−1} ∈ Λ^k} [ min_{g ∈ G_{D(Λ^{k−1})∪(S\D(Λ^k))}(S\D(Λ^k))} c(g) − min_{g ∈ G(S\D(Λ^{k−1}))} c(g) ] + max_{Λ^{k−1} ∈ Λ^k} R(Λ^{k−1}) = R(Λ^k).

The proof is complete. □

Proof of Theorem 4.1. We only consider the case k = 0; for the cases k ≥ 1, the proof can be completed in the same way.

Let B(K) = {s′ ∈ S \ K : ∃ s ∈ K s.t. c(s, s′) < +∞}. Without loss of generality, assume that B(K) ⊂ D(K). By Chebyshev's inequality,

lim_{ε→0} P{ τ^ε(D(Λ^k)) ≥ 1/ε^{R(Λ^k)+α} | X^ε_0 = s }
  ≤ lim_{ε→0} E(τ^ε(D(Λ^k)) | X^ε_0 = s) × ε^{R(Λ^k)+α}
  = lim_{ε→0} [ E(τ^ε(D(Λ^k)) | X^ε_0 = s) × ε^{R(Λ^k)} ] × ε^α = 0.

The last equality follows from Theorem 3.1. Now it is sufficient to show that

lim_{ε→0} P{ τ^ε(D(Λ^k)) ≤ 1/ε^{R(Λ^k)−α} | X^ε_0 = s } = 0.

To prove this, an increasing sequence of Markov times is introduced as follows:

τ̂^ε_0 = 0,  σ̂^ε_0 = inf{n > 0 : X^ε_n ∈ B(K)};

and for any n ≥ 1,

τ̂^ε_n = inf{n′ > σ̂^ε_{n−1} : X^ε_{n′} ∈ K ∪ (S \ D(K))},  σ̂^ε_n = inf{n′ > τ̂^ε_n : X^ε_{n′} ∈ B(K)}.

The rest of the proof is the same as that of Theorem 4.2 in Freidlin and Wentzell (1984, Chap. 4). □

Proof of Theorem 4.2. It is sufficient to prove that, for any K ∈ 𝒦,

0 < lim_{ε→0} π^ε(D(K)) / ε^{h(K)} < +∞.

For any ε, 0 < ε < ε̄, the perturbed Markov chain {X^ε_n}_{n∈ℕ} is irreducible. Therefore, for any initial distribution, the invariant distribution π^ε is the limit, as n → +∞, of the empirical frequency of each state during the first n periods. In other words, for any i ∈ S,

π^ε(i) = lim_{n→+∞} [ Σ_{l=0}^{n} 1_{{i}}(X^ε_l) ] / (n + 1),

and for any W ⊂ S,

π^ε(W) = lim_{n→+∞} [ Σ_{l=0}^{n} 1_W(X^ε_l) ] / (n + 1).

From Theorem 4.1, for any Λ^{σ(X^ε)−1} ∈ Λ^{σ(X^ε)},

0 < lim_{ε→0} π^ε(D(Λ^{σ(X^ε)−1})) / ε^{h_{σ(X^ε)−1}(Λ^{σ(X^ε)−1})} < +∞,

where

h_{σ(X^ε)−1}(Λ^{σ(X^ε)−1}) = max_{Λ̃^{σ(X^ε)−1} ∈ Λ^{σ(X^ε)}} R(Λ̃^{σ(X^ε)−1}) − R(Λ^{σ(X^ε)−1}).

For any Λ^{σ(X^ε)−2} ∈ Λ^{σ(X^ε)−1}, R(Λ^{σ(X^ε)−2}) < R(Λ^{σ(X^ε)−1}). From Theorem 4.1, when the amount of noise ε is sufficiently small, {X^ε_n}_{n∈ℕ} traverses the collection

{Λ^{σ(X^ε)−2} ∈ C^{σ(X^ε)−2} : Λ^{σ(X^ε)−2} ∈ Λ^{σ(X^ε)−1}}

about C(1/ε)^{Δ(Λ^{σ(X^ε)−1})} times before exiting from D(Λ^{σ(X^ε)−1}), with probability arbitrarily close to 1, where C is a constant and

Δ(Λ^{σ(X^ε)−1}) = R(Λ^{σ(X^ε)−1}) − max_{Λ^{σ(X^ε)−2} ∈ Λ^{σ(X^ε)−1}} R(Λ^{σ(X^ε)−2}).

Just as in the case σ(X^ε) − 1, for any Λ^{σ(X^ε)−2} ∈ Λ^{σ(X^ε)−1},

0 < lim_{ε→0} π^ε(D(Λ^{σ(X^ε)−2})) / ε^{h_{σ(X^ε)−2}(Λ^{σ(X^ε)−2}) + h_{σ(X^ε)−1}(Λ^{σ(X^ε)−1})} < +∞,

where

h_{σ(X^ε)−2}(Λ^{σ(X^ε)−2}) = max_{Λ̃^{σ(X^ε)−2} ∈ Λ^{σ(X^ε)−1}} R(Λ̃^{σ(X^ε)−2}) − R(Λ^{σ(X^ε)−2}).

The result follows by continuing this process down to the 0-cycles. □

Escape dynamics from D(A). The escape dynamics from D(A) are illustrated by the following table:

Period   1  2  ⋯  m  m+1  m+2  m+3  m+4  m+5  m+6  ⋯
Row      A  A  ⋯  A   C    A    A    B    B    B   ⋯
Column   A  A  ⋯  A   A    B    B    B    B    B   ⋯

The play in period m + 1 corresponds to the first step of the derivation: the row player adopts the pure strategy C by experimenting. The plays in periods m + 2, …, m + 5 correspond to the second step, where in each period the column player reacts after observing the strategies chosen by the row player in periods m − 1, m and m + 1. The plays in periods m + 6, …, 2m + 5 constitute the convention B and correspond to the final step. Only a single mutation occurs, in period m + 1.


References

Beggs, A., 2005. Waiting times and equilibrium selection. Economic Theory 25, 599–628.
Bergin, J., Lipman, B., 1996. Evolution with state-dependent mutations. Econometrica 64, 943–956.
Catoni, O., Cerf, R., 1997. The exit path of a Markov chain with rare transitions. ESAIM: Probability and Statistics 1, 95–144.
Chen, H., Chow, Y., 2001. On the convergence of evolution processes with time-varying mutations and local interaction. Journal of Applied Probability 38, 301–323.
Chiang, T., Chow, Y., 1989. On the problem of exit from cycles for simulated annealing processes: a backward equation approach. The Annals of Probability 17, 1483–1502.
Chiang, T., Chow, Y., 1998. A limit theorem for a class of inhomogeneous Markov chains. The Annals of Applied Probability 8, 896–916.
Ellison, G., 1993. Learning, local interaction, and coordination. Econometrica 61, 1047–1071.
Ellison, G., 2000. Basins of attraction, long-run stochastic stability, and the speed of step-by-step evolution. Review of Economic Studies 67, 17–45.
Ely, J., 2002. Local conventions. Advances in Theoretical Economics 2, Article 1.
Foster, D.P., Young, H.P., 1990. Stochastic evolutionary game dynamics. Theoretical Population Biology 38, 219–232.
Freidlin, M., Wentzell, A., 1984. Random Perturbations of Dynamical Systems, 1st edition. Springer-Verlag, Berlin.
Fudenberg, D., Imhof, L.A., 2006. Imitation processes with small mutations. Journal of Economic Theory 131, 251–262.
Fudenberg, D., Levine, D.K., 1998. The Theory of Learning in Games. The MIT Press, Cambridge, MA.
Fudenberg, D., Levine, D.K., 2009. Learning and equilibrium. Annual Review of Economics 1, 385–420.
Goyal, S., Vega-Redondo, F., 2005. Network formation and social coordination. Games and Economic Behavior 50, 178–207.
Hwang, C., Sheu, S., 1990. Large-time behavior of perturbed diffusion Markov processes with applications to the second eigenvalue problem for Fokker–Planck operators and simulated annealing. Acta Applicandae Mathematicae 19, 253–295.
Kandori, M., Mailath, G., Rob, R., 1993. Learning, mutation, and long run equilibria in games. Econometrica 61, 29–56.
Kandori, M., Rob, R., 1995. Evolution of equilibria in the long run: a general theory and applications. Journal of Economic Theory 65, 383–414.
Nöldeke, G., Samuelson, L., 1993. An evolutionary analysis of backward and forward induction. Games and Economic Behavior 5, 425–454.
Samuelson, L., 1994. Stochastic stability in games with alternative best replies. Journal of Economic Theory 64, 35–65.
Young, H.P., 1993. The evolution of conventions. Econometrica 61, 57–84.
Young, H.P., 1998. Individual Strategy and Social Structure: An Evolutionary Theory of Institutions. Princeton University Press, Princeton, NJ.
Young, H.P., 2006. Social dynamics: theory and applications. In: Tesfatsion, L., Judd, K.L. (Eds.), Handbook of Computational Economics, Vol. 2: Agent-Based Computational Economics. Elsevier, North-Holland, pp. 1082–1108.