http://www.econometricsociety.org/

Econometrica, Vol. 75, No. 6 (November, 2007), 1591–1611

SOCIAL LEARNING IN ONE-ARM BANDIT PROBLEMS

DINAH ROSENBERG
Institut Galilée, Université Paris Nord, 93430 Villetaneuse, France, and Laboratoire d'Econométrie de l'Ecole Polytechnique, Paris, France

EILON SOLAN
The School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel

NICOLAS VIEILLE
HEC, 78 351 Jouy-en-Josas, France

The copyright to this Article is held by the Econometric Society. It may be downloaded, printed and reproduced only for educational or research purposes, including use in course packs. No downloading or copying may be done for any commercial purpose without the explicit permission of the Econometric Society. For such commercial purposes contact the Office of the Econometric Society (contact information may be found at the website http://www.econometricsociety.org or in the back cover of Econometrica). This statement must be included on all copies of this Article that are made available electronically or in any other format.




    SOCIAL LEARNING IN ONE-ARM BANDIT PROBLEMS

BY DINAH ROSENBERG, EILON SOLAN, AND NICOLAS VIEILLE¹

We study a two-player one-arm bandit problem in discrete time, in which the risky arm can have two possible types, high and low, the decision to stop experimenting is irreversible, and players observe each other's actions but not each other's payoffs. We prove that all equilibria are in cutoff strategies and provide several qualitative results on the sequence of cutoffs.

    KEYWORDS: Social learning, one-arm bandit, equilibrium, cutoff strategies.

    INTRODUCTION

RECENT MODELS of strategic experimentation (see Bolton and Harris (1999) and Keller, Rady, and Cripps (2005)) feature players who face identical bandit problems with a risky arm and a safe arm. It is assumed that players are free to switch from one arm to the other and that all information—both the actions and the actual rewards of the players—is publicly disclosed. The latter assumption appears to be restrictive in many economic setups, and the necessity to drop it has been recognized; see, for example, Bolton and Harris (1999).²

Here we address this task partially and study the following model. Each of two players faces a bandit machine with a risky arm and a safe arm. The risky arm is either of the high type, which yields independent and identically distributed (i.i.d.) payoffs with positive expectation, or of the low type, which yields payoffs with negative expectation. The machines of the two players have the same type. The decision to switch to the safe arm is irreversible. Along the play, each player observes her opponent's choices, but not her opponent's payoffs.

Dropping the assumption that payoffs are publicly observed raises new issues. Player i would like to make inferences about player j's observations on the basis of player j's actions, but cannot do so without knowing how player j's decisions relate to player j's observations, that is, j's strategy. As a consequence, there is no commonly observed state variable, such as a common posterior belief, on which to condition one's actions.

¹We are indebted to Itzhak Gilboa for his help. We thank Yisrael Aumann, Pierpaolo Battigalli, Thomas Mariotti, Peter Sozou, and Shmuel Zamir for their suggestions, and seminar audiences at Caltech, Hebrew University of Jerusalem, LSE, Northwestern, Paris, Stanford, Tel Aviv University, Toulouse 1 University, the Workshop on Stochastic Methods in Game Theory in Erice, and the Second World Congress of the Game Theory Society. The research of the second author was supported by the Israel Science Foundation, grant 69/01.

²Other models of strategic experimentation include Bergemann and Välimäki (1997, 2000) and Décamps and Mariotti (2004). Bergemann and Välimäki (1997, 2000) studied a model of sellers and buyers who learn the value of new products by experimentation. Décamps and Mariotti (2004) studied a specific duopoly model where each player learns about the quality of a common value project by observing some public information plus the experience of her rival.


Our main result states that, nevertheless, all equilibrium strategies process information in a simple way: at each stage a player (i) computes the conditional probability that the type is high, using only her private observations (i.e., her own payoffs), (ii) determines a time-dependent cutoff, which depends on the public information (the decisions of the other player), and (iii) switches to the safe arm if the conditional probability does not exceed the cutoff.

The intuition for this result runs as follows. Once player j's strategy is given, player i faces an optimal stopping problem. It turns out that the pair formed by player j's status (active or not) and player i's private belief follows a Markov process. This observation crucially relies on the property that the payoffs to both players are conditionally independent given the type of the machines, and it allows one to recast player i's optimal continuation payoff as a function of this pair. We next prove that, everything else being equal, player i's continuation payoff increases with respect to (w.r.t.) her private belief and is, therefore, positive (resp. negative) above (resp. below) some cutoff.

We also prove that the equilibrium cutoffs are nonincreasing as long as the other player is active and are constant afterward: seeing the other player active induces a player to be more optimistic and, therefore, to stay active with lower beliefs associated to private payoffs. Finally, we argue that, as the number of players increases, there is eventually a unique equilibrium that can be explicitly derived.

Our model is equivalent to a multiplayer version of the standard real-options problem (see Dixit and Pindyck (1994, p. 136)) in which an investor has to choose when to invest in some project with uncertain prospects. This equivalence is discussed in Section 2.2.

The model also relates to the literature on social learning with endogenous timing. In Chamley and Gale (1994) (see also Chamley (2004)), players are endowed with private information on the state of nature and must decide when to "invest." Like here, externalities are purely informational, decisions are irreversible, and information is private. The main difference is that private information is received only once, in contrast to our setup where it keeps flowing in.

Finally, some studies in biology address similar issues. In some contexts, animals can learn some relevant information by observing the behavior of other animals of the same species. Such behavior has been studied by, for example, Valone and Templeton (2002) and Giraldeau, Valone, and Templeton (2002).

The paper is organized as follows. In Section 1, we present the model and the main results. Comments and extensions appear in Section 2. All proofs are relegated to the Appendix.


    FIGURE 1.—Evolution of the game.

    1. MODEL AND RESULTS

    1.1. The Game

Each of two players operates a one-arm bandit machine in discrete time and must decide when to stop operating the machine. The decision to stop is irreversible and yields an outside payoff normalized to zero.

At each stage n ≥ 0, the following sequence of events unfolds (see Figure 1). First, each (active) player i decides whether to drop out (that is, stop operating the machine) or not. If she chooses the latter, she receives a random payoff X_n^i and observes who decided to stay in the game. Thus, payoffs are private information, while the exit decisions are publicly observed.

Player i's machine is one of two types: high or low. The two machines have the same type Θ, which is chosen by nature at the outset of the game according to a known prior. We assume that, conditional on Θ, the payoffs (X_n^i, X_n^j), n ≥ 0, are i.i.d.

Denote by θ̄ (resp. θ̲) the expected stage payoff of a machine of type high (resp. low), which we identify with the type of the machine. To eliminate trivial cases, we assume that θ̲ < 0 < θ̄. The players discount payoffs at a common rate δ ∈ (0, 1).
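To make the payoff structure concrete, here is a minimal simulation sketch of this environment (not from the paper; the unit-variance Gaussian payoff law and all parameter values are illustrative assumptions):

```python
import numpy as np

# Illustrative parameters: theta_lo < 0 < theta_hi and delta in (0, 1),
# as in the text; the Gaussian payoff distribution is an assumption.
theta_hi, theta_lo, delta, p0 = 1.0, -1.0, 0.9, 0.5

def draw_environment(n_stages, rng):
    """Draw the common type Theta and both players' stage payoffs.

    Conditional on Theta, the pairs (X^1_n, X^2_n), n >= 0, are i.i.d.
    across stages and independent across players.
    """
    high = bool(rng.random() < p0)        # Theta, common to the two machines
    mean = theta_hi if high else theta_lo
    X = rng.normal(mean, 1.0, size=(2, n_stages))
    return high, X

rng = np.random.default_rng(0)
high, X = draw_environment(10, rng)
```

Both players' payoffs are drawn around the same mean, which is what makes player j's exit decisions informative to player i about Θ.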

Note that the payoff to player i is not affected by player j's decisions. However, insofar as player j's decisions are affected by her payoffs, and since the payoffs to both players are correlated, player j's decisions may be used by player i to infer information about Θ.

    1.1.1. Strategies

Let (Ω, P) be the probability space over which all random variables are defined. P_θ stands for the conditional probability given Θ = θ. Expectations w.r.t. P and P_θ are denoted E and E_θ, respectively. The prior probability that the machines' type is high is denoted by p_0 := P(Θ = θ̄).

A strategy of player i specifies when to drop out from the game. At stage n, player i's private information consists of her past payoffs (X_0^i, …, X_{n−1}^i). We denote by F_n^i = σ(X_0^i, …, X_{n−1}^i) the σ-algebra over Ω defined by player i's private information at stage n.


In addition, player i knows if (and when) the other player, player j, dropped out. We denote by α ∈ N ∪ {∗} the status variable of player j: α = ∗ if player j is still active and α = k if player j dropped out at stage k. Accordingly, we have the following definition of a pure strategy.

DEFINITION 1: A pure strategy of player i is a family φ^i = (τ^i(α), α ∈ N ∪ {∗}) of stopping times for the filtration (F_n^i)_{n∈N}, with the property that τ^i(k) > k, P almost surely (P-a.s.) for each k ∈ N.

Player i's behavior until player j drops out is described by τ^i(∗), while τ^i(k) (k ∈ N) describes her behavior after stage k, in the event player j drops out at stage k.

Cutoff strategies process information in the simplest way. At any stage, the decision whether to drop out or to continue is made by computing the belief assigned to the state high, given only one's own private information, and comparing it to a time-dependent cutoff.

Formally, for every stage n ∈ N, we define p_n^i := P(Θ = θ̄ | F_n^i). This is the posterior belief over the machine's type, taking into account player i's private information. We call it the private belief of player i.

DEFINITION 2: A strategy φ^i is a cutoff strategy if there exist π_n^i(α) ∈ [0, 1] (n ∈ N and α ∈ {∗, 0, 1, …, n − 1}), such that τ^i(∗) = inf{n ≥ 0 : p_n^i ≤ π_n^i(∗)} and τ^i(k) = inf{n > k : p_n^i ≤ π_n^i(k)} for each k ∈ N.³

Figure 2 shows typical cutoffs of a cutoff strategy (here we depict only π_n^1(∗); they are decreasing, and are denoted by large circles), with a typical evolution of the private belief of the player (assuming player 2 stays in throughout). In this figure, the player stops at stage 7, once her private belief falls below her cutoff.

Mixed strategies are probability distributions over pure strategies. Following Aumann (1964), this is formalized by supplying each player i with an external randomization device uniformly distributed over [0, 1], which is privately observed at the outset of the game.

Given a pair of (pure or mixed) strategies φ = (φ^1, φ^2), we denote t^i(φ) ∈ N ∪ {+∞} as the stage in which player i drops out. That is, t^1(φ) = n if either (i) both τ^2(∗) ≥ n and τ^1(∗) = n or if (ii) both τ^2(∗) = k, τ^1(∗) > k, and τ^1(k) = n for some k < n. In the former case, player 1 drops out before or with player 2, whereas in the latter, player 2 drops out first at stage k.

Player i's overall payoff is the discounted sum r^i(φ) := Σ_{n=0}^{t^i(φ)−1} δ^n X_n^i of payoffs received prior to dropping out. Her expected payoff is γ^i(φ) := E[r^i(φ)].
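The play of a cutoff strategy and the resulting discounted payoff can be sketched as follows (a sketch under illustrative assumptions: Gaussian payoff densities and a hand-picked constant cutoff sequence, neither of which comes from the paper):

```python
import numpy as np

THETA_HI, THETA_LO = 1.0, -1.0   # expected stage payoffs (illustrative)

def normal_pdf(x, mean):
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def private_belief_path(x, p0):
    """p[n] = P(Theta = high | X_0, ..., X_{n-1}) by one-step Bayes updates."""
    p = [p0]
    for xk in x:
        lr = normal_pdf(xk, THETA_HI) / normal_pdf(xk, THETA_LO)
        odds = p[-1] / (1.0 - p[-1]) * lr
        p.append(odds / (1.0 + odds))
    return np.array(p)

def play_cutoff_strategy(x, cutoffs, p0, delta):
    """Drop out at the first stage n with p_n <= pi_n; return (t, r), where
    t is the exit stage and r = sum_{n < t} delta^n X_n the realized payoff."""
    p = private_belief_path(x, p0)
    for n, pi_n in enumerate(cutoffs):
        if p[n] <= pi_n:
            return n, sum(delta ** k * x[k] for k in range(n))
    t = len(cutoffs)
    return t, sum(delta ** k * x[k] for k in range(t))

t, r = play_cutoff_strategy(np.array([2.0, -0.5, 0.3]), [0.2, 0.2, 0.2], 0.5, 0.9)
```

Note that the stage-n decision uses only p_n, i.e., the payoffs up to stage n − 1, matching the timing in the text.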

³These cutoffs need not be uniquely defined. For example, if π and π′ are such that P(π′ ≤ p_n^1 ≤ π) = 0, then any choice of π_n^1(α) in the interval [π′, π] gives rise to the same strategy.


    FIGURE 2.—Typical evolution of the private belief.

    1.2. Main Results

    All of our results are obtained under Assumption A below.

ASSUMPTION A: The law of the private belief p_1^i held by player i at stage 1 has a density (w.r.t. Lebesgue measure).

The assumption means that the distribution of X_n^i given θ has a density f_θ whose support does not depend on θ, and, moreover, the likelihood ratio f_θ̄(X_n^i)/f_θ̲(X_n^i) is a random variable (r.v.) that has a density. In particular, the probability that p_1^1 = p_1^2 is 0; with probability 1, the players hold different private beliefs. By Bayes' rule, the likelihood ratio at stage 1 is then given by

p_1^i/(1 − p_1^i) = p_0/(1 − p_0) × f_θ̄(X_0^i)/f_θ̲(X_0^i).

Thus, the existence of a density for p_1^i is equivalent to the existence of a density for the r.v. f_θ̄(X_0^i)/f_θ̲(X_0^i). Under Assumption A, the private belief p_n^i has a density for each n ≥ 0.

Our first result is standard and claims that a symmetric equilibrium exists.

    THEOREM 3—Existence: The game has a symmetric equilibrium.

The uniqueness issue is addressed in Section 2.3. The next two results characterize the equilibrium strategies.

    THEOREM 4—Structure: All equilibria are in cutoff strategies.

We actually prove that any best reply is a cutoff strategy and, therefore, so is any rationalizable strategy.


According to Theorem 4, all equilibria are pure and process information in a simple way. The interaction is incorporated in the way cutoff values depend on time and public information: if player j drops out at stage n, player i takes it into account by changing the cutoff and using π_·^i(n) rather than π_·^i(∗) from then on; if player j does not drop out by stage n, player i takes this fact into account by using at stage n + 1 the cutoff π_{n+1}^i(∗) rather than the π_n^i(∗) that she used at stage n.

At a given stage, equilibrium behavior is monotonic w.r.t. the private belief. However, the private belief need not be monotonic w.r.t. payoffs, unless the likelihood ratio f_θ̄/f_θ̲ is monotonic; high payoffs need not be good news; see Milgrom (1981).

If player j drops out at stage k, player i remains alone and gets no more public information from player j. The continuation game starting at stage k + 1 is then analogous to a one-player game, where the initial prior takes into account the private belief p_{k+1}^i of player i and the fact that player j dropped out at stage k. This one-player version is equivalent to the usual one-arm bandit problem, in which exit decisions are reversible; see Chow and Robbins (1963) or Ferguson (2004). The optimal policy is to drop out as soon as the belief assigned to θ̄ (given public and private information) drops below a time-independent cutoff value, which we denote π∗.

Accordingly, after player j drops out, player i faces an auxiliary decision problem, in which she drops out whenever her belief is lower than π∗. This is equivalent to using a cutoff strategy, that is, dropping out whenever the private belief is lower than some cutoff. The cutoff is calculated by Bayes' rule from π∗ and the fact that player j dropped out at stage k. We denote it by π^i(k), so that π_n^i(k) = π^i(k) for every n > k.
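The one-player cutoff π∗ can be approximated numerically by value iteration on a belief grid (a sketch, not from the paper; Gaussian payoff densities and all parameter values are illustrative assumptions, and the payoff expectation is discretized by a simple quadrature):

```python
import numpy as np

# One-player problem: V(p) = max{0, E[X | p] + delta * E[V(p') | p]},
# where p' is the Bayes-updated belief after one more payoff observation.
theta_hi, theta_lo, delta = 1.0, -1.0, 0.9

grid = np.linspace(1e-4, 1 - 1e-4, 401)     # belief grid
xs = np.linspace(-6.0, 6.0, 81)             # payoff nodes (quadrature)
def pdf(x, m):
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)
w_hi = pdf(xs, theta_hi); w_hi /= w_hi.sum()   # discretized payoff laws
w_lo = pdf(xs, theta_lo); w_lo /= w_lo.sum()

V = np.zeros_like(grid)
for _ in range(200):
    stage = grid * theta_hi + (1 - grid) * theta_lo     # E[X | p]
    EV = np.zeros_like(grid)
    for k in range(len(xs)):
        w = grid * w_hi[k] + (1 - grid) * w_lo[k]       # predictive prob. of node k
        odds = grid / (1 - grid) * (w_hi[k] / w_lo[k])
        p_next = odds / (1 + odds)                      # Bayes update after node k
        EV += w * np.interp(p_next, grid, V)
    V = np.maximum(0.0, stage + delta * EV)

pi_star = grid[np.argmax(V > 0)]   # smallest grid belief with positive value
```

Because staying in has option value, π∗ lies strictly below the myopic break-even belief −θ̲/(θ̄ − θ̲) (here 0.5).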

THEOREM 5: Let an equilibrium with cutoffs (π_n^i) be given.
P1. The cutoff sequences (π_n^i(∗))_{n∈N} are nonincreasing for i = 1, 2.⁴
P2. lim_{n→∞} π_n^i(∗) = 0 for at least one player i.
P3. For each player i, π_n^i(∗) < π∗, and π_n^i(∗) < π^i(k) whenever k < n.

Statements P1–P3 hold for any pair of rationalizable strategies, and not only for equilibrium strategies.⁵ We comment briefly on these statements. Two possibly conflicting effects combine in P1. Assume that player i reaches stages n and n + 1 with the same private belief p, while player j is still active. State θ̄ is then more likely at stage n + 1 than at stage n: indeed, the longer player j is

⁴Since the sequence (π_n^i(∗))_{n∈N} need not be uniquely defined, P1 should be interpreted to mean that, given an equilibrium, the corresponding cutoffs may be chosen in such a way that the sequences (π_n^i(∗))_{n∈N} are nonincreasing.

⁵Indeed, by our proof of Theorem 4, every best reply is a cutoff strategy. Moreover, by properly adapting our proof of Theorem 5, one obtains that every best reply to a mixture of cutoff strategies satisfies P1–P3. It follows that P1–P3 hold for any pair of rationalizable strategies.


active, the better news it is on Θ. On the other hand, the optimal continuation value for player i also depends on whether and when she is likely to infer information about Θ through player j's behavior in the future. That is, it involves both player i's current belief over the private belief currently held by player j, and player j's future cutoffs. This second effect is ambiguous. According to P1, the combined effect is such that the optimal continuation payoff is higher at stage n + 1: a lower value of the private belief is necessary to trigger exit.

The intuition behind P2 is the following. If player j never drops out, p_n^j converges to 1 if Θ = θ̄ and to 0 if Θ = θ̲. If player j's cutoffs are bounded away from 0, the fact that she stays in longer provides very good news on Θ. Player i will therefore remain active, unless she gets strong private information in the opposite direction: player i's cutoffs will converge to zero.

We turn to P3, which incorporates two effects. On the one hand, having an active opponent is good news on Θ and is better news than the opponent dropping out at some earlier stage k. This is reflected in a higher posterior belief. On the other hand, having an active opponent creates an informational externality that does not exist in the one-player case and no longer exists if the opponent has already dropped out. For a given posterior, this is reflected in a higher option value.

Note that the one-player cutoff π∗ decreases to zero when the discount rate goes to 1: the cost of experimentation drops to zero. According to P3, all equilibrium cutoffs (π_n^i(∗)) then converge to zero.

We conclude by mentioning that our results still hold whenever the players receive private signals at every stage that are conditionally independent given Θ, in addition to being told their own payoff. They also hold if the players hold different (thereby inconsistent) prior beliefs on Θ, provided we use the notion of subjective Nash equilibrium.

    1.2.1. Large games

When the number of players exceeds two, a strategy keeps track of the status of every other player. The definition of cutoff strategies is similar, except that cutoffs now depend on who dropped out and when. Theorems 3, 4, and 5 (P2, P3) hold for any finite number of players.

When the number of players gets large, equilibrium behavior can be fully characterized.⁶ To simplify the characterization, we will assume that (i) p_1^i has full support and (ii) p_0 > π∗, so that dropping out at stage 0 is a strictly dominated strategy.

Let φ_N be an arbitrary equilibrium of the N-player game, with cutoffs (π_n^{i,N}). At stage 1, all cutoffs π_1^{i,N} are bounded away from zero. All players are active at stage zero, and more players drop out at stage 1 if Θ = θ̲ than if Θ = θ̄. Hence

⁶We refer to Rosenberg, Solan, and Vieille (2004) for a detailed proof.


the proportion of players who drop out at stage 1 reveals Θ with high probability to all remaining players. Therefore, the continuation payoff at stage 1 may (asymptotically) be computed under the assumption that Θ will be revealed at the beginning of stage 2: all players use cutoffs that are close to π_c, the unique solution of π θ̄ + (1 − π) θ̲ + π δ θ̄/(1 − δ) = 0.
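Since this equation is linear in π, the cutoff π_c has a closed form, which is easy to verify numerically (parameter values below are illustrative, not from the paper):

```python
# Solve pi*theta_hi + (1 - pi)*theta_lo + pi*delta*theta_hi/(1 - delta) = 0:
# collecting the terms in pi gives
# pi_c = -theta_lo / (theta_hi - theta_lo + delta*theta_hi/(1 - delta)).
theta_hi, theta_lo, delta = 1.0, -1.0, 0.9   # illustrative values

pi_c = -theta_lo / (theta_hi - theta_lo + delta * theta_hi / (1 - delta))
residual = (pi_c * theta_hi + (1 - pi_c) * theta_lo
            + pi_c * delta * theta_hi / (1 - delta))
```

With these values π_c = 1/11: the prospect of Θ being revealed next stage makes the stage-1 cutoff much lower than the myopic break-even belief.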

Denote by ρ_N the fraction of players who drop out at stage 1. By a large deviations argument, there is ρ_c ∈ (0, 1) such that the public likelihood ratio in favor of Θ = θ̄ is close to +∞ (resp. close to zero) if ρ_N < ρ_c (resp. if ρ_N > ρ_c). Hence, except in the unlikely event where ρ_N = ρ_c, herding takes place at stage 2.

To summarize, as the number N of players increases to +∞:
Stage 1: sup_{i=1,…,N} |π_1^{i,N}(∗) − π_c| converges to zero.
Stage 2: Let a list α_N be given that specifies the status of the other players at stage 2, and such that the fraction ρ_N(α_N) of drop outs converges to ρ ∈ [0, 1]:
• If ρ < ρ_c, the maximal cutoff max_i π_2^{i,N}(α_N) converges to 0.
• If ρ > ρ_c, the minimal cutoff min_i π_2^{i,N}(α_N) converges to 1.

In a sense, this result provides an asymptotic version of the results of Caplin and Leahy (1994), which deal with the continuum-of-players case.

    2. COMMENTS AND EXTENSIONS

    2.1. Reversible Decisions

In the one-player bandit problem, assuming reversible decisions rather than irreversible ones does not change the equilibrium behavior. This property does not extend to the two-player case, since in the latter, free-riding effects appear.

The true multiplayer generalization of one-arm bandit problems allows the players to alternate between the two arms, as in Bolton and Harris (1999). The definition of cutoff strategies readily adapts to that case. However, public information is then more cumbersome, since it coincides with the list of the arms selected by each of the players in the past.

In this case, given an equilibrium, let t_n denote the public information available at stage n. The sequence (p_n^i, t_n) is a Markov chain for the filtration (G_n^i), so that the optimal continuation payoff W_n^i of player i can be written as a function of (p_n^i, t_n). Therefore, in equilibrium the players play as a function of their private beliefs and public information. We do not know whether all equilibria in this case are in cutoff strategies.

    2.2. Timing Games and Real Options

In our model, a player keeps receiving payoffs until she drops out. In most real option problems, each player i chooses a time n at which she starts to receive a stream of payoffs X̃_n^i, X̃_{n+1}^i, …. Our results apply to this situation as well. Indeed, observe that, given any strategy profile φ, the sum of the payoffs received by player i in the timing game we study and in the real options game is Σ_{n=0}^{+∞} δ^n X̃_n^i, and is, therefore, independent of the strategy profile. As a result, the equilibria of the real option game coincide with the equilibria of the timing game with stage payoffs X_n^i := −X̃_n^i.
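The accounting identity behind this equivalence is easy to check numerically (the payoff stream below is made up for illustration): for any exit/entry stage n, the discounted payoff collected in the timing game and the discounted payoff collected in the real-options game add up to the same constant.

```python
import numpy as np

delta = 0.9
X_tilde = np.array([0.5, -1.2, 2.0, 0.3, -0.4])   # illustrative payoff stream
disc = delta ** np.arange(len(X_tilde))
total = float(disc @ X_tilde)                     # sum_n delta^n * X_tilde_n

for n in range(len(X_tilde) + 1):
    timing = float(disc[:n] @ X_tilde[:n])        # received until exiting at n
    real_option = float(disc[n:] @ X_tilde[n:])   # received from investing at n on
    assert abs((timing + real_option) - total) < 1e-12
```

Because the sum is constant in n, maximizing the real-options payoff is the same as minimizing the timing-game payoff in X̃, i.e., maximizing it with stage payoffs −X̃.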

    2.3. Equilibrium Uniqueness

In general, the equilibrium need not be unique. To see this, assume that the initial prior p_0 coincides with the optimal one-player cutoff π∗. One equilibrium is that both players drop out immediately. It can be shown that there is another symmetric equilibrium in which both players are active at stage 0: the fact that player j is active at stage 0 creates a positive informational externality that induces player i to enter as well. The uniqueness issue remains open in the following special cases: (a) when one restricts attention to equilibria in which both players are active at stage 0 and (b) when p_0 > π∗.

    2.4. Efficiency

First-order efficiency would imply that both players drop out at stage 0 if Θ = θ̲ and that both players stay in forever if Θ = θ̄. Plainly, this cannot be achieved. Lemma 18 below proves that if the machines are high, then there is a positive probability that both players stay in forever. It is not difficult to prove that if both machines are low, then with probability 1 both players drop out in finite time. As discussed in Section 1.2.1, sharper results are obtained when the number of players gets large.

Laboratoire d'Analyse Géométrie et Applications, Institut Galilée, Université Paris Nord, avenue Jean-Baptiste Clément, 93430 Villetaneuse, France, and Laboratoire d'Econométrie de l'Ecole Polytechnique, Paris, France; [email protected],

The School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel; [email protected],

and

Dépt. Finance et Economie, HEC, 1 rue de la Libération, 78 351 Jouy-en-Josas, France; [email protected].

    Manuscript received January, 2006; final revision received April, 2007.

    APPENDIX

For most of the appendix, we let a mixed strategy φ^j of player j be given. The payoff function to player i, γ^i(·, φ^j), does not depend on player j's strategic decisions τ^j(k) once player i has dropped out. In the analysis of player i's best replies to φ^j, we therefore may, and will, assume that player j's behavior is independent of player i's actions, that is, τ^j(k) = τ^j(∗) for each k ∈ N.



Accordingly, we denote by τ^j the unique stopping time that governs player j's decisions.

Section A contains preliminary material on beliefs. Each of the following three sections is devoted to the proof of one theorem.

    A: BELIEFS

We first state the basic stochastic dominance properties of the private beliefs. Then we examine the Markov property of the sequence of beliefs w.r.t. various sequences of σ-algebras.

    A.1. Stochastic Dominance

Recall that F_n^i := σ(X_0^i, …, X_{n−1}^i) is the private information of player i prior to some stage n and that the private belief of player i at that time is defined as p_n^i := P(Θ = θ̄ | F_n^i).

By Bayes' rule, a version of p_n^i is given by

p_n^i/(1 − p_n^i) = p_0/(1 − p_0) × ∏_{k=0}^{n−1} f_θ̄(X_k^i)/f_θ̲(X_k^i).  (1)

It is well known that the sequence (p_n^i) is a martingale (resp. a submartingale, a supermartingale) under P (resp. under P_θ̄, P_θ̲). In addition, the law of p_n^i under P_θ̄ stochastically dominates (in the first-order sense) the law of p_n^i under P_θ̲: the private belief tends to be higher when the state is θ̄ than when it is not. A slightly stronger statement holds here. We omit the proof.⁷

LEMMA 6: One has P_θ̄(p_n^i ≤ p) < P_θ̲(p_n^i ≤ p) as soon as P_θ̲(p_n^i ≤ p) > 0 and P_θ̄(p_n^i ≤ p) < 1.

    This dominance property extends to vectors of beliefs. Again, we omit theproof.

LEMMA 7: For each stage n ∈ N and x_1, …, x_n ∈ [0, 1], one has

P_θ̄((p_1^i, …, p_n^i) ≤ (x_1, …, x_n)) ≤ P_θ̲((p_1^i, …, p_n^i) ≤ (x_1, …, x_n)).

Moreover, P_θ̲((p_1^i, …, p_n^i) > (x_1, …, x_n)) ≤ P_θ̄((p_1^i, …, p_n^i) > (x_1, …, x_n)).

⁷All omitted proofs can be found in Rosenberg, Solan, and Vieille (2004).
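As a numerical sanity check of the product formula (1) (a sketch, not from the paper, assuming Gaussian payoff densities, which satisfy Assumption A), the product of likelihood ratios agrees with stage-by-stage Bayes updating:

```python
import numpy as np

theta_hi, theta_lo, p0 = 1.0, -1.0, 0.5   # illustrative parameters

def pdf(x, m):
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(1)
x = rng.normal(theta_hi, 1.0, size=8)     # a sample payoff path X_0, ..., X_7

# Product form (1): p_n/(1 - p_n) = p_0/(1 - p_0) * prod_k f_hi(X_k)/f_lo(X_k)
odds = p0 / (1 - p0) * np.prod(pdf(x, theta_hi) / pdf(x, theta_lo))
p_product = odds / (1 + odds)

# Recursive one-step Bayes updates must give the same belief
p = p0
for xk in x:
    num = p * pdf(xk, theta_hi)
    p = num / (num + (1 - p) * pdf(xk, theta_lo))
```

The same recursion also makes the martingale property plausible: each update multiplies the odds by a mean-one (under the predictive law) likelihood ratio.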


    A.2. Markov Properties

We denote by t_n^j the status of player j at stage n: t_n^j := ∗ if τ^j ≥ n and t_n^j = k if τ^j = k < n. Since player j's strategy is given, her past decisions are informative and t_n^j is a well defined random variable. Prior to stage n ≥ 0, player i's information over Ω is given by the σ-algebra G_n^i := σ(F_n^i, t_n^j), and we define her posterior belief as q_n^i := P(Θ = θ̄ | G_n^i). Our goal is to establish Proposition 9 below.

Conditional on the arm type Θ, the status t_n^j is independent of the payoffs (X_k^i)_{k≥0}, hence also of F_n^i. Thus, a version of the posterior belief is given by

q_n^i/(1 − q_n^i) = p_n^i/(1 − p_n^i) × P_θ̄(t_n^j = α)/P_θ̲(t_n^j = α)  whenever t_n^j = α.  (2)

In particular, q_n^i is measurable w.r.t. the pair (p_n^i, t_n^j). For later use, note also that the version Q_n^i(p_n^i, t_n^j) of q_n^i given by (2) is continuous and increasing in p_n^i.

We now prove that the pair (p_n^i, t_n^j) is a Markov chain. This will later allow us to prove that (p_n^i, t_n^j) contains all relevant information for player i's best-reply problem.

We use the following definition.

DEFINITION 8—Shiryaev (1996, p. 564): Let (G_n)_{n∈N} be a filtration over a probability space (Ω, P) and let (A_n)_{n∈N} be a sequence of random variables, adapted to (G_n)_{n∈N} and with values in R^d. The sequence (A_n) is a Markov chain for the filtration (G_n)_{n∈N} if, for each Borel set B ⊆ R^d and n ∈ N, one has P(A_{n+1} ∈ B | G_n) = P(A_{n+1} ∈ B | A_n) P-a.s.

PROPOSITION 9: Under P, the sequence (p_n^i, t_n^j)_{n∈N} is a Markov chain for (G_n^i).

    We will use the following technical observation, stated without proof.

LEMMA 10: Let H_1 and H_2 be two independent σ-algebras on a probability space (Ω, P) and let A_i be a sub-σ-field of H_i, i = 1, 2. For each C_1 ∈ H_1, C_2 ∈ H_2, one has

P(C_1 ∩ C_2 | σ(A_1, A_2)) = P(C_1 | A_1) × P(C_2 | A_2).

PROOF OF PROPOSITION 9: Observe first that the sequence (p_n^i)_{n≥0} is a Markov chain for (F_n^i)_{n≥0} under P_θ. Let a stage n ≥ 0 be given, let B ⊆ [0, 1] be a Borel set, and fix α ∈ {∗} ∪ N.

Under P_θ (θ ∈ {θ̄, θ̲}), the σ-algebra F_{n+1}^i and the r.v. t_{n+1}^j are independent. By Lemma 10, this implies

P_θ(p_{n+1}^i ∈ B, t_{n+1}^j = α | G_n^i) = P_θ(p_{n+1}^i ∈ B | F_n^i) × P_θ(t_{n+1}^j = α | t_n^j).


Since (p_n^i) is a Markov chain under P_θ, P_θ(p_{n+1}^i ∈ B | F_n^i) = P_θ(p_{n+1}^i ∈ B | p_n^i). Therefore, P_θ(p_{n+1}^i ∈ B, t_{n+1}^j = α | G_n^i) is measurable w.r.t. the pair (p_n^i, t_n^j). Since q_n^i = P(Θ = θ̄ | G_n^i) is measurable w.r.t. (p_n^i, t_n^j), the conditional probability

P(p_{n+1}^i ∈ B, t_{n+1}^j = α | G_n^i) = Σ_θ P(Θ = θ | G_n^i) × P_θ(p_{n+1}^i ∈ B, t_{n+1}^j = α | G_n^i)

is also measurable w.r.t. (p_n^i, t_n^j), hence

P(p_{n+1}^i ∈ B, t_{n+1}^j = α | G_n^i) = P(p_{n+1}^i ∈ B, t_{n+1}^j = α | p_n^i, t_n^j),

as desired. Q.E.D.

    B: PROOF OF THEOREM 4

When facing φ^j, player i must choose when to stop, that is, a stopping time for the filtration (G_n^i)_{n≥0}. If she stops at stage n, her overall realized payoff is Y_n^i := Σ_{k=0}^{n−1} δ^k X_k^i. Hence, player i's best-reply problem is equivalent to the optimal stopping problem

problem (P): find a solution to sup_σ E[Y_σ^i],

where the supremum is taken over all stopping times σ for (G_n^i). That is, any best reply to φ^j yields an optimal solution to (P) and vice versa. We will prove that (P) admits a unique optimal stopping time, which moreover corresponds to a cutoff strategy.

    We first recall standard material on optimal stopping problems.

STEP 0: Optimal stopping problems. Given n ≥ 0, we let T_n be the set of stopping times σ ≥ n (P-a.s.). The Snell envelope of the sequence (Y_n^i) is defined to be the sequence (V_n^i), where

V_n^i := ess sup_{σ∈T_n} E[Y_σ^i | G_n^i].

It is the optimal payoff to player i when she is constrained not to drop out before stage n. The following lemma is well known; see, for example, Chow and Robbins (1963), Ferguson (2004, Chap. 3), or Neveu (1972).

LEMMA 11: The stopping time σ∗ := inf{n ≥ 0 : V_n^i = Y_n^i} (with inf ∅ = +∞) is a solution to (P). Moreover, σ ≥ σ∗ for every optimal stopping time σ.


The Snell envelope $(V^i_n)$ can be obtained as the ($P$-a.s.) limit of the optimal payoffs in the finite-horizon versions of $(\mathcal{P})$. To be specific, define $V^i_{n,k}$ for $n, k \ge 0$ by $V^i_{n,0} = Y^i_n$ and

$$V^i_{n,k+1} = \max\{Y^i_n,\ E[V^i_{n+1,k} \mid \mathcal{G}^i_n]\} \quad \text{for every } n \ge 0 \text{ and } k \ge 0. \tag{3}$$

In (3), $V^i_{n,k+1}$ is the optimal payoff to player $i$ when she is constrained not to drop out before stage $n$, but must drop out at stage $n+k+1$ at the latest. Thus, (3) is a dynamic programming principle: the optimal payoff is obtained as the maximum over the two choices available at stage $n$, dropping out or staying active. Then $V^i_n = \lim_{k\to\infty} V^i_{n,k}$.
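As an aside, the backward recursion (3) is straightforward to implement numerically. The sketch below is not the paper's model: the random-walk state, the reward function `g`, and the horizon `T` are illustrative assumptions, chosen only to show the dynamic programming principle at work.

```python
# Toy backward induction for a recursion of the form (3):
# V_n = max(g_n, E[V_{n+1} | state]), on a binomial tree.
# The random-walk state, the reward g, and the horizon T are
# illustrative assumptions, not the model of the paper.

T = 5                                   # must stop at stage T at the latest

def g(n, s):
    """Reward for stopping at stage n when the walk is at s (toy choice)."""
    return max(s, 0) - 0.1 * n          # waiting is costly

# V maps each reachable walk value at the current stage to its Snell value.
V = {s: g(T, s) for s in range(-T, T + 1, 2)}
stop_region = {}
for n in range(T - 1, -1, -1):
    Vn = {}
    for s in range(-n, n + 1, 2):
        cont = 0.5 * (V[s + 1] + V[s - 1])     # E[V_{n+1} | S_n = s]
        Vn[s] = max(g(n, s), cont)             # the recursion (3)
        stop_region[(n, s)] = g(n, s) >= cont  # where the rule stops
    V = Vn

print(V[0])   # value of the problem at stage 0, state 0
```

In this toy instance the continuation value at $(n, s) = (0, 0)$ exceeds the stopping reward, so the optimal rule waits; letting the horizon grow approximates the infinite-horizon Snell envelope, as in the limit $V^i_n = \lim_k V^i_{n,k}$.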

The quantity $V^i_n - Y^i_n$ is the optimal payoff from stage $n$ onward. We first prove that this quantity is measurable w.r.t. $(p^i_n, t^j_n)$. We then define $W^i_n$ to be the optimal continuation payoff, and show that it is continuous and increasing in $p^i_n$. We then conclude that $\sigma^*$ is the only optimal stopping time and that it corresponds to a cutoff strategy.

STEP 1: The sequence $(W^i_n)$.

LEMMA 12: $V^i_n - Y^i_n$ is measurable w.r.t. the pair $(p^i_n, t^j_n)$.

Thus, $V^i_n - Y^i_n$ coincides ($P$-a.s.) with some function $\eta^i_n(p^i_n, t^j_n)$.

PROOF OF LEMMA 12: We adapt the proof from Neveu (1972). By (3), one has

$$V^i_{n,k+1} - Y^i_n = \max\{0,\ \delta^n E[X^i_n \mid \mathcal{G}^i_n] + E[V^i_{n+1,k} - Y^i_{n+1} \mid \mathcal{G}^i_n]\}$$

for every $n \ge 0$ and $k \ge 0$.

Observe that the expected current payoff $E[X^i_n \mid \mathcal{G}^i_n] = q^i_n E_{\overline{\theta}}[X^i_n] + (1 - q^i_n) E_{\underline{\theta}}[X^i_n]$ is measurable w.r.t. $(p^i_n, t^j_n)$.

On the other hand, since the sequence $(p^i_n, t^j_n)$ is a Markov chain for $(\mathcal{G}^i_n)$, the r.v. $E[V^i_{n+1,k} - Y^i_{n+1} \mid \mathcal{G}^i_n]$ is measurable w.r.t. $(p^i_n, t^j_n)$ as soon as $V^i_{n+1,k} - Y^i_{n+1}$ is measurable w.r.t. $(p^i_{n+1}, t^j_{n+1})$. It thus follows inductively that $V^i_{n,k} - Y^i_n$ is measurable w.r.t. $(p^i_n, t^j_n)$ for each $k$. The result follows by letting $k \to +\infty$. Q.E.D.

We define $W^i_n := \delta^n E[X^i_n \mid \mathcal{G}^i_n] + E[V^i_{n+1} - Y^i_{n+1} \mid \mathcal{G}^i_n]$. This is player $i$'s optimal continuation payoff if she decides to remain active. One has $V^i_n - Y^i_n = \max\{0, W^i_n\}$. As in the proof of Lemma 12, $W^i_n$ is measurable w.r.t. $(p^i_n, t^j_n)$.

Set $\tau^* := \inf\{k > n : V^i_k = Y^i_k\}$, so that $V^i_{n+1} = E[Y^i_{\tau^*} \mid \mathcal{G}^i_{n+1}]$. The stopping time $\tau^*$ is optimal for player $i$, assuming she is active at stage $n$. It follows that $W^i_n = E[Y^i_{n\to\tau^*} \mid \mathcal{G}^i_n]$, where $Y^i_{n\to\tau^*} = Y^i_{\tau^*} - Y^i_n = \sum_{k=n}^{\tau^*-1} \delta^k X^i_k$ is the sum of the discounted payoffs received from stage $n$ up to stage $\tau^*$.
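The identity $W^i_n = E[Y^i_{n\to\tau^*} \mid \mathcal{G}^i_n]$ suggests a simple Monte Carlo check: simulate discounted payoffs until a cutoff rule stops, and average. The Gaussian payoff densities (means $\pm 1$), the constant cutoff, and all parameter values below are illustrative assumptions, not the paper's model.

```python
import math
import random

# Monte Carlo sketch of a continuation payoff E[Y_{n->tau}] under a cutoff
# stopping rule: starting from a private belief p, discounted payoffs accrue
# until the belief first falls below the cutoff. Gaussian payoffs with means
# +1 / -1 (so the log-likelihood ratio of an observation x is 2x) are an
# illustrative assumption.

def continuation_payoff(p, cutoff=0.4, delta=0.9, horizon=200, sims=3000, seed=2):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(sims):
        theta_hi = rng.random() < p           # draw the state from the prior p
        mu = 1.0 if theta_hi else -1.0
        logodds = math.log(p / (1 - p))
        disc, payoff = 1.0, 0.0
        for _ in range(horizon):
            if 1 / (1 + math.exp(-logodds)) <= cutoff:
                break                          # belief fell below the cutoff
            x = rng.gauss(mu, 1.0)
            payoff += disc * x
            disc *= delta
            logodds += 2.0 * x                 # Bayes update in log-odds
        total += payoff
    return total / sims

w_mid = continuation_payoff(0.5)
w_high = continuation_payoff(0.8)
print(w_mid, w_high)   # the estimate increases with the starting belief
```

The estimated payoff is increasing in the starting belief $p$, which is the monotonicity that Lemma 14 below establishes for the exact continuation payoff.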


    STEP 2: Regularity properties. We here state and prove Lemmas 13 and 14.

LEMMA 13: The r.v. $W^i_n$ has a version $\omega^i_n(p^i_n, t^j_n)$ such that, for fixed $\alpha$, the map $\omega^i_n(\cdot, \alpha)$ is continuous on the support of $p^i_n$.

Plainly, it will be sufficient to consider $\alpha$'s such that $P(t^j_n = \alpha) > 0$. For such $\alpha$'s, the map $\omega^i_n(\cdot, \alpha)$ need not be uniquely defined. Indeed, if the distributions of payoffs are such that the private belief $p^i_1$ cannot fall below, say, some level $\lambda > 0$, the definition of $\omega^i_1(p, \alpha)$ for $p < \lambda$ is completely arbitrary. However, if $P(t^j_n = \alpha) > 0$, the restriction of $\omega^i_n(\cdot, \alpha)$ to the support of $p^i_n$ is uniquely defined.

LEMMA 14: For fixed $\alpha$, the map $\omega^i_n(\cdot, \alpha)$ is increasing on the support of $p^i_n$.

We will use the following technical lemma, which follows from Lusin's theorem.

LEMMA 15: Let $\nu$ be a probability measure over $\mathbb{R}$, absolutely continuous w.r.t. the Lebesgue measure, and let $B \subseteq \mathbb{R}$ be a Borel set. Then the map $x \in \mathbb{R} \mapsto \nu(x + B)$ is continuous.

PROOF OF LEMMA 13: $W^i_n$ may also be expressed as

$$W^i_n = E[Y^i_{n\to\tau^*} \mid p^i_n, t^j_n] = \sum_{\theta \in \{\underline{\theta}, \overline{\theta}\}} P(\Theta = \theta \mid p^i_n, t^j_n)\, E_\theta[Y^i_{n\to\tau^*} \mid p^i_n, t^j_n]$$
$$= \sum_{\theta \in \{\underline{\theta}, \overline{\theta}\}} P(\Theta = \theta \mid p^i_n, t^j_n) \sum_{k \ge n} \delta^k\, \theta\, P_\theta(\tau^* > k \mid p^i_n, t^j_n).$$

The belief $P(\Theta = \overline{\theta} \mid p^i_n, t^j_n)$ has a continuous version, $q^i_n$. It is therefore sufficient to construct a continuous version of $P_\theta(\tau^* > k \mid p^i_n, t^j_n)$ for each $k \ge n$.

To be concise, we let $t^j$ stand for the vector $(t^j_{n+1}, \ldots, t^j_k)$ and let $\alpha = (\alpha_{n+1}, \ldots, \alpha_k)$ denote generic values of $t^j$.

Observe first that $\tau^* > k$ if and only if $\eta^i_m(p^i_m, t^j_m) > 0$ for all $n < m \le k$. For a given $\alpha$, define $G_m(\alpha) := \{p : \eta^i_m(p, \alpha) > 0\}$ to be the set of beliefs at which player $i$ remains active at stage $m$. Thus, on the event $t^j = \alpha$, one has $\tau^* > k$ if and only if $p^i_m \in G_m(\alpha_m)$ for all $n < m \le k$. Let $F_m(\alpha_m)$ denote the image of $G_m(\alpha_m)$ under the map $p \mapsto \ln(p/(1-p))$, so that on the event $t^j = \alpha$, one has $\tau^* > k$ if and only if

$$\ln\frac{p^i_n}{1-p^i_n} + \ln\frac{f_{\overline{\theta}}(X^i_n)}{f_{\underline{\theta}}(X^i_n)} + \sum_{s=n+1}^{m-1} \ln\frac{f_{\overline{\theta}}(X^i_s)}{f_{\underline{\theta}}(X^i_s)} \in F_m(\alpha_m) \quad \text{for all } n < m \le k.$$

By Lemma 15, applied to the joint law of the partial sums of the log-likelihood ratios, the $P_\theta$-probability of this event is continuous in $\ln(p^i_n/(1-p^i_n))$. This yields a version of $P_\theta(\tau^* > k,\ t^j = \alpha \mid p^i_n, t^j_n)$ which is continuous in $p^i_n$. By summing over $\alpha$, one then obtains a version of $P_\theta(\tau^* > k \mid p^i_n, t^j_n)$ which is continuous in $p^i_n$, as desired. Q.E.D.
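The log-odds recursion (4) that drives this proof is easy to visualize by simulation. In the sketch below, unit-variance Gaussian payoff densities with means $\pm 1$ (so that $\ln(f_{\overline{\theta}}(x)/f_{\underline{\theta}}(x)) = 2x$) are an illustrative assumption, not the paper's model.

```python
import math
import random

# Sketch of the private-belief recursion (4): the log-odds of
# p_n = P(Theta = theta_bar | X_0, ..., X_{n-1}) evolve by adding the
# log-likelihood ratio of each observed payoff. Unit-variance Gaussian
# densities with means +1 and -1 are an illustrative assumption.

def log_lr(x, mu_hi=1.0, mu_lo=-1.0):
    """ln f_hi(x) / f_lo(x) for unit-variance Gaussian densities."""
    return -0.5 * (x - mu_hi) ** 2 + 0.5 * (x - mu_lo) ** 2

def simulate_beliefs(theta_hi, p0=0.5, n_stages=50, seed=0):
    rng = random.Random(seed)
    mu = 1.0 if theta_hi else -1.0
    logodds = math.log(p0 / (1 - p0))
    beliefs = []
    for _ in range(n_stages):
        x = rng.gauss(mu, 1.0)     # payoff X_n drawn under the true state
        logodds += log_lr(x)       # recursion (4) in log-odds form
        beliefs.append(1 / (1 + math.exp(-logodds)))
    return beliefs

# Under theta_bar the belief drifts toward 1 (it is a bounded submartingale);
# under theta_low it drifts toward 0.
print(simulate_beliefs(True)[-1], simulate_beliefs(False)[-1])
```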


PROOF OF LEMMA 14: Given a version of $p^i_n$, we use (4) to choose a version of $p^i_m = p^i_m(p^i_n, X^i_n, \ldots, X^i_{m-1})$ ($m > n$). Fix $p$ in the support of $p^i_n$. Define the stopping time

$$\sigma_p := \inf\{k \ge n+1 : \omega^i_k(p^i_k(p, X^i_n, \ldots, X^i_{k-1}), t^j_k) \le 0\}.$$

Under $\sigma_p$, player $i$ behaves as if she had reached stage $n$ with a private belief equal to $p$, and would play $\tau^*$. Thus, conditional on $\theta$, her continuation payoff $E_\theta[Y^i_{n\to\sigma_p} \mid \mathcal{G}^i_n]$ does not depend on $p^i_n$. One version of this continuation payoff is

$$C(p^i_n, t^j_n; p) := q^i_n(p^i_n, t^j_n)\, E_{\overline{\theta}}[Y^i_{n\to\sigma_p}] + (1 - q^i_n(p^i_n, t^j_n))\, E_{\underline{\theta}}[Y^i_{n\to\sigma_p}],$$

which is continuous and increasing in $p^i_n$.

Fix the version of $\tau^*$ to be $\tau^* = \inf\{k > n : \omega^i_k(p^i_k(p^i_n, X^i_n, \ldots, X^i_{k-1}), t^j_k) \le 0\}$. By construction, one has $C(p, t^j_n; p) = \omega^i_n(p, t^j_n)$. This equality holds everywhere, and not only $P$-a.s., since both $C$ and $\omega^i_n$ are continuous (see Lemma 13 for the latter). For the same reason, and since $\omega^i_n(p^i_n, t^j_n)$ is the highest continuation payoff, one has $C(p^i_n, t^j_n; p) \le \omega^i_n(p^i_n, t^j_n)$ everywhere. Since $C$ is increasing, this implies, for every $p'$ higher than $p$ in the support of $p^i_n$ and every $\alpha$,

$$\omega^i_n(p, \alpha) = C(p, \alpha; p) < C(p', \alpha; p) \le \omega^i_n(p', \alpha). \tag{7}$$

We obtain that $\omega^i_n(\cdot, \alpha)$ is increasing in $p$, as desired. Q.E.D.

STEP 3: Conclusion. We here state and prove Lemma 16, which concludes the proof.

LEMMA 16: The stopping time $\sigma^*$ is the only optimal solution to $(\mathcal{P})$. Moreover, it corresponds to a cutoff strategy.

PROOF: We start with the first assertion. Let $\sigma$ be a solution to $(\mathcal{P})$. By Lemma 11, $\sigma \ge \sigma^*$. Fix a stage $n$. By Lemma 14, and since $p^i_n$ has a density, one has $\omega^i_n(p^i_n, t^j_n) < 0$ on the event $\Omega_n := \{\sigma^* = n < \sigma\}$. In particular, $E[Y^i_\sigma 1_{\Omega_n}] \le E[Y^i_{\sigma^*} 1_{\Omega_n}]$, with a strict inequality if $P(\Omega_n) > 0$. Since $E[Y^i_\sigma] = E[Y^i_{\sigma^*}]$, one must have $P(\Omega_n) = 0$ for each $n$.

We now turn to the second claim. If $P(t^j_n = \alpha) > 0$, Lemmas 13 and 14 provide us with a version $\omega^i_n(\cdot, \alpha)$ that is continuous and increasing over the support of $p^i_n$. We extend it to a continuous, increasing function defined over $[0,1]$ and define $\pi^i_n(\alpha)$ to be the unique value of $p$ such that $\omega^i_n(p, \alpha) = 0$. If instead $P(t^j_n = \alpha) = 0$, we choose $\pi^i_n(\alpha) \in [0,1]$ in an arbitrary way.

It is immediate to check that $\sigma^* = \inf\{n : p^i_n \le \pi^i_n(t^j_n)\}$ ($P$-a.s.). Hence, $\sigma^*$ is a cutoff strategy. Q.E.D.

REMARK: $\sigma^*$ is the unique optimal stopping time (up to $P$-null sets). However, the associated cutoffs need not be uniquely defined, for two reasons.

Consider first $\alpha$ such that $P(t^j_n = \alpha) > 0$. If the value of $\pi^i_n(\alpha)$, as obtained in the previous proof, falls outside the support of $p^i_n$, then its value depends on the choice of the extension of $\omega^i_n$. However, this indeterminacy is only apparent, as such values correspond to beliefs that are reached with probability zero: the corresponding stopping time $\tau^i(\alpha)$ is uniquely defined (up to $P$-null sets).

If $P(t^j_n = \alpha) = 0$, then any choice for $\pi^i_n(\alpha)$ is admissible, and different choices may yield different stopping times $\tau^i(\alpha)$. Again, this indeterminacy is only apparent, since, against $\tau^j$, the stopping time $\tau^i(\alpha)$ will "never" be used.

Similarly, if the cutoff $p$ does not belong to the support of $p^i_n$, then changing the definition of $\omega^i_n$ around $p$ would change the cutoff as well; this indeterminacy is again only apparent, since $p$ then corresponds to a belief that is reached with probability zero. In other words, the best reply is unique, even though different cutoff sequences may be associated with it.

    C: PROOF OF THEOREM 3

The existence of a symmetric equilibrium derives from a standard fixed-point argument. A cutoff strategy of player $i$ is a sequence $(\pi^i_n(k))$ indexed by $n \ge 0$ and $k \in \{\star, 0, 1, \ldots, n-1\}$, with values in $[0,1]$. The set $\Phi$ of such sequences is compact when endowed with the product topology.

Player $i$'s best-reply map is given by $B(\phi^j) := \{\phi^i \in \Phi : \gamma^i(\phi^i, \phi^j) = \max_\Phi \gamma^i(\cdot, \phi^j)\}$ for each cutoff strategy $\phi^j \in \Phi$. From the analysis of the previous section, $B(\phi^j)$ is convex-valued.

We now check that the payoff function $\gamma^i$ is continuous over the space of cutoff profiles. Let a sequence $(\phi_m)$ of cutoff profiles be given that converges to $\phi$ (in the product topology). The realized payoff $r^i(\phi_m)$ converges to $r^i(\phi)$, except possibly if some belief is equal to the corresponding cutoff: $p^i_n = \pi^i_n(\alpha)$ for some $n \ge 0$, some $\alpha \in \{\star, 0, 1, \ldots, n-1\}$, and $i = 1, 2$. Since the law of $p^i_n$ has a density, this event has $P$-measure zero. Since $|r^i(\phi_m)| \le \sup_{n \in \mathbb{N}} |Y^i_n|$, the dominated convergence theorem applies and $\lim_{m\to\infty} \gamma^i(\phi_m) = \gamma^i(\phi)$.

Since $\gamma^i$ is continuous, $B$ is upper hemicontinuous. Since $\Phi$ is compact, Glicksberg's (1952) generalization of Kakutani's fixed-point theorem applies, and $B$ has a fixed point $\phi^*$. Plainly, the profile $(\phi^*, \phi^*)$ is an equilibrium.
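The structure of this argument (a continuous best-reply map on a compact set) can be illustrated in one dimension, where a continuous self-map of $[0,1]$ has a fixed point that bisection can locate. The specific map `b` below is an arbitrary illustrative assumption, not derived from the model.

```python
# One-dimensional illustration of the fixed-point step: a continuous
# best-reply map b: [0,1] -> [0,1] has a fixed point b(y*) = y*, located
# here by bisection on b(y) - y. The map b is an arbitrary toy choice.

def b(y):
    return 0.6 / (1.0 + y)   # continuous, decreasing self-map of [0,1]

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on f(y) - y, assuming f(lo) >= lo and f(hi) <= hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) >= mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y_star = fixed_point(b)
print(y_star)   # solves y = 0.6 / (1 + y)
```

In the actual proof, of course, the strategy space $\Phi$ is infinite-dimensional and $B$ is a correspondence, which is why Glicksberg's theorem, rather than this elementary intermediate-value argument, is needed.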

    D: PROOF OF THEOREM 5

We here prove the qualitative results listed in Theorem 5. Most proofs have to do with the impact of the public information on the posterior belief.

We fix a cutoff strategy $\phi^j$ of player $j$. Denote by $\pi^j_n$ the cutoff used by player $j$ at stage $n$ if player $i$ is still active at that stage.

    D.1. Proof of P3

Player $j$ stops at stage $\tau^j := \inf\{n : p^j_n \le \pi^j_n\}$. In particular, by Lemma 7, player $j$ tends to stop earlier if the state is $\underline{\theta}$: $P_{\underline{\theta}}(\tau^j \ge n) \le P_{\overline{\theta}}(\tau^j \ge n)$. By (2) this implies that $q^i_n \ge p^i_n$ whenever player $j$ is active: having an active opponent is good news.

If at stage $n$ player $i$ chooses not to watch player $j$ any longer, she faces a one-player problem, in which her continuation payoff is positive once her posterior belief exceeds $\pi^*$. In the two-player game, player $i$ has more options if player $j$ is still active; hence player $i$'s continuation payoff is positive as well. Since $q^i_n \ge p^i_n$, player $i$ continues whenever her private belief $p^i_n$ exceeds $\pi^*$. This readily implies that $\pi^i_n(\star) \le \pi^*$: the first part of P3 follows.

We now prove that having an active opponent is always the best possible news on $\theta$. Recall that $Q^i_n(p^i_n, t^j_n)$ is the version of $q^i_n$ given by (2).

LEMMA 17: One has $Q^i_n(p, \star) \ge Q^i_n(p, m)$ for every $m < n$ and $p \in [0,1]$.

PROOF: We introduce an auxiliary family of beliefs and set $p^{i,j}_{n,m} := P(\Theta = \overline{\theta} \mid \mathcal{F}^i_n, \mathcal{F}^j_m)$, $m \le n$. The belief $p^{i,j}_{n,m}$ is computed by collecting the private information held by the two players at two possibly different stages $n$ and $m$. (A version of) $p^{i,j}_{n,m}$ is given by

$$\frac{p^{i,j}_{n,m}}{1 - p^{i,j}_{n,m}} = \frac{p^i_n}{1 - p^i_n} \times \frac{p^j_m}{1 - p^j_m} \times \frac{1 - p_0}{p_0};$$

hence, it is a continuous and increasing function of $p^i_n$ and $p^j_m$, which we denote $p^{i,j}_{n,m}(\cdot, \cdot)$.

We also let $Q^i_{n,m}(p^i_n, t^j_m) := P(\Theta = \overline{\theta} \mid p^i_n, t^j_m)$. By the law of iterated conditional expectations, and since $t^j_m$ is a function of $p^j_1, \ldots, p^j_m$, one has

$$Q^i_{n,m}(p^i_n, t^j_m) = E\bigl[P(\Theta = \overline{\theta} \mid p^i_n, p^j_1, \ldots, p^j_m) \bigm| p^i_n, t^j_m\bigr] = E[p^{i,j}_{n,m} \mid p^i_n, t^j_m].$$

We will prove that the following two inequalities hold: for every $m \le n$ and every $p$,

$$Q^i_{n,m}(p, m-1) \le Q^i_{n,m}(p, \star) \quad \text{and} \quad Q^i_{n,m}(p, \star) \le Q^i_n(p, \star).$$

For the first one, note that on the event $t^j_m = m-1$, one has $p^j_m \le \pi^j_m(\star)$, hence $Q^i_{n,m}(p^i_n, t^j_m) \le p^{i,j}_{n,m}(p^i_n, \pi^j_m(\star))$. Combining the two inequalities yields $Q^i_{n,m}(p, m-1) \le Q^i_n(p, \star)$, which proves the lemma. Q.E.D.

D.2. Proof of P2

LEMMA 18: One has $P_{\overline{\theta}}(\tau^i(\star) = +\infty) > 0$.

PROOF: Recall that $(p^i_n)$ is a submartingale under $P_{\overline{\theta}}$, bounded by 1. Therefore, by the maximal inequality applied to the nonnegative supermartingale $1 - p^i_n$, the probability that $p^i_n \le \pi^*$ for some $n \in \mathbb{N}$ is at most $(1 - p_0)/(1 - \pi^*)$. Since $\pi^i_n(\star) \le \pi^*$ (the first part of P3), this implies that $P_{\overline{\theta}}(\tau^i(\star) = +\infty) > 0$: given $\overline{\theta}$, there is positive probability that no player will ever stop. Q.E.D.

LEMMA 19: Let $\phi$ be an equilibrium. Then $\lim_{n\to\infty} \pi^i_n(\star) = 0$ for some player $i$.

PROOF: Assume that the sequence $\pi^2_n(\star)$ does not converge to zero, for otherwise the conclusion already holds.

If player 2 is still active, player 1 has more opportunities than in the one-player problem. Hence, if player 1 drops out when player 2 is still active, she would a fortiori drop out when alone. Therefore, assuming $p^1_n = \pi^1_n(\star)$,

$$\frac{q^1_n}{1 - q^1_n} = \frac{P_{\overline{\theta}}(\tau^2(\star) \ge n)}{P_{\underline{\theta}}(\tau^2(\star) \ge n)} \times \frac{\pi^1_n(\star)}{1 - \pi^1_n(\star)} \le \frac{\pi^*}{1 - \pi^*}.$$

Since $P_{\overline{\theta}}(\tau^2(\star) \ge n) > 0$ for every $n$, to show that $\lim_{n\to\infty} \pi^1_n(\star) = 0$ it is then sufficient to show that $\lim_{n\to\infty} P_{\underline{\theta}}(\tau^2(\star) \ge n) = 0$. But this holds since, under $P_{\underline{\theta}}$, the private beliefs $(p^2_n)_n$ form a supermartingale that converges to 0 a.s., while the cutoffs $\pi^2_n(\star)$ do not converge to 0. Q.E.D.

    D.3. Proof of P1

Let $\phi^i$ be the unique best reply to $\phi^j$, with cutoffs $(\pi^i_n)$. We will prove that the sequence $(\pi^i_n(\star))_n$ is nonincreasing.

The formal proof involves a long list of inequalities. We provide a detailed sketch, which can easily be transformed into a formal proof. We will prove that player $i$'s optimal continuation payoff (OCP for short) is lower in situation (A) than in situation (E) below:

(A) $p^i_n = p$ and $t^j_n = \star$.

(E) $p^i_{n+1} = p$ and $t^j_{n+1} = \star$.

This will show that $\omega^i_n(p, \star) \le \omega^i_{n+1}(p, \star)$ for every $p$, and the result follows. We proceed by introducing several situations player $i$ may face, including

fictitious ones:

(B) $p^i_n = p$, $t^j_n = \star$, and there is an interim stage $n - \frac{1}{2}$, between stages $n-1$ and $n$, in which only player $i$ receives a payoff (but the players make no choices). This situation is purely fictitious.

(C) $p^i_{n+1} = p$, $t^j_n = \star$, and, starting from stage $n$, player $i$ observes the status of player $j$ with a one-stage delay. This situation involves a modified game.

(D) $p^i_{n+1} = p$ and $t^j_n = \star$. This is the situation in which player $i$ reaches stage $n+1$ with a private belief $p$, but has not yet learned whether player $j$ chose to remain active at stage $n$.

We compare these situations from the viewpoint of the optimization problem faced by player $i$.

STEP 1: Variations (A) and (B). All relevant information contained in past payoffs is summarized in the private belief: it is irrelevant that in (A) and (B) different payoffs, and a different number of payoffs, lead to the same private belief. Besides, player $j$ receives the same number of observations in both cases, so that the conditional distribution of $p^j_n$ is the same in both cases. Since from stage $n$ on the two situations coincide, player $i$'s OCP at stage $n$ is the same in both situations.

STEP 2: Variations (B) and (C). The continuation games faced by player $i$ in situations (B) and (C) are strategically equivalent. Therefore, player $i$'s OCP is the same in both situations.

STEP 3: Variations (C) and (D). The only difference between the continuation games from stage $n+1$ in the two situations is that in (C), information is delayed for player $i$. Hence in (C) player $i$ has fewer strategies, so that her OCP in (C) does not exceed her expected OCP in (D).

STEP 4: Variations (D) and (E). In (D), player $i$ has not yet observed the choice made by player $j$ at stage $n$. Once player $i$ observes player $j$'s choice, two cases may arise: either player $j$ remained active and we reach (E), or she chose to drop out, so that we reach yet another situation:

(F) $p^i_{n+1} = p$ and $t^j_{n+1} = n$.

As a result, player $i$'s expected OCP in situation (D) is a weighted average of her OCPs in situations (E) and (F). Therefore, to prove that the OCP in situation (D) is at most her OCP in situation (E), it is sufficient to prove that the OCP in situation (F) does not exceed that in situation (E). This holds since (i) player $i$ has more strategies in (E) than in (F) and (ii) by Lemma 17, her posterior belief that the state is $\overline{\theta}$ is higher in (E) than in (F). Q.E.D.


REFERENCES

AUMANN, R. J. (1964): "Mixed and Behavior Strategies in Infinite Extensive Games," in Advances in Game Theory, Annals of Mathematics Study, Vol. 52, ed. by M. Dresher, L. S. Shapley, and A. W. Tucker. Princeton, NJ: Princeton University Press, 627–650. [1594]

BERGEMANN, D., AND J. VÄLIMÄKI (1997): "Market Diffusion with Two-Sided Learning," RAND Journal of Economics, 28, 773–795. [1591]

BERGEMANN, D., AND J. VÄLIMÄKI (2000): "Experimentation in Markets," Review of Economic Studies, 67, 213–234. [1591]

BOLTON, P., AND C. HARRIS (1999): "Strategic Experimentation," Econometrica, 67, 349–374. [1591,1598]

CAPLIN, A., AND J. LEAHY (1994): "Business as Usual, Market Crashes and Wisdom after the Fact," American Economic Review, 84, 547–564. [1598]

CHAMLEY, C. (2004): "Delays and Equilibria with Large and Small Information in Social Learning," European Economic Review, 48, 477–501. [1592]

CHAMLEY, C., AND D. GALE (1994): "Information Revelation and Strategic Delay in a Model of Investment," Econometrica, 62, 1065–1085. [1592]

CHOW, Y. S., AND H. ROBBINS (1963): "On Optimal Stopping Rules," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 2, 33–49. [1596,1602]

DÉCAMPS, J. P., AND T. MARIOTTI (2004): "Investment Timing and Learning Externalities," Journal of Economic Theory, 118, 80–102. [1591]

DIXIT, A. K., AND R. S. PINDYCK (1994): Investment under Uncertainty. Princeton, NJ: Princeton University Press. [1592]

FERGUSON, T. (2004): "Optimal Stopping and Applications," available at http://www.math.ucla.edu/~tom/Stopping/Contents.html. [1596,1602]

GIRALDEAU, L.-A., T. J. VALONE, AND J. J. TEMPLETON (2002): "Potential Disadvantages of Using Socially Acquired Information," Philosophical Transactions of the Royal Society of London, Ser. B, 357, 1559–1566. [1592]

GLICKSBERG, I. L. (1952): "A Further Generalization of the Kakutani Fixed Point Theorem, with Application to Nash Equilibrium Points," Proceedings of the American Mathematical Society, 3, 170–174. [1607]

KELLER, G., S. RADY, AND M. CRIPPS (2005): "Strategic Experimentation with Exponential Bandits," Econometrica, 73, 39–68. [1591]

MILGROM, P. (1981): "Good News and Bad News: Representation Theorems and Applications," Bell Journal of Economics, 12, 380–391. [1596]

NEVEU, J. (1972): Martingales à Temps Discret. Paris: Masson. [1602,1603]

ROSENBERG, D., E. SOLAN, AND N. VIEILLE (2004): "Social Learning in One-Arm Bandit Problems," Discussion Paper 1396, The Center for Mathematical Studies in Economics and Management Science, Northwestern University. [1597,1600]

SHIRYAEV, A. (1996): Probability (Second Ed.). New York: Springer-Verlag. [1601]

VALONE, T. J., AND J. J. TEMPLETON (2002): "Public Information for the Assessment of Quality: A Widespread Social Phenomenon," Philosophical Transactions of the Royal Society of London, Ser. B, 357, 1549–1557. [1592]
