
This article was downloaded by: [18.154.1.183] On: 08 September 2014, At: 06:05
Publisher: Institute for Operations Research and the Management Sciences (INFORMS)
INFORMS is located in Maryland, USA

Marketing Science

Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org

Learning from Experience, Simply
Song Lin, Juanjuan Zhang, John R. Hauser

To cite this article:
Song Lin, Juanjuan Zhang, John R. Hauser (2014) Learning from Experience, Simply. Marketing Science. Published online in Articles in Advance 05 Sep 2014. http://dx.doi.org/10.1287/mksc.2014.0868

Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval, unless otherwise noted. For more information, contact [email protected].

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service.

Copyright © 2014, INFORMS


INFORMS is the largest professional society in the world for professionals in the fields of operations research, management science, and analytics. For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org


Articles in Advance, pp. 1–19
ISSN 0732-2399 (print) | ISSN 1526-548X (online)
http://dx.doi.org/10.1287/mksc.2014.0868

© 2014 INFORMS

Learning from Experience, Simply

Song Lin, Juanjuan Zhang, John R. Hauser
MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

{[email protected], [email protected], [email protected]}

There is substantial academic interest in modeling consumer experiential learning. However, (approximately) optimal solutions to forward-looking experiential learning problems are complex, limiting their behavioral plausibility and empirical feasibility. We propose that consumers use cognitively simple heuristic strategies. We explore one viable heuristic, index strategies, and demonstrate that they are intuitive, tractable, and plausible. Index strategies are much simpler for consumers to use but provide close-to-optimal utility. They also avoid exponential growth in computational complexity, enabling researchers to study learning models in more complex situations.

Well-defined index strategies depend on a structural property called indexability. We prove the indexability of a canonical forward-looking experiential learning model in which consumers learn brand quality while facing random utility shocks. Following an index strategy, consumers develop an index for each brand separately and choose the brand with the highest index. Using synthetic data, we demonstrate that an index strategy achieves nearly optimal utility at substantially lower computational costs. Using IRI data for diapers, we find that an index strategy performs as well as an approximately optimal solution and better than myopic learning. We extend the analysis to incorporate risk aversion, other cognitively simple heuristics, heterogeneous foresight, and an alternative specification of brands.

Keywords: forward-looking experiential learning; index strategies; structural models; cognitive simplicity; heuristics; multi-armed bandit problems; restless bandit problems; indexability

History: Received: August 10, 2012; accepted: April 9, 2014; Preyas Desai served as the editor-in-chief and Teck Ho served as associate editor for this article. Published online in Articles in Advance.

1. Introduction and Motivation

Considerable effort in marketing is devoted to studying the dynamics by which consumers learn from their consumption experience (e.g., Roberts and Urban 1988, Erdem and Keane 1996, Ching et al. 2013a). As an example, imagine new parents who have to shop for diapers, perhaps with little preexisting knowledge about this category. As these parents find out more about diaper brands through usage experience, they face a strategic choice. They can exploit their knowledge to date and select the most appealing brand. They can also explore further, which may entail sampling a currently less-than-ideal brand, so that they can make a more informed decision in the future.

Researchers have developed theory-rich models of optimizing forward-looking consumers who balance exploitation with exploration. Pillars of these models include an explicitly specified description of consumer utility and an explicitly specified process by which consumers learn. Most models assume consumers choose brands by solving a dynamic program that maximizes expected total utility taking learning into account. Researchers argue that theory-based models are more likely to uncover insight and be invariant for new-domain policy simulations (Chintagunta et al. 2006, p. 604). However, these advantages often come at the expense of difficult problems and time-consuming solution methods.

The dynamic programs for forward-looking experiential learning models are, themselves, extremely difficult to solve optimally. We cite evidence below that the problems are PSPACE-hard: they are at least as hard to solve as any problem that requires PSPACE computational memory.¹ This intractability presents both practical and theoretical challenges. Practically, researchers have had to rely on approximate solutions. Without explicit comparisons to the optimal solution, we do not know the impact of the approximations on estimation results. Moreover, the well-known “curse of dimensionality” prevents researchers from investigating problems with moderate or large numbers of brands or marketing variables, whereby even approximate solutions may not be feasible. Theoretically, it is reasonable to posit that a consumer cannot solve optimally in his or her head a dynamic problem that requires vast amounts of memory and computation. In fact, well-developed theories in marketing, psychology, and economics suggest that observed consumer decision rules are often cognitively simple (e.g., Payne et al. 1988, 1993; Gigerenzer and Goldstein 1996).

¹ PSPACE is the set of problems that use polynomial-sized memory (memory proportional to $|\Omega|^n$, where $|\Omega|$ is a measure of the size of the problem and $n$ can be extremely large). PSPACE-hard problems are at least as hard as NP-hard problems, which are themselves suspected of being unsolvable in polynomial time.

Downloaded from informs.org by [18.154.1.183] on 08 September 2014, at 06:05. For personal use only, all rights reserved.

Lin, Zhang, and Hauser: Learning from Experience, Simply. Marketing Science, Articles in Advance, pp. 1–19, © 2014 INFORMS

We propose that consumers use cognitively simple heuristics to solve learning problems. As an example of the class of cognitively simple heuristics, we investigate an attractive candidate heuristic, index strategies, whereby a consumer develops a numerical score, or an index, for each brand separately and then chooses the brand with the largest index. Index strategies are a solution concept that decomposes an intractable problem into a set of tractable subproblems. We retain basic pillars of structural modeling such as an explicit description of consumer utility and the decision process and an assumption that consumers seek to optimize. We posit, in addition, a cost to solving complex problems (e.g., Shugan 1980, Johnson and Payne 1985). We assume the consumer chooses a strategy that optimizes expected discounted utility minus this cognitive cost. Whereas the cost of cognitive complexity might be observable in the laboratory, say through response latency, it is unobservable in vivo. Instead, we identify domains where index strategies are nearly optimal in the sense of maximizing expected discounted utility. If, in such domains, index strategies are substantially simpler for the consumer to implement, then it is likely that savings in cognitive costs exceed the slight deviation from optimality and, hence, provide the consumer with greater utility net of cognitive costs. In the special cases where index strategies provide optimal expected utility, we argue that index strategies are superior as a description of forward-looking learning. Following the same logic, we establish conditions where myopic learning strategies (i.e., exploiting posterior beliefs without exploration) suffice to model consumer behavior.

To motivate the viability of index strategies as a descriptive model of consumers we (1) establish whether well-defined index strategies exist, (2) explain why they are intuitive and hence might be used by consumers, (3) investigate when index strategies are (near) optimal solutions to the reduced problem of utility maximization, and whether they are computationally simpler than the approximately optimal solution,² and (4) test whether index strategies explain observed consumer behavior at least as well as alternative models.

We address (1) analytically by proving the “indexability” property of canonical forward-looking experiential learning models. (Indexability is hard to establish in general.) We address (2) by examining the form and properties of index strategies and arguing they are behaviorally intuitive relative to the approximately optimal solution assumed in most forward-looking learning models. We address (3) using synthetic data. We address (4) by estimating alternative models using IRI data on the purchase of diapers, a product category where we expect to see forward-looking experiential learning.

² We use computational simplicity as a surrogate for cognitive simplicity in this paper.

Figure 1. Index Strategies Balance Utility and Simplicity (Conceptual Diagram). [Diagram omitted. Axis labels: Simplicity, Utility. Labeled strategies: optimal, approximately optimal solution, index strategy, myopic learning, no learning.]

Our basic hypothesis is that consumers can use a cognitively simple index strategy to solve forward-looking experiential learning problems. Figure 1 is a conceptual summary of our hypothesis. We demonstrate viability by showing that there exists a well-defined index that satisfies the four criteria. We do not argue that consumers actually use this well-defined index. Rather, we argue that the well-defined index is a better “as if” description than the (approximately) optimal solution strategy.³

We first describe a canonical learning problem. We next briefly review literatures that address learning dynamics, cognitive simplicity, and related optimization problems. We then examine index strategies from the perspectives of theory, synthetic data, and empirical estimation. We close with extensions.

2. Canonical Forward-Looking Experiential Learning Problem

We consider the following canonical forward-looking experiential learning problem. A consumer sequentially chooses from a set $A$ containing $J$ brands. Let $j$ index brands and $t$ index purchase occasions. The consumer’s utility, $u_{jt}$, from choosing $j$ at $t$ has three components. The first component is quality, $q_{jt}$, which can be defined to include enjoyment, fit with needs, weighted sum of brand features, etc. Quality is drawn independently from a distribution $F_j(q_{jt}; \theta_j)$ with parameters $\theta_j$. The $F_j$ distributions are independent across $j$. This independence assumption rules out learning about a brand by choosing another. The consumer, however, does not know the value of the parameters $\theta_j$ and observes quality draws to infer the value of $\theta_j$. A quality draw of a brand is only realized after the consumer has chosen that brand.

³ An empirical search among heuristics would risk exploiting random variation in the data. Instead we demonstrate that an index strategy, and at least one other cognitively simple heuristic, performs well on the data.

The second component of utility is a set of observable shocks, $\vec{x}_{jt}$, such as advertising, price, promotion, and other control variables that are observable to the researcher and consumer. For simplicity, we assume that observable shocks affect utility directly, although the model is extendable to indirect effects through the quality component as in Erdem and Keane (1996), Ackerberg (2003), and Narayanan et al. (2005). The third component of utility is an unobservable shock, $\varepsilon_{jt}$, which represents random fluctuations in realized utility that are observed by the consumer, but not by the researcher.

Consumer decision making depends on quality and the weighted sum of observable and unobservable shocks, $\vec{\beta}'\vec{x}_{jt} + \varepsilon_{jt}$, where $\vec{\beta}$ is a vector of weight parameters. We refer to this weighted sum as utility shocks. We let utility shocks be drawn from a joint distribution, $H_j(\vec{x}_{jt}, \varepsilon_{jt}; \phi)$, independently over purchase occasions with parameters $\phi$.⁴ The $H_j$’s are independent across $j$. The consumer knows the distribution $H_j$ and the value of $\phi$, observes the current utility shocks prior to making a purchase decision, but does not know future realizations of the shocks. Notice that, unlike the quality draws, the utility shocks of a brand are realized regardless of whether the consumer has chosen that brand. We make the conservative assumption that utility shocks are independent of $q_{jt}$ and thus do not help the consumer learn quality directly. However, utility shocks do shape learning indirectly by varying the consumer’s utility from exploitation, which in turn affects the incentive for exploration.

In summary, we write the consumer’s utility from choosing brand $j$ at purchase occasion $t$ as follows:

$$u_{jt} = q_{jt} + \vec{\beta}'\vec{x}_{jt} + \varepsilon_{jt}. \tag{1}$$

For ease of exposition, in the main analysis, we assume that the consumer is risk neutral. We extend the model to incorporate risk aversion in §8.1.

We model each consumer as if the consumer uses Bayes’ theorem to update beliefs about the quality parameter $\theta_j$ after each consumption experience (assumed to occur after choice but before the next choice). Let $s_{jt}$ be the information set that summarizes the consumer’s beliefs about $\theta_j$ at purchase occasion $t$. At $t = 0$, beliefs about $\theta_j$ are summarized by a prior distribution, $B_{j0}(\theta_j; s_{j0})$, where $s_{j0}$ is based on all relevant prior experience. After the $t$th consumption experience the consumer’s posterior beliefs are summarized by $B_{jt}(\theta_j; s_{jt})$. When both $F_j$ and prior beliefs are normal, Bayesian updating is naturally conjugate. We obtain $s_{jt} = (\bar{\mu}_{jt}, \bar{\sigma}_{jt})$ using standard updating formulae. The parameters of posterior beliefs, $s_{jt} \in \Omega$, and the realized utility shocks, $\vec{x}_{jt} \in X$ and $\varepsilon_{jt} \in E$, summarize the state of information about brand $j$. The collection of brand-specific states, $(\vec{s}_t, \vec{x}_t, \vec{\varepsilon}_t) = (s_{1t}, s_{2t}, \ldots, s_{Jt}, \vec{x}_{1t}, \vec{x}_{2t}, \ldots, \vec{x}_{Jt}, \varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Jt})$, represents the set of states relevant to the decision problem at $t$.

⁴ Observable shocks can be independently distributed over purchase occasions for a number of reasons. For example, firms may intentionally randomize price promotions in response to competition. Such “mixed strategies” can generate observed prices that appear to be independently drawn at each purchase occasion from a known distribution (Narasimhan 1988).
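The normal conjugate updating mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the consumer's uncertainty is about the mean of a brand's quality distribution, that the standard deviation of a single quality draw (`signal_sd`) is known, and the names `Belief` and `update_belief` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Belief state for one brand: posterior mean and standard deviation."""
    mean: float
    sd: float


def update_belief(belief: Belief, quality_draw: float, signal_sd: float) -> Belief:
    """Normal-normal conjugate update after one consumption experience.

    Assumption (not from the source): a quality draw equals the unknown mean
    quality plus normal noise with known standard deviation `signal_sd`.
    """
    prior_precision = 1.0 / belief.sd ** 2
    signal_precision = 1.0 / signal_sd ** 2
    post_precision = prior_precision + signal_precision
    # Posterior mean is a precision-weighted average of prior mean and the draw.
    post_mean = (prior_precision * belief.mean
                 + signal_precision * quality_draw) / post_precision
    return Belief(mean=post_mean, sd=post_precision ** -0.5)


# Example: one experience moves the mean toward the draw and shrinks uncertainty.
prior = Belief(mean=0.0, sd=1.0)
posterior = update_belief(prior, quality_draw=1.0, signal_sd=1.0)
```

Only the chosen brand's belief is updated at each purchase occasion; beliefs about the other brands stay where they were, which is the independence assumption stated earlier in this section.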

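We seek to model a decision strategy, $\Pi: (\Omega \times X \times E)^J \to A$, that maps the state space to the choice set. Without further assumptions, the consumer must choose a decision strategy to maximize expected discounted utility:

$$V(\vec{s}_t, \vec{x}_t, \vec{\varepsilon}_t) = \max_{\Pi} \mathbb{E}_{\Pi}\left[\sum_{\tau=t}^{\infty} \delta^{\tau-t}\left(q_{j\tau} + \vec{\beta}'\vec{x}_{j\tau} + \varepsilon_{j\tau}\right) \,\middle|\, \vec{s}_t, \vec{x}_t, \vec{\varepsilon}_t\right], \tag{2}$$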

where $\delta$ is the discount factor. The expectation $\mathbb{E}$ is taken over the stochastic process generated by the decision strategy (in particular, the transition between states that may depend on the consumer’s brand choice). The infinite horizon can be justified either by consumption over a long horizon or by the consumer’s subjective belief that the decision problem will end randomly.

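The optimal solution to the consumer’s decision problem can be characterized as the solution to the Bellman equation:

$$V(\vec{s}_t, \vec{x}_t, \vec{\varepsilon}_t) = \max_{j \in A}\left\{\vec{\beta}'\vec{x}_{jt} + \varepsilon_{jt} + \mathbb{E}\left[q_{jt} + \delta V(\vec{s}_{t+1}, \vec{x}_{t+1}, \vec{\varepsilon}_{t+1}) \,\middle|\, \vec{s}_t, j\right]\right\}. \tag{3}$$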

Although the Bellman equation is conceptually simple, the full solution is computationally difficult because, even after integrating out the utility shocks $\vec{x}_t$ and $\vec{\varepsilon}_t$, it evolves on a state space of size $|\Omega|^J$, where $|\Omega|$ is the number of elements in $\Omega$. Not only is $|\Omega|^J$ exponential in the number of brands $J$, it becomes extremely large if $\Omega$ contains many elements, even when the optimal solution is approximated by choosing discrete points to represent $\Omega$, as is common in the literature. We provide an illustrative example in §4.

3. Related Literatures

Before we introduce index strategies, it is helpful to review concepts from literatures on learning dynamics, cognitive simplicity, and related optimization problems.

3.1. Learning Dynamics

Many influential papers study consumer learning dynamics and apply learning models to explain or forecast consumer choices in problems related to the canonical learning problem. For example, using data from automotive consumers, Roberts and Urban (1988) estimate a model in which consumers use Bayesian learning to integrate information from a variety of sources to resolve uncertainty about brand quality. Erdem and Keane (1996) build on the concept of Bayesian learning and include forward-looking consumers who trade off exploitation with exploration. For frequently purchased goods, their model fits data better than a no-learning model (reduced form of Guadagni and Little 1983) and the myopic learning model of Roberts and Urban.

These papers stimulated a line of research that estimates the dynamics of consumer learning; for a comprehensive review see Ching et al. (2013a). Some models focus on myopic consumers with Bayesian learning (e.g., Narayanan et al. 2005; Mehta et al. 2008; Chintagunta et al. 2009; Narayanan and Manchanda 2009; Ching and Ishihara 2010, 2012), whereas others explicitly model forward-looking consumers (e.g., Ackerberg 2003; Crawford and Shum 2005; Erdem et al. 2005, 2008). The computational complexity of forward-looking learning has been one of the reasons that some applications assume myopic learning. However, if a theory is accurately descriptive, more-complex forward-looking models should improve policy simulations.

Because forward-looking choice problems that involve continuous state space generally cannot be solved optimally, significant effort has been spent on developing approximate solutions. For example, Keane and Wolpin (1994) use Monte Carlo integration and interpolation, Rust (1997a) introduces a randomization approach, and Imai et al. (2009) develop an estimator that combines dynamic programming solutions with a Bayesian Markov chain Monte Carlo algorithm.⁵ Although these solution methods vary in speed, all attempt to approximate the Bellman equation to the overall problems and thus may suffer from the curse of dimensionality ($|\Omega|^J$).

Alongside these technical developments, there is a growing recognition of the need for richer theories of consumer behavior. For example, Chintagunta et al. (2006, p. 614) suggest that “the future development of structural models in marketing will focus on the interface between economics and psychology.”

3.2. Cognitive Simplicity

Parallel literatures in marketing, psychology, and economics provide evidence that consumers use decision rules that are cognitively simple. In marketing, Payne et al. (1988, 1993) and Bettman et al. (1998) present evidence that consumers use simple heuristic decision rules to evaluate products. For example, under time pressure, consumers often use conjunctive rules (require a few “must have” features) rather than more-complicated compensatory rules. Using simulated thinking costs with “elementary information processes,” Johnson and Payne (1985) illustrate how heuristic decision rules can be rational when balancing utility and thinking costs. Methods to estimate the parameters of cognitively simple decision rules vary, but such rules often predict difficult consumer decisions as well as or better than compensatory rules (e.g., Bröder 2000, Gilbride and Allenby 2004, Kohli and Jedidi 2007, Yee et al. 2007, Hauser et al. 2010).

⁵ There is a related literature on neuro-dynamic programming, which uses neural networks and other approximation architectures to overcome the curse of dimensionality (Bertsekas and Tsitsiklis 1996).

Building on Simon’s (1955, 1956) theory of bounded rationality, researchers in psychology argue that human beings use cognitively simple rules that are “fast and frugal” (e.g., Gigerenzer and Goldstein 1996, Martignon and Hoffrage 2002). Fast and frugal rules evolve when consumers learn decision rules from experience. Consumers continue to use the decision rules because they lead to good outcomes in familiar environments (Goldstein and Gigerenzer 2002). For example, when judging the size of cities, “take the best” often leads to sound judgments.⁶ In 2010–2011, two issues of Judgment and Decision Making were devoted to the recognition heuristic alone (e.g., Marewski et al. 2010).

The costly nature of cognition has also received attention in economics (see Camerer 2003 for a review). A line of research looks to extend or revise standard dynamic decision making models with the explicit recognition that cognition is costly. For example, Gabaix and Laibson (2000) empirically test a behavioral solution to decision-tree problems, whereby decision makers actively eliminate low-probability branches to simplify the task. Gabaix et al. (2006) develop a “directed cognition model,” in which a decision maker acts as if there is only one more opportunity to search. In the laboratory, the directed cognition model explains subjects’ behavior better than a standard search model with costless cognition. Houser et al. (2004) provide further evidence that consumers might use heuristic rules to solve dynamic programs.

Cognitive process mechanisms are debated in the marketing, psychology, and economics literatures. Our hypothesis, that consumers use heuristics such as index strategies, needs only the observation that consumers favor decision rules that are cognitively simple and that such rules often lead to good outcomes. The simplicity hypothesis assumes that consumers trade off utility gains versus cognitive costs, but does not require explicit measurement of cognitive costs.

⁶ The take-the-best rule is, simply: if you recognize one city and not the other, the recognized city is likely larger; if you recognize both, use the most diagnostic feature to make the choice.



3.3. Cognitively Simple Solutions to Complex Optimization Problems

If a ballplayer wants to catch a ball that is already high in the air and traveling directly toward the player, then all the player needs to do is gaze upon the ball, start running, and adjust his or her speed to maintain a constant gaze angle with the ball (Hutchinson and Gigerenzer 2005, p. 102).⁷ The gaze heuristic is an example where a cognitively simple rule accomplishes a task that might otherwise involve solving difficult differential equations. But the principle is more general: simple solutions often perform well in complex optimization problems.

There are many examples in marketing and economics where descriptive decision rules solve more-complex problems.⁸ In domains such as consumer budget allocation, the choice of which information source to search, and the evaluation of products via agendas, heuristic solutions appear to describe consumer behavior well (Hauser 1986, Hauser and Urban 1986, Hauser et al. 1993). Rust (1997b) argues that it is likely consumers solve problems requiring an “infeasibly large number of calculations” by using heuristic solutions such as decomposition into subproblems. He states that “[t]he challenge is to recognize whether or not a problem is nearly decomposable, and if so, to identify its approximately independent subproblems, [and] determine whether they can be solved separately” (p. 18). This view is closely related to our index approach to complex forward-looking learning problems.

3.4. Related Optimization Problems: Bandit Problems and Index Solutions

The model we formulate in §2 is closely related to the multi-armed bandit problem, a prototypical problem that illustrates the fundamental trade-off between exploration and exploitation in sequential decision making under uncertainty. In a bandit problem, the decision maker faces a finite number of choices, each of which yields an uncertain payoff. The decision maker must make choices, observe outcomes, and update beliefs with a sequential decision rule. The decision maker seeks to maximize expected discounted values.

The bandit problem was first formulated by the British in World War II, and, for over 30 years, no simple solution was known. Then Gittins and Jones (1974) demonstrated a simple index solution: develop an index for each “arm” (i.e., each choice alternative) by solving a subproblem that involves only that arm, then choose the arm with the largest index. This index solution reduces an exponentially complex problem to a set of one-dimensional problems. Gittins and Jones (1974) proved the surprising result that the index solution is the optimal solution to the classic bandit problem.⁹

⁷ Professional athletes use more-complicated heuristics that give them greater range, for example, in baseball, prepositioning based on prior tendencies and the expected pitch, and the sound as the bat hits the ball.

⁸ Of course, the empirical performance of descriptive solutions is not guaranteed. Gilovich et al. (2002) provide a comprehensive survey of human decision heuristics and their possible biases.
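The reduction from one exponentially large joint problem to a set of one-dimensional problems can be made concrete by counting grid points. The sketch below is only an illustration under an assumption that is ours, not the paper's: each brand's belief state is discretized to the same number of grid points. The function names are hypothetical.

```python
def joint_state_count(grid_points: int, brands: int) -> int:
    """States in the full dynamic program: every combination of brand states."""
    return grid_points ** brands


def per_brand_state_count(grid_points: int, brands: int) -> int:
    """States visited when each brand's subproblem is solved separately."""
    return brands * grid_points


# Compare the two counts as the number of brands grows (100 grid points per brand).
for brands in (2, 4, 8):
    print(brands,
          joint_state_count(100, brands),
          per_brand_state_count(100, brands))
```

The joint count grows exponentially in the number of brands, whereas the per-brand count grows linearly, which is why an index solution remains feasible for moderate or large numbers of brands.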

However, the striking Gittins-Jones result comes at the cost of a strict assumption that the states of the nonchosen choice alternatives do not evolve. When this assumption is violated, say because of random shocks, Gittins’ index is no longer guaranteed to be optimal. Such problems are known as restless bandits (Whittle 1988) and, in general, are computationally intractable (Papadimitriou and Tsitsiklis 1999). In his seminal paper, Whittle (1988) proposes a tractable heuristic solution. The solution generalizes Gittins’ index such that the problems can be solved optimally or near optimally by associating an index, referred to as Whittle’s index, separately with each alternative and choosing the alternative with the largest index.

The existence of well-defined index solutions relies on a structural property called indexability, which is not guaranteed for all restless bandit problems. Whittle (1988, p. 292) wrote that “One would very much like to have simple sufficient conditions for indexability; at the moment, none are known” (see also Niño-Mora 2001). Gittins et al. (2011, p. 154) also lament that “the question of indexability is subtle, and a complete understanding is yet to be achieved.”¹⁰ In an important class of marketing models, choice models, consumer utility tends to be restless over purchase occasions. For example, in most random-utility choice models there is an idiosyncratic “error term” as well as other changes in the choice environment (e.g., McFadden 1986).¹¹ Without further study, we do not know whether an index strategy is a good solution to such restless problems.

⁹ Hauser et al. (2009) apply Gittins’ index to derive optimal “website morphing” strategies that match website design with customers’ cognitive styles. Urban et al. (2014) field test morphing for AT&T’s banner advertising on CNET and General Motors’ banner advertising on a variety of websites. Other well-known applications of index strategies include job-match learning (Jovanovic 1979, Miller 1984) and pharmaceutical-product learning (Dickstein 2012). See Ching et al. (2013a) for a survey.

¹⁰ The indexability of restless bandits is problem specific. For example, Niño-Mora (2001) takes the achievable region approach (Bertsimas and Niño-Mora 2000) and establishes the indexability of a class of restless bandit problems with linear performance measures (e.g., queue input control). Glazebrook et al. (2006) show that a special class of restless bandit problems, stochastic scheduling, is indexable. To our knowledge, no general result analogous to Gittins’ index theorem exists as of today.

¹¹ The error term has been modeled as an unobserved (to the researcher) state variable in structural applications (Rust 1994, Chapter 51, §§3.1 and 3.2). This modeling approach “provides a natural way to ‘rationalize’ discrepancies between observed behavior and the predictions of the discrete decision process model” (Rust 1994, p. 3101). This is different from the “optimal choice plus noise/measurement error” approach.

We recognize that the canonical forward-looking experiential learning problem belongs to the general class of restless bandits because of the presence of utility shocks. In §5 we prove that the problem is indexable and, thus, a well-defined index solution exists in the sense of Whittle (1988). Moreover, we explore the key properties of such an index, which shed light on how consumers may behave in solving the learning problem.

4. An Index Strategy in the Absence of Utility Shocks

The learning problem we examine includes utility shocks, but it is easier to illustrate the intuition of index strategies using a problem without utility shocks. Temporarily assume both observable and unobservable shocks are zero for all $j$ and $t$, although the same result holds when there is no intertemporal variation in $\vec{x}_{jt}$ and $\varepsilon_{jt}$. In this special case, the consumer’s decision problem is a classic multi-armed bandit.

Gittins’ insight is as follows. To evaluate a brand $j$, the consumer thinks as if he or she is choosing between this brand and a reward $\lambda_j$ that is fixed for all future purchase occasions. The consumer thus solves a subproblem at each purchase occasion: the consumer can either sample this brand to gain more information about it, or exploit the fixed reward $\lambda_j$. In the latter case, the consumer’s belief about brand $j$ ceases to evolve, such that $s_{j,t+1} = s_{jt}$. The optimal solution to this subproblem is determined by a greatly simplified version of the Bellman equation:

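$$V(s_{jt}, \lambda_j) = \max\left\{\lambda_j + \delta V(s_{jt}, \lambda_j),\; \mathbb{E}\left[q_{jt} + \delta V(s_{j,t+1}, \lambda_j) \,\middle|\, s_{jt}\right]\right\}. \tag{4}$$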

Figure 2. Gittins’ Index, Posterior Mean Quality, and the Value of Exploration. [Plot omitted. Horizontal axis: purchase occasion (1 to 46). Vertical axis: Gittins’ index (0 to 1.8). Series: Gittins’ index, posterior mean quality, value of exploration.]

Notice that each subproblem only depends on the state evolution of a single brand, $j$. The subproblem is much simpler than the full problem specified in Equation (3).

Gittins’ index, $G(s_{jt})$, is defined as the smallest value of $\lambda_j$ such that the consumer at purchase occasion $t$ is just indifferent between experiencing brand $j$ and receiving the fixed reward. That is, we obtain $G(s_{jt})$ by equating the two terms inside the maximization operator of Equation (4). Gittins proposes that $G(s_{jt})$ could be used as a measuring device for the value of exploring brand $j$: if there is more uncertainty about a brand left to explore, the consumer will demand a higher fixed reward to be willing to stop exploration. Naturally, Gittins’ index is updated when new information arrives.

Gittins’ surprising result is the Index Theorem. The optimal solution is to choose the brand with the highest index at each purchase occasion. A computationally difficult problem has thus been decomposed into J simpler subproblems.

Index Theorem (Gittins and Jones 1974). The optimal decision strategy when there are no utility shocks is $\Pi^G(\vec{s}_t) = \arg\max_{j \in A} G(s_{jt})$.
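To make the calibration idea behind Equation (4) concrete, the following minimal sketch computes a Gittins index by bisection on the fixed reward. It is an illustration under stated assumptions, not the computation used in this paper: rewards are Bernoulli with a Beta(a, b) belief (a swapped-in textbook case, rather than normally distributed quality), the subproblem is truncated after `horizon` further draws, and all function names are hypothetical.

```python
def explore_minus_retire(a, b, lam, delta, horizon):
    """Value of sampling the arm once more minus the value of taking the
    fixed reward lam forever, for a Beta(a, b) belief (illustrative setup)."""
    retire = lam / (1.0 - delta)  # fixed reward received in every period
    # Terminal approximation after `horizon` draws with s successes:
    # either retire or collect the posterior mean forever.
    values = [
        max(retire, ((a + s) / (a + b + horizon)) / (1.0 - delta))
        for s in range(horizon + 1)
    ]
    # Backward induction over the number of draws already taken.
    for depth in range(horizon - 1, 0, -1):
        new_values = []
        for s in range(depth + 1):
            p = (a + s) / (a + b + depth)  # posterior mean success probability
            explore = p * (1.0 + delta * values[s + 1]) + (1.0 - p) * delta * values[s]
            new_values.append(max(retire, explore))
        values = new_values
    p = a / (a + b)
    explore = p * (1.0 + delta * values[1]) + (1.0 - p) * delta * values[0]
    return explore - retire


def gittins_index(a, b, delta=0.9, horizon=100, tol=1e-6):
    """Smallest fixed reward at which the consumer is indifferent (bisection)."""
    lo, hi = a / (a + b), 1.0  # the index lies between the mean and the maximum reward
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if explore_minus_retire(a, b, mid, delta, horizon) > 0:
            lo = mid  # exploring is still strictly better, so raise the reward
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Same posterior mean (0.5), different amounts of experience.
    print(gittins_index(1, 1), gittins_index(10, 10))
```

In this toy setting the index exceeds the posterior mean and shrinks toward it as experience accumulates, which is the qualitative pattern the text attributes to the value of exploration.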

Figure 2 illustrates intuitive properties of Gittins’ index. We consider one brand. The solid line plots one realization of Gittins’ index as it evolves when the brand is chosen repeatedly. The dashed line plots the consumer’s posterior mean quality belief. It is updated by brand experience and converges toward the true brand quality. Myopic consumers would exploit experience and choose the brand that yields the highest posterior mean quality. Forward-looking consumers may want to explore further. The dotted curve, which is simply the difference between Gittins’ index and the posterior mean quality, measures the value of exploration. This curve declines smoothly with experience because the value of exploration decreases as the consumer learns more about brand quality. When we plot Gittins’ index as a function of the consumer’s posterior quality uncertainty $\bar{\sigma}_{jt}$ (not shown), it is also intuitive: the index increases with $\bar{\sigma}_{jt}$ because the value of exploration increases with the remaining amount of quality uncertainty. Figure 2 and the simple relationship between Gittins’ index and posterior quality beliefs suggest that a consumer might intuit something close to the dotted curve if there were no utility shocks.
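The smooth decline in the value of exploration reflects how posterior uncertainty shrinks with experience. The sketch below shows standard normal-normal Bayesian updating, assuming a normal prior and normally distributed quality signals with known variance; the function and argument names are hypothetical and chosen only for illustration.

```python
def update_normal_belief(mean, var, signal, signal_var):
    """Posterior (mean, variance) after observing one quality signal."""
    post_var = 1.0 / (1.0 / var + 1.0 / signal_var)
    post_mean = post_var * (mean / var + signal / signal_var)
    return post_mean, post_var


def posterior_sd_path(prior_sd, signal_sd, n_purchases):
    """Posterior standard deviation after 0, 1, ..., n_purchases experiences.

    The path depends only on how many signals were observed,
    not on the values of those signals.
    """
    return [
        (1.0 / (1.0 / prior_sd**2 + n / signal_sd**2)) ** 0.5
        for n in range(n_purchases + 1)
    ]


if __name__ == "__main__":
    print(update_normal_belief(0.0, 1.0, 2.0, 1.0))
    print(posterior_sd_path(1.0, 1.0, 5))
```

Because the variance path is deterministic, uncertainty falls quickly over the first few purchases and more slowly afterward.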

5. An Index Strategy in the Presence of Utility Shocks

We now allow utility shocks. Observable shocks $\vec{x}_{jt}$ include effects that researchers observe and model, such as changes in advertising, price, or promotion. Unobservable shocks $\epsilon_{jt}$ include effects that researchers do not observe and that do not provide a signal about quality. The presence of unobservable shocks is central to many empirical consumer choice models. Because shocks enter the utility function regardless of the consumer’s decisions, the consumer may, in any purchase occasion, switch among brands.¹²

When the model includes utility shocks, the Gittins-Jones index theorem no longer applies because the states of nonchosen brands do not remain constant. With shocks, the consumer’s problem belongs to the class of restless-bandit problems as introduced by Whittle (1988). In general, such optimization problems are PSPACE-hard (Papadimitriou and Tsitsiklis 1999, Theorem 4), making the problem extremely difficult, if not infeasible, to solve and making it implausible that the consumer would use a solution strategy based on Equation (3). Among other difficulties, PSPACE-hard problems require extremely large memory, a particularly scarce resource for consumers (e.g., Lindsay and Norman 1977, p. 306; Bettman 1979, p. 140). We develop a theoretical solution to this problem in this section. We show that the canonical forward-looking experiential learning problem is indexable and index strategies have intuitive properties.

5.1. The Canonical Forward-Looking Experiential Learning Problem Is Indexable

Whittle (1988) proposes a solution that generalizes Gittins’ index. At each purchase occasion, to evaluate a brand j, the consumer thinks as if he or she must choose between brand j and a reward $\lambda_j$ that is fixed for all future purchase occasions. The Bellman equation for the jth subproblem, which now includes utility shocks, becomes

12 Even when there is no learning, a typical empirical model of consumer choices may include a shock, or an idiosyncratic error, $\epsilon_{jt}$, that is treated as unobservable by researchers. Without this shock, the model would predict that the consumer makes the same choice over purchase occasions if all other observable factors remain constant. In the context of learning, incorporating this shock allows for switching among brands even when the consumer has learned much about brand quality.

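$$V(s_{jt}, \vec{x}_{jt}, \epsilon_{jt}, \lambda_j) = \max\Big\{ \lambda_j + \delta\, \mathbb{E}\big[ V(s_{jt}, \vec{x}_{j,t+1}, \epsilon_{j,t+1}, \lambda_j) \big],\; \vec{\beta}'\vec{x}_{jt} + \epsilon_{jt} + \mathbb{E}\big[ q_{jt} + \delta V(s_{j,t+1}, \vec{x}_{j,t+1}, \epsilon_{j,t+1}, \lambda_j) \,\big|\, s_{jt} \big] \Big\}. \tag{5}$$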

The index is defined as the smallest value of $\lambda_j$ such that the consumer at purchase occasion t is just indifferent between choosing brand j and receiving the fixed reward. For such an index to be well-defined and meaningful, the indexability condition needs to be satisfied (Whittle 1988). Let $S_t(\lambda_j) \subseteq \Omega \times X \times E$ be the set of states for which choosing $\lambda_j$ at purchase occasion t is optimal:

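$$S_t(\lambda_j) = \Big\{ (s_{jt}, \vec{x}_{jt}, \epsilon_{jt}) \in \Omega \times X \times E : \lambda_j + \delta\, \mathbb{E}\big[ V(s_{jt}, \vec{x}_{j,t+1}, \epsilon_{j,t+1}, \lambda_j) \big] \ge \vec{\beta}'\vec{x}_{jt} + \epsilon_{jt} + \mathbb{E}\big[ q_{jt} + \delta V(s_{j,t+1}, \vec{x}_{j,t+1}, \epsilon_{j,t+1}, \lambda_j) \,\big|\, s_{jt} \big] \Big\}. \tag{6}$$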

Indexability is defined as follows:

Definition. A brand j is indexable if, for any t, $S_t(\lambda_j) \subseteq S_t(\lambda'_j)$ for any $\lambda_j < \lambda'_j$.
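The nesting requirement in this definition can be checked numerically on a small example. The sketch below is purely illustrative and is not the proof in Online Appendix A: it assumes a hypothetical project with a handful of discrete states, solves the subproblem by value iteration for each fixed reward on a grid, and tests whether the sets of states where the fixed reward is optimal grow with the reward. Passing such a check on a grid is suggestive only; it does not establish indexability.

```python
def passive_set(lam, reward, p_active, p_passive, delta=0.9, iters=500):
    """States in which taking the fixed reward lam is (weakly) optimal.

    reward[s] is the expected payoff from choosing the brand in state s;
    p_active and p_passive are transition matrices (lists of rows) when the
    brand is chosen and when the fixed reward is chosen, respectively.
    """
    n = len(reward)
    v = [0.0] * n
    for _ in range(iters):  # value iteration on the one-brand subproblem
        passive = [lam + delta * sum(p_passive[s][k] * v[k] for k in range(n))
                   for s in range(n)]
        active = [reward[s] + delta * sum(p_active[s][k] * v[k] for k in range(n))
                  for s in range(n)]
        v = [max(a, p) for a, p in zip(active, passive)]
    return frozenset(s for s in range(n) if passive[s] >= active[s] - 1e-9)


def looks_indexable(lams, reward, p_active, p_passive, delta=0.9):
    """True if the passive sets are nested as the fixed reward increases."""
    sets = [passive_set(lam, reward, p_active, p_passive, delta)
            for lam in sorted(lams)]
    return all(small <= large for small, large in zip(sets, sets[1:]))


# Hypothetical three-state example. The state is frozen under the fixed
# reward, which is the classic (non-restless) bandit case.
REWARD = [0.2, 0.5, 0.9]
P_ACTIVE = [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
P_PASSIVE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    grid = [i / 20 for i in range(-5, 30)]
    print(looks_indexable(grid, REWARD, P_ACTIVE, P_PASSIVE))
```

Replacing `P_PASSIVE` with a matrix that moves the state while the brand is not chosen turns the example into a restless project, for which nesting is no longer guaranteed.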

Indexability requires that, as the fixed reward increases, the collection of states for which the fixed reward is optimal does not decrease. In other words, if in some state it is optimal to choose the fixed reward, it must also be optimal to choose a higher fixed reward. Indexability implies a consistent ordering of brands for any state, so an index strategy is meaningful. However, indexability need not always hold in general and cannot be taken for granted (Whittle 1988).¹³ Thus, before we can posit an index strategy as a consumer heuristic, we must establish indexability for a model that includes utility shocks. In Online Appendix A (available as supplemental material at http://dx.doi.org/10.1287/mksc.2014.0868), we prove the following proposition.

Proposition 1 (Indexability). The canonical forward-looking experiential learning problem defined in §2 is indexable.

Once the indexability condition is established, then a well-defined strategy is to choose at each purchase occasion the brand with the largest index. The index strategy breaks the curse of dimensionality by decomposing a problem with exponential complexity into J much simpler subproblems, each on a state space of $|\Omega|$ after integrating out the utility shocks $\vec{x}_{jt}$ and $\epsilon_{jt}$. With this simplification, it is more plausible that the consumer might use the index strategy. As a bonus, estimation is much faster. The difference in the size of the state space can be dramatic. For example, suppose we were interested in the mean and variance of quality and discretized them with M and N grid points, respectively. With J brands, the state space for index strategies is M × N for each brand, rather than $(M \times N)^J$ for the original optimization problem given in Equation (3). For M = N = 10 and J = 6, this is the difference between a state space of 100 (for each of the six brands) and 1,000,000,000,000.

13 Whittle (1988, p. 297) provides a simple example where indexability fails.
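The arithmetic behind this comparison is easy to reproduce. The helper below is a hypothetical convenience function, written only to show how the per-brand and joint state-space sizes scale with the grid and the number of brands.

```python
def state_space_sizes(m, n, j):
    """Return (per-brand size for the index strategy, joint size for the full problem)."""
    per_brand = m * n       # each brand is solved on its own M x N grid
    joint = per_brand ** j  # the full problem tracks all J brands at once
    return per_brand, joint


if __name__ == "__main__":
    print(state_space_sizes(10, 10, 6))  # the example in the text
    print(state_space_sizes(200, 50, 2))  # the grid used in the synthetic-data analysis
```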

5.2. The Index Strategy Is Invariant to Scale and Behaves Intuitively

Index strategies dramatically simplify the solution, but can the consumer intuit (perhaps approximately) an index strategy? We expect future laboratory experiments to address this issue empirically. In this paper, we argue that index strategies have intuitive properties and that it is not unreasonable for the consumer to intuit those properties.

An index strategy would be difficult for the consumer to use if the strategy were not invariant to permissible scale transformations. If it is invariant, the consumer can intuit (or learn) the basic shape of the index function and use that intuited shape in many situations. Invariance facilitates ecological rationality.¹⁴

The following results hold for fairly general distributions of quality, $F_j(q_{jt}; \theta_j)$, and joint distributions of utility shocks $\vec{x}_{jt}$ and $\epsilon_{jt}$, as long as they have scale and location parameters and the quality belief $B_{jt}(\theta_j; s_{jt})$ is conjugate. To ease interpretation, we assume that $F_j$ and $B_{jt}$ are normal distributions with parameters defined earlier: $\mu_j$ and $\sigma_j$ for true quality; $\bar{\mu}_{jt}$ and $\bar{\sigma}_{jt}$ for posterior beliefs about quality; and $\mu^{x,\epsilon}_j$ and $\sigma^{x,\epsilon}_j$ for utility shocks. In Online Appendix B we prove the following proposition.

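Proposition 2 (Invariance). Let $\check{W}$ be Whittle’s index for the canonical forward-looking experiential learning problem computed when the posterior mean quality ($\bar{\mu}_{jt}$) is zero, the mean utility shock ($\mu^{x,\epsilon}_j$) is zero, and the inherent variation of quality ($\sigma_j$) is one. Whittle’s index for any values of these parameters is the following simple function of $\check{W}$:

$$W_j\big(\bar{\mu}_{jt}, \bar{\sigma}_{jt}, \vec{\beta}'\vec{x}_{jt} + \epsilon_{jt}, \sigma_j, \mu^{x,\epsilon}_j, \sigma^{x,\epsilon}_j, \delta\big) = \bar{\mu}_{jt} + \mu^{x,\epsilon}_j + \sigma_j \check{W}_j\Big(0, \frac{\bar{\sigma}_{jt}}{\sigma_j}, \frac{\vec{\beta}'\vec{x}_{jt} + \epsilon_{jt} - \mu^{x,\epsilon}_j}{\sigma_j}, 1, 0, \frac{\sigma^{x,\epsilon}_j}{\sigma_j}, \delta\Big).$$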

Proposition 2 implies that the consumer can simplify his or her mental evaluations by decomposing the index for each brand into (1) the mean utility gained from myopic learning, $\bar{\mu}_{jt} + \mu^{x,\epsilon}_j$, which reflects the exploitation of posterior beliefs, and (2) the incremental benefit of looking forward, $\sigma_j \check{W}$, which captures quality information gained through exploration. To assess the value of exploration, the consumer need only intuit the shape of $\check{W}$ for a limited range of parameter values and scale it by $\sigma_j$. Proposition 2 also helps researchers understand which parameters can be identified in the index-strategy model.

14 Gittins’ index exhibits invariance properties (Gittins 1989).
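The rescaling in Proposition 2 amounts to a short wrapper around a standardized index function. The sketch below applies the formula as stated in the proposition. The standardized index itself is passed in as a callable, because computing it requires solving the subproblem; the `toy_standardized_index` shown here is a made-up placeholder with no claim to match the true index, and all names are hypothetical.

```python
from typing import Callable


def whittle_index(post_mean: float, post_sd: float, shock: float,
                  quality_sd: float, shock_mean: float, shock_sd: float,
                  delta: float, w_check: Callable[..., float]) -> float:
    """Rescale a standardized index following Proposition 2.

    `shock` is the realized utility shock (observable plus unobservable);
    `w_check` is the index evaluated at zero posterior mean, zero mean shock,
    and unit inherent quality variation.
    """
    standardized = w_check(
        0.0,                                  # posterior mean normalized to zero
        post_sd / quality_sd,                 # relative posterior uncertainty
        (shock - shock_mean) / quality_sd,    # demeaned, rescaled realized shock
        1.0,                                  # inherent quality variation set to one
        0.0,                                  # mean utility shock set to zero
        shock_sd / quality_sd,                # relative shock magnitude
        delta,
    )
    return post_mean + shock_mean + quality_sd * standardized


def toy_standardized_index(mean, rel_post_sd, rel_shock, unit_sd, zero_mean,
                           rel_shock_sd, delta):
    """Placeholder only: a made-up shape, NOT the true standardized index."""
    return rel_shock + 0.5 * rel_post_sd / (1.0 + rel_shock_sd)


if __name__ == "__main__":
    print(whittle_index(2.0, 0.5, 0.3, 2.0, 0.1, 1.0, 0.9, toy_standardized_index))
```

Keeping the standardized function separate mirrors the point of the proposition: one function of a few ratios can be reused across brands and scales.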

To provide further intuition, we prove the following proposition in Online Appendix C. The proposition shows that Whittle’s index behaves as expected when the parameters of the problem vary. The consumer likes increases in quality and utility shocks, dislikes inherent uncertainty in quality and utility shocks, but values the ability to learn and, hence, resolve the uncertainty in posterior beliefs about quality.

Proposition 3 (Comparative Statics). Whittle’s index for the canonical forward-looking experiential learning problem (1) increases with the posterior mean of quality ($\bar{\mu}_{jt}$), the observable utility shocks ($\vec{\beta}'\vec{x}_{jt}$), and the unobservable utility shock ($\epsilon_{jt}$); (2) weakly decreases with the inherent uncertainty in quality ($\sigma_j$) and the magnitude of uncertainty in the utility shocks ($\sigma^{x,\epsilon}_j$); and (3) weakly increases with the consumer’s posterior uncertainty about quality ($\bar{\sigma}_{jt}$).

Figure 3 illustrates Whittle’s index where we set the posterior mean quality to zero, so that the curve represents the value of exploration. (More generally, Whittle’s index fluctuates with the posterior mean quality in a way similar to Figure 2.) As was the case for Gittins’ index, the value-of-exploration component of Whittle’s index is a smooth decreasing function of experience because experience reduces posterior quality uncertainty. With sufficient experience, the value of exploration converges toward zero, implying that, asymptotically, the value of a brand is based on the posterior mean of quality (Proposition 2). Unlike Gittins’ index, Whittle’s index is a function of the magnitude of utility shocks ($\sigma^{x,\epsilon}_j$). As the magnitude of utility shocks becomes larger, it is less important for the consumer to explore, and the value of exploration decreases as shown in Figure 3. These properties, and the shape of the curve itself, are intuitive.

Figure 3 and Proposition 3 suggest that, other things being equal, when the magnitude of the uncertainty in utility shocks is larger, the realized utility shocks are more likely to be the deciding factor in consumers’ brand choices. For example, as the depth of price promotions increases, consumers are more likely to base their purchase decisions on price. When $\sigma^{x,\epsilon}_j = 5$ (compared with inherent quality uncertainty normalized as $\sigma_j = 1$), Whittle’s index is almost flat, implying an almost myopic strategy. To formalize this insight, we state the following corollary to Proposition 3:

Corollary. (1) As the consumer’s posterior uncertainty in quality increases relative to the magnitude of the utility shocks, the value to the consumer from looking forward increases. (2) As the magnitude of the utility shocks increases relative to the consumer’s posterior uncertainty in quality, the value from looking forward decreases. In this latter case, a myopic learning strategy (i.e., exploiting posterior beliefs) may suffice, and could be the optimal strategy if it is cognitively simpler than a forward-looking learning strategy.

Figure 3 Whittle’s Index as Experience and Utility Shock Magnitude Vary

[Figure: Whittle’s index (vertical axis, 0 to 0.8) plotted against purchase occasion (horizontal axis, 1 to 46) for different shock magnitudes, including shock magnitude = 0.01.]

Notes. Posterior mean quality is set to zero in this figure, so that Whittle’s index captures the value of exploration. Inherent quality uncertainty $\sigma_j$ is normalized as 1.

These results highlight the intricate relationship between the consumer’s uncertainty in quality and uncertainty caused by utility shocks. The two types of uncertainty complement each other in driving the consumer’s value of exploitation, but may compete with each other in shaping the consumer’s value of exploration. The index solution offers an intuitive description of this relationship. In §§6 and 7, we examine the empirical performance of the index strategy.

6. Examination of the Near Optimality of an Index Strategy (Synthetic Data)

We now examine whether an index strategy implies a reasonable trade-off between optimality and simplicity. Indexability guarantees existence of a well-defined index strategy but does not guarantee its optimality.¹⁵ For the canonical forward-looking experiential learning problem, the performance of the index strategy is an empirical question. Cognitive costs remain unobservable, but §§4 and 5 suggest that an index strategy could be substantially simpler than the direct solution of the Bellman equation to the overall problem. To examine whether the loss in utility is small, we switch from analytic derivations to synthetic data because the loss in utility is an issue of magnitude rather than direction. Synthetic data establish existence (rather than universality) of situations where index strategies are close to optimal.

15 Many performance bounds have been developed in different contexts. See Gittins et al. (2011) for a review of recent developments.

For concreteness we examine the special case when $F_j$ and $B_{jt}$ are normal distributions. From the perspective of consumer decision making, what matters is the joint distribution of observable shocks ($\vec{x}_{jt}$) and unobservable shocks ($\epsilon_{jt}$). Therefore, for the synthetic-data analysis we set observable shocks to zero without loss of generality. Practically, even if there are no observable shocks (e.g., no price promotions), unobservable shocks (e.g., idiosyncratic taste fluctuations) are still likely to prevail in most choice models. We allow for both observable and unobservable shocks in the field-data analysis.

We compare four decision strategies that the consumer might use.

1. No learning. The consumer chooses the brand based only on the consumer’s prior beliefs of quality and the current utility shocks. This strategy provides a baseline to evaluate the incremental value of learning.

2. Myopic learning. The consumer chooses the brand based only on the consumer’s posterior quality beliefs and the current utility shocks. This strategy exploits the consumer’s posterior knowledge about brand quality. The corollary predicts that this strategy will suffice when the magnitude of utility shocks is relatively high compared with posterior quality uncertainty.

3. Index strategy. This strategy assumes the consumer can intuit the shape of Whittle’s index. As per Proposition 2, this strategy improves on the myopic learning strategy to take into account the exploration value of learning. Brand choices reflect the consumer’s trade-off between exploitation and exploration.

4. Approximately optimal. The PSPACE-hard forward-looking experiential learning problem cannot be solved optimally, hence researchers resort to approximate solutions (e.g., Keane and Wolpin 1994, Erdem and Keane 1996, Rust 1997a, Ackerberg 2003, Crawford and Shum 2005, Imai et al. 2009, Ching 2010, Ching et al. 2013b). Although approximation methods vary (see Online Appendix G for a review), discrete optimization is a representative method and should converge to the optimal solution with a larger number of grids (Chow and Tsitsiklis 1991, Rust 1996).

We choose parameters that illustrate the phenomena and are empirically plausible. The simulation requires a finite horizon; we select T = 50 purchase occasions. If the discount factor is $\delta = 0.90$, truncation to such a finite horizon is negligible. We discretize the state space, $s_{jt} = (\bar{\mu}_{jt}, \bar{\sigma}_{jt})$, into a set of M × N grid points for each of J brands. We choose M × N = 200 × 50 = $10^4$, which should be close to optimal in the continuous problem.¹⁶ To simplify integration we draw the utility shocks from a Gumbel distribution with parameters $(\mu^{\epsilon}_j, \sigma^{\epsilon}_j)$ and normalize the location parameter such that the utility shocks have zero unconditional means (Rust 1987, 1994). Inherent uncertainties in quality for both brands, $\sigma_j$, are equal and normalized to one.
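The zero-mean normalization of the Gumbel shocks can be sketched as follows. A Gumbel variable with location m and scale s has mean m + γs, where γ is the Euler-Mascheroni constant, so setting m = −γs centers the draws at zero. The code assumes that the second Gumbel parameter is the scale parameter; the function name and the inverse-CDF sampler are illustrative choices, not the implementation described in the online appendices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def draw_zero_mean_gumbel(scale, rng):
    """Draw one Gumbel shock whose location is set so that its mean is zero."""
    location = -EULER_GAMMA * scale
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    # Inverse-CDF sampling for the Gumbel (type I extreme value) distribution.
    return location - scale * math.log(-math.log(u))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [draw_zero_mean_gumbel(1.0, rng) for _ in range(200_000)]
    print(sum(draws) / len(draws))  # close to zero up to simulation noise
```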

The index strategy evolves on a state space of size M × N for each of the J brands, whereas the approximately optimal solution evolves on a state space of size $(M \times N)^J$. We choose J = 2 for a conservative test of the relative simplicity of the index strategy.

We vary the parameter values to capture three possibilities: (1) the means and uncertainty both favor one brand, (2) the means are the same but uncertainty favors one brand, and (3) the means and uncertainty favor different brands. Because quality beliefs are relative, we fix the prior mean quality belief of brand 1 as $\bar{\mu}_{10} = 0$ and vary the prior mean quality belief for brand 2 as $\bar{\mu}_{20} \in \{-0.3, 0, 0.3\}$. We normalize the standard deviation of brand 2’s prior quality belief as $\bar{\sigma}_{20} = 1$ and the standard deviation of brand 1’s prior quality belief as $\bar{\sigma}_{10} = 0.5$. Finally, to test the corollary we allow the uncertainty in shocks to vary from relatively small to relatively large: $\sigma^{\epsilon}_1 = \sigma^{\epsilon}_2 \in \{0.1, 1\}$.

We compute the indices and the consumer’s expected total utilities for 50 purchase occasions under the four decision strategies. Details are provided in Online Appendices D and E. Table 1 summarizes the results.

We first examine computation time as a surrogate for cognitive complexity. As expected, the no-learning and myopic learning strategies impose negligible computation time, the index strategy requires moderate computation time, and the approximately optimal solution is substantially slower: 600 times as time consuming as the index strategy even for this basic problem. Faster approximation algorithms would reduce the computational time for the approximately optimal solution (Keane and Wolpin 1994, Rust 1997a, Imai et al. 2009), but they would also expedite the index strategy because we use the same algorithm for solving the Bellman equations in both models (see Online Appendix G for implementation details). Moreover, faster approximation algorithms do not address the curse of dimensionality. The ratio of computational time in Table 1 could be made arbitrarily large with finer grid points or with a larger number of brands.

16 We choose M = 200 grid points for the posterior mean quality. Meanwhile, we fix each brand’s prior quality variance. Posterior quality variance evolves deterministically following Bayesian updating formulae. Because there are T = 50 purchase occasions, a brand’s posterior quality variance has N = T = 50 possible values, depending on how many times this brand has been chosen. Therefore, the size of the state space for the index strategy is M × N = 200 × 50 = $10^4$.

We next examine the consumer’s expected utilities. In all cases, the no-learning strategy leads to the lowest utility, which suggests that learning is valuable. Furthermore, the index strategy is statistically indistinguishable from the approximately optimal strategy. As long as cognitive simplicity matters even a little, the index strategy will be better on utility minus complexity.

Finally, the results are consistent with the corollary. When there is relatively low uncertainty in utility shocks (upper panel of Table 1), the index strategy and the approximately optimal strategy generate higher utility than myopic learning, and, in two of the three cases, significantly higher utility. When there is relatively high uncertainty in utility shocks (lower panel of Table 1), the myopic learning model performs virtually the same as either the index strategy or the approximately optimal strategy. The differences are not significant. In this case, the consumer might achieve the best utility minus complexity with a myopic strategy, among the models tested.

Analysis of synthetic data never covers all cases. Table 1 is best interpreted as providing evidence that (1) there exist reasonable situations where an index solution is better than the approximately optimal solution on utility minus complexity and (2) there exist domains where myopic learning is best on utility minus complexity. We now examine field data.

7. Field Estimation of an Index Strategy (IRI Data on Diaper Purchases)

We examine how an index solution fits and predicts behaviors compared with an approximately optimal solution and myopic learning. As a first test, we seek a product category and sample where consumers are likely to be forward looking. Even if an index solution does no better than an approximately optimal solution, we consider the result promising because an index solution is cognitively simpler. As a test of face validity, we expect learning strategies to outperform no-learning strategies and, because we focus on a situation that favors forward-looking behavior, we expect forward-looking strategies to outperform myopic learning.



Table 1 Comparing Decision Strategies on Utility and Simplicity (Synthetic Data)

| | No learning | Myopic learning | Index strategy | Approximately optimal |
|---|---|---|---|---|
| Size of state space | N/A | N/A | $10^4$ | $10^8$ |
| Computation time (surrogate for cognitive complexity)ᵃ | Negligible | Negligible | $10^2$ seconds | $6 \times 10^4$ seconds |
| **Expected discounted utility (standard errors in parentheses)** | | | | |
| *Relatively low uncertainty in utility shocks ($\sigma^{\epsilon}_1 = \sigma^{\epsilon}_2 = 0.1$)* | | | | |
| $(\bar{\mu}_{10}, \bar{\mu}_{20}) = (0.0, -0.3)$ | 0.041 (0.003) | 1.801 (0.043) | 1.992 (0.045) | 1.996 (0.045) |
| $(\bar{\mu}_{10}, \bar{\mu}_{20}) = (0.0, 0.0)$ | 0.618 (0.003) | 3.352 (0.049) | 3.544 (0.052) | 3.547 (0.052) |
| $(\bar{\mu}_{10}, \bar{\mu}_{20}) = (0.0, 0.3)$ | 3.036 (0.003) | 5.298 (0.056) | 5.323 (0.056) | 5.327 (0.056) |
| *Relatively high uncertainty in utility shocks ($\sigma^{\epsilon}_1 = \sigma^{\epsilon}_2 = 1$)* | | | | |
| $(\bar{\mu}_{10}, \bar{\mu}_{20}) = (0.0, -0.3)$ | 4.919 (0.026) | 5.762 (0.047) | 5.767 (0.047) | 5.768 (0.047) |
| $(\bar{\mu}_{10}, \bar{\mu}_{20}) = (0.0, 0.0)$ | 6.182 (0.027) | 7.150 (0.050) | 7.190 (0.052) | 7.190 (0.052) |
| $(\bar{\mu}_{10}, \bar{\mu}_{20}) = (0.0, 0.3)$ | 7.946 (0.026) | 8.912 (0.054) | 8.911 (0.054) | 8.912 (0.054) |

Rows within each panel are indexed by the mean of prior quality beliefs (brand 1, brand 2).

ᵃThis is the time required to compute one utility function using a university computing system based on Sun Grid Engine and Red Hat Enterprise Linux.

7.1. IRI Data on Diaper Purchases

We select the diaper category from the IRI Marketing Data Set that is maintained by IRI (formerly SymphonyIRI Group) and available to academic researchers (Bronnenberg et al. 2008).¹⁷ Diaper consumers are likely to be learning and forward looking. Parents typically begin purchasing diapers based on a discrete birth event, and their entry to the category is arguably exogenous (Ching et al. 2010, 2012). Even if the birth is a second or subsequent child, diaper quality may have changed. Informal qualitative interviews suggest that parents learn about whether diaper brands match their needs through experience (with often more than one purchase), that diapers are sufficiently important that parents take learning seriously, and that parents often try multiple brands before settling on a favorite brand. In fact, Ching et al. (2012) find that diaper consumers conduct strategic trials of various brands.¹⁸ There are observable shocks due to price promotions and shocks due to unobservable events. For example, a baby might go through a stage where a different brand is best suited to the parent/child’s needs. Finally, diapers have the advantage of being regular purchases, where the no-choice option is less of a concern, and consumers tend to be in the market for many purchase occasions.

17 In comparison, durable goods may induce different learning dynamics. Because of the low purchase frequency, consumers may not have the opportunity to learn by sampling. Also, because the stakes are often high, consumers may have the motivation to acquire other types of information (e.g., Consumer Reports reviews) prior to purchase. The INFORMS Society of Marketing Science durables goods data set (Ni et al. 2012) provides a good resource to study these learning dynamics.

18 Ching et al. (2012) use a quasi-structural approach, where they model the consumer’s expected future payoffs as a function of state variables. Their model detects strategic trial if the coefficients of expected future payoffs are significant and if model fit improves significantly over the myopic model.

To isolate a situation favoring forward-looking learning, we apply the following sample screening criteria. First, to focus on consumers whose purchases are likely triggered by a birth event, we select households whose first purchase occurs 30 weeks after the start of data collection (73% of the entire sample). Second, we focus on frequent buyers. Compared with occasional buyers who might be shopping for a baby shower, frequent buyers are more likely to have both the motivation and the opportunity to explore different diaper brands. Therefore, we select households who have made at least five purchases during the observation window (39% of the entire sample).¹⁹ Third, to focus further, we eliminate any consumers who have purchased private labels and restrict attention to consumers who buy exclusively branded products (64% of the entire sample). To the extent that private label buyers are more price sensitive (Hansen et al. 2006), they may be less interested in learning about product quality. (In §8.4, we reanalyze the data by including private labels.) After applying these screening criteria, the data contain 262 households who made 3,379 purchases (13 purchases per household on average).²⁰ We randomly select 131 households for estimation and 131 households for validation.

19 Analyses based on a random selection of buyers rather than frequent buyers are available from the authors. The myopic learning model does better on this random selection of buyers than on frequent buyers because infrequent buyers have less incentive or opportunity to learn.

The market is dominated by three major brands, Pampers, Huggies, and Luvs. We aggregate all other branded purchases as “other brands” and do not model the no-purchase option. As a first-order view, Table 2(a) compares market-share-weighted switching behavior during the first 13 purchases with that after the first 13 purchases.²¹ There is a noticeable change in switching patterns. For example, the relative brand loyalty of Huggies increases after 13 purchases. This suggests that consumers may learn about brand quality from experience. Although the category is chosen as a likely test bed for consumer learning, high brand loyalty, even during the initial 13 purchases, suggests that there is no guarantee a forward-looking strategy will fit the data.
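A market-share-weighted switching matrix of the kind reported in Table 2 can be tabulated from household purchase sequences roughly as follows. This is a minimal sketch with hypothetical names. It assumes that each transition is assigned to the early or late period according to the purchase occasion of the row brand, and that percentages are taken over all transitions in both periods combined, so the two periods jointly sum to 100%.

```python
from collections import Counter


def switching_shares(histories, cutoff=13):
    """Percent of all transitions by (row brand, column brand) and period.

    histories: one list of brand labels per household, in purchase order.
    The last purchase of each household has no successor and is not a row.
    """
    counts = {"early": Counter(), "late": Counter()}
    total = 0
    for sequence in histories:
        for t in range(len(sequence) - 1):
            # t + 1 is the 1-based purchase occasion of the row brand.
            period = "early" if t + 1 <= cutoff else "late"
            counts[period][(sequence[t], sequence[t + 1])] += 1
            total += 1
    return {
        period: {pair: 100.0 * c / total for pair, c in counter.items()}
        for period, counter in counts.items()
    }


if __name__ == "__main__":
    demo = [["Pampers", "Pampers", "Huggies"], ["Huggies", "Huggies"]]
    print(switching_shares(demo, cutoff=1))
```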

7.2. Empirical Specification

We denote households by i and denote by $T_i$ household i’s purchase-occasion horizon. We assume that the quality and quality-belief distributions, $F_j$ and $B_{ijt}$, are normal and that unobservable shock distributions are Gumbel. For this initial test of an index solution, we limit $\vec{x}_{jt}$ to the weekly average prices. The decision strategies are specified below ($\beta$ and $x_{jt}$ are now scalars):

No learning: $\Pi^N = \arg\max_j \{\bar{\mu}_{j0} + \beta x_{jt} + \epsilon_{ijt}\}$.

Myopic learning: $\Pi^M = \arg\max_j \{\bar{\mu}_{ijt} + \beta x_{jt} + \epsilon_{ijt}\}$.

Index strategy: $\Pi^W = \arg\max_j \{\bar{\mu}_{ijt} + \mu^{x,\epsilon}_j + \sigma_j \check{W}_j(0, \bar{\sigma}_{ijt}/\sigma_j, (\beta x_{jt} + \epsilon_{ijt} - \mu^{x,\epsilon}_j)/\sigma_j, 1, 0, \sigma^{x,\epsilon}_j/\sigma_j, \delta)\}$.

Approximately optimal: $\Pi^A = \arg\max_j \{\bar{\mu}_{ijt} + \beta x_{jt} + \epsilon_{ijt} + \delta\, \mathbb{E}[V(\vec{s}_{i,t+1}, \vec{x}_{t+1}, \vec{\epsilon}_{i,t+1}) \mid \vec{s}_{it}, j]\}$.
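The no-learning and myopic learning rules differ only in which quality belief enters the utility comparison. The sketch below makes that explicit with a single hypothetical helper: passing prior means gives the no-learning choice, and passing posterior means gives the myopic learning choice. The numbers in the usage example are invented for illustration.

```python
def choose_brand(quality_beliefs, prices, shocks, beta):
    """Index of the brand with the highest current utility.

    Utility is the quality belief plus beta times price plus the realized
    unobservable shock; beta is typically negative for price.
    """
    utilities = [m + beta * p + e for m, p, e in zip(quality_beliefs, prices, shocks)]
    return max(range(len(utilities)), key=utilities.__getitem__)


if __name__ == "__main__":
    prior_means = [0.0, 0.3]       # hypothetical prior beliefs
    posterior_means = [0.8, 0.3]   # hypothetical beliefs after good experiences with brand 0
    prices, shocks, beta = [1.0, 1.0], [0.0, 0.0], -1.0
    print(choose_brand(prior_means, prices, shocks, beta))      # no learning
    print(choose_brand(posterior_means, prices, shocks, beta))  # myopic learning
```

The index strategy and the approximately optimal strategy add a forward-looking term to the same comparison, which is why they require solving a dynamic program first.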

20 The data only record the week, as opposed to the exact time, of purchase. Therefore, if a consumer makes multiple purchases during the same week, we do not observe the sequence of brands purchased. Rather than make potentially erroneous assumptions about the data, we remove consumers who make multiple-brand purchases in any week of the observation window (11% of the entire sample). An alternative analysis strategy might have been to randomize purchase orders. However, there is no reason to expect that removing consumers who make multiple purchases a week will affect the comparison between the index strategy and the approximately optimal solution. We also do not model purchase quantity decisions. Instead we assume that consumers update their quality beliefs after each purchase (and consumption) occasion.

21 We define market share at the purchase level across the observation window, so that market shares before and after the first 13 purchases add up to 100%. For readers who wish to normalize Table 2 in other ways, the raw counts are obtained by multiplying the percentages in Table 2 by 1,407, the total number of purchases in the estimation sample except the last purchase of each household.

Table 2 Switching Among Diaper Brands

Percent of times that row brand is purchased at occasion t and column brand is purchased at occasion t + 1ᵃ (%)

| | Pampers | Huggies | Luvs | Other brands |
|---|---|---|---|---|
| **(a) Actual switching matrix** | | | | |
| *Within the first 13 purchases* | | | | |
| Pampers | 20.3 | 3.9 | 2.9 | 0.5 |
| Huggies | 3.8 | 21.5 | 1.6 | 0.2 |
| Luvs | 2.5 | 2.4 | 12.6 | 0.3 |
| Other brands | 0.6 | 0.4 | 0.3 | 1.0 |
| *After the first 13 purchases* | | | | |
| Pampers | 6.3 | 0.9 | 0.7 | 0.1 |
| Huggies | 0.7 | 11.2 | 0.2 | 0.1 |
| Luvs | 0.9 | 0.0 | 4.0 | 0.0 |
| Other brands | 0.1 | 0.1 | 0.0 | 0.0 |
| **(b) Predicted switching matrix: index strategy model** | | | | |
| *Within the first 13 purchases* | | | | |
| Pampers | 20.3 | 3.0 | 1.9 | 0.2 |
| Huggies | 2.0 | 26.1 | 1.3 | 0.1 |
| Luvs | 1.6 | 1.6 | 15.2 | 0.1 |
| Other brands | 0.3 | 0.1 | 0.1 | 0.9 |
| *After the first 13 purchases* | | | | |
| Pampers | 4.6 | 0.5 | 0.4 | 0.0 |
| Huggies | 0.3 | 15.0 | 0.1 | 0.0 |
| Luvs | 0.5 | 0.2 | 3.4 | 0.0 |
| Other brands | 0.0 | 0.0 | 0.0 | 0.1 |

ᵃSwitching percentages are weighted by market share so that the percentages in the same table add up to 100%.

7.3. Issues of Identification

Although we would like to identify all parameters of the various models, we cannot do so from choice data alone because utility is only specified to an affine transformation, and because many of the parameters that matter are relative parameters. For the no-learning model we can identify only the relative means of prior beliefs, as well as the price sensitivity parameter $\beta$. For the myopic learning model we can identify only the relative means of prior beliefs, the relative uncertainties of prior beliefs, the true means of quality, and price sensitivity. For the no-learning and myopic learning models time discounting does not matter.

For the index strategy and approximately optimal strategy we set the mean prior belief of one brand ($\bar{\mu}_{10}$) to zero and normalize its variance of quality ($\sigma_1$) to one to set the scale of quality. (Only $\bar{\sigma}_{j0}/\sigma_j$ matters.) We cannot simultaneously identify a brand-specific mean of quality and a brand-specific mean of the unobservable shock, so we set the latter to zero ($\mu^{\epsilon}_j = 0$). The standard deviation of $x_{jt}$ is observed in the data. We can then compute $\sigma^{x,\epsilon}_j$ from $\sigma^{\epsilon}_j$ because the observable and unobservable shocks are independent. As in most dynamic discrete choice processes (Rust 1994), the discount factor $\delta$ is difficult to estimate; we set it to 0.90.²²

22 Sensitivity analyses with other discount rates (e.g., 0.95 and 0.99) yield almost identical log-likelihood statistics and similar parameter estimates for the index strategy model. Anticipating the results of §7.4, we expect a similar lack of sensitivity for the approximately optimal strategy. The ease with which such sensitivity checks can be run is a benefit of the computational tractability of the index strategy model.

Finally, as in Erdem and Keane (1996), we suppress “parameter heterogeneity” among households. We continue to allow each household’s quality beliefs to evolve idiosyncratically, but we do not attempt to estimate heterogeneity in prior beliefs, true mean quality, or the magnitude of utility shocks. We abstract away from parameter heterogeneity for the following reasons. First, there are, on average, only 13 purchases per household. We would overly strain the model by attempting to estimate heterogeneity in all of the parameters.23 Second, we wish to focus on behavioral heterogeneity that arises endogenously from forward-looking learning. Even if households start with exogenously homogeneous prior beliefs, different quality realizations and utility shocks lead to different posterior beliefs, different exploitation-versus-exploration trade-offs, and different learning paths (e.g., Ching et al. 2013a). We seek to evaluate heterogeneous learning dynamics based on the data, rather than using heterogeneous parameters to fit the data. For an initial test of an index strategy, this simplification is conservative because it biases against a good model fit.

We estimate each model’s parameters with maximum simulated likelihood estimation. Estimation details are provided in Online Appendices F and G.

7.4. Estimation Results

Table 3 summarizes the fit statistics for the 1,538 diaper purchases in the in-sample estimation and the 1,841 purchases in the out-of-sample validation. An information-theoretic measure, $U^2$, calculates the percent of uncertainty explained by the model (Hauser 1978); the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) attempt to correct the likelihood function based on the number of parameters in the in-sample estimation, BIC more so than AIC. (There are no free parameters in the out-of-sample validation.)
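As a quick check on how the penalized fit measures relate to the log likelihood, the following sketch applies the textbook definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L to calibration-sample log likelihoods transcribed from Table 3. The function names are ours. The $U^2$ measure is not recomputed because it requires a null-model log likelihood that is not reported in this section.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

N_OBS = 1538  # calibration-sample purchases

# (log likelihood, number of parameters) transcribed from Table 3.
models = {
    "No learning": (-1760.26, 4),
    "Myopic learning": (-1051.39, 12),
    "Index strategy": (-1008.19, 16),
    "Heterogeneous foresight": (-989.06, 29),
}

for name, (ll, k) in models.items():
    print(f"{name:24s} AIC = {aic(ll, k):9.2f}  BIC = {bic(ll, k, N_OBS):9.2f}")
```

Up to rounding of the reported log likelihoods, these values match the AIC and BIC rows of Table 3. They also make the trade-off visible: the 29-parameter heterogeneous-foresight model has the lowest AIC but a higher BIC than the index strategy, because the BIC penalty per parameter, ln(1,538) ≈ 7.34, exceeds the AIC penalty of 2.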

For comparability, we estimate the index strategy in two ways. The first estimation discretizes the state space in the same manner as the approximately optimal model ($M = N = 5$). This enables an “apples-to-apples” comparison. Then, because the index model does not suffer from the curse of dimensionality, we reestimate the model with a finer grid ($M = 200$, $N = 75$). There are only trivial differences. For example, $U^2 = 88.18\%$ for both estimations, and parameter values are not significantly different (nor different from the approximately optimal model). We report the results associated with the finer grid for the rest of the paper.

23 Doing so is technically feasible, but would likely over-parameterize the model and exploit noise in the data. More importantly, our goal is to demonstrate that an index solution is a viable representation of cognitive simplicity and that cognitive simplicity is a phenomenon worth studying in structural models. We leave explicit modeling of parameter heterogeneity to future research. Section 8.3 explores foresight heterogeneity.

First, on all measures there are sizable gains to learning: all learning models explain and predict brand choices substantially better than the no-learning strategy. Second, the index strategy improves in-sample fit and out-of-sample predictions relative to myopic learning. The likelihood is significantly better (Vuong test significance is p = 0.0002 in-sample and 0.0429 out-of-sample).24 This result is consistent with our expectation that frequent buyers of branded diapers are forward looking. Third, the index strategy performs as well as the approximately optimal solution in terms of both in-sample fit and out-of-sample predictions. This result is consistent with the synthetic-data analysis: when two strategies yield almost the same expected utilities and hence predict almost the same brand choices, they are observationally equivalent and statistically indistinguishable.

As a further visualization of model fit, Table 2(b) reports the predicted market-share-weighted switching patterns. The predicted switching patterns are qualitatively similar to actual switching patterns in Table 2(a). For example, the index strategy model picks up the fact that consumers are more loyal to Huggies than to the other brands because, as we discuss below, the true mean quality is higher for Huggies and it is likely more rewarding to learn about Huggies ($\sigma^{x,\epsilon}_j$ being relatively small). Although predictions are not perfect and could be improved if other x-variables were observed, the overall mean absolute error (MAE) is within 0.8% of actual switching. Moreover, the predicted switching patterns from the index strategy model are virtually identical to those from the approximately optimal solution model (reported in Online Appendix H). The market-share-weighted MAE is approximately 3/100ths of 1%.
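To make the fit comparison concrete, the sketch below computes a mean absolute error between the actual and predicted switching matrices using the percentages in Table 2. It assumes a simple unweighted mean over all 32 cells (both purchase windows). The exact averaging is not spelled out in this section, so this is an illustrative assumption rather than the authors' calculation.

```python
# Rows: Pampers, Huggies, Luvs, Other brands; columns in the same order.
# The first four rows are "within the first 13 purchases", the last four "after".
actual = [
    [20.3, 3.9, 2.9, 0.5],
    [3.8, 21.5, 1.6, 0.2],
    [2.5, 2.4, 12.6, 0.3],
    [0.6, 0.4, 0.3, 1.0],
    [6.3, 0.9, 0.7, 0.1],
    [0.7, 11.2, 0.2, 0.1],
    [0.9, 0.0, 4.0, 0.0],
    [0.1, 0.1, 0.0, 0.0],
]
predicted = [
    [20.3, 3.0, 1.9, 0.2],
    [2.0, 26.1, 1.3, 0.1],
    [1.6, 1.6, 15.2, 0.1],
    [0.3, 0.1, 0.1, 0.9],
    [4.6, 0.5, 0.4, 0.0],
    [0.3, 15.0, 0.1, 0.0],
    [0.5, 0.2, 3.4, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]

def mean_absolute_error(a, b):
    """Unweighted mean of |a - b| over all cells of two equally shaped tables."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

mae = mean_absolute_error(actual, predicted)
print(f"MAE = {mae:.2f} percentage points")
```

Under this assumption the result is roughly 0.7 percentage points, in line with the 0.8% bound stated above. The largest single discrepancy is the Huggies-to-Huggies cell, where the model predicts more repeat purchasing than is observed.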

Table 4 summarizes the estimated parameter values. As expected, the price sensitivity coefficient is negative in every model. Across all learning models, all four brands increase in mean quality relative to prior beliefs, which implies that diaper buyers learn to appreciate these brands more through experience. These results are consistent with the switching patterns in Table 2(a).

Forward-looking models identify the magnitude of utility shocks relative to inherent quality uncertainty (last panel of Table 4). Because the relative shock uncertainty varies across brands, the index curve implies different behavior than myopic learning for those brands. This explains why forward-looking models fit and predict better than myopic learning. For example, Huggies has lower relative shock uncertainty than other brands, which may provide greater incentives for consumers to explore Huggies. Because the myopic learning model ignores this difference, it compensates by overestimating the mean prior belief of Huggies.

24 We use the Vuong test to compare nonnested models (Vuong 1989).
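For readers who have not used it, the Vuong statistic for two nonnested models can be sketched as follows. The code assumes that per-observation log likelihoods are available for both models, reports a two-sided p-value, and omits any correction for differing numbers of parameters. These are our simplifications. The input numbers are invented for illustration and the function name `vuong_test` is hypothetical.

```python
import math
import statistics

def vuong_test(loglik_a, loglik_b):
    """Return (z, two-sided p) for H0: models A and B fit equally well.

    Inputs are per-observation log likelihoods of the same observations.
    A positive z favors model A; a negative z favors model B.
    """
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    # z = sqrt(n) * mean(d) / sd(d), using the population standard deviation.
    z = math.sqrt(n) * statistics.fmean(diffs) / statistics.pstdev(diffs)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Invented per-observation log likelihoods (not from the diaper data).
model_a = [-0.40, -0.55, -0.30, -0.70, -0.45, -0.35]
model_b = [-0.60, -0.50, -0.65, -0.90, -0.50, -0.75]
z, p = vuong_test(model_a, model_b)
```

A small p-value with a positive z indicates that model A fits significantly better. Swapping the arguments flips the sign of z and leaves the p-value unchanged.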

Table 3 In-Sample and Out-of-Sample Fit Statistics for Diaper Data

| | No learning | Myopic learning | Index strategy | Approximately optimal | Myopic learning with risk aversion | Index strategy with risk aversion | One-period look-ahead | Heterogeneous foresight |
|---|---|---|---|---|---|---|---|---|
| Calibration sample | | | | | | | | |
| Log likelihood | −1,760.26 | −1,051.39 | −1,008.19 | −1,008.95 | −1,051.39 | −1,008.01 | −1,023.30 | −989.06 |
| $U^2$ (%) | 79.36 | 87.67 | 88.18 | 88.17 | 87.67 | 88.18 | 88.00 | 88.40 |
| AIC | 3,528.53 | 2,126.77 | 2,048.39 | 2,049.89 | 2,128.77 | 2,050.02 | 2,072.61 | 2,036.12 |
| BIC | 3,549.88 | 2,190.83 | 2,133.80 | 2,135.31 | 2,198.17 | 2,140.77 | 2,142.00 | 2,190.93 |
| No. of parameters | 4 | 12 | 16 | 16 | 13 | 17 | 13 | 29 |
| No. of observations | 1,538 | 1,538 | 1,538 | 1,538 | 1,538 | 1,538 | 1,538 | 1,538 |
| Hold-out sample | | | | | | | | |
| Log likelihood | −1,998.25 | −1,165.47 | −1,126.87 | −1,127.35 | −1,165.47 | −1,124.31 | −1,115.04 | −1,113.96 |
| $U^2$ (%) | 80.43 | 88.58 | 88.96 | 88.96 | 88.58 | 88.99 | 89.08 | 89.09 |
| No. of observations | 1,841 | 1,841 | 1,841 | 1,841 | 1,841 | 1,841 | 1,841 | 1,841 |
| Computation time in seconds (note a) | Negligible | 2 | 1.4 (22) | 104 | 2 | 23 | 11 | 22 |

a This is the time required to compute one likelihood function using a university computing system based on Sun Grid Engine and Red Hat Enterprise Linux. The approximately optimal model is estimated using the original grid (M = N = 5). For the index strategy model, the computation time is 1.4 seconds for the original grid and 22 seconds for the finer grid (M = 200, N = 75).

Managerially, Huggies has a higher true mean quality than Pampers and Luvs, but also higher inherent relative uncertainty in quality across consumption. (The table reports the ratio of shock uncertainty to quality uncertainty; a smaller number means higher relative quality uncertainty.)

Both the index strategy and the approximately optimal strategy lead to similar parameter estimates. Parameter estimates of either model are usually within confidence regions of the alternative model. This result is consistent with the synthetic-data analysis, which suggests that both strategies lead to near optimal utility. The index strategy will be a more plausible description of consumer behavior if it is cognitively simpler. We explore this last point below.

Computation time in the embedded optimization problem is one surrogate for cognitive complexity. The last row of Table 3 reports the time necessary to compute one likelihood function in each model. For the index strategy model we report the computation time for both the original grid ($M = N = 5$) and the finer grid ($M = 200$, $N = 75$); the latter is in parentheses. Consistent with the synthetic-data analysis, the index strategy is substantially faster than the approximately optimal strategy (74-to-1 ratio based on the same grid density of $M = N = 5$).

The size of the state space is another surrogate for cognitive complexity (e.g., a consumer’s memory). The state space for the approximately optimal strategy is 15,625 times as large as the state space for the index strategy given the same grid density of $M = N = 5$. Computational-time ratios are not equal to state-space ratios because of computational overhead. Nonetheless, if we were to attempt to use the finer grid of $M = 200$, $N = 75$ for the approximately optimal strategy, we would increase the state space of the approximately optimal solution by a factor of 130 billion. It is unlikely that approximately optimal computations would be feasible for the finer grid. Detailed calculations are presented in Online Appendix G.
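The state-space comparison can be reproduced with simple arithmetic. The sketch assumes, consistent with the figures quoted above, that each brand's belief state is discretized on an M × N grid, that the approximately optimal strategy tracks the joint state of all four brands (so the grid sizes multiply), and that the index strategy solves one brand's subproblem at a time. The function names are ours, and the detailed calculations in Online Appendix G may be organized differently.

```python
N_BRANDS = 4  # Pampers, Huggies, Luvs, other brands

def joint_state_space(m, n, n_brands=N_BRANDS):
    """Grid points when the states of all brands are tracked jointly."""
    return (m * n) ** n_brands

def index_state_space(m, n):
    """Grid points for a single-brand subproblem of the index strategy."""
    return m * n

# Same grid density for both strategies (M = N = 5).
ratio = joint_state_space(5, 5) // index_state_space(5, 5)
print(ratio)  # 15625

# Moving the approximately optimal strategy from the coarse to the fine grid.
growth = joint_state_space(200, 75) // joint_state_space(5, 5)
print(f"{growth:,}")  # 129,600,000,000, i.e., roughly 130 billion
```

The exponent is what drives the gap: the joint grid grows geometrically in the number of brands, whereas the per-brand subproblem does not.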

In summary, using IRI data on diaper purchases we find that (1) learning models fit and predict substantially better than the no-learning model; (2) forward-looking learning models fit and predict significantly better than the myopic learning model; (3) the index strategy and the approximately optimal solution achieve similar in-sample fit and out-of-sample forecasts, as well as reasonably close parameter estimates; and (4) computational (and cognitive) simplicity favors the index strategy model relative to the approximately optimal model.

8. Further Explorations

We have shown that the canonical forward-looking experiential learning model is indexable and that an index strategy performs well. We now extend the analysis to explore consumer risk aversion, other cognitively simple heuristics, heterogeneous consumer foresight, and private labels.

Table 4 Parameter Estimates for Diaper Data

| | No learning | Myopic learning | Index strategy | Approximately optimal | Myopic learning with risk aversion | Index strategy with risk aversion | One-period look-ahead | Heterogeneous foresight (myopic learning) | Heterogeneous foresight (index strategy) |
|---|---|---|---|---|---|---|---|---|---|
| Relative mean of prior beliefs ($\bar{\mu}_{j0}$) (note 1) | | | | | | | | | |
| Pampers | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | – | – | – | – | – | – | – | – | – |
| Huggies | 0.095 | 0.079 | −0.798 | −0.198 | 0.007 | −0.346 | −0.014 | −0.479 | −1.169 |
| | (0.059) | (0.156) | (0.717) | (0.194) | (0.151) | (0.036) | (0.092) | (0.774) | (1.776) |
| Luvs | −0.716 | −0.641 | −2.351 | −1.381 | −0.627 | −1.941 | −0.819 | −1.686 | −3.926 |
| | (0.081) | (0.177) | (1.626) | (0.534) | (0.163) | (1.079) | (0.224) | (1.412) | (5.717) |
| Other brands | −3.223 | −2.761 | −3.143 | −2.978 | −2.474 | −3.211 | −2.744 | −7.169 | −2.253 |
| | (0.190) | (0.321) | (2.107) | (0.853) | (0.488) | (2.566) | (0.300) | (3.195) | (2.419) |
| Uncertainty of prior beliefs ($\bar{\sigma}_{j0}$) relative to inherent quality uncertainty ($\sigma_j$) | | | | | | | | | |
| Pampers | – | 0.734 | 0.694 | 0.724 | 0.734 | 0.699 | 0.564 | 0.142 | 0.887 |
| | – | (0.099) | (0.118) | (0.119) | (0.099) | (0.116) | (0.038) | (0.018) | (0.356) |
| Huggies | – | 0.476 | 0.455 | 0.494 | 0.476 | 0.463 | 0.424 | 3.404 | 0.545 |
| | – | (0.066) | (0.294) | (0.153) | (0.066) | (0.362) | (0.074) | (2.898) | (0.237) |
| Luvs | – | 0.773 | 1.126 | 1.394 | 0.773 | 1.215 | 0.740 | 0.623 | 1.397 |
| | – | (0.139) | (0.730) | (0.600) | (0.138) | (0.940) | (0.064) | (0.428) | (1.806) |
| Other brands | – | 1.332 | 1.428 | 1.394 | 1.333 | 1.423 | 0.779 | 0.666 | 1.391 |
| | – | (0.491) | (1.750) | (0.818) | (0.487) | (1.931) | (0.146) | (11.452) | (5.697) |
| True mean quality ($\mu_j$) (note 2) | | | | | | | | | |
| Pampers | – | 3.902 | 3.686 | 3.551 | 3.777 | 3.585 | 3.991 | 11.660 | 3.954 |
| | – | (0.329) | (2.092) | (0.844) | (0.361) | (2.571) | (0.182) | (1.905) | (0.841) |
| Huggies | – | 5.852 | 8.438 | 7.850 | 5.728 | 8.189 | 6.374 | 0.226 | 9.083 |
| | – | (0.688) | (5.088) | (2.192) | (0.703) | (5.914) | (1.404) | (3.164) | (3.164) |
| Luvs | – | 3.666 | 2.724 | 2.544 | 3.542 | 2.569 | 2.956 | 2.774 | 2.908 |
| | – | (0.502) | (1.544) | (0.704) | (0.499) | (1.857) | (0.276) | (2.460) | (0.974) |
| Other brands | – | 1.630 | 1.343 | 1.364 | 1.504 | 1.293 | 1.985 | 2.247 | 2.058 |
| | – | (0.820) | (1.080) | (0.856) | (0.824) | (1.239) | (0.961) | (9.340) | (0.921) |
| Magnitude of utility shocks ($\sigma^{x,\epsilon}_j$) (note 3) relative to inherent quality uncertainty ($\sigma_j$) | | | | | | | | | |
| Pampers | – | – | 0.837 | 0.836 | – | 1.081 | 0.273 | – | 0.885 |
| | – | – | (0.508) | (0.221) | – | (0.813) | (0.021) | – | (0.219) |
| Huggies | – | – | 0.138 | 0.151 | – | 0.141 | 0.271 | – | 0.128 |
| | – | – | (0.084) | (0.040) | – | (0.106) | (0.020) | – | (0.032) |
| Luvs | – | – | 0.316 | 0.347 | – | 0.317 | 0.275 | – | 0.227 |
| | – | – | (0.192) | (0.092) | – | (0.238) | (0.022) | – | (0.057) |
| Other brands | – | – | 1.272 | 1.347 | – | 1.090 | 0.275 | – | 1.156 |
| | – | – | (0.772) | (0.357) | – | (0.820) | (0.022) | – | (0.289) |
| Price sensitivity ($\beta$) | −0.126 | −0.128 | −0.152 | −0.153 | −0.128 | −0.152 | −0.160 | −0.274 | −0.081 |
| | (0.020) | (0.026) | (0.094) | (0.048) | (0.026) | (0.115) | (0.028) | (0.077) | (0.045) |
| Risk aversion ($r$) | – | – | – | – | 0.463 | 0.084 | – | – | – |
| | – | – | – | – | (0.305) | (0.039) | – | – | – |
| % forward looking | – | – | – | – | – | – | – | – | 0.773 |
| | – | – | – | – | – | – | – | – | (0.091) |

Note. For identification: (1) $\bar{\mu}_{10} = 0$. (2) $\mu^{\epsilon}_j = 0$. (3) $\sigma^{x}_j$ observed in the data, $\sigma^{\epsilon}_j$ estimated, and $\sigma^{x,\epsilon}_j$ computed from the two via independence.

8.1. Risk Aversion

For ease of exposition, in previous sections we assumed that consumers are risk neutral. However, risk aversion can be an important issue for decision making under uncertainty (see Ching et al. 2013a for a review). We generalize our model to incorporate risk aversion following the standard discounted-utility approach (e.g., Samuelson 1937, Erdem and Keane 1996). At each purchase occasion t, the consumer maximizes

$$\sum_{\tau=t}^{\infty} \delta^{\tau-t}\, u(w_\tau),$$

where $w_\tau$ is the net payoff the consumer receives at purchase occasion $\tau$, and $u(\cdot)$ is the consumer’s utility function. Utility increases with net payoff (i.e., $u' > 0$). In addition, the curvature of the utility function captures general risk preferences: the consumer is risk neutral if $u'' = 0$, risk averse if $u'' < 0$, and risk seeking if $u'' > 0$. The Bellman equation for the subproblem of the jth brand (Equation (5)) is generalized as

$$V(s_{jt}, \vec{x}_{jt}, \epsilon_{jt}, \lambda_j) = \max\Big\{ u(\lambda_j) + \delta\, \mathbb{E}\big[V(s_{jt}, \vec{x}_{j,t+1}, \epsilon_{j,t+1}, \lambda_j)\big],\;\; \mathbb{E}\big[u(\vec{\beta}'\vec{x}_{jt} + \epsilon_{jt} + q_{jt}) \mid s_{jt}\big] + \delta\, \mathbb{E}\big[V(s_{j,t+1}, \vec{x}_{j,t+1}, \epsilon_{j,t+1}, \lambda_j) \mid s_{jt}\big] \Big\}. \tag{7}$$

We prove in Online Appendix A.2 that the generalized canonical forward-looking experiential learning model is indexable. This is true for all consumer utility functions satisfying $u' > 0$.

The indexability result allows us to test for risk aversion at low computational costs. We do so using the diaper data. To parameterize the test, we assume that consumers exhibit constant absolute risk aversion: $u(w) = 1 - e^{-rw}$, where $r > 0$ measures the degree of risk aversion (e.g., Roberts and Urban 1988).25 Based on this utility function, we reestimate the index strategy model. In addition, we reestimate the myopic learning model to see whether general risk preferences as opposed to the exploration incentive suffice to explain consumer choices.26
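A small numerical sketch may help fix ideas about the constant absolute risk aversion specification. It evaluates $u(w) = 1 - e^{-rw}$ and uses the standard closed form for a normally distributed payoff to compute the certainty equivalent, $\mu - r\sigma^2/2$. The payoff mean and standard deviation are made-up numbers, and the function names are ours. The value r = 0.084 is the index-strategy estimate reported in Table 4, used here purely for illustration.

```python
import math

def cara_utility(w, r):
    """Constant absolute risk aversion utility u(w) = 1 - exp(-r * w), r > 0."""
    return 1.0 - math.exp(-r * w)

def expected_cara_utility_normal(mean, sd, r):
    """E[u(W)] for W ~ Normal(mean, sd^2), via the normal moment generating function."""
    return 1.0 - math.exp(-r * mean + 0.5 * (r * sd) ** 2)

def certainty_equivalent_normal(mean, sd, r):
    """Sure payoff with the same utility as the risky payoff: mean - r * sd^2 / 2."""
    return mean - 0.5 * r * sd ** 2

r = 0.084            # illustrative; index-strategy estimate reported in Table 4
mean, sd = 3.0, 2.0  # hypothetical payoff distribution

ce = certainty_equivalent_normal(mean, sd, r)
# The certainty equivalent reproduces the expected utility of the risky payoff.
assert math.isclose(cara_utility(ce, r), expected_cara_utility_normal(mean, sd, r))
print(f"certainty equivalent = {ce:.3f} (risk premium = {mean - ce:.3f})")
```

With a risk-aversion coefficient this small, the risk premium is a small fraction of the mean payoff, which illustrates why allowing for risk aversion may change fit only slightly.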

Table 3 reports the fit statistics. Allowing for risk aversion brings little improvement to the likelihood and $U^2$, and worsens the AIC and BIC because of the extra risk-aversion parameter. Table 4 reports the parameter estimates. The risk-aversion parameter is insignificant for the myopic learning model; it is marginally significant for the index strategy model but the magnitude is small. Diaper buyers in our sample do not seem to be strongly risk averse. Because the approximately optimal model provides parameter estimates that are close to the index model for the risk neutral case, we expect similar results if we were to estimate the approximately optimal model for the risk averse case.

8.2. Other Cognitively Simple Heuristics

The canonical forward-looking experiential learning model assumes that consumers have perfect foresight. But the degree of foresight is an empirical question. Ho and Chong (2003) find that a parsimonious myopic model accurately describes and predicts stock keeping unit (SKU) demand.27 Models in which the decision maker looks one period ahead sometimes explain choices well (Hauser et al. 1993, Gabaix et al. 2006, Che et al. 2007). In the bandit literature, Ny and Feron (2006) explore one-period look-ahead heuristics as approximate solutions to restless bandits with switching costs. A one-period look-ahead model is arguably simpler than the full dynamic optimization problem, and is a heuristic consumers might use. For a one-period look-ahead model, the Bellman equation (Equation (3)) is modified as

$$V(\vec{s}_t, \vec{x}_t, \vec{\epsilon}_t) = \max_{j \in A} \Big\{ \vec{\beta}'\vec{x}_{jt} + \epsilon_{jt} + \mathbb{E}\Big[ q_{jt} + \delta \max_{k \in A} \big\{ q_{k,t+1} + \vec{\beta}'\vec{x}_{k,t+1} + \epsilon_{k,t+1} \big\} \,\Big|\, \vec{s}_t, j \Big] \Big\}. \tag{8}$$
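The one-period look-ahead rule can be sketched numerically under strong simplifying assumptions that go beyond the text: beliefs about each brand's mean quality are normal, one consumption experience updates the chosen brand's belief by the usual normal-normal rule, and next period's observable and unobservable shocks are set to zero so that the inner maximum has a closed form. All names and numbers below are hypothetical, and this is not the estimation code used in the paper.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_max(mean, sd, c):
    """E[max(Z, c)] for Z ~ Normal(mean, sd^2) and a constant c."""
    if sd == 0.0:
        return max(mean, c)
    d = (mean - c) / sd
    return c + (mean - c) * normal_cdf(d) + sd * normal_pdf(d)

def look_ahead_values(prior_mean, prior_var, quality_var, current_shock, delta):
    """Value of buying each brand now, looking one purchase occasion ahead."""
    values = []
    for j, (m, v) in enumerate(zip(prior_mean, prior_var)):
        # Std. dev. of brand j's posterior mean after one more quality signal.
        sd_post_mean = v / math.sqrt(v + quality_var[j])
        # Best alternative next period among brands whose beliefs do not change.
        best_other = max(mu for k, mu in enumerate(prior_mean) if k != j)
        future = expected_max(m, sd_post_mean, best_other)
        values.append(current_shock[j] + m + delta * future)
    return values

# Two hypothetical brands: a well-known one and an uncertain one.
prior_mean = [1.0, 0.9]
prior_var = [0.0001, 4.0]
quality_var = [1.0, 1.0]
current_shock = [0.0, 0.0]  # stands in for the current-period shock terms

values = look_ahead_values(prior_mean, prior_var, quality_var, current_shock, delta=0.9)
myopic_choice = max(range(2), key=lambda j: current_shock[j] + prior_mean[j])
look_ahead_choice = max(range(2), key=lambda j: values[j])
```

In this toy example the myopic rule picks the familiar brand, while the look-ahead rule picks the uncertain brand because what is learned today can be exploited at the next purchase occasion.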

Tables 3 and 4 report the empirical results. The one-period look-ahead model has worse in-sample fit than the index strategy model (Vuong test p = 0.0244) and approximately the same out-of-sample prediction (Vuong test p = 0.2529). The one-period look-ahead model fits better than the myopic learning model both in-sample (Vuong test p = 0.0033) and out-of-sample (Vuong test p = 0.0009). These results suggest that diaper consumers are not myopic, although they may not be perfectly forward looking.

25 For constant risk aversion, $u(w) \to w$ as $r \to 0$. Erdem and Keane (1996) express risk aversion with a quadratic utility function.

26 The risk aversion parameter cannot be separately identified from the mean prior beliefs in the no-learning model.

27 The model was used by Procter and Gamble to predict SKU purchases.

We could easily estimate a variety of cognitively simple heuristics including $T_l$-period look-ahead models for $T_l < T$, Gittins’-index models modified to allow for utility shocks,28 and various heuristics such as those proposed by Bertsimas and Niño-Mora (2000). For example, a modified-Gittins’-index model ($U^2 = 88.09\%$ in-sample; $U^2 = 88.84\%$ out-of-sample) does better than myopic learning, but not as well as the index strategy model (which is based on Whittle’s index). We strongly caution against choosing a best-predicting model based on a single data set. Unrestricted search among models would likely exploit random variation. However, from the good fit and predictive ability of the three tested heuristics (Whittle’s index, one-period look-ahead, and modified Gittins’ index), we are comfortable in our hypotheses that (1) cognitively simple heuristics are plausible alternatives to modeling forward-looking behavior and (2) an index strategy is one viable model.

8.3. Heterogeneous Foresight

In an alternative approach we allow for heterogeneous consumer foresight. We assume there are two latent consumer segments that represent the two “extremes” of the foresight spectrum. One segment engages in myopic learning and the other segment has perfect foresight. Because the index strategy and the approximately optimal solution are observationally indistinguishable, we assume that the perfect-foresight segment follows the computationally favorable index strategy. We use the latent class method (Kamakura and Russell 1989) to estimate the fraction of consumers belonging to each segment, as well as the set of parameters associated with each segment.
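The two-segment latent class likelihood described above can be sketched as follows. The sketch assumes that each household's log likelihood has already been computed under the myopic learning model and under the index strategy model. The function names and the household numbers are hypothetical, and in an actual estimation the segment share would be estimated jointly with the segment-specific parameters.

```python
import math

def household_mixture_loglik(ll_myopic, ll_index, share_index):
    """Log of share * L_index + (1 - share) * L_myopic for one household."""
    a = math.log(share_index) + ll_index
    b = math.log(1.0 - share_index) + ll_myopic
    top = max(a, b)
    # Log-sum-exp keeps the computation stable for very negative log likelihoods.
    return top + math.log(math.exp(a - top) + math.exp(b - top))

def posterior_index_probability(ll_myopic, ll_index, share_index):
    """Posterior probability that a household belongs to the forward-looking segment."""
    total = household_mixture_loglik(ll_myopic, ll_index, share_index)
    return math.exp(math.log(share_index) + ll_index - total)

# Hypothetical household-level log likelihoods: (myopic, index strategy).
households = [(-9.1, -7.8), (-6.4, -6.9), (-12.0, -10.2)]
share = 0.773  # illustrative; the forward-looking share reported in Table 4

sample_loglik = sum(household_mixture_loglik(m, i, share) for m, i in households)
posteriors = [posterior_index_probability(m, i, share) for m, i in households]
```

Households whose purchase histories are better explained by the index strategy receive a posterior probability above the prior share, and the others fall below it.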

Not surprisingly, as Table 3 shows, the latent class model generates higher likelihood and $U^2$ than both the myopic learning model (p < 0.0001 in-sample; p < 0.0001 out-of-sample) and the index strategy model (p = 0.0003 in-sample; p = 0.018 out-of-sample). The flexibility of the latent class model comes at the cost of extra parameters. It produces a slightly better AIC but a worse BIC than the index strategy model.

The last two columns of Table 4 report the parameter estimates of the latent class model. The parameter estimates associated with the respective segments indicate similar qualitative comparisons across brands relative to the homogeneous-foresight models. Meanwhile, 77% of diaper buyers are forward looking. This finding echoes the result from the one-period look-ahead model that the average diaper buyer is neither myopic nor perfectly forward looking.

28 Specifically, the modified Gittins’ index assumes that the consumer at purchase occasion t chooses the brand with the highest value of $G(s_{jt}) + \vec{\beta}'\vec{x}_{jt} + \epsilon_{jt}$, where $G(s_{jt})$ is Gittins’ index derived from the optimization problem in the absence of utility shocks (see §4). The modified Gittins’ index is an ad hoc solution relative to the Whittle’s-index model.

Table 5 In-Sample and Out-of-Sample Fit Statistics for Diaper Data (Private Labels Included)

| | No learning | Myopic learning | Index strategy | Approximately optimal | Myopic learning with risk aversion | Index strategy with risk aversion | One-period look-ahead | Heterogeneous foresight |
|---|---|---|---|---|---|---|---|---|
| Calibration sample | | | | | | | | |
| Log likelihood | −3,301.57 | −1,955.16 | −1,853.22 | −1,857.04 | −1,955.16 | −1,852.43 | −1,862.60 | −1,814.66 |
| $U^2$ (%) | 76.58 | 86.13 | 86.85 | 86.83 | 86.13 | 86.86 | 86.79 | 87.13 |
| AIC | 6,611.14 | 3,934.33 | 3,738.45 | 3,746.08 | 3,936.33 | 3,738.86 | 3,751.20 | 3,687.31 |
| BIC | 6,634.50 | 4,004.41 | 3,831.90 | 3,839.53 | 4,012.26 | 3,838.15 | 3,827.12 | 3,856.69 |
| No. of parameters | 4 | 12 | 16 | 16 | 13 | 17 | 13 | 29 |
| No. of observations | 2,542 | 2,542 | 2,542 | 2,542 | 2,542 | 2,542 | 2,542 | 2,542 |
| Hold-out sample | | | | | | | | |
| Log likelihood | −3,173.91 | −1,935.58 | −1,855.17 | −1,859.30 | −1,935.59 | −1,852.32 | −1,861.23 | −1,870.81 |
| $U^2$ (%) | 76.87 | 85.90 | 86.48 | 86.45 | 85.90 | 86.50 | 86.44 | 86.37 |
| No. of observations | 2,475 | 2,475 | 2,475 | 2,475 | 2,475 | 2,475 | 2,475 | 2,475 |
| Computation time in seconds (note a) | Negligible | 1 | 1.5 (23) | 146 | 1 | 24 | 6 | 24 |

a This is the time required to compute one likelihood function using a university computing system based on Sun Grid Engine and Red Hat Enterprise Linux. The approximately optimal model is estimated using the original grid (M = N = 5). For the index strategy model, the computation time is 1.5 seconds for the original grid and 23 seconds for the finer grid (M = 200, N = 75).

8.4. Private Labels

Our primary analyses eliminated any consumer who purchased a private label during the observation window. This restriction allowed us to focus on a case where we expected forward-looking learning. The decision was also driven by the curse of dimensionality inherent in the approximately optimal solution: adding another brand increases the size of the state space by $M \times N = 25$ times. As a robustness check, we repeat our estimations replacing “other brands” with private labels. Table 5 presents the fit statistics. The relative fit and predictive accuracies are the same as in Table 3. Furthermore, the (unreported) parameter estimates for Pampers, Huggies, and Luvs are not significantly different when comparing the index strategy and approximately optimal models in Tables 3 and 5.

9. Summary, Conclusions, and Future Research

Models of forward-looking experiential learning are important to marketing. These theory-driven models examine how consumers make trade-offs between exploiting and exploring brand information. Managerially, these models enable researchers to investigate effects due to quality uncertainty, learning, and the variation in utility shocks. However, the consumer problem in these models is computationally intractable (PSPACE-hard). Existing solutions via the Bellman equation require vast computational resources (time and memory) that may contradict cognitive simplicity theories of consumers.

In this paper we propose that consumers use cognitively simple heuristics to solve forward-looking experiential learning problems. We explore one viable heuristic: index strategies. Index strategies represent a solution concept that decomposes a complex problem into a set of much simpler subproblems. We prove analytically that an index strategy exists for canonical forward-looking experiential learning models and that the index function has simple properties that consumers might intuit. Using synthetic data, we demonstrate that a well-defined index solution achieves near optimal expected utility and is fast to compute. Using IRI data on diaper purchases, we show that at least one index solution fits the data and predicts out-of-sample significantly better than either a no-learning model or a myopic learning model. Compared with an approximately optimal solution, the index strategy fits equally well, produces similar estimation results (and hence managerial implications), requires significantly lower computational costs and, we believe, is more likely to describe consumer behavior.

We address many issues, but many issues remain. We do not model advertising as a quality signal (the IRI data set for the diaper category does not track advertising). The consequence of incorporating advertising signals depends on how consumers learn. We abstract away from inventory problems. Inventory effects are found to be insignificant in previous research (Ching et al. 2012), but nevertheless add a dimension to consumers’ dynamic planning. We study standard settings where consumers do not learn from nonchosen alternatives. It would be interesting to model correlated learning or extend index strategies to incorporate hypothetical reinforcement of nonchosen options (Camerer and Ho 1999). Technically, it would also be interesting to examine the indexability of learning models when there are switching costs.29

Diaper buyers are likely forward looking, but consumers in other product categories may not be. Our theory suggests that consumers are most likely to be forward looking when shock uncertainty is small compared with quality uncertainty; we expect myopic learning models to do well when shock uncertainty is large. This prediction is testable using cross-category analysis. For instance, shock uncertainty may be large in hedonic goods categories where consumption value swings with idiosyncratic mood. Shock uncertainty may also be dominant in markets characterized by volatile marketing mix variables. The recent rise of flash sales introduced remarkable price volatility to categories such as food, gadgets, and apparel. It will be interesting to study whether this change serves to promote myopic purchase behaviors.

Finally, an index solution appears to be a reasonable trade-off for diaper consumers, but our basic hypothesis is that consumers use cognitively simple heuristic strategies. Other cognitively simple heuristics might explain consumer behavior even better than index strategies. Section 8.2 suggests testable alternatives. Future research can explore these and other heuristics using either field data or laboratory experiments.

Supplemental Material

Supplemental material to this paper is available at http://dx.doi.org/10.1287/mksc.2014.0868.

Acknowledgments

The authors gratefully acknowledge the helpful comments from Andrew Ching, Nathan Fong, Jacob Gramlich, Eric Schwartz, Qiaowei Shen, Duncan Simester, Olivier Toubia, Catherine Tucker; seminar participants at Massachusetts Institute of Technology and the University of North Carolina at Chapel Hill; and attendees of the 2012 INFORMS International Conference, 2012 Marketing Science Conference, 2013 Allied Social Science Associations Annual Meeting, and 2013 Marketing Dynamics Conference. The authors thank the editor, associate editor, and reviewers for their constructive comments that improved the paper.

29 Banks and Sundaram (1994) prove that there is no consistent way to define an optimal index in the presence of switching costs among choice alternatives. However, a bandit problem with switching cost can be reformulated as a restless bandit problem, which could be indexable (Glazebrook et al. 2006, Niño-Mora 2008).

References

Ackerberg DA (2003) Advertising, learning, and consumer choice in experience good markets: An empirical examination. Internat. Econom. Rev. 44(3):1007–1040.

Banks JS, Sundaram RK (1994) Switching costs and the Gittins index. Econometrica 62(3):687–694.

Bertsimas D, Niño-Mora J (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.

Bertsekas D, Tsitsiklis J (1996) Neuro-Dynamic Programming (Athena Scientific Press, Cambridge, MA).

Bettman JR (1979) An Information Processing Theory of Consumer Choice (Addison-Wesley, Reading, MA).

Bettman JR, Luce MF, Payne JW (1998) Constructive consumer choice processes. J. Consumer Res. 25(3):187–217.

Bröder A (2000) Assessing the empirical validity of the “take the best” heuristic as a model of human probabilistic inference. J. Experiment. Psych.: Learning, Memory, Cognition 26(5):1332–1346.

Bronnenberg BJ, Kruger MW, Mela CF (2008) The IRI marketing data set. Marketing Sci. 27(4):745–748.

Camerer CF (2003) Behavioral Game Theory: Experiments in Strategic Interaction, Roundtable Series in Behavioral Economics (Princeton University Press, Princeton, NJ).

Camerer CF, Ho TH (1999) Experience-weighted attraction learning in normal form games. Econometrica 67(4):827–874.

Che H, Sudhir K, Seetharaman PB (2007) Bounded rationality in pricing under state-dependent demand: Do firms look ahead, and if so, how far? J. Marketing Res. 44(3):434–449.

Ching A (2010) A dynamic oligopoly structural model for the prescription drug market after patent expiration. Internat. Econom. Rev. 51(4):1175–1207.

Ching A, Erdem T, Keane M (2010) How much do consumers know about the quality of products? Evidence from the diaper market. Working paper, University of Toronto, Toronto.

Ching A, Erdem T, Keane M (2012) A simple approach to estimate the roles of learning, inventory and experimentation in consumer choice. Working paper, University of Toronto, Toronto.

Ching A, Erdem T, Keane M (2013a) Learning models: An assessment of progress, challenges and new developments. Marketing Sci. 32(6):913–938.

Ching A, Erdem T, Keane M (2013b) Online appendix of “Learning models: An assessment of progress, challenges and new developments.” Marketing Sci. http://pubsonline.informs.org/doi/suppl/10.1287/mksc.2013.0805.

Ching A, Ishihara M (2010) The effects of detailing on prescribing decisions under quality uncertainty. Quant. Marketing Econom. 8(2):123–165.

Ching A, Ishihara M (2012) Measuring the informative and persuasive roles of detailing on prescribing decisions. Management Sci. 58(7):1374–1387.

Chintagunta P, Jiang R, Jin GZ (2009) Information, learning, and drug diffusion: The case of Cox-2 inhibitors. Quant. Marketing Econom. 7(4):399–443.

Chintagunta P, Erdem T, Rossi PE, Wedel M (2006) Structural modeling in marketing: Review and assessment. Marketing Sci. 25(6):604–616.

Chow C-S, Tsitsiklis JN (1991) An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Trans. Automatic Control AC-36(8):898–914.

Crawford GS, Shum M (2005) Uncertainty and learning in pharmaceutical demand. Econometrica 73(4):1137–1173.

Dickstein M (2012) Efficient provision of experience goods: Evidence from antidepressant choice. Working paper, Stanford University, Palo Alto, CA.

Erdem T, Keane MP (1996) Decision making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Sci. 15(1):1–20.

Erdem T, Keane MP, Sun B (2008) A dynamic model of brand choice when price and advertising signal product quality. Marketing Sci. 27(6):1111–1125.

Erdem T, Keane MP, Öncü T, Strebel J (2005) Learning about computers: An analysis of information search and technology choice. Quant. Marketing Econom. 3(3):207–247.

Gabaix X, Laibson D (2000) A boundedly rational decision algorithm. Amer. Econom. Rev. 90(2):433–438.

Gabaix X, Laibson D, Moloche G, Weinberg S (2006) Costly information acquisition: Experimental analysis of a boundedly rational model. Amer. Econom. Rev. 96(4):1043–1068.

Gigerenzer G, Goldstein DG (1996) Reasoning the fast and frugal way: Models of bounded rationality. Psych. Rev. 103(4):650–669.

Gilbride TJ, Allenby GM (2004) A choice model with conjunctive, disjunctive, and compensatory screening rules. Marketing Sci. 23(3):391–406.

Gilovich T, Griffin D, Kahneman D (2002) Heuristics and Biases: The Psychology of Intuitive Judgment (Cambridge University Press, Cambridge, UK).

Gittins J (1989) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, New York).

Gittins J, Jones D (1974) A dynamic allocation index for the sequential design of experiments. Gani J, Sarkadi K, Vince I, eds. Progress in Statistics (North-Holland, Amsterdam), 241–266.

Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices, 2nd ed. (John Wiley & Sons, Hoboken, NJ).

Glazebrook K, Ruiz-Hernandez D, Kirkbride C (2006) Some indexable families of restless bandit problems. Adv. Appl. Probab. 38(3):643–672.

Goldstein DG, Gigerenzer G (2002) Models of ecological rationality: The recognition heuristic. Psych. Rev. 109(1):75–90.

Guadagni PM, Little JDC (1983) A logit model of brand choice calibrated on scanner data. Marketing Sci. 2(3):203–238.

Hansen K, Singh V, Chintagunta P (2006) Understanding the store-brand purchase behavior across categories. Marketing Sci. 25(1):75–90.

Hauser JR (1978) Testing the accuracy, usefulness and significance of probabilistic models: An information theoretic approach. Oper. Res. 26(3):406–421.

Hauser JR (1986) Agendas and consumer choice. J. Marketing Res. 23(3):199–212.

Hauser JR, Urban GL (1986) The value priority hypotheses for consumer budget plans. J. Consumer Res. 12(4):446–462.

Hauser JR, Urban GL, Weinberg BD (1993) How consumers allocate their time when searching for information. J. Marketing Res. 30(4):452–466.

Hauser JR, Urban GL, Liberali G, Braun M (2009) Website morphing. Marketing Sci. 28(2):202–223.

Hauser JR, Toubia O, Evgeniou T, Befurt R, Dzyabura D (2010) Disjunctions of conjunctions, cognitive simplicity, and consideration sets. J. Marketing Res. 47(3):485–496.

Ho T-H, Chong J-K (2003) A parsimonious model of stockkeeping-unit choice. J. Marketing Res. 40(3):351–365.

Houser D, Keane MP, McCabe K (2004) Behavior in a dynamic decision problem: An analysis of experimental evidence using Bayesian type classification algorithm. Econometrica 72(3):781–822.

Hutchinson JMC, Gigerenzer G (2005) Simple heuristics and rules of thumb: Where psychologists and behavioural biologists might meet. Behavioural Processes 69(2):97–124.

Imai S, Jain N, Ching A (2009) Bayesian estimation of dynamic discrete choice models. Econometrica 77(6):1865–1899.

Johnson EJ, Payne JW (1985) Effort and accuracy in choice. Management Sci. 31(4):395–414.

Jovanovic B (1979) Job matching and the theory of turnover. J. Political Econom. 87(5):972–990.

Kamakura W, Russell G (1989) A probabilistic choice model for market segmentation and elasticity structure. J. Marketing Res. 26(4):379–390.

Keane M, Wolpin K (1994) The solution and estimation of discrete choice dynamic programming models by simulation and interpolation: Monte Carlo evidence. Rev. Econom. Statist. 76(4):648–672.

Kohli R, Jedidi K (2007) Representation and inference of lexicographic preference models and their variants. Marketing Sci. 26(3):380–399.

Lindsay PH, Norman DA (1977) Human Information Processing: An Introduction to Psychology (Academic Press, New York).

Marewski JN, Pohl RF, Vitouch O (2010) Recognition-based judgments and decisions: Introduction to the special issue (Vol. 1). Judgment Decision Making 5(4):207–215.

Martignon L, Hoffrage U (2002) Fast, frugal, and fit: Simple heuristics for paired comparisons. Theory Decision 52(1):29–71.

McFadden D (1986) The choice theory approach to market research. Marketing Sci. 5(4):275–297.

Mehta N, Chen XJ, Narasimhan O (2008) Informing, transforming, and persuading: Disentangling the multiple effects of advertising on brand choice decisions. Marketing Sci. 27(3):334–355.

Miller R (1984) Job matching and occupational choice. J. Political Econom. 92(6):1086–1120.

Narasimhan C (1988) Competitive promotional strategies. J. Bus. 61(4):427–449.

Narayanan S, Manchanda P (2009) Heterogeneous learning and the targeting of marketing communication for new products. Marketing Sci. 28(3):424–441.

Narayanan S, Manchanda P, Chintagunta PK (2005) Temporal differences in the role of marketing communication in new product categories. J. Marketing Res. 42(3):278–290.

Ni J, Neslin SA, Sun B (2012) Database submission—The ISMS durable goods data sets. Marketing Sci. 31(6):1008–1013.

Niño-Mora J (2001) Restless bandit, partial conservation laws and indexability. Adv. Appl. Probab. 33(1):76–98.

Niño-Mora J (2008) A faster index algorithm and a computational study for bandits with switching costs. INFORMS J. Comput. 20(2):255–269.

Ny JL, Feron E (2006) Restless bandits with switching costs: Linear programming relaxations, performance bounds and limited lookahead policies. Proc. 2006 Amer. Control Conf., Minneapolis.

Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.

Payne JW, Bettman JR, Johnson EJ (1988) Adaptive strategy selection in decision making. J. Experiment. Psych.: Learning, Memory, Cognition 14(3):534–552.

Payne JW, Bettman JR, Johnson EJ (1993) The Adaptive Decision Maker (Cambridge University Press, Cambridge, UK).

Roberts JH, Urban GL (1988) Modeling multiattribute utility, risk, and belief dynamics for new consumer durable brand choice. Management Sci. 34(2):167–185.

Rust J (1987) Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica 55(5):999–1033.

Rust J (1994) Structural estimation of Markov decision processes. Engle RF, McFadden DF, eds. Handbook of Econometrics (North Holland, Amsterdam).

Rust J (1996) Numerical dynamic programming in economics. Amman HM, Kendrick DA, Rust J, eds. Handbook of Computational Economics, Vol. 1 (North Holland, Amsterdam).

Rust J (1997a) Using randomization to break the curse of dimensionality. Econometrica 65(3):487–516.

Rust J (1997b) Dealing with the complexity of economic calculations. Working paper, Yale University, New Haven, CT.

Samuelson PA (1937) A note on measurement of utility. Rev. Econom. Stud. 4(2):155–161.

Shugan SM (1980) The cost of thinking. J. Consumer Res. 7(2):99–111.

Simon HA (1955) A behavioral model of rational choice. Quart. J. Econom. 69(1):99–118.

Simon HA (1956) Rational choice and the structure of the environment. Psych. Rev. 63(2):129–138.

Urban GL, Liberali G, MacDonald E, Bordley R, Hauser JR (2014) Morphing banner advertising. Marketing Sci. 33(1):27–46.

Vuong QH (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2):307–333.

Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(2):287–298.

Yee M, Dahan E, Hauser JR, Orlin J (2007) Greedoid-based noncompensatory inference. Marketing Sci. 26(4):532–549.
