  • Electronic copy available at: http://ssrn.com/abstract=2462577

    Product Offerings and Product Line Length Dynamics

    Xing Li

    October 6, 2014

    Abstract

    This paper provides a model that uses preference heterogeneity to rationalize the cross-sectional and intertemporal variation in a firm's product proliferation strategies. Product-line dynamics arise from shocks to preference heterogeneity. For example, in the potato chip category I study, consumer concerns over fat levels in foods created two desirable alternatives (low fat and zero fat) for each flavor. On the supply side, firms learn about these changing tastes and adapt product lines accordingly. For tractability, the heterogeneity in preference is captured within the nesting parameter in an aggregate nested logit demand model. I find greater preference heterogeneity for smaller packages of chips and for markets with more demographic diversity. The dominant firm in the market bases its decisions primarily on past experience in the market, with the latest preference shocks representing only 30% of the influence in product-line decisions. Gross margins are increased by 5% if firms have perfect information about preference diversity. Costs for product line maintenance constitute about 2% of total revenue. Sunk costs incurred when expanding the product line are estimated to be four times the per-product fixed cost, thereby limiting the flexibility of product-line adjustment. The probability of line length adjustment grows from 70% to 90% under a smooth cost structure.

    I am grateful to my advisors, Tim Bresnahan, Wes Hartmann, and Petra Moser, for their invaluable guidance, discussion, and encouragement. I would also like to thank Chris Colon, Chen Cheng, Oystein Daljord, Michael Dickstein, Liran Einav, Pedro Gardete, Daniel Grodzicki, Han Hong, Mike Kruger, Brad Larsen, James Lattin, Anqi Li, Harikesh Nair, Sridhar Narayanan, Joe Orsini, Qiusha Peng, Peter Reiss, Gregory Rosston, Navdeep Sahni, Stephan Seiler, Stephen Teng Sun, Paul Wong, Yiqing Xing, Constantine Yannelis, Pai-Ling Yin, and seminar participants at Stanford Department of Economics, Stanford Marketing WIP, and the 2014 Marketing Science Conference in Atlanta for their helpful comments. The usual disclaimer applies.

    Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305-6072. [email protected]


    1 Introduction

    One of the central decisions firms make is the level of their product proliferation. Product proliferation can be exercised in two dimensions: vertically or horizontally. Vertical proliferation means providing an upgraded or downgraded model and charging a different price. Examples include Apple iPhone5S and iPhone5C, Toyota Corolla and Camry, and Canon Digital Camera DSLR 5D and DSLR 50D. Within the same model, firms can differentiate horizontally by providing different colors, flavors, or designs. Apple offers the iPhone5S in three colors; Danone produces 6oz yogurt in different flavors. Both companies are engaging in horizontal product proliferation within the same model.

    Vertical proliferation is mainly driven by leaps in R&D success (e.g., Goettler and Gordon, 2011), whereas horizontal proliferation is largely initiated by consumer tastes (e.g., Draganska and Jain, 2005) that vary across markets and over time. Furthermore, vertical proliferation can also involve higher fixed costs of adapting production processes, whereas horizontal proliferation typically utilizes the same process as existing products. For both reasons, horizontal proliferation is more flexible and therefore creates more variation in a firm's decisions. This variation in horizontal proliferation is what motivates my investigation into firms' extensions and contractions of their product lines.

    Acknowledging that consumers' preference heterogeneity on the demand side is the primary driver of horizontal product-line decisions, I propose the following framework to rationalize both cross-sectional and intertemporal variation in product-line decisions. The extent of preference heterogeneity varies across markets for reasons such as the concentration of different demographic groups.1 Firms will provide a richer set of (horizontally differentiated) products in markets with more heterogeneous preferences to serve a larger proportion of consumers and earn more profit. Within each market, firms can also adjust their product lines over time. When changes occur in the heterogeneity of preference,2 firms can respond by adjusting their product-line decisions. When the level of preference heterogeneity increases, firms are more likely to expand their product lines; when preferences become more homogeneous, firms are more likely to contract their product lines.

    The main mechanism supporting the above argument is that preference heterogeneity affects the tradeoff between cannibalization and new sales creation when expanding the product line. For multi-product firms, a newly launched product may bring in additional consumers while at the same time eating into the market shares of existing products. When consumers' preferences are quite homogeneous, it is difficult to generate new sales by launching new products; the cannibalization effect dominates the new sales creation effect, and firms may not maintain a long product line. On the other hand, when consumers' preferences are quite heterogeneous, the new sales creation effect dominates and expanding the product line is more profitable.

    1 In this paper, preference heterogeneity is an aggregate statistic for both variety seeking within individuals and preference heterogeneity among individuals, which I demonstrate later.

    2 For example, manufacturers of potato chips will consider the immigration of Asian and Hispanic populations. They will also be aware of consumers' growing concern for their own health.

    To formalize and quantify the above argument, I model the demand side using nested logit. Different products (features) from the same brand are clustered in the same nest (line) in the choice structure. The nesting parameter has the same behavioral interpretation as the heterogeneity of preference, which is an aggregate measure of both variety seeking within an individual and preference heterogeneity across individuals.3 When products are more nested within the line, they are closer substitutes; consumers agree on the preference ranking among these products and the preference is more homogeneous.4 On the other hand, when products are less nested within the line, they are weaker substitutes; consumers have more varied views on their favorite products, and preference is more heterogeneous.

    On the supply side, multi-product firms decide on the number and content of products offered. This is a very complicated problem, because multi-product firms cannot simply display all products in front of consumers to choose from. Instead, they face many constraints, including shelf space, distribution and storage costs, and advertising capacity. It is even more challenging for the modeler, for two reasons. First, if I model the launching decision for every product, the model becomes exponentially more complicated as the product line gets longer.5 Second, it is difficult to write down a model to predict consumers' taste for new products.6 For both reasons, I focus on product line length while abstracting away from product contents in the supply-side model.
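The combinatorial growth behind the first reason can be made concrete with a short sketch. It enumerates every launch/no-launch profile for a handful of candidate products and counts them. The function names `enumerate_assortments` and `action_space_size` are hypothetical, and the comparison with a 22-product line simply reuses the average line length reported in this paper; nothing here is part of the estimated model.

```python
from itertools import product


def enumerate_assortments(n_products):
    """Return every launch/no-launch profile for n_products candidates.

    Each profile is a tuple of booleans, one per product (True = launched).
    Only practical for small n_products, which is exactly the point.
    """
    return list(product((False, True), repeat=n_products))


def action_space_size(n_products):
    """Number of product-level launch profiles: two choices per product."""
    return 2 ** n_products


if __name__ == "__main__":
    six = enumerate_assortments(6)
    print(len(six))               # 64 profiles for six candidate products
    print(action_space_size(22))  # 4194304 profiles for a 22-product line
```

Modeling only the line length collapses this space to a single integer choice per period, which is the simplification the supply-side model relies on.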

    Firms are chasing time-varying preference heterogeneity by adjusting product line length on the supply side,7 which is described by an empirical learning model similar to Hitsch (2006). Due to the timing constraints firms face in having to decide on product proliferation prior to the market realization, I assume they are ignorant of the true preference heterogeneity at the time of designing product lines. I model the uncertainty about true preference heterogeneity as a belief on which firms base their proliferation decisions. After the market realization, the belief is updated in a Bayesian way. The elegant representation of the nested logit model, which is linear in the nesting parameter, makes the supply-side learning framework tractable.

    3 This modeling idea is rooted in the early motivation for the nested logit model, namely to use unobserved heterogeneity to capture substitution patterns.

    4 Intuitively, consider two products that are equally favorable and split the market. Suppose the price of one product rises. When these two products are closer substitutes, the market share of the second product will increase more, which means more people agree on which product is their favorite.

    5 The existing literature simplifies this problem by focusing only on a subset of products. For example, in Draganska et al. (2009), yogurt firms decide whether to launch each of six vanilla-flavored yogurts. Even if they disregard the existence of other flavored yogurts, their action space is {Entry, NotEntry}^6, with 64 possible actions.

    6 In the standard BLP model, consumers' utility is derived from characteristics, and it is possible to predict consumers' valuation of a new product (e.g., Petrin (2002); Berry et al. (2004)). This is applicable to some industries such as automobiles, but inapplicable to others, including the potato chip industry that I am studying, because there does not exist a vector of characteristics that captures consumers' preference over potato chips with different flavors.

    7 As mentioned before, in the potato chip industry that I am studying, people's preference heterogeneity changes over time for reasons such as an increasing concern over health issues and some trends in taste within a group of people.

    I apply the model to the potato chip market, where there is a leading firm (I call it Company A hereafter) with a market share of 60%. The average preference heterogeneity is estimated to be 0.41 in the market for small-package chips and 0.67 in the market for large-package chips, which means that preference for small packages is more heterogeneous. This is explained by consumers' greater willingness to try new flavors when buying small-packaged potato chips. In addition, a more diverse population in the local market will tend to exhibit more preference heterogeneity, which is confirmed by the estimation with a series of measures of population diversity in that market, including the dispersion of the income and age distributions and the size of ethnic groups. On the supply side, Company A applies in-market learning about preference heterogeneity to adjust its product line length. I find that Company A bases its decisions primarily on past experience in the market, with the latest preference shocks representing 30% of the influence. The marginal cost of offering one additional product is estimated to be $3,560 per million households per quarter; the total maintenance cost is estimated to be 2% of total revenue for an average line with a length of 22. I also estimate the sunk cost incurred when expanding the product line to be three times the usual maintenance cost, which may limit the flexibility of product-line adjustments.

    Two counterfactual exercises based on the estimates above evaluate the firm's optimal line length decisions under different line-length-specific policy experiments. In the first exercise, I simulate the optimal line length decisions without the extra cost of line length expansion. This removes the restrictions on Company A's flexibility in adjusting line length, and the probability of line length changes grows from 70% to 90% under a smooth cost structure. In the second exercise, I consider the situation where Company A knows the precise value of preference heterogeneity at the time of product line length decisions. It can make a better decision based on the true value instead of a guess, and the gross margin increases by 5%. A byproduct of the second counterfactual is a test of the hypothesis of learning versus knowing preference heterogeneity when making line length decisions. I construct a test based on gross margin, and the test result supports the assumption of learning rather than knowing about heterogeneity. Both simulations shed light on firms' potential gains from product-line-related improvements. The first relates to a more efficient cost structure for product line maintenance, say, a more flexible contract on shelf space and a better distribution system, and the second relates to better knowledge of consumers from, say, consumer studies.

    This paper is related to several strands of literature. First, there is a growing literature on firms' product proliferation and product-line design, both theoretical and empirical. Theoretical works have identified various factors that determine product proliferation, including communication cost to consumers (Villas-Boas, 2004), quality signaling (Kamenica, 2008), vertical structure of distribution (Liu and Cui, 2010), consumers' deliberation on their preference (Guo and Zhang, 2012), variety preference and purchase cost (Bronnenberg, 2014), and other rational interpretations, as well as behavioral explanations such as cognitive overload (Iyengar and Lepper, 2000), articulated preference (Chernev, 2003b,a), and contextual effects (Simonson and Tversky, 1992; Orhun, 2009). However, there are few empirical papers in this field. Hui (2004) takes product proliferation as given in the demand estimation of nested logit. Draganska and Jain (2006) further explore different nesting assumptions in their demand analysis. Draganska et al. (2009) offer a supply-side model for product line design but restrict attention to a small subset of products and mainly use the supply-side competitive environment to explain product line design. This paper proposes a supply-side model for the length of the whole product line, where the driving force for variation in product-line length is preference heterogeneity on the demand side. The second strand of related literature is on variety seeking. Models of variety seeking find negative state dependence on past choice (Chintagunta, 1998, 1999; Seetharaman et al., 2005; Dubé et al., 2009, 2010). They estimate individual-level variety-seeking behavior between intertemporal purchases, whereas I incorporate the variety-seeking effect in an aggregate measure of preference heterogeneity. Both models should provide similar inference on the effect of variety seeking on product-line design. In addition to the static inference, this model focuses more on dynamic proliferation decisions over time.

    This paper is most related to papers in the statistical learning literature. Numerous papers study consumers' learning about the quality of new products (Roberts and Urban, 1988; Erdem and Keane, 1996; Ching et al., 2013; Lin et al., 2014). On the supply side, Urban and Katz (1983) and Urban and Hauser (1993) address firms' market experimentation in designing new products. Hitsch (2006) studies firms' learning about the quality of new products when making exit decisions. Another series of papers (Crawford and Shum, 2005; Narayanan and Manchanda, 2009; Dickstein, 2014) considers physicians' and patients' learning about the effectiveness of drugs when making prescription decisions. This paper differs from those empirical learning papers in two respects. First, the learning object is preference heterogeneity rather than mean preference as in the existing literature. Second, the learning object evolves over time, whereas in the standard learning framework the learning object is constant.8

    This paper is also related to research on empirical entry and product positioning.9 Early research on empirical entry infers firms' profitability from their entry decisions (Reiss and Spiller, 1989; Bresnahan and Reiss, 1990, 1991; Berry, 1992). Later research treats marketing-mix variables other than price as endogenous (Berry and Waldfogel, 2001; Mazzeo, 2002; Berry et al., 2004; Seim, 2006; Einav, 2010; Sweeting, 2010; Crawford et al., 2011; Ryan and Tucker, 2012; Fan, 2013). This paper contributes to this strand of literature by proposing a tractable model of product line length dynamics for multi-product firms.

    The rest of the paper is organized as follows. Section 2 introduces the data and some reduced-form evidence on line length dynamics. Section 3 provides an empirical model to quantify firms' optimal line length decisions driven by preference heterogeneity. Section 4 describes the full specification and identification. Section 5 shows the results, and Section 6 concludes.

    2 Product Offerings in US Potato Chip Market

    In this section, I provide an overview of the potato chip industry and a description of the IRI Academic Dataset (Bronnenberg et al., 2008) that I use.10 The last part of the section shows some reduced-form evidence on product line length dynamics.

    2.1 The Potato Chip Market

    Potato chips can be found in most American households. An average US household spends $80 a year on salty snacks. Potato chips have a dollar share of 30% in the salty snack industry, which means an average household spends around $24 each year on potato chips (First-Research, 2011).

    Chip manufacturers anticipate and respond to changes in consumer preferences. First of all, in the potato chip industry, the ability to be innovative and differentiate a product is the key to competition. As a result, manufacturers offer different choices of potato chips with different flavors, fat contents, and cut types. Furthermore, consumers' tastes vary by region and over time.

    8 Lovett et al. (2009) also model time-evolving learning parameters.

    9 Dubé et al. (2005) provide an excellent summary of these papers.

    10 All estimates and analyses in this paper based on Information Resources Inc. data are by the author and not by Information Resources Inc.

    For example, Joon (2013) states that consumers in the Midwestern region prefer thick cuts and consumers in the southwestern states prefer bold and spicy flavors. At the same time, many exogenous factors drive the evolution of tastes over time. Population migration is one such factor (Bronnenberg et al., 2012). Manufacturers are creating new spicy flavors catering to a growing Hispanic and Asian population (First-Research, 2011). Consumers' awareness of the health cost of eating potato chips high in trans fat and salt is another factor. To capitalize on this shift, leading manufacturers have introduced a number of new products with reduced fat and low salt content (Joon, 2013). A third factor is the change in taste for (new) flavors. Firms can elicit this change by inviting consumers to submit their newly designed flavors.11 Given the existence of diehard fans of classically flavored potato chips, the regional and temporal variations in tastes imply changes in preference heterogeneity and have corresponding implications for product proliferation decisions.

    A second feature of the potato chip industry is that it is highly concentrated, with a leading player (Company A) having a market share of 60%. The second largest player has a market share of only 5.2% (Joon, 2013). Company A does not worry too much about potential entrants. First, consumers have strong brand preferences when picking potato chips. They are willing to pay extra for branded chips. In addition, firms operating in this industry need to have good relations with upstream suppliers and downstream retailers. They use long-term contracts to hedge against the volatile prices of potatoes, sugars, oils, and fats from their suppliers, and they compete for the best shelf space in grocery stores.

    2.2 Data

    I use the IRI Academic Dataset from 2001 to 2007 to estimate the model.12 The IRI Academic Dataset provides scanned sales data from a sample of grocery stores at the UPC-store-week level. I restrict the analysis in this paper to the Salty Snack - Potato Chip industry. I aggregate the data to the feature-city-quarter level, as detailed in Appendix A and described briefly as follows.

    In the product dimension of UPC, I restrict the analysis to 8-13 serving sizes because they account for the main sales in grocery stores.13 Furthermore, I treat all non-Company A chips as homogeneous outside goods and aggregate their sales across different non-Company A brands. For Company A chips, I aggregate sales into features, where each feature is a unique triplet of distinguishable characteristics: flavor, fat content, and cut type. I observe 41 different flavors, three different fat contents (regular, fat free, reduced fat), and three different cut types (flat, ruffle, wavy). Not all combinations are ever produced and sold. In the data, only a total of 63 features (flavor-fat-cut combinations) have ever been sold in grocery stores.

    11 For example, Frito Lay holds a contest called Do us a flavor each year to invite consumers to submit their newly designed flavors and launches the winners. The winning flavor is awarded 1 million dollars.

    12 Although the IRI Academic Dataset is available from 2001 to 2011, I only make use of seven years for the following reasons. First, the 2008 financial crisis heavily drove up prices of potato chips, which makes pricing decisions non-trivial and complicates the model. Second, Company A did a national launch of zero-trans fat in 2008. The reasons for the timing and scale of such a big event are beyond the scope of this project. Moreover, the concurrence of the two events further complicates the analysis.

    13 I choose the boundary of serving-size separation from the natural discontinuity point of the serving-size density, as shown in Appendix A.

    Along the geographic dimension of store location, I restrict the analysis to a balanced panel of stores to prevent artifactual variation in the product line due to changes in sampling criteria. I further aggregate to the level of the 50 markets defined by IRI. Three reasons justify this aggregation. First, grocery stores are different. Product lines displayed in grocery stores in neighborhoods where the majority population is white differ from those in stores operating in more diversely populated communities. Even within one diversely populated community, we might find some Asian stores and some Hispanic ones, where product lines in both kinds of stores are short but the aggregate diversity of preference is high. Second, I can only observe the sales data, not the feature-launching data. In other words, I do not know which features are displayed. A feature with zero sales in a store may be wrongly tagged as withdrawn when it is actually on the shelf. If all features with zero sales in a store were tagged as unlaunched, I would observe quite frequent changes in store-level line length, some of which would be misclassified. Both issues are largely aggregated out at the level of the geographic market. Third, it is easier to link the demographic data at the market level, which provides further information about consumers.

    In the time dimension of week, I aggregate to the level of quarters to avoid artifactual product assortments identified by observed sales instead of actual launching, as described above. Some features have non-zero sales for only part of the weeks because they are launched in the middle of the quarter. Disregarding this effect will bias the estimates.14 For simplicity, I drop all features that are only partially observed within the quarter (those with non-zero sales for fewer than 12 weeks). For all 63 features in 28 quarters across 7 years, most of the features (96%) have positive sales in either all or no weeks within serving-size-city-quarters. Market shares of the dropped features are also negligible, as shown in Appendix A. After dropping these transient features, 58 features remain in 28 quarters across 50 markets.
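The screening rule for partially observed features can be sketched as follows. This is a minimal illustration, assuming weekly records arrive as plain tuples; the record layout, the function names, and the toy feature labels are hypothetical, and only the 12-week threshold comes from the text.

```python
from collections import defaultdict

MIN_WEEKS = 12  # threshold quoted in the text: fewer weeks means "partially observed"


def weeks_with_sales(weekly_rows):
    """Count weeks with strictly positive sales per (feature, city, quarter).

    weekly_rows: iterable of (feature, city, quarter, week, units) tuples
    (a hypothetical layout, not the IRI file format).
    """
    active_weeks = defaultdict(set)
    for feature, city, quarter, week, units in weekly_rows:
        if units > 0:
            active_weeks[(feature, city, quarter)].add(week)
    return {key: len(weeks) for key, weeks in active_weeks.items()}


def split_full_and_transient(weekly_rows, min_weeks=MIN_WEEKS):
    """Separate feature-city-quarters kept in the sample from those dropped."""
    counts = weeks_with_sales(weekly_rows)
    kept = {key for key, n in counts.items() if n >= min_weeks}
    dropped = {key for key, n in counts.items() if n < min_weeks}
    return kept, dropped


if __name__ == "__main__":
    rows = []
    for week in range(1, 14):  # a 13-week quarter
        rows.append(("classic-regular-flat", "Chicago", "2002q2", week, 100))
        # Launched mid-quarter: positive sales only in weeks 9-13.
        units = 40 if week >= 9 else 0
        rows.append(("spicy-regular-wavy", "Chicago", "2002q2", week, units))
    kept, dropped = split_full_and_transient(rows)
    print(sorted(kept))
    print(sorted(dropped))
```

Feature-city-quarters with no positive sales in any week never enter the counts, which matches the treatment of such features as not offered.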

    Company A has wide variation in the length of its product line, defined as the count of features sold in one city-quarter. Table 1 and Figure 1 present the distribution of line length. The average line length is 22.09 with a standard deviation of 3.86. The shortest line is in Raleigh/Durham in 2001q1, with a length of 8, whereas the longest line is in Chicago in 2002q2, with 30 different features. Variation in line length derives from two sources: cross-sectional and intertemporal. Chip lines vary widely along both dimensions. Cross-sectionally, Pittsfield has the shortest line, with an average length of 16.89, whereas Houston has the longest line, with an average length of 25.05. Line length also changes over time, as shown in Table 1 and the right panel of Figure 1. Line length is quite sticky: in about 30% of cases there is zero change, and in 85% of cases the change is within 2 features, but there are still cases where Company A is quite aggressive in adjusting line length.

    14 As I will discuss later, the concentration of market shares among Company A chips is one key variable in the analysis. Ignoring these marginal features that may be launched in the middle of the quarter and naively viewing their small market share as low sales will bias the inference regarding the concentration measure and contaminate the estimates.
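The definition of line length as a count of features sold per city-quarter, and the quarter-to-quarter changes summarized above, can be sketched in a few lines. The record layout, the function names, and the toy data are hypothetical; quarters are represented by integer indices purely so that they sort chronologically.

```python
from collections import defaultdict


def line_length_by_market(city_quarter_features):
    """Map each city to {quarter_index: number of distinct features sold}.

    city_quarter_features: iterable of (city, quarter_index, feature) tuples,
    one per feature with positive sales in that city-quarter.
    """
    sold = defaultdict(set)
    for city, quarter, feature in city_quarter_features:
        sold[(city, quarter)].add(feature)
    lengths = defaultdict(dict)
    for (city, quarter), features in sold.items():
        lengths[city][quarter] = len(features)
    return {city: dict(by_quarter) for city, by_quarter in lengths.items()}


def quarterly_changes(lengths_for_city):
    """Line-length change between consecutive observed quarters."""
    quarters = sorted(lengths_for_city)
    return [lengths_for_city[q1] - lengths_for_city[q0]
            for q0, q1 in zip(quarters, quarters[1:])]


if __name__ == "__main__":
    toy = [
        ("Chicago", 0, "classic-regular-flat"),
        ("Chicago", 0, "bbq-regular-ruffle"),
        ("Chicago", 1, "classic-regular-flat"),
        ("Chicago", 1, "bbq-regular-ruffle"),
        ("Chicago", 1, "sourcream-reduced-wavy"),
        ("Chicago", 2, "classic-regular-flat"),
    ]
    lengths = line_length_by_market(toy)
    print(lengths["Chicago"])                     # {0: 2, 1: 3, 2: 1}
    print(quarterly_changes(lengths["Chicago"]))  # [1, -2]
```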

    I supplement the IRI dataset by merging it with the IPUMS CPS data to obtain demographics. Among the 302 metropolitan areas in the CPS, I have identified 98 that can be merged with IRI markets. In terms of population, the 50 IRI markets in the data set cover half of the total nationwide population. I calculate the total market size as the number of households in 2007, whereas other demographics that may correlate with preference heterogeneity are calculated at the city-quarter level.

    2.3 Reduced-form Evidence on Dynamic Product Offering

    Before going into the structural estimation, I show some reduced-form evidence on firms changing their proliferation decisions based on market outcomes over time. When preference is homogeneous, consumers tend to agree on the preference ranking of all features within a line, and in-line market shares of features are concentrated. In the extreme case, consumers fully agree on the preference ranking, and the in-line market share is 1 for the most preferred feature and 0 for the others. In these cases, the model predicts that firms will contract the product line. On the contrary, when preference heterogeneity becomes high, consumers have various opinions about the most favorable feature, and in-line market shares become less concentrated. In this case, the model predicts that firms will expand the product line.

    To illustrate the argument above, I run the following regression:

    $$\text{LineLength}_{mt} = \beta_0 + \beta_1 \text{HHI}_{m,t-1} + \beta_2 \text{LineLength}_{m,t-1} + c_m + c_t + \varepsilon_{mt}$$

    where $m$ indexes market, $t$ denotes quarter, $\text{LineLength}_{mt}$ is the length of the product line, and $\text{HHI}_{mt}$ is the Herfindahl index of in-line market shares, that is,

    $$\text{HHI}_{mt} = \sum_f s_{f|l,mt}^2$$

    and $s_{f|l,mt}$ is the in-line market share of feature $f$ in market-time $mt$. $c_m$ and $c_t$ are market and time fixed effects that control for geographic unobservables and seasonal effects. The baseline regression confirms the model prediction (Table 2, Column 1). The higher the market concentration, the shorter the product line in response. A one standard-deviation change in HHI (0.03) will lead to a change in line length of 0.2. Compared to the average change in line length of 0.19 (Table 1), this magnitude is not small.

    One challenge for the interpretation of the estimate is that the HHI measure is mechanically decreasing in line length, so the estimated correlation may be artifactual.15 The worry is partly justified, as shown in Appendix B, so I use another measure of market concentration: the standard deviation of log in-line market shares, defined as

    $$\text{StdLnShareInLine}_{mt} = \mathrm{Std}\left(\ln s_{f|l,mt}\right)$$

    which is not mechanically correlated with line length (Appendix B). When in-line market shares are more concentrated, the standard deviation is high. Regression results still support our conjecture. A one standard-deviation increase in this concentration measure will lead to a 0.36 increase in line length (Table 2, Column 2).
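The two concentration measures can be compared directly on toy data. The sketch below is a minimal illustration, assuming in-line shares are given as a plain list that sums to one; the function names `hhi` and `std_log_share` and the use of the population (rather than sample) standard deviation are assumptions of this example, not choices documented in the paper. With equal shares, HHI equals 1/n and therefore falls as the line gets longer, while the standard deviation of log shares stays at zero.

```python
import math


def hhi(in_line_shares):
    """Herfindahl index: sum of squared in-line market shares."""
    return sum(s * s for s in in_line_shares)


def std_log_share(in_line_shares):
    """Population standard deviation of log in-line shares (assumed convention)."""
    logs = [math.log(s) for s in in_line_shares]
    mean = sum(logs) / len(logs)
    return math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))


if __name__ == "__main__":
    for n in (10, 20):
        equal = [1.0 / n] * n
        # HHI halves when the line doubles; the log-share dispersion does not move.
        print(n, round(hhi(equal), 4), round(std_log_share(equal), 4))
    concentrated = [0.7, 0.1, 0.1, 0.1]
    print(round(hhi(concentrated), 4), round(std_log_share(concentrated), 4))
```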

    One alternative interpretation of the above findings is that firms automatically withdraw unpopular features. To address this concern, I change the dependent variable to an indicator of line expansion. Regression results again confirm the theory proposed above (Table 2, Columns 3 and 4). The higher the market concentration, the less likely the line is to be expanded. A one standard-deviation decrease in HHI will lead to a higher chance of line expansion by 4.74% (Table 2, Column 3) and 4.60% (Table 2, Column 4). Compared to an average chance of line expansion of 34%, this increase is economically significant.

    A final caveat is that all this reduced-form evidence is correlational, not causal. The complete model allows firms to adjust their line length based on all past market realizations rather than just the last one. To quantify the above mechanism, I estimate a structural model with a richer specification.

    3 A Model of Product Line Length Dynamics

    In this section, I propose a model that is structural in both demand and supply to capture the effect of preference heterogeneity on the tradeoff between cannibalization and new sales creation when firms are making product line length decisions. For simplicity, I assume that in each market $m$, there is a separate monopolist. Within each market, the monopolist provides a line of $n_t$ products indexed by $j \in \{1, 2, ..., n_t\}$ to compete with one single outside good $j = 0$ in each period $t$.

    15 I regress line length on one-period lagged HHI. The mechanics of calculating HHI will contaminate the inference only when firms have some inertia in adjusting line length.

    3.1 Demand Side

    For each market $m$ and period $t$ (suppressed temporarily), the utility for consumer $i$ from consuming Company A chip $j \in \{1, 2, ..., n\}$ and the outside good $j = 0$ is

    $$\begin{aligned} u_{ij} &= a_i + c_{ij} - p_j \\ &= (a + \zeta_i) + (c_j + \sigma_j \epsilon_{ij}) - p_j \\ &= \delta_j + (\zeta_i + \sigma_j \epsilon_{ij}) \\ u_{i0} &= \epsilon_{i0} \end{aligned}$$

    where $a_i$ is the consumer's brand preference for Company A, which can be decomposed into the average level $a$ and consumer heterogeneity $\zeta_i$; $c_{ij}$ is consumer $i$'s utility for product $j$, which likewise consists of the mean value $c_j$ and consumer heterogeneity $\sigma_j \epsilon_{ij}$; $p_j$ is the price of product $j$. After some rearrangement, the utility for consumer $i$ consuming Company A product $j$ equals the mean utility level $\delta_j = a + c_j - p_j$ plus consumer heterogeneity $(\zeta_i + \sigma_j \epsilon_{ij})$. Following Berry (1994) and Cardell (1997), both $\epsilon_{ij}$ and $(\zeta_i + \sigma_j \epsilon_{ij})$ follow i.i.d. type I extreme value distributions.

    From the representation $c_{ij} = c_j + \sigma_j \epsilon_{ij}$, the value of $\sigma_j$ measures consumers' preference heterogeneity over product $j$. When $\sigma_j$ is high, $c_{ij}$ varies a lot across individuals $i$, and preference for product $j$ is heterogeneous. On the other hand, a lower $\sigma_j$ implies the preference is more homogeneous. Unfortunately, with only market-level sales data available, I lack the statistical power to identify all $\sigma_j$. For tractability, I equalize all $\sigma_j$ and propose a market-level heterogeneity measure $\sigma$ that captures the overall level of preference heterogeneity in the market. Under this assumption, the model reduces to the nested logit model with nesting parameter equal to $(1 - \sigma)$, with the representation from Berry (1994) as

    $$u_{ij} = \delta_j + (\zeta_i + \sigma \epsilon_{ij})$$

    The nesting parameter $(1 - \sigma)$ has the behavioral interpretation of preference heterogeneity. When $\sigma = 0$, the difference in utility from consuming products $j$ and $k$ is

    $$u_{ij} - u_{ik} = \delta_j - \delta_k$$

    which is the same for all $i$. Consumers agree on the preference ranking for all features within the product line. When $\sigma$ is small, different products in the line are close substitutes and the preference is more homogeneous. On the contrary, when $\sigma$ is large, consumers tend to have more divergent opinions on the preference ranking of products in the line and the preference is heterogeneous.
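How the heterogeneity parameter governs the tradeoff between cannibalization and new sales can be illustrated numerically. The sketch below assumes the standard two-nest nested logit share formulas in the Berry (1994) form, writing sigma for the heterogeneity parameter (so that the nesting parameter is 1 - sigma) and delta_j for mean utilities, with the outside good alone in its own nest at mean utility zero. The symbol names, the function name, and the toy values are assumptions of this example rather than the paper's code. For n identical products with mean utility delta, the formulas imply a line share of n^sigma e^delta / (1 + n^sigma e^delta): close to flat in n when sigma is near zero (added products mostly cannibalize) and growing in n when sigma is near one (added products mostly create new sales).

```python
import math


def nested_logit_shares(deltas, sigma):
    """Return (in_line_shares, line_share, outside_share).

    deltas: mean utilities delta_j of the products in the line.
    sigma:  heterogeneity parameter in (0, 1]; sigma = 1 is plain logit.
    """
    if not 0.0 < sigma <= 1.0:
        raise ValueError("sigma must lie in (0, 1]")
    scaled = [d / sigma for d in deltas]
    top = max(scaled)  # subtract the max before exponentiating for stability
    weights = [math.exp(x - top) for x in scaled]
    total = sum(weights)
    in_line = [w / total for w in weights]
    # Inclusive value of the line: sigma * ln(sum_j exp(delta_j / sigma)).
    inclusive = sigma * (top + math.log(total))
    line_share = 1.0 / (1.0 + math.exp(-inclusive))
    return in_line, line_share, 1.0 - line_share


if __name__ == "__main__":
    delta = -1.0  # hypothetical common mean utility
    for sigma in (0.05, 0.5, 1.0):
        row = []
        for n in (1, 5, 20):
            _, line_share, _ = nested_logit_shares([delta] * n, sigma)
            row.append(round(line_share, 3))
        print(sigma, row)  # line share for n = 1, 5, 20
```

Comparing the rows shows the gain in total line share from lengthening the line rising with sigma, which is the mechanism behind longer lines in more heterogeneous markets.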

    The nesting parameter is an aggregate statistic of both individual-level variety seeking and cross-individual preference heterogeneity. If we think of the repeated purchases of one individual as different purchase occasions, variety-seeking behavior can be rationalized as low correlation of individual-specific demand shocks across products, which is captured as a high $\lambda$ in the current model. With market-level data, I cannot distinguish variety seeking within an individual from preference heterogeneity across individuals. But these two channels should have similar implications for product assortment decisions, as presented later.

    Another advantage of equalizing all $\lambda_j$ is that the model reduces to a nested logit, which can be estimated by linear GMM. Within each market m,

    $$\ln s_{1t} - \ln s_{0t} = \delta_{jt} + \lambda_t \left( -\ln s_{j|l,t} \right) = a + c_j - \alpha p_{jt} + \lambda_t \left( -\ln s_{j|l,t} \right) + \xi_{jt} \qquad (1)$$

    where $s_{1t}$ is the market share of all Company A products, $s_{0t}$ is the market share of all non-Company-A products, and $s_{j|l,t}$ is the in-line market share, which equals $s_{jt} / (1 - s_{0t})$. Following the standard model, I allow the taste for product j to vary over time, with $c_{jt} = c_j + \xi_{jt}$, where $c_j$ is the product fixed effect and $\xi_{jt}$ is the unobserved demand shock, which is distributed as $N(0, 1/\tau_\xi)$.
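    The share equation (1) can be checked numerically. The sketch below is only an illustration under assumed values: it builds nested logit shares for one nest (the product line) plus an outside good, with `lam` standing for preference heterogeneity, and confirms that the log-odds of the line share equal the mean utility plus `lam` times the negative log in-line share for every product. The function name and the numbers are hypothetical, not taken from the paper.

```python
import numpy as np

def nested_logit_shares(delta, lam):
    """Return (line share, within-line shares) for mean utilities `delta`.

    Illustrative assumption: one nest (the product line) plus an outside
    good with utility normalized to zero; `lam` in (0, 1] is heterogeneity.
    """
    delta = np.asarray(delta, dtype=float)
    z = delta / lam
    log_sum = z.max() + np.log(np.sum(np.exp(z - z.max())))  # stable log-sum-exp
    inclusive = lam * log_sum                  # inclusive value of the line
    s_line = 1.0 / (1.0 + np.exp(-inclusive))  # share of the whole line
    s_within = np.exp(z - log_sum)             # share of product j inside the line
    return s_line, s_within

delta = np.array([-2.0, -2.5, -3.0])   # hypothetical mean utilities
lam = 0.41                             # hypothetical heterogeneity level
s_line, s_within = nested_logit_shares(delta, lam)

lhs = np.log(s_line) - np.log(1.0 - s_line)   # ln s_1 - ln s_0
rhs = delta + lam * (-np.log(s_within))       # delta_j + lam * (-ln s_{j|l})
print(lhs, rhs)
```

    Because the right-hand side is linear in the heterogeneity parameter, the equation can be taken to data with linear instrumental-variable methods once the endogenous regressors are instrumented.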

    3.2 Static Profit when $\lambda$ is Known

    I assume that at the time of the product line length decision, Company A does not know the precise value of the mean utility $\delta_{jt}$, so she takes expectations over $\delta$ with respect to some distribution $F(\delta)$. Three reasons justify this assumption. First, product line length decisions are made prior to the realization of demand, so Company A is ignorant of the demand shock $\xi_{jt}$. Second, retailers can observe the demand shock and adjust the retail price $p_{jt}$, so $p_{jt}$ is also unknown before market realization. Third, when Company A launches a new product, the value of $c_j$ is also unknown to her.16 With this assumption, I abstract away from the identity of each product in the line and focus mainly on the length of the product line. The total market share of Company A from offering a product line of length n follows the nested logit representation

    $$s(n, \lambda) = E\left( \frac{\exp(I)}{1 + \exp(I)} \right), \quad \text{where} \quad I = \lambda \ln \sum_{j=1}^{n} \exp\left( \frac{\delta_j}{\lambda} \right)$$

    The total market share $s(n, \lambda)$ is increasing in n, increasing in $\lambda$, and super-modular in n and $\lambda$ under some conditions imposed on F.17 In other words, when the line is expanded, the marginal gain in total share is larger when preference is more heterogeneous. If there is a constant cost of expanding the line length by one, super-modularity implies that the optimal line length choice $n^*$ is increasing in $\lambda$.18 In general, let $C(n, l)$ denote the cost of launching a line of length n when the line length in the last period is l. A myopic firm will choose n to maximize

    $$w \cdot M \cdot s(n, \lambda) - C(n, l)$$

    where w is the manufacturer's margin and M is the market size. Let $n^*(\lambda, m)$ be the optimal line length choice; by super-modularity, $n^*$ is increasing in $\lambda$.

    16 This also assumes away product launches in the vertical sense, that is, a mass-market strategy. When a company decides to launch a new product, she can play either a mass-market strategy, so that the new product is attractive to all consumers (i.e., with a high value of $\delta$), or a niche-market strategy, in which the feature is attractive to a subset of consumers (i.e., similar $\delta$). In the potato chip industry, it is quite difficult to launch a potato chip that appeals to all consumers and to play a mass-market strategy.
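    The myopic line length choice can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes mean utilities are normally distributed, approximates the expected line share by Monte Carlo, and uses a linear cost. All names (`line_share`, `myopic_line_length`) and all numerical values are hypothetical.

```python
import numpy as np

def line_share(n, lam, mean_delta=-3.0, sd_delta=1.0, draws=2000, seed=0):
    """Monte Carlo estimate of s(n, lam) with delta_j ~ Normal (assumed F)."""
    rng = np.random.default_rng(seed)
    # Same seed on every call, so a line of length n is a prefix of a longer one.
    delta = rng.normal(mean_delta, sd_delta, size=(draws, 40))[:, :n]
    z = delta / lam
    z_max = z.max(axis=1, keepdims=True)
    inclusive = lam * (z_max[:, 0] + np.log(np.exp(z - z_max).sum(axis=1)))
    return float(np.mean(1.0 / (1.0 + np.exp(-inclusive))))

def myopic_line_length(lam, margin=0.05, market_size=1000.0, unit_cost=0.3, max_n=30):
    """Line length maximizing margin * market_size * s(n, lam) - unit_cost * n."""
    profits = [margin * market_size * line_share(n, lam) - unit_cost * n
               for n in range(1, max_n + 1)]
    return 1 + int(np.argmax(profits))

for lam in (0.2, 0.4, 0.6):
    print(lam, myopic_line_length(lam))
```

    With common random draws, the simulated share is increasing in both line length and heterogeneity by construction of the inclusive value. Whether the chosen length rises with heterogeneity depends on the super-modularity conditions and on the assumed parameter values.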

    3.3 Dynamic Learning on Time-evolving $\lambda$

    As mentioned earlier, preference heterogeneity evolves over time due to many exogenous factors, including population migration, health concerns, and evolving tastes for new flavors. I further assume that firms do not know the true value of preference heterogeneity when making line length decisions. Instead, they hold beliefs about this value and update those beliefs based on market realizations.19

    17 Super-modular means that $s(n+1, \lambda) - s(n, \lambda)$ is increasing in $\lambda$. Proofs of these properties are provided in Appendix C.

    18 This is consistent with the standard interpretation of price elasticity in the nested logit model. When products are more nested within the line, the price elasticity is higher within nests than between nests. Lowering the price of one feature then has a larger "cannibalization" effect, which takes market share from other products within the line, than a "new sales creation" effect, which increases the total share of all products in the line. The same logic applies to the strategy of line expansion. The "cannibalization" effect of expanding the line dominates the "business stealing" effect when features are more nested, and in this case firms are less likely to expand the line.

    19 There is no direct test of the informational assumption that firms do not know the exact value of preference heterogeneity, because the stationary learning model (described below) and the complete information model are not nested. However, I will show some indirect test results based on simulation in a later section.


    3.3.1 Learning from Market Realizations

    Suppose that at the beginning of period t, Company A has a prior belief on $\lambda_t$, modeled as a truncated normal with mean $\mu_t$ and precision $\tau_t$, truncated to the unit interval (0, 1) and denoted $TN(\mu_t, 1/\tau_t)$. After the market is realized, the market shares of all products are observed, and Company A observes one signal from each product j, as derived from (1):

    $$\omega_{jt} = \ln s_{1t} - \ln s_{0t} - a - c_j + \alpha p_{jt} = \lambda_t \left( -\ln s_{j|l,t} \right) + \xi_{jt}$$

    Aggregating the signals from all products about the same $\lambda_t$ yields an aggregate signal20

    $$\omega_t = \frac{\sum_j \left( -\ln s_{j|l,t} \right) \omega_{jt}}{\sum_j \ln^2 s_{j|l,t}} = \lambda_t + \frac{\sum_j \left( -\ln s_{j|l,t} \right) \xi_{jt}}{\sum_j \ln^2 s_{j|l,t}}$$

    with precision

    $$h_t = \tau_\xi \left( \sum_j \ln^2 s_{j|l,t} \right)$$

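    A nice property of the truncated normal belief is that it is also a conjugate prior for a normal data-generating process, as the next theorem shows.

    Theorem 1. Suppose the prior is truncated normal,

    $$\lambda_t \sim TN\left( \mu_t, \sigma_t^2 = \frac{1}{\tau_t} \right)$$

    and an unbounded signal is observed with value $\omega_t$ and precision $h_t$. Then the posterior belief is also truncated normal,

    $$\lambda_t \mid \omega_t, h_t \sim TN\left( \mu_t', (\sigma_t')^2 = (\tau_t')^{-1} \right)$$

    with

    $$\mu_t' = \frac{\tau_t}{\tau_t + h_t} \mu_t + \frac{h_t}{\tau_t + h_t} \omega_t \qquad (2)$$

    $$\tau_t' = \tau_t + h_t \qquad (3)$$

    The proof is given in Appendix D.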
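    The aggregation of product-level signals and the precision-weighted update in (2) and (3) can be written compactly. The following is a minimal sketch with hypothetical function names and inputs; `tau_xi` denotes the precision of the demand shock, and the arrays are illustrative rather than data from the paper.

```python
import numpy as np

def aggregate_signal(signals, log_within_share, tau_xi):
    """Combine product-level signals into one signal about heterogeneity.

    Each signal is (heterogeneity) * (-ln in-line share) + noise, so the
    aggregate is a least-squares slope through the origin.
    """
    x = -np.asarray(log_within_share, dtype=float)
    y = np.asarray(signals, dtype=float)
    omega = np.sum(x * y) / np.sum(x ** 2)   # aggregate signal value
    h = tau_xi * np.sum(x ** 2)              # aggregate signal precision
    return omega, h

def update_belief(mu, tau, omega, h):
    """Posterior location and precision, following equations (2) and (3)."""
    tau_post = tau + h
    mu_post = (tau * mu + h * omega) / tau_post
    return mu_post, tau_post

# Hypothetical noise-free example: the aggregate recovers the true value 0.4.
log_share = np.log(np.array([0.5, 0.3, 0.2]))
signals = 0.4 * (-log_share)
omega, h = aggregate_signal(signals, log_share, tau_xi=2.0)
print(omega, h, update_belief(0.3, 10.0, omega, h))
```

    The update is the same formula as for an untruncated normal prior. The truncation to the unit interval affects only the normalizing constant of the density, which is why the truncated normal family is preserved.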

    20 For convenience, the notation $\ln^2 s_{j|l,t}$ means $\left( \ln s_{j|l,t} \right)^2$.


    3.3.2 Evolution of $\lambda_t$

    The next step is to model the time evolution of preference heterogeneity $\lambda_t$. The reason for allowing $\lambda_t$ to evolve over time is twofold. First, in the potato chip industry, we do observe preference heterogeneity changing over time, and chip manufacturers respond by adjusting their product line strategies. Second, from a modeling perspective, if preference heterogeneity were constant over time, Company A, as an experienced firm operating in a mature market, would be sophisticated enough to know the true value of preference heterogeneity, and no intertemporal variation in the product line would be observed. The large intertemporal variation in product line length motivates the assumption of time-evolving preference heterogeneity.

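    If $\lambda_t$ is not truncated, a natural candidate model is a random walk, with

    $$\lambda_{t+1} = \lambda_t + \eta_t$$

    where $\eta_t \sim N(0, \tau_\eta^{-1})$ is the evolution error, or equivalently,

    $$\lambda_{t+1} \mid \lambda_t \sim N\left( \lambda_t, \sigma_\eta^2 \right)$$

    In the truncated case, I propose the following quasi random walk

    $$\lambda_{t+1} \mid \lambda_t \sim f\left( \cdot \mid \lambda_t; \tau_\eta \right)$$

    which is similar to the random walk process of the unbounded case, with acceptance-rejection on the unit interval. Convolving this with the truncated normal belief on $\lambda_t$, we can approximate the prior belief on $\lambda_{t+1}$ as $TN\left( \mu_{t+1}, \sigma_{t+1}^2 = 1/\tau_{t+1} \right)$ with

    $$\mu_{t+1} = \mu_t' \qquad (4)$$

    $$\sigma_{t+1}^2 = (\sigma_t')^2 + \sigma_\eta^2 \qquad (5)$$

    Details are included in Appendix E.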
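    A short sketch of the evolution step may help fix ideas. It assumes that "acceptance-rejection" means redrawing the random-walk innovation until the proposal lands inside the unit interval, and it carries the belief forward by adding the innovation variance as in (4) and (5). Function names and parameter values are hypothetical.

```python
import numpy as np

def evolve_belief(mu_post, tau_post, tau_eta):
    """Next-period prior: same location, variance increased by 1 / tau_eta."""
    var_next = 1.0 / tau_post + 1.0 / tau_eta
    return mu_post, 1.0 / var_next

def quasi_random_walk(lam0, tau_eta, periods, rng):
    """Simulate heterogeneity on (0, 1), redrawing proposals that leave it."""
    path = [lam0]
    sd = tau_eta ** -0.5
    for _ in range(periods):
        while True:
            proposal = path[-1] + rng.normal(0.0, sd)
            if 0.0 < proposal < 1.0:   # accept only draws inside the unit interval
                break
        path.append(proposal)
    return np.array(path)

rng = np.random.default_rng(1)
path = quasi_random_walk(0.4, tau_eta=400.0, periods=28, rng=rng)
print(path.min(), path.max(), evolve_belief(0.5, 20.0, 20.0))
```

    When the belief is tight and far from the boundaries, rejected proposals are rare and the process behaves almost like an ordinary random walk, which is the case in which the approximation in (4) and (5) should be most accurate.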


    3.4 Line Length Dynamics Chasing Time-evolving Preference Heterogeneity

    Combining the two pieces above, dynamic learning and evolution, gives the full description of the firm's dynamic problem. The action-specific flow profit is

    $$\pi_n(\mu_t, \tau_t, l_t) = w \cdot M \cdot E\left( s(n, \lambda_t) \mid \mu_t, \tau_t \right) - C(n, l_t)$$

    and the value function is

    $$V_n(\mu_t, \tau_t, l_t) = \pi_n(\mu_t, \tau_t, l_t) + E\left( V(\mu_{t+1}, \tau_{t+1}, l_{t+1}) \mid \mu_t, \tau_t, l_t, n \right)$$

    $$V(\mu, \tau, l) = E \max_n \left( V_n(\mu, \tau, l) + \varepsilon_n \right)$$

    where the state variables are the belief mean, the belief precision, and the last-period line length, and the transition is defined by (2), (3), (4), and (5), together with $l_{t+1} = n$.
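    The recursion above can be illustrated with a small value-iteration sketch. Several ingredients here are assumptions made only for the example and are not taken from the paper: a discount factor `beta`, mean-zero type I extreme value action shocks (so the expected maximum is a scaled log-sum-exp), a belief state discretized to three points with fixed precision, a belief transition matrix `P` that does not depend on the action, and a stylized share function with equal mean utilities.

```python
import numpy as np

def solve_line_length_problem(flow, P, beta=0.95, scale=1.0, tol=1e-10, max_iter=5000):
    """Value iteration for V(b, l) over actions n.

    flow[b, l, n]: flow profit at belief state b, last length index l, action n.
    P[b, b2]: belief transition matrix, assumed not to depend on the action.
    """
    n_beliefs, n_last, n_actions = flow.shape
    assert n_last == n_actions  # next period's "last length" is today's action
    V = np.zeros((n_beliefs, n_last))
    for _ in range(max_iter):
        cont = P @ V                                # cont[b, n] = E[V(b', n) | b]
        v_choice = flow + beta * cont[:, None, :]   # action-specific values
        v_max = v_choice.max(axis=2)
        V_new = v_max + scale * np.log(
            np.exp((v_choice - v_max[..., None]) / scale).sum(axis=2))
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    ccp = np.exp((v_choice - V[..., None]) / scale)  # conditional choice probabilities
    return V, ccp

# Toy inputs (all hypothetical): 3 belief states, line lengths 1..6.
lam_grid = np.array([0.2, 0.4, 0.6])
lengths = np.arange(1, 7)
share = 1.0 / (1.0 + np.exp(3.0 - lam_grid[:, None] * np.log(lengths)[None, :]))
expand = lengths[None, :] > lengths[:, None]         # expand[l, n] is 1(n > l)
cost = (0.2 + 0.6 * expand) * lengths[None, :]       # kinked per-period cost
flow = 100.0 * share[:, None, :] - cost[None, :, :]
P = np.array([[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]])
V, ccp = solve_line_length_problem(flow, P)
print(np.round(ccp[1, 2], 3))   # choice probabilities at the middle belief, last length 3
```

    The last-period line length enters only through the cost term, so it matters for the choice exactly when the cost has a kink at expansion.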

    4 Empirical Specification and Identification

    In this section, I present the full empirical specification and identification of the model. Similar to Hitsch (2006), I apply two-step estimation, where the demand side is estimated by linear GMM and its parameters are plugged into the supply side. I estimate the dynamic supply model by maximum likelihood. This section ends with a discussion of the identification of the model.

    4.1 Demand Side

    The demand side is modeled as a nested logit with two nests, where all Company A chips of different features are nested in one line and all non-Company A chips are treated as homogeneous outside products. Based on (1), for each market m,

    $$\ln s_{1mt} - \ln s_{0mt} = a_m + c_j - \alpha p_{jmt} + \lambda_{mt} \left( -\ln s_{j|l,mt} \right) + \xi_{jmt} \qquad (6)$$

    Both $p_{jmt}$ and $\ln s_{j|l,mt}$ are endogenous, because they are correlated with the unobserved demand shock $\xi_{jmt}$. I employ the following sets of instruments for the two endogenous variables:

    - The sum of characteristics (flavors, fat content, and cut type) of other Company A chips sold in the same market-time, $\sum_{j' \neq j} x_{j'mt}$

    - The average price of the same feature sold in other geographical markets in the same period, $\frac{1}{\#} \sum_{m' \neq m} p_{jm't}$

    - Costs of raw materials, including potatoes, sugar, soybean oil, edible butter, and edible tallow

    - The number of competitor brands and the number of competitor UPCs other than Company A chips within the same market-time

    The first set of instruments is widely known as BLP instruments, introduced by Berry et al. (1995). The underlying assumption is that the characteristics are exogenous to demand shocks. In the current model, the upstream wholesalers make product assortment decisions, whereas downstream retailers make pricing decisions. In reality, grocery stores and manufacturers jointly decide what to display in advance. If some of the features do not sell well, grocery stores will lower prices to clear inventory. In this case, it is natural to assume that the assortment decision is made prior to the realization of the local demand shock.

    The second set of instruments is known as Hausman instruments, which Nevo (2001) used in demand estimation. The underlying assumption is that demand shocks are independent across markets, but there are factors that may affect pricing in all markets. These factors include, but are not restricted to, common cost shifters and nationwide advertising campaigns. In the potato chip industry, prices across all markets are subject to common manufacturing costs at Company A as well as common nationwide campaigns, which supports the use of Hausman instruments.

    The last set of instruments captures the competitive environment, as used in Bresnahan et al. (1997). The argument is that the competitive environment affects firms' pricing decisions but is orthogonal to demand shocks. In this project, I can also exploit the large variation in the competitive environment across markets, measured by the number of competitor brands and UPCs.


    4.2 Supply Side - Flow Profit

    In each market m, the action-specific flow payoff of Company A is

    $$\pi_{n,m}(\mu, \tau, l) = w_m \cdot M_m \cdot E\left( s_m(n, \lambda) \mid \mu, \tau \right) - H_m \cdot c(n, l)$$

    $$s_m(n, \lambda) = E_{F_m}\left( \frac{\exp(I)}{1 + \exp(I)} \right), \qquad I = \lambda \ln \sum_{j=1}^{n} \exp\left( \frac{\delta_j}{\lambda} \right)$$

    In other words, I allow a market-specific profit function and calibrate the parameters as follows:

    - $w_m$: manufacturer's margin, calibrated from the average price in that market, adjusted by the retailer's markup (15%), the distributor's markup (25%), and the manufacturer's gross margin (30%), i.e., $w_m = p_m \times 0.85 \times 0.75 \times 0.3$

    - $M_m$: market size, calibrated from the total number of households $H_m$, under the assumption that an average household spends X dollars per quarter on potato chips, where X is calculated from the 24 dollars an average household spends per year on potato chips, converted to quarters and adjusted by the market share of large-package chips, i.e., $M_m = H_m \times 6 \times ShareLarge_m / p_m$

    - Cost of line length maintenance: assume a per-capita cost, i.e., $C_m(n, l) = H_m \cdot c(n, l)$. In the estimation, I try two specifications of the per-capita cost: linear and kink. In the linear specification, $c(n, l) = c \cdot n$. In the kink specification, $c(n, l) = \left( c_1 + c_2 \cdot 1(n > l) \right) \cdot n$

    - $F_m$: distribution of the mean utility $\delta$, assumed normal, with mean and variance calibrated from the empirical distribution of $\{\delta_{jmt}\}_{j,t}$

    The only parameters to estimate in the flow profit are the cost parameters $\{c_1, c_2\}$.
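    The calibration rules in this subsection translate directly into a few helper functions. The sketch below is illustrative only: the function names are hypothetical, the multipliers follow the markups stated above, and the example inputs are made up rather than taken from the data.

```python
def manufacturer_margin(avg_price):
    """Margin per unit: price net of retailer (15%) and distributor (25%)
    markups, times the manufacturer's gross margin (30%)."""
    return avg_price * 0.85 * 0.75 * 0.30

def market_size(households, share_large, avg_price, annual_spend=24.0):
    """Potential quarterly unit sales of large-package chips."""
    quarterly_spend = annual_spend / 4.0   # 24 dollars a year is 6 per quarter
    return households * quarterly_spend * share_large / avg_price

def per_capita_cost(n, l, c1, c2=0.0):
    """Linear cost when c2 = 0; kinked cost with an extra c2 per product
    whenever the line is expanded (n > l)."""
    return (c1 + c2 * (n > l)) * n

print(manufacturer_margin(2.0))             # hypothetical price of 2 dollars
print(market_size(1_000_000, 0.5, 3.0))     # hypothetical market
print(per_capita_cost(10, 8, 2.0, 6.0), per_capita_cost(8, 10, 2.0, 6.0))
```

    Under the kink specification, the same line length is more expensive when it is reached by expansion than when it is reached by contraction, which is what makes the last-period line length a state variable.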

    4.3 Supply Side - Dynamics

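    The firm's dynamic problem is described as

    $$V_{n,m}(\mu_t, \tau_t, l_t) = \pi_{n,m}(\mu_t, \tau_t, l_t) + E_m\left( V_m(\mu_{t+1}, \tau_{t+1}, l_{t+1}) \mid \mu_t, \tau_t, l_t, n \right)$$

    $$V_m(\mu, \tau, l) = E \max_n \left( V_{n,m}(\mu, \tau, l) + \varepsilon_n \right)$$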


    The unspecified parameters are the initial belief $(\mu_{1,m}, \tau_{1,m})$, the evolution rate $\tau_{\eta,m}$, and the scale of the random fixed cost $\sigma_\varepsilon$. All parameters are identified, as shown below, but I still impose the following cross-market restrictions to simplify the calculation.

    - $\tau_{1m}$: the initial prior precision is assumed to be proportional to the precision of the signal. This is justified by the stationarity assumption on the learning process. For markets with a more precise signal, the learning speed is expected to be faster. However, this holds only if the belief precision is the same. I equalize the learning speed across all markets by assuming that the prior precision is proportional to the signal precision, i.e., $\tau_{1m} = k \cdot h_m$, where $h_m = \frac{1}{\#} \sum_t h_{mt}$ is the average precision.

    - $\tau_{\eta,m}$: the evolution rate of preference heterogeneity. Combining stationarity with (5) gives

    $$\frac{1}{\tau_{1m}} = \frac{1}{\tau_{1m} + h_m} + \sigma_{\eta,m}^2$$

    so that, with $\sigma_{\eta,m}^2 = 1/\tau_{\eta,m}$, $\tau_{\eta,m} = k (k + 1) h_m$.

    - $\mu_{1m}$: the initial prior mean, integrated out over a calibrated normal distribution with mean and variance estimated from $\{\lambda_{mt}\}_t$.21

    So the dynamic parameters to identify are $\{k, \sigma_\varepsilon\}$.
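    The stationarity restriction can be verified with a few lines of arithmetic. In the sketch below, `k` is the ratio of prior precision to signal precision and `h` is the signal precision; both values and the function names are hypothetical. One cycle of updating (precision rises by `h`) followed by evolution (variance rises by the inverse of the evolution precision) should return the prior precision to where it started.

```python
def stationary_parameters(k, h):
    """Prior precision, evolution precision, and weight on the signal."""
    tau_prior = k * h
    tau_eta = k * (k + 1.0) * h
    signal_weight = h / (tau_prior + h)   # equals 1 / (k + 1)
    return tau_prior, tau_eta, signal_weight

def one_cycle(tau_prior, h, tau_eta):
    """Prior precision next period after one update and one evolution step."""
    tau_post = tau_prior + h
    return 1.0 / (1.0 / tau_post + 1.0 / tau_eta)

tau_prior, tau_eta, weight = stationary_parameters(k=2.5, h=4.0)
print(tau_prior, tau_eta, round(weight, 3), one_cycle(tau_prior, 4.0, tau_eta))
```

    With a precision ratio of 2.5, the weight on the newest signal is 1 / 3.5, or roughly 30%, and the remaining 70% falls on the prior.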

    4.4 Identification

    This section briefly shows the identification of the supply-side parameters without imposing any cross-market restriction, i.e., market-specific parameters are separately identified. In the current version, we assume that the initial prior mean $\mu_1$ is known (and integrated out in the estimation). However, the identification does not rely on this assumption. A stronger identification result that does not require knowing the prior mean is described in Appendix F.

    In our data, we observe actual line length decisions, signal values and precisions, as well as the prior mean

    $$\{n_t, \omega_t, h_t, \mu_1\}$$

    Based on this information, I will show the non-parametric identification of the preference evolution rate, the initial belief precision, the line length maintenance cost, and the scale of the fixed cost for launching22

    $$\{\tau_\eta, \tau_1, c\}$$

    21 Note that the initial prior mean is also identifiable, as shown in Appendix F. However, I follow the convention of the learning literature and integrate out this value.

    4.4.1 Preference evolution rate $\tau_\eta$ and prior precision $\tau_1$

    The evolution rate $\tau_\eta$ measures how fast $\lambda$ evolves over time. Intuitively, $\lambda_t$ can be estimated from demand, and this rate is identified from the demand-side estimates of $\lambda_t$. Equivalently, the signal value $\omega_t$ is calculated from the demand estimation, and $\tau_\eta$ is identified from $Var(\omega_{t+1} \mid \omega_t)$, because $\omega_{t+1}$ deviates from $\omega_t$ for three reasons: the signal error in period t, the signal error in period t+1, and the deviation of $\lambda_{t+1}$ from $\lambda_t$. The precisions of the first two errors are known, so the rate of evolution is identified.

    The initial precision is identified by the stationarity assumption that the belief precision does not explode. From the equation

    $$\frac{1}{\tau_1} = \frac{1}{\tau_1 + h} + \sigma_\eta^2$$

    we can pin down $\tau_1$. The intuition is that when making line length decisions, Company A cannot rely too much on the market signal, because the signal is noisy, as measured by h. Nor can she rely too much on her prior belief, because $\lambda$ evolves over time, as measured by $\tau_\eta$. The optimal balance between these two sources pins down the belief precision at its stationary level.

    4.4.2 Cost of line length maintenance c

    In the last part, I have shown the identification of $\tau_1$ and $\tau_\eta$. With knowledge of $\tau_1$, I can calculate the whole belief process $\{\mu_t, \tau_t\}$, so the state variables are known. The cost parameter is identified by the standard argument based on the conditional choice probability $E(n_t \mid \mu_t, \tau_t, l_t)$ proposed by Magnac and Thesmar (2002). Intuitively, fixing the belief precision, when the cost is low the optimal line length is more responsive to changes in the belief mean, as shown in Figure 2. The cost is identified by regressing the actual line length $n_t$ on the belief mean $\mu_t$, controlling for $\tau_t$.

    5 Results

    This section presents the model estimates and various simulation results based on those estimates.

    22 A final supply-side parameter, $\sigma_\varepsilon$, is a nuisance parameter that is not non-parametrically identified. But since we have imposed functional form assumptions on the value function, including this parameter in the estimation improves the model fit considerably.


    5.1 Demand Estimation

    On the demand side, I estimate the nested logit model specified in (6). In this part, I report the average estimates of preference heterogeneity by imposing $\lambda_{mt} = \lambda$, but on the supply side I allow preference heterogeneity to vary by market and time.

    Table 3 reports the estimation results for the demand side. Column (1) disregards the endogeneity problem and estimates the equation directly by OLS. Column (2) addresses this problem by applying the three sets of instruments described before. Comparing columns (1) and (2), I find that the instrumental variables work as expected. Both preference heterogeneity and price elasticity are under-estimated without controlling for endogeneity, and the characteristic vectors become significant only in the 2SLS specification.

    Note that the first two columns in Table 3 use characteristic vectors (flavor fixed effects, cut types, fat contents) to describe a product. In column (3), I replace them with a more precise control, namely product fixed effects. The estimate of price elasticity does not change much (-2.38 in Column 3 compared to -2.53 in Column 2), but the estimate of preference heterogeneity almost doubles. As mentioned, the characteristic vectors cannot capture consumers' preferences completely, so I take the product fixed effects estimates as the benchmark case, where preference heterogeneity is estimated to be 0.41 (with a standard error of 0.02, Column 3, Table 3). In Column (4), I allow price elasticity to vary by demographics. I find that demand is less price elastic in markets with a richer population, measured by median income, or an older population, measured by median age, which coincides with most previous findings.

    The main parameter of interest in this paper is preference heterogeneity, so in Table 4, I explore the sources of preference heterogeneity by interacting it with different observables. Column (1) copies Column (3) from Table 3 to serve as a benchmark case. In Column (2), I estimate the same model on the data for small-package potato chips. I find that preference is more heterogeneous (0.67 in Column 2 compared to 0.41 in Column 1) and demand is more price elastic (-2.74 in Column 2 compared to -2.38 in Column 1). This extra heterogeneity in preference may come from the fact that consumers are more willing to try new flavors when buying small-package potato chips. There are two sources of preference heterogeneity estimated in this paper: one is preference heterogeneity between consumers, and the other is preference heterogeneity within a consumer across purchase occasions. I cannot separately identify these two sources with only market-level data, but I believe that the second source is more significant in markets for small-package potato chips. The difference in the heterogeneity estimates supports the existence of heterogeneity within consumers across purchase occasions, which is related to variety-seeking behavior.


    Another source of preference heterogeneity is population diversity. In Columns (3)-(7) of Table 4, I explore to what extent population diversity can explain preference heterogeneity. The results are robust to a series of diversity measures. In Column (3), I use the interquartile range of the income distribution to measure population diversity. I find that in markets with a more dispersed income distribution, preference heterogeneity is significantly higher. To quantify this estimate, I take the two markets with the minimum (0.04) and maximum (0.10) diversity measures; the implied difference in heterogeneity is 0.09,23 or roughly 20% of the baseline heterogeneity of 0.41. In Column (4), the diversity measure is the dispersion of the age distribution, and the implied difference in heterogeneity is 0.07, or 17% of the baseline value. Beyond these two dispersion measures, preference heterogeneity is also explained by the diversity of ethnic groups. In Column (5), I use the Asian population ratio in the market and find that in markets with a 10% higher Asian population ratio, preference is more heterogeneous by 0.047 relative to the baseline value of 0.41. In Column (6), I use the Hispanic population ratio, and the interaction term is not significant. This is because the Hispanic population measure spans a wide range, from 0 to 53%. If the true functional form is non-linear, a linear approximation may not yield a significant result. Instead, I discretize the measure using a dummy for above-median markets, and the estimates are reported in Column (7). In markets with an above-median Hispanic population ratio, preference is more heterogeneous by 0.12 relative to the baseline value of 0.41.
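    The implied differences quoted in this paragraph are simple arithmetic on the reported numbers, which the snippet below reproduces. The interaction coefficient of 1.48 is the one used in the paper's own calculation of the income effect; the variable names are hypothetical.

```python
baseline = 0.41                                  # benchmark heterogeneity estimate

income_coefficient = 1.48                        # interaction with income dispersion
income_difference = (0.10 - 0.04) * income_coefficient

effects = {
    "income dispersion (min vs max market)": income_difference,
    "age dispersion (min vs max market)": 0.07,
    "Asian population ratio (+10%)": 0.047,
    "Hispanic ratio above median": 0.12,
}
for label, effect in effects.items():
    print(f"{label}: {effect:.3f}, {100 * effect / baseline:.0f}% of baseline")
```

    Expressing each effect as a fraction of the baseline puts the different diversity measures on a common scale.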

    5.2 Supply Estimation

    I plug in the demand coefficients and estimate the supply side by maximum likelihood. Solving the original problem by brute force is difficult, because calculating the line share $s_m(n, \lambda)$, the flow payoff $\pi_{n,m}(\mu, \tau, l)$, and the state transition $f(\mu_{t+1}, \tau_{t+1} \mid \mu_t, \tau_t, n)$ all require simulation. However, I can employ numerical methods to simplify the calculations.

    For $s_m(n, \lambda)$, I use power polynomials as an approximation. Because the function does not contain any parameters to estimate, the approximation needs to be calculated only once. The reason for using polynomials is the ease of preserving monotonicity and super-modularity in the approximated function, which is key for identification.24 To calculate $\pi_{n,m}(\mu, \tau, l)$, I use quadrature to calculate the expectation with respect to $\lambda$, although $\lambda$ is distributed as a truncated normal rather than a normal. When the precision is quite high and the mean is far from the boundary, the truncated normal can be approximated by the untruncated normal, because the probability of lying outside the boundary is low. As for the state transition probability, because the line length stays at a high level (for the large package size, the line length ranges from 8 to 30, with an average of 22) and the precision does not explode because of the time-varying $\lambda$, I simply assume that the state transition probability does not depend on the action n, which relieves the computational burden. Finally, I use Chebyshev polynomials to approximate the value function and estimate the single-agent dynamic model with unobservable and time-varying state variables.25

    23 This is calculated as (0.1 - 0.04) × 1.48.

    24 I use CVX, a package for disciplined convex programming (Grant et al., 2008), to obtain the approximation. See Appendix C for details.
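    The quadrature step can be sketched with Gauss-Hermite nodes from NumPy. This is only an illustration of the general technique, under the assumption stated above that the truncation can be ignored when the belief is tight and away from the boundaries. The function name and the stylized share function (equal mean utilities across products) are hypothetical.

```python
import numpy as np

def expected_under_normal_belief(f, mu, tau, nodes=15):
    """Approximate E[f(lam)] for lam ~ Normal(mu, 1 / tau) by Gauss-Hermite
    quadrature, ignoring the truncation to the unit interval."""
    x, w = np.polynomial.hermite.hermgauss(nodes)
    lam = mu + np.sqrt(2.0 / tau) * x          # change of variables
    return float(np.sum(w * f(lam)) / np.sqrt(np.pi))

def stylized_share(lam, n=22, delta=-3.0):
    """Line share when all n products have the same mean utility delta."""
    return 1.0 / (1.0 + np.exp(-(delta + lam * np.log(n))))

second_moment = expected_under_normal_belief(lambda lam: lam ** 2, 0.4, 400.0)
expected_share = expected_under_normal_belief(stylized_share, 0.4, 400.0)
print(second_moment, expected_share)
```

    Gauss-Hermite rules integrate polynomials exactly up to a degree set by the number of nodes, so the second-moment check is a convenient way to confirm that the change of variables is right before applying the rule to the share function.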

    Table 5 reports the estimation results. I estimate the model under two specifications. In the first specification, I assume the maintenance cost per capita (per 1M households) is linear in the line length, whereas in the second specification, the marginal cost is higher when manufacturers are expanding their lines. In the first specification, the marginal cost of expanding a line by one product is $3,560 per million households. For an average line length of 22, the total (variable) cost of maintaining the line in an average-size city with 2.63 million households is approximately $0.2 million.26 As a comparison, the revenue in an average-sized city with the average line length selling at the average price is $8.96 million,27 so the product-line-related cost constitutes about 2% of total revenue.

    In the second specification, the cost is nonlinear, and I find an extra cost ($6.14K compared to $2.08K) of expanding the product line. This extra cost comes from the inflexibility of displaying, distributing, storing, or advertising additional products. The extra cost limits the flexibility of line length adjustment in two senses. First, it restricts line expansion, because expanding the product line incurs this extra cost. Second, it also restricts product line contraction, because when Company A considers withdrawing some products, she might worry about the future cost of bringing them back. The counterfactual analysis in the next subsection quantifies the inflexibility caused by the non-linear cost structure.

    The precision ratio between belief and signal is estimated to be around 2.5 in both specifications. Note that this ratio determines the linear weights on the prior and the signal when the belief is updated. From the estimation, Company A places about 30% of the decision weight on the in-market signal and 70% on past experience, summarized by the prior belief. Even as an experienced player in a mature market, Company A still relies heavily on in-market learning, because of the evolving nature of preference heterogeneity. The market signal is somewhat noisy, so Company A cannot rely completely on it. The counterfactual analysis in the next subsection shows the gross margin Company A could achieve if she knew the true value of heterogeneity in advance.

    25 The recent development of MPEC (Dubé et al., 2013) is also applicable to this model.

    26 $3,560 × 22 × 2.63 = $0.2M; all numbers are taken from Table 1.

    27 $0.25 × 0.03 × 22 × 54.31M = $8.96M; all numbers are taken from Table 1.
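    The cost and revenue figures in this subsection follow from the numbers quoted in the text and its notes, as the short calculation below shows. Variable names are hypothetical, and the factors are used exactly as stated without further interpretation.

```python
cost_per_product = 3560.0        # dollars per product per million households
avg_line_length = 22
households_millions = 2.63       # average-size city

maintenance_cost = cost_per_product * avg_line_length * households_millions
revenue = 0.25 * 0.03 * 22 * 54.31e6   # factors as stated in the revenue note
cost_share = maintenance_cost / revenue

print(f"maintenance cost: ${maintenance_cost / 1e6:.2f}M")
print(f"revenue:          ${revenue / 1e6:.2f}M")
print(f"cost share:       {100 * cost_share:.1f}%")
```

    The ratio comes out slightly above 2%, in line with the statement that product line maintenance accounts for about 2% of total revenue.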


    5.3 Model Fit

    To evaluate how well the model fits the data, I simulate the line length decisions in all 50 markets. Within each market, the prior mean $\mu_1$ is drawn from a known distribution, the prior precision $\tau_1$ is known from estimation, and the initial line length $n_1 = l_2$ is taken as given. Given these initial conditions, beliefs are updated from the signals $(\omega_1, h_1)$ to obtain the belief in period 2, $(\mu_2, \tau_2)$; the optimal line length $n_2$ is then simulated, and the process continues to the end of the data period.

    I run two simulations to check how well the model fits the data. In the first simulation, the signals $(\omega_t, h_t)$ are taken from the data. In the second simulation, I simulate these signals. Figure 3 compares actual and simulated line lengths in two markets, and Figure 4 compares the whole distribution of line length and line length changes for actual and simulated data. Both simulations fit the data quite well in most markets. The first simulation fits the data almost perfectly, because it makes use of most of the information in the data. The second simulation also fits well. In the model, three factors determine the optimal line length choice: the evolution of preference heterogeneity, the signaling error caused by demand shocks, and the random fixed cost of product line adjustment. The first simulation averages out only the random fixed cost, and the simulation result confirms that this cost is not the driving force behind actual line length patterns. The second simulation averages out both the random fixed cost and the signaling error. The only remaining force that determines the line length pattern is the evolution of preference heterogeneity, which is the main mechanism in this paper. In the remainder of this paper, I always implement the second simulation.

    5.4 Counterfactuals

    I run two sets of counterfactual simulations to evaluate Company A's optimal line length responses to product-line-related policy changes. In the first counterfactual exercise, I evaluate the firm's optimal line length decisions under a smooth cost structure; in the second, I estimate the firm's improvement in gross margin under complete information about preference heterogeneity when making line length decisions. A byproduct of the second counterfactual exercise is an indirect test of the informational assumption about the firm: does she know or learn?

    5.4.1 Smooth cost structure

    The non-linearity of the cost structure restricts the firm's flexibility to adjust the product line. This simulation quantifies by how much. In this simulation, I take the cost structure to be linear, as in the first specification of the supply side, and simulate market signals as well as the firm's optimal responses. The results are illustrated in Figure 5. The distribution of line length does not change much, as shown in the left panel, whereas the distribution of line length changes becomes more dispersed in the right panel, which means that Company A is more likely to adjust line length aggressively under the smooth cost structure. To quantify this change further, the probability of line length adjustment grows from 70% in the raw data to 90% in the simulation.

    The effect is quite symmetric between line length expansion and line length contraction, as shown in the right panel. Under a smooth cost structure, the probabilities of line length expansion and line length contraction both increase significantly. As mentioned before, the increase in line length expansion reflects the static concern that expanding the product line incurs more cost, whereas the increase in line length contraction reflects the dynamic concern that the firm is more cautious in withdrawing a flavor because it might worry about the future cost of bringing it back. The simulation result confirms the existence of both effects, each of which restricts the flexibility of line length adjustment.

    5.4.2 Perfect information on preference heterogeneity

    Figure 6 shows the simulation result for complete information on preference heterogeneity when making line length decisions. The line length decisions under complete information deviate substantially from the baseline case with learning about heterogeneity. This is simply because Company A adapts instantly to the time-evolving heterogeneity rather than chasing time-varying heterogeneity as under the learning model. The resulting gross margin is 5% higher under complete information. On the other hand, the pattern of line length adjustment does not change much.

    Based on this simulation result, I can indirectly test the informational hypothesis that Company A learns rather than knows the true value of preference heterogeneity. First, note that the two hypotheses are not nested in the model of stationary learning,28 so there is no direct test based on some parameter. Motivated by the fact that with complete information Company A would enjoy a higher gross margin, I propose the following test based on gross margin.

    In the data, we can calculate the gross margin across 50 cities over 28 quarters, which givesus a vector gm with a length of 1,400. Let FK denote the distribution of gm generated in modelwhere firms knows heterogeneity, and FL denote the distribution of gm generated from the modelwhere firms learns heterogeneity. To test the assumption of learning, it is equivalent to test

    H0 : gm FK , H1 : gm FL28In the standard learning framework, the two hypothesis is nested. In order to test whether agent knows the true

    value, it is equivalent to test whether the initial belief precision is infinity. (Hitsch, 2006)


    Computing a test statistic for a high-dimensional vector is difficult, but we can sacrifice some power and focus on a summary statistic. Figure 7 reports the test result for the median gross margin. The two distributions are well separated, and the actual data appear to come from $F_L$. We can therefore reject the null and favor the information assumption that Company A learns about preference heterogeneity when making product line decisions.
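The logic of this test can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the two lists of simulated medians are synthetic stand-ins (normal draws with made-up means and spreads) for the median of gm generated under the "knows" and "learns" models, the observed median is a made-up number, and `tail_share` is a hypothetical helper.

```python
import random
import statistics

def tail_share(simulated, observed):
    """Share of simulated statistics lying at least as far from the
    simulated mean as the observed statistic (a two-sided tail share)."""
    center = statistics.fmean(simulated)
    gap = abs(observed - center)
    extreme = sum(1 for s in simulated if abs(s - center) >= gap)
    return extreme / len(simulated)

rng = random.Random(0)

# Synthetic placeholders: one median gross margin per simulated data set.
medians_know = [rng.gauss(0.42, 0.005) for _ in range(2000)]   # firm knows heterogeneity
medians_learn = [rng.gauss(0.40, 0.005) for _ in range(2000)]  # firm learns heterogeneity
observed_median = 0.401  # made-up value standing in for the data

p_know = tail_share(medians_know, observed_median)
p_learn = tail_share(medians_learn, observed_median)
print("tail share under 'knows':", p_know)
print("tail share under 'learns':", p_learn)
```

A small tail share under the "knows" distribution together with a large one under the "learns" distribution corresponds to the pattern described for Figure 7: the observed median sits inside one simulated distribution and far outside the other.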

    6 Conclusion

    This paper links product line length decisions to preference heterogeneity and rationalizes their cross-sectional and intertemporal variation. Preference heterogeneity in this paper is an aggregate measure of both preference heterogeneity across individuals and variety seeking within individuals, and it is measured by the nesting parameter in the standard nested logit model. Cross-sectional variation in preference heterogeneity, which is partly driven by the diversity of population demographics, explains differences in line length across cities. Within a city, a firm's in-market learning about preference heterogeneity drives line length adjustment.

    I apply the model to the potato chip industry, where Company A is the lead player. Preference heterogeneity is estimated to be 0.41 for large-package chips and 0.67 for small-package chips, which means that preferences over small packages are more heterogeneous. This is driven by more intensive variety seeking for small-package chips. I also find that preferences are more heterogeneous in markets with higher Hispanic population ratios or greater dispersion in the age distribution.

    On the supply side, Company A, an experienced firm in a mature market, also applies in-market learning about preference heterogeneity to adjust its proliferation decisions. I find that Company A bases its decisions primarily on past experience in the market, with the most recent market realization representing only one-third of the influence on product-line decisions. The cost of maintaining an average line length constitutes about 2% of total revenue. I estimate the sunk cost incurred when expanding the product line to be three times the usual maintenance cost, which may limit the flexibility of product-line adjustment.
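As a rough arithmetic check on the "one-third" figure, the sketch below assumes a standard normal learning update in which the "Precision Ratio" reported in Table 5 is the ratio of prior precision to signal precision. That interpretation is an assumption made here for illustration, and the function names are hypothetical; the paper's exact updating rule is defined in its model section.

```python
def signal_weight(precision_ratio):
    """Weight on the newest signal when the prior is `precision_ratio`
    times as precise as a single signal (normal-normal updating)."""
    return 1.0 / (1.0 + precision_ratio)

def update_belief(prior_mean, signal, precision_ratio):
    """Posterior mean as a precision-weighted average of prior and signal."""
    w = signal_weight(precision_ratio)
    return (1.0 - w) * prior_mean + w * signal

# Precision ratios as reported in Table 5 (linear and nonlinear cost).
for ratio in (2.55, 2.42):
    print(ratio, round(signal_weight(ratio), 3))
```

Under this assumed reading, both estimated ratios put a weight of a little under 0.3 on the latest signal, which is in the neighborhood of the share stated in the text.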

    Counterfactual analysis based on the estimates evaluates the firm's optimal line length decisions under a smooth cost structure and under complete information, rather than learning, about preference heterogeneity. In the first case, Company A is found to be more aggressive in line length adjustment; in the second case, Company A's gross margin increases by 5% when it knows the true value of preference heterogeneity. The result of the second counterfactual also helps to test the information assumption that the firm learns rather than knows the preference heterogeneity at the time of line length decisions. The test result supports the information assumption of learning.

    The model is readily applicable to other industries in which product proliferation is a key decision. One example is the two MP3 players produced by Apple: the iPod Classic and the iPod Nano. The iPod Classic provides a limited choice of colors, always black or white, but the iPod Nano offers a longer line of colors. The length of the Nano line also varies over time, from two in the first generation to nine in the fourth generation and back to six in the most recent one. The mechanism in this paper explains the difference between the two players: most consumers of the iPod Classic are professional music lovers who care more about sound quality, control convenience, and storage and less about colors, whereas consumers buying the iPod Nano are younger on average, care more about colors, and have more diverse views on their favorite one. The time-varying changes in line length for the Nano can be attributed to Apple's gradual learning about preference diversity.

    The model simplifies the measure of preference heterogeneity. I use the nesting parameter in the nested logit model for two primary reasons. First, nested logit is simple and clean. If I allowed an arbitrary substitution pattern for individual-specific demand shocks, I could estimate a mixed logit, but obtaining a single statistic to measure the diversity of preference would be difficult. Second, the linear representation of the nested logit model makes the supply-side learning tractable. Further research should be directed toward finding a better way to model preference diversity and link it to product proliferation decisions.

    Another shortcoming of the paper is the measure of product proliferation. In this paper, I use line length as a highly abstract measure while ignoring the actual contents of the product line. I make this assumption primarily to simplify the state space. If more detailed feature-level information were incorporated, the state space might grow exponentially. One possibility is to apply some heuristic rule that incorporates some statistics of the s for existing features, and this also calls for future work.

    A third limitation is that I make the monopoly assumption. This assumption is justifiable in the potato chip market, but in other markets with competition, the supply-side learning model needs to be modified.

    References

    Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile prices in market equilibrium. Econometrica: Journal of the Econometric Society, 841–890.

    Berry, S., J. Levinsohn, and A. Pakes (2004). Differentiated products demand systems from a combination of micro and macro data: The new car market. Journal of Political Economy 112(1), 68–105.

    Berry, S. T. (1992). Estimation of a model of entry in the airline industry. Econometrica: Journal of the Econometric Society, 889–917.

    Berry, S. T. (1994). Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, 242–262.

    Berry, S. T. and J. Waldfogel (2001). Do mergers increase product variety? Evidence from radio broadcasting. The Quarterly Journal of Economics 116(3), 1009–1025.

    Bresnahan, T. F. and P. C. Reiss (1990). Entry in monopoly market. The Review of Economic Studies 57(4), 531–553.

    Bresnahan, T. F. and P. C. Reiss (1991). Entry and competition in concentrated markets. Journal of Political Economy, 977–1009.

    Bresnahan, T. F., S. Stern, and M. Trajtenberg (1997). Market segmentation and the sources of rents from innovation: Personal computers in the late 1980s. RAND Journal of Economics, S17–S44.

    Bronnenberg, B. J. (2014). The provision of convenience and variety by the market. Available at SSRN.

    Bronnenberg, B. J., J.-P. H. Dubé, and M. Gentzkow (2012). The evolution of brand preferences: Evidence from consumer migration. American Economic Review 102(6), 2472–2508.

    Bronnenberg, B. J., M. W. Kruger, and C. F. Mela (2008). Database paper-The IRI marketing data set. Marketing Science 27(4), 745–748.

    Cardell, N. S. (1997). Variance components structures for the extreme-value and logistic distributions with application to models of heterogeneity. Econometric Theory 13(02), 185–213.

    Chernev, A. (2003a). Product assortment and individual decision processes. Journal of Personality and Social Psychology 85(1), 151.

    Chernev, A. (2003b). When more is less and less is more: The role of ideal point availability and assortment in consumer choice. Journal of Consumer Research 30(2), 170–183.

    Ching, A. T., T. Erdem, and M. P. Keane (2013). Invited paper-Learning models: An assessment of progress, challenges, and new developments. Marketing Science 32(6), 913–938.

    Chintagunta, P. K. (1998). Inertia and variety seeking in a model of brand-purchase timing. Marketing Science 17(3), 253–270.

    Chintagunta, P. K. (1999). Variety seeking, purchase timing, and the "lightning bolt" brand choice model. Management Science 45(4), 486–498.

    Crawford, G., A. Shcherbakov, and M. Shum (2011). The welfare effects of endogenous quality choice: Evidence from cable television markets. Technical report, mimeo. University of Warwick.

    Crawford, G. S. and M. Shum (2005). Uncertainty and learning in pharmaceutical demand. Econometrica 73(4), 1137–1173.

    Dickstein, M. J. (2014). Efficient provision of experience goods: Evidence from antidepressant choice. Working Paper.

    Draganska, M. and D. C. Jain (2005). Product-line length as a competitive tool. Journal of Economics & Management Strategy 14(1), 1–28.

    Draganska, M. and D. C. Jain (2006). Consumer preferences and product-line pricing strategies: An empirical analysis. Marketing Science 25(2), 164–174.

    Draganska, M., M. Mazzeo, and K. Seim (2009). Beyond plain vanilla: Modeling joint product assortment and pricing decisions. QME 7(2), 105–146.

    Dubé, J., J. T. Fox, and C. Su (2013). Improving the numerical performance of BLP static and dynamic discrete choice random coefficients demand estimation. Forthcoming in Econometrica.

    Dubé, J.-P., G. J. Hitsch, and P. E. Rossi (2009). Do switching costs make markets less competitive? Journal of Marketing Research 46(4), 435–445.

    Dubé, J.-P., G. J. Hitsch, and P. E. Rossi (2010). State dependence and alternative explanations for consumer inertia. The RAND Journal of Economics 41(3), 417–445.

    Dubé, J.-P., K. Sudhir, A. Ching, G. S. Crawford, M. Draganska, J. T. Fox, W. Hartmann, G. J. Hitsch, V. B. Viard, M. Villas-Boas, et al. (2005). Recent advances in structural econometric modeling: Dynamics, product positioning and entry. Marketing Letters 16(3-4), 209–224.

    Einav, L. (2010). Not all rivals look alike: Estimating an equilibrium model of the release date timing game. Economic Inquiry 48(2), 369–390.

    Erdem, T. and M. P. Keane (1996). Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science 15(1), 1–20.

    Fan, Y. (2013). Ownership consolidation and product characteristics: A study of the US daily newspaper market. The American Economic Review 103(5), 1598–1628.

    First-Research (2011). Industry profile - snack foods manufacturing. Technical report.

    Goettler, R. L. and B. R. Gordon (2011). Does AMD spur Intel to innovate more? Journal of Political Economy 119(6), 1141–1200.

    Grant, M., S. Boyd, and Y. Ye (2008). CVX: Matlab software for disciplined convex programming.

    Griliches, Z. and J. A. Hausman (1986). Errors in variables in panel data. Journal of Econometrics 31(1), 93–118.

    Guo, L. and J. Zhang (2012). Consumer deliberation and product line design. Marketing Science 31(6), 995–1007.

    Hitsch, G. J. (2006). An empirical model of optimal dynamic product launch and exit under demand uncertainty. Marketing Science 25(1), 25–50.

    Hu, Y. and S. M. Schennach (2008). Instrumental variable treatment of nonclassical measurement error models. Econometrica 76(1), 195–216.

    Hu, Y. and M. Shum (2012). Nonparametric identification of dynamic models with unobserved state variables. Journal of Econometrics 171(1), 32–44.

    Hui, K.-L. (2004). Product variety under brand influence: An empirical investigation of personal computer demand. Management Science 50(5), 686–700.

    Iyengar, S. S. and M. R. Lepper (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology 79(6), 995.

    Joon, H. (2013). Snack food production in the US. Technical report, IBISWorld.

    Judd, K. L. (1998). Numerical methods in economics. MIT Press.

    Kamenica, E. (2008). Contextual inference in markets: On the informational content of product lines. The American Economic Review 98(5), 2127–2149.

    Lin, S., J. Zhang, and J. R. Hauser (2014). Learning from experience, simply. Marketing Science.

    Liu, Y. and T. H. Cui (2010). The length of product line in distribution channels. Marketing Science 29(3), 474–482.

    Lovett, M., W. Bolding, and R. Staelin (2009). Consumer learning models for perceived and actual product instability. Working Paper.

    Magnac, T. and D. Thesmar (2002). Identifying dynamic discrete decision processes. Econometrica 70(2), 801–816.

    Mazzeo, M. J. (2002). Product choice and oligopoly market structure. RAND Journal of Economics, 221–242.

    Narayanan, S. and P. Manchanda (2009). Heterogeneous learning and the targeting of marketing communication for new products. Marketing Science 28(3), 424–441.

    Nevo, A. (2001). Measuring market power in the ready-to-eat cereal industry. Econometrica 69(2), 307–342.

    Orhun, A. Y. (2009). Optimal product line design when consumers exhibit choice set-dependent preferences. Marketing Science 28(5), 868–886.

    Petrin, A. (2002). Quantifying the benefits of new products: The case of the minivan. Journal of Political Economy 110(4), 705–729.

    Reiss, P. C. and P. T. Spiller (1989). Competition and entry in small airline markets. Journal of Law and Economics 32(2), S179–S202.

    Roberts, J. H. and G. L. Urban (1988). Modeling multiattribute utility, risk, and belief dynamics for new consumer durable brand choice. Management Science 34(2), 167–185.

    Ryan, S. P. and C. Tucker (2012). Heterogeneity and the dynamics of technology adoption. Quantitative Marketing and Economics 10(1), 63–109.

    Seetharaman, P., S. Chib, A. Ainslie, P. Boatwright, T. Chan, S. Gupta, N. Mehta, V. Rao, and A. Strijnev (2005). Models of multi-category choice behavior. Marketing Letters 16(3-4), 239–254.

    Seim, K. (2006). An empirical model of firm entry with endogenous product-type choices. The RAND Journal of Economics 37(3), 619–640.

    Simonson, I. and A. Tversky (1992). Choice in context: Tradeoff contrast and extremeness aversion. Journal of Marketing Research.

    Sørensen, M. (2007). How smart is smart money? A two-sided matching model of venture capital. The Journal of Finance 62(6), 2725–2762.

    Sweeting, A. (2010). The effects of mergers on product positioning: Evidence from the music radio industry. The RAND Journal of Economics 41(2), 372–397.

    Urban, G. L. and J. R. Hauser (1993). Design and marketing of new products, Volume 2. Prentice Hall, Englewood Cliffs, NJ.

    Urban, G. L. and G. M. Katz (1983). Pre-test-market models: Validation and managerial implications. Journal of Marketing Research (JMR) 20(3).

    Villas-Boas, J. M. (2004). Communication strategies and product line design. Marketing Science 23(3), 304–316.


    Table 1: Summary Statistics

    Variable                                 Obs     Mean  Std. Dev.      Min      Max
    Sales and prices
      Line Length (# of features)           1400    22.09       3.86     8.00    30.00
      Change in Line Length                 1350     0.19       2.11    -8.00     9.00
      Line Expansion                        1350     0.39       0.49     0.00     1.00
      HHI for In-line Market Share          1400     0.13       0.03     0.07     0.36
      Std. for Log In-line Market Share     1400     1.28       0.23     0.72     2.06
      Number of competitor firms            1400     7.41       2.77     3.00    20.00
      Number of competitor UPC              1400    51.77      24.71    12.00   166.00
      Market Share                         30930     0.03       0.05     0.00     0.38
      Market Share In Line                 30930     0.05       0.06     0.00     0.53
      Price ($/oz)                         30930     0.25       0.07     0.12     0.43
      Fat Free                             30930     0.09       0.28     0.00     1.00
      Reduced Fat                          30930     0.15       0.35     0.00     1.00
      Ruffle Cut                           30930     0.28       0.45     0.00     1.00
      Wavy Cut                             30930     0.13       0.33     0.00     1.00
      Market size (Million Oz)                50    54.31      57.10     6.06   278.27
    Demographics
      Median Income (1K $)                  1400    56.81       8.98    23.10    89.09
      Median Age                            1400    35.08       2.89    26.00    48.33
      Interquartile Income (1M $)           1400     0.06       0.01     0.04     0.10
      Interquartile Age (10 yrs)            1400     3.40       0.20     2.75     4.47
      Asian %                               1400     0.04       0.04     0.00     0.31
      Hispanic %                            1400     0.10       0.11     0.00     0.53
      Number of Households (Million)          50     2.63       3.10     0.26    17.10
    Cost shifters
      Potato Price ($/100lb)                  28    12.37       4.29     7.42    21.90
      Refined Sugar Price (cent/lb)           28    45.20       3.58    41.93    51.93
      Soy Bean Oil Price (cent/lb)             7    28.28      11.58    16.46    52.03
      Edible Butter Price ($/lb)               7     1.41       0.27     1.11     1.82
      Edible Tallow Price (cent/lb)            7    19.60       5.53    13.71    30.76

    Note: Sales and price data for 58 Company A features (each a unique combination of 36 flavors; 3 fat contents: regular, reduced fat, and fat free; and 3 cut types: flat, ruffle, and wavy) across 50 markets over 28 quarters in 7 years (2001-2007) are aggregated from the IRI Academic Dataset. Features with positive sales for fewer than 12 weeks are dropped from the sample, and their market shares are allocated proportionally to the other features within the same serving size-city-quarter. Demographic data for 50 cities and 28 quarters are merged from the IPUMS CPS dataset. Cost shifters, for 28 quarters or 7 years depending on data availability, are collected from yearbooks published by the Bureau of Labor Statistics and the Department of Agriculture.
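The sample-cleaning rule in the note (drop features with fewer than 12 weeks of positive sales and reallocate their shares proportionally) can be sketched as follows. This is a minimal illustration, assuming that "proportionally allocated" means rescaling the surviving features so that the cell's total share is preserved; the function name, the dictionary layout, and the numbers are hypothetical.

```python
def drop_transient(shares, weeks_sold, min_weeks=12):
    """Drop features sold in fewer than `min_weeks` weeks and rescale the
    remaining shares so that the total share of the cell is unchanged.

    shares:     feature -> market share within one serving size-city-quarter
    weeks_sold: feature -> number of weeks with positive sales
    """
    kept = {f: s for f, s in shares.items() if weeks_sold[f] >= min_weeks}
    kept_total = sum(kept.values())
    if kept_total == 0:
        return {}
    scale = sum(shares.values()) / kept_total
    return {f: s * scale for f, s in kept.items()}

# Made-up cell with three features, one of them transient.
cell_shares = {"classic-regular-flat": 0.10, "bbq-regular-ruffle": 0.05, "lime-fatfree-wavy": 0.05}
cell_weeks = {"classic-regular-flat": 13, "bbq-regular-ruffle": 12, "lime-fatfree-wavy": 3}

cleaned = drop_transient(cell_shares, cell_weeks)
line_length = len(cleaned)  # count of surviving features in the cell
print(cleaned, line_length)
```

Because the rescaling is proportional, relative shares among the surviving features are unchanged; only their level moves to absorb the dropped feature.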

    Table 2: Reduced Form Evidence on Dynamic Line Length Decisions

    FE, dependent variable is         Line Length, t+1  1(Line Expansion, t+1)        Mean
                                       (1)         (2)         (3)         (4)       (Std)
    Concentration measure
      HHI                         -6.78***                -1.58***                    0.13
                                    (1.41)                  (0.41)                  (0.03)
      Sdev Ln Share In Line                   -1.57***                -0.20***        1.28
                                                (0.25)                  (0.07)      (0.23)
    Line Length                    0.86***     0.79***    -0.03***    -0.05***       22.15
                                    (0.01)      (0.02)      (0.00)      (0.01)      (3.89)
    City fe                            Yes         Yes         Yes         Yes
    Quarter fe                         Yes         Yes         Yes         Yes
    Observations                      1350        1350        1350        1350
    Adjusted R-squared                0.86        0.73        0.34        0.34
    Mean Dependent Variable          22.34       22.34        0.39        0.39

    Note: This table presents reduced-form evidence on line length adjustment in response to time-evolving preference heterogeneity. Preference heterogeneity is inversely correlated with the concentration of in-line market shares, i.e., concentrated in-line market shares indicate homogeneous preferences. All columns are panel data regressions with market fixed effects. The dependent variable is next-quarter line length in columns (1) and (2) and a next-quarter dummy for line length expansion in columns (3) and (4). Line length is the count of features (flavor-cut-fat) within each market-quarter after dropping transient ones with fewer than 12 weeks of positive sales. All data come from the IRI Academic Dataset.
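The two concentration measures used as regressors can be computed from feature-level shares as in the sketch below. It is an illustration only: it assumes that in-line shares are each feature's share of Company A's total within a market-quarter and that the dispersion measure is the sample standard deviation of log in-line shares; the function names are hypothetical.

```python
import math
import statistics

def in_line_shares(shares):
    """Normalize feature shares so they sum to one within the product line."""
    total = sum(shares)
    return [s / total for s in shares]

def hhi(shares):
    """Herfindahl-Hirschman index of in-line shares (sum of squared shares)."""
    return sum(w * w for w in in_line_shares(shares))

def sd_log_share(shares):
    """Sample standard deviation of the log of in-line shares."""
    return statistics.stdev([math.log(w) for w in in_line_shares(shares)])

even_line = [0.02, 0.02, 0.02, 0.02]    # made-up, evenly spread shares
skewed_line = [0.05, 0.01, 0.01, 0.01]  # made-up, concentrated shares
print(hhi(even_line), sd_log_share(even_line))
print(hhi(skewed_line), sd_log_share(skewed_line))
```

With n equally sized features the index equals 1/n, its lower bound; both measures rise as sales concentrate on a few features, which the note interprets as more homogeneous preferences.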

    Table 3: Demand Estimation

    Dependent variable is Ln(Share1) - Ln(Share0)
                                       OLS        2SLS        2SLS        2SLS        Mean
                                       (1)         (2)         (3)         (4)       (Std)
    Preference Heterogeneity       0.02***     0.23***     0.41***     0.49***
                                    (0.00)      (0.01)      (0.02)      (0.02)
    Price                          -0.13**    -2.53***    -2.38***   -49.50***        0.25
                                    (0.05)      (0.19)      (0.22)      (2.92)      (0.07)
    Ln(Median Income)                                                  3.08***       10.94
                                                                        (0.30)      (0.16)
    Ln(Median Age)                                                     3.33***        3.55
                                                                        (0.60)      (0.08)
    Ruffle cut                    -0.01***    -0.11***                                0.28
                                    (0.00)      (0.01)                              (0.45)
    Wavy cut                          0.01     0.03***                                0.13
                                    (0.01)      (0.01)                              (0.33)
    Fat free                          0.01    -0.06***                                0.09
                                    (0.01)      (0.01)                              (0.28)
    Reduced fat                      -0.01    -0.15***                                0.15
                                    (0.01)      (0.01)                              (0.35)
    Flavor fe                          Yes         Yes          No          No
    Product fe                          No          No         Yes         Yes
    Market fe                          Yes         Yes         Yes         Yes
    Observations                     30930       30930       30930       30930

    Note: This table reports demand estimates from the nested logit model. The dependent variable in all columns is the difference between the logarithm of total Frito Lay shares and the logarithm of total shares of outside goods. Column (1) uses OLS; columns (2)-(4) use 2SLS with three sets of instrumental variables: BLP instruments (sums of flavor, cut, and fat dummies for other features in the same serving-city-quarter), Hausman instruments (average price of the same feature in other cities within the serving-quarter, and prices of materials including potatoes, sugar, soy bean oil, edible butter, and edible tallow), and competition environment (number of competitor firms and number of competitor UPCs other than Company A chips within the serving-city-quarter).

    Table 4: Demand Estimation with Varieties of Preference Heterogeneity

    2SLS, dependent variable is Ln(Share1) - Ln(Share0)
                                                     Small Interquartile Interquartile                                 Above p50
                                      Baseline      Packaged   Income (1M)   Age (10 yr)         Asian      Hispanic      Hispanic
                                           (1)           (2)           (3)           (4)           (5)           (6)           (7)
    Preference Heterogeneity           0.41***       0.67***       0.36***       0.30***       0.41***       0.42***       0.36***
                                        (0.02)        (0.04)        (0.02)        (0.04)        (0.02)        (0.02)        (0.02)
    Diversity Measure                                              1.48***       0.04***       0.47***         -0.09       0.12***
                                                                    (0.19)        (0.01)        (0.15)        (0.06)        (0.01)
    Price                             -2.38***      -2.74***      -2.99***      -2.62***      -2.57***      -2.30***      -2.87***
                                        (0.22)        (0.29)        (0.24)        (0.24)        (0.23)        (0.22)        (0.23)
    Product fe                             Yes           Yes           Yes           Yes           Yes           Yes           Yes
    Observations                         30930          9155         30930         30930         30930         30930         30930
    Adjusted R-squared                    0.81          0.41          0.79           0.8          0.80          0.81          0.78
    Summary statistics of population diversity measure
      Mean                                                            0.06          3.39          0.04           0.1           0.5
      Min                                                             0.04          2.75          0.00          0.00          0.00
      Max                                                             0.10          4.47          0.31          0.53          1.00

    Note: This table reports demand estimates from the nested logit model, allowing preference heterogeneity to vary with observables. The dependent variable in all columns is the difference between the logarithm of total Frito Lay shares and the logarithm of total shares of outside goods. All columns are estimated by 2SLS with three sets of instruments: BLP instruments, Hausman instruments, and competition environment. Column (1) is the baseline estimate for large-package potato chips and is identical to column (3) in Table 3. Column (2) reports estimates from the identical specification for small-package chips (1-4 serving sizes). Columns (3)-(7) allow preference heterogeneity to vary with different measures of population diversity: column (3) uses the interquartile range of income, column (4) the interquartile range of age, column (5) the Asian population ratio, column (6) the Hispanic population ratio, and column (7) a discretized Hispanic population ratio, i.e., a dummy for an above-median Hispanic population ratio.

    Table 5: Supply Side Parameter Estimation

                                    Linear Cost             Nonlinear Cost
                                    b         s.e.          b         s.e.
    Cost ! (1K $ / 1M HH)         3.56      (0.86)        2.08      (1.38)
    Cost ! (1K $ / 1M HH)                                 6.14      (0.65)
    Precision Ratio !/            2.55      (0.02)        2.42      (0.02)
    Scale of fixed cost !         0.14      (0.00)        0.02      (0.00)
    Prior mean !                  Integrated              Integrated
    Log Likelihood              -83.37                  -63.61

    Note: The cost function for linear specification is = ! , while the cost function for nonlinear specification is ! , !!! = ! + ! (! > !!!) !.

    Figure 1: Distribution of line length and line length changes

    [Two histograms. Left panel: Line Length (horizontal axis from 10 to 30). Right panel: Change in Line Length (horizontal axis from -10 to 10).]

    Note: The left panel plots the distribution of line length across 50 markets over 28 quarters, and the right panel plots the distribution of the change in line length, i.e., the first difference of line length over two consecutive quarters within a market. Line length is defined as the count of products (unique combinations of flavor, fat, and cut) within each city-quarter. Products with positive sales for fewer than 12 weeks within a city-quarter are not counted.


    Figure 2: Identification of line length maintenance cost

    [Two panels, "Low c" and "High c". Each plots Total Share (vertical axis, .3 to .8) against Line length (n) (horizontal axis, 5 to 20), with three curves labeled H, M, and L.]

    Note: This figure illustrates the identification of the line length maintenance cost. In each plot, the thick curves show total market share as a function of line length. I plot three curves with identical variance but different mean values of preference heterogeneity. Total market share is increasing in line length and in preference heterogeneity, and it is supermodular in the two. The straight lines are cost functions, whose slope represents the marginal cost of expanding the line length. The tangency point between the cost line and the market share curve gives the optimal line length. The implied optimal line length is higher when preference heterogeneity is higher. The two plots differ in marginal cost: when cost is lower, line length decisions are more responsive to changes in mean heterogeneity, which completes the identification of cost.


    Figure 3: Model fit - two markets

    [Two panels, BOSTON and DETROIT. Each plots line length (vertical axis, 15 to 30) by quarter from 2001q3 to 2007q3, with three series: Actual, Signal from simulation, and Signal from data.]

    Note: This figure shows how the model fits the data in two cities: Boston and Detroit. Solid lines are actual line length decisions, and the two dashed lines are line length decisions from simulation. In the first simulation, "signal from data", market signals are taken from the data; in the second simulation, "signal from simulation", market signals are also simulated from the model. Prior mean in the first