
Modeling Vehicle Choice and Simulating Market Share with Bayesian Networks

A case study about predicting the U.S. market share of the Porsche Panamera

using the Bayesia Market Simulator

White Paper 2010/II

Stefan Conrady, [email protected]

Dr. Lionel Jouffe, [email protected]

December 18, 2010

Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting

Table of Contents

Abstract/Executive Summary
Objective
About the Authors
Stefan Conrady
Lionel Jouffe
Acknowledgements
Introduction
Bayesian Networks for Choice Modeling
Case Study
Porsche Panamera
Common Forecasting Practices
Tutorial
Data Preparation
Consumer Research
Variable Selection
Set of Choice Alternatives
Filtered Values (Censored States)
Data Modeling
Data Import
Missing Values
Discretization
Variable Classes and Forbidden Arcs
Unsupervised Learning
Simulation
Product Scenario Baseline
Product Scenario Simulation
Substitution and Cannibalization
Market Scenario Simulation
Limitations
Outlook
Summary
Appendix
Utility-Based Choice Theory
Multinomial Logit Models
Stated Preference Data
Revealed Preference Data
NVES Variables
References
Contact Information
Conrady Applied Science, LLC
Bayesia SAS
Copyright

Simulating Market Share with the Bayesia Market Simulator

Conrady Applied Science, LLC - www.conradyscience.com

Modeling Vehicle Choice and Simulating Market Share with Bayesian Networks

Abstract/Executive Summary

We present a new method and the associated workflow for estimating market shares of future products based exclusively on pre-introduction data, such as syndicated studies conducted prior to product launch. Our approach provides a highly practical, fast and economical alternative to conducting new primary research.

With Bayesian networks as the framework, and by employing the BayesiaLab and Bayesia Market Simulator software packages, this approach helps market researchers and product planners to reliably perform market share simulations on their desktop computers [1], which would have been entirely inconceivable in the past.

This innovative approach is explained step-by-step in a study about the introduction of the new Porsche Panamera in the U.S. market. The results confirm that market share simulation with Bayesian networks is feasible even in niche markets that provide relatively few observations.

We believe that making this method and the tools accessible to practitioners is an important contribution to real-world marketing. We are confident that for many companies this approach can yield a step-change in their forecasting ability.

Objective

This tutorial is intended for marketing practitioners who are exploring the use of Bayesian networks for their work. The example in this tutorial is meant to illustrate the capabilities of BayesiaLab with a real-world case study and actual consumer data. Beyond market researchers, analysts in many fields will hopefully find the proposed methodology valuable and intuitive. In this context, many of the technical steps are outlined in great detail, such as data preparation and the network learning, as they are applicable to research with BayesiaLab in general, regardless of the domain.

This paper is part of a series of tutorials, which are exploring a broad range of real-world applications of Bayesian networks.

[Figure: Market Share Simulation Workflow with BayesiaLab and Bayesia Market Simulator. Elements: Market Data (from Survey), Modeling (BayesiaLab), Market Model (Bayesian Network), Scenario Definition (from Analyst), Simulation (Bayesia Market Simulator), Market Shares Projection.]

[1] BayesiaLab and Bayesia Market Simulator can run on a wide range of operating systems, including Windows, OS X, Linux/Unix, etc.

About the Authors

Stefan Conrady

Stefan Conrady is the co-founder and managing partner of Conrady Applied Science, LLC, a privately held consulting firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied Science was appointed the authorized sales and consulting partner of Bayesia SAS for North America. Stefan Conrady has many years of marketing, product planning and market research experience with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars and Nissan. In the context of these management assignments, Stefan has been based in Europe, North America and Asia.

Lionel Jouffe

Dr. Lionel Jouffe is co-founder and CEO of France-based Bayesia SAS. Lionel Jouffe holds a Ph.D. in Computer Science and has been working in the field of Artificial Intelligence since the early 1990s. He and his team have been developing BayesiaLab since 1999 and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of market research, is highlighted by Bayesia’s strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.

Acknowledgements

Strategic Vision, Inc. [2] (SVI) has generously made their 2009 New Vehicle Experience Survey available as a data source for this case study. In this context, special thanks go to Alexander Edwards, President, Automotive Division of Strategic Vision.

We would also like to thank Jeff Dotson [3], John Fitzgerald [4] and Frank Koppelman [5] for their ongoing coaching and their valuable comments on this paper. However, all errors remain the responsibility of the authors.

Finally, Kenneth Train’s [6] books and articles have been very helpful over the years as we explored the field of consumer choice modeling.

[2] www.strategicvision.com
[3] Assistant Professor of Marketing, Vanderbilt University, Owen Graduate School of Management.
[4] President, Fitzgerald Brunetti Productions, Inc., New York.
[5] Professor Emeritus of Civil and Environmental Engineering, Robert R. McCormick School of Engineering and Applied Science, Northwestern University.
[6] Adjunct Professor of Economics and Public Policy, University of California, Berkeley.

Introduction

For the vast majority of businesses, market share is a key performance indicator. Market share is used as a metric that allows comparing competitive performance independently from overall market size and its fluctuations. In the product planning process, the expected market share is critical, along with the overall market forecast, as together they define the sales volume expectation, which, for obvious reasons, is a key element in most business cases.

As a result, it is critical for decision makers to correctly predict the future market shares of products not yet developed. The task of such market share forecasts typically falls to marketing and market research departments, which are most closely involved with understanding consumer behavior and, more specifically, the product choices consumers make.

If we fully understood the consumer’s decision making process and observed all components of it, we could simply generate a deterministic model for predicting future consumer choices. However, we do not, and it is obvious that many elements contributing to a consumer’s purchase decision are inherently unobservable. Despite our limited comprehension of the true human choice process, there are a number of tools that still allow modeling consumer choice with what is observable, and accounting for what will remain unknowable. In this context, and based on the seminal works of Nobel laureate Daniel McFadden [7], choice modeling has emerged as an important tool in understanding and simulating consumer choice.

Such choice models serve as a representation of the “real world” and thus become what Judea Pearl likes to call “oracles” that allow us to “deliberately reason about the consequences of actions we have not yet taken.” [8]

Bayesian Networks for Choice Modeling

Using Bayesian networks [9] as the general framework for modeling a domain or system has many advantages, which Darwiche (2010) summarizes as follows:

• “Bayesian networks provide a systematic and localized method for structuring probabilistic information about a situation into a coherent whole […]”

• “Many applications can be reduced to Bayesian network inference, allowing one to capitalize on Bayesian network algorithms instead of having to invent specialized algorithms for each new application.”

Given the very attractive properties of Bayesian networks for representing a wide range of problem domains, it seems appropriate to apply them to choice modeling as well. In particular, the BayesiaLab software package has made it very convenient to automatically machine-learn fairly large and complex Bayesian networks from observational data.

Beyond the convenience and speed of estimating Bayesian networks with BayesiaLab, there are four fundamental differences in modeling consumer choice with Bayesian networks compared to traditional discrete choice models. [10]

1. Whereas utility-based choice models, such as multinomial logit models (MNL), will “flatten” the vector of attribute utilities into a single scalar value, Bayesian networks do not inherently restrict all the dimensions relating to choice. For example, learning a Bayesian network on observed vehicle choices might reveal that fuel economy and vehicle price are subject to tradeoff, while safety is a nonnegotiable basic requirement for the consumer. Correctly recognizing such dynamics is obviously critical for making predictions about future consumer choices.

2. Bayesian networks are nonparametric and therefore do not require the specification of a functional form. No assumptions need to be made regarding the form of links between variables. Potentially nonlinear patterns are therefore not an issue for model estimation or simulation.

3. Bayesian networks are inherently probabilistic and as such there is no need to specify an error term. An error term would be needed in a traditional choice model to make it non-deterministic.

4. In BayesiaLab all computations are natively discrete and therefore no transformation functions, such as logit or probit, are needed. Given that we are dealing with discrete consumer choices, this all-discrete approach is an advantage.

[7] Daniel McFadden received, jointly with James Heckman, the 2000 Nobel Memorial Prize in Economic Sciences; McFadden’s share of the prize was “for his development of theory and methods for analyzing discrete choice”.
[8] A recurring quote from Judea Pearl’s many lectures on causality.
[9] A Bayesian network is a graphical model that represents the joint probability distribution over a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. A very concise introduction to Bayesian networks can be found in Darwiche (2010).
[10] A very brief overview of utility-based choice models is provided in the appendix.

For our case study we use BayesiaLab 5.0 Professional Edition to learn a Bayesian network from consumer choices in the form of stated preference (SP) or revealed preference (RP) data. [11, 12] The learned Bayesian network allows us to compute the posterior probability distribution in each choice situation, including hypothetical product alternatives (and even hypothetical consumers). As a result we obtain a choice probability as a function of product and consumer attributes.
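To make the notion of a posterior choice probability concrete, here is a minimal Python sketch that applies Bayes' rule to a tiny two-node network. The node names, states and all probabilities are invented for illustration; they are not taken from NVES, and the code is not how BayesiaLab performs inference internally.

```python
# Toy two-node Bayesian network: Segment -> Income.
# All numbers are hypothetical and only illustrate the arithmetic.

# Prior P(Segment)
p_segment = {"Luxury Car": 0.5, "Luxury Utility": 0.3, "Premium Coupe": 0.2}

# Conditional probability table P(Income | Segment)
p_income_given_segment = {
    "Luxury Car":     {"<250k": 0.6, ">=250k": 0.4},
    "Luxury Utility": {"<250k": 0.7, ">=250k": 0.3},
    "Premium Coupe":  {"<250k": 0.4, ">=250k": 0.6},
}


def posterior_segment(income):
    """Return P(Segment | Income=income) via Bayes' rule."""
    # Joint probability P(Segment, Income=income) for every segment.
    joint = {s: p_segment[s] * p_income_given_segment[s][income] for s in p_segment}
    # Normalizing constant P(Income=income).
    evidence = sum(joint.values())
    return {s: v / evidence for s, v in joint.items()}


posterior = posterior_segment(">=250k")
for segment, prob in posterior.items():
    print(f"{segment}: {prob:.3f}")
```

With more nodes the same principle applies, although practical tools use dedicated inference algorithms rather than enumerating the joint distribution.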

In order to obtain a product’s projected market share, we then need to simulate choice probabilities across all product scenarios and across all individuals in the population under study. For this specific purpose Bayesia SAS has developed the Bayesia Market Simulator, which uses the Bayesian networks generated by BayesiaLab. Both tools will play a central role in this case study.
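The aggregation step described here can be sketched as a weighted average of individual choice probabilities. The following Python fragment is a simplified illustration under stated assumptions: the respondent weights and probabilities are made up, the function name `market_shares` is hypothetical, and the internal algorithm of the Bayesia Market Simulator is not documented in this paper and may differ.

```python
# Hypothetical choice probabilities for three respondents over three products.
# In practice these would be posterior probabilities computed from the network.
respondents = [
    {"weight": 20.0, "probs": {"Panamera": 0.10, "S-Class": 0.60, "7-series": 0.30}},
    {"weight": 30.0, "probs": {"Panamera": 0.20, "S-Class": 0.30, "7-series": 0.50}},
    {"weight": 50.0, "probs": {"Panamera": 0.05, "S-Class": 0.45, "7-series": 0.50}},
]


def market_shares(rows):
    """Weighted average of individual choice probabilities per product."""
    total_weight = sum(r["weight"] for r in rows)
    totals = {}
    for r in rows:
        for product, p in r["probs"].items():
            totals[product] = totals.get(product, 0.0) + r["weight"] * p
    return {product: t / total_weight for product, t in totals.items()}


shares = market_shares(respondents)
for product, share in shares.items():
    print(f"{product}: {share:.1%}")
```

Because each respondent's probabilities sum to one, the resulting shares also sum to one.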

Case Study

To illustrate the entire market share estimation process with Bayesian networks, we have derived a case study from the U.S. auto industry. More specifically, we will model consumer choice behavior in the high-end vehicle market based on 2009 survey data. This is an interesting point in time, as it precedes the launch of the new Porsche Panamera in model year 2010 (MY 2010), which will be the focus of our study.

Porsche Panamera

After the highly successful Cayenne, a four-door luxury SUV, the Panamera is Porsche’s second vehicle with four doors. Clearly influenced by the legendary 911’s styling, the Panamera offers sports-car looks and performance while comfortably accommodating four passengers. It enters a segment with well-established contenders, such as the Mercedes-Benz S-Class [13], the BMW 7-series [14] and the Audi A8 [15], shown below in that order.

[Images: Mercedes-Benz S-Class, BMW 7-series, Audi A8]

Beyond these traditional premium sedans, there are a number of less conventional products that one can assume to be in the Panamera’s competitive field as well. The coupe-like Mercedes-Benz CLS [16] would probably fall into this category.

Finally, the new Panamera may draw customers away from Porsche’s own product offerings, such as the Cayenne [17], an effect that is often referred to as “product substitution” or “product cannibalization.”

It is not our intention to speculate about potential product interactions, but rather to attempt learning from revealed consumer behavior in a very formal way with Bayesian networks.

[11] The properties of Stated Preference (SP) and Revealed Preference (RP) data are explained in the appendix.
[12] Although we focus here exclusively on machine-learning consumer behavior, within the BayesiaLab framework we can also utilize expert knowledge about consumer behavior. For instance, vehicle dealers and their salespeople will have extensive knowledge about how consumers behave in the showroom. A special Knowledge Elicitation module in BayesiaLab can formally capture such expertise and build a new Bayesian network from it or augment an existing one. Knowledge Elicitation with BayesiaLab will be the subject of a separate tutorial to be published in the near future.
[13] MY 2010 shown
[14] MY 2009 shown
[15] MY 2009 shown
[16] MY 2010 shown
[17] MY 2009 shown

In order not to prematurely restrict our consumer choice set, we have defined a broad set of competitors for our purposes and included all non-domestic luxury vehicles [18] (including Light Trucks) priced above $75,000. [19]

What was certainly a very real task for Porsche’s product planning team in recent years, i.e. predicting the Panamera market share, now becomes the topic of our case study and tutorial. Our objective is to predict what market share the Panamera will achieve without conducting any new research, strictly using RP data from before the product launch.

Common Forecasting Practices

Although we have no knowledge of the specific forecasting methods at Porsche, we know from industry experience that volume and market share forecasts are often determined through a long series of negotiations [20] between stakeholders, typically with an optimistic marketing group on one side and a skeptical CFO on the other. While expert consensus may indeed be a reasonable heuristic for business planning, the lack of forecasting formalisms is often justified by saying that forecasting is at least as much art as it is science.

The authors believe strongly that there is great risk in relying too heavily on “art”, which is inherently non-auditable, and have therefore been pursuing easily tractable, but scientifically sound methods to support managerial decision making, especially in the context of forecasting. With this in mind, this very formal and structured forecasting exercise was consciously chosen as the topic of the tutorial.

Tutorial

In this tutorial we will explain each step from data preparation to market share simulation using BayesiaLab and Bayesia Market Simulator, according to the following outline:

1. Data preparation (external)
2. BayesiaLab:
   a. Data import
   b. Data modeling
3. Baseline product scenario generation (external)
4. Bayesia Market Simulator:
   a. Network import
   b. Definition of scenarios
   c. Market share simulation

Notation

To clearly distinguish between natural language, software-specific functions and study-specific variable names, the following notation is used:

• BayesiaLab and Bayesia Market Simulator functions, keywords, commands, etc., are shown in bold type.
• Variable/node names are capitalized and italicized.

[18] We followed the SVI segmentation and included “Luxury Car”, “Premium Coupe”, “Premium Convertible/Roadster” and “Luxury Utility” in our selection.
[19] The $75,000 threshold was chosen as it marks the lower end of the Panamera price range.
[20] As an interesting aside, these negotiations are usually Markovian in nature, i.e. the starting point of today’s negotiation only depends on the outcome of the previous negotiation.

Data Preparation

Consumer Research

This tutorial utilizes the 2009 New Vehicle Experience Survey (NVES), a syndicated study conducted annually by Strategic Vision, Inc., which surveys new vehicle buyers in the U.S. This study is widely used in the auto industry and it serves as one of the primary market research tools. NVES contains over 1,000 variables and close to 200,000 respondent records. In large auto companies, hundreds of analysts typically have access to NVES, most often through the mTAB interface provided by Productive Access, Inc. (PAI). [21]

Variable Selection

Compared to traditional statistical models, Bayesian networks require much less “care” in terms of variable selection, as overparameterization is generally not an issue. So, although we could easily start with all 1,000+ variables, for expositional clarity we will initially select only about 50 variables [22] from the following categories, which we assume to capture relevant characteristics of both the consumer and the product:

1. Vehicle/product attributes, e.g. brand, segment, number of cylinders, transmission, drive type, etc.
2. Consumer demographics, e.g. age, income, gender, etc.
3. Vehicle-related consumer attitudes, e.g. “I want to look good when driving my vehicle”, “I want a basic, no-frills vehicle that does the job,” etc.

Set of Choice Alternatives

Beyond variable selection, we must also define the set of choice alternatives and assume which vehicles a potential Panamera customer would consider. Not only that, but we also need to make sure that all choice alternatives for the Panamera’s choice alternatives are included. For instance, if we included the Porsche Cayenne in the choice set, then the Mercedes-Benz M-Class and the BMW X5 should be included too, and so on. One might argue that the vehicle purchase might be an alternative to a kitchen renovation or the purchase of a boat. Expert knowledge is clearly required at this point as to how far to expand the choice set. Furthermore, SVI’s NVES can also help us in this regard as it contains questions about what vehicles actual buyers did consider and which vehicles they disposed of in the context of their most recent purchase. [23]

As mentioned in the case study introduction, we included “Luxury Car”, “Premium Coupe”, “Premium Convertible/Roadster” and “Luxury Utility” [24] in the choice set and we further restricted it by excluding all domestic vehicles and vehicles priced below $75,000. For this segment of assumed Panamera competitors we have approximately 1,200 unweighted observations in the 2009 NVES, which, on a weighted basis, reflect approximately 25,000 vehicles purchased in 2009.
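As an illustration of this filtering step, the following Python sketch applies the three criteria (segment, non-domestic, price of at least $75,000) to a handful of made-up records and reports the unweighted and weighted counts. The field names (`segment`, `domestic`, `price`, `weight`) are assumptions for this example and do not correspond to actual NVES variable names.

```python
# Segments included in the choice set, as named in the text.
SEGMENTS = {
    "Luxury Car",
    "Premium Coupe",
    "Premium Convertible/Roadster",
    "Luxury Utility",
}
PRICE_FLOOR = 75_000  # lower end of the Panamera price range

# Hypothetical respondent records; field names are illustrative only.
records = [
    {"segment": "Luxury Car", "domestic": False, "price": 92_000, "weight": 21.5},
    {"segment": "Luxury Car", "domestic": True, "price": 88_000, "weight": 19.0},
    {"segment": "Luxury Utility", "domestic": False, "price": 61_000, "weight": 25.0},
    {"segment": "Premium Coupe", "domestic": False, "price": 110_000, "weight": 18.5},
    {"segment": "Midsize Car", "domestic": False, "price": 80_000, "weight": 30.0},
]


def in_choice_set(record):
    """Keep non-domestic vehicles in the selected segments at or above the price floor."""
    return (
        record["segment"] in SEGMENTS
        and not record["domestic"]
        and record["price"] >= PRICE_FLOOR
    )


choice_set = [r for r in records if in_choice_set(r)]
unweighted_n = len(choice_set)                     # number of respondent records
weighted_n = sum(r["weight"] for r in choice_set)  # vehicles represented
print(unweighted_n, weighted_n)
```

The distinction between the two counts mirrors the text: the unweighted count is the number of survey records, while the weighted count estimates the number of vehicles those records represent.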

[21] www.paiwhq.com
[22] A list of all variables used is given in the appendix. It should be noted that even 50 variables would create a major computational challenge with MNL models.
[23] Martin Krzywinski’s visualization tool, Circos, is highly recommended for the interpretation of cross-shopping behavior: www.mkweb.bcgsc.ca/circos/
[24] According to SVI’s segment definition.

Filtered Values (Censored States)

Although in BayesiaLab we can be less rigorous regarding the maximum number of variables, we still need to be conscious of the information contained in them.

For instance, we need to distinguish unobserved values from non-existing values, although at first glance both appear to be “simple” missing values in the database. BayesiaLab has a unique feature that allows treating non-existing values as Filtered Values or Censored States.

To explain Filtered Values we need to resort to an automotive example from outside our specific study. We assume that we have two questions about trailer towing. We first ask, “do you use your vehicle for towing?”, and then, “what is the towing weight?” If the response to the first question is “no”, then a value for the second one cannot exist, which in BayesiaLab’s nomenclature is a Filtered Value or Censored State. We actually must not impute a value for towing weight in this case and instead a Filtered Value code will indicate this special condition.

On the other hand, a respondent may answer “yes”, but then fail to provide a towing weight. In this case, a true value for the towing weight exists, but we cannot observe it. Here it is entirely appropriate to impute a missing value, as we will explain as part of the Data Import procedure.

To indicate Filtered Values to BayesiaLab, we will need to apply a study-specific logic and recode the relevant variables in the original database. Most statistical software packages have a set of functions for this kind of task.

For example, in STATISTICA this can be done with the Recode function.

Alternatively, this recoding logic can also be expressed with the following pseudo code:

IF towing=yes THEN towing weight=unchanged
IF towing=no THEN towing weight=FV (Filtered Value)

A simple Excel function will achieve the same and it is assumed that the reader can implement this without further guidance.
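For readers who prefer a script over a spreadsheet, the recoding rule can be sketched in Python as follows. The marker string `"FV"` is a placeholder chosen for this example; the actual symbol that BayesiaLab expects for Filtered Values is not specified in this paper. `None` stands for a value that exists but was not observed, which is left in place for later imputation.

```python
FILTERED = "FV"  # placeholder marker; the symbol BayesiaLab expects is an assumption here


def recode_towing_weight(towing, towing_weight):
    """Apply the Filtered Value rule to one respondent record."""
    if towing == "no":
        # The towing weight cannot exist, so it is marked as filtered, not missing.
        return FILTERED
    # towing == "yes": keep the value unchanged; None means unobserved (truly missing).
    return towing_weight


rows = [
    {"towing": "yes", "towing_weight": 3500},
    {"towing": "no", "towing_weight": None},
    {"towing": "yes", "towing_weight": None},
]
recoded = [recode_towing_weight(r["towing"], r["towing_weight"]) for r in rows]
print(recoded)
```

The third record illustrates the case discussed above: the respondent tows, so a true weight exists, and the value stays missing rather than filtered.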

Although Filtered Values are very important in many research contexts, hence the emphasis here, our case study does not require using them.

Data Modeling

Data Import

To start the analysis with BayesiaLab, we first import the database, which needs to be formatted as a CSV file. [25] With Data>Open Data Source>Text File, we start the Data Import wizard, which immediately provides a preview of the data file.

The table displayed in the Data Import wizard shows the individual variables as columns and the respondent records as rows. There are a number of options available, such as for Sampling. However, this is not necessary in our example given the relatively small size of the database.

Clicking the Next button prompts a data type analysis, which provides BayesiaLab’s best guess regarding the data type of each variable.

Furthermore, the Information box provides a brief summary regarding the number of records, the number of missing values, filtered states, etc.
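A comparable summary can be produced outside BayesiaLab as a sanity check before import. The sketch below uses Python's standard `csv` module on a small inline sample; the column names and values are invented, and an empty field is assumed to denote a missing value.

```python
import csv
import io

# Small inline sample standing in for the exported survey file (hypothetical columns).
csv_text = """id,age_bracket,purchase_price,weight
1,45-54,92000,21.5
2,,110000,18.5
3,55-64,,19.0
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

n_records = len(rows)
# Count empty fields per column; an empty string is treated as a missing value.
missing_per_column = {
    col: sum(1 for r in rows if r[col] == "") for col in rows[0]
}

print(n_records)
print(missing_per_column)
```

For a real file, `io.StringIO(csv_text)` would be replaced by an opened file object.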

For this example, we will need to override the default data type for the Unique Identifier variable, as each value is a nominal record identifier rather than a numerical scale value. We can change the data type by highlighting the Unique Identifier column and clicking the Row Identifier check box, which changes the color of the Unique Identifier column to beige.

Although it is not imperative to maintain a Row Identifier, and we could instead assign the Not Distributed status to the Unique Identifier variable, it can be quite helpful for finding individual respondent records at a later point in the analysis.

As the respondent records in the NVES survey are weighted, we need to select the Weight by clicking on the Combined Base Weight variable, which will turn the column green.

[25] CSV stands for “comma-separated values”, a common format for text-based data files. As an alternative to this import format, BayesiaLab offers a JDBC connection, which is practical when accessing large databases on servers.

Missing Values

In the context of data import, it is important to point out how missing values are treated in BayesiaLab. The native, automatic processing of missing values reveals a particular strength of BayesiaLab.

In traditional statistical analysis, the analyst has to choose from a number of methods to handle missing values in a database, but unfortunately many of them have serious drawbacks. Perhaps the most common method is case-wise deletion, which simply excludes records that contain any missing values. Casually speaking, this means throwing away lots of good data (the non-missing values) along with the bad (the missing values). Another method is means-imputation, by which any missing value is filled in with the variable’s mean. Inevitably, this reduces the variance of the variable and thus has an impact on its summary statistics, which is clearly undesirable considering the intended analysis. In the case of discrete distributions, means-imputation typically also introduces a bias. There are other, better techniques, which typically demand significant computational effort and thus often turn into a labor-intensive standalone project rather than being just a preparatory step.
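The variance-shrinking effect of means-imputation is easy to verify numerically. The following Python sketch uses five invented observations and three missing values; the numbers are purely illustrative.

```python
import statistics

# Hypothetical observed purchase prices in $1,000; three further values are missing.
observed = [78.0, 82.0, 95.0, 110.0, 135.0]
n_missing = 3

# Means-imputation: every missing value is replaced by the observed mean.
mean_observed = statistics.mean(observed)
imputed = observed + [mean_observed] * n_missing

var_before = statistics.pvariance(observed)  # variance of the observed values
var_after = statistics.pvariance(imputed)    # variance after imputation

print(mean_observed, var_before, var_after)
```

The mean is unchanged, but the imputed values contribute zero squared deviation while increasing the record count, so the variance shrinks by the factor 5/8 in this example.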

Without going into too much detail at this point, BayesiaLab can estimate all missing values given the learned network structure using the Expectation Maximization (EM) algorithm. As a result, we obtain a complete database without “making things up.” In traditional statistics, the equivalent would be to say that neither the mean nor the variance of the variables is affected by the imputation process.

Continuing in our data import process, the next screen provides options as to how to treat the missing values. Clicking the small upside-down triangle next to the variable names brings up a window with key statistics of the selected variable, in this case Age Bracket.

The very basic functions of filtering, i.e. case-wise deletion, and mean/modal value imputation are available. However, at this point, we can take advantage of BayesiaLab’s advanced missing values processing algorithms. We will select Dynamic Completion, which will continuously “fill in” and “update” the missing values according to the conditional distribution of the variable, as defined by the current structure of the network. However, as our network is not yet connected and hence does not have a structure, BayesiaLab will draw from the marginal distribution of each variable to “tentatively” establish placeholder values for each missing value.

A screenshot from STATISTICA, where we have done most of the preprocessing, shows the marginal distribution of the Age Bracket variable in the form of a histogram. [26]

The missing Age Bracket values will be drawn from this marginal distribution and are used as placeholders, until we can use the structure of the Bayesian network to reestimate our missing values. As Dynamic Completion implies, BayesiaLab performs this on a continuous basis in the background, so at any point we would have the best possible estimates for the missing values, given the current network structure.
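The initial placeholder step, i.e. drawing missing values from the observed marginal distribution, can be sketched as follows. This is a generic illustration, not BayesiaLab's implementation: the function name `fill_from_marginal` and the sample data are hypothetical, and the later re-estimation from the network structure is not covered.

```python
import random
from collections import Counter


def fill_from_marginal(values, rng):
    """Replace None entries with draws from the empirical marginal distribution."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    states = list(counts)
    frequencies = [counts[s] for s in states]
    return [
        v if v is not None else rng.choices(states, weights=frequencies, k=1)[0]
        for v in values
    ]


# Hypothetical Age Bracket column with two missing entries.
age_bracket = ["35-44", "45-54", None, "45-54", "55-64", None, "45-54"]
completed = fill_from_marginal(age_bracket, random.Random(0))
print(completed)
```

Unlike means-imputation, repeated draws of this kind preserve the shape of the marginal distribution on average, which is why they are reasonable as tentative placeholders.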

Discretization

The next step is the Discretization and Aggregation dia-

logue, which allows the analyst to determine the type of discretization, which must be performed on all continu-

ous variables.27 We will use the Purchase Price variable

to explain the process. Highlighting a variable will show the default discretization algorithm while the graph

panel is initially blank.

By clicking on the Type drop-down menu, the choice of

discretization algorithms appears.

Selecting Manual will show a cumulative graph of the

Purchase Price distribution, and we can see that it ranges from $75,000 to $180,000.28

We could now manually select binning thresholds by

way of point-and-click directly on the graph panel. This

Simulating Market Share with the Bayesia Market Simulator

Conrady Applied Science, LLC - www.conradyscience.com 10

26 The normal curve in the histogram is just for illustration purposes. BayesiaLab always uses the actual discrete distri-

bution, not a parametric approximation.

27 BayesiaLab requires discrete distributions for all variables.

28 $75,000 was previously selected as the lower boundary for this particular vehicle segment. $180,000 was the highest

reported price in NVES.

Page 14: Bayesia Lab Choice Modeling 1

might be relevant, if there were government regulations

in place with speci"c vehicle price thresholds.29

For our purposes, however, we want to create price cate-

gories that are meaningful in the context of our vehicle

segment and "ve bins may seem like a reasonable start-ing point.

Clicking Generate Discretization will prompt us to select

the type of discretization and the number of desired in-tervals. Without having a-priori knowledge about the

distribution of the Price variable, we may want to start

with the Equal Distances algorithm.

The resulting view shows the generated intervals, and by clicking on the interval boundaries we can see the percentage of cases falling into the adjacent intervals.

We learn from this that our bottom two intervals contain 89% of the cases, whereas the top two intervals contain just under 5% of the cases. This suggests that we may not have enough granularity to characterize the bulk of the market towards the bottom end of the price spectrum. Perhaps we also have too few cases within the top two intervals. So we will generate a new discretization, now with four intervals, and select KMeans as the type this time.

The resulting bins appear much more suitable to describe

our domain.
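To make the difference between the two discretization types concrete, the following sketch bins a small, made-up price sample once with equal-distance thresholds and once with a one-dimensional k-means. It is an illustration in plain Python and not BayesiaLab's implementation; the function names, the quantile-based initialization and the sample prices are all assumptions.

```python
from bisect import bisect_right

def equal_distance_thresholds(values, n_bins):
    """Return n_bins - 1 thresholds that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def kmeans_thresholds(values, n_bins, iterations=50):
    """1-D k-means (Lloyd's algorithm); thresholds are midpoints between sorted centers."""
    data = sorted(values)
    # Initialise centers at evenly spaced quantiles of the data (an arbitrary choice).
    centers = [data[int((i + 0.5) * len(data) / n_bins)] for i in range(n_bins)]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in data:
            nearest = min(range(n_bins), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [sum(c) / len(c) if c else centers[k]
                       for k, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]

def bin_shares(values, thresholds):
    """Fraction of cases falling into each interval defined by the thresholds."""
    counts = [0] * (len(thresholds) + 1)
    for v in values:
        counts[bisect_right(thresholds, v)] += 1
    return [c / len(values) for c in counts]

# Hypothetical, right-skewed purchase prices in dollars (not NVES data).
prices = [76000, 78000, 80000, 82000, 84000, 86000, 88000, 90000, 93000,
          96000, 99000, 103000, 108000, 115000, 125000, 140000, 160000, 180000]

equal = equal_distance_thresholds(prices, 5)
kmeans = kmeans_thresholds(prices, 4)
print("Equal distances:", equal, bin_shares(prices, equal))
print("K-means:        ", kmeans, bin_shares(prices, kmeans))
```

With a skewed sample like this one, equal-width bins tend to leave most cases in the lowest intervals, whereas the k-means thresholds follow the density of the data, which mirrors the observation made in the text.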

We will proceed similarly with the only other continuous

variable in the database, i.e. Age Bracket.

Clicking Finish completes the import process, and 49 variables (columns) from our database are now shown as blue nodes in the Graph Panel, which is the main window for network editing.

Note

For choosing discretization algorithms beyond this example, the following rule of thumb may be helpful:

• For supervised learning, choose Decision Tree.

• For unsupervised learning, choose, in the order of priority, K-Means, Equal Distances or Equal Frequencies.

29 The now-expired luxury tax for passenger cars in the U.S. would be an example of such a policy.

The six nodes in the far-left column reflect product attributes (green), the second-from-left column shows ten demographic attributes (yellow), and all remaining nodes to the right represent 33 vehicle-related attitudes (red). This initial view represents a fully unconnected Bayesian network.

Also, to simplify our nomenclature, we will combine the demographic attributes (yellow) and the vehicle-related

attitudes (red) and refer to them together as “Market”

variables (now all red).

Variable Classes and Forbidden Arcs

One is now tempted to immediately start with Unsupervised Learning to see how all these variables relate to each other.

However, there are two reasons why we need to introduce another step at this point:

1. Our mission is to model the interactions between product variables on the one side and market variables on the other, so we can see the consumer response to products. For instance, we are more interested in learning P(Transmission = “Manual” | Attitude = “Driving is one of my favorite things”) than we are in P(Age < 45 | Number of children under 6 = 2). Hence we focus the learning algorithm on the area of interest, i.e. product attributes vis-à-vis market attributes.

2. We must not learn the dependencies between the product variables themselves because they would simply reflect today’s product offerings and their contingencies, e.g. P(Vehicle Segment = “4-door sedan” | Brand = “Porsche”) = 0. We do want to understand what is available today, but we certainly do not want to encode today’s product scenarios as constraints in the network. Instead, we want to be able to introduce new scenarios, which are not available today.

To focus learning in a specific area, we need to take an indirect approach and tell BayesiaLab “what not to learn.” So, to prevent the algorithm from learning the product-to-product variable relationships, we will “forbid” such arcs.

We first create a Class by highlighting all product nodes and then right-clicking them. From the menu, we then select Properties>Classes>Add.

When prompted for a name, we can choose something descriptive, so we give this new Class the label “Product”.


Having introduced this Class of nodes, we can now very easily manage Forbidden Arcs. More specifically, we want to make all arcs within the Class Product forbidden. A right-click anywhere on the Graph Panel opens up the menu from which we can select Edit Forbidden Arcs.

In the Forbidden Arc Editor, we can select the Class Product both as start and end.

We now repeat the above steps and also create Forbidden Arcs for the Market variables.

As a result, these Forbidden Arc relationships will appear

in the Forbidden Arc Editor and will remain there unless we subsequently choose to modify them.

We are also reminded about the presence of Forbidden Arcs by the symbol in the lower right corner of the screen.
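The effect of these class-based constraints can be sketched outside of BayesiaLab as a simple set of forbidden node pairs. The snippet below is only an illustration of the idea; the node names are a hypothetical subset of the variables, and the helper functions are not part of any BayesiaLab API.

```python
from itertools import permutations

# Hypothetical node names grouped into the two classes used in the tutorial.
classes = {
    "Product": ["Brand", "Engine Type", "Drive Type"],
    "Market": ["Age Bracket", "Gender", "Driving is one of my favorite things to do"],
}

def forbidden_arcs(classes):
    """All directed arcs whose start and end node belong to the same class."""
    arcs = set()
    for nodes in classes.values():
        arcs.update(permutations(nodes, 2))
    return arcs

def is_allowed(arc, forbidden):
    """An arc may be learned only if it is not in the forbidden set."""
    return arc not in forbidden

forbidden = forbidden_arcs(classes)
print(len(forbidden))                                   # 12 directed within-class arcs
print(is_allowed(("Brand", "Gender"), forbidden))       # True: Product -> Market
print(is_allowed(("Brand", "Engine Type"), forbidden))  # False: Product -> Product
```

A structure-learning search restricted in this way can only propose arcs that cross the Product/Market boundary, which is the outcome the Forbidden Arc Editor is used to enforce here.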

Unsupervised Learning

Now that the learning constraints are in place, we continue to learn the network by selecting Learning>Association Discovering>EQ.30

The resulting network may appear somewhat unwieldy at first glance, but upon closer inspection we can see that arcs exist only between Product variables (green) and Market variables (red), which is precisely what we intended by establishing Forbidden Arcs.


30 EQ is one of the unsupervised learning algorithms implemented in BayesiaLab. Koller and Friedman (2009) provide a

comprehensive introduction to learning algorithms.


However, we will not analyze this structure any further, but rather use it solely as a statistical device in the Bayesia Market Simulator. We simply need to save the network in its native xbl file format, so the Bayesia Market Simulator can subsequently import it.

Simulation

With the Bayesia Market Simulator we have the ability to simulate “alternate worlds” for both the Product variables and the Market variables. In most applications, however, marketing analysts will primarily want to study new Product scenarios assuming the Market remains invariant, meaning that consumer demographics and attitudes remain the same.31

It will be the task of the analyst to define new product scenarios, which will need to include all products assumed to be in the marketplace for the to-be-projected timeframe, in our case 2010.32 As many products carry over from one year to the next, e.g. from model year 2010 to model year 2011, it is very helpful to use the currently available products as a baseline scenario, upon which changes can be built. Quite simply, we need to take inventory of the product landscape today. In the current version of Bayesia Market Simulator this step is not yet automated, so a practical procedure for generating the baseline scenario is described in the following section.

Product Scenario Baseline

The idea is that all available product configurations were manifested in the market in 2009 and thus captured in the 2009 NVES.33

It still requires careful consideration as to how many Product variables should be included to generate the baseline product scenario. We want to create a type of coordinate system that allows us to identify products through their principal characteristics. For instance, the following attributes would uniquely define a “Mercedes-Benz S550 4Matic”:

• Brand=“Mercedes-Benz”

• Engine Type=“V8”

• Drive Type=“AWD”

• Transmission=“Automatic”

• Segment=“High Premium”34

• Price=“>$85,795 AND <= $99,378”

Relating consumer attributes and attitudes to these individual product attributes, rather than to the vehicle as a whole, will then allow us to construct hypothetical products during our simulation. To stay with the Mercedes example, we could define a new product by setting the engine type to “V6” and changing the price to “<$85,795”.
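As a small illustration of this “coordinate system” idea, a product scenario can be thought of as a mapping from attribute names to discrete states. The sketch below uses the attribute values listed above; representing a scenario as a Python dictionary is an assumption made for illustration and not the internal format of the Bayesia Market Simulator.

```python
# Attribute "coordinates" of the Mercedes-Benz S550 4Matic as listed above.
s550_4matic = {
    "Brand": "Mercedes-Benz",
    "Engine Type": "V8",
    "Drive Type": "AWD",
    "Transmission": "Automatic",
    "Segment": "High Premium",
    "Price": ">$85,795 AND <= $99,378",
}

# Hypothetical variant: the same vehicle with a V6 in the lowest price bracket.
v6_variant = {**s550_4matic, "Engine Type": "V6", "Price": "<$85,795"}

# Attributes in which the hypothetical product differs from the baseline product.
changed = sorted(k for k in s550_4matic if s550_4matic[k] != v6_variant[k])
print(changed)
```

Because only two coordinates change, the new scenario still shares the remaining attributes, and therefore the learned consumer relationships for them, with the existing product.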

It is easy to imagine how the number of permutations could come to exceed the number of consumers. For instance, in the High Premium segment, we could further differentiate between short-wheelbase and long-wheelbase versions, which would increase the number of baseline product scenarios. We want to find a reasonable balance between product granularity and the ratio of consumers to product scenarios, although we cannot provide the reader with a hard-and-fast rule.

31 The year-to-year invariance assumption of the market has been challenged by many marketing executives during the most recent recession. In this context, many media headlines also proclaimed a paradigm shift in consumer behavior. The authors have believed - then as well as now - that more has remained the same than has changed in terms of consumer attitudes.

32 For expositional simplicity, we make no distinction between model year and calendar year.

33 In our example, we judge this to be a reasonable simplification, even though a small number of automobiles at the very top end of the market, e.g. the Rolls-Royce Phantom, may not be captured in the survey.

34 Using the Strategic Vision segmentation nomenclature, “High Premium” defines a large four-door luxury sedan.

Pricing is obviously a very important part of the product scenario configuration, and here we are confronted with the reality that no two customers pay exactly the same for the identical product, and the survey data makes this very evident. Furthermore, there are numerous product features outside our “coordinate system”, e.g. an optional $6,000 high-end audio system, that would materially affect the price point of an individual vehicle, but which would not move the vehicle into a different category from a consumer’s perspective. With options, an S550 can easily reach a price of over $100,000. Still, we would want such a high-end S550 to be grouped with the standard S550. Thus it is important to define reasonable price brackets that cover the price spectrum of each vehicle and minimize model fragmentation.

During the Data Import stage, BayesiaLab has discretized all continuous numerical values, including Price, and created discrete states. If these discrete states are adequate considering the price positioning and price spectrum of the vehicles under study, we can now leverage this existing binning for generating all current product scenarios and select Data>Save Data.

In the subsequently appearing dialogue box, we need to

select Use the States’ Long Name. It is important that

Use Continuous Values is not checked, otherwise we will lose the discretized states of the Price variable.

This will export all variables and all records, including values from previously performed missing value imputations. The output will be a semicolon-delimited text file, which can be easily imported into Excel or any statistical application, such as SPSS or STATISTICA. The purpose of loading this into an external application is to manipulate the database to extract the unique product combinations available in the market.

In Excel this can be done very quickly by deleting all columns unrelated to the product configuration, which leaves us with just the product attributes.

Excel 2010 (for Windows) and Excel 2011 (for Mac) offer a very convenient feature for quickly removing all duplicates, which is exactly what we want to achieve. We want to know all the unique product configurations currently in the market.

This leaves us with a table of approximately 100 unique product scenario combinations available at the time of the survey.


To make these unique product scenarios available for subsequent use in the Bayesia Market Simulator, we need to save the table as a semicolon-delimited CSV file. This is important to point out, as most programs will save CSV files by default as comma-delimited files.
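For readers who prefer a script to a spreadsheet, the same extraction of unique product configurations can be sketched with Python's standard `csv` module. This is an alternative to the Excel procedure described above, not part of the original workflow; the column names and rows are hypothetical stand-ins for the exported data.

```python
import csv
import io

# Hypothetical excerpt of the exported data: one row per survey respondent.
exported = """Brand;Engine Type;Drive Type;Price;Gender
Mercedes-Benz;V8;AWD;>85795 AND <=99378;Male
Mercedes-Benz;V8;AWD;>85795 AND <=99378;Female
BMW;V8;RWD;<=85795;Male
"""

product_columns = ["Brand", "Engine Type", "Drive Type", "Price"]

def unique_configurations(text, columns):
    """Keep only the product columns and drop duplicate rows, preserving order."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    seen = {}
    for row in reader:
        key = tuple(row[c] for c in columns)
        seen.setdefault(key, None)  # a dict keeps first-seen order
    return list(seen)

def to_semicolon_csv(rows, columns):
    """Write the rows as semicolon-delimited text without trailing blank lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue().rstrip("\n")

baseline = unique_configurations(exported, product_columns)
print(to_semicolon_csv(baseline, product_columns))
```

Setting the delimiter explicitly avoids the comma-delimited default mentioned above, and stripping the trailing newline guards against stray blank lines at the end of the file.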

Product Scenario Simulation

Now that we have the Bayesian network describing the overall market (as an xbl file) as well as the baseline product scenarios (as a csv file), we can proceed to open the Bayesia Market Simulator.

Clicking File>Open will prompt us to open the xbl network file we previously generated with BayesiaLab.

Upon loading we will see the principal interface of the

Bayesia Market Simulator. On the left panel, all nodes of the network appear as variables. We will now need to

separate all variables into Market Variables and Scenario Variables by clicking the respective arrow buttons. In our case, the aptly named Market variables are the Market Variables in BMS nomenclature and Product variables

are the Scenario Variables.

All variables must be allocated before we can continue to Scenario Editing. This also implies that Product variables, which are not to be included as Scenario Variables, must be excluded from the Bayesian network file. If necessary, we will return to BayesiaLab to make such edits.

As we are working with revealed preference (RP) data, every record in our database reflects one vehicle purchase, i.e. “reveals” one choice, and therefore we need to leave the Target Variable and Target State fields blank. These fields would only be used in conjunction with stated preference (SP) data, which includes a variable indicating acceptance versus rejection.

Clicking Scenario Editing opens up a new window. We can now manually add any product scenarios we wish to simulate. Given the potentially large number of scenarios, it will typically be better to load the baseline product scenarios, which were saved earlier.


We can do that by selecting Offer>Import Offers.

We now select the semicolon-delimited CSV file with the baseline product scenarios. It is very important that the CSV file is formatted precisely as specified, for instance, without any extra blank lines.

In case there are any import issues, it can be helpful to review the CSV file in a text editor and to visually inspect the formatting.

Upon successful import, all baseline product scenarios

will appear in the Scenario Editing dialogue.

The analyst can now add any new product scenarios or delete those products that are no longer expected to be in the market.35 Clicking Add Offer adds an additional scenario at the bottom of the product scenario list. In the case of long product scenario lists, this may require scrolling all the way down.

Clicking on the product attributes of any scenario prompts drop-down menus to appear with the available attribute states, e.g. RWD or AWD.36 This also allows us to change attributes of existing products according to the analyst’s requirements.

35 To maintain expositional simplicity, we have added all Panamera versions for the entire year 2010 and not changed any other product scenarios. It should be pointed out that the V6 version of the Porsche Panamera was introduced only in mid-2010. BMW has also launched an additional six-cylinder version of the 7-series as well as AWD variants, which are not reflected in the simulation. Finally, Jaguar has released a new XJ in 2010, while that year marked the runout of the old-generation Audi A8.

For our case study, we will add the following versions of the Panamera as new product scenarios:

• Panamera (V6, RWD)

• Panamera 4 (V6, AWD)

• Panamera S (V8, RWD)

• Panamera 4S (V8, AWD)

• Panamera Turbo (V8 Turbo, RWD)

To characterize all of them as large 4-door luxury sedans, which is the key distinction versus previous Porsche products, we will assign the “High Premium” attribute to them.

Once this is completed, we need to obtain a database that represents the consumer base, on which these new product scenarios will be “tried out”. This can either be done by associating the original database, from which the network was learned, or by creating a new, artificial one that reflects the joint probability distribution of the learned Bayesian network.

The latter can be achieved by selecting Database>Generate.

It is up to the analyst to determine the size of the database to be generated. Although there is no fixed rule, too small a database will limit the observability of products with a very small market share.
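A rough back-of-the-envelope calculation can illustrate why database size matters for small-share products. The sketch below assumes, purely for illustration, that each generated record is an independent draw and that a product is chosen with a fixed probability equal to its market share; the function name and the example share of 0.1% are hypothetical.

```python
def prob_unobserved(share, n_records):
    """Chance that a product with the given share gets no buyer in n independent draws."""
    return (1.0 - share) ** n_records

# Hypothetical product with a 0.1% market share at different database sizes.
for n in (100, 1203, 10000):
    print(n, round(prob_unobserved(0.001, n), 3))
```

Under these simplifying assumptions, the chance of never observing the product shrinks quickly as the generated database grows, which gives the analyst a simple yardstick when choosing its size.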

Alternatively, we can also associate the original database, which contains the survey responses. In our case, the

original database contains 1,203 records, which is very

reasonable in terms of computational requirements.

Once a database is associated, clicking the Simulation button will start the market share estimation process.

36 RWD and AWD stand for rear-wheel drive and all-wheel drive, respectively.

With the given complexity of our network and around

100 product scenarios, the simulation should take no

longer than 30 seconds on a typical desktop computer.

Upon completion, the simulation results will appear in

the form of a pie chart and a table. One can go back and

review the scenarios by clicking the Scenario Editing button.

The aggregated simulated market shares can also be copied from the results table and pasted into Excel or any other application for further editing and presentation purposes. An example is provided below, showing the simulated market shares of the brands under study in the High Premium segment.

[Figure: pie chart “Simulated High Premium Market Shares ($75,000+)” by brand (Audi, BMW, Jaguar, Lexus, Mercedes, Porsche). Share labels recovered from the chart: 1%, 21%, 3%, 10%, 53%, 12%.]

As can be seen from the results, the Porsche Panamera’s predicted market share appears to be compatible with the reported running rate for calendar year 2010, which was available at the time of writing. Unfortunately, we do not know how this compares to Porsche’s expectations, but the Panamera seems to be quite successful overall.

Substitution and Cannibalization

The fully simulated database can also be saved as a semicolon-delimited CSV file, which will allow reviewing the choice probability for each product scenario by individual consumer in a spreadsheet.
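One plausible way to relate such per-consumer choice probabilities to aggregate market shares is to average them across consumers, optionally using survey weights. The sketch below shows that arithmetic; it is an assumption about how the aggregation could be done and not a description of the Bayesia Market Simulator's internals, and the product names and probabilities are made up.

```python
# Hypothetical simulated output: one dict of choice probabilities per consumer.
simulated = [
    {"Mercedes S550": 0.70, "BMW 750i": 0.20, "Panamera S": 0.10},
    {"Mercedes S550": 0.10, "BMW 750i": 0.30, "Panamera S": 0.60},
    {"Mercedes S550": 0.25, "BMW 750i": 0.50, "Panamera S": 0.25},
]

def market_shares(rows, weights=None):
    """Weighted average choice probability per product scenario."""
    weights = weights or [1.0] * len(rows)
    total = sum(weights)
    shares = {}
    for row, w in zip(rows, weights):
        for product, p in row.items():
            shares[product] = shares.get(product, 0.0) + w * p / total
    return shares

shares = market_shares(simulated)
for product, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{product}: {share:.1%}")
```

The optional `weights` argument is where a survey weight per respondent could be supplied, if the analyst wishes to weight the records.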

We can literally examine the new, simulated choices record-by-record and see which customers have made the switch to the Panamera. Applying conditional formatting to the spreadsheet can also be very helpful. The above screenshot, for example, shows a selection of actual Mercedes buyers, who would either consider or pick the Porsche Panamera in this simulation. High choice probabilities are shown in shades of red, while near-zero probabilities are depicted in dark blue.


It is equally interesting to examine which Porsche buyers

would pick the Panamera over their current vehicle choice.

Not surprisingly, our simulation suggests high probabilities of Panamera choice for several current Cayenne owners. One is tempted to take this a step further and calculate a rate of cannibalization. In this particular survey, however, the sample size is too small to attempt doing so. Otherwise, such a computation would be simple arithmetic.
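With a sufficiently large sample, that arithmetic might look like the following sketch. It defines the cannibalization rate as the share of the expected Panamera volume that is drawn from buyers of the same brand, which is one possible definition chosen for illustration; the records and field names are hypothetical.

```python
# Hypothetical records: brand actually purchased and simulated Panamera choice probability.
records = [
    {"purchased_brand": "Porsche", "p_panamera": 0.40},
    {"purchased_brand": "Porsche", "p_panamera": 0.10},
    {"purchased_brand": "Mercedes-Benz", "p_panamera": 0.30},
    {"purchased_brand": "BMW", "p_panamera": 0.20},
]

def cannibalization_rate(rows, own_brand="Porsche"):
    """Share of expected Panamera volume drawn from buyers of the same brand."""
    total = sum(r["p_panamera"] for r in rows)
    own = sum(r["p_panamera"] for r in rows if r["purchased_brand"] == own_brand)
    return own / total if total else 0.0

rate = cannibalization_rate(records)
print(f"{rate:.0%}")
```

The complementary share, one minus this rate, would correspond to substitution away from competing brands under the same definition.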

Market Scenario Simulation

Although experimenting with product scenarios is expected to be the primary use of the Bayesia Market Simulator, it is also possible to change the market scenarios.

For example, this can be used to simulate the impact of policy changes. One could hypothesize that legislation would prohibit or severely penalize ownership of vehicles of a certain size or of a specific engine type in urban areas.37

Upon editing the market segments, the simulation can be

rerun to obtain the new market share results.

Limitations

This approach can simulate product and market scenarios consisting of variations of configurations that can be observed with sufficient sample size today. However, the impact of entirely new technologies cannot be simulated on this basis. As a result, projecting the market share of the all-electric Nissan Leaf38 would not be possible, whereas estimating the share of a hypothetical three-row BMW crossover vehicle would be feasible. In all cases, it requires the analyst’s expert knowledge and judgment to determine the adequacy and equivalency of product attributes observable today.

Outlook

There exist several natural extensions to the presented methodology; however, it would go beyond the scope of this paper to present them. A brief summary shall suffice for now, and we will go into greater detail in forthcoming case studies in this series:

1. Beyond learning from data, we can use expert knowledge to create or augment Bayesian networks. BayesiaLab offers a Knowledge Elicitation module, which formally captures expert knowledge and encodes it in a Bayesian network. In the absence of market data, this is an excellent approach to have decision makers collectively (and formally correctly) reason about future states of the world.

2. We can extend the concept of product attributes to consumers’ product satisfaction ratings. This will allow estimating the market share impact as a function of changes in consumer ratings. For instance, an automaker could reason about the volume impact from a vehicle facelift, which is expected to raise the consumer rating of “styling”.

3. The product cannibalization or substitution rate can be estimated based on the simulated choice behavior, given that there is sufficient sample size. So, for most mainstream products, this seems to be realistic.


37 Given the draconian restrictions on motorists in Central London, this example is presumably not very far-fetched.

38 The all-electric Leaf was launched by Nissan in the U.S. in December of 2010.


4. With the ability to study consumer choice at the model level, we can also aggregate these results to the segment level. Alternatively, using a less granular approach, we can model the entire market at the segment and brand level, which would allow studying market changes at a larger scale.

5. Beyond simulating “hard” policy changes affecting the market, e.g. excluding a product class from a certain geography, we can also use BayesiaLab to simulate new populations with small changes in average consumer attitudes versus the originally surveyed population. For instance, such an artificially modified population could be more environmentally conscious, and one could apply opinions prevalent on the West Coast to the whole country. Bayesia Market Simulator can then generate new market shares based on these new hypothetical market conditions.

Summary

BayesiaLab and Bayesia Market Simulator are unique in their ability to use Bayesian networks for choice modeling and market share simulation. The presented workflow provides a comprehensive method for simulating market shares of future products based on their key characteristics, without requiring new and costly experiments.

As a result, BayesiaLab and Bayesia Market Simulator allow using a vast range of existing research for market share predictions. Given the significant resources many corporations have allocated over many years to conducting consumer surveys, these BayesiaLab tools offer an entirely new way to turn the accumulated research data into practical market oracles.


Appendix

Utility-Based Choice Theory

In today’s choice modeling practice, utility-based choice theory plays a dominant role.

1. The first concept of utility-based choice theory is that each individual chooses the alternative that yields him or her the highest utility.

2. The second idea refers to being able to collapse a vector describing attributes of choice alternatives into a single scalar utility value for the chooser. For instance, a vector of attributes for one choice alternative, e.g. [Price, Fuel Economy, Safety Rating], would translate into one scalar value, e.g. [5], specific to each chooser.

The following example is meant to illustrate both:

For Consumer A:

• Utility of Product 1: [Price=$25,000, Fuel Economy=25MPG, Safety Rating=4 stars] = 7 ✓

• Utility of Product 2: [Price=$29,000, Fuel Economy=23MPG, Safety Rating=5 stars] = 5.5

For Consumer B:

• Utility of Product 1: [Price=$25,000, Fuel Economy=25MPG, Safety Rating=4 stars] = 4

• Utility of Product 2: [Price=$29,000, Fuel Economy=23MPG, Safety Rating=5 stars] = 7.5 ✓
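The first concept, choosing the alternative with the highest utility, amounts to a simple maximum over the scalar utilities. The following minimal sketch reproduces the example above; the dictionary layout is merely one convenient representation.

```python
# Scalar utilities from the example above, per consumer and product.
utilities = {
    "Consumer A": {"Product 1": 7.0, "Product 2": 5.5},
    "Consumer B": {"Product 1": 4.0, "Product 2": 7.5},
}

# Each consumer picks the alternative with the highest utility.
choices = {consumer: max(u, key=u.get) for consumer, u in utilities.items()}
print(choices)
```

The two consumers face identical products yet choose differently, because the mapping from attributes to utility is specific to each chooser.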

This concept implies that consumers make tradeoffs, either explicitly or implicitly, and that there exists an

amount x of “Fuel Economy” that is equivalent in utility

to an amount y of “Safety”. The reader may reasonably

object that not even a fuel economy of 100MPG would make it acceptable to drive a vehicle that is rated very

poorly on safety.

Also, we do not know a priori what the utility values are, nor can we measure them. Neither do we know in advance how individual product and consumer attributes relate to these unobservable utilities. However, there are methods that allow us to estimate these unknown variables and, based on this knowledge, they allow us to predict choice in the future. One such method is briefly highlighted in the following.

Multinomial Logit Models

In the domain of choice modeling, MultiNomial Logit models (MNL) have become the workhorse of the industry, but here we only want to provide a cursory overview, so the reader can compare the approach presented in the case study with current practice.

MNL models provide a functional form for describing

the relationship between the utilities of alternatives and

the probability of choice.

For instance, using an MNL model for a choice situation with three vehicle alternatives, Altima, Accord and Camry, the probability of choosing the Altima can be expressed as:

$$\Pr(\text{Altima}) = \frac{\exp(V_{\text{Altima}})}{\exp(V_{\text{Altima}}) + \exp(V_{\text{Accord}}) + \exp(V_{\text{Camry}})}$$

$V_{\text{Altima}}$ in this case stands for the utility of the Altima alternative. The utilities $V_{\text{Altima}}$, $V_{\text{Accord}}$, and $V_{\text{Camry}}$ are a function of the product attributes, e.g.

$$V_{\text{Altima}} = \beta_1 \times \text{Cost}_{\text{Altima}} + \beta_2 \times \text{FuelEconomy}_{\text{Altima}} + \beta_3 \times \text{SafetyRating}_{\text{Altima}}$$

As we can observe tangible attributes like vehicle cost, fuel economy and safety rating, and we can also observe who bought which vehicle, we can estimate the unknown parameters. Once we have the parameters, we can simulate choices based on new, hypothetical product attributes, such as a better fuel economy for the Altima or a lower price for the Camry.
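The simulation step can be sketched in a few lines once parameters are available. In the snippet below, the coefficient values and the attribute values are invented purely for illustration and are not estimates from any data; only the functional form follows the MNL expressions given above.

```python
import math

def utility(attrs, betas):
    """Linear-in-parameters utility V = sum(beta_k * x_k)."""
    return sum(betas[k] * attrs[k] for k in betas)

def mnl_probabilities(alternatives, betas):
    """Multinomial logit: Pr(i) = exp(V_i) / sum_j exp(V_j)."""
    v = {name: utility(attrs, betas) for name, attrs in alternatives.items()}
    v_max = max(v.values())  # subtract the maximum for numerical stability
    expv = {name: math.exp(x - v_max) for name, x in v.items()}
    denom = sum(expv.values())
    return {name: e / denom for name, e in expv.items()}

# Hypothetical coefficients (cost measured in $1,000) and attribute values.
betas = {"cost": -0.10, "fuel_economy": 0.05, "safety_rating": 0.30}
alternatives = {
    "Altima": {"cost": 22.0, "fuel_economy": 27.0, "safety_rating": 4.0},
    "Accord": {"cost": 23.0, "fuel_economy": 26.0, "safety_rating": 5.0},
    "Camry":  {"cost": 22.5, "fuel_economy": 26.0, "safety_rating": 5.0},
}

base = mnl_probabilities(alternatives, betas)

# Hypothetical scenario: better fuel economy for the Altima.
alternatives["Altima"]["fuel_economy"] = 32.0
scenario = mnl_probabilities(alternatives, betas)

print(base)
print(scenario)
```

Subtracting the largest utility before exponentiating leaves the probabilities unchanged but avoids overflow for large utility values.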

The parameters of MNL models can be estimated both from “stated preference” (SP) data, i.e. asking consumers about what they would choose, and “revealed preference” (RP) data, i.e. observing what they have actually chosen. There are numerous variations and extensions to the class of MNL models, and the reader is referred to Train (2003) and Koppelman and Bhat (2006) for a comprehensive introduction.


Stated Preference Data

Stated preference data typically comes from experiments, i.e. consumer surveys or product clinics. In this context, conjoint experiments have become a very popular choice elicitation method and a wide range of tools have been developed for this particular approach. In conjoint studies, consumers would typically be given a set of artificially generated product choices along with their attributes, from which preference responses are then elicited. There are many variations of this method that all attempt to address some of the inherent challenges related to dealing with responses to hypothetical questions.

The Sawtooth software package has become the de facto industry standard for such conjoint studies.39

Revealed Preference Data

In contrast to SP data, revealed preference data is purely derived from passive observations. As the name implies, the consumer choice is revealed by their actual behavior rather than by their stated intent in a hypothetical situation. A key benefit is that it is typically easier and more economical to obtain passive observations than to conduct formal experiments. A conceptual limitation of RP data relates to the fact that not-yet-existing products can obviously not be chosen by consumers in the present market environment. Thus simulating market shares of hypothetical products requires “assembling” them from components and attributes of products that are already available in the market. This inherently limits the exploration of entirely new technologies, which have little in common with the technologies they may replace.

Studies based on RP data have become very popular for researching travel mode choice, as is also documented in a large body of research. In market research related to CPG products or durable goods, using RP data is somewhat less common.

We speculate that one of the reasons for the lack of popularity outside the world of academia is the absence of easy-to-use software packages. Only recently, with the release of Easy Logit Modeling (ELM)40, specifying and estimating multinomial logit models has become practical for a much broader audience. Although ELM has successfully removed the burden of manual coding, countless iterations of specification and estimation remain a very time-consuming task for the analyst.

NVES Variables

The following variables from the 2009 Strategic Vision NVES were included in this case study:

• UNIQUE IDENTIFIER

• Combined Base Weight

• New Model Purchased - Make/Model/Series (Alpha Order)

• New Model Purchased - Brand

• New Model Purchased - Region Origin

• New Model Segment

• Segmentation 2

• Type Of Transmission

• Number Of Cylinders (VIN)

• Drive Type (VIN)

• Fuel Type

• Gender

• Marital Status

• Age Bracket

• Children Under 6

• Children 6 To 12

• Children 13 To 17

• Total Family Pre-Tax Income

• Ethnic Group

• Location Of Residence


39 A wide range of tools is available from Sawtooth Software, Inc., www.sawtoothsoftware.com.

40 Easy Logit Modeling is available from ELM-Works, Inc., www.elm-works.com. ELM can estimate models based on

both RP and SP data, although we only mention it in the RP context.


• Customer Region Classi"cation #1

• I Seek Variety in My Life

• I'm Curious and Open to Experiences

• Luxury is Not Important Unless it Has Purpose

• I Enjoy Expressing Myself Creatively

• I See Life as Full of Endless Possibilities

• Driving is one of my favorite things to do

• I really don't enjoy driving

• Whenever I get a chance, I love to go for a drive

• When I drive for fun, I mainly prefer to relax and lis-

ten to music or talk

• I want vehicles that provide that open-air driving ex-perience

• I prefer a vehicle that has the capability to outperform

others

• I prefer vehicles that provide superior straight ahead

power

• I prefer vehicles that provide superior handling and cornering agility

• I prefer a balance of comfort and performance

• I prefer vehicles that provide the softest, most com-

fortable ride quality

• I just want the basics on my vehicle - no extras

• Value equals balance of costs, comfort & performance

• I prefer vehicles that project a tough and workmanlike image

• Vehicles are a 'tool' or a part of the 'gear' in an active outdoors lifestyle

• I Want to be able to tow heavy loads

• I want to be able to traverse any terrain

• I want the most versatility in my interior

• I want a basic, no frills vehicle that does the job

• My choice of vehicle reflects my personality

• I want a vehicle that says a lot about my success in life / career

• I will switch brand for features or price

• There are lots of different brands of vehicles that I would consider buying

• I prefer sofa-like comfort over a cockpit-like interior

• I want a vehicle that provides the quietest interior

• I want to look good when driving my vehicle

• I want my vehicle to stand out in a crowd

• I would pay significantly more for environmentally friendly vehicle

• Price is most important to me when buying a new vehicle

• Purchase Price (100's)
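As a rough illustration of how a per-respondent weight such as the Combined Base Weight can enter a share calculation, the following minimal sketch computes weighted shares of the purchased brand. It assumes the weight is applied as a simple multiplier per respondent. The function name, field names, and the toy records are hypothetical and are not taken from the NVES data or from BayesiaLab.

```python
from collections import defaultdict


def weighted_shares(records, choice_key="brand", weight_key="weight"):
    """Return each alternative's share of the total respondent weight.

    `records` is an iterable of dicts; the key names are hypothetical
    stand-ins for "New Model Purchased - Brand" and "Combined Base Weight".
    """
    totals = defaultdict(float)
    for row in records:
        # Each respondent counts in proportion to their survey weight.
        totals[row[choice_key]] += row[weight_key]
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total weight is zero")
    return {alt: w / grand_total for alt, w in totals.items()}


# Hypothetical toy records for illustration only (not NVES data).
sample = [
    {"id": 1, "brand": "A", "weight": 1.5},
    {"id": 2, "brand": "B", "weight": 0.5},
    {"id": 3, "brand": "A", "weight": 1.0},
    {"id": 4, "brand": "C", "weight": 1.0},
]
shares = weighted_shares(sample)
for brand, share in sorted(shares.items()):
    print(f"{brand}: {share:.3f}")
```

With unit weights this reduces to a plain count-based share, so the weight variable only matters where the sample composition differs from the buyer population.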


References

Barber, David. “Bayesian Reasoning and Machine Learning.” http://www.cs.ucl.ac.uk/staff/d.barber/brml.

———. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2011.  

Darwiche, Adnan. “Bayesian networks.” Communications of the ACM 53, no. 12 (12, 2010): 80.

Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.

Koppelman, Frank, and Chandra Bhat. “A Self Instructing Course in Mode Choice Modeling: Multinomial and Nested Logit Models.” January 31, 2006.

Krzywinski, M., J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S. J. Jones, and M. A. Marra. “Circos: An information aesthetic for comparative genomics.” Genome Research 19, no. 9 (6, 2009): 1639-1645.

Neapolitan, Richard E., and Xia Jiang. Probabilistic Methods for Financial and Marketing Informatics. 1st ed. Morgan Kaufmann, 2007.  

Pearl, Judea. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press, 2009.

Spirtes, Peter, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search, Second Edition. 2nd ed. The MIT Press, 2001.  

Train, Kenneth. Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. 1st ed. The MIT Press, 1985.  

Train, Kenneth E. Discrete Choice Methods with Simulation. Cambridge University Press, 2003.


Contact Information

Conrady Applied Science, LLC

312 Hamlet’s End Way

Franklin, TN 37067

USA

+1 888-386-8383

[email protected]

www.conradyscience.com

Bayesia SAS

6, rue Léonard de Vinci

BP 119

53001 Laval Cedex

France

+33(0)2 43 49 75 69

[email protected]

www.bayesia.com

Copyright © 2010 Conrady Applied Science, LLC and Bayesia SAS. All rights reserved.

Any redistribution or reproduction of part or all of the contents in any form is prohibited other than the following:

• You may print or download this document for your personal and noncommercial use only.

• You may copy the content to individual third parties for their personal use, but only if you acknowledge Conrady Applied Science, LLC and Bayesia SAS as the source of the material.

• You may not, except with our express written permission, distribute or commercially exploit the content. Nor may you transmit it or store it in any other website or other form of electronic retrieval system.
