Demand Estimation - Discrete Choice Models

Demand Estimation - Discrete Choice Models

Li Zhao, SJTU

Spring, 2017

Li Zhao Demand 1 / 39

Demand Estimation

Traditional models: Product space models.

Discrete Choice models (Characteristics Space)I 1st generation: Logit / Probit, Nested Logit, Mixed LogitI 2nd generation: Logit with endogenous variableI 3rd generation: Random coe�cient + price endogeneity


Outline

1 Discrete Choice Models

2 Logit

3 Nested Logit

4 Mixed Logit


Discrete Choice Model - Introduction

Consumer chooses at most one good from �nite set of goods.

We will de�ne products as bundles of characteristics (includesunobservables).

We will assume that preferences are de�ned on these characteristics.

Each consumer chooses the bundle that maximizes her utility.

Consumers have di�erent preferences for di�erent characteristics.

Aggregate demand is the sum over individual demands.


Discrete Choice Framework

Products are bundles of characteristics: j = 0,1, ...J where 0 is theoutside good.

Consumer preferences are de�ned on space of characteristics.

The set of alternatives, called the choice set, needs to exhibit threecharacteristics.

I First, the alternatives must be mutually exclusive.I Second, the choice set must be exhaustive.I Third, the number of alternatives must be �nite.

Discrete choice models are usually derived under an assumption ofutility-maximizing behavior by the decision maker.


Why Discrete Choice Model

The number of parameters is primarily determined by the dimensionalof the characteristics and independent of the number of products.

We give micro foundation of the formation of elasticity.

We are estimating the joint distribution of preferences overcharacteristics.

If a new good is introduced, we can value that good since it is simplya bundle of characteristics.

We can model consumer heterogeneity.


Random Utility

A decision maker, labeled i , faces a choice among J alternatives.

The utility that decision maker i obtains from alternative j is Unj ,j = 1, ...,J.

The researcher observes some attributes of the alternatives as faced bythe decision maker and can specify a function that relates theseobserved factors to the decision maker's utility. Call this representativeutility Vij .

Utility is decomposed as Uij = Vij + εij .

The characteristics of εij , such as its distribution, is critical!

The probability that decision maker i chooses alternative j is

pij = Prob(Uij > Uik ,∀k 6= j).


Example

Consider a person who can take either a car or a bus to work.

We observes the time and cost that the person would incur under eachmode.

There are factors other than time and cost that a�ect the person'sutility and hence his choice.

Ucar = αTcar + βMcar + εcar ;

Ubus = αTbus + βMbus + εbus .


Choice Probability

De�ne the subset of �consumers� that lead to choice j as

Aj(θ) = {ε|uij > uik ,∀k}

.

The probability that consumer i chooses product j is

sj =∫

ε∈Aj (θ)f (ε)dε.

Under the assumption that �market size"M is very large and εi 's arei.i.d. across consumers, the Law of Large Numbers implies that marketdemand converges to M · sj .


Remarks

Normalization: choices of individual consumers are invariant to anytransformation of utilities.

Invariant to additive shifts implies normalizing mean utility of outsidegood to zero. Deduct ui0 from each uij for j = 1, ...,J.

Invariant to scale leads to normalizing one of the other parameters(typically variance of F ) to one.


Types of Discrete Choice Models

Logit / ProbitI ε ∼ Type 1 extreme value distribution.I ε ∼Multivatiate Normal.

Nested LogitI ε ∼ Generalized extreme value distribution.

Random Coe�cient LogitI Heterogeneous consumer preference.


Outline


2 Logit

3 Nested Logit

4 Mixed Logit


Logit - Introduction

Logit is the most widely used discrete choice model.

uij = Xjβ −αpj + εij .

It is derived under the assumption that εij is iid type I extreme valuefor all i .

The critical part of the assumption is that the unobserved factors areuncorrelated over alternatives, as well as having the same variance forall alternatives.

This assumption, while restrictive, provides a very convenient form forthe choice probability.


Type I Extreme Value Distribution

A Type I extreme value distribution has CDF

F (v) = exp(−exp(−v))

on v ∈ R.Motivation for extreme value distribution: Suppose that (X1,X2, ...) isa sequence of independent random variables, each with the standardexponential distribution G (x) = 1− e−x . The distribution ofY = max{X1,X2, ...Xn)− ln(n) converges to the (Type I) standardextreme value distribution as n→ ∞.

Distribution of the di�erence between two extreme value variables:

ε∗jk = εij − εik .

F (ε∗jk) =

exp(ε∗jk)

1+ exp(ε∗jk).

.


Logit Choice Probability

The utility functionuij = Xjβ −αpj + εij

where εij is distributed i.i.d. Type I extreme value.

The probability that decision maker i chooses alternative j is

Pij(θ) =exp(Xjβ −αpj)

∑Jk=0 exp(Xkβ −αpk)

.

The market share of product j = 0,1, ...J is

sj(θ) =exp(Xjβ −αpj)

∑Jk=0 exp(Xkβ −αpk)

.

Normalize the mean utility of the outside good to zero impliess0(θ) = 1/∑

Jk=0 exp(Xkβ −αpk).


Estimation

A sample of N decision makers is obtained for the purpose ofestimation.

Since the Logit probabilities take a closed form, the traditionalmaximum-likelihood procedures can be applied.

L =N

∏i=1

J

∏j=0

(Pij)yij ,

where yij = 1 if person i chooses alternative j .

The log-likelihood function is

LL =N

∑i=1

J

∑j=0

yij lnPij .


Consumer Welfare

Consumer surplus is the utility get for obtaining project.

CSi =1

αmaxj

(Uj for all j).

I Parameter α is the marginal utility of money (coe�cient on price).

Expected consumer surplus is

E (CSi ) =1

αE [max

j(Vj + εj for all j)]

=1

αln(

J

∑j=1

exp(Vj) +C .

I Taking average over the unobserved component of utility.

Welfare evaluation for policy changeI Changes in observed characteristics.I Changes in equilibrium price from alternative market structures.I Removal / Introduction of new goods.


Example: Guadagni and Little (Mkt Sci 1983)

Guadagni, P.M. and Little, J.D., 1983. A logit model of brand choicecalibrated on scanner data. Marketing science, 2(3), pp.203-238.

A multinomial Logit model of brand choices.

32 weeks of purchases of regular ground co�ee by 100 household.

The study shows high statistical signi�cance of brand loyalty, sizeloyalty, store promotion, and promotional price cut.


Guadagni and Little (1983) - Model

Model:uij = Xijβ −αpj + εij .

yijt = 1 if consumer i chooses product j at time t.

Control variable X :I Product dummy.I Regular price.I Promotion: presence of promotion, amount of price-cut (possible zero)during promotion, lagged promotion variable (if the previous purchaseof co�ee was a promotional purchase of an alternative of the samebrand as brand-size j).

I Loyalty: weighted average of past purchases of the same brand / size.

By bringing in past customer purchase behavior, we convenientlymodel purchase probability heterogeneity while treating customershomogeneously.


Guadagni and Little (1983) - Data

Data being collected by optical scanning of the Universal ProductCode (UPC).

I Store data vs. panel data.

De�nition of set of alternative (products set).I Exact level of aggregation and which ones to include in the relevant setare not necessarily obvious.

I Ground / instant, ca�einated / decaf, within instant, freeze dried /non-freeze.

I Size: small and large.

Ground co�ee store and panel records from four Kansas Citysupermarkets for the 78-week period in late 1970s.

2000 households, each of which has indicated it makes 90% or more ofits purchases at one of the stores collecting the data.


Guadagni and Little (1983) - Counterfactual

Now we wish to study how the market responds to the retailers'actions.

Analytically, determination of aggregate market response calls for anintegration of the customer response function over a joint distributionof X s.

One standard approach to such a high dimensional integration isMonte Carlo sampling.

A simpler and easier method to implement is to use the actualcustomers, time periods and attribute variables of the data.

Many di�erent response calculations are possible. Some examples areI (1) regular price elasticity of share,I (2) the cross elasticity of one brand-size's share to another's price,I (3) share response to promotion,I (4) the cannibalization of one size of a brand by the promotion ofanother and

I (5) the elasticity of price during a promotion.


Limitation of Logit - Substitution PatternsIIA (Independence from Irrelevant Alternatives)

For any two alternatives i and k, the ratio of the logit probabilities is

Pij

Pik=

exp(Vij)/∑

exp(Vik)/∑= exp(Vij −Vik)

This ratio does not depend on any alternatives other than i and k.

�Red-bus�blue-bus problem�I Assume that city has two transportation schemes: walk, and red bus,with shares 50%, 50%.

I Now consider introduction of third option: train. IIA implies that oddsratio between walk/red bus is still 1.

I What if third option were blue bus? IIA implies that odds ratiobetween walk/red bus would still be 1.

I This is especially troubling if you want to use Logit model to predictpenetration of new products.

Overcome IIA: Probit, Nested Logit, Mixed Logit, Random Coe�cientLogit.


Multinomial Probit

Recall the random utility model takes the form

uij = Xjβ −αpj + εij .

pij = Prob(Uij > Uik ,∀k 6= j).

In Logit, we assume εij ∼ i.i.d. extreme value distribution.

In Probit, we assume εi = [εi1, ...εiJ ]∼ multivariate normaldistribution.

I Probit allows for correlation between unobserved utility among choicesand does not reply on IIA assumptions.


Outline


2 Logit

3 Nested Logit

4 Mixed Logit


Nested Logit

A nested Logit model is appropriate when the set of alternatives facedby a decision maker can be partitioned into subsets, called nests, insuch a way that the following properties hold:

For any two alternatives that are in the same nest, the ratio ofprobabilities is independent of the attributes or existence of all otheralternatives. That is, IIA holds within each nest.

For any two alternatives in di�erent nests, the ratio of probabilitiescan depend on the attributes of other alternatives in the two nests.IIA does not hold in general for alternatives in di�erent nests.


Nested Logit - Illustration

Auto, (Red Bus, Blue Bus).

εnj have a joint cumulative distribution of

F (εn1,εn2, ..) = exp(−S

∑s=1

( ∑j∈Jg

exp(−εnj

λs))λs ).

Within the nest the correlation coe�cient of is approximately 1−λs ,Between the nests choices are independent.

Conditional probability of choice j given g is Pj |g =exp(

δjλs)

∑j ′∈Jg exp(δj ′λs

).

The probability of choosing group g is Pg =(∑j∈Jg exp(

δjλs))λs

∑g ′ (∑j∈Jg ′exp(

δj ′λs′

))λs′

Pj = Pj |g ·Pg .


Nested Logit - Estimation

ML joint estimation.

Sequential estimation using nesting structure.I Estimate lower model: Within the nest we have a conditional MNLwith coe�cients β/λs .

I Compute Inclusive value, ∑j∈Jg exp(δj

λs )using estimates from step 1.

I Estimate upper model with inclusive value as explanatory variable,recover λs .


Example: Morey et. al. (1993)

Morey, E.R., Rowe, R.D. and Watson, M., 1993. A repeated nestedLogit model of Atlantic salmon �shing. American Journal of

Agricultural Economics, 75(3), pp.578-592.

Participation and site choice for Atlantic salmon �shing are modeled inthe context of a repeated three-level nested Logit model.

Consumer's surplus measures are derived for di�erent levels of speciesavailability in the Penobscot River, the most important salmon river inNew England.

For comparison, six other travel-cost models are estimated.

Estimating these models indicate the importance of modeling theparticipation decision, including income e�ects, and of adopting anested Logit structure rather than a single-level Logit structure


Morey et. al. (1993) - Model

Fishing season is divided into T periods such that in each period theindividual takes, at most, one �shing trip.

Ujt = Vj + εjt.

j =0 (non�shing alternative), j = 1−5 are Maine sites, and j = 6−8are Canadian sites.

V0 = V0(ppy −pj , C +ppy , yrs, club, age).

Vj = Vj(ppy −pj ,C +ppy −pj ,Catchj).

.

εjt are drawn from generalized extreme distribution

F (ε) = exp[−e−sε0

− [(e−sε1 + ...− e−sε5)λ/s + (e−sε6 + ...+ e−sε8)λ/s ]1/λ ].


Morey et. al. (1993) - Counterfactual

Individual's expected maximum utility in a single period is

V = V (ppy ,yrs,club,age,P,Catch).

Per-period compensation variation is

V (ppy ,yrs,club,age,P0,Catch0)

=V (ppy −PPCV ,yrs,club,age,P1,Catch1).

Compare utilities under 3 scenariosI Elimination of the Atlantic salmon �shery at the Penobscot River.I Double catch rates at the Penobscot River.I Halve catch rates at the Penobscot River.


Morey et. al. (1993) - Testing Competing Models

The paper tries six other travel - cost models to compare the results.I Repeated Logit with /without income e�ects.I Standard Logit treating non-participation is not one of the alternative.I A partial demand (share) model of the site choices.I Single-site linear demand function.I Single-site log-linear demand function.


Outline


2 Logit

3 Nested Logit

4 Mixed Logit


Mixed Logit (1)

Mixed Logit obviates the three limitations of standard Logit byallowing for random taste variation, unrestricted substitution patterns,and correlation in unobserved factors over time.

Like Probit, the mixed Logit model has been known for many yearsbut has only become fully applicable since the advent of simulation.

A mixed logit model is any model whose choice probabilities can beexpressed in the form

Pij =∫

Lij(β ) · f (β ) ·dβ

where Lij(β ) is the Logit probability evaluated at parameters β :

Lij(β ) =exp(Vij(β ))

∑j exp(Vij(β )).

Mixed Logit is a mixture of the Logit function evaluated at di�erentβ 's with f (β ) as the mixing distribution.


Mixed Logit (2)A mixed Logit model is any model whose choice probabilities can beexpressed in the form

Pij =∫

Lij(β ) · f (β ) ·dβ .

Discrete: The mixing distribution f (β ) can be discrete

Pij =M

∑m=1

sm(exp(bmxij)

∑k exp(bmxik)).

This speci�cation is useful if there are M segments in the population, eachof which has its own choice behavior or preferences.

Continuous: More frequently, the mixing distribution is speci�ed to becontinuous. For example, the density of β can be speci�ed to benormal with mean b and covariance W .

Pni =∫

(exp(bmxij)

∑k exp(bmxik))φ(β |b,W )dβ .

Calculating integration is di�cult, but taking average is easy. We canuse simulation to replace integration with average.


Random Coe�cients Logit

In a random coe�cient Logit, consumer i 's utility for product j is

uij = Xjβi −αipj + ξj + εij

where(αi ,βi )

′ ∼ N((α, β )′,Σ).

(αi ,βi ) di�er across consumers. They are called �random coe�cients�.

(α, β ) and Σ are additional parameters to be estimated.

We can also make parametric assumption of (α, β ) to let themdepend on consumer characteristics (e.g income).(

αi

βi

)∼(

α

β

)+ ΠDi + vi .


Example: Rossi et. al. (1996)

Rossi, P.E., McCulloch, R.E. and Allenby, G.M., 1996. The value ofpurchase history data in target marketing. Marketing Science, 15(4),pp.321-340.

An important aspect of marketing practice is the targeting ofconsumer segments for di�erential promotional activity.

The goal of this paper is to assess the information content of variousinformation sets available for direct marketing purposes.

Information on the consumer is obtained from the current and pastpurchase history as well as demographic characteristics

Our results indicate there exists a tremendous po- tential forimproving the pro�tability of direct marketing e�orts by more fullyutilizing household purchase histories.


Rossi et. al. (1996) - Model and Data

uijt = Xitβi + εijt ,

where εit ∼ N(0,Λ).

Xit contains product dummy, prices, display, product features.

The random coe�cients takes the form

βi = ΠDi + vi .

Five brands of tuna, packaged in six-ounce cans, purchased inSpring�eld, Missouri.

400 household who remains in panel at least 1.5 years.

Demographic variables: family income, size, retired, unemployed,female headed families.


Rossi et. al. (1996) - Target Marketing

The coupon acts as a temporary price cut equal to its face value.

We ignore dynamic e�ects induced by consumer stock- piling inresponse to coupon availability.

Incremental Sale= Pr(j |βi , Λ , price−F , X )−Pr(j |βi , Λ , price, X ).

The expected net revenue is πF = Pr(j |βi , Λ , price−F , X ) · (M−F ).

βi is estimated based on various information setI No information at al.I Demographic.I Demographic + purchase history.I Demographic + one purchase and its causal e�ects.I Demographic + purchase history and all the causal e�ects.


Summary

Product space and characteristics space demand models.

Discrete choice, micro foundation, normalization.

Logit, IIA assumption, CS.

Extension of Logit: Probit, Nested Logit, Mixed Logit.

Counterfactual: change in prices, removing products, change features.


Documents

Demand Estimation - Discrete Choice Models