Models with limited dependent variables

Doctoral Program 2006-2007

Katia Campo

Introduction

Limited dependent variables
• Discrete dependent variable → Discrete Choice Models
• Continuous dependent variable (truncated, censored) → Truncated/Censored Regression Models; Duration (Hazard) Models

Discrete choice models

• Choice between different options (j)

Single choice (binary choice models)

e.g. buy a product or not, follow higher education or not, ...

j = 1 (yes/accept) or 0 (no/reject)

Multiple choice (multinomial choice models)

e.g. cars, stores, transportation modes: j = 1 (option 1), 2 (option 2), ..., J (option J)

Truncated/censored regression models

• Truncated variable: observed only beyond a certain threshold level (‘truncation point’), e.g. store expenditures, income

• Censored variable: values in a certain range are all transformed to (or reported as) a single value (Greene, p.761)

e.g. demand (stockouts, unfulfilled demand), hours worked
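The difference between truncation and censoring can be seen in a small simulation. This is a sketch under illustrative assumptions that are not in the slides: a standard normal latent variable and a lower threshold of 0 (censoring from below, as with hours worked, which cannot be negative).

```python
import random

def simulate(n, threshold, seed=1):
    """Draw a latent y* ~ N(0, 1) (illustrative assumption) and return the
    latent, truncated and censored samples for a lower threshold."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Truncation: values at or below the threshold are never observed
    truncated = [y for y in latent if y > threshold]
    # Censoring: every case is observed, but values below the threshold
    # are reported as the threshold itself (a mass point)
    censored = [max(y, threshold) for y in latent]
    return latent, truncated, censored

def mean(xs):
    return sum(xs) / len(xs)

latent, truncated, censored = simulate(10_000, threshold=0.0)
print(len(truncated), len(censored))            # truncation shrinks the sample
print(sum(1 for c in censored if c == 0.0))     # pile-up at the censoring point
print(round(mean(latent), 3), round(mean(truncated), 3), round(mean(censored), 3))
```

Both the truncated and the censored sample means lie above the latent mean, so an analysis that ignores the threshold is biased; this is the motivation for the adjusted models of Part II.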

Duration/Hazard models

• Time between two events, e.g.
– Time between two purchases
– Time until a consumer becomes inactive/cancels a subscription
– Time until a consumer responds to direct mail/a questionnaire
– ...

Franses and Paap (2001)

Need to use adjusted models: Illustration

Overview

• Part I. Discrete Choice Models
• Part II. Censored and Truncated Regression Models
• Part III. Duration Models

Recommended Literature

– Kenneth Train, Discrete Choice Methods with Simulation, Cambridge University Press, 2003 (Part I)

– Ph.H. Franses and R. Paap, Quantitative Models in Marketing Research, Cambridge University Press, 2001 (Part I-II-III; Data: www.few.eur.nl/few/people/paap)

– D.A.Hensher, J.M.Rose and W.H.Greene, Applied Choice Analysis, Cambridge University Press, 2005 (Part I)

Part I. Discrete Choice Models

Overview Part I, DCM

A. Properties of DCM
B. Estimation of DCM
C. Types of Discrete Choice Models

1. Binary Logit Model
2. Multinomial Logit Model
3. Nested logit model
4. Probit Model
5. Ordered Logit Model

D. Heterogeneity

Notation

• n = decision maker
• i, j = choice options
• y = decision outcome
• x = explanatory variables
• β = parameters
• ε = error term
• I[.] = indicator function, equal to 1 if the expression within brackets is true, 0 otherwise; e.g. I[y=j|x] = 1 if j was selected (given x), 0 otherwise

A. Properties of DCM

1. Characteristics of the choice set

• Alternatives must be mutually exclusive: no combination of choice alternatives (e.g. different brands; a combination of different transportation modes must be defined as a separate alternative)

• The choice set must be exhaustive, i.e. include all relevant alternatives

• Finite number of alternatives

Kenneth Train


2. Random utility maximization

Assumption: the decision maker selects the alternative that provides the highest utility, i.e. selects i if $U_{ni} > U_{nj}\ \forall j \neq i$

Decomposition of utility into a deterministic (observed) and a random (unobserved) part:

$$U_{nj} = V_{nj} + \varepsilon_{nj}$$

Kenneth Train


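2. Random utility maximization

$$\begin{aligned}
P_{ni} &= \Pr(U_{ni} > U_{nj}\ \forall j \neq i) \\
&= \Pr(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj}\ \forall j \neq i) \\
&= \Pr(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \forall j \neq i) \\
&= \int_{\varepsilon} I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \forall j \neq i)\, f(\varepsilon_n)\, d\varepsilon_n
\end{aligned}$$

Kenneth Train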
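The integral above can always be approximated by simulation: draw the error vector, check whether i has the highest utility, and average this indicator over many draws (the accept-reject simulator discussed by Train). The sketch assumes iid Gumbel errors and hypothetical utilities only so that the result can be compared with the logit formula derived later in these slides; any other error distribution could be plugged into the draw.

```python
import math
import random

def gumbel(rng):
    """Standard extreme value (Gumbel) draw via the inverse CDF F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:          # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_prob(v, i, draws=100_000, seed=42):
    """Share of draws in which alternative i has the highest utility U_j = V_j + eps_j."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        u = [vj + gumbel(rng) for vj in v]
        if max(range(len(u)), key=u.__getitem__) == i:
            hits += 1
    return hits / draws

def logit_prob(v, i):
    """Closed form implied by iid Gumbel errors (the logit formula)."""
    return math.exp(v[i]) / sum(math.exp(vj) for vj in v)

v = [1.0, 0.5, 0.0]          # hypothetical representative utilities V_nj
print(simulated_prob(v, 0), logit_prob(v, 0))
```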


3. Identification problems
a. Only differences in utility matter

Choice probabilities do not change when a constant is added to each alternative’s utility

Implication: some parameters cannot be identified/estimated, e.g. alternative-specific constants (only their differences are identified) and coefficients of variables that change over decision makers but not over alternatives

Solution: normalization of parameter(s), e.g. set one alternative-specific constant equal to zero

Kenneth Train


3. Identification problems
b. Overall scale of utility is irrelevant

Choice probabilities do not change when the utilities of all alternatives are multiplied by the same (positive) factor

Implication: coefficients of different models (data sets) are not directly comparable

Solution: normalization of the variance of the error terms

Kenneth Train
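Both identification results can be checked numerically. The sketch uses logit probabilities (derived later in these slides) together with hypothetical utilities and a hypothetical error draw; it illustrates the two properties and is not taken from Train.

```python
import math

def logit_probs(v):
    """Logit probabilities exp(V_j) / sum_l exp(V_l)."""
    m = max(v)                     # subtracting the max relies on (a) for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

v = [1.0, 0.5, 0.0]                # hypothetical representative utilities

# (a) Adding the same constant to every alternative leaves the probabilities unchanged
print(logit_probs(v))
print(logit_probs([x + 10.0 for x in v]))

# (b) Multiplying total utility U = V + eps by lam > 0 leaves the chosen alternative unchanged
eps = [0.3, -1.2, 0.8]             # one hypothetical draw of the error terms
u = [a + b for a, b in zip(v, eps)]
lam = 2.5
same_choice = max(range(3), key=lambda j: u[j]) == max(range(3), key=lambda j: lam * u[j])
print(same_choice)                 # True

# The logit fixes the error variance, so a different error scale can only show up
# as a rescaling of V (i.e. of beta), which does change the probabilities
print(logit_probs([lam * x for x in v]))
```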


4. Aggregation

Using aggregate (average) values of the explanatory variables as input gives biased estimates of aggregate choice probabilities and elasticities

Consistent estimates can be obtained by sample enumeration:
– compute the probability/elasticity for each decision maker
– compute the (weighted) average of these values

Kenneth Train

Swait and Louviere (1993), Andrews and Currim (2002)
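Sample enumeration can be sketched for a binary logit (the functional form is introduced later in these slides). Coefficients, covariate values and the equal weights are hypothetical; the point is only that the average of the individual probabilities differs from the probability evaluated at the average covariate value.

```python
import math

def p_logit(v):
    return 1.0 / (1.0 + math.exp(-v))

beta0, beta1 = -1.0, 2.0                  # hypothetical coefficients
x = [-1.0, 0.0, 0.5, 1.0, 3.0]            # hypothetical values for five decision makers
w = [1.0] * len(x)                        # sampling weights (equal here)

# Sample enumeration: probability per decision maker, then a weighted average
p_enum = sum(wn * p_logit(beta0 + beta1 * xn) for wn, xn in zip(w, x)) / sum(w)

# Naive aggregation: probability evaluated at the average value of x
x_bar = sum(x) / len(x)
p_at_mean = p_logit(beta0 + beta1 * x_bar)

print(round(p_enum, 3), round(p_at_mean, 3))   # 0.508 0.599
```

The gap arises because the logit curve is nonlinear; in a model that is linear in x the two numbers would coincide.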

Properties of DCM

4. Aggregation

Kenneth Train

B. Estimation DCM

• Numerical maximization (ML-estimation)

• Simulation-assisted estimation

• Bayesian estimation (see Train)

B. ML-estimation

- Objective: “find those parameter values most likely to have produced the sample observations” (Judge et al.)

- Likelihood for one observation: $P_n(X, \beta)$

- Likelihood function: $L(\beta) = \prod_n P_n(X, \beta)$

- Log-likelihood: $LL(\beta) = \sum_n \ln P_n(X, \beta)$

B. ML Estimation

Determine $\beta$ for which $LL(\beta)$ reaches its maximum
- Setting the first derivative equal to 0 gives no closed-form solution
- Iterative procedure:

i. Starting values $\beta_0$

ii. Determine a new value $\beta_{t+1}$ for which $LL(\beta_{t+1}) > LL(\beta_t)$

iii. Repeat step ii until convergence (small change in $LL(\beta)$)


B. ML Estimation

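- Direction and step size $\beta_t \to \beta_{t+1}$?

Based on a second-order Taylor approximation of $LL(\beta_{t+1})$ around $\beta_t$:

$$LL(\beta_{t+1}) \approx LL(\beta_t) + (\beta_{t+1} - \beta_t)' g_t + \tfrac{1}{2} (\beta_{t+1} - \beta_t)' H_t (\beta_{t+1} - \beta_t) \qquad [1]$$

with

$$g_t = \left.\frac{\partial LL(\beta)}{\partial \beta}\right|_{\beta_t}, \qquad H_t = \left.\frac{\partial^2 LL(\beta)}{\partial \beta\, \partial \beta'}\right|_{\beta_t}$$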

B. ML Estimation

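- Direction and step size $\beta_t \to \beta_{t+1}$?

Maximizing [1] with respect to $\beta_{t+1}$ (setting its derivative $g_t + H_t(\beta_{t+1} - \beta_t)$ to zero) leads to the Newton-Raphson update:

$$\beta_{t+1} = \beta_t + (-H_t)^{-1} g_t$$

Computation of the Hessian may cause problems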
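The update can be sketched for a binary logit with one explanatory variable, where the derivatives have simple closed forms: $g = \sum_n (y_n - P_n)(1, x_n)'$ and $-H = \sum_n P_n(1 - P_n)(1, x_n)'(1, x_n)$. The data are hypothetical and the 2×2 inverse is written out by hand to keep the sketch dependency-free; a real application would use a linear-algebra library.

```python
import math

def loglik(b0, b1, x, y):
    """Binary logit log-likelihood for V_n = b0 + b1 * x_n."""
    ll = 0.0
    for xn, yn in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xn)))
        ll += yn * math.log(p) + (1 - yn) * math.log(1.0 - p)
    return ll

def newton_raphson_logit(x, y, tol=1e-12, max_iter=100):
    b0, b1 = 0.0, 0.0                                  # i. starting values beta_0
    ll_old = ll_new = loglik(b0, b1, x, y)
    for _ in range(max_iter):
        # Gradient g_t and minus the Hessian, -H_t = [[a, b], [b, d]], at beta_t
        g0 = g1 = a = b = d = 0.0
        for xn, yn in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xn)))
            g0 += yn - p
            g1 += (yn - p) * xn
            w = p * (1.0 - p)
            a += w
            b += w * xn
            d += w * xn * xn
        det = a * d - b * b
        # ii. beta_{t+1} = beta_t + (-H_t)^{-1} g_t, using the explicit 2x2 inverse
        b0 += (d * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
        ll_new = loglik(b0, b1, x, y)
        # iii. stop when the change in LL(beta) is small
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return b0, b1, ll_new

x = [0, 1, 2, 3, 4, 5, 6, 7]          # hypothetical data (not perfectly separable)
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, ll = newton_raphson_logit(x, y)
print(b0, b1, ll)
```

At the maximum the first-order condition for the intercept implies that the fitted probabilities sum to the number of observed ones, which is a convenient check on convergence.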

B. ML Estimation

Alternative procedures:

• Approximations to the Hessian

• Other procedures, such as steepest ascent

See e.g. Train, Judge et al. (1985)
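One Hessian approximation discussed by Train is BHHH (Berndt, Hall, Hall and Hausman), which replaces $-H_t$ by the sum of outer products of the observation-level scores and therefore needs only first derivatives. The sketch applies it to the same kind of two-parameter binary logit with hypothetical data, and adds step halving to enforce $LL(\beta_{t+1}) > LL(\beta_t)$; it is an illustration, not a production optimizer.

```python
import math

def loglik(b0, b1, x, y):
    """Binary logit log-likelihood for V_n = b0 + b1 * x_n."""
    ll = 0.0
    for xn, yn in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xn)))
        ll += yn * math.log(p) + (1 - yn) * math.log(1.0 - p)
    return ll

def bhhh_logit(x, y, tol=1e-10, max_iter=1000):
    """Maximise the binary logit LL with the BHHH approximation -H_t ~ B_t = sum_n s_n s_n'."""
    b0 = b1 = 0.0
    ll = loglik(b0, b1, x, y)
    for _ in range(max_iter):
        # Per-observation scores s_n = (y_n - P_n) * (1, x_n)
        s = []
        for xn, yn in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xn)))
            s.append((yn - p, (yn - p) * xn))
        g0 = sum(s0 for s0, _ in s)
        g1 = sum(s1 for _, s1 in s)
        a = sum(s0 * s0 for s0, _ in s)          # B_t = [[a, c], [c, d]]
        c = sum(s0 * s1 for s0, s1 in s)
        d = sum(s1 * s1 for _, s1 in s)
        det = a * d - c * c
        step0 = (d * g0 - c * g1) / det
        step1 = (a * g1 - c * g0) / det
        # Halve the step until LL increases
        lam = 1.0
        while lam > 1e-10:
            n0, n1 = b0 + lam * step0, b1 + lam * step1
            ll_new = loglik(n0, n1, x, y)
            if ll_new > ll:
                break
            lam /= 2.0
        else:
            break                                 # no improving step left: treat as converged
        improvement = ll_new - ll
        b0, b1, ll = n0, n1, ll_new
        if improvement < tol:
            break
    return b0, b1, ll

x = [0, 1, 2, 3, 4, 5, 6, 7]          # hypothetical data
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, ll = bhhh_logit(x, y)
print(b0, b1, ll)
```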

B. ML Estimation

Properties ML estimator

Consistency

Asymptotic Normality

Asymptotic Efficiency

See e.g. Greene (ch.17), Judge et al.

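B. Diagnostics and Model Selection

Goodness-of-Fit
• Joint significance of explanatory variables: LR test

$$LR = -2\,\big(LL(0) - LL(\hat\beta)\big), \qquad LR \sim \chi^2(k)$$

(k = number of explanatory variables tested)

• Pseudo R²:

$$\text{pseudo } R^2 = 1 - \frac{LL(\hat\beta)}{LL(0)}$$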
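Both statistics require only two log-likelihood values. The sketch plugs in the values reported in the binary logit (Heinz) output further on in these slides (Log likelihood −696.1344, Restr. log likelihood −967.918), taking the restricted model as LL(0).

```python
def lr_statistic(ll_0, ll_full):
    """LR = -2 (LL(0) - LL(beta_hat)); chi-squared with k df under H0."""
    return -2.0 * (ll_0 - ll_full)

def pseudo_r2(ll_0, ll_full):
    """McFadden's pseudo R^2 = 1 - LL(beta_hat) / LL(0)."""
    return 1.0 - ll_full / ll_0

# Values from the Heinz binary logit output (Franses and Paap data)
ll_0, ll_full = -967.918, -696.1344
print(round(lr_statistic(ll_0, ll_full), 2))   # 543.57
print(round(pseudo_r2(ll_0, ll_full), 4))      # 0.2808
```

Up to rounding, these match the reported "LR statistic (8 df)" of 543.5673 and "McFadden R-squared" of 0.280792.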


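Goodness-of-Fit
• Akaike Information Criterion: $AIC = \frac{1}{N}\big(-2\,LL(\hat\beta) + 2k\big)$
• $CAIC = -2\,LL(\hat\beta) + k(\log N + 1)$
• $BIC = \frac{1}{N}\big(-2\,LL(\hat\beta) + k \log N\big)$

(k = number of parameters, N = number of observations); the criteria sometimes give conflicting results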
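The three criteria can be computed directly from the maximized log-likelihood. The sketch uses natural logs and the per-observation scaling of the slide, and evaluates them for the Heinz binary logit reported later (LL = −696.1344, 9 parameters, 2798 observations).

```python
import math

def information_criteria(ll, k, n):
    """AIC and BIC per observation and CAIC, as defined on this slide (natural logs).
    ll = maximised log-likelihood, k = number of parameters, n = number of observations."""
    aic = (-2.0 * ll + 2.0 * k) / n
    bic = (-2.0 * ll + k * math.log(n)) / n
    caic = -2.0 * ll + k * (math.log(n) + 1.0)
    return aic, bic, caic

# Heinz binary logit: constant + 8 explanatory variables, 2798 observations
aic, bic, caic = information_criteria(-696.1344, k=9, n=2798)
print(round(aic, 4), round(bic, 4), round(caic, 1))
```

With these inputs, AIC and BIC reproduce the "Akaike info criterion" (0.504027) and "Schwarz criterion" (0.523123) in the reported output. This suggests that the output uses the same per-observation, natural-log convention.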


Model selection based on GoF
• Nested models: LR test

$$LR = -2\,\big(LL(\hat\beta_r) - LL(\hat\beta_{ur})\big), \qquad LR \sim \chi^2(k)$$

r = restricted model; ur = unrestricted (full) model; k = difference in number of parameters

• Non-nested models: AIC, CAIC, BIC; select the model with the lowest value

C. Discrete Choice Models

1. Binary Logit Model

2. Multinomial Logit Model

3. Nested logit model

4. Probit Model

5. Ordered Logit Model


1. Binary Logit Model

• Choice between 2 alternatives
• Often ‘accept/reject’ or ‘yes/no’ decisions
– e.g. purchase incidence: make a purchase in the category or not
• Dependent variable: $y_n = 1$ if the option is selected, $y_n = 0$ if the option is not selected
• Model: $P(y_n = 1 \mid x_n)$


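• Based on the general RUM model

$$P_{ni} = \Pr(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj}\ \forall j \neq i) = \int I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \forall j \neq i)\, f(\varepsilon_n)\, d\varepsilon_n$$

• Assumption: error terms are iid and follow an extreme value (Gumbel) distribution:

$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}}\, e^{-e^{-\varepsilon_{nj}}}, \qquad F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}}$$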


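• Based on the general RUM model, with $\beta' x_n$ the difference in representative utility between the two options:

$$\begin{aligned}
P_n &= \int I[\beta' x_n + \varepsilon_n > 0]\, f(\varepsilon)\, d\varepsilon \\
&= \int I[\varepsilon_n > -\beta' x_n]\, f(\varepsilon)\, d\varepsilon \\
&= \int_{\varepsilon = -\beta' x_n}^{\infty} f(\varepsilon)\, d\varepsilon \\
&= 1 - F(-\beta' x_n) \\
&= 1 - \frac{1}{1 + \exp(\beta' x_n)} \\
&= \frac{\exp(\beta' x_n)}{1 + \exp(\beta' x_n)}
\end{aligned}$$

Assumption: the error terms of both options are iid extreme value (Gumbel), $f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}} e^{-e^{-\varepsilon_{nj}}}$; their difference $\varepsilon_n$ then follows a logistic distribution, $F(\varepsilon) = 1/(1 + e^{-\varepsilon})$, which gives the last two steps.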


• Leads to the following expression for the logit choice probability:

$$P_n = \frac{e^{V_n}}{1 + e^{V_n}} = \frac{e^{\beta' X_n}}{1 + e^{\beta' X_n}}$$
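Evaluating the formula literally can overflow for large $|V_n|$. A common numerically stable sketch evaluates whichever of the two algebraically equivalent forms avoids exponentiating a large positive number; the parameter and covariate values below are hypothetical.

```python
import math

def logit_p(v):
    """P_n = e^{V_n} / (1 + e^{V_n}), evaluated without overflow."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)          # v < 0, so z lies in (0, 1)
    return z / (1.0 + z)

def v_n(beta, x):
    """Representative utility V_n = beta'x_n."""
    return sum(b * xk for b, xk in zip(beta, x))

beta = [-0.5, 1.2]           # hypothetical parameters (intercept, slope)
print(logit_p(v_n(beta, [1.0, 0.8])))
print(logit_p(1000.0), logit_p(-1000.0))   # no OverflowError
```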


Properties

- Nonlinear effect of explanatory var’s on dependent variable

- Logistic curve with inflection point at P=0.5



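Effect of explanatory variables?

For $V_n = \beta_0 + \beta_1 x_n$:

$$\frac{\partial P(y_n = 1 \mid x_n)}{\partial x_n} = P_n (1 - P_n)\, \beta_1$$

Quasi-elasticity:

$$\frac{\partial P(y_n = 1 \mid x_n)}{\partial x_n}\, x_n = P_n (1 - P_n)\, \beta_1 x_n$$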
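Both expressions can be checked against a finite-difference derivative. The sketch uses hypothetical coefficients. It also shows that the marginal effect peaks at $P_n = 0.5$, where it equals $\beta_1/4$, which matches the inflection point of the logistic curve.

```python
import math

def p1(x, b0, b1):
    """P(y_n = 1 | x_n) for V_n = b0 + b1 * x_n."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def marginal_effect(x, b0, b1):
    p = p1(x, b0, b1)
    return p * (1.0 - p) * b1              # dP/dx = P(1 - P) beta_1

def quasi_elasticity(x, b0, b1):
    return marginal_effect(x, b0, b1) * x  # (dP/dx) * x

b0, b1 = -1.0, 0.8                         # hypothetical coefficients
x = 2.0
h = 1e-6
numeric = (p1(x + h, b0, b1) - p1(x - h, b0, b1)) / (2 * h)   # central difference
print(marginal_effect(x, b0, b1), numeric, quasi_elasticity(x, b0, b1))
```

A quasi-elasticity of q means that a 1% increase in $x_n$ changes the probability by roughly q/100 (in probability units).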


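Effect of explanatory variables?

For $V_n = \beta_0 + \beta_1 x_n$, the odds are equal to

$$\frac{P(y_n = 1 \mid x_n)}{P(y_n = 0 \mid x_n)} = \exp(\beta_0 + \beta_1 x_n)$$

so a unit increase in $x_n$ multiplies the odds by $\exp(\beta_1)$ (the odds ratio).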
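A quick numerical check of both statements, using hypothetical coefficients:

```python
import math

def odds(x, b0, b1):
    """P(y_n = 1 | x_n) / P(y_n = 0 | x_n) for V_n = b0 + b1 * x_n."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1.0 - p)

b0, b1 = -1.0, 0.8                                        # hypothetical coefficients
print(odds(2.0, b0, b1), math.exp(b0 + b1 * 2.0))         # odds = exp(V_n)
print(odds(3.0, b0, b1) / odds(2.0, b0, b1), math.exp(b1))  # odds ratio for a unit change
```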


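Estimation: ML
- Likelihood function

$$L(\beta) = \prod_n P(y_n = 1 \mid x, \beta)^{y_n}\, \big(1 - P(y_n = 1 \mid x, \beta)\big)^{1 - y_n}$$

- Log-likelihood

$$LL(\beta) = \sum_n y_n \ln P(y_n = 1 \mid x, \beta) + (1 - y_n) \ln\big(1 - P(y_n = 1 \mid x, \beta)\big)$$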
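The log-likelihood translates directly into code. The sketch writes $\ln P_n = -\ln(1 + e^{-V_n})$ and $\ln(1 - P_n) = -\ln(1 + e^{V_n})$ so that large $|V_n|$ cannot overflow. Data and parameter values are hypothetical; at $\beta = 0$ every $P_n$ equals 0.5, which provides an easy check.

```python
import math

def softplus(z):
    """ln(1 + e^z), computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def binary_logit_loglik(beta, X, y):
    """LL(beta) = sum_n y_n ln P_n + (1 - y_n) ln(1 - P_n), P_n = P(y_n = 1 | x_n, beta).
    Each row of X should contain a leading 1 if the model has an intercept."""
    ll = 0.0
    for xn, yn in zip(X, y):
        v = sum(b * xk for b, xk in zip(beta, xn))
        # ln P_n = -ln(1 + e^{-v}) and ln(1 - P_n) = -ln(1 + e^{v})
        ll += -yn * softplus(-v) - (1 - yn) * softplus(v)
    return ll

X = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.7], [1.0, 2.1]]   # hypothetical data
y = [0, 1, 0, 1]
print(binary_logit_loglik([0.0, 0.0], X, y))            # equals 4 * ln(0.5)
```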


Forecasting accuracy
• Predictions: $\hat y_n = 1$ if $F(X_n \hat\beta) > c$ (e.g. $c = 0.5$); $\hat y_n = 0$ if $F(X_n \hat\beta) \le c$
• Compute the hit rate = % of correct predictions
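The prediction rule and the hit rate can be sketched as follows; the fitted probabilities and outcomes are hypothetical.

```python
def predict(probs, c=0.5):
    """y_hat = 1 if the predicted probability exceeds the cut-off c, else 0."""
    return [1 if p > c else 0 for p in probs]

def classification_table(probs, y, c=0.5):
    """Counts {(observed, predicted): n} and the overall hit rate."""
    pred = predict(probs, c)
    table = {(o, p): 0 for o in (0, 1) for p in (0, 1)}
    for o, p in zip(y, pred):
        table[(o, p)] += 1
    hit_rate = (table[(0, 0)] + table[(1, 1)]) / len(y)
    return table, hit_rate

probs = [0.92, 0.15, 0.61, 0.48, 0.77]   # hypothetical fitted probabilities
y = [1, 0, 0, 1, 1]
table, hr = classification_table(probs, y)
print(table, hr)
```

For the counts in the classification table reported later for the Heinz example, the overall hit rate is (81 + 2457)/2798 ≈ 90.7%. Because about 89% of the observations are ones, the 0.5 cut-off classifies most zeros as ones: only 26.4% of the observed zeros are predicted correctly.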


Example: Purchase Incidence Model

$$P^n_t(\text{inc}) = \frac{\exp(W^n_t)}{1 + \exp(W^n_t)}$$

with $P^n_t(\text{inc})$ = probability that household n engages in a category purchase in the store on purchase occasion t, and $W^n_t$ = the utility of the purchase option.

Bucklin and Gupta (1992)


Example: Purchase Incidence Model

$$W^n_t = \gamma_0 + \gamma_1 CR^n + \gamma_2 INV^n_t + \gamma_3 CV^n_t$$

with $CR^n$ = rate of consumption for household n, $INV^n_t$ = inventory level for household n at time t, $CV^n_t$ = category value for household n at time t




• Data
– A.C. Nielsen scanner panel data
– 117 weeks: 65 for initialization, 52 for estimation
– 565 households: 300 selected randomly for estimation, remaining households = holdout sample for validation
– Data set for estimation: 30,966 shopping trips, 2,275 purchases in the category (liquid laundry detergent)
– Estimation limited to the 7 top-selling brands (80% of category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)


Model | # param. | LL | U² (pseudo R²) | BIC
Null model | 1 | -13614.4 | - | 13619.6
Full model | 4 | -11234.5 | .175 | 11255.2

Goodness-of-Fit


Parameter | Estimate (t-statistic)
Intercept γ0 | -4.521 (-27.70)
CR γ1 | .549 (4.18)
INV γ2 | -.520 (-8.91)
CV γ3 | .410 (8.00)

Parameter estimates
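To see what the estimates imply, covariate values can be plugged into the incidence probability. The household profile below is entirely hypothetical (the slides do not report the scale of CR, INV or CV), so only the direction of the changes is meaningful: a lower inventory raises the purchase probability because γ2 < 0.

```python
import math

# Estimates from the table above (Bucklin and Gupta, 1992)
GAMMA0, GAMMA_CR, GAMMA_INV, GAMMA_CV = -4.521, 0.549, -0.520, 0.410

def incidence_prob(cr, inv, cv):
    """P_t^n(inc) = exp(W) / (1 + exp(W)) with W = g0 + g1*CR + g2*INV + g3*CV."""
    w = GAMMA0 + GAMMA_CR * cr + GAMMA_INV * inv + GAMMA_CV * cv
    return math.exp(w) / (1.0 + math.exp(w))

# Hypothetical covariate values; the units used in the original study are not given here
base = incidence_prob(cr=1.0, inv=2.0, cv=3.0)
low_inventory = incidence_prob(cr=1.0, inv=0.5, cv=3.0)
print(round(base, 4), round(low_inventory, 4))
```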

Variable Coefficient Std. Error z-Statistic Prob.

C 0.222121 0.668483 0.332277 0.7397

DISPLHEINZ 0.573389 0.239492 2.394186 0.0167

DISPLHUNTS -0.557648 0.247440 -2.253674 0.0242

FEATHEINZ 0.505656 0.313898 1.610896 0.1072

FEATHUNTS -1.055859 0.349108 -3.024445 0.0025

FEATDISPLHEINZ 0.428319 0.438248 0.977344 0.3284

FEATDISPLHUNTS -1.843528 0.468883 -3.931748 0.0001

PRICEHEINZ -135.1312 10.34643 -13.06066 0.0000

PRICEHUNTS 222.6957 19.06951 11.67810 0.0000

Binary Logit Model (Franses and Paap: www.few.eur.nl/few/people/paap)


Mean dependent var 0.890279 S.D. dependent var 0.312598

S.E. of regression 0.271955 Akaike info criterion 0.504027

Sum squared resid 206.2728 Schwarz criterion 0.523123

Log likelihood -696.1344 Hannan-Quinn criter. 0.510921

Restr. log likelihood -967.918 Avg. log likelihood -0.248797

LR statistic (8 df) 543.5673 McFadden R-squared 0.280792

Probability(LR stat) 0.000000

Obs with Dep=0 307 Total obs 2798

Obs with Dep=1 2491


Figure: forecast HEINZF vs. actual HEINZ (forecast sample 1-2798, 2798 included observations)

Root Mean Squared Error 0.271517
Mean Absolute Error 0.146875
Mean Abs. Percent Error 7.343760
Theil Inequality Coefficient 0.146965
Bias Proportion 0.000000
Variance Proportion 0.329791
Covariance Proportion 0.670209


Classification Table (a)

Observed | Predicted HE = .00 | Predicted HE = 1.00 | Percentage Correct
Step 1, HE = .00 | 81 | 226 | 26.4
Step 1, HE = 1.00 | 34 | 2457 | 98.6
Overall Percentage | | | 90.7

a. The cut value is .500


2. Multinomial Logit Model

• Choice between J > 2 categories
• Dependent variable $y_n = 1, 2, 3, \ldots, J$
• Explanatory variables
– different across individuals, not across categories (standard MNL model)
– different across (individuals and) categories (conditional MNL model)
• Model: $P(y_n = j \mid X_n)$


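• Based on the general RUM model

$$P_{ni} = \Pr(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj}\ \forall j \neq i) = \int I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \forall j \neq i)\, f(\varepsilon_n)\, d\varepsilon_n$$

• Assumption: error terms are iid and follow an extreme value (Gumbel) distribution:

$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}}\, e^{-e^{-\varepsilon_{nj}}}, \qquad F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}}$$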


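• Identification problem: select a reference category and set its coefficients equal to 0

Standard MNL model:

$$P(y_n = j \mid x_n) = \frac{\exp(\beta_j' x_n)}{\sum_l \exp(\beta_l' x_n)}$$

With reference category i ($\beta_i = 0$):

$$P(y_n = j \mid x_n) = \frac{\exp(\beta_j' x_n)}{1 + \sum_{l \neq i} \exp(\beta_l' x_n)} \quad (j \neq i), \qquad P(y_n = i \mid x_n) = \frac{1}{1 + \sum_{l \neq i} \exp(\beta_l' x_n)}$$

• Conditional MNL model

$$P(y_n = j \mid x_n) = \frac{\exp(\beta' x_{nj})}{\sum_l \exp(\beta' x_{nl})}$$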
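The reference-category normalization can be sketched for the standard MNL, where each non-reference alternative j has its own coefficient vector $\beta_j$ and $x_n$ contains decision-maker characteristics. The alternative labels, coefficients and covariates below are hypothetical.

```python
import math

def mnl_probs(x, betas, ref):
    """Standard MNL: P(y=j|x) = exp(beta_j'x) / sum_l exp(beta_l'x), with beta_ref = 0.
    betas maps each non-reference alternative to its coefficient vector."""
    v = {ref: 0.0}                                   # reference category: beta = 0
    for j, b in betas.items():
        v[j] = sum(bk * xk for bk, xk in zip(b, x))
    denom = sum(math.exp(vj) for vj in v.values())   # = 1 + sum_{l != ref} exp(beta_l'x)
    return {j: math.exp(vj) / denom for j, vj in v.items()}

x = [1.0, 2.0]                        # hypothetical: intercept and one household characteristic
betas = {2: [0.2, -0.4], 3: [-0.5, 1.0]}
p = mnl_probs(x, betas, ref=1)
print(p)
```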



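• Interpretation of parameters
– Derivative (marginal effect)

$$\frac{\partial P(y_n = j \mid x_n)}{\partial x_{njk}} = P_{nj}(1 - P_{nj})\, \beta_k$$

– Cross-effects

$$\frac{\partial P(y_n = j \mid x_n)}{\partial x_{nik}} = -P_{nj} P_{ni}\, \beta_k$$

(Traditional MNL model, see Franses and Paap p.80)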


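• Interpretation of parameters
– Overall effect

$$\sum_{j=1}^{J} \frac{\partial P_{nj}}{\partial x_{nik}} = P_{ni}(1 - P_{ni})\beta_k - \sum_{j \neq i} P_{nj} P_{ni} \beta_k = P_{ni}\beta_k\Big[(1 - P_{ni}) - \sum_{j \neq i} P_{nj}\Big] = 0$$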
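The own and cross effects, and the fact that they cancel when summed over all alternatives, can be checked numerically for a conditional MNL. The sketch uses hypothetical utilities and a hypothetical coefficient. A change of $h$ in $x_{nik}$ shifts $V_{ni}$ by $\beta_k h$, which is used for the finite-difference check.

```python
import math

def cl_probs(V):
    """Conditional logit probabilities exp(V_j) / sum_l exp(V_l), with V_j = beta'x_nj."""
    m = max(V)
    e = [math.exp(v - m) for v in V]
    s = sum(e)
    return [x / s for x in e]

def effects(P, beta_k):
    """D[j][i] = dP_j / dx_ik: P_j (1 - P_j) beta_k if i == j, else -P_j P_i beta_k."""
    J = len(P)
    return [[(P[j] * (1.0 - P[j]) if i == j else -P[j] * P[i]) * beta_k
             for i in range(J)] for j in range(J)]

beta_k = -0.7                  # hypothetical coefficient of attribute k (e.g. price)
V = [0.4, 0.1, -0.3]           # hypothetical representative utilities
P = cl_probs(V)
D = effects(P, beta_k)

# Overall effect: summing dP_j / dx_ik over all alternatives j gives 0 for every i
print([sum(D[j][i] for j in range(len(P))) for i in range(len(P))])
```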


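• Interpretation of parameters
– Probability ratio

$$\frac{P(y_n = j \mid x_n)}{P(y_n = i \mid x_n)} = \frac{\exp(\beta' x_{nj})}{\exp(\beta' x_{ni})}, \qquad \ln \frac{P(y_n = j \mid x_n)}{P(y_n = i \mid x_n)} = \beta'(x_{nj} - x_{ni})$$

– Does not depend on the other alternatives!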


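• Estimation
– ML estimation

$$\begin{aligned}
L(\beta) &= \prod_n \prod_j P(y_n = j \mid x_n)^{z_{nj}} \\
LL(\beta) &= \sum_n \sum_j z_{nj} \ln P(y_n = j \mid x_n) \\
&= \sum_n \sum_j z_{nj} \ln \frac{\exp(\beta' x_{nj})}{\sum_k \exp(\beta' x_{nk})} \\
&= \sum_n \sum_j z_{nj} \Big(\ln \exp(\beta' x_{nj}) - \ln \sum_k \exp(\beta' x_{nk})\Big) \\
&= \sum_n \sum_j z_{nj}\, \beta' x_{nj} - \sum_n \ln \sum_k \exp(\beta' x_{nk})
\end{aligned}$$

($z_{nj}$ = 1 if j is selected, 0 otherwise; the last step uses $\sum_j z_{nj} = 1$)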
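The last line of the derivation translates directly into code for the conditional MNL. For each decision maker, add $\beta' x$ of the chosen alternative and subtract the log of the denominator. The sketch computes that log with the log-sum-exp trick for numerical stability; data and parameter values are hypothetical.

```python
import math

def cl_loglik(beta, X, choice):
    """LL(beta) = sum_n [beta'x_{n,j*} - ln sum_k exp(beta'x_{nk})], j* = chosen alternative.
    X[n][k] is the attribute vector of alternative k for decision maker n."""
    ll = 0.0
    for xn, jn in zip(X, choice):
        v = [sum(b * a for b, a in zip(beta, xk)) for xk in xn]
        m = max(v)
        log_denom = m + math.log(sum(math.exp(vk - m) for vk in v))   # log-sum-exp
        ll += v[jn] - log_denom
    return ll

# Hypothetical data: 2 decision makers, 3 alternatives, 2 attributes (e.g. price, promo)
X = [
    [[2.0, 0.0], [2.5, 1.0], [3.0, 0.0]],
    [[1.8, 1.0], [2.6, 0.0], [2.9, 1.0]],
]
choice = [1, 0]          # index of the selected alternative (z_nj = 1)
print(cl_loglik([0.0, 0.0], X, choice))   # equals -2 * ln(3)
```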


• Estimation
– Alternative estimation procedures

Simulation-assisted estimation (Train, Ch.10)

Bayesian estimation (Train, Ch.12)


• Example


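$$P^h_t(i \mid \text{inc}) = \frac{\exp(U^h_{it})}{\sum_j \exp(U^h_{jt})}$$

$$U^h_{it} = u_i + \beta_1 BL^h_i + \beta_2 LBP^h_{it} + \beta_3 SL^h_i + \beta_4 LSP^h_{it} + \beta_5\, \text{Price}_{it} + \beta_6\, \text{Promo}_{it}$$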


• Variables
– $u_i$ = constant for brand-size i
– $BL^h_i$ = loyalty of household h to the brand of brand-size i
– $LBP^h_{it}$ = 1 if i was the last brand purchased, 0 otherwise
– $SL^h_i$ = loyalty of household h to the size of brand-size i
– $LSP^h_{it}$ = 1 if i was the last size purchased, 0 otherwise
– $\text{Price}_{it}$ = actual shelf price of brand-size i at time t
– $\text{Promo}_{it}$ = promotional status of brand-size i at time t



• Data
– A.C. Nielsen scanner panel data
– 117 weeks: 65 for initialization, 52 for estimation
– 565 households: 300 selected randomly for estimation, remaining households = holdout sample for validation
– Data set for estimation: 30,966 shopping trips, 2,275 purchases in the category (liquid laundry detergent)
– Estimation limited to the 7 top-selling brands (80% of category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)




Model | # param. | LL | U² (pseudo R²) | BIC
Null model | 27 | -5957.3 | - | 6061.6
Full model | 33 | -3786.9 | .364 | 3914.3

Goodness-of-Fit



Parameter | Estimate (t-statistic)
BL β1 | 3.499 (22.74)
LBP β2 | .548 (6.50)
SL β3 | 2.043 (13.67)
LSP β4 | .512 (7.06)
Price β5 | -.696 (-13.66)
Promo β6 | 2.016 (21.33)

Estimation Results


Scale parameter
• The variance of the (standard) extreme value distribution is $\pi^2/6$
• If the true utility is $U^*_{nj} = \beta^{*\prime} x_{nj} + \varepsilon^*_{nj}$ with $\mathrm{var}(\varepsilon^*_{nj}) = \sigma^2 (\pi^2/6)$, the estimated representative utility $V_{nj} = \beta' x_{nj}$ involves a rescaling of $\beta^*$: $\beta = \beta^*/\sigma$
• $\beta^*$ and $\sigma$ cannot be estimated separately: take into account that the estimated coefficients indicate the variable’s effect relative to the variance of the unobserved factors

Include scale parameters if subsamples in a pooled estimation (may) have different error variances


Scale parameter in case of pooled estimation of subsamples with different error variance

• For each subsample s, multiply utility by $\mu_s$, which is estimated simultaneously with β
• Normalization: set $\mu_s$ equal to 1 for one subsample
• Values of $\mu_s$ reflect differences in error variance
– $\mu_s > 1$: error variance is smaller in s than in the reference subsample
– $\mu_s < 1$: error variance is larger in s than in the reference subsample

Swait and Louviere (1993), Andrews and Currim (2002)


• Example
• Data from an online experiment, 2 product categories
• Three different assortments, assigned to different respondent groups
– Assortment 1: small assortment
– Assortment 2 = assortment 1 extended with additional brands
– Assortment 3 = assortment 1 extended with additional types
• Explanatory variables are the same (household characteristics, MM variables), with the exception of the constants
• A scale factor is introduced for assortments 2 and 3 (assortment 1 is the reference, with scale factor = 1)

Breugelmans et al. (2005)



MARGARINE

Attribute | Assortment 1 (limited) | Assortment 2 (add new flavors of existing brands) | Assortment 3 (add new brands of existing flavors)
Brand | Common (a) | Common | Common + add new brands
Flavor | Common | Common + add new flavors | Common
# alternatives | 11 | 19 | 17
# respondents | 105 | 116 | 100
# purchase occasions | 275 | 279 | 278
# screens needed | < 1 | > 1 | > 1

CEREALS

Attribute | Assortment 1 (limited) | Assortment 2 (add new flavors of existing brands) | Assortment 3 (add new brands of existing flavors)
Brand | Common | Common | Common + add new brands
Flavor | Common | Common + add new flavors | Common
# alternatives | 21 | 32 | 46
# respondents | 81 | 97 | 87
# purchase occasions | 271 | 261 | 281
# screens needed | > 1 | > 1 | > 1

2. Multinomial Logit Model: MNL model, pooled estimation

$$p^h_{it|a} = \frac{\exp(\mu_a u^h_{it|a})}{\sum_{j \in C^h_a} \exp(\mu_a u^h_{jt|a})}$$

• $p^h_{it|a}$ = the probability that household h chooses item i at time t, facing assortment a
• $u^h_{it|a}$ = the choice utility of item i for household h facing assortment a = f(household variables, MM variables)
• $C^h_a$ = set of category items available to household h within assortment a
• $\mu_a$ = Gumbel scale factor

Breugelmans et al., based on Andrews and Currim (2002); Swait and Louviere (1993)
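The role of $\mu_a$ can be seen by evaluating the pooled formula for one set of utilities under different scale factors. The utilities are hypothetical, and the scale values 1.25 and 0.75 are illustrative only.

```python
import math

def scaled_mnl(u, mu):
    """p_i = exp(mu * u_i) / sum_j exp(mu * u_j), with mu the Gumbel scale factor."""
    m = max(u)
    e = [math.exp(mu * (x - m)) for x in u]
    s = sum(e)
    return [x / s for x in e]

u = [1.0, 0.5, 0.0]                # hypothetical utilities, identical across assortments
p_ref = scaled_mnl(u, 1.0)         # reference assortment: mu normalised to 1
p_sharp = scaled_mnl(u, 1.25)      # mu > 1: smaller error variance, sharper probabilities
p_flat = scaled_mnl(u, 0.75)       # mu < 1: larger error variance, flatter probabilities
print([round(p, 3) for p in p_ref])
print([round(p, 3) for p in p_sharp])
print([round(p, 3) for p in p_flat])
```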


Estimation results

• Goodness-of-Fit (M = margarine, C = cereals)
– (average) LL: -0.045 (M), -0.040 (C)
– BIC: 2929 (M), 4763 (C)
– CAIC: 2871 (M), 4699 (C)

• Scale factors:
– M: 1.2498 (ass2), 1.2627 (ass3)
– C: 1.0562 (ass2), 0.7573 (ass3)




Margarine

Variable | Assortment 1 | Assortment 2 | Assortment 3
Scale factor | [1.00] b | 1.2498*** | 1.2627***
Mean: | | |
Last purchase | 2.0675*** | [2.5840***] c | [2.6106***] c
Item preference | 2.8310*** | [3.5382***] c | [3.5747***] c
Brand asymmetry | 0.2805 | 0.4228** | 0.5400*
Size asymmetry | -0.0841 | -0.0880 | 0.0169
Sequence | - d | 0.3672** | -0.1190
Proximity | 0.8332 | 1.0303*** | 0.6235

Cereals

Variable | Assortment 1 | Assortment 2 | Assortment 3
Scale factor | [1.00] b | 1.0562*** | 0.7573***
Mean: | | |
Last purchase | 0.6441*** | [0.6803***] c | [0.4888***] c
Item preference | 5.2011*** | [5.4934***] c | [3.9109***] c
Brand asymmetry | 0.0077 | 0.6130 | 0.0969
Taste asymmetry | -0.0260 | 0.2938** | -0.1596
Type asymmetry | 0.3119 | -0.0614 | 0.3816**
Sequence | -0.3311 | -0.0695 | 0.6190***
Proximity | 2.0041*** | 0.7214 | 4.1140***

(Excluding brand/size constants)


• Limitations of the MNL model:
– Independence of Irrelevant Alternatives (proportional substitution pattern)
– Order (where relevant) is not taken into account
– Systematic taste variation can be represented, but not random taste variation
– No correlation between error terms (iid errors)


• Independence of irrelevant alternatives
• The ratio of choice probabilities for 2 alternatives i and j does not depend on other alternatives (see above)
• Implication: proportional substitution patterns
• Cf. Blue Bus – Red Bus Example
– T1: Blue bus (P=50%), Car (P=50%)
– T2 (MNL prediction after adding a red bus identical to the blue bus): Blue bus (P=33%), Car (P=33%), Red bus (P=33%), whereas one would expect Car (P=50%) and each bus (P=25%)
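Under the (hypothetical) assumption that all alternatives have the same representative utility, the MNL reproduces the percentages above. It also keeps the car/blue-bus ratio fixed when the red bus is added.

```python
import math

def mnl(V):
    """Logit probabilities for a dict {alternative: representative utility}."""
    e = {k: math.exp(v) for k, v in V.items()}
    s = sum(e.values())
    return {k: x / s for k, x in e.items()}

t1 = mnl({"car": 0.0, "blue bus": 0.0})
t2 = mnl({"car": 0.0, "blue bus": 0.0, "red bus": 0.0})   # red bus = copy of blue bus
print(t1)   # {'car': 0.5, 'blue bus': 0.5}
print(t2)   # every alternative gets 1/3
print(t1["car"] / t1["blue bus"], t2["car"] / t2["blue bus"])   # ratio unchanged (IIA)
```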


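• Independence of irrelevant alternatives
New alternatives, or alternatives for which utility has increased, draw proportionally from all other alternatives

• Elasticity of $P_{ni}$ with respect to variable $x_{njk}$ (cross-elasticity, $j \neq i$):

$$\frac{\partial P_{ni}}{\partial x_{njk}} = -\beta_k P_{ni} P_{nj}, \qquad E_{P_{ni}, x_{njk}} = \frac{\partial P_{ni}}{\partial x_{njk}} \frac{x_{njk}}{P_{ni}} = -\beta_k x_{njk} P_{nj}$$

which is the same for every $i \neq j$.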
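The proportional substitution pattern can be verified numerically. The sketch computes finite-difference elasticities of every other alternative's probability with respect to an attribute of alternative j and compares them with the closed form. It also shows the own elasticity $\beta_k x_{njk}(1 - P_{nj})$ for comparison. Constants, attribute values and the coefficient are hypothetical.

```python
import math

def cl_probs(V):
    """Conditional logit probabilities exp(V_j) / sum_l exp(V_l)."""
    m = max(V)
    e = [math.exp(v - m) for v in V]
    s = sum(e)
    return [x / s for x in e]

beta_k = -0.7                 # hypothetical coefficient of attribute k (e.g. price)
const = [0.5, 0.2, 0.0]       # hypothetical alternative-specific constants
x = [2.0, 2.5, 3.0]           # hypothetical attribute values x_njk

def probs(xs):
    return cl_probs([c + beta_k * xj for c, xj in zip(const, xs)])

def numeric_elasticity(i, j, h=1e-6):
    """E_{P_i, x_j} = (dP_i / dx_j) * x_j / P_i, by central differences."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    dp = (probs(up)[i] - probs(down)[i]) / (2.0 * h)
    return dp * x[j] / probs(x)[i]

j = 2
cross = -beta_k * x[j] * probs(x)[j]          # closed form, identical for every i != j
own = beta_k * x[j] * (1.0 - probs(x)[j])     # own elasticity of P_j
print(numeric_elasticity(0, j), numeric_elasticity(1, j), cross)
print(numeric_elasticity(j, j), own)
```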


• Independence of irrelevant alternatives

Hausman-McFadden specification test

Basic idea: if a subset of the choice set is truly irrelevant, omitting it should not significantly affect the estimates.


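• Independence of irrelevant alternatives: Hausman-McFadden specification test
Procedure:
- Estimate the logit model twice: a. on the full set of alternatives; b. on a subset of alternatives (and the subsample with choices from this set)
- When IIA is true,

$$(\hat\beta_b - \hat\beta_a)' \big[\hat\Omega_b - \hat\Omega_a\big]^{-1} (\hat\beta_b - \hat\beta_a) \sim \chi^2(k)$$

with $\hat\Omega_a$, $\hat\Omega_b$ the estimated covariance matrices of the two sets of estimates and k the number of parameters compared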


• Independence of irrelevant alternatives
Alternative procedure:
- Estimate the logit model twice: a. on the full set of alternatives; b. on a subset of alternatives (and the subsample with choices from this set)
- Compute LL for subset b with the parameters obtained for set a
- Compare with $LL_b$: the GoF should be similar


• Solutions to IIA
– Model with attribute-specific constants (intrinsic preferences)
– Nested Logit Model
– Models that allow for correlation among the error terms, such as Probit Models
