Upload
marjory-miles
View
216
Download
1
Embed Size (px)
Citation preview
Models with limited dependent variables
Doctoral Program 2006-2007
Katia Campo
Introduction
Limited dependent variables
Discrete dependent variable Continuous dependent variable
Truncated/Censored
Regr.Models
DiscreteChoiceModels Duration
(Hazard)Models
Truncated,Censored
Discrete choice models
• Choice between different options (j)
Single Choice (binary choice models)
e.g. Buy a product or not, follow higher education or not, ...
j=1 (yes/accept) or 0 (no/reject)
Multiple Choice (multinomial choice models),
e.g. cars, stores, transportation modes j=1(opt.1), 2(opt.2), ....., J(opt.J)
Truncated/censored regression models
• Truncated variable:observed only beyond a certain threshold level (‘truncation point’)e.g. store expenditures, income
• Censored variables:values in a certain range are all transformed to (or reported as) a single value (Greene, p.761)
e.g. demand (stockouts, unfullfilled demand), hours worked
Duration/Hazard models
• Time between two events, e.g.– Time between two purchases– Time until a consumer becomes
inactive/cancels a subscription– Time until a consumer responds to direct mail/
a questionnaire– ...
Frances and Paap (2001)
Need to use adjusted models: Illustration
Overview
• Part I. Discrete Choice Models• Part II. Censored and Truncated Regression
Models• Part III. Duration Models
Recommended Literature
– Kenneth Train, Discrete Choice Methods with Simulation, Cambridge University Press, 2003 (Part I)
– Ph.H.Franses and R.Paap, Quantitative Models in Market Research, Cambridge University Press, 2001 (Part I-II-III; Data: www.few.eur.nl/few/people/paap)
– D.A.Hensher, J.M.Rose and W.H.Greene, Applied Choice Analysis, Cambridge University Press, 2005 (Part I)
Part I. Discrete Choice Models
Overview Part I, DCM
A. Properties of DCMB. Estimation of DCMC. Types of Discrete Choice Models
1. Binary Logit Model2. Multinomial Logit Model3. Nested logit model4. Probit Model5. Ordered Logit Model
D. Heterogeneity
Notation
• n = decision maker• i,j = choice options• y = decision outcome• x = explanatory variables = parameters = error term• I[.] = indicator function, equal to 1 if expression
within brackets is true, 0 otherwisee.g. I[y=j|x] = 1 if j was selected (given x), equal to 0 otherwise
A. Properties of DCM
1. Characteristics of the choice set Alternatives must be mutually exclusive
no combination of choice alternatives
(e.g. different brands, combination of diff. transportation modes)
Choice set must be exhaustivei.e., include all relevant alternatives
Finite number of alternatives
Kenneth Train
A. Properties of DCM
2. Random utility maximization
Ass: decision maker selects the alternative that provides the highest utility,
i.e. Selects i if Uni > Unj j i
Decomposition of utility into a deterministic (observed) and random (unobserved) part
Unj = Vnj + nj
Kenneth Train
A. Properties of DCM
2. Random utility maximization
nnnjnininjni dfijVVIP )()(
Kenneth Train
)(Prob
)(Prob
)(Prob
ijVV
ijVV
ijUUP
njnininj
njnjnini
njnini
A. Properties of DCM
3. Identification problemsa. Only differences in utility matter
Choice probabilities do not change when a constant is added to each alternative’s utility
ImplicationSome parameters cannot be identified/estimated Alternative-specific constants; Coefficients of variables that change over decision makers but not over alternatives
Normalization of parameter(s)
Kenneth Train
A. Properties of DCM
3. Identification problemsb. Overall scale of utility is irrelevant
Choice probabilities do not change when the utility of all alternatives are multiplied by the same factor
ImplicationCoefficients of models (data sets) are not directly comparable
Normalization (var.of error terms)
Kenneth Train
A. Properties of DCM
4. Aggregation
Biased estimates when aggregate values of the explanatory variables are used as input
Consistent estimates can be obtained by sample enumeration- compute prob./elasticity for each dec.maker- compute (weighted) average of these values
Kenneth Train
Swait and Louvière(1993), Andrews and Currim (2002)
Properties of DCM
4. Aggregation
Keneth Train
B. Estimation DCM
• Numerical maximization (ML-estimation)
• Simulation-assisted estimation
• Bayesian estimation(see Train)
B. ML-estimation
- Objective: “find those parameter values most likely to have produced the sample observations” (Judge et al.)
- Likelihood for one observation: Pn(X,)- Likelihoodfunction
L() = n Pn(X,) - Loglikelihood
LL() = n ln(Pn(X,))
B. ML Estimation
Determine for which LL() reaches its max- First derivative = 0 no closed-form
solution- Iterative procedure:
i. Starting values 0
ii. Determine new value t+1 for which LL(t+1) > LL(t)
iii. Repeat procedure ii until convergence (small change in LL())
B. ML Estimation
B. ML Estimation
- Direction and step size t → t+1 ?
based on taylor approximation of LL(t+1) (with base (t))
LL(t+1) = LL(t)+(t+1- t)’gt+1/2(t+1- t)’Ht (t+1- t) [1]
with
t
LLgt
)(
t
LLH t
'
)(²
B. ML Estimation
- Direction and step size t → t+1 ?
Optimization of [1] leads to:
Computation of the Hessian may cause problems
tttt gH 11 )(
B. ML Estimation
Alternatives procedures:
• Approximations to the Hessian
• Other procedures, such as steepest-ascent
See e.g. Train, Judge et al.(1985)
B. ML Estimation
Properties ML estimator
Consistency
Asymptotic Normality
Asymptotic Efficiency
See e.g. Greene (ch.17), Judge et al.
B.Diagnostics and Model Selection
Goodness-of-Fit• Joint significance of explanatory var’s
LR-test : LR = -2(LL(0) - LL()) LR ~ ²(k)
• Pseudo R² = 1 - LL()
LL(0)
B.Diagnostics and Model Selection
Goodness-of-Fit• Akaike Information Criterion
AIC = 1/N (-2 LL() +2k)
• CAIC = -2LL() + k(log(N)+1)• BIC = 1/N (-2 LL() + k log(N)) sometimes conflicting results
B.Diagnostics and Model Selection
Model selection based on GoF• Nested models : LR-test
LR = -2(LL(r) - LL(ur))
r=restricted model; ur=unrestricted (full) model
LR ~ ²(k) (k=difference in # of parameters)
• Non-nested models
AIC, CAIC, BIC lowest value
C. Discrete Choice Models
1. Binary Logit Model
2. Multinomial Logit Model
3. Nested logit model
4. Probit Model
5. Ordered Logit Model
1. Binary Logit Model
• Choice between 2 alternatives
• Often ‘accept/reject’ or ‘yes/no’ decisions– E.g. Purchase incidence: make a purchase in the
category or not
• Dep. var. yn = 1, if option is selected
= 0, if option is not selected
• Model: P(yn=1| xn)
1. Binary Logit Model
• Based on the general RUM-model
• Ass.: error terms are iid and follow an extreme value or Gumbel distribution
nnnjnininj
njnininjni
dfijVVI
ijVVP
)()(
)(Prob
nj
njnj
enj
enj
eF
eef
)(
)(
1. Binary Logit Model
• Based on the general RUM-model• Pn = I[β’xn + εn > 0] f(ε) dε
= I[εn > -β’xn] f(ε) dε
= ε=-β’x f(ε) dε
= 1 – F(- β’xn)
= 1 – 1/(1+exp(β’xn))
= exp(β’xn)/(1+exp(β’xn))
Ass.: error terms are iid and follow an extreme value/Gumbel distr. nn e
n eef
)(
1. Binary Logit Model
• Leads to the following expression for the logit choice probability
n
n
n
n
X
X
V
V
n
e
e
e
eP
1
1
1. Binary Logit Model
Properties
- Nonlinear effect of explanatory var’s on dependent variable
- Logistic curve with inflection point at P=0.5
1. Binary Logit Model
1. Binary Logit Model
Effect of explanatory variables ?
For
Quasi-elasticity
nn xV 10
1)1()|1( nn
n
nn PPx
xyP
nnnnn
nn xPPxx
xyP1)1(
)|1(
1. Binary Logit Model
Effect of explanatory variables ?
For
Odds ratio is equal to
nn xV 10
)exp()|0(
)|1(10 n
nn
nn xxyP
xyP
1. Binary Logit Model
Estimation: ML- Likelihoodfunction L()
= n P(yn=1|x,)yn (1- P (yn=1|x,))1-yn
- Loglikelihood LL()
= n yn ln(P (yn=1|x,) )+
(1-yn) ln(1- P (yn=1|x,))
1. Binary Logit Model
Forecasting accuracy• Predictions : yn=1 if F(Xn ) > c (e.g. 0.5)
yn=0 if F(Xn ) c
• Compute hit rate = % of correct predictions
1. Binary Logit Model
Example: Purchase Incidence Model
pt
n(inc) = probability that household n engages
in a category purchase in the store on purchase occasion t,
Wtn = the utility of the purchase option.
)exp(1
)exp()(
nt
ntn
t W
WincP
Bucklin and Gupta (1992)
1. Binary Logit Model
Example: Purchase Incidence Model
nt
nt
nt
nnt CVINVCRW 3210
WithCRn = rate of consumption for household nINVn
t = inventory level for household n, time tCVn
t= category value for household n, time t
Bucklin and Gupta (1992)
1. Binary Logit Model
Bucklin and Gupta (1992)
• Data– A.C.Nielsen scanner panel data– 117 weeks: 65 for initialization, 52 for estimation– 565 households: 300 selected randomly for estimation,
remaining hh = holdout sample for validation– Data set for estimation: 30.966 shopping trips, 2275
purchases in the category (liquid laundry detergent)– Estimation limited to the 7 top-selling brands (80% of
category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)
1. Binary Logit Model
Model # param. LL U² (pseudo R²)
BIC
Null model
Full model
1
4
-13614.4
-11234.5
-
.175
13619.6
11255.2
Goodness-of-Fit
1. Binary Logit Model
Parameter Estimate (t-statistic)
Intercept γ0
CR γ1
INV γ2
CV γ3
-4.521 (-27.70)
.549 (4.18)
-.520 (-8.91)
.410 (8.00)
Parameter estimates
Variable Coefficient Std. Error z-Statistic Prob.
C 0.222121 0.668483 0.332277 0.7397
DISPLHEINZ 0.573389 0.239492 2.394186 0.0167
DISPLHUNTS -0.557648 0.247440 -2.253674 0.0242
FEATHEINZ 0.505656 0.313898 1.610896 0.1072
FEATHUNTS -1.055859 0.349108 -3.024445 0.0025
FEATDISPLHEINZ 0.428319 0.438248 0.977344 0.3284
FEATDISPLHUNTS -1.843528 0.468883 -3.931748 0.0001
PRICEHEINZ -135.1312 10.34643 -13.06066 0.0000
PRICEHUNTS 222.6957 19.06951 11.67810 0.0000
Binary Logit Model (Franses and Paap: www.few.eur.nl/few/people/paap)
Binary Logit Model (Franses and Paap: www.few.eur.nl/few/people/paap)
Mean dependent var 0.890279 S.D. dependent var 0.312598
S.E. of regression 0.271955 Akaike info criterion 0.504027
Sum squared resid 206.2728 Schwarz criterion 0.523123
Log likelihood -696.1344 Hannan-Quinn criter. 0.510921
Restr. log likelihood -967.918 Avg. log likelihood -0.248797
LR statistic (8 df) 543.5673 McFadden R-squared 0.280792
Probability(LR stat) 0.000000
Obs with Dep=0 307 Total obs 2798
Obs with Dep=1 2491
Binary Logit Model (Franses and Paap: www.few.eur.nl/few/people/paap)
0.0
0.2
0.4
0.6
0.8
1.0
500 1000 1500 2000 2500
HEINZF
Forecast: HEINZFActual: HEINZForecast sample: 1 2798Included observations: 2798
Root Mean Squared Error 0.271517Mean Absolute Error 0.146875Mean Abs. Percent Error 7.343760Theil Inequality Coefficient 0.146965 Bias Proportion 0.000000 Variance Proportion 0.329791 Covariance Proportion 0.670209
Binary Logit Model (Franses and Paap: www.few.eur.nl/few/people/paap)
Classification Tablea
81 226 26,4
34 2457 98,6
90,7
Observed,00
1,00
HE
Overall Percentage
Step 1,00 1,00
HE PercentageCorrect
Predicted
The cut value is ,500a.
2. Multinomial Logit Model
• Choice between J>2 categories• Dependent variable yn = 1, 2, 3, .... J• Explanatory variables
– Different across individuals, not across categories (standard MNL model)
– Different across (individuals and) categories (conditional MNL model)
• Model: P(yn=j|Xn)
2. Multinomial Logit Model
• Based on the general RUM-model
• Ass.: error terms are iid following an extreme value or Gumbel distribution
nnnjnininj
njnininjni
dfijVVI
ijVVP
)()(
)(Prob
nj
njnj
enj
enj
eF
eef
)(
)(
2. Multinomial Logit Model
• Identification problem select reference category and set coeffients equal to 0
l nl
njnn x
xxjyP
)'exp(
)'exp()|(
ijx
xxjyP
xxiyP
il nl
njnn
il nlnn
)'exp(1
)'exp()|(
)'exp(1
1)|(
• Conditional MNL model
l nl
njnn x
xxjyP
)'exp(
)'exp()|(
2. Multinomial Logit Model
2. Multinomial Logit Model
• Interpretation of parameters– Derivative (marginal effect)
– Cross-effects
knjnjnjk
nn PPx
xjyP )1()|(
kninjnik
nn PPx
xjyP )|(
(Traditional MNL model, see Franses en Paap p.80)
2. Multinomial Logit Model
• Interpretation of parameters– Overall effect
0
]))1[(
)()1(1
ijnjninik
J
j ijkninjknini
nik
nj
PPP
PPPPx
P
2. Multinomial Logit Model
• Interpretation of parameters– Probability-ratio
– Does not depend on the other alternatives!
)(')|(
)|(ln
)'exp(
)'exp(
)|(
)|(
ninjnn
nn
ni
nj
nn
nn
xxxiyP
xjyP
x
x
xiyP
xjyP
2. Multinomial Logit Model
• Estimation– ML estimation
))'exp(ln())'((
)))'exp(ln()'exp((ln(
)'exp(
)'exp(ln)(
)|(ln)(
)|()(
k nknjn j
nj
k nknjnj
nj
n j k nk
njnj
n jnnnj
n j
znn
xxz
xxz
x
xzLL
xjyPzLL
xjyPL nj
(znj=1 if j is selected, 0 otherwise)
2. Multinomial Logit Model
• Estimation– Alternative estimation procedures
Simulation-assisted estimation (Train, Ch.10)
Bayesian estimation (Train, Ch.12)
2. Multinomial Logit Model
• Example
Bucklin and Gupta (1992)
itit
hit
hi
hit
hii
hit
j
hjt
hith
t
LSPSLLBPBLuU
U
UinciP
PromoPrice
)exp(
)exp()|(
65
4321
2. Multinomial Logit Model
• Variables– Ui = constant for brand-size i
– BLhi = loyalty of household h to brand of brandsize i
– LBPhit = 1 if i was last brand purchased, 0 otherwise
– SLhi = loyalty of household h to size of brandsize i
– LSPhit = 1 if i was last size purchased, 0 otherwise
– Priceit = actual shelf price of brand-size i at time t
– Promoit = promotional status of brand-size i at time t
Bucklin and Gupta (1992)
2. Multinomial Logit Model
• Data– A.C.Nielsen scanner panel data– 117 weeks: 65 for initialization, 52 for estimation– 565 households: 300 selected randomly for estimation,
remaining hh = holdout sample for validation– Data set for estimation: 30.966 shopping trips, 2275
purchases in the category (liquid laundry detergent)– Estimation limited to the 7 top-selling brands (80% of
category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)
Bucklin and Gupta (1992)
2. Multinomial Logit Model
Bucklin and Gupta (1992)
Model # param. LL U² (pseudo R²)
BIC
Null model
Full model
27
33
-5957.3
-3786.9
-
.364
6061.6
3914.3
Goodness-of-Fit
2. Multinomial Logit Model
Bucklin and Gupta (1992)
Parameter Estimate (t-statistic)
BL 1
LBP 2
SL 3
LSP 4
Price 5
Promo 6
3.499 (22.74)
.548 (6.50)
2.043 (13.67)
.512 (7.06)
-.696 (-13.66)
2.016 (21.33)
Estimation Results
2. Multinomial Logit Model
Scale parameter• Variance of the extreme value distribution = ²/6• If true utility is U*
nj = *’xnj + *nj with var(*
nj)= ² (²/6), the estimated representative utility Vnj = ’xnj involves a rescaling of * → = * /
* and can not be estimated separately take into account that the estimated coeffients indicate the
variable’s effect relative to the variance of unobserved factors
Include scale parameters if subsamples in a pooled estimation (may) have different error variances
2. Multinomial Logit Model
Scale parameter in case of pooled estimation of subsamples with different error variance
• For each subsample s, multiply utility by µs, which is estimated simultaneously with
• Normalization: set µs equal to 1 for 1 subs.• Values of µs reflect diff’s in error variation
– µs>1 : error variance is smaller in s than in the reference subsample
– µs<1 : error variance is larger in s than in the reference subsampleSwait and Louviere (1993), Andrews and Currim (2002)
2. Multinomial Logit Model
• Example• Data from online experiment, 2 product categories• Three diff.assortments, assigned to diff.respondent
groups– Assortment 1: small assortment– Assortment 2 = ass.1 extended with add.brands– Assortment 3 = ass.1 extended with add types
• Explanatory variables are the same (hh char’s, MM), with exception of the constants
• A scale factor is introduced for assortment 2 and 3 (assortment 1 is reference with scale factor =1)
Breugelmans et al (2005)
2. Multinomial Logit Model
Breugelmans et al (2005)
MARGARINE
Attribute Assortment 1 (limited)
Assortment 2 (add new flavors of existing brands)
Assortment 3 (add new brands of existing flavors)
Brand Common a Common Common
Add new brandsFlavor Common Common Common
Add new flavors
# alternatives 11 19 17# respondents 105 116 100# purchase occasions 275 279 278
# screens needed < 1 > 1 > 1 CEREALS
Attribute Assortment 1 (limited)
Assortment 2 (add new flavors of existing brands)
Assortment 3 (add new brands of existing flavors)
Brand Common Common Common
Add new brandsFlavor Common Common Common
Add new flavors
# alternatives 21 32 46# respondents 81 97 87# purchase occasions 271 261 281
# screens needed > 1 > 1 > 1
2. Multinomial Logit Model• MNL-model – Pooled estimation
• Phit,a= the probability that household h chooses item i at time t, facing
assortment a• uh
it,a= the choice utility of item i for household h facing assortment a
= f(household variables, MM-variables)• Ch
a= set of category items available to household h within assortment a • µa = Gumbel scale factor
haCj
hajta
haitah
ait u
up
)(exp
)(exp
|
||
Breugelmans et al, based on Andrews and Currim 2002; Swait and Louvière 1993
2. Multinomial Logit Model
Estimation results
• Goodness-of-Fit– (average) LL: -0.045 (M), -0.040 (C)
– BIC: 2929 (M), 4763(C)
– CAIC: 2871 (M), 4699 (C)
• Scale factors:– M: 1.2498 (ass2), 1.2627 (ass3)
– C: 1.0562 (ass2), 0.7573 (ass3)
Breugelmans et al (2005)
2. Multinomial Logit Model
Breugelmans et al (2005)
Margarine Cereals
Variable Assortment 1 Assortment 2 Assortment 3 Variable Assortment 1 Assortment 2 Assortment 3
Scale factorMeanLast purchaseItem preferenceBrand asymmetrySize asymmetrySequenceProximity
[1.00]b
2.0675***2.8310***0.2805-0.0841- d
0.8332
1.2498***[2.5840***]c
[3.5382***]c
0.4228**-0.08800.3672**1.0303***
1.2627***[2.6106***]c
[3.5747***]c
0.5400*0.0169-0.11900.6235
Scale factorMeanLast purchaseItem preferenceBrand asymmetryTaste asymmetryType asymmetrySequenceProximity
[1.00]b
0.6441***5.2011***0.0077-0.02600.3119-0.33112.0041***
1.0562***[0.6803***]c
[5.4934***]c
0.61300.2938**-0.0614-0.06950.7214
0.7573***[0.4888***]c
[3.9109***]c
0.0969-0.15960.3816**0.6190***4.1140***
(Excluding brand/size constants)
2. Multinomial Logit Model
• Limitations of the MNL model:– Independence of Irrelevant Alternatives
(proportional substitution pattern)– Order (where relevant) is not taken into account– Systematic taste variation can be represented,
not random taste variation– No correlation between error terms (iid errors)
2. Multinomial Logit Model
• Independence of irrelevant alternatives• Ratio of choice probabilities for 2 alternatives i
and j does not depend on other alternatives (see above)
• Implication: proportional substitution patterns• Cf. Blue Bus – Red Bus Example
– T1: Blue bus (P=50%), Car (P=50%)
– T2: Blue bus (P=33%), Car (P=33%),Red bus (P=33%)
2. Multinomial Logit Model
• Independence of irrelevant alternativesNew alternatives – or alternatives for which utility has increased - draw proportionally from all other alternatives
• Elasticity of Pni wrt variable xnj
njnjkkni
njk
njk
nixi
ninjknjk
ni
PxP
x
x
PE
PPx
P
njk ,,
,,
,
,
2. Multinomial Logit Model
• Independence of irrelevant alternatives
Hausman-McFadden specification test
Basic idea: if a subset of the choice set is truly irrelevant, omitting it should not significantly affect the estimates.
2. Multinomial Logit Model
• Independence of irrelevant alternativesHausman-McFadden specification test Procedure:
-Estimate logit model twice: a. on full set of alternatives b. on subset of alternatives (and subsample with choices from this set) -When IIA is
true,
)²(~1' kbaabba
2. Multinomial Logit Model
• Independence of irrelevant alternatives Alternative Procedure:-Estimate logit model twice: a. on full set of alternatives b. on subset of alternatives (and subsample with choices from this set)- compute LL for subset b with parameters obtained for set a - Compare with LLb: GoF should be similar
2. Multinomial Logit Model
• Solutions to IIA– Model with attribute-specific constants
(intrinsic preferences)– Nested Logit Model– Models that allow for correlation among the
error terms, such as Probit Models