
Lecture 1: Models for Binary Outcomes

Luc Behaghel

PSE

January 2009

Luc Behaghel (PSE) Binary outcomes January 2009 1 / 41


Limited dependent variables

Variables that are "naturally" limited:

Binary outcomes: e.g. married / not married; employed / not employed; smoker / non smoker;
Other discrete outcomes: e.g. number of years of schooling; number of new hires in a firm in a given month; very satisfied / rather satisfied / unsatisfied; preferred transportation (bus / car / train);
Non-negative variables: e.g. income; number of hours worked.

Variables recorded as limited although the underlying outcome is not:

Discretized variables: e.g. income recorded in brackets;
Censored variables: e.g. top-coded income.

⇒ Frequent! Are all dependent variables limited?


Binary outcomes

Most extreme case of limited dependent variables: y can take only two values, usually denoted 0 and 1:

$$y = \begin{cases} 0 & \text{if unemployed} \\ 1 & \text{if employed} \end{cases}$$

$$y = \begin{cases} 0 & \text{if high-school dropout} \\ 1 & \text{if high-school graduate} \end{cases}$$

⇒ Linear models (OLS, 2SLS) inappropriate?


Outline

1 ask what the issue is with linear models;

2 present models that have become standard in such contexts (logit and probit);

3 explain how these statistical models can (sometimes) be derived from a theoretical framework;

4 introduce a new estimation method used for these non linear models: (conditional) maximum likelihood estimation (CMLE);

5 discuss how estimates are to be interpreted.


A specification issue: reminder on Conditional Expectation Functions

CEFs are a tool to answer "factual questions" and "conditional counterfactual questions".

E.g. $E(y \mid T, C)$ with C: "controls"; T: "treatment", "variable of interest". Two types of parameters of interest:

T is discrete: discrete effect of going from $T_0$ to $T_1$,

$$E(y \mid T = T_1, C = C_0) - E(y \mid T = T_0, C = C_0).$$

T is continuous: marginal effect,

$$\Delta_{T_0|C_0} \equiv \frac{\partial E(y \mid T = T_0, C = C_0)}{\partial T}.$$


We don't know the functional form g in $E(y \mid T, C) \equiv g(T, C)$, and it is hard to estimate nonparametrically if T and C can take many values.

⇒ parametric "assumption" (rather: linear approximation of the CEF):

$$E(y \mid T, C) = C\beta + \alpha T.$$

Rem.

1 Flexible approximation: the approximation need not be linear in T and C (only in the parameters). E.g. $E(y \mid T, C) = C\beta + \alpha_1 T + \alpha_2 T^2$.

2 The marginal effect can depend on where the effect is evaluated, e.g.

$$\frac{\partial E(y \mid T = T_0, C = C_0)}{\partial T} = \alpha_1 + 2\alpha_2 T_0.$$


CEF when y is binary

1 The CEF is a "CPF" (conditional probability function): a probability that takes values between 0 and 1:

$$E(y \mid x) = 1 \times \Pr(y = 1 \mid x) + 0 \times \Pr(y = 0 \mid x) = \Pr(y = 1 \mid x).$$

⇒ why not use this a priori information?

2 The predicted probabilities in a linear model can take values outside of [0, 1] ⇒ lack of internal consistency.

So is a linear probability model $\Pr(y = 1 \mid x) = x\beta$ still useful? Angrist and Pischke (2009): yes; Wooldridge (2002): not the best solution. It depends...


Two examples

1 A randomized experiment with binary treatment and binary outcome. Unemployed workers are randomly allocated to a program that provides intensive counseling (T = 1) or to the standard track, with less intensive interventions (T = 0). The outcome is the employment status after 6 months: employed (y = 1) or not (y = 0).

2 The relationship between a continuous variable (age) and women's labor supply. Women's labor supply is a classic example of binary variables. Here: a simple descriptive analysis based on the French Labor Force Survey (Enquête emploi). Exercise: replicate Angrist and Evans (1998) on the impact of the number of children on female labor force participation.


Specification (continued)
Example 1: Saturated case

Both the independent variable of interest (T) and the outcome (y) are binary. The data can be summarized in a simple two-way table (cell counts):

y = 0 y = 1

T = 0 150 150

T = 1 100 200

The CEF is extremely simple:

$$E(y \mid T) = \begin{cases} E(y \mid T = 1) & \text{for the treated} \\ E(y \mid T = 0) & \text{for the controls} \end{cases}$$


Therefore, we can write, without making any restriction:

$$E(y \mid T) = E(y \mid T = 0) + \left(E(y \mid T = 1) - E(y \mid T = 0)\right) \times T \equiv \alpha_0 + \alpha_1 T.$$

⇒ "linear probability model" (LPM):

$$E(y \mid T) = \Pr(y = 1 \mid T) = \alpha_0 + \alpha_1 T.$$

What is your estimate of α1?
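As a check on the arithmetic, the sketch below (Python, standard library only) rebuilds one observation per worker from the counts in the table and runs OLS of y on a constant and T with the usual one-regressor formulas. Because the model is saturated, the OLS slope should coincide with the difference in employment rates between treated and controls. The dictionary layout `counts[(T, y)]` is an illustrative choice, not something taken from the lecture.

```python
# Rebuild one observation per worker from the 2x2 table, then estimate the LPM by OLS.
counts = {(0, 0): 150, (0, 1): 150, (1, 0): 100, (1, 1): 200}  # (T, y) -> number of workers

T, y = [], []
for (t, outcome), n_cell in counts.items():
    T.extend([t] * n_cell)
    y.extend([outcome] * n_cell)

n = len(y)
t_bar = sum(T) / n
y_bar = sum(y) / n
# One-regressor OLS: slope = sample cov(T, y) / sample var(T), intercept = y_bar - slope * t_bar
alpha1 = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(T, y)) / sum((ti - t_bar) ** 2 for ti in T)
alpha0 = y_bar - alpha1 * t_bar

# Employment rates by group; in a saturated model OLS reproduces these cell means
rate_treated = sum(yi for ti, yi in zip(T, y) if ti == 1) / T.count(1)
rate_control = sum(yi for ti, yi in zip(T, y) if ti == 0) / T.count(0)
print(alpha0, alpha1, rate_treated - rate_control)
```

The intercept is the employment rate of the controls, and the slope is the treated minus control difference.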


Estimation of LPMs

Linear model ⇒ OLS or 2SLS.

Heteroskedasticity. Pay attention to the standard errors: they need to be robust to heteroskedasticity, as the structure of the model implies heteroskedasticity. You can check it in the example:

$$\begin{aligned} \text{Var}(u \mid T) &= E(u^2 \mid T) - (E(u \mid T))^2 \\ &= \dots \\ &= \Pr(y = 1 \mid T)\,(1 - \Pr(y = 1 \mid T)). \end{aligned}$$

The variance of the residual depends on T. ⇒ In practice, use the robust standard errors computed by standard packages (e.g., in Stata, just add the option ", robust").
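The heteroskedasticity can be seen on the table of Example 1: the residual variance is p(1 − p) within each group. The sketch below computes, by hand, the conventional OLS standard error of α̂1 and the heteroskedasticity-robust "sandwich" one. It assumes the basic White (HC0) form; packages often apply a small-sample correction (Stata's `, robust` after `regress` multiplies by n/(n − k)), so their numbers can differ slightly.

```python
import math

counts = {(0, 0): 150, (0, 1): 150, (1, 0): 100, (1, 1): 200}  # (T, y) -> number of workers
T, y = [], []
for (t, outcome), n_cell in counts.items():
    T.extend([t] * n_cell)
    y.extend([outcome] * n_cell)
n, k = len(y), 2  # observations, parameters (constant and T)

# Saturated LPM: the OLS fitted values are the group employment rates
p = {t: sum(yi for ti, yi in zip(T, y) if ti == t) / T.count(t) for t in (0, 1)}
resid = [yi - p[ti] for ti, yi in zip(T, y)]
X = [(1.0, float(ti)) for ti in T]


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][r] * b[r][j] for r in range(2)) for j in range(2)] for i in range(2)]


xtx_inv = inv2([[sum(x[i] * x[j] for x in X) for j in range(2)] for i in range(2)])

# Conventional OLS variance sigma^2 (X'X)^{-1}: assumes the same residual variance for everyone
sigma2 = sum(e * e for e in resid) / (n - k)
se_conventional = math.sqrt(sigma2 * xtx_inv[1][1])

# White / HC0 sandwich (X'X)^{-1} [sum_i e_i^2 x_i' x_i] (X'X)^{-1}
meat = [[sum(e * e * x[i] * x[j] for e, x in zip(resid, X)) for j in range(2)] for i in range(2)]
se_robust = math.sqrt(matmul2(matmul2(xtx_inv, meat), xtx_inv)[1][1])

# Var(u | T) = Pr(y = 1 | T)(1 - Pr(y = 1 | T)) differs between the two groups
var_u = {t: p[t] * (1 - p[t]) for t in (0, 1)}
print(var_u, se_conventional, se_robust)
```

With a single dummy regressor, the HC0 variance of α̂1 reduces to p1(1 − p1)/n1 + p0(1 − p0)/n0. Here the two groups have equal sizes, so the two standard errors end up close; they diverge more with unbalanced groups.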


A note on saturated models

$$E(y \mid T) = E(y \mid T = 0) + \left(E(y \mid T = 1) - E(y \mid T = 0)\right) \times T \equiv \alpha_0 + \alpha_1 T$$

is an example of a "saturated model". It is saturated in the sense that we introduce as many parameters to estimate as there are distinct values that the CEF can take. As a consequence, the model does not approximate the CEF; it describes it fully. Of course, this is only feasible when the CEF can take a finite (and not too large) number of values. Here, the CEF takes only two values.

Question: what would be the "saturated model" in a slightly more complicated experiment, where two treatments are randomly and independently allocated: (i) counseling (T = 0 or 1); (ii) eligibility for a bonus payment if the worker finds a job within six months (B = 1 if eligible, 0 otherwise)? Write the model so that we can assume $E(u \mid T, B) = 0$ without actually imposing any constraint.


Specification (continued)
Example 2: A continuous regressor

Figure: Employment status according to age (vertical axis: non employed to employed; horizontal axis: age, 20 to 80). Non parametric estimation using locally weighted regressions (bandwidth = .05).


$$\Pr(y = 1 \mid x) = x\beta$$

Linear probability model

Figure: Fit using a linear probability model (employment status by age, 20 to 80). Curves: non parametric estimation; OLS on age; OLS on age, age^2; OLS on age, age^2, age^3.


$$\Pr(y = 1 \mid x) = \Phi(x\beta)$$

Probit model

Figure: Fit using a Probit model (employment status by age, 20 to 80). Curves: non parametric estimation; Probit on age; Probit on age, age^2; Probit on age, age^2, age^3.


$$\Pr(y = 1 \mid x) = \Lambda(x\beta) \equiv \frac{\exp(x\beta)}{1 + \exp(x\beta)}$$

Logit model

Figure: Fit using a Logit model (employment status by age, 20 to 80). Curves: non parametric estimation; Logit on age; Logit on age, age^2; Logit on age, age^2, age^3.


Index. LPM, Logit and Probit are based on a linear function of x, $x\beta$, called the index. Probit and Logit use non linear transformations ($\Phi$ or $\Lambda$) to ensure that the probability takes its values between 0 and 1. Other functions are possible.

Summary. LPM, Logit and Probit can do a good job of summarizing the CEF, as long as they are used in a flexible way. Logit and Probit constrain probabilities to be between 0 and 1. This is an advantage over the LPM when the model is not saturated, as it provides internal consistency.
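To see the three transformations of the index side by side, the sketch below evaluates the LPM, Probit and Logit probabilities at a few arbitrary index values (chosen for illustration only). Φ is written with `math.erf`, and Λ as 1/(1 + e^(−u)), which is algebraically the same as e^u/(1 + e^u); this form can overflow for extremely negative indexes, which does not matter for the values used here.

```python
import math


def lpm(index):
    """LPM: the index itself is used as the probability."""
    return index


def probit(index):
    """Probit: standard normal CDF Phi(index), written with math.erf."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def logit(index):
    """Logit: Lambda(index) = exp(index) / (1 + exp(index)), in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-index))


for xb in (-2.0, -0.5, 0.0, 0.5, 2.0):  # arbitrary index values
    print(f"xb={xb:+.1f}  LPM={lpm(xb):+.3f}  Probit={probit(xb):.3f}  Logit={logit(xb):.3f}")
```

At xb = ±2 the LPM "probability" falls outside [0, 1], while Φ and Λ stay inside.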


Latent model interpretation of Logit and Probit

Classic example: the labor supply of women

$$\max_{c,\,l}\; u(c, 1 - l) \quad \text{s.t.} \quad c = R + wl, \quad l \geq 0$$

(l: labor, R: other sources of income, c: consumption, $1 - l$: leisure). Lagrangian (with multiplier $\mu$):

$$\mathcal{L} = u(R + wl, 1 - l) + \mu l.$$

FOC:

$$\begin{cases} w u_c - u_l + \mu = 0 \\ \mu l = 0 \end{cases}$$


Two cases:

1 If $\mu > 0$, then $l = 0$ and $\frac{u_c}{u_l}(R, 1) < \frac{1}{w}$;

2 If $\mu = 0$, then $l > 0$ and $\frac{u_c}{u_l}(R + wl, 1 - l) = \frac{1}{w}$.

Interpretation: two groups of women.

1 $\frac{u_c}{u_l}(R, 1) < \frac{1}{w}$: starting from a situation with 0 hours worked, the marginal benefit of working, $w u_c$ (evaluated at $(R, 1)$), is lower than the marginal benefit of taking another hour of leisure, $u_l$.

2 $\frac{u_c}{u_l}(R, 1) > \frac{1}{w}$: these women have a net benefit of working a first hour. They increase their hours until reaching $l^*$ such that $\frac{u_c}{u_l}(R + wl^*, 1 - l^*) = \frac{1}{w}$.

⇒ economic model for the employment indicator y:

$$y = \begin{cases} 0 & \text{if } \frac{u_c}{u_l}(R, 1) - \frac{1}{w} \leq 0 \\ 1 & \text{if } \frac{u_c}{u_l}(R, 1) - \frac{1}{w} > 0 \end{cases} \qquad (1)$$


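From the economic to the statistical model

Assume:

1 Statistical model for the MRS and the wages:

$$\frac{u_c}{u_l}(R, 1) = \alpha R + Z\gamma + \varepsilon, \qquad \frac{1}{w} = W\delta + \eta$$

(Z and W: observed determinants of the MRS and of wages; ε and η: unobserved determinants). Note $u \equiv \varepsilon - \eta$:

$$y = \begin{cases} 0 & \text{if } x\beta + u \leq 0 \\ 1 & \text{if } x\beta + u > 0 \end{cases}$$

(with $x\beta \equiv \alpha R + Z\gamma - W\delta$, since by (1) $y = 1$ exactly when $\frac{u_c}{u_l}(R, 1) - \frac{1}{w} > 0$).

2 Distributional assumption for u given x:

$$u \mid x \sim N(0, 1)$$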


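⇒ Statistical model:

$$y = \begin{cases} 0 & \text{if } x\beta + u \leq 0 \\ 1 & \text{if } x\beta + u > 0 \end{cases}, \qquad u \mid x \sim N(0, 1)$$

⇒ conditional probability functions:

$$\Pr(y = 1 \mid x) = \Pr(-u < x\beta) = \Phi(x\beta)$$

(the last step uses the symmetry of the normal distribution: $-u \mid x \sim N(0, 1)$)

= Probit model.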
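A small Monte Carlo simulation makes the latent-variable derivation concrete: draw u from N(0, 1), form y = 1{xβ + u > 0} for a fixed, hypothetical index value xβ = 0.5, and compare the share of ones with Φ(xβ). This is an illustration under those assumptions, not an estimation procedure.

```python
import math
import random


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


random.seed(0)
xb = 0.5           # hypothetical value of the index x*beta
n_draws = 200_000

# Latent variable y* = x*beta + u with u ~ N(0, 1); only y = 1{y* > 0} is "observed"
y = [1 if xb + random.gauss(0.0, 1.0) > 0 else 0 for _ in range(n_draws)]
share_ones = sum(y) / n_draws

print(share_ones, Phi(xb))  # close to each other, up to simulation noise
```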


Note 1: Structural approach ≠ descriptive approach.

(theoretical model ⇒ statistical specification) vs. (data ⇒ statistical specification).

Two types of assumptions are used to derive the statistical model:

parametric assumptions on functional forms or distributions (linearity, normality);
more fundamental independence assumptions on the correlations between unobserved and observed variables (independence).

⇒ a good model is robust to the choice of alternative parametric assumptions (as these choices are arbitrary) and has a good economic rationale to justify the independence assumptions.


Note 2: Structural and reduced form parameters

Ideally, you don't write a theoretical model whose only job is to justify your statistical specification. The theoretical model should really guide you on what variables to introduce, and whether or not they are likely to be correlated with the error term; moreover, one would like to be able to go back from the empirical model to the structural parameters of the theoretical model. One such parameter, for instance, would be the elasticity of substitution between leisure and consumption. Here, we have not sufficiently developed the model for that, and the connection between the theoretical and the statistical model is loose. All we get in β are reduced-form parameters (i.e. they don't have a direct economic interpretation).


"Latent" model

General latent (index) model:

$$y^* = x\beta + u, \qquad \text{with} \quad \begin{cases} y = 1 & \text{if } y^* > 0 \\ y = 0 & \text{if } y^* \leq 0 \end{cases}$$

y, not $y^*$, is observed. $y^*$ is a "latent" variable we use to interpret what we observe. When $y^*$ increases, the probability of observing y = 1 increases, and vice versa. So $y^*$ can generally be interpreted as a propensity to have outcome y = 1.

Our example: y = 1 if $w u_c - u_l > 0$; $y^* = w u_c - u_l$ ⇒ $y^*$ is the marginal net benefit of increasing labor supply from 0 to 1 hour.


Maximum likelihood estimation

Assumptions

We observe a random sample $((y_1, x_1), (y_2, x_2), \dots, (y_n, x_n))$.

The sample is drawn from a distribution with joint density f(y, x).

This density is known up to some finite-dimensional parameter, β.

The maximum likelihood estimator of β is the value of β that maximizes the probability (or density) of observing the sample $((y_1, x_1), (y_2, x_2), \dots, (y_n, x_n))$.


Example 1: y is drawn from a Bernoulli distribution: 1 with probability p and 0 with probability 1 − p. Here, there are no x's and the only unknown parameter is p. We want to estimate p by ML based on a sample where the frequency of 1's is $\bar{y} = 0.4$. The likelihood is the probability of obtaining this sample, as a function of p:

$$\begin{aligned} L(y_1, \dots, y_n) &= \prod_{i=1}^{n} \Pr(y = y_i) \\ &= p^{0.4n}(1 - p)^{0.6n} \end{aligned}$$

(the first line uses the fact that the draws are independent). $\hat{p}_{MLE}$ is the value of p that maximizes $p^{0.4n}(1 - p)^{0.6n}$. It is often easier to maximize $\log L(y_1, \dots, y_n)$. Check that you get

$$\hat{p} = 0.4 = \bar{y}.$$

The ML estimator of p is simply the empirical frequency of 1's in the sample.
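The maximization can also be checked numerically. The sketch assumes a hypothetical sample of n = 10 draws with four 1's (so ȳ = 0.4) and does a crude grid search over p; the grid is only there to keep the example transparent, since the closed-form answer is available.

```python
import math

n, n_ones = 10, 4  # hypothetical sample: four 1's out of ten draws, so y_bar = 0.4


def log_likelihood(p):
    """log L(p) = (number of 1's) log p + (number of 0's) log(1 - p)."""
    return n_ones * math.log(p) + (n - n_ones) * math.log(1.0 - p)


# Crude grid search over the interior of (0, 1)
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat, n_ones / n)  # both equal the empirical frequency of 1's
```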


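Unconditional log-likelihood of a random sample $((y_1, x_1), (y_2, x_2), \dots, (y_n, x_n))$:

$$\log L(y, x \mid \gamma) = \sum_i \log f(y_i, x_i \mid \gamma).$$

⇒ requires specifying the joint density f.

Conditional likelihood: model only how y depends on x:

$$\begin{aligned} \log L(y, x \mid \gamma) &= \sum_i \log f(y_i, x_i \mid \gamma) \\ &= \sum_i \log \left[ f(y_i \mid x_i, \gamma)\, f(x_i \mid \gamma) \right] \\ &= \sum_i \log f(y_i \mid x_i, \gamma) + \sum_i \log f(x_i \mid \gamma) \\ &= \log L(y \mid x, \gamma) + \sum_i \log f(x_i \mid \gamma). \end{aligned}$$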


$$\log L(y, x \mid \gamma) = \log L(y \mid x, \gamma) + \sum_i \log f(x_i \mid \gamma)$$

$\log L(y \mid x, \gamma)$ = conditional log-likelihood.

Assumption: x is exogenous (it is determined independently of y) ⇒ its distribution depends on a subset of the γ parameters, δ, and the conditional distribution of y | x depends on another subset, β. Then

$$\log L(y, x \mid \gamma) = \log L(y \mid x, \beta) + \sum_i \log f(x_i \mid \delta).$$

To estimate β, maximizing $\log L(y \mid x, \beta)$ is sufficient, since the second term does not depend on β. This only requires specifying the conditional density of y given x.


Example 2: In a Probit model, this is exactly what we have specified: the conditional distribution of y given x. y can take only two values, with probabilities:

$$\begin{cases} \Pr(y = 1 \mid x) = \Phi(x\beta) \\ \Pr(y = 0 \mid x) = 1 - \Phi(x\beta) \end{cases}$$

The conditional log-likelihood is:

$$\log L(y \mid x, \beta) = \sum_i y_i \log \Phi(x_i\beta) + \sum_i (1 - y_i) \log\left[1 - \Phi(x_i\beta)\right].$$
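The Probit log-likelihood above translates directly into code. The sketch below is a plain-Python version written for readability, with a tiny made-up sample (a constant plus one regressor). It does nothing to guard against Φ(x_iβ) numerically reaching 0 or 1 for large indexes, which a serious implementation would need to handle.

```python
import math


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_loglik(beta, X, y):
    """Conditional Probit log-likelihood: sum_i y_i log Phi(x_i b) + (1 - y_i) log(1 - Phi(x_i b)).

    beta: list of coefficients; X: list of rows (regressors, including a constant);
    y: list of 0/1 outcomes.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        index = sum(b * x for b, x in zip(beta, xi))
        p = Phi(index)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


# Tiny made-up sample: a constant and one regressor
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 1.5], [1.0, 0.0]]
y = [1, 0, 1, 0]
print(probit_loglik([0.0, 0.0], X, y), probit_loglik([0.1, 1.0], X, y))
```

At β = 0 every Φ(x_iβ) equals 1/2, so the log-likelihood is n log(1/2); a β that predicts the outcomes better gives a higher value.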

Example 3: In the case of a continuous variable y, assume that

$$y \mid x \sim N(x\beta, 1).$$

What does it imply for the CEF $E(y \mid x)$? Would OLS provide a valid estimate of β? Show that maximizing the log-likelihood gives the same result as applying OLS.


Maximizing the log-likelihood in practice

Usually there is no closed-form solution ⇒ algorithms that try different values of the vector β in order to find one such that

$$\frac{\partial \log L(y \mid x, \beta)}{\partial \beta} = 0$$

= first-order conditions (FOC). There are also second-order conditions (SOC): the matrix of second derivatives must be negative definite (think of the case where β is a scalar). Algorithms use the first derivatives (a vector called the score, or gradient) and the second derivatives (a matrix called the Hessian) to arrive at a point where the FOC and the SOC are met.


Most of the time, things go well and we don't care too much about how the algorithm actually works (we type the "probit" command in Stata, and let Stata maximize the log-likelihood with its favorite algorithm). But it is important to see potential problems. In particular, there may be several local maxima, and the algorithm might get stuck on a local maximum that is not the global maximum. A case where the algorithm has no problem is the case where the Hessian is negative definite everywhere. In that case, there is at most one local maximum, and if it exists it is the global maximum. (In the scalar case, this means that the log-likelihood is strictly concave.) Therefore, ideally, when one uses ML, one would like to check whether the log-likelihood has the right properties.


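The Logit is a relatively easy case in which to show that the (conditional) log-likelihood has good properties:

$$\log L(y \mid x, \beta) = \sum_i y_i \log \Lambda(x_i\beta) + \sum_i (1 - y_i) \log\left[1 - \Lambda(x_i\beta)\right],$$

with $\Lambda(u) = e^u / (1 + e^u)$.

We want to compute the gradient

$$g(\beta) = \frac{\partial \log L(y \mid x, \beta)}{\partial \beta}$$

and the Hessian

$$H(\beta) = \frac{\partial g(\beta)}{\partial \beta'}.$$

The computations are made easier by the fact that

$$\frac{\partial \Lambda(u)}{\partial u} = \frac{e^u(1 + e^u) - e^u e^u}{(1 + e^u)^2} = \frac{e^u}{(1 + e^u)^2} = \Lambda(u)(1 - \Lambda(u)).$$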


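Then:

$$\begin{aligned} g(\beta) &= \sum_i \left[ y_i \frac{\Lambda(x_i\beta)(1 - \Lambda(x_i\beta))}{\Lambda(x_i\beta)} - (1 - y_i)\frac{\Lambda(x_i\beta)(1 - \Lambda(x_i\beta))}{1 - \Lambda(x_i\beta)} \right] x_i' \\ &= \sum_i \left[ y_i - \Lambda(x_i\beta) \right] x_i' \end{aligned}$$

and

$$H(\beta) = -\sum_i \Lambda(x_i\beta)(1 - \Lambda(x_i\beta))\, x_i' x_i.$$

Since $\Lambda(1 - \Lambda) > 0$ and each $x_i' x_i$ is positive semidefinite, H(β) is negative semidefinite for every β, and negative definite as soon as $\sum_i x_i' x_i$ is nonsingular: the Logit log-likelihood is then strictly concave.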
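The gradient and Hessian just derived are all that a Newton-Raphson algorithm needs: starting from an initial β, iterate β ← β − H(β)⁻¹ g(β) until the FOC holds. The sketch below is one possible implementation for a constant plus one regressor, so that the 2 × 2 system can be solved by Cramer's rule. The data are made up and deliberately not perfectly separated, since with perfect separation the Logit maximum does not exist. Statistical packages use refined versions (step-size control, other algorithms), so treat this as an illustration only.

```python
import math


def Lambda(u):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-u))


# Made-up data: each row is (constant, regressor); outcomes overlap, so the MLE exists
X = [[1.0, v] for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0)]
y = [0, 0, 1, 0, 1, 0, 1, 1]


def gradient_and_hessian(beta):
    """g = sum_i (y_i - Lambda_i) x_i'  and  H = -sum_i Lambda_i (1 - Lambda_i) x_i' x_i."""
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(X, y):
        lam = Lambda(xi[0] * beta[0] + xi[1] * beta[1])
        for a in range(2):
            g[a] += (yi - lam) * xi[a]
            for b in range(2):
                H[a][b] -= lam * (1.0 - lam) * xi[a] * xi[b]
    return g, H


def newton_raphson(beta, tol=1e-10, max_iter=50):
    """Iterate beta <- beta - H^{-1} g until the gradient (the FOC) is numerically zero."""
    for _ in range(max_iter):
        g, H = gradient_and_hessian(beta)
        if max(abs(v) for v in g) < tol:
            break
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # Solve H @ step = g by Cramer's rule (2x2 case)
        step0 = (g[0] * H[1][1] - H[0][1] * g[1]) / det
        step1 = (H[0][0] * g[1] - H[1][0] * g[0]) / det
        beta = [beta[0] - step0, beta[1] - step1]
    return beta


beta_hat = newton_raphson([0.0, 0.0])
g_hat, H_hat = gradient_and_hessian(beta_hat)
print(beta_hat, g_hat)
```

Because the Logit Hessian is negative definite everywhere here, Newton-Raphson from β = 0 converges to the unique maximum in a handful of iterations.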


Asymptotic properties of CMLE

Under some regularity conditions (OK for Probit and Logit):

1 Consistency: as the sample size grows, the estimate comes closer and closer to the true value:

$$\text{plim}\, \hat{\beta}_{ML} = \beta_o.$$

2 Asymptotic normality: the distribution of $\hat{\beta}_{ML}$ tends toward a normal:

$$\sqrt{n}\,(\hat{\beta}_{ML} - \beta_o) \xrightarrow{d} N(0, V)$$

with $V = -\left(E[H_i(\beta_o)]\right)^{-1}$, where $H_i$ is the contribution of observation i to the Hessian.

3 Asymptotic efficiency: $\hat{\beta}_{ML}$ is asymptotically efficient (it has the smallest asymptotic variance).

Note 1: These properties assume that we have properly specified the conditional density. However, they are robust to minor specification errors (if the CEF is well specified).

Note 2: Software packages directly give an estimate of the asymptotic variance, $\widehat{\text{Avar}}(\hat{\beta}_{ML}) = \frac{1}{n}\hat{V}$. It is possible to ask for robust variances, for instance to account for correlations across observations.
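For intuition about $\frac{1}{n}\hat{V}$, take the Bernoulli example: the inverse of minus the second derivative of the total log-likelihood at p̂ should reproduce p̂(1 − p̂)/n, the familiar variance of a sample proportion. The sketch approximates that second derivative by central finite differences; the sample (n = 10, four 1's) is hypothetical.

```python
import math

n, n_ones = 10, 4          # hypothetical sample: four 1's out of ten draws
p_hat = n_ones / n         # ML estimate = empirical frequency of 1's


def log_likelihood(p):
    return n_ones * math.log(p) + (n - n_ones) * math.log(1.0 - p)


# Second derivative (1x1 Hessian) of the total log-likelihood at p_hat, by central differences
h = 1e-4
hessian = (log_likelihood(p_hat + h) - 2.0 * log_likelihood(p_hat) + log_likelihood(p_hat - h)) / h**2

avar = -1.0 / hessian      # estimated asymptotic variance of p_hat, i.e. V_hat / n
se = math.sqrt(avar)
print(avar, p_hat * (1 - p_hat) / n, se)  # the first two numbers should agree closely
```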


Interpreting the results of binary models

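The coefficients in the index are not directly interpretable. Key: estimate marginal effects:

$$\Delta_{T_0|C_0} = \frac{\partial \Pr(y = 1 \mid T = T_0, C = C_0)}{\partial T} = \begin{cases} \frac{\partial (C\beta_{LP} + \alpha_{LP} T)}{\partial T}(C_0, T_0) = \alpha_{LP} & \text{(LPM)} \\ \frac{\partial \Phi(C\beta_P + \alpha_P T)}{\partial T}(C_0, T_0) = \alpha_P\, \phi(C_0\beta_P + \alpha_P T_0) & \text{(Probit)} \\ \frac{\partial \Lambda(C\beta_L + \alpha_L T)}{\partial T}(C_0, T_0) = \alpha_L\, \Lambda(C_0\beta_L + \alpha_L T_0)\,(1 - \Lambda(C_0\beta_L + \alpha_L T_0)) & \text{(Logit)} \end{cases}$$

where $\phi$ is the standard normal density.

Rem: The marginal effects have the same sign as the coefficients. In the LPM, the coefficient is the approximation to the marginal effect. In the Probit and Logit cases, the coefficient must be multiplied by a quantity that depends on where we are evaluating the effect.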
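The Probit and Logit formulas can be evaluated directly once estimates are available. The sketch below assumes hypothetical coefficients and a hypothetical evaluation point (C_0, T_0), with C and T scalars, and checks the analytical Probit marginal effect against a central finite difference of Φ(Cβ_P + α_P T) with respect to T.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def Lambda(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical estimates; the index is C*beta + alpha*T with scalar C and T
beta_P, alpha_P = 0.8, -0.3    # Probit
beta_L, alpha_L = 1.3, -0.5    # Logit
C0, T0 = 0.5, 1.0              # hypothetical evaluation point

me_probit = alpha_P * phi(C0 * beta_P + alpha_P * T0)
lam = Lambda(C0 * beta_L + alpha_L * T0)
me_logit = alpha_L * lam * (1.0 - lam)

# Numerical check of the Probit formula: central difference in T at (C0, T0)
h = 1e-6
numeric_probit = (Phi(C0 * beta_P + alpha_P * (T0 + h)) - Phi(C0 * beta_P + alpha_P * (T0 - h))) / (2 * h)
print(me_probit, numeric_probit, me_logit)
```

Both marginal effects come out negative, with the same sign as the (negative) coefficients on T.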


Reporting the index coefficients only

Easiest solution. Drawback: the magnitude of a Probit or a Logit coefficient has no direct interpretation. But:

The sign of the marginal effect is the same as the sign of the index coefficient. So reporting the index coefficient is enough if one is just interested in the direction of the effect (rarely the case in economics!).

The ratio of two index coefficients is equal to the ratio of the corresponding marginal effects. E.g.

$$\left[\frac{\partial \Phi(C\beta_P + \alpha_P T)}{\partial T}(C_0, T_0)\right] \Big/ \left[\frac{\partial \Phi(C\beta_P + \alpha_P T)}{\partial C}(C_0, T_0)\right] = \frac{\alpha_P}{\beta_P}$$

(the common factor $\phi(C_0\beta_P + \alpha_P T_0)$ cancels, and the same holds for the Logit). $\alpha_P / \beta_P = 2$ would thus mean that the impact of T (on the probability that y = 1) is twice as large as the impact of C. Note that this holds independently of where we measure the marginal effects (that is, which values we choose for $C_0$ and $T_0$).
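A small numerical check of the ratio property, assuming hypothetical Probit coefficients α_P = 0.8 and β_P = 0.4 (so α_P/β_P = 2): the marginal effects of T and C change with the evaluation point, but their ratio does not.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


beta_P, alpha_P = 0.4, 0.8  # hypothetical Probit coefficients on C and on T


def marginal_effects(C0, T0):
    """Marginal effects of T and of C on Phi(C*beta_P + alpha_P*T), evaluated at (C0, T0)."""
    density = phi(C0 * beta_P + alpha_P * T0)
    return alpha_P * density, beta_P * density


ratios = []
for C0, T0 in [(0.0, 0.0), (1.0, 0.5), (-2.0, 1.0)]:  # arbitrary evaluation points
    me_T, me_C = marginal_effects(C0, T0)
    ratios.append(me_T / me_C)
print(ratios)  # each ratio equals alpha_P / beta_P = 2, up to rounding
```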


Reporting marginal effects for a reference individual

Compute marginal effects for a given individual, using the coefficient estimates. Issue: choose a particular reference $(C_0, T_0)$.

A first possibility is to consider a (fictitious) individual for whom $\Pr(y = 1) = .5$. In that case, in the Logit model, the multiplicative term is $\Pr(y = 1)(1 - \Pr(y = 1)) = .25$, and the multiplicative term in the Probit model is $\phi(0) \approx .40$. In other words, to transform the coefficients into marginal effects, we can multiply them by .25 (Logit) or .4 (Probit).

Another possibility is to choose the "average individual", i.e. to set $C_0$ and $T_0$ at the sample means. This is what software like Stata does when it computes marginal effects (command "dprobit"). Note that if C or T is a dummy variable, choosing $C_0 = \bar{C}$ or $T_0 = \bar{T}$ makes little sense, and it might be preferable to choose $C_0 = 0$ or 1, and the same for $T_0$.
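The two multiplicative terms quoted above can be computed directly: at Pr(y = 1) = .5 the index is 0 in both models, so the Logit factor is Λ(0)(1 − Λ(0)) = .25 and the Probit factor is φ(0) = 1/√(2π). The coefficient values in the second part of the sketch are hypothetical.

```python
import math

# At an individual with Pr(y = 1) = .5, the index is 0 in both models
logit_factor = 0.5 * (1.0 - 0.5)                 # Lambda(0) * (1 - Lambda(0))
probit_factor = 1.0 / math.sqrt(2.0 * math.pi)   # phi(0)
print(logit_factor, round(probit_factor, 4))     # 0.25 and 0.3989

# Converting hypothetical index coefficients into marginal effects at that individual
coef_logit, coef_probit = 0.6, 0.36
print(coef_logit * logit_factor, coef_probit * probit_factor)
```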


Example: Back to the labor supply of women. Assume that T is an indicator variable for having a child aged less than 3, and C is the woman's age. One could typically compute the marginal effect of age for a woman of average age ($C_0 = \bar{C}$) with no child under 3 ($T_0 = 0$). Also, for the impact of having a child under three, rather than a marginal effect, one would compute

$$\Pr(y = 1 \mid C = \bar{C}, T = 1) - \Pr(y = 1 \mid C = \bar{C}, T = 0).$$

(Again, Stata can do the computation for you, using the appropriate options with the "dprobit" command.)
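The discrete difference above is easy to compute once Probit estimates are available. The sketch assumes hypothetical estimates for an index of the form constant + b_age × age + a_child × T and a made-up sample of ages; none of these numbers come from the lecture.

```python
import math


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical Probit estimates: Pr(y = 1) = Phi(const + b_age * age + a_child * T)
const, b_age, a_child = 1.2, -0.01, -0.45
ages = [24, 31, 35, 29, 40, 33]        # made-up sample of ages
age_bar = sum(ages) / len(ages)        # C-bar: average age

# Discrete change from T = 0 to T = 1 for a woman of average age
p_child = Phi(const + b_age * age_bar + a_child * 1)
p_no_child = Phi(const + b_age * age_bar + a_child * 0)
discrete_effect = p_child - p_no_child
print(age_bar, discrete_effect)
```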


Note 1: Marginal effects in Probit and Logit models are complex, non linear functions of all the coefficients ⇒ their standard errors are computed from a linear approximation, using the Delta method (a first-order Taylor expansion). This is done by Stata.

Note 2: The formulas for marginal effects are only valid if the explanatory variables enter the index linearly. Counter-example: a polynomial of age in the labor supply equation. A software package does not know that a variable is the square of another explanatory variable, so you cannot trust it, in that case, to derive the marginal effects...
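Both notes can be illustrated together. With an index b_0 + b_1 age + b_2 age², the marginal effect of age in a Probit is (b_1 + 2 b_2 age) φ(index), not simply the coefficient on age times φ(index), and its standard error can be approximated by the Delta method, Var(me(b̂)) ≈ G V̂ G′, where G is the gradient of the marginal effect with respect to the coefficients. The sketch computes G by central finite differences; the coefficient vector and covariance matrix are hypothetical, chosen only so that the covariance matrix is positive definite.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def me_age(b, age):
    """Marginal effect of age when the Probit index is b0 + b1*age + b2*age^2.

    The chain rule picks up the squared term: (b1 + 2*b2*age) * phi(index).
    """
    index = b[0] + b[1] * age + b[2] * age ** 2
    return (b[1] + 2.0 * b[2] * age) * phi(index)


def numerical_gradient(f, b, h=1e-7):
    """Central finite-difference gradient of a scalar function f at the vector b."""
    grad = []
    for j in range(len(b)):
        up, down = list(b), list(b)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


# Hypothetical estimates and covariance matrix (illustrative numbers only)
b_hat = [-2.0, 0.15, -0.002]
V_hat = [[0.04, -0.001, 1e-5],
         [-0.001, 5e-5, -6e-7],
         [1e-5, -6e-7, 8e-9]]
age0 = 40.0

# Delta method: Var(me(b_hat)) is approximated by G V_hat G', G = gradient of me at b_hat
G = numerical_gradient(lambda b: me_age(b, age0), b_hat)
var_me = sum(G[i] * V_hat[i][j] * G[j] for i in range(3) for j in range(3))
print(me_age(b_hat, age0), math.sqrt(var_me))
```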


Reporting average marginal effects

Why report the marginal effect of age for a woman with no child under 3 rather than for a woman with children under 3? ⇒ usual practice: report the average marginal effect over the sample:

$$\Delta = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \Pr(y = 1 \mid T = T_i, C = C_i)}{\partial T}.$$

Tedious to compute by hand, but computed by standard software.
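A sketch of the average marginal effect, here for the continuous regressor (age) rather than T, under hypothetical Probit estimates and a made-up sample of (age, T) pairs: compute each woman's marginal effect b_age φ(x_iβ) at her own covariates, then average over the sample.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Hypothetical Probit estimates: index = const + b_age * age + a_child * T
const, b_age, a_child = 1.2, -0.01, -0.45
sample = [(24, 1), (31, 0), (35, 0), (29, 1), (40, 0), (33, 1)]  # made-up (age, T) pairs


def me_age(age, t):
    """Marginal effect of age for one woman, evaluated at her own covariates."""
    return b_age * phi(const + b_age * age + a_child * t)


individual_effects = [me_age(age, t) for age, t in sample]
ame_age = sum(individual_effects) / len(individual_effects)
print(ame_age)
```

Unlike the "reference individual" approach, no single (C_0, T_0) has to be chosen: women with and without young children both enter the average.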


Test of coefficient significance ⇒ t-statistic (same as OLS):

$$\frac{\hat{\beta}_{ML}}{\hat{\sigma}(\hat{\beta}_{ML})} \xrightarrow{d} N(0, 1)$$

under the null hypothesis that β = 0.

Test of multiple restrictions ⇒ frequently used approach: "likelihood ratio" test. Estimate the unrestricted and the restricted model (applying the constraints), with log-likelihoods $\log L_{ur}$ and $\log L_r$, respectively. Likelihood ratio statistic:

$$LR = 2(\log L_{ur} - \log L_r), \qquad LR \xrightarrow{d} \chi^2(q)$$

under the null hypothesis (q: number of constraints).
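A sketch of the LR test, assuming hypothetical log-likelihood values and q = 3 restrictions. To stay within the Python standard library, the χ² tail probability is computed with a closed-form recursion valid for integer degrees of freedom; with SciPy available, `scipy.stats.chi2.sf` would do the same job.

```python
import math


def chi2_sf(x, q):
    """Pr(chi2(q) > x) for a positive integer q, using the recursion
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)."""
    if q % 2 == 1:
        k, tail = 1, math.erfc(math.sqrt(x / 2.0))   # Q(x; 1)
    else:
        k, tail = 2, math.exp(-x / 2.0)              # Q(x; 2)
    while k < q:
        tail += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


# Hypothetical log-likelihoods of the unrestricted and restricted models
loglik_ur, loglik_r = -1520.4, -1526.9
q = 3                                   # number of restrictions
LR = 2.0 * (loglik_ur - loglik_r)
p_value = chi2_sf(LR, q)
print(LR, p_value)
```

The null hypothesis is rejected at level 5% when the p-value is below .05, i.e. when LR exceeds the 95% quantile of χ²(q).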
