Microeconometrics MECT2
Richard Blundell, UCL
Lecture 7: Censored Data Models
March 2009
(i) Censored Data Models
• Censored and truncated data
Examples:
- earnings
- hours of work (mroz.dta is a good data set to play with)
- top coding of wealth
- expenditure on cars (this was James Tobin's original example, which became known as Tobin's Probit model or the Tobit model)
• Typical definitions:
- Censored data includes the censoring points
- Truncated data excludes the censoring points
• A mixture of discrete and continuous processes. In general we should model the process of censoring or truncation as a separate discrete mechanism; this is what is known as a `selectivity' model. To begin with, we have a model in which the two processes are generated from the same underlying continuous latent variable model, e.g. corner solution models in economics:
$$y_i^* = x_i'\beta + u_i$$

with

$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

or

$$y_i = \begin{cases} y_i^* & \text{if } u_i > -x_i'\beta \\ 0 & \text{otherwise} \end{cases}$$
• Sometimes define $D_i$,

$$D_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

as in the Probit model.
The general specification for the censored regression model is

$$y_i^* = x_i'\beta + u_i, \qquad y_i = \max\{0, y_i^*\}$$

where $y^*$ is the unobservable underlying process (similar to what was used with discrete choice models) and $y$ is the data observation.

• When $u$ is normally distributed, $u|x \sim N(0, \sigma^2)$, the model is called Tobin's Probit or the Tobit model.

• Note that

$$P(y > 0|x) = P(u > -x'\beta|x) = \Phi\left(\frac{x'\beta}{\sigma}\right)$$
• Consider the moments of the truncated normal.

• Assume $w \sim N(0, \sigma^2)$. Then $w|w > c$, where $c$ is an arbitrary constant, is a truncated normal.

• The density function for the truncated normal is

$$f(w|w > c) = \frac{f(w)}{1 - F(c)} = \frac{\frac{1}{\sigma}\phi\left(\frac{w}{\sigma}\right)}{1 - \Phi\left(\frac{c}{\sigma}\right)}$$

where $f$ is the density function of $w$ and $F$ is the cumulative distribution function of $w$.
• We can now write

$$E(w|w > c) = \int_c^\infty w f(w|w > c)\, dw = \frac{\sigma\phi\left(\frac{c}{\sigma}\right)}{1 - \Phi\left(\frac{c}{\sigma}\right)}$$

Applying this result to the regression model yields

$$E(y|x, y > 0) = x'\beta + E(u|u > -x'\beta) = x'\beta + \sigma\frac{\phi\left(\frac{x'\beta}{\sigma}\right)}{\Phi\left(\frac{x'\beta}{\sigma}\right)}$$

where $\phi(w)/\Phi(w)$ is called the Inverse Mills Ratio, usually represented by $\lambda(w)$.

• Notice that, in contrast to the discrete choice models, the variance of the residual plays a central role here: it determines the size of the partial effects.
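As a numerical sanity check on the truncated-mean formula, the following sketch (not from the lecture; scipy-based, with illustrative values of $\sigma$ and $c$) compares the analytical expression with a simulated truncated sample:

```python
import numpy as np
from scipy.stats import norm

sigma, c = 2.0, 0.5   # illustrative scale and truncation point

# Analytical: E(w | w > c) = sigma * phi(c/sigma) / (1 - Phi(c/sigma))
analytical = sigma * norm.pdf(c / sigma) / (1 - norm.cdf(c / sigma))

# Simulation check: draw w ~ N(0, sigma^2) and average the draws above c
rng = np.random.default_rng(0)
w = rng.normal(0.0, sigma, size=1_000_000)
simulated = w[w > c].mean()

print(f"analytical: {analytical:.4f}, simulated: {simulated:.4f}")
```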
OLS Bias

Truncated Data:

• Suppose one uses only the positive observations to estimate the model and the unobservables are normally distributed. Then, as we have seen,

$$E(y|x, y > 0) = x'\beta + \sigma\lambda\left(\frac{x'\beta}{\sigma}\right)$$

where the last term is $E(u|x, u > -x'\beta)$, which is generally non-zero.

• A model of the form

$$y = x'\beta + \sigma\lambda + v$$

would have $E(v|x, y > 0) = 0$.

• This implies that OLS on the truncated sample is inconsistent: omitting the $\lambda$ term is an omitted variable problem, so the resulting error term is correlated with $x$.
Censored Data:

• Now suppose we use all observations, both positive and zero.

• Under normality of the residual, we obtain

$$E(y|x) = \Phi\left(\frac{x'\beta}{\sigma}\right) x'\beta + \sigma\phi\left(\frac{x'\beta}{\sigma}\right)$$

• Thus, once again the OLS estimates will be biased and inconsistent.
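The bias is easy to demonstrate by simulation. The sketch below (illustrative parameter values, not from the lecture) fits OLS to the censored sample and to the truncated, positive-only sample and compares the slopes with the true $\beta$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, beta, sigma = 100_000, 1.0, 1.0   # illustrative true values

x = rng.normal(size=N)
y_star = beta * x + rng.normal(0.0, sigma, size=N)   # latent y*
y = np.maximum(y_star, 0.0)                          # censoring at zero

def ols_slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

pos = y > 0
print("true beta:     ", beta)
print("OLS, censored: ", round(ols_slope(x, y), 3))            # attenuated
print("OLS, truncated:", round(ols_slope(x[pos], y[pos]), 3))  # also attenuated
```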
The Tobit Maximum Likelihood Estimator
• Let $\{(y_i, x_i);\ i = 1, \ldots, N\}$ be a random sample of data on the model.

The contribution to the likelihood of a zero observation is determined by

$$P(y_i = 0|x_i) = 1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)$$

The contribution to the likelihood of a non-zero observation is determined by

$$f(y_i|x_i) = \frac{1}{\sigma}\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)$$

which is not invariant to $\sigma$.

Thus, the overall contribution of observation $i$ to the loglikelihood function is

$$\ln l_i(x_i; \beta, \sigma) = 1(D_i = 0)\ln\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] + 1(D_i = 1)\ln\left[\frac{1}{\sigma}\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]$$
and the sample loglikelihood is

$$\ln L_N(\beta, \sigma) = \sum_{i=1}^N \left\{(1 - D_i)\ln\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] + D_i\left[\ln\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \ln\sigma\right]\right\}$$

where $D_i$ equals one when $y_i > 0$ and equals zero otherwise.

• Notice that both $\beta$ and $\sigma$ are separately identified. Moreover, if $D_i = 1$ for all $i$, the ML and the OLS estimators will be the same.
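This loglikelihood can be maximised numerically. A minimal sketch on simulated data follows; the $\ln\sigma$ parameterisation, which keeps $\sigma$ positive, is a standard numerical device rather than anything specific to the lecture:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
N = 5_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
y = np.maximum(X @ beta_true + rng.normal(0.0, sigma_true, N), 0.0)
D = y > 0

def neg_loglik(theta):
    b, s = theta[:-1], np.exp(theta[-1])   # theta[-1] = log sigma
    xb = X @ b
    # censored: log[1 - Phi(x'b/s)] = logcdf(-x'b/s); uncensored: log density
    ll = np.where(D,
                  norm.logpdf((y - xb) / s) - np.log(s),
                  norm.logcdf(-xb / s))
    return -ll.sum()

res = minimize(neg_loglik, np.zeros(X.shape[1] + 1), method="BFGS")
print("beta_hat:", res.x[:-1].round(3), "sigma_hat:", round(np.exp(res.x[-1]), 3))
```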
• FOC

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^N \frac{1}{\sigma^2}\left\{D_i(y_i - x_i'\beta)x_i - (1 - D_i)\frac{\sigma\phi\left(\frac{x_i'\beta}{\sigma}\right)}{1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)}x_i\right\}$$

$$\frac{\partial \ln L}{\partial \sigma^2} = \sum_{i=1}^N \left\{(1 - D_i)\frac{x_i'\beta\,\phi\left(\frac{x_i'\beta}{\sigma}\right)}{2\sigma^3\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right]} + D_i\left[\frac{(y_i - x_i'\beta)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right]\right\}$$
Or, writing $\phi_i = \phi(x_i'\beta/\sigma)$ and $\Phi_i = \Phi(x_i'\beta/\sigma)$, and letting $i \in 0$ and $i \in +$ index the censored and positive observations:

$$(1)\quad \frac{\partial \ln L}{\partial \beta} = -\frac{1}{\sigma^2}\sum_{i \in 0}\frac{\sigma\phi_i}{1 - \Phi_i}x_i + \frac{1}{\sigma^2}\sum_{i \in +}(y_i - x_i'\beta)x_i$$

$$(2)\quad \frac{\partial \ln L}{\partial \sigma^2} = \frac{1}{2\sigma^3}\sum_{i \in 0}\frac{x_i'\beta\,\phi_i}{1 - \Phi_i} + \frac{1}{2\sigma^4}\sum_{i \in +}(y_i - x_i'\beta)^2 - \frac{N_+}{2\sigma^2}$$

where $N_+$ is the number of positive observations. Note that forming $\frac{\beta'}{2\sigma^2}\times(1) + (2)$ and setting the result to zero gives

$$\hat\sigma^2 = \frac{1}{N_+}\sum_{i \in +} y_i(y_i - x_i'\hat\beta)$$

that is, only the positive observations contribute to the estimation of $\sigma$.
• Also, if we define $m_i \equiv E(y_i^*|y_i)$ then we can write (1) as

$$\frac{\partial \ln L}{\partial \beta} = c\sum_{i=1}^N x_i(m_i - x_i'\beta)$$

or

$$\sum_{i=1}^N x_i m_i = \sum_{i=1}^N x_i x_i'\beta$$

which defines an EM algorithm for the Tobit model. Note also that

$$m_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ x_i'\beta - \dfrac{\sigma\phi_i}{1 - \Phi_i} & \text{otherwise} \end{cases}$$

again replacing $y^*$ with its best guess, given $y$, when it is unobserved.
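A sketch of this EM iteration on simulated data (illustrative names and values). The $\sigma$ update below uses the positive-observations expression derived above; a full EM variance step would also impute second moments for the censored observations, and the max() guard is purely defensive:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
N = 5_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
y = np.maximum(X @ beta_true + rng.normal(0.0, sigma_true, N), 0.0)
pos = y > 0

b = np.linalg.lstsq(X, y, rcond=None)[0]   # crude starting values
s = y[pos].std()
for _ in range(500):
    z = X @ b / s
    lam = norm.pdf(z) / (1 - norm.cdf(z))          # phi_i / (1 - Phi_i)
    m = np.where(pos, y, X @ b - s * lam)          # E-step: m_i = E(y*_i | y_i)
    b_new = np.linalg.solve(X.T @ X, X.T @ m)      # M-step: regress m on x
    s = np.sqrt(max(np.mean(y[pos] * (y[pos] - X[pos] @ b_new)), 1e-8))
    if np.max(np.abs(b_new - b)) < 1e-9:
        b = b_new
        break
    b = b_new
print("beta_hat:", b.round(3), "sigma_hat:", round(s, 3))
```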
• Using Theorems 1 and 2 from Lecture 6, the MLE of $\beta$ and $\sigma^2$ is consistent and asymptotically normally distributed.

• The asymptotic covariance matrix is derived from the expected values of the second partial derivatives of $\ln L$:

$$-\begin{bmatrix} E\dfrac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} & E\dfrac{\partial^2 \ln L}{\partial\beta\,\partial\sigma^2} \\ \cdot & E\dfrac{\partial^2 \ln L}{\partial(\sigma^2)^2} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^N a_i x_i x_i' & \sum_{i=1}^N b_i x_i \\ \cdot & \sum_{i=1}^N c_i \end{bmatrix}$$

where

$$a_i = -\frac{1}{\sigma^2}\left[z_i\phi_i - \frac{\phi_i^2}{1 - \Phi_i} - \Phi_i\right]$$

$$b_i = -\frac{1}{2\sigma^3}\left[z_i^2\phi_i - \frac{z_i\phi_i^2}{1 - \Phi_i} + \phi_i\right]$$

$$c_i = -\frac{1}{4\sigma^4}\left[z_i^3\phi_i + z_i\phi_i - \frac{z_i^2\phi_i^2}{1 - \Phi_i} - 2\Phi_i\right]$$

and $\phi_i$ and $\Phi_i$ are evaluated at $z_i = x_i'\beta/\sigma$.
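Evaluated at the estimates, these expressions give the estimated information matrix and hence standard errors. A minimal sketch follows; it assumes X, b_hat and s_hat are a regressor matrix and Tobit estimates from an earlier step (these names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def tobit_information(X, b_hat, s_hat):
    """Expected information for (beta, sigma^2) built from a_i, b_i, c_i above."""
    z = X @ b_hat / s_hat
    phi, Phi = norm.pdf(z), norm.cdf(z)
    a = -(z * phi - phi**2 / (1 - Phi) - Phi) / s_hat**2
    b = -(z**2 * phi - z * phi**2 / (1 - Phi) + phi) / (2 * s_hat**3)
    c = -(z**3 * phi + z * phi - z**2 * phi**2 / (1 - Phi) - 2 * Phi) / (4 * s_hat**4)
    k = X.shape[1]
    info = np.empty((k + 1, k + 1))
    info[:k, :k] = (X * a[:, None]).T @ X   # sum_i a_i x_i x_i'
    info[:k, k] = info[k, :k] = X.T @ b     # sum_i b_i x_i
    info[k, k] = c.sum()                    # sum_i c_i
    return info

# Asymptotic covariance matrix and standard errors:
# cov = np.linalg.inv(tobit_information(X, b_hat, s_hat))
# se = np.sqrt(np.diag(cov))
```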
LM or Score Test

• Let the log likelihood be written $\ln L(\theta_1, \theta_2)$, where $\theta_1$ is the set of parameters that are unrestricted under the null hypothesis and $\theta_2$ are the $k_2$ parameters restricted under $H_0$:

$$H_0: \theta_2 = 0 \qquad H_1: \theta_2 \neq 0$$

• e.g.

$$y_i^* = x_{1i}'\beta_1 + x_{2i}'\beta_2 + u_i \quad \text{with } u_i \sim N(0, \sigma^2)$$

where $\theta_1 = (\beta_1', \sigma^2)'$ and $\theta_2 = \beta_2$.

$$\frac{\partial \ln L(\theta_1, \theta_2)}{\partial \theta} = \sum_i \frac{\partial \ln l_i(\theta_1, \theta_2)}{\partial \theta} \qquad \text{or} \qquad S(\theta) = \sum_i S_i(\theta)$$

• Let $\hat\theta$ be the MLE under $H_0$. Then

$$\frac{1}{\sqrt{N}}S(\hat\theta) \stackrel{a}{\sim} N(0, H)$$

therefore

$$\frac{1}{N}S(\hat\theta)'H^{-1}S(\hat\theta) \stackrel{a}{\sim} \chi^2(k_2)$$
In the Tobit model consider the case of $H_0: \beta_2 = 0$:

$$\frac{\partial \ln L}{\partial \beta_2} = \frac{1}{\sigma^2}\sum_i D_i(y_i - x_i'\beta)x_{2i} - \frac{1}{\sigma^2}\sum_i (1 - D_i)\frac{\sigma\phi_i}{1 - \Phi_i}x_{2i}$$

$$\frac{\partial \ln L}{\partial \beta_2} = \frac{1}{\sigma^2}\sum_i e_i^{(1)} x_{2i}$$

where

$$e_i^{(1)} = D_i(y_i - x_i'\beta) + (1 - D_i)\left(-\frac{\sigma\phi_i}{1 - \Phi_i}\right)$$

is known as the first order `generalised' residual, which reduces to $\hat u_i = y_i - x_i'\beta$ in the general linear model case.
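A sketch of the first-order generalised residual computed from a restricted fit (here b and s stand for the restricted Tobit estimates; the names are illustrative). The score for $H_0$ is then the cross-product of these residuals with the omitted regressors $x_{2i}$:

```python
import numpy as np
from scipy.stats import norm

def generalised_residual_1(y, X, b, s):
    """First-order generalised residual e_i^(1) from a restricted Tobit fit."""
    D = (y > 0).astype(float)
    z = X @ b / s
    lam = norm.pdf(z) / (1 - norm.cdf(z))   # phi_i / (1 - Phi_i)
    return D * (y - X @ b) - (1 - D) * s * lam

# Score under H0: (1/s**2) * X2.T @ generalised_residual_1(y, X, b, s).
# The LM statistic scales this score by an estimate of its variance,
# e.g. via the outer product of the per-observation scores.
```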
This kind of Score or LM test can be extended to specification tests for heteroskedasticity and for non-normality. Notice that estimation under the alternative is avoided, at least in terms of the test statistic. If $H_0$ is rejected, however, then estimation under $H_a$ is unavoidable.
• Consider the normal distribution

$$f(u_i) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\frac{u_i^2}{\sigma^2}\right)$$

which can be written in terms of log scores:

$$\frac{\partial \ln f(u_i)}{\partial u_i} = -\frac{u_i}{\sigma^2}$$

• A popular generalisation (the Pearson family of distributions) is

$$\frac{\partial \ln f(u_i)}{\partial u_i} = \frac{-u_i + c_1}{\sigma_i^2 - c_1 u_i + c_2 u_i^2}$$

where the skedastic function is $\sigma_i^2 = h(\gamma_0 + \gamma_1' z_i)$, with $z_i$ observable determinants of heteroskedasticity:

$$c_1 \neq 0 \rightarrow \text{skewness}$$
$$c_2 \neq 0 \rightarrow \text{kurtosis}$$
$$c_1 = c_2 = 0 \rightarrow \text{Normal}$$
$$\gamma_1 = 0 \rightarrow \text{homoskedastic}$$
• One can write out the loglikelihood with the Pearson family and take derivatives with respect to the $c$ and $\gamma$ parameters to find the LM or Score test, e.g.

$$\frac{\partial \ln L}{\partial \gamma_1} = -\sum_i e_i^{(2)} z_i$$

where $e_i^{(2)}$ is the second order generalised residual.

• Also

$$\frac{\partial \ln L}{\partial c_2} = \frac{1}{4\sigma^2}\sum_i D_i\left(u_i^4 - \int_{-x_i'\beta}^\infty t^4 f\, dt\right)$$

which involves the 4th order generalised residual.
What if normality is rejected or is not a good prior assumption anyway?

Suppose we just assume symmetry.

We can write the model as

$$y_i^* = x_i'\beta + u_i, \quad \text{or} \quad y_i = x_i'\beta + u_i^*, \quad \text{where } u_i^* = \max\{u_i, -x_i'\beta\}$$

We can define the new residuals

$$u_i^{**} = \min\{u_i^*, x_i'\beta\}$$

where the $x_i'\beta$ reflects `upper' trimming. Drop observations where $x_i'\beta \leq 0$, as no symmetric trimming is possible there.

• Adapt least squares by replacing $y_i$ by $\min\{y_i, 2x_i'\beta\}$ $\rightarrow$ symmetrically censored least squares. Applying OLS to all $i$ with $x_i'\beta > 0$ yields consistent and asymptotically normal estimates: the error term now satisfies $E(u^{**}|x) = 0$.

• Requires a symmetric distribution of the error term $u$, but no normality or homoskedasticity.

• Estimation requires an iterative procedure, but one can adapt the EM algorithm:

$$\hat\beta = \left(\sum x_i x_i'\right)^{-1}\sum x_i m_i \quad \text{with} \quad m_i = \min\{y_i, 2x_i'\beta\}$$

• Monte-Carlo results.
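A sketch of the adapted EM iteration for symmetrically censored least squares, on simulated data with symmetric but non-normal errors (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 1.0])
u = rng.laplace(0.0, 1.0, N)               # symmetric but non-normal errors
y = np.maximum(X @ beta_true + u, 0.0)

b = np.linalg.lstsq(X, y, rcond=None)[0]   # starting values from OLS
for _ in range(500):
    xb = X @ b
    keep = xb > 0                           # drop obs where no trimming possible
    m = np.minimum(y[keep], 2 * xb[keep])   # replace y by min{y, 2 x'b}
    b_new = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ m)
    if np.max(np.abs(b_new - b)) < 1e-9:
        b = b_new
        break
    b = b_new
print("SCLS estimate:", b.round(3))
```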
Censored Least Absolute Deviations

Assume the conditional median of $u_i$ is zero $\rightarrow$ the median of $y_i$ is $x_i'\beta \cdot 1(x_i'\beta > 0)$.

CLAD minimises the absolute distance of $y_i$ from its median:

$$\hat\beta_{CLAD} = \arg\min_\beta \sum_i |y_i - x_i'\beta \cdot 1(x_i'\beta > 0)|$$

Powell (1984) shows that $\hat\beta_{CLAD}$ is $\sqrt{N}$-consistent and asymptotically normal.

• Can you think of a LAD estimator for the binary response case?

• How does it relate to Maximum Score?
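Because the CLAD criterion is non-smooth, a derivative-free optimiser is one simple way to sketch it (simulated data; Nelder-Mead is a pragmatic choice here, not Powell's original computational approach):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
N = 5_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 1.0])
u = rng.standard_t(df=3, size=N)            # heavy tails, median zero
y = np.maximum(X @ beta_true + u, 0.0)

def clad_objective(b):
    # absolute distance of y from its conditional median max{0, x'b}
    return np.abs(y - np.maximum(X @ b, 0.0)).sum()

start = np.linalg.lstsq(X, y, rcond=None)[0]
res = minimize(clad_objective, start, method="Nelder-Mead")
print("CLAD estimate:", res.x.round(3))
```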
Endogenous Variables

Consider the following (triangular) model

$$y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + u_{1i} \qquad (1)$$
$$y_{2i} = z_i'\pi_2 + v_{2i} \qquad (2)$$

where $y_{1i} = y_{1i}^* \cdot 1(y_{1i}^* > 0)$ and $z_i' = (x_{1i}', x_{2i}')$. The $x_{2i}$ are the excluded `instruments' from the equation for $y_1$. The first equation is the `structural' equation of interest and the second equation is the `reduced form' for $y_2$.

• $y_2$ is endogenous if $u_1$ and $v_2$ are correlated. If $y_1$ were fully observed we could use IV.

Use the following orthogonal decomposition for $u_1$:

$$u_{1i} = \rho v_{2i} + \varepsilon_{1i}$$

where $E(\varepsilon_{1i}|v_{2i}) = 0$.

• Note that $y_2$ is uncorrelated with $u_{1i}$ conditional on $v_2$. The variable $v_2$ is sometimes known as a control function.

• Under the assumption that $u_1$ and $v_2$ are jointly normally distributed, $v_2$ and $\varepsilon_1$ are uncorrelated by definition and $\varepsilon_1$ also follows a normal distribution.

Use this to define the augmented model

$$y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + \rho v_{2i} + \varepsilon_{1i}$$
$$y_{2i} = z_i'\pi_2 + v_{2i}$$

2-step Estimator:

• Step 1: Estimate $\pi_2$ by OLS and predict $v_2$:

$$\hat v_{2i} = y_{2i} - \hat\pi_2' z_i$$

• Step 2: Use $\hat v_{2i}$ as a `control function' in the model for $y_1^*$ above and estimate by Tobit or another consistent method.
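A sketch of the two steps on simulated data (all parameter values illustrative; the step-2 Tobit reuses the loglikelihood from earlier):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)
N = 5_000
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)                         # excluded instrument
v2 = rng.normal(size=N)
u1 = 0.5 * v2 + rng.normal(size=N)              # endogeneity: rho = 0.5
y2 = x1 + x2 + v2                               # reduced form
y1 = np.maximum(0.5 + x1 + 0.8 * y2 + u1, 0.0)  # censored structural equation

# Step 1: OLS reduced form and control function v2_hat
Z = np.column_stack([np.ones(N), x1, x2])
pi_hat = np.linalg.lstsq(Z, y2, rcond=None)[0]
v2_hat = y2 - Z @ pi_hat

# Step 2: Tobit of y1 on (1, x1, y2, v2_hat)
W = np.column_stack([np.ones(N), x1, y2, v2_hat])
D = y1 > 0

def neg_loglik(theta):
    b, s = theta[:-1], np.exp(theta[-1])
    wb = W @ b
    ll = np.where(D,
                  norm.logpdf((y1 - wb) / s) - np.log(s),
                  norm.logcdf(-wb / s))
    return -ll.sum()

res = minimize(neg_loglik, np.zeros(W.shape[1] + 1), method="BFGS")
print("rho_hat (coef on v2_hat):", round(res.x[3], 3))  # t-test of rho = 0 below
```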
An Exogeneity Test

The null of exogeneity in this model corresponds to

$$H_0: \rho = 0$$

A test of this null can be performed using a t-test.

• Smith-Blundell (1986, Econometrica).

• Specifically for the censored regression model (Tobit model).

• All this discussion of endogenous regressors carries over to the binary choice model (see Lecture 6 Notes) and other related models.
For example, consider the following (triangular) binary choice model

$$y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + u_{1i} \qquad (3)$$
$$y_{2i} = z_i'\pi_2 + v_{2i} \qquad (4)$$

where $y_{1i} = 1(y_{1i}^* > 0)$ and $z_i' = (x_{1i}', x_{2i}')$. The $x_{2i}$ are the excluded `instruments' from the equation for $y_1$. The first equation is the `structural' equation of interest and the second equation is the `reduced form' for $y_2$.

• $y_2$ is endogenous if $u_1$ and $v_2$ are correlated.

• If $y_1$ were fully observed we could use IV.

• Blundell and Powell (2004) extend this approach to the semiparametric case.