Microeconometrics MECT2
Richard Blundell, UCL
Lecture 7: Censored Data Models
March 2009
(i) Censored Data Models
• Censored and truncated data
Examples:
- earnings
- hours of work (mroz.dta is a good data set to play with)
- top coding of wealth
- expenditure on cars (this was James Tobin's original example, which became known as Tobin's Probit model or the Tobit model)
• Typical definitions:
- Censored data includes the censoring points
- Truncated data excludes the censoring points
• A mixture of discrete and continuous processes. In general we should model the process of censoring or truncation as a separate discrete mechanism; this is what is known as a `selectivity' model. To begin with, we have a model in which the two processes are generated from the same underlying continuous latent variable model, e.g. corner solution models in economics:
$$y_i^* = x_i'\beta + u_i$$

with

$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

or

$$y_i = \begin{cases} y_i^* & \text{if } u_i > -x_i'\beta \\ 0 & \text{otherwise} \end{cases}$$
• Sometimes define $D_i$,

$$D_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

as in the Probit model.
The general specification for the censored regression model is

$$y_i^* = x_i'\beta + u_i, \qquad y_i = \max\{0, y_i^*\}$$

where $y^*$ is the unobservable underlying process (similar to what was used with discrete choice models) and $y$ is the data observation.

• When $u$ is normally distributed, $u|x \sim N(0, \sigma^2)$, the model is called Tobin's Probit or the Tobit model.

• Note that

$$P(y > 0|x) = P(u > -x'\beta|x) = \Phi\left(\frac{x'\beta}{\sigma}\right)$$
• Consider the moments of the truncated normal.

• Assume $w \sim N(0, \sigma^2)$. Then $w|w > c$, where $c$ is an arbitrary constant, is a truncated normal.

• The density function for the truncated normal is

$$f(w|w > c) = \frac{f(w)}{1 - F(c)} = \frac{\frac{1}{\sigma}\phi\left(\frac{w}{\sigma}\right)}{1 - \Phi\left(\frac{c}{\sigma}\right)}$$

where $f$ is the density function of $w$ and $F$ is the cumulative distribution function of $w$.
• We can now write

$$E(w|w > c) = \int_c^\infty w f(w|w > c)\, dw = \frac{\sigma\phi\left(\frac{c}{\sigma}\right)}{1 - \Phi\left(\frac{c}{\sigma}\right)}$$

Applying this result to the regression model yields

$$E(y|x, y > 0) = x'\beta + E(u|u > -x'\beta) = x'\beta + \sigma\frac{\phi\left(\frac{x'\beta}{\sigma}\right)}{\Phi\left(\frac{x'\beta}{\sigma}\right)}$$

where $\phi(w)/\Phi(w)$ is called the Inverse Mills Ratio, usually represented by $\lambda(w)$.

• Notice that, in contrast to the discrete choice models, the variance of the residual plays a central role here: it determines the size of the partial effects.
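As a numerical sanity check on the truncated-mean formula, the following sketch (not from the lecture; scipy-based, with illustrative values of $\sigma$ and $c$) compares the analytical expression with a simulated truncated sample:

```python
import numpy as np
from scipy.stats import norm

sigma, c = 2.0, 0.5   # illustrative scale and truncation point

# Analytical: E(w | w > c) = sigma * phi(c/sigma) / (1 - Phi(c/sigma))
analytical = sigma * norm.pdf(c / sigma) / (1 - norm.cdf(c / sigma))

# Simulation check: draw w ~ N(0, sigma^2) and average the draws above c
rng = np.random.default_rng(0)
w = rng.normal(0.0, sigma, size=1_000_000)
simulated = w[w > c].mean()

print(f"analytical: {analytical:.4f}, simulated: {simulated:.4f}")
```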
OLS Bias

Truncated Data:

• Suppose one uses only the positive observations to estimate the model and the unobservables are normally distributed. Then, as we have seen,

$$E(y|x, y > 0) = x'\beta + \sigma\lambda\left(\frac{x'\beta}{\sigma}\right)$$

where the last term is $E(u|x, u > -x'\beta)$, which is generally non-zero.

• A model of the form

$$y = x'\beta + \sigma\lambda + v$$

would have $E(v|x, y > 0) = 0$.

• This implies that OLS on the truncated sample is inconsistent: omitting the $\lambda$ term is an omitted variable problem, so the resulting error term is correlated with $x$.
Censored Data:

• Now suppose we use all observations, both positive and zero.

• Under normality of the residual, we obtain

$$E(y|x) = \Phi\left(\frac{x'\beta}{\sigma}\right) x'\beta + \sigma\phi\left(\frac{x'\beta}{\sigma}\right)$$

• Thus, once again the OLS estimates will be biased and inconsistent.
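The bias is easy to demonstrate by simulation. The sketch below (illustrative parameter values, not from the lecture) fits OLS to the censored sample and to the truncated, positive-only sample and compares the slopes with the true $\beta$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, beta, sigma = 100_000, 1.0, 1.0   # illustrative true values

x = rng.normal(size=N)
y_star = beta * x + rng.normal(0.0, sigma, size=N)   # latent y*
y = np.maximum(y_star, 0.0)                          # censoring at zero

def ols_slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

pos = y > 0
print("true beta:     ", beta)
print("OLS, censored: ", round(ols_slope(x, y), 3))            # attenuated
print("OLS, truncated:", round(ols_slope(x[pos], y[pos]), 3))  # also attenuated
```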
The Tobit Maximum Likelihood Estimator
• Let $\{(y_i, x_i);\ i = 1, \ldots, N\}$ be a random sample of data on the model.

The contribution to the likelihood of a zero observation is determined by

$$P(y_i = 0|x_i) = 1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)$$

The contribution to the likelihood of a non-zero observation is determined by

$$f(y_i|x_i) = \frac{1}{\sigma}\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)$$

which is not invariant to $\sigma$.

Thus, the overall contribution of observation $i$ to the loglikelihood function is

$$\ln l_i(x_i; \beta, \sigma) = 1(D_i = 0)\ln\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] + 1(D_i = 1)\ln\left[\frac{1}{\sigma}\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]$$
and the sample loglikelihood is

$$\ln L_N(\beta, \sigma) = \sum_{i=1}^N \left\{(1 - D_i)\ln\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] + D_i\left[\ln\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \ln\sigma\right]\right\}$$

where $D_i$ equals one when $y_i > 0$ and equals zero otherwise.

• Notice that both $\beta$ and $\sigma$ are separately identified. Moreover, if $D_i = 1$ for all $i$, the ML and the OLS estimators will be the same.
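This loglikelihood can be maximised numerically. A minimal sketch on simulated data follows; the $\ln\sigma$ parameterisation, which keeps $\sigma$ positive, is a standard numerical device rather than anything specific to the lecture:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
N = 5_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
y = np.maximum(X @ beta_true + rng.normal(0.0, sigma_true, N), 0.0)
D = y > 0

def neg_loglik(theta):
    b, s = theta[:-1], np.exp(theta[-1])   # theta[-1] = log sigma
    xb = X @ b
    # censored: log[1 - Phi(x'b/s)] = logcdf(-x'b/s); uncensored: log density
    ll = np.where(D,
                  norm.logpdf((y - xb) / s) - np.log(s),
                  norm.logcdf(-xb / s))
    return -ll.sum()

res = minimize(neg_loglik, np.zeros(X.shape[1] + 1), method="BFGS")
print("beta_hat:", res.x[:-1].round(3), "sigma_hat:", round(np.exp(res.x[-1]), 3))
```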
• FOC

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^N \frac{1}{\sigma^2}\left\{D_i(y_i - x_i'\beta)x_i - (1 - D_i)\frac{\sigma\phi\left(\frac{x_i'\beta}{\sigma}\right)}{1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)}x_i\right\}$$

$$\frac{\partial \ln L}{\partial \sigma^2} = \sum_{i=1}^N \left\{(1 - D_i)\frac{x_i'\beta\,\phi\left(\frac{x_i'\beta}{\sigma}\right)}{2\sigma^3\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right]} + D_i\left[\frac{(y_i - x_i'\beta)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right]\right\}$$
Or, writing $\phi_i = \phi(x_i'\beta/\sigma)$ and $\Phi_i = \Phi(x_i'\beta/\sigma)$, and letting $i \in 0$ and $i \in +$ index the censored and positive observations:

$$(1)\quad \frac{\partial \ln L}{\partial \beta} = -\frac{1}{\sigma^2}\sum_{i \in 0}\frac{\sigma\phi_i}{1 - \Phi_i}x_i + \frac{1}{\sigma^2}\sum_{i \in +}(y_i - x_i'\beta)x_i$$

$$(2)\quad \frac{\partial \ln L}{\partial \sigma^2} = \frac{1}{2\sigma^3}\sum_{i \in 0}\frac{x_i'\beta\,\phi_i}{1 - \Phi_i} + \frac{1}{2\sigma^4}\sum_{i \in +}(y_i - x_i'\beta)^2 - \frac{N_+}{2\sigma^2}$$

where $N_+$ is the number of positive observations. Note that forming $\frac{\beta'}{2\sigma^2}\times(1) + (2)$ and setting the result to zero gives

$$\hat\sigma^2 = \frac{1}{N_+}\sum_{i \in +} y_i(y_i - x_i'\hat\beta)$$

that is, only the positive observations contribute to the estimation of $\sigma$.
• Also, if we define $m_i \equiv E(y_i^*|y_i)$ then we can write (1) as

$$\frac{\partial \ln L}{\partial \beta} = c\sum_{i=1}^N x_i(m_i - x_i'\beta)$$

or

$$\sum_{i=1}^N x_i m_i = \sum_{i=1}^N x_i x_i'\beta$$

which defines an EM algorithm for the Tobit model. Note also that

$$m_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ x_i'\beta - \dfrac{\sigma\phi_i}{1 - \Phi_i} & \text{otherwise} \end{cases}$$

again replacing $y^*$ with its best guess, given $y$, when it is unobserved.
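A sketch of this EM iteration on simulated data (illustrative names and values). The $\sigma$ update below uses the positive-observations expression derived above; a full EM variance step would also impute second moments for the censored observations, and the max() guard is purely defensive:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
N = 5_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
y = np.maximum(X @ beta_true + rng.normal(0.0, sigma_true, N), 0.0)
pos = y > 0

b = np.linalg.lstsq(X, y, rcond=None)[0]   # crude starting values
s = y[pos].std()
for _ in range(500):
    z = X @ b / s
    lam = norm.pdf(z) / (1 - norm.cdf(z))          # phi_i / (1 - Phi_i)
    m = np.where(pos, y, X @ b - s * lam)          # E-step: m_i = E(y*_i | y_i)
    b_new = np.linalg.solve(X.T @ X, X.T @ m)      # M-step: regress m on x
    s = np.sqrt(max(np.mean(y[pos] * (y[pos] - X[pos] @ b_new)), 1e-8))
    if np.max(np.abs(b_new - b)) < 1e-9:
        b = b_new
        break
    b = b_new
print("beta_hat:", b.round(3), "sigma_hat:", round(s, 3))
```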
• Using Theorems 1 and 2 from Lecture 6, the MLE of $\beta$ and $\sigma^2$ is consistent and asymptotically normally distributed.

• The asymptotic covariance matrix is derived from the expected values of the second partial derivatives of $\ln L$:

$$-\begin{bmatrix} E\dfrac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} & E\dfrac{\partial^2 \ln L}{\partial\beta\,\partial\sigma^2} \\ \cdot & E\dfrac{\partial^2 \ln L}{\partial(\sigma^2)^2} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^N a_i x_i x_i' & \sum_{i=1}^N b_i x_i \\ \cdot & \sum_{i=1}^N c_i \end{bmatrix}$$

where

$$a_i = -\frac{1}{\sigma^2}\left[z_i\phi_i - \frac{\phi_i^2}{1 - \Phi_i} - \Phi_i\right]$$

$$b_i = -\frac{1}{2\sigma^3}\left[z_i^2\phi_i - \frac{z_i\phi_i^2}{1 - \Phi_i} + \phi_i\right]$$

$$c_i = -\frac{1}{4\sigma^4}\left[z_i^3\phi_i + z_i\phi_i - \frac{z_i^2\phi_i^2}{1 - \Phi_i} - 2\Phi_i\right]$$

and $\phi_i$ and $\Phi_i$ are evaluated at $z_i = x_i'\beta/\sigma$.
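Evaluated at the estimates, these expressions give the estimated information matrix and hence standard errors. A minimal sketch follows; it assumes X, b_hat and s_hat are a regressor matrix and Tobit estimates from an earlier step (these names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def tobit_information(X, b_hat, s_hat):
    """Expected information for (beta, sigma^2) built from a_i, b_i, c_i above."""
    z = X @ b_hat / s_hat
    phi, Phi = norm.pdf(z), norm.cdf(z)
    a = -(z * phi - phi**2 / (1 - Phi) - Phi) / s_hat**2
    b = -(z**2 * phi - z * phi**2 / (1 - Phi) + phi) / (2 * s_hat**3)
    c = -(z**3 * phi + z * phi - z**2 * phi**2 / (1 - Phi) - 2 * Phi) / (4 * s_hat**4)
    k = X.shape[1]
    info = np.empty((k + 1, k + 1))
    info[:k, :k] = (X * a[:, None]).T @ X   # sum_i a_i x_i x_i'
    info[:k, k] = info[k, :k] = X.T @ b     # sum_i b_i x_i
    info[k, k] = c.sum()                    # sum_i c_i
    return info

# Asymptotic covariance matrix and standard errors:
# cov = np.linalg.inv(tobit_information(X, b_hat, s_hat))
# se = np.sqrt(np.diag(cov))
```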
LM or Score Test

• Let the log likelihood be written $\ln L(\theta_1, \theta_2)$, where $\theta_1$ is the set of parameters that are unrestricted under the null hypothesis and $\theta_2$ are the $k_2$ parameters restricted under $H_0$:

$$H_0: \theta_2 = 0 \qquad H_1: \theta_2 \neq 0$$

• e.g.

$$y_i^* = x_{1i}'\beta_1 + x_{2i}'\beta_2 + u_i \quad \text{with } u_i \sim N(0, \sigma^2)$$

where $\theta_1 = (\beta_1', \sigma^2)'$ and $\theta_2 = \beta_2$.

$$\frac{\partial \ln L(\theta_1, \theta_2)}{\partial \theta} = \sum_i \frac{\partial \ln l_i(\theta_1, \theta_2)}{\partial \theta} \qquad \text{or} \qquad S(\theta) = \sum_i S_i(\theta)$$

• Let $\hat\theta$ be the MLE under $H_0$. Then

$$\frac{1}{\sqrt{N}}S(\hat\theta) \stackrel{a}{\sim} N(0, H)$$

therefore

$$\frac{1}{N}S(\hat\theta)'H^{-1}S(\hat\theta) \stackrel{a}{\sim} \chi^2(k_2)$$
In the Tobit model consider the case of $H_0: \beta_2 = 0$:

$$\frac{\partial \ln L}{\partial \beta_2} = \frac{1}{\sigma^2}\sum_i D_i(y_i - x_i'\beta)x_{2i} - \frac{1}{\sigma^2}\sum_i (1 - D_i)\frac{\sigma\phi_i}{1 - \Phi_i}x_{2i}$$

$$\frac{\partial \ln L}{\partial \beta_2} = \frac{1}{\sigma^2}\sum_i e_i^{(1)} x_{2i}$$

where

$$e_i^{(1)} = D_i(y_i - x_i'\beta) + (1 - D_i)\left(-\frac{\sigma\phi_i}{1 - \Phi_i}\right)$$

is known as the first order `generalised' residual, which reduces to $\hat u_i = y_i - x_i'\beta$ in the general linear model case.
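A sketch of the first-order generalised residual computed from a restricted fit (here b and s stand for the restricted Tobit estimates; the names are illustrative). The score for $H_0$ is then the cross-product of these residuals with the omitted regressors $x_{2i}$:

```python
import numpy as np
from scipy.stats import norm

def generalised_residual_1(y, X, b, s):
    """First-order generalised residual e_i^(1) from a restricted Tobit fit."""
    D = (y > 0).astype(float)
    z = X @ b / s
    lam = norm.pdf(z) / (1 - norm.cdf(z))   # phi_i / (1 - Phi_i)
    return D * (y - X @ b) - (1 - D) * s * lam

# Score under H0: (1/s**2) * X2.T @ generalised_residual_1(y, X, b, s).
# The LM statistic scales this score by an estimate of its variance,
# e.g. via the outer product of the per-observation scores.
```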
This kind of Score or LM test can be extended to specification tests for heteroskedasticity and for non-normality. Notice that estimation under the alternative is avoided, at least in terms of the test statistic. If $H_0$ is rejected, however, then estimation under $H_a$ is unavoidable.
• Consider the normal distribution

$$f(u_i) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\frac{u_i^2}{\sigma^2}\right)$$

which can be written in terms of log scores:

$$\frac{\partial \ln f(u_i)}{\partial u_i} = -\frac{u_i}{\sigma^2}$$

• A popular generalisation (the Pearson family of distributions) is

$$\frac{\partial \ln f(u_i)}{\partial u_i} = \frac{-u_i + c_1}{\sigma_i^2 - c_1 u_i + c_2 u_i^2}$$

where the skedastic function is $\sigma_i^2 = h(\gamma_0 + \gamma_1' z_i)$, with $z_i$ observable determinants of heteroskedasticity:

$$c_1 \neq 0 \rightarrow \text{skewness}$$
$$c_2 \neq 0 \rightarrow \text{kurtosis}$$
$$c_1 = c_2 = 0 \rightarrow \text{Normal}$$
$$\gamma_1 = 0 \rightarrow \text{homoskedastic}$$
• One can write out the loglikelihood with the Pearson family and take derivatives with respect to the $c$ and $\gamma$ parameters to find the LM or Score test, e.g.

$$\frac{\partial \ln L}{\partial \gamma_1} = -\sum_i e_i^{(2)} z_i$$

where $e_i^{(2)}$ is the second order generalised residual.

• Also

$$\frac{\partial \ln L}{\partial c_2} = \frac{1}{4\sigma^2}\sum_i D_i\left(u_i^4 - \int_{-x_i'\beta}^\infty t^4 f\, dt\right)$$

which involves the 4th order generalised residual.
What if normality is rejected or is not a good prior assumption anyway?

Suppose we just assume symmetry.

We can write the model as

$$y_i^* = x_i'\beta + u_i, \quad \text{or} \quad y_i = x_i'\beta + u_i^*, \quad \text{where } u_i^* = \max\{u_i, -x_i'\beta\}$$

We can define the new residuals

$$u_i^{**} = \min\{u_i^*, x_i'\beta\}$$

where the $x_i'\beta$ reflects `upper' trimming. Drop observations where $x_i'\beta \leq 0$, as no symmetric trimming is possible there.

• Adapt least squares by replacing $y_i$ by $\min\{y_i, 2x_i'\beta\}$ $\rightarrow$ symmetrically censored least squares. Applying OLS to all $i$ with $x_i'\beta > 0$ yields consistent and asymptotically normal estimates: the error term now satisfies $E(u^{**}|x) = 0$.

• Requires a symmetric distribution of the error term $u$, but no normality or homoskedasticity.

• Estimation requires an iterative procedure, but one can adapt the EM algorithm:

$$\hat\beta = \left(\sum x_i x_i'\right)^{-1}\sum x_i m_i \quad \text{with} \quad m_i = \min\{y_i, 2x_i'\beta\}$$

• Monte-Carlo results.
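A sketch of the adapted EM iteration for symmetrically censored least squares, on simulated data with symmetric but non-normal errors (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 1.0])
u = rng.laplace(0.0, 1.0, N)               # symmetric but non-normal errors
y = np.maximum(X @ beta_true + u, 0.0)

b = np.linalg.lstsq(X, y, rcond=None)[0]   # starting values from OLS
for _ in range(500):
    xb = X @ b
    keep = xb > 0                           # drop obs where no trimming possible
    m = np.minimum(y[keep], 2 * xb[keep])   # replace y by min{y, 2 x'b}
    b_new = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ m)
    if np.max(np.abs(b_new - b)) < 1e-9:
        b = b_new
        break
    b = b_new
print("SCLS estimate:", b.round(3))
```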
Censored Least Absolute Deviations

Assume the conditional median of $u_i$ is zero $\rightarrow$ the median of $y_i$ is $x_i'\beta \cdot 1(x_i'\beta > 0)$.

CLAD minimises the absolute distance of $y_i$ from its median:

$$\hat\beta_{CLAD} = \arg\min_\beta \sum_i |y_i - x_i'\beta \cdot 1(x_i'\beta > 0)|$$

Powell (1984) shows that $\hat\beta_{CLAD}$ is $\sqrt{N}$-consistent and asymptotically normal.

• Can you think of a LAD estimator for the binary response case?

• How does it relate to Maximum Score?
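Because the CLAD criterion is non-smooth, a derivative-free optimiser is one simple way to sketch it (simulated data; Nelder-Mead is a pragmatic choice here, not Powell's original computational approach):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
N = 5_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 1.0])
u = rng.standard_t(df=3, size=N)            # heavy tails, median zero
y = np.maximum(X @ beta_true + u, 0.0)

def clad_objective(b):
    # absolute distance of y from its conditional median max{0, x'b}
    return np.abs(y - np.maximum(X @ b, 0.0)).sum()

start = np.linalg.lstsq(X, y, rcond=None)[0]
res = minimize(clad_objective, start, method="Nelder-Mead")
print("CLAD estimate:", res.x.round(3))
```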
Endogenous Variables

Consider the following (triangular) model

$$y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + u_{1i} \qquad (1)$$
$$y_{2i} = z_i'\pi_2 + v_{2i} \qquad (2)$$

where $y_{1i} = y_{1i}^* \cdot 1(y_{1i}^* > 0)$ and $z_i' = (x_{1i}', x_{2i}')$. The $x_{2i}$ are the excluded `instruments' from the equation for $y_1$. The first equation is the `structural' equation of interest and the second equation is the `reduced form' for $y_2$.

• $y_2$ is endogenous if $u_1$ and $v_2$ are correlated. If $y_1$ were fully observed we could use IV.

Use the following orthogonal decomposition for $u_1$:

$$u_{1i} = \rho v_{2i} + \varepsilon_{1i}$$

where $E(\varepsilon_{1i}|v_{2i}) = 0$.

• Note that $y_2$ is uncorrelated with $u_{1i}$ conditional on $v_2$. The variable $v_2$ is sometimes known as a control function.

• Under the assumption that $u_1$ and $v_2$ are jointly normally distributed, $v_2$ and $\varepsilon_1$ are uncorrelated by definition and $\varepsilon_1$ also follows a normal distribution.

Use this to define the augmented model

$$y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + \rho v_{2i} + \varepsilon_{1i}$$
$$y_{2i} = z_i'\pi_2 + v_{2i}$$

2-step Estimator:

• Step 1: Estimate $\pi_2$ by OLS and predict $v_2$:

$$\hat v_{2i} = y_{2i} - \hat\pi_2' z_i$$

• Step 2: Use $\hat v_{2i}$ as a `control function' in the model for $y_1^*$ above and estimate by Tobit or another consistent method.
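A sketch of the two steps on simulated data (all parameter values illustrative; the step-2 Tobit reuses the loglikelihood from earlier):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)
N = 5_000
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)                         # excluded instrument
v2 = rng.normal(size=N)
u1 = 0.5 * v2 + rng.normal(size=N)              # endogeneity: rho = 0.5
y2 = x1 + x2 + v2                               # reduced form
y1 = np.maximum(0.5 + x1 + 0.8 * y2 + u1, 0.0)  # censored structural equation

# Step 1: OLS reduced form and control function v2_hat
Z = np.column_stack([np.ones(N), x1, x2])
pi_hat = np.linalg.lstsq(Z, y2, rcond=None)[0]
v2_hat = y2 - Z @ pi_hat

# Step 2: Tobit of y1 on (1, x1, y2, v2_hat)
W = np.column_stack([np.ones(N), x1, y2, v2_hat])
D = y1 > 0

def neg_loglik(theta):
    b, s = theta[:-1], np.exp(theta[-1])
    wb = W @ b
    ll = np.where(D,
                  norm.logpdf((y1 - wb) / s) - np.log(s),
                  norm.logcdf(-wb / s))
    return -ll.sum()

res = minimize(neg_loglik, np.zeros(W.shape[1] + 1), method="BFGS")
print("rho_hat (coef on v2_hat):", round(res.x[3], 3))  # t-test of rho = 0 below
```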
An Exogeneity Test

The null of exogeneity in this model corresponds to

$$H_0: \rho = 0$$

A test of this null can be performed using a t-test.

• Smith-Blundell (1986, Econometrica).

• Specifically for the censored regression model (Tobit model).

• All this discussion of endogenous regressors carries over to the binary choice model (see Lecture 6 Notes) and other related models.
For example, consider the following (triangular) binary choice model

$$y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + u_{1i} \qquad (3)$$
$$y_{2i} = z_i'\pi_2 + v_{2i} \qquad (4)$$

where $y_{1i} = 1(y_{1i}^* > 0)$ and $z_i' = (x_{1i}', x_{2i}')$. The $x_{2i}$ are the excluded `instruments' from the equation for $y_1$. The first equation is the `structural' equation of interest and the second equation is the `reduced form' for $y_2$.

• $y_2$ is endogenous if $u_1$ and $v_2$ are correlated.

• If $y_1$ were fully observed we could use IV.

• Blundell and Powell (2004) extend this approach to the semiparametric case.