Upload
zuriel
View
40
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Endogenous Regressors and Instrumental Variables Estimation. Chapter 10. Adapted from Vera Tabakova, East Carolina University. Chapter 10: Random Regressors and Moment Based Estimation. 10.1 Linear Regression with Random x ’s 10.2 Cases in which x and e are Correlated - PowerPoint PPT Presentation
Citation preview
Chapter 10
Endogenous Regressors and Instrumental Variables Estimation
Adapted from Vera Tabakova, East Carolina University
Chapter 10: Random Regressors and Moment Based Estimation 10.1 Linear Regression with Random x’s 10.2 Cases in which x and e are Correlated 10.3 Estimators Based on the Method of Moments 10.4 Specification Tests
Slide 10-2Principles of Econometrics, 3rd Edition
Chapter 10: Random Regressors and Moment Based EstimationThe assumptions of the simple linear regression are: SR1. SR2. SR3. SR4.
SR5. The variable xi is not random, and it must take at least two
different values. SR6. (optional)
Slide 10-3Principles of Econometrics, 3rd Edition
1 2 1, ,i i iy x e i N
( ) 0iE e 2var( )ie
cov( , ) 0i je e
2~ (0, )ie N
Chapter 10: Random Regressors and Moment Based Estimation
The purpose of this chapter is to discuss regression models in which
xi is random and correlated with the error term ei. We will:
Discuss the conditions under which having a random x is not a
problem, and how to test whether our data satisfies these conditions.
Present cases in which the randomness of x causes the least squares
estimator to fail.
Provide estimators that have good properties even when xi is random
and correlated with the error ei. Slide 10-4Principles of Econometrics, 3rd Edition
10.1 Linear Regression With Random X’s
A10.1 correctly describes the relationship
between yi and xi in the population, where β1 and β2 are unknown
(fixed) parameters and ei is an unobservable random error term.
A10.2 The data pairs , are obtained by random
sampling. That is, the data pairs are collected from the same
population, by a process in which each pair is independent of
every other pair. Such data are said to be independent and
identically distributed.
Slide 10-5Principles of Econometrics, 3rd Edition
1 2i i iy x e
, 1, ,i ix y i N
10.1 Linear Regression With Random X’s
A10.3 The expected value of the error term ei,
conditional on the value of xi, is zero.
This assumption implies that we have (i) omitted no important variables, (ii) used the correct functional form, and (iii) there exist no factors that cause the error term ei to be correlated with xi.
If , then we can show that it is also true that xi and ei are uncorrelated, and that .
Conversely, if xi and ei are correlated, then and we can show that .
Slide 10-6Principles of Econometrics, 3rd Edition
| 0.i iE e x
| 0i iE e x cov , 0i ix e
cov , 0i ix e | 0i iE e x
10.1 Linear Regression With Random X’s
A10.4 In the sample, xi must take at least two different values.
A10.5 The variance of the error term, conditional
on xi is a constant σ2.
A10.6 The distribution of the error term,
conditional on xi, is normal.
Slide 10-7Principles of Econometrics, 3rd Edition
2var | .i ie x
2| ~ 0, .i ie x N
10.1.1 The Small Sample Properties of the OLS Estimator
Under assumptions A10.1-A10.4 the least squares estimator is
unbiased.
Under assumptions A10.1-A10.5 the least squares estimator is the
best linear unbiased estimator of the regression parameters,
conditional on the x’s, and the usual estimator of σ2 is unbiased.
Slide 10-8Principles of Econometrics, 3rd Edition
10.1.1 The Small Sample Properties of the OLS Estimator
Under assumptions A10.1-A10.6 the distributions of the least squares
estimators, conditional upon the x’s, are normal, and their variances
are estimated in the usual way. Consequently the usual interval
estimation and hypothesis testing procedures are valid.
Slide 10-9Principles of Econometrics, 3rd Edition
10.1.2 Asymptotic Properties of the OLS Estimator: X Not Random
Figure 10.1 An illustration of consistency
Slide 10-10Principles of Econometrics, 3rd Edition
10.1.2 Asymptotic Properties of the OLS Estimator: X Not Random
Slide 10-11Principles of Econometrics, 3rd Edition
Remark: Consistency is a “large sample” or “asymptotic” property. We have
stated another large sample property of the least squares estimators in Chapter
2.6. We found that even when the random errors in a regression model are not
normally distributed, the least squares estimators still have approximate
normal distributions if the sample size N is large enough. How large must the
sample size be for these large sample properties to be valid approximations of
reality? In a simple regression 50 observations might be enough. In multiple
regression models the number might be much higher, depending on the quality
of the data.
10.1.3 Asymptotic Properties of the OLS Estimator: X Random
A10.3*
Slide 10-12Principles of Econometrics, 3rd Edition
0 and cov , 0i i iE e x e
| 0 cov , 0i i i iE e x x e
| 0 0i i iE e x E e
10.1.3 Asymptotic Properties of the OLS Estimator: X Random
Under assumption A10.3* the least squares estimators are consistent.
That is, they converge to the true parameter values as N.
Under assumptions A10.1, A10.2, A10.3*, A10.4 and A10.5, the least
squares estimators have approximate normal distributions in large
samples, whether the errors are normally distributed or not.
Furthermore our usual interval estimators and test statistics are valid,
if the sample is large.
Slide 10-13Principles of Econometrics, 3rd Edition
10.1.3 Asymptotic Properties of the OLS Estimator: X Random
If assumption A10.3* is not true, and in particular if
so that xi and ei are correlated, then the least squares estimators are
inconsistent. They do not converge to the true parameter values
even in very large samples. Furthermore, none of our usual
hypothesis testing or interval estimation procedures are valid.
Slide 10-14Principles of Econometrics, 3rd Edition
cov , 0i ix e
10.1.4 Why OLS Fails
Figure 10.2 Plot of correlated x and e
Slide 10-15Principles of Econometrics, 3rd Edition
10.1.4 Why OLS Fails
Slide 10-16Principles of Econometrics, 3rd Edition
1 2 1 1y E y e x e x e
1 2ˆ .9789 1.7034y b b x x True model, but it gets estimated as:
…so there is s substantial bias …
10.1.4 Why OLS Fails
Figure 10.3 Plot of data, true and fitted regressions
Slide 10-17Principles of Econometrics, 3rd Edition
In the caseof a positivecorrelationbetween x and the error
ExampleWages = f(intelligence)
10.2 If X and e Are Correlated
When an explanatory variable and the error term are correlated the
explanatory variable is said to be endogenous and means
“determined within the system.”
Slide 10-18Principles of Econometrics, 3rd Edition
10.2.1 Measurement Error
Slide 10-19Principles of Econometrics, 3rd Edition
(10.1)
(10.2)
*1 2i i iy x v
*i i ix x u
X Y
vu
eX Y
10.2.1 Measurement Error
Slide 10-20Principles of Econometrics, 3rd Edition
(10.3)
*1 2
1 2
1 2 2
1 2
i i i
i i i
i i i
i i
y x v
x u v
x v u
x e
10.2.1 Measurement Error
Slide 10-21Principles of Econometrics, 3rd Edition
(10.4)
*2
2 22 2
cov ,
0
i i i i i i i i
i u
x e E x e E x u v u
E u
10.2.2 Omitted Variables
Omitted factors: experience, ability and motivation.
Therefore, we expect that
Slide 10-22Principles of Econometrics, 3rd Edition
(10.5)1 2i i iWAGE EDUC e
cov , 0.i iEDUC e
We will focus on this case
Y
W
X
10.2.2 Omitted Variables
Other examples: Y = crime, X = marriage, W = “marriageability” Y = divorce, X = “shacking up”, W = “good match” Y = crime, X = watching a lot of TV, W = “parental involvement” Y = sexual violence, X = watching porn, W = any unobserved anything
that would affect both W and Y The list is endless!
.
Slide 10-23Principles of Econometrics, 3rd Edition
We will focus on this case
Y
W
X
10.2.3 Simultaneous Equations Bias
There is a feedback relationship between Pi and Qi. Because of this
feedback, which results because price and quantity are jointly, or
simultaneously, determined, we can show that
The resulting bias (and inconsistency) is called the simultaneous
equations bias.
Slide 10-24Principles of Econometrics, 3rd Edition
(10.6)1 2i i iQ P e
cov( , ) 0.i iP e
X Y
10.2.4 Lagged Dependent Variable Models with Serial Correlation
In this case the least squares estimator applied to the lagged
dependent variable model will be biased and inconsistent.
Slide 10-25Principles of Econometrics, 3rd Edition
1 2 1 3t t t ty y x e
1AR(1) process: t t te e v
1If 0 there will be correlation between and .t ty e
10.3 Estimators Based on the Method of Moments
When all the usual assumptions of the linear model hold, the method
of moments leads us to the least squares estimator. If x is random and
correlated with the error term, the method of moments leads us to an
alternative, called instrumental variables estimation, or two-stage
least squares estimation, that will work in large samples.
Slide 10-26Principles of Econometrics, 3rd Edition
10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model
Suppose that there is another variable, z, such that z does not have a direct effect on y, and thus it does not belong on the right-
hand side of the model as an explanatory variable ONCE X IS IN THE
MODEL!
zi should not be simultaneously affected by y either, of course
zi is not correlated with the regression error term ei. Variables with this
property are said to be exogenous z is strongly [or at least not weakly] correlated with x, the endogenous
explanatory variable A variable z with these properties is called an instrumental variable. Slide 10-27Principles of Econometrics, 3rd Edition
10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model
An instrument is a variable z correlated with x but not with the error e In addition, the instrument does not directly affect y and thus does not belong
in the actual model as a separate regressor (of course it should affect it through
the instrumented regressor x) It is common to have more than one instrument for x (just not good ones!) These instruments, z1; z2; : : : ; zs, must be correlated with x, but not with e Consistent estimation is obtained through the instrumental variables or two-
stage least squares (2SLS) estimator, rather than the usual OLS estimator
Slide 10-28Principles of Econometrics, 3rd Edition
Instrumental Variables
Using Z, an “instrumental variable” for X is one solution to the problem of omitted variables bias
Z, to be a valid instrument for X must be:– Relevant = Correlated with X– Exogenous = Not correlated with Y
except through its correlation with X
Z
X Y
W
e
10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model
Slide 10-30Principles of Econometrics, 3rd Edition
(10.16)
(10.17)
1 20 0i i i i iE z e E z y x
1 2
1 2
1 ˆ ˆ 0
1 ˆ ˆ 0
i i
i i i
y xN
z y xN
Based on the method of moments…
10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model
Slide 10-31Principles of Econometrics, 3rd Edition
(10.18)
2
1 2
ˆ
ˆ ˆ
i ii i i i
i i i i i i
z z y yN z y z yN z x z x z z x x
y x
Solving the previous system, we obtain a new estimator that cleanses the endogeneity of X and exploits only the componentof the variation of X that is not correlated with e instead:
We do that by ensuring that we only use the predicted value of X from its regression on Z in the main regression
10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model
These new estimators have the following properties: They are consistent, if In large samples the instrumental variable estimators have
approximate normal distributions. In the simple regression model
Slide 10-32Principles of Econometrics, 3rd Edition
(10.19)
0.i iE z e
2
2 2 22ˆ ~ ,
zx i
Nr x x
10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model
The error variance is estimated using the estimator
Slide 10-33Principles of Econometrics, 3rd Edition
2
1 22ˆ ˆ
ˆ2
i i
IV
y x
N
(10.19)
2
2 2 22ˆ ~ ,
zx i
Nr x x
For:
The stronger the correlation between the instrument and X the better!
10.3.3a The importance of using strong instruments
Using the instrumental variables is less efficient than OLS (because
you must throw away some information on the variation of X) so it
leads to wider confidence intervals and less precise inference.
The bottom line is that when instruments are weak, instrumental
variables estimation is not reliable: you throw away too much
information when trying to avoid the endogeneity biasSlide 10-34Principles of Econometrics, 3rd Edition
2
22 2 22
varˆvarzxzx i
brr x x
Instrumental Variables using Venn diagrams
Not all of the available variation in X is used Only that component of X that is “explained”
by Z is used to explain Y
X Y
ZX = Endogenous variableY = Response variableZ = Instrumental variable
X Y
Z
Realistic scenario: Very little of X is explained by Z, and/or what is explained does not overlap much with Y
X YZ
Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for
Weak versus Strong IVs
Instrumental Variables Models
The IV estimator is BIASED E(bIV) ≠ β (finite-sample bias) but consistent: E(b) → β as N → ∞
So IV studies must often have very large samples But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β anyway…
Asymptotic behavior of IV:plim(bIV) = β + Cov(Z,e) / Cov(Z,X)
If Z is truly exogenous, then Cov(Z,e) = 0
Some IV Terminology
Three different models to be familiar with
First stage (“Reduced form”): EDUC = α0 + α1Z + ω
Structural model: WAGES = β0 + β1EDUC + ε
WAGES = δ0 + δ1Z + ξ
An interesting equality:
δ1 = α1 × β1
so…
β1 = δ1 / α1
Z X Yα1 β1
Z Yδ1
ω ε
ξ
10.3.4a An Illustration Using Simulated Data
Slide 10-39Principles of Econometrics, 3rd Edition
(10.26)
(10.27)
1 2ˆ .1947 .5700 .2068(se) (.079) (.089) (.077)
x z z
1 2_ ,ˆ 1.1376 1.0399
(se) (.116) (.194)IV z zy x
10.3.3b An Illustration Using Simulated Data
Slide 10-40Principles of Econometrics, 3rd Edition
ˆ .9789 1.7034(se) (.088) (.090)
OLSy x 1_ˆ 1.1011 1.1924
(se) (.109) (.195)IV zy x
2_ˆ 1.3451 .1724
(se) (.256) (.797)IV zy x
3_ˆ .9640 1.7657
(se) (.095) (.172)IV zy x
10.3.3c An Illustration Using a Wage Equation
Slide 10-41Principles of Econometrics, 3rd Edition
21 2 3 4ln WAGE EDUC EXPER EXPER e
2ln = .5220 .1075 .0416 .0008
(se) (.1986) (.0141) (.0132) (.0004)
WAGE EDUC EXPER EXPER
10.3.3c An Illustration Using a Wage Equation
Slide 10-42Principles of Econometrics, 3rd Edition
2 9.7751 .0489 .0013 .2677 (se) (.4249) (.0417) (.0012) (.0311) EDUC EXPER EXPER MOTHEREDUC
2ln .1982 .0493 .0449 .0009
(se) (.4729) (.0374) (.0136) (.0004)
WAGE EDUC EXPER EXPER
Check it out: as expected much lower than from OLS!!!
We hope for a high t ratio here
Principles of Econometrics, 3rd Edition 43
Use mroz.dta
summarizedrop if lfp==0gen lwage = log(wage)gen exper2 = exper^2
* Basic OLS estimationreg lwage educ exper exper2estimates store ls
* IV estimationreg educ exper exper2 mothereducpredict educ_hatreg lwage educ_hat exper exper2
But check that the latter will give you wrong s.e. so not reccommended to run 2SLS manually
Include in z all G exogenous variables and the instruments available
Exogenous variablesInstrument themselves!
Principles of Econometrics, 3rd Edition 44
In econometrics, two-stage least squares (TSLS or 2SLS) and instrumental variables (IV) estimation are often used interchangeably
The `two-stage' terminology comes from the time when the easiest way to estimate the model was to actually use two separate least squares regressions
With better software, the computation is done in a single step to ensure the other model statistics are computed correctly
Principles of Econometrics, 3rd Edition 45
In econometrics, two-stage least squares (TSLS or 2SLS) and instrumental variables (IV) estimation are often used interchangeablyThe `two-stage' terminology comes from the time when the easiest way to estimate the model was to actually use two separate least squares regressions
_cons -.5220406 .1986321 -2.63 0.009 -.9124667 -.1316144 exper2 -.0008112 .0003932 -2.06 0.040 -.0015841 -.0000382 exper .0415665 .0131752 3.15 0.002 .0156697 .0674633 educ .1074896 .0141465 7.60 0.000 .0796837 .1352956 lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]
Total 223.327442 427 .523015086 Root MSE = .66642 Adj R-squared = 0.1509 Residual 188.305145 424 .444115908 R-squared = 0.1568 Model 35.0222967 3 11.6740989 Prob > F = 0.0000 F( 3, 424) = 26.29 Source SS df MS Number of obs = 428
. reg lwage educ exper exper2
Principles of Econometrics, 3rd Edition 46
_cons 9.775103 .4238886 23.06 0.000 8.941918 10.60829 mothereduc .2676908 .0311298 8.60 0.000 .2065029 .3288787 exper2 -.0012811 .0012449 -1.03 0.304 -.003728 .0011659 exper .0488615 .0416693 1.17 0.242 -.0330425 .1307655 educ Coef. Std. Err. t P>|t| [95% Conf. Interval]
Total 2230.19626 427 5.22294206 Root MSE = 2.1111 Adj R-squared = 0.1467 Residual 1889.65843 424 4.45674158 R-squared = 0.1527 Model 340.537834 3 113.512611 Prob > F = 0.0000 F( 3, 424) = 25.47 Source SS df MS Number of obs = 428
. reg educ exper exper2 mothereduc
First stage or “reduced form” equation:
Good news!!
Principles of Econometrics, 3rd Edition 47
Second stage equation:
_cons .1981861 .4933427 0.40 0.688 -.7715157 1.167888 exper2 -.0009221 .000424 -2.17 0.030 -.0017554 -.0000887 exper .0448558 .0141644 3.17 0.002 .0170147 .072697 educ_hat .0492629 .0390562 1.26 0.208 -.0275049 .1260308 lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]
Total 223.327442 427 .523015086 Root MSE = .70902 Adj R-squared = 0.0388 Residual 213.146238 424 .502703391 R-squared = 0.0456 Model 10.181204 3 3.39373467 Prob > F = 0.0002 F( 3, 424) = 6.75 Source SS df MS Number of obs = 428
. reg lwage educ_hat exper exper2
Wrong standard errors!!!
Principles of Econometrics, 3rd Edition 48
Second stage equation:
Correct standard errors!!!
Instruments: exper exper2 mothereducInstrumented: educ _cons .1981861 .4706623 0.42 0.674 -.7242952 1.120667 exper2 -.0009221 .0004045 -2.28 0.023 -.0017148 -.0001293 exper .0448558 .0135132 3.32 0.001 .0183704 .0713413 educ .049263 .0372607 1.32 0.186 -.0237666 .1222925 lwage Coef. Std. Err. z P>|z| [95% Conf. Interval]
Root MSE = .67642 R-squared = 0.1231 Prob > chi2 = 0.0001 Wald chi2(3) = 22.25Instrumental variables (2SLS) regression Number of obs = 428
. ivregress 2sls lwage (educ=mothereduc) exper exper2
Principles of Econometrics, 3rd Edition 49
Second stage equation:
“Correcter” standard errors Wit a small sample and robust To heteroskedasticity!!!
Instruments: exper exper2 mothereducInstrumented: educ _cons .1981861 .4891462 0.41 0.686 -.7632673 1.159639 exper2 -.0009221 .0004319 -2.14 0.033 -.001771 -.0000732 exper .0448558 .0156038 2.87 0.004 .0141853 .0755264 educ .049263 .0380396 1.30 0.196 -.0255067 .1240326 lwage Coef. Std. Err. t P>|t| [95% Conf. Interval] Robust
Root MSE = .6796 Adj R-squared = 0.1169 R-squared = 0.1231 Prob > F = 0.0010 F( 3, 424) = 5.50Instrumental variables (2SLS) regression Number of obs = 428
. ivregress 2sls lwage (educ=mothereduc) exper exper2, vce(robust) small
10.3.4 Instrumental Variables Estimation With Surplus Instruments
A 2-step process. Regress x on a constant term, z and all other exogenous variables G,
and obtain the predicted values Use as an instrumental variable for x.
Slide 10-50Principles of Econometrics, 3rd Edition
ˆ.x
x
10.3.4 Instrumental Variables Estimation With Surplus Instruments
Two-stage least squares (2SLS) estimator: Stage 1 is the regression of x on a constant term, z and all other
exogenous variables G, to obtain the predicted values . This first
stage is called the reduced form model estimation. Stage 2 is ordinary least squares estimation of the simple linear
regression
Slide 10-51Principles of Econometrics, 3rd Edition
(10.23)
x
1 2 ˆi i iy x error
10.3.4 Instrumental Variables Estimation With Surplus Instruments
Slide 10-52Principles of Econometrics, 3rd Edition
(10.24)
(10.25)
2
2 2ˆvar
ˆix x
2
1 22ˆ ˆ
ˆ2
i i
IV
y x
N
2
2 2
ˆˆvarˆ
IV
ix x
10.3.4b An Illustration Using a Wage Equation
Slide 10-53Principles of Econometrics, 3rd Edition
If we regress education on all the exogenous variables and the TWO instruments:
These look promising In fact: F test saysNull hypothesis: the regression parameters are zero for the variables mothereduc, fathereduc Test statistic: F(2, 423) = 55.4003, p-value 4.26891e-022
Rule of thumb for strong instruments: F>10
10.3.4b An Illustration Using a Wage Equation
ivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, small estimates store iv
Slide 10-54Principles of Econometrics, 3rd Edition
2ln = .0481 .0614 .0442 .0009
(se) (.4003) (.0314) (.0134) (.0004)
WAGE EDUC EXPER EXPER
With the additional instrument, we achieve a significant result in this case
10.3.5 Instrumental Variables Estimation in a General Model
Slide 10-55Principles of Econometrics, 3rd Edition
Is a bit more complex, but the idea is that you need at least as many Instruments as you have endogenous variables
You cannot use the F test we just saw to test for the weakness of the instruments
10.3.5a Hypothesis Testing with Instrumental Variables Estimates
When testing the null hypothesis use of the test statistic
is valid in large samples. It is common, but not
universal, practice to use critical values, and p-values, based on the
Student-t distribution rather than the more strictly appropriate N(0,1)
distribution. The reason is that tests based on the t-distribution tend to
work better in samples of data that are not large.
Slide 10-56Principles of Econometrics, 3rd Edition
0 : kH c
ˆ ˆsek kt c
10.3.5a Hypothesis Testing with Instrumental Variables Estimates
When testing a joint hypothesis, such as , the test
may be based on the chi-square distribution with the number of
degrees of freedom equal to the number of hypotheses (J) being
tested. The test itself may be called a “Wald” test, or a likelihood
ratio (LR) test, or a Lagrange multiplier (LM) test. These testing
procedures are all asymptotically equivalent
Another complication: you might need to use tests that are robust to
heteroskedasticity and/or autocorrelation…
Slide 10-57Principles of Econometrics, 3rd Edition
0 2 2 3 3: ,H c c
10.3.5b Goodness of Fit with Instrumental Variables Estimates
Unfortunately R2 can be negative when based on IV estimates.
Therefore the use of measures like R2 outside the context of the least
squares estimation should be avoided (even if GRETL and STATA
produce one!)Slide 10-58Principles of Econometrics, 3rd Edition
1 2
1 2
22 2
ˆ ˆˆ
ˆ1 i i
y x e
e y x
R e y y
10.4 Specification Tests in IV
Can we test for whether x is correlated with the error term? This might
give us a guide of when to use least squares and when to use IV
estimators. TEST OF EXOGENEITY: (Durbin-Wu) HAUSMAN TEST
Can we test whether our instrument is sufficiently strong to avoid the
problems associated with “weak” instruments? TESTS FOR WEAK
INSTRUMENTS
Can we test if our instrument is valid, and uncorrelated with the regression
error, as required? SOMETIMES ONLY WITH A SARGAN TEST
Slide 10-59Principles of Econometrics, 3rd Edition
10.4.1 The Hausman Test for Endogeneity
If the null hypothesis is true, both OLS and the IV estimator are
consistent. If the null hypothesis holds, use the more efficient
estimator, OLS.
If the null hypothesis is false, OLS is not consistent, and the IV
estimator is consistent, so use the IV estimator.
Slide 10-60Principles of Econometrics, 3rd Edition
0 1: cov , 0 : cov , 0i i i iH x e H x e
10.4.1 The Hausman Test for Endogeneity
Let z1 and z2 be instrumental variables for x.
1. Estimate the model by least squares, and obtain
the residuals . If there are more than one
explanatory variables that are being tested for endogeneity, repeat this
estimation for each one, using all available instrumental variables in each
regression.
Slide 10-61Principles of Econometrics, 3rd Edition
1 2i i iy x e
1 1 1 2 2i i i ix z z v
1 1 1 2 2ˆ ˆˆi i i iv x z z
10.4.1 The Hausman Test for Endogeneity
2. Include the residuals computed in step 1 as an explanatory variable
in the original regression, Estimate this
"artificial regression" by least squares, and employ the usual t-test
for the hypothesis of significance
Slide 10-62Principles of Econometrics, 3rd Edition
1 2 ˆ .i i i iy x v e
0
1
: 0 no correlation between and : 0 correlation between and
i i
i i
H x eH x e
10.4.1 The Hausman Test for Endogeneity
3. If more than one variable is being tested for endogeneity, the test will be
an F-test of joint significance of the coefficients on the included
residuals.
Note: This is a test for the exogeneity of the regressors xi and not for the exogeneity of the instruments zi. If the instruments are not valid, the Hausman test is not valid either.
Slide 10-63Principles of Econometrics, 3rd Edition
10.4.1 The Hausman Test for Endogeneity
* Hausman test regression based
reg educ exper exper2 mothereduc fathereduc
predict vhat, residuals
reg lwage educ exper exper2 vhat
reg lwage educ exper exper2 vhat, vce(robust)
There are several different ways of computing this test, so don't worry if your result from other software packages differs from the one you compute manually using the above script
Slide 10-64Principles of Econometrics, 3rd Edition
10.4.1 The Hausman Test for Endogeneity
Slide 10-65Principles of Econometrics, 3rd Edition
so
_cons .0481003 .4221019 0.11 0.909 -.7815781 .8777787 vhat .0581666 .0364135 1.60 0.111 -.0134073 .1297405 exper2 -.000899 .0004152 -2.16 0.031 -.0017152 -.0000828 exper .0441704 .0151219 2.92 0.004 .0144469 .0738939 educ .0613966 .0326667 1.88 0.061 -.0028127 .125606 lwage Coef. Std. Err. t P>|t| [95% Conf. Interval] Robust
Root MSE = .66502 R-squared = 0.1624 Prob > F = 0.0000 F( 4, 423) = 21.52Linear regression Number of obs = 428
. reg lwage educ exper exper2 vhat, vce(robust)
.
So education is only barely exogenous at about 10%
10.4.1 The Hausman Test for Endogeneity
Slide 10-66Principles of Econometrics, 3rd Edition
so
So education is only barely exogenous at about 10%
hausman iv ls, constant sigmamore
(V_b-V_B is not positive definite) Prob>chi2 = 0.0954 = 2.78 chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
Test: Ho: difference in coefficients not systematic
B = inconsistent under Ha, efficient under Ho; obtained from regress b = consistent under Ho and Ha; obtained from ivregress _cons .0481003 -.5220406 .5701409 .3418964 exper2 -.000899 -.0008112 -.0000878 .0000526 exper .0441704 .0415665 .0026039 .0015615 educ .0613966 .1074896 -.046093 .0276406 iv ls Difference S.E. (b) (B) (b-B) sqrt(diag(V_b-V_B)) Coefficients
Forces Stata to use the OLS Residuals when calculating the Error variance in both estimators
10.4.2 Testing for Weak Instruments
If we have L > 1 instruments available then the reduced form equation is
Slide 10-67Principles of Econometrics, 3rd Edition
1 2 2 1 1G G G Gy x x x e
1 1 2 2 1 1G G Gx x x z v
1 1 2 2 1 1G G G L Lx x x z z v
10.4.2 Testing for Weak Instruments
Then test with an F test whether the instruments help to determine thevalue of the endogenous variable. Rule of thumb F > 10 for one
endogenous regressor
Slide 10-68Principles of Econometrics, 3rd Edition
1 2 2 1 1G G G Gy x x x e
1 1 2 2 1 1G G Gx x x z v
1 1 2 2 1 1G G G L Lx x x z z v
10.4.2 Testing for Weak Instruments
As we saw before:
* Testing for weak instruments reg educ exper exper2 mothereduc reg educ exper exper2 fathereduc reg educ exper exper2 mothereduc fathereduc test mothereduc fathereduc
Slide 10-69Principles of Econometrics, 3rd Edition
10.4.2 Testing for Weak Instruments
* Testing for weak instruments using estativregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, smallestat firststage
Slide 10-70Principles of Econometrics, 3rd Edition
so
LIML Size of nominal 5% Wald test 8.68 5.33 4.42 3.92 2SLS Size of nominal 5% Wald test 19.93 11.59 8.75 7.25 10% 15% 20% 25% 2SLS relative bias (not available) 5% 10% 20% 30% Ho: Instruments are weak # of excluded instruments: 2 Critical Values # of endogenous regressors: 1
Minimum eigenvalue statistic = 55.4003
educ 0.2115 0.2040 0.2076 55.4003 0.0000 Variable R-sq. R-sq. R-sq. F(2,423) Prob > F Adjusted Partial First-stage regression summary statistics
. estat firststage
.
We are quite alright here
10.4.2 Testing for Weak Instruments
* Robust testsreg educ exper exper2 mothereduc fathereduc, vce(robust)test mothereduc fathereduc
Slide 10-71Principles of Econometrics, 3rd Edition
so
Prob > F = 0.0000 F( 2, 423) = 49.53
( 2) fathereduc = 0 ( 1) mothereduc = 0
. test mothereduc fathereduc
.
_cons 9.10264 .4241444 21.46 0.000 8.268947 9.936333 fathereduc .1895484 .0324419 5.84 0.000 .125781 .2533159 mothereduc .157597 .0354502 4.45 0.000 .0879165 .2272776 exper2 -.0010091 .0013233 -0.76 0.446 -.0036101 .0015919 exper .0452254 .0419107 1.08 0.281 -.0371538 .1276046 educ Coef. Std. Err. t P>|t| [95% Conf. Interval] Robust
Root MSE = 2.039 R-squared = 0.2115 Prob > F = 0.0000 F( 4, 423) = 25.76Linear regression Number of obs = 428
. reg educ exper exper2 mothereduc fathereduc, vce(robust)
.
Still alright
10.4.2 Testing for Weak Instruments
* Robust tests using ivregressivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, small vce(robust)estat firststage
Slide 10-72Principles of Econometrics, 3rd Edition
so educ 0.2115 0.2040 0.2076 49.5266 0.0000 Variable R-sq. R-sq. R-sq. F(2,423) Prob > F Adjusted Partial Robust First-stage regression summary statistics
. estat firststage
.
Still alright
10.4.3 Testing Instrument Validity
We want to check that the instrument is itself exogenous…but you can only do it for
surplus instruments if you have them (overidentified equation)…
1. Compute the IV estimates using all available instruments, including the G
variables x1=1, x2, …, xG that are presumed to be exogenous, and the L
instruments z1, …, zL.
2. Obtain the residuals
Slide 10-73Principles of Econometrics, 3rd Edition
ˆk
1 2 2ˆ ˆ ˆˆ .K Ke y x x
10.4.3 Testing Instrument Validity
3. Regress on all the available instruments described in step 1.
4. Compute NR2 from this regression, where N is the sample size and
R2 is the usual goodness-of-fit measure.
5. If all of the surplus moment conditions are valid, then
If the value of the test statistic exceeds the 100(1−α)-percentile from
the distribution, then we conclude that at least one of the
surplus moment conditions restrictions is not valid.
Slide 10-74Principles of Econometrics, 3rd Edition
e
2 2( )~ .L BNR
2( )L B
10.4.3 Testing Instrument Validity
Slide 10-75Principles of Econometrics, 3rd Edition
# Sargan test of overidentificationivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, smallpredict ehat, residualsquietly reg ehat mothereduc fathereduc exper exper2scalar nr2 = e(N)*e(r2)scalar chic = invchi2tail(1,.05)scalar pvalue = chi2tail(1,nr2)di "NR^2 test of overidentifying restriction = " nr2di "Chi-square critical value 1 df, .05 level = " chicdi "p value for overidentifying test 1 df, .05 level = " pvalue
You should learn about the Cragg-Uhler test if you end up having more than one endogenous variable
10.4.3 Testing Instrument Validity
Slide 10-76Principles of Econometrics, 3rd Edition
# Sargan test of overidentification
You should learn about the Cragg-Uhler test if you end up having more than one endogenous variable
so
p value for overidentifying test 1 df, .05 level = .53863714. di "p value for overidentifying test 1 df, .05 level = " pvalue.
Chi-square critical value 1 df, .05 level = 3.8414588. di "Chi-square critical value 1 df, .05 level = " chic.
NR^2 test of overidentifying restriction = .37807151. di "NR^2 test of overidentifying restriction = " nr2.
So we are quite safe here…
10.4.3 Testing Instrument Validity
Slide 10-77Principles of Econometrics, 3rd Edition
# Sargan test of overidentification
You should learn about the Cragg-Uhler test if you end up having more than one endogenous variable
so
So we are quite safe here…
Basmann chi2(1) = .373985 (p = 0.5408) Sargan (score) chi2(1) = .378071 (p = 0.5386)
Tests of overidentifying restrictions:
. estat overid
.
. quietly ivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, small
Keywords
Slide 10-78Principles of Econometrics, 3rd Edition
asymptotic properties conditional expectation endogenous variables errors-in-variables exogenous variables finite sample properties Hausman test instrumental variable instrumental variable estimator just identified equations large sample properties over identified equations population moments random sampling reduced form equation
sample moments simultaneous equations bias test of surplus moment conditions two-stage least squares estimation weak instruments