30C00200
Econometrics
8) Instrumental variables
Timo Kuosmanen, Professor, Ph.D.
http://nomepre.net/index.php/timokuosmanen
Today’s topics
• Theory of IV regression
• Overidentification
• Two-stage least squares (2SLS)
• Testing for endogeneity:
– Weak instruments
– Hausman test
Examples of instrumental variables
• In the case of measurement errors, the instrument could be another
measurement (or proxy) for the unobserved x
Example: twins study of returns to education
– x is the self-reported ”years of schooling” by respondent
– z is the ”years of schooling” reported by respondent’s twin brother / sister
• In time series and panel data models, past values of x observed
in previous periods are frequently used as instruments.
IV estimator
Assume the regression model is:
y = β1 + β2x + ε
However, the exogeneity assumption Cov(ε,x) = 0 is violated.
– Examples: measurement error in x, omitted variable in ε
Assume we have an instrument z that is
• Highly correlated with endogenous x : |Cov(x,z)| >> 0
• Uncorrelated with disturbance ε : Cov(ε,z) = 0
IV estimator
• Recall the OLS estimator for slope β2:

$$ b_2^{OLS} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})} = \frac{\mathrm{Est.Cov}(x,y)}{\mathrm{Est.Var}(x)} $$

• The instrumental variable (IV) estimator:

$$ b_2^{IV} = \frac{\sum_{i=1}^{n}(z_i-\bar{z})(y_i-\bar{y})}{\sum_{i=1}^{n}(z_i-\bar{z})(x_i-\bar{x})} = \frac{\mathrm{Est.Cov}(z,y)}{\mathrm{Est.Cov}(z,x)} $$
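The two estimators can be compared in a small simulation of the measurement-error case. This is only a sketch in Python (standard library), not part of the course material: the true parameters, the sample size and the noise variances are assumptions chosen for illustration. The regressor x and the instrument z are two independent noisy measurements of the same unobserved variable, as in the twins example.

```python
import random

def cov(a, b):
    """Sample covariance with the (n - 1) divisor."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

random.seed(1)
n = 20000
beta1, beta2 = 1.0, 2.0   # hypothetical true parameters

x_true = [random.gauss(0, 1) for _ in range(n)]     # unobserved regressor
x = [xt + random.gauss(0, 1) for xt in x_true]      # observed with measurement error
z = [xt + random.gauss(0, 1) for xt in x_true]      # second, independent measurement
y = [beta1 + beta2 * xt + random.gauss(0, 1) for xt in x_true]

b_ols = cov(x, y) / cov(x, x)   # Est.Cov(x,y) / Est.Var(x)
b_iv = cov(z, y) / cov(z, x)    # Est.Cov(z,y) / Est.Cov(z,x)

print(f"OLS slope: {b_ols:.3f}")   # attenuated towards zero
print(f"IV slope:  {b_iv:.3f}")    # close to the assumed beta2
```

With these assumed variances the OLS slope converges to about half of β2, while the IV slope stays close to β2.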
IV estimator
• The instrumental variable (IV) estimator can be rewritten as:

$$ b_2^{IV} = \frac{\mathrm{Est.Cov}(z,y)}{\mathrm{Est.Cov}(z,x)} = \frac{\mathrm{Est.Cov}(z,\beta_1+\beta_2 x+\varepsilon)}{\mathrm{Est.Cov}(z,x)} = \frac{\beta_2\,\mathrm{Est.Cov}(z,x) + \mathrm{Est.Cov}(z,\varepsilon)}{\mathrm{Est.Cov}(z,x)} = \beta_2 + \frac{\mathrm{Est.Cov}(z,\varepsilon)}{\mathrm{Est.Cov}(z,x)} $$

• Since we assumed Cov(z,ε) = 0, the expected value is

$$ E(b_2^{IV}) = \beta_2 $$

• The IV estimator is unbiased and consistent
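The decomposition of the IV estimator into β2 plus a sampling-error term, Est.Cov(z,ε)/Est.Cov(z,x), holds exactly in any sample. It can be checked numerically when the disturbance is known, which is only possible in a simulation. The sketch below is in Python; the data-generating process and all parameter values are assumptions made for illustration.

```python
import random

def cov(a, b):
    """Sample covariance with the (n - 1) divisor."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

random.seed(2)
n = 5000
beta1, beta2 = 1.0, 2.0   # hypothetical true parameters

z = [random.gauss(0, 1) for _ in range(n)]      # instrument, independent of eps
eps = [random.gauss(0, 1) for _ in range(n)]    # disturbance
# x depends on both z and eps, so Cov(x, eps) != 0 (endogeneity)
x = [zi + 0.8 * ei + random.gauss(0, 1) for zi, ei in zip(z, eps)]
y = [beta1 + beta2 * xi + ei for xi, ei in zip(x, eps)]

b_iv = cov(z, y) / cov(z, x)
sampling_error = cov(z, eps) / cov(z, x)   # Est.Cov(z,eps) / Est.Cov(z,x)

print(f"b_IV                   = {b_iv:.6f}")
print(f"beta2 + sampling error = {beta2 + sampling_error:.6f}")
```

The two printed numbers agree up to floating-point rounding, and the sampling-error term shrinks as n grows because Cov(z,ε) = 0 in the population.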
Variance of IV estimator
Variance of the IV estimator:

$$ Var(b_2^{IV}) = \frac{Var(\varepsilon)}{(n-1)\,Var(x)\,r_{zx}^{2}} $$

Precision of the IV estimator improves if
• Variance of disturbance ε decreases
• Sample size n increases
• Variance of regressor x increases
• Correlation (r_zx) of regressor x and instrument z increases
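To see how strongly the instrument's correlation with x drives precision, the variance formula Var(ε) / ((n-1) Var(x) r_zx²) can be evaluated for a few values of r_zx. This is a minimal Python sketch; the function name `iv_variance` and the input values are illustrative assumptions.

```python
def iv_variance(var_eps, n, var_x, r_zx):
    """Var(b_IV) = Var(eps) / ((n - 1) * Var(x) * r_zx^2)."""
    if not 0 < abs(r_zx) <= 1:
        raise ValueError("r_zx must be a non-zero correlation in [-1, 1]")
    return var_eps / ((n - 1) * var_x * r_zx ** 2)

# Hypothetical inputs: Var(eps) = 1, n = 101, Var(x) = 1
for r in (1.0, 0.8, 0.5, 0.1):
    print(f"r_zx = {r:.1f}  ->  Var(b_IV) = {iv_variance(1.0, 101, 1.0, r):.4f}")
```

Setting r_zx = 1 gives the smallest value of the formula for given Var(ε), n and Var(x); as r_zx falls towards zero the variance grows without bound, which is the weak-instrument problem in its simplest form.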
OLS and IV as GMM estimators
The OLS residuals have the property

$$ \sum_{i=1}^{n} x_i e_i^{OLS} = 0 $$

Thus, Est.Cov(x,e) = 0. This is the sample counterpart to the assumed population orthogonality condition Cov(x,ε) = 0

Note: we can derive the OLS estimator directly from the sample orthogonality condition.
Assume centered data where sample averages of x and y are equal to zero, and assume the constant term is zero. Then

$$ \sum_{i=1}^{n} x_i e_i^{OLS} = \sum_{i=1}^{n} x_i (y_i - b^{OLS} x_i) = \sum_{i=1}^{n} x_i y_i - b^{OLS} \sum_{i=1}^{n} x_i^{2} = 0 $$

so that

$$ b^{OLS} = \mathrm{Est.Cov}(x,y) \,/\, \mathrm{Est.Var}(x) $$
OLS and IV as GMM estimators
Analogously, the IV estimator is based on the population orthogonality condition Cov(z,ε) = 0.
We can derive the IV estimator using the sample orthogonality condition

$$ \sum_{i=1}^{n} z_i e_i^{IV} = 0 $$

For centered data and a zero constant term,

$$ \sum_{i=1}^{n} z_i (y_i - b^{IV} x_i) = \sum_{i=1}^{n} z_i y_i - b^{IV} \sum_{i=1}^{n} z_i x_i = 0 $$

so that

$$ b^{IV} = \mathrm{Est.Cov}(z,y) \,/\, \mathrm{Est.Cov}(z,x) $$

Both OLS and IV can be seen as special cases of the generalized method of moments (GMM)
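The two sample orthogonality conditions can be verified numerically: the OLS residuals are orthogonal to x, and the IV residuals are orthogonal to z. The following is a Python sketch on simulated, centered data; the data-generating process is an assumption made only for illustration.

```python
import random

def center(v):
    """Subtract the sample mean so that the data average to zero."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(3)
n = 1000
z = [random.gauss(0, 1) for _ in range(n)]
eps = [random.gauss(0, 1) for _ in range(n)]
x = [zi + 0.5 * ei + random.gauss(0, 1) for zi, ei in zip(z, eps)]  # endogenous
y = [2.0 * xi + ei for xi, ei in zip(x, eps)]

x, y, z = center(x), center(y), center(z)

# Solve each moment condition for the slope (no constant, centered data)
b_ols = dot(x, y) / dot(x, x)   # from sum x_i (y_i - b x_i) = 0
b_iv = dot(z, y) / dot(z, x)    # from sum z_i (y_i - b x_i) = 0

e_ols = [yi - b_ols * xi for xi, yi in zip(x, y)]
e_iv = [yi - b_iv * xi for xi, yi in zip(x, y)]

print(f"sum x_i e_i(OLS) = {dot(x, e_ols):.2e}")
print(f"sum z_i e_i(IV)  = {dot(z, e_iv):.2e}")
```

Both sums are zero up to rounding error, by construction. Note that the IV residuals are generally not orthogonal to x; only the moment condition that defines each estimator is satisfied exactly.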
IV regression in Stata
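Two-stage least squares can be implemented in Stata using the command ”ivreg” instead of the usual ”reg”
Syntax
. ivreg y x3 x4 (x2 = z1 z2)
Here x2 is the endogenous regressor and z1, z2 are its instruments. The exogenous regressors x3 and x4 are listed outside the parentheses and are used as instruments automatically.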
In matrix form:
OLS: $b = (X'X)^{-1} X'y$
IV: $b = (Z'X)^{-1} Z'y$
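For a model with a constant and one regressor, the matrix formula b = (Z'X)^{-1} Z'y reduces to a 2x2 linear system that can be solved by hand. The Python sketch below does this with Cramer's rule. The function name `iv_matrix` and the five data points are made up for illustration. Passing x itself as the instrument gives the OLS formula b = (X'X)^{-1} X'y as a special case.

```python
def iv_matrix(x, y, z):
    """Solve (Z'X) b = Z'y for b = (intercept, slope), with X = [1, x] and Z = [1, z]."""
    n = len(y)
    # Elements of the 2x2 matrix Z'X
    a11, a12 = n, sum(x)
    a21, a22 = sum(z), sum(zi * xi for zi, xi in zip(z, x))
    # Elements of the 2x1 vector Z'y
    c1, c2 = sum(y), sum(zi * yi for zi, yi in zip(z, y))
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("Z'X is singular: z has no sample covariance with x")
    # Cramer's rule
    intercept = (c1 * a22 - a12 * c2) / det
    slope = (a11 * c2 - a21 * c1) / det
    return intercept, slope

# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b_iv = iv_matrix(x, y, z)    # instrument z
b_ols = iv_matrix(x, y, x)   # Z = X gives OLS

print("IV :", b_iv)
print("OLS:", b_ols)
```

The slope returned by `iv_matrix` coincides with Est.Cov(z,y) / Est.Cov(z,x), since the determinant is n times the sum of cross-products of deviations of z and x.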
Over-identification
• Thus far, we assumed that there exists a single instrumental variable z that is highly correlated with x but uncorrelated with ε
• Examples of instrumental variables
– Alternative proxy variables
– Past values x_{t-1}
• If one useful instrument is available, then there are potentially several others
– If the past value x_{t-1} is a good instrument for x_t, then x_{t-2}, x_{t-3}, … are also likely to be useful instruments.
• Choosing just one of the many instruments would be an inefficient use of the information available
• Solution: two-stage least squares (2SLS) method
Two-stage least squares (2SLS)
• Assume we have one endogenous regressor x in the model
y = β1 + β2x + ε
• Assume we have (L-1) instruments z2, z3,…, zL for x
2-stage estimation procedure:
1) Regress x on the instruments by using OLS:
x = κ1 + κ2z2 + κ3z3 + … + κLzL + u
where u is the first-stage disturbance.
Save the fitted values: x* = k1 + k2z2 + k3z3 + … + kLzL
2) Use the fitted values x* in place of x to estimate the original regression equation by OLS:
y = β1 + β2x* + ε
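The two stages can be carried out by hand with any OLS routine. The Python sketch below uses a small normal-equations solver and simulated data with two instruments; the helper `ols`, the parameter values and the data-generating process are all assumptions made for illustration, not the course's implementation.

```python
import random

def ols(regressors, y):
    """OLS coefficients (constant first) from the normal equations X'X b = X'y."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in regressors]
    k = len(cols)
    # Augmented matrix [X'X | X'y]
    a = []
    for i in range(k):
        row = [sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
        row.append(sum(p * q for p, q in zip(cols[i], y)))
        a.append(row)
    # Gauss-Jordan elimination with partial pivoting
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(4)
n = 20000
beta1, beta2 = 1.0, 2.0   # hypothetical true parameters
z2 = [random.gauss(0, 1) for _ in range(n)]
z3 = [random.gauss(0, 1) for _ in range(n)]
eps = [random.gauss(0, 1) for _ in range(n)]
# x is endogenous: it depends on eps as well as on the instruments
x = [0.5 + a + 0.5 * b + 0.8 * e + random.gauss(0, 1)
     for a, b, e in zip(z2, z3, eps)]
y = [beta1 + beta2 * xi + e for xi, e in zip(x, eps)]

# Stage 1: regress x on the instruments and save the fitted values x*
k = ols([z2, z3], x)
x_star = [k[0] + k[1] * a + k[2] * b for a, b in zip(z2, z3)]

# Stage 2: regress y on x*
b_2sls = ols([x_star], y)
b_ols = ols([x], y)   # direct OLS for comparison

print(f"2SLS slope: {b_2sls[1]:.3f}")
print(f"OLS slope:  {b_ols[1]:.3f}")
```

In this setup the direct OLS slope is biased upwards, while the 2SLS slope is close to the assumed β2. The sketch returns coefficients only: standard errors taken from the second-stage OLS regression would be incorrect, because its residuals are computed with x* rather than x.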
Two-stage least squares (2SLS)
Practical notes:
• If we have more than one endogenous ”problem variable” x, then stage 1 can be done separately for each variable
• Different endogenous regressors can be instrumented with different z variables
• All exogenous regressors x are usually included as instruments z
• If OLS is used in the stepwise estimation, the standard errors of the 2-stage regression need to be adjusted
– Stata does this automatically when ”ivreg” is used
Example: production function of electricity
distribution networks
Assume Cobb-Douglas production function
ln y_i = β0 + β1L_i + β2K_i + ε_i
• Output: ln y = ln Energy (GWh)
• Inputs x: L = ln OPEX, K = ln Krepl
• Instrument for K: ln Knuse
– OPEX = operational expenditure (incl. wages)
– Krepl = Capital stock (replacement value)
– Knuse = Capital stock (net use value)
Sample of 160 observations in years 2011 and 2012.
CD function, direct OLS estimation
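. regress lnEnergy lnOPEX lnKrepl

      Source |       SS       df       MS              Number of obs =     160
-------------+------------------------------           F(  2,   157) =  911.83
       Model |  261.157887     2  130.578943           Prob > F      =  0.0000
    Residual |   22.483226   157  .143205261           R-squared     =  0.9207
-------------+------------------------------           Adj R-squared =  0.9197
       Total |  283.641113   159  1.78390637           Root MSE      =  .37842

------------------------------------------------------------------------------
    lnEnergy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnOPEX |   .4460534   .1190976     3.75   0.000      .210813    .6812938
     lnKrepl |    .591481   .1152412     5.13   0.000     .3638579    .8191042
       _cons |  -4.696363   .4388221   -10.70   0.000    -5.563119   -3.829606
------------------------------------------------------------------------------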
Two-stage least squares (2SLS)
Capital stock is hard to measure. Suppose our proxy for capital
stock K contains measurement error. If that is the case, the OLS
estimator of the output elasticity of K is biased towards zero.
Two alternative proxy measures of K: Krepl and Knuse.
Two-stage least squares:
Stage 1: Regress ln Krepl on ln Knuse and ln OPEX. Record the
predicted ln Krepl (ln PrKrepl).
Stage 2: Regress ln Energy on ln OPEX and ln PrKrepl to
estimate the production function of interest.
2SLS regression
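. reg3 (lnKrepl = lnOPEX lnKnuse) (lnEnergy = lnOPEX lnKrepl), exog(lnOPEX) 2sls

Two-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
lnKrepl           160      2    .1486906    0.9862    5624.20   0.0000
lnEnergy          160      2    .3923997    0.9148     858.94   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnKrepl      |
      lnOPEX |   .2911237   .0407567     7.14   0.000     .2109328    .3713145
     lnKnuse |   .6873963   .0377983    18.19   0.000     .6130263    .7617664
       _cons |   1.755602   .1187836    14.78   0.000      1.52189    1.989314
-------------+----------------------------------------------------------------
lnEnergy     |
      lnOPEX |   .0456137   .1489348     0.31   0.760    -.2474226      .33865
     lnKrepl |   .9875145   .1451144     6.81   0.000     .7019949    1.273034
       _cons |  -6.052248   .5352627   -11.31   0.000    -7.105403   -4.999093
------------------------------------------------------------------------------
Endogenous variables:  lnKrepl lnEnergy
Exogenous variables:   lnOPEX lnKnuse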
IV (2SLS) regression
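. ivregress 2sls lnEnergy lnOPEX (lnKrepl = lnKnuse lnOPEX)

Instrumental variables (2SLS) regression          Number of obs   =        160
                                                  Wald chi2(2)    =    1750.71
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.9148
                                                  Root MSE        =      .3887

------------------------------------------------------------------------------
    lnEnergy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnKrepl |   .9875145   .1437475     6.87   0.000     .7057744    1.269254
      lnOPEX |   .0456137   .1475319     0.31   0.757    -.2435436     .334771
       _cons |  -6.052248   .5302208   -11.41   0.000    -7.091462   -5.013034
------------------------------------------------------------------------------
Instrumented:  lnKrepl
Instruments:   lnOPEX lnKnuse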
IV (GMM) regression
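. ivregress gmm lnEnergy lnOPEX (lnKrepl = lnKnuse)

Instrumental variables (GMM) regression           Number of obs   =        160
                                                  Wald chi2(2)    =    1320.63
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.9148
GMM weight matrix: Robust                         Root MSE        =      .3887

------------------------------------------------------------------------------
             |               Robust
    lnEnergy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnKrepl |   .9875145   .1540702     6.41   0.000     .6855425    1.289486
      lnOPEX |   .0456137   .1595625     0.29   0.775    -.2671231    .3583505
       _cons |  -6.052248   .5687178   -10.64   0.000    -7.166915   -4.937582
------------------------------------------------------------------------------
Instrumented:  lnKrepl
Instruments:   lnOPEX lnKnuse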
Testing for weak instruments
• The F-test of joint significance in the first-stage regression serves as a useful diagnostic test of weak instruments
• To avoid the problems with weak instruments (imprecise coefficients), the coefficients of the stage 1 regression should be jointly significant: F-stat > Fcrit
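For the simplest case of one instrument and a constant, the first-stage F-statistic can be computed directly from the squared correlation of x and z: F = r² (n - 2) / (1 - r²). The Python sketch below compares a strong and a weak instrument on simulated data. The function name `first_stage_f` and the coefficient values 1.0 and 0.05 are assumptions chosen for illustration; with several instruments the joint F-test of the first-stage regression would be used instead.

```python
import random

def first_stage_f(x, z):
    """F-statistic for H0: slope = 0 in the OLS regression of x on a constant and z."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    r2 = szx ** 2 / (szz * sxx)            # first-stage R-squared
    return r2 / (1.0 - r2) * (n - 2)       # 1 restriction, n - 2 residual df

random.seed(5)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]
x_strong = [1.0 * zi + random.gauss(0, 1) for zi in z]    # z explains much of x
x_weak = [0.05 * zi + random.gauss(0, 1) for zi in z]     # z explains almost nothing

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)

print(f"F (strong instrument): {f_strong:.1f}")
print(f"F (weak instrument):   {f_weak:.1f}")
```

The computed statistic is then compared with the critical value Fcrit of the F distribution with (1, n - 2) degrees of freedom.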
Hausman test
• also referred to as Durbin-Wu-Hausman test
Rationale: it is not always clear if endogeneity is a problem or not
• If exogeneity assumption Cov(x, ε) = 0 holds, then the OLS estimator is unbiased and efficient
• IV estimator is also unbiased, but less efficient (OLS preferred)
• However, if exogeneity assumption Cov(x, ε) = 0 fails, then OLS estimator is biased and inconsistent
• IV estimator remains unbiased (IV preferred)
Hausman test
H0: Cov(x, ε) = 0; OLS preferred
H1: Cov(x, ε) ≠ 0; IV preferred
Procedure:
• Estimate both OLS and IV regressions
• Compare the estimated coefficients bOLS and bIV and their standard errors
• If H0 is true, then the difference |bIV - bOLS| should be small, reflecting only sampling variation (the IV estimator is less efficient than OLS)
• If H0 is true, the Hausman statistic follows a chi-squared distribution with the degrees of freedom equal to the number of endogenous regressors instrumented in the IV model
Hausman test in Stata
• Stata computes the Hausman test automatically
• Run the IV and OLS regressions
• Save the results by command ’estimates store name’
– Example: ’estimates store CostIV’ and ’estimates store CostOLS’
• Hausman test is conducted by command ’hausman’
– Example: ’hausman CostIV CostOLS, constant’
Hausman test in Stata
. hausman IV OLS

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       IV          OLS         Difference          S.E.
-------------+----------------------------------------------------------------
     lnKrepl |    .9875145      .591481        .3960335        .0859234
      lnOPEX |    .0456137     .4460534       -.4004397        .0870714
------------------------------------------------------------------------------
           b = consistent under Ho and Ha; obtained from ivregress
           B = inconsistent under Ha, efficient under Ho; obtained from regress

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       21.24
                Prob>chi2 =      0.0000
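With a single instrumented regressor, the Hausman statistic can be approximated by hand from one row of the table: the squared ratio of the coefficient difference to its standard error. The Python sketch below does this for lnKrepl using the values reported in the output. It is a scalar simplification under stated assumptions, not Stata's computation, which uses the full matrix form and reports 2 degrees of freedom; the p-value here uses 1 degree of freedom, following the rule that the degrees of freedom equal the number of instrumented regressors.

```python
import math

# Values copied from the hausman output for lnKrepl
b_iv = 0.9875145     # (b)  IV coefficient
b_ols = 0.591481     # (B)  OLS coefficient
se_diff = 0.0859234  # sqrt(diag(V_b - V_B))

# Scalar Hausman statistic: squared standardized difference
h = ((b_iv - b_ols) / se_diff) ** 2

# Upper-tail probability of a chi-squared variable with 1 degree of freedom
p_value = math.erfc(math.sqrt(h / 2.0))

print(f"H = {h:.2f}")
print(f"p = {p_value:.6f}")
```

The scalar statistic comes out at about 21.24, in line with the reported value, and H0 is rejected at any conventional significance level, which favours the IV estimates in this example.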