30C00200 Econometrics

8) Instrumental variables

Timo Kuosmanen, Professor, Ph.D.

http://nomepre.net/index.php/timokuosmanen


Today’s topics

• Theory of IV regression

• Overidentification

• Two-stage least squares (2SLS)

• Testing for endogeneity:

– Weak instruments

– Hausman test

Examples of instrumental variables

• In the case of measurement errors, the instrument could be another measurement (or proxy) for the unobserved x

Example: twins study of returns to education

– x is the “years of schooling” self-reported by the respondent

– z is the respondent’s “years of schooling” as reported by the respondent’s twin brother or sister

• In time series and panel data models, past values of x observed in previous periods are frequently used as instruments.

IV estimator

Assume the regression model is:

y = β1 + β2x + ε

However, the exogeneity assumption Cov(ε,x) = 0 is violated.

– Examples: measurement error in x, omitted variable in ε

Assume we have an instrument z that is

• Highly correlated with the endogenous x: |Cov(x,z)| >> 0

• Uncorrelated with the disturbance ε: Cov(ε,z) = 0

IV estimator

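• Recall the OLS estimator for slope β2:

$$b_2^{OLS} = \frac{\text{Est.Cov}(x,y)}{\text{Est.Var}(x)} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

• The instrumental variable (IV) estimator:

$$b_2^{IV} = \frac{\text{Est.Cov}(z,y)}{\text{Est.Cov}(z,x)} = \frac{\sum_{i=1}^{n}(z_i-\bar{z})(y_i-\bar{y})}{\sum_{i=1}^{n}(z_i-\bar{z})(x_i-\bar{x})}$$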
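The two covariance-ratio formulas can be sketched in a few lines of plain Python. The data below are made up purely for illustration, and the helper names (`sample_cov`, `ols_slope`, `iv_slope`) are hypothetical, not part of the course material.

```python
def mean(values):
    return sum(values) / len(values)


def sample_cov(a, b):
    """Sample covariance Est.Cov(a, b) with the (n - 1) divisor."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """b2_OLS = Est.Cov(x, y) / Est.Var(x)."""
    return sample_cov(x, y) / sample_cov(x, x)


def iv_slope(z, x, y):
    """b2_IV = Est.Cov(z, y) / Est.Cov(z, x)."""
    return sample_cov(z, y) / sample_cov(z, x)


# Made-up illustrative data (assumption, not from the slides)
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.5, 1.9, 3.2, 3.8, 5.1]
y = [2.1, 2.9, 4.2, 4.8, 6.5]

b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
print(f"OLS slope: {b_ols:.4f}, IV slope: {b_iv:.4f}")
```

Using x as its own instrument, `iv_slope(x, x, y)`, reproduces the OLS slope, which shows that OLS is the special case z = x.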

IV estimator

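• The instrumental variable (IV) estimator can be rewritten as:

$$b_2^{IV} = \frac{\text{Est.Cov}(z,y)}{\text{Est.Cov}(z,x)} = \frac{\text{Est.Cov}(z,\ \beta_1+\beta_2 x+\varepsilon)}{\text{Est.Cov}(z,x)} = \frac{\beta_2\,\text{Est.Cov}(z,x) + \text{Est.Cov}(z,\varepsilon)}{\text{Est.Cov}(z,x)} = \beta_2 + \frac{\text{Est.Cov}(z,\varepsilon)}{\text{Est.Cov}(z,x)}$$

• Since we assumed Cov(z,ε) = 0, the term Est.Cov(z,ε) converges to zero as the sample grows, so that

$$\text{plim}\; b_2^{IV} = \beta_2$$

• The IV estimator is therefore consistent. It is not exactly unbiased in finite samples (the expectation of a ratio is not the ratio of expectations), so E(b2IV) ≈ β2 holds only in large samples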

Variance of IV estimator

Variance of the IV estimator:

$$\text{Var}(b_2^{IV}) = \frac{\text{Var}(\varepsilon)}{(n-1)\,\text{Var}(x)\,r_{zx}^2}$$

Precision of the IV estimator improves if

• Variance of disturbance ε decreases

• Sample size n increases

• Variance of regressor x increases

• Correlation (rzx) of regressor x and instrument z increases
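To see how strongly the instrument correlation drives precision, the variance formula can be evaluated directly. This is a minimal sketch: the function name `iv_slope_variance` and the input values are illustrative assumptions.

```python
def iv_slope_variance(var_eps, n, var_x, r_zx):
    """Var(b2_IV) = Var(eps) / ((n - 1) * Var(x) * r_zx**2)."""
    if not 0 < abs(r_zx) <= 1:
        raise ValueError("r_zx must be a nonzero correlation in [-1, 1]")
    return var_eps / ((n - 1) * var_x * r_zx ** 2)


# Hypothetical inputs: Var(eps) = 1, n = 101, Var(x) = 4
perfect = iv_slope_variance(1.0, 101, 4.0, 1.0)   # r_zx = 1 gives the OLS variance
moderate = iv_slope_variance(1.0, 101, 4.0, 0.5)  # halving r_zx quadruples the variance
weak = iv_slope_variance(1.0, 101, 4.0, 0.1)      # a weak instrument inflates it 100-fold

print(perfect, moderate, weak)
```

With r_zx = 1 the expression collapses to the usual OLS slope variance, so 1 / r_zx² can be read as the precision cost of instrumenting.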

OLS and IV as GMM estimators

The OLS residuals have the property

$$\sum_{i=1}^{n} x_i e_i^{OLS} = 0$$

Thus, Est.Cov(x,e) = 0. This is the sample counterpart to the assumed population orthogonality condition Cov(x,ε) = 0

Note: we can derive the OLS estimator directly from the sample orthogonality condition.

Assume centered data where sample averages of x and y are equal to zero, and assume the constant term is zero. Then

$$\sum_{i=1}^{n} x_i e_i^{OLS} = \sum_{i=1}^{n} x_i (y_i - b^{OLS} x_i) = \sum_{i=1}^{n} x_i y_i - b^{OLS}\sum_{i=1}^{n} x_i^2 = 0$$

$$\Rightarrow\quad b^{OLS} = \text{Est.Cov}(x,y)\,/\,\text{Est.Var}(x)$$

OLS and IV as GMM estimators

Analogously, the IV estimator is based on the population orthogonality condition Cov(z,ε) = 0.

We can derive the IV estimator using the sample orthogonality condition

$$\sum_{i=1}^{n} z_i e_i^{IV} = 0$$

With centered data and a zero constant term, as before:

$$\sum_{i=1}^{n} z_i (y_i - b^{IV} x_i) = \sum_{i=1}^{n} z_i y_i - b^{IV}\sum_{i=1}^{n} z_i x_i = 0$$

$$\Rightarrow\quad b^{IV} = \text{Est.Cov}(z,y)\,/\,\text{Est.Cov}(z,x)$$

Both OLS and IV can be seen as special cases of the generalized method of moments (GMM)
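The sample orthogonality conditions can be checked numerically. The sketch below centers some made-up data, solves each moment condition for the slope, and confirms that the resulting residuals are orthogonal to x (OLS) or to z (IV). All names and numbers are illustrative assumptions.

```python
def center(values):
    """Subtract the sample mean so that the data average to zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Made-up illustrative data, centered as the derivation assumes
z = center([1.0, 2.0, 3.0, 4.0, 5.0])
x = center([1.5, 1.9, 3.2, 3.8, 5.1])
y = center([2.1, 2.9, 4.2, 4.8, 6.5])

# Solve sum(x_i * (y_i - b * x_i)) = 0 and sum(z_i * (y_i - b * x_i)) = 0 for b
b_ols = dot(x, y) / dot(x, x)
b_iv = dot(z, y) / dot(z, x)

e_ols = [yi - b_ols * xi for xi, yi in zip(x, y)]
e_iv = [yi - b_iv * xi for xi, yi in zip(x, y)]

moment_ols = dot(x, e_ols)  # zero up to rounding error
moment_iv = dot(z, e_iv)    # zero up to rounding error
print(moment_ols, moment_iv)
```

Note that the IV residuals are generally not orthogonal to x itself; only the moment condition that defines the estimator holds exactly in the sample.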

IV regression in Stata

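Two-stage least squares can be implemented in Stata using the command “ivreg” instead of the usual “reg” (the newer command “ivregress 2sls” is used in the examples that follow)

Syntax

. ivreg y x3 x4 (x2 = z1 z2)

Here x2 is the endogenous regressor and z1, z2 are its instruments. The exogenous regressors x3 and x4 are listed outside the parentheses; Stata adds them to the instrument set automatically.

In matrix form:

OLS:

$$b = (X'X)^{-1}X'y$$

IV:

$$b = (Z'X)^{-1}Z'y$$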
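The matrix form can be illustrated for the simple regression case, where X = [1 x] and Z = [1 z] are n × 2 and the 2 × 2 system can be solved by hand. This is a sketch in plain Python with made-up data; the helper names are hypothetical.

```python
def transpose_times(a, b):
    """Return a'b for matrices stored as lists of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def solve_2x2(m, rhs):
    """Solve the 2x2 system m b = rhs by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    b1 = (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det
    b2 = (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det
    return [b1, b2]


def iv_matrix(Z, X, y):
    """b = (Z'X)^(-1) Z'y; passing Z = X gives OLS."""
    zx = transpose_times(Z, X)
    zy = [row[0] for row in transpose_times(Z, [[yi] for yi in y])]
    return solve_2x2(zx, zy)


# Made-up illustrative data (assumption, not from the slides)
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.5, 1.9, 3.2, 3.8, 5.1]
y = [2.1, 2.9, 4.2, 4.8, 6.5]

X = [[1.0, xi] for xi in x]  # constant plus regressor
Z = [[1.0, zi] for zi in z]  # constant plus instrument

coef_iv = iv_matrix(Z, X, y)   # [intercept, slope]
coef_ols = iv_matrix(X, X, y)  # OLS as the special case Z = X
print(coef_iv, coef_ols)
```

The slope entries agree with the covariance-ratio formulas Est.Cov(z,y) / Est.Cov(z,x) and Est.Cov(x,y) / Est.Var(x).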

Over-identification

• Thus far, we assumed that there exists a single instrumental variable z that is highly correlated with x but uncorrelated with ε

• Examples of instrumental variables

– Alternative proxy variables

– Past values x_{t-1}

• If a useful instrument is available, then there are potentially more than just one instrument

– If the past value x_{t-1} is a good instrument for x_t, then x_{t-2}, x_{t-3}, … are likely useful instruments as well.

• Choosing just one of the many instruments would be an inefficient use of the information available

• Solution: two-stage least squares (2SLS) method

Two-stage least squares (2SLS)

• Assume we have one endogenous regressor x in the model y = β1 + β2x + ε

• Assume we have (L-1) instruments z2, z3,…, zL for x

2-stage estimation procedure:

1) Regress x on the instruments by using OLS:

x = κ1 + κ2z2 + κ3z3 + … + κLzL + v

(v is the first-stage disturbance, which is distinct from ε)

Save the fitted values: x* = k1 + k2z2 + k3z3 + … + kLzL

2) Use the fitted values x* in place of x to estimate the original regression equation by OLS:

y = β1 + β2x* + ε
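The two stages can be sketched in plain Python for one endogenous regressor and two instruments. The data are made up, and the helpers `solve` and `ols` are hypothetical small utilities written for this illustration. Only the coefficients are computed; the standard errors from a naive second-stage OLS would not be valid.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """OLS coefficients [intercept, slopes...] from the normal equations."""
    design = [[1.0] + list(row) for row in zip(*columns)]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up illustrative data: two instruments z2, z3 for one endogenous x
z2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x = [1.2, 1.9, 3.1, 3.8, 5.3, 5.9]
y = [2.0, 3.1, 4.1, 5.2, 6.8, 7.1]

# Stage 1: regress x on the instruments and save the fitted values x*
k = ols([z2, z3], x)
x_star = [k[0] + k[1] * a + k[2] * b for a, b in zip(z2, z3)]

# Stage 2: regress y on x* to obtain the 2SLS coefficients
beta = ols([x_star], y)
print("2SLS intercept and slope:", beta)
```

Because the first-stage residuals are orthogonal to x*, the stage 2 slope equals the simple IV estimator that uses x* as the single instrument for x. In this sense 2SLS combines several instruments into one.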

Two-stage least squares (2SLS)

Practical notes:

• If we have more than one endogenous “problem variable” x, then stage 1 can be done separately for each variable

• Different endogenous regressors can be instrumented with different z variables

• All exogenous regressors x are usually included as instruments z

• If OLS is used in the stepwise estimation, the standard errors of the 2-stage regression need to be adjusted

– Stata does this automatically when “ivreg” is used

Example: production function of electricity distribution networks

Assume Cobb-Douglas production function

ln yi = β0 + β1Li + β2Ki + εi

• Output: ln y = ln Energy (GWh)

• Inputs x: L = ln OPEX, K = ln Krepl

• Instrument for K: ln Knuse

– OPEX = operational expenditure (incl. wages)

– Krepl = Capital stock (replacement value)

– Knuse = Capital stock (net use value)

Sample of 160 observations in years 2011 and 2012.

CD function, direct OLS estimation

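. regress lnEnergy lnOPEX lnKrepl

      Source |       SS       df       MS              Number of obs =     160
-------------+------------------------------           F(  2,   157) =  911.83
       Model |  261.157887     2  130.578943           Prob > F      =  0.0000
    Residual |   22.483226   157  .143205261           R-squared     =  0.9207
-------------+------------------------------           Adj R-squared =  0.9197
       Total |  283.641113   159  1.78390637           Root MSE      =  .37842

------------------------------------------------------------------------------
    lnEnergy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnOPEX |   .4460534   .1190976     3.75   0.000      .210813    .6812938
     lnKrepl |    .591481   .1152412     5.13   0.000     .3638579    .8191042
       _cons |  -4.696363   .4388221   -10.70   0.000    -5.563119   -3.829606
------------------------------------------------------------------------------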

Two-stage least squares (2SLS)

Capital stock is hard to measure. Suppose our proxy for capital stock K contains measurement error. If that is the case, the OLS estimator of the output elasticity of K is biased towards zero.

Two alternative proxy measures of K: Krepl and Knuse.

Two-stage least squares:

Stage 1: Regress ln Krepl on ln Knuse and ln OPEX. Record the predicted ln Krepl (ln PrKrepl).

Stage 2: Regress ln Energy on ln OPEX and ln PrKrepl to estimate the production function of interest.
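The attenuation argument can be illustrated with a small simulation. The sketch below is not the electricity data set: it generates a latent variable, two noisy proxies for it (hypothetical names `proxy_a` and `proxy_b`, loosely playing the roles of ln Krepl and ln Knuse), and an outcome with true slope 1. It leaves out the second input (ln OPEX) and assumes the two measurement errors are independent of each other and of the disturbance, which is what makes the second proxy a valid instrument.

```python
import random


def sample_cov(a, b):
    """Sample covariance with the (n - 1) divisor."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


random.seed(12345)  # fixed seed so the run is reproducible
n = 20000
beta_true = 1.0     # assumed true slope of the simulated model

k_true = [random.gauss(0, 1) for _ in range(n)]       # unobserved true regressor
proxy_a = [k + random.gauss(0, 1) for k in k_true]    # noisy measurement used as x
proxy_b = [k + random.gauss(0, 1) for k in k_true]    # second noisy measurement used as z
y = [beta_true * k + random.gauss(0, 0.5) for k in k_true]

# OLS on the noisy proxy: probability limit is beta * Var(k) / (Var(k) + Var(error)) = 0.5
b_ols = sample_cov(proxy_a, y) / sample_cov(proxy_a, proxy_a)

# IV with the second proxy as instrument: probability limit is beta = 1
b_iv = sample_cov(proxy_b, y) / sample_cov(proxy_b, proxy_a)

print(f"OLS: {b_ols:.3f} (attenuated), IV: {b_iv:.3f}")
```

With equal variances for the latent variable and the measurement error, OLS recovers only about half of the true slope, while the IV estimate stays close to 1.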

2SLS regression

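. reg3 (lnKrepl = lnOPEX lnKnuse) (lnEnergy = lnOPEX lnKrepl), exog(lnOPEX) 2sls

Two-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
lnKrepl           160      2    .1486906    0.9862    5624.20   0.0000
lnEnergy          160      2    .3923997    0.9148     858.94   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnKrepl      |
      lnOPEX |   .2911237   .0407567     7.14   0.000     .2109328    .3713145
     lnKnuse |   .6873963   .0377983    18.19   0.000     .6130263    .7617664
       _cons |   1.755602   .1187836    14.78   0.000      1.52189    1.989314
-------------+----------------------------------------------------------------
lnEnergy     |
      lnOPEX |   .0456137   .1489348     0.31   0.760    -.2474226      .33865
     lnKrepl |   .9875145   .1451144     6.81   0.000     .7019949    1.273034
       _cons |  -6.052248   .5352627   -11.31   0.000    -7.105403   -4.999093
------------------------------------------------------------------------------
Endogenous variables:  lnKrepl lnEnergy
Exogenous variables:   lnOPEX lnKnuse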

IV (2SLS) regression

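. ivregress 2sls lnEnergy lnOPEX (lnKrepl = lnKnuse lnOPEX)

Instrumental variables (2SLS) regression               Number of obs =     160
                                                       Wald chi2(2)  = 1750.71
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.9148
                                                       Root MSE      =   .3887

------------------------------------------------------------------------------
    lnEnergy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnKrepl |   .9875145   .1437475     6.87   0.000     .7057744    1.269254
      lnOPEX |   .0456137   .1475319     0.31   0.757    -.2435436     .334771
       _cons |  -6.052248   .5302208   -11.41   0.000    -7.091462   -5.013034
------------------------------------------------------------------------------
Instrumented:  lnKrepl
Instruments:   lnOPEX lnKnuse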

IV (GMM) regression

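. ivregress gmm lnEnergy lnOPEX (lnKrepl = lnKnuse)

Instrumental variables (GMM) regression                Number of obs =     160
                                                       Wald chi2(2)  = 1320.63
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.9148
GMM weight matrix: Robust                              Root MSE      =   .3887

------------------------------------------------------------------------------
             |               Robust
    lnEnergy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnKrepl |   .9875145   .1540702     6.41   0.000     .6855425    1.289486
      lnOPEX |   .0456137   .1595625     0.29   0.775    -.2671231    .3583505
       _cons |  -6.052248   .5687178   -10.64   0.000    -7.166915   -4.937582
------------------------------------------------------------------------------
Instrumented:  lnKrepl
Instruments:   lnOPEX lnKnuse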

Testing for weak instruments

• The F-test of joint significance in the first-stage regression serves as a useful diagnostic test of weak instruments

• To avoid the problems with weak instruments (imprecise coefficients), the coefficients of the instruments in the stage 1 regression should be jointly significant: F-stat > Fcrit
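As a rough sketch of the diagnostic, the F-statistic for excluding the instruments from the first stage can be computed from the restricted and unrestricted sums of squared residuals. With a single excluded instrument it equals the squared first-stage t-statistic. The function name and the SSR inputs below are hypothetical. The t-value 18.19 is the one reported for lnKnuse in the first-stage equation of the reg3 output above. The threshold of 10 is a widely quoted informal rule of thumb, not something stated in these slides.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F = ((SSR_r - SSR_u) / q) / (SSR_u / (n - k)).

    q: number of instruments excluded in the restricted first stage
    n: number of observations
    k: number of parameters in the unrestricted first stage
    """
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k))


# Hypothetical sums of squared residuals, for illustration only
f_example = f_statistic(ssr_restricted=120.0, ssr_unrestricted=100.0, q=2, n=103, k=3)

# Single excluded instrument: F equals the squared t-statistic.
t_lnKnuse = 18.19             # first-stage t for lnKnuse from the reg3 output
f_excluded = t_lnKnuse ** 2   # about 330.9

RULE_OF_THUMB = 10.0          # informal threshold (assumption, not from the slides)
print(f_example, f_excluded, f_excluded > RULE_OF_THUMB)
```

By this measure lnKnuse is far from a weak instrument in the example.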

Hausman test

• Also referred to as the Durbin-Wu-Hausman test

Rationale: it is not always clear if endogeneity is a problem or not

• If the exogeneity assumption Cov(x, ε) = 0 holds, then the OLS estimator is unbiased and efficient

• The IV estimator is also consistent, but less efficient (OLS preferred)

• However, if the exogeneity assumption Cov(x, ε) = 0 fails, then the OLS estimator is biased and inconsistent

• The IV estimator remains consistent (IV preferred)

Hausman test

H0: Cov(x, ε) = 0; OLS preferred

H1: Cov(x, ε) ≠ 0; IV preferred

Procedure:

• Estimate both OLS and IV regressions

• Compare the estimated coefficients bOLS and bIV and their standard errors

• If H0 is true, both estimators are consistent, so the difference |bIV - bOLS| should be small and reflect only sampling variation (mostly from the less efficient IV estimator)

• If H0 is true, the Hausman statistic follows a chi-squared distribution with degrees of freedom equal to the number of endogenous regressors instrumented in the IV model

Hausman test in Stata

• Stata computes the Hausman test automatically

• Run the IV and OLS regressions

• Save the results of each regression by command ’estimates store name’

– Example: ’estimates store CostIV’ and ’estimates store CostOLS’

• Hausman test is conducted by command ’hausman’

– Example: ’hausman CostIV CostOLS, constant’

Hausman test in Stata

. hausman IV OLS

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       IV          OLS         Difference          S.E.
-------------+----------------------------------------------------------------
     lnKrepl |    .9875145      .591481        .3960335        .0859234
      lnOPEX |    .0456137     .4460534       -.4004397        .0870714
------------------------------------------------------------------------------
              b = consistent under Ho and Ha; obtained from ivregress
     B = inconsistent under Ha, efficient under Ho; obtained from regress

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       21.24
                Prob>chi2 =      0.0000
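For a single coefficient, the Hausman statistic reduces to the squared difference divided by the difference of the two variances. The sketch below applies this scalar version to the lnKrepl estimates and standard errors reported in the regress and ivregress 2sls outputs. The function name is hypothetical, and the scalar form is a simplification of the matrix expression Stata evaluates.

```python
def hausman_scalar(b_iv, b_ols, se_iv, se_ols):
    """H = (b_IV - b_OLS)^2 / (Var(b_IV) - Var(b_OLS)) for one coefficient."""
    var_diff = se_iv ** 2 - se_ols ** 2
    if var_diff <= 0:
        raise ValueError("variance difference must be positive")
    return (b_iv - b_ols) ** 2 / var_diff


# lnKrepl coefficient and standard error from the IV (2SLS) and OLS outputs
h = hausman_scalar(b_iv=0.9875145, b_ols=0.591481,
                   se_iv=0.1437475, se_ols=0.1152412)

CHI2_CRIT_5PCT_1DF = 3.841  # 5% critical value of chi-squared with 1 degree of freedom
print(f"Hausman statistic: {h:.2f}, reject H0: {h > CHI2_CRIT_5PCT_1DF}")
```

The scalar statistic comes out at about 21.24, in line with the value Stata reports, and it is far above the 5% critical value for either 1 or 2 degrees of freedom (3.84 and 5.99). Exogeneity of ln Krepl is therefore rejected and the IV estimates are preferred.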

Next time – Mon 20 Mar

Topic:

• Time series econometrics