Econometrics I


Econometrics I
Professor William Greene
Stern School of Business
Department of Economics

Econometrics I
Part 13 - Endogeneity

I am here to ask for a little help with endogeneity.

I have a main regression, in which the independent variables are lagged 1 year (this is an unbalanced panel dataset); I use fixed effects, xtreg:

Main Regression: $Y_t = X_{t-1} + Q_{t-1} + Z3_{t-1}$

I suspect endogeneity: variable X may itself be determined by prior-year Y. As a solution, I read this strategy: regress the endogenous variable $X_{t-1}$ on the dependent variable ($Y_{t-2}$) and other independent variables (i.e., $Q_{t-2}$ and $Z_{t-2}$); these Y, Q and Z are all in year t-2, while X is in t-1. Then, from this regression, calculate the predicted values for X, and include them as a control for endogeneity (e.g., a variable named Endogeneity-control) in the main regression above.

Question 1: in the Main Regression above, when including the control for endogeneity (i.e., the variable Endogeneity-control), do I have to lag its value? That is, do I have to include Endogeneity-control in t-1, or just the predicted values, without lagging?

Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.

Specification: Quadratic Effect of Experience

The Effect of Education on LWAGE

What Influences LWAGE?

An Exogenous Influence

Instrumental Variables
Structure
LWAGE (ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION)
ED (MS, FEM)

Reduced Form: LWAGE[ ED (MS, FEM), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]

Two Stage Least Squares Strategy
Reduced Form: LWAGE[ ED (MS, FEM, X), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]
Strategy
(1) Purge ED of the influence of everything but MS, FEM (and the other variables). Predict ED using all exogenous information in the sample (X and Z).
(2) Regress LWAGE on this prediction of ED and everything else.
Standard errors must be adjusted for the predicted ED.

OLS

The weird results for the coefficient on ED happened because the instruments, MS and FEM, are dummy variables. There is not enough variation in these variables.

Source of Endogeneity
LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + $\varepsilon$
ED = f(MS, FEM, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u

Remove the Endogeneity
LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u + $\varepsilon$

Strategy
Estimate u.
Add u to the equation. ED is uncorrelated with $\varepsilon$ when u is in the equation.

Auxiliary Regression for ED to Obtain Residuals

OLS with Residual (Control Function) Added

2SLS
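The regression output for these slides did not survive, so the following is only a minimal sketch on simulated data, not the Cornwell and Rupert results. It illustrates the point the two slide titles make together: adding the first-stage residual to the OLS regression reproduces the 2SLS coefficients. The variable names (`w`, `z`, `x`) and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated stand-ins for the wage equation (names and values are assumptions)
w = rng.normal(size=n)                    # exogenous regressor
z = rng.normal(size=n)                    # excluded instrument
eps = rng.normal(size=n)                  # structural disturbance
u = 0.7 * eps + rng.normal(size=n)        # first-stage disturbance, correlated with eps
x = 0.5 * w + 0.8 * z + u                 # endogenous regressor
y = 1.0 + 0.3 * w + 0.5 * x + eps


def ols(A, b):
    """Least squares coefficients of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


const = np.ones(n)
X = np.column_stack([const, w, x])        # regressors in the main equation
Z = np.column_stack([const, w, z])        # all exogenous variables

# Auxiliary regression: x on Z, keeping fitted values and residuals
x_hat = Z @ ols(Z, x)
u_hat = x - x_hat

# Control function: OLS of y on X plus the estimated residual
b_cf = ols(np.column_stack([X, u_hat]), y)

# 2SLS: replace x by its prediction from the auxiliary regression
b_2sls = ols(np.column_stack([const, w, x_hat]), y)

print("control function (first 3 coefficients):", b_cf[:3])
print("2SLS:                                   ", b_2sls)
```

The two sets of coefficients agree up to rounding. The conventional OLS standard errors from either second-stage regression are not the correct 2SLS standard errors.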

A Warning About Control Function

The Problem

Instrumental Variables
Framework: $y = X\beta + \varepsilon$, K variables in X.
There exists a set of K variables, Z, such that

$\operatorname{plim}(Z'X/n) \neq 0$ but $\operatorname{plim}(Z'\varepsilon/n) = 0$

The variables in Z are called instrumental variables.
An alternative (to least squares) estimator of $\beta$ is

$b_{IV} = (Z'X)^{-1}Z'y$
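A small simulation can make the estimator concrete. The sketch below assumes a single endogenous regressor plus a constant, one instrument `z`, and made-up parameter values; it only illustrates the formula $b_{IV} = (Z'X)^{-1}Z'y$ next to least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
beta = np.array([1.0, 0.5])              # true coefficients (assumed for the simulation)

z = rng.normal(size=n)                   # instrument: moves x, independent of eps
eps = rng.normal(size=n)                 # disturbance
x = 0.8 * z + 0.6 * eps + rng.normal(size=n)   # x is correlated with eps

X = np.column_stack([np.ones(n), x])     # K = 2 columns
Z = np.column_stack([np.ones(n), z])     # K = 2 instruments (the constant instruments itself)
y = X @ beta + eps

b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'y

print("OLS slope:", b_ols[1])   # drifts away from 0.5 because cov(x, eps) != 0
print("IV slope: ", b_iv[1])    # close to 0.5
```

In this design the least squares slope converges to 0.5 + 0.6/2.0 = 0.8, while the IV slope converges to the true value 0.5.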

We consider the following:
Why use this estimator?
What are its properties compared to least squares?
We will also examine an important application.

IV Estimators
Consistent:
$b_{IV} = (Z'X)^{-1}Z'y = (Z'X/n)^{-1}(Z'X/n)\beta + (Z'X/n)^{-1}Z'\varepsilon/n = \beta + (Z'X/n)^{-1}Z'\varepsilon/n \rightarrow \beta$
Asymptotically normal (same approach to proof as for OLS).
Inefficient (to be shown).

The General Result
By construction, the IV estimator is consistent. So, we have an estimator that is consistent when least squares is not.

LS as an IV Estimator
The least squares estimator is
$(X'X)^{-1}X'y = (X'X)^{-1}\sum_i x_i y_i = \beta + (X'X)^{-1}\sum_i x_i \varepsilon_i$
If $\operatorname{plim}(X'X/n) = Q$, nonzero, and $\operatorname{plim}(X'\varepsilon/n) = 0$, then under the usual assumptions LS is an IV estimator: X is its own instrument.

IV Estimation
Why use an IV estimator? Suppose that X and $\varepsilon$ are not uncorrelated. Then least squares is neither unbiased nor consistent.
Recall the proof of consistency of least squares:

$b = \beta + (X'X/n)^{-1}(X'\varepsilon/n)$.

$\operatorname{plim} b = \beta$ requires $\operatorname{plim}(X'\varepsilon/n) = 0$. If this does not hold, the estimator is inconsistent.

A Popular Misconception
If only one variable in X is correlated with $\varepsilon$, the other coefficients are consistently estimated. False.

The problem is smeared over the other coefficients.
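The smearing can be seen in a simulated example. The sketch below assumes two regressors: `x1` is correlated with the disturbance, `x2` is exogenous but correlated with `x1`. All numbers are illustrative assumptions. The asymptotic bias is $Q^{-1}\gamma$ with $Q = \operatorname{plim} X'X/n$ and $\gamma = \operatorname{plim} X'\varepsilon/n$, so a nonzero entry in $\gamma$ for `x1` alone still moves the coefficient on `x2` whenever $Q$ is not diagonal.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50_000

x2 = rng.normal(size=n)                          # exogenous regressor
eps = rng.normal(size=n)                         # disturbance
x1 = 0.7 * x2 + 0.8 * eps + rng.normal(size=n)   # endogenous, correlated with x2
y = 1.0 * x1 + 1.0 * x2 + eps                    # both true slopes equal 1 (assumed)

X = np.column_stack([np.ones(n), x1, x2])
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Population moments implied by the design above, for (x1, x2):
# var(x1) = 0.49 + 0.64 + 1 = 2.13, cov(x1, x2) = 0.7, var(x2) = 1,
# cov(x1, eps) = 0.8, cov(x2, eps) = 0.
Q = np.array([[2.13, 0.7], [0.7, 1.0]])
gamma = np.array([0.8, 0.0])
asy_bias = np.linalg.solve(Q, gamma)             # Q^{-1} gamma

print("OLS slopes:       ", b_ols[1:])
print("asymptotic biases:", asy_bias)            # the second entry is not zero
```

Only `x1` is correlated with the disturbance, yet the coefficient on `x2` is pulled well below its true value.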

Asymptotic Covariance Matrix of $b_{IV}$
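The slide's formula was an image and is missing from this transcript. The standard textbook result under homoscedastic, nonautocorrelated disturbances is sketched here as a stand-in, not as a reproduction of the slide. The symbols $Q_{ZX}$, $Q_{ZZ}$ and $Q_{XZ}$ are notation introduced for this sketch.

```latex
% Assumption: Var[\varepsilon | Z] = \sigma^2 I, with
% Q_{ZX} = plim Z'X/n,  Q_{ZZ} = plim Z'Z/n,  Q_{XZ} = Q_{ZX}'.
\begin{aligned}
b_{IV} - \beta &= (Z'X)^{-1} Z'\varepsilon \\
\operatorname{Asy.Var}[b_{IV}] &= \frac{\sigma^2}{n}\, Q_{ZX}^{-1}\, Q_{ZZ}\, Q_{XZ}^{-1} \\
\operatorname{Est.Asy.Var}[b_{IV}] &= \hat{\sigma}^2\, (Z'X)^{-1} (Z'Z) (X'Z)^{-1},
\qquad
\hat{\sigma}^2 = \frac{(y - X b_{IV})'(y - X b_{IV})}{n}
\end{aligned}
```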

Asymptotic Efficiency
Asymptotic efficiency of the IV estimator: the variance is larger than that of LS. (A large sample type of Gauss-Markov result is at work.)
(1) It's a moot point. LS is inconsistent.
(2) Mean squared error is uncertain:

MSE[estimator|$\beta$] = Variance + square of bias.

IV may be better or worse. It depends on the data.

Two Stage Least Squares
How to use an excess of instrumental variables
(1) X is K variables. Some (at least one) of the K variables in X are correlated with $\varepsilon$.
(2) Z is M > K variables. Some of the variables in Z are also in X, some are not. None of the variables in Z are correlated with $\varepsilon$.
(3) Which K variables to use to compute Z'X and Z'y?

Choosing the Instruments
Choose K randomly?
Choose the included Xs and the remainder randomly?
Use all of them? How?
A theorem: (Brundy and Jorgenson, ca. 1972) There is a most efficient way to construct the IV estimator from this subset:
(1) For each column (variable) in X, compute the predictions of that variable using all the columns of Z.
(2) Linearly regress y on these K predictions.
This is two stage least squares.
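The two steps of the theorem can be sketched directly with least squares. The example below uses simulated data with K = 3 regressors and M = 4 instruments; the variable names and parameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

w = rng.normal(size=n)                           # exogenous, appears in X and Z
z1, z2 = rng.normal(size=n), rng.normal(size=n)  # excluded instruments
eps = rng.normal(size=n)
x = 0.5 * w + 0.6 * z1 + 0.4 * z2 + 0.7 * eps + rng.normal(size=n)  # endogenous
beta = np.array([1.0, 0.3, 0.5])                 # true coefficients (assumed)

X = np.column_stack([np.ones(n), w, x])          # K = 3
Z = np.column_stack([np.ones(n), w, z1, z2])     # M = 4 > K
y = X @ beta + eps

# Step (1): predict every column of X using all the columns of Z
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Step (2): linearly regress y on the K predictions
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# The same estimator written as an IV estimator with X_hat as the instruments
b_iv_form = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("2SLS:", b_2sls)
print("OLS: ", b_ols)
# Columns of X that are also in Z are predicted perfectly:
print("included columns reproduced:", np.allclose(X_hat[:, :2], X[:, :2]))
```

Step (1) leaves the constant and `w` unchanged, because each is one of the columns of Z. Only the endogenous column is actually replaced.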

Algebraic Equivalence
Two stage least squares is equivalent to
(1) each variable in X that is also in Z is replaced by itself.
(2) Variables in X that are not in Z are replaced by predictions of that X using all the variables in Z, including those that are not in X.

2SLS Algebra
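The algebra on this slide was an image and did not survive extraction. The standard derivation is sketched here as a stand-in. $P_Z$ denotes the projection matrix onto the columns of Z, a symbol introduced for this sketch.

```latex
% P_Z = Z (Z'Z)^{-1} Z' is symmetric and idempotent, so \hat{X}'\hat{X} = X' P_Z X = \hat{X}'X.
\begin{aligned}
\hat{X} &= Z (Z'Z)^{-1} Z'X = P_Z X \\
b_{2SLS} &= (\hat{X}'\hat{X})^{-1} \hat{X}'y \\
         &= (X' P_Z X)^{-1} X' P_Z y \\
         &= (\hat{X}'X)^{-1} \hat{X}'y
\end{aligned}
```

The last line shows that 2SLS is an IV estimator of the form $(Z'X)^{-1}Z'y$, with $\hat{X}$ playing the role of the K instruments.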

Asymptotic Covariance Matrix for 2SLS

2SLS Has Larger Variance than LS

Estimating $\sigma^2$
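The slide body is missing here. The usual point made under this heading is that the residuals for $\hat{\sigma}^2$ are computed with the original X, not with the predictions $\hat{X}$ used in the second stage. The sketch below illustrates that on simulated data with an assumed true $\sigma^2 = 1$; the design and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

z = rng.normal(size=n)
eps = rng.normal(size=n)                      # true sigma^2 = 1 (assumed)
x = 0.8 * z + 0.7 * eps + rng.normal(size=n)  # endogenous regressor
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
y = X @ np.array([1.0, 0.5]) + eps

X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b = np.linalg.lstsq(X_hat, y, rcond=None)[0]  # 2SLS coefficients

# Correct: residuals use the actual X
e = y - X @ b
sigma2 = e @ e / n

# Naive: residuals printed by the second-stage regression use X_hat
e_naive = y - X_hat @ b
sigma2_naive = e_naive @ e_naive / n

# Estimated asymptotic covariance matrix and standard errors for 2SLS
cov_b = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
se = np.sqrt(np.diag(cov_b))

print("sigma^2 using X:    ", sigma2)        # near 1
print("sigma^2 using X_hat:", sigma2_naive)  # too large in this design
print("2SLS standard errors:", se)
```

This is why the standard errors reported by a hand-run second-stage OLS must be adjusted.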

Endogeneity Test? (Hausman)

        Exogenous                 Endogenous
OLS     Consistent, Efficient     Inconsistent
2SLS    Consistent, Inefficient   Consistent

Base a test on $d = b_{2SLS} - b_{OLS}$
Use a Wald statistic, $d'[\operatorname{Var}(d)]^{-1}d$
What to use for the variance matrix?
Hausman: $V_{2SLS} - V_{OLS}$

Hausman Test

Hausman Test: One at a Time?

Endogeneity Test: Wu
Considerable complication in Hausman test (text, pp. 322-323).
Simplification: Wu test.
Regress y on X and $\hat{X}$ estimated for the endogenous part of X. Then use an ordinary Wald test.

Wu Test
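The regression output for this slide is not in the transcript. As a rough sketch of the variable addition regression described above, the following uses simulated data (not the Cornwell and Rupert data) with one endogenous regressor, so the Wald test reduces to a t test on the added prediction. Names and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

w = rng.normal(size=n)                        # exogenous regressor
z = rng.normal(size=n)                        # excluded instrument
eps = rng.normal(size=n)
x = 0.5 * w + 0.8 * z + 0.7 * eps + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 0.3 * w + 0.5 * x + eps

const = np.ones(n)
X = np.column_stack([const, w, x])
Z = np.column_stack([const, w, z])

# Prediction of the endogenous part of X from all exogenous variables
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Variable addition regression: y on X and x_hat
A = np.column_stack([X, x_hat])
b_wu = np.linalg.lstsq(A, y, rcond=None)[0]
resid = y - A @ b_wu
s2 = resid @ resid / (n - A.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
t_xhat = b_wu[-1] / se[-1]                    # ordinary t test on the added variable

# 2SLS for comparison
b_2sls = np.linalg.lstsq(np.column_stack([const, w, x_hat]), y, rcond=None)[0]

print("t statistic on x_hat:", t_xhat)
print("coef on x + coef on x_hat:", b_wu[2] + b_wu[3])
print("2SLS coef on x:           ", b_2sls[2])
```

The sum of the coefficient on the endogenous variable and the coefficient on its prediction equals the 2SLS coefficient, which is the same relationship the note on the slide reports for ED.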

Note: .05544 + .54900 = .60444, which is the 2SLS coefficient on ED.

Alternative to Hausman's Formula?
H test requires the difference between an efficient and an inefficient estimator.

Any way to compare any two competing estimators even if neither is efficient?

Bootstrap? (Maybe)

Measurement Error
$y = \beta x^* + \varepsilon$, all of the usual assumptions
$x = x^* + u$; the true $x^*$ is not observed (education vs. years of school)
What happens when y is regressed on x? Least squares attenuation:
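The formula that followed on the slide is missing from this transcript. The standard result for this model is $\operatorname{plim} b = \beta\,\sigma_*^2/(\sigma_*^2 + \sigma_u^2)$, where $\sigma_*^2 = \operatorname{var}(x^*)$ and $\sigma_u^2 = \operatorname{var}(u)$. A minimal simulation, with assumed values $\beta = 2$ and $\sigma_*^2 = \sigma_u^2 = 1$, shows the slope shrinking toward zero:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
beta = 2.0                                   # true slope (assumed)

x_star = rng.normal(0.0, 1.0, size=n)        # true regressor, variance 1
u = rng.normal(0.0, 1.0, size=n)             # measurement error, variance 1
eps = rng.normal(size=n)

y = beta * x_star + eps
x = x_star + u                               # what is actually observed


def slope(a, b):
    """Least squares slope of b on a (with a constant)."""
    ac = a - a.mean()
    return ac @ (b - b.mean()) / (ac @ ac)


b_true_x = slope(x_star, y)                  # infeasible regression on x*
b_noisy_x = slope(x, y)                      # feasible regression on x
plim_b = beta * 1.0 / (1.0 + 1.0)            # beta * var(x*) / (var(x*) + var(u))

print("slope using x*:", b_true_x)
print("slope using x: ", b_noisy_x)
print("plim formula:  ", plim_b)
```

With equal signal and noise variances the estimated slope is roughly half the true value.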

Why Is Least Squares Attenuated?
$y = \beta x^* + \varepsilon$
$x = x^* + u$
$y = \beta x + (\varepsilon - \beta u)$
$y = \beta x + v$, $\operatorname{cov}(x, v) = -\beta \operatorname{var}(u)$
Some of the variation in x is not associated with variation in y. The effect of variation in x on y is dampened by the measurement error.

Measurement Error in Multiple Regression

Twins
Application from the literature: Ashenfelter/Krueger: A wage equation for twins that includes schooling.

Orthodoxy
A proxy is not an instrumental variable

Instrument is a noun, not a verb

Are you sure that the instrument is really exogenous? The natural experiment.

The First IV Study
(Snow, J., On the Mode of Communication of Cholera, 1855)
London Cholera epidemic, ca 1853-4
Cholera = f(Water Purity, u) + $\varepsilon$.
Effect of water purity on cholera?
Purity = f(cholera prone environment (poor, garbage in streets, rodents, etc.)). Regression does not work.
Two London water companies: Lambeth and Southwark
[Diagram: the two companies on the river, with the main sewage discharge marked]
Paul Grootendorst: A Review of Instrumental Variables Estimation of Treatment Effects
http://individual.utoronto.ca/grootendorst/pdf/IV_Paper_Sept6_2007.pdf

IV Estimation
Cholera = f(Purity, u) + $\varepsilon$
Z = water company
Cov(Cholera, Z) = $\delta$ Cov(Purity, Z)
Z is randomly mixed in the population (two full sets of pipes) and uncorrelated with behavioral unobservables, u.
Cholera = $\alpha$ + $\delta$ Purity + u + $\varepsilon$
Purity = Mean + random variation + $\lambda$u
Cov(Cholera, Z) = $\delta$ Cov(Purity, Z)

Autism: Natural Experiment
Autism ----- Television watching
Which way does the causation go?
We need an instrument: Rainfall
Rainfall affects staying indoors, which influences TV watching.
Rainfall is definitely absolutely truly exogenous, so it is a perfect instrument.
The correlation survives, so TV causes autism.

Two Problems with 2SLS
Z'X/n may not be sufficiently large. The covariance matrix for the IV estimator is
Asy.Cov(b) = $\sigma^2[(X'Z)(Z'Z)^{-1}(Z'X)]^{-1}$
If Z'X/n -> 0, the variance explodes.
Additional problems:
2SLS biased toward plim OLS.
Asymptotic results for inference fall apart.
When there are many instruments, $\hat{X}$ is too close to X; 2SLS becomes OLS.
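The first problem can be illustrated with a small Monte Carlo. The sketch below is only an illustration under assumed values: one regressor, one instrument, mean-zero variables with no constant, and a first-stage coefficient `pi` that is either large or close to zero. Dispersion is summarized with the interquartile range.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta = 200, 500, 0.5            # sample size, replications, true slope (assumed)


def simulate(pi):
    """IV estimates and first-stage F statistics across replications."""
    z = rng.normal(size=(reps, n))
    eps = rng.normal(size=(reps, n))
    x = pi * z + 0.8 * eps + rng.normal(size=(reps, n))   # endogenous regressor
    y = beta * x + eps
    zz = np.sum(z * z, axis=1)
    zx = np.sum(z * x, axis=1)
    zy = np.sum(z * y, axis=1)
    b_iv = zy / zx                                        # (z'x)^{-1} z'y
    pi_hat = zx / zz                                      # first-stage coefficient
    rss = np.sum((x - pi_hat[:, None] * z) ** 2, axis=1)
    f_stat = pi_hat**2 * zz / (rss / (n - 1))             # first-stage F (= t squared)
    return b_iv, f_stat


def iqr(a):
    """Interquartile range."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25


b_strong, f_strong = simulate(pi=1.0)
b_weak, f_weak = simulate(pi=0.05)

print("strong: median F =", np.median(f_strong), " IQR of b_IV =", iqr(b_strong))
print("weak:   median F =", np.median(f_weak), " IQR of b_IV =", iqr(b_weak))
print("weak:   median b_IV =", np.median(b_weak))
```

When `pi` is near zero, z'x/n is near zero, the first-stage F is small, and the spread of the IV estimates is far larger than with the strong instrument.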

Weak Instruments
Symptom: The relevance condition, plim Z'X/n not zero, is close to being violated.
Detection: Standard F test in the regression of $x_k$ on Z. F < 10 suggests a problem.
F statistic based on 2SLS: see text p. 351.
Remedy: Not much. Most of the discussion is about the condition, not what to do about it.
Use LIML? Requires a normality assumption. Probably not too restrictive.

A study of moral hazard
Riphahn, Wambach, Million: Incentive Effects in the Demand for Healthcare
Journal of Applied Econometrics, 2003

Did the presence of the ADDON insurance influence the demand for health care (doctor visits and hospital visits)?

For a simple example, we examine the PUBLIC insurance (89%) instead of ADDON insurance (2%).

Application: Health Care Panel Data
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note, the variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each row of the data for the person.
Variables in the file are
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status

Evidence of Moral Hazard?

Regression Study

Endogenous Dummy Variable
Doctor Visits = f(Age, Educ, Health, Presence of Insurance, Other unobservables)

Insurance = f(Expected Doctor Visits, Other unobservables)

Approaches
(Parametric) Control Function: Build a structural model for the two variables (Heckman)

(Semiparametric) Instrumental Variable: Create an instrumental variable for the dummy variable (Barnow/Cain/Goldberger, Angrist, Current generation of researchers)

(?) Propensity Score Matching (Heckman et al., Becker/Ichino, Many recent researchers)

Heckman's Control Function Approach
$Y = x\beta + \delta T + E[\varepsilon|T] + \{\varepsilon - E[\varepsilon|T]\}$
$\lambda = E[\varepsilon|T]$, computed from a model for whether T = 0 or 1

Magnitude = 11.1200 is nonsensical in this context.

Instrumental Variable Approach
Construct a prediction for T using only the exogenous information.
Use 2SLS using this instrumental variable.

Magnitude = 23.9012 is also nonsensical in this context.

Propensity Score Matching
Create a model for T that produces probabilities for T=1: Propensity Scores
Find people with the same propensity score, some with T=1, some with T=0.
Compare number of doctor visits of those with T=1 to those with T=0.
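The three matching steps can be sketched as follows. This is a simplified illustration on simulated data, not the analysis of the health care data: treatment depends only on observed covariates, the score comes from a logit fitted by Newton's method, and each treated unit is matched to the single nearest control (with replacement). The covariates, coefficients and the true effect of 2.0 are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4_000
true_effect = 2.0                                   # assumed treatment effect

x1, x2 = rng.normal(size=n), rng.normal(size=n)     # observed covariates
p_true = 1.0 / (1.0 + np.exp(-(0.8 * x1 - 0.5 * x2)))
T = (rng.uniform(size=n) < p_true).astype(float)    # treatment indicator
y = 1.0 + true_effect * T + 1.5 * x1 - 1.0 * x2 + rng.normal(size=n)


def fit_logit(X, t, iters=25):
    """Logit coefficients by Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (t - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        b = b + np.linalg.solve(hess, grad)
    return b


# Step 1: model for T that produces propensity scores
X = np.column_stack([np.ones(n), x1, x2])
score = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, T)))

# Step 2: for each treated unit, find the control with the closest score
treated = T == 1
ps_t, ps_c = score[treated], score[~treated]
y_t, y_c = y[treated], y[~treated]
nearest = np.abs(ps_t[:, None] - ps_c[None, :]).argmin(axis=1)

# Step 3: compare outcomes of treated units with their matched controls
att = np.mean(y_t - y_c[nearest])
naive = y_t.mean() - y_c.mean()                     # unmatched comparison

print("naive difference in means:", naive)
print("matched estimate (ATT):   ", att)
```

The unmatched comparison mixes the treatment effect with the covariate differences between the groups; matching on the score removes most of that difference in this design. The approach relies on selection into treatment depending only on observed variables.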

Treatment Effect
Earnings and Education: Effect of an additional year of schooling

Estimating Average and Local Average Treatment Effects of Education when Compulsory Schooling Laws Really Matter
Philip Oreopoulos
AER, 96, 1, 2006, 152-175

Treatment Effects and Natural Experiments