62
Part 13: Endogeneity 3-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Embed Size (px)

Citation preview

Page 1: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-1/62

Econometrics IProfessor William Greene

Stern School of Business

Department of Economics

Page 2: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-2/62

Econometrics I

Part 13 - Endogeneity

Page 3: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-3/62

I am here to ask a little help for endogeneity.

I have a main regression, in which the independent variabels are lagged 1 year (this is an unbalanced panel dataset); I use fixed effect, xtreg:

Main Regression:  Yt = Xt-1 + Qt-1  +  Z3t-1  

I suspect endogeneity: variable X may be itself determined by prior-year Y.As a solution, I read this strategy: regress the endogenous variable Xt-1 on the dependent variable (Yt-2) and other independent variables (i.e., Qt-2 and Zt-2); these Y Q and Z are all in year  t-2, while X is in t-1.  Then, from this regression, calculate the “predicted” values for X, and include them as a control-for-endogeneity (e.g., a variable named “Endogeneity-control”) in the main regression above.

Question 1: in the Main Regression above, when including the control for endogeneity (i.e., the variable  “Endogeneity-control”), do I have to lag its value? That is, do I have to include Endogeneity-control in  t-1? or just the predicted values, without lagging?

Page 4: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-4/62

Cornwell and Rupert DataCornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 YearsVariables in the file are

EXP = work experienceWKS = weeks workedOCC = occupation, 1 if blue collar, IND = 1 if manufacturing industrySOUTH = 1 if resides in southSMSA = 1 if resides in a city (SMSA)MS = 1 if marriedFEM = 1 if femaleUNION = 1 if wage set by union contractED = years of educationLWAGE = log of wage = dependent variable in regressions

These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.  See Baltagi, page 122 for further analysis.  The data were downloaded from the website for Baltagi's text.

Page 5: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-5/62

Specification: Quadratic Effect of Experience

Page 6: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-6/62

The Effect of Education on LWAGE

1 2 3 4 ... ε

What is ε? ,... + everything elAbil seity, Motivation

Ability, Motivation = f( , , , ,...)

LWAGE EDUC EXP

EDUC GENDER SMSA SOUTH

2EXP

Page 7: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-7/62

What Influences LWAGE?

1 2

3 4

Ability, Motivation

Ability, Motivat

( , ,...)

...

ε( )

Increased is associated with increases in

ion

Ability

Ability, Motivati( , ,on

LWAGE EDUC X

EXP

EDUC X

2EXP

2

...) and ε( )

What looks like an effect due to increase in may

be an increase in . The estimate of picks up

the effect of and the hidden effect of .

Ability, Motivation

Ability

Ability

EDUC

EDUC

Page 8: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-8/62

An Exogenous Influence

1 2

3 4

( , , ,...)

...

ε( )

Increased is asso

Abili

ciate

ty, Motivation

Ability, Motivation

Ability, Motiva

d with increases in

( , , ,ti .n .o

LWAGE EDU Z

Z

Z

C X

EXP

EDUC X

2EXP

2

.) and not ε( )

An effect due to the effect of an increase on will

only be an increase in . The estimate of picks up

the effect of only.

Ability, Motiv

ation

EDUC

EDUC

ED

Z

Z

UC

is an Instrumental Variable

Page 9: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-9/62

Instrumental Variables

Structure LWAGE (ED,EXP,EXPSQ,WKS,OCC,

SOUTH,SMSA,UNION) ED (MS, FEM)

Reduced Form: LWAGE[ ED (MS, FEM), EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION ]

Page 10: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-10/62

Two Stage Least Squares Strategy

Reduced Form: LWAGE[ ED (MS, FEM,X), EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION ]

Strategy (1) Purge ED of the influence of everything but MS,

FEM (and the other variables). Predict ED using all exogenous information in the sample (X and Z).

(2) Regress LWAGE on this prediction of ED and everything else.

Standard errors must be adjusted for the predicted ED

Page 11: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-11/62

OLS

Page 12: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-12/62

The weird results for the coefficient on ED happened because the instruments, MS and FEM are dummy variables. There is not enough variation in these variables.

Page 13: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-13/62

Source of Endogeneity

LWAGE = f(ED, EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION) +

ED = f(MS,FEM, EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION) + u

Page 14: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-14/62

Remove the Endogeneity

LWAGE = f(ED, EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION) + u +

Strategy Estimate u Add u to the equation. ED is uncorrelated with when u is in

the equation.

Page 15: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-15/62

Auxiliary Regression for ED to Obtain Residuals

Page 16: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-16/62

OLS with Residual (Control Function) Added

2SLS

Page 17: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-17/62

A Warning About Control Function

Page 18: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-18/62

The Problem

1

2

Cov( , ) , K variables

Cov( , ) , K variables

is

OLS regression of y on ( , ) cannot estimate ( , )

consistently. Some other estimator is needed.

Additional structure:

= + wh

endogenous

y X Y

X 0

Y 0

Y

X Y

Y Z V

ere Cov( , )= .

An estimator based on ( , ) may

be able to estimate ( , ) consistently.

instrumental variable (

Z 0

XIV) Z

Page 19: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-19/62

Instrumental Variables Framework: y = X + , K variables in X. There exists a set of K variables, Z such that

plim(Z’X/n) 0 but plim(Z’/n) = 0

The variables in Z are called instrumental variables. An alternative (to least squares) estimator of is

bIV = (Z’X)-1Z’y

We consider the following: Why use this estimator? What are its properties compared to least squares?

We will also examine an important application

Page 20: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-20/62

IV Estimators

Consistent

bIV = (Z’X)-1Z’y

= (Z’X/n)-1 (Z’X/n)β+ (Z’X/n)-1Z’ε/n

= β+ (Z’X/n)-1Z’ε/n β

Asymptotically normal (same approach to proof as for OLS)

Inefficient – to be shown.

Page 21: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-21/62

The General Result

By construction, the IV estimator is consistent. So, we have an estimator that is consistent when least squares is not.

Page 22: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-22/62

LS as an IV EstimatorThe least squares estimator is

(X X)-1Xy = (X X)-1ixiyi

= + (X X)-1ixiεi

If plim(X’X/n) = Q nonzero

plim(X’ε/n) = 0

Under the usual assumptions LS is an IV estimator X is its own instrument.

Page 23: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-23/62

IV EstimationWhy use an IV estimator? Suppose that X and

are not uncorrelated. Then least squares is neither unbiased nor consistent.

Recall the proof of consistency of least squares:

b = + (X’X/n)-1(X’/n).

Plim b = requires plim(X’/n) = 0. If this does not hold, the estimator is inconsistent.

Page 24: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-24/62

A Popular Misconception

A popular misconception. If only one variable in X is correlated with , the other coefficients are consistently estimated. False.

The problem is “smeared” over the other coefficients.

1

111

21-1

1

1

1

Suppose only the first variable is correlated with

0Under the assumptions, plim( /n) = . Then

...

.

0plim = plim( /n)

... ....

times

K

q

q

q

ε

X'ε

b- β X'X

-1 the first column of Q

Page 25: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-25/62

Asymptotic Covariance Matrix of bIV

1IV

1 -1IV IV

2 1 -1IV IV

( ) '

( )( ) ' ( ) ' ' ( )

E[( )( ) '| ] ( ) ' ( )

b Z'X Z

b b Z'X Z Z X'Z

b b X,Z Z'X Z Z X'Z

Page 26: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-26/62

Asymptotic Efficiency

Asymptotic efficiency of the IV estimator. The variance is larger than that of LS. (A large sample type of Gauss-Markov result is at work.)

(1) It’s a moot point. LS is inconsistent.(2) Mean squared error is uncertain:

MSE[estimator|β]=Variance + square of bias.

IV may be better or worse. Depends on the data

Page 27: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-27/62

Two Stage Least Squares

How to use an “excess” of instrumental variables

(1) X is K variables. Some (at least one) of the K

variables in X are correlated with ε.

(2) Z is M > K variables. Some of the variables in

Z are also in X, some are not. None of the

variables in Z are correlated with ε.

(3) Which K variables to use to compute Z’X and Z’y?

Page 28: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-28/62

Choosing the Instruments Choose K randomly? Choose the included Xs and the remainder randomly? Use all of them? How? A theorem: (Brundy and Jorgenson, ca. 1972) There is a

most efficient way to construct the IV estimator from this subset: (1) For each column (variable) in X, compute the predictions of

that variable using all the columns of Z. (2) Linearly regress y on these K predictions.

This is two stage least squares

Page 29: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-29/62

Algebraic Equivalence

Two stage least squares is equivalent to (1) each variable in X that is also in Z is replaced by

itself. (2) Variables in X that are not in Z are replaced by

predictions of that X with all the variables in Z that are not in X.

Page 30: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-30/62

2SLS Algebra

1

1

ˆ

ˆ ˆ ˆ( )

But, = ( ) and ( ) is idempotent.

ˆ ˆ ( )( ) ( ) so

ˆ ˆ( ) = a real IV estimator by the definition.

ˆNote, plim( /n) =

-1

2SLS

-1Z Z

Z Z Z

2SLS

X Z(Z'Z) Z'X

b X'X X'y

Z(Z'Z) Z'X I -M X I -M

X'X= X' I -M I -M X= X' I -M X

b X'X X'y

X' 0

-1

ˆ since columns of are linear combinations

of the columns of , all of which are uncorrelated with

( ) ] ( )

2SLS Z Z

X

Z

b X' I -M X X' I -M y

Page 31: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-31/62

Asymptotic Covariance Matrix for 2SLS

2 1 -1IV IV

2 1 -12SLS 2SLS

General Result for Instrumental Variable Estimation

E[( )( ) '| ] ( ) ' ( )

ˆSpecialize for 2SLS, using = ( )

ˆ ˆ ˆ ˆE[( )( ) '| ] ( ) ' ( )

Z

b b X,Z Z'X Z Z X'Z

Z X= I -M X

b b X,Z X'X X X X'X

2 1 -1

2 1

ˆ ˆ ˆ ˆ ˆ ˆ ( ) ' ( )

ˆ ˆ ( )

X'X X X X'X

X'X

Page 32: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-32/62

2SLS Has Larger Variance than LS

2 -1

2 -1

A comparison to OLS

ˆ ˆAsy.Var[2SLS]= ( ' )

Neglecting the inconsistency,

Asy.Var[LS] = ( ' )

(This is the variance of LS around its mean, not )

Asy.Var[2SLS] Asy.Var[LS] in the matrix sense.

Com

X X

X X

β

-1 -1 2

2 2Z Z

pare inverses:

ˆ ˆ{Asy.Var[LS]} - {Asy.Var[2SLS]} (1/ )[ ' ' ]

(1/ )[ ' '( ) ]=(1/ )[ ' ]

This matrix is nonnegative definite. (Not positive definite

as it might have some rows and columns

X X- X X

X X- X I M X X M X

which are zero.)

Implication for "precision" of 2SLS.

The problem of "Weak Instruments"

Page 33: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-33/62

Estimating σ2

2

2 n1i 1 in

Estimating the asymptotic covariance matrix -

a caution about estimating .

ˆSince the regression is computed by regressing y on ,

one might use

ˆ (y )ˆ

This is i

2sls

x

x'b

2 n1i 1 in

nconsistent. Use

(y )ˆ

(Degrees of freedom correction is optional. Conventional,

but not necessary.)

2slsx'b

Page 34: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-34/62

Cornwell and Rupert DataCornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 YearsVariables in the file are

EXP = work experienceWKS = weeks workedOCC = occupation, 1 if blue collar, IND = 1 if manufacturing industrySOUTH = 1 if resides in southSMSA = 1 if resides in a city (SMSA)MS = 1 if marriedFEM = 1 if femaleUNION = 1 if wage set by union contractED = years of educationLWAGE = log of wage = dependent variable in regressions

These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.  See Baltagi, page 122 for further analysis.  The data were downloaded from the website for Baltagi's text.

Page 35: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-35/62

Endogeneity Test? (Hausman) Exogenous Endogenous

OLS Consistent, Efficient Inconsistent 2SLS Consistent, Inefficient Consistent

Base a test on d = b2SLS - bOLS

Use a Wald statistic, d’[Var(d)]-1d

What to use for the variance matrix? Hausman: V2SLS - VOLS

Page 36: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-36/62

Hausman Test

Page 37: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-37/62

Hausman Test: One at a Time?

Page 38: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-38/62

Endogeneity Test: Wu Considerable complication in Hausman test

(text, pp. 322-323) Simplification: Wu test. Regress y on X and X^ estimated for the

endogenous part of X. Then use an ordinary Wald test.

Page 39: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-39/62

Wu Test

Note: .05544 + .54900 = .60444, which is the 2SLS coefficient on ED.

Page 40: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-40/62

Alternative to Hausman’s Formula? H test requires the difference between an

efficient and an inefficient estimator.

Any way to compare any two competing estimators even if neither is efficient?

Bootstrap? (Maybe)

Page 41: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-41/62

Page 42: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-42/62

Measurement Error y = x* + all of the usual assumptions

x = x* + u the true x* is not observed (education vs. years of school)

What happens when y is regressed on x? Least squares attenutation:

cov(x,y) cov(x* u, x* )plim b =

var(x) var(x* u)

var(x*) = <

var(x*) var(u)

Page 43: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-43/62

Why Is Least Squares Attenuated?

y = x* + x = x* + u

y = x + ( - u)

y = x + v, cov(x,v) = - var(u)

Some of the variation in x is not associated with variation in y. The effect of variation in x on y is dampened by the measurement error.

Page 44: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-44/62

Measurement Error in Multiple Regression

1 1 2 2

1 1 1

2

1 2

Multiple regression: y = x * x *

x * is measured with error; x x * u

x is measured with out error.

The regression is estimated by least squares

Popular myth #1. b is biased downward, b consiste

ij i j

ij

nt.

Popular myth #2. All coefficients are biased toward zero.

Result for the simplest case. Let

cov(x*,x *),i, j 1,2 (2x2 covariance matrix)

ijth element of the inverse of the covariance matrix

2

2 12

1 1 2 2 12 11 2 11

var(u)

For the least squares estimators:

1plim b , plim b

1 1

The effect is called "smearing."

Page 45: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-45/62

Twins

Application from the literature: Ashenfelter/Kreuger: A wage equation for twins that includes “schooling.”

Page 46: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-46/62

Orthodoxy

A proxy is not an instrumental variable

Instrument is a noun, not a verb

Are you sure that the instrument really exogenous? The “natural experiment.”

Page 47: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-47/62

The First IV Study(Snow, J., On the Mode of Communication of Cholera, 1855)

London Cholera epidemic, ca 1853-4 Cholera = f(Water Purity,u)+ε.

Effect of water purity on cholera? Purity=f(cholera prone environment (poor, garbage in

streets, rodents, etc.). Regression does not work.

Two London water companies

Lambeth Southwark

======|||||======

Main sewage discharge

Paul Grootendorst: A Review of Instrumental Variables Estimation of Treatment Effects…http://individual.utoronto.ca/grootendorst/pdf/IV_Paper_Sept6_2007.pdf

Page 48: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-48/62

IV Estimation

Cholera=f(Purity,u)+ε Z = water company Cov(Cholera,Z)=δCov(Purity,Z) Z is randomly mixed in the population (two full

sets of pipes) and uncorrelated with behavioral unobservables, u)

Cholera=α+δPurity+u+ε Purity = Mean+random variation+λu Cov(Cholera,Z)= δCov(Purity,Z)

Page 49: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-49/62

Autism: Natural Experiment

Autism ----- Television watching Which way does the causation go? We need an instrument: Rainfall

Rainfall effects staying indoors which influences TV watching

Rainfall is definitely absolutely truly exogenous, so it is a perfect instrument.

The correlation survives, so TV “causes” autism.

Page 50: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-50/62

Two Problems with 2SLS

Z’X/n may not be sufficiently large. The covariance matrix for the IV estimator is Asy.Cov(b ) = σ2[(Z’X)(Z’Z)-1(X’Z)]-1

If Z’X/n -> 0, the variance explodes. Additional problems:

2SLS biased toward plim OLS Asymptotic results for inference fall apart.

When there are many instruments, is too close to X; 2SLS becomes OLS.

Page 51: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-51/62

Weak Instruments

Symptom: The relevance condition, plim Z’X/n not zero, is close to being violated.

Detection: Standard F test in the regression of xk on Z. F < 10 suggests a

problem. F statistic based on 2SLS – see text p. 351.

Remedy: Not much – most of the discussion is about the condition, not

what to do about it. Use LIML? Requires a normality assumption. Probably not too

restrictive.

Page 52: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-52/62

A study of moral hazardRiphahn, Wambach, Million: “Incentive Effects in the Demand for Healthcare”Journal of Applied Econometrics, 2003

Did the presence of the ADDON insurance influence the demand for health care – doctor visits and hospital visits?

For a simple example, we examine the PUBLIC insurance (89%) instead of ADDON insurance (2%).

Page 53: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-53/62

Application: Health Care Panel Data

German Health Care Usage Data, 7,293 Individuals, Varying Numbers of PeriodsVariables in the file areData downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice.  This is a large data set.  There are altogether 27,326 observations.  The number of observations ranges from 1 to 7.  (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).  Note, the variable NUMOBS below tells how many observations there are for each person.  This variable is repeated in each row of the data for the person.  (Downloaded from the JAE Archive) DOCTOR = 1(Number of doctor visits > 0) HOSPITAL = 1(Number of hospital visits > 0) HSAT =  health satisfaction, coded 0 (low) - 10 (high)   DOCVIS =  number of doctor visits in last three months HOSPVIS =  number of hospital visits in last calendar year PUBLIC =  insured in public health insurance = 1; otherwise = 0 ADDON =  insured by add-on insurance = 1; otherswise = 0 HHNINC =  household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped) HHKIDS = children under age 16 in the household = 1; otherwise = 0 EDUC =  years of schooling AGE = age in years MARRIED = marital status EDUC = years of education

Page 54: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-54/62

Evidence of Moral Hazard?

Page 55: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-55/62

Regression Study

Page 56: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-56/62

Endogenous Dummy Variable

Doctor Visits = f(Age, Educ, Health, Presence of Insurance, Other unobservables)

Insurance = f(Expected Doctor Visits, Other unobservables)

Page 57: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-57/62

Approaches

(Parametric) Control Function: Build a structural model for the two variables (Heckman)

(Semiparametric) Instrumental Variable: Create an instrumental variable for the dummy variable (Barnow/Cain/ Goldberger, Angrist, Current generation of researchers)

(?) Propensity Score Matching (Heckman et al., Becker/Ichino, Many recent researchers)

Page 58: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-58/62

Heckman’s Control Function Approach Y = xβ + δT + E[ε|T] + {ε - E[ε|T]} λ = E[ε|T] , computed from a model for whether T = 0 or 1

Magnitude = 11.1200 is nonsensical in this context.

Page 59: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-59/62

Instrumental Variable Approach

Construct a prediction for T using only the exogenous information

Use 2SLS using this instrumental variable.

Magnitude = 23.9012 is also nonsensical in this context.

Page 60: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-60/62

Propensity Score Matching Create a model for T that produces probabilities for T=1: “Propensity Scores” Find people with the same propensity score – some with T=1, some with T=0 Compare number of doctor visits of those with T=1 to those with T=0.

Page 61: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-61/62

Treatment Effect

Earnings and Education: Effect of an additional year of schooling

Estimating Average and Local Average Treatment Effects of Education when Compulsory Schooling Laws Really Matter Philip Oreopoulos AER, 96,1, 2006, 152-175

Page 62: Part 13: Endogeneity 13-1/62 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 13: Endogeneity13-62/62

Treatment Effects and Natural Experiments