Chapter 2: Variably Parametric Nonlinear Regression with Endogenous Switching James Cunningham (September, 2012)


Introduction

• Most empirical research in health economics (HE) focuses on measurement of policy-relevant

causal effects: what effect would an exogenously mandated change in the (policy) variable

have on the outcome of interest?

• HE is replete with nonlinear outcomes: non-negative; count-valued; highly skewed; etc.

• The dissertation as a whole treats practical methods of estimating endogenous treatment

effects in nonlinear models.

• This paper — Chapter 2 — develops some flexible but parametric estimators in the case of

binary endogenous switching, methods which are either

o Minimally parametric (requiring specification of the conditional mean)

o Full information (requiring specification of the conditional density)


Introduction (contd)

• These form the foundation of the dissertation, drawing upon the research of Terza (1998, 2009, etc.).

• I demonstrate two estimators:

o Minimally parametric, with specification of a conditional mean; as an example, we use an exponential conditional mean with a linear index.

o Fully parametric, with specification of the conditional density of the outcome; as an example, we use the three-parameter generalized gamma (Manning et al. [2005]).

• In the sections that follow we introduce the estimation objective (average treatment effect);

give detail on the estimators; provide a Monte Carlo study of their efficiency properties; and

apply them to real data.


Estimation Objective: Average Treatment Effect from a Potential Outcomes Perspective

• Consider measurement of the effect of a policy-relevant variable Xp on an outcome Y.

• Distinguish between the observed Xp and its exogenously mandated counterpart $X_p^*$, and similarly between Y and its potential (possibly counterfactual) value $Y_{X_p^*}$.

• Then the average treatment effect is given by

$$E[Y_1] - E[Y_0] \tag{1}$$

• Due to the (possibly) counterfactual nature of the random variables $Y_1$ and $Y_0$, (1) cannot be estimated directly.


Estimation Objective: Average Treatment Effect (contd)

• But when controlling for a comprehensive set of variables Xo (observed), Xu (unobserved),

we can iterate expectations:

$$\begin{aligned}
ATE &= E[Y_1] - E[Y_0] \\
&= E_{X_o, X_u}\Big[ E[Y \mid X_p = 1, X_o, X_u] - E[Y \mid X_p = 0, X_o, X_u] \Big]
\end{aligned} \tag{2}$$

• When Xu is correlated with Xp, ignoring the unobserved Xu will spuriously attribute some of its effect to Xp.

• We can recover causal interpretation by formalizing the correlation between Xp and Xu , as in

$$X_p = 1(W\alpha + X_u > 0) \tag{3}$$

where $1(\cdot)$ is a standard indicator function, $W = [X_o \;\; W^{+}]$, $W^{+}$ is a vector of identifying instrumental variables, and $(X_u \mid W) \sim N(0, 1)$.
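The spurious attribution described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the slides: a single instrument W ~ U(0, 1), a scalar `alpha`, and a noiseless outcome Y = exp(Xp*beta_p + Xu*beta_u). It compares the naive ratio of treated to untreated mean outcomes with the causal ratio exp(beta_p).

```python
import math
import random

def simulate_naive_ratio(n=200_000, beta_p=1.0, beta_u=0.5, alpha=1.0, seed=0):
    """Return (naive treated/untreated mean ratio, causal ratio exp(beta_p)).

    Hypothetical DGP for illustration only:
    W ~ U(0, 1), Xu ~ N(0, 1), Xp = 1(W*alpha + Xu > 0),
    Y = exp(Xp*beta_p + Xu*beta_u).
    """
    rng = random.Random(seed)
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        w = rng.random()
        xu = rng.gauss(0.0, 1.0)
        xp = 1 if w * alpha + xu > 0 else 0   # switching rule as in (3)
        y = math.exp(xp * beta_p + xu * beta_u)
        sums[xp] += y
        counts[xp] += 1
    naive = (sums[1] / counts[1]) / (sums[0] / counts[0])
    return naive, math.exp(beta_p)

naive_ratio, causal_ratio = simulate_naive_ratio()
print(f"naive ratio  = {naive_ratio:.3f}")
print(f"causal ratio = {causal_ratio:.3f}")
```

Because treated units have systematically larger Xu, the naive ratio overstates the causal ratio whenever beta_u is positive.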


Estimation Objective: Average Treatment Effect (contd)

• By iterating expectations, we can then write (2) as

$$ATE = E_{X_o}\left[ \int_{-\infty}^{\infty} \Big\{ E[Y \mid X_p = 1, X_o, X_u] - E[Y \mid X_p = 0, X_o, X_u] \Big\} \phi(X_u)\, dX_u \right] \tag{4}$$

• Then an estimator of (1), through (2) and (3), is

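$$\widehat{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left[ \int_{-\infty}^{\infty} \Big\{ \hat{E}[Y \mid X_p = 1, X_{oi}, X_u] - \hat{E}[Y \mid X_p = 0, X_{oi}, X_u] \Big\} \phi(X_u)\, dX_u \right] \tag{5}$$

where $\hat{E}[\cdot]$ denotes an estimate of an expected value.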

• We thus proceed by specifying estimators as if Xu were observed, just one variable among

others.


Endogenous Treatment Effects in Continuous Nonnegative Models

• Consider the common specification

$$E[Y \mid X_p, X_o, X_u] = \exp(X_p\beta_p + X_o\beta_o + X_u\beta_u) \tag{6}$$

• After some algebra the treatment effect from (5) can be written

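$$\widehat{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left[ \exp\big(X_{oi}\hat{\beta}_o^{+}\big) \big( \exp(\hat{\beta}_p) - 1 \big) \right] \tag{7}$$

where $\hat{\beta}$ denotes an estimate of $\beta$, and $\hat{\beta}_o^{+}$ is $\hat{\beta}_o$ with its constant term shifted by $\tfrac{1}{2}\hat{\beta}_u^2$.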

• We consider minimally and fully parametric approaches to the estimation of the parameters

necessary for (7).
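The algebra linking (5) and (7) rests on the fact that, for Xu ~ N(0, 1), the integral of exp(Xu*beta_u) against the standard normal density equals exp(beta_u^2 / 2). The following minimal sketch checks this numerically. The values in `xo_indices` are made up, and each one stands for the scalar index Xo*beta_o of one observation.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal, supplies the density phi(Xu)

def cond_mean(xp, xo_index, xu, beta_p, beta_u):
    """Conditional mean (6); xo_index stands for the scalar Xo*beta_o."""
    return math.exp(xp * beta_p + xo_index + xu * beta_u)

def effect_by_quadrature(xo_index, beta_p, beta_u, lo=-10.0, hi=10.0, steps=4000):
    """Summand of (5): integrate the treated-minus-untreated mean over Xu (trapezoid rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        xu = lo + j * h
        diff = (cond_mean(1, xo_index, xu, beta_p, beta_u)
                - cond_mean(0, xo_index, xu, beta_p, beta_u))
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * diff * N01.pdf(xu)
    return total * h

def effect_closed_form(xo_index, beta_p, beta_u):
    """Summand of (7): the constant term is shifted by beta_u**2 / 2."""
    return math.exp(xo_index + 0.5 * beta_u ** 2) * (math.exp(beta_p) - 1.0)

def ate_hat(xo_indices, beta_p, beta_u):
    """Sample average (7) over observations."""
    return sum(effect_closed_form(v, beta_p, beta_u) for v in xo_indices) / len(xo_indices)

xo_indices = [0.1, 0.4, 0.75, 1.2]  # made-up values of Xo*beta_o
numeric = sum(effect_by_quadrature(v, 1.0, 0.5) for v in xo_indices) / len(xo_indices)
closed = ate_hat(xo_indices, 1.0, 0.5)
print(numeric, closed)
```

The two numbers should agree to many decimal places, which is why the estimator never needs to integrate over Xu explicitly in the exponential case.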


Endogenous Treatment Effects in Continuous Nonnegative Models: Minimally Parametric

• If the conditional mean assumption (6) holds, no further assumption is required (beyond the

relationship between Xp and Xu ).

• To derive consistent estimates of the parameters, it can be shown that

$$E[Y \mid X_p, W] = \exp\big(X_p\beta_p + X_o\beta_o^{+}\big) \left[ X_p \frac{\Phi(\beta_u + W\alpha)}{\Phi(W\alpha)} + (1 - X_p) \frac{1 - \Phi(\beta_u + W\alpha)}{1 - \Phi(W\alpha)} \right] \tag{8}$$

• (8) can be employed in estimation via a two-step procedure: probit estimation of α in the first stage, and nonlinear least squares based on (8), with the first-stage estimate of α plugged in, in the second.
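The bracketed correction factor in (8) can be checked by simulation. The sketch below is illustrative only: the parameter values are made up, `index` stands for Xp*beta_p + Xo*beta_o, and `w_alpha` stands for the scalar index W*alpha. It averages exp(index + Xu*beta_u) over draws of Xu that are consistent with the observed Xp and compares the result with the right-hand side of (8).

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()

def mean_given_observables(xp, index_plus, w_alpha, beta_u):
    """Right-hand side of (8); index_plus stands for Xp*beta_p + Xo*beta_o_plus."""
    if xp == 1:
        ratio = N01.cdf(beta_u + w_alpha) / N01.cdf(w_alpha)
    else:
        ratio = (1.0 - N01.cdf(beta_u + w_alpha)) / (1.0 - N01.cdf(w_alpha))
    return math.exp(index_plus) * ratio

def simulated_mean(xp, index, w_alpha, beta_u, n=200_000, seed=1):
    """Average exp(index + Xu*beta_u) over Xu draws that reproduce the observed Xp."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(n):
        xu = rng.gauss(0.0, 1.0)
        if (w_alpha + xu > 0) == (xp == 1):
            total += math.exp(index + xu * beta_u)
            kept += 1
    return total / kept

beta_p, beta_u, xo_beta_o, w_alpha = 1.0, 0.5, 0.3, 0.8  # made-up values
results = {}
for xp in (0, 1):
    index = xp * beta_p + xo_beta_o          # index of (6) without the Xu term
    index_plus = index + 0.5 * beta_u ** 2   # constant shifted by beta_u^2 / 2
    results[xp] = (mean_given_observables(xp, index_plus, w_alpha, beta_u),
                   simulated_mean(xp, index, w_alpha, beta_u))
    print(xp, results[xp])
```

In a second-stage nonlinear least squares fit, a function like `mean_given_observables` would serve as the regression function, with `w_alpha` replaced by the first-stage probit index.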


Endogenous Treatment Effects in Continuous Nonnegative Models: Fully Parametric

• When further assumptions can or must be made, we must consider a full-information version

of the model above. Letting gg refer to the generalized gamma, assume that

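$$f(Y \mid X_o, X_p, X_u) = gg(Y \mid X; \mu, \kappa, \sigma) = \frac{\gamma^{\gamma}}{\sigma Y \sqrt{\gamma}\, \Gamma(\gamma)} \exp\big( Z\sqrt{\gamma} - U \big) \tag{9}$$

where $X = [X_p \;\; X_o \;\; X_u]$, $\mu = X_p\beta_p + X_o\beta_o + X_u\beta_u$, $\gamma = |\kappa|^{-2}$, $Z = \operatorname{sgn}(\kappa)(\log Y - \mu)/\sigma$, and $U = \gamma \exp(|\kappa| Z)$.

• The generalized gamma is highly flexible: it fits the nonnegative, highly skewed outcomes common in HE, and subsumes many popular distributions (gamma, Weibull, exponential, lognormal).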
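As a sanity check on the density, the following minimal sketch codes the generalized gamma log-density in the Manning et al. (2005) parametrization, with the square-root and absolute-value terms written out explicitly, and confirms numerically that it integrates to one. The function names and the integration grid are illustrative choices, not part of the original slides.

```python
import math

def gg_logpdf(y, mu, kappa, sigma):
    """Log generalized gamma density; requires y > 0, sigma > 0, kappa != 0.

    gamma = |kappa|^-2, z = sgn(kappa) * (log y - mu) / sigma, u = gamma * exp(|kappa| * z)
    """
    g = abs(kappa) ** -2
    z = math.copysign(1.0, kappa) * (math.log(y) - mu) / sigma
    u = g * math.exp(abs(kappa) * z)
    return (g * math.log(g) - math.log(sigma) - math.log(y)
            - 0.5 * math.log(g) - math.lgamma(g)
            + z * math.sqrt(g) - u)

def integrate_density(mu, kappa, sigma, lo=-12.0, hi=12.0, steps=20000):
    """Trapezoid rule in t = log(y); the integrand is pdf(y) * y after the change of variable."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        t = lo + j * h
        value = math.exp(gg_logpdf(math.exp(t), mu, kappa, sigma) + t)
        total += (0.5 if j in (0, steps) else 1.0) * value
    return total * h

area_pos = integrate_density(mu=0.5, kappa=0.8, sigma=0.4)
area_neg = integrate_density(mu=0.5, kappa=-0.8, sigma=0.4)
print(area_pos, area_neg)
```

Working on the log scale avoids overflow in the Gamma function and makes the same function directly usable inside a log-likelihood.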


Endogenous Treatment Effects in Continuous Nonnegative Models: Fully Parametric

• Further:

$$E[Y \mid X_p, X_o, X_u] = \exp(\mu + k) \tag{10}$$

where $k = (\sigma/\kappa)\log(\kappa^{2}) + \log\Gamma\big(\kappa^{-2} + \sigma/\kappa\big) - \log\Gamma\big(\kappa^{-2}\big)$.

• Thus the average treatment effect estimator takes the form of (7), after adding the correction k to the constant term.
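The correction k can be checked by simulation. This sketch assumes the standard representation of a generalized gamma variate as a transformed gamma draw, log Y = mu + (sigma/kappa) * log(kappa^2 * G) with G ~ Gamma(kappa^-2, 1), which is consistent with the Manning et al. (2005) density. The parameter values are illustrative.

```python
import math
import random

def gg_mean_shift(kappa, sigma):
    """Correction k in (10): log E[Y] - mu. Requires kappa**-2 + sigma/kappa > 0."""
    g = kappa ** -2
    return ((sigma / kappa) * math.log(kappa ** 2)
            + math.lgamma(g + sigma / kappa) - math.lgamma(g))

def draw_gg(rng, mu, kappa, sigma):
    """Draw Y by transforming a Gamma(kappa**-2, 1) variate (assumed representation)."""
    g = rng.gammavariate(kappa ** -2, 1.0)
    return math.exp(mu + (sigma / kappa) * math.log(kappa ** 2 * g))

rng = random.Random(42)
mu, kappa, sigma = 0.25, 0.8, 0.4   # illustrative values
draws = [draw_gg(rng, mu, kappa, sigma) for _ in range(200_000)]
simulated = sum(draws) / len(draws)
analytic = math.exp(mu + gg_mean_shift(kappa, sigma))
print(simulated, analytic)
```

The simulated and analytic means should agree up to Monte Carlo error. Note that k is negative for these values, so ignoring it would overstate the conditional mean.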

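• It can be shown that the log-likelihood is

$$L(\alpha, \beta, \mu, \kappa, \sigma \mid Y, X_p; W) = \sum_{i=1}^{n} \left\{ X_{pi} \log\left( \int_{-W_i\alpha}^{\infty} gg(Y_i \mid X_i; \kappa, \mu, \sigma)\, \phi(X_u)\, dX_u \right) + (1 - X_{pi}) \log\left( \int_{-\infty}^{-W_i\alpha} gg(Y_i \mid X_i; \kappa, \mu, \sigma)\, \phi(X_u)\, dX_u \right) \right\} \tag{11}$$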

• The parameters β and α can be jointly estimated via maximum likelihood using (11).
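One observation's contribution to the log-likelihood (11) can be sketched as follows. This is not the author's implementation: it replaces the infinite integration limit with a finite bound and uses a simple trapezoid rule, where a production routine would more likely use Gauss-Hermite or adaptive quadrature. The names `index` (for Xp*beta_p + Xo*beta_o) and `w_alpha` (for the scalar W*alpha) are illustrative.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def gg_pdf(y, mu, kappa, sigma):
    """Generalized gamma density in the Manning et al. (2005) parametrization."""
    g = abs(kappa) ** -2
    z = math.copysign(1.0, kappa) * (math.log(y) - mu) / sigma
    u = g * math.exp(abs(kappa) * z)
    log_f = (g * math.log(g) - math.log(sigma * y) - 0.5 * math.log(g)
             - math.lgamma(g) + z * math.sqrt(g) - u)
    return math.exp(log_f)

def loglik_contribution(y, xp, index, w_alpha, beta_u, kappa, sigma,
                        bound=8.0, steps=2000):
    """One summand of (11); integrates over the Xu region implied by the observed Xp.

    Assumes |w_alpha| < bound, so truncating the infinite limit at +/- bound is harmless.
    """
    lo, hi = (-w_alpha, bound) if xp == 1 else (-bound, -w_alpha)
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        xu = lo + j * h
        value = gg_pdf(y, index + xu * beta_u, kappa, sigma) * N01.pdf(xu)
        total += (0.5 if j in (0, steps) else 1.0) * value
    return math.log(total * h)

# Illustrative values for a single observation
ll_treated = loglik_contribution(y=2.0, xp=1, index=0.7, w_alpha=0.8,
                                 beta_u=0.5, kappa=0.8, sigma=0.4)
ll_untreated = loglik_contribution(y=2.0, xp=0, index=0.7, w_alpha=0.8,
                                   beta_u=0.5, kappa=0.8, sigma=0.4)
print(ll_treated, ll_untreated)
```

When beta_u is zero the density no longer depends on Xu, so the contribution for a treated observation collapses to log(gg) + log(Phi(W*alpha)), the sum of an exogenous outcome model and a probit. This gives a convenient check on the quadrature.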


Monte Carlo Simulations

• To evaluate the consistency properties of the above estimators, we undertake a Monte Carlo

study. In all simulations the data generating process takes the following form:

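$$\begin{aligned}
& X_o \sim U(-0.5, 1), \quad W \sim U(0, 1), \quad X_u \sim N(0, 1) \\
& X_p = 1(X_o\alpha_o + W\alpha_w + \alpha_c + X_u > 0) \\
& \mu = X_p\beta_p + X_o\beta_o + X_u\beta_u + \beta_c, \quad \kappa = 0.8, \quad \sigma = 0.4 \\
& Y \sim \text{GeneralizedGamma}(\mu, \sigma, \kappa) \\
& [\alpha_o \;\; \alpha_w \;\; \alpha_c] = [1 \;\; 1 \;\; 0.5] \\
& [\beta_p \;\; \beta_o \;\; \beta_u \;\; \beta_c] = [1 \;\; 1 \;\; 0.5 \;\; 0.25]
\end{aligned}$$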

• The average treatment effect was estimated with each of the estimators described above.
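The data generating process above can be simulated along the following lines. This is a sketch rather than the code used for the reported results: it assumes that GeneralizedGamma(μ, σ, κ) refers to the Manning et al. (2005) parametrization, and it draws Y by transforming a gamma variate.

```python
import math
import random

# Parameter values as stated in the data generating process
ALPHA_O, ALPHA_W, ALPHA_C = 1.0, 1.0, 0.5
BETA_P, BETA_O, BETA_U, BETA_C = 1.0, 1.0, 0.5, 0.25
KAPPA, SIGMA = 0.8, 0.4

def draw_observation(rng):
    """Draw one observation (Y, Xp, Xo, W); Xu is generated but not returned."""
    xo = rng.uniform(-0.5, 1.0)
    w = rng.random()                       # instrument, U(0, 1)
    xu = rng.gauss(0.0, 1.0)               # unobserved confounder
    xp = 1 if xo * ALPHA_O + w * ALPHA_W + ALPHA_C + xu > 0 else 0
    mu = xp * BETA_P + xo * BETA_O + xu * BETA_U + BETA_C
    # Assumed representation: log Y = mu + (sigma/kappa) * log(kappa^2 * G), G ~ Gamma(kappa^-2, 1)
    g = rng.gammavariate(KAPPA ** -2, 1.0)
    y = math.exp(mu + (SIGMA / KAPPA) * math.log(KAPPA ** 2 * g))
    return {"y": y, "xp": xp, "xo": xo, "w": w}

def simulate(n, seed=0):
    """Return a list of n simulated observations."""
    rng = random.Random(seed)
    return [draw_observation(rng) for _ in range(n)]

sample = simulate(5_000)
share_treated = sum(row["xp"] for row in sample) / len(sample)
print(f"share treated: {share_treated:.3f}")
```

Under these parameter values most observations are treated, so the untreated group is comparatively small in each replication.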


Monte Carlo Simulations (contd)

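With 500 repetitions at each of the sample sizes 5,000; 10,000; 50,000; and 100,000, we compute the absolute percentage bias for each parameter:

$$ABP(\hat{\beta}) = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{\hat{\beta}_i - \beta}{\beta} \right|$$

where $\hat{\beta}_i$ is the estimate from repetition i and m is the number of repetitions.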
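The absolute percentage bias can be computed with a few lines of code. The replicated estimates below are made up for illustration.

```python
def abp(estimates, true_value):
    """Mean absolute percentage deviation of replicated estimates from the true value."""
    if true_value == 0:
        raise ValueError("true_value must be nonzero")
    return sum(abs((b - true_value) / true_value) for b in estimates) / len(estimates)

replications = [0.95, 1.02, 1.10, 0.99]  # made-up estimates; true value is 1
print(f"{abp(replications, 1.0):.2%}")
```

Because deviations are averaged in absolute value, ABP can be sizeable even when the mean estimate is very close to the truth, so it reflects sampling variability as well as bias.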

Endogenous Treatment: Minimally Parametric Exponential Conditional Mean Estimator

True values: βp = 1, βo = 1, βu = 0.5, βc = 0.25, ATE = 2.22

| n | βp Est | βp ABP | βo Est | βo ABP | βu Est | βu ABP | βc Est | βc ABP | ATE Est | ATE ABP |
|---|---|---|---|---|---|---|---|---|---|---|
| 5,000 | 0.995 | 7.65% | 1.002 | 2.82% | 0.504 | 11.24% | 0.247 | 12.97% | 2.201 | 6.24% |
| 10,000 | 0.996 | 5.58% | 1.002 | 1.97% | 0.504 | 8.14% | 0.249 | 9.88% | 2.208 | 4.47% |
| 50,000 | 1.002 | 2.38% | 1.000 | 0.90% | 0.498 | 3.55% | 0.249 | 4.07% | 2.219 | 1.91% |
| 100,000 | 0.998 | 1.72% | 1.000 | 0.67% | 0.501 | 2.53% | 0.250 | 2.84% | 2.212 | 1.41% |


Monte Carlo Simulations (contd)

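Endogenous Treatment: Full-Information Generalized Gamma Estimator

True values: βp = 1, βo = 1, βu = 0.5, βc = 0.25

| n | βp Est | βp ABP | βo Est | βo ABP | βu Est | βu ABP | βc Est | βc ABP |
|---|---|---|---|---|---|---|---|---|
| 5,000 | 1.008 | 2.20% | 0.998 | 0.91% | 0.494 | 2.60% | 0.240 | 8.34% |
| 10,000 | 1.007 | 1.62% | 0.999 | 0.67% | 0.495 | 1.80% | 0.243 | 6.12% |
| 50,000 | 1.006 | 0.86% | 0.999 | 0.32% | 0.496 | 1.04% | 0.243 | 3.45% |
| 100,000 | 1.006 | 0.71% | 0.999 | 0.23% | 0.496 | 0.92% | 0.243 | 3.03% |

True values: ATE = 2.22, κ = 0.8, σ = 0.4

| n | ATE Est | ATE ABP | κ Est | κ ABP | σ Est | σ ABP |
|---|---|---|---|---|---|---|
| 5,000 | 2.226 | 2.12% | 0.773 | 7.10% | 0.406 | 2.95% |
| 10,000 | 2.229 | 1.64% | 0.777 | 5.25% | 0.406 | 2.25% |
| 50,000 | 2.227 | 0.86% | 0.777 | 3.21% | 0.406 | 1.53% |
| 100,000 | 0.223 | 0.62% | 0.778 | 2.83% | 0.405 | 1.38% |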


Monte Carlo Simulations (contd)

• On average, the estimators recover the true parameter values relatively well.

• There are clear efficiency advantages to using the full-information estimator — percentage

biases are low even in small samples.

• In small samples, the minimally parametric estimate of βu appears subject to some bias, but the implications for treatment effect estimation seem minimal.

• In future revisions simulations should draw upon correct standard errors to characterize the

seriousness of these implications in determining (and correcting for) endogeneity bias in small

samples.


Real Data Example

• To provide an empirical demonstration, we applied both estimators above to the birthweight

data from Mullahy (1997), who investigated the role played by maternal cigarette smoking in

determining birthweight.

• Consider birthweight production to be a function of a binary indicator (cig) for whether the

mother smoked during pregnancy, other relevant covariates ( Xo), and any unobservable

determinants of birthweight ( Xu ):

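$$E[\text{BirthWeight} \mid cig, X_o, X_u] = \exp(cig \cdot \beta_{cig} + X_o\beta_o + X_u\beta_u) \tag{12}$$

in the minimally parametric case, and

$$(\text{BirthWeight} \mid cig, X_o, X_u) \sim \text{GeneralizedGamma}(\kappa,\; \mu = cig \cdot \beta_{cig} + X_o\beta_o + X_u\beta_u,\; \sigma) \tag{13}$$

in the fully parametric case.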


Real Data Example (contd)

• The observable vector Xo contains birth order (parity), an indicator for race (white v. nonwhite), an indicator for gender, and a constant.

• The vector of instruments contains parental education, family income, and the per-state cigarette excise tax.

Results

Birthweight Model with Endogenous Treatment Effect

| | Minimally Parametric (Exp Cond Mean): Coefficient | T-Statistic | P-Value | Fully Parametric (Generalized Gamma): Coefficient | T-Statistic | P-Value |
|---|---|---|---|---|---|---|
| Smoked During Pregnancy | -0.17 | -3.82 | 0.00 | -0.15 | -7.10 | 0.00 |
| Parity | 0.02 | 3.06 | 0.00 | 0.01 | 2.81 | 0.01 |
| White | 0.06 | 4.65 | 0.00 | 0.05 | 4.19 | 0.00 |
| Male | 0.02 | 2.31 | 0.02 | 0.02 | 1.91 | 0.06 |
| Constant | 1.95 | 124.33 | 0.00 | 1.99 | 130.90 | 0.00 |
| Xu | 0.05 | 2.23 | 0.03 | 0.04 | 5.30 | 0.00 |
| Effect of Cig on B.Wt. (lbs) | -1.18 | -4.26 | 0.00 | -1.03 | -7.80 | 0.00 |
| κ | | | | 0.60 | 4.77 | 0.00 |
| σ | | | | 0.16 | 20.00 | 0.00 |

All parameter estimates significant at conventional levels. Standard errors corrected for multi-step estimation.


Real Data Example (contd)

• Results are broadly consistent between the minimally and fully parametric estimators, although there appear to be some efficiency gains from using maximum likelihood.

• In the minimally parametric case, maternal smoking appears to lead to a loss of 1.18 pounds;

and in the fully parametric case a loss of 1.03 pounds.

• Both are considerably different from a treatment effect estimate using NLS with an

exponential conditional mean that did not correct for endogeneity, which implies an average

drop in birthweight of about 0.57 pounds.

• Estimates of parameters κ and σ are statistically significant, so use of the generalized gamma

does appear to offer an opportunity for greater fit.
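Returning to the smoking coefficients in the results table: because both models have a conditional mean that is exponential in the linear index, a coefficient b on the binary smoking indicator implies a proportional change of exp(b) - 1 in mean birthweight, holding Xo and Xu fixed. The short calculation below uses the two reported coefficients; the function name is illustrative.

```python
import math

def proportional_effect(coef):
    """Proportional change in the conditional mean when a binary regressor switches from 0 to 1."""
    return math.exp(coef) - 1.0

for label, coef in [("minimally parametric", -0.17), ("fully parametric", -0.15)]:
    print(f"{label}: {proportional_effect(coef):.1%}")
```

This expresses the same estimates on a relative scale, complementing the effects in pounds reported above.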
