Upload
lena-hakes
View
223
Download
0
Embed Size (px)
Citation preview
1
Program Evaluation andPanel Discrete Data Models – Some Considerations on
Methodology and Applications
Cheng Hsiao
2
1. Essential Issues for Program Evaluation
2. Panel Discrete Choice Models
3. General Principle of Estimating Structural Parameters in the Presence of Incidental Parameters
4. Parametric Approach
5. Semi-Parametric Approach
6. Bias-Reduced Approach
7. Concluding Remarks
3
Definition of Treatment Effect
Let denote the potential outcomes of
the ith individual in the untreated and treated state. Then, the treatment effects on the ith individual are just
),( *1
*0 yy ii
yy iii*0
*1
1. Essential Issues for Program Evaluation
4
Two measures of treatment effects are of interest to policy makers:
The average treatment effects (ATE)
and the treatment effects on the treated (TT)
)( *0
*1 yyEATE ii
)group treatment |( *0
*1 iyyETT ii
5
The ATE is of interest if one is interested in the effect of treatment for a randomly assigned individual or population mean response to treatment.
The TT is of interest if the same selection rule for treatment continues in the future.
6
The observed data is in the form of ,
i = 1,…,N, where if the ith individual receives treatment and if not, and
, where i = 1,…,N
In other words, we do not simultaneously observe
and . An individual is either observed with or
. Using
as a measure of ATE could be subject to two sources of bias: selection on observables and selection on unobservables.
),( dy ii
1d i0d i
ydydy iiiii*0
*1 )1(
y i*1
y i*0 y i
*1
y i*0
N
iiiN
ii
N
iiiN
ii
dyd
dyd 1
1
1
1
)1()1(
11(1)
7
If selection into the treatment is random,
(1) can be used to estimate ATE (≡ TT).
),,( *0
*1 yyd iii
then
)()1|()1|( *1
*1 yEdyEdyE iiiii
)()0|()0|( *0
*0 yEdyEdyE iiiii
8
If
(1) will be a biased estimate of ATE or TT.
),,( *0
*1 yyd iii
then
)()1|()1|( *1
*1 yEdyEdyE iiiii
)()0|()0|( *0
*0 yEdyEdyE iiiii
9
Suppose we can decompose the outcomes in terms of the effects of observables and the effects of unobservables as
where denotes the effects of observable factors , denotes the effects of unobservable factors, j = 0,1, and .
Then
and
iii xgy 1~
1*1 )(
iii xgy 0~
0*0 )(
)(~xg ij
~xi ji
0)|(~xE iji
)]()]()([ 01~
0~
1 iiiii xgxg
)]()([~
0~
1 xgxgEATE
10
Selection on observables
Selection on unobservables)1|()( 11 dEE iii
)0|()( 00 dEE iii
11
Conditional Independence Assumption – selection is ignorable after controlling a set of observable confounders x
~
*0
*1 |),( xyyd iiii
),|(),|(~
*1
~
*1 xyExdyE iiiii
),|(),|(~
*0
~
*0 xyExdyE iiiii
)|()|(
),0|(),1|(
)|()(
~~
*0
~~
*1
~~~~
~~
*0
*1
~
xxyExxyE
xxdyExxdyE
xxyyExATE
iiii
iiiiii
iii
then
12
Adjustment Methods
1. If there is no selection on unobservables
(i.e. where j = 0,1)
1a. Propensity Score Matching Method (Rosenbaum and Rubin (1983))
Let , then
(i)
(ii)
0)|()( jdEE ijiji
1)()|1(0~~ xpxdp iii
)(|),(|),(
~
*0
*1
~
*0
*1 xpdyyxdyy iiiiiiii
)(|~~xpdx iii
13
(ii) →
If a subclass of units or a matched treatment-control pair is homogenous in , the treated and control units in that subclass or matched pair will have the same distribution of , then, at any value of a propensity score, the difference between the treatment and the control means is an unbiased estimate of the ATE at that value of the propensity score (treatment assignment is ignorable)
)(~xp
x~
))(|())(,0|())(,1|(~~~~~~xpxfxpdxfxpdxf
14
)0),(|()1),(|()( dxpyEdxpyEEATE iiiixp
))(|())(|( *0
*1)( xpyExpyEE iixp
)( *0
*1 yyE ii
)0),(|()1),(|(1|)( dxpyEdxpyEETT iiiid ixp
15
Issues of this approach
(i) Conditional independence assumption is not a testable hypothesis
(ii) Estimates sensitive to how blocks of
are constructed
)(xp
16
Decriminalization and
Marijuana Smoking Prevalence:
Evidence from Australia
Kannika Damrongplasit, Cheng Hsiao and Xueyan Zhao
17
The global sale of illegal drugs – US $150 billion (2001)US drugs policy costs $30-40 billion a year
18
Decriminalized States Non-decriminalized states
South Australia 1987
ACT 1992
Northern Territory 1996
Western Australia 2003
New South Wales
Queensland
Victoria
Tasmania
19
What is Decriminalization Policy?
• Reduction of penalties for possession and cultivation of marijuana for personal consumption (i.e. minor possession)
• In Australia, it is called “Expiation System”
- Still an offence to use or grow marijuana
- The offence is expiable by payment of a fine with no imprisonment and no criminal record if the fine is paid.
20
Debates on Marijuana Decriminalization
• Supporting Arguments
- Criminal offence from marijuana possession is too severe
- Allow separation of marijuana market from other harder drugs
- Reduce law enforcement and criminal justice resources
• Opposing Arguments
- Increase marijuana smoking prevalence
- Greater use of other illicit drugs
21
Sources of Data
(i) 2001 National Drug Strategy Household survey (NDSHS)- Nationally representative survey of non-institutionalized civilian population aged 14 and above- 26744 total observations- 14008 resulting samples after delete missing data- Treatment group = 2968, Control group = 11040
(ii) Australian Illicit Drug Report
(iii) Australia Bureau of Statistics
22
Variable Description
y
Decrim
PMAR
Income
Age1419
Age2024
Age2529
Age3034
Age3539
Age4069
Age70
Male
Married
Divorce
Widow
Never Married
# Depchild
Degree
Working Status
Aboriginal
Unemployment rate
1 if using marijuana in the last 12 months, otherwise 0
1 if residing in decriminalized states
ln(real price of marijuana)
ln(real household annual income before tax)
1 if age is 14 to 19 years old
1 if age is 20 to 24 years old
1 if age is 25 to 29 years old
1 if age is 30 to 34 years old
1 if age is 35 to 39 years old
1 if age is 40 to 69 years old
1 if age is 70 years old and above, it is a reference category and is omitted from the
estimation
1 for male and 0 for female
1 if married
1 if divorce
1 if widow
1 if single, it is a reference category and omitted from the estimation
Number of dependent children aged 14 or below in the household
1 if university degree
1 if respondent is unemployed, and 0 otherwise
1 if Aboriginal or Torres Strait Islander
State Unemployment rate (%)
23
Summary Statistics
Variable
All Data (N = 14008) Treatment (N = 2968) Control (N = 11040)
Mean S.D. Mean S.D. Mean S.D.
y
Decrim
PMAR
Income
Age1419
Age2024
Age2529
Age3034
Age3539
Age4069
Male
Married
Divorce
Widow
# Depchild
Degree
Working Status
Aboriginal
Unemployment rate
0.157
0.212
5.876
10.445
0.054
0.077
0.100
0.122
0.126
0.455
0.476
0.619
0.115
0.038
0.594
0.260
0.028
0.013
6.986
0.364
0.409
0.237
0.757
0.227
0.267
0.300
0.327
0.331
0.498
0.499
0.486
0.319
0.191
0.895
0.438
0.164
0.111
1.375
0.181
1
5.929
10.554
0.047
0.075
0.103
0.121
0.130
0.466
0.488
0.621
0.124
0.033
0.583
0.292
0.019
0.017
6.331
0.385
0
0.036
0.709
0.211
0.263
0.304
0.326
0.337
0.499
0.500
0.485
0.330
0.178
0.880
0.455
0.137
0.127
1.386
0.151
0
5.862
10.416
0.056
0.078
0.099
0.122
0.124
0.452
0.472
0.618
0.112
0.039
0.597
0.251
0.030
0.012
7.162
0.358
0
0.264
0.767
0.231
0.268
0.299
0.327
0.330
0.498
0.499
0.486
0.315
0.195
0.899
0.434
0.170
0.107
1.318
24
Non-parametric Model: Propensity Score Stratification Matching
• Propensity score is )|1Pr()( wdwp iii
• Under the assumptions,
1)|1Pr()(0 wdwp iii and wdyy iiii |, *0
*1
then
)(|),( *0
*1 wpdyy iiii and )(| wpdw iii
25
2
19
66
240
526
252
51
18
27 10
15
35
84
188
214
152
39
26 2532
37
101
170
253
222
103
34 32
61 1 2 3
01
00
200
300
400
500
Fre
qu
ency
0 .025.05.075 .1 .125.15.175 .2 .225.25.275 .3 .325.35.375 .4 .425.45.475 .5 .525.55.575 .6 .625.65.675 .7 .725.75.775 .8 .825.85.875 .9 .925.95.975 1p(w)
63
403
264
109
345
837
602
150
5423
6094
163
321
423
373
285
154
198
316
534
794
1084
1237
997
660
253
94
5640
11126 5 6 8 4 1 1
05
00
100
01
50
0
Fre
qu
ency
0 .025.05.075 .1 .125.15.175 .2 .225.25.275 .3 .325.35.375 .4 .425.45.475 .5 .525.55.575 .6 .625.65.675 .7 .725.75.775 .8 .825.85.875 .9 .925.95.975 1p(w)
26
Range of Estimated Propensity Score
Number of Treatment
observations
Number of Control observations
ATE ATET
0.05 – 0.45
Length of interval 0.025
2810 10301 0.092***
(0.026)
-0.032
(0.053)
0.05 – 0.45
With STATA interval
2810 10301 0.096***
(0.025)
-0.069*
(0.038)
0.075 – 0.425
Length of interval 0.025
2432 9509 0.074***
(0.018)
-0.056**
(0.025)
0.075 – 0.425
With STATA interval
2432 9509 0.086***
(0.020)
-0.056**
(0.025)
0.05 – 0.4
Length of interval 0.05
2116 10295 0.098***
(0.017)
-0.024†
(0.014)
0.05 – 0.4
With STATA interval
2116 10295 0.112***
(0.026)
-0.026**
(0.012)
0.1 – 0.35
Length of interval 0.05
1909 8263 0.067***
(0.016)
-0.021†
(0.015)
0.1 – 0.35
With STATA interval
1909 8263 0.059***
(0.020)
-0.022†
(0.015)
27
• Violation of )(| wpdw iii could happen because of
(1) Our sample size is not large enough to perform a reliable non-parametric estimation
- For many ranges of propensity score, there is no
overlapping observations
- Within overlapping range, t-squared tests unambiguously reject the balancing condition
or
(2) Conditional independence assumption is violated
28
Non-parametric Approach Parametric Approach
Advantages - Do not impose any
distributional assumption
-Take account of both selection on observables and unobservables- Can estimate the impact of other explanatory variables on smoking outcome in addition to the effect of decriminalization policy
Disadvantages - Conditional independence assumption is the maintained hypothesis- Only take account of selection on observables
- Need to impose both functional form and distributional assumptions
29
• Endogenous Probit Switching Model (Model 1)
iii xy 111*1
iii xy 000*0
iii zd *
1
1
1
,
0
0
0
~
01
010
110
0
1
N
i
i
i
30
Average Treatment Effect
Model 1 & 2:
If sample is randomly drawn,
Model 3 & 4:
)]ˆˆ()ˆˆˆ([1
1xx i
n
iin
)]()([1
ˆˆˆˆ 001
11 xx in
ii
n
dxxfxx )()]()([ 0011
31
Coefficient estimates for marijuana smoking equation
Variable Binary Probit Bivariate Probit
Two-part:
Treatment
Two-part:
Control
Switching:
Treatment
Switching:
Control
Decrim
PMAR
Income
Age1419
Age2024
Age2529
Age3034
Age3539
Age4069
Male
0.173***
(0.033)
-0.338***
(0.057)
-0.008
(0.021)
1.952***
(0.205)
2.105***
(0.202)
2.113***
(0.200)
1.908***
(0.200)
1.848***
(0.200)
1.212***
(0.197)
0.238***
(0.028)
0.187
(0.130)
-0.341***
(0.061)
-0.009
(0.022)
1.952***
(0.205)
2.105***
(0.202)
2.112***
(0.200)
1.908***
(0.200)
1.848***
(0.200)
1.211***
(0.197)
0.238***
(0.028)
-5.305***
(0.810)
-0.016
(0.049)
1.962***
(0.418)
2.084***
(0.409)
2.023***
(0.404)
1.854***
(0.404)
1.894***
(0.402)
1.113***
(0.398)
0.201***
(0.060)
-0.315***
(0.057)
-0.007
(0.024)
1.961***
(0.240)
2.109***
(0.237)
2.133***
(0.235)
1.920***
(0.234)
1.822***
(0.234)
1.237***
(0.232)
0.250***
(0.032)
-5.324***
(0.829)
-0.018
(0.052)
1.963***
(0.418)
2.084***
(0.409)
2.022***
(0.405)
1.853***
(0.404)
1.893***
(0.402)
1.113***
(0.398)
0.201***
(0.060)
-0.409***
(0.060)
-0.022
(0.024)
1.915***
(0.237)
2.048***
(0.235)
2.068***
(0.233)
1.869***
(0.232)
1.774***
(0.231)
1.206***
(0.228)
0.236***
(0.032)
32
Variable Binary Probit Bivariate Probit
Two-part:
Treatment
Two-part:
Control
Switching:
Treatment
Switching:
Control
Married
Divorce
Widow
Degree
Working Status
Aboriginal
Constant
ρ1υ= ρ0υ
ρ1υ
ρ0υ
-0.470***
(0.037)
-0.001
(0.053)
-0.441***
(0.127)
-0.034
(0.034)
0.256***
(0.074)
0.282**
(0.107)
-0.468
(0.429)
-0.470***
(0.037)
-0.001
(0.053)
-0.441***
(0.127)
-0.034
(0.034)
0.257***
(0.075)
0.281**
(0.107)
-0.451
(0.455)
-0.008
(0.077)
-0.429***
(0.080)
-0.051
(0.112)
-0.561†
(0.310)
-0.262***
(0.073)
0.306
(0.187)
-0.304
(0.218)
29.338***
(4.832)
-0.493***
(0.043)
0.008
(0.061)
-0.435***
(0.140)
0.047
(0.039)
0.249***
(0.081)
0.438***
(0.124)
-0.650
(0.457)
-0.428***
(0.080)
-0.053
(0.112)
-0.561†
(0.310)
-0.262***
(0.073)
0.307
(0.188)
-0.308
(0.222)
29.486***
(5.021)
-0.012
(0.109)
-0.470***
(0.044)
-0.005
(0.059)
-0.422***
(0.136)
0.042
(0.038)
0.266***
(0.080)
0.377***
(0.123)
-0.011
(0.475)
-0.469**
(0.181)
33
Average Treatment Effect and Marginal Effect
Binary Probit Bivariate Probit
Two-part:
Treatment
Two-part:
Control
Switching:
Treatment
Switching:
Control
ATE
Marginal Effect
Decrim
PMAR
Income
Age1419
Age2024
Age2529
Age3034
Age3539
Age4069
Male
Married
Divorce
Widow
Degree
Working Status
Aboriginal
0.037***
(0.0002)
0.067***
-0.127***
-0.003
0.352***
0.598***
0.598***
0.578***
0.570***
0.443***
0.085***
-0.157***
-0.0004
-0.149***
-0.013
0.099***
0.110**
0.040***
(0.0002)
0.072
-0.128***
-0.003
0.351***
0.599***
0.599***
0.578***
0.571***
0.443***
0.085***
-0.157***
-0.001
-0.148***
-0.013
0.100***
0.109**
0.137***
(0.002)
-2.101***
-0.006
0.433***
0.523***
0.520***
0.507***
0.510***
0.388***
0.078***
-0.161***
-0.020
-0.204†
-0.101***
0.121
-0.116
-0.118***
-0.003
0.351***
0.599***
0.601***
0.580***
0.567***
0.450***
0.089***
-0.163***
0.003
-0.147***
0.018
0.097***
0.172***
0.163***
(0.002)
-2.112***
-0.007
0.439***
0.518***
0.514***
0.502***
0.505***
0.385**
0.078***
-0.161***
-0.021
-0.206
-0.101***
0.122
-0.118
-0.145***
-0.008
0.304***
0.628***
0.631***
0.604***
0.588***
0.451***
0.078***
-0.144***
-0.002
-0.131***
0.015
0.099***
0.143***
34
Panel Data – Allow better control of selection on observables and unobservables
Difference-in-Difference Method
- outcome of the jth individual after the treatment
- outcome of the jth individual before the treatment
- outcome of the ith individual who did not receive treatment at time t
- outcome of the ith individual at time s
)()( yyyyE isitjsjt
y jty jsyit
yis
35
Cross-Section vs Panel Discrete Modeling
iii xy ~~
'*
0 if ,0
0 if ,1*
*
y
yy
i
ii
)|1Prob(
)Prob(
)Prob(*0)Prob(*1)|(
~
~~
'
~~
'
~~
'
~
xy
x
xxxyE
)|()|(~~~~xxyExxyE jjii
36
37
38
39
40
6. Bias-Reduced Estimator
Mean Square Error =
Suppose
)()(~~
^ '
~~
^ E
)()()()(~~
^ '
~~
^
~
^
~
^ '
~
^
~
^ EEEEE
)()()( '
~
^
biasbiasVar
n
Ob 2~~~
^ 1)( plim
,~~
^
~
^
bR
n
OR 2~~
^ 1)( plim
41
Consider the log-likelihood function of N cross-sectional units observed over T time periods,
where denotes the likelihood function of the T-time series observations for the ith individual.
For instance, consider a binary choice model of the form,
),,(),...,,(~1
1~
i
N
iiN
i
i.i.d.~~
'* , uuxy ititiitit
.0 if ,0
,0 if ,1*
*
y
yy
it
itit
42
Then
The MLE is obtained by simultaneously solving for
),|1(Prob)(),|(~~~
'
~ iitititiitiitit xyFxFxyE
T
tititititii FyFy
1~)1log()1(log),(
),...,,(^
1
^
~
^
N from
0
~
^1
~~
N
i
i(1)
.,...,1 ,0^ Nii
i
i
(2)
43
The MLE of can also be derived by first obtaining
from (2) as a function of , ,
substituting into the likelihood function
to form the concentrated log-likelihood function,
~
^
i ~
)(~
^
i
)(~
^
i
)),(,())(),...,(,(1 ~
^
~
*
~
^
~
^
1~
*
N
iiN
N
ii
N
i
ii
1^
*
1~
~
^
~
*
~
*
0))(,(
i
(3)
(4)
then solving
44
When T is finite
0))(,(
1
1~
~
^
~
*
~
N
i
i
NE
Expanding the score of the concentrated log-likelihoodaround , and evaluating it at : i
))((),())(,(^
*
~
*
~
^
~
* iiiiiii
)())((2
1 2/1^ 2* TO piiiii
(5)
45
)(),(
)()(
)(
)(
2/1*
2/1^
*
*
TO
TO
TO
TO
ii
ii
piii
pii
46
)(~
^
i
0),(),(
)(^
~)~
(^
~
iiii
ii
ii
Riii
ii
i
iii
ii
iii
))((),(^
),(2
2
is derived by solving
47
iiiiiii1^
))((
iiiiiiiE )(
iiiiiii E
iiiE
iiiiiii
E 11
11
...1 2
21
iiiE
i
iiiE
iiii
E
48
Monte carlo experiments conducted by Carro (2006) have shown that when T = 8, the bias of modified MLE for dynamic probit and logit models are negligible. Another advantage of the Arellano-Carro approach is its generality. For instance, a dynamic logit model with time dummy explanatory variable cannot meet the Honore and Kyriazidou (2000) conditions for generating consistent estimator, but can still be estimated by the modified MLE with good finite sample properties.
49
Advantages of Carro (2006) produce:
1. No need to transform the parameters of interest into (information) orthogonal parameters as done in Cox and Reid (1987, JRSS B) or Arellano (2003).
2. No need to impose any conditions on the observed data as in the case of Honore and Kyriazidou (2000). In other words, all observed
data can be utilized to obtain .^
~
M
50
Issues:
1. may not have a closed form solution. Neither is the evaluation of expectation term trivial. Hence, computationally can be tedious.
e.g. in the case of logit model,
)(~
^
i
e
exy
xiti
xiti
iitit
'
'
1),|1(Prob
N
i
T
tiitit
N
i
T
tiit xyx
1 1
'
1 1
' )()(exp1log L log
51
01
log
1
T
titixit
ixit
i
i
i
ye
eL
)(^^^
MLEMLER g
It will be useful to derive a bias-reduced estimatorthat has the form
52
7. Concluding Remarks
Issues:
(i) Is there a simultaneous equation framework for discrete data?
uxyy 111211
uxyy 222122
53
If and are continuous, one can find a
particular realized value of ( u1 , u2 ) satisfying
observed ( , ) or one can consider that
there exist independent shocks ( u1 , u2 ) such
that there exist ( , ) satisfying the model.
y1 y2
y1 y2
y1 y2
54
But if is dichotomous, and exogenous,
then
y1 y2
)1121(121 )(),|1(Prob
xy
duufxyy
)( 1121 xyF
and realized u1 takes value of (1-F) and (-F) while F depends on y2. If y2 is endogenous, then u1 cannot be independent shocks
55
(ii) Is there an equivalent limited information framework for discrete data simultaneous equation model
(iii) Cross-Sectional dependence