Part 2: Panel Data Estimators: using the STATA program for Growth and Aid (Burnside and Dollar)
Jean-Bernard CHATELAIN
Plan
1. TO DO steps with panel data.
2. A tool: Standardized parameters
3. Example: Step 1: bivariate analysis with panel data
4. Theory for step 1
5. Example: Step 2: multivariate panel data estimators
6. Theory for step 2
7. Time invariant variables using panel data
To do sequence for panel data econometrics: Step 1
1. Create Within (W) Between (B) transformed variables: forget non transformed variables
2. Analysis of variance Within / Between
3. Univariate Histograms and desc stat. W/B
4. Bivariate graphs, correlation matrix W/B, including autocorrelation for W. Choice of specification
5. xtset, xtdescribe, xtsum, xttab, xtdata, xtline, xtunitroot
FIRST STEP: Always compare histograms, standard errors and simple correlations in the within versus between subspaces for specification search and in order to understand your second step multivariate results
Step 2
1. Step 2: Fixed effect, Between, Random effects Mundlak estimates if time invariant variables (xtreg, fe/be/re)
2. Check studentized residuals/dfbetas
3. If autocorrelation of residuals (xtregar)
4. Hausman Taylor for time invariant (xthtaylor)
5. Others: xtpcse (pcse), xtrc (random coefficients), xtmixed (random fixed coefficients), xtgls, xtunitroot
Step 3: Other IV-methods: other types of endogeneity
1. IV for panel (xtivreg)
2. Panel data GMM first differences (Arellano Bond) (xtabond, xtdpd, Roodman’s xtabond2 )
3. Panel data GMM system (Arellano Bover), (xtdpdsys)
2. A tool: Standardized parameters
Non standardized beta for variable x:
Beta(x)= beta(standardized, r(ij))* sigma(y)/sigma(x)
Simple regression:
BetaS(x)=r(yx) simple correlation coefficient.
Multiple regression:
BetaS(x) over 1 signals near-multicollinearity issues.
Abs(x-mean)<sigma: about 68% of shocks (normal)
Abs(x-mean)<1.96.sigma: 95%
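The link between the two betas can be checked numerically. The sketch below is only an illustration in Python on hypothetical toy data (not the Burnside and Dollar sample); the helper names mean, cov and sd are assumptions introduced here. It verifies that, in a simple regression, the standardized beta equals the simple correlation r(yx).

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    # Sample covariance (divides by n - 1).
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def sd(v):
    return sqrt(cov(v, v))

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

beta = cov(x, y) / cov(x, x)          # non standardized OLS slope
beta_std = beta * sd(x) / sd(y)       # standardized slope
r_yx = cov(x, y) / (sd(x) * sd(y))    # simple correlation coefficient

print(round(beta, 4), round(beta_std, 4), round(r_yx, 4))
```

Reading the formula the other way round, beta = beta_std * sd(y) / sd(x), which is the relation stated above.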
3. Panel data models
Trade-offs
Between (endogeneity bias)
versus
Within-fixed effects (common trends, unit root problems)
Versus
First differences
A. Dealing with a specific type of (country) heteroskedasticity
y(it)=β0+β1.x(it)+α(i)+α(t)+ε(it)
α(i) and α(t) are random disturbances added to ε(it) :
α(i): individual (time invariant) unobserved random effects (characteristics).
Country: geography, history before the beginning of the sample, and so on. You do not believe it?
Then ALWAYS PLOT BOXPLOTS of RESIDUALS per individuals (countries).
[Figure: boxplots of studentized residuals per country, countries in alphabetical order from Algeria to Zimbabwe. Labelled outliers: ETH5, NIC4, CMR7, GAB4, DOM2, SYR3, CMR4, GAB3, CIV4, ECU2, MYS7, MLI6, TUR7, ZAR2.]
Remark
In the previous graph, the boxplots of the distribution of residuals of a panel data regression (shown later in the slides) per country are presented by alphabetic order of countries.
Another graph may show boxplots ordered by the median (or by the mean) of residuals per country.
Specific heteroskedasticity implies specific GLS
This specific Feasible GLS estimator is the « random effect » estimator, with the choice of the optimal estimated θ weight for between with respect to within (see later).
Note that the FGLS estimator does not and cannot correct the distribution of residuals per country to be identically distributed between all countries.
B. Dealing with a specific type of endogeneity: α(i)-endogeneity
y(it)=β0+β1.x(it)+α(i)+α(t)+ε(it)
Cov(x(it), α(i)) = 0 or not.
α(i): individual (time invariant) unobserved random effects (characteristics).
Country: geography, history before the beginning of the sample, and so on.
Specific heteroskedasticity implies specific GLS
A joint answer to heteroskedasticity and to α(i)-endogeneity is:
Random effects including all averages over time of time varying variables x(i,.) in the set of regressors (Mundlak estimator) [see later].
Analysis of variance operators: Between(groups) and Within(group)
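Between: x(i.) = (1/T) Σ(t) x(it), the average over time for individual i
Within: x(it) - x(i.), the deviation from this average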
OLS: Within + Between: x(it)
Random effects « θ-weight » operator:
Within + θ Between= x(it)-x(i.)+ θ.x(i.)
θ =0: Within
θ =1: OLS
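The three operators can be written in a few lines. This is a minimal Python sketch on a hypothetical balanced panel (3 countries, 4 periods); the function names between, within and theta_transform are illustrative assumptions, not Stata commands.

```python
# Hypothetical balanced panel: one list of T = 4 observations per country.
panel = {
    "A": [1.0, 2.0, 3.0, 6.0],
    "B": [4.0, 4.0, 5.0, 7.0],
    "C": [0.0, 1.0, 1.0, 2.0],
}

def between(series):
    """x(i.): the time average, repeated for every period."""
    m = sum(series) / len(series)
    return [m] * len(series)

def within(series):
    """x(it) - x(i.): deviation from the country's own time average."""
    m = sum(series) / len(series)
    return [v - m for v in series]

def theta_transform(series, theta):
    """Within + theta * Between = x(it) - x(i.) + theta * x(i.)."""
    return [w + theta * b for w, b in zip(within(series), between(series))]

for country, xs in panel.items():
    print(country, within(xs), between(xs)[0])
```

With theta = 0 the transform returns the within deviations, and with theta = 1 it returns the raw data, matching the two limiting cases listed above.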
Analysis of variance: Orthogonal sub-spaces: Within versus Between
cov( x(it)-x(i.) , x(i.) ) = 0
Overall variance = (for example)
30% Within: deviation from the average over time, X(it) – X(i.), dimension NT-N. Regression on within transformed variables = fixed effect models.
70% Between: average over time of cross sections, X(i.), dimension N = good for time invariant inference.
variable |  variance
gdpg     |  12.95207
wgdpg    |  8.132692 (63%)
mgdpg    |  4.8193
policy   |  1.591617
wpolicy  |  .7358689 (45%)
mpolicy  |  .8557484
eda      |  4.281359
weda     |  1.202634 (28%)
mmeda    |  3.078725
BD: N=275: Analysis of variance
Within/overall
For the dependent:
GDPG: 63% rather high
Good news for fixed effects model!
POLICY: 45%
IEDA: 28%: relatively low variance in within.
CRGE, SSA, EASIA: 0%
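The within and between shares come from an exact decomposition of the overall sum of squares. The Python sketch below uses a hypothetical balanced toy panel (not the BD data) to show that the two components add up to the total and that their cross product is zero.

```python
def mean(v):
    return sum(v) / len(v)

# Hypothetical balanced panel: 3 countries, 4 periods each.
panel = {
    "A": [1.0, 2.0, 3.0, 6.0],
    "B": [4.0, 4.0, 5.0, 7.0],
    "C": [0.0, 1.0, 1.0, 2.0],
}

all_x = [v for xs in panel.values() for v in xs]
grand = mean(all_x)

# Within: deviation from the country mean. Between: country mean minus grand mean.
within_dev = [v - mean(xs) for xs in panel.values() for v in xs]
between_dev = [mean(xs) - grand for xs in panel.values() for _ in xs]

ss_total = sum((v - grand) ** 2 for v in all_x)
ss_within = sum(d * d for d in within_dev)
ss_between = sum(d * d for d in between_dev)
cross = sum(w * b for w, b in zip(within_dev, between_dev))  # orthogonality check

share_within = ss_within / ss_total
print(ss_total, ss_within, ss_between, round(share_within, 3))
```

A time invariant variable has ss_within equal to zero, which is the 0% share reported for the last line above.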
Dimension of subspaces of observations
NT = N (between) + (NT – N) (within).
Then in each subspace:
Between regression leads to an analysis of variance with prediction and residual
Within regression leads to an analysis of variance with prediction and residual
Regression in each between or within subspace: another 2nd step of the analysis of variance.
Repeated Between has meaningless degrees of freedom
x(i.) repeated T times is what appears in your database of dimension NT when you compute it with egen by country (etc.).
STATA: xtreg y x, be
Adjusts for dof = N – k – 1
STATA: reg y(i.) x(i.)
Does not! Dof = NT – k – 1, t-stats are skyrocketing
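The inflation of t-stats can be reproduced on a toy example. The Python sketch below is an assumption-laden illustration (hypothetical country means, N = 6, T = 5, one regressor): it runs the same simple regression on the N means and on the means repeated T times.

```python
from math import sqrt

def ols_simple(x, y):
    """Slope and its conventional standard error, with n - 2 degrees of freedom."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    ssr = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))
    se = sqrt(ssr / (n - 2) / sxx)
    return b, se

# Hypothetical country means x(i.) and y(i.), N = 6.
xbar = [0.5, 1.0, 1.5, 2.5, 3.0, 4.0]
ybar = [1.0, 0.8, 2.1, 2.4, 3.9, 3.6]
T = 5

b_n, se_n = ols_simple(xbar, ybar)          # between regression, dof N - 2
x_rep = [v for v in xbar for _ in range(T)]  # each mean repeated T times
y_rep = [v for v in ybar for _ in range(T)]
b_rep, se_rep = ols_simple(x_rep, y_rep)     # naive regression, dof NT - 2

print(round(b_n / se_n, 2), round(b_rep / se_rep, 2))
```

The slope is unchanged, but the standard error shrinks by the factor sqrt((N-2)/(NT-2)), so the t-stat of the repeated regression is overstated by the inverse of that factor.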
Within estimator 2nd gain: explosion of degrees of freedom
NT-N-k degrees of freedom (dof)
Much larger than cross section and between dof N-k-1, and than time series dof T-k-1
Number of regressors excluding intercept: k
Intercept is useless, as the within transformed dependent variable and regressors are all zero mean (this constraint explains why N degrees of freedom are lost, and « given » to the between subspace).
1st drawback: NT>400 (N=20, T=20): substantive significance?
T-stats >2 for number of observations over 400 may (or may not) be related to contributions to R2<1%.
Check substantive significance (marginal contribution to R2):
R2(k variables) – R2(k-1 variables)
You may find huge discrepancies among regressors.
Few researchers and journal editors require it:
Publication bias: a lot of statistically significant but unsubstantive results are published for career concerns.
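For two regressors, the marginal contribution to R2 can be computed from simple correlations alone. The Python sketch below uses hypothetical data and the textbook two-regressor formula R2 = (r1^2 + r2^2 - 2.r1.r2.r12)/(1 - r12^2); variable names are illustrative assumptions.

```python
from math import sqrt

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# Hypothetical data: y depends mostly on x1; x2 is correlated with x1.
y = [2.0, 3.1, 4.9, 6.2, 6.8, 9.1, 9.7, 12.2]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]

r1, r2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)

# R2 of the regression of y on x1 and x2, from simple correlations.
r2_full = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

marginal_x2 = r2_full - r1 ** 2   # R2(2 variables) - R2(x1 only)
marginal_x1 = r2_full - r2 ** 2   # R2(2 variables) - R2(x2 only)

print(round(r2_full, 4), round(marginal_x1, 4), round(marginal_x2, 4))
```

The two marginal contributions can differ by an order of magnitude even when both regressors are individually well correlated with y, which is the kind of discrepancy mentioned above.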
Statistical significance versus Substantive significance
With panel data: NT number of observations > 400
• Statistical significance with reasonable parameters (average of parameters) is easy to obtain (N=20, T=20), but substantive significance may turn out to be an issue (very small contribution to R2)!
• But panel estimators assume the same slopes for different countries (or individuals): with respect to time series estimators by country (T>10), this may bring a VERY LARGE increase of residuals and VERY POOR forecasts.
2nd Drawback of within: common slopes
Per country time series estimate (best estimates of individual constants: so called « fixed effects »)
y(1,t)=β(1,cst)+β(1,x).x(1,t)+α(1)+α(t)+ε(1,t)
y(2,t)=β(2,cst)+β(2,x).x(2,t)+α(2)+α(t)+ε(2,t)
Fixed effect/OLS on within transformed
Constrains β(1,x)=β(2,x), if not the case, explosion of the size of residuals and RMSE with respect to time series, poor forecasts albeit large t-stats due to NT degrees of freedom.
Common slopes
When one assumes common slopes in a panel estimator (random effects or within transformed), β(1,x)=β(2,x), whereas the time series estimates per country (feasible if T>k, number of regressors) suggest different slopes β(1,x)<β(2,x), the panel estimator is a weighted average of both: β(1,x)< β(panel) <β(2,x).
Then (β(2,x)-β(panel))*x(2,.) goes into the residuals of country 2, and (β(1,x)-β(panel))*x(1,.) into the residuals of country 1.
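The weighted-average property can be illustrated with one regressor. In the Python sketch below (hypothetical noise-free data for two countries with slopes 0.5 and 2.0), the within estimator equals the average of the per-country slopes weighted by each country's within sum of squares of x, and the slope gap shows up in the residuals.

```python
def within_moments(x, y):
    """Within-country sums of squares and cross products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, sxy

# Hypothetical data: country 1 has slope 0.5, country 2 has slope 2.0.
x1 = [1.0, 2.0, 3.0, 4.0]
y1 = [1.0 + 0.5 * v for v in x1]
x2 = [2.0, 4.0, 6.0, 8.0]
y2 = [3.0 + 2.0 * v for v in x2]

sxx1, sxy1 = within_moments(x1, y1)
sxx2, sxy2 = within_moments(x2, y2)

b1, b2 = sxy1 / sxx1, sxy2 / sxx2                 # time series slopes per country
b_panel = (sxy1 + sxy2) / (sxx1 + sxx2)           # common-slope within estimator
weighted = (sxx1 * b1 + sxx2 * b2) / (sxx1 + sxx2)

# Within residuals of country 2 under the common slope.
mx2, my2 = sum(x2) / len(x2), sum(y2) / len(y2)
resid2 = [(b - my2) - b_panel * (a - mx2) for a, b in zip(x2, y2)]

print(b1, b2, round(b_panel, 4), [round(r, 3) for r in resid2])
```

Country 2 has more within variation in x, so it receives the larger weight and the panel slope (1.7 here) sits closer to its slope.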
Not all slopes identical but some in given groups: a most interesting use of panel data
In fact, panel data are most useful to track differences in slopes for different groups of countries in the panel and for different groups before/after a policy intervention (e.g. financial liberalization).
The « difference in differences » estimator and tests with 2 groups and 2 periods are typical panel data estimators.
Remark: An application of the Frisch-Waugh-Lovell theorem
OLS estimates on slopes on within transformed variables (common constant = zero)
=
OLS estimates on slopes on non-within transformed regressors but including dummies for all individuals (estimates N constants for N individuals)
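This equality can be verified numerically. The Python sketch below uses a hypothetical panel of 3 countries and 4 periods and a small hand-written normal-equations solver (solve and ols are helper names introduced here, not library calls); it compares the within slope with the slope of an OLS regression on x and one dummy per country.

```python
def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical panel: country -> (x series, y series).
panel = {
    "A": ([1.0, 2.0, 3.0, 4.0], [2.0, 2.9, 4.2, 4.8]),
    "B": ([2.0, 3.0, 5.0, 6.0], [6.1, 6.8, 9.2, 9.9]),
    "C": ([0.0, 1.0, 1.0, 2.0], [0.5, 1.6, 1.2, 2.4]),
}
names = list(panel)

rows, y_all = [], []
sxx = sxy = 0.0
for i, name in enumerate(names):
    xs, ys = panel[name]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    for x, y in zip(xs, ys):
        # Regressors: x plus one dummy per country, no common constant.
        rows.append([x] + [1.0 if j == i else 0.0 for j in range(len(names))])
        y_all.append(y)
        sxx += (x - mx) ** 2
        sxy += (x - mx) * (y - my)

b_within = sxy / sxx              # OLS on within transformed variables
lsdv = ols(rows, y_all)           # OLS with N country dummies
b_lsdv, constants = lsdv[0], lsdv[1:]
print(round(b_within, 6), round(b_lsdv, 6), [round(c, 4) for c in constants])
```

The dummy coefficients are the N estimated constants, each equal to y(i.) minus the common slope times x(i.).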
3rd drawback: possible non stationarity of OLS on Within transformed variables (« fixed effects ») if T>5
It eliminates cov(x(it), α(i)) BUT:
Common trends remain even with T<10:
spurious regressions, trend driven near-multicollinearity.
Try also first differences (but smaller variance).
BUT ALSO: large share of overall variance (70% between variance) unexplained.
1st Gain of within: Eliminates α(i) endogeneity
y(i,t)=β(cst)+β(x).x(i,t)+α(i)+α(t)+ε(i,t)
The operator between leads to:
y(i,.)=β(cst)+β(x).x(i,.)+α(i)+ε(i,.)
The difference (within) eliminates α(i), the β(cst) constant and all time invariant regressors z(i) in the sample period, the average over time of α(t) may be chosen as zero:
y(it) - y(i,.)=β(x).(x(it) - x(i,.))+ε(it)- ε(i,.)
Remark: same trick for first difference estimator
y(i,t)=β(cst)+β(x).x(i,t)+α(i)+α(t)+ε(i,t)
Lagging by one period leads to:
y(i,t-1)=β(cst)+β(x).x(i,t-1)+α(i)+α(t-1)+ε(i,t-1)
The first difference eliminates α(i), the constant and all time invariant variables in the sample (the time effects enter only through their change α(t)-α(t-1)):
y(it) - y(i,t-1)=β(x).(x(it) - x(i,t-1))+α(t)-α(t-1)+ε(it)- ε(i,t-1)
Another advantage: First difference may eliminate unit roots for variables in levels.
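The elimination of α(i) can be seen on a constructed example. In the Python sketch below (hypothetical noise-free data with true slope 2 and individual effects α(i) that rise with the level of x), pooled OLS is biased while the first-difference and within slopes recover the true value.

```python
# Hypothetical individual effects, positively related to the level of x.
alpha = {"A": 0.0, "B": 5.0, "C": 10.0}
x = {"A": [1.0, 2.0, 4.0], "B": [3.0, 4.0, 6.0], "C": [6.0, 8.0, 9.0]}
TRUE_BETA = 2.0
y = {i: [alpha[i] + TRUE_BETA * v for v in x[i]] for i in x}

def slope(u, v, demean=True):
    """Simple regression slope of v on u (optionally without intercept)."""
    mu = sum(u) / len(u) if demean else 0.0
    mv = sum(v) / len(v) if demean else 0.0
    return (sum((a - mu) * (b - mv) for a, b in zip(u, v))
            / sum((a - mu) ** 2 for a in u))

# Pooled OLS on levels: contaminated by cov(x(it), alpha(i)).
x_all = [v for i in x for v in x[i]]
y_all = [v for i in y for v in y[i]]
b_pooled = slope(x_all, y_all)

# First differences within each country: alpha(i) drops out.
dx = [b - a for i in x for a, b in zip(x[i], x[i][1:])]
dy = [b - a for i in y for a, b in zip(y[i], y[i][1:])]
b_fd = slope(dx, dy, demean=False)

# Within transformation: alpha(i) drops out as well.
wx = [v - sum(x[i]) / len(x[i]) for i in x for v in x[i]]
wy = [v - sum(y[i]) / len(y[i]) for i in y for v in y[i]]
b_within = slope(wx, wy, demean=False)

print(round(b_pooled, 3), b_fd, round(b_within, 6))
```

The differenced regression is run without a constant here because the toy model has no time effects; this is a simplifying assumption of the sketch.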
Between: endogeneity bias in simple regression (true + bias)
The bias is linearly increasing with the standard error of the random individual term and with its correlation with the regressor.
This property remains in multiple regression, but cross correlation between several endogenous variables (X’X)-1(X’α) leads to a more complicated formulas for the bias.
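For reference, the simple regression case can be written out as follows. This is a standard omitted-variable argument under the assumption that the averaged disturbance ε(i.) is uncorrelated with x(i.); the symbols r and σ for the correlation and standard errors in the between dimension are notation introduced here.

```latex
\bar{y}_{i\cdot} = \beta_0 + \beta_1 \bar{x}_{i\cdot} + \alpha_i + \bar{\varepsilon}_{i\cdot}
\quad\Longrightarrow\quad
\operatorname{plim}\, \hat{\beta}_1^{B}
  = \beta_1 + \frac{\operatorname{cov}(\bar{x}_{i\cdot}, \alpha_i)}{\operatorname{var}(\bar{x}_{i\cdot})}
  = \underbrace{\beta_1}_{\text{true}}
  + \underbrace{r(\bar{x}_{i\cdot}, \alpha_i)\,\frac{\sigma_{\alpha}}{\sigma_{\bar{x}}}}_{\text{bias}}
```

The bias term vanishes only when the correlation between the regressor's time average and the individual effect is zero.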
OLS bias for 4 parameters
Another drawback of OLS on Between transformed variables
N-k-1 degrees of freedom, very bad with respect to publication bias requiring t-stats>2
NT-N-k >> N – k - 1
But: nearly time invariant regressors have unstable estimates in within
• Dependent variable: check the share of variance between / within.
• Regressors: idem.
• A regressor close to being time invariant has a small share of variance in the within dimension. Its within transformation is concentrated around its mean zero. The beta in the within regression will be very high and unstable when removing a few observations.
Between / Within
Between:
N, may have large share of overall variance.
Endogeneity bias
But no non stationarity issues.
Within: NT-N
No endogeneity bias
But stationarity issues (spurious time series regressions), then First differences estimator
BD example Step 1: univariate and bivariate analysis
[Figure: density histograms of wgdpg (within transformed GDP growth), weda (within transformed Aid/GDP), gdpg(i.) (between transformed GDP growth) and mmeda (between transformed Aid/GDP).]
Do not use the « OLS » simple correlation matrix.
You use FIXED EFFECTS (within)
The non transformed (OLS) simple correlation matrix is NOT the one to investigate for near-multicollinearity, classical suppressors (near zero correlation with dependent), sizeable correlation between regressors (endogeneity).
It is the correlation of WITHIN transformed variables!
Discrepancy between Within and Between simple correlations
Ideally, similar WITHIN BETWEEN simple correlation between dependent and regressors, as well as similar share of variance (standard errors) in within and between dimensions:
This would lead to identical betas in within and between, and to an acceptance of the equality of both sets of parameters in Hausman tests contrasting Within and Between.
Then the overall variance would be fully explained by the model: a dream.
Ideally: Specification minimizing the gap Within versus Between (sub-correlation matrix W = B)
Minimize Panel Hausman Test statistics while selecting regressors.
H0: β(within) - β(between) = 0
If not rejected: Within regression not spurious (also valid in between/cross section space) and Between not facing endogeneity
E[x(it).a(i)]=0 which implies E[x(i.).a(i)]=0
and Between variance (often 70%) explained.
Differences of simple correlations W/B: (1) endogeneity
If no endogeneity bias and no additional issues: within and between simple correlations should be similar.
If they differ: Between has an alpha(i) endogeneity bias, not Within [cf. Hausman test].
(2) Common trends (and non stationarity) in within dimension
May also explain the discrepancy of between versus within.
This time Within has a chance to lead (or not) to spurious time series correlation.
Between is cross section: it DOES not face spurious time series correlation.
ADD lagged variables and TREND in the WITHIN CORRELATION MATRIX for HINTS!
(3) Dynamic model: Between as a long term estimate
When the true model has an auto-regressive component = DYNAMIC MODEL.
AND when there is no alpha(i) endogeneity.
Between estimate converges to the long term parameter beta/(1-autoreg parameter)
[hence within may be smaller in absolute value]
Pirotte (Economics Letters)
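The long term parameter can be motivated by a short heuristic. The sketch below assumes a first-order autoregressive model with parameter ρ (with |ρ| < 1), no α(i) endogeneity, and T large enough for the time averages of y(it) and y(i,t-1) to be approximately equal; it is an informal argument, not the formal result of the cited paper.

```latex
y_{it} = \rho\, y_{i,t-1} + \beta\, x_{it} + \alpha_i + \varepsilon_{it}
\quad\Longrightarrow\quad
\bar{y}_{i\cdot} \approx \rho\, \bar{y}_{i\cdot} + \beta\, \bar{x}_{i\cdot} + \alpha_i + \bar{\varepsilon}_{i\cdot}
\quad\Longrightarrow\quad
\bar{y}_{i\cdot} \approx \frac{\beta}{1-\rho}\, \bar{x}_{i\cdot} + \frac{\alpha_i + \bar{\varepsilon}_{i\cdot}}{1-\rho}
```

For 0 < ρ < 1 the between slope β/(1-ρ) is larger in absolute value than the short term β.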
(4) Time invariant variables
They modify the data generating process in the between subspace with respect to the within subspace (where they do not belong).
This may be another factor (besides endogeneity) which explains the discrepancy W/B.
Example: women/men in a wage equation (gender is constant over time for every individual in the data set): it does not matter in the within dimension!
Correlation Within and Between matrix inspection: Omitted variable bias is bad except when adding highly collinear covariate or « classical suppressor »
Y = a1.x1 + a2.x2 + e
If corr(y, x1) is below 0.1 in absolute value (« classical suppressor »: spurious effect identification problem): if possible, omit x1 in the regression.
If corr(x1, x2) is higher than 0.8 in absolute value: near-multicollinearity problem: if possible, omit x1 OR x2 in the regression.
The simple correlation matrix helps to specify the multivariate model. Select regressors.
It also indicates further causal or endogenous links between regressors (when their simple correlation is between 0.15 and 0.85):
You may use causal graphs with bidirectional arrows to get insights.
You may use these relationships for the specification of first step 2SLS or simultaneous regressions.
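The two screening rules are easy to automate. The Python sketch below applies the 0.1 and 0.8 thresholds to within correlations reported in the tables of these slides; the dictionary layout and variable names are assumptions made for illustration.

```python
# Within correlations with the dependent variable wgdpg (from the slides' tables).
corr_with_y = {
    "wpolicy": 0.2488,
    "wlgdp": -0.2519,
    "wm2_1": -0.1816,
    "weda": 0.0079,
    "wedapolicy": 0.0883,
    "weda2policy": 0.0428,
}

# Within correlations between pairs of regressors (from the slides' tables).
corr_between_x = {
    ("wpolicy", "wlgdp"): 0.0543,
    ("wpolicy", "wm2_1"): 0.0179,
    ("wlgdp", "wm2_1"): 0.2009,
    ("weda", "wedapolicy"): 0.4488,
    ("weda", "weda2policy"): 0.5147,
    ("wedapolicy", "weda2policy"): 0.9273,
}

SUPPRESSOR_MAX = 0.1   # |corr(y, x)| below this: « classical suppressor » candidate
COLLINEAR_MIN = 0.8    # |corr(x1, x2)| above this: near-multicollinearity

suppressors = sorted(n for n, r in corr_with_y.items() if abs(r) < SUPPRESSOR_MAX)
collinear = sorted(p for p, r in corr_between_x.items() if abs(r) > COLLINEAR_MIN)

print("Candidates to omit (weak link with dependent):", suppressors)
print("Near-collinear pairs:", collinear)
```

On these numbers the three aid terms are flagged as weakly correlated with growth, and the two aid and policy interaction terms are flagged as near-collinear.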
[Figure: scatterplot matrix of within transformed variables wgdpg, wpolicy, wlgdp, wm2_1 and year.]
[Figure: scatterplot matrix of between transformed variables gdpg(i.), mpolicy, mlgdp and mm2_1.]
3 within transformed regressors with correlation with dependent different from zero. Correlation of wm2_1 with trend. Autocorrelation of wgdpg: -0.03 for 217 obs.
        |    year     wgdpg     Between   wpolicy   wlgdp
wgdpg   |  -0.2737
wpolicy |   0.1806    0.2488  ≠  0.7091
wlgdp   |   0.2887   -0.2519  ≠  0.2787    0.0543
wm2_1   |   0.5588   -0.1816  ≠  0.2116    0.0179    0.2009
Gdp growth is the first difference of log of gdp per head: persistence of gdp per head is wiped out by first differences. In this sample, its auto-correlation is close to zero: no dynamic model necessary. No long term interpretation of between.
No very large correlations (>0.8) with trend: within may not be subject to spurious time series correlation.
The between / within discrepancy may be related to cov(x(i.), alpha(i) ) endogeneity bias for between.
[Figure: scatterplot matrix of within transformed variables wgdpg, wassas and wethnfassas.]
[Figure: scatterplot matrix of between transformed variables gdpg(i.), massas and methnfassas.]
            |   wgdpg    between    wassas   between
wassas      |  -0.0592   -0.0690
wethnfassas |  -0.0219   -0.0656    0.8821    0.8315
Parameter identification problem
Check robustness to outliers.
[Figure: scatterplot matrix of within transformed variables wgdpg, weda, wedapolicy and weda2policy.]
[Figure: scatterplot matrix of between transformed variables gdpg(i.), mmeda, medapolicy and meda2policy.]
            |   wgdpg   between    weda    between   wedapolicy  between
weda        |  0.0079   -0.3417
wedapolicy  |  0.0883    0.1866   0.4488    0.5912
weda2policy |  0.0428    0.0858   0.5147    0.6225     0.9273     0.9229
Parameter identification problem
Check robustness to outliers
[Figure: Bivariate Within transformed: Aid/GDP and GDP Growth. Scatter plot of within transformed Aid/GDP (vertical axis) against within transformed GDP growth (horizontal axis), observations labelled by country code and period; actual data with linear fit, quadratic fit and lowess.]
[Figure: Bivariate Within transformed: Aid/GDP versus GDP Growth. Scatter plot of within transformed GDP growth (vertical axis) against within transformed Aid/GDP (horizontal axis), observations labelled by country code and period; actual data with linear fit, quadratic fit and lowess.]
[Figure: Within transformed: GDP Growth and Policy. Scatter plot of within transformed GDP growth (vertical axis) against within transformed Policy (horizontal axis), observations labelled by country code and period; actual data with linear fit, quadratic fit and lowess.]
[Figure: Weighted repeated Between transformed: Aid/GDP versus GDP Growth. Scatter plot of between transformed GDP growth (vertical axis) against between transformed Aid/GDP (horizontal axis), observations labelled by country code; actual data with linear fit, quadratic fit and lowess.]
[Figure: Weighted repeated Between transformed: GDP Growth and Policy. Scatter plot of between transformed GDP growth (vertical axis) against between transformed Policy (horizontal axis), observations labelled by country code; actual data with linear fit, quadratic fit and lowess.]
Aid2*Policy ordered by its mean values over time per country.
A few very large country-outliers
[Figure: boxplots per country of eda2policy and of edapolicy, countries ordered by their mean value over time.]
Boxplots (per country) of aid*policy and Aid2*Policy ordered by aid*policy(i.): only a few countries depart from « small » values
[Figure: boxplots per country of eda2policy.]
Poorly correlated (<0.1) with dependent (and highly correlated together):
spurious parameter identification problem; if not robust to outliers: spurious
             |    year    wgdpg     weda   wedapol  weda2plcy   wassas
-------------+---------------------------------------------------------------
        weda |  0.3601   0.0079
  wedapolicy |  0.1748   0.0883   0.4488
 weda2policy |  0.1383   0.0428   0.5147   0.9273
      wassas |  0.0761  -0.0592   0.0306   0.0162   -0.0020
 wethnfassas |  0.0296  -0.0219  -0.0003  -0.0070   -0.0088    0.8821
Step 2. Multivariate panel data estimators
Multivariate panel estimators
Fixed effects: OLS on within transformed
Between: OLS on between transformed.
Random effects: OLS on Within transformed + theta * between transformed.
Mundlak: Random effects including all x(i.) and z(i)
OLS on First differences (with T=2: identical to fixed effects, different when T>2).
You do not have time invariant variables in your model
It is proposed to do a Hausman pre-test for
Random effects (without all x(i.)) versus Fixed effects.
Guggenberger (Journal of Econometrics): the pre-test is misleading. For « SMALL » endogeneity (corr(x(it),alpha(i)) < 0.25), it over accepts « random effects », but the endogeneity bias is LARGE.
No time invariant (continued)
Fixed effects (within) is always better than
Random effect and OLS
Then, if there are time varying endogeneity issues other than alpha(i) endogeneity: use panel instrumental variables estimators
xtivreg
xtabond2 (GMM ONLY IF T<10!!)
But: what if common trends for dependent and regressors
Insight from simple correlation with trend
And auto-correlation.
Then, also add in your tables:
Between estimation.
And
First differences estimation (preferred one before IV estimation).
You have time invariant variables using panel data
Orthogonal spaces:
Between: average over time of cross sections, dimension N
Is RELEVANT for inference on time invariant Z(i) via cancelling out of individual disturbances: N observations and NOT repeated between with NT observations!
Time Invariant excluded in Fixed effects
Y(it) = b X(it) + c Z(i) + a(i) + e(it)
If a(i) random individual effect
If cov ( X(it) , a(i) ) non zero (endogeneity)
Then use: within = fixed effects.
But Z(i) – Z(i.) = 0: within eliminates time invariant variables.
Between: cov (Z(i), a(i) ) non zero possible.
Y(i.) = b X(i.) + c Z(i) + a(i) + e(i.)
Mundlak (1978): run RANDOM effects including ALL x(i.)
ASSUME: a(i)=b’.X(i.)+c’.Z(i)+a’(i)
Y(it) = bw X(it) + (bb-bw) X(i.) + c Z(i) + a’(i) + e(it)
Estimates: within (fixed effects!) for X(it),
between with correct degrees of freedom (N-k-1) for Z(i) for balanced panel
Difference of between versus within parameters (and t test) for X(i.): signals size of endogeneity.
2 remarks
1) If you run Random effects EXCLUDING x(i.), you may face an omitted variables bias.
2) If you run OLS INCLUDING x(i.): you find the same parameters as Mundlak RE! But the standard errors are not correct.
This helps for influence statistics computed in STATA reg and NOT computed in STATA xtreg
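Remark 2 can be checked on a toy balanced panel with one regressor and no time invariant variable. In the Python sketch below (hypothetical data; solve and ols are helper names introduced here), OLS of y on a constant, x(it) and x(i.) returns the within slope on x(it) and the difference between minus within on x(i.).

```python
def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def mean(v):
    return sum(v) / len(v)

# Hypothetical balanced panel: country -> (x series, y series).
panel = {
    "A": ([1.0, 2.0, 3.0, 4.0], [2.0, 2.9, 4.2, 4.8]),
    "B": ([2.0, 3.0, 5.0, 6.0], [6.1, 6.8, 9.2, 9.9]),
    "C": ([0.0, 1.0, 1.0, 2.0], [0.5, 1.6, 1.2, 2.4]),
}

rows, y_all = [], []
sxx_w = sxy_w = 0.0
for xs, ys in panel.values():
    mx, my = mean(xs), mean(ys)
    for x, y in zip(xs, ys):
        rows.append([1.0, x, mx])          # constant, x(it), x(i.)
        y_all.append(y)
        sxx_w += (x - mx) ** 2
        sxy_w += (x - mx) * (y - my)
b_within = sxy_w / sxx_w

# Between slope: regression of y(i.) on x(i.) with N observations.
xbar = [mean(xs) for xs, _ in panel.values()]
ybar = [mean(ys) for _, ys in panel.values()]
gx, gy = mean(xbar), mean(ybar)
b_between = (sum((a - gx) * (b - gy) for a, b in zip(xbar, ybar))
             / sum((a - gx) ** 2 for a in xbar))

const, coef_x, coef_xbar = ols(rows, y_all)
print(round(coef_x, 6), round(b_within, 6))
print(round(coef_xbar, 6), round(b_between - b_within, 6))
```

Only the point estimates are reproduced; as stated in the remark, the OLS standard errors of this regression are not the correct ones.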
BD example Step 2: multivariate analysis
WITHIN (FIXED EFFECTS) ESTIMATION (default std err.)
R-squared = 15.09% OF 63% (share of within variance of gdpg) = 9.45% of overall total variance of gdpg (not overwhelming)
BUT estimated parameters without alpha(i) endogeneity bias.
Root MSE = 2.6423
Estimated with « reg »: incorrect degrees of freedom for standard errors: N.Tbar-k-1 instead of N.Tbar-N-k
------------------------------------------------------------------------------
        gdpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      policy |    .878318   .2087423     4.21   0.000     .4668853    1.289751
        lgdp |  -3.882033   1.043291    -3.72   0.000    -5.938368   -1.825698
        m2_1 |   -.049506   .0228887    -2.16   0.032    -.0946198     -.00439
------------------------------------------------------------------------------
BETWEEN ESTIMATE: N=56
WEIGHTS FOR UNBALANCED T(i) PER COUNTRY: WEIGHTED LEAST SQUARES
When Policy is the only regressor: R2=51.51%, cf. square of unweighted simple correlation with gdpg: (0.7091)^2 = r^2
R2=57.17% OF 37% (THE SHARE OF BETWEEN VARIANCE IN OVERALL VARIANCE OF GDPG): 21% of OVERALL VARIANCE.
Within + Between: 15.09% of 63% + 57.17% of 37% = 9.45% + 21%
Alpha(i) biased parameters. TIME INVARIANT VARIABLES ARE NOT YET INCLUDED
RMSE= 2.572 = 1.604 x 1.604
------------------------------------------------------------------------------
        gdpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      policy |   1.816484   .2393748     7.59   0.000     1.336143    2.296824
        lgdp |   .3931539   .3101081     1.27   0.211    -.2291234    1.015431
        m2_1 |   .0403462   .0204063     1.98   0.053     -.000602    .0812943
       _cons |  -5.105835   2.233604    -2.29   0.026     -9.58789   -.6237802
------------------------------------------------------------------------------
[Figure: scatterplot matrix of between transformed variables gdpg(i.), mpolicy, mlgdp and mm2_1.]
BETWEEN Including 4 time invariant variables
R2=59.07% > 57.17%: marginal gain of R2 = 2%
ICRGE no longer relevant regressor (inference with N=56).
------------------------------------------------------------------------------
        gdpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      policy |   1.661274   .3073901     5.40   0.000     1.043224    2.279323
        lgdp |  -.1287407    .482766    -0.27   0.791    -1.099407    .8419255
        m2_1 |   .0301945   .0222294     1.36   0.181    -.0145006      .07488
       icrge |   .2697856   .2305179     1.17   0.248    -.1937016    .7332728
         ssa |  -.8900755   .8002408    -1.11   0.272    -2.499067    .7189164
       easia |    .128337   .9851497     0.13   0.897    -1.852439    2.109113
       ethnf |  -.0020761    .009249    -0.22   0.823    -.0206726    .0165203
       _cons |  -1.571956   3.541679    -0.44   0.659     -8.69298    5.549067
------------------------------------------------------------------------------
Mundlak estimator: beta (B) – beta (W), beta(B) for time invariant: FOR BALANCED PANEL (weights in between for unbalanced?)
------------------------------------------------------------------------------
        gdpg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      policy |    .878318   .2079318     4.22   0.000     .4707792    1.285857
        lgdp |  -3.882033    1.03924    -3.74   0.000    -5.918907   -1.845159
        m2_1 |   -.049506   .0227999    -2.17   0.030    -.0941929    -.004819
     mpolicy |    .413422   .3574066     1.16   0.247    -.2870822    1.113926
       mlgdp |   3.606534   1.125934     3.20   0.001     1.399744    5.813323
       mm2_1 |   .0766954   .0304941     2.52   0.012      .016928    .1364628
       icrge |    .470856   .2050086     2.30   0.022     .0690466    .8726655
         ssa |    -1.1797   .7126032    -1.66   0.098    -2.576376    .2169769
       easia |   .5661284   .8568978     0.66   0.509     -1.11336    2.245617
       ethnf |   -.002012   .0084419    -0.24   0.812    -.0185578    .0145339
       _cons |   -.811913   3.231572    -0.25   0.802    -7.145679    5.521853
-------------+----------------------------------------------------------------
     sigma_u |  .60621529
     sigma_e |  2.9596021
         rho |  .04026596   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Outliers: studentized residuals of the Mundlak regression
OLS provides residuals and fitted for Mundlak (SE of parameters are not correct).
Influence statistics
[Figure: boxplots of studentized residuals per country, countries in alphabetical order from Algeria to Zimbabwe. Labelled outliers: ETH5, NIC4, CMR7, GAB4, DOM2, SYR3, CMR4, GAB3, CIV4, ECU2, MYS7, MLI6, TUR7, ZAR2.]
ETH5
NIC4 CMR7GAB4
URY5 CHL5SLV4 VEN5PHL5TTO6GUY5 TTO5GTM5JAM3VEN4MDG7 GMB7GHA5CMR6 PHL7MAR7SYR6ZMB5 CRI5PRY5 MEX6CHL3 THA2THA3IND2SEN4CIV4NGA4PER5 TUN6 MYS2VEN6GHA3 PRY7JAM5ZAR3 JAM4 MYS3SLE5 GMB2CHL2ZAR4 MYS5GMB6ECU5SLV5NIC5 HTI5 THA5BOL6 MEX7URY2KEN7 ECU7 IDN2GAB6 CRI4TGO5 THA4ECU6DOM5MDG3HTI6ZWE7 MAR2HND5KEN5ZAR5ZMB7ZAR6 GTM6 MYS6BOL4 CRI6GTM7MEX5 COL7MDG6GUY6 VEN2 TTO2IND4PER7MWI6 KOR4SYR2SLE7NIC6 DOM7COL5SLV6 VEN7HND7LKA2EGY7 VEN3GHA7BOL2 KOR2MYS4ECU4TUN5 CRI7SEN2SOM4PER6 IND3HTI2 COL6ARG5 GUY2GHA6MAR6PRY6SLE3MWI5 TUN7MDG2BRA7MWI7GUY4NGA5 IDN3ZMB4 ECU3NIC2ZMB6 KEN3MWI4SLE6 COL3ZMB3 DZA2GAB5TZA5 MAR4SLV2ARG4 PAK3 IDN5NER3 BOL7ZWE5 CRI3MEX3ARG6 PRY2PAK2GAB7 CHL4HND6BRA5NIC7 URY7GMB5GMB4 HND2HTI3 TUR7ZWE6 MAR5BRA4TGO3GHA4 HND4TGO6 PHL6LKA3HND3DOM4PHL2 LKA4EGY6GTM4 KOR3MEX2SLV7COL4CMR5BOL5 NGA6 KOR5TZA6 IND5SLV3SEN3ZMB2 PHL4PER2 CHL6URY3IND7 BOL3DZA3 IDN4CRI2COL2LKA6 BWA5DOM3PHL3ARG2PAK7 TTO3KEN4 CHL7TGO4GHA2PER3SLE4PER4 MAR3SEN5NGA7 PAK5KEN6HTI4 GTM2 IDN6ETH6SLE2 URY4GUY3 EGY4GTM3PAK6 CMR3DOM6ARG3 PRY3PAK4NER4 MYS7ARG7LKA5LKA7NGA3 THA6BRA3 BWA6ZAR2 SOM3GMB3URY6IND6BRA6 KEN2KOR7GUY7 THA7TTO4EGY5SYR7NIC3 MEX4MLI6 IDN7 KOR6BWA4EGY3 ECU2SYR4 NGA2 BRA2GAB2 PRY4 DOM2SYR3
CMR4
GAB3-5
05
Stu
den
tized
res
idu
als
-5 0 5 10Fitted values
ETH5
NIC4 CMR7GAB4
URY5 CHL5SLV4 VEN5PHL5TTO6GUY5 TTO5GTM5JAM3VEN4MDG7 GMB7GHA5CMR6 PHL7MAR7SYR6ZMB5 CRI5PRY5 MEX6CHL3 THA2THA3IND2SEN4CIV4NGA4PER5 TUN6 MYS2VEN6GHA3 PRY7JAM5ZAR3 JAM4 MYS3SLE5 GMB2CHL2ZAR4 MYS5GMB6ECU5SLV5NIC5 HTI5 THA5BOL6 MEX7URY2KEN7 ECU7 IDN2GAB6 CRI4TGO5 THA4ECU6DOM5MDG3HTI6ZWE7 MAR2HND5KEN5ZAR5ZMB7ZAR6 GTM6 MYS6BOL4 CRI6GTM7MEX5 COL7MDG6GUY6 VEN2 TTO2IND4PER7MWI6 KOR4SYR2SLE7NIC6 DOM7COL5SLV6 VEN7HND7LKA2EGY7 VEN3GHA7BOL2 KOR2MYS4ECU4TUN5 CRI7SEN2SOM4PER6 IND3HTI2 COL6ARG5 GUY2GHA6MAR6PRY6SLE3MWI5 TUN7MDG2BRA7MWI7GUY4NGA5 IDN3ZMB4 ECU3NIC2ZMB6 KEN3MWI4SLE6 COL3ZMB3 DZA2GAB5TZA5 MAR4SLV2ARG4 PAK3 IDN5NER3 BOL7ZWE5 CRI3MEX3ARG6 PRY2PAK2GAB7 CHL4HND6BRA5NIC7 URY7GMB5GMB4 HND2HTI3 TUR7ZWE6 MAR5BRA4TGO3GHA4 HND4TGO6 PHL6LKA3HND3DOM4PHL2 LKA4EGY6GTM4 KOR3MEX2SLV7COL4CMR5BOL5 NGA6 KOR5TZA6 IND5SLV3SEN3ZMB2 PHL4PER2 CHL6URY3IND7 BOL3DZA3 IDN4CRI2COL2LKA6 BWA5DOM3PHL3ARG2PAK7 TTO3KEN4 CHL7TGO4GHA2PER3SLE4PER4 MAR3SEN5NGA7 PAK5KEN6HTI4 GTM2 IDN6ETH6SLE2 URY4GUY3 EGY4GTM3PAK6 CMR3DOM6ARG3 PRY3PAK4NER4 MYS7ARG7LKA5LKA7NGA3 THA6BRA3 BWA6ZAR2 SOM3GMB3URY6IND6BRA6 KEN2KOR7GUY7 THA7TTO4EGY5SYR7NIC3 MEX4MLI6 IDN7 KOR6BWA4EGY3 ECU2SYR4 NGA2 BRA2GAB2 PRY4 DOM2SYR3
CMR4
GAB3
-10
010
20R
esid
uals
-5 0 5 10Fitted values
ETH5
NIC4
CMR7
GAB4
URY5CHL5SLV4
VEN5
PHL5
TTO6
GUY5
TTO5
GTM5JAM3VEN4
MDG7
GMB7
GHA5CMR6
PHL7
MAR7
SYR6
ZMB5CRI5PRY5MEX6
CHL3
THA2THA3
IND2
SEN4
CIV4
NGA4
PER5TUN6
MYS2
VEN6GHA3PRY7JAM5ZAR3
JAM4
MYS3
SLE5
GMB2
CHL2
ZAR4
MYS5
GMB6
ECU5SLV5
NIC5HTI5THA5BOL6MEX7URY2KEN7ECU7
IDN2
GAB6
CRI4TGO5
THA4ECU6
DOM5
MDG3HTI6
ZWE7MAR2HND5KEN5ZAR5
ZMB7
ZAR6GTM6
MYS6
BOL4CRI6GTM7MEX5COL7
MDG6
GUY6
VEN2TTO2IND4
PER7MWI6
KOR4SYR2
SLE7
NIC6
DOM7COL5SLV6VEN7HND7LKA2
EGY7
VEN3GHA7BOL2
KOR2
MYS4
ECU4TUN5CRI7SEN2
SOM4
PER6
IND3
HTI2
COL6ARG5
GUY2
GHA6
MAR6
PRY6SLE3
MWI5TUN7
MDG2BRA7
MWI7
GUY4
NGA5IDN3
ZMB4ECU3
NIC2
ZMB6KEN3MWI4SLE6
COL3
ZMB3DZA2GAB5TZA5MAR4SLV2ARG4PAK3
IDN5NER3BOL7
ZWE5CRI3MEX3
ARG6PRY2PAK2
GAB7
CHL4HND6
BRA5
NIC7
URY7
GMB5GMB4
HND2HTI3
TUR7ZWE6MAR5
BRA4
TGO3GHA4
HND4TGO6
PHL6
LKA3HND3DOM4
PHL2
LKA4
EGY6
GTM4
KOR3
MEX2SLV7
COL4CMR5
BOL5
NGA6KOR5
TZA6
IND5
SLV3
SEN3
ZMB2
PHL4
PER2CHL6
URY3
IND7
BOL3DZA3IDN4
CRI2COL2LKA6
BWA5
DOM3
PHL3
ARG2PAK7TTO3
KEN4
CHL7TGO4GHA2PER3SLE4PER4MAR3SEN5
NGA7
PAK5KEN6HTI4GTM2
IDN6
ETH6SLE2
URY4
GUY3
EGY4
GTM3PAK6CMR3DOM6ARG3PRY3PAK4
NER4
MYS7
ARG7
LKA5LKA7
NGA3THA6BRA3
BWA6
ZAR2
SOM3
GMB3
URY6
IND6BRA6
KEN2
KOR7
GUY7
THA7
TTO4
EGY5SYR7
NIC3
MEX4
MLI6IDN7KOR6
BWA4
EGY3
ECU2
SYR4NGA2BRA2
GAB2
PRY4DOM2
SYR3
CMR4
GAB3
0.1
.2.3
Leve
rag
e
0 .02 .04 .06 .08Normalized residual squared
Step 2.B. Hausman Taylor for α(i)-endogenous time-invariant regressors
Panel data with time-invariant variables
Geographical distance for cross-country data in gravity models of foreign trade and foreign direct investments (Egger and Pfaffermayr (2004), Serlenga and Shin (2007), …).
Years of schooling, gender and race when testing Mincer’s wage regressions using survey data (Hausman and Taylor (1981)).
Colonial or legal origin, initial GDP/head in 1960 for growth or income or inequality (GINI) regressions.
Endogeneity and overstated degrees of freedom
Endogeneity of time-varying variables.
Possible correlation of time-invariant variables with the individual effect.
Increasing the number of periods does not add additional information for time-invariant variables.
Consequences:
• biased estimates
• wrong inference (t-test should use dimension N rather than NT)
Time-invariant Mundlak/Between endogeneity bias: simple regression (true + bias)
The bias is linear in the standard error of the random individual term and increases with r(z(i), alpha(i)).
This remains true in multiple regression, but the cross correlation between several endogenous variables, (X’X)-1(X’α), leads to more complicated formulas for the bias.
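The simple-regression case can be illustrated by simulation. The sketch below assumes the textbook form of the bias, plim of the estimate = γ + r(z, α)·σ(α)/σ(z), which matches the description above but is not copied from the slide, whose formula did not survive. All parameter values (γ = 1, r = 0.5, σ(α) = 2, σ(z) = 1, N = 20000) are hypothetical choices for illustration.

```python
import random

random.seed(0)

N = 20000          # number of individuals (hypothetical)
gamma = 1.0        # true coefficient of the time-invariant variable z
r = 0.5            # correlation between z(i) and alpha(i)
sigma_z = 1.0      # standard deviation of z
sigma_alpha = 2.0  # standard deviation of the individual effect

z, y = [], []
for _ in range(N):
    u = random.gauss(0.0, 1.0)
    v = random.gauss(0.0, 1.0)
    z_i = sigma_z * u
    # alpha(i) is built to have correlation r with z(i).
    alpha_i = sigma_alpha * (r * u + (1.0 - r * r) ** 0.5 * v)
    z.append(z_i)
    y.append(gamma * z_i + alpha_i)  # cross-section (between) equation


def ols_slope(x, values):
    """Slope of a simple regression of values on x (with intercept)."""
    mean_x = sum(x) / len(x)
    mean_v = sum(values) / len(values)
    cov = sum((a - mean_x) * (b - mean_v) for a, b in zip(x, values))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


gamma_hat = ols_slope(z, y)
predicted = gamma + r * sigma_alpha / sigma_z  # true value + bias

print(f"estimated slope:  {gamma_hat:.3f}")
print(f"true + bias term: {predicted:.3f}")
```

Doubling sigma_alpha doubles the bias term, which is the sense in which the bias is linear in the standard error of the individual effect.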
Hausman and Taylor estimator
It deals with the alpha(i) endogeneity of time-invariant variables, Corr( z(i), alpha(i) ) ≠ 0, which the Mundlak estimator does not deal with.
The trick is to use as internal instruments some of the x(i.) which are exogenous.
A pre-test may look at the Mundlak test of bb-bw for each x(i.): if |t| < 2, treat x(i.) as exogenous.
Weak instruments of HT
As with ANY instrumental variable, unfortunately a strongly exogenous instrument is often a weak instrument, poorly correlated with the regressor to be instrumented.
The WEAK instrument bias on the parameter and the standard error may be very large.
So HT has limits.
You may reduce the weak instrument bias by including the average over time of the endogenous time-varying regressors in the HT estimation (see program).
A pre-test estimator: The model
y : NT vector, endogenous variable
X : cross-section, time-varying variables
Z : time-invariant variables
α : individual effect
ε : disturbance term
The estimators for β and γ are biased unless identifying assumptions are made.
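The model equation itself did not survive on this slide. A form consistent with the symbols listed above (y, X, Z, α, ε, with coefficients β and γ) would be the following; it is a reconstruction under that assumption, not a quotation from the slide:

```latex
y_{it} = X_{it}\,\beta + Z_{i}\,\gamma + \alpha_{i} + \varepsilon_{it},
\qquad i = 1,\dots,N, \quad t = 1,\dots,T .
```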
Random effects Mundlak including time invariant
Auxiliary regression: the unobserved individual random effect is a LINEAR function of the averages over time of ALL time-varying variables, X(i.), and of the time-invariant variables, Z(i):
α(i) = π. X(i.) + φ. Z(i) + αM(i)
Then:
y(it) = β. X(it) + π. X(i.) + (γ + φ). Z(i) + αM(i) + ε(it)
Pre-test: Step 1: select internal instruments for Hausman Taylor
Run a random effects Mundlak-Krishnakumar regression, which provides a t-test for each H0: πm=0 against H1: πm≠0.
If H0 is not rejected, add the variable to the subset of exogenous variables.
If H0 is rejected, add the variable to the subset of endogenous variables.
Time Invariant: Mundlak Pretest
Y(it) = βw. X(it) + (βb- βw). X(i.) + γb. Z(i) + α(i) + ε(it)
If H0: (βb- βw)=0 is not rejected, X(i.) is exogenous with respect to α(i).
It could then be a valid « internal » instrument in the Hausman Taylor estimator with time-invariant variables (but it could be weak…)
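As an illustration of the pre-test, the sketch below applies the |t| < 2 rule of thumb mentioned earlier to the z-statistics of the averaged regressors in the Mundlak output of the growth regression (mpolicy 1.16, mlgdp 3.20, mm2_1 2.52). The variable names, the threshold constant and the final decision labels are illustrative assumptions, not part of the author's program.

```python
# z-statistics of the averaged regressors, copied from the Mundlak output.
t_stats = {"mpolicy": 1.16, "mlgdp": 3.20, "mm2_1": 2.52}

THRESHOLD = 2.0  # rule of thumb used in these slides: |t| < 2 means exogenous

# H0: (beta_b - beta_w) = 0 not rejected: candidate internal instrument.
exogenous = [name for name, t in t_stats.items() if abs(t) < THRESHOLD]
# H0 rejected: keep the average as a regressor to correct for endogeneity.
endogenous = [name for name, t in t_stats.items() if abs(t) >= THRESHOLD]

if not endogenous:
    decision = "restricted random effects (GLS)"
elif not exogenous:
    decision = "Mundlak (within / between)"
else:
    decision = "Hausman Taylor with pre-tested internal instruments"

print("exogenous averages: ", exogenous)
print("endogenous averages:", endogenous)
print("suggested estimator:", decision)
```

With these numbers only the average of policy passes the pre-test, so it would be the single internal instrument available, which is where the weak instrument concern applies.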
Mundlak with unbalanced panel:
Jeffrey M. Wooldridge, "Correlated Random Effects Models with Unbalanced Panels", Department of Economics, Michigan State University, East Lansing, MI. This version: May 2010.
Step 2B
Run an unrestricted Hausman Taylor regression with the exogenous variables as instruments for the endogenous time-invariant variables, and KEEP AS REGRESSORS the endogenous average-over-time variables, to correct for endogeneity of the time-varying variables.
Properties of the pre-test estimator
Extreme cases:
• All average-over-time variables are significant, πi ≠ 0: Mundlak-Krishnakumar estimation: Within / Between
• No average-over-time variable is significant, πi = 0: Restricted Random Effects (GLS) estimation
We do not need any a priori information about which variables to use as instruments.
Design
y(it) = β11. X11(it) + β12. X12(it) + β2. X2(it) + γ. Z(i) + α(i) + ε(it)
i = 1, …, N = 100; t = 1, …, T = 5
X, Z, α: multinormal with mean 0 and standard deviation 1
[settings for σ(α) and σ(ε) not recoverable from the source]
Benchmark with autocorrelation coefficient 0.7
1000 replications
Correlation matrix of multinormal variables, det(R)>0
R( X11(it), X12(it), X2(it), Z(i), α(i), ε(it) ), with diagonal blocks R(X11), R(X12), R(X2)
[Matrix layout lost in extraction. Surviving entries: 0, 0.4, 0.52, 0.75 and 1; some correlations are varied over ranges involving the values 0, 0.35, 0.4, 0.45, 0.55 and 0.75.]
Auto-correlation 0.7 for Xit, T=5: R(X11) =
1
0.7    1
0.49   0.7    1
0.343  0.49   0.7    1
0.24   0.343  0.49   0.7    1
0.168  0.24   0.343  0.49   0.7    1
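The entries of this matrix follow the AR(1) pattern 0.7 to the power |s - t|. A minimal sketch that regenerates the lower triangle (the function name is hypothetical, chosen for illustration):

```python
def ar1_correlation_matrix(rho, size):
    """Correlation matrix with entry rho ** |i - j| (AR(1) pattern)."""
    return [[rho ** abs(i - j) for j in range(size)] for i in range(size)]


# Six rows, as on the slide (lags 0 to 5).
R = ar1_correlation_matrix(0.7, 6)

# Print the lower triangle, rounded to three decimals.
for i, row in enumerate(R):
    print("  ".join(f"{value:.3f}" for value in row[: i + 1]))
```

The slide's 0.24 and 0.168 are the rounded values of 0.7^4 = 0.2401 and 0.7^5 = 0.16807.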
Second question: OLS bias for 4 parameters
Hausman Taylor for ICRG
See program
Step 3: GMM-system and time invariant?
Not seriously investigated so far except a recent working paper.
GMM-system: levels instrumented by first differences and first differences instrumented by levels
Risk of too many instruments.
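To give an order of magnitude for the instrument count, the sketch below counts the instruments generated for the lagged dependent variable alone in first-difference GMM when every available lag from t-2 backwards is used: the equation for period t (t = 3, …, T) gets t-2 instruments, so the total is (T-2)(T-1)/2. This standard count is an assumption added here for illustration; the function name is hypothetical, and further instruments for the other regressors and for the level equations of GMM-system come on top.

```python
def n_lag_instruments(T):
    """Instruments for the lagged dependent variable in first-difference GMM,
    using all lags from t-2 back for each differenced equation t = 3..T."""
    return sum(t - 2 for t in range(3, T + 1))


for T in (5, 7, 10, 20):
    print(f"T = {T:2d}: {n_lag_instruments(T)} instruments")
```

The count grows quadratically in T while the number of countries stays fixed, which is the source of the risk noted above.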