Part 2: Panel Data Estimators: using the STATA program for Growth and Aid (Burnside and Dollar) Jean-Bernard CHATELAIN


Plan

1. TO DO steps with panel data.

2. A tool: Standardized parameters

3. Example: Step 1: bivariate analysis with panel data

4. Theory for step 1

5. Example: Step 2: multivariate panel data estimators

6. Theory for step 2

7. Time invariant variables using panel data

To do sequence for panel data econometrics: Step 1

1. Create within (W) and between (B) transformed variables: forget the non-transformed variables

2. Analysis of variance: within / between

3. Univariate histograms and descriptive statistics, W/B

4. Bivariate graphs, correlation matrices W/B, including autocorrelation for W. Choice of specification

5. xtset, xtdescribe, xtsum, xttab, xtdata, xtline, xtunitroot

FIRST STEP: always compare histograms, standard errors and simple correlations in the within versus between subspaces, for specification search and in order to understand your second step multivariate results

Step 2

1. Fixed effects, between, random effects; Mundlak estimates if there are time-invariant variables (xtreg, fe / be / re)

2. Check studentized residuals / dfbetas

3. If residuals are autocorrelated: xtregar

4. Hausman-Taylor for time-invariant variables (xthtaylor)

5. Others: xtpcse (panel-corrected standard errors), xtrc (random coefficients), xtmixed (mixed models with fixed and random coefficients), xtgls, xtunitroot

Step 3: other IV methods for other types of endogeneity

1. IV for panel data (xtivreg)

2. Panel data GMM in first differences (Arellano-Bond) (xtabond, xtdpd, Roodman's xtabond2)

3. Panel data system GMM (Arellano-Bover) (xtdpdsys)

2. A tool: Standardized parameters

Non-standardized beta for variable x:

Beta(x) = BetaS(x) * sigma(y)/sigma(x), where BetaS(x) is the standardized beta, a function of the simple correlations r(ij)

Simple regression:

BetaS(x) = r(yx), the simple correlation coefficient.

Multiple regression:

|BetaS(x)| above 1 signals near-multicollinearity issues.

For a normal variable: |x - mean| < sigma for about 68% of shocks; |x - mean| < 1.96.sigma for 95%.
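A minimal sketch in Python (standard library only) of the identities above, on hypothetical data: in a simple regression, the standardized beta BetaS(x) coincides with r(yx), and rescaling it by sigma(y)/sigma(x) gives back the non-standardized beta. Function names are illustrative.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def corr(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Hypothetical illustrative data (not from Burnside and Dollar)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]

beta = ols_slope(x, y)              # non-standardized beta
beta_s = beta * sd(x) / sd(y)       # standardized beta
print(beta_s, corr(x, y))           # the two numbers coincide in a simple regression
print(beta, beta_s * sd(y) / sd(x)) # Beta(x) = BetaS(x) * sigma(y) / sigma(x)
```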

3. Panel data models

Trade-offs

Between (endogeneity bias)

versus

Within-fixed effects (common trends, unit root problems)

Versus

First differences

A. Dealing with a specific type of (country) heteroskedasticity

y(it)=β0+β1.x(it)+α(i)+α(t)+ε(it)

α(i) and α(t) are random disturbances added to ε(it):

α(i): individual (time invariant) unobserved random effects (characteristics).

Country: geography, history before the beginning of the sample, and so on. Don't believe it?

Then ALWAYS PLOT BOXPLOTS of RESIDUALS per individual (country).

[Figure: boxplots of the studentized residuals of the panel regression per country, countries in alphabetical order (Algeria to Zimbabwe); labelled outlying observations (country code and period) include ETH5, NIC4, CMR7, GAB4, DOM2, SYR3, CMR4, GAB3, CIV4, ECU2, MYS7, MLI6, TUR7 and ZAR2.]

Remark

In the previous graph, the boxplots of the distribution of residuals per country of a panel data regression (shown later in the slides) are presented in alphabetical order of countries.

Another graph may show boxplots ordered by the median (or by the mean) of residuals per country.

Specific heteroskedasticity implies specific GLS

This specific feasible GLS estimator is the « random effects » estimator, with the optimal estimated θ weight for between with respect to within (see later).

Note that the FGLS estimator does not and cannot make the distribution of residuals identical across all countries.
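A sketch of the θ weight, assuming the textbook random effects model for a balanced panel with T periods, individual-effect variance σ_u² and idiosyncratic variance σ_e²: the GLS weight on the between part is θ = σ_e / sqrt(σ_e² + T.σ_u²). The function names are illustrative. To our understanding, Stata's xtreg, re reports the complementary quantity 1 - θ under the name theta, so check the manual before comparing.

```python
from math import sqrt

def re_theta(sigma_u, sigma_e, T):
    """GLS weight on between in 'within + theta * between' (balanced panel, T periods)."""
    return sigma_e / sqrt(sigma_e ** 2 + T * sigma_u ** 2)

def re_transform(x_it, x_ibar, theta):
    """Random effects theta-weight operator: x(it) - x(i.) + theta * x(i.)."""
    return x_it - x_ibar + theta * x_ibar

print(re_theta(0.0, 2.96, 5))    # no individual effect: theta = 1, i.e. OLS
print(re_theta(10.0, 1.0, 20))   # large individual effect: theta near 0, i.e. within

# Illustration with the sigma_u and sigma_e reported in the Mundlak output of these
# slides, treating the unbalanced BD panel as if T = 275/56 (an approximation).
theta_bd = re_theta(0.60621529, 2.9596021, 275 / 56)
print(round(theta_bd, 3))
```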

B. Dealing with a specific type of endogeneity: α(i)-endogeneity

y(it)=β0+β1.x(it)+α(i)+α(t)+ε(it)

Cov(x(it), α(i)) may or may not be zero.

α(i): individual (time invariant) unobserved random effects (characteristics).

Country: geography, history before the beginning of the sample, and so on.

Specific heteroskedasticity implies specific GLS

A joint answer to heteroskedasticity and to α(i)-endogeneity is:

random effects including the averages over time x(i.) of all time-varying variables in the set of regressors (Mundlak estimator) [see later].

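Analysis of variance operators: Between (groups) and Within (group)

Between: x(i.) = (1/T(i)).Σ(t) x(it), the average over time of x for individual i

Within: x(it) - x(i.), the deviation from this average

OLS: Within + Between = x(it)

Random effects « θ-weight » operator:

Within + θ.Between = x(it) - x(i.) + θ.x(i.)

θ = 0: Within

θ = 1: OLS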

Analysis of variance: Orthogonal sub-spaces: Within versus Between

cov( x(it)-x(i.) , x(i.) ) = 0
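The operators and the orthogonality property can be checked in a few lines. This sketch assumes rows are tagged by an individual identifier (hypothetical data, unbalanced on purpose); between() mimics the repeated x(i.) that a by-country egen would create in an NT-row dataset.

```python
from collections import defaultdict

def between(ids, x):
    """Repeated between: x(i.) on every row of individual i."""
    total, count = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, x):
        total[i] += v
        count[i] += 1
    return [total[i] / count[i] for i in ids]

def within(ids, x):
    """Within: x(it) - x(i.)."""
    return [v - m for v, m in zip(x, between(ids, x))]

def theta_operator(ids, x, theta):
    """Within + theta * Between: theta = 0 gives within, theta = 1 gives x(it) (OLS)."""
    return [w + theta * b for w, b in zip(within(ids, x), between(ids, x))]

# Hypothetical unbalanced panel (3 individuals)
ids = ["A", "A", "A", "B", "B", "C", "C", "C"]
x = [1.0, 2.0, 6.0, 4.0, 5.0, 0.5, 1.5, 7.0]

w, b = within(ids, x), between(ids, x)
cross = sum(wi * bi for wi, bi in zip(w, b))   # zero: orthogonal subspaces
print(cross, sum(w))                           # the within part also has zero mean
```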

Overall variance = (for example)

30% within: deviations from the individual average, dimension NT - N; regression on within-transformed variables X(it) - X(i.) = fixed effects models.

70% between: averages over time X(i.) of the cross sections, dimension N; good for inference on time-invariant variables.

   variable |  variance
       gdpg |  12.95207
      wgdpg |  8.132692  (63%)
      mgdpg |    4.8193
     policy |  1.591617
    wpolicy |  .7358689  (46%)
    mpolicy |  .8557484
        eda |  4.281359
       weda |  1.202634  (28%)
      mmeda |  3.078725

BD (NT = 275 observations): analysis of variance

Within / overall variance

For the dependent variable:

GDPG: 63%, rather high.

Good news for the fixed effects model!

POLICY: 46%

EDA: 28%: relatively low variance in the within dimension.

ICRGE, SSA, EASIA: 0% (time invariant)
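The within shares quoted above can be recomputed from the reported variances. This short Python check (numbers copied from the table; the dictionary layout is only for illustration) also verifies that within and between variances add up to the overall variance, as expected for orthogonal subspaces.

```python
# Variances reported in the table (Burnside and Dollar data): (total, within, between)
var = {
    "gdpg":   (12.95207, 8.132692, 4.8193),
    "policy": (1.591617, 0.7358689, 0.8557484),
    "eda":    (4.281359, 1.202634, 3.078725),
}

for name, (total, w, b) in var.items():
    share_w = w / total       # within share of the overall variance
    gap = total - (w + b)     # within + between should add up to the total
    print(f"{name:7s} within share = {share_w:.1%}   total - (W + B) = {gap:.1e}")
```

This gives about 62.8% for gdpg, 46.2% for policy and 28.1% for eda.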

Dimension of subspaces of observations

NT = N (between) + (NT – N) (within).

Then in each subspace:

Between regression leads to an analysis of variance with prediction and residual

Within regression leads to an analysis of variance with prediction and residual

Regression in each between or within subspace: a second step of the analysis of variance.

Repeated between has meaningless degrees of freedom

x(i.) repeated T times is what appears in your NT-row dataset when you compute it with bysort country: egen (etc.).

STATA: xtreg y x, be

adjusts the degrees of freedom: dof = N - k - 1.

STATA: reg y(i.) x(i.)

does not! dof = NT - k - 1, and t-stats skyrocket.
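To see why the t-stats skyrocket, here is a sketch with a simple regression (k = 1) on hypothetical country averages: repeating each of the N between observations T times leaves the slope unchanged but multiplies the t-statistic by sqrt((NT - k - 1)/(N - k - 1)). The data and the helper simple_ols are made up for illustration.

```python
from math import sqrt

def simple_ols(x, y):
    """Slope, standard error and t-stat of a simple regression with intercept (dof = n - 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = sqrt(ssr / (n - 2) / sxx)
    return slope, se, slope / se

# Hypothetical country averages x(i.), y(i.) for N = 8 countries, T = 5 periods
xb = [0.5, 1.2, 2.0, 2.2, 3.1, 3.5, 4.4, 5.0]
yb = [1.0, 0.8, 2.1, 1.5, 2.9, 2.4, 3.8, 3.1]
N, T = len(xb), 5

slope_b, _, t_between = simple_ols(xb, yb)            # correct: dof N - 2
slope_r, _, t_repeated = simple_ols(xb * T, yb * T)   # repeated between: dof NT - 2
print(slope_b, slope_r)                               # same slope
print(t_repeated / t_between, sqrt((N * T - 2) / (N - 2)))
```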

Within estimator, 2nd gain: explosion of degrees of freedom: NT - N - k degrees of freedom (dof)

Much larger than the cross-section and between dof, N - k - 1, and than the time series dof, T - k - 1

Number of regressors excluding the intercept: k

The intercept is useless, as the within-transformed dependent variable and regressors all have zero mean (these constraints explain why N degrees of freedom are lost, and « given » to the between subspace).

1st drawback: NT>400 (N=20, T=20): substantive significance?

With more than 400 observations, t-stats > 2 may (or may not) correspond to contributions to R2 below 1%.

Check substantive significance (marginal contribution to R2):

R2(k variables) - R2(k-1 variables)

You may find huge discrepancies among regressors.

Few researchers and journal editors require it:

a publication bias toward many statistically significant but unsubstantive results, for career concerns.
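For two regressors, the marginal contribution to R2 can be computed from simple correlations alone, with the standard two-regressor formula R2 = (r_y1² + r_y2² - 2.r_y1.r_y2.r_12)/(1 - r_12²). The sketch plugs in the within correlations of wgdpg, wpolicy and wlgdp reported later in these slides; it ignores m2_1, so it does not reproduce the three-regressor within R2.

```python
def r2_two_regressors(r_y1, r_y2, r_12):
    """R2 of y on x1 and x2 (with intercept), from simple correlations only."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Within correlations reported in these slides:
# wgdpg-wpolicy, wgdpg-wlgdp, wpolicy-wlgdp
r_y1, r_y2, r_12 = 0.2488, -0.2519, 0.0543

r2_full = r2_two_regressors(r_y1, r_y2, r_12)   # R2(k variables), here k = 2
r2_without_x2 = r_y1 ** 2                        # R2(k-1 variables): wpolicy only
print(f"R2(k) = {r2_full:.4f}, R2(k-1) = {r2_without_x2:.4f}, "
      f"marginal contribution of wlgdp = {r2_full - r2_without_x2:.4f}")
```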

Statistical significance versus Substantive significance

With panel data: NT number of observations > 400

• Statistical significance with reasonable parameters (average of parameters) is easy to obtain (N=20, T=20), but substantive significance may turn out to be an issue (very small contribution to R2)!

• But the panel estimator assumes the same slopes for all countries (or individuals): with respect to time series estimators by country (T>10), this may lead to a VERY LARGE increase of residuals and VERY POOR forecasts.

2nd Drawback of within: common slopes

Per country time series estimate (best estimates of individual constants: so called « fixed effects »)

y(1,t)=β(1,cst)+β(1,x).x(1,t)+α(1)+α(t)+ε(1,t)

y(2,t)=β(2,cst)+β(2,x).x(2,t)+α(2)+α(t)+ε(2,t)

Fixed effects / OLS on within-transformed variables

It constrains β(1,x) = β(2,x). If this is not the case, the size of the residuals and the RMSE explode with respect to the time series estimates, leading to poor forecasts despite large t-stats due to the NT degrees of freedom.

Common slopes

When one assumes common slopes in a panel estimator (random effects or within transformed), β(1,x) = β(2,x), whereas the time series estimates per country (feasible if T > k + 1, with k the number of regressors) suggest different slopes β(1,x) < β(2,x), the panel estimator is a weighted average of both: β(1,x) < β(panel) < β(2,x).

Then (β(2,x) - β(panel)).x(2,t) goes into the residuals of country 2, and, for country 1:

(β(1,x) - β(panel)).x(1,t)
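A small sketch, assuming two hypothetical countries without noise and with slopes 1 and 3: the within (fixed effects) slope is a weighted average of the country slopes, with weights proportional to each country's within sum of squares of x, and the gap to each country's own slope ends up in its residuals.

```python
def within_slope(panel):
    """OLS slope on within-transformed data; panel = {country: (x list, y list)}."""
    num = den = 0.0
    for xs, ys in panel.values():
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        den += sum((a - mx) ** 2 for a in xs)
    return num / den

# Hypothetical: country 1 has slope 1, country 2 has slope 3 (no noise)
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 0.5, 1.0, 1.5, 2.0]
panel = {
    "c1": (x1, [5.0 + 1.0 * a for a in x1]),
    "c2": (x2, [2.0 + 3.0 * a for a in x2]),
}

b_panel = within_slope(panel)
print(b_panel)   # lies between 1 and 3

# The part of each country's slope not captured by b_panel goes into its residuals
for c, (xs, ys) in panel.items():
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    resid = [(yy - my) - b_panel * (xx - mx) for xx, yy in zip(xs, ys)]
    print(c, [round(r, 2) for r in resid])
```

Here the within sums of squares of x are 10 and 2.5, so the pooled slope is (10 x 1 + 2.5 x 3)/12.5 = 1.4.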

Not all slopes identical, but identical within given groups: a most interesting use of panel data

In fact, panel data are most useful to track differences in slopes for different groups of countries in the panel, and for different groups before/after a policy intervention (e.g. financial liberalization).

The « difference-in-differences » estimator and tests with 2 groups and 2 periods are typical panel data estimators.

Remark: an application of the Frisch-Waugh-Lovell theorem

OLS slope estimates on within-transformed variables (common constant = zero)

=

OLS slope estimates on non-transformed regressors including dummies for all individuals (estimating N constants for N individuals)
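A numerical check of the Frisch-Waugh-Lovell equivalence, on hypothetical unbalanced data: OLS with one dummy per individual (LSDV) and OLS on within-transformed variables give the same slope. The small Gaussian-elimination solver only keeps the sketch dependency-free.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]   # hypothetical unbalanced panel, N = 3
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0, 2.0, 2.5, 6.0, 7.0]
y = [3.1, 4.0, 7.2, 1.0, 2.9, 6.1, 5.5, 5.0, 9.8, 11.0]
N = 3

# LSDV: regressors = x and one dummy per individual (no common constant)
X_lsdv = [[xi] + [1.0 if i == g else 0.0 for g in range(N)] for i, xi in zip(ids, x)]
b_lsdv = ols(X_lsdv, y)[0]

# Within: demean x and y by individual, then OLS without constant
def demean(v):
    means = {g: sum(a for i, a in zip(ids, v) if i == g) / ids.count(g) for g in range(N)}
    return [a - means[i] for i, a in zip(ids, v)]

wx, wy = demean(x), demean(y)
b_within = sum(a * c for a, c in zip(wx, wy)) / sum(a * a for a in wx)
print(b_lsdv, b_within)   # equal up to rounding
```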

3rd drawback: possible non-stationarity with OLS on within-transformed variables

(« fixed effects ») if T > 5. It eliminates cov(x(it), α(i)), BUT:

common trends remain, even with T < 10:

spurious regressions, trend-driven near-multicollinearity.

Try also first differences (but with a smaller variance).

BUT ALSO: a large share of the overall variance (e.g. 70% between variance) remains unexplained.

1st Gain of within: Eliminates α(i) endogeneity

y(i,t)=β(cst)+β(x).x(i,t)+α(i)+α(t)+ε(i,t)

The operator between leads to:

y(i,.)=β(cst)+β(x).x(i,.)+α(i)+ε(i,.)

The difference (within) eliminates α(i), the constant β(cst) and all regressors z(i) that are time invariant in the sample period; the average over time of α(t) may be set to zero:

y(it) - y(i,.)=β(x).(x(it) - x(i,.))+α(t)+ε(it)- ε(i,.)
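A simulation sketch of this first gain, under an assumed data generating process x(it) = α(i) + u(it) and y(it) = 2.x(it) + 3.α(i) + ε(it), without time effects: pooled OLS is biased by 3.cov(x, α)/var(x) (plim 2 + 3 x 1/2 = 3.5), while OLS on within-transformed variables recovers β = 2. The seed and parameters are arbitrary.

```python
import random

random.seed(12345)
N, T, beta = 500, 5, 2.0
ids, x, y = [], [], []
for i in range(N):
    alpha = random.gauss(0, 1)                # individual effect alpha(i)
    for t in range(T):
        xit = alpha + random.gauss(0, 1)      # cov(x(it), alpha(i)) = 1: alpha(i)-endogeneity
        ids.append(i)
        x.append(xit)
        y.append(beta * xit + 3.0 * alpha + random.gauss(0, 1))

def slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)

def demean_by_group(ids, v):
    s, n = {}, {}
    for i, a in zip(ids, v):
        s[i] = s.get(i, 0.0) + a
        n[i] = n.get(i, 0) + 1
    return [a - s[i] / n[i] for i, a in zip(ids, v)]

b_pooled = slope(x, y)                                                # biased, plim 3.5
b_within = slope(demean_by_group(ids, x), demean_by_group(ids, y))    # close to 2
print(round(b_pooled, 2), round(b_within, 2))
```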

Remark: same trick for the first difference estimator

y(i,t)=β(cst)+β(x).x(i,t)+α(i)+α(t)+ε(i,t)

Lagging by one period leads to:

y(i,t-1)=β(cst)+β(x).x(i,t-1)+α(i)+α(t-1)+ε(i,t-1)

First differencing eliminates α(i), the constant and all variables that are time invariant in the sample:

y(it) - y(i,t-1)=β(x).(x(it) - x(i,t-1))+α(t)-α(t-1)+ε(it)- ε(i,t-1)

Another advantage: first differences may eliminate unit roots of variables in levels.
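The same trick with first differences, on a hypothetical noiseless panel y(it) = 2.x(it) + α(i): differencing within each individual removes α(i), and OLS without constant on the differences returns the slope exactly.

```python
# Hypothetical noiseless panel: y(it) = 2 x(it) + alpha(i)
panel = {"A": ([1.0, 3.0, 4.0, 7.0], 10.0), "B": ([2.0, 2.5, 5.0, 6.0], -4.0)}

dx, dy = [], []
for xs, alpha in panel.values():
    ys = [2.0 * a + alpha for a in xs]
    for t in range(1, len(xs)):
        dx.append(xs[t] - xs[t - 1])   # x(it) - x(i,t-1)
        dy.append(ys[t] - ys[t - 1])   # y(it) - y(i,t-1): alpha(i) cancels out

# OLS without constant on the first differences
b_fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(b_fd)   # 2.0
```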

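Between: endogeneity bias, simple regression (true value + bias)

$$\operatorname{plim}\hat{\beta}_B = \beta + \frac{\operatorname{Cov}(x(i.),\alpha(i))}{\operatorname{Var}(x(i.))} = \beta + \rho\big(x(i.),\alpha(i)\big)\,\frac{\sigma_\alpha}{\sigma_{x(i.)}}$$

The bias increases linearly with the standard deviation σ_α of the random individual term and with the correlation ρ(x(i.), α(i)).

This property remains in multiple regression, but cross-correlations between several endogenous regressors make the bias, (X'X)^-1(X'α), a more complicated formula.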

OLS bias for 4 parameters

Another drawback of OLS on Between transformed variables

Only N - k - 1 degrees of freedom: very bad given the publication bias requiring t-stats > 2

NT-N-k >> N – k - 1

But: nearly time invariant regressors have unstable estimates in within

• Dependent variable: check the share of variance between / within.

• Regressors: idem.

• A regressor that is nearly time invariant has a small share of variance in the within dimension. Its within transformation is concentrated around its zero mean, so its within beta may be very high and unstable when a few observations are removed.

Between / Within

Between:

N observations; may have a large share of the overall variance.

Endogeneity bias

But no non-stationarity issues.

Within: NT - N

No endogeneity bias

But non-stationarity issues (spurious time series regressions): then use the first differences estimator.

BD example Step 1: univariate and bivariate analysis

[Figure: histograms (density) of the within-transformed variables wgdpg and weda, and of the between-transformed variables gdpg(i.) and mmeda.]

Do not use the « OLS » simple correlation matrix.

If you use FIXED EFFECTS (within), the non-transformed (OLS) simple correlation matrix is NOT the one to investigate for near-multicollinearity, classical suppressors (near-zero correlation with the dependent variable) and sizeable correlations between regressors (endogeneity).

It is the correlation matrix of the WITHIN-transformed variables!
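A sketch of why the untransformed matrix can mislead, on hypothetical balanced data built so that x and y move together within countries but in opposite directions across countries: the within correlation is +1, the between correlation is -1, and the untransformed correlation is 0.

```python
from math import sqrt

T = 3  # balanced panel, rows ordered by country

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

def between(v):
    """Country averages v(i.), N values."""
    return [sum(v[i:i + T]) / T for i in range(0, len(v), T)]

def within(v):
    """v(it) - v(i.), NT values."""
    return [a - m for i, m in enumerate(between(v)) for a in v[i * T:(i + 1) * T]]

# Hypothetical data: 3 countries x 3 periods
x = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0]

r_within = corr(within(x), within(y))      # +1
r_between = corr(between(x), between(y))   # -1
r_ols = corr(x, y)                         # 0: the untransformed matrix hides both
print(r_within, r_between, r_ols)
```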

Discrepancy between Within and Between simple correlations

Ideally, the WITHIN and BETWEEN simple correlations between the dependent variable and the regressors would be similar, as well as the shares of variance (standard errors) in the within and between dimensions.

This would lead to identical betas in within and between, and to accepting the equality of both sets of parameters in Hausman tests contrasting within and between.

Then the overall variance would be fully explained by the model: a dream.

Ideally: a specification minimizing the gap within versus between (sub-correlation matrix W = B)

Minimize the panel Hausman test statistic while selecting regressors.

H0: β(within) - β(between) = 0

If not rejected: the within regression is not spurious (it is also valid in the between / cross-section space) and between does not face endogeneity:

E[x(it).α(i)] = 0, which implies E[x(i.).α(i)] = 0,

and the between variance (often 70%) is explained.
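A coefficient-by-coefficient version of the contrast, using the policy coefficients reported later in these slides (within: .878318, s.e. .2087423; between: 1.816484, s.e. .2393748). It assumes the within and between estimators are uncorrelated (orthogonal subspaces), so the variance of their difference is the sum of their variances. It is only an illustration: the between estimate is weighted, and the within standard errors come from reg with the wrong degrees of freedom.

```python
# Coefficients on policy reported in these slides (BD data)
b_w, se_w = 0.878318, 0.2087423   # within (fixed effects)
b_b, se_b = 1.816484, 0.2393748   # between (weighted), without time-invariant variables

# Assumption: Var(b_w - b_b) = Var(b_w) + Var(b_b) (uncorrelated estimators)
H = (b_w - b_b) ** 2 / (se_w ** 2 + se_b ** 2)
CHI2_1_5PCT = 3.841               # 5% critical value of chi2 with 1 degree of freedom
print(round(H, 2), H > CHI2_1_5PCT)
```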

Differences of simple correlations W/B: (1) endogeneity

If there is no other issue (see (2) to (4) below):

Between has an α(i) endogeneity bias,

Within does not [cf. Hausman test].

(2) Common trends (and non stationarity) in within dimension

May also explain the discrepancy of between versus within.

This time, Within may (or may not) lead to spurious time series correlation.

Between is a cross section: it does NOT face spurious time series correlation.

ADD lagged variables and TREND in the WITHIN CORRELATION MATRIX for HINTS!

(3) Dynamic model: Between as a long term estimate

When the true model has an auto-regressive component = DYNAMIC MODEL.

AND when there is no alpha(i) endogeneity.

The between estimate converges to the long-term effect

β/(1 - ρ), where ρ is the autoregressive parameter

[hence within may be smaller in absolute value]

Pirotte (Economics Letters)

(4) Time invariant variables

Modify the data generating process in the between subspace with respect to the within subspace (where they do not belong).

This may be another factor (besides endogeneity) explaining the W/B discrepancy.

Example: gender in a wage equation is time invariant (nobody changes gender in the data set), so it does not matter in the within dimension!

Correlation matrix inspection, within and between: omitted variable bias is bad, except when the omitted variable is a highly collinear covariate or a « classical suppressor »

Y = a1.x1 + a2.x2 + e

If corr(y, x1) is below 0.1 in absolute value (« classical suppressor »: spurious effect, identification problem): if possible, omit x1 from the regression.

If corr(x1, x2) is higher than 0.8 in absolute value (near-multicollinearity problem): if possible, omit x1 OR x2 from the regression.

The simple correlation matrix helps to specify the multivariate model: select regressors.

It also indicates further causal or endogenous links between regressors (when their simple correlation is between 0.15 and 0.85):

You may use causal graphs with bidirectional arrows to get insights.

You may use these relationships to specify the first stage of 2SLS or simultaneous equations regressions.
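The two rules of thumb can be applied mechanically to a correlation matrix. This sketch uses within correlations reported in these slides; the thresholds 0.1 and 0.8 are the slide's, and the flags are hints for specification search, not decisions.

```python
# Within correlations with wgdpg, as reported in these slides (BD data)
corr_with_y = {
    "wpolicy": 0.2488, "wlgdp": -0.2519, "wm2_1": -0.1816,
    "weda": 0.0079, "wedapolicy": 0.0883, "weda2policy": 0.0428,
    "wassas": -0.0592, "wethnfassas": -0.0219,
}
# Within correlations between pairs of regressors (a subset of the reported ones)
corr_xx = {
    ("wpolicy", "wlgdp"): 0.0543,
    ("weda", "wedapolicy"): 0.4488,
    ("weda", "weda2policy"): 0.5147,
    ("wedapolicy", "weda2policy"): 0.9273,
    ("wassas", "wethnfassas"): 0.8821,
}

def screen(corr_with_y, corr_xx, low=0.1, high=0.8):
    """Flag candidate classical suppressors (|corr with y| < low)
    and near-multicollinear pairs (|corr| > high)."""
    suppressors = [v for v, r in corr_with_y.items() if abs(r) < low]
    collinear = [pair for pair, r in corr_xx.items() if abs(r) > high]
    return suppressors, collinear

suppressors, collinear = screen(corr_with_y, corr_xx)
print("candidate classical suppressors:", suppressors)
print("near-multicollinear pairs:", collinear)
```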

[Figure: scatterplot matrices of the within-transformed variables (wgdpg, wpolicy, wlgdp, wm2_1, year) and of the between-transformed variables (gdpg(i.), mpolicy, mlgdp, mm2_1).]

Three within-transformed regressors have a correlation with the dependent variable different from zero. wm2_1 is correlated with the trend; the autocorrelation of wgdpg is -0.03 (217 observations).

Within correlations; between correlations with gdpg(i.) in brackets (within ≠ between):

            |    year     wgdpg   [between]   wpolicy     wlgdp
      wgdpg | -0.2737
    wpolicy |  0.1806    0.2488   [0.7091]
      wlgdp |  0.2887   -0.2519   [0.2787]     0.0543
      wm2_1 |  0.5588   -0.1816   [0.2116]     0.0179    0.2009

Gdp growth is the first difference of log of gdp per head: persistence of gdp per head is wiped out by first differences. In this sample, its auto-correlation is close to zero: no dynamic model necessary. No long term interpretation of between.

No very large correlations (>0.8) with trend: within may not be subject to spurious time series correlation.

The between / within discrepancy may be related to cov(x(i.), alpha(i) ) endogeneity bias for between.

[Figure: scatterplot matrices of the within-transformed variables (wgdpg, wassas, wethnfassas) and of the between-transformed variables (gdpg(i.), massas, methnfassas).]

Within correlations; between correlations in brackets:

            |   wgdpg   [between]   wassas   [between]
     wassas | -0.0592   [-0.0690]
wethnfassas | -0.0219   [-0.0656]   0.8821    [0.8315]

Parameter identification problem: check robustness to outliers.

[Figure: scatterplot matrices of the within-transformed variables (wgdpg, weda, wedapolicy, weda2policy) and of the between-transformed variables (gdpg(i.), mmeda, medapolicy, meda2policy).]

Within correlations; between correlations in brackets:

            |   wgdpg   [between]     weda   [between]   wedapolicy   [between]
       weda |  0.0079   [-0.3417]
 wedapolicy |  0.0883    [0.1866]   0.4488    [0.5912]
weda2policy |  0.0428    [0.0858]   0.5147    [0.6225]       0.9273    [0.9229]

Parameter identification problem: check robustness to outliers.

[Figure: Bivariate Within transformed: Aid/GDP and GDP Growth. Within-transformed Aid/GDP (vertical axis) against within-transformed GDP growth (horizontal axis); observations labelled by country code and period (e.g. DZA2, ZWE7); actual data with linear fit, quadratic fit and lowess.]

[Figure: Bivariate Within transformed: Aid/GDP versus GDP Growth. Within-transformed GDP growth (vertical axis) against within-transformed Aid/GDP (horizontal axis); observations labelled by country code and period; actual data with linear fit, quadratic fit and lowess.]

[Figure: Within transformed: GDP Growth and Policy. Within-transformed GDP growth (vertical axis) against within-transformed policy (horizontal axis); observations labelled by country code and period; actual data with linear fit, quadratic fit and lowess.]

[Figure: Weighted repeated Between transformed: Aid/GDP versus GDP Growth. Between-transformed GDP growth (vertical axis) against between-transformed Aid/GDP (horizontal axis); each country code repeated once per period; actual data with linear fit, quadratic fit and lowess.]

[Figure: Weighted repeated Between transformed: GDP Growth and Policy. Between-transformed GDP growth (vertical axis) against between-transformed policy (horizontal axis); each country code repeated once per period; actual data with linear fit, quadratic fit and lowess.]

Aid²*Policy ordered by its mean values over time per country: a few very large country outliers.

[Figure: boxplots per country of eda2policy (aid²*policy) and edapolicy (aid*policy), countries ordered by their average over time.]

Boxplots (per country) of aid*policy and aid²*policy ordered by aid*policy(i.): only a few countries depart from « small » values.

Poorly correlated (< 0.1) with the dependent variable (and highly correlated with each other):

a parameter identification problem; if the estimates are not robust to outliers, the effect is spurious.

Within correlations:

            |    year    wgdpg     weda  wedapol  weda2plcy   wassas
------------+--------------------------------------------------------
       weda |  0.3601   0.0079
 wedapolicy |  0.1748   0.0883   0.4488
weda2policy |  0.1383   0.0428   0.5147   0.9273
     wassas |  0.0761  -0.0592   0.0306   0.0162    -0.0020
wethnfassas |  0.0296  -0.0219  -0.0003  -0.0070    -0.0088   0.8821

Step 2. Multivariate panel data estimators

Multivariate panel estimators

Fixed effects: OLS on within transformed

Between: OLS on between transformed.

Random effects: OLS on within transformed + θ * between transformed.

Mundlak: random effects including all x(i.) and z(i).

OLS on First differences (with T=2: identical to fixed effects, different when T>2).
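A check of the T = 2 remark on simulated data (hypothetical parameters): with two periods, the within deviations are ±Δx/2, so fixed effects and first differences (estimated without constant) give the same slope.

```python
import random

random.seed(1)
N = 50
# Hypothetical panel with T = 2: x[i] = [x(i,1), x(i,2)]
x = [[random.gauss(0, 1) for _ in range(2)] for _ in range(N)]
# y(it) = 1.5 x(it) + individual effect (i % 7) + noise
y = [[1.5 * a + (i % 7) + random.gauss(0, 1) for a in xi] for i, xi in enumerate(x)]

def slope_no_const(u, v):
    return sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)

# Fixed effects: OLS on within-transformed data
wx = [a - sum(xi) / 2 for xi in x for a in xi]
wy = [b - sum(yi) / 2 for yi in y for b in yi]
b_fe = slope_no_const(wx, wy)

# First differences: OLS of dy on dx, without constant
dx = [xi[1] - xi[0] for xi in x]
dy = [yi[1] - yi[0] for yi in y]
b_fd = slope_no_const(dx, dy)
print(b_fe, b_fd)   # identical up to rounding when T = 2
```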

You do not have time invariant variables in your model

A Hausman pre-test of random effects (without the x(i.)) versus fixed effects is often proposed.

Guggenberger (Journal of Econometrics): the pre-test is misleading. For « SMALL » endogeneity (corr(x(it), α(i)) < 0.25), it over-accepts « random effects », but the endogeneity bias is LARGE.

No time invariant (continued)

Fixed effects (within) is always better than random effects and OLS.

Then, if there are time-varying endogeneity issues other than α(i) endogeneity, use panel instrumental variable estimators:

xtivreg

xtabond2 (GMM ONLY IF T < 10!!)

But what if the dependent variable and the regressors share common trends?

Insight: simple correlations with the trend, and autocorrelation.

Then also add to your tables:

the between estimation,

and the first differences estimation (preferred before IV estimation).

You have time invariant variables using panel data

Orthogonal spaces:

Between: averages over time of the cross sections, dimension N

It is RELEVANT for inference on time-invariant Z(i), via the cancelling out of individual disturbances: N observations, and NOT the repeated between with NT observations!

Time Invariant excluded in Fixed effects

Y(it) = b X(it) + c Z(i) + a(i) + e(it)

If a(i) is a random individual effect

and if cov(X(it), a(i)) is nonzero (endogeneity),

then use within = fixed effects.

But Z(i) - Z(i.) = 0: within eliminates time-invariant variables.

Between: a nonzero cov(Z(i), a(i)) is possible.

Y(i.) = b X(i.) + c Z(i) + a(i) + e(i.)

Mundlak (1978): run RANDOM effects including ALL x(i.)

ASSUME: a(i)=b’.X(i.)+c’.Z(i)+a’(i)

Y(it) = bw X(it) + (bb - bw) X(i.) + c Z(i) + a'(i) + e(it)

Estimates: within (fixed effects!) for X(it);

between, with the correct degrees of freedom (N - k - 1), for Z(i), for a balanced panel;

difference of between versus within parameters (and its t-test) for X(i.): signals the size of the endogeneity.

2 remarks

1) If you run Random effects EXCLUDING x(i.), you may face an omitted variables bias.

2) If you run OLS INCLUDING x(i.), you find the same parameters as the Mundlak RE! But the standard errors are not correct.

This helps for influence statistics, which Stata computes after reg but NOT after xtreg.
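A sketch of remark 2 on a hypothetical balanced panel: OLS of y(it) on a constant, x(it) and x(i.) returns the within slope for x(it) and the between minus within contrast for x(i.), i.e. the Mundlak point estimates (the OLS standard errors would not be the correct ones). A small normal-equations OLS helper keeps the sketch dependency-free.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

T = 3
# Hypothetical balanced panel: 5 individuals x 3 periods, rows ordered by individual
x = [1.0, 2.0, 3.5, 2.0, 2.5, 4.0, 0.5, 1.0, 3.0, 4.0, 5.5, 6.0, 3.0, 3.2, 5.0]
y = [2.0, 2.9, 4.1, 5.0, 5.2, 6.9, 1.0, 1.8, 3.9, 9.0, 10.1, 11.2, 6.0, 6.5, 8.8]
N = len(x) // T

xbar = [sum(x[i * T:(i + 1) * T]) / T for i in range(N)]   # x(i.)
ybar = [sum(y[i * T:(i + 1) * T]) / T for i in range(N)]   # y(i.)
x_rep = [xbar[r // T] for r in range(len(x))]               # x(i.) repeated T times
y_rep = [ybar[r // T] for r in range(len(y))]

# Within (fixed effects) slope
wx = [a - m for a, m in zip(x, x_rep)]
wy = [a - m for a, m in zip(y, y_rep)]
b_w = sum(a * c for a, c in zip(wx, wy)) / sum(a * a for a in wx)

# Between slope: OLS of y(i.) on a constant and x(i.), N observations
b_b = ols([[1.0, m] for m in xbar], ybar)[1]

# OLS of y(it) on a constant, x(it) and x(i.)
_, g_x, g_xbar = ols([[1.0, a, m] for a, m in zip(x, x_rep)], y)
print(g_x, b_w)            # coefficient on x(it) = within slope
print(g_xbar, b_b - b_w)   # coefficient on x(i.) = between slope - within slope
```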

BD example Step 2: multivariate analysis

WITHIN (FIXED EFFECTS) ESTIMATION (default std. err.)

R-squared = 15.09% OF 63% (share of within variance of gdpg) = 9.45% of the overall variance of gdpg (not overwhelming), BUT the estimated parameters have no α(i) endogeneity bias.

Root MSE = 2.6423

Estimated with « reg »: incorrect degrees of freedom for the standard errors: N.Tbar-k-1 instead of N.Tbar-N-k.

        gdpg |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+------------------------------------------------------------------
      policy |    .878318    .2087423    4.21   0.000     .4668853    1.289751
        lgdp |  -3.882033    1.043291   -3.72   0.000    -5.938368   -1.825698
        m2_1 |   -.049506    .0228887   -2.16   0.032    -.0946198     -.00439

BETWEEN ESTIMATE: N = 56. Weights for unbalanced T(i) per country: weighted least squares.

When policy is the only regressor: R2 = 51.51%; compare with the square of the unweighted simple correlation of mpolicy with gdpg(i.): 0.7091² ≈ 50.3%.

R2 = 57.17% OF 37% (THE SHARE OF BETWEEN VARIANCE IN THE OVERALL VARIANCE OF GDPG) = 21% of the OVERALL VARIANCE.

Within + between: 15.09% of 63% + 57.17% of 37% = 9.45% + 21% ≈ 30% of the overall variance.

α(i)-biased parameters. TIME INVARIANT VARIABLES ARE NOT YET INCLUDED.

RMSE = 2.572 = 1.604 x 1.604

        gdpg |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+------------------------------------------------------------------
      policy |   1.816484    .2393748    7.59   0.000     1.336143    2.296824
        lgdp |   .3931539    .3101081    1.27   0.211    -.2291234    1.015431
        m2_1 |   .0403462    .0204063    1.98   0.053     -.000602    .0812943
       _cons |  -5.105835    2.233604   -2.29   0.026     -9.58789   -.6237802

[Figure: scatterplot matrix of the between-transformed variables gdpg(i.), mpolicy, mlgdp and mm2_1.]

BETWEEN including 4 time-invariant variables: R2 = 59.07% > 57.17%: marginal gain of R2 ≈ 2%. ICRGE is no longer a relevant regressor (inference with N = 56).

        gdpg |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+------------------------------------------------------------------
      policy |   1.661274    .3073901    5.40   0.000     1.043224    2.279323
        lgdp |  -.1287407     .482766   -0.27   0.791    -1.099407    .8419255
        m2_1 |   .0301945    .0222294    1.36   0.181    -.0145006      .07488
       icrge |   .2697856    .2305179    1.17   0.248    -.1937016    .7332728
         ssa |  -.8900755    .8002408   -1.11   0.272    -2.499067    .7189164
       easia |    .128337    .9851497    0.13   0.897    -1.852439    2.109113
       ethnf |  -.0020761     .009249   -0.22   0.823    -.0206726    .0165203
       _cons |  -1.571956    3.541679   -0.44   0.659     -8.69298    5.549067

Mundlak estimator: the coefficients on the x(i.) estimate beta(B) - beta(W), and the coefficients on the time-invariant variables are between estimates. This holds FOR A BALANCED PANEL (with an unbalanced panel, how should the between weights be handled?)

        gdpg |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+------------------------------------------------------------------
      policy |    .878318    .2079318    4.22   0.000     .4707792    1.285857
        lgdp |  -3.882033     1.03924   -3.74   0.000    -5.918907   -1.845159
        m2_1 |   -.049506    .0227999   -2.17   0.030    -.0941929    -.004819
     mpolicy |    .413422    .3574066    1.16   0.247    -.2870822    1.113926
       mlgdp |   3.606534    1.125934    3.20   0.001     1.399744    5.813323
       mm2_1 |   .0766954    .0304941    2.52   0.012      .016928    .1364628
       icrge |    .470856    .2050086    2.30   0.022     .0690466    .8726655
         ssa |    -1.1797    .7126032   -1.66   0.098    -2.576376    .2169769
       easia |   .5661284    .8568978    0.66   0.509     -1.11336    2.245617
       ethnf |   -.002012    .0084419   -0.24   0.812    -.0185578    .0145339
       _cons |   -.811913    3.231572   -0.25   0.802    -7.145679    5.521853
-------------+------------------------------------------------------------------
     sigma_u |  .60621529
     sigma_e |  2.9596021
         rho |  .04026596   (fraction of variance due to u_i)

Outliers: studentized residuals of the Mundlak regression

OLS on the Mundlak specification provides residuals and fitted values (but its standard errors of the parameters are not correct).

Influence statistics

[Figures: studentized residuals by country (56 countries, Algeria to Zimbabwe); studentized residuals against fitted values; residuals against fitted values; leverage against normalized residual squared. Observations are labeled by country code and period (e.g. ETH5). The most extreme labeled observations are ETH5, NIC4, CMR7, GAB4, CMR4 and GAB3.]
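The two influence statistics in these plots can be sketched for the simplest case, an OLS regression of y on a constant and one regressor: the leverage h(i) (diagonal of the hat matrix) and the externally studentized residual, which scales each residual by the residual variance estimated without observation i. These are the quantities Stata's predict reports with its leverage and rstudent options after regress; the toy data below are hypothetical, not the Burnside-Dollar sample.

```python
import math

def influence(x, y):
    """Leverage h(i) and externally studentized residuals for OLS of y on a constant and x."""
    n, p = len(x), 2                                      # p = number of estimated coefficients
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]    # diagonal of the hat matrix
    sse = sum(e ** 2 for e in resid)
    rstudent = []
    for e, h in zip(resid, lev):
        s2_del = (sse - e ** 2 / (1 - h)) / (n - p - 1)   # residual variance with observation i deleted
        rstudent.append(e / math.sqrt(s2_del * (1 - h)))
    return lev, rstudent

# Toy data (hypothetical): y is close to 2x except for one outlying observation at x = 7
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 25.0, 16.1]
lev, rstudent = influence(x, y)
for xi, h, r in zip(x, lev, rstudent):
    print(f"x = {xi}: leverage = {h:.3f}, studentized residual = {r:7.2f}")
```

The leverages always sum to the number of coefficients (here 2), and the outlier at x = 7 stands out far more in the studentized residuals than in the raw residuals, because deleting it leaves an almost perfect fit.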

Step 2.B. Hausman Taylor for α(i)-endogenous time-invariant regressors

Panel data with time-invariant variables

Geographical distance in cross-country gravity models of foreign trade and foreign direct investment (Egger and Pfaffermayr (2004), Serlenga and Shin (2007), …).

Years of schooling, gender and race when testing Mincer’s wage regressions using survey data (Hausman and Taylor (1981)).

Colonial or legal origin, or initial GDP per head in 1960, in growth, income or inequality (Gini) regressions.

Endogeneity and overstated degrees of freedom

Endogeneity of time-varying variables.

Possible correlation of time-invariant variables with the individual effect.

Increasing the number of periods adds no information about the time-invariant variables.

Consequences:
• biased estimates
• wrong inference (t-tests should use the cross-section dimension N rather than NT)
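A minimal simulation of the overstated degrees of freedom, assuming a balanced panel with one exogenous time-invariant regressor z(i), σ_α = σ_ε = 1, N = 56 (as in the BETWEEN regression) and T = 5; γ, T and the seed are hypothetical. Pooled OLS on the NT stacked observations gives exactly the same slope as the between regression on the N individual means, but its conventional standard error treats the T repeated values of z(i) as independent information and comes out too small (roughly 0.58 times the between one in expectation under these assumptions).

```python
import math
import random

def simple_ols(x, y):
    """Slope and its conventional standard error for OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    ssr = sum((yi - ybar - b * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(ssr / (n - 2) / sxx)

random.seed(3)
N, T, gamma = 56, 5, 0.5
z_pooled, y_pooled, z_between, y_between = [], [], [], []
for _ in range(N):
    z = random.gauss(0, 1)          # time-invariant regressor z(i)
    alpha = random.gauss(0, 1)      # individual effect alpha(i), sigma_alpha = 1
    ys = [gamma * z + alpha + random.gauss(0, 1) for _ in range(T)]
    z_pooled += [z] * T             # the same value repeated T times
    y_pooled += ys
    z_between.append(z)
    y_between.append(sum(ys) / T)

b_pooled, se_pooled = simple_ols(z_pooled, y_pooled)      # NT = 280 "observations"
b_between, se_between = simple_ols(z_between, y_between)  # N = 56 independent draws
print(b_pooled, b_between, se_pooled, se_between)
```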

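Time-invariant variable: endogeneity bias of the Mundlak/Between estimator of γ in a simple regression (true value + bias):

$$\operatorname{plim}\hat{\gamma}_B = \gamma + r(z_i,\alpha_i)\,\frac{\sigma_\alpha}{\sigma_z}$$

The bias is linear in the standard deviation σ_α of the random individual effect, increases with r(z(i), alpha(i)) and decreases with the standard deviation σ_z of z(i).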
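A quick simulation check of the simple-regression bias γ + r(z, α)·σ_α/σ_z, assuming one time-invariant regressor, normal draws and the hypothetical values γ = 1, r = 0.5, σ_α = 2, σ_z = 1, T = 5: the between estimate should be close to 1 + 0.5 × 2 / 1 = 2.

```python
import math
import random

random.seed(4)
N, T = 20000, 5
gamma, r, sigma_alpha, sigma_z = 1.0, 0.5, 2.0, 1.0
z, ybar = [], []
for _ in range(N):
    u1, u2 = random.gauss(0, 1), random.gauss(0, 1)
    zi = sigma_z * u1
    alpha = sigma_alpha * (r * u1 + math.sqrt(1 - r * r) * u2)  # corr(z, alpha) = r
    eps_bar = sum(random.gauss(0, 1) for _ in range(T)) / T      # time average of eps(it)
    z.append(zi)
    ybar.append(gamma * zi + alpha + eps_bar)                    # between equation

zm, ym = sum(z) / N, sum(ybar) / N
gamma_between = sum((a - zm) * (b - ym) for a, b in zip(z, ybar)) / sum((a - zm) ** 2 for a in z)
predicted = gamma + r * sigma_alpha / sigma_z                    # true value + bias
print(round(gamma_between, 3), predicted)
```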

This remains true in multiple regression, but the cross-correlations between several endogenous variables make the bias formula, (X'X)-1(X'α), more complicated.

Hausman and Taylor estimator

It deals with the endogeneity of time-invariant variables with respect to alpha(i):

Corr( z(i), alpha(i) ) ≠ 0

which the Mundlak estimator does not handle.

The trick is to use as internal instruments the time averages x(i.) of those time-varying variables that are exogenous.

A pre-test can use the Mundlak test of bb - bw for each x(i.): if |t| < 2, treat x(i.) as exogenous.

Weak instruments of HT

As with ANY instrumental variable, a strongly exogenous instrument is unfortunately often a weak instrument, poorly correlated with the regressor to be instrumented.

The WEAK-instrument bias of the parameter estimate and of its standard error may be very large.

So HT has limits.

You may reduce the weak-instrument bias by including the averages over time of the endogenous time-varying regressors in the HT estimation (see program).
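The weak-instrument problem can be illustrated with a generic just-identified IV estimator, cov(w, y) / cov(w, x); this is a sketch of IV in general, not of the Hausman Taylor estimator itself. It assumes an instrument w, an endogenous regressor x = π·w + v whose error is correlated (ρ = 0.8) with the structural error u, β = 1 and n = 100; all values are hypothetical. The median absolute error of the IV estimate is compared for a strong (π = 1) and a weak (π = 0.05) first stage.

```python
import math
import random
import statistics

def iv_slope(w, x, y):
    """Just-identified IV estimate cov(w, y) / cov(w, x)."""
    n = len(w)
    wm, xm, ym = sum(w) / n, sum(x) / n, sum(y) / n
    num = sum((wi - wm) * (yi - ym) for wi, yi in zip(w, y))
    den = sum((wi - wm) * (xi - xm) for wi, xi in zip(w, x))
    return num / den

def simulate(pi, n=100, reps=500, beta=1.0, rho=0.8, seed=5):
    """Median absolute error of the IV estimate when the first-stage coefficient is pi."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        w, x, y = [], [], []
        for _ in range(n):
            wi = rng.gauss(0, 1)                                      # exogenous instrument
            v = rng.gauss(0, 1)
            u = rho * v + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)   # corr(u, v) = rho
            xi = pi * wi + v                                          # first stage
            w.append(wi)
            x.append(xi)
            y.append(beta * xi + u)
        errors.append(abs(iv_slope(w, x, y) - beta))
    return statistics.median(errors)

strong = simulate(pi=1.0)
weak = simulate(pi=0.05)
print(strong, weak)
```

With the weak first stage the estimates are both widely dispersed and pulled toward the biased OLS value, which is the sense in which "HT has limits" when its internal instruments are weak.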

A pre-test estimator: the model

$$y_{it} = X_{it}\beta + Z_i\gamma + \alpha_i + \varepsilon_{it}$$

y : NT vector, endogenous variable

X : time-varying variables (varying across individuals and over time)

Z : time-invariant variables

α : individual effect

ε : disturbance term

The estimators of β and γ are biased unless identifying assumptions are made.

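Random effects Mundlak including time-invariant variables

Auxiliary regression: the unobserved individual random effect is a LINEAR function of the averages over time of ALL time-varying variables, $\bar{X}_{i.}$, and of the time-invariant variables $Z_i$:

$$\alpha_i = \bar{X}_{i.}\pi + Z_i\delta + u_i$$

Then:

$$y_{it} = X_{it}\beta + \bar{X}_{i.}\pi + Z_i(\gamma + \delta) + u_i + \varepsilon_{it}$$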

Pre-test: Step 1: select internal instruments for Hausman Taylor

Run a random effects Mundlak-Krishnakumar regression, which provides, for each time-varying variable m, a t-test of H0: πm = 0 against H1: πm ≠ 0.

If H0 is not rejected, add the variable to the subset of exogenous variables, X1.

If H0 is rejected, add the variable to the subset of endogenous variables, X2.

Time Invariant – Mundlak Pretest

$$y_{it} = \beta_W X_{it} + (\beta_B - \beta_W)\bar{X}_{i.} + \gamma_B Z_i + \alpha_i + \varepsilon_{it}$$

If H0: (βb - βw) = 0 is not rejected, X(i.) is treated as exogenous with respect to α(i).

It could then be a valid "internal" instrument for the time-invariant variables in the Hausman Taylor estimator (but it could be weak…).
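A minimal sketch of Step 1 of the pre-test on simulated data, assuming a balanced panel with two time-varying regressors (x1 independent of α(i), x2 correlated with it) and one exogenous time-invariant z; all names, coefficients and the seed are hypothetical. Instead of the random effects Mundlak-Krishnakumar regression used in the program, it estimates Within and Between separately and tests βb - βw with the sum of their conventional variances, an equivalent route in a balanced panel because the two estimators are uncorrelated under the random effects null.

```python
import math
import random

def inverse(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols(X, y, absorbed=0):
    """OLS coefficients and conventional covariance s^2 (X'X)^-1.

    `absorbed` = extra degrees of freedom already used (the N individual means in Within)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    xtx_inv = inverse(xtx)
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    ssr = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(X, y))
    s2 = ssr / (len(y) - k - absorbed)
    return b, [[s2 * v for v in row] for row in xtx_inv]

# Hypothetical balanced panel: x1 exogenous, x2 correlated with alpha(i), z time-invariant
rng = random.Random(2)
N, T = 300, 5
X_w, y_w, X_b, y_b = [], [], [], []
for _ in range(N):
    alpha = rng.gauss(0, 1)
    z = rng.gauss(0, 1)
    x1 = [rng.gauss(0, 1) for _ in range(T)]
    x2 = [0.8 * alpha + rng.gauss(0, 1) for _ in range(T)]
    y = [a + b + z + alpha + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    m1, m2, my = sum(x1) / T, sum(x2) / T, sum(y) / T
    # Within: deviations from individual means (z drops out)
    for a, b, c in zip(x1, x2, y):
        X_w.append([a - m1, b - m2])
        y_w.append(c - my)
    # Between: individual means, with a constant and the time-invariant z
    X_b.append([1.0, m1, m2, z])
    y_b.append(my)

b_w, v_w = ols(X_w, y_w, absorbed=N)
b_b, v_b = ols(X_b, y_b)

# Mundlak pre-test for each x_m: t = (beta_B - beta_W) / se, variances of Within and Between add
t_stats = {}
for m, name in enumerate(["x1", "x2"]):
    t_stats[name] = (b_b[m + 1] - b_w[m]) / math.sqrt(v_b[m + 1][m + 1] + v_w[m][m])

exogenous = [name for name, t in t_stats.items() if abs(t) < 2]    # candidate internal instruments (X1)
endogenous = [name for name, t in t_stats.items() if abs(t) >= 2]  # X2
print(t_stats, exogenous, endogenous)
```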

Mundlak with an unbalanced panel:

Wooldridge, Jeffrey M., "Correlated Random Effects Models with Unbalanced Panels", Department of Economics, Michigan State University, East Lansing, MI. Version: May 2010.

Step 2B

Run an unrestricted Hausman Taylor regression with the exogenous variables X1 as instruments for the endogenous time-invariant variables, and KEEP AS REGRESSORS the endogenous average-over-time variables X2(i.), to correct for the endogeneity of the time-varying variables.

Properties of the pre-test estimator

Extreme cases:

– All average-over-time variables are significant, πm ≠ 0: Mundlak-Krishnakumar estimation (Within / Between).

– No average-over-time variable is significant, πm = 0: restricted random effects (GLS) estimation.

No a priori information is needed on which variables to use as instruments.

Design

1000 replications; benchmark with autocorrelation coefficient 0.7.

y(it) depends on three time-varying variables X(it), one time-invariant variable Z(i), the individual effect α(i) and the disturbance ε(it); i = 1, …, N; t = 1, …, T.

X, Z and α are drawn from a multinormal distribution with mean 0 and standard deviations 1 and 3.

Correlation matrix R of the multinormal variables, with det(R) > 0 (off-diagonal correlations ranging from 0 to 0.75).

Auto-correlation 0.7 for X(it), T = 5:

$$R(X_{11})_{ts} = 0.7^{|t-s|}$$
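The benchmark autocorrelation structure can be sketched as follows, assuming the AR(1) form corr(X_it, X_is) = 0.7^|t-s| with T = 5 and unit variances; the number of individuals and the seed are arbitrary. A Cholesky factor both checks that the correlation matrix is positive definite (det(R) > 0) and generates the multinormal draws.

```python
import math
import random

def cholesky(r):
    """Lower-triangular L with L L' = r; raises ValueError if r is not positive definite."""
    n = len(r)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = r[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def corr(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(a)
    am, bm = sum(a) / n, sum(b) / n
    cov = sum((u - am) * (v - bm) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - am) ** 2 for u in a) * sum((v - bm) ** 2 for v in b))

T, RHO = 5, 0.7
R = [[RHO ** abs(t - s) for s in range(T)] for t in range(T)]   # R(X11): AR(1) autocorrelation
L = cholesky(R)
det_R = math.prod(L[i][i] for i in range(T)) ** 2               # det(R) = product of diag(L)^2

rng = random.Random(6)
draws = []
for _ in range(20000):                                           # one path X_i1..X_iT per individual
    e = [rng.gauss(0, 1) for _ in range(T)]
    draws.append([sum(L[t][k] * e[k] for k in range(t + 1)) for t in range(T)])

lag_corr = {k: corr([d[0] for d in draws], [d[k] for d in draws]) for k in range(1, T)}
print(round(det_R, 6), {k: round(v, 3) for k, v in lag_corr.items()})
```

For this AR(1) matrix the determinant equals (1 - 0.7²)^(T-1) = 0.51^4, and the sample correlations at lags 1 to 4 should be close to 0.7, 0.49, 0.343 and 0.24.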

Second question: OLS bias for the 4 parameters

Hausman Taylor for ICRG

See program

Step 3: GMM-system and time invariant?

Not seriously investigated so far, except in a recent working paper.

GMM-system: the equations in levels are instrumented by lagged first differences, and the equations in first differences by lagged levels.

Risk of too many instruments.
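The risk comes from the instrument count growing quadratically with T. A minimal count, assuming the simplest dynamic panel (y(it) on y(i,t-1) only, first observation at t = 1), all available lags used and no collapsing of the instrument matrix (Roodman's xtabond2 offers a collapse suboption to restrict it):

```python
def system_gmm_instrument_count(t_periods):
    """Instrument count for an AR(1) dynamic panel, all lags used, no collapsing.

    Differenced equation at t = 3..T: levels y(i1), ..., y(i,t-2) -> t - 2 instruments each.
    Levels equation at t = 3..T: lagged difference dy(i,t-1) -> 1 instrument each.
    """
    diff_eq = sum(t - 2 for t in range(3, t_periods + 1))
    level_eq = t_periods - 2
    return diff_eq, level_eq, diff_eq + level_eq

for T in (5, 6, 10, 20):
    d, l, total = system_gmm_instrument_count(T)
    print(f"T = {T:2d}: {d:3d} (differences) + {l:2d} (levels) = {total:3d} instruments")
```

The total is (T - 2)(T + 1)/2 for a single variable, so with N = 56 countries, as in the Burnside-Dollar between regression, even T = 10 already gives 44 instruments, a number comparable to N.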

Documents
