Research Method, Lecture 8 (Ch14): Advanced Panel Data Method


Fixed effects estimation

Fixed effects estimation is another method to eliminate the time invariant unobserved effect.

Consider the following model

$$y_{it} = \beta_0 + \beta_1 x_{it1} + \beta_2 x_{it2} + \dots + \beta_k x_{itk} + a_i + u_{it} \qquad (1)$$

The correlation between the fixed effect ai and the explanatory variables will cause biases in the estimated coefficients.

Thus, we need to eliminate ai from the estimation. First differencing is one method. Another method is the following.

First, compute the sample average of each variable for each individual. (That is, for the ith individual, compute the time-series average of each variable.) Then, you have the following

$$\bar{y}_i = \beta_0 + \beta_1 \bar{x}_{i1} + \beta_2 \bar{x}_{i2} + \dots + \beta_k \bar{x}_{ik} + a_i + \bar{u}_i \qquad (2)$$

Since ai is constant over time, the ai term in equation (2) has no over-bar.

Now, subtract (2) from (1). Then, you get the following equation.

$$(y_{it} - \bar{y}_i) = \beta_1 (x_{it1} - \bar{x}_{i1}) + \beta_2 (x_{it2} - \bar{x}_{i2}) + \dots + \beta_k (x_{itk} - \bar{x}_{ik}) + (u_{it} - \bar{u}_i)$$

Notice that this transformation eliminates the fixed effect ai. It is called the within transformation. Note also that it eliminates the constant as well.

Now, we simplify the notation by writing the above equation as:

$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + \beta_2 \ddot{x}_{it2} + \dots + \beta_k \ddot{x}_{itk} + \ddot{u}_{it} \qquad (3)$$

where $\ddot{y}_{it} = y_{it} - \bar{y}_i$. This is called the time-demeaned data on y. The same notation is used for the x-variables and u.

Finally, estimate the demeaned equation (3) using OLS. This is called the fixed effect estimation.

To repeat, you simply run OLS on the following equation, and this is called the fixed effect estimation.

$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + \beta_2 \ddot{x}_{it2} + \dots + \beta_k \ddot{x}_{itk} + \ddot{u}_{it}$$

Note that you do not have the intercept in this model.
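The within transformation can be checked on a toy panel. The sketch below is only an illustration: the data are made up (three units, three periods, one regressor, no noise), and the helper names `demean`, `slope_through_origin` and `pooled_slope` are hypothetical. Here ai is larger for units with larger x, so pooled OLS should be biased while OLS on the time-demeaned data should recover the true slope.

```python
from statistics import mean

# Synthetic balanced panel (illustrative only): y_it = 2 * x_it + a_i.
# a_i rises with the level of x, so a_i is correlated with x.
TRUE_SLOPE = 2.0
x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 7.0], "C": [8.0, 9.0, 12.0]}
a = {"A": 0.0, "B": 10.0, "C": 20.0}
y = {i: [TRUE_SLOPE * v + a[i] for v in x[i]] for i in x}


def demean(panel):
    """Within transformation: subtract each unit's own time average."""
    return {i: [v - mean(vals) for v in vals] for i, vals in panel.items()}


def flatten(panel):
    """Stack all units' observations into one list."""
    return [v for vals in panel.values() for v in vals]


def slope_through_origin(xs, ys):
    """OLS slope with no intercept: sum(x*y) / sum(x*x)."""
    return sum(p * q for p, q in zip(xs, ys)) / sum(p * p for p in xs)


def pooled_slope(xs, ys):
    """OLS slope with an intercept, ignoring the panel structure."""
    mx, my = mean(xs), mean(ys)
    return slope_through_origin([p - mx for p in xs], [q - my for q in ys])


fe_slope = slope_through_origin(flatten(demean(x)), flatten(demean(y)))
ols_slope = pooled_slope(flatten(x), flatten(y))
print(f"fixed effect slope: {fe_slope:.3f}")
print(f"pooled OLS slope:   {ols_slope:.3f}")
```

Because the demeaned regression has no intercept, the slope is just the ratio of two sums; with several regressors the same idea applies with a full OLS routine.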

The standard error for the fixed effect estimator

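Now, define the fixed effects residual as

$$\hat{u}_{it} = \ddot{y}_{it} - (\hat{\beta}_1 \ddot{x}_{it1} + \dots + \hat{\beta}_k \ddot{x}_{itk})$$

Then, the unbiased estimator of the variance of uit is given by

$$\hat{\sigma}_u^2 = \frac{SSR}{NT - N - k} = \frac{1}{NT - N - k} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it}^2$$

Here NT-N-k is the degree of freedom, where

NT = total # of observations (T is the # of periods, and N is the # of cross sectional units)
k = # of parameters excluding the intercept
N = # of cross sectional units (# of individuals, firms etc)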

After computing the estimated variance $\hat{\sigma}_u^2$, you can compute the standard errors for the parameters by applying the formula given in Handout 2.

Notice that if you manually create the time-demeaned variables and apply OLS, the usual statistical software will compute the degree of freedom as NT-k. This will understate the standard errors.

In this case, you have to correct the standard errors by multiplying each standard error by $\sqrt{(NT-k)/(NT-N-k)}$.

Fortunately, STATA has a command that estimates the fixed effect model automatically with correct standard errors.

Estimating ai

Sometimes (though not often), ai itself is of interest. It can be easily estimated as:

$$\hat{a}_i = \bar{y}_i - \hat{\beta}_1 \bar{x}_{i1} - \dots - \hat{\beta}_k \bar{x}_{ik}$$

When you estimate a fixed effect model using STATA, STATA reports the `intercept'. Remember that the fixed effect model does not have an intercept. What STATA is reporting is the average value of $\hat{a}_i$.

Example

JTRAIN.dta is a three-year panel data set. In the first differenced model, we used only the first two years. Now use all three years and estimate the following model.

log(scrap)it = β0 + β1(grant)it + β2log(sales)it + β3log(#employees)it + β4(year88)it + β5(year89)it + ai + uit

Ex1. Estimate the model using OLS, ignoring the presence of the fixed effect.

Ex2. Estimate the model using the fixed effect model.

Ex1. OLS result

. use "D:\My Documents\IUJ_teaching\Research Methodology\Wooldridge Econometrics resources\data\JTRAIN.DTA", clear

. tsset fcode year
       panel variable:  fcode (strongly balanced)
        time variable:  year, 1987 to 1989
                delta:  1 unit

. reg lscrap grant lemploy lsale d88 d89

      Source |       SS       df       MS              Number of obs =     148
-------------+------------------------------           F(  5,   142) =    3.28
       Model |  31.4579815     5  6.29159631           Prob > F      =  0.0079
    Residual |  272.667651   142  1.92019473           R-squared     =  0.1034
-------------+------------------------------           Adj R-squared =  0.0719
       Total |  304.125633   147  2.06888185           Root MSE      =  1.3857

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   .1460224   .3185326     0.46   0.647    -.4836564    .7757012
     lemploy |   .7193017   .2004453     3.59   0.000     .3230593    1.115544
      lsales |  -.5983353   .2072189    -2.89   0.004    -1.007968   -.1887027
         d88 |  -.1792051   .3029787    -0.59   0.555    -.7781368    .4197266
         d89 |  -.3971186   .2897043    -1.37   0.173    -.9698094    .1755721
       _cons |   7.007854   2.569933     2.73   0.007     1.927583    12.08813
------------------------------------------------------------------------------


Fixed effect model

. xtreg lscrap grant lemploy lsale d88 d89, fe

Fixed-effects (within) regression               Number of obs      =       148
Group variable: fcode                           Number of groups   =        51

R-sq:  within  = 0.1637                         Obs per group: min =         1
       between = 0.0111                                        avg =       2.9
       overall = 0.0059                                        max =         3

                                                F(5,92)            =      3.60
corr(u_i, Xb)  = -0.0613                        Prob > F           =    0.0051

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   -.088648   .1340168    -0.66   0.510    -.3548169    .1775209
     lemploy |  -.0149276   .3581686    -0.04   0.967    -.7262814    .6964261
      lsales |  -.0654104   .2660989    -0.25   0.806    -.5939057     .463085
         d88 |  -.0926408   .1165089    -0.80   0.429    -.3240376     .138756
         d89 |  -.3785155   .1168365    -3.24   0.002    -.6105629   -.1464681
       _cons |   1.570754   3.178357     0.49   0.622    -4.741738    7.883246
-------------+----------------------------------------------------------------
     sigma_u |  1.3991565
     sigma_e |  .50390488
         rho |  .88518501   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(50, 92) =    19.64              Prob > F = 0.0000

Ex3. The fixed effect model above did not show a statistically significant effect of the grant. This is probably because it takes some time for the effect of grants to appear. To capture this possibility, include the lag of grant. That is, estimate the following model.

log(scrap)it = β0 + β1(grant)it + β2(grant)it-1 + β3log(sales)it + β4log(#employees)it + β5(year88)it + β6(year89)it + ai + uit

Here (grant)it-1 is the one year lag of the grant. This is called the distributed lag model. The lag of the grant captures the effect of receiving a grant last year on this year's scrap rate.
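A lagged variable in a panel must be taken from the same firm's previous year, never from the row above if that row belongs to a different firm. The sketch below shows the idea on made-up rows; the function name `lag_within_unit` and the data are hypothetical. Filling the first year's lag with 0 is an assumption (no grants before the sample starts); it is consistent with the regression below keeping all 148 observations, but it is not something the slides state.

```python
def lag_within_unit(rows, var, lag_name, fill=None):
    """Return copies of rows with a one-year lag of `var` added.

    rows: list of dicts with keys "unit", "year" and `var`.
    The lag is looked up from the same unit's previous year; when that
    row does not exist, the lag is set to `fill`.
    """
    lookup = {(r["unit"], r["year"]): r[var] for r in rows}
    out = []
    for r in sorted(rows, key=lambda r: (r["unit"], r["year"])):
        new = dict(r)
        new[lag_name] = lookup.get((r["unit"], r["year"] - 1), fill)
        out.append(new)
    return out


# Hypothetical two-firm panel, not taken from JTRAIN.dta.
panel = [
    {"unit": 1, "year": 1987, "grant": 0},
    {"unit": 1, "year": 1988, "grant": 1},
    {"unit": 1, "year": 1989, "grant": 0},
    {"unit": 2, "year": 1987, "grant": 0},
    {"unit": 2, "year": 1988, "grant": 0},
    {"unit": 2, "year": 1989, "grant": 1},
]

lagged = lag_within_unit(panel, "grant", "grant_1", fill=0)
for r in lagged:
    print(r["unit"], r["year"], r["grant"], r["grant_1"])
```

Looking the lag up by (unit, year) rather than by row position also gives the right answer when a firm has a gap in its years.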


Fixed effect model with one year lag of the grant

. xtreg lscrap grant grant_1 lemploy lsale d88 d89, fe

Fixed-effects (within) regression               Number of obs      =       148
Group variable: fcode                           Number of groups   =        51

R-sq:  within  = 0.2131                         Obs per group: min =         1
       between = 0.0341                                        avg =       2.9
       overall = 0.0004                                        max =         3

                                                F(6,91)            =      4.11
corr(u_i, Xb)  = -0.2258                        Prob > F           =    0.0011

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |  -.2967542   .1570861    -1.89   0.062    -.6087863     .015278
     grant_1 |  -.5355783    .224206    -2.39   0.019     -.980936   -.0902207
     lemploy |  -.0763679   .3502902    -0.22   0.828    -.7721764    .6194405
      lsales |  -.0868577   .2596985    -0.33   0.739    -.6027167    .4290014
         d88 |  -.0039609   .1195487    -0.03   0.974    -.2414296    .2335079
         d89 |   -.132193   .1536863    -0.86   0.392    -.4374719     .173086
       _cons |   2.115481    3.10843     0.68   0.498    -4.059034    8.289996
-------------+----------------------------------------------------------------
     sigma_u |  1.4415155
     sigma_e |  .49149057
         rho |  .89585692   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(50, 91) =    20.75              Prob > F = 0.0000

The lag of grant has a greater effect than the current grant (-0.536 versus -0.297). This indicates that it takes time for the effect to appear.

Ex4. Finally, estimate the following fixed effect model by manually creating the time-demeaned variables. This is a good exercise for understanding the exact procedure of the fixed effect estimation.

log(scrap)it = β0 + β1(grant)it + β2(year88)it + β3(year89)it + ai + uit

Fixed effect estimated by manually creating time-demeaned variables. Note the standard errors are wrong, so you have to correct them.

. reg dmlscrap dmgrant dmd88 dmd89

      Source |       SS       df       MS              Number of obs =     162
-------------+------------------------------           F(  3,   158) =   10.80
       Model |  5.48707982     3  1.82902661           Prob > F      =  0.0000
    Residual |  26.7625405   158  .169383168           R-squared     =  0.1701
-------------+------------------------------           Adj R-squared =  0.1544
       Total |  32.2496203   161  .200308201           Root MSE      =  .41156

------------------------------------------------------------------------------
    dmlscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dmgrant |  -.0822141   .1029302    -0.80   0.426    -.2855107    .1210825
       dmd88 |   -.140066   .0870923    -1.61   0.110    -.3120812    .0319493
       dmd89 |    -.42704   .0814664    -5.24   0.000    -.5879437   -.2661363
       _cons |  -8.26e-09   .0323354    -0.00   1.000    -.0638653    .0638653
------------------------------------------------------------------------------

Fixed effect estimated automatically

. xtreg lscrap grant d88 d89, fe

Fixed-effects (within) regression               Number of obs      =       162
Group variable: fcode                           Number of groups   =        54

R-sq:  within  = 0.1701                         Obs per group: min =         3
       between = 0.0189                                        avg =       3.0
       overall = 0.0130                                        max =         3

                                                F(3,105)           =      7.18
corr(u_i, Xb)  = -0.0109                        Prob > F           =    0.0002

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |  -.0822141   .1262632    -0.65   0.516    -.3325706    .1681424
         d88 |   -.140066    .106835    -1.31   0.193       -.3519    .0717681
         d89 |    -.42704   .0999338    -4.27   0.000    -.6251903   -.2288897
       _cons |   .5974341   .0687024     8.70   0.000     .4612098    .7336583
-------------+----------------------------------------------------------------
     sigma_u |  1.4283441
     sigma_e |  .50485774
         rho |  .88894293   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(53, 105) =    23.90             Prob > F = 0.0000
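The degree-of-freedom correction can be verified with the numbers in the two outputs. The sketch below is just arithmetic on the reported figures. One detail differs slightly from the general formula: the manual regression was run with a constant, so its reported residual degree of freedom is 158 = NT-k-1 rather than NT-k, and that reported value is what the correction factor uses here.

```python
from math import sqrt

# Figures read from the two Ex4 outputs.
n_obs = 162        # NT
n_units = 54       # N
k = 3              # slope parameters: grant, d88, d89
df_manual = 158    # residual df reported by `reg` (a constant was included)
df_fe = n_obs - n_units - k   # NT - N - k, the correct degree of freedom

manual_se = {"grant": 0.1029302, "d88": 0.0870923, "d89": 0.0814664}
xtreg_se = {"grant": 0.1262632, "d88": 0.106835, "d89": 0.0999338}

# Multiply each manual standard error by sqrt(df_manual / df_fe).
factor = sqrt(df_manual / df_fe)
corrected_se = {name: se * factor for name, se in manual_se.items()}

for name in manual_se:
    print(f"{name:6s} manual={manual_se[name]:.7f} "
          f"corrected={corrected_se[name]:.7f} xtreg={xtreg_se[name]:.7f}")
```

The corrected values agree with the `xtreg, fe` standard errors up to rounding, which confirms that the two procedures differ only in the degree of freedom.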


The do file

*****************************
* Manually estimating the   *
* fixed effect model        *
*****************************
sort fcode

* For each variable: firm-level time average, then time-demeaned value
by fcode: egen meanlscrap=mean(lscrap)
gen dmlscrap=lscrap-meanlscrap

by fcode: egen meangrant=mean(grant)
gen dmgrant=grant-meangrant

by fcode: egen meand88=mean(d88)
gen dmd88=d88-meand88

by fcode: egen meand89=mean(d89)
gen dmd89=d89-meand89

*****************************
* Estimate the model        *
*****************************
* OLS on the demeaned data (standard errors need the df correction)
reg dmlscrap dmgrant dmd88 dmd89
* Built-in fixed effect estimator for comparison
xtreg lscrap grant d88 d89, fe

Note, when you estimate the fixed effect model, it is a good idea to tell your audience what the potential fixed effect would be and whether it is correlated with the explanatory variables.

Of course, one can never tell exactly what the fixed effect is, since it is the aggregate of all the unobserved effects. However, if you describe what is likely contained in the fixed effect, your audience can understand the potential direction of the bias and why you need to use the fixed effect model.

The dummy variable regression

Consider again the following model.

log(scrap)it=β0+β1(grant)it

+β2(year88)it+β3(year89)it+ai+uit

We learned that the fixed effect model can correct for the biases arising from the correlation between ai and the explanatory variables.

Now, consider instead that you include all the firm dummy variables in the model, and estimate the model using the usual OLS.

It is known that the slope coefficients and their standard errors obtained from this procedure are exactly the same as those obtained from the fixed effect estimation.

The coefficients for dummy variables will be the same as the fixed effect estimates for ai.


However, note that the coefficients for the dummy variables are not consistent when the number of periods (T) is fixed and the number of firms (N) gets large. This is because, when N gets large, the number of ai will increase. So no information accumulates on each ai.
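The equivalence between the dummy variable regression and the fixed effect estimation can be checked numerically. The sketch below uses a made-up panel with three firms and one regressor; the helpers `solve` and `ols` are hypothetical and written out by hand so that no external library is needed. It compares only the coefficients, not the standard errors.

```python
from statistics import mean

# Hypothetical panel (illustrative values only): 3 units, 3 periods.
x = {"A": [1.0, 2.0, 4.0], "B": [3.0, 5.0, 6.0], "C": [7.0, 8.0, 12.0]}
y = {"A": [2.5, 3.0, 8.0], "B": [15.0, 21.5, 22.0], "C": [33.0, 38.0, 44.5]}
units = list(x)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ols(X, yv):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    Xty = [sum(row[p] * v for row, v in zip(X, yv)) for p in range(k)]
    return solve(XtX, Xty)


# Dummy variable regression: one dummy per unit (no overall intercept) plus x.
X_rows, y_vals = [], []
for i in units:
    for xv, yv in zip(x[i], y[i]):
        X_rows.append([1.0 if i == j else 0.0 for j in units] + [xv])
        y_vals.append(yv)
coefs = ols(X_rows, y_vals)
dummy_coefs, lsdv_slope = coefs[:-1], coefs[-1]

# Fixed effect estimation on time-demeaned data, then a_i-hat for each unit.
xd = [v - mean(x[i]) for i in units for v in x[i]]
yd = [v - mean(y[i]) for i in units for v in y[i]]
fe_slope = sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)
a_hat = [mean(y[i]) - fe_slope * mean(x[i]) for i in units]

print("slopes:", lsdv_slope, fe_slope)
print("dummy coefficients:", dummy_coefs)
print("a_i estimates:     ", a_hat)
```

With N dummies the normal equations grow with the number of firms, which is one practical reason to prefer the within transformation when N is large.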


The Random Effect Estimation

Consider the following unobserved effect model.

$$y_{it} = \beta_0 + \beta_1 x_{it1} + \beta_2 x_{it2} + \dots + \beta_k x_{itk} + \underbrace{a_i + u_{it}}_{v_{it}} \qquad (1)$$

Previously, we applied the fixed effect estimation since we suspected that ai are correlated with some of the explanatory variables.

But if we can assume that ai are not correlated with any of the explanatory variables, we can estimate the model more efficiently (i.e., get smaller standard errors).

When ai are not correlated with any of the explanatory variables, pooled OLS will be consistent.

But the problem is now the serial correlation. That is, for a given person i, the composite error term vit of this period and other periods are correlated.


To be more precise, assume the following.

Cov(xitj, ai)=0 for t=1,2,…,T, and j=1,2,…,k

That is: ai is uncorrelated with all the explanatory variables in all the periods.

In addition, we assume that ai and the idiosyncratic errors in all the periods are uncorrelated.

Then we can show the following.

$$Corr(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2} \neq 0 \quad \text{for } t \neq s$$

where σa2=Var(ai) and σu2=Var(uit). Proof: See the front board.
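A short sketch of the argument, under the assumptions stated above plus two that the slide does not spell out and that are assumed here: the uit have constant variance and are uncorrelated across periods.

```latex
\begin{aligned}
\mathrm{Cov}(v_{it}, v_{is}) &= \mathrm{Cov}(a_i + u_{it},\; a_i + u_{is}) = \mathrm{Var}(a_i) = \sigma_a^2, \qquad t \neq s, \\
\mathrm{Var}(v_{it}) &= \mathrm{Var}(a_i) + \mathrm{Var}(u_{it}) = \sigma_a^2 + \sigma_u^2, \\
\mathrm{Corr}(v_{it}, v_{is}) &= \frac{\mathrm{Cov}(v_{it}, v_{is})}{\sqrt{\mathrm{Var}(v_{it})\,\mathrm{Var}(v_{is})}} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}.
\end{aligned}
```

The cross terms drop out because ai is uncorrelated with the idiosyncratic errors, so the only thing the two composite errors share is ai.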

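Here is a way to eliminate the serial correlation.

Consider the following.

$$\lambda = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2}$$

Then, the terms $v_{it} - \lambda \bar{v}_i$ are not serially correlated. Thus, first consider the following.

$$\lambda \bar{y}_i = \lambda \beta_0 + \beta_1 (\lambda \bar{x}_{i1}) + \beta_2 (\lambda \bar{x}_{i2}) + \dots + \beta_k (\lambda \bar{x}_{ik}) + \lambda \bar{v}_i \qquad (2)$$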

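Then, subtract (2) from (1) to get

$$(y_{it} - \lambda \bar{y}_i) = \beta_0 (1 - \lambda) + \beta_1 (x_{it1} - \lambda \bar{x}_{i1}) + \beta_2 (x_{it2} - \lambda \bar{x}_{i2}) + \dots + \beta_k (x_{itk} - \lambda \bar{x}_{ik}) + (v_{it} - \lambda \bar{v}_i) \qquad (3)$$

As can be seen, the composite error term is $v_{it} - \lambda \bar{v}_i$, and we know that this error term has no serial correlation. The transformed data are called the quasi-demeaned data. Therefore, if we apply OLS to (3), we get the correct standard errors.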
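Quasi-demeaning sits between pooled OLS and the fixed effect transformation: with λ=0 the data are left unchanged, and with λ=1 the full time average is subtracted, which is exactly the within transformation. The sketch below illustrates this for one unit; the function name `quasi_demean` and the numbers are hypothetical.

```python
from statistics import mean


def quasi_demean(values, lam):
    """Quasi-demean one unit's time series: v_t - lam * (time average of v)."""
    m = mean(values)
    return [v - lam * m for v in values]


series = [2.0, 4.0, 9.0]   # made-up observations for one unit (average 5.0)

pooled = quasi_demean(series, 0.0)    # lam = 0: data unchanged (pooled OLS)
within = quasi_demean(series, 1.0)    # lam = 1: full demeaning (fixed effect)
partial = quasi_demean(series, 0.5)   # 0 < lam < 1: random effect case

print(pooled, within, partial)
```

From the formula for λ, a larger T or a larger variance of ai relative to uit pushes λ toward 1, so the random effect estimates move toward the fixed effect estimates.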

One problem is that λ is an unknown parameter, so it has to be estimated.

The procedure to estimate λ is the following.

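1. Estimate (1) using OLS and obtain the residuals $\hat{v}_{it}$. Then estimate σa2, σv2 and σu2 as:

$$\hat{\sigma}_a^2 = \left[ NT(T-1)/2 - (k+1) \right]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} \hat{v}_{it} \hat{v}_{is}$$

$$\hat{\sigma}_v^2 = \left[ NT - (k+1) \right]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{v}_{it}^2$$

This is just the estimate of sigma-squared from the pooled OLS of (1).

$$\hat{\sigma}_u^2 = \hat{\sigma}_v^2 - \hat{\sigma}_a^2$$

2. Then estimate λ as:

$$\hat{\lambda} = 1 - \left[ \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + T \hat{\sigma}_a^2} \right]^{1/2}$$

3. Finally, replace λ in equation (3) with $\hat{\lambda}$ and estimate the equation using OLS. This is called the Random Effect Estimation.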

Example

Estimate a log wage equation using WAGEPAN.dta. Include in the model education, black, hispan, exper, exper squared, married, union, and a full set of year dummies.

First, estimate the model using OLS.
Next, estimate the model using the random effect.
Finally, estimate the model using the fixed effect model. Why does STATA drop some of the variables?

OLS

. reg lwage educ black hisp exper expersq married union d81 d82 d83 d84 d85 d86 d87

      Source |       SS       df       MS              Number of obs =    4360
-------------+------------------------------           F( 14,  4345) =   72.46
       Model |  234.048277    14  16.7177341           Prob > F      =  0.0000
    Residual |  1002.48136  4345  .230720682           R-squared     =  0.1893
-------------+------------------------------           Adj R-squared =  0.1867
       Total |  1236.52964  4359  .283672779           Root MSE      =  .48033

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0913498   .0052374    17.44   0.000     .0810819    .1016177
       black |  -.1392342   .0235796    -5.90   0.000    -.1854622   -.0930062
        hisp |   .0160195   .0207971     0.77   0.441    -.0247535    .0567925
       exper |   .0672345   .0136948     4.91   0.000     .0403856    .0940834
     expersq |  -.0024117     .00082    -2.94   0.003    -.0040192   -.0008042
     married |   .1082529   .0156894     6.90   0.000     .0774937    .1390122
       union |   .1824613   .0171568    10.63   0.000     .1488253    .2160973
         d81 |     .05832   .0303536     1.92   0.055    -.0011886    .1178286
         d82 |   .0627744   .0332141     1.89   0.059    -.0023421    .1278909
         d83 |   .0620117   .0366601     1.69   0.091    -.0098608    .1338843
         d84 |   .0904672   .0400907     2.26   0.024      .011869    .1690654
         d85 |   .1092463   .0433525     2.52   0.012     .0242533    .1942393
         d86 |   .1419596    .046423     3.06   0.002     .0509469    .2329723
         d87 |   .1738334    .049433     3.52   0.000     .0769194    .2707474
       _cons |   .0920558   .0782701     1.18   0.240    -.0613935    .2455051
------------------------------------------------------------------------------

Random Effect

. xtreg lwage educ black hisp exper expersq married union d81 d82 d83 d84 d85 d86 d87,re

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1799                         Obs per group: min =         8
       between = 0.1860                                        avg =       8.0
       overall = 0.1830                                        max =         8

Random effects u_i ~ Gaussian                   Wald chi2(14)      =    957.77
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0918763   .0106597     8.62   0.000     .0709836    .1127689
       black |  -.1393767   .0477228    -2.92   0.003    -.2329117   -.0458417
        hisp |   .0217317   .0426063     0.51   0.610    -.0617751    .1052385
       exper |   .1057545   .0153668     6.88   0.000     .0756361    .1358729
     expersq |  -.0047239   .0006895    -6.85   0.000    -.0060753   -.0033726
     married |    .063986   .0167742     3.81   0.000     .0311091    .0968629
       union |   .1061344   .0178539     5.94   0.000     .0711415    .1411273
         d81 |    .040462   .0246946     1.64   0.101    -.0079385    .0888626
         d82 |   .0309212   .0323416     0.96   0.339    -.0324672    .0943096
         d83 |   .0202806    .041582     0.49   0.626    -.0612186    .1017798
         d84 |   .0431187   .0513163     0.84   0.401    -.0574595    .1436969
         d85 |   .0578155   .0612323     0.94   0.345    -.0621977    .1778286
         d86 |   .0919476   .0712293     1.29   0.197    -.0476592    .2315544
         d87 |   .1349289   .0813135     1.66   0.097    -.0244427    .2943005
       _cons |   .0235864   .1506683     0.16   0.876     -.271718    .3188907
-------------+----------------------------------------------------------------
     sigma_u |  .32460315
     sigma_e |  .35099001
         rho |  .46100216   (fraction of variance due to u_i)
------------------------------------------------------------------------------
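The variance components in the random effect output connect directly to the formulas for the serial correlation and for λ. Note the change of notation: STATA's sigma_u is the standard deviation of the individual effect (ai in this lecture) and sigma_e is the standard deviation of the idiosyncratic error (uit in this lecture). The sketch below is only arithmetic on the reported values, with T=8 taken from the "Obs per group" lines.

```python
from math import sqrt

# Values read from the random effect output for WAGEPAN.dta.
sigma_a = 0.32460315   # reported as sigma_u: std. dev. of the individual effect
sigma_u = 0.35099001   # reported as sigma_e: std. dev. of the idiosyncratic error
T = 8                  # periods per individual

# Corr(v_it, v_is) = sigma_a^2 / (sigma_a^2 + sigma_u^2)
rho = sigma_a**2 / (sigma_a**2 + sigma_u**2)

# lambda = 1 - [sigma_u^2 / (sigma_u^2 + T * sigma_a^2)]^(1/2)
lam = 1 - sqrt(sigma_u**2 / (sigma_u**2 + T * sigma_a**2))

print(f"rho    = {rho:.6f}")
print(f"lambda = {lam:.3f}")
```

The computed rho reproduces the rho line of the output, and the implied λ is roughly 0.64, so the random effect estimates here lie closer to the fixed effect estimates than to pooled OLS.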

Fixed effect

. xtreg lwage educ black hisp exper expersq married union d81 d82 d83 d84 d85 d86 d87,fe

Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1806                         Obs per group: min =         8
       between = 0.0005                                        avg =       8.0
       overall = 0.0635                                        max =         8

                                                F(10,3805)         =     83.85
corr(u_i, Xb)  = -0.1212                        Prob > F           =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (dropped)
       black |  (dropped)
        hisp |  (dropped)
       exper |   .1321464   .0098247    13.45   0.000     .1128842    .1514087
     expersq |  -.0051855   .0007044    -7.36   0.000    -.0065666   -.0038044
     married |   .0466804   .0183104     2.55   0.011     .0107811    .0825796
       union |   .0800019   .0193103     4.14   0.000     .0421423    .1178614
         d81 |   .0190448   .0203626     0.94   0.350    -.0208779    .0589674
         d82 |   -.011322   .0202275    -0.56   0.576    -.0509798    .0283359
         d83 |  -.0419955   .0203205    -2.07   0.039    -.0818357   -.0021553
         d84 |  -.0384709   .0203144    -1.89   0.058    -.0782991    .0013573
         d85 |  -.0432498   .0202458    -2.14   0.033    -.0829434   -.0035562
         d86 |  -.0273819   .0203863    -1.34   0.179    -.0673511    .0125872
         d87 |  (dropped)
       _cons |    1.02764   .0299499    34.31   0.000     .9689201    1.086359
-------------+----------------------------------------------------------------
     sigma_u |   .4009279
     sigma_e |  .35099001
         rho |  .56612236   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3805) =     7.96           Prob > F = 0.0000
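One way to see why variables are dropped is to apply the time-demeaning to a single individual. The sketch below uses made-up values and a hypothetical helper `demean`. It assumes that exper rises by exactly one each year for every individual; under that assumption the demeaned exper equals the demeaned year index, which is itself a linear combination of the year dummies.

```python
from statistics import mean


def demean(values):
    """Subtract the individual's own time average from each observation."""
    m = mean(values)
    return [v - m for v in values]


# One hypothetical individual observed for 8 years (values are made up).
educ = [12.0] * 8                            # does not change over time
exper = [float(t) for t in range(1, 9)]      # assumed to rise by one each year
year_index = [float(t) for t in range(8)]    # 0*base + 1*d81 + 2*d82 + ... + 7*d87

educ_dm = demean(educ)
exper_dm = demean(exper)
year_dm = demean(year_index)

print(educ_dm)               # all zeros: no variation left to estimate a slope
print(exper_dm == year_dm)   # exper is collinear with the year dummies
```

Time-invariant variables such as educ, black and hisp become identically zero after demeaning, so their coefficients cannot be estimated. The perfect collinearity between exper and the full set of year dummies means one more variable has to go, and in the output above it is d87.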

Fixed effect or random effect

Fixed effect estimation allows arbitrary correlation between ai and the explanatory variables. Random effect is valid only if ai is uncorrelated with all of the explanatory variables.

When you conduct a policy analysis, correlation should be considered as the rule rather than the exception.

Thus fixed effect is almost always more convincing than the random effect.


But if the policy variable is set experimentally, then you might apply random effect. For example, suppose that you want to know the effect of class size on students' achievement. If students are randomly assigned to classes of different sizes, then random effect can be applied.

However, again, this kind of situation is rare. So, the usual recommendation is to use the fixed effect method.
