MULTIVARIATE REGRESSION MODELS FOR PANEL DATA


Journal of Econometrics 18 (1982) 5-46. North-Holland Publishing Company

    MULTIVARIATE REGRESSION MODELS

    FOR PANEL DATA

    Gary CHAMBERLAIN*

University of Wisconsin, Madison, WI 53706, USA

National Bureau of Economic Research, Cambridge, MA 02138, USA

The paper examines the relationship between heterogeneity bias and strict exogeneity in a distributed lag regression of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite. The individual-specific random variables introduce nonlinearity and heteroskedasticity; so the paper provides an appropriate framework for the estimation of multivariate linear predictors. Restrictions are imposed using a minimum distance estimator. It is generally more efficient than the conventional estimators such as quasi-maximum likelihood. There are computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain. Some of these ideas are illustrated using the sample of Young Men in the National Longitudinal Survey. The paper reports regressions on the leads and lags of variables measuring union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.

    1. Introduction

Suppose that we have a sample of individuals (or firms) followed over time: (x_it, y_it), where there are t = 1, ..., T periods and i = 1, ..., N individuals. Consider the following distributed lag specification:

E(y_it | x_i1, ..., x_iT, b_i0, ..., b_iJ, c_i) = Σ_{j=0}^{J} b_ij x_{i,t−j} + c_i,   t = J+1, ..., T.

The coefficients b_ij and c_i are allowed to vary across individuals but are constant over time. The population parameters of interest are β_j = E(b_ij), j = 0, ..., J. If the b_ij or c_i are correlated with x, then a least squares regression

*I am grateful to Arthur Goldberger, Zvi Griliches, Donald Hester, George Jakubson, Ariel Pakes, and Burton Singer for comments and helpful discussions. Financial support was provided by the National Science Foundation (Grants No. SOC-7925959 and No. SES-8016383) and by funds granted to the Institute for Research on Poverty at the University of Wisconsin, Madison, by the Department of Health, Education, and Welfare pursuant to the provisions of the Economic Opportunity Act of 1964.

0165-7410/82/0000-0000/$02.75 © 1982 North-Holland


of y_t on x_t, ..., x_{t−J} will not provide a consistent estimator of the β_j (as N → ∞). We shall refer to this inconsistency as a heterogeneity bias.

In section 2, on identification, we consider first the case J = 0 and b_ij = β_j. We argue that the presence of heterogeneity bias will be signalled by a full set of lags and leads in the least squares regression of y_t on x_1, ..., x_T. Furthermore, if we let y = (y_1, ..., y_T)′, x = (x_1, ..., x_T)′ and consider the multivariate linear predictor E*(y | x) = Π_0 + Π_1 x, then the T × T matrix Π_1 should have a distinctive pattern: the off-diagonal elements within the same column are all equal. In that case E*(y_t − y_{t−1} | x) = β(x_t − x_{t−1}), so there is just a contemporaneous relationship when we transform to first differences. I think that a test for such restrictions should accompany analysis of covariance type estimation.

There is an analogous question when J is finite and the b_j are random as well as c. Does E(y_t | x_1, ..., x_T) = E(y_t | x_t, ..., x_{t−J}) imply that there is no heterogeneity bias? We find that the answer is yes if x has a continuous distribution but not if x is discrete.

New issues arise as the order (J) of the distributed lag becomes infinite. We consider this problem in the context of a stationary stochastic process; c and the b_j are (shift) invariant random variables. There are invariant random variables with non-zero variance if and only if the process is not ergodic. We pose the following question: if

E*(y_t | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t−1}, ...),

so that y does not cause x according to the Sims (1972) definition, is it then true that there is no heterogeneity bias? The answer is no, because if d is an invariant random variable, then

E*(d | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(d | x_t, x_{t−1}, ...).

Section 3 of the paper considers the estimation of multivariate linear predictors. There is a sample r_i′ = (x_i′, y_i′), i = 1, ..., N, where x_i′ = (x_i1, ..., x_iK) and y_i′ = (y_i1, ..., y_iM). We assume that r_i is independent and identically distributed (i.i.d.) according to some distribution with finite fourth moments. We do not assume that the regression function E(y_i | x_i) is linear; for although E(y_i | x_i, c_i) may be linear, there is generally no reason to insist that E(c_i | x_i) is linear. Furthermore, we allow the conditional variance V(y_i | x_i) to be an arbitrary function of x_i; the heteroskedasticity could, for example, be due to random coefficients. Let w_i be the vector formed from the squares and cross-products of the elements of r_i; let Π be the matrix of linear predictor coefficients:


E*(y_i | x_i) = Πx_i, where Π = E(y_i x_i′)[E(x_i x_i′)]^{−1}. Then w_i is i.i.d. and Π is a function of E(w_i). So the problem is to make inferences about differentiable functions of a population mean, under random sampling.

This is straightforward and the results have a variety of novel implications. Let Π̂ be the least squares estimator; let π̂ and π be the vectors formed from the columns of Π̂ and Π. Then √N(π̂ − π) →d N(0, Ω) as N → ∞. The formula for Ω is not the standard one, since we are not assuming homoskedastic, linear regression.

We impose restrictions by using a minimum distance estimator: find the matrix satisfying the restrictions that is closest to π̂ in the norm provided by Ω̂^{−1}, where Ω̂ is a consistent (as N → ∞) estimator of Ω. This leads to some surprising results. For example, consider a univariate linear predictor: E*(y_i | x_i1, x_i2) = π_0 + π_1 x_i1 + π_2 x_i2. We can impose the restriction that π_2 = 0 by using a least squares regression of y on x_1 to estimate π_1; however, this is asymptotically less efficient, in general, than our minimum distance estimator. The conventional estimator is a minimum distance estimator, but it is using a different norm.

A related result is that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables; three-stage least squares is also using the wrong norm. We provide more efficient estimators for the linear simultaneous equations model by applying our minimum distance procedure to the reduced form, thereby generalizing Malinvaud's (1970) minimum distance estimator. Suppose that the only restrictions are that certain structural coefficients are zero (and the normalization rule). We provide a generalization of three-stage least squares that has the same limiting distribution as our minimum distance estimator. There is a corresponding generalization of two-stage least squares.

We also consider the maximum likelihood estimator based on assuming that r_i has a multivariate normal distribution with mean μ and covariance matrix Σ. Then the slope coefficients in Π are functions of Σ and, more generally, we can consider estimating arbitrary functions of Σ subject to restrictions. When the normality assumptions do not hold, we refer to the estimator as a quasi-maximum likelihood estimator. The quasi-maximum likelihood estimator has the same limiting distribution as a certain minimum distance estimator; but in general that minimum distance estimator is not using the optimal norm. Hence our estimator is generally more efficient than the quasi-maximum likelihood estimator.

Section 4 of the paper presents an empirical example that illustrates some of the results. It is based on the panel of Young Men in the National Longitudinal Survey (Parnes); y_t is the logarithm of the individual's hourly wage, and x_t includes variables to indicate whether or not the individual's wage is set by collective bargaining; whether or not he lives in an SMSA; and whether or not he lives in the South. We present unrestricted least


squares regressions of y_t on x_1, ..., x_T. There are significant leads and lags; if they are generated just by a random intercept (c), then Π should have a distinctive form. There is some evidence in favor of this, and hence some justification for analysis of covariance estimation. In this example, the leads and lags could be interpreted as due just to c, with E(y_t | x_1, ..., x_T, c) = βx_t + c.

    2. Identification

Suppose that a farmer is producing a product with a Cobb-Douglas technology,

y_t = βx_t + c + u_t,   0 < β < 1.


With more than one observation per farm, however, we can consider the least squares regression of y_t on x = (x_1, ..., x_T)′. The population counterpart is

E*(y_t | x) = βx_t + E*(c | x) + E*(u_t | x).

Assume that V(x) is non-singular. Then

E*(c | x) = ψ + λ′x,   λ = V^{−1}(x) cov(x, c).

Even if E*(u_t | x) = 0, there will generally be a full set of lags and leads if V(c) ≠ 0. For example, if cov(x_t, c) = cov(x_1, c), t = 1, ..., T, then λ is proportional to the row sums of V^{−1}(x), and all of the elements of λ will typically be non-zero. I think that it is generally true that E*(c | x) depends on all of the x_t's if it depends on any of them. So the presence of heterogeneity bias will be signalled by a full set of lags and leads. Also, if E*(u | x) = 0, then the wide-sense multivariate regression will have a distinctive pattern:

Π_1 = cov(y, x′) V^{−1}(x) = βI_T + 1λ′,

where 1 is a T × 1 vector of ones. The off-diagonal elements within the same column of Π_1 are all equal.
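As a numerical illustration of this pattern, the following sketch (in Python, as are all the sketches below; the values of β, V(x), and cov(x, c) are assumed purely for illustration and are not taken from the paper) constructs Π_1 = βI_T + 1λ′ and checks that the off-diagonal elements within each column are equal:

```python
import numpy as np

# Sketch: the pattern Pi_1 = beta*I_T + 1*lambda' when y_t = beta*x_t + c + u_t,
# E*(u|x) = 0, and lambda = V(x)^{-1} cov(x, c).  All values are illustrative.
rng = np.random.default_rng(0)
T, beta = 4, 0.5
A = rng.normal(size=(T, T))
V = A @ A.T + T * np.eye(T)            # V(x), positive definite
cov_xc = np.full(T, 0.3)               # cov(x_t, c), equal across t here

lam = np.linalg.solve(V, cov_xc)       # lambda = V(x)^{-1} cov(x, c)
Pi1 = beta * np.eye(T) + np.ones((T, 1)) * lam[None, :]

for s in range(T):                     # off-diagonals within column s all equal lam[s]
    col = np.delete(Pi1[:, s], s)
    assert np.allclose(col, col[0])
print(Pi1.round(3))
```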

A common solution to the bias problem is some form of analysis of covariance.² For example, we can form the farm specific means (ȳ = Σ_{t=1}^T y_t/T, x̄ = Σ_{t=1}^T x_t/T) and the deviations around them (ỹ_t = y_t − ȳ, x̃_t = x_t − x̄), and then run a pooled least squares regression of ỹ on x̃. This is equivalent to first running the least squares regression of ỹ_t on x̃_t for each of the T cross-section samples, and then forming a weighted average of the T slope coefficients. The population counterpart of the tth least squares regression is

E*(ỹ_t | x̃_t) = βx̃_t + E*(u_t − ū | x_t − x̄).

So the least squares regression of ỹ_t on x̃_t provides a consistent (as N → ∞) estimator of β only if E*(u_t − ū | x_t − x̄) = 0. I would not expect this condition to hold unless

E*(u_t − u_{t−1} | x_2 − x_1, ..., x_T − x_{T−1}) = 0,   t = 2, ..., T,

²This analysis of covariance estimator was used by Mundlak (1961). Related estimators have been discussed by Balestra and Nerlove (1966), Wallace and Hussain (1969), Amemiya (1971), Maddala (1971), and Mundlak (1978). Analysis of covariance in nonlinear models is discussed in Chamberlain (1980).


so that x is strictly exogenous when we transform the model to first differences.³ The strict exogeneity restriction is testable since it implies that

E*(y_t − y_{t−1} | x_2 − x_1, ..., x_T − x_{T−1}) = E*(y_t − y_{t−1} | x_t − x_{t−1});

hence there are exclusion restrictions on the linear predictors.

A stronger condition is that

E*(u_t | x_1, ..., x_T) = 0,   t = 1, ..., T.

This implies that Π_1 has the form βI_T + 1λ′. These restrictions on Π_1 are testable; we can summarize them by saying that x is strictly exogenous conditional on c. The restrictions would fail to hold in the production function example if u_t is partly predictable from its past, so that E[exp(u_t) | u_{t−1}, u_{t−2}, ...] depends on u_{t−1}, u_{t−2}, ....

Now suppose that the technology varies across the farms, so that

y_t = bx_t + c + u_t,

where b is a random variable that is constant over time. We shall refer to b and c as invariant random variables. Our discussion of E*(c | x) indicated that it depends on all of the x_t's if it depends on any of them. I would expect this to be true of E(c | x) as well. This general characteristic of invariant random variables is formulated in the following condition:

Condition (C). Let x*′ = (x_{t_1}, ..., x_{t_K}), where {t_1, ..., t_K} is some proper subset of {1, ..., T}. Let d be an invariant random variable. Then E(d | x) = E(d | x*) implies that E(d | x) = E(d).

Suppose that the parameter of interest is β = E(b). If b or c is correlated with x, then a least squares regression of y_t on x_t will not provide a consistent estimator of β. We have argued that such a heterogeneity bias will be signalled by a full set of lags and leads when we regress y_t on (x_1, ..., x_T). Under what conditions can we infer that there is no bias if we observe only a contemporaneous relationship? Proposition 1 provides some guidance; it can be extended easily to the case of a finite distributed lag.

Condition (R). Prob(x_n = x_{n−1}) = 0 for some integer n with 2 ≤ n ≤ T.

Proposition 1. Suppose that

E(y_t | x, b, c) = bx_t + c,   t = 1, ..., T.

³The strict exogeneity terminology is based on Sims (1972, 1974).


If conditions (C) and (R) hold and if T ≥ 3, then

E(y_t | x) = E(y_t | x_t),   t = 1, ..., T,

implies that

E(y_t | x) = βx_t + γ,   t = 1, ..., T,

where β = E(b) = E(b | x) and γ = E(c) = E(c | x).

Proof. The following equalities hold with probability one:

E(b | x) = [E(y_n | x) − E(y_{n−1} | x)]/(x_n − x_{n−1}) = [E(y_n | x_n) − E(y_{n−1} | x_{n−1})]/(x_n − x_{n−1}).

So E(b | x) = E(b | x_n, x_{n−1}), and if T ≥ 3, then (C) implies that E(b | x) = E(b), and

E(c | x) = E(y_1 | x) − E(b | x)x_1 = E(y_1 | x_1) − βx_1;

hence E(c | x) = E(c | x_1) and so E(c | x) = E(c). Q.E.D.

This analysis can be applied to linear transformations of the process. If we find that E(y_t | x) has a full set of lags and leads, then we can ask if that is just due to E(c | x) ≠ E(c). Let Δy_t = y_t − y_{t−1}, Δx_t = x_t − x_{t−1}, and Δx = (Δx_2, ..., Δx_T). Under the assumptions of the proposition, if

E(Δy_t | Δx) = E(Δy_t | Δx_t),

then

E(Δy_t | Δx) = βΔx_t.

Note that it is possible to find E(Δy_t | Δx) = E(Δy_t | Δx_t) even though E(y_t | x) ≠ E(y_t | x_t). For example, consider the stationary case in which cov(x_t, b) = cov(x_1, b); then E*(b | Δx) = E(b), and so E(b | Δx) = E(b) if the regression function of b on Δx is linear. Then we might find that E(y_t | x) has a full set of lags and leads even though E(Δy_t | Δx) does not.

The condition that prob(x_n = x_{n−1}) = 0 is necessary. For consider the following counter-example:

E(b | x) = β_1 if x_1 = ... = x_T,
E(b | x) = β_2 if not

(β_1 ≠ β_2). Then E(b | x) = β_2 whenever the x_t are not all equal,


but β_2 ≠ E(b) unless prob(x_1 = ... = x_T) = 0. So there is an important distinction here between continuous and discrete distributions for x. If x_t only takes on a finite set of values, then there will generally be positive probability that x_1 = ... = x_T, although this probability may become negligible for large T.

The following proposition provides some additional insight into this distinction; it is based on a condition that is slightly weaker than (R):

Condition (R′). Prob(x_1 = x_2 = ... = x_T) = 0.

Proposition 2. Suppose that

E(y_t | x, b, c) = bx_t + c,   t = 1, ..., T,

where T ≥ 2. Assume that condition (R′) holds and define

b̂ = Σ_{t=1}^T (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^T (x_t − x̄)².

Then E(b̂) = E(b) if E(|b̂|) < ∞.⁴

Proof. The following equalities hold with probability one:

E(b̂ | x, b, c) = Σ_{t=1}^T b(x_t − x̄)² / Σ_{t=1}^T (x_t − x̄)² = b;

so if E(|b̂|) < ∞,

E(b̂) = E[E(b̂ | x, b, c)] = E(b). Q.E.D.

Suppose that (y_i1, ..., y_iT, x_i1, ..., x_iT), i = 1, ..., N, is a random sample from the distribution of (y, x). Define

b̂_i = Σ_{t=1}^T (y_it − ȳ_i)(x_it − x̄_i) / Σ_{t=1}^T (x_it − x̄_i)².

Then if the assumptions of Proposition 2 are satisfied, Σ_{i=1}^N b̂_i/N converges almost surely (a.s.) to E(b) as N → ∞. It is important that b̂_i is an unbiased estimator of E(b), since we are actually taking the unweighted mean of a

⁴The assumption that E(|b̂|) < ∞ is not innocuous. For example, suppose that V(c) = V(b) = 0 and (x_t, y_t) is independent and identically distributed (t = 1, ..., T) according to a bivariate normal distribution. Then b̂ = b + {V(y_t | x_t)/[(T − 1)V(x_t)]}^{1/2} w, where w has Student's t-distribution with T − 1 degrees of freedom. Hence E(|b̂|) < ∞ only if T ≥ 3.


large number of these estimators. The lack of bias requires that x be strictly exogenous conditional on b, c. It would not be sufficient to assume that E(y_t | x_t, b, c) = bx_t + c. For example, if x_t = y_{t−1}, then our estimator would not converge to E(b), due to the small-T bias in least squares estimates of an autoregressive process.
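A small simulation makes the point concrete. In the following sketch the random-coefficient design is an assumption chosen so that x is strictly exogenous conditional on b, c but is correlated with c, so pooled least squares would be biased:

```python
import numpy as np

# Sketch: the unweighted mean of the within-unit slopes b_hat_i converges to
# E(b) even though x is correlated with c.  The design is illustrative only.
rng = np.random.default_rng(1)
N, T = 5000, 6
b = 1.0 + 0.5 * rng.normal(size=N)            # random slope, E(b) = 1
c = rng.normal(size=N)                        # random intercept
x = rng.normal(size=(N, T)) + c[:, None]      # x correlated with c
y = b[:, None] * x + c[:, None] + rng.normal(size=(N, T))

xd = x - x.mean(axis=1, keepdims=True)        # within-unit deviations
yd = y - y.mean(axis=1, keepdims=True)
b_hat = (yd * xd).sum(axis=1) / (xd ** 2).sum(axis=1)
print(b_hat.mean())                           # close to E(b) = 1
```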

Let D_i = 0 if x_i1 = ... = x_iT, D_i = 1 if not. We can compute b̂_i only for the group with D_i = 1. The sample mean of b̂_i for that group converges a.s. to E(b | D = 1), but we have no information on E(b | D = 0). So unless prob(D = 0) = 0, any value for E(b) is consistent with a given value for E(b | D = 1).⁵

If x_t has a continuous distribution, then the assumption that the regression function is linear (E(y_t | x_t, b, c) = bx_t + c) is very restrictive; the implication of this assumption (combined with strict exogeneity) is that we can obtain an unbiased estimator for b, and hence a consistent (as N → ∞) estimator for E(b). If x_t is a binary variable, then the assumption of linear regression is not restrictive at all; but there are fewer implications since there is positive probability that b̂ is not defined for finite T.

The following extension of Proposition 1 to the case of a finite distributed lag is straightforward:⁶

Proposition 1′. Suppose that

E(y_t | x, b_0, ..., b_J, c) = Σ_{j=0}^J b_j x_{t−j} + c,   t = J+1, ..., T.

If condition (C) holds, if

prob{ det M_n = 0 } = 0,

where M_n is the (J+2) × (J+2) matrix whose (k+1)th row is (1, x_{n−k}, ..., x_{n−J−k}), k = 0, ..., J+1, for some integer n with 2J + 2 ≤ n ≤ T, and if T ≥ 2J + 3, then

E(y_t | x) = E(y_t | x_t, ..., x_{t−J}),   t = J+1, ..., T,

⁵A solution could be based on Mundlak's (1978a) proposal that E(b | x) = ψ_0 + ψ_1 Σ_{t=1}^T x_t. However, even if we assume that the regression function is linear in x_1, ..., x_T, it may be difficult to justify the restriction that only Σ_t x_t matters, unless T is large and we have stationarity: cov(b, x_t) = cov(b, x_1) and V(x) band diagonal. (See Proposition 4 and the discussion preceding it.) Furthermore, if cov(b, x_t) = cov(b, x_1), then E(b | x_2 − x_1, ..., x_T − x_{T−1}) = E(b) (if the regression function is linear), and so there is no heterogeneity bias once we transform to first differences.

⁶We shall not discuss the problems that arise from truncating the lag distribution when T < J + 1. These problems are discussed in Griliches and Pakes (1980). By working with linear transformations of the process, it is fairly straightforward to extend our analysis to general rational distributed lag schemes.


implies that

E(y_t | x) = Σ_{j=0}^J β_j x_{t−j} + γ,

where β_j = E(b_j) = E(b_j | x), j = 0, ..., J, and γ = E(c) = E(c | x).

The extension of Proposition 2 is also straightforward. There are new issues, however, in the infinite lag case, which we shall take up next.

Large number of lags. Suppose that

E(y_t | σ(x), c) = Σ_{j=0}^∞ β_j x_{t−j} + c,

where σ(x) is the information set (σ-field) generated by {..., x_{−1}, x_0, x_1, ...}, and Σ_{j=0}^J β_j x_{t−j} converges in mean square as J → ∞. Consider a regression version of the Sims (1972) condition for x to be strictly exogenous (y does not cause x):

E(y_t | σ(x)) = E(y_t | x_t, x_{t−1}, ...).

Does this condition imply that E(c | σ(x)) = E(c), so that there is no heterogeneity bias?

We shall consider this question in the context of a (strictly) stationary stochastic process. Since c does not change over time, it is an invariant random variable. The following proposition is proved in appendix A:

Proposition 3. If d is an invariant random variable with E(|d|) < ∞, then

E(d | σ(x)) = E(d | x_t, x_{t−1}, ...),

where t is any integer.

It follows that

E(y_t | σ(x)) = E(c | x_t, x_{t−1}, ...) + Σ_{j=0}^∞ β_j x_{t−j} = E(y_t | x_t, x_{t−1}, ...).

So we cannot rule out heterogeneity bias just because y does not cause x. If


a large number of lags have been included, then a small number of leads provide little additional information on c.

We can gain some insight into this result by considering the linear predictor of an invariant random variable. Let

E*(c | x_1, ..., x_T) = ψ_T + λ_T′ x_T,

where λ_T′ = (λ_T1, ..., λ_TT) and x_T′ = (x_1, ..., x_T). Stationarity implies that λ_T = τV^{−1}(x_T)1, where τ = cov(x_1, c) and 1 is a T × 1 vector of ones. Since V(x_T) is a band-diagonal matrix, 1 is approximately an eigenvector of V(x_T) for large T; hence λ_T′ x_T is approximately proportional to Σ_{t=1}^T x_t. For example, if x_t = ρx_{t−1} + v_t, where v_t is serially uncorrelated, then

λ_T′ x_T = τ[(1 − ρ) Σ_{t=1}^T x_t + ρ(x_1 + x_T)] / [(1 + ρ)V(x_1)].

Now in this example, λ_T′ x_T does not approach a limit as T → ∞ unless τ = cov(x_1, c) = 0. In fact cov(x_1, c) is zero here, since there is a non-trivial linear predictor only if Σ_{j=0}^J x_{t−j}/J converges to a non-degenerate random variable as J → ∞.
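The AR(1) formula is easy to check numerically. In the sketch below, ρ, τ, V(x_1), and T are assumed values:

```python
import numpy as np

# Sketch: check lambda_T = tau * V(x_T)^{-1} 1 against the closed form
# tau * [(1-rho)*sum_t x_t + rho*(x_1 + x_T)] / [(1+rho)*V(x_1)]
# for a stationary AR(1) covariance V_{ts} = sigma2 * rho^|t-s|.
rho, tau, sigma2, T = 0.6, 0.25, 1.0, 8
V = sigma2 * rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
lam = tau * np.linalg.solve(V, np.ones(T))

coef = np.full(T, (1 - rho) * tau / ((1 + rho) * sigma2))  # interior weights
coef[[0, -1]] += rho * tau / ((1 + rho) * sigma2)          # endpoint adjustment
print(np.allclose(lam, coef))                              # True
```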

The general case is covered by the following proposition:

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | ..., x_{−1}, x_0, x_1, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=0}^J x_{t−j}/J as J → ∞, t is any integer,

λ = cov(d, x̄)/V(x̄) if V(x̄) ≠ 0,   λ = 0 if V(x̄) = 0,   ψ = E(d) − λE(x̄).

(See appendix A for proof.)

The existence of the x̄ limit, both in mean square and almost surely, is the main result of ergodic theory and will be discussed further below. It is clear that x̄ is an invariant random variable. If V(x̄) ≠ 0, then the x process has a (non-degenerate) invariant component, and conditioning on the x's gives a


non-trivial linear predictor if x̄ is correlated with c. However, if V(x̄) = 0, then cov(c, x_t) = 0 for all t, and the linear prediction of c is not improved by conditioning on the x's.

It follows from Proposition 4 that

E*(y_t | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t−1}, ...) = ψ + Σ_{j=0}^J (β_j + λ/J) x_{t−j} + r(J),

where r(J) converges in mean square to zero as J → ∞. So y does not cause x according to Sims' definition; but this does not imply that c is uncorrelated with the x's. If we include a large number of lags, then the bias in any one coefficient is a negligible λ/J, but the bias in the sum of the lag coefficients tends to λ as J → ∞. If we include K leads, then the sum of their coefficients is approximately Kλ/J, which is close to zero when J is much larger than K. If the β_j are zero for j > J*, then the lag coefficients beyond that point will be close to zero but their sum will be close to λ.

Under the stationarity assumption, there are non-degenerate invariant random variables if and only if the process is not ergodic. The basic result here is the (pointwise) ergodic theorem: Let g be a random variable on (Ω, F, P) with E(|g|) < ∞, and let g_t(ω) = g(S^t ω), where S is the shift transformation (see appendix A); then the following limit exists a.s.:

ḡ = lim_{J→∞} Σ_{j=0}^J g_{t−j}/J.

The limit ḡ is an invariant random variable; it is the expectation of g_t conditional on 𝒥, where 𝒥 is the information set (σ-field) generated by all of the invariant random variables. If V(ḡ) ≠ 0 for some g, then the process is not ergodic. In the ergodic case, all of the invariant random variables have degenerate distributions.

Suppose that

E(y_t | σ(x), b, c) = bx_t + c,

and let

b̂_T = Σ_{t=1}^T (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^T (x_t − x̄)².

Recall condition (R′): prob(x_1 = ... = x_T) = 0. I want to examine the


significance of condition (R′) as T → ∞ in the stationary case. Note that

(1/T) Σ_{t=1}^T (x_t − x̄)² → E(x_1² | 𝒥) − [E(x_1 | 𝒥)]² = V(x_1 | 𝒥) a.s.

So a limiting version of condition (R′) is

prob[V(x_1 | 𝒥) = 0] = 0.

If this condition holds, then

lim_{T→∞} b̂_T = [E(x_1 y_1 | 𝒥) − E(x_1 | 𝒥)E(y_1 | 𝒥)] / [E(x_1² | 𝒥) − [E(x_1 | 𝒥)]²] = b a.s.,

and b is observable as T → ∞. But if there is positive probability that V(x_1 | 𝒥) = 0, then the identification problem is more difficult. There is no information on b for the stayers; so in order to obtain E(b), even as T → ∞, we have to make untestable assumptions about the unobservable part of the b distribution.

    3. Estimation

Consider a sample r_i′ = (x_i′, y_i′), i = 1, ..., N, where x_i′ = (x_i1, ..., x_iK) and y_i′ = (y_i1, ..., y_iM). We shall assume that r_i is independent and identically distributed (i.i.d.) according to some multivariate distribution with finite fourth moments and E(x_i x_i′) non-singular. Consider the minimum mean square error linear predictors,

E*(y_im | x_i) = π_m′ x_i,   m = 1, ..., M,

which we can write as

E*(y_i | x_i) = Πx_i   with   Π = E(y_i x_i′)[E(x_i x_i′)]^{−1}.⁷

We want to estimate Π subject to restrictions and to test those restrictions. For example, we may want to test whether a submatrix of Π has the form βI + 1λ′. I think that analysis of covariance estimation should be accompanied by such a test.

We shall not assume that the regression function E(y_i | x_i) is linear. For although E(y_i | x_i, c_i) may be linear (indeed, we hope that it is), there is generally

⁷This agrees with the definition in section 2 if x_i includes a constant.


no reason to insist that E(c_i | x_i) is linear. So we shall present a theory of inference for linear predictors. Furthermore, even if the regression function is linear, there may be heteroskedasticity, due to random coefficients, for example.⁸ So we shall allow V(y_i | x_i) to be an arbitrary function of x_i.

    3.1. The estimation of linear predictors

Let w_i be the vector formed from the distinct elements of r_i r_i′ that have non-zero variance. Since r_i′ = (x_i′, y_i′) is i.i.d., it follows that w_i is i.i.d. This simple observation is the key to our results. Since Π is a function of E(w_i), our problem is to make inferences about a function of a population mean, under random sampling.

Let μ = E(w_i) and let π be the vector formed from the columns of Π [π = vec(Π)]. Then π is a function of μ: π = h(μ). Let w̄ = Σ_{i=1}^N w_i/N; then π̂ = h(w̄) is the least squares estimator:

π̂ = vec[(Σ_{i=1}^N x_i x_i′)^{−1} Σ_{i=1}^N x_i y_i′].

By the strong law of large numbers, w̄ converges almost surely to μ⁰ as N → ∞ (w̄ →a.s. μ⁰), where μ⁰ is the true value of μ. Let π⁰ = h(μ⁰). Since h(μ) is continuous at μ = μ⁰, we have π̂ →a.s. π⁰. The central limit theorem implies that

√N(w̄ − μ⁰) →d N(0, V(w_i)).

Since h(μ) is differentiable at μ = μ⁰, the δ-method⁹ gives

√N(π̂ − π⁰) →d N(0, Ω),

where

Ω = [∂h(μ⁰)/∂μ′] V(w_i) [∂h(μ⁰)/∂μ′]′.

We have derived the limiting distribution of the least squares estimator. This approach was used by Cramér (1946) to obtain limiting normal

⁸Anderson (1969, 1970), Swamy (1970, 1974), Hsiao (1975), and Mundlak (1978a) discuss estimators that incorporate the particular form of heteroskedasticity that is generated by random coefficients.

⁹See Billingsley (1979, example 29.1, p. 340) or Rao (1973, p. 388).


distributions for sample correlation and regression coefficients (p. 367); he presents an explicit formula for the variance of the limiting distribution of a sample correlation coefficient (p. 359). Kendall and Stuart (1961, p. 293) and Goldberger (1974) present the formula for the variance of the limiting distribution of a simple regression coefficient.

Evaluating the partial derivatives in the formula for Ω is tedious. That calculation can be simplified since π̂ has a ratio form. In the case of simple regression with a zero intercept, we have π = E(y_i x_i)/E(x_i²) and

√N(π̂ − π⁰) = [Σ_{i=1}^N (y_i − π⁰x_i)x_i/√N] / [Σ_{i=1}^N x_i²/N].

Since Σ_{i=1}^N x_i²/N →a.s. E(x_i²), we obtain the same limiting distribution by working with

[Σ_{i=1}^N (y_i − π⁰x_i)x_i/√N] / E(x_i²).

The definition of π gives E[(y_i − π⁰x_i)x_i] = 0, and so the central limit theorem implies that

√N(π̂ − π⁰) →d N(0, E[(y_i − π⁰x_i)² x_i²]/[E(x_i²)]²).

This approach was used by White (1980) to obtain the limiting distribution for univariate regression coefficients.¹⁰ In appendix B (Proposition 5) we follow White's approach to obtain

√N(π̂ − π⁰) →d N(0, Ω),

where

Ω = E[(y_i − Π⁰x_i)(y_i − Π⁰x_i)′ ⊗ Φ_x^{−1} x_i x_i′ Φ_x^{−1}],   (1)

Φ_x = E(x_i x_i′).

A consistent estimator of Ω is readily available from the corresponding sample moments,

Ω̂ = (1/N) Σ_{i=1}^N [(y_i − Π̂x_i)(y_i − Π̂x_i)′ ⊗ S_x^{−1} x_i x_i′ S_x^{−1}],   (2)

where

S_x = Σ_{i=1}^N x_i x_i′/N.

¹⁰Also see White (1980a, b).


If E(y_i | x_i) = Πx_i, so that the regression function is linear, then

Ω = E[V(y_i | x_i) ⊗ Φ_x^{−1} x_i x_i′ Φ_x^{−1}].

If V(y_i | x_i) is uncorrelated with x_i x_i′, then

Ω = E[V(y_i | x_i)] ⊗ Φ_x^{−1}.

If the conditional variance is homoskedastic, so that V(y_i | x_i) = Σ does not depend on x_i, then

Ω = Σ ⊗ Φ_x^{−1}.
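The estimators Π̂ and Ω̂ of eqs. (1) and (2) are simple to compute. The following sketch uses simulated heteroskedastic data; the design is an assumption for illustration only:

```python
import numpy as np

# Sketch of Pi_hat and the robust covariance estimator Omega_hat of eq. (2).
# x_i has K components (the first is a constant), y_i has M components.
rng = np.random.default_rng(2)
N, K, M = 2000, 3, 2
x = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = x @ rng.normal(size=(K, M)) + rng.normal(size=(N, M)) * (1 + x[:, 1:2] ** 2)

Sx = x.T @ x / N                                  # S_x
Pi = (np.linalg.solve(Sx, x.T @ y / N)).T         # Pi_hat, M x K
e = y - x @ Pi.T                                  # residuals y_i - Pi_hat x_i

Sx_inv = np.linalg.inv(Sx)
Omega = np.zeros((M * K, M * K))
for i in range(N):                                # eq. (2), term by term
    xi = Sx_inv @ x[i]
    Omega += np.kron(np.outer(e[i], e[i]), np.outer(xi, xi))
Omega /= N
se = np.sqrt(np.diag(Omega) / N).reshape(M, K)    # robust standard errors
print(se.round(3))
```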

    3.2. Imposing restrictions: The minimum distance estimator

Since Π is a function of E(w_i), restrictions on Π imply restrictions on E(w_i). Let the dimension of μ = E(w_i) be q.¹¹ We shall specify the restrictions by the condition that μ depends only on a p × 1 vector θ of unknown parameters: μ = g(θ), where g is a known function and p ≤ q. The domain of θ is 𝒯, a subset of p-dimensional Euclidean space (R^p) that contains the true value θ⁰. So the restrictions imply that μ = g(θ⁰) is confined to a certain subset of R^q.

We can impose the restrictions by using a minimum distance estimator: choose μ̂ to

min_{μ ∈ g(𝒯)} [w̄ − μ]′ A_N [w̄ − μ],

where A_N →a.s. Ψ and Ψ is positive definite. This minimization problem is equivalent to the following one: choose θ̂ to

min_{θ ∈ 𝒯} [w̄ − g(θ)]′ A_N [w̄ − g(θ)].

The properties of θ̂ are developed, for example, in Malinvaud (1970, ch. 9). Since g does not depend on any exogenous variables, the derivation of these properties can be simplified considerably, as in Chiang (1956) and Ferguson (1958).

For completeness, we shall state a set of regularity conditions and the properties that they imply:

¹¹If there is one element in r_i r_i′ with zero variance, then q = [(K + M)(K + M + 1)/2] − 1.


Assumption 1. a_N →a.s. g(θ⁰); 𝒯 is a compact subset of R^p that contains θ⁰; g is continuous on 𝒯, and g(θ) = g(θ⁰) for θ ∈ 𝒯 implies that θ = θ⁰; A_N →a.s. Ψ, where Ψ is positive definite.

Assumption 2. √N[a_N − g(θ⁰)] →d N(0, Δ); 𝒯 contains a neighborhood of θ⁰ in which g has continuous second partial derivatives; rank(G) = p, where G = ∂g(θ⁰)/∂θ′.

Choose θ̂ to

min_{θ ∈ 𝒯} [a_N − g(θ)]′ A_N [a_N − g(θ)].

Proposition 6. If Assumption 1 is satisfied, then θ̂ →a.s. θ⁰.

Proposition 7. If Assumptions 1 and 2 are satisfied, then √N(θ̂ − θ⁰) →d N(0, Λ), where

Λ = (G′ΨG)^{−1} G′ΨΔΨG (G′ΨG)^{−1}.

If Δ is positive definite, then Λ − (G′Δ^{−1}G)^{−1} is positive semi-definite; hence an optimal choice for Ψ is Δ^{−1}.

Proposition 8. If Assumptions 1 and 2 are satisfied, if Δ is a q × q positive definite matrix, and if A_N →a.s. Δ^{−1}, then

N[a_N − g(θ̂)]′ A_N [a_N − g(θ̂)] →d χ²(q − p).

(This is extended to the case of nested restrictions in Proposition 8′, appendix B.)¹²
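In practice the minimization can be carried out numerically. The following sketch (the restriction function g and all numbers are assumed for illustration, with q = 2 and p = 1) computes the estimator and the statistic of Proposition 8:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the minimum distance estimator: minimize the quadratic form
# [a_N - g(theta)]' A_N [a_N - g(theta)] with A_N an estimate of Delta^{-1}.
def g(theta):
    return np.array([theta[0], theta[0] ** 2])    # an assumed restriction

a_N = np.array([0.51, 0.27])                      # illustrative unrestricted estimate
Delta_hat = np.array([[0.040, 0.010],
                      [0.010, 0.090]])            # illustrative Delta estimate
A_N = np.linalg.inv(Delta_hat)
N = 500

res = minimize(lambda th: (a_N - g(th)) @ A_N @ (a_N - g(th)), x0=np.array([0.5]))
stat = N * res.fun                                # -> chi^2(q - p) = chi^2(1) under H0
print(res.x, stat)
```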

Suppose that the restrictions involve only Π. We specify the restrictions by the condition that π = f(δ), where δ is s × 1 and the domain of δ is 𝒯_δ, a subset of R^s that includes the true value δ⁰. Consider the following estimator of δ⁰: choose δ̂ to

min_{δ ∈ 𝒯_δ} [π̂ − f(δ)]′ Ω̂^{−1} [π̂ − f(δ)],

¹²Since the proofs are simple, we shall keep the paper self-contained and include them in appendix B. The proofs are based on Chiang (1956), Ferguson (1958), and Malinvaud (1970, ch. 9).


where Ω̂ is given in eq. (2) and we assume that Ω in eq. (1) is positive definite. If 𝒯_δ and f satisfy Assumptions 1 and 2, then δ̂ →a.s. δ⁰ and

√N(δ̂ − δ⁰) →d N(0, [F′Ω^{−1}F]^{−1}),

where

F = ∂f(δ⁰)/∂δ′.

We can also estimate δ⁰ by applying the minimum distance procedure to w̄ instead of to π̂. Suppose that the components of w_i are arranged so that w_i′ = (w_i1′, w_i2′), where w_i1 contains the distinct components of x_i x_i′. Partition μ = E(w_i) conformably: μ′ = (μ_1′, μ_2′). Set θ′ = (θ_1′, θ_2′) = (δ′, μ_1′). Assume that V(w_i) is positive definite. Now choose θ̂ to

min_θ [w̄ − g(θ)]′ A_N [w̄ − g(θ)],   A_N →a.s. [V(w_i)]^{−1},

where g′ = (g_1′, g_2′), g_2(δ, μ_1) expresses μ_2 in terms of θ, and g_1(δ, μ_1) = μ_1. Then θ̂_1 gives an estimator of δ⁰; it has the same limiting distribution as the estimator δ̂ that we obtained by applying the minimum distance procedure to π̂. (See Proposition 9, appendix B.)

This framework leads to some surprising results on efficient estimation. For a simple example, we shall use a univariate linear predictor model,

E*(y_i | x_i1, x_i2) = π_0 + π_1 x_i1 + π_2 x_i2.

Consider imposing the restriction π_2 = 0. Then the conventional estimator of π_1 is b_{yx1}, the slope coefficient in the least squares regression of y on x_1. We shall show that this estimator is generally less efficient than the minimum distance estimator if the regression function is nonlinear or if there is heteroskedasticity.

Let π̂_1, π̂_2 be the slope coefficients in the least squares multiple regression of y on x_1, x_2. The minimum distance estimator of π_1 under the restriction π_2 = 0 can be obtained as δ̂ = π̂_1 + τ̂π̂_2, where τ̂ is chosen to minimize the


(estimated) variance of the limiting distribution of δ̂; this gives

δ̂ = π̂_1 − (ω̂_12/ω̂_22)π̂_2,

where ω̂_jk is the estimated covariance between π̂_j and π̂_k in their limiting distribution. Since π̂_1 = b_{yx1} − π̂_2 b_{x2x1}, we have

δ̂ = b_{yx1} − (b_{x2x1} + ω̂_12/ω̂_22)π̂_2.

If E(y_i | x_i1, x_i2) is linear and if V(y_i | x_i1, x_i2) = σ², then ω_12/ω_22 = −cov(x_i1, x_i2)/V(x_i1) and δ̂ = b_{yx1}. But in general δ̂ ≠ b_{yx1} and δ̂ is more efficient than b_{yx1}. The source of the efficiency gain is that the limiting distribution for π̂_2 has a zero mean (if π_2 = 0), and so we can reduce variance without introducing any bias if π̂_2 is correlated with b_{yx1}. Under the assumptions of linear regression and homoskedasticity, b_{yx1} and π̂_2 are uncorrelated; but this need not be true in the more general framework that we are using.
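The following sketch (the heteroskedastic design is an assumption chosen so that π_2 = 0 holds in the population) computes δ̂ = π̂_1 − (ω̂_12/ω̂_22)π̂_2 and compares it with b_{yx1}:

```python
import numpy as np

# Sketch of the minimum distance estimate of pi_1 under pi_2 = 0, versus the
# conventional simple regression slope b_yx1.  The DGP is illustrative.
rng = np.random.default_rng(3)
N = 4000
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)
y = 1.0 + 0.8 * x1 + (1 + np.abs(x1)) * rng.normal(size=N)   # pi_2 = 0, heteroskedastic

X = np.column_stack([np.ones(N), x1, x2])
pi = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ pi
Sx_inv = np.linalg.inv(X.T @ X / N)
Omega = Sx_inv @ ((X * e[:, None] ** 2).T @ X / N) @ Sx_inv  # robust cov of pi_hat

delta = pi[1] - (Omega[1, 2] / Omega[2, 2]) * pi[2]          # minimum distance
x1c, yc = x1 - x1.mean(), y - y.mean()
b_yx1 = (x1c * yc).sum() / (x1c ** 2).sum()                  # conventional estimator
print(delta, b_yx1)
```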

3.3. Simultaneous equations: A generalization of two- and three-stage least squares

Given the discussion on imposing restrictions, it is not surprising that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables. I shall demonstrate this with a simple example. Assume that (y_i, z_i, x_i1, x_i2) is i.i.d. according to some distribution with finite fourth moments, and that

y_i = δz_i + u_i,

where E(u_i x_i1) = E(u_i x_i2) = 0. Assume also that E(z_i x_i1) ≠ 0, E(z_i x_i2) ≠ 0. Then there are two instrumental variable estimators that both converge a.s. to δ:

δ̂_j = Σ_{i=1}^N y_i x_ij / Σ_{i=1}^N z_i x_ij,   j = 1, 2,

√N[(δ̂_1, δ̂_2)′ − (δ, δ)′] →d N(0, Λ),

where the j, k element of Λ is

λ_jk = E[(y_i − δz_i)² x_ij x_ik] / [E(z_i x_ij) E(z_i x_ik)],   j, k = 1, 2.


The two-stage least squares estimator combines δ̂_1 and δ̂_2 by forming ẑ_i = π̂_1 x_i1 + π̂_2 x_i2, based on the least squares regression of z on x_1, x_2 (assume that E[(x_i1, x_i2)′(x_i1, x_i2)] is non-singular):

δ̂_2SLS = Σ_{i=1}^N y_i ẑ_i / Σ_{i=1}^N z_i ẑ_i = α̂δ̂_1 + (1 − α̂)δ̂_2,

where

α̂ = π̂_1 Σ_{i=1}^N z_i x_i1 / (π̂_1 Σ_{i=1}^N z_i x_i1 + π̂_2 Σ_{i=1}^N z_i x_i2).

Since α̂ →a.s. α, √N(δ̂_2SLS − δ) has the same limiting distribution as

√N[α(δ̂_1 − δ) + (1 − α)(δ̂_2 − δ)].

This suggests finding the τ that minimizes the variance of the limiting distribution of √N[τ(δ̂_1 − δ) + (1 − τ)(δ̂_2 − δ)]. The answer leads to the minimum distance estimator: choose θ̂ to

min_θ (δ̂_1 − θ, δ̂_2 − θ) Λ^{−1} (δ̂_1 − θ, δ̂_2 − θ)′;

this gives

θ̂ = τδ̂_1 + (1 − τ)δ̂_2,

where

τ = (λ^{11} + λ^{12})/(λ^{11} + 2λ^{12} + λ^{22}),

and λ^{jk} is the j, k element of Λ^{−1}. The estimator obtained by using a consistent estimator of Λ has the same limiting distribution.

In general τ ≠ α, since τ is a function of fourth moments and α is not. Suppose, for example, that z_i = x_i2. Then α = 0 but τ ≠ 0 unless

E[(y_i − δz_i)² x_i2 (x_i1/E(x_i1 x_i2) − x_i2/E(x_i2²))] = 0.
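The optimal combination is easy to compute. In the sketch below the design, with a heteroskedastic error, is an assumption for illustration, and the preliminary residuals come from one of the two IV fits:

```python
import numpy as np

# Sketch: combine two IV estimators by minimum distance,
# tau = (l11 + l12) / (l11 + 2*l12 + l22), with l^{jk} from Lambda_hat^{-1}.
rng = np.random.default_rng(4)
N, delta0 = 5000, 1.5
x1, x2 = rng.normal(size=N), rng.normal(size=N)
z = x1 + x2 + rng.normal(size=N)
y = delta0 * z + (1 + x1 ** 2) * rng.normal(size=N)      # heteroskedastic error

d = np.array([np.sum(y * x1) / np.sum(z * x1),
              np.sum(y * x2) / np.sum(z * x2)])          # the two IV estimates

e = y - d[0] * z                                         # preliminary residuals
X = np.column_stack([x1, x2])
num = (X * e[:, None]).T @ (X * e[:, None]) / N          # E[(y - d z)^2 x_j x_k]
den = np.array([np.mean(z * x1), np.mean(z * x2)])
Li = np.linalg.inv(num / np.outer(den, den))             # Lambda_hat^{-1}
tau = (Li[0, 0] + Li[0, 1]) / Li.sum()
print(tau * d[0] + (1 - tau) * d[1])                     # combined estimate
```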

If we add another equation, then we can consider the conventional three-stage least squares estimator. Its limiting distribution is derived in appendix B (Proposition 5); however, viewed as a minimum distance estimator, it is using the wrong norm in general.


Consider the standard simultaneous equations model:

y_i = Πx_i + u_i,   E(u_i x_i′) = 0,
Γy_i + Bx_i = v_i,

where ΓΠ + B = 0 and Γu_i = v_i. We are continuing to assume that y_i is M × 1, x_i is K × 1, r_i′ = (x_i′, y_i′) is i.i.d. according to a distribution with finite fourth moments (i = 1, ..., N), and that E(x_i x_i′) is non-singular. There are restrictions on Γ and B: m(Γ, B) = 0, where m is a known function. Assume that the implied restrictions on Π can be specified by the condition that π = vec(Π) = f(δ), where the domain of δ is 𝒯_δ, a subset of R^s that includes the true value δ⁰ (s ≤ MK). Assume that 𝒯_δ and f satisfy Assumptions 1 and 2; these properties could be derived from regularity conditions on m, as in Malinvaud (1970, prop. 2, p. 670).

Choose δ̂ to

min_{δ ∈ 𝒯_δ} [π̂ − f(δ)]′ Ω̂^{−1} [π̂ − f(δ)],

where Ω̂ is given by eq. (2) and we assume that Ω in eq. (1) is positive definite. Let F = ∂f(δ⁰)/∂δ′. Then we have √N(δ̂ − δ⁰) →d N(0, Λ), where Λ = (F′Ω^{−1}F)^{−1}. This generalizes Malinvaud's minimum distance estimator (p. 676); it reduces to his estimator if u_i^p u_i^{p′} is uncorrelated with x_i x_i′, so that Ω = E(u_i^p u_i^{p′}) ⊗ [E(x_i x_i′)]^{−1} (u_i^p = y_i − Πx_i).

Now suppose that the only restrictions on Γ and B are that certain coefficients are zero, together with the normalization restrictions that the coefficient of y_im in the mth structural equation is one. Then we can give an explicit formula for Λ. Write the mth structural equation as

y_im = δ_m′ z_im + v_im,

where the components of z_im are the variables in y_i and x_i that appear in the mth equation with unknown coefficients. Let there be M structural equations and assume that the true value Γ⁰ is non-singular. Let δ′ = (δ_1′, ..., δ_M′) be s × 1, and let Γ(δ) and B(δ) be parametric representations of Γ and B that satisfy the zero restrictions and the normalization rule. We can choose a compact set 𝒯_δ ⊂ R^s containing a neighborhood of the true value δ⁰, such that Γ(δ) is non-singular for δ ∈ 𝒯_δ. Then π⁰ = f(δ⁰), where f(δ) = vec[−Γ^{−1}(δ)B(δ)]. Assume that f(δ) = π⁰ implies that δ = δ⁰, so that the structural parameters are identified. Then 𝒯_δ and f satisfy Assumptions 1 and 2, and √N(δ̂ − δ⁰) →d N(0, Λ).


The formula for ∂π/∂δ′ is given in Rothenberg (1973, p. 69):

∂π/∂δ′ = −(Γ^{−1} ⊗ I_K)(I_M ⊗ Φ_x^{−1})Φ_zx′,

where Φ_zx is block-diagonal: Φ_zx = diag{E(z_i1 x_i′), ..., E(z_iM x_i′)}, and Φ_x = E(x_i x_i′). So we have

Λ = {Φ_zx [E(v_i^p v_i^{p′} ⊗ x_i x_i′)]^{−1} Φ_zx′}^{−1},

where v_i^p = Γ⁰y_i + B⁰x_i. If v_i^p v_i^{p′} is uncorrelated with x_i x_i′, then this reduces to

Λ = {Φ_zx [E(v_i^p v_i^{p′}) ⊗ Φ_x]^{−1} Φ_zx′}^{−1},

which is the conventional asymptotic covariance matrix for three-stage least squares [Zellner and Theil (1962)].

I shall present a generalization of three-stage least squares that has the same limiting distribution as the generalized minimum distance estimator. Let β = vec(B′) and note that π = −(Γ^{−1} ⊗ I_K)β. Then we have

[π̂ + (Γ^{−1} ⊗ I_K)β]′ Ω^{−1} [π̂ + (Γ^{−1} ⊗ I_K)β] = [(Γ ⊗ I_K)π̂ + β]′ Θ^{−1} [(Γ ⊗ I_K)π̂ + β],

where

Θ = (I_M ⊗ Φ_x^{−1}) E(Γu_i^p u_i^{p′} Γ′ ⊗ x_i x_i′) (I_M ⊗ Φ_x^{−1}).

Let S_zx be the following block-diagonal matrix:

S_zx = diag{Σ_{i=1}^N z_i1 x_i′/N, ..., Σ_{i=1}^N z_iM x_i′/N},

and let

Ψ̂ = (1/N) Σ_{i=1}^N v̂_i v̂_i′ ⊗ x_i x_i′,

where

v̂_i = Γ̃y_i + B̃x_i,   Γ̃ →a.s. Γ⁰,   B̃ →a.s. B⁰.


Now replace Θ by

Θ̂ = (I_M ⊗ S_x^{−1}) Ψ̂ (I_M ⊗ S_x^{−1}),

and note that

(I_M ⊗ S_x)[(Γ(δ) ⊗ I_K)π̂ + β(δ)] = s_xy − S_zx′δ,

where s_xy = vec[Σ_{i=1}^N x_i y_i′/N]. Then we have the following distance function:

[s_xy − S_zx′δ]′ Ψ̂^{−1} [s_xy − S_zx′δ].

This corresponds to Basmann's (1965) interpretation of three-stage least squares.¹³ Minimizing with respect to δ gives

δ̂_G3SLS = (S_zx Ψ̂^{−1} S_zx′)^{−1} (S_zx Ψ̂^{−1} s_xy).

The limiting distribution of this estimator is derived in appendix B (Proposition 5). We record it as:

Proposition 10. √N(δ̂_G3SLS − δ⁰) →d N(0, Λ), where Λ = (Φ_zx Ψ^{−1} Φ_zx′)^{−1} and Ψ = E(v_i^p v_i^{p′} ⊗ x_i x_i′). This generalized three-stage least squares estimator is asymptotically efficient within the class of minimum distance estimators.
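A compact sketch of the estimator follows; the helper's argument names (data matrices X and Y, a list Z of included-variable matrices, and preliminary structural residuals vhat, e.g. from equation-by-equation instrumental variable fits) are assumptions for illustration:

```python
import numpy as np
from scipy.linalg import block_diag

# Sketch of generalized 3SLS: delta = (S P S')^{-1} S P s_xy with P = Psi_hat^{-1}.
# X is N x K (instruments), Y is N x M, Z[m] is N x s_m (variables with unknown
# coefficients in equation m), vhat is N x M preliminary structural residuals.
def g3sls(X, Y, Z, vhat):
    N = X.shape[0]
    Psi = sum(np.kron(np.outer(vhat[i], vhat[i]), np.outer(X[i], X[i]))
              for i in range(N)) / N                       # (1/N) sum v v' (x) x x'
    S_zx = block_diag(*[Zm.T @ X / N for Zm in Z])         # diag{sum z_m x'/N}
    s_xy = np.concatenate([X.T @ Y[:, m] / N for m in range(Y.shape[1])])
    P = np.linalg.inv(Psi)
    return np.linalg.solve(S_zx @ P @ S_zx.T, S_zx @ P @ s_xy)
```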

Finally, we shall consider the generalization of two-stage least squares. Suppose that

y_i1 = δ_1′ z_i1 + v_i1,

where E(x_i v_i1) = 0, z_i1 is s_1 × 1, and rank[E(x_i z_i1′)] = s_1. We complete the system by setting

y_im = π_m′ x_i + u_im,

where E(x_i u_im) = 0 (m = 2, ..., M). So z_im = x_i (m = 2, ..., M). Let δ′ = (δ_1′, π_2′, ..., π_M′) and apply the minimum distance procedure to obtain δ̂; since we are ignoring any restrictions on π_m (m = 2, ..., M), δ̂_1 is a limited information minimum distance estimator.

¹³See Rothenberg (1973, p. 82). A more general derivation of this distance function can be obtained by following Hansen (1982). Also see White (1982).


We have √N(δ̂_1 − δ_1⁰) →d N(0, Λ_11), and evaluating the partitioned inverse gives

Λ_11 = {E(z_i1 x_i′) [E((v_i1^p)² x_i x_i′)]^{−1} E(x_i z_i1′)}^{−1},   (4)

where v_i1^p = y_i1 − δ_1⁰′ z_i1.

We can obtain the same limiting distribution by using the following generalization of two-stage least squares: Let

ṽ_i1 = y_i1 − δ̃_1′ z_i1,

where δ̃_1 →a.s. δ_1⁰ (for example, δ̃_1 could be an instrumental variable estimator), and let

Φ̂ = (1/N) Σ_{i=1}^N ṽ_i1² x_i x_i′;

then

δ̂_G2SLS = (Σ_i z_i1 x_i′ Φ̂^{−1} Σ_i x_i z_i1′)^{−1} (Σ_i z_i1 x_i′ Φ̂^{−1} Σ_i x_i y_i1).

This is the estimator of δ_1 that we obtain by applying generalized three-stage least squares to the completed system, with no restrictions on π_m (m = 2, ..., M). The limiting distribution of this estimator is derived in appendix B (Proposition 5):

Proposition 11. √N(δ̂_G2SLS − δ_1⁰) →d N(0, Λ_11), where Λ_11 is given in eq. (4). This generalized two-stage least squares estimator is asymptotically efficient in the class of limited information minimum distance estimators.
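In the same spirit, a sketch of the generalized two-stage least squares estimator; the argument names, including the preliminary residuals vtil, are assumptions for illustration:

```python
import numpy as np

# Sketch of generalized 2SLS: delta = (Z'X Phi^{-1} X'Z)^{-1} Z'X Phi^{-1} X'y,
# with Phi_hat built from preliminary residuals vtil (e.g. from a first-pass
# instrumental variable fit).  Z1 is N x s_1, X is N x K instruments.
def g2sls(X, Z1, y1, vtil):
    Phi = (X * vtil[:, None] ** 2).T @ X        # sum_i vtil_i^2 x_i x_i'
    A = Z1.T @ X @ np.linalg.inv(Phi)
    return np.linalg.solve(A @ X.T @ Z1, A @ X.T @ y1)
```

With homoskedastic residuals, Φ̂ is proportional to S_x and the formula collapses to conventional two-stage least squares.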

3.4. Asymptotic efficiency: A comparison with the quasi-maximum likelihood estimator

Assume that r_i is i.i.d. (i = 1, ..., N) from a distribution with E(r_i) = μ, V(r_i) = Σ, where Σ is a J × J positive definite matrix; the fourth moments are finite. Suppose that we wish to estimate functions of Σ subject to restrictions. Let σ = vec(Σ) and express the restrictions by the condition that σ = g(θ), where g is a function from 𝒯 into R^q with a domain 𝒯 ⊂ R^p that contains the true value θ⁰ (q = J²; p ≤ J(J + 1)/2). Let

S = (1/N) Σ_{i=1}^N (r_i − r̄)(r_i − r̄)′.


If the distribution of r_i is multivariate normal, then the log-likelihood function is

L(μ, Σ) = constant − (N/2) ln det Σ − (1/2) Σ_{i=1}^N (r_i − μ)′ Σ^{−1} (r_i − μ).

If there are no restrictions on μ, then the maximum likelihood estimator of θ is a solution to the following problem: choose θ̂ to

min_{θ ∈ 𝒯} {ln det Σ(θ) + tr[Σ^{−1}(θ) S]},   vec[Σ(θ)] = g(θ).

We shall derive the properties of this estimator when the distribution of r_i is not necessarily normal; in that case we shall refer to the estimator as a quasi-maximum likelihood estimator (θ̂_QML).¹⁴

MaCurdy (1979) considered a version of this problem and showed that, under suitable regularity conditions, √N(θ̂_QML − θ⁰) has a limiting normal distribution; the covariance matrix, however, is not given by the standard information matrix formula. We would like to compare this distribution with the distribution of the minimum distance estimator.

This comparison can be readily made by using Theorem 1 in Ferguson (1958). In our notation, Ferguson considers the following problem: choose θ̂ to solve

W(s, θ)[s − g(θ)] = 0.

He derives the limiting distribution of √N(θ̂ − θ⁰) under regularity conditions on the functions W and g. These regularity conditions are particularly simple in our problem since W does not depend on s. We can state them as follows:

Assumption 3. 𝒯₀ ⊂ R^p is an open set containing θ⁰; g is a continuous, one-to-one mapping of 𝒯₀ into R^q with a continuous inverse; g has continuous second partial derivatives in 𝒯₀; rank[∂g(θ)/∂θ′] = p for θ ∈ 𝒯₀; Σ(θ) is non-singular for θ ∈ 𝒯₀.

In addition, we shall need s →a.s. g(θ⁰) and the central limit theorem result that √N[s − g(θ⁰)] →d N(0, Δ), where Δ = V[(r_i − μ⁰) ⊗ (r_i − μ⁰)] and s = vec(S).

Then Ferguson's theorem implies that the likelihood equations almost surely have a unique solution within 𝒯₀ for sufficiently large N, and

¹⁴The quasi-maximum likelihood terminology was used by the Cowles Commission; see Malinvaud (1970, p. 678).


√N(θ̂_QML − θ⁰) →d N(0, Λ), where

Λ = (G′ΨG)^{−1} G′ΨΔΨG (G′ΨG)^{−1},

G = ∂g(θ⁰)/∂θ′, and Ψ = (Σ⁰ ⊗ Σ⁰)^{−1}. It will be convenient to rewrite this, imposing the symmetry restrictions on Σ. Let s* be the J(J + 1)/2 × 1 vector formed by stacking the columns of the lower triangle of S. We can define a J² × [J(J + 1)/2] matrix T such that s = Ts*. The elements in each row of T are all 0 except for a single element which is one; T has full column rank. Let

g(θ) = Tg*(θ),   G* = ∂g*(θ⁰)/∂θ′,   Ψ* = T′ΨT;

then √N[s* − g*(θ⁰)] →d N(0, Δ*), where Δ* is the covariance matrix of the vector formed from the columns of the lower triangle of (r_i − μ⁰)(r_i − μ⁰)′. Now we can set

Λ = (G*′Ψ*G*)^{−1} (G*′Ψ*Δ*Ψ*G*) (G*′Ψ*G*)^{−1}.

Consider the following minimum distance estimator: choose θ̂_MD to

min_{θ ∈ 𝒯*} [s* − g*(θ)]′ A_N [s* − g*(θ)],

where 𝒯* is a compact subset of 𝒯₀ that contains a neighborhood of θ⁰ and A_N →a.s. Ψ*. Then the following result is implied by Proposition 7.

Proposition 12. If Assumption 3 is satisfied, then √N(θ̂_QML − θ⁰) has the same limiting distribution as √N(θ̂_MD − θ⁰).

If Δ* is non-singular, an optimal minimum distance estimator has A_N →a.s. ζΔ*^{−1}, where ζ is an arbitrary positive real number. If the distribution of r_i is normal, then Δ*^{−1} = ½Ψ*; but in general Δ*^{−1} is not proportional to Ψ*, since Δ* depends on fourth moments and Ψ* is a function of second moments. So in general θ̂_QML is less efficient than the optimal minimum distance estimator that uses

A_N = [(1/N) Σ_{i=1}^N (s_i* − s*)(s_i* − s*)′]^{−1},   (5)

where s_i* is the vector formed from the lower triangle of (r_i − r̄)(r_i − r̄)′.

More generally, we can consider the class of consistent estimators that are continuously differentiable functions of s*: θ̂ = h(s*). Chiang (1956) shows that the minimum distance estimator based on Δ*^{−1} has the minimal asymptotic covariance matrix within this class. The minimum distance estimator based on A_N in (5) attains this lower bound.
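The weight matrix in (5) is easy to form from the data. A sketch, with simulated non-normal r_i as an assumed illustration:

```python
import numpy as np

# Sketch of A_N in eq. (5): inverse sample covariance of the lower-triangle
# vectors s_i* formed from (r_i - rbar)(r_i - rbar)'.
rng = np.random.default_rng(5)
N, J = 3000, 3
r = rng.standard_t(df=8, size=(N, J))      # non-normal, finite fourth moments
rc = r - r.mean(axis=0)
idx = np.tril_indices(J)
s_i = np.array([np.outer(ri, ri)[idx] for ri in rc])   # s_i*
s_bar = s_i.mean(axis=0)                               # lower triangle of S
A_N = np.linalg.inv((s_i - s_bar).T @ (s_i - s_bar) / N)
print(A_N.shape)                                       # (J(J+1)/2, J(J+1)/2)
```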


    4. An empirical example

We shall present an empirical example that illustrates some of the preceding results. The data come from the panel of Young Men in the National Longitudinal Survey (Parnes). The sample consists of 1454 young men who were not enrolled in school in 1969, 1970, or 1971, and who had complete data on the variables listed in table 1. Table 2a presents an unrestricted least squares regression of the logarithm of wage in 1969 on the union, SMSA, and region variables for all three years. The regression also includes a constant, schooling, experience, experience squared, and race. This regression is repeated using the 1970 wage and the 1971 wage.

Table 1
Characteristics of National Longitudinal Survey Young Men, not enrolled in school in 1969, 1970, 1971; N = 1454.

Variable   Mean    Standard deviation
LW1        5.64    0.423
LW2        5.74    0.426
LW3        5.82    0.437
U1         0.336
U2         0.362
U3         0.364
U1U2       0.270
U1U3       0.262
U2U3       0.303
U1U2U3     0.243
SMSA1      0.697
SMSA2      0.627
SMSA3      0.622
RNS1       0.409
RNS2       0.404
RNS3       0.410
S          11.7    2.64
EXP69      5.11    3.71
EXP69²     39.8    46.6
RACE       0.264

LW1, LW2, LW3 - logarithm of hourly earnings (in cents) on the current or last job in 1969, 1970, 1971; U1, U2, U3 - 1 if wages on current or last job set by collective bargaining, 0 if not, in 1969, 1970, 1971; SMSA1, SMSA2, SMSA3 - 1 if respondent in SMSA, 0 if not, in 1969, 1970, 1971; RNS1, RNS2, RNS3 - 1 if respondent in South, 0 if not, in 1969, 1970, 1971; S - years of schooling completed; EXP69 - (age in 1969 − S − 6); RACE - 1 if respondent black, 0 if not.


[Table 2a and table 2b. Unrestricted least squares regressions of LW1, LW2, and LW3 on the leads and lags of the union, SMSA, and region variables (table 2b adds a complete set of union interactions); the coefficient entries are not recoverable from this copy.]


In section 2 we discussed the implications of a random intercept (c) and a random slope (b). If the leads and lags are due just to c, then the submatrices of Π corresponding to the union, SMSA, or region coefficients should have the form βI + ιλ′, where ι is a vector of ones. Consider, for example, the 3 × 3 submatrix of union coefficients: the off-diagonal elements in each column should be equal to each other. So we compare 0.048 to 0.046, 0.042 to 0.041, and −0.009 to 0.010; not bad.
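To see the implied pattern concretely, here is a minimal numpy sketch; the β and λ values are purely illustrative (chosen near the numbers quoted above), not estimates taken from the tables.

```python
import numpy as np

# Illustrative values only: beta is the common slope, lam holds the
# column effects (lambda_1, lambda_2, lambda_3).
beta = 0.10
lam = np.array([0.047, 0.041, 0.000])

# Pi = beta * I_3 + iota * lam', with iota a 3 x 1 vector of ones.
Pi = beta * np.eye(3) + np.outer(np.ones(3), lam)
print(Pi)

# Within each column the two off-diagonal entries both equal lambda_j,
# which is the equality checked in the text (0.048 vs 0.046, etc.).
for j in range(3):
    off_diag = [Pi[i, j] for i in range(3) if i != j]
    assert np.allclose(off_diag, lam[j])
```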

In table 2b we add a complete set of union interactions, so that, for the union variables at least, we have a general regression function. Now the submatrix of union coefficients is 3 × 7. If it equals β(I₃, 0) + ιλ′, then in the first three columns, the off-diagonal elements within a column should be equal; in the last four columns, all elements within a column should be equal.

I first imposed the restrictions on the SMSA and region coefficients, using the minimum distance estimator. Ω is estimated using the formula in eq. (2) of section 3.1, and A_N = Ω̂⁻¹. The minimum distance statistic (Proposition 8) is 6.82, which is not a surprising value from a χ²(10) distribution. If we impose the restrictions on the union coefficients as well, then the 21 coefficients in table 2b are replaced by 8: one β and seven λs. This gives an increase in the minimum distance statistic (Proposition 8′, appendix B) of 19.36 − 6.82 = 12.54, which is not a surprising value from a χ²(13) distribution. So there is no evidence here against the hypothesis that all the lags and leads are generated by c.
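The computation behind these statistics is easy to sketch. The following toy example is ours, not the code behind the tables: the g(·) mapping (β, λ₁, λ₂, λ₃) into a vectorized 3 × 3 coefficient matrix, the covariance matrix, and the sample size are all invented for the illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def min_dist(a_N, A_N, g, theta0, N):
    """Minimize N * [a_N - g(theta)]' A_N [a_N - g(theta)].

    When A_N is the inverse of the estimated covariance of sqrt(N)*a_N,
    the minimized value is the chi-square statistic of Proposition 8,
    with dim(a_N) - dim(theta) degrees of freedom.
    """
    def obj(theta):
        r = a_N - g(theta)
        return N * (r @ A_N @ r)
    res = minimize(obj, theta0, method="BFGS")
    df = len(a_N) - len(theta0)
    return res.x, res.fun, chi2.sf(res.fun, df)

def g(theta):  # theta = (beta, lam1, lam2, lam3) -> vec(beta*I + iota*lam')
    return (theta[0] * np.eye(3) + np.outer(np.ones(3), theta[1:])).ravel()

rng = np.random.default_rng(0)
theta_true = np.array([0.10, 0.047, 0.041, 0.000])
Omega = 0.001 * np.eye(9)          # pretend covariance of sqrt(N)*a_N
N = 1000
a_N = g(theta_true) + rng.multivariate_normal(np.zeros(9), Omega / N)
est, stat, pval = min_dist(a_N, np.linalg.inv(Omega), g, np.zeros(4), N)
print(est, stat, pval)             # stat ~ chi-square with 9 - 4 = 5 df
```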

Consider a transformation of the model in which the dependent variables are LW1, LW2 − LW1, and LW3 − LW2. Start with a multivariate regression on all of the lags and leads (and union interactions); then impose the restriction that U, SMSA, and RNS appear in the LW2 − LW1 and LW3 − LW2 equations only as contemporaneous changes (E(y_t − y_{t−1} | x₁, x₂, x₃) = β(x_t − x_{t−1})). This is equivalent to the restriction that c generates all of the lags and leads, and we have seen that it is supported by the data. I also considered imposing all of the restrictions with the single exception of allowing separate coefficients for entering and leaving union coverage in the wage change equations. The estimates (standard errors) are 0.097 (0.019) and −0.119 (0.022). The standard error on the sum of the coefficients is 0.024, so again there is no evidence against the simple model with E(y_t | x₁, x₂, x₃, c) = βx_t + c.¹⁵

However, since the x_t are binary variables, condition (R) in Proposition 1 does not hold. For example, the union coefficients provide some evidence that E(b | x₁, x₂, x₃) is constant for the individuals who experience a change in union coverage [i.e., E(b | x₁, x₂, x₃) = β̄ if x₁ + x₂ + x₃ ≠ 0 or 3]; but there is no direct evidence on E(b | x₁, x₂, x₃) for the people who are always covered or never covered. Furthermore, our alternative hypothesis has no structure. It might be fruitful, for example, to examine the changes in union coverage jointly with changes in employer.

¹⁵Using May-May CPS matches for 1977-1978, Mellow (1981) reports coefficients (standard errors) of 0.087 (0.018) and −0.069 (0.020) for entering and leaving union membership in a wage change regression. The sample consists of 6,602 males employed as non-agricultural wage and salary workers in both years. He also reports results for 2,177 males and females whose age was ≤ 25. Here the coefficients on entering and leaving union membership are quite different: 0.198 (0.031) and −0.035 (0.041); it would be useful to reconcile these numbers with our results for young men. Also see Stafford and Duncan (1980).

Table 3a exhibits the estimates that result from imposing the restrictions using the optimal minimum distance estimator.¹⁶ We also give the conventional generalized least squares estimates. They are minimum distance estimates in which the weighting matrix (A_N) is the inverse of Σ̂ ⊗ S_xx⁻¹, where Σ̂ is the sample covariance matrix of the least squares residuals and S_xx is the sample second-moment matrix of x. We give the conventional standard errors based on (G′A_N G)⁻¹ and the standard errors calculated according to Proposition 7, which do not require an assumption of homoskedastic linear regression. These standard errors are larger than the conventional ones, by about 30%. The estimated gain in efficiency from using the appropriate metric is not very large; the standard errors calculated according to Proposition 7 are about 10% larger when we use conventional GLS instead of the optimum minimum distance estimator.
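The two standard error calculations can be sketched as follows; G, A_N, and Delta_hat stand in for the derivative matrix ∂g(θ̂)/∂θ′, the weighting matrix, and a heteroskedasticity-robust estimate of the covariance of √N·a_N, and the function name is ours.

```python
import numpy as np

def md_standard_errors(G, A_N, Delta_hat, N):
    """Conventional vs. robust standard errors for minimum distance.

    Conventional: from (G' A_N G)^{-1}, valid when A_N is the inverse
    of the true covariance of sqrt(N)*a_N.  Robust (Proposition 7
    sandwich): (G'AG)^{-1} G'A Delta A G (G'AG)^{-1}, which does not
    require the homoskedastic linear regression assumption.
    """
    bread = np.linalg.inv(G.T @ A_N @ G)
    conventional = np.sqrt(np.diag(bread) / N)
    meat = G.T @ A_N @ Delta_hat @ A_N @ G
    robust = np.sqrt(np.diag(bread @ meat @ bread) / N)
    return conventional, robust
```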

Table 3a also presents the estimated λs. Consider, for example, an

    individual who was covered by collective bargaining in 1969. The linear

    predictor of c increases by 0.089 if he is also covered in 1970, and it increases

    by an additional 0.036 if he is covered in all three years. The predicted c for

    someone who is always covered is higher by 0.102 than for someone who is

    never covered.

Table 3b presents estimates under the constraint that λ = 0. The increment in the distance statistic is 89.08 − 19.36 = 69.72, which is a surprisingly large value to come from a χ²(13) distribution. If we constrain only the union λs to be zero, then the increment is 57.06 − 19.36 = 37.70, which is surprisingly large coming from a χ²(7) distribution. So there is strong evidence for heterogeneity bias.

The union coefficient declines from 0.157 to 0.107 when we relax the λ = 0 restriction.

¹⁶We did not find much evidence for nonstationarity in the slope coefficients. If we allow the union β to vary over the three years, we get 0.105, 0.103, 0.114. The distance statistic declines to 18.51, giving 19.36 − 18.51 = 0.85; this is not a surprising value from a χ²(2) distribution. If we also free up β for SMSA and RNS, then the decline in the distance statistic is 18.51 − 13.44 = 5.07, which is not a surprising value from a χ²(4) distribution.


[Table 3a: Restricted estimates. Coefficients (and standard errors) of U, SMSA, and RNS, together with the estimated λs; the entries are illegible in the source.]



Table 3b

Restricted estimates under the constraint that λ = 0.

Coefficients (and standard errors) of:

        U          SMSA       RNS
β̂       0.157      0.120     −0.150
       (0.012)    (0.013)    (0.016)

χ²(36) = 89.08

See footnote to table 3a.

The least squares estimates for the separate cross-sections, with no leads or lags, give union coefficients of 0.195, 0.189, and 0.191 in 1969, 1970, and 1971.¹⁷ So the decline in the union coefficient, when we allow for heterogeneity bias, is 32% or 44%, depending on which biased estimate (0.16 or 0.19) one uses. The SMSA and region coefficients also decline in absolute value. The least squares estimates for the separate cross-sections give an average SMSA coefficient of 0.147 and an average region coefficient of −0.131. So the decline in the SMSA coefficient is either 53% or 62%, and the decline in absolute value of the region coefficient is either 45% or 37%.

    5. Conclusion

    We have examined the relationship between heterogeneity bias and strict

    exogeneity in distributed lag regressions of y on x. The relationship is very

    strong when x is continuous, weaker when x is discrete, and non-existent as

    the order of the distributed lag becomes infinite.

    The individual specific random variables introduce nonlinearity and

    heteroskedasticity. So we have provided an appropriate framework for the

    estimation of multivariate linear predictors. We showed that the optimal

    minimum distance estimator is more efficient, in general, than the

    conventional estimators such as quasi-maximum likelihood, We provided

    computationally simple generalizations of two- and three-stage least squares

    that achieve this efficiency gain.

¹⁷Using the NLS Young Men in 1969 (N = 1362), Griliches (1976) reports a union membership coefficient of 0.203. Using the NLS Young Men in a pooled regression for 1966-1971 and 1973 (N = 470), Brown (1980) reports a coefficient of 0.130 on a variable measuring the probability of union coverage. (The union coverage question was asked only in 1969, 1970, and 1971; so this variable is imputed for the other four years.) The coefficient declines to 0.081 when individual intercepts are included in the regression. His regressions also include a large number of occupation and industry specific job characteristics.



    Some of these ideas were illustrated using the sample of Young Men in the

    National Longitudinal Survey. We examined regressions of wages on the

    leads and lags in union coverage, SMSA, and region. The results indicate

    that the leads and lags could have been generated just by a random

    intercept. This gives some support for analysis of covariance type estimates;

    these estimates indicate a substantial heterogeneity bias in the union, SMSA,

    and region coefficients.

Appendix A

Let Ω be a set of points, where ω ∈ Ω is a doubly infinite sequence of vectors of real numbers: ω = {..., ω₋₁, ω₀, ω₁, ...} = {ω_t, t ∈ I}, where ω_t ∈ R^q and I is the set of all integers. Let z_t(ω) = ω_t be the tth coordinate function. Let ℱ be the σ-field generated by sets of the form

A = {ω: z_t(ω) ∈ B₁, ..., z_{t+k}(ω) ∈ B_k},

where t, k ∈ I and the Bs are q-dimensional Borel sets. Let P be a probability measure defined on ℱ such that {z_t, t ∈ I} is a (strictly) stationary stochastic process on the probability space (Ω, ℱ, P).

The shift transformation S is defined by z_t(Sω) = z_{t+1}(ω). It is an invertible, measure-preserving transformation. A random variable d defined on (Ω, ℱ, P) is invariant if d(Sω) = d(ω) except on a set with probability measure zero (almost surely, or a.s.). A set A ∈ ℱ is invariant if its indicator function is an invariant random variable.

We shall use E(d | 𝒢)_ω to denote the conditional expectation of the random variable d with respect to the σ-field 𝒢, evaluated at ω. Let x_t be a component of z_t, let σ(x) denote the σ-field generated by {..., x₋₁, x₀, x₁, ...}, and let E(d | x_t, x_{t−1}, ...) denote the expectation of d conditional on the σ-field generated by x_t, x_{t−1}, ....

Proposition 3. If d is an invariant random variable with E(|d|) < ∞, then

E(d | σ(x)) = E(d | x_t, x_{t−1}, ...)  a.s.,

where t is any integer.

Proof. First we shall show that E(d | σ(x)) is an invariant random variable. Let f(ω) = d(Sω). A change of variable argument shows that

E(d | σ(x))_{Sω} = E(f | S⁻¹σ(x))_ω  a.s.

[See Billingsley (1965, example 10.3, p. 109).] Since d is an invariant random variable, we have d(Sω) = d(ω) a.s.; also S⁻¹σ(x) = σ(x). Hence E(d | σ(x))_{Sω} = E(d | σ(x))_ω a.s., so that E(d | σ(x)) is indeed invariant.

Let σ(x_t, x_{t−1}, ...) denote the σ-field generated by (x_t, x_{t−1}, ...), and let 𝒯 = ⋂_t σ(x_t, x_{t−1}, ...) be the left tail σ-field generated by the x process. Since E(d | σ(x)) is an invariant random variable, there is a version of E(d | σ(x)) that is measurable 𝒯. [See Rozanov (1967, lemma 6.1, p. 162).] Hence E(d | σ(x)) = E(d | 𝒯) a.s., and so

E(d | σ(x)) = E(d | σ(x_t, x_{t−1}, ...))  a.s.  Q.E.D.

Let d be an invariant random variable and assume that E(d²) < ∞, E(x_t²) < ∞. Consider the Hilbert space of random variables generated by the linear manifold spanned by the variables {d, ..., x₋₁, x₀, x₁, ...}, closed with respect to convergence in mean square. We also include a constant (1) in the space. The inner product is (a, b) = E(ab). Then the linear predictor E*(d | ..., x₋₁, x₀, x₁, ...) is defined as the projection of d on the closed linear subspace generated by {1, ..., x₋₁, x₀, x₁, ...}.

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | ..., x₋₁, x₀, x₁, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=1}^{J} x_{t−j}/J as J → ∞, t is any integer, and

λ = cov(d, x̄)/V(x̄)  if V(x̄) ≠ 0,
λ = 0                if V(x̄) = 0,

ψ = E(d) − λE(x̄).

Proof. The existence of the limit is implied by the mean ergodic theorem [Billingsley (1965, theorem 2.1, p. 21)]. Since d is an invariant random variable, we have cov(d, x_t) = cov(d, x₁) for all t. Let x̄_J = Σ_{j=1}^{J} x_{t−j}/J. Then cov(d, x̄_J) = cov(d, x₁), and so cov(d, x̄) = lim_{J→∞} cov(d, x̄_J) = cov(d, x₁). Since x̄ is an invariant random variable, we have cov(x̄, x̄_J) = cov(x̄, x₁), and so V(x̄) = lim_{J→∞} cov(x̄, x̄_J) = cov(x̄, x₁). Hence

cov(d − ψ − λx̄, x_t) = cov(d, x₁) − λ cov(x̄, x₁) = cov(d, x̄) − λV(x̄) = 0,  t ∈ I.

Since we also have E(d − ψ − λx̄) = 0, the proof is complete. Q.E.D.
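A small simulation illustrating Proposition 4; the process x_t = d + white noise is an assumed toy example, with each simulated path playing the role of one realization ω.

```python
import numpy as np

# d is invariant (constant along each path); x_t = d + noise is stationary.
# The projection of d on (x_1, ..., x_T) puts (nearly) equal weight on
# every x_t, i.e. it loads on the long-run average of x.
rng = np.random.default_rng(2)
R, T = 20000, 10                     # number of paths, path length
d = rng.normal(size=R)
x = d[:, None] + rng.normal(size=(R, T))

X = np.column_stack([np.ones(R), x])
coef, *_ = np.linalg.lstsq(X, d, rcond=None)
print(coef[1:])                      # near-equal weights (1/(T+1) here)

# lambda = cov(d, xbar) / V(xbar), applied to the average alone.
xbar = x.mean(axis=1)
lam = np.cov(d, xbar)[0, 1] / xbar.var()
print(lam, coef[1:].sum())           # the weights sum to roughly lambda
```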


    Appendix B

Let r_i′ = (x_i′, y_i′), i = 1, ..., N, where x_i′ = (x_{i1}, ..., x_{iK}) and y_i′ = (y_{i1}, ..., y_{iM}). Write the mth structural equation as

y_im = δ_m′ z_im + u_im,  m = 1, ..., M,

where the components of z_im are the variables in y_i and x_i that appear in the mth equation with unknown coefficients. Let S_zx be the following block-diagonal matrix:

S_zx = diag{N⁻¹ Σ_i z_{i1} x_i′, ..., N⁻¹ Σ_i z_{iM} x_i′},

and

s_xy = N⁻¹ Σ_i (y_i ⊗ x_i).

Let u_i°′ = (u_{i1}°, ..., u_{iM}°), where u_{im}° = y_im − δ_m°′ z_im and δ_m° is the true value of δ_m; let Φ_zx = E(S_zx). Let δ = (δ₁′, ..., δ_M′) be s × 1, and set

δ̂ = (S_zx D⁻¹ S_zx′)⁻¹ (S_zx D⁻¹ s_xy).

Proposition 5. Assume that (1) r_i is i.i.d. according to some distribution with finite fourth moments; (2) E[x_i(y_im − δ_m°′ z_im)] = 0 (m = 1, ..., M); (3) rank(Φ_zx) = s; and (4) D converges a.s. to Ψ as N → ∞, where Ψ is a positive definite matrix. Then √N(δ̂ − δ°) converges in distribution to N(0, Λ), where

Λ = (Φ_zx Ψ⁻¹ Φ_zx′)⁻¹ Φ_zx Ψ⁻¹ E(u°u°′ ⊗ xx′) Ψ⁻¹ Φ_zx′ (Φ_zx Ψ⁻¹ Φ_zx′)⁻¹.

Proof. √N(δ̂ − δ°) = (S_zx D⁻¹ S_zx′)⁻¹ S_zx D⁻¹ N^{−1/2} Σ_{i=1}^{N} (u_i° ⊗ x_i). By the strong law of large numbers, S_zx converges a.s. to Φ_zx; Φ_zx Ψ⁻¹ Φ_zx′ is an s × s positive definite matrix since rank(Φ_zx) = s. So we obtain the same limiting distribution by considering

(Φ_zx Ψ⁻¹ Φ_zx′)⁻¹ Φ_zx Ψ⁻¹ N^{−1/2} Σ_{i=1}^{N} (u_i° ⊗ x_i).

Note that u_i° ⊗ x_i is i.i.d. with E(u_i° ⊗ x_i) = 0, V(u_i° ⊗ x_i) = E(u_i°u_i°′ ⊗ x_i x_i′). Then applying the central limit theorem gives √N(δ̂ − δ°) →d N(0, Λ). Q.E.D.


This result includes as special cases a number of the commonly used estimators. If z_im = x_i (m = 1, ..., M) and D = I, then δ̂ is the least squares estimator and Λ reduces to the formula for Ω given in eq. (1) of section 3.1. If Ψ = E(u°u°′) ⊗ E(xx′), then Λ is the asymptotic covariance matrix for the three-stage least squares estimator. If Ψ = E(u°u°′ ⊗ xx′), then Λ is the asymptotic covariance matrix for the generalized three-stage least squares estimator [eq. (3), section 3.3]. If

Ψ = diag{E(u₁°²) E(xx′), ..., E(u_M°²) E(xx′)},

then we have the asymptotic covariance matrix for two-stage least squares. If

Ψ = diag{E(u₁°² xx′), ..., E(u_M°² xx′)},

we have the asymptotic covariance matrix for generalized two-stage least squares. [Λ is given in eq. (4), section 3.3.]
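A numpy sketch of δ̂, under the block construction of S_zx and s_xy given above; the function and argument names are ours, for illustration.

```python
import numpy as np
from scipy.linalg import block_diag

def delta_hat(Z_list, X, Y, D):
    """Estimator of Proposition 5.

    Z_list[m] is the N x s_m matrix of right-hand-side variables in
    equation m, X is the N x K matrix of x_i', and Y is N x M.
    S_zx is block-diagonal with m-th block (1/N) * Z_m' X, and s_xy
    stacks (1/N) * X' y_m for m = 1, ..., M.
    """
    N = X.shape[0]
    S_zx = block_diag(*[Z.T @ X / N for Z in Z_list])
    s_xy = np.concatenate([X.T @ Y[:, m] / N for m in range(Y.shape[1])])
    D_inv = np.linalg.inv(D)
    return np.linalg.solve(S_zx @ D_inv @ S_zx.T, S_zx @ D_inv @ s_xy)
```

With Z_m = X for every m and D = I this reduces to equation-by-equation least squares; supplying an estimate of the appropriate Ψ for D yields the two- and three-stage variants listed above.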

Next we shall derive the properties of the minimum distance estimator. Let

D_N(θ) = [a_N − g(θ)]′ A_N [a_N − g(θ)],

and choose θ̂ to minimize D_N(θ) over θ ∈ T. Assumptions 1 and 2 are stated in section 3.2.

Proposition 6. If Assumption 1 is satisfied, then θ̂ → θ° a.s.

Proof. Let D*(θ) = [g(θ°) − g(θ)]′ Ψ [g(θ°) − g(θ)]. D_N converges a.s. uniformly to D* on T. Let B be a neighborhood of θ° and set T̄ = T − B. Then

min_{θ∈T̄} D_N(θ) → min_{θ∈T̄} D*(θ) ≡ δ  a.s.

Since δ > 0 and D_N(θ̂) → 0 a.s., it must be that θ̂ ∈ B a.s. for N sufficiently large. Since B is an arbitrary neighborhood of θ°, we have shown that θ̂ → θ° a.s. Q.E.D.

Proposition 7. If Assumptions 1 and 2 are satisfied, then √N(θ̂ − θ°) converges in distribution to N(0, Λ), where

Λ = (G′ΨG)⁻¹ G′ΨΔΨG (G′ΨG)⁻¹,

G = ∂g(θ°)/∂θ′, and Δ is the asymptotic covariance matrix of √N[a_N − g(θ°)].


If Δ is positive definite, then Λ − (G′Δ⁻¹G)⁻¹ is positive semi-definite; hence an optimal choice for Ψ is Δ⁻¹.

Proof. Let

s_N(θ) = ∂D_N(θ)/∂θ = −2(∂g′(θ)/∂θ) A_N [a_N − g(θ)].

Since θ̂ → θ° a.s., for N sufficiently large we a.s. have θ̂ ∈ int T and s_N(θ̂) = 0. The mean value theorem implies that

s_N(θ̂) = s_N(θ°) + (∂s_N(θ*)/∂θ′)(θ̂ − θ°)  a.s.,

for sufficiently large N, where θ* is on the line segment connecting θ̂ and θ°. [There is a different θ* for each row of ∂s_N(θ*)/∂θ′; the measurability of θ* follows from lemmas 2 and 3 of Jennrich (1969).] Since θ* → θ° a.s., direct evaluation shows that

∂s_N(θ*)/∂θ′ → 2G′ΨG  a.s.,

which is non-singular. Hence

√N(θ̂ − θ°) = −[∂s_N(θ*)/∂θ′]⁻¹ √N s_N(θ°)  a.s.,

for sufficiently large N. We obtain the same limiting distribution by considering

(G′ΨG)⁻¹ G′Ψ √N [a_N − g(θ°)].

Hence √N(θ̂ − θ°) converges in distribution to N(0, Λ).

To find an optimal Ψ, note that there is a non-singular matrix C such that Δ = CC′. Let G̃ = C⁻¹G and B = (G′ΨG)⁻¹G′ΨC. Then we have

Λ − (G′Δ⁻¹G)⁻¹ = B[I_q − G̃(G̃′G̃)⁻¹G̃′]B′,

which is positive semi-definite. Q.E.D.
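A quick numerical check of this efficiency claim, with randomly generated conformable matrices (the dimensions below are arbitrary):

```python
import numpy as np

# For any positive definite Delta and weight Psi, the gap
# Lambda - (G' Delta^{-1} G)^{-1} should have non-negative eigenvalues.
rng = np.random.default_rng(3)
q, p = 6, 3
G = rng.normal(size=(q, p))
A = rng.normal(size=(q, q)); Delta = A @ A.T + np.eye(q)  # positive definite
B = rng.normal(size=(q, q)); Psi = B @ B.T + np.eye(q)    # arbitrary weight

bread = np.linalg.inv(G.T @ Psi @ G)
Lam = bread @ G.T @ Psi @ Delta @ Psi @ G @ bread
gap = Lam - np.linalg.inv(G.T @ np.linalg.inv(Delta) @ G)
print(np.linalg.eigvalsh(gap))      # all >= 0, up to rounding error
```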

Proposition 8. If Assumptions 1 and 2 are satisfied, if Δ is positive definite, and if A_N converges a.s. to Δ⁻¹, then

N[a_N − g(θ̂)]′ A_N [a_N − g(θ̂)] →d χ²(q − p).


Proof. For sufficiently large N we have

√N[g(θ̂) − g(θ°)] = G_N √N(θ̂ − θ°)  a.s.,

where G_N → G a.s. From the proof of Proposition 7, we have

√N(θ̂ − θ°) = R_N √N[a_N − g(θ°)]  a.s.,

where R_N → R = (G′Δ⁻¹G)⁻¹G′Δ⁻¹ a.s. Hence

√N[a_N − g(θ̂)] = √N[a_N − g(θ°)] − √N[g(θ̂) − g(θ°)] →d QCu,

where Q = I_q − GR, C is a non-singular matrix such that CC′ = Δ, and u ~ N(0, I_q);

d_N ≡ N[a_N − g(θ̂)]′ A_N [a_N − g(θ̂)] →d u′C′Q′Δ⁻¹QCu.

Let G̃ = C⁻¹G and M_G = I_q − G̃(G̃′G̃)⁻¹G̃′; then M_G is a symmetric idempotent matrix with rank q − p, and

C′Q′Δ⁻¹QC = M_G′M_G = M_G.

Hence d_N →d u′M_G u ~ χ²(q − p). Q.E.D.

Now consider imposing additional restrictions, which are expressed by the condition that θ = f(α), where α is s × 1 (s ≤ p). The domain of α is T₁, a subset of R^s that contains the true value α°. So θ = f(α) is confined to a certain subset of R^p.

Assumption 2′. T₁ is a compact subset of R^s that contains α°; f is a continuous mapping from T₁ into T; f(α) = θ° for α ∈ T₁ implies α = α°; T₁ contains a neighborhood of α° in which f has continuous second partial derivatives; rank(F) = s, where F = ∂f(α°)/∂α′.

Let h(α) = g[f(α)]. Choose α̂ to

min_{α∈T₁} [a_N − h(α)]′ A_N [a_N − h(α)].


Proposition 8′. If Assumptions 1, 2, and 2′ are satisfied, if Δ is positive definite, and if A_N converges a.s. to Δ⁻¹, then d₁ − d₂ →d χ²(p − s), where

d₁ = N[a_N − h(α̂)]′ A_N [a_N − h(α̂)],
d₂ = N[a_N − g(θ̂)]′ A_N [a_N − g(θ̂)].

Furthermore, d₁ − d₂ is independent of d₂ in their limiting joint distribution.

Proof. The assumptions on f and T₁ imply that h and α̂ satisfy Assumptions 1 and 2. By following the proof of Proposition 8, we can show that the vector (d₁, d₂) converges in distribution to (d₁*, d₂*), where

d₁* = u′M_H u,  d₂* = u′M_G u,

u ~ N(0, I_q), C is a non-singular matrix such that CC′ = Δ, H̃ = C⁻¹H, G̃ = C⁻¹G, and

M_H = I_q − H̃(H̃′H̃)⁻¹H̃′,  M_G = I_q − G̃(G̃′G̃)⁻¹G̃′,

with H = ∂h(α°)/∂α′ = GF. Since H̃ is in the column space of G̃, we have M_H M_G = M_G M_H = M_G; so M_H − M_G is a symmetric idempotent matrix with rank p − s. Hence

d₁ − d₂ →d u′(M_H − M_G)u ~ χ²(p − s).

Since

cov[(M_H − M_G)u, M_G u] = (M_H − M_G)M_G = 0,

we see that d₁* − d₂* is independent of d₂*. Q.E.D.

In section 3.2 we considered applying the minimum distance procedure both to π̂ and to W_N. We want to show that if the restrictions involve only π, then the two procedures give estimators of π with the same limiting distribution. First consider the effect of a one-to-one transformation from W to (π′, w′): let l(ρ) be a function from R^q into R^q and let L = ∂l(ρ°)/∂ρ′, where ρ° = g(θ°). Let h(θ) = l[g(θ)]. Choose θ̃ to

min_{θ∈T} [l(a_N) − h(θ)]′ A_N [l(a_N) − h(θ)].


Proposition 9a. Assume that (1) Assumptions 1 and 2 are satisfied for g and l; (2) l is one-to-one and continuous on the range of g(θ) for θ ∈ T; l has continuous second partial derivatives in a neighborhood of g(θ°); L is non-singular; (3) Δ is positive definite and A_N converges a.s. to (LΔL′)⁻¹. Then √N(θ̃ − θ°) converges in distribution to N(0, Λ), where Λ = (G′Δ⁻¹G)⁻¹.

Proof. By the δ-method,

√N[l(a_N) − h(θ°)] →d N(0, LΔL′).

Hence √N(θ̃ − θ°) →d N(0, Λ), where Λ = [H′(LΔL′)⁻¹H]⁻¹ and H = ∂h(θ°)/∂θ′. Since H = LG and L is non-singular, we have Λ = (G′Δ⁻¹G)⁻¹. Q.E.D.

Finally, consider augmenting a_N to a k × 1 vector c_N: c_N′ = (a_N′, b_N′), k ≥ q. (For example, we can augment π̂ by adding w_N.) Assume that c_N ...


where Δ_st is the (s, t) submatrix of Δ (s, t = 1, 2). Then the concentrated distance function is

... = [a_N − g(θ)]′ A_N [a_N − g(θ)].  Q.E.D.

    So the addition of unrestricted moments does not affect the minimum

    distance estimator.

    References

Amemiya, T., 1971, The estimation of variances in a variance-components model, International Economic Review 12, 1-13.

Anderson, T.W., 1969, Statistical inference for covariance matrices with linear structure, in: P.R. Krishnaiah, ed., Proceedings of the second international symposium on multivariate analysis (Academic Press, New York).

Anderson, T.W., 1970, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, in: Essays in probability and statistics (University of North Carolina Press, Chapel Hill, NC).

Balestra, P. and M. Nerlove, 1966, Pooling cross section and time series data in the estimation of a dynamic model: The demand for natural gas, Econometrica 34, 585-612.

Basmann, R.L., 1965, On the application of the identifiability test statistic and its exact finite sample distribution function in predictive testing of explanatory economic models, Unpublished manuscript.

Billingsley, P., 1965, Ergodic theory and information (Wiley, New York).

Billingsley, P., 1979, Probability and measure (Wiley, New York).

Brown, C., 1980, Equalizing differences in the labor market, Quarterly Journal of Economics 94, 113-134.

Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic Studies 47, 225-238.

Chiang, C.L., 1956, On regular best asymptotically normal estimates, Annals of Mathematical Statistics 27, 336-351.

Cramer, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton, NJ).

Ferguson, T.S., 1958, A method of generating best asymptotically normal estimates with application to the estimation of bacterial densities, Annals of Mathematical Statistics 29, 1046-1062.

Goldberger, A.S., 1974, Asymptotics of the sample regression slope, Unpublished lecture note no. 12.

Griliches, Z., 1976, Wages of very young men, Journal of Political Economy 84, S69-S85.

Griliches, Z. and A. Pakes, 1980, The estimation of distributed lags in short panels, National Bureau of Economic Research technical paper no. 4.

Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, forthcoming.

Hsiao, C., 1975, Some estimation methods for a random coefficient model, Econometrica 43, 305-325.

Jennrich, R.I., 1969, Asymptotic properties of non-linear least squares estimators, The Annals of Mathematical Statistics 40, 633-643.

Kendall, M.G. and A. Stuart, 1961, The advanced theory of statistics, Vol. 2 (Griffin, London).

MaCurdy, T.E., 1979, Multiple time series models applied to panel data: Specification of a dynamic model of labor supply, Unpublished manuscript.

Maddala, G.S., 1971, The use of variance components models in pooling cross section and time series data, Econometrica 39, 341-358.

Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).

Mellow, W., 1981, Unionism and wages: A longitudinal analysis, Review of Economics and Statistics 63, 43-52.

Mundlak, Y., 1961, Empirical production function free of management bias, Journal of Farm Economics 43, 44-56.

Mundlak, Y., 1963, Estimation of production and behavioral functions from a combination of time series and cross section data, in: C. Christ et al., eds., Measurement in economics (Stanford University Press, Stanford, CA).

Mundlak, Y., 1978, On the pooling of time series and cross section data, Econometrica 46, 69-85.

Mundlak, Y., 1978a, Models with variable coefficients: Integration and extension, Annales de l'INSEE 30-31, 483-509.

Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York).

Rothenberg, T.J., 1973, Efficient estimation with a priori information (Yale University Press, New Haven, CT).

Rozanov, Y.A., 1967, Stationary random processes (Holden-Day, San Francisco, CA).

Sims, C.A., 1972, Money, income, and causality, American Economic Review 62, 540-552.

Sims, C.A., 1974, Distributed lags, in: M.D. Intriligator and D.A. Kendrick, eds., Frontiers of quantitative economics, Vol. II (North-Holland, Amsterdam).

Swamy, P.A.V.B., 1970, Efficient inference in a random coefficient regression model, Econometrica 38, 311-323.