
MISSPECIFICATION: EXCLUDING AND INCLUDING VARIABLES SIMULTANEOUSLY*

KRISHNA KADIYALA

University of Western Ontario and Algoma University College, Canada

I. INTRODUCTION

Theil (1957) considered the problem of excluding relevant variables and including irrelevant variables in a regression equation, and many researchers have worked on this important topic since then. Recently, the problem of misspecification under first-order serial correlation (with both known and unknown autocorrelation coefficient) has been analyzed by Kadiyala (1984). Giles (1983) considered the estimation of misspecified models by the instrumental variables technique. Fomby (1981) extended the misspecification analysis to the inclusion of a set of irrelevant variables; see also Riddell and Buse (1980). Thus far, researchers have analyzed the exclusion of relevant variables and the inclusion of irrelevant variables separately. In practice, however, we may come across a misspecified model which excludes a relevant variable(s) and, at the same time, includes an irrelevant variable(s). The purpose of this paper is to analyze the behaviour of the estimator of the unknown coefficient in the misspecified regression equation. Specifically, we show that the bias and the variance of this estimator depend on simple and partial correlation coefficients when only one regressor is included/excluded. Further, we show that the ratio of generalised variances depends on canonical correlations when a set of regressors is included/excluded.

II. THE MODEL AND MAIN RESULTS

Let us assume that the correct regression equation is

$$y = x_1\beta_1 + x_2\beta_2 + \epsilon = X_1 B_1 + \epsilon \tag{1}$$

where $y$, $x_1$ and $x_2$ are $n \times 1$ vectors of observations on the dependent and independent variables, $\beta_1$ and $\beta_2$ are unknown scalars to be estimated, and $\epsilon$ is an $n \times 1$ vector of disturbances such that $E(\epsilon) = 0$ and $V(\epsilon) = \sigma^2 I$. It is well known that

$$E(\hat{\beta}_1) = \beta_1 \tag{2}$$

$$V(\hat{\beta}_1) = \frac{\sigma^2}{x_1'x_1\,(1 - r_{12}^2)} \tag{3}$$

where $\hat{\beta}_1$ is the OLS estimator of $\beta_1$ for the true model in (1) and $r_{12}$ is the simple correlation between $x_1$ and $x_2$.

* I would like to thank the referee for useful suggestions.

Now, suppose the misspecified model excludes $x_2$ but includes $x_1$ and, let us say, that the researcher includes an irrelevant variable ($x_3$) simultaneously. That is, the misspecified model is:

$$y = x_1\beta_1 + x_3\beta_3 + u = X_2 B_2 + u \tag{4}$$

where $x_3$ is a vector of observations on the irrelevant variable and we have used $\beta_1$ in (4) (and in (1)) for simplicity of notation. Also, we assume that all variables are measured as deviations from their respective sample means. The OLS estimator of $B_2$ for the misspecified model in (4) is

$$\tilde{B}_2 = (X_2'X_2)^{-1}X_2'y \tag{5}$$

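Next, noting the fact that $y = x_1\beta_1 + x_2\beta_2 + \epsilon$, as given in (1), we can easily verify that

$$E(\tilde{\beta}_1) = \beta_1 + \sqrt{\frac{x_2'x_2}{x_1'x_1}}\,\sqrt{\frac{1 - r_{23}^2}{1 - r_{13}^2}}\;R_{12.3}\,\beta_2 \tag{7}$$

where

$$r_{ij} = \frac{x_i'x_j}{\sqrt{(x_i'x_i)(x_j'x_j)}}$$

is the simple correlation between $x_i$ and $x_j$, and

$$R_{ij.k} = \frac{r_{ij} - r_{ik}r_{jk}}{\sqrt{1 - r_{ik}^2}\,\sqrt{1 - r_{jk}^2}}$$

is the partial correlation of $x_i$ and $x_j$, keeping $x_k$ constant. We note from (7) that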

Remarks. In (7), the bias of $\tilde{\beta}_1$ is positive if the partial correlation coefficient $R_{12.3}$ and $\beta_2$ have the same sign, and negative if they have different signs. This result is similar to the well-known result for excluding a relevant variable. Let us assume, in the standard notation, that the true model is $y = x_1\beta_1 + x_2\beta_2 + \epsilon$, the misspecified model is $y = x_1\beta_1 + \epsilon^*$, and $\tilde{\beta}_1$ is the OLS estimator of $\beta_1$ for the misspecified model. Then it follows that the bias of $\tilde{\beta}_1$ is $\sqrt{x_2'x_2 / x_1'x_1}\;r_{12}\,\beta_2$.
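The bias expression in (7) can be checked numerically. The sketch below is illustrative only: it assumes the bias has the form $\sqrt{x_2'x_2/x_1'x_1}\,\sqrt{(1-r_{23}^2)/(1-r_{13}^2)}\,R_{12.3}\,\beta_2$ and compares it with the first element of $(X_2'X_2)^{-1}X_2'x_2\,\beta_2$ computed directly. The simulated regressors, the seed, the value of $\beta_2$ and all function names are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def centered(v):
    # Express a variable as deviations from its sample mean.
    return v - v.mean(axis=0)

def corr(a, b):
    # Simple correlation between two mean-centred vectors.
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

def partial_corr(r_ij, r_ik, r_jk):
    # Partial correlation of x_i and x_j, keeping x_k constant.
    return (r_ij - r_ik * r_jk) / np.sqrt((1 - r_ik**2) * (1 - r_jk**2))

def bias_direct(x1, x2, x3, beta2):
    # First element of (X2'X2)^{-1} X2' x2 * beta2, with X2 = [x1, x3].
    X2 = np.column_stack([x1, x3])
    return np.linalg.solve(X2.T @ X2, X2.T @ x2)[0] * beta2

def bias_formula(x1, x2, x3, beta2):
    # Bias written in terms of simple and partial correlations.
    r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
    scale = np.sqrt((x2 @ x2) / (x1 @ x1))
    shrink = np.sqrt((1 - r23**2) / (1 - r13**2))
    return scale * shrink * partial_corr(r12, r13, r23) * beta2

# Hypothetical correlated regressors (illustrative data only).
rng = np.random.default_rng(0)
z = rng.normal(size=(50, 3))
x1 = centered(z[:, 0])
x2 = centered(0.6 * z[:, 0] + z[:, 1])
x3 = centered(0.3 * z[:, 0] + 0.5 * z[:, 1] + z[:, 2])
beta2 = 2.0

direct = bias_direct(x1, x2, x3, beta2)
formula = bias_formula(x1, x2, x3, beta2)
print(direct, formula)
```

Both quantities are functions of the fixed regressors alone, so no disturbance term is simulated; the two printed values should agree up to rounding error.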

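Next, the variance-covariance matrix of $\tilde{B}_2$ is:

$$V(\tilde{B}_2) = \sigma^2 (X_2'X_2)^{-1} \tag{8}$$

so that

$$V(\tilde{\beta}_1) = \frac{\sigma^2}{x_1'x_1\,(1 - r_{13}^2)} \tag{9}$$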

208 AUSTRALIAN ECONOMIC PAPERS

From (3) and (9) we get

$$\frac{V(\hat{\beta}_1)}{V(\tilde{\beta}_1)} = \frac{1 - r_{13}^2}{1 - r_{12}^2} \tag{10}$$

That is, the variance of $\hat{\beta}_1$ for the true model is less than the variance of $\tilde{\beta}_1$ for the false model if the absolute correlation between the common variable ($x_1$) and the other variable of the misspecified model ($x_3$) is greater than the absolute correlation between $x_1$ and the other variable of the true model ($x_2$), or if $r_{12} = 0$ and $r_{13} \neq 0$. Also, the variance of $\hat{\beta}_1$ is the same as the variance of $\tilde{\beta}_1$ if $r_{13}^2 = r_{12}^2$.
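The variance ratio in (10) can be illustrated in the same way. The following sketch assumes the ratio is $(1 - r_{13}^2)/(1 - r_{12}^2)$ and compares it with the ratio of the first diagonal elements of $(X_1'X_1)^{-1}$ and $(X_2'X_2)^{-1}$, where $X_1 = [x_1, x_2]$ and $X_2 = [x_1, x_3]$; $\sigma^2$ cancels and is omitted. The data, seed and function names are hypothetical.

```python
import numpy as np

def centered(v):
    # Express a variable as deviations from its sample mean.
    return v - v.mean(axis=0)

def corr(a, b):
    # Simple correlation between two mean-centred vectors.
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

def first_coef_variance(X):
    # First diagonal element of (X'X)^{-1}; sigma^2 is left out
    # because it cancels in the ratio of the two variances.
    return np.linalg.inv(X.T @ X)[0, 0]

# Hypothetical regressors (illustrative data only).
rng = np.random.default_rng(1)
z = rng.normal(size=(40, 3))
x1 = centered(z[:, 0])
x2 = centered(0.8 * z[:, 0] + z[:, 1])   # relevant, excluded variable
x3 = centered(0.2 * z[:, 0] + z[:, 2])   # irrelevant, included variable

X1 = np.column_stack([x1, x2])           # regressors of the true model
X2 = np.column_stack([x1, x3])           # regressors of the misspecified model

ratio_direct = first_coef_variance(X1) / first_coef_variance(X2)
r12, r13 = corr(x1, x2), corr(x1, x3)
ratio_formula = (1 - r13**2) / (1 - r12**2)
print(ratio_direct, ratio_formula)
```

A ratio below one means the estimator from the true model has the smaller variance, which happens exactly when $|r_{13}| > |r_{12}|$.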

The above results can be easily extended to the case where $x_1$, $x_2$, $x_3$ are matrices and $\beta_1$, $\beta_2$, $\beta_3$ are vectors in (1) and (4), using the canonical correlation concept. In this case, the true model in (1) can be written as:

$$y = x_1^*\beta_1^* + x_2^*\beta_2^* + \epsilon^* = X_1^* B_1^* + \epsilon^* \tag{11}$$

and the misspecified model in (4) can be written as

$$y = x_1^*\beta_1^* + x_3^*\beta_3^* + u^* = X_2^* B_2^* + u^* \tag{12}$$

where $x_1^*$, $x_2^*$ and $x_3^*$ are, say, $n \times k_1$, $n \times k_2$ and $n \times k_3$ matrices of observations on the independent variables and $\beta_1^*$, $\beta_2^*$ and $\beta_3^*$ are $k_1 \times 1$, $k_2 \times 1$ and $k_3 \times 1$ vectors of unknowns. We can verify that

where $\hat{B}_1^*$ is the OLS estimator of $B_1^*$ for the true model in (11). Next, the OLS estimator of $B_2^*$ for the misspecified model in (12) is

$$\tilde{B}_2^* = (X_2^{*\prime}X_2^*)^{-1}X_2^{*\prime}y$$

1985 MISSPECIFICATION

and the ratio of the generalised variances is

$$\frac{|V(\hat{\beta}_1^*)|}{|V(\tilde{\beta}_1^*)|} = \frac{\prod_{i=1}^{t}(1 - r_i^{*2})}{\prod_{j=1}^{p}(1 - \rho_j^{*2})}$$

where $t = \min(k_1, k_3)$, $p = \min(k_1, k_2)$, $r_i^*$, $i = 1, \ldots, t$, are the canonical correlations between the sets of variables $x_1^*$ and $x_3^*$, and $\rho_j^*$, $j = 1, \ldots, p$, are the canonical correlations between the sets of variables $x_1^*$ and $x_2^*$. We note that $\rho_j^* = 0$, $j = 1, \ldots, p$, when $x_1^*$ and $x_2^*$ are orthogonal, and $r_i^* = 0$, $i = 1, \ldots, t$, when $x_1^*$ and $x_3^*$ are orthogonal; in this case $|V(\hat{\beta}_1^*)| = |V(\tilde{\beta}_1^*)|$. These results are similar to those in equation (10).
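The multi-regressor case can be sketched numerically as well. The example below assumes that the ratio of generalised variances equals the product of $(1 - r^2)$ over the canonical correlations between $x_1^*$ and $x_3^*$, divided by the same product over the canonical correlations between $x_1^*$ and $x_2^*$. Canonical correlations are computed here as the singular values of $Q_a'Q_b$, where $Q_a$ and $Q_b$ are orthonormal bases from QR decompositions of the mean-centred blocks; this is one standard computational route, not necessarily the one used in the paper. The dimensions, data, seed and function names are hypothetical.

```python
import numpy as np

def centered(m):
    # Express each column as deviations from its sample mean.
    return m - m.mean(axis=0)

def canonical_correlations(A, B):
    # Singular values of Qa'Qb, where Qa and Qb are orthonormal bases
    # of the column spaces of the mean-centred blocks A and B.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

def generalised_variance(Xa, Xb):
    # Determinant of the leading k1 x k1 block of ([Xa, Xb]'[Xa, Xb])^{-1},
    # i.e. the generalised variance of the coefficients on Xa up to the
    # factor sigma^(2*k1), which cancels in the ratio.
    k1 = Xa.shape[1]
    X = np.hstack([Xa, Xb])
    return np.linalg.det(np.linalg.inv(X.T @ X)[:k1, :k1])

# Hypothetical data: k1 common, k2 excluded and k3 irrelevant regressors.
rng = np.random.default_rng(2)
n, k1, k2, k3 = 60, 2, 3, 1
z = rng.normal(size=(n, k1 + k2 + k3))
x1s = centered(z[:, :k1])
x2s = centered(z[:, k1:k1 + k2] + 0.5 * x1s @ rng.normal(size=(k1, k2)))
x3s = centered(z[:, k1 + k2:] + 0.5 * x1s @ rng.normal(size=(k1, k3)))

cc_13 = canonical_correlations(x1s, x3s)   # t = min(k1, k3) values
cc_12 = canonical_correlations(x1s, x2s)   # p = min(k1, k2) values

ratio_direct = generalised_variance(x1s, x2s) / generalised_variance(x1s, x3s)
ratio_formula = np.prod(1 - cc_13**2) / np.prod(1 - cc_12**2)
print(ratio_direct, ratio_formula)
```

With $k_1 = k_2 = k_3 = 1$ the canonical correlations reduce to $|r_{13}|$ and $|r_{12}|$, so the same code reproduces the single-regressor ratio in (10).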

III. SUMMARY

Misspecification is a very serious problem in econometric theory, and the consequences of specification errors were originally analyzed by Theil (1957). Thus far, researchers have been concerned with two types of misspecification, excluding relevant variables and including irrelevant variables, and have treated these two problems separately. In practice, however, we may come across a situation where a relevant variable(s) is excluded from the true regression model and, at the same time, an irrelevant variable(s) is included; this paper dealt with this type of misspecification. We obtained the bias and the variance of the estimator of the unknown coefficient of the true regression model and showed that they depend on simple and partial correlation coefficients. Also, we extended the misspecification analysis of simultaneous inclusion and exclusion of one regressor to a set of regressors; in the latter case the ratio of generalised variances depends on canonical correlations.

First version received 5th June, 1984 Final version accepted 11th December, 1984

[Editors]

REFERENCES

Fomby, T.B. (1981), “Loss of Efficiency in Regression Analysis due to Irrelevant Variables, A Generalization”, Economics Letters.

Giles, D.E.A. (1983), “Instrumental Variables Estimation of Mis-Specified Regressions”, in Proceedings of the Business and Economic Statistics Section, 1983 Meetings, American Statistical Association.

Hooper, J.W. (1959), “Simultaneous Equations and Canonical Correlation Theory”, Econometrica, vol. 27.

Kadiyala, K. (1984), “Misspecification Under the Presence of Autocorrelated Disturbances”, Proceedings of Twenty-Second Econometric Conference, Indian Statistical Institute, Bangalore, India.

Riddell, W.C. and Buse, A. (1980), “An Alternative Approach to Specification Errors”, Australian Economic Papers, vol. 19.

Theil, H. (1957), “Specification Errors and the Estimation of Economic Relationships”, Review of the International Statistical Institute.