12
Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 1 (2016), pp. 481-491 © Research India Publications http://www.ripublication.com 481 On Properties of the difference between two modified C p statistics in the nested multivariate linear regression models S. Boonyim 1 , J. Jitthavech 2 and V. Lorchirachoonkul 3 1-3 School of Applied Statistics, National Institute of Development Administration Bangkok 10240, Thailand. Abstract The statistic D which is the difference of the modified p C statistics in the full model and in the reduced model is developed for variable selection in the contemporaneous multivariate linear regression nested model. The expectation, the variance and the distribution of the statistic D are derived via three theorems and three lemmas. Keywords: Modified p C statistic, test statistic for variable selection, distribution of the test statistic. Mathematics Subject Classification: 62H10, 62H15 Introduction In the system-of-equations model with contemporaneously but no serially correlated disturbance the dependent variable vector y can be written in the stacked form of the system of m equations model as y Xβ ε , (1) where y is the 1 nm vector consisting of subvectors i y , the 1 n dependent variable vector in equation i, 1, 2, , , i m X is the T nm p diagonal matrix consisting of diagonal submatrices, i X , the i n p matrix of independent variables including the constant unit vector in equation i with rank( i X ) i p , 1, 2, , , i m i p is the number

On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 1 (2016), pp. 481-491 © Research India Publications http://www.ripublication.com

481

On Properties of the difference between two modified

Cp statistics in the nested multivariate linear

regression models

S. Boonyim1, J. Jitthavech2 and V. Lorchirachoonkul3

1-3School of Applied Statistics, National Institute of Development Administration Bangkok 10240, Thailand.

Abstract

The statistic D which is the difference of the modified pC statistics in the full model and in the reduced model is developed for variable selection in the contemporaneous multivariate linear regression nested model. The expectation, the variance and the distribution of the statistic D are derived via three theorems and three lemmas. Keywords: Modified pC statistic, test statistic for variable selection, distribution of the test statistic. Mathematics Subject Classification: 62H10, 62H15

Introduction In the system-of-equations model with contemporaneously but no serially correlated disturbance the dependent variable vector y can be written in the stacked form of the system of m equations model as y Xβ ε , (1) where y is the 1nm vector consisting of subvectors iy , the 1n dependent variable vector in equation i, 1,2, , ,i m X is the Tnm p diagonal matrix consisting of diagonal submatrices, iX , the in p matrix of independent variables including the constant unit vector in equation i with rank( iX ) ip , 1,2, , ,i m ip is the number

Page 2: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

482 S. Boonyim et al

of parameters in equation i , 1

,m

T ii

p p

1 2( )m β β β β is the 1Tp parameter

vector consisting of subvectors iβ , the 1ip parameter vector in equation i , 1,2, , ,i m and n is the number of observations in each equation. The 1nm random

disturbance vector,ε , is assumed to be uncorrelated across observations in the same equation but contemporaneously correlated across observations in different equations and distributed as ( , )nmN 0 Ω where n Ω Σ I and the covariance matrix Σ can be written as

11 12 1

21 22 2

1 2

m

m

m m mm

Σ .

The multivariate regression model is a special case of the system-of-equations model where i X X , ip p , 1,2, ,i m . The total number of parameters in the multivariate regression model is equal to Tp mp . In a multivariate linear regression model, the modified pC statistic suggested by Lorchirachoonkul and Jitthavech (2012) becomes

1 1( ) ( 2 ) ( )y yM E n p tr ε Ω ε Σ Σ , (2)

where ε is an 1nm vector of random errors in the model and distributed as ( , )nmN 0 Ω , 1 1

y y n Ω Σ I , yΣ is an m m covariance matrix of dependent

variables. The estimate of M in (2) in the full model with the total number of parameters fmp can be expressed as

1 1ˆ ˆ( 2 ) ( ),ff f y f f y eMC n p tr e Ω e Σ S (3)

where fe is the 1nm vector consisting of 1n subvector of residuals in equation i ,

, 1,2, ,if i me , in the full model, ˆ

yΣ is the estimate of yΣ and feS is the m m

sample covariance matrix of the residual vectors ife ’s, which is used as the estimate

of Σ provided that the full model is a “good” approximation of the unknown “true” model. The modified pC statistic in (2) is different from the original definition of Mallows pC

statistic, and the extended definition of Mallows pC statistic in a single equation model to a multivariate regression model (Sparks et al., 1983; Fujikoshi and Satoh, 1997; Yanagihara and Satoh, 2010). The unknown error covariance matrix of the true model in the original definition is replaced by the known covariance matrix of dependent variables as shown in (2). The rest of the paper is organized as follows. The hypothesis testing for variable selection under the modified pC statistic in (2) in the contemporaneous multivariate linear regression nested model is introduced in the next

Page 3: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

On Properties of the difference between two modified Cp statistics in the nested 483

section. Section 3 presents the derivation of the statistical properties and the distribution of the difference of the modified pC statistics in the full model and in a reduced model in the multivariate linear regression. The final section is conclusions.

Hypothesis Testing In variable selection in the multivariate linear regression model in (1) under the M criterion in (2), an explanatory variable can be eliminated from the model specification if its discard does not increase the value of the statistic significantly. Otherwise, such explanatory variable is retained in the model. In other words, a variable or a number of variables can be eliminated from the model specification if the null hypothesis is not rejected at a significant level . 0 : r fH M M against :a r fH M M , (4)

where fM and rM are the values of the modified pC statistics of the full model and the reduced model respectively. In order to test the hypothesis (4), it is sufficient to consider whether the difference ,r fD MC MC (5) is significantly different from zero. From (3), it is obvious that the difference D can be expressed as 1 1ˆ ˆ( ) ( ) 2 ( ),

fr f y r f y eD dtr e e Ω e e Σ S (6)

where fe and re are the residual vectors in the full and reduced models respectively, d is the difference in the number of parameters in the full and reduced models,

f rd p p . Properties of the Statistic D In this section, the expectation, the variance and the distribution of the statistic D are investigated. The statistic D in (6) consists of two terms. The first term in the quadratic form can be expressed in terms of a linear combination of the products of two correlated normal vectors as

1

1 1 1 1ˆ ˆ ˆ( ) ( ) ( ) ( ) ,

i i j j

m m m mij ijr f y r f y r f r f y i j

i j i j

e e Ω e e e e e e d d (7)

wherei ii r f d e e and

j jj r f d e e . Similarly, 1ˆ( )fy etr

Σ S which is a part of the

second term of the statistic D in (6) can be written in terms of a linear combination of the product of two correlated normal vectors as (7),

1

1 1 1 1ˆ ˆ ˆ( ) , , 1,2,..., ,

( )f fi j

f fij

m m m mij ijy e y e y

i j i j ftr s i j m

n p

e e

Σ S (8)

Page 4: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

484 S. Boonyim et al

where , 1,2,..., ; 1,2,..., .( )

f fi jfije

fs i m j m

n p

e e

Noted that both terms in the LHSs of

(7) and (8) are in the similar form of weighted sum of the product of twocorrelated vectors which may be represented by a Qa where a is a 1nm vector consisting of m correlated subvectors of size 1n , ,ia with zero mean and variance 2

ia n I ,1,2, ,i m , and Q is an nm nm matrix.

Lemma 1. Let ia and ja be the 1n correlated vectors distributed as 2( , )in a nN 0 I

and 2( , )jn a nN 0 I respectively,

ija is a correlation coefficient between ia and ja , iz

is an 1n vector distributed as 2( , )in a nN 0 I , independent of ia , and ja is defined as

2(1 )j jj ij ij

i i

a aa i a i

a a

a a z . (9)

Then ja and ja are identically distributed as 2( , )jn a nN 0 I .

Proof of Lemma 1. Since ia and iz are independent normal random vectors with the

same zero mean and variance 2ia n I , the expectation of ja is equal to zero,

2( ) (1 )j jij ij

i i

a aj a i a i

a aE E

a a z

0 , and the variance of ja can be expressed as

2 22 2 2 2

2 2( ) (1 )j jij i ij i

i i

a aj a a n a a n

a aVar

a I I

2ja n I .

Therefore, ja is distributed as 2( , ),jn a nN 0 I the identical distribution as .ja

Lemma 2. Let a be a 1nm vector consisting of m correlated subvectors, ia ’s, of

size 1n distributed as 2( , )in a nN 0 I ,

ija be a correlation coefficient between ia and

ja and n Q I where Σ is an m m matrix. Thus,

21,

i

m i ii

i au

a aa Qa (10)

Page 5: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

On Properties of the difference between two modified Cp statistics in the nested 485

where 1 ij i j

mi ij a a a

ju

, and ij is the thij element of the matrix Σ .

Proof of Lemma 2. The quadratic term a Qa can be expressed as

1 1

m mij i j

i j

a Qa a a . (11)

By Lemma 1, the random vector ja in (11) can be replaced by the random vector ja in (9). Then (11) becomes

2

1 1 1 1(1 ) .j j

ij iji i

m m m ma aij a i i ij a i i

i j i ja a

a Qa a a a z

Since iz is independent of ia , we get

21,

i

m i ii

i au

a aa Qa

where1

.ij i j

mi ij a a a

ju

(12)

By Lemma 2, it is obvious that (7) and (8) can be respectively re-written as

11

1 1ˆ( ) ( ) ,

m mr f y r f i i i

i jw

e e Ω e e d d (13)

12 21

ˆ( ) ,i if

fi

m f fy e i

i etr w

e eΣ S (14)

where 11

ˆ ,ij i j

m iji y d d d

jw

(15)

and 21

ˆ f f fij i je e em iji y

j fw

n p

. (16)

Substituting (13) and (14) into (6) yields

1 22 21 12 i i

i fi

m m f fi ii i

i id eD w d w

e ed d . (17)

Theorem 1. Let id and

ife be distributed as 2( , )in d nN 0 I and 2( , )

fin e nN 0 I , 1iw and

2iw be as defined in (15) and (16) respectively. Then the distribution of 1 21 i

m i ii

i dw

d d

can be approximated by 12

1 1 ,fR where 1 and 1f are chosen so that the first two

Page 6: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

486 S. Boonyim et al

cumulants of the distributions are equal. Similarly, the distribution of 2 21

i i

fi

m f fi

i ew

e e can

be approximated by 22

2 2 ,fR under the same criteria, where

2 21 1 1

11

11

2ij

m mi i j d

i j im

ii

w w w

w

,

2

11

12 21 1 1

1

,2

ij

mi

im m

i i j di j i

n wf

w w w

2 22 2 2

12

21

2,

ij

m mi i j ef

i j im

ii

w w w

w

2

21

22 22 2 2

1

,2

ij

mi

im m

i i j efi j i

n wf

w w w

ijd is the correlation coefficient between id and jd ,

and fije is the correlation coefficient between

ife and jfe .

Proof of Theorem 1. Since 2i

i i

d

d d is distributed as chi-squared with n degrees of

freedom, it is obvious that the expectation of 1 21 i

m i ii

i dw

d d can be written as

1 121 1i

m mi ii i

i idE w n w

d d .

The variance of 1 21 i

m i ii

i dw

d d can be written as

21 1 1 12 2 2 21 1 1

2 ( , )i i i j

m m m m j ji i i i i ii i i j

i i i j id d d dVar w w Var w w Cov

d dd d d d d d

2 21 1 1

12 2

ij

m mi i j d

i j in w w w

,

since 2 2 , 1,2 ,i

i i

dVar n i m

d d and 2ijd is the correlation coefficient of 2

i

i i

d

d d and

2j

j j

d

d d(Isserlis, 1981).

Page 7: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

On Properties of the difference between two modified Cp statistics in the nested 487

The distribution of the weighted sum of correlated chi-squared variable 1 21 i

m i ii

i dw

d d can

be approximated by 1

21 1 fR (Brown, 1975; Makambi, 2003 and Hou, 2005) where

the first two cumulants of the two distributions are equal. Thus, equating the first two

cumulants of 1 21 i

m i ii

i dw

d d and 1R yields

1 1 11

,m

ii

f n w

(18)

2 2 21 1 1 1 1

12 2 2

ij

m mi i j d

i j if n w w w

. (19)

Solving (18) and (19) yields

2

11

12 21 1 1

1

,2

ij

mi

im m

i i j di j i

n wf

w w w

(20)

2 21 1 1

11

11

2ij

m mi i j d

i j im

ii

w w w

w

. (21)

Similarly, it can be shown that

2

21

22 22 2 2

1

,2

fij

mi

im m

i i j ei j i

n wf

w w w

(22)

2 22 2 2

12

21

2,

fij

m mi i j e

i j im

ii

w w w

w

(23)

where 2fije is the correlation coefficient of 2

i i

i

f f

ef

e e and 2

j j

j

f f

ef

e e (Isserlis, 1981). □

Lemma 3. 1d W d and 2f fe W e are independent where d and fe are the 1nm vectors

consisting of m subvectors id , distributed as 2( , )in d nN 0 I , and m subvectors ife ,

Page 8: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

488 S. Boonyim et al

distributed as 2( , )fin e nN 0 I respectively, 1,2,...,i m , 1W and 2W are the nm nm

diagonal matrices with the thi diagonal element 12i

i

d

w

and 22

fi

i

e

w

respectively.

Proof of Lemma 3. The residual vectors ife and

ire in the full model and in the

reduced model can be respectively expressed in term of iy (Graybill, 1976), ( )

if n f i e I M y , (24)

( )ir n r i e I M y , (25)

where 1( )f f f f f M X X X X and 1( )r r r r r

M X X X X . From (24) and (25), the

difference vector id can be written as ( )i f r i d M M y . (26)

Consider the term r fM M by partitioning the matrix fX into rX and dX .

1

11

( )( )

( )r r r

r f r r r r r ddd d

X X 0 XM M X X X X X X

X0 X Xr M , (27)

since rX and dX are independent. Similarly, .f r rM M M (28)

Then, by (27), (28) and the idempotent property of fM and rM , the quadratic term

1d W d and 2f fe W e can be expressed in term of y as

11 21

( )( )i

m ii f r f r i

i d

w

d W d y M M M M y

2

121

( )i

i

m i yi f r i

i d

w

y M M y , (29)

22 21

( )( )i

m if f i n f n f i

i f

w

e W e y I M I M y

2

221

( )i

i

m i yi n f i

i f

w

y I M y , (30)

where iy is a standardized form of iy with variance nI . From (29) and (30), it can be said that if ( )i f r i y M M y and ( )i n f i y I M y are independent, then 1d W d and

2f fe W e are independent. Again, by (27) and the idempotent property of fM , it can be concluded that ( )( )f r n f M M I M 0 ,

Page 9: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

On Properties of the difference between two modified Cp statistics in the nested 489

which leads to the conclusion that ( )i f r i y M M y and ( )i n f i y I M y are independent (Graybill, 1976). □ Theorem 2. The expectation and the variance of the statistic D can be expressed as

2 2

1

2 2ˆ ˆ2i i ij i j ij i j

m mii ijD y d ef y d d d ef ef ef

i j if f

d dnn p n p

,

22 2 2 2

1

22 2

1

22

24 ,

fii i f i

i j ij i j f f ij i ji j

m eD d d e

i f

m md d d d d e e ef ef ef

i j i f

dn S S

n p

dn S S S Sn p

where1

ˆ ,i ik k

m ikd y d d

kS

1

ˆ ,j jk k

m jkd y d d

kS

1

ˆi ik k

m ikef y ef ef

kS

and

j jk k

m jkef y ef ef

kS

.

Proof of Theorem 2. By Theorem 1, the expectation of the statistic D in (17) can be expressed as 1 1 2 22D f d f . (31) Substituting 1 1 2, ,f and 2f from (20)-(23) into (31) gives

1 21

2m

D i ii

n w dw

. (32)

Substituting (15) and (16) into (32) gives

1 1 1ˆ ˆ2

f f fij i jij i j

e e em m mij ijD y d d d y

i j j fn d

n p

2 2

1

2 2ˆ ˆ2i i ij i j ij i j

m mii ijy d ef y d d d ef ef ef

i j if f

d dnn p n p

. (33)

By Theorem 1 and Lemma 3, the variance of statistic D can be written as 2 2 2 2

1 1 2 22 8D f d f . (34) Again, substituting 1 1 2, ,f and 2f from (20)-(23) into (34) yields

2 2 2 2 2 2 21 2 1 1 2 2

12 4 2 4

ij fij

m mD i i i j d i j e

i j in w d w w w d w w

. (35)

Substituting (15) and (16) into (35) gives

Page 10: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

490 S. Boonyim et al

22 2 2 2

1

22 2

1

22

24 ,

fii i f i

i j ij i j f f ij i ji j

m eD d d e

i f

m md d d d d e e ef ef ef

i j i f

dn S S

n p

dn S S S Sn p

(36)

where ,idS ,

jdS iefS and jefS are as defined in Theorem 2.

Theorem 3. The distribution of the statistic D is a gamma distribution with shape parameter 2 2

D D and scale parameter 2D D .

Proof of Theorem 3. Since the statistic D in (17) is a linear combination of two independent Chi-squared variables, the distribution of D can be approximated by 2

f where f and are chosen so that the first two cumulants of the two distributions are equal (Brown, 1975; Makambi, 2003 and Hou, 2005). By Theorem 2, it is obvious that

2 22 D Df and 2 (2 )D D . Therefore, it can be concluded that the distribution

of the statistic D is a gamma distribution with shape parameter 2 2D D and scale

parameter 2D D . □

By Theorem 2, the standardized statistic D can be written as DD

D

DT

which, from

the central limit theorem, converges to N(0,1) as n .

Conclusions In this paper, three theorems and three lemmas are proved to provide the statistical properties of the statistic D which can be used in variable selection in the contemporaneous multivariate linear regression nested model. This is another alternative of variable selection by testing the hypothesis (4) instead of direct comparison of the modified pC statistics. Without the hypothesis testing, the backward procedure of variable elimination suggests to retain any variable in the model specification if its elimination results in the modified pC statistic higher than fMC even in the case of slight increase. But by hypothesis testing, a variable can be eliminated from the model specification if its elimination does not cause the rejection of the null hypothesis. It can be concluded that the number of independent variables in the final model by using the hypothesis testing in variable elimination is at most equal to the number of independent variables in the final model by the conventional variable elimination. The proposed concept can be easily extended for variable selection by the forward and stepwise procedures of variable selection.

Page 11: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

On Properties of the difference between two modified Cp statistics in the nested 491

References [1] Brown, M.B., 1975, A method for combining non-independent, one-sided tests

of significance. Biometrics,31, 987-992. [2] Fujikoshi, Y., Satoh, K., 1997, Modified AIC and pC in multivariate Linear

Regression. Biometrika, 84(3), 707-716. [3] Graybill, F.A., Theorem and Application of the Linear model. Duxbury,

Canada, 1976. [4] Hou, C.D., 2005, A simple approximation for the distribution of the weighted

combination of nonindependent or independent probabilities. Statistics and Probability Letters, 73, 179-187.

[5] Isserlis, L., 1981, On a formula for the product-moment coefficient of any order of a normal frequency distribution on in any number of variables. Biometrika, 12, 134-139.

[6] Lorchirachoonkul, V., Jitthavech, J., 2012, A modified pC statistic in a system-

of-equations model. Journal of Statistical Planning and Inference,142, 2386-2394.

[7] Makambi, K.H., 2003, Weighted inverse chi-square method correlated significance tests. Journal of Applied Statistics, 30, 225-234.

[8] Sparks, R.S., Troskie, L., Coutsourides, D., 1983, The multivariate pC .

Communications in Statistics-Theory and Methods, 12(15), 1775-1793. [9] Yanagihara, H., Satoh, K., 2010, An unbiased pC criterion for multivariate

ridge regression. Journal of Multivariate Analysis, 101, 1226-1238.

Page 12: On Properties of the difference between two modified ... · On Properties of the difference between two modified Cp statistics in the nested 485 where 1 ij i j m i ij a a a j u V

492 S. Boonyim et al