Generalized Least Squares F-Test and Relevant ML Estimation in Regression Analysis With Two-Stage Cluster Samples by S. R. Paul Dept. of Mathematics & Statistics WMSR#96-01 May 1996

For regression analysis of data from two-stage cluster sampling, we extend the GLS F-test of Rao, Sutradhar and Yue (1993) to the general situation in which the intracluster correlations and the variances are possibly different. The situations in which (i) the variances are common and the intracluster correlations are possibly different and (ii) the intracluster correlations and the variances are both common are dealt with as special cases. For all the models considered, we derive the required transformed variables with i.i.d. errors and the maximum likelihood estimates of the unknown intracluster correlations. Results of a small-scale simulation study, similar to that of Rao et al. (1993), show that the GLS F-test using maximum likelihood estimates of the intracluster correlations might produce the correct type I error rate irrespective of the amount of collinearity and intracluster correlation.

Keywords: Equal and unequal intracluster correlations; equal and unequal sample sizes; GLS F-test; maximum likelihood estimation; size and power; two-stage sampling.

1 Introduction

The use of the standard F-test in regression analysis of complex survey data leads to an inflated type I error rate (size), because the errors in a regression model appropriate for clustered data are correlated. In the case of two-stage cluster samples, Wu, Holt and Holmes (1988) propose a simple correction to the standard F-test which takes account of possible common or heterogeneous intracluster correlations. However, for a common intraclass correlation $\rho$ they show by simulation that the corrected F-test performs much better than the standard F-test in controlling the size for a scalar hypothesis. It performs almost as well as the iterative generalized least squares (IGLS) F-test for large values of $\rho$, and better than the IGLS F-test for small $\rho$, in controlling size. Rao, Sutradhar and Yue (1993) propose a simple GLS F-test which also takes account of a common intracluster correlation and show, also by simulation, that for both scalar and vector hypotheses the GLS F-test performs as well as the corrected F-test in controlling the size, even for small $\rho$, and that it leads to significant power gains for large values of $\rho$.

S. R. Paul is Professor, Department of Mathematics and Statistics, University of Windsor, Windsor, Ontario, Canada N9B 3P4. The author thanks J. N. K. Rao for supplying details of the simulation study in Rao et al. (1993) and Brajendra C. Sutradhar for several discussions during the early development of this paper.

For the estimation of the common intracluster correlation $\rho$, Wu et al. (1988) use a two-step procedure and Rao et al. (1993) use a method of fitting constants due to Henderson (1953). Rao et al. (1993) comment that the performance of the GLS F-test using the estimate of the common $\rho$ by the two-step procedure of Wu et al. (1988) is similar to that of the GLS F-test using the estimate of $\rho$ by the method of fitting constants due to Henderson (1953). These estimates, however, are method-of-moments type estimates, and they use data under the alternative hypothesis. A disadvantage of the corrected F-test and the GLS F-test using such an estimate of $\rho$ is that they produce an inflated type I error rate with increasing collinearity and large $\rho$ (see Wu et al. 1988). A more appropriate approach would be to estimate $\rho$ using data under the null hypothesis. Such an approach is taken in the well-known score test (Rao 1947) or the $C(\alpha)$ test of Neyman (1959). However, a method-of-moments estimate of $\rho$, such as that obtained by Henderson's procedure, cannot be obtained under all null hypothesis situations. We propose using maximum likelihood estimates.

Note that the GLS F-test proposed by Rao et al. (1993) with common intracluster correlation and variance is based on transformed data, following Fuller and Battese (1973), with i.i.d. errors. In this paper we extend the GLS F-test to the general situation in which the intracluster correlations and the variances are possibly different. The situations in which (i) the variances are common and the intracluster correlations are possibly different and (ii) the intracluster correlations and the variances are both common are dealt with as special cases. For all the models considered, we derive the required transformed variables with i.i.d. errors and the maximum likelihood estimates of the unknown intracluster correlations. The regression model and the associated GLS F-test are given in Section 2. In Section 3 we deal with the model and its variants, the transformations and the maximum likelihood estimation of the intracluster correlation(s). A small-scale simulation study, similar to that of Rao et al. (1993), is given in Section 4 to show the possible advantage of using the maximum likelihood estimates of the intracluster correlations.

2 The Regression Model and The GLS F Test

Following Fuller and Battese (1973), we consider a regression model with nested error structure that allows for intracluster correlations:

$$y = X\beta + \epsilon, \qquad (2.1)$$

where $X$ is an $n \times k$ matrix of regression variables, $\beta$ is a vector of $k$ regression parameters, $E(\epsilon) = 0$ and $E(\epsilon\epsilon') = D$, where $D$ is positive definite. The generalized least-squares estimator of $\beta$ is

$$\tilde\beta = (X'D^{-1}X)^{-1}X'D^{-1}y,$$

which has $\mathrm{cov}(\tilde\beta) = (X'D^{-1}X)^{-1}$. Fuller and Battese (1973) show that if a transformation matrix $T$ can be found such that the transformed errors

$$\epsilon^* = T\epsilon$$

are uncorrelated with constant variances, the generalized least-squares estimator $\tilde\beta$ is given by the ordinary regression of the transformed dependent variable $y^* = Ty$ on the transformed independent variables $X^* = TX$. Thus, for testing the vector hypothesis $H_0: C\beta = b$, where $C$ is a known $q \times k$ matrix of rank $q$ ($< k$) and $b$ is a known $q \times 1$ vector, the standard F-test applied to the transformed data is based on

$$F_{GLS} = \frac{(C\hat\beta - b)'\,(X_C^{*\prime}X_C^{*})^{-1}\,(C\hat\beta - b)/q}{(y^* - X^*\hat\beta)'(y^* - X^*\hat\beta)/(n - k)},$$

which has an exact F distribution with $q$ and $n - k$ degrees of freedom, where $\hat\beta = (X^{*\prime}X^{*})^{-1}X^{*\prime}y^{*}$ is the ordinary least squares estimator of $\beta$ under the transformed model and $X_C^{*} = X^{*}(X^{*\prime}X^{*})^{-1}C'$.
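As an illustration only, the statistic above can be computed from already-transformed data along the following lines. This is a minimal sketch, not the author's code: the function name `gls_f_statistic` and the simulated inputs are hypothetical, and it uses the identity $X_C^{*\prime}X_C^{*} = C(X^{*\prime}X^{*})^{-1}C'$, which follows directly from the definition of $X_C^{*}$.

```python
import numpy as np

def gls_f_statistic(y_star, X_star, C, b):
    """F statistic for H0: C beta = b, computed by OLS on transformed data."""
    n, k = X_star.shape
    q = C.shape[0]
    XtX_inv = np.linalg.inv(X_star.T @ X_star)
    beta_hat = XtX_inv @ X_star.T @ y_star        # OLS on the transformed model
    resid = y_star - X_star @ beta_hat
    diff = C @ beta_hat - b
    middle = C @ XtX_inv @ C.T                    # equals X_C*' X_C*
    numerator = diff @ np.linalg.solve(middle, diff) / q
    denominator = resid @ resid / (n - k)
    return numerator / denominator

# Hypothetical transformed data: intercept plus two covariates, both slopes zero.
rng = np.random.default_rng(0)
n = 40
X_star = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y_star = X_star @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=n)
C = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
b = np.zeros(2)
F = gls_f_statistic(y_star, X_star, C, b)
```

For this particular hypothesis (both slopes zero) the value should agree with the familiar full-versus-reduced residual sum of squares form of the F statistic, which gives a simple cross-check.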

We will see later that the regression model involves the variance components $\sigma_i^2$ or $\sigma^2$ and the intracluster correlations $\rho_i$ or $\rho$. The F distribution of the statistic $F_{GLS}$ holds when the intracluster correlation parameter ($\rho_i$ or $\rho$) and the variance parameter ($\sigma_i^2$ or $\sigma^2$) are known. In practice these parameters are unknown. If the parameters are replaced by consistent estimators, the distribution of $F_{GLS}$ will be approximately correct. For model III, Rao et al. (1993) proposed estimating $\rho$ by a moment-type procedure due to Henderson (1953) and Fuller and Battese (1973), which is consistent. This procedure uses all data under the full model, i.e. under the alternative hypothesis. In this paper we propose to use instead maximum likelihood estimates of the variance and intracluster correlation parameters under the null hypothesis.

3 The Model, Its Variants, the Transformations and Relevant Maximum Likelihood Estimation

3.1 The Model and its Variants

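Consider a two-stage cluster sample of $n$ observations with $c$ clusters at the first stage of sampling and $m_i$ elements drawn from the $i$th sampled cluster at the second stage, $n = \sum_{i=1}^{c} m_i$. The model with the nested error structure is

$$y_{ij} = x_{ij}'\beta + \epsilon_{ij}, \quad j = 1, \ldots, m_i, \; i = 1, \ldots, c, \qquad (3.1)$$

and

$$\epsilon_{ij} = v_i + u_{ij},$$

where $y_{ij}$ is the response of the $j$th element in the $i$th cluster, $x_{ij} = (x_{ij0}, x_{ij1}, \ldots, x_{ij,k-1})'$ with $x_{ij0} = 1$, $\beta = (\beta_0, \beta_1, \ldots, \beta_{k-1})'$ is the vector of regression parameters, $v_i \sim N(0, \sigma_{vi}^2)$ and $u_{ij} \sim N(0, \sigma_{ui}^2)$. Now denote $\sigma_i^2 = \sigma_{vi}^2 + \sigma_{ui}^2$ and $\rho_i = \sigma_{vi}^2/\sigma_i^2$. Clearly $\rho_i$ is the intracluster correlation within the second-stage units of the $i$th cluster. Thus, for two-stage cluster samples, model (3.1) can be written as

Model I: $y = X\beta + \epsilon$, $\epsilon \sim N(0, D)$,

where $y = (y_1', \ldots, y_i', \ldots, y_c')'$, $y_i = (y_{i1}, \ldots, y_{im_i})'$, $X = (X_1', \ldots, X_c')'$ with $X_i$ denoting the $m_i \times k$ matrix with rows $x_{ij}'$, $j = 1, \ldots, m_i$, $\epsilon = (\epsilon_1', \ldots, \epsilon_i', \ldots, \epsilon_c')'$ and $\epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{im_i})'$. $D$ has the block-diagonal form $D = \oplus_{i=1}^{c} \sigma_i^2 V_i$ with $V_i = (1 - \rho_i)I_{m_i} + \rho_i J_{m_i}$, where $I_p$ is the $p \times p$ identity matrix and $J_p$ is the $p \times p$ unit matrix (every element equal to 1).

Different variants of model I are possible and have been dealt with by other authors. When $\sigma_i^2 = \sigma^2$ for all $i$, model I is identical to model (2.2) of Wu, Holt and Holmes (1988), which we write as

Model II: $y = X\beta + \epsilon$, $\epsilon \sim N(0, D)$,

where $D = \sigma^2 \oplus_{i=1}^{c} V_i$ and $y$, $X$, $\beta$, $\epsilon$ and $V_i$ are all the same as those of model I. When $\sigma_i^2 = \sigma^2$ and $\rho_i = \rho$ for all $i$, model I is identical to the model dealt with by Campbell (1977), Scott and Holt (1982) and Rao, Sutradhar and Yue (1993). We write this model as

Model III: $y = X\beta + \epsilon$, $\epsilon \sim N(0, D)$,

where $D = \sigma^2 \oplus_{i=1}^{c} W_i$, $y$, $X$, $\beta$ and $\epsilon$ are the same as those of model I, and $W_i = (1 - \rho)I_{m_i} + \rho J_{m_i}$.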
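To make the three covariance structures concrete, the following sketch builds $D$ for two small clusters. It is only an illustration under assumed values: the helper names `cluster_corr` and `block_diag_cov`, the cluster sizes and the parameter values are all hypothetical, not taken from the paper.

```python
import numpy as np

def cluster_corr(m, rho):
    """V_i = (1 - rho) I_m + rho J_m, with J_m the m x m matrix of ones."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def block_diag_cov(sizes, sigma2, rho):
    """D = direct sum over clusters of sigma2[i] * V_i."""
    n = sum(sizes)
    D = np.zeros((n, n))
    start = 0
    for m, s2, r in zip(sizes, sigma2, rho):
        D[start:start + m, start:start + m] = s2 * cluster_corr(m, r)
        start += m
    return D

sizes = [2, 3]  # hypothetical cluster sizes m_1, m_2
# Model I: variances and correlations both differ across clusters.
D_model1 = block_diag_cov(sizes, sigma2=[1.0, 4.0], rho=[0.2, 0.5])
# Model II: common variance, possibly different correlations.
D_model2 = block_diag_cov(sizes, sigma2=[2.0, 2.0], rho=[0.2, 0.5])
# Model III: common variance and common correlation.
D_model3 = block_diag_cov(sizes, sigma2=[2.0, 2.0], rho=[0.3, 0.3])
```

Elements in different clusters have zero covariance in all three cases; only the within-cluster blocks differ between the models.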

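3.2 Transformations

The covariance matrix for the error vector $\epsilon$ in model I is the block-diagonal matrix $D = \oplus_{i=1}^{c} D_i$, where $D_i = \sigma_i^2 V_i$. Then $D^{-1/2} = \oplus_{i=1}^{c} D_i^{-1/2}$, where $D_i^{-1/2} = \sigma_i^{-1}V_i^{-1/2}$. Now $V_i = (1 - \rho_i)I_{m_i} + \rho_i J_{m_i}$, which can be written as

$$V_i = (1 - \rho_i)\left(I_{m_i} - \frac{J_{m_i}}{m_i}\right) + t_i\,\frac{J_{m_i}}{m_i},$$

where $t_i = 1 + (m_i - 1)\rho_i$. Then it is easily verified that (see Fuller and Battese 1973)

$$V_i^{-1/2} = \frac{1}{(1 - \rho_i)^{1/2}}\left\{I_{m_i} - \left[1 - \left(\frac{1 - \rho_i}{t_i}\right)^{1/2}\right]\frac{J_{m_i}}{m_i}\right\}$$

and $\epsilon_i^* = (1/\sigma_i)V_i^{-1/2}\epsilon_i$ are uncorrelated with constant variances 1. Thus the matrix

$$T_i = (1/\sigma_i)V_i^{-1/2} = \frac{1}{\sigma_i(1 - \rho_i)^{1/2}}\left\{I_{m_i} - \left[1 - \left(\frac{1 - \rho_i}{t_i}\right)^{1/2}\right]\frac{J_{m_i}}{m_i}\right\}$$

transforms the error vector $\epsilon_i$ to a vector of uncorrelated random variables with constant variances 1. Thus the transformations for $y$ and $X$ are

$$y_{ij}^* = \frac{y_{ij} - \alpha_i\bar y_{i\cdot}}{\sigma_i(1 - \rho_i)^{1/2}} \quad \text{and} \quad x_{ij}^* = \frac{x_{ij} - \alpha_i\bar x_{i\cdot}}{\sigma_i(1 - \rho_i)^{1/2}},$$

where $\alpha_i = 1 - [(1 - \rho_i)/\{1 + (m_i - 1)\rho_i\}]^{1/2}$, $\bar y_{i\cdot} = \sum_{j=1}^{m_i} y_{ij}/m_i$ and $\bar x_{i\cdot} = \sum_{j=1}^{m_i} x_{ij}/m_i$. The transformed model can be written as $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \ldots, m_i$, $i = 1, \ldots, c$, or in matrix notation as $y^* = X^*\beta + u^*$, where $u_{ij}^* \sim N(0, 1)$, or $u^* \sim N(0, \oplus_{i=1}^{c} I_{m_i})$.

Similarly, it can be shown that for model II the matrix

$$T_i = \frac{1}{\sigma(1 - \rho_i)^{1/2}}\left\{I_{m_i} - \left[1 - \left(\frac{1 - \rho_i}{t_i}\right)^{1/2}\right]\frac{J_{m_i}}{m_i}\right\}$$

transforms the error vector $\epsilon_i$ to a vector of uncorrelated random variables with constant variances 1. The transformations for $y$ and $X$ are

$$y_{ij}^* = \frac{y_{ij} - \alpha_i\bar y_{i\cdot}}{\sigma(1 - \rho_i)^{1/2}} \quad \text{and} \quad x_{ij}^* = \frac{x_{ij} - \alpha_i\bar x_{i\cdot}}{\sigma(1 - \rho_i)^{1/2}},$$

where $\alpha_i = 1 - [(1 - \rho_i)/\{1 + (m_i - 1)\rho_i\}]^{1/2}$. The transformed model then is $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \ldots, m_i$, $i = 1, \ldots, c$, where $u_{ij}^* \sim N(0, 1)$, or in matrix notation $y^* = X^*\beta + u^*$, where $u^* \sim N(0, \oplus_{i=1}^{c} I_{m_i})$.

Finally, for model III the transformation

$$T_i = I_{m_i} - \left\{1 - \left[\frac{1 - \rho}{1 + (m_i - 1)\rho}\right]^{1/2}\right\}\frac{J_{m_i}}{m_i}$$

transforms the error vector $\epsilon_i$ to a vector of uncorrelated random variables with constant variance $\sigma_u^2 = \sigma^2(1 - \rho)$. Clearly, for model III the transformations for $y$ and $X$ are $y_{ij}^* = y_{ij} - \alpha_i\bar y_{i\cdot}$ and $x_{ij}^* = x_{ij} - \alpha_i\bar x_{i\cdot}$, where $\alpha_i = 1 - [(1 - \rho)/\{1 + (m_i - 1)\rho\}]^{1/2}$. The transformed model is $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \ldots, m_i$, $i = 1, \ldots, c$, or in matrix notation $y^* = X^*\beta + u^*$, where $u_{ij}^* \sim N(0, \sigma_u^2)$, or $u^* \sim N(0, \sigma_u^2 I_n)$. (See also Rao et al. (1993).)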
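The claim that the transformation removes the intracluster correlation can be checked numerically. The sketch below assumes the closed form $T_i = \{I_{m_i} - \alpha_i J_{m_i}/m_i\}/\{\sigma_i(1-\rho_i)^{1/2}\}$ with $\alpha_i = 1 - [(1-\rho_i)/\{1+(m_i-1)\rho_i\}]^{1/2}$; the function names and the values of $m$, $\rho$ and $\sigma$ are hypothetical choices for illustration.

```python
import numpy as np

def alpha(m, rho):
    """alpha_i = 1 - sqrt((1 - rho) / (1 + (m - 1) rho))."""
    return 1.0 - np.sqrt((1.0 - rho) / (1.0 + (m - 1) * rho))

def transform_matrix(m, rho, sigma=1.0):
    """T_i = (1/sigma) V_i^{-1/2}, written in closed form."""
    J_bar = np.ones((m, m)) / m
    return (np.eye(m) - alpha(m, rho) * J_bar) / (sigma * np.sqrt(1.0 - rho))

def transform_cluster(values, rho, sigma=1.0):
    """Element-wise form: (value - alpha * cluster mean) / (sigma * sqrt(1 - rho))."""
    values = np.asarray(values, dtype=float)
    m = values.shape[0]
    centred = values - alpha(m, rho) * values.mean(axis=0)
    return centred / (sigma * np.sqrt(1.0 - rho))

m, rho, sigma = 4, 0.3, 2.0                      # hypothetical cluster settings
V = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))
T = transform_matrix(m, rho, sigma)
cov_transformed = T @ (sigma ** 2 * V) @ T.T     # should be the identity matrix

y_cluster = np.array([3.0, 5.0, 4.0, 8.0])       # made-up responses for one cluster
y_star = transform_cluster(y_cluster, rho, sigma)
```

Setting `sigma=1.0` and dropping the factor $(1-\rho)^{-1/2}$ gives the model III version, whose transformed errors have variance $\sigma^2(1-\rho)$ rather than 1.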

3.3 Maximum Likelihood Estimates of Intraclass Correlation

We consider estimating the intracluster correlation parameters in the presence of the regression parameters under the null hypothesis $H_0: C\beta = b$. We assume that $C\beta = b$ is a consistent set of equations. Then under the null hypothesis we will have $k - q$ regression parameters. Let $\delta = (\delta_0, \delta_1, \ldots, \delta_{k-1-q})'$ be the vector of regression parameters to be estimated under the null hypothesis and $z_{ij} = (z_{ij0}, z_{ij1}, \ldots, z_{ij,k-1-q})'$ be the vector of covariates corresponding to the parameters $\delta$. Note that the regression variables as well as the dependent variable will be defined depending on the composition of $C$ and $b$. We give a few examples from Graybill (1976, p. 184). First we define the dependent variable as $s$. Suppose under the alternative hypothesis we have 4 regression variables $x_1$, $x_2$, $x_3$ and $x_4$.

Example 1. If $C = (0, 1, -1, 0, 0)$, $b = 0$, that is $H_0: \beta_1 = \beta_2$, then $s = y$, $z_0 = x_0 = 1$, $z_1 = x_1 + x_2$, $z_2 = x_3$, $z_3 = x_4$ and $\delta = (\delta_0, \delta_1, \delta_2, \delta_3)'$.

Example 2. If

$$C = \begin{pmatrix} 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix}, \quad b = (0, 0)',$$

that is $H_0: \beta_1 = \beta_2$ and $\beta_3 = \beta_4$, then $s = y$, $z_0 = x_0 = 1$, $z_1 = x_1 + x_2$, $z_2 = x_3 + x_4$ and $\delta = (\delta_0, \delta_1, \delta_2)'$.

Example 3. If $C$ and $b$ are such that $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4$, then $s = y$, $z_0 = x_0 = 1$, $z_1 = x_1 + x_2 + x_3 + x_4$ and $\delta = (\delta_0, \delta_1)'$.

Example 4. If

$$C = \begin{pmatrix} 0 & 1 & -2 & -4 & 0 \\ 0 & 1 & 2 & 0 & 0 \end{pmatrix}, \quad b = (0, 6)',$$

that is $H_0: \beta_1 - 2\beta_2 = 4\beta_3$ and $\beta_1 + 2\beta_2 = 6$, then $s = y - 6x_1 - \tfrac{3}{2}x_3$, $z_1 = x_2 - 2x_1 - x_3$, $z_2 = x_4$ and $\delta = (\delta_0, \delta_1, \delta_2)'$. Note that in this example this is not the only way we can define $\delta$.
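The reparametrization in Example 4 can be verified numerically. The sketch below is an illustration with made-up covariate values and made-up free parameters: it solves the two constraints for $\beta_1 = 6 - 2\beta_2$ and $\beta_3 = \tfrac{3}{2} - \beta_2$, then checks that the mean of $s = y - 6x_1 - \tfrac{3}{2}x_3$ is linear in $z_1 = x_2 - 2x_1 - x_3$ and $z_2 = x_4$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))            # made-up values for x1, ..., x4
x1, x2, x3, x4 = X.T

C = np.array([[0.0, 1.0, -2.0, -4.0, 0.0],
              [0.0, 1.0, 2.0, 0.0, 0.0]])
b = np.array([0.0, 6.0])

# Free parameters under H0 (arbitrary illustrative values).
beta0, beta2, beta4 = 1.5, 0.7, -2.0
beta1 = 6.0 - 2.0 * beta2              # from beta1 + 2 beta2 = 6
beta3 = 1.5 - beta2                    # from beta1 - 2 beta2 = 4 beta3
beta = np.array([beta0, beta1, beta2, beta3, beta4])

mean_y = beta0 + X @ beta[1:]          # E(y) under the full model
s_mean = mean_y - 6.0 * x1 - 1.5 * x3  # E(s) with s = y - 6 x1 - (3/2) x3
z1 = x2 - 2.0 * x1 - x3
z2 = x4
reduced_mean = beta0 + beta2 * z1 + beta4 * z2   # delta = (beta0, beta2, beta4)
```

Here $\delta$ coincides with $(\beta_0, \beta_2, \beta_4)'$; eliminating a different pair of coefficients would give another, equally valid, definition of $s$, $z$ and $\delta$.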

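Now, under the null hypothesis, model (3.1) reduces to $s_{ij} = z_{ij}'\delta + v_i + u_{ij}$, $j = 1, \ldots, m_i$, $i = 1, \ldots, c$. Further, let $s_i = (s_{i1}, \ldots, s_{im_i})'$, $Z_i = (z_{i1}, \ldots, z_{im_i})'$, $t_i = 1 + (m_i - 1)\rho_i$ and $d_i = 1 + (m_i - 1)\rho$. Then $\Omega_i$ is the inverse of $V_i$, with $(t_i - \rho_i)/\{(1 - \rho_i)t_i\}$ in the diagonal and $-\rho_i/\{(1 - \rho_i)t_i\}$ everywhere else, and $\Delta_i$ is the inverse of $W_i$, with $(d_i - \rho)/\{(1 - \rho)d_i\}$ in the diagonal and $-\rho/\{(1 - \rho)d_i\}$ everywhere else.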
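The closed-form inverse of an equicorrelation matrix quoted above is easy to confirm numerically. In this small sketch the function name `corr_inverse` and the values of $m$ and $\rho$ are hypothetical; the same function covers both $V_i^{-1}$ (with $\rho_i$) and $W_i^{-1}$ (with the common $\rho$).

```python
import numpy as np

def corr_inverse(m, rho):
    """Inverse of (1 - rho) I_m + rho J_m from the closed-form entries."""
    t = 1.0 + (m - 1) * rho
    off_diag = -rho / ((1.0 - rho) * t)
    diag = (t - rho) / ((1.0 - rho) * t)
    return off_diag * np.ones((m, m)) + (diag - off_diag) * np.eye(m)

m, rho = 5, 0.4                                   # illustrative values
V = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))
Omega = corr_inverse(m, rho)

# A negative correlation inside the admissible range (-1/(m-1), 1) also works.
rho_neg = -0.1
V_neg = (1.0 - rho_neg) * np.eye(m) + rho_neg * np.ones((m, m))
Omega_neg = corr_inverse(m, rho_neg)
```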

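Now, following Paul (1990), the estimating equation for $\rho_i$ of model I is

$$\frac{m_i\left[SS_i - SST_i\{1 + (m_i - 1)\rho_i^2\}/t_i^2\right]}{SS_i - \rho_i SST_i/t_i} - \frac{\rho_i m_i(m_i - 1)}{t_i} = 0, \qquad (3.2)$$

where $\rho_i \in \left(-\frac{1}{m_i - 1}, 1\right)$, $SS_i = \sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\delta)^2$, $SST_i = \left\{\sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\delta)\right\}^2$ and $\delta = \left(\sum_i Z_i'D_i^{-1}Z_i\right)^{-1}\left(\sum_i Z_i'D_i^{-1}s_i\right)$ with $D_i^{-1} = \sigma_i^{-2}\Omega_i$. Once the estimate $\hat\rho_i$ of $\rho_i$ is obtained, the estimate of $\sigma_i^2$ is

$$\hat\sigma_i^2 = \frac{1}{m_i}(1 - \hat\rho_i)^{-1}\left(SS_i - \hat\rho_i SST_i/\hat t_i\right).$$

Again, the estimating equation for $\rho_i$ of model II is

$$\frac{n(1 - \rho_i)^{-1}\left[SS_i - SST_i\{1 + (m_i - 1)\rho_i^2\}/t_i^2\right]}{\sum_{l=1}^{c}(1 - \rho_l)^{-1}\left(SS_l - \rho_l SST_l/t_l\right)} - \frac{\rho_i m_i(m_i - 1)}{t_i} = 0, \qquad (3.3)$$

where $\rho_i \in \left(-\frac{1}{m_i - 1}, 1\right)$ and $SS_i$ and $SST_i$ are the same as in equation (3.2). Once the estimate $\hat\rho_i$ of $\rho_i$ is obtained, the estimate of $\sigma^2$ is $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{c}(1 - \hat\rho_i)^{-1}\left(SS_i - \hat\rho_i SST_i/\hat t_i\right)$.

Finally, the estimating equation for $\rho$ of model III is

$$\frac{n\sum_{i=1}^{c}\left[SS_i - SST_i\{1 + (m_i - 1)\rho^2\}/d_i^2\right]}{\sum_{i=1}^{c}\left(SS_i - \rho SST_i/d_i\right)} - \sum_{i=1}^{c}\frac{\rho m_i(m_i - 1)}{d_i} = 0,$$

where $\rho \in \left(-\frac{1}{m_i - 1}, 1\right)$ for every $i$, $SS_i = \sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\delta)^2$, $SST_i = \left\{\sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\delta)\right\}^2$ and $\delta = \left(\sum_i Z_i'\Delta_i Z_i\right)^{-1}\left(\sum_i Z_i'\Delta_i s_i\right)$. The estimate of $\sigma^2$ is not required here.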
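As a rough numerical illustration for model III, the sketch below maximizes the normal profile log-likelihood of $\rho$ over a grid instead of solving the estimating equation directly. This is a swapped-in technique chosen for simplicity, not the procedure of Paul (1990). All function names, the simulated data and the grid are assumptions; the estimating function is coded from the score of the normal likelihood under the definitions of $SS_i$, $SST_i$ and $d_i$ above.

```python
import numpy as np

def gls_delta(s_list, Z_list, rho):
    """GLS estimate of delta for a common rho (sigma^2 cancels out)."""
    k = Z_list[0].shape[1]
    A, c_vec = np.zeros((k, k)), np.zeros(k)
    for s, Z in zip(s_list, Z_list):
        m = len(s)
        d = 1.0 + (m - 1) * rho
        # Inverse of W_i: [I - (rho / d) J] / (1 - rho)
        W_inv = (np.eye(m) - (rho / d) * np.ones((m, m))) / (1.0 - rho)
        A += Z.T @ W_inv @ Z
        c_vec += Z.T @ W_inv @ s
    return np.linalg.solve(A, c_vec)

def ss_sst(s_list, Z_list, delta):
    """Per-cluster SS_i (sum of squared residuals) and SST_i (squared residual sum)."""
    res = [s - Z @ delta for s, Z in zip(s_list, Z_list)]
    return np.array([r @ r for r in res]), np.array([r.sum() ** 2 for r in res])

def profile_loglik(s_list, Z_list, rho):
    """Log-likelihood with delta and sigma^2 profiled out (constants dropped)."""
    m = np.array([len(s) for s in s_list])
    n, d = m.sum(), 1.0 + (m - 1) * rho
    ss, sst = ss_sst(s_list, Z_list, gls_delta(s_list, Z_list, rho))
    sigma2 = np.sum(ss - rho * sst / d) / (n * (1.0 - rho))
    log_det = np.sum((m - 1) * np.log(1.0 - rho) + np.log(d))
    return -0.5 * n * np.log(sigma2) - 0.5 * log_det

def estimating_function(s_list, Z_list, rho):
    """Left-hand side of the model III estimating equation."""
    m = np.array([len(s) for s in s_list])
    n, d = m.sum(), 1.0 + (m - 1) * rho
    ss, sst = ss_sst(s_list, Z_list, gls_delta(s_list, Z_list, rho))
    num = n * np.sum(ss - sst * (1.0 + (m - 1) * rho ** 2) / d ** 2)
    return num / np.sum(ss - rho * sst / d) - np.sum(rho * m * (m - 1) / d)

# Simulated null-model data (hypothetical design: 40 clusters of size 5).
rng = np.random.default_rng(2)
c, m_size, rho_true, sigma2_true = 40, 5, 0.6, 10.0
s_list, Z_list = [], []
for _ in range(c):
    Z = np.column_stack([np.ones(m_size), rng.normal(size=m_size)])
    v = rng.normal(scale=np.sqrt(rho_true * sigma2_true))
    u = rng.normal(scale=np.sqrt((1.0 - rho_true) * sigma2_true), size=m_size)
    s_list.append(Z @ np.array([10.0, 1.0]) + v + u)
    Z_list.append(Z)

grid = np.linspace(-0.2, 0.95, 116)   # step 0.01, inside (-1/(m-1), 1)
loglik = [profile_loglik(s_list, Z_list, r) for r in grid]
rho_hat = grid[int(np.argmax(loglik))]
```

The estimating function should change sign close to the grid maximizer, so a root finder started from `rho_hat` could be used to refine the estimate if more precision is wanted.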

4 Simulations

For ease of comparison, the simulation design, the regression model and the parameter values considered here are the same as those considered by Rao et al. (1993) and Wu et al. (1988). However, for completeness, we describe these in what follows.

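We consider the nested error regression model with two covariates $x_1 = x$ and $x_2 = z$ (and equal $m_i = m$):

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_{ij} + v_i + u_{ij}, \quad j = 1, \ldots, m, \; i = 1, \ldots, c. \qquad (4.1)$$

Values of $(x_{ij}, z_{ij})$ were generated from the bivariate normal distribution with additional random effects components to allow for intracluster correlations $\rho_x$ and $\rho_z$ on both $x$ and $z$:

$$x_{ij} = \mu_x + v_{xi} + u_{xij}, \qquad z_{ij} = \mu_z + v_{zi} + u_{zij}, \qquad (4.2)$$

where $v_{xi} \overset{iid}{\sim} N(0, \sigma_{vx}^2)$, $v_{zi} \overset{iid}{\sim} N(0, \sigma_{vz}^2)$, $u_{xij} \overset{iid}{\sim} N(0, \sigma_{ux}^2)$, $u_{zij} \overset{iid}{\sim} N(0, \sigma_{uz}^2)$, $\rho_x = \sigma_{vx}^2/\sigma_x^2$ and $\rho_z = \sigma_{vz}^2/\sigma_z^2$, where $\sigma_x^2 = \sigma_{vx}^2 + \sigma_{ux}^2$ and $\sigma_z^2 = \sigma_{vz}^2 + \sigma_{uz}^2$; $v_{xi}$ and $v_{zi}$ are correlated with covariance $\sigma_{vxz}$, and $u_{xij}$ and $u_{zij}$ are correlated with covariance $\sigma_{uxz}$. Also let $\rho_{xz} = \sigma_{vxz}/(\sigma_x\sigma_z)$ and corr$(x, z) = \sigma_{xz}/(\sigma_x\sigma_z)$, where $\sigma_{xz} = \sigma_{vxz} + \sigma_{uxz}$ and corr$(x, z)$ denotes the correlation between $x_{ij}$ and $z_{ij}$. The parameters $\sigma_{vx}^2$, $\sigma_{vxz}$, $\sigma_{ux}^2$, $\sigma_{uxz}$, etc. were chosen to satisfy a variance of 20, $\rho_x = 0.1$, $\rho_z = 0.5$, $\rho_{xz} = 0$ and corr$(x, z) = -0.33, 0, 0.33, 0.66, 0.88$.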

We first generated $(v_{xi}, v_{zi})$ from the bivariate normal distribution with mean vector 0, variances $\sigma_{vx}^2$, $\sigma_{vz}^2$ and covariance $\sigma_{vxz}$. Next we generated $m = 10$ independent pairs $(u_{xij}, u_{zij})$, $j = 1, \ldots, m$, from the bivariate normal distribution with mean vector $(0, 0)'$, variances $\sigma_{ux}^2$, $\sigma_{uz}^2$ and covariance $\sigma_{uxz}$. The pairs $(x_{ij}, z_{ij})$, $j = 1, \ldots, m$, were then obtained from (4.2) using $\mu_x = 100$ and $\mu_z = 200$. This three-step procedure was repeated 10 times to generate 10 pairs $(x, z)$ from each of $c = 10$ clusters.

We next turn to the generation of $y_{ij}$ for given $(x_{ij}, z_{ij})$, $\beta_0 = 10$ and the $(\beta_1, \beta_2)$ combinations given in Tables 1-2. For $\sigma^2 = 10$ and selected $\rho$ given in Tables 1-2 (or equivalently $\sigma_v^2$ and $\sigma_u^2$), we generated $v_i \overset{iid}{\sim} N(0, \sigma_v^2)$ and $u_{ij} \overset{iid}{\sim} N(0, \sigma_u^2)$ independently and then obtained $y_{ij}$ from (4.1). The simulated data $(y_{ij}, x_{ij}, z_{ij})$, $j = 1, \ldots, m$, $i = 1, \ldots, c$, were used to compute the test statistics. The simulations of the $y_{ij}$'s were repeated 10,000 times for each set of $(x, z)$ values in order to obtain estimates of the actual type I error rate (size) and power of each test statistic.
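The response-generation step can be sketched as follows. The split $\sigma_v^2 = \rho\sigma^2$, $\sigma_u^2 = (1-\rho)\sigma^2$ follows from $\rho = \sigma_v^2/\sigma^2$ and $\sigma^2 = \sigma_v^2 + \sigma_u^2$. The function names are hypothetical, and the covariates used here are simple placeholders that ignore the intracluster structure of $x$ and $z$ described above.

```python
import numpy as np

def variance_components(sigma2, rho):
    """Return (sigma_v^2, sigma_u^2) for total variance sigma2 and correlation rho."""
    return rho * sigma2, (1.0 - rho) * sigma2

def simulate_y(x, z, beta, sigma2, rho, rng):
    """Generate y (shape c x m) from the nested error model with covariates x, z."""
    c, m = x.shape
    s2_v, s2_u = variance_components(sigma2, rho)
    v = rng.normal(scale=np.sqrt(s2_v), size=(c, 1))   # one effect per cluster
    u = rng.normal(scale=np.sqrt(s2_u), size=(c, m))   # one error per element
    return beta[0] + beta[1] * x + beta[2] * z + v + u

rng = np.random.default_rng(3)
c, m = 10, 10
# Placeholder covariates (illustrative only, not the design of the study).
x = 100.0 + rng.normal(size=(c, m))
z = 200.0 + rng.normal(size=(c, m))
y = simulate_y(x, z, beta=(10.0, 0.0, 0.0), sigma2=10.0, rho=0.3, rng=rng)
```

Repeating the call to `simulate_y` with the same `x` and `z` gives the replicated responses from which rejection rates can be tallied.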

We considered the hypothesis $\beta_1 = \beta_2 = 0$, as reported by Rao et al. (1993). Table 1 gives size estimates of the statistics $F_{GLS}(\tilde\rho)$ and $F_{GLS}(\hat\rho)$, using Henderson's estimate $\tilde\rho$ of $\rho$ and the maximum likelihood estimate $\hat\rho$ of $\rho$ respectively. There is evidence from the simulation that $F_{GLS}(\tilde\rho)$ gives an inflated type I error rate as corr$(x, z)$ and $\rho$ increase. The statistic $F_{GLS}(\hat\rho)$ seems to control the type I error rate adequately. Table 2 gives power estimates of the two statistics. Power estimates of $F_{GLS}(\tilde\rho)$ are in general larger than those of $F_{GLS}(\hat\rho)$. This is because the corresponding sizes are larger.

Thus the statistic $F_{GLS}$ with maximum likelihood estimates of the unknown intracluster correlations might produce the correct type I error rate. However, we do not claim any power advantage of $F_{GLS}(\hat\rho)$.
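When reading the size estimates, it helps to know how much of the difference from the nominal level could be simulation noise. The sketch below assumes the 10,000 replications are independent, so that a rejection rate behaves like a binomial proportion; the function name is hypothetical.

```python
import math

def mc_standard_error(p, replications):
    """Binomial standard error of an estimated rejection probability."""
    return math.sqrt(p * (1.0 - p) / replications)

se_05 = mc_standard_error(0.05, 10_000)   # nominal 5% level
se_10 = mc_standard_error(0.10, 10_000)   # nominal 10% level
```

Under this assumption the standard errors are roughly 0.22 and 0.30 percentage points, so a size estimate of 7.5% at the nominal 5% level lies more than ten standard errors above nominal, whereas estimates within about half a percentage point of nominal are compatible with simulation error.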

References

Campbell, C. (1977). Properties of Ordinary and Weighted Least Squares Estimators for Two-Stage Samples. In Proceedings of the Social Statistics Section, American Statistical Association, 800-805.

Fuller, W. A. and Battese, G. E. (1973). Transformations for Estimation of Linear Models with Nested Error Structures. Journal of the American Statistical Association 68, 626-632.

Graybill, F. A. (1983). Theory and Application of the Linear Model. Massachusetts: Wadsworth.

Henderson, C. R. (1953). Estimation of Variance and Covariance Components. Biometrics 9, 226-252.

Neyman, J. (1959). Optimal Asymptotic Tests of Composite Hypothesis. In Probability and Statistics: The Harald Cramér Volume, U. Grenander (ed.). New York: John Wiley.

Paul, S. R. (1990). Maximum Likelihood Estimation of Intraclass Correlation in the Analysis of Familial Data: Estimating Equation Approach. Biometrika 77, 549-555.

Rao, C. R. (1947). Large Sample Tests of Statistical Hypothesis Concerning Several Parameters with Applications to Problems of Estimation. Proceedings of the Cambridge Philosophical Society 44, 50-57.

Rao, J. N. K., Sutradhar, B. C. and Yue, K. (1993). Generalized Least Squares F Test in Regression Analysis with Two-Stage Cluster Samples. Journal of the American Statistical Association 88, 1388-1391.

Scott, A. J. and Holt, D. (1982). The Effect of Two-Stage Sampling on Ordinary Least Squares Methods. Journal of the American Statistical Association 77, 848-854.

Wu, C. F. J., Holt, D. and Holmes, D. J. (1988). The Effect of Two-Stage Sampling on the F Statistics. Journal of the American Statistical Association 83, 150-159.

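Table 1. Size Estimates (%) of the $F_{GLS}(\tilde\rho)$ and $F_{GLS}(\hat\rho)$ Tests of $H_0: \beta_1 = 0, \beta_2 = 0$; $\alpha = .05$ and $.10$ ($\tilde\rho$: Henderson's estimate; $\hat\rho$: maximum likelihood estimate)

                        alpha = .05                  alpha = .10
Corr(x,z)   rho    F_GLS(~rho)  F_GLS(^rho)    F_GLS(~rho)  F_GLS(^rho)

  -.33      0          5.1          4.3            9.6          8.9
            .05        5.8          4.7           11.1          9.6
            .1         6.2          4.9           11.7          9.8
            .3         6.1          5.0           11.7         10.3
            .5         5.8          5.1           11.1         10.1

   0        0          5.1          4.4            9.7          9.0
            .05        5.8          4.9           10.9          9.6
            .1         6.1          5.0           11.5          9.9
            .3         5.9          5.1           11.2         10.0
            .5         5.7          5.1           10.9         10.1

  .33       0          4.9          4.2            9.5          8.8
            .05        5.9          4.9           11.3          9.7
            .1         6.3          5.1           11.7          9.9
            .3         6.1          5.1           11.8         10.2
            .5         5.8          5.1           11.1         10.2

  .66       0          4.3          3.5            8.9          8.1
            .05        5.9          4.3           11.5          9.7
            .1         6.8          4.8           12.5         10.3
            .3         7.6          5.1           13.3         10.8
            .5         7.3          5.3           13.1         10.6

  .88       0          4.3          3.5            8.8          8.1
            .05        6.0          4.3           11.4          9.5
            .10        6.9          4.8           12.3         10.1
            .30        7.5          4.9           13.2         10.7
            .50        7.5          5.0           12.9         10.5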

Table 2. Power Estimates (%) of the $F_{GLS}(\tilde\rho)$ and $F_{GLS}(\hat\rho)$ Tests of $H_0: \beta_1 = 0$ and $\beta_2 = 0$; $\alpha = .05, .1$ vs. Specified Alternatives With $c = 10$, $m = 10$ and corr$(x, z) = 0, .33, .66$ ($\tilde\rho$: Henderson's estimate; $\hat\rho$: maximum likelihood estimate)

                              alpha = .05                  alpha = .1
beta1  beta2    rho     F_GLS(~rho)  F_GLS(^rho)    F_GLS(~rho)  F_GLS(^rho)

corr(x,z) = 0

 .1     .1      0          38.2         35.9           51.7         49.8
                .05        38.2         35.4           50.9         48.3
                .10        38.0         35.3           50.6         48.0
                .30        41.7         39.6           54.5         52.6
                .50        52.1         50.5           64.6         63.1

 .2     .2      0          92.6         91.7           96.1         95.8
                .05        91.9         90.6           95.4         94.8
                .10        91.7         90.5           95.5         94.7
                .30        94.7         93.9           97.1         96.6
                .50        98.3         98.0           99.3         99.2

corr(x,z) = .33

 .1     .1      0          50.2         48.1           62.1         60.9
                .05        49.9         47.2           62.1         60.1
                .10        50.2         47.8           62.4         60.1
                .30        56.1         53.9           67.5         65.9
                .50        68.5         67.1           79.4         78.3

 .2     .2      0          97.8         97.5           99.0         98.9
                .05        97.5         97.1           98.8         98.7
                .10        97.5         97.0           98.9         98.6
                .30        99.0         98.8           99.5         99.5
                .50        99.8         99.8           99.9         99.9

corr(x,z) = .66

 .1     .1      0          59.0         57.6           70.7         70.0
                .05        60.1         56.3           71.1         69.9
                .10        60.8         59.0           72.2         70.5
                .30        68.3         66.5           78.6         77.6
                .50        81.3         80.1           88.6         87.9

 .2     .2      0          99.3         99.3           99.8         99.8
                .05        99.3         99.3           99.8         99.8
                .10        99.4         99.3           99.8         99.7
                .30        99.8         99.7           99.9         99.9
                .50       100          100            100          100

Generalized Least Squares F-Test and Relevant

ML Estimation in Regression Analysis

With Two-Stage Cluster Samples

S R Paul

For regression analysis of data from two-stage duster sampling we extend the GLS

F-test of Rao Sutradhar and Vue (1993) to the general situation in which the intracluster

correlations and the variances are possibly different The situations in which (i) the

variances are common and the intracluster correlations are possibly different and (ii) the

intracluster correlations and the variances are both common are dealt with as special

cases For all the models considered we derive the required transformed variables with

iid errors and the maximum likelihood estimates of the unknown intracluster correlations

Results of a small scale simulation study similar to that of Rao et al (1993) show that

the GLS F-test using maximum likelihood estimates of the intracluster correlations might

produce correct type I error rate irrespective of the amount of collinearity and intracluster

correlation

Keywords Equal and unequal intracluster correlations equal and unequal sample sizes

GLS F test maximum likelihood estimation size and power two-stage sampling

1 Introduction

The use of the standard F test in regression analysis of complex survey data leads to

inflated type I error rate (size) due to correlated errors in regression model appropriate for

clustered data In case of two-stage cluster samples Wu Holt and Holmes (1988) propose

a simple correction to the standard F-test which ~akes account of possible common or

heterogeneous intracluster correlations However for common intraclass correlation p they

SR Paul is Professor Department of Mathematics and Statistics University of Windsor Windsor Ontario Canada N9B 3P4 The author thanks JNK Rao for supplying details of simulation study in Rae et al (1993) and Brajendra C Sutradhar for several discussions during the earLy development of this paper

Typeset by AAAS-lEX

show by simulation that the corrected F-test performs much better than the standard Fshy

test in controlling the size for a scalar hpothesis It performs almost as well as the iterative

generalized least square (IGL8) F-test for large values of p and better than the IGL8 for

small p in controlling size Rao 8utradhar and Vue (1993) propose a simple GLS F test

which also takes account of common intracluster correlation and show also by simulation

that for both scalar and vector hypotheses the GLS F test performs as well as the corrected

F-test in controlling the size even for small p and that it leads to significant power gains

for large values of p

For the estimation of the common intra-cluster correlation p Wu et a1 (1988) use a two

step procedure and Rao et a1 (1993) use a method of fitting constants due to Henderson

(1953) Rao et a1 (1993) comment that the performance of the GLS F test using the

estimate of the common p by the two-step procedure of Wu et a1 (1988) is similar to

that of the GLS F test using the estimate of p by the method of fitting constants due to

Henderson (1953) These estimates however are method of moment type estimates and

they use data under the alternative hypothesis A disadvantage of the corrected F-test and

GLS test using such as estimate of p is that they produce inflated type I error rate with

increasing collinearity and large p (see Wu et a1 1988) A more appropriate approach

would be to esimate p using data under the null hypothesis Such an approach is taken in

the well-known score test (Rao 1947) or the C(a) test of Neyman (1959) However the

method of moment estimate of p such as that obtained by Hendersons procedure can not

be obtained under all null hypothesis situations We propose using maximum likelihood

estimates

Note that the GLS F-test proposed by Rao et a1 (1993) with common intracluster corshy

relation and variance is based on transformed data following Fuller and Battese (1973)

with iid errors In this paper we extend the GLS F-test to the general situation in which

the intracluster correaltions and the variances are possibly different The situations in

which (i) the variances are common and the intracluster correlations are possibly different

2

and (ii) the intracluster correlations and the variances are both common are dealt with as

special cases For all the models considered we derive the required transformed variables

with iid errors and the maximum likelihood estimates of the unknown intracluster correshy

lations The regression model and the associated GLS F test are given in section 2 In

section 3 we deal with the model and its variants the transformations and the maximum

likelihood estimation of the intracluster correlation(s A small scale simulation study

similar to that of Rao et al (1993) is given in section 4 to show the possible advantage of

using the maximum likelihood estimates of the intracluster correlations

2 The Regression Model and The GLS F Test

Following Fuller and Battese (1973) we consider a regression model with nested error

structure that allows for intracluster correlations

y = XfJ + c (21)

where X is an (n x k) matrix ofregression variables fJ is a vector of k regression parameters

E(c) = 0 and E(cc = D where D is positive definite The generalized least-squares

estimator of fJ is

which has cov (ffi) (XD-1X-lXD-1X Fuller and Battese (1973) show that if a

transformation matrix T can be found such that the transformed errors

euro = Tc

are uncorrelated with constant variances the generalized least-squares estimator fJ is

given by the ordinary regression of the transformed dependent variable y = Ty on the

transformed independent variable X = T X Thus for testing the vector hypothesis

Ho 0 fJ = b where 0 is a known q x k matrix of rank q( lt k) and b is a known q x 1

vector the standard F test based on the transformed data is based on

(OfJ - b) (X~X~rl (OfJ - b) qFGLS= ~~--~~~--~--~~--~~

(y - XJ) (y - Xf3) (n - k)

3

2

which has an exact F distribution with q and n k degrees of freedom where f3 =

(X X)-l Xy is the ordinary least squares estimator of f3 under the transformed model

and X~ = X (XX)-l C

We will see later that the regression model involves the variance components at or a or

the intracluster correlations Pi or p The F-distribution of the statistic FGLS is based on

the intracluster correlation parameter Pi or P and the variance parameter at or a2 being

known In practice these parameters are unknown If the parameters are replaced by some

consistent estimators the distribution of FGLS will be approximately correct For model

III Rao et al (1993)proposed estimating P by a moment type procedure due to Henderson

(1953) and Fuller and Battese (1973) which is consistent This procedure uses all data

under the full model ie under the alternative hypothesis In this paper we propose to

use instead maximum likelihood estimates of the variance and intracluster correlation

parameters under the null hypothesis

3 The Model Its Variants The Transformations

and Relevant Maximum Likelihood Estimation

31 The Model and its Variants

Consider a two stage cluster sample of n observaLions with c clusters at the first stage

of sampling and mi elements drawn from the ith-sampled cluster at the second stage

n = ~mi The model with the nested error structure is

(31)

and

where Yij is the response ofthejth element in the ith cluster Xij = (XijOXijl Xijk-r)

with XijO = 1 f3 = (f3o i3I f3k-I) is the vector of regression parameters Vi rv N (0 a~i)

and Uij rv N (0 a~i) Now denote al a~i + a~~ and Pi = a~dal Clearly Pi is the

4

intracluster correlation within the second stage units of the ith cluster Thus for two stage

cluster samples model (31) can be written as

Model I Y X3+E E N(O D)

I)( (X~Imiddot IX~) with Xi deshy0 bullwhere Y = Ylmiddotmiddotmiddot Yimiddotmiddotmiddot Ye Yi

noting the mi X k matrix with rows X~j j = 10 mi E = (E~middotmiddot E~middotmiddotmiddot E~) Ei

( Eil bull Eim D has a diagonal form Efo-Vi with Vi = (1 - Pi) Im- + PiJm- where Ip iso

bull )

1 bullbull

the p x p identity matrix and Jp is the p x p unit matrix

Different variants of model I are possible and have been dealt with by other authors

When 0 = u 2 for all i model I is identical to model (22) of Wu Holt and Holmes (1988)

which we write as

Model II y=X3+E E N (0 D)

2where D = u 2 Ef Vi and Y X 3 E and Vi are all same as those of model When uf u1

and Pi = P for all i model I is identical to the model dealt with by Campbell (1977) Scott

and Holt (1982) Rao Sutradhar and Vue (1993) We write this model as

Model III Y X3+E E N (OD)

where D u 2 EfWi and y X 3 and E are the same as those of model I and Wi = (1 shy1

P)lmi + pJmi middot

32 Transformations

The covariance matrix for the error vector E in model I is the block diagonal matrix

e h 2 _1 e _1 - _ -1 -~ D = (JjD i were Di Ui Vi Then D 2 = (JjD 2 where D - U i Vi Now Vi1 middot1middotmiddot

(1 - Pi)lmi + PiJmi which can be written as

where ti = 1 + (mi I)Pi Then it is easily verified that (see FUller and Battese 1973)

5

and euroi = (1Oi)Vi-euroi are uncorrelated with constant variances 1 Thus the matrix

Ti = (1OdVi- = 1 I lmi - I - [(l~Pd] i transforms the error vector euroUi(l-pd

to a vector of uncorrelated random variables with constant variances 1 Thus the transshy

formations for y and X are

yii = 1 (Yii - OiYiO) and xii = I (xii - OiXiO) Oi (1 - Pi) Oi (1 - Pi)

where 0i = 1- [(1- Pi) I + (mi -1) Pi] fho = ElYiimi andxiQ = E71xiimi

The transformed model can be written as yii = xij3 +uii j 1 mi i = 1 C or c

in matrix notation as Y= X3 + u where uii N(OI)or u N(O E9lm J SimilarlyIV

1

mit can be shown that for model II the matrix Ti = I [lmi - I - [(l~Pi)] J 1u(l-Pi) m transforms the error vector to to a vector of uncorrelated random variables with constant

variances 1 The transformation for y and X are

yij = 1 1 (Yij - OiYiO) and xii = 1 1 (Xij - OiXiO) O(I-Pi)2 O(I-Pi)2

where 0i = 1- [(1- Pi)1 I +(mi -1)pl ~ The transformed model then is Yij = Xij3 +uii

j = 1 mi i = 1 C where uii N(OI)or in matrix notation y = X3+u where c

u N(O E9Im) Finally for model III the transformation 1

Ti = Imi - I - [(1 - p) I + (mi-I) P ] i transforms the error vector euro to a vector of uncorrelated random variables with constant

variance 0 = 02(1- p) Clearly for model III the transformation for y and X are yij

Yii - OiYiO and xij = Xij - 0iXiQ where 0 = 1- [(1- p) I I + (mi -1)p] t The transformed

fI+ 1 modle IS Yij = Xijl- Uij J = mi t 1 c or in matrix notation Y = X 3 + u

where uij N(OO~) or u N(Oq~In)(See also Roo et al (1993))

34 Maximum Likelihood Estimates of Intraclass Correlation

We consider estimating the intracluster correlation parameters in presence of the reshy

gression parameters under the null hypothesis H0 C3 = b We assume that C i3 = b is

6

a consistent set of equations Then under the null hypothesis we will have k q regresshy

sion parameters Let 8 =(80 81 8k-l-q) be the vector of regression parameters to be

estimated under the null hypothesis and Zij =(ZijO Zij1 Zijk-l-qf be the vector of

covariates corresponding to the parameters 8 Note that the regression variables as well

as the dependent variable will be defined depending on the composition of C and b We

give a few examples from Graybill (1976 p 184) First we define the dependent variable

as s Suppose under the alternative hypothesis we have 4 regression variables X 1 X2 X 3

and X4

Example 1 If C = (01 -100) b = 0 that is Ho 31 = 32 then S y Zo = Xo = 1

ZI XI + X 2 Z2 = X3 Z3 = Xl and 8 = (80818283)

Example 2 If

C 0 1 -1 0 0) b (00)( o 0 0 1 -1 bull

that is Ho 31 = 32 and 33 = 34 then s = y Zo = Xo = 1 ZI = XI + X2 Z2 X3 + X4

and 8 = (808182)

Example 3 If o

C 1

1

that is Ho 31 = 32 = 33 = 34 then s y Zo = Xo 1 ZI = (X 1 + X2 + X3 + X4) and

8 = (808d

Example 4 If

= (0 1 -2 -4 0)C o 1 2 0 0

that is Ho 31 - 232 = 433 and 31 +232 6 then s = y - 6X 1 ~X3 Zl = X2 - 2X1 - X3

Z2 = X 4 and 8 = (8081 82) Note in this example this is not the only way we can define

J

Now under the null hypothesis, model (3.1) reduces to $s_{ij} = z_{ij}'\delta + v_i + u_{ij}$, $j = 1, \dots, m_i$, $i = 1, \dots, c$. Further, let $s_i = (s_{i1}, \dots, s_{im_i})'$, $Z_i = (z_{i1}, \dots, z_{im_i})'$, $t_i = 1 + (m_i - 1)\rho_i$ and $d_i = 1 + (m_i - 1)\rho$. Then $\Omega_i$ is the inverse of $V_i$, with $(t_i - \rho_i)/\{(1-\rho_i)t_i\}$ in the diagonal and $-\rho_i/\{(1-\rho_i)t_i\}$ everywhere else, and $\Delta_i$ is the inverse of $W_i$, with $(d_i - \rho)/\{(1-\rho)d_i\}$ in the diagonal and $-\rho/\{(1-\rho)d_i\}$ everywhere else.
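The closed form of the inverse is easy to confirm. The following sketch, assuming NumPy and with illustrative function names and parameter values, builds $V_i$ and the stated inverse and multiplies them.

```python
import numpy as np

def equicorrelation(m, rho):
    """V = (1 - rho) * I + rho * J for a cluster of size m."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def equicorrelation_inverse(m, rho):
    """Closed-form inverse: stated diagonal and off-diagonal entries."""
    t = 1.0 + (m - 1) * rho
    diag = (t - rho) / ((1.0 - rho) * t)
    off = -rho / ((1.0 - rho) * t)
    return off * np.ones((m, m)) + (diag - off) * np.eye(m)

m, rho = 5, 0.4  # illustrative values
product = equicorrelation(m, rho) @ equicorrelation_inverse(m, rho)
```

The same functions cover $W_i$ and $\Delta_i$ by passing the common $\rho$ in place of $\rho_i$.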

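Now, following Paul (1990), the estimating equation (3.2) for $\rho_i$ of model I is expressed in terms of the following quantities: $\rho_i \in (-1/(m_i - 1), 1)$, $SS_i = \sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat\delta)^2$ and $SST_i = \{\sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat\delta)\}^2$, where $\hat\delta$ is the generalized least squares estimate of $\delta$. Once the estimate $\hat\rho_i$ of $\rho_i$ is obtained, the estimate of $\sigma_i^2$ is

$$\hat\sigma_i^2 = \frac{1}{m_i}(1-\hat\rho_i)^{-1}\left(SS_i - \hat\rho_i SST_i/\hat{t}_i\right).$$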

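Again, the estimating equation for $\rho_i$ of model II is

$$\frac{n(1-\rho_i)^{-1}\left[SS_i - SST_i\{1 + (m_i-1)\rho_i^2\}/t_i^2\right]}{\sum_{l=1}^{c}(1-\rho_l)^{-1}(SS_l - \rho_l SST_l/t_l)} - \frac{\rho_i m_i(m_i-1)}{t_i} = 0, \qquad (3.3)$$

where $\rho_i \in (-1/(m_i - 1), 1)$ and $SS_i$ and $SST_i$ are the same as in equation (3.2). Once the estimate $\hat\rho_i$ of $\rho_i$ is obtained, the estimate of $\sigma^2$ is $\hat\sigma^2 = n^{-1}\sum_{i=1}^{c}(1-\hat\rho_i)^{-1}(SS_i - \hat\rho_i SST_i/\hat{t}_i)$.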
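As a consistency check on equation (3.3), the sketch below (assuming NumPy; the function names, cluster sizes and residuals are illustrative and not from the paper) evaluates the left-hand side of (3.3) for fixed residuals and compares it with a numerical derivative of the normal log-likelihood with $\sigma^2$ profiled out. Under this reading of (3.3), the left-hand side equals $-2(1-\rho_i)$ times the derivative of the profile log-likelihood with respect to $\rho_i$, so both vanish at the same point.

```python
import numpy as np

def sums(residuals):
    """Per-cluster SS_i (sum of squares) and SST_i (squared sum)."""
    ss = np.array([np.sum(r ** 2) for r in residuals])
    sst = np.array([np.sum(r) ** 2 for r in residuals])
    return ss, sst

def sigma2_hat(rho, ss, sst, m):
    """sigma^2 estimate for model II given the vector of rho_i."""
    t = 1.0 + (m - 1) * rho
    return np.sum((ss - rho * sst / t) / (1.0 - rho)) / np.sum(m)

def estimating_function(rho, ss, sst, m):
    """Left-hand side of (3.3), one value per cluster."""
    t = 1.0 + (m - 1) * rho
    num = (ss - sst * (1.0 + (m - 1) * rho ** 2) / t ** 2) / (1.0 - rho)
    return num / sigma2_hat(rho, ss, sst, m) - rho * m * (m - 1) / t

def profile_loglik(rho, ss, sst, m):
    """Normal log-likelihood with sigma^2 profiled out (constants dropped)."""
    t = 1.0 + (m - 1) * rho
    return (-0.5 * np.sum(m) * np.log(sigma2_hat(rho, ss, sst, m))
            - 0.5 * np.sum((m - 1) * np.log(1.0 - rho) + np.log(t)))

rng = np.random.default_rng(1)
m = np.array([3, 5, 4])                       # illustrative cluster sizes
residuals = [rng.normal(size=k) for k in m]   # stand-ins for s_ij - z_ij' delta
ss, sst = sums(residuals)
rho = np.array([0.1, 0.3, 0.2])
g = estimating_function(rho, ss, sst, m)
```

In practice the residuals depend on $\hat\delta$, which in turn depends on the $\rho_i$, so a full solution would alternate between updating $\hat\delta$ and solving for the $\rho_i$; that iteration is not shown here.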

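Finally, the estimating equation for $\rho$ of model III is expressed in terms of the following quantities: $\rho \in (-1/(m_i - 1), 1)$, $SS_i = \sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat\delta)^2$ and $SST_i = \{\sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat\delta)\}^2$, with $\hat\delta = (\sum_i Z_i'\Delta_i Z_i)^{-1}(\sum_i Z_i'\Delta_i s_i)$. The estimate of $\sigma^2$ is not required here.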

4 Simulations

For ease of comparison, the simulation design, the regression model and the parameter values considered here are the same as those considered by Rao et al. (1993) and Wu et al. (1988). However, for completeness we describe these in what follows.

We consider the nested error regression model with two covariates $x_1 = x$ and $x_2 = z$ and equal $m_i$ $(= m)$:

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_{ij} + v_i + u_{ij}, \qquad j = 1, \dots, m, \; i = 1, \dots, c. \qquad (4.1)$$

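Values of $(x_{ij}, z_{ij})$ were generated from the bivariate normal distribution with additional random effects components to allow for intracluster correlations $\rho_x$ and $\rho_z$ on both $x$ and $z$:

$$x_{ij} = \mu_x + v_{xi} + u_{xij}, \qquad z_{ij} = \mu_z + v_{zi} + u_{zij}, \qquad (4.2)$$

where $v_{xi} \overset{iid}{\sim} N(0, \sigma_{vx}^2)$, $v_{zi} \overset{iid}{\sim} N(0, \sigma_{vz}^2)$, $u_{xij} \overset{iid}{\sim} N(0, \sigma_{ux}^2)$, $u_{zij} \overset{iid}{\sim} N(0, \sigma_{uz}^2)$, $\rho_x = \sigma_{vx}^2/\sigma_x^2$ and $\rho_z = \sigma_{vz}^2/\sigma_z^2$, with $\sigma_x^2 = \sigma_{vx}^2 + \sigma_{ux}^2$ and $\sigma_z^2 = \sigma_{vz}^2 + \sigma_{uz}^2$. The effects $v_{xi}$ and $v_{zi}$ are correlated with covariance $\sigma_{vxz}$, and $u_{xij}$ and $u_{zij}$ are correlated with covariance $\sigma_{uxz}$. Also let $\rho_{xz} = \sigma_{vxz}/(\sigma_x\sigma_z)$ and $\mathrm{corr}(x, z) = \sigma_{xz}/(\sigma_x\sigma_z)$, where $\sigma_{xz} = \sigma_{vxz} + \sigma_{uxz}$ and $\mathrm{corr}(x, z)$ denotes the correlation between $x_{ij}$ and $z_{ij}$. The parameters $\sigma_{vx}^2$, $\sigma_{vxz}$, $\sigma_{vz}^2$, $\sigma_{uxz}$, etc. were chosen to satisfy $\sigma_x^2 = \sigma_z^2 = 20$, $\rho_x = 0.1$, $\rho_z = 0.5$, $\rho_{xz} = 0$ and $\mathrm{corr}(x, z) = -0.33, 0, 0.33, 0.66, 0.88$.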

We first generated $(v_{xi}, v_{zi})$ from the bivariate normal distribution with mean vector 0, variances $\sigma_{vx}^2$, $\sigma_{vz}^2$ and covariance $\sigma_{vxz}$. Next we generated $m = 10$ independent pairs $(u_{xij}, u_{zij})$, $j = 1, \dots, m$, from the bivariate normal distribution with mean vector $(0, 0)$, variances $\sigma_{ux}^2$, $\sigma_{uz}^2$ and covariance $\sigma_{uxz}$. The pairs $(x_{ij}, z_{ij})$, $j = 1, \dots, m$, were then obtained from (4.2) using $\mu_x = 100$ and $\mu_z = 200$. This three-step procedure was repeated 10 times to generate 10 pairs $(x, z)$ from each of $c = 10$ clusters.
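The three-step procedure can be sketched as follows, assuming NumPy. The function name `generate_covariates` and its argument layout are hypothetical, and the covariate variances of 20 used in the usage lines are an assumed reading of the design rather than a confirmed value.

```python
import numpy as np

def generate_covariates(c, m, mu, var, rho, rho_xz, corr_xz, rng):
    """Draw (x_ij, z_ij) following (4.2) for c clusters of size m.

    mu, var and rho are length-2 sequences ordered as (x, z).
    Returns an array of shape (c, m, 2).
    """
    mu, var, rho = (np.asarray(a, dtype=float) for a in (mu, var, rho))
    sd = np.sqrt(var)
    cov_v = rho_xz * sd[0] * sd[1]             # sigma_vxz
    cov_u = corr_xz * sd[0] * sd[1] - cov_v    # sigma_uxz = sigma_xz - sigma_vxz
    sigma_v = np.array([[rho[0] * var[0], cov_v],
                        [cov_v, rho[1] * var[1]]])
    sigma_u = np.array([[(1 - rho[0]) * var[0], cov_u],
                        [cov_u, (1 - rho[1]) * var[1]]])
    v = rng.multivariate_normal(np.zeros(2), sigma_v, size=c)       # step 1
    u = rng.multivariate_normal(np.zeros(2), sigma_u, size=(c, m))  # step 2
    return mu + v[:, None, :] + u                                   # step 3

rng = np.random.default_rng(0)
# Variances (20, 20) are an assumption made for this illustration.
xz = generate_covariates(c=10, m=10, mu=(100, 200), var=(20, 20),
                         rho=(0.1, 0.5), rho_xz=0.0, corr_xz=0.33, rng=rng)
```

Not every combination of $\rho_x$, $\rho_z$, $\rho_{xz}$ and $\mathrm{corr}(x, z)$ gives a valid covariance matrix for the unit-level effects, so it is worth checking that `sigma_u` is positive semidefinite before drawing.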

We next turn to the generation of $y_{ij}$ for given $(x_{ij}, z_{ij})$, $\beta_0 = 10$ and the $(\beta_1, \beta_2)$ combinations given in Tables 1-2. For $\sigma^2 = 10$ and selected $\rho$ given in Tables 1-2 (or equivalently $\sigma_v^2$ and $\sigma_u^2$) we generated $v_i \overset{iid}{\sim} N(0, \sigma_v^2)$ and $u_{ij} \overset{iid}{\sim} N(0, \sigma_u^2)$ independently and then obtained $y_{ij}$ from (4.1). The simulated data $(y_{ij}, x_{ij}, z_{ij})$, $j = 1, \dots, m$, $i = 1, \dots, c$, were used to compute the test statistics. The simulations of the $y_{ij}$ were repeated 10000 times for each set of $(x, z)$ values in order to obtain estimates of the actual type I error rate (size) and power of each test statistic.

We considered the hypothesis $\beta_1 = \beta_2 = 0$, as reported by Rao et al. (1993). Table 1 gives size estimates of the statistics $F_{GLS}(\tilde\rho)$ and $F_{GLS}(\hat\rho)$, using Henderson's estimate $\tilde\rho$ of $\rho$ and the maximum likelihood estimate $\hat\rho$ of $\rho$ respectively. There is evidence from the simulation that $F_{GLS}(\tilde\rho)$ gives an inflated type I error rate as $\mathrm{corr}(x, z)$ and $\rho$ increase; for example, at $\mathrm{corr}(x, z) = .66$ and $\rho = .3$ its estimated size at the nominal 5% level is 7.6%, against 5.1% for $F_{GLS}(\hat\rho)$. The statistic $F_{GLS}(\hat\rho)$ seems to control the type I error rate adequately. Table 2 gives power estimates of the two statistics. Power estimates of $F_{GLS}(\tilde\rho)$ are in general larger than those of $F_{GLS}(\hat\rho)$. This is because the corresponding sizes are larger.

Thus the statistic $F_{GLS}$ with maximum likelihood estimates of the unknown intracluster correlations might produce the correct type I error rate. However, we do not claim any power advantage of $F_{GLS}(\hat\rho)$.

References

Campbell, C. (1977). Properties of Ordinary and Weighted Least Squares Estimators for Two-Stage Samples. In Proceedings of the Social Statistics Section, American Statistical Association, 800-805.

Fuller, W.A. and Battese, G.E. (1973). Transformations for Estimation of Linear Models with Nested Error Structures. Journal of the American Statistical Association 68, 626-632.

Graybill, F.A. (1983). Theory and Application of the Linear Model. Massachusetts: Wadsworth.

Henderson, C.R. (1953). Estimation of Variance and Covariance Components. Biometrics 9, 226-252.

Neyman, J. (1959). Optimal Asymptotic Tests of Composite Hypothesis. In Probability and Statistics: The Harald Cramér Volume, U. Grenander (ed.). New York: John Wiley.

Paul, S.R. (1990). Maximum Likelihood Estimation of Intraclass Correlation in the Analysis of Familial Data: Estimating Equation Approach. Biometrika 77, 549-555.

Rao, C.R. (1947). Large Sample Tests of Statistical Hypothesis Concerning Several Parameters with Applications to Problems of Estimation. Proceedings of the Cambridge Philosophical Society 44, 50-57.

Rao, J.N.K., Sutradhar, B.C. and Yue, K. (1993). Generalized Least Squares F Test in Regression Analysis with Two-Stage Cluster Samples. Journal of the American Statistical Association 88, 1388-1391.

Scott, A.J. and Holt, D. (1982). The Effect of Two-Stage Sampling on Ordinary Least Squares Methods. Journal of the American Statistical Association 77, 848-854.

Wu, C.F.J., Holt, D. and Holmes, D.J. (1988). The Effect of Two-Stage Sampling on the F Statistics. Journal of the American Statistical Association 83, 150-159.

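Table 1. Size Estimates (%) of $F_{GLS}(\tilde\rho)$ and $F_{GLS}(\hat\rho)$ Tests of $H_0: \beta_1 = 0, \beta_2 = 0$; $\alpha = .05$ and $.1$

| corr(x,z) | $\rho$ | $\alpha = .05$: $F_{GLS}(\tilde\rho)$ | $\alpha = .05$: $F_{GLS}(\hat\rho)$ | $\alpha = .10$: $F_{GLS}(\tilde\rho)$ | $\alpha = .10$: $F_{GLS}(\hat\rho)$ |
|---|---|---|---|---|---|
| -.33 | 0 | 5.1 | 4.3 | 9.6 | 8.9 |
| | .05 | 5.8 | 4.7 | 11.1 | 9.6 |
| | .1 | 6.2 | 4.9 | 11.7 | 9.8 |
| | .3 | 6.1 | 5.0 | 11.7 | 10.3 |
| | .5 | 5.8 | 5.1 | 11.1 | 10.1 |
| 0 | 0 | 5.1 | 4.4 | 9.7 | 9.0 |
| | .05 | 5.8 | 4.9 | 10.9 | 9.6 |
| | .1 | 6.1 | 5.0 | 11.5 | 9.9 |
| | .3 | 5.9 | 5.1 | 11.2 | 10.0 |
| | .5 | 5.7 | 5.1 | 10.9 | 10.1 |
| .33 | 0 | 4.9 | 4.2 | 9.5 | 8.8 |
| | .05 | 5.9 | 4.9 | 11.3 | 9.7 |
| | .1 | 6.3 | 5.1 | 11.7 | 9.9 |
| | .3 | 6.1 | 5.1 | 11.8 | 10.2 |
| | .5 | 5.8 | 5.1 | 11.1 | 10.2 |
| .66 | 0 | 4.3 | 3.5 | 8.9 | 8.1 |
| | .05 | 5.9 | 4.3 | 11.5 | 9.7 |
| | .1 | 6.8 | 4.8 | 12.5 | 10.3 |
| | .3 | 7.6 | 5.1 | 13.3 | 10.8 |
| | .5 | 7.3 | 5.3 | 13.1 | 10.6 |
| .88 | 0 | 4.3 | 3.5 | 8.8 | 8.1 |
| | .05 | 6.0 | 4.3 | 11.4 | 9.5 |
| | .10 | 6.9 | 4.8 | 12.3 | 10.1 |
| | .30 | 7.5 | 4.9 | 13.2 | 10.7 |
| | .50 | 7.5 | 5.0 | 12.9 | 10.5 |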

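Table 2. Power Estimates (%) of $F_{GLS}(\tilde\rho)$ and $F_{GLS}(\hat\rho)$ Tests of $H_0: \beta_1 = 0$ and $\beta_2 = 0$; $\alpha = .05, .1$ vs Specified Alternatives, With $c = 10$, $m = 10$ and corr(x,z) = 0, .33, .66

| corr(x,z) | $(\beta_1, \beta_2)$ | $\rho$ | $\alpha = .05$: $F_{GLS}(\tilde\rho)$ | $\alpha = .05$: $F_{GLS}(\hat\rho)$ | $\alpha = .1$: $F_{GLS}(\tilde\rho)$ | $\alpha = .1$: $F_{GLS}(\hat\rho)$ |
|---|---|---|---|---|---|---|
| 0 | (.1, .1) | 0 | 38.2 | 35.9 | 51.7 | 49.8 |
| | | .05 | 38.2 | 35.4 | 50.9 | 48.3 |
| | | .10 | 38.0 | 35.3 | 50.6 | 48.0 |
| | | .30 | 41.7 | 39.6 | 54.5 | 52.6 |
| | | .50 | 52.1 | 50.5 | 64.6 | 63.1 |
| | (.2, .2) | 0 | 92.6 | 91.7 | 96.1 | 95.8 |
| | | .05 | 91.9 | 90.6 | 95.4 | 94.8 |
| | | .10 | 91.7 | 90.5 | 95.5 | 94.7 |
| | | .30 | 94.7 | 93.9 | 97.1 | 96.6 |
| | | .50 | 98.3 | 98.0 | 99.3 | 99.2 |
| .33 | (.1, .1) | 0 | 50.2 | 48.1 | 62.1 | 60.9 |
| | | .05 | 49.9 | 47.2 | 62.1 | 60.1 |
| | | .10 | 50.2 | 47.8 | 62.4 | 60.1 |
| | | .30 | 56.1 | 53.9 | 67.5 | 65.9 |
| | | .50 | 68.5 | 67.1 | 79.4 | 78.3 |
| | (.2, .2) | 0 | 97.8 | 97.5 | 99.0 | 98.9 |
| | | .05 | 97.5 | 97.1 | 98.8 | 98.7 |
| | | .10 | 97.5 | 97.0 | 98.9 | 98.6 |
| | | .30 | 99.0 | 98.8 | 99.5 | 99.5 |
| | | .50 | 99.8 | 99.8 | 99.9 | 99.9 |
| .66 | (.1, .1) | 0 | 59.0 | 57.6 | 70.7 | 70.0 |
| | | .05 | 60.1 | 56.3 | 71.1 | 69.9 |
| | | .10 | 60.8 | 59.0 | 72.2 | 70.5 |
| | | .30 | 68.3 | 66.5 | 78.6 | 77.6 |
| | | .50 | 81.3 | 80.1 | 88.6 | 87.9 |
| | (.2, .2) | 0 | 99.3 | 99.3 | 99.8 | 99.8 |
| | | .05 | 99.3 | 99.3 | 99.8 | 99.8 |
| | | .10 | 99.4 | 99.3 | 99.8 | 99.7 |
| | | .30 | 99.8 | 99.7 | 99.9 | 99.9 |
| | | .50 | 100 | 100 | 100 | 100 |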

show by simulation that the corrected F-test performs much better than the standard F-test in controlling the size for a scalar hypothesis. It performs almost as well as the iterative generalized least squares (IGLS) F-test for large values of $\rho$ and better than the IGLS F-test for small $\rho$ in controlling size. Rao, Sutradhar and Yue (1993) propose a simple GLS F test which also takes account of a common intracluster correlation and show, also by simulation, that for both scalar and vector hypotheses the GLS F test performs as well as the corrected F-test in controlling the size even for small $\rho$, and that it leads to significant power gains for large values of $\rho$.

For the estimation of the common intracluster correlation $\rho$, Wu et al. (1988) use a two-step procedure and Rao et al. (1993) use a method of fitting constants due to Henderson (1953). Rao et al. (1993) comment that the performance of the GLS F test using the estimate of the common $\rho$ by the two-step procedure of Wu et al. (1988) is similar to that of the GLS F test using the estimate of $\rho$ by the method of fitting constants due to Henderson (1953). These estimates, however, are method-of-moments type estimates, and they use data under the alternative hypothesis. A disadvantage of the corrected F-test and the GLS test using such an estimate of $\rho$ is that they produce an inflated type I error rate with increasing collinearity and large $\rho$ (see Wu et al. 1988). A more appropriate approach would be to estimate $\rho$ using data under the null hypothesis. Such an approach is taken in the well-known score test (Rao 1947) or the $C(\alpha)$ test of Neyman (1959). However, a method-of-moments estimate of $\rho$, such as that obtained by Henderson's procedure, cannot be obtained under all null hypothesis situations. We propose using maximum likelihood estimates.

Note that the GLS F-test proposed by Rao et al. (1993) with common intracluster correlation and variance is based on transformed data, following Fuller and Battese (1973), with iid errors. In this paper we extend the GLS F-test to the general situation in which the intracluster correlations and the variances are possibly different. The situations in which (i) the variances are common and the intracluster correlations are possibly different and (ii) the intracluster correlations and the variances are both common are dealt with as special cases. For all the models considered we derive the required transformed variables with iid errors and the maximum likelihood estimates of the unknown intracluster correlations. The regression model and the associated GLS F test are given in section 2. In section 3 we deal with the model and its variants, the transformations and the maximum likelihood estimation of the intracluster correlation(s). A small-scale simulation study similar to that of Rao et al. (1993) is given in section 4 to show the possible advantage of using the maximum likelihood estimates of the intracluster correlations.

2 The Regression Model and The GLS F Test

Following Fuller and Battese (1973), we consider a regression model with nested error structure that allows for intracluster correlations,

$$y = X\beta + \varepsilon, \qquad (2.1)$$

where $X$ is an $(n \times k)$ matrix of regression variables, $\beta$ is a vector of $k$ regression parameters, $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = D$, where $D$ is positive definite. The generalized least-squares estimator of $\beta$ is

$$\tilde\beta = (X'D^{-1}X)^{-1}X'D^{-1}y,$$

which has $\mathrm{cov}(\tilde\beta) = (X'D^{-1}X)^{-1}$. Fuller and Battese (1973) show that if a transformation matrix $T$ can be found such that the transformed errors

$$\varepsilon^* = T\varepsilon$$

are uncorrelated with constant variances, the generalized least-squares estimator $\tilde\beta$ is given by the ordinary regression of the transformed dependent variable $y^* = Ty$ on the transformed independent variables $X^* = TX$. Thus, for testing the vector hypothesis $H_0: C\beta = b$, where $C$ is a known $q \times k$ matrix of rank $q\,(< k)$ and $b$ is a known $q \times 1$ vector, the standard F test based on the transformed data is based on

$$F_{GLS} = \frac{(C\hat\beta - b)'(X_C^{*\prime}X_C^*)^{-1}(C\hat\beta - b)/q}{(y^* - X^*\hat\beta)'(y^* - X^*\hat\beta)/(n-k)},$$

which has an exact F distribution with $q$ and $n - k$ degrees of freedom, where $\hat\beta = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*$ is the ordinary least squares estimator of $\beta$ under the transformed model and $X_C^* = X^*(X^{*\prime}X^*)^{-1}C'$.
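The statistic can be computed directly from transformed data. The following is a minimal sketch, assuming NumPy; the function name `gls_f_statistic` and the simulated data are illustrative only. It uses the identity $X_C^{*\prime}X_C^* = C(X^{*\prime}X^*)^{-1}C'$, which follows from the definition of $X_C^*$.

```python
import numpy as np

def gls_f_statistic(y_star, X_star, C, b):
    """F statistic for H0: C beta = b, given already-transformed y* and X*."""
    n, k = X_star.shape
    q = C.shape[0]
    xtx_inv = np.linalg.inv(X_star.T @ X_star)
    beta_hat = xtx_inv @ X_star.T @ y_star          # OLS on transformed data
    diff = C @ beta_hat - b
    middle = np.linalg.inv(C @ xtx_inv @ C.T)       # (X*_C' X*_C)^{-1}
    resid = y_star - X_star @ beta_hat
    numerator = diff @ middle @ diff / q
    denominator = resid @ resid / (n - k)
    return numerator / denominator

# Illustrative data with independent errors, so that T is the identity.
rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)
C = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
b = np.zeros(2)
F = gls_f_statistic(y, X, C, b)
```

For the hypothesis that both slopes are zero, the value agrees with the familiar form based on the restricted and unrestricted residual sums of squares.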

We will see later that the regression model involves the variance components $\sigma_i^2$ or $\sigma^2$ and the intracluster correlations $\rho_i$ or $\rho$. The F distribution of the statistic $F_{GLS}$ is based on the intracluster correlation parameter $\rho_i$ or $\rho$ and the variance parameter $\sigma_i^2$ or $\sigma^2$ being known. In practice these parameters are unknown. If the parameters are replaced by consistent estimators, the distribution of $F_{GLS}$ will be approximately correct. For model III, Rao et al. (1993) proposed estimating $\rho$ by a moment-type procedure due to Henderson (1953) and Fuller and Battese (1973), which is consistent. This procedure uses all data under the full model, i.e. under the alternative hypothesis. In this paper we propose to use instead maximum likelihood estimates of the variance and intracluster correlation parameters under the null hypothesis.

3 The Model, Its Variants, The Transformations and Relevant Maximum Likelihood Estimation

3.1 The Model and its Variants

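Consider a two-stage cluster sample of $n$ observations with $c$ clusters at the first stage of sampling and $m_i$ elements drawn from the $i$th sampled cluster at the second stage, $n = \sum_{i=1}^{c} m_i$. The model with the nested error structure is

$$y_{ij} = x_{ij}'\beta + \varepsilon_{ij}, \qquad j = 1, \dots, m_i, \; i = 1, \dots, c, \qquad (3.1)$$

and

$$\varepsilon_{ij} = v_i + u_{ij},$$

where $y_{ij}$ is the response of the $j$th element in the $i$th cluster, $x_{ij} = (x_{ij0}, x_{ij1}, \dots, x_{ij,k-1})'$ with $x_{ij0} = 1$, $\beta = (\beta_0, \beta_1, \dots, \beta_{k-1})'$ is the vector of regression parameters, $v_i \sim N(0, \sigma_{vi}^2)$ and $u_{ij} \sim N(0, \sigma_{ui}^2)$. Now denote $\sigma_i^2 = \sigma_{vi}^2 + \sigma_{ui}^2$ and $\rho_i = \sigma_{vi}^2/\sigma_i^2$. Clearly $\rho_i$ is the intracluster correlation within the second-stage units of the $i$th cluster. Thus for two-stage cluster samples, model (3.1) can be written as

Model I: $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, D)$,

where $y = (y_1', \dots, y_i', \dots, y_c')'$, $y_i = (y_{i1}, \dots, y_{im_i})'$, $X = (X_1', \dots, X_c')'$ with $X_i$ denoting the $m_i \times k$ matrix with rows $x_{ij}'$, $j = 1, \dots, m_i$, $\varepsilon = (\varepsilon_1', \dots, \varepsilon_i', \dots, \varepsilon_c')'$ and $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{im_i})'$. $D$ has the block diagonal form $\oplus_{i=1}^{c}\sigma_i^2 V_i$ with $V_i = (1-\rho_i)I_{m_i} + \rho_i J_{m_i}$, where $I_p$ is the $p \times p$ identity matrix and $J_p$ is the $p \times p$ unit matrix (every element equal to 1).

Different variants of model I are possible and have been dealt with by other authors. When $\sigma_i^2 = \sigma^2$ for all $i$, model I is identical to model (2.2) of Wu, Holt and Holmes (1988), which we write as

Model II: $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, D)$,

where $D = \sigma^2 \oplus_{i=1}^{c} V_i$ and $y$, $X$, $\beta$, $\varepsilon$ and $V_i$ are all the same as those of model I. When $\sigma_i^2 = \sigma^2$ and $\rho_i = \rho$ for all $i$, model I is identical to the model dealt with by Campbell (1977), Scott and Holt (1982) and Rao, Sutradhar and Yue (1993). We write this model as

Model III: $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, D)$,

where $D = \sigma^2 \oplus_{i=1}^{c} W_i$, and $y$, $X$, $\beta$ and $\varepsilon$ are the same as those of model I and $W_i = (1-\rho)I_{m_i} + \rho J_{m_i}$.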

3.2 Transformations

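The covariance matrix for the error vector $\varepsilon$ in model I is the block diagonal matrix $D = \oplus_{i=1}^{c} D_i$, where $D_i = \sigma_i^2 V_i$. Then $D^{-1/2} = \oplus_{i=1}^{c} D_i^{-1/2}$, where $D_i^{-1/2} = \sigma_i^{-1}V_i^{-1/2}$. Now $V_i = (1-\rho_i)I_{m_i} + \rho_i J_{m_i}$, which can be written as

$$V_i = (1-\rho_i)\left(I_{m_i} - \frac{J_{m_i}}{m_i}\right) + t_i\frac{J_{m_i}}{m_i},$$

where $t_i = 1 + (m_i - 1)\rho_i$. Then it is easily verified that (see Fuller and Battese 1973)

$$V_i^{-1/2} = (1-\rho_i)^{-1/2}\left\{I_{m_i} - \left[1 - \left(\frac{1-\rho_i}{t_i}\right)^{1/2}\right]\frac{J_{m_i}}{m_i}\right\}$$

and that the elements of $\varepsilon_i^* = (1/\sigma_i)V_i^{-1/2}\varepsilon_i$ are uncorrelated with constant variances 1. Thus the matrix

$$T_i = (1/\sigma_i)V_i^{-1/2} = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}\left\{I_{m_i} - \left[1 - \left(\frac{1-\rho_i}{t_i}\right)^{1/2}\right]\frac{J_{m_i}}{m_i}\right\}$$

transforms the error vector $\varepsilon_i$ to a vector of uncorrelated random variables with constant variances 1. Thus the transformations for $y$ and $X$ are

$$y_{ij}^* = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}(y_{ij} - \theta_i\bar{y}_{i\cdot}) \quad \text{and} \quad x_{ij}^* = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}(x_{ij} - \theta_i\bar{x}_{i\cdot}),$$

where $\theta_i = 1 - [(1-\rho_i)/\{1 + (m_i-1)\rho_i\}]^{1/2}$, $\bar{y}_{i\cdot} = \sum_{j=1}^{m_i} y_{ij}/m_i$ and $\bar{x}_{i\cdot} = \sum_{j=1}^{m_i} x_{ij}/m_i$. The transformed model can be written as $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \dots, m_i$, $i = 1, \dots, c$, or in matrix notation as $y^* = X^*\beta + u^*$, where $u_{ij}^* \sim N(0, 1)$ or $u^* \sim N(0, \oplus_{i=1}^{c} I_{m_i})$.

Similarly, it can be shown that for model II the matrix

$$T_i = \frac{1}{\sigma(1-\rho_i)^{1/2}}\left\{I_{m_i} - \left[1 - \left(\frac{1-\rho_i}{t_i}\right)^{1/2}\right]\frac{J_{m_i}}{m_i}\right\}$$

transforms the error vector $\varepsilon_i$ to a vector of uncorrelated random variables with constant variances 1. The transformations for $y$ and $X$ are

$$y_{ij}^* = \frac{1}{\sigma(1-\rho_i)^{1/2}}(y_{ij} - \theta_i\bar{y}_{i\cdot}) \quad \text{and} \quad x_{ij}^* = \frac{1}{\sigma(1-\rho_i)^{1/2}}(x_{ij} - \theta_i\bar{x}_{i\cdot}),$$

where $\theta_i = 1 - [(1-\rho_i)/\{1 + (m_i-1)\rho_i\}]^{1/2}$. The transformed model then is $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \dots, m_i$, $i = 1, \dots, c$, where $u_{ij}^* \sim N(0, 1)$, or in matrix notation $y^* = X^*\beta + u^*$, where $u^* \sim N(0, \oplus_{i=1}^{c} I_{m_i})$.

Finally, for model III the transformation

$$T_i = I_{m_i} - \left[1 - \left\{\frac{1-\rho}{1+(m_i-1)\rho}\right\}^{1/2}\right]\frac{J_{m_i}}{m_i}$$

transforms the error vector $\varepsilon_i$ to a vector of uncorrelated random variables with constant variance $\sigma_u^2 = \sigma^2(1-\rho)$. Clearly, for model III the transformations for $y$ and $X$ are $y_{ij}^* = y_{ij} - \theta_i\bar{y}_{i\cdot}$ and $x_{ij}^* = x_{ij} - \theta_i\bar{x}_{i\cdot}$, where $\theta_i = 1 - [(1-\rho)/\{1+(m_i-1)\rho\}]^{1/2}$. The transformed model is $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \dots, m_i$, $i = 1, \dots, c$, or in matrix notation $y^* = X^*\beta + u^*$, where $u_{ij}^* \sim N(0, \sigma_u^2)$ or $u^* \sim N(0, \sigma_u^2 I_n)$ (see also Rao et al. (1993)).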
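For model I, with unequal cluster sizes, correlations and variances, the block structure can be checked as follows. This is a sketch assuming NumPy; the helper names and the cluster sizes, correlations and standard deviations are illustrative values, not from the paper. It assembles $D = \oplus_i \sigma_i^2 V_i$ and $T = \oplus_i T_i$ and confirms that $T D T' = I_n$.

```python
import numpy as np

def equicorrelation(m, rho):
    """V_i = (1 - rho_i) * I + rho_i * J."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def transformation_matrix(m, rho, sigma):
    """T_i = (1 / sigma_i) * V_i^{-1/2} in the closed form of section 3.2."""
    t = 1.0 + (m - 1) * rho
    theta = 1.0 - np.sqrt((1.0 - rho) / t)
    return (np.eye(m) - theta * np.ones((m, m)) / m) / (sigma * np.sqrt(1.0 - rho))

def block_diagonal(blocks):
    """Direct sum of square blocks."""
    n = sum(block.shape[0] for block in blocks)
    out = np.zeros((n, n))
    start = 0
    for block in blocks:
        stop = start + block.shape[0]
        out[start:stop, start:stop] = block
        start = stop
    return out

sizes = [3, 5, 2]           # m_i (illustrative)
rhos = [0.2, 0.5, -0.1]     # rho_i, each inside (-1/(m_i - 1), 1)
sigmas = [1.0, 2.0, 0.5]    # sigma_i

clusters = list(zip(sizes, rhos, sigmas))
D = block_diagonal([s ** 2 * equicorrelation(m, r) for m, r, s in clusters])
T = block_diagonal([transformation_matrix(m, r, s) for m, r, s in clusters])
whitened = T @ D @ T.T      # should be the identity of order n
```

Setting all `sigmas` equal gives model II, and additionally setting all `rhos` equal gives model III up to the scale factor $\sigma(1-\rho)^{1/2}$.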


and (ii) the intracluster correlations and the variances are both common are dealt with as

special cases For all the models considered we derive the required transformed variables

with iid errors and the maximum likelihood estimates of the unknown intracluster correshy

lations The regression model and the associated GLS F test are given in section 2 In

section 3 we deal with the model and its variants the transformations and the maximum

likelihood estimation of the intracluster correlation(s A small scale simulation study

similar to that of Rao et al (1993) is given in section 4 to show the possible advantage of

using the maximum likelihood estimates of the intracluster correlations

2 The Regression Model and The GLS F Test

Following Fuller and Battese (1973) we consider a regression model with nested error

structure that allows for intracluster correlations

y = XfJ + c (21)

where X is an (n x k) matrix ofregression variables fJ is a vector of k regression parameters

E(c) = 0 and E(cc = D where D is positive definite The generalized least-squares

estimator of fJ is

which has cov (ffi) (XD-1X-lXD-1X Fuller and Battese (1973) show that if a

transformation matrix T can be found such that the transformed errors

euro = Tc

are uncorrelated with constant variances the generalized least-squares estimator fJ is

given by the ordinary regression of the transformed dependent variable y = Ty on the

transformed independent variable X = T X Thus for testing the vector hypothesis

Ho 0 fJ = b where 0 is a known q x k matrix of rank q( lt k) and b is a known q x 1

vector the standard F test based on the transformed data is based on

(OfJ - b) (X~X~rl (OfJ - b) qFGLS= ~~--~~~--~--~~--~~

(y - XJ) (y - Xf3) (n - k)

3

2

which has an exact F distribution with q and n k degrees of freedom where f3 =

(X X)-l Xy is the ordinary least squares estimator of f3 under the transformed model

and X~ = X (XX)-l C

We will see later that the regression model involves the variance components at or a or

the intracluster correlations Pi or p The F-distribution of the statistic FGLS is based on

the intracluster correlation parameter Pi or P and the variance parameter at or a2 being

known In practice these parameters are unknown If the parameters are replaced by some

consistent estimators the distribution of FGLS will be approximately correct For model

III Rao et al (1993)proposed estimating P by a moment type procedure due to Henderson

(1953) and Fuller and Battese (1973) which is consistent This procedure uses all data

under the full model ie under the alternative hypothesis In this paper we propose to

use instead maximum likelihood estimates of the variance and intracluster correlation

parameters under the null hypothesis

3 The Model Its Variants The Transformations

and Relevant Maximum Likelihood Estimation

31 The Model and its Variants

Consider a two stage cluster sample of n observaLions with c clusters at the first stage

of sampling and mi elements drawn from the ith-sampled cluster at the second stage

n = ~mi The model with the nested error structure is

(31)

and

where Yij is the response ofthejth element in the ith cluster Xij = (XijOXijl Xijk-r)

with XijO = 1 f3 = (f3o i3I f3k-I) is the vector of regression parameters Vi rv N (0 a~i)

and Uij rv N (0 a~i) Now denote al a~i + a~~ and Pi = a~dal Clearly Pi is the

4

intracluster correlation within the second stage units of the ith cluster Thus for two stage

cluster samples model (31) can be written as

Model I Y X3+E E N(O D)

I)( (X~Imiddot IX~) with Xi deshy0 bullwhere Y = Ylmiddotmiddotmiddot Yimiddotmiddotmiddot Ye Yi

noting the mi X k matrix with rows X~j j = 10 mi E = (E~middotmiddot E~middotmiddotmiddot E~) Ei

( Eil bull Eim D has a diagonal form Efo-Vi with Vi = (1 - Pi) Im- + PiJm- where Ip iso

bull )

1 bullbull

the p x p identity matrix and Jp is the p x p unit matrix

Different variants of model I are possible and have been dealt with by other authors

When 0 = u 2 for all i model I is identical to model (22) of Wu Holt and Holmes (1988)

which we write as

Model II y=X3+E E N (0 D)

2where D = u 2 Ef Vi and Y X 3 E and Vi are all same as those of model When uf u1

and Pi = P for all i model I is identical to the model dealt with by Campbell (1977) Scott

and Holt (1982) Rao Sutradhar and Vue (1993) We write this model as

Model III Y X3+E E N (OD)

where D u 2 EfWi and y X 3 and E are the same as those of model I and Wi = (1 shy1

P)lmi + pJmi middot

32 Transformations

The covariance matrix for the error vector E in model I is the block diagonal matrix

e h 2 _1 e _1 - _ -1 -~ D = (JjD i were Di Ui Vi Then D 2 = (JjD 2 where D - U i Vi Now Vi1 middot1middotmiddot

(1 - Pi)lmi + PiJmi which can be written as

where ti = 1 + (mi I)Pi Then it is easily verified that (see FUller and Battese 1973)

5

and euroi = (1Oi)Vi-euroi are uncorrelated with constant variances 1 Thus the matrix

Ti = (1OdVi- = 1 I lmi - I - [(l~Pd] i transforms the error vector euroUi(l-pd

to a vector of uncorrelated random variables with constant variances 1 Thus the transshy

formations for y and X are

yii = 1 (Yii - OiYiO) and xii = I (xii - OiXiO) Oi (1 - Pi) Oi (1 - Pi)

where 0i = 1- [(1- Pi) I + (mi -1) Pi] fho = ElYiimi andxiQ = E71xiimi

The transformed model can be written as yii = xij3 +uii j 1 mi i = 1 C or c

in matrix notation as Y= X3 + u where uii N(OI)or u N(O E9lm J SimilarlyIV

1

mit can be shown that for model II the matrix Ti = I [lmi - I - [(l~Pi)] J 1u(l-Pi) m transforms the error vector to to a vector of uncorrelated random variables with constant

variances 1 The transformation for y and X are

yij = 1 1 (Yij - OiYiO) and xii = 1 1 (Xij - OiXiO) O(I-Pi)2 O(I-Pi)2

where 0i = 1- [(1- Pi)1 I +(mi -1)pl ~ The transformed model then is Yij = Xij3 +uii

j = 1 mi i = 1 C where uii N(OI)or in matrix notation y = X3+u where c

u N(O E9Im) Finally for model III the transformation 1

Ti = Imi - I - [(1 - p) I + (mi-I) P ] i transforms the error vector euro to a vector of uncorrelated random variables with constant

variance 0 = 02(1- p) Clearly for model III the transformation for y and X are yij

Yii - OiYiO and xij = Xij - 0iXiQ where 0 = 1- [(1- p) I I + (mi -1)p] t The transformed

fI+ 1 modle IS Yij = Xijl- Uij J = mi t 1 c or in matrix notation Y = X 3 + u

where uij N(OO~) or u N(Oq~In)(See also Roo et al (1993))

34 Maximum Likelihood Estimates of Intraclass Correlation

We consider estimating the intracluster correlation parameters in presence of the reshy

gression parameters under the null hypothesis H0 C3 = b We assume that C i3 = b is

6

a consistent set of equations Then under the null hypothesis we will have k q regresshy

sion parameters Let 8 =(80 81 8k-l-q) be the vector of regression parameters to be

estimated under the null hypothesis and Zij =(ZijO Zij1 Zijk-l-qf be the vector of

covariates corresponding to the parameters 8 Note that the regression variables as well

as the dependent variable will be defined depending on the composition of C and b We

give a few examples from Graybill (1976 p 184) First we define the dependent variable

as s Suppose under the alternative hypothesis we have 4 regression variables X 1 X2 X 3

and X4

Example 1 If C = (01 -100) b = 0 that is Ho 31 = 32 then S y Zo = Xo = 1

ZI XI + X 2 Z2 = X3 Z3 = Xl and 8 = (80818283)

Example 2 If

C 0 1 -1 0 0) b (00)( o 0 0 1 -1 bull

that is Ho 31 = 32 and 33 = 34 then s = y Zo = Xo = 1 ZI = XI + X2 Z2 X3 + X4

and 8 = (808182)

Example 3 If o

C 1

1

that is Ho 31 = 32 = 33 = 34 then s y Zo = Xo 1 ZI = (X 1 + X2 + X3 + X4) and

8 = (808d

Example 4 If

= (0 1 -2 -4 0)C o 1 2 0 0

that is Ho 31 - 232 = 433 and 31 +232 6 then s = y - 6X 1 ~X3 Zl = X2 - 2X1 - X3

Z2 = X 4 and 8 = (8081 82) Note in this example this is not the only way we can define

J

Now under the null hypothesis model (3.1) reduces to $s_{ij} = z_{ij}'\delta + v_i + u_{ij}$, $j = 1, \dots, m_i$, $i = 1, \dots, c$. Further, let $s_i = (s_{i1}, \dots, s_{im_i})'$, $Z_i = (z_{i1}, \dots, z_{im_i})'$, $t_i = 1 + (m_i - 1)\rho_i$ and $d_i = 1 + (m_i - 1)\rho$. Then $\Omega_i$ is the inverse of $V_i$, with $(t_i - \rho_i)/\{(1-\rho_i)t_i\}$ in the diagonal and $-\rho_i/\{(1-\rho_i)t_i\}$ everywhere else, and $\Theta_i$ is the inverse of $W_i$, with $(d_i - \rho)/\{(1-\rho)d_i\}$ in the diagonal and $-\rho/\{(1-\rho)d_i\}$ everywhere else.
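As a short check of the stated inverse (a worked verification added here, not taken from the paper), write the matrix with the given diagonal and off-diagonal entries as $(1-\rho_i)^{-1}\{I_{m_i} - (\rho_i/t_i)J_{m_i}\}$ and multiply by $V_i$, using $J_{m_i}^2 = m_i J_{m_i}$:

```latex
\begin{aligned}
V_i\,\frac{1}{1-\rho_i}\Bigl\{I_{m_i} - \frac{\rho_i}{t_i}J_{m_i}\Bigr\}
 &= \frac{1}{1-\rho_i}\Bigl[(1-\rho_i)I_{m_i}
    + \rho_i\Bigl\{1 - \frac{(1-\rho_i) + m_i\rho_i}{t_i}\Bigr\}J_{m_i}\Bigr] \\
 &= I_{m_i},
 \qquad\text{since } (1-\rho_i) + m_i\rho_i = 1 + (m_i-1)\rho_i = t_i .
\end{aligned}
```

The same argument with $\rho$ and $d_i$ in place of $\rho_i$ and $t_i$ gives the inverse of $W_i$.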

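Now, following Paul (1990), the estimating equation for $\rho_i$ of model I is equation (3.2) (the displayed equation is missing from this copy), where $\rho_i \in (-1/(m_i - 1), 1)$,

$$SS_i = \sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat{\delta})^2 \quad\text{and}\quad SST_i = \Bigl\{\sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat{\delta})\Bigr\}^2,$$

and $\hat{\delta}$ is an estimate of $\delta$ of the form $(\sum_i \cdots)^{-1}(\sum_i \cdots)$ whose terms are illegible in this copy. Once the estimate $\hat{\rho}_i$ of $\rho_i$ is obtained, the estimate of $\sigma_i^2$ is

$$\hat{\sigma}_i^2 = \frac{1}{m_i}(1-\hat{\rho}_i)^{-1}\bigl(SS_i - \hat{\rho}_i\,SST_i/\hat{t}_i\bigr).$$

Again, the estimating equation for $\rho_i$ of model II is

$$\frac{n(1-\rho_i)^{-1}\bigl[SS_i - SST_i\{1 + (m_i-1)\rho_i^2\}/t_i^2\bigr]}{\sum_{l=1}^{c}(1-\rho_l)^{-1}(SS_l - \rho_l\,SST_l/t_l)} - \frac{\rho_i m_i(m_i-1)}{t_i} = 0, \qquad (3.3)$$

where $\rho_i \in (-1/(m_i - 1), 1)$ and $SS_i$ and $SST_i$ are the same as in equation (3.2). Once the estimate $\hat{\rho}_i$ of $\rho_i$ is obtained, the estimate of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{c}(1-\hat{\rho}_i)^{-1}\bigl(SS_i - \hat{\rho}_i\,SST_i/\hat{t}_i\bigr).$$

Finally, the estimating equation for $\rho$ of model III is given by a displayed equation that is also missing from this copy, where $\rho \in (-1/(m_i - 1), 1)$, $SS_i = \sum_{j}(s_{ij} - z_{ij}'\hat{\delta})^2$, $SST_i = \{\sum_{j}(s_{ij} - z_{ij}'\hat{\delta})\}^2$ and

$$\hat{\delta} = \Bigl(\sum_i Z_i'\Theta_i Z_i\Bigr)^{-1}\Bigl(\sum_i Z_i'\Theta_i s_i\Bigr).$$

The estimate of $\sigma^2$ is not required here.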
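To illustrate what the model III estimate of $\rho$ maximizes, here is a minimal sketch that swaps in a different technique: instead of solving the paper's estimating equation, it maximizes the profile log-likelihood of $\rho$ (with $\sigma^2$ replaced by its maximizer) by a plain grid search, holding the residuals $s_{ij} - z_{ij}'\hat{\delta}$ fixed. The function names and the grid size are assumptions made for this example; in the full procedure $\hat{\delta}$ depends on $\rho$ and would be updated in alternation.

```python
import math


def profile_loglik(rho, residuals):
    """Profile log-likelihood of rho for model III (sigma^2 maximised out).

    residuals: list of clusters, each a list of s_ij - z_ij' delta_hat.
    Additive constants that do not depend on rho are dropped.
    """
    n = sum(len(r) for r in residuals)
    q = 0.0        # sum_i (1 - rho)^{-1} (SS_i - rho * SST_i / d_i)
    logdet = 0.0   # sum_i log|W_i| = sum_i [(m_i - 1) log(1 - rho) + log d_i]
    for r in residuals:
        m = len(r)
        d = 1.0 + (m - 1) * rho
        ss = sum(e * e for e in r)
        sst = sum(r) ** 2
        q += (ss - rho * sst / d) / (1.0 - rho)
        logdet += (m - 1) * math.log(1.0 - rho) + math.log(d)
    return -0.5 * n * math.log(q / n) - 0.5 * logdet


def ml_rho_grid(residuals, points=2000):
    """Grid-search maximiser of profile_loglik over the admissible range of rho."""
    lower = -1.0 / (max(len(r) for r in residuals) - 1)
    grid = [lower + k * (1.0 - lower) / points for k in range(1, points)]
    return max(grid, key=lambda rho: profile_loglik(rho, residuals))


# Intercept-only illustration with equal cluster sizes, where the
# estimate of delta is the grand mean whatever the value of rho.
clusters = [[1.0, 1.1, 0.9], [5.0, 5.1, 4.9], [9.0, 9.1, 8.9]]
grand_mean = sum(sum(c) for c in clusters) / 9
residuals = [[v - grand_mean for v in c] for c in clusters]
rho_hat = ml_rho_grid(residuals)
```

In this toy data set almost all variation is between clusters, so `rho_hat` lands close to 1. The sketch needs every cluster to have at least two elements and at least one nonzero residual.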

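4 Simulations

For ease of comparison, the simulation design, the regression model and the parameter values considered here are the same as those considered by Rao et al. (1993) and Wu et al. (1988). However, for completeness we describe these in what follows.

We consider the nested error regression model with two covariates, $x_1 = x$ and $x_2 = z$, and equal $m_i$ $(= m)$:

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_{ij} + v_i + u_{ij}, \qquad j = 1, \dots, m;\ i = 1, \dots, c. \qquad (4.1)$$

Values of $(x_{ij}, z_{ij})$ were generated from the bivariate normal distribution with additional random effects components to allow for intracluster correlations $\rho_x$ and $\rho_z$ on both $x$ and $z$:

$$x_{ij} = \mu_x + v_{xi} + u_{xij}, \qquad z_{ij} = \mu_z + v_{zi} + u_{zij}, \qquad (4.2)$$

where $v_{xi} \sim$ iid $N(0, \sigma^2_{vx})$, $v_{zi} \sim$ iid $N(0, \sigma^2_{vz})$, $u_{xij} \sim$ iid $N(0, \sigma^2_{ux})$, $u_{zij} \sim$ iid $N(0, \sigma^2_{uz})$, $\rho_x = \sigma^2_{vx}/\sigma^2_x$ and $\rho_z = \sigma^2_{vz}/\sigma^2_z$, where $\sigma^2_x = \sigma^2_{vx} + \sigma^2_{ux}$ and $\sigma^2_z = \sigma^2_{vz} + \sigma^2_{uz}$. Further, $v_{xi}$ and $v_{zi}$ are correlated with covariance $\sigma_{vxz}$, and $u_{xij}$ and $u_{zij}$ are correlated with covariance $\sigma_{uxz}$. Also let $\rho_{xz} = \sigma_{vxz}/(\sigma_x\sigma_z)$ and corr$(x, z) = \sigma_{xz}/(\sigma_x\sigma_z)$, where $\sigma_{xz} = \sigma_{vxz} + \sigma_{uxz}$ and corr$(x, z)$ denotes the correlation between $x_{ij}$ and $z_{ij}$. The parameters $\sigma^2_{vx}$, $\sigma_{vxz}$, $\sigma^2_{uz}$, $\sigma_{uxz}$, etc. were chosen to satisfy a variance of 20 (the subscripts on this entry are illegible in this copy), $\rho_x = .1$, $\rho_z = .5$, $\rho_{xz} = 0$ (as printed in this copy; the entry may be incomplete) and corr$(x, z) = -.33, 0, .33, .66, .88$.

We first generated $(v_{xi}, v_{zi})$ from the bivariate normal distribution with mean vector 0, variances $\sigma^2_{vx}$, $\sigma^2_{vz}$ and covariance $\sigma_{vxz}$. Next we generated $m = 10$ independent pairs $(u_{xij}, u_{zij})$, $j = 1, \dots, m$, from the bivariate normal distribution with mean vector $(0, 0)$, variances $\sigma^2_{ux}$, $\sigma^2_{uz}$ and covariance $\sigma_{uxz}$. The pairs $(x_{ij}, z_{ij})$, $j = 1, \dots, m$, were then obtained from (4.2) using $\mu_x = 100$ and $\mu_z = 200$. This three-step procedure was repeated 10 times to generate 10 pairs $(x, z)$ from each of $c = 10$ clusters.

We next turn to the generation of $y_{ij}$ for given $(x_{ij}, z_{ij})$, $\beta_0 = 10$ and the $(\beta_1, \beta_2)$ combinations given in Tables 1-2. For $\sigma^2 = 10$ and selected $\rho$ given in Tables 1-2 (or equivalently $\sigma^2_v$ and $\sigma^2_u$) we generated $v_i \sim$ iid $N(0, \sigma^2_v)$ and $u_{ij} \sim$ iid $N(0, \sigma^2_u)$ independently and then obtained $y_{ij}$ from (4.1). The simulated data $(y_{ij}, x_{ij}, z_{ij})$, $j = 1, \dots, m$, $i = 1, \dots, c$, were used to compute the test statistics. The simulations of the $y_{ij}$ were repeated 10,000 times for each set of $(x, z)$ values in order to obtain estimates of the actual type I error rate (size) and power of each test statistic.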
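The data-generation steps described above can be sketched as follows. This is an illustrative sketch only: the function names are invented for the example, and the covariance values passed in the usage lines are hypothetical placeholders rather than the design values, which are only partly legible in this copy. The variances in the usage lines assume $\sigma^2_x = \sigma^2_z = 20$ together with $\rho_x = .1$ and $\rho_z = .5$.

```python
import math
import random


def bivariate_normal(rng, var1, var2, cov):
    """Draw one mean-zero pair with the given variances and covariance."""
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a = math.sqrt(var1) * e1
    slope = cov / var1 if var1 > 0 else 0.0
    b = slope * a + math.sqrt(max(var2 - slope * cov, 0.0)) * e2
    return a, b


def simulate_covariates(rng, c, m, mu_x, mu_z, v_par, u_par):
    """Generate (x_ij, z_ij) from (4.2); v_par and u_par are (var_x, var_z, cov)."""
    xz = []
    for _ in range(c):
        v_x, v_z = bivariate_normal(rng, *v_par)        # cluster-level effects
        cluster = []
        for _ in range(m):
            u_x, u_z = bivariate_normal(rng, *u_par)    # element-level effects
            cluster.append((mu_x + v_x + u_x, mu_z + v_z + u_z))
        xz.append(cluster)
    return xz


def simulate_response(rng, xz, beta, sigma2, rho):
    """Generate y_ij from (4.1) with sigma_v^2 = rho*sigma2 and sigma_u^2 = (1-rho)*sigma2."""
    sd_v = math.sqrt(rho * sigma2)
    sd_u = math.sqrt((1.0 - rho) * sigma2)
    y = []
    for cluster in xz:
        v = rng.gauss(0.0, sd_v)                        # one v_i per cluster
        y.append([beta[0] + beta[1] * x + beta[2] * z + v + rng.gauss(0.0, sd_u)
                  for x, z in cluster])
    return y


rng = random.Random(1996)
# Placeholder covariances (1.0 and 3.0) are assumptions for illustration.
xz = simulate_covariates(rng, c=10, m=10, mu_x=100.0, mu_z=200.0,
                         v_par=(2.0, 10.0, 1.0), u_par=(18.0, 10.0, 3.0))
y = simulate_response(rng, xz, beta=(10.0, 0.0, 0.0), sigma2=10.0, rho=0.3)
```

In the study the $(x, z)$ values are generated once per design and only the call corresponding to `simulate_response` is repeated (10,000 times) to estimate size and power.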

We considered the hypothesis $\beta_1 = \beta_2 = 0$, as reported by Rao et al. (1993). Table 1 gives size estimates of the statistics $F_{GLS}(\tilde{\rho})$ and $F_{GLS}(\hat{\rho})$, using Henderson's estimate of $\rho$ and the maximum likelihood estimate of $\rho$ respectively. There is evidence from the simulation that $F_{GLS}(\tilde{\rho})$ gives an inflated type I error rate as corr$(x, z)$ and $\rho$ increase. The statistic $F_{GLS}(\hat{\rho})$ seems to control the type I error rate adequately. Table 2 gives power estimates of the two statistics. Power estimates of $F_{GLS}(\tilde{\rho})$ are in general larger than those of $F_{GLS}(\hat{\rho})$. This is because the corresponding sizes are larger.

Thus the statistic $F_{GLS}$ with maximum likelihood estimates of the unknown intracluster correlations might produce the correct type I error rate. However, we do not claim any power advantage of $F_{GLS}(\hat{\rho})$.

References

Campbell, C. (1977). Properties of Ordinary and Weighted Least Squares Estimators for Two-Stage Samples. In Proceedings of the Social Statistics Section, American Statistical Association, 800-805.

Fuller, W.A. and Battese, G.E. (1973). Transformations for Estimation of Linear Models with Nested Error Structures. Journal of the American Statistical Association 68, 626-632.

Graybill, F.A. (1983). Theory and Application of the Linear Model. Massachusetts: Wadsworth.

Henderson, C.R. (1953). Estimation of Variance and Covariance Components. Biometrics 9, 226-252.

Neyman, J. (1959). Optimal Asymptotic Tests of Composite Hypothesis. In Probability and Statistics: The Harold Cramer Volume, U. Grenander (ed.). New York: John Wiley.

Paul, S.R. (1990). Maximum Likelihood Estimation of Intraclass Correlation in the Analysis of Familial Data: Estimating Equation Approach. Biometrika 77, 549-555.

Rao, C.R. (1947). Large Sample Tests of Statistical Hypothesis Concerning Several Parameters with Applications to Problems of Estimation. Proceedings of the Cambridge Philosophical Society 44, 50-57.

Rao, J.N.K., Sutradhar, B.C. and Yue, K. (1993). Generalized Least Squares F Test in Regression Analysis with Two-Stage Cluster Samples. Journal of the American Statistical Association 88, 1388-1391.

Scott, A.J. and Holt, D. (1982). The Effect of Two-Stage Sampling on Ordinary Least Squares Methods. Journal of the American Statistical Association 77, 848-854.

Wu, C.F.J., Holt, D. and Holmes, D.J. (1988). The Effect of Two-Stage Sampling on the F Statistics. Journal of the American Statistical Association 83, 150-159.

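Table 1. Size Estimates (%) of the $F_{GLS}(\tilde{\rho})$ and $F_{GLS}(\hat{\rho})$ Tests of $H_0: \beta_1 = 0, \beta_2 = 0$; $\alpha = .05$ and $.10$

                        alpha = .05            alpha = .10
Corr(x,z)    rho     Henderson     ML       Henderson     ML
  -.33        0         5.1        4.3         9.6        8.9
             .05        5.8        4.7        11.1        9.6
             .1         6.2        4.9        11.7        9.8
             .3         6.1        5.0        11.7       10.3
             .5         5.8        5.1        11.1       10.1
    0         0         5.1        4.4         9.7        9.0
             .05        5.8        4.9        10.9        9.6
             .1         6.1        5.0        11.5        9.9
             .3         5.9        5.1        11.2       10.0
             .5         5.7        5.1        10.9       10.1
   .33        0         4.9        4.2         9.5        8.8
             .05        5.9        4.9        11.3        9.7
             .1         6.3        5.1        11.7        9.9
             .3         6.1        5.1        11.8       10.2
             .5         5.8        5.1        11.1       10.2
   .66        0         4.3        3.5         8.9        8.1
             .05        5.9        4.3        11.5        9.7
             .1         6.8        4.8        12.5       10.3
             .3         7.6        5.1        13.3       10.8
             .5         7.3        5.3        13.1       10.6
   .88        0         4.3        3.5         8.8        8.1
             .05        6.0        4.3        11.4        9.5
             .10        6.9        4.8        12.3       10.1
             .30        7.5        4.9        13.2       10.7
             .50        7.5        5.0        12.9       10.5

Columns headed "Henderson" give $F_{GLS}(\tilde{\rho})$ and columns headed "ML" give $F_{GLS}(\hat{\rho})$.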

Table 2. Power Estimates (%) of the $F_{GLS}(\tilde{\rho})$ and $F_{GLS}(\hat{\rho})$ Tests of $H_0: \beta_1 = 0$ and $\beta_2 = 0$, $\alpha = .05, .1$, vs Specified Alternatives, With $c = 10$, $m = 10$ and corr$(x, z) = 0, .33, .66$

                                   alpha = .05            alpha = .1
corr(x,z)   beta1  beta2   rho   Henderson     ML      Henderson     ML
    0         .1     .1     0       38.2      35.9        51.7      49.8
                           .05      38.2      35.4        50.9      48.3
                           .10      38.0      35.3        50.6      48.0
                           .30      41.7      39.6        54.5      52.6
                           .50      52.1      50.5        64.6      63.1
              .2     .2     0       92.6      91.7        96.1      95.8
                           .05      91.9      90.6        95.4      94.8
                           .10      91.7      90.5        95.5      94.7
                           .30      94.7      93.9        97.1      96.6
                           .50      98.3      98.0        99.3      99.2
   .33        .1     .1     0       50.2      48.1        62.1      60.9
                           .05      49.9      47.2        62.1      60.1
                           .10      50.2      47.8        62.4      60.1
                           .30      56.1      53.9        67.5      65.9
                           .50      68.5      67.1        79.4      78.3
              .2     .2     0       97.8      97.5        99.0      98.9
                           .05      97.5      97.1        98.8      98.7
                           .10      97.5      97.0        98.9      98.6
                           .30      99.0      98.8        99.5      99.5
                           .50      99.8      99.8        99.9      99.9
   .66        .1     .1     0       59.0      57.6        70.7      70.0
                           .05      60.1      56.3        71.1      69.9
                           .10      60.8      59.0        72.2      70.5
                           .30      68.3      66.5        78.6      77.6
                           .50      81.3      80.1        88.6      87.9
              .2     .2     0       99.3      99.3        99.8      99.8
                           .05      99.3      99.3        99.8      99.8
                           .10      99.4      99.3        99.8      99.7
                           .30      99.8      99.7        99.9      99.9
                           .50      100       100         100       100

Columns headed "Henderson" give $F_{GLS}(\tilde{\rho})$ and columns headed "ML" give $F_{GLS}(\hat{\rho})$.

which has an exact F distribution with $q$ and $n - k$ degrees of freedom, where $\hat{\beta} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*$ is the ordinary least squares estimator of $\beta$ under the transformed model and $X^*_C = X^*(X^{*\prime}X^*)^{-1}C'$.

We will see later that the regression model involves the variance components $\sigma_i^2$ or $\sigma^2$ and the intracluster correlations $\rho_i$ or $\rho$. The F distribution of the statistic $F_{GLS}$ is based on the intracluster correlation parameter $\rho_i$ or $\rho$ and the variance parameter $\sigma_i^2$ or $\sigma^2$ being known. In practice these parameters are unknown. If the parameters are replaced by some consistent estimators, the distribution of $F_{GLS}$ will be approximately correct. For model III, Rao et al. (1993) proposed estimating $\rho$ by a moment-type procedure due to Henderson (1953) and Fuller and Battese (1973), which is consistent. This procedure uses all data under the full model, i.e. under the alternative hypothesis. In this paper we propose to use instead maximum likelihood estimates of the variance and intracluster correlation parameters under the null hypothesis.

3 The Model, Its Variants, The Transformations and Relevant Maximum Likelihood Estimation

3.1 The Model and its Variants

Consider a two-stage cluster sample of $n$ observations, with $c$ clusters at the first stage of sampling and $m_i$ elements drawn from the $i$th sampled cluster at the second stage, $n = \sum_{i=1}^{c} m_i$. The model with the nested error structure is

$$y_{ij} = x_{ij}'\beta + \epsilon_{ij}, \qquad j = 1, \dots, m_i;\ i = 1, \dots, c, \qquad (3.1)$$

and

$$\epsilon_{ij} = v_i + u_{ij},$$

where $y_{ij}$ is the response of the $j$th element in the $i$th cluster, $x_{ij} = (x_{ij0}, x_{ij1}, \dots, x_{ij,k-1})'$ with $x_{ij0} = 1$, $\beta = (\beta_0, \beta_1, \dots, \beta_{k-1})'$ is the vector of regression parameters, $v_i \sim N(0, \sigma^2_{vi})$ and $u_{ij} \sim N(0, \sigma^2_{ui})$. Now denote $\sigma_i^2 = \sigma^2_{vi} + \sigma^2_{ui}$ and $\rho_i = \sigma^2_{vi}/\sigma_i^2$. Clearly $\rho_i$ is the intracluster correlation within the second-stage units of the $i$th cluster. Thus for two-stage cluster samples model (3.1) can be written as

Model I: $y = X\beta + \epsilon$, $\epsilon \sim N(0, D)$,

where $y = (y_1', \dots, y_i', \dots, y_c')'$, $y_i = (y_{i1}, \dots, y_{im_i})'$, $X = (X_1' \,|\, \cdots \,|\, X_c')'$ with $X_i$ denoting the $m_i \times k$ matrix with rows $x_{ij}'$, $j = 1, \dots, m_i$, $\epsilon = (\epsilon_1', \dots, \epsilon_i', \dots, \epsilon_c')'$, $\epsilon_i = (\epsilon_{i1}, \dots, \epsilon_{im_i})'$, and $D$ has the block diagonal form $\oplus_{i=1}^{c}\sigma_i^2 V_i$ with $V_i = (1-\rho_i)I_{m_i} + \rho_i J_{m_i}$, where $I_p$ is the $p \times p$ identity matrix and $J_p$ is the $p \times p$ unit matrix.

Different variants of model I are possible and have been dealt with by other authors. When $\sigma_i^2 = \sigma^2$ for all $i$, model I is identical to model (2.2) of Wu, Holt and Holmes (1988), which we write as

Model II: $y = X\beta + \epsilon$, $\epsilon \sim N(0, D)$,

where $D = \sigma^2 \oplus_{i=1}^{c} V_i$ and $y$, $X$, $\beta$, $\epsilon$ and $V_i$ are all the same as those of model I. When $\sigma_i^2 = \sigma^2$ and $\rho_i = \rho$ for all $i$, model I is identical to the model dealt with by Campbell (1977), Scott and Holt (1982) and Rao, Sutradhar and Yue (1993). We write this model as

Model III: $y = X\beta + \epsilon$, $\epsilon \sim N(0, D)$,

where $D = \sigma^2 \oplus_{i=1}^{c} W_i$, $y$, $X$, $\beta$ and $\epsilon$ are the same as those of model I, and $W_i = (1-\rho)I_{m_i} + \rho J_{m_i}$.
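As a small worked illustration (added here, with $m_i = 3$ chosen only for concreteness), the covariance block of cluster $i$ under model I follows from $\mathrm{var}(\epsilon_{ij}) = \sigma^2_{vi} + \sigma^2_{ui} = \sigma_i^2$ and $\mathrm{cov}(\epsilon_{ij}, \epsilon_{ij'}) = \sigma^2_{vi} = \rho_i\sigma_i^2$ for $j \neq j'$:

```latex
\sigma_i^2 V_i
 = \sigma_i^2\bigl\{(1-\rho_i) I_3 + \rho_i J_3\bigr\}
 = \sigma_i^2
   \begin{pmatrix}
     1      & \rho_i & \rho_i \\
     \rho_i & 1      & \rho_i \\
     \rho_i & \rho_i & 1
   \end{pmatrix}.
```

Model II keeps a separate $\rho_i$ in each block but uses a common $\sigma^2$, and model III uses the same $\sigma^2$ and the same $\rho$ in every block.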

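3.2 Transformations

The covariance matrix for the error vector $\epsilon$ in model I is the block diagonal matrix $D = \oplus_{i=1}^{c} D_i$, where $D_i = \sigma_i^2 V_i$. Then $D^{-1/2} = \oplus_{i=1}^{c} D_i^{-1/2}$, where $D_i^{-1/2} = \sigma_i^{-1}V_i^{-1/2}$. Now $V_i = (1-\rho_i)I_{m_i} + \rho_i J_{m_i}$, which can be written as

$$V_i = (1-\rho_i)\Bigl(I_{m_i} - \frac{1}{m_i}J_{m_i}\Bigr) + \frac{t_i}{m_i}J_{m_i},$$

where $t_i = 1 + (m_i - 1)\rho_i$. Then it is easily verified that (see Fuller and Battese, 1973)

$$V_i^{-1/2} = (1-\rho_i)^{-1/2}\left[I_{m_i} - \frac{1}{m_i}\left\{1 - \Bigl(\frac{1-\rho_i}{t_i}\Bigr)^{1/2}\right\}J_{m_i}\right]$$

and $\epsilon_i^* = (1/\sigma_i)V_i^{-1/2}\epsilon_i$ are uncorrelated with constant variances 1. Thus the matrix

$$T_i = \frac{1}{\sigma_i}V_i^{-1/2} = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}\left[I_{m_i} - \frac{1}{m_i}\left\{1 - \Bigl(\frac{1-\rho_i}{t_i}\Bigr)^{1/2}\right\}J_{m_i}\right]$$

transforms the error vector $\epsilon_i$ to a vector of uncorrelated random variables with constant variances 1. Thus the transformations for $y$ and $X$ are

$$y^*_{ij} = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}(y_{ij} - \alpha_i\bar{y}_{i\cdot}) \quad\text{and}\quad x^*_{ij} = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}(x_{ij} - \alpha_i\bar{x}_{i\cdot}),$$

where $\alpha_i = 1 - [(1-\rho_i)/\{1 + (m_i-1)\rho_i\}]^{1/2}$, $\bar{y}_{i\cdot} = \sum_{j=1}^{m_i} y_{ij}/m_i$ and $\bar{x}_{i\cdot} = \sum_{j=1}^{m_i} x_{ij}/m_i$. The transformed model can be written as $y^*_{ij} = x^{*\prime}_{ij}\beta + u^*_{ij}$, $j = 1, \dots, m_i$, $i = 1, \dots, c$, or in matrix notation as $y^* = X^*\beta + u^*$, where $u^*_{ij} \sim N(0, 1)$ or $u^* \sim N(0, \oplus_{i=1}^{c} I_{m_i})$.

Similarly, it can be shown that for model II the matrix

$$T_i = \frac{1}{\sigma(1-\rho_i)^{1/2}}\left[I_{m_i} - \frac{1}{m_i}\left\{1 - \Bigl(\frac{1-\rho_i}{t_i}\Bigr)^{1/2}\right\}J_{m_i}\right]$$

transforms the error vector $\epsilon_i$ to a vector of uncorrelated random variables with constant variances 1. The transformations for $y$ and $X$ are

$$y^*_{ij} = \frac{1}{\sigma(1-\rho_i)^{1/2}}(y_{ij} - \alpha_i\bar{y}_{i\cdot}) \quad\text{and}\quad x^*_{ij} = \frac{1}{\sigma(1-\rho_i)^{1/2}}(x_{ij} - \alpha_i\bar{x}_{i\cdot}),$$

where $\alpha_i = 1 - [(1-\rho_i)/\{1 + (m_i-1)\rho_i\}]^{1/2}$. The transformed model then is $y^*_{ij} = x^{*\prime}_{ij}\beta + u^*_{ij}$, $j = 1, \dots, m_i$, $i = 1, \dots, c$, where $u^*_{ij} \sim N(0, 1)$, or in matrix notation $y^* = X^*\beta + u^*$, where $u^* \sim N(0, \oplus_{i=1}^{c} I_{m_i})$.

Finally, for model III the transformation

$$T_i = I_{m_i} - \frac{1}{m_i}\left\{1 - \left[\frac{1-\rho}{1+(m_i-1)\rho}\right]^{1/2}\right\}J_{m_i}$$

transforms the error vector $\epsilon_i$ to a vector of uncorrelated random variables with constant variance $\sigma_u^2 = \sigma^2(1-\rho)$. Clearly, for model III the transformations for $y$ and $X$ are $y^*_{ij} = y_{ij} - \alpha_i\bar{y}_{i\cdot}$ and $x^*_{ij} = x_{ij} - \alpha_i\bar{x}_{i\cdot}$, where $\alpha_i = 1 - [(1-\rho)/\{1 + (m_i-1)\rho\}]^{1/2}$. The transformed model is $y^*_{ij} = x^{*\prime}_{ij}\beta + u^*_{ij}$, $j = 1, \dots, m_i$, $i = 1, \dots, c$, or in matrix notation $y^* = X^*\beta + u^*$, where $u^*_{ij} \sim N(0, \sigma_u^2)$ or $u^* \sim N(0, \sigma_u^2 I_n)$ (see also Rao et al., 1993).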
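The transformations of this section amount to subtracting a fraction $\alpha_i$ of the cluster mean and, for models I and II, rescaling. A minimal sketch follows; the function names are invented for this illustration and the correlation and variance parameters are treated as known. The last function checks numerically that the unscaled transformation turns a covariance block $\sigma^2 V_i$ into $\sigma^2(1-\rho)I$.

```python
import math


def alpha(rho, m):
    """alpha_i = 1 - [(1 - rho) / {1 + (m - 1) rho}]^(1/2)."""
    return 1.0 - math.sqrt((1.0 - rho) / (1.0 + (m - 1) * rho))


def transform_cluster(values, rho, sigma=None):
    """Transform one cluster of y (or one column of X_i).

    With sigma given (models I and II) the result is scaled to unit variance;
    with sigma=None (model III) only the partial centring is applied.
    """
    m = len(values)
    a = alpha(rho, m)
    mean = sum(values) / m
    scale = 1.0 if sigma is None else 1.0 / (sigma * math.sqrt(1.0 - rho))
    return [scale * (v - a * mean) for v in values]


def transformed_covariance(rho, m, sigma2):
    """Cov(T eps) for the unscaled T = I - (alpha/m) J when Cov(eps) = sigma2 * V."""
    a = alpha(rho, m)
    V = [[sigma2 * (1.0 if j == k else rho) for k in range(m)] for j in range(m)]
    T = [[(1.0 if j == k else 0.0) - a / m for k in range(m)] for j in range(m)]
    TV = [[sum(T[j][l] * V[l][k] for l in range(m)) for k in range(m)]
          for j in range(m)]
    # T is symmetric, so T V T' = (T V) T.
    return [[sum(TV[j][l] * T[l][k] for l in range(m)) for k in range(m)]
            for j in range(m)]


cov = transformed_covariance(rho=0.3, m=4, sigma2=2.0)
```

In this example the entries of `cov` are $2.0 \times (1 - 0.3) = 1.4$ on the diagonal and zero elsewhere, up to rounding. In practice the unknown parameters would be replaced by estimates before ordinary least squares is applied to the transformed data.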

34 Maximum Likelihood Estimates of Intraclass Correlation

We consider estimating the intracluster correlation parameters in presence of the reshy

gression parameters under the null hypothesis H0 C3 = b We assume that C i3 = b is

6

a consistent set of equations Then under the null hypothesis we will have k q regresshy

sion parameters Let 8 =(80 81 8k-l-q) be the vector of regression parameters to be

estimated under the null hypothesis and Zij =(ZijO Zij1 Zijk-l-qf be the vector of

covariates corresponding to the parameters 8 Note that the regression variables as well

as the dependent variable will be defined depending on the composition of C and b We

give a few examples from Graybill (1976 p 184) First we define the dependent variable

as s Suppose under the alternative hypothesis we have 4 regression variables X 1 X2 X 3

and X4

Example 1 If C = (01 -100) b = 0 that is Ho 31 = 32 then S y Zo = Xo = 1

ZI XI + X 2 Z2 = X3 Z3 = Xl and 8 = (80818283)

Example 2 If

C 0 1 -1 0 0) b (00)( o 0 0 1 -1 bull

that is Ho 31 = 32 and 33 = 34 then s = y Zo = Xo = 1 ZI = XI + X2 Z2 X3 + X4

and 8 = (808182)

Example 3 If o

C 1

1

that is Ho 31 = 32 = 33 = 34 then s y Zo = Xo 1 ZI = (X 1 + X2 + X3 + X4) and

8 = (808d

Example 4 If

= (0 1 -2 -4 0)C o 1 2 0 0

that is Ho 31 - 232 = 433 and 31 +232 6 then s = y - 6X 1 ~X3 Zl = X2 - 2X1 - X3

Z2 = X 4 and 8 = (8081 82) Note in this example this is not the only way we can define

J

Now under the null hypothesis model (31) reduces to Sij z~j8 + Vi + Uij ) =

1 mii 1C Further let Si = (Sil Simi) Zi = (Zitmiddot Zim) ti 1+

7

(mi - l)pi and di 1 + (mi -1)p Then OJ is the inverse of Vi with (ti - Pi)(1 - Pi)td

in the diagonal and -pi(I - Pi)ti everywhere else and 6i is the inverse of Wi with

(d i - p)(1 - p)di in the diagonal and -pl(I- p)di everywhere else

Now following Paul (1990) the estimating equation for Pi of model I is

where p ( - =1_1 1) bull S S = j~1 (j - j5)2

and SST = L~1 (j - jb)r5

(~ z0) -1~ 0 Once the es timale of p of p is obtained the estimate of q 1 is

amp~ _1 (SS- - p-SSTIt-)t mi t t t

Again the estimating equation for Pi of model II is

n1 - Pi)--1[SSi - SSTi1 + (mi - l)pnlttl ( 1)1 0 c - Pimi m1 - ti = (33)

2(1- Pi)-1(SSi - PiSSTlti) i=1

1where PiE ( - mi _ 1 1) and SSi and SSTi are the same as in equation (32) Once the c

estimate Pi of Pi is obtained the estimate of (12 is 2 = ~ 2(1- Pi)-1(SSi - PiSSTiti) i=1

Finally the estimating equation for P of model III is

where p 9 (- =1_1 1) SS = ~ (Sj - j6)2 and SST = 2( (j j6) 6 =

(2i Z6iZi) -1 (2i Z~6iSi) The estimate of (12 is not required here

4 Simulations

For ease of comparison the simulation design the regression model and the parameter

values considered here are the same as those considered by Rao et a1 (1993) and Wu et

a1 (1988) However for completeness we describe these in what follows

8

We consider the nested error regression model with two covariates xl= x and X2 Z

and equal mi m)

I c (41 )

Values of (Xij) Zij) were generated from the bivariate normal distribution with additional

random effects components to allow for intracluster correlations pz and Pz on both x and

Z

(42)

iid N (0 2) iid N (0 2) iiA N (0 2) iid N (0 2)wereh vxi Uux Vzi Uuz Uxij Uux uzij u uz Px

22 d _22 h 2_2+2 d 2 uux Ux an pz - uuz u z were Ux - uux uux an u z

are correlated with covariance u uxz and Uxij and Uzij are correlated with covariance u uxz

Also let pzz = uuxzuxuz and corr (xz) = uxzuxUZ) where Uxz = Uvzz + Uuzz and corr

(xz) denotes the correlation between Xij and Zij The parameters u~x UUXZ) u~z Uuxz etc

were chosen to satisfy u 20 Px 01 pz 05 Pzx 0 and corrx z) = -033

0 033 66 88

We first generated (Vzi Vzi) from bivariate normal distribution with mean vector 0

variances u~X) U~ZI and covariance Uvxz Next we generate m = 10 independent pairs

(Uzij Uzij) j 1 m from bivariate normal distribution with mean vector (00) varishy

ances u~x u~z and covariance Uuxz The pairs (Xij Zij) j = 1 m were then obtained

from (42) using JLx = 100 and JL = 200 This three-steps procedure was repeated 10 times

to generate 10 pairs (xz) from each of c = 10 clusters

We next turn to the generation of Yij for given (Xij Zij) fJo = 10 and (fJl fJ2) combinashy

tions given in Tables 1-2 For u 2 = 10 and selected p given in Tables 1-2 (or equivalently

u and u) we generated Vi id N(O u~) and Uij id N(O u) independently and then

obtain Yijfrom (41) The simulated data (Yij Xij Zij) j = 1 m i = 1 c were

used to compute the test statistics The simulations of YiS were repeated 10000 times

for each set of (xz) values in order to obtain estimates of actual type I error rate (size)

and power of each test statistic

9

We considered the hypothesis PI = P2 = 0 as reported by Rao et a1 (1993) Table

1 gives size estimates of the statistics FCLS(P) and FGLS(P) using Hendersons estimate

of p and the maximum likelihood estimate of p respectively There is evidence from the

simulation that FGLS(P) gives inflated type I error rate as the corr(xz) and p increase

The statistic FGLS(P) seems to control type I error rate adequately Table 2 gives power

estimates of the two statistics Power estimates of Fcns(P) are in general larger than those

of FCLS(fi) This is because the corresponding sizes are larger

Thus the statistics FCLS with maximum likelihood estimates of the unknown intraclusshy

ter correlations might produce correct type I error rate However we do not claim any

power advantage of FGLS(P)

References

Campbell C (1977) Properties of Ordinary and Weighted least squares Estimators for

Two-Stage Samples in Proceedings of the Social Statistics Section American Statisshy

tical Association 800-805

Fuller WA and Battese GE (1973) Transformations for Estimation of Linear Models

with Nested Error Structures Journal of the American Statistical Association 68

626-632

Graybill FA (1983) Theory and Application of the Linear Model Massachusetts Wadsworth

Henderson CR (1953) Estimation of Variance and Covariance Components Biometshy

rics 9 226-252

Neyman J (1959) Optimal asymptotic tests of composite hypothesis In Probability and

Statistics The Harold Cramer Volume U Grenarder (ed) New York John Wiley

Paul SR (1990) Maximum Likelihood Estimation of Intraclass Correlation in the Analshy

ysis of Familial Data Estimating Equation Approach Biometrika 77 549-555

Rao C R (1947) Large Sample Tests of Statistieal Hypothesis concerning several pashy

rameters with applications to problems of Estimation Proceedings of the Cambridge

10

Philosophical Society 44 50-57

Rao JNK Sutradhar BC and Yue K (1993) Generalized Least Squares F test in Reshy

gression Analysis with two-stage Cluster Samples Journal of the A merican Statistical

Association 88 1388-139l

Scott AJ and Holt D (1982) The Effect of Two-Stage Sampling on Ordinary Least

Squares Methods Journal of the American Statistical Association 77 848-854

Wu CFJ Holt D and Holmes DJ (1988) The Effect of Two-Stage Sampling on the

F Statistics Journal of the Americal Statistical Assocation 83 150-159

11

Table 1 Size Estimates () of FGu(i)) and FGpounds(~) Tests of Ho

PI =0 P1 =0 ex = OS and 1

bull =OS bull = 10

Corr(xz) p FGLSO) FGu(l ) FGu(p) FGu(fJ )

-33

0 05 1 3 5

51 58 62 61 58

43 47 49 50 51

96 111 117 117 111

89 96 98 103 101

0

0 05 1 3 5

51 58 61 59 57

44 49 50 51 51

97 109 115 112 109

90 96 99 100 101

33

0 05 1 3 5

49 59 63 61 58

42 49 51 51 51

95 113 117 118 111

88 97 99 102 102

66

0 05 1 3 5

43 59 68 76 73

35 43 48 51 53

89 115 125 133 131

81 97 103 108 106

88

0 05 10 30 50

43 60 69 75 75

35 43 48 49 50

88 114 123 132 129

81 95 101 107 105

Table 2 Power Estimates () of FGu(p) and FGLSlaquo(J) Tests of No Pl = 0 and P =0 CI =051 vs Specified Alternatives

With c=lO m=1O and corr(xz)= 0 33bull66

CI =05 CI =1

PI P p FGu(p) FGu(p) FGLS(p) FGu(P)

corr(xz)=O

1 1

0 05 10 30 50

382 382 380 417 521

359 354 353 396 505

517 509 506 545 646

498 483 480 526 631

2bull2

0 05 10 30 50

926 919 917 947 983

917 906 905 939 980

961 954 955 971 993

958 948 947 966 992

corr(xz)=33

1 1

0 05 10 30 50

502 499 502 561 685

481 472 478 539 671

621 621 624 675 794

609 601 601 659 783

2 bull 2

0 05 10 30 50

978 975 975 990 998

975 971 970 988 998

990 988 989 995 999

989 987 986 995 999

corr(xz)=66

1 1

0 05 10 30 50

590 601 608 683 813

576 563 590 665 801

707 711 722 786 886

700 699 705 776 879

2bull2

0 05 10 30 50

993 993 994 998 100

993 993 993 997 100

998 998 998 999 100

998 998 997 999 100

intracluster correlation within the second stage units of the ith cluster Thus for two stage

cluster samples model (31) can be written as

Model I Y X3+E E N(O D)

I)( (X~Imiddot IX~) with Xi deshy0 bullwhere Y = Ylmiddotmiddotmiddot Yimiddotmiddotmiddot Ye Yi

noting the mi X k matrix with rows X~j j = 10 mi E = (E~middotmiddot E~middotmiddotmiddot E~) Ei

( Eil bull Eim D has a diagonal form Efo-Vi with Vi = (1 - Pi) Im- + PiJm- where Ip iso

bull )

1 bullbull

the p x p identity matrix and Jp is the p x p unit matrix

Different variants of model I are possible and have been dealt with by other authors

When 0 = u 2 for all i model I is identical to model (22) of Wu Holt and Holmes (1988)

which we write as

Model II y=X3+E E N (0 D)

2where D = u 2 Ef Vi and Y X 3 E and Vi are all same as those of model When uf u1

and Pi = P for all i model I is identical to the model dealt with by Campbell (1977) Scott

and Holt (1982) Rao Sutradhar and Vue (1993) We write this model as

Model III Y X3+E E N (OD)

where D u 2 EfWi and y X 3 and E are the same as those of model I and Wi = (1 shy1

P)lmi + pJmi middot

32 Transformations

The covariance matrix for the error vector E in model I is the block diagonal matrix

e h 2 _1 e _1 - _ -1 -~ D = (JjD i were Di Ui Vi Then D 2 = (JjD 2 where D - U i Vi Now Vi1 middot1middotmiddot

(1 - Pi)lmi + PiJmi which can be written as

where ti = 1 + (mi I)Pi Then it is easily verified that (see FUller and Battese 1973)

5

and euroi = (1Oi)Vi-euroi are uncorrelated with constant variances 1 Thus the matrix

Ti = (1OdVi- = 1 I lmi - I - [(l~Pd] i transforms the error vector euroUi(l-pd

to a vector of uncorrelated random variables with constant variances 1 Thus the transshy

formations for y and X are

yii = 1 (Yii - OiYiO) and xii = I (xii - OiXiO) Oi (1 - Pi) Oi (1 - Pi)

where 0i = 1- [(1- Pi) I + (mi -1) Pi] fho = ElYiimi andxiQ = E71xiimi

The transformed model can be written as yii = xij3 +uii j 1 mi i = 1 C or c

in matrix notation as Y= X3 + u where uii N(OI)or u N(O E9lm J SimilarlyIV

1

mit can be shown that for model II the matrix Ti = I [lmi - I - [(l~Pi)] J 1u(l-Pi) m transforms the error vector to to a vector of uncorrelated random variables with constant

variances 1 The transformation for y and X are

yij = 1 1 (Yij - OiYiO) and xii = 1 1 (Xij - OiXiO) O(I-Pi)2 O(I-Pi)2

where 0i = 1- [(1- Pi)1 I +(mi -1)pl ~ The transformed model then is Yij = Xij3 +uii

j = 1 mi i = 1 C where uii N(OI)or in matrix notation y = X3+u where c

u N(O E9Im) Finally for model III the transformation 1

Ti = Imi - I - [(1 - p) I + (mi-I) P ] i transforms the error vector euro to a vector of uncorrelated random variables with constant

variance 0 = 02(1- p) Clearly for model III the transformation for y and X are yij

Yii - OiYiO and xij = Xij - 0iXiQ where 0 = 1- [(1- p) I I + (mi -1)p] t The transformed

fI+ 1 modle IS Yij = Xijl- Uij J = mi t 1 c or in matrix notation Y = X 3 + u

where uij N(OO~) or u N(Oq~In)(See also Roo et al (1993))

34 Maximum Likelihood Estimates of Intraclass Correlation

We consider estimating the intracluster correlation parameters in presence of the reshy

gression parameters under the null hypothesis H0 C3 = b We assume that C i3 = b is

6

a consistent set of equations Then under the null hypothesis we will have k q regresshy

sion parameters Let 8 =(80 81 8k-l-q) be the vector of regression parameters to be

estimated under the null hypothesis and Zij =(ZijO Zij1 Zijk-l-qf be the vector of

covariates corresponding to the parameters 8 Note that the regression variables as well

as the dependent variable will be defined depending on the composition of C and b We

give a few examples from Graybill (1976 p 184) First we define the dependent variable

as s Suppose under the alternative hypothesis we have 4 regression variables X 1 X2 X 3

and X4

Example 1 If C = (01 -100) b = 0 that is Ho 31 = 32 then S y Zo = Xo = 1

ZI XI + X 2 Z2 = X3 Z3 = Xl and 8 = (80818283)

Example 2 If

C 0 1 -1 0 0) b (00)( o 0 0 1 -1 bull

that is Ho 31 = 32 and 33 = 34 then s = y Zo = Xo = 1 ZI = XI + X2 Z2 X3 + X4

and 8 = (808182)

Example 3 If o

C 1

1

that is Ho 31 = 32 = 33 = 34 then s y Zo = Xo 1 ZI = (X 1 + X2 + X3 + X4) and

8 = (808d

Example 4 If

= (0 1 -2 -4 0)C o 1 2 0 0

that is Ho 31 - 232 = 433 and 31 +232 6 then s = y - 6X 1 ~X3 Zl = X2 - 2X1 - X3

Z2 = X 4 and 8 = (8081 82) Note in this example this is not the only way we can define

J

Now under the null hypothesis model (31) reduces to Sij z~j8 + Vi + Uij ) =

1 mii 1C Further let Si = (Sil Simi) Zi = (Zitmiddot Zim) ti 1+

7

(mi - l)pi and di 1 + (mi -1)p Then OJ is the inverse of Vi with (ti - Pi)(1 - Pi)td

in the diagonal and -pi(I - Pi)ti everywhere else and 6i is the inverse of Wi with

(d i - p)(1 - p)di in the diagonal and -pl(I- p)di everywhere else

Now following Paul (1990) the estimating equation for Pi of model I is

where p ( - =1_1 1) bull S S = j~1 (j - j5)2

and SST = L~1 (j - jb)r5

(~ z0) -1~ 0 Once the es timale of p of p is obtained the estimate of q 1 is

amp~ _1 (SS- - p-SSTIt-)t mi t t t

Again the estimating equation for Pi of model II is

n1 - Pi)--1[SSi - SSTi1 + (mi - l)pnlttl ( 1)1 0 c - Pimi m1 - ti = (33)

2(1- Pi)-1(SSi - PiSSTlti) i=1

1where PiE ( - mi _ 1 1) and SSi and SSTi are the same as in equation (32) Once the c

estimate Pi of Pi is obtained the estimate of (12 is 2 = ~ 2(1- Pi)-1(SSi - PiSSTiti) i=1

Finally the estimating equation for P of model III is

where p 9 (- =1_1 1) SS = ~ (Sj - j6)2 and SST = 2( (j j6) 6 =

(2i Z6iZi) -1 (2i Z~6iSi) The estimate of (12 is not required here

4 Simulations

For ease of comparison the simulation design the regression model and the parameter

values considered here are the same as those considered by Rao et a1 (1993) and Wu et

a1 (1988) However for completeness we describe these in what follows

8

We consider the nested error regression model with two covariates xl= x and X2 Z

and equal mi m)

I c (41 )

Values of (Xij) Zij) were generated from the bivariate normal distribution with additional

random effects components to allow for intracluster correlations pz and Pz on both x and

Z

(42)

iid N (0 2) iid N (0 2) iiA N (0 2) iid N (0 2)wereh vxi Uux Vzi Uuz Uxij Uux uzij u uz Px

22 d _22 h 2_2+2 d 2 uux Ux an pz - uuz u z were Ux - uux uux an u z

are correlated with covariance u uxz and Uxij and Uzij are correlated with covariance u uxz

Also let pzz = uuxzuxuz and corr (xz) = uxzuxUZ) where Uxz = Uvzz + Uuzz and corr

(xz) denotes the correlation between Xij and Zij The parameters u~x UUXZ) u~z Uuxz etc

were chosen to satisfy u 20 Px 01 pz 05 Pzx 0 and corrx z) = -033

0 033 66 88

We first generated (Vzi Vzi) from bivariate normal distribution with mean vector 0

variances u~X) U~ZI and covariance Uvxz Next we generate m = 10 independent pairs

(Uzij Uzij) j 1 m from bivariate normal distribution with mean vector (00) varishy

ances u~x u~z and covariance Uuxz The pairs (Xij Zij) j = 1 m were then obtained

from (42) using JLx = 100 and JL = 200 This three-steps procedure was repeated 10 times

to generate 10 pairs (xz) from each of c = 10 clusters

We next turn to the generation of Yij for given (Xij Zij) fJo = 10 and (fJl fJ2) combinashy

tions given in Tables 1-2 For u 2 = 10 and selected p given in Tables 1-2 (or equivalently

u and u) we generated Vi id N(O u~) and Uij id N(O u) independently and then

obtain Yijfrom (41) The simulated data (Yij Xij Zij) j = 1 m i = 1 c were

used to compute the test statistics The simulations of YiS were repeated 10000 times

for each set of (xz) values in order to obtain estimates of actual type I error rate (size)

and power of each test statistic

9

We considered the hypothesis PI = P2 = 0 as reported by Rao et a1 (1993) Table

1 gives size estimates of the statistics FCLS(P) and FGLS(P) using Hendersons estimate

of p and the maximum likelihood estimate of p respectively There is evidence from the

simulation that FGLS(P) gives inflated type I error rate as the corr(xz) and p increase

The statistic FGLS(P) seems to control type I error rate adequately Table 2 gives power

estimates of the two statistics Power estimates of Fcns(P) are in general larger than those

of FCLS(fi) This is because the corresponding sizes are larger

Thus the statistics FCLS with maximum likelihood estimates of the unknown intraclusshy

ter correlations might produce correct type I error rate However we do not claim any

power advantage of FGLS(P)

References

Campbell C (1977) Properties of Ordinary and Weighted least squares Estimators for

Two-Stage Samples in Proceedings of the Social Statistics Section American Statisshy

tical Association 800-805

Fuller WA and Battese GE (1973) Transformations for Estimation of Linear Models

with Nested Error Structures Journal of the American Statistical Association 68

626-632

Graybill FA (1983) Theory and Application of the Linear Model Massachusetts Wadsworth

Henderson CR (1953) Estimation of Variance and Covariance Components Biometshy

rics 9 226-252

Neyman J (1959) Optimal asymptotic tests of composite hypothesis In Probability and

Statistics The Harold Cramer Volume U Grenarder (ed) New York John Wiley

Paul SR (1990) Maximum Likelihood Estimation of Intraclass Correlation in the Analshy

ysis of Familial Data Estimating Equation Approach Biometrika 77 549-555

Rao C R (1947) Large Sample Tests of Statistieal Hypothesis concerning several pashy

rameters with applications to problems of Estimation Proceedings of the Cambridge

10

Philosophical Society 44 50-57

Rao JNK Sutradhar BC and Yue K (1993) Generalized Least Squares F test in Reshy

gression Analysis with two-stage Cluster Samples Journal of the A merican Statistical

Association 88 1388-139l

Scott AJ and Holt D (1982) The Effect of Two-Stage Sampling on Ordinary Least

Squares Methods Journal of the American Statistical Association 77 848-854

Wu CFJ Holt D and Holmes DJ (1988) The Effect of Two-Stage Sampling on the

F Statistics Journal of the Americal Statistical Assocation 83 150-159

11

Table 1 Size Estimates () of FGu(i)) and FGpounds(~) Tests of Ho

PI =0 P1 =0 ex = OS and 1

bull =OS bull = 10

Corr(xz) p FGLSO) FGu(l ) FGu(p) FGu(fJ )

-33

0 05 1 3 5

51 58 62 61 58

43 47 49 50 51

96 111 117 117 111

89 96 98 103 101

0

0 05 1 3 5

51 58 61 59 57

44 49 50 51 51

97 109 115 112 109

90 96 99 100 101

33

0 05 1 3 5

49 59 63 61 58

42 49 51 51 51

95 113 117 118 111

88 97 99 102 102

66

0 05 1 3 5

43 59 68 76 73

35 43 48 51 53

89 115 125 133 131

81 97 103 108 106

88

0 05 10 30 50

43 60 69 75 75

35 43 48 49 50

88 114 123 132 129

81 95 101 107 105

Table 2 Power Estimates () of FGu(p) and FGLSlaquo(J) Tests of No Pl = 0 and P =0 CI =051 vs Specified Alternatives

With c=lO m=1O and corr(xz)= 0 33bull66

CI =05 CI =1

PI P p FGu(p) FGu(p) FGLS(p) FGu(P)

corr(xz)=O

1 1

0 05 10 30 50

382 382 380 417 521

359 354 353 396 505

517 509 506 545 646

498 483 480 526 631

2bull2

0 05 10 30 50

926 919 917 947 983

917 906 905 939 980

961 954 955 971 993

958 948 947 966 992

corr(xz)=33

1 1

0 05 10 30 50

502 499 502 561 685

481 472 478 539 671

621 621 624 675 794

609 601 601 659 783

2 bull 2

0 05 10 30 50

978 975 975 990 998

975 971 970 988 998

990 988 989 995 999

989 987 986 995 999

corr(xz)=66

1 1

0 05 10 30 50

590 601 608 683 813

576 563 590 665 801

707 711 722 786 886

700 699 705 776 879

2bull2

0 05 10 30 50

993 993 994 998 100

993 993 993 997 100

998 998 998 999 100

998 998 997 999 100

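and $\varepsilon_i^* = (1/\sigma_i)V_i^{-1/2}\varepsilon_i$ are uncorrelated with constant variance 1. Thus the matrix

$$T_i = \frac{1}{\sigma_i}V_i^{-1/2} = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}\left[I_{m_i} - \frac{\theta_i}{m_i}1_{m_i}1_{m_i}'\right]$$

transforms the error vector $\varepsilon_i$ to a vector of uncorrelated random variables with constant variance 1. Thus the transformations for $y$ and $x$ are

$$y_{ij}^* = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}\,(y_{ij} - \theta_i\bar y_{i\cdot}) \quad\text{and}\quad x_{ij}^* = \frac{1}{\sigma_i(1-\rho_i)^{1/2}}\,(x_{ij} - \theta_i\bar x_{i\cdot}),$$

where $\theta_i = 1 - [(1-\rho_i)/\{1 + (m_i - 1)\rho_i\}]^{1/2}$, $\bar y_{i\cdot} = \sum_{j=1}^{m_i} y_{ij}/m_i$ and $\bar x_{i\cdot} = \sum_{j=1}^{m_i} x_{ij}/m_i$.

The transformed model can be written as $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \ldots, m_i$, $i = 1, \ldots, c$, or in matrix notation as $y^* = X^*\beta + u^*$, where $u_{ij}^* \sim N(0, 1)$ or $u^* \sim N(0, \oplus_{i=1}^{c} I_{m_i})$.

Similarly it can be shown that for model II the matrix

$$T_i = \frac{1}{\sigma(1-\rho_i)^{1/2}}\left[I_{m_i} - \frac{\theta_i}{m_i}1_{m_i}1_{m_i}'\right]$$

transforms the error vector $\varepsilon_i$ to a vector of uncorrelated random variables with constant variance 1. The transformations for $y$ and $x$ are

$$y_{ij}^* = \frac{1}{\sigma(1-\rho_i)^{1/2}}\,(y_{ij} - \theta_i\bar y_{i\cdot}) \quad\text{and}\quad x_{ij}^* = \frac{1}{\sigma(1-\rho_i)^{1/2}}\,(x_{ij} - \theta_i\bar x_{i\cdot}),$$

where $\theta_i = 1 - [(1-\rho_i)/\{1 + (m_i - 1)\rho_i\}]^{1/2}$. The transformed model then is $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \ldots, m_i$, $i = 1, \ldots, c$, where $u_{ij}^* \sim N(0, 1)$, or in matrix notation $y^* = X^*\beta + u^*$, where $u^* \sim N(0, \oplus_{i=1}^{c} I_{m_i})$.

Finally for model III the transformation

$$T_i = I_{m_i} - \frac{\theta_i}{m_i}1_{m_i}1_{m_i}'$$

transforms the error vector $\varepsilon_i$ to a vector of uncorrelated random variables with constant variance $\sigma_u^2 = \sigma^2(1-\rho)$. Clearly for model III the transformations for $y$ and $x$ are $y_{ij}^* = y_{ij} - \theta_i\bar y_{i\cdot}$ and $x_{ij}^* = x_{ij} - \theta_i\bar x_{i\cdot}$, where $\theta_i = 1 - [(1-\rho)/\{1 + (m_i - 1)\rho\}]^{1/2}$. The transformed model is $y_{ij}^* = x_{ij}^{*\prime}\beta + u_{ij}^*$, $j = 1, \ldots, m_i$, $i = 1, \ldots, c$, or in matrix notation $y^* = X^*\beta + u^*$, where $u_{ij}^* \sim N(0, \sigma_u^2)$ or $u^* \sim N(0, \sigma_u^2 I_n)$ (see also Rao et al. (1993)).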
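The transformation above can be checked numerically. The following is a minimal sketch, not the paper's code, assuming the exchangeable error covariance $\sigma^2[(1-\rho)I + \rho 11']$ for one cluster; the function names are hypothetical. It applies the within-cluster transform to unit vectors to build $T$ and then forms $T(\sigma^2 V)T'$, which should be the identity.

```python
import math

def fuller_battese_theta(rho, m):
    """theta = 1 - sqrt((1 - rho) / (1 + (m - 1) * rho)) for a cluster of size m."""
    return 1.0 - math.sqrt((1.0 - rho) / (1.0 + (m - 1) * rho))

def transform_cluster(values, rho, sigma=1.0):
    """Apply v_j -> (v_j - theta * mean(v)) / (sigma * sqrt(1 - rho)) within one cluster."""
    m = len(values)
    theta = fuller_battese_theta(rho, m)
    mean = sum(values) / m
    scale = sigma * math.sqrt(1.0 - rho)
    return [(v - theta * mean) / scale for v in values]

def transformed_covariance(rho, m, sigma=1.0):
    """Return T (sigma^2 V) T' for V = (1 - rho) I + rho 11'; it should be the identity."""
    # Column k of T is the transform of the k-th unit vector.
    cols = [transform_cluster([1.0 if j == k else 0.0 for j in range(m)], rho, sigma)
            for k in range(m)]
    T = [[cols[k][j] for k in range(m)] for j in range(m)]
    V = [[sigma ** 2 * (1.0 if j == k else rho) for k in range(m)] for j in range(m)]
    TV = [[sum(T[a][b] * V[b][c] for b in range(m)) for c in range(m)] for a in range(m)]
    return [[sum(TV[a][b] * T[c][b] for b in range(m)) for c in range(m)] for a in range(m)]

# Illustrative values only (rho, m and sigma are not taken from the paper).
cov = transformed_covariance(rho=0.3, m=4, sigma=2.0)
```

Setting `sigma=1.0` and skipping the division by $\sqrt{1-\rho}$ would correspond to the model III form, where the transformed errors keep the common variance $\sigma^2(1-\rho)$.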

3.4 Maximum Likelihood Estimates of Intraclass Correlation

We consider estimating the intracluster correlation parameters in the presence of the regression parameters under the null hypothesis $H_0: C\beta = b$. We assume that $C\beta = b$ is a consistent set of equations. Then under the null hypothesis we will have $k - q$ regression parameters. Let $\delta = (\delta_0, \delta_1, \ldots, \delta_{k-1-q})'$ be the vector of regression parameters to be estimated under the null hypothesis and $z_{ij} = (z_{ij0}, z_{ij1}, \ldots, z_{ij,k-1-q})'$ be the vector of covariates corresponding to the parameters $\delta$. Note that the regression variables as well as the dependent variable will be defined depending on the composition of $C$ and $b$. We give a few examples from Graybill (1976, p. 184). First we define the dependent variable as $s$. Suppose under the alternative hypothesis we have 4 regression variables $x_1$, $x_2$, $x_3$ and $x_4$.

Example 1. If $C = (0, 1, -1, 0, 0)$, $b = 0$, that is $H_0: \beta_1 = \beta_2$, then $s = y$, $z_0 = x_0 = 1$, $z_1 = x_1 + x_2$, $z_2 = x_3$, $z_3 = x_4$ and $\delta = (\delta_0, \delta_1, \delta_2, \delta_3)'$.

Example 2. If

$$C = \begin{pmatrix} 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

that is $H_0: \beta_1 = \beta_2$ and $\beta_3 = \beta_4$, then $s = y$, $z_0 = x_0 = 1$, $z_1 = x_1 + x_2$, $z_2 = x_3 + x_4$ and $\delta = (\delta_0, \delta_1, \delta_2)'$.

Example 3. If $C$ and $b$ are such that $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4$, then $s = y$, $z_0 = x_0 = 1$, $z_1 = x_1 + x_2 + x_3 + x_4$ and $\delta = (\delta_0, \delta_1)'$.

Example 4. If

$$C = \begin{pmatrix} 0 & 1 & -2 & -4 & 0 \\ 0 & 1 & 2 & 0 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ 6 \end{pmatrix},$$

that is $H_0: \beta_1 - 2\beta_2 = 4\beta_3$ and $\beta_1 + 2\beta_2 = 6$, then $s = y - 6x_1 - \tfrac{3}{2}x_3$, $z_1 = x_2 - 2x_1 - x_3$, $z_2 = x_4$ and $\delta = (\delta_0, \delta_1, \delta_2)'$. Note that in this example this is not the only way we can define $s$.
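The reparametrization in Example 4 can be verified by direct substitution. The sketch below is illustrative only; the function names and the numeric coefficients are hypothetical. It solves the two constraints for $\beta_1 = 6 - 2\beta_2$ and $\beta_3 = \tfrac{3}{2} - \beta_2$ and checks that the constrained mean of $y$ equals the offset $6x_1 + \tfrac{3}{2}x_3$ plus $z'\delta$ with $\delta = (\beta_0, \beta_2, \beta_4)'$.

```python
import random

def constrained_mean(x, beta0, beta2, beta4):
    """Mean of y under H0 of Example 4: beta1 = 6 - 2*beta2 and beta3 = 1.5 - beta2."""
    x1, x2, x3, x4 = x
    beta1 = 6.0 - 2.0 * beta2
    beta3 = 1.5 - beta2
    return beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3 + beta4 * x4

def reduced_mean(x, delta):
    """Same mean written as offset + z'delta with z = (1, x2 - 2*x1 - x3, x4)."""
    x1, x2, x3, x4 = x
    offset = 6.0 * x1 + 1.5 * x3  # moved to the left-hand side: s = y - offset
    z = (1.0, x2 - 2.0 * x1 - x3, x4)
    return offset + sum(zk * dk for zk, dk in zip(z, delta))

# Compare the two forms at a few arbitrary covariate values.
rng = random.Random(0)
max_gap = 0.0
for _ in range(20):
    x = tuple(rng.uniform(-5.0, 5.0) for _ in range(4))
    gap = abs(constrained_mean(x, 1.0, 0.7, -0.3) - reduced_mean(x, (1.0, 0.7, -0.3)))
    max_gap = max(max_gap, gap)
```

Solving the constraints for a different pair of coefficients would give a different but equivalent choice of $s$ and $z$, which is the non-uniqueness noted in the example.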

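Now under the null hypothesis model (3.1) reduces to $s_{ij} = z_{ij}'\delta + v_i + u_{ij}$, $j = 1, \ldots, m_i$, $i = 1, \ldots, c$. Further let $s_i = (s_{i1}, \ldots, s_{im_i})'$, $Z_i = (z_{i1}, \ldots, z_{im_i})'$, $t_i = 1 + (m_i - 1)\rho_i$ and $d_i = 1 + (m_i - 1)\rho$. Then $\Omega_i$ is the inverse of $V_i$ with $(t_i - \rho_i)/\{(1 - \rho_i)t_i\}$ in the diagonal and $-\rho_i/\{(1 - \rho_i)t_i\}$ everywhere else, and $\Delta_i$ is the inverse of $W_i$ with $(d_i - \rho)/\{(1 - \rho)d_i\}$ in the diagonal and $-\rho/\{(1 - \rho)d_i\}$ everywhere else.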
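The closed-form inverse can be confirmed by multiplying it against the exchangeable correlation matrix. This is a minimal check with hypothetical helper names, assuming the matrix has ones on the diagonal and a common correlation $\rho$ off the diagonal.

```python
def exchangeable(m, rho):
    """m x m matrix with 1 on the diagonal and rho everywhere else."""
    return [[1.0 if j == k else rho for k in range(m)] for j in range(m)]

def exchangeable_inverse(m, rho):
    """Closed form: (t - rho)/((1 - rho) t) on the diagonal, -rho/((1 - rho) t) elsewhere."""
    t = 1.0 + (m - 1) * rho
    diag = (t - rho) / ((1.0 - rho) * t)
    off = -rho / ((1.0 - rho) * t)
    return [[diag if j == k else off for k in range(m)] for j in range(m)]

def matmul(a, b):
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Illustrative values; the product should be the 5 x 5 identity.
product = matmul(exchangeable(5, 0.4), exchangeable_inverse(5, 0.4))
```

The same function covers both cases in the text: pass $\rho_i$ for the matrix with $t_i$, or the common $\rho$ for the matrix with $d_i$.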

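Now following Paul (1990) the estimating equation for $\rho_i$ of model I is equation (3.2), where $\rho_i \in (-1/(m_i - 1), 1)$, $SS_i = \sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat\delta)^2$, $SST_i = \{\sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat\delta)\}^2$ and $\hat\delta = (\sum_i Z_i'\Omega_i Z_i)^{-1}(\sum_i Z_i'\Omega_i s_i)$. Once the estimate $\hat\rho_i$ of $\rho_i$ is obtained the estimate of $\sigma_i^2$ is

$$\hat\sigma_i^2 = \frac{1}{m_i}(1 - \hat\rho_i)^{-1}(SS_i - \hat\rho_i SST_i/\hat t_i), \qquad \hat t_i = 1 + (m_i - 1)\hat\rho_i.$$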

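Again the estimating equation for $\rho_i$ of model II is

$$\frac{n(1 - \rho_i)^{-2}\left[SS_i - SST_i\{1 + (m_i - 1)\rho_i^2\}/t_i^2\right]}{\sum_{i=1}^{c}(1 - \rho_i)^{-1}(SS_i - \rho_i SST_i/t_i)} - \frac{\rho_i m_i(m_i - 1)}{(1 - \rho_i)t_i} = 0, \qquad (3.3)$$

where $\rho_i \in (-1/(m_i - 1), 1)$ and $SS_i$ and $SST_i$ are the same as in equation (3.2). Once the estimate $\hat\rho_i$ of $\rho_i$ is obtained the estimate of $\sigma^2$ is $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{c}(1 - \hat\rho_i)^{-1}(SS_i - \hat\rho_i SST_i/\hat t_i)$.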
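As a hedged sketch of how an equation with the structure of (3.3) can arise as a profile-likelihood score, assume normal errors, hold $\delta$ at its estimate, and take $n = \sum_i m_i$. The symbols $\ell$ and $Q_i$ are introduced here for illustration only and are not the paper's notation.

```latex
\begin{align*}
\ell &= -\tfrac12\Big[\, n\log\sigma^2
        + \sum_{i=1}^{c}\big\{(m_i-1)\log(1-\rho_i) + \log t_i\big\}
        + \frac{1}{\sigma^2}\sum_{i=1}^{c} Q_i \Big] + \text{const},\\
Q_i &= (1-\rho_i)^{-1}\big(SS_i - \rho_i\,SST_i/t_i\big),
        \qquad t_i = 1+(m_i-1)\rho_i,\\
\frac{\partial Q_i}{\partial\rho_i}
     &= (1-\rho_i)^{-2}\Big[SS_i - SST_i\,\frac{1+(m_i-1)\rho_i^2}{t_i^2}\Big],\\
\frac{\partial}{\partial\rho_i}\big\{(m_i-1)\log(1-\rho_i)+\log t_i\big\}
     &= -\frac{\rho_i\,m_i(m_i-1)}{(1-\rho_i)\,t_i},\\
\frac{\partial\ell}{\partial\sigma^2}=0
     &\;\Rightarrow\; \hat\sigma^2 = \frac1n\sum_{i=1}^{c}Q_i .
\end{align*}
```

Substituting $\hat\sigma^2$ into $\partial\ell/\partial\rho_i = 0$ gives $n\,(\partial Q_i/\partial\rho_i)/\sum_k Q_k - \rho_i m_i(m_i-1)/\{(1-\rho_i)t_i\} = 0$, which matches the two terms of (3.3).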

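Finally the estimating equation for $\rho$ of model III is defined in terms of $SS_i = \sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat\delta)^2$, $SST_i = \{\sum_{j=1}^{m_i}(s_{ij} - z_{ij}'\hat\delta)\}^2$ and $\hat\delta = (\sum_i Z_i'\Delta_i Z_i)^{-1}(\sum_i Z_i'\Delta_i s_i)$, where $\rho \in (-1/(m_i - 1), 1)$. The estimate of $\sigma^2$ is not required here.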
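A numerical alternative to solving an estimating equation for a common $\rho$ is to maximise the profile log-likelihood directly. The sketch below is not the paper's procedure: it assumes normal errors, treats the residuals $s_{ij} - z_{ij}'\hat\delta$ as fixed inputs, profiles out $\sigma^2$, and uses a plain grid search. The function names are hypothetical and every cluster is assumed to have at least two units.

```python
import math

def profile_loglik(rho, residuals):
    """Profile log-likelihood of a common rho (sigma^2 profiled out), up to a constant."""
    n = sum(len(r) for r in residuals)
    q_total = 0.0
    log_det = 0.0
    for r in residuals:
        m = len(r)
        d = 1.0 + (m - 1) * rho
        ss = sum(e * e for e in r)   # SS_i: sum of squared residuals
        sst = sum(r) ** 2            # SST_i: squared sum of residuals
        q_total += (ss - rho * sst / d) / (1.0 - rho)
        log_det += (m - 1) * math.log(1.0 - rho) + math.log(d)
    return -0.5 * (n * math.log(q_total / n) + log_det)

def grid_mle_rho(residuals, points=4001):
    """Grid search over the open interval (-1/(max m_i - 1), 1)."""
    m_max = max(len(r) for r in residuals)
    lo, hi = -1.0 / (m_max - 1), 1.0
    best_rho, best_val = None, -math.inf
    for k in range(1, points):  # interior grid points only
        rho = lo + (hi - lo) * k / points
        val = profile_loglik(rho, residuals)
        if val > best_val:
            best_rho, best_val = rho, val
    return best_rho

# Toy residuals for three clusters of size 2 (illustrative, not from the paper).
rho_hat = grid_mle_rho([[2.0, 1.0], [-2.0, -1.0], [2.0, 1.0]])
```

In practice $\hat\delta$ depends on $\rho$ as well, so a full fit would alternate between updating $\hat\delta$ and $\hat\rho$; the grid search here only illustrates the second step.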

4 Simulations

For ease of comparison the simulation design, the regression model and the parameter values considered here are the same as those considered by Rao et al. (1993) and Wu et al. (1988). However, for completeness we describe these in what follows.

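We consider the nested error regression model with two covariates $x_1 = x$ and $x_2 = z$ and equal $m_i$ ($= m$):

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_{ij} + v_i + u_{ij}, \quad j = 1, \ldots, m;\ i = 1, \ldots, c. \qquad (4.1)$$

Values of $(x_{ij}, z_{ij})$ were generated from the bivariate normal distribution with additional random effects components to allow for intracluster correlations $\rho_x$ and $\rho_z$ on both $x$ and $z$:

$$x_{ij} = \mu_x + v_{xi} + u_{xij}, \qquad z_{ij} = \mu_z + v_{zi} + u_{zij}, \qquad (4.2)$$

where $v_{xi} \sim \text{iid } N(0, \sigma_{vx}^2)$, $v_{zi} \sim \text{iid } N(0, \sigma_{vz}^2)$, $u_{xij} \sim \text{iid } N(0, \sigma_{ux}^2)$, $u_{zij} \sim \text{iid } N(0, \sigma_{uz}^2)$, $\rho_x = \sigma_{vx}^2/\sigma_x^2$ and $\rho_z = \sigma_{vz}^2/\sigma_z^2$, where $\sigma_x^2 = \sigma_{vx}^2 + \sigma_{ux}^2$ and $\sigma_z^2 = \sigma_{vz}^2 + \sigma_{uz}^2$. Further, $v_{xi}$ and $v_{zi}$ are correlated with covariance $\sigma_{vxz}$, and $u_{xij}$ and $u_{zij}$ are correlated with covariance $\sigma_{uxz}$. Also let $\rho_{xz} = \sigma_{vxz}/(\sigma_x\sigma_z)$ and corr$(x, z) = \sigma_{xz}/(\sigma_x\sigma_z)$, where $\sigma_{xz} = \sigma_{vxz} + \sigma_{uxz}$ and corr$(x, z)$ denotes the correlation between $x_{ij}$ and $z_{ij}$. The parameters $\sigma_{vx}^2$, $\sigma_{vxz}$, $\sigma_{vz}^2$, $\sigma_{uxz}$ etc.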

were chosen to satisfy $\sigma_x^2 = \sigma_z^2 = 20$, $\rho_x = 0.1$, $\rho_z = 0.5$, $\rho_{xz} = 0$ and corr$(x, z) = -0.33$, $0$, $0.33$, $0.66$, $0.88$.

We first generated $(v_{xi}, v_{zi})$ from the bivariate normal distribution with mean vector 0, variances $\sigma_{vx}^2$, $\sigma_{vz}^2$, and covariance $\sigma_{vxz}$. Next we generated $m = 10$ independent pairs $(u_{xij}, u_{zij})$, $j = 1, \ldots, m$, from the bivariate normal distribution with mean vector $(0, 0)$, variances $\sigma_{ux}^2$, $\sigma_{uz}^2$ and covariance $\sigma_{uxz}$. The pairs $(x_{ij}, z_{ij})$, $j = 1, \ldots, m$, were then obtained from (4.2) using $\mu_x = 100$ and $\mu_z = 200$. This three-step procedure was repeated 10 times to generate 10 pairs $(x, z)$ from each of $c = 10$ clusters.
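The three-step covariate generation can be sketched as follows. This is an illustration under stated assumptions, not the original simulation code: the function names are hypothetical, and the default variance components are illustrative values chosen to correspond to total variances of 20, $\rho_x = 0.1$, $\rho_z = 0.5$, uncorrelated cluster effects and corr$(x, z) = 0.33$.

```python
import math
import random

def bivariate_normal(rng, var_a, var_b, cov_ab):
    """Draw one zero-mean pair with the given variances and covariance (2 x 2 Cholesky)."""
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a = math.sqrt(var_a) * e1
    slope = cov_ab / math.sqrt(var_a)
    b = slope * e1 + math.sqrt(var_b - slope ** 2) * e2
    return a, b

def generate_covariates(rng, c=10, m=10, mu_x=100.0, mu_z=200.0,
                        var_vx=2.0, var_vz=10.0, cov_vxz=0.0,
                        var_ux=18.0, var_uz=10.0, cov_uxz=6.6):
    """Return c clusters, each a list of m (x, z) pairs built as mean + cluster effect + unit effect."""
    clusters = []
    for _ in range(c):
        # Step 1: one pair of cluster-level effects (v_x, v_z).
        v_x, v_z = bivariate_normal(rng, var_vx, var_vz, cov_vxz)
        pairs = []
        for _ in range(m):
            # Step 2: unit-level effects (u_x, u_z); step 3: combine with the means.
            u_x, u_z = bivariate_normal(rng, var_ux, var_uz, cov_uxz)
            pairs.append((mu_x + v_x + u_x, mu_z + v_z + u_z))
        clusters.append(pairs)
    return clusters

covariates = generate_covariates(random.Random(1))
```

The covariance passed for the unit-level effects must satisfy `cov_uxz ** 2 <= var_ux * var_uz`, otherwise the square root in `bivariate_normal` is undefined.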

We next turn to the generation of $y_{ij}$ for given $(x_{ij}, z_{ij})$, $\beta_0 = 10$ and $(\beta_1, \beta_2)$ combinations given in Tables 1-2. For $\sigma^2 = 10$ and selected $\rho$ given in Tables 1-2 (or equivalently $\sigma_v^2$ and $\sigma_u^2$) we generated $v_i \sim \text{iid } N(0, \sigma_v^2)$ and $u_{ij} \sim \text{iid } N(0, \sigma_u^2)$ independently and then obtained $y_{ij}$ from (4.1). The simulated data $(y_{ij}, x_{ij}, z_{ij})$, $j = 1, \ldots, m$, $i = 1, \ldots, c$, were used to compute the test statistics. The simulations of the $y_{ij}$'s were repeated 10000 times for each set of $(x, z)$ values in order to obtain estimates of actual type I error rate (size) and power of each test statistic.
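A minimal sketch of the response-generation step, assuming the usual decomposition $\sigma_v^2 = \rho\sigma^2$ and $\sigma_u^2 = (1-\rho)\sigma^2$; the function name and data layout (a list of clusters, each a list of $(x, z)$ pairs) are hypothetical choices for illustration.

```python
import random

def generate_responses(rng, covariates, beta=(10.0, 0.0, 0.0), sigma2=10.0, rho=0.3):
    """Generate y_ij = b0 + b1*x_ij + b2*z_ij + v_i + u_ij for each cluster of (x, z) pairs."""
    var_v = rho * sigma2            # between-cluster variance sigma_v^2
    var_u = (1.0 - rho) * sigma2    # within-cluster variance sigma_u^2
    b0, b1, b2 = beta
    responses = []
    for cluster in covariates:
        v_i = rng.gauss(0.0, var_v ** 0.5)  # one random effect shared by the cluster
        responses.append([b0 + b1 * x + b2 * z + v_i + rng.gauss(0.0, var_u ** 0.5)
                          for x, z in cluster])
    return responses

# Tiny illustrative design: two clusters with made-up covariate values.
toy_covariates = [[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0)]]
toy_y = generate_responses(random.Random(0), toy_covariates, beta=(10.0, 0.1, 0.2))
```

Repeating this call many times for a fixed set of covariates, computing the test statistic each time, and recording the rejection proportion gives the size estimate under $\beta_1 = \beta_2 = 0$ and the power estimate otherwise.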


We considered the hypothesis $\beta_1 = \beta_2 = 0$ as reported by Rao et al. (1993). Table 1 gives size estimates of the statistics $F_{GLS}(\tilde\rho)$ and $F_{GLS}(\hat\rho)$, using Henderson's estimate $\tilde\rho$ of $\rho$ and the maximum likelihood estimate $\hat\rho$ of $\rho$ respectively. There is evidence from the simulation that $F_{GLS}(\tilde\rho)$ gives inflated type I error rate as corr$(x, z)$ and $\rho$ increase. The statistic $F_{GLS}(\hat\rho)$ seems to control type I error rate adequately. Table 2 gives power estimates of the two statistics. Power estimates of $F_{GLS}(\tilde\rho)$ are in general larger than those of $F_{GLS}(\hat\rho)$. This is because the corresponding sizes are larger.

Thus the statistic $F_{GLS}$ with maximum likelihood estimates of the unknown intracluster correlations might produce correct type I error rate. However, we do not claim any power advantage of $F_{GLS}(\hat\rho)$.
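To judge whether a simulated size differs from the nominal level by more than simulation noise, one can compare it with the binomial Monte Carlo standard error. The sketch below assumes the 10000 replicates are independent; the function name is hypothetical, and the two size values (7.6% and 5.1%) are the Table 1 entries at corr$(x, z) = .66$, $\rho = .3$, $\alpha = .05$.

```python
import math

def monte_carlo_se(p, reps):
    """Binomial standard error of an estimated rejection rate with true rate p."""
    return math.sqrt(p * (1.0 - p) / reps)

# Standard error of a size estimate at the nominal 5% level with 10000 replicates.
se = monte_carlo_se(0.05, 10000)

# Standardised distance of two Table 1 entries from the nominal level.
z_henderson = (0.076 - 0.05) / se
z_ml = (0.051 - 0.05) / se
```

With a standard error of roughly 0.2 percentage points, an estimate of 7.6% lies far outside simulation noise around 5%, while 5.1% does not.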

References

Campbell, C. (1977). Properties of Ordinary and Weighted Least Squares Estimators for Two-Stage Samples. In Proceedings of the Social Statistics Section, American Statistical Association, 800-805.

Fuller, W. A. and Battese, G. E. (1973). Transformations for Estimation of Linear Models with Nested Error Structures. Journal of the American Statistical Association 68, 626-632.

Graybill, F. A. (1983). Theory and Application of the Linear Model. Massachusetts: Wadsworth.

Henderson, C. R. (1953). Estimation of Variance and Covariance Components. Biometrics 9, 226-252.

Neyman, J. (1959). Optimal Asymptotic Tests of Composite Hypothesis. In Probability and Statistics: The Harald Cramér Volume, U. Grenander (ed.). New York: John Wiley.

Paul, S. R. (1990). Maximum Likelihood Estimation of Intraclass Correlation in the Analysis of Familial Data: Estimating Equation Approach. Biometrika 77, 549-555.

Rao, C. R. (1947). Large Sample Tests of Statistical Hypothesis Concerning Several Parameters with Applications to Problems of Estimation. Proceedings of the Cambridge Philosophical Society 44, 50-57.

Rao, J. N. K., Sutradhar, B. C. and Yue, K. (1993). Generalized Least Squares F Test in Regression Analysis with Two-Stage Cluster Samples. Journal of the American Statistical Association 88, 1388-1391.

Scott, A. J. and Holt, D. (1982). The Effect of Two-Stage Sampling on Ordinary Least Squares Methods. Journal of the American Statistical Association 77, 848-854.

Wu, C. F. J., Holt, D. and Holmes, D. J. (1988). The Effect of Two-Stage Sampling on the F Statistics. Journal of the American Statistical Association 83, 150-159.









References

Campbell, C. (1977). Properties of Ordinary and Weighted Least Squares Estimators for Two-Stage Samples. In Proceedings of the Social Statistics Section, American Statistical Association, 800-805.

Fuller, W. A. and Battese, G. E. (1973). Transformations for Estimation of Linear Models with Nested Error Structures. Journal of the American Statistical Association 68, 626-632.

Graybill, F. A. (1983). Theory and Application of the Linear Model. Massachusetts: Wadsworth.

Henderson, C. R. (1953). Estimation of Variance and Covariance Components. Biometrics 9, 226-252.

Neyman, J. (1959). Optimal Asymptotic Tests of Composite Hypothesis. In Probability and Statistics: The Harald Cramér Volume, U. Grenander (ed.). New York: John Wiley.

Paul, S. R. (1990). Maximum Likelihood Estimation of Intraclass Correlation in the Analysis of Familial Data: Estimating Equation Approach. Biometrika 77, 549-555.

Rao, C. R. (1947). Large Sample Tests of Statistical Hypothesis Concerning Several Parameters with Applications to Problems of Estimation. Proceedings of the Cambridge Philosophical Society 44, 50-57.

Rao, J. N. K., Sutradhar, B. C. and Yue, K. (1993). Generalized Least Squares F Test in Regression Analysis with Two-Stage Cluster Samples. Journal of the American Statistical Association 88, 1388-1391.

Scott, A. J. and Holt, D. (1982). The Effect of Two-Stage Sampling on Ordinary Least Squares Methods. Journal of the American Statistical Association 77, 848-854.

Wu, C. F. J., Holt, D. and Holmes, D. J. (1988). The Effect of Two-Stage Sampling on the F Statistic. Journal of the American Statistical Association 83, 150-159.

Table 1. Size estimates (%) of the $F_{GLS}(\tilde{\rho})$ (Henderson's estimate) and $F_{GLS}(\hat{\rho})$ (maximum likelihood estimate) tests of $H_0: \beta_1 = 0, \beta_2 = 0$ at $\alpha = .05$ and $.10$

                       α = .05               α = .10
corr(x,z)  ρ        Henderson      ML   Henderson      ML

-.33       0              5.1     4.3         9.6     8.9
           .05            5.8     4.7        11.1     9.6
           .10            6.2     4.9        11.7     9.8
           .30            6.1     5.0        11.7    10.3
           .50            5.8     5.1        11.1    10.1

0          0              5.1     4.4         9.7     9.0
           .05            5.8     4.9        10.9     9.6
           .10            6.1     5.0        11.5     9.9
           .30            5.9     5.1        11.2    10.0
           .50            5.7     5.1        10.9    10.1

.33        0              4.9     4.2         9.5     8.8
           .05            5.9     4.9        11.3     9.7
           .10            6.3     5.1        11.7     9.9
           .30            6.1     5.1        11.8    10.2
           .50            5.8     5.1        11.1    10.2

.66        0              4.3     3.5         8.9     8.1
           .05            5.9     4.3        11.5     9.7
           .10            6.8     4.8        12.5    10.3
           .30            7.6     5.1        13.3    10.8
           .50            7.3     5.3        13.1    10.6

.88        0              4.3     3.5         8.8     8.1
           .05            6.0     4.3        11.4     9.5
           .10            6.9     4.8        12.3    10.1
           .30            7.5     4.9        13.2    10.7
           .50            7.5     5.0        12.9    10.5
