Predicting Realized Cluster Parameters from Two Stage ... · two-stage random permutation model for cluster sampling, unequal size clusters complicates the tracking of target random

C03ed02.doc 8/4/2003 3:07 PM i

Predicting Realized Cluster Parameters from Two Stage Samples of Unequal Size Clustered Population

Edward J. Stanek III

Department of Biostatistics and Epidemiology 404 Arnold House

University of Massachusetts 715 North Pleasant Street

Amherst, MA 01003-9304 USA [email protected]

Julio Singer

Departamento de Estatística Universidade de São Paulo

São Paulo, Brazil [email protected]

C03ed02.doc 8/4/2003 3:07 PM ii

ABSTRACT

C03ed02.doc 8/4/2003 3:07 PM iii

Contact Address: Edward J. Stanek III Department of Biostatistics and Epidemiology 404 Arnold House University of Massachusetts at Amherst Amherst, Ma. 01002 Phone: 413-545-3812 Fax: 413-545-1645 Email: [email protected] KEYWORDS: super-population, best linear unbiased predictor, two-stage

sampling, random permutation, optimal estimation, design-based inference ACKNOWLEDGEMENT. Many people contributed to the development of ideas

that lead to this manuscript, including Dalton Andrade, Heleno Bolfarine, John Buonaccorsi, Viviana Lencina, and Oscar Loureiro. This work was developed with the support of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), FINEP (PRONEX), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil and the National Institutes of Health (NIH-PHS-R01-HD36848), USA.

C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)

1

1. INTRODUCTION

We discuss methods for predicting a cluster mean or total based on a two-stage clustered random sample from a finite population formed from unequal size clusters. Common applications of mixed model theory {Goldberger, 1962 #531}{Henderson, 1984 #1606}{McLean, 1991 #175}{Robinson, 1991 #723}{McCulloch, 2001 #1577} assume infinite size populations and clusters to develop best linear unbiased predictors (BLUP). In the survey sampling literature, predictors have been developed using super-population models {Scott, 1969 #143}{Bolfarine, 1992 #1027}{Valliant, 2000 #1578}. When clusters are equal in size, two-stage random permutation models have been used for prediction that include response error {Stanek, 2003a #1697}.

We follow a similar approach to {Scott, 1969 #143}, but base the probability model solely on the random permutation framework that underlies a two-stage cluster sample. If such a sample is selected, the results are based on the sampling design. No hypothetical super-population is required, and account is made for different size clusters in the finite population. Target parameters correspond to linear combinations of latent values for clusters. The objective of this paper is to extend the results of {Stanek, 2003a #1697} to predicting such target parameters.

The strategy builds on the methods for predicting realized random effects in a population formed by equal size clusters. With mixed model and super-population model approaches, extending methods to unequal size cluster settings is straightforward. Unlike survey sampling results for estimation {Cochran, 1977 #1067}, the different cluster sizes do not alter the analysis. In fact, that is one of the appeals of mixed model methods. Using a two-stage random permutation model for cluster sampling, unequal size clusters complicates the tracking of target random variables through the development of predictors. Tracing through this development, while more complex, retains a clear connection between the target random variable, and the predictor. Avoiding these complications with other methods comes with a cost. The cost is the ambiguity introduced in the interpretation of predictors.

As an example, we consider a cluster randomized controlled trial designed to evaluate the relative effectiveness of a training program for physician-delivered nutrition counseling, alone and in combination with an office support program, on dietary fat intake and blood LDL-cholesterol levels in patients with hyperlipidemia (the Worcester-Area Trial for Counseling in Hyperlipidemia (WATCH) Trial){Ockene, 1996 #890}. Forty-five primary care internists were selected from 60 internists at the Fallon Clinic, a central Massachusetts health maintenance organization (HMO), with each internist randomly assigned to one of three conditions: (I) Usual Care; (II) Physician nutrition counseling training; and (III) Physician nutrition counseling training plus an office support program. Twelve hundred and seventy-eight patients with blood LDL-cholesterol levels in the highest 25th percentile, having previously-scheduled physician visits, were recruited into the study. The number of sampled patients per cluster ranged from 1 to 43.

The WATCH study had a cluster randomized design, with patients in physician practices forming the basic clusters. We focus our attention on a single dependent variable, percent of Kcal from saturated fat, which we consider as a difference constructed by taking the baseline minus one year measure for each patient. There was considerable variability in the magnitude of the average patient’s saturated fat change between physician’s practices within a given intervention. We discuss estimating the change for eligible patients in a


2

physician’s practice that accounts for the practice size. The cluster corresponds to a realized physician.

1.1 Basic Notation

We define notation and terminology in the context of two related literatures, mixed

models {Diggle, 2002 #378}{Verbeke, 2000 #1703}{Searle, 1992 #708}{McCulloch, 2001 #1577} and prediction based super-population sampling models {Valliant, 2000 #1578}{Bolfarine, 1992 #1027} that address the basic problem. Let a finite population be defined by a listing of 1,..., st M= units in each of 1,...,s N= clusters, where response for subject t in cluster s is given by sty . We assume that a two-stage cluster sample is to be selected from this population (using without replacement sampling), with sm units selected for each of n clusters.

We represent the sample of clusters by defining a random permutation of the clusters in the original frame. A new label, 1,...,i N= , is assigned to the positions corresponding to different clusters the permutation. Without loss of generality, we assume that the first

1,...,i n= positions contain the clusters that will occur in the sample. In a similar manner, we distinguish the listed subjects in cluster s from a random permutation of units in cluster s , whose positions are labeled by 1,..., sj M= . We again assume that the first 1,..., sj m= positions contain the units in the sample if cluster s is selected. Using these ideas and notation, we represent the population under a two-stage permutation as an ordered list of

1

N

ss

M=

= ∑ random variables, where a model for the unit in the thj position in the cluster in

the thi position is represented by ijY . For ease of exposition, we refer to the cluster that will

occupy position i in the permutation as the thi primary sampling unit (PSU), and to the unit that will occupy position j in the permutation as the thj secondary sampling unit (SSU). PSUs and SSUs are indexed by positions ( i and j ), whereas clusters and units are indexed by labels ( s and t ) in the finite population. Thus, ijY represents the response for the thj

SSU in the thi PSU. We think of sampling as selecting a permutation. Prior to sample selection, the

cluster that will correspond to PSU i is random, and hence the expected value of response for the cluster in the thi position is a random variable. Once the sample has been selected, for i n≤ , we will observe which cluster corresponds to a particular PSU in the sample. We refer to that cluster as the realized PSU, and to the average (or total) for units in that cluster as the latent value for the realized PSU (noting that such effects are often defined as deviations relative to an overall mean). In this context, mixed model and super-population modeling strategies may be advocated for predicting a realized PSU, or linear combinations of PSUs. We first review these approaches. We then define the context for finite population sampling prediction setting the stage for development of the main results in the subsequent section.

1.2 Mixed Models


3

In the mixed model literature, the model for the response of the thj SSU in the thi PSU is given by

ij i ijY B Eµ= + + (1.1) where µ corresponds to the overall population mean, iB is a random effect that corresponds to the deviation from the population mean for the thi selected PSU, and ijE is the random

error corresponding to the deviation of the thj selected SSU from the thi selected PSU mean. Typically, additional assumptions are made. The random effects, iB , are assumed

independent and identically normally distributed, ( )20 , iB iid N σ∼ , and independent of the random errors, also assumed independent and identically distributed for all j in i ,

( )2 0 , ij iE iid N σ∼ . The model and assumptions enable the joint distribution of fixed and random effects to be specified. This density, when maximized jointly with respect to the fixed and random effects, leads to Henderson’s mixed model equations{Henderson, 1959 #751}. In model (1.1), assuming known variances, the estimator of the fixed effects is the

weighted least squares estimator, 1

ˆn

i ii

w Yµ=

= ∑ where *

* 1

1/

1/

ii n

ii

vw

v=

=

∑, 1

im

ijj

ii

YY

m==∑

, and

2i

ii

vmσ

σ 2= + . Solution of the mixed model equations results in the BLUP of iB .

Combining the estimator of µ and the predictor of iB , the predictor of iBµ + is given by

( )ˆ î ik Yµ µ+ − (1.2),

where 2

ii

kvσ

= .

In practice, the variance parameters are replaced by the maximum likelihood or restricted maximum likelihood estimates. {McCulloch, 2001 #1577} note that there are many other derivations of the BLUP estimates (discussed extensively by {Robinson, 1991 #723}). A derivation of particular note is the Bayesian derivation, where (1.2) is the expected value of the posterior distribution, assuming normality. The normality assumptions are not necessary, and can be replaced by knowledge of the first and second moments. None of these derivations or discussions account for the impact that different cluster size has on the predictors.

1.3 Super-population Sampling Models Predictors of a linear combination of fixed and random effects (corresponding to the

average response for a cluster) in two-stage sampling settings were developed by {Scott, 1969 #143} using a model based survey sampling approach. In their approach, a finite population is viewed as the realization of a set of random variables. We refer to the set of random variables as the super-population, and specify a model for it. The parameters of interest are linear combinations of values in the finite population, such as the such as the


4

latent value for a realized PSU. Such parameters can be defined for all clusters in the finitie population. Since only a part of the finite population is observed in a sample, the essential statistical problem is how best to predict the values of the remaining random variables. Predictors are constructed using the joint distribution assumed for the super-population {Royall, 1976 #986}{Bolfarine, 1992 #1027}{Valliant, 2000 #1578}. This approach to statistical inference in survey sampling is called model-based, since inference is based on the model assumed for the super-population.

We define the nested super-population by a model, and first and second moments. The random variables ijY , 1,..., ; 1,..., ii N j M= = in the super-population are assumed to

satisfy ( )ijE Y µ= and

( ) 2 2

2

cov , when ;

when ;0 otherwise.

ij kl iY Y i k j l

i k j l

δ σ

δ

= + = =

= = ≠=

{Scott, 1969 #143} used this super-population model to derive predictors of linear combinations of elements in the super-population. Of special interest is the linear combination that corresponds to the latent value of a PSU. Scott and Smith frame the statistical problem as prediction of the remaining unobserved SSUs in a PSU. A Bayesian derivation (assuming the super-population is normally distributed), and a distribution free derivation (based on minimizing the expected mean squared error (MSE) of a linear predictor assuming that PSUs and SSUs within PSUs are exchangeable) are given, and shown to result in the same predictor given by

( ) ( )* * *ˆ ˆ1i i i i if Y f k Yµ µ + − + −

( ) ( )* * *ˆ ˆ1i i i i i if Y f f k Yµ µ + − + − (1.3)

if the PSU is partially realized in the sample, or *µ if the PSU is not included in the sample,

where * *

1

ˆn

i ii

w Yµ=

= ∑ , where ii

i

mf

M= ,

**

**

* 1

1/

1/

ii n

ii

vw

v=

=

∑,

2* 2 ii

i

vmσ

δ= + and **ii

kvδ 2

= . This

predictor has an appealing interpretation as the weighted sum of two terms: the sample mean for the realized PSU, and predictors of the remaining SSUs for the realized PSU. The weighting factors are the proportion of SSUs that are observed and the proportion that are not observed. For PSUs that are not in the sample, the predictor simplifies to the weighted sample mean.

There is an obvious similarity between equation (1.2) and equation (1.3). When if is small enough that the first term in equation (1.3) can be ignored, and 1 1if− ≅ , the two equations appear to be identical, with the exception of different representations for the variance components. The best linear unbiased predictor (in equation (1.2)) predicts unobserved SSUs in a realized PSU. If the number of SSUs in the realized PSU is so large that the observed SSUs are a negligible fraction of the total for the PSU, then the realized PSU mean is estimated by the predicted values of the unobserved SSUs for the PSU. When a non-trivial portion of the SSUs are observed in the sample for a realized PSU, then the estimate of the realized PSU mean is a weighted average of the sampled, and predicted


5

values of the unobserved SSUs. This provides a strong intuitive appeal to the prediction- based approach as advocated by Valliant, Dorfman, and Royall (2000).

As discussed by Scott and Smith, the assumption of normality is not needed. Instead, the model assumptions correspond to assuming PSUs are exchangeable, and SSUs are exchangeable within PSUs. The predictor is assumed to be a linear function of the sample responses. Coefficients of the predictor are derived so to minimizes the expected MSE over the super-population. The MSE is required to be bounded, leading to a constraint in the minimization.

2. THE TWO-STAGE PERMUTATION MODEL FOR A FINITE

POPULATION We derive predictors of realized random effects from a two-stage clustered sample of

a finite population using the prediction based approach to survey sampling {Royall, 1976 #986}. A principal distinction is the use of a probability model based directly on the two-stage sampling as in {Stanek, 2003a #1697}, and retention of the identifiability of clusters throughout the process. The results, while similar in form to those of {Scott, 1969 #143}{Valliant, 2000 #1578}, do not require positing a super-population. We first define the population and target parameters. Next we define the random variables representing a two-stage random permutation of the population, and variations in the representation that preserve representation of the cluster’s identity.

2.1 Finite Population Parameters and Re-parameterizations

We begin by defining finite population parameters and representing response for a

subject in a simple deterministic model. The mean and variance of cluster s is defined as

1

1 sM

s stts

yM

µ=

= ∑ and ( )22

1

1 1 sMs

s st sts s

My

M Mσ µ

=

−= −

∑

for 1,...,s N= (where we use the survey sampling definition of the parameter 2sσ ). Similarly,

the overall mean, and the variance between cluster means is defined as

1

1 N

ssN

µ µ=

= ∑ and ( )22

1

1 1 N

ss

NN N

σ µ µ=

− = −

∑ .

Using these parameters, we represent the potentially observable response for subject t in cluster s as

st s sty µ β ε= + + (2.1) where ( )s sβ µ µ= − is the deviation of the mean for cluster s from the overall mean, and

( )st st syε µ= − is the deviation of subject t 's response (in cluster s ) from the mean for cluster s . Model (2.1) is called a derived model {Hinkelmann, 1994 #1023}. Defining

( )1 2 N

′′ ′ ′=y y y y where ( )1 2 ss s s sMy y y ′=y , model (2.1) can be summarized

as µ= + +y X Zβ ε (2.2)


6

where =X 1 , 1 s

N

MN s× == ⊕Z 1 and ( )1 2 Nβ β β′ =β . Here, a1 is an 1a× column vector

of ones, and 1

N

ss=⊕A denotes a block diagonal matrix with blocks given by sA {Graybill, 1983

#1613}, and ε is defined similarly to y . None of the terms in model (2.2) are random variables.

2.2 Random Variables and The Two Stage Random Permutation Model

We define a stochastic model that parallels model (2.2) for two-stage cluster

sampling. The sample of clusters corresponds to the first n clusters in a permutation of the vectors of clusters, sy . Similarly, the sample of subjects in cluster s corresponds to the first

sm subjects in a permutation of subjects the sample clusters. With this goal in mind, we define random variables whose realization corresponds to a two-stage permutation of the population. Assuming each realization of the permutation is equally likely, the random variables formally represent two-stage sampling {Cochran, 1977 #1067}.

We use different terms and indices for the finite population and the permuted population. In the finite population, we index clusters by 1,...,s N= and subjects by 1,..., st M= . In a two-stage permutation of the finite population, we refer to the permuted clusters as primary sampling units (PSUs), and the permuted subjects in a PSU as the secondary sampling units (SSUs). We index the positions of PSUs in the permutation by

1,...,i N= and the positions of SSUs by 1,..., sj M= . In the finite population, we represent the response for subject t in cluster s by sty ; in the permuted population, we represent the random variable for response of the subject in position j for the cluster in position i by ijY .

Sample indicator random variables relate sty to ijY . The indicator random variable

isU takes on a value of one when PSU i is cluster s , and a value of zero otherwise; the

indicator random variable ( )sjtU takes on a value of one when SSU j in cluster s is subject t ,

and zero otherwise. As a consequence, the random variable corresponding to PSU i and SSU j in a permutation is given by

( )

1 1

sMNs

ij is jt sts t

Y U U y= =

= ∑∑ (2.3).

Defining ( ) ( )( ) ( ) ( )1 2 s

s s s sM=U U U U where ( ) ( ) ( )( )( )

1 2 s

ss sst t t M tU U U ′=U is an 1sM ×

column vector, a vector representing permutations of subjects in a given cluster is given by the 1sM × dimension vector ( )s

sU y . With this notation, the SSUs for the cluster in position i

in a permutation of clusters are of the form ( )sis sU U y . The dimension of this vector depends

on which cluster occurs in position i in a permutation, as determined by the realization of the random variables isU , 1,...,s N= .

Since these sizes differ for different clusters, when cluster vectors are permuted, the interpretation of which unit corresponds to SSU j in PSU i in a vector representing a two-stage permutation of the population is problematic. For example, if 3N = , 1 2 2M M= = , and


7

3 3M = , consider two permutations of clusters. Suppose the first permutation has cluster 1s = in position 1;i = 2s = in position 2;i = 3s = in position 3i = . Assume the second

permutation has clusters 3, 1, and 2 in positions 1,...,3i = , respectively. We can represent the population by a random vector for the first permutation as

( )( ) ( )( ) ( )( )1 2 31 2 3

′ ′ ′ ′

U y U y U y , and for the second permutation as

( )( ) ( )( ) ( )( )3 1 23 1 2

′ ′ ′ ′

U y U y U y . Although both vectors are of dimension 1× (where

7= ), the third SSU in the first permutation is in PSU 2i = , while the third SSU in the second permutation is in PSU 1i = . The position of the SSUs in the permuted population is not sufficient to retain the identity of the PSU for the SSU. Basically, the problem is that the dimension of the random variables in a PSU depends on the dimension of the cluster that is realized. This problem is a result of limitations in the traditional representation used for random variables in the context of two stage permutations. The ambiguity is a result of inadequate notation; it gives rise to ambiguity in analysis and interpretation.

The notational ambiguity is avoided by expanding the set of random variables that

represent the two-stage permutation to a vector of dimension 2

1

1N

ss

N M=

× ∑ whose elements

are of the form ( )* sisjt is jt stR U U y= . We arrange these random variables in the vector *R where

* * * *1 2 N+ + +

′ ′ ′ ′=

R R R R with * * * *1 2 ss s s sM+

′ ′ ′ ′=

R R R R and where

( )* ( )sst s t sty= ⊗R U U is of dimension 1sNM × where ( )1 2s s s NsU U U ′=U for

1,...,s N= . The position of the random variables in *R uniquely identify a cluster and unit. Using the expanded representation and the parameterization from (2.2), the mixed model is represented by

1 1 1 1 1

2 2 2 2 2*

N N N N N

ββ

µ

β

+

+

+

= + + +

X X 0 0 X 0 0X 0 X 0 0 X 0

R E

X 0 0 X 0 0 X

εε

ε

(2.4)

where 2sNM

ssNM

=1

X and s

s

NMs M

sNM+ = ⊗1

X I .

Expressions for the expected value and variance of *R can be determined using elementary properties of the indicator random variables. Under the two-stage random permutation model, using the subscript 1ξ to denote expectation with respect to permutations of the clusters and using the subscript 2ξ to denote expectation with respect to permutations of units in a cluster,


8

( )1 1 1

2 2 2*

N N N

Eξ ξ

µµ

µ

1 2

+

+

+

= +

X 0 0 X 0 00 X 0 0 X 0

R

0 0 X 0 0 X

ε , (2.5)

and

( )

( )

1

1 2

1 1

2 2

* 221 1

1 11 1

2 22 2

1 1var1 1

11

s

s

N N

MN Ms s Nst M s s Ns t

s s s

M MN N

M MN N

M MN N N N

N N

yM M N N M

M M

M MN N

M M

ξ ξ = =

′ ′ = ⊕ ⊕ − ⊗ ⊗ + ⊗ ⊗ − −

⊗ ⊗ ⊗ ⊗

⊗ ⊗ ⊗ ⊗

− −

⊗ ⊗ ⊗ ⊗

Jy y IR P y y P

1 1y P y P

1 1y P y P

1 1y P y P

′

(2.6)

where 1a a aa= −P I J .

2.3. Defining Random Variables of Interest

We assume that there is interest in a linear combination of PSU totals, *P ′= g R or

means defined by * * *P ′= g R where ( )1 s s

N

N M Ms=

′′ ′ ′ ′= ⊕ ⊗ ⊗ g 1 1 b 1 (for totals) or

*

1

s

s

N MN Ms

sM=

′ ′′ ′ ′= ⊕ ⊗ ⊗

1g 1 1 b (for means), where ( )( )ib=b is an N -dimensional column

vector of constants. Of principal interest is the linear combination that defines the total for

PSU i given by *i iP ′= g R , or the mean for PSU i given by * * *

i iP ′= g R , where

( )1 s s

N

i N M i Ms=

′′ ′ ′ ′= ⊕ ⊗ ⊗ g 1 1 e 1 (2.7)

and


9

*

1

s

s

N Mi N M is

sM=

′ ′′ ′ ′= ⊕ ⊗ ⊗

1g 1 1 e , (2.8)

where ie is an N -dimensional column vector with the thi value equal to one, and other values

equal to zero. Note that

1

N

i is s ss

P U M µ=

=∑ , (2.9)

while

*

1

N

i is ss

P U µ=

=∑ . (2.10)

We develop predictors of the random variables given by (2.9) and (2.10) next.

3. PREDICTING A PSU TOTAL OR MEAN BASED ON A TWO-STAGE SAMPLE

We develop a predictor of a PSU total (2.9) (or mean (2.10)) based on the two-stage

without replacement simple random sampling probability model (2.4), where sampling

results in a single non-stochastic measure on each of sm SSUs from each of the n PSUs. We

require the predictor to be a linear function of the sample, to be unbiased, and to minimize

the expected value of the mean squared error (MSE). The basic strategy is given in many

places {Scott, 1969 #143}{Royall, 1976 #986}{Bolfarine, 1992 #1027}{Valliant, 2000

#1578}. The elements of *R are first partitioned into a sampled and remaining portion. We

assume that the elements in the sampled portion will be observed, and express iP as the sum

of two parts, one which is a function of the sample, and the other which is a function of the

remaining random variables. Then, requiring the predictor to be a linear function of the

sampled random variables and to be unbiased, coefficients are evaluated that minimize the


10

expected value of the mean squared error, ( )1 2ˆvar i iP Pξ ξ − . This general development was

given in Theorem 2.1 by {Royall, 1976 #986}.

TO HERE 2/12/03

Conceptually, Royall’ theorem can be applied directly to form predictors, but

practically is complicated by singularities in ( )*var R . The singularities are the result of the

expansion of random variables needed to explicitly track clusters. We avoid these

complications by developing solutions to a simpler problem based on a ‘collapsed’ set of

random variables, Y . To do so, we first express * *′ ′= +R A Y B R . Different definitions of

Y are possible. For definitions of Y that result in *i iP ′ ′= g A Y and *′ =B R 0 , we develop

predictors of iP based solely on Y . It turns out that in this setting, unbiased predictors can

be developed only when the sampling fraction is constant for all clusters. In settings where

the sampling fraction differs between clusters, we develop an unbiased predictor of

i iP• • •′= g Y , where ( )2

*i iP E Pξ

•= is given by (2.10).

The key step in these developments is an appropriate definition of the ‘collapsed’ set

of random variables, Y . We consider three different definitions of Y . The first definition,

*′=Y C R , collapses random variables to PSU totals for the sample and remainder. The

second definition, * * *′=Y C R , collapses random variables to scaled PSU totals for the

sample and remainder where the scaling factor is the cluster size. A third strategy,

*• •′=Y C R , is to collapse random variables to PSU means for the sampled and remaining

units. We proceed to outline details of each of these developments.

3.1. Predictors of PSU Totals Based on Collapsing to Cluster Totals


11

We develop the best linear unbiased predictor of (2.9) based on collapsing *R to a

vector of PSU totals for the sample and remainder such that *′=Y C R where

( )1 2 N′ ′ ′ ′=C C C C and

( ) ( )

( )2

1

21

ss s

ss s s

s

m M ms M N

N NM M mm

× −

× −×

′ ′′ = ⊗ ⊗ ′

1 0C 1 I

0 1 (3.1).

For PSU i , the corresponding elements of Y are given by ( )

1

11

N

is s s sIs

N

is s s sIIs

U f M Y

U f M Y

=

=

−

∑

∑ where

( )

1 1

1 s sm Ms

sI jt stj ts

Y U ym = =

=

∑ ∑ and ( )

1 1

1 s s

s

M Ms

sII jt stj m ts s

Y U yM m = + =

= −

∑ ∑ . Defining

( ) 1

21

NN

s s s Ns N−

=

′ ′= ⊕ ⊗

1A C C C I , and

( ) ( ) ( )1

21 1 1

ss

s ss s

N N N mMs s s N N ss N M NMs s s M msM

−

= = = −

′ ′ ′ = ⊕ ⊗ ⊕ + ⊕ ⊗ ⊗ + ⊗

P 0JB C C C P I C I P I0 P ,

we can express * *′ ′= +R A Y B R (see c02ed34.doc). Then using (2.7), we find that *

i i iP ′ ′= =g R g Y , (3.2) where

2i i′ ′ ′= ⊗g e 1 . (3.3)

Also, ( ) ( )1 2 1 2

*var vari iξ ξ ξ ξ′ ′=g R g Y (see c02ed37.doc, p1). This provides the rational for

developing predictors of (3.2) based on Y . We highlight the main results for the development. More details of the development

are given by Stanek (2002, c02ed11, 12v1, 34-38.doc). Using sC defined by (3.1), model (2.4) simplifies to

( )2 22 1

f fiN NN f i fi

TT T

ττ τ×

= ⊗ + + − −

Y 1 I I E (3.4)

where ( )1

N

fi is s s fs

T U f τ τ=

= −∑ , ( )1

N

i is ss

T U τ τ=

= − ∑ , 1

N

s ss

f

f

N

ττ ==

∑, 1

N

ss

N

ττ ==

∑,

s s sMτ µ= , ss

s

mf

M= , and ′=E C E . Model (3.4) is a mixed model where the expected

value and variance of Y are given by


12

( ) ( )2f

Nf

Eξ ξ

ττ τ1 2

= ⊗ −

Y 1 I (3.5)

and

( )1 2

2

1

2

1

0var

0 1 1 1

11 11

Ns s ss s

Ns s s s

Nf fs sN

N ss f fs s

f f fMf f fN

f fN

f fN N

ξ ξσ

τ ττ

τ τ τ τ

=

=

′ = ⊗ − + − − − ′′ − ⊗ − − − − −−

∑

∑

Y I

JI

. (3.6)

The model contains two types of random effects. One random effect, iT , represents deviations of cluster totals about the simple average cluster total for the thi selected cluster. The second random effect, fiT , represents deviations of the expected cluster sample totals

about the simple average expected cluster sample total for the thi selected cluster We develop a predictor of (3.2) corresponding to the total of the thi PSU in the

permutation. We require the predictor to be a linear function of the sampled random variables, to be unbiased, and to minimize the expected value of the mean squared error. First, we re-arrange Y into a sampled and remaining vector by pre-multiplying by

I

II

=

KK

K where

( )( )

21 0I n n N nn N × −×

= ⊗

K I 0 (3.7)

and

( )

( )( )

( )

22

0 1n n N n

IIN n N N

N nN n n

× −

− + × −− ×

⊗ = ⊗

I 0K

0 I I (3.8)

resulting in I

II

=

YK Y

Y. In a similar manner, we define [ ]2

IN

II

= ⊗

XK 1 I

X where

( )1 0I n= ⊗X 1 and ( )2

0 1nII

N n−

⊗ = ⊗

1X

1 I. In terms of the partitioned random variables,

i iI I iII IIP ′ ′= +g Y g Y where iI iI′ ′=g e and ( )2iII iI iII′′ ′ ′= ⊗g e e 1 , and ( )i iI iII′ ′ ′=e e e where iIe

is an 1n× vector, and iIIe has dimension ( ) 1N n− × .

Collection of the study data will result in realizing the values of IY . We require the predictor of iP to be a linear function of the sample data, ( )i iI IP ′ ′+g a Y= and be unbiased


13

such that ( )1 2 i iE P Pξ ξ − = 0 . Since ( ) fI I

fEξ ξ

ττ τ1 2

= −

Y X and ( ) fII II

fEξ ξ

ττ τ1 2

= −

Y X ,

the unbiased constraint implies that ( ) 0fI iII II

f

ττ τ

′ ′− − a X g X = , or that I iII II′ ′=a X g X .

This expression simplifies to ( ) ( )0 1n iII N n−′ ′=a 1 e 1 (3.9)

which is clearly impossible, since 0 can not equal 1. As a result, unbiased predictors of (3.2) are not possible for model (3.4). Suppose, however, that second stage samples are selected with probability

proportional to size, such that s

s

mf

M= for all 1,...,s N= . Then f fτ τ= , and the unbiased

constraint can be expressed as ( ) 01I iII II

ffτ

′ ′− − a X g X = implying that

( ) 01I iII II

ff

′ ′− = − a X g X . The unbiased constraint now simplifies to

1n iII N n

ff−

−′ ′= +a 1 e 1 . These conditions will be satisfied if s sn

s

M mm−′ =a 1 for all

1,...,s N= when 1,...,i n= (corresponding to predicting a PSU in one of the first n positions

in the permutation), or if sn

s

Mm

′ =a 1 for all 1,...,s N= when 1,...,i n N= + (when predicting

a PSU in one of the last 1,...,n N+ positions in the permutation (see c02ed11.doc)). We assume second stage samples are selected with probability proportional to size in

the subsequent developments, and proceed to develop a predictor based on minimizing the expected value of the mean squared error. Using this notation, model (3.4) simplifies to

τ= + +Y X Z T E (3.10)

where 1N

ff

= ⊗ −

X 1 , 1N

ff

= ⊗ −

Z I , ( )1 2 Nτ τ τ ′=τ , and ( )Nτ= −T U 1τ .

Model (3.10) is a mixed model for cluster totals. The expression for the variance (3.6) simplifies to (see c02ed12v1.doc, p8)

( ) ( )1 2

2

2

0var

0 1 1 1

1 1

e N

NN

f f ff f f

f ff fN

ξ ξ

τ

σ

σ

+

′ = ⊗ − − − − ′ + − ⊗ − −

Y I

JI

(3.11)


14

where ( )2

2 1

1

N

ss

Nτ

τ τσ =

−=

−

∑, and

2

2 1

N

s ss

e

M

N

σσ =

+ =∑

.

The best linear unbiased predictor of the thi selected cluster total is given by

( )1,

ˆ ˆ î iI I iII II I II I I IP α α− ′′ ′+ + −

g Y g X V V Y X= where I nf=X 1 , ( )1

1

n

IIN n

ff

f−

−

= ⊗ −

1

X1

,

ˆ nIfn

α′

=1

Y , 2 2

I n nf

fvN

τσ= −V I J , 2 2ev f τσ σ+ += + , 2 2 2

eτ τσ σ σ+ += − and

( )( ) ( )

22

, 21 1

1I II n nn N n n N n

ff fN fτ

τσ

σ + × − × −

= − − ⊗ − V I 0 J J (see c02ed12.doc and

c02ed42.doc for details). Using these expressions,

( )1 1ˆ N n nni iI I iI n I iII I

fP fkf n f n

− × − ′ ′ ′= + + +

JJe Y e P Y e Y (3.12)

where 2

kvτσ += . When in addition, all cluster sizes are equal, (3.12) simplifies to

( )i iI I iI n n I iII N nP m M m Y k M Y−′ ′ = + − + + e Y e 1 P Y e 1

where 1

m

ijj

i

YY

m==∑

, ( )1 2I nY Y Y ′=Y , and 1 1

n m

iji j

YY

nm= ==∑∑

. (see c02ed39.doc and

c02ed42.doc). The variance of (3.12) when i n≤ simplifies to

( ) ( ) ( ) ( ) ( )1 2

2 2 2 21 1 1ˆvar 1 1 1 1 1 1i i eP P f f f k fk fn n nfξ ξ τσ σ+ +

− = − − + − − + − − +

and when i n> reduces to ( )1 2

2 2 21 1ˆvar i i efP P

n fξ ξ τ τσ σ σ +

−− = + +

. (see c02ed49.doc)

3.2. Predictors of PSU Means based on Weighted PSU Totals Predictors of the PSU mean (2.10) can be developed from a collapsed set of random

variables corresponding to a weighted PSU totals by using * ss

sM′

′ =C

C in lieu of (3.1) to define

* * *′ =C R Y where ( )* * * *1 2 N′ ′ ′ ′=C C C C . The elements in *Y corresponding to PSU i


15

are given by ( )

1

11

N

is s sIs

N

is s sIIs

U f Y

U f Y

=

=

−

∑

∑. Defining ( )1 2 Nf f f ′=f leads to the model

(similar to (3.5)) that

( )* *N

NN

′ = ⊗ ′−

f1Y E

1 fµ + (3.13)

where * *′=E C E . Proceeding in the same manner as in Section 3.1, the thi selected PSU mean is given by * * * *

i i iP ′ ′= =g R g Y where i′g is given by (3.3). Similar developments lead to the same set of unbiased constraints given by (3.9) which cannot in general be satisfied. Once again, requiring second stage samples to be selected with probability proportional to size enables development of an unbiased predictor of (2.10).

With the assumption of probability proportion to size second stage sampling, model (3.13) simplifies to * *µ= + +Y X Z B E with variance given by

( ) ( ) ( ) ( )

( ) ( )

1 2

* *2

2

0var

0 1 1 1

1 1

e N

NN

f f ff f f

f ff fN

ξ ξ σ

σ

′ = ⊗ − − − − ′ + − ⊗ − −

Y I

JI

where 2

*2

1

1 Ns

es sN M

σσ

=

= ∑ .

The best linear unbiased predictor of the thi selected cluster mean is given by

( )* * * 1 * *,

ˆ ˆ î iI I iII II I II I I IP α α− ′′ ′+ + −

g Y g X V V Y X= where * *ˆ nIfn

α′

=1

Y ,

2 2* *I n n

ffvNσ

= −V I J , *2 2 *2eσ σ σ= − , * *2 *2

ev fσ σ= + and

( ) ( ) ( )

2* *2, 1 1

1I II n n n N nff f

N fσσ × −

= − − ⊗ −

V I 0 J J (similar to the development

in c02ed12.doc and developed in c02ed42.doc). Using these expressions,

( )* * * * *1 1ˆ N n nni iI I iI n I iII I

fP fkf n f n

− × − ′ ′ ′= + + +

JJe Y e P Y e Y (3.14)


16

where *2

**k

vσ

= . When all cluster sizes are equal, simplifies to

( )* *i iI I iI n n I iII N n

M mmP Y k YM M −

−′ ′ ′ = + + + e Y e 1 P Y e 1 .

The variance of (3.14) when i n≤ simplifies to

( ) ( ) ( ) ( ) ( )1 2

2* * *2 * *2 *21 1 1ˆvar 1 1 1 1 1 1i i eP P f f f k fk fn n nfξ ξ σ σ

− = − − + − − + − − +

and when i n> reduces to ( )1 2

* * 2 2 *21 1ˆvar i i efP P

n fξ ξ σ σ σ −

− = + +

. (see c02ed49.doc)

3.3. Predictors of a PSU Mean based on Sampled and Remaining SSU Averages Predictors of a PSU mean (2.10) can be developed from a collapsed set of random

variables corresponding to sampled and remaining SSU averages by using

( ) ( )

( )2

1

2

1

1

1

ss s

ss

s ss

m M ms

ss M NN NM

M mms s

m

M m

× −•

×−×

′ ′′ = ⊗ ⊗ ′ −

1 0C 1 I

0 1(3.15)

in lieu of (3.1) to define *• •′ =C R Y where ( )1 2 N• • • •′ ′ ′ ′=C C C C . The elements in *Y

corresponding to PSU i are given by 1

1

N

is sIsN

is sIIs

U Y

U Y

=

=

∑

∑. We define a linear combination of these

random variables as

( )( )1

1N

i i is sI sIIs

P U cY c Y• • •

=

′= = + −∑g Y (3.16)

where ( )1i i c c•′ ′= ⊗ −g e for some constant c . This random variable, although not equal

to the PSU mean, has an expected value that equals the PSU mean, ( )2

1

N

i is ss

E Uξ µ• •

=

′ = ∑g Y .

Notice that ( )( )( )* *

11

N

i is s sI sIIs

U cY c Yµ• •

=

′ ′= + − + −∑g R g Y . Unlike the development in

sections 3.1 and 3.2, the second term in this expansion is not identically equal to zero. We ignore this term in developing a unbiased linear predictor for (3.16) that minimizes the expected mean squared error for any two stage sampling plan. Using this collapsed set of random variables, model (2.4) reduces to


17

2 1Nµ• • • •

×= +Y X Z B E+

where 2N• =X 1 and 2N

• = ⊗Z I 1 , and • •′=E C E . The variance is given by

( )( )

( ) ( )1 2

2

1 *22 22

1

0var

0

Ns

s sN N NN

s

s s s

NmN

N M m

ξ ξ

σ

σσσ

2=•

=

= ⊗ + ⊗ − ⊗ −

∑

∑Y I I J J J .

We partition these random variables into a sample and remaining vector such that

I

II

••

•

=

YK Y

Y and similarly partition iI

iiII

••

•

′ ′ = ′

gK g

g such that iI iIc•′ ′=g e and

( ) ( )( )1 1iII iI iIIc c c•′ ′ ′= − ⊗ −g e e . Collection of the study data will result in realizing the

values of I•Y . We require the predictor of iP• to be a linear function of the sample data,

( )i iI IP• • •′ ′+g a Y= and be unbiased such that ( )1 2ˆ 0i iE P Pξ ξ• •− = . The constraint will be

satisfied if 1n c′ = −a 1 when i n≤ , or if 1n′ =a 1 when i n> . The best linear unbiased

predictor is given by

( )1,

ˆ ˆ îI I iII II I II I I IP α α• • • • • • •− • • ′′ ′+ + −

g Y g X V V Y X=

which simplifies to

( ) ( )ˆ 1 N n nniI I iI n I iII IP c c ck

n n− ×• • • • •

′ ′ ′+ − + +

JJe Y e P Y e Y= (3.17)

where *

kvσ 2

••= and 2 *2

eIv σ σ• •= + and 2

2

1

1 Ns

eIs sN mσσ •

=

= ∑ . The variance of (3.17) when

i n≤ simplifies to


18

( )( ) ( ) ( )

1 2

*22*2 *2 2

ˆvar

1 22 1 1

i i

eII

P P

ck n vck v c cn n

ξ ξ

σσ σ σ

• •

• •• • •

− =

− − = − − + − + +

and when i n> reduces to

( ) ( )1 2

2*2 *2 2 2 21 1ˆvar 1 2 1i i n eI eIIP P v f c cN n N nξ ξ σ σ σ σ σ σ• • • 2 2 • • − = − + + + − + + − − −

(see c02ed51.doc) where we define ( )

22

1

Ns

eIIs s sN M m

σσ •

=

=−∑ .

6. Application We illustrate predictors for a selected physician’s practice in a randomized controlled

trial (Ockene, Hebert et al. 1996) designed to evaluate the effectiveness of a training program for physician-delivered nutrition counseling, alone and in combination with an office support program on dietary fat intake and blood LDL-cholesterol levels in patients with hyperlipidemia (WATCH). Patients enrolled in the study had duplicate venous cholesterol measures made at baseline and at 1 year follow-up. In addition, a 24 hour dietary recall was reported at baseline and 1 year follow-up. Data from these measures, in addition to baseline questionnaire data and intervention data were used for study evaluation.

The WATCH study had a cluster randomized design, with patients in physician practices forming the basic clusters. We focus our attention here on a single dependent variable, percent of Kcal from saturated fat, which we consider as a difference constructed by taking the Baseline minus 1 year measure for each patient. We also limit discussion to a single treatment arm (III) to which fourteen physician practices were randomly assigned. A summary of the average change for different physician practices assigned to this intervention is given in Table 1. We focus on predicting response for physician practice number 71. The estimate of average change in SFAT is -1.47% Kcal. This is the estimate that results treating physician practices as fixed effects. If physician practices are considered to be random, the usual mixed model estimate of the realized random effect is -0.99% Kcal (from Proc Mixed). Using a two-stage model-based approach (Scott and Smith 1969), the best linear unbiased predictor will depend on the population size. Unfortunately, the number of eligible patients in each practice was not recorded in the study. We make two assumptions about population sizes to illustrate differences in the estimator, first assuming all physician practices see

= 100M eligible patients, and second assuming half the eligible patients in a practice are included in the study.

A mixed model was fit to these data, resulting in an estimate (REML) of the variance between MDs of 0.431. We use methods of moments estimates to develop estimates of other variance components for use in developing predictors.

Table 1. List of Average Change in SF as Pct of Kcal by Physician for n=14 physicians in the Counseling and Support Intervention


19

Average change # of in Percent patients Sat.Fat Variance 1 -3.25 . 30 -2.31 35.48 11 -2.16 33.44 42 -2.09 34.18 25 -1.47 19.88 20 -1.38 21.82 25 -0.92 23.29 15 -0.82 14.59 10 -0.69 22.39 29 -0.46 18.70 15 0.36 28.73 38 0.37 27.70 16 0.75 17.53 33 0.91 20.97 Source: ced02p07.sas 11/16/02 by EJS

We develop predictors using variance components developed using the methods of

moments corresponding to sections 3.1-3.3. The method of moment estimate of 2τσ is given

by ( )( )

2

2 12 1

n

ii

Y Ys

f nτ=

−=

−

∑ where

1

im

i ijj

Y Y=

= ∑ and 1

1 n

ii

Y Yn =

= ∑ ; we use the REML of 2σ from a

mixed model; and we estimate 2eσ + , *2

eσ , 2eIσ • and 2

eIIσ • by 2 2

1

1 n

e i ii

s M sn+

=

= ∑ , 2

*2

1

1 ni

ei i

ssn M=

= ∑ ,

22

1

1 ni

eIi i

ssn m

•

=

= ∑ , and 2

2

1

1 ni

eIIi i i

ssn M m

•

=

=−∑ where ( )22

1

11

n

i ij iii

s Y Ym =

= −− ∑ . When 1im = , there is

no estimate of the within PSU variance. We alter sums over 1,...,i n= such that the sums exclude PSUs where 1im = .

Still to do: Develop example further, and check that it works. Write discussion.


20

Appendix A. Evaluation of ( )1 2

*varξ ξ R We outline the steps leading to ( )

1 2

*varξ ξ R given by (2.6) using the conditional

expansion ( ) ( ) ( )1 2 1 2 1 1 2 1

* * *| |var var varE Eξ ξ ξ ξ ξ ξ ξ ξ = + R R R . Since the vector *R is

constructed from sub-vectors for each unit given by ( )* ( )sst s t sty= ⊗R U U , we begin by

developing the expected value and variance of these sub-vectors. First,

( )2 1

*|

sMst s st

s

E yMξ ξ

= ⊗

1R U and thus,

1

2

2 1

1 11

2 2*| 2

s

N

M

MM

s ss

MN N

N

M

E MM

M

ξ ξ

⊗ ⊗

⊗ ⊗ = = ⊗ ⊗

⊗ ⊗

1y U

11y U

R y U

1y U

.

Now * *

1 1 * * * 2* *

var s s s s sM M M M MNs s s s s s s s

s s s s s

EM M M N M Mξ ξ

′ ′ ′ ′ ′⊗ ⊗ = ⊗ ⊗ − ⊗ ⊗

1 1 1 1 1Jy U y y U U y y

simplifies to *

1 1* * 2*

var s s sM M MNs s s s s s

s s s

EM N M Mξ ξ

′ ′ ′⊗ ⊗ = ⊗ − ⊗

1 1 1Jy U y y U U or

( ) *

1 2 1 1

11 1 2

2 1 2

*| * * 2

*

1 1 1 2 121 1 2 1

2 1 2 2 21 2 2

var

1 11 1

11 1

s s

N

M MNs s s s

s s

M MM M MN N N N

N

M M MN N

E EN M M

M N M M N M M

N M M MN

ξ ξ ξ ξ

′ ′ ′ = ⊗ − ⊗ ′′− − ′ ′ ′⊗ ⊗ ⊗ ⊗ ⊗ ⊗ − −

′− ′ ′⊗ ⊗ ⊗ ⊗ −=

1 1JR y y U U

1 1J 1 1y y P y y P y y P

1 1 Jy y P y y P 2

1 2

22

1 2 21 2

11

1 11 1

N

N N N

M MN N

N

M M M M MN N N N N N N

N N N

N M M

N M M N M M M

′− ′ ⊗ ⊗ −

′ ′ − − ′ ′ ′⊗ ⊗ ⊗ ⊗ ⊗ ⊗ − −

1 1y y P

1 1 1 1 Jy y P y y P y y P

since ( )1 * if *N

s sE s sNξ ′ = =IU U and ( ) ( ) ( )

1 *1 if *

1s s N NE s sN Nξ ′ = − + ≠

−U U I J . We

express this as


21

( )

1 1

2 2

1 2 1

1 11 1

2 2*| 2 221

1 1var1

s

N N

M MN N

M MNM N N

s s Nss

M MN N N N

N N

M M

E M MN M N

M M

ξ ξ ξ =

′ ⊗ ⊗ ⊗ ⊗

⊗ ⊗ ⊗ ⊗ ′ = ⊕ ⊗ ⊗ − −

⊗ ⊗ ⊗ ⊗

1 1y P y P

1 1J y P y P

R y y P

1 1y P y P

.

Next we evaluate ( )1 2 1

*|varEξ ξ ξ R . First,

( ) ( ) ( )2 2

* ( ) ( ) 2 22

1var s

s

Ms sst s s t t st s s M st

s s

E y yM Mξ ξ

′ ′ ′= ⊗ − = ⊗

JR U U U U U U P . Also,

( ) ( ) ( ) ( )2 2

* * ( ) ( )* * * *2

1cov ,1

s

s

Ms sst st s s t t st st s s M st st

s s s

E y y y yM M Mξ ξ

−′ ′ ′= ⊗ − = ⊗ −

JR R U U U U U U P .

Using these expressions, ( ) ( )2

* 2

1

1var1

s

s

Ms s

s st s s Mts s

yM Mξ + =

′′= ⊕ − ⊗ ⊗ −

y yR U U P . As a result,

( )1 2 1

* 2| 1 1

1var1

s

s

MNs s

st N Ms ts s

E yM Mξ ξ ξ = =

′ = ⊕ ⊕ − ⊗ ⊗ −

y yR I P , and combining terms results in the

expression for the variance given by (2.6).

Documents

Predicting Realized Cluster Parameters from Two Stage ... · two-stage random permutation model for cluster sampling, unequal size clusters complicates the tracking of target random