Upload
nguyenmien
View
215
Download
0
Embed Size (px)
Citation preview
C03ed02.doc 8/4/2003 3:07 PM i
Predicting Realized Cluster Parameters from Two Stage Samples of Unequal Size Clustered Population
Edward J. Stanek III
Department of Biostatistics and Epidemiology 404 Arnold House
University of Massachusetts 715 North Pleasant Street
Amherst, MA 01003-9304 USA [email protected]
Julio Singer
Departamento de Estatística Universidade de São Paulo
São Paulo, Brazil [email protected]
C03ed02.doc 8/4/2003 3:07 PM ii
ABSTRACT
C03ed02.doc 8/4/2003 3:07 PM iii
Contact Address: Edward J. Stanek III Department of Biostatistics and Epidemiology 404 Arnold House University of Massachusetts at Amherst Amherst, Ma. 01002 Phone: 413-545-3812 Fax: 413-545-1645 Email: [email protected] KEYWORDS: super-population, best linear unbiased predictor, two-stage
sampling, random permutation, optimal estimation, design-based inference ACKNOWLEDGEMENT. Many people contributed to the development of ideas
that lead to this manuscript, including Dalton Andrade, Heleno Bolfarine, John Buonaccorsi, Viviana Lencina, and Oscar Loureiro. This work was developed with the support of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), FINEP (PRONEX), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil and the National Institutes of Health (NIH-PHS-R01-HD36848), USA.
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
1
1. INTRODUCTION
We discuss methods for predicting a cluster mean or total based on a two-stage clustered random sample from a finite population formed from unequal size clusters. Common applications of mixed model theory {Goldberger, 1962 #531}{Henderson, 1984 #1606}{McLean, 1991 #175}{Robinson, 1991 #723}{McCulloch, 2001 #1577} assume infinite size populations and clusters to develop best linear unbiased predictors (BLUP). In the survey sampling literature, predictors have been developed using super-population models {Scott, 1969 #143}{Bolfarine, 1992 #1027}{Valliant, 2000 #1578}. When clusters are equal in size, two-stage random permutation models have been used for prediction that include response error {Stanek, 2003a #1697}.
We follow a similar approach to {Scott, 1969 #143}, but base the probability model solely on the random permutation framework that underlies a two-stage cluster sample. If such a sample is selected, the results are based on the sampling design. No hypothetical super-population is required, and account is made for different size clusters in the finite population. Target parameters correspond to linear combinations of latent values for clusters. The objective of this paper is to extend the results of {Stanek, 2003a #1697} to predicting such target parameters.
The strategy builds on the methods for predicting realized random effects in a population formed by equal size clusters. With mixed model and super-population model approaches, extending methods to unequal size cluster settings is straightforward. Unlike survey sampling results for estimation {Cochran, 1977 #1067}, the different cluster sizes do not alter the analysis. In fact, that is one of the appeals of mixed model methods. Using a two-stage random permutation model for cluster sampling, unequal size clusters complicates the tracking of target random variables through the development of predictors. Tracing through this development, while more complex, retains a clear connection between the target random variable, and the predictor. Avoiding these complications with other methods comes with a cost. The cost is the ambiguity introduced in the interpretation of predictors.
As an example, we consider a cluster randomized controlled trial designed to evaluate the relative effectiveness of a training program for physician-delivered nutrition counseling, alone and in combination with an office support program, on dietary fat intake and blood LDL-cholesterol levels in patients with hyperlipidemia (the Worcester-Area Trial for Counseling in Hyperlipidemia (WATCH) Trial){Ockene, 1996 #890}. Forty-five primary care internists were selected from 60 internists at the Fallon Clinic, a central Massachusetts health maintenance organization (HMO), with each internist randomly assigned to one of three conditions: (I) Usual Care; (II) Physician nutrition counseling training; and (III) Physician nutrition counseling training plus an office support program. Twelve hundred and seventy-eight patients with blood LDL-cholesterol levels in the highest 25th percentile, having previously-scheduled physician visits, were recruited into the study. The number of sampled patients per cluster ranged from 1 to 43.
The WATCH study had a cluster randomized design, with patients in physician practices forming the basic clusters. We focus our attention on a single dependent variable, percent of Kcal from saturated fat, which we consider as a difference constructed by taking the baseline minus one year measure for each patient. There was considerable variability in the magnitude of the average patient’s saturated fat change between physician’s practices within a given intervention. We discuss estimating the change for eligible patients in a
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
2
physician’s practice that accounts for the practice size. The cluster corresponds to a realized physician.
1.1 Basic Notation
We define notation and terminology in the context of two related literatures, mixed
models {Diggle, 2002 #378}{Verbeke, 2000 #1703}{Searle, 1992 #708}{McCulloch, 2001 #1577} and prediction based super-population sampling models {Valliant, 2000 #1578}{Bolfarine, 1992 #1027} that address the basic problem. Let a finite population be defined by a listing of 1,..., st M= units in each of 1,...,s N= clusters, where response for subject t in cluster s is given by sty . We assume that a two-stage cluster sample is to be selected from this population (using without replacement sampling), with sm units selected for each of n clusters.
We represent the sample of clusters by defining a random permutation of the clusters in the original frame. A new label, 1,...,i N= , is assigned to the positions corresponding to different clusters the permutation. Without loss of generality, we assume that the first
1,...,i n= positions contain the clusters that will occur in the sample. In a similar manner, we distinguish the listed subjects in cluster s from a random permutation of units in cluster s , whose positions are labeled by 1,..., sj M= . We again assume that the first 1,..., sj m= positions contain the units in the sample if cluster s is selected. Using these ideas and notation, we represent the population under a two-stage permutation as an ordered list of
1
N
ss
M=
= ∑ random variables, where a model for the unit in the thj position in the cluster in
the thi position is represented by ijY . For ease of exposition, we refer to the cluster that will
occupy position i in the permutation as the thi primary sampling unit (PSU), and to the unit that will occupy position j in the permutation as the thj secondary sampling unit (SSU). PSUs and SSUs are indexed by positions ( i and j ), whereas clusters and units are indexed by labels ( s and t ) in the finite population. Thus, ijY represents the response for the thj
SSU in the thi PSU. We think of sampling as selecting a permutation. Prior to sample selection, the
cluster that will correspond to PSU i is random, and hence the expected value of response for the cluster in the thi position is a random variable. Once the sample has been selected, for i n≤ , we will observe which cluster corresponds to a particular PSU in the sample. We refer to that cluster as the realized PSU, and to the average (or total) for units in that cluster as the latent value for the realized PSU (noting that such effects are often defined as deviations relative to an overall mean). In this context, mixed model and super-population modeling strategies may be advocated for predicting a realized PSU, or linear combinations of PSUs. We first review these approaches. We then define the context for finite population sampling prediction setting the stage for development of the main results in the subsequent section.
1.2 Mixed Models
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
3
In the mixed model literature, the model for the response of the thj SSU in the thi PSU is given by
ij i ijY B Eµ= + + (1.1) where µ corresponds to the overall population mean, iB is a random effect that corresponds to the deviation from the population mean for the thi selected PSU, and ijE is the random
error corresponding to the deviation of the thj selected SSU from the thi selected PSU mean. Typically, additional assumptions are made. The random effects, iB , are assumed
independent and identically normally distributed, ( )20 , iB iid N σ∼ , and independent of the random errors, also assumed independent and identically distributed for all j in i ,
( )2 0 , ij iE iid N σ∼ . The model and assumptions enable the joint distribution of fixed and random effects to be specified. This density, when maximized jointly with respect to the fixed and random effects, leads to Henderson’s mixed model equations{Henderson, 1959 #751}. In model (1.1), assuming known variances, the estimator of the fixed effects is the
weighted least squares estimator, 1
ˆn
i ii
w Yµ=
= ∑ where *
* 1
1/
1/
ii n
ii
vw
v=
=
∑, 1
im
ijj
ii
YY
m==∑
, and
2i
ii
vmσ
σ 2= + . Solution of the mixed model equations results in the BLUP of iB .
Combining the estimator of µ and the predictor of iB , the predictor of iBµ + is given by
( )ˆ ˆi ik Yµ µ+ − (1.2),
where 2
ii
kvσ
= .
In practice, the variance parameters are replaced by the maximum likelihood or restricted maximum likelihood estimates. {McCulloch, 2001 #1577} note that there are many other derivations of the BLUP estimates (discussed extensively by {Robinson, 1991 #723}). A derivation of particular note is the Bayesian derivation, where (1.2) is the expected value of the posterior distribution, assuming normality. The normality assumptions are not necessary, and can be replaced by knowledge of the first and second moments. None of these derivations or discussions account for the impact that different cluster size has on the predictors.
1.3 Super-population Sampling Models Predictors of a linear combination of fixed and random effects (corresponding to the
average response for a cluster) in two-stage sampling settings were developed by {Scott, 1969 #143} using a model based survey sampling approach. In their approach, a finite population is viewed as the realization of a set of random variables. We refer to the set of random variables as the super-population, and specify a model for it. The parameters of interest are linear combinations of values in the finite population, such as the such as the
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
4
latent value for a realized PSU. Such parameters can be defined for all clusters in the finitie population. Since only a part of the finite population is observed in a sample, the essential statistical problem is how best to predict the values of the remaining random variables. Predictors are constructed using the joint distribution assumed for the super-population {Royall, 1976 #986}{Bolfarine, 1992 #1027}{Valliant, 2000 #1578}. This approach to statistical inference in survey sampling is called model-based, since inference is based on the model assumed for the super-population.
We define the nested super-population by a model, and first and second moments. The random variables ijY , 1,..., ; 1,..., ii N j M= = in the super-population are assumed to
satisfy ( )ijE Y µ= and
( ) 2 2
2
cov , when ;
when ;0 otherwise.
ij kl iY Y i k j l
i k j l
δ σ
δ
= + = =
= = ≠=
{Scott, 1969 #143} used this super-population model to derive predictors of linear combinations of elements in the super-population. Of special interest is the linear combination that corresponds to the latent value of a PSU. Scott and Smith frame the statistical problem as prediction of the remaining unobserved SSUs in a PSU. A Bayesian derivation (assuming the super-population is normally distributed), and a distribution free derivation (based on minimizing the expected mean squared error (MSE) of a linear predictor assuming that PSUs and SSUs within PSUs are exchangeable) are given, and shown to result in the same predictor given by
( ) ( )* * *ˆ ˆ1i i i i if Y f k Yµ µ + − + −
( ) ( )* * *ˆ ˆ1i i i i i if Y f f k Yµ µ + − + − (1.3)
if the PSU is partially realized in the sample, or *µ if the PSU is not included in the sample,
where * *
1
ˆn
i ii
w Yµ=
= ∑ , where ii
i
mf
M= ,
**
**
* 1
1/
1/
ii n
ii
vw
v=
=
∑,
2* 2 ii
i
vmσ
δ= + and **ii
kvδ 2
= . This
predictor has an appealing interpretation as the weighted sum of two terms: the sample mean for the realized PSU, and predictors of the remaining SSUs for the realized PSU. The weighting factors are the proportion of SSUs that are observed and the proportion that are not observed. For PSUs that are not in the sample, the predictor simplifies to the weighted sample mean.
There is an obvious similarity between equation (1.2) and equation (1.3). When if is small enough that the first term in equation (1.3) can be ignored, and 1 1if− ≅ , the two equations appear to be identical, with the exception of different representations for the variance components. The best linear unbiased predictor (in equation (1.2)) predicts unobserved SSUs in a realized PSU. If the number of SSUs in the realized PSU is so large that the observed SSUs are a negligible fraction of the total for the PSU, then the realized PSU mean is estimated by the predicted values of the unobserved SSUs for the PSU. When a non-trivial portion of the SSUs are observed in the sample for a realized PSU, then the estimate of the realized PSU mean is a weighted average of the sampled, and predicted
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
5
values of the unobserved SSUs. This provides a strong intuitive appeal to the prediction- based approach as advocated by Valliant, Dorfman, and Royall (2000).
As discussed by Scott and Smith, the assumption of normality is not needed. Instead, the model assumptions correspond to assuming PSUs are exchangeable, and SSUs are exchangeable within PSUs. The predictor is assumed to be a linear function of the sample responses. Coefficients of the predictor are derived so to minimizes the expected MSE over the super-population. The MSE is required to be bounded, leading to a constraint in the minimization.
2. THE TWO-STAGE PERMUTATION MODEL FOR A FINITE
POPULATION We derive predictors of realized random effects from a two-stage clustered sample of
a finite population using the prediction based approach to survey sampling {Royall, 1976 #986}. A principal distinction is the use of a probability model based directly on the two-stage sampling as in {Stanek, 2003a #1697}, and retention of the identifiability of clusters throughout the process. The results, while similar in form to those of {Scott, 1969 #143}{Valliant, 2000 #1578}, do not require positing a super-population. We first define the population and target parameters. Next we define the random variables representing a two-stage random permutation of the population, and variations in the representation that preserve representation of the cluster’s identity.
2.1 Finite Population Parameters and Re-parameterizations
We begin by defining finite population parameters and representing response for a
subject in a simple deterministic model. The mean and variance of cluster s is defined as
1
1 sM
s stts
yM
µ=
= ∑ and ( )22
1
1 1 sMs
s st sts s
My
M Mσ µ
=
−= −
∑
for 1,...,s N= (where we use the survey sampling definition of the parameter 2sσ ). Similarly,
the overall mean, and the variance between cluster means is defined as
1
1 N
ssN
µ µ=
= ∑ and ( )22
1
1 1 N
ss
NN N
σ µ µ=
− = −
∑ .
Using these parameters, we represent the potentially observable response for subject t in cluster s as
st s sty µ β ε= + + (2.1) where ( )s sβ µ µ= − is the deviation of the mean for cluster s from the overall mean, and
( )st st syε µ= − is the deviation of subject t 's response (in cluster s ) from the mean for cluster s . Model (2.1) is called a derived model {Hinkelmann, 1994 #1023}. Defining
( )1 2 N
′′ ′ ′=y y y y where ( )1 2 ss s s sMy y y ′=y , model (2.1) can be summarized
as µ= + +y X Zβ ε (2.2)
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
6
where =X 1 , 1 s
N
MN s× == ⊕Z 1 and ( )1 2 Nβ β β′ =β . Here, a1 is an 1a× column vector
of ones, and 1
N
ss=⊕A denotes a block diagonal matrix with blocks given by sA {Graybill, 1983
#1613}, and ε is defined similarly to y . None of the terms in model (2.2) are random variables.
2.2 Random Variables and The Two Stage Random Permutation Model
We define a stochastic model that parallels model (2.2) for two-stage cluster
sampling. The sample of clusters corresponds to the first n clusters in a permutation of the vectors of clusters, sy . Similarly, the sample of subjects in cluster s corresponds to the first
sm subjects in a permutation of subjects the sample clusters. With this goal in mind, we define random variables whose realization corresponds to a two-stage permutation of the population. Assuming each realization of the permutation is equally likely, the random variables formally represent two-stage sampling {Cochran, 1977 #1067}.
We use different terms and indices for the finite population and the permuted population. In the finite population, we index clusters by 1,...,s N= and subjects by 1,..., st M= . In a two-stage permutation of the finite population, we refer to the permuted clusters as primary sampling units (PSUs), and the permuted subjects in a PSU as the secondary sampling units (SSUs). We index the positions of PSUs in the permutation by
1,...,i N= and the positions of SSUs by 1,..., sj M= . In the finite population, we represent the response for subject t in cluster s by sty ; in the permuted population, we represent the random variable for response of the subject in position j for the cluster in position i by ijY .
Sample indicator random variables relate sty to ijY . The indicator random variable
isU takes on a value of one when PSU i is cluster s , and a value of zero otherwise; the
indicator random variable ( )sjtU takes on a value of one when SSU j in cluster s is subject t ,
and zero otherwise. As a consequence, the random variable corresponding to PSU i and SSU j in a permutation is given by
( )
1 1
sMNs
ij is jt sts t
Y U U y= =
= ∑∑ (2.3).
Defining ( ) ( )( ) ( ) ( )1 2 s
s s s sM=U U U U where ( ) ( ) ( )( )( )
1 2 s
ss sst t t M tU U U ′=U is an 1sM ×
column vector, a vector representing permutations of subjects in a given cluster is given by the 1sM × dimension vector ( )s
sU y . With this notation, the SSUs for the cluster in position i
in a permutation of clusters are of the form ( )sis sU U y . The dimension of this vector depends
on which cluster occurs in position i in a permutation, as determined by the realization of the random variables isU , 1,...,s N= .
Since these sizes differ for different clusters, when cluster vectors are permuted, the interpretation of which unit corresponds to SSU j in PSU i in a vector representing a two-stage permutation of the population is problematic. For example, if 3N = , 1 2 2M M= = , and
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
7
3 3M = , consider two permutations of clusters. Suppose the first permutation has cluster 1s = in position 1;i = 2s = in position 2;i = 3s = in position 3i = . Assume the second
permutation has clusters 3, 1, and 2 in positions 1,...,3i = , respectively. We can represent the population by a random vector for the first permutation as
( )( ) ( )( ) ( )( )1 2 31 2 3
′ ′ ′ ′
U y U y U y , and for the second permutation as
( )( ) ( )( ) ( )( )3 1 23 1 2
′ ′ ′ ′
U y U y U y . Although both vectors are of dimension 1× (where
7= ), the third SSU in the first permutation is in PSU 2i = , while the third SSU in the second permutation is in PSU 1i = . The position of the SSUs in the permuted population is not sufficient to retain the identity of the PSU for the SSU. Basically, the problem is that the dimension of the random variables in a PSU depends on the dimension of the cluster that is realized. This problem is a result of limitations in the traditional representation used for random variables in the context of two stage permutations. The ambiguity is a result of inadequate notation; it gives rise to ambiguity in analysis and interpretation.
The notational ambiguity is avoided by expanding the set of random variables that
represent the two-stage permutation to a vector of dimension 2
1
1N
ss
N M=
× ∑ whose elements
are of the form ( )* sisjt is jt stR U U y= . We arrange these random variables in the vector *R where
* * * *1 2 N+ + +
′ ′ ′ ′=
R R R R with * * * *1 2 ss s s sM+
′ ′ ′ ′=
R R R R and where
( )* ( )sst s t sty= ⊗R U U is of dimension 1sNM × where ( )1 2s s s NsU U U ′=U for
1,...,s N= . The position of the random variables in *R uniquely identify a cluster and unit. Using the expanded representation and the parameterization from (2.2), the mixed model is represented by
1 1 1 1 1
2 2 2 2 2*
N N N N N
ββ
µ
β
+
+
+
= + + +
X X 0 0 X 0 0X 0 X 0 0 X 0
R E
X 0 0 X 0 0 X
εε
ε
(2.4)
where 2sNM
ssNM
=1
X and s
s
NMs M
sNM+ = ⊗1
X I .
Expressions for the expected value and variance of *R can be determined using elementary properties of the indicator random variables. Under the two-stage random permutation model, using the subscript 1ξ to denote expectation with respect to permutations of the clusters and using the subscript 2ξ to denote expectation with respect to permutations of units in a cluster,
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
8
( )1 1 1
2 2 2*
N N N
Eξ ξ
µµ
µ
1 2
+
+
+
= +
X 0 0 X 0 00 X 0 0 X 0
R
0 0 X 0 0 X
ε , (2.5)
and
( )
( )
1
1 2
1 1
2 2
* 221 1
1 11 1
2 22 2
1 1var1 1
11
s
s
N N
MN Ms s Nst M s s Ns t
s s s
M MN N
M MN N
M MN N N N
N N
yM M N N M
M M
M MN N
M M
ξ ξ = =
′ ′ = ⊕ ⊕ − ⊗ ⊗ + ⊗ ⊗ − −
⊗ ⊗ ⊗ ⊗
⊗ ⊗ ⊗ ⊗
− −
⊗ ⊗ ⊗ ⊗
Jy y IR P y y P
1 1y P y P
1 1y P y P
1 1y P y P
′
(2.6)
where 1a a aa= −P I J .
2.3. Defining Random Variables of Interest
We assume that there is interest in a linear combination of PSU totals, *P ′= g R or
means defined by * * *P ′= g R where ( )1 s s
N
N M Ms=
′′ ′ ′ ′= ⊕ ⊗ ⊗ g 1 1 b 1 (for totals) or
*
1
s
s
N MN Ms
sM=
′ ′′ ′ ′= ⊕ ⊗ ⊗
1g 1 1 b (for means), where ( )( )ib=b is an N -dimensional column
vector of constants. Of principal interest is the linear combination that defines the total for
PSU i given by *i iP ′= g R , or the mean for PSU i given by * * *
i iP ′= g R , where
( )1 s s
N
i N M i Ms=
′′ ′ ′ ′= ⊕ ⊗ ⊗ g 1 1 e 1 (2.7)
and
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
9
*
1
s
s
N Mi N M is
sM=
′ ′′ ′ ′= ⊕ ⊗ ⊗
1g 1 1 e , (2.8)
where ie is an N -dimensional column vector with the thi value equal to one, and other values
equal to zero. Note that
1
N
i is s ss
P U M µ=
=∑ , (2.9)
while
*
1
N
i is ss
P U µ=
=∑ . (2.10)
We develop predictors of the random variables given by (2.9) and (2.10) next.
3. PREDICTING A PSU TOTAL OR MEAN BASED ON A TWO-STAGE SAMPLE
We develop a predictor of a PSU total (2.9) (or mean (2.10)) based on the two-stage
without replacement simple random sampling probability model (2.4), where sampling
results in a single non-stochastic measure on each of sm SSUs from each of the n PSUs. We
require the predictor to be a linear function of the sample, to be unbiased, and to minimize
the expected value of the mean squared error (MSE). The basic strategy is given in many
places {Scott, 1969 #143}{Royall, 1976 #986}{Bolfarine, 1992 #1027}{Valliant, 2000
#1578}. The elements of *R are first partitioned into a sampled and remaining portion. We
assume that the elements in the sampled portion will be observed, and express iP as the sum
of two parts, one which is a function of the sample, and the other which is a function of the
remaining random variables. Then, requiring the predictor to be a linear function of the
sampled random variables and to be unbiased, coefficients are evaluated that minimize the
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
10
expected value of the mean squared error, ( )1 2ˆvar i iP Pξ ξ − . This general development was
given in Theorem 2.1 by {Royall, 1976 #986}.
TO HERE 2/12/03
Conceptually, Royall’ theorem can be applied directly to form predictors, but
practically is complicated by singularities in ( )*var R . The singularities are the result of the
expansion of random variables needed to explicitly track clusters. We avoid these
complications by developing solutions to a simpler problem based on a ‘collapsed’ set of
random variables, Y . To do so, we first express * *′ ′= +R A Y B R . Different definitions of
Y are possible. For definitions of Y that result in *i iP ′ ′= g A Y and *′ =B R 0 , we develop
predictors of iP based solely on Y . It turns out that in this setting, unbiased predictors can
be developed only when the sampling fraction is constant for all clusters. In settings where
the sampling fraction differs between clusters, we develop an unbiased predictor of
i iP• • •′= g Y , where ( )2
*i iP E Pξ
•= is given by (2.10).
The key step in these developments is an appropriate definition of the ‘collapsed’ set
of random variables, Y . We consider three different definitions of Y . The first definition,
*′=Y C R , collapses random variables to PSU totals for the sample and remainder. The
second definition, * * *′=Y C R , collapses random variables to scaled PSU totals for the
sample and remainder where the scaling factor is the cluster size. A third strategy,
*• •′=Y C R , is to collapse random variables to PSU means for the sampled and remaining
units. We proceed to outline details of each of these developments.
3.1. Predictors of PSU Totals Based on Collapsing to Cluster Totals
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
11
We develop the best linear unbiased predictor of (2.9) based on collapsing *R to a
vector of PSU totals for the sample and remainder such that *′=Y C R where
( )1 2 N′ ′ ′ ′=C C C C and
( ) ( )
( )2
1
21
ss s
ss s s
s
m M ms M N
N NM M mm
× −
× −×
′ ′′ = ⊗ ⊗ ′
1 0C 1 I
0 1 (3.1).
For PSU i , the corresponding elements of Y are given by ( )
1
11
N
is s s sIs
N
is s s sIIs
U f M Y
U f M Y
=
=
−
∑
∑ where
( )
1 1
1 s sm Ms
sI jt stj ts
Y U ym = =
=
∑ ∑ and ( )
1 1
1 s s
s
M Ms
sII jt stj m ts s
Y U yM m = + =
= −
∑ ∑ . Defining
( ) 1
21
NN
s s s Ns N−
=
′ ′= ⊕ ⊗
1A C C C I , and
( ) ( ) ( )1
21 1 1
ss
s ss s
N N N mMs s s N N ss N M NMs s s M msM
−
= = = −
′ ′ ′ = ⊕ ⊗ ⊕ + ⊕ ⊗ ⊗ + ⊗
P 0JB C C C P I C I P I0 P ,
we can express * *′ ′= +R A Y B R (see c02ed34.doc). Then using (2.7), we find that *
i i iP ′ ′= =g R g Y , (3.2) where
2i i′ ′ ′= ⊗g e 1 . (3.3)
Also, ( ) ( )1 2 1 2
*var vari iξ ξ ξ ξ′ ′=g R g Y (see c02ed37.doc, p1). This provides the rational for
developing predictors of (3.2) based on Y . We highlight the main results for the development. More details of the development
are given by Stanek (2002, c02ed11, 12v1, 34-38.doc). Using sC defined by (3.1), model (2.4) simplifies to
( )2 22 1
f fiN NN f i fi
TT T
ττ τ×
= ⊗ + + − −
Y 1 I I E (3.4)
where ( )1
N
fi is s s fs
T U f τ τ=
= −∑ , ( )1
N
i is ss
T U τ τ=
= − ∑ , 1
N
s ss
f
f
N
ττ ==
∑, 1
N
ss
N
ττ ==
∑,
s s sMτ µ= , ss
s
mf
M= , and ′=E C E . Model (3.4) is a mixed model where the expected
value and variance of Y are given by
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
12
( ) ( )2f
Nf
Eξ ξ
ττ τ1 2
= ⊗ −
Y 1 I (3.5)
and
( )1 2
2
1
2
1
0var
0 1 1 1
11 11
Ns s ss s
Ns s s s
Nf fs sN
N ss f fs s
f f fMf f fN
f fN
f fN N
ξ ξσ
τ ττ
τ τ τ τ
=
=
′ = ⊗ − + − − − ′′ − ⊗ − − − − −−
∑
∑
Y I
JI
. (3.6)
The model contains two types of random effects. One random effect, iT , represents deviations of cluster totals about the simple average cluster total for the thi selected cluster. The second random effect, fiT , represents deviations of the expected cluster sample totals
about the simple average expected cluster sample total for the thi selected cluster We develop a predictor of (3.2) corresponding to the total of the thi PSU in the
permutation. We require the predictor to be a linear function of the sampled random variables, to be unbiased, and to minimize the expected value of the mean squared error. First, we re-arrange Y into a sampled and remaining vector by pre-multiplying by
I
II
=
KK
K where
( )( )
21 0I n n N nn N × −×
= ⊗
K I 0 (3.7)
and
( )
( )( )
( )
22
0 1n n N n
IIN n N N
N nN n n
× −
− + × −− ×
⊗ = ⊗
I 0K
0 I I (3.8)
resulting in I
II
=
YK Y
Y. In a similar manner, we define [ ]2
IN
II
= ⊗
XK 1 I
X where
( )1 0I n= ⊗X 1 and ( )2
0 1nII
N n−
⊗ = ⊗
1X
1 I. In terms of the partitioned random variables,
i iI I iII IIP ′ ′= +g Y g Y where iI iI′ ′=g e and ( )2iII iI iII′′ ′ ′= ⊗g e e 1 , and ( )i iI iII′ ′ ′=e e e where iIe
is an 1n× vector, and iIIe has dimension ( ) 1N n− × .
Collection of the study data will result in realizing the values of IY . We require the predictor of iP to be a linear function of the sample data, ( )i iI IP ′ ′+g a Y= and be unbiased
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
13
such that ( )1 2 i iE P Pξ ξ − = 0 . Since ( ) fI I
fEξ ξ
ττ τ1 2
= −
Y X and ( ) fII II
fEξ ξ
ττ τ1 2
= −
Y X ,
the unbiased constraint implies that ( ) 0fI iII II
f
ττ τ
′ ′− − a X g X = , or that I iII II′ ′=a X g X .
This expression simplifies to ( ) ( )0 1n iII N n−′ ′=a 1 e 1 (3.9)
which is clearly impossible, since 0 can not equal 1. As a result, unbiased predictors of (3.2) are not possible for model (3.4). Suppose, however, that second stage samples are selected with probability
proportional to size, such that s
s
mf
M= for all 1,...,s N= . Then f fτ τ= , and the unbiased
constraint can be expressed as ( ) 01I iII II
ffτ
′ ′− − a X g X = implying that
( ) 01I iII II
ff
′ ′− = − a X g X . The unbiased constraint now simplifies to
1n iII N n
ff−
−′ ′= +a 1 e 1 . These conditions will be satisfied if s sn
s
M mm−′ =a 1 for all
1,...,s N= when 1,...,i n= (corresponding to predicting a PSU in one of the first n positions
in the permutation), or if sn
s
Mm
′ =a 1 for all 1,...,s N= when 1,...,i n N= + (when predicting
a PSU in one of the last 1,...,n N+ positions in the permutation (see c02ed11.doc)). We assume second stage samples are selected with probability proportional to size in
the subsequent developments, and proceed to develop a predictor based on minimizing the expected value of the mean squared error. Using this notation, model (3.4) simplifies to
τ= + +Y X Z T E (3.10)
where 1N
ff
= ⊗ −
X 1 , 1N
ff
= ⊗ −
Z I , ( )1 2 Nτ τ τ ′=τ , and ( )Nτ= −T U 1τ .
Model (3.10) is a mixed model for cluster totals. The expression for the variance (3.6) simplifies to (see c02ed12v1.doc, p8)
( ) ( )1 2
2
2
0var
0 1 1 1
1 1
e N
NN
f f ff f f
f ff fN
ξ ξ
τ
σ
σ
+
′ = ⊗ − − − − ′ + − ⊗ − −
Y I
JI
(3.11)
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
14
where ( )2
2 1
1
N
ss
Nτ
τ τσ =
−=
−
∑, and
2
2 1
N
s ss
e
M
N
σσ =
+ =∑
.
The best linear unbiased predictor of the thi selected cluster total is given by
( )1,
ˆ ˆ ˆi iI I iII II I II I I IP α α− ′′ ′+ + −
g Y g X V V Y X= where I nf=X 1 , ( )1
1
n
IIN n
ff
f−
−
= ⊗ −
1
X1
,
ˆ nIfn
α′
=1
Y , 2 2
I n nf
fvN
τσ= −V I J , 2 2ev f τσ σ+ += + , 2 2 2
eτ τσ σ σ+ += − and
( )( ) ( )
22
, 21 1
1I II n nn N n n N n
ff fN fτ
τσ
σ + × − × −
= − − ⊗ − V I 0 J J (see c02ed12.doc and
c02ed42.doc for details). Using these expressions,
( )1 1ˆ N n nni iI I iI n I iII I
fP fkf n f n
− × − ′ ′ ′= + + +
JJe Y e P Y e Y (3.12)
where 2
kvτσ += . When in addition, all cluster sizes are equal, (3.12) simplifies to
( )i iI I iI n n I iII N nP m M m Y k M Y−′ ′ = + − + + e Y e 1 P Y e 1
where 1
m
ijj
i
YY
m==∑
, ( )1 2I nY Y Y ′=Y , and 1 1
n m
iji j
YY
nm= ==∑∑
. (see c02ed39.doc and
c02ed42.doc). The variance of (3.12) when i n≤ simplifies to
( ) ( ) ( ) ( ) ( )1 2
2 2 2 21 1 1ˆvar 1 1 1 1 1 1i i eP P f f f k fk fn n nfξ ξ τσ σ+ +
− = − − + − − + − − +
and when i n> reduces to ( )1 2
2 2 21 1ˆvar i i efP P
n fξ ξ τ τσ σ σ +
−− = + +
. (see c02ed49.doc)
3.2. Predictors of PSU Means based on Weighted PSU Totals Predictors of the PSU mean (2.10) can be developed from a collapsed set of random
variables corresponding to a weighted PSU totals by using * ss
sM′
′ =C
C in lieu of (3.1) to define
* * *′ =C R Y where ( )* * * *1 2 N′ ′ ′ ′=C C C C . The elements in *Y corresponding to PSU i
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
15
are given by ( )
1
11
N
is s sIs
N
is s sIIs
U f Y
U f Y
=
=
−
∑
∑. Defining ( )1 2 Nf f f ′=f leads to the model
(similar to (3.5)) that
( )* *N
NN
′ = ⊗ ′−
f1Y E
1 fµ + (3.13)
where * *′=E C E . Proceeding in the same manner as in Section 3.1, the thi selected PSU mean is given by * * * *
i i iP ′ ′= =g R g Y where i′g is given by (3.3). Similar developments lead to the same set of unbiased constraints given by (3.9) which cannot in general be satisfied. Once again, requiring second stage samples to be selected with probability proportional to size enables development of an unbiased predictor of (2.10).
With the assumption of probability proportion to size second stage sampling, model (3.13) simplifies to * *µ= + +Y X Z B E with variance given by
( ) ( ) ( ) ( )
( ) ( )
1 2
* *2
2
0var
0 1 1 1
1 1
e N
NN
f f ff f f
f ff fN
ξ ξ σ
σ
′ = ⊗ − − − − ′ + − ⊗ − −
Y I
JI
where 2
*2
1
1 Ns
es sN M
σσ
=
= ∑ .
The best linear unbiased predictor of the thi selected cluster mean is given by
( )* * * 1 * *,
ˆ ˆ ˆi iI I iII II I II I I IP α α− ′′ ′+ + −
g Y g X V V Y X= where * *ˆ nIfn
α′
=1
Y ,
2 2* *I n n
ffvNσ
= −V I J , *2 2 *2eσ σ σ= − , * *2 *2
ev fσ σ= + and
( ) ( ) ( )
2* *2, 1 1
1I II n n n N nff f
N fσσ × −
= − − ⊗ −
V I 0 J J (similar to the development
in c02ed12.doc and developed in c02ed42.doc). Using these expressions,
( )* * * * *1 1ˆ N n nni iI I iI n I iII I
fP fkf n f n
− × − ′ ′ ′= + + +
JJe Y e P Y e Y (3.14)
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
16
where *2
**k
vσ
= . When all cluster sizes are equal, simplifies to
( )* *i iI I iI n n I iII N n
M mmP Y k YM M −
−′ ′ ′ = + + + e Y e 1 P Y e 1 .
The variance of (3.14) when i n≤ simplifies to
( ) ( ) ( ) ( ) ( )1 2
2* * *2 * *2 *21 1 1ˆvar 1 1 1 1 1 1i i eP P f f f k fk fn n nfξ ξ σ σ
− = − − + − − + − − +
and when i n> reduces to ( )1 2
* * 2 2 *21 1ˆvar i i efP P
n fξ ξ σ σ σ −
− = + +
. (see c02ed49.doc)
3.3. Predictors of a PSU Mean based on Sampled and Remaining SSU Averages Predictors of a PSU mean (2.10) can be developed from a collapsed set of random
variables corresponding to sampled and remaining SSU averages by using
( ) ( )
( )2
1
2
1
1
1
ss s
ss
s ss
m M ms
ss M NN NM
M mms s
m
M m
× −•
×−×
′ ′′ = ⊗ ⊗ ′ −
1 0C 1 I
0 1(3.15)
in lieu of (3.1) to define *• •′ =C R Y where ( )1 2 N• • • •′ ′ ′ ′=C C C C . The elements in *Y
corresponding to PSU i are given by 1
1
N
is sIsN
is sIIs
U Y
U Y
=
=
∑
∑. We define a linear combination of these
random variables as
( )( )1
1N
i i is sI sIIs
P U cY c Y• • •
=
′= = + −∑g Y (3.16)
where ( )1i i c c•′ ′= ⊗ −g e for some constant c . This random variable, although not equal
to the PSU mean, has an expected value that equals the PSU mean, ( )2
1
N
i is ss
E Uξ µ• •
=
′ = ∑g Y .
Notice that ( )( )( )* *
11
N
i is s sI sIIs
U cY c Yµ• •
=
′ ′= + − + −∑g R g Y . Unlike the development in
sections 3.1 and 3.2, the second term in this expansion is not identically equal to zero. We ignore this term in developing a unbiased linear predictor for (3.16) that minimizes the expected mean squared error for any two stage sampling plan. Using this collapsed set of random variables, model (2.4) reduces to
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
17
2 1Nµ• • • •
×= +Y X Z B E+
where 2N• =X 1 and 2N
• = ⊗Z I 1 , and • •′=E C E . The variance is given by
( )( )
( ) ( )1 2
2
1 *22 22
1
0var
0
Ns
s sN N NN
s
s s s
NmN
N M m
ξ ξ
σ
σσσ
2=•
=
= ⊗ + ⊗ − ⊗ −
∑
∑Y I I J J J .
We partition these random variables into a sample and remaining vector such that
I
II
••
•
=
YK Y
Y and similarly partition iI
iiII
••
•
′ ′ = ′
gK g
g such that iI iIc•′ ′=g e and
( ) ( )( )1 1iII iI iIIc c c•′ ′ ′= − ⊗ −g e e . Collection of the study data will result in realizing the
values of I•Y . We require the predictor of iP• to be a linear function of the sample data,
( )i iI IP• • •′ ′+g a Y= and be unbiased such that ( )1 2ˆ 0i iE P Pξ ξ• •− = . The constraint will be
satisfied if 1n c′ = −a 1 when i n≤ , or if 1n′ =a 1 when i n> . The best linear unbiased
predictor is given by
( )1,
ˆ ˆ ˆiI I iII II I II I I IP α α• • • • • • •− • • ′′ ′+ + −
g Y g X V V Y X=
which simplifies to
( ) ( )ˆ 1 N n nniI I iI n I iII IP c c ck
n n− ו • • • •
′ ′ ′+ − + +
JJe Y e P Y e Y= (3.17)
where *
kvσ 2
••= and 2 *2
eIv σ σ• •= + and 2
2
1
1 Ns
eIs sN mσσ •
=
= ∑ . The variance of (3.17) when
i n≤ simplifies to
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
18
( )( ) ( ) ( )
1 2
*22*2 *2 2
ˆvar
1 22 1 1
i i
eII
P P
ck n vck v c cn n
ξ ξ
σσ σ σ
• •
• •• • •
− =
− − = − − + − + +
and when i n> reduces to
( ) ( )1 2
2*2 *2 2 2 21 1ˆvar 1 2 1i i n eI eIIP P v f c cN n N nξ ξ σ σ σ σ σ σ• • • 2 2 • • − = − + + + − + + − − −
(see c02ed51.doc) where we define ( )
22
1
Ns
eIIs s sN M m
σσ •
=
=−∑ .
6. Application We illustrate predictors for a selected physician’s practice in a randomized controlled
trial (Ockene, Hebert et al. 1996) designed to evaluate the effectiveness of a training program for physician-delivered nutrition counseling, alone and in combination with an office support program on dietary fat intake and blood LDL-cholesterol levels in patients with hyperlipidemia (WATCH). Patients enrolled in the study had duplicate venous cholesterol measures made at baseline and at 1 year follow-up. In addition, a 24 hour dietary recall was reported at baseline and 1 year follow-up. Data from these measures, in addition to baseline questionnaire data and intervention data were used for study evaluation.
The WATCH study had a cluster randomized design, with patients in physician practices forming the basic clusters. We focus our attention here on a single dependent variable, percent of Kcal from saturated fat, which we consider as a difference constructed by taking the Baseline minus 1 year measure for each patient. We also limit discussion to a single treatment arm (III) to which fourteen physician practices were randomly assigned. A summary of the average change for different physician practices assigned to this intervention is given in Table 1. We focus on predicting response for physician practice number 71. The estimate of average change in SFAT is -1.47% Kcal. This is the estimate that results treating physician practices as fixed effects. If physician practices are considered to be random, the usual mixed model estimate of the realized random effect is -0.99% Kcal (from Proc Mixed). Using a two-stage model-based approach (Scott and Smith 1969), the best linear unbiased predictor will depend on the population size. Unfortunately, the number of eligible patients in each practice was not recorded in the study. We make two assumptions about population sizes to illustrate differences in the estimator, first assuming all physician practices see
= 100M eligible patients, and second assuming half the eligible patients in a practice are included in the study.
A mixed model was fit to these data, resulting in an estimate (REML) of the variance between MDs of 0.431. We use methods of moments estimates to develop estimates of other variance components for use in developing predictors.
Table 1. List of Average Change in SF as Pct of Kcal by Physician for n=14 physicians in the Counseling and Support Intervention
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
19
Average change # of in Percent patients Sat.Fat Variance 1 -3.25 . 30 -2.31 35.48 11 -2.16 33.44 42 -2.09 34.18 25 -1.47 19.88 20 -1.38 21.82 25 -0.92 23.29 15 -0.82 14.59 10 -0.69 22.39 29 -0.46 18.70 15 0.36 28.73 38 0.37 27.70 16 0.75 17.53 33 0.91 20.97 Source: ced02p07.sas 11/16/02 by EJS
We develop predictors using variance components developed using the methods of
moments corresponding to sections 3.1-3.3. The method of moment estimate of 2τσ is given
by ( )( )
2
2 12 1
n
ii
Y Ys
f nτ=
−=
−
∑ where
1
im
i ijj
Y Y=
= ∑ and 1
1 n
ii
Y Yn =
= ∑ ; we use the REML of 2σ from a
mixed model; and we estimate 2eσ + , *2
eσ , 2eIσ • and 2
eIIσ • by 2 2
1
1 n
e i ii
s M sn+
=
= ∑ , 2
*2
1
1 ni
ei i
ssn M=
= ∑ ,
22
1
1 ni
eIi i
ssn m
•
=
= ∑ , and 2
2
1
1 ni
eIIi i i
ssn M m
•
=
=−∑ where ( )22
1
11
n
i ij iii
s Y Ym =
= −− ∑ . When 1im = , there is
no estimate of the within PSU variance. We alter sums over 1,...,i n= such that the sums exclude PSUs where 1im = .
Still to do: Develop example further, and check that it works. Write discussion.
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
20
Appendix A. Evaluation of ( )1 2
*varξ ξ R We outline the steps leading to ( )
1 2
*varξ ξ R given by (2.6) using the conditional
expansion ( ) ( ) ( )1 2 1 2 1 1 2 1
* * *| |var var varE Eξ ξ ξ ξ ξ ξ ξ ξ = + R R R . Since the vector *R is
constructed from sub-vectors for each unit given by ( )* ( )sst s t sty= ⊗R U U , we begin by
developing the expected value and variance of these sub-vectors. First,
( )2 1
*|
sMst s st
s
E yMξ ξ
= ⊗
1R U and thus,
1
2
2 1
1 11
2 2*| 2
s
N
M
MM
s ss
MN N
N
M
E MM
M
ξ ξ
⊗ ⊗
⊗ ⊗ = = ⊗ ⊗
⊗ ⊗
1y U
11y U
R y U
1y U
.
Now * *
1 1 * * * 2* *
var s s s s sM M M M MNs s s s s s s s
s s s s s
EM M M N M Mξ ξ
′ ′ ′ ′ ′⊗ ⊗ = ⊗ ⊗ − ⊗ ⊗
1 1 1 1 1Jy U y y U U y y
simplifies to *
1 1* * 2*
var s s sM M MNs s s s s s
s s s
EM N M Mξ ξ
′ ′ ′⊗ ⊗ = ⊗ − ⊗
1 1 1Jy U y y U U or
( ) *
1 2 1 1
11 1 2
2 1 2
*| * * 2
*
1 1 1 2 121 1 2 1
2 1 2 2 21 2 2
var
1 11 1
11 1
s s
N
M MNs s s s
s s
M MM M MN N N N
N
M M MN N
E EN M M
M N M M N M M
N M M MN
ξ ξ ξ ξ
′ ′ ′ = ⊗ − ⊗ ′′− − ′ ′ ′⊗ ⊗ ⊗ ⊗ ⊗ ⊗ − −
′− ′ ′⊗ ⊗ ⊗ ⊗ −=
1 1JR y y U U
1 1J 1 1y y P y y P y y P
1 1 Jy y P y y P 2
1 2
22
1 2 21 2
11
1 11 1
N
N N N
M MN N
N
M M M M MN N N N N N N
N N N
N M M
N M M N M M M
′− ′ ⊗ ⊗ −
′ ′ − − ′ ′ ′⊗ ⊗ ⊗ ⊗ ⊗ ⊗ − −
1 1y y P
1 1 1 1 Jy y P y y P y y P
since ( )1 * if *N
s sE s sNξ ′ = =IU U and ( ) ( ) ( )
1 *1 if *
1s s N NE s sN Nξ ′ = − + ≠
−U U I J . We
express this as
C03ed02.doc 3/3/2003 3:25 PM (formerly c02ed43v1.doc, c01ed37.doc)
21
( )
1 1
2 2
1 2 1
1 11 1
2 2*| 2 221
1 1var1
s
N N
M MN N
M MNM N N
s s Nss
M MN N N N
N N
M M
E M MN M N
M M
ξ ξ ξ =
′ ⊗ ⊗ ⊗ ⊗
⊗ ⊗ ⊗ ⊗ ′ = ⊕ ⊗ ⊗ − −
⊗ ⊗ ⊗ ⊗
1 1y P y P
1 1J y P y P
R y y P
1 1y P y P
.
Next we evaluate ( )1 2 1
*|varEξ ξ ξ R . First,
( ) ( ) ( )2 2
* ( ) ( ) 2 22
1var s
s
Ms sst s s t t st s s M st
s s
E y yM Mξ ξ
′ ′ ′= ⊗ − = ⊗
JR U U U U U U P . Also,
( ) ( ) ( ) ( )2 2
* * ( ) ( )* * * *2
1cov ,1
s
s
Ms sst st s s t t st st s s M st st
s s s
E y y y yM M Mξ ξ
−′ ′ ′= ⊗ − = ⊗ −
JR R U U U U U U P .
Using these expressions, ( ) ( )2
* 2
1
1var1
s
s
Ms s
s st s s Mts s
yM Mξ + =
′′= ⊕ − ⊗ ⊗ −
y yR U U P . As a result,
( )1 2 1
* 2| 1 1
1var1
s
s
MNs s
st N Ms ts s
E yM Mξ ξ ξ = =
′ = ⊕ ⊕ − ⊗ ⊗ −
y yR I P , and combining terms results in the
expression for the variance given by (2.6).