General Design Bayesian Generalized Linear Mixed Models

Y. Zhao, J. Staudenmayer, B.A. Coull and M.P. Wand¹

18th June, 2004

Abstract. Linear mixed models are able to handle an extraordinary range of complications in regression-type analyses. Their most common use is to account for within-subject correlation in longitudinal data analysis (e.g. Laird and Ware, 1982). They are also the standard vehicle for smoothing spatial count data (e.g. Wakefield, Best and Waller, 2000). However, when treated in full generality, mixed models can also handle spline-type smoothing and closely approximate kriging (e.g. Robinson, 1991; Speed, 1991). This allows for nonparametric regression models (e.g. additive models, varying coefficient models) to be handled within the mixed model framework. The key is to allow the random effects design matrix to have general structure; hence our label general design. For continuous response data, particularly when Gaussianity of the response is reasonably assumed, computation is now quite mature and supported by the SAS, S-PLUS and R packages. Such is not the case for binary and count responses where generalized linear mixed models (GLMMs) are required, but are hindered by the presence of intractable multivariate integrals. Software known to us supports special cases of the GLMM (e.g. PROC NLMIXED in SAS or glmmML in R) or relies on the sometimes crude Laplace-type approximation of integrals (e.g. the SAS macro glimmix or glmmPQL in R). This paper describes the fitting of general design generalized linear mixed models. A Bayesian approach is taken and Markov Chain Monte Carlo (MCMC) is used for estimation and inference. In this generalized setting MCMC requires sampling from non-standard distributions. In this article, we demonstrate that the MCMC package WinBUGS facilitates sound fitting of general design Bayesian generalized linear mixed models in practice.

Key words and phrases: Generalized additive models; Hierarchical centering; Kriging; Markov chain Monte Carlo; Nonparametric regression; Penalized splines; Spatial count data; WinBUGS.

¹ Y. Zhao is Mathematical Statistician, Division of Biostatistics, Center for Devices and Radiological Health, US Food and Drug Administration, 5600 Fishers Lane, Rockville, Maryland 20857, U.S.A. J. Staudenmayer is Assistant Professor, Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003, U.S.A. B.A. Coull is Assistant Professor, Department of Biostatistics, School of Public Health, Harvard University, 665 Huntington Avenue, Boston, Massachusetts 02115, U.S.A. M.P. Wand is Professor, Department of Statistics, School of Mathematics, University of New South Wales, Sydney 2052, Australia.


1 INTRODUCTION

The generalized linear mixed model (GLMM) is one of the most useful structures in modern Statistics, allowing many complications to be handled within the familiar linear model framework. The fitting of such models has been the subject of a great deal of research over the past decade. Early contributions to fitting various forms of the GLMM include Stiratelli, Laird and Ware (1984), Anderson and Aitkin (1985), Gilmour, Anderson and Rae (1985), Schall (1991), Breslow and Clayton (1993) and Wolfinger and O’Connell (1993). A summary is provided in McCulloch and Searle (2000, Chapter 10).

Most of the literature on fitting GLMMs is geared towards grouped data. Examples include repeated binary responses on a set of subjects and standardized mortality ratios in geographical sub-regions. However, GLMMs are much richer than the sub-class needed for these situations. The key to full generality is the use of general design matrices, for both the fixed and random components. Once again, we refer to McCulloch and Searle (2000, Chapter 8) for an overview of general design GLMMs. An excellent synopsis of general design linear mixed models is provided by Robinson (1991) and the ensuing discussion. One of the biggest payoffs from the general design framework is the incorporation of nonparametric regression, or smoothing, through penalized regression splines (e.g. Wahba, 1990; Speed, 1991; Verbyla, 1994; Brumback, Ruppert and Wand, 1999). Higher-dimensional extensions essentially correspond to generalized kriging (Diggle, Tawn and Moyeed, 1998). This allows for smoothing-type models such as generalized additive models to be fit as a GLMM, and combined with the more traditional grouped data uses. This is the main thrust of the recent book by Ruppert, Wand and Carroll (2003) and a summary is provided by Wand (2003). General designs also permit the handling of crossed random effects (e.g. Shun, 1997) and multilevel models (e.g. Goldstein, 1995; Kreft and de Leeuw, 1998).

The simplest method for fitting general design GLMMs involves Laplace approximation of integrals (Breslow and Clayton, 1993; Wolfinger and O’Connell, 1993) and is commonly referred to as penalized quasi-likelihood (PQL). However, the approximation can be quite inaccurate in certain circumstances. Breslow and Lin (1995) and Lin and Breslow (1996) show that PQL leads to estimators that are asymptotically biased. For situations such as paired binary data the PQL approximation is particularly poor. In their summary of PQL, McCulloch and Searle (2000, Chapter 10, pp. 283-284) conclude by stating that they “cannot recommend the use of simple PQL methods in practice”.

In this article we take a Bayesian approach and explore the Markov Chain Monte Carlo (MCMC) fitting of general design GLMMs. One advantage of a Bayesian approach over its frequentist counterpart is that uncertainty in variance components is more easily taken into account (e.g. Handcock and Stein, 1993; Diggle et al., 1998). As summarized in Section 9.6 of McCulloch and Searle (2000), the frequentist approach to this problem is thwarted by largely intractable distribution theory. Under a Bayesian approach, posterior distributions of parameters of interest take this variability into account. The hierarchical structure of the Bayesian GLMMs lends itself to Gibbs sampling schemes, albeit with some non-conjugate full conditionals, to sample from these posteriors. In addition, it is computationally simpler to obtain variance estimates of the predictions of the random effects. Booth and Hobert (1998) showed that, in a frequentist framework, second-order estimation of the conditional standard error of prediction for the random effects requires bootstrapping the maximum likelihood estimates of the fixed effects and variance components. For complicated random effects structures, computation of a single maximum likelihood fit can be expensive, making the bootstrap computationally prohibitive. However, in the Bayesian framework, interest focuses on the posterior variance of the random effects given the data, which is a by-product of the MCMC output.

There have been a few other contributions to Bayesian formulations of GLMMs in the literature. Those known to us are Zeger and Karim (1993), Clayton (1996), Diggle, Tawn and Moyeed (1998), and Fahrmeir and Lang (2001). However, each of these articles is geared towards special cases of GLMMs. The GLMMs described in this article are much more general and allow for random effects models for longitudinal data, crossed random effects, smoothing of spatial count data, generalized additive models, generalized geostatistical models, additive models with interactions, varying coefficient models and various combinations of these (Wand, 2003).

Section 2 lays out notation for general design GLMMs and gives several important examples. MCMC implementation is described in Section 3, with a focus on the WinBUGS package. Section 4 provides three illustrative data analyses. We close with some discussion in Section 5.

2 MODEL FORMULATION

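GLMMs for canonical one-parameter exponential families (e.g. Poisson, logistic) and Gaussian random effects take the general form

$$[\mathbf{y}|\boldsymbol{\beta},\mathbf{u}] = \exp\{\mathbf{y}^T(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}) - \mathbf{1}^T b(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}) + \mathbf{1}^T c(\mathbf{y})\}, \tag{1}$$

$$[\mathbf{u}|\mathbf{G}] \sim N(\mathbf{0},\mathbf{G}) \tag{2}$$

where here, and throughout, the distribution of a random vector $\mathbf{x}$ is denoted by $[\mathbf{x}]$ and the conditional distribution of $\mathbf{y}$ given $\mathbf{x}$ is denoted by $[\mathbf{y}|\mathbf{x}]$.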

In the Poisson case $b(x) = e^x$, while in the logistic case $b(x) = \log(1 + e^x)$. A few other models (e.g. gamma, inverse Gaussian) also fit into this structure (McCullagh and Nelder, 1989). A number of extensions and modifications are possible. One is to allow for overdispersion, especially in the Poisson case. In this paper we will restrict attention to the canonical one-parameter exponential family structure.
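As an illustration of how (1) is evaluated, the following minimal Python sketch computes the log of the conditional density for a given linear predictor $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}$. The function names are ours and purely illustrative; the choice $c(y) = -\log(y!)$ for the Poisson case and $c(y) = 0$ for binary logistic responses follows from the standard forms of those densities.

```python
import math

def b_poisson(eta):
    """b(x) = exp(x) for the Poisson family."""
    return math.exp(eta)

def b_logistic(eta):
    """b(x) = log(1 + exp(x)), evaluated in a numerically stable way."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))

def c_poisson(y):
    """c(y) = -log(y!) for Poisson counts."""
    return -math.lgamma(y + 1.0)

def c_logistic(y):
    """c(y) = 0 for binary responses."""
    return 0.0

def log_density(y, eta, b, c):
    """Log of (1): y'eta - 1'b(eta) + 1'c(y), with eta = X beta + Z u."""
    return sum(yi * ei - b(ei) + c(yi) for yi, ei in zip(y, eta))

# Hypothetical values, used only to exercise the functions.
eta = [math.log(3.0), 0.0]
print(log_density([2, 0], eta, b_poisson, c_poisson))
print(log_density([1, 1], [0.0, 0.0], b_logistic, c_logistic))
```

Only the function $b$ (and the constant term $c$) changes between families, which is why the later full conditionals can be written once for the whole class.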

In most situations, the main parameters of interest are contained in $\boldsymbol{\beta}$ and $\mathbf{G}$. Throughout we will take the prior distribution of $\boldsymbol{\beta}$ to be of the form

$$[\boldsymbol{\beta}] \sim N(\mathbf{0},\mathbf{F})$$

for some covariance matrix $\mathbf{F}$. In practice it is common to take $\mathbf{F}$ to be diagonal with very large entries, corresponding to non-informative priors on the entries of $\boldsymbol{\beta}$. The components of $\mathbf{G}$ are discussed in Section 2.1.

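It is important to separate out random effects structure for handling grouping. One reason is that it allows for the possibility of hierarchical centering in the MCMC implementations (Section 2.2). It also recognizes the different covariance structures used in longitudinal data modeling, smoothing and spatial statistics. Such considerations suggest the breakdown

$$\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u} = \mathbf{X}^R\boldsymbol{\beta}^R + \mathbf{Z}^R\mathbf{u}^R + \mathbf{X}^G\boldsymbol{\beta}^G + \sum_{\ell=1}^{L}\mathbf{Z}^G_\ell\mathbf{u}^G_\ell + \mathbf{Z}^C\mathbf{u}^C, \tag{3}$$

where

$$\mathbf{X}^R \equiv \begin{bmatrix}\mathbf{X}^R_1\\ \vdots\\ \mathbf{X}^R_m\end{bmatrix},\qquad \mathbf{Z}^R \equiv \mathop{\mathrm{blockdiag}}_{1\le i\le m}(\mathbf{X}^R_i) \qquad\text{and}\qquad \mathrm{Cov}(\mathbf{u}^R) \equiv \mathop{\mathrm{blockdiag}}_{1\le i\le m}(\boldsymbol{\Sigma}^R) \equiv \mathbf{I}_m\otimes\boldsymbol{\Sigma}^R$$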

correspond to random intercepts and slopes, as typically used for repeated measures data on $m$ groups with sample sizes $n_1,\ldots,n_m$. Here $\mathbf{X}^R_i$ is an $n_i\times q^R$ matrix for the random design corresponding to the $i$th group, $\boldsymbol{\Sigma}^R$ is an unstructured $q^R\times q^R$ covariance matrix and $\otimes$ denotes Kronecker product.

Next, $\mathbf{Z}^G$ is a general design matrix, usually of different form than that arising in random effects models. In many of our examples, $\mathbf{Z}^G$ contains spline basis functions and may be further decomposed as $\mathbf{Z}^G = [\mathbf{Z}^G_1 \cdots \mathbf{Z}^G_L]$, with each $\mathbf{Z}^G_\ell$ corresponding to a smooth term in an additive model. Also, in keeping with spline penalization, we only consider

$$\mathrm{Cov}(\mathbf{u}^G) = \mathop{\mathrm{blockdiag}}_{1\le \ell\le L}(\sigma^2_{u\ell}\mathbf{I})$$

for some integer $L$.

We note that the decomposition of the random effect vector $\mathbf{u}$ into $\mathbf{u}^R$ and $\mathbf{u}^G$ may not be unique. For instance, for the crossed random effects model of Example 3 below, we present two such decompositions.


The $\mathbf{Z}^C\mathbf{u}^C$ component represents random effects with spatial correlation structure. This can be done in a number of ways (e.g. Wakefield, Best and Waller, 2000); we will just describe one of the more common approaches here. Suppose disease incidence data are available over $N$ contiguous regions. The random effect vector $\mathbf{u}^C$ is of dimension $N$ and is assumed to follow a Gaussian intrinsic autoregression prior distribution. We can specify the inverse covariance matrix of the Gaussian intrinsic autoregression distribution according to Besag, York and Mollié (1991) and give $\mathbf{u}^C$ an improper density proportional to

$$(\sigma_c^2)^{-N/2}\exp\Big\{-\sum_{i\sim j}\tfrac{1}{2}\sigma_c^{-2}(U^c_i-U^c_j)^2\Big\}, \tag{4}$$

where $i\sim j$ denotes spatially adjacent groups. The conditional distribution of $U^c_i$ given $U^c_j$, $j\ne i$, is a univariate normal distribution with mean equal to the average of the $U^c_j$ values of the neighboring regions, and variance equal to $\sigma_c^2$ divided by the number of regions that are spatially contiguous to region $i$.

The versatility of (3) can be appreciated by considering the following set of examples. Note that we use truncated linear basis functions for smoothing components to keep the formulations simple (e.g. Brumback, Ruppert and Wand, 1999). In practice these may be replaced by B-splines (Durban and Currie, 2003) or radial basis functions (French, Kammann and Wand, 2001). Knots are denoted by $\kappa_k$ with possible superscripting. Ruppert (2002) discusses choice of knots for univariate smoothing, whereas Nychka and Saltzmann (1998) describe the choice of knots for multivariate smoothing and kriging. In the examples we use $\mathbf{1}_d$ to denote a $d\times 1$ vector of ones.

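Example 1: Random intercept

$$(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u})_{ij} = \beta_0 + U_i + \beta_1 x_{ij},\qquad 1\le j\le n_i,\ 1\le i\le m,$$

$$\mathbf{X}^R_i = \mathbf{1}_{n_i},\quad \mathbf{X}^G = [x_{ij}],\quad \mathbf{Z}^G = \mathbf{Z}^C = \emptyset,\quad \boldsymbol{\Sigma}^R = \sigma_u^2.$$

Example 2: Random intercept and slope

$$(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u})_{ij} = \beta_0 + U_i + (\beta_1 + V_i)x_{ij},\qquad 1\le j\le n_i,\ 1\le i\le m,$$

$$\mathbf{X}^R_i = \begin{bmatrix}1 & x_{i1}\\ \vdots & \vdots\\ 1 & x_{in_i}\end{bmatrix},\quad \mathbf{X}^G = \mathbf{Z}^G = \mathbf{Z}^C = \emptyset,$$

$$\boldsymbol{\Sigma}^R = \begin{bmatrix}\sigma_u^2 & \rho_{uv}\sigma_u\sigma_v\\ \rho_{uv}\sigma_u\sigma_v & \sigma_v^2\end{bmatrix}.$$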


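Example 3: Crossed random effects model

$$(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u})_{ii'} = \beta_0 + U_i + U'_{i'},\qquad 1\le i\le n,\ 1\le i'\le n',$$

$$\mathbf{X}^G = \mathbf{1}_{nn'},\quad \mathbf{Z}^G = [\mathbf{I}_n\otimes\mathbf{1}_{n'}\ |\ \mathbf{1}_n\otimes\mathbf{I}_{n'}],\quad \mathbf{X}^R = \mathbf{Z}^R = \mathbf{Z}^C = \emptyset,$$

$$\mathbf{u}^G = [U_1,\ldots,U_n,U'_1,\ldots,U'_{n'}]^T,\quad \mathrm{Cov}(\mathbf{u}^G) = \mathrm{blockdiag}(\sigma_u^2\mathbf{I}_n,\ \sigma_{u'}^2\mathbf{I}_{n'}).$$

An alternative representation of this model is

$$\mathbf{X}^R_i = \mathbf{1}_{n'\times 1},\quad \mathbf{Z}^G = [\mathbf{1}_n\otimes\mathbf{I}_{n'}],\quad \mathbf{X}^G = \mathbf{Z}^C = \emptyset,$$

$$\boldsymbol{\Sigma}^R = \sigma_u^2,\quad \mathrm{Cov}(\mathbf{u}^G) = \sigma_{u'}^2\mathbf{I}_{n'}.$$

This allows for implementation of hierarchical centering as described in Section 2.2.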
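To make the Kronecker-product construction of the crossed random effects design concrete, here is a small Python sketch, using only the standard library, that builds $\mathbf{Z}^G = [\mathbf{I}_n\otimes\mathbf{1}_{n'}\ |\ \mathbf{1}_n\otimes\mathbf{I}_{n'}]$ for hypothetical sizes $n=2$ and $n'=3$. The helper names and the numerical values of the random effects are illustrative assumptions, and rows are assumed to be ordered with $i'$ varying fastest.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def ones_col(n):
    """n x 1 column of ones."""
    return [[1] for _ in range(n)]

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def hstack(A, B):
    """Place two matrices with the same number of rows side by side."""
    return [ra + rb for ra, rb in zip(A, B)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

n, n_prime = 2, 3  # hypothetical numbers of levels of the two crossed factors
Z_G = hstack(kron(identity(n), ones_col(n_prime)),
             kron(ones_col(n), identity(n_prime)))

# u^G = [U_1, U_2, U'_1, U'_2, U'_3] with made-up values
u_G = [10, 20, 1, 2, 3]
print(matvec(Z_G, u_G))  # entry for cell (i, i') is U_i + U'_{i'}
```

Each row of $\mathbf{Z}^G$ contains exactly two ones, one selecting $U_i$ and one selecting $U'_{i'}$, which is what makes the design "crossed" rather than nested.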

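Example 4: Nested random effects model

$$(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u})_{ijk} = \beta_0 + U_i + V_{j(i)} + \beta_1 x_{ijk},\qquad 1\le i\le m,\ 1\le j\le n,\ 1\le k\le p,$$

$$\mathbf{X}^G = [1\ \ x_{ijk}]_{1\le i\le m,\,1\le j\le n,\,1\le k\le p},\quad \mathbf{Z}^G = [\mathbf{I}_m\otimes\mathbf{1}_{np}\ |\ \mathbf{I}_m\otimes(\mathbf{I}_n\otimes\mathbf{1}_p)],\quad \mathbf{X}^R = \mathbf{Z}^R = \mathbf{Z}^C = \emptyset,$$

$$\mathbf{u}^G = [U_1,\ldots,U_m,V_{1(1)},\ldots,V_{n(1)},\ldots,V_{1(m)},\ldots,V_{n(m)}]^T,$$

$$\mathrm{Cov}(\mathbf{u}^G) = \mathrm{blockdiag}(\sigma_u^2\mathbf{I}_m,\ \sigma_v^2\mathbf{I}_{mn}).$$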

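Example 5: Generalized scatterplot smoothing

$$(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u})_i = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K}u_k(x_i-\kappa_k)_+,$$

$$\mathbf{X}^G = [1\ \ x_i]_{1\le i\le n},\quad \mathbf{Z}^G = \Big[\underset{1\le k\le K}{(x_i-\kappa_k)_+}\Big]_{1\le i\le n},\quad \mathbf{X}^R = \mathbf{Z}^R = \mathbf{Z}^C = \emptyset,$$

$$\mathrm{Cov}(\mathbf{u}^G) = \sigma_u^2\mathbf{I}_K.$$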
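The design matrices of Example 5 can be sketched in a few lines of Python. The function below builds $\mathbf{X}^G$ and the truncated-line matrix $\mathbf{Z}^G$ from a predictor vector and a set of knots; the function name, the data and the coefficient values are hypothetical and serve only to show the mechanics.

```python
def truncated_line_design(x, knots):
    """Return (X, Z) with X = [1, x_i] and Z = [(x_i - kappa_k)_+]."""
    X = [[1.0, xi] for xi in x]
    Z = [[max(xi - kappa, 0.0) for kappa in knots] for xi in x]
    return X, Z

def linear_predictor(X, beta, Z, u):
    """Compute X beta + Z u row by row."""
    return [sum(a * b for a, b in zip(xr, beta)) +
            sum(a * b for a, b in zip(zr, u))
            for xr, zr in zip(X, Z)]

# Hypothetical predictor values, knots and coefficients.
x = [0.0, 0.5, 1.0]
knots = [0.25, 0.75]
X, Z = truncated_line_design(x, knots)
eta = linear_predictor(X, [1.0, 2.0], Z, [-4.0, 4.0])
print(Z)
print(eta)
```

Each coefficient $u_k$ changes the slope of the fitted function at the knot $\kappa_k$, so shrinking the $u_k$ towards zero through $\sigma_u^2$ pulls the fit towards a straight line.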


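Example 6: Generalized additive model

$$(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u})_i = \beta_0 + \beta_s s_i + \sum_{k=1}^{K_s}u^s_k(s_i-\kappa^s_k)_+ + \beta_t t_i + \sum_{k=1}^{K_t}u^t_k(t_i-\kappa^t_k)_+,$$

$$\mathbf{X}^G = [1\ \ s_i\ \ t_i]_{1\le i\le n},\quad \mathbf{Z}^G = \Big[\underset{1\le k\le K_s}{(s_i-\kappa^s_k)_+}\ \ \underset{1\le k\le K_t}{(t_i-\kappa^t_k)_+}\Big]_{1\le i\le n},\quad \mathbf{X}^R = \mathbf{Z}^R = \mathbf{Z}^C = \emptyset,$$

$$\mathrm{Cov}(\mathbf{u}^G) = \mathrm{blockdiag}(\sigma_{us}^2\mathbf{I}_{K_s},\ \sigma_{ut}^2\mathbf{I}_{K_t}).$$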

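Example 7: Generalized additive semiparametric mixed model

$$(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u})_{ij} = \beta_0 + U_i + (\beta_q + V_i)q_{ij} + (\beta_r + W_i)r_{ij} + \beta_1 x_{ij} + \beta_s s_{ij} + \sum_{k=1}^{K_s}u^s_k(s_{ij}-\kappa^s_k)_+ + \beta_t t_{ij} + \sum_{k=1}^{K_t}u^t_k(t_{ij}-\kappa^t_k)_+,$$

$$\mathbf{X}^R_i = \begin{bmatrix}1 & q_{i1} & r_{i1}\\ \vdots & \vdots & \vdots\\ 1 & q_{in_i} & r_{in_i}\end{bmatrix},\quad \mathbf{X}^G = [s_{ij}\ \ t_{ij}\ \ x_{ij}]_{1\le j\le n_i,\,1\le i\le m},$$

$$\mathbf{Z}^G = \Big[\underset{1\le k\le K_s}{(s_{ij}-\kappa^s_k)_+}\ \ \underset{1\le k\le K_t}{(t_{ij}-\kappa^t_k)_+}\Big],\quad \mathbf{Z}^C = \emptyset,$$

$$\boldsymbol{\Sigma}^R = \text{unstructured } 3\times 3 \text{ covariance matrix},\quad \mathrm{Cov}(\mathbf{u}^G) = \mathrm{blockdiag}(\sigma_{us}^2\mathbf{I}_{K_s},\ \sigma_{ut}^2\mathbf{I}_{K_t}).$$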

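Example 8: Generalized bivariate smoothing/low-rank kriging

$$(\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u})_i = \beta_0 + \boldsymbol{\beta}_1^T\mathbf{x}_i + \sum_{k=1}^{K}u_k\,C(\|\mathbf{x}_i-\boldsymbol{\kappa}_k\|),$$

$$\mathbf{X}^G = [1\ \ \mathbf{x}_i^T]_{1\le i\le n},\quad \mathbf{Z}^G = \Big[\underset{1\le k\le K}{C(\|\mathbf{x}_i-\boldsymbol{\kappa}_k\|)}\Big]_{1\le i\le n},\quad \mathbf{X}^R = \mathbf{Z}^R = \mathbf{Z}^C = \emptyset,$$

$$\mathrm{Cov}(\mathbf{u}^G) = \sigma_u^2\mathbf{I}.$$

Here $\|\mathbf{v}\| \equiv \sqrt{\mathbf{v}^T\mathbf{v}}$. Two choices of $C$ are $C(r) = r^2\log|r|$, corresponding to low-rank thin plate splines with smoothness parameter set to 2 (as defined in Wahba, 1990), and $C(r) = \exp(-|r/\rho|)(1+|r/\rho|)$, corresponding to Matérn low-rank kriging with range $\rho > 0$ and smoothness parameter set to 3/2 (as defined in Stein, 1999; Kammann and Wand, 2003). Several more examples could be added, including some where $\mathbf{Z}^C \ne \emptyset$.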
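The two radial functions of Example 8 and the resulting $\mathbf{Z}^G$ matrix can be sketched as follows. This is an illustrative Python fragment, not code from the analyses: the function names, the points and the knot locations are assumptions, and the value $C(0)=0$ for the thin plate case is the limit of $r^2\log|r|$ as $r\to 0$.

```python
import math

def thin_plate(r):
    """C(r) = r^2 log|r|, with the limiting value 0 at r = 0."""
    r = abs(r)
    return 0.0 if r == 0.0 else r * r * math.log(r)

def matern_three_halves(r, rho):
    """C(r) = exp(-|r/rho|) (1 + |r/rho|), range rho > 0."""
    t = abs(r / rho)
    return math.exp(-t) * (1.0 + t)

def radial_design(points, knots, C):
    """Z^G with (i, k) entry C(||x_i - kappa_k||)."""
    return [[C(math.dist(p, kappa)) for kappa in knots] for p in points]

# Hypothetical bivariate locations and knots.
points = [(0.0, 0.0), (3.0, 4.0)]
knots = [(0.0, 0.0), (3.0, 0.0)]
Z_tps = radial_design(points, knots, thin_plate)
Z_mat = radial_design(points, knots, lambda r: matern_three_halves(r, rho=2.0))
print(Z_tps)
print(Z_mat)
```

Unlike the truncated lines of Example 5, these basis functions depend on the predictor only through distances to the knots, so the same construction applies in any dimension.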


2.1 Covariance Matrix Priors

Over the last decade and a half, prior elicitation for the variance components in Bayesian GLMMs has been an active area of statistical research. Several authors have demonstrated that the use of improper priors for these parameters can lead to improper posteriors, with Gibbs samplers unable to detect such ill-conditioning (Hobert and Casella 1996). As a result, a popular choice is the use of proper but “diffuse” conditionally conjugate priors. In the GLMM setting with normal random effects, this corresponds to an inverse Gamma (IG) distribution for a single variance component, and an inverse Wishart distribution for a variance-covariance matrix. For hierarchical versions of GLMMs, however, recent research has shown that these priors can actually be quite informative, leading to inferences that are sensitive to the choice of the hyperparameters for these distributions (Natarajan and McCulloch 1998; Natarajan and Kass 2000; Gelman 2004). Natarajan and Kass (2000) and Gelman (2004) have proposed alternative prior elicitation strategies that improve upon the conditionally conjugate priors. In Section 4, we outline a sensitivity analysis approach that takes these latest proposals into account.

2.2 Hierarchical Centering

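Hierarchical centering of parameters has been shown to improve convergence of Markov Chain Monte Carlo schemes (Section 3) for fitting Bayesian mixed models (e.g. Gelfand, Sahu and Carlin, 1995). In the context of this section, hierarchical centering involves reparametrization of $(\boldsymbol{\beta}^R,\mathbf{u}^R)$ to $(\boldsymbol{\beta}^R,\boldsymbol{\gamma})$ where

$$\boldsymbol{\gamma} \equiv \{(\mathbf{Z}^R)^T\mathbf{Z}^R\}^{-1}(\mathbf{Z}^R)^T\mathbf{X}^R\boldsymbol{\beta}^R + \mathbf{u}^R.$$

The new vector of parameters $\boldsymbol{\gamma}$ can be further divided into $m$ sub-vectors $\boldsymbol{\gamma}_i$ with $\boldsymbol{\gamma}_i = \boldsymbol{\beta}^R + \mathbf{u}^R_i$, so that

$$\boldsymbol{\gamma} = \begin{bmatrix}\boldsymbol{\gamma}_1\\ \vdots\\ \boldsymbol{\gamma}_m\end{bmatrix}.$$

Then the general design generalized linear mixed model becomes

$$\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u} = \mathbf{Z}^R\boldsymbol{\gamma} + \mathbf{X}^G\boldsymbol{\beta}^G + \sum_{\ell=1}^{L}\mathbf{Z}^G_\ell\mathbf{u}^G_\ell + \mathbf{Z}^C\mathbf{u}^C.$$

Note that hierarchical centering is not a well-defined concept for general design or spatial structures since $\mathbf{u}^G$ and $\mathbf{u}^C$ cannot be centered in a hierarchical way similar to that for $\mathbf{u}^R$. As a result, the general design and spatial structures do not contribute to the model for the mean in a conditionally hierarchical manner.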
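The identity behind hierarchical centering, $\mathbf{Z}^R\boldsymbol{\gamma} = \mathbf{X}^R\boldsymbol{\beta}^R + \mathbf{Z}^R\mathbf{u}^R$ with $\boldsymbol{\gamma}_i = \boldsymbol{\beta}^R + \mathbf{u}^R_i$, can be checked numerically. The Python sketch below exploits the block-diagonal form of $\mathbf{Z}^R$ by working group by group; the group designs and parameter values are made up for illustration.

```python
def uncentered(X_groups, beta_R, u_groups):
    """Rows of X^R beta^R + Z^R u^R, using Z^R = blockdiag(X_i^R)."""
    return [sum(x * b for x, b in zip(row, beta_R)) +
            sum(x * u for x, u in zip(row, u_i))
            for X_i, u_i in zip(X_groups, u_groups) for row in X_i]

def centered(X_groups, gamma_groups):
    """Rows of Z^R gamma: group i only involves its own gamma_i."""
    return [sum(x * g for x, g in zip(row, gamma_i))
            for X_i, gamma_i in zip(X_groups, gamma_groups) for row in X_i]

# Hypothetical random intercept and slope design for m = 2 groups.
X_groups = [[[1.0, 0.0], [1.0, 1.0]],   # X_1^R, n_1 = 2
            [[1.0, 2.0]]]               # X_2^R, n_2 = 1
beta_R = [0.5, 2.0]
u_groups = [[0.1, -0.2], [-0.3, 0.4]]

# gamma_i = beta^R + u_i^R
gamma_groups = [[b + u for b, u in zip(beta_R, u_i)] for u_i in u_groups]

print(uncentered(X_groups, beta_R, u_groups))
print(centered(X_groups, gamma_groups))
```

Under the centered parametrization the prior becomes $\boldsymbol{\gamma}_i \sim N(\boldsymbol{\beta}^R, \boldsymbol{\Sigma}^R)$, so $\boldsymbol{\beta}^R$ enters the model only through the distribution of the $\boldsymbol{\gamma}_i$.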


2.3 Applications

This section describes three public health applications that benefit from general design Bayesian GLMM analysis. The analyses are postponed to Section 4.

2.3.1 Respiratory infection in Indonesian children

Our first example involves longitudinal measurements on 275 Indonesian children. Analyses of these data have appeared previously in the literature (e.g. Diggle, Liang and Zeger, 1995; Lin and Carroll, 2001) so our description of them will be brief. The response variable is binary: the indicator of respiratory infection. The covariate of most interest is the indicator of Vitamin A deficiency. However, the age of the child has been seen to have a non-linear effect in previous analyses.

A plausible model for these data is the Bayesian logistic additive mixed model

$$\mathrm{logit}\{P(\text{respiratory infection}_{ij} = 1)\} = \beta_0 + U_i + \boldsymbol{\beta}^T\mathbf{x}_{ij} + f(\text{age}_{ij}) \tag{5}$$

where $1\le i\le 275$ indexes child and $1\le j\le n_i$ indexes the repeated measures within child. Here $U_i \stackrel{\text{ind.}}{\sim} N(0,\sigma_U^2)$ is a random child effect, $\mathbf{x}_{ij}$ denotes measurements on a vector of 9 covariates: height, indicators for vitamin A deficiency, sex and stunting and visit number, and $f$ is modelled using penalized splines with spline basis coefficients $u_k \stackrel{\text{ind.}}{\sim} N(0,\sigma_u^2)$.

2.3.2 Caregiver stress and respiratory health

The Home Allergen and Asthma study is an ongoing longitudinal study that is investigating risk factors for incidence of childhood respiratory problems including asthma, allergy and wheeze (Gold et al., 1999). The portion of the study data that we will consider consists of 483 families who were followed for two and a half years after the birth of a child. At the start of the study, a number of demographic variables were measured on each family including race, categorized household income, categorized caregiver educational level, and child’s gender. Additionally, one of the hypothesized risk factors for childhood respiratory problems is exposure to a stressful environment (Wright et al., 2004). Each child’s environmental stress level was measured approximately bimonthly by a telephone interview and assessed on a discrete ordinal scale from 0 (no stress) to 16 (very high stress). This assessment was based on the 4-item Perceived Stress Scale (PSS-4) (Cohen, 1988).

Let $1\le i\le 483$ index family, and $1\le j\le n_i$ index the repeated measurements within each family. We arrived at the following Bayesian Poisson additive mixed model for the stress experienced by caregiver $i$ when the child was $\text{age}_{ij}$:

$$\text{stress}_{ij} \sim \mathrm{Poisson}\big[\exp\{\beta_0 + U_i + \boldsymbol{\beta}^T\mathbf{x}_{ij} + f(\text{age}_{ij})\}\big]. \tag{6}$$

The random intercept $U_i \stackrel{\text{ind.}}{\sim} N(0,\sigma_U^2)$ is a random family effect, and $\mathbf{x}_{ij}$ includes indicators of annual family income and race (see Figure 4 for details). The term $f(\text{age}_{ij})$ is a nonparametric term that we model using penalized splines with spline coefficients $u_k \stackrel{\text{ind.}}{\sim} N(0,\sigma_u^2)$. We include the nonparametric term in the model for the effect of stress as a function of child’s age since, outside of anecdotal evidence, we do not know of a biologically motivated parametric model for stress as a function of child’s age. We arrived at the other terms in the model (and removed other demographic terms and interactions from the model) based on discussions with the investigators in the study and exploratory data analyses that we fit via maximum PQL.

2.3.3 Standardized cancer incidence and proximity to a pollution source

Elevated cancer rates were observed in a region of Massachusetts, USA, known as Upper Cape Cod during the mid-1980s, and one risk factor of interest is a fuel dump at the Massachusetts Military Reservation (MMR) (Kammann and Wand 2003; French and Wand 2004). For nearly twenty years the Massachusetts Department of Public Health (MDPH) has maintained a cancer registry database which records incident cases for 22 types of cancers, including lung, breast and prostate cancers. In this example we focus on female lung cancer between 1986 and 1994.

We use a semi-parametric Poisson spatial model to investigate the relationship between census tract level female lung cancer SIRs and distance to the MMR. Let $i = 1,\ldots,45$ represent the census tracts in the study, and let $\text{observed}_i$ and $\text{expected}_i$ be the observed and expected number of incident cases of female lung cancer in tract $i$ (i.e. numerator and denominator of the SIR), respectively. After fitting a number of models that included terms for additional demographic factors and water source, we arrived at the following semiparametric Poisson spatial model:

$$\text{observed}_i \sim \mathrm{Poisson}\big[\text{expected}_i\exp\{\beta_0 + U^c_i + \beta_1 x_i + f(\text{dist}_i)\}\big], \tag{7}$$

where $x_i$ is the percentage of women in tract $i$ who were over 15 and employed outside the home in 1989, and $\text{dist}_i$ is the distance from the centroid of census tract $i$ to the centroid of the MMR. Here, $\mathbf{u}^c = (U^c_1,\ldots,U^c_{45})^T$ is a vector of spatially correlated random effects with Gaussian intrinsic autoregression distribution parameterized by variance component $\sigma_c^2$, as defined in (4). To complete the specification of the spatial correlation model, we choose a cutoff distance value $d$ and treat two census tracts as neighbors if the distance between their centroids is less than or equal to $d$. We choose $d = 7.5$ kilometers, which corresponds to the cutoff such that every census tract has at least one neighbor. We model the nonparametric term $f(\text{dist}_i)$ using penalized splines with coefficients $u_k \stackrel{\text{ind.}}{\sim} N(0,\sigma_u^2)$.
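The neighborhood rule and the conditional distributions implied by (4) can be sketched in Python as follows. The centroid coordinates, random effect values and variance below are invented for illustration (they are not the Cape Cod data), and the function names are our own.

```python
import math

def neighbours(centroids, d):
    """Tracts i and j are neighbours if their centroids are within distance d."""
    n = len(centroids)
    return [[j for j in range(n)
             if j != i and math.dist(centroids[i], centroids[j]) <= d]
            for i in range(n)]

def car_conditional(i, U, nbrs, sigma2_c):
    """Mean and variance of U_i given the rest under the intrinsic autoregression:
    mean = average of neighbouring values, variance = sigma_c^2 / (number of neighbours)."""
    n_i = len(nbrs[i])
    mean = sum(U[j] for j in nbrs[i]) / n_i
    return mean, sigma2_c / n_i

# Hypothetical centroids in kilometres and the cutoff d = 7.5 from the text.
centroids = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (5.0, 6.0)]
nbrs = neighbours(centroids, d=7.5)
U = [1.0, 0.0, 3.0, 5.0]
print(nbrs)
print(car_conditional(1, U, nbrs, sigma2_c=0.9))
```

The requirement that every tract has at least one neighbor matters in this sketch for a practical reason: a tract with an empty neighbor list would lead to a division by zero in `car_conditional`.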


3 FITTING VIA MARKOV CHAIN MONTE CARLO

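In the general design GLMM (1) and (2) the posterior distribution of

$$\boldsymbol{\nu}^T \equiv [\boldsymbol{\beta}^T\ \mathbf{u}^T]$$

is

$$[\boldsymbol{\nu}|\mathbf{y}] = \frac{\int\exp\{\mathbf{y}^T\mathbf{C}\boldsymbol{\nu} - \mathbf{1}^Tb(\mathbf{C}\boldsymbol{\nu}) - \tfrac{1}{2}(\log|\mathbf{G}| + \boldsymbol{\nu}^T\mathbf{V}^{-1}\boldsymbol{\nu})\}\,[\mathbf{G}]\,d\mathbf{G}}{\int\int\exp\{\mathbf{y}^T\mathbf{C}\boldsymbol{\nu} - \mathbf{1}^Tb(\mathbf{C}\boldsymbol{\nu}) - \tfrac{1}{2}(\log|\mathbf{G}| + \boldsymbol{\nu}^T\mathbf{V}^{-1}\boldsymbol{\nu})\}\,[\mathbf{G}]\,d\mathbf{G}\,d\boldsymbol{\nu}} \tag{8}$$

where $\mathbf{C} \equiv [\mathbf{X}\ \mathbf{Z}]$, $\mathbf{V} \equiv \mathrm{blockdiag}(\mathbf{F},\mathbf{G})$, and $[\mathbf{G}]$ is the prior on the variance components in $\mathbf{G}$. These integrals are analytically intractable for most problems. Further, in the applications we consider, the dimensionality precludes the use of numerical integration. A standard remedy is to apply a Markov Chain Monte Carlo (MCMC) algorithm to draw samples from (8). An overview of MCMC is provided by Gilks, Richardson and Spiegelhalter (1996).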

MCMC methods break up the model parameters into subsets and then sample each subset from its conditional distribution given the remaining parameters and the data; these distributions are often called “full conditionals”. In the general design GLMM the natural breakdown of the parameters is into $\boldsymbol{\nu}$ and $\mathbf{G}$, leading to the full conditionals:

$$[\boldsymbol{\nu}|\mathbf{G},\mathbf{y}] \quad\text{and}\quad [\mathbf{G}|\boldsymbol{\nu},\mathbf{y}].$$

The latter full conditional has a standard form when the prior on the variance components is inverse gamma or inverse Wishart, which are “conditionally-conjugate” priors for this model, but not when, say, a folded-Cauchy prior is used (e.g. Gelman, 2004). The first full conditional has the general form

$$[\boldsymbol{\nu}|\mathbf{G},\mathbf{y}] \propto \exp\{\mathbf{y}^T\mathbf{C}\boldsymbol{\nu} - \mathbf{1}^Tb(\mathbf{C}\boldsymbol{\nu}) - \tfrac{1}{2}\boldsymbol{\nu}^T\mathbf{V}^{-1}\boldsymbol{\nu}\},$$

which is a non-standard distribution unless $\mathbf{y}$ is conditionally Gaussian. Clever strategies such as adaptive rejection sampling (Gilks and Wild, 1992) and slice sampling (e.g. Besag and Green, 1993; Neal, 2003) are required to draw samples. The most common versions of these algorithms work with the full conditionals of the components of $\boldsymbol{\nu}$. When $\mathbf{V}$ is diagonal, these full conditionals are of the form

$$[\nu_k|\boldsymbol{\nu}_{-k},\mathbf{G},\mathbf{y}] \propto \exp\{(\mathbf{C}^T\mathbf{y})_k\nu_k - \mathbf{1}^Tb(\mathbf{c}_k\nu_k + \mathbf{C}_{-k}\boldsymbol{\nu}_{-k}) - \tfrac{1}{2}\nu_k^2/(\mathbf{V})_{kk}\}. \tag{9}$$

Here $\mathbf{c}_k$ is the $k$th column of $\mathbf{C}$, $\mathbf{C}_{-k}$ is $\mathbf{C}$ with the $k$th column omitted, $\nu_k$ is the $k$th entry of $\boldsymbol{\nu}$ and $\boldsymbol{\nu}_{-k}$ is $\boldsymbol{\nu}$ with the $k$th entry omitted. It is easily shown that (9) is log-concave, which permits use of adaptive rejection sampling and simplifies slice sampling. These algorithms can also be used to sample from the full conditionals for the variance components when necessary.
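To illustrate how a draw from a full conditional of the form (9) might be obtained, the following Python sketch implements a generic univariate slice sampler with stepping-out and shrinkage in the spirit of Neal (2003). It is a simplified illustration under our own assumptions, not the algorithm used internally by WinBUGS: the function names, the step width `w`, the toy data and the offset vector (which stands for $\mathbf{C}_{-k}\boldsymbol{\nu}_{-k}$) are all hypothetical.

```python
import math
import random

def log_full_conditional(nu_k, y, c_k, offset, v_kk, b=math.exp):
    """Log of (9) up to a constant; offset plays the role of C_{-k} nu_{-k}."""
    lin = sum(yi * ci for yi, ci in zip(y, c_k)) * nu_k
    pen = sum(b(ci * nu_k + oi) for ci, oi in zip(c_k, offset))
    return lin - pen - 0.5 * nu_k ** 2 / v_kk

def slice_sample(logf, x0, w=1.0, max_steps=50, rng=random):
    """One slice-sampling update of a univariate density with log density logf."""
    # Draw the auxiliary height under the density at the current point.
    log_y = logf(x0) + math.log(1.0 - rng.random())
    # Step out until both ends of the interval lie outside the slice.
    left = x0 - w * rng.random()
    right = left + w
    for _ in range(max_steps):
        if logf(left) < log_y:
            break
        left -= w
    for _ in range(max_steps):
        if logf(right) < log_y:
            break
        right += w
    # Sample uniformly from the interval, shrinking it after each rejection.
    while True:
        x1 = left + (right - left) * rng.random()
        if logf(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

# Toy Poisson example: three counts sharing a single intercept nu_k.
random.seed(0)
y, c_k, offset = [3, 5, 4], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
logf = lambda v: log_full_conditional(v, y, c_k, offset, v_kk=100.0)
x, draws = 0.0, []
for _ in range(2000):
    x = slice_sample(logf, x)
    draws.append(x)
print(sum(draws) / len(draws))  # posterior mean of the log-rate
```

Because (9) is log-concave, each slice is a single interval, so the stepping-out and shrinkage steps never have to cope with disjoint pieces of the slice. In a full Gibbs sampler this update would be cycled over $k$, with the offset recomputed from the current values of the other components.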


A companion paper to this one (Zhao, Staudenmayer, Coull and Wand, 2004) provides a detailed account of MCMC for general design GLMMs and compares several strategies via simulation. One of the conclusions drawn from that paper is that the WinBUGS package (Spiegelhalter, Thomas and Best, 2000) performs excellently amongst various “off-the-shelf” competitors. This is very good news since it saves the user having to write his or her own MCMC code. However, it should be noted that for large models WinBUGS can take quite some time to obtain a fit. Also, the analysis must be performed on a particular platform (Windows). But assuming that computation time is not an issue and that Windows is available we can report that, in 2004, fitting of general design GLMMs via WinBUGS has a large chance of success. For our analyses, we had access to several personal computers and ran multiple chains in parallel to assess convergence and prior sensitivity. This reduced the elapsed time it took to compute each one of the analyses by an order of magnitude.

4 DATA ANALYSES

4.1 Input Values and Prior Distributions

We used WinBUGS to fit the models described in Section 2.3. However, several input values and prior distributions needed to be specified, so we preface the analyses with the particular choices that were made.

Based on the recommendations of Gelfand et al. (1995) we used hierarchical centering of random effects. All continuous covariates were standardized to have zero mean and unit standard deviation. A strategy such as this is necessary for the method to be scale invariant given fixed choices for the hyperparameters. In the interest of making the fitted functions smooth and requiring fewer knots we used radial cubic basis functions. This corresponds to $f(x) = \beta_0 + \beta_1 x + \mathbf{Z}_x\mathbf{u}$ where

$$\mathbf{Z}_x = \Big[\underset{1\le k\le K}{|x-\kappa_k|^3}\Big]\Big[\underset{1\le k,k'\le K}{|\kappa_k-\kappa_{k'}|^3}\Big]^{-1/2} \quad\text{and}\quad \mathbf{u} \sim N(\mathbf{0},\sigma_u^2\mathbf{I}) \tag{10}$$

(French, Kammann and Wand, 2001) with $\kappa_k$ equal to the $\big(\frac{k+1}{K+2}\big)$th quantile of the unique predictor values. In general, $K$ can be chosen using rules such as

$$K = \min\big(\tfrac{1}{4}\times(\text{number of unique predictor values}),\ 35\big)$$

or those given in Ruppert (2002). However, often a considerably smaller $K$ can be found through experimentation, with the benefit of faster MCMC fitting. This approach was taken in our analyses.
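The default knot rule can be sketched in Python as below. Two details are our own assumptions rather than part of the rule as stated: the quarter of the number of unique values is rounded down to an integer (and kept at least 1), and sample quantiles are computed by linear interpolation between order statistics, which is only one of several common conventions.

```python
import math

def default_num_knots(x):
    """K = min(number of unique values / 4, 35), rounded down (assumption), at least 1."""
    n_unique = len(set(x))
    return max(1, min(n_unique // 4, 35))

def quantile(sorted_vals, p):
    """Sample quantile by linear interpolation between order statistics (assumed convention)."""
    h = (len(sorted_vals) - 1) * p
    lo = int(math.floor(h))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])

def default_knots(x, K=None):
    """kappa_k = ((k + 1) / (K + 2))th quantile of the unique predictor values, k = 1..K."""
    unique_vals = sorted(set(x))
    if K is None:
        K = default_num_knots(x)
    return [quantile(unique_vals, (k + 1) / (K + 2)) for k in range(1, K + 1)]

# Hypothetical predictor: the integers 0, 1, ..., 100.
x = list(range(101))
print(default_num_knots(x))
print(default_knots(x, K=3))
```

Passing a smaller `K` by hand, as in the last line, mirrors the practice described above of reducing the number of knots when the fit is insensitive to it.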

We considered several common variance component priors. These were inverse gamma with equal scale and shape:

$$[\sigma^2] \propto (\sigma^2)^{-(a+1)} e^{-a/\sigma^2},$$

denoted by IG(a, a), for a = 0.001, 0.01, 0.1 (Spiegelhalter et al., 2003) and the folded-t class of priors for σ (Gelman, 2004)

$$[\sigma] \propto \left(1 + \frac{1}{\nu}\left(\frac{\sigma}{s}\right)^2\right)^{-(\nu+1)/2},$$

where s and ν are fixed scale and degrees of freedom hyperparameters, respectively. We investigated the sensitivity of the model fit to the choice of the hyperparameters. Results showed that fits based on the IG priors were stable for a ≥ 0.01, but those obtained assuming a = 0.001 behaved erratically. Out of the remaining choices, the folded-Cauchy prior, a member of the folded-t class of priors, performed well.

As a result of these empirical comparisons, in our examples we take the approach of fitting models under multiple prior distributions for the variance components and assessing the sensitivity of the results to these assumptions. Because of its popularity, we use independent IG(0.01, 0.01) priors for each variance component when fitting general design GLMMs. Results suggest this prior performs well for the examples considered in this paper. We also re-fit the models using independent folded-Cauchy prior distributions for each variance component. For a variance component square root σ and fixed scale parameter s, the folded-Cauchy distribution has probability density

$$[\sigma] \propto (\sigma^2 + s^2)^{-1}.$$

Following Gelman (2004), we take s = 25 in our examples, and check the sensitivity of results to this choice by also fitting the models for s = 12. This prior can be implemented in WinBUGS using the flexible feature that allows a user to code an arbitrary prior distribution for the model parameters (see Appendix). We also ran the models using a Uniform(0, 100) prior on σ. A theoretical comparison of such priors in the general design GLMM setting is a topic worthy of future research.
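As a short check on the notation, the folded-Cauchy prior is the ν = 1 member of the folded-t class with kernel (1 + (1/ν)(σ/s)²)^{−(ν+1)/2}. The normalising constant on the last line is included only for reference, since MCMC requires the density only up to a constant:

$$
\begin{aligned}
[\sigma] &\propto \left(1 + \frac{\sigma^2}{s^2}\right)^{-1} = \frac{s^2}{\sigma^2 + s^2} \propto (\sigma^2 + s^2)^{-1}, \qquad \sigma > 0, \\
\int_0^\infty \frac{d\sigma}{\sigma^2 + s^2} &= \frac{\pi}{2s} \quad\Longrightarrow\quad [\sigma] = \frac{2s}{\pi\,(\sigma^2 + s^2)} .
\end{aligned}
$$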

Table 1 summarizes the input values and prior distributions that were used.

4.2 Respiratory Infection in Indonesian Children

Using the prior distributions and input values given in Table 1, WinBUGS produced the output for the β coefficients summarized in Figure 1. It is seen that, for this model, the chains mix quite well with little significant autocorrelation and Gelman-Rubin √R values (Gelman and Rubin, 1992) all less than 1.01. Vitamin A deficiency is seen to have a borderline positive effect on respiratory infection, which is in keeping with previous analyses. Similar comments apply to sex and some of the visit numbers.
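For readers unfamiliar with the Gelman-Rubin diagnostic, the following Python sketch computes a basic version of √R from several parallel chains of equal length. It is an illustration only: it uses the simple between-chain and within-chain variance ratio without the degrees-of-freedom correction discussed by Gelman and Rubin (1992), the function name is hypothetical, and the chains in the usage example are simulated rather than taken from the analyses reported here.

```python
import math
import random


def gelman_rubin_sqrt_r(chains):
    """Basic potential scale reduction factor sqrt(R) for equal-length chains."""
    m = len(chains)          # number of chains
    n = len(chains[0])       # draws per chain
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and average within-chain variance W.
    B = n * sum((mu - grand_mean) ** 2 for mu in means) / (m - 1)
    W = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * W + B / n
    return math.sqrt(var_plus / W)


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated chains that all target the same distribution.
    chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
    print(gelman_rubin_sqrt_r(chains))
```

Values close to 1 indicate that the chains are hard to distinguish from one another, which is how the thresholds quoted in the text are read.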


Table 1: Input values and prior distributions used in WinBUGS for analyses.

Hierarchical centering used for random intercepts.
Continuous covariates standardized to have zero mean and unit standard deviation.
Radial cubic basis functions for smooth functions.

length of burn-in                5000
length of ‘kept’ chain           5000
thinning factor                  5
prior for fixed effects          N(0, 10^8)
prior for variance components    IG(0.01, 0.01)
                                 folded-Cauchy with s = 12, 25
                                 Uniform(0, 100)

Figure 1: Summary of WinBUGS output for parametric components of (5). The columns are: name of variable, trace plot of sample of corresponding coefficient, plot of sample against 1-lagged sample, sample autocorrelation function, Gelman-Rubin √R diagnostic, kernel estimate of posterior density and basic numerical summaries.

[Figure not reproduced. The numerical summaries in the last column are:]

coeff.           posterior mean    95% credible interval
vit. A defic.     0.606            (−0.402, 1.55)
sex               0.519            (0.0131, 1.02)
height           −0.0306           (−0.0851, 0.0228)
stunted           0.498            (−0.393, 1.42)
visit 2          −1.15             (−1.96, −0.392)
visit 3          −0.617            (−1.36, 0.102)
visit 4          −1.38             (−2.31, −0.519)
visit 5           0.437            (−0.181, 1.08)
visit 6          −0.071            (−0.824, 0.664)


The estimated effect of age is summarized in the top left panel of Figure 2 and is seen to be significant and non-linear. The remaining panels show good mixing of the chains corresponding to the estimated age effect at quartiles of the age data. Gelman-Rubin √R plots (not shown here) support convergence of these chains.

Figure 2: Summary of WinBUGS output for estimate of f(age). The top left panel is the posterior mean of the estimated probability of respiratory infection with all other covariates set to their average values. The shaded region is a corresponding pointwise 95% credible set. The remaining panels are trace plots of samples used to produce the top left plot at quartiles of the age data.

[Figure not reproduced. Panels: estimated P(respiratory infection) against age (years); trace plots against index at the first quartile, median and third quartile of age.]

To assess the sensitivity of our conclusions to the choice of variance component priors, we also ran the Gibbs samplers assuming the folded-Cauchy and Uniform priors for the random effects standard deviations (see Section 2.1). Figure 3 shows the posterior estimates and 95% credible intervals for the regression coefficients of interest using the default independent IG priors, independent folded-Cauchy priors with s = 25, independent folded-Cauchy priors with s = 12, and independent U(0, 100) priors. This figure shows that results are not sensitive to this choice: the posterior means never change by more than 2% of those obtained from the IG specification, and the credible intervals are never more than 6.5% wider than their IG counterparts.
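The two sensitivity measures quoted above (relative change in the posterior mean and relative change in credible interval width) are simple to compute once the summaries are in hand. The Python sketch below shows one way to do so. The function names are hypothetical, and the "alternative" summary in the usage example is a made-up illustration, not a result from this paper; only the reference summary for sex is taken from Figure 1.

```python
def percent_change(new, reference):
    """Signed percentage change of `new` relative to `reference`."""
    return 100.0 * (new - reference) / abs(reference)


def sensitivity_summary(reference, alternative):
    """Compare two (posterior_mean, lower, upper) summaries of one coefficient."""
    ref_mean, ref_lo, ref_hi = reference
    alt_mean, alt_lo, alt_hi = alternative
    return {
        "mean_change_pct": percent_change(alt_mean, ref_mean),
        "width_change_pct": percent_change(alt_hi - alt_lo, ref_hi - ref_lo),
    }


if __name__ == "__main__":
    ig_fit = (0.519, 0.0131, 1.02)    # sex coefficient under IG(0.01, 0.01), from Figure 1
    other_fit = (0.525, 0.000, 1.05)  # hypothetical fit under another prior
    print(sensitivity_summary(ig_fit, other_fit))
```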

4.3 Caregiver Stress and Respiratory Health

For this example, we also used the priors and input values given in Table 1 and provide WinBUGS code in the Appendix. For the spline, we used twelve knots that were spaced evenly on the percentiles of age. We found that the fit did not change noticeably if we used more knots, and we chose a small number of knots for computational efficiency.

Figure 3: Results of sensitivity analysis for variance component priors for model (5). Dots are posterior means. Lines are 95% credible intervals.

[Figure not reproduced. Vertical axis: coefficient posterior (units of standard deviation). Horizontal axis: vit A def, sex, height, stunted, visit 2, visit 3, visit 4, visit 5, visit 6. Legend: IG(0.01,0.01), Cauchy (s=12), Cauchy (s=25), Uniform (0,100).]

Figure 4 shows the Bayes estimates and credible intervals for the β coefficients as well as an assessment of the convergence of the chains. The coefficients can be interpreted as category-specific offsets from the population mean. The chains had moderate autocorrelation and the Gelman-Rubin √R values were all less than 1.04. Figure 5 contains the estimated age effect and trace plots for the effect of age at the quartiles of the data. Again, the Gelman-Rubin √R values were less than 1.04 and support convergence. The figures are based on the chain that used the independent inverse gamma priors for the variance components. Fits that used independent Cauchy (s = 25) priors for the square root of the variance components changed neither the posterior means nor the widths of the credible intervals for the parameters of interest by more than 4.7%. The posterior mean and credible set for f(age) were also relatively insensitive to the prior on the variance components in this example.

Two aspects of the fit were of particular interest to the investigators in the study: the inverse dose-response relation between income and stress, and the finding that race was significantly related to environmental stress even after accounting for the effect of income. The nonparametric estimate of stress as a function of the child’s age was also interesting and suggests that relatively stressful times include the first few months, when the child is approximately a year old, and beyond age two.

Figure 4: Summary of WinBUGS output for parametric components of (6). The columns are: name of variable, trace plot of sample of corresponding coefficient, plot of sample against 1-lagged sample, sample autocorrelation function, Gelman-Rubin √R diagnostic, kernel estimate of posterior density and basic numerical summaries. The coefficients can be interpreted as time invariant offsets to the time varying population mean.

[Figure not reproduced. The numerical summaries in the last column are:]

coeff.               posterior mean    95% credible interval
white or Asian       −0.0734           (−0.134, −0.0142)
black or Hispanic     0.0734           (0.0142, 0.134)
$50K+                −0.19             (−0.276, −0.104)
$15K−$50K            −0.0033           (−0.0815, 0.0784)
<$15K                 0.193            (0.0475, 0.336)

4.4 Standardized Cancer Incidence and Proximity to a Pollution Source

As in the previous examples, we started with the prior distributions and inputs in Table 1. In this case though, the chain required a longer burn-in. We found that a burn-in of length 15,000 was sufficient to produce acceptable convergence. Figure 6 (bottom panel) contains the resulting convergence diagnostics and inferences for the parameters in the model. The middle panel of Figure 6 contains an estimate of the contribution of distance to the MMR to the standardized incidence and trace plots of the function estimate at the quartiles of distance. The Gelman-Rubin √R values for the estimates at these quartiles were less than 1.04 and support convergence. Finally, the top panel of Figure 6 maps the estimated SIRs based on the model fit, demonstrating the smoothing achieved by the spatial model. The figures are based on fits that used independent IG(0.01, 0.01) priors for the variance components. Fits that used independent Cauchy (s = 25) priors for the square root of the variance components decreased the length of the credible interval for the effect of percent working by 6.7% and lowered the posterior mean by 3.1%. The posterior mean and credible set for f(dist_i) also changed very little.

Figure 5: Summary of WinBUGS output for estimate of f(age). The top left panel is the posterior mean of the mean stress (PSS4) as a function of age with all other covariates set to their average values. The shaded region is a corresponding pointwise 95% credible set. The remaining panels are trace plots of samples used to produce the top left plot at quartiles of the age data.

[Figure not reproduced. Panels: PSS4 (caregiver stress) against child’s age (years); trace plots of PSS4 (caregiver stress) against index at the first quartile, median and third quartile of age.]

The fitted model suggests a nominally positive relation between the percent of women who were working outside the home in 1989 and standardized lung cancer incidence rates at the census tract level. Further, the estimated curve f(dist_i) suggests an increased standardized incidence rate for census tracts that are closer than about 10 km to the MMR after controlling for other factors, and the map suggests that areas immediately east of the MMR exhibit the highest SIRs. None of the estimated effects of the model covariates are strongly significant. Regardless of statistical significance, however, we emphasize that this type of “cancer cluster” study should be viewed as exploratory since the study design is ecological (e.g. Kelsey et al., 1996, Chapter 10). Additionally, re-analyses of similar studies have demonstrated that unmeasured confounders could radically change the conclusions in these types of analyses (e.g. Aherns et al., 2001).

5 DISCUSSION

As illustrated by the analyses in the previous section, general design Bayesian GLMMs are a very useful structure. In this article we have demonstrated that WinBUGS provides good “off-the-shelf” MCMC fitting of these models. Some of the reviewers have pointed out the possibility of designing MCMC algorithms that take advantage of the special structure of Bayesian GLMMs that is summarized in Section 2. We have done some exploration in this direction (Zhao et al., 2004) but would welcome such research from MCMC specialists. In the meantime, use of WinBUGS is our recommended fitting method.

APPENDIX: WINBUGS CODE

In this Appendix we list the WinBUGS code used for the data analyses of Section 4. Note that the spline basis functions and hyperparameters are inputs.

The following code was used for fitting (5) to the data on respiratory infection of Indonesian children. Here inverse gamma priors are used on all variance components.

    model{
       # Bernoulli response with logit link, subject-specific
       # intercepts and a penalized spline term
       for (i in 1:num.obs){
          X[i,1] <- age[i]
          X[i,2] <- vitAdefic[i]
          X[i,3] <- sex[i]
          X[i,4] <- height[i]
          X[i,5] <- stunted[i]
          X[i,6] <- visit2[i]
          X[i,7] <- visit3[i]
          X[i,8] <- visit4[i]
          X[i,9] <- visit5[i]
          logit(mu[i]) <- gamma[subject[i]] + inprod(beta[],X[i,])
                          + inprod(u.spline[],Z.spline[i,])
          resp[i] ~ dbern(mu[i])
       }
       # Subject-specific intercepts
       for (i.subj in 1:num.subj){
          gamma[i.subj] <- beta0 + u.subj[i.subj]
          u.subj[i.subj] ~ dnorm(0,tau.u.subj)
       }
       # Spline coefficients
       for (k in 1:num.knots){
          u.spline[k] ~ dnorm(0,tau.u.spline)
       }
       # Fixed effects (second argument of dnorm is a precision)
       beta0 ~ dnorm(0,tau.beta)
       for (j in 1:num.pred){
          beta[j] ~ dnorm(0,tau.beta)
       }
       # Gamma priors on the precisions
       tau.u.spline ~ dgamma(A.u.spline,B.u.spline)
       tau.u.subj ~ dgamma(A.u.subj,B.u.subj)
    }

Figure 6: Summary of WinBUGS output for the fit of (7). The top panel contains a spatial plot of the smoothed SIRs (posterior means of the $U_i^c$s). The middle panel shows the estimated f(dist) and a corresponding pointwise 95% credible set along with trace plots of the samples of the function at the quartiles of distance. The bottom panel displays summaries of other parameters of interest and convergence diagnostics. Additionally, the Gelman-Rubin √R diagnostics were less than 1.03 for all the $U_i^c$s and for the f(dist) at the quartiles of distance. The MMR is the area in the center of the map that is excluded from the analysis.

[Figure not reproduced.]
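In the model for (5), the inverse gamma priors on the variance components enter as gamma priors on the precisions, because the second argument of `dnorm` in WinBUGS is a precision and `dgamma(a, b)` has shape a and rate b. The Python sketch below checks numerically, by a change of variables, that τ ~ Gamma(a, b) implies σ² = 1/τ ~ IG(a, b). The function names are hypothetical and the check is only an illustration of the parameterisation, not part of the fitting code.

```python
import math


def gamma_pdf(tau, shape, rate):
    """Gamma density with the shape/rate parameterisation."""
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(tau) - rate * tau)
    return math.exp(log_pdf)


def inverse_gamma_pdf(sigma2, shape, scale):
    """Inverse gamma density, proportional to sigma2^-(shape+1) * exp(-scale/sigma2)."""
    log_pdf = (shape * math.log(scale) - math.lgamma(shape)
               - (shape + 1.0) * math.log(sigma2) - scale / sigma2)
    return math.exp(log_pdf)


def implied_variance_pdf(sigma2, shape, rate):
    """Density of sigma2 = 1/tau when tau ~ Gamma(shape, rate)."""
    tau = 1.0 / sigma2
    # Jacobian of the transformation: |d tau / d sigma2| = 1 / sigma2^2 = tau^2.
    return gamma_pdf(tau, shape, rate) * tau ** 2


if __name__ == "__main__":
    a = 0.01  # the IG(0.01, 0.01) choice from the text
    for sigma2 in (0.1, 1.0, 5.0):
        print(sigma2, implied_variance_pdf(sigma2, a, a), inverse_gamma_pdf(sigma2, a, a))
```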

The following code was used for fitting (6) to the data on caregiver stress and respiratory health. This code illustrates the use of folded-Cauchy priors on variance components. As noted in the WinBUGS user manual (Spiegelhalter et al., 2000), a single zero Poisson observation with mean φ contributes a term exp(−φ) to the likelihood for σ, which is then combined with a flat prior over the positive real line to produce the folded-Cauchy distribution.

    model{
       # Poisson response with log link, household-specific
       # intercepts and a penalized spline term
       for (i in 1:num.obs){
          X[i,1] <- age[i]
          X[i,2] <- income1[i]
          X[i,3] <- income2[i]
          X[i,4] <- race[i]
          log(mu[i]) <- gamma[house[i]] + inprod(beta[],X[i,])
                        + inprod(u.spline[],Z.spline[i,])
          y[i] ~ dpois(mu[i])
       }
       for (i.house in 1:num.house){
          gamma[i.house] <- beta0+u.subj[i.house]
          u.subj[i.house] ~ dnorm(0,tau.u.subj)
       }
       for (k in 1:num.knots){
          u.spline[k] ~ dnorm(0,tau.u.spline)
       }
       beta0 ~ dnorm(0,tau.beta)
       for (j in 1:num.pred){
          beta[j] ~ dnorm(0,tau.beta)
       }
       # Folded-Cauchy prior on sigma.u.spline via a zero Poisson observation
       tau.u.spline <- pow(sigma.u.spline,-2)
       zero.u.spline <- 0
       sigma.u.spline ~ dunif(0,1000)
       phi.u.spline <- log((pow(sigma.u.spline,2)+pow(phi.scale.u.spline,2)))
       zero.u.spline ~ dpois(phi.u.spline)
       # Folded-Cauchy prior on sigma.u.subj, constructed in the same way
       tau.u.subj <- pow(sigma.u.subj,-2)
       zero.u.subj <- 0
       sigma.u.subj ~ dunif(0,1000)
       phi.u.subj <- log((pow(sigma.u.subj,2)+pow(phi.scale.u.subj,2)))
       zero.u.subj ~ dpois(phi.u.subj)
    }
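The zero Poisson construction used in the code for (6) can be checked numerically. A Poisson observation equal to zero with mean φ has probability exp(−φ), so taking φ = log(σ² + s²) contributes the factor 1/(σ² + s²), which is the folded-Cauchy kernel. The Python sketch below verifies this identity; the function names are hypothetical and the snippet is an illustration rather than part of the fitting code.

```python
import math


def zero_poisson_likelihood(phi):
    """P(Y = 0) for Y ~ Poisson(phi); requires phi > 0."""
    return math.exp(-phi)


def folded_cauchy_kernel(sigma, s):
    """Unnormalised folded-Cauchy density with scale s."""
    return 1.0 / (sigma ** 2 + s ** 2)


def zeros_trick_kernel(sigma, s):
    """Likelihood contribution of the zero observation with phi = log(sigma^2 + s^2)."""
    phi = math.log(sigma ** 2 + s ** 2)
    return zero_poisson_likelihood(phi)


if __name__ == "__main__":
    for sigma in (0.5, 3.0, 40.0):
        print(sigma, zeros_trick_kernel(sigma, 25.0), folded_cauchy_kernel(sigma, 25.0))
```

One point worth noting is that a Poisson mean must be positive, so the construction needs σ² + s² > 1. For the scales used in the text (s = 12 and s = 25) this holds for every σ ≥ 0.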

Below is the code that we used to fit the spatial model (7) to the Cape Cod female lung cancer data. Please note that the variance components have inverse gamma priors, and adj, weights, and num are inputs to car.normal, the normal conditional autoregressive function in WinBUGS.

    model{
       # Poisson counts with log expected counts as an offset
       for (i in 1:num.regions){
          X[i,1] <- working[i,1]
          X[i,2] <- distance[i,1]
          theta[i] <- beta0+u.spatial[i]+inprod(beta[],X[i,])+
                      inprod(u.spline[],Z.spline[i,])
          log(mu[i]) <- log(E[i])+theta[i]
          O[i] ~ dpois(mu[i])
          SIRhat[i] <- 100*mu[i]/E[i]
       }
       # Conditional autoregressive spatial random effects
       u.spatial[1:num.regions] ~ car.normal(adj[],weights[],
                                             num[],tau.u.spatial)
       for (k in 1:num.knots){
          u.spline[k] ~ dnorm(0.0,tau.u.spline)
       }
       for (j in 1:num.pred){
          beta[j] ~ dnorm(0.0,tau.beta)
       }
       beta0 ~ dnorm(0.0,tau.beta)
       # Gamma priors on the precisions
       tau.u.spatial ~ dgamma(A.u.spatial,B.u.spatial)
       tau.u.spline ~ dgamma(A.u.spline,B.u.spline)
    }
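In the spatial model code, the expected count E enters as an offset, so the quantity SIRhat = 100·mu/E depends on the data only through the linear predictor theta. The Python sketch below makes that relation explicit. The function names and the numerical values are hypothetical and serve only to illustrate the arithmetic in the model code.

```python
import math


def poisson_mean(expected, theta):
    """Mean count implied by log(mu) = log(E) + theta."""
    return expected * math.exp(theta)


def fitted_sir(theta):
    """Smoothed SIR on the percentage scale: 100 * mu / E = 100 * exp(theta)."""
    return 100.0 * math.exp(theta)


if __name__ == "__main__":
    expected, theta = 12.5, 0.2  # made-up values for one census tract
    mu = poisson_mean(expected, theta)
    print(mu, 100.0 * mu / expected, fitted_sir(theta))
```

A tract with theta = 0 has a fitted SIR of 100, meaning that its fitted count equals the expected count.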

ACKNOWLEDGMENTS

We are grateful for comments from Ciprian Crainiceanu, Jim Hodges, Scott Sisson, the editor, associate editor and two referees. This research was partially supported by U.S. National Institute of Environmental Health Sciences grant R01-ES10844-01, U.S. National Science Foundation grant NSF-DMS 0306227 and U.S. National Institutes of Health grant ES012044.


REFERENCES

Aherns, C., Altman, N., Casella, G., Eaton, M., Hwang, G. J. T., Staudenmayer, J., and Stefansescu, C. (2001). Leukemia clusters and TCE waste sites in Upstate New York: How adding covariates changes the story. Environmetrics, 12, 659–672.

Anderson, D.A. and Aitkin, M. (1985). Variance component models with binary responses: interviewer variability. Journal of the Royal Statistical Society, Series B, 47, 203–210.

Besag, J., York, J. and Mollie, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43, 1–20.

Besag, J. and Green, P.J. (1993). Spatial statistics and Bayesian computation. Journal of the Royal Statistical Society, Series B, 55, 25–37.

Booth, J.G. and Hobert, J.P. (1998). Standard errors of prediction in generalized linear mixed models. Journal of the American Statistical Association, 93, 262–272.

Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 9–25.

Breslow, N.E. and Lin, X. (1995). Bias correction in generalised linear mixed models with a single component of dispersion. Biometrika, 82, 81–91.

Brumback, B.A., Ruppert, D. and Wand, M.P. (1999). Comment on Shively, Kohn and Wood. Journal of the American Statistical Association, 94, 794–797.

Clayton, D. (1996). Generalized linear mixed models, pp. 275–301. In Markov Chain Monte Carlo in Practice, eds. Gilks, W.R., Richardson, S. and Spiegelhalter, D. J. London: Chapman & Hall.

Cohen, S. (1988). Psychosocial models of the role of social support in the etiology of physical disease. Health Psychology, 7, 269–297.

Diggle, P., Liang, K.-L. and Zeger, S. (1995). Analysis of Longitudinal Data. Oxford: Oxford University Press.


Diggle, P.J., Tawn, J.A. and Moyeed, R.A. (1998). Model-based geostatistics (with discussion). Applied Statistics, 47, 299–350.

Durban, M. and Currie, I. (2003). A note on P-spline additive models with correlated errors. Computational Statistics, 18, 263–292.

Fahrmeir, L. and Lang, S. (2001). Bayesian inference for generalized additive mixed models based on Markov random field priors. Unpublished manuscript.

French, J.L., Kammann, E.E. and Wand, M.P. (2001). Comment on Ke and Wang. Journal of the American Statistical Association, 96, 1285–1288.

French, J.F. and Wand, M.P. (2004). Generalized additive models for cancer mapping with incomplete covariates. Biostatistics, 3, 000-000.

Gelfand, A.E., Sahu, S.K. and Carlin, B.P. (1995). Efficient parametrisations for normal linear mixed models. Biometrika, 82, 479–488.

Gelman, A. (2004). Prior distributions for variance parameters in hierarchical models. Technical Report. Available at http://polmeth.wustl.edu/papers/04/tau5.pdf .

Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. London: Chapman and Hall.

Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337–348.

Gilmour, A. R., Anderson, R. D. and Rae, A. L. (1985). The analysis of binomial data by a generalized linear mixed model. Biometrika, 72, 593–599.

Gold, D.R., Burge, H.A., Carey, V., Milton, D.K., Platts-Mills, T. and Weiss, S.T. (1999). Predictors of repeated wheeze in the first year of life: the relative roles of cockroach, birth weight, acute lower respiratory illness, and maternal smoking. American Journal of Respiratory and Critical Care Medicine, 160, 227–236.

Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.


Handcock, M.S. and Stein, M.L. (1993). A Bayesian analysis of kriging. Technometrics, 35, 403–410.

Hobert, J. P. and Casella, G. (1996). The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association, 91, 1461–1473.

Kammann, E.E. and Wand, M.P. (2003). Geoadditive models. Applied Statistics, 52, 1–18.

Kelsey, J., Whittemore, A., Evans, A., and Thompson, W. D. (1996). Methods in Observational Epidemiology. New York: Oxford University Press.

Kreft, I. and de Leeuw, J. (1998). Introducing Multilevel Modelling. London: Sage.

Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963–974.

Lin, X. and Breslow, N.E. (1996). Bias correction in generalized linear mixed models with multiple components of dispersion. Journal of the American Statistical Association, 91, 1007–1016.

Lin, X. and Carroll, R.J. (2001). Semiparametric regression for clustered data. Biometrika, 88, 1179–1865.

McCullagh, P., and Nelder, J.A. (1989). Generalized Linear Models (Second Edition). London: Chapman and Hall.

McCulloch, C.E., and Searle, S.R. (2000). Generalized, Linear, and Mixed Models. New York: John Wiley & Sons.

Natarajan, R. and McCulloch, C.E. (1998). Gibbs sampling with diffuse proper priors: a valid approach to data-driven inference? Journal of Computational and Graphical Statistics, 7, 267–277.

Natarajan, R. and Kass, R.E. (2000). Reference Bayesian methods for generalized linear mixed models. Journal of the American Statistical Association, 95, 227–237.

Neal, R. M. (2003). Slice sampling (with discussion). Annals of Statistics, 31, 705–767.


Nychka, D. & Saltzman, N. (1998). Design of Air Quality Monitoring Networks.In Case Studies in Environmental Statistics (D. Nychka, Cox, L., Piegorsch,W. eds.), Lecture Notes in Statistics, Springer-Verlag, 51–76.

Robinson, G.K. (1991). That BLUP is a good thing: the estimation of randomeffects. Statistical Science, 6, 15–51.

Ruppert, D. (2002). Selecting the number of knots for penalized splines. Journalof Computational and Graphical Statistics, 11, 735–757.

Ruppert, D., Wand, M. P. and Carroll, R.J. (2003). Semiparametric Regression. NewYork: Cambridge University Press.

Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika, 78, 719–727.

Shun, Z. (1997). Another look at the salamander mating data: a modified Laplace approximation approach. Journal of the American Statistical Association, 92, 341–349.

Speed, T. (1991). Comment on paper by Robinson. Statistical Science, 6, 42–44.

Spiegelhalter, D., Thomas, A. and Best, N. (2000). WinBUGS Version 1.3 User Manual. www.hrc-bsu.cam.ac.uk/bugs.

Spiegelhalter, D. J., Thomas, A., Best, N. G., Gilks, W. R. and Lunn, D. (2003). BUGS: Bayesian inference using Gibbs sampling. MRC Biostatistics Unit, Cambridge, England. www.mrc-bsu.cam.ac.uk/bugs.

Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. New York: Springer.

Stiratelli, R., Laird, N.M. and Ware, J.H. (1984). Random effects models for serial observations with binary responses. Biometrics, 40, 961–971.

Verbyla, A.P. (1994). Testing linearity in generalized linear models. Contributed Pap. 17th Int. Biometric Conf., Hamilton, Aug. 8th–12th, 177.

Wahba, G. (1990). Spline Models for Observational Data. Philadelphia: SIAM.

Wakefield, J.C., Best, N.G. and Waller, L. (2000). Bayesian approaches to disease mapping. In Spatial Epidemiology, eds. Elliott, P., Wakefield, J.C., Best, N.G. and Briggs, D.J. Oxford: Oxford University Press. 104–127.

Wand, M. P. (2003). Smoothing and mixed models. Computational Statistics, 18, 223–249.

Wolfinger, R. and O’Connell, M. (1993). Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation, 48, 233–243.

Wright, R.J., Finn, P., Contreras, J.P., Cohen, S., Wright, R.O., Staudenmayer, J., Wand, M.P., Perkins, D., Weiss, S.T. and Gold, D.R. (2004). Chronic Caregiver Stress and IgE Expression, Allergen-Induced Proliferation, and Cytokine Profiles in a Birth-cohort Predisposed to Atopy. To appear in Journal of Allergy and Clinical Immunology.

Zeger, S.L. and Karim, M. R. (1993). Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association, 86, 79–86.

Zhao, Y., Staudenmayer, J., Coull, B.A. and Wand, M.P. (2004). Comparison of Markov Monte Carlo Methods for Generalised Linear Mixed Models. Unpublished manuscript.
