

STATISTICS IN MEDICINE
Statist. Med. 2007; 26:3782–3800
Published online 29 November 2006 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/sim.2776

Regression models for mixed Poisson and continuous longitudinal data

Ying Yang∗,†, Jian Kang, Kai Mao and Jie Zhang

Department of Mathematical Sciences, Tsinghua University, Beijing 100084, People’s Republic of China

SUMMARY

In this article we develop flexible regression models with two aims: to evaluate the influence of the covariates on mixed Poisson and continuous responses, and to evaluate how the correlation between the Poisson response and the continuous response changes over time. We propose a strategy for regression modelling of mixed continuous and Poisson responses when the variances are heterogeneous and the correlation changes over time. Our general approach is first to build the marginal models jointly and to check, via a likelihood ratio test, whether the variance and correlation change over time. If they do, we apply a suitable data transformation so that the influence of the covariates on the mixed responses can be evaluated properly. The proposed methods are applied to the interstitial cystitis data base (ICDB) cohort study, and we find that the positive correlations change significantly over time, which suggests that heterogeneous variances should not be ignored in modelling and inference. Copyright © 2006 John Wiley & Sons, Ltd.

KEY WORDS: Poisson responses; generalized estimating equation; longitudinal data; correlation changing over time; likelihood ratio test

1. INTRODUCTION

The interstitial cystitis data base (ICDB) cohort study [1] motivated this work. Interstitial cystitis (IC) is a chronic illness characterized by symptoms in at least one of three facets: pain in the pelvic or bladder area, urgency (pressure to urinate) and frequency of urination. Not all subjects diagnosed with IC have all three symptoms, and there are theories that the diagnosis includes subgroups with different underlying disease mechanisms and different sets of symptoms. One of the aims

∗Correspondence to: Ying Yang, Department of Mathematical Sciences, Tsinghua University, Beijing 100084, People’s Republic of China.

†E-mail: [email protected]

Contract/grant sponsor: National Natural Science Foundation of China; contract/grant number: 10671106

Received 17 October 2006; Accepted 22 October 2006
Copyright © 2006 John Wiley & Sons, Ltd.


in evaluating this theory would be to determine whether symptoms tend to fluctuate together (suggesting a single underlying aetiology) or vary independently (suggesting multiple mechanisms at work). The ICDB contains longitudinal data on pain, urgency and urinary frequency from a prevalent cohort of subjects with IC. For each individual, pain score (p), urinary urgency (u) and urinary frequency (f) were observed at different times together with covariates such as the demographic and clinical characteristics of the patient; p and u are continuous variables and f is discrete. It is reasonable to regard f as a Poisson variable. The objective of this article is to develop regression models that serve two purposes: to evaluate the influence of the covariates on the responses, and to investigate how the correlation between the continuous response (e.g. p or u) and the Poisson response (e.g. f) changes over time.

For clustered bivariate binary and continuous responses, Fitzmaurice and Laird [2] discussed regression methods for jointly analysing the two responses in developmental toxicity studies. They modelled the marginal expectations of both variables, whereas the association between the binary and continuous responses was regarded as a nuisance parameter. Based on the general location model of Olkin and Tate [3], they developed a likelihood-based approach to estimate the marginal mean parameters. In our ICDB data set, the discrete variable is Poisson rather than binary, so the methods of Fitzmaurice and Laird [2] cannot be applied directly. Moreover, we are interested in the correlation between the Poisson response and the continuous response in its own right, and therefore have to estimate this correlation over time.

In this article we develop methods to model jointly the marginal expectations of the Poisson response and the continuous response, and to explore how the correlation between the two responses changes over time. The generalized estimating equation (GEE) methodology is used to estimate the regression parameters and the correlations over time. A likelihood ratio test is employed to check whether the variances change over time. We find that the variance and correlation do change over time in our motivating ICDB data set. A data transformation technique and a likelihood ratio test approach are therefore proposed to deal with variances and correlation that change over time. Our general approach is first to build the marginal models jointly and to check, via the likelihood ratio test, whether the variance and correlation change over time. If they do, we apply a suitable data transformation so that the influence of the covariates on the mixed responses can be evaluated properly. Details are discussed in Sections 2.2 and 3.

We describe the joint distribution as the product of the marginal distribution of one response and the conditional distribution of the other. Following this idea, two models are available, depending on which response is modelled first, and there is some literature on both: (1) describe the marginal distribution of the discrete response first, then define the conditional distribution of the continuous one (for more details, see [3–5] and the references therein); (2) describe the marginal distribution of the continuous response first, then give the conditional distribution of the discrete one (e.g. [6]).

The remainder of the article is organized as follows. In Section 2, we describe the regression model and estimation methods for bivariate Poisson and continuous responses. In Section 3 we investigate the correlation between the Poisson and continuous responses: a regression model placed directly on the correlation coefficient is proposed in Section 3.1, and Section 3.2 provides another approach that studies the correlation through the covariance structure of the random errors over time. The likelihood ratio test approach is used to test whether the variances and correlation change over time.


When the variances change over time, a data transformation technique is proposed to remove the impact of heterogeneous variances on the evaluation of the varying-coefficient functions in the regression model of the mixed responses. In Section 4 we conduct a Monte Carlo study to evaluate the performance of the proposed methods. Section 5 applies the approaches to the ICDB data set. Section 6 contains a discussion.

2. MODEL AND PARAMETER ESTIMATION

In this section we describe the regression model for mixed Poisson and continuous responses and provide methods to estimate the regression parameters.

2.1. Likelihood function

Let $X_i$ and $Y_i$ be the Poisson and continuous responses, respectively, and let $Z_{1i}$ and $Z_{2i}$ be $p \times 1$ and $q \times 1$ covariate vectors, respectively, $i = 1, \ldots, n$. Assume the marginal distribution of $X_i$ is Poisson

$$
f(x_i \mid Z_i) = \frac{\lambda_i^{x_i}}{x_i!}\, e^{-\lambda_i}, \qquad 1 \le i \le n
$$

where $\log \lambda_i = \eta_i = Z_{1i}^{T}\beta_1$, $\lambda_i$ is the parameter of the Poisson distribution, $\beta_1$ is a $p \times 1$ unknown parameter vector to be estimated, and the superscript $T$ denotes the transpose of a matrix or vector. Obviously, $E(x_i) = \lambda_i$ and $\mathrm{var}(x_i) = \lambda_i$.

Given $X_i$, we assume that the conditional distribution of $Y_i$ is normal

$$
f_{Y_i \mid X_i}(y_i \mid x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{[y_i - \mu_i - \gamma(x_i - \lambda_i)]^2}{2\sigma^2} \right\}
$$

where $\mu_i = Z_{2i}^{T}\beta_2$, $\gamma$ is a regression parameter of $Y_i$ on $X_i$ describing their correlation, and $\beta_2$ is an unknown $q \times 1$ vector. For convenience of notation, let $\theta = (\beta_2^{T}, \gamma)^{T}$ and $W_i = (Z_{2i}^{T}, x_i - \lambda_i)^{T}$. Then

$$
E(Y_i \mid X_i) = W_i^{T}\theta
$$

So the joint distribution of $(X_i, Y_i)$ can be written as

$$
f_{X_i, Y_i}(x_i, y_i) = f_{X_i}(x_i)\, f_{Y_i \mid X_i}(y_i \mid x_i)
$$

where $E(Y_i) = Z_{2i}^{T}\beta_2$, and $\beta_1$ and $\beta_2$ are regression parameters with marginal interpretations. For the log mean of the Poisson response and the mean of the continuous response, we have the following linear regression models:

$$
\log \lambda_i = Z_{1i}^{T}\beta_1
$$

$$
\mu_i = Z_{2i}^{T}\beta_2
$$
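As a short worked check of the marginal interpretation, the following sketch uses only the two assumptions stated in this section (the Poisson margin of $X_i$ and the normal conditional law of $Y_i$ given $X_i$) together with the laws of iterated expectation and total variance. It is an illustration under those assumptions, not a result quoted from the article.

$$
\begin{aligned}
E(Y_i) &= E\{E(Y_i \mid X_i)\} = \mu_i + \gamma\, E(X_i - \lambda_i) = Z_{2i}^{T}\beta_2 \\
\mathrm{var}(Y_i) &= E\{\mathrm{var}(Y_i \mid X_i)\} + \mathrm{var}\{E(Y_i \mid X_i)\} = \sigma^2 + \gamma^2\lambda_i \\
\mathrm{cov}(X_i, Y_i) &= \mathrm{cov}\{X_i, E(Y_i \mid X_i)\} = \gamma\lambda_i \\
\mathrm{corr}(X_i, Y_i) &= \frac{\gamma\sqrt{\lambda_i}}{\sqrt{\sigma^2 + \gamma^2\lambda_i}}
\end{aligned}
$$

Under these assumptions, $\gamma = 0$ corresponds to uncorrelated responses and the sign of $\gamma$ is the sign of the correlation.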

2.2. Likelihood equations

For convenience we assume $Z_{1i} = Z_{2i} = Z_i$; if $Z_{1i} \neq Z_{2i}$, we can define $Z_i$ as the combination of $Z_{1i}$ and $Z_{2i}$.


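Then the likelihood equations can be written as:

$$
\sum_{i=1}^{n} \frac{\partial l_i(y_i, x_i)}{\partial \beta_1} = \sum_{i=1}^{n} \left( Z_i (x_i - \lambda_i) - Z_i \Delta_i (y_i - W_i^{T}\theta)\,\gamma\,\sigma^{-2} \right) \tag{1}
$$

$$
\sum_{i=1}^{n} \frac{\partial l_i(y_i, x_i)}{\partial \theta} = \sum_{i=1}^{n} W_i (y_i - W_i^{T}\theta)\,\sigma^{-2} \tag{2}
$$

$$
\sum_{i=1}^{n} \frac{\partial l_i(y_i, x_i)}{\partial \sigma^2} = \tau^{-1} \sum_{i=1}^{n} (c_i - \sigma^2) \tag{3}
$$

where $\Delta_i = \mathrm{var}(X_i) = \lambda_i$, $C_i = (Y_i - W_i^{T}\theta)^2$ and $\tau = \mathrm{var}(C_i)$.

Note that the likelihood equations for $\beta_1$, $\theta$, $\sigma^2$ given by (1)–(3) can also be written as

$$
\sum_{i=1}^{n}
\begin{pmatrix}
\dfrac{\partial l_i}{\partial \beta_1} \\[2mm]
\dfrac{\partial l_i}{\partial \theta} \\[2mm]
\dfrac{\partial l_i}{\partial \sigma^2}
\end{pmatrix}
= \sum_{i=1}^{n}
\begin{pmatrix}
\Delta_i Z_i & -\gamma \Delta_i Z_i & 0 \\
0 & W_i & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\Delta_i^{-1} & 0 & 0 \\
0 & \sigma^{-2} & 0 \\
0 & 0 & \tau^{-1}
\end{pmatrix}
\begin{pmatrix}
x_i - \lambda_i \\
y_i - W_i^{T}\theta \\
c_i - \sigma^2
\end{pmatrix}
$$

$$
= \sum_{i=1}^{n}
\begin{pmatrix}
\dfrac{\partial E(X_i)}{\partial \beta_1} & \dfrac{\partial E(X_i)}{\partial \theta} & \dfrac{\partial E(X_i)}{\partial \sigma^2} \\[2mm]
\dfrac{\partial E(Y_i \mid X_i)}{\partial \beta_1} & \dfrac{\partial E(Y_i \mid X_i)}{\partial \theta} & \dfrac{\partial E(Y_i \mid X_i)}{\partial \sigma^2} \\[2mm]
\dfrac{\partial E(C_i \mid X_i)}{\partial \beta_1} & \dfrac{\partial E(C_i \mid X_i)}{\partial \theta} & \dfrac{\partial E(C_i \mid X_i)}{\partial \sigma^2}
\end{pmatrix}^{T}
\mathrm{cov}^{-1}
\begin{pmatrix}
X_i \\
Y_i \mid X_i \\
C_i \mid X_i
\end{pmatrix}
\begin{pmatrix}
x_i - E(X_i) \\
y_i - E(Y_i \mid X_i) \\
c_i - E(C_i \mid X_i)
\end{pmatrix}
$$

The likelihood equations above can be solved iteratively. Applying iteratively reweighted least squares (IRLS) to (2) and (3), we obtain the maximum likelihood estimates (MLE) of $\theta$ and $\sigma^2$:

$$
\hat\theta = \left( \sum_{i=1}^{n} W_i W_i^{T} \right)^{-1} \sum_{i=1}^{n} W_i y_i \tag{4}
$$

$$
\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - W_i^{T}\hat\theta)^2 \tag{5}
$$

We then substitute these estimates into (1) and use the Fisher scoring algorithm to obtain the MLE of $\beta_1$.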
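The iteration just described can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `fit_joint_model`, the starting value for $\beta_1$ and the stopping rule are assumptions, the design matrix `Z` is stored with one row $Z_i^{T}$ per observation, and the scoring step uses the $\beta_1$ block $(\lambda_i + \lambda_i^2\gamma^2\sigma^{-2}) Z_i Z_i^{T}$ of the information as its weight.

```python
import numpy as np

def fit_joint_model(x, y, Z, n_iter=50, tol=1e-8):
    """Alternate the closed forms (4)-(5) with Fisher-scoring steps for beta1 from (1).

    x : (n,) Poisson counts, y : (n,) continuous responses, Z : (n, p) covariates.
    Returns (beta1, beta2, gamma, sigma2). Hypothetical helper for illustration.
    """
    # Assumed starting value: least squares fit of log(x + 1) on Z.
    beta1 = np.linalg.lstsq(Z, np.log(x + 1.0), rcond=None)[0]
    for _ in range(n_iter):
        lam = np.exp(Z @ beta1)
        W = np.column_stack([Z, x - lam])            # rows are W_i^T
        theta = np.linalg.solve(W.T @ W, W.T @ y)    # equation (4)
        resid = y - W @ theta
        sigma2 = np.mean(resid ** 2)                 # equation (5)
        gamma = theta[-1]
        # Score (1) with Delta_i = lambda_i.
        score = Z.T @ ((x - lam) - lam * resid * gamma / sigma2)
        # beta1 block of the Fisher information.
        weights = lam + lam ** 2 * gamma ** 2 / sigma2
        info = Z.T @ (Z * weights[:, None])
        step = np.linalg.solve(info, score)
        beta1 = beta1 + step
        if np.max(np.abs(step)) < tol:
            break
    return beta1, theta[:-1], gamma, sigma2
```

On simulated data drawn from the model, the returned values should be close to the generating parameters; the sketch does not compute standard errors.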


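The covariance of $(\hat\beta_1, \hat\beta_2, \hat\gamma, \hat\sigma^2)$ can be approximated by the inverse of the Fisher information matrix, that is

$$
\mathrm{cov}^{-1}(\hat\beta_1, \hat\beta_2, \hat\gamma, \hat\sigma^2) \approx \sum_{i=1}^{n}
\begin{bmatrix}
(\lambda_i + \lambda_i^2\gamma^2\sigma^{-2})\, Z_i Z_i^{T} & (\sigma^{-2}\gamma\lambda_i)\, Z_i Z_i^{T} & 0 & 0 \\
(\sigma^{-2}\gamma\lambda_i)\, Z_i Z_i^{T} & \sigma^{-2} Z_i Z_i^{T} & 0 & 0 \\
0 & 0 & \lambda_i\sigma^{-2} & 0 \\
0 & 0 & 0 & \tfrac{1}{2}\sigma^{-4}
\end{bmatrix}
\tag{6}
$$

Note that the elements in the positions $(\beta_1, \sigma^2)$, $(\beta_2, \sigma^2)$ and $(\gamma, \sigma^2)$ of the above matrix are all zero, indicating that the three groups of parameters are orthogonal, which is consistent with the conclusion for binary data [2, 7, 8]. The conclusion of this section is similar to that in [2].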

2.3. GEE methodology to handle longitudinal data

In this section we consider longitudinal data, in which each unit has multiple observations at different times. We assume that each of $N$ independent individuals has $n_i$ observations of the mixed Poisson and continuous responses $(X_i, Y_i)$, where $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})^{T}$, $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})^{T}$, and the covariates can be written as $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{in_i})$, a $p \times n_i$ matrix in which each $Z_{ik}$ is a $p \times 1$ vector.

In this case, as pointed out by Liang et al. [9], the maximum likelihood method is unavailable or too complicated to use. We therefore employ the GEE methodology to handle longitudinal mixed Poisson and continuous data. First, we build the regression model for the Poisson and continuous responses as follows:

$$
\begin{aligned}
\log(E(X_i)) &= Z_i^{T}\beta_1 \\
E(Y_{ik} \mid X_i) &= Z_{ik}^{T}\beta_2 + \gamma_1 (X_{ik} - \lambda_{ik}) + \gamma_2 S_i
\end{aligned}
\tag{7}
$$

where $S_i = \sum_{j=1}^{n_i} (X_{ij} - \lambda_{ij})$, and $\gamma_1$, $\gamma_2$ describe the correlation between $X_i$ and $Y_i$, with $\gamma_1$ representing the influence of $X_i$ on $Y_i$ within the same observation and $\gamma_2$ the influence of the other observations within that unit. To simplify the expression, let $W_{ik} = (Z_{ik}^{T}, X_{ik} - \lambda_{ik}, S_i)^{T}$ and $\theta = (\beta_2^{T}, \gamma_1, \gamma_2)^{T}$. Then the second formula above can be simplified as

$$
E(Y_{ik} \mid X_i) = W_{ik}^{T}\theta
$$

Following the idea of Fitzmaurice and Laird [2], we assume that the within-unit correlations are $\rho_X$ and $\rho_Y$ for the Poisson and continuous responses, respectively, and we anticipate observations within a unit to be positively correlated. We therefore obtain the following approximate covariance matrices:

$$
\mathrm{cov}(X_i) = V_{1i} \approx \Delta_i^{1/2} [(1 - \rho_X) I_i + \rho_X J_i]\, \Delta_i^{1/2} \tag{8}
$$

$$
\mathrm{cov}(Y_i \mid X_i) = V_{2i} \approx \sigma^2 [(1 - \rho_Y) I_i + \rho_Y J_i] \tag{9}
$$

where $\Delta_i$ is a diagonal matrix with elements $\mathrm{var}(X_{ij}) = \lambda_{ij}$, $I_i$ is an $n_i \times n_i$ identity matrix, and $J_i$ is an $n_i \times n_i$ matrix of 1’s.


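The estimates of $\rho_X$, $\rho_Y$ and $\sigma^2$ are defined as follows:

$$
r_{Xij} = \frac{X_{ij} - \lambda_{ij}(\hat\beta_1)}{\sqrt{\lambda_{ij}(\hat\beta_1)}} \tag{10}
$$

$$
r_{Yij} = Y_{ij} - W_{ij}^{T}\hat\theta \tag{11}
$$

$$
\hat\phi^{-1} = \left( \sum_{i=1}^{N} \sum_{j=1}^{n_i} r_{Xij}^2 \right) \Bigg/ \left( \sum_{i=1}^{N} n_i - p \right) \tag{12}
$$

$$
\hat\rho_X = \hat\phi \left( \sum_{i=1}^{N} \sum_{j > j'} r_{Xij}\, r_{Xij'} \right) \Bigg/ \left( \sum_{i=1}^{N} \tfrac{1}{2} n_i (n_i - 1) - p \right) \tag{13}
$$

$$
\hat\sigma^2 = \left( \sum_{i=1}^{N} \sum_{j=1}^{n_i} r_{Yij}^2 \right) \Bigg/ \left( \sum_{i=1}^{N} n_i - p - 1 \right) \tag{14}
$$

$$
\hat\rho_Y = \hat\sigma^{-2} \left( \sum_{i=1}^{N} \sum_{j > j'} r_{Yij}\, r_{Yij'} \right) \Bigg/ \left( \sum_{i=1}^{N} \tfrac{1}{2} n_i (n_i - 1) - p - 1 \right) \tag{15}
$$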

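Similar to the likelihood equations in Section 2.2, we have

$$
\sum_{i=1}^{N}
\begin{pmatrix}
\dfrac{\partial l_i}{\partial \beta_1} \\[2mm]
\dfrac{\partial l_i}{\partial \theta}
\end{pmatrix}
= \sum_{i=1}^{N}
\begin{pmatrix}
\dfrac{\partial E(X_i)}{\partial \beta_1} & \dfrac{\partial E(X_i)}{\partial \theta} \\[2mm]
\dfrac{\partial E(Y_i \mid X_i)}{\partial \beta_1} & \dfrac{\partial E(Y_i \mid X_i)}{\partial \theta}
\end{pmatrix}^{T}
\mathrm{cov}^{-1}
\begin{pmatrix}
X_i \\
Y_i \mid X_i
\end{pmatrix}
\begin{pmatrix}
x_i - E(X_i) \\
y_i - E(Y_i \mid X_i)
\end{pmatrix}
$$

This yields the following system of GEEs for $\beta_1$ and $\theta$:

$$
\sum_{i=1}^{N}
\begin{pmatrix}
\Delta_i Z_i & -(\gamma_1 + \gamma_2)\Delta_i Z_i \\
0 & W_i
\end{pmatrix}
\begin{pmatrix}
V_{1i}^{-1} & 0 \\
0 & V_{2i}^{-1}
\end{pmatrix}
\begin{pmatrix}
X_i - \lambda_i \\
Y_i - W_i^{T}\theta
\end{pmatrix}
= 0 \tag{16}
$$

Given initial values of $(\beta_1, \theta)$, we substitute them into (10)–(15) to obtain estimates of $(\rho_X, \rho_Y, \sigma^2)$, denoted by $(\hat\rho_X, \hat\rho_Y, \hat\sigma^2)$. These are substituted into (8) and (9) to obtain estimates of $(V_{1i}, V_{2i})$, which are in turn substituted into (16); solving this equation gives updated values of $\beta_1$ and $\theta$. Repeating these steps until convergence yields the estimates $(\hat\beta_1, \hat\theta)$. Furthermore, it can be shown that $(\hat\beta_1, \hat\theta)$ is consistent and has an asymptotic multivariate normal distribution.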
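The moment step of this iteration, equations (10)–(15), can be sketched as follows. The function name `moment_estimates` and the data layout (one array per unit) are assumptions made for illustration; the fitted values $\lambda_{ij}(\hat\beta_1)$ and $W_{ij}^{T}\hat\theta$ are taken as given, and solving (16) itself is not shown.

```python
import numpy as np

def moment_estimates(X, Y, lam, fitted_y, p):
    """Moment estimates (rho_X, rho_Y, sigma2) from equations (10)-(15).

    X, Y, lam, fitted_y : lists of 1-D arrays, one array per unit i.
    p : number of regression covariates. Hypothetical helper for illustration.
    """
    sum_rx2 = sum_ry2 = cross_x = cross_y = 0.0
    n_obs = 0
    n_pairs = 0
    for x_i, y_i, lam_i, fit_i in zip(X, Y, lam, fitted_y):
        r_x = (x_i - lam_i) / np.sqrt(lam_i)       # equation (10)
        r_y = y_i - fit_i                          # equation (11)
        n_i = len(x_i)
        n_obs += n_i
        n_pairs += n_i * (n_i - 1) // 2
        sum_rx2 += np.sum(r_x ** 2)
        sum_ry2 += np.sum(r_y ** 2)
        # The sum over pairs j > j' equals ((sum r)^2 - sum r^2) / 2.
        cross_x += (np.sum(r_x) ** 2 - np.sum(r_x ** 2)) / 2.0
        cross_y += (np.sum(r_y) ** 2 - np.sum(r_y ** 2)) / 2.0
    phi = (n_obs - p) / sum_rx2                    # (12) defines phi^{-1}
    rho_x = phi * cross_x / (n_pairs - p)          # equation (13)
    sigma2 = sum_ry2 / (n_obs - p - 1)             # equation (14)
    rho_y = cross_y / (sigma2 * (n_pairs - p - 1)) # equation (15)
    return rho_x, rho_y, sigma2
```

The returned values would then be plugged into (8) and (9) to form the working covariance matrices for the next solve of (16).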

In addition, the ‘model-based’ covariance matrix of the parameters can be estimated by

$$
\Sigma_m = I_0^{-1} \tag{17}
$$


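where

$$
I_0 = \sum_{i=1}^{N}
\begin{pmatrix}
\Delta_i Z_i & -(\gamma_1 + \gamma_2)\Delta_i Z_i \\
0 & W_i
\end{pmatrix}
\begin{pmatrix}
V_{1i}^{-1} & 0 \\
0 & V_{2i}^{-1}
\end{pmatrix}
\begin{pmatrix}
\Delta_i Z_i & -(\gamma_1 + \gamma_2)\Delta_i Z_i \\
0 & W_i
\end{pmatrix}^{T}
$$

When $\mathrm{cov}(X_i) \neq V_{1i}$ or $\mathrm{cov}(Y_i \mid X_i) \neq V_{2i}$, that is, when the variance–covariance matrix is ‘misspecified’, we can use a robust estimate of the covariance instead:

$$
\Sigma_e = I_0^{-1} I_1 I_0^{-1} \tag{18}
$$

where

$$
I_1 = \sum_{i=1}^{N}
\begin{pmatrix}
Z_i^{T}\Delta_i & -(\gamma_1 + \gamma_2) Z_i^{T}\Delta_i \\
0 & W_i^{T}
\end{pmatrix}
\begin{pmatrix}
V_{1i}^{-1} & 0 \\
0 & V_{2i}^{-1}
\end{pmatrix}
\mathrm{cov}(X_i, Y_i)
\begin{pmatrix}
V_{1i}^{-1} & 0 \\
0 & V_{2i}^{-1}
\end{pmatrix}
\begin{pmatrix}
Z_i^{T}\Delta_i & -(\gamma_1 + \gamma_2) Z_i^{T}\Delta_i \\
0 & W_i^{T}
\end{pmatrix}^{T}
$$

and $\mathrm{cov}(X_i, Y_i)$ can be replaced in the calculation by $[X_i - \lambda_i(\hat\beta_1),\, Y_i - W_i\hat\theta]\,[X_i - \lambda_i(\hat\beta_1),\, Y_i - W_i\hat\theta]^{T}$.

Finally, consider the testing problem

$$
H_0 : L\beta = 0 \quad \text{versus} \quad H_1 : L\beta \neq 0 \tag{19}
$$

As noted by Zhang and Boos [10] and Rotnitzky and Jewell [11], the estimates (17) and (18) can be combined to construct a statistic for test (19):

$$
T = S(\tilde\beta)\, \Sigma_m L^{T} (L \Sigma_e L^{T})^{-1} L\, \Sigma_m\, S(\tilde\beta)^{T} \tag{20}
$$

where $L$ is a user-defined $k \times (2p + 2)$ matrix, $\beta = (\beta_1, \beta_2, \gamma_1, \gamma_2)$, and $S(\tilde\beta)$ is the value of the generalized estimating equation at $\tilde\beta$, the regression parameter estimate obtained from the GEE under the restricted model $L\beta = 0$. This statistic approximately follows the chi-square distribution $\chi^2_k$ with $k$ degrees of freedom.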

3. CORRELATION OVER TIME

One of the important problems in our study concerns the correlation between the two responses. We address the questions of how the correlation between the Poisson and continuous responses changes over time and how the covariates affect this correlation. We also need to remove the impact of heterogeneous variances on the evaluation of the varying-coefficient functions in the regression model of the mixed responses. In this section we provide a comprehensive strategy for dealing with these issues, which arise in inference for longitudinal mixed continuous and Poisson responses.


3.1. Regression model for correlation

In this section we construct a regression model directly on the correlation coefficient. To do so, we need a set of sample correlation coefficients at the observed time points and a matching set of covariates. There are two possible ways to obtain them. In the first, we compute a sample correlation coefficient for each unit from that unit's responses at its observed time points, which gives one correlation coefficient per unit, paired with the covariates of that unit. Although this yields an estimate of the population correlation, it cannot show how the correlation changes over time, which is the focus of the current work. We therefore adopt the second way: at each time point we compute a sample correlation coefficient from the responses of all units, which gives one correlation coefficient per observed time point. This raises another issue: how should the corresponding covariates be chosen? At each time point we need a covariate vector that summarizes the covariates of all units at that time. A natural choice is the average of the covariates at each time point. For simplicity, we suppose that the observed time points are regular, that is, all individuals are observed at the same given time points, which implies $n_1 = \cdots = n_N \equiv n$, and we denote the time points by $t_1, \ldots, t_n$.

So we consider the following log-linear model:

$$
\log \frac{1 + r_{t_j}}{1 - r_{t_j}} = \bar Z_{t_j}^{T} b_1 + t_j b_2 + \varepsilon_{t_j}, \qquad 1 \le j \le n \tag{21}
$$

where $b_1$ and $b_2$ are the regression coefficients to be estimated and $r_{t_j}$ is the observed correlation computed from the responses $X_{it_j}$, $Y_{it_j}$ of all units at a given measurement time $t_j$, that is

$$
r_{t_j} = \frac{\sum_{i=1}^{N} (X_{it_j} - \bar X_{t_j})(Y_{it_j} - \bar Y_{t_j})}{\sqrt{\sum_{i=1}^{N} (X_{it_j} - \bar X_{t_j})^2}\, \sqrt{\sum_{i=1}^{N} (Y_{it_j} - \bar Y_{t_j})^2}}
$$

$\bar Z_{t_j}$ is the average of the continuous covariates at time point $t_j$, i.e. $\bar Z_{t_j} = (1/N) \sum_{i=1}^{N} Z_{it_j}$, and $\varepsilon_{t_1}, \ldots, \varepsilon_{t_n}$ are independent random errors with mean zero and finite variances. The estimates of $b_1$ and $b_2$ in (21) can be obtained by ordinary least squares.
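The least squares fit of model (21) can be sketched as below. The function name `fit_correlation_regression` and the array layout (units in rows, time points in columns) are assumptions for illustration; the sketch also assumes a regular design, so that every unit is observed at every time point.

```python
import numpy as np

def fit_correlation_regression(X, Y, Zbar, t):
    """Least squares fit of model (21).

    X, Y : (N, n) arrays of responses, units in rows and time points in columns.
    Zbar : (n, q) array of covariate averages per time point.
    t    : (n,) array of time points.
    Returns (r, b1, b2). Hypothetical helper for illustration.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample correlation across units at each time point.
    r = (Xc * Yc).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (Yc ** 2).sum(axis=0))
    z = np.log((1.0 + r) / (1.0 - r))        # left-hand side of (21)
    design = np.column_stack([Zbar, t])
    coef = np.linalg.lstsq(design, z, rcond=None)[0]
    return r, coef[:-1], coef[-1]
```

The transform on the left-hand side maps correlations in $(-1, 1)$ to the whole real line, which is why an unconstrained linear fit is reasonable here; a sample correlation of exactly $\pm 1$ would need separate handling.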

We can also fit the following non-linear correlation regression model:

$$
r_{t_j} = \frac{e^{\bar Z_{t_j}^{T} b_1 + t_j b_2} - 1}{e^{\bar Z_{t_j}^{T} b_1 + t_j b_2} + 1} + \varepsilon_{t_j}, \qquad 1 \le j \le n \tag{22}
$$

The estimates of $b_1$ and $b_2$ in (22) are found by an iterative non-linear least squares method, which takes more time than the direct least squares estimate in (21). Our simulation results nevertheless show that, at the cost of this longer computing time, the non-linear fit is more effective than the linear one, provided that the least squares estimates from (21) are used as the initial values in (22).

We further discuss model (21) for the correlation. Obviously, model (21) is not applicable to classification variables. If the covariates $Z_{ij}$ are continuous, it seems reasonable to take the average of the covariates of all individuals at any given time. However, this raises two major problems. First, averaging is perhaps not the best way to summarize the information of all the units. Alternative choices of covariates in (21) are possible; for example, some sufficient statistics could be taken as the covariates of model (21). Further research on this point is needed. Second, and more importantly, choosing covariates by averaging may not be applicable if the covariates include discrete, binary or categorical data; we will investigate this problem in the future.

3.2. Covariance structure of the random error

Since the correlation between the two responses is related to the covariance structure of the random errors, it may be helpful to focus on how this covariance structure changes over time. We therefore replace the joint model (7) by the varying-coefficient model [12]

$$
\begin{aligned}
\log(x_{ij} + 1) &= Z_{ij1}\beta_1(t_j) + \varepsilon_{ij1} \\
y_{ij} &= Z_{ij2}\beta_2(t_j) + \varepsilon_{ij2}
\end{aligned}
\tag{23}
$$

where the independent random errors satisfy $\varepsilon_{ij} \equiv (\varepsilon_{ij1}, \varepsilon_{ij2})^{T} \sim N(0, \Sigma(t_j))$, and

$$
\Sigma(t_j) =
\begin{pmatrix}
\sigma_1^2(t_j) & \rho(t_j)\sigma_1(t_j)\sigma_2(t_j) \\
\rho(t_j)\sigma_1(t_j)\sigma_2(t_j) & \sigma_2^2(t_j)
\end{pmatrix}
$$

is the population covariance structure at time point $t_j$, $i = 1, 2, \ldots, m_j$, $j = 1, 2, \ldots, M$; $m_j$ denotes the number of individuals at time point $t_j$ and $M$ denotes the number of observed time points; $Z_{ijk} = (Z_{ijk1}, \ldots, Z_{ijkP})$ and $\beta_k(t_j) = (\beta_{k1}(t_j), \beta_{k2}(t_j), \ldots, \beta_{kP}(t_j))^{T}$, $k = 1, 2$. The correlation function $\rho(\cdot)$ reflects how the correlation between the Poisson and continuous responses changes over time. At the same time, we can check whether the variance functions $\sigma_1^2(\cdot)$ and $\sigma_2^2(\cdot)$ change over time.

The covariance matrix function $\Sigma(\cdot)$ can be estimated by a two-step method.

1. First step: for any fixed time point $t_j$, we use ordinary least squares to estimate $\beta_1(t_j)$ and $\beta_2(t_j)$ based on the observations of all individuals at that time, and then compute the residuals $\hat\varepsilon_{ij}$, $i = 1, \ldots, m_j$. The estimate of $\Sigma(t)$ at $t = t_j$ is defined by

$$
\hat\Sigma(t_j) = \frac{1}{N - p - 1} \sum_{i=1}^{m_j} \hat\varepsilon_{ij}\, \hat\varepsilon_{ij}^{T}
$$

2. Second step: using a standard smoothing technique, e.g. a kernel estimate, a smoothing spline or a local polynomial, we smooth each of the components $\hat\sigma_1(t)$, $\hat\sigma_2(t)$ and $\hat\rho(t)$ of $\hat\Sigma(t)$.
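The two steps above can be sketched as follows. The names `pointwise_covariance` and `kernel_smooth` are hypothetical, the smoother is a Gaussian-kernel Nadaraya-Watson average chosen only as one example of a "standard smoothing technique", and the divisor is taken as $m_j - p - 1$ on the assumption that $N$ in the formula above is the number of individuals observed at that time point.

```python
import numpy as np

def pointwise_covariance(x, y, Z1, Z2):
    """First step at one time point: OLS residuals and their 2 x 2 covariance.

    x : (m,) counts, y : (m,) continuous responses, Z1, Z2 : (m, p) covariates.
    """
    m, p = Z1.shape
    log_x = np.log(x + 1.0)
    b1 = np.linalg.lstsq(Z1, log_x, rcond=None)[0]
    b2 = np.linalg.lstsq(Z2, y, rcond=None)[0]
    resid = np.column_stack([log_x - Z1 @ b1, y - Z2 @ b2])
    # Assumed divisor m - p - 1 (see the lead-in).
    return resid.T @ resid / (m - p - 1)

def kernel_smooth(t, values, bandwidth):
    """Second step: Gaussian-kernel weighted average of pointwise estimates."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return (w @ values) / w.sum(axis=1)
```

In use, one would call `pointwise_covariance` once per time point, extract $\hat\sigma_1$, $\hat\sigma_2$ and $\hat\rho$ from each matrix, and pass each of the three sequences to `kernel_smooth`; the bandwidth is a tuning choice that the sketch leaves to the user.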

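In order to check whether the covariance function $\Sigma(t)$ changes over time, we test the null hypothesis

$$
H_0 : \Sigma(t_j) = \Sigma_0 \quad \text{for all } j = 1, \ldots, M \tag{24}
$$

where

$$
\Sigma_0 =
\begin{pmatrix}
\sigma_{01}^2 & \rho_0\sigma_{01}\sigma_{02} \\
\rho_0\sigma_{01}\sigma_{02} & \sigma_{02}^2
\end{pmatrix}
$$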


is a constant positive definite matrix, against the alternative

$$
H_1 : \Sigma(t_{j_1}) \neq \Sigma(t_{j_2}) \quad \text{for some } j_1 \neq j_2 \in \{1, \ldots, M\}
$$

For this testing problem, the likelihood ratio test is employed. The likelihood ratio statistic is defined by

$$
LR = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid X, Y, Z)}{\sup_{\theta \in \Theta} L(\theta \mid X, Y, Z)}
= \frac{[\hat\sigma_{01}^2\, \hat\sigma_{02}^2\, (1 - \hat\rho_0^2)]^{-\sum_{j=1}^{M} m_j / 2}}{\prod_{j=1}^{M} [\hat\sigma_1^2(t_j)\, \hat\sigma_2^2(t_j)\, (1 - \hat\rho^2(t_j))]^{-m_j / 2}} \tag{25}
$$

where $L(\theta \mid X, Y, Z)$ denotes the likelihood function, $\theta = (\theta(t_1), \theta(t_2), \ldots, \theta(t_M))$, $\theta(t_j) = (\beta_1(t_j)^{T}, \beta_2(t_j)^{T}, \sigma_1(t_j), \sigma_2(t_j), \rho(t_j))$, $X = (x_1, x_2, \ldots, x_M)$, $x_j = (x_{1j}, \ldots, x_{m_j j})$, $Y = (y_1, y_2, \ldots, y_M)$, $y_j = (y_{1j}, y_{2j}, \ldots, y_{m_j j})$, $Z = (Z_1, Z_2, \ldots, Z_M)$, $Z_j = (Z_{j1}, Z_{j2})$, $Z_{j1} = (Z_{1j1}, \ldots, Z_{m_j j1})$, $Z_{j2} = (Z_{1j2}, \ldots, Z_{m_j j2})$; $\Theta$ denotes the whole parameter space and $\Theta_0$ denotes the parameter space under $H_0$. The formulae for calculating the MLEs $\hat\theta(t_j)$, $j = 1, 2, \ldots, M$, and $\hat\theta_0$ are given in Appendix A.

Under $H_0$, the statistic $-2 \log LR$ asymptotically follows [13] the chi-square distribution $\chi^2_r$ with $r = (3M + P_1 + P_2) - (3 + P_1 + P_2) = 3(M - 1)$ degrees of freedom, where $3M + P_1 + P_2$ is the dimension of $\Theta$, $3 + P_1 + P_2$ is the dimension of $\Theta_0$, and $P_i$, $i = 1, 2$, is the dimension of $\beta_i(t_j)$.
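Because $\sigma_1^2\sigma_2^2(1 - \rho^2)$ is the determinant of the $2 \times 2$ covariance matrix, $-2 \log LR$ in (25) reduces to a weighted difference of log-determinants. The sketch below computes it from residuals; it is an illustration under stated assumptions, not the Appendix A formulae, which are not reproduced here. In particular, the function name `lr_constant_covariance` is hypothetical, the residuals are taken as given, and the covariance MLEs are assumed to be the average outer products of those residuals (pooled over time under $H_0$).

```python
import numpy as np

def lr_constant_covariance(residuals):
    """-2 log LR of (25) and its degrees of freedom, from per-time residuals.

    residuals : list of (m_j, 2) arrays, one per time point.
    Assumes covariance MLEs are average residual outer products.
    """
    m = np.array([len(e) for e in residuals], dtype=float)
    scatter = [e.T @ e for e in residuals]
    # log det of the time-specific covariance estimates.
    logdet_t = np.array([np.linalg.slogdet(s / mj)[1] for s, mj in zip(scatter, m)])
    # log det of the pooled estimate under H0.
    pooled = sum(scatter) / m.sum()
    logdet_0 = np.linalg.slogdet(pooled)[1]
    stat = m.sum() * logdet_0 - np.sum(m * logdet_t)
    df = 3 * (len(residuals) - 1)
    return stat, df
```

The statistic would be compared with the upper quantile of $\chi^2_r$ with $r = 3(M - 1)$; the quantile lookup is left out so that the sketch depends on NumPy only.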

Once the null hypothesis $H_0$ is rejected, which means that the covariance function $\Sigma(t)$ does change over time, we have to consider how to remove its impact on the estimates of $\beta_i(t_j)$, $i = 1, 2$; $j = 1, 2, \ldots, M$, and on the significance tests of the covariates. Data transformation is a simple and effective method for dealing with this problem. Under the framework of model (23), we propose to carry out a linear transformation using the matrix $\hat\Sigma(t_j)^{-1/2}$ for both responses and their corresponding covariates at time point $t_j$, which transforms the varying variance structures of the random errors at different time points into a uniform variance structure that does not change over time. Let $x_{ij}^{*}$, $y_{ij}^{*}$ and $Z_{ij1}^{*}$, $Z_{ij2}^{*}$ denote the transformed response variables and covariates of the $i$th individual, respectively. The relationship between the transformed and original data is

$$
\begin{pmatrix} x_{ij}^{*} \\ y_{ij}^{*} \end{pmatrix}
= \hat\Sigma(t_j)^{-1/2}
\begin{pmatrix} \log(x_{ij} + 1) \\ y_{ij} \end{pmatrix},
\qquad
\begin{pmatrix} Z_{ij1}^{*} \\ Z_{ij2}^{*} \end{pmatrix}
= \hat\Sigma(t_j)^{-1/2}
\begin{pmatrix} Z_{ij1} \\ Z_{ij2} \end{pmatrix}
\tag{26}
$$

After applying the transformation (26) to model (23), we have the transformed model

$$
\begin{aligned}
x_{ij}^{*} &= Z_{ij1}^{*}\beta_1(t_j) + \varepsilon_{ij1}^{*} \\
y_{ij}^{*} &= Z_{ij2}^{*}\beta_2(t_j) + \varepsilon_{ij2}^{*}
\end{aligned}
\tag{27}
$$

where the independent random errors satisfy $\varepsilon_{ij}^{*} \equiv (\varepsilon_{ij1}^{*}, \varepsilon_{ij2}^{*})^{T} \sim N(0, I_2)$ and $I_2$ is a $2 \times 2$ identity matrix.

Now we can test the significance of each covariate based on model (27). For given $k = 1, 2$ and $p = 1, 2, \ldots, P$, consider the null hypothesis

$$
H_0 : \beta_{kp}(t_j) = 0 \quad \text{for all } j = 1, \ldots, M \tag{28}
$$

versus

$$
H_1 : \beta_{kp}(t_j) \neq 0 \quad \text{for some } j \in \{1, \ldots, M\}
$$

where $\beta_k(t) = (\beta_{k1}(t), \beta_{k2}(t), \ldots, \beta_{kP}(t))^{T}$, $k = 1, 2$.
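The transformation (26) needs the inverse square root of the estimated $2 \times 2$ covariance matrix. One common way to compute it, shown here as a sketch with hypothetical function names, is through the eigendecomposition of the symmetric matrix; this gives the symmetric square root, which is an assumption since the text does not say which square root is intended.

```python
import numpy as np

def inverse_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def transform_pair(S_hat, first, second):
    """Apply S_hat^{-1/2} to the stacked pair (first; second), as in (26).

    first, second : 1-D arrays of equal length, e.g. log(x_ij + 1) and y_ij
    over individuals, or one covariate row from each equation.
    """
    stacked = np.vstack([first, second])     # shape (2, k)
    out = inverse_sqrt(S_hat) @ stacked
    return out[0], out[1]
```

A quick sanity check of the design choice is that `inverse_sqrt(S) @ S @ inverse_sqrt(S)` should return the identity matrix, which is what makes the transformed errors have unit covariance.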


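Construct a likelihood ratio test statistic as follows:

$$
LR_{k,p} =
\begin{cases}
\dfrac{\exp\left\{ -\sum_{j=1}^{M} \tfrac{1}{2} \left\| x_j^{*} - Z_{\cdot j1[-p]}^{*}\, \hat\beta_{1[-p]}(t_j) \right\|^2 \right\}}{\exp\left\{ -\sum_{j=1}^{M} \tfrac{1}{2} \left\| x_j^{*} - Z_{\cdot j1}^{*}\, \hat\beta_1(t_j) \right\|^2 \right\}}, & k = 1 \\[6mm]
\dfrac{\exp\left\{ -\sum_{j=1}^{M} \tfrac{1}{2} \left\| y_j^{*} - Z_{\cdot j2[-p]}^{*}\, \hat\beta_{2[-p]}(t_j) \right\|^2 \right\}}{\exp\left\{ -\sum_{j=1}^{M} \tfrac{1}{2} \left\| y_j^{*} - Z_{\cdot j2}^{*}\, \hat\beta_2(t_j) \right\|^2 \right\}}, & k = 2
\end{cases}
\tag{29}
$$

where the MLE of $\beta_{k[-p]}(t_j) = (\beta_{k1}, \ldots, \beta_{k(p-1)}, \beta_{k(p+1)}, \ldots, \beta_{kP})^{T}$ can be obtained by

$$
\hat\beta_{1[-p]}(t_j) = \left( Z_{\cdot j1[-p]}^{*T} Z_{\cdot j1[-p]}^{*} \right)^{-1} Z_{\cdot j1[-p]}^{*T}\, x_j^{*}
$$

$$
\hat\beta_{2[-p]}(t_j) = \left( Z_{\cdot j2[-p]}^{*T} Z_{\cdot j2[-p]}^{*} \right)^{-1} Z_{\cdot j2[-p]}^{*T}\, y_j^{*}
$$

which are the estimates of $\beta_1(t_j)$ and $\beta_2(t_j)$ under the null hypothesis, and $Z_{\cdot jk[-p]}^{*} = (Z_{\cdot jk1}^{*}, \ldots, Z_{\cdot jk(p-1)}^{*}, Z_{\cdot jk(p+1)}^{*}, \ldots, Z_{\cdot jkP}^{*})^{T}$; the dot ‘$\cdot$’ indicates that the norm sums over all $i$. The definitions of $x_j^{*}$, $y_j^{*}$, etc. are similar to those of $x_j$, $y_j$, e.g. $x_j^{*} = (x_{1j}^{*}, \ldots, x_{m_j j}^{*})$.

As with the likelihood ratio statistic for checking whether the variance changes over time, $-2 \log LR_{k,p}$ asymptotically follows the chi-square distribution $\chi^2_r$ with $r = M$ degrees of freedom for all given $k$ and $p$.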
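Since the transformed errors have unit variance, $-2 \log LR_{k,p}$ in (29) is the sum over time points of the increase in residual sum of squares when covariate $p$ is dropped. A minimal sketch for one response, with the hypothetical name `lr_covariate` and the assumption that the transformed responses and design matrices are supplied per time point, is:

```python
import numpy as np

def lr_covariate(resp, design, drop):
    """-2 log LR_{k,p} of (29) for one response and its degrees of freedom.

    resp   : list of (m_j,) transformed responses, one array per time point.
    design : list of (m_j, P) transformed design matrices.
    drop   : zero-based index of the covariate removed under H0.
    """
    stat = 0.0
    for r, Zs in zip(resp, design):
        # Residuals under the full model.
        full = r - Zs @ np.linalg.lstsq(Zs, r, rcond=None)[0]
        # Residuals under the restricted model without column `drop`.
        Zr = np.delete(Zs, drop, axis=1)
        restr = r - Zr @ np.linalg.lstsq(Zr, r, rcond=None)[0]
        stat += restr @ restr - full @ full
    return stat, len(resp)
```

The returned statistic would be referred to $\chi^2_M$, with $M$ equal to the number of time points in the lists.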

4. SIMULATION

In order to demonstrate the performance of the procedures proposed in Sections 2 and 3, small-sample simulation experiments were carried out under a variety of parameter combinations.

In model (7) we take the following test parameters:

$$
\beta_1 = (0.5, 0.5, -0.8, 0, 0.2, -0.5, 0.7, 0.9, -0.1, -0.5)^{T}
$$

$$
\beta_2 = (0.9, -0.5, 1.2, -0.5, 2, -1.5, 0.7, 0.9, -1, -0.5)^{T}
$$

$$
\gamma_1 = 2, \quad \gamma_2 = 0.1, \quad m_i \equiv 50, \quad M = 20
$$

and the design matrix $Z$, of dimension $10 \times 50$, is obtained by reshaping 500 normal random variables with zero mean and unit variance. We run 200 replications and show the bias of the estimates of $\beta_1$, $\beta_2$, $\gamma_1$, $\gamma_2$ in Figure 1.

For the joint model (23) with time-varying correlation and variances, a Monte Carlo study was carried out under a variety of regression parameters, design matrices $Z$, variance functions and correlation functions changing over time. The results indicate good performance of the proposed methods. We show only part of the fitted results in Figure 2. Here we take 50 equally spaced time points $t = \{\tfrac{1}{50}, \tfrac{2}{50}, \ldots, 1\}$, the variance functions $\sigma_1(t) = (1 + t)^2$ and $\sigma_2(t) = \sqrt{2 - t^2}$, and the correlation function $\rho(t) = \tfrac{1}{2}(t + \sin(2\pi t))$, with the true $\beta_1$, $\beta_2$ and the design matrix $Z$ as at the beginning of this section.


[Figure 1: four panels of bias values, titled beta_1, beta_2, gamma_1 and gamma_2.]

Figure 1. The bias between the estimates and the designated parameters in 200 replications.

[Figure 2: plot of ρ(t) against t on [0, 1]; legend: true ρ, estimate of ρ.]

Figure 2. The simulation curves of the correlation coefficient. The solid line denotes the true correlation coefficient function; the dash-dotted line denotes the average estimated correlation coefficient over 200 replications.

We also conduct a simulation to evaluate the power of the proposed test (24) for checking the variability of the variance. We set the covariance function $\Sigma(t)$ according to the following four cases:

case 1. $\sigma_1(t) = \sigma_2(t) = 1$, $\rho(t) = 0$;
case 2. $\sigma_1(t) = (1 + t)^2$, $\sigma_2(t) = \sqrt{2 - t^2}$, $\rho(t) = 0$;
case 3. $\sigma_1(t) = \sigma_2(t) = 1$, $\rho(t) = \tfrac{1}{2}(t + \sin(2\pi t))$;
case 4. $\sigma_1(t) = (1 + t)^2$, $\sigma_2(t) = \sqrt{2 - t^2}$, $\rho(t) = \tfrac{1}{2}(t + \sin(2\pi t))$.

We run 150, 50, 50 and 50 replications for cases 1, 2, 3 and 4, respectively. The test power we obtain is 0.9430, which indicates that the test performs well in checking the variability of the variance.


Similarly, we also run a simulation to validate the power of test (28) and obtain a power of 0.9820, which indicates that the test is well suited to assessing the significance of covariates.

5. APPLICATION TO THE ICDB DATA

In this section we apply the models and estimation methods described in Sections 2 and 3 to the ICDB data [1] mentioned in the Introduction. A total of 637 eligible patients were entered into the study and followed for symptoms of pain (p), urgency (u) and urinary frequency (f). Among these responses, it is reasonable to regard urinary frequency as a Poisson random variable. Some of the research results are reported in [1, 14, 15]. After deleting missing data from the data set, 611 patients with 15 covariates were entered into our data analysis. The baseline demographic characteristics of the 611 patients are given in Table I; they are similar to those in Table II of [1]. The definitions of the covariates are given in Table II.

First, we use the method in Section 2.3 to estimate the parameters $(\beta_1, \beta_2, \gamma_1, \gamma_2)$. The Poisson response is urinary frequency, and the continuous response is either pain score or urinary urgency. We have obtained two groups of results, for (p, f) and for (u, f). Here we only report the results for (u, f) because the results of the two groups are similar. The covariate ‘severity’ is a three-level categorical variable, so we introduce dummy variables to deal with it. The estimates of the parameters $\gamma_1$ and $\gamma_2$ are 0.258 and 0, and their corresponding p-values are 0.07 and 0.06, respectively. The estimates of the parameters $\beta_1$ and $\beta_2$ and their corresponding p-values are listed

Table I. Some baseline demographic characteristics of 611 patients.

Characteristics                                          No. patients (per cent)
Sex
  M                                                      54 (8.84)
  F                                                      557 (91.16)
Race
  White                                                  569 (93.13)
  Other                                                  42 (6.87)
Marital status
  Partnered                                              430 (70.38)
  Alone                                                  181 (29.62)
Employment
  Employed                                               372 (60.88)
  Unemployed                                             68 (11.13)
  Home/Retired                                           171 (27.99)
Education
  High School or less                                    260 (42.55)
  College or Advanced                                    351 (57.45)
Annual household income ($)
  Less than 30 000                                       175 (28.64)
  30 000 or Greater                                      436 (71.36)
Previous interstitial cystitis diagnosis by physician
  Yes                                                    413 (67.59)
  No                                                     198 (32.41)


Table II. The definition and type of parameters.

Parameter    Definition                                                       Type
Intercept    Intercept                                                        Continuous
Sex          Sex: M/F                                                         Binary
Income       Annual household income: less than/more than 30 000$             Binary
shx 2        Previous interstitial cystitis diagnosis by physician: yes/no    Binary
urod 7       Volume at first sensation                                        Continuous
urod 9       Volume at maximal capacity                                       Continuous
Age          Age                                                              Continuous
Severity     Severity of symptoms                                             Categorical: 3

Table III. Parameter estimates of (u, f).

                  β1                    β2
Parameter     Estimate    p         Estimate    p
Sex           −0.438      0.00      −0.380      0.07
Income         0.003      0.96      −0.453      0.00
shx 2          0.217      0.00       0.161      0.29
urod 7        −0.001      0.00      −0.002      0.26
urod 9        −0.002      0.00      −0.001      0.20
Age            0.021      0.00       0.003      0.56
Severity
  sev 1       −0.045      0.86       3.725      0.00
  sev 2        0.436      0.07       5.406      0.00
  sev 3        0.934      0.00       6.656      0.00

in Table III. The p-values of these variables are the results of hypothesis tests of whether each level term is zero.
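The dummy coding of the three-level covariate ‘severity’ mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function `dummy_code` and the four example patients are hypothetical, and one indicator column is created per level to mirror the three rows sev 1, sev 2 and sev 3 of Table III.

```python
def dummy_code(values, levels):
    """Return one 0/1 indicator column per level, keyed by level name."""
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unexpected levels: {sorted(unknown)}")
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical severity levels for four patients.
severity = ["sev 1", "sev 3", "sev 2", "sev 1"]
columns = dummy_code(severity, ["sev 1", "sev 2", "sev 3"])
# columns["sev 1"] is the indicator of the first level, and so on.
```

If the model also contains an intercept, one indicator is usually dropped or a constraint is imposed to keep the design matrix of full rank; the article does not state which convention was used.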

From Table III we can see that the covariates ‘sex’, ‘shx 2’, ‘urod 7’, ‘urod 9’, ‘age’ and ‘severity’ are significant at the same size in the estimation of $\beta_1$, and the covariates ‘sex’, ‘income’ and ‘severity’ except the first level are significant at the same size in the estimation of $\beta_2$. Thus, we can conclude that the covariates selected above are more important to the origin of the disease.

As in Section 3.2, we can estimate the correlation over time of the pair (u, f). The estimates of the variance functions and the correlation function, obtained with the algorithm given in Section 3.2, are shown in Figure 3.

From these figures we can see that the correlation between the response pair (u, f) clearly increases over time. We employ test (24) to assess the significance of the covariance structure changing over time and obtain a p-value of $1.7529 \times 10^{-7}$, which strongly rejects $H_0$ in (24) and gives statistical support for the changing covariance structure. These findings suggest that the covariance structure changing over time should be taken into account in practical modelling. The solid lines in Figure 3(a) were obtained by the kernel estimate method, in which the bandwidth was chosen by least-squares cross-validation. From Figure 3(b) and (c),


[Figure 3: three panels (a), (b) and (c), each with time in months (0 to 40) on the horizontal axis. Legend of panel (a): observed, kernel estimate, r, r(log). Legend of panel (b): σ1 (estimate), σ1 (smoothed). Legend of panel (c): σ2 (estimate), σ2 (smoothed).]

Figure 3. (a) The correlation change over time for (u, f). The circles denote the observed correlation coefficient, the solid line denotes the kernel estimate, the dashed line denotes the estimated $\rho$, and the dotted line denotes the sample correlation coefficient after taking the log of frequency. (b) and (c) are the estimates of the variance functions $\sigma_1(t)$ and $\sigma_2(t)$, respectively. The dashed line denotes the estimate of the variance; the dotted line denotes the smoothed variance based on the least-squares cross-validated kernel estimates.

Table IV. The significance levels of the covariates for $\beta_1(t)$ and $\beta_2(t)$ based on test (28).

              β1(t)     β2(t)
Sex           0.00      0.61
Income        0.88      0.00
shx 2         0.00      0.00
urod 7        0.00      0.48
urod 9        0.00      0.35
Age           0.02      0.86
Severity
  sev 1       0.00      0.00
  sev 2       0.00      0.00
  sev 3       0.00      0.00

we can see that the variances $\sigma_1(t)$ and $\sigma_2(t)$ decrease with time. Intuitively, we cannot ignore the change of variance over time when we make inference on the significance of the covariates.
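The kernel smoothing with a bandwidth chosen by least-squares cross-validation, used for the smoothed curves in Figure 3, can be illustrated with a generic sketch. The Gaussian kernel, the Nadaraya-Watson form of the estimator, the candidate bandwidth grid and all function names below are assumptions made for illustration; the article does not specify these details.

```python
import math

def nw_estimate(t0, times, values, h, skip=None):
    """Nadaraya-Watson estimate at t0 with a Gaussian kernel of bandwidth h.

    The observation with index `skip` (if any) is left out.
    """
    num = den = 0.0
    for i, (t, y) in enumerate(zip(times, values)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((t - t0) / h) ** 2)
        num += w * y
        den += w
    return num / den

def cv_score(times, values, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = [(y - nw_estimate(t, times, values, h, skip=i)) ** 2
              for i, (t, y) in enumerate(zip(times, values))]
    return sum(errors) / len(errors)

def select_bandwidth(times, values, candidates):
    """Candidate bandwidth with the smallest cross-validation score."""
    return min(candidates, key=lambda h: cv_score(times, values, h))
```

A call such as `select_bandwidth(months, observed_corr, [2.0, 5.0, 10.0, 20.0])`, with hypothetical lists `months` and `observed_corr`, would return the bandwidth used to draw the smoothed curve. Very small bandwidths relative to the spacing of the time points should be avoided in the grid, because the kernel weights can then underflow to zero.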

Furthermore, the model in Section 3.1 could be used to investigate the correlation changing over time. However, as we mentioned, the approach in that section fails to provide a satisfactory statistical interpretation if there are discrete variables in the model. It is therefore probably not desirable for our data set, and we will not employ it to analyse our data.

Hence, we use the data transformation technique proposed in Section 3.2 and, based on the transformed data, make inference on the significance of the covariates under the framework of model (23), which can eliminate the impact of the covariance changing over time.


[Figure 4: nine panels, labelled sex, income, shx_2, urod_7, urod_9, age, sev_1, sev_2 and sev_3; each horizontal axis runs from 0 to 40.]

Figure 4. The dotted and dashed lines are the estimates of the varying-coefficient function components of $\beta_1(t)$ and $\beta_2(t)$, respectively, for the covariates sex, income, shx 2, urod 7, urod 9, age and severity with three levels sev 1, sev 2 and sev 3.

Based on test (28), we can also assess the significance of each covariate over time. As in Table III, we can see from Table IV that the covariates ‘sex’, ‘shx 2’, ‘urod 7’, ‘urod 9’, ‘age’ and ‘severity’ are significant at the same size in the estimation of $\beta_1$, and the covariates ‘income’, ‘shx 2’ and ‘severity’ are significant at the same size in the estimation of $\beta_2$. Compared with Table III, the results based on test (28) are slightly different from the results of test (19), which does not consider the covariance changing over time. First, the covariate ‘sex’ of $\beta_2$ is not significant in test (28) but significant in test (19). Second, the covariate ‘shx 2’ of $\beta_2$ is significant in test (28) but not in test (19). These differences indicate that the covariance changing over time has some impact on the significance of the covariates. The estimates of the varying-coefficient function components of $\beta_1(t)$ and $\beta_2(t)$ for the covariates sex, income, shx 2, urod 7, urod 9, age and severity with three levels sev 1, sev 2 and sev 3 are shown in Figure 4. From Figure 4 we can also observe the trends of the varying-coefficient function components of $\beta_1(t)$ and $\beta_2(t)$ over time. The model and test in Section 3.2 are an effective complement to the model in Section 2.
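The comparison between the two tests for the second coefficient vector can be checked mechanically from the p-values in the second columns of Tables III and IV. The sketch below is only a worked illustration. The name `beta_2` for that column, the helper `changed_significance` and the test size 0.10 are assumptions: the article speaks of "the same size" without giving a number, and 0.10 is the conventional level at which ‘sex’ (p = 0.07) counts as significant under test (19).

```python
ALPHA = 0.10  # assumed test size; not stated in the article

# p-values for beta_2, copied from Table III (test (19)) and Table IV (test (28)).
p_test19 = {"sex": 0.07, "income": 0.00, "shx 2": 0.29, "urod 7": 0.26,
            "urod 9": 0.20, "age": 0.56, "sev 1": 0.00, "sev 2": 0.00, "sev 3": 0.00}
p_test28 = {"sex": 0.61, "income": 0.00, "shx 2": 0.00, "urod 7": 0.48,
            "urod 9": 0.35, "age": 0.86, "sev 1": 0.00, "sev 2": 0.00, "sev 3": 0.00}

def changed_significance(p_before, p_after, alpha):
    """Covariates whose significance at level alpha differs between two tests."""
    return [name for name in p_before
            if (p_before[name] < alpha) != (p_after[name] < alpha)]

changed = changed_significance(p_test19, p_test28, ALPHA)
print(changed)  # ['sex', 'shx 2']
```

The two covariates returned are exactly the two differences discussed in the text.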

6. DISCUSSION

In this article we build regression models for mixed Poisson and continuous responses. We focus on the regression models and on modelling correlation over time through joint linear regression with a time-varying variance–covariance structure. The regression parameters and the variance–covariance matrix are estimated by the GEE methodology and the two-step methods. We first assume the marginal distribution of the discrete variable and then consider the conditional distribution of the continuous one. The Poisson response is treated with a log-linear model [16], which has the desirable advantage that the regression parameters have a marginal interpretation. Furthermore, this method is robust: even if the link function between X and Y is misspecified, the other parameters can still be well estimated. In our simulation we have shown that even if the parameters $\gamma_1$ and $\gamma_2$ in model (7) are misspecified, the estimation of $\beta_1$ and $\beta_2$ still works well. From the results in Section 5, it is easy to see that the variances change over time, so inference based on the models in Section 2 should be made with care. When the emphasis is on the relationship between the responses and the covariates, the effect of the changing variance should be removed. The methods given in Section 3.2 accommodate changing variance and correlation. To this end, we first employ the likelihood ratio test to check whether the covariance changes over time; if the null hypothesis of an unchanging covariance is rejected, we then apply the data transformation technique so that the covariance structure of the transformed data no longer changes over time. Fieuws and Verbeke [17] discussed how the associations between the hearing thresholds for different frequencies evolve over time. The observed marginal correlation in their Figure 2 is not fitted very well; the most likely reason is that they did not take the heterogeneous variance and covariance into account. With access to the data used in their Figure 2, the methods developed in this article might give a more satisfactory conclusion.

Although the maximum likelihood method is an effective way to analyse data, it is difficult to apply directly to longitudinal data, mainly because the independence assumption does not hold in this situation. We therefore employ the GEE methodology, which only requires the moments of the variables. Compared with a likelihood built on a misspecified dependence structure, the GEE methodology can retain good efficiency and gives robust results.

As to the problem arising from research on IC, we apply our model to the real data. From the results we can determine which covariates have a strong influence on the disease, and we further find that the correlation increases over time.

APPENDIX A: LIKELIHOOD RATIO TEST FOR VARIANCE CHANGING OVER TIME

The likelihood function of model (23) is

$$
\begin{aligned}
L(\theta \mid X, Y, Z) &= \prod_{j=1}^{M} f(\log(X_j + 1), Y_j \mid \theta, Z_j) \\
&= \prod_{j=1}^{M} \left(2\pi\,\sigma_1^2(t_j)\,\sigma_2^2(t_j)\,(1 - \rho^2(t_j))\right)^{-m_j/2}
\exp\Bigg\{ \sum_{j=1}^{M} \Bigg[ -\frac{\|\log(X_j + 1) - Z_{1j}\beta_1(t_j)\|^2}{2(1 - \rho^2(t_j))\,\sigma_1^2(t_j)} \\
&\qquad + \frac{\rho(t_j)\,[\log(X_j + 1) - Z_{1j}\beta_1(t_j)]^{T}[Y_j - Z_{2j}\beta_2(t_j)]}{(1 - \rho^2(t_j))\,\sigma_1(t_j)\,\sigma_2(t_j)}
- \frac{\|Y_j - Z_{2j}\beta_2(t_j)\|^2}{2(1 - \rho^2(t_j))\,\sigma_2^2(t_j)} \Bigg] \Bigg\} \qquad \text{(A1)}
\end{aligned}
$$

where $f(\cdot, \cdot \mid \theta, Z_j)$ denotes the joint density function of the random vector $(\log(X_j + 1), Y_j)$ at time point $t_j$.

Based on likelihood function (A1), we can calculate the MLE when $\theta \in \Theta$ and $\theta \in \Theta_0$, respectively (Table A1).

Table A1. Maximum likelihood estimate of $\theta$.

$$
\begin{array}{ll}
\theta \in \Theta & \theta \in \Theta_0 \\[6pt]
\hat\sigma_1(t_j) = \sqrt{\dfrac{\|\log(X_j + 1) - Z_{1j}\hat\beta_1(t_j)\|^2}{m_j}} &
\hat\sigma_{01} = \sqrt{\dfrac{\sum_{j=1}^{M} \|\log(X_j + 1) - Z_{1j}\hat\beta_1(t_j)\|^2}{\sum_{j=1}^{M} m_j}} \\[10pt]
\hat\sigma_2(t_j) = \sqrt{\dfrac{\|Y_j - Z_{2j}\hat\beta_2(t_j)\|^2}{m_j}} &
\hat\sigma_{02} = \sqrt{\dfrac{\sum_{j=1}^{M} \|Y_j - Z_{2j}\hat\beta_2(t_j)\|^2}{\sum_{j=1}^{M} m_j}} \\[10pt]
\hat\rho(t_j) = \dfrac{(\log(X_j + 1) - Z_{1j}\hat\beta_1(t_j))^{T}(Y_j - Z_{2j}\hat\beta_2(t_j))}{m_j\,\hat\sigma_1(t_j)\,\hat\sigma_2(t_j)} &
\hat\rho_0 = \dfrac{\sum_{j=1}^{M} (\log(X_j + 1) - Z_{1j}\hat\beta_1(t_j))^{T}(Y_j - Z_{2j}\hat\beta_2(t_j))}{\sum_{j=1}^{M} m_j\,\hat\sigma_{01}\,\hat\sigma_{02}} \\[10pt]
\hat\beta_1(t_j) = (Z_{1j}^{T} Z_{1j})^{-1} Z_{1j}^{T} \log(X_j + 1) & \\[6pt]
\hat\beta_2(t_j) = (Z_{2j}^{T} Z_{2j})^{-1} Z_{2j}^{T} Y_j &
\end{array}
$$

According to Table A1, we can obtain the likelihood ratio statistic in (25).
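The variance and correlation estimates of Table A1 are simple functions of the least-squares residuals at each time point. The following sketch illustrates them under the assumption that the residual vectors, namely $\log(X_j + 1)$ minus its fitted values and $Y_j$ minus its fitted values, have already been computed; the function names `mle_at_time` and `mle_pooled` and the numbers in the example are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def mle_at_time(res1, res2):
    """(sigma1_hat, sigma2_hat, rho_hat) at one time point, m_j = len(res1)."""
    m = len(res1)
    s1 = math.sqrt(dot(res1, res1) / m)
    s2 = math.sqrt(dot(res2, res2) / m)
    return s1, s2, dot(res1, res2) / (m * s1 * s2)

def mle_pooled(residual_pairs):
    """(sigma01_hat, sigma02_hat, rho0_hat) under H0, pooling all time points."""
    total_m = sum(len(r1) for r1, _ in residual_pairs)
    s01 = math.sqrt(sum(dot(r1, r1) for r1, _ in residual_pairs) / total_m)
    s02 = math.sqrt(sum(dot(r2, r2) for _, r2 in residual_pairs) / total_m)
    cross = sum(dot(r1, r2) for r1, r2 in residual_pairs)
    return s01, s02, cross / (total_m * s01 * s02)

# Hypothetical residuals at two time points with two subjects each.
pairs = [([1.0, -1.0], [2.0, -2.0]), ([3.0, -3.0], [-1.0, 1.0])]
print(mle_at_time(*pairs[0]))  # (1.0, 2.0, 1.0)
```

The time-specific estimates use only the residuals observed at $t_j$, whereas the estimates under the null hypothesis pool the residuals of all $M$ time points.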

ACKNOWLEDGEMENTS

The first author wishes to thank Prof. Kathleen Joy Propert at the University of Pennsylvania School of Medicine for providing the data from the interstitial cystitis data base. The first author is also grateful to Prof. Wensheng Guo for helpful discussion. The authors thank Prof. Michael Campbell, Editor, and the anonymous referees for their insightful suggestions and careful reading, which greatly improved this article.

REFERENCES

1. Propert KJ, Schaeffer AJ, Brensinger CM, Kusek JW, Nyberg LM, Landis JR, The interstitial cystitis data base study group. A prospective study of interstitial cystitis: results of longitudinal followup of the interstitial cystitis data base cohort. The Journal of Urology 2000; 163:1434–1439.

2. Fitzmaurice GM, Laird NM. Regression models for a bivariate discrete and continuous outcome with clustering. Journal of the American Statistical Association 1995; 90:845–852.

3. Olkin I, Tate RF. Multivariate correlation models with mixed discrete and continuous variables. Annals of Mathematical Statistics 1961; 32:448–465 (correction in 36:343–344).

4. Little RJA, Schluchter MD. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika 1985; 72:497–512.

5. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley: New York, 1987.

6. Regan MM, Catalano PJ. Likelihood models for clustered binary and continuous outcomes: application to developmental toxicology. Biometrics 1999; 55:760–768.

7. Cox DR, Reid N. Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society, Series B 1987; 49:1–39 (with discussion).

8. Cox DR, Reid N. On the stability of maximum-likelihood estimators of orthogonal parameters. Canadian Journal of Statistics 1989; 17:229–233.

9. Liang KY, Zeger SL, Qaqish BF. Multivariate regression analyses for categorical data. Journal of the Royal Statistical Society, Series B 1992; 54:3–40.

10. Zhang J, Boos DD. Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association 1992; 87(418):425–429.

11. Rotnitzky A, Jewell NP. Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika 1990; 77:485–497.

12. Hastie T, Tibshirani R. Varying-coefficient models. Journal of the Royal Statistical Society, Series B 1993; 55:757–796 (with discussion).

13. Shao J. Mathematical Statistics. Springer: Berlin, 1999.

14. Kirkemo A, Peabody M, Diokno AC et al. Associations among urodynamic findings and symptoms in women enrolled in the interstitial cystitis data base (ICDB) study. Urology 1997; 49(5A Suppl.):76–80.

15. Messing E, Pauk D, Schaeffer A et al. Associations among cystoscopic findings and symptoms and physical examination findings in women enrolled in the interstitial cystitis data base (ICDB) study. Urology 1997; 49(5A Suppl.):81–85.

16. McCullagh P, Nelder JA. Generalized Linear Models. Chapman & Hall: London, 1983; 127–148.

17. Fieuws S, Verbeke G. Joint modelling of multivariate longitudinal profiles: pitfalls of the random-effects approach. Statistics in Medicine 2004; 23:3093–3104.