The impact of dichotomization in longitudinal data analysis: a simulation study

PHARMACEUTICAL STATISTICS

Pharmaceut. Statist. 9: 298–312 (2010)

Published online 10 November 2009 in Wiley Online Library

(wileyonlinelibrary.com) DOI: 10.1002/pst.396


Bongin Yoo*,†

Global Biometric Sciences, Bristol–Myers Squibb Company, Wallingford, CT, USA

In this paper, a simulation study is conducted to systematically investigate the impact of dichotomizing longitudinal continuous outcome variables under various types of missing data mechanisms. Generalized linear models (GLM) with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi-parametric approaches are only valid under missing data completely at random (MCAR). Alternatively, weighted GEE (WGEE) and multiple imputation GEE (MI-GEE) were developed to ensure validity under missing at random (MAR). Using a simulation study, the performance of standard GEE, WGEE and MI-GEE on incomplete longitudinal dichotomized outcome analysis is evaluated. For comparisons, likelihood-based linear mixed effects models (LMM) are used for incomplete longitudinal original continuous outcome analysis. Focusing on dichotomized outcome analysis, MI-GEE with original continuous missing data imputation procedure provides well controlled test sizes and more stable power estimates compared with any other GEE-based approaches. It is also shown that dichotomizing longitudinal continuous outcome will result in substantial loss of power compared with LMM.

Copyright © 2009 John Wiley & Sons, Ltd.

Keywords: linear mixed effects models; multiple imputation GEE; weighted GEE; missing data; dichotomization; longitudinal data analysis

1. INTRODUCTION

It is common, in clinical trials or epidemiologic studies, to use a continuous measurement to measure an outcome variable of interest and then dichotomize the continuous outcome based on clinically meaningful predefined criteria for a responder analysis to derive evidence supporting individual benefit of a treatment or procedure. For example, a guidance from the FDA on patient-reported outcomes ([1], p. 20) or a recent draft guideline from the Committee for Medicinal Products for Human Use ([2], p. 20) concerning Alzheimer's disease trials specifically endorsed the responder analysis as an alternative or a secondary efficacy endpoint for assessing clinical relevance. Although this responder analysis is widespread in practice, and even encouraged by regulatory agencies, it has been shown to possess substantial drawbacks such as loss of information, reduction in power, uncertainty in defining the cutoff points, and interference with detecting a nonmonotonic dose–response relationship [3–10].

*Correspondence to: Bongin Yoo, Global Biometric Sciences, Bristol–Myers Squibb Company, 5 Research Parkway, Wallingford, CT 06492, USA.
†E-mail: [email protected]

A typical longitudinal study measures the outcome and covariates of interest repeatedly over time. Longitudinal data arising from such studies therefore inevitably have missing data due to dropouts, loss to follow-up, etc. In the presence of missing data, the selection of statistical methods has important implications for the estimation of treatment effects because different statistical methods are valid only under certain missing data mechanisms. According to Little and Rubin [11,12], a missing data mechanism is said to be missing completely at random (MCAR) if the missingness is independent of both unobserved and observed data, and missing at random (MAR) if, conditional on observed data, the missingness is independent of the unobserved data. Otherwise, missingness that depends on the unobserved data is said to be missing not at random (MNAR).

For incomplete longitudinal normal outcome variables, the linear mixed effects model (LMM) with maximum likelihood (ML) estimation or residual maximum likelihood (REML) estimation has gained a lot of popularity in the statistical community, even in regulatory clinical trials. LMM analysis is easy to implement because no additional data manipulation is required to accommodate the missing data, and the analysis can be conducted routinely using standard statistical software (e.g. the SAS MIXED procedure) that has been widely available for a number of years. Furthermore, LMM analysis is valid under MAR, it has been shown to be more robust to potential bias from missing data than analysis of covariance (ANCOVA) with last observation carried forward (LOCF) imputation and other MCAR methods [13], and it can possibly be extended to MNAR [14]. For incomplete longitudinal non-normal outcome variables, the generalized linear model (GLM) with generalized estimating equation (GEE) methods, a multivariate analogue of quasi-likelihood, has been widely applied in clinical trial and epidemiologic data analysis. However, this method requires the stronger condition of MCAR. In fact, GEE-based parameter estimates can be biased under MAR because the estimating equations no longer have zero expectation under MAR [15,16]. Weighted generalized estimating equations (WGEE) [17,18] and multiple imputation (MI) [19] based generalized estimating equations (MI-GEE) are two possible remedies to overcome this limitation under MAR. However, in both methods, dropout needs to be properly addressed, either by a dropout model, which estimates dropout probability for WGEE, or by an imputation model, which replaces each missing value with a given set of plausible values for MI-GEE, meaning the missing data mechanism is then not ignorable.

The first objective of the present study is to use Monte Carlo simulation methods to evaluate the impact of dichotomizing an incomplete longitudinal continuous outcome by comparing results from GEE-based GLM analyses for the dichotomized outcome with results from ML-based LMM analysis for the original, continuous outcome under a variety of missing patterns. The second objective of the study is to evaluate the performance of various GEE-based GLM methods for the analysis of incomplete longitudinal dichotomized outcome. The comparison will be made through changes in type I error rates and power, and through the magnitude and/or direction of biases in treatment effects and their standard error estimates.


2. METHODS

2.1. Statistical methods to be compared

There is a wide range of statistical methods for handling incomplete longitudinal data. For this paper, however, I utilize two approaches that are common in practice: ML estimation-based LMM analysis for the continuous outcome and GEE-based GLM analysis for the dichotomized outcome. Even though GEE-based GLM can handle both normal and non-normal longitudinal outcomes, LMM is adopted for the continuous outcome analysis because LMM analysis draws a lot of attention from the statistical community and, in fact, ML estimates perform better than GEE estimates in that the former have smaller absolute biases and mean-squared error (MSE) for normal outcome variables [20].

The following specific analyses will be explored for the continuous outcome:

• LMM-AR1: ML-based LMM with first-order autoregressive covariance matrix for the correlation between observations at different time points within each subject.

• LMM-UN: ML-based LMM with unstructured covariance matrix for the correlation between observations at different time points within each subject.

• LMM-AR1R: ML-based LMM with first-order autoregressive covariance matrix for the correlation between observations at different time points within each subject as well as unstructured covariance matrix for the correlation between subjects describing subject-specific random intercept and slope.

The following specific analyses will be explored for the dichotomized outcome:

• GEE: Standard GEE-based GLM with logit link function under first-order autoregressive and unstructured working correlation matrices.

• WGEE: Weighted GEE-based GLM with logit link function under first-order autoregressive and unstructured working correlation matrices.

• MI-B-GEE: MI (using dichotomized outcome imputation model) GEE-based GLM with logit link function under first-order autoregressive and unstructured working correlation matrices.

• MI-C-GEE: MI (using original, continuous outcome imputation model) GEE-based GLM with logit link function under first-order autoregressive and unstructured working correlation matrices.

Likelihood-based LMM analysis jointly models all the actual observations in a longitudinal trial, with no attempt at imputation or adjustment for the missing values or missing data mechanisms. The repeated observations from each subject are assumed to follow a multivariate normal distribution, whose common covariance matrix describes the correlation between observations at different time points within a subject [21]. LMM can also handle subject-specific random coefficients (random intercept and slope), which serve to model the between-subject correlation structure in addition to the within-subject correlation using a multivariate normal distribution covariance matrix [22]. LMM analysis is valid if the missing data mechanism is either MCAR or MAR [15]. Moreover, LMM analysis is stable and provides sensible assessments of important aspects such as treatment effect and time evolution, even if the assumption of MAR is violated in favor of MNAR [23]. In the context of clinical trials, Mallinckrodt et al. [24,25] argue that mixed-model repeated measures (MMRM), a particular form of LMM, should be considered as a primary analysis method for longitudinal continuous outcomes rather than the traditional LOCF single imputation approach, showing simulation results of superior control of type I error rates with MMRM under various types of missing data mechanisms.

In the presented LMM analysis, changes from baseline to all post-baseline observations were the dependent variables. Independent variables included the fixed categorical effects of treatment, time, treatment-by-time interaction and baseline values. Parameters were estimated using ML with the Newton–Raphson algorithm. Standard error estimates were obtained from the observed information matrix. Estimation of the treatment effect was made at each post-baseline time point in terms of an adjusted mean difference by constructing treatment effect estimates at that time after adjusting for the other independent variables in the model. SAS PROC MIXED, with LS means option, was used to perform the LMM analysis.

The GLM with GEE, based on quasi-likelihood methods assuming only a functional relationship between the mean and the variance, is an extension of univariate GLM analysis to longitudinal data. GEE avoids the need for multivariate distributions by only assuming a functional form for the marginal distribution at each time point. The correlation between observations at different time points within a subject is captured by means of a working correlation (or covariance) matrix, which may depend on the mean values, and hence on the regression coefficient parameters with non-normal outcomes. GEE yields consistent and asymptotically normal estimates under MCAR, even with misspecification of the working correlation matrix [16]. When the working structure is correctly specified, the parameter estimators and their model-based standard error estimators are valid under the weaker MAR, as pointed out by the authors of [16]. In the presence of missing observations, however, the GEE estimate of the correlation matrix is not always positive definite, resulting in convergence problems, and the GEE estimator tends to have a large MSE and poor coverage probability compared with likelihood-based methods for normal outcome analyses [20]. The convergence problem is further discussed in the simulation results section. In the presented GEE-based GLM analysis, dichotomized outcomes at all post-baseline time points were the dependent variables, assuming a binary distribution along with a logit link function. Independent variables included the fixed categorical effects of treatment, time, treatment-by-time interaction and baseline values. Parameters were estimated using GEE. Robust, so-called sandwich, standard error estimates were obtained [16,26–28]. Estimation of the treatment effect was made at each post-baseline time point in terms of an adjusted logit difference by constructing treatment effect estimates at that time after adjusting for the other independent variables in the model. SAS PROC GENMOD with the REPEATED option was used to perform the GEE-based GLM analyses.

Since in general the working correlation matrix will not be correctly specified, standard GEE will not be valid under MAR. Hence, Robins et al. [18] extended the standard GEE by using the inverse of the dropout probability as weights in the estimating equation, resulting in WGEE. The subject-specific dropout probabilities are estimated from a separate dropout model. Incorporating these subject-specific weights in the GEEs can potentially reduce possible bias in the regression parameter estimates under MAR as long as the dropout model is correctly specified. A drawback of WGEE is that any misspecification of the dropout model leads to misspecified weights for all subjects, and thus affects the parameter estimates. Through a simulation study, Preisser et al. [29] conclude that misspecification of the dropout model, and thus of the weights, could result in bigger bias in WGEE compared with standard GEE. In order to obtain subject-specific weights in the presented WGEE, first, a logistic dropout model is applied to calculate predictive probabilities of dropout at each time point for each subject by modeling dropout status as the dependent variable and the previous time point's nonmissing dichotomized outcome and the categorical effect of treatment as the predictors. Second, the predictive dropout probabilities for a subject are combined to estimate the subject-specific weight. SAS PROC GENMOD was used to calculate the predictive dropout probabilities with the PRED option in the model. The detailed SAS code for utilizing WGEE is presented in the SAS manual [30].
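The way fitted dropout probabilities can be combined into inverse probability weights may be sketched as follows. This is a minimal illustration in Python rather than the SAS code referenced above, and it assumes one common formulation in which the weight at a visit is the inverse of the cumulative probability of still being observed at that visit; the function name and the example probabilities are hypothetical and are not taken from the paper.

```python
def inverse_probability_weights(dropout_probs):
    """Return one weight per visit: 1 / P(subject is still observed at that visit).

    dropout_probs[j] is a hypothetical fitted probability of dropping out at
    visit j, given that the subject was observed at the previous visit.
    """
    weights = []
    prob_observed = 1.0
    for p in dropout_probs:
        # Cumulative probability of remaining in the study through this visit.
        prob_observed *= (1.0 - p)
        weights.append(1.0 / prob_observed)
    return weights


# Illustrative (made-up) dropout probabilities for three post-baseline visits.
example_weights = inverse_probability_weights([0.0, 0.10, 0.20])
```

Subjects who were unlikely to remain in the study receive larger weights, so that they also stand in for similar subjects who dropped out. A poorly fitted dropout model distorts every weight, which is the sensitivity noted by Preisser et al. [29].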

An alternative GEE-based approach for MAR missing data analysis is the MI method, which was originally introduced by Rubin [19]. MI is either a nonparametric or a parametric approach that fills in the missing values with several plausible sets of values to create completed data sets. Each of the imputed complete data sets is analyzed by a standard statistical procedure (e.g. standard GEE-based GLM), and the estimates from these analyses are then combined. The combined point estimate for the parameter of interest is the average of the point estimates from the complete data sets, and the estimated variance of the combined point estimate is calculated based on both within- and between-imputation variability. The parametric MI approaches assume a multivariate normal distribution, and the missing values are imputed from the conditional normal distribution given the observed data. The MI approach is robust to minor departures from the multivariate normal distribution or minor misspecification of the imputation model [31,32]. In the presented MI approach, two different imputation procedures are evaluated. The first imputation procedure imputes the missing original, continuous outcomes prior to dichotomization (MI-C) and then dichotomizes the imputed continuous outcomes as well as the nonmissing continuous outcomes based on the predefined definition of response. The second imputation procedure imputes dichotomized outcomes and then rounds the imputed values to the closest dichotomized value (MI-B). These imputation procedures can be applied in practice since both the original, continuous outcome and the dichotomized outcome are usually available in real-world data sets. SAS PROC MI was used to generate 5 complete data sets based upon Markov Chain Monte Carlo (MCMC) [33] imputation methods.
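The combination step described above corresponds to what are usually called Rubin's rules. The sketch below is illustrative only: it pools m point estimates and their standard errors, taking the total variance as the within-imputation variance plus (1 + 1/m) times the between-imputation variance. The five estimates and standard errors are invented for the example and do not come from the simulation study.

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates and standard errors using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the point estimates.
    between = variance(estimates)
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, total_variance ** 0.5


# Hypothetical logit differences from m = 5 imputed data sets.
pooled, pooled_se = pool_rubin(
    [-0.70, -0.80, -0.75, -0.72, -0.78],
    [0.33, 0.33, 0.33, 0.33, 0.33],
)
```

The pooled standard error exceeds each per-imputation standard error because the spread between imputations reflects the extra uncertainty due to the missing data.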

For each of the seven methods described above, empirical test size, empirical power, percent relative bias, the accuracy of standard error estimates and MSE are evaluated. Empirical test size is defined as the proportion of times the null hypothesis of no difference is rejected when there is no true difference. Empirical power is defined as the proportion of times the null hypothesis of no difference is rejected when there is a true difference. Percent relative bias is computed as $(1/M)\sum_{i=1}^{M} (\hat{\delta}_i - \delta)/\delta \times 100$, where $\hat{\delta}_i$ is the estimate of the treatment effect for the ith simulated data set, $\delta$ is the true treatment effect and M is the total number of simulated data sets, 2000 for test size and 1000 for power. The standard error estimates for the seven methods are evaluated by comparing the average of the standard error estimates over all simulations with the empirical standard error, which is computed as the standard deviation of the treatment effect estimates over all simulations. With 2000 data sets per test size simulation, assuming a true test size of 5%, this simulation study has an asymptotic standard error of 0.5% and a 95% exact confidence interval of (4%, 6%) for each simulation scenario.
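The precision quoted above can be reproduced with a few lines of arithmetic. The sketch below uses a normal approximation for the interval (the paper reports an exact interval, which is not recomputed here) and also shows the percent relative bias formula on made-up estimates; the function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean


def percent_relative_bias(estimates, true_effect):
    """Average of (estimate - truth) / truth over simulations, times 100."""
    return mean([(d - true_effect) / true_effect for d in estimates]) * 100.0


def monte_carlo_se(rejection_rate, n_simulations):
    """Asymptotic standard error of an estimated rejection proportion."""
    return sqrt(rejection_rate * (1.0 - rejection_rate) / n_simulations)


# Precision of an empirical 5% test size estimated from 2000 data sets.
se_test_size = monte_carlo_se(0.05, 2000)
ci_lower = 0.05 - 1.96 * se_test_size
ci_upper = 0.05 + 1.96 * se_test_size

# Made-up estimates around a true logit difference of -0.75.
example_bias = percent_relative_bias([-0.60, -0.75], -0.75)
```

Under these assumptions the standard error is about 0.49 percentage points, so empirical type I error rates outside roughly 4% to 6% are unlikely to arise from simulation noise alone.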

2.2. Simulation scenarios

For test size comparison, the complete data for each treatment group were generated under an identical distribution. First, the subject-specific regression coefficients ($b_{i0}$ for intercept and $b_{i1}$ for slope) were generated from a bivariate normal distribution with

$$\begin{pmatrix} b_{i0} \\ b_{i1} \end{pmatrix} \sim N\left( \begin{pmatrix} 75.0 \\ 0.5 \end{pmatrix}, \begin{pmatrix} 2 & 0.75 \\ 0.75 & 2 \end{pmatrix} \right)$$

for both treatment groups, G1 and G2. Then N observations were generated independently from univariate normal distributions $[Y_{ij} \mid b_{i0}, b_{i1}] \sim N(b_{i0} + b_{i1} \times j, \sigma^2)$, where $\sigma^2 = 7$, $i = 1, 2, \ldots, 100$ and $j = 0, 1, 2, \ldots, 5$ for each treatment group. Thus, the two treatment groups have identical mean profiles, $E(Y_{G1}) = E(Y_{G2}) = (75.0, 75.5, 76.0, 76.5, 77.0, 77.5)'$, with common variance of 66.5 at time point 5. That is, in order to evaluate the test size, data are generated such that the average treatment effects of G1 are equivalent to the average treatment effects of G2 across all time points.

For the power comparison, the two treatment groups follow different bivariate normal distributions: the data are generated with two distinct treatment-specific subject-level slopes, but a common subject-level intercept and a common covariance matrix for all subjects. That is,

$$\begin{pmatrix} b_{i0} \\ b_{i1} \end{pmatrix} \sim N\left( \begin{pmatrix} 75.0 \\ 0.5 \end{pmatrix}, \begin{pmatrix} 2 & 0.75 \\ 0.75 & 2 \end{pmatrix} \right)$$

for group G1 and

$$\begin{pmatrix} b_{i0} \\ b_{i1} \end{pmatrix} \sim N\left( \begin{pmatrix} 75.0 \\ 1.2 \end{pmatrix}, \begin{pmatrix} 2 & 0.75 \\ 0.75 & 2 \end{pmatrix} \right)$$

for group G2, respectively.


Then N observations were generated independently from univariate normal distributions $[Y_{ij} \mid b_{i0}, b_{i1}] \sim N(b_{i0} + b_{i1} \times j, \sigma^2)$, where $\sigma^2 = 7$, $i = 1, 2, \ldots, 100$ and $j = 0, 1, 2, \ldots, 5$ for each treatment group. Thus, each group has a treatment-specific mean profile, $E(Y_{G1}) = (75.0, 75.5, 76.0, 76.5, 77.0, 77.5)'$ for G1 and $E(Y_{G2}) = (75.0, 76.2, 77.4, 78.6, 79.8, 81.0)'$ for G2, with common variance of 66.5 at time point 5. That is, in order to evaluate the power, data are generated such that G1 is superior (assuming that smaller values of the outcome variable are better) to G2 with a true mean difference, $E(Y_{G1,5}) - E(Y_{G2,5})$, of −3.5 at time point 5.
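The data-generating model above can be sketched in Python as follows. This is an illustration under the stated distributions only, not the author's simulation code: the function names are hypothetical, the bivariate normal draw uses a hand-written Cholesky factor, and residuals are drawn independently across time points as the text describes.

```python
import math
import random

INTERCEPT_MEAN = 75.0   # mean of the subject-level intercept b_i0
RANDOM_EFFECT_VAR = 2.0  # variance of both b_i0 and b_i1
RANDOM_EFFECT_COV = 0.75  # covariance between b_i0 and b_i1
RESIDUAL_VAR = 7.0       # sigma^2
N_TIME_POINTS = 6        # j = 0, 1, ..., 5


def marginal_sd(j):
    """Standard deviation of Y_ij implied by the random-effects model at time j."""
    var = (RANDOM_EFFECT_VAR + 2.0 * RANDOM_EFFECT_COV * j
           + RANDOM_EFFECT_VAR * j * j + RESIDUAL_VAR)
    return math.sqrt(var)


def simulate_group(slope_mean, n_subjects=100, rng=None):
    """Simulate complete outcomes; returns one list of 6 values per subject."""
    rng = rng or random.Random()
    # Cholesky factor of the 2 x 2 random-effects covariance matrix.
    l11 = math.sqrt(RANDOM_EFFECT_VAR)
    l21 = RANDOM_EFFECT_COV / l11
    l22 = math.sqrt(RANDOM_EFFECT_VAR - l21 ** 2)
    resid_sd = math.sqrt(RESIDUAL_VAR)
    subjects = []
    for _ in range(n_subjects):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = INTERCEPT_MEAN + l11 * z1
        b1 = slope_mean + l21 * z1 + l22 * z2
        subjects.append([rng.gauss(b0 + b1 * j, resid_sd)
                         for j in range(N_TIME_POINTS)])
    return subjects


# Power scenario: slope mean 0.5 for G1 and 1.2 for G2 (both 0.5 for test size).
group1 = simulate_group(0.5, rng=random.Random(2010))
group2 = simulate_group(1.2, rng=random.Random(2011))
```

The implied marginal variance at time j is 2 + 1.5j + 2j² + 7, which gives 9 at baseline and 66.5 at time point 5, in agreement with the variances stated in the text.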

After the generation of the complete longitudinal continuous outcome $Y_{ij}$, the complete longitudinal dichotomized outcome $W_{ij}$ was obtained from the continuous outcome by defining response as a 7% or greater increase from the baseline value. That is, $W_{ij} = 1$ if $Y_{ij} \geq 1.07 \times Y_{i0}$, and 0 otherwise, where $i = 1, 2, \ldots, 200$ and $j = 1, 2, \ldots, 5$. Since the original simulated outcomes are continuous in nature and the primary research interest is in the evaluation of ML-based LMM analysis of the continuous outcome and GEE-based GLM analysis of the dichotomized outcome, it is necessary to obtain the 'true' expected treatment difference corresponding to this dichotomized outcome in terms of the logit function, besides the expected treatment difference (0 for test size and −3.5 for power) on the continuous outcome at time point 5. In terms of the logit function, the resulting expected treatment difference, $E(\mathrm{logit}(W_{G1,5})) - E(\mathrm{logit}(W_{G2,5}))$, on the dichotomized outcome at time point 5 is 0 for test size, since the two treatment groups have an identical distribution, and −0.751 for power, given the known distributions $Y_{Gi,0} \sim N(75.0, 9.0)$, $Y_{G1,5} \sim N(77.5, 66.5)$, $Y_{G2,5} \sim N(81.0, 66.5)$ and the known correlation of 0.4 between $Y_{Gi,0}$ and $Y_{Gi,5}$, where $i = 1, 2$. Figure 1 illustrates the distributions of simulated continuous and dichotomized outcomes for test size (Figure 1(a1) and (a2)) and for power (Figure 1(b1) and (b2)) evaluation at each time point. The panels present expected values and standard deviations of the complete continuous outcomes as well as expected proportions of the complete dichotomized outcomes.
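The value of −0.751 can be checked from the distributions quoted above. The sketch below is an illustrative calculation, not the author's derivation: it treats $Y_{i5} - 1.07 \times Y_{i0}$ as normally distributed, uses the stated correlation of 0.4, and takes the difference of the logits of the two expected response proportions. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def response_probability(mean_t5, mean_t0=75.0, var_t0=9.0, var_t5=66.5,
                         corr=0.4, threshold=1.07):
    """P(Y5 >= threshold * Y0) when (Y0, Y5) is bivariate normal."""
    # D = Y5 - threshold * Y0 is normal; response occurs when D >= 0.
    mean_d = mean_t5 - threshold * mean_t0
    var_d = (var_t5 + threshold ** 2 * var_t0
             - 2.0 * threshold * corr * sqrt(var_t0 * var_t5))
    return 1.0 - NormalDist(mean_d, sqrt(var_d)).cdf(0.0)


def logit(p):
    return log(p / (1.0 - p))


p_g1 = response_probability(77.5)  # expected response proportion in G1 at t5
p_g2 = response_probability(81.0)  # expected response proportion in G2 at t5
logit_difference = logit(p_g1) - logit(p_g2)
```

Under these assumptions the response proportions round to 36% and 54%, matching the time point 5 proportions reported for Figure 1, and the logit difference is approximately −0.751.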

[Figure 1: four panels. (a1) Test size: densities f(y) of the continuous outcome y at each time point. T0: E(Y)=75.0, SD=3.0; T1: E(Y)=75.5, SD=3.5; T2: E(Y)=76.0, SD=4.5; T3: E(Y)=76.5, SD=5.6; T4: E(Y)=77.0, SD=6.9; T5: E(Y)=77.5, SD=8.2. (a2) Test size: Response (%) by Study Time. T1: 5%; T2: 13%; T3: 22%; T4: 30%; T5: 36%. (b1) Power: densities f(y) of the continuous outcome y at each time point. T0: E(YG1)=75.0, E(YG2)=75.0, SD=3.0; T1: E(YG1)=75.5, E(YG2)=76.2, SD=3.5; T2: E(YG1)=76.0, E(YG2)=77.4, SD=4.5; T3: E(YG1)=76.5, E(YG2)=78.6, SD=5.6; T4: E(YG1)=77.0, E(YG2)=79.8, SD=6.9; T5: E(YG1)=77.5, E(YG2)=81.0, SD=8.2. (b2) Power: Response (%) by Study Time for Group 1 and Group 2. T1: 5% and 8%; T2: 13% and 23%; T3: 22% and 37%; T4: 30% and 47%; T5: 36% and 54%.]

Figure 1. Distribution of simulated continuous and dichotomized outcomes.



In summary, for each simulation scenario, 2000 fully observed data sets were generated for test size evaluation, and 1000 fully observed data sets were generated for power evaluation. Each fully observed data set has a sample size of n = 100 per treatment group with one baseline and 5 post-baseline outcomes for each subject. Given a sample size of 100 per group with a mean difference of −3.5 and common variance of 66.5, each fully observed data set would have 85% power to detect the mean difference in continuous outcome Y at time point 5 using a two-sample t-test at a 5% significance level. Meanwhile, given the same sample size, the power to detect an odds ratio of 2.09 ($P_1 = 36\%$ and $P_2 = 54\%$) at time point 5 for dichotomized outcome W is only 72% using a two-sample chi-square test at a 5% significance level.
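The two power figures above can be approximated with standard normal-approximation formulas. The sketch below is illustrative and does not reproduce whatever software the author used: it replaces the t distribution with a normal approximation and uses the usual pooled-variance approximation for the two-proportion test, so small differences from the quoted 85% and 72% are expected. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_means(diff, variance, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test for a mean difference."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    standardized = abs(diff) / sqrt(2.0 * variance / n_per_group)
    return STD_NORMAL.cdf(standardized - z_crit)


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    p_bar = (p1 + p2) / 2.0
    se_null = sqrt(2.0 * p_bar * (1.0 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1.0 - p1) + p2 * (1.0 - p2)) / n_per_group)
    return STD_NORMAL.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


power_continuous = power_two_means(-3.5, 66.5, 100)
power_dichotomized = power_two_proportions(0.36, 0.54, 100)
odds_ratio = (0.54 / 0.46) / (0.36 / 0.64)
```

Both approximations land within about one percentage point of the quoted values, and they make the cost of dichotomization visible before any data are missing: the same 100 subjects per group yield roughly 13 percentage points less power on the binary scale.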

2.3. Generating missing data

The missing data are generated under fourteen different missing patterns pertaining to the two different missing data mechanisms (MCAR or MAR) and seven different treatment group-specific missing rates at time point 5 (Table I).

For each simulation scenario, a set of complete longitudinal data was generated as described in Section 2.2 and the missing data were created by deleting values from the complete data. A monotone missing pattern was used; that is, if a subject's observation was deleted for a particular time point, all subsequent data for that subject were also deleted. In order to achieve the treatment group-specific targeted missing rate for MCAR, post-baseline outcome values were randomly selected with adjusted probability from the complete data, and the selected observation and all subsequent observations were deleted. In order to achieve the treatment group-specific targeted missing rate for MAR, the change from baseline to each time point outcome value was divided into 10 percentile groups for each treatment group, with the first group containing the best responses (largest negative change). A missing value indicator was generated with lower probability for better responses and higher probability for worse responses, adjusted to generate the targeted missing rates. The observation that triggered the missingness was kept but all subsequent observations were deleted. For each targeted missing rate at t5, Table II presents approximate missing rates at each post-baseline time point for the simulated data sets.
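The deletion scheme described above can be sketched as follows. This is a simplified illustration rather than the procedure actually used: the MCAR step applies a constant per-visit dropout probability, and the MAR step replaces the percentile-group rule with a hypothetical user-supplied function mapping the observed change from baseline to a dropout probability. Calibration to the targeted missing rates is not shown, and all names are assumptions.

```python
import random


def apply_monotone_dropout(values, first_missing_visit):
    """Set the first missing visit and every later visit to None."""
    return [v if j < first_missing_visit else None
            for j, v in enumerate(values)]


def mcar_first_missing(n_visits, dropout_prob, rng):
    """MCAR: dropout at each post-baseline visit is unrelated to the data."""
    for j in range(1, n_visits):
        if rng.random() < dropout_prob:
            return j
    return n_visits  # subject completes the study


def mar_first_missing(values, dropout_prob_fn, rng):
    """MAR: dropout depends on the observed change from baseline.

    The observation that triggers dropout is kept, so the first missing
    visit is the one after it.
    """
    for j in range(1, len(values) - 1):
        change = values[j] - values[0]
        if rng.random() < dropout_prob_fn(change):
            return j + 1
    return len(values)


rng = random.Random(5)
subject = [75.0, 76.1, 77.4, 78.0, 79.2, 80.5]
# Hypothetical rule: larger increases (worse responses) drop out more often.
first_missing = mar_first_missing(
    subject, lambda change: min(1.0, max(0.0, 0.05 * change)), rng)
observed = apply_monotone_dropout(subject, first_missing)
```

Because the triggering observation is retained under this MAR rule, no subject can be missing at the first post-baseline visit, which agrees with the 0% missing rate at t1 in the MAR column of Table II.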

3. SIMULATION RESULTS

Figure 2 displays the test size simulation results at endpoint (time point 5) over seven different missing patterns under the two different missing data mechanisms (MCAR and MAR). Plots (a1) and (b1) in Figure 2 present the empirical type I error rates from the three different ML-based LMM analyses and the four different GEE-based GLM analyses described previously with an AR1 working correlation matrix under MCAR and MAR, respectively. In order to evaluate the impact on the test size of the GEE-based GLM analysis with a different choice of working correlation matrix, the same simulated data sets were also analyzed by the methods with an unstructured working correlation matrix, and the resulting empirical type I error rates are presented along with those from using an AR1 working correlation matrix in plots (a2) and (b2) in Figure 2. Table III summarizes, in detail, the empirical type I error rate (T1E), percent relative bias (RBD), average estimated standard errors (SED), standard deviation (SDD) of estimated treatment differences and MSE from each of the seven analyses, calculated over the 2000 simulated data sets. Note that with an unstructured working correlation matrix, reliable estimates could not be obtained for some of the GEE-based GLM approaches in more than 5% of the simulations due to 'convergence' problems. Accordingly, to avoid selection bias, no detailed results with an unstructured working correlation matrix are reported. Most of the convergence problems were resolved by adopting a simple structured AR1 working correlation matrix; in fact, less than 2 percent of the simulations still had convergence problems with the AR1 working correlation matrix. Inspection of these samples revealed that, at one time point, the Fisher information matrix was singular in the estimation process.

Table I. Approximate missing rates (%) at time point 5 for (G1, G2).

Low       Medium    High      Low diff 1  Med diff 1  Low diff 2  Med diff 2
(10, 10)  (30, 30)  (45, 45)  (10, 20)    (20, 40)    (20, 10)    (40, 20)

Table II. Approximate missing rate (%) at each post-baseline time point.

Target missing     Missing rate at (t1, t2, t3, t4, t5)
rate (%) at t5     MCAR                  MAR
10                 (1, 2, 3, 6, 10)      (0, 6, 8, 9, 10)
20                 (2, 5, 10, 15, 20)    (0, 12, 17, 19, 20)
30                 (4, 8, 13, 22, 30)    (0, 18, 26, 29, 30)
40                 (4, 10, 19, 29, 40)   (0, 25, 34, 38, 40)
45                 (4, 10, 19, 31, 45)   (0, 28, 38, 43, 45)

In examining Figure 2(a1) and Table III, when data were MCAR, there is no obvious bias in the treatment effect estimates for any of the LMM analyses and all the analyses, except for LMM-AR1, have close to the 5% nominal type I error rate. Even though LMM-AR1 shows virtually no bias in the treatment effect estimates and provides the smallest MSE, it greatly underestimates the standard error of the treatment effects, resulting in a large inflation of type I error rates ranging from 16% to 19% (Table III). This finding is consistent with other studies, reflecting the fact that model misspecification could result in severe bias of variance estimates, but has little effect on the fixed effects estimates [34,35]. Standard GEE and MI-C-GEE-based GLM analyses show no noticeable bias in the treatment effect estimates, and their standard error estimates (SED) are comparable to empirical standard errors (SDD), and thus the type I error rates from those two analyses are well controlled at the 5% nominal level. Moreover, MI-C-GEE-based GLM analysis provides the smallest MSE. However, WGEE and MI-B-GEE-based GLM analyses show slight bias in the treatment effects, especially when the missing rate difference is large ((20, 40) or (40, 20) missing pattern in Table III). It is noted that when the missing rates are low, WGEE-based GLM analysis tends to exhibit slight inflation of type I error rates (7.5%, 7.6% and 7.2% for (10, 10), (10, 20) and (20, 10) missing patterns, respectively), mainly due to slight underestimation of standard errors (Figure 2(a1) and Table III). In addition, the analysis shows some degree of sensitivity to the choice of working correlation matrix, unlike the other GEE-based GLM analyses (Figure 2(a2)).

[Figure 2: four panels, each plotting Type I Error (%) (vertical axis, 0 to 50) against Dropout Pattern (%) (horizontal axis: 10-10, 30-30, 45-45, 10-20, 20-40, 20-10, 40-20). Legend entries: GEE-AR1, WGEE-AR1, MI-B-GEE-AR1, MI-C-GEE-AR1, LMM-UN, LMM-AR1, LMM-AR1R; GEE-UN, WGEE-UN, MI-B-GEE-UN, MI-C-GEE-UN.]

Figure 2. Empirical type I error (%) rates. (a1) Comparing GEEs vs LMMs under MCAR; (a2) Comparing GEEs with AR(1) and GEEs with unstructured working correlation matrices under MCAR; (b1) Comparing GEEs vs LMMs under MAR; (b2) Comparing GEEs with AR(1) and GEEs with unstructured working correlation matrices under MAR.

Table III. Empirical type I error rate and bias comparison at endpoint (time point 5).

                                      MCAR                                MAR
Missing   Analysis       T1E*   RBD†  SED‡  SDD§  MSE¶    T1E*   RBD†  SED‡  SDD§  MSE¶

(10, 10)  GEE-AR1         4.5    0.0  0.31  0.31  0.10     5.8    0.0  0.33  0.34  0.11
          WGEE-AR1        7.5    1.3  0.37  0.42  0.14     5.3    0.0  0.35  0.36  0.12
          MI-B GEE-AR1    4.7    0.0  0.31  0.31  0.10     4.9    0.0  0.32  0.33  0.10
          MI-C GEE-AR1    4.4    0.0  0.31  0.30  0.10     4.0    0.0  0.32  0.31  0.10
          LMM-AR1        16.0    0.0  0.78  1.08  0.61    17.3    0.0  0.77  1.11  0.59
          LMM-AR1R        4.3    0.3  1.09  1.09  1.19     4.8    0.0  1.10  1.11  1.21
          LMM-UN          4.3    0.3  1.09  1.09  1.19     4.7   −0.3  1.10  1.11  1.21

(30, 30)  GEE-AR1         5.1   −1.3  0.35  0.36  0.12     7.2    2.7  0.42  0.47  0.18
          WGEE-AR1        5.1   −1.3  0.37  0.38  0.14     5.1    2.7  0.45  0.47  0.20
          MI-B GEE-AR1    4.9   −1.3  0.34  0.35  0.12     6.8    1.3  0.40  0.44  0.16
          MI-C GEE-AR1    3.6   −1.3  0.33  0.32  0.11     3.5    0.0  0.36  0.33  0.13
          LMM-AR1        18.2   −1.1  0.80  1.18  0.64    21.4    0.6  0.76  1.19  0.58
          LMM-AR1R        5.5   −1.4  1.14  1.15  1.30     5.1    0.3  1.17  1.15  1.37
          LMM-UN          5.4   −1.4  1.13  1.15  1.28     4.9    0.3  1.17  1.15  1.37

(45, 45)  GEE-AR1         4.6   −1.3  0.39  0.39  0.15     6.3    1.3  0.56  0.65  0.31
          WGEE-AR1        4.6   −1.3  0.40  0.40  0.16     6.6   −1.3  0.63  0.74  0.40
          MI-B GEE-AR1    5.1   −1.3  0.37  0.37  0.14     9.6    4.0  0.51  0.60  0.26
          MI-C GEE-AR1    3.2    0.0  0.36  0.32  0.13     4.3    1.3  0.42  0.39  0.18
          LMM-AR1        19.2    0.0  0.84  1.26  0.71    24.3    1.4  0.79  1.32  0.63
          LMM-AR1R        5.3    0.3  1.18  1.20  1.39     5.5    0.9  1.25  1.27  1.56
          LMM-UN          5.0    0.3  1.18  1.20  1.39     5.7    0.9  1.25  1.27  1.56

(10, 20)  GEE-AR1         5.6    0.0  0.32  0.33  0.10    11.2  −26.6  0.34  0.37  0.16
          WGEE-AR1        7.6    2.7  0.36  0.42  0.13     6.2  −14.6  0.36  0.37  0.14
          MI-B GEE-AR1    5.7   −2.7  0.32  0.33  0.10     7.7  −16.0  0.34  0.35  0.13
          MI-C GEE-AR1    4.4    0.0  0.32  0.32  0.10     4.5   −2.7  0.33  0.32  0.11
          LMM-AR1        17.1    0.3  0.78  1.12  0.61    22.0  −14.3  0.77  1.11  0.84
          LMM-AR1R        5.1   −0.3  1.10  1.12  1.21     4.8    2.0  1.12  1.10  1.26
          LMM-UN          5.1   −0.3  1.10  1.12  1.21     4.7    2.0  1.12  1.10  1.26

(20, 40)  GEE-AR1         5.8   −1.3  0.35  0.36  0.12    25.5  −75.9  0.43  0.47  0.51
          WGEE-AR1        5.9    2.7  0.37  0.39  0.14     8.6  −42.6  0.47  0.49  0.32
          MI-B GEE-AR1    6.4   −8.0  0.34  0.35  0.12    13.5  −47.9  0.42  0.43  0.31
          MI-C GEE-AR1    4.1   −2.7  0.33  0.32  0.11     3.5   −6.7  0.36  0.33  0.13
          LMM-AR1        17.9   −0.6  0.81  1.17  0.66    37.0  −30.9  0.77  1.16  1.76
          LMM-AR1R        5.5   −0.9  1.14  1.15  1.30     5.2    4.6  1.17  1.14  1.39
          LMM-UN          5.4   −0.9  1.14  1.15  1.30     5.2    4.6  1.18  1.15  1.42

(20, 10)  GEE-AR1         4.4    1.3  0.32  0.32  0.10     9.2   26.6  0.34  0.36  0.16
          WGEE-AR1        7.2   −1.3  0.37  0.42  0.14     5.6   14.6  0.36  0.37  0.14
          MI-B GEE-AR1    4.4    2.7  0.32  0.32  0.10     6.7   17.3  0.34  0.35  0.13
          MI-C GEE-AR1    3.8    1.3  0.32  0.31  0.10     4.4    2.7  0.33  0.31  0.11
          LMM-AR1        16.2    1.1  0.78  1.10  0.61    22.0   14.2  0.77  1.12  0.84
          LMM-AR1R        4.9    1.1  1.10  1.10  1.21     4.9   −1.7  1.12  1.12  1.26
          LMM-UN          4.8    1.1  1.10  1.10  1.21     4.9   −1.7  1.12  1.12  1.26

(40, 20)  GEE-AR1         4.4    1.3  0.35  0.35  0.12    26.1   75.9  0.43  0.50  0.51
          WGEE-AR1        4.2   −4.0  0.37  0.38  0.14     9.4   41.3  0.47  0.52  0.32
          MI-B GEE-AR1    4.6    6.7  0.34  0.34  0.12    14.1   47.9  0.42  0.46  0.31
          MI-C GEE-AR1    3.3    1.3  0.33  0.31  0.11     3.9    6.7  0.37  0.35  0.14
          LMM-AR1        16.1    0.3  0.80  1.16  0.64    37.1   30.3  0.77  1.19  1.72
          LMM-AR1R        4.7    0.3  1.13  1.13  1.28     5.8   −4.9  1.18  1.17  1.42
          LMM-UN          4.7    0.3  1.13  1.13  1.28     5.3   −4.9  1.18  1.17  1.42

*T1E: empirical type I error rate (%).
†RBD: percent relative bias. True treatment mean difference is 0. True treatment logit difference is 0.
‡SED: standard error estimate of treatment difference estimate (D) calculated from an average of 2000 standard error estimates.
§SDD: empirical standard error of treatment difference estimate (D) calculated from standard deviation of 2000 estimated treatment differences.
¶MSE: mean-squared error.

When data were MAR, LMM analyses exhibit results similar to those under MCAR. That is, there is no noticeable bias on the treatment effect estimates, and type 1 error rates are reasonably well preserved at the 5% nominal level except for LMM-AR1 analysis (Figure 2(b1) and Table III). The distortion of type 1 error rates for LMM-AR1 analysis, compared with the results under MCAR, is worsened by its larger bias on the treatment effect estimates as well as underestimation of their standard errors. When the missing rates are similar in the treatment groups, standard GEE-based GLM analysis preserves the type 1 error rate at the 5% nominal level surprisingly well, even though there is an indication of slight bias on the treatment effect estimates and slight underestimation of their standard errors as missing rates increase. WGEE and MI-B-GEE-based GLM analyses show a similar phenomenon. On the other hand, when the missing rates differ between groups, the distortion of type 1 error rates for the GEE-based GLM analyses is deepened by their large bias on the treatment effect estimates, except for MI-C-GEE-based GLM analysis. MI-C-GEE-based GLM analysis keeps type 1 error rates under the 5% nominal level and provides the smallest MSE, even though there is slight bias on the treatment effect estimates for largely different missing patterns ((20, 40) or (40, 20) missing patterns). All GEE-based GLM analyses are somewhat sensitive to the choice of working correlation matrix, especially for largely different missing patterns (Figure 2(b2)).
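The link between an underestimated standard error, bias and an inflated test size can be made explicit with a normal approximation. The following is a sketch under assumptions that are not stated in the paper: the treatment difference estimate, written here as $\hat{\Delta}$, is taken to be approximately normal with mean $b$ (its bias under the null) and standard deviation SDD, and the test is assumed to reject when $|\hat{\Delta}| > 1.96\,\mathrm{SED}$ with SED treated as fixed.

```latex
\[
\Pr\bigl(|\hat{\Delta}| > 1.96\,\mathrm{SED}\bigr)
  \approx \Phi\!\left(\frac{-1.96\,\mathrm{SED} - b}{\mathrm{SDD}}\right)
  + 1 - \Phi\!\left(\frac{1.96\,\mathrm{SED} - b}{\mathrm{SDD}}\right)
\]

% LMM-AR1, (20, 10) pattern under MCAR in Table III:
% SED = 0.78, SDD = 1.10, and b taken as 0
\[
2\left[1 - \Phi\!\left(\frac{1.96 \times 0.78}{1.10}\right)\right]
  = 2\,\bigl[1 - \Phi(1.39)\bigr] \approx 0.165
\]
```

The approximate 16.5% is close to the 16.2% reported for that row, which is consistent with reading the LMM-AR1 inflation under MCAR as a standard error problem; under MAR a nonzero $b$ shifts one rejection tail further and adds to the inflation.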

Empirical power simulation results, calculated over the 1000 simulated data sets, are presented in Figure 3 and Table IV. When data were MCAR, there is no obvious bias on the treatment effect for any of the LMM analyses, and the analyses achieve 83% or higher statistical power even though the power estimates tend to decrease slightly as missing rates increase (Figure 3(a1) and Table IV). LMM-AR1 analysis has uniformly higher power, ranging from 92.9% to 96.5%, compared with the other LMM analyses because of its smaller standard error estimates, which in fact underestimate the true variability of the treatment effect estimates. GEE-based GLM analyses show slight bias on the treatment effect estimates, but their standard error estimates are reasonably close to the empirical standard errors, except for WGEE-based GLM analysis. The estimated statistical power is as low as 51.8% with MI-B-GEE under the (20, 40) missing pattern and as high as 72.4% with MI-C-GEE under the (10, 10) missing pattern. WGEE-based GLM analysis has the worst bias of treatment effect estimates as well as underestimated standard errors, resulting in the lowest power in many of the missing patterns considered. Moreover, this analysis is the most sensitive to the choice of working correlation matrix (Figure 3(a2)). The reduction or fluctuation of power from GEE-based GLM analyses seems to be more dramatic than that from ML-based LMM analyses. In general, MI-C-GEE-based GLM analysis demonstrates superior and more stable statistical power than the other GEE-based GLM analyses.

When data were MAR, LMM-AR1R and LMM-UN analyses have slightly larger bias on the treatment effect estimates compared with the results under MCAR, especially for the (40, 20) missing pattern. However, the standard error estimates are still reasonably comparable to their empirical standard errors. Thus, the empirical powers are slightly lower and fluctuate slightly more depending on the missing patterns (Figure 3(b1) and Table IV). Even though LMM-AR1 analysis has overall superior power, the analysis suffers from bias on the treatment effect estimates and underestimation of their standard errors, resulting in substantial fluctuation of statistical power depending on missing patterns. In general, GEE-based GLM analyses show even higher bias on the treatment effects, in either direction depending on missing patterns. However, their standard error estimates do not deviate much from their empirical standard errors. Thus, the analyses show substantial fluctuation of statistical power depending on missing patterns. For example, standard GEE-based GLM analysis has a power of 94.1% for the (40, 20) missing pattern, but only 20.8% for the (20, 40) missing pattern. WGEE and MI-B-GEE-based GLM analyses also show large bias and fluctuation of power, which are slightly worse for similar missing rate patterns and slightly better for different missing rate patterns compared with standard GEE. Even though MI-C-GEE-based GLM analysis shows slight bias on the treatment effects, in general the analysis shows much more stable statistical power compared with the other GEE-based GLM analyses. As in the test size simulation, GEE-based GLM analyses are sensitive to the choice of working correlation matrix, especially for largely different missing rate patterns (Figure 3(b2)).
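To see how bias alone can drive the power swings described above, the rejection probability of a Wald test can be approximated from the tabulated RBD, SED and SDD. This is an illustrative sketch, not the paper's method: it assumes the estimate is normally distributed with standard deviation SDD, that the test uses SED as a fixed standard error with a normal critical value, and that RBD is defined as 100 × (mean estimate − truth) / truth. The function name `approx_power` is hypothetical.

```python
from statistics import NormalDist


def approx_power(true_diff, rbd_pct, sed, sdd, z_crit=1.959964):
    """Normal-approximation rejection probability of a two-sided Wald test."""
    # Mean of the estimator implied by the percent relative bias (assumed sign
    # convention: positive RBD means the estimate is larger in magnitude).
    mean_est = true_diff * (1 + rbd_pct / 100)
    cutoff = z_crit * sed  # reject when |estimate| exceeds this value
    sampling = NormalDist(mu=mean_est, sigma=sdd)
    return sampling.cdf(-cutoff) + (1 - sampling.cdf(cutoff))


# Table IV, MAR, standard GEE-AR1 (true logit difference -0.751)
print(round(100 * approx_power(-0.751, 99.9, 0.42, 0.51), 1))   # (40, 20) pattern
print(round(100 * approx_power(-0.751, -49.3, 0.38, 0.43), 1))  # (20, 40) pattern
```

Under these assumptions the approximation gives about 91% and 20%, against the simulated 94.1% and 20.8%, so most of the gap between the two patterns is explained by the direction of the bias rather than by a change in precision.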

For the sake of comparison, results from ANCOVA and the Mantel–Haenszel (MH) test at time point 5 on the same simulated data sets are presented in Table V. While these approaches seem

Figure 3. Empirical power (%). (a1) Comparing GEEs vs LMMs under MCAR; (a2) Comparing GEEs with AR(1) and GEEs with unspecified working correlation matrices under MCAR; (b1) Comparing GEEs vs LMMs under MAR; (b2) Comparing GEEs with AR(1) and GEEs with unspecified working correlation matrices under MAR. In each panel the vertical axis is Power (%) and the horizontal axis is Dropout Pattern (%): 10-10, 30-30, 45-45, 10-20, 20-40, 20-10, 40-20.




Table IV. Empirical power and bias comparison at endpoint (time-point 5).

                                         MCAR                               MAR
Missing rate  Analysis       Power    RBD   SED   SDD   MSE    Power    RBD   SED   SDD   MSE

(10, 10)      GEE-AR1         71.1    2.7  0.31  0.31  0.10     69.7    4.0  0.32  0.33  0.10
              WGEE-AR1        62.5    9.3  0.37  0.45  0.14     68.7   12.0  0.34  0.37  0.12
              MI-B GEE-AR1    70.5    0.0  0.31  0.31  0.10     67.6    1.3  0.32  0.32  0.10
              MI-C GEE-AR1    72.4    2.7  0.31  0.30  0.10     67.7    1.3  0.31  0.31  0.10
              LMM-AR1         96.5   -0.6  0.78  1.06  0.61     94.0   -2.3  0.77  1.16  0.60
              LMM-AR1R        89.6   -0.3  1.09  1.08  1.19     86.7   -2.0  1.10  1.16  1.21
              LMM-UN          89.5   -0.3  1.09  1.08  1.19     86.6   -2.0  1.10  1.16  1.21

(30, 30)      GEE-AR1         61.7    4.0  0.34  0.35  0.12     64.4   22.6  0.39  0.42  0.18
              WGEE-AR1        60.1    8.0  0.36  0.38  0.13     59.5   21.3  0.41  0.41  0.19
              MI-B GEE-AR1    58.9   -2.7  0.33  0.34  0.11     56.7   12.0  0.38  0.40  0.15
              MI-C GEE-AR1    66.3    4.0  0.33  0.32  0.11     63.1    4.0  0.34  0.33  0.12
              LMM-AR1         94.8    0.3  0.81  1.20  0.66     94.6   -0.6  0.77  1.18  0.59
              LMM-AR1R        86.0    0.6  1.14  1.18  1.30     83.7   -0.9  1.18  1.19  1.39
              LMM-UN          86.1    0.6  1.14  1.18  1.30     83.6   -0.9  1.18  1.19  1.39

(45, 45)      GEE-AR1         55.3    5.3  0.37  0.39  0.14     61.0   46.6  0.49  0.53  0.36
              WGEE-AR1        56.4    8.0  0.38  0.40  0.15     48.8   39.9  0.54  0.57  0.38
              MI-B GEE-AR1    55.0   -2.7  0.36  0.36  0.13     53.9   28.0  0.46  0.50  0.26
              MI-C GEE-AR1    64.6    5.3  0.34  0.33  0.12     58.2   10.7  0.39  0.35  0.16
              LMM-AR1         93.3    1.4  0.83  1.30  2.65     95.4    2.6  0.79  1.27  0.63
              LMM-AR1R        83.5    0.9  1.17  1.25  2.18     82.1    2.9  1.25  1.24  1.57
              LMM-UN          83.2    0.9  1.17  1.26  2.18     81.5    2.9  1.25  1.24  1.57

(10, 20)      GEE-AR1         68.2    2.7  0.32  0.33  0.10     50.0  -13.3  0.33  0.35  0.12
              WGEE-AR1        66.0   14.6  0.36  0.42  0.14     58.8    1.3  0.34  0.36  0.12
              MI-B GEE-AR1    62.8   -4.0  0.31  0.32  0.10     55.4   -9.3  0.32  0.33  0.11
              MI-C GEE-AR1    70.9    2.7  0.31  0.31  0.10     67.5    1.3  0.31  0.31  0.10
              LMM-AR1         96.2    0.0  0.78  1.15  0.61     91.4  -14.0  0.76  1.16  0.82
              LMM-AR1R        87.6    0.3  1.10  1.16  1.21     88.8    1.7  1.12  1.14  1.26
              LMM-UN          87.6    0.3  1.10  1.16  1.21     88.5    1.7  1.12  1.14  1.26

(20, 40)      GEE-AR1         61.7    4.0  0.34  0.34  0.12     20.8  -49.3  0.38  0.43  0.28
              WGEE-AR1        66.8   13.3  0.36  0.36  0.14     29.4  -28.0  0.40  0.42  0.20
              MI-B GEE-AR1    51.8   -9.3  0.34  0.33  0.12     29.0  -34.6  0.38  0.40  0.21
              MI-C GEE-AR1    68.2    4.0  0.33  0.31  0.11     59.3   -2.7  0.34  0.34  0.12
              LMM-AR1         95.4   -0.3  0.80  1.17  0.64     72.0  -38.0  0.77  1.25   2.4
              LMM-AR1R        86.6    0.3  1.13  1.15  1.28     84.2    2.0  1.17  1.23   1.4
              LMM-UN          86.7    0.3  1.13  1.15  1.28     84.2    2.0  1.17  1.24   1.4

(20, 10)      GEE-AR1         69.4    2.7  0.32  0.32  0.10     86.4   36.0  0.33  0.35  0.18
              WGEE-AR1        62.1    8.0  0.36  0.41  0.13     79.5   29.3  0.36  0.37  0.18
              MI-B GEE-AR1    70.4    2.7  0.31  0.31  0.10     81.0   22.6  0.33  0.34  0.14
              MI-C GEE-AR1    71.9    2.7  0.31  0.30  0.10     73.4    6.7  0.32  0.31  0.10
              LMM-AR1         96.5    0.9  0.78  1.12  0.61     99.1   16.0  0.77  1.10  0.91
              LMM-AR1R        88.5    0.6  1.10  1.12  1.21     87.9    0.0  1.12  1.10  1.25
              LMM-UN          88.1    0.6  1.10  1.12  1.21     87.8    0.0  1.12  1.10  1.25

(40, 20)      GEE-AR1         64.2    4.0  0.34  0.36  0.12     94.1   99.9  0.42  0.51  0.74
              WGEE-AR1        58.7    2.7  0.37  0.38  0.14     80.2   73.2  0.47  0.52  0.52
              MI-B GEE-AR1    65.2    6.7  0.34  0.34  0.12     82.9   59.9  0.41  0.47  0.37
              MI-C GEE-AR1    68.5    5.3  0.33  0.31  0.11     66.7   12.0  0.36  0.35  0.14
              LMM-AR1         92.9    0.3  0.81  1.22  0.66     99.4   33.1  0.77  1.24  1.94
              LMM-AR1R        87.1    0.6  1.14  1.19  1.30     77.8   -5.4  1.17  1.20  1.41
              LMM-UN          86.8    0.6  1.14  1.19  1.30     78.4   -5.4  1.17  1.20  1.41

Power: empirical power (%).
RBD: percent relative bias. True treatment mean difference is -3.5. True treatment logit difference is -0.751.
SED: standard error estimate of treatment difference estimate (D), calculated from an average of 1000 standard error estimates.
SDD: empirical standard error of treatment difference estimate (D), calculated from the standard deviation of 1000 estimated treatment differences.
MSE: mean-squared error.


less powerful, it should be understood that thet-test or chi-square test analyses on the lastmeasurement occasion are more in line withclinical trial practice. It is clear from Table V,compared with Table V, that longitudinal dataanalysis provides higher statistical power forassessing treatment differences in addition to aunique opportunity to study changes of out-comes of interest over time while accounting formissing data.

4. DISCUSSION

The primary purpose of this study was to explore the effect of dichotomization on incomplete longitudinal continuous data using two popular statistical approaches: ML-based LMM analysis for the continuous outcome and GEE-based GLM analysis for the dichotomized outcome. A secondary purpose was to evaluate the performance of the GEE-based GLM approach on the incomplete longitudinal dichotomized outcome under a variety of missing patterns.

The simulation study, which was an attempt to answer the questions of interest, highlighted that ML-based LMM analysis of the incomplete longitudinal continuous outcome provides well-preserved test sizes as well as good power estimates regardless of missing patterns when the covariance matrix is correctly specified. It is also shown that using a simple structured AR1 covariance matrix will not impair the estimates of the parameters of interest, but could underestimate the true variance of the estimates, subsequently resulting in inflation of type 1 error rates. In most applications, however, it is impossible to know the true covariance structure a priori when planning a study. It is therefore good practice to begin with an unstructured covariance matrix or, if necessary, to use a simple structured covariance matrix with additional subject-specific random effects in the model in case of potential nonconvergence due to a small sample size with many repeated measurements. A possible alternative to reduce the effect of misspecification of the covariance matrix is to implement a robust, so-called sandwich, variance estimator [16,26–28]. In fact, empirical type 1 error rates from LMM-AR1 analysis with a robust variance estimator on the same simulated data sets for the (45, 45) and (40, 20) missing patterns under MAR dropped to 8.9% and 17.9% (from 24.3% and 37.1% in Table III) because of larger, but more comparable, standard error estimates of 1.16 and 1.10 compared with the naive model-based standard error estimates of 0.79 and 0.77.
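The "bread-meat-bread" structure of a sandwich variance estimator can be illustrated on a deliberately simplified model. The sketch below is not the LMM-AR1 analysis with a robust variance estimator reported above: it fits a two-group mean model by ordinary least squares (an independence working covariance) and compares the naive variance with a subject-clustered sandwich variance, without any small-sample correction. The function names and the simulated data are hypothetical.

```python
import math
import random


def ols_two_group(clusters):
    """Fit y = b0 + b1*trt by least squares, ignoring within-subject correlation.

    clusters -- list of (trt, ys): trt is 0 or 1, ys the subject's repeated values.
    Returns (b1, naive_se, robust_se) for the treatment coefficient b1.
    """
    n_obs = sum(len(ys) for _, ys in clusters)
    n1 = sum(len(ys) for trt, ys in clusters if trt == 1)
    n0 = n_obs - n1
    b0 = sum(sum(ys) for trt, ys in clusters if trt == 0) / n0
    b1 = sum(sum(ys) for trt, ys in clusters if trt == 1) / n1 - b0

    # "Bread": inverse of X'X for the design columns (1, trt).
    det = n_obs * n1 - n1 * n1
    bread = [[n1 / det, -n1 / det], [-n1 / det, n_obs / det]]

    # "Meat": sum over subjects of the outer product of X_i' r_i,
    # where r_i holds that subject's residuals.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    rss = 0.0
    for trt, ys in clusters:
        resid = [y - b0 - b1 * trt for y in ys]
        rss += sum(r * r for r in resid)
        score = [sum(resid), trt * sum(resid)]
        for a in range(2):
            for b in range(2):
                meat[a][b] += score[a] * score[b]

    # Naive variance assumes independent errors with a common variance.
    naive_var = rss / (n_obs - 2) * bread[1][1]
    # Sandwich variance: (bread * meat * bread) entry for b1.
    robust_var = sum(bread[1][a] * meat[a][b] * bread[b][1]
                     for a in range(2) for b in range(2))
    return b1, math.sqrt(naive_var), math.sqrt(robust_var)


def simulate_clusters(n_subjects=200, n_times=5, seed=1):
    """Hypothetical data: a shared subject effect induces strong correlation."""
    rng = random.Random(seed)
    clusters = []
    for i in range(n_subjects):
        subject_effect = rng.gauss(0, 2)
        ys = [subject_effect + rng.gauss(0, 1) for _ in range(n_times)]
        clusters.append((i % 2, ys))
    return clusters


b1, naive_se, robust_se = ols_two_group(simulate_clusters())
print(f"naive SE {naive_se:.3f}, robust SE {robust_se:.3f}")
```

With strongly correlated repeated measures, the naive standard error is too small and the sandwich standard error is noticeably larger, which mirrors the direction of the correction reported for LMM-AR1 above.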

Even though MI-C-GEE-based GLM provides less biased and more precise estimates than all other GEE-based GLM analyses considered, the method yields about 11–25% less power compared with ML-based LMM analysis. Given the information lost through dichotomization, these results are not entirely surprising. The results obtained in this limited simulation study illustrate that dichotomization has a cost and that the validity of inference from the dichotomized outcome analysis is questionable. Given the substantial loss of power and potential distortion of test size in the incomplete longitudinal dichotomized outcome analysis, the responder analysis (or dichotomized outcome analysis) should be discouraged as the primary analysis. Therefore, the primary analysis should be based on the original, continuous outcome when


Table V. Empirical type I error rates and power estimates.

                             MCAR             MAR
Missing rate  Method       T1E   Power      T1E   Power

(10, 10)      MH           4.6    70.6      5.3    69.5
              ANCOVA       4.3    88.7      5.4    86.6

(30, 30)      MH           5.8    58.4      7.3    63.7
              ANCOVA       5.2    76.8      8.4    84.4

(45, 45)      MH           4.3    49.8      6.5    57.8
              ANCOVA       5.0    70.9      9.2    80.0

(10, 20)      MH           5.5    67.3     13.3    42.5
              ANCOVA       5.7    85.5     14.2    63.4

(20, 40)      MH           5.9    58.2     29.6    14.0
              ANCOVA       4.8    76.1     34.9    23.5

(20, 10)      MH           4.9    68.7     11.6    89.4
              ANCOVA       4.6    86.1     14.1    97.7

(40, 20)      MH           4.7    58.0     30.7    96.1
              ANCOVA       4.7    75.6     34.1    99.3

T1E: empirical type I error rate (%).
Power: empirical power (%).


available. Nevertheless, when a responder analysis is required to show overall benefit in individual patients, as mentioned in regulatory guidelines, the impact of missing data should be carefully investigated to address any distortion of type I and type II errors before making any decision on the choice of statistical method.
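The power cost of dichotomization can be seen even in a much simpler setting than the one simulated here. The sketch below is an illustration only: it considers a single time point with complete data, a two-sample z-test on a normal outcome versus a two-proportion z-test on responder status, and ignores the negligible far-tail rejection probability. The effect size, sample size and cutoff are hypothetical values chosen for the example and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_continuous(delta, sigma, n_per_arm, z_crit=1.959964):
    """Approximate power of a two-sample z-test on the continuous outcome."""
    se = sigma * sqrt(2 / n_per_arm)
    return STD_NORMAL.cdf(abs(delta) / se - z_crit)


def power_dichotomized(delta, sigma, n_per_arm, cutoff, z_crit=1.959964):
    """Approximate power after defining responders as outcome < cutoff.

    The control arm is taken as N(0, sigma^2) and the treated arm as
    N(delta, sigma^2); both are illustrative assumptions.
    """
    p_control = NormalDist(0, sigma).cdf(cutoff)
    p_treated = NormalDist(delta, sigma).cdf(cutoff)
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    return STD_NORMAL.cdf(abs(p_treated - p_control) / se - z_crit)


# Hypothetical design: half a standard deviation of benefit, 64 per arm,
# responder cutoff halfway between the two arm means.
print(round(power_continuous(-0.5, 1.0, 64), 3))
print(round(power_dichotomized(-0.5, 1.0, 64, cutoff=-0.25), 3))
```

In this hypothetical design the continuous analysis has about 81% power and the responder analysis about 62%, even with the cutoff placed at its most favorable position; less central cutoffs lose more.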

Though not reported here, similar simulation studies were performed with non-normal underlying error distributions. The results using uniform and exponential error distributions are similar to those obtained when the underlying error distribution is normal.

Contrary to previous research [29,36], the current simulation study results show that neither WGEE nor MI-B-based GLM performed reasonably well on the incomplete longitudinal dichotomized outcome analysis. Additionally, the power estimates from those two analyses are difficult to interpret, since the bias of the treatment effects can go in either direction, with different magnitude, depending on missing patterns. In fact, in a few missing patterns, they yield higher power than MI-C-based GLM, or even higher than ML-based LMM, but the higher power is mainly driven by larger bias on the treatment effect. It is also confirmed that, as theory predicts, standard GEE analysis is valid under MCAR. Although theoretically MI does not provide consistent results when there is a misspecification, overall, MI-C-GEE-based GLM analysis provides well-controlled test sizes and stable power estimates. In almost all longitudinal clinical trials, missing data are inevitable, and it is not possible to precisely predict the missing data mechanism a priori when a trial is planned. Therefore, it is important to demonstrate that MI-C-GEE-based GLM analysis has a certain degree of robustness to various missing patterns, and thus that the analysis can be specified a priori as a valid analysis tool.
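In any multiple imputation analysis, including MI-GEE, the results from the separately analyzed imputed data sets are combined in a final pooling step, conventionally with Rubin's rules [19]. A minimal sketch of that step follows; the function name `pool_rubin` is hypothetical, and the imputation model and GEE fits that would produce the inputs are outside its scope.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates -- point estimate from each imputed data set
    variances -- squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5
```

For example, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` returns a pooled estimate of 2.0 with total variance 0.5 + (4/3) × 1.0; the between-imputation term is what carries the extra uncertainty due to the missing data.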

For the analysis of incomplete longitudinal dichotomized data, several routes are available. Apart from quasi-likelihood-based GEE methods, likelihood-based methods, such as the generalized linear mixed effects model (GLMM), are attractive alternatives. It is clear from the current simulation results that missing data have a substantial impact on type 1 error rates and power estimates. Thus, to sufficiently address missing data in GLMM, the effect must be thoroughly investigated through carefully designed simulation studies as well as theoretical investigation. This is an area for future research.

REFERENCES

1. FDA Draft Guidance for Industry: Patient-reported outcome measures: Use in Medical Product Development to Support Labeling Claims, 2006.

2. Committee for Medicinal Products for Human Use (CHMP): Guideline on medicinal products for the treatment of Alzheimer’s disease and other dementias. EMEA, London, 2008.

3. Altman DG, Lausen B, Sauerbrei W, Schumacher M. The dangers of using ‘optimal’ cutpoints in the evaluation of prognostic factors. Journal of the National Cancer Institute 1994; 86:829–835.

4. Cohen J. The cost of dichotomization. Applied Psychological Measurement 1983; 7:249–253.

5. Faraggi D, Simon R. A simulation study of cross-validation for selecting an optimal cutpoint in univariable survival analysis. Statistics in Medicine 1996; 15:2203–2213.

6. Irwin JR, McClelland GH. Negative consequences of dichotomizing continuous predictor variables. Journal of Marketing Research 2003; 40:366–371.

7. Lagakos SW. Effects of mismodelling and mismeasuring explanatory variables on tests of their association with a response variable. Statistics in Medicine 1988; 7:257–274.

8. Lausen B, Schumacher M. Evaluating the effect of optimized cutoff values in the assessment of prognostic factors. Computational Statistics and Data Analysis 1996; 21:307–326.

9. MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of dichotomization of quantitative variables. Psychological Methods 2002; 7:19–40.

10. Snapinn SM, Jiang Q. Responder analyses and the assessment of a clinically relevant treatment effect. Trials 2007; 8(31). DOI: 10.1186/1745-6215-8-31.

11. Little RJA, Rubin DB. Statistical Analysis with Missing Data (2nd edn). Wiley: New York, 2002.

12. Rubin DB. Inference and missing data. Biometrika 1976; 63:581–592.

13. Mallinckrodt CH, Kaiser CJ, Watkin JG et al. Type I error rates from likelihood-based repeated measures analyses of incomplete longitudinal data. Pharmaceutical Statistics 2004; 3:171–186.



14. Hedeker D, Gibbons RD. Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods 1997; 2(1):64–78.

15. Laird NM. Missing data in longitudinal studies. Statistics in Medicine 1988; 7:305–315.

16. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73:13–22.

17. Fitzmaurice GM, Molenberghs G, Lipsitz SR. Regression models for longitudinal binary responses with informative drop-outs. Journal of the Royal Statistical Society, Series B 1995; 57:691–704.

18. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 1995; 90:106–121.

19. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley: New York, 1987.

20. Park T. A comparison of the generalized estimating equation approach with the maximum likelihood approach for repeated measurements. Statistics in Medicine 1993; 12:1723–1732.

21. Jennrich RI, Schluchter MD. Incomplete repeated-measures models with structured covariance matrices. Biometrics 1986; 42:805–820.

22. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics 1982; 38:963–974.

23. Molenberghs G, Thijs H, Jansen I, Beunckens C. Analyzing incomplete longitudinal clinical trial data. Biostatistics 2004; 5:445–464.

24. Mallinckrodt CH, Clark WS, David SR. Accounting for dropout bias using mixed-effects models. Journal of Biopharmaceutical Statistics 2001; 11:9–21.

25. Mallinckrodt CH, Clark WS, David SR. Type I error rates from mixed effects model repeated measures versus fixed effects ANOVA with missing values imputed via last observation carried forward. Drug Information Journal 2001; 35:1215–1225.

26. Diggle PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. Oxford Science: Oxford, 1994.

27. Huber PJ. The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1967; 1:221–233.

28. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 1980; 48:817–830.

29. Preisser JS, Lohman KK, Rathouz PJ. Performance of weighted estimating equations for longitudinal binary data with drop-outs missing at random. Statistics in Medicine 2002; 21:3035–3054.

30. Dmitrienko A, Molenberghs G, Chuang-Stein C, Offen W. Analysis of Clinical Trials Using SAS: A Practical Guide. SAS Institute Inc.: Cary, NC, 2007.

31. Meng XL. Multiple-imputation inferences with uncongenial sources of input (with discussion). Statistical Science 1994; 9:538–573.

32. Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychological Methods 2002; 7(2):147–177.

33. Schafer JL. Analysis of Incomplete Multivariate Data. Chapman & Hall: London, 1997.

34. Butler SM, Louis TA. Random effects models with non parametric priors. Statistics in Medicine 1992; 11:1981–2000.

35. Verbeke G, Lesaffre E. The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data. Computational Statistics and Data Analysis 1997; 23:541–556.

36. Beunckens C, Sotto C, Molenberghs G. A simulation study comparing weighted estimating equations with multiple imputation based estimating equations for longitudinal binary data. Computational Statistics and Data Analysis 2008; 52:1533–1548.

