
Special Issue Paper

Received 23 October 2012, Accepted 1 October 2013. Published online 17 October 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/sim.6019

Inverse probability weighting in nested case-control studies with additional matching—a simulation study

Nathalie C. Støer a*† and Sven Ove Samuelsen a,b

Nested case-control designs are inevitably less efficient than full cohort designs, and it is important to use available information as efficiently as possible. Reuse of controls by inverse probability weighting may be one way to obtain efficiency improvements, and it can be particularly advantageous when two or more endpoints are analyzed in the same cohort. The controls in a nested case-control design are often matched on factors in addition to at-risk status, and this should be taken into account when reusing controls. Although some studies have suggested methods for handling additional matching, a thorough investigation of how this affects parameter estimates and weights is lacking. Our aim is to provide such a discussion to help develop guidelines for practitioners. We demonstrate that it is important to adjust for the matching variables in regression analyses when the matching is broken. We present three types of estimators for the inverse sampling probabilities accounting for additional matching. One of these estimators was somewhat biased when the cases and controls were matched very closely. We investigated how additional matching affected estimates of interest, with varying degrees of association between the matching variables and exposure/outcome. Strong associations introduced only a small bias when the matching variables were properly adjusted for. Sometimes exposure variables, such as those measured in blood samples, are analyzed in batches. Rather strong batch effects had to be present before this introduced much bias when the matching was broken. All simulations are based on a study of prostate cancer and vitamin D.

Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: inverse probability weighting; matching; nested case-control; proportional hazard; weighted partial likelihood

1. Introduction

Collecting exposure information for all individuals in large cohorts can be expensive. Cost-efficient designs such as the case-cohort [1, 2] or the nested case-control (NCC) [3] design are therefore often used instead. With both designs, information on exposure and other covariates is obtained for all cases, but only for a subset of the individuals not experiencing the event. In a case-cohort design, a subcohort is selected at the outset of the study and used at all event times. With an NCC design, m subjects who are alive and event-free are sampled at each event time and used as controls for the given case.

In an NCC study, the controls are matched to their respective cases on follow-up time and often on one or more additional factors. Because of this matching, it has traditionally not been considered feasible to reuse the controls for other types of endpoints. In contrast, with case-cohort designs, the controls are readily available for all types of events. However, some authors [4–7] have suggested a method for breaking the matching, which enables NCC data to be analyzed in a manner similar to case-cohort data. The method involves calculating each subject's probability of ever being sampled and then weighting the cases and controls with their inverse sampling probability, referred to as inverse probability weighting (IPW).

Reuse of controls in NCC studies can be useful in many settings. One example is competing risks, where the controls for one endpoint could be used as additional controls for a competing endpoint.

a Department of Mathematics, University of Oslo, PO Box 1053, 0316 Oslo, Norway

b Department for Chronic Diseases, Norwegian Institute of Public Health, PO Box 4404 Nydalen, 0403 Oslo, Norway

*Correspondence to: Nathalie C. Støer, Department of Mathematics, University of Oslo, PO Box 1053 Blindern, 0316 Oslo, Norway.

†E-mail: [email protected]


Copyright © 2013 John Wiley & Sons, Ltd. Statist. Med. 2013, 32 5328–5339


N. C. STØER AND S. O. SAMUELSEN

Another type of situation could be a ‘subsequent event’ setting, where the second endpoint is a (small) subset of the first endpoint. We will consider the latter situation; however, we believe that the general results regarding IPW and additional matching also apply to other situations, for example, competing risks.

Salim et al. [6, 8] and Cai and Zheng [7] used IPW for NCC studies with additional matching. However, a discussion of how the additional matching can affect final estimates and sampling probabilities is lacking. Moreover, they do not specify how the matching variables are handled in the regressions. Breslow and Day [9] demonstrated that carrying out unmatched analyses of matched data can result in biased estimates. Thus, we believe that when the matching is broken, the matching variables should generally be adjusted for in the regression. We want to investigate how additional matching affects final estimates and weights, and how the matching factors should be included in the regression model and in the weight estimators. Salim et al. [6, 8] and Cai and Zheng [7] considered only one type of matching criterion and one estimator for the sampling probabilities; we also look at alternatives.

We create a realistic simulation model on the basis of an NCC study of the association between prostate cancer and vitamin D levels extracted from blood samples [10]. This study is based on a cohort comprising a collection of health surveys, which for each cohort member consisted of a health examination including a blood sample. Because analyses of blood samples are expensive, an NCC design was chosen, and only the blood samples for cases and controls were analyzed. The controls were matched to the cases with respect to follow-up time, age at health examination, date of health examination, and county of residence.

In the paper by Meyer et al. [10], the interest lies in the association between diagnosis of prostate cancer and vitamin D. However, the association between death from prostate cancer and vitamin D is also of interest because of the increasing practice of screening for this type of cancer. With the traditional NCC analysis, only the controls for cases that died from prostate cancer can be used for death as endpoint, even though all controls are readily available. We therefore want to break the matching and reuse all controls when analyzing death from prostate cancer as endpoint.

In this paper, we investigate how some potential problems of IPW can affect estimates of interest. In particular, we study how strong correlation between matching variables and exposure/outcome, narrow matching, and batch effects influence estimates, standard errors, and weights. The outline is as follows: in Section 2, we describe the data framework. In Section 3, we describe reuse of controls and inverse probability weighting in detail. In Section 4, we state our simulation setup, investigate the potential problems, and give the results. Finally, we conclude with a discussion in Section 5.

2. Data framework

2.1. The cohort

We consider a subsequent event setting, which means that we have two different outcomes where the second outcome always happens after the first. We let $D_i$ indicate which endpoints individual $i$ experienced: $D_i = 0$ corresponds to a censored observation, $D_i = 1$ means that the subject experienced the first endpoint but not the second, and $D_i = 2$ indicates that the subject experienced both endpoints. Let $\mathcal{D}_1$ be the collection of all cases of the first type and $\mathcal{D}_2$ the collection of all cases of the second type. In our example, $D_i = 1$ means that subject $i$ was (only) diagnosed with prostate cancer, and $D_i = 2$ indicates that the subject also died from prostate cancer. In the following, the subscript $k = 1$ indicates the first endpoint and $k = 2$ the second endpoint.

The cohort consists of $n$ subjects. We assume that individual $i$ is followed from an inclusion time $v_i$ to the time of event $\tilde t_{ik}$ or the time of censoring $c_i$, and we only observe $t_{ik} = \min(\tilde t_{ik}, c_i)$. Specifically, for the first endpoint the subjects are followed from $v_i$ to $t_{i1}$, and for the second endpoint from $v_i$ to $t_{i2}$. Note that we consider age from inclusion in the study to age at event for both outcomes ($k = 1$: incidence and $k = 2$: death from cancer). Let $x_i$ be covariates, including exposure and possible confounders, for individual $i$, and let $\beta' = (\beta_1, \ldots, \beta_p)$ be the corresponding regression coefficients. Further, let $z_i$ be other covariates that will be used as matching variables, and let these have regression coefficients $\gamma' = (\gamma_1, \ldots, \gamma_q)$. We then assume outcome-specific proportional hazards models for the two endpoints, with the hazard for subject $i$ for endpoint $k$ given by

$$h_{ik}(t \mid x_i, z_i, \beta_k, \gamma_k) = h_{0k}(t) \exp(\beta_k' x_i + \gamma_k' z_i),$$


where $h_{0k}(t)$ is the baseline hazard for the $k$-th endpoint. We carry out separate Cox regressions for incidence of prostate cancer and death from prostate cancer with age as time scale. Alternatively, we could have chosen time on study, that is, time since the health survey, as time scale. Then all subjects would have been followed from time zero, and there would have been no left truncation. However, we think that time since the health survey is a rather arbitrary time scale and that the risk of prostate cancer typically depends on age.

2.2. The nested case-control study

With an NCC design, $m$ subjects are sampled as controls for each case. A potential control must be at risk at the event time of the case and, in addition, meet the matching criteria. Hence, the eligible controls for a case at time $t_j$ with matching variables $z_j$ form the set

$$P_j = \{\, i : v_i < t_j < t_i,\ z_i \in [z_j - \varepsilon, z_j + \varepsilon],\ i = 1, \ldots, n \,\}.$$

In our subsequent event setting, we sampled controls only for cases of the first type, and cases of the second type use the controls already sampled for them. Both $\varepsilon$ and $z_j$ are vectors, each element of $\varepsilon$ represents one matching criterion, and the interval condition applies element-wise.

We will consider two types of matching criteria: category matching and caliper matching. For category matching, the case and its controls must match exactly on the given matching variable; hence, the corresponding component of $\varepsilon$ is zero. With caliper matching, a subject is considered as a possible control if the value of its matching variable lies within a specified interval around the case's value. As an example, in the original study the controls were matched on county of residence, which is category matching, and on age at blood sampling ± 6 months and date of blood sampling ± 2 months, both being caliper matching; hence, $\varepsilon = (0, 6, 2)$ provided that time is measured in months.
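The eligibility rule for $P_j$, with category and caliper criteria expressed through a single tolerance vector, and the draw of $m$ controls per case can be sketched in Python as follows. This is a minimal illustration assuming a cohort stored as a list of dictionaries with hypothetical keys `v`, `t` and `z`; it is not the software used in the original study.

```python
import random

def eligible_controls(j, cohort, eps):
    """Return the index set P_j of possible controls for case j.

    cohort: list of dicts with hypothetical keys 'v' (entry age), 't' (exit
    age) and 'z' (tuple of matching variables). eps: one tolerance per
    matching variable; 0 means category (exact) matching and a positive
    value means caliper matching within +/- eps.
    """
    tj, zj = cohort[j]["t"], cohort[j]["z"]
    pool = []
    for i, s in enumerate(cohort):
        # at risk: v_i < t_j < t_i (the strict inequality excludes the case itself)
        at_risk = s["v"] < tj < s["t"]
        # every matching variable must lie within its tolerance
        matched = all(abs(zi - zc) <= e for zi, zc, e in zip(s["z"], zj, eps))
        if at_risk and matched:
            pool.append(i)
    return pool

def sample_ncc(case_ids, cohort, eps, m, rng):
    """Draw m controls without replacement from P_j for every case j."""
    sampled = {}
    for j in case_ids:
        pool = eligible_controls(j, cohort, eps)
        sampled[j] = rng.sample(pool, min(m, len(pool)))
    return sampled
```

With `eps = (0, 6, 2)` and `z = (county, age in months, date in months)`, the tolerance vector would correspond to the matching rule of the original study; the numeric coding of county and dates is an assumption here.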

Estimation in an NCC study has traditionally been based on a partial likelihood [3]. For events of type $k$, this likelihood is given by

$$L_k(\beta_k) = \prod_{j \in \mathcal{D}_k} \frac{\exp(\beta_k' x_j)}{\sum_{i \in R_{jk}} \exp(\beta_k' x_i)}. \qquad (1)$$

Here, $R_{jk}$ is the sampled risk set, consisting of the case and its sampled controls at $t_{jk}$, and $\prod_{j \in \mathcal{D}_k}$ denotes the product over all cases of type $k$. Inference can be based on large sample theory for the partial likelihood [11]; hence, the estimator is approximately normally distributed, and the estimated variance is obtained from the inverse of the information matrix. In a situation with multiple outcomes, we carry out separate analyses for each outcome.
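For a single covariate, (1) can be maximised by Newton-Raphson, with the standard error taken from the inverse information as described above. The sketch below assumes a scalar exposure and risk sets given as (case value, control values) pairs, a hypothetical layout; real analyses would normally rely on an established stratified Cox or conditional logistic routine.

```python
import math

def ncc_score_info(beta, risk_sets):
    """Score and observed information of log L_k(beta) in (1), scalar covariate.

    risk_sets: list of (x_case, [x_control, ...]), one entry per case j,
    holding the covariate values in the sampled risk set R_jk.
    """
    score = info = 0.0
    for x_case, x_controls in risk_sets:
        xs = [x_case] + list(x_controls)
        w = [math.exp(beta * x) for x in xs]
        total = sum(w)
        mean = sum(wi * x for wi, x in zip(w, xs)) / total
        mean_sq = sum(wi * x * x for wi, x in zip(w, xs)) / total
        score += x_case - mean         # derivative of the log term for case j
        info += mean_sq - mean ** 2    # weighted variance of x within R_jk
    return score, info

def fit_ncc(risk_sets, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of (1); returns (beta_hat, 1/sqrt(information))."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = ncc_score_info(beta, risk_sets)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = ncc_score_info(beta, risk_sets)
    return beta, 1.0 / math.sqrt(info)
```

With $m = 1$, each term of (1) compares the case with a single control, so two risk sets with mirrored covariate values give $\hat\beta = 0$.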

In our setting, we sampled the controls for incident cases, so using the traditional estimator for death from prostate cancer may not be a standard procedure. The requirements for valid inference in an NCC study are that the controls meet the matching criteria and are a random sample from the collection of possible controls. As long as the controls do not change their behavior in any way after they have been sampled as controls for incident cases, they will still be a random sample when death is considered as endpoint. The controls cannot have changed their behavior, because the sampling was carried out retrospectively, that is, after the time of death of the cases. Further, controls who are censored or dead before the case they were sampled for died are excluded from the traditional analysis. This corresponds to not sampling any controls for those particular cases.
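The reuse of incidence controls for the death endpoint, including the exclusion rule in the last two sentences, amounts to a simple filter on the already sampled sets. The dictionary-based data layout below is a hypothetical choice for illustration.

```python
def death_risk_sets(death_cases, ncc_sets, t2):
    """Traditional NCC risk sets for the second endpoint (death).

    death_cases: ids of cases with D_i = 2.
    ncc_sets: dict mapping each incident case id to the control ids
        sampled for it at incidence.
    t2: dict id -> observed exit time t_i2 for the death endpoint.
    A control is kept only if it is still at risk when its case dies;
    a case left with no controls contributes a factor of 1 to (1).
    """
    return {j: [i for i in ncc_sets[j] if t2[i] > t2[j]] for j in death_cases}
```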

3. Inverse probability weighting

The partial likelihood (1) is stratified with respect to risk sets, where each risk set consists of a case together with its matched controls. Because of the stratification, the cases and controls are only used in the estimation at the event time of the particular case. A seemingly more efficient estimation procedure suggested by Samuelsen [4] is based on a weighted partial likelihood

$$L_k(\beta_k, \gamma_k) = \prod_{j \in \mathcal{D}_k} \frac{\exp(\beta_k' x_j + \gamma_k' z_j)}{\sum_{i \in S_j} \exp(\beta_k' x_i + \gamma_k' z_i)\, w_i}, \qquad (2)$$

where $S_j$ is the collection of all cases and controls at risk at time $t_j$ and $w_i$ is a weight. Because the cases are over-represented in $S_j$, we weight all controls with their inverse probability of ever being sampled as a control. Cases are given weight 1 because they are included with probability 1. We assume time-constant covariates for notational simplicity; however, time-varying covariates are also possible if they are known at all event times at which the subject is at risk.
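A direct transcription of (2) makes the role of the weights concrete: the case enters the numerator unweighted, while every sampled subject still at risk enters the denominator with its weight $w_i$. This sketch only evaluates the objective, with a hypothetical data layout in which the matching variables $z_i$ are simply appended to $x_i$; maximising it and computing robust standard errors would in practice be left to survival software that accepts case weights.

```python
import math

def weighted_loglik(coef, subjects, case_ids):
    """Log of the weighted partial likelihood (2).

    subjects: dict id -> (v, t, cov, w) for every case and sampled control,
        with entry age v, exit age t, cov a tuple holding x_i followed by
        z_i, and weight w (1 for cases, 1/p_i for controls).
    coef: tuple (beta_k followed by gamma_k), laid out like cov.
    case_ids: the cases j in D_k for the endpoint analysed.
    """
    def eta(cov):
        return sum(b * c for b, c in zip(coef, cov))

    ll = 0.0
    for j in case_ids:
        _, tj, cov_j, _ = subjects[j]
        # S_j: all sampled subjects (cases and controls) at risk at t_j
        denom = sum(w * math.exp(eta(cov))
                    for v, t, cov, w in subjects.values() if v < tj <= t)
        ll += eta(cov_j) - math.log(denom)
    return ll
```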


Different ways of estimating the inclusion probabilities $p_i = 1/w_i$ have been proposed. Samuelsen [4] suggested an estimator that resembles the Kaplan-Meier (KM) estimator. Later, Salim et al. [6] and Cai and Zheng [7] generalized this to handle matching:

$$p_i = 1 - \prod_{j \in \mathcal{D}_1,\ i \in P_j} \left(1 - \frac{m}{|P_j|}\right). \qquad (3)$$

Here, $m$ is the number of sampled controls per case, $|P_j|$ is the number of possible controls for the case at time $t_j$, and the product is over all cases for which the given subject could be sampled as a control. Because we only sample controls for incident cases, only $\mathcal{D}_1$ is used in (3). We refer to these $w_i = 1/p_i$ as KM type of weights.
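Formula (3) translates directly into a product over the cases whose eligible sets contain subject $i$. In this sketch, each $P_j$ is assumed to be available as a Python set of subject ids (a hypothetical layout), and `min(m, len(pool))` guards against eligible sets smaller than $m$, a detail (3) itself does not address.

```python
def km_inclusion_prob(i, pools, m):
    """p_i from (3).

    pools: list with the set P_j of possible controls for every case j in D_1.
    m: number of controls sampled per case.
    """
    not_sampled = 1.0
    for pool in pools:
        if i in pool:
            # probability that i escapes selection as a control for case j
            not_sampled *= 1.0 - min(m, len(pool)) / len(pool)
    return 1.0 - not_sampled

def km_weight(i, pools, m, is_case):
    """KM type weight: 1 for cases, 1/p_i for sampled controls (which have p_i > 0)."""
    return 1.0 if is_case else 1.0 / km_inclusion_prob(i, pools, m)
```

For example, with $m = 1$, a subject eligible for two cases with $|P_j| = 4$ and $|P_j| = 2$ has $p_i = 1 - (3/4)(1/2) = 0.625$ and hence weight 1.6.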

Another possibility for estimating $p_i$, discussed by Samuelsen et al. [12], Saarela et al. [13], and Støer and Samuelsen [14], is to assume logistic regression models for the sampling indicator $O_i$. This indicator is 1 for sampled controls and 0 for non-sampled individuals. The covariates in the logistic regression model are inclusion times and censoring times; thus, the model takes the form

$$p_i = E[O_i \mid t_i, v_i] = \frac{\exp(\alpha + f(t_i, v_i))}{1 + \exp(\alpha + f(t_i, v_i))}. \qquad (4)$$

If we let $f(t_i, v_i) = f_1(t_i) + f_2(v_i)$, where $f_1$ and $f_2$ are smooth functions and $\alpha$ is an intercept term, we obtain a generalized additive model (GAM). If we instead choose $f_1$ and $f_2$ to be linear functions, the result is a standard logistic regression model (GLM). It is important to note that the regression is carried out on the cohort without the cases.
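The GLM version of (4) is an ordinary logistic regression of $O_i$ on the inclusion and censoring times, fitted to the non-cases. The sketch below fits it by Newton-Raphson in plain Python so that it is self-contained; it covers only linear $f$ (no GAM smoothing), and in practice one would use a standard GLM or GAM routine instead.

```python
import math

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def predict(b, row):
    """Fitted sampling probability expit(row . b)."""
    return 1.0 / (1.0 + math.exp(-sum(bk * xk for bk, xk in zip(b, row))))

def fit_logistic(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for logistic regression; each row of X starts with 1.0 (intercept)."""
    d = len(X[0])
    b = [0.0] * d
    for _ in range(max_iter):
        p = [predict(b, row) for row in X]
        # gradient X'(y - p) and information X'WX with W = diag(p(1 - p))
        grad = [sum((yi - pi) * row[k] for row, yi, pi in zip(X, y, p)) for k in range(d)]
        info = [[sum(pi * (1.0 - pi) * row[k] * row[l] for row, pi in zip(X, p))
                 for l in range(d)] for k in range(d)]
        step = _solve(info, grad)
        b = [bk + sk for bk, sk in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b
```

A design row for non-case $i$ would be `[1.0, t_i, v_i]` with response $O_i$, and a sampled control would then receive the weight `1 / predict(b, row_i)`.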

In (4), we have disregarded additional matching, and a simple extension is to include the matching variables as covariates in the regression models [13]. How they should be included depends on the situation. For category matching, one could include the matching variables as categorical covariates. With caliper matching, one could include them as continuous covariates, as categorical covariates after suitable grouping, or as smooth functions. Note that the KM type of estimator and the logistic regression estimators handle matching variables in weight estimation differently. With the KM estimator, only the individuals who could be sampled as controls are used in the estimation. In contrast, all non-cases are used in the logistic regression, and the matching variables enter only as covariates.

The likelihood contributions in the weighted partial likelihood (2) are not independent, because the controls enter the likelihood whenever they are at risk. Hence, the inverse information matrix of $L_k(\beta_k, \gamma_k)$ cannot be used to estimate the variance. For the KM type of weights, Samuelsen [4] derived a variance estimator when there is no additional matching, and Cai and Zheng [7] generalized this to additional matching. However, as far as we know, no variance estimator exists for the regression coefficients when logistic regression type of weights are applied in the Cox regression, and a possibly conservative solution is to use robust variances [15, 16].

4. Simulations

4.1. Prostate cancer and vitamin D

We constructed the simulations below to have features similar to a study of the association between prostate cancer and vitamin D, described by Meyer et al. [10]. The cohort in the original study is a collection of health surveys conducted in 1981–1991 in 17 of Norway's 19 counties. However, to make the simulations more transparent, we have restricted them to mimic only one health survey, conducted in Oppland county from 1981 to 1983. This is the largest individual study and comprises one third of the cases.

The Oppland cohort consists of $n = 15949$ men who were followed from age at health survey $v$ to incidence of prostate cancer, death, emigration, or the end of December 2006, whichever happened first. During follow-up, we identified 698 incident prostate cancer cases, and we sampled $m = 1$ control for each case. We matched the controls on follow-up time and two additional factors: $z_1$ = age at health examination (± 6 months) and $z_2$ = date of health examination (± 2 months).

The main research question in Meyer et al. [10] was how vitamin D levels influence the risk of prostate cancer. However, because of the increasing practice of screening, many mild cases might be discovered. If we want to analyze the more serious cases that would have been discovered irrespective of screening, death from prostate cancer might be a better endpoint. Out of the 698 incident cases, 162 died from prostate cancer. With a traditional analysis of death from prostate cancer, we can only use the controls for cases that actually died from the cancer. Because all information needed to carry out analyses has been collected for all sampled controls, we lose efficiency when the controls for incident cases who did not die are excluded.

4.2. Simulation setup

Our cohort is of the same size as the original data, and for each individual, we simulated the variables as described below. We have taken the age at blood sampling $v_i$, which is also one of the matching variables, from the original study. The same applies to the other matching variable, date of blood sampling. The exposure, $E_i$, is vitamin D, and this was drawn from a gamma distribution with mean $62 + \delta_{M_i}$ and variance $5 \times (62 + \delta_{M_i})$, where $\delta_{M_i}$ is a constant depending on the month of blood sampling, $M_i \in \{1, 2, \ldots, 12\}$. The mean varies with the month of blood sampling to reflect the varying vitamin D levels during the year in Norway, due to seasonal variation in the length of daylight and potential sun exposure. $\delta_{M_i}$ ranges from −12 nmol/l in January to a maximum of 16 nmol/l in July. All vitamin D levels smaller than 20 nmol/l were set to 20 nmol/l to eliminate negative and unrealistically small values.
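Matching the stated mean and variance pins down the gamma parameters: with variance equal to 5 times the mean $\mu$, the scale is 5 and the shape is $\mu/5$. A Python sketch using only the two monthly offsets given in the text (−12 nmol/l for January and 16 nmol/l for July); the remaining monthly values of $\delta$ are not reported here and are left to the caller.

```python
import random

def draw_vitamin_d(delta_m, rng, floor=20.0):
    """One exposure draw E_i for a subject sampled in a month with offset delta_m.

    Gamma with mean mu = 62 + delta_m and variance 5 * mu, i.e. shape mu / 5
    and scale 5, with values below `floor` nmol/l set to `floor`.
    """
    mu = 62.0 + delta_m
    shape, scale = mu / 5.0, 5.0   # mean = shape * scale, variance = shape * scale**2
    return max(floor, rng.gammavariate(shape, scale))
```

`random.gammavariate(alpha, beta)` uses the shape/scale parameterization, which is why the scale is passed directly as 5.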

We then sampled the age at incidence from a Weibull distribution,

$$\tilde T_{i1} \sim \mathrm{Weibull}\Big(a_1,\ b_1 \exp\big(\beta_1 (E_i - \delta_{M_i}) + \gamma_1' z_i\big)\Big).$$

Here, $\beta_1$ is the regression coefficient for the exposure, and $\gamma_1$ is the vector of regression coefficients for the matching variables. We also simulated the years from incidence of prostate cancer to death from prostate cancer from a Weibull distribution and added them to the age at incidence to construct the age at death from prostate cancer, that is,

$$\tilde T_{i2} \sim \mathrm{Weibull}\Big(a_2,\ b_2 \exp\big(\beta_2 (E_i - \delta_{M_i}) + \gamma_2' z_i\big)\Big) + \tilde T_{i1}.$$

We have used $E_i - \delta_{M_i}$ to reflect that it is the overall yearly level of vitamin D that is important, not the particular value on the day the blood was sampled. We let the original data guide the choice of parameters: $\beta_1 = 0.14$, $a_1 = 67$, and $b_1 = 15$. In the original study, the estimated association between vitamin D and death from prostate cancer, $\beta_3$, was close to $-0.25$. Because there is an estimated adverse effect of vitamin D on incidence, $\beta_2$ must be chosen smaller than $-0.25$ to obtain $\beta_3 = -0.25$; choosing $\beta_2 = -1.3$ achieved this. For similar reasons, the $\gamma_2$ vector was chosen somewhat smaller than the estimated effects in the original data. The vectors $\gamma_1$ and $\gamma_2$ are both of length 12, and their values range from $-0.62$ to $0.48$ and from $-0.95$ to $-0.01$, respectively. Finally, we let $a_2 = 5$ and $b_2 = 5$. The proportional hazards assumption is not fulfilled for death as endpoint, because the hazard of dying from prostate cancer changes at the time of incidence of prostate cancer.

All individuals who did not experience the event before the end of December 2006 were censored. To mimic the background mortality in the population, we drew a censoring time $C_i$ from a Gompertz distribution with a 10% yearly increase in mortality rate. The survival times $T_{i1}$ and $T_{i2}$ were then $\min(\tilde T_{i1}, C_i, A_i)$ and $\min(\tilde T_{i2}, C_i, A_i)$, respectively, where $A_i$ is the age at the end of December 2006. In some cases, $T_{i1}$ was less than the inclusion time $v_i$, corresponding to prostate cancer occurring before inclusion in the study. For those subjects, a new survival time was drawn, because the original cohort was restricted to individuals without a previous cancer diagnosis at the beginning of the study.
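The construction of the observed times can be sketched as below. Two details are assumptions not stated in the text: the Gompertz baseline level `lam` is a hypothetical parameter, and the censoring age is drawn conditional on being alive at the inclusion age $v$, by inverting the Gompertz cumulative hazard from $v$ onward. The redraw loop mirrors the rule that an incidence time before inclusion is replaced by a new draw; `draw_event` is a placeholder for whatever Weibull sampler is used.

```python
import math
import random

G = math.log(1.10)  # Gompertz rate: hazard rises by 10% per year of age

def draw_gompertz_censoring(v, lam, rng):
    """Age at death from other causes, drawn conditional on being alive at age v.

    Hazard lam * exp(G * age); solves H(c) - H(v) = E with E ~ Exp(1),
    where H(a) = lam / G * (exp(G * a) - 1). lam is a hypothetical baseline.
    """
    e = rng.expovariate(1.0)
    return math.log(math.exp(G * v) + G * e / lam) / G

def observed_time(v, a_end, draw_event, lam, rng):
    """Return (T, event): T = min(event age, censoring age C, age A at end of 2006).

    draw_event(rng) stands in for the Weibull sampler; an event age before
    inclusion v is redrawn, as described in the text.
    """
    t_event = draw_event(rng)
    while t_event < v:
        t_event = draw_event(rng)
    c = draw_gompertz_censoring(v, lam, rng)
    t = min(t_event, c, a_end)
    return t, t == t_event
```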

In all the following analyses, except those in the middle columns of Table II, we adjusted the cohort analyses and IPW analyses for the matching variables. Age at blood sampling was included as a continuous covariate, whereas date of blood sampling was categorized into month of blood sampling and included as a categorical covariate. We chose to use only month of health examination because the amount of sun exposure is assumed to have a yearly variation. The matching variables were included in the weight estimation in all simulations, except those in the rightmost columns of Table II. We included the inclusion time, censoring time, and date of blood sampling as smooth functions in the GAM model. With logistic regression type of weights, we included date of blood sampling through the categorical covariate month of blood sampling, whereas the inclusion time and censoring time were included as continuous covariates. Although we have two matching variables, age at blood sampling and date of blood sampling, age at blood sampling is equal to inclusion time; thus, the regression models for the weights are adjusted for only one matching variable in addition to inclusion time and censoring time.
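The covariate coding for the GLM weights described in this paragraph can be made concrete as a design-row builder; rows like these would be passed to a logistic regression fitted on the non-cases. The function name and the choice of January as reference month are illustrative assumptions, not details from the paper.

```python
import datetime

def weight_design_row(v, t, blood_date):
    """Covariates for the GLM weight model: intercept, inclusion time v,
    censoring (exit) time t, and month of blood sampling as 11 indicators
    with January as the (hypothetical) reference category."""
    month = blood_date.month
    month_dummies = [1.0 if month == k else 0.0 for k in range(2, 13)]
    return [1.0, float(v), float(t)] + month_dummies
```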


Table I. Simulation mimicking the original study.

| Method | Incidence β | se(β) | Emp.se | Eff. | Death β | se(β) | Emp.se | Eff. |
|---|---|---|---|---|---|---|---|---|
| Cohort | 0.139 | 0.062 | 0.062 | – | −0.259 | 0.139 | 0.142 | – |
| Trad. | 0.147 | 0.089 | 0.089 | 0.485 | −0.255 | 0.204 | 0.205 | 0.480 |
| KM | 0.146 | 0.095 | 0.094 | 0.435 | −0.253 | 0.161 | 0.162 | 0.768 |
| GAM | 0.145 | 0.091 | 0.092 | 0.454 | −0.252 | 0.158 | 0.159 | 0.798 |
| GLM | 0.145 | 0.091 | 0.091 | 0.464 | −0.252 | 0.158 | 0.159 | 0.798 |

Based on 1000 simulations. Cohort estimated with ordinary Cox regression. KM/GAM/GLM estimated with (2). True value of β for incidence: 0.14; true value of β for death: unknown.

KM, Kaplan–Meier; GAM, generalized additive model; GLM, standard logistic regression model; Trad., the traditional estimator, estimated with (1); se(β), estimated (Cohort and Trad.) or robust standard errors (KM, GAM, GLM); Emp.se, empirical standard errors; Eff., efficiency compared with cohort, calculated as the ratio of the cohort empirical variance to the empirical variance for Trad./IPW.
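The Eff. column can be reproduced from the empirical standard errors alone, since efficiency is the squared ratio of the cohort empirical standard error to that of the estimator. A short Python check using the Emp.se values in Table I; the printed values agree with the Eff. column to three decimals.

```python
# Empirical standard errors (Emp.se) copied from Table I.
COHORT = {"incidence": 0.062, "death": 0.142}
METHODS = {
    "incidence": {"Trad.": 0.089, "KM": 0.094, "GAM": 0.092, "GLM": 0.091},
    "death": {"Trad.": 0.205, "KM": 0.162, "GAM": 0.159, "GLM": 0.159},
}

def efficiency(cohort_se, method_se):
    """Cohort empirical variance divided by the method's empirical variance."""
    return (cohort_se / method_se) ** 2

for endpoint, table in METHODS.items():
    for name, se in table.items():
        print(f"{endpoint:9s} {name:5s} {efficiency(COHORT[endpoint], se):.3f}")
```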

4.3. Mimicking the original study

Table I displays the results of a simulation similar to the original study. For the death from prostate cancer endpoint, the true value of β is unknown, because the survival times are the sum of two Weibull distributed survival times with different parameters, reflecting an adverse effect of vitamin D on incidence of prostate cancer and later a small protective effect of vitamin D on death from prostate cancer. Because we do not know the true parameter value, it is sensible to compare the estimates with the cohort estimate.

For incidence as endpoint, there is an indication of a slight bias for the IPW estimators, whereas for death as endpoint, the difference between the cohort estimate and the IPW estimates is smaller relative to the standard errors. The empirical and robust standard errors are similar for both incidence and death. For incidence as endpoint, the efficiency of the IPW and traditional estimators is about the same. In contrast, for the death endpoint, the efficiency of IPW is much larger than that of the traditional estimator (0.77 to 0.80 versus 0.48). The main reason is that the traditional estimator uses only the controls for the cases that died from prostate cancer, whereas the IPW estimators use all sampled controls also when analyzing death as endpoint.

4.4. Association between matching variables and exposure/outcome

When the association between matching variables and exposure is weak, it should not matter how the matching is handled when estimating the weights, or whether the matching variables are adjusted for in the Cox regression. However, when the association is strong, it could matter. To investigate these issues, we first carried out a simulation with a strong correlation between the matching variables and exposure. Second, we carried out a simulation with a strong association between the matching variables and both outcome and exposure.

Because the matching variables were taken from the original study, we used a Cholesky decomposition to increase the correlation between the matching variables and vitamin D. This amounts to first deciding upon the desired correlation matrix for those variables, then performing a Cholesky decomposition of that correlation matrix, and multiplying the result with the matrix consisting of the matching variables and vitamin D. The new matrix, consisting of the altered vitamin D and matching variables, will then approximately have the required correlation matrix. The distributions of the matching variables and vitamin D changed somewhat but retained their main features. We chose the correlation between age at blood sampling and vitamin D to be 0.7 and between date of blood sampling and vitamin D to be −0.5. With the original matching variables and the simulation setup described earlier, the correlation was approximately 0 and 0.17 for age and date of blood sampling, respectively. We increased the association between the matching variables and outcome by doubling the regression coefficients for the matching variables.
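The Cholesky step can be sketched as follows. The variable order (vitamin D, age, date) and the age–date correlation of 0 in the example matrix are assumptions for illustration; the text specifies only the correlations 0.7 and −0.5 with vitamin D. Each input row is assumed to be standardized beforehand. Mapping a row $u$ to $Lu$ gives covariance $LL' = R$ when the input columns are uncorrelated, which is why the result only approximately attains $R$ here.

```python
import math

def cholesky(R):
    """Lower-triangular L with L L^T = R (R symmetric positive definite)."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def induce_correlation(rows, R):
    """Map each standardized row u (vitamin D, age, date) to L u."""
    L = cholesky(R)
    n = len(R)
    return [[sum(L[i][k] * u[k] for k in range(n)) for i in range(n)] for u in rows]
```

After the transformation, the columns would be rescaled to the original means and standard deviations; this last step is likewise an assumption about the exact procedure.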

We divided Table II into two panels: in panel A, only the correlation between the matching variables and vitamin D is increased, whereas in panel B, the association between the matching variables and both exposure and outcome is increased. The leftmost column displays the results when the matching variables are included both in the weight estimation and adjusted for in the IPW Cox regressions. In the second column, we included the matching variables in weight estimation but did not adjust for them in the

Copyright © 2013 John Wiley & Sons, Ltd. Statist. Med. 2013, 32 5328–5339


Page 7: Inverse probability weighting in nested case-control studies with additional matching-a simulation study

N. C. STØER AND S. O. SAMUELSEN

Table II. Stronger association between matching variables/exposure.

                   Incorporating the matching   Disregarding the matching I  Disregarding the matching II
Endpoint   Method        β    se(β)   Emp.se          β    se(β)   Emp.se          β    se(β)   Emp.se

Panel A: strong association between matching variables and exposure
Incidence  Cohort    0.138    0.123    0.120     −0.126    0.077    0.079          —        —        —
           Trad.     0.153    0.150    0.143          —        —        —          —        —        —
           KM        0.155    0.184    0.183     −0.146    0.112    0.102      0.150    0.176    0.175
           GAM       0.150    0.177    0.176     −0.128    0.106    0.093      0.150    0.177    0.176
           GLM       0.150    0.177    0.175     −0.115    0.108    0.094      0.150    0.177    0.175
Death      Cohort   −0.275    0.274    0.273     −0.518    0.168    0.163          —        —        —
           Trad.    −0.268    0.347    0.341          —        —        —          —        —        —
           KM       −0.285    0.312    0.312     −0.539    0.184    0.182     −0.285    0.307    0.306
           GAM      −0.284    0.307    0.307     −0.524    0.180    0.172     −0.284    0.307    0.306
           GLM      −0.284    0.307    0.307     −0.512    0.181    0.172     −0.284    0.307    0.307

Panel B: strong association between matching variables and both exposure and outcome
Incidence  Cohort    0.142    0.127    0.126     −0.480    0.077    0.079          —        —        —
           Trad.     0.174    0.154    0.154          —        —        —          —        —        —
           KM        0.158    0.194    0.191     −0.500    0.105    0.102      0.148    0.186    0.182
           GAM       0.148    0.186    0.183     −0.485    0.092    0.093      0.148    0.186    0.183
           GLM       0.150    0.186    0.182     −0.455    0.093    0.094      0.151    0.186    0.182
Death      Cohort   −0.235    0.265    0.258     −0.790    0.159    0.154          —        —        —
           Trad.    −0.189    0.330    0.332          —        —        —          —        —        —
           KM       −0.234    0.311    0.300     −0.806    0.181    0.173     −0.241    0.305    0.296
           GAM      −0.242    0.305    0.297     −0.795    0.174    0.163     −0.241    0.306    0.297
           GLM      −0.239    0.305    0.296     −0.767    0.177    0.165     −0.239    0.305    0.296

Based on 1000 simulations. Cohort estimated with ordinary Cox-regression. KM/GAM/GLM estimated with (2). True value of β for incidence: 0.14; true value of β for death: unknown. Incorporating the matching: matching variables included in weight estimation and adjusted for. Disregarding the matching I: matching variables included in weight estimation but not adjusted for. Disregarding the matching II: matching variables not included in weight estimation but adjusted for. Adjustment for matching variables applies to all estimators except Trad. KM, Kaplan–Meier; GAM, generalized additive model; GLM, standard logistic regression model; Trad., the traditional estimator, estimated with (1); se(β), estimated (Cohort and Trad.) or robust standard errors (KM, GAM, GLM); Emp.se, empirical standard errors.

Cox-regressions. The rightmost column displays the results when the matching variables are not included in weight estimation but are adjusted for in the IPW Cox-regressions. We did not adjust the traditional estimator for the matching variables in any of the simulations in Table II.

The leftmost column in panel A indicates that the IPW estimates for incidence are slightly biased, whereas the corresponding estimates for death from prostate cancer are closer to the cohort estimates, relative to the standard errors. For the incidence endpoint, the traditional estimator is more efficient than IPW (empirical standard error 0.143 versus 0.175–0.183). For death from prostate cancer, on the other hand, IPW is more efficient (0.307–0.312 versus 0.341) because it utilizes all sampled controls.

In the second column of panel A, we included the matching variables in weight estimation but did not adjust for them in the Cox-regression. The result is that IPW is biased for both endpoints. This is not surprising, because the matching variables have a fairly strong confounding effect. The IPW estimates are similar to the cohort estimates not adjusted for the matching variables. This indicates that IPW is inconsistent in this setting, and that including the matching variables when estimating the weights might not be enough to adjust for the confounding.


The rightmost column in panel A displays the results when the matching variables are adjusted for in the Cox-regression but not included in the weight estimation. The estimates are similar to those in the leftmost column. This simulation indicates that it might be more important to adjust for the matching variables in the Cox-regression than to incorporate the matching in the weight estimation.

The results in panel B show a similar pattern for the IPW estimators. However, the bias in the middle column, due to not adjusting for the matching variables, is larger in this simulation. This is probably because the confounding effect of the matching variables is stronger when the association between the matching variables and both the exposure and the outcome is increased. Note also that this simulation indicates a bias for the traditional estimator. However, when we adjusted the traditional estimator for month of blood sampling, the results for incidence and death were β = 0.139 (emp. se = 0.186) and β = −0.246 (emp. se = 0.434), respectively, which are close to the cohort estimates. After adjustment, the standard error of the traditional estimator in the analysis of incidence of prostate cancer became comparable with the standard errors of IPW. This suggests that the apparent efficiency gain of the traditional estimator for incidence of prostate cancer may arise because the traditional estimator is adjusted for fewer variables. Finally, also in this simulation, incorporating the matching variables in the weights was of minor importance.

4.5. Close matching

When we weight the cases and controls with IPW, the underlying idea is, in a sense, to reconstruct the full cohort by letting each control represent a number of the individuals not sampled. If the matching criterion is narrow, so that very few individuals are eligible controls, the true probability of being sampled will be large for the controls, and the idea of reconstructing the cohort might break down. Because the KM estimator and the logistic regression estimators handle matching differently, the consequences can differ somewhat. The KM type of weights will be small, sometimes even 1. In contrast, the logistic regression estimators include the matching variables only as covariates and therefore do not fully take into account how closely the cases and controls are matched. As a consequence, these weights do not necessarily become as small as the KM type of weights.
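To make the contrast concrete, a KM-type inclusion probability can be sketched in plain Python. The sketch follows the usual construction in the spirit of Samuelsen [4], with the number of subjects at risk replaced by the number of eligible controls under the matching criterion; it assumes m controls are drawn without replacement, independently for each case, and the function names and eligibility sets are hypothetical. It may differ in detail from the estimator used in this paper.

```python
from math import prod

def km_inclusion_probability(subject, eligible_sets, m):
    """Probability that `subject` is ever sampled as a control.

    eligible_sets: one set per case holding the ids of the subjects that were
    eligible controls for that case (at risk and satisfying the matching
    criterion). m: number of controls drawn per case without replacement.
    """
    stay_unsampled = []
    for eligible in eligible_sets:
        if subject in eligible:
            k = len(eligible)
            # Chance of not being drawn for this case; zero when k <= m.
            stay_unsampled.append(max(0.0, 1.0 - m / k))
    return 1.0 - prod(stay_unsampled)  # prod([]) == 1, so never eligible gives 0


def km_weight(subject, eligible_sets, m, is_case=False):
    """Inverse inclusion probability; cases are always included (weight 1)."""
    if is_case:
        return 1.0
    p = km_inclusion_probability(subject, eligible_sets, m)
    if p == 0.0:
        raise ValueError("subject was never eligible, so it cannot be a sampled control")
    return 1.0 / p


# Hypothetical example: one control per case (m = 1) and three cases for
# which subject 7 is an eligible control.
wide = [set(range(100))] * 3     # 100 eligible controls per case
close = [{7, 8}] * 3             # only 2 eligible controls per case
print(km_weight(7, wide, m=1))   # 1 / (1 - 0.99**3), about 33.7
print(km_weight(7, close, m=1))  # 1 / (1 - 0.5**3), about 1.14
print(km_weight(7, [{7}], m=1))  # k = 1 <= m, so p = 1 and the weight is 1.0
```

With two eligible controls per case the weight drops to about 1.14, and with a single eligible control it is exactly 1, which mirrors the minimum KM weights of 1.0 in Settings 2 and 3 of Table III.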

To investigate how narrow matching criteria might affect estimation with IPW, we set up two simulation experiments in which the matching criteria for age at blood sampling and date of blood sampling were ±1 month and ±1 week, respectively, in the first simulation, and ±1 week and ±1 day in the second simulation. For the GAM weights, both date of blood sampling and age at blood sampling were included as smooth terms in the logistic regression; however, sensitivity analyses indicated that it was of minor importance how they were included (results not shown). For the GLM weights, we included date of blood sampling in the regressions through the categorical covariate month of blood sampling, whereas age at blood sampling was included as a continuous covariate.
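For comparison, a logistic-regression (GLM-type) weight can be sketched as the inverse of a fitted probability of ever being sampled as a control, here estimated by Newton-Raphson in NumPy. The response, the single covariate and the simulated data are assumptions for illustration; the study's GLM used its own covariates (month of blood sampling as a categorical term and age as a continuous term), which are not reproduced here.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - p)                          # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def glm_weights(X, beta):
    """Inverse of the fitted probability of ever being sampled as a control."""
    p = 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ beta)))
    return 1.0 / p


# Hypothetical data: non-cases with one covariate (e.g. standardized age at
# blood sampling) and an indicator of ever having been sampled as a control.
rng = np.random.default_rng(2)
n = 20_000
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), z])
sampled = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(-2.0 + 0.5 * z)))
beta_hat = fit_logistic(X, sampled)
w = glm_weights(X[sampled], beta_hat)   # weights for the sampled controls only
print(beta_hat, w.min(), w.max())
```

The fitted probability depends only on covariate values, not on how many subjects happened to share a case's matching window, which is one way to see why the GLM weights in Table III barely change as the matching criterion narrows.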

Table III displays the number of possible controls for each case under each matching criterion and the distribution of the weights estimated with KM and GLM. The GLM weights are not affected by how narrow the matching criterion is (the median stays between 15.6 and 17.9), whereas the KM weights are strongly affected (the median drops from 13.3 in Setting 1 to 2.0 in Setting 3).

The left side of Table IV shows that with a fairly narrow matching criterion, none of the weighting methods gives noticeably biased estimates. For GAM and GLM, this is not surprising, because those weights do not take into account how narrow the matching criterion actually is. It is more surprising that the point estimate with the KM type of weights appears unbiased, because the maximum number of potential controls for a case is only around 20 (Table III).

With the narrowest matching criterion, the maximum number of potential controls is around 10 (Table III). In this situation, the estimate from the KM type of weights shows a clear bias (0.117 compared with the true value 0.14). This is likely a small-sample property: in a hypothetical situation where the matching criterion was held constant while the cohort was increased, the sampling probabilities would not become as large.

4.6. Batch effects

Another problem for IPW is related to inaccurate exposure information. Biological material such as blood samples and gene expressions is often analyzed in batches, and because of a number of factors, variation can occur between samples from different batches. Samples from the same batch will thereby be more similar than samples from different batches. For instance, in the original study, we analyzed the blood samples in batches of up to 50, and the case-control pairs were always in the same batch. With the traditional estimator, as long as cases and controls belong to the same batch, the variation between


Table III. Number of possible controls for each case and distribution of Kaplan–Meier and standard logistic regression model weights.

                           Minimum 1st quartile       Median 3rd quartile      Maximum
Setting 1  Controls             10          100          149          186          280
           w_i KM              4.1         10.0         13.3         20.4        166.7
           w_i GLM             8.8         11.6         15.9         22.7        142.9
Setting 2  Controls              1            6            9           12           21
           w_i KM              1.0          3.2          5.4          8.0         20.0
           w_i GLM             8.0         11.4         15.6         23.3        142.9
Setting 3  Controls              1            2            3            4            9
           w_i KM              1.0          1.0          2.0          3.0          8.0
           w_i GLM             8.1         13.0         17.9         27.0        200.0

Setting 1: original matching criteria, ±6 months (age at blood sampling) and ±2 months (date of blood sampling). Setting 2: ±1 month (age) and ±1 week (date). Setting 3: ±1 week (age) and ±1 day (date). Numbers from one simulation. KM, Kaplan–Meier; GLM, standard logistic regression model.

Table IV. Close matching for incidence of prostate cancer.

        ±1 month and ±1 week         ±1 week and ±1 day
Method        β    se(β)   Emp.se          β    se(β)   Emp.se
Cohort    0.139    0.062    0.061      0.141    0.062    0.063
Trad.     0.138    0.091    0.087      0.140    0.099    0.100
KM        0.136    0.092    0.090      0.117    0.084    0.084
GAM       0.142    0.091    0.088      0.144    0.097    0.097
GLM       0.142    0.091    0.088      0.144    0.096    0.097

Based on 1000 simulations. Cohort estimated with ordinary Cox-regression. KM/GLM/GAM estimated with (2). True value of β = 0.14. KM, Kaplan–Meier; GAM, generalized additive model; GLM, standard logistic regression model; Trad., traditional estimator, estimated with (1); se(β), estimated (Cohort and Trad.) or robust standard errors (KM, GAM, GLM); Emp.se, empirical standard errors.

batches will not pose a problem. However, when we break the matching and compare a case with all subjects at risk, the variation between batches can be problematic.

In the original study, we analyzed the samples in batches of up to 50; however, we are only simulating data from one county, so there will be fewer cases and controls in each batch. To be able to compare the IPW results with analyses of the entire cohort, we assigned every subject in the cohort to a batch. We chose 140 batches, resulting in approximately five incident case-control pairs in each batch.

We now imagine that the already simulated vitamin D level, $E$, is the 'true' level, and that because of variation between batches, each measurement receives an addition or reduction such that the measured level is $E^{*} = E + \varepsilon$. The offset $\varepsilon$ is the same for all subjects within the same batch, but varies between batches and

$$\varepsilon \sim N\!\left(0,\ \sigma_E^{2}\,\frac{R^{2}}{1 - R^{2}}\right).$$

Here, $R^{2} = 1 - \sigma_E^{2}/\sigma_{E^{*}}^{2}$ is the proportion of the variance in vitamin D explained by the batches, and $\sigma_E^{2}$ and $\sigma_{E^{*}}^{2}$ are the variances of the 'true' vitamin D level and the measured level, respectively. The measured level $E^{*}$ is used to estimate regression coefficients, whereas the original $E$ is still used to simulate event times.
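The batch mechanism can be simulated directly. This minimal NumPy sketch assumes normally distributed 'true' levels and random assignment of subjects to batches; the 140 batches follow the text, while the cohort size, mean and standard deviation are placeholders. One offset per batch is drawn with variance $\sigma_E^{2} R^{2}/(1 - R^{2})$, so the empirical value of $1 - \mathrm{Var}(E)/\mathrm{Var}(E^{*})$ should land near the chosen $R^{2}$, up to noise from having only 140 batch-level draws.

```python
import numpy as np

def add_batch_effect(E, batch, r2, rng):
    """Return E* = E + eps, where eps is shared by all subjects in a batch.

    eps ~ N(0, var(E) * r2 / (1 - r2)), so that 1 - var(E) / var(E*) is close to r2.
    """
    E = np.asarray(E, dtype=float)
    sd_eps = np.sqrt(E.var() * r2 / (1.0 - r2))
    labels, idx = np.unique(np.asarray(batch), return_inverse=True)
    offsets = rng.normal(0.0, sd_eps, size=len(labels))  # one draw per batch
    return E + offsets[idx]

rng = np.random.default_rng(3)
n, n_batches = 140_000, 140
E = rng.normal(70.0, 20.0, size=n)          # placeholder 'true' vitamin D levels
batch = rng.integers(0, n_batches, size=n)  # hypothetical random batch assignment
E_star = add_batch_effect(E, batch, r2=0.3, rng=rng)
# Compare with r2 = 0.3; approximate because only 140 offsets are drawn.
print(1 - E.var() / E_star.var())
```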


Table V. Summary of batch effects.

                   R² = 0                       R² = 0.1                     R² = 0.3
Endpoint   Method        β    se(β)   Emp.se          β    se(β)   Emp.se          β    se(β)   Emp.se
Incidence  Cohort    0.139    0.062    0.062      0.125    0.059    0.059      0.095    0.051    0.052
           Trad.     0.147    0.089    0.089      0.146    0.089    0.089      0.141    0.088    0.084
           KM        0.146    0.095    0.094      0.130    0.090    0.086      0.096    0.078    0.065
           GAM       0.145    0.091    0.092      0.130    0.086    0.083      0.095    0.075    0.061
           GLM       0.145    0.091    0.091      0.130    0.086    0.083      0.095    0.075    0.061
Death      Cohort   −0.259    0.139    0.142     −0.229    0.130    0.131     −0.161    0.110    0.109
           Trad.    −0.255    0.204    0.205     −0.252    0.203    0.205     −0.235    0.200    0.207
           KM       −0.253    0.161    0.162     −0.223    0.151    0.148     −0.153    0.129    0.123
           GAM      −0.252    0.158    0.159     −0.223    0.148    0.145     −0.154    0.126    0.120
           GLM      −0.252    0.158    0.159     −0.222    0.148    0.144     −0.154    0.126    0.120

Based on 1000 simulations. Cohort estimated with ordinary Cox-regression. KM/GAM/GLM estimated with (2). True value of β for incidence: 0.14; true value of β for death: unknown. R² = 0 corresponds to Table I. KM, Kaplan–Meier; GAM, generalized additive model; GLM, standard logistic regression model; Trad., the traditional estimator, estimated with (1); se(β), estimated (Cohort and Trad.) or robust standard errors (KM, GAM, GLM); Emp.se, empirical standard errors.

Table V shows the results from the simulation with batch effects for R² = 0 (no batch effect), R² = 0.1 (a moderate batch effect, corresponding to the effect estimated in the original study) and R² = 0.3 (considered a strong batch effect).

The true β for incidence is 0.14, and with R² = 0 there is a slight indication of bias, although away from zero. With R² = 0.1, the IPW estimates and the cohort estimate are somewhat smaller than 0.14, and with R² = 0.3, both the cohort estimate and the IPW estimates are clearly drawn toward zero. Because the full cohort analysis is also affected by the batch effect, this is likely a large-sample bias rather than a consequence of the sampling design.

As in Tables I and II, the true value of β for the death from prostate cancer endpoint is unknown. However, comparing with the traditional estimator is reasonable in this situation, because that estimator eliminates the batch effect problem. The pattern is the same for death as for incidence. The batch effects are larger in absolute value for death, whereas the relative effect of the batches is fairly similar for incidence and death. For R² = 0.1, the relative differences between the estimates with and without batch effects are 7% and 11%, whereas with R² = 0.3, the relative differences are 32% and 34%, for incidence and death, respectively.

5. Discussion

We have evaluated IPW methods for NCC studies through simulations in three settings: strong correlation between the matching variables and exposure/outcome, close matching, and batch effects. In addition, we presented the results from a simulation with none of these issues.

All simulations indicated only minor differences between the weighting methods with regard to hazard ratios and standard errors. The first simulation indicated that IPW for NCC with additional matching may work well when none of the issues described earlier is present. All estimates were fairly close to the cohort estimates, and robust and empirical standard errors were similar for both endpoints. In the second simulation, we induced a larger correlation between the matching variables and exposure. The IPW estimates were close to the cohort estimates, sometimes even closer than the traditional estimator, and the empirical and robust standard errors were similar. Somewhat surprisingly, we found that even with a fairly strong correlation, including the matching variables in weight estimation had only minor influence on the estimates and standard errors. When the association between the matching variables and outcome was also increased, the conclusion remained the same.


Even though it did not seem very important to include the matching variables when estimating the weights, the matching should not be ignored. The simulations in the middle column of Table II indicated that it is essential to include the matching variables in the Cox-regressions, especially when they are confounders. The reason is that while the confounding is 'matched away' with the traditional estimator, the matching is broken with IPW; the matching variables should therefore be adjusted for to remove the confounding. How to deal with the matching variables in the Cox-regression does not seem to have been discussed by authors addressing IPW for NCC with additional matching [6–8]. We suggest either adjusting for the matching variables or including them as smooth functions. An alternative is to carry out Cox-regressions stratified on the matching variables. However, stratified analyses are generally less efficient than unstratified analyses, and we believe that the best approach is to adjust for the matching variables. Our simulations (not shown) indicated that if the matching variables are not adequately accounted for, the estimates might still be biased. Thus, sensitivity analyses, such as comparison with the traditional estimator, are advisable.
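As a concrete reading of this recommendation, the following NumPy sketch maximizes a weighted Cox partial likelihood (Breslow handling of ties) in which cases carry weight 1 and controls carry an inverse inclusion probability, so that adjusting for matching variables simply means keeping them as extra columns of the covariate matrix. It is our own minimal illustration of the weighted-likelihood idea, not a verbatim implementation of equation (2), and it omits the robust standard errors; the function name and the simulated full-cohort data (all weights 1) are hypothetical.

```python
import numpy as np

def weighted_cox(time, event, X, w, n_iter=50, tol=1e-10):
    """Newton-Raphson for a weighted Cox partial likelihood (Breslow ties).

    Maximizes  sum_{j: event} [ x_j'b - log sum_{k: t_k >= t_j} w_k exp(x_k'b) ].
    time, event: follow-up time and event indicator of the sampled subjects;
    X: covariates, e.g. exposure plus matching variables as extra columns;
    w: 1 for cases, inverse inclusion probability for controls.
    """
    time, X, w = (np.asarray(a, dtype=float) for a in (time, X, w))
    event = np.asarray(event, dtype=bool)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        risk = w * np.exp(X @ beta)
        score = np.zeros_like(beta)
        info = np.zeros((X.shape[1], X.shape[1]))
        for j in np.flatnonzero(event):
            at_risk = time >= time[j]
            r, Xr = risk[at_risk], X[at_risk]
            s0 = r.sum()
            xbar = r @ Xr / s0                      # weighted mean covariate in the risk set
            score += X[j] - xbar
            info += (Xr.T * r) @ Xr / s0 - np.outer(xbar, xbar)
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical full-cohort example (all weights 1) in which the matching
# variable confounds the exposure effect.
rng = np.random.default_rng(4)
n = 2000
matchvar = rng.normal(size=n)                   # e.g. standardized age at sampling
exposure = 0.7 * matchvar + rng.normal(size=n)
T = rng.exponential(1.0 / np.exp(0.5 * exposure - 0.3 * matchvar))
C = rng.uniform(0.0, 2.0, size=n)
time, event = np.minimum(T, C), T <= C
adjusted = weighted_cox(time, event, np.column_stack([exposure, matchvar]), np.ones(n))
unadjusted = weighted_cox(time, event, exposure[:, None], np.ones(n))
print(adjusted[0], unadjusted[0])   # exposure coefficient with and without adjustment
```

In this toy setup the adjusted exposure coefficient should sit near the simulated value 0.5, while dropping the confounding matching variable shifts it, in the same spirit as the middle column of Table II.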

With a strong correlation between the matching variables and exposure, the simulation indicated that the traditional estimator is more efficient than IPW for the incidence endpoint. A possible explanation is that the matching increases the efficiency of the traditional estimator. When the matching is broken, the efficiency gained through the matching may be lost, making IPW less efficient than the traditional estimator. However, when IPW can use extra controls compared with the traditional estimator, as for the death endpoint, there can still be a gain in efficiency. In the simulation where the association between the matching variables and both exposure and outcome was increased, the traditional estimator appeared to be biased. However, after adjusting for month of blood sampling, the bias disappeared. Additionally, the standard errors increased to the same level as those of the IPW estimators after the adjustment. Hence, an alternative explanation for why the traditional estimator is more efficient than the IPW estimators in the incidence analysis is that it is adjusted for fewer covariates, that is, it is not adjusted for the matching variables. With caliper matching, the intervals can be too wide to fully capture the confounding, and additional adjustment may then be important also for the traditional estimator.

A problem with additional matching is that there can be few potential controls for some or all cases. The true sampling probability will then be large, resulting in small weights, and the idea of reconstructing the cohort can break down. To investigate this problem, we performed two simulations with fairly close and very close matching criteria. The KM type of weights gave biased estimates for the incidence endpoint with the very narrow matching criteria, whereas the GAM and GLM weights were unbiased for both matching criteria.

We also considered batch effects, and our simulations suggested that for small coefficients a fairly large batch effect is needed before the estimates become severely biased. However, smaller batch effects can be problematic with larger coefficients. The batch effect also seemed to influence the relative risk estimates in a manner similar to non-differential measurement error, pulling them toward zero. It might therefore be possible to adjust for batch effects using measurement error methods. In some situations, one may want to reuse the controls for a new endpoint without sampling any new controls. The blood sample analysis then needs to be carried out for all the new cases, which would potentially be run together in the same batch without any new controls. Investigating what would happen in such situations seems important, but is beyond the scope of this paper.
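The analogy with non-differential measurement error can be made explicit in the simplest case. The derivation below is a standard regression-dilution argument for a linear model, offered only as an approximation to the Cox setting (roughly adequate when the exposure effect is small); it uses the notation $E$, $\varepsilon$, $E^{*}$ and $R^{2}$ from Section 4.6 and assumes that $\varepsilon$ is independent of both $E$ and the model error $e$.

```latex
\begin{aligned}
Y &= \alpha + \beta E + e, \qquad E^{*} = E + \varepsilon, \qquad \varepsilon \perp (E, e), \quad e \perp E,\\
\beta^{*} &= \frac{\operatorname{Cov}(Y, E^{*})}{\operatorname{Var}(E^{*})}
           = \frac{\beta\,\sigma_E^{2}}{\sigma_E^{2} + \sigma_E^{2} R^{2}/(1 - R^{2})}
           = \frac{\beta\,\sigma_E^{2}}{\sigma_E^{2}/(1 - R^{2})}
           = \beta\,(1 - R^{2}).
\end{aligned}
```

Under these assumptions, a simple moment-based correction would divide a naive estimate by $1 - R^{2}$, provided $R^{2}$ can be estimated; whether this carries over to IPW Cox-regression would need to be checked.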

Salim et al. [8] showed that using prior controls sampled for a different endpoint is not as efficient as sampling new controls. Under certain circumstances, they found that six prior controls might not correspond to even one new control in efficiency. This point should be kept in mind when reusing controls. In our study, however, this is not relevant, because we only supplement with the controls for cases that did not die from prostate cancer, although it could imply that the number of effective controls [8] for the death from prostate cancer endpoint is somewhat smaller than the actual number of controls.

We have discussed two types of weight estimators, the KM type and the logistic regression type; however, other possibilities exist. One possibility, proposed by Chen [5], is local averaging. Without any additional matching, this involves partitioning the time axis with regard to both censoring and inclusion time and calculating separate weights for controls with different combinations of intervals. However, we believe that this will be difficult to generalize to additional matching in practice, because all matching variables would need to be partitioned, giving rise to a large number of combinations of intervals.

Breslow et al. [17,18] have suggested calibration of weights as a method of increasing the efficiency in case-cohort designs by incorporating information known for all subjects in the cohort into the weights. If fully observed information is correlated with covariates known only for cases and controls, such as vitamin D in our study, the calibrated weights can be more efficient than the original weights. Støer and


Samuelsen [14] have previously considered calibration for NCC designs, although in a setting without any additional matching.
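Calibration can take several forms; one common choice is raking, sketched below in NumPy. The design weights are multiplied by $\exp(\lambda^{\top} a_i)$ for auxiliary variables $a_i$ known for the whole cohort, with $\lambda$ solved by Newton's method so that the weighted sample totals match the cohort totals. This is a generic sketch under our own assumptions (a hypothetical auxiliary variable and a hypothetical 10% sample), not the specific procedure of [14], [17] or [18], and the choice of auxiliary variables, which governs any efficiency gain, is left open here.

```python
import numpy as np

def rake_weights(w, A, totals, n_iter=100, tol=1e-12):
    """Calibrate design weights so that sum_i w_cal_i * A_i equals `totals`.

    w: (n,) design weights of the sampled subjects (e.g. inverse inclusion
    probabilities); A: (n, q) auxiliary variables for the sampled subjects;
    totals: (q,) cohort totals of the same variables.
    Raking form: w_cal_i = w_i * exp(A_i @ lam).
    """
    w = np.asarray(w, dtype=float)
    A = np.asarray(A, dtype=float)
    totals = np.asarray(totals, dtype=float)
    lam = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w_cal = w * np.exp(A @ lam)
        gap = totals - A.T @ w_cal                   # calibration equations
        if np.max(np.abs(gap)) < tol * max(1.0, np.max(np.abs(totals))):
            break
        jac = A.T @ (A * w_cal[:, None])             # derivative of A.T @ w_cal in lam
        lam += np.linalg.solve(jac, gap)
    return w * np.exp(A @ lam)


rng = np.random.default_rng(5)
N = 10_000
z = rng.normal(size=N)                    # hypothetical auxiliary variable known for all
sampled = rng.uniform(size=N) < 0.1       # hypothetical 10% sample
w = np.full(sampled.sum(), 10.0)          # design weights 1 / 0.1
A = np.column_stack([np.ones(sampled.sum()), z[sampled]])
w_cal = rake_weights(w, A, np.array([N, z.sum()]))
print(w_cal.sum(), (w_cal * z[sampled]).sum(), z.sum())
```

Including an intercept column makes the calibrated weights sum to the cohort size.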

There exist other ways than IPW to reuse controls, for instance full likelihood methods [13,19]. With these methods, the full cohort is used, and variables known only for cases and controls are treated as missing. In some situations, these methods can be more efficient than IPW; however, they rest on more modeling assumptions and are computationally demanding. With regard to additional matching, the method of Saarela et al. [13] should be readily applicable, as they state that the likelihood is valid regardless of the sampling scheme. However, we believe that the matching variables should also be adjusted for here to account for possible confounding.

Acknowledgements

We would like to thank Ørnulf Borgan and Haakon E. Meyer for useful discussions.

References

1. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 1986; 73(1):1–11.

2. Kalbfleisch JD, Lawless JF. Likelihood analysis of multi-state models for disease incidence and mortality. Statistics in Medicine 1988; 7(1-2):149–160. DOI: 10.1002/sim.4780070116.

3. Thomas DC. Addendum to "Methods of cohort analysis: appraisal by application to asbestos mining" by Liddell FDK, McDonald JC and Thomas DC. Journal of the Royal Statistical Society, Series A 1977; 140:469–491. DOI: 10.2307/2345280.

4. Samuelsen SO. A pseudolikelihood approach to analysis of nested case-control studies. Biometrika 1997; 84(2):379–394. DOI: 10.1093/biomet/84.2.379.

5. Chen KN. Generalized case-cohort sampling. Journal of the Royal Statistical Society, Series B 2001; 63(4):791–809. DOI: 10.1111/1467-9868.00313.

6. Salim A, Hultman C, Sparén P, Reilly M. Combining data from 2 nested case-control studies of overlapping cohorts to improve efficiency. Biostatistics 2009; 10(1):70–79. DOI: 10.1093/biostatistics/kxn016.

7. Cai T, Zheng Y. Nonparametric evaluation of biomarker accuracy under nested case-control studies. Journal of the American Statistical Association 2011; 106(494):569–580. DOI: 10.1198/jasa.2011.tm09807.

8. Salim A, Yang Q, Reilly M. The value of reusing prior nested case-control data in new studies with different outcome. Statistics in Medicine 2012; 31(11-12):1291–1302. DOI: 10.1002/sim.4494.

9. Breslow NE, Day NE. Statistical Methods in Cancer Research: Volume 1 - The Analysis of Case-Control Studies. International Agency for Research on Cancer: Lyon, 1980.

10. Meyer HE, Robsahm TE, Bjørge T, Brunstad M, Blomhoff R. Vitamin D, season and the risk of prostate cancer. A nested case-control study within Norwegian health studies. American Journal of Clinical Nutrition 2013; 97(1):147–154. DOI: 10.3945/ajcn.112.039222.

11. Borgan Ø, Goldstein L, Langholz B. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Annals of Statistics 1995; 23(5):1749–1778. DOI: 10.1214/aos/1176324322.

12. Samuelsen SO, Ånestad H, Skrondal A. Stratified case-cohort analysis of general cohort sampling designs. Scandinavian Journal of Statistics 2007; 34(1):103–119. DOI: 10.1111/j.1467-9469.2006.00552.x.

13. Saarela O, Kulathinal S, Arjas E, Läärä E. Nested case-control data utilized for multiple outcomes: a likelihood approach and alternatives. Statistics in Medicine 2008; 27(28):5991–6008. DOI: 10.1002/sim.3416.

14. Støer NC, Samuelsen SO. Comparison of estimators in nested case-control studies with multiple outcomes. Lifetime Data Analysis 2012; 18(3):261–283. DOI: 10.1007/s10985-012-9214-8.

15. Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association 1989; 84(408):1074–1078. DOI: 10.2307/2290085.

16. Barlow WE. Robust variance-estimation for the case-cohort design. Biometrics 1994; 50(4):1064–1072. DOI: 10.2307/2533444.

17. Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M. Using the whole cohort in the analysis of case-cohort data. American Journal of Epidemiology 2009; 169(11):1398–1405. DOI: 10.1093/aje/kwp055.

18. Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M. Improved Horvitz-Thompson estimation of model parameters for two-phase stratified samples: applications in epidemiology. Statistics in Biosciences 2009; 1(1):32–49. DOI: 10.1007/s12561-009-9001-6.

19. Scheike TH, Juul A. Maximum likelihood estimation for Cox's regression model under nested case-control sampling. Biostatistics 2004; 5(2):193–206. DOI: 10.1093/biostatistics/5.2.193.
