
Development and validation of a prognostic model for survival time data: application to prognosis of HIV positive patients treated with antiretroviral therapy


STATISTICS IN MEDICINE
Statist. Med. 2004; 23:2375–2398 (DOI: 10.1002/sim.1825)


Margaret May¹,*,†, Patrick Royston², Matthias Egger³, Amy C. Justice⁴ and Jonathan AC Sterne¹

¹ Department of Social Medicine, University of Bristol, Bristol, U.K.
² MRC Clinical Trials Unit, London, U.K.
³ Department of Social and Preventive Medicine, University of Bern, Bern, Switzerland
⁴ Section of General Internal Medicine, West Haven VA Medical Center and Yale University School of Medicine, U.S.A.

For the ART Cohort Collaboration‡

SUMMARY

The process of developing and validating a prognostic model for survival time data has been much discussed in the literature. Assessment of the performance of candidate prognostic models on data other than that used to fit the models is essential for choosing a model that will generalize well to independent data. However, there remain difficulties in current methods of measuring the accuracy of predictions of prognostic models for censored survival time data. In this paper, flexible parametric models based on the Weibull, loglogistic and lognormal distributions with spline smoothing of the baseline log cumulative hazard function are used to fit a set of candidate prognostic models across k data sets. The model that generalizes best to new data is chosen using a cross-validation scheme which fits the model on k − 1 data sets and tests the predictive accuracy on the omitted data set. The procedure is repeated, omitting each data set in turn. The quality of the predictions is measured using three different methods: two commonly proposed validation methods, Harrell's concordance statistic and the Brier statistic, and a novel method using deviance differences. The results show that the deviance statistic is able to discriminate between quite similar models and can be used to choose a prognostic model that generalizes well to new data. The methods are illustrated by using a model developed to predict progression to a new AIDS event or death in HIV-1 positive patients starting antiretroviral therapy. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: prognostic model; validation; survival model; deviance; cross-validation; HIV

* Correspondence to: Margaret May, Department of Social Medicine, University of Bristol, Canynge Hall, Whiteladies Road, Bristol BS8 2PR, U.K.
† E-mail: [email protected]
‡ Members of study groups are listed at end of paper.

Contract/grant sponsor: U.K. Medical Research Council; contract/grant number: RD1564
Contract/grant sponsor: GlaxoSmithKline, Roche and Boehringer-Ingelheim
Contract/grant sponsor: Agence Nationale de Recherches sur le SIDA
Contract/grant sponsor: Institut National de la Santé et de la Recherche Médicale
Contract/grant sponsor: French, Italian and Swiss Ministries of Health
Contract/grant sponsor: The Dutch Stichting HIV Monitoring
Contract/grant sponsor: The European Commission
Contract/grant sponsor: The British Columbia and Alberta Governments
Contract/grant sponsor: The Michael Smith Foundation for Health Research
Contract/grant sponsor: The Canadian Institutes of Health Research

Received December 2002
Accepted January 2004

1. INTRODUCTION

In many clinical settings, prognostic information is needed to inform decisions on when and how to treat patients. Prognostic models are used to discriminate between groups of patients likely to experience better or worse outcomes and to give predictions of disease progression. An important requirement of these models is that they generalize well to populations other than those from which they were derived and so can be used in a wide range of clinical settings.

Steps in choosing a candidate prognostic model include (i) deciding which type of model and distribution to use, (ii) selecting prognostic variables and their categorization and (iii) examining possible interactions between prognostic variables. Nested models may be compared using likelihood ratio tests, while both nested and non-nested models with different numbers of parameters are often compared using penalized likelihood methods such as Akaike's Information Criterion (AIC), defined as the deviance (i.e. −2 × the log-likelihood) plus twice the number of model parameters [1].

Measures of the quality of predictions are required in order to evaluate candidate prognostic models. There is much debate about appropriate measures for models based on censored survival data [2–4]. Some proposals for quantifying the predictive performance of survival models have concentrated on discrimination, while others have considered accuracy of predictions, either at the group level or for individual patients. The quality of predictions may be measured by comparing observed with predicted outcomes using: (i) individual survival times, (ii) median survival times of prognostic groups, (iii) survival status or (iv) predicted probability of survival of individuals at a chosen time point. Graf et al. [2] reviewed the statistical methods most commonly used for the assessment and comparison of prognostic classification schemes, including comparison of Kaplan–Meier estimates of survival probabilities in groups of patients with predicted estimates from the model, Harrell's c statistic [5], which is a measure of concordance between predicted and observed outcomes, and the Brier score [6], which measures the accuracy of predicted against observed outcomes.

It is important to realize that the model that gives the best predictions on the data used to fit candidate models may not be the best model when it is subjected to external validation on independent data. When the same data set is used to choose the distribution of the model, the prognostic variables and their categorization and interactions and also to estimate the coefficients of the model there will be a degree of 'overfitting' [7]. If performance is also measured on the same data, the model will suffer from 'statistical optimism', a term coined by Efron [8] to denote the difference in performance when the model is tested on the same data that was used to fit it compared to testing it on independent data. To try to quantify this optimism and shrink regression coefficients or estimates of fit to more realistic values, Van Houwelingen and Le Cessie [9] advocated splitting the data into two parts, one used for fitting and the other used for testing the model, and Verweij and Van Houwelingen [10] used cross-validation to estimate shrinkage factors following the methods first advocated by Copas [11]. Bootstrapping could be used instead of cross-validation to provide replicate data sets for testing the model [12]. One criticism of methods based on a random split of the data, or on replicate data sets generated by cross-validation or bootstrapping, is that the data used for testing are statistically homogeneous with the data used for fitting: testing models on independently collected data would provide a more realistic assessment of the quality of predictions.

Our aim in this study was to construct prognostic models for survival time data from multiple cohorts, and to develop and use statistical methods for selecting the model that generalizes best. We used generalizability as a criterion for model selection in an internal–external cross-validation (IECV) scheme proposed by Royston et al. [13], which is designed to overcome the problem of overfitting. We illustrate the methods using data from the Antiretroviral Therapy Cohort Collaboration (ART Cohort Collaboration), which examines rates of progression to AIDS or death among HIV-1 positive patients who were drug naïve when they started highly active antiretroviral therapy.

The structure of the paper is as follows. In Section 2, we describe the ART Cohort Collaboration and the data. Section 3 describes the three measures of validation used to evaluate the quality of predictions. In Section 4, we describe the methods used to construct a set of candidate prognostic models and to choose the model that generalizes best using the IECV scheme. Section 5 describes the results of the validation process. In Section 6, we show how we checked the accuracy of the final model both in terms of discrimination and calibration. Finally, in Section 7, we discuss the results and suggest possible directions for future research.

2. THE ART COHORT COLLABORATION DATA SET

The ART Cohort Collaboration was established at the 4th International Workshop on HIV Observational Databases in Marbella, Spain, in 2000. Its aim is to provide sufficient data for fitting prognostic models which give accurate estimates of survival free of a new AIDS event or death and absolute survival probabilities for patients with different baseline characteristics. Thirteen studies agreed to collaborate, including nine cohorts from European countries, two from Canada and one from the United States of America, as well as the multicentre EuroSIDA study, which enrolled patients from 20 European countries. All participating cohorts have been approved by local ethics committees or institutional review boards, use standardized methods of data collection, and schedule follow up visits at least once every 6 months. Descriptions of the methodologies have been published elsewhere [14–26].

The ART Cohort Collaboration has combined data from 13 observational cohorts of HIV positive patients who are receiving antiretroviral therapy (ART), defined as at least three drugs, including nucleoside analogue reverse transcriptase inhibitors, protease inhibitors and non-nucleoside analogue reverse transcriptase inhibitors. Details of the individual cohorts are in the appendix. There were 12 574 patients eligible for inclusion in the study. All patients were drug naïve at the start of taking ART and were aged 16 or over. Age, sex, disease status as defined by Centers for Disease Control (CDC) clinical stage, risk group and drug regimen were recorded at start of therapy. Measurements of CD4 cell count and HIV-1 RNA closest to the starting date of ART were taken as the baseline levels. Patients were only included in the study if these measurements were taken within the 3 months before the start of ART or up to 1 week after the start date. Patients were followed up for new clinical AIDS events as defined by CDC stage C and for mortality.

We considered the probability of progression to a combined endpoint of a new AIDS-defining disease or death from any cause. Elsewhere, we have presented models with mortality as an outcome [27], but because of space considerations we will present results only for the combined outcome here. We used the clinical part of the 1993 Centers for Disease Control and Prevention revision of the AIDS case definition (i.e. people without an AIDS-defining disease but with a CD4 cell count below 200 cells/μl were not classified as having AIDS) [28]. We did not consider AIDS events as a separate endpoint because information on causes of death did not allow us to distinguish between deaths due to AIDS-defining conditions and deaths from other causes. In all analyses we used an 'intent-to-continue-treatment' approach and thus ignored subsequent changes to treatment, including treatment interruptions and terminations. We measured time from the initiation of ART to the date the endpoint occurred, the date of the most recent follow up visit, or the date the patient was last known to be alive.

There were 870 new AIDS events and 344 deaths. For the combined outcome of new AIDS event or death, there were a total of 1094 events. The median follow up time of individual patients was 1.75 years and the maximum follow up time was 5.5 years. Table I shows for each of the 13 cohorts the number of patients, the number of first AIDS events subsequent to the start of therapy, the number of deaths and the median and maximum follow up in years.

Table II shows the median values of the nine candidate prognostic variables. For each variable it gives the median value in the pooled data together with the range of the medians in the 13 individual cohorts. The median age across the cohorts was similar, but the percentage of injection drug users (IDU) varied from 2 per cent to 39 per cent. The cohorts also varied substantially in the severity of the infection at the time of starting ART as measured by the percentage with AIDS prior to treatment (CDC stage C), amount of immunosuppression (CD4 cell count) and viral load (copies of HIV-1 RNA). Treatment regimens also differed, with some cohorts starting all patients on three drugs and all on a protease inhibitor, whilst other cohorts started up to 25 per cent of patients on more than three antiretroviral drugs and up to 41 per cent not on a protease inhibitor.

Table I. Numbers of patients, first AIDS events subsequent to start of therapy, deaths and follow-up in years in 13 HIV cohorts.

| Cohort | Description of study | No. of patients included in analysis | No. of events | No. of deaths | Median follow up (years) | Longest follow up (years) |
|---|---|---|---|---|---|---|
| French Hospital Database on HIV (FHDH) | Grabar et al. [14] | 4739 | 362 | 43 | 1.6 | 3.1 |
| Italian Cohort of Antiretroviral-Naive Patients (ICONA) | D'Arminio Monforte et al. [15] | 1641 | 80 | 60 | 1.9 | 3.5 |
| Swiss HIV Cohort Study (SHCS) | Egger et al. [16] | 1268 | 75 | 51 | 1.9 | 5.1 |
| AIDS Therapy Evaluation project Netherlands (ATHENA) | Nieuwkerk et al. [17] | 919 | 68 | 31 | 2.8 | 4.7 |
| EuroSIDA, 20 centres in Europe | Lundgren et al. [18] | 840 | 57 | 29 | 2.6 | 4.3 |
| Collaborations in HIV Outcomes Research US (CHORUS) | Becker et al. [19] | 601 | 44 | 8 | 2.4 | 4.1 |
| Frankfurt HIV Cohort Germany | Brodt et al. [20] | 553 | 48 | 21 | 2.7 | 4.4 |
| Antiprotease Cohort (APROCO) France | Le Moing et al. [21] | 538 | 47 | 34 | 2.3 | 3.8 |
| Aquitaine Cohort France | Binquet et al. [22] | 486 | 31 | 37 | 1.6 | 4.1 |
| British Columbia Centre for Excellence in HIV/AIDS Canada | Hogg et al. [23] | 422 | 9 | 21 | 2.2 | 3.5 |
| Royal Free Hospital Cohort UK | Mocroft et al. [24] | 321 | 29 | 4 | 1.7 | 3.6 |
| South Alberta Clinic Canada | Mocroft et al. [25] | 154 | 11 | 5 | 1.9 | 5.5 |
| Köln/Bonn Cohort Germany | Fätkenheuer et al. [26] | 92 | 9 | 0 | 1.3 | 5.0 |
| All cohorts | | 12574 | 870 | 344 | 1.75 | 5.5 |

Table II. Candidate prognostic variables in the HIV cohorts showing median value (or per cent) for pooled cohorts and range of medians (or per cent) across 13 cohorts.

| Prognostic factor | Median | Min. of cohort medians | Max. of cohort medians |
|---|---|---|---|
| Age (years) | 38 | 37 | 41 |
| CD4 cell count* (cells × 10^6/l) | 250 | 140 | 310 |
| HIV-1 RNA* (log10 copies/ml) | 4.9 | 4.5 | 5.4 |
| Year of starting therapy | 1997 | 1997 | 1999 |

| Prognostic factor | Median per cent | Min. per cent | Max. per cent |
|---|---|---|---|
| Sex (male) | 79 | 71 | 92 |
| Injection drug use | 19 | 2 | 39 |
| CDC Stage C | 21 | 17 | 42 |
| Antiretroviral therapy >3 drugs | 5 | 0 | 25 |
| Regimen including a protease inhibitor | 87 | 59 | 100 |

3. MEASURES FOR VALIDATION OF PROGNOSTIC MODELS

In this section, we review three methods used to measure the performance of the prognostic models and to choose the best model from among a set of candidate models.

3.1. Measuring discrimination of a prognostic model using concordance of observed and predicted outcomes: Harrell's c statistic

Receiver operating characteristic (ROC) curves, graphs of sensitivity (true-positive rate) versus one minus specificity (false-positive rate), were developed to quantify the accuracy of diagnostic tests. The global performance of a test may be characterized by the area under the ROC curve (AUROC), which can be interpreted as the probability that the result of a randomly selected patient in the positive state will be greater than the result of a randomly selected patient in the negative state, i.e. the probability that the two results are ranked in the correct order [29].

In survival models, discrimination is the ability of the model to allocate to patients who experience the outcome of interest a higher predicted probability of experiencing the event than that allocated to those who did not experience the event. To apply ROC methodology to survival data it is necessary to choose a particular time point $t^*$ at which the survival status can be ascertained. Popular choices for $t^*$ include a fixed number of years, the median follow up time or the median survival time. Graf et al. [2] pointed out that when the prediction is made at $t = 0$ the status at $t^*$ is not yet determined, but will evolve in the period from 0 to $t^*$ according to some stochastic mechanism. For many patients the status at $t^*$ will never be known because they are censored before $t^*$. Another criticism of this approach is that AUROC measures the discrimination of the model at an arbitrary time point rather than over all the relevant time period.

These issues are addressed by an adaptation of ROC methodology to survival data proposed by Harrell [5]. This method does not depend on choosing a fixed time for evaluation of the model and takes into account censoring of individuals. Harrell's c index is defined as the proportion of all usable subject pairs in which the predictions and outcomes are concordant. In predicting the time to an event, c is calculated using all possible pairs of subjects, at least one of whom has suffered the event. If the predicted survival time is larger for the subject who actually survived longer, the predictions for that pair are concordant with the outcomes. If one subject suffered the event and the other is known to have survived at least to the survival time of the first, the second subject is assumed to outlive the first. When predicted survival times are identical for a pair then 0.5 rather than 1 is added to the count of concordant pairs in the numerator of c, but one is added to the denominator, which counts usable pairs. The formula for the concordance statistic is

$$c = (E + T/2)/N$$

where $E$ is the number of orderings as expected, $T$ is the number of tied predictions and $N$ is the number of comparisons.

A value for c of 0.5 corresponds to the agreement expected by chance and 1 corresponds to perfect concordance.
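The pair-counting rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `harrell_c` and the four-patient data set are hypothetical, and pairs with equal observed times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c(times, events, predicted):
    """Return (c, E, T, N) following c = (E + T/2) / N.

    times     -- observed follow-up times
    events    -- 1 if the event was observed, 0 if censored
    predicted -- predicted survival times (larger = better prognosis)
    """
    concordant = tied = usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: equal observed times are skipped
        # a has the shorter observed time, b the longer one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        usable += 1
        if predicted[a] == predicted[b]:
            tied += 1  # tied predictions contribute 0.5 to the numerator
        elif predicted[a] < predicted[b]:
            concordant += 1
    if usable == 0:
        raise ValueError("no usable pairs")
    return (concordant + tied / 2) / usable, concordant, tied, usable


# Hypothetical data: the third subject is censored at 6 years.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
predicted = [1.0, 5.0, 5.0, 9.0]

c, E, T, N = harrell_c(times, events, predicted)
print(c, E, T, N)  # 0.9 4 1 5
```

In this toy example the pair formed by the censored subject and the last subject is unusable, because the censored time is the shorter of the two; the remaining five pairs give four concordant orderings and one tie.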

3.2. Measuring accuracy of predictions for individual patients: Brier score

The usual approach to evaluating the accuracy of an ordinary least squares regression model is to look at the residual mean square error, which is a measure of the discrepancy between the predicted values and the observed values. The Brier score is a commonly used method based on the same approach which was originally developed to judge the inaccuracy of weather forecasts [6] and can be used when the outcome is binary. It measures the average squared difference between the outcome event and the forecast probability of the outcome event. For survival data, a straightforward approach would be to calculate the mean square error of prediction for the median survival time. However, the median survival time may be outside the range of the observed survival times of individuals due to insufficient follow up. It is possible to calculate the mean square error of prediction for survival at an arbitrary time point $t^*$, although this approach has the same disadvantages as discussed earlier in the context of AUROC methods.

The Brier score for a survival time $T$ which depends on covariates $X$ is defined as the mean squared difference between the observed status at a particular time $t^*$ and the expected probability of surviving beyond that time according to the predictions from the model

$$BS(t^*) = \frac{1}{n} \sum_{i=1}^{n} \left( I(T_i > t^*) - \hat{\pi}(t^* \mid X_i) \right)^2$$

where $I(T > t^*) \in \{0, 1\}$ is the observed status at $t^*$ (1 if the patient is still event free, 0 otherwise) and $\hat{\pi}(t^* \mid X)$ is the estimated event-free probability.
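As a worked illustration of the formula, the sketch below computes the Brier score at a single time point. It assumes that every patient's status at the chosen time is known (no censoring before that time), which the text does not guarantee in practice; the function name and the data are hypothetical.

```python
def brier_score(times, predicted_survival, t_star):
    """Mean squared difference between I(T_i > t*) and the predicted
    event-free probability at t*.

    Assumes no subject is censored before t_star, so the observed status
    at t_star is known for everyone (a simplification).
    """
    total = 0.0
    for t, p in zip(times, predicted_survival):
        observed = 1.0 if t > t_star else 0.0  # I(T_i > t*)
        total += (observed - p) ** 2
    return total / len(times)


# Hypothetical event times (years) and predicted 1-year event-free probabilities.
times = [0.5, 2.0, 3.0, 0.8]
predicted = [0.9, 0.9, 0.8, 0.6]

score = brier_score(times, predicted, t_star=1.0)
print(round(score, 3))  # about 0.305

# Predicting 0.5 for everyone gives 0.25 whatever the outcomes are.
uninformative = brier_score(times, [0.5] * len(times), t_star=1.0)
print(uninformative)  # 0.25
```

The second call shows why 0.25 serves as the "no better than chance" benchmark: a constant prediction of 0.5 is always exactly 0.5 away from the observed 0 or 1.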

3.3. Measuring loss of fit using deviance differences

The deviance of a model, $D$, defined as $-2 \times$ log-likelihood, can be thought of as a measure of lack of fit of the data to the model. The performance of a survival model on external data may be measured by comparing the fit using the parameters estimated from the original data set ('fixed' parameters) with the fit using parameters re-estimated from the external data. We denote the deviance using the fixed parameters as $D_F$ and the deviance using re-estimated parameters as $D_R$. If the two deviances are similar then the model generalizes well. The difference in deviances, $D_R - D_F$, has an approximately chi-squared distribution with degrees of freedom equal to the number of parameters $p$ in the model, since this is the difference in the number of degrees of freedom between the model fitted with re-estimation and the model fitted without re-estimation of parameters. Because re-estimation can only improve the fit to the external data, $D_R \le D_F$, and it is the magnitude of the difference that is compared with the reference distribution. This methodology was discussed by Spiegelhalter et al. [30], who used deviance statistics to choose the best model in the context of a Bayesian framework. The result that the deviance difference is distributed as $\chi^2$ with degrees of freedom equal to the number of re-estimated parameters depends on the independence of the parameters. Although it is likely that this assumption is violated in this application, a test based on the assumption of independence will tend to be conservative, since the effective number of parameters must be less than or equal to the actual number of parameters, and the sum of the degrees of freedom with the corresponding $\chi^2$ reference value still provides a useful benchmark for comparing models.
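The comparison can be sketched numerically. The example below swaps in a one-parameter constant-hazard (exponential) model purely for illustration; it is not one of the models used in the paper. The data, the `fixed_rate` value and the function names are hypothetical, and the 95th percentile of the chi-squared distribution is obtained from the Wilson-Hilferty approximation rather than an exact routine. The loss of fit is computed as $D_F - D_R$, the non-negative magnitude of the difference.

```python
import math


def exp_deviance(rate, times, events):
    """-2 x log-likelihood of a constant-hazard model with right censoring."""
    loglik = sum(d * math.log(rate) - rate * t for t, d in zip(times, events))
    return -2.0 * loglik


def chi2_95(dof):
    """Approximate 95th percentile of chi-squared (Wilson-Hilferty)."""
    z = 1.6448536269514722  # 95th percentile of the standard normal
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


# Hypothetical external cohort: follow-up in years and event indicators.
test_times = [1.0, 2.0, 0.5, 3.0, 1.5]
test_events = [1, 0, 1, 0, 1]

fixed_rate = 0.25                                # hypothetical, from development data
reest_rate = sum(test_events) / sum(test_times)  # maximum likelihood on external data

d_fixed = exp_deviance(fixed_rate, test_times, test_events)
d_reest = exp_deviance(reest_rate, test_times, test_events)
loss_of_fit = d_fixed - d_reest                  # never negative
print(round(loss_of_fit, 3))                     # about 0.433

# Benchmark with p = 15 parameters re-estimated in each of 11 data sets.
reference = chi2_95(11 * 15)
print(round(reference, 1))                       # about 196.0
```

A summed deviance difference well below the reference value suggests that re-estimating the parameters in the external data gains little, that is, the fixed-parameter model generalizes.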

4. METHODS

4.1. Choice of candidate prognostic models

We fitted fully parametric survival models based on the Weibull, loglogistic and lognormal distributions. The Weibull model assumes proportional hazards and the loglogistic model proportional odds for the covariate effects. Because the hazard function for the Weibull survival distribution is monotonic and the hazard functions for the loglogistic and lognormal distributions are unimodal, we additionally considered flexible parametric models, based on the same three distributions, which include spline smoothing of the baseline log cumulative hazard function [31]. This allows for the possibility that the hazard might decrease and then subsequently increase, for example if patients were adversely affected by toxicity or drug resistance. We found that two internal knots were sufficient to model the curvature of the splines.

A priori, we decided to categorize the three continuous variables according to groupings used in standard clinical practice: CD4 cell count (<50, 50–99, 100–199, 200–349, 350 and over), log10 HIV-1 RNA (<3, 3–3.99, 4–4.99, 5 and over) and age (16–29, 30–39, 40–49, 50 and over). Initial data exploration was carried out using Kaplan–Meier survival plots to see which variables were prognostic and whether there were any threshold effects. Next, a backwards stepwise variable selection was used to choose prognostic variables and their categorization using a simple Weibull model with stratification by cohort to allow for the possibility of between-cohort heterogeneity. In the first stage, a variable was omitted if the Wald p-value was >0.2. A second stage in the stepwise selection procedure used indicator variables for each of the levels of the categorical variables instead of the combined categorical variables, allowing independent selection of different groupings within the categorical variables. We also tested for non-proportionality of hazards using Schoenfeld residuals and examined whether models allowing time varying effects of the baseline hazards fitted better. Candidate interactions and stratification were chosen for consideration in different model types if the Wald p-value for the interaction or stratification term in the Weibull model without splines was <0.05.

Altogether we considered six model types: the Weibull, loglogistic, lognormal and their spline generalizations. For each type of model we chose the two models with the lowest AIC, to give a set of 12 candidate prognostic models.
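The selection rule "two lowest AIC per model type" can be sketched as follows, taking AIC as the deviance plus twice the number of parameters. The deviances, parameter counts and labels below are invented for illustration and are not results from the paper.

```python
def aic(deviance, n_params):
    """Akaike's Information Criterion: deviance + 2 x number of parameters."""
    return deviance + 2 * n_params


def two_lowest_aic(fits):
    """Group fits by model type and keep the two labels with the lowest AIC."""
    by_type = {}
    for model_type, label, deviance, n_params in fits:
        by_type.setdefault(model_type, []).append((aic(deviance, n_params), label))
    return {
        model_type: [label for _, label in sorted(scored)[:2]]
        for model_type, scored in by_type.items()
    }


# Hypothetical fits: (model type, label, deviance, number of parameters).
fits = [
    ("weibull", "A", 9570.0, 15),
    ("weibull", "B", 9565.0, 19),
    ("weibull", "C", 9590.0, 12),
    ("loglogistic", "A", 9556.0, 15),
    ("loglogistic", "B", 9560.0, 18),
    ("loglogistic", "C", 9554.0, 19),
]

selected = two_lowest_aic(fits)
print(selected)  # {'weibull': ['A', 'B'], 'loglogistic': ['A', 'C']}
```

Note that loglogistic model C has the lowest deviance of its type but only the second lowest AIC, because its four extra parameters are penalized.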

4.2. Using internal–external cross-validation to choose prognostic model

The final prognostic model was chosen from the candidate models using an internal–external cross-validation (IECV) system proposed by Royston et al. [13]. Candidate prognostic models were fitted using leave-one-out cross-validation on a cohort basis. In this scheme, if there are k cohorts, then the model is fitted on k − 1 pooled cohorts (with no stratification on cohort) and its performance is measured on the left-out cohort. This is repeated k times, rotating the left-out cohort. The IECV procedure was applied to each of the 12 candidate prognostic models, together with all simpler models nested within them. We tested generalizability of the models on the omitted cohort using the three methods described in Section 3.
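The rotation over cohorts can be sketched as a simple loop. To keep the example self-contained, a constant-hazard model stands in for the survival models of Section 4.1, and the three small cohorts are invented; the function names are hypothetical. Each cohort must contain at least one event for the re-estimated rate to be usable.

```python
import math


def fit_rate(cohorts):
    """Pool cohorts and fit a constant hazard: events / total follow-up."""
    events = sum(d for cohort in cohorts for _, d in cohort)
    follow_up = sum(t for cohort in cohorts for t, _ in cohort)
    return events / follow_up


def deviance(rate, cohort):
    """-2 x log-likelihood of the constant-hazard model on one cohort."""
    return -2.0 * sum(d * math.log(rate) - rate * t for t, d in cohort)


def iecv(cohorts):
    """Leave one cohort out, fit on the rest, measure loss of fit on the omitted one."""
    loss = {}
    for name, held_out in cohorts.items():
        training = [data for other, data in cohorts.items() if other != name]
        fixed = fit_rate(training)    # parameters from the k - 1 pooled cohorts
        reest = fit_rate([held_out])  # parameters re-estimated on the omitted cohort
        loss[name] = deviance(fixed, held_out) - deviance(reest, held_out)
    return loss


# Hypothetical cohorts: lists of (follow-up in years, event indicator).
cohorts = {
    "A": [(1.0, 1), (2.0, 0), (3.0, 1)],
    "B": [(0.5, 1), (1.5, 1), (4.0, 0)],
    "C": [(2.0, 0), (2.5, 1), (1.0, 0)],
}

per_cohort = iecv(cohorts)
summed = sum(per_cohort.values())
print(per_cohort)
print(summed)
```

Summing the per-cohort values gives a single figure per candidate model, which can then be set against a chi-squared reference with degrees of freedom equal to the number of data sets times the number of re-estimated parameters.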


Table III. Hazard ratio for progression to AIDS or death from Weibull proportional hazards models.

| Covariate | Hazard ratio for AIDS or death (95 per cent CI) |
|---|---|
| Baseline CD4 count (cells/μl) | |
| <50 | 1 |
| 50–99 | 0.74 (0.62 to 0.88) |
| 100–199 | 0.52 (0.43 to 0.62) |
| 200–349 | 0.24 (0.20 to 0.30) |
| ≥350 | 0.18 (0.14 to 0.22) |
| Plasma viral load (log copies/ml) | |
| ≥5 | 1 |
| <5 | 0.75 (0.66 to 0.86) |
| Clinical stage | |
| CDC stage A/B | 1 |
| CDC stage C | 1.38 (1.20 to 1.59) |
| Risk factor for transmission | |
| Not injection drug use | 1 |
| Injection drug use | 1.53 (1.31 to 1.78) |
| Age (years) | |
| 16–49 | 1 |
| ≥50 | 1.40 (1.20 to 1.63) |

Adjusted for all variables listed in the table and stratified by cohort.

5. RESULTS

5.1. Development of candidate prognostic models

The stepwise variable selection showed that of the nine candidate variables in Table II, only five variables were prognostic: CD4, HIV-1 RNA, age, stage and risk group (IDU versus non-IDU). Kaplan–Meier survival plots for these have been published elsewhere [27]. There was evidence of threshold effects for both HIV-1 RNA and age, which were therefore modelled as dichotomous variables: HIV-1 RNA at <5 or ≥5 log10 copies/ml and age <50 or ≥50 years. Table III shows the hazard ratios, from Weibull models, for the prognostic factors included in the model with stratification on cohort. The variable selection was invariant to change of model and robust to changes in the probability level used to select variables. Therefore all candidate prognostic models included the same variables.

Because the hazards were not proportional for CD4 groups, disease stage and risk group, we considered models with stratified baseline hazard functions. For some models we also found evidence of an interaction between CD4 and disease stage. Therefore, for each survival distribution, we compared 32 different models: with and without interaction between CD4 and disease stage, with and without stratification on each of CD4, stage and risk group, and with and without spline terms. For each survival distribution we chose four candidate models, the two with the lowest AIC with spline terms for the hazard function and the two with the lowest AIC without them, which gave the 12 candidate prognostic models shown in Table IV.
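The size of the model space follows from treating each modelling choice as an independent on/off switch. The short enumeration below is only a counting aid; the option labels are paraphrased from the text and the variable names are hypothetical.

```python
from itertools import product

distributions = ["Weibull", "loglogistic", "lognormal"]

# Five binary choices per survival distribution (labels paraphrased from the text).
options = [
    "interaction CD4 x stage",
    "stratify on CD4",
    "stratify on stage",
    "stratify on risk group",
    "spline terms",
]

# Every on/off combination of the five choices.
specs = [dict(zip(options, flags))
         for flags in product([False, True], repeat=len(options))]

with_splines = [s for s in specs if s["spline terms"]]
without_splines = [s for s in specs if not s["spline terms"]]

# Two lowest-AIC models with splines and two without, per distribution.
n_candidates = len(distributions) * (2 + 2)

print(len(specs), len(with_splines), len(without_splines), n_candidates)
# 32 16 16 12
```

Five binary choices give 2^5 = 32 specifications per distribution, split evenly between models with and without spline terms.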


Table IV. Description of the 12 candidate prognostic models and the final prognostic model (last line of table) showing AIC, Harrell’s c statistic, Brier score and summed deviance differences.

Type         Scale   Interaction  Stratification    AIC   Harrell c  Brier 1 year  p†  dof‡ = 11p  5 per cent χ²§  D_R − D_F¶
Weibull      PH      CD4*stage    CD4, risk         9600  0.723      0.0553        19  209         243             84*
Weibull      PH      CD4*stage    CD4               9605  0.724      0.0553        18  198         231             308
Loglogistic  PO      CD4*stage    CD4, risk         9586  0.724      0.0553        19  209         243             114*
Loglogistic  PO      CD4*stage    CD4               9592  0.725      0.0553        18  198         231             309
Lognormal    Normal  CD4*stage    Risk, stage       9541  0.724      0.0553        16  176         207             409
Lognormal    Normal  CD4*stage    Risk              9544  0.725      0.0553        15  165         195             282
Spline       PH      CD4*stage    CD4, risk         9490  0.691      0.0553        31  341         385             1161
Spline       PH      no           CD4, risk         9496  0.688      0.0553        27  297         338             1122
Spline       PO      CD4*stage    CD4, risk         9489  0.691      0.0553        31  341         385             1164
Spline       PO      no           CD4, risk         9495  0.688      0.0553        27  297         338             1128
Spline       Normal  CD4*stage    CD4, risk, stage  9545  0.692      0.0553        34  374         420             1420
Spline       Normal  CD4*stage    Risk, stage       9541  0.698      0.0553        22  242         279             587
Weibull      PH      no           CD4, risk         9605  0.725      0.0553        15  165         195             33*

* Denotes deviance difference less than reference χ², implying models generalize well to all cohorts.
† p = number of parameters re-estimated.
‡ dof = degrees of freedom.
§ The 95th percentage point of the reference χ² value.
¶ D_R − D_F = deviance from model with parameters re-estimated in test data set − deviance from model with fixed parameters.
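The reference values in Table IV can be checked numerically. The sketch below is only an illustration, not the authors' code: it approximates the 95th percentage point of χ² on 11p degrees of freedom with the Wilson–Hilferty formula, an approximation chosen here so that nothing beyond the Python standard library is needed. The function name is hypothetical.

```python
from statistics import NormalDist


def chi2_upper_point(dof: int, prob: float = 0.95) -> float:
    """Approximate chi-squared quantile using the Wilson-Hilferty formula."""
    z = NormalDist().inv_cdf(prob)      # standard normal quantile
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


n_datasets = 11                         # data sets used in the cross-validation
for p in (15, 19, 31):                  # parameter counts taken from Table IV
    dof = n_datasets * p
    print(p, dof, round(chi2_upper_point(dof), 1))
```

For p = 15, 19 and 31 the approximation gives roughly 196, 244 and 385, within one unit of the tabulated 195, 243 and 385.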


Figure 1. Brier scores showing increase in inaccuracy with time (0 is perfect prediction and 0.25 is prediction no better than chance). Axes: Brier score (0 to 0.1) against years from start of HAART (0 to 3).

5.2. Validation of models using internal–external cross-validation (IECV) method

We found little evidence of between-cohort heterogeneity in either covariate effects or baseline hazard and therefore could justify fitting a model across all cohorts using the IECV system. In this part of the modelling procedure, there was no explicit modelling of cohort effects. Because the smallest three cohorts did not have enough events to allow estimation of the models we amalgamated them, so that 11 data sets were used in the cross-validation procedure. We evaluated the quality of predictions of the models using (i) Harrell’s concordance statistic c, (ii) the Brier score and (iii) the deviance measure within the IECV scheme.

5.2.1. Harrell’s c statistic. A weighted average of the 11 values of c from the cross-validation was calculated using the number of comparisons, N, as the weights (see Section 3.1). The higher the weighted average value of c, the better are the predictions on external data. Although the value of c varied from 0.62 to 0.84 across the individual cohorts, the variation across the models for the aggregated c was much less. The values of c varied from 0.723 to 0.725 for the models without spline terms and from 0.688 to 0.692 for the models with spline terms (see Table IV), indicating that the rank order of predictions agreed more closely with observed outcomes in the simpler models. However, there was insufficient discrimination to choose between the six models without spline terms using this statistic.
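The aggregation step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical and the per-cohort values and comparison counts are invented for the example, not taken from the study.

```python
def weighted_concordance(c_values, n_comparisons):
    """Average per-data-set c statistics, weighting by the number of comparisons N."""
    if len(c_values) != len(n_comparisons):
        raise ValueError("need one weight per c value")
    total = sum(n_comparisons)
    return sum(c * n for c, n in zip(c_values, n_comparisons)) / total


# Hypothetical values for three omitted data sets (not from the paper).
c_by_cohort = [0.62, 0.74, 0.84]
comparisons = [1000, 5000, 500]
print(weighted_concordance(c_by_cohort, comparisons))
```

Because the weights are the numbers of comparisons, a large cohort dominates the aggregate, which is one reason the aggregated c can vary far less between models than c does between cohorts.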

5.2.2. Brier score. The Brier score is 0 for perfect prediction of survival status and 0.25 if all predictions equal 0.5 (prediction no better than chance). We evaluated the Brier score at six-monthly intervals up to 2.5 years. Although the Brier score varied from 0.043 at 6 months to 0.09 at 2.5 years, the aggregated Brier scores for different parametric models were identical to two significant figures at all the time points. The Brier scores therefore illustrated the degradation in forecast accuracy with time, but could not be used to discriminate between models. Figure 1 shows the aggregated Brier score graphed against time.
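The two benchmark values quoted for the Brier score can be illustrated with a short sketch. It assumes survival status at the chosen time is known for every patient, so it ignores the handling of censoring; the function name and the toy inputs are hypothetical.

```python
def brier_score(predicted_survival, survived):
    """Mean squared difference between predicted survival probability and 0/1 status."""
    pairs = list(zip(predicted_survival, survived))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


# Uninformative predictions of 0.5 score 0.25 whatever the outcomes are.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))   # 0.25
# Perfect predictions score 0.
print(brier_score([1.0, 0.0], [1, 0]))                    # 0.0
```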

5.2.3. Deviance differences. Deviance differences were a much more sensitive measure of predictive performance in that they showed substantial differences between cohorts and between models. The summary statistics for the 12 candidate models are shown in Table IV, together with the number of parameters estimated in the model and the 95th percentage point of the reference χ² value, for comparison. The degrees of freedom used for the reference χ² value is the number of data sets multiplied by the number of parameters, p, in the model, that is 11p. The first and third models in the table had deviance differences less than the reference value, suggesting that they generalized well to new data. The models with the spline terms all had high deviance differences.

As the deviance difference gave better discrimination between the candidate models than either the concordance statistic or the Brier score, we chose the final model on the basis of this statistic. The best of the 12 candidate models was a Weibull proportional hazards model with interaction terms for CD4 and stage, and stratification on CD4 and transmission risk. However, when we examined the nested sub-models of the 12 candidate models we found that a Weibull model with no interaction terms, but stratifying on CD4 and risk, had a lower aggregated deviance difference (d = 33). This model is shown in the last line of Table IV, and was chosen as the final prognostic model. We re-estimated the coefficients of the final model using the pooled data with no modelling of cohort effect: the estimated coefficients and standard errors are shown in Table V.

Table V. Coefficients of the final prognostic model for the outcome AIDS or death.

                                      AIDS or death
Prognostic factor            Code     Coeff.     SE

Baseline hazard γ
IDU (no vs yes)              X1       −0.115     0.045
CD4 <50 (base group)
  50–99                      X2        0.061     0.046
  100–199                    X3        0.144     0.049
  200–349                    X4        0.299     0.069
  ≥350                       X5        0.239     0.045
Constant                     X0        0.606     0.045

Linear predictor β
Age <50 (vs ≥50)             X6       −0.324     0.078
IDU (no vs yes)              X1       −0.319     0.080
Stage A/B (vs C)             X7       −0.349     0.071
CD4 <50 (base group)
  50–99                      X2       −0.320     0.093
  100–199                    X3       −0.724     0.096
  200–349                    X4       −1.609     0.119
  ≥350                       X5       −1.878     0.126
Log HIV-1 RNA <5             X8       −0.280     0.068
Constant                     X0       −1.031     0.112

The Weibull survival function is $S(t) = \exp(-\lambda t^{p})$, where $S(t)$ is the probability of survival at time $t$; $\lambda = \exp\left(\sum_{i=0}^{8} X_i \beta_i\right)$ is the exponentiated linear prediction; and $p = \sum_{i=0}^{5} X_i \gamma_i$ is the shape parameter for the Weibull distribution.
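The logic of the summed deviance difference can be sketched with a deliberately simple stand-in model. The example below is not the authors' implementation: it replaces the parametric survival models with a one-parameter constant-hazard (exponential) model so that the maximum likelihood estimate has a closed form, and the cohort data are invented. The sign is chosen here so that larger values mean a greater loss of fit on the omitted data set; that is an assumption about the convention behind the positive entries in Table IV.

```python
import math


def exp_deviance(rate, events, person_years):
    """-2 x log-likelihood of a constant-hazard (exponential) model."""
    return -2.0 * (events * math.log(rate) - rate * person_years)


def iecv_deviance_difference(cohorts):
    """Sum over omitted cohorts of (deviance with fixed rate - deviance with re-estimated rate)."""
    total = 0.0
    for k, (events, person_years) in enumerate(cohorts):
        train = [c for j, c in enumerate(cohorts) if j != k]
        # Rate estimated on the other cohorts and held fixed on the omitted one.
        fixed_rate = sum(d for d, _ in train) / sum(t for _, t in train)
        # Rate re-estimated on the omitted cohort itself.
        refit_rate = events / person_years
        total += (exp_deviance(fixed_rate, events, person_years)
                  - exp_deviance(refit_rate, events, person_years))
    return total


# Hypothetical (events, person-years) pairs; not data from the paper.
similar = [(50, 1000.0), (100, 2000.0), (25, 500.0)]
mixed = [(50, 1000.0), (200, 2000.0), (10, 500.0)]
print(iecv_deviance_difference(similar))   # identical rates: no loss of fit
print(iecv_deviance_difference(mixed))     # heterogeneous rates: positive loss
```

With one re-estimated parameter and three data sets, the summed difference in this toy setting would be compared with a χ² reference on 3 degrees of freedom, mirroring the 11p rule used in the text.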


Figure 2. Comparison of Kaplan–Meier (solid line) and predicted (dashed line) survival functions for 5 prognostic groups defined by prognostic index for all data. Panel: all cohorts. Axes: probability of AIDS free survival (0.7 to 1) against years from start of HAART (0 to 3); curves labelled Group 1 to Group 5.

Tables of survival probabilities have been published [27], and a risk calculator based on the prognostic probabilities is available on the ART Cohort Collaboration website (www.art-cohort-collaboration.org).
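As an illustration of how such predicted probabilities follow from the Weibull survival function and the coefficients in Table V, the sketch below evaluates the model for the two extreme risk profiles. It is not the collaboration's calculator. It assumes that time is measured in years and that each indicator equals 1 for the category named first in Table V (for example "not IDU" or "age <50"); the dictionary keys and function name are hypothetical.

```python
import math

# Coefficients copied from Table V. The indicator coding (1 = category named
# first, e.g. "not IDU", "age <50") is an assumption of this sketch.
SHAPE = {"const": 0.606, "not_idu": -0.115,
         "cd4_50_99": 0.061, "cd4_100_199": 0.144,
         "cd4_200_349": 0.299, "cd4_350_plus": 0.239}
LINEAR = {"const": -1.031, "age_under_50": -0.324, "not_idu": -0.319,
          "stage_ab": -0.349, "cd4_50_99": -0.320, "cd4_100_199": -0.724,
          "cd4_200_349": -1.609, "cd4_350_plus": -1.878, "rna_under_5": -0.280}


def survival(t_years, covariates):
    """Weibull survival exp(-rate * t**shape) for a set of indicators equal to 1."""
    active = set(covariates) | {"const"}
    rate = math.exp(sum(v for k, v in LINEAR.items() if k in active))
    shape = sum(v for k, v in SHAPE.items() if k in active)
    return math.exp(-rate * t_years ** shape)


best = {"age_under_50", "not_idu", "stage_ab", "cd4_350_plus", "rna_under_5"}
worst = set()   # age >=50, IDU, stage C, CD4 <50, log HIV-1 RNA >=5
print(round(1 - survival(3, best), 3), round(1 - survival(3, worst), 3))
```

Under this coding the 3-year risks come out at about 3.4 per cent and 50 per cent, in line with the lowest and highest probabilities quoted in Section 6.1, which supports the assumed coding but does not confirm it.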

6. CHECKING THE ACCURACY OF THE FINAL PROGNOSTIC MODEL

The validation process we have used to select the final prognostic model does not necessarily ensure that it is clinically useful, in that it can divide patients into prognostic groups which have well separated survival curves (‘discrimination’) that agree closely with the empirical Kaplan–Meier curves (‘calibration’) [32].

6.1. Checking discrimination

Discrimination may be assessed by examining the separation, in terms of difference in predicted probability of the outcome, between the best and the worst prognostic groups. Royston and Sauerbrei [33] discuss measures of separation. Usually, the separation measure will be greater if the partition of the patients is finer. That is, most measures of separation depend on the number of prognostic groups and so are not absolute measures of performance. An exception to this is Royston and Sauerbrei’s D statistic which is almost independent of the number of groups, but is insensitive to miscalibration. Our approach was to examine the graphs of the overall predicted survival function in five groups defined by quintiles of risk score derived from the linear predictor of the prognostic model, and compare these curves with the corresponding Kaplan–Meier estimates of survival.

For each patient in the data set, the model was used to predict survival probabilities for all time points up to the date of the occurrence of AIDS or death or censoring. Within each prognostic group, the survival probabilities were averaged over all patients in the risk set at each time point to give a survival curve representative of the group. The average predicted survival curve from the model for the five groups is shown with the corresponding Kaplan–Meier survival curves in Figure 2. Using this method allows comparison of observed and predicted survival in groups of patients which are not defined by particular covariates, but only by overall risk.

Figure 3. Comparison of Kaplan–Meier estimate of probability of survival free of new AIDS event or death (solid line) with estimate from the prognostic model (dashed line) for CD4 groups (all data pooled). Panel: all cohorts. Axes: probability of AIDS free survival (0.75 to 1) against years from start of HAART (0 to 3); curves labelled CD4 <50, CD4 50–99, CD4 100–199, CD4 200–349 and CD4 >350.

The 3 year survival probability for the worst group is 0.7 compared to 0.97 for the best group. The survival curves for the prognostic index groups are well separated, indicating that the model successfully divides patients into groups with different rates of clinical progression. However, there is some evidence of misfit between the predicted and observed survival curves. The observed survival of the middle group is worse than the predicted survival. The observed survival of the fourth group is also worse than the predicted survival, particularly in the first year when it is indistinguishable from the fifth group. However, by 3 years the survival of group four is better than that of group five.

If we consider all the risk strata, then the separation of the two prognostic models is very large. For the combined outcome the lowest 3-year probability of a new AIDS event or death is 3.4 per cent (95 per cent CI 2.8–4.1) for patients with age <50, not IDU, CDC stage A/B, CD4 cell count ≥350 and log HIV-1 RNA <5, and the highest is 50 per cent (95 per cent CI 43–58) for patients aged ≥50, IDU, CDC stage C, CD4 cell count <50 and log HIV-1 RNA ≥5. We also calculated Harrell’s c statistic as a further measure of discriminatory power: this showed that the concordance or rank correlation was 0.746.
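As a worked conversion, the rescaling of c to Somers’ D given in the Discussion can be applied to this value. The arithmetic below is an illustration added here, not a result reported by the authors, and this D is distinct from Royston and Sauerbrei’s D statistic mentioned above.

```latex
D = 2(c - 0.5) = 2(0.746 - 0.5) = 0.492
```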

6.2. Checking calibration

Calibration may be examined by grouping patients according to the strongest prognostic variable, namely CD4 cell count (see Table V), and graphically comparing Kaplan–Meier estimates of survival in these groups with average predicted survival curves from the prognostic model, for the patients in each group [34].

Figure 3 shows that the fit to the Kaplan–Meier survival curves is generally quite close, but for the three groups with lowest baseline CD4 the observed survival probability is lower than predicted for the first 18 months. This lack of fit is due to between-cohort heterogeneity in the initial trajectories: this is illustrated in Figure 4, which shows the Kaplan–Meier estimate and the predicted survival curves for each CD4 group, for the four largest cohorts. The predicted survival probabilities for the group with CD4 <50 are too high for the first and third cohorts, but too low for the second and fourth cohorts. The predicted survival curves show substantial edge effects (deviation from a smooth monotonic curve) for predictions close to 3 years due to paucity of data.

Figure 4. Comparison of Kaplan–Meier estimate of probability of survival free of new AIDS event or death (solid line) with estimate from the prognostic model (dashed line) for CD4 groups for the four cohorts with most events. Panels: cohort 1 to cohort 4. Axes: probability of AIDS free survival (0.75 to 1) against years from start of HAART (0 to 3).

As a check on the calibration of the model for the individual cohorts, we graphed the baseline predicted survivor function with the corresponding Kaplan–Meier estimate of survival for the nine largest cohorts (there were insufficient data for the smaller cohorts). The baseline group depends on the parameterization of the model. We chose the values for the covariates for the patients in the baseline group so that it represented a large group of patients (age <50, not IDU and stage A/B) of medium risk (100 ≤ CD4 < 200 and log RNA ≥5). The cohorts are shown in order of size with cohort 1 being the largest. Figure 5 shows there is some evidence of heterogeneity in the baseline survivor function, particularly in cohorts 1 and 9, which have worse survival at 3 years than predicted by the model.
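The graphical checks in this section rest on two quantities: the Kaplan–Meier estimate for a group and the model prediction averaged over the patients still at risk. A minimal sketch of both is given below. It is an illustration only: the function names are hypothetical, the toy data are invented, and the prediction function is a simple exponential stand-in rather than the final prognostic model.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event 1 = observed, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Handle all patients whose follow-up ends at time t together.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def mean_predicted_survival(predict, covariates, follow_up, t):
    """Average predicted S(t) over patients still under follow-up at time t."""
    at_risk = [x for x, f in zip(covariates, follow_up) if f >= t]
    if not at_risk:
        raise ValueError("no patients at risk at time t")
    return sum(predict(t, x) for x in at_risk) / len(at_risk)


# Hypothetical group of five patients; the "covariate" is just a hazard rate.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
rates = [0.3, 0.3, 0.1, 0.3, 0.1]
for t, observed in kaplan_meier(times, events):
    predicted = mean_predicted_survival(lambda s, r: math.exp(-r * s), rates, times, t)
    print(t, round(observed, 3), round(predicted, 3))
```

Plotting the two columns against time for each group gives the kind of solid and dashed curves compared in Figures 2 to 5.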

Figure 5. Predicted survival from prognostic model and Kaplan–Meier observed survival for baseline group (CD4 100–199, RNA ≥5, stage A/B, age <50, non-IDU) for the nine largest cohorts. Panels: Cohort 1 to Cohort 9. Axes: P(survival free of new AIDS or death) (0.6 to 1) against years from start of HAART (0 to 3).

7. DISCUSSION

Our aim in this study was to develop and evaluate prognostic models for progression to new AIDS event or death in HIV positive patients starting antiretroviral therapy. We investigated three methods of examining the quality of predictions and found that two of them, Harrell’s c statistic and the Brier score, did not discriminate between candidate prognostic models. The third method, based on deviance differences, discriminated between the predictive performance of candidate models and was used to choose the final model.

Harrell’s c statistic is often used to evaluate prognostic models in clinical use such as, for example, the Framingham predictions of the risk of coronary heart disease [35]. Because it is based on rank order of predictions it works well in evaluating discrimination (for example the ability of a prognostic index to allocate patients to low, medium and high risk groups). However, to predict the probability of the outcome at different times it is necessary to model the baseline hazard function as well as the effects of covariates, and hence to consider calibration as well as discrimination. Harrell’s c statistic is insensitive to differences in calibration which do not affect rank order. In this study, we found that Harrell’s c statistic appeared insensitive to differences between some of the candidate prognostic models.

The Brier score compares predicted probabilities with Bernoulli outcomes at a particular time, which is problematic in the context of censored survival data since different models may predict better at different lengths of follow up. It, too, failed to discriminate between the predictive performance of candidate prognostic models. The insensitivity of the Brier score demonstrates that comparing predictions, which are continuous variables on the (0, 1) interval, with observed events, which are binary outcomes, at the individual patient level is of little use. It is more useful to average predictions over groups of patients and then compare the predicted survival curve for the group with the empirical Kaplan–Meier estimate, which is the procedure we followed for checking calibration of the model. However, although this does give a method for visually checking the calibration of the model, it is not useful for choosing between candidate prognostic models.

We were able to discriminate between candidate prognostic models using a method based on deviance differences, which measures the loss of fit of the model when it is applied to external data (that is, to data not used to fit the model) compared to the fit when the parameters of the model are re-estimated using the external data. More methodological work is needed to support the use of deviance differences in model validation. For example, we have combined the deviance differences from each excluded cohort, to give a total deviance difference, even though these will not be statistically independent. This is an area of current research activity, particularly in the context of Bayesian measures of model fit [30].

Cox models are the most commonly used models for survival analysis, but are less useful than fully parametric models for estimating progression probabilities since the baseline hazard is not estimated. We explored the use of three standard parametric survival models (using the Weibull, loglogistic and lognormal distributions), together with extensions that use spline smoothing of the baseline log cumulative hazard function. These allow flexibility in the form of the baseline hazard: in particular the possibility that the hazard declines and then rises. In our data, the final prognostic model excluded the spline terms. Although the models with spline terms fitted better than those without, examination of data for individual cohorts suggests that they were fitting features of the data present in some cohorts, but not in others. It is possible that prognostic models based on spline terms will be more useful as follow up time increases, and the shape of the baseline hazard becomes more complex.

For two related reasons, standard likelihood-based methods are unlikely to be sufficient in choosing prognostic models based on large data sets such as the one analysed here. First, the array of modelling choices makes it inevitable that comparisons will be between non-nested as well as nested models. Second, we must choose from among a large number of possible models, giving the potential for substantial overfitting of the data [5, 9, 11]. Methods such as the AIC aim to address this by penalizing model complexity, but model comparisons using AIC should strictly only be made between nested models and our candidate set of models is not nested. Cross-validation and the AIC are closely related tools for model choice: the AIC is an unbiased estimator of the deviance of the model fitted on replicate data sets formed by leave-one-out cross-validation [36].

Our results suggest that choices based on the AIC would not penalize model complexity sufficiently, when the purpose of a model is to make predictions for new patients. Table IV shows that for the prognostic models based on proportional hazards or proportional odds, those including spline smoothing of the log cumulative hazard had much lower AICs. However, loss of fit on external data, based on deviance differences, was much higher for these models than for the standard models. Using the deviance difference within the IECV scheme penalizes complexity more severely, because it is based on the quality of the predictions in independent data sets. However, it must be recognized that overall model fit on the whole data set is being traded off with generalizability of the model to new data as measured by the deviance differences: we aim for the final model to have adequate fit to data from each cohort, rather than the best fit to the pooled data. It is desirable that the performance of the final model is tested further, by applying it to data external to the ART Cohort Collaboration. Of particular interest would be comparison of the baseline hazard from new data with that from the final prognostic model, and comparison of the performance of the model with lowest AIC and the best-generalizing model on new data.

We have not evaluated formally how many cohorts would be needed for the IECV procedure to provide useful results. Each cohort must be large enough to allow re-estimation of all model parameters; in these data we had to combine the three smallest cohorts to achieve this. A rule of thumb from the classification and regression trees (CART) literature [37] recommends that for cross-validation to work well the data should be partitioned into 3–10 subsets, with 10 being preferable if the data set is large and a choice of less than 10 being appropriate for smaller data sets.

We found the literature on evaluation of predictions from survival models to be relatively small, with most papers critical of existing methods. Korn and Simon [38] suggested using measures of explained variation to quantify the predictive power of a prognostic model and looked at measures for survival models which were analogous to the multiple correlation coefficient R² for normal linear models. Explained variation measures the relative gains in predictive accuracy when prediction based on covariates replaces unconditional prediction [39]. These measures may be categorized into two classes: those based on a loss function, which compare the predicted probability of survival with the actual survival times, and those that involve rank correlations between observed and predicted survival times. The Brier score is an example from the first class, and Harrell’s concordance statistic c may be converted into an example from the second class by rescaling it to become Somers’ D using the formula D = 2(c − 0.5).

Further work was done by Schemper and Stare [40], who give an empirical comparison of measures of explained variation. They point out difficulties in applying some of the measures to censored data and show inconsistencies in the results when they are used on simulated data with different proportions of censoring. Graf et al. [2] also describe R²-type measures of explained variation and suggest a possible adaptation to the Brier score, namely to average the loss over time by integrating it with respect to a weight function. This would overcome the problem of censoring and allow the expected loss to be calculated over all time and not just for a fixed time t*. The paper also suggests that a more suitable alternative to the quadratic loss function might be to use the negative log-likelihood or deviance. Schemper [39] presents a unified concept of predictive accuracy and explained variation based on the absolute prediction error which may be applied to continuous, binary, polytomous and survival outcomes.

We used bootstrap methods to estimate confidence intervals for the predicted probabilities of survival free of new AIDS at 1, 2 and 3 years after starting ART in each risk stratum [27]. These confidence intervals were based on resampling from the pooled data set, and thus ignore between-cohort heterogeneity. To allow for this, it might be possible to generalize the models to allow for random effects of the parameters of the baseline survival function and/or random effects of the prognostic variables. Using such models, it would be possible to distinguish between predictions for patients in a particular cohort (based on shrinkage estimates of the random effects in that cohort) and for patients in new cohorts (incorporating the additional uncertainty due to the random effects). However, the prognostic models considered here already include stratification on prognostic variables, interactions between variables and different possible shapes for the baseline hazard. It is unclear whether adding further potential complexity will be feasible, or lead to notably improved predictions, particularly as the covariate effects are not very heterogeneous between cohorts in our data set. A limitation of this model is that we could only use variables available in all the cohorts. Results from the EuroSIDA cohort suggest that haemoglobin is an important prognostic factor [41]. Findings from the CHORUS and VACS cohorts suggest that the liver enzymes aspartate transaminase (AST) and alanine aminotransferase (ALT) and co-infection with Hepatitis C virus may also be important prognostic indicators of survival [42]. Therefore, it might be possible to obtain a better fitting model with a higher concordance statistic if these variables were included [43].

This analysis was the first to be done on the ART Cohort Collaboration data set and we hope that the collaboration will provide further data for enhanced future analyses. In particular, we aim to extend follow up on the current patients so that the effect of ART can be modelled over a longer time period. An interesting question is whether there is a limit to the length of time over which it is useful to predict disease progression from baseline measurements. The increase in the Brier score over time shows that the model is becoming less accurate in its predictions. This is to be expected as baseline measurements gradually exert less influence and response to treatment becomes an important factor in determining prognosis [44]. This is an issue for all prognostic modelling, but is rarely discussed. In particular, it is unlikely that the proportional hazards assumption underlying Cox and Weibull models is appropriate over very long periods of time: it is more likely that the hazard rate ratios are gradually convergent. An example of gradually converging hazard ratios is seen in Andersen’s model for cardiovascular risk based on the Framingham data, which uses an adaptation of the Weibull model to allow for the decreasing influence of baseline covariate measurements over time [45].

We also intend to add in new patients who have started ART in more recent years so that we can track the effect of ART over calendar years, as this is a rapidly changing field in which new treatments are becoming available with further drug research. A further aim of the collaboration is to evaluate this model on completely independent data. Although the strictest criterion for validating a prognostic model must remain external validation on independent data, we believe that the methods for fitting and evaluating prognostic models based on multiple cohorts, proposed in this paper, have led to a prognostic model that will be useful to patients starting antiretroviral therapy, and their clinicians, in a wide range of settings.

APPENDIX A: THE ANTIRETROVIRAL THERAPY (ART) COHORT COLLABORATION

Steering committee: Dominique Costagliola, Fran�cois Dabis, Antonella D’Arminio Monforte,Frank de Wolf, Matthias Egger, John Gill, Robert Hogg, Amy Justice, Bruno Ledergerber,Catherine Leport, Jens Lundgren, Andrew Phillips, Bernd Salzberger, Shlomo Staszewski, IanWeller.Members of the 13 study groups: French Hospital Database on HIV (FHDH): S. Alfandari,F. Bastides, E. Billaud, A. Boibieux, F. Bou�e, F. Bricaire, D. Costagliola, L. Cotte, L. Cuzin,F. Dabis, S. Fournier, J. Gasnault, C. Gaud, J. Gilquin, S. Grabar, D. Lacoste, J.M. Lang, H.Laurichesse, C. Leport, M. Mary-Krause, S. Matheron, M.C. Meyohas, C. Michelet, J. Moreau,G. Pialoux, I. Poizot-Martin, C. Pradier, C. Rabaud, E. Rouveix, P. Sa��ag, D. Salmon-Ceron,J. Soubeyrand, H. Tissot-Dupont (Scienti�c Committee); B. Haury, S. Courtial-Destembert,

Copyright ? 2004 John Wiley & Sons, Ltd. Statist. Med. 2004; 23:2375–2398

Page 20: Development and validation of a prognostic model for survival time data: application to prognosis of HIV positive patients treated with antiretroviral therapy

2394 M. MAY ET AL.

G. Leblanc (Coordinating Centre, French Ministry of Health); S. Abgrall, D. Costagliola, S. Grabar, L. Lievre, M. Mary-Krause, V. Potard (Statistical Data Analysis Centre, INSERM SC4).

Italian Cohort of Antiretroviral-Naive Patients (ICONA): M. Montroni, G. Scalise, A. Costantini, A. Giacometti (Ancona); U. Tirelli, G. Nasti (Aviano); G. Pastore, N. Ladisa, M.L. Perulli (Bari); F. Suter, C. Arici (Bergamo); F. Chiodo, F.M. Gritti, V. Colangeli, C. Fiorini, L. Guerra (Bologna); G. Carosi, G.P. Cadeo, F. Castelli, C. Minardi, D. Vangi (Brescia); G. Rizzardini, G. Migliorino (Busto Arsizio); P.E. Manconi, P. Piano (Cagliari); T. Ferraro, A. Scerbo (Catanzaro); E. Pizzigallo, F. Ricci (Chieti); D. Santoro, L. Pusterla (Como); G. Carnevale, D. Galloni (Cremona); P. Vigano, M. Mena (Cuggiono); F. Ghinelli, L. Sighinolfi (Ferrara); F. Leoncini, F. Mazzotta, M. Pozzi, S. Lo Caputo (Florence); G. Angarano, B. Grisorio, S. Ferrara (Foggia); P. Grima, P. Tundo (Galatina); G. Pagano, N. Piersantelli, A. Alessandrini, R. Piscopo (Genova); M. Toti, S. Chigiotti (Grosseto); F. Soscia, L. Tacconi (Latina); A. Orani, P. Perini (Lecco); A. Scasso, A. Vincenti (Lucca); A. Scalzini, G. Fibbia (Mantova); M. Moroni, A. Lazzarin, A. Cargnel, G.M. Vigevani, L. Caggese, A. d’Arminio Monforte, F. Tordato, R. Novati, A. Galli, S. Merli, C. Pastecchia, C. Moioli (Milano); R. Esposito, C. Mussini (Modena); N. Abrescia, A. Chirianni, C. Izzo, M. Piazza, M. De Marco, V. Montesarchio, E. Manzillo, S. Nappa (Napoli); A. Colomba, V. Abbadessa, T. Prestileo, S. Mancuso (Palermo); C. Ferrari, P. Pizzaferri (Parma); G. Filice, L. Minoli, R. Bruno, R. Maserati (Pavia); S. Pauluzzi, F. Baldelli (Perugia); E. Petrelli, A. Cioppi (Pesaro); F. Alberici, A. Ruggieri (Piacenza); F. Menichetti, C. Martinelli (Pisa); C. De Stefano, A. La Gala (Potenza); T. Zauli, G. Ballardini (Ravenna); G. Magnani, M.A. Ursitti (Reggio Emilia); M. Arlotti, P. Ortolani (Rimini); L. Ortona, F. Dianzani, G. Ippolito (Roma); A. Antinori, G. Antonucci, S. D’Elia, P. Narciso, N. Petrosillo, V. Vullo, A. De Luca, L. Del Forno, M. Zaccarelli, P. De Longis, M. Ciardi, G. D’Offizi, P. Noto, M. Lichtner, M.R. Capobianchi, E. Girardi, P. Pezzotti, G. Rezza (Rome); M.S. Mura, M. Mannazzu (Sassari); P. Caramello, A. Sinicco, M.L. Soranzo, L. Gennero, M. Sciandra, B. Salassa (Torino); P.A. Grossi, C. Basilico (Varese); A. Poggio, G. Bottari (Verbania); E. Raise, S. Pasquinucci (Venezia); F. De Lalla, G. Tositti (Vicenza); F. Resta, A. Chimienti (Taranto); A. Cozzi Lepri (London).

Swiss HIV Cohort Study (SHCS): M. Battegay, E. Bernasconi, H. Bucher, Ph. Bürgisser, M. Egger, P. Erb, W. Fierz, M. Fischer, M. Flepp (Chairman of the Clinical and Laboratory Committee), P. Francioli (President of the SHCS, Centre Hospitalier Universitaire Vaudois, CH-1011-Lausanne), H.J. Furrer, M. Gorgievski, H. Günthard, P. Grob, B. Hirschel, L. Kaiser, C. Kind, Th. Klimkait, B. Ledergerber, U. Lauper, M. Opravil, F. Paccaud, G. Pantaleo, L. Perrin, J.-C. Piffaretti, M. Rickenbach (Head of Data Center), C. Rudin (Chairman of the Mother & Child Substudy), J. Schupbach, R. Speck, A. Telenti, A. Trkola, P. Vernazza (Chairman of the Scientific Board), Th. Wagels, R. Weber, S. Yerly.

AIDS Therapy Evaluation project Netherlands (ATHENA): Investigators: S.N. Blank, F.W.M.N. Wit, J.M. Prins, R.P. Koopmans, F.J.B. Nellen, M.H. Godfried, J.T.M. van der Meer, J.K.M. Eeftinck Schattenkerk, T. van der Poll, J.C. Bos, A. Verbon, J.M.A. Lange, M. van der Valk, Th.A. Ruys, K. Pogany, S.H. Lowe, O. van Doorn, S. Sankatsing, D. Blanckenberg, P. Reiss, P.S. van Dam, J.H. ten Veen, M.E. Hillebrand-Haverkort, H.M. Weigel, K. Brinkman, P.H.J. Frissen, P.L. Meenhorst, J.W. Mulder, A.T.A. Malruhu, A. van Eeden, S.E. Geerlings, S.A. Danner, R.M. Perenboom, M.A. van Agtmael, F.A.P. Claessen (Amsterdam); H.G. Sprenger, W.M.A.J. Miesen (Groningen); S.v.d. Geest, G. Schreij, A.J.A.M. van der Ven (Maastricht); S. de Marie, M.E. van der Ende, I.C. Gyssens, J.L. Nouwen (Rotterdam); B. Bravenboer (Eindhoven); R.W. ten Kate (Haarlem); F.P. Kroon (Leiden); E.F. Schippers


(Leiden); W. Bronsveld (Alkmaar); M.P. Leemhuis (Leeuwarden); D.P.F. van Houte, M. Polée (Leeuwarden); C.H.H. ten Napel (Enschede); J.R. Juttmann (Tilburg); P.P. Koopmans, D.S.C. Telgt (Nijmegen), C.A.M. Schurink, I.M. Hoepelman, M.M.E. Schneider, C.A.J.J. Jaspers, J.C.C. Borleffs, M.J.M. Bonten (Utrecht); F.J.F. Jeurissen, R. Vriesendorp, E.L.W. Koger, R.H. Kauffmann (Den Haag); C. Richter, R. van Leusen, J. van der Berg (Arnhem); W.L. Blok (Vlissingen). Virology Working Group: J. Goudsmit (Leiden), N.K.T. Back, S. Jurriaans, F. de Wolf, J.M.A. Lange (Amsterdam); A.D.M.E. Osterhaus, H.M.G. Niesters, M. Schutten (Rotterdam); A.C.M. Kroes, H.C.J. Claas (Leiden); R. Schuurman, C.A.B. Boucher (Utrecht).

The Multicenter Study Group on EuroSIDA (national coordinators in parentheses): Austria: (N. Vetter), Vienna; Belgium: (N. Clumeck) P. Hermans, B. Sommereijns, Brussels; R. Colebunders, Antwerp; Czech Republic: (L. Machala) H. Rozsypal, Prague. Denmark: (J. Nielsen) J. Lundgren, T. Benfield, O. Kirk, J. Gerstoft, T. Katzenstein, B. Røge, P. Skinhøj, Rigshospitalet, Copenhagen; C. Pedersen (Odense). Estonia: (K. Zilmer), Tallinn. France: (C. Katlama) M. De Sa, J.-P. Viard, Paris; T. Saint-Marc, P. Vanhems, Lyon; C. Pradier, Nice. Germany: (M. Dietrich), C. Manegold, J. van Lunzen, H.-J. Stellbrink, Hamburg; V. Miller, S. Staszewski, Frankfurt; F.-D. Goebel, Munich; B. Salzberger, Cologne; J. Rockstroh, Bonn. Greece: (J. Kosmidis), P. Gargalianos, H. Sambatakou, J. Perdios, G. Panos, I. Karydis, A. Filandras, Athens. Hungary: (D. Banhegyi), Budapest. Ireland: (F. Mulcahy), Dublin. Israel: (I. Yust), D. Turner, Tel Aviv; S. Pollack, Z. Ben-Ishai, Haifa; Z. Bentwich, Rehovot; S. Maayan, Jerusalem. Italy: (S. Vella, A. Chiesi), V. Vullo, P. Santopadre, P. Narciso, A. Antinori, P. Franci, M. Zaccarelli, Rome; C. Arici, Bergamo; R. Pristerà, Bolzano; F. Mazzotta, A. Gabbuti, Florence; R. Esposito, A. Bedini, Modena; A. Chirianni, E. Montesarchio, Naples; A. Lazzarin, R. Finazzi, A. D’Arminio Monforte, Milan. Latvia: (L. Viksna), Riga. Lithuania: (S. Chaplinskas), Vilnius. Luxembourg: (R. Hemmer), T. Staub, Luxembourg. Netherlands: (P. Reiss), Amsterdam. Norway: (J. Bruun), A. Maeland, V. Ormaasen, Oslo. Poland: (B. Knysz), J. Gasiorowski, Wroclaw; A. Horban, Warsaw; D. Prokopowicz, A. Wiercinska-Drapalo, Bialystok; A. Boron-Kaczmarska, M. Pynka, Szczecin; M. Beniowski, Chorzow; H. Trocha, Gdansk. Portugal: (F. Antunes), K. Mansinho, R. Proenca, Lisbon. Romania: A. Streinu-Cercel, D. Duiculescu. Slovakia: (M. Mikras), Bratislava. Spain: (J. González-Lahoz), B. Diaz, T. García-Benayas, L. Martin-Carbonero, V. Soriano, Madrid; B. Clotet, A. Jou, J. Conejero, C. Tural, Badalona; J.M. Gatell, J.M. Miró, Barcelona. Sweden: (A. Blaxhult), A. Karlsson, P. Pehrson, Stockholm. Switzerland: (B. Ledergerber), R. Weber, Zürich; P. Francioli, A. Telenti, Lausanne; B. Hirschel, V. Soravia-Dunand, Geneve. United Kingdom: (S. Barton), A.M. Johnson, D. Mercey, A. Phillips, C. Loveday, M.A. Johnson, A. Mocroft, A. Pinching, J. Parkin, J. Weber, G. Scullard, London; M. Fisher, Brighton; R. Brettle, Edinburgh. Virology group: C. Loveday, B. Clotet (Central Coordinators) plus ad hoc virologists from participating sites in the EuroSIDA Study. Steering committee: F. Antunes, A. Blaxhult, N. Clumeck, J. Gatell, A. Horban, A. Johnson, C. Katlama, B. Ledergerber (Chair), C. Loveday, A. Phillips, P. Reiss, S. Vella. Coordinating centre staff: J. Lundgren (Project Leader), I. Gjørup, O. Kirk, N. Friis-Moeller, A. Mocroft, A. Cozzi-Lepri, D. Mollerup, M. Nielsen, A. Hansen, D. Kristensen, L. Kolte, S. Aabolt, L. Hansen, J. Kjær.

Collaborations in HIV Outcomes Research US (CHORUS): Stephen Raffanti, Stephen Becker, Anthony Scarsella, James Braun, Amy Justice, Gregory Fusco, Bernard Most, Rukmini Balu, Laura Gilbert, Richard Fleenor, Theodore Ising, Douglas Dieterich, Jennifer Fusco.

Frankfurt HIV Cohort: Schlomo Staszewski, Eilke B. Helm, Amina Carlebach, Manfred Mösch, Axel Müller, Annette Haberl, Stephan Korn,


Christoph Stephan, Markus Bickel, Peter Gute, Leo Locher, Thomas Lutz, Stephan Klauke (Clinical Group); Hans W. Doerr, Martin Stürmer (Virology Group); Caroline Sabin (Statistical Analysis); Brenda Dauer (Scientific Advisor); Beverly Jennings (Data Management).

Antiprotease Cohort (APROCO): Catherine Leport, François Raffi (Principal Investigators); Genevieve Chene, Roger Salamon (Methodologists); Françoise Brun-Vezinet, Hervé Fleury (Virologists); Jean-Paul Moatti, Jeanine Pierret (Social scientists); Gilles Peytavin, Rodolphe Garaffo (Pharmacologists), Bernard Masquelier, Bruno Spire.

Aquitaine Cohort: Organisation and methodology: G. Chene, C. Marimoutou, F. Dabis, R. Salamon; Clinical coordination: D. Lacoste, D. Malvy, I. Pellegrin, M. Dupon, J.F. Moreau, P. Morlat, J.L. Pellegrin, J.M. Ragnaud. Participating Hospital Departments (participating physicians): Bordeaux University Hospital: J. Beylot, P. Morlat, N. Bernard, D. Lacoste, C. Nouts, C. Beylot, M.S. Doutre, C. Conri, J. Constans, P. Couzigou, H. Fleury, I. Pellegrin, M. Geniaux, A. Simon, J.Y. Lacut, C. Cazorla, M. Dupon, B. Leng, P. Mercié, J.L. Pellegrin, M. Le Bras, F. Djossou, D. Malvy, J.P. Pivetaud, J.F. Moreau, J.L. Taupin, J.M. Ragnaud, C. De La Taille, H. Dutronc, D. Neau, C. Series, A. Taytard; Dax Hospital: Dr M. Loste, I. Blanchard; Bayonne Hospital: Dr F. Bonnal, Y. Blanchard, S. Farbos, M.C. Gemain; Libourne Hospital: J. Ceccaldi, X. Jacquelin; Villeneuve-sur-Lot Hospital: E. Buy, G. Brossard. Data management and analysis: M.-J. Blaizeau, M. Decoin, L. Dequae-Merchadou, A.M. Formaggio, M. Pontgahet, D. Touchard, G. Palmer.

British Columbia Centre for Excellence in HIV: Chris Alexander, Paula Braitstein, Keith Chan, Helen Cote, Nada Gataric, P. Richard Harrigan, Marrianne Harris, Katherine Heath, Robert Hogg, Julio Montaner, Michael O’Shaughnessy, Evan Wood, and Benita Yip.

Royal Free Hospital Cohort: Fiona Lampe, Clinton Chaloner, Helen Gumley, Darren Ransom, Caroline A. Sabin, Amanda Mocroft, Marc Lipman, Mike Youle, Margaret Johnson.

South Alberta Clinic: John Gill, Ron Read.

Köln/Bonn Cohort: G. Fatkenheuer, J. Rockstroh, B. Salzberger, K. Voigt, J.C. Wasmuth, N. Schmeisser.

ACKNOWLEDGEMENTS

We thank all patients, doctors, data managers, and study nurses who were involved in the participating cohort studies. The ART-CC is supported by the U.K. Medical Research Council (MRC) grant RD1564 and GlaxoSmithKline. Sources of funding of individual cohorts include the Agence Nationale de Recherches sur le SIDA (ANRS), the Institut National de la Santé et de la Recherche Médicale (INSERM), the French, Italian and Swiss Ministries of Health, the Dutch Stichting HIV Monitoring, the European Commission, the British Columbia and Alberta Governments, the Michael Smith Foundation for Health Research, the Canadian Institutes of Health Research, and unrestricted grants from GlaxoSmithKline, Roche, and Boehringer-Ingelheim. The Department of Social Medicine of the University of Bristol is the lead centre of the MRC Health Services Research Collaboration.

REFERENCES

1. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control 1974; 19:716–723.

2. Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine 1999; 18(17–18):2529–2545.

3. Van Houwelingen HC. Validation, calibration, revision and combination of prognostic survival models. Statistics in Medicine 2000; 19(24):3401–3415.

4. Altman DG, Royston P. What do we mean by validating a prognostic model? Statistics in Medicine 2000; 19(4):453–473.

5. Harrell Jr FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 1996; 15(4):361–387.


6. Brier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Review 1950; 78:1–3.

7. Schumacher M, Hollander N. Resampling and cross-validation techniques: a tool to reduce bias caused by model building. Statistics in Medicine 1997; 16:2813–2827.

8. Efron B. How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association 1986; 81:461–470.

9. Van Houwelingen HC, Le Cessie S. Predictive value of statistical models. Statistics in Medicine 1990; 9:1303–1325.

10. Verweij PJ, Van Houwelingen HC. Cross-validation in survival analysis. Statistics in Medicine 1993; 12(24):2305–2314.

11. Copas JB. Cross-validation shrinkage of regression predictors. Journal of the Royal Statistical Society Series B 1987; 49:175–183.

12. Sauerbrei W, Schumacher M. A bootstrap resampling procedure for model building: Application to the Cox regression model. Statistics in Medicine 1992; 11:2093–2109.

13. Royston P, Parmar MKB, Sylvester R. Construction and validation of a prognostic model across several studies, with an application in superficial bladder cancer. Statistics in Medicine 2004; 23:907–926.

14. Grabar S, Le Moing V, Goujard C et al. Clinical outcome of patients with HIV-1 infection according to immunologic and virologic response after 6 months of highly active antiretroviral therapy. Annals of Internal Medicine 2000; 133:401–410.

15. d’Arminio Monforte A, Lepri AC, Rezza G et al. Insights into the reasons for discontinuation of the first highly active antiretroviral therapy (HAART) regimen in a cohort of antiretroviral naive patients. AIDS 2000; 14:499–507.

16. Egger M, Hirschel B, Francioli P et al. Impact of new antiretroviral combination therapies in HIV infected patients in Switzerland: prospective multicentre study. British Medical Journal 1997; 315:1194–1199.

17. Nieuwkerk PT, Sprangers MA, Burger DM et al. Limited patient adherence to highly active antiretroviral therapy for HIV-1 infection in an observational cohort. Archives of Internal Medicine 2001; 161:1962–1968.

18. Lundgren JD, Phillips AN, Vella S et al. Regional differences in use of antiretroviral agents and primary prophylaxis in 3122 European HIV-infected patients. Journal of Acquired Immune Deficiency Syndromes 1997; 16:153–160.

19. Becker SL, Raffanti SR, Hansen NI et al. Zidovudine and stavudine sequencing in HIV treatment planning: findings from the CHORUS HIV cohort. Journal of Acquired Immune Deficiency Syndromes 2001; 26:72–81.

20. Brodt HR, Kamps BS, Gute P et al. Changing incidence of AIDS-defining illnesses in the era of antiretroviral combination therapy. AIDS 1997; 11:1731–1738.

21. Le Moing V, Thiebaut R, Chene G et al. Predictors of long-term increase in CD4(+) cell counts in human immunodeficiency virus-infected patients receiving a protease inhibitor-containing antiretroviral regimen. Journal of Infectious Diseases 2002; 185:471–480.

22. Binquet C, Chene G, Jacqmin-Gadda H et al. Modeling changes in CD4-positive T-lymphocyte counts after the start of highly active antiretroviral therapy and the relation with risk of opportunistic infections: the Aquitaine cohort, 1996–1997. American Journal of Epidemiology 2001; 153:386–393.

23. Hogg RS, Yip B, Kully C. Improved survival among HIV-infected patients after initiation of triple-drug antiretroviral regimens. Canadian Medical Association Journal 1999; 160:659–665.

24. Mocroft A, Barry S, Sabin CA et al. The changing pattern of admissions to a London hospital of patients with HIV: 1988–1997. Royal Free Centre for HIV Medicine. AIDS 1999; 13:1255–1261.

25. Mocroft A, Gill MJ, Davidson W, Phillips AN. Predictors of a viral response and subsequent virological treatment failure in patients with HIV starting a protease inhibitor. AIDS 1998; 12:2161–2167.

26. Fatkenheuer G, Theisen A, Rockstroh J et al. Virological treatment failure of protease inhibitor therapy in an unselected cohort of HIV-infected patients. AIDS 1997; 11:113–116.

27. Egger M, May M, Chene G et al. Prognosis of HIV-1-infected patients starting highly active antiretroviral therapy: a collaborative analysis of prospective studies. Lancet 2002; 360:119–129.

28. Centers for Disease Control. Revised classification system for HIV infection and expanded surveillance case definition for AIDS among adolescents and adults. Morbidity and Mortality Weekly Report 1992; 41:1–20.

29. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143(1):29–36.

30. Spiegelhalter DJ, Best NG, Carlin BP, Van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B 2002; 64(3):583–616.

31. Royston P, Parmar MKB. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 2002; 21:2175–2197.

32. Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Annals of Internal Medicine 1999; 130(6):515–524.

33. Royston P, Sauerbrei W. A new measure of prognostic separation in survival data. Statistics in Medicine 2004; 23:723–748.


34. Ghali WA, Quan H, Brant R et al. Comparison of 2 methods for calculating adjusted survival curves from proportional hazards models. Journal of the American Medical Association 2001; 286(12):1494–1497.

35. D’Agostino Sr RB, Grundy S, Sullivan LM, Wilson P, CHD Risk Prediction Group. Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation. Journal of the American Medical Association 2001; 286(2):180–187.

36. Stone M. An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion. Journal of the Royal Statistical Society Series B 1977; 39(1):44–47.

37. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Wadsworth: Belmont, CA, 1984.

38. Korn EL, Simon R. Measures of explained variation for survival data. Statistics in Medicine 1990; 9:487–503.

39. Schemper M. Predictive accuracy and explained variation. Statistics in Medicine 2003; 22:2299–2308.

40. Schemper M, Stare J. Explained variation in survival analysis. Statistics in Medicine 1996; 15:1999–2012.

41. Mocroft A, Kirk O, Barton SE et al. Anaemia is an independent predictive marker for clinical prognosis in HIV-infected patients from across Europe. EuroSIDA study group 1. AIDS 1999; 13(8):943–950.

42. Justice AC, Wagner J, Fusco GP et al. HIV survival: liver function tests independently predict survival. XIV International AIDS Conference, Barcelona, Spain, July 7–12, 2002, Abstract Book 2002; 1:29 (Abstract).

43. Rancinan C, Neau D, Saves M et al. Is hepatitis C virus co-infection associated with survival in HIV-infected patients treated by combination antiretroviral therapy? 1. AIDS 2002; 16(10):1357–1362.

44. ART CC. Prognostic significance of early immunologic and virologic response in HIV-1 infected patients starting antiretroviral therapy. Lancet 2003; 362:679–686.

45. Anderson KM. A nonproportional hazards Weibull accelerated failure time regression model. Biometrics 1991; 47(1):281–288.
