The Canadian Journal of Statistics / La revue canadienne de statistique
Vol. 37, No. 1, 2009, Pages 60–79

Semiparametric inference for survival models with step process covariates

Timothy HANSON1*, Wesley JOHNSON2 and Purushottam LAUD3

1 Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
2 Department of Statistics, University of California at Irvine, Irvine, CA 92697, USA
3 Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, USA
Key words and phrases: Accelerated failure time; covariate process; mixture of Polya trees; proportional
hazards; time-dependent covariates.
MSC 2000: Primary 62N01; secondary 62G09, 62P10.
Abstract: The authors consider Bayesian methods for fitting three semiparametric survival models, incor-
porating time-dependent covariates that are step functions. In particular, these are models due to Cox [Cox
(1972) Journal of the Royal Statistical Society, Series B, 34, 187–208], Prentice & Kalbfleisch and Cox &
Oakes [Cox & Oakes (1984) Analysis of Survival Data, Chapman and Hall, London]. The model due to
Prentice & Kalbfleisch [Prentice & Kalbfleisch (1979) Biometrics, 35, 25–39], which has seen very limited
use, is given particular consideration. The prior for the baseline distribution in each model is taken to be
a mixture of Polya trees and posterior inference is obtained through standard Markov chain Monte Carlo
methods. They demonstrate the implementation and comparison of these three models on the celebrated
Stanford heart transplant data and the study of the timing of cerebral edema diagnosis during emergency
room treatment of diabetic ketoacidosis in children. An important feature of their overall discussion is the comparison of semiparametric families, and the ultimate criterion-based selection of a family within the context of a given data set. The Canadian Journal of Statistics 37: 60–79; 2009 © 2009 Statistical Society of Canada
1. INTRODUCTION
Bayesian semiparametric methods for survival data in the regression context have been devel-
oped by many authors over the past three decades. Generally, from the Bayesian viewpoint of
employing a full likelihood, two main components are relevant: a nonparametric prior on the
*Author to whom correspondence may be addressed. E-mail: [email protected]
© 2009 Statistical Society of Canada / Société statistique du Canada
space of all baseline distributions considered in the model and a parametric form specifying how
the covariates modify the baseline distribution. For the regression specification, the proportional
hazards (PH) model of Cox (1972) is the most widely used, followed by the accelerated fail-
ure time (AFT) model considered, for example, by Kalbfleisch & Prentice (1980). We consider
both.
In addition to PH and AFT models, there are other approaches to modelling survival in
the presence of time-dependent covariates. Aalen (1980) develops an additive hazards regression
model, broadly illustrated in Martinussen & Scheike (2006, Chapter 5), whereas Sundaram (2006) develops a proportional odds rate model. Recently, Zeng & Lin (2007) extend semiparametric transformation models, $H(T) = -\beta' x + \varepsilon$, where ε has a known distribution and H is arbitrary, to a heteroscedastic version accommodating time-dependent covariates. These latter models include proportional hazards and proportional odds with arbitrary survival functions as special cases, but only parametric AFT models are possible, as noted by Zeng & Lin (p. 527). Since we allow for nonparametric ε, our work complements that of Zeng & Lin (2007), who are currently working on a nonparametric maximum likelihood approach to AFT models accommodating time-dependent covariates.
In this article we develop new Bayesian semiparametric methodology for three regression
models for survival data with TDC’s: Cox’s original model (here denoted CTD); and two gen-
eralizations of the accelerated failure time model: a little-used model introduced by Prentice &
Kalbfleisch (1979) (PKTD) and a model first proposed by Cox & Oakes (1984) (COTD). We
take the covariate function to be fixed and observed without (or with negligible) error as this
is the case in many applications (e.g., time of transplant, fluids administered to a patient, onset
of a well defined condition). We only consider TDC’s that correspond to jump processes. Thus,
as is standard practice, each TDC will take on a finite number of values that change at discrete
times over the course of the experiment. We refer to the times at which a TDC changes as TDC
changepoints.
Similar to the case of modelling TDC’s in the context of the Cox model, where the resulting
model is no longer a PH model, the PKTD model is no longer an AFT model. In the PKTD
model, the hazard function is in the usual AFT form at time zero with an acceleration factor (AF)
that depends on the values of the covariate information at time zero. The AF remains constant
until the first TDC changepoint at which time there is a new AF. The hazard function now is in
the same form as an AFT model with the new AF until the next TDC changepoint, and so on.
In Section 3 we make this precise and provide the machinery to make MCMC inferences. This
model is analogous to the PH model where hazards are proportional for two individuals over
periods of time between TDC changepoints.
2. BACKGROUND MATERIAL
This section focuses on survival analysis with fixed covariates as background to modelling with
time-dependent covariates, and on the mixture of Polya trees prior that we will employ throughout.
We ultimately focus on three survival analysis models as discussed in Section 1, but in terms of
background material, the Cox model is well understood and the PKTD and COTD models with
fixed covariates reduce to the AFT model. Thus we focus here on the AFT model with fixed
covariates, followed by an introduction to mixtures of Polya trees and some discussion about
potential advantages of this model.
Hanson & Johnson (2002) present a Bayesian semiparametric AFT model in which the base-
line survival distribution has a mixture of finite Polya trees prior. Let T be a survival time
for an individual with covariates x = (x1, . . . , xp)′. Let F0(t) = P(T0 ≤ t), S0(t) = P(T0 > t),
$f_0(t) = (d/dt) F_0(t)$, and $h_0(t) = f_0(t)/S_0(t)$ be the baseline cdf, survival function, pdf, and hazard
DOI: 10.1002/cjs The Canadian Journal of Statistics / La revue canadienne de statistique
function, respectively. Hanson & Johnson (2002) consider the model
$$T = e^{-x\beta}\, T_0, \qquad T_0 \sim S_0, \qquad S_0 \sim \int \mathrm{PT}^M(w, G_\theta)\, p(\theta)\, d\theta,$$
where β is a p-dimensional vector of regression coefficients including an intercept. Given S0, the survivor function for such a T is
$$S(t \mid x, \beta) = \exp\Big\{-e^{x\beta} \int_0^t h_0\big(s\, e^{x\beta}\big)\, ds\Big\} = e^{-H_0(e^{x\beta} t)},$$
where $H_0(t) = \int_0^t h_0(s)\, ds$. The notation $S_0 \sim \int \mathrm{PT}^M(w, G_\theta)\, p(\theta)\, d\theta$ is shorthand for a particular prior that "centres" S0 on a parametric family $\{G_\theta : \theta \in \Theta\}$ with weight w > 0. We will refer to $c \equiv e^{x\beta}$ as the acceleration factor and we will say that $T \sim \mathrm{AFT}(c, S_0)$. Note that the cumulative hazard is $H(t \mid x, \beta) = H_0(ct)$.
To give a brief description of the Polya tree prior as used here and by many others (Lavine, 1992, 1994; Walker & Mallick, 1997, 1999; Hanson, 2006), let M be a positive integer. Let Gθ denote a parametric family of cumulative distribution functions, such as the log-normal or log-logistic, indexed by θ. A Polya tree (PT) prior is constructed from a set of partitions $\Pi^M_\theta$ and a family $\mathcal{A}_M$ of positive reals. Here Gθ is the centring distribution of the Polya tree prior. Following Walker & Mallick (1999) and Hanson & Johnson (2002), we constrain S0(1) = 0.5 with probability 1, thus yielding a generalization of a standard median regression model on the log scale; that is, the median failure time for given covariate x is $e^{-x\beta}$. This necessitates Gθ(1) = 0.5 for all θ in the case of non-time-dependent covariates.

Define the partition $\Pi^M_\theta = \{B^\theta_\varepsilon : \varepsilon \in \bigcup_{l=1}^M \{0,1\}^l\}$. If j is the base-10 representation of the binary $\varepsilon = \varepsilon_1 \cdots \varepsilon_k$ at level k, then $B^\theta_{\varepsilon_1\cdots\varepsilon_k}$ is defined to be the interval $\big(G_\theta^{-1}(j/2^k),\, G_\theta^{-1}((j+1)/2^k)\big)$. For example, with k = 3 and ε = 000, then j = 0 and $B^\theta_{000} = (0, G_\theta^{-1}(1/8))$; with ε = 001, then j = 1 and $B^\theta_{001} = (G_\theta^{-1}(1/8), G_\theta^{-1}(2/8))$, etc.
Note then that at each level k, the class $\{B^\theta_\varepsilon : \varepsilon \in \{0,1\}^k\}$ forms a partition of the positive reals, and furthermore $B^\theta_{\varepsilon_1\cdots\varepsilon_k} = B^\theta_{\varepsilon_1\cdots\varepsilon_k 0} \cup B^\theta_{\varepsilon_1\cdots\varepsilon_k 1}$ for k = 1, 2, ..., M − 1. We take the family $\mathcal{A}_M = \{\alpha_\varepsilon : \varepsilon \in \bigcup_{j=1}^M \{0,1\}^j\}$ to be defined by $\alpha_{\varepsilon_1\cdots\varepsilon_k} = w k^2$ for some w > 0 (Walker & Mallick, 1999; Hanson & Johnson, 2002). The parameter w acts much like the precision in a Dirichlet process (Ferguson, 1973). As w tends to zero the posterior baseline is almost entirely data-driven. As w tends to infinity we obtain a fully parametric analysis. A prior can be placed on w (Hanson, 2006) but it is also common practice to select w = 1, as we do here.

Given $\Pi^M_\theta$ and $\mathcal{A}_M$, the Polya tree prior is defined up to level M by the class of random vectors $\mathcal{Y}_M = \{(Y_{\varepsilon 0}, Y_{\varepsilon 1}) : \varepsilon \in \bigcup_{j=0}^{M-1} \{0,1\}^j\}$ (with $\{0,1\}^0$ denoting the empty string) through the product
$$S_0\big(B^\theta_{\varepsilon_1\cdots\varepsilon_k} \mid \mathcal{Y}_M, \theta\big) = \prod_{j=1}^{k} Y_{\varepsilon_1\cdots\varepsilon_j},$$
for k = 1, 2, ..., M, where we define S0(A) to be the baseline measure of any set A. The vector (Y0, Y1) is set to (0.5, 0.5) to ensure $S_0(1 \mid \mathcal{Y}_M, \theta) = 0.5$; the remaining vectors $(Y_{\varepsilon 0}, Y_{\varepsilon 1})$ are independent Dirichlet:
$$(Y_{\varepsilon 0}, Y_{\varepsilon 1}) \sim \mathrm{Dirichlet}(\alpha_{\varepsilon 0}, \alpha_{\varepsilon 1}), \qquad \varepsilon \in \bigcup_{j=1}^{M-1} \{0,1\}^j.$$
Beyond sets at the level M in $\Pi^M$ we assume $S_0 \mid \mathcal{Y}_M, \theta$ follows the baseline Gθ. Hanson & Johnson (2002) show that this assumption yields predictive distributions that are the same as from a fully specified (infinite) Polya tree for large enough M; this assumption also avoids a complication involving infinite probability in the tail of $S_0 \mid \mathcal{Y}_M, \theta$ that arises from taking $S_0 \mid \mathcal{Y}_M, \theta$ to be flat on these sets. Note that $S_0(B^\theta_{\varepsilon_1\cdots\varepsilon_M} \mid B^\theta_{\varepsilon_1\cdots\varepsilon_{M-1}}, \mathcal{Y}_M, \theta) = Y_{\varepsilon_1\cdots\varepsilon_M}$, which converges to 0.5 in probability. This implies that as M grows, $S_0(A \mid B, \mathcal{Y}_M, \theta) \approx G_\theta(A \mid B)$ for sets B with small Lebesgue measure and A ⊂ B.
Define the vector of probabilities $p = p(\mathcal{Y}_M) = (p_1, p_2, \ldots, p_{2^M})'$ by $p_{j+1} = S_0(B^\theta_{\varepsilon_1\cdots\varepsilon_M} \mid \mathcal{Y}_M, \theta) = \prod_{i=1}^M Y_{\varepsilon_1\cdots\varepsilon_i}$, where j is the base-10 representation of $\varepsilon_1\cdots\varepsilon_M$. After simplification, the baseline survival function is
$$S_0(t \mid \mathcal{Y}_M, \theta) = p_N\big[N - 2^M G_\theta(t)\big] + \sum_{j=N+1}^{2^M} p_j, \qquad (1)$$
where N denotes the integer part of $2^M G_\theta(t) + 1$ and where $g_\theta(\cdot)$ is the density corresponding to Gθ. The density associated with $S_0(t \mid \mathcal{Y}_M, \theta)$ is given by
$$f_0(t \mid \mathcal{Y}_M, \theta) = \sum_{j=1}^{2^M} 2^M p_j\, g_\theta(t)\, I_{B^\theta_{\varepsilon_M(j-1)}}(t) = 2^M p_N\, g_\theta(t), \qquad (2)$$
where $\varepsilon_M(i)$ is the M-digit binary representation $\varepsilon_1\cdots\varepsilon_M$ of the integer i and N is as above. Note that the number of elements of $\mathcal{Y}_M$ need not be prohibitively large: this number is $\sum_{j=1}^M 2^j = 2^{M+1} - 2$. For M = 5, a typical level, this is 62. To obtain p, $\sum_{j=2}^M 2^j = 2^{M+1} - 4$ multiplications are required; for M = 5 this is 60.
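The computations in (1) are easy to code. The Python sketch below (our function names; the centring cdf G(t) = t/(1+t) used in testing is the log-logistic with µ = 0 and σ = 1) draws the conditional probabilities with $\alpha_\varepsilon = w k^2$, builds p, and evaluates S0. With (Y0, Y1) fixed at (0.5, 0.5), the median constraint S0(1 | YM, θ) = 0.5 holds exactly:

```python
import math
import random

def sample_Y(M, w=1.0, seed=0):
    """Conditional split probabilities: (Y_eps0, Y_eps1) ~ Dirichlet(w k^2, w k^2)
    at levels k = 2, ..., M; the level-1 pair (Y_0, Y_1) is fixed at (0.5, 0.5)."""
    rng = random.Random(seed)
    Y = {"0": 0.5, "1": 0.5}
    for k in range(2, M + 1):
        a = w * k * k
        for j in range(2 ** (k - 1)):
            eps = format(j, "0{}b".format(k - 1))
            y0 = rng.betavariate(a, a)          # Dirichlet(a, a) on two cells = Beta(a, a)
            Y[eps + "0"], Y[eps + "1"] = y0, 1.0 - y0
    return Y

def polya_probs(Y, M):
    """p_{j+1} = prod_{i=1}^M Y_{eps_1...eps_i}, with eps_1...eps_M the binary digits of j."""
    p = []
    for j in range(2 ** M):
        eps = format(j, "0{}b".format(M))
        prob = 1.0
        for i in range(1, M + 1):
            prob *= Y[eps[:i]]
        p.append(prob)
    return p

def S0(t, p, M, G):
    """Eq. (1): S0(t) = p_N [N - 2^M G(t)] + sum_{j=N+1}^{2^M} p_j,
    with N the integer part of 2^M G(t) + 1."""
    u = G(t)
    N = min(int(2 ** M * u) + 1, 2 ** M)
    return p[N - 1] * (N - 2 ** M * u) + sum(p[N:])
```

The density (2) follows from the same ingredients as $2^M p_N g_\theta(t)$, with the index N located exactly as in `S0` above.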
The mixture of Polya trees (MPT) prior provides an intermediate choice between a strictly
parametric analysis and allowing S0 to be completely arbitrary. In some ways it provides the best
of both worlds. In areas where data are sparse, such as the tails, the MPT prior places relatively
more posterior mass on the underlying parametric family $\{G_\theta : \theta \in \Theta\}$. In areas where data are plentiful, the posterior is more data driven, and features not allowed in the strictly parametric model, such as left skew and multimodality, become apparent. The user-specified weight w controls how closely the posterior follows $\{G_\theta : \theta \in \Theta\}$, with larger values of w yielding inference closer to that obtained from the underlying parametric model.
3. THREE MODELS FOR STEP PROCESS COVARIATES
In this section we discuss models for time-dependent covariates that exhibit a countably infinite
number of jumps in time. We develop methodology for such covariates assuming models due to
Prentice & Kalbfleisch (1979), Cox & Oakes (1984), and the proportional hazards model of Cox
(1972). The first two reduce to the standard AFT model when covariates are fixed for all time. All
three models involve a baseline survivor function S0; our methodology involves placing an MPT
prior on S0 in each instance.
3.1. The AFT Model of Prentice & Kalbfleisch (1979)

Prentice & Kalbfleisch (1979) suggest a generalization of the AFT model allowing for TDC's that has been largely undeveloped in the literature. Exceptions where parametric versions of the model are fit include Chintagunta (1998) and Seetharaman (2004); we were unable to find semiparametric versions of the model.
In the previous section we defined the basic AFT regression model for a single observation. We assume that there are p − 1 covariates and that any or all of these can be time dependent. The observed processes are assumed to be jump processes in discrete time, also called step stresses in the reliability literature (e.g., Bagdonavicius & Nikulin, 2000a). Define the observed vector of p − 1 processes as x ≡ {x(s) : s ≤ u}, where u is the largest time at which the covariates are "observed," and is less than the survival time T or a censoring time.
As in the fixed covariate case, we assume a baseline survival function S0. This distribution
now corresponds to the survival function for an individual with constant, zero covariates for all
t. That is, if T0 corresponds to covariates x(s) = 0 for all s, then T0 ∼ S0. Covariates may be
transformed so that this baseline is interpretable. In this case the full model developed in Sections
3.1–3.3 allows for the inclusion of real prior information for S0 in terms of, say, the first and third
quartiles or other percentiles of S0.
In their equation (12), Prentice & Kalbfleisch (1979) define the hazard function for an individual with covariate x(s) to be
$$h(s \mid x, \beta) = e^{x(s)\beta}\, h_0\big(s\, e^{x(s)\beta}\big), \qquad s \leq u,$$
where h0(·) is an arbitrary "baseline" hazard function. Thus for periods between TDC changepoints, namely periods over which x(s)β remains constant, the hazard is the same as the hazard for an AFT model with covariates fixed at their values at the beginning of the interval. Now let {rj : j = 1, ..., m} denote the ordered TDC changepoints in (0, u) and let r0 = 0. We assume that $s \in [r_{j-1}, r_j)$ implies that $x(s) = x(r_{j-1})$ for j = 1, ..., m, and $x(s) = x(r_m)$ for $s \geq r_m$. Define $c_j = e^{x(r_{j-1})\beta}$, j = 1, ..., m + 1.
Let $j^*(t) = \max\{j : r_j \leq t\}$. Then $j^*(u) = m$. The cumulative hazard for this individual is thus
$$H(t \mid x, \beta) = \int_0^t e^{x(s)\beta}\, h_0\big(s\, e^{x(s)\beta}\big)\, ds = \sum_{j=1}^{j^*(t)} c_j \int_{r_{j-1}}^{r_j} h_0(s c_j)\, ds + c_{j^*(t)+1} \int_{r_{j^*(t)}}^{t} h_0\big(s\, c_{j^*(t)+1}\big)\, ds,$$
and the survivor function is $S(t \mid x, \beta) = \exp\{-H(t \mid x, \beta)\}$. Define $S_j(t) = P(T > t \mid T > r_{j-1})$ and let $p_j = S_j(r_j)$. Then for $t \in [r_{j-1}, r_j)$,
$$S_j(t) = e^{-c_j \int_{r_{j-1}}^{t} h_0(c_j s)\, ds} = e^{-\{H_0(c_j t) - H_0(c_j r_{j-1})\}} = \frac{S_0(c_j t)}{S_0(c_j r_{j-1})}.$$
We have
$$S(t \mid x, \beta) = \Bigg\{\prod_{j=1}^{j^*(t)} p_j\Bigg\}\, S_{j^*(t)+1}(t) = \Bigg\{\prod_{j=1}^{j^*(t)} \frac{S_0(c_j r_j)}{S_0(c_j r_{j-1})}\Bigg\}\, \frac{S_0\big(c_{j^*(t)+1}\, t\big)}{S_0\big(c_{j^*(t)+1}\, r_{j^*(t)}\big)}, \qquad (3)$$
for all t ≤ u. This essentially characterizes the model.
For technical reasons, we require a slightly more elaborate specification than what was given above. We assume that the covariate processes x(s) take on their last values, $x(r_m)$, for all $s \geq r_m$. Thus formula (3) is defined for all t > 0 and, provided $c_{m+1} > 0$, the survivor function defined there is proper, since then $\lim_{t\to\infty} S(t \mid x, \beta) = 0$.
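The survivor function (3) is simple to evaluate for a step covariate. A Python sketch (our function names; S0 is any baseline survivor function passed in as a callable, and c holds $c_1, \ldots, c_{m+1}$):

```python
def pktd_survival(t, r, c, S0):
    """Eq. (3): PKTD survivor at t for changepoints 0 = r_0 < r_1 < ... < r_m
    and acceleration factors c_1, ..., c_{m+1}, c_j = exp(x(r_{j-1}) beta)."""
    rr = [0.0] + list(r)
    jstar = sum(1 for rj in r if rj <= t)          # j*(t) = max{j : r_j <= t}
    s = 1.0
    for j in range(1, jstar + 1):                  # the factors p_j = S0(c_j r_j)/S0(c_j r_{j-1})
        s *= S0(c[j - 1] * rr[j]) / S0(c[j - 1] * rr[j - 1])
    cnext = c[jstar]                               # c_{j*(t)+1}
    return s * S0(cnext * t) / S0(cnext * rr[jstar])
```

With a single changepoint at r1 and c = (1, e^β), this reproduces the PKTD curve used in the simulations of Section 4.1; the function is continuous at each changepoint, while the hazard may jump.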
Combining the PKTD specification of how covariates modify the baseline survival function
through time with an MPT prior for S0, we can construct the full likelihood. Our introduction
of the MPT is reflected by use of the notation $(\mathcal{Y}_M, \theta)$. In particular, the contribution from an individual with covariate process x(·) and survival time T = t is
$$L_x(\beta, \mathcal{Y}_M, \theta \mid T = t) = \Bigg\{\prod_{j=1}^{m} p_j\Bigg\}\, \frac{f_0(c_{m+1} t \mid \mathcal{Y}_M, \theta)\, c_{m+1}}{S_0(c_{m+1} r_m \mid \mathcal{Y}_M, \theta)},$$
where, by (2), $f_0(c_{m+1} t \mid \mathcal{Y}_M, \theta) = 2^M p_N\, g_\theta(c_{m+1} t)$ with N the integer part of $2^M G_\theta(c_{m+1} t) + 1$. This results because the last observed jump occurs before u, and so $j^*(t) = m$, $c_{j^*(t)+1}$ becomes simply $c_{m+1}$, and $p_j = S_0(c_j r_j \mid \mathcal{Y}_M, \theta)/S_0(c_j r_{j-1} \mid \mathcal{Y}_M, \theta)$. Taking the derivative of $1 - S(t \mid x, \beta)$ in (3) yields the density evaluated at T = t. The likelihood contribution for an observation right-censored at time t is
$$L_x(\beta, \mathcal{Y}_M, \theta \mid T > t) = \Bigg\{\prod_{j=1}^{m} p_j\Bigg\}\, \frac{S_0(c_{m+1} t \mid \mathcal{Y}_M, \theta)}{S_0(c_{m+1} r_m \mid \mathcal{Y}_M, \theta)}.$$
The complete data involve n independent event times $\{t_i\}_{i=1}^n$ that are either observed survival times ($T_i = t_i$) or right-censoring times ($T_i > t_i$), and n covariate processes $\{x_i(\cdot)\}_{i=1}^n$. There are thus n likelihood contributions like the one above. Define the contribution for individual i with covariate process $x_i(\cdot)$ as $L_i(\beta, \mathcal{Y}_M, \theta)$. Then the complete likelihood is $L(\beta, \mathcal{Y}_M, \theta) = \prod_{i=1}^n L_i(\beta, \mathcal{Y}_M, \theta)$. There are additional notational changes; for example, $m_i$ is the number of TDC changepoints for individual i occurring at or before time $u_i$, and $r_{ij}$ is the time of the jth changepoint for individual i, etc.
The random quantities in the prior are thus YM , θ, and β, and all are assumed a priori inde-
pendent. We assume throughout that (β, θ) has an improper uniform distribution. Our methods
are easily modified to account for real prior information (Bedrick, Christensen & Johnson, 2000).
In the Gibbs sampler we alternate between sampling β, θ|YM and YM |β, θ (where dependence
on the data is suppressed). The former can be sampled via a Metropolis–Hastings step (Tierney,
1994) or slice sampling (Neal, 2003).
The full conditional $\mathcal{Y}_M \mid \beta, \theta$ will not have a recognizable closed form. A simple Metropolis–Hastings step for updating the components $(Y_{\varepsilon 0}, Y_{\varepsilon 1})$ first samples a candidate $(Y^*_{\varepsilon 0}, Y^*_{\varepsilon 1})$ from a $\mathrm{Dirichlet}(m Y_{\varepsilon 0}, m Y_{\varepsilon 1})$ distribution, where m > 0, typically m = 20 or 30. This candidate is accepted as the "new" $(Y_{\varepsilon 0}, Y_{\varepsilon 1})$ with probability
$$\rho = \min\left\{1,\; \frac{\Gamma(m Y_{\varepsilon 0})\, \Gamma(m Y_{\varepsilon 1})\, (Y_{\varepsilon 0})^{m Y^*_{\varepsilon 0} - w j^2}\, (Y_{\varepsilon 1})^{m Y^*_{\varepsilon 1} - w j^2}\, L(\beta, \mathcal{Y}^*_M, \theta)}{\Gamma(m Y^*_{\varepsilon 0})\, \Gamma(m Y^*_{\varepsilon 1})\, (Y^*_{\varepsilon 0})^{m Y_{\varepsilon 0} - w j^2}\, (Y^*_{\varepsilon 1})^{m Y_{\varepsilon 1} - w j^2}\, L(\beta, \mathcal{Y}_M, \theta)}\right\},$$
where j is the number of digits in the binary number ε0 and $\mathcal{Y}^*_M$ is the set $\mathcal{Y}_M$ with $(Y^*_{\varepsilon 0}, Y^*_{\varepsilon 1})$ replacing $(Y_{\varepsilon 0}, Y_{\varepsilon 1})$. This may be done in blocks or all at once with the acceptance probability changed accordingly, although we have had good luck simply sampling the components of $\mathcal{Y}_M$ one at a time.
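A Python sketch of this update for a single pair (our function names; `loglik` stands for the map from a candidate pair to log L(β, YM, θ), which in practice means re-evaluating the likelihood with that pair swapped in):

```python
import math
import random

def log_accept_ratio(Y, Ystar, m, w, j, loglik_cur, loglik_prop):
    """log rho for candidate (Y*_eps0, Y*_eps1) ~ Dirichlet(m Y_eps0, m Y_eps1);
    j is the number of digits in the binary number eps0, and the prior on the
    pair is Dirichlet(w j^2, w j^2)."""
    y0, y1 = Y
    z0, z1 = Ystar
    a = w * j * j
    num = (math.lgamma(m * y0) + math.lgamma(m * y1)
           + (m * z0 - a) * math.log(y0) + (m * z1 - a) * math.log(y1) + loglik_prop)
    den = (math.lgamma(m * z0) + math.lgamma(m * z1)
           + (m * y0 - a) * math.log(z0) + (m * y1 - a) * math.log(z1) + loglik_cur)
    return num - den

def mh_update(Y, m, w, j, loglik, rng):
    """One Metropolis-Hastings update of a single (Y_eps0, Y_eps1) pair."""
    z0 = rng.betavariate(m * Y[0], m * Y[1])       # Dirichlet on two cells = Beta
    cand = (z0, 1.0 - z0)
    lr = log_accept_ratio(Y, cand, m, w, j, loglik(Y), loglik(cand))
    if math.log(rng.random()) < min(0.0, lr):
        return cand
    return Y
```

A quick sanity check on the ratio: when the candidate equals the current pair and the likelihood values agree, the log ratio is zero, so ρ = 1.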
For interval censored data, where T ∈ [a, b), for b < ∞ we assume that the observed TDC’s
are constant over the interval [a, b). This would be the case for events like death since if vital signs
were taken at some time during the censoring interval, then it would be known that the individual
was still alive at the time of taking the information. We assume that u = a and that there is no
new measurement on any TDC in [a, b). If this is not the case, our formulas are easily modified,
though at the expense of losing notational simplicity, to account for the extra information. The
likelihood contribution is
$$L_x(\beta, \mathcal{Y}_M, \theta \mid T \in [a, b)) = \Bigg\{\prod_{j=1}^{m} p_j\Bigg\}\, \frac{S_0(c_{m+1} a \mid \mathcal{Y}_M, \theta) - S_0(c_{m+1} b \mid \mathcal{Y}_M, \theta)}{S_0(c_{m+1} r_m \mid \mathcal{Y}_M, \theta)}.$$
The likelihood contribution for right-truncated data is similarly obtained.
3.2. The AFT Model of Cox & Oakes (1984)

The MPT approach allows fitting versions of a model due to Cox & Oakes (1984) and the PH model (Cox, 1972), discussed in the next section, for time-dependent covariates. Cox & Oakes (1984) develop a model in which an individual with covariates x(·) and survival time T uses up her life at the rate of $e^{-x(t)\beta}$ relative to "baseline time." Where T0 is the time a baseline individual lives, this implies the relationship $T_0 = \int_0^T e^{x(s)\beta}\, ds$. The model is succinctly written as
$$S(t \mid x, \beta) = S_0\big(c(t)\, t\big), \qquad c(t) = \frac{1}{t} \int_0^t e^{x(s)\beta}\, ds.$$
Here, we can interpret c(t) as the average value of an acceleration factor $e^{x(s)\beta}$ over $s \in [0, t)$. The equivalent specification through the hazard function is
$$h(t \mid x, \beta) = e^{x(t)\beta}\, h_0\big(c(t)\, t\big).$$
A generalization of this model that also includes PH was given by Shyur, Elsayed & Luxhøj (1999), who model log h0(·) using quadratic splines. More recently, Tseng, Hsieh & Wang (2005) consider jointly modelling survival via COTD with a univariate TDC, modelled via a normal random-effects model. In this article we place an MPT prior on S0 and proceed as before, noting that
$$S(t \mid x, \beta) = S_0\Bigg( (t - r_{j^*(t)})\, c_{j^*(t)+1} + \sum_{i=1}^{j^*(t)} (r_i - r_{i-1})\, c_i \,\Bigg|\, \mathcal{Y}_M, \theta \Bigg),$$
and
$$f(t \mid x, \beta) = f_0\Bigg( (t - r_{j^*(t)})\, c_{j^*(t)+1} + \sum_{i=1}^{j^*(t)} (r_i - r_{i-1})\, c_i \,\Bigg|\, \mathcal{Y}_M, \theta \Bigg)\, c_{j^*(t)+1}.$$
See also (18) in Bagdonavicius & Nikulin (2000b).
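A Python sketch of the COTD survivor for a step covariate (our function names; c again holds $c_1, \ldots, c_{m+1}$, the step values of $e^{x(s)\beta}$):

```python
def cotd_warp(t, r, c):
    """Warped time c(t) * t = (t - r_{j*}) c_{j*+1} + sum_{i=1}^{j*} (r_i - r_{i-1}) c_i."""
    rr = [0.0] + list(r)
    jstar = sum(1 for rj in r if rj <= t)
    used = sum((rr[i] - rr[i - 1]) * c[i - 1] for i in range(1, jstar + 1))
    return used + (t - rr[jstar]) * c[jstar]

def cotd_survival(t, r, c, S0):
    """Cox & Oakes (1984): S(t | x, beta) = S0(c(t) t)."""
    return S0(cotd_warp(t, r, c))
```

Because the warped time accumulates the entire covariate history, past covariate episodes continue to influence residual life, in contrast to the PKTD model.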
One can understand this model through the survivor function $S(t \mid x, \beta) = S_0(c(t)\, t)$. Suppose the covariate process has been standardized so that an individual with c(t) ≡ 1 for all t can be regarded as a baseline individual with survivor function S0. Then we can
compare this individual to one with c(t) = 2, say, for a value of t of interest. Then the prospects
of living at least t units of time for this individual are the same as the prospects for the baseline
individual to live at least 2t units. This model then allows survival prospects to slide forwards or
backwards relative to a hypothetical baseline individual. It is not possible with the PKTD model
to make such a simple comparison with a baseline individual.
Now consider the conditional survival distributions, given survival up to time s, in the PKTD and COTD models, namely
$$\exp\Big\{-\int_s^t e^{x(v)\beta}\, h_0\big(v\, e^{x(v)\beta}\big)\, dv\Big\} \qquad \text{and} \qquad \exp\Big\{-\int_s^t e^{x(v)\beta}\, h_0\big(v\, c(v)\big)\, dv\Big\},$$
respectively. It is clear then that in the PKTD model, median (or any quantile) residual life
and the hazard function immediately change to values associated with the change in covariates,
while the effect of the covariates prior to time s may have an effect on the comparable quantities
under the COTD model. For example, consider two individuals, A and B, identical in every way except that individual A develops a staph infection at time t1. According to the PKTD model, as soon as this person becomes infected, his or her median residual life and hazard function are the same as those of others who have had staph since the start, for any time after t1. Say at t2 the staph infection has been eradicated. Then, given that individual A has lived past t2, it is as if the staph infection never happened, whereas under the model of Cox and Oakes the median residual life and hazard functions would reflect this unhappy and (presumably) detrimental episode well into the future. That is, under Cox and Oakes the median residual life would be shorter and the hazard
higher for all time after t2 relative to the PKTD model. Thus when considering residual life, the
PKTD model forgets previous time-dependent events while the COTD model has the capacity
to remember. When considering overall (unconditional) survival, both types of curves will drop
at rates dictated by the relative effects of the TDC’s as they vary in time.
3.3. The Model of Cox (1972)

The hazard function under the CTD model for an individual is
$$h(t \mid x, \beta) = e^{x(t)\beta}\, h_0(t).$$
Then given S0 and β we obtain
$$S(t \mid x, \beta) = \prod_{j=1}^{j^*(t)} \left[\frac{S_0(r_j)}{S_0(r_{j-1})}\right]^{c_j} \left[\frac{S_0(t)}{S_0(r_{j^*(t)})}\right]^{c_{j^*(t)+1}},$$
and
$$f(t \mid x) = \prod_{j=1}^{j^*(t)} \left[\frac{S_0(r_j)}{S_0(r_{j-1})}\right]^{c_j} \left[\frac{c_{j^*(t)+1}\, S_0(t)^{c_{j^*(t)+1}-1}\, f_0(t)}{S_0(r_{j^*(t)})^{c_{j^*(t)+1}}}\right].$$
See also page 15 in Bagdonavicius & Nikulin (2000a); the methods of Section 3.1 apply.
Like the PKTD model, the model due to Cox (1972) also has the property that a change in
covariate value at a given time immediately changes the individual’s hazard. Until there is another
change, regression parameters are interpreted the same as in the fixed-covariate case. This ease
of interpretation makes the PH model attractive.
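The CTD survivor above admits the same style of sketch (Python, our function names; S0 a baseline survivor function passed in as a callable):

```python
def ctd_survival(t, r, c, S0):
    """Cox (1972) with a step TDC:
    S(t) = prod_{j<=j*} [S0(r_j)/S0(r_{j-1})]^{c_j} * [S0(t)/S0(r_{j*})]^{c_{j*+1}}."""
    rr = [0.0] + list(r)
    jstar = sum(1 for rj in r if rj <= t)
    s = 1.0
    for j in range(1, jstar + 1):
        s *= (S0(rr[j]) / S0(rr[j - 1])) ** c[j - 1]
    return s * (S0(t) / S0(rr[jstar])) ** c[jstar]
```

With a covariate that is constant for all time (a single c1), this reduces to $S_0(t)^{c_1}$, the familiar PH form.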
3.4. Commonality Among Models

All three models share the same baseline distribution. Therefore, if one has information on baseline survival, one need only consider one prior distribution on S0 for all three models. This is what we do.
Moreover, if S0 is exponential with parameter θ, then the PKTD, COTD, and CTD models are all the same model. The likelihood is
$$L(\beta, \theta) = \prod_{i=1}^{n} \Bigg\{\prod_{j=1}^{m_i} e^{-\theta\, [r_{ij} - r_{i,j-1}]\, e^{x_i(r_{i,j-1})\beta}}\Bigg\}\, e^{-\theta\, [t_i - r_{i,m_i}]\, e^{x_i(r_{i,m_i})\beta}}\, \theta^{\delta_i},$$
where $\delta_i$ indicates an exactly observed event time.
This is the product of right-censored or exactly observed exponential "observations" and can be fit readily in SAS, S-plus, WinBUGS, et cetera, thus providing crude but very reasonable starting values and covariance matrices for the candidate generating distributions for the Metropolis step to sample β. We also take this approach. Bagdonavicius & Nikulin (2000b) show that the intersection of the CTD and COTD models is the class of exponential regression models.

Since all models that we discuss share the same baseline, we centre our baseline on the same family as well. Natural candidates are parametric AFT models of the form $G_\theta(t) = G_0\{(\log(t) - \mu)/\sigma\}$, θ = (µ, σ), where G0(·) is either a standard normal, standard logistic, or extreme value CDF; this specification includes the intercept µ in Gθ.
3.5. Model Comparison

We consider the problem of model selection, assuming that a set of important predictors has been identified a priori. Choosing among non-nested semiparametric models can be problematic since traditional information criteria, such as the Akaike information criterion, rely on the use of a full, differentiable likelihood. Although popular, Bayes factors are undefined for improper priors, and can be quite sensitive to the choice of prior specification when real priors are used.

An alternative measure, termed the log pseudomarginal likelihood (LPML) measure (Geisser & Eddy, 1979), is an aggregated measure of model fit based on the "leave-one-out" principle used in procedures such as cross validation and the jackknife. The LPML measure also leads to a "pseudo Bayes factor" that can be interpreted in much the same manner as a traditional Bayes factor. In addition to being easy to compute, the LPML statistic does not depend on model "focus" as does the Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002), and, unlike Bayes factors, is defined when specifying improper priors.
In each model, for i = 1, ..., n, we also obtain the Conditional Predictive Ordinate (CPO) (Geisser, 1993) statistic:
$$\mathrm{CPO}_i = \big[S(t_i \mid x_i, D_{-i})\big]^{1-\delta_i}\, \big[f(t_i \mid x_i, D_{-i})\big]^{\delta_i}.$$
For an uncensored observation, CPO_i is the posterior predictive density given the subset $D_{-i}$, evaluated at the observed survival time $t_i$. For a censored observation, $\mathrm{CPO}_i = P(T_i > t_i \mid D_{-i})$. The LPML statistic is given by
$$\mathrm{LPML} = \sum_{i=1}^{n} \log(\mathrm{CPO}_i),$$
and the pseudo Bayes factor between models 1 and 2 is defined to be $\mathrm{PBF}_{12} = \exp(\mathrm{LPML}_1 - \mathrm{LPML}_2)$, where $\mathrm{LPML}_i$ is the statistic from model i. The ratios of CPO_i statistics from two different models provide relative information on how well each case is fit. Chen, Shao & Ibrahim (2000) discuss MCMC computation of CPO_i. See Sahu & Dey (2004) for a recent application of CPO and LPML statistics for comparing survival models.
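One common MCMC estimator of CPO_i (in the spirit of the computations discussed by Chen, Shao & Ibrahim, 2000; the implementation here is our sketch, not necessarily the authors' exact one) is the harmonic mean of the ith likelihood contribution across posterior draws, computed on the log scale for numerical stability:

```python
import math

def lpml_from_draws(loglik_draws):
    """loglik_draws[s][i] = log L_i at posterior draw s (log density for an
    exactly observed time, log survivor value for a censored one).  Estimates
    CPO_i = 1 / mean_s(1 / L_i) and returns LPML = sum_i log CPO_i."""
    S = len(loglik_draws)
    n = len(loglik_draws[0])
    lpml = 0.0
    for i in range(n):
        neg = [-loglik_draws[s][i] for s in range(S)]       # log(1 / L_i) per draw
        mx = max(neg)                                       # log-sum-exp trick
        log_mean_inv = mx + math.log(sum(math.exp(v - mx) for v in neg) / S)
        lpml += -log_mean_inv                               # log CPO_i
    return lpml
```

The pseudo Bayes factor between two fitted models is then simply `math.exp(lpml1 - lpml2)`.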
Since there has not yet been justification of the use of the DIC (Spiegelhalter et al., 2002) in semiparametric settings, we will instead simply calculate the posterior expected log likelihood
$$\mathrm{PELL} \equiv \int \log L(\beta, \mathcal{Y}_M, \theta)\, p(\beta, \mathcal{Y}_M, \theta \mid D)\, d(\beta, \mathcal{Y}_M, \theta),$$
where $L(\beta, \mathcal{Y}_M, \theta)$ is the likelihood. The model with the largest expected log likelihood is regarded as more plausible. Carlin & Louis (2008) suggest that differences in DIC of less than 5 are hardly worth mentioning, with differences greater than 10 perhaps decisively indicating a preferred model in terms of prediction. Furthermore, Draper & Krnjajic (2007, Sec. 4.1) have shown that the DIC can be viewed as an approximation to the LPML. With this in mind, we regard differences in LPML that are greater than, say, 5 or 10 in magnitude as "decisive."
4. SIMULATED DATA AND EXAMPLES
In this section two data sets are considered: the classic Stanford heart transplant data and data involving cerebral edema in children with diabetic ketoacidosis. Data simulated exactly according to each model provide some indication of how well the model selection criteria work, of what the MPT extension of the parametric log-logistic model adds, and of how well regression effects are estimated.
4.1. Simulated Data

A modest simulation study was carried out to determine the ability of PELL and LPML to correctly identify the data generating mechanism, as well as the ability of the model to tie down the correct regression effect. To keep things simple, data were generated from two covariate groups: a baseline group with $x_0(t) = 0$ for all t ≥ 0, and a second group with a covariate process consisting of one jump, $x_1(t) = I\{t \geq 4\}$; the baseline covariate was chosen with probability 0.5. Baseline survival is assigned a mixture of two normal distributions, $0.7\, N(6, 1.5^2) + 0.3\, N(12, 1)$; that is, the baseline survival curve for each model is given by $S_0(t) = 1 - 0.7\,\Phi\{(t-6)/1.5\} - 0.3\,\Phi\{t-12\}$. The survival curves for the group with TDC $x_1(t)$ are given by
$$S_1(t) = \begin{cases} I\{t<4\}\, S_0(t) + I\{t \geq 4\}\, S_0\big(4 + (t-4)\, e^{\beta}\big), & \text{COTD} \\ I\{t<4\}\, S_0(t) + I\{t \geq 4\}\, S_0(4)\, S_0(e^{\beta} t)/S_0(e^{\beta} 4), & \text{PKTD} \\ I\{t<4\}\, S_0(t) + I\{t \geq 4\}\, S_0(4)^{1-e^{\beta}}\, S_0(t)^{e^{\beta}}. & \text{CTD} \end{cases}$$
The simulation reflects the situation where a covariate switches "on" at time t = 4 with probability 0.5. Figure 1 illustrates the differences among the models for β = 0.5. The PKTD model exhibits the most pronounced deviation from baseline, with time warped instantaneously at t = 4 for those with covariate x1(t). The COTD model also shows a time-warping effect, but a less drastic one; for example, the distance between modes is a bit larger. The CTD model shows the least deviation from baseline and relatively little "warping."
Figure 1: Survival densities for x1(t): dashed = PKTD, solid = COTD, thick solid = CTD; dots = baseline x0(t).
The regression effect was fixed at β = 0.5. The same random-walk Metropolis–Hastings update was used for each model:
$$\begin{pmatrix} \mu^* \\ \beta^* \\ \sigma^* \end{pmatrix} \sim N_3\left( \begin{pmatrix} \mu \\ \beta \\ \sigma \end{pmatrix},\; \begin{bmatrix} 0.02 & 0.01 & 0.00 \\ 0.01 & 0.10 & 0.00 \\ 0.00 & 0.00 & 0.02 \end{bmatrix}^2 \right).$$
Each model was fit assuming a baseline log-logistic centring family with c = 1 and J = 5. The log-logistic family including an intercept was indexed by θ = (µ, σ) as
$$G_\theta(t) = \frac{t^{1/\sigma} \exp(-\mu/\sigma)}{1 + t^{1/\sigma} \exp(-\mu/\sigma)}.$$
Inferences from each fit were obtained from 10,000 MCMC iterates kept after a burn-in of 2,000.
Starting values were σ = 0.3, µ = 2.0, and β = 0.5, reflecting an approximate log-logistic fit to
the true bimodal baseline density.
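As a quick numerical sanity check on the centring family (plain Python; e^µ is the median, since u = t^{1/σ} e^{−µ/σ} equals 1 there):

```python
from math import exp

def G(t, mu=2.0, sigma=0.3):
    """Log-logistic centring CDF: G(t) = u/(1 + u) with u = t^(1/sigma) e^(-mu/sigma)."""
    if t <= 0.0:
        return 0.0
    u = t ** (1.0 / sigma) * exp(-mu / sigma)
    return u / (1.0 + u)
```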
The underlying parametric model (c → ∞) was also fit to assess the necessity of the MPT
generalization versus a unimodal log-logistic centring distribution (recall that the true baseline
is bimodal). One hundred data sets of size n = 200 were generated for each of the three models
and PELL, LPML, coverage probabilities for β as well as interval size computed. The results of
the simulation study are summarized in Table 1. The mean credible interval sizes for the effect
β = 0.5 were quite similar across MPT and parametric models and are not included. Similarly,
the PELL and LPML rankings were similar, so only LPML are reported. The first four rows
of Table 1 pertain to estimating β = 0.5 under parametric and nonparametric models. For both
AFT models the mean squared error (MSE) decreases when the nonparametric model is used and
the coverage increases (PKTD) or stays the same (COTD). For the Cox model these trends are
reversed.
For a true model matched with itself, the proportion in the lower part of the table gives the
proportion of times the MPT model was picked relative to the parametric log-logistic model.
The more general MPT model is clearly preferable to the simple parametric model in terms of
prediction for all three models, most markedly the PKTD and CTD models.
Table 1: Results from simulating data of size n = 200 from each model 100 times.

                          True model
                      PKTD   COTD   CTD
√MSE—MPT              0.10   0.08   0.24
√MSE—parametric       0.17   0.14   0.14
Coverage—MPT          0.90   0.90   0.82
Coverage—parametric   0.65   0.90   0.96

Fitted model              Proportion picked
PKTD                  1.00   0.65   0.92
COTD                  0.90   0.87   0.96
CTD                   0.98   0.63   1.00
The off-diagonals give the proportion of times the true column model was picked relative to the
row. For example, for the PKTD model, the MPT PKTD model was picked over the parametric
model 100% of the time, the PKTD MPT model was picked over the COTD MPT model 90% of
the time, and the PKTD MPT model was picked over the CTD MPT model 98% of the time.
Figure 1 shows that the difference in the effect of two covariate trajectories x0(t) and x1(t)
on predictive densities is minor for the CTD model compared to the difference for PKTD and
COTD, perhaps contributing to the apparent paradox in terms of interval coverage and MSE.
We also performed simulations with covariates that jump at random locations, as well as other
scenarios. Overall, the “best” model as chosen by either PELL or LPML provided reasonable
estimates of the true underlying densities, whether or not the model chosen was the true model
that generated the data. This highlights the fact that the best predictive model may not necessarily
be the “true” model. However, the ability of the criteria to pick the true model increased with the
sample size.
4.2. Stanford Heart Transplant Data

Crowley & Hu (1977) presented data on patients admitted to the Stanford Heart Transplant Program
and analyzed it using the Cox model with TDC's. Lin & Ying (1995) use these same data to
illustrate their semiparametric estimation procedure for COTD that is based on a heuristically
constructed estimating equation and justified via asymptotic properties. Here we fit to these data
the models CTD, COTD, and PKTD using the MPT prior with a log-logistic base-measure, M = 5
and w = 1. Patients in the program either underwent a heart transplant operation or did not. For those
patients that did not receive a new heart the TDC process is xi(t) ≡ 0. For patient i who received
a transplant, we denote by zi the time of transplant and define the TDC’s
xi1(t) = { 0 if t < zi;  1 if t ≥ zi },
xi2(t) = { 0 if t < zi;  age at transplant − 35 if t ≥ zi },
xi3(t) = { 0 if t < zi;  mismatch score − 0.5 if t ≥ zi },

and xi(t) = (xi1(t), xi2(t), xi3(t))′. Results from the three posterior distributions are displayed in
Table 2.
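These step TDC's can be coded in a few lines (illustrative sketch; the function and argument names are ours, not from the paper):

```python
def heart_tdc(t, z=None, age=None, mismatch=None):
    """x_i(t) for the Stanford data: all zeros before transplant (or always,
    if no transplant occurred, z is None); after transplant time z the
    indicator switches on and the centred age and mismatch score apply."""
    if z is None or t < z:
        return (0.0, 0.0, 0.0)
    return (1.0, age - 35.0, mismatch - 0.5)
```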
These data were first analyzed using the CTD model by Crowley & Hu (1977) and involve
the time to death from after entry into the study, which was designed to assess the effect of
Table 2: Posterior inferences for Stanford heart transplant data.

Parameter      PKTD                    COTD                   CTD-MPT                CTD
PELL           −461.3                  −460.5                 −458.3
LPML           −468.0                  −467.0                 −464.1
Status         −1.76 (−3.86, 1.57)     −1.10 (−2.70, 0.50)    −1.04 (−1.99, −0.17)   −1.04 (−2.01, −0.07)
Age-35         0.104 (−0.020, 0.260)   0.054 (−0.004, 0.133)  0.058 (0.015, 0.107)   0.055 (0.010, 0.100)
Mismatch-0.5   1.63 (−0.38, 3.89)      0.64 (−0.30, 1.52)     0.49 (−0.09, 1.03)     0.49 (−0.06, 1.04)
heart transplant on survival. Individuals entered the study and many received donor hearts at
some point according to availability of an appropriate heart and a prioritization scheme. Some
patients died before a suitable heart was found. The main TDC we considered was an indicator of
having received a heart, yes or no, at each time t. The second and third TDC's were age at
transplant, centred at 35 years, and a mismatch score, centred at 0.5. These TDC's switched on when
the heart was transplanted. While there are other covariates, these are the ones used by Lin &
Ying (1995), and which correspond to one of the models fit by Crowley & Hu (1977). Table 2
has posterior regression effect estimates and model selection criteria, along with results from
maximizing the partial likelihood. The models are ranked CTD, COTD, PKTD using the LPML
and PELL measures, although the differences among models are not large.
Our estimates for the CTD model are quite close to those obtained via partial likelihood.
Under the CTD model, consider two individuals aged 35 years with mismatch scores of 0.5. The
first individual receives no heart transplant while the second receives a new heart after 6 months.
The relative hazard comparing the individual with no heart transplant to the one with the heart
transplant equals one from time 0 to 6 months, and is e^{−β1} from that time on. A 95% probability
interval for the relative hazard after 6 months is (1.19, 7.31), and the posterior median is 2.83. So
under this model, the individual without the heart transplant is estimated to have about three times
the risk of the individual with the heart transplant, from 6 months on. Figure 2 displays estimated
survivor curves for these two individuals, and their 95% limits from the MPT CTD model and the
defaults using the survival package for the R computing language (e.g., the Aalen estimator
of survival coupled with the partial likelihood estimate of β). The two approaches agree quite
well. Turning to the PKTD model, the estimated median/mean residual life increases by a factor
of e^{1.76} ≈ 5.81 after transplant; that is, transplant recipients can expect to live about six times as
long as those without transplants.
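These summary numbers follow directly from the posterior medians in Table 2; a quick check (plain Python):

```python
from math import exp

# CTD: relative hazard of no-transplant vs. transplant after 6 months is
# exp(-beta1); plugging in the posterior median beta1 = -1.04 from Table 2:
ctd_relative_hazard = exp(1.04)        # about 2.83

# PKTD: multiplicative change in median/mean residual life after transplant,
# using the PKTD Status estimate -1.76 from Table 2:
pktd_residual_life_factor = exp(1.76)  # about 5.81
```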
For comparison, we fit the model with a parametric exponential baseline survival func-
tion, yielding posterior median estimates for (β1, β2, β3) of (−2.74, 0.08, 0.98) and an LPML
of −486.3, much smaller than any value in Table 2. Thus the exponential model, the intersection
of the three semiparametric models, does not appear to be appropriate for these data.
Lin & Ying's (1995) estimates for the COTD model were (−1.99, 0.096, 0.93), found from their
Table 1. These values are closer to the exponential fit described above than to our semiparametric
fit. This, in one sense, is not surprising. A component of the score function used in their estimation
procedure involves the unknown ratio λ′0(t)/λ0(t). For simplicity, Lin & Ying take this to be equal
to 0, which is optimal for an exponential baseline. We note that when we tried minimizing Lin
and Ying’s estimating function ‖U(b)‖ over a grid with λ0(t) ∝ 1, we obtained the point estimate
(−2.49, 0.11, 1.07), which is not too dissimilar to theirs and even closer to that obtained from the
Bayesian model assuming an exponential baseline.
Figure 2: Estimated survival curves and 95% probability intervals for individuals with mismatch score = 0.5 and age = 35.
4.3. Cerebral Edema

Here we develop survival models for data collected by Glaser et al. (2001), who assessed risk
factors associated with the onset of cerebral edema (CE) in children with diabetic ketoacidosis.
Cerebral edema is a dangerous complication associated with emergency department and in-patient
hospital care of children with diabetic ketoacidosis. Children with symptoms of diabetic ketoaci-
dosis are initially treated in the emergency department, then moved to the hospital, typically the
paediatric intensive care unit, over the course of 24 h. The main purpose of treatment is to nor-
malize blood serum chemistry and acid-base abnormalities. A major, but infrequent complication
of children associated with diabetic ketoacidosis and its treatment is CE, or swelling in the brain,
which may result in death or permanent neurological damage. After 24 h, the child is considered
to be “cured,” so we only consider those children who actually developed CE; thus there is no
censoring for these data (n = 58).
Glaser et al. (2001) developed epidemiologic models for assessing the probability of onset as a
function of potential risk factors and confounders. Our goal, on the other hand, is limited to
ascertaining the effect of treatment procedures, which are changing in time, and the effect of
fixed covariates, on the timing of CE. For example, what are the factors associated with early
versus late onset, if any, among those who develop CE? We could have considered an alternative
formulation where the yes/no binary onset is modeled as a binary regression jointly with the
survival model considered here for those who did experience CE. The development in Hanson et
al. (2003) illustrates such an analysis, in the absence of time-dependent covariates. They modeled
the event of abortion in dairy cattle (yes/no) as a binary regression and, conditional on abortion,
modeled the timing of abortion with a semi-parametric survival model. If animals failed to abort
within 260 days, they would have given birth so there was also no censoring, unless a cow was
culled before aborting or giving birth.
4.3.1. The data

At the time of initial admission to the emergency department, several measurements were taken,
and at the same time, various treatments applied, continuously for up to 24 h. The original data
include 19 fixed covariates and 12 time-dependent covariates. Rather than attempt a complete
analysis with so many variables, we considered only a subset of the covariates for our illustration,
thus the analysis should only be considered as a demonstration of how our modelling works rather
than a definitive analysis of these data.
The only fixed variable considered here is age (in years) at the time of admission to the emer-
gency department. Time-dependent covariates were recorded hourly rather than continuously.
Two types of TDC’s are considered. The first involves simply monitoring biochemical variables
over time. Two of these included here are serum bicarbonate (Serum-BIC) (concentration in the
blood measured in mmol/L) and blood urea nitrogen (BUN) (concentration measured in mg/dl).
The second type involves actions by the physicians over time. The two used here are fluids admin-
istered (FL) (volume of fluids inml/kg/h) and sodium administered (NA) (measured inmEq/kg/h).
The ith patient in the data has a covariate process given by {(rij, x(rij)) : j = 1, . . . , mi},
i = 1, . . . , 58, and
is diagnosed with CE at time ti; note that ri,mi < ti. None of the event times are censored.
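Because the TDC's are recorded hourly, each covariate path is a step function; a minimal lookup built on the recorded pairs (standard library only; names are ours):

```python
import bisect

def step_process(times, values):
    """Right-continuous step covariate built from recorded pairs (r_ij, x(r_ij)):
    x(t) is the value at the latest recording time <= t. `times` must be
    sorted with times[0] = 0 so every t >= 0 is covered."""
    def x(t):
        j = bisect.bisect_right(times, t) - 1
        return values[j]
    return x
```

For instance, step_process([0, 1, 2], [5.0, 7.0, 9.0]) holds 5.0 on [0, 1), 7.0 on [1, 2), and 9.0 from hour 2 onward.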
4.3.2. Initial data analysis

We used the log-logistic family with unknown scale to centre the three MPT survival models with
TDC’s with M = 4 and w = 1. After selecting candidate generating distributions, we were able
to establish convergence of our chains by standard techniques after 10,000 iterations. Estimates
based on fitted models are shown in Table 3.
Plots of log CPOi values versus the index i showed that the data are supported similarly by the
three models. Plots of the residuals versus predicted values also showed no obvious lack of fit for
Table 3: Posterior inferences for cerebral edema data.
Par. PKTD COTD CTD
PELL −167 −168 −168
LPML −176 −176 −175
Age (Fixed) 0.028 (−0.01, 0.08) 0.021 (−0.02, 0.07) 0.044 (−0.02, 0.11)
Serum-Bic (TD) 0.04 (−0.01, 0.13) 0.05 (−0.02, 0.12) 0.06 (−0.05, 0.17)
Serum-BUN (TD) −0.005 (−0.02, 0.01) −0.01 (−0.022, 0.005) −0.00 (−0.03, 0.03)
Adm-FL (TD) −0.03 (−0.09, 0.03) −0.05 (−0.10, 0.02) −0.05 (−0.15, 0.04)
Adm-NA (TD) 0.60 (0.16, 0.93)∗ 0.74 (0.18, 1.2)∗ 0.90 (0.19, 1.57)∗
FL×NA −0.011 (−0.03, −0.00)∗ −0.013 (−0.03, 0.001) −0.014 (−0.04, 0.003)
Serum-Bic2 −0.005 (−0.01, 0.006) −0.006 (−0.02, 0.003) −0.007 (−0.02, 0.005)
any of the models. Prediction intervals for the actual observations were skewed, but all contained
the observed values. Table 3 gives PELL and LPML values for each model, and there is again no
obvious distinction among the models.
Estimates of regression coefficients for all variables in the models have the same sign and
general magnitude across models. Under all models, there is a 99% posterior probability that the
coefficient for Admin-NA is positive and at least a 96% posterior probability that the coefficient
for the interaction is negative. The Serum-BIC variable has at least a 94% probability of being
positive across models. Other variables have between an 80% and 91% posterior probability of
being positive, or negative.
According to all models, since the linear term is positive and the quadratic term is negative, the
effect of Serum-BIC increases and then decreases. For example, under the CTD model, we have
the estimated risk factor exp{0.06 (Serum-BIC) − 0.007 (Serum-BIC)²}, which equals 1 when
Serum-BIC is about 8.6, is maximized at the value 4.3, with risk factor about 1.14, and takes on the
value 0.13 when Serum-BIC is 22. Normal levels are between 20 and 29 and lower values are
associated with diabetic ketoacidosis.
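The quoted crossing point and maximizer follow from the quadratic: the risk factor returns to 1 at Serum-BIC = 0.06/0.007 ≈ 8.6 and peaks at 0.06/(2 × 0.007) ≈ 4.3. A quick check with the rounded point estimates (plain Python):

```python
from math import exp

def rf_bic(x, b1=0.06, b2=-0.007):
    """Estimated CTD risk factor exp{b1 * Serum-BIC + b2 * Serum-BIC^2},
    using the rounded point estimates from the text as defaults."""
    return exp(b1 * x + b2 * x * x)
```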
4.3.3. Comparative analysis

Comparing two children that are otherwise being treated the same over a period of time and who
are of the same age, the hazard of CE for a child with a value of 22, a value in the normal range
for Serum-BIC, will be considerably lower than for one with a value of 5. Similarly, under the
PKTD model, the failure time of the first child would be decelerated relative to the second, over
the same period of time, and under the COTD model, total time to CE for the first type of child
would be “used up” at a slower rate than it would for the second child, which amounts to CE
happening earlier for the first child than for the second.
The posterior density estimates and hazard functions for time to CE corresponding to patients
with specified TDC profiles are simple to obtain. Figure 3, for example, presents predictive
densities for a hypothetical patient of age 10, BUN = 25, fluids constant at 3.6, NA constant at
0.7, and Serum-BIC increasing from 5 to 22, as was the case for patient 5 in the data. All three
densities are at least bimodal; however, the COTD and PKTD estimates are smoother, with modes
shifted to the left. The CTD estimate has visible hourly “spikes,” a consequence of the CTD
model specification coupled with data being recorded in hours. Smoother densities would have
been obtained with larger w.
Figure 3: CE data; predictive densities for subject of age 10, constant fluids at 3.6, BUN = 25, constant NA = 0.7, and Serum-BIC increasing from 5 to 22 over the 24 h. Solid = COTD, thick solid = CTD, dashed = PKTD.
Figure 4 presents an estimated relative hazard comparing two subjects who have the same
characteristics as the above hypothetical patient except that they both have BUN = 35 and subject
1 has constant NA = 0.7 and subject 2 has constant NA = 0.35. Observe that the CTD model
gives a constant relative hazard since the only difference in the two subjects is a TDC that is
remaining constant over time. According to this model, subject 1 is estimated to be about 1.35
times as much at risk of CE as subject 2 for all times. Under the PKTD and COTDmodels, subject
2 is at lower risk of CE, but the estimated risk varies considerably over the first 18 h.
4.3.4. Risk factor profiles

While the interpretations of model parameters and comparative risks such as above are quite useful,
the continuous monitoring and treatment of these patients also calls for a description of how the
risk evolves for individuals. Towards this end, we considered individual risk profiles obtained by
plotting the evolution of estimated risk factors (RFs) over time for individuals who experience
changing circumstances, as in our illustration. Here, an RF profile is defined as {exp(xi(t)′β) : t = 0, 1, . . . , ti} for each i = 1, . . . , 58. We select regression estimates (posterior means) from one
of the three models. We obtained estimated RF profiles for the CTD model against t for the 58
subjects.

Figure 4: CE data; hazard ratio for subject with NA = 0.7 versus NA = 0.35. Dotted = CTD, dashed = COTD, solid = PKTD.

The RFs at time of diagnosis of CE had a median of 1.7 with first and third quartiles of
1.3 and 2.3; minimum and maximum were 0.1 and 15, respectively. There were four basic shapes
of these plots: (i) relatively sharp increase just before diagnosis (there were about 7 children with
this shape), (ii) relatively sharp increase followed by a decrease and then followed quickly by CE
(about 9 children), (iii) moderate to small increase followed by decrease followed by an overall
flattening of the plot and finally CE (there are about 20 children with this shape), and (iv) relatively
flat or modulating curve but without sharp increases.
The type (i) shape appears to be dictated by a combination of increasing fluids and increasing
sodium administration. Plots of FL by NA for each patient indicate that FL and NA are posi-
tively associated across patients; administering more or less fluids corresponds to simultaneously
administering more or less sodium, respectively. The sample correlations between the estimated
RFs and FL, NA and Serum-BIC were 0.27, 0.57, and −0.41, respectively. As expected, larger
Serum-BIC was associated with smaller RFs.
The type (ii) scenario is generally associated with a relatively sharp increase in sodium and
or fluid administration, followed by a relatively quick decrease in these values. This was true for
patient 9 in particular; the RF went from 1.9 at baseline to 4.1 after the first hour, to 12.3 after
the third hour, and then down to about 4.5 roughly thereafter. FL went from 0 at baseline to 29
after the first hour and then immediately down to 5 in the next hour and back up to around 16 for
the remaining time until CE at 5 h. NA values went from 0 to 4.4 and then down to about 2.5
thereafter. The type (iii) scenario is similar only less dramatic.
Regarding the type (iv) scenario, there were three children that had flat plots for periods less
than 6 h; one with an estimated RF of about 2 for two consecutive hours, another with an RF of
1.2 for 3 h and the third with an RF of about 1.75 for 5 h. About 18 of the curves are relatively
flat for over 10 h. Conditions here involve no dramatic changes in treatment and relatively low
administrations of FL and NA. Later CE is thus evidently associated with these conditions.
Interestingly, patient 35 had a very large RF after the first hour (11.7), despite the fact that no
fluids were administered either at baseline or at time 1. NA administration was relatively large
(2.1) at that time (median NA in the sample was 0.59 and 75th percentile was 1.25). Moreover, in
the next hour, their RF went down to 3.8 after water was administered at the second hour at the
rate FL = 29.2 (median FL = 7.6 and third quartile = 11.3), and NA went up to 4.5. This nicely
illustrates the effect of the interaction between FL and NA, since the sign of the coefficient for it is
negative. According to our models, having no or relatively low fluids, and relatively high sodium
administration results in a higher risk of CE diagnosis than having relatively high administration
of both.
The above presentation was based on the CTD model, which is the simplest to interpret
among the three. If the PKTD model had been accepted, we could now proceed to calculate the
RFs as above based on the coefficients from that model and we would call them acceleration
factors. Magnitudes would be attenuated somewhat for higher rates of sodium infusion, and their
interpretations require modification.
4.3.5. Concluding remarks

As a final comment in this illustration, we again remind the reader that our analysis is far from
complete. For example, we do not adjust the sodium infusion rate for the baseline sodium con-
centration in our analysis because baseline sodium was nonsignificant in a preliminary analysis.
Moreover, the sample size is very small, and the diagnosis could well have come after the actual
onset of CE. And finally, one must be very careful to not over interpret the associations found
between increasing fluids and sodium and the onset of CE since it may well be the case that the
physicians were simply reacting to a child who was not doing well, in which case it could be
the onset of CE that was bringing on the higher levels of these factors rather than the other way
around. We also remark that users of our methodology are encouraged to plot and compare hazard
functions based on a selected model for different TDC scenarios.
5. DISCUSSION
We have presented a unified approach to handling two standard semiparametric survival models
for survival data with time-dependent “step-stress” covariates, and we have taken a third model
that was suggested by Prentice and Kalbfleisch and developed it as an alternative semiparametric
model. This latter model has ready interpretation of regression effects in terms of residual life. We
presented methods of comparing models and illustrated them on two data sets and in a simulation.
In the cerebral edema data sets, there was little difference in the model ranking criteria; the Stanford
heart transplant data showed more of a difference, but not marked. However, we have found in
other data rather decisive rankings when choosing among survival models. For example, for a
large (n = 10,973) data set involving the time to bankruptcy of firms and seven time-dependent
predictor variables, the COTD model was decisively chosen relative to CTD (LPML = −1,901
versus LPML = −1,925). Analyzing the same (n = 251) medfly data as Tseng, Hsieh & Wang
(2005), we found the models of Sundaram (2006) and CTD to be far superior to COTD (LPML
values of −865, −866, and −938).
Results obtained here are easily extended to right truncated data by straightforward modification
of the likelihood functions, while left truncated data create difficulties due to the necessity of
observing covariate processes from time zero. In theory, these models can be fit to arbitrary
continuous x(t) subject to ‖x(t)‖ being bounded. The main obstacle in fitting the models is performing
the integrations involved in, for example in the PH model,

S(t|x, β) = exp{ −∫_0^t e^{x(s)β} h0(s) ds }
and its derivative. However, the function h0(s|YM, θ) is simply computed from (1) and (2) and
any number of numerical integration techniques can be used for this purpose.
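For step covariates the integral collapses to a finite sum over the segments where x(s) is constant, so only the baseline cumulative hazard H0(t) = ∫_0^t h0(s) ds is needed on each segment. A sketch (plain Python; the exponential H0 below is an assumed stand-in, not the MPT baseline):

```python
from math import exp

def ph_survival(t, jumps, xs, beta, H0):
    """S(t | x, beta) = exp(-sum_k e^{xs[k] * beta} [H0(b_k) - H0(a_k)]),
    where x(s) = xs[k] on [jumps[k], jumps[k+1]) and jumps[0] = 0; for a
    step covariate the PH integral reduces to this finite sum."""
    bounds = list(jumps) + [float("inf")]
    total = 0.0
    for k in range(len(xs)):
        a, b = bounds[k], min(bounds[k + 1], t)
        if b <= a:
            break
        total += exp(xs[k] * beta) * (H0(b) - H0(a))
    return exp(-total)

# Assumed stand-in baseline: exponential with rate 0.1, so H0(t) = 0.1 t.
H0 = lambda s: 0.1 * s
s8 = ph_survival(8.0, [0.0, 4.0], [0.0, 1.0], 0.5, H0)
```

With one jump at t = 4, s8 agrees with the closed-form CTD expression S0(4)^{1−e^β} S0(8)^{e^β}, where S0(t) = exp(−H0(t)), matching the survival curve given in the simulation section.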
We finally note that our TDC’s in the heart transplant example were external covariates,
while those in the CE example were internal (see Kalbfleisch & Prentice, 1980). The practical
distinction is that internal covariates are not observed after “death” or censoring, while external
covariates are. While fitting models is unaffected by the distinction, inferences about conditional
survivor functions and predictive densities are. For example, when we calculate a predictive
density conditional on a TDC process for an individual in the CE data, we have to impute values
for the process beyond the time of their removal from the study.
ACKNOWLEDGEMENTS

The authors thank Dr. Nathan Kuppermann and Dr. Nicole Glaser of the University of California
at Davis Medical School for permission to use their CE data and for providing medical insight
for the analysis presented here. We also thank two referees and an associate editor for comments
and suggestions that resulted in an improved manuscript.
BIBLIOGRAPHY

O. O. Aalen (1980). A model for nonparametric regression analysis of counting processes. In Springer Lecture Notes in Statistics, W. Klonecki, A. Kozek & J. Rosinski, editors, 2, pp. 1–25.
V. Bagdonavicius & M. S. Nikulin (2000a). Mathematical modelling of failure-time in dynamic environment. Preprint of the St. Petersburg Mathematical Society, 2000–2004.
V. Bagdonavicius & M. S. Nikulin (2000b). Statistical analysis of semiparametric models in accelerated life testing. Preprint of the St. Petersburg Mathematical Society, 2000–2002.
E. Bedrick, R. Christensen & W. Johnson (2000). Bayesian accelerated failure time analysis with application to veterinary epidemiology. Statistics in Medicine, 19, 221–237.
B. P. Carlin & T. A. Louis (2008). "Bayesian Methods for Data Analysis," 3rd ed., Chapman and Hall/CRC Press, Boca Raton, FL.
M.-H. Chen, Q.-M. Shao & J. G. Ibrahim (2000). "Monte Carlo Methods in Bayesian Computation," Springer-Verlag, New York.
P. K. Chintagunta (1998). Inertia and variety seeking in a model of brand purchase timing. Marketing Science, 17, 253–270.
D. R. Cox (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–208.
D. R. Cox & D. Oakes (1984). "Analysis of Survival Data," Chapman and Hall, London.
J. Crowley & M. Hu (1977). Covariance analysis of heart transplant data. Journal of the American Statistical Association, 72, 27–36.
D. Draper & M. Krnjajic (2007). Bayesian model specification. Technical report, Department of Applied Mathematics and Statistics, University of California, Santa Cruz.
T. S. Ferguson (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1, 209–230.
S. Geisser (1993). "Predictive Inference: An Introduction," Chapman & Hall, London.
S. Geisser & W. F. Eddy (1979). A predictive approach to model selection. Journal of the American Statistical Association, 74, 153–160.
N. Glaser, P. Barnett, I. McCaslin, D. Nelson, J. Trainor, J. Louie, F. Kaufman, K. Quayle, M. Roback, R. Malley & N. Kuppermann (2001). Risk factors for cerebral edema in children with diabetic ketoacidosis. New England Journal of Medicine, 344, 264–269.
T. E. Hanson (2006). Inference for mixtures of finite Polya tree models. Journal of the American Statistical Association, 101, 1548–1565.
T. Hanson & W. O. Johnson (2002). Modeling regression error with a mixture of Polya trees. Journal of the American Statistical Association, 97, 1020–1033.
J. D. Kalbfleisch & R. L. Prentice (1980). "The Statistical Analysis of Failure Time Data," Wiley, New York.
M. Lavine (1992). Some aspects of Polya tree distributions for statistical modeling. Annals of Statistics, 20, 1222–1235.
M. Lavine (1994). More aspects of Polya tree distributions for statistical modeling. Annals of Statistics, 22, 1161–1176.
D. Lin & Z. Ying (1995). Semiparametric inference for the accelerated failure time model with time-dependent covariates. Journal of Statistical Planning and Inference, 44, 47–63.
T. Martinussen & T. H. Scheike (2006). "Dynamic Regression Models for Survival Data," Springer-Verlag, New York.
R. M. Neal (2003). Slice sampling (with discussion). Annals of Statistics, 31, 705–767.
R. L. Prentice & J. D. Kalbfleisch (1979). Hazard rate models with covariates. Biometrics, 35, 25–39.
S. K. Sahu & D. K. Dey (2004). On multivariate survival models with a skewed frailty and a correlated baseline hazard process. In "Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality," M. G. Genton, editor, CRC/Chapman & Hall, Boca Raton, FL, pp. 321–338.
P. B. Seetharaman (2004). The additive risk model for purchase timing. Marketing Science, 23, 234–242.
H.-J. Shyur, E. A. Elsayed & J. T. Luxhøj (1999). A general model for accelerated life testing with time-dependent covariates. Naval Research Logistics, 46, 303–321.
D. J. Spiegelhalter, N. G. Best, B. P. Carlin & A. van der Linde (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583–639.
S. Sundaram (2006). Semiparametric inference in proportional odds model with time-dependent covariates. Journal of Statistical Planning and Inference, 136, 320–334.
L. Tierney (1994). Markov chains for exploring posterior distributions. Annals of Statistics, 22, 1701–1762.
Y.-K. Tseng, F. Hsieh & J.-L. Wang (2005). Joint modelling of accelerated failure time and longitudinal data. Biometrika, 92, 587–603.
S. G. Walker & B. K. Mallick (1997). Hierarchical generalized linear models and frailty models with Bayesian nonparametric mixing. Journal of the Royal Statistical Society, Series B, 59, 845–860.
S. G. Walker & B. K. Mallick (1999). Semiparametric accelerated life time model. Biometrics, 55, 477–483.
D. Zeng & D. Y. Lin (2007). Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society, Series B, 69, 507–564.
Received 11 April 2007
Accepted 28 September 2008