The Canadian Journal of Statistics / La revue canadienne de statistique
Vol. 37, No. 1, 2009, Pages 60–79

Semiparametric inference for survival models with step process covariates

Timothy HANSON1*, Wesley JOHNSON2 and Purushottam LAUD3

1 Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
2 Department of Statistics, University of California at Irvine, Irvine, CA 92697, USA
3 Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, USA
Key words and phrases: Accelerated failure time; covariate process; mixture of Polya trees; proportional
hazards; time-dependent covariates.
MSC 2000: Primary 62N01; secondary 62G09, 62P10.
Abstract: The authors consider Bayesian methods for fitting three semiparametric survival models, incor-
porating time-dependent covariates that are step functions. In particular, these are models due to Cox [Cox
(1972) Journal of the Royal Statistical Society, Series B, 34, 187–208], Prentice & Kalbfleisch and Cox &
Oakes [Cox & Oakes (1984) Analysis of Survival Data, Chapman and Hall, London]. The model due to
Prentice & Kalbfleisch [Prentice & Kalbfleisch (1979) Biometrics, 35, 25–39], which has seen very limited
use, is given particular consideration. The prior for the baseline distribution in each model is taken to be
a mixture of Polya trees and posterior inference is obtained through standard Markov chain Monte Carlo
methods. They demonstrate the implementation and comparison of these three models on the celebrated
Stanford heart transplant data and the study of the timing of cerebral edema diagnosis during emergency
room treatment of diabetic ketoacidosis in children. An important feature of their overall discussion is the comparison of semiparametric families, and the ultimate criterion-based selection of a family within the context of a given data set. The Canadian Journal of Statistics 37: 60–79; 2009 © 2009 Statistical Society of Canada
1. INTRODUCTION
Bayesian semiparametric methods for survival data in the regression context have been devel-
oped by many authors over the past three decades. Generally, from the Bayesian viewpoint of
employing a full likelihood, two main components are relevant: a nonparametric prior on the
*Author to whom correspondence may be addressed. E-mail: [email protected]
© 2009 Statistical Society of Canada / Société statistique du Canada
space of all baseline distributions considered in the model and a parametric form specifying how
the covariates modify the baseline distribution. For the regression specification, the proportional
hazards (PH) model of Cox (1972) is the most widely used, followed by the accelerated fail-
ure time (AFT) model considered, for example, by Kalbfleisch & Prentice (1980). We consider
both.
In addition to PH and AFT models, there are other approaches to modelling survival in
the presence of time-dependent covariates. Aalen (1980) develops an additive hazards regression
model, broadly illustrated in Martinussen & Scheike (2006, Chapter 5), whereas Sundaram (2006) develops a proportional odds rate model. Recently, Zeng & Lin (2007) extend semiparametric transformation models, $H(T) = -\beta' x + \varepsilon$, where ε has a known distribution and H is arbitrary, to a heteroscedastic version accommodating time-dependent covariates. These latter models include proportional hazards and proportional odds with arbitrary survival functions as special cases, but only parametric AFT models are possible, as noted by Zeng & Lin (p. 527). Since we allow for nonparametric ε, our work complements that of Zeng & Lin (2007), who are currently working on a nonparametric maximum likelihood approach to AFT models accommodating time-dependent covariates.
In this article we develop new Bayesian semiparametric methodology for three regression
models for survival data with TDC’s: Cox’s original model (here denoted CTD); and two gen-
eralizations of the accelerated failure time model: a little-used model introduced by Prentice &
Kalbfleisch (1979) (PKTD) and a model first proposed by Cox & Oakes (1984) (COTD). We
take the covariate function to be fixed and observed without (or with negligible) error as this
is the case in many applications (e.g., time of transplant, fluids administered to a patient, onset
of a well defined condition). We only consider TDC’s that correspond to jump processes. Thus,
as is standard practice, each TDC will take on a finite number of values that change at discrete
times over the course of the experiment. We refer to the times at which a TDC changes as TDC
changepoints.
Similar to the case of modelling TDC’s in the context of the Cox model, where the resulting
model is no longer a PH model, the PKTD model is no longer an AFT model. In the PKTD
model, the hazard function is in the usual AFT form at time zero with an acceleration factor (AF)
that depends on the values of the covariate information at time zero. The AF remains constant
until the first TDC changepoint at which time there is a new AF. The hazard function now is in
the same form as an AFT model with the new AF until the next TDC changepoint, and so on.
In Section 3 we make this precise and provide the machinery to make MCMC inferences. This
model is analogous to the PH model where hazards are proportional for two individuals over
periods of time between TDC changepoints.
2. BACKGROUND MATERIAL
This section focuses on survival analysis with fixed covariates as background to modelling with
time-dependent covariates, and on the mixture of Polya trees prior that we will employ throughout.
We ultimately focus on three survival analysis models as discussed in Section 1, but in terms of
background material, the Cox model is well understood and the PKTD and COTD models with
fixed covariates reduce to the AFT model. Thus we focus here on the AFT model with fixed
covariates, followed by an introduction to mixtures of Polya trees and some discussion about
potential advantages of this model.
Hanson & Johnson (2002) present a Bayesian semiparametric AFT model in which the base-
line survival distribution has a mixture of finite Polya trees prior. Let T be a survival time
for an individual with covariates x = (x1, . . . , xp)′. Let F0(t) = P(T0 ≤ t), S0(t) = P(T0 > t),
$f_0(t) = (d/dt) F_0(t)$, and $h_0(t) = f_0(t)/S_0(t)$ be the baseline cdf, survival function, pdf, and hazard
DOI: 10.1002/cjs The Canadian Journal of Statistics / La revue canadienne de statistique
function, respectively. Hanson & Johnson (2002) consider the model
$$T = e^{-x\beta}\, T_0, \qquad T_0 \sim S_0, \qquad S_0 \sim \int \mathrm{PT}^M(w, G_\theta)\, p(\theta)\, d\theta,$$
where β is a p-dimensional vector of regression coefficients including an intercept. Given S0, the survivor function for such a T is
$$S(t \mid x, \beta) = \exp\Big\{-e^{x\beta} \int_0^t h_0\big(s\, e^{x\beta}\big)\, ds\Big\} = e^{-H_0(e^{x\beta} t)},$$
where $H_0(t) = \int_0^t h_0(s)\, ds$. The notation $S_0 \sim \int \mathrm{PT}^M(w, G_\theta)\, p(\theta)\, d\theta$ is shorthand for a particular prior that "centres" S0 on a parametric family $\{G_\theta : \theta \in \Theta\}$ with weight w > 0. We will refer to $c \equiv e^{x\beta}$ as the acceleration factor and we will say that $T \sim \mathrm{AFT}(c, S_0)$. Note that the cumulative hazard is $H(t \mid x, \beta) = H_0(ct)$.
To give a brief description of the Polya tree prior as used here and by many others (Lavine, 1992, 1994; Walker & Mallick, 1997, 1999; Hanson, 2006), let M be a positive integer. Let Gθ denote a parametric family of cumulative distribution functions, such as the log-normal or log-logistic, indexed by θ. A Polya tree (PT) prior is constructed from a set of partitions $\Pi^M_\theta$ and a family $\mathcal{A}_M$ of positive reals. Here Gθ is the centring distribution of the Polya tree prior. Following Walker & Mallick (1999) and Hanson & Johnson (2002), we constrain S0(1) = 0.5 with probability 1, thus yielding a generalization of a standard median regression model on the log scale; that is, the median failure time for given covariate x is $e^{-x\beta}$. This necessitates Gθ(1) = 0.5 for all θ in the case of non-time-dependent covariates.

Define the partition $\Pi^M_\theta = \{B^\theta_\varepsilon : \varepsilon \in \bigcup_{l=1}^M \{0,1\}^l\}$. If j is the base-10 representation of the binary $\varepsilon = \varepsilon_1 \cdots \varepsilon_k$ at level k, then $B^\theta_{\varepsilon_1\cdots\varepsilon_k}$ is defined to be the interval $\big(G_\theta^{-1}(j/2^k),\, G_\theta^{-1}((j+1)/2^k)\big)$. For example, with k = 3 and ε = 000, then j = 0 and $B^\theta_{000} = (0, G_\theta^{-1}(1/8))$; with ε = 001, then j = 1 and $B^\theta_{001} = (G_\theta^{-1}(1/8), G_\theta^{-1}(2/8))$, etc.
Note then that at each level k, the class $\{B^\theta_\varepsilon : \varepsilon \in \{0,1\}^k\}$ forms a partition of the positive reals, and furthermore $B^\theta_{\varepsilon_1\cdots\varepsilon_k} = B^\theta_{\varepsilon_1\cdots\varepsilon_k 0} \cup B^\theta_{\varepsilon_1\cdots\varepsilon_k 1}$ for k = 1, 2, ..., M − 1. We take the family $\mathcal{A}_M = \{\alpha_\varepsilon : \varepsilon \in \bigcup_{j=1}^M \{0,1\}^j\}$ to be defined by $\alpha_{\varepsilon_1\cdots\varepsilon_k} = w k^2$ for some w > 0 (Walker & Mallick, 1999; Hanson & Johnson, 2002). The parameter w acts much like the precision in a Dirichlet process (Ferguson, 1973). As w tends to zero the posterior baseline is almost entirely data-driven. As w tends to infinity we obtain a fully parametric analysis. A prior can be placed on w (Hanson, 2006) but it is also common practice to select w = 1, as we do here.

Given $\Pi^M_\theta$ and $\mathcal{A}_M$, the Polya tree prior is defined up to level M by the class of random vectors $\mathcal{Y}_M = \{(Y_{\varepsilon 0}, Y_{\varepsilon 1}) : \varepsilon \in \bigcup_{j=0}^{M-1} \{0,1\}^j\}$ (with $\{0,1\}^0$ denoting the empty string) through the product
$$S_0\big(B^\theta_{\varepsilon_1\cdots\varepsilon_k} \mid \mathcal{Y}_M, \theta\big) = \prod_{j=1}^{k} Y_{\varepsilon_1\cdots\varepsilon_j},$$
for k = 1, 2, ..., M, where we define S0(A) to be the baseline measure of any set A. The vector (Y0, Y1) is set to (0.5, 0.5) to ensure $S_0(1 \mid \mathcal{Y}_M, \theta) = 0.5$; the remaining vectors $(Y_{\varepsilon 0}, Y_{\varepsilon 1})$ are independent Dirichlet:
$$(Y_{\varepsilon 0}, Y_{\varepsilon 1}) \sim \mathrm{Dirichlet}(\alpha_{\varepsilon 0}, \alpha_{\varepsilon 1}), \qquad \varepsilon \in \bigcup_{j=1}^{M-1} \{0,1\}^j.$$
Beyond sets at the level M in $\Pi^M$ we assume $S_0 \mid \mathcal{Y}_M, \theta$ follows the baseline Gθ. Hanson & Johnson (2002) show that this assumption yields predictive distributions that are the same as from a fully specified (infinite) Polya tree for large enough M; this assumption also avoids a complication involving infinite probability in the tail of $S_0 \mid \mathcal{Y}_M, \theta$ that arises from taking $S_0 \mid \mathcal{Y}_M, \theta$ to be flat on these sets. Note that $S_0(B^\theta_{\varepsilon_1\cdots\varepsilon_M} \mid B^\theta_{\varepsilon_1\cdots\varepsilon_{M-1}}, \mathcal{Y}_M, \theta) = Y_{\varepsilon_1\cdots\varepsilon_M}$, which converges to 0.5 in probability. This implies that as M grows, $S_0(A \mid B, \mathcal{Y}_M, \theta) \approx G_\theta(A \mid B)$ for sets B with small Lebesgue measure and A ⊂ B.
Define the vector of probabilities $p = p(\mathcal{Y}_M) = (p_1, p_2, \ldots, p_{2^M})'$ by $p_{j+1} = S_0(B^\theta_{\varepsilon_1\cdots\varepsilon_M} \mid \mathcal{Y}_M, \theta) = \prod_{i=1}^M Y_{\varepsilon_1\cdots\varepsilon_i}$, where j is the base-10 representation of $\varepsilon_1\cdots\varepsilon_M$. After simplification, the baseline survival function is
$$S_0(t \mid \mathcal{Y}_M, \theta) = p_N\big[N - 2^M G_\theta(t)\big] + \sum_{j=N+1}^{2^M} p_j, \qquad (1)$$
where N denotes the integer part of $2^M G_\theta(t) + 1$ and where $g_\theta(\cdot)$ is the density corresponding to Gθ. The density associated with $S_0(t \mid \mathcal{Y}_M, \theta)$ is given by
$$f_0(t \mid \mathcal{Y}_M, \theta) = \sum_{j=1}^{2^M} 2^M p_j\, g_\theta(t)\, I_{B^\theta_{\varepsilon_M(j-1)}}(t) = 2^M p_N\, g_\theta(t), \qquad (2)$$
where $\varepsilon_M(i)$ is the M-digit binary representation $\varepsilon_1\cdots\varepsilon_M$ of the integer i and N is as above. Note that the number of elements of $\mathcal{Y}_M$ need not be prohibitively large: this number is $\sum_{j=1}^M 2^j = 2^{M+1} - 2$. For M = 5, a typical level, this is 62. To obtain p, $\sum_{j=2}^M 2^j = 2^{M+1} - 4$ multiplications are required; for M = 5 this is 60.
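The computations in (1) are easy to code. The Python sketch below (our function names; the centring cdf G(t) = t/(1+t) used in testing is the log-logistic with µ = 0 and σ = 1) draws the conditional probabilities with $\alpha_\varepsilon = w k^2$, builds p, and evaluates S0. With (Y0, Y1) fixed at (0.5, 0.5), the median constraint S0(1 | YM, θ) = 0.5 holds exactly:

```python
import math
import random

def sample_Y(M, w=1.0, seed=0):
    """Conditional split probabilities: (Y_eps0, Y_eps1) ~ Dirichlet(w k^2, w k^2)
    at levels k = 2, ..., M; the level-1 pair (Y_0, Y_1) is fixed at (0.5, 0.5)."""
    rng = random.Random(seed)
    Y = {"0": 0.5, "1": 0.5}
    for k in range(2, M + 1):
        a = w * k * k
        for j in range(2 ** (k - 1)):
            eps = format(j, "0{}b".format(k - 1))
            y0 = rng.betavariate(a, a)          # Dirichlet(a, a) on two cells = Beta(a, a)
            Y[eps + "0"], Y[eps + "1"] = y0, 1.0 - y0
    return Y

def polya_probs(Y, M):
    """p_{j+1} = prod_{i=1}^M Y_{eps_1...eps_i}, with eps_1...eps_M the binary digits of j."""
    p = []
    for j in range(2 ** M):
        eps = format(j, "0{}b".format(M))
        prob = 1.0
        for i in range(1, M + 1):
            prob *= Y[eps[:i]]
        p.append(prob)
    return p

def S0(t, p, M, G):
    """Eq. (1): S0(t) = p_N [N - 2^M G(t)] + sum_{j=N+1}^{2^M} p_j,
    with N the integer part of 2^M G(t) + 1."""
    u = G(t)
    N = min(int(2 ** M * u) + 1, 2 ** M)
    return p[N - 1] * (N - 2 ** M * u) + sum(p[N:])
```

The density (2) follows from the same ingredients as $2^M p_N g_\theta(t)$, with the index N located exactly as in `S0` above.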
The mixture of Polya trees (MPT) prior provides an intermediate choice between a strictly
parametric analysis and allowing S0 to be completely arbitrary. In some ways it provides the best
of both worlds. In areas where data are sparse, such as the tails, the MPT prior places relatively
more posterior mass on the underlying parametric family $\{G_\theta : \theta \in \Theta\}$. In areas where data are plentiful, the posterior is more data driven, and features not allowed in the strictly parametric model, such as left skew and multimodality, become apparent. The user-specified weight w controls how closely the posterior follows $\{G_\theta : \theta \in \Theta\}$, with larger values of w yielding inference closer to that obtained from the underlying parametric model.
3. THREE MODELS FOR STEP PROCESS COVARIATES
In this section we discuss models for time-dependent covariates that exhibit a countably infinite
number of jumps in time. We develop methodology for such covariates assuming models due to
Prentice & Kalbfleisch (1979), Cox & Oakes (1984), and the proportional hazards model of Cox
(1972). The first two reduce to the standard AFT model when covariates are fixed for all time. All
three models involve a baseline survivor function S0; our methodology involves placing an MPT
prior on S0 in each instance.
3.1. The AFT Model of Prentice & Kalbfleisch (1979)

Prentice & Kalbfleisch (1979) suggest a generalization of the AFT model allowing for TDC's that has been largely undeveloped in the literature. Exceptions where parametric versions of the model are fit include Chintagunta (1998) and Seetharaman (2004); we were unable to find semiparametric versions of the model.
In the previous section we defined the basic AFT regression model for a single observation. We assume that there are p − 1 covariates and that any or all of these can be time dependent. The observed processes are assumed to be jump processes in discrete time, also called step stresses in the reliability literature (e.g., Bagdonavicius & Nikulin, 2000a). Define the observed vector of p − 1 processes as x ≡ {x(s) : s ≤ u}, where u is the largest time at which the covariates are "observed," and is less than the survival time T or a censoring time.
As in the fixed covariate case, we assume a baseline survival function S0. This distribution
now corresponds to the survival function for an individual with constant, zero covariates for all
t. That is, if T0 corresponds to covariates x(s) = 0 for all s, then T0 ∼ S0. Covariates may be
transformed so that this baseline is interpretable. In this case the full model developed in Sections
3.1–3.3 allows for the inclusion of real prior information for S0 in terms of, say, the first and third
quartiles or other percentiles of S0.
In their equation (12), Prentice & Kalbfleisch (1979) define the hazard function for an individual with covariate x(s) to be
$$h(s \mid x, \beta) = e^{x(s)\beta}\, h_0\big(s\, e^{x(s)\beta}\big), \qquad s \leq u,$$
where h0(·) is an arbitrary "baseline" hazard function. Thus for periods between TDC changepoints, namely periods over which x(s)β remains constant, the hazard is the same as the hazard for an AFT model with covariates fixed at their values at the beginning of the interval. Now let {rj : j = 1, ..., m} denote the ordered TDC changepoints in (0, u) and let r0 = 0. We assume that $s \in [r_{j-1}, r_j)$ implies that $x(s) = x(r_{j-1})$ for j = 1, ..., m, and $x(s) = x(r_m)$ for $s \geq r_m$. Define $c_j = e^{x(r_{j-1})\beta}$, j = 1, ..., m + 1.
Let $j^*(t) = \max\{j : r_j \leq t\}$. Then $j^*(u) = m$. The cumulative hazard for this individual is thus
$$H(t \mid x, \beta) = \int_0^t e^{x(s)\beta}\, h_0\big(s\, e^{x(s)\beta}\big)\, ds = \sum_{j=1}^{j^*(t)} c_j \int_{r_{j-1}}^{r_j} h_0(s c_j)\, ds + c_{j^*(t)+1} \int_{r_{j^*(t)}}^{t} h_0\big(s\, c_{j^*(t)+1}\big)\, ds,$$
and the survivor function is $S(t \mid x, \beta) = \exp\{-H(t \mid x, \beta)\}$. Define $S_j(t) = P(T > t \mid T > r_{j-1})$ and let $p_j = S_j(r_j)$. Then for $t \in [r_{j-1}, r_j)$,
$$S_j(t) = e^{-c_j \int_{r_{j-1}}^{t} h_0(c_j s)\, ds} = e^{-\{H_0(c_j t) - H_0(c_j r_{j-1})\}} = \frac{S_0(c_j t)}{S_0(c_j r_{j-1})}.$$
We have
$$S(t \mid x, \beta) = \Bigg\{\prod_{j=1}^{j^*(t)} p_j\Bigg\}\, S_{j^*(t)+1}(t) = \Bigg\{\prod_{j=1}^{j^*(t)} \frac{S_0(c_j r_j)}{S_0(c_j r_{j-1})}\Bigg\}\, \frac{S_0\big(c_{j^*(t)+1}\, t\big)}{S_0\big(c_{j^*(t)+1}\, r_{j^*(t)}\big)}, \qquad (3)$$
for all t ≤ u. This essentially characterizes the model.
For technical reasons, we require a slightly more elaborate specification than what was given above. We assume that the covariate processes x(s) take on their last values, $x(r_m)$, for all $s \geq r_m$. Thus formula (3) is defined for all t > 0 and, provided $c_{m+1} > 0$, the survivor function defined there is proper, since then $\lim_{t\to\infty} S(t \mid x, \beta) = 0$.
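The survivor function (3) is simple to evaluate for a step covariate. A Python sketch (our function names; S0 is any baseline survivor function passed in as a callable, and c holds $c_1, \ldots, c_{m+1}$):

```python
def pktd_survival(t, r, c, S0):
    """Eq. (3): PKTD survivor at t for changepoints 0 = r_0 < r_1 < ... < r_m
    and acceleration factors c_1, ..., c_{m+1}, c_j = exp(x(r_{j-1}) beta)."""
    rr = [0.0] + list(r)
    jstar = sum(1 for rj in r if rj <= t)          # j*(t) = max{j : r_j <= t}
    s = 1.0
    for j in range(1, jstar + 1):                  # the factors p_j = S0(c_j r_j)/S0(c_j r_{j-1})
        s *= S0(c[j - 1] * rr[j]) / S0(c[j - 1] * rr[j - 1])
    cnext = c[jstar]                               # c_{j*(t)+1}
    return s * S0(cnext * t) / S0(cnext * rr[jstar])
```

With a single changepoint at r1 and c = (1, e^β), this reproduces the PKTD curve used in the simulations of Section 4.1; the function is continuous at each changepoint, while the hazard may jump.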
Combining the PKTD specification of how covariates modify the baseline survival function
through time with an MPT prior for S0, we can construct the full likelihood. Our introduction
of the MPT is reflected by use of the notation $(\mathcal{Y}_M, \theta)$. In particular, the contribution from an individual with covariate process x(·) and survival time T = t is
$$L_x(\beta, \mathcal{Y}_M, \theta \mid T = t) = \Bigg\{\prod_{j=1}^{m} p_j\Bigg\}\, \frac{f_0(c_{m+1} t \mid \mathcal{Y}_M, \theta)\, c_{m+1}}{S_0(c_{m+1} r_m \mid \mathcal{Y}_M, \theta)},$$
where, by (2), $f_0(c_{m+1} t \mid \mathcal{Y}_M, \theta) = 2^M p_N\, g_\theta(c_{m+1} t)$ with N the integer part of $2^M G_\theta(c_{m+1} t) + 1$. This results because the last observed jump occurs before u, and so $j^*(t) = m$, $c_{j^*(t)+1}$ becomes simply $c_{m+1}$, and $p_j = S_0(c_j r_j \mid \mathcal{Y}_M, \theta)/S_0(c_j r_{j-1} \mid \mathcal{Y}_M, \theta)$. Taking the derivative of $1 - S(t \mid x, \beta)$ in (3) yields the density evaluated at T = t. The likelihood contribution for an observation right-censored at time t is
$$L_x(\beta, \mathcal{Y}_M, \theta \mid T > t) = \Bigg\{\prod_{j=1}^{m} p_j\Bigg\}\, \frac{S_0(c_{m+1} t \mid \mathcal{Y}_M, \theta)}{S_0(c_{m+1} r_m \mid \mathcal{Y}_M, \theta)}.$$
The complete data involve n independent event times $\{t_i\}_{i=1}^n$ that are either observed survival times ($T_i = t_i$) or right-censoring times ($T_i > t_i$), and n covariate processes $\{x_i(\cdot)\}_{i=1}^n$. There are thus n likelihood contributions like the one above. Define the contribution for individual i with covariate process $x_i(\cdot)$ as $L_i(\beta, \mathcal{Y}_M, \theta)$. Then the complete likelihood is $L(\beta, \mathcal{Y}_M, \theta) = \prod_{i=1}^n L_i(\beta, \mathcal{Y}_M, \theta)$. There are additional notational changes; for example, $m_i$ is the number of TDC changepoints for individual i occurring at or before time $u_i$, and $r_{ij}$ is the time of the jth changepoint for individual i, etc.
The random quantities in the prior are thus YM , θ, and β, and all are assumed a priori inde-
pendent. We assume throughout that (β, θ) has an improper uniform distribution. Our methods
are easily modified to account for real prior information (Bedrick, Christensen & Johnson, 2000).
In the Gibbs sampler we alternate between sampling β, θ|YM and YM |β, θ (where dependence
on the data is suppressed). The former can be sampled via a Metropolis–Hastings step (Tierney,
1994) or slice sampling (Neal, 2003).
The full conditional $\mathcal{Y}_M \mid \beta, \theta$ will not have a recognizable closed form. A simple Metropolis–Hastings step for updating the components $(Y_{\varepsilon 0}, Y_{\varepsilon 1})$ first samples a candidate $(Y^*_{\varepsilon 0}, Y^*_{\varepsilon 1})$ from a $\mathrm{Dirichlet}(m Y_{\varepsilon 0}, m Y_{\varepsilon 1})$ distribution, where m > 0, typically m = 20 or 30. This candidate is accepted as the "new" $(Y_{\varepsilon 0}, Y_{\varepsilon 1})$ with probability
$$\rho = \min\left\{1,\; \frac{\Gamma(m Y_{\varepsilon 0})\, \Gamma(m Y_{\varepsilon 1})\, (Y_{\varepsilon 0})^{m Y^*_{\varepsilon 0} - w j^2}\, (Y_{\varepsilon 1})^{m Y^*_{\varepsilon 1} - w j^2}\, L(\beta, \mathcal{Y}^*_M, \theta)}{\Gamma(m Y^*_{\varepsilon 0})\, \Gamma(m Y^*_{\varepsilon 1})\, (Y^*_{\varepsilon 0})^{m Y_{\varepsilon 0} - w j^2}\, (Y^*_{\varepsilon 1})^{m Y_{\varepsilon 1} - w j^2}\, L(\beta, \mathcal{Y}_M, \theta)}\right\},$$
where j is the number of digits in the binary number ε0 and $\mathcal{Y}^*_M$ is the set $\mathcal{Y}_M$ with $(Y^*_{\varepsilon 0}, Y^*_{\varepsilon 1})$ replacing $(Y_{\varepsilon 0}, Y_{\varepsilon 1})$. This may be done in blocks or all at once with the acceptance probability changed accordingly, although we have had good luck simply sampling the components of $\mathcal{Y}_M$ one at a time.
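A Python sketch of this update for a single pair (our function names; `loglik` stands for the map from a candidate pair to log L(β, YM, θ), which in practice means re-evaluating the likelihood with that pair swapped in):

```python
import math
import random

def log_accept_ratio(Y, Ystar, m, w, j, loglik_cur, loglik_prop):
    """log rho for candidate (Y*_eps0, Y*_eps1) ~ Dirichlet(m Y_eps0, m Y_eps1);
    j is the number of digits in the binary number eps0, and the prior on the
    pair is Dirichlet(w j^2, w j^2)."""
    y0, y1 = Y
    z0, z1 = Ystar
    a = w * j * j
    num = (math.lgamma(m * y0) + math.lgamma(m * y1)
           + (m * z0 - a) * math.log(y0) + (m * z1 - a) * math.log(y1) + loglik_prop)
    den = (math.lgamma(m * z0) + math.lgamma(m * z1)
           + (m * y0 - a) * math.log(z0) + (m * y1 - a) * math.log(z1) + loglik_cur)
    return num - den

def mh_update(Y, m, w, j, loglik, rng):
    """One Metropolis-Hastings update of a single (Y_eps0, Y_eps1) pair."""
    z0 = rng.betavariate(m * Y[0], m * Y[1])       # Dirichlet on two cells = Beta
    cand = (z0, 1.0 - z0)
    lr = log_accept_ratio(Y, cand, m, w, j, loglik(Y), loglik(cand))
    if math.log(rng.random()) < min(0.0, lr):
        return cand
    return Y
```

A quick sanity check on the ratio: when the candidate equals the current pair and the likelihood values agree, the log ratio is zero, so ρ = 1.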
For interval censored data, where T ∈ [a, b), for b < ∞ we assume that the observed TDC’s
are constant over the interval [a, b). This would be the case for events like death since if vital signs
were taken at some time during the censoring interval, then it would be known that the individual
was still alive at the time of taking the information. We assume that u = a and that there is no
new measurement on any TDC in [a, b). If this is not the case, our formulas are easily modified,
though at the expense of losing notational simplicity, to account for the extra information. The
likelihood contribution is
$$L_x(\beta, \mathcal{Y}_M, \theta \mid T \in [a, b)) = \Bigg\{\prod_{j=1}^{m} p_j\Bigg\}\, \frac{S_0(c_{m+1} a \mid \mathcal{Y}_M, \theta) - S_0(c_{m+1} b \mid \mathcal{Y}_M, \theta)}{S_0(c_{m+1} r_m \mid \mathcal{Y}_M, \theta)}.$$
The likelihood contribution for right-truncated data is similarly obtained.
3.2. The AFT Model of Cox & Oakes (1984)

The MPT approach allows fitting versions of a model due to Cox & Oakes (1984) and the PH model (Cox, 1972), discussed in the next section, for time-dependent covariates. Cox & Oakes (1984) develop a model in which an individual with covariates x(·) and survival time T uses up her life at the rate of $e^{-x(t)\beta}$ relative to "baseline time." Where T0 is the time a baseline individual lives, this implies the relationship $T_0 = \int_0^T e^{x(s)\beta}\, ds$. The model is succinctly written as
$$S(t \mid x, \beta) = S_0\big(c(t)\, t\big), \qquad c(t) = \frac{1}{t} \int_0^t e^{x(s)\beta}\, ds.$$
Here, we can interpret c(t) as the average value of an acceleration factor $e^{x(s)\beta}$ over $s \in [0, t)$. The equivalent specification through the hazard function is
$$h(t \mid x, \beta) = e^{x(t)\beta}\, h_0\big(c(t)\, t\big).$$
A generalization of this model that also includes PH was given by Shyur, Elsayed & Luxhøj (1999), who model log h0(·) using quadratic splines. More recently, Tseng, Hsieh & Wang (2005) consider jointly modelling survival via COTD with a univariate TDC, modelled via a normal random-effects model. In this article we place an MPT prior on S0 and proceed as before, noting that
$$S(t \mid x, \beta) = S_0\Bigg( (t - r_{j^*(t)})\, c_{j^*(t)+1} + \sum_{i=1}^{j^*(t)} (r_i - r_{i-1})\, c_i \,\Bigg|\, \mathcal{Y}_M, \theta \Bigg),$$
and
$$f(t \mid x, \beta) = f_0\Bigg( (t - r_{j^*(t)})\, c_{j^*(t)+1} + \sum_{i=1}^{j^*(t)} (r_i - r_{i-1})\, c_i \,\Bigg|\, \mathcal{Y}_M, \theta \Bigg)\, c_{j^*(t)+1}.$$
See also (18) in Bagdonavicius & Nikulin (2000b).
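A Python sketch of the COTD survivor for a step covariate (our function names; c again holds $c_1, \ldots, c_{m+1}$, the step values of $e^{x(s)\beta}$):

```python
def cotd_warp(t, r, c):
    """Warped time c(t) * t = (t - r_{j*}) c_{j*+1} + sum_{i=1}^{j*} (r_i - r_{i-1}) c_i."""
    rr = [0.0] + list(r)
    jstar = sum(1 for rj in r if rj <= t)
    used = sum((rr[i] - rr[i - 1]) * c[i - 1] for i in range(1, jstar + 1))
    return used + (t - rr[jstar]) * c[jstar]

def cotd_survival(t, r, c, S0):
    """Cox & Oakes (1984): S(t | x, beta) = S0(c(t) t)."""
    return S0(cotd_warp(t, r, c))
```

Because the warped time accumulates the entire covariate history, past covariate episodes continue to influence residual life, in contrast to the PKTD model.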
One can understand this model through the survivor function $S(t \mid x, \beta) = S_0(c(t)\, t)$. Suppose the covariate process has been standardized so that an individual with c(t) ≡ 1 for all t can be regarded as a baseline individual with survivor function S0. Then we can
compare this individual to one with c(t) = 2, say, for a value of t of interest. Then the prospects
of living at least t units of time for this individual are the same as the prospects for the baseline
individual to live at least 2t units. This model then allows survival prospects to slide forwards or
backwards relative to a hypothetical baseline individual. It is not possible with the PKTD model
to make such a simple comparison with a baseline individual.
Now consider the conditional survival distributions, given survival up to time s, in the PKTD and COTD models, namely
$$\exp\Big\{-\int_s^t e^{x(v)\beta}\, h_0\big(v\, e^{x(v)\beta}\big)\, dv\Big\} \qquad \text{and} \qquad \exp\Big\{-\int_s^t e^{x(v)\beta}\, h_0\big(v\, c(v)\big)\, dv\Big\},$$
respectively. It is clear then that in the PKTD model, median (or any quantile) residual life
and the hazard function immediately change to values associated with the change in covariates,
while the effect of the covariates prior to time s may have an effect on the comparable quantities
under the COTD model. For example, consider two individuals, A and B, identical in every way except that individual A develops a staph infection at time t1. According to the PKTD model, as soon as this person becomes infected, his or her median residual life and hazard function are the same as those of others who have had staph since the start, for any time after t1. Say at t2 the staph infection has been eradicated. Then, given that individual A has lived past t2, it is as if the staph infection never happened, whereas under the model of Cox and Oakes the median residual life and hazard functions would reflect this unhappy and (presumably) detrimental episode well into the future. That is, under Cox and Oakes the median residual life would be shorter and the hazard
higher for all time after t2 relative to the PKTD model. Thus when considering residual life, the
PKTD model forgets previous time-dependent events while the COTD model has the capacity
to remember. When considering overall (unconditional) survival, both types of curves will drop
at rates dictated by the relative effects of the TDC’s as they vary in time.
3.3. The Model of Cox (1972)

The hazard function under the CTD model for an individual is
$$h(t \mid x, \beta) = e^{x(t)\beta}\, h_0(t).$$
Then given S0 and β we obtain
$$S(t \mid x, \beta) = \prod_{j=1}^{j^*(t)} \left[\frac{S_0(r_j)}{S_0(r_{j-1})}\right]^{c_j} \left[\frac{S_0(t)}{S_0(r_{j^*(t)})}\right]^{c_{j^*(t)+1}},$$
and
$$f(t \mid x) = \prod_{j=1}^{j^*(t)} \left[\frac{S_0(r_j)}{S_0(r_{j-1})}\right]^{c_j} \left[\frac{c_{j^*(t)+1}\, S_0(t)^{c_{j^*(t)+1}-1}\, f_0(t)}{S_0(r_{j^*(t)})^{c_{j^*(t)+1}}}\right].$$
See also page 15 in Bagdonavicius & Nikulin (2000a); the methods of Section 3.1 apply.
Like the PKTD model, the model due to Cox (1972) also has the property that a change in
covariate value at a given time immediately changes the individual’s hazard. Until there is another
change, regression parameters are interpreted the same as in the fixed-covariate case. This ease
of interpretation makes the PH model attractive.
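The CTD survivor above admits the same style of sketch (Python, our function names; S0 a baseline survivor function passed in as a callable):

```python
def ctd_survival(t, r, c, S0):
    """Cox (1972) with a step TDC:
    S(t) = prod_{j<=j*} [S0(r_j)/S0(r_{j-1})]^{c_j} * [S0(t)/S0(r_{j*})]^{c_{j*+1}}."""
    rr = [0.0] + list(r)
    jstar = sum(1 for rj in r if rj <= t)
    s = 1.0
    for j in range(1, jstar + 1):
        s *= (S0(rr[j]) / S0(rr[j - 1])) ** c[j - 1]
    return s * (S0(t) / S0(rr[jstar])) ** c[jstar]
```

With a covariate that is constant for all time (a single c1), this reduces to $S_0(t)^{c_1}$, the familiar PH form.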
3.4. Commonality Among Models

All three models share the same baseline distribution. Therefore, if one has information on baseline survival, one need only consider one prior distribution on S0 for all three models. This is what we do.
Moreover, if S0 is exponential with parameter θ, then the PKTD, COTD, and CTD models are all the same model. The likelihood is
$$L(\beta, \theta) = \prod_{i=1}^{n} \Bigg\{\prod_{j=1}^{m_i} e^{-\theta\, [r_{ij} - r_{i,j-1}]\, e^{x_i(r_{i,j-1})\beta}}\Bigg\}\, e^{-\theta\, [t_i - r_{i,m_i}]\, e^{x_i(r_{i,m_i})\beta}}\, \theta^{\delta_i},$$
where $\delta_i$ indicates an exactly observed event time.
This is the product of right-censored or exactly observed exponential "observations" and can be fit readily in SAS, S-plus, WinBUGS, et cetera, thus providing crude but very reasonable starting values and covariance matrices for the candidate generating distributions for the Metropolis step to sample β. We also take this approach. Bagdonavicius & Nikulin (2000b) show that the intersection of the CTD and COTD models is the class of exponential regression models.

Since all models that we discuss share the same baseline, we centre our baseline on the same family as well. Natural candidates are parametric AFT models of the form $G_\theta(t) = G_0\{(\log(t) - \mu)/\sigma\}$, θ = (µ, σ), where G0(·) is either a standard normal, standard logistic, or extreme value CDF; this specification includes the intercept µ in Gθ.
3.5. Model Comparison

We consider the problem of model selection, assuming that a set of important predictors has been identified a priori. Choosing among non-nested semiparametric models can be problematic since traditional information criteria, such as the Akaike information criterion, rely on the use of a full, differentiable likelihood. Although popular, Bayes factors are undefined for improper priors, and can be quite sensitive to the choice of prior specification when real priors are used.

An alternative measure, termed the log pseudomarginal likelihood (LPML) measure (Geisser & Eddy, 1979), is an aggregated measure of model fit based on the "leave-one-out" principle used in procedures such as cross validation and the jackknife. The LPML measure also leads to a "pseudo Bayes factor" that can be interpreted in much the same manner as a traditional Bayes factor. In addition to being easy to compute, the LPML statistic does not depend on model "focus" as does the Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002), and, unlike Bayes factors, is defined when specifying improper priors.
In each model, for i = 1, ..., n, we also obtain the Conditional Predictive Ordinate (CPO) (Geisser, 1993) statistic:
$$\mathrm{CPO}_i = \big[S(t_i \mid x_i, D_{-i})\big]^{1-\delta_i}\, \big[f(t_i \mid x_i, D_{-i})\big]^{\delta_i}.$$
For an uncensored observation, CPO_i is the posterior predictive density given the subset $D_{-i}$, evaluated at the observed survival time $t_i$. For a censored observation, $\mathrm{CPO}_i = P(T_i > t_i \mid D_{-i})$. The LPML statistic is given by
$$\mathrm{LPML} = \sum_{i=1}^{n} \log(\mathrm{CPO}_i),$$
and the pseudo Bayes factor between models 1 and 2 is defined to be $\mathrm{PBF}_{12} = \exp(\mathrm{LPML}_1 - \mathrm{LPML}_2)$, where $\mathrm{LPML}_i$ is the statistic from model i. The ratios of CPO_i statistics from two different models provide relative information on how well each case is fit. Chen, Shao & Ibrahim (2000) discuss MCMC computation of CPO_i. See Sahu & Dey (2004) for a recent application of CPO and LPML statistics for comparing survival models.
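One common MCMC estimator of CPO_i (in the spirit of the computations discussed by Chen, Shao & Ibrahim, 2000; the implementation here is our sketch, not necessarily the authors' exact one) is the harmonic mean of the ith likelihood contribution across posterior draws, computed on the log scale for numerical stability:

```python
import math

def lpml_from_draws(loglik_draws):
    """loglik_draws[s][i] = log L_i at posterior draw s (log density for an
    exactly observed time, log survivor value for a censored one).  Estimates
    CPO_i = 1 / mean_s(1 / L_i) and returns LPML = sum_i log CPO_i."""
    S = len(loglik_draws)
    n = len(loglik_draws[0])
    lpml = 0.0
    for i in range(n):
        neg = [-loglik_draws[s][i] for s in range(S)]       # log(1 / L_i) per draw
        mx = max(neg)                                       # log-sum-exp trick
        log_mean_inv = mx + math.log(sum(math.exp(v - mx) for v in neg) / S)
        lpml += -log_mean_inv                               # log CPO_i
    return lpml
```

The pseudo Bayes factor between two fitted models is then simply `math.exp(lpml1 - lpml2)`.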
Since there has not yet been justification of the use of the DIC (Spiegelhalter et al., 2002) in semiparametric settings, we will instead simply calculate the posterior expected log likelihood
$$\mathrm{PELL} \equiv \int \log L(\beta, \mathcal{Y}_M, \theta)\, p(\beta, \mathcal{Y}_M, \theta \mid D)\, d(\beta, \mathcal{Y}_M, \theta),$$
where $L(\beta, \mathcal{Y}_M, \theta)$ is the likelihood. The model with the largest expected log likelihood is regarded as more plausible. Carlin & Louis (2008) suggest that differences in DIC of less than 5 are hardly worth mentioning, with differences greater than 10 perhaps decisively indicating a preferred model in terms of prediction. Furthermore, Draper & Krnjajic (2007, Sec. 4.1) have shown that the DIC can be viewed as an approximation to the LPML. With this in mind, we regard differences in LPML that are greater than, say, 5 or 10 in magnitude as "decisive."
4. SIMULATED DATA AND EXAMPLES
In this section two data sets are considered: the classic Stanford heart transplant data and data involving cerebral edema in children with diabetic ketoacidosis. Data simulated exactly according to each model provide some indication of how well the model selection criteria work, of what the MPT extension of the parametric log-logistic model adds, and of how well regression effects are estimated.
4.1. Simulated Data

A modest simulation study was carried out to determine the ability of PELL and LPML to correctly identify the data generating mechanism, as well as the ability of the model to tie down the correct regression effect. To keep things simple, data were generated from two covariate groups: a baseline group with $x_0(t) = 0$ for all t ≥ 0, and a second group with a covariate process consisting of one jump, $x_1(t) = I\{t \geq 4\}$; the baseline covariate was chosen with probability 0.5. Baseline survival is assigned a mixture of two normal distributions, $0.7\, N(6, 1.5^2) + 0.3\, N(12, 1)$; that is, the baseline survival curve for each model is given by $S_0(t) = 1 - 0.7\,\Phi\{(t-6)/1.5\} - 0.3\,\Phi\{t-12\}$. The survival curves for the group with TDC $x_1(t)$ are given by
$$S_1(t) = \begin{cases} I\{t<4\}\, S_0(t) + I\{t \geq 4\}\, S_0\big(4 + (t-4)\, e^{\beta}\big), & \text{COTD} \\ I\{t<4\}\, S_0(t) + I\{t \geq 4\}\, S_0(4)\, S_0(e^{\beta} t)/S_0(e^{\beta} 4), & \text{PKTD} \\ I\{t<4\}\, S_0(t) + I\{t \geq 4\}\, S_0(4)^{1-e^{\beta}}\, S_0(t)^{e^{\beta}}. & \text{CTD} \end{cases}$$
The simulation reflects the situation where a covariate switches "on" at time t = 4 with probability 0.5. Figure 1 illustrates the differences among the models for β = 0.5. The PKTD model exhibits the most pronounced deviation from baseline, with time warped instantaneously at t = 4 for those with covariate x1(t). The COTD model also shows a time-warping effect, but a less drastic one; for example, the distance between modes is a bit larger. The CTD model shows the least deviation from baseline and relatively little "warping."
Figure 1: Survival densities for x1(t): dashed = PKTD, solid = COTD, thick solid = CTD; dots = baseline x0(t).
The regression effect was fixed at β = 0.5. The same random-walk Metropolis–Hastings update was used for each model:
$$\begin{pmatrix} \mu^* \\ \beta^* \\ \sigma^* \end{pmatrix} \sim N_3\left( \begin{pmatrix} \mu \\ \beta \\ \sigma \end{pmatrix},\; \begin{bmatrix} 0.02 & 0.01 & 0.00 \\ 0.01 & 0.10 & 0.00 \\ 0.00 & 0.00 & 0.02 \end{bmatrix}^2 \right).$$
Each model was fit assuming a baseline log-logistic centring family with c = 1 and J = 5. The log-logistic family including an intercept was indexed by θ = (µ, σ) as
$$G_\theta(t) = \frac{t^{1/\sigma} \exp(-\mu/\sigma)}{1 + t^{1/\sigma} \exp(-\mu/\sigma)}.$$
Inferences from each fit were obtained from 10,000 MCMC iterates kept after a burn-in of 2,000.
Starting values were σ = 0.3, µ = 2.0, and β = 0.5, reflecting an approximate log-logistic fit to
the true bimodal baseline density.
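As a quick numerical sanity check on the centring family (plain Python; e^µ is the median, since u = t^{1/σ} e^{−µ/σ} equals 1 there):

```python
from math import exp

def G(t, mu=2.0, sigma=0.3):
    """Log-logistic centring CDF: G(t) = u/(1 + u) with u = t^(1/sigma) e^(-mu/sigma)."""
    if t <= 0.0:
        return 0.0
    u = t ** (1.0 / sigma) * exp(-mu / sigma)
    return u / (1.0 + u)
```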
The underlying parametric model (c → ∞) was also fit to assess the necessity of the MPT
generalization versus a unimodal log-logistic centring distribution (recall that the true baseline
is bimodal). One hundred data sets of size n = 200 were generated for each of the three models
and PELL, LPML, coverage probabilities for β as well as interval size computed. The results of
the simulation study are summarized in Table 1. The mean credible interval sizes for the effect
β = 0.5 were quite similar across MPT and parametric models and are not included. Similarly,
the PELL and LPML rankings were similar, so only LPML are reported. The first four rows
of Table 1 pertain to estimating β = 0.5 under parametric and nonparametric models. For both
AFT models the mean squared error (MSE) decreases when the nonparametric model is used and
the coverage increases (PKTD) or stays the same (COTD). For the Cox model these trends are
reversed.
For a true model matched with itself, the proportion in the lower part of the table gives the
proportion of times the MPT model was picked relative to the parametric log-logistic model.
The more general MPT model is clearly preferable to the simple parametric model in terms of
prediction for all three models, most markedly the PKTD and CTD models.
Table 1: Results from simulating data of size n = 200 from each model 100 times.

                          True model
                      PKTD   COTD   CTD
√MSE—MPT              0.10   0.08   0.24
√MSE—parametric       0.17   0.14   0.14
Coverage—MPT          0.90   0.90   0.82
Coverage—parametric   0.65   0.90   0.96

Fitted model              Proportion picked
PKTD                  1.00   0.65   0.92
COTD                  0.90   0.87   0.96
CTD                   0.98   0.63   1.00
The off-diagonals give the proportion of times the true column model was picked relative to the
row. For example, for the PKTD model, the MPT PKTD model was picked over the parametric
model 100% of the time, the PKTD MPT model was picked over the COTD MPT model 90% of
the time, and the PKTD MPT model was picked over the CTD MPT model 98% of the time.
Figure 1 shows that the difference in the effect of two covariate trajectories x0(t) and x1(t)
on predictive densities is minor for the CTD model compared to the difference for PKTD and
COTD, perhaps contributing to the apparent paradox in terms of interval coverage and MSE.
We also performed simulations with covariates that jump at random locations, as well as other
scenarios. Overall, the “best” model as chosen by either PELL or LPML provided reasonable
estimates of the true underlying densities, whether or not the model chosen was the true model
that generated the data. This highlights the fact that the best predictive model may not necessarily
be the “true” model. However, the ability of the criteria to pick the true model increased with the
sample size.
4.2. Stanford Heart Transplant Data

Crowley & Hu (1977) presented data on patients admitted to the Stanford Heart Transplant Program
and analyzed it using the Cox model with TDC's. Lin & Ying (1995) use these same data to
illustrate their semiparametric estimation procedure for COTD that is based on a heuristically
constructed estimating equation and justified via asymptotic properties. Here we fit to these data
the models CTD, COTD, and PKTD using the MPT prior with a log-logistic base-measure, M = 5
and w = 1. Patients in the program either underwent a heart transplant operation or did not. For those
patients that did not receive a new heart the TDC process is xi(t) ≡ 0. For patient i who received
a transplant, we denote by zi the time of transplant and define the TDC’s
xi1(t) = { 0 if t < zi;  1 if t ≥ zi },
xi2(t) = { 0 if t < zi;  age at transplant − 35 if t ≥ zi },
xi3(t) = { 0 if t < zi;  mismatch score − 0.5 if t ≥ zi },

and xi(t) = (xi1(t), xi2(t), xi3(t))′. Results from the three posterior distributions are displayed in
Table 2.
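These step TDC's can be coded in a few lines (illustrative sketch; the function and argument names are ours, not from the paper):

```python
def heart_tdc(t, z=None, age=None, mismatch=None):
    """x_i(t) for the Stanford data: all zeros before transplant (or always,
    if no transplant occurred, z is None); after transplant time z the
    indicator switches on and the centred age and mismatch score apply."""
    if z is None or t < z:
        return (0.0, 0.0, 0.0)
    return (1.0, age - 35.0, mismatch - 0.5)
```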
These data were first analyzed using the CTD model by Crowley & Hu (1977) and involve
the time to death from after entry into the study, which was designed to assess the effect of
Table 2: Posterior inferences for Stanford heart transplant data.

Parameter      PKTD                    COTD                   CTD-MPT                CTD
PELL           −461.3                  −460.5                 −458.3
LPML           −468.0                  −467.0                 −464.1
Status         −1.76 (−3.86, 1.57)     −1.10 (−2.70, 0.50)    −1.04 (−1.99, −0.17)   −1.04 (−2.01, −0.07)
Age-35         0.104 (−0.020, 0.260)   0.054 (−0.004, 0.133)  0.058 (0.015, 0.107)   0.055 (0.010, 0.100)
Mismatch-0.5   1.63 (−0.38, 3.89)      0.64 (−0.30, 1.52)     0.49 (−0.09, 1.03)     0.49 (−0.06, 1.04)
heart transplant on survival. Individuals entered the study and many received donor hearts at
some point according to availability of an appropriate heart and a prioritization scheme. Some
patients died before a suitable heart was found. The main TDC we considered was an indicator of
having received a heart, yes or no, at each time t. The second and third TDC's were age at
transplant, centred at 35 years, and a mismatch score, centred at 0.5. These TDC's switched on when
the heart was transplanted. While there are other covariates, these are the ones used by Lin &
Ying (1995), and which correspond to one of the models fit by Crowley & Hu (1977). Table 2
has posterior regression effect estimates and model selection criteria, along with results from
maximizing the partial likelihood. The models are ranked CTD, COTD, PKTD using the LPML
and PELL measures, although the differences among models are not large.
Our estimates for the CTD model are quite close to those obtained via partial likelihood.
Under the CTD model, consider two individuals aged 35 years with mismatch scores of 0.5. The
first individual receives no heart transplant while the second receives a new heart after 6 months.
The relative hazard comparing the individual with no heart transplant to the one with the heart
transplant equals one from time 0 to 6 months, and is e^{−β1} from that time on. A 95% probability
interval for the relative hazard after 6 months is (1.19, 7.31), and the posterior median is 2.83. So
under this model, the individual without the heart transplant is estimated to have about three times
the risk of the individual with the heart transplant, from 6 months on. Figure 2 displays estimated
survivor curves for these two individuals, and their 95% limits from the MPT CTD model and the
defaults using the survival package for the R computing language (e.g., the Aalen estimator
of survival coupled with the partial likelihood estimate of β). The two approaches agree quite
well. Turning to the PKTD model, the estimated median/mean residual life increases by a factor
of e^{1.76} ≈ 5.81 after transplant; that is, transplant recipients can expect to live about six times as
long as those without transplants.
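These summary numbers follow directly from the posterior medians in Table 2; a quick check (plain Python):

```python
from math import exp

# CTD: relative hazard of no-transplant vs. transplant after 6 months is
# exp(-beta1); plugging in the posterior median beta1 = -1.04 from Table 2:
ctd_relative_hazard = exp(1.04)        # about 2.83

# PKTD: multiplicative change in median/mean residual life after transplant,
# using the PKTD Status estimate -1.76 from Table 2:
pktd_residual_life_factor = exp(1.76)  # about 5.81
```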
For comparison, we fit the model with a parametric exponential baseline survival func-
tion, yielding posterior median estimates for (β1, β2, β3) of (−2.74, 0.08, 0.98) and an LPML
of −486.3, much smaller than any value in Table 2. Thus the exponential model, the intersection
of the three semiparametric models, does not appear to be appropriate for these data.
Lin & Ying's (1995) estimates for the COTD model were (−1.99, 0.096, 0.93), found from their
Table 1. These values are closer to the exponential fit described above than to our semiparametric
fit. This, in one sense, is not surprising. A component of the score function used in their estimation
procedure involves the unknown ratio λ′0(t)/λ0(t). For simplicity, Lin & Ying take this to be equal
to 0, which is optimal for an exponential baseline. We note that when we tried minimizing Lin
and Ying’s estimating function ‖U(b)‖ over a grid with λ0(t) ∝ 1, we obtained the point estimate
(−2.49, 0.11, 1.07), which is not too dissimilar to theirs and even closer to that obtained from the
Bayesian model assuming an exponential baseline.
Figure 2: Estimated survival curves and 95% probability intervals for individuals with mismatch score = 0.5 and age = 35.
4.3. Cerebral Edema

Here we develop survival models for data collected by Glaser et al. (2001), who assessed risk
factors associated with the onset of cerebral edema (CE) in children with diabetic ketoacidosis.
Cerebral edema is a dangerous complication associated with emergency department and in-patient
hospital care of children with diabetic ketoacidosis. Children with symptoms of diabetic ketoaci-
dosis are initially treated in the emergency department, then moved to the hospital, typically the
paediatric intensive care unit, over the course of 24 h. The main purpose of treatment is to nor-
malize blood serum chemistry and acid-base abnormalities. A major, but infrequent complication
of children associated with diabetic ketoacidosis and its treatment is CE, or swelling in the brain,
which may result in death or permanent neurological damage. After 24 h, the child is considered
to be “cured,” so we only consider those children who actually developed CE; thus there is no
censoring for these data (n = 58).
Glaser et al. (2001) developed epidemiologic models for assessing the probability of onset as a
function of potential risk factors and confounders. Our goal, on the other hand, is limited to
ascertaining the effect of treatment procedures, which are changing in time, and the effect of
fixed covariates, on the timing of CE. For example, what are the factors associated with early
versus late onset, if any, among those who develop CE? We could have considered an alternative
formulation where the yes/no binary onset is modeled as a binary regression jointly with the
survival model considered here for those who did experience CE. The development in Hanson et
al. (2003) illustrates such an analysis, in the absence of time-dependent covariates. They modeled
the event of abortion in dairy cattle (yes/no) as a binary regression and, conditional on abortion,
modeled the timing of abortion with a semi-parametric survival model. If animals failed to abort
within 260 days, they would have given birth so there was also no censoring, unless a cow was
culled before aborting or giving birth.
4.3.1. The data

At the time of initial admission to the emergency department, several measurements were taken,
and at the same time, various treatments applied, continuously for up to 24 h. The original data
include 19 fixed covariates and 12 time-dependent covariates. Rather than attempt a complete
analysis with so many variables, we considered only a subset of the covariates for our illustration,
thus the analysis should only be considered as a demonstration of how our modelling works rather
than a definitive analysis of these data.
The only fixed variable considered here is age (in years) at the time of admission to the emer-
gency department. Time-dependent covariates were recorded hourly rather than continuously.
Two types of TDC’s are considered. The first involves simply monitoring biochemical variables
over time. Two of these included here are serum bicarbonate (Serum-BIC) (concentration in the
blood measured in mmol/L) and blood urea nitrogen (BUN) (concentration measured in mg/dl).
The second type involves actions by the physicians over time. The two used here are fluids admin-
istered (FL) (volume of fluids inml/kg/h) and sodium administered (NA) (measured inmEq/kg/h).
The ith patient in the data has a covariate process given by {(rij, x(rij)) : j = 1, . . . , mi},
i = 1, . . . , 58, and
is diagnosed with CE at time ti; note that ri,mi < ti. None of the event times are censored.
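Because the TDC's are recorded hourly, each covariate path is a step function; a minimal lookup built on the recorded pairs (standard library only; names are ours):

```python
import bisect

def step_process(times, values):
    """Right-continuous step covariate built from recorded pairs (r_ij, x(r_ij)):
    x(t) is the value at the latest recording time <= t. `times` must be
    sorted with times[0] = 0 so every t >= 0 is covered."""
    def x(t):
        j = bisect.bisect_right(times, t) - 1
        return values[j]
    return x
```

For instance, step_process([0, 1, 2], [5.0, 7.0, 9.0]) holds 5.0 on [0, 1), 7.0 on [1, 2), and 9.0 from hour 2 onward.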
4.3.2. Initial data analysis

We used the log-logistic family with unknown scale to centre the three MPT survival models with
TDC’s with M = 4 and w = 1. After selecting candidate generating distributions, we were able
to establish convergence of our chains by standard techniques after 10,000 iterations. Estimates
based on fitted models are shown in Table 3.
Plots of log CPOi values versus the index i showed that the data are supported similarly by the
three models. Plots of the residuals versus predicted values also showed no obvious lack of fit for
Table 3: Posterior inferences for cerebral edema data.
Par. PKTD COTD CTD
PELL −167 −168 −168
LPML −176 −176 −175
Age (Fixed) 0.028 (−0.01, 0.08) 0.021 (−0.02, 0.07) 0.044 (−0.02, 0.11)
Serum-Bic (TD) 0.04 (−0.01, 0.13) 0.05 (−0.02, 0.12) 0.06 (−0.05, 0.17)
Serum-BUN (TD) −0.005 (−0.02, 0.01) −0.01 (−0.022, 0.005) −0.00 (−0.03, 0.03)
Adm-FL (TD) −0.03 (−0.09, 0.03) −0.05 (−0.10, 0.02) −0.05 (−0.15, 0.04)
Adm-NA (TD) 0.60 (0.16, 0.93)∗ 0.74 (0.18, 1.2)∗ 0.90 (0.19, 1.57)∗
FL×NA −0.011 (−0.03, −0.00)∗ −0.013 (−0.03, 0.001) −0.014 (−0.04, 0.003)
Serum-Bic2 −0.005 (−0.01, 0.006) −0.006 (−0.02, 0.003) −0.007 (−0.02, 0.005)
any of the models. Prediction intervals for the actual observations were skewed, but all contained
the observed values. Table 3 gives PELL and LPML values for each model, and there is again no
obvious distinction among the models.
Estimates of regression coefficients for all variables in the models have the same sign and
general magnitude across models. Under all models, there is a 99% posterior probability that the
coefficient for Admin-NA is positive and at least a 96% posterior probability that the coefficient
for the interaction is negative. The Serum-BIC variable has at least a 94% probability of being
positive across models. Other variables have between an 80% and 91% posterior probability of
being positive, or negative.
According to all models, since the linear term is positive and the quadratic term is negative, the
effect of Serum-BIC increases and then decreases. For example, under the CTD model, we have
the estimated risk factor exp{0.06 (Serum-BIC) − 0.007 (Serum-BIC)²}, which equals 1 when
Serum-BIC is about 8.6, is maximized at the value 4.3, with risk factor about 1.14, and takes on the
value 0.13 when Serum-BIC is 22. Normal levels are between 20 and 29 and lower values are
associated with diabetic ketoacidosis.
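The quoted crossing point and maximizer follow from the quadratic: the risk factor returns to 1 at Serum-BIC = 0.06/0.007 ≈ 8.6 and peaks at 0.06/(2 × 0.007) ≈ 4.3. A quick check with the rounded point estimates (plain Python):

```python
from math import exp

def rf_bic(x, b1=0.06, b2=-0.007):
    """Estimated CTD risk factor exp{b1 * Serum-BIC + b2 * Serum-BIC^2},
    using the rounded point estimates from the text as defaults."""
    return exp(b1 * x + b2 * x * x)
```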
4.3.3. Comparative analysis

Comparing two children that are otherwise being treated the same over a period of time and who
are of the same age, the hazard of CE for a child with a value of 22, a value in the normal range
for Serum-BIC, will be considerably lower than for one with a value of 5. Similarly, under the
PKTD model, the failure time of the first child would be decelerated relative to the second, over
the same period of time, and under the COTD model, total time to CE for the first type of child
would be “used up” at a slower rate than it would for the second child, which amounts to CE
happening earlier for the first child than for the second.
The posterior density estimates and hazard functions for time to CE corresponding to patients
with specified TDC profiles are simple to obtain. Figure 3, for example, presents predictive
densities for a hypothetical patient of age 10, BUN = 25, fluids constant at 3.6, NA constant at
0.7, and Serum-BIC increasing from 5 to 22, as was the case for patient 5 in the data. All three
densities are at least bimodal; however, the COTD and PKTD estimates are smoother, with modes
shifted to the left. The CTD estimate has visible hourly “spikes,” a consequence of the CTD
model specification coupled with data being recorded in hours. Smoother densities would have
been obtained with larger w.
Figure 3: CE data; predictive densities for subject of age 10, constant fluids at 3.6, BUN = 25, constant NA = 0.7, and Serum-BIC increasing from 5 to 22 over the 24 h. Solid = COTD, thick solid = CTD, dashed = PKTD.
Figure 4 presents an estimated relative hazard comparing two subjects who have the same
characteristics as the above hypothetical patient except that they both have BUN = 35 and subject
1 has constant NA = 0.7 and subject 2 has constant NA = 0.35. Observe that the CTD model
gives a constant relative hazard since the only difference in the two subjects is a TDC that is
remaining constant over time. According to this model, subject 1 is estimated to be about 1.35
times as much at risk of CE as subject 2 for all times. Under the PKTD and COTDmodels, subject
2 is at lower risk of CE, but the estimated risk varies considerably over the first 18 h.
4.3.4. Risk factor profiles

While the interpretations of model parameters and comparative risks such as above are quite useful,
the continuous monitoring and treatment of these patients also calls for a description of how the
risk evolves for individuals. Towards this end, we considered individual risk profiles obtained by
plotting the evolution of estimated risk factors (RFs) over time for individuals who experience
changing circumstances, as in our illustration. Here, an RF profile is defined as {exp(xi(t)′β) : t = 0, 1, . . . , ti} for each i = 1, . . . , 58. We select regression estimates (posterior means) from one
of the three models. We obtained estimated RF profiles for the CTD model against t for the 58
subjects.

Figure 4: CE data; hazard ratio for subject with NA = 0.7 versus NA = 0.35. Dotted = CTD, dashed = COTD, solid = PKTD.

The RFs at time of diagnosis of CE had a median of 1.7 with first and third quartiles of
1.3 and 2.3; minimum and maximum were 0.1 and 15, respectively. There were four basic shapes
of these plots: (i) relatively sharp increase just before diagnosis (there were about 7 children with
this shape), (ii) relatively sharp increase followed by a decrease and then followed quickly by CE
(about 9 children), (iii) moderate to small increase followed by decrease followed by an overall
flattening of the plot and finally CE (there are about 20 children with this shape), and (iv) relatively
flat or modulating curve but without sharp increases.
The type (i) shape appears to be dictated by a combination of increasing fluids and increasing
sodium administration. Plots of FL by NA for each patient indicate that FL and NA are posi-
tively associated across patients; administering more or less fluids corresponds to simultaneously
administering more or less sodium, respectively. The sample correlations between the estimated
RFs and FL, NA and Serum-BIC were 0.27, 0.57, and −0.41, respectively. As expected, larger
Serum-BIC was associated with smaller RFs.
The type (ii) scenario is generally associated with a relatively sharp increase in sodium and
or fluid administration, followed by a relatively quick decrease in these values. This was true for
patient 9 in particular; the RF went from 1.9 at baseline to 4.1 after the first hour, to 12.3 after
the third hour, and then down to about 4.5 roughly thereafter. FL went from 0 at baseline to 29
after the first hour and then immediately down to 5 in the next hour and back up to around 16 for
the remaining time until CE at 5 h. NA values went from 0 to 4.4 and then down to about 2.5
thereafter. The type (iii) scenario is similar only less dramatic.
Regarding the type (iv) scenario, there were three children that had flat plots for periods less
than 6 h; one with an estimated RF of about 2 for two consecutive hours, another with an RF of
1.2 for 3 h and the third with an RF of about 1.75 for 5 h. About 18 of the curves are relatively
flat for over 10 h. Conditions here involve no dramatic changes in treatment and relatively low
administrations of FL and NA. Later CE is thus evidently associated with these conditions.
Interestingly, patient 35 had a very large RF after the first hour (11.7), despite the fact that no
fluids were administered either at baseline or at time 1. NA administration was relatively large
(2.1) at that time (median NA in the sample was 0.59 and 75th percentile was 1.25). Moreover, in
the next hour, their RF went down to 3.8 after water was administered at the second hour at the
rate FL = 29.2 (median FL = 7.6 and third quartile = 11.3), and NA went up to 4.5. This nicely
illustrates the effect of the interaction between FL and NA, since the sign of the coefficient for it is
negative. According to our models, having no or relatively low fluids, and relatively high sodium
administration results in a higher risk of CE diagnosis than having relatively high administration
of both.
The above presentation was based on the CTD model, which is the simplest to interpret
among the three. If the PKTD model had been accepted, we could now proceed to calculate the
RFs as above based on the coefficients from that model and we would call them acceleration
factors. Magnitudes would be attenuated somewhat for higher rates of sodium infusion, and their
interpretations require modification.
4.3.5. Concluding remarks

As a final comment in this illustration, we again remind the reader that our analysis is far from
complete. For example, we do not adjust the sodium infusion rate for the baseline sodium con-
centration in our analysis because baseline sodium was nonsignificant in a preliminary analysis.
Moreover, the sample size is very small, and the diagnosis could well have come after the actual
onset of CE. And finally, one must be very careful to not over interpret the associations found
between increasing fluids and sodium and the onset of CE since it may well be the case that the
physicians were simply reacting to a child who was not doing well, in which case it could be
the onset of CE that was bringing on the higher levels of these factors rather than the other way
around. We also remark that users of our methodology are encouraged to plot and compare hazard
functions based on a selected model for different TDC scenarios.
5. DISCUSSION
We have presented a unified approach to handling two standard semiparametric survival models
for survival data with time-dependent “step-stress” covariates, and we have taken a third model
that was suggested by Prentice and Kalbfleisch and developed it as an alternative semiparametric
model. This latter model has ready interpretation of regression effects in terms of residual life. We
presented methods of comparing models and illustrated them on two data sets and in a simulation.
In the cerebral edema data sets, there was little difference in the model ranking criteria; the Stanford
heart transplant data showed more of a difference, but not marked. However, we have found in
other data rather decisive rankings when choosing among survival models. For example, for a
large (n = 10,973) data set involving the time to bankruptcy of firms and seven time-dependent
predictor variables, the COTD model was decisively chosen relative to CTD (LPML = −1,901
versus LPML = −1,925). Analyzing the same (n = 251) medfly data as Tseng, Hsieh & Wang
(2005), we found the models of Sundaram (2006) and CTD to be far superior to COTD (LPML
values of −865, −866, and −938).
Results obtained here are easily extended to right truncated data by straightforward modification
of the likelihood functions, while left truncated data create difficulties due to the necessity of
observing covariate processes from time zero. In theory, these models can be fit to arbitrary
continuous x(t) subject to ‖x(t)‖ being bounded. The main obstacle in fitting the models is performing
the integrations involved in, for example in the PH model,

S(t|x, β) = exp{ −∫_0^t e^{x(s)β} h0(s) ds }
and its derivative. However, the function h0(s|YM, θ) is simply computed from (1) and (2) and
any number of numerical integration techniques can be used for this purpose.
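For step covariates the integral collapses to a finite sum over the segments where x(s) is constant, so only the baseline cumulative hazard H0(t) = ∫_0^t h0(s) ds is needed on each segment. A sketch (plain Python; the exponential H0 below is an assumed stand-in, not the MPT baseline):

```python
from math import exp

def ph_survival(t, jumps, xs, beta, H0):
    """S(t | x, beta) = exp(-sum_k e^{xs[k] * beta} [H0(b_k) - H0(a_k)]),
    where x(s) = xs[k] on [jumps[k], jumps[k+1]) and jumps[0] = 0; for a
    step covariate the PH integral reduces to this finite sum."""
    bounds = list(jumps) + [float("inf")]
    total = 0.0
    for k in range(len(xs)):
        a, b = bounds[k], min(bounds[k + 1], t)
        if b <= a:
            break
        total += exp(xs[k] * beta) * (H0(b) - H0(a))
    return exp(-total)

# Assumed stand-in baseline: exponential with rate 0.1, so H0(t) = 0.1 t.
H0 = lambda s: 0.1 * s
s8 = ph_survival(8.0, [0.0, 4.0], [0.0, 1.0], 0.5, H0)
```

With one jump at t = 4, s8 agrees with the closed-form CTD expression S0(4)^{1−e^β} S0(8)^{e^β}, where S0(t) = exp(−H0(t)), matching the survival curve given in the simulation section.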
We finally note that our TDC’s in the heart transplant example were external covariates,
while those in the CE example were internal (see Kalbfleisch & Prentice, 1980). The practical
distinction is that internal covariates are not observed after “death” or censoring, while external
covariates are. While fitting models is unaffected by the distinction, inferences about conditional
survivor functions and predictive densities are. For example, when we calculate a predictive
density conditional on a TDC process for an individual in the CE data, we have to impute values
for the process beyond the time of their removal from the study.
ACKNOWLEDGEMENTS

The authors thank Dr. Nathan Kuppermann and Dr. Nicole Glaser of the University of California
at Davis Medical School for permission to use their CE data and for providing medical insight
for the analysis presented here. We also thank two referees and an associate editor for comments
and suggestions that resulted in an improved manuscript.
BIBLIOGRAPHY

O. O. Aalen (1980). A model for nonparametric regression analysis of counting processes. In Springer Lecture Notes in Statistics, W. Klonecki, A. Kozek & J. Rosinski, editors, 2, pp. 1–25.
V. Bagdonavicius & M. S. Nikulin (2000a). Mathematical modelling of failure-time in dynamic environment. Preprint of the St. Petersburg Mathematical Society, 2000–2004.
V. Bagdonavicius & M. S. Nikulin (2000b). Statistical analysis of semiparametric models in accelerated life testing. Preprint of the St. Petersburg Mathematical Society, 2000–2002.
E. Bedrick, R. Christensen & W. Johnson (2000). Bayesian accelerated failure time analysis with application to veterinary epidemiology. Statistics in Medicine, 19, 221–237.
B. P. Carlin & T. A. Louis (2008). "Bayesian Methods for Data Analysis," 3rd ed., Chapman and Hall/CRC Press, Boca Raton, FL.
M.-H. Chen, Q.-M. Shao & J. G. Ibrahim (2000). "Monte Carlo Methods in Bayesian Computation," Springer-Verlag, New York.
P. K. Chintagunta (1998). Inertia and variety seeking in a model of brand purchase timing. Marketing Science, 17, 253–270.
D. R. Cox (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–208.
D. R. Cox & D. Oakes (1984). "Analysis of Survival Data," Chapman and Hall, London.
J. Crowley & M. Hu (1977). Covariance analysis of heart transplant data. Journal of the American Statistical Association, 72, 27–36.
D. Draper & M. Krnjajic (2007). Bayesian model specification. Technical report, Department of Applied Mathematics and Statistics, University of California, Santa Cruz.
T. S. Ferguson (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1, 209–230.
S. Geisser (1993). "Predictive Inference: An Introduction," Chapman & Hall, London.
S. Geisser & W. F. Eddy (1979). A predictive approach to model selection. Journal of the American Statistical Association, 74, 153–160.
N. Glaser, P. Barnett, I. McCaslin, D. Nelson, J. Trainor, J. Louie, F. Kaufman, K. Quayle, M. Roback, R. Malley & N. Kuppermann (2001). Risk factors for cerebral edema in children with diabetic ketoacidosis. New England Journal of Medicine, 344, 264–269.
T. E. Hanson (2006). Inference for mixtures of finite Polya tree models. Journal of the American Statistical Association, 101, 1548–1565.
T. Hanson & W. O. Johnson (2002). Modeling regression error with a mixture of Polya trees. Journal of the American Statistical Association, 97, 1020–1033.
J. D. Kalbfleisch & R. L. Prentice (1980). "The Statistical Analysis of Failure Time Data," Wiley, New York.
M. Lavine (1992). Some aspects of Polya tree distributions for statistical modeling. Annals of Statistics, 20, 1222–1235.
M. Lavine (1994). More aspects of Polya tree distributions for statistical modeling. Annals of Statistics, 22, 1161–1176.
D. Lin & Z. Ying (1995). Semiparametric inference for the accelerated failure time model with time-dependent covariates. Journal of Statistical Planning and Inference, 44, 47–63.
T. Martinussen & T. H. Scheike (2006). "Dynamic Regression Models for Survival Data," Springer-Verlag, New York.
R. M. Neal (2003). Slice sampling (with discussion). Annals of Statistics, 31, 705–767.
R. L. Prentice & J. D. Kalbfleisch (1979). Hazard rate models with covariates. Biometrics, 35, 25–39.
S. K. Sahu & D. K. Dey (2004). On multivariate survival models with a skewed frailty and a correlated baseline hazard process. In "Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality," M. G. Genton, editor, CRC/Chapman & Hall, Boca Raton, FL, pp. 321–338.
P. B. Seetharaman (2004). The additive risk model for purchase timing. Marketing Science, 23, 234–242.
H.-J. Shyur, E. A. Elsayed & J. T. Luxhøj (1999). A general model for accelerated life testing with time-dependent covariates. Naval Research Logistics, 46, 303–321.
D. J. Spiegelhalter, N. G. Best, B. P. Carlin & A. van der Linde (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583–639.
S. Sundaram (2006). Semiparametric inference in proportional odds model with time-dependent covariates. Journal of Statistical Planning and Inference, 136, 320–334.
L. Tierney (1994). Markov chains for exploring posterior distributions. Annals of Statistics, 22, 1701–1762.
Y.-K. Tseng, F. Hsieh & J.-L. Wang (2005). Joint modelling of accelerated failure time and longitudinal data. Biometrika, 92, 587–603.
S. G. Walker & B. K. Mallick (1997). Hierarchical generalized linear models and frailty models with Bayesian nonparametric mixing. Journal of the Royal Statistical Society, Series B, 59, 845–860.
S. G. Walker & B. K. Mallick (1999). Semiparametric accelerated life time model. Biometrics, 55, 477–483.
D. Zeng & D. Y. Lin (2007). Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society, Series B, 69, 507–564.
Received 11 April 2007
Accepted 28 September 2008