Bayesian Semiparametric Proportional Odds Models

Biometrics 63, 88–95, March 2007

DOI: 10.1111/j.1541-0420.2006.00671.x


Timothy Hanson

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A.

email: [email protected]

and

Mingan Yang

Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, U.S.A.

Summary. Methodology for implementing the proportional odds regression model for survival data assuming a mixture of finite Polya trees (MPT) prior on baseline survival is presented. Extensions to frailties and generalized odds rates are discussed. Although all manner of censoring and truncation can be accommodated, we discuss model implementation, regression diagnostics, and model comparison for right-censored data. An advantage of the MPT model is the relative ease with which predictive densities, survival, and hazard curves are generated. Much discussion is devoted to practical implementation of the proposed models, and a novel MCMC algorithm based on an approximating parametric normal model is developed. A modest simulation study comparing the small sample behavior of the MPT model to a rank-based estimator and a real data example are presented.

Key words: Frailty; Generalized odds rate; Hazard curve; Mixture of Polya trees; Regression; Survival analysis; Transformation model.

1. Introduction

The proportional odds (PO) model has recently gained attention as an alternative to proportional hazards (Cox, 1972) and accelerated failure time models. All three of these models are typically specified to be semiparametric where the parametric portion of the model provides a succinct summary relating patient survival to a relatively small number of regression coefficients: risk factors, acceleration factors, or in the present article a multiplicative factor quantifying the relative odds of survival. The nonparametric part of the model, the baseline hazard or survival function, is typically left as arbitrary as possible, so inference does not depend on a particular parametric form such as log normal, log logistic, or Weibull. However, in addition to numerical summaries such as acceleration factors, researchers are often also interested in hazard and survival functions, as well as predictive survival densities.

We motivate the proposed methodology with the well-known Veterans Administration (VA) Lung Cancer dataset introduced by Prentice (1973) and subsequently analyzed by many authors in the context of the semiparametric PO model (Cheng, Wei, and Ying, 1997; Murphy, Rossini, and van der Vaart, 1997; Yang and Prentice, 1999) as well as parametric models (Farewell and Prentice, 1977; Bennett, 1983b). These data comprise the lifetimes in days of men with advanced inoperable lung cancer. Important predictors of survival have been well established by other authors, namely tumor type (large, adeno, small, squamous) and a general fitness performance score ranging from 10 (completely hospitalized) to 90 (able to take care of oneself).

Most existing approaches to fitting the PO model focus on the estimation of regression effects; a few extend estimation to survival curves (e.g., Cheng et al., 1997; Zeng, Lin, and Yin, 2005). The approach that we propose in this article allows the simultaneous estimation of regression coefficients and the baseline survival density. Other common loci of inference, such as hazard curves, are therefore routine to obtain, as are more complicated functionals such as hazard ratios, survival quantiles, etc. Specifically, the nonparametric part of the model, the baseline survival density $f_0$, is assumed to follow a mixture of finite Polya trees (MPT) prior (Lavine, 1992; Berger and Guglielmi, 2001; Hanson and Johnson, 2002). This prior is attractive in that it is easily centered at a parametric family such as the class of log-logistic distributions. When sample sizes are small, posterior inference has much more of a parametric flavor due to the centering family. When sample sizes increase, the data take over the prior and features such as multimodality and left skew are well modeled. For the VA data, analyzed in Section 7, we observe multimodal predictive survival densities; we are also able to obtain an estimate of median survival versus performance score with 95% credible intervals for a particular tumor type.

Sinha and Dey (1997) discuss other nonparametric priors such as the gamma process, beta process, Dirichlet process, and smoothed versions of these processes. The most widely used of the priors, the celebrated Dirichlet process (Ferguson, 1973), is intractable in the PO model. Ibrahim, Chen, and Sinha (2001, p. 94) note that “Dirichlet processes are quite difficult to work with in the presence of covariates, because they have no direct representation through either the hazard or cumulative hazard function.” Other Bayesian semiparametric approaches to the PO model have been scarce. Mallick and Walker (2003) propose a Bayesian transformation model that includes PO as a special case. Banerjee and Dey (2005) discuss the implementation of a PO model with spatially varying frailties. Both approaches utilize the nonparametric transformation set forth in Mallick and Gelfand (1994) and are discussed briefly in Hanson and Yang (2006).

© 2006, The International Biometric Society

Given a baseline survival function $S_0$ and vector of regression coefficients $\beta$, the PO model defines the survival function $S_x(t)$ for an individual with p-dimensional covariate vector $x$ through the relation

$$\frac{S_x(t)}{1 - S_x(t)} = e^{-x'\beta}\,\frac{S_0(t)}{1 - S_0(t)}. \tag{1}$$
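Relation (1) can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `po_survival` and the numerical inputs are assumptions introduced here, not part of the original article.

```python
import math

def po_survival(s0, xb):
    """Survival probability S_x(t) implied by relation (1).

    s0 : baseline survival probability S_0(t), strictly between 0 and 1
    xb : linear predictor x'beta
    """
    odds = math.exp(-xb) * s0 / (1.0 - s0)   # right-hand side of (1)
    return odds / (1.0 + odds)               # convert odds back to a probability

# Illustrative values (assumed): baseline survival 0.5 and x'beta = -1.1
print(po_survival(0.5, -1.1))
```

With `xb = 0` the function returns the baseline survival unchanged, which is a quick sanity check on the sign convention.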

Approaches to estimating β in this model include Bennett(1983a), Murphy et al. (1997), and Yang and Prentice (1999).

In Section 2, we discuss the MPT prior for model (1), and in Section 3 outline Markov chain Monte Carlo (MCMC) schemes for obtaining inference. Section 4 discusses model selection and validation. Extensions to frailties and the generalized odds-rate model are developed in Section 5. Section 6 presents a simulation study and in Section 7 the VA data are analyzed. Discussion and future research are in Section 8.

2. Mixture of Polya Trees Prior

We consider model (1) with an MPT prior on $S_0$. A mixture is attractive in that it smoothes over partitioning effects associated with a simple Polya tree and the mixture model includes the parametric log-logistic PO model as a special case. The log-logistic model is a natural choice for centering the MPT model as it has the PO property (Bennett, 1983b), which we exploit in Section 3.

Consider an MPT prior on $S_0$,

$$S_0 \mid \theta \sim PT(c, \rho, G_\theta), \qquad \theta \sim p(\theta), \tag{2}$$

where equation (2) is shorthand for a particular MPT prior (Hanson and Johnson, 2002; Hanson, 2006). We briefly describe the prior but leave details to these references and Lavine (1992).

Let $J$ be a fixed, positive integer and let $G_\theta$ denote the family of log-logistic cumulative distribution functions,

$$G_\theta(t) = 1 - \{1 + (e^{-\beta_0} t)^{\sqrt{\tau}}\}^{-1} \quad \text{for } t \ge 0,$$

where $\theta = (\beta_0, \tau)'$.

The distribution $G_\theta$ serves to center the random distribution $S_0$. A Polya tree prior is constructed from a set of partitions $\Pi_\theta = \{B_\theta(\epsilon) : \epsilon \in \bigcup_{l=1}^{J} \{0,1\}^l\}$ and a family $\mathcal{A}$ of positive real numbers. Here, the partition points are quantiles of the centering family: if $j$ is the base-10 representation of the binary number $\epsilon = \epsilon_1 \cdots \epsilon_k$ at level $k$, then $B_\theta(\epsilon_1 \cdots \epsilon_k)$ is defined to be the interval $(G_\theta^{-1}(j/2^k),\, G_\theta^{-1}((j+1)/2^k)]$, except the “rightmost” set is $B_\theta(11 \cdots 1) = (G_\theta^{-1}((2^k - 1)/2^k),\, \infty)$. For example, with $k = 3$ and $\epsilon = 000$, then $j = 0$ and $B_\theta(000) = (0,\, G_\theta^{-1}(1/8)]$, and with $\epsilon = 010$, then $j = 2$ and $B_\theta(010) = (G_\theta^{-1}(2/8),\, G_\theta^{-1}(3/8)]$, etc. A picture helps.

R+

Bθ(0) Bθ(1)

Bθ(00) Bθ(01) Bθ(10) Bθ(11)

Bθ(000) Bθ(001) Bθ(010) Bθ(011) Bθ(100) Bθ(101) Bθ(110) Bθ(111)
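The partition sets in the picture can be computed directly from the log-logistic quantile function. The sketch below is illustrative only: the helper names `loglogistic_quantile` and `partition_set`, and the parameter values $\beta_0 = 0$, $\tau = 1$, are assumptions made here for the example.

```python
import math

def loglogistic_quantile(u, beta0, tau):
    """Inverse of G_theta(t) = 1 - 1 / (1 + (exp(-beta0) * t) ** sqrt(tau))."""
    if u >= 1.0:
        return math.inf
    return math.exp(beta0) * (u / (1.0 - u)) ** (1.0 / math.sqrt(tau))

def partition_set(eps, beta0, tau):
    """Endpoints (lo, hi) of B_theta(eps) for a binary string eps such as '010'."""
    k = len(eps)
    j = int(eps, 2)                      # base-10 value of the binary number eps
    lo = loglogistic_quantile(j / 2**k, beta0, tau)
    hi = loglogistic_quantile((j + 1) / 2**k, beta0, tau)
    return lo, hi

# With beta0 = 0 and tau = 1 (assumed values), G(t) = t / (1 + t)
print(partition_set("010", 0.0, 1.0))   # quantiles of G at 2/8 and 3/8
print(partition_set("111", 0.0, 1.0))   # rightmost set, unbounded above
```

Because the endpoints are quantiles of $G_\theta$, each level-$k$ set has probability $2^{-k}$ under the centering distribution.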

Note then that at each level $k$, the class $\{B_\theta(\epsilon) : \epsilon \in \{0,1\}^k\}$ forms a partition of the positive reals and furthermore $B_\theta(\epsilon_1 \cdots \epsilon_k) = B_\theta(\epsilon_1 \cdots \epsilon_k 0) \cup B_\theta(\epsilon_1 \cdots \epsilon_k 1)$ for any binary $\epsilon_1 \cdots \epsilon_k$. We take the family $\mathcal{A} = \{\alpha_\epsilon : \epsilon \in \bigcup_{j=1}^{J} \{0,1\}^j\}$ to be defined by $\alpha_{\epsilon_1 \cdots \epsilon_k} = w k^2$ for some $w > 0$ (Walker and Mallick, 1999; Hanson and Johnson, 2002; Hanson, 2006). The parameter $w$ acts much like the precision in a Dirichlet process (Ferguson, 1973). As $w$ tends to zero the posterior baseline is almost entirely data driven. As $w$ tends to infinity we obtain a fully parametric analysis.

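Given $\Pi_\theta$ and $\mathcal{A}$, the Polya tree prior is defined up to level $J$ by the random vectors $\mathcal{Y} = \{(Y_{\epsilon 0}, Y_{\epsilon 1}) : \epsilon \in \bigcup_{j=0}^{J-1} \{0,1\}^j\}$ through the product of conditional probabilities

$$S_0\{B_\theta(\epsilon_1 \cdots \epsilon_k) \mid \mathcal{Y}, \theta\} = \prod_{j=1}^{k} Y_{\epsilon_1 \cdots \epsilon_j}, \tag{3}$$

for $k = 1, 2, \ldots, J$, where we define $S_0(A)$ to be the baseline measure of any set $A$. For example, given $\mathcal{Y}$ and $\theta$, the $S_0$-probability of $B_\theta(110)$ is $Y_{110} Y_{11} Y_{1}$ as $B_\theta(110) \subset B_\theta(11) \subset B_\theta(1)$.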

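The vectors $(Y_{\epsilon 0}, Y_{\epsilon 1})$ are independent Dirichlet:

$$(Y_{\epsilon 0}, Y_{\epsilon 1}) \sim \text{Dirichlet}(\alpha_{\epsilon 0}, \alpha_{\epsilon 1}), \qquad \epsilon \in \bigcup_{j=0}^{J-1} \{0,1\}^j. \tag{4}$$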

The Polya tree parameters “adjust” conditional probabilities, and hence the shape of the survival density $f_0$, relative to a parametric centering family of distributions. If the data are truly distributed $G_\theta$, observations should be on average evenly distributed among partition sets at any level $j$. Under the Polya tree posterior, if more observations fall into interval $B_\theta(\epsilon 0) \subset \mathbb{R}^+$ than its companion set $B_\theta(\epsilon 1)$, the conditional probability $Y_{\epsilon 0}$ of $B_\theta(\epsilon 0)$ is accordingly stochastically “increased” relative to $Y_{\epsilon 1}$. This adaptability makes the Polya tree attractive in its flexibility, but also anchors the random $S_0$ firmly about the family $\{G_\theta : \theta \in \Theta\}$. Within sets at the level $J$ in $\Pi_\theta$ we assume $S_0 \mid \mathcal{Y}, \theta$ follows the baseline $G_\theta$ (Hanson, 2006).
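A draw from the prior in (3) and (4) can be sketched as follows. Each pair is a two-component Dirichlet, so the first component is a Beta draw and the second is its complement. The function names, the dictionary representation keyed by binary strings, and the seed are assumptions made for this illustration.

```python
import random

def sample_polya_tree(J, w, rng):
    """Draw the pairs in (4) with alpha = w * k**2 at level k; return {eps: Y_eps}."""
    Y = {}
    for k in range(1, J + 1):
        a = w * k**2
        for j in range(2 ** (k - 1)):
            # the parent string eps has k-1 binary digits (empty at the top level)
            eps = format(j, "b").zfill(k - 1) if k > 1 else ""
            y0 = rng.betavariate(a, a)       # Dirichlet with two components
            Y[eps + "0"], Y[eps + "1"] = y0, 1.0 - y0
    return Y

def set_probability(Y, eps):
    """S_0-probability of B_theta(eps): the product in (3) along the path to eps."""
    prob = 1.0
    for i in range(1, len(eps) + 1):
        prob *= Y[eps[:i]]
    return prob

rng = random.Random(1)
Y = sample_polya_tree(J=3, w=1.0, rng=rng)
print(set_probability(Y, "110"))   # equals Y["1"] * Y["11"] * Y["110"]
```

Larger values of `w` concentrate every Beta draw near 0.5, which is how the prior pulls $S_0$ toward the centering family.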


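Define the vector of probabilities $p_{\mathcal{Y}} = (p_{\mathcal{Y}}(1), p_{\mathcal{Y}}(2), \ldots, p_{\mathcal{Y}}(2^J))'$ through

$$p_{\mathcal{Y}}(j + 1) = S_0\{B_\theta(\epsilon_1 \cdots \epsilon_J) \mid \mathcal{Y}, \theta\} = \prod_{i=1}^{J} Y_{\epsilon_1 \cdots \epsilon_i},$$

where $\epsilon_1 \cdots \epsilon_J$ is the base-2 representation of $j$, $j = 0, \ldots, 2^J - 1$. After simplification, the baseline survival function is

$$S_0(t \mid \mathcal{Y}, \theta) = p_{\mathcal{Y}}\{k_\theta(t)\}\{k_\theta(t) - 2^J G_\theta(t)\} + \sum_{j = k_\theta(t) + 1}^{2^J} p_{\mathcal{Y}}(j), \tag{5}$$

where $k_\theta(t)$ denotes the integer part of $2^J G_\theta(t) + 1$. The density associated with $S_0(t \mid \mathcal{Y}, \theta)$ is given by

$$f_0(t \mid \mathcal{Y}, \theta) = \sum_{j=1}^{2^J} 2^J p_{\mathcal{Y}}(j)\, g_\theta(t)\, I_{B_\theta\{\epsilon_J(j-1)\}}(t) = 2^J p_{\mathcal{Y}}\{k_\theta(t)\}\, g_\theta(t), \tag{6}$$

where $g_\theta(\cdot)$ is the density corresponding to $G_\theta$ and $\epsilon_J(i)$ is the binary representation $\epsilon_1 \cdots \epsilon_J$ of the integer $i$.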
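Equations (5) and (6) translate into a short routine once the level-$J$ probabilities are available. The following is a sketch under stated assumptions: the function name, the list representation of $p_{\mathcal{Y}}$, and the guard that clips $k_\theta(t)$ at $2^J$ for numerical safety are choices made here, not taken from the article.

```python
import math

def baseline_survival_and_density(t, p, beta0, tau):
    """Evaluate (5) and (6) at t > 0; p is the list (p_Y(1), ..., p_Y(2**J))."""
    n_sets = len(p)                                # equals 2**J
    a = math.sqrt(tau)
    z = (math.exp(-beta0) * t) ** a
    G = 1.0 - 1.0 / (1.0 + z)                      # log-logistic cdf G_theta(t)
    g = a * z / (t * (1.0 + z) ** 2)               # log-logistic density g_theta(t)
    k = min(int(n_sets * G) + 1, n_sets)           # k_theta(t), clipped when G is near 1
    S0 = p[k - 1] * (k - n_sets * G) + sum(p[k:])  # equation (5)
    f0 = n_sets * p[k - 1] * g                     # equation (6)
    return S0, f0

# When every set has probability 2**-J the centering distribution is recovered
print(baseline_survival_and_density(1.0, [1.0 / 8.0] * 8, 0.0, 1.0))
```

The uniform-probability case is a useful check because it must reproduce $1 - G_\theta(t)$ and $g_\theta(t)$ exactly.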

The MPT prior provides an intermediate choice between a strictly parametric analysis and allowing $S_0$ to be completely arbitrary. In some ways it provides the best of both worlds. In areas where data are sparse, such as the tails, the MPT prior places relatively more posterior mass on the underlying parametric family $\{G_\theta : \theta \in \Theta\}$. In areas where data are plentiful the posterior is more data driven, and features not allowed in the strictly parametric model, such as left skew and multimodality, become apparent. The user-specified weight $w$ controls how closely the posterior follows $\{G_\theta : \theta \in \Theta\}$, with larger values of $w$ yielding inference closer to that obtained from the underlying parametric model. Hanson (2006) shows that $w = 1$ provides a great deal of variability about the parametric centering family.

3. Likelihood Construction and MCMC

Assume standard, right-censored survival data $D = \{(x_i, t_i, \delta_i)\}_{i=1}^{n}$, and let $D_i$ denote the $i$th triple $(x_i, t_i, \delta_i)$. Let $T_i \sim S_{x_i}(\cdot)$. As usual, $\delta_i = 0$ indicates that $t_i$ is a censoring time, $T_i > t_i$, and $\delta_i = 1$ denotes that $t_i$ is a survival time, $T_i = t_i$.

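Given $S_0$ (through $\mathcal{Y}$ and $\theta$) and $\beta$, the survival function for covariates $x$ is

$$S_x(t \mid \mathcal{Y}, \theta, \beta) = \frac{e^{-x'\beta} S_0(t \mid \mathcal{Y}, \theta)}{1 - S_0(t \mid \mathcal{Y}, \theta) + e^{-x'\beta} S_0(t \mid \mathcal{Y}, \theta)}, \tag{7}$$

and the pdf is

$$f_x(t \mid \mathcal{Y}, \theta, \beta) = \frac{e^{-x'\beta} f_0(t \mid \mathcal{Y}, \theta)}{\{1 - S_0(t \mid \mathcal{Y}, \theta) + e^{-x'\beta} S_0(t \mid \mathcal{Y}, \theta)\}^2}, \tag{8}$$

where $S_0(t \mid \mathcal{Y}, \theta)$ and $f_0(t \mid \mathcal{Y}, \theta)$ are given by equations (5) and (6).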

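Assuming uninformative censoring, the likelihood for right-censored data is then given by

$$L(\mathcal{Y}, \theta, \beta) = \prod_{i=1}^{n} f_{x_i}(t_i \mid \mathcal{Y}, \theta, \beta)^{\delta_i}\, S_{x_i}(t_i \mid \mathcal{Y}, \theta, \beta)^{1 - \delta_i}. \tag{9}$$

Standard adjustments are made to (9) for left-censored and/or truncated data.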
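The logarithm of likelihood (9) can be accumulated observation by observation using (7) and (8). In this sketch the baseline survival and density are passed in as callables so the routine is independent of how they are computed; the function name and this calling convention are assumptions made for illustration.

```python
import math

def log_likelihood(data, beta, S0, f0):
    """Log of (9). data: iterable of (x, t, delta); S0, f0: baseline callables."""
    total = 0.0
    for x, t, delta in data:
        a = math.exp(-sum(xj * bj for xj, bj in zip(x, beta)))   # exp(-x'beta)
        s0 = S0(t)
        denom = 1.0 - s0 + a * s0
        if delta == 1:
            total += math.log(a * f0(t)) - 2.0 * math.log(denom)  # log of (8)
        else:
            total += math.log(a * s0) - math.log(denom)           # log of (7)
    return total

# Toy check with a log-logistic baseline (beta0 = 0, tau = 1), values assumed
S0 = lambda t: 1.0 / (1.0 + t)
f0 = lambda t: 1.0 / (1.0 + t) ** 2
toy = [((1.0,), 2.0, 1), ((0.0,), 1.0, 0)]
print(log_likelihood(toy, (0.5,), S0, f0))
```

Working on the log scale avoids underflow when the product in (9) runs over many observations.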

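We assume some familiarity with MCMC techniques; see Tierney (1994) for an overview. Three MCMC algorithms for fitting the MPT PO model were considered. All algorithms use a simple Metropolis–Hastings step for updating the components $(Y_{\epsilon 0}, Y_{\epsilon 1})$ one at a time by sampling candidates $(Y^*_{\epsilon 0}, Y^*_{\epsilon 1})$ from a Dirichlet$(m Y_{\epsilon 0}, m Y_{\epsilon 1})$ distribution, where $m > 0$, typically $m = 20$ or 30. This candidate is accepted as the “new” $(Y_{\epsilon 0}, Y_{\epsilon 1})$ with probability

$$\rho = \min\left\{1,\ \frac{\Gamma(m Y_{\epsilon 0})\,\Gamma(m Y_{\epsilon 1})\,(Y_{\epsilon 0})^{m Y^*_{\epsilon 0} - w j^2}\,(Y_{\epsilon 1})^{m Y^*_{\epsilon 1} - w j^2}\, L(\mathcal{Y}^*, \theta, \beta)}{\Gamma(m Y^*_{\epsilon 0})\,\Gamma(m Y^*_{\epsilon 1})\,(Y^*_{\epsilon 0})^{m Y_{\epsilon 0} - w j^2}\,(Y^*_{\epsilon 1})^{m Y_{\epsilon 1} - w j^2}\, L(\mathcal{Y}, \theta, \beta)}\right\},$$

where $j$ is the number of digits in the binary number $\epsilon 0$ and $\mathcal{Y}^*$ is the set $\mathcal{Y}$ with $(Y^*_{\epsilon 0}, Y^*_{\epsilon 1})$ replacing $(Y_{\epsilon 0}, Y_{\epsilon 1})$. These parameters have mixed reasonably well for many datasets unless $w$ is set to be very close to zero; see Hanson (2006).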
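The acceptance probability for a pair $(Y_{\epsilon 0}, Y_{\epsilon 1})$ is easiest to evaluate on the log scale. The sketch below tracks only $Y_{\epsilon 0}$, since $Y_{\epsilon 1} = 1 - Y_{\epsilon 0}$. The function names and the convention that `loglik` maps a value of $Y_{\epsilon 0}$ to the log of likelihood (9), with everything else held fixed, are assumptions introduced for this example.

```python
import math
import random

def log_accept_ratio(y0, y0_star, m, w, j, loglik_cur, loglik_star):
    """Log of the Metropolis-Hastings ratio for one pair (Y_e0, Y_e1)."""
    y1, y1_star = 1.0 - y0, 1.0 - y0_star
    a = w * j**2                      # Dirichlet prior parameter at level j
    num = (math.lgamma(m * y0) + math.lgamma(m * y1)
           + (m * y0_star - a) * math.log(y0)
           + (m * y1_star - a) * math.log(y1) + loglik_star)
    den = (math.lgamma(m * y0_star) + math.lgamma(m * y1_star)
           + (m * y0 - a) * math.log(y0_star)
           + (m * y1 - a) * math.log(y1_star) + loglik_cur)
    return num - den

def update_pair(y0, m, w, j, loglik, rng):
    """One update of Y_e0; the candidate comes from Dirichlet(m*Y_e0, m*Y_e1)."""
    y0_star = rng.betavariate(m * y0, m * (1.0 - y0))
    if not 0.0 < y0_star < 1.0:       # reject candidates that round to 0 or 1
        return y0
    log_rho = log_accept_ratio(y0, y0_star, m, w, j, loglik(y0), loglik(y0_star))
    return y0_star if math.log(1.0 - rng.random()) < log_rho else y0
```

A candidate identical to the current value gives a log ratio of zero, and swapping the roles of current and candidate flips the sign, which are two quick checks on the algebra.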

When $Y_{\epsilon_1 \cdots \epsilon_k} = 0.5$ for $k = 1, \ldots, J$ and $\epsilon_1 \cdots \epsilon_k \in \{0,1\}^k$, the underlying parametric model is obtained and $L(\mathcal{Y}, \theta, \beta)$ is equal to the corresponding parametric log-logistic likelihood function. When the observed data are not highly unlike log-logistic generated data, a reasonable set of proposal distributions for $(\beta_0, \beta_1, \ldots, \beta_p)$ and $\tau$ is obtained by considering a normal approximation to the parametric log-logistic model, which further approximates the MPT model centered at the log-logistic family. The log-logistic regression model in “standard form” is given by

$$Y_i = \log(T_i) = x_i'\beta^s + \sigma \kappa_i, \tag{10}$$

where $\kappa_i$ are i.i.d. standard logistic, the first component of the $(p + 1)$-dimensional $x_i$ is 1, and $\beta^s = (\beta_0, \beta^s_1, \ldots, \beta^s_p)'$ includes the baseline “intercept” term $\beta_0$. It is well known that $\kappa_i$ can be approximated by $1.7\varepsilon_i$ where $\varepsilon_i \overset{iid}{\sim} N(0, 1)$ (e.g., Lam, Lee, and Leung, 2002). Approximating logistic with Gaussian errors, a scaled version of the standard normal-errors regression model is obtained:

$$Y_i = \log(T_i) = x_i'\beta^s + \sigma 1.7 \varepsilon_i. \tag{11}$$
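The quality of the approximation behind (11) can be checked numerically by comparing the standard logistic distribution function with that of $1.7\varepsilon$. The grid and helper names below are assumptions chosen for the illustration.

```python
import math

def logistic_cdf(x):
    """Distribution function of a standard logistic random variable."""
    return 1.0 / (1.0 + math.exp(-x))

def scaled_normal_cdf(x, scale=1.7):
    """Distribution function of scale * epsilon with epsilon ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))

grid = [i / 100.0 for i in range(-1000, 1001)]
max_gap = max(abs(logistic_cdf(x) - scaled_normal_cdf(x)) for x in grid)
print(max_gap)   # stays below 0.01 on this grid
```

The two distribution functions agree to within about one percentage point everywhere, which is why proposals built from the normal model are reasonable starting points for the logistic one.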

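Define $\tau = 1/\sigma^2$. Let $X$ denote the $n \times (p + 1)$ design matrix. When the prior $p(\beta^s, \tau) \propto \tau^{-1}$ is specified, the full conditional distributions are then

$$\beta^s \mid \tau, y \sim N_{p+1}\left((X'X)^{-1}X'y,\ \frac{(X'X)^{-1}\, 1.7^2}{\tau}\right), \qquad \tau \mid \beta^s, y \sim \Gamma\left(0.5n,\ \frac{0.5}{1.7^2} \sum_{i=1}^{n} (y_i - x_i'\beta^s)^2\right). \tag{12}$$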

A censored $Y_i > y^*_i$ can be sampled from $[Y_i \mid Y_i > y^*_i, \beta^s, \tau]$, namely an $N(x_i'\beta^s, 1.7^2/\tau)$ distribution truncated to $(y^*_i, \infty)$.

The log-logistic version of equation (1) assuming $G_\theta(t) = 1 - [1 + (e^{-\beta_0} t)^{\sqrt{\tau}}]^{-1}$ yields $(\beta_0, \beta_1, \ldots, \beta_p) = (\beta_0, -\sqrt{\tau}\beta^s_1, \ldots, -\sqrt{\tau}\beta^s_p)$. Assuming that the parametric model somewhat approximates the MPT generalization forms the crux of the three MCMC algorithms.
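Two pieces of the normal approximation are simple enough to sketch with the standard library alone: a draw of $\tau$ from its full conditional in (12), and the map from the “standard form” coefficients $\beta^s$ to the PO-scale coefficients. The function names and the toy data are assumptions for illustration; the multivariate normal draw for $\beta^s$ is omitted here.

```python
import math
import random

def draw_tau(y, X, beta_s, rng):
    """Draw tau from its full conditional in (12): Gamma(0.5 n, rate)."""
    resid_ss = sum((yi - sum(xij * bj for xij, bj in zip(xi, beta_s))) ** 2
                   for yi, xi in zip(y, X))
    rate = 0.5 * resid_ss / 1.7**2
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate
    return rng.gammavariate(0.5 * len(y), 1.0 / rate)

def to_po_scale(beta_s, tau):
    """Map (beta_0, beta_1^s, ..., beta_p^s) to (beta_0, beta_1, ..., beta_p)."""
    return [beta_s[0]] + [-math.sqrt(tau) * b for b in beta_s[1:]]

# Toy data (assumed): four log survival times and an intercept plus one covariate
y = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
rng = random.Random(7)
tau = draw_tau(y, X, [0.5, 1.0], rng)
print(to_po_scale([0.5, 1.0], tau))
```

The intercept is left unchanged by the map, while the remaining coefficients change sign and are scaled by $\sqrt{\tau}$.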

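(i) Algorithm 1 uses the full conditionals (12) as is. A candidate $\tau^*$ is drawn from a $\Gamma\left(0.5n,\ 0.5(1.7^2)^{-1} \sum_{i=1}^{n} (y_i - x_i' M_\tau^{-1}\beta)^2\right)$ distribution and accepted with probability

$$\rho = \min\left\{1,\ \frac{L(\mathcal{Y}, \beta, \tau^*)\,(\tau)^{0.5n - 1}\, e^{-0.5\tau \sum_{i=1}^{n} (y_i - x_i' M_{\tau^*}^{-1}\beta)^2 / 1.7^2}}{L(\mathcal{Y}, \beta, \tau)\,(\tau^*)^{0.5n - 1}\, e^{-0.5\tau^* \sum_{i=1}^{n} (y_i - x_i' M_{\tau}^{-1}\beta)^2 / 1.7^2}}\right\},$$

where $M_\tau$ is diagonal with entries $m_{11} = 1$ and $m_{ii} = -\sqrt{\tau}$ for $i = 2, \ldots, p + 1$. Similarly, a candidate $\beta^*$ is drawn from an $N_{p+1}\left(M_\tau (X'X)^{-1}X'y,\ M_\tau (X'X)^{-1} M_\tau\, 1.7^2/\tau\right)$ distribution and accepted with probability

$$\rho = \min\left\{1,\ \frac{L(\mathcal{Y}, \beta^*, \tau)\, e^{-0.5\,(M_\tau^{-1}\beta - (X'X)^{-1}X'y)'(X'X)^{-1}(M_\tau^{-1}\beta - (X'X)^{-1}X'y)\,\tau/1.7^2}}{L(\mathcal{Y}, \beta, \tau)\, e^{-0.5\,(M_\tau^{-1}\beta^* - (X'X)^{-1}X'y)'(X'X)^{-1}(M_\tau^{-1}\beta^* - (X'X)^{-1}X'y)\,\tau/1.7^2}}\right\}.$$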

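(ii) Algorithm 2 uses the same proposal for updating $\tau$ but rather uses a scaled random-walk proposal for $\beta$, namely $\beta^* \sim N_{p+1}\left(\beta,\ h M_\tau (X'X)^{-1} M_\tau\, 1.7^2/\tau\right)$ where $h = 0.2$ or $h = 0.3$. $\beta^*$ is accepted with probability

$$\rho = \min\left\{1,\ \frac{L(\mathcal{Y}, \beta^*, \tau)}{L(\mathcal{Y}, \beta, \tau)}\right\}.$$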

(iii) Algorithm 3 is used in Hanson and Yang (2006) and simply uses a multivariate normal random-walk proposal for the parameter vector $(\alpha, \lambda, \beta_1, \ldots, \beta_p)$, where $\alpha = \sqrt{\tau}$ and $\lambda = e^{-\sqrt{\tau}\beta_0}$. The proposal covariance matrix is a scaled version of the estimated asymptotic covariance matrix of the maximum likelihood estimate from fitting the parametric log-logistic model.

We have found that Algorithm 2 provides superior mixing over Algorithms 1 and 3 across a variety of datasets and simulations.
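The random-walk idea behind Algorithm 2 can be sketched generically. To stay within the standard library, the version below perturbs each coordinate independently with its own scale, which is a simplification and not the full covariance $h M_\tau (X'X)^{-1} M_\tau 1.7^2/\tau$ of Algorithm 2. The function name, the argument `log_post` (a log likelihood, plus a log prior if one is used), and the toy target are all assumptions made for this illustration.

```python
import math
import random

def random_walk_metropolis(log_post, start, scales, n_iter, rng):
    """Random-walk Metropolis with independent normal steps per coordinate."""
    cur = list(start)
    cur_lp = log_post(cur)
    draws = []
    for _ in range(n_iter):
        cand = [c + s * rng.gauss(0.0, 1.0) for c, s in zip(cur, scales)]
        cand_lp = log_post(cand)
        # symmetric proposal: accept with probability min{1, ratio of targets}
        if math.log(1.0 - rng.random()) < cand_lp - cur_lp:
            cur, cur_lp = cand, cand_lp
        draws.append(list(cur))
    return draws

# Toy target (assumed): a standard normal log density in one dimension
rng = random.Random(5)
draws = random_walk_metropolis(lambda b: -0.5 * b[0] ** 2, [0.0], [1.0], 20000, rng)
print(sum(d[0] for d in draws) / len(draws))
```

Because the proposal is symmetric, the proposal densities cancel and only the ratio of target values enters the acceptance step, matching the form of the acceptance probability in (ii).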

Note that a prior for $(\beta^s, \tau)$ can be elicited for the underlying log-logistic model using the approach of Bedrick, Christensen, and Johnson (2000). This prior intuitively holds for the more general MPT model. In Sections 6 and 7, we use a normal prior with a very large variance $\beta \sim N_p(0, 10^6 I_p)$, a diffuse gamma prior for $\tau$, namely $\tau \sim \Gamma(10^{-6}, 10^{-6})$, and $\beta_0 \sim N(0, 10^6)$, all assumed a priori independent.

4. Model Comparison and Diagnostics

A general residual defined in Cox and Snell (1968) has been widely used in a variety of regression settings. A version of this residual in the current Bayesian setup is $r_i = -\log E\{S_{x_i}(t_i \mid \mathcal{Y}, \theta, \beta) \mid D\}$. Given $S_{x_i}(\cdot)$, $-\log S_{x_i}(T_i)$ is distributed exp(1). The posterior expected value of $S_{x_i}(t_i \mid \mathcal{Y}, \theta, \beta)$ provides a point estimate of this unknown. Therefore, if the model is “correct,” and under right-censored data, the pairs $\{(r_i, \delta_i)\}$ are approximately a random right-censored sample from an exp(1) distribution, and the estimated integrated hazard plot should be approximately straight with slope 1 (Nelson, 1972).
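The residual diagnostic can be sketched in two steps: average the survival probabilities over MCMC draws to get each $r_i$, then form a Nelson–Aalen estimate of the integrated hazard of the pairs $(r_i, \delta_i)$. The function names and the toy inputs are assumptions for illustration; the plotting step is left out.

```python
import math

def cox_snell_residual(survival_draws):
    """r_i = -log of the posterior mean of S_{x_i}(t_i) over MCMC draws."""
    return -math.log(sum(survival_draws) / len(survival_draws))

def nelson_aalen(r, delta):
    """Integrated hazard estimate at each distinct uncensored residual."""
    pairs = sorted(zip(r, delta))
    n = len(pairs)
    H, out, i = 0.0, [], 0
    while i < n:
        value, at_risk, events = pairs[i][0], n - i, 0
        while i < n and pairs[i][0] == value:   # group tied residuals
            events += pairs[i][1]
            i += 1
        if events > 0:
            H += events / at_risk
            out.append((value, H))
    return out

# Toy residuals (assumed); under a good fit the points lie near the line H = r
print(nelson_aalen([0.1, 0.2, 0.3], [1, 0, 1]))
```

Plotting the second coordinate against the first and comparing with a line of slope 1 gives the integrated hazard plot described above.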

Bayes factors (Kass and Raftery, 1995; Han and Carlin, 2001) are notoriously difficult to obtain in practice. Instead, for each model we compute the pseudo marginal likelihood (Geisser and Eddy, 1979), a component of so-called pseudo Bayes factors, for model choice. Let $p_1(\cdot)$ and $p_2(\cdot)$ denote probability densities corresponding to models 1 and 2, respectively. The conditional predictive ordinate (CPO) for observation $j$ under model $i$ is given by

$$\text{CPO}_{ij} = p_i(D_j \mid D_{-j}),$$

where $D_{-j} = \{(x_k, t_k, \delta_k)\}_{k \neq j}$. The ratio $\text{CPO}_{1j}/\text{CPO}_{2j}$ measures how well model 1 supports the observation $D_j$ relative to model 2, based on the remaining data $D_{-j}$. CPO statistics are surprisingly easy to estimate from standard MCMC output across a wide variety of models (e.g., Section 10.1, Chen, Shao, and Ibrahim, 2000). The product of the CPO ratios gives an overall aggregate summary of how well supported the data are by model 1 relative to model 2 and is called the pseudo Bayes factor:

$$B_{12} = \prod_{j=1}^{n} \frac{\text{CPO}_{1j}}{\text{CPO}_{2j}}.$$

The log of the product of the $n$ CPO statistics under a given model is termed the log-pseudo marginal likelihood (LPML) statistic for that model, $\text{LPML}_i = \log \prod_{j=1}^{n} \text{CPO}_{ij}$, and therefore $B_{12} = \exp(\text{LPML}_1 - \text{LPML}_2)$.
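One common way to estimate CPO statistics from MCMC output is the harmonic mean of the per-observation likelihood contributions across posterior draws. The article does not spell out its estimator, so treating this as the method used is an assumption; the function names and the input layout (one list of log-likelihood draws per observation) are also introduced here for illustration.

```python
import math

def log_cpo(loglik_draws):
    """log CPO_j via the harmonic mean of p(D_j | parameters) over MCMC draws."""
    neg = [-v for v in loglik_draws]
    mx = max(neg)                                  # log-sum-exp for stability
    log_mean_inverse = mx + math.log(sum(math.exp(v - mx) for v in neg) / len(neg))
    return -log_mean_inverse

def lpml(loglik_matrix):
    """LPML: sum of log CPO_j, where loglik_matrix[j] holds draws for observation j."""
    return sum(log_cpo(row) for row in loglik_matrix)

def pseudo_bayes_factor(lpml_1, lpml_2):
    """B_12 = exp(LPML_1 - LPML_2)."""
    return math.exp(lpml_1 - lpml_2)

# Two LPML values eight units apart give a pseudo Bayes factor of exp(8)
print(pseudo_bayes_factor(-508.0, -516.0))
```

The log-sum-exp step matters in practice because inverse likelihoods can overflow when a draw fits an observation poorly.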

5. Extensions

5.1 Frailties

Frailties, random effects that account for clustering in survival data, are readily incorporated into the PO model. Consider now $N$ sets of related survival times (e.g., rat litters, siblings, treatment centers, etc.). Let $i = 1, \ldots, N$ denote the cluster, and $j = 1, \ldots, n_i$ denote observations within a cluster. The survival time $T_{ij}$ for the $j$th individual in cluster $i$ is modeled

$$\frac{S_{x_{ij}}(t_{ij})}{1 - S_{x_{ij}}(t_{ij})} = e^{-x_{ij}'\beta - \gamma_i}\, \frac{S_0(t_{ij})}{1 - S_0(t_{ij})}.$$

The data are now denoted $D = \{(t_{ij}, x_{ij}, \delta_{ij}) : i = 1, \ldots, N;\ j = 1, \ldots, n_i\}$. The frailties are assumed to have a distribution $\gamma = (\gamma_1, \ldots, \gamma_N) \sim p(\gamma \mid \pi)$. Typically $E(\gamma_i) = 0$ or $\text{median}(\gamma_i) = 0$ for identifiability, and marginally assuming $\gamma_i \mid \pi \sim N(0, \pi_i^{-1})$ is very common. Simplifying further, $\gamma_1, \ldots, \gamma_N \mid \pi \overset{iid}{\sim} N(0, \pi^{-1})$ is considered in Hanson and Yang (2006) for the MPT PO model. More involved dependency structures can certainly be theorized (e.g., Lam et al., 2002; Banerjee and Dey, 2005; Zeng et al., 2005), and other parametric frailty distributions can be considered (e.g., Sahu and Dey, 2004). Nonparametric frailty distributions have also been investigated (e.g., Walker and Mallick, 1997).

5.2 Transformation Models

Model (1) can be written

$$\text{logit}\, S_x(t) = -x'\beta + \text{logit}\, S_0(t). \tag{13}$$

Similarly, the proportional hazards model, given by $S_x(t) = S_0(t)^{\exp(x'\beta)}$, can be written

$$-\log\{-\log S_x(t)\} = -x'\beta - \log\{-\log S_0(t)\}. \tag{14}$$

Both models are of the form

$$q_\rho\{S_x(t)\} = -x'\beta + q_\rho\{S_0(t)\}, \tag{15}$$

where $q_\rho(s) = \log\{\rho s^\rho/(1 - s^\rho)\}$. Clearly, $\rho = 1$ gives PO and $\rho \to 0^+$ yields proportional hazards. This model has been termed the generalized odds-rate model; see Scharfstein, Tsiatis, and Gilbert (1998) and references therein for rigorous frequentist semiparametric treatments of this model. We extend (1) to (15) and incorporate the sampling of $\rho$ within the Gibbs sampler via a simple random-walk Metropolis–Hastings step.

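Given $\mathcal{Y}$, $\theta$, $\beta$, and $\rho > 0$ the survival function for covariates $x$ is

$$S_x(t \mid \mathcal{Y}, \theta, \beta, \rho) = \frac{1}{\{1 + e^{x'\beta} S_0(t \mid \mathcal{Y}, \theta)^{-\rho} - e^{x'\beta}\}^{1/\rho}},$$

and the pdf is

$$f_x(t \mid \mathcal{Y}, \theta, \beta, \rho) = \frac{e^{x'\beta} f_0(t \mid \mathcal{Y}, \theta)\, S_0(t \mid \mathcal{Y}, \theta)^{-(1 + \rho)}}{\{1 + e^{x'\beta} S_0(t \mid \mathcal{Y}, \theta)^{-\rho} - e^{x'\beta}\}^{1 + 1/\rho}}.$$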
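The two limiting cases of the generalized odds-rate survival function can be verified numerically. The sketch below uses a hypothetical helper `gor_survival` and arbitrary illustrative inputs; it checks that $\rho = 1$ reproduces the PO form and that a very small $\rho$ is close to the proportional hazards form $S_0^{\exp(x'\beta)}$.

```python
import math

def gor_survival(s0, xb, rho):
    """Generalized odds-rate survival given baseline survival s0 and x'beta."""
    return (1.0 + math.exp(xb) * (s0 ** (-rho) - 1.0)) ** (-1.0 / rho)

s0, xb = 0.6, 0.4                      # illustrative values (assumed)
a = math.exp(-xb)
po = a * s0 / (1.0 - s0 + a * s0)      # proportional odds survival
ph = s0 ** math.exp(xb)                # proportional hazards survival

print(gor_survival(s0, xb, 1.0), po)   # rho = 1 matches PO
print(gor_survival(s0, xb, 1e-6), ph)  # small rho approaches proportional hazards
```

Intermediate values of $\rho$ interpolate between the two models, which is what makes sampling $\rho$ informative about which structure the data prefer.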

The likelihood (9) is changed accordingly to $L(\mathcal{Y}, \theta, \beta, \rho)$.

Let $T_x \sim S_x(t)$. Then $S_x(T_x) \sim U(0, 1)$ and equation (13) can be written $\text{logit}\, S_0(T_x) = x'\beta - W$, where $W$ is distributed standard logistic. Similarly, equation (14) becomes $-\log\{-\log S_0(T_x)\} = x'\beta - W$, where $W$ is distributed extreme value. In general, consider

$$q(T_x) = x'\beta - W, \tag{16}$$

where $q(\cdot)$ is strictly decreasing and $W$ is a random variable. Cheng, Wei, and Ying (1995, 1997) consider this model when the distribution of $W$ is known but $q(\cdot)$ is unknown. They consider the special cases of PO, proportional hazards, and the generalized odds model (15), which yields $e^W \sim \text{Pareto}(\rho)$ in equation (16).

6. Simulation Study

As Yang and Prentice (1999) point out, ideally one would compare asymptotic efficiency of competing point estimators. However, for most nonparametric estimators this is prohibitively difficult and the present MPT context is no exception. Amewou-Atisso et al. (2003) show asymptotic posterior consistency for an accelerated failure time model with a Polya tree prior, but in general convergence rates, asymptotic efficiencies, etc. have not been established for more complicated Bayesian survival regression models.

Table 1. Bias, coverage rates, and MSE comparison to the timereg package's prop.odds() function for simulated (n = 100) data

Censoring           Par.  Model    MSE     95% cov.  Bias
0%                  β1    MPT      0.1331  94%       −0.0351
                          timereg  0.1285  94%       −0.0212
                    β2    MPT      0.0468  95%       −0.0005
                          timereg  0.0481  96%       −0.0035
20%, uninformative  β1    MPT      0.1391  95%       −0.0196
                          timereg  0.1352  94%       −0.0059
                    β2    MPT      0.0498  94%        0.0114
                          timereg  0.0411  95%       −0.0240
20%, informative    β1    MPT      0.1404  95%       −0.0312
                          timereg  0.1376  95%       −0.0051
                    β2    MPT      0.0490  94%        0.0049
                          timereg  0.0451  96%       −0.0056

We directly compare the Bayes estimators derived from the model with an MPT prior to one frequentist estimator in a small simulation study; an additional simulation is considered in Hanson and Yang (2006). Although we cannot in general expect Bayesian posterior probability intervals, often termed credible intervals, to have coverage rates equal to the probability associated with the interval, it is still of interest to see what the actual coverage rates are in simulations. We have found in simulations that the actual coverage rate is typically close to the probability associated with the interval. For all simulations, 1000 Monte Carlo samples were taken. For each Monte Carlo iteration, the MCMC algorithm for the MPT model took 5000 iterates after a burn-in of 500.

Data were generated according to one of the simulations considered in Murphy et al. (1997): $x_{i1} \sim \text{Bernoulli}(0.5)$ independent of $x_{i2} \sim \exp(1)$. Then $\log(Z_i) = -x_{i1}\beta_1 - x_{i2}\beta_2 + \varepsilon_i$, where $\varepsilon_1, \ldots, \varepsilon_{100}$ are i.i.d. from a standard logistic distribution. For all simulations $\beta_1 = 0$ and $\beta_2 = 1$. Uncensored data ($T_i = Z_i$) were considered along with two types of right censoring. For 20% independent censoring, $T_i = \min\{1.863, Z_i\}$. For 20% dependent censoring, if $x_{i1} = 0$ then $T_i = Z_i$, otherwise when $x_{i1} = 1$, $T_i = \min\{0.64, Z_i\}$.
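The data-generating design can be reproduced with a short simulation. The function name, the string labels for the censoring schemes, and the seed are assumptions introduced for this sketch; the constants 1.863 and 0.64 and the coefficient values come from the description above.

```python
import math
import random

def simulate(n, beta1, beta2, censoring, rng):
    """Return a list of (x1, x2, t, delta) following the simulation design."""
    data = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0
        x2 = rng.expovariate(1.0)
        u = rng.random()
        while u == 0.0:                       # avoid log(0)
            u = rng.random()
        eps = math.log(u / (1.0 - u))         # standard logistic draw
        z = math.exp(-x1 * beta1 - x2 * beta2 + eps)
        if censoring == "independent":
            c = 1.863
        elif censoring == "dependent":
            c = 0.64 if x1 == 1 else math.inf
        else:
            c = math.inf                      # no censoring
        data.append((x1, x2, min(z, c), 1 if z <= c else 0))
    return data

rng = random.Random(2007)
sample = simulate(100, 0.0, 1.0, "independent", rng)
print(sum(1 - d[3] for d in sample) / len(sample))   # observed censoring fraction
```

With a large `n` the censoring fraction settles near 20% under both schemes, consistent with the stated design.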

We compared posterior medians from the MPT model centered at the log-logistic family to an estimator based on a modification of the partial likelihood presented in Martinussen and Scheike (2006, Section 8.2). This estimator is implemented in the prop.odds() function for R, found in the timereg package by Scheike and Martinussen at the Web address http://www.biostat.ku.dk/~ts/timereg.html. The frequentist estimator performs slightly better overall, with the mean squared error (MSE) ratios for estimating $\beta_2$ averaging about 1.1, and for $\beta_1$ averaging about 1.03; see Table 1. Clearly, little efficiency is lost by using the Bayesian nonparametric approach, and there are the added benefits of being able to incorporate prior information and to compute predictive survival densities and hazard curves. The inclusion of even a modest amount of good prior information on $\beta$ in either simulation would be sure to markedly improve the MSE and bias of the estimators.


The generalized estimating equation and generalized empirical odds approaches of Cheng et al. (1995) and Yang and Prentice (1999), or the excellent prop.odds() function of Scheike and Martinussen may be preferred due to their ease of implementation when all that is desired are the estimated regression coefficients and their standard errors. We prefer the Bayesian semiparametric approach when additional inferences, such as predictive densities, hazard curves, and functionals of these, are required. In addition, the semiparametric Bayesian model that we propose can be adapted for use with truncated data by simply modifying the likelihood (9).

The slightly increased MSE for MPT versus the frequentist estimators could be due in part to increased bias in the Bayes estimates from using posterior medians rather than modes, which are trickier to obtain. Although modest, the simulations performed here and in Hanson and Yang (2006) were quite time consuming and are simply meant to give an idea of relative performance of point estimators and coverage probabilities for the moderate sample size of n = 100 in two situations where the data-generating mechanisms are known; making overarching generalizations based on two simulation studies should be done with care. The simulations focus on point estimation and Bayesian estimators are almost always biased. We emphasize that the MPT approach may be preferred when the estimation of predictive survival densities and hazard curves is desired. Again, having coverage rates equal to the credible probability is not a property of credible intervals in general, but it is comforting to see that this is approximately the case when using the Bayesian model solely for point estimation purposes.

7. Data Analysis

The data considered are the well-known VA lung cancer trial (Prentice, 1973). As in Cheng et al. (1995), Murphy et al. (1997), and Yang and Prentice (1999) we consider a subgroup of n = 97 patients with no prior therapy. Six of the 97 survival times are censored. Table 2 summarizes various fits to these data including the MPT PO model presented in this article, the maximum profile likelihood estimator (MPLE) of Murphy et al. (1997), and a particular minimum distance estimator (MDF) of Yang and Prentice (1999). Note that posterior medians and standard deviations obtained under the MPT model are very close to the MPLE estimates. Under the MPT model increasing the performance score by 20 increases the odds of surviving past any fixed time point by about 200%, $e^{-(20)(-0.055)} \approx 3$. In Figure 1 the predictive survival densities for squamous with three values of performance score are plotted. Clearly, performance score markedly affects survival. We emphasize that although routine to obtain from the MPT PO model, predictive densities are not easily obtained from existing methods (e.g., Murphy et al., 1997; Yang and Prentice, 1999; Lam et al., 2002; Zeng et al., 2005). Furthermore, full inference for functionals such as the maximum (or minimum) relative risk $\xi_1 = \max_{t > 0} h_{x_1}(t)/h_{x_2}(t)$ or the time at which maximum excess risk occurs $\xi_2 = \text{argmax}_{t > 0}\{h_{x_1}(t) - h_{x_2}(t)\}$ is also readily obtained. One functional of interest is median survival as a function of performance score. For squamous tumor type, a plot of this quantile versus performance score is in Figure 2 with a pointwise 95% CI band. An overall increasing trend is apparent, with a greater rate of change for larger performance scores.

Table 2. Estimates and posterior standard deviations (MPT) or standard errors for VA lung cancer data

Parameter           MPT             MPLE            MDF
Score               −0.055 (0.010)  −0.055 (0.010)  −0.034 (0.007)
Adeno vs. large      1.303 (0.559)   1.339 (0.556)   1.411 (0.674)
Small vs. large      1.362 (0.527)   1.440 (0.525)   1.353 (0.506)
Squamous vs. large  −0.173 (0.580)  −0.217 (0.589)   0.165 (0.653)

Figure 1. Predictive densities, squamous, MPT with w = 1; survival is in days.

From looking at posterior credible intervals of differences in regression coefficients, there are no significant differences in survival odds for adeno versus small tumors, or large versus squamous tumors. Figure 3 illustrates this with predictive densities from the four tumor types for a performance score of 60.

A luxury of the MPT approach is the ability to compare nonnested Bayesian parametric and semiparametric models using the LPML statistic. Several models were fit: the parametric log-logistic model, the MPT PO model centered at the log-logistic family, proportional hazards and accelerated failure time models centered at the Weibull family (Hanson, 2006), and the generalized odds-rate model centered at the log-logistic family described in Section 5.2. The MPT models fixed J = 5 and w = 1. The MPT PO and log-logistic models were fit with the prior described in Section 3; the other models were fit with uniform priors $p(\alpha, \lambda, \beta) \propto 1$ on a bounded hypercube; see Hanson (2006). Rounded to the nearest integer, the LPML statistics are −508, −508, −511, −514, and −516 for the log-logistic, MPT PO model centered at the log-logistic family, MPT generalized odds rate centered at the log-logistic family, MPT accelerated failure time model centered at the Weibull family, and MPT proportional hazards model centered at the Weibull family, respectively.

Figure 2. Median survival with 95% CI versus score for squamous, w = 1.

Figure 3. Predictive densities, performance status = 60, MPT with w = 1.

The proportional hazards model has by far the worst LPML statistic, and inferior model fit is reflected in the integrated Cox–Snell residual plot in Figure 4. The MPT PO and parametric log-logistic models have the largest LPML statistics, closely followed by the generalized odds-rate model. The pseudo Bayes factor for comparing the PO model to the proportional hazards model is approximately $e^{516 - 508} \approx 3000$. PO versus accelerated failure time yields a pseudo Bayes factor of about 400. Assuming $\rho \sim U(0, 10)$ in the generalized odds-rate model, the posterior density of $\rho \mid D$ has a mode at about 1.55 and an equal-tailed 95% credible interval of (0.98, 4.78). Among models in the generalized odds-rate class, PO, $H_0 : \rho = 1$, is not rejected by a slim margin. A mode of $\hat{\rho} = 1.55$ is in line with Cheng et al. (1997), who came to the same conclusion using a graphical assessment of model fit.

Figure 4. Integrated Cox–Snell hazard plots.

In fitting the MPT PO model, Algorithm 2 (with h = 0.3) yielded a well-mixed representative sample from the posterior with 50,000 iterates, assessed from history plots of various model parameters. Algorithm 3 required a 10-fold increase, about 500,000 iterates, and Algorithm 1 would get “stuck” at certain values over thousands of iterates when w = 1. Increasing w, achieving a baseline model closer to the log-logistic family, greatly improved the mixing of Algorithm 1. Overall we have found Algorithm 2 to outperform the others and recommend it for general use.

8. Summary and Future Research

We have introduced a semiparametric PO model where baseline survival is assumed to follow an MPT prior. Inference is carried out through standard MCMC techniques. Extensions to random effects and generalized odds-rate models are presented along with a modest simulation study and a reanalysis of the VA lung cancer trial data. We have found the proposed model easy to implement and highly flexible, often capturing multiple modes in predictive survival densities and hazard functions.

Recently Sundaram (2006) considered a model for the odds rate that incorporates time-dependent covariates $x(t)$:

$$\frac{d}{dt}\left\{\frac{1 - S_x(t)}{S_x(t)}\right\} = \exp\{x(t)'\beta\}\, \frac{d}{dt}\left\{\frac{1 - S_0(t)}{S_0(t)}\right\}. \tag{17}$$

Inference follows Yang and Prentice (1999) using weighted empirical odds functions. This model can be fit assuming an MPT prior on $S_0$ using the methods presented in this article, and the computations are especially tractable when $x(\cdot)$ is a jump function and $x_j'(t) = 0$ for $t > 0$. Assume that $x(t)$ is constant on intervals $[r_{j-1}, r_j)$ for $j = 1, 2, \ldots$ (with $r_0 = 0$), and let $J(t) = \max\{j : r_j \le t\}$. Then integrating both sides of equation (17) over $[0, t)$ yields

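$$\frac{1 - S_x(t)}{S_x(t)} = \sum_{j=1}^{J(t)} e^{x(r_{j-1})'\beta}\left\{\frac{1 - S_0(r_j)}{S_0(r_j)} - \frac{1 - S_0(r_{j-1})}{S_0(r_{j-1})}\right\} + e^{x(r_{J(t)})'\beta}\left\{\frac{1 - S_0(t)}{S_0(t)} - \frac{1 - S_0(r_{J(t)})}{S_0(r_{J(t)})}\right\},$$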

and $f_x(t)$ is obtained through differentiation. The likelihood (9) is constructed and inference proceeds as usual. The challenging part is coming up with a general strategy for constructing efficient proposal distributions for the Metropolis–Hastings sampling of model parameters.
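The integrated form of equation (17) for piecewise-constant covariates can be evaluated directly once a baseline survival function is available. The following is a minimal sketch under stated assumptions: the function names `odds` and `survival_td`, the convention that `r[0] = 0` and `x[j]` holds the covariate vector on $[r_j, r_{j+1})$, and the unit log-logistic baseline $S_0(t) = 1/(1+t)$ are all illustrative choices standing in for a draw of $S_0$ from the MPT posterior.

```python
import math
from bisect import bisect_right


def odds(s):
    """Odds of failure (1 - S) / S for a survival probability S in (0, 1]."""
    return (1.0 - s) / s


def survival_td(t, r, x, beta, S0):
    """S_x(t) under the odds-rate model with piecewise-constant covariates.

    r    : increasing breakpoints with r[0] = 0 (assumed convention)
    x    : x[j] is the covariate vector on [r[j], r[j+1]); len(x) == len(r)
    beta : regression coefficients
    S0   : baseline survival function, a callable with S0(0) = 1
    """
    # J(t) = max{j : r_j <= t}
    J = bisect_right(r, t) - 1
    total = 0.0
    # Completed intervals [r_{j-1}, r_j), j = 1, ..., J(t).
    for j in range(1, J + 1):
        weight = math.exp(sum(a * b for a, b in zip(x[j - 1], beta)))
        total += weight * (odds(S0(r[j])) - odds(S0(r[j - 1])))
    # Partial interval [r_{J(t)}, t).
    weight = math.exp(sum(a * b for a, b in zip(x[J], beta)))
    total += weight * (odds(S0(t)) - odds(S0(r[J])))
    # total is the odds (1 - S_x(t)) / S_x(t); invert to get survival.
    return 1.0 / (1.0 + total)


def S0_loglogistic(t):
    """Hypothetical baseline: unit log-logistic, whose failure odds equal t."""
    return 1.0 / (1.0 + t)


# One covariate that switches from 0 to 1 at time r_1 = 1, with exp(beta) = 2.
r = [0.0, 1.0]
x = [[0.0], [1.0]]
beta = [math.log(2.0)]
s_at_2 = survival_td(2.0, r, x, beta, S0_loglogistic)
```

With this baseline the failure odds accumulate as 1 · (1 − 0) on [0, 1) plus 2 · (2 − 1) on [1, 2), giving odds of 3 and hence a survival probability of 0.25 at t = 2. When the covariate never changes, the function reduces to the ordinary PO relation, with the odds under x equal to exp(x′β) times the baseline odds.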

We are also currently extending the model to multivariate data through the incorporation of frailties with different types of covariance structure. In general, each observation $T_{ij}$ can have the frailty $\gamma_{ij}$ associated with it, where $\gamma_i = (\gamma_{i1}, \ldots, \gamma_{in_i})'$ has a multivariate distribution. See Lam et al. (2002) and Zeng et al. (2005) for applications and examples of inference assuming a multivariate Gaussian distribution on each vector of frailties, $\gamma_i \sim N_{n_i}(0, \Sigma_i)$. Banerjee and Dey (2005) consider spatially varying conditional autoregressive priors for frailties $\gamma_i$. We are investigating both multivariate Gaussian and nonparametric frailty specifications with applications to recurrent event data.


References

Amewou-Atisso, M., Ghosal, S., Ghosh, J. K., and Ramamoorthi, R. V. (2003). Posterior consistency for semi-parametric regression problems. Bernoulli 9, 291–312.

Banerjee, S. and Dey, D. K. (2005). Semi-parametric proportional odds models for spatially correlated survival data. Lifetime Data Analysis 11, 175–191.

Bedrick, E. J., Christensen, R., and Johnson, W. O. (2000). Bayesian accelerated failure time analysis with application to veterinary epidemiology. Statistics in Medicine 19, 221–237.

Bennett, S. (1983a). Analysis of survival data by the proportional odds model. Statistics in Medicine 2, 273–277.

Bennett, S. (1983b). Log-logistic regression models for survival data. Applied Statistics 32, 165–171.

Berger, J. O. and Guglielmi, A. (2001). Bayesian testing of a parametric model versus nonparametric alternatives. Journal of the American Statistical Association 96, 174–184.

Chen, M.-H., Shao, Q.-M., and Ibrahim, J. G. (2000). Monte Carlo Methods in Bayesian Computation. New York: Springer-Verlag.

Cheng, S. C., Wei, L. J., and Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika 82, 835–845.

Cheng, S. C., Wei, L. J., and Ying, Z. (1997). Predicting survival probabilities with semiparametric transformation models. Journal of the American Statistical Association 92, 227–235.

Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 34, 187–220.

Cox, D. R. and Snell, E. J. (1968). A general definition of residuals (with discussion). Journal of the Royal Statistical Society, Series B 30, 248–275.

Farewell, V. T. and Prentice, R. L. (1977). A study of distributional shape in life testing. Technometrics 19, 69–75.

Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics 1, 209–230.

Geisser, S. and Eddy, W. F. (1979). A predictive approach to model selection. Journal of the American Statistical Association 74, 153–160.

Han, C. and Carlin, B. P. (2001). Markov chain Monte Carlo methods for computing Bayes factors: A comparative review. Journal of the American Statistical Association 96, 1122–1132.

Hanson, T. (2006). Inference for mixtures of finite Polya tree models. Journal of the American Statistical Association, in press.

Hanson, T. and Johnson, W. O. (2002). Modeling regression error with a mixture of Polya trees. Journal of the American Statistical Association 97, 1020–1033.

Hanson, T. and Yang, M. (2006). Bayesian semiparametric proportional odds models. Technical Report, Division of Biostatistics, University of Minnesota School of Public Health.

Ibrahim, J. G., Chen, M.-H., and Sinha, D. (2001). Bayesian Survival Analysis. New York: Springer-Verlag.

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association 90, 773–795.

Lam, K. F., Lee, Y. W., and Leung, T. L. (2002). Modeling multivariate survival data by a semiparametric random effects proportional odds model. Biometrics 58, 316–323.

Lavine, M. (1992). Some aspects of Polya tree distributions for statistical modeling. Annals of Statistics 20, 1222–1235.

Mallick, B. K. and Gelfand, A. E. (1994). Generalized linear models with unknown link functions. Biometrika 81, 237–245.

Mallick, B. K. and Walker, S. G. (2003). A Bayesian semiparametric transformation model incorporating frailties. Journal of Statistical Planning and Inference 112, 159–174.

Martinussen, T. and Scheike, T. H. (2006). Dynamic Regression Models for Survival Data. New York: Springer-Verlag.

Murphy, S. A., Rossini, A. J., and van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association 92, 968–976.

Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics 14, 945–965.

Prentice, R. L. (1973). Exponential survivals with censoring and explanatory variables. Biometrika 60, 279–288.

Sahu, S. K. and Dey, D. K. (2004). On multivariate survival models with a skewed frailty and a correlated baseline hazard process. In Skew-Elliptical Distributions and Their Applications: A Journey beyond Normality, M. G. Genton (ed), 321–338. Boca Raton, Florida: CRC/Chapman & Hall.

Scharfstein, D. O., Tsiatis, A. A., and Gilbert, P. B. (1998). Efficient estimation in the generalized odds-rate class of regression models for right-censored time-to-event data. Lifetime Data Analysis 4, 355–391.

Sinha, D. and Dey, D. K. (1997). Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association 92, 1195–1212.

Sundaram, S. (2006). Semiparametric inference in proportional odds model with time-dependent covariates. Journal of Statistical Planning and Inference 136, 320–334.

Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Annals of Statistics 22, 1701–1762.

Walker, S. G. and Mallick, B. K. (1997). Hierarchical generalized linear models and frailty models with Bayesian nonparametric mixing. Journal of the Royal Statistical Society, Series B 59, 845–860.

Walker, S. G. and Mallick, B. K. (1999). Semiparametric accelerated life time model. Biometrics 55, 477–483.

Yang, S. and Prentice, R. L. (1999). Semiparametric inference in the proportional odds regression model. Journal of the American Statistical Association 94, 125–136.

Zeng, D., Lin, D. Y., and Yin, G. (2005). Maximum likelihood estimation for the proportional odds model with random effects. Journal of the American Statistical Association 100, 470–483.

Received April 2005. Revised June 2006. Accepted June 2006.
