bayesian_analysis_hazard

  • 8/7/2019 bayesian_analysis_hazard

    1/19

    Lifetime Data Anal

    DOI 10.1007/s10985-010-9181-x

    Bayesian analysis for monotone hazard ratio

    Yongdai Kim · Jin Kyung Park · Gwangsu Kim

    Received: 9 April 2009 / Accepted: 2 July 2010 / © Springer Science+Business Media, LLC 2010

    Abstract We propose a Bayesian approach for estimating the hazard functions under the constraint of a monotone hazard ratio. We construct a model for the monotone hazard ratio utilizing Cox's proportional hazards model with a monotone time-dependent coefficient. To reduce computational complexity, we use a signed gamma process prior for the time-dependent coefficient and the Bayesian bootstrap prior for the baseline hazard function. We develop an efficient MCMC algorithm and illustrate the proposed method on simulated and real data sets.

    Keywords Bayesian bootstrap · Censoring · Monotone hazard ratio · Order restriction · Proportional hazards model

    1 Introduction

    Estimation and inference of two survival functions $S_1$ and $S_2$ under certain order restrictions have received much attention in survival analysis. The most popular order restriction is the stochastic ordering, which assumes that $S_1(t) \le S_2(t)$ for all $t \in [0, \infty)$. The nonparametric estimators of the survival functions under the stochastic ordering were found by Brunk et al. (1966) for complete observations and Dykstra (1982) for right-censored data, and asymptotic properties were studied by Praestgaard and Huang (1996). Bayesian approaches for stochastic ordering have been proposed by Arjas and Gasbarra (1996) and Gelfand and Kottas (2001). Also, uniform stochastic ordering, which assumes that $S_1(t)/S_2(t)$ is nonincreasing or nondecreasing in $t$, has been considered by Dykstra et al. (1991) and Mukerjee (1996).

    Y. Kim (corresponding author) · G. Kim
    Seoul National University, Seoul, Korea
    e-mail: [email protected]

    J. K. Park
    International Vaccine Institute, Seoul, Korea

    Statistical inference under order restrictions on hazard functions has also been considered in the context of assessing the validity of the proportional hazards assumption. Gill and Schumacher (1987) and Deshpande and Sengupta (1995) proposed test statistics for assessing the proportional hazards hypothesis against the monotone hazard ratio alternative, and Sengupta et al. (1998) developed a testing procedure for the increasing cumulative hazard ratio alternative. However, these methods do not provide an estimate of the hazard ratio under the order restriction.

    In this paper, we propose a Bayesian approach for estimating the hazard functions under the monotone hazard ratio constraint. We construct a model for the monotone hazard ratio using Cox's proportional hazards model with a time-dependent coefficient that is monotone. An advantage of this model is that we can simultaneously estimate the monotone hazard ratio and assess the validity of the proportional hazards assumption against the monotone hazard ratio alternative.

    We utilize a signed gamma process prior for the monotone hazard ratio. For the prior of the baseline hazard function, we could use gamma process (Kalbfleisch 1978; Kim and Lee 2003a) or beta process (Laud et al. 1998; Kim and Lee 2003a) priors. Such priors, however, require extensive computation for obtaining the posterior because there are two nonparametric priors: one for the monotone hazard ratio and the other for the baseline hazard function. To reduce the computational burden, we utilize the Bayesian bootstrap (BB) prior proposed by Kim and Lee (2003b). The BB prior makes the problem conceptually parametric and yields a much simpler MCMC algorithm, while still retaining the flexibility of nonparametric priors. Also, Kim and Lee (2003b) showed that the posterior obtained with the BB prior closely approximates the full Bayesian posterior with gamma or beta process priors.

    The paper is organized as follows. In Sect. 2, the model and prior are presented. In

    Sect. 3, we first review the BB approach for the proportional hazards model and then

    develop an efficient MCMC algorithm for calculating the BB posterior numerically. In

    Sect. 4, we illustrate the proposed method on various data sets. In Sect. 5, we present

    concluding remarks.

    2 Model and prior

    Let $(x_{si}, \delta_{si})$, $s = 1, 2$, $i = 1, \ldots, n_s$, be observations of pairs of right-censored time and censoring indicator. That is, $x_{si} = \min\{t_{si}, c_{si}\}$ and $\delta_{si} = I(t_{si} \le c_{si})$, where $t_{si}$ and $c_{si}$ are survival and censoring times, respectively.

    To model the monotone hazard ratio assumption, we propose the following proportional hazards model with a time-dependent coefficient: the hazard functions for the groups $s = 1, 2$ are given as

    $$\lambda_1(t) = \lambda(t)$$

    and

    $$\lambda_2(t) = \exp(\gamma_0 + \gamma_1 H(t))\,\lambda(t),$$

    where $\gamma_0 \in (-\infty, \infty)$, $\gamma_1 \in \{-1, 0, 1\}$ and $H(\cdot)$ is a nondecreasing nonnegative function with $H(0) = 0$. Note that the hazard ratio is monotonically increasing, constant, or monotonically decreasing when $\gamma_1$ is $1$, $0$ or $-1$, respectively. Hence, we can assess the validity of the proportional hazards assumption using the posterior probability of $\gamma_1$ being 0. Also, we can estimate the monotone hazard ratio by estimating $\gamma_0$, $\gamma_1$, and $H$, as the hazard ratio is given as

    $$\lambda_2(t)/\lambda_1(t) = \exp(\gamma_0 + \gamma_1 H(t)).$$

    Note that the hazard ratio is modeled nonparametrically, as $H$ is completely unspecified.

    Remark Another advantage of the proposed model is that we could easily incorporate other covariates $z$, if they exist in the model, by setting

    $$\lambda_1(t \mid z) = \exp(\beta^\top z)\,\lambda(t)$$

    and

    $$\lambda_2(t \mid z) = \exp\left(\beta^\top z + \gamma_0 + \gamma_1 H(t)\right)\lambda(t).$$

    This is useful if we want to know whether the risk of one group decreases faster than that of the other group after adjusting for other risk factors such as age, gender, etc.

    For the prior, we use standard parametric priors for $\gamma_0$ and $\gamma_1$ and a nonparametric prior for $H$. A priori, we let $\gamma_0 \sim N(0, \sigma_0^2)$ and $\Pr(\gamma_1 = k) = 1/3$ for $k = -1, 0, 1$. For $H$, a priori, we let $H$ be a gamma process with mean $H_0$ and precision parameter $c > 0$. That is, $H$ is a nondecreasing stochastic process on $[0, \infty)$ with independent increments such that $H(0) = 0$ and $H(t) - H(s)$, $s \le t$, follows a gamma distribution with mean $H_0(t) - H_0(s)$ and variance $(H_0(t) - H_0(s))/c$. See Lo (1982) and Kalbfleisch (1978) for details of gamma processes. To reduce computational complexity, we use the BB prior for $\Lambda$, which is explained in detail in Sect. 3.
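The gamma process prior for $H$ can be made concrete with a short simulation. The sketch below is only an illustration, not the authors' code: it draws one path of $H$ on a user-chosen time grid, using the prior mean $H_0(t) = \log(1+t)$ and $c = 1$ from Sect. 4. The function names and the grid are hypothetical. An increment over $(s, t]$ is drawn as a gamma variable with shape $c\,(H_0(t) - H_0(s))$ and rate $c$, which gives the stated mean and variance.

```python
import math
import random


def h0(t):
    """Prior mean function H_0(t) = log(1 + t), the choice used in Sect. 4."""
    return math.log1p(t)


def gamma_process_path(grid, c=1.0, rng=None):
    """Draw H on an increasing grid (grid[0] == 0) from a gamma process.

    The increment over (s, t] is Gamma(shape=c*(H0(t)-H0(s)), rate=c), so it
    has mean H0(t)-H0(s) and variance (H0(t)-H0(s))/c.
    """
    rng = rng or random.Random()
    path = [0.0]  # H(0) = 0
    for s, t in zip(grid, grid[1:]):
        shape = c * (h0(t) - h0(s))
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        path.append(path[-1] + rng.gammavariate(shape, 1.0 / c))
    return path


if __name__ == "__main__":
    print(gamma_process_path([0, 1, 5, 20], rng=random.Random(1)))
```

Averaging $H(20)$ over many such paths should come out close to $H_0(20) = \log 21$, which is a quick sanity check on the parametrization.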

    3 Posterior: Bayesian bootstrap approach

    In this section, we develop an efficient MCMC algorithm to calculate the BB posterior

    distribution. We first review the BB approach proposed by Kim and Lee (2003b) and

    present the corresponding MCMC algorithm.

    3.1 Bayesian bootstrap for the proportional hazards model: review

    The main idea of the BB approach for the proportional hazards model is to approximate the full Bayesian posterior by the BB posterior, which is proportional to the product of the empirical likelihood and the prior. Let $(x_1, \delta_1, z_1(\cdot)), \ldots, (x_n, \delta_n, z_n(\cdot))$ be observations, where the $x_i$ are right-censored times (i.e., the minimum of survival and censoring times), the $\delta_i$ are censoring indicators, and the $z_i(\cdot)$ are (time-dependent) covariates. Under the proportional hazards model given as

    $$\lambda(t \mid z) = \exp(\beta^\top z(t))\,\lambda(t),$$

    where $\lambda(t \mid z)$ is the hazard function of the survival time with covariate $z$, the likelihood function of $\theta = (\beta, \Lambda(\cdot))$ is

    $$L(\theta) = \prod_{i=1}^{n} \left[\exp(\beta^\top z_i(x_i))\,\lambda(x_i)\right]^{\delta_i} \exp\left(-\int_0^{x_i} \exp(\beta^\top z_i(s))\,\lambda(s)\,ds\right) = \prod_{i=1}^{n} \left[\exp(\beta^\top z_i(x_i))\,d\Lambda(x_i)\right]^{\delta_i} \exp\left(-\int_0^{x_i} \exp(\beta^\top z_i(s))\,d\Lambda(s)\right), \tag{1}$$

    where $\Lambda(t) = \int_0^t \lambda(s)\,ds$ is the cumulative hazard function. Let $q$ be the number of distinct, uncensored observations, and let $0 < t_1 < \cdots < t_q$ be the corresponding ordered, uncensored observations. Then, the empirical likelihood is obtained by assuming that $\Lambda$ is a step function having jumps only at $t_1, \ldots, t_q$ and replacing $d\Lambda(t)$ by $\Delta\Lambda(t) = \Lambda(t) - \Lambda(t-)$ in (1), which results in

    $$L^E(\theta) = \prod_{i=1}^{n} \left[\exp(\beta^\top z_i(x_i))\,\Delta\Lambda(x_i)\right]^{\delta_i} \exp\left(-\sum_{k:\, t_k \le x_i} \exp(\beta^\top z_i(t_k))\,\Delta\Lambda(t_k)\right). \tag{2}$$

    For details of the empirical likelihood (2), see Andersen et al. (1993). Finally, the BB posterior of $\theta$ is defined to be proportional to the product of the empirical likelihood and the prior.

    Remark There is an alternative empirical likelihood called the binomial form empir-

    ical likelihood. See Kim and Lee (2003b) for details. An advantage of the binomial

    form is that the resulting BB posterior can be obtained as a limit of full Bayesian

    posteriors. However, the computation is more difficult, and the BB posterior may not

    be proper. Therefore, we do not consider the binomial form empirical likelihood in

    this paper.

    An advantage of the BB approach is that the dimension of the parameter $\theta$ is finite because we discretize $\Lambda$ to a step function with finitely many jumps. That is, the parameters in the empirical likelihood are $\beta$ and $\{\Delta\Lambda(t_k), k = 1, \ldots, q\}$, and hence, the posterior distribution can be obtained easily using Bayes' theorem.

    A technical difficulty in the BB approach is the choice of the prior for $\{\Delta\Lambda(t_k), k = 1, \ldots, q\}$. For this, Kim and Lee (2003b) proposed the following improper prior (BB prior):

    $$\pi(\Lambda) \propto \prod_{k=1}^{q} \frac{1}{\Delta\Lambda(t_k)}, \tag{3}$$

    and showed that the resulting posterior is always proper, approximates the full Bayesian posterior well, and has desirable large sample properties. It is interesting to note that the marginal BB posterior of $\beta$ with the prior (3) turns out to be proportional to Cox's partial likelihood times the prior.
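The link to the partial likelihood can be checked with a short calculation. The following is a sketch under notation introduced here for convenience (it is not taken from the paper): $\Delta_k = \Delta\Lambda(t_k)$, $D_k$ is the set of subjects failing at $t_k$, $d_k$ is its size, and $S_k(\beta) = \sum_{i:\, x_i \ge t_k} \exp(\beta^\top z_i(t_k))$ is the risk-set sum. Regrouping the empirical likelihood by event time and multiplying by the prior $\prod_k 1/\Delta_k$ gives a product of gamma kernels in the $\Delta_k$, which integrate out in closed form:

```latex
\begin{align*}
L^E(\theta)\,\prod_{k=1}^{q}\frac{1}{\Delta_k}
  &= \prod_{k=1}^{q} \Delta_k^{\,d_k-1}
     \exp\Big(\sum_{i\in D_k}\beta^\top z_i(t_k)\Big)
     \exp\big(-\Delta_k S_k(\beta)\big),\\[4pt]
\int_0^\infty \Delta_k^{\,d_k-1} e^{-\Delta_k S_k(\beta)}\,d\Delta_k
  &= \frac{\Gamma(d_k)}{S_k(\beta)^{d_k}},\\[4pt]
\pi_{BB}(\beta \mid \text{Data})
  &\propto \pi(\beta)\prod_{k=1}^{q}
     \frac{\exp\big(\sum_{i\in D_k}\beta^\top z_i(t_k)\big)}{S_k(\beta)^{d_k}} .
\end{align*}
```

The last product is Cox's partial likelihood (in Breslow's form when there are ties), and the same calculation shows that, conditionally on $\beta$, each $\Delta_k$ has a gamma distribution with shape $d_k$ and rate $S_k(\beta)$.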

    Remark The BB approach does not require prior information on $\Lambda$, which may be a disadvantage when we have prior information. However, we could incorporate prior information into the BB posterior by choosing the prior of $\Lambda$ accordingly. Suppose that, a priori, $\Lambda$ follows a gamma process with mean $\Lambda_0$ and precision parameter $c > 0$. Given that we could think of $\Delta\Lambda(t_k)$ as an approximation of $\Lambda(t_k) - \Lambda(t_{k-1})$, we could incorporate the prior information into the BB posterior by choosing the BB prior as

    $$\pi(\Lambda) \propto \prod_{k=1}^{q} \left(\Delta\Lambda(t_k)\right)^{c(\Lambda_0(t_k) - \Lambda_0(t_{k-1})) - 1} \exp\left(-c\,\Delta\Lambda(t_k)\right). \tag{4}$$

    Remark A similar approach to the BB is to assume a piecewise constant hazard function. That is, $\lambda(t)$ is given by

    $$\lambda(t) = \sum_{k=1}^{m} \lambda_k\, I(s_{k-1} < t \le s_k)$$

    for some sequence $0 = s_0 < s_1 < s_2 < \cdots < s_m$. See, for example, Arjas and Gasbarra (1996) and Ibrahim et al. (2001). Nonetheless, we use the BB approach because it has a sounder theoretical background (at least asymptotically) and provides a simpler MCMC algorithm. In contrast, it is not easy to choose the break points $s_1, \ldots, s_m$ in the piecewise constant hazard model, and the computation of the posterior would be more difficult.

    3.2 Bayesian bootstrap posterior

    The parameter in the model is $\theta = (\gamma_0, \gamma_1, H, \Lambda)$. The likelihood of the proposed model is

    $$L(\theta) = \prod_{s=1}^{2} \prod_{i=1}^{n_s} \left[\exp(\gamma_0 + \gamma_1 H(x_{si}))^{I(s=2)}\, d\Lambda(x_{si})\right]^{\delta_{si}} \exp\left(-\int_0^{x_{si}} \exp(\gamma_0 + \gamma_1 H(u))^{I(s=2)}\, d\Lambda(u)\right).$$

    The full Bayesian computation is extremely hard, as the likelihood involves terms like

    $$\int_0^t \exp(\gamma_1 H(s))\, d\Lambda(s),$$

    which require the knowledge of sample paths of both $H(t)$ and $\Lambda(t)$. To resolve this problem, we employ the BB approach as follows.

    Let $0 < t_1 < t_2 < \cdots < t_q$ be the ordered distinct uncensored survival times among the pooled sample, and let $R(t) = \{(s, i) : x_{si} \ge t\}$ and $D(t) = \{(s, i) : x_{si} = t, \delta_{si} = 1\}$. Let $\Delta\Lambda(t_k) = \Lambda(t_k) - \Lambda(t_k-) = \Delta_k$, and assume that $\Lambda(t) = \sum_{t_k \le t} \Delta_k$. Then, the empirical likelihood of the proposed model becomes

    $$L^E(\theta) = \prod_{k=1}^{q} \Delta_k^{d(t_k)} \exp\Big(\sum_{(2,i) \in D(t_k)} (\gamma_0 + \gamma_1 H(t_k))\Big) \exp\Big(-\Delta_k \sum_{(s,i) \in R(t_k)} \exp(\gamma_0 + \gamma_1 H(t_k))^{I(s=2)}\Big),$$

    where $d(t)$ is the cardinality of $D(t)$. For the prior of the $\Delta_k$'s, we use the BB prior

    $$\pi(\Delta) = \prod_{k=1}^{q} \frac{1}{\Delta_k},$$

    as in (3), where $\Delta = (\Delta_1, \ldots, \Delta_q)$. Then, the BB posterior of $\theta$ is given by

    $$\pi_{BB}(\theta \mid \text{Data}) \propto L^E(\theta)\,\pi(\theta),$$

    where $\pi(\theta) = \pi(\gamma_0)\pi(\gamma_1)\pi(H)\pi(\Delta)$.
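To make the empirical likelihood of the two-group model concrete, the sketch below evaluates its logarithm from per-event-time counts. It is a minimal illustration, not the authors' implementation: `risk_table` and `log_emp_lik` are hypothetical names, $H$ and the jumps $\Delta_k$ are passed as plain lists aligned with the ordered distinct uncensored times, and group 1 contributes weight 1 to each risk set while group 2 contributes $\exp(\gamma_0 + \gamma_1 H(t_k))$.

```python
import math


def risk_table(x1, delta1, x2, delta2):
    """Per distinct uncensored time t_k: (t_k, d1, d2, r1, r2).

    d1, d2 are the numbers of events at t_k in groups 1 and 2 (the set D(t_k));
    r1, r2 are the numbers still at risk at t_k in each group (the set R(t_k)).
    """
    xs = list(x1) + list(x2)
    ds = list(delta1) + list(delta2)
    times = sorted({x for x, d in zip(xs, ds) if d == 1})
    rows = []
    for t in times:
        d1 = sum(1 for x, d in zip(x1, delta1) if x == t and d == 1)
        d2 = sum(1 for x, d in zip(x2, delta2) if x == t and d == 1)
        r1 = sum(1 for x in x1 if x >= t)
        r2 = sum(1 for x in x2 if x >= t)
        rows.append((t, d1, d2, r1, r2))
    return rows


def log_emp_lik(gamma0, gamma1, H, jumps, rows):
    """Log empirical likelihood; H[k] = H(t_k) and jumps[k] = Delta_k."""
    total = 0.0
    for Hk, dk, (_, d1, d2, r1, r2) in zip(H, jumps, rows):
        eta = gamma0 + gamma1 * Hk  # log hazard ratio at t_k
        total += (d1 + d2) * math.log(dk)          # Delta_k ** d(t_k)
        total += d2 * eta                          # group-2 events at t_k
        total -= dk * (r1 + r2 * math.exp(eta))    # risk-set term
    return total


if __name__ == "__main__":
    rows = risk_table([1, 3], [1, 0], [2, 4], [1, 1])
    print(rows)
    print(log_emp_lik(0.0, 0, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], rows))
```

Adding $-\sum_k \log \Delta_k$ (the BB prior) and the log priors of $\gamma_0$, $\gamma_1$ and $H$ to this quantity gives the log BB posterior up to a constant.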

    3.3 MCMC algorithm

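    We use a Gibbs sampler algorithm in which the parameters $\gamma_0$, $\gamma_1$, $\Delta$ and $H$ are generated sequentially from the conditional BB posteriors. We can easily generate $\gamma_0$ and $\gamma_1$ using the Metropolis-Hastings (MH) algorithm with the following conditional BB posterior distributions:

    $$\pi(\gamma_0 \mid \gamma_1, \Delta, H, \text{Data}) \propto \exp\Big(\gamma_0 \sum_{k=1}^{q} \sum_{(2,i) \in D(t_k)} 1\Big) \exp\Big(-\exp(\gamma_0) \sum_{k=1}^{q} \Delta_k \exp(\gamma_1 H(t_k)) \sum_{(2,i) \in R(t_k)} 1\Big)\, \pi(\gamma_0), \tag{5}$$

    $$\pi(\gamma_1 \mid \gamma_0, \Delta, H, \text{Data}) \propto \exp\Big(\gamma_1 \sum_{k=1}^{q} H(t_k) \sum_{(2,i) \in D(t_k)} 1\Big) \exp\Big(-\exp(\gamma_0) \sum_{k=1}^{q} \Delta_k \exp(\gamma_1 H(t_k)) \sum_{(2,i) \in R(t_k)} 1\Big)\, \pi(\gamma_1). \tag{6}$$

    Also, the conditional BB posterior distribution of $\Delta_k$ given $\gamma_0$, $\gamma_1$, $H$ and data is a gamma distribution with mean $\alpha_k/\beta_k$ and variance $\alpha_k/\beta_k^2$, where $\alpha_k = d(t_k)$ and

    $$\beta_k = \sum_{(s,i) \in R(t_k)} \exp(\gamma_0 + \gamma_1 H(t_k))^{I(s=2)}. \tag{7}$$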
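The gamma update for the baseline jumps is the simplest step of the sampler. The sketch below assumes the data have already been reduced to per-event-time counts; `sample_jumps` and its argument names are hypothetical, with `d[k]` the number of events at $t_k$ and `r1[k]`, `r2[k]` the numbers at risk in groups 1 and 2. The rate is the risk-set sum in (7): each group-1 subject contributes 1 and each group-2 subject contributes $\exp(\gamma_0 + \gamma_1 H(t_k))$.

```python
import math
import random


def sample_jumps(gamma0, gamma1, H, d, r1, r2, rng):
    """Draw each Delta_k from its gamma conditional (shape d(t_k), rate as in (7))."""
    jumps = []
    for Hk, dk, n1, n2 in zip(H, d, r1, r2):
        rate = n1 + n2 * math.exp(gamma0 + gamma1 * Hk)
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        jumps.append(rng.gammavariate(dk, 1.0 / rate))
    return jumps


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_jumps(0.0, 0, [0.0, 0.5], [2, 1], [3, 2], [1, 1], rng))
```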

    The difficult part is to generate $H$ from the conditional BB posterior. To generate $H$, we use the Gibbs sampler algorithm with the acceptance-rejection (AR) sampling technique (Ripley 2006). Note that the empirical likelihood depends on $H$ only through $H(t_1), \ldots, H(t_q)$, and so it suffices to generate $W = (W_1, \ldots, W_q)$ from the conditional posterior, where $W_k = H(t_k) - H(t_{k-1})$ and $H(t_0) = 0$. In applying the Gibbs sampler algorithm to generate $W$, we need to generate $W_k$ from its conditional distribution given $\gamma_0$, $\gamma_1$, $\Delta$, $W^{(-k)}$ and data, where $W^{(-k)} = (W_1, \ldots, W_{k-1}, W_{k+1}, \ldots, W_q)$.

    Identifiability issues arise. First, $\gamma_0$ and $W_1$ are not identifiable in the empirical likelihood, whereas $\gamma_0 + W_1$ is identifiable. Other unidentifiable quantities in the empirical likelihood are $W_k$ for $k > p$, where

    $$p = \min\{\max\{x_{1i} : \delta_{1i} = 1\}, \max\{x_{2i} : \delta_{2i} = 1\}\}.$$

    Note that $W_k$ for $k > p$ are not used in the empirical likelihood when $p = \max\{x_{2i} : \delta_{2i} = 1\}$, whereas they affect the empirical likelihood through $\gamma_0 + \gamma_1 W_k + \log \Delta_k$ when $p = \max\{x_{1i} : \delta_{1i} = 1\}$, in which case $W_k$ and $\Delta_k$ are not identifiable by the empirical likelihood. To avoid these identifiability issues, we let $W_1 = 0$ and $W_k = 0$ for $k > p$, which is equivalent to using $H_0^*$ instead of $H_0$ in the prior parameter of the gamma process, where $H_0^*(t) = 0$ for $t < t_1$, $H_0^*(t) = H_0(t) - H_0(t_1)$ for $t_1 \le t \le t_p$ and $H_0^*(t) = H_0(t_p)$ for $t > t_p$.

    We now explain how to generate $W_l$ from its conditional posterior distribution. Let $H_k^{(-l)} = H(t_k) - W_l$. Then, the conditional posterior distribution of $W_l$ for $2 \le l \le p$ given others $= (\gamma_0, \gamma_1, \Delta, W^{(-l)}, \text{Data})$ is given as

    $$\pi(W_l \mid \text{others}) \propto \exp\Big(W_l \gamma_1 \sum_{k=l}^{q} \sum_{(2,i) \in D(t_k)} 1\Big) \exp\Big(-\exp(\gamma_1 W_l) \sum_{k=l}^{q} \Delta_k \exp\big(\gamma_0 + \gamma_1 H_k^{(-l)}\big) \sum_{(2,i) \in R(t_k)} 1\Big)\, W_l^{v_l - 1} \exp(-c W_l)\, I(W_l \ge 0),$$

    where $v_l = c\,(H_0^*(t_l) - H_0^*(t_{l-1}))$. Let

    $$\xi_l = \sum_{k=l}^{q} \sum_{(2,i) \in D(t_k)} 1$$

    and

    $$\eta_l = \sum_{k=l}^{q} \Delta_k \exp\big(\gamma_0 + \gamma_1 H_k^{(-l)}\big) \sum_{(2,i) \in R(t_k)} 1.$$

    Then, the conditional posterior distribution of $W_l$ is simplified as

    $$\pi(W_l \mid \text{others}) \propto h_l(\exp(\gamma_1 W_l))\, W_l^{v_l - 1} \exp(-c W_l)\, I(W_l \ge 0), \tag{8}$$

    where

    $$h_l(y) = y^{\xi_l} \exp(-\eta_l y). \tag{9}$$

    Note that the maximum of $h_l(\exp(\gamma_1 W_l))$, say $\bar{h}_l$, on $W_l \in (0, \infty)$ can be easily calculated, and we can easily generate a random number from the gamma distribution. Hence, we can use the AR sampling technique for generating $W_l$ from (8) as follows:

    1. Generate $W^* \sim \text{Gamma}(v_l, c)$, where $\text{Gamma}(a, b)$ is the gamma distribution with mean $a/b$ and variance $a/b^2$.
    2. Generate $U \sim \text{Uniform}(0, 1)$.
    3. Let $W_l = W^*$ if $h_l(\exp(\gamma_1 W^*))/\bar{h}_l \ge U$. Otherwise, go to 1.
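The three AR steps can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: `xi` and `eta` stand for the exponent and the rate in $h_l(y) = y^{\xi}\exp(-\eta y)$ (the symbol names are chosen here, and `eta` is assumed positive), and the supremum of $h_l$ is found by clamping the unconstrained maximiser $y = \xi/\eta$ to the range of $y = \exp(\gamma_1 W)$: $[1, \infty)$ when $\gamma_1 = 1$, $(0, 1]$ when $\gamma_1 = -1$, and the single point $y = 1$ when $\gamma_1 = 0$. Everything is done on the log scale to avoid underflow.

```python
import math
import random


def log_h(log_y, xi, eta):
    """log of h(y) = y**xi * exp(-eta*y), written in terms of log(y)."""
    power = xi * log_y if xi > 0 else 0.0  # avoids 0 * (-inf) when xi == 0
    return power - eta * math.exp(log_y)


def log_h_max(gamma1, xi, eta):
    """log of the supremum of h(exp(gamma1 * w)) over w >= 0."""
    if gamma1 == 0:
        return log_h(0.0, xi, eta)
    log_y_star = math.log(xi / eta) if xi > 0 else -math.inf
    # gamma1 = 1: log(y) >= 0; gamma1 = -1: log(y) <= 0. h is unimodal in log(y).
    log_y = max(log_y_star, 0.0) if gamma1 == 1 else min(log_y_star, 0.0)
    return log_h(log_y, xi, eta)


def sample_w(gamma1, xi, eta, v, c, rng):
    """Acceptance-rejection draw from h(exp(gamma1*w)) * w**(v-1) * exp(-c*w)."""
    bound = log_h_max(gamma1, xi, eta)
    while True:
        w = rng.gammavariate(v, 1.0 / c)   # step 1: proposal Gamma(v, rate c)
        u = 1.0 - rng.random()             # step 2: uniform on (0, 1]
        if math.log(u) <= log_h(gamma1 * w, xi, eta) - bound:  # step 3
            return w


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(sample_w(1, 2.0, 1.0, 1.0, 1.0, rng), 3) for _ in range(5)])
```

Because the proposal is the gamma prior factor of (8), the acceptance probability is exactly $h_l(\exp(\gamma_1 W^*))/\bar{h}_l$, so a small acceptance rate signals that the likelihood factor is sharply peaked relative to the prior.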

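    The MCMC algorithm for the BB posterior can be summarized as follows:

    Sampling $\gamma_0$ given $\gamma_1$, $\Delta$, $H$ and data: We use the random-walk MH algorithm. Let $\gamma_0^*$ be a candidate value generated from a random-walk kernel $q(\gamma_0, \gamma_0^*)$. Then, the candidate is accepted with probability $\min\{1, r\}$, where

    $$r = \frac{\pi(\gamma_0^* \mid \gamma_1, \Delta, H, \text{Data})\, q(\gamma_0^*, \gamma_0)}{\pi(\gamma_0 \mid \gamma_1, \Delta, H, \text{Data})\, q(\gamma_0, \gamma_0^*)}$$

    and $\pi(\gamma_0 \mid \gamma_1, \Delta, H, \text{Data})$ is in (5).

    Sampling $\gamma_1$ given $\gamma_0$, $\Delta$, $H$ and data: We generate $\gamma_1 = h$ for $h \in \{-1, 0, 1\}$ with probability $p_h$, where

    $$p_h = \frac{\pi(h \mid \gamma_0, \Delta, H, \text{Data})}{\sum_{l \in \{-1, 0, 1\}} \pi(l \mid \gamma_0, \Delta, H, \text{Data})}$$

    and $\pi(h \mid \gamma_0, \Delta, H, \text{Data})$ is in (6).

    Sampling $\Delta$ given $\gamma_0$, $\gamma_1$, $H$ and data: For $k = 1, \ldots, q$, generate $\Delta_k$ from $\text{Gamma}(\alpha_k, \beta_k)$, where $\alpha_k = d(t_k)$ and $\beta_k$ is in (7).

    Sampling $H$ given $\gamma_0$, $\gamma_1$, $\Delta$ and data: Let $W_k = H(t_k) - H(t_{k-1})$. Let $W_1 = 0$ and $W_k = 0$ for $k > p$. For $l = 2, \ldots, p$:

    1. Generate $W^* \sim \text{Gamma}(v_l, c)$.
    2. Generate $U \sim \text{Uniform}(0, 1)$.
    3. Let $W_l = W^*$ if $h_l(\exp(\gamma_1 W^*))/\bar{h}_l \ge U$, where $h_l$ is in (9). Otherwise, go to 1.

    Let $H(t) = \sum_{k:\, t_k \le t} W_k$.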
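The two parametric updates in the summary above can be sketched in a few lines. The code below is a hedged illustration rather than the authors' implementation: function and argument names are hypothetical, `d2[k]` and `r2[k]` denote the numbers of group-2 events and of group-2 subjects at risk at $t_k$, the random-walk proposal is assumed Gaussian (so the kernel ratio cancels), and the priors are $N(0, \sigma_0^2)$ for $\gamma_0$ and uniform on $\{-1, 0, 1\}$ for $\gamma_1$. Probabilities for $\gamma_1$ are normalized on the log scale for numerical stability.

```python
import math
import random


def log_cond_gamma0(g0, gamma1, H, jumps, d2, r2, sigma0_sq=10.0):
    """Log of the gamma_0 conditional (5), up to a constant, with N(0, sigma0_sq) prior."""
    a = sum(d2)
    b = sum(dk * math.exp(gamma1 * Hk) * n2 for Hk, dk, n2 in zip(H, jumps, r2))
    return g0 * a - math.exp(g0) * b - g0 * g0 / (2.0 * sigma0_sq)


def mh_step_gamma0(g0, gamma1, H, jumps, d2, r2, step, rng):
    """One random-walk MH update; the symmetric proposal cancels in the ratio."""
    cand = g0 + rng.gauss(0.0, step)
    log_ratio = (log_cond_gamma0(cand, gamma1, H, jumps, d2, r2)
                 - log_cond_gamma0(g0, gamma1, H, jumps, d2, r2))
    return cand if math.log(1.0 - rng.random()) < log_ratio else g0


def log_cond_gamma1(g1, gamma0, H, jumps, d2, r2):
    """Log of the gamma_1 conditional (6), up to a constant, with a uniform prior."""
    return sum(g1 * Hk * b - math.exp(gamma0) * dk * math.exp(g1 * Hk) * n2
               for Hk, dk, b, n2 in zip(H, jumps, d2, r2))


def sample_gamma1(gamma0, H, jumps, d2, r2, rng):
    """Draw gamma_1 from its three-point conditional; also return the probabilities."""
    support = (-1, 0, 1)
    logp = [log_cond_gamma1(h, gamma0, H, jumps, d2, r2) for h in support]
    m = max(logp)
    w = [math.exp(lp - m) for lp in logp]  # subtract the max before exponentiating
    total = sum(w)
    return rng.choices(support, weights=w)[0], [x / total for x in w]


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_gamma1(0.3, [0.0, 0.7], [0.1, 0.2], [1, 0], [3, 2], rng))
```

Averaging the indicator of $\gamma_1 = 0$ over the retained draws then estimates the posterior probability of the proportional hazards model.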

    4 Numerical experiments

    In this section, we illustrate the proposed model on various data sets. For prior parameters, we let $\sigma_0^2 = 10$, $H_0(t) = \log(1 + t)$ and $c = 1$.

    4.1 Simulation 1

    We let $n_1 = n_2 = 50$ and generated survival times of the first group from the exponential distribution with mean 20, and those of the second group from the exponential distribution with mean 30. Censoring times are generated from the exponential distribution such that the censoring probability is 0.3. Note that the model used for the simulation satisfies the proportional hazards assumption. We obtained the posterior distributions of $\theta$ using the proposed MCMC algorithm. We iterated the MCMC algorithm 100,000 times after a burn-in period of 10,000 iterations. Then, we collected 2,000 samples at every 50th iteration after the burn-in for further analysis. We used a relatively extreme thinning (every 50th iteration) to make the samples almost independent, making further analysis easier.
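The simulation design can be reproduced with a few lines. The sketch below rests on one assumption that the text does not spell out: the censoring rate is chosen separately for each group, so that an exponential censoring time with rate $b$ against an exponential survival time with rate $a$ gives censoring probability $b/(a+b) = 0.3$. The function name is hypothetical. Under this design the true log hazard ratio is $\log\{(1/30)/(1/20)\} = \log(2/3)$.

```python
import math
import random


def simulate_group(n, mean_survival, censor_prob, rng):
    """Exponential survival times with independent exponential censoring.

    With survival rate a and censoring rate b, P(censored) = b / (a + b),
    so b = a * p / (1 - p) gives censoring probability p.
    """
    rate = 1.0 / mean_survival
    censor_rate = rate * censor_prob / (1.0 - censor_prob)
    x, delta = [], []
    for _ in range(n):
        t = rng.expovariate(rate)
        c = rng.expovariate(censor_rate)
        x.append(min(t, c))
        delta.append(1 if t <= c else 0)
    return x, delta


# True log hazard ratio of group 2 over group 1 (the value gamma_0 should recover).
TRUE_GAMMA0 = math.log((1.0 / 30.0) / (1.0 / 20.0))

# Thinning: 100,000 post-burn-in iterations, keeping every 50th draw.
KEPT_ITERATIONS = range(50, 100001, 50)

if __name__ == "__main__":
    rng = random.Random(2010)
    x1, d1 = simulate_group(50, 20.0, 0.3, rng)
    x2, d2 = simulate_group(50, 30.0, 0.3, rng)
    print(sum(d1), sum(d2), round(TRUE_GAMMA0, 4), len(KEPT_ITERATIONS))
```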

    Figure 1 gives the traceplots and histograms of $\gamma_0$, $H(t)$ and $\Lambda(t)$ at $t = 20$ (the mean survival time of the first group) generated from the MCMC algorithm. The proposed MCMC algorithm converges well, and the posterior densities have nice shapes (at least, they are unimodal). Figure 2a shows how the empirical probability of $\gamma_1 = 0$, calculated based on the generated samples from the MCMC algorithm, converges. The two dashed lines in the figure represent the 95% confidence interval obtained from the samples, assuming that they are independent. With the exception of the early stage of the iteration, the empirical probabilities lie inside the confidence limits, which implies that the MCMC algorithm converges well to its stationary distribution for $\gamma_1$, too. Figure 2b displays the posterior probabilities of $\gamma_1$, which support the proportional hazards model because the posterior probability is largest at $\gamma_1 = 0$.

    Figure 3 shows the acceptance probability of $W_k$ for $k = 2, \ldots, p$ in the AR sampling step inside the MCMC algorithm. The smallest acceptance probability is around 30%, which implies that the AR sampling step does not significantly hamper the overall computing time of the MCMC algorithm.

    Table 1 compares the Bayes estimator and 90% (equal-tail) posterior probability interval of $\gamma_0$ with those obtained from the proportional hazards model (i.e., $\gamma_1 = 0$) and the corresponding frequentist counterpart. The posterior interval based on the proposed model is much wider than the other two intervals. This is because there is additional uncertainty in estimating $H$ for the proposed model. However, all intervals contain the true value $-0.4055$.

    We conducted additional simulations to investigate the effect of the censoring probability and sample sizes on the posterior distribution. Table 2 presents the posterior distributions of $\gamma_1$ for various values of the censoring probability and sample sizes. The results are stable and consistently support the proportional hazards model.

    Fig. 1 Panel a shows the traceplots of $\gamma_0$, $H(20)$ and $\Lambda(20)$, and panel b shows the corresponding histograms

    Fig. 2 Panel a shows the traceplot of the empirical posterior probability of $\gamma_1 = 0$ (solid) with the 95% confidence limits (dashed), and panel b presents the posterior probabilities of $\gamma_1$

    Fig. 3 Acceptance probabilities of $W_l$ for $l = 2, \ldots, p$ in the AR algorithm

    Table 1 Bayes estimator and 90% posterior probability interval of $\gamma_0$ of the proposed model (MHR, monotone hazard ratio) with those obtained from the proportional hazards (PH) model and corresponding frequentist results

    Method                    Point estimate    90% interval
    BB with the MHR model     -0.6438           (-1.1583, 0.0898)
    BB with the PH model      -0.6638           (-1.0805, -0.2382)
    MLE with the PH model     -0.6600           (-1.0716, -0.2484)

    Table 2 The posterior probabilities of $\gamma_1 = -1$, 0 and 1 for various values of the censoring probability and sample sizes in Simulation 1

    (n1, n2)     30% censoring               50% censoring
    (50,50)      (0.0610, 0.8955, 0.0435)    (0.0570, 0.8120, 0.1310)
    (100,50)     (0.2215, 0.7595, 0.0190)    (0.0555, 0.7980, 0.1465)
    (50,100)     (0.0615, 0.6580, 0.3255)    (0.0575, 0.7460, 0.1965)
    (100,100)    (0.0010, 0.7590, 0.2400)    (0.0315, 0.8855, 0.0830)

    Fig. 4 Panel a draws the posterior probability of $\gamma_1$, and panels b and c present the Bayes estimators of the log hazard ratio and $\Lambda$ with the pointwise 90% probability bands (PB) and true functions, respectively

    Table 3 The posterior probabilities of $\gamma_1 = -1$, 0 and 1 for various values of the censoring probability and sample sizes in Simulation 2

    (n1, n2)     30% censoring               50% censoring
    (50,50)      (0.9550, 0.0445, 0.0005)    (0.9505, 0.0465, 0.0030)
    (100,50)     (1.0000, 0.0000, 0.0000)    (0.9975, 0.0025, 0.0000)
    (50,100)     (0.9915, 0.0085, 0.0000)    (0.8630, 0.1345, 0.0025)
    (100,100)    (1.0000, 0.0000, 0.0000)    (0.9995, 0.0005, 0.0000)

    4.2 Simulation 2

    We let $\lambda_2(t) = \alpha \rho\, t^{\rho - 1} \lambda_1(t)$. The hazard ratio is increasing monotonically when $\rho > 1$ and decreasing when $\rho < 1$. We set $\rho = 0.5$ and $\lambda_1(t) = 1/20$ to have a monotonically decreasing hazard ratio, and $\alpha = 20/\sqrt{10}$ to make the mean survival time of the second group equal to 20. The other set-ups, such as sample sizes, censoring probability, and the number of iterations of the MCMC algorithm, are the same as those for Simulation 1.
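The calibration of the second-group hazard can be verified numerically. The sketch below assumes that the second-group hazard has the Weibull-type form $\lambda_2(t) = \alpha\rho\, t^{\rho-1}\lambda_1(t)$ with $\rho = 0.5$, $\lambda_1(t) = 1/20$ and $\alpha = 20/\sqrt{10}$; the symbol names $\alpha$ and $\rho$ and the function names are labels chosen here. Under this form $\Lambda_2(t) = \alpha\lambda_1 t^{\rho}$, so the mean survival time is $\Gamma(1 + 1/\rho)\,(\alpha\lambda_1)^{-1/\rho}$, and survival times can be drawn by inverting the survival function.

```python
import math
import random

RHO = 0.5                      # shape: < 1 gives a decreasing hazard ratio
LAMBDA1 = 1.0 / 20.0           # constant hazard of the first group
ALPHA = 20.0 / math.sqrt(10)   # scale chosen so that the mean is 20


def mean_survival(alpha, rho, lambda1):
    """Mean of T with cumulative hazard alpha * lambda1 * t**rho (Weibull)."""
    return math.gamma(1.0 + 1.0 / rho) * (alpha * lambda1) ** (-1.0 / rho)


def draw_survival(alpha, rho, lambda1, rng):
    """Inverse-transform draw: solve exp(-alpha*lambda1*t**rho) = u for t."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return (-math.log(u) / (alpha * lambda1)) ** (1.0 / rho)


if __name__ == "__main__":
    rng = random.Random(3)
    draws = [draw_survival(ALPHA, RHO, LAMBDA1, rng) for _ in range(40000)]
    print(mean_survival(ALPHA, RHO, LAMBDA1), sum(draws) / len(draws))
```

The distribution is heavy-tailed for $\rho = 0.5$, so the Monte Carlo average converges slowly; the closed form is the more reliable check.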

    The posterior probability of $\gamma_1$ is given in Fig. 4a, which strongly supports the true model, a monotonically decreasing hazard ratio. Figure 4b and c present the Bayes estimators and corresponding pointwise 90% posterior probability bands of the log hazard ratio ($\gamma_0 + \gamma_1 H(t)$) and the cumulative baseline hazard function $\Lambda(t)$ with the true ones, respectively. Note that the true functions lie inside the probability bands, implying that the proposed method estimates the monotone hazard ratio and cumulative baseline hazard function well.

    As is done for Simulation 1, Table 3 presents the posterior probabilities of $\gamma_1$ for various values of the censoring probability and sample sizes. All of the results strongly indicate that the hazard ratio is decreasing.

    4.3 Prior sensitivity

    Priors need to be specified for three parameters $\gamma_0$, $\gamma_1$ and $H$. Since $\gamma_1$ takes a value in $\{-1, 0, 1\}$, the uniform prior is a natural one. For $\gamma_0$, unless the prior variance


    Fig. 5 Panels a and b show the traceplots of the empirical posterior probability of $\gamma_1 = 0$ and $\gamma_1 = -1$ (solid) with the 95% confidence limits (dashed) for the Leukemia and Ovarian data sets, respectively

    As is done in Simulation 1, Fig. 5 presents the traceplots of the empirical posterior probability of $\gamma_1$, and Figs. 6 and 7 present the traceplots and corresponding histograms of $\gamma_0$, $H(10)$ and $\Lambda(10)$ for the Leukemia and Ovarian data sets. It seems that there is no problem in the convergence of the MCMC algorithms.

    Figure 8 presents the posterior probabilities of $\gamma_1$ for the two data sets, and Table 5 gives the p-values of the three frequentist test statistics for the proportional hazards model against the monotone hazard ratio alternative, as well as the DIC (deviance information criterion; Spiegelhalter et al. 2002) values and the effective numbers of parameters ($p_D$) of the proposed model with $\gamma_1 = -1$, 0 and 1, respectively.

    GS1 and GS2 in Table 5 represent the test statistics proposed by Gill and Schumacher (1987), with the Gehan versus log-rank weights and Prentice versus log-rank weights, respectively, and DS is the test statistic proposed by Deshpande and Sengupta (1995). The DIC is calculated based on the marginal likelihood obtained by integrating out the baseline hazard function with respect to the prior. Because we used the BB prior, the resulting marginal likelihood becomes the partial likelihood. Note that the DIC is an extension of the AIC (Akaike information criterion), and the AIC works well with the partial likelihood (Hjort and Claeskens 2006). Hence, it is reasonable to calculate the DIC with the marginal likelihood. The five methods (the posterior probability, the three p-values, and the DIC) indicate that the proportional hazards assumption is valid for the Leukemia data set, but not for the Ovarian data set.

    Remark When we are interested in the validity of the monotone hazard ratio assumption, the frequentist tests are not valid because the rejection of the frequentist tests does not necessarily mean that the monotone hazard ratio is valid. In contrast, the Bayesian results, namely the posterior probability of $\gamma_1$ and the DIC values, directly confirm whether the assumption of the monotone hazard ratio is valid.

    Remark Along with the DIC values for $\gamma_1 = -1$, 0 and 1, we calculated the DIC value of the model where $\gamma_1$ is random. The DIC value with random $\gamma_1$ would be expected to be smaller than that with $\gamma_1 = 0$ when the proportional hazards assumption is valid. The DIC and $p_D$ values with random $\gamma_1$ for the Leukemia and Ovarian data sets are 175.23, 1.20, and 128.41, 2.16, respectively, which do not confirm our conjecture. We find, however, that the DIC values are unstable, particularly when the proportional hazards assumption is valid. Note that the difference of the DIC values between $\gamma_1 = 0$ and 1 for the Leukemia data set is very small, whereas the posterior probabilities are much different. We think that the DIC may not be appropriate for our model because our model is semiparametric (i.e., the hazard ratio is completely unspecified), and the DIC is developed mainly for parametric models where the maximum likelihood estimator is asymptotically Gaussian. We leave this problem as future work.

    Fig. 6 The Leukemia data set results: panel a presents the traceplots of $\gamma_0$, $H(10)$, and $\Lambda(10)$, and panel b shows the corresponding histograms

    Fig. 7 The Ovarian data set results: panel a presents the traceplots of $\gamma_0$, $H(10)$, and $\Lambda(10)$, and panel b shows the corresponding histograms

    For the Ovarian data set, in which the proportional hazard assumption is rejected

    against the monotone hazard ratio, we draw the Bayes estimator of the hazard ratio

    with the pointwise 90% probability bands in Fig. 9a. The figure suggests that the hazard

    ratio of the second group (stage II) over the first group (stage IIA) decreases steadily.

    We draw the Bayes estimators of the two cumulative hazard functions 1 and 2 with

    their pointwise 90% probability bands and the empirical cumulative hazard (ECH)

    123

  • 8/7/2019 bayesian_analysis_hazard

    17/19

    Bayesian analysis for monotone hazard ratio

    [Figure 8: bar plots of posterior probability against gamma1 (values -1, 0, 1); panel (a) Leukemia data set, panel (b) Ovarian data set.]

    Fig. 8 Panels a and b present the posterior probabilities of $\gamma_1$ for the Leukemia and Ovarian data sets, respectively

    Table 5 P-values of the three frequentist test statistics for proportional hazards against a monotone hazard ratio, and the DIC and $p_D$ values for the proposed model with $\gamma_1 = -1, 0, 1$, respectively

                 p-values                      DIC, p_D
                 GS1      GS2      DS          gamma1 = -1     gamma1 = 0      gamma1 = 1
    Leukemia     0.6897   0.6807   0.1660      176.73, 1.39    174.64, 0.92    174.72, 1.28
    Ovarian      0.0571   0.0507   0.0298      127.52, 2.03    130.73, 1.05    133.46, 1.67

    [Figure 9: three panels plotted against time (100 to 400): (a) hazard ratio, (b) Lambda1(t), (c) Lambda2(t); legend in panels (b) and (c): Bayes, ECH, 90% PB.]

    Fig. 9 Panel a shows the Bayes estimator of $H(t)$ with its pointwise 90% probability band; panels b and c show the same for $\Lambda_1$ and $\Lambda_2$, respectively

    functions in Fig. 9b and c, respectively. The Bayes estimators and ECH functions are

    close and are located inside the probability bands.

    5 Concluding remarks

    We proposed a Bayesian approach for estimating the two hazard functions under the

    monotone hazard ratio constraint and developed an efficient MCMC algorithm. We

    demonstrated with simulated and real data sets that the MCMC algorithm, based on

    the BB approach, converges well and provides reliable results.

    In this paper, we modeled the monotone hazard ratio nonparametrically. An alterna-

    tive model is a piecewise constant monotone hazard ratio, which provides information


    about when the hazard ratio changes. The proposed BB approach can be easily adapted

    to this model, with a significant saving in computational cost.

    The proposed model can be extended to the case of more than two hazard functions. Suppose there are three hazard functions $\lambda_1$, $\lambda_2$ and $\lambda_3$ with $\lambda_2/\lambda_1$ and $\lambda_3/\lambda_2$ increasing monotonically. We can model $\lambda_2$ and $\lambda_3$ by

    $$\lambda_2(t) = \exp\left(\gamma_0^{(2)} + H^{(2)}(t)\right)\lambda_1(t)$$

    and

    $$\lambda_3(t) = \exp\left(\gamma_0^{(3)} + H^{(2)}(t) + H^{(3)}(t)\right)\lambda_1(t),$$

    where $H^{(2)}$ and $H^{(3)}$ are two independent gamma processes a priori. The proposed MCMC algorithm can be easily modified for this model as well.
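    To see why this construction yields two monotone ratios, note that the ratio of the third to the second hazard reduces to the exponential of a constant plus H(3)(t), which is nondecreasing because gamma process paths are nondecreasing. The sketch below illustrates this on a time grid by drawing one prior path of each process from independent gamma increments. It is only an illustration under stated assumptions: the function names, the shape and scale settings, the baseline hazard and the intercept values are all hypothetical, and this is a prior simulation, not the MCMC algorithm of the paper.

```python
import math
import random


def gamma_process_path(grid, shape_per_unit=1.0, scale=1.0, rng=None):
    """Simulate a gamma process at the increasing, positive time points in grid.

    The path starts at H(0) = 0 and each increment over an interval of length
    dt is an independent Gamma(shape_per_unit * dt, scale) draw.
    """
    rng = rng or random.Random()
    path, level, prev_t = [], 0.0, 0.0
    for t in grid:
        dt = t - prev_t
        if dt > 0:
            level += rng.gammavariate(shape_per_unit * dt, scale)
        path.append(level)
        prev_t = t
    return path


def three_group_hazards(grid, baseline, gamma0_2, gamma0_3, H2, H3):
    """Evaluate the three hazards on the grid from the displayed model."""
    lam1 = [baseline(t) for t in grid]
    lam2 = [math.exp(gamma0_2 + h2) * l1 for h2, l1 in zip(H2, lam1)]
    lam3 = [math.exp(gamma0_3 + h2 + h3) * l1
            for h2, h3, l1 in zip(H2, H3, lam1)]
    return lam1, lam2, lam3


if __name__ == "__main__":
    rng = random.Random(2010)
    grid = [0.5 * k for k in range(1, 21)]
    H2 = gamma_process_path(grid, rng=rng)
    H3 = gamma_process_path(grid, rng=rng)
    # Hypothetical baseline hazard and intercepts, for illustration only.
    lam1, lam2, lam3 = three_group_hazards(
        grid, lambda t: 0.1 + 0.01 * t, -0.5, 0.2, H2, H3)
    for t, a, b, c in zip(grid[:5], lam1, lam2, lam3):
        print(f"t={t:.1f}  lam2/lam1={b / a:.3f}  lam3/lam2={c / b:.3f}")
```

    Both printed ratios are nondecreasing in t for every simulated path, up to floating-point rounding, whatever the baseline hazard is.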

    Studying asymptotic properties of the posterior distribution is worth pursuing. Without $H$, Kim and Lee (2003b) and Kim (2006) proved that the convergence rate of the BB and full Bayesian posteriors is $1/\sqrt{n}$. We think, however, that the convergence rate of the posterior of $H$ to the true hazard ratio would be slower than $1/\sqrt{n}$, as the optimal convergence rate for the hazard function is typically slower than $1/\sqrt{n}$. This conjecture would partly explain the wider probability interval of $\gamma_0$ for the proposed model compared to the results for the proportional hazards model in Table 1 and the wider probability band for $\Lambda_2$ in Fig. 9c compared to that of $\Lambda_1$ in Fig. 9b.

    Acknowledgment This work was supported by the Korea Science and Engineering Foundation (KOSEF)

    grant funded by the Korea government (MEST) (R01-2007-000-20045-0(2008)).

    References

    Andersen PK, Borgan O, Gill RD, Keiding N (1993) Statistical methods based on counting processes. Springer, New York

    Arjas E, Gasbarra D (1996) Bayesian inference of survival probabilities under stochastic ordering constraints. J Am Stat Assoc 91:1101–1109

    Brunk HD, Franck WE, Hanson DL, Hogg RV (1966) Maximum likelihood estimation of the distribution of two stochastically ordered random variables. J Am Stat Assoc 61:1067–1080

    Deshpande JV, Sengupta D (1995) Testing for the hypothesis of proportional hazards in two populations. Biometrika 82:251–261

    Dykstra RL (1982) Maximum likelihood estimation of the survival functions of stochastically ordered random variables. J Am Stat Assoc 77:621–628

    Dykstra RL, Kochar S, Robertson T (1991) Statistical inference for uniform stochastic ordering in several populations. Ann Stat 19:870–888

    Gelfand AE, Kottas A (2001) Nonparametric Bayesian modeling for stochastic order. Ann Inst Stat Math 53:865–876

    Gill R, Schumacher M (1987) A simple test of the proportional hazards assumption. Biometrika 74:289–300

    Hjort NL, Claeskens G (2006) Focussed information criteria and model averaging for Cox's hazard regression model. J Am Stat Assoc 101:1449–1464

    Ibrahim JG, Chen MH, Sinha D (2001) Bayesian survival analysis. Springer-Verlag, New York

    Kalbfleisch JD (1978) Nonparametric Bayesian analysis of survival time data. J R Stat Soc Ser B 40:214–221

    Kim Y, Lee J (2003a) Bayesian analysis of proportional hazard models. Ann Stat 31:493–511

    Kim Y, Lee J (2003b) Bayesian bootstrap for proportional hazards models. Ann Stat 31:1905–1922


    Kim Y (2006) The Bernstein-von Mises theorem for the proportional hazard model. Ann Stat 34:1678–1700

    Laud PW, Damien P, Smith AFM (1998) Bayesian nonparametric and covariate analysis of failure time data. In: Practical nonparametric and semiparametric Bayesian statistics. Springer, New York, pp 213–225

    Lo AY (1982) Bayesian nonparametric statistical inference for Poisson point processes. Z Wahrsch Verw Gebiete 59:55–66

    Mukerjee H (1996) Estimation of survival functions under uniform stochastic ordering. J Am Stat Assoc 91:1684–1689

    Praestgaard JT, Huang J (1996) Asymptotic theory for nonparametric estimation of survival curves under order restriction. Ann Stat 24:1679–1716

    Ripley BD (2006) Stochastic simulation. Wiley, New York

    Sengupta D, Bhattacharjee A, Rajeev V (1998) Testing for the proportionality of hazards in two samples against the increasing cumulative hazard ratio alternative. Scand J Stat 25:637–647

    Spiegelhalter DJ, Best N, Carlin B, Linde A (2002) Bayesian measures of model complexity and fit (with discussion). J R Stat Soc Ser B 64:583–639
