
Generalized Phase-Type Distribution and Competing Risks for Markov Mixtures Process

B.A. Surya∗

Victoria University of Wellington
School of Mathematics and Statistics

Wellington, New Zealand

11 November 2016

Abstract

Phase-type distribution has been an important probabilistic tool in the analysis of complex stochastic system evolution. It was introduced by Neuts [28] in 1975. The model describes the lifetime distribution of a finite-state absorbing Markov chain, and has found many applications in a wide range of areas. It was brought to survival analysis by Aalen [3] in 1995. However, the model lacks the ability to model heterogeneity and to include past information, which is due to the Markov property of the underlying process that forms the model. We attempt to generalize the distribution by replacing the underlying process by a Markov mixtures process. The Markov mixtures process was used to model job mobility by Blumen et al. [10] in 1955. It was known as the mover-stayer model, describing the tendency of low-productivity workers to move out of their jobs according to a Markov chain, while those with high productivity tend to stay in the job. Frydman [20] later extended the model to a mixture of finite-state Markov chains moving at different speeds on the same state space. In general the mixtures process does not have the Markov property. We revisit the mixtures model [20] for mixtures of Markov chains with multiple absorbing states, and propose a generalization of the phase-type distribution under competing risks. The new distribution has two main appealing features: it has the ability to model heterogeneity and to include past information of the underlying process, and it comes in a closed form. Built upon the new distribution, we propose a conditional forward intensity which can be used to determine the rate of occurrence of future events (caused by a certain type) based on available information. Numerical study suggests that the new distribution and its forward intensity offer significant improvements over the existing model.

∗Email address: [email protected]; Postal address: School of Mathematics and Statistics, Victoria University of Wellington, Gate 6 Kelburn PDE, Wellington 6140, New Zealand.


arXiv:1611.03832v1 [stat.ME] 11 Nov 2016


MSC2010 subject classifications: 60J20, 60J27, 60J28, 62N99
Keywords: Markov chains; Markov mixtures process; phase-type distribution; forward intensity; competing risks and survival analysis

1 Introduction

Phase-type distribution is known to be a dense class of distributions, which can approximate any distribution arbitrarily well. It was introduced in 1975 by Neuts ([28], [27]) and has found many applications in a wide range of fields ever since.

When the jump distribution of a compound Poisson process is modeled by phase-type distributions, it results in a dense class of Lévy processes (Asmussen [6]). The advantage of working under phase-type distribution is that it allows us to get analytically tractable results in applications: e.g., in actuarial science (Rolski et al. [31], Albrecher and Asmussen [7], Lin and Liu [25], Lee and Lin [24]), option pricing (Asmussen et al. [5], Rolski et al. [31]), queueing theory (Badila et al. [9], Chakravarthy and Neuts [13], Buchholz et al. [12], Breuer and Baum [11], Asmussen [6]), reliability theory (Assaf and Levikson [8], Okamura and Dohi [30]), and survival analysis (Aalen [3], Aalen and Gjessing [2]).

The phase-type distribution F is expressed in terms of a Markov jump process (X_t)_{t≥0} with finite state space S = E ∪ {∆}, where, for some integer m ≥ 1, E = {i : i = 1, ..., m} is the set of non-absorbing states and ∆ is the absorbing state, and an initial distribution π, such that F is the distribution of the time until absorption,

    τ = inf{t ≥ 0 : X_t = ∆}   and   F̄(t) = P{τ > t}.    (1.1)

Unless stated otherwise, we assume for simplicity that the initial probability π has zero mass on the absorbing state ∆, so that P{τ > 0} = 1. We also refer to ∆ as the (m+1)th element of the state space S, i.e., ∆ = m + 1. Associated with the time propagation of X on the state space S is the intensity matrix Q. This matrix is partitioned according to the process moving to the non-absorbing states E and to the single absorbing state ∆, and has the block-partitioned form

    Q = ( T   δ )
        ( 0   0 ),    (1.2)

where T is an m × m nonsingular matrix which, together with the exit vector δ, satisfies

    T1 + δ = 0,    (1.3)

with 1 = (1, ..., 1)^⊤, as the rows of the intensity matrix Q sum to zero. That is to say that the entry q_{ij} of the intensity matrix Q satisfies the following properties:

    q_{ii} ≤ 0,   q_{ij} ≥ 0,   ∑_{j≠i} q_{ij} = −q_{ii} = q_i,   (i, j) ∈ S.    (1.4)


As δ is a non-negative vector, (1.3)-(1.4) imply that T is a negative definite matrix, i.e., 1^⊤ T 1 < 0. The matrix T is known as the phase generator matrix of X. The absorption is certain if and only if T is nonsingular, see [27], [6], [11].

Following Theorem 3.4 and Corollary 3.5 in [6] and by the homogeneity of X, the transition probability matrix P(t) of X over the period of time (0, t) is

    P(t) = exp(Qt),   t ≥ 0,    (1.5)

where exp(Qt) is the (m + 1) × (m + 1) matrix exponential defined by the series

    exp(Qt) = ∑_{n=0}^{∞} (1/n!) (Qt)^n.    (1.6)

The entry q_{ij} has a probabilistic interpretation: 1/(−q_{ii}) is the expected length of time that X remains in state i ∈ E, and q_{ij}/q_i is the probability that when a transition out of state i occurs, it is to state j ∈ S, i ≠ j. The representation of the distribution F is uniquely specified by (π, T). We refer among others to Neuts [27], Asmussen [6] and Rolski et al. [31] for more details. Following Theorem 8.2.3 and Theorem 8.2.5 in [31] (or Proposition 4.1 in [6]), we have

    F̄(t) = π^⊤ e^{Tt} 1   and   f(t) = π^⊤ e^{Tt} δ.    (1.7)

In survival and event history analysis, e.g., Aalen [3], Aalen and Gjessing [2] and Aalen et al. [1], the intensity function α(t), for t > 0, has been used to determine the rate of occurrence of an event in the time interval (t, t + dt]. It is defined by

    α(t)dt = P{τ ≤ t + dt | τ > t}.    (1.8)

The solution to this equation is given in terms of the distributions (1.7) by

    α(t) = f(t)/F̄(t) = (π^⊤ e^{Tt} δ) / (π^⊤ e^{Tt} 1).    (1.9)
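For illustration, the following minimal Python sketch (with a purely hypothetical generator T and initial distribution π, not taken from the paper) evaluates (1.7) and (1.9) via the matrix exponential.

    import numpy as np
    from scipy.linalg import expm

    # Hypothetical phase generator T (two transient states) and initial law pi.
    T = np.array([[-1.0, 0.5],
                  [0.2, -0.8]])
    pi = np.array([0.7, 0.3])
    one = np.ones(2)
    delta = -T @ one                 # exit vector, from T 1 + delta = 0, eq. (1.3)

    def survival(t):
        """Survival function pi' exp(Tt) 1, eq. (1.7)."""
        return pi @ expm(T * t) @ one

    def density(t):
        """Density pi' exp(Tt) delta, eq. (1.7)."""
        return pi @ expm(T * t) @ delta

    def intensity(t):
        """Hazard rate f(t)/survival(t), eq. (1.9)."""
        return density(t) / survival(t)

    print(survival(1.0), density(1.0), intensity(1.0))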

This intensity has been widely used in biostatistics ([3], [1]). Recall that the underlying process that forms the phase-type distribution (1.7) is based on a continuous-time finite-state absorbing Markov chain. We attempt to generalize the distribution to a non-Markovian stochastic process.

Quite recently, Frydman [20] discussed a mixture model of Markov chains moving at different speeds as a generalization of the mover-stayer model introduced by Blumen et al. [10]. The result was then applied to study the dynamics of credit ratings, see Frydman and Schuermann [19]. As shown in [19], the mixture process does not have the Markov property. We revisit the model by considering a mixture of two absorbing Markov chains moving with different speeds (intensity matrices). We derive the conditional distribution of the time until absorption of the mixture process and propose generalizations of (1.7). Due to the non-Markovian nature of the


mixture process, the generalized distribution allows for the inclusion of available past information of the mixture process and for modeling heterogeneity. Built upon the new distribution, we propose a conditional forward intensity which further extends (1.8) to determine the rate of occurrence of future events based on available information about the past of the underlying process. The new distribution and its intensity have the ability to capture path dependence and heterogeneity.

The organization of this paper is as follows. We discuss the mixture process in further detail in Section 2. Section 3 presents some preliminary results, which are derived by extending the results in [19], in particular to the continuous-time setting. The main results concerning the new phase-type distributions and the forward intensity are given in Section 4. We also discuss in Section 4 the residual lifetime and occupation time for the mixture process. We further extend the model in Section 5 to multiple absorbing states (competing risks) and discuss the conditional joint distribution of the lifetime of the mixtures process and the type-specific events (risks). Subsequently, we derive the corresponding type-specific conditional forward intensity for the competing risks. Section 6 discusses numerical examples on a married-and-divorced problem to compare the performance of the existing phase-type distribution and intensity with the new ones proposed. The numerical study suggests that the new distributions and forward intensity offer significant improvements and reveal resemblances to data found e.g. in Aalen et al. [1] (p.222) and the common U-bend shape of lifetime found in the 2010 The Economist article. Section 7 summarizes this paper with some conclusions.

2 Markov mixtures processes

In this paper, we consider (X_t)_{t≥0} as a mixture of two Markov jump processes (X^Q_t)_{t≥0} and (X^G_t)_{t≥0}, both of which propagate on the same state space S = E ∪ {∆}. The corresponding intensity matrix G for the process X^G is given by

    G = ( ΨT   Ψδ )
        (  0    0 ),    (2.1)

where Ψ is an m × m nonsingular matrix, whilst the m × m matrix T and the m × 1 vector δ are the elements of the intensity matrix Q (1.2) of X^Q satisfying the constraint (1.3). As before, we refer to i = m + 1 as the absorbing state ∆.

Frydman and Schuermann [19] proposed the use of mixture models for the estimation, in discrete time, of credit rating dynamics. In their work, the intensity matrix G was taken in a simpler form where Ψ was given by the diagonal matrix

Ψ = diag(ψ1, ψ2, ..., ψm). (2.2)

Furthermore, they considered the absorbing state ∆ as corresponding to censored events such as a bond-issuing firm defaulting, the bond being withdrawn from rating, expiration of the bond, or the bond being called back, etc.


The Markov mixture process is a generalization of the mover-stayer model, a mixture of two discrete-time Markov chains which was introduced by Blumen et al. [10] in 1955 to model population heterogeneity in job mobility. In the mover-stayer model [10], the population of workers consists of stayers (workers who always stay in the same job category (ψ_i = 0)) and movers (workers who move according to a stationary Markov chain with intensity matrix Q). The estimation of the mover-stayer model was given by Frydman [21]. The extension to a mixture of two continuous-time Markov chains moving with different speeds (ψ_i ≠ 0) was proposed by Frydman [20]. Frydman and Schuermann [19] later on used the mixture process to model the dynamics of firms' credit ratings. As empirically shown in [19], there is strong evidence that firms of the same credit rating may move at different speeds to other credit ratings, a feature that the Markov model lacks.

The mixture is defined conditionally on the initial state. We denote by φ = 1_{{X = X^G}} the random variable having value 1 when X = X^G and 0 otherwise, i.e.,

    X = { X^G,  φ = 1
        { X^Q,  φ = 0.    (2.3)

Notice that when the random variable φ is independent of X and has a Bernoulli distribution with probability of success s_{i_0} (2.4), (2.3) reduces to a regime-switching model for a Markov jump process with changing intensity matrices Q and G.

For initial state X0 = i0 ∈ E, there is a separate mixing distribution:

si0 = P{φ = 1|X0 = i0} and 1− si0 = P{φ = 0|X0 = i0}, (2.4)

with 0 ≤ s_{i_0} ≤ 1. The quantity s_{i_0} has the interpretation of being the proportion of the population with initial state i_0 moving according to X^G, whilst 1 − s_{i_0} is the proportion that propagates according to X^Q. It follows from (2.1) that in general X^Q and X^G differ in the rates at which the process leaves the states, i.e., q_i ≠ g_i, but under (2.2) both of them may have the same transition probability of moving from state i ∈ E to state j ∈ S, i ≠ j, i.e., q_{ij}/q_i = g_{ij}/g_i. Note that we have used q_i and g_i to denote the diagonal elements of Q and G, respectively. Thus, depending on the value of ψ_i (2.2), X^G never moves out of state i when ψ_i = 0 (reducing to the mover-stayer model), moves out of state i at a lower rate when 0 < ψ_i < 1, and moves at a higher rate when ψ_i > 1 (or the same rate when ψ_i = 1) than X^Q. If ψ_i = 1 for all i = 1, ..., m, the mixture simplifies to a simple Markov process.

The main feature of the Markov mixture process X (2.3) is that, unlike its components X^Q and X^G, X does not necessarily inherit the Markov property of X^Q nor of X^G; the conditional distribution of the future state of X depends on its past information. Empirical evidence of this fact was given by Frydman and Schuermann [19]. It was shown in [19] that the incorporation of past information helps improve the out-of-sample prediction of the Nelson-Aalen estimate of the cumulative intensity rate for risky bonds. This path dependence property certainly gives a favorable feature in applications. Motivated by these findings,


we derive the phase-type distribution (of the time until absorption τ (1.1)) of X and introduce the forward intensity associated with the distribution. This intensity determines the rate of occurrence of future events given available information. Also, we derive the residual lifetime of X and its occupation time in any state i ∈ S. Below we present some preliminaries needed to establish the results.

3 Preliminaries

Recall that the mixture model (2.3) contains two regimes (processes), one at slow speed, the other fast. The regime is, however, not directly observable, but can be identified based on past information ([20], [19]). To incorporate past information, we denote by I_{t−} = {X(s), 0 ≤ s ≤ t−} the available realization of X prior to time t, and by I_{i,t} = I_{t−} ∪ {X_t = i} all previous and current information of X. The set I_{t−} may contain complete, partial or no information about X.

The likelihoods of observing the past realization I_{i,t} of X under X^Q and X^G, conditional on the initial state X_0 = i_0, are given respectively, following [19], by

    L_Q := P{I_{i,t} | φ = 0, X_0 = i_0} = ∏_{k∈E} e^{−q_k T_k} ∏_{j≠k, j∈S} (q_{kj})^{N_{kj}},
    L_G := P{I_{i,t} | φ = 1, X_0 = i_0} = ∏_{k∈E} e^{−g_k T_k} ∏_{j≠k, j∈S} (g_{kj})^{N_{kj}},    (3.1)

where in both expressions we have denoted by T_k the total time the process spent in state k in I_{i,t}, k ∈ S, and by N_{kj} the number of transitions from state k to state j, with j ≠ k, observed in I_{i,t}; whereas q_{ij} and g_{ij} represent the (i, j)-entry of the intensity matrices Q and G, respectively.

The likelihood is used to classify which regime the process X belongs to. Frydman [20] proposed an EM algorithm for the maximum likelihood estimation of the mixture model parameters (s_i, ψ_i, q_i, 1 ≤ i ≤ m). The results are given by

    q_{ij} = ( N_{ij} / ∑_{j≠i} N_{ij} ) q_i   and   g_{ij} = ψ_i q_{ij},   j ≠ i ∈ S.    (3.2)

Throughout the remainder of this paper, we define the quantity s_i(t) given by

    s_i(t) = P{φ = 1 | I_{i,t}},   i ∈ S.    (3.3)

Note that in the case where t = 0, for which I_{i,t} = {X_0 = i_0}, s_i(0) = s_{i_0} (2.4). This quantity says that, given that the process is in state i at time t together with all its prior realizations I_{t−}, the proportion of those moving according to X^G is s_i(t).

Next, we denote by S_n(t) the diagonal matrix, with n ∈ {m, m + 1}, defined by

    S_n(t) = diag(s_1(t), s_2(t), ..., s_n(t)),   with S_n := S_n(0).    (3.4)

Note that for the absorbing state i = m + 1, s_{m+1}(t) denotes the portion of those arriving in the absorbing state according to X^G. In the special case when t = 0,


for which I_{i,t} = {X_0 = i}, s_i is equal to the one given in (2.4). The transition probability matrices P^Q(t) and P^G(t) of the underlying Markov chains X^Q and X^G are given, following (1.5), by P^Q(t) = e^{Qt} and P^G(t) = e^{Gt}. The result below gives the transition matrix P of the mixture process in terms of S_{m+1}, P^Q(t) and P^G(t). The result was stated in [20]. Below we give the proof of the result.

Proposition 3.1 For a given t ≥ 0, the transition probability matrix P(t) over the period of time (0, t) of the Markov mixture process X (2.3) is given by

    P(t) = S_{m+1} e^{Gt} + ( I_{m+1} − S_{m+1} ) e^{Qt},    (3.5)

where we denote by I_n the n × n identity matrix with n = m + 1.

Proof The proof follows from applying the law of total probability along with Bayes' theorem for conditional probability and (2.4). For i, j ∈ S,

    p_{ij}(t) = P{X_t = j | X_0 = i}
              = P{X_t = j, φ = 1 | X_0 = i} + P{X_t = j, φ = 0 | X_0 = i}
              = P{X_t = j | φ = 1, X_0 = i} P{φ = 1 | X_0 = i} + P{X_t = j | φ = 0, X_0 = i} P{φ = 0 | X_0 = i}
              = p^G_{ij}(t) s_i + p^Q_{ij}(t) (1 − s_i),

where the last equality is due to the Markov property of the Markov chains X^G and X^Q. This is the (i, j) entry of the transition probability matrix P (3.5). □
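For illustration, the following minimal Python sketch (with hypothetical generators and mixing proportions, not taken from the paper) assembles Q (1.2) and G (2.1) and evaluates the mixture transition matrix (3.5).

    import numpy as np
    from scipy.linalg import expm

    # Hypothetical example: m = 2 transient states, one absorbing state.
    T = np.array([[-1.0, 0.4],
                  [0.3, -0.9]])
    delta = -T @ np.ones(2)                  # exit rates, eq. (1.3)
    Psi = np.diag([0.5, 2.0])                # speed matrix, eq. (2.2)

    def full_generator(T, delta):
        """Assemble the (m+1)x(m+1) generator of the absorbing chain, eq. (1.2)."""
        m = T.shape[0]
        Q = np.zeros((m + 1, m + 1))
        Q[:m, :m], Q[:m, m] = T, delta
        return Q

    Q = full_generator(T, delta)
    G = full_generator(Psi @ T, Psi @ delta)     # eq. (2.1)

    s = np.array([0.6, 0.3, 0.0])                # mixing proportions s_i (s_{m+1} = 0)
    S = np.diag(s)

    def P_mixture(t):
        """Transition matrix of the mixture process, eq. (3.5)."""
        return S @ expm(G * t) + (np.eye(3) - S) @ expm(Q * t)

    print(P_mixture(1.0))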

Depending on the availability of the past information I_{i,t} of X, the elements s_i(t) of the information matrix S_m(t) (3.4) are given, following (3.3), below.

Lemma 3.2 For a given t ≥ 0 and any states i, i_0 ∈ S, we have

(i) in the case of full information, i.e., I_{i,t} = {X(s), 0 ≤ s ≤ t},

    s_i(t) = s_{i_0} L_G / ( s_{i_0} L_G + (1 − s_{i_0}) L_Q ),    (3.6)

where s_{i_0}, L_G and L_Q are defined respectively in (2.4) and (3.1);

(ii) in the case of limited information with I_{i,t} = {X_0 = i_0} ∪ {X(t) = i},

    s_i(t) = ( e_{i_0}^⊤ S_{m+1} P^G(t) e_i ) / ( e_{i_0}^⊤ P(t) e_i );    (3.7)

(iii) in the case of limited information with I_{i,t} = {X(t) = i},

    s_i(t) = ( π^⊤ S_{m+1} P^G(t) e_i ) / ( π^⊤ P(t) e_i ),    (3.8)

where e_i = (0, ..., 1, ..., 0)^⊤ is the (m + 1) × 1 vector with 1 at the ith element.


Proof The proof of (3.6) is given on p.1074 in [19], whilst (3.8) and (3.7) are the matrix notation of the corresponding quantity s_i(t) given on p.1074 of [19]. □
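As a complement, here is a minimal sketch (under the assumption of a fully observed path, summarized by hypothetical holding times and transition counts) of how the likelihoods (3.1) and the posterior weight (3.6) might be computed.

    import numpy as np

    # Same hypothetical generator as in the earlier sketch (2 transient states, 1 absorbing).
    T = np.array([[-1.0, 0.4],
                  [0.3, -0.9]])
    delta = -T @ np.ones(2)
    Psi = np.diag([0.5, 2.0])

    # Hypothetical fully observed path summary: holding times T_k and
    # transition counts N_kj (state 2 is the absorbing state).
    hold = {0: 1.2, 1: 0.7}
    counts = {(0, 1): 2, (1, 2): 1}

    def path_likelihood(T, delta, hold, counts):
        """Likelihood (3.1) of the observed path under the generator (T, delta)."""
        m = T.shape[0]
        Q = np.zeros((m + 1, m + 1))
        Q[:m, :m], Q[:m, m] = T, delta
        like = 1.0
        for k, t_k in hold.items():
            like *= np.exp(Q[k, k] * t_k)        # exp(-q_k T_k), since Q[k, k] = -q_k
        for (k, j), n in counts.items():
            like *= Q[k, j] ** n                 # (q_kj)^{N_kj}
        return like

    s0 = 0.6                                      # initial mixing weight s_{i0}
    L_Q = path_likelihood(T, delta, hold, counts)
    L_G = path_likelihood(Psi @ T, Psi @ delta, hold, counts)
    s_t = s0 * L_G / (s0 * L_G + (1 - s0) * L_Q)  # posterior weight, eq. (3.6)
    print(s_t)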

Notice that the past information I_{i,t} of the mixture process X (2.3) is stored in the matrix S_{m+1}(t) (3.4). Below we give the limiting value of s_i(t) (3.8) as t → ∞.

Proposition 3.3 Let T (ΨT) have distinct eigenvalues ϕ^(1)_j (ϕ^(2)_j), j = 1, ..., m, with ϕ^(k)_{p_k} = max{ϕ^(k)_j, j = 1, ..., m} and p_k = argmax_j{ϕ^(k)_j}, k = 1, 2. Then,

    lim_{t→∞} s_i(t) =
        1,                                                                                      if ϕ^(2)_{p_2} > ϕ^(1)_{p_1}, for i ∈ E,
        0,                                                                                      if ϕ^(2)_{p_2} < ϕ^(1)_{p_1}, for i ∈ E,
        π^⊤ S_m L_{p_2}(ΨT) e_i / [ π^⊤ ( S_m L_{p_2}(ΨT) + (I_m − S_m) L_{p_1}(T) ) e_i ],     if ϕ^(2)_{p_2} = ϕ^(1)_{p_1}, for i ∈ E,
        π^⊤ S_m T^{−1} δ / ( π^⊤ T^{−1} δ ),                                                    for i = m + 1,    (3.9)

where L_{p_1}(T) is the Lagrange interpolation coefficient defined by

    L_{p_1}(T) = ∏_{j=1, j≠p_1}^{m} ( T − ϕ^(1)_j I_m ) / ( ϕ^(1)_{p_1} − ϕ^(1)_j ).    (3.10)

Proof Recall that, as the intensity matrices T and ΨT are negative definite, it is a known fact that their eigenvalues have negative real parts, and that among them there is a dominant eigenvalue (with the smallest absolute value of the real part) which is unique and real (see O'Cinneide [29], p.27). Then, following Theorem 2 in Apostol [4], we have for the m × m intensity matrix T (ΨT), with m distinct eigenvalues ϕ^(j)_1, ..., ϕ^(j)_m, for j = 1, 2, that exp(Tt) has the explicit form

    exp(Tt) = ∑_{k=1}^{m} exp(ϕ^(1)_k t) L_k(T),    (3.11)

and similarly for exp(ΨTt). Applying (3.11) to (3.8) we arrive at (3.9). □

The transition probability of X conditional on I_{i,t} constitutes the basic building block for constructing conditional phase-type distributions and forward intensities for the mixture model (2.3). Using Lemma 3.2, this conditional probability is given below as an adaptation, in matrix notation, of eqn. (11) on p.1074 of [19].

Theorem 3.4 For any two times s ≥ t ≥ 0 and i, j ∈ S,

    P{X_s = j | I_{i,t}} = e_i^⊤ ( S_{m+1}(t) e^{G(s−t)} + [ I_{m+1} − S_{m+1}(t) ] e^{Q(s−t)} ) e_j,    (3.12)

where e_i = (0, ..., 1, ..., 0)^⊤ is the (m + 1) × 1 vector with 1 at the ith element.


Proof Following similar arguments to the proof of (3.5), the result is established by applying the law of total probability along with Bayes' theorem, i.e.,

    P{X_s = j | I_{i,t}} = P{X_s = j, φ = 1 | I_{i,t}} + P{X_s = j, φ = 0 | I_{i,t}}
                         = P{X_s = j | φ = 1, I_{i,t}} P{φ = 1 | I_{i,t}} + P{X_s = j | φ = 0, I_{i,t}} P{φ = 0 | I_{i,t}}
                         = p^G_{ij}(s − t) s_i(t) + p^Q_{ij}(s − t) (1 − s_i(t)),    (3.13)

where the last equality is due to the Markov property of X^G and X^Q. As (i, j) ∈ S, the last expression can be represented in matrix notation by (3.12). □

It is clear to see following (3.12) that, unless Ψ = I, X does not inherit the Markov and stationarity properties of X^G and X^Q, i.e., the future occurrence of X is determined by its past information I_{i,t} through its likelihoods and the age t.

Putting the conditional probability (3.12) into a matrix P(t, s) gives the (t, s)-transition probability matrix of the mixture process X defined by

    P(t, s) = S_{m+1}(t) e^{G(s−t)} + [ I_{m+1} − S_{m+1}(t) ] e^{Q(s−t)}.    (3.14)

Lemma 3.5 The transition probability matrix (3.14) has the composition

    P(t, s) = ( F_11(t, s)   F_12(t, s) )
              (     0             1     ),    (3.15)

where the matrix entries F_11(t, s) and F_12(t, s) are defined by

    F_11(t, s) = S_m(t) e^{ΨT(s−t)} + (I_m − S_m(t)) e^{T(s−t)},
    F_12(t, s) = S_m(t) (ΨT)^{−1} ( e^{ΨT(s−t)} − I_m ) Ψδ + (I_m − S_m(t)) T^{−1} ( e^{T(s−t)} − I_m ) δ.

Proof Using the property of the matrix exponential (1.6), it can be shown that

    e^{Qt} = ( e^{Tt}   T^{−1}(e^{Tt} − I_m) δ )
             (   0               1            )

and

    e^{Gt} = ( e^{ΨTt}   (ΨT)^{−1}(e^{ΨTt} − I_m) Ψδ )
             (    0                  1               ).

By partitioning the identity matrix I_{m+1} and S_{m+1}(t) (3.4) into the block diagonal matrices

    I_{m+1} = ( I_m   0 )        and        S_{m+1}(t) = ( S_m(t)      0       )
              (  0    1 )                                (   0     s_{m+1}(t) ),

the transition matrix P(t, s) has the composition given by (3.15). □

The result gives a generalization of the transition matrix P(t) (3.5).

Corollary 3.6 Let Ψ = I. Then, the transition probability of X is given by

    P(t, s) = e^{Q(s−t)},

implying that the mixture process is a simple homogeneous Markov chain.


4 Generalized phase-type distributions

This section presents the main results of this paper. First, we derive the conditional phase-type distribution F̄_i(t, s) = P{τ > s | I_{i,t}} and its conditional density f_i(t, s) := −∂F̄_i(t, s)/∂s of the Markov mixtures process (2.3) given the past information I_{i,t}. Secondly, we derive the unconditional distribution F̄(t) = P{τ > t} and its density f(t) := −dF̄(t)/dt of the phase type and study the dense and closure properties of the distributions using the Laplace transform. Afterwards, we discuss the forward intensity and express the distributions in terms of the intensity.

4.1 Conditional distributions

Theorem 4.1 For a given t ≥ 0 and information set I_{i,t}, the generalized conditional phase-type distribution and its density are given for any s ≥ t by

    F̄_i(t, s) = e_i^⊤ ( S_m(t) e^{ΨT(s−t)} + ( I_m − S_m(t) ) e^{T(s−t)} ) 1_m,
    f_i(t, s) = e_i^⊤ ( S_m(t) e^{ΨT(s−t)} Ψ + ( I_m − S_m(t) ) e^{T(s−t)} ) δ,    (4.1)

for any state i ∈ E that X occupies at time t, and both are zero for i = m + 1.

Under full information, we observe that the distribution has path dependence on its past information I_{i,t} through the appearance of the matrix S_m(t) (3.4). However, when the information is limited to only the current (and initial) state, the distribution forms a non-stationary function of time t. In any case, the proposed distribution has the ability to capture heterogeneity of the object of interest, represented by the speed matrix Ψ (2.1). These two important properties are removed when we set Ψ equal to the m × m identity matrix I_m. As a result, we have:

Corollary 4.2 Let Ψ = I_m. Then the conditional distributions (4.1) reduce to

    F̄_i(t, s) = e_i^⊤ e^{T(s−t)} 1_m   and   f_i(t, s) = e_i^⊤ e^{T(s−t)} δ.    (4.2)

Remark 4.3 Observe that the conditional distribution (4.2) can be obtained straightforwardly from (1.7) by setting π = e_i and replacing the time t in (1.7) by s − t. However, this is in general not possible for obtaining (4.1) from (4.3).

Furthermore, the new distribution has two additional parameters, represented by S_m and Ψ, responsible for the inclusion of past information and for modeling heterogeneity. It is fully characterized by the parameters (π, T, Ψ, S_m), which in turn generalize the existing distribution (1.7) specified by (π, T).

Proof As τ (1.1) is the time until absorption of X, we have by (3.12)-(3.14)

    F̄_i(t, s) = ∑_{j=1}^{m} P{X_s = j | I_{i,t}} = ∑_{j=1}^{m} e_i^⊤ P(t, s) e_j = e_i^⊤ P(t, s) ( 1_{m+1} − e_{m+1} ).


On account of the fact that 1_{m+1} − e_{m+1} = (1_m, 0)^⊤, the claim for the distribution F̄_i(t, s) follows from (3.15). The density f_i(t, s) is obtained by taking the partial derivative of F̄_i(t, s) w.r.t. s and then applying the condition (1.3). □

Observe, following the above proof, that the (m + 1)th element s_{m+1}(t) of S_{m+1}(t) (3.4) does not play any role in getting the final result (4.1). This element corresponds to the portion of the population arriving in the absorbing state at speed Ψδ. So we can set s_{m+1} = 0, which we will use in the numerical example.
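For concreteness, a minimal numerical sketch of Theorem 4.1 (reusing the hypothetical two-state generator of the earlier sketches and treating the current weights s_i(t) as given) is as follows.

    import numpy as np
    from scipy.linalg import expm

    # Hypothetical two-transient-state example.
    T = np.array([[-1.0, 0.4],
                  [0.3, -0.9]])
    delta = -T @ np.ones(2)
    Psi = np.diag([0.5, 2.0])
    S_t = np.diag([0.7, 0.2])    # S_m(t): posterior weights s_i(t), e.g. from (3.6)-(3.8)

    def cond_survival(i, t, s):
        """Conditional survival of Theorem 4.1: P{tau > s | I_{i,t}}."""
        e_i = np.eye(2)[i]
        M = S_t @ expm(Psi @ T * (s - t)) + (np.eye(2) - S_t) @ expm(T * (s - t))
        return e_i @ M @ np.ones(2)

    def cond_density(i, t, s):
        """Conditional density f_i(t, s) of Theorem 4.1."""
        e_i = np.eye(2)[i]
        M = S_t @ expm(Psi @ T * (s - t)) @ Psi + (np.eye(2) - S_t) @ expm(T * (s - t))
        return e_i @ M @ delta

    print(cond_survival(0, t=1.0, s=2.5), cond_density(0, t=1.0, s=2.5))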

4.2 Unconditional distributions

Using the main result (4.1) for t = 0 and summing up over all i ∈ E with the given weights P{X_0 = i}, we obtain the unconditional phase-type distribution for the mixture process. This unconditional distribution will be used to study the dense and closure properties of the distribution through its Laplace transform.

Theorem 4.4 The unconditional distribution is given for any t ≥ 0 by

    F̄(t) = π^⊤ ( S_m e^{ΨTt} + ( I_m − S_m ) e^{Tt} ) 1_m,
    f(t) = π^⊤ ( S_m e^{ΨTt} Ψ + ( I_m − S_m ) e^{Tt} ) δ.    (4.3)

For further reference throughout the remainder of this paper, we refer to GPH(π, T, Ψ, S_m) as the generalized phase-type distribution (4.3) with parameters (π, T, Ψ, S_m), and to PH(π, T) for the existing distribution (1.7). Setting S_m = αI_m, with 0 ≤ α ≤ 1, in (4.3) leads to the convex mixture of PH(π, ΨT) and PH(π, T), i.e., GPH(π, T, Ψ, αI) = αPH(π, ΨT) + (1 − α)PH(π, T). Thus, (4.3) can be seen as a generalized mixture of the phase-type distributions (1.7).

Corollary 4.5 By setting Ψ = I_m in (4.3), the distributions reduce to (1.7).

Proof The proof is based on similar arguments to those employed to establish (4.1), conditioning X on its initial position X_0. Using the result in (4.1),

    F̄(t) = ∑_{i=1}^{m+1} ∑_{j=1}^{m} P{X_0 = i} P{X_t = j | X_0 = i}
          = (π, π_{m+1})^⊤ P(0, t) ( 1_{m+1} − e_{m+1} ),

where we have used (π, π_{m+1})^⊤ = ∑_{i=1}^{m+1} P{X_0 = i} e_i^⊤, whilst π_{m+1} is the mass of the initial distribution on the absorbing state. Our claim for the survival function F̄(t) is complete on account of (3.15). On recalling (1.3), we have f(t) = −F̄′(t). □

The theorem below gives the Laplace transform of (4.3) and its nth moment.


Theorem 4.6 Let F be the phase-type distribution GPH(π, T, Ψ, S_m). Then,

(i) the Laplace transform F̂[θ] = ∫_0^∞ e^{−θu} f(u) du is given by

    F̂[θ] = π^⊤ ( S_m ( θI_m − ΨT )^{−1} Ψ + ( I_m − S_m ) ( θI_m − T )^{−1} ) δ;    (4.4)

(ii) the nth moment, for n = 0, 1, ..., of τ (1.1) is given by

    E{τ^n} = (−1)^n n! π^⊤ ( S_m (ΨT)^{−n} + ( I_m − S_m ) T^{−n} ) 1_m.    (4.5)

Remark 4.7 As the conditional distribution (4.1) is a function of two time variables t and s, with s ≥ t, we can reparameterize it in terms of the initial time t and the remaining time ν := s − t ≥ 0 such that, e.g., the density can be rewritten as

    f_i(t, ν) = e_i^⊤ ( S_m(t) e^{ΨTν} Ψ + ( I_m − S_m(t) ) e^{Tν} ) δ,

whose Laplace transform F̂_t^{(i)}[θ] = ∫_0^∞ e^{−θν} f_i(t, ν) dν is given below for any fixed t ≥ 0 and i ∈ E. The result is obtained in the same way as (4.4):

    F̂_t^{(i)}[θ] = e_i^⊤ ( S_m(t) ( θI_m − ΨT )^{−1} Ψ + ( I_m − S_m(t) ) ( θI_m − T )^{−1} ) δ.    (4.6)

So there is a similarity between the Laplace transforms (4.4) and (4.6): for (4.6) the initial distribution π has mass one on state i and S_m depends on t.

Proof As the sub-intensity matrix T and Ψ are nonsingular, it follows from Lemma 8.2.5 in [31] that (θI_m − T) and (θI_m − ΨT) are nonsingular for each θ ≥ 0, and all their entries are rational functions of θ ≥ 0. Furthermore, for all θ ≥ 0,

    ∫_0^∞ e^{−θu} e^{Bu} du = ( θI_m − B )^{−1},

for any nonsingular matrix B. The Laplace transform of (4.3) is therefore

    F̂[θ] = ∫_0^∞ e^{−θu} ( π^⊤ ( S_m e^{ΨTu} Ψ + ( I_m − S_m ) e^{Tu} ) δ ) du
          = π^⊤ S_m ( ∫_0^∞ e^{−θu} e^{ΨTu} du ) Ψδ + π^⊤ ( I_m − S_m ) ( ∫_0^∞ e^{−θu} e^{Tu} du ) δ
          = π^⊤ S_m ( θI_m − ΨT )^{−1} Ψδ + π^⊤ ( I_m − S_m ) ( θI_m − T )^{−1} δ.

Following the same Lemma 8.2.5 of [31], we have for all θ ≥ 0 and n ∈ N,

    (d^n/dθ^n) ( θI_m − B )^{−1} = (−1)^n n! ( θI_m − B )^{−(n+1)},


for any nonsingular matrix B. On account of this, the nth derivative of F̂[θ] is

    (d^n/dθ^n) F̂[θ] = π^⊤ ( S_m (d^n/dθ^n)( θI_m − ΨT )^{−1} Ψ + ( I_m − S_m )(d^n/dθ^n)( θI_m − T )^{−1} ) δ
                     = (−1)^n n! π^⊤ ( S_m ( θI_m − ΨT )^{−(n+1)} Ψ + ( I_m − S_m )( θI_m − T )^{−(n+1)} ) δ.

The claim is established by setting θ = 0 and taking into account (1.3). □

When Ψ = I, (4.4)-(4.5) reduce to the ones given by Theorem 8.2.5 in [31]:

    F̂[θ] = π^⊤ ( θI_m − T )^{−1} δ,
    E{τ^n} = (−1)^n n! π^⊤ T^{−n} 1_m.    (4.7)
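As a small illustration of (4.5), the following sketch (with hypothetical parameter values) computes the first two moments of τ.

    import numpy as np
    from math import factorial

    # Hypothetical parameters (same toy example as in the earlier sketches).
    T = np.array([[-1.0, 0.4],
                  [0.3, -0.9]])
    Psi = np.diag([0.5, 2.0])
    S = np.diag([0.6, 0.3])
    pi = np.array([0.7, 0.3])
    one = np.ones(2)

    def moment(n):
        """n-th moment of tau, eq. (4.5):
        E[tau^n] = (-1)^n n! pi' (S (Psi T)^{-n} + (I - S) T^{-n}) 1."""
        PsiT_inv_n = np.linalg.matrix_power(np.linalg.inv(Psi @ T), n)
        T_inv_n = np.linalg.matrix_power(np.linalg.inv(T), n)
        return (-1) ** n * factorial(n) * pi @ (S @ PsiT_inv_n + (np.eye(2) - S) @ T_inv_n) @ one

    print(moment(1), moment(2))    # mean and second moment of the absorption time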

4.3 Relationship with the existing PH distribution

In this section we express the relationship between the unconditional distribution (4.3) GPH(π, T, Ψ, S_m) and the existing distribution PH(π, T) (1.7). We give two main examples of (4.3) which enclose the Erlang distribution and its mixtures.

Lemma 4.8 The distribution GPH(π, T, Ψ, S_m) has a PH(π̂, T̂) representation, with

    π̂^⊤ = ( (S_m π)^⊤, ((I_m − S_m) π)^⊤ )        and        T̂ = ( ΨT   0 )
                                                                  (  0   T ).    (4.8)

Remark 4.9 Based on the Laplace transform (4.6), the same conclusion is reached as in Lemma 4.8 for the conditional distribution (4.1), in which case

    π̂^⊤ = ( (S_m(t) e_i)^⊤, ((I_m − S_m(t)) e_i)^⊤ )        and        T̂ = ( ΨT   0 )
                                                                             (  0   T ).    (4.9)

Proof Following the Laplace transform (4.4), define π̂_1 = S_m π, δ̂_1 = Ψδ, π̂_2 = (I_m − S_m) π and δ̂_2 = δ. The pairs (π̂_1, δ̂_1) and (π̂_2, δ̂_2) define, respectively, the initial proportion of the population and their exit rate to the absorbing state when moving at speed ΨT and T. Having defined these, we can rearrange (4.4) as:

    F̂[θ] = π̂_1^⊤ ( θI_m − ΨT )^{−1} δ̂_1 + π̂_2^⊤ ( θI_m − T )^{−1} δ̂_2
          = ( π̂_1^⊤  π̂_2^⊤ ) ( (θI_m − ΨT)^{−1}         0         ) ( δ̂_1 )
                              (        0          (θI_m − T)^{−1} ) ( δ̂_2 ).

Notice that we have taken advantage of S_m and (I_m − S_m) being symmetric matrices. On recalling that (θI_m − ΨT) and (θI_m − T) are square nonsingular matrices, the above block diagonal matrix can be further simplified as follows:

    ( (θI_m − ΨT)^{−1}         0         )   =   ( (θI_m − ΨT)        0        )^{−1}
    (        0          (θI_m − T)^{−1} )       (       0        (θI_m − T) )        .


To arrive at the representation of (4.4) in the framework of the Laplace transform (4.7) of the existing phase-type distribution, define the following matrices

    I_{2m} = ( I_m    0  )        and        T̂ = ( ΨT   0 )
             (  0    I_m )                       (  0   T ).

As a result of having defined the two matrices I_{2m} and T̂, the Laplace transform (4.4) of the generalized phase-type distribution (4.3) is finally given by

    F̂[θ] = π̂^⊤ ( θI_{2m} − T̂ )^{−1} δ̂,    (4.10)

which falls in the form of (4.7) with π̂^⊤ = ( π̂_1^⊤, π̂_2^⊤ ) and δ̂ = ( δ̂_1, δ̂_2 )^⊤. □

Remark 4.10 The relationship between the distribution GPH(π, T, Ψ, S_m) and PH(π̂, T̂) is that the adjusted intensity matrix T̂ for the distribution PH(π̂, T̂) is a block diagonal matrix of size 2m × 2m. That is to say that the transient state space Ê for the underlying Markov chains that form the phase-type distribution PH(π̂, T̂) is given by Ê = E_1 ∪ E_2, E_1 = {1, ..., m} and E_2 = {m + 1, ..., 2m}, with initial distribution π̂. Denote by π̂_1 and π̂_2 the distributions of the Markov chains on the transient states E_1 and E_2, i.e., π̂^⊤ = ( π̂_1^⊤, π̂_2^⊤ ). The corresponding distribution π of the Markov mixture process is determined following (4.8) by π = π̂_1 + π̂_2. As S_m is a diagonal matrix, each element s_i, i = 1, 2, ..., m, of S_m is given by s_i = π̂_1^{(i)} / ( π̂_1^{(i)} + π̂_2^{(i)} ), where π̂_1^{(i)} and π̂_2^{(i)} are the ith elements of π̂_1 and π̂_2, provided that π^{(i)} = π̂_1^{(i)} + π̂_2^{(i)} ≠ 0. Otherwise, s_i can be arbitrarily chosen.

Recall following (4.8) that, since T̂ = diag(ΨT, T) and ΨT and T are square matrices, we have e^{T̂t} = diag( e^{ΨTt}, e^{Tt} ). Therefore, following (1.7), we have F̄(t) = π̂^⊤ e^{T̂t} 1 and f(t) = π̂^⊤ e^{T̂t} δ̂, which are just the same results as in (4.3).

Below we give two examples of the unconditional phase-type distribution (4.3) which enclose known distributions such as the Erlang distribution and its mixture.

Example 4.11 Let GPH(π, T, Ψ, S_m) have initial distribution π = (1, 0, 0)^⊤,

    T = ( −β_2   β_2     0  )        Ψ = ( β_1/β_2      0         0     )        S_m = ( α  0  0 )
        (   0   −β_2   β_2  ),           (    0      β_1/β_2      0     ),              ( 0  α  0 )
        (   0     0    −β_2 )            (    0         0      β_1/β_2  )               ( 0  0  α ).

In view of Lemma 4.8, GPH(π, T, Ψ, S_m) can be represented by PH(π̂, T̂) with

    π̂ = ( α, 0, 0, 1 − α, 0, 0 )^⊤,

    T̂ = ( −β_1   β_1     0      0      0      0  )
         (   0   −β_1   β_1     0      0      0  )
         (   0     0    −β_1    0      0      0  )
         (   0     0     0    −β_2   β_2     0  )
         (   0     0     0      0    −β_2   β_2 )
         (   0     0     0      0      0    −β_2 ),


which is the familiar form of a mixture of Erlang distributions, whose density function is given, following (4.3) and the definition of the matrix exponential, by

    f(t) = α f(t; m, β_1) + (1 − α) f(t; m, β_2),

where f(t; m, β_i), for i = 1, 2 and m = 3, is the Erlang density defined by

    f(t; m, β_i) = (1/(m − 1)!) t^{m−1} β_i^m e^{−tβ_i}.    (4.11)

Example 4.12 (Erlang distribution) If we set α = 1 or β_2 = β_1, i.e., Ψ = I, in Example 4.11, it results in the Erlang distribution Erl(m, β_1) with m = 3:

    π = ( 1 )        T = ( −β_1   β_1     0  )
        ( 0 ),           (   0   −β_1   β_1  )
        ( 0 )            (   0     0    −β_1 ),

whose density f(t; m, β_1) and distribution F(t; m, β_1) are given by (4.11) and

    F(t; m, β_1) = 1 − ∑_{k=0}^{m−1} ( (β_1 t)^k / k! ) e^{−β_1 t}.    (4.12)
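A quick numerical check of Example 4.11 (a sketch with arbitrary illustrative values of α, β_1 and β_2): the GPH density (4.3) built from (π, T, Ψ, S_m) should coincide with the two-component Erlang mixture.

    import numpy as np
    from math import factorial
    from scipy.linalg import expm

    alpha, b1, b2, m = 0.4, 2.0, 0.5, 3      # illustrative values

    pi = np.array([1.0, 0.0, 0.0])
    T = np.array([[-b2, b2, 0.0], [0.0, -b2, b2], [0.0, 0.0, -b2]])
    Psi = (b1 / b2) * np.eye(m)
    S = alpha * np.eye(m)
    delta = -T @ np.ones(m)

    def gph_density(t):
        """Density (4.3) of GPH(pi, T, Psi, S_m)."""
        M = S @ expm(Psi @ T * t) @ Psi + (np.eye(m) - S) @ expm(T * t)
        return pi @ M @ delta

    def erlang_density(t, b):
        """Erlang(m, b) density, eq. (4.11)."""
        return t ** (m - 1) * b ** m * np.exp(-t * b) / factorial(m - 1)

    t = 1.3
    print(gph_density(t),
          alpha * erlang_density(t, b1) + (1 - alpha) * erlang_density(t, b2))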

4.4 Closure and dense properties

In this section we prove that the generalized phase-type distribution (4.1) (and (4.3)) is closed under convex mixtures and convolutions, i.e., sums and convex combinations of random variables with phase-type distributions again have phase-type distributions. These two properties emphasize the additional importance of the phase-type distribution as a probabilistic tool in applications. For the existing phase-type distribution, the closure properties were established in [8]. We refer also to [31] and [11]. The proof is established based on Theorem 4.6 and by taking advantage of the already known closure properties of the existing distribution. Based on the closure properties, we show by adapting similar arguments to [31] that the generalized distribution forms a dense class of distributions.

To establish the closure properties, the following result is required.

Lemma 4.13 Let p > 0 and Z ∼ GPH(π,T,Ψ,Sm). Then,

pZ ∼ GPH(π,T/p,Ψ,Sm). (4.13)

Proof Let Z̃ = pZ. Following Lemma 4.8 and the Laplace transform (4.10), we have

    F̂_{Z̃}[θ] = E{ e^{−θZ̃} } = π̂^⊤ ( θI_{2m} − T̂/p )^{−1} δ̂/p.

The proof follows on account of T̂/p 1_{2m} + δ̂/p = 0, which establishes (4.13). □

Taking into account Theorem 4.6 on the Laplace transform, along with Lemma 4.8 and Remark 4.10, we show in the theorem below that the generalized phase-type distribution is closed under convex mixtures and finite convolutions.


Theorem 4.14 The generalized phase-type distribution GPH(π, T, Ψ, S_m) possesses the closure property under finite convex mixtures and finite convolutions.

Proof To start with, let Z_k ∼ GPH(π^(k), T^(k), Ψ^(k), S_m^(k)), for k = 1, 2. We assume that the transient state space for Z_1 is given by E_1 = {1, ..., m} and that for Z_2 by E_2 = {m + 1, ..., 2m}. By the representation PH(π̂^(k), T̂^(k)), we have

    F̂^(k)[θ] := E{ e^{−θZ_k} } = ( π̂^(k) )^⊤ ( θI_{2m} − T̂^(k) )^{−1} δ̂^(k),

where the vectors π̂^(k) and δ̂^(k) and the matrix T̂^(k) are given by (4.8):

    π̂^(k) = ( S_m^(k) π^(k), ( I_m − S_m^(k) ) π^(k) )^⊤        and        T̂^(k) = ( Ψ^(k)T^(k)     0    )
                                                                                     (      0       T^(k) ).

By Theorem 8.2.7 in [31], it is known that Z = pZ_1 + (1 − p)Z_2 is again PH(π̂, T̂), with

    π̂ = { pπ̂^(1),         on E_1             and        T̂ = ( T̂^(1)    0    )
        { (1 − p)π̂^(2),   on E_2,                            (   0    T̂^(2) ),    (4.14)

for 0 < p < 1. We deduce from (4.14) that the new transition matrix T̂ preserves the block diagonal nature of the intensity matrix for the underlying Markov mixture process X (2.3) moving according to T̂^(1) and T̂^(2) on the transient states E_1 and E_2, with initial distributions pπ̂^(1) and (1 − p)π̂^(2) on each of them. Taking p = 1/2, we conclude that Z = (1/2)(Z_1 + Z_2) is again PH(π̂, T̂). Again, by the use of the Laplace transform, it follows from (4.13) that 2Z ∼ PH(π̂, T̂/2), moving at half speed in the transient states and to the absorbing states. As a result, Z_1 + Z_2 ∼ PH(π̂, T̂/2), proving the closure property of the distribution under finite convolutions.

Adapting similar arguments of [31], the above implies that for n ≥ 2 and any probability vector p = (p_1, ..., p_n)^⊤, with 0 < p_k < 1 and ∑_{k=1}^{n} p_k = 1, the mixture ∑_{k=1}^{n} p_k Z_k belongs to PH(π̂, T̂) on the transient state space Ê = ⋃_{k=1}^{n} E_k, with

    π̂_i = p_k π̂^(k)  if i ∈ E_k,        and        T̂ = ( T̂^(1)     0     ...     0    )
                                                        (   0     T̂^(2)   ...     0    )
                                                        (   ⋮        ⋮      ⋱      ⋮    )
                                                        (   0        0     ...   T̂^(n) ),

with |E_k| = m. The conclusion for finite convolutions is reached when we set p_k = 1/n for all k = 1, ..., n, resulting in Z = (1/n) ∑_{k=1}^{n} Z_k ∼ PH(π̂, T̂). Again, by (4.13), nZ ∼ PH(π̂, T̂/n), from which we deduce ∑_{k=1}^{n} Z_k ∼ PH(π̂, T̂/n). □

Theorem 4.15 (Dense property) The family of generalized phase-type distributions (4.1) (as well as (4.3)) forms a dense class of distributions on R_+.


Proof The proof follows from adapting the approach employed in [32] and [31]. Let F be any right-continuous distribution on R_+ with F(0) = 0. For fixed t ≥ 0, assume that F is continuous at t. Consider the following approximation of F:

    F_n(t) = ∑_{j=1}^{∞} p_j(n) F(t; j, n),    (4.15)

where F(t; j, n) is the Erlang distribution (4.12) with β_1 = n and p_j(n) is

    p_j(n) = F(j/n) − F((j − 1)/n).

It is clear that ∑_{j=1}^{∞} p_j(n) = 1. Thus, (4.15) can be seen as an infinite convex mixture of Erlang distributions. On account of (4.12), we have following (4.15)

    F_n(t) = ∑_{j=1}^{∞} ( F(j/n) − F((j − 1)/n) ) ∑_{k=j}^{∞} e^{−nt} (nt)^k / k!
           = ∑_{k=0}^{∞} e^{−nt} (nt)^k / k! ∑_{j=1}^{k} ( F(j/n) − F((j − 1)/n) )
           = ∑_{k=0}^{∞} e^{−nt} (nt)^k / k! F(k/n)
           = E{ F( X_n(t)/n ) },

where X_n(t) is a Poisson random variable with intensity nt. Since F is a bounded function, i.e., sup_t F(t) ≤ 1, and F is continuous at t, for each ε > 0 we choose δ > 0 such that |F(x) − F(t)| ≤ ε/2 whenever |x − t| ≤ δ. By doing so,

    E{ |F(X_n(t)/n) − F(t)| } = E{ |F(X_n(t)/n) − F(t)|; |X_n(t)/n − t| ≤ δ } + E{ |F(X_n(t)/n) − F(t)|; |X_n(t)/n − t| > δ }
                              ≤ ε/2 + 2 P{ |X_n(t)/n − t| > δ }
                              ≤ ε/2 + 2t/(δ^2 n),

where we have applied Chebyshev's inequality in the last inequality and the fact that E{X_n(t)/n} = t. Thus, by choosing n big enough so that 2t/(δ^2 n) ≤ ε/2,

    |F_n(t) − F(t)| ≤ ε,

i.e., lim_{n→∞} F_n(t) = F(t). Next, consider the truncated version of (4.15): F̃_n(t) := ∑_{j=1}^{n^2} p_j(n) F(t; j, n). By the above and the monotone convergence theorem, we have lim_{n→∞} |F̃_n(t) − F_n(t)| = 0 for each t ≥ 0. The proof is complete by recalling from Theorem 4.14 and Example 4.11 that a mixture of Erlang distributions belongs to the class of generalized phase-type distributions. The case F(0) ≠ 0 is handled by adding F(0)δ_0 in (4.15), where δ_0 is the Dirac measure at t = 0. □


4.5 Forward intensity

In survival and event history analysis (see e.g. [1]), it has been common practice to associate an intensity function λ(t) with the stopping time τ (1.1). This intensity determines the rate of occurrence of an event within the period of time (t, t + dt] given the information set I_{i,t} available at time t. Quite recently, Duan et al. [16] introduced the notion of forward intensity which, to some extent, plays a similar role to the forward rate of interest in finance. However, as stated in [16], the forward intensity can only be calculated with exact knowledge of the underlying Markov process.

Motivated by this work, we introduce for any s ≥ t the forward intensity:

    λ_i(t, s) ds = P{τ ≤ s + ds | τ > s, I_{i,t}},    (4.16)

which corresponds to the usual intensity λ_i(t) (see [1]) when we set s = t. Following (4.16), the forward intensity λ_i(t, s) determines the rate of occurrence of a future event in (s, s + ds] based on the information I_{i,t} available at time t.

In the theorem below we give an explicit expression for the forward intensity.

Theorem 4.16 For a given s ≥ t ≥ 0 and information I_{i,t}, with any i ∈ E,

    λ_i(t, s) = [ e_i^⊤ ( S_m(t) e^{ΨT(s−t)} Ψ + ( I_m − S_m(t) ) e^{T(s−t)} ) δ ] / [ e_i^⊤ ( S_m(t) e^{ΨT(s−t)} + ( I_m − S_m(t) ) e^{T(s−t)} ) 1_m ],    (4.17)

based on which the phase-type distribution F̄_i(t, s) and density f_i(t, s) (4.1) are given by

    F̄_i(t, s) = exp( −∫_t^s λ_i(t, u) du ),
    f_i(t, s) = λ_i(t, s) exp( −∫_t^s λ_i(t, u) du ).    (4.18)

Proof As τ is absolutely continuous (by (4.1)), we have following (4.16) that

    λ_i(t, s) ds = P{τ ≤ s + ds | τ > s, I_{i,t}}
                 = P{s < τ ≤ s + ds | I_{i,t}} / P{τ > s | I_{i,t}}
                 = −[ P{τ > s + ds | I_{i,t}} − P{τ > s | I_{i,t}} ] / P{τ > s | I_{i,t}}
                 = −[ (∂/∂s) P{τ > s | I_{i,t}} / P{τ > s | I_{i,t}} ] ds
                 = −[ (∂/∂s) F̄_i(t, s) / F̄_i(t, s) ] ds.    (4.19)

Solving the equation in the last equality with the condition F̄_i(t, t) = 1 gives (4.18). □
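The identity (4.18) can be checked numerically. The following sketch (hypothetical parameters, S_m(t) taken as given) evaluates the forward intensity (4.17) and recovers the conditional survival function by quadrature.

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import quad

    # Hypothetical example reused from the earlier sketches.
    T = np.array([[-1.0, 0.4],
                  [0.3, -0.9]])
    delta = -T @ np.ones(2)
    Psi = np.diag([0.5, 2.0])
    S_t = np.diag([0.7, 0.2])     # S_m(t), taken as given
    i, t = 0, 1.0
    e_i = np.eye(2)[i]

    def mix(s, weighted=False):
        """S_m(t) e^{Psi T (s-t)} (Psi) + (I - S_m(t)) e^{T (s-t)}, cf. (4.1)/(4.17)."""
        A = S_t @ expm(Psi @ T * (s - t))
        B = (np.eye(2) - S_t) @ expm(T * (s - t))
        return (A @ Psi + B) if weighted else (A + B)

    def forward_intensity(s):
        """lambda_i(t, s), eq. (4.17)."""
        return (e_i @ mix(s, weighted=True) @ delta) / (e_i @ mix(s) @ np.ones(2))

    s = 2.5
    surv_direct = e_i @ mix(s) @ np.ones(2)                        # F_i(t, s), eq. (4.1)
    surv_from_lambda = np.exp(-quad(forward_intensity, t, s)[0])   # eq. (4.18)
    print(surv_direct, surv_from_lambda)    # should agree up to quadrature error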


However, when we set s = t in (4.17), we obtain the intensity rate λ_i(t) of occurrence of an event within the infinitesimal interval (t, t + dt]. This intensity is influenced by the available past information I_{i,t} of X and is defined by

λi(t)dt = P{τ ≤ t+ dt|τ > t, Ii,t}.

To distinguish it from λ(t, s) (4.17), we refer to λ(t) as instantaneous intensity.

Corollary 4.17 Given i ∈ E and past information I_{i,t}, we have for any t ≥ 0,

    λ_i(t) = e_i^⊤ ( I_m + S_m(t)(Ψ − I_m) ) δ.    (4.20)

As the intensity plays a central role in survival analysis, it is interesting to see that the inclusion of past information of X in the intensity λ, particularly when the information I_{i,t} is fully available, is now made possible under the mixture model.

Observe that, unlike (1.8), the two intensity functions (4.17) and (4.20) both depend on the past realization I_{i,t} of X. Below we give the baseline intensity function (1.8) for our generalized phase-type distribution (4.3).

Corollary 4.18 For any given t > 0, the baseline intensity function of τ is given by

    α(t) = [ π^⊤ ( S_m e^{ΨTt} Ψ + ( I_m − S_m ) e^{Tt} ) δ ] / [ π^⊤ ( S_m e^{ΨTt} + ( I_m − S_m ) e^{Tt} ) 1_m ],    (4.21)

based on which the distribution functions F̄(t) and f(t) (4.3) are given by

    F̄(t) = exp( −∫_0^t α(u) du ),
    f(t) = α(t) exp( −∫_0^t α(u) du ).    (4.22)

Proof Following similar arguments to the proof of Theorem 4.16, it can be shown that α(t) = f(t)/F̄(t), where f(t) and F̄(t) are given by (4.3). □

Below we give the long-run estimates of the intensities (4.17), (4.20) and (4.21).

Proposition 4.19 Let T (ΨT) have distinct eigenvalues ϕ^(1)_j (ϕ^(2)_j), j = 1, ..., m, with ϕ^(k)_{p_k} = max{ϕ^(k)_1, ..., ϕ^(k)_m} and p_k = argmax_j{ϕ^(k)_j}, k = 1, 2. Then,

    lim_{s→∞} λ_i(t, s) =
        e_i^⊤ S_m(t) L_{p_2}(ΨT) Ψδ / [ e_i^⊤ S_m(t) L_{p_2}(ΨT) 1_m ],                                                             if ϕ^(2)_{p_2} > ϕ^(1)_{p_1},
        e_i^⊤ ( I_m − S_m(t) ) L_{p_1}(T) δ / [ e_i^⊤ ( I_m − S_m(t) ) L_{p_1}(T) 1_m ],                                            if ϕ^(2)_{p_2} < ϕ^(1)_{p_1},
        e_i^⊤ ( S_m(t) L_{p_2}(ΨT) Ψ + ( I_m − S_m(t) ) L_{p_1}(T) ) δ / [ e_i^⊤ ( S_m(t) L_{p_2}(ΨT) + ( I_m − S_m(t) ) L_{p_1}(T) ) 1_m ],   if ϕ^(2)_{p_2} = ϕ^(1)_{p_1},
                                                                                                                                     (4.23)

for all t ≥ 0 and i ∈ E, where L_{p_1}(T) is the Lagrange interpolation coefficient (3.10)-(3.11).


Proof Since the generator matrices T and ΨT are both negative definite, the proof follows from applying similar arguments to the proof of Proposition 3.3. □

Corollary 4.20 Using (3.9), the long-run rate of λ_i(t) (4.20) is given by

    lim_{t→∞} λ_i(t) = e_i^⊤ ( I_m + S_m(∞)(Ψ − I_m) ) δ,

where S_m(∞) is the diagonal matrix whose ith entry is given by (3.9). The long-run rate of α(t) (4.21) is given by setting t = 0 in (4.23) and replacing e_i by π:

    lim_{t→∞} α(t) =
        π^⊤ S_m L_{p_2}(ΨT) Ψδ / [ π^⊤ S_m L_{p_2}(ΨT) 1_m ],                                                       if ϕ^(2)_{p_2} > ϕ^(1)_{p_1},
        π^⊤ ( I_m − S_m ) L_{p_1}(T) δ / [ π^⊤ ( I_m − S_m ) L_{p_1}(T) 1_m ],                                       if ϕ^(2)_{p_2} < ϕ^(1)_{p_1},
        π^⊤ ( S_m L_{p_2}(ΨT) Ψ + ( I_m − S_m ) L_{p_1}(T) ) δ / [ π^⊤ ( S_m L_{p_2}(ΨT) + ( I_m − S_m ) L_{p_1}(T) ) 1_m ],   if ϕ^(2)_{p_2} = ϕ^(1)_{p_1}.
                                                                                                                      (4.24)

4.6 Residual lifetime

We derive the residual lifetime R_i(t) = E{τ − t | τ > t, I_{i,t}} of the mixture process (2.3) given its past information I_{i,t}, for i ∈ S. The result is summarized below.

Proposition 4.21 For a given t ≥ 0 and past information I_{i,t}, we have

    R_i(t) = −e_i^⊤ ( S_m(t)(ΨT)^{−1} + ( I_m − S_m(t) ) T^{−1} ) 1_m,    (4.25)

for any state i ∈ E of X at time t, and R_i(t) = 0 for i = m + 1.

Proof The proof is based on (4.1) and Bayes' rule for conditional expectation,

    E{τ − t | τ > t, I_{i,t}} = E{(τ − t) 1_{τ>t} | I_{i,t}} / P{τ > t | I_{i,t}} = E{(τ − t) 1_{τ>t} | I_{i,t}},    (4.26)

as P{τ > t | I_{i,t}} = 1. The latter expectation can be simplified using (4.1) as

    E{(τ − t) 1_{τ>t} | I_{i,t}} = ∫_t^∞ P{τ > s | I_{i,t}} ds
        = ∫_t^∞ e_i^⊤ ( S_m(t) e^{ΨT(s−t)} + ( I_m − S_m(t) ) e^{T(s−t)} ) 1_m ds
        = e_i^⊤ S_m(t) ( ∫_t^∞ e^{ΨT(s−t)} ds ) 1_m + e_i^⊤ ( I_m − S_m(t) ) ( ∫_t^∞ e^{T(s−t)} ds ) 1_m.

It is clear following (4.1) that R_i(t) = 0 for i = m + 1, as F̄_i(t, s) = 0 for s ≥ t. The result is established since, for negative definite A, ∫_t^∞ e^{A(s−t)} ds = −A^{−1}. □


Observe that, if Ψ ≠ I, the expected residual lifetime depends on the past realization of X through S_m(t) and on the current age t when the information is fully available. However, when the information is limited to only the current (and initial) state, the expected residual lifetime forms a non-stationary function of the current age t. Otherwise, the residual lifetime expectancy is just a constant.
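A minimal sketch of (4.25) (hypothetical parameters, S_m(t) taken as given):

    import numpy as np

    # Hypothetical example reused from the earlier sketches.
    T = np.array([[-1.0, 0.4],
                  [0.3, -0.9]])
    Psi = np.diag([0.5, 2.0])
    S_t = np.diag([0.7, 0.2])    # S_m(t), taken as given

    def residual_lifetime(i):
        """R_i(t) = -e_i' (S_m(t) (Psi T)^{-1} + (I - S_m(t)) T^{-1}) 1, eq. (4.25)."""
        M = S_t @ np.linalg.inv(Psi @ T) + (np.eye(2) - S_t) @ np.linalg.inv(T)
        return -np.eye(2)[i] @ M @ np.ones(2)

    print(residual_lifetime(0), residual_lifetime(1))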

4.7 Occupation time

The second quantity we are interested in is a generalization of the result we had above for the residual lifetime expectancy. In this section, we consider the total time T_j(t) the mixture process X (2.3) spends in state j ∈ S up to time t, i.e.,

    T_j(t) = ∫_0^t 1_{{X_s = j}} ds.

This quantity appears in the likelihood function (3.1). As we allow the inclusion of the past information I_{i,t} and the current age t, we define

    T̄_j(t, s) = T_j(s) − T_j(t),   for s ≥ t.    (4.27)

Theorem 4.22 For a given s ≥ t ≥ 0 and past information I_{i,t}, we have

    E{T̄_j(t, s) | I_{i,t}} =
        e_i^⊤ F_11(t, s) e_j,    for all i, j ∈ E,
        e_i^⊤ F_12(t, s),        for all i ∈ E, j = m + 1,
        0,                       for all j ∈ E, i = m + 1,
        s − t,                   for i, j = m + 1,    (4.28)

where the matrices F_11(t, s) and F_12(t, s) are defined by

    F_11(t, s) = S_m(t) [ΨT]^{−1} ( e^{ΨT(s−t)} − I_m ) + ( I_m − S_m(t) ) [T]^{−1} ( e^{T(s−t)} − I_m ),
    F_12(t, s) = S_m(t) [ (s − t) I_m − [ΨT]^{−1} ( e^{ΨT(s−t)} − I_m ) ] 1_m + ( I_m − S_m(t) ) [ (s − t) I_m − [T]^{−1} ( e^{T(s−t)} − I_m ) ] 1_m.

Proof The proof follows from applying Theorem 3.4 to obtain

    E{T̄_j(t, s) | I_{i,t}} = ∫_t^s P{X_u = j | I_{i,t}} du
        = ∫_t^s e_i^⊤ ( S_{m+1}(t) e^{G(u−t)} + [ I_{m+1} − S_{m+1}(t) ] e^{Q(u−t)} ) e_j du
        = e_i^⊤ ( ∫_t^s ( F_11(t, u)   F_12(t, u) ) du ) e_j,
                  (      0                 1      )

where the last equality is due to Lemma 3.5, which proves the claim. □


5 Multi absorbing states and competing risks

In this section we extend the lifetime distribution of the Markov mixture process to the case where we have p ≥ 1 absorbing states and m ≥ 1 transient states. In this situation, Q (1.2) is an (m + p) × (m + p) singular matrix whose top left (m × m) submatrix T corresponds to X^Q moving among the transient states E = {1, ..., m}, whereas δ now refers to an m × p exit transition matrix of X^Q. To distinguish it from the one-absorbing-state case, we reserve the matrix notation D for δ, i.e.,

    Q = ( T   D )
        ( 0   0 ).    (5.1)

Following (1.4), both matrices T and D satisfy the constraint

    T 1_m + D 1_p = 0_m,    (5.2)

where we denote by 1_n, n = m, p, the (n × 1) unit vector of size n. The intensity matrix for X^G is defined following (2.1). Accordingly, the information matrix S_n(t) (3.4) is defined as an n × n diagonal matrix, n ∈ {m, m + p}, whose ith element is given by s_i(t) defined in Lemma 3.2, with necessarily s_i(t) = 0 for i ∈ {m + 1, ..., m + p}.

We denote by τ_j the lifetime of the Markov mixture process X (2.3) until it moves to the absorbing state δ_j := m + j, i.e., τ_j = inf{t > 0 : X_t = δ_j}. In competing risks (for which we refer among others to Kalbfleisch and Prentice [22]) only the first event is observed; we therefore define the observed lifetime τ = min{τ_1, ..., τ_p} and a random variable J to denote the event-specific type, i.e., J = argmin_j(τ_j).

5.1 Conditional distributions and intensity

We are interested in determining, for given past information I_{i,t}, the conditional joint distribution of the lifetime τ of the Markov mixture process (2.3) and the random variable J corresponding to absorption of X in the absorbing state δ_j,

    F_{ij}(t, s) = P{τ ≤ s, J = j | I_{i,t}}   and   F̄_{ij}(t, s) = P{τ > s, J = j | I_{i,t}},    (5.3)

with density f_{ij}(t, s) = (∂/∂s) F_{ij}(t, s). By (5.3) and (4.1), we have the following:

    F_i(t, s) = ∑_{j=1}^{p} F_{ij}(t, s)   and   F̄_i(t, s) = 1 − F_i(t, s),    (5.4)

with density f_i(t, s) = (∂/∂s) F_i(t, s). Denote by λ_{ij}(t, s) the forward intensity associated with absorption in the absorbing state δ_j for given past information I_{i,t}, defined by

    λ_{ij}(t, s) ds = P{τ ≤ s + ds, J = j | τ > s, I_{i,t}},

from which the overall intensity λ_i(t, s) (4.16) for a single type of absorption is given by

    λ_i(t, s) = ∑_{j=1}^{p} λ_{ij}(t, s).    (5.5)


Theorem 5.1 Given I_{i,t}, we have for any s ≥ t ≥ 0 and j ∈ {1, ..., p} that

    F_{ij}(t, s) = e_i^⊤ ( S_m(t)(ΨT)^{−1} ( e^{ΨT(s−t)} − I_m ) Ψ + ( I_m − S_m(t) ) T^{−1} ( e^{T(s−t)} − I_m ) ) D e_j,
    f_{ij}(t, s) = e_i^⊤ ( S_m(t) e^{ΨT(s−t)} Ψ + ( I_m − S_m(t) ) e^{T(s−t)} ) D e_j.    (5.6)

Proof As the two events {τ ≤ s, J = j} and {X_s = δ_j} are equivalent, we have

    P{τ ≤ s, J = j | I_{i,t}} = P{X_s = δ_j | I_{i,t}} = e_i^⊤ P(t, s) e_{m+j}
        = e_i^⊤ ( S_m(t)(ΨT)^{−1} ( e^{ΨT(s−t)} − I_m ) Ψ + ( I_m − S_m(t) ) T^{−1} ( e^{T(s−t)} − I_m ) ) D e_j.

Note that in the last equality we have used (3.15) with δ replaced by D. Our claim for the density f_{ij}(t, s) is established on account of f_{ij}(t, s) = (∂/∂s) F_{ij}(t, s). □

Corollary 5.2 By (5.4) and (5.2), we have for any s ≥ t ≥ 0 and i ∈ E,

    F̄_i(t, s) = e_i^⊤ ( S_m(t) e^{ΨT(s−t)} + ( I_m − S_m(t) ) e^{T(s−t)} ) 1_m,
    f_i(t, s) = e_i^⊤ ( S_m(t) e^{ΨT(s−t)} Ψ + ( I_m − S_m(t) ) e^{T(s−t)} ) D 1_p.    (5.7)

It can be shown, using the same steps as in (4.19) and following (5.4)-(5.5), that

    λ_{ij}(t, s) = f_{ij}(t, s) / F̄_i(t, s)   and   λ_i(t, s) = f_i(t, s) / F̄_i(t, s).    (5.8)

From the result of Theorem 5.1, we see that the lifetime τ and the type-specific absorption are not independent. This implies, following Theorem 1.1 in [15], that the intensity is not of proportional type, as we can see from Corollary 5.3 below.

Corollary 5.3 For any i ∈ E and j ∈ {m + 1, ..., m + p}, we have for all s ≥ t ≥ 0,

    λ_{ij}(t, s) = [ e_i^⊤ ( S_m(t) e^{ΨT(s−t)} Ψ + ( I_m − S_m(t) ) e^{T(s−t)} ) D e_j ] / [ e_i^⊤ ( S_m(t) e^{ΨT(s−t)} + ( I_m − S_m(t) ) e^{T(s−t)} ) 1_m ].    (5.9)

Remark 5.4 Let Ψ = I, i.e., the case when the two Markov chains X^Q and X^G move at the same speed. Then the type-specific distribution F_{ij}(t, s) (5.6) and intensity λ_{ij}(t, s) (5.9) reduce to simpler expressions with the information I_{i,t} removed:

    F_{ij}(t, s) = e_i^⊤ T^{−1} ( e^{T(s−t)} − I_m ) D e_j   and   λ_{ij}(t, s) = ( e_i^⊤ e^{T(s−t)} D e_j ) / ( e_i^⊤ e^{T(s−t)} 1_m ).
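To illustrate the competing-risks quantities, the following sketch (a hypothetical m = 2, p = 2 example; the rows of T and D satisfy (5.2)) evaluates the joint distribution (5.6) and the type-specific forward intensity (5.9).

    import numpy as np
    from scipy.linalg import expm

    # Hypothetical competing-risks example: m = 2 transient states, p = 2 exit types.
    T = np.array([[-1.0, 0.4],
                  [0.3, -0.9]])
    D = np.array([[0.45, 0.15],     # exit rate matrix; rows satisfy T 1 + D 1 = 0, eq. (5.2)
                  [0.20, 0.40]])
    Psi = np.diag([0.5, 2.0])
    S_t = np.diag([0.7, 0.2])       # S_m(t), taken as given
    I2 = np.eye(2)

    def F_ij(i, j, t, s):
        """Joint distribution P{tau <= s, J = j | I_{i,t}}, eq. (5.6)."""
        M = (S_t @ np.linalg.inv(Psi @ T) @ (expm(Psi @ T * (s - t)) - I2) @ Psi
             + (I2 - S_t) @ np.linalg.inv(T) @ (expm(T * (s - t)) - I2))
        return I2[i] @ M @ D[:, j]

    def lambda_ij(i, j, t, s):
        """Type-specific forward intensity, eq. (5.9)."""
        num = I2[i] @ (S_t @ expm(Psi @ T * (s - t)) @ Psi
                       + (I2 - S_t) @ expm(T * (s - t))) @ D[:, j]
        den = I2[i] @ (S_t @ expm(Psi @ T * (s - t))
                       + (I2 - S_t) @ expm(T * (s - t))) @ np.ones(2)
        return num / den

    print(F_ij(0, 0, 1.0, 2.5), lambda_ij(0, 0, 1.0, 2.5))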


In the proposition below, we give the long-term rate of the intensity lim_{s→∞} λ_{ij}(t, s). The result follows from applying arguments similar to (3.11).

Proposition 5.5 Let T (ΨT) have distinct eigenvalues ϕ^(1)_j (ϕ^(2)_j), j = 1, ..., m, with ϕ^(k)_{p_k} = max{ϕ^(k)_1, ..., ϕ^(k)_m} and p_k = argmax_j{ϕ^(k)_j}, k = 1, 2. Then,

    lim_{s→∞} λ_{ij}(t, s) =
        e_i^⊤ S_m(t) L_{p_2}(ΨT) ΨD e_j / [ e_i^⊤ S_m(t) L_{p_2}(ΨT) 1_m ],                                                              if ϕ^(2)_{p_2} > ϕ^(1)_{p_1},
        e_i^⊤ ( I_m − S_m(t) ) L_{p_1}(T) D e_j / [ e_i^⊤ ( I_m − S_m(t) ) L_{p_1}(T) 1_m ],                                             if ϕ^(2)_{p_2} < ϕ^(1)_{p_1},
        e_i^⊤ ( S_m(t) L_{p_2}(ΨT) Ψ + ( I_m − S_m(t) ) L_{p_1}(T) ) D e_j / [ e_i^⊤ ( S_m(t) L_{p_2}(ΨT) + ( I_m − S_m(t) ) L_{p_1}(T) ) 1_m ],   if ϕ^(2)_{p_2} = ϕ^(1)_{p_1},
                                                                                                                                          (5.10)

where in the two results L_{p_1}(T) refers to the Lagrange interpolation (3.10).

Below we give the result on type-specific instantaneous intensity λij(t):

λij(t)dt = P{τ ≤ t+ dt,J = j|τ > t, Ii,t}.

Corollary 5.6 Given i ∈ E, j ∈ {m + 1, ..., m + p} and past information I_{i,t},

    λ_{ij}(t) = e_i^⊤ ( I_m + S_m(t)(Ψ − I_m) ) D e_j,   for all t ≥ 0.    (5.11)

Denote S_m(∞) = diag(s_1(∞), ..., s_m(∞)) with s_i(∞) = lim_{t→∞} s_i(t) (3.9). Then,

    lim_{t→∞} λ_{ij}(t) = e_i^⊤ ( I_m + S_m(∞)(Ψ − I_m) ) D e_j.

Solving (5.8) for the distributions f_i(t, s) and F̄_i(t, s) leads to (4.18) with λ_i(t, s) given in (5.9). By doing so, we can express f_{ij}(t, s) and F_{ij}(t, s) as follows.

Proposition 5.7 Given i ∈ E and j = m+1, ..., m+p, we have for all s ≥ t ≥ 0,

f_{ij}(t, s) = λ_{ij}(t, s) exp( −∫_t^s Σ_{j=1}^{p} λ_{ij}(t, u) du ),

F_{ij}(t, s) = ∫_t^s λ_{ij}(t, u) exp( −∫_t^u Σ_{j=1}^{p} λ_{ij}(t, v) dv ) du.        (5.12)
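The representation (5.12) can be checked numerically against the closed form (5.6) by quadrature; a sketch under the same illustrative assumptions is given below, and the two printed numbers should agree up to integration error.

```python
# Numerical sanity check of (5.12) against the closed form (5.6); matrix values
# are illustrative assumptions, not values from the paper.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

m = 2
T = np.array([[-1.0, 0.4], [0.2, -0.8]])
D = np.array([[0.35, 0.25], [0.20, 0.40]])
Psi = np.diag([0.3, 0.5])
S_t = np.diag([0.6, 0.4])
I_m = np.eye(m)
i, j, t, s = 0, 1, 1.0, 4.0

def lam(u, jj):
    """Forward intensity lambda_{i,jj}(t, u) of (5.8)-(5.9)."""
    f = (S_t @ expm(Psi @ T * (u - t)) @ Psi + (I_m - S_t) @ expm(T * (u - t))) @ D
    Fbar = (S_t @ expm(Psi @ T * (u - t)) + (I_m - S_t) @ expm(T * (u - t))) @ np.ones(m)
    return f[i, jj] / Fbar[i]

def integrand(u):
    total = quad(lambda v: sum(lam(v, k) for k in range(D.shape[1])), t, u)[0]
    return lam(u, j) * np.exp(-total)

F_via_512 = quad(integrand, t, s)[0]              # right-hand side of (5.12)
F_closed = (S_t @ np.linalg.solve(Psi @ T, expm(Psi @ T * (s - t)) - I_m) @ Psi
            + (I_m - S_t) @ np.linalg.solve(T, expm(T * (s - t)) - I_m)) @ D
print(F_via_512, F_closed[i, j])                  # should agree
```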

Remark 5.8 Following (5.12), we may consider a function H_{ij}(t, s) = exp( −∫_t^s λ_{ij}(t, u) du ), for s ≥ t, j = m+1, ..., m+p and i = 1, ..., m. However, it does not have a clear probability interpretation in view of competing risks.


Inspired by a similar type of intensity discussed in Crowder [15], it may seem natural to define an intensity function λ_{ij}(t, s) = f_{ij}(t, s)/F̄_{ij}(t, s) given by

λ_{ij}(t, s) ds = P{τ ≤ s + ds | τ > s, J = j, I_{i,t}}.        (5.13)

This intensity can be seen as the conditional density of the lifetime distribution of the Markov mixture process X at absorption caused by type j, given its past information I_{i,t}. Following [15] and by (5.13), it is then sensible to define the distributions

F_{ij}(t, s) = P{τ ≤ s | J = j, I_{i,t}}   and   p_{ij}(t, s) = P{J = j | τ > s, I_{i,t}}.        (5.14)

Proposition 5.9 For any i = 1, ..., m, j = m+1, ..., m+p, we have ∀s ≥ t ≥ 0,

F_{ij}(t, s) = −[ e_i^⊤ S_m(t)(ΨT)^{−1}( e^{ΨT(s−t)} − I_m )Ψ D e_j + e_i^⊤ ( I_m − S_m(t) )T^{−1}( e^{T(s−t)} − I_m ) D e_j ] / [ e_i^⊤ T^{−1} D e_j ],

p_{ij}(t, s) = −[ e_i^⊤ ( S_m(t) e^{ΨT(s−t)} + ( I_m − S_m(t) ) e^{T(s−t)} ) T^{−1} D e_j ] / [ e_i^⊤ ( S_m(t) e^{ΨT(s−t)} + ( I_m − S_m(t) ) e^{T(s−t)} ) 1_m ].

Proof By Bayes' rule for conditional probability, it follows from (5.14) that

P{τ ≤ s | J = j, I_{i,t}} = F_{ij}(t, s)/p_{ij}   and   P{J = j | τ > s, I_{i,t}} = F̄_{ij}(t, s)/F̄_i(t, s),

where p_{ij} is the probability of ultimate absorption in the absorbing state δ_j, given by

p_{ij} = P{J = j | I_{i,t}} = −e_i^⊤ T^{−1} D e_j.        (5.15)

The results follow from (5.15), (5.7), (5.6) and from F̄_{ij}(t, s) = p_{ij} − F_{ij}(t, s):

F̄_{ij}(t, s) = −e_i^⊤ ( S_m(t) e^{ΨT(s−t)} + ( I_m − S_m(t) ) e^{T(s−t)} ) T^{−1} D e_j. □        (5.16)

5.2 Unconditional distributions and intensity

In this section we present the results on unconditional distributions

F_j(t) = P{τ ≤ t, J = j}   and   F̄_j(t) = P{τ > t, J = j},

with density f_j(t) = F_j′(t). The type-specific instantaneous intensity is defined by

α_j(t) dt = P{τ ≤ t + dt, J = j | τ > t}.        (5.17)


The distribution F̄(t) (4.3) and the overall intensity α(t) (4.21) are given by

F̄(t) = Σ_{j=1}^{p} F̄_j(t)   and   α(t) = Σ_{j=1}^{p} α_j(t).        (5.18)

Based on similar arguments to those given in Sections 4.2 and 5.1, we have the following.

Theorem 5.10 Given any j ∈ {1, ..., p}, we have for all t ≥ 0 that

F_j(t) = π^⊤ ( S_m(ΨT)^{−1}( e^{ΨTt} − I_m )Ψ + ( I_m − S_m )T^{−1}( e^{Tt} − I_m ) ) D e_j,

f_j(t) = π^⊤ ( S_m e^{ΨTt} Ψ + ( I_m − S_m ) e^{Tt} ) D e_j.        (5.19)

Applying the same arguments as in (5.15)-(5.16) to the result above, we obtain

F̄_j(t) = −π^⊤ ( S_m e^{ΨTt} + ( I_m − S_m ) e^{Tt} ) T^{−1} D e_j.        (5.20)

Following (5.17) and (5.18), the type-specific instantaneous intensity α_j(t) is given by the corollary below.

Corollary 5.11 For any j ∈ {m+1, ..., m+p}, we have for all t ≥ 0

α_j(t) = [ π^⊤ ( S_m e^{ΨTt} Ψ + ( I_m − S_m ) e^{Tt} ) D e_j ] / [ π^⊤ ( S_m e^{ΨTt} + ( I_m − S_m ) e^{Tt} ) 1_m ].        (5.21)

Remark 5.12 By setting Ψ = I, the type-specific distribution F_j(t) (5.19) and the intensity α_j(t) (5.21) reduce to the same expressions as given in Lindqvist [26]:

F_j(t) = π^⊤ T^{−1}( e^{Tt} − I_m ) D e_j   and   α_j(t) = [ π^⊤ e^{Tt} D e_j ] / [ π^⊤ e^{Tt} 1_m ].
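The unconditional quantities (5.19)-(5.21) only replace e_i^⊤ and S_m(t) by the initial law π and the time-zero mixing matrix S_m; a minimal sketch, with assumed values for π and S_m, is given below.

```python
# Sketch of the unconditional quantities (5.19)-(5.21); pi, S_m and the other
# matrices are assumed illustrative values, not values from the paper.
import numpy as np
from scipy.linalg import expm

m = 2
T = np.array([[-1.0, 0.4], [0.2, -0.8]])
D = np.array([[0.35, 0.25], [0.20, 0.40]])
Psi = np.diag([0.3, 0.5])
S0 = np.diag([0.6, 0.4])     # S_m of (3.4), assumed
pi = np.array([0.7, 0.3])    # initial distribution over the transient states, assumed
I_m = np.eye(m)
t = 2.0

M = S0 @ expm(Psi @ T * t) + (I_m - S0) @ expm(T * t)
F_j = pi @ (S0 @ np.linalg.solve(Psi @ T, expm(Psi @ T * t) - I_m) @ Psi
            + (I_m - S0) @ np.linalg.solve(T, expm(T * t) - I_m)) @ D     # (5.19)
f_j = pi @ (S0 @ expm(Psi @ T * t) @ Psi + (I_m - S0) @ expm(T * t)) @ D  # (5.19)
Fbar_j = -pi @ M @ np.linalg.solve(T, D)                                  # (5.20)
alpha_j = f_j / (pi @ M @ np.ones(m))                                     # (5.21)
print(F_j, Fbar_j, alpha_j)
```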

Solving the differential equation (5.17) for f_j(t) and F_j(t), we can, similarly to (4.22) and (5.7), express the latter two in terms of α_j(t) and the overall intensity α(t) as

f_j(t) = α_j(t) exp( −∫_0^t Σ_{j=1}^{p} α_j(u) du ),

F_j(t) = ∫_0^t α_j(u) exp( −∫_0^u Σ_{j=1}^{p} α_j(v) dv ) du.

The distribution of the age at which X gets absorbed by type j and the probability of ultimate absorption by type j are defined following [15] by

F_j(t) = P{τ ≤ t | J = j}   and   p_j(t) = P{J = j | τ > t}.

Proposition 5.13 For any j = m+1, ..., m+p, we have for all t ≥ 0,

F_j(t) = −[ π^⊤ ( S_m(ΨT)^{−1}( e^{ΨTt} − I_m )Ψ + ( I_m − S_m )T^{−1}( e^{Tt} − I_m ) ) D e_j ] / [ π^⊤ T^{−1} D e_j ],

p_j(t) = −[ π^⊤ ( S_m e^{ΨTt} + ( I_m − S_m ) e^{Tt} ) T^{−1} D e_j ] / [ π^⊤ ( S_m e^{ΨTt} + ( I_m − S_m ) e^{Tt} ) 1_m ].


5.3 Residual lifetime under competing risks

Following [15], the residual lifetime under competing risks is defined similarly to (4.26):

R_{ij}(t) = ∫_t^∞ F̄_{ij}(t, u) du.

Applying the sub-distribution (5.16) to the above, we have the following.

Proposition 5.14 For any i = 1, ..., m and j = m+1, ..., m+p, we have ∀t ≥ 0,

R_{ij}(t) = e_i^⊤ ( S_m(t)(ΨT)^{−1} + ( I_m − S_m(t) )T^{−1} ) T^{−1} D e_j.        (5.22)

In the section below, we discuss some numerical examples of the main results of Sections 4 and 5. The examples are given for the multi-absorbing-state model.

6 Numerical examples

We consider a marriage and divorce problem under the mixture model (2.3). The marriage breakup is due to divorce in the one-absorbing-state Markov mixture model, and additionally to being widowed in the competing risks model. We are interested in the lifetime distribution of the marriage until a breakup due to divorce is observed. In the one-absorbing-state model, the set of transient states is E = {N, M, S, W} whilst the absorbing set is ∆ = {D}. We denote by N never married, by M married, by W widowed, by S separated, and by D divorced. For simplicity, we label N = 1, M = 2, S = 3, W = 4 and D = 5. The transition rate from never married to married is given by q12, from married to widowed by q24, and from widowed to married by q42. We assume that married couples may become separated, which occurs at rate q23, and eventually get divorced or widowed at rate q35 or q34, respectively. We also allow the possibility that divorce takes place at rate q25 directly from the married state. In the competing risks model, we set q42 = 0 and move the widowed state W to the absorbing set, i.e., ∆ = {W, D}.

The composition of individuals in each state is given by the initial vector π. The corresponding intensity matrix Q is given by (6.1). Heterogeneity of individuals is described by the speed at which they move along the states. We assume that there are two groups of individuals: one moving with intensity matrix T, whilst the other moves with ΨT, with Ψ given by (2.2). At the initial time, the composition of these groups in each state is given by the diagonal matrix S_m (3.4).

Q = [ −q12      q12                 0              0      0
      0        −(q23 + q24 + q25)   q23            q24    q25
      0         0                  −(q34 + q35)    q34    q35
      0         q42                 0             −q42    0
      0         0                   0              0      0 ].        (6.1)


Figure 1: Forward divorce intensity λ_i(t, d) (4.17) and survival distribution F̄_i(t, d) (4.1) as functions of marriage duration d = s − t for a given age t > 0. The two graphs on top correspond to the case where the population is homogeneous, i.e., Ψ = I, whereas the two below correspond to the case where there is heterogeneity in the population: two different groups moving at different speeds, i.e., Ψ ≠ I.

For the purpose of simulation, we set the following values for the mixture model. We set the elements of the intensity matrix T by q12 = 0.95, q24 = 0.05, q34 = 0.1, q23 = 0.25, q25 = 0.07, q42 = 0.85, and q35 = 0.5. Heterogeneity of the population is described by the matrix Sm = diag(0.5, 0.5, 0.5, 0.5, 0.5), i.e., 50% of the population in each state move with intensity matrix T, while the other 50% move with rate ΨT, with Ψ = diag(0.25, 0.25, 0.25). The initial distribution of the population is given by π = (0.5, 0.3, 0.1)^⊤: 50% never married, 30% married, 10% widowed, and the rest in separated status. Based on these values, we compute and compare the intensity rate λ_i(t, s) (4.17), λ_i(t) (4.20) and the baseline function α(t) (4.21) of getting divorced for individuals who have been married (e_i = (0, 1, 0, 0)^⊤) for t periods of time. Heterogeneity is removed by setting Ψ to be the identity matrix. We assume that we have only the current status of the marriage (limited information on I_{i,t}). The results are presented in Figures 1 and 2.
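For readers who wish to reproduce the setup, the sketch below assembles the generator (6.1) from the rates quoted above and splits it into the sub-intensity block T and the exit-rate block D of the one-absorbing-state model; the competing-risks variant is indicated in the comments.

```python
# Assemble the generator (6.1) from the quoted rates and split off the blocks
# used in Sections 4-5; state order is (N, M, S, W, D) as in the text.
import numpy as np

q12, q23, q24, q25 = 0.95, 0.25, 0.05, 0.07
q34, q35, q42 = 0.10, 0.50, 0.85

Q = np.array([
    [-q12,  q12,                0.0,           0.0,  0.0 ],
    [ 0.0, -(q23 + q24 + q25),  q23,           q24,  q25 ],
    [ 0.0,  0.0,               -(q34 + q35),   q34,  q35 ],
    [ 0.0,  q42,                0.0,          -q42,  0.0 ],
    [ 0.0,  0.0,                0.0,           0.0,  0.0 ],
])

T = Q[:4, :4]   # sub-intensity over the transient states {N, M, S, W}
D = Q[:4, 4:]   # exit rates into the absorbing state {D} (one-absorbing-state model)
print(Q.sum(axis=1))   # each row of Q sums to zero

# In the competing-risks variant the text sets q42 = 0 and treats W as absorbing,
# so the corresponding blocks would be T = Q[:3, :3] and D = Q[:3, 3:].
```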


Figure 2: Intensity rate λ_i(t) (4.20) and baseline function α(t) (4.21) as functions of the marriage age t. The two graphs on top correspond to the case where the population is homogeneous, i.e., Ψ = I, whereas the two below correspond to the case where there is heterogeneity: two groups moving at different speeds, i.e., Ψ ≠ I.

We observe that, when there is no heterogeneity in the population, the divorce intensity is always increasing in duration, regardless of the age of marriage t. A closer look at Figure 2 suggests that the forward divorce intensity λ(t, d) (4.17), with d = s − t, is the same as the baseline function α(t) (4.21). Moreover, the instantaneous divorce intensity λ(t) defined in (4.20) is not affected by the marriage age t, i.e., the divorce intensity λ(t) is just a constant.

In contrast to this observation, for a heterogeneous population we notice from Figure 1 that the divorce intensity λ(t, d) (4.17) changes its shape with the marriage age t¹. As we see from Figure 1, those who have been married for t = 10 periods of time have the lowest divorce intensity, and those who have just been married for t = 0.01 periods have the highest divorce intensity of all. In between is the divorce intensity for those who have been married for t = 4.

From these curves we notice that the newly married, and those who have been married for a few periods, face an increasing divorce intensity over the next 3.6 time periods of their marriage. Especially at the beginning, the increase is rather steep.

¹It exhibits a similar type of behavior to Fig. 9 in [3] for different initial states.


Figure 3: Residual lifetime (4.25) of the marriage against the marriage age t. The graph on top corresponds to the case where the population is homogeneous, i.e., Ψ = I, whereas the one below is for the case where there is heterogeneity (two groups moving at different speeds, i.e., Ψ ≠ I) and has a U-bend shape.

After approximately 3.6 periods of time, the marriage becomes more stable, and the divorce rate decreases to its long-term rate. Even though the divorce rate is relatively low for older couples, it tends to increase at a faster rate at the beginning before gradually approaching the long-term rate of 0.0644 (4.23). The instantaneous divorce intensity λ(t) (4.20) confirms this phenomenon: the divorce intensity tends to increase for the newly married, and then the relationship becomes more stable as the age of marriage t increases, as evidenced by the decreasing divorce rate. The baseline curve α(t) in Figure 2 exhibits similar behavior.

In terms of survival probability, we see from Figure 1 that in both models (with or without heterogeneity) the marriage survival probability decreases as the marriage duration d increases. This is due to the fact that both intensity matrices T and ΨT are negative definite. The main difference we observe is that the survival probability goes to zero faster under the homogeneous model than under heterogeneity, as seen from the strictly increasing intensity of the existing model (see Figure 1); this is due to ψ_i < 1 for all i = 1, ..., 3. As a result, the existing model implies that every marriage gets divorced in the long run, something which does not necessarily occur in practice.


Figure 4: Sub-distributions F_{ij}(t, s) and F̄_{ij}(t, s) under competing risks.

In contrast, we see under the mixture model (with heterogeneity) that the survival probability does not decay to zero by the end of the observation period; the marriage would last in the long run with a positive probability. Moreover, we can investigate the effect of the marriage age t on the marriage survival probability. We observe that, for the same marriage duration, the survival probability increases with age, i.e., the older the age of marriage, the higher the probability that the marriage survives.

Even though the above observations are based on a numerical study of the marriage-and-divorce model, the results from the proposed model are able to provide sound explanations of the data depicted in Figure 5.3 on page 222 of [1].

Figure 3 depicts the expected residual lifetime of the marriage. We observe that the residual lifetime under homogeneity (the existing model) is not affected by the age of marriage t, i.e., it is just flat. In contrast, we notice that the residual lifetime exhibits a U-bend-shaped curve under heterogeneity (the new model), i.e., the residual lifetime is decreasing for the newly married and then increasing as the age of marriage gets older. These numerical findings seem consistent with the U-bend shape of lifetime discussed in the 2010 article in The Economist [17]. Thus, once again, we see that the proposed phase-type distribution and its forward intensity offer significant improvements over the existing model.

Figure 4 displays the sub-distribution functions F_{ij}(t, s) and F̄_{ij}(t, s) (5.6)


under competing risks, in which case q42 was set to zero. We again observe that under the Markov mixture model we can study the effect of heterogeneity on the divorce probability. The figure exhibits a higher conditional forward probability of getting divorced for new marriages and a lower one for older marriages. This observation is in line with that for the conditional forward divorce intensity given in Figure 1.

7 Conclusions

We have presented in this paper a generalization of the phase-type distribution introduced in Neuts [28] and [27]. The generalization allows the inclusion of available past information on the underlying process and the modeling of heterogeneity. The new distribution and its dependence on the past information are given in explicit form. It is constructed by extending the mixture model of [20] and [19] to a mixture of two continuous-time finite-state absorbing Markov chains moving on the same state space at different speeds. The distribution has denseness and closure properties over convex mixtures and finite convolutions.

Its availability in closed form gives the advantage of obtaining analytically tractable results in applications. We have also proposed a forward intensity associated with the new distribution. This intensity determines the rate of occurrence of future events based on available past information of the mixture process. We extended the phase-type model to multiple absorbing states, which can be used to deal with competing risks in survival analysis. A numerical study on a marriage-and-divorce problem was performed to compare the performance of the new distribution and its intensity against their existing counterparts. Following the results, we have seen that the proposed models provide significant improvements over the existing models. In particular, the forward intensity was able to provide sound explanations of the divorce rate data found on page 222 of Aalen et al. [1], and the residual lifetime reveals the common U-bend shape of lifetime found, e.g., in the 2010 article in The Economist [17].

Given its explicit form and its ability to capture path dependence and heterogeneity, we believe that the proposed phase-type distribution and forward intensity offer appealing features for the wide range of applications to which the existing distribution and intensity have been applied.

Acknowledgement

B.A. Surya acknowledges the support and hospitality provided by the Center for Applied Probability of Columbia University during his academic visit in May 2014, where he saw the work of Frydman and Schuermann [19] and started to work on the problem. He thanks Professor Karl Sigman for the invitation. This research is financially supported by Victoria University PBRF Research Grants # 212885 and # 214168, which the author gratefully acknowledges.


References

[1] Aalen, O.O., Borgan, Ø. and Gjessing, H.K. (2008). Survival and Event History Analysis: A Process Point of View, Springer.

[2] Aalen, O.O. and Gjessing, H.K. (2001). Understanding the shape of the hazard rate: a process point of view. Stat. Sci., 16, 1-22.

[3] Aalen, O.O. (1995). Phase type distributions in survival analysis. Scand. J. Stat., 22, 447-463.

[4] Apostol, T.M. (1969). Explicit formulas for the exponential matrix e^{tA}. Am. Math. Mon., 76, 289-292.

[5] Asmussen, S., Avram, F. and Pistorius, M.R. (2004). Russian and American put options under exponential phase-type Lévy models. Stoch. Proc. Appl., 109, 79-111.

[6] Asmussen, S. (2003). Applied Probability and Queues, 2nd Edition, Springer.

[7] Albrecher, H. and Asmussen, S. (2010). Ruin Probabilities, 2nd Edition, World Scientific.

[8] Assaf, D. and Levikson, B. (1982). Closure of phase type distributions under operations arising in reliability theory. Ann. Probab., 10, 265-269.

[9] Badila, E.S., Boxma, O.J. and Resing, J.A.C. (2014). Queues and risk processes with dependencies. Stoch. Models, 30, 390-419.

[10] Blumen, I., Kogan, M. and McCarthy, P.J. (1955). The industrial mobility of labor as a probability process. Cornell Stud. Ind. Labor Relat., Vol. 6, Ithaca, N.Y., Cornell University Press.

[11] Breuer, L. and Baum, D. (2005). An Introduction to Queueing Theory and Matrix-Analytic Methods, Springer.

[12] Buchholz, P., Kriege, J. and Felko, I. (2014). Input Modeling with Phase-Type Distributions and Markov Models: Theory and Applications, Springer.

[13] Chakravarthy, S.R. and Neuts, M.F. (2014). Analysis of a multi-server queueing model with MAP arrivals of regular customers and phase type arrivals of special customers. Simul. Model. Pract. Th., 43, 79-95.

[14] Cheng, H.-W. and Yau, S.-T. (1997). More explicit formulas for the exponential matrix. Linear Algebra Appl., 262, 131-163.

[15] Crowder, M. (2001). Classical Competing Risks, Chapman & Hall.


[16] Duan, J.C., Sun, J. and Wang, T. (2012). Multiperiod corporate default prediction - a forward intensity approach. J. Econometrics, 170, 191-209.

[17] The Economist. Age and happiness: The U-bend of life, The Economist, Ed. December 16th, 2010. http://www.economist.com/node/17722567

[18] Embrechts, P., Frey, R. and McNeil, A.J. (2005). Quantitative Risk Management, Princeton University Press.

[19] Frydman, H. and Schuermann, T. (2008). Credit rating dynamics and Markov mixture models. J. Bank. Financ., 32, 1062-1075.

[20] Frydman, H. (2005). Estimation in the mixture of Markov chains moving with different speeds. J. Am. Stat. Assoc., 100, 1046-1053.

[21] Frydman, H. (1984). Maximum likelihood estimation in the mover-stayer model. J. Am. Stat. Assoc., 79, 632-638.

[22] Kalbfleisch, J.D. and Prentice, R.L. (2002). The Statistical Analysis of Failure Time Data, 2nd Ed., Wiley.

[23] Klugman, S.A., Panjer, H.H. and Willmot, G.E. (2012). Loss Models: From Data to Decisions, Volume 715, John Wiley.

[24] Lee, S.C.K. and Lin, X.S. (2010). Modeling and evaluating insurance losses via mixtures of Erlang distributions. N. Am. Actuar. J., 14 (1), 107-130.

[25] Lin, X.S. and Liu, X. (2007). Markov aging process and phase-type law of mortality. N. Am. Actuar. J., 11, 92-109.

[26] Lindqvist, B.H. (2013). Phase-type distributions for competing risks. Proceedings of the 59th World Statistics Congress of the International Statistical Institute, Hong Kong.

[27] Neuts, M.F. (1981). Matrix-Geometric Solutions in Stochastic Models, Johns Hopkins University Press, Baltimore.

[28] Neuts, M.F. (1975). Probability distributions of phase-type. In Liber Amicorum Prof. Emeritus H. Florin, 173-206, University of Louvain, Belgium.

[29] O'Cinneide, C. (1990). Characterization of phase-type distributions. Commun. Statist.-Stochastic Models, 6, 1-57.

[30] Okamura, H. and Dohi, T. (2015). Phase-type software reliability model: parameter estimation algorithms with grouped data. Ann. Oper. Res., 1-32.

[31] Rolski, T., Schmidli, H., Schmidt, V. and Teugels, J. (1998). Stochastic Processes for Insurance and Finance, Wiley.

[32] Tijms, H. (1994). Stochastic Models: An Algorithmic Approach, John Wiley.