Recurrent Event Data: Models, Analysis, Efficiency

Edsel A. Peña
Department of Statistics
University of South Carolina
Columbia, SC 29208

Talk at Brown University
March 10, 2008


Some (Recurrent) Events

- Submission of a manuscript for publication.
- Occurrence of tumor.
- Onset of depression.
- Patient hospitalization.
- Machine/system failure.
- Occurrence of a natural disaster.
- Non-life insurance claim.
- Change in job.
- Onset of economic recession.
- At least a 200-point decrease in the DJIA.
- Marital disagreement.

Event Times and Distributions

- $T$: the time to the occurrence of an event of interest.
- $F(t) = \Pr\{T \le t\}$: the distribution function of $T$.
- $S(t) = \bar F(t) = 1 - F(t)$: survivor/reliability function.
- Hazard rate/probability and cumulative hazards:

Cont: $$\lambda(t)\,dt \approx \Pr\{T \le t + dt \mid T \ge t\} = \frac{f(t)}{S(t-)}\,dt$$

Disc: $$\lambda(t_j) = \Pr\{T = t_j \mid T \ge t_j\} = \frac{f(t_j)}{S(t_j-)}$$

Cumulative: $$\Lambda(t) = \int_0^t \lambda(w)\,dw \quad\text{or}\quad \Lambda(t) = \sum_{t_j \le t} \lambda(t_j)$$

Representation/Relationships

$0 < t_1 < \cdots < t_M = t$, $M(t) = \max |t_i - t_{i-1}| = o(1)$,

$$S(t) = \Pr\{T > t\} = \prod_{i=1}^{M} \Pr\{T > t_i \mid T \ge t_{i-1}\} \approx \prod_{i=1}^{M} \left[1 - \{\Lambda(t_i) - \Lambda(t_{i-1})\}\right].$$

Identities:

$$S(t) = \prod_{w \le t} \left[1 - \Lambda(dw)\right]$$

$$\Lambda(t) = \int_0^t \frac{dF(w)}{1 - F(w-)}$$
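The product representation above can be checked numerically. The following is a minimal sketch, assuming an exponential lifetime with cumulative hazard $\Lambda(t) = \theta t$; the function name, the grid size `m`, and the values of `theta` and `t` are illustrative choices, not taken from the talk.

```python
import math

def survivor_via_product(cum_hazard, t, m=100_000):
    """Approximate S(t) by prod_i [1 - {Lambda(t_i) - Lambda(t_{i-1})}]
    on an equally spaced grid 0 = t_0 < t_1 < ... < t_m = t."""
    prod = 1.0
    prev = cum_hazard(0.0)
    for i in range(1, m + 1):
        cur = cum_hazard(t * i / m)
        prod *= 1.0 - (cur - prev)   # one factor per grid cell
        prev = cur
    return prod

# Illustrative (assumed) values: exponential hazard rate theta, time point t.
theta, t = 0.5, 2.0
approx = survivor_via_product(lambda w: theta * w, t)
exact = math.exp(-theta * t)         # S(t) = exp(-Lambda(t)) for continuous F
```

As the mesh of the grid shrinks, `approx` should approach `exact`, which is the content of the product-integral identity for a continuous distribution.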

Estimating F

- A classic problem: given $T_1, T_2, \ldots, T_n$ IID $F$, obtain an estimator $\hat F$ of $F$.
- Importance?
  - $\theta(F)$ of $F$ (e.g., mean, median, variance) estimated via $\hat\theta = \theta(\hat F)$.
  - Prediction of time-to-event for new units.
  - Comparing groups, e.g., through a statistic

$$Q = \int W(t)\,d\left[\hat F_1(t) - \hat F_2(t)\right]$$

where $W(t)$ is some weight function.

Gastroenterology Data: Aalen and Husebye ('91)

Migratory Motor Complex (MMC) Times for 19 Subjects

[Figure: MMC Data Set. x-axis: Calendar Time; y-axis: Unit Number.]

Question: How to estimate the MMC period distribution, $F$?

Parametric Approach

- Assume a model: $\mathcal{F} = \{F(t;\theta) : \theta \in \Theta \subset \Re^p\}$
- Given $t_1, t_2, \ldots, t_n$, $\theta$ is estimated by $\hat\theta$, such as the ML estimate.
- $L(\theta) = \prod_{i=1}^n f(t_i;\theta) = \prod_{i=1}^n \lambda(t_i;\theta)\exp\{-\Lambda(t_i;\theta)\}$
- $\hat\theta = \arg\max_\theta L(\theta)$
- $\hat F_{pa}(t) = F(t;\hat\theta)$
- $I(\theta) = \mathrm{Var}\left\{\frac{\partial}{\partial\theta}\log f(T_1;\theta)\right\}$
- $\hat\theta \sim AN\left(\theta, \frac{1}{n} I(\theta)^{-1}\right)$
- $\overset{\bullet}{F}(t;\theta) = \frac{\partial}{\partial\theta}F(t;\theta)$
- $\hat F_{pa}(t) \sim AN\left(F(t;\theta), \frac{1}{n}\,\overset{\bullet}{F}(t;\theta)'\,I(\theta)^{-1}\,\overset{\bullet}{F}(t;\theta)\right)$

Nonparametric Approach

- No assumptions are made regarding the family of distributions to which the unknown df $F$ belongs.
- Empirical Distribution Function (EDF):

$$\hat F_{np}(t) = \frac{1}{n}\sum_{i=1}^{n} I\{T_i \le t\}$$

- $\hat F_{np}(\cdot)$ is a nonparametric MLE of $F$.
- Since $I\{T_i \le t\}$, $i = 1, 2, \ldots, n$, are IID Ber($F(t)$), by the Central Limit Theorem,

$$\hat F_{np}(t) \sim AN\left(F(t), \frac{1}{n}F(t)[1 - F(t)]\right).$$

An Efficiency Comparison

- Assume: $F \in \mathcal{F} = \{F(t;\theta) : \theta \in \Theta\}$
- $\hat F_{pa}$ and $\hat F_{np}$ are asymptotically unbiased.
- Efficiency = ratio of asymptotic variances:

$$\mathrm{Eff}(\hat F_{pa}(t) : \hat F_{np}(t)) = \frac{F(t;\theta)[1 - F(t;\theta)]}{\overset{\bullet}{F}(t;\theta)'\,I(\theta)^{-1}\,\overset{\bullet}{F}(t;\theta)}.$$

- When $\mathcal{F} = \{F(t;\theta) = 1 - \exp\{-\theta t\} : \theta > 0\}$:

$$\mathrm{Eff}(\hat F_{pa}(t) : \hat F_{np}(t)) = \frac{\exp\{\theta t\} - 1}{(\theta t)^2}$$
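The exponential-case efficiency is easy to evaluate directly. This is a small sketch of the closed form $(\exp\{\theta t\} - 1)/(\theta t)^2$ from the slide; the function name and the grid of evaluation points are illustrative assumptions.

```python
import math

def eff_param_vs_nonparam(theta_t):
    """Efficiency (exp(theta*t) - 1) / (theta*t)^2 of the parametric estimator
    relative to the EDF, under a correctly specified exponential model."""
    if theta_t <= 0:
        raise ValueError("theta*t must be positive")
    return math.expm1(theta_t) / theta_t ** 2

# Evaluate on a few assumed values of theta*t.
grid = [0.1, 0.5, 1.0, 1.6, 3.0, 5.0]
values = [eff_param_vs_nonparam(x) for x in grid]
```

On this grid the ratio stays above 1, so the parametric estimator has the smaller asymptotic variance when the exponential model is correct.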

Efficiency: Parametric/Nonparametric

Asymptotic efficiency of parametric versus nonparametric estimators under a correct negative exponential family model.

[Figure: Effi of Para Relative to NonPara in Expo Case. x-axis: Value of (Theta)(t), 0 to 5; y-axis: Efficiency (in percent).]

Whither Nonparametrics?

- A misspecified model: the fitted model is the negative exponential family, but the true model is the gamma family.
- Under the wrong model, with $\bar T = \frac{1}{n}\sum_{i=1}^n T_i$ the sample mean, the parametric estimator of $F$ is

$$\hat F_{pa}(t) = 1 - \exp\{-t/\bar T\}.$$

- Under gamma with shape $\alpha$ and scale $\theta$, and since $\bar T \sim AN(\alpha/\theta, \alpha/(n\theta^2))$, by the $\delta$-method

$$\hat F_{pa}(t) \sim AN\left(1 - \exp\left\{-\frac{\theta t}{\alpha}\right\},\ \frac{1}{n}\,\frac{(\theta t)^2}{\alpha^3}\exp\left\{-\frac{2(\theta t)}{\alpha}\right\}\right).$$
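The $\delta$-method step can be written out as follows. The helper symbol $g$ is introduced here only for the derivation and does not appear in the slides.

```latex
% g maps the sample mean to the fitted exponential df at a fixed t
g(m) = 1 - \exp\{-t/m\}, \qquad g'(m) = -\frac{t}{m^{2}}\exp\{-t/m\}.

% Evaluate at the limiting mean m = \alpha/\theta
g'(\alpha/\theta) = -\frac{t\,\theta^{2}}{\alpha^{2}}\exp\{-\theta t/\alpha\}.

% Asymptotic variance: g'(\alpha/\theta)^{2} \cdot \alpha/(n\theta^{2})
\frac{t^{2}\theta^{4}}{\alpha^{4}}\exp\{-2\theta t/\alpha\}\cdot\frac{\alpha}{n\theta^{2}}
  = \frac{1}{n}\,\frac{(\theta t)^{2}}{\alpha^{3}}\exp\{-2\theta t/\alpha\}.
```

The limiting mean $1 - \exp\{-\theta t/\alpha\}$ generally differs from the true gamma df at $t$, which is why the comparison on the next slide uses mean-squared error rather than variance alone.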

Efficiency: Under a Mis-specified Model

Simulated Effi: MSE(Non-Parametric)/MSE(Parametric) under a mis-specified exponential family model. True family of model: gamma family.

[Figure: Effi of Para vs NonPara under Misspecified Model. x-axis: Value of t; y-axis: Effi(%) (Para/Nonpara); curves for n=10, n=30, n=100, n=500.]

MMC Data: Censoring Aspect

For each unit, the red mark is the potential termination time.

[Figure: MMC Data Set. x-axis: Calendar Time; y-axis: Unit Number.]

Remark: All 19 MMC times completely observed.

Estimation of F: With Censoring

- Right-censoring variables: $C_1, C_2, \ldots, C_n$ IID $G$.
- Observables: $(Z_i, \delta_i)$, $i = 1, 2, \ldots, n$, with $Z_i = \min\{T_i, C_i\}$ and $\delta_i = I\{T_i \le C_i\}$.
- Problem: given the $(Z_i, \delta_i)$s, estimate the df $F$ or the hazard function $\Lambda$ of the $T_i$s.
- Nonparametric approaches:
  - Nonparametric MLE (Kaplan-Meier).
  - Martingale and method-of-moments.
- Pioneers: Kaplan & Meier; Efron; Nelson; Breslow; Breslow & Crowley; Aalen; Gill.

Product-Limit Estimator

Counting and at-risk processes:

$$N(t) = \sum_{i=1}^n I\{Z_i \le t;\ \delta_i = 1\}; \qquad Y(t) = \sum_{i=1}^n I\{Z_i \ge t\}$$

Hazard probability estimate at $t$:

$$\hat\Lambda(dt) = \frac{\Delta N(t)}{Y(t)} = \frac{\text{\# of observed failures at } t}{\text{\# at-risk at } t}$$

Product-Limit Estimator (PLE):

$$1 - \hat F(t) = \hat S(t) = \prod_{w \le t}\left[1 - \frac{\Delta N(w)}{Y(w)}\right]$$
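The product-limit estimator can be computed directly from the counting and at-risk processes. Below is a minimal pure-Python sketch; the function name and the small data set are made up for illustration and are not from the talk.

```python
def product_limit(z, delta):
    """Return [(t, S_hat(t))] at each distinct observed failure time t,
    where z[i] = min(T_i, C_i) and delta[i] = 1 if the failure was observed."""
    failure_times = sorted({zi for zi, di in zip(z, delta) if di == 1})
    surv, out = 1.0, []
    for t in failure_times:
        d = sum(1 for zi, di in zip(z, delta) if zi == t and di == 1)  # Delta N(t)
        y = sum(1 for zi in z if zi >= t)                                # Y(t)
        surv *= 1.0 - d / y
        out.append((t, surv))
    return out

# Hypothetical right-censored sample (delta = 0 marks a censored time).
z = [2, 3, 3, 5, 7, 8]
delta = [1, 1, 0, 1, 0, 1]
km = product_limit(z, delta)
```

Censored observations contribute to the at-risk count $Y(t)$ up to their censoring time but never trigger a factor in the product.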

Stochastic Process Approach

- A martingale $M$ is a zero-mean process which models a fair game. With $\mathcal{H}_t$ = history up to $t$:

$$E\{M(s + t) \mid \mathcal{H}_t\} = M(t).$$

- $M(t) = N(t) - \int_0^t Y(w)\Lambda(dw)$ is a martingale, so with $J(t) = I\{Y(t) > 0\}$ and stochastic integration,

$$E\left\{\int_0^t \frac{J(w)}{Y(w)}\,dN(w)\right\} = E\left\{\int_0^t J(w)\Lambda(dw)\right\}.$$

- Nelson-Aalen estimator of $\Lambda$, and PLE:

$$\hat\Lambda(t) = \int_0^t \frac{dN(w)}{Y(w)}, \quad\text{so}\quad \hat S(t) = \prod_{w \le t}\left[1 - \hat\Lambda(dw)\right].$$
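The link between the Nelson-Aalen estimator and the PLE can be sketched in code: accumulate the increments $dN(w)/Y(w)$, then take the product of one minus each increment. The function names and data below are illustrative assumptions.

```python
def nelson_aalen(z, delta):
    """Return [(t, increment, Lambda_hat(t))] at distinct observed failure times."""
    out, cum = [], 0.0
    for t in sorted({zi for zi, di in zip(z, delta) if di == 1}):
        d = sum(1 for zi, di in zip(z, delta) if zi == t and di == 1)  # dN(t)
        y = sum(1 for zi in z if zi >= t)                                # Y(t)
        cum += d / y
        out.append((t, d / y, cum))
    return out

def survivor_from_increments(na):
    """Product of [1 - Lambda_hat(dw)] over the jump points of Lambda_hat."""
    surv, out = 1.0, []
    for t, inc, _ in na:
        surv *= 1.0 - inc
        out.append((t, surv))
    return out

# Hypothetical right-censored sample.
z = [2, 3, 3, 5, 7, 8]
delta = [1, 1, 0, 1, 0, 1]
na = nelson_aalen(z, delta)
s_hat = survivor_from_increments(na)
```

Keeping the increments separate from the running sum makes the product-integral step explicit rather than approximating it with $\exp\{-\hat\Lambda(t)\}$.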

Asymptotic Properties

NAE: $\sqrt{n}[\hat\Lambda(t) - \Lambda(t)] \Rightarrow Z_1(t)$ with $\{Z_1(t) : t \ge 0\}$ a zero-mean Gaussian process with

$$d_1(t) = \mathrm{Var}(Z_1(t)) = \int_0^t \frac{\Lambda(dw)}{S(w)\,\bar G(w-)}.$$

PLE: $\sqrt{n}[\hat F(t) - F(t)] \Rightarrow Z_2(t) \overset{st}{=} S(t)Z_1(t)$ so

$$d_2(t) = \mathrm{Var}(Z_2(t)) = S(t)^2\int_0^t \frac{\Lambda(dw)}{S(w)\,\bar G(w-)}.$$

If $\bar G(w) \equiv 1$ (no censoring), $d_2(t) = F(t)S(t)$!
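The no-censoring reduction follows from a one-line integration. The sketch below assumes $F$ is continuous, so that $\Lambda(dw) = dF(w)/S(w)$.

```latex
% With \bar G \equiv 1 and F continuous:
\int_0^t \frac{\Lambda(dw)}{S(w)}
  = \int_0^t \frac{dF(w)}{S(w)^{2}}
  = \frac{1}{S(t)} - 1
  = \frac{F(t)}{S(t)},
\qquad\text{hence}\qquad
d_2(t) = S(t)^{2}\cdot\frac{F(t)}{S(t)} = F(t)\,S(t).
```

This is the same limiting variance $F(t)[1 - F(t)]$ obtained for the EDF, as expected since the PLE reduces to the EDF when nothing is censored.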

Regression Models

- Covariates: temperature, degree of usage, stress level, age, blood pressure, race, etc.
- How to account for covariates to improve knowledge of time-to-event?
- Modelling approaches:
  - Log-linear models: $\log(T) = \beta' x + \sigma\varepsilon$. The accelerated failure-time model. Error distribution to use? Normal errors not appropriate.
  - Hazard-based models: Cox proportional hazards (PH) model; Aalen's additive hazards model.

Cox ('72) PH Model: Single Event

Conditional on $x$, the hazard rate of $T$ is:

$$\lambda(t \mid x) = \lambda_0(t)\exp\{\beta' x\}.$$

$\hat\beta$ maximizes the partial likelihood function of $\beta$:

$$L_P(\beta) \equiv \prod_{i=1}^{n}\prod_{t < \infty}\left[\frac{\exp(\beta' x_i)}{\sum_{j=1}^n Y_j(t)\exp(\beta' x_j)}\right]^{\Delta N_i(t)}.$$

Aalen-Breslow semiparametric estimator of $\Lambda_0(\cdot)$:

$$\hat\Lambda_0(t) = \int_0^t \frac{\sum_{i=1}^n dN_i(w)}{\sum_{i=1}^n Y_i(w)\exp(\hat\beta' x_i)}.$$
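To make the partial likelihood concrete, here is a minimal sketch of $\log L_P(\beta)$ for a single scalar covariate. The function name and the toy data are assumptions for illustration; with tied failure times this sketch uses the full risk set for each failure, which is one common convention rather than something the slide specifies.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """log L_P(beta): sum over observed failures of
    beta*x_i - log( sum_j Y_j(t_i) * exp(beta*x_j) )."""
    ll = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if di == 1:
            # Risk set at t_i: units with Y_j(t_i) = I{Z_j >= t_i} = 1.
            risk = sum(math.exp(beta * xj) for tj, xj in zip(times, x) if tj >= ti)
            ll += beta * x[i] - math.log(risk)
    return ll

# Hypothetical data: three units, all failures observed, one covariate.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
x = [1.0, 0.0, 0.0]
ll_at_zero = cox_partial_loglik(0.0, times, events, x)
ll_at_one = cox_partial_loglik(1.0, times, events, x)
```

The baseline hazard $\lambda_0$ never appears, which is what lets $\beta$ be estimated without specifying it. In practice the maximizer would be found numerically, for example by Newton-Raphson on this function.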

MMC Data: Recurrent Aspect

Aalen and Husebye ('91) Full Data

[Figure: MMC Data Set. x-axis: Calendar Time; y-axis: Unit Number.]

Problem: Estimate inter-event time distribution.

Representation: One Subject

Observables: One Subject

- $X(s)$ = covariate vector, possibly time-dependent
- $T_1, T_2, T_3, \ldots$ = inter-event or gap times
- $S_1, S_2, S_3, \ldots$ = calendar times of event occurrences
- $\tau$ = end of observation period: assume $\tau \sim G$
- $K = \max\{k : S_k \le \tau\}$ = number of events in $[0, \tau]$
- $Z$ = unobserved frailty variable
- $N^\dagger(s)$ = number of events in $[0, s]$
- $Y^\dagger(s) = I\{\tau \ge s\}$ = at-risk indicator at time $s$
- $\mathbf{F}^\dagger = \{\mathcal{F}^\dagger_s : s \ge 0\}$ = filtration: information that includes interventions, covariates, etc.

Aspect of Sum-Quota Accrual

Observed number of events:

$$K = \max\left\{k : \sum_{j=1}^{k} T_j \le \tau\right\}$$

Induced constraint:

$$(T_1, T_2, \ldots, T_K) \text{ satisfies } \sum_{j=1}^{K} T_j \le \tau < \sum_{j=1}^{K+1} T_j.$$

$(K, T_1, T_2, \ldots, T_K)$ are all random and dependent, and $K$ is informative about $F$.
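The sum-quota constraint is easiest to see by simulating one unit. This is a sketch under assumed inputs: the function name, the exponential gap-time distribution, the seed, and the window length `tau` are all illustrative.

```python
import random

def observe_unit(tau, draw_gap, rng):
    """Accrue gap times until their running sum would exceed tau.
    Returns (K, complete gaps T_1..T_K, length of the censored last gap)."""
    gaps, total = [], 0.0
    while True:
        t = draw_gap(rng)
        if total + t > tau:
            # T_{K+1} is only known to exceed tau - S_K.
            return len(gaps), gaps, tau - total
        gaps.append(t)
        total += t

rng = random.Random(2008)   # arbitrary seed for reproducibility
tau = 5.0                   # assumed end of monitoring
k, gaps, last = observe_unit(tau, lambda r: r.expovariate(1.0), rng)
```

Because the loop stops only when the quota `tau` is exhausted, `k` is determined by the gap times themselves, which is the dependence the slide highlights.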

Recurrent Event Models: IID Case

- Parametric models:
  - HPP: $T_{i1}, T_{i2}, T_{i3}, \ldots$ IID EXP($\lambda$).
  - IID renewal model: $T_{i1}, T_{i2}, T_{i3}, \ldots$ IID $F$ where $F \in \mathcal{F} = \{F(\cdot;\theta) : \theta \in \Theta \subset \Re^p\}$; e.g., Weibull family; gamma family; etc.
- Non-parametric model: $T_{i1}, T_{i2}, T_{i3}, \ldots$ IID $F$ which is some df.
- With frailty: for each unit $i$, there is an unobservable $Z_i$ from some distribution $H(\cdot;\xi)$ and $(T_{i1}, T_{i2}, T_{i3}, \ldots)$, given $Z_i$, are IID with survivor function

$$[1 - F(t)]^{Z_i}.$$

A General Class of Full Models

Peña and Hollander (2004) model.

$$N^\dagger(s) = A^\dagger(s \mid Z) + M^\dagger(s \mid Z)$$

$$M^\dagger(s \mid Z) \in \mathcal{M}_0^2 = \text{sq-int martingales}$$

$$A^\dagger(s \mid Z) = \int_0^s Y^\dagger(w)\,\lambda(w \mid Z)\,dw$$

Intensity process:

$$\lambda(s \mid Z) = Z\,\lambda_0[\mathcal{E}(s)]\,\rho[N^\dagger(s-);\alpha]\,\psi[\beta^{t}X(s)]$$

Effective Age Process: $\mathcal{E}(s)$

Effective Age Process, $\mathcal{E}(s)$

- PERFECT Intervention: $\mathcal{E}(s) = s - S_{N^\dagger(s-)}$.
- IMPERFECT Intervention: $\mathcal{E}(s) = s$.
- MINIMAL Intervention (BP '83; BBS '85):

$$\mathcal{E}(s) = s - S_{\Gamma_{\eta(s-)}}$$

where, with $I_1, I_2, \ldots$ IID BER($p$),

$$\eta(s) = \sum_{i=1}^{N^\dagger(s)} I_i \quad\text{and}\quad \Gamma_k = \min\{j > \Gamma_{k-1} : I_j = 1\}.$$
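The three effective-age formulas can be sketched as plain functions of the calendar time $s$ and a unit's event history. The function names, the `flags` argument (standing in for the Bernoulli indicators $I_j$), and the example history are illustrative assumptions.

```python
import bisect

def backward_recurrence_age(s, event_times):
    """E(s) = s - S_{N(s-)}: time elapsed since the last event strictly before s."""
    n_before = bisect.bisect_left(event_times, s)       # N(s-)
    last = event_times[n_before - 1] if n_before > 0 else 0.0
    return s - last

def calendar_age(s, event_times):
    """E(s) = s: the effective age is never reset."""
    return s

def age_since_last_flagged(s, event_times, flags):
    """E(s) = s - S_{Gamma_{eta(s-)}}: reset only at events whose indicator I_j is 1."""
    last = 0.0
    for time, flag in zip(event_times, flags):
        if time < s and flag == 1:
            last = time
    return s - last

# Hypothetical history: sorted calendar event times and their indicators I_j.
events = [1.0, 2.5, 4.0]
flags = [1, 0, 1]
```

The strict inequality (events before $s$, not at $s$) mirrors the left limit $N^\dagger(s-)$, which keeps the effective age predictable.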

Semi-Parametric Estimation: No Frailty

Observed data for $n$ subjects:

$$\{(X_i(s), N_i^\dagger(s), Y_i^\dagger(s), \mathcal{E}_i(s)) : 0 \le s \le s^*\},\quad i = 1, \ldots, n$$

- $N_i^\dagger(s)$ = # of events in $[0, s]$ for the $i$th unit
- $Y_i^\dagger(s)$ = at-risk indicator at $s$ for the $i$th unit

with the model for the 'signal' being

$$A_i^\dagger(s) = \int_0^s Y_i^\dagger(v)\,\rho[N_i^\dagger(v-);\alpha]\,\psi[\beta^{t}X_i(v)]\,\lambda_0[\mathcal{E}_i(v)]\,dv$$

where $\lambda_0(\cdot)$ is an unspecified baseline hazard rate function.

Processes and Notations

Calendar/gap time processes:

$$N_i(s, t) = \int_0^s I\{\mathcal{E}_i(v) \le t\}\,N_i^\dagger(dv)$$

$$A_i(s, t) = \int_0^s I\{\mathcal{E}_i(v) \le t\}\,A_i^\dagger(dv)$$

Notational reductions:

$$\mathcal{E}_{ij-1}(v) \equiv \mathcal{E}_i(v)\,I_{(S_{ij-1}, S_{ij}]}(v)\,I\{Y_i^\dagger(v) > 0\}$$

$$\varphi_{ij-1}(w \mid \alpha, \beta) \equiv \frac{\rho(j-1;\alpha)\,\psi\{\beta^{t}X_i[\mathcal{E}_{ij-1}^{-1}(w)]\}}{\mathcal{E}'_{ij-1}[\mathcal{E}_{ij-1}^{-1}(w)]}$$

Generalized At-Risk Process

$$Y_i(s, w \mid \alpha, \beta) \equiv \sum_{j=1}^{N_i^\dagger(s-)} I_{(\mathcal{E}_{ij-1}(S_{ij-1}),\ \mathcal{E}_{ij-1}(S_{ij})]}(w)\,\varphi_{ij-1}(w \mid \alpha, \beta) + I_{(\mathcal{E}_{iN_i^\dagger(s-)}(S_{iN_i^\dagger(s-)}),\ \mathcal{E}_{iN_i^\dagger(s-)}(s \wedge \tau_i)]}(w)\,\varphi_{iN_i^\dagger(s-)}(w \mid \alpha, \beta)$$

For the IID renewal model (PSH, 01) this simplifies to:

$$Y_i(s, w) = \sum_{j=1}^{N_i^\dagger(s-)} I\{T_{ij} \ge w\} + I\{(s \wedge \tau_i) - S_{iN_i^\dagger(s-)} \ge w\}$$
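For the IID renewal case, the at-risk count for one unit can be sketched as follows. The function name and the example gap times are assumptions; `gaps` is taken to hold only the completely observed gap times of the unit, so their sum does not exceed `tau`.

```python
def renewal_at_risk(s, w, gaps, tau):
    """Y_i(s, w) = sum_{j <= N(s-)} I{T_ij >= w} + I{(s ^ tau) - S_{iN(s-)} >= w}."""
    # Calendar times S_i1, S_i2, ... of the observed events.
    cal, total = [], 0.0
    for t in gaps:
        total += t
        cal.append(total)
    n_before = sum(1 for c in cal if c < s)              # N_i(s-)
    last = cal[n_before - 1] if n_before > 0 else 0.0    # S_{iN(s-)}
    y = sum(1 for t in gaps[:n_before] if t >= w)        # completed gaps at risk at w
    y += 1 if (min(s, tau) - last) >= w else 0           # ongoing gap, if long enough
    return y

# Hypothetical unit: complete gaps 2, 1, 3 (events at 2, 3, 6), monitored to tau = 7.5.
gaps = [2.0, 1.0, 3.0]
tau = 7.5
```

Each completed gap, plus the still-running gap, is treated as a separate "at risk" contribution on the gap-time scale `w`, so a single unit can contribute more than 1 to the at-risk count.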

Estimation of $\Lambda_0$

$$A_i(s, t \mid \alpha, \beta) = \int_0^t Y_i(s, w \mid \alpha, \beta)\,\Lambda_0(dw)$$

$$S_0(s, t \mid \alpha, \beta) = \sum_{i=1}^n Y_i(s, t \mid \alpha, \beta)$$

$$J(s, t \mid \alpha, \beta) = I\{S_0(s, t \mid \alpha, \beta) > 0\}$$

Generalized Nelson-Aalen 'estimator':

$$\hat\Lambda_0(s, t \mid \alpha, \beta) = \int_0^t \left\{\frac{J(s, w \mid \alpha, \beta)}{S_0(s, w \mid \alpha, \beta)}\right\}\left\{\sum_{i=1}^n N_i(s, dw)\right\}$$

Estimation of $\alpha$ and $\beta$

Partial likelihood (PL) process:

$$L_P(s^* \mid \alpha, \beta) = \prod_{i=1}^{n}\prod_{j=1}^{N_i^\dagger(s^*)}\left[\frac{\rho(j-1;\alpha)\,\psi[\beta^{t}X_i(S_{ij})]}{S_0[s^*, \mathcal{E}_i(S_{ij}) \mid \alpha, \beta]}\right]^{\Delta N_i^\dagger(S_{ij})}$$

PL-MLE: $\hat\alpha$ and $\hat\beta$ are maximizers of the mapping

$$(\alpha, \beta) \mapsto L_P(s^* \mid \alpha, \beta)$$

Iterative procedures. Implemented in an R package called gcmrec (González, Slate, Peña '04).

Estimation of $\bar F_0$

G-NAE of $\Lambda_0(\cdot)$: $\hat\Lambda_0(s^*, t) \equiv \hat\Lambda_0(s^*, t \mid \hat\alpha, \hat\beta)$

G-PLE of $\bar F_0(t)$:

$$\hat{\bar F}_0(s^*, t) = \prod_{w=0}^{t}\left[1 - \frac{\sum_{i=1}^n N_i(s^*, dw)}{S_0(s^*, w \mid \hat\alpha, \hat\beta)}\right]$$

For the IID renewal model with $\mathcal{E}_i(s) = s - S_{iN_i^\dagger(s-)}$, $\rho(k;\alpha) = 1$, and $\psi(w) = 1$, the generalized product-limit estimator in PSH (2001, JASA) obtains.
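In the IID renewal special case, the generalized product-limit estimator pools all complete gap times together with each unit's censored last gap. The following is a minimal sketch of that special case only, evaluated at an $s^*$ beyond every $\tau_i$; the function name and the three-unit data set are illustrative assumptions, not the gcmrec implementation.

```python
def gple_survivor(units):
    """units: list of (complete_gaps, tau) pairs, one per subject.
    Returns [(t, estimate of 1 - F(t))] at each distinct complete gap time."""
    complete = [t for gaps, _ in units for t in gaps]
    # Right-censored final gap of each unit: tau_i - S_{iK_i}.
    censored = [tau - sum(gaps) for gaps, tau in units]
    surv, out = 1.0, []
    for t in sorted(set(complete)):
        d = sum(1 for c in complete if c == t)                   # sum_i N_i(s*, dt)
        y = (sum(1 for c in complete if c >= t)
             + sum(1 for c in censored if c >= t))               # sum_i Y_i(s*, t)
        surv *= 1.0 - d / y
        out.append((t, surv))
    return out

# Hypothetical data: (observed complete gaps, end of monitoring tau).
units = [([2.0, 1.0], 4.0), ([3.0], 3.5), ([], 1.5)]
est = gple_survivor(units)
```

The point estimate has the same form as a product-limit estimate on the pooled gaps; what changes under sum-quota accrual is its sampling variability, since the pooled gaps are not independent right-censored observations.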

First Application: MMC Data Set

Aalen and Husebye (1991) data. Estimates of the distribution of the MMC period.

[Figure: survivor curves. x-axis: Migrating Motor Complex (MMC) Time, in minutes; y-axis: Survivor Probability Estimate; curves: IIDPLE, WCPLE, FRMLE.]

Second Application: Bladder Data Set

Bladder cancer data pertaining to times to recurrence for n = 85 subjects studied in Wei, Lin and Weissfeld ('89).

[Figure: event plot. x-axis: Calendar Time; y-axis: Subject.]

Results and Comparisons

Estimates from Different Methods for Bladder Data

| Cova | Para | AG | WLW (Marginal) | PWP (Cond*nal) | General Model, Perfect (a) | General Model, Minimal (b) |
| log N(t-) | α | - | - | - | .98 (.07) | .79 |
| Frailty | ξ | - | - | - | ∞ | .97 |
| rx | β1 | -.47 (.20) | -.58 (.20) | -.33 (.21) | -.32 (.21) | -.57 |
| Size | β2 | -.04 (.07) | -.05 (.07) | -.01 (.07) | -.02 (.07) | -.03 |
| Number | β3 | .18 (.05) | .21 (.05) | .12 (.05) | .14 (.05) | .22 |

(a) Effective age is backward recurrence time ($\mathcal{E}(s) = s - S_{N^\dagger(s-)}$).
(b) Effective age is calendar time ($\mathcal{E}(s) = s$).

Details: Peña, Slate, and Gonzalez (2007). JSPI.

Sum-Quota Effect: IID Renewal

Generalized product-limit estimator $\hat{\bar F}$ of the common gap-time df $F$ presented in PSH (2001, JASA).

$$\sqrt{n}\,(\hat{\bar F}(\cdot) - \bar F(\cdot)) \Longrightarrow GP(0, \sigma^2(\cdot))$$

$$\sigma^2(t) = \bar F(t)^2\int_0^t \frac{d\Lambda(w)}{\bar F(w)\,\bar G(w-)\,[1 + \nu(w)]}$$

$$\nu(w) = \frac{1}{\bar G(w-)}\int_w^\infty \rho^*(v - w)\,dG(v)$$

$$\rho^*(\cdot) = \sum_{j=1}^{\infty} F^{\star j}(\cdot) = \text{renewal function}$$

Efficiency: Some Questions

- Is it worth using the additional event recurrences in the analysis? How much do we gain in efficiency?
- Impact of $G$, the distribution of the $\tau_i$s, when $G$ is related to the inter-event time distribution? Loss if this informative monitoring structure is ignored?
- (In)Efficiency of GPLE relative to an estimator which exploits the informative monitoring structure?
- What is a reasonable informative monitoring model for examining these questions?
- Could we extend similar studies that were performed for the PLE using the so-called Koziol-Green Model?

Koziol-Green Model

- Koziol & Green (1976); Chen, Hollander & Langberg (1982); Cheng & Lin (1989)
- $T \sim F$ and $C \sim G$ with $T$ failure time and $C$ right-censoring time.
- Assumption: $1 - G = (1 - F)^\beta$ for some $\beta \ge 0$.
- $Z = \min(T, C)$ and $\delta = I\{T \le C\}$ are independent; $Z \sim \bar H = \bar F^{\beta+1}$; $\delta \sim$ Ber($1/(1 + \beta)$); $\beta$ = censoring parameter.
- CHL: exact properties of the PLE: mean, variance, mean-squared error.
- CL: efficiency of PLE relative to estimator that exploits KG assumption.
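The three Koziol-Green facts listed above can be verified in a few lines. The sketch assumes $T$ and $C$ are independent and $F$ is continuous; $\bar F = 1 - F$ and $\bar G = 1 - G$ denote survivor functions.

```latex
% Survivor function of Z = min(T, C)
\bar H(t) = \Pr\{T > t\}\Pr\{C > t\} = \bar F(t)\,\bar F(t)^{\beta} = \bar F(t)^{\beta+1}.

% Probability of an uncensored observation (substitute u = \bar F(w))
\Pr\{\delta = 1\} = \int_0^\infty \bar G(w)\,dF(w)
  = \int_0^\infty \bar F(w)^{\beta}\,dF(w)
  = \int_0^1 u^{\beta}\,du = \frac{1}{1+\beta}.

% Independence of Z and \delta
\Pr\{Z > t,\ \delta = 1\} = \int_t^\infty \bar F(w)^{\beta}\,dF(w)
  = \frac{\bar F(t)^{\beta+1}}{1+\beta}
  = \Pr\{Z > t\}\,\Pr\{\delta = 1\}.
```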

Generalized KG: Recurrent Events

- For a unit or subject, inter-event times $T_j$s are IID $F$; end-of-monitoring time $\tau$ has distribution $G$.
- Assumption: $1 - G = (1 - F)^\beta$ for some $\beta \ge 0$.
- Remark: the independence property that allowed exact derivations in right-censored single-event settings does not play a role in this recurrent event setting.
- Efficiency comparisons performed via asymptotic analysis and through computer simulations.
- Two cases: (i) $F$ is exponential, and (ii) $F$ is two-parameter Weibull.

Ignoring Informative Structure

- When $F$ is exponential, so the model is HPP.
- $\hat\theta_n$: estimator that exploits informative monitoring.
- $\tilde\theta_n$: estimator that ignores informative monitoring.
- Efficiency result: $\Delta\mathrm{ARE}(\hat\theta_n : \tilde\theta_n) = 0$.
- Surprising result!? It turns out that in the exponential setting, these two estimators are identical.
- The estimators are:

$$\hat\theta_n = \tilde\theta_n = \frac{\sum_{i=1}^n K_i}{\sum_{i=1}^n \tau_i}$$
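The common estimator in the exponential case is the total number of events divided by the total monitoring time. A minimal sketch, with a made-up function name and data:

```python
def hpp_rate_estimate(counts, taus):
    """theta_hat = sum_i K_i / sum_i tau_i (events per unit of monitoring time)."""
    total_time = sum(taus)
    if total_time <= 0:
        raise ValueError("total monitoring time must be positive")
    return sum(counts) / total_time

# Hypothetical data: event counts K_i and monitoring times tau_i for three units.
counts = [3, 0, 2]
taus = [2.0, 1.0, 2.0]
theta_hat = hpp_rate_estimate(counts, taus)
```

Only the pairs $(K_i, \tau_i)$ enter the estimate; the individual event times are not needed.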

Single-Event Analysis

- Single-event analysis: only the first, possibly right-censored, observations are used in the statistical analysis?
- $\check\theta_n$: depends only on the first event times.
- When $F$ is exponential, we have

$$\Delta\mathrm{ARE}(\hat\theta_n : \check\theta_n) = \frac{1}{\beta}.$$

- $1/\beta$: (approximate) expected number of events per unit.

Inefficiency of GPLE

- How inefficient is the generalized PLE of $F$ compared to the parametric estimator that exploits the informative monitoring structure?
- $\hat F_n$: parametric estimator; exploits the informative structure.
- $\tilde F_n$: generalized PLE in PSH (JASA, 01).
- When $F$ is exponential,

$$\mathrm{ARE}(\tilde F_n(t) : \hat F_n(t)) = \frac{[(1 + \beta)t]^2}{\exp[(1 + \beta)t] - 1}, \quad t \ge 0.$$

Efficiency Plot: Exponential F

GPLE ($\tilde F$) versus Parametric Estimator ($\hat F$)

[Figure: x-axis: p = Prob(min(T,tau) > t); y-axis: ARE(p).]

Efficiencies: Weibull F

| n | θ1 | θ2 | β | MeanEvs | Eff($\hat\theta$ : $\tilde\theta$) | Eff($\hat\theta$ : $\check\theta$) |
| 50 | 0.9 | 1 | 0.3 | 3.90 | 1.26 | 30.27 |
| 50 | 0.9 | 1 | 0.5 | 2.25 | 1.41 | 13.48 |
| 50 | 0.9 | 1 | 0.7 | 1.57 | 1.66 | 7.96 |
| 50 | 1.0 | 1 | 0.3 | 3.34 | 1.30 | 23.29 |
| 50 | 1.0 | 1 | 0.5 | 2.00 | 1.53 | 9.91 |
| 50 | 1.0 | 1 | 0.7 | 1.42 | 1.71 | 6.69 |
| 50 | 1.5 | 1 | 0.3 | 1.97 | 1.49 | 9.53 |
| 50 | 1.5 | 1 | 0.5 | 1.34 | 1.75 | 5.55 |
| 50 | 1.5 | 1 | 0.7 | 1.02 | 2.11 | 3.82 |

Efficiency Plot: Weibull F

GPLE ($\tilde F$) versus Parametric Estimator ($\hat F$)

[Figure: Simul Params: n = 50, Theta1 = 1.5, Theta2 = 1. x-axis: F(t); y-axis: Relative Efficiency (F_Tilde:F_Hat); curves: beta = 0.3, beta = 0.5, beta = 0.7.]

On Marginal Modeling: WLW and PWP

- $k_0$ specified (usually the maximum value of the observed $K$s).
- Assume a Cox PH-type model for each $S_k$, $k = 1, \ldots, k_0$.
- Counting processes ($k = 1, 2, \ldots, k_0$):

$$N_k(s) = I\{S_k \le s;\ S_k \le \tau\}$$

- At-risk processes ($k = 1, 2, \ldots, k_0$):

$$Y_k^{WLW}(s) = I\{S_k \ge s;\ \tau \ge s\}$$

$$Y_k^{PWP}(s) = I\{S_{k-1} < s \le S_k;\ \tau \ge s\}$$

Working Model Specifications

WLW model:

$$\left\{N_k(s) - \int_0^s Y_k^{WLW}(v)\,\lambda_{0k}^{WLW}(v)\exp\{\beta_k^{WLW}X(v)\}\,dv\right\}$$

PWP model:

$$\left\{N_k(s) - \int_0^s Y_k^{PWP}(v)\,\lambda_{0k}^{PWP}(v)\exp\{\beta_k^{PWP}X(v)\}\,dv\right\}$$

are assumed to be zero-mean martingales (in $s$).

Parameter Estimation

- See Therneau & Grambsch's book Modeling Survival Data: Extending the Cox Model.
- $\hat\beta_k^{WLW}$ and $\hat\beta_k^{PWP}$ obtained via partial likelihood (Cox (72) and Andersen and Gill (82)).
- Overall $\beta$-estimate:

$$\hat\beta^{WLW} = \sum_{k=1}^{k_0} c_k\,\hat\beta_k^{WLW};$$

  $c_k$s being 'optimal' weights. See WLW paper.
- $\hat\Lambda_{0k}^{WLW}(\cdot)$ and $\hat\Lambda_{0k}^{PWP}(\cdot)$: Aalen-Breslow-Nelson type estimators.

Two Relevant Questions

- Question 1: When one assumes marginal models for the $S_k$s that are of the Cox PH-type, does there exist a full model that actually induces such PH-type marginal models?
- Answer: YES, by a very nice paper by Yang and Ying (Biometrika, 2001). BUT, the joint model obtained is rather 'limited'.
- Question 2: If one assumes Cox PH-type marginal models for the $S_k$s (or $T_k$s), but the true full model does not induce such PH-type marginal models [which may usually be the case in practice], what are the consequences?

Case of the HPP Model

- True full model: for a unit with covariate $X = x$, events occur according to an HPP model with rate:

$$\lambda(t \mid x) = \theta\exp(\beta x).$$

- For this unit, inter-event times $T_k$, $k = 1, 2, \ldots$, are IID exponential with mean time $1/\lambda(t \mid x)$.
- Assume also that $X \sim$ BER($p$) and $\mu_\tau = E(\tau)$.
- Main goal is to infer about the regression coefficient $\beta$ which relates the covariate $X$ to the event occurrences.

Full Model Analysis

$\hat\beta$ solves

$$\frac{\sum X_i K_i}{\sum K_i} = \frac{\sum \tau_i X_i\exp(\hat\beta X_i)}{\sum \tau_i\exp(\hat\beta X_i)}.$$

$\hat\beta$ does not directly depend on the $S_{ij}$s. Why?

Sufficiency: the $(K_i, \tau_i)$s contain all information on $(\theta, \beta)$.

$$(S_{i1}, S_{i2}, \ldots, S_{iK_i}) \mid (K_i, \tau_i) \overset{d}{=} \tau_i\,(U_{(1)}, U_{(2)}, \ldots, U_{(K_i)}).$$

Asymptotics:

$$\hat\beta \sim AN\left(\beta,\ \frac{1}{n}\,\frac{(1 - p) + pe^{\beta}}{\mu_\tau\theta\,[(1 - p) + pe^{\beta}]}\right).$$
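When every $X_i$ is 0 or 1, as under the BER($p$) assumption, the estimating equation above can be solved in closed form: grouping the sums by covariate value gives $e^{\hat\beta} = (K^{(1)}/\tau^{(1)})/(K^{(0)}/\tau^{(0)})$, the ratio of the two groups' event rates. The sketch below implements that rearrangement; the function names and data are illustrative assumptions.

```python
import math

def beta_hat_binary(x, k, tau):
    """Closed-form root of the estimating equation when each x_i is 0 or 1."""
    k1 = sum(ki for xi, ki in zip(x, k) if xi == 1)
    k0 = sum(ki for xi, ki in zip(x, k) if xi == 0)
    t1 = sum(ti for xi, ti in zip(x, tau) if xi == 1)
    t0 = sum(ti for xi, ti in zip(x, tau) if xi == 0)
    if min(k1, k0) <= 0 or min(t1, t0) <= 0:
        raise ValueError("both groups need events and positive monitoring time")
    return math.log((k1 / t1) / (k0 / t0))

def estimating_gap(beta, x, k, tau):
    """Left side minus right side of the estimating equation at a given beta."""
    lhs = sum(xi * ki for xi, ki in zip(x, k)) / sum(k)
    num = sum(ti * xi * math.exp(beta * xi) for xi, ti in zip(x, tau))
    den = sum(ti * math.exp(beta * xi) for xi, ti in zip(x, tau))
    return lhs - num / den

# Hypothetical data: covariates, event counts K_i, monitoring times tau_i.
x = [1, 1, 0, 0]
k = [4, 6, 2, 3]
tau = [2.0, 3.0, 2.0, 3.0]
b = beta_hat_binary(x, k, tau)
```

Here the $X = 1$ group has rate $10/5 = 2$ and the $X = 0$ group has rate $5/5 = 1$, so `b` equals $\log 2$; `estimating_gap` provides a check that this value solves the equation.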

Some Questions

- Under WLW or the PWP: how are $\beta_k^{WLW}$ and $\beta_k^{PWP}$ related to $\theta$ and $\beta$?
- Impact of event position $k$?
- Are we ignoring that the $K_i$s are informative? Why not also a marginal model on the $K_i$s?
- Are we violating the Sufficiency Principle?
- Results simulation-based: Therneau & Grambsch book ('01) and Metcalfe & Thompson (SMMR, '07).
- Comment by D. Oakes that PWP estimates are less biased than WLW estimates.

Properties of $\hat\beta_k^{WLW}$

- Let $\hat\beta_k^{WLW}$ be the partial likelihood MLE of $\beta$ based on the at-risk process $Y_k^{WLW}(v)$.
- Question: Does $\hat\beta_k^{WLW}$ converge to $\beta$?
- $g_k(w) = w^{k-1}e^{-w}/\Gamma(k)$: standard gamma pdf.
- $\bar G_k(v) = \int_v^\infty g_k(w)\,dw$: standard gamma survivor function.
- $\bar G(\cdot)$: survivor function of $\tau$.
- $E(\cdot)$: denotes expectation wrt $X$.

Limit Value (LV) of $\hat\beta_k^{WLW}$

Limit value $\beta_k^* = \beta_k^*(\theta, \beta)$ of $\hat\beta_k^{WLW}$: solution in $\beta^*$ of

$$\int_0^\infty E\left(X\theta e^{\beta X}g_k(v\theta e^{\beta X})\right)\bar G(v)\,dv = \int_0^\infty e_k^{WLW}(v;\theta,\beta,\beta^*)\,E\left(\theta e^{\beta X}g_k(v\theta e^{\beta X})\right)\bar G(v)\,dv$$

where

$$e_k^{WLW}(v;\theta,\beta,\beta^*) = \frac{E\left(Xe^{\beta^* X}\,\bar G_k(v\theta e^{\beta X})\right)}{E\left(e^{\beta^* X}\,\bar G_k(v\theta e^{\beta X})\right)}$$

Asymptotic bias of $\hat\beta_k^{WLW}$ $= \beta_k^* - \beta$

Bias Plots for WLW Estimator

Colors pertain to the value of k, the event position: k = 1: Black; k = 2: Red; k = 3: Green; k = 4: DarkBlue; k = 5: LightBlue

[Figure, two panels. Theoretical: x-axis Beta, y-axis Asymptotic Bias. Simulated: x-axis True Beta, y-axis Simulated Bias of WLW.]

On PWP Estimators

Main difference between WLW and PWP:

$$E(Y_k^{WLW}(v) \mid X) = \bar G(v)\,\bar G_k(v\theta\exp(\beta X));$$

$$E(Y_k^{PWP}(v) \mid X) = \bar G(v)\,\frac{g_k(v\theta\exp(\beta X))}{\theta\exp(\beta X)}.$$

- Leads to: $u_k^{PWP}(s;\theta,\beta) = 0$ for $k = 1, 2, \ldots$.
- $\hat\beta_k^{PWP}$ are asymptotically unbiased for $\beta$ for each $k$ (at least in this HPP model)!
- Theoretical result consistent with observed results from simulation studies and D. Oakes' observation.

Concluding Remarks

- Recurrent events prevalent in many areas.
- Dynamic models: accommodate unique aspects.
- More research in inference for dynamic models.
- Current limitation: tracking effective age.
- Efficiency gains with recurrences.
- Caution: informative aspects of model.
- Caution: marginal modeling approaches.
- Dynamic recurrent event modeling: challenging and still fertile!

Acknowledgements

- Thanks to NIH grants that partially support research.
- Research collaborators: Myles Hollander, Rob Strawderman, Elizabeth Slate, Juan Gonzalez, Russ Stocker, Akim Adekpedjou, and Jonathan Quiton.
- Thanks to current students: Alex McLain, Laura Taylor, Josh Habiger, and Wensong Wu.
- Thanks to all of you!