Additive Hazards - GitHub Pagesleemcdaniel.github.io/presentations/addhaz.pdfAnother Form The...

Preview:

Citation preview

Additive Hazards

1 / 34

Motivation

Let λ(t|Zi ) be the conditional hazard function given the covariateprocess

Zi (t) = (Zi1(t), . . . ,Zip(t))′

Then an extraordinarily flexible model is

λ(t|Zi ) = α(t,Zi (t)) (1)

With α an unknown function of time and the covariates.

2 / 34

Motivation

• First assume α(t, 0) = 0.

• Then take a first order Taylor expanion of α(t, z) about 0

Now it turns out that (1) reduces to

λ(t|Zi ) =

p∑j=1

αj(t)Zij(t) (2)

This is Aalen’s additive hazard model.

3 / 34

Another Form

The previous model is the original formulation. Let’s instead use

λ(t|Zj(t)) = β0(t) +

p∑k=1

βk(t)Zjk(t) (3)

to follow the notation in Klein and Moeschberger.

4 / 34

A Quick Definition

First we make a design matrix X(t), an n× (p + 1) matrix. For theith row, set

Xi (t) = Yi (t)(1,Zi (t)).

So the ith row of X(t) is (1,Zi (t)) if subject i is at risk at time t,otherwise 0.

5 / 34

Martingale Form

By the form in (3),

M(t) = N(t)−∫ t

0X(u)β(u)du

where M(t) is a n × 1 vector of martingales. So

dN(t) = X(t)β(t) + dM(t)

Now set dM(t) = 0

6 / 34

Martingale Form

∫ t

0β(u)du =

∫ t

0X−(u)dN(u)

=∑Tj≤t

δjX−(Tj), for t ≤ τ

This gives a way to estimate what we need.

Note that X−(u) can be any generalized inverse.

7 / 34

Estimation

We won’t estimate βk(t) directly, instead estimate

Bk(t) =

∫ t

0βk(u)du

for each k .

Bk(t) is called a cumulative regression function

Now we will use least-squares.

8 / 34

Another Definition

Let I(t) be the n× 1 vector with ith element equal to 1 if subject idies at t and 0 otherwise.

This is essentially dN(t).

9 / 34

Estimation

The least squares estimate of B(t) = (B0(t), . . . ,Bp(t))′ is

B̂(t) =∑Ti≤t

[X′(Ti )X(Ti )

]−1X′(Ti )I(Ti ) (4)

This is using the generalized inverse suggested by Aalen.

10 / 34

Variance

(B̂− B)(t) =

∫ t

0X−(u)dN(u)−

∫ t

0β(u)du

=

∫ t

0X−(u) {X(u)β(u) + dM(u)} −

∫ t

0β(u)du

=

∫ t

0X−(u)dM(u)

and so

〈B̂− B〉(t) =

∫ t

0X−(u)〈dM(u)〉X−(u)′

=

∫ t

0X−(u)diag(Y(u)λ(u))X−(u)′

11 / 34

Variance

A good estimator for Y(t)λ(t) is dN(t)

So, using I(t) for dN(t) and the specific choice of generalizedinverse,

V̂ar(B̂(t))

=∑Ti≤t

[X′(Ti )X(Ti )

]−1X′(Ti )D(I(Ti ))X(Ti )

{[X′(Ti )X(Ti )

]−1}′where D(I(t)) is diag(I(t)).

12 / 34

Estimation

• [X′(Ti )X(Ti )]−1X′(Ti ) is one specific choice for X(Ti )−

• Based on least squares

• Non-optimal

• Optimality requires the true values of parameter functions

• Estimator is defined as long as X′(Ti )X(Ti ) is invertible

13 / 34

Simulation

Similar (but not identical) to Aalen’s 1989 paper, we will use

λ(t) = 1 + 1.75I{t ≤ 0.2}Z1 + 3Z2

with Z1, Z2 binary and

P(Z1 = 0) = P(Z2 = 0) = 1/2

No censoring

14 / 34

Cumulative Regression Functions

Initial risk set = 100

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Time

Cum

ulat

ive R

egre

ssio

n Fu

nctio

n

Z1Z2Intercept

15 / 34

Cumulative Regression Functions

Initial risk set = 1000

0.0 0.2 0.4 0.6 0.8 1.0

01

23

4

Time

Cum

ulat

ive R

egre

ssio

n Fu

nctio

n

Z1Z2Intercept

16 / 34

Cumulative Regression Functions

Initial risk set = 10000

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Time

Cum

ulat

ive R

egre

ssio

n Fu

nctio

n

Z1Z2Intercept

17 / 34

Testing

Aalen presents the null hypothesis:

Hj : βj(t) = 0 ∀t

A test statistic for Hj is given by the (j + 1)th element of

U =∑Ti

K(Ti )I(Ti ) =∑Ti

W(Ti )[X′(Ti )X(Ti )]−1X′(Ti )I(Ti )

(5)with

W(t) ={

diag[X′(t)X(t)]−1}−1

18 / 34

Testing

The covariance matrix of U is

V =∑Ti

K(Ti )diag(I(Ti ))K′(Ti ) (6)

ThenU′VU

is asymptotically χ2p when Hj holds for all j .

19 / 34

Efficiency Considerations

• Based on least squares

• No guarantee estimates are optimal

• No guarantee tests are optimal

20 / 34

Lung Data

• From survival package in R: Survival in patients with advancedlung cancer from the North Central Cancer Treatment Group.

• Performance scores rate how well the patient can performusual daily activities.

• We will use ECOG performance score as covariate

21 / 34

ECOG Score

Score ECOG

0 Fully active1 Restricted activity but ambulatory and capable of light work2 Ambulatory, capable of self care, no work3 Limited self care, bed or chair most of the time4 Completely disabled5 Dead

22 / 34

aareg() Function

>library(survival)

>aalen <- aareg(Surv(time, status) ~ age + sex + ph.ecog,

data=lung)

>aalen

slope coef se(coef) z p

Intercept 5.05e-03 5.87e-03 4.74e-03 1.240 0.216000

age 4.01e-05 7.15e-05 7.23e-05 0.989 0.323000

sex -3.16e-03 -4.03e-03 1.22e-03 -3.310 0.000935

ph.ecog 3.01e-03 3.67e-03 1.02e-03 3.610 0.000303

Chisq=26.18 on 3 df, p=8.7e-06; test weights=aalen

23 / 34

ECOG Cumulative Regression Function

0 200 400 600 800

0.00.5

1.01.5

Time

ph.ec

og

24 / 34

Lin and Ying’s Model

25 / 34

Lin and Ying’s Model

Lin and Ying (1994) present the reduced model

λ(t|Zj(t)) = β0(t) +

p∑k=1

βkZjk(t) (7)

With the usual definitions, the intensity function for Ni (t) is

Yi (t)dΛ(t;Zi ) = Yi (t){dΛ0(t) + β′0Zi (t)dt}

with Λ0(t) =∫ t0 λ0(u)du

26 / 34

Martingale Structure

Now it’s useful to note that Ni (·) can be decomposed into

Ni (t) =Mi (t) +

∫ t

0Yi (u)dΛ(u;Zi )

=Mi (t) +

∫ t

0Yi (u){dΛ0(t) + β′0Zi (t)dt} (8)

Then the obvious way to estimate Λ0(t) is by

Λ̂0(β̂, t) =

∫ t

0

∑ni=1{dNi (u)− Yi (u)β̂′Zi (u)du}∑n

j=1 Yj(u)

27 / 34

Estimation

Use the estimating function

U(β) =n∑

i=1

∫ ∞0

Zi (t){dNi (t)− Yi (t)d Λ̂0(β, t)− Yi (t)β′Zi (t)dt}

Which is equivalent to

U(β) =n∑

i=1

∫ ∞0{Zi (t)− Z̄ (t)}{dNi (t)− Yi (t)β′Zi (t)dt}

where

Z̄ (t) =

∑nj=1 Yj(t)Zj(t)∑n

j=1 Yj(t)

28 / 34

Estimation

Then U(β) = 0 and solving, we get that

β̂ = A−1

[n∑

i=1

∫ ∞0{Zi (t)− Z̄ (t)}dNi (t)

]

with

A =

[n∑

i=1

∫ ∞0

Yi (t){Zi (t)− Z̄ (t)}⊗2dt

]

29 / 34

Variance Estimation

The covariance of β̂ can be estimated by

(n−1A

)−1 [1

n

n∑i=1

∫ ∞0{Zi (t)− Z̄ (t)}⊗2dNi (t)

] (n−1A

)−1

30 / 34

Properties of the Estimating Equation

If β0 is the true covariate vector, then

U(β0) =n∑

i=1

∫ ∞0{Zi (t)− Z̄ (t)}dMi (t)

and n−1/2U(β0)→d N(0,Σ), with Σ estimated by

n−1n∑

i=1

∫ ∞0{Zi (t)− Z̄ (t)}⊗2dNi (t)

31 / 34

Asymptotic Considerations

• No guarantee of optimal consistency

• Weighting by true hazard leads to optimality

• Can estimate hazard, then weight

32 / 34

ahaz() Function

library(ahaz)

attach(lung)

time2 <- time[!is.na(ph.ecog) & !is.na(time)]

time2 <- time2 + runif(length(time2), 0, 0.001)

sex2 <- sex[!is.na(ph.ecog) & !is.na(time)]

age2 <- age[!is.na(ph.ecog) & !is.na(time)]

status2 <- status[!is.na(ph.ecog) & !is.na(time)]

ph.ecog2 <- ph.ecog[!is.na(ph.ecog) & !is.na(time)]

ly <- ahaz(Surv(time2, status2),

cbind(age2,sex2,ph.ecog2))

33 / 34

Results

Coefficients:

Estimate Std. Error Z value Pr(>|z|)

age2 2.109e-05 2.149e-05 0.982 0.326304

sex2 -1.216e-03 3.595e-04 -3.384 0.000715 ***

ph.ecog2 1.116e-03 3.039e-04 3.672 0.000240 ***

34 / 34