
Survival Analysis using R

Bruce L. Jones

Department of Statistical and Actuarial Sciences, The University of Western Ontario

March 24, 2010

Outline

• What is R?

• Why use R?

• A bit about R

• What is Survival Analysis?

• The survival package in R

• Example


What is R?

• R is a free software environment for statistical computing and graphics.

• It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

• R is very popular among researchers in statistics.

• R is similar in appearance to S.

• R was initially written by Ross Ihaka and Robert Gentleman.


Why use R?

• It contains advanced statistical routines not yet available in other packages.

• It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner.

• It has state-of-the-art graphics capabilities.

• It’s free. Just go to http://www.r-project.org


Assignment, Vectors and Arrays

> 1+2*3

[1] 7

> x=3

> y<-2

> x+y

[1] 5

> z=c(2,3,4,5)

> z

[1] 2 3 4 5

> 2*z

[1] 4 6 8 10

>


Assignment, Vectors and Arrays

> z=2:5

> z

[1] 2 3 4 5

> z=seq(2,5,1)

> z

[1] 2 3 4 5

> zz=seq(10,300,3)

> zz

[1] 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64
[20] 67 70 73 76 79 82 85 88 91 94 97 100 103 106 109 112 115 118 121
[39] 124 127 130 133 136 139 142 145 148 151 154 157 160 163 166 169 172 175 178
[58] 181 184 187 190 193 196 199 202 205 208 211 214 217 220 223 226 229 232 235
[77] 238 241 244 247 250 253 256 259 262 265 268 271 274 277 280 283 286 289 292
[96] 295 298

>


Assignment, Vectors and Arrays

> mat=array(1:12,c(3,4))

> mat

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

> mat=matrix(1:12,3,4)

> mat

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

>
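Both calls above fill the matrix column by column, which is R's column-major order. The short sketch below makes that filling order visible. `byrow = TRUE` is the standard `matrix()` argument for filling by rows instead; the name `mat_rows` is only for illustration.

```r
# Rebuild the 3 x 4 matrix from the transcript above
mat <- matrix(1:12, nrow = 3, ncol = 4)

dim(mat)        # rows and columns: 3 4
mat[2, 3]       # row 2, column 3: 8, because columns are filled first
mat[, 2]        # the whole second column: 4 5 6

# Fill row by row instead, using the byrow argument
mat_rows <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
mat_rows[2, 3]  # now 7
```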


Functions

> plus=function(a,b) a+b
> plus(3,4)
[1] 7
> plus(3)
Error in plus(3) : element 2 is empty;
   the part of the args list of '+' being evaluated was:
   (a, b)
> plus=function(a,b=0) a+b
> plus(3,4)
[1] 7
> plus(3)
[1] 3
> plus(1:3,4:5)
[1] 5 7 7
Warning message:
In a + b : longer object length is not a multiple of shorter object length

>
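The error message above comes from R 2.10. Current versions of R typically report a missing argument as `argument "b" is missing, with no default`. The small sketch below captures the message instead of stopping. It also shows two related features: matching arguments by name, and testing for an absent argument with `missing()`. The name `plus2` is hypothetical.

```r
plus <- function(a, b) a + b

# Capture the error raised when b is not supplied, rather than halting
msg <- tryCatch(plus(3), error = function(e) conditionMessage(e))
print(msg)

# Arguments can be matched by name, in any order
plus(b = 4, a = 3)

# missing() lets a function handle an absent argument explicitly
plus2 <- function(a, b) if (missing(b)) a else a + b
plus2(3)
```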


What is Survival Analysis?

Survival Analysis is the study of lifetimes and their distributions. It usually involves one or more of the following objectives:

• to explore the behaviour of the distribution of a lifetime.

• to model the distribution of a lifetime.

• to test for differences between the distributions of two or more lifetimes.

• to model the impact of one or more explanatory variables on a lifetime distribution.


The Nature of Lifetime Data

• It’s almost always incomplete.

– It often involves right-censoring.

– It sometimes involves left-truncation.

• The methods of survival analysis allow for this incompleteness.


The survival Package in R

> install.packages("survival") # first time only

--- Please select a CRAN mirror for use in this session ---
trying URL 'http://probability.ca/cran/bin/windows/contrib/2.10/survival_2.35-8.zip'
Content type 'application/zip' length 2445387 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb

package 'survival' successfully unpacked and MD5 sums checked

The downloaded packages are in
C:\Documents and Settings\jones\Local Settings\Temp\RtmpEQ5ZaF\downloaded_packages

> library(survival)

Loading required package: splines

>


Creating a Survival Object

Example 1. Complete data lifetimes: 26, 42, 71, 85, 92.

> ex1.times=c(26,42,71,85,92)

> ex1.surv=Surv(ex1.times)

> ex1.surv

[1] 26 42 71 85 92

> class(ex1.surv)

[1] "Surv"

> class(ex1.times)

[1] "numeric"

>


Creating a Survival Object

Example 2. Right-censored lifetimes: 26, 42, 71, 80+, 80+.

> ex2.times=c(26,42,71,80,80)

> ex2.events=c(1,1,1,0,0)

> ex2.surv=Surv(ex2.times,ex2.events)

> ex2.surv

[1] 26 42 71 80+ 80+

>


Creating a Survival Object

Example 3. Left-truncated and right-censored lifetimes: the left-truncation time is 40 for all individuals; the event/right-censoring times are 42, 71, 80+, 80+.

> ex3.lttimes=rep(40,4)

> ex3.times=c(42,71,80,80)

> ex3.events=c(1,1,0,0)

> ex3.surv=Surv(ex3.lttimes,ex3.times,ex3.events)

> ex3.surv

[1] (40,42 ] (40,71 ] (40,80+] (40,80+]

>
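Underneath, a Surv object is a matrix with a class attribute. It has two columns (time, status) for right-censored data and three (start, stop, status) for left-truncated data like Example 3. The sketch below inspects that structure and assumes a recent version of the survival package. The column names and the `type` attribute are internal details we rely on here, not a documented interface.

```r
library(survival)

ex2.surv <- Surv(c(26, 42, 71, 80, 80), c(1, 1, 1, 0, 0))
ex3.surv <- Surv(rep(40, 4), c(42, 71, 80, 80), c(1, 1, 0, 0))

# Strip the class to see the underlying matrix
unclass(ex2.surv)
colnames(unclass(ex2.surv))   # "time" "status"
colnames(unclass(ex3.surv))   # "start" "stop" "status"

# The type attribute records how the data were specified
attr(ex2.surv, "type")        # "right"
attr(ex3.surv, "type")        # "counting"

# Number of observed events (status 1) in Example 2
sum(unclass(ex2.surv)[, "status"])
```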


Real Data Example

Lifetimes: Times until death of 26 psychiatric patients

Number of deaths: 14

Number of censored observations: 12

Covariates: patient age and sex (15 females, 11 males)


Real Data Example

The Data

(sex: 1 = male, 2 = female)

patient sex age time death   patient sex age time death
      1   2  51    1     1        14   2  30   37     0
      2   2  58    1     1        15   2  33   35     0
      3   2  55    2     1        16   1  36   25     1
      4   2  28   22     1        17   1  30   31     0
      5   1  21   30     0        18   1  41   22     1
      6   1  19   28     1        19   2  43   26     1
      7   2  25   32     1        20   2  45   24     1
      8   2  48   11     1        21   2  35   35     0
      9   2  47   14     1        22   1  29   34     0
     10   2  25   36     0        23   1  35   30     0
     11   2  31   31     0        24   1  32   35     1
     12   1  24   33     0        25   2  36   40     1
     13   1  25   33     0        26   1  32   39     0
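For readers without the KMsurv package, the table above can be typed in directly. This sketch assumes sex is coded 1 = male and 2 = female, which matches the 11 males and 15 females stated on the previous slide. The data-frame name `psych.df` is ours; the column names follow `names(psych)`. We have not checked this row by row against KMsurv's copy of the data.

```r
# One row per patient, in patient order 1 to 26
psych.df <- data.frame(
  sex   = c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1,
            2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1),
  age   = c(51, 58, 55, 28, 21, 19, 25, 48, 47, 25, 31, 24, 25,
            30, 33, 36, 30, 41, 43, 45, 35, 29, 35, 32, 36, 32),
  time  = c(1, 1, 2, 22, 30, 28, 32, 11, 14, 36, 31, 33, 33,
            37, 35, 25, 31, 22, 26, 24, 35, 34, 30, 35, 40, 39),
  death = c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
            0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
)

nrow(psych.df)         # 26 patients
sum(psych.df$death)    # 14 deaths, so 12 censored
table(psych.df$sex)    # 11 with sex = 1, 15 with sex = 2
```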


Real Data Example

Questions

• Does the lifetime distribution behave the way we expect?

• Are the lifetimes different for females and males?

• Do the lifetimes depend on age?


Estimating the Survival Function

We can explore the lifetime distribution by examining nonparametric estimates of the survival function.

The R function survfit allows us to do this.

> library(KMsurv) # get the data
> data(psych)
> attach(psych)
> names(psych)
[1] "sex" "age" "time" "death"
> psych.surv=Surv(age,age+time,death) # create a survival object
> psych.fit1=survfit(psych.surv~1) # obtain the estimates
> plot(psych.fit1,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates") # plot the estimates
>
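survfit computes the product-limit (Kaplan-Meier) estimate. With a left-truncated Surv object, each individual enters the risk set only after their truncation time. Below is a minimal base-R sketch of that calculation, applied to Examples 2 and 3 because they are small enough to check by hand. The function `km_by_hand` is hypothetical, and it omits the standard errors and confidence intervals that survfit also reports.

```r
# Product-limit estimate written out by hand.
#   exit:  event or right-censoring times
#   event: 1 = event observed, 0 = right-censored
#   entry: left-truncation times (0 when there is none)
km_by_hand <- function(exit, event, entry = rep(0, length(exit))) {
  death_times <- sort(unique(exit[event == 1]))
  surv <- numeric(length(death_times))
  s <- 1
  for (j in seq_along(death_times)) {
    tj <- death_times[j]
    n_risk <- sum(entry < tj & exit >= tj)  # under observation just before tj
    d <- sum(exit == tj & event == 1)       # events at tj
    s <- s * (1 - d / n_risk)
    surv[j] <- s
  }
  data.frame(time = death_times, surv = surv)
}

# Example 2: 26, 42, 71, 80+, 80+
ex2.km <- km_by_hand(c(26, 42, 71, 80, 80), c(1, 1, 1, 0, 0))
ex2.km

# Example 3: left-truncated at 40; 42, 71, 80+, 80+
ex3.km <- km_by_hand(c(42, 71, 80, 80), c(1, 1, 0, 0), entry = rep(40, 4))
ex3.km
```

For Example 2 the estimates are 0.8, 0.6 and 0.4 at times 26, 42 and 71. For Example 3 they are 0.75 and 0.5 at times 42 and 71, since all four individuals are at risk from age 40.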


Estimating the Survival Function

[Figure: Survival Function Estimates; probability (0.0 to 1.0) plotted against age (40 to 80)]


Estimating the Survival Function

Now let’s consider females and males separately.

> psych.fit2=survfit(psych.surv~sex) # separate by sex

> plot(psych.fit2,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates for Males (red) and Females",
+ col=c("red","blue"))

> plot(psych.fit2,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates for Males (red) and Females",
+ col=c("red","blue"), conf.int=T)
>


Estimating the Survival Function

[Figure: Survival Function Estimates for Females (blue) and Males; probability (0.0 to 1.0) plotted against age (40 to 80)]


Estimating the Survival Function

[Figure: Survival Function Estimates for Females (blue) and Males, second plot (conf.int=T); probability (0.0 to 1.0) plotted against age (40 to 80)]


Testing for Differences

The R function survdiff allows us to test for differences between lifetime distributions.

> survdiff(psych.surv~sex)
Error in survdiff(psych.surv ~ sex) : Right censored data only

> psych.surv2=Surv(time,death) # create new survival object
> survdiff(psych.surv2~sex)
Call:
survdiff(formula = psych.surv2 ~ sex)

       N Observed Expected (O-E)^2/E (O-E)^2/V
sex=1 11        4     6.24     0.807      1.61
sex=2 15       10     7.76     0.650      1.61

Chisq= 1.6 on 1 degrees of freedom, p= 0.205

>
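With its default settings, survdiff performs the log-rank test. Below is a base-R sketch of the two-group statistic, using the data typed in from the table (assuming it matches KMsurv's psych). `logrank_by_hand` is a hypothetical name. For group sex = 1 it should give 4 observed deaths, about 6.24 expected and a chi-squared of about 1.61, in line with the survdiff output above.

```r
# psych data typed in from the table; names mirror the psych columns
time  <- c(1, 1, 2, 22, 30, 28, 32, 11, 14, 36, 31, 33, 33,
           37, 35, 25, 31, 22, 26, 24, 35, 34, 30, 35, 40, 39)
death <- c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
           0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
sex   <- c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1,
           2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1)

# Two-group log-rank test: observed vs expected deaths in group g
logrank_by_hand <- function(time, event, group, g) {
  O <- 0; E <- 0; V <- 0
  for (tj in sort(unique(time[event == 1]))) {
    at_risk <- time >= tj
    n  <- sum(at_risk)                               # total at risk at tj
    n1 <- sum(at_risk & group == g)                  # at risk in group g
    d  <- sum(time == tj & event == 1)               # total deaths at tj
    d1 <- sum(time == tj & event == 1 & group == g)  # deaths in group g
    O <- O + d1
    E <- E + d * n1 / n
    # hypergeometric variance of d1; zero when only one subject is at risk
    if (n > 1) V <- V + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
  }
  c(observed = O, expected = E, chisq = (O - E)^2 / V)
}

res <- logrank_by_hand(time, death, sex, g = 1)
round(res, 2)   # observed 4, expected 6.24, chisq 1.61
```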


Fitting a Proportional Hazards Model

The model: $h(t \mid x_1, \ldots, x_p) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p)$

• The PH model is often used when we are interested in the impact of the covariates, $x_1, \ldots, x_p$, but not in the lifetime distributions themselves.

• We can estimate and make inferences about $\beta_1, \ldots, \beta_p$ without estimating $h_0$.

• The R function coxph allows us to do this.
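Cox's partial likelihood depends on β but not on h0, which is why β can be estimated without estimating h0. Below is a base-R sketch for a single covariate, maximised with `optimize()`. It handles tied death times with Efron's approximation, which we understand to be coxph's default, and uses the data typed in from the table. The function name `cox_loglik_efron` is hypothetical. For sex, the estimate should land close to the coefficient 0.7511 reported for psych.coxph2.

```r
# psych data typed in from the table (time since admission, death, sex)
time  <- c(1, 1, 2, 22, 30, 28, 32, 11, 14, 36, 31, 33, 33,
           37, 35, 25, 31, 22, 26, 24, 35, 34, 30, 35, 40, 39)
death <- c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
           0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
sex   <- c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1,
           2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1)

# Log partial likelihood for one covariate x, Efron approximation for ties.
# The baseline hazard h0 does not appear anywhere.
cox_loglik_efron <- function(beta, time, event, x) {
  ll <- 0
  for (tj in sort(unique(time[event == 1]))) {
    risk <- time >= tj                  # risk set at tj
    dead <- time == tj & event == 1     # deaths at tj
    d <- sum(dead)
    s_risk <- sum(exp(beta * x[risk]))
    s_dead <- sum(exp(beta * x[dead]))
    ll <- ll + beta * sum(x[dead])
    for (k in seq_len(d) - 1) {         # k = 0, ..., d - 1
      ll <- ll - log(s_risk - (k / d) * s_dead)
    }
  }
  ll
}

fit <- optimize(cox_loglik_efron, interval = c(-5, 5), maximum = TRUE,
                time = time, event = death, x = sex)
fit$maximum   # estimated coefficient for sex
```

Because the partial log-likelihood is concave in β, a one-dimensional search such as `optimize()` is enough here. coxph itself uses Newton-Raphson and handles many covariates.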


Fitting a Proportional Hazards Model

> psych.coxph1=coxph(psych.surv~sex)
> summary(psych.coxph1)
Call:
coxph(formula = psych.surv ~ sex)

  n= 26

      coef exp(coef) se(coef)     z Pr(>|z|)
sex 0.3900    1.4770   0.6102 0.639    0.523

    exp(coef) exp(-coef) lower .95 upper .95
sex     1.477      0.677    0.4466     4.884

Rsquare= 0.016   (max possible= 0.926 )
Likelihood ratio test= 0.43  on 1 df,   p=0.5141
Wald test            = 0.41  on 1 df,   p=0.5227
Score (logrank) test = 0.41  on 1 df,   p=0.5203


Fitting a Proportional Hazards Model

Next we use our survival object psych.surv2, which does not involve left-truncation.

> psych.coxph2=coxph(psych.surv2~sex)
> summary(psych.coxph2)
Call:
coxph(formula = psych.surv2 ~ sex)

  n= 26

      coef exp(coef) se(coef)     z Pr(>|z|)
sex 0.7511    2.1194   0.6055 1.241    0.215

    exp(coef) exp(-coef) lower .95 upper .95
sex     2.119     0.4718    0.6469     6.944

Rsquare= 0.062   (max possible= 0.945 )
Likelihood ratio test= 1.66  on 1 df,   p=0.1981
Wald test            = 1.54  on 1 df,   p=0.2148
Score (logrank) test = 1.61  on 1 df,   p=0.2046

Note that the last test, the score (log-rank) test (1.61 on 1 df, p = 0.2046), is exactly the test performed using survdiff.


Fitting a Proportional Hazards Model

Finally, consider

> psych.coxph3=coxph(psych.surv2~age+sex)
> summary(psych.coxph3)
Call:
coxph(formula = psych.surv2 ~ age + sex)

  n= 26

        coef exp(coef) se(coef)      z Pr(>|z|)
age  0.20753   1.23063  0.05828  3.561  0.00037 ***
sex -0.52374   0.59230  0.73753 -0.710  0.47762
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

    exp(coef) exp(-coef) lower .95 upper .95
age    1.2306     0.8126    1.0978     1.380
sex    0.5923     1.6883    0.1396     2.514

Rsquare= 0.553   (max possible= 0.945 )
Likelihood ratio test= 20.91  on 2 df,   p=2.879e-05
Wald test            = 14.3  on 2 df,   p=0.0007866
Score (logrank) test = 21.27  on 2 df,   p=2.409e-05
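The exp(coef), lower .95 and upper .95 columns follow from coef and se(coef). The hazard ratio is exp(coef), and the 95% Wald interval is exp(coef ± 1.96 se(coef)). The sketch below reproduces the age row of the psych.coxph3 output from the printed (rounded) estimates, so the last digit may differ slightly.

```r
# Estimates copied from the age row of summary(psych.coxph3)
coef_age <- 0.20753
se_age   <- 0.05828

hr_age <- exp(coef_age)                                     # hazard ratio per year
ci_age <- exp(coef_age + c(-1, 1) * qnorm(0.975) * se_age)  # 95% Wald interval
z_age  <- coef_age / se_age                                 # Wald z statistic

round(hr_age, 4)   # 1.2306
round(ci_age, 3)   # 1.098 1.380
round(z_age, 3)    # 3.561

# Hazard ratio for a 10-year difference in age at admission
round(exp(10 * coef_age), 2)   # about 7.97
```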


Conclusions about this Example

• There is great uncertainty due to the small number of observations.

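• Times until death depend on age at first admission to the hospital: in psych.coxph3, each additional year of age multiplies the estimated hazard by about 1.23 (95% interval 1.10 to 1.38, p = 0.00037).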

• We cannot conclude that the lifetimes are different for females and males.


Fitting an Accelerated Failure Time Model

• This is a popular fully parametric model in which the lifetime distribution has the same form for all covariate values, except that the time scale is multiplied by a constant that depends on the covariates.

• The R function survreg can be used to fit an AFT model.
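Below is a sketch of an AFT fit with survreg on the psych data. It assumes a Weibull distribution, which is survreg's default; `dist` also accepts, for example, "lognormal" and "loglogistic". The data frame `psych.df` is typed in from the table earlier, and its name is ours; we do not show output. In survreg's parameterisation, log T is linear in the covariates, so a covariate's exp(coef) multiplies the time scale: values below 1 shorten lifetimes.

```r
library(survival)

# psych data typed in from the table (sex: 1 = male, 2 = female)
psych.df <- data.frame(
  sex   = c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1,
            2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1),
  age   = c(51, 58, 55, 28, 21, 19, 25, 48, 47, 25, 31, 24, 25,
            30, 33, 36, 30, 41, 43, 45, 35, 29, 35, 32, 36, 32),
  time  = c(1, 1, 2, 22, 30, 28, 32, 11, 14, 36, 31, 33, 33,
            37, 35, 25, 31, 22, 26, 24, 35, 34, 30, 35, 40, 39),
  death = c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
            0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
)

# Weibull AFT model for time since admission, with age and sex as covariates
psych.aft <- survreg(Surv(time, death) ~ age + sex, data = psych.df,
                     dist = "weibull")
summary(psych.aft)

# Time-scale multipliers: exp(coef) for each covariate
exp(coef(psych.aft))
```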


Summary

• R is a flexible and free software environment for statistical computing and graphics.

• The survival package contains functions for survival analysis.

– Surv creates a survival object.

– survfit estimates (nonparametrically) the survival function.

– survdiff performs tests for differences in lifetime distributions.

– coxph fits the proportional hazards model.

– survreg fits the accelerated failure time model.

These slides are here:

http://www.stats.uwo.ca/faculty/jones/survival_talk.pdf
