Introduction to Survival Analysis August 3 and 5, 2004


Overview

What is survival analysis?

Introduction to Kaplan-Meier methods.

Introduction to Cox proportional hazards methods (Thursday).

Recommended reading in Walker: Chapters 21-22.

What is survival analysis?

Statistical methods for analyzing longitudinal data on the occurrence of events.

Events may include death, injury, onset of illness, recovery from illness (binary variables) or transition above or below the clinical threshold of a meaningful continuous variable (e.g. CD4 counts).

Accommodates data from randomized clinical trial or cohort study design.

Randomized Clinical Trial (RCT)

[Diagram: a disease-free, at-risk cohort drawn from the target population is randomly assigned to intervention or control and followed over TIME; each arm ends as disease or disease-free.]

[Diagram: a patient population drawn from the target population is randomly assigned to treatment or control and followed over TIME; each arm ends as cured or not cured.]

[Diagram: a patient population drawn from the target population is randomly assigned to treatment or control and followed over TIME; each arm ends as dead or alive.]


Cohort study (prospective/retrospective)

[Diagram: a disease-free cohort drawn from the target population is classified as exposed or unexposed and followed over TIME; each group ends as disease or disease-free.]

Examples of survival analysis in medicine

RCT: Women’s Health Initiative (JAMA, 2001)

[Figure: cumulative incidence over time, on hormones vs. on placebo.]

Retrospective cohort study: from December 2003 BMJ:

Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study

Objectives of survival analysis

– To estimate time-to-event for a group of individuals, such as time until second heart attack for a group of MI patients.

– To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial.

– To assess the relationship of co-variables to time-to-event, such as: does weight, insulin resistance, or cholesterol influence survival time of MI patients?

Note: when the incidence rate is constant, expected time-to-event = 1/incidence rate

Survival Analysis: Terms

Time-to-event: The time from entry into a study until a subject has a particular outcome.

Censoring: Subjects are said to be censored if they are lost to follow up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.

– If dropout is related to both outcome and treatment, dropouts may bias the results.

Why use survival analysis?

1. Why not compare mean time-to-event between your groups using a t-test or linear regression?

-- ignores censoring

2. Why not compare proportion of events in your groups using odds ratios or logistic regression?

-- ignores time

Data Structure: survival analysis

Time variable: $t_i$ = time at last disease-free observation or time at event

Censoring variable: $c_i = 1$ if had the event; $c_i = 0$ if no event by time $t_i$

Choice of time of origin. Note varying start times.

Count every subject’s time since their baseline data collection.
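As a rough illustration of this data structure, the sketch below turns calendar dates into a time variable and a censoring variable, counting each subject's time from their own baseline. The subjects, dates, study end date, and the helper name `to_survival_record` are all hypothetical and are not taken from any study in these slides.

```python
from datetime import date

# Hypothetical administrative end of follow-up (an assumption for illustration).
STUDY_END = date(2004, 12, 31)

# Hypothetical subjects: (id, baseline date, event date or None, dropout date or None).
# Note the varying start times.
raw = [
    ("s1", date(2004, 1, 1), date(2004, 5, 1), None),  # had the event
    ("s2", date(2004, 3, 1), None, date(2004, 9, 1)),  # lost to follow-up
    ("s3", date(2004, 6, 1), None, None),              # event-free at study end
]

def to_survival_record(baseline, event, dropout, study_end=STUDY_END):
    """Return (t_i, c_i): days since baseline, and 1 for event or 0 for censored."""
    if event is not None:
        return (event - baseline).days, 1
    # Censored: time runs to the last date the subject was known to be event-free.
    last_seen = dropout if dropout is not None else study_end
    return (last_seen - baseline).days, 0

records = {sid: to_survival_record(b, e, d) for sid, b, e, d in raw}
for sid, (t_i, c_i) in records.items():
    print(sid, t_i, c_i)
```

Each subject ends up with one pair (t_i, c_i), which is the input that the Kaplan-Meier and Cox methods expect.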

Survival function

One goal of survival analysis is to estimate and compare survival experiences of different groups.

Survival experience is described by the survival function:

$S(t) = P(T > t)$

Gives the probability of surviving past a certain time.

For example, the probability of surviving beyond 10 years, 50 years, or 100 years.

Introduction to Kaplan-Meier

Non-parametric estimate of the survival function.

Commonly used to describe survivorship of study population/s.

Commonly used to compare two study populations.

Intuitive graphical presentation.

Survival Data (right-censored)

[Figure: follow-up lines for Subjects A through E, from beginning of study to end of study; horizontal axis: time in months; X marks a death.]

1. Subject E dies at 4 months.

Corresponding Kaplan-Meier Curve

[Figure: Kaplan-Meier curve starting at 100%; horizontal axis: time in months.]

Probability of surviving to 4 months is 100% = 5/5

Fraction surviving this death = 4/5


Survival Data

2. Subject A drops out after 6 months.

3. Subject C dies at 7 months.

[Figure: Kaplan-Meier curve starting at 100%, with a second step down when subject C dies at 7 months; horizontal axis: time in months.]

Fraction surviving this death = 2/3


Survival Data

4. Subjects B and D survive for the whole year-long study period.

Product limit estimate of survival = P(surviving event 1 | at risk up to failure 1) * P(surviving event 2 | at risk up to failure 2) = 4/5 * 2/3 = .5333

The product limit estimate

The probability of surviving the entire year, taking into account censoring

= (4/5) (2/3) = 53%

NOTE: >40% (2/5) because the one drop-out survived at least a portion of the year.

AND <60% (3/5) because we don’t know if the one drop-out would have survived until the end of the year.
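The five-subject example can be reproduced with a short product-limit calculation. This is a minimal sketch rather than a full implementation: the function name `kaplan_meier` is made up for illustration, and subjects B and D are coded as censored at 12 months because they were still alive when the year-long study ended.

```python
from fractions import Fraction

# (subject, months of follow-up, event indicator: 1 = died, 0 = censored)
subjects = [("A", 6, 0), ("B", 12, 0), ("C", 7, 1), ("D", 12, 0), ("E", 4, 1)]

def kaplan_meier(data):
    """Return [(event time, S(t))] using the product-limit formula."""
    survival = Fraction(1)
    curve = []
    # The estimate only changes at times where at least one death occurs.
    event_times = sorted({t for _, t, died in data if died})
    for t in event_times:
        # At risk: everyone still under observation just before time t.
        at_risk = sum(1 for _, time, _ in data if time >= t)
        deaths = sum(1 for _, time, died in data if time == t and died)
        survival *= Fraction(at_risk - deaths, at_risk)
        curve.append((t, survival))
    return curve

curve = kaplan_meier(subjects)
for t, s in curve:
    print(f"month {t}: S(t) = {s} = {float(s):.4f}")
```

Subject A's dropout at 6 months never lowers the curve; it only reduces the number at risk from 4 to 3 by the time of the death at 7 months, which is why the second factor is 2/3 and not 3/4.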

Comparing 2 groups

Use log-rank test to test the null hypothesis of no difference between survival functions of the two groups.
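The slides do not spell out the log-rank calculation, so the following is only a sketch of the standard two-group version: at each event time, the observed events in group 1 are compared with the number expected if both groups shared one survival function. The function name `log_rank` and the two small data sets are hypothetical.

```python
import math

def log_rank(group1, group2):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, event 1 = had event, 0 = censored.
    Returns (chi-square statistic on 1 degree of freedom, p-value).
    """
    pooled = [(t, e, 1) for t, e in group1] + [(t, e, 2) for t, e in group2]
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n = sum(1 for time, _, _ in pooled if time >= t)               # at risk, both groups
        n1 = sum(1 for time, _, g in pooled if time >= t and g == 1)   # at risk, group 1
        d = sum(1 for time, e, _ in pooled if time == t and e == 1)    # events, both groups
        d1 = sum(1 for time, e, g in pooled if time == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group 1 event count at this time.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical follow-up data in months, for illustration only.
treated = [(6, 1), (9, 0), (12, 0), (12, 0)]
control = [(2, 1), (3, 1), (5, 1), (8, 0)]
stat, p_value = log_rank(treated, control)
print(stat, p_value)
```

With samples this small the chi-square approximation is crude; in practice a statistical package would be used.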

Caveats

Survival estimates can be unreliable toward the end of a study when there are small numbers of subjects at risk of having an event.

WHI and breast cancer

[Figure: WHI breast cancer curves; an annotation marks the small numbers of subjects left toward the end of follow-up.]

Limitations of Kaplan-Meier

• Mainly descriptive

• Doesn’t control for covariates

• Requires categorical predictors

• Can’t accommodate time-dependent variables

Introduction to Cox Regression

History

“Regression Models and Life-Tables” by D.R. Cox, published in 1972, is one of the most frequently cited journal articles in statistics and medicine.

Also called proportional hazards regression.

Multivariate regression technique where time-to-event (taking into account censoring) is the dependent variable.

Estimates covariate-adjusted hazard ratios.

– A hazard ratio is a ratio of incidence, or hazard, rates.


Distinction between rate and proportion:

– Incidence rate: number of new cases of disease per population at-risk per unit time

– Hazard rate: Instantaneous incidence rate; probability that, given you survived disease-free up to time t, you succumb to the disease in the next instant.

– Cumulative incidence (or cumulative risk): proportion of new cases that develop in a given time period

Rates vs. risks

Relationship between risk and rates:

$R(t) = 1 - e^{-ht}$

where $R(t)$ = probability of disease in time $t$ and $h$ = constant hazard rate.


For example, if rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:

$R(t) = 1 - e^{-(.005)(10)} = 1 - e^{-.05} = 1 - .951 = .0488$

Compare to .005(10) = 5%. The loss of persons at risk because they have developed disease within the period of observation is small relative to the size of the total group.


If rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:

$R(t) = 1 - e^{-(.05)(10)} = 1 - e^{-.5} = 1 - .61 = .39$

Compare to .05(10) = 50%
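The two worked examples can be checked numerically. The sketch below assumes a constant hazard rate, as in the formula above; the function name `cumulative_risk` is chosen here for illustration.

```python
import math

def cumulative_risk(hazard_rate, years):
    """Risk of disease by a given time under a constant hazard: R(t) = 1 - exp(-h t)."""
    return 1.0 - math.exp(-hazard_rate * years)

# 5 cases per 1000 person-years and 50 cases per 1000 person-years, over 10 years.
low = cumulative_risk(0.005, 10)
high = cumulative_risk(0.05, 10)

for rate, risk in ((0.005, low), (0.05, high)):
    naive = rate * 10  # simple rate-times-time approximation
    print(f"rate={rate}: risk={risk:.4f}, rate x time={naive:.2f}")
```

At the low rate, rate times time is a close approximation to the risk; at the higher rate it overstates the risk, because people who have already developed the disease are no longer at risk.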

Distinction between hazard/rate ratio and odds ratio/risk ratio:

Hazard ratio: ratio of hazard rates

Odds/risk ratio: ratio of proportions

By taking into account time, you are taking into account more information than just binary yes/no.

Gain power/precision.

Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio.


Example: Study of publication bias

By Kaplan-Meier methods [figure not reproduced]

From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)

Univariate Cox regression

Table 4 Risk factors for time to publication using univariate Cox regression analysis

Characteristic          # not published   # published   Hazard ratio (95% CI)
Null                    29                23            1.00
Non-significant trend   16                4             0.39 (0.13 to 1.12)
Significant             47                99            2.32 (1.47 to 3.66)

Interpretation: Studies with significant results have about 2.3 times the rate (hazard) of publication of studies with null results.

Example: Study of mortality in Academy Award-winning screenwriters (multivariate)

Kaplan-Meier methods [figure not reproduced]

From: Longevity of screenwriters who win an academy award: longitudinal study BMJ 2001;323:1491-1496 ( 22-29 December )

Table 2. Death rates for screenwriters who have won an academy award. Values are percentages (95% confidence intervals) and are adjusted for the factor indicated

Analysis                       Relative increase in death rate for winners
Basic analysis                 37 (10 to 70)
Adjusted analysis
  Demographic:
    Year of birth              32 (6 to 64)
    Sex                        36 (10 to 69)
    Documented education       39 (12 to 73)
    All three factors          33 (7 to 65)
  Professional:
    Film genre                 37 (10 to 70)
    Total films                39 (12 to 73)
    Total four star films      40 (13 to 75)
    Total nominations          43 (14 to 79)
    Age at first film          36 (9 to 68)
    Age at first nomination    32 (6 to 64)
    All six factors            40 (11 to 76)
  All nine factors             35 (7 to 70)

HR=1.37; interpretation: 37% higher incidence of death for winners compared with nominees

HR=1.35; interpretation: 35% higher incidence of death for winners compared with nominees even after adjusting for potential confounders

The model

$h_i(t) = h_0(t)\, e^{\beta_1 x_{i1} + \cdots + \beta_k x_{ik}}$

Components:

• A baseline hazard function that is left unspecified but must be positive (= the hazard when all covariates are 0)

• A linear function of a set of k fixed covariates that is exponentiated (= the relative risk)

$\log h_i(t) = \log h_0(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$

Here $\log h_0(t)$ can take on any form.

The model

The point is to compare the hazard rates of individuals who have different covariates:

$HR = \frac{h_1(t)}{h_2(t)} = \frac{h_0(t)\, e^{\beta x_1}}{h_0(t)\, e^{\beta x_2}} = e^{\beta (x_1 - x_2)}$

Hence, called proportional hazards: the baseline hazard $h_0(t)$ cancels, so the ratio does not depend on time.

Hazard functions should be strictly parallel (on the log scale).

Evaluation of proportional hazards assumption.
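To see why the hazard ratio in a Cox model does not depend on time, the sketch below evaluates two hazard functions that share a baseline. The baseline shape and the coefficient are invented for illustration and are not estimates from any study discussed here.

```python
import math

def baseline(t):
    """Hypothetical baseline hazard h0(t); any positive function would do."""
    return 0.01 + 0.002 * t

def hazard(t, x, beta):
    """Cox-type hazard for a single covariate: h(t) = h0(t) * exp(beta * x)."""
    return baseline(t) * math.exp(beta * x)

def hazard_ratio(x1, x2, beta):
    """Hazard ratio between covariate values x1 and x2: exp(beta * (x1 - x2))."""
    return math.exp(beta * (x1 - x2))

beta = math.log(2.0)  # hypothetical coefficient, chosen so that HR = 2 for x=1 vs x=0

# The ratio of the two hazards is the same at every time point.
ratios = [hazard(t, 1, beta) / hazard(t, 0, beta) for t in (1, 5, 10)]
print(ratios, hazard_ratio(1, 0, beta))
```

The individual hazards change with time because the baseline does, but their ratio stays at exp(beta), which is what the proportional hazards assumption requires.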

Characteristics of Cox Regression

Cox models the effect of predictors and covariates on the hazard rate but leaves the baseline hazard rate unspecified.

Does NOT assume knowledge of absolute risk.

Estimates relative rather than absolute risk.

Assumptions of Cox Regression

Proportional hazards assumption: the hazard for any individual is a fixed proportion of the hazard for any other individual

Multiplicative risk

Survival analysis: Example

[Figure: Kaplan-Meier estimates of stress fracture-free survivorship by BMC at baseline; groups: <1800 g (n=15), 1800-2199 g (n=55), ≥2200 g (n=52).]

[Figure: Kaplan-Meier estimates of stress fracture-free survivorship by levels of daily calcium intake at baseline; groups: <800 mg/day (n=22), 800-1499 mg/day (n=63), 1500+ mg/day (n=36).]

[Figure: Kaplan-Meier estimates of stress fracture-free survivorship by previous stress fracture; groups: previous fracture (n=39), no previous fracture (n=83).]

[Figure: curves for lowest quartile of lean mass, middle two quartiles, and highest quartile of lean mass.]

Risk Factors

Risk factor                                            Hazard Ratio (95% CI)
History of menstrual irregularity prior to baseline    2.91 (0.81, 10.43)
BMC <1800 g                                            3.70 (1.31, 10.46)
Low calcium (<800 mg/d)                                3.60 (1.12, 11.59)
Stress fracture prior to baseline                      5.45 (1.48, 20.08)
Fat mass (per kg)                                      1.05 (0.91, 1.21)

**All analyses are stratified on site and menstrual status at baseline, and adjusted for age and spine Z-score at baseline using Cox Regression.

Other protective factors

Factor                                           Hazard Ratio (95% CI)
Spine BMD (per 1-standard deviation increase)    .54 (0.30, 0.96)
Every 100-mg/d calcium (continuous)              .90 (0.81, 0.99)
Lean mass (per kg), time-dependent               .91 (0.81, 1.02)
Change in lean mass (per kg)                     .83 (0.56, 1.24)
Menarche (per 1-year older)                      .55 (0.34, 0.90)

**All analyses are stratified on site and menstrual status at baseline, and adjusted for age and spine Z-score at baseline (except spine Z-score) using Cox Regression.
