Introduction to Survival Analysis
August 3 and 5, 2004
Overview
What is survival analysis?
Introduction to Kaplan-Meier methods.
Introduction to Cox proportional hazards methods (Thursday)
Recommended reading in Walker: Chapters 21-22
What is survival analysis?
Statistical methods for analyzing longitudinal data on the occurrence of events.
Events may include death, injury, onset of illness, recovery from illness (binary variables) or transition above or below the clinical threshold of a meaningful continuous variable (e.g. CD4 counts).
Accommodates data from randomized clinical trial or cohort study design.
Randomized Clinical Trial (RCT)
[Diagram: target population; disease-free, at-risk cohort; random assignment to intervention or control; each arm followed over TIME to disease or disease-free.]
Randomized Clinical Trial (RCT)
[Diagram: target population; patient population; random assignment to treatment or control; each arm followed over TIME to cured or not cured.]
Randomized Clinical Trial (RCT)
[Diagram: target population; patient population; random assignment to treatment or control; each arm followed over TIME to dead or alive.]
Cohort study (prospective/retrospective)
[Diagram: target population; disease-free cohort; exposed or unexposed; each group followed over TIME to disease or disease-free.]
Examples of survival analysis in medicine
RCT: Women’s Health Initiative (JAMA, 2001)
[Figure: cumulative incidence curves, on hormones vs. on placebo.]
Retrospective cohort study: From December 2003 BMJ:
Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study
Objectives of survival analysis
– To estimate time-to-event for a group of individuals, such as time until second heart attack for a group of MI patients.
– To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial.
– To assess the relationship of covariates to time-to-event, such as: do weight, insulin resistance, or cholesterol influence survival time of MI patients?
Note: expected time-to-event = 1/incidence rate (assuming a constant rate)
Survival Analysis: Terms
Time-to-event: The time from entry into a study until a subject has a particular outcome.
Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.
– If dropout is related to both outcome and treatment, dropouts may bias the results.
Why use survival analysis?
1. Why not compare mean time-to-event between your groups using a t-test or linear regression?
-- ignores censoring
2. Why not compare proportion of events in your groups using odds ratios or logistic regression?
-- ignores time
Data Structure: survival analysis
Time variable: t_i = time at last disease-free observation or time at event
Censoring variable: c_i = 1 if the subject had the event; c_i = 0 if no event by time t_i
Choice of time of origin. Note varying start times.
Count every subject’s time since their baseline data collection.
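As a rough illustration of this data structure, the sketch below turns calendar dates into the pair (t_i, c_i), counting each subject's time from their own baseline. The subject IDs, dates, and the helper name `to_survival_row` are all hypothetical and are not taken from any study in these slides.

```python
from datetime import date

# Hypothetical records: (subject id, baseline date, date of event or last contact, had event?)
records = [
    ("S1", date(2003, 1, 15), date(2003, 5, 15), True),
    ("S2", date(2003, 3, 1), date(2003, 9, 1), False),
    ("S3", date(2003, 6, 10), date(2004, 1, 10), True),
]

def to_survival_row(baseline, last_seen, had_event):
    """Return (t_i, c_i): days since the subject's own baseline, and the event flag."""
    t_i = (last_seen - baseline).days
    c_i = 1 if had_event else 0
    return t_i, c_i

rows = {sid: to_survival_row(b, last, ev) for sid, b, last, ev in records}
for sid, (t_i, c_i) in rows.items():
    print(sid, t_i, c_i)
```

Because time is measured from each subject's baseline, the varying start dates drop out and only the length of follow-up and the event flag remain.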
Survival function
One goal of survival analysis is to estimate and compare survival experiences of different groups.
Survival experience is described by the survival function:
$$S(t) = P(T > t)$$
Gives the probability of surviving past a certain time.
For example, the probability of surviving beyond 10 years, 50 years, or 100 years.
Introduction to Kaplan-Meier
Non-parametric estimate of the survival function.
Commonly used to describe survivorship of study population/s.
Commonly used to compare two study populations.
Intuitive graphical presentation.
Survival Data (right-censored)
[Figure: follow-up timelines for Subjects A to E from beginning of study to end of study; x-axis: time in months.]
1. Subject E dies at 4 months
Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months.]
Subject E dies at 4 months
Probability of surviving to 4 months is 100% = 5/5
Fraction surviving this death = 4/5
Survival Data
[Figure: follow-up timelines for Subjects A to E from beginning of study to end of study; x-axis: time in months.]
1. Subject E dies at 4 months
2. Subject A drops out after 6 months
3. Subject C dies at 7 months
Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months.]
Subject C dies at 7 months
Fraction surviving this death = 2/3
Survival Data
[Figure: follow-up timelines for Subjects A to E from beginning of study to end of study; x-axis: time in months.]
1. Subject E dies at 4 months
2. Subject A drops out after 6 months
3. Subject C dies at 7 months
4. Subjects B and D survive for the whole year-long study period
Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months.]
Product limit estimate of survival = P(surviving event 1 | at risk up to failure 1) * P(surviving event 2 | at risk up to failure 2) = 4/5 * 2/3 = .5333
The product limit estimate
The probability of surviving the entire year, taking into account censoring
= (4/5) (2/3) = 53%
NOTE: >40% (2/5) because the one drop-out survived at least a portion of the year.
AND <60% (3/5) because we don’t know if the one drop-out would have survived until the end of the year.
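The product limit calculation for the five-subject example can be sketched in a few lines of Python. This is a minimal illustration rather than a full implementation (no confidence intervals); the function name `kaplan_meier` is an assumption, and subjects censored at an event time are treated as still at risk at that time, which is the usual convention.

```python
from fractions import Fraction

def kaplan_meier(times, events):
    """Product-limit estimate.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (event time, survival estimate just after that time).
    """
    survival = Fraction(1)
    curve = []
    # Step through the distinct times at which at least one event occurred.
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= Fraction(at_risk - deaths, at_risk)
        curve.append((t, survival))
    return curve

# Subjects A to E: A drops out at 6 months, B and D are alive at 12 months,
# C dies at 7 months, E dies at 4 months.
times = [6, 12, 7, 12, 4]
events = [0, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s} = {float(s):.4f}")
```

The two steps reproduce 4/5 at month 4 and (4/5)(2/3) = 8/15, about 53%, at month 7.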
Comparing 2 groups
Use log-rank test to test the null hypothesis of no difference between survival functions of the two groups.
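To make the log-rank test concrete, here is a minimal sketch of the standard two-group version: at each event time, compare the observed events in one group with the number expected under the null hypothesis, then refer the summed difference to a chi-square distribution with 1 degree of freedom. The function name `logrank_test` and the treated/control data are hypothetical and chosen only for illustration.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value) on 1 df."""
    # Pool both groups as (time, event flag, group label) triples.
    pooled = [(t, e, "a") for t, e in zip(times_a, events_a)]
    pooled += [(t, e, "b") for t, e in zip(times_b, events_b)]

    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e, _ in pooled if e == 1}):
        n = sum(1 for u, _, _ in pooled if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == "a")  # at risk, group a
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)      # events at t, both groups
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == "a")
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group a.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical follow-up times in months (1 = died, 0 = censored).
treated_times, treated_events = [6, 7, 10, 12, 12], [1, 0, 1, 0, 0]
control_times, control_events = [2, 3, 5, 8, 12], [1, 1, 1, 1, 0]
chi2, p_value = logrank_test(treated_times, treated_events, control_times, control_events)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

The sketch assumes at least one event time at which both groups still have subjects at risk; otherwise the variance is zero and the statistic is undefined.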
Caveats
Survival estimates can be unreliable toward the end of a study when there are small numbers of subjects at risk of having an event.
WHI and breast cancer
[Figure: WHI breast cancer curves, with the late portion of follow-up annotated "Small numbers left".]
Limitations of Kaplan-Meier
• Mainly descriptive
• Doesn’t control for covariates
• Requires categorical predictors
• Can’t accommodate time-dependent variables
Introduction to Cox Regression
History
“Regression Models and Life-Tables” by D.R. Cox, published in 1972, is one of the most frequently cited journal articles in statistics and medicine.
Introduction to Cox Regression
Also called proportional hazards regression
Multivariate regression technique where time-to-event (taking into account censoring) is the dependent variable.
Estimates covariate-adjusted hazard ratios.
– A hazard ratio is a ratio of incidence, or hazard, rates
Introduction to Cox Regression
Distinction between rate and proportion:
Incidence rate: number of new cases of disease per population at-risk per unit time
– Hazard rate: Instantaneous incidence rate; probability (per unit time) that, given you survived disease-free up to time t, you succumb to the disease in the next instant.
Cumulative incidence (or cumulative risk): proportion of new cases that develop in a given time period
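The difference between a rate and a proportion can be seen by reusing the five-subject Kaplan-Meier example from earlier in these slides. The sketch below is only illustrative: it computes an incidence rate from person-time and, for contrast, the crude proportion who died, which ignores how long each subject was actually observed. The variable names are assumptions.

```python
# Five-subject example: months of follow-up and whether the subject died (1) or not (0).
follow_up_months = [6, 12, 7, 12, 4]
died = [0, 0, 1, 0, 1]

events = sum(died)
person_months = sum(follow_up_months)

incidence_rate = events / person_months   # deaths per person-month (a rate)
crude_proportion = events / len(died)     # deaths per subject (a proportion, no time unit)

print(f"{events} deaths over {person_months} person-months")
print(f"incidence rate = {incidence_rate:.4f} per person-month")
print(f"crude proportion = {crude_proportion:.2f}")
```

The rate carries units of time in its denominator, while the proportion does not; that is the distinction the definitions above draw.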
Rates vs. risks
Relationship between risk and rates:
$$R(t) = 1 - e^{-ht}$$
R(t) = probability of disease in time t
h = constant hazard rate
Rates vs. risks
For example, if rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:
$$R(t) = 1 - e^{-(.005)(10)} = 1 - e^{-.05} = 1 - .951 = .0488$$
Compare to .005(10) = 5%
The loss of persons at risk because they have developed disease within the period of observation is small relative to the size of the total group.
Rates vs. risks
If rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:
$$R(t) = 1 - e^{-(.05)(10)} = 1 - e^{-.5} = 1 - .61 = .39$$
Compare to .05(10) = 50%
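Both worked examples can be checked with a short script. This is a minimal sketch assuming a constant hazard rate, as in the formula for R(t); the function name `cumulative_risk` is an assumption. It also prints the simple product h*t and the expected time-to-event 1/h from the earlier note.

```python
import math

def cumulative_risk(hazard_rate, years):
    """R(t) = 1 - exp(-h * t) for a constant hazard rate h."""
    return 1.0 - math.exp(-hazard_rate * years)

for rate in (0.005, 0.05):  # 5 and 50 cases per 1000 person-years
    exact = cumulative_risk(rate, 10)
    approx = rate * 10
    print(f"h = {rate}: R(10) = {exact:.4f}, h*t = {approx:.2f}, "
          f"expected time-to-event = {1 / rate:.0f} years")
```

At the low rate, h*t is close to the risk. At the higher rate, the two diverge because a meaningful share of the group leaves the at-risk pool during follow-up.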
Introduction to Cox Regression
Distinction between hazard/rate ratio and odds ratio/risk ratio:
Hazard ratio: ratio of hazard rates
Odds/risk ratio: ratio of proportions
By taking into account time, you are taking into account more information than just binary yes/no.
Gain power/precision.
Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio.
Example: Study of publication bias
By Kaplan-Meier methods
From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)
Univariate Cox regression
Table 4 Risk factors for time to publication using univariate Cox regression analysis
Characteristic           # not published   # published   Hazard ratio (95% CI)
Null                     29                23            1.00
Non-significant trend    16                4             0.39 (0.13 to 1.12)
Significant              47                99            2.32 (1.47 to 3.66)
Interpretation: Significant results have roughly a 2-fold higher incidence of publication compared to null results (HR = 2.32).
Example: Study of mortality in academy award winning screenwriters (multivariate)
Kaplan-Meier methods
From: Longevity of screenwriters who win an academy award: longitudinal study BMJ 2001;323:1491-1496 ( 22-29 December )
Table 2. Death rates for screenwriters who have won an academy award. Values are percentages (95% confidence intervals) and are adjusted for the factor indicated
                             Relative increase in death rate for winners
Basic analysis               37 (10 to 70)
Adjusted analysis
Demographic:
  Year of birth              32 (6 to 64)
  Sex                        36 (10 to 69)
  Documented education       39 (12 to 73)
  All three factors          33 (7 to 65)
Professional:
  Film genre                 37 (10 to 70)
  Total films                39 (12 to 73)
  Total four star films      40 (13 to 75)
  Total nominations          43 (14 to 79)
  Age at first film          36 (9 to 68)
  Age at first nomination    32 (6 to 64)
  All six factors            40 (11 to 76)
All nine factors             35 (7 to 70)
HR=1.37; interpretation: 37% higher incidence of death for winners compared with nominees
HR=1.35; interpretation: 35% higher incidence of death for winners compared with nominees even after adjusting for potential confounders
The model
$$h_i(t) = h_0(t)\, e^{\beta_1 x_{i1} + \cdots + \beta_k x_{ik}}$$
Components:
•A baseline hazard function h_0(t) that is left unspecified but must be positive (= the hazard when all covariates are 0). Can take on any form.
•A linear function of a set of k fixed covariates that is exponentiated (= the relative risk).
$$\log h_i(t) = \log h_0(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$
The model
The point is to compare the hazard rates of individuals who have different covariates:
$$HR = \frac{h_1(t)}{h_2(t)} = \frac{h_0(t)\, e^{\beta x_1}}{h_0(t)\, e^{\beta x_2}} = e^{\beta (x_1 - x_2)}$$
Hence, called proportional hazards: the baseline hazard cancels, so the ratio does not depend on time.
Hazard functions should be strictly parallel (on the log scale).
Evaluation of proportional hazards assumption.
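The cancellation of the baseline hazard can be checked numerically. In the sketch below, the coefficients, covariate values, and function names are hypothetical and are not estimates from any study in these slides. It evaluates the model hazard for two individuals at several baseline hazard values and shows that their ratio is always exp(beta * (x1 - x2)) summed over covariates.

```python
import math

def hazard(baseline_hazard, betas, covariates):
    """h_i(t) = h_0(t) * exp(beta_1 * x_i1 + ... + beta_k * x_ik)."""
    return baseline_hazard * math.exp(sum(b * x for b, x in zip(betas, covariates)))

def hazard_ratio(betas, x1, x2):
    """HR = exp(sum of beta_j * (x1_j - x2_j)); the baseline hazard cancels."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(betas, x1, x2)))

# Hypothetical coefficients for (treatment indicator, age in years).
betas = [math.log(2.0), 0.05]
person_1 = [1, 60]
person_2 = [0, 50]

hr = hazard_ratio(betas, person_1, person_2)
for h0 in (0.001, 0.02, 0.5):  # arbitrary baseline hazard values at different times
    ratio = hazard(h0, betas, person_1) / hazard(h0, betas, person_2)
    print(f"h0 = {h0}: ratio of hazards = {ratio:.4f}, HR from formula = {hr:.4f}")
```

Whatever value the baseline hazard takes, the ratio stays the same, which is why the model can estimate hazard ratios without specifying h_0(t).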
Characteristics of Cox Regression
Cox models the effect of predictors and covariates on the hazard rate but leaves the baseline hazard rate unspecified.
Does NOT assume knowledge of absolute risk.
Estimates relative rather than absolute risk.
Assumptions of Cox Regression
Proportional hazards assumption: the hazard for any individual is a fixed proportion of the hazard for any other individual
Multiplicative risk
Survival analysis: Example
[Figure: Kaplan-Meier estimates of stress fracture-free survivorship by BMC at baseline. Groups: <1800 g (n=15); 1800-2199 g (n=55); ≥2200 g (n=52).]
[Figure: Kaplan-Meier estimates of stress fracture-free survivorship by levels of daily calcium intake at baseline. Groups: <800 mg/day (n=22); 800-1499 mg/day (n=63); 1500+ mg/day (n=36).]
[Figure: Kaplan-Meier estimates of stress fracture-free survivorship by previous stress fracture. Groups: previous fracture (n=39); no previous fracture (n=83).]
[Figure: curves by lean mass. Groups: lowest quartile of lean mass; middle two quartiles; highest quartile of lean mass.]
Risk Factors
                                                      Hazard Ratio (95% CI)
History of menstrual irregularity prior to baseline   2.91 (0.81, 10.43)
BMC <1800 g                                           3.70 (1.31, 10.46)
Low calcium (<800 mg/d)                               3.60 (1.12, 11.59)
Stress fracture prior to baseline                     5.45 (1.48, 20.08)
Fat mass (per kg)                                     1.05 (0.91, 1.21)
**All analyses are stratified on site and menstrual status at baseline, and adjusted for age and spine Z-score at baseline using Cox Regression.
Other protective factors
                                                 Hazard Ratio (95% CI)
Spine BMD (per 1-standard deviation increase)    .54 (0.30, 0.96)
Every 100-mg/d calcium (continuous)              .90 (0.81, 0.99)
Lean mass (per kg), time-dependent               .91 (0.81, 1.02)
Change in lean mass (per kg)                     .83 (0.56, 1.24)
Menarche (per 1-year older)                      .55 (0.34, 0.90)
**All analyses are stratified on site and menstrual status at baseline, and adjusted for age and spine Z-score at baseline (except spine Z-score) using Cox Regression.