Assessing Survival:
Cox Proportional Hazards Model
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Statistics for Health Research
Objectives of Workshop
• Understand the general form of the Cox PH model
• Understand the need for adjusted hazard ratios (HR)
• Implement the Cox model in SPSS
• Understand and interpret the output from SPSS
Modelling: Detecting signal from background noise
Survival Regression Models
Expressed in terms of the hazard function, formally defined as the instantaneous risk of the event (e.g. mortality) in the next short time interval after t, conditional on having survived to the start of that interval (time t):
$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
Survival Regression Models
The Cox model expresses the relationship between the hazard and a set of variables or covariates.
These could be trial arm, age, gender, social deprivation, Dukes' stage, co-morbidity, etc.
How is the relationship formulated?
The simplest equation is:
$h = k$
where h is the hazard and k is a constant, e.g. 0.3 per person-year.
[Figure: hazard against age under the constant model h = k (x-axis: Age in years; y-axis: Hazard)]
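With a constant hazard the survival curve has a simple closed form, $S(t) = \exp(-kt)$. A minimal Python sketch, assuming the illustrative value k = 0.3 per person-year from the example above (the function name is hypothetical):

```python
import math


def survival_constant_hazard(k: float, t: float) -> float:
    """Survival probability S(t) = exp(-k * t) when the hazard is constant at k."""
    return math.exp(-k * t)


k = 0.3  # constant hazard of 0.3 per person-year, as in the example
for years in (1, 2, 5):
    # Probability of remaining event-free for `years` years
    print(years, round(survival_constant_hazard(k, years), 3))
```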
How is the relationship formulated?
The next simplest is a linear equation:
$h = a + \beta x + e$
where h is the outcome; a is the intercept; β is the slope related to the explanatory variable x; and e is the error term or ‘noise’.
Linear model of hazard
[Figure: hazard against age under the linear model (x-axis: Age in years; y-axis: Hazard)]
Cox Proportional Hazards Model (1972)
$h(t) = h_0(t) \times r(\beta, x)$
where $h_0(t)$ is the baseline hazard, and the function $r(\beta, x)$ reflects how the hazard function changes (through β) according to differences in subjects’ characteristics (x).
Exponential model of hazard
[Figure: hazard against age under the exponential model (x-axis: Age in years; y-axis: Hazard)]
Cox Proportional Hazards Model: Hazard Ratio
$r(\beta, x) = \exp(\beta x)$
Consider the hazard ratio for men vs. women; then:
$$HR = \frac{h_{men}(t)}{h_{women}(t)} = \frac{h_0(t)\, r(\beta, x_{men})}{h_0(t)\, r(\beta, x_{women})} = \frac{r(\beta, x_{men})}{r(\beta, x_{women})}$$
The baseline hazard $h_0(t)$ cancels, so the HR does not depend on time t.
Cox Proportional Hazards Model: Hazard Ratio
If the coding for gender is x = 1 (men) and x = 0 (women), then:
$$HR = \frac{r(\beta, x_{men})}{r(\beta, x_{women})} = \frac{\exp(\beta)}{\exp(0)} = \exp(\beta)$$
where β is the regression coefficient for gender.
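A quick numerical check that the baseline hazard cancels, so the HR for a 0/1 covariate is exp(β) whatever $h_0(t)$ is. This is a sketch: β = -0.038 is the gender coefficient from the SPSS output in this workshop, while the baseline hazard values are made up purely for illustration.

```python
import math


def cox_hazard(h0: float, beta: float, x: float) -> float:
    """Hazard h(t) = h0(t) * exp(beta * x), evaluated at a single time t."""
    return h0 * math.exp(beta * x)


beta = -0.038  # gender coefficient (men = 1, women = 0)
for h0 in (0.01, 0.05, 0.2):  # hypothetical baseline hazards at three time points
    hr = cox_hazard(h0, beta, 1) / cox_hazard(h0, beta, 0)
    print(f"h0 = {h0}: HR = {hr:.3f}")  # the same HR at every time point
```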
Hazard ratios in SPSS
SPSS gives hazard ratios for a binary factor coded (0,1) automatically by exponentiating the regression coefficients (95% CIs are also available as an option).
Note that the HR is labelled Exp(B) in the output.
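SPSS estimates B by maximising the Cox partial likelihood. To illustrate what happens behind the dialog boxes, here is a minimal sketch for a single covariate using Newton-Raphson with the Breslow approximation for tied event times. The function name and the toy data are hypothetical (not Colorectal.sav), and SPSS's own algorithm and tie handling may differ in detail.

```python
import math


def cox_fit_one_covariate(times, events, x, max_iter=25, tol=1e-10):
    """Fit h(t) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox partial
    log-likelihood (Breslow handling of ties). Returns (beta, se)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored subjects only appear in risk sets
            # Risk set: subjects still under observation at time t_i
            risk_x = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk_x]
            s0 = sum(w)
            mean = sum(w_j * x_j for w_j, x_j in zip(w, risk_x)) / s0
            second = sum(w_j * x_j ** 2 for w_j, x_j in zip(w, risk_x)) / s0
            score += x_i - mean          # first derivative of log partial likelihood
            info += second - mean ** 2   # minus the second derivative
        if info == 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(info)


# Hypothetical toy data: follow-up time, event (1) or censored (0), sex (men = 1)
times = [2, 3, 5, 7, 9, 4, 6, 8, 10, 12]
events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0]
sex = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
beta, se = cox_fit_one_covariate(times, events, sex)
print(f"B = {beta:.3f}, SE = {se:.3f}, Exp(B) = {math.exp(beta):.3f}")
```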
Fitting Gender in Cox Model in SPSS
Variables in the Equation
              B      SE    Wald   df   Sig.   Exp(B)
SEXNUM     -.038   .121   .097    1   .755    .963
Output from Cox Model in SPSS
• First column: variable in the model
• B: regression coefficient
• SE: standard error
• Wald: test statistic, $(\beta / \mathrm{se}(\beta))^2$
• df: degrees of freedom
• Sig.: p-value
• Exp(B): HR for men vs. women
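These columns can be reproduced from B and SE alone. A minimal sketch, assuming the 95% CI is the usual Wald interval exp(B ± 1.96 SE); the function name is hypothetical. Because SPSS prints B and SE rounded to three decimals, the recomputed Wald (about 0.099) and p-value (about 0.75) differ slightly from the printed .097 and .755.

```python
import math


def wald_summary(b: float, se: float, z_crit: float = 1.96):
    """Reproduce Wald, Sig., Exp(B) and a Wald 95% CI from B and SE."""
    z = b / se
    wald = z ** 2                                  # (B / SE)^2 on 1 df
    p = math.erfc(abs(z) / math.sqrt(2))           # P(chi-square_1 > wald)
    hr = math.exp(b)
    lower, upper = math.exp(b - z_crit * se), math.exp(b + z_crit * se)
    return wald, p, hr, lower, upper


wald, p, hr, lower, upper = wald_summary(-0.038, 0.121)
print(f"Wald = {wald:.3f}, Sig. = {p:.3f}, Exp(B) = {hr:.3f}, "
      f"95% CI {lower:.3f} to {upper:.3f}")
```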
Logrank Test: Null Hypothesis
The null hypothesis for the logrank test is:
hazard rate for group A = hazard rate for group B
i.e. HR = 1, where the HR is estimated from the observed (O) and expected (E) numbers of events in each group:
$$HR = \frac{O_A / E_A}{O_B / E_B} = 1$$
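The O/E ratio can be sketched from first principles: at each distinct event time, a group's expected number of events is the total number of events at that time multiplied by the group's share of the subjects at risk. The functions below are a hypothetical illustration for two groups labelled "A" and "B" with made-up data; they return the O and E counts and the HR estimate, and do not compute the logrank chi-square statistic itself.

```python
def logrank_observed_expected(times, events, groups):
    """Observed (O) and expected (E) event counts for groups 'A' and 'B'."""
    observed = {"A": 0.0, "B": 0.0}
    expected = {"A": 0.0, "B": 0.0}
    event_times = sorted({t for t, d in zip(times, events) if d})
    for t in event_times:
        at_risk = [g for t_j, g in zip(times, groups) if t_j >= t]
        deaths = [g for t_j, d_j, g in zip(times, events, groups) if t_j == t and d_j]
        for g in ("A", "B"):
            observed[g] += deaths.count(g)
            # Expected: total events at t times this group's share of the risk set
            expected[g] += len(deaths) * at_risk.count(g) / len(at_risk)
    return observed, expected


def hr_from_o_over_e(observed, expected):
    """HR estimate (O_A / E_A) / (O_B / E_B)."""
    return (observed["A"] / expected["A"]) / (observed["B"] / expected["B"])


# Hypothetical toy data
times = [2, 3, 5, 7, 9, 4, 6, 8, 10, 12]
events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5
obs, exp_ = logrank_observed_expected(times, events, groups)
print(obs, exp_, round(hr_from_o_over_e(obs, exp_), 2))
```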
Wald Test: Null Hypothesis
The null hypothesis for the Wald test is:
hazard ratio = 1
This is equivalent to the regression coefficient β = 0.
Note that if the 95% CI for the HR includes 1, then the null hypothesis cannot be rejected at the 5% level.
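A small check of this statement using the unadjusted Dukes' stage coefficients reported in this workshop. The Wald p-value and the Wald 95% CI are both built from z = B/SE, so the CI includes 1 exactly when |z| < 1.96, i.e. when p is above about 0.05. The sketch assumes Wald-type intervals exp(B ± 1.96 SE); the helper name is hypothetical.

```python
import math

# (comparison, B, SE) for the unadjusted Dukes' stage model, as printed (3 d.p.)
rows = [("B vs. A", 0.066, 0.441), ("C vs. A", 0.716, 0.421),
        ("D vs. A", 1.753, 0.420), ("UK vs. A", 1.328, 0.446)]


def wald_ci_and_p(b, se, z_crit=1.96):
    """Return (lower, upper, p) for the Wald 95% CI of exp(B) and the two-sided p."""
    lower, upper = math.exp(b - z_crit * se), math.exp(b + z_crit * se)
    p = math.erfc(abs(b / se) / math.sqrt(2))
    return lower, upper, p


for label, b, se in rows:
    lower, upper, p = wald_ci_and_p(b, se)
    print(f"{label}: 95% CI {lower:.3f} to {upper:.3f}, p = {p:.3f}, "
          f"CI includes 1: {lower <= 1 <= upper}")
```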
Hazard ratios for categorical factors in SPSS
• Enter the factor as before
• Click on ‘Categorical’ and choose the reference category (usually first or last)
• E.g. for Dukes' staging we may choose Stage A as the reference category
• HRs are then given in the output for the hazard in each category relative to Stage A
• Hence there will be n−1 HRs for n categories
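The indicator (dummy) coding behind this can be sketched in a few lines: one 0/1 column per non-reference category, giving n−1 columns and therefore n−1 HRs. The helper name and the string labels are hypothetical (the dataset itself stores the codes 0, 1, 2, 3 and 9).

```python
def indicator_coding(values, categories, reference):
    """Return (column labels, rows) with one 0/1 indicator per non-reference category."""
    non_ref = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in non_ref] for v in values]
    return non_ref, rows


stages = ["A", "B", "C", "D", "UK"]
columns, coded = indicator_coding(["A", "C", "UK", "B"], stages, reference="A")
print(columns)  # ['B', 'C', 'D', 'UK']
print(coded)    # [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
```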
Fitting a categorical variable: Dukes' Staging
Categorical Variable Codings(a,b)
DUKES      Frequency    (1)     (2)     (3)     (4)
0=A           18       .000    .000    .000    .000   (reference category)
1=B          107      1.000    .000    .000    .000
2=C          188       .000   1.000    .000    .000
3=D          123       .000    .000   1.000    .000
9=UK          40       .000    .000    .000   1.000
a. Indicator Parameter Coding
b. Category variable: DUKES (Dukes Staging)
Variables in the Equation
                                                         95.0% CI for Exp(B)
             B      SE      Wald    df   Sig.   Exp(B)    Lower    Upper
DUKES                     105.703    4   .000
DUKES(1)   .066   .441      .022    1   .882   1.068     .450    2.536   B vs. A
DUKES(2)   .716   .421     2.893    1   .089   2.047     .897    4.672   C vs. A
DUKES(3)  1.753   .420    17.379    1   .000   5.769    2.531   13.151   D vs. A
DUKES(4)  1.328   .446     8.875    1   .003   3.775    1.575    9.046   UK vs. A
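Because each B is a log HR relative to Stage A, the HR between any two stages can be obtained as exp(B_D − B_C). A minimal sketch using the printed (rounded) coefficients; note that a CI for a comparison such as D vs. C would also need the covariance between the two coefficients, which this table does not show. The helper name is hypothetical.

```python
import math

# Unadjusted log hazard ratios relative to Stage A (B column, 3 d.p.)
log_hr = {"A": 0.0, "B": 0.066, "C": 0.716, "D": 1.753, "UK": 1.328}


def hazard_ratio(cat1: str, cat2: str) -> float:
    """HR for cat1 relative to cat2: exp(B_cat1 - B_cat2)."""
    return math.exp(log_hr[cat1] - log_hr[cat2])


print(round(hazard_ratio("D", "A"), 3))  # close to the printed 5.769 (B is rounded)
print(round(hazard_ratio("D", "C"), 3))  # D vs. C, not shown directly in the output
```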
One Solution to Confounding
Use multiple Cox regression with both the predictor and the confounder as explanatory variables, i.e. fit:
$h(t) = h_0(t) \exp(\beta_1 x_1 + \beta_2 x_2)$
where $x_1$ is Dukes' stage and $x_2$ is age.
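With both terms in the model, the hazard for one patient relative to another is the exponential of the difference in their linear predictors, and $h_0(t)$ cancels as before. A sketch using the adjusted coefficients reported for the Dukes' stage and age model in this workshop (Dukes' stage entering as indicators with Stage A as reference); the two patients and the function names are hypothetical.

```python
import math

# Adjusted coefficients from the Dukes' stage + age model (B column, 3 d.p.)
beta_stage = {"A": 0.0, "B": 0.159, "C": 0.822, "D": 1.896, "UK": 1.321}
beta_age = 0.024  # per year of age


def linear_predictor(stage: str, age: float) -> float:
    """beta1*x1 + beta2*x2, with x1 entering as Dukes' stage indicators."""
    return beta_stage[stage] + beta_age * age


def relative_hazard(patient1, patient2) -> float:
    """h1(t) / h2(t) = exp(lp1 - lp2); the baseline hazard h0(t) cancels."""
    return math.exp(linear_predictor(*patient1) - linear_predictor(*patient2))


# Two hypothetical patients: Stage D aged 70 versus Stage A aged 60
print(round(relative_hazard(("D", 70), ("A", 60)), 2))
```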
Fitting a multiple regression: Dukes' Staging and Age
Age adjusted for Dukes' stage
Variables in the Equation
                                                         95.0% CI for Exp(B)
             B      SE      Wald    df   Sig.   Exp(B)    Lower    Upper
DUKES                     111.400    4   .000
DUKES(1)   .159   .442      .130    1   .719   1.172     .493    2.788
DUKES(2)   .822   .422     3.800    1   .051   2.276     .996    5.203
DUKES(3)  1.896   .422    20.181    1   .000   6.662    2.913   15.238
DUKES(4)  1.321   .446     8.773    1   .003   3.748    1.564    8.986
AGE        .024   .007    13.761    1   .000   1.024    1.011    1.038
For comparison, age alone (unadjusted):
Variables in the Equation
                                                  95.0% CI for Exp(B)
        B      SE     Wald   df   Sig.   Exp(B)    Lower    Upper
AGE   .019   .006    9.181    1   .002   1.019    1.007    1.032
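Comparing the unadjusted and adjusted estimates shows how much the adjustment moves them. A sketch that does this arithmetic for age and for Stage D vs. A, using the coefficients printed in this workshop; the percentage change in B is just one simple summary, and no particular cut-off for ‘important’ confounding is implied. The helper name is hypothetical.

```python
import math

# (unadjusted B, adjusted B), copied from the printed outputs (3 d.p.)
estimates = {
    "AGE (per year)": (0.019, 0.024),
    "Dukes' D vs. A": (1.753, 1.896),
}


def percent_change_in_b(crude: float, adjusted: float) -> float:
    """Relative change in the log hazard ratio after adjustment, in percent."""
    return 100 * (adjusted - crude) / crude


for name, (crude, adjusted) in estimates.items():
    print(f"{name}: HR {math.exp(crude):.3f} unadjusted, "
          f"{math.exp(adjusted):.3f} adjusted "
          f"({percent_change_in_b(crude, adjusted):+.1f}% in B)")
```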
Interpretation of the Hazard Ratio
For a continuous variable such as age, the HR represents the multiplicative change in hazard per unit increase in age, e.g. HR = 1.024 means a 2.4% increase in hazard for each one-year increase in age.
For a categorical variable, the HR represents the change in hazard in one category relative to the reference category, e.g. HR = 6.66 for Stage D compared with Stage A represents a 6.7-fold increase in hazard.
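Because the model is multiplicative, the HR for a k-unit increase is HR^k = exp(kB), not k times the per-unit percentage. A sketch using the adjusted age HR of 1.024 (the function names are hypothetical):

```python
def hr_for_increment(hr_per_unit: float, units: float) -> float:
    """HR for an increase of `units` in a continuous covariate: HR ** units."""
    return hr_per_unit ** units


def percent_increase(hr: float) -> float:
    """Percentage increase in hazard implied by a hazard ratio."""
    return 100 * (hr - 1)


hr_age = 1.024  # adjusted HR per one-year increase in age
for years in (1, 10):
    hr = hr_for_increment(hr_age, years)
    print(f"{years} year(s) older: HR = {hr:.3f}, {percent_increase(hr):.1f}% higher hazard")
```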
First steps in modelling
• What hypotheses are you testing?
• If there is a main ‘exposure’ variable, enter it first and then assess confounders one at a time
• Assess each variable on both statistical significance and clinical importance
• It is acceptable to include an ‘important’ variable that is not statistically significant
Summary
• The Cox proportional hazards model is the most widely used analytical tool in survival research
• It is easily fitted in SPSS
• Model assessment requires some thought
• The next step is to consider how to select multiple factors for the ‘best’ model
Check the assumption of proportional hazards (PH)
• Proportional hazards assumes that the ratio of the hazard in one group to that in another remains the same throughout the follow-up period
• For example, that the HR for men vs. women is constant over time
• The simplest method is to check for parallel lines in the log(−log) plot of survival
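The log(−log) check can be sketched directly: estimate each group's survival curve with Kaplan-Meier, transform it to log(−log S(t)), and look for a roughly constant vertical gap. Under proportional hazards $S_1(t) = S_0(t)^{\exp(\beta)}$, so the gap is exactly β; the last function illustrates this with a hypothetical exponential baseline. All names and values here are illustrative assumptions, not SPSS output.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, S(t)) pairs."""
    s, curve = 1.0, []
    for t in sorted({t for t, d in zip(times, events) if d}):
        n_at_risk = sum(1 for t_j in times if t_j >= t)
        d = sum(1 for t_j, d_j in zip(times, events) if t_j == t and d_j)
        s *= 1 - d / n_at_risk
        curve.append((t, s))
    return curve


def log_minus_log(curve):
    """Transform S(t) to log(-log S(t)); skip S = 0 or 1, where it is undefined."""
    return [(t, math.log(-math.log(s))) for t, s in curve if 0 < s < 1]


def ph_gap(beta: float, t: float, rate: float = 0.1) -> float:
    """Vertical gap between log(-log S) curves when S1(t) = S0(t) ** exp(beta)."""
    s0 = math.exp(-rate * t)    # hypothetical exponential baseline survival
    s1 = s0 ** math.exp(beta)   # survival in the comparison group under PH
    return math.log(-math.log(s1)) - math.log(-math.log(s0))


print([round(ph_gap(0.7, t), 6) for t in (1, 2, 5)])  # constant gap: parallel lines
```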
Check the assumption of proportional hazards for each factor. The log minus log plot of survival should give parallel lines if PH holds.
Hint: within the Cox model, select the factor as CATEGORICAL and in PLOTS select the log minus log function, with separate lines for the factor.
Proportional hazards holds for Dukes' Staging
Categorical Variable Codings(b)
dukes(a)   Frequency   (1)   (2)   (3)   (4)
0=A            18       1     0     0     0
1=B           107       0     1     0     0
2=C           188       0     0     1     0
3=D           123       0     0     0     1
9=UK           40       0     0     0     0
a Indicator Parameter Coding
b Category variable: dukes (Dukes Staging)
Summary
• Selection of factors for multiple Cox regression models requires some judgement
• Automatic procedures are available, but treat their results with caution
• Multiple Cox models are easily fitted in SPSS
• Check the proportional hazards assumption
• Parsimonious models are better
Practical
• Read in Colorectal.sav and try to fit a multiple proportional hazards model
• Check the proportional hazards assumption