Gompertz regression parameterized as accelerated failure time model

Filip Andersson and Nicola Orsini

Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet

2017 Nordic and Baltic Stata meeting


Content

§ Introduction
§ Proportional hazard model
§ Accelerated failure time model
§ The Gompertz distribution
§ Structural equation models and mediation
§ Mediation in survival models
§ Estimating confidence intervals
§ What I am working on

2017-08-31 Filip Andersson


Content

§ Example
  → Data
  → Pre-estimation
  → Gompertz proportional hazard
  → Cox regression
  → Gompertz vs. Kaplan-Meier
  → Gompertz AFT model
  → Post-estimation
  → Conclusion


Introduction

§ Why use parametric survival models?
  → Can handle right-, left- or interval-censored data
  → Cox regression can't handle left- or interval-censored data
  → Produce better estimates if you have a theoretical expectation of the baseline hazard
  → Can estimate expected life, not only hazard ratios (AFT models)
  → Can include random effects, i.e. frailty models (not discussed here)


Introduction

§ A model that lacks an easy way to be estimated in Stata
  → Gompertz regression parameterized as an accelerated failure time model
  → Exists in R
    § eha package, with the command aftreg
§ Why use Stata?
  → Easy handling of survival data
    § Data management
    § Setup
  → Good graphical capabilities


Proportional hazard model

§ Easy to compare with Cox regression
  → Hazard ratios
  → Plots
    § Cumulative hazard function
    § Survival function
  → Commonly used
§ Hazard function, general form
  → $h(t \mid x) = h_0(t)\, e^{x\beta}$
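As a short worked step, assuming a single binary covariate $x$ purely for illustration, the proportional hazard form $h(t \mid x) = h_0(t)\, e^{x\beta}$ gives a hazard ratio that does not depend on $t$, which is why the estimates can be compared directly with those from Cox regression:

```latex
\[
\frac{h(t \mid x = 1)}{h(t \mid x = 0)}
  = \frac{h_0(t)\, e^{\beta}}{h_0(t)}
  = e^{\beta},
\qquad
S(t \mid x) = \exp\!\left(-H_0(t)\, e^{x\beta}\right) = S_0(t)^{\exp(x\beta)} .
\]
```

Here $H_0(t)$ denotes the baseline cumulative hazard and $S_0(t) = \exp(-H_0(t))$ the baseline survival function.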


Accelerated failure time model

§ Can be seen as a linear model (simplest form):
  → $\log t = a + bx + \varepsilon$
  → Useful in mediation
§ Estimation on the life scale
  → Estimation of expected baseline life
    § Area under the survival curve when all covariates are zero
  → Compare expected life between two groups
    § Logarithmic change in expected life compared to the baseline life expectancy
    § Expected life = Baseline life expectancy × exp(effect)
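The relation between expected life and baseline life expectancy can be traced back to the linear form $\log t = a + bx + \varepsilon$. The sketch below assumes that $\varepsilon$ is independent of $x$ and writes $T_0$ (a symbol introduced here) for the survival time when all covariates are zero:

```latex
\begin{align*}
\log T &= a + bx + \varepsilon
  \;\Longrightarrow\;
  T = e^{bx}\, e^{a + \varepsilon} = e^{bx}\, T_0, \\
E[T \mid x] &= e^{bx}\, E[T_0],
\qquad
\log E[T \mid x] - \log E[T_0] = bx .
\end{align*}
```

Under these assumptions the coefficient $b$ is the logarithmic change in expected life per unit of $x$, and $e^{b}$ is the ratio of expected life between two groups that differ by one unit.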


Accelerated failure time model

§ Definition of the accelerated failure time model
  → For a group $(x_1, x_2, \ldots, x_p)$, the model is written mathematically as $S(t \mid x) = S_0\!\left(\frac{t}{\eta(x)}\right)$, where $S_0(t)$ is the baseline survival function and $\eta(x)$ is an acceleration factor, that is, a ratio of survival times corresponding to any fixed value of $S(t)$. The acceleration factor is given by $\eta(x) = e^{(\alpha_1 x_1 + \cdots + \alpha_p x_p)}$. (Qi, J (2009))
§ Hazard function
  → $h(t \mid x) = \frac{1}{\eta(x)}\, h_0\!\left(\frac{t}{\eta(x)}\right)$
§ Log-linear form
  → $\log t = a + bx + \sigma\varepsilon$
  → where $t$ and $\varepsilon$ follow corresponding distributions
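The hazard function follows from the survival definition $S(t \mid x) = S_0(t/\eta(x))$ in one differentiation step. A sketch, using only the symbols from the definition:

```latex
\begin{align*}
h(t \mid x)
  &= -\frac{d}{dt} \log S(t \mid x)
   = -\frac{d}{dt} \log S_0\!\left(\frac{t}{\eta(x)}\right) \\
  &= \frac{1}{\eta(x)} \cdot
     \frac{-S_0'\!\left(t/\eta(x)\right)}{S_0\!\left(t/\eta(x)\right)}
   = \frac{1}{\eta(x)}\, h_0\!\left(\frac{t}{\eta(x)}\right).
\end{align*}
```

Equivalently, $T = \eta(x)\, T_0$ for a baseline time $T_0$ with survival function $S_0$, so $\log T = \log T_0 + \alpha_1 x_1 + \cdots + \alpha_p x_p$, which is the log-linear form with the $\alpha$ coefficients in the role of $b$.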


The Gompertz distribution

§ When is it useful?
  → Adult and old-age mortality for humans
    § Demographic models
    § Including models with treatment effects, such as for cancer patients
    § Can be a problem with very old individuals
§ Normal parametrization
  → $h(t) = \lambda e^{\gamma t}$
  → $\lambda > 0,\ \gamma \ge 0,\ t > 0$
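A minimal Python sketch of the normal parametrization $h(t) = \lambda e^{\gamma t}$ may help to see how the hazard and survival curves behave. The function names and the parameter values are assumptions chosen for illustration (a daily time scale with a small $\lambda$ and $\gamma$); they are not estimates from the presentation.

```python
import math


def gompertz_hazard(t, lam, gamma):
    """Hazard h(t) = lam * exp(gamma * t)."""
    return lam * math.exp(gamma * t)


def gompertz_survival(t, lam, gamma):
    """Survival S(t) = exp(-H(t)), with H(t) = (lam / gamma) * (exp(gamma * t) - 1)."""
    if gamma == 0:
        # gamma = 0 gives a constant hazard, i.e. the exponential model
        return math.exp(-lam * t)
    return math.exp(-(lam / gamma) * math.expm1(gamma * t))


# Illustrative values only (hypothetical, not taken from the slides)
lam, gamma = 1e-5, 2e-4
for t in (0, 5000, 10000, 15000):
    print(t, gompertz_hazard(t, lam, gamma), gompertz_survival(t, lam, gamma))
```

With $\gamma > 0$ the hazard grows exponentially with time, which is the feature that makes the distribution attractive for adult mortality.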


The Gompertz distribution

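§ New parametrization suggested by Broström, G & Edvinsson, S (2013)
  → $\lambda \to \frac{\lambda}{\gamma},\quad \gamma \to \frac{1}{\gamma}$
  → $h(t) = \frac{\lambda}{\gamma}\, e^{t/\gamma}$
  → $\lambda > 0,\ \gamma > 0,\ t > 0$
§ Proof of new parametrization
  → Hazard for the AFT model
  → $h(t \mid x) = \frac{1}{\eta(x)}\, h_0\!\left(\frac{t}{\eta(x)}\right)$
  → Here, the new gamma can be seen as an acceleration factor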
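The argument can be spelled out in two lines. The sketch below takes $h_0(t) = \lambda e^{t}$ as the reference hazard (an assumption made here to make the step explicit) and applies the AFT hazard formula first with $\gamma$ alone and then with a covariate-dependent factor $\eta(x)$:

```latex
\begin{align*}
\frac{1}{\gamma}\, h_0\!\left(\frac{t}{\gamma}\right)
  &= \frac{\lambda}{\gamma}\, e^{t/\gamma}
   = h(t), \\
h(t \mid x)
  = \frac{1}{\eta(x)}\, h\!\left(\frac{t}{\eta(x)}\right)
  &= \frac{\lambda}{\gamma\, \eta(x)}\,
     \exp\!\left(\frac{t}{\gamma\, \eta(x)}\right).
\end{align*}
```

The second line has the same Gompertz form with $\gamma$ replaced by $\gamma\, \eta(x)$, so covariates rescale time through $\gamma$ while $\lambda$ is unchanged.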


The Gompertz distribution

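§ Linear model: $\log t = a + bx + \varepsilon$
§ Here, $\varepsilon$ follows a log-Gompertz or inverse Weibull distribution
§ Compare to the Weibull model, where $\varepsilon$ follows a Gumbel distribution
§ Likelihood function
  → Survival function: $S(t) = \exp\!\left(-\lambda\left(e^{t/\gamma} - 1\right)\right)$
  → Density function: $f(t) = h(t)\, S(t)$
  → Hazard function: $h(t) = \frac{\lambda}{\gamma}\, e^{t/\gamma}$
  → $L(\alpha, \mu, \sigma) = \prod_{i=1}^{n} \left[h_i(t_i)\, S_i(t_i)\right]^{\delta_i} \left[S_i(t_i)\right]^{1-\delta_i}$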
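The likelihood for right-censored data can be sketched in a few lines of Python. This is a minimal illustration without covariates, using the hazard $h(t) = (\lambda/\gamma)\, e^{t/\gamma}$ and survival $S(t) = \exp(-\lambda(e^{t/\gamma} - 1))$; the function name, the toy data and the parameter values are hypothetical and are not how staftgomp is implemented.

```python
import math


def gompertz_aft_loglik(times, events, lam, gamma):
    """Log-likelihood for h(t) = (lam / gamma) * exp(t / gamma), no covariates.

    events[i] is 1 for an observed failure and 0 for a right-censored time.
    """
    total = 0.0
    for t, d in zip(times, events):
        log_h = math.log(lam / gamma) + t / gamma   # log hazard
        log_s = -lam * math.expm1(t / gamma)        # log survival
        # d * log(h * S) + (1 - d) * log(S) simplifies to d * log(h) + log(S)
        total += d * log_h + log_s
    return total


# Toy data (made up): three failures and one censored observation
times = [2.0, 3.5, 1.2, 4.0]
events = [1, 1, 1, 0]
print(gompertz_aft_loglik(times, events, lam=0.5, gamma=2.0))
```

In practice this function would be passed to a numerical optimizer, with covariates entering through the acceleration factor.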


Structural equation models and mediation

§ Simple way to estimate linear models within a pathway framework
§ Estimate all equations and combine them for the direct and indirect effects
§ Supported by most statistical programs
  → In Stata, the gsem command combined with simulation is preferable


Mediation in survival models

§ What do we need to do?
  1. Estimate a parametric survival model
  2. Estimate the effect of the exposure on the mediator
    § The first two steps come directly from the gsem output
  3. Estimate the indirect, direct and total effect
  4. Estimate confidence intervals and significance
    § Steps three and four can be done with either simulation or the delta method
§ These models are simple for continuous mediators, but can be tricky with binary or categorical mediators
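For the simple case of a continuous mediator, steps 1 to 3 can be written out as follows. This is a sketch under stated assumptions: a linear model for the mediator $M$, a log-linear AFT model for the survival time $T$, and no exposure-mediator interaction. The symbols $a_0, a_1, b_0, b_1, c$ are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
M &= a_0 + a_1 X + e, \\
\log T &= b_0 + c\,X + b_1 M + \sigma\varepsilon, \\
\text{direct effect} &= c, \qquad
\text{indirect effect} = a_1 b_1, \qquad
\text{total effect} = c + a_1 b_1 .
\end{align*}
```

All three effects are on the log-time scale, so exponentiating them gives ratios of expected life.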


Estimating confidence intervals

§ Simulation
  → Bootstrapping
    § Seems to be the more popular simulation method
    § Calculate point estimates for the indirect and direct effects
    § Simulate these point estimates
  → Monte Carlo simulation
    § More flexible in handling problematic correlations
    § Not as straightforward
§ Delta method
  → Easiest method and probably the most popular
  → Needs a stronger assumption of normality
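To make the contrast between the delta method and Monte Carlo simulation concrete, the sketch below computes a standard error for an indirect effect written as the product of two coefficients. It assumes, for simplicity, that the two estimates are uncorrelated and normally distributed; the function names and the numbers in the usage lines are hypothetical.

```python
import math
import random


def delta_se_product(a, se_a, b, se_b):
    """First-order delta-method SE of a*b, assuming a and b are uncorrelated."""
    return math.sqrt((b * se_a) ** 2 + (a * se_b) ** 2)


def monte_carlo_product(a, se_a, b, se_b, draws=20000, seed=1):
    """Simulate a*b from independent normal sampling distributions.

    Returns the simulated mean, SE and a 95% percentile interval.
    """
    rng = random.Random(seed)
    sims = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws))
    mean = sum(sims) / draws
    se = math.sqrt(sum((s - mean) ** 2 for s in sims) / (draws - 1))
    lower = sims[int(0.025 * draws)]
    upper = sims[int(0.975 * draws) - 1]
    return mean, se, (lower, upper)


# Hypothetical estimates: exposure -> mediator (a), mediator -> log time (b)
a, se_a, b, se_b = 0.5, 0.1, -0.2, 0.05
print(delta_se_product(a, se_a, b, se_b))
print(monte_carlo_product(a, se_a, b, se_b))
```

The percentile interval from the simulation need not be symmetric around the point estimate, whereas the delta method interval is symmetric by construction, which is where the stronger normality assumption enters.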


What I am working on

§ A Stata command, staftgomp, to estimate the Gompertz regression parameterized as an accelerated failure time model, similar to what streg does
§ A post-estimation command that would make it simple to estimate direct, indirect and total effects, with confidence intervals, for survival models


Example

§ Scanian Economic Demographic Database (Bengtsson, T., Dribe, M. and Svensson, P. (2012))

§ Longitudinal historical database
  → Data from the 17th century onwards
  → Here, data from individuals born between 1815 and 1860 are used
  → Comes from five rural parishes in western Scania
  → Records important life course events such as birth and death, but also births of children, marriage and socioeconomic status


Data

§ Variables used:
  → "Treatment variable":
    § Approximation of bad early life conditions
    § Infant mortality rate in the year of birth
    § High imr vs. low imr (binary)
    § Years of high disease load, such as measles, smallpox and whooping cough (Quaranta, L. (2013))
  → Parental socioeconomic status
    § Socioeconomic status at birth (binary)
    § Confounder
  → Outcome
    § The individuals are followed until death or out-migration.


Pre-estimation

§ Compare hazard estimates from the Gompertz proportional hazard model and Cox regression
§ Plot the survival curve and compare with Kaplan-Meier
§ If not acceptable, test different survival distributions until the parametric model is acceptable
  → Here, we choose Gompertz as it fits well and is supported theoretically for adult mortality


Gompertz proportional hazard

. streg imr_high ses, dist(gompertz)

Gompertz regression -- log relative-hazard form

No. of subjects =        3,756                  Number of obs    =       3,756
No. of failures =          880
Time at risk    =     19824107
                                                LR chi2(2)       =       26.53
Log likelihood  =   -1773.9194                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    imr_high |   1.259023   .0951873     3.05   0.002     1.085624    1.460119
         ses |   1.362878   .1010669     4.17   0.000     1.178513    1.576084
       _cons |   9.57e-06   8.25e-07  -134.05   0.000     8.08e-06    .0000113
-------------+----------------------------------------------------------------
      /gamma |   .0002332   8.35e-06    27.92   0.000     .0002168    .0002496
------------------------------------------------------------------------------


Cox regression

. stcox imr_high ses

Cox regression -- Breslow method for ties

No. of subjects =        3,756                  Number of obs    =       3,756
No. of failures =          880
Time at risk    =     19824107
                                                LR chi2(2)       =       28.17
Log likelihood  =   -5889.8259                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    imr_high |   1.261686   .0955679     3.07   0.002     1.087617    1.463614
         ses |   1.381581    .102833     4.34   0.000     1.194043    1.598573
------------------------------------------------------------------------------


Gompertz vs. Kaplan-Meier


Gompertz AFT model

. staftgomp imr_high ses

Gompertz AFT regression                         No. of obs       =        3756
Log likelihood = -9325.8767                     LR chi2(2)       =       14.34
Baseline life expectancy = 11669.94             Prob > chi2      =      0.0008
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
    imr_high |  -.0496389   .0261027    -1.90   0.057    -.1007992    .0015214
         ses |  -.0873728   .0273233    -3.20   0.001    -.1409255   -.0338202
-------------+----------------------------------------------------------------
bp           |
      lambda |   8.434053   .0451675   186.73   0.000     8.345526    8.522579
       gamma |  -2.931995    .102271   -28.67   0.000    -3.132442   -2.731547
------------------------------------------------------------------------------


Post-estimation

. lincom imr_high, eform

 ( 1)  [xb]imr_high = 0

------------------------------------------------------------------------------
          _t |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .951573   .0248386    -1.90   0.057     .9041146    1.001523
------------------------------------------------------------------------------

. nlcom exp([xb]imr_high)*11699

       _nl_1:  exp([xb]imr_high)*11699

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   11132.45   290.5866    38.31   0.000     10562.91    11701.99
------------------------------------------------------------------------------
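The two post-estimation results can be reproduced by hand from the imr_high coefficient and its standard error, which may clarify what lincom with eform and nlcom report. The Python sketch below applies the first-order delta method; the variable names are my own. The constant 11699 is the one typed in the nlcom call, while the regression output reports a baseline life expectancy of 11669.94, and the sketch simply follows the nlcom call.

```python
import math

b, se_b = -0.0496389, 0.0261027   # [xb]imr_high and its SE from the staftgomp output
baseline_days = 11699             # constant used in the nlcom call
z = 1.959964                      # 97.5th percentile of the standard normal

# lincom, eform: exponentiate the estimate and the CI limits; SE via delta method
ratio = math.exp(b)
ratio_se = ratio * se_b
ratio_ci = (math.exp(b - z * se_b), math.exp(b + z * se_b))

# nlcom: delta-method SE and a symmetric (Wald) CI on the transformed scale
days = baseline_days * ratio
days_se = baseline_days * ratio * se_b
days_ci = (days - z * days_se, days + z * days_se)

print(round(ratio, 6), round(ratio_se, 7), ratio_ci)
print(round(days, 2), round(days_se, 4), days_ci)
```

The interval for the ratio is asymmetric because its limits are exponentiated, whereas the nlcom interval is symmetric around 11132.45 days.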


Post-estimation

§ Baseline life expectancy
  → $\frac{11699}{365}$ days = 32.1 years
§ Estimating for individuals after 16000 days
  → $\frac{11699 + 16000}{365}$ days = 75.9 years of age
§ Effect of high imr during birth
  → $\frac{11132 + 16000}{365}$ days = 74.3 years of age
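The conversions from days to years of age are plain arithmetic and can be checked in a few lines of Python. The variable names are my own, and 365 days per year is the divisor used on the slide.

```python
DAYS_PER_YEAR = 365


def days_to_years(days):
    """Convert a number of days to years using the slide's 365-day year."""
    return days / DAYS_PER_YEAR


baseline = 11699   # baseline life expectancy in days, as used on the slide
high_imr = 11132   # expected life in days with high imr (nlcom estimate, rounded)
offset = 16000     # days added on the slide for individuals after 16000 days

print(round(days_to_years(baseline), 1))                # 32.1
print(round(days_to_years(baseline + offset), 1))       # 75.9
print(round(days_to_years(high_imr + offset), 1))       # 74.3
print(round(days_to_years(baseline - high_imr), 1))     # 1.6
```

The last line is the difference of about 1.6 years between the low imr and high imr groups.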


Conclusion

§ Conclusion
  → Even if you survive past the age of 40, your life expectancy is still on average 1.6 years shorter if you were born in a year with high imr
  → Latent effect
  → Support for the fetal origins hypothesis
  → Is the estimate reasonable?
§ If needed
  → Mediation analysis and calculation of direct, indirect and total effect of treatment
  → Here, total effect = direct effect
