Gompertz regression parameterized as accelerated failure time model
Filip Andersson and Nicola Orsini
Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet
2017 Nordic and Baltic Stata meeting
Content
§ Introduction
§ Proportional hazard model
§ Accelerated failure time model
§ The Gompertz distribution
§ Structural equation models and mediation
§ Mediation in survival models
§ Estimating confidence intervals
§ What I am working on
2017-08-31Filip Andersson 2
Content
§ Example
  → Data
  → Pre-estimation
  → Gompertz proportional hazard
  → Cox regression
  → Gompertz vs. Kaplan-Meier
  → Gompertz AFT model
  → Post-estimation
  → Conclusion
Introduction
§ Why use parametric survival models?
  → Can handle right-, left- or interval-censored data
  → Cox regression cannot handle left- or interval-censored data
  → Produce better estimates if you have a theoretical expectation of the baseline hazard
  → Can estimate expected life, not only hazard ratios (AFT models)
  → Can include random effects, i.e. frailty models (not discussed here)
Introduction
§ A model that lacks an easy way to be estimated in Stata
  → Gompertz regression parameterized as an accelerated failure time model
  → Exists in R
    § eha package, with the command aftreg
§ Why use Stata?
  → Easy handling of survival data
    § Data management
    § Setup
  → Good graphical capabilities
Proportional hazard model
§ Easy to compare with Cox regression
  → Hazard ratios
  → Plots
    § Cumulative hazard function
    § Survival function
  → Commonly used
§ Hazard function, general form
  → $h(t \mid x) = h_0(t)\,e^{x\beta}$
Accelerated failure time model
§ Can be seen as a linear model (simplest form):
  → $\log t = a + bx + \varepsilon$
  → Useful in mediation
§ Estimation on the life scale
  → Estimation of expected baseline life
    § Area under the survival curve when all covariates are zero
  → Compare expected life between two groups
    § Logarithmic change in expected life compared to the baseline life expectancy
    § Expected life = Baseline life expectancy × exp(effect)
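The relation Expected life = Baseline life expectancy × exp(effect) can be tried out with an AFT model that streg already supports. The sketch below is only an illustration under stated assumptions: it uses Stata's shipped example dataset cancer.dta (variables studytime, died, drug, age) instead of the data from this talk, and a Weibull instead of a Gompertz distribution.

```stata
* Illustration only: shipped example data and a Weibull model,
* not the SEDD data or the Gompertz AFT model discussed in the talk
sysuse cancer, clear
stset studytime, failure(died)

* Fit the Weibull model in the accelerated failure time metric and
* report exponentiated coefficients (time ratios)
streg age i.drug, distribution(weibull) time tratio

* exp(b) is the factor by which survival time is multiplied
* for a one-unit increase in the covariate
display "Time ratio for one more year of age: " exp(_b[age])
```

A time ratio below 1 means shorter expected survival, which is how the negative coefficients in an AFT model are read.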
Accelerated failure time model
§ Definition of the accelerated failure time model
  → For a group $(X_1, X_2, \ldots, X_p)$, the model is written mathematically as $S(t \mid x) = S_0\!\left(t/\eta(x)\right)$, where $S_0(t)$ is the baseline survival function and $\eta(x)$ is an acceleration factor that is a ratio of survival times corresponding to any fixed value of $S(t)$. The acceleration factor is given according to the formula $\eta(x) = e^{(\alpha_1 x_1 + \cdots + \alpha_p x_p)}$ (Qi, J. (2009)).
§ Hazard function
  → $h(t \mid x) = \frac{1}{\eta(x)}\, h_0\!\left(t/\eta(x)\right)$
§ Log-linear form
  → $\log t = a + bx + \sigma\varepsilon$
  → where $t$ and $\varepsilon$ follow corresponding distributions
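The AFT hazard follows from the survival-function definition $S(t \mid x) = S_0(t/\eta(x))$ in one step. A short derivation, writing $f_0$ for the baseline density (a symbol introduced here, not used on the slide):

```latex
\begin{align*}
f(t \mid x) &= -\frac{\partial}{\partial t}\, S_0\!\left(\frac{t}{\eta(x)}\right)
             = \frac{1}{\eta(x)}\, f_0\!\left(\frac{t}{\eta(x)}\right), \\
h(t \mid x) &= \frac{f(t \mid x)}{S(t \mid x)}
             = \frac{1}{\eta(x)}\,
               \frac{f_0\!\left(t/\eta(x)\right)}{S_0\!\left(t/\eta(x)\right)}
             = \frac{1}{\eta(x)}\, h_0\!\left(\frac{t}{\eta(x)}\right).
\end{align*}
```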
The Gompertz distribution
§ When is it useful?
  → Adult and old-age mortality for humans
    § Demographic models
    § Including models with treatment effects, such as for cancer patients
    § Can be a problem with very old individuals
§ Usual parameterization
  → $h(t) = \lambda e^{\gamma t}$
  → $\lambda > 0,\ \gamma \ge 0,\ t > 0$
The Gompertz distribution
§ New parameterization suggested by Broström, G. & Edvinsson, S. (2013)
  → $\lambda \to \lambda/\gamma$, $\gamma \to 1/\gamma$
  → $h(t) = \frac{\lambda}{\gamma}\, e^{t/\gamma}$
  → $\lambda > 0,\ \gamma > 0,\ t > 0$
§ Proof of the new parameterization
  → Hazard for the AFT model
  → $h(t \mid x) = \frac{1}{\eta(x)}\, h_0\!\left(t/\eta(x)\right)$
  → Here, the new gamma can be seen as an acceleration factor
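The step the proof relies on can be written out by substituting the reparameterized baseline hazard $h_0(t) = (\lambda/\gamma)\,e^{t/\gamma}$ into the AFT hazard. This is a sketch of the argument as it reads from the formulas above, not a reproduction of the original proof:

```latex
\begin{align*}
h(t \mid x) &= \frac{1}{\eta(x)}\, h_0\!\left(\frac{t}{\eta(x)}\right)
             = \frac{1}{\eta(x)} \cdot \frac{\lambda}{\gamma}\,
               \exp\!\left(\frac{t}{\gamma\,\eta(x)}\right)
             = \frac{\lambda}{\gamma\,\eta(x)}\,
               \exp\!\left(\frac{t}{\gamma\,\eta(x)}\right).
\end{align*}
```

The result is again a Gompertz hazard with the same $\lambda$ and with $\gamma$ replaced by $\gamma\,\eta(x)$, so the covariates act only by rescaling $\gamma$. In the usual parameterization $h_0(t) = \lambda e^{\gamma t}$ the same substitution gives $(\lambda/\eta(x))\,e^{(\gamma/\eta(x))\,t}$, where both parameters change.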
The Gompertz distribution
§ Linear model: $\log t = a + bx + \varepsilon$
§ Here, $\varepsilon$ follows a log-Gompertz or inverse Weibull distribution
§ Compare with the Weibull model, where $\varepsilon$ follows a Gumbel distribution
§ Likelihood function
  → Survival function: $S(t) = \exp\!\left\{-\lambda\left(e^{t/\gamma} - 1\right)\right\}$
  → Density function: $f(t) = h(t)\,S(t)$
  → Hazard function: $h(t) = \frac{\lambda}{\gamma}\, e^{t/\gamma}$
  → $L(\alpha, \mu, \sigma) = \prod_{i=1}^{n} \left\{h_i(t_i)\,S_i(t_i)\right\}^{\delta_i} \left\{S_i(t_i)\right\}^{1-\delta_i}$
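For estimation the likelihood is normally maximized on the log scale. The expression below is a sketch under assumptions not stated on the slide: $\delta_i$ is taken to be the event indicator (1 for an observed death, 0 for censoring), covariates enter through $\gamma_i = \gamma\,\eta(x_i)$, and there is no delayed entry.

```latex
\begin{align*}
\log L &= \sum_{i=1}^{n} \left\{ \delta_i \log h_i(t_i) + \log S_i(t_i) \right\} \\
       &= \sum_{i=1}^{n} \left\{ \delta_i \left( \log\lambda - \log\gamma_i
          + \frac{t_i}{\gamma_i} \right)
          - \lambda \left( e^{t_i/\gamma_i} - 1 \right) \right\},
\qquad \gamma_i = \gamma\,\eta(x_i).
\end{align*}
```

The first line uses $\{h_i S_i\}^{\delta_i} \{S_i\}^{1-\delta_i} = h_i^{\delta_i} S_i$, so every individual contributes the survival term and only observed deaths contribute the hazard term.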
Structural equation models and mediation
§ Simple way to estimate linear models within a pathway framework
§ Estimate all equations and combine them for the direct and indirect effects
§ Supported by most statistical programs
  → In Stata, the gsem command combined with simulation is preferable
Mediation in survival models
§ What do we need to do?
  1. Estimate a parametric survival model
  2. Estimate the effect of the exposure on the mediator
    § The first two steps come directly from the gsem output
  3. Estimate the indirect, direct and total effect
  4. Estimate confidence intervals and significance
    § Steps three and four can be done with either simulation or the delta method
§ These models are simple for continuous mediators, but can be tricky with binary or categorical mediators
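For a continuous mediator, step 3 is often done with the product-of-coefficients approach. The equations below are a sketch under assumed notation (exposure $X$, mediator $M$, coefficients $a_j$ and $b_j$, none of which appear on the slides) and assume no exposure-mediator interaction:

```latex
\begin{align*}
M &= a_0 + a_1 X + e, \\
\log T &= b_0 + b_1 X + b_2 M + \sigma\varepsilon, \\
\text{direct effect} &= b_1, \qquad
\text{indirect effect} = a_1 b_2, \qquad
\text{total effect} = b_1 + a_1 b_2 .
\end{align*}
```

These effects are on the log-time scale; exponentiating them gives the multiplicative change in expected life, in line with the AFT interpretation earlier in the talk.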
Estimating confidence intervals
§ Simulation
  → Bootstrapping
    § Seems to be the more popular simulation method
    § Calculate point estimates for the indirect and direct effects
    § Simulate these point estimates
  → Monte Carlo simulation
    § More flexible in handling problematic correlations
    § Not as straightforward
§ Delta method
  § Easiest method and probably the most popular
  § Needs a stronger assumption of normality
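The two routes to a confidence interval can be compared on a simple nonlinear quantity. The sketch below is an illustration under assumptions: it uses Stata's shipped cancer.dta and a Weibull AFT model fitted with streg rather than the talk's data and model, and the quantity is a time ratio, not a mediation effect. The name tr_age, the seed and the number of replications are arbitrary choices.

```stata
* Illustration only: shipped example data and a Weibull AFT model
sysuse cancer, clear
stset studytime, failure(died)
streg age i.drug, distribution(weibull) time

* Delta method: nlcom linearizes the nonlinear function exp(b)
nlcom (tr_age: exp(_b[age]))

* Bootstrap: refit the model on resampled data and collect exp(b)
bootstrap tr_age = exp(_b[age]), reps(200) seed(12345): ///
    streg age i.drug, distribution(weibull) time
estat bootstrap, percentile
```

The delta-method interval is symmetric around the point estimate by construction, while the percentile bootstrap interval does not have to be.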
What I am working on
§ A Stata command, staftgomp, to estimate the Gompertz regression parameterized as an accelerated failure time model, similar to what streg does
§ A post-estimation command that would make it simple to estimate the direct, indirect and total effect, with confidence intervals, for survival models
Example
§ Scanian Economic Demographic Database (Bengtsson, T., Dribe, M. and Svensson, P. (2012))
§ Longitudinal historical database
  → Data from the 17th century onwards
  → Here, data from individuals born between 1815 and 1860 are used
  → Comes from five rural parishes in western Scania
  → Contains important life-course events such as birth and death; births of children, marriage and socioeconomic status are also recorded
Data
§ Variables used:
  → "Treatment variable":
    § Approximation of bad early-life conditions
    § Infant mortality rate (imr) in the year of birth
    § High imr vs. low imr (binary)
    § Years of high disease load, such as measles, smallpox and whooping cough (Quaranta, L. (2013))
  → Parental socioeconomic status
    § Socioeconomic status at birth (binary)
    § Confounder
  → Outcome
    § The individuals are followed until death or out-migration.
Pre-estimation
§ Compare hazard estimates from the Gompertz proportional hazard model and Cox regression
§ Plot the survival curve and compare it with the Kaplan-Meier estimate
§ If the fit is not acceptable, test different survival distributions until the parametric model is acceptable
  → Here, we choose Gompertz as it fits well and is supported theoretically for adult mortality
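The pre-estimation checks can be sketched with built-in commands. The example is an assumption-laden illustration: it runs on Stata's shipped cancer.dta (variables studytime, died, drug, age) rather than the talk's data, and the variable names s_gomp and s_km are made up for the example.

```stata
* Illustration only: shipped example data, not the SEDD data
sysuse cancer, clear
stset studytime, failure(died)

* Step 1: fit the Gompertz PH model and the Cox model with the same
* covariates and compare the reported hazard ratios
streg age i.drug, distribution(gompertz)
stcox age i.drug

* Step 2: overlay the fitted Gompertz survival curve (no covariates)
* on the Kaplan-Meier estimate
streg, distribution(gompertz)
predict double s_gomp, surv
sts generate s_km = s
twoway (line s_km _t, sort connect(stairstep)) ///
       (line s_gomp _t, sort),                 ///
       legend(order(1 "Kaplan-Meier" 2 "Gompertz")) ///
       ytitle("Survival probability") xtitle("Analysis time")
```

If the two curves diverge clearly, the same steps can be repeated with another distribution() until the parametric fit is acceptable.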
Gompertz proportional hazard
. streg imr_high ses, dist(gompertz)

Gompertz regression -- log relative-hazard form

No. of subjects =        3,756                  Number of obs    =       3,756
No. of failures =          880
Time at risk    =     19824107
                                                LR chi2(2)       =       26.53
Log likelihood  =   -1773.9194                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    imr_high |   1.259023   .0951873     3.05   0.002     1.085624    1.460119
         ses |   1.362878   .1010669     4.17   0.000     1.178513    1.576084
       _cons |   9.57e-06   8.25e-07  -134.05   0.000     8.08e-06    .0000113
-------------+----------------------------------------------------------------
      /gamma |   .0002332   8.35e-06    27.92   0.000     .0002168    .0002496
------------------------------------------------------------------------------
Cox regression
. stcox imr_high ses

Cox regression -- Breslow method for ties

No. of subjects =        3,756                  Number of obs    =       3,756
No. of failures =          880
Time at risk    =     19824107
                                                LR chi2(2)       =       28.17
Log likelihood  =   -5889.8259                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    imr_high |   1.261686   .0955679     3.07   0.002     1.087617    1.463614
         ses |   1.381581    .102833     4.34   0.000     1.194043    1.598573
------------------------------------------------------------------------------
Gompertz vs. Kaplan-Meier
[Figure: fitted Gompertz survival curve compared with the Kaplan-Meier estimate]
Gompertz AFT model
. staftgomp imr_high ses

Gompertz AFT regression                         No. of obs       =        3756
Log likelihood = -9325.8767                     LR chi2(2)       =       14.34
Baseline life expectancy = 11669.94             Prob > chi2      =      0.0008
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
    imr_high |  -.0496389   .0261027    -1.90   0.057    -.1007992    .0015214
         ses |  -.0873728   .0273233    -3.20   0.001    -.1409255   -.0338202
-------------+----------------------------------------------------------------
bp           |
      lambda |   8.434053   .0451675   186.73   0.000     8.345526    8.522579
       gamma |  -2.931995    .102271   -28.67   0.000    -3.132442   -2.731547
------------------------------------------------------------------------------
Post-estimation
. lincom imr_high, eform

 ( 1)  [xb]imr_high = 0

------------------------------------------------------------------------------
          _t |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .951573   .0248386    -1.90   0.057     .9041146    1.001523
------------------------------------------------------------------------------

. nlcom exp([xb]imr_high)*11699

       _nl_1:  exp([xb]imr_high)*11699

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   11132.45   290.5866    38.31   0.000     10562.91    11701.99
------------------------------------------------------------------------------
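Both standard errors above are delta-method results and can be checked by hand from the coefficient and standard error of imr_high in the staftgomp output ($b = -0.0496389$, $\mathrm{se}(b) = 0.0261027$). The symbol $g$ is introduced here for the transformed quantity:

```latex
\begin{align*}
\widehat{\mathrm{se}}\{g(b)\} &\approx |g'(b)|\;\widehat{\mathrm{se}}(b), \\
g(b) = e^{b}: \quad
  & e^{-0.0496389} \times 0.0261027 = 0.951573 \times 0.0261027 \approx 0.0248386, \\
g(b) = 11699\,e^{b}: \quad
  & 11699 \times 0.0248386 \approx 290.59 .
\end{align*}
```

These agree with the reported standard errors of .0248386 for lincom and 290.5866 for nlcom.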
Post-estimation
§ Baseline life expectancy
  → 11699 days / 365 = 32.1 years
§ Estimating for individuals after 16000 days
  → (11699 + 16000) days / 365 = 75.9 years of age
§ Effect of high imr at birth
  → (11132 + 16000) days / 365 = 74.3 years of age
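The conversions from days to years can be reproduced with display. The scalar names are made up for this sketch; the numbers are the ones used in the nlcom call and its result, plus the 16000-day offset stated on the slide.

```stata
* Numbers taken from the nlcom command and output (in days)
scalar base_days = 11699      // baseline life expectancy used in nlcom
scalar high_days = 11132.45   // expected life with high imr (_nl_1)
scalar offset    = 16000      // 16000-day offset stated on the slide

display %5.1f base_days/365                  // 32.1 years
display %5.1f (base_days + offset)/365       // 75.9 years of age
display %5.1f (high_days + offset)/365       // 74.3 years of age
display %5.1f (base_days - high_days)/365    // 1.6 years shorter
```

The last line gives the difference in expected life between the two groups, about 1.6 years.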
Conclusion
§ Conclusion
  → Even if you survive past the age of 40, your life expectancy is on average 1.6 years shorter if you were born in a year with high imr
  → Latent effect
  → Support for the fetal origins hypothesis
  → Is the estimate reasonable?
§ If needed
  → Mediation analysis and calculation of the direct, indirect and total effect of treatment
  → Here, total effect = direct effect