Multi-state modelling software, and encouraging statistical software development.
Chris Jackson, MRC Biostatistics Unit, Cambridge, U.K.
MRC Biostatistics Unit Centenary Conference, 25 March 2014
Chris Jackson, MRC-BSU Cambridge Multi-state modelling, and encouraging more software 1/ 24
Overview
Part 1. Software for multi-state models
- Two different types of multi-state model
- msm package for R: features, design principles, potential developments
- survival + mstate packages for R.
Part 2. Encouraging statistical software development in biostatistics research
- What’s needed, and how to do it. Start discussion...
Part I
Software for multi-state modelling —
state of the art and future.
Multi-state models in continuous time
Example:
[Figure: disease stages 1, 2, ..., n−2, n−1, and an absorbing state n]
Defined by matrix Q of transition intensities: the instantaneous risk of moving from state r to state s ≠ r at time t:
$$q_{rs}(t, \mathcal{F}(t^-)) = \lim_{\delta t \to 0} P(S(t + \delta t) = s \mid S(t) = r, \mathcal{F}(t^-)) / \delta t.$$
e.g. Markov model: $q_{rs}$ independent of $\mathcal{F}(t^-)$; time-homogeneous model: $q_{rs}$ independent of $t$.
- Single period (sojourn time) in state r ∼ Exp(mean = $-1/q_{rr}$)
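A minimal sketch of the sojourn-time result above, in base R. The three-state structure and all intensity values are made up for illustration and are not taken from the talk.

```r
# Hypothetical 3-state model: states 1 and 2 are transient, state 3 is absorbing.
# Off-diagonal entries are the transition intensities q_rs (values invented).
Q <- rbind(c(0, 0.20, 0.05),
           c(0, 0,    0.30),
           c(0, 0,    0))
# Each diagonal entry is minus the sum of the other entries in its row,
# so that every row of Q sums to zero.
diag(Q) <- -rowSums(Q)

# Mean sojourn time in each transient state: -1 / q_rr.
transient    <- diag(Q) < 0
mean_sojourn <- -1 / diag(Q)[transient]

# Probability that the next state after leaving r is s: -q_rs / q_rr.
jump <- Q
diag(jump) <- 0
next_state <- jump[transient, ] / -diag(Q)[transient]

print(mean_sojourn)
print(next_state)
```

For a fitted msm model, the package's own extractor functions (for example `qmatrix.msm` and `sojourn.msm`) report the corresponding estimates with confidence intervals.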
Data for multi-state models (1): intermittently-observed
[Figure: state (1 to 4) against time, with observation times t0, t1, t2, t3, t4]
Panel data. State only observed at a finite number of times $t_j$.
- Don’t know the state between these times
- e.g. chronic disease only measurable at clinic visit / screening test
- Likelihood is product of transition probabilities between states $S(t_j)$ observed at successive $t_j$ (Kalbfleisch & Lawless, JASA 1985):
$$L(Q) = \prod_j P_{S(t_j),\,S(t_{j+1})}(t_{j+1} - t_j).$$
- Closed form for corresponding matrix $P(t) = \mathrm{Exp}(tQ)$ only if Q is constant / piecewise constant with time t.
- Non-Markov models difficult (see later...)
msm package for R: designed for this type of data
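The panel-data likelihood above can be sketched in base R for one subject and a time-homogeneous Q. This is an illustration only, not msm's implementation: `mat_exp` and `panel_loglik` are hypothetical helper names, the data are invented, and the matrix exponential is a simple scaled Taylor series (a dedicated routine such as `expm::expm` or `msm::MatrixExp` would normally be used instead).

```r
# Matrix exponential by scaling and squaring with a truncated Taylor series.
# Illustrative helper only.
mat_exp <- function(A, order = 20) {
  s <- ceiling(log2(max(1, norm(A, "I")))) + 1   # number of halvings
  B <- A / 2^s
  term <- diag(nrow(A))
  result <- term
  for (k in seq_len(order)) {
    term <- term %*% B / k        # B^k / k!
    result <- result + term
  }
  for (i in seq_len(s)) result <- result %*% result   # undo the scaling
  result
}

# Log-likelihood for one subject observed in `states` at `times`:
# sum over j of log P[S(t_j), S(t_{j+1})](t_{j+1} - t_j), with P(t) = exp(tQ).
panel_loglik <- function(Q, states, times) {
  ll <- 0
  for (j in seq_len(length(states) - 1)) {
    P <- mat_exp(Q * (times[j + 1] - times[j]))
    ll <- ll + log(P[states[j], states[j + 1]])
  }
  ll
}

# Invented intensities and observations.
Q <- rbind(c(-0.25, 0.20, 0.05),
           c(0,    -0.30, 0.30),
           c(0,     0,    0))
states <- c(1, 1, 2, 3)
times  <- c(0, 1, 2.5, 4)
print(panel_loglik(Q, states, times))
```

Maximising the sum of such terms over all subjects, as a function of the free entries of Q, gives the maximum likelihood estimate.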
Data for multi-state models (2): completely-observed
[Figure: state (1, 2, 3, Death) against time, from 0 to 12]
Observe all changes of state
- know the complete process history.
- e.g. changes of state represent events
- MI, stroke, periods in hospital.
- event times may be known from administrative data.
- Time-to-event data with competing event times censored.
  - substantial literature on survival / competing risks
- Only Markov models supported by msm.
  - exponential / piecewise-exponential event times.
- Can estimate transition rates under more flexible models (e.g. Cox semi-Markov) using standard survival analysis software.
survival and mstate packages for R designed for this
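For the exponential (time-homogeneous Markov) case with fully observed histories, the maximum likelihood estimate of each intensity is the number of observed r → s transitions divided by the total time spent at risk in state r. A small sketch with invented data; the data layout (one row per sojourn) is an assumption made for illustration.

```r
# Hypothetical fully observed histories, one row per period spent in a state.
sojourns <- data.frame(
  from     = c(1, 2, 1, 1, 2, 1),
  to       = c(2, 3, 3, 2, NA, NA),  # NA: still in `from` when follow-up ended
  duration = c(2.0, 1.5, 4.0, 1.0, 3.5, 5.0)
)

# Total time at risk in each state (censored sojourns count too).
time_at_risk <- tapply(sojourns$duration, sojourns$from, sum)

# Number of observed transitions from each state to each other state
# (rows with a censored sojourn are dropped by table()).
n_trans <- table(from = sojourns$from, to = sojourns$to)

# Estimated intensity q_rs = transitions r -> s / time at risk in r.
q_hat <- n_trans / as.vector(time_at_risk[rownames(n_trans)])
print(q_hat)
```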
msm R package for multi-state modelling
http://CRAN.R-project.org/package=msm
Jackson (J Stat. Soft. 2011), Jackson et al. (Statistician 2003)
Used in health, finance, ecology, social science, engineering. . .
General and flexible. Fit continuous-time Markov models
- with any state structure / transition matrix
- covariates (proportional intensities) for any / all transitions
  - subject-specific time-constant, or
  - piecewise-constant time-dependent, including time itself
- to various patterns of observation, particularly intermittently-observed...

    msm(state ~ time, subject=subj, data=mydata,
        covariates = list("1-2" = ~ age,
                          "2-3" = ~ age + treatment),
        qmatrix=rbind(c(0,1,1),
                      c(0,0,1),
                      c(0,0,0)), gen.inits=TRUE)
Other observation schemes
[Figure: state (1 to 3, and 4 = Death) against time, observations at 0.0, 1.5, 3.5, 5.0, 9.0]
State entry time observed, but state at previous time unknown (typical for times of death)

    # state 4 like this
    msm(..., death=4, ...)

[Figure: same axes, with the state at some observation times marked "?"]
States only observed to be in a certain set of possible states (e.g. alive, but unknown disease severity)

    # (put 999 in data at these times)
    msm(..., censor=999,
        censor.states=c(1,2,3))

Appropriate likelihoods computed / maximised by msm. (Truncated samples and informative observation times are not handled.)
Hidden Markov models
[Figure: state (1 to 4) against time, observations at 0.0, 1.5, 3.5, 5.0, 10.0]
States observed with error, from a true hidden Markov chain.

    msm(..., ematrix = rbind(c(0,1,0),
                             c(1,0,0),
                             c(0,0,0)), ...)

[Figure: marker level (40 to 100) against months (0 to 100), observed and underlying values, with regions labelled disease-free, disease stage 1, disease stage 2]
Continuous outcome, conditional on a hidden Markov chain (many outcome distributions)

    msm(y ~ time,
        hmodel = list(hmmNorm(mean=100, sd=16),
                      hmmNorm(mean=54, sd=18),
                      hmmIdent(999)), ...)

msm jointly estimates (with covariates on either)
- Markov transition intensities Q
- misclassification probabilities / outcome parameters
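To make the misclassification likelihood concrete, here is a base R sketch of the standard forward algorithm for a two-state hidden chain observed at irregular times. It is an illustration under stated assumptions, not msm's code: `pmat2` and `hmm_lik` are hypothetical names, the intensities and error probabilities are invented, and the two-state closed form for P(t) is used so that no matrix exponential is needed.

```r
# Transition probabilities over an interval t for a two-state chain with
# intensities a (1 -> 2) and b (2 -> 1); closed form for this special case.
pmat2 <- function(a, b, t) {
  d <- exp(-(a + b) * t)
  rbind(c(b + a * d, a - a * d),
        c(b - b * d, a + b * d)) / (a + b)
}

# Forward algorithm. E[r, o] = P(observe o | true state r);
# init[r] = P(true state r at the first observation).
hmm_lik <- function(obs, times, a, b, E, init) {
  # alpha[r] = P(observations so far, true state now = r)
  alpha <- init * E[, obs[1]]
  for (j in seq_along(obs)[-1]) {
    P <- pmat2(a, b, times[j] - times[j - 1])
    alpha <- as.vector(alpha %*% P) * E[, obs[j]]
  }
  sum(alpha)
}

# Invented parameter values and data.
a <- 0.3; b <- 0.1
E <- rbind(c(0.9, 0.1),    # true state 1 misread as 2 with probability 0.1
           c(0.2, 0.8))    # true state 2 misread as 1 with probability 0.2
init  <- c(1, 0)           # assumption: everyone starts in true state 1
obs   <- c(1, 2, 2, 1)
times <- c(0, 1.5, 3.5, 5)
print(hmm_lik(obs, times, a, b, E, init))
```

With a continuous outcome, the column `E[, obs[j]]` would be replaced by the outcome density evaluated in each hidden state.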
Other features
- Constrain parameters to equal other parameters or to known values: parsimony / model checking

    msm(...,
        constraint = list(age = c(1,1,1,2,2),
                          treatment = c(1,2,3,4,4)),
        fixedpars = c(4,9),
        ...)

- Mixture of observation types (intermittent, exact, misclassified...) in the same data

    msm(..., obstype = obs, obstrue = obst, ... )

- Model assessment: plots and formal tests of fit (Titman & Sharples, Stat Meth Med Res (2010) 19(6):621-651).

    plot.prevalence.msm(x), pearson.msm(x), ...
Programming principles (1)
(I don’t always follow these! Takes time)
- Minimum input from user: sensible defaults, that advanced users can tweak.
- Complete documentation, user guides and examples.
- Maintain it / fix bugs. Reply helpfully to emails!
- Update documentation if people find something unclear / commonly misuse it in the same way
- Outputs that are meaningful / helpful.
  - e.g. confidence intervals instead of SEs / p-values, hazard ratios as well as log HRs.
- Useful messages. Tell the user how to fix any errors:
  Unhelpful: Data inconsistent with transition matrix
  Helpful: Data inconsistent with transition matrix: subject 200 moves from state 4 to state 2 at non-missing observation 1436
Programming principles (2)
(I don’t always follow these! Takes time)
- Modular / maintainable code.
- Avoid reinventing the wheel: build on others’ work
  - e.g. msm now depends on expm (Goulet et al.), a nice library for matrix exponentiation, instead of relying on its own methods
- Automatic “unit testing”: for each package function, when updating package

    stopifnot(result_from_new_version() == known_result)

- Efficiency: vectorised R code, heavy numerical work done in C
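The unit-testing idea above can be written as a small regression test in base R. This is a sketch with a hypothetical function under test (`mean_sojourn_times`) and a hand-computed known result. For numeric results, `all.equal` is used instead of `==`, because exact comparison of floating-point values is fragile.

```r
# Hypothetical package function under test.
mean_sojourn_times <- function(Q) -1 / diag(Q)[diag(Q) < 0]

# A stored input and the result known to be correct (computed by hand).
Q_test <- rbind(c(-0.25, 0.25, 0),
                c(0,    -0.5,  0.5),
                c(0,     0,    0))
known_result <- c(4, 2)

# Regression test: stops with an error if the new version disagrees.
stopifnot(isTRUE(all.equal(mean_sojourn_times(Q_test), known_result)))

# Why not `==`: rounding error makes exact equality unreliable.
exact_match <- (0.1 + 0.2) == 0.3                    # FALSE
close_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))     # TRUE
print(c(exact = exact_match, tolerant = close_match))
```

Tests like this can be placed in a package's `tests/` directory so that they run automatically whenever the package is checked.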
Future developments in msm
- Many internal cleanups, and unglamorous new features, in past year.
- In progress: modified AIC to compare models with differently-aggregated state structures (Howard Thom)
- Future: test assumptions (Markov, time / between-patient homogeneity) through fitting more complex models...
- ... but difficult with intermittent observations: unobserved event times / paths through states.
- Integrating likelihoods over possible paths / event times only feasible for simple transition structures (e.g. well/illness/death)
Semi-Markov models for panel data
[Figure: state (1 to 4) against time, with observation times t0, t1, t2, t3, t4]
- Transition intensity depends on time spent in current state.
- Difficult for panel data: don’t know time of entry into state.
Phase-type models (Titman and Sharples, Biometrics 2010; Titman, Statistics and Computing 2011)
[Figure: current state divided into substates 1, 2, ..., k; substate k + 1 is the next state]
Exponential sojourn in current state replaced by embedded Markov model with k phases: sojourn time is time to state k + 1.
May be implemented as hidden Markov models in msm
- many more parameters, how to interpret?
- identifiability constraints not implemented yet
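As a small illustration of the phase-type construction, the sketch below expands one observable state into k = 2 phases and computes its mean sojourn time from the phase-level intensities. The rate values are invented, and the formula used (mean = α (−S)⁻¹ 1, with S the intensities among the phases and α the starting distribution) is the standard result for phase-type distributions rather than anything specific to msm.

```r
# Hypothetical rates for a state split into two phases.
lambda1 <- 1.0   # phase 1 -> phase 2
mu1     <- 0.2   # phase 1 -> next state
mu2     <- 0.5   # phase 2 -> next state

# Expanded intensity matrix: rows/columns are phase 1, phase 2, next state.
Q_phase <- rbind(c(-(lambda1 + mu1), lambda1, mu1),
                 c(0,                -mu2,    mu2),
                 c(0,                 0,      0))

# Intensities among the phases only, and the starting distribution
# (assumption: the state is always entered in phase 1).
S     <- Q_phase[1:2, 1:2]
alpha <- c(1, 0)

# Mean time until the next state is reached.
mean_sojourn <- as.numeric(alpha %*% solve(-S) %*% c(1, 1))
print(mean_sojourn)
```

Because the exit rate differs between phases, the hazard of leaving the observable state changes with time since entry, which is what makes the model semi-Markov at the level of the observed states.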
Random effects models
- Unexplained heterogeneity in transition intensities between individuals / groups.
- Likelihood only tractable for specific cases
  - e.g. discrete random effects distribution (Cook et al., Biometrics 2004)
  - same random effect for all intensities (Satten, Biometrics 1999).
  - conjugate gamma frailties, exact event times (Putter & Houwelingen, SMMR 2011), mixture models / mover-stayer (O’Keeffe et al., Stat. Med. 2012)...
- MCMC approaches (JAGS / BUGS / Stan software) or Monte Carlo EM (Sutradhar and Cook, JRSS C, 2008)?
  - experimental facility available in msm to generate code to fit same model in JAGS
Time-inhomogeneous models for panel data
[Figure: state (1 to 4) against time, with observation times t0, t1, t2, t3, t4]
- Likelihood needs transition probability matrix $P(t_0, t_1)$ with r, s entry $\Pr(S(t_1) = s \mid S(t_0) = r)$.
- Kolmogorov forward equations: $\mathrm{d}P(t_0, t)/\mathrm{d}t = P(t_0, t)\,Q(t)$, solved for t from $t_0$ to $t_1$
- Q not constant or piecewise constant with time → no analytic solution.
Or numerically solve the differential equation (Titman, Biometrics 2011).
- Allows e.g. Weibull or spline functions for Q(t): smoother / more realistic than piecewise constant
- Need to solve for each distinct covariate value: hard for continuous covariates / big datasets.
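A rough sketch of solving the forward equations numerically, using a hand-written fourth-order Runge-Kutta scheme in base R. The intensity functions are invented (a Weibull-type q12 and constant q13, q23), `solve_forward` is a hypothetical helper, and this is not the method or code of the cited paper; a dedicated ODE solver (for example from the deSolve package) would be the usual choice in practice.

```r
# Hypothetical time-dependent intensity matrix for a 3-state model.
Q_t <- function(t) {
  q12 <- 0.1 * 1.5 * t^0.5   # Weibull-type hazard, cumulative hazard 0.1 * t^1.5
  q13 <- 0.02
  q23 <- 0.2
  rbind(c(-(q12 + q13), q12,  q13),
        c(0,            -q23, q23),
        c(0,             0,   0))
}

# Solve dP/dt = P Q(t) from t0 to t1, starting from P(t0, t0) = identity,
# with classical fourth-order Runge-Kutta steps.
solve_forward <- function(Q_t, t0, t1, n_steps = 1000) {
  h <- (t1 - t0) / n_steps
  P <- diag(nrow(Q_t(t0)))
  f <- function(t, P) P %*% Q_t(t)
  t <- t0
  for (i in seq_len(n_steps)) {
    k1 <- f(t, P)
    k2 <- f(t + h / 2, P + h / 2 * k1)
    k3 <- f(t + h / 2, P + h / 2 * k2)
    k4 <- f(t + h, P + h * k3)
    P <- P + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t <- t + h
  }
  P
}

P <- solve_forward(Q_t, t0 = 1, t1 = 3)
print(round(P, 4))

# Check: staying in state 1 has a closed form, exp(-cumulative exit hazard).
p11_exact <- exp(-(0.1 * (3^1.5 - 1^1.5) + 0.02 * (3 - 1)))
print(c(numeric = P[1, 1], exact = p11_exact))
```

In a likelihood for panel data this solve would be repeated for every observed interval and every distinct covariate value, which is the computational burden noted above.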
Future developments in msm
Ideally any new methods should work with any multi-state structure.
- or at least for common structures: e.g. progressive disease, or progression and death
- unmaintainable if handle too many special cases in different ways.
All transition times known: survival/mstate system
Model fitting: survival R package (Therneau)
- Estimate event-specific hazards = transition rates
  - under Cox or fully parametric models for times to each event.
Prediction: mstate R package (de Wreede et al. J. Stat. Soft. 2011)
- Estimate cumulative incidences from a Cox model (Breslow)
- Convert these to transition probabilities over a time period
  - Aalen-Johansen estimator for inhomogeneous Markov models
  - Individual patient simulation for semi-Markov models
- No documentation for using parametric models
  - needed e.g. for extrapolation in health economic evaluations
- Data need some awkward manipulation
Tutorials / courses to clear up msm vs. mstate confusion / make all their methods more accessible?
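The model-fitting step above, sketched with the survival package on simulated data. Everything about the data is invented: two competing transitions out of an initial state (to illness and to death), one covariate, and administrative censoring at 10 years. Each event-specific hazard is fitted by treating the other event as censoring. The mstate prediction step is not shown.

```r
library(survival)

# Simulated data: latent exponential times to two competing events.
set.seed(1)
n       <- 200
age     <- rnorm(n, mean = 60, sd = 10)
t_ill   <- rexp(n, rate = 0.10 * exp(0.03 * (age - 60)))  # healthy -> ill
t_death <- rexp(n, rate = 0.05)                           # healthy -> dead
t_cens  <- 10                                             # end of follow-up

time  <- pmin(t_ill, t_death, t_cens)
event <- ifelse(time == t_cens, 0, ifelse(t_ill < t_death, 1, 2))

# Event-specific (transition-specific) hazards under Cox models:
# the competing event is treated as censoring in each fit.
fit_ill   <- coxph(Surv(time, event == 1) ~ age)
fit_death <- coxph(Surv(time, event == 2) ~ age)

# A fully parametric alternative for one transition (Weibull).
fit_ill_wei <- survreg(Surv(time, event == 1) ~ age, dist = "weibull")

print(exp(coef(fit_ill)))     # hazard ratio per year of age, healthy -> ill
print(exp(coef(fit_death)))   # hazard ratio per year of age, healthy -> dead
```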
Part II
Encouraging more software development
in biostatistics research
Why encourage software development?
Statistical methods need accessible software.
- allows the method to be used by more people, especially non-experts
- saves time for everyone, even experts
- increases transparency / trust in research results
- good for promoting a new method
  - “the other way our ideas get out there is through software ... software implementation is a kind of publication, indeed, one of the best kinds.” (http://andrewgelman.com/2014/03/12/publishing-journals)
  - see, e.g. popularity of DIC in BUGS
  - impact, citations...
→ good for science (and scientists).
Types of statistical software
Ad-hoc code, often to accompany a journal paper
- Typically only usable by experts, maybe only after tweaking
- OK for reproducibility / transparency
- Bare minimum
R (or Stata) packages.
- Documented and maintained, easy to install and use...
- Builds on great work done in the last 10-15 years to make R and CRAN the main platform for statistics research.
- An ideal to aim for with new methodology.
  - see e.g. Jeff Leek (Johns Hopkins) group policy: all PhD students to develop and maintain an R package: http://simplystatistics.org/2013/10/07/the-leek-group-policy-for-developing-sustainable-r-packages/
Standalone software / large libraries (e.g. BUGS, JAGS, Stan).
- Often to accompany a major methodological advance (e.g. MCMC).
- Needs advanced programming / software engineering skills to develop.
- Still needs users for feedback / bug reports / testing / support
How can we develop more accessible software?
Culture shift: software viewed as a valuable research output
- Funding bodies / grant reviewers, research assessors, PhD examiners, journal editors, supervisors and line managers...
Time and money. . .
- Priorities: consider a methodology project not finished without usable software, not an “optional extra”
- A lot of tedious work involved in writing software, but the same is true for writing papers!
“...[our academic culture] has traditionally put a very high premium on being clever and a relatively low premium on being willing to go through the schlep. [As applied statistics grows] the schlep becomes just as important as the clever idea. If you aren’t willing to put in the time to code your methods up and make them accessible to other investigators, then who will be?” Jeff Leek (http://simplystatistics.org/2012/05/28/schlep-blindness-in-statistics)
“creating an R package is building something. It is something you can point to and say, ‘I made that’. Leaving aside all the tangible benefits to your career, the profession, etc. it is maybe the most gratifying feeling you get when working on research.” (Jeff Leek, https://github.com/jtleek/rpackages)
People and skills. . .
- More collaborative programming, just like we do collaborative writing (tools could help, e.g. GitHub)
- Informal / internal peer review by local software experts
- Training of students / researchers in software development
  - Software user / discussion groups (e.g. for R techniques)
  - Courses and online resources. A lot to be learnt from “open source” community!
- Collaborations with computing specialists, especially for major software projects: http://www.timeshighereducation.co.uk/news/save-your-work-give-software-engineers-a-career-track/2006431.article
More than just software
“Knowledge transfer”: Link software with documentation and training → make methods more accessible, avoid them being misused
- Tutorial papers
- “Vignettes”: tutorials with worked examples, packaged with the software
- Journal of Statistical Software, R Journal, Stata Journal: gives author a publication!
- Short courses