Interpretable treatment regimes via decision lists
Eric B. Laber
Department of Statistics, North Carolina State University
September 14, 2017
Acknowledgments
Thanks to
I IMA organizing committee
I NCSU/UNC AI and Precision Medicine Lab
I PR Department of Natural Resources
I National Science Foundation
I National Institutes of Health
Joint work with
I Students in Laber Labs
I Yichi Zhang
I John Drake
I Krishna Pacifici
I Brian Reich
I Marie Davidian
I Butch Tsiatis
Outline: talk in two parts
I Interpretable treatment regimes
I Open problems
Precision medicine
I “The right treatment for the right patient at the right time.” – Mantra of precision medicine advocates
I Widely recognized that best clinical care requires treatment decisions tailored to individual patient characteristics
I Improve patient outcomes, reduce cost and patient burden
I Treatment regimes
I Formalize clinical decision making as sequence of decision rules
I One rule per stage of clinical intervention
I Maps current patient info to recommended treatment
I Optimal regime maximizes the mean of some cumulative clinical outcome if applied to population of interest
Ex. Treatment regime: mHealth for PTSD in cancer patients (PI S. Smith)
First stage decision rule
If distress ≥ 3 then: Cancer Distress Coach (CDC)
Else if PTSD symptom score ≥ 20 then: CDC
Else: usual care
Second stage decision rule
If responder then: continue first stage treatment
Else if using CDC and PTSD change ≥ 3 then: add mCoaching
Else if using CDC and distress ≥ 4 then: add FaceTime CBT
Else: FaceTime CBT only
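Ex. SMART: mHealth for PTSD
[Figure: SMART design diagram. Participants are randomized (R) to Distress Coach (Treatment A) or Standard Care (Treatment B), and response is then assessed in each arm. Responders ("Yes") continue their first-stage treatment (Distress Coach; follow-up only). Non-responders ("No") are re-randomized (R): in arm A to adding mCoaching (Treatment AA) or FaceTime CBT (Treatment AB); in arm B to Distress Coach + mCoaching (Treatment BA) or FaceTime CBT (Treatment BB).]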
Estimation of treatment regimes: recent trend
I Current paradigm: assume estimated regime used to dictate treatment choice ⇒ emphasize flexibility over interpretability
I Surge in machine-learning/semi-parametric methods
I Direct-search with large-margin classifiers (Zhao et al. 2012, 2013ab, 2014; Kang et al., 2014; Zhao and L., 2015)
I Generalized additive models (Moodie et al. 2014)
I Nearest-neighbor methods (Zhou and Kosorok, 2016)
I JSM JASA T&M (2016) and A&CS (2016, 2017) discussed papers about non-parametric estimation of treatment regimes (Xu et al. 2016; Chen et al., 2016; Rashid et al., 2017)
Treatment regimes and clinical research
I Reality: estimated regimes part of secondary, exploratory analyses ⇒ emphasize interpretability in domain context
I Treatment regimes for application in clinical practice require long-term, joint effort of statisticians and clinical scientists
[Figure: research cycle: posit clinical hypotheses → design study → collect data → estimate regime → critical evaluation → back to posit clinical hypotheses]
List-based regimes
I Goal: estimation of an interpretable yet expressive regime
I List-based rules are a sequence of if-then statements mapping logical clauses to treatment recommendations
I A regime comprising list-based decision rules is immediately interpretable in a domain context
If $c_1$ then: $a_1$
Else if $c_2$ then: $a_2$
· · ·
Else if $c_{L-1}$ then: $a_{L-1}$
Else: $a_L$
If distress ≥ 3 then: CDC
Else if PTSD ≥ 20 then: CDC
Else: usual care
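A list-based rule of this form can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `apply_decision_list`, the dictionary keys `distress` and `ptsd`, and the three example patients are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
Clause = Callable[[Patient], bool]


def apply_decision_list(rules: List[Tuple[Clause, str]], default: str, patient: Patient) -> str:
    """Return the treatment of the first clause that holds, else the default."""
    for clause, treatment in rules:
        if clause(patient):
            return treatment
    return default


# First-stage rule from the PTSD example (variable names are hypothetical).
first_stage = [
    (lambda p: p["distress"] >= 3, "CDC"),
    (lambda p: p["ptsd"] >= 20, "CDC"),
]

# Three made-up patients, one per branch of the list.
recommendations = [
    apply_decision_list(first_stage, "usual care", p)
    for p in (
        {"distress": 4, "ptsd": 10},
        {"distress": 1, "ptsd": 25},
        {"distress": 1, "ptsd": 5},
    )
]
print(recommendations)
```

Because the loop returns at the first clause that holds, later clauses are never evaluated for that patient.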
List-based regimes cont’d
I Simple but powerful
I Txt $a$ if and only if all $\{c_1, \ldots, c_k\}$ hold
I Txt $a$ if any of $\{c_1, \ldots, c_k\}$ hold
I Different treatment for each level of categorical variable
I . . .
I Restrict attention to clauses dictated by union or intersection of thresholded regions
I $X_j > \tau_j$
I $X_j \le \tau_j$
I $X_j > \tau_j$ and $X_k > \tau_k$
I $X_j > \tau_j$ and $X_k \le \tau_k$
I $X_j \le \tau_j$ and $X_k \le \tau_k$
I $X_j \le \tau_j$ and $X_k > \tau_k$
I $X_j > \tau_j$ or $X_k > \tau_k$
I $X_j > \tau_j$ or $X_k \le \tau_k$
I $X_j \le \tau_j$ or $X_k \le \tau_k$
I $X_j \le \tau_j$ or $X_k > \tau_k$
List-based regimes and short-circuiting
I List-based regimes need not be unique
$\pi$:
If $c_1$ then: $a_1$
Else if $c_2$ then: $a_2$
Else: $a_3$
$\pi'$:
If $\neg c_1$ and $\neg c_2$ then: $a_3$
Else if $\neg c_1$ then: $a_2$
Else: $a_1$
I Choose regime that requires least patient burden/cost among equivalence class dictated by marginal mean outcome
I Ex. $\pi$ need not evaluate $c_2$ among patients satisfying $c_1$
I Focus on estimating a single member of equivalence class
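The equivalence of the two lists, and the saving from short-circuiting, can be checked by brute force over the four truth assignments of the two clauses. The sketch below is illustrative only: the helper `run` and the choice to evaluate both clauses eagerly in `pi_prime` are assumptions made for the example.

```python
from itertools import product


def pi(c1, c2):
    """If c1: a1; else if c2: a2; else: a3. Clauses are zero-argument callables."""
    if c1():
        return "a1"
    if c2():
        return "a2"
    return "a3"


def pi_prime(c1, c2):
    """If not c1 and not c2: a3; else if not c1: a2; else: a1."""
    v1, v2 = c1(), c2()  # both clauses evaluated up front, for simplicity
    if (not v1) and (not v2):
        return "a3"
    if not v1:
        return "a2"
    return "a1"


def run(regime, v1, v2):
    """Apply a regime and record which clauses it actually evaluated."""
    used = []

    def c1():
        used.append("c1")
        return v1

    def c2():
        used.append("c2")
        return v2

    return regime(c1, c2), used


for v1, v2 in product([True, False], repeat=2):
    treatment, used = run(pi, v1, v2)
    treatment_prime, _ = run(pi_prime, v1, v2)
    print(v1, v2, treatment, treatment_prime, used)
```

Both regimes recommend the same treatment in every case, but the first list only consults the second clause when the first one fails.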
Setup and notation
I Observe $\{(X_{1,i}, A_{1,i}, \ldots, X_{T,i}, A_{T,i}, Y_i)\}_{i=1}^{n}$ i.i.d. from $P$
I $X_t \in \mathbb{R}^{p_t}$ subject info during stage $t$
I $A_t \in \mathcal{A}_t$ assigned txt at stage $t$
I $Y \in \mathbb{R}$ outcome coded so that higher is better
I Define history $H_1 = X_1$, $H_t = (H_{t-1}^\top, A_{t-1}, X_t^\top)^\top$, $t \ge 2$
I Treatment regime $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_T)$ where
  $\pi_t : \mathrm{supp}\, H_t \to \mathrm{supp}\, A_t$,
patient presenting with $H_t = h_t$ assigned txt $\pi_t(h_t)$
Optimal regime via potential outcomes
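I Define $\bar{a}_t = (a_1, \ldots, a_t)$ and set of potential outcomes
  $W^* = \{H_1, H_2^*(a_1), \ldots, H_T^*(\bar{a}_{T-1}), Y^*(\bar{a}_T) : \bar{a}_T \in \otimes_{t=1}^{T} \mathcal{A}_t\}$
I Potential outcome under a regime $\boldsymbol{\pi}$
  $Y^*(\boldsymbol{\pi}) = \sum_{\bar{a}_T} Y^*(\bar{a}_T) \prod_{t=1}^{T} 1[\pi_t\{H_t^*(\bar{a}_{t-1})\} = a_t]$
I Optimal regime in class $\Pi$ satisfies $E Y^*(\boldsymbol{\pi}^{\mathrm{opt}}) \ge E Y^*(\boldsymbol{\pi})$ for all $\boldsymbol{\pi} \in \Pi$, where $\boldsymbol{\pi}^{\mathrm{opt}} \in \Pi$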
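As a worked special case (an illustration added here, assuming a single stage $T = 1$ with binary treatment $\mathcal{A}_1 = \{0, 1\}$), the sum over treatment sequences has two terms, and exactly one indicator is nonzero for each patient:

```latex
Y^*(\boldsymbol{\pi})
  = Y^*(0)\, 1[\pi_1(H_1) = 0] + Y^*(1)\, 1[\pi_1(H_1) = 1]
  = Y^*\{\pi_1(H_1)\}.
```

In this case the potential outcome under the regime is simply the potential outcome under whichever treatment the rule recommends for that patient.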
Characterizing the optimal regime
I For each $t = 1, \ldots, T$ assume
(C1) Sequential ignorability: $A_t \perp W^* \mid H_t$
(C2) Positivity: $P(A_t = a_t \mid H_t) \ge \epsilon$ wp1 for some $\epsilon > 0$
(C3) Consistency: $H_t = H_t^*(\bar{A}_{t-1})$, $Y = Y^*(\bar{A}_T)$
(C4) Stable unit treatment value assumption (SUTVA)
I (C1)-(C3) are satisfied by construction in a sequential multiple assignment randomized trial (SMART; Murphy, 2005)
Characterizing the optimal regime cont’d
I Let $\Pi_t$ denote space of list-based rules on $\mathrm{supp}\, H_t$ for $t \ge 1$
I Assume (C1)-(C4) then
I Define
  $Q_T(h_T, a_T) = E(Y \mid H_T = h_T, A_T = a_T)$
then $\pi_T^{\mathrm{opt}} = \arg\max_{\pi_T \in \Pi_T} E\, Q_T\{H_T, \pi_T(H_T)\}$
I Recursively, define
  $Q_t(h_t, a_t) = E[Q_{t+1}\{H_{t+1}, \pi_{t+1}^{\mathrm{opt}}(H_{t+1})\} \mid H_t = h_t, A_t = a_t]$
then $\pi_t^{\mathrm{opt}} = \arg\max_{\pi_t \in \Pi_t} E\, Q_t\{H_t, \pi_t(H_t)\}$
Q-learning with policy search
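I Let $\mathcal{Q}_t$ denote postulated class of models for $Q_t(h_t, a_t)$
I Estimation algorithm
I Compute
  $\hat{Q}_T = \arg\min_{Q_T \in \mathcal{Q}_T} [\mathbb{P}_n\{Y - Q_T(H_T, A_T)\}^2 + \mathcal{P}_T(Q_T)]$
and subsequently $\hat{\pi}_T = \arg\max_{\pi_T \in \Pi_T} \mathbb{P}_n \hat{Q}_T\{H_T, \pi_T(H_T)\}$
I Recursively, compute
  $\hat{Q}_t = \arg\min_{Q_t \in \mathcal{Q}_t} \{\mathbb{P}_n[\hat{Q}_{t+1}\{H_{t+1}, \hat{\pi}_{t+1}(H_{t+1})\} - Q_t(H_t, A_t)]^2 + \mathcal{P}_t(Q_t)\}$
and subsequently $\hat{\pi}_t = \arg\max_{\pi_t \in \Pi_t} \mathbb{P}_n \hat{Q}_t\{H_t, \pi_t(H_t)\}$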
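The two steps (fit the Q-function, then search the rule class) can be sketched for a single stage ($T = 1$) with one covariate. Everything here is an assumption made for illustration: the data are constructed deterministically, the Q-function is a separate unpenalized least-squares line per treatment arm (the talk assumes penalized kernel ridge regression), the empirical measure is read as a sample average, and the policy class is limited to one-clause lists of the form "if x > tau then a_if else a_else".

```python
from itertools import product

# Toy single-stage data (T = 1), constructed deterministically for illustration:
# treatment 1 helps when x > 0 and treatment 0 helps when x < 0.
n = 20
X = [(i - 9.5) / 10 for i in range(n)]
A = [i % 2 for i in range(n)]
Y = [x if a == 1 else -x for x, a in zip(X, A)]


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (no penalty term)."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Step 1: estimate Q(x, a) with a separate linear fit per treatment arm.
coef = {
    a: fit_line(
        [x for x, ai in zip(X, A) if ai == a],
        [y for y, ai in zip(Y, A) if ai == a],
    )
    for a in (0, 1)
}


def q_hat(x, a):
    """Fitted Q-function for covariate x and treatment a."""
    b0, b1 = coef[a]
    return b0 + b1 * x


# Step 2: policy search over one-clause lists "if x > tau then a_if else a_else",
# maximizing the sample average of the fitted Q-function under the rule.
def value(tau, a_if, a_else):
    return sum(q_hat(x, a_if if x > tau else a_else) for x in X) / n


best_rule = max(product(X, (0, 1), (0, 1)), key=lambda r: value(*r))
best_value = value(*best_rule)
print(best_rule, round(best_value, 3))
```

The search is exhaustive over observed thresholds, which is feasible only because the rule class here is tiny; the stepwise splitting procedure described in the talk is what makes deeper lists tractable.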
Computation
I $\Pi_t$ class of list-based rules ⇒ computation of $\arg\max_{\pi_t \in \Pi_t} \mathbb{P}_n \hat{Q}_t\{H_t, \pi_t(H_t)\}$ is a discrete optimization problem
I Basic idea: stepwise splitting procedure in spirit of CART
I Optimize marginal mean outcome plus complexity penalty
I Stopping criteria based on significance test
Lemma
Let $m_t = \#\mathcal{A}_t$, $d_t = \dim(h_t)$, and let $L_t$ be the maximum list depth. Then the time complexity for computing $\hat{\boldsymbol{\pi}} = (\hat{\pi}_1, \ldots, \hat{\pi}_T)$ is
  $O\{n \log(n) \sum_{t=1}^{T} L_t m_t d_t^2\}$.
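To get a feel for how the bound scales, the quantity inside the $O\{\cdot\}$ can be evaluated for made-up problem sizes. This is only arithmetic on the stated formula with constants ignored; the function names and the two-stage configuration are hypothetical.

```python
import math


def stage_sum(stages):
    """Sum of L_t * m_t * d_t**2 over stages given as (L_t, m_t, d_t) tuples."""
    return sum(L * m * d ** 2 for L, m, d in stages)


def bound_term(n, stages):
    """The quantity inside O{...}: n log(n) times the stage sum (constants ignored)."""
    return n * math.log(n) * stage_sum(stages)


# Hypothetical two-stage problem: depth 5, two treatments, 3 then 4 history variables.
stages = [(5, 2, 3), (5, 2, 4)]
doubled = [(L, m, 2 * d) for L, m, d in stages]

print(stage_sum(stages))                                   # 5*2*9 + 5*2*16
print(bound_term(400, doubled) / bound_term(400, stages))  # effect of doubling each d_t
```

The bound is linear in list depth and number of treatments, near-linear in sample size, and quadratic in the dimension of the history, so doubling every $d_t$ multiplies it by four.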
Convergence of Q-learning with policy search
I Overview of technical assumptions
(A1) H t and Y are bounded with probability one
(A2) Q-functions are sufficiently smooth (weakly differentiable in sense of Eberts and Steinwart, 2013, Defn. 2.1 with $L_p$ norm)
(A3) Margin condition on the treatment effects
(A4) Estimation via kernel ridge regression with Gaussian kernel
(A5) Rate conditions on complexity penalty in splitting criteria
Convergence of Q-learning with policy search cont’d
Lemma
Assume (C1)-(C4) and (A1)-(A5). Then, for each $t = 1, \ldots, T$,
  $P\{\hat{\pi}_t(H_t) \ne \pi_t^{\mathrm{opt}}(H_t) \mid \mathbb{P}_n\}$
converges to zero in probability.
Corollary
Assume (C1)-(C4) and (A1)-(A5). Then $E Y^*(\boldsymbol{\pi}^{\mathrm{opt}}) - E Y^*(\hat{\boldsymbol{\pi}})$ converges to zero.
Q-learning with policy search discussion
I Divorce form of Q-function from class of regimes
I Decision lists offer interpretable yet expressive class of regimes
I Diagnostics via comparison of $\arg\max_{a_t} \hat{Q}_t(h_t, a_t)$ with $\hat{\pi}_t(h_t)$
I Short-circuiting can save cost/burden
I Potential drawbacks and open problems
I Sub-parametric rates of convergence (not shown)
I Methods for inference and uncertainty quantification needed
Simulation experiments: overview
I Settings taken from Zhao et al. (2015) and Murphy (2003)
I Basic framework
I Normal covariates with AR-1 updates
I Txt assign. mimics both randomized and observational study
I Mean outcome nonlinear in history
Model  Stages  Covariates  Propensity         Source
1      2       50          Constant           Zhao et al.
2      2       50          History-dependent  Zhao et al.
3      3       3           Constant           Zhao et al.
4      10      10          History-dependent  Murphy
Simulation experiments: overview cont’d
I Performance measure $E\{E Y^*(\hat{\boldsymbol{\pi}})\}$
I Q-learning + Policy search with decision lists
I Non-parametric Q-learning with random forests
I Linear Q-learning with lasso penalty
I Backward outcome weighted learning (BOWL)
I Additional numerical details
I Use 1K Monte Carlo replications
I Performance estimated with test set of size 10K
I Consider training sets of size n = 100 and n = 400
I Maximum list-depth set to Lt = 5 for all t
Simulation experiments: results
Model  n    Decision List  Q-RF   Q-Lasso  BOWL
1      100  6.63           6.70   6.50     6.70
1      400  6.94           6.70   6.66     6.70
2      100  3.66           3.41   3.68     2.68
2      400  3.73           3.71   3.75     3.19
3      100  14.49          12.94  5.74     8.12
3      400  18.60          18.02  8.42     13.62
4      100  23.68          17.83  12.39    NA
4      400  26.80          24.73  16.58    NA
Simulation experiments: summary
I Q-learning + policy search with decision lists compares favorably to parametric and non-parametric competitors
I Parsimony of list provides automatic variable selection
I Did not consider dense but weak signals
I Significantly more interpretable than alternatives
I R package decisionList freely available on CRAN
Discussion of list-based optimal regimes
I Estimation of optimal treatment regimes often part of secondary, hypothesis-generating analyses
I Q-learning + policy search with decision lists is a powerful tool for estimating high-quality interpretable regimes
I Computationally efficient
I Consistent under weak conditions
I Because of the underlying regression framework, extends to high-dimensional data, censored data, continuous txts, etc.
I Applies to infinite horizon problems modeled as Markov Decision Processes through Bellman optimality est. equations
Outline: talk in two parts
I Interpretable treatment regimes
I Open problems
Research-practice gap
I Active methods work
I Machine learning
I Infinite horizon problems
I Clustered designs and regimes
I Most active methods work under framework considered here
I Single outcome
I Single decision maker
I Stationary preferences
I . . .
Research-practice gap cont’d
I Need decision-support systems that faithfully reflect the complexities and realities of clinical decision making
I Heterogeneous patient preferences
I Multiple stakeholders with different objectives
I Implementation costs
I Non-stationarity
I Need to expand current mathematical framework
Individual patient preference
I Clinical decision-making requires balancing multiple, possibly competing outcomes
I Ex., side-effects and efficacy
I Ex., cost and local availability
I . . .
I Preferences across outcomes can vary across patients and within patients over time
I Ex., become averse to side-effect after experiencing it
I Ex., lethargy and lifestyle
I Ex., hypertension and weight gain
Composite outcomes
I Idea: create a single composite summary across all outcomes
I Assumes homogeneous preference across all patients
I Thall et al. have applied this approach successfully in cancer
I Potential problems
I Preferences vary across patients
I Preferences change over time
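The sensitivity of a composite outcome to the assumed preference can be seen in a toy calculation. The sketch below assumes a linear composite with a single weight `w` on efficacy; the two treatments and their mean outcomes are made up for illustration and do not come from any study.

```python
# Hypothetical mean outcomes per treatment: (efficacy, tolerability), higher is better.
outcomes = {"A": (0.8, 0.3), "B": (0.6, 0.6)}


def best_treatment(w):
    """Treatment maximizing the composite w * efficacy + (1 - w) * tolerability."""
    return max(outcomes, key=lambda t: w * outcomes[t][0] + (1 - w) * outcomes[t][1])


# A patient who weights efficacy heavily versus one who weights tolerability heavily.
print(best_treatment(0.9), best_treatment(0.2))
```

The recommended treatment flips with the weight, so a single weight fixed for all patients can be wrong for any patient whose preferences differ from it.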
Preference elicitation
I Idea! ask patients about their preferences
I Administer questionnaire, link answers to utility function through latent preference model, e.g., item response model
I Questionnaire becomes part of treatment package
I Leads to regime based on individual patient preference
I Potential problems
I Need high-quality instruments
I Patients unable/unwilling to communicate preferences
Set-valued regimes
I Idea! screen out treatment choices that are dominated across all outcomes under consideration
I Recommend set of treatment options
I Includes optimal regime for large class of preferences
I Avoid elicitation or composite outcomes
I Potential problems
I Little guidance on how to choose among set
I Individual preference only incorporated indirectly
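The screening step can be sketched as a dominance filter. This is a minimal illustration under stated assumptions: dominance is taken to mean "at least as good on every outcome and strictly better on at least one", all outcomes are coded so that higher is better, and the treatments and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
    """True if u is at least as good as v on every outcome and strictly better on one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def non_dominated(outcomes: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Treatments that no other treatment dominates (the recommended set)."""
    return sorted(
        t
        for t, u in outcomes.items()
        if not any(dominates(v, u) for s, v in outcomes.items() if s != t)
    )


# Hypothetical estimated mean outcomes for one patient: (efficacy, tolerability).
outcomes = {"A": (0.8, 0.3), "B": (0.6, 0.6), "C": (0.5, 0.5), "D": (0.8, 0.2)}
recommended = non_dominated(outcomes)
print(recommended)
```

Treatments C and D are screened out because B and A, respectively, are no worse on either outcome; the choice between A and B is left to the patient and clinician.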
Informing, not dictating
I Supplement domain expertise with data
I Distill data into information
I Communicate strength of evidence provided by data
I Health communication is notoriously difficult
I Health literacy varies widely
I Shared decision model can be time and resource intensive
Informing, not dictating cont’d
I Methodology must be informed by clinical need
I Not all statistical/methodological challenges
I Information communication tools needed
I Many important open problems
I Extensions to more exotic data types/gen models
I Research-translation
Thank you.