Interpretable treatment regimes via decision lists

Eric B. Laber

Department of Statistics, North Carolina State University

September 14, 2017

Acknowledgments

Thanks to

I IMA organizing committee

I NCSU/UNC AI and Precision Medicine Lab

I PR Department of Natural Resources

I National Science Foundation

I National Institutes of Health

Joint work with

I Students in Laber Labs

I Yichi Zhang

I John Drake

I Krishna Pacifici

I Brian Reich

I Marie Davidian

I Butch Tsiatis

Outline: talk in two parts

I Interpretable treatment regimes

I Open problems




Precision medicine

I “The right treatment for the right patient at the right time.”–Mantra of precision medicine advocates

I Widely recognized that best clinical care requires treatment decisions tailored to individual patient characteristics

I Improve patient outcomes, reduce cost and patient burden

I Treatment regimes

I Formalize clinical decision making as sequence of decision rules

I One rule per stage of clinical intervention

I Maps current patient info to recommended treatment

I Optimal regime maximizes the mean of some cumulative clinical outcome if applied to population of interest

1 / 30

Ex. Treatment regime: mHealth for PTSD in cancer patients (PI S. Smith)

First stage decision rule

If distress ≥ 3 then: Cancer Distress Coach (CDC)

Else if PTSD symptom score ≥ 20 then: CDC

Else: usual care

Second stage decision rule

If responder then: continue first stage treatment

Else if using CDC and PTSD change ≥ 3 then: add mCoaching

Else if using CDC and distress ≥ 4 then: add FaceTime CBT

Else FaceTime CBT only

2 / 30

Ex. SMART: mHealth for PTSD

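[Figure: SMART schematic. Initial randomization (R) to Distress Coach (Treatment A) or Standard Care (Treatment B), followed by a response assessment in each arm. Responders continue (Distress Coach in arm A; follow-up only in arm B). Non-responders are re-randomized (R): add mCoaching (Treatment AA) or FaceTime CBT (Treatment AB) in arm A; DC + mCoaching (Treatment BA) or FaceTime CBT (Treatment BB) in arm B.]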

3 / 30

Estimation of treatment regimes: recent trend

I Current paradigm: assume estimated regime used to dictate treatment choice ⇒ emphasize flexibility over interpretability

I Surge in machine-learning/semi-parametric methods

I Direct-search with large-margin classifiers (Zhao et al. 2012, 2013ab, 2014; Kang et al., 2014; Zhao and L., 2015)

I Generalized additive models (Moodie et al. 2014)

I Nearest-neighbor methods (Zhou and Kosorok, 2016)

I JSM JASA T&M (2016) and A&CS (2016, 2017) discussed papers about non-parametric estimation of treatment regimes (Xu et al. 2016; Chen et al., 2016; Rashid et al., 2017)

4 / 30

Treatment regimes and clinical research

I Reality: estimated regimes part of secondary, exploratory analyses ⇒ emphasize interpretability in domain context

I Treatment regimes for application in clinical practice require long-term, joint effort of statisticians and clinical scientists

[Figure: research cycle with nodes: posit clinical hypotheses, design study, collect data, estimate regime, critical evaluation.]

5 / 30

List-based regimes

I Goal: estimation of an interpretable yet expressive regime

I List-based rules are a sequence of if-then statements mapping logical clauses to treatment recommendations

I Regimes comprising list-based decision rules are immediately interpretable in a domain context

If $c_1$ then: $a_1$

Else if $c_2$ then: $a_2$

· · ·

Else if $c_{L-1}$ then: $a_{L-1}$

Else: $a_L$

If distress ≥ 3 then: CDC

Else if PTSD ≥ 20 then: CDC

Else: usual care
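
A list-based rule can be represented as an ordered collection of (clause, treatment) pairs with a default treatment. The following is a minimal sketch, not the implementation used in this work; the class name `DecisionList` and the dictionary keys `distress` and `ptsd` are hypothetical, while the thresholds are those of the first-stage example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
Clause = Callable[[Patient], bool]


@dataclass
class DecisionList:
    """Ordered (clause, treatment) pairs plus a default treatment."""
    rules: List[Tuple[Clause, str]]
    default: str

    def recommend(self, patient: Patient) -> str:
        # Return the treatment attached to the first clause that holds.
        for clause, treatment in self.rules:
            if clause(patient):
                return treatment
        # No clause held: fall through to the final "Else" branch.
        return self.default


# First-stage rule from the PTSD example (dictionary keys are hypothetical).
first_stage = DecisionList(
    rules=[
        (lambda p: p["distress"] >= 3, "CDC"),
        (lambda p: p["ptsd"] >= 20, "CDC"),
    ],
    default="usual care",
)

print(first_stage.recommend({"distress": 4, "ptsd": 10}))
print(first_stage.recommend({"distress": 1, "ptsd": 5}))
```

Keeping the clauses in an ordered list mirrors the if / else-if reading of the rule, so the printed form and the executable form stay in one-to-one correspondence.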

6 / 30

List-based regimes cont’d

I Simple but powerful

I Txt $a$ if and only if all $\{c_1, \ldots, c_k\}$ hold

I Txt $a$ if any of $\{c_1, \ldots, c_k\}$ hold

I Different treatment for each level of categorical variable

I . . .

I Restrict attention to clauses dictated by union or intersection of thresholded regions

I $X_j > \tau_j$

I $X_j \le \tau_j$

I $X_j > \tau_j$ and $X_k > \tau_k$

I $X_j > \tau_j$ and $X_k \le \tau_k$

I $X_j > \tau_j$ or $X_k > \tau_k$

I $X_j > \tau_j$ or $X_k \le \tau_k$

I $X_j \le \tau_j$ and $X_k \le \tau_k$

I $X_j \le \tau_j$ and $X_k > \tau_k$

I $X_j \le \tau_j$ or $X_k \le \tau_k$

I $X_j \le \tau_j$ or $X_k > \tau_k$

7 / 30

List-based regimes and short-circuiting

I List-based regimes need not be unique

$\pi$

If $c_1$ then: $a_1$

Else if $c_2$ then: $a_2$

Else: $a_3$

$\pi'$

If $\neg c_1$ and $\neg c_2$ then: $a_3$

Else if $\neg c_1$ then: $a_2$

Else: $a_1$

I Choose regime that requires least patient burden/cost among equivalence class dictated by marginal mean outcome

I Ex. $\pi$ need not evaluate $c_2$ among patients satisfying $c_1$

I Focus on estimating a single member of equivalence class
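
The equivalence of the two lists, and the saving from short-circuiting, can be checked by enumeration. The sketch below is illustrative only; it assumes each clause is atomic, so every variable a clause mentions must be measured before the clause is checked. The names `PI`, `PI_PRIME`, and `apply_list` are hypothetical.

```python
from itertools import product

# Each rule: (variables the clause needs, clause on measured values, treatment).
PI = [
    ({"c1"}, lambda v: v["c1"], "a1"),
    ({"c2"}, lambda v: v["c2"], "a2"),
]
PI_PRIME = [
    ({"c1", "c2"}, lambda v: not v["c1"] and not v["c2"], "a3"),
    ({"c1"}, lambda v: not v["c1"], "a2"),
]


def apply_list(rules, default, values):
    """Return (treatment, variables measured before the list stopped)."""
    measured = set()
    for needed, clause, treatment in rules:
        measured |= needed          # measuring is the patient burden/cost
        if clause(values):
            return treatment, measured
    return default, measured


for c1, c2 in product([False, True], repeat=2):
    values = {"c1": c1, "c2": c2}
    txt, cost = apply_list(PI, "a3", values)
    txt_prime, cost_prime = apply_list(PI_PRIME, "a1", values)
    assert txt == txt_prime         # same recommendation for every patient
    assert cost <= cost_prime       # pi never measures more than pi'
    print(c1, c2, txt, sorted(cost), sorted(cost_prime))
```

Under this cost model, the two lists recommend the same treatment for every patient, but only the first avoids measuring `c2` when `c1` holds.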

8 / 30

Setup and notation

I Observe $\{(X_{1,i}, A_{1,i}, \ldots, X_{T,i}, A_{T,i}, Y_i)\}_{i=1}^{n}$ i.i.d. from $P$

I $X_t \in \mathbb{R}^{p_t}$: subject info during stage $t$

I $A_t \in \mathcal{A}_t$: assigned txt at stage $t$

I $Y \in \mathbb{R}$: outcome coded so that higher is better

I Define history $H_1 = X_1$, $H_t = (H_{t-1}^{\top}, A_{t-1}, X_t^{\top})^{\top}$, $t \ge 2$

I Treatment regime $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_T)$ where

\[ \pi_t : \operatorname{supp} H_t \to \operatorname{supp} A_t , \]

patient presenting with $H_t = h_t$ assigned txt $\pi_t(h_t)$

9 / 30

Optimal regime via potential outcomes

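I Define $\bar{a}_t = (a_1, \ldots, a_t)$ and set of potential outcomes

\[ W^* = \Big\{ H_1, H_2^*(a_1), \ldots, H_T^*(\bar{a}_{T-1}), Y^*(\bar{a}_T) : \bar{a}_T \in \bigotimes_{t=1}^{T} \mathcal{A}_t \Big\} \]

I Potential outcome under a regime $\boldsymbol{\pi}$

\[ Y^*(\boldsymbol{\pi}) = \sum_{\bar{a}_T} Y^*(\bar{a}_T) \prod_{t=1}^{T} 1\big[ \pi_t\{H_t^*(\bar{a}_{t-1})\} = a_t \big] \]

I Optimal regime in class $\Pi$ satisfies $E\,Y^*(\boldsymbol{\pi}^{\mathrm{opt}}) \ge E\,Y^*(\boldsymbol{\pi})$ for all $\boldsymbol{\pi} \in \Pi$, where $\boldsymbol{\pi}^{\mathrm{opt}} \in \Pi$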
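
As a worked special case (an illustration, assuming a single stage $T = 1$ with binary treatment $\mathcal{A}_1 = \{0, 1\}$ and reading $H_1^*(\bar{a}_0)$ as $H_1$), the sum over treatment sequences reduces to two terms, exactly one of which is non-zero for any given patient:

```latex
\[
  Y^*(\boldsymbol{\pi})
  = Y^*(0)\, 1\big[\pi_1(H_1) = 0\big]
  + Y^*(1)\, 1\big[\pi_1(H_1) = 1\big]
\]
```

The indicator product therefore selects the potential outcome along the treatment path that the regime would actually assign.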

10 / 30

Characterizing the optimal regime

I For each $t = 1, \ldots, T$ assume

(C1) Sequential ignorability: $A_t \perp W^* \mid H_t$

(C2) Positivity: $P(A_t = a_t \mid H_t) \ge \epsilon$ wp1 for some $\epsilon > 0$

(C3) Consistency: $H_t = H_t^*(\bar{A}_{t-1})$, $Y = Y^*(\bar{A}_T)$

(C4) Stable unit treatment value assumption (SUTVA)

I (C1)-(C3) are satisfied by construction in a sequential multiple assignment randomized trial (SMART; Murphy, 2005)

11 / 30

Characterizing the optimal regime cont’d

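I Let $\Pi_t$ denote space of list-based rules on $\operatorname{supp} H_t$ for $t \ge 1$

I Assume (C1)-(C4) then

I Define

\[ Q_T(h_T, a_T) = E\big( Y \mid H_T = h_T, A_T = a_T \big) \]

then $\pi_T^{\mathrm{opt}} = \arg\max_{\pi_T \in \Pi_T} E\, Q_T\{H_T, \pi_T(H_T)\}$

I Recursively, define

\[ Q_t(h_t, a_t) = E\big[ Q_{t+1}\{H_{t+1}, \pi_{t+1}^{\mathrm{opt}}(H_{t+1})\} \mid H_t = h_t, A_t = a_t \big] \]

then $\pi_t^{\mathrm{opt}} = \arg\max_{\pi_t \in \Pi_t} E\, Q_t\{H_t, \pi_t(H_t)\}$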

12 / 30

Q-learning with policy search

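I Let $\mathcal{Q}_t$ denote postulated class of models for $Q_t(h_t, a_t)$

I Estimation algorithm

I Compute

\[ \hat{Q}_T = \arg\min_{Q_T \in \mathcal{Q}_T} \Big[ \mathbb{P}_n \{ Y - Q_T(H_T, A_T) \}^2 + \mathcal{P}_T(Q_T) \Big] \]

and subsequently $\hat{\pi}_T = \arg\max_{\pi_T \in \Pi_T} \mathbb{P}_n \hat{Q}_T\{H_T, \pi_T(H_T)\}$

I Recursively, compute

\[ \hat{Q}_t = \arg\min_{Q_t \in \mathcal{Q}_t} \Big\{ \mathbb{P}_n \big[ \hat{Q}_{t+1}\{H_{t+1}, \hat{\pi}_{t+1}(H_{t+1})\} - Q_t(H_t, A_t) \big]^2 + \mathcal{P}_t(Q_t) \Big\} \]

and subsequently $\hat{\pi}_t = \arg\max_{\pi_t \in \Pi_t} \mathbb{P}_n \hat{Q}_t\{H_t, \pi_t(H_t)\}$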
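
The two steps (fit the Q-function, then search the rule class) can be illustrated in a deliberately simplified setting. This sketch assumes a single stage, two treatments, and one scalar covariate; it replaces the penalized regression with unpenalized per-arm least squares and replaces the search over general lists with an exhaustive search over one-clause lists. All function names and the toy data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1


def fit_q(data):
    """Step 1: estimate Q(x, a) with one fitted line per treatment arm."""
    coef = {}
    for arm in (0, 1):
        xs = [x for x, a, _ in data if a == arm]
        ys = [y for _, a, y in data if a == arm]
        coef[arm] = fit_line(xs, ys)
    return lambda x, a: coef[a][0] + coef[a][1] * x


def policy_search(data, q_hat):
    """Step 2: maximise the empirical mean of q_hat(x, pi(x)) over rules
    of the form 'If x > tau then: a_if; Else: a_else'."""
    xs = [x for x, _, _ in data]
    best = None
    for tau in sorted(set(xs)):
        for a_if in (0, 1):
            a_else = 1 - a_if
            value = mean(q_hat(x, a_if if x > tau else a_else) for x in xs)
            if best is None or value > best[0]:
                best = (value, tau, a_if, a_else)
    return best


# Toy noise-free data: arm 1 has mean outcome x, arm 0 has mean 0.95 - x,
# so arm 1 is better exactly when x > 0.475.
data = []
for i in range(20):
    x, a = i / 20, i % 2
    y = x if a == 1 else 0.95 - x
    data.append((x, a, y))

q_hat = fit_q(data)
value, tau, a_if, a_else = policy_search(data, q_hat)
print(f"If x > {tau:.2f} then: {a_if}; Else: {a_else} (value {value:.3f})")
```

Because the regression model and the rule class are specified separately, the form of `fit_q` could be swapped for a more flexible estimator without changing `policy_search`.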

13 / 30

Computation

I $\Pi_t$ class of list-based rules ⇒ computation of $\arg\max_{\pi_t \in \Pi_t} \mathbb{P}_n \hat{Q}_t\{H_t, \pi_t(H_t)\}$ is a discrete optimization problem

I Basic idea: stepwise splitting procedure in spirit of CART

I Optimize marginal mean outcome plus complexity penalty

I Stopping criteria based on significance test

Lemma. Let $m_t = \#\mathcal{A}_t$, $d_t = \dim(h_t)$, and let $L_t$ be the maximum list depth. Then the time complexity for computing $\hat{\boldsymbol{\pi}} = (\hat{\pi}_1, \ldots, \hat{\pi}_T)$ is

\[ O\Big\{ n \log(n) \sum_{t=1}^{T} L_t m_t d_t^2 \Big\}. \]

14 / 30

Convergence of Q-learning with policy search

I Overview of technical assumptions

(A1) H t and Y are bounded with probability one

(A2) Q-functions are sufficiently smooth (weakly differentiable in the sense of Eberts and Steinwart, 2013, Defn. 2.1 with LP norm)

(A3) Margin condition on the treatment effects

(A4) Estimation via kernel ridge regression with Gaussian kernel

(A5) Rate conditions on complexity penalty in splitting criteria

15 / 30

Convergence of Q-learning with policy search cont’d

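Lemma. Assume (C1)-(C4) and (A1)-(A5). Then, for each $t = 1, \ldots, T$,

\[ P\big\{ \hat{\pi}_t(H_t) \ne \pi_t^{\mathrm{opt}}(H_t) \mid \mathbb{P}_n \big\} \]

converges to zero in probability.

Corollary. Assume (C1)-(C4) and (A1)-(A5). Then $E\,Y^*(\boldsymbol{\pi}^{\mathrm{opt}}) - E\,Y^*(\hat{\boldsymbol{\pi}})$ converges to zero.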

16 / 30

Q-learning with policy search discussion

I Divorce form of Q-function from class of regimes

I Decision lists offer interpretable yet expressive class of regimes

I Diagnostics via comparison of $\arg\max_{a_t} \hat{Q}_t(h_t, a_t)$ with $\hat{\pi}_t(h_t)$

I Short-circuiting can save cost/burden

I Potential drawbacks and open problems

I Sub-parametric rates of convergence (not shown)

I Methods for inference and uncertainty quantification needed

17 / 30

Simulation experiments: overview

I Settings taken from Zhao et al. (2015) and Murphy (2003)

I Basic framework

I Normal covariates with AR-1 updates

I Txt assign. mimics both randomized and observational study

I Mean outcome nonlinear in history

Model   Stages   Covariates   Propensity          Source
1       2        50           Constant            Zhao et al.
2       2        50           History-dependent   Zhao et al.
3       3        3            Constant            Zhao et al.
4       10       10           History-dependent   Murphy

18 / 30

Simulation experiments: overview cont’d

I Performance measure $E\{E\,Y^*(\hat{\boldsymbol{\pi}})\}$

I Q-learning + Policy search with decision lists

I Non-parametric Q-learning with random forests

I Linear Q-learning with lasso penalty

I Backward outcome weighted learning (BOWL)

I Additional numerical details

I Use 1K Monte Carlo replications

I Performance estimated with test set of size 10K

I Consider training sets of size n = 100 and n = 400

I Maximum list-depth set to Lt = 5 for all t

19 / 30

Simulation experiments: results

Model   n     Decision List   Q-RF    Q-Lasso   BOWL
1       100   6.63            6.70    6.50      6.70
1       400   6.94            6.70    6.66      6.70
2       100   3.66            3.41    3.68      2.68
2       400   3.73            3.71    3.75      3.19
3       100   14.49           12.94   5.74      8.12
3       400   18.60           18.02   8.42      13.62
4       100   23.68           17.83   12.39     NA
4       400   26.80           24.73   16.58     NA

20 / 30

Simulation experiments: summary

I Q-learning + policy search with decision lists performs favorably relative to parametric and non-parametric competitors

I Parsimony of list provides automatic variable selection

I Did not consider dense but weak signals

I Significantly more interpretable than alternatives

I R package decisionList freely available on CRAN

21 / 30

Discussion of list-based optimal regimes

I Estimation of optimal treatment regimes often part of secondary, hypothesis-generating analyses

I Q-learning + policy search with decision lists is a powerful tool for estimating high-quality interpretable regimes

I Computationally efficient

I Consistent under weak conditions

I Because of underlying regression framework extends to high-dimensional data, censored data, continuous txts etc.

I Applies to infinite horizon problems modeled as Markov Decision Processes through Bellman optimality est. equations

22 / 30



I Open problems




Research-practice gap

I Active methods work

I Machine learning

I Infinite horizon problems

I Clustered designs and regimes

I Most active methods work under framework considered here

I Single outcome

I Single decision maker

I Stationary preferences

I . . .

23 / 30

Research-practice gap cont’d

I Need decision-support systems that faithfully reflect the complexities and realities of clinical decision making

I Heterogeneous patient preferences

I Multiple stakeholders with different objectives

I Implementation costs

I Non-stationarity

I Need to expand current mathematical framework

24 / 30

Individual patient preference

I Clinical decision-making requires balancing multiple, possibly competing outcomes

I Ex., side-effects and efficacy

I Ex., cost and local availability

I . . .

I Preferences across outcomes can vary across patients and within patients over time

I Ex., become averse to side-effect after experiencing it

I Ex., lethargy and lifestyle

I Ex., hypertension and weight gain

25 / 30

Composite outcomes

I Idea: create a single composite summary across all outcomes

I Assumes homogeneous preference across all patients

I Thall et al. have applied this approach successfully in cancer

I Potential problems

I Preferences vary across patients

I Preferences change over time

26 / 30

Preference elicitation

I Idea! ask patients about their preferences

I Administer questionnaire, link answers to utility function through latent preference model, e.g., item response model

I Questionnaire becomes part of treatment package

I Leads to regime based on individual patient preference


I Need high-quality instruments

I Patients unable/unwilling to communicate preferences

27 / 30

Set-valued regimes

I Idea! screen-out treatment choices that are dominated across all outcomes under consideration

I Recommend set of treatment options

I Includes optimal regime for large class of preferences

I Avoid elicitation or composite outcomes


I Little guidance on how to choose among set

I Individual preference only incorporated indirectly
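
The screening step can be sketched as a Pareto-dominance filter. This is an illustration only, not a method taken from the talk's references; it assumes every outcome is coded so that higher is better, and the function names, treatment labels, and predicted values are hypothetical.

```python
from typing import Dict, List, Tuple


def dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
    """u dominates v if it is at least as good on every outcome and
    strictly better on at least one (higher is better)."""
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def non_dominated(outcomes: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Keep every treatment that no other treatment dominates."""
    return [
        t for t, u in outcomes.items()
        if not any(dominates(v, u) for s, v in outcomes.items() if s != t)
    ]


# Hypothetical predicted mean outcomes for one patient:
# (efficacy, tolerability).
predicted = {
    "txt A": (0.8, 0.3),
    "txt B": (0.6, 0.7),
    "txt C": (0.5, 0.6),  # worse than txt B on both outcomes
}
print(non_dominated(predicted))
```

The returned set contains the best option for any preference that weights the outcomes monotonically, which is why no elicitation step is required; choosing within the set is left to the patient and clinician.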

28 / 30

Informing, not dictating

I Supplement domain expertise with data

I Distill data into information

I Communicate strength of evidence provided by data

I Health communication is notoriously difficult

I Health literacy varies widely

I Shared decision model can be time and resource intensive

29 / 30

Informing, not dictating cont’d

I Methodology must be informed by clinical need

I Not all statistical/methodological challenges

I Information communication tools needed

I Many important open problems

I Extensions to more exotic data types/gen models

I Research-translation

30 / 30

Thank you.

