4. Single Decision Treatment Regimes: Additional Methods
4.1 Optimal Regimes from a Classification Perspective
4.2 Outcome Weighted Learning
4.3 Interpretable Treatment Regimes via Decision Lists
4.4 Additional Approaches
4.5 Key References
Classification analogy
Premise:
• Estimation of an optimal regime can be viewed as a weighted classification problem
• This allows the vast literature on classification and machine learning to be exploited to define a restricted class of regimes and to estimate an optimal regime within it
• Rules characterizing regimes can be likened to classifiers, so they can involve high-dimensional parameterizations that are complex, flexible functions of individual characteristics
• Algorithms and software developed for classification problems can be exploited for implementation
• Demonstrated by Zhang et al. (2012) and Zhao et al. (2012)
Generic classification problem
• Z = outcome or class label; here Z takes values in {0, 1} (binary)
• X = vector of covariates or features taking values in X, the feature space
• d is a classifier: d : X → {0, 1}
• D is a family of classifiers; e.g., with X = (X1, X2)T:
  – Hyperplanes of the form d(X) = I(η11 + η12 X1 + η13 X2 > 0)
  – Rectangular regions of the form d(X) = I(X1 < η11) + I(X1 ≥ η11, X2 < η12)
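For concreteness, a minimal Python sketch of these two classifier families; the packaging into functions, the array shapes, and the argument names are illustrative assumptions, not part of the slides:

```python
import numpy as np

def hyperplane_classifier(X, eta):
    """d(X) = I(eta11 + eta12*X1 + eta13*X2 > 0), with eta = (eta11, eta12, eta13)."""
    eta = np.asarray(eta)
    return (eta[0] + X @ eta[1:] > 0).astype(int)

def rectangular_classifier(X, eta11, eta12):
    """d(X) = I(X1 < eta11) + I(X1 >= eta11, X2 < eta12); the two events are disjoint."""
    X1, X2 = X[:, 0], X[:, 1]
    return ((X1 < eta11) | ((X1 >= eta11) & (X2 < eta12))).astype(int)
```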
Generic classification problem
Implementation:
• Training set: (Xi, Zi), i = 1, ..., n
• Find a classifier d ∈ D that minimizes
  – Classification error
\[
\sum_{i=1}^{n} \{Z_i - d(X_i)\}^2 = \sum_{i=1}^{n} I\{Z_i \neq d(X_i)\}
\]
  – Weighted classification error
\[
\sum_{i=1}^{n} w_i \{Z_i - d(X_i)\}^2 = \sum_{i=1}^{n} w_i\, I\{Z_i \neq d(X_i)\}
\]
    for wi, i = 1, ..., n, fixed, known weights
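A minimal sketch of these two criteria, assuming Z, d(X), and w are NumPy arrays of {0, 1} labels, {0, 1} predictions, and nonnegative weights (names are illustrative):

```python
import numpy as np

def classification_error(Z, d_of_X):
    """sum_i I{Z_i != d(X_i)}; equals sum_i {Z_i - d(X_i)}^2 for labels in {0, 1}."""
    return int(np.sum(Z != d_of_X))

def weighted_classification_error(Z, d_of_X, w):
    """sum_i w_i * I{Z_i != d(X_i)} for fixed, known weights w_i."""
    return float(np.sum(w * (Z != d_of_X)))
```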
Generic classification problem
• This problem has been studied extensively by statisticians and computer scientists
• It is a form of supervised learning, an approach within the broad area of machine learning
• Many methods and software are available
• Recursive partitioning (CART): rectangular regions
• Support vector machines (SVM): hyperplanes (linear SVM), nonlinear SVM
Value search estimation, revisited
Zhang et al. (2012): A1 = {0, 1}, restricted class Dη
• Elements dη = {d1(h1; η1)}, optimal restricted regime
\[
d^{opt}_{\eta} = \{d_1(h_1;\, \eta_1^{opt})\}, \qquad \eta_1^{opt} = \arg\max_{\eta_1} \mathcal{V}(d_\eta)
\]
• AIPW estimator (3.43) for V(dη) for fixed η = η1
\[
\mathcal{V}_{AIPW}(d_\eta) = n^{-1} \sum_{i=1}^{n} \left[ \frac{\mathcal{C}_{d_\eta,i}\, Y_i}{\pi_{d_\eta,1}(H_{1i};\, \eta_1, \gamma_1)} - \frac{\mathcal{C}_{d_\eta,i} - \pi_{d_\eta,1}(H_{1i};\, \eta_1, \gamma_1)}{\pi_{d_\eta,1}(H_{1i};\, \eta_1, \gamma_1)}\, Q_{d_\eta,1}(H_{1i};\, \eta_1, \beta_1) \right]
\]
where
\[
\begin{aligned}
\mathcal{C}_{d_\eta} &= I\{A_1 = d_1(H_1;\eta_1)\} = A_1\, I\{d_1(H_1;\eta_1) = 1\} + (1 - A_1)\, I\{d_1(H_1;\eta_1) = 0\} \\
\pi_{d_\eta,1}(H_1;\, \eta_1, \gamma_1) &= \pi_1(H_1;\gamma_1)\, I\{d_1(H_1;\eta_1) = 1\} + \{1 - \pi_1(H_1;\gamma_1)\}\, I\{d_1(H_1;\eta_1) = 0\} \\
Q_{d_\eta,1}(H_1;\, \eta_1, \beta_1) &= Q_1(H_1, 1; \beta_1)\, I\{d_1(H_1;\eta_1) = 1\} + Q_1(H_1, 0; \beta_1)\, I\{d_1(H_1;\eta_1) = 0\}
\end{aligned}
\]
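As a point of reference, a minimal sketch of computing VAIPW(dη) for one fixed rule, assuming the rule values d1(H1i; η1), the fitted propensities π1(H1i; γ1), and the fitted outcome regressions Q1(H1i, 1; β1), Q1(H1i, 0; β1) have already been computed and stored as NumPy arrays; all variable names are illustrative assumptions:

```python
import numpy as np

def value_aipw(Y, A1, d1, pi1, Q1_1, Q1_0):
    """AIPW estimator of the value V(d_eta) for a fixed rule.

    d1         : array of d_1(H_1i; eta_1) in {0, 1}
    pi1        : fitted propensities pi_1(H_1i; gamma_1)
    Q1_1, Q1_0 : fitted regressions Q_1(H_1i, 1; beta_1), Q_1(H_1i, 0; beta_1)
    """
    C_d  = np.where(d1 == 1, A1, 1 - A1)     # C_{d_eta,i} = I{A_1i = d_1(H_1i; eta_1)}
    pi_d = np.where(d1 == 1, pi1, 1 - pi1)   # pi_{d_eta,1}(H_1i; eta_1, gamma_1)
    Q_d  = np.where(d1 == 1, Q1_1, Q1_0)     # Q_{d_eta,1}(H_1i; eta_1, beta_1)
    return float(np.mean(C_d * Y / pi_d - (C_d - pi_d) / pi_d * Q_d))
```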
Value search estimation, revisited
Estimator for $d^{opt}_{\eta}$: $d^{opt}_{\eta,AIPW} = \{d_1(h_1;\, \eta^{opt}_{1,AIPW})\}$
• $\eta^{opt}_{1,AIPW}$ maximizes $\mathcal{V}_{AIPW}(d_\eta)$ in $\eta_1$

Algebra:
\[
\begin{aligned}
\frac{\mathcal{C}_{d_\eta}\, Y}{\pi_{d_\eta,1}(H_1;\, \eta_1, \gamma_1)}
&= \frac{[A_1\, I\{d_1(H_1;\eta_1) = 1\} + (1 - A_1)\, I\{d_1(H_1;\eta_1) = 0\}]\, Y}{\pi_1(H_1;\gamma_1)\, I\{d_1(H_1;\eta_1) = 1\} + \{1 - \pi_1(H_1;\gamma_1)\}\, I\{d_1(H_1;\eta_1) = 0\}} \\[4pt]
&= \frac{A_1 Y}{\pi_1(H_1;\gamma_1)}\, I\{d_1(H_1;\eta_1) = 1\} + \frac{(1 - A_1) Y}{1 - \pi_1(H_1;\gamma_1)}\, I\{d_1(H_1;\eta_1) = 0\}
\end{aligned}
\]
\[
\begin{aligned}
\frac{\mathcal{C}_{d_\eta} - \pi_{d_\eta,1}(H_1;\, \eta_1, \gamma_1)}{\pi_{d_\eta,1}(H_1;\, \eta_1, \gamma_1)}\, Q_{d_\eta,1}(H_1;\, \eta_1, \beta_1)
&= \frac{A_1 - \pi_1(H_1;\gamma_1)}{\pi_1(H_1;\gamma_1)}\, Q_1(H_1, 1; \beta_1)\, I\{d_1(H_1;\eta_1) = 1\} \\
&\quad - \frac{A_1 - \pi_1(H_1;\gamma_1)}{1 - \pi_1(H_1;\gamma_1)}\, Q_1(H_1, 0; \beta_1)\, I\{d_1(H_1;\eta_1) = 0\}
\end{aligned}
\]
Classification analogy
Define:
\[
\psi_1(H_1, A_1, Y) = \frac{A_1 Y}{\pi_1(H_1)} - \frac{A_1 - \pi_1(H_1)}{\pi_1(H_1)}\, Q_1(H_1, 1), \tag{4.1}
\]
\[
\psi_0(H_1, A_1, Y) = \frac{(1 - A_1) Y}{1 - \pi_1(H_1)} + \frac{A_1 - \pi_1(H_1)}{1 - \pi_1(H_1)}\, Q_1(H_1, 0) \tag{4.2}
\]
• Under SUTVA, NUC, and positivity,
\[
E\{\psi_1(H_1, A_1, Y) \mid H_1\} = Q_1(H_1, 1), \qquad E\{\psi_0(H_1, A_1, Y) \mid H_1\} = Q_1(H_1, 0)
\]
• Thus
\[
E\{\psi_1(H_1, A_1, Y) - \psi_0(H_1, A_1, Y) \mid H_1\} = C_1(H_1) = Q_1(H_1, 1) - Q_1(H_1, 0),
\]
the contrast function (3.33)
Classification analogy
Thus, by all of this algebra, we can write
\[
\mathcal{V}_{AIPW}(d_\eta) = n^{-1} \sum_{i=1}^{n} \Big[ \psi_1(H_{1i}, A_{1i}, Y_i)\, I\{d_1(H_{1i};\eta_1) = 1\} + \psi_0(H_{1i}, A_{1i}, Y_i)\, I\{d_1(H_{1i};\eta_1) = 0\} \Big]
\]
• ψ1(H1i, A1i, Yi) and ψ0(H1i, A1i, Yi) are (4.1) and (4.2) evaluated at (H1i, A1i, Yi) with the fitted models Q1(H1, 1; β1), Q1(H1, 0; β1), and π1(H1; γ1) substituted
• Rewrite using I{d1(H1; η1) = 1} = d1(H1; η1) and I{d1(H1; η1) = 0} = 1 − d1(H1; η1)
Classification analogy
By further algebra, VAIPW(dη) can be expressed as
\[
\begin{aligned}
\mathcal{V}_{AIPW}(d_\eta)
&= n^{-1} \sum_{i=1}^{n} \Big[ \psi_1(H_{1i}, A_{1i}, Y_i)\, d_1(H_{1i};\eta_1) + \psi_0(H_{1i}, A_{1i}, Y_i)\, \{1 - d_1(H_{1i};\eta_1)\} \Big] \\
&= n^{-1} \sum_{i=1}^{n} \Big[ d_1(H_{1i};\eta_1)\, \{\psi_1(H_{1i}, A_{1i}, Y_i) - \psi_0(H_{1i}, A_{1i}, Y_i)\} + \psi_0(H_{1i}, A_{1i}, Y_i) \Big] \\
&= n^{-1} \sum_{i=1}^{n} \Big\{ d_1(H_{1i};\eta_1)\, C_1(H_{1i}, A_{1i}, Y_i) + \psi_0(H_{1i}, A_{1i}, Y_i) \Big\}
\end{aligned}
\]
• Predictor of the contrast function:
\[
C_1(H_{1i}, A_{1i}, Y_i) = \psi_1(H_{1i}, A_{1i}, Y_i) - \psi_0(H_{1i}, A_{1i}, Y_i)
\]
Classification analogy
Result: Maximizing VAIPW(dη) in η1 is equivalent to maximizing
\[
n^{-1} \sum_{i=1}^{n} d_1(H_{1i};\eta_1)\, C_1(H_{1i}, A_{1i}, Y_i)
\]
More algebra: Using a = I(a > 0)|a| − I(a ≤ 0)|a| for any a and writing dη1,1i = d1(H1i; η1), C1i = C1(H1i, A1i, Yi),
\[
\begin{aligned}
d_{\eta_1,1i}\, C_{1i} &= d_{\eta_1,1i}\, I(C_{1i} > 0)\, |C_{1i}| - d_{\eta_1,1i}\, I(C_{1i} \le 0)\, |C_{1i}| \\
&= I(C_{1i} > 0)\, |C_{1i}| - |C_{1i}|\, \{(1 - d_{\eta_1,1i})\, I(C_{1i} > 0) + d_{\eta_1,1i}\, I(C_{1i} \le 0)\} \\
&= I(C_{1i} > 0)\, |C_{1i}| - |C_{1i}|\, \{I(C_{1i} > 0) - d_{\eta_1,1i}\}^2,
\end{aligned}
\]
where the last equality follows because dη1,1i and the indicators take values in {0, 1}. Thus
\[
d_1(H_{1i};\eta_1)\, C_1(H_{1i}, A_{1i}, Y_i) = I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\}\, |C_1(H_{1i}, A_{1i}, Y_i)| - |C_1(H_{1i}, A_{1i}, Y_i)|\, \big[ I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} - d_1(H_{1i};\eta_1) \big]^2
\]
Classification analogy
Final result: Maximizing VAIPW(dη) in η1 is equivalent to minimizing in η1
\[
\begin{aligned}
& n^{-1} \sum_{i=1}^{n} |C_1(H_{1i}, A_{1i}, Y_i)|\, \big[ I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} - d_1(H_{1i};\eta_1) \big]^2 \\
&\qquad = n^{-1} \sum_{i=1}^{n} |C_1(H_{1i}, A_{1i}, Y_i)|\, I\big[ I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} \ne d_1(H_{1i};\eta_1) \big] \qquad (4.3)
\end{aligned}
\]
• A weighted classification error with
  – "Label" I{C1(H1i, A1i, Yi) > 0} (Zi)
  – "Weight" |C1(H1i, A1i, Yi)| (wi)
  – "Classifier" d1(h1; η1) (d)
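A minimal sketch of this weighted-classification recipe, using scikit-learn's classification tree (CART) as the off-the-shelf classifier; the inputs are as in the earlier sketch, the choice of CART and all variable names are illustrative assumptions, and CART greedily builds a tree using the weights rather than exactly minimizing (4.3):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def classification_estimator(H1, A1, Y, pi1, Q1_1, Q1_0, max_depth=2):
    """Estimate an optimal regime by weighted classification, in the spirit of Zhang et al. (2012).

    Builds the AIPW contrast predictor C1 = psi1 - psi0 from (4.1)-(4.2), then fits a
    classifier to label I{C1 > 0} with weight |C1|, as in (4.3).
    """
    psi1 = A1 * Y / pi1 - (A1 - pi1) / pi1 * Q1_1
    psi0 = (1 - A1) * Y / (1 - pi1) + (A1 - pi1) / (1 - pi1) * Q1_0
    contrast = psi1 - psi0                      # predictor of the contrast C1(H1)
    label = (contrast > 0).astype(int)          # plays the role of Z_i
    weight = np.abs(contrast)                   # plays the role of w_i
    clf = DecisionTreeClassifier(max_depth=max_depth)
    clf.fit(H1, label, sample_weight=weight)    # weighted classification fit
    return clf                                  # clf.predict(h1) plays the role of d1(h1; eta1)
```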
Classification analogy
Intuitive interpretation of (4.3):
• From (3.34), dopt ∈ D satisfies
\[
d_1^{opt}(h_1) = I\{C_1(h_1) > 0\}
\]
• The second term in (4.3) compares a predictor of the option selected by the global dopt to that selected by a rule in Dη
• The "weight" |C1(H1i, A1i, Yi)| in (4.3) places greater importance on contributions from individuals for whom the absolute difference in expected outcomes for options 0 and 1 is large
Classification analogy
Similarly: An analogous argument applies to the IPW estimator
\[
\mathcal{V}_{IPW}(d_\eta) = n^{-1} \sum_{i=1}^{n} \frac{\mathcal{C}_{d_\eta,i}\, Y_i}{\pi_{d_\eta,1}(H_{1i};\, \eta_1, \gamma_1)}
\]
• It can be shown that the same formulation applies with
\[
\psi_1(H_1, A_1, Y) = \frac{A_1 Y}{\pi_1(H_1)}, \qquad \psi_0(H_1, A_1, Y) = \frac{(1 - A_1) Y}{1 - \pi_1(H_1)},
\]
\[
C_1(H_{1i}, A_{1i}, Y_i) = \psi_1(H_{1i}, A_{1i}, Y_i) - \psi_0(H_{1i}, A_{1i}, Y_i) = \frac{A_{1i} Y_i}{\pi_1(H_{1i};\gamma_1)} - \frac{(1 - A_{1i}) Y_i}{1 - \pi_1(H_{1i};\gamma_1)}
\]
Classification analogy
Summary: Value (policy) search estimation of an optimal restricted regime dηopt by maximizing VIPW(dη) or VAIPW(dη) is equivalent to minimizing a weighted classification error
• The choice of classification approach dictates the restricted class Dη
• Can be implemented using off-the-shelf software and algorithms for classification problems
• E.g., CART, SVM

However: This analogy does not circumvent the need to optimize a nonsmooth function of η1
Demonstration
Decision function: Write d1(h1; η1) = I{f1(h1; η1) > 0}
• E.g.,
\[
f_1(h_1; \eta_1) = \eta_{11} + \eta_{12}\, x_{11} + \eta_{13}\, x_{12}
\]
• By algebra, (4.3) can be written as
\[
n^{-1} \sum_{i=1}^{n} |C_1(H_{1i}, A_{1i}, Y_i)|\, I\big[ I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} \ne d_1(H_{1i};\eta_1) \big]
= n^{-1} \sum_{i=1}^{n} |C_1(H_{1i}, A_{1i}, Y_i)|\; \ell_{0\text{-}1}\Big( \big[ 2\, I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} - 1 \big]\, f_1(H_{1i};\eta_1) \Big)
\]
in terms of the 0-1 loss function
\[
\ell_{0\text{-}1}(x) = I(x \le 0)
\]
Demonstration
Source of difficulty: The 0-1 loss function
\[
\ell_{0\text{-}1}(x) = I(x \le 0)
\]
appearing in the objective above is nonconvex
• Optimization involving nonconvex loss functions is challenging; standard techniques cannot be used
• This problem has been well studied in the classification literature
• E.g., with SVM, replace ℓ0-1(x) by a convex "surrogate" such as the hinge loss function
\[
\ell_{hinge}(x) = (1 - x)_+, \qquad x_+ = \max(0, x)
\]
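Written out as a small sketch, where x plays the role of the margin argument, e.g., [2 I{C1(H1i, A1i, Yi) > 0} − 1] f1(H1i; η1) above (the function names are illustrative):

```python
import numpy as np

def loss_01(x):
    """0-1 loss: l(x) = I(x <= 0)."""
    return np.where(np.asarray(x) <= 0, 1.0, 0.0)

def loss_hinge(x):
    """Hinge loss: l(x) = (1 - x)_+ = max(0, 1 - x), a convex surrogate for the 0-1 loss."""
    return np.maximum(0.0, 1.0 - np.asarray(x))
```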
[Figure: Hinge loss ℓhinge(x) = (1 − x)+ vs. 0-1 loss ℓ0-1(x) = I(x ≤ 0), plotted as functions of x]
4.2 Outcome Weighted Learning
Original formulation
Zhao et al. (2012): An approach based on the IPW estimator
\[
\mathcal{V}_{IPW}(d_\eta) = n^{-1} \sum_{i=1}^{n} \frac{\mathcal{C}_{d_\eta,i}\, Y_i}{\pi_{d_\eta,1}(H_{1i};\, \eta_1, \gamma_1)}
\]
• Assume that Y is bounded and Y ≥ 0
• Can be developed as a special case of the classification formulation with
\[
\psi_1(H_1, A_1, Y) = \frac{A_1 Y}{\pi_1(H_1)}, \qquad \psi_0(H_1, A_1, Y) = \frac{(1 - A_1) Y}{1 - \pi_1(H_1)},
\]
\[
C_1(H_{1i}, A_{1i}, Y_i) = \psi_1(H_{1i}, A_{1i}, Y_i) - \psi_0(H_{1i}, A_{1i}, Y_i)
= \frac{A_{1i} Y_i}{\pi_1(H_{1i};\gamma_1)} - \frac{(1 - A_{1i}) Y_i}{1 - \pi_1(H_{1i};\gamma_1)}
= \frac{Y_i\, \{A_{1i} - \pi_1(H_{1i};\gamma_1)\}}{\pi_1(H_{1i};\gamma_1)\, \{1 - \pi_1(H_{1i};\gamma_1)\}}
\]
As a special case
• With Y ≥ 0 and the positivity assumption,
\[
I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} = I(A_{1i} = 1) = A_{1i}
\]
• By considering the cases A1i = 1 and A1i = 0,
\[
|C_1(H_{1i}, A_{1i}, Y_i)| = \frac{Y_i}{A_{1i}\, \pi_1(H_{1i};\gamma_1) + (1 - A_{1i})\, \{1 - \pi_1(H_{1i};\gamma_1)\}}
\]
• Substituting in (4.3), maximizing VIPW(dη) in η1 is equivalent to minimizing
\[
n^{-1} \sum_{i=1}^{n} \underbrace{\frac{Y_i}{A_{1i}\, \pi_1(H_{1i};\gamma_1) + (1 - A_{1i})\, \{1 - \pi_1(H_{1i};\gamma_1)\}}}_{\text{``Weight''}}\; I\{A_{1i} \ne d_1(H_{1i};\eta_1)\} \qquad (4.4)
\]
• "Label" A1i, "classifier" d1(h1; η1)
Original formulation
Randomized study: With known propensity
\[
\pi_1(H_1) = P(A_1 = 1 \mid H_1) = P(A_1 = 1) = \pi_1
\]
• Recode the treatment options as A1 = {−1, 1}
• Write d1(h1; η1) = sign{f1(h1; η1)} for a decision function f1(h1; η1)

Weighted classification error: (4.4) can be rewritten as
\[
n^{-1} \sum_{i=1}^{n} \frac{Y_i}{A_{1i}\, \pi_1 + (1 - A_{1i})/2}\; I\{A_{1i} \ne d_1(H_{1i};\eta_1)\}
= n^{-1} \sum_{i=1}^{n} \frac{Y_i}{A_{1i}\, \pi_1 + (1 - A_{1i})/2}\; I\big[ A_{1i} \ne \mathrm{sign}\{f_1(H_{1i};\eta_1)\} \big]
\]
• Involves the 0-1 loss function
\[
I\big[ A_{1i} \ne \mathrm{sign}\{f_1(H_{1i};\eta_1)\} \big] = I\{A_{1i}\, f_1(H_{1i};\eta_1) \le 0\} = \ell_{0\text{-}1}\{A_{1i}\, f_1(H_{1i};\eta_1)\}
\]
Outcome weighted learning (OWL)
Minimize:
\[
n^{-1} \sum_{i=1}^{n} \frac{Y_i}{A_{1i}\, \pi_1 + (1 - A_{1i})/2}\; \ell_{0\text{-}1}\{A_{1i}\, f_1(H_{1i};\eta_1)\}
\]
Original OWL (Zhao et al., 2012):
• Restricted class Dη induced by a linear or nonlinear SVM
• Replace the 0-1 loss by the convex surrogate hinge loss function
\[
\ell_{hinge}(x) = (1 - x)_+, \qquad x_+ = \max(0, x)
\]
• Minimize in η1 the penalized objective function
\[
n^{-1} \sum_{i=1}^{n} \frac{Y_i}{A_{1i}\, \pi_1 + (1 - A_{1i})/2}\; \{1 - A_{1i}\, f_1(H_{1i};\eta_1)\}_+ + \lambda_n \|f_1\|^2 \qquad (4.5)
\]
• Flexible, highly parameterized representation of f1(h1; η1), with a penalty to protect against overfitting
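A minimal sketch of OWL using scikit-learn's SVM as the weighted hinge-loss solver; the weighted, L2-penalized hinge criterion that SVC minimizes corresponds to the form of (4.5) up to how the penalty is parameterized, and the variable names and tuning choices are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def owl_estimator(H1, A1, Y, pi1, kernel="linear", C=1.0):
    """Outcome weighted learning in the spirit of Zhao et al. (2012), via a weighted SVM.

    Labels are the treatments actually received (recoded to {-1, +1}); each subject
    is weighted by Y_i / {A_1i * pi1 + (1 - A_1i) * (1 - pi1)}, assuming Y >= 0.
    """
    labels = 2 * A1 - 1                                  # recode A1 in {0, 1} to {-1, +1}
    weights = Y / (A1 * pi1 + (1 - A1) * (1 - pi1))      # the "weight" in (4.4)
    svm = SVC(kernel=kernel, C=C)
    svm.fit(H1, labels, sample_weight=weights)           # weighted hinge loss + L2 penalty
    return svm                                           # svm.predict(h1) plays the role of sign{f1(h1)}
```

Per the remarks that follow, minimizing this hinge-loss surrogate is a different optimization problem from minimizing the original objective (4.4) directly.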
Outcome weighted learning
Remarks:
• A similar approach (flexible f1(h1; η1), penalization) can be taken with the full AIPW formulation
• Important: Minimizing the original objective (4.4) and minimizing (4.5) with the hinge loss substituted are different optimization problems and will lead to different η1opt and thus different estimated optimal regimes
• Similarly for the AIPW formulation
• Simulation evidence suggests this might not be such a big deal in practice; the resulting estimated optimal regimes perform well

Refinements and extensions of OWL: Zhou et al. (2017), Liu et al. (2018)
4.3 Interpretable Treatment Regimes via Decision Lists
Flexibility vs. interpretability
Classification approach: Flexible representation
• Complex, highly parameterized estimated decision rules
• Pro: Can synthesize high-dimensional patient information and achieve performance close to a true optimal regime dopt ∈ D
• Con: Difficult to interpret, "black box," hard to glean new scientific insights

Opposing view: Emphasize parsimony and interpretability
• Deliberately focus on Dη with rules that can be understood by clinicians and patients
• Pro: Accessibility, more readily accepted, can generate new scientific insights and hypotheses
• Con: Optimal such regimes may not approach the performance of dopt ∈ D
Decision rules as decision lists
Zhang et al. (2015): Focus on Dη with decision rules characterizing regimes in the form of a decision list
• Decision list: A sequence of if-then clauses
• "If" is a condition involving patient information that, if true, leads to selection of an option a1 ∈ A1
• Natural for A1 with m1 ≥ 2 options

Example: Acute leukemia, A1 = {C1, C2} = {0, 1}
• Rule d1(h1) = I(age < 50 and WBC < 10) as a list:

  If age < 50 and WBC < 10 then C2;
  else C1
Decision rules as decision lists
Fancier example: Acute leukemia, A1 = {C1, C2, C3}

  If age < 50 and WBC < 10 then C2;
  else if age ≥ 50 then C1;
  else C3                                                    (4.6)

[Figure: partition of the (age, WBC) plane induced by (4.6): C1 for age ≥ 50; C2 for age < 50 and WBC < 10; C3 for age < 50 and WBC ≥ 10]
Decision rules as decision lists
Fancier still:
If age < 50 and ECOG < 2 then C2;
else if WBC ≥ 20 then C1;
else if PLAT > 300 then C1;
else C3.
(4.7)
Decision rules as decision lists
In general: A1 with m1 ≥ 2 options; a rule in the form of a decision list of length L1 is

  If c11 then a11;
  else if c12 then a12;
  ...
  else if c1L1 then a1L1;
  else a10

• Summarized as {(c11, a11), ..., (c1L1, a1L1), a10}
• For example, in (4.7), L1 = 3, c11 = {age < 50 and ECOG < 2}, and a11 = C2
• Treatment options can be repeated in different clauses
• L1 = 0 corresponds to a static regime
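A minimal sketch of a decision list as a data structure, with clauses stored as (condition, option) pairs and h1 represented as a dict of patient variables; the representation and names are illustrative assumptions, not the authors' implementation:

```python
def make_decision_list(clauses, default):
    """Decision list {(c_11, a_11), ..., (c_1L1, a_1L1), a_10}.

    `clauses` is an ordered list of (condition, option) pairs, where each condition
    is a predicate on h1; `default` is the final 'else' option a_10.
    """
    def d1(h1):
        for condition, option in clauses:      # the first true condition determines the option
            if condition(h1):
                return option
        return default
    return d1

# The rule (4.7) written as a decision list:
d1 = make_decision_list(
    clauses=[
        (lambda h1: h1["age"] < 50 and h1["ECOG"] < 2, "C2"),
        (lambda h1: h1["WBC"] >= 20, "C1"),
        (lambda h1: h1["PLAT"] > 300, "C1"),
    ],
    default="C3",
)
print(d1({"age": 45, "ECOG": 1, "WBC": 8, "PLAT": 250}))   # -> C2
```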
Basic formulation
Can mathematize: Define
\[
\mathcal{T}_1(c_{1\ell}) = \{h_1 : c_{1\ell} \text{ is true}\}, \quad \ell = 1, \ldots, L_1,
\]
\[
\mathcal{R}_{11} = \mathcal{T}_1(c_{11}), \qquad
\mathcal{R}_{1\ell} = \Big\{ \bigcap_{j<\ell} \mathcal{T}_1(c_{1j})^c \Big\} \bigcap \mathcal{T}_1(c_{1\ell}), \quad \ell = 2, \ldots, L_1, \qquad
\mathcal{R}_{10} = \bigcap_{j=1}^{L_1} \mathcal{T}_1(c_{1j})^c
\]
• Each R1ℓ, ℓ = 0, ..., L1, represents the conditions that must be satisfied for an individual to receive option a1ℓ
• For the diligent student: Determine the sets R11, R12, R13, and R10 for the example in (4.7)
• Clearly, a given h1 can belong to at most one set R1ℓ, ℓ = 0, 1, ..., L1
Basic formulation
Treatment regime: Decision rules of the form
\[
d_1(h_1) = \sum_{\ell=0}^{L_1} a_{1\ell}\, I(h_1 \in \mathcal{R}_{1\ell}) \tag{4.8}
\]
• Characterized by {(c11, a11), ..., (c1L1, a1L1), a10}

Zhang et al. (2015): For parsimony and interpretability, restrict to c1ℓ involving at most 2 components of h1
• For h1 with p1 components and j1 < j2 ∈ {1, ..., p1}, restrict to T1(c1ℓ) of any of the forms
\[
\begin{array}{ll}
\{h_1 : h_{1j_1} \le \tau_{11}\} & \{h_1 : h_{1j_1} \le \tau_{11} \text{ or } h_{1j_2} \le \tau_{12}\}\\
\{h_1 : h_{1j_1} \le \tau_{11} \text{ and } h_{1j_2} \le \tau_{12}\} & \{h_1 : h_{1j_1} \le \tau_{11} \text{ or } h_{1j_2} > \tau_{12}\}\\
\{h_1 : h_{1j_1} \le \tau_{11} \text{ and } h_{1j_2} > \tau_{12}\} & \{h_1 : h_{1j_1} > \tau_{11} \text{ or } h_{1j_2} \le \tau_{12}\}\\
\{h_1 : h_{1j_1} > \tau_{11} \text{ and } h_{1j_2} \le \tau_{12}\} & \{h_1 : h_{1j_1} > \tau_{11} \text{ or } h_{1j_2} > \tau_{12}\}\\
\{h_1 : h_{1j_1} > \tau_{11} \text{ and } h_{1j_2} > \tau_{12}\} & \{h_1 : h_{1j_1} > \tau_{11}\}
\end{array} \tag{4.9}
\]
Regimes
Restricted class Dη: Define η1 to be a collection
\[
\{(c_{11}, a_{11}), \ldots, (c_{1L_1}, a_{1L_1}), a_{10}\}
\]
with conditions c1ℓ as in one of the T1(c1ℓ) in (4.9)
• A rule as in (4.8) can then be written as d1(h1; η1), and Dη comprises all regimes with rules of this form
• Feature: There is no need to collect all patient variables up front; they can be ascertained as needed, which is useful if some are expensive or burdensome to collect
Nonuniqueness
A decision rule can be represented by more than one list:
• For a decision list with η1 = {(c11, a11), ..., (c1L1, a1L1), a10} and decision rule d1(h1; η1), there may exist another decision list with η′1 = {(c′11, a′11), ..., (c′1L′1, a′1L′1), a′10} and decision rule d1(h1; η′1) such that d1(h1; η1) = d1(h1; η′1) for all h1, but L1 ≠ L′1, or L1 = L′1 but c1j ≠ c′1j or a1j ≠ a′1j for some j = 1, ..., L1
Nonuniqueness
Example: (4.6) and an equivalent alternative:

  If age < 50 and WBC < 10 then C2;        If age ≥ 50 then C1;
  else if age ≥ 50 then C1;                else if WBC < 10 then C2;
  else C3                                  else C3

[Figure: both lists induce the same partition of the (age, WBC) plane as in (4.6)]
Optimal regime
Value of a regime: For any regime dη ∈ Dη, V(dη) is the same regardless of which version of dη is considered
• Optimal regime dηopt:
\[
d_1(h_1; \eta_1^{opt}), \qquad \eta_1^{opt} = \arg\max_{\eta_1} \mathcal{V}(d_\eta)
\]
• Suggests: If there are equivalent versions of dηopt, estimate the version that is least costly/burdensome to implement
• Value search: Maximize VAIPW(dη) on Slide 185 subject to targeting the version of an optimal regime minimizing a measure of "cost"
Optimal regime
Value search: As on Slide 185, with A1 = {1, ..., m1},
\[
\pi_{d_\eta,1}(H_1;\, \eta_1, \gamma_1) = \sum_{a_1=1}^{m_1} I\{d_1(H_1;\eta_1) = a_1\}\, \omega_1(H_1, a_1; \gamma_1),
\]
\[
Q_{d_\eta,1}(H_1;\, \eta_1, \beta_1) = \sum_{a_1=1}^{m_1} I\{d_1(H_1;\eta_1) = a_1\}\, Q_1(H_1, a_1; \beta_1),
\]
\[
\begin{aligned}
\mathcal{V}_{AIPW}(d_\eta)
&= n^{-1} \sum_{i=1}^{n} \left[ \frac{\mathcal{C}_{d_\eta,i}\, Y_i}{\pi_{d_\eta,1}(H_{1i};\, \eta_1, \gamma_1)} - \frac{\mathcal{C}_{d_\eta,i} - \pi_{d_\eta,1}(H_{1i};\, \eta_1, \gamma_1)}{\pi_{d_\eta,1}(H_{1i};\, \eta_1, \gamma_1)}\, Q_{d_\eta,1}(H_{1i};\, \eta_1, \beta_1) \right] \\
&= n^{-1} \sum_{i=1}^{n} \sum_{a_1=1}^{m_1} \left( \left[ \frac{I(A_{1i} = a_1)}{\omega_1(H_{1i}, a_1; \gamma_1)}\, \{Y_i - Q_1(H_{1i}, a_1; \beta_1)\} + Q_1(H_{1i}, a_1; \beta_1) \right] I\{d_1(H_{1i};\eta_1) = a_1\} \right)
\end{aligned}
\]
as in (2) of Zhang et al. (2015)
Optimal regime
Cost: If N1ℓ = cost of measuring the components of h1 necessary to check c11, ..., c1ℓ (fewer comparisons to thresholds is better), the cost of implementing a regime dη with rule as in (4.8),
\[
d_1(h_1; \eta_1) = \sum_{\ell=0}^{L_1} a_{1\ell}\, I(h_1 \in \mathcal{R}_{1\ell}),
\]
in the population is
\[
\mathcal{N}_1(d_\eta) = \sum_{\ell=1}^{L_1} N_{1\ell}\, P(H_1 \in \mathcal{R}_{1\ell}) + N_{1L_1}\, P(H_1 \in \mathcal{R}_{10})
\]
• Zhang et al. (2015) describe a computational algorithm to maximize VAIPW(dη) while minimizing an estimator of the cost N1(dη)
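A minimal sketch of estimating N1(dη) from a sample, replacing each P(H1 ∈ R1ℓ) by the empirical proportion of subjects whose first true clause is clause ℓ; the inputs (clause predicates, per-clause costs N1ℓ, and patient records as dicts) are assumptions for illustration, not the authors' algorithm:

```python
import numpy as np

def estimated_cost(conditions, costs, patients):
    """Estimate N1(d_eta) = sum_l N_1l P(H1 in R_1l) + N_1L1 P(H1 in R_10).

    `conditions` is the ordered list of clause predicates c_11, ..., c_1L1;
    `costs` is [N_11, ..., N_1L1], the cost of checking clauses 1 through l;
    `patients` is a list of h1 dicts; the final 'else' group incurs cost N_1L1.
    """
    L1 = len(conditions)
    clause_counts = np.zeros(L1 + 1)                   # counts for R_11, ..., R_1L1, R_10
    for h1 in patients:
        for ell, condition in enumerate(conditions):
            if condition(h1):
                clause_counts[ell] += 1
                break
        else:                                          # no clause true: h1 falls in R_10
            clause_counts[L1] += 1
    probs = clause_counts / len(patients)
    per_group_costs = np.append(costs, costs[-1])      # R_10 requires checking all L1 clauses
    return float(np.sum(per_group_costs * probs))
```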
4.4 Additional Approaches
Extensive literature
Numerous approaches: We highlight two additional approaches to estimation of an optimal regime
• Regression-based estimation: To mitigate concern over parametric model misspecification, represent
\[
Q_1(h_1, a_1) = E(Y \mid H_1 = h_1, A_1 = a_1)
\]
nonparametrically, e.g., using generalized additive models, support vector regression, random forests, etc., to obtain a nonparametric estimator Q̂1(h1, a1)
• Use Q̂1(h1, a1) as the fitted model and thus obtain
\[
\widehat{d}^{\,opt}_{Q,1}(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} \widehat{Q}_1(h_1, a_1)
\]
• E.g., Qian and Murphy (2011)
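A minimal sketch of this regression-based approach with a random forest standing in for the nonparametric regression; the model choice, tuning, and variable names are illustrative assumptions, and any flexible regression method could be substituted:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def regression_estimator(H1, A1, Y):
    """Nonparametrically estimate Q1(h1, a1) = E(Y | H1 = h1, A1 = a1) and return
    the rule d_Q,1(h1) = argmax over a1 in {0, 1} of the fitted Q1(h1, a1)."""
    X = np.column_stack([H1, A1])                        # include the treatment as a feature
    rf = RandomForestRegressor(n_estimators=500)
    rf.fit(X, Y)

    def d1_opt(h1):
        h1 = np.atleast_2d(h1)
        q0 = rf.predict(np.column_stack([h1, np.zeros(len(h1))]))
        q1 = rf.predict(np.column_stack([h1, np.ones(len(h1))]))
        return (q1 > q0).astype(int)                     # select the option with larger fitted Q1
    return d1_opt
```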
Extensive literature
• Alternative form of value search: Because for dη in a restricted class Dη,
\[
\mathcal{V}(d_\eta) = E\big[ Q_1(H_1, 1)\, I\{d_1(H_1;\eta_1) = 1\} + Q_1(H_1, 0)\, I\{d_1(H_1;\eta_1) = 0\} \big],
\]
maximize in η1
\[
\widehat{\mathcal{V}}(d_\eta) = n^{-1} \sum_{i=1}^{n} \big[ \widehat{Q}_1(H_{1i}, 1)\, I\{d_1(H_{1i};\eta_1) = 1\} + \widehat{Q}_1(H_{1i}, 0)\, I\{d_1(H_{1i};\eta_1) = 0\} \big]
\]
• As above, Q̂1(h1, a1) is a nonparametric estimator for Q1(h1, a1)
• But here Q̂1(h1, a1) is used only to ensure faithful representation of the value and not to define the optimal regime estimator directly (Taylor et al., 2015)
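For illustration, a minimal sketch of this alternative for a simple class of one-dimensional threshold rules d1(h1; η1) = I(h1 > η1), maximized over a grid of candidate thresholds; the rule class, the grid, and the inputs (nonparametric fits Q̂1(H1i, 1), Q̂1(H1i, 0) already evaluated at each subject) are assumptions for illustration:

```python
import numpy as np

def value_search_threshold(H1_feature, Q1_1, Q1_0, grid):
    """Maximize V(d_eta) = mean[Q1(H1, 1) d1 + Q1(H1, 0) (1 - d1)] over
    threshold rules d1(h1; eta1) = I(h1 > eta1), with eta1 ranging over `grid`."""
    best_eta, best_value = None, -np.inf
    for eta1 in grid:
        d1 = (H1_feature > eta1).astype(float)
        value = np.mean(Q1_1 * d1 + Q1_0 * (1 - d1))     # estimated value of this rule
        if value > best_value:
            best_eta, best_value = eta1, value
    return best_eta, best_value
```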
4.5 Key References
References
Liu, Y., Wang, Y., Kosorok, M. R., Zhao, Y., and Zeng, D. (2018). Augmented outcome-weighted learning for estimating optimal dynamic treatment regimens. Statistics in Medicine 37, 3776–3788.

Qian, M. and Murphy, S. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics 39, 1180–1210.

Taylor, J. M. G., Cheng, W., and Foster, J. C. (2015). Reader reaction to "A robust method for estimating optimal treatment regimes" by Zhang et al. (2012). Biometrics 71, 267–273.

Zhang, B., Tsiatis, A. A., Davidian, M., Zhang, M., and Laber, E. B. (2012). Estimating optimal treatment regimes from a classification perspective. Stat 1, 103–114.
Zhang, Y., Laber, E. B., Tsiatis, A. A., and Davidian, M. (2015). Using decision lists to construct interpretable and parsimonious treatment regimes. Biometrics 71, 895–904.

Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association 107, 1106–1118.

Zhou, X., Mayer-Hamblett, N., Khan, U., and Kosorok, M. R. (2017). Residual weighted learning for estimating individualized treatment rules. Journal of the American Statistical Association 112, 169–187.