More Robust Doubly Robust Estimators
Marie Davidian and Anastasios A. Tsiatis
Department of Statistics
North Carolina State University
Weihua Cao
Center for Devices and Radiological Health
US Food and Drug Administration
http://www.stat.ncsu.edu/~davidian
More Robust Doubly Robust Estimators 1
Outline
• The simplest setting
• Estimators and double robustness
• More robust doubly robust estimators
• Extension to longitudinal data
More Robust Doubly Robust Estimators 2
The problem
The simplest setting:
• Response (outcome) Y
• Parameter of interest: µ = E(Y)
• Full, ideal data: (Yi, Xi), i = 1, . . . , n, independent and identically
distributed (iid)
• Xi is a vector of covariates for individual i
• With no additional assumptions (nonparametric model): Usual
estimator for µ
µ̂ = n^{-1} ∑_{i=1}^{n} Yi   (the X's are not needed)
More Robust Doubly Robust Estimators 3
The problem
Complication: Response is missing for some individuals
• Ri = 1 if Y is observed for individual i and Ri = 0 otherwise
• Observed data: iid (Ri, RiYi, Xi), i = 1, . . . , n
• Unless responses are missing completely at random (MCAR),
i.e., Ri ⊥⊥ Yi, the complete-case estimator
( ∑_{i=1}^{n} Ri )^{-1} ∑_{i=1}^{n} Ri Yi
is a biased (inconsistent) estimator for µ
• Suppose we are willing to assume that the data are missing at
random (MAR), i.e.,
Ri ⊥⊥ Yi | Xi
• Under MAR, several estimators for µ have been advocated. . .
More Robust Doubly Robust Estimators 4
Outcome regression estimator
Under MAR: R ⊥⊥ Y | X
µ = E(Y) = E{E(Y|X)} = E{E(Y|R = 1, X)}   (1)
• Suppose the true E(Y|X) = m0(X), and posit a (parametric)
regression model m(X, β) for m0(X)
• If the model is correct, µ = E(Y) = E{m0(X)} = E{m(X, β0)}, suggesting
µ̂OR = n^{-1} ∑_{i=1}^{n} m(Xi, β̂)   for some estimator β̂
• By (1), β can be estimated using the complete cases
{(Yi, Xi) : Ri = 1}; e.g., by least squares, solving
∑_{i=1}^{n} Ri mβ(Xi, β){Yi − m(Xi, β)} = 0,   where mβ(X, β) = ∂m(X, β)/∂β
• Whether or not µ̂OR is a consistent estimator for µ depends on
whether or not the posited model m(X, β) is correct
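To make this concrete, here is a minimal Python sketch of µ̂OR under an illustrative linear specification m(X, β) = (1, Xᵀ)β fit by least squares on the complete cases; the function name and the linear form are assumptions for illustration, not part of the original development.

```python
import numpy as np

def mu_hat_or(y, r, x):
    """Outcome regression estimator: fit m(X, beta) by least squares on the
    complete cases (R_i = 1), then average the fitted values over ALL i."""
    x1 = np.column_stack([np.ones(len(r)), x])   # intercept + covariates
    obs = r == 1                                 # complete cases
    beta_hat, *_ = np.linalg.lstsq(x1[obs], y[obs], rcond=None)
    return float(np.mean(x1 @ beta_hat))         # n^{-1} sum_i m(X_i, beta_hat)
```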
More Robust Doubly Robust Estimators 5
Inverse propensity score weighted estimator
Under MAR: R ⊥⊥ Y | X
pr(R = 1|Y, X) = E(R|Y, X) = E(R|X) = pr(R = 1|X)
• Propensity score pr(R = 1|X); π0(X) = true propensity score
• If π0(X) were known, estimate µ by
n^{-1} ∑_{i=1}^{n} Ri Yi / π0(Xi)   or   { n^{-1} ∑_{i=1}^{n} Ri / π0(Xi) }^{-1} n^{-1} ∑_{i=1}^{n} Ri Yi / π0(Xi)
• I.e., weight the observed responses (Yi : Ri = 1) so that these
individuals represent themselves and “similar” others whose
responses are missing
More Robust Doubly Robust Estimators 6
Inverse propensity score weighted estimator
Under MAR: R ⊥⊥ Y | X
pr(R = 1|Y, X) = E(R|Y, X) = E(R|X) = pr(R = 1|X)   (2)
n^{-1} ∑_{i=1}^{n} Ri Yi / π0(Xi)  →p  E{ RY / π0(X) }
= E[ E{ RY / π0(X) | Y, X } ]
= E{ Y E(R|Y, X) / π0(X) }
= E{ Y π0(X) / π0(X) }   using MAR (2)
= E(Y) = µ,   as long as π0(X) > 0 almost surely
More Robust Doubly Robust Estimators 7
Inverse propensity score weighted estimator
Unless missingness is by design: π0(X) is not known
• Posit a model π(X, γ) for pr(R = 1|X); e.g., logistic regression
• Estimate γ; e.g., by maximum likelihood (ML), maximizing
∏_{i=1}^{n} {π(Xi, γ)}^{Ri} {1 − π(Xi, γ)}^{1−Ri},
and form the estimator
µ̂IPW = n^{-1} ∑_{i=1}^{n} Ri Yi / π(Xi, γ̂)
• Interestingly: if the model is correct, i.e., π(X, γ0) = π0(X), then
n^{-1} ∑_{i=1}^{n} Ri Yi / π(Xi, γ̂)   is more efficient than   n^{-1} ∑_{i=1}^{n} Ri Yi / π0(Xi)
• If the propensity score model is misspecified, then µ̂IPW may be
biased (inconsistent)
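A matching sketch of µ̂IPW, assuming (for illustration) a logistic propensity model fit by ML via statsmodels; the function name is hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def mu_hat_ipw(y, r, x):
    """IPW estimator: fit pi(X, gamma) by ML logistic regression, then
    inverse-weight the observed responses."""
    x1 = sm.add_constant(x)                            # logistic linear predictor
    pi_hat = sm.Logit(r, x1).fit(disp=0).predict(x1)   # fitted pi(X_i, gamma_hat)
    yy = np.where(r == 1, y, 0.0)                      # R_i Y_i, missing Y zeroed
    return float(np.mean(r * yy / pi_hat))
```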
More Robust Doubly Robust Estimators 8
Semiparametric theory
Robins, Rotnitzky, and Zhao (1994): If the propensity model is
correct, with no additional assumptions on the distribution of the data:
• All consistent and asymptotically normal estimators are
asymptotically equivalent to estimators of the form
n^{-1} ∑_{i=1}^{n} [ Ri Yi / π(Xi, γ̂) + {Ri − π(Xi, γ̂)} / π(Xi, γ̂) · h(Xi) ]   for some function h(X)
• The optimal h(X), leading to the smallest (asymptotic) variance, is
h(X) = −E(Y|X)
• This suggests modeling E(Y|X) by m(X, β), estimating β, and
estimating µ by
n^{-1} ∑_{i=1}^{n} [ Ri Yi / π(Xi, γ̂) − {Ri − π(Xi, γ̂)} / π(Xi, γ̂) · m(Xi, β̂) ]
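Combining the two fits gives the “usual” doubly robust estimator; a minimal sketch under the same illustrative linear-outcome and logistic-propensity assumptions as above.

```python
import numpy as np
import statsmodels.api as sm

def mu_hat_dr(y, r, x):
    """'Usual' doubly robust estimator: the IPW term plus an augmentation
    built from the outcome regression fit m(X, beta_hat)."""
    x1 = sm.add_constant(x)
    pi_hat = sm.Logit(r, x1).fit(disp=0).predict(x1)             # ML propensity fit
    obs = r == 1
    beta_hat, *_ = np.linalg.lstsq(x1[obs], y[obs], rcond=None)  # OLS, complete cases
    m_hat = x1 @ beta_hat                                        # m(X_i, beta_hat)
    yy = np.where(obs, y, 0.0)                                   # missing Y zeroed
    return float(np.mean(r * yy / pi_hat - (r - pi_hat) / pi_hat * m_hat))
```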
More Robust Doubly Robust Estimators 9
Semiparametric theory
• If γ̂ →p γ* and β̂ →p β*, this estimator converges in probability to
E[ RY / π(X, γ*) − {R − π(X, γ*)} / π(X, γ*) · m(X, β*) ]
= E[ Y + {R − π(X, γ*)} / π(X, γ*) · {Y − m(X, β*)} ]
= µ + E[ {R − π(X, γ*)} / π(X, γ*) · {Y − m(X, β*)} ]
• Because R ⊥⊥ Y | X, conditioning on X gives
E[ {R − π(X, γ*)} / π(X, γ*) · {Y − m(X, β*)} ] = E[ {π0(X) − π(X, γ*)} / π(X, γ*) · {m0(X) − m(X, β*)} ],
which is zero, so that the estimator is consistent, if either
– π(X, γ*) = π(X, γ0) = π0(X) (propensity model correct), or
– m(X, β*) = m(X, β0) = m0(X) (outcome regression correct)
• Double robustness
More Robust Doubly Robust Estimators 10
Not so fast. . .
Kang and Schafer (2007): Simulation studies
• The “usual” doubly robust estimator can be severely biased when
both models are even just “slightly” misspecified
• And it can exhibit non-negligible bias if the estimated propensity score
is close to zero for some observations, even if the model is correct
• The outcome regression estimator µ̂OR performed much better,
even under misspecification
• “Two wrong models are not necessarily better than one”
More Robust Doubly Robust Estimators 11
Simulation scenario
• Zi = (Zi1, . . . , Zi4)ᵀ ∼ standard multivariate normal, n = 1000
• Xi = (Xi1, . . . , Xi4)ᵀ, where
Xi1 = exp(Zi1/2),   Xi2 = Zi2/{1 + exp(Zi1)} + 10,
Xi3 = (Zi1Zi3/25 + 0.6)³,   Xi4 = (Zi3 + Zi4 + 20)²
• True outcome regression model: Y|X ∼ N{m0(X), 1},
m0(X) = 210 + 27.4Z1 + 13.7Z2 + 13.7Z3 + 13.7Z4
• True propensity score model:
π0(X) = expit(−Z1 + 0.5Z2 − 0.25Z3 − 0.1Z4)
• Misspecified models used the X's instead of the Z's, fitted by least
squares and ML, respectively (“usual” doubly robust estimator µ̂DR)
• True µ = 210
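For reference, a sketch of this data-generating mechanism in Python (the function name and seed handling are incidental choices); correctly specified analyses regress on the Z's, misspecified ones on the X's.

```python
import numpy as np

def kang_schafer_data(n=1000, seed=0):
    """Generate one dataset from the Kang-Schafer (2007) scenario above."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 4))
    x = np.column_stack([
        np.exp(z[:, 0] / 2),
        z[:, 1] / (1 + np.exp(z[:, 0])) + 10,
        (z[:, 0] * z[:, 2] / 25 + 0.6) ** 3,
        (z[:, 2] + z[:, 3] + 20) ** 2,
    ])
    m0 = 210 + 27.4 * z[:, 0] + 13.7 * (z[:, 1] + z[:, 2] + z[:, 3])
    y = m0 + rng.standard_normal(n)        # Y | X ~ N(m0(X), 1)
    pi0 = 1 / (1 + np.exp(z[:, 0] - 0.5 * z[:, 1] + 0.25 * z[:, 2] + 0.1 * z[:, 3]))
    r = rng.binomial(1, pi0)               # MAR missingness, true propensity pi0
    return y, r, x, z
```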
More Robust Doubly Robust Estimators 12
Simulation results
Bias RMSE MCSD AveSE Cov
(MCSD = Monte Carlo standard deviation; AveSE = average of estimated standard errors; Cov = Monte Carlo coverage of nominal 95% confidence intervals)
OR correct, PS correct
µ̂OR −0.03 1.13 1.13 1.15 0.95
µ̂DR −0.03 1.13 1.13 1.15 0.95
OR correct, PS incorrect
µ̂OR −0.03 1.13 1.13 1.15 0.95
µ̂DR −0.03 2.07 2.07 1.32 0.95
OR incorrect, PS correct
µ̂OR 2.31 2.72 1.43 1.41 0.63
µ̂DR 0.18 1.84 1.84 1.64 0.93
OR incorrect, PS incorrect
µ̂OR 2.31 2.72 1.43 1.41 0.63
µ̂DR −17.89 179.88 178.98 22.60 0.94
More Robust Doubly Robust Estimators 13
New perspective
Can we identify alternative doubly robust estimators?
• For now: the propensity model π(X) is fixed (no unknown parameters)
• Consider the class of estimators
n^{-1} ∑_{i=1}^{n} [ Ri Yi / π(Xi) − {Ri − π(Xi)} / π(Xi) · m(Xi, β) ],
indexed by the parameter β
• Assume the propensity model is correct; i.e., π(X) = π0(X)
• The outcome regression model may or may not be correct
• Then each estimator in the class is consistent for µ and can be
shown to have asymptotic variance
var(Y) + E[ {1 − π0(X)} / π0(X) · {Y − m(X, β)}² ]
More Robust Doubly Robust Estimators 14
New perspective
var(Y) + E[ {1 − π0(X)} / π0(X) · {Y − m(X, β)}² ]   (3)
• The best (efficient) estimator in this class is the one with the smallest
asymptotic variance =⇒ minimize (3) in β
• Choose β = βopt satisfying
E[ {1 − π0(X)} / π0(X) · mβ(X, βopt) {Y − m(X, βopt)} ] = 0
or, equivalently,
E[ {1 − π0(X)} / π0(X) · mβ(X, βopt) {m0(X) − m(X, βopt)} ] = 0
More Robust Doubly Robust Estimators 15
New perspective
Key idea:
• Because the estimator
n^{-1} ∑_{i=1}^{n} [ Ri Yi / π(Xi) − {Ri − π(Xi)} / π(Xi) · m(Xi, β̂) ]
is asymptotically equivalent to the estimator
n^{-1} ∑_{i=1}^{n} [ Ri Yi / π(Xi) − {Ri − π(Xi)} / π(Xi) · m(Xi, β*) ],
where β̂ →p β*, it would be desirable to find β̂ such that
1. β̂ →p βopt when the propensity score model is correct, whether
or not the regression model is misspecified
2. m(X, β̂) is a consistent estimator for m0(X), i.e., β̂ →p β0,
when the outcome regression model is correct, even if the
propensity score model is misspecified
More Robust Doubly Robust Estimators 16
New perspective
• The least squares estimator for β is the solution to
∑_{i=1}^{n} Ri {Yi − m(Xi, β)} mβ(Xi, β) = 0
and hence satisfies condition 2.
• However, this estimator →p β*, the solution to
E[ R mβ(X, β*) {Y − m(X, β*)} ] = E[ π0(X) mβ(X, β*) {Y − m(X, β*)} ]
= E[ π0(X) mβ(X, β*) {m0(X) − m(X, β*)} ] = 0
• Thus, β* is not the same as βopt, the solution to
E[ {1 − π0(X)} / π0(X) · mβ(X, βopt) {m0(X) − m(X, βopt)} ] = 0,
unless m(X, β*) = m0(X) (in which case β* = β0 = βopt)
• So the least squares estimator does not satisfy condition 1.
More Robust Doubly Robust Estimators 17
New perspective
E[ {1 − π0(X)} / π0(X) · mβ(X, βopt) {m0(X) − m(X, βopt)} ] = 0   (4)
Implication:
• The “usual” doubly robust estimator, using the least squares
estimator for β, is doubly robust but does not achieve minimum
variance when m(X, β) is misspecified
• Motivated by (4), consider estimating β instead by the solution to
∑_{i=1}^{n} Ri [ {1 − π(Xi)} / π²(Xi) ] mβ(Xi, β) {Yi − m(Xi, β)} = 0
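For a linear model m(X, β) = (1, Xᵀ)β this equation is just weighted least squares on the complete cases with weights Ri{1 − π(Xi)}/π²(Xi); a minimal sketch, with the helper name assumed for illustration:

```python
import numpy as np

def beta_hat_weighted(y, r, x1, pi_hat):
    """Solve the weighted estimating equation above for a linear outcome model:
    weighted least squares with weights w_i = R_i (1 - pi_i) / pi_i^2."""
    w = r * (1 - pi_hat) / pi_hat ** 2
    yy = np.where(r == 1, y, 0.0)            # missing Y never enter (their w_i = 0)
    xtwx = x1.T @ (x1 * w[:, None])          # sum_i w_i x_i x_i'
    xtwy = x1.T @ (w * yy)                   # sum_i w_i x_i y_i
    return np.linalg.solve(xtwx, xtwy)
```

Plugging m(Xi, β̂) with this β̂ into the doubly robust estimator then yields the proposed estimator.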
More Robust Doubly Robust Estimators 18
New perspective
• For arbitrary π(X), this estimator converges in probability to the solution to
E[ R {1 − π(X)} / π²(X) · mβ(X, β) {Y − m(X, β)} ]
= E[ π0(X) {1 − π(X)} / π²(X) · mβ(X, β) {Y − m(X, β)} ]
= E[ π0(X) {1 − π(X)} / π²(X) · mβ(X, β) {m0(X) − m(X, β)} ] = 0   (5)
• Thus, if the propensity model is correct, i.e., π(X) = π0(X), (5)
reduces to (4), and the estimator →p βopt
• And, more generally, the estimator satisfies condition 2.
More Robust Doubly Robust Estimators 19
Result
• The best estimator for β in the regression model
E(Y|X) = m(X, β) (under var(Y|X) = constant) does not
necessarily lead to the best estimator for µ
• Weighting the summand in the least squares estimating equation by
{1 − π(Xi)} / π²(Xi) may be preferable
• This holds more generally; e.g., if m(X, β) is a generalized linear model
More Robust Doubly Robust Estimators 20
Projection estimator
• The foregoing took the propensity score model to be fixed, with no
unknown parameters γ
• When γ is estimated via ML, a slight modification is necessary
• The optimal estimator for β is found by solving jointly in (β, c)
∑_{i=1}^{n} {Ri / π(Xi, γ̂)} ( [{1 − π(Xi, γ̂)} / π(Xi, γ̂)] mβ(Xi, β) ; πγ(Xi, γ̂) / {1 − π(Xi, γ̂)} ) × [ Yi − m(Xi, β) − cᵀ πγ(Xi, γ̂) / {1 − π(Xi, γ̂)} ] = 0,
where ( a ; b ) denotes the stacked vector of equations for β and c, and πγ(X, γ) = ∂π(X, γ)/∂γ
• Denote the resulting doubly robust estimator by µ̂PROJ
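A rough sketch of how this system might be solved numerically, assuming a linear outcome model and passing in pi_grad with rows πγ(Xi, γ̂); the function name, inputs, and the generic root finder are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import root

def beta_c_projection(y, r, x1, pi_hat, pi_grad):
    """Jointly solve the (beta, c) estimating equation above for a linear
    outcome model m(X, beta) = x1 @ beta; pi_grad[i] = pi_gamma(X_i, gamma_hat)."""
    p, q = x1.shape[1], pi_grad.shape[1]
    yy = np.where(r == 1, y, 0.0)                   # missing Y get zero weight anyway

    def estimating_eq(theta):
        beta, c = theta[:p], theta[p:]
        resid = yy - x1 @ beta - (pi_grad @ c) / (1 - pi_hat)
        w_beta = r * (1 - pi_hat) / pi_hat ** 2     # (R/pi) * (1 - pi)/pi
        w_c = r / (pi_hat * (1 - pi_hat))           # (R/pi) / (1 - pi)
        return np.concatenate([x1.T @ (w_beta * resid), pi_grad.T @ (w_c * resid)])

    theta_hat = root(estimating_eq, np.zeros(p + q)).x
    return theta_hat[:p], theta_hat[p:]             # (beta_hat, c_hat)
```

µ̂PROJ then plugs m(Xi, β̂) into the doubly robust estimator with π(Xi, γ̂).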
More Robust Doubly Robust Estimators 21
Simulation results revisited
Bias RMSE MCSD AveSE Cov
OR correct, PS correct
µ̂OR −0.03 1.13 1.13 1.15 0.95
µ̂DR −0.03 1.13 1.13 1.15 0.95
µ̂PROJ −0.03 1.13 1.13 1.15 0.95
OR correct, PS incorrect
µ̂OR −0.03 1.13 1.13 1.15 0.95
µ̂DR −0.03 2.07 2.07 1.32 0.95
µ̂PROJ −0.03 1.13 1.13 1.15 0.95
More Robust Doubly Robust Estimators 22
Simulation results revisited
Bias RMSE MCSD AveSE Cov
OR incorrect, PS correct
µ̂OR 2.31 2.72 1.43 1.41 0.63
µ̂DR 0.18 1.84 1.84 1.64 0.93
µ̂PROJ 0.19 1.18 1.16 1.17 0.95
OR incorrect, PS incorrect
µ̂OR 2.31 2.72 1.43 1.41 0.63
µ̂DR −17.89 179.88 178.98 22.60 0.94
µ̂PROJ 0.13 1.25 1.25 1.20 0.94
More Robust Doubly Robust Estimators 23
Remarks
Take home message:
• How one estimates β in the regression model can have substantial
impact on the performance of the corresponding doubly robust
estimator for µ, especially under model misspecification
• Details: Cao, Tsiatis, and Davidian (2009)
• A similar idea was proposed in the survey sampling literature by
Montanari (1987)
• Tan (2006) suggested a related strategy: substitute α0 + α1 m(X, β)
for m(X, β), estimate β by least squares, and then estimate (α0, α1)
by a weighted regression
• All of this generalizes to estimation of more complex quantities
More Robust Doubly Robust Estimators 24
Remarks
Causal inference on treatment effect in an observational study:
Supplementary material to Cao et al. (2009)
• Potential outcomes (Y0, Y1) for treatments (0, 1); inference on
∆ = E(Y1) − E(Y0)
• Observed data: iid (Ri, Yi, Xi), i = 1, . . . , n; treatment indicator
Ri = 0 or 1; assume Y = (1 − R)Y0 + RY1
• No unmeasured confounders: (Y0, Y1) ⊥⊥ R | X
• Propensity score model: π(X) = pr(R = 1|X)
• Estimators for ∆: for some h(X, β),
n^{-1} ∑_{i=1}^{n} [ Ri Yi / π(Xi) − (1 − Ri) Yi / {1 − π(Xi)} − {Ri − π(Xi)} h(Xi, β̂) ]
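A minimal sketch of this class of estimators for ∆, using the optimal h(X, β) given on the next slide with arm-specific linear outcome fits and a logistic propensity model; the model forms and the function name are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

def delta_hat_dr(y, t, x):
    """Doubly robust estimator of Delta = E(Y1) - E(Y0), with
    h(X, beta) = m0(X, a0)/(1 - pi) + m1(X, a1)/pi from arm-specific OLS fits."""
    x1 = sm.add_constant(x)
    pi_hat = sm.Logit(t, x1).fit(disp=0).predict(x1)   # pr(R = 1 | X)
    a1, *_ = np.linalg.lstsq(x1[t == 1], y[t == 1], rcond=None)
    a0, *_ = np.linalg.lstsq(x1[t == 0], y[t == 0], rcond=None)
    h = (x1 @ a0) / (1 - pi_hat) + (x1 @ a1) / pi_hat  # optimal h(X, beta_hat)
    return float(np.mean(t * y / pi_hat - (1 - t) * y / (1 - pi_hat)
                         - (t - pi_hat) * h))
```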
More Robust Doubly Robust Estimators 25
Remarks
n^{-1} ∑_{i=1}^{n} [ Ri Yi / π(Xi) − (1 − Ri) Yi / {1 − π(Xi)} − {Ri − π(Xi)} h(Xi, β̂) ]
• Let E(Yk|X) = m0^{(k)}(X); posit models mk(X, αk), k = 0, 1, and let
β = (α0ᵀ, α1ᵀ)ᵀ
• Optimal choice: if the models are correct,
h(X, β) = m0(X, α0) / {1 − π(X)} + m1(X, α1) / π(X)
• Can find βopt and β̂ to achieve the “best” doubly robust estimator
More Robust Doubly Robust Estimators 26
Remarks
What if we start instead from the premise that the regression model is correct?
• That is, assume m(X, β) is correct, so m(X, β0) = m0(X)
• The propensity model may or may not be correct
• The “best” estimator for µ is then µ̂OR
• However: see Tsiatis and Davidian (2007), discussion of Kang and
Schafer (2007), for speculation on choices of π(X) that yield
alternative doubly robust estimators for µ with good robustness
properties
More Robust Doubly Robust Estimators 27
Longitudinal data
Extension: Monotonely coarsened data and general parameters in a
statistical model (Tsiatis, Davidian, and Cao, 2011)
• Special case: longitudinal study with dropout
• Collect data Lj at time tj, j = 1, . . . , M + 1
• Full, ideal data: L = L̄M+1 = (L1, . . . , LM+1)
• Dropout: if a subject is last seen at time tj, the dropout indicator D = j,
and we observe only L̄j = (L1, . . . , Lj)
• Observed data: iid (Di, L̄Di), i = 1, . . . , n
• Interest: parameter µ in a semiparametric model for the full data
• Full-data estimator for µ: solve
∑_{i=1}^{n} ϕ(Li, µ) = 0,   where E{ϕ(L, µ)} = 0
More Robust Doubly Robust Estimators 28
Missing at random
MAR assumption: the probability of dropout depends on the full data only
through the data observed before dropout
• pr(D = j|L) depends only on L̄j, j = 1, . . . , M + 1
• Must have pr(D = M + 1|L) > 0 a.s.
• Fully specified dropout model: pr(D = j|L) = π(j, L̄j),
j = 1, . . . , M + 1
• Write π(M + 1, L) = π(L)
More Robust Doubly Robust Estimators 29
Estimators
Robins et al. (1994), Tsiatis (2006): If the dropout model is
correctly specified:
• All consistent and asymptotically normal estimators for µ solve
∑_{i=1}^{n} [ I(Di = M + 1) ϕ(Li, µ) / π(Li) + ∑_{j=1}^{M} {dMj(L̄j,i) / Kj(L̄j,i)} Lj(L̄j,i) ] = 0
• Arbitrary functions Lj(L̄j), j = 1, . . . , M
• λ1(L̄1) = π(1, L̄1);   λj(L̄j) = π(j, L̄j) / {1 − ∑_{k=1}^{j−1} π(k, L̄k)}, j = 2, . . . , M
• dMj(L̄j) = I(D = j) − λj(L̄j) I(D ≥ j), j = 1, . . . , M
• Kj(L̄j) = 1 − ∑_{k=1}^{j} π(k, L̄k), j = 1, . . . , M
• Optimal choice: Lj(L̄j) = E{ϕ(L, µ)|L̄j}
More Robust Doubly Robust Estimators 30
Estimators
Simplest case: M = 1, L1 = X, L2 = Y
• D = 1: observe L̄1 = X; D = 2: observe L = (X, Y) (full data)
• Estimate µ = E(Y) =⇒ ϕ(L, µ) = Y − µ
• π(1, L̄1) = pr(D = 1|X) = 1 − pr(D = 2|X);
π(2, L̄2) = π(L) = pr(D = 2|X, Y) = pr(D = 2|X) = π(X), say,
so that π(1, L̄1) = 1 − π(X)
• 1st term = I(D = 2)(Y − µ) / π(X)
• 2nd term = [ I(D = 1) − {1 − π(X)} I(D ≥ 1) ] / [ 1 − {1 − π(X)} ] · L1(X)
= −[ {I(D = 2) − π(X)} / π(X) ] L1(X)
• Optimal L1(X) = E(Y − µ|X) = E(Y|X) − µ
• I(D = 2) ⇐⇒ I(R = 1) = R
• Algebra =⇒ the same estimator for µ as before (spelled out below)
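Spelling out the algebra in the last bullet (with R = I(D = 2) and the optimal L1(X) = E(Y|X) − µ):

```latex
% Summand of the estimating equation:
\frac{R(Y-\mu)}{\pi(X)} - \frac{R-\pi(X)}{\pi(X)}\,\{E(Y\mid X)-\mu\}
  = \frac{RY}{\pi(X)} - \frac{R-\pi(X)}{\pi(X)}\,E(Y\mid X) - \mu ,
% so summing over i, setting the result to zero, and solving for mu gives
\hat{\mu} = n^{-1}\sum_{i=1}^{n}\left[\frac{R_i Y_i}{\pi(X_i)}
  - \frac{R_i-\pi(X_i)}{\pi(X_i)}\,E(Y\mid X_i)\right],
% the doubly robust estimator seen earlier, with E(Y|X) modeled by m(X, beta-hat)
```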
More Robust Doubly Robust Estimators 31
Estimators
Models for E{ϕ(L, µ)|L̄j}:
• Specify Lj(L̄j, β), j = 1, . . . , M
• Feasibility: must satisfy E{Lj+1(L̄j+1, β)|L̄j} = Lj(L̄j, β)
• Want an estimator for β that satisfies conditions analogous to
conditions 1. and 2. from before
Estimator for β: Solve
∑_{i=1}^{n} ∑_{j=1}^{M} I(Di > j) qj(L̄j,i, β) {Lj+1(L̄j+1,i, β) − Lj(L̄j,i, β)} = 0,
where LM+1(L̄M+1, β) = ϕ(L, µ)
• Appropriate weighting:
qj(L̄j, β) = −{Kj(L̄j)}^{-1} ∑_{k=1}^{j} {λk(L̄k) / Kk(L̄k)} Lk,β(L̄k, β),   where Lj,β(L̄j, β) = ∂Lj(L̄j, β)/∂β
More Robust Doubly Robust Estimators 32
Estimators
• All of this extends by analogy to the projection estimator when the
π(j, L̄j) are modeled and fitted
Simplest case: With these weights, the estimating equation becomes
∑_{i=1}^{n} I(Di = 2) [ {1 − π(Xi)} / π²(Xi) ] L1,β(Xi, β) [ Yi − {L1(Xi, β) + µ} ] = 0,
where L1(X, β) + µ plays the role of the model for E(Y|X)
• With I(D = 2) ⇐⇒ R, this is the same equation as on slide 18
• The reasoning extends to M > 1
Simulation studies: Tsiatis et al. (2011)
• The proposed estimator outperformed a competing estimator of
Bang and Robins (2005) that does not attempt to “optimize” the
method of fitting the models Lj(L̄j, β)
More Robust Doubly Robust Estimators 33
Discussion
• Not all doubly robust estimators are the same
• Key: how one estimates β in the regression model matters
• The proposed estimators are designed to equal or exceed the
efficiency of the “usual” doubly robust estimators when the missingness
mechanism is correctly modeled
• Empirical evidence suggests that using an estimator for β that
minimizes the variance in this case endows the estimator for µ with
robustness to model misspecification
• . . . and serves to counteract instability of estimation due to very
small estimated π(X, γ̂)
• This idea can be adapted to other contexts
More Robust Doubly Robust Estimators 34
References
Bang, H. and Robins, J.M. (2005). Doubly robust estimation in missing data and
causal inference models. Biometrics 61, 962–972.
Cao, W., Tsiatis, A.A. and Davidian, M. (2009). Improving efficiency and
robustness of the doubly robust estimator for a population mean with
incomplete data. Biometrika 96, 723–734.
Kang, J.D.Y. and Schafer, J.L. (2007). Demystifying double robustness: a
comparison of alternative strategies for estimating a population mean from
incomplete data (with discussion and rejoinder). Statistical Science 22, 523–580.
Montanari, G.E. (1987). Post-sampling efficient QR-prediction in large-sample
surveys. International Statistical Review 55, 191–202.
Robins, J.M., Rotnitzky, A., and Zhao, L.P. (1994). Estimation of regression
coefficients when some regressors are not always observed. Journal of the
American Statistical Association 89, 846–866.
Tan, Z. (2006). A distributional approach for causal inference using propensity
scores. Journal of the American Statistical Association 101, 1619–1637.
Tan, Z. (2007). Understanding OR, PS, and DR. Statistical Science 22, 560–568.
More Robust Doubly Robust Estimators 35
References
Tan, Z. (2008). Comment: Improved local efficiency and double-robustness. The
International Journal of Biostatistics 4, Issue 1, Article 10.
Tsiatis, A.A. (2006). Semiparametric Theory and Missing Data. New York:
Springer.
Tsiatis, A.A., Davidian, M. and Cao, W. (2011). Improved doubly robust estimation
when the data are monotonely coarsened, with application to longitudinal
studies with dropout. Biometrics 67, 536–545.
More Robust Doubly Robust Estimators 36