A REVIEW
By
Chi-Ming Kam
Surajit Ray
April 23, 2001
Imputation Techniques Implemented in SOLAS 3.0
SINGLE IMPUTATION
Hot Decking
Predicted Mean Imputation
Last Value Carried Forward
MULTIPLE IMPUTATIONS
Propensity Score Based Imputation
Predictive Model Based Imputation
Method 1: Propensity Score Based Imputation
This was the only method in Version 1.
The method is similar to Lavori, Dawson and Shera (1995), “A multiple imputation strategy for clinical trials with truncation of patient data”.
GOAL: To impute missing values with minimal distributional assumptions.
How it Works
Let R be the indicator of the missingness pattern for Y (R = 1 if Y is observed, R = 0 if Y is missing).
[Figure: data layout with columns X1, X2, ..., XP, Y and R; Y is observed (R = 1) for some cases and missing (?, R = 0) for the others.]
Model R from X1, X2, ..., XP using logistic regression.
This gives p = Prob(R = 1 | X1, X2, ..., XP) for each case, yielding N propensity scores p_i.
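A minimal sketch of this propensity step in Python, assuming the covariates are fully observed and stored as plain lists. The names `solve` and `propensity_scores` are hypothetical, and the Newton-Raphson fit is a generic stand-in for whatever fitting routine SOLAS uses; it does not guard against perfect separation.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def propensity_scores(X, R, max_iter=50, tol=1e-10):
    """Fit logit P(R = 1 | X) = b0 + b1*x1 + ... + bp*xp by Newton-Raphson.

    X: list of covariate rows [x1, ..., xp]; R: list of 0/1 flags (1 = Y observed).
    Returns (coefficients [b0, ..., bp], fitted probabilities p_i for each case).
    """
    Z = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(Z[0])
    beta = [0.0] * k

    def fitted(coefs):
        return [1.0 / (1.0 + math.exp(-sum(b * z for b, z in zip(coefs, row)))) for row in Z]

    for _ in range(max_iter):
        p = fitted(beta)
        # Score vector X'(R - p) and information matrix X'WX with W = diag(p(1 - p)).
        score = [sum((ri - pi) * row[j] for row, ri, pi in zip(Z, R, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(Z, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, fitted(beta)
```

Because the model has an intercept, the fitted p_i sum to the number of observed cases at convergence, which is a quick sanity check on the fit.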
How it works... (Approximate Bayesian bootstrap, Rubin, 1987)
Group the units by the quintiles of p (the grouping is user specified).
Suppose that within a particular group there are n1 observed and n0 missing values.
Sample n1 + n0 units with replacement from the n1 observed values.
From the sampled pool, subsample n0 units with replacement.
Use these n0 units as the imputed values for the n0 missing values.
Repeat the procedure m times to get m imputations.
[Diagram: n1 observed values → (with replacement) n1 + n0 values → (with replacement) n0 imputed values]
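The grouping and the two resampling stages can be sketched as follows. This is an illustration, not SOLAS code: `quantile_groups` and `abb_impute` are hypothetical names, groups are formed from the ranks of p, and the first-stage pool size follows the slide (n1 + n0).

```python
import random


def quantile_groups(p, n_groups=5):
    """Assign each case a group label 0..n_groups-1 by the rank of its propensity score."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    groups = [0] * len(p)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(p)
    return groups


def abb_impute(y, r, groups, m=5, seed=None):
    """Approximate Bayesian bootstrap within each group.

    y: outcome values (entries with r == 0 are ignored); r: 1 = observed, 0 = missing.
    Returns a list of m completed copies of y.
    """
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        y_imp = list(y)
        for g in sorted(set(groups)):
            obs = [y[i] for i in range(len(y)) if groups[i] == g and r[i] == 1]
            mis = [i for i in range(len(y)) if groups[i] == g and r[i] == 0]
            if not mis:
                continue
            if not obs:
                raise ValueError(f"group {g} has no observed donors")
            # Stage 1: draw n1 + n0 values with replacement from the n1 observed values.
            pool = rng.choices(obs, k=len(obs) + len(mis))
            # Stage 2: draw n0 values with replacement from that pool; use them as imputations.
            for i, value in zip(mis, rng.choices(pool, k=len(mis))):
                y_imp[i] = value
        completed.append(y_imp)
    return completed
```

Every imputed value is a copy of an observed value from the same propensity group, so the method makes no distributional assumption about Y.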
Theoretical Justification
It produces an imputed distribution of Y that has been corrected for biases due to missingness related to X.
It is similar in spirit to reweighting, but here we have a multiple imputation version of it.
The method produces unbiased estimates of the marginal distribution of Y, provided that missingness depends only on X.
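One way to see the link to reweighting: each imputed value in a group is, marginally, a uniform draw from that group's observed values, so the completed-data mean of Y equals, in expectation, the group-size-weighted average of the observed group means. The sketch below computes that weighted mean directly on a small made-up example; the function names and the data are hypothetical.

```python
def complete_case_mean(y, r):
    """Mean of the observed Y values only (r == 1); ignores the missing cases."""
    observed = [v for v, ri in zip(y, r) if ri == 1]
    return sum(observed) / len(observed)


def stratified_mean(y, r, groups):
    """Weight each group's observed mean of Y by the group's share of all N units."""
    n = len(y)
    total = 0.0
    for g in sorted(set(groups)):
        members = [i for i in range(n) if groups[i] == g]
        observed = [y[i] for i in members if r[i] == 1]
        total += len(members) / n * (sum(observed) / len(observed))
    return total


# Hypothetical data: the low-propensity group (0) has low Y and more missing values.
y = [2.0, 2.0, None, None, 8.0, 8.0, 8.0, None]
r = [1, 1, 0, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(complete_case_mean(y, r))       # 5.6
print(stratified_mean(y, r, groups))  # 5.0
```

The complete-case mean over-represents the high-propensity group; weighting by group size removes that imbalance, which is what the imputation achieves on average.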
Problems/Drawbacks
The method does not preserve the association between Y and the individual Xi's.
Reasoning: The only aspect of the Xi's that is used here is the linear predictor $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$ in the logistic model. This function predicts the missingness of Y (R), not Y itself.
Problems/Drawbacks (Continued...)
Suppose X1 is highly correlated with Y but is unrelated to P(R = 1). X1 will drop out of the logistic model, so it is not used in the imputation. As a result, the imputed data will misrepresent the correlation between X1 and Y.
Also, by not using X1 in the imputation, we fail to impute Y efficiently.
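A small simulation can illustrate this drawback. The sketch assumes hypothetical data: X1 and X2 independent standard normal, Y = X1 + noise, and R depending on X2 only through a known logistic propensity (so the fitting step is skipped). It then applies quintile grouping and the approximate Bayesian bootstrap and compares corr(X1, Y) in the full data with the correlation among the imputed cases.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(n=4000, n_groups=5, seed=0):
    """Return (corr(X1, Y) in the full data, corr(X1, imputed Y) among missing cases)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + rng.gauss(0, 1) for a in x1]    # Y depends on X1 only
    p = [1 / (1 + math.exp(-b)) for b in x2]  # P(R = 1) depends on X2 only
    r = [1 if rng.random() < pi else 0 for pi in p]

    # Quintile groups of the propensity score (rank based).
    order = sorted(range(n), key=lambda i: p[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // n

    # Approximate Bayesian bootstrap within each group.
    y_imp = list(y)
    for g in range(n_groups):
        obs = [y[i] for i in range(n) if groups[i] == g and r[i] == 1]
        mis = [i for i in range(n) if groups[i] == g and r[i] == 0]
        pool = rng.choices(obs, k=len(obs) + len(mis))
        for i, value in zip(mis, rng.choices(pool, k=len(mis))):
            y_imp[i] = value

    missing = [i for i in range(n) if r[i] == 0]
    return (corr(x1, y),
            corr([x1[i] for i in missing], [y_imp[i] for i in missing]))


full_corr, imputed_corr = simulate()
print(round(full_corr, 2), round(imputed_corr, 2))
```

Under these assumptions the first value should be near 0.7 (the population value is 1/√2), while the second should be close to 0, because donors are chosen without regard to the recipient's X1.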
Simulation Results Using SOLAS 1.1
Data Generation Mechanism:
$Y = X + Z + \varepsilon$, where $\varepsilon \sim N(0, 1)$
Source: Paul D. Allison, “Multiple Imputation for Missing Data: A Cautionary Tale”
Columns: (1) ordinary least squares on the original data; (2) listwise deletion; (3) multiple imputation with SOLAS; (4) multiple imputation with data augmentation.

Missing data mechanism              Coef.  (1)           (2)           (3)           (4)
Missing completely at random        X      0.979 (.011)  0.969 (.016)  1.141 (.016)  0.976 (.012)
                                    Z      1.014 (.012)  1.029 (.017)  0.667 (.020)  1.028 (.016)
Missing at random (dependent on X)  X      1.012 (.012)  0.986 (.025)  1.470 (.013)  1.005 (.025)
                                    Z      1.007 (.012)  1.011 (.017)  0.448 (.015)  0.997 (.016)
Missing at random (dependent on Y)  X      0.993 (.012)  0.695 (.015)  1.350 (.013)  0.985 (.021)
                                    Z      1.001 (.012)  0.708 (.015)  0.746 (.023)  0.997 (.013)
Nonignorable (dependent on Z)       X      1.003 (.012)  0.995 (.016)  1.250 (.013)  1.154 (.015)
                                    Z      1.002 (.012)  1.007 (.024)  1.215 (.027)  1.245 (.020)
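A sketch of the data-generation step and of column (1), OLS on the original data. The slide's distributional details for X and Z did not survive, so this assumes X and Z are independent standard normal; with that assumption both OLS coefficients should come out close to 1. The function names are hypothetical, and the missing-data mechanisms behind columns (2) to (4) are not reproduced.

```python
import random


def generate(n=10000, seed=0):
    """Draw (x, z, y) with y = x + z + eps; x, z, eps independent N(0, 1) (assumption)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x, z)]
    return x, z, y


def ols_two(x, z, y):
    """OLS slopes of y on x and z (with intercept), via the centred normal equations."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    szy = sum((b - mz) * (c - my) for b, c in zip(z, y))
    det = sxx * szz - sxz ** 2
    # Cramer's rule for [[sxx, sxz], [sxz, szz]] @ [b_x, b_z] = [sxy, szy]
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


b_x, b_z = ols_two(*generate())
print(round(b_x, 3), round(b_z, 3))
```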
Some Comments About the Propensity Score Based Method
The method can provide valid but possibly inefficient inferences about the marginal distribution of Y.
The method can lead to very misleading inferences about the relationships between Y and other variables.
Method 2: Predictive Model Based Multiple Imputation
This method is implemented in SOLAS 2.0 and 3.0.
HOW IT WORKS:
Regress Y on X1, X2, ..., Xp.
Get the estimates of $\beta_0, \beta_1, \beta_2, \dots, \beta_p$ and $\sigma$.
Draw $\beta_0^*, \beta_1^*, \beta_2^*, \dots, \beta_p^*, \sigma^*$ from an approximate posterior distribution.
Impute $Y^* = \beta_0^* + \beta_1^* X_1 + \beta_2^* X_2 + \dots + \beta_p^* X_p + \varepsilon^*$, where $\varepsilon^* \sim N(0, \sigma^{*2})$.
Repeat m times to get the m imputed datasets.
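A minimal single-covariate sketch of these steps, assuming the usual noninformative prior for the normal linear model: σ*² = RSS / χ²(n - 2), then (β0*, β1*) drawn from a normal centred at the OLS estimates with covariance σ*²(X'X)⁻¹. The exact posterior approximation SOLAS uses is not described here, so treat `regression_impute` (a hypothetical name) as an illustration rather than the package's implementation.

```python
import math
import random


def regression_impute(x, y, r, m=5, seed=None):
    """Impute y where r == 0 from a normal linear regression of y on x, with parameter draws.

    x is fully observed; y values with r == 0 are ignored. Returns m completed copies of y.
    """
    rng = random.Random(seed)
    xo = [xi for xi, ri in zip(x, r) if ri == 1]
    yo = [yi for yi, ri in zip(y, r) if ri == 1]
    n = len(yo)
    xbar, ybar = sum(xo) / n, sum(yo) / n
    sxx = sum((a - xbar) ** 2 for a in xo)
    b1 = sum((a - xbar) * (c - ybar) for a, c in zip(xo, yo)) / sxx
    # With x centred, the intercept estimate is ybar and is uncorrelated with b1.
    rss = sum((c - ybar - b1 * (a - xbar)) ** 2 for a, c in zip(xo, yo))
    completed = []
    for _ in range(m):
        # sigma*^2 = RSS / chi^2_{n-2}; a chi-square(k) draw is Gamma(shape k/2, scale 2).
        sigma = math.sqrt(rss / rng.gammavariate((n - 2) / 2, 2.0))
        # beta* ~ N(beta_hat, sigma*^2 (X'X)^{-1}), diagonal in the centred parametrization.
        a_star = rng.gauss(ybar, sigma / math.sqrt(n))
        b1_star = rng.gauss(b1, sigma / math.sqrt(sxx))
        b0_star = a_star - b1_star * xbar  # back to the uncentred intercept beta0*
        # Y* = beta0* + beta1* x + eps*, eps* ~ N(0, sigma*^2); observed values kept as is.
        y_imp = [yi if ri == 1 else b0_star + b1_star * xi + rng.gauss(0, sigma)
                 for xi, yi, ri in zip(x, y, r)]
        completed.append(y_imp)
    return completed
```

Drawing new parameters for each of the m datasets, rather than reusing the point estimates, is what lets the between-imputation variance reflect uncertainty about the regression itself.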
Good points
The method provides correct model-based MI under the regression model and MAR.
It also preserves the correlation between the Xi's and Y.
What is the difference from NORM?
NORM (Schafer's software) does the same thing using MCMC (data augmentation).
Under the multivariate normal model, both methods should give essentially the same results.
Which Software is More General?
“I work for arbitrary missingness patterns.”
“I work for non-linear relations of Y on X.”
“But that's probably very similar to NORM with rounding.”
Concluding Remarks
SOLAS is the first commercial missing data software package.
It has a good graphical interface.
Data import and export to other software packages is easy.
It performs well under a monotone missingness pattern.
Its estimates are not always unbiased.
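For reference, a missingness pattern is monotone when the variables can be ordered so that whenever a variable is missing for a case, every later variable is missing too. A small hypothetical helper, `is_monotone`, that checks a pattern of observed/missing flags:

```python
def is_monotone(observed):
    """Return True if the missingness pattern is monotone.

    observed: list of rows, each a list of booleans (True = value observed).
    """
    if not observed:
        return True
    k = len(observed[0])
    # Order variables from most to least observed; if any monotone ordering
    # exists, this ordering is monotone too (tied columns then have identical patterns).
    order = sorted(range(k), key=lambda j: -sum(row[j] for row in observed))
    for row in observed:
        seen_missing = False
        for j in order:
            if not row[j]:
                seen_missing = True
            elif seen_missing:
                return False  # a variable is observed after an earlier one went missing
    return True
```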