Using Modern Missing Data Analyses for effective inference about Hunters’ satisfaction towards OFW Program

Muhammad Imran Khan

Motivation of Study

• Hunting & fishing are part of Nebraska's heritage

• NGPC (Nebraska Game & Parks Commission) is interested in improving hunter/angler recruitment & retention (NGPC, 2008)

• Data were collected in 2013 on hunters’ motivations & satisfaction with Open Fields and Waters (OFW) lands

• The purpose of this study is to compare estimates obtained with appropriate imputation methods


Missing Data

• Missingness in surveys (Groves et al., 2004)
– Noncoverage
– Unit nonresponse
– Item nonresponse
– Partial nonresponse (Brick & Kalton, 1996)
– Data entry error (Anne & Andrea, 2014)

• Missing data mechanisms (Buuren, 2012)
– Missing Completely At Random (MCAR)
– Missing At Random (MAR)
– Missing Not At Random (MNAR)


How much missing data is “problematic”?

• Researchers propose different thresholds:
– > 5% (Schafer, 1999)
– > 10% (Bennett, 2001)
– > 20% (Peng et al., 2006)
– Widaman (2006) specified the following scale:
o 1%-2% (Negligible)
o 5%-10% (Minor)
o 10%-25% (Moderate)
o 25%-50% (High)
o >50% (Excessive)

• Important problems caused by missingness (Bell & Fairclough, 2013)
– Decreased precision
– Increased bias in parameter estimates


NGPC & UNL conducted survey

• Sampling frame: hunters who purchased a hunting license for hunting in 2012 in NE
– The survey contained three parts:
o Where & what they hunt; environmental impact
o Motivations (Relatedness, Competence, Autonomy)
o Socio-demographic factors

• About the collected data
– Total questions = 42 (19 questions used for analysis)
– Sample size = 8181
– Completely filled = 1555 (19%)
– Unit nonresponse = 627 (8%)
– Item nonresponse = 5999 (73%)
o Between 1 and 8 missing items per respondent across the 19 questions
– Unit and item nonresponse together = 81%
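The response breakdown can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the counts reported on this slide; the variable names are illustrative.

```python
# Counts reported for the survey sample.
sample_size = 8181
counts = {
    "completely filled": 1555,
    "unit nonresponse": 627,
    "item nonresponse": 5999,
}

# Share of the sample in each category, rounded to a whole percent.
percent = {label: round(100 * n / sample_size) for label, n in counts.items()}

# Cases with any nonresponse (unit or item).
incomplete = counts["unit nonresponse"] + counts["item nonresponse"]
percent_incomplete = round(100 * incomplete / sample_size)

for label, value in percent.items():
    print(f"{label}: {value}%")
print(f"any nonresponse: {percent_incomplete}%")
```

The three categories add up to the full sample, so the 81% figure is simply the unit and item nonresponse shares taken together.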

Determining Type of Missing Data

• Test for MCAR (Little, 1988)
– Little’s test of MCAR (omnibus test of all specified variables)
o If the test is not significant, the data can be assumed MCAR
o If the test is significant, the data may be MAR or MNAR

• For the given data the test is significant, so MCAR is rejected & the data are treated as MAR
– Test statistic = 3256.783

• Number & proportion missing per variable

Variable    N missing   Proportion missing
Satisf.     5171        0.685
Rel_1       332         0.04
Rel_2       332         0.04
Comp.       345         0.046
Auto.       397         0.053
H_Days      5096        0.675
“Harvest”   0           0
Educ.       1088        0.144
Income      1465        0.194
Age         1263        0.167
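The proportions in the table are not computed over the full sample of 8181. A short check, sketched below, suggests they use the 7554 cases that remain after removing the 627 unit nonrespondents. That denominator is an inference from the numbers, not something the slides state, and the dictionary keys are illustrative.

```python
sample_size = 8181
unit_nonresponse = 627

# Assumed denominator: cases that returned the questionnaire at all.
respondents = sample_size - unit_nonresponse

# Missing counts per variable, as reported in the table.
missing_counts = {
    "Satisf.": 5171, "Rel_1": 332, "Rel_2": 332, "Comp.": 345, "Auto.": 397,
    "H_Days": 5096, "Harvest": 0, "Educ.": 1088, "Income": 1465, "Age": 1263,
}

# Proportion missing per variable under the assumed denominator.
missing_share = {name: n / respondents for name, n in missing_counts.items()}

for name, share in missing_share.items():
    print(f"{name:<8} {share:.3f}")
```

Under this assumption the three-decimal proportions in the table are reproduced; Rel_1 and Rel_2 are shown with only two decimals on the slide.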

Data used for analysis

• 13 questions on motivation based on SDT
– 5 questions on relatedness transformed to 2 factors
– 4 questions each on competence & autonomy, each set transformed to 1 factor

Model used for the analysis

Satisfaction = Rel_1 + Rel_2 + Comp + Auto + Educ + Age + Income + H_Days + Harvest

Variable: Description of the variable [measured on 7-point Likert scale]

Satisfaction: How satisfied were you with your experience on private lands enrolled in the Open Fields and Waters (OFW)?

Relatedness_1: I enjoy mentoring other hunters

Relatedness_2: I go hunting primarily to spend time with others & people I care about

Competence: Overall, hunting makes me feel competent in other areas of my life

Autonomy: Hunting helps me to feel independent, self-sufficient and more in control in life

Education: Highest level of education that you have completed (<HS; HS; S.C; C; ≥G)

Age: Age (approximately, in years)

Income: Total annual income for your household before taxes (8 different levels)

Hunting_Days: Visiting OFW sites allowed me to increase total days I spent hunting

“Harvest”: If you hunted in 2012 on an OFW site, did you harvest? (Yes/No)

Methods for Handling Missing Data

• Deletion or non-imputing methods:
o List-wise Deletion (Pigott, 2001)
o Pair-wise Deletion (Bennett, 2001)

• Nonstochastic or ad-hoc methods:
o Mean Imputation (Graham, 2003)
o Regression Imputation (Qin et al., 2007)

• Stochastic or established methods:
o Stochastic Regression (Todd et al., 2013)
o Multiple Imputation (MI) (John et al., 2007)
o Full Information Maximum Likelihood (FIML)
o Expectation Maximization (EM) (Yiran & Chao-Ying, 2013)
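The difference between the nonstochastic and stochastic families can be seen by comparing regression imputation with stochastic regression imputation. The sketch below is a toy illustration with one predictor and made-up data; the function names `fit_line` and `regression_impute` are hypothetical and this is not the procedure used for the survey analysis.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return intercept, slope, resid_sd

def regression_impute(x, y, stochastic=False, seed=0):
    """Fill missing y (None) from a regression on x fitted to complete cases."""
    rng = random.Random(seed)
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, sd = fit_line([c[0] for c in complete], [c[1] for c in complete])
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            yi = a + b * xi              # deterministic prediction
            if stochastic:
                yi += rng.gauss(0.0, sd)  # add residual noise
        filled.append(yi)
    return filled

# Made-up data: y is missing for two cases.
x_toy = [1, 2, 3, 4, 5, 6]
y_toy = [1.2, 1.9, None, 4.3, None, 5.8]

deterministic = regression_impute(x_toy, y_toy)
noisy = regression_impute(x_toy, y_toy, stochastic=True, seed=1)
print(deterministic)
print(noisy)
```

Deterministic regression imputation places every filled value exactly on the fitted line, which overstates the strength of the relationship; the stochastic version restores residual scatter by adding a random draw.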

Mean Imputation
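Mean imputation replaces each missing value with the mean of the observed values of that variable. A minimal sketch with made-up scores shows the main side effect: the mean is preserved but the variance shrinks. The function name `mean_impute` is hypothetical.

```python
from statistics import mean, variance

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed_values = [v for v in values if v is not None]
    fill = mean(observed_values)
    return [fill if v is None else v for v in values]

# Made-up Likert-type scores with two missing answers.
scores = [4, 6, None, 5, None, 7]
observed = [v for v in scores if v is not None]
imputed = mean_impute(scores)

print("mean observed / imputed:", mean(observed), mean(imputed))
print("variance observed / imputed:", variance(observed), variance(imputed))
```

Because the filled values add no spread while the sample size grows, standard errors computed from mean-imputed data tend to be understated.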

Comparing Results

Fitted model: List-wise Deletion vs. Mean Imputation

                List-wise Deletion             Mean Imputation
Variable        Estimate  Std. Err.  p-value   Estimate  Std. Err.  p-value
Intercept        0.415    0.205      0.043      0.381    0.062      0.000
Relatedness_1   -0.023    0.040      0.565     -0.005    0.010      0.614
Relatedness_2    0.038    0.045      0.401      0.017    0.011      0.120
Competence       0.147    0.079      0.062      0.023    0.019      0.227
Autonomy         0.049    0.075      0.514      0.009    0.018      0.619
Education       -0.045    0.039      0.241     -0.011    0.010      0.296
Age             -0.001    0.003      0.682      0.000    0.001      0.563
Income           0.003    0.022      0.903      0.002    0.006      0.754
Hunting_Days     0.135    0.017      0.000      0.162    0.007      0.000
“Harvest”        0.569    0.077      0.000      0.364    0.028      0.000

List-wise Deletion: 5999 cases (rows) are deleted. Mean Imputation: m=1, maxit=1.

Multiple Imputation

Comparing Results

Fitted model: List-wise Deletion vs. Mean Imputation vs. Multiple Imputation

                List-wise Deletion             Mean Imputation                Multiple Imputation
Variable        Estimate  Std. Err.  p-value   Estimate  Std. Err.  p-value   Estimate  Std. Err.  p-value
Intercept        0.415    0.205      0.043      0.381    0.062      0.000      0.316    0.183      0.093
Relatedness_1   -0.023    0.040      0.565     -0.005    0.010      0.614     -0.019    0.037      0.605
Relatedness_2    0.038    0.045      0.401      0.017    0.011      0.120      0.048    0.037      0.205
Competence       0.147    0.079      0.062      0.023    0.019      0.227      0.097    0.077      0.219
Autonomy         0.049    0.075      0.514      0.009    0.018      0.619      0.017    0.061      0.787
Education       -0.045    0.039      0.241     -0.011    0.010      0.296     -0.032    0.027      0.245
Age             -0.001    0.003      0.682      0.000    0.001      0.563     -0.001    0.002      0.731
Income           0.003    0.022      0.903      0.002    0.006      0.754      0.007    0.022      0.761
Hunting_Days     0.135    0.017      0.000      0.162    0.007      0.000      0.152    0.013      0.000
“Harvest”        0.569    0.077      0.000      0.364    0.028      0.000      0.575    0.060      0.000

List-wise Deletion: 5999 cases (rows) are deleted. Mean Imputation: m=1, maxit=1. Multiple Imputation: m=20, maxit=10.
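With m = 20 imputed data sets, the model is fitted 20 times and the results have to be combined into one estimate and one standard error per coefficient. The slides do not say how this was done; the sketch below assumes the standard approach, Rubin's rules, and uses made-up numbers for a single coefficient. The function name `pool_rubin` is hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed data sets with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total ** 0.5

# Made-up results for one coefficient from m = 3 imputed data sets.
toy_estimates = [0.50, 0.60, 0.55]
toy_std_errors = [0.10, 0.10, 0.10]

pooled_estimate, pooled_se = pool_rubin(toy_estimates, toy_std_errors)
print(pooled_estimate, pooled_se)
```

The pooled standard error is larger than the per-data-set standard error because the between-imputation term carries the extra uncertainty due to the missing values, which single imputation ignores.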

Comparing Results

Fitted model: List-wise Deletion vs. Full Information Maximum Likelihood (FIML) Imputation vs. Expectation Maximization (EM) Imputation

                List-wise Deletion             FIML Imputation                EM Imputation
Variable        Estimate  Std. Err.  p-value   Estimate  Std. Err.  p-value   Estimate  Std. Err.  p-value
Intercept        0.415    0.205      0.043      0.309    0.185      0.096      0.301    0.155      0.053
Relatedness_1   -0.023    0.040      0.565     -0.012    0.032      0.713     -0.010    0.034      0.781
Relatedness_2    0.038    0.045      0.401      0.061    0.036      0.089      0.061    0.034      0.076
Competence       0.147    0.079      0.062      0.102    0.065      0.116      0.106    0.065      0.106
Autonomy         0.049    0.075      0.514      0.016    0.062      0.798      0.013    0.062      0.839
Education       -0.045    0.039      0.241     -0.034    0.034      0.319     -0.030    0.033      0.359
Age             -0.001    0.003      0.682     -0.001    0.002      0.779      0.005    0.020      0.803
Income           0.003    0.022      0.903      0.006    0.020      0.766     -0.001    0.002      0.752
Hunting_Days     0.135    0.017      0.000      0.148    0.014      0.000      0.148    0.015      0.000
“Harvest”        0.569    0.077      0.000      0.599    0.062      0.000      0.598    0.060      0.000

List-wise Deletion: 5999 cases (rows) are deleted. EM algorithm (MLE) converges in 37 iterations.
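The iteration count reported for EM reflects how the algorithm works: it alternates between filling in expected values for the missing data and re-estimating the parameters until they stop changing. The sketch below is a toy version for two variables, assuming bivariate normality with x fully observed and y partly missing; the data are made up and the function name `em_bivariate` is hypothetical. It does not reproduce the software or the model used for the survey.

```python
def em_bivariate(x, y, tol=1e-10, max_iter=1000):
    """EM estimates of mean(y), cov(x, y), var(y) when some y are None."""
    n = len(x)
    mean_x = sum(x) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    observed = [yi for yi in y if yi is not None]
    # Starting values from the observed y; covariance starts at zero.
    mean_y = sum(observed) / len(observed)
    var_y = sum((yi - mean_y) ** 2 for yi in observed) / len(observed)
    cov_xy = 0.0
    for iteration in range(1, max_iter + 1):
        slope = cov_xy / var_x
        resid_var = var_y - cov_xy ** 2 / var_x
        sum_y = sum_xy = sum_yy = 0.0
        # E-step: expected y and y^2 for each case given current parameters.
        for xi, yi in zip(x, y):
            if yi is None:
                exp_y = mean_y + slope * (xi - mean_x)
                exp_yy = exp_y ** 2 + resid_var
            else:
                exp_y, exp_yy = yi, yi ** 2
            sum_y += exp_y
            sum_xy += xi * exp_y
            sum_yy += exp_yy
        # M-step: update the parameters from the expected sufficient statistics.
        new_mean_y = sum_y / n
        new_cov_xy = sum_xy / n - mean_x * new_mean_y
        new_var_y = sum_yy / n - new_mean_y ** 2
        change = max(abs(new_mean_y - mean_y), abs(new_cov_xy - cov_xy),
                     abs(new_var_y - var_y))
        mean_y, cov_xy, var_y = new_mean_y, new_cov_xy, new_var_y
        if change < tol:
            break
    return mean_y, cov_xy, var_y, iteration

# Made-up data: y is missing for three of eight cases.
x_em = [1, 2, 3, 4, 5, 6, 7, 8]
y_em = [2.1, 3.9, None, 8.2, None, 12.1, 13.8, None]

em_mean_y, em_cov_xy, em_var_y, em_iterations = em_bivariate(x_em, y_em)
print(em_mean_y, em_cov_xy, em_var_y, em_iterations)
```

In this simple pattern the EM result for the mean of y agrees with the closed-form estimate obtained by regressing y on x in the complete cases and evaluating the fitted line at the mean of all x values, which gives a convenient check on convergence.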

Summary

• Only EM shows Relatedness_2 as significant
• EM gives the smallest standard error for Income
• Comparison of imputation methods: percentage of the 10 variables for which the value is smaller than under List-wise Deletion

Approach                              Estimates  Std. Err.  p-value  Suggestion
List-wise Deletion                    Base       Base       Base     Avoid
Mean Imputation                       60%        100%       40%      Use with care
Multiple Imputation                   30%        100%       20%      Better
Full Information Maximum Likelihood   30%        100%       20%      Better
Expectation Maximization              40%        90%        20%      Preferred if converged
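The percentages in the summary can be traced back to the comparison tables. The sketch below recomputes the Mean Imputation row from the rounded table values, assuming a strict, signed "smaller than" comparison against List-wise Deletion for each of the 10 rows. The slides do not define the comparison, so this rule is an assumption, and the names are illustrative.

```python
# Rounded values from the comparison tables, in row order:
# Intercept, Relatedness_1, Relatedness_2, Competence, Autonomy,
# Education, Age, Income, Hunting_Days, Harvest.
listwise = {
    "estimate": [0.415, -0.023, 0.038, 0.147, 0.049, -0.045, -0.001, 0.003, 0.135, 0.569],
    "std_err": [0.205, 0.040, 0.045, 0.079, 0.075, 0.039, 0.003, 0.022, 0.017, 0.077],
    "p_value": [0.043, 0.565, 0.401, 0.062, 0.514, 0.241, 0.682, 0.903, 0.000, 0.000],
}
mean_imputation = {
    "estimate": [0.381, -0.005, 0.017, 0.023, 0.009, -0.011, 0.000, 0.002, 0.162, 0.364],
    "std_err": [0.062, 0.010, 0.011, 0.019, 0.018, 0.010, 0.001, 0.006, 0.007, 0.028],
    "p_value": [0.000, 0.614, 0.120, 0.227, 0.619, 0.296, 0.563, 0.754, 0.000, 0.000],
}

def percent_smaller(candidate, baseline):
    """Percentage of rows where the candidate value is strictly smaller."""
    smaller = sum(c < b for c, b in zip(candidate, baseline))
    return 100 * smaller // len(baseline)

mean_row = {column: percent_smaller(mean_imputation[column], listwise[column])
            for column in listwise}
print(mean_row)
```

Because the tables are rounded to three decimals, ties between rounded values can make individual cells for other methods differ from the slide when recomputed this way.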

Thanks for your kind attention

Special thanks to:
Dr. Andrew Tyre, University of Nebraska, Lincoln
Dr. Lisa Pennisi, University of Nebraska, Lincoln
Dr. Allan McCutcheon, University of Nebraska, Lincoln

Nebraska Game & Parks Commission

References

Anne-Kathrin, F., & Andrea, B. (2014). The economic performance of Swiss drinking water utilities. Journal of Productivity Analysis, 41, 383-397. doi:10.1007/s11123-013-0344-0

Bell, M. L., & Fairclough, D. L. (2013). Practical and statistical issues in missing data for longitudinal patient reported outcomes. Statistical Methods in Medical Research, 0(0), 1-20. doi:10.1177/0962280213476378

Bennett, D. A. (2001). How can I deal with missing data in my study? Australian and New Zealand Journal of Public Health, 25, 464-469.

Brick, J., & Kalton, J. (1996). Handling missing data in survey research. Statistical Methods in Medical Research, 5, 215-238. doi:10.1177/096228029600500302

Buuren, S. V. (2012). Flexible imputation of missing data. Taylor & Francis, FL: CRC Press.

John, W. G., Allison, E. O., & Tamika, D. G. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Springer, 8, 206-213.

Graham, J. W. (2003). Adding missing-data-relevant variables to FIML based structural equation models. Structural Equation Modeling, 10, 80-100.

Groves, R., Fowler, F., Couper, M., Lepkowski, J., Singer, E., & Tourangeau, R. (2004). Survey methodology. Hoboken, NJ: John Wiley.

Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202.

NGPC (2008). Nebraska 20 year hunter/angler recruitment, development and retention plan. Lincoln, NE.

Pigott, T. D. (2001). A review of methods for missing data. Educational Research and Evaluation, 7(4), 353-383.

Peng, C. Y., Harwell, M., Liou, S. M., & Ehman, L. H. (2006). Advances in missing data methods and implications for educational research. In S. Sawilowsky (Ed.), Real data analysis (pp. 31-78). Greenwich, CT: Information Age.

Qin, Y., Zhang, S., Zhu, X., Zang, J., & Zhang, C. (2007). Semi-parametric optimization for missing data imputation. Applied Intelligence, 27, 79-88. doi:10.1007/s10489-006-0032-0

Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3-15.

Todd, D. L., Terrence, D. J., Kyle, M. L., & Whitney, M. (2013). On the joys of missing data. Journal of Pediatric Psychology, 1-12. doi:10.1093/jpepsy/jst048

Yiran, D., & Chao-Ying, J. P. (2013). Principled missing data methods for researchers. Springer, 2, 222.

Questions & comments are most welcome!

Contact Information: mik3.stat@gmail.com