1G89.2229 Lect 13M
• Why might data be missing in psychological studies?
• Missing data patterns
• Overview of statistical approaches
• Example
G89.2229 Multiple Regression Week 13 (Monday)
What leads to missing data?
• Experimental studies
» equipment failure
» experimenter error
» subject noncompliance
» subject dropout
» data entry error
» SOLUTION: collect more data
What leads to missing data?
• Observational (nonexperimental) studies
» equipment failure
» observer/coder error
» subject refusal
» subject loss to follow-up
» design/measure changes
» nested interview questions
» SOLUTION: collect more data
Missing data patterns
• Terms suggested by Rubin
» Rubin (1976), Little & Rubin (1987)
• In some cases, the data are MISSING COMPLETELY AT RANDOM (MCAR)
» Which data point is missing cannot be predicted by any variable, measured or unmeasured.
• Prob(M|Y) = Prob(M)
» The missing data pattern is ignorable. Analyzing the available complete data is just fine.
Missing data patterns
• In other cases, the data are MISSING AT RANDOM (MAR)
» Which data point is missing is systematically related to subject characteristics, but these are all measured
• Conditional on observed variables, missingness is random
• Prob(M|Y) = Prob(M|Yobserved)
» E.g., less educated respondents might not answer a certain question.
» Missingness can be treated as ignorable.
Missing data pattern
• In still other cases, the data are NOT MISSING AT RANDOM (NMAR)
» Data are missing because of a process related to the value that is unavailable
• Someone was too depressed to come report about depression
• An abused woman is not allowed to meet the interviewer
» Missing data pattern is not ignorable.
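The three mechanisms can be contrasted with a small simulation. The sketch below is illustrative only: the variables x (always observed) and y (sometimes missing), the sample size, and the response probabilities 0.8 and 0.2 are assumptions, not values from the lecture. It compares the mean of y among observed cases with the full-data mean, both overall and within the stratum x < 0.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: x is always observed; y depends on x and may be missing.
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

keep_all = [True] * n
# MCAR: every case has the same 50% chance of being observed.
keep_mcar = [random.random() < 0.5 for _ in range(n)]
# MAR: the chance of being observed depends only on the observed x.
keep_mar = [random.random() < (0.8 if xi < 0 else 0.2) for xi in x]
# NMAR: the chance of being observed depends on the y value itself.
keep_nmar = [random.random() < (0.8 if yi < 0 else 0.2) for yi in y]


def observed_mean(keep, low_x_only=False):
    """Mean of y over kept cases, optionally restricted to cases with x < 0."""
    return statistics.mean(
        yi for xi, yi, k in zip(x, y, keep)
        if k and (xi < 0 or not low_x_only)
    )


results = {}
for name, keep in [("full", keep_all), ("MCAR", keep_mcar),
                   ("MAR", keep_mar), ("NMAR", keep_nmar)]:
    results[name] = (observed_mean(keep), observed_mean(keep, low_x_only=True))
    overall, within = results[name]
    print(f"{name:5s} overall mean {overall:6.2f}   mean given x<0 {within:6.2f}")
```

In this setup the MCAR means should track the full-data means. Under MAR the overall mean shifts, but the mean within the x < 0 stratum does not, which is the sense in which conditioning on observed variables makes the missingness ignorable. Under NMAR both are distorted.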
Statistical Approaches
• Listwise deletion
» If a person is missing on any analysis variable, that person is dropped from the analysis
• Pairwise deletion
» Correlations/covariances are computed using all available pairs of data.
• Imputation of missing data values
• Model-based use of complete data
» EM (expectation-maximization) approach
» Illustration in Excel
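The difference between listwise and pairwise deletion is easiest to see in how many cases each one keeps. The sketch below uses a made-up six-row dataset (the variable names x, y, z and all values are hypothetical) and a hand-written Pearson correlation so that it runs with the standard library alone.

```python
import statistics
from itertools import combinations

# Hypothetical toy data; None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0, "z": 1.5},
    {"x": 2.0, "y": None, "z": 2.5},
    {"x": 3.0, "y": 3.5, "z": None},
    {"x": 4.0, "y": 5.0, "z": 4.0},
    {"x": None, "y": 6.0, "z": 5.5},
    {"x": 6.0, "y": 6.5, "z": 6.0},
]
variables = ["x", "y", "z"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Listwise deletion: drop a row if ANY analysis variable is missing.
complete = [r for r in rows if all(r[v] is not None for v in variables)]
print("listwise N =", len(complete))

# Pairwise deletion: each correlation uses every row where BOTH variables exist.
pair_n = {}
pair_r = {}
for a, b in combinations(variables, 2):
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    xs, ys = zip(*pairs)
    pair_n[(a, b)] = len(pairs)
    pair_r[(a, b)] = pearson(xs, ys)
    print(f"r({a},{b}) = {pair_r[(a, b)]:.3f} based on N = {len(pairs)}")
```

Pairwise deletion keeps more data per correlation, but each entry of the correlation matrix can rest on a different subset of people, which is one reason the resulting matrix needs care before it is used in a regression.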
Classic Cohen & Cohen advice
• Create a dummy code (M) for who has missing data
» Find out what variables are related to missingness
• Insert the mean or some other value for missing values in the IV and fit the multiple regression with the full data plus variable M.
» Procedure has been criticized for underestimating variance
• Current text reflects compromise
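The two steps of this advice, and the variance criticism, can be sketched on a toy predictor. The eight values below are hypothetical and the mean is used as the plug-in value; the regression itself is omitted, since the point is only how the filled variable is built and what filling does to its variance.

```python
import statistics

# Hypothetical predictor (IV) values; None marks a missing observation.
iv = [4.0, None, 7.0, 5.0, None, 9.0, 6.0, 8.0]

# Step 1: dummy code M = 1 when the IV is missing, 0 otherwise.
m = [1 if v is None else 0 for v in iv]

# Step 2: plug the mean of the observed values into the missing slots.
observed = [v for v in iv if v is not None]
fill = statistics.mean(observed)
iv_filled = [fill if v is None else v for v in iv]

# Sample variance before and after filling.
var_observed = statistics.variance(observed)
var_filled = statistics.variance(iv_filled)

print("M          =", m)
print("filled IV  =", iv_filled)
print("variance of observed values:", var_observed)
print("variance after mean filling:", var_filled)
```

Both iv_filled and m would then enter the regression as predictors. Because every filled case sits exactly at the mean, the filled variable has less spread than the observed values alone, which illustrates the underestimated variance noted above.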
Case Study: Depression Following Miscarriage
» Neugebauer et al. (1992) American Journal of Public Health, 82, 1332-1339.
» Neugebauer et al. (1992) American Journal of Obstetrics and Gynecology, 166, 104-109.
• Neugebauer and his colleagues recruited women who sought treatment for miscarriages and measured their levels of depression at 2 weeks, 6 weeks and 26 weeks post miscarriage.
• The study built on a case-control study of the causes of miscarriage that successfully recruited nearly 80% of eligible women. Neugebauer and his colleagues enrolled approximately 85% of the women in the initial study.
• 382 women were initially enrolled.
Neugebauer Missing Data
• Some women were not available in the first two weeks following miscarriage, and others were not available in the subsequent two-week windows for followup measurement.
• Only 166 women were measured at all three time points.
• Missing observations were not related to:
» SES, ethnicity, parity (# of pregnancies), # of previous miscarriages
• Those with missing observations were
» Somewhat younger, with fewer living children.
Pattern of missing data
(0 = observed, 1 = missing, consistent with the 166 women measured at all three time points)
2 Wk   6 Wk   26 Wk   N
0      0      0       166
0      0      1       30
0      1      0       12
0      1      1       24
1      0      0       88
1      0      1       26
1      1      0       36
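The pattern counts can be checked against the enrollment figure and turned into per-wave sample sizes. The sketch below copies the counts from the table and reads 0 as observed and 1 as missing, an assumption that matches the 166 women reported as measured at all three time points.

```python
# Pattern counts from the table: (2 wk, 6 wk, 26 wk) -> N.
# Assumption: 0 means the wave was observed, 1 means it was missing.
patterns = {
    (0, 0, 0): 166,
    (0, 0, 1): 30,
    (0, 1, 0): 12,
    (0, 1, 1): 24,
    (1, 0, 0): 88,
    (1, 0, 1): 26,
    (1, 1, 0): 36,
}
waves = ["2 wk", "6 wk", "26 wk"]

# Total across patterns should equal the number of women enrolled.
total = sum(patterns.values())

# Number of women observed at each wave (pattern entry 0 at that position).
observed_per_wave = {
    wave: sum(n for pattern, n in patterns.items() if pattern[i] == 0)
    for i, wave in enumerate(waves)
}

print("total enrolled:", total)
for wave in waves:
    n_obs = observed_per_wave[wave]
    print(f"{wave}: observed {n_obs}, missing {total - n_obs}")
```

The total comes to 382, the enrollment figure given earlier, and the per-wave counts are 232, 310, and 302, so listwise deletion (N = 166) would discard a large share of the measurements that were actually collected.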
Means for different groups
[Figure: plot of group means, one series per missing-data pattern (0,0,0; 0,0,1; 0,1,0; 0,1,1; 1,0,0; 1,0,1; 1,1,0); horizontal axis from 0 to 30, vertical axis from 10.0 to 28.0.]