Integrating GPS and self-reported measures of
land area in household surveys
Alberto Zezza Development Data Group, The World Bank
RuLIS Expert Consultation FAO Headquarters - Rome – November 8, 2016
worldbank.org/lsms RuLIS, 8 November 2016
Background: LSMS Methodological Research
• Broad scope of LSMS methodological research since 2005
• Agricultural productivity measurement: LSMS Methodological Validation Program (MVP) - UK Aid, partnerships with the (FAO) Global Strategy to Improve Agricultural & Rural Statistics
• Approach:
  - Test (old & new) methods in tandem with a gold standard
  - Assess relative accuracy & scale-up feasibility
  - Cost effectiveness, skill & training requirements, respondent burden
  - Document results, best practices & protocols for scale-up (guidelines)
  - Integrate validated & cost-effective methods into LSMS operations
• Today’s focus: GPS and self-reported land area measures in Household surveys
Motivation
Land area is critical in:
• Measuring productivity
• Assessing farmer wealth
• Designing titling/registration schemes
• Anything agriculture…
> 70% of the world’s poor reside in rural areas*
*source: IFAD, Rural poverty report 2011
Measuring Land Area: Methodological Options
Farmer self-reported estimate
PROS
- Inexpensive
- Less missingness
CONS
- Subjective
- Complicated by traditional units
- Potential ulterior motives
Compass and rope (aka traversing)
PROS
- Traditional gold standard for accuracy
- Eliminates subjectivity
CONS
- Time/labor intensive (leading to higher costs)
- Requires travel to plot
GPS
PROS
- Significantly quicker than traversing, with the advantages of objective measurement
CONS
- Questions of accuracy on small plots (?)
- Requires travel to plot
Remote Sensing (?)
PROS
- Potential to eliminate plot visits
CONS
- Resolution limitations
- Feasibility of boundary identification
3 Methodological experiments: Ethiopia (n=1798), Tanzania (n=1945), Nigeria (n=494) – Total N=4237
Comparison of Methods (National Surveys): Subjective vs. Objective
Subjective farmer self-reported estimates are potentially sensitive to:
• Respondent characteristics
• Perceived use of the data (taxation, program eligibility)
• Traditional/local units of measurement
• Rounding
Resulting in:
• Large errors
• Systematic biases
Source: Carletto, Savastano, Zezza (2013). “Fact or Artifact: the Impact of Measurement Errors on the Farm size - Productivity Relationship”, Journal of Development Economics.
GPS vs. Compass & Rope vs. Subjective
[Figure: scatter plots for Ethiopia, Tanzania and Nigeria. In each country panel, compass-and-rope (CR) area is on the vertical axis, plotted against GPS area and against self-reported (SR) area on the horizontal axes.]
Correlation between GPS and CR measurements: 0.997 (about 0.5 between SR and CR)
Average Measurement Duration
[Figure: bar chart comparing GPS and CR for Ethiopia, Tanzania and Total; axis label "Plot Size Level (CR) & Minutes", scale 0 to 100.]
• Ethiopia:
  - GPS = 13.7 minutes
  - CR = 56.8 minutes
• Tanzania:
  - GPS = 7.4 minutes
  - CR = 29.3 minutes
GPS is roughly four times faster (and therefore cheaper) than CR in both countries
So GPS is the way to go, except…
• Collecting GPS-based land areas is not always feasible: field work protocols, lack of physical access, refusals
• Substantial presence of missing values (up to 30 percent or more): empirical implications unclear
Survey | Rate of Missingness | Required Spatial Coverage of GPS-Based Plot Area Measurements
Niger Enquête Nationale sur les Conditions de Vie des Ménages et l’Agriculture 2011 | 29% | Measure all plots in the same enumeration area as the household.
Nigeria General Household Survey - Panel 2012/2013 | 13% | Measure all plots in the same district as the household and within 3 hours of travel, regardless of mode of transportation.
Tanzania National Panel Survey 2010/2011 | 22% | Measure all plots within 1 hour of travel from the household, regardless of mode of transportation.
Uganda National Panel Survey 2011/2012 | 44% | Measure all plots in the same enumeration area as the household.
Non-randomness in missing GPS-based plot areas
TZNPS 2010/11
Variable | Entire Sample | W/ GPS | W/o GPS
Observations | 4,142 | 3,383 (82%) | 759 (18%)
GPS-Based Plot Area (Acres) | 2.59 | 2.59 | --
Farmer-Reported Plot Area (Acres) | 2.31 | 2.30 | 2.35
Distance to Home (KM) | 3.74 | 1.95 | 13.92 ***
Distance to Road (KM) | 2.18 | 1.62 | 5.39 ***
Rented/Other † | 0.12 | 0.09 | 0.25 ***
# of Plots in Holding | 3.09 | 3.08 | 3.15 ***
Mover Original HH † | 0.06 | 0.05 | 0.09 ***
Split-Off HH † | 0.09 | 0.08 | 0.15 ***
Wealth Index (2008/09) | -1.06 | -1.09 | -0.88 ***
UNPS 2009/10
Variable | Entire Sample | W/ GPS | W/o GPS
Observations | 4,333 | 2,814 (65%) | 1,519 (35%)
GPS-Based Plot Area (Acres) | 2.13 | 2.13 | --
Farmer-Reported Plot Area (Acres) | 2.05 | 2.00 | 2.12
Less Than 15 Mins Away from HH † | 0.62 | 0.80 | 0.31 ***
30+ Mins Away from HH † | 0.22 | 0.06 | 0.48 ***
Rented/Other † | 0.26 | 0.14 | 0.46 ***
# of Plots in Holding | 3.31 | 3.17 | 3.54 ***
Mover Original HH † | 0.04 | 0.01 | 0.09 ***
Split-Off HH † | 0.13 | 0.06 | 0.25 ***
Wealth Index (2005/06) | -0.66 | -0.77 | -0.47 ***
Note (both tables): Results from tests of mean differences reported. *** p<0.01, ** p<0.05, * p<0.1. Statistics weighted through the use of household sampling weights. † denotes a dummy variable.
Multiple Imputation (MI): Background
• MI was originally proposed to handle missing data in public use files from censuses and sample household surveys (Rubin, 1977)
• Uses the distribution of observed data to estimate plausible values for missing data, incorporating random, imputation-related components to reflect uncertainty (Rubin, 1987)
• Superior to casewise deletion & conditional mean imputation, which are known to understate true variance (Schafer & Graham, 2002)
• Key assumption: Missing At Random (MAR) conditional on observables; plausibility depends on the nature & sources of missing data
Our Approach:
• 50 imputations of GPS-based plot area, using predictive mean matching (PMM) with 5 neighbors
• Robustness checks: number of imputations (m), number of neighbors, bootstrapping, PMM vs. OLS
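The approach listed above (50 imputations, PMM with 5 neighbors) can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: it assumes a single predictor (self-reported area), uses a bootstrap refit to reflect parameter uncertainty, and pools the mean with Rubin's rules. The function names (`fit_line`, `pmm_impute`, `pool_mean`) and all simulated values are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def pmm_impute(sr, gps, k=5, rng=random):
    """Return one completed copy of gps; None marks a missing GPS area."""
    obs = [i for i, g in enumerate(gps) if g is not None]
    mis = [i for i, g in enumerate(gps) if g is None]
    # Donor predictions come from the fit on all observed plots.
    a, b = fit_line([sr[i] for i in obs], [gps[i] for i in obs])
    donors = [(a + b * sr[i], gps[i]) for i in obs]
    # Assumption: a bootstrap refit stands in for a draw of the parameters.
    boot = [rng.choice(obs) for _ in obs]
    a_s, b_s = fit_line([sr[i] for i in boot], [gps[i] for i in boot])
    completed = list(gps)
    for i in mis:
        target = a_s + b_s * sr[i]
        # Keep the k donors whose predicted area is closest to the target,
        # then copy the observed GPS area of one of them at random.
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        completed[i] = rng.choice(nearest)[1]
    return completed


def pool_mean(completed_sets):
    """Rubin's rules for the mean: pooled estimate and its standard error."""
    m = len(completed_sets)
    n = len(completed_sets[0])
    means = [statistics.fmean(c) for c in completed_sets]
    within = statistics.fmean([statistics.variance(c) / n for c in completed_sets])
    between = statistics.variance(means)
    return statistics.fmean(means), (within + (1 + 1 / m) * between) ** 0.5


# Synthetic plots (acres): noisy self-reports, precise GPS, 30% GPS missing.
rng = random.Random(0)
true_area = [rng.lognormvariate(0.5, 0.6) for _ in range(400)]
sr = [t * rng.lognormvariate(0, 0.3) for t in true_area]
gps = [t * (1 + rng.gauss(0, 0.02)) for t in true_area]
gps_missing = [None if rng.random() < 0.3 else g for g in gps]

completed_sets = [pmm_impute(sr, gps_missing, k=5, rng=rng) for _ in range(50)]
pooled_mean, pooled_se = pool_mean(completed_sets)
print(f"pooled mean GPS area: {pooled_mean:.2f} acres (SE {pooled_se:.2f})")
```

Because PMM copies observed donor values, every imputed area is a GPS area that was actually measured on some other plot, which keeps imputations within the observed range.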
Multiple Imputation (MI) model
Selected OLS Regression Results Underlying Multiple Imputation
Dependent Variable = GPS-Based Plot Area (Acres)
Variable | UNPS 2009/10 | TZNPS 2010/11
Farmer-Reported Plot Area (Acres) | 0.945*** | 0.866***
Log [Value of Plot Output] | 0.023 | 0.056***
Log [Value of Plot Input] | 0.027** | 0.032***
# of Plots in Holding | -0.141*** | -0.094**
District & Enumerator Fixed Effects | YES | YES
Observations | 2,814 | 3,363
R2 | 0.658 | 0.688
Empirical Approach for MI validation
• Create artificial missing(ness) in GPS-based plot areas
• Conduct MI on each unique data set, under a specific simulated degree of missing observations beyond the two different distance thresholds (SR: key predictor)
• Compare the distributions of plot area and plot-level agricultural productivity (imputed vs. observed) for the same plots
• Identify the missing(ness) threshold beyond which MI yields imputed distributions that are statistically different from the observed distributions
• MI reliably predicts missing GPS-based plot areas in surveys
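The validation steps above can be sketched as a small simulation. This is an illustrative outline under stated assumptions, not the procedure used in the study: the data are synthetic, a simple stochastic regression imputation stands in for PMM to keep the sketch short, and "statistically identical" is approximated by a paired test of means at the 5% level rather than a comparison of full distributions. The names `mask_beyond`, `impute_once` and `identical_count` are hypothetical.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def mask_beyond(gps, dist_km, threshold_km, rate, rng):
    """Hide GPS areas at the given rate, only for plots beyond the threshold."""
    return [None if d > threshold_km and rng.random() < rate else g
            for g, d in zip(gps, dist_km)]


def impute_once(sr, gps, rng):
    """Stand-in imputer: OLS prediction plus a resampled residual."""
    obs = [i for i, g in enumerate(gps) if g is not None]
    a, b = fit_line([sr[i] for i in obs], [gps[i] for i in obs])
    resid = [gps[i] - (a + b * sr[i]) for i in obs]
    return [g if g is not None else a + b * s + rng.choice(resid)
            for s, g in zip(sr, gps)]


def identical_count(sr, gps, dist_km, threshold_km, rate, m, rng):
    """Count imputations whose mean matches the truth on the hidden plots."""
    masked = mask_beyond(gps, dist_km, threshold_km, rate, rng)
    hidden = [i for i, g in enumerate(masked) if g is None]
    count = 0
    for _ in range(m):
        completed = impute_once(sr, masked, rng)
        diffs = [completed[i] - gps[i] for i in hidden]
        t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
        if abs(t) < 1.96:  # not significantly different at the 5% level
            count += 1
    return count, len(hidden) / len(gps)


# Synthetic plots with fully observed GPS areas serving as the "truth".
rng = random.Random(1)
true_area = [rng.lognormvariate(0.5, 0.6) for _ in range(600)]
sr = [t * rng.lognormvariate(0, 0.3) for t in true_area]
gps = [t * (1 + rng.gauss(0, 0.02)) for t in true_area]
dist_km = [rng.expovariate(1.0) for _ in range(600)]

results = {}
for rate in (0.2, 0.5, 0.8):
    results[rate] = identical_count(sr, gps, dist_km, 1.0, rate, 50, rng)
    count, overall = results[rate]
    print(f"{rate:.0%} missing beyond 1 km ({overall:.0%} overall): "
          f"{count}/50 imputations match the truth")
```

Repeating this over a finer grid of rates, and over both distance thresholds, gives the kind of curve from which a tolerable rate of missingness can be read off.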
Assessing the tolerable rate of missing(ness) for use of MI
[Figure: four panels, Malawi (1km threshold), Malawi (500m threshold), Ethiopia (1km threshold) and Ethiopia (500m threshold). Horizontal axis: % of plots missing beyond threshold (0 to 100). Vertical axis: # imputations statistically identical to the 'truth' (0 to 50). Marked values: 93 and 82; 52 and 45; 73 and 56; 48 and 36.]
Tolerable rates of plot area missingness
Country | Threshold | Plot Area: Tolerable rate (%) | Land Productivity (Yield): Tolerable rate (%)
Malawi | 1.0 km | 93 (26) | 82 (23)
Malawi | 500 m | 52 (24) | 45 (21)
Ethiopia | 1.0 km | 73 (20) | 56 (13)
Ethiopia | 500 m | 48 (18) | 36 (15)
*Overall missing(ness) in parentheses
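The two numbers in each cell can be related by simple arithmetic. As a rough illustration, assuming (an assumption not stated in the slides) that missingness occurs only among plots beyond the distance threshold, overall missingness equals the share of plots beyond the threshold times the missing rate among them, so the implied share is the ratio of the two figures:

```python
# (country, threshold, outcome) -> (tolerable % missing beyond threshold,
#                                   overall % missing), as reported in the table
tolerable = {
    ("Malawi", "1.0 km", "plot area"): (93, 26),
    ("Malawi", "1.0 km", "yield"): (82, 23),
    ("Malawi", "500 m", "plot area"): (52, 24),
    ("Malawi", "500 m", "yield"): (45, 21),
    ("Ethiopia", "1.0 km", "plot area"): (73, 20),
    ("Ethiopia", "1.0 km", "yield"): (56, 13),
    ("Ethiopia", "500 m", "plot area"): (48, 18),
    ("Ethiopia", "500 m", "yield"): (36, 15),
}

# Assumption: overall = share_beyond * rate_beyond, so share = overall / rate.
implied_share = {key: overall / beyond
                 for key, (beyond, overall) in tolerable.items()}

for (country, threshold, outcome), share in implied_share.items():
    print(f"{country}, {threshold}, {outcome}: "
          f"about {share:.0%} of plots lie beyond the threshold")
```

The implied shares are only indicative, since the reported percentages are rounded and the plot area and yield samples may differ.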
Concluding Thoughts
• Clear evidence of systematic bias in farmer self-reported area estimates
• GPS serves as a time- and cost-efficient substitute for CR (in most cases)
• GPS + SR: When GPS measurements are missing, impute them using the self-reported area estimates
• Imputing missing GPS-based plot areas has clear implications for policy-relevant productivity analysis
• Using MI to compute mean statistics is empirically validated by our work, under MAR
• The critical rates of missing(ness) that MI can overcome are context specific and can be used to efficiently plan survey operations
• RuLIS: Distribute one land area variable with notation on whether SR or GPS+MI?
LSMS Resources on Land Area Measurement
• Carletto, G., Gourlay, S., Murray, S., and Zezza, A. (2016). Land Area Measurement in Household Surveys: A Guidebook. Washington DC: World Bank.
• Carletto, G., Gourlay, S., Murray, S. and Zezza, A. (2016). Cheaper, Faster and More Than Good Enough: Is GPS the new gold standard in land area measurement? World Bank Policy Research Working Paper, 7759.
• Carletto, G., Gourlay, S., and Winters, P. (2015). From Guesstimates to GPStimates: Land Area Measurement and Implications for Agricultural Analysis. Journal of African Economies, 24 (5), 593–628. (Also available in the World Bank Policy Research Working Paper series.)
• Carletto, G., Savastano, S., and Zezza, A. (2013). Fact or artifact: The impact of measurement errors on the farm size–productivity relationship, Journal of Development Economics, 103(C), 254–261. (Also available in the World Bank Policy Research Working Paper series.)
• Dillon, A., Gourlay, S., McGee, K., and Oseni, G. (2016). Land measurement bias and its empirical implications: evidence from a validation exercise. World Bank Policy Research Working Paper, 7597.
• Kilic, T., Zezza, A., Carletto, G., and Savastano, S. (2013). Missing(ness) in Action: Selectivity Bias in GPS-Based Land Area Measurements. World Bank Policy Research Working Paper 6490.
• Kilic, T., Yacoubou Djima, I., and Carletto, C. (2016). Is Predicting Missing GPS-Based Land Area Measures Mission Impossible in Household Surveys? Exploring the Promise of MI. World Bank Policy Research Working Paper, forthcoming.