Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Fitting a Bayesian Fay-Herriot Model
Nathan B. Cruze
United States Department of AgricultureNational Agricultural Statistics Service (NASS)
Research and Development Division
Washington, DCOctober 25, 2018
“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1
Disclaimer
The Findings and Conclusions in This Preliminary Presen-tation Have Not Been Formally Disseminated by the U.S.Department of Agriculture and Should Not Be Construed toRepresent Any Agency Determination or Policy.
Overview
I NASS interest in small area estimation (SAE)
I The Fay and Herriot (1979) modelI Case study: county estimates of planted corn, Illinois 2014
I Computation in R and JAGS
GASP 2018–Fitting Bayesian Fay-Herriot Model 3
Small Area Estimation (SAE) Literature“A domain is regarded as ‘small’ if the domain-specific sample isnot large enough to support [survey] estimates of adequateprecision.”–Rao and Molina (2015)
Regression and mixed-modeling approaches in SAE literature
I Shrinkage–improve estimates with other information
I Utility of auxiliary data as covariate
I Variance-bias trade off
Two common models
1. Unit-level models, e.g., Battese et al. (1988)I USDA NASS (formerly SRS) as source of data/funding
2. Area-level models, e.g., Fay and Herriot (1979)
GASP 2018–Fitting Bayesian Fay-Herriot Model 4
NASS Interest In SAE
Iwig (1996): USDA’s involvement in county estimates in 1917
Published estimates used by:
I Agricultural sector
I Financial institutions
I Research institutions
I Government and USDA
Published estimates used for:
I County loan rates
I Crop insurance
I County-level revenueguarantee
National Academies of Sciences, Engineering, and Medicine (2017)
I Consensus estimates: Board review of survey and other data
I Currently published without measures of uncertainty
I Recommends transition to system of model-based estimates
GASP 2018–Fitting Bayesian Fay-Herriot Model 5
Fay-Herriot (Area-Level) ModelFay and Herriot (1979)–improved upon per capita incomeestimates with following model
θ̂j = θj + ej , j = 1, . . . ,m counties (1)
θj = x′j β + uj (2)
Adding Eqs. 1 and 2
θ̂j = x′j β + uj + ej
I θ̂j , direct estimate
I E (ej |θj) = 0I V (ej |θj) = σ̂2j , estimated
variance
I xj , known covariates
I uj , area random effect
I ujiid∼ (0, σ2u)
GASP 2018–Fitting Bayesian Fay-Herriot Model 6
Fay-Herriot Formulated As Bayesian Hierarchical Model‘Recipe’ for hierarchical Bayesian model as in Cressie and Wikle(2011)
Data model:θ̂j |θj ,β
ind∼ N(θj , σ̂2j ) (3)
Process model:θj |β, σ2u
iid∼ N(x ′j β, σ2u) (4)
Prior distributions on β and σ2uI Browne and Draper (2006), Gelman (2006): σ2u ∼?I We will specify σ2u ∼ Unif (0, 108), β
iid∼ MVN(0, 106I )
Goal: Obtain posterior summaries about county totals, θj
GASP 2018–Fitting Bayesian Fay-Herriot Model 7
County Agricultural Production Survey (CAPS)
Case study in Cruze et al. (2016)
Illinois planted corn
I 9 Ag. Statistics Districts
I 102 counties
I a major producer of cornI End-of-season survey
– Direct estimates of totals– Estimated sampling variances
Min Median Max
n reports 2 47 93CV (%) 9.1 19.2 92.3
Warren 187
Hende son 071
McDonough 109
Stephenson 1n
Carroll 015
Ogle 141
Lee
Winnebago Boon 201 007
McHenry
111
DeKalb Kane 037 089
Lake 097
Cook
10 103
?----��---r----1 Kendall Will 197
Knox 095
Fulton 057
Bureau 011
LaSalle 099 20
093
Grundy 063
Peoria 143
Macoupin 117
Kankakee 091
Livingston ,,L---- ---.. 105 Iroquois
075 Woodford 203
Montgom 135
McLean 113 50 Ford
053
40 __ __ _ _ _, Vermilion L_�-----,- Champaign 183 DeWitt 039
Macon 115
019
Coles 029
Edgar 045
Clark 023 70 Cumberland .. ---- -� -- ---I 035
L_�------'--�- -� Fayette
051 Effingham
049 Jasper
079
.... --.&....,i,Marion Clay 025
.... ----- 121
Jefferson 081
Franklin 055
Williamson 199
https://www.nass.usda.gov/Charts_and_Maps/Crops_
County/indexpdf.php
https://www.nass.usda.gov/Charts_and_Maps/Crops_County/indexpdf.phphttps://www.nass.usda.gov/Charts_and_Maps/Crops_County/indexpdf.php
Covariate x1: USDA Farm Service Agency (FSA) Acreage
I FSA administers farmsupport programs
I Enrollment popular,not compulsory
I Data self-reported atFSA office
I Administrative vs.physical county
https://www.fsa.usda.gov/news-room/efoia/electronic-reading-room/
frequently-requested-information/crop-acreage-data/index
https://www.fsa.usda.gov/news-room/efoia/electronic-reading-room/frequently-requested-information/crop-acreage-data/indexhttps://www.fsa.usda.gov/news-room/efoia/electronic-reading-room/frequently-requested-information/crop-acreage-data/index
Covariate x2: NOAA Climate Division March Precipitation
Weather as auxiliary variable
I March: Planting ‘intentions’
I April: Illinois planting
I Could rainfall in Marchaffect planting?
I One-to-one mapping: ASDand climate division
I Repeat value for all countieswithin ASD
ASD Precip (in)
10 1.0820 1.3530 1.2740 1.6650 1.5060 1.3670 1.4680 1.6990 2.00
Source: ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdivDetails in Vose et al. (2014)
GASP 2018–Fitting Bayesian Fay-Herriot Model 10
ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdiv
NASS Official StatisticsFrom prior publication: Illinois 2014, 11.9 million acres of cornplanted
I Require: State-ASD-county benchmarking of estimates
State/district: https://quickstats.nass.usda.gov/results/3A17F375-B762-37BD-8C03-D581DC8F7A85County: https://quickstats.nass.usda.gov/results/478D1A7B-E680-3E5E-95E4-9A59F938A256
https://quickstats.nass.usda.gov/results/3A17F375-B762-37BD-8C03-D581DC8F7A85https://quickstats.nass.usda.gov/results/478D1A7B-E680-3E5E-95E4-9A59F938A256
JAGS Model
I Note data, process, prior structure from earlier slide
I Note distributions parameterized in terms of precision
I Read into R script as stored R source code or as text stringOnline resources http://www.sumsar.net/blog/2013/06/three-ways-to-run-bayesian-models-in-r/
http://www.sumsar.net/blog/2013/06/three-ways-to-run-bayesian-models-in-r/
A Pseudo-Code R Script
GASP 2018–Fitting Bayesian Fay-Herriot Model 13
Analysis of JAGS Model OutputPosterior summaries of parameters–based on 3,000 saved iterates
I Posterior means, standard deviations, quantiles, potentialscale reduction factors, effective sample sizes, pD, DIC
I Transform back to acreage scale
I Ratio benchmarking–inject benchmarking factor back intochains as in Erciulescu et al. (2018)
GASP 2018–Fitting Bayesian Fay-Herriot Model 14
Results: Models With and Without BenchmarkingI Modeled estimates (ME) may not satisfy benchmarkingI Ratio-benchmarked estimates (MERB) are consistent with
state targets and improve agreement with external sources
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
County Comparisons of Model and FSA Acreage
FSA Planted Area (Acres of Corn)
Mod
eled
Est
imat
es o
f Pla
nted
Are
a (A
cres
) ● MERB estimateME estimate
●
●
●
●●
●
●
●●
ASD Comparisons of Model and FSA Acreage
FSA Planted Area (Acres of Corn)M
odel
ed E
stim
ates
of P
lant
ed A
rea
(Acr
es) ● MERB estimate
ME estimate
GASP 2018–Fitting Bayesian Fay-Herriot Model 15
Results: Posterior Distributions of ASD-Level AcreagesUsed county-level inputs to produce county-level estimates
I Idea: derive ASD-level estimates from Monte Carlo iteratesI Sum corresponding draws from county posterior distributions
– Compute means and variances from aggregated chains
500 1000 1500 2000
Planted Area (1,000 Acres)
ASD 80 ASD 90
ASD 20 ASD 30
ASD 70
ASD 40
ASD 50
ASD 60
ASD 10
MERBNASS OFFICIAL
GASP 2018–Fitting Bayesian Fay-Herriot Model 16
Results: Relative Variability of Survey Versus ModelObtain estimates and measures of uncertainty for counties anddistricts
I Recall the goal of SAE–increased precision!
CV (%) of CAPS Survey Estimates
Min Q1 Median Mean Q3 MaxCounty 9.1 16.6 19.2 22.2 23.5 92.3District 4.4 5.6 6.8 6.6 7.2 8.7
CV (%) of MERB Estimates
Min Q1 Median Mean Q3 MaxCounty 3.6 5.6 7.2 9.0 10.5 31.2District 1.7 2.0 2.1 2.5 2.3 4.4
GASP 2018–Fitting Bayesian Fay-Herriot Model 17
Results: Comparison to Other SourcesFor counties and districts, compute ‘standard score’
I (model estimate-other source)/model standard errorI Direct Estimates, Cropland Data Layer, Battese-Fuller, FSA
● ●●●●●
● ●● ●●
● ●●
● ●
−6 −4 −2 0 2 4 6
County−Level Comparisons
Number of Posterior Standard Deviations
FS
AB
atte
se_F
ulle
rC
DL
DE
*
●●
FS
AB
atte
se_F
ulle
rC
DL
DE
−2 0 2 4 6
ASD−Level Comparisons
Number of Posterior Standard Deviations
GASP 2018–Fitting Bayesian Fay-Herriot Model 18
Conclusions
Discussed Bayesian formulation of Fay-Herriot model motivated byNASS applications
Other R packages facilitate Bayesian small area estimation
I ‘BayesSAE’ by Chengchun Shi
I ‘hbsae’ by Harm Jan Boonstra
I May be bound by limited choice of prior distributions
I Transformations of data may be needed
Proc MCMC in SAS added ‘Random’ statement as of version 9.3
Thanks to Andreea Erciulescu (NISS) and Balgobin Nandram(WPI) for three years of adventures in small area estimation!
GASP 2018–Fitting Bayesian Fay-Herriot Model 19
ReferencesBattese, G. E., Harter, R. M., and Fuller, W. A. (1988). An error-components model for prediction of county crop
areas using survey and satellite data. Journal of the American Statistical Association, 83(401):28–36.
Browne, W. J. and Draper, D. (2006). A comparison of bayesian and likelihood-based methods for fitting multilevelmodels. Bayesian Analysis, 1(3):473–514.
Cressie, N. and Wikle, C. (2011). Statistics for Spatio-Temporal Data. Wiley, Hoboken, NJ.
Cruze, N., Erciulescu, A., Nandram, B., Barboza, W., and Young, L. (2016). Developments in Model-BasedEstimation of County-Level Agricultural Estimates. In Proceedings of the Fifth International Congress onEstablishment Surveys. American Statistical Association, Geneva.
Erciulescu, A. L., Cruze, N. B., and Nandram, B. (2018). Model-based county level crop estimates incorporatingauxiliary sources of information. Journal of the Royal Statistical Society: Series A (Statistics in Society).doi:10.1111/rssa.12390.
Fay, R. E. and Herriot, R. A. (1979). Estimates of income for small places: An application of james-steinprocedures to census data. Journal of the American Statistical Association, 74(366):269–277.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article bybrowne and draper). Bayesian Analysis, 1(3):515–534.
Iwig, W. (1996). The National Agricultural Statistics Service County Estimates Program. In Schaible, W., editor,Indirect Estimators in U.S. Federal Programs, chapter 7, pages 129–144. Springer, New York.
National Academies of Sciences, Engineering, and Medicine (2017). Improving Crop Estimates by IntegratingMultiple Data Sources. The National Academies Press, Washington, DC.
Rao, J. and Molina, I. (2015). Small Area Estimation. In Wiley Online Library: Books. Wiley, 2nd edition.
Vose, R. S., Applequist, S., Squires, M., Durre, I., Menne, M. J., Williams, C. N., Fenimore, C., Gleason, K., andArndt, D. (2014). Improved Historical Temperature and Precipitation Time Series for U.S. Climate Divisions.Journal of Applied Meteorology and Climatology, 53(5):1232–1251.
GASP 2018–Fitting Bayesian Fay-Herriot Model 20