30
Small Area Estimation: An Overview Stas Kolenikov Abt SRBI AAPOR 2014 Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 1 / 28

Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Small Area Estimation: An Overview

Stas Kolenikov

Abt SRBI

AAPOR 2014

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 1 / 28

Page 2: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Motivation

In many social, behavioral or health studies, there may be interest inobtaining estimates for small subgroups of population

National study → estimates forI statesI countiesI school districtsI metro areas (NHIS phone use)I health service areas

Statewide study → estimates for counties or cities

City-wide study → estimates for neighborhoods

Detailed industry by size by region classifications (some BLS work)

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 2 / 28

Page 3: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

US official statistics

Census Bureau: Small Area Income and Poverty EstimatesI States, Counties, School districtsI http://www.census.gov/did/www/saipe/

National Cancer Institute: Small Area Estimates for Cancer RiskFactors and Screening Behaviors

I States, CountiesI Combines BRFSS and NHIS (Raghunathan et al. 2007)I http://sae.cancer.gov/

National Center for Health Statistics: NHIS phone use data(Blumberg et al. 2013, Stephen Blumberg’s presentation)

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 3 / 28

Page 4: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 4 / 28

Page 5: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Some work at Abt SRBI

2013 National Survey of American Jews for the Pew Research CenterI County-level estimates of Jewish populationI Coverage decisions (dial 90% of US population, 99% Jewish

population)I Stratification decisionsI See also Abt SRBI news release, Pew Research Center Methods report,

AAPOR 2013 presentations by Ben Phillips and Stas Kolenikov

Small area phone usage figuresI ACS PUMA level estimatesI Sample design for current and future surveysI Weighting targetsI See AAPOR 2013 presentation by Kolenikov and ZuWallack

Ongoing work in health surveys conducted by Abt SRBII Small area = community within a city, county within a state

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 5 / 28

Page 6: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Statement of the problem

Decent estimates are needed!

Given the low sample sizes for small areas (sometimes n = double digits,sometimes n = single digits, sometimes n = 0), can any reasonableestimates be obtained? If yes, can any reasonable measures of precision beattached to these estimates?

Since survey means/proportions (direct estimates in SAE jargon) may notbe available, or are of insufficient accuracy, statistical models have to beused, and weaved into complex survey estimation.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 6 / 28

Page 7: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Small area estimation approaches

Statistical approaches growing from traditional survey statistics:

Apply the ratios/proportions

Treat as mixed/multilevel models

Filter/reweight a “big” data set to look like a small area

Fit models on one data set, apply on another

Main reference: Rao (2003)

Reviews: Ghosh & Rao (1994), Rao (1999), Pfeffermann (2002), Datta(2009), Lehtonen & Veijanen (2009), Pfeffermann (2013)

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 7 / 28

Page 8: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Data requirements

A large data set that would allow precise estimation of the SAEmodel for an outcome y using variables x

Identifiers of the small areas (if a part of the large data set)

The same variables x for the small area as those used in the SAEmodel in the large data set

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 8 / 28

Page 9: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Groupwise ratio estimator

Applies means or proportions of the outcome in population group

1 Split large sample into groups (e.g., age-gender-education)g = 1, . . . ,G

2 Estimate the nationwide mean outcome in each group yg3 Apply the small area proportions γg of the groups as weights to

obtain the SAE estimate∑

g γg yg

This produces a synthetic estimate, i.e., the one that is based only onaggregate data subset to the small area values.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 9 / 28

Page 10: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Group-wise ratio estimator

Education level % smoking Nationwide Columbia, MO

Less than college 22.6% 69.2% 44.1%Bachelor+ 7.9% 30.8% 55.9%

Overall 18.0% 14.4%

Direct Synthetic

Source NHIS 2012 NHIS 2012 ACS 2012(via FactFinder)

Note: age 25+.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 10 / 28

Page 11: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Group-wise ratio estimator

Education level % smoking Nationwide Columbia, MO

Less than college 22.6% 69.2% 44.1%Bachelor+ 7.9% 30.8% 55.9%

Overall 18.0% 14.4%Direct

Synthetic

Source NHIS 2012 NHIS 2012 ACS 2012(via FactFinder)

Note: age 25+.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 10 / 28

Page 12: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Group-wise ratio estimator

Education level % smoking Nationwide Columbia, MO

Less than college 22.6% 69.2% 44.1%Bachelor+ 7.9% 30.8% 55.9%

Overall 18.0% 14.4%Direct Synthetic

Source NHIS 2012 NHIS 2012 ACS 2012(via FactFinder)

Note: age 25+.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 10 / 28

Page 13: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Issues

Simplistic: assumes group structure explains all variations in outcome

Synthetic estimate only: even if there are data available for Columbia,MO, they are ignored

No telling how accurate the (implicit) model of ratios y ∝ x is

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 11 / 28

Page 14: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Area model: SAIPE history

Fay & Herriot (1979) analyzed per capita incomes for small places in theUS with population less than 1,000:

θi = x ′iβ + vi

θi = θi + ei = x ′iβ + vi + ei (1)

where

θi = log Yi is the log of the mean per capita income

Xi are demographic explanatory variables

vi is the model error

θi is the direct estimator based only on sample data on yi (if ni > 0)

ei is sampling error

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 12 / 28

Page 15: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Area model: composite estimate

Fay-Herriot model (1) → composite estimate

θi = γi θi + (1− γi )x ′i β (2)

γi =σ2v

σ2v + ψi=

contribution ofthe direct estimate,

(3)

ψi = sampling variance of θi ,

β = estimate of β from (1)

That is,

compositeestimate

=precisionweight

× directestimate

+precisionweight

× syntheticestimate

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 13 / 28

Page 16: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Unit models

Unit-level data on respondents in their specific areas is available →the unit model for responses:

yij = x ′ijβ + vi + εij (4)

where

i still enumerates the areas

j enumerates individuals in areas

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 14 / 28

Page 17: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Unit model: composite estimate

If

sampling fractions within areas are small

+

the covariates are known for all units in the area,

then the composite estimator is

µi = γi [yi + (Xi − xi )β] + (1− γi )X ′i β, (5)

γi =σ2v

σ2v + σ2e/ni(6)

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 15 / 28

Page 18: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Variance estimation

In the area model framework,

V[composite estimator θi ] =[g1i (σ

2v , ψi ) ∼ fraction of the direct estimator variance

]+[g2i (σ

2v , ψi , xi ) ∼ sampling uncertainty due to the fact

that β needs to be estimated]

+[g3i (σ

2v , ψi ) ∼ sampling uncertainty due to the fact

that σ2v needs to be estimated]

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 16 / 28

Page 19: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Variance estimation, small print

What is being estimated is the mean squared error (MSE) rather thanthe variance of an estimate

Contribution of vi is treated as a bias contribution rather thanvariance contribution

Simply plugging in the parameter estimates underestimates MSE

This is a large sample expression applicable when there are manysmall areas

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 17 / 28

Page 20: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

SAE Extensions

Multivariate models (for several outcomes at a time; cancer SAE)

Generalized linear models (e.g., logistic for binary outcome) (Jiang &Lahiri 2001)

Models for quantities other than means and proportions (e.g.,quantiles) (Chambers & Tzavidis 2006)

Spatial covariance modeling for area effects

Parametric bootstrap to account for uncertainty in variancecomponent estimation (Lahiri 2003)

Bayesian methods (Ghosh et al. 1998) (cancer SAE)

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 18 / 28

Page 21: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Simulation of non-sampled units

1 For each small area, prepare the set of calibration characteristics

2 Take a rich, large sample survey data set3 Simulate the units in small area by either. . .

I . . . reweighting the observations in the large data set, or. . .I . . . finding a subset of observations from the large data set via

combinatorial optimization. . .

. . . so that the resulting sample matches the calibration characteristicsof the small area of interest

Microdata analysis can be performed on the resulting microsample that ismade to resemble the characteristics of the small area of interest.

See Williamson et al. (1998)

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 19 / 28

Page 22: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Two micro data sets

Blumberg et al. (2009) estimated phone usage at state level:

1 Phone usage data is available in NHIS

2 Fit the multinomial logistic model using NHIS outcome yi and NHISdemographics xi

3 Microdata with both relevant demographic variables and state IDs areavailable through CPS

4 Take NHIS coefficients, apply the model with CPS demographics, getpredicted probabilities for everybody

5 Estimate prevalences as average probabilities by state

Likewise, combine NHIS and ACS data (with PUMA geographic IDs) toestimate phone usage: Battaglia et al. (2010) for NYC neighborhoods andKolenikov & ZuWallack (2013) for states.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 20 / 28

Page 23: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

What I covered today

1 MotivationFederal statisticsWork at Abt SRBI

2 MethodsRatio calculationsArea modelsUnit modelsVariance estimationExtensionsReweighting simulationTwo micro data sets

3 Discussion

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 21 / 28

Page 24: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

SAE is the synthesis of. . .

Mixedmodels

GLM Surveystatistics

BLUP / MSEoptimizationBayesian

methods GIS

SMALL AREAESTIMATION

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 22 / 28

Page 25: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

Challenges

Standard errors are always tough

Need to match the outcomes data set with administrative data setI Definitions of explanatory variables may not match in different data sets

Very detailed levels of geography may be protected due toconfidentiality constraints

I ACS only has PUMA ≈100,000 people

Methodological challenges specific to the components of SAEI Non-response as a survey methodology issueI Use of weights in multilevel models as multilevel modeling issueI Complicated custom code

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 23 / 28

Page 26: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

The end

THANK YOU!

Questions, comments, requests: [email protected]

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 24 / 28

Page 27: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

References I

Battaglia, M. P., Eisenhower, D., Immerwahr, S. & Konty, K. (2010),Dual-frame weighting of RDD and cell phone interviews at the locallevel, in ‘Proceedings of Survey Research Methods Section’, TheAmerican Statistical Association, Alexandria, VA.

Blumberg, S. J., Ganesh, N., Luke, J. V. & Gonzales, G. (2013), Wirelesssubstitution: State-level estimates from the National Health InterviewSurvey, 2012, Technical Report 70, National Center for Health Statistics.

Blumberg, S. J., Luke, J. V., Davidson, G., Davern, M. E., Yu, T.-C. &Soderberg, K. (2009), Wireless substitution: State-level estimates fromthe National Health Interview Survey, january-december 2007, TechnicalReport 14, National Center for Health Statistics.

Chambers, R. & Tzavidis, N. (2006), ‘M-quantile models for small areaestimation’, Biometrika 93(2), 255–268.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 25 / 28

Page 28: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

References II

Datta, G. S. (2009), Model-based approach to small area estimation, inD. Pfeffermann & C. R. Rao, eds, ‘Sample Surveys: Inference andAnalysis’, Vol. 29B of Handbook of Statistics, North Holland,Amsterdam, pp. 251–288.

Fay, R. E. & Herriot, R. A. (1979), ‘Estimates of income for small places:An application of James-Stein procedures to census data’, Journal ofthe American Statistical Association 74(366), 269–277.

Ghosh, M., Natarajan, K., Stroud, T. W. F. & Carlin, B. P. (1998),‘Generalized linear models for small-area estimation’, Journal of theAmerican Statistical Association 93(441), 273–282.

Ghosh, M. & Rao, J. N. K. (1994), ‘Small area estimation: An appraisal’,Statistical Science 9(1), 55–76.

Jiang, J. & Lahiri, P. (2001), ‘Empirical best prediction for small areainference with binary data’, Annals of the Institute of StatisticalMathematics 53(2), 217–243.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 26 / 28

Page 29: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

References III

Lahiri, P. (2003), ‘On the impact of bootstrap in survey sampling andsmall-area estimation’, Statistical Science 18, 199–210.

Lehtonen, R. & Veijanen, A. (2009), Design-based methods for domainsand small areas, in D. Pfeffermann & C. R. Rao, eds, ‘Sample Surveys:Inference and Analysis’, Vol. 29B of Handbook of Statistics, NorthHolland, Amsterdam, pp. 219–249.

Pfeffermann, D. (2002), ‘Small area estimation—new developments anddirections’, International Statistical Review 70(1), 125–143.

Pfeffermann, D. (2013), ‘New important developments in small areaestimation’, Statistical Science 28(1), 40–68.

Raghunathan, T. E., Xie, D., Schenker, N., Parsons, V. L., Davis, W. W.,Dodd, K. W. & Feuer, E. J. (2007), ‘Combining information from twosurveys to estimate county-level prevalence rates of cancer risk factorsand screening’, Journal of the American Statistical Association102(478), 474–486.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 27 / 28

Page 30: Small Area Estimation: An Overvie€¦ · I De nitions of explanatory variables may not match in di erent data sets Very detailed levels of geography may be protected due to con dentiality

References IV

Rao, J. N. K. (1999), ‘Some recent advances in model-based small areaestimation’, Survey Methodology 25(2), 175–186.

Rao, J. N. K. (2003), Small Area Estimation, Wiley series in surveymethodology, John Wiley and Sons, New York.

Williamson, P., Birkin, M. & Rees, P. (1998), ‘The estimation ofpopulation microdata by using data from small area statistics andsample of anonymised records’, Environment and Planning Analysis30, 785–816.

Stas Kolenikov (Abt SRBI) Small Area Estimation: An Overview AAPOR 2014 28 / 28