STATISTICAL EXPERTISE & GENERAL RESEARCH TOPICS
CENTER FOR STATISTICAL RESEARCH & METHODOLOGY
Research & Methodology Directorate
U.S. Census Bureau



The Center for Statistical Research & Methodology reviews all research activities and results to ensure that Census Bureau Statistical Quality Standards are met and that:

each effort meets a business need of the Census Bureau (motivation, research problem(s), potential applications), including how it aligns with the Census Bureau’s strategic plan and the R&M Directorate portfolio management;

each effort is deeply grounded in the scientific method and the foundations of statistical science; and

each effort does not put at risk the Census Bureau’s mission and reputation.

To help the Census Bureau continuously improve its processes and data products, general research activity is undertaken in seven broad areas of statistical expertise and general research topics. The activities are supported primarily by the General Research Project and the Working Capital Fund.

Expertise for Collaboration and Research

1. Missing Data, Edit, and Imputation

2. Record Linkage

3. Small Area Estimation

4. Survey Sampling: Estimation and Modeling

5. Statistical Computing and Software

6. Time Series and Seasonal Adjustment

7. Experimentation, Simulation, and Modeling


Missing Data, Edit, and Imputation

Motivation: Missing data problems are endemic to the conduct of statistical experiments and data collection projects. Investigators almost never observe all the outcomes they set out to record. In sample surveys and censuses, this means that individuals or entities fail to respond, or provide only part of the information they are asked for. In addition, the information provided may be logically inconsistent, which is tantamount to missing. To compute official statistics, agencies need to compensate for missing data. Available techniques for compensation include cell adjustments, imputation, and editing, possibly aided by administrative information. All these techniques involve mathematical modeling along with subject matter experience.

Research Problem:

Compensating for missing data typically involves explicit or implicit modeling. Explicit methods include Bayesian multiple imputation and propensity score matching and can be aided by administrative records. Implicit methods revolve around donor-based techniques such as hot-deck imputation and predictive mean matching. All these techniques are subject to edit rules to ensure the logical consistency of the remedial product. Research on integrating statistical validity and logical requirements into the imputation process continues to be challenging. Another important problem is correctly quantifying the reliability of predictors built in part through imputation, as their variance can be substantially greater than that computed nominally.
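As an illustration of the donor-based techniques mentioned above, a minimal within-cell random hot deck can be sketched as follows. This is a toy sketch only; the record layout, field names, and adjustment-cell variables are hypothetical, not from any Census Bureau system, and a production system would also apply edit rules to the imputed values.

```python
import random

def hot_deck_impute(records, target, cell_vars, seed=0):
    """Within-cell random hot deck: fill each missing value of `target`
    with a value drawn at random from a responding donor record that
    falls in the same adjustment cell (same values of `cell_vars`)."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[target] is not None:
            donors.setdefault(tuple(r[v] for v in cell_vars), []).append(r[target])
    for r in records:
        if r[target] is None:
            pool = donors.get(tuple(r[v] for v in cell_vars))
            if pool:  # leave the value unresolved if the cell has no donor
                r[target] = rng.choice(pool)
    return records
```

For example, a record with a missing `age` in region "A" would receive the age of a randomly chosen respondent from region "A"; cells with no respondents are left missing for later treatment.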

Potential Applications:

Research on missing data leads to improved overall data quality and predictor accuracy for any census or sample survey with a substantial frequency of missing data. It also leads to methods that adjust the variance to reflect the additional uncertainty created by the missing data. Given the continuously rising cost of conducting censuses and sample surveys, imputation and other missing-data compensation methods aided by administrative records may, in the future, come to replace actual data collection in situations where collection is prohibitively expensive.
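The variance-adjustment idea can be made concrete with Rubin's combining rules for multiple imputation. This is a generic textbook sketch, not Census Bureau production code:

```python
from statistics import mean, variance

def rubin_combine(estimates, within_vars):
    """Combine M point estimates and their within-imputation variances
    from M completed (imputed) data sets using Rubin's rules."""
    M = len(estimates)
    q_bar = mean(estimates)        # combined point estimate
    W = mean(within_vars)          # average within-imputation variance
    B = variance(estimates)        # between-imputation variance
    T = W + (1 + 1 / M) * B        # total variance
    fmi = (1 + 1 / M) * B / T      # approximate fraction of missing information
    return q_bar, T, fmi
```

The total variance T exceeds the naive within-imputation variance W whenever the imputed data sets disagree, which is exactly the additional uncertainty described above; the fraction of missing information (FMI) summarizes how much of T is attributable to the missing data.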

Accomplishments (October 2013 – September 2014):

Researched classification techniques for determining sufficiently reliable administrative records for purposes of Census enumeration.

Investigated alternative classification techniques such as classification trees and random forests.

Researched topics such as multiple imputation hot deck via the Approximate Bayesian Bootstrap, sequential regression multiple imputation (SRMI), and properties of the Fraction of Missing Information (FMI).

Provided data-preparation and response propensity modeling code for use with data from the 2013 Census Test. The purpose of the test was to explore new methods for data collection in the 2020 Census nonresponse follow-up operation.

Successfully submitted a proposal for an edit and imputation system based on the CSRM open source software “Tea” to the governance of the American Community Survey Office.

Short-Term Activities (FY 2015):

Research responsive design options (mode switching and other responsive strategies) for the Survey of College Graduates and other surveys that use the ACS as a sampling frame.

Continue to research methods for utilizing administrative records in the decennial enumeration activities.

Continue supporting the integration of “Tea” in the survey processing activities of the American Community Survey and other surveys.

Continue developing and evaluating models for estimating mode preference and time-to-response.

Continue extending the propensity and effort formulae of the Mariano-Kadane model to similar parametric models.

Longer-Term Activities (beyond FY 2015):

Support the development and integration of a multifunction data processing system for the ACS and other demographic surveys through the open-source software Tea.

Research model-assisted and model-based estimators based on survey frame information, such as the business register and/or demographic administrative records.

Research methods for making decisions to mitigate nonresponse based on interpreting and modeling administrative information.

Develop general methods for predicting mode preference and accounting for mode effects in data collection and prediction.

Examine the effects of performing large-scale imputation in lieu of completing data collection in the Census Bureau’s largest sample surveys.


Selected Publications:

Erdman, C. and Adams, T. (2014). “Development of Interviewer Performance Standards for National Surveys,” Field Methods.

Klein, M. and Sinha, B. (2013). “Statistical Analysis of Noise Multiplied Data Using Multiple Imputation,” Journal of Official Statistics, 29, 425-465.

Klemens, B., Rodriguez, R., and Thibaudeau, Y. (2014). “Simultaneous Editing and Imputation for ACS Data,” ACS Work Request RS14-3-0094 (supporting research project report in progress).

Morris, D. S. (2014). “A Comparison of Methodologies for Classification of Administrative Records Quality for Census Enumeration,” 2014 Proceedings of the Joint Statistical Meetings (to appear).

Rodriguez, R. A. (2008). “Disclosure Avoidance for the American Community Survey Group Quarters Sample: Details of the Synthetic Data Method,” Research Report Series (Statistics #2008-01), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Rodriguez, R. A. (2007). “Synthetic Data Disclosure Control for American Community Survey Group Quarters,” Proceedings of the American Statistical Association, American Statistical Association, Alexandria, VA.

Thibaudeau, Y., Slud, E., and Gottschalck, A. O. (2011). “Modeling Log-Linear Conditional Probabilities for Prediction in Surveys,” Proceedings of the 2010 Joint Statistical Meetings, American Statistical Association, Alexandria, VA.

Thibaudeau, Y., Shao, J., and Mulrow, J. (2007). “A Study of Basic Calibration Estimators in Presence of Nonresponse,” Proceedings of the American Statistical Association, American Statistical Association, Alexandria, VA.

Thibaudeau, Y. (2002). “Model Explicit Item Imputation for Demographic Categories,” Survey Methodology, 28(2), 135-143.

Winkler, W. E. (2008). “General Methods and Algorithms for Imputing Discrete Data under a Variety of Constraints,” Research Report Series (Statistics #2008-08), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Winkler, W. and Garcia, M. (2009). “Determining a Set of Edits,” Research Report Series (Statistics #2009-05), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Contact: Yves Thibaudeau, Chandra Erdman, Maria Garcia, Martin Klein, Ben Klemens, Darcy Morris, Rolando Rodríguez, Joseph Schafer, Jun Shao, William Winkler.

Funding Sources for FY 2015: 0331 – Working Capital Fund

1871 – General Research

Various Decennial, Demographic, and Economic Projects



Record Linkage

Motivation: Record linkage is intrinsic to efficient, modern survey operations. It is used for unduplicating and updating name and address lists, and for applications such as matching and inserting addresses for geocoding, coverage measurement, the Primary Selection Algorithm during decennial processing, Business Register unduplication and updating, re-identification experiments verifying the confidentiality of public-use microdata files, and new applications with groups of administrative lists. Significant theoretical and algorithmic progress (Winkler 2006ab, 2008, 2009a, 2013b, 2014a, 2014b; Yancey 2005, 2006, 2007, 2011, 2013; Tokle 2013) demonstrates the potential for this research. For cleaning up administrative records files that need to be linked, theoretical and extreme computational results (Winkler 2010, 2011b) yield methods for editing, handling missing data, and even producing synthetic data with valid analytic properties and reduced or eliminated re-identification risk. Easy means of constructing synthetic data make it straightforward to pass files among groups.

Research Problems:

The research problems fall into three major categories. First, we need to develop effective ways of further automating our major record linkage operations; the software needs improvements for matching large sets of files with hundreds of millions of records against other large sets of files. Second, a key open research question is how to effectively and automatically estimate matching error rates. Third, we need to investigate how to develop effective statistical analysis tools for analyzing data from groups of administrative records when unique identifiers are not available. These methods need to show how to do correct demographic, economic, and statistical analyses in the presence of matching error.
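String comparators are a basic building block of these matching operations; the Jaro-Winkler comparator widely used in the record linkage literature (see the Yancey and Winkler publications below) can be sketched as follows. This is an illustrative reimplementation of the standard algorithm, not the BigMatch code:

```python
def jaro(s1, s2):
    """Jaro similarity: agreement on characters found within a sliding
    window, penalized for transpositions among the matched characters."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    match1, match2 = [False] * len1, [False] * len2
    m = 0
    for i, c in enumerate(s1):                     # count matching characters
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    t, k = 0, 0                                    # count transpositions
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (m / len1 + m / len2 + (m - t) / m) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Winkler adjustment: boost the score for a shared prefix (up to 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

On the classic example "MARTHA"/"MARHTA" the comparator returns roughly 0.961, reflecting a single transposition in otherwise identical names.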

Potential Applications:

Presently, the Census Bureau is contemplating or working on many projects involving record linkage. The projects encompass the Demographic, Economic, and Decennial areas.

Accomplishments (October 2013 – September 2014):

Produced some updates/revisions to the error-rate estimation software.

Wrote a chapter for a monograph on applications of record linkage to the health sciences; wrote a survey/overview article, “Matching and Record Linkage,” that appeared in WIREs Computational Statistics; and wrote the article “Cleanup and Analysis of Sets of National Files,” which appeared in the 2013 FCSM Proceedings.

Provided very extensive advice to three groups at Carnegie Mellon University and two groups at Statistics Canada who are working on record linkage methods.

Short-Term Activities (FY 2015):

Do additional speed tests of BigMatch with various Census files, including administrative files received in summer 2014.

Develop better automatic error-rate estimation procedures and software, particularly those that do not require truth data sets for calibration.

Apply CSRM/SRD methods of record linkage error-rate estimation to CARRA administrative data. Compare the methods to those of Layne and Wagner (2013), who applied Belin-Rubin (1995) methods to CARRA data. Also apply the methods to DSSD CCM data.

Longer-Term Activities (beyond FY 2015):

Improve the ability of the BigMatch software to use auxiliary data (particularly third-party administrative lists) to improve the matching of two files.

Develop methods for analyzing quantitative and other types of data from groups of administrative lists. These analytic linking methods have enormous potential for increasing the value of administrative lists and allowing analyses that heretofore were seemingly impossible.

Combine modeling/edit/imputation, statistical matching, and certain aspects of distribution theory for discrete data into models for adjusting statistical analyses for record linkage error.

Create new versions of the record linkage error-rate estimation methods and software.

Selected Publications:

Alvarez, M., Jonas, J., Winkler, W. E., and Wright, R. “Interstate Voter Registration Database Matching: The Oregon-Washington 2008 Pilot Project,” Electronic Voting Technology.

Herzog, T. N., Scheuren, F., and Winkler, W. E. (2007). Data Quality and Record Linkage Techniques, New York, NY: Springer.

Herzog, T. N., Scheuren, F., and Winkler, W. E. (2010). “Record Linkage,” in (Y. H. Said, D. W. Scott, and E. Wegman, eds.) Wiley Interdisciplinary Reviews: Computational Statistics.


Winkler, W. E. (2004a). “Approximate String Comparator Search Strategies for Very Large Administrative Lists,” Proceedings of the Section on Survey Research Methods, American Statistical Association, CD-ROM.

Winkler, W. E. (2004b). “Methods for Evaluating and Creating Data Quality,” Information Systems, 29(7), 531-550.

Winkler, W. E. (2006a). “Overview of Record Linkage and Current Research Directions,” Research Report Series (Statistics #2006-02), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Winkler, W. E. (2006b). “Automatically Estimating Record Linkage False-Match Rates without Training Data,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA, CD-ROM.

Winkler, W. E. (2008). “Data Quality in Data Warehouses,” in (J. Wang, ed.) Encyclopedia of Data Warehousing and Data Mining (2nd Edition).

Winkler, W. E. (2009a). “Record Linkage,” in (D. Pfeffermann and C. R. Rao, eds.) Sample Surveys: Theory, Methods and Inference, New York: North-Holland, 351-380.

Winkler, W. E. (2009b). “Should Social Security numbers be replaced by modern, more secure identifiers?,” Proceedings of the National Academy of Sciences.

Winkler, W. E. (2010). “General Discrete-data Modeling Methods for Creating Synthetic Data with Reduced Re-identification Risk that Preserve Analytic Properties,” http://www.census.gov/srd/papers/pdf/rrs2010-02.pdf.

Winkler, W. E. (2011). “Machine Learning and Record Linkage,” in Proceedings of the 2011 International Statistical Institute.

Winkler, W. E. (2012). “Cleaning and using administrative lists: Methods and fast computational algorithms for record linkage and modeling/editing/imputation,” Proceedings of the ESSnet Conference on Data Integration, Madrid, Spain, November 2011 (http://www.ine.es/e/essnetdi_ws2011/ppts/Winkler.pdf).

Winkler, W. E. (2013a). “Record Linkage,” in Encyclopedia of Environmetrics, J. Wiley.

Winkler, W. E. (2013b). “Methods for Adjusting Statistical Analyses for Record Linkage Error,” Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA (to appear).

Winkler, W. E., Yancey, W. E., and Porter, E. H. (2010). “Fast Record Linkage of Very Large Files in Support of Decennial and Administrative Records Projects,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA.

Yancey, W. E. (2005). “Evaluating String Comparator Performance for Record Linkage,” Research Report Series (Statistics #2005-05), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Yancey, W. E. (2006). “Evaluating String Comparator Performance for Record Linkage,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA, CD-ROM.

Yancey, W. E. (2007). “BigMatch: A Program for Extracting Probable Matches from a Large File,” Research Report Series (Computing #2007-01), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Yancey, W. E. (2011). “Probability Distributions for the Birthday and Collision Problems,” Research Report Series (Statistics), Center for Statistical Research & Methodology, U.S. Census Bureau, Washington, DC.

Yancey, W. E. (2013). “Methods of Computing Optimal Record Linkage Parameters,” Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA (to appear).

Contact: William E. Winkler, William E. Yancey, Edward H. Porter, Joshua Tokle

Funding Sources for FY 2015: 0331 – Working Capital Fund

1871 – General Research

Various Decennial, Demographic, and Economic Projects



Small Area Estimation

Motivation: Small area estimation is important in light of a continual demand by data users for finer geographic detail of published statistics. Traditional demographic sample surveys designed for national estimates do not provide large enough samples to produce reliable direct estimates for small areas such as counties and even most states. The use of valid statistical models can provide small area estimates with greater precision; however, bias due to an incorrect model or failure to account for informative sampling can result.
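The canonical area-level model in this literature is the Fay-Herriot model, which shrinks each direct survey estimate toward a regression prediction in proportion to its sampling variance. A minimal sketch follows, using the closed-form Prasad-Rao moment estimator of the model variance; the function name and estimator choice are illustrative, not a production implementation:

```python
import numpy as np

def fay_herriot(y, X, D):
    """EBLUP for the Fay-Herriot area-level model
         y_i = x_i' beta + u_i + e_i,  u_i ~ N(0, A),  e_i ~ N(0, D_i known),
    with A estimated by the Prasad-Rao moment estimator."""
    m, p = X.shape
    # OLS hat matrix, used only for the moment estimator of A
    H = X @ np.linalg.solve(X.T @ X, X.T)
    resid = y - H @ y
    A = max((resid @ resid - np.sum(D * (1 - np.diag(H)))) / (m - p), 0.0)
    # GLS regression coefficients given the estimated A
    w = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    gamma = A / (A + D)                       # shrinkage weights in [0, 1]
    eblup = gamma * y + (1 - gamma) * (X @ beta)
    return eblup, beta, A
```

Areas with large sampling variance D_i get small gamma_i and are pulled strongly toward the synthetic regression estimate, which is the precision gain (and the model-bias risk) described above.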

Research Problems:

Development/evaluation of multilevel random effects models for capture/recapture models.

Development of small area models to assess bias in synthetic estimates.

Development of expertise using nonparametric modeling methods as an adjunct to small area estimation models.

Development/evaluation of Bayesian methods to combine multiple models.

Development of models to improve design-based sampling variance estimates.

Extension of current univariate small-area models to handle multivariate outcomes.

Development of models to improve uncertainty estimates for design-based estimates near boundaries (e.g., counts near 0, rates near 0 or 1).

Potential Applications:

Development/evaluation of binary, random effects models for small area estimation in the presence of informative sampling cuts across many small area issues at the Census Bureau.

Using nonparametric techniques may help determine fixed effects and ascertain distributional form for random effects.

Improving the estimated design-based sampling variances leads to better small area models, which assume these sampling error variances are known.

For practical reasons, separate models are often developed for counties, states, etc. There is a need to coordinate the resulting estimates so that smaller levels sum up to larger ones in a way that correctly accounts for accuracy.

Extending small area models to estimators of design-based variance.

Accomplishments (October 2013 – September 2014):

Combined data from the American Community Survey (ACS) and Survey of Income and Program Participation (SIPP) to model estimates of disability at the county level.

Conducted a comprehensive simulation study to evaluate which model framework produces better confidence intervals for small rates (near 0%) when the sample size is small, under a variety of data generation mechanisms.

Created an artificial population from 5-year ACS microdata to assist in evaluating small area models that make estimates at the tract level.

Compared standard Fay-Herriot models with measurement error models and Observed Best Predictor models through a simulation study in which one of the predictors contains error.

Short-Term Activities (FY 2015):

Compare different methods of variance estimation for the disability models, which combine ACS and SIPP survey data.

Develop small area models for estimating specific characteristics of disability at the state and county level.

Continue evaluation of small area models for tract-level estimates of school-age children in poverty using the artificial population framework.

Expand the simulation study on the measurement error models and Observed Best Predictor model to understand which types of model misspecification favor one method over the other.

Longer-Term Activities (beyond FY 2015):

Compare area-level and unit-level small area models for estimates of disability.

Extend the small area models for design variances to account for the correlation between the direct estimate and its estimated variance, i.e., a joint small area model for the design-based point and variance estimates.

Modify the tract-level small area models of poverty in school children for school district geographies.

Examine the distribution of residuals from the Small Area Health Insurance Estimates production model.

Create a framework to enable year-to-year comparisons of small area estimates.

Develop a mixture model for the small area random effects to account for outliers/extreme local effects.

Selected Publications:

Datta, G., Ghosh, M., Steorts, R., and Maples, J. (2011). “Bayesian Benchmarking with Applications to Small Area Estimation,” TEST, 20(3), 574-588.


Huang, E., Malec, D., Maples, J., and Weidman, L. (2007). “American Community Survey (ACS) Variance Reduction of Small Areas via Coverage Adjustment Using an Administrative Records Match,” Proceedings of the 2006 Joint Statistical Meetings, American Statistical Association, Alexandria, VA, 3150-3152.

Janicki, R. (2011). “Selection of Prior Distributions for Multivariate Small Area Models with Application to Small Area Health Insurance Estimates,” JSM Proceedings, Government Statistics Section, American Statistical Association, Alexandria, VA.

Joyce, P. and Malec, D. (2009). “Population Estimation Using Tract Level Geography and Spatial Information,” Research Report Series (Statistics #2009-3), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Malec, D. (2005). “Small Area Estimation from the American Community Survey Using a Hierarchical Logistic Model of Persons and Housing Units,” Journal of Official Statistics, 21(3), 411-432.

Malec, D. and Maples, J. (2008). “Small Area Random Effects Models for Capture/Recapture Methods with Applications to Estimating Coverage Error in the U.S. Decennial Census,” Statistics in Medicine, 27, 4038-4056.

Malec, D. and Müller, P. (2008). “A Bayesian Semi-Parametric Model for Small Area Estimation,” in Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh (eds. S. Ghoshal and B. Clarke), Institute of Mathematical Statistics, 223-236.

Maples, J. and Bell, W. (2007). “Small Area Estimation of School District Child Population and Poverty: Studying Use of IRS Income Tax Data,” Research Report Series (Statistics #2007-11), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Maples, J. (2011). “Using Small-Area Models to Improve the Design-Based Estimates of Variance for County Level Poverty Rate Estimates in the American Community Survey,” Research Report Series (Statistics #2011-02), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, DC.

Slud, E. and Maiti, T. (2006). “Mean-Squared Error Estimation in Transformed Fay-Herriot Models,” Journal of the Royal Statistical Society Series B, 239-257.

Slud, E. and Maiti, T. (2011). “Small-Area Estimation Based on Survey Data from Left-Censored Fay-Herriot Model,” Journal of Statistical Planning & Inference, 3520-3535.

Contact: Jerry Maples, Aaron Gilary, Ryan Janicki, Jiashen You, Gauri Datta, Bill Bell (R&M), Eric Slud

Funding Sources for FY 2015: 0331 – Working Capital Fund

1871 – General Research

Various Decennial, Demographic, and Economic Projects



Survey Sampling: Estimation and Modeling

Motivation: Survey sampling helps the Census Bureau provide timely and cost-efficient estimates of population characteristics. Demographic sample surveys estimate characteristics of people or households such as employment, income, poverty, health insurance coverage, educational attainment, or crime victimization. Economic sample surveys estimate characteristics of businesses such as payroll, number of employees, production, sales, revenue, or inventory. Survey sampling also helps the Census Bureau assess the quality of each decennial census. Estimates are produced by use of design-based or model-based estimation techniques. Methods and topics across the three program areas (Demographic, Economic, and Decennial) include: sample design; estimation and use of auxiliary information (e.g., sampling frame and administrative records); weighting methodology; adjustments for nonresponse; proper use of population estimates as weighting controls; variance estimation; effects of imputation on variances; coverage measurement sampling and estimation; coverage measurement evaluation; evaluation of census operations; uses of administrative records in census operations; improvement in census processing; and analyses that aid in increasing census response.
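The design-based estimation techniques mentioned above rest on inverse-probability weighting; the basic Horvitz-Thompson estimator of a population total is a one-line illustration (a textbook sketch, not Census Bureau code):

```python
def horvitz_thompson_total(y, pi):
    """Design-based (Horvitz-Thompson) estimate of a population total:
    each sampled value is weighted by the inverse of its inclusion
    probability, so units that were unlikely to be sampled stand in
    for a larger share of the population."""
    return sum(y_i / p_i for y_i, p_i in zip(y, pi))
```

For example, with sampled values [10, 20] and inclusion probabilities [0.5, 0.25], the estimated total is 10/0.5 + 20/0.25 = 100. The design weights 1/pi_i are the base weights that the nonresponse and calibration adjustments discussed in this section then modify.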

Research Problems:

What are important areas of research in design-based estimation, model-based estimation, and model-assisted estimation that can benefit Census Bureau data programs?

How can administrative records, supported by new research on matched survey and administrative lists, be used to increase the efficiency of censuses and sample surveys?

Can non-traditional design methods such as adaptive sampling be used to improve estimation for rare characteristics and populations?

How can time series and spatial methods be used to improve ACS estimates or explain patterns in the data?

Can generalized weighting methods be formulated and solved as optimization problems to avoid the ambiguities resulting from multiple weighting steps and to explicitly allow inexact calibration?

How can we detect and adjust for outliers and influential sample values to improve sample survey estimates?

What models can aid in assessing the combined effect of all the sources of sampling and nonsampling error, including frame coverage errors and measurement errors, on sample survey estimates?

What experiments and analyses can inform the development of outreach methods to enhance census response?

Can unduplication and matching errors be accounted for in modeling frame coverage in censuses and sample surveys?

What additional information could have been obtained if deleted census persons and housing units had been part of the Census Coverage Measurement (CCM) survey?

How can small-area or other model-based methods be used to improve interval estimates in sample surveys, to design survey collection methods with lowered costs, or to improve Census Bureau imputation methods?

Can classical methods in nonparametrics (e.g., using ranks) improve estimates from sample surveys?

How can we measure and present uncertainty in rankings of units based on sample survey estimates?

Can Big Data improve results from censuses and sample surveys?

Potential Applications:

Improve estimates and reduce costs for household surveys by introducing new design and estimation methods.

Produce improved ACS small area estimates through the use of time series and spatial methods.

Streamline documentation and make weighting methodology more transparent by applying the same nonresponse and calibration weighting adjustment software across different surveys.

New procedures for adjusting weights or reported values in the monthly trade surveys and surveys of government employment, based on statistical identification of outliers and influential values, to improve accuracy of estimation of monthly levels and of month-to-month change.

Provide a synthesis of the effect of nonsampling errors on estimates of net census coverage error, erroneous enumerations, and omissions, and identify the types of nonsampling errors that have the greatest effects. Employ administrative records to improve the estimates of census coverage error.

Use American Community Survey (ACS) data to assess the uncertainty in estimates of foreign-born immigration used by Demographic Analysis (DA) and the Postcensal Estimates Program (PEP) to estimate population size.

Target outreach resources via models to improve the mail response rate in censuses and thereby reduce costs.

Provide a unified computer matching system that can be used with appropriate parameter settings for both the Decennial Census and several Decennial-related evaluations. Reduce decennial census errors by optimally balancing rates of detected and removed census duplicates with rates of coincidental matches.


Accomplishments (October 2013 – September 2014): Analyzed 2012 ACS CAPI data to inform policy changes in curtailing CAPI contact attempts to minimize respondent burden

with minimal loss of completed interviews.

Refined methodology for implementing an M-estimation algorithm for detecting and treating influential values in economic

surveys—with minimal false detections—that is widely applicable in the presence of seasonal effects.

Examined results of the latest nationwide matching of the 2010 Census Unedited File (CUF) against itself using methodology

similar to that used for the across response and within response matching in the 2010 Duplicate Person Identification (DPI)

process. Examined patterns of distances between units linked in the most recent matching of the 2010 CUF against itself.

Developed a method to investigate the net effect of recall error in the reporting of move month using data with survey

responses linked to a commercial database of administrative records.

Developed a new time series small area estimation model appropriate for SAIPE and applied it to ACS data.

Designed and implemented a multi-factor simulation to assess the performance of several competing types of confidence

intervals for proportions in complex surveys such as ACS.

Compared the performance of time series, bivariate, and univariate small area estimation models fitted through a hierarchical

Bayes approach both empirically based on SAIPE data, and through simulations clarifying the differences in inferences from

these models in a variety of scenarios.

Assessed the impact of ignoring measurement error in a Binomial Logit Normal Model applied to SAIPE county school-aged

children in poverty rate estimates, as opposed to accounting for the (known) measurement error across counties through a

bivariate model.

Generalized an exact method of optimal sample allocation; demonstrated that it is more efficient than Neyman Allocation; and

applied the generalized method to data from the Service Annual Survey.

Extended methodology for ranking populations based on data from sample surveys and developed some visualizations.
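
The exact optimal allocation work noted above can be illustrated with a greedy priority-value sketch. The stratum sizes and standard deviations below are hypothetical, and this is an illustration of the idea rather than the Bureau's production method; the greedy step is exactly optimal because the marginal variance reduction of giving stratum h its (k+1)-st unit is N_h^2 S_h^2 / (k(k+1)), a decreasing sequence.

```python
import math

def exact_allocation(N, S, n):
    """Greedy exact allocation: minimizes sum_h N_h^2 S_h^2 / n_h subject to
    sum_h n_h = n and n_h >= 1. Repeatedly grant the next unit to the stratum
    with the largest remaining marginal gain N_h S_h / sqrt(k(k+1))."""
    H = len(N)
    alloc = [1] * H                      # minimum of one unit per stratum
    for _ in range(n - H):
        h = max(range(H), key=lambda i: N[i] * S[i]
                / math.sqrt(alloc[i] * (alloc[i] + 1)))
        alloc[h] += 1
    return alloc

def neyman_rounded(N, S, n):
    """Classical Neyman allocation n_h proportional to N_h S_h, rounded and
    then patched so the stratum sizes sum to n."""
    total = sum(Ni * Si for Ni, Si in zip(N, S))
    alloc = [max(1, round(n * Ni * Si / total)) for Ni, Si in zip(N, S)]
    while sum(alloc) > n:
        alloc[alloc.index(max(alloc))] -= 1
    while sum(alloc) < n:
        alloc[alloc.index(min(alloc))] += 1
    return alloc

def variance_term(N, S, alloc):
    """Allocation-dependent part of the stratified estimator's variance."""
    return sum(Ni ** 2 * Si ** 2 / nh for Ni, Si, nh in zip(N, S, alloc))

# Hypothetical strata: sizes N_h and standard deviations S_h
N = [500, 300, 150, 50]
S = [6.0, 10.0, 25.0, 80.0]
exact = exact_allocation(N, S, 100)
rounded = neyman_rounded(N, S, 100)
```

By construction the greedy (exact) integer allocation can never have larger variance than a rounded Neyman allocation of the same total sample size.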

Short-Term Activities (FY 2015):

Code general-purpose software for weight adjustment for nonresponse and frame errors via calibration with some inexact

constraints using quadratic programming to enforce bounds on multiplicative weight changes.

Compare the relative accuracy of occupancy status and responses for housing units enumerated in the 2015 Census

Nonresponse Followup versus the administrative records, using data from the 2010 Census Coverage Measurement Program.

Analyze the effects of outreach on response to censuses and sample surveys.

Develop methods of model-based imputation based on multivariate nonparametric-Bayes and copula methods.

Explore the possibility of replacing the confidence intervals used in ACS with another direct method.

Continue examining results of the latest nationwide matching of the 2010 CUF against itself using methodology similar to that

used on the 2010 DPI process. Begin considering the effects of alternative procedures including refining the use of

geographic distance to help evaluate links.

Examine the relationship between potential covariates predictive of household size, both for Maricopa County and nationally.

Further investigate the potential benefit of using small area time series estimation models on ACS data.

Continue to study properties of the exact optimal allocation method and its application.

Continue to investigate ranking methods based on sample survey estimates.
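
The calibration item above (weight adjustment with some inexact constraints and bounds on multiplicative weight changes) can be sketched as a single bound-constrained least-squares problem. All data, control totals, and penalty values below are hypothetical; the inexact benchmark constraints enter as a quadratic penalty rather than hard equalities.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)

# Hypothetical respondent data: design weights d and calibration variables X
n = 200
d = rng.uniform(50.0, 150.0, n)
X = np.column_stack([np.ones(n),                   # total population
                     rng.integers(0, 2, n),        # a demographic indicator
                     rng.uniform(18.0, 90.0, n)])  # e.g., age
targets = np.array([21000.0, 10500.0, 1.05e6])     # hypothetical control totals

lam = 1e-4       # small penalty: the benchmark constraints are only "inexact"
L, U = 0.5, 2.0  # bounds on the multiplicative weight change w_i / d_i

# Minimize ||w/d - 1||^2 + lam * ||X'w - targets||^2  s.t.  L*d <= w <= U*d,
# stacked as one bound-constrained least-squares problem
A = np.vstack([np.diag(1.0 / d), np.sqrt(lam) * X.T])
b = np.concatenate([np.ones(n), np.sqrt(lam) * targets])
w = lsq_linear(A, b, bounds=(L * d, U * d)).x
```

Because the design weights themselves are feasible, the adjusted weights are guaranteed to fit the control totals at least as closely as the unadjusted weights while every weight ratio stays inside the prescribed bounds.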

Longer-Term Activities (beyond FY 2015):

Investigate the properties and applicability of alternative design and estimation methods for household sample surveys: both

design-based and model-based.

Research and evaluate relevant methodologies, techniques, and technologies to improve matching in the 2020 Census

operations and related evaluations. Use the results of this research to propose a revised matching methodology for these

operations.

Expand the search for methods appropriate for estimation methodology in the presence of influential values in sample surveys.

Develop methodology for estimation and for assessing the quality of estimates using multiple sources of data, such as multiple

sample surveys, administrative records systems, and/or a census.

Develop methods for estimating the uncertainty due to nonsampling errors in estimates from censuses, sample surveys, and

administrative records.

Investigate the relative accuracy of occupancy status and responses for housing units enumerated in the 2015 Census Test

Nonresponse Followup with the accuracy of the administrative records using data collected in the 2015 Evaluation Followup.

Develop statistical models for using administrative records (AR) in the 2020 Census NRFU process. Help use these models

and other information to determine the most effective use of AR in the NRFU to reduce costs while maintaining quality.
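
One classical building block for the multiple-data-source estimation goal above is precision-weighted combination of independent estimates of the same quantity. The numbers below are purely illustrative, not drawn from any Census Bureau program.

```python
import numpy as np

def combine(estimates, variances):
    """Precision-weighted (inverse-variance) combination of independent,
    unbiased estimates of the same quantity -- a classical building block
    when pooling, e.g., a survey estimate with an administrative-records
    estimate. Returns the combined estimate and its variance."""
    w = 1.0 / np.asarray(variances, dtype=float)
    est = float(np.sum(w * np.asarray(estimates, dtype=float)) / w.sum())
    return est, float(1.0 / w.sum())

# Illustrative numbers only: a survey estimate (variance 4) and an
# administrative-records estimate (variance 16) of the same total
est, var = combine([103.0, 98.0], [4.0, 16.0])
```

The combined variance is always smaller than the variance of either input, which is the basic appeal of pooling sources; assessing the quality of such combined estimates when the inputs carry nonsampling error is the harder research problem described above.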

Expand the applicability of a realistic simulation utility for ACS, based on a frame constructed from census and ACS data, for

the purpose of ACS estimation methods development, including methods for reporting confidence intervals and for small

area estimation.

Investigate ranking methods based on sample survey estimates as well as the use of ranks in sample surveys.

Selected Publications:

Bates, N. and Mulry, M. H. (2012). “Did the 2010 Census Social Marketing Campaign Shift Public Mindsets?” Proceedings of

the Joint Statistical Meetings, American Statistical Association, Alexandria, VA, 5258-5271.

Bates, N. and Mulry, M. H. (2011). “Using a Geographic Segmentation to Understand, Predict, and Plan for Census and Survey

Mail Nonresponse.” Journal of Official Statistics, 27(4).

Franco, C. and Bell, W. (2013). “Applying Bivariate/Logit Normal Models to Small Area Estimation,” Proceedings of Survey

Research Methods Section, American Statistical Association, Alexandria, VA.

Franco, C., Little, R. J. A., Louis, T. A., Slud, E. V. (2014) “Coverage Properties of Confidence Intervals for Proportions in

Complex Sample Surveys,” Proceedings of Survey Research Methods Section, American Statistical Association, Alexandria,

VA.

Griffin, D., Slud, E. and Erdman, C. (2014). “Reducing Respondent Burden in the American Community Survey's Computer

Assisted Personal Visit Interviewing Operation - Phase 3 Results,” ACS Research and Evaluation Memorandum #ACS 14-

RER-28.

Hogan, H. and Mulry, M. H. (In Press). “Assessing Accuracy of Postcensal Estimates: Statistical Properties of Different

Measures,” in N. Hogue (Ed.), Emerging Techniques in Applied Demography. Springer. New York.

Hunley, P. (2014). “Proof of Equivalence of Webster’s Method and Willcox’s Method of Major Fractions,” Research Report

Series (Statistics #2014-04), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.

Ikeda, M. and Porter, N. (2007). “Initial Results from a Nationwide BigMatch Matching of 2000 Census Data.” Research Report

Series (Statistics #2007-22), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Ikeda, M. and Porter, N. (2008). “Additional Results from a Nationwide Matching of 2000 Census Data.” Research Report Series

(Statistics #2008-02), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Ikeda, M., Tsay, J., and Weidman, L. (2012). “Exploratory Analysis of the Differences in American Community Survey

Respondent Characteristics Between the Mandatory and Voluntary Response Methods,” Research Report Series (Statistics

#2012-01), Center for Statistical Research & Methodology, U.S. Census Bureau, Washington, DC.

Joyce, P. and Malec, D. (2009). “Population Estimation Using Tract Level Geography and Spatial Information,” Research Report

Series (Statistics #2009-03), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Joyce, P., Malec, D., Little, R. and Gilary, A. (2012). “Statistical Modeling Methodology for the Voting Rights Act Section 203

Language Assistance Determinations,” Research Report Series (Statistics #2012-02), Center for Statistical Research &

Methodology, U.S. Census Bureau, Washington, DC.

Klein, M. and Wright, T. (2011). “Ranking Procedures for Several Normal Populations: An Empirical Investigation,”

International Journal of Statistical Sciences, Volume 11 (P.C. Mahalanobis Memorial Special Issue), 37-58.

Mulry, M. H. (2014). “Measuring Undercounts in Hard-to-Survey Groups,” in R. Tourangeau, N. Bates, B. Edwards, T. Johnson,

and K. Wolter (Eds.), Hard-to-Survey Populations. Cambridge University Press, Cambridge, England.

Mulry, M. H., Nichols, E. M., and Childs, J. H. (2014) “Study of Error in Survey Reports of Move Month Using the

U.S. Postal Service Change of Address Records.” 2014 Proceedings of the Joint Statistical Meetings. American Statistical

Association.

Mulry, M. H., Oliver, B. E. and Kaputa, S. J. (2014) “Detecting and Treating Verified Influential Values in a Monthly Retail

Trade Survey.” Journal of Official Statistics, 30(4), 1-28.

Mulry, M. H., Oliver, B., and Kaputa, S. (2012). “Study of Treatment of Influential Values in a Monthly Retail Trade Survey,”

Proceedings of the Fourth International Conference on Establishment Surveys. American Statistical Association. Alexandria,

VA.

Mulry, M. H. and Spencer, B. D. (2011). “Designing Estimators of Nonsampling Errors in Estimates of Components of Census

Coverage Error,” Proceedings of Survey Research Methods Section, American Statistical Association, Alexandria, VA.

Mulry, M. H. and Olson, T. P. (2011). “Analyses for the U.S. 2010 Census Partnership Program.” Social Marketing Quarterly.

17(1), 27-55.

Shao, J., Slud, E., Cheng, Y., Wang, S. and Hogue, C. (2014). “Theoretical and Empirical Properties of Model Assisted Decision-

Based Regression Estimators,” Survey Methodology 40(1), 81-104.

Slud, E. and Erdman, C. (2013) Adaptive curtailment of survey followup based on contact history data. FCSM Proceedings paper,

http://fcsm.sites.usa.gov/files/2014/05/B1_Slud_2013FCSM.pdf

Slud, E., Grieves, C. and Rottach, R. (2013). “Single Stage Generalized Raking Weight Adjustment in the Current Population

Survey,” Proceedings of Survey Research Methods Section, American Statistical Association, Alexandria, VA.

Slud, E. and Thibaudeau, Y. (2010). “Simultaneous Calibration and Nonresponse Adjustment,” Research Report Series (Statistics

#2010-03), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Slud, E. (2012). “Assessment of Zeroes in Survey-Estimated Tables via Small-Area Confidence Bounds”, Journal of the Indian

Society for Agricultural Statistics, Special Issue on Small Area Estimation, 66, 157-169.

Slud, E. (2014), “Modeling Frame Deficiencies for Improved Calibrations,” Proceedings of Survey Research Methods Section,

American Statistical Association, Alexandria, VA.

Wieczorek, J. and Franco, C. (2013). “An Empirical Artificial Population and Sampling Design for Small-Area Model

Evaluations,” Proceedings of Survey Research Methods Section, American Statistical Association, Alexandria, VA.

Wright, T. (2012). “The Equivalence of Neyman Optimum Allocation for Sampling and Equal Proportions for Apportioning the

U.S. House of Representatives,” The American Statistician, 66 (4), 217-224.

Wright, T. (2013). “A Visual Proof, a Test, and an Extension of a Simple Tool for Comparing Competing Estimates,” Research

Report Series (Statistics #2013-05), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, DC.

Wright, T., Klein, M., and Wieczorek, J. (2013). “An Overview of Some Concepts for Potential Use in Ranking Populations

Based on Sample Survey Data,” 2013 Proceedings of the World Congress of Statistics (Hong Kong), International Statistical

Institute.

Wright, T. (2014). “A Simple Method of Exact Optimal Sample Allocation under Stratification with Any Mixed Constraint

Patterns,” Research Report Series (Statistics #2014-07), Center for Statistical Research and Methodology, U.S. Census

Bureau, Washington, DC.

Wright, T. (2014). “Lagrange’s Identity and Congressional Apportionment,” The American Mathematical Monthly, 121, 523-528.

Contact: Eric Slud, Mary Mulry, Michael Ikeda, Carolina Franco, Patrick Joyce, Martin Klein, Ned Porter, Tommy Wright

Funding Sources for FY 2015: 0331 – Working Capital Fund

1871 – General Research

Various Decennial, Demographic, and Economic Projects

Statistical Computing and Software

Motivation: Modern statistics and computing go hand in hand, and new statistical methods need to be implemented in software

to be broadly adopted. The focus of this research area is to develop general purpose software using sound statistical methods that

can be used in a variety of Census Bureau applications. These application areas include: survey processing (editing, imputation,

non-response adjustment, calibration and estimation); record linkage; disclosure methods; time series and seasonal adjustment;

variance estimation; small-area estimation; and data visualization, exploratory data analysis, and graphics. Also see the other

sections in this document for more detail on some of the topics.

Research Problems:

Investigate the current best and new statistical methods for each application.

Investigate alternative algorithms for statistical methods.

Determine how to best implement the statistical algorithms in software.

Potential Applications:

Anywhere in the Census Bureau statistical software is used.

Accomplishments (October 2013 – September 2014):

Revised and enhanced X-13ARIMA-SEATS to include new seasonality diagnostics and improvements in the accessible HTML

output.

Developed a milestone version of the Apophenia software for generalized statistical modeling.

Developed new methods for distribution of Tea.

Improved SAHIE small area estimation software.

Constructed an artificial population of firms for simulation of the Monthly Wholesale Trade Survey.

Short-Term Activities (FY 2015):

Continue to update the model-based seasonal adjustment module within X-13ARIMA-SEATS while supporting empirical studies of model-based seasonal adjustment of Census Bureau series. Improve automatic model identification procedures.

Rewrite the SAHIE implementation.

Test Tea against ACS and Census data procedures.

Develop procedures for copula-based multivariate imputation.

Continue to maintain, document, and develop BigMatch.
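
The copula-based multivariate imputation item above can be sketched with a simple Gaussian-copula conditional draw. The data below are simulated, and this two-variable sketch only illustrates the idea (map margins to normal scores, model dependence on the latent scale, draw, and back-transform); production methods would be multivariate and more careful about margin estimation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated data: a skewed variable y related to x, with y missing at random
n = 500
x = rng.normal(size=n)
y_true = np.exp(0.8 * x + rng.normal(scale=0.6, size=n))
miss = rng.random(n) < 0.3
y = y_true.copy()
y[miss] = np.nan
obs = ~miss

def normal_scores(v):
    """Map a margin to normal scores via ranks (the Gaussian copula transform)."""
    return stats.norm.ppf(stats.rankdata(v) / (len(v) + 1))

zx = normal_scores(x)
zy = np.full(n, np.nan)
zy[obs] = normal_scores(y[obs])

# Copula correlation from complete pairs, then draw latent scores for the
# missing cases from the conditional normal and back-transform through the
# observed empirical quantiles of y
rho = np.corrcoef(zx[obs], zy[obs])[0, 1]
z = rho * zx[miss] + np.sqrt(1 - rho ** 2) * rng.normal(size=miss.sum())
y_imp = y.copy()
y_imp[miss] = np.quantile(y[obs], stats.norm.cdf(z))
```

A design note: because the back-transform uses the observed empirical quantiles, imputed values automatically respect the observed margin (skewness, support) without any parametric assumption on y.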

Longer-Term Activities (beyond FY 2015):

Incorporate matching procedures into Tea.

Roll out the new high-performance computing environment.

Develop software tools for Bayesian survey analysis.

Develop expertise in data visualization and machine learning.

Develop package for aggregating multiple data sources for small-area estimation.

Continue development of X-13ARIMA-SEATS needed to publish SEATS seasonal adjustments of selected Census Bureau series and re-engineer the software around a more modern computing paradigm.

Investigate speed and stability of likelihood optimization in X-13 ARIMA-SEATS.

Contact: Joe Schafer, Yves Thibaudeau, Bill Winkler, Brian Monsell, Ben Klemens, Rolando Rodríguez, Chad Russell, Ned

Porter, Tom Petkunas, Isaac Dompreh

Funding Sources for FY 2015: 0331 – Working Capital Fund

1871 – General Research

Various Decennial, Demographic, and Economic Projects

Time Series and Seasonal Adjustment

Motivation: Seasonal adjustment is vital to the effective presentation of data collected from monthly and quarterly economic

sample surveys by the Census Bureau and by other statistical agencies around the world. As the developer of the X-13ARIMA-

SEATS Seasonal Adjustment Program, which has become a world standard, it is important for the Census Bureau to maintain an

ongoing program of research related to seasonal adjustment methods and diagnostics, in order to keep X-13ARIMA-SEATS up-to-

date and to improve how seasonal adjustment is done at the Census Bureau.

Research Problems:

All contemporary seasonal adjustment programs of interest depend heavily on time series models for trading day and calendar

effect estimation, for modeling abrupt changes in the trend, for providing required forecasts, and, in some cases, for the

seasonal adjustment calculations. Better methods are needed for automatic model selection, for detection of inadequate

models, and for assessing the uncertainty in modeling results due to model selection, outlier identification and non-normality.

Also, new models are needed for complex holiday and calendar effects.

Need better diagnostics and measures of estimation and adjustment quality, e.g., for model-based seasonal adjustment.

Develop a viable multivariate seasonal adjustment methodology that can handle a large number of series, such that the

aggregation constraints of multiple time series can be directly accounted for.

Develop and extend seasonal adjustment methods to time series observed at higher frequencies, e.g., weekly data.
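
The automatic model selection problem above can be sketched, in toy form, as an information-criterion search over candidate autoregressive orders. The series is simulated; this is a deliberately minimal stand-in for the automatic ARIMA identification in X-13ARIMA-SEATS, which also handles differencing, seasonal terms, outliers, and regression effects.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy monthly-like series with known AR(2) dynamics
n = 400
y = np.zeros(n)
e = rng.normal(size=n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]

def ar_aic(y, p):
    """AIC of an OLS-fitted AR(p): n*log(RSS/n) + 2*(p + 1).
    (A toy criterion for illustration only.)"""
    Y = y[p:]
    X = np.column_stack([y[p - k: len(y) - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    n_eff = len(Y)
    return n_eff * np.log(resid @ resid / n_eff) + 2 * (p + 1)

scores = {p: ar_aic(y, p) for p in range(1, 7)}
best_p = min(scores, key=scores.get)
```

The penalty term is what keeps the search from always preferring the largest model: each extra lag must reduce the residual sum of squares enough to pay its two-point AIC cost.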

Potential Applications:

Potential applications encompass the Decennial, Demographic, and Economic areas.

Accomplishments (October 2013 – September 2014):

Revised X-13ARIMA-SEATS several times and updated the version of SEATS incorporated into the software. Developed

enhancements that include new seasonality diagnostics and improvements in the accessible HTML output. This version was

released to and tested by the Economic Directorate.

Began empirical studies of model-based seasonal adjustment of Census Bureau series, including comparing the average

revisions of SEATS and X-11 seasonal adjustments generated by X-13ARIMA-SEATS.

Developed multivariate seasonal adjustment methodology, algorithms, and software, and tested them on retail and construction series.

Continued methodological research into the Visual Significance criterion for spectral peak detection, obtaining derivations of asymptotic distributions as well as the linkage between correlation behavior and spectral shape.
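
A simplified sketch of a peak-detection rule in the spirit of the Visual Significance criterion: compute the periodogram in decibels and flag an ordinate whose height exceeds both neighbors by a fixed fraction of the plot range. The series and the threshold fraction are illustrative; the production X-13ARIMA-SEATS criterion differs in its details.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy monthly series: strong seasonal sinusoid plus white noise
n = 240
t = np.arange(n)
y = 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(size=n)
y = y - y.mean()

# Periodogram in decibels (dropping the zero frequency), as spectra are plotted
f = np.fft.rfftfreq(n)[1:]                    # cycles per month
db = 10 * np.log10(np.abs(np.fft.rfft(y))[1:] ** 2 / n)

def visually_significant(db, k, frac=6 / 52):
    """Flag ordinate k as a peak if it exceeds both neighbors by at least
    `frac` of the plot's vertical range (an illustrative threshold)."""
    thresh = frac * (db.max() - db.min())
    return db[k] - db[k - 1] >= thresh and db[k] - db[k + 1] >= thresh

k12 = int(np.argmin(np.abs(f - 1 / 12)))      # the seasonal fundamental
```

Working on the decibel scale matters: the rule is meant to mimic what an analyst would judge "visually significant" on a log-scale spectral plot, not a formal hypothesis test.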

Short-Term Activities (FY 2015):

Develop and apply non-lattice spatial models to reduce sampling variability in small area estimates in surveys, such as the American Community Survey.

Continue to update the model-based seasonal adjustment module within X-13ARIMA-SEATS, while supporting empirical studies of model-based seasonal adjustment of Census Bureau series.

Update and improve automatic model identification procedures within the X-13ARIMA-SEATS program.

Expand research on multivariate seasonal adjustment, seasonal long memory modeling of time series, and spectral peak detection.

Investigate the speed and stability of likelihood optimization in X-13ARIMA-SEATS.

Longer-Term Activities (beyond FY 2015):

Continue development of X-13ARIMA-SEATS needed to publish SEATS seasonal adjustments of selected Census Bureau series

and re-engineer the software around a more modern computing paradigm.

Develop models that allow for heavy-tailed marginal distributions, as a way to handle outliers and extreme values in seasonal

adjustment algorithms.

Improve seasonal adjustment and forecasting by explicit modeling of sampling and non-sampling error.

Investigate seasonal adjustment arising from seasonally heteroscedastic models.

Study the modeling of seasonal time series measured at higher sampling frequencies, such as weekly and daily frequencies.
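
The heavy-tailed marginal idea above can be illustrated by fitting a Student-t distribution to contaminated data: the heavy tail absorbs gross outliers, so the fitted location stays near the bulk of the data while the sample mean is pulled toward the contamination. All values are simulated, and this univariate fit is only a stand-in for heavy-tailed time series models.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Gaussian data contaminated by a handful of gross outliers, as can happen
# with extreme values in unadjusted series (all values simulated)
x = np.concatenate([rng.normal(0.0, 1.0, 900), np.full(45, 20.0)])

# MLE of a Student-t marginal: df, location, and scale fitted jointly
df_hat, loc_hat, scale_hat = stats.t.fit(x)
```

The fitted degrees of freedom end up small, which is exactly the mechanism by which such models downweight extremes inside a seasonal adjustment algorithm.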

Selected Publications:

Alexandrov, T., Bianconcini, S., Dagum, E., Maass, P., and McElroy, T. (2012). “The Review of Some Modern Approaches to

the Problem of Trend Extraction,” Econometric Reviews 31, 593-624.

Bell, W., Holan, S., and McElroy, T. (2012). Economic Time Series: Modeling and Seasonality. New York: Chapman Hall.

Blakely, C. (2012). “Extracting Intrinsic Modes in Stationary and Nonstationary Time Series Using Reproducing Kernels and

Quadratic Programming,” International Journal of Computational Methods, Vol. 8, No. 3.

Findley, D. F. (2013). “Model-Based Seasonal Adjustment Made Concrete with the First Order Seasonal Autoregressive Model,”

Center for Statistical Research & Methodology, Research Report Series (Statistics #2013-04). U.S. Census Bureau,

Washington, DC.

Findley, D. F. and Lytras, D. P. (2012). “Properties of an F-Test and Spectral Diagnostics for Detecting Seasonality,”

Proceedings of the 2012 Joint Statistical Meetings. American Statistical Association, Alexandria, VA.

Findley, D. F., Monsell, B. C., and Hou, C.-T. (2012). “Stock Series Holiday Regressors Generated from Flow Series Holiday

Regressors,” Taiwan Economic Forecast and Policy.

Holan, S. and McElroy, T. (2012). “On the Seasonal Adjustment of Long Memory Time Series,” in Economic Time Series:

Modeling and Seasonality. Chapman-Hall.

Jach, A., McElroy, T., and Politis, D. (2012). "Subsampling Inference for the Mean of Heavy-tailed Long Memory Time Series,"

Journal of Time Series Analysis, 33, 96-111.

McElroy, T. (2009). “Incompatibility of Trends in Multi-Year Estimates from the American Community Survey,” The Annals of

Applied Statistics, 3(4), 1493-1504.

McElroy, T. (2010). “A Nonlinear Algorithm for Seasonal Adjustment in Multiplicative Component Decompositions,” Studies

in Nonlinear Dynamics and Econometrics, 14, No. 4, Article 6.

McElroy, T. (2011). “A Nonparametric Method for Asymmetrically Extending Signal Extraction Filters,” Journal of

Forecasting, 30, 597-621.

McElroy, T. (2012) . “The Perils of Inferring Serial Dependence from Sample Autocorrelation of Moving Average Series,”

Statistics and Probability Letters, 82, 1632-1636.

McElroy, T. (2012). “An Alternative Model-based Seasonal Adjustment that Reduces Over-Adjustment,” Taiwan Economic

Forecast and Policy 43, 33-70.

McElroy, T. (2013). “Forecasting CARIMA Processes with Applications to Signal Extraction,” Annals of the Institute of

Statistical Mathematics, 65, 439-456.

McElroy, T. and Findley, D. (2014). “Fitting Constrained Vector Autoregression Models,” in Empirical Economic and Financial

Research.

McElroy, T. and Holan, S. (2012). “A Conversation with David Findley,” Statistical Science, 27, 594-606.

McElroy, T. and Holan, S. (2012). “On the Computation of Autocovariances for Generalized Gegenbauer Processes,” Statistica

Sinica 22, 1661-1687.

McElroy, T. and Holan, S. (2012). “The Error in Business Cycle Estimates Obtained from Seasonally Adjusted Data,” in

Economic Time Series: Modeling and Seasonality. Chapman-Hall.

McElroy, T. and Holan, S. (2014) “Asymptotic Theory of Cepstral Random Fields,” Annals of Statistics, 42, 64-86.

McElroy, T. and Jach, A. (2012). “Subsampling inference for the autocovariances of heavy-tailed long-memory time series,”

Journal of Time Series Analysis, 33, 935-953.

McElroy, T. and Jach, A. (2012). “Tail Index Estimation in the Presence of Long Memory Dynamics,” Computational Statistics

and Data Analysis, 56, 266-282.

McElroy, T. and Maravall, A. (2014) “Optimal Signal Extraction with Correlated Components,” Journal of Time

Series Econometrics, 6, 237-273.

McElroy, T. and Monsell, B. (2014) “The Multiple Testing Problem for Box-Pierce Statistics,” Electronic Journal of Statistics,

8, 497-522.

McElroy, T. and Pang, O. (2014). “The Algebraic Structure of Transformed Time Series,” in Empirical Economic and Financial

Research.

McElroy, T. and Politis, D. (2012). “Fixed-b Asymptotics for the Studentized Mean for Long and Negative Memory Time

Series,” Econometric Theory, 28, 471-481.

McElroy, T. and Politis, D. (2013). “Distribution Theory for the Studentized Mean for Long, Short, and Negative Memory Time

Series,” Journal of Econometrics.

McElroy, T. and Politis, D. (2014). “Spectral Density and Spectral Distribution Inference for Long Memory Time Series via

Fixed-b Asymptotics,” Journal of Econometrics, 182, 211-225.

McElroy, T. and Trimbur, T. (2011). “On the Discretization of Continuous-Time Filters for Nonstationary Stock and Flow Time

Series,” Econometric Reviews, 30, 475-513.

McElroy, T. and Wildi, M. (2010). “Signal Extraction Revision Variances as a Goodness-of-Fit Measure,” Journal of Time Series

Econometrics 2, Issue 1, Article 4.

McElroy, T. and Wildi, M. (2013). “Multi-Step Ahead Estimation of Time Series Models,” International Journal of Forecasting

29, 378-394.

Monsell, B. C. (2014) “The Effect of Forecasting on X-11 Adjustment Filters,” 2014 Proceedings American Statistical

Association [CD-ROM]: Alexandria, VA.

Monsell, B. C. and Blakely, C. (2013). “X-13ARIMA-SEATS and iMetrica,” 2013 Proceedings of the World Congress of

Statistics (Hong Kong), International Statistical Institute.

Quenneville, B. and Findley, D. F. (2012). “The Timing and Magnitude Relationships Between Month-to-Month Changes and

Year-to-Year Changes That Make Comparing Them Difficult,” Taiwan Economic Forecast and Policy, 43, 119-138.

Roberts, C., Holan, S., and Monsell, B. C. (2010). “Comparison of X-12-ARIMA Trading Day and Holiday Regressors With

Country Specific Regressors,” Journal of Official Statistics, 26 (2), 371-394.

Contact: Tucker McElroy, Brian C. Monsell, James Livsey, Osbert Pang, Anindya Roy, Bill Bell (R&M), David Findley

(consultant)

Funding Sources for FY 2015: 0331 – Working Capital Fund

1871 – General Research

Economic Project

Experimentation, Simulation, and Modeling

Motivation: Experiments at the Census Bureau are used to answer many research questions, especially those related to testing,

evaluating, and advancing survey sampling methods. A properly designed experiment provides a valid, cost-effective framework

that ensures that the right type of data is collected and that sufficient sample sizes and power are attained to address the questions of

interest. The use of valid statistical models is vital to both the analysis of results from designed experiments and in characterizing

relationships between variables in the vast data sources available to the Census Bureau. Statistical modeling is an essential

component for wisely integrating data from previous sources (e.g., censuses, sample surveys, and administrative records) in order

to maximize the information that they can provide. Monte Carlo simulation techniques aid in the design of complicated

experiments as well as the evaluation of complex statistical models.

Research Problems:

Develop models for the analysis of measurement errors in Demographic sample surveys (e.g., Current Population Survey or the

Survey of Income and Program Participation).

Develop methods for designed experiments embedded in sample surveys. Simulation studies can provide further insight (as

well as validate) any proposed methods.

Assess feasibility of established design methods (e.g., factorial designs) in Census Bureau experimental tests.

Identify and develop statistical models (e.g., loglinear models, mixture models, and mixed-effects models) to characterize

relationships between variables measured in censuses, sample surveys, and administrative records.

Assess the applicability of post hoc methods (e.g., multiple comparisons and tolerance intervals) with future designed

experiments and when reviewing previous data analyses.

Investigate noise multiplication for statistical disclosure control.
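
The noise multiplication idea above can be sketched in a few lines: multiply each confidential value by random noise with mean one, drawn away from one so that every released value is genuinely perturbed, while aggregates remain approximately unbiased. All values and noise parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical confidential values (skewed, as economic magnitudes often are)
x = rng.lognormal(mean=10.0, sigma=1.0, size=5000)

# Multiplicative noise with E[r] = 1, drawn away from 1 so that every
# released value changes by at least 5% (and at most 20%)
r = 1.0 + rng.choice([-1.0, 1.0], size=x.size) * rng.uniform(0.05, 0.20, size=x.size)
masked = x * r

# Because E[masked_i | x_i] = x_i, aggregates remain approximately unbiased
rel_err = abs(masked.mean() - x.mean()) / x.mean()
```

The tension this sketch makes concrete is exactly the one named under Potential Applications: larger noise gives stronger disclosure protection but inflates the variance of estimates built from the masked data.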

Potential Applications:

Modeling approaches with administrative records can help enhance the information obtained from various sample surveys.

Experimental design can help guide and validate testing procedures proposed for the 2020 Census.

Expanding the collection of experimental design procedures currently utilized with the American Community Survey.

Better informing tests developed for frame improvement research.

Developing appropriate noise multiplication strategies can help achieve a desired balance between accuracy of inference and

protection against disclosure.

Accomplishments (October 2013 – September 2014):

Studied further properties of zero-inflated models as an appropriate strategy for modeling the coverage errors of the Master

Address File (MAF).
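
The zero-inflated modeling strategy above can be sketched as a zero-inflated Poisson fit by maximum likelihood. The counts are simulated (they are not MAF data), and the hand-coded likelihood below is for illustration; the covariate-rich models used for MAF error research are considerably more elaborate.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(5)

# Simulated block-level error counts: a large share of structural zeros
# (blocks with no coverage errors at all) plus a Poisson-count component
n = 2000
structural_zero = rng.random(n) < 0.6
y = np.where(structural_zero, 0, rng.poisson(2.5, size=n))

def zip_negloglik(params, y):
    """Negative log-likelihood of a zero-inflated Poisson.
    params = (logit of zero-inflation probability, log of Poisson mean)."""
    pi = 1.0 / (1.0 + np.exp(-params[0]))
    lam = np.exp(params[1])
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))             # P(Y = 0)
    ll_pos = np.log(1 - pi) - lam + y * np.log(lam) - gammaln(y + 1)
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

res = minimize(zip_negloglik, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
pi_hat = 1.0 / (1.0 + np.exp(-res.x[0]))
lam_hat = np.exp(res.x[1])
```

In practice one might instead reach for an existing implementation such as statsmodels' ZeroInflatedPoisson, which also accommodates covariates in both the inflation and count components.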

Short-Term Activities (FY 2015):

Investigate improvements to MAF error modeling. Possible approaches include use of additional data sources such as socio-

economic data from administrative records, spatial modeling at levels finer than Census blocks, and joint models for

overcoverage and undercoverage.

Assist with the handoff of the Census Enterprise Model to the Census Bureau. This software was developed by MITRE for the

Census Bureau to simulate Decennial Census operations at the household level, including address canvassing.

Longer-Term Activities (beyond FY 2015):

Use zero-inflated model results to help inform data collection efforts for expanding the universe file used by both the MAF

Error Model (MEM) team and the GSSI Targeted Address Canvassing Research, Modeling, and Classification (TRMAC)

team.

Apply existing or develop new methodologies for properly incorporating spatial components in the ongoing frame

improvement research.

Selected Publications:

Mathew, T. and Young, D. S. (2013). “Fiducial-Based Tolerance Intervals for Some Discrete Distributions,” Computational

Statistics and Data Analysis, 61, 38-49.

Young, D. S. (2013). “Regression Tolerance Intervals,” Communications in Statistics – Simulation and Computation, 42(9),

2040-2055.

Young, D. S. (In Press). “A Procedure for Approximate Negative Binomial Tolerance Intervals,” Journal of Statistical

Computation and Simulation.

Klein, M., Mathew, T. and Sinha, B. K. (In Press). “Likelihood Based Inference Under Noise Multiplication,” Thailand

Statistician.

Contact: Derek Young, Thomas Mathew, Andrew Raim

Funding Sources for FY 2015: 0331 – Working Capital Fund

1871 – General Research

Various Decennial and Demographic Projects
