Using Commercial-off-the-Shelf (COTS) Software at the National Agricultural Statistics Service

Darcy Miller

UNECE Work Session on Statistical Data Editing

September 18-20, 2018

THE FINDINGS AND CONCLUSIONS IN THIS PRELIMINARY PRESENTATION HAVE NOT BEEN FORMALLY DISSEMINATED BY THE U.S. DEPARTMENT OF AGRICULTURE AND SHOULD NOT BE CONSTRUED TO REPRESENT ANY AGENCY DETERMINATION OR POLICY


National Agricultural Statistics Service (NASS)

• Agency in the United States Department of Agriculture (USDA)

• Mission: “The National Agricultural Statistics Service provides timely, accurate, and useful statistics in service to U.S. Agriculture.”

• Hundreds of survey reports

– Surveys of farmers and farm businesses, scientific measurements, satellite data, weather data, and more

• Census of Agriculture

– Conducted every 5 years


Editing and Imputation

• Data collected contain missing or erroneous values

• Often, customized code and/or a semi-manual process is used

• Major goal is a ‘clean’ dataset where edit logic is met
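To make "a clean dataset where edit logic is met" concrete, here is a minimal sketch in which edit logic is a set of named predicates and a record is clean only when it passes all of them. The rule names, field names (`total_acres`, `cropland_acres`, `operator_age`), and age bounds are hypothetical illustrations, not NASS's actual edits.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical record layout: field name -> reported value (None = missing).
Record = Dict[str, Optional[float]]

# Hypothetical edit rules; each predicate returns True when the record passes.
EDIT_RULES: Dict[str, Callable[[Record], bool]] = {
    # Missing value edit: total acres must be reported.
    "total_acres_reported": lambda r: r.get("total_acres") is not None,
    # Consistency edit: cropland cannot exceed total acres (skipped if either is missing).
    "cropland_le_total": lambda r: (
        r.get("cropland_acres") is None
        or r.get("total_acres") is None
        or r["cropland_acres"] <= r["total_acres"]
    ),
    # Range edit: operator age must be plausible when reported (bounds are illustrative).
    "operator_age_plausible": lambda r: (
        r.get("operator_age") is None or 15 <= r["operator_age"] <= 110
    ),
}


def failed_edits(record: Record) -> List[str]:
    """Return the names of the edit rules this record violates, in rule order."""
    return [name for name, rule in EDIT_RULES.items() if not rule(record)]


def is_clean(record: Record) -> bool:
    """A record is 'clean' when it satisfies every edit rule."""
    return not failed_edits(record)


if __name__ == "__main__":
    print(failed_edits({"total_acres": 120.0, "cropland_acres": 150.0, "operator_age": 47.0}))
    # ['cropland_le_total']
```

Records that fail an edit are the ones routed to imputation or manual review; keeping the rules as data rather than scattered `if` statements is one way generalized tools reduce custom code.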


Edit and Imputation Review

• NASS continually seeks to improve its products

• Contracted with an external organization to review its editing and imputation processes

• One of the recommendations: use COTS software


COTS Software

• Benefits:

– Generalized script

– Often supported

– Reduction in maintenance

– Ease of use/development

– Optimized code

– Reproducibility

• Challenges:

– Desired features/flexibilities may not be available

– Fitting COTS software in an established process


COTS Software at NASS: Editing and Imputation

• Blaise – hundreds of small surveys

– Interactive editing; changes are made primarily by hand

• Banff evaluation

• IVEware

• SAS PROC MI (fully conditional specification, FCS)

• SAS PROC SURVEYIMPUTE
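The FCS option named above imputes variables one at a time, each from a regression on the others, cycling until the imputations settle. The toy below illustrates that chained-equations idea for two numeric variables in plain Python. It is not PROC MI's algorithm or syntax: it fits ordinary least squares and adds a normal residual, but omits the draws of model parameters that proper multiple imputation requires. Variable names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Optional, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)  # assumes x is not constant
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, statistics.pstdev(resid)


def fcs_impute(data: Dict[str, List[Optional[float]]],
               n_iter: int = 10, seed: int = 0) -> Dict[str, List[float]]:
    """Produce ONE completed dataset with a toy two-variable FCS.

    Each variable's originally missing entries are repeatedly re-imputed from a
    linear regression on the other variable's current values plus a random
    normal residual. Run with several seeds to obtain multiple imputations.
    """
    rng = random.Random(seed)
    x_name, y_name = data  # sketch assumes exactly two numeric variables
    missing = {k: [v is None for v in col] for k, col in data.items()}

    # Start from mean imputation so every row has a value (input is not mutated).
    filled: Dict[str, List[float]] = {}
    for k, col in data.items():
        mean = statistics.fmean(v for v in col if v is not None)
        filled[k] = [mean if v is None else v for v in col]

    for _ in range(n_iter):
        for target, predictor in ((x_name, y_name), (y_name, x_name)):
            # Fit on rows where the target was observed, using current predictor values.
            rows = [i for i, m in enumerate(missing[target]) if not m]
            a, b, sd = fit_line([filled[predictor][i] for i in rows],
                                [filled[target][i] for i in rows])
            for i, m in enumerate(missing[target]):
                if m:  # only originally missing cells are overwritten
                    filled[target][i] = a + b * filled[predictor][i] + rng.gauss(0.0, sd)
    return filled


if __name__ == "__main__":
    data = {"x": [1.0, 2.0, None, 4.0, 5.0], "y": [2.1, None, 6.2, 7.9, 10.1]}
    completed = [fcs_impute(data, seed=s) for s in range(3)]  # three imputations
    print([round(d["x"][2], 2) for d in completed])
```

Calling the function with different seeds yields the multiple completed datasets that multiple imputation combines; a COTS procedure handles the model choices, parameter draws, and combining rules that this sketch leaves out.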


Blaise: Survey Processing

• 100+ surveys (smaller)


Banff Evaluated


PRISM: Census of Agriculture

• Significant change to demographic section of the form

• Additional minor changes continued through cognitive, content, and web instrument testing up to the final form

• Short timeframe to code and update code

• Census of Agriculture is edited/imputed record by record

– Call to imputation code is made by the edit code

– Code for editing and imputation is custom script

PRISM: Census of Agriculture

• ~3 million records on the list frame


[Figure: PRISM processing flow for the Census of Agriculture. Labels: Keying or Web Collection; 1 record at a time; Questionnaire Section 1, Questionnaire Section 2, …, each with a Statistical Edit and Nearest Neighbor imputation from its own Donor Pool (Donor Pool 1, Donor Pool 2, …); Clean Record Data; Stored for Manual Review and Analysis; PROC MI: Batched Demographics]
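The per-section "Nearest Neighbor" step in the figure can be illustrated with generic nearest-neighbor donor imputation: a record that needs imputation borrows missing values from the most similar clean record in a donor pool. This is a minimal sketch assuming Euclidean distance on hypothetical matching variables and direct copying of donor values; PRISM's actual matching variables, distance function, and donor rules are not described in this presentation.

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def nearest_donor(recipient: Record, donors: List[Record],
                  match_vars: List[str]) -> Record:
    """Return the donor closest to the recipient on the matching variables.

    Uses Euclidean distance and assumes the matching variables are
    non-missing for the recipient and all donors.
    """
    def distance(donor: Record) -> float:
        return math.sqrt(sum((donor[v] - recipient[v]) ** 2 for v in match_vars))
    return min(donors, key=distance)


def impute_from_nearest_neighbor(recipient: Record, donors: List[Record],
                                 match_vars: List[str],
                                 impute_vars: List[str]) -> Record:
    """Fill missing impute_vars in a copy of the recipient from its nearest donor."""
    donor = nearest_donor(recipient, donors, match_vars)
    filled = dict(recipient)  # leave the original record untouched
    for v in impute_vars:
        if filled.get(v) is None:
            filled[v] = donor[v]
    return filled


if __name__ == "__main__":
    donor_pool = [  # hypothetical clean records for one questionnaire section
        {"total_acres": 100.0, "cropland_acres": 80.0},
        {"total_acres": 500.0, "cropland_acres": 350.0},
    ]
    recipient = {"total_acres": 450.0, "cropland_acres": None}
    print(impute_from_nearest_neighbor(recipient, donor_pool,
                                       ["total_acres"], ["cropland_acres"]))
    # {'total_acres': 450.0, 'cropland_acres': 350.0}
```

Because each section has its own donor pool, the same function would be called once per section with that section's pool and variables. Production systems often scale donor values (for example by a size ratio) rather than copying them; that refinement is omitted here.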

Census of Agriculture: Weighting Process

• June Area Survey

– Annual survey

– Area based sample (theoretically complete)

– Demographic information is not edited/imputed

• Census of Agriculture weighting

– June Area Survey data are used in the dual system estimation weighting, which accounts for coverage, undercount, misclassification, and nonresponse

• Demographic information for the June Area Survey was edited/imputed using PROC SURVEYIMPUTE
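For readers new to dual system estimation, its textbook starting point is the Lincoln-Petersen (capture-recapture) estimator: if one system finds n1 units, an independent second system finds n2, and m units are found by both, the population estimate is n1·n2/m. The sketch below shows only that basic form with hypothetical counts. NASS's weighting additionally adjusts for undercount, misclassification, and nonresponse, and in practice the area-frame counts would be weighted survey estimates rather than raw counts.

```python
def lincoln_petersen(n_list: float, n_area: float, n_both: float) -> float:
    """Textbook dual-system (capture-recapture) estimate of population size.

    n_list: units found by the first system (e.g. a list frame)
    n_area: units found by the second, independent system (e.g. an area sample)
    n_both: units found by both systems
    Assumes the two systems capture units independently of each other.
    """
    if n_both <= 0:
        raise ValueError("need at least one unit found by both systems")
    return n_list * n_area / n_both


if __name__ == "__main__":
    n_hat = lincoln_petersen(900, 100, 90)  # hypothetical counts
    print(n_hat)        # 1000.0
    print(900 / n_hat)  # 0.9, the implied list-frame coverage
```

The implied coverage (list count divided by the estimated total) is what motivates a coverage adjustment: in this hypothetical, each list record would represent about 1/0.9 farms before any further adjustments.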


PRISM: Survey Processing

• Fewer than 10 surveys (larger; ~30,000 sampled)


[Figure labels: IVEware; PROC MI (all data collected)]

Moving Forward

• NASS has had success in utilizing COTS software

– Primarily implemented in cases where timelines are short and data are new

• Continue to update edit & imputation processes to incorporate COTS software, where appropriate

– Features/flexibility

• Reduce challenges by modularizing processes


Thank you

Contact Information: darcy.miller@nass.usda.gov
