Migration of a large survey onto a micro-economic platform Val Cox April 2014




Micro-economic Platform (MEP)

Standardises and automates processes

- Provides more efficient processing, more analysis

Enables Statistics NZ to gain more from available data

- Basic principle: use administrative data wherever possible, with surveys filling the gaps

- Objective: bring core information about every business in the economy into the Longitudinal Business DB to allow Statistics NZ to respond quickly to changing needs for economic statistics


Aim of paper

To discuss the challenges of building a non-response imputation package for a large survey on the MEP

- Rationalises the use of Banff for outlier detection and imputation

- Rationalises the use of SEVANI (System for Estimation of Variance due to Nonresponse and Imputation) to estimate sampling and non-sampling errors


Annual Enterprise Survey (AES)

Provides statistics on the financial performance and position of New Zealand businesses

- Captures about 90% of New Zealand's GDP

Uses four different major data sources

- Three administrative sources (covering 72% of the population)

- One postal survey


AES before MEP


Editing strategy of AES on MEP

Guided by the Methodological Standard for E&I

Key objective of standard

- Editing is fit-for-purpose and enables continuous improvement of processes and data quality

Key principles used

- Automate editing processes where possible

- Use Statistics NZ standard editing tools, wherever possible, to achieve standardisation


Editing system of AES in MEP

Uses Banff to automate and standardise editing and imputation processes

Uses analytical views to assess the quality of the edited data


Challenges and solutions

A. Sheer volume of data

- 28 questionnaires, 113 industries and 180 variables

Solution: Use of a “thin slice” approach

- Restrict dataset to one questionnaire and one industry to show all stages of E&I are working

- Once successful, expand dataset to include more industries until all 28 questionnaires are replicated

- Successful in determining optimal level of automation for correcting failed edits


Challenges and solutions

B. Determining which variable is erroneous when groups of variables must add or subtract to a total

- Banff “errorloc” procedure always recommends changing one variable by a large amount

- Change is done by “deterministic” procedure

Solution: Assign weights to variables

- Assign lower weights to more reliable variables so Banff doesn’t change their values

Examples: totals, gross profit, since respondents use this to determine the tax they pay
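The additive constraint behind this challenge can be sketched as a simple balance check. This is only an illustration in Python, not Banff's “errorloc” procedure; the variable names (`sales`, `cost_of_sales`, `gross_profit`) and the tolerance are assumptions made for the example.

```python
def failed_balance_edit(record, components, total, tolerance=0.5):
    """Return True when the signed components do not add up to the total.

    `components` maps a variable name to +1 (added) or -1 (subtracted).
    """
    expected = sum(sign * record[name] for name, sign in components.items())
    return abs(expected - record[total]) > tolerance


# Hypothetical edit: sales - cost_of_sales = gross_profit
edit = {"sales": +1, "cost_of_sales": -1}

consistent = {"sales": 1000.0, "cost_of_sales": 600.0, "gross_profit": 400.0}
inconsistent = {"sales": 1000.0, "cost_of_sales": 600.0, "gross_profit": 4000.0}

print(failed_balance_edit(consistent, edit, "gross_profit"))    # False
print(failed_balance_edit(inconsistent, edit, "gross_profit"))  # True
```

A check like this only reports that the group of variables is inconsistent. Deciding which variable to change is the error localisation problem that the weights are meant to steer.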


Challenges and solutions

C. Outlier detection

- Old system detects outliers in 3 key variables but unlinks the whole unit (all variables)

- Banff does univariate outlier detection

Solution: Compared 2 E&I runs of data

- 1st run had only the 3 key variables set as outliers and 2nd had all variables included in outlier steps

- Decision: Choose variables to be set as outliers based on the effect on the totals
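A rough sketch of the idea, judging each variable by how much of its total the flagged units carry, is given below in Python. It uses a plain quartile-fence rule as a stand-in and does not reproduce Banff's outlier procedure; the variable names, data and the multiplier `k` are assumptions.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the indexes of values lying outside the quartile fences."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


def outlier_share_of_total(values, k=1.5):
    """Share of the variable's total contributed by the flagged units."""
    total = sum(values)
    flagged = flag_outliers(values, k)
    return sum(values[i] for i in flagged) / total if total else 0.0


# Hypothetical data: one list of unit values per variable
data = {
    "sales": [10, 12, 11, 13, 12, 14, 11, 500],
    "interest": [3, 4, 3, 5, 4, 4, 3, 5],
}

for name, values in data.items():
    print(name, flag_outliers(values), round(outlier_share_of_total(values), 3))
```

Each variable is tested on its own (univariate), so a unit flagged on `sales` is not automatically treated as an outlier on `interest`.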


Challenges and solutions

D. Running imputation one variable at a time would have been very time-consuming

Solution: Group variables

- By imputation method (4 methods)

- By industry (some industries have different characteristics)

- By type of variable (e.g. some variables can be negative)
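The grouping can be pictured as batching variables that share the same settings, so that each batch is processed in one pass. The Python sketch below is illustrative only: the variable names, the method labels (`ratio`, `mean`) and the `allows_negative` flag are hypothetical and are not taken from the AES configuration.

```python
from collections import defaultdict

# Hypothetical variable metadata; names and method labels are illustrative only.
variables = [
    {"name": "sales", "method": "ratio", "allows_negative": False},
    {"name": "purchases", "method": "ratio", "allows_negative": False},
    {"name": "net_profit", "method": "ratio", "allows_negative": True},
    {"name": "interest_paid", "method": "mean", "allows_negative": False},
]


def group_variables(variables):
    """Batch variable names by (imputation method, sign restriction)."""
    groups = defaultdict(list)
    for v in variables:
        groups[(v["method"], v["allows_negative"])].append(v["name"])
    return dict(groups)


groups = group_variables(variables)
for key, names in groups.items():
    print(key, names)
```

Industry could be added to the grouping key in the same way where industries need different treatment.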


Challenges and solutions

E. Imputation failed for some variables

- Some imputation cells were too small

Solution: Merged small imputation cells

- Each imputation stage was run twice, the first without cell merging and the second with cell merging, resulting in 8 imputation stages

- Use of a “catch-all” stage at the end (9th stage) to carry out mean imputation by industry
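The cascade of stages can be sketched as follows in Python. This is a simplified illustration, not the Banff set-up: it uses mean imputation at every stage rather than four different methods, and the field names, size bands, merging rule and minimum cell sizes are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def impute_in_stages(records, variable, stages):
    """Fill missing values using the first stage whose cell has enough donors.

    `stages` is a list of (label, cell_key_function, minimum_donors).
    Only reported values act as donors; imputed values are never reused.
    """
    for label, cell_key, min_donors in stages:
        donors = defaultdict(list)
        for r in records:
            if "imputed_by" not in r and r[variable] is not None:
                donors[cell_key(r)].append(r[variable])
        for r in records:
            if r[variable] is None:
                values = donors.get(cell_key(r), [])
                if len(values) >= min_donors:
                    r[variable] = mean(values)
                    r["imputed_by"] = label
    return records


def merged_size(r):
    """Hypothetical merging rule: small and medium cells are combined."""
    return "small/medium" if r["size"] in ("small", "medium") else r["size"]


stages = [
    ("cell", lambda r: (r["industry"], r["size"]), 2),
    ("merged cell", lambda r: (r["industry"], merged_size(r)), 2),
    ("industry catch-all", lambda r: r["industry"], 1),
]

records = [
    {"industry": "A", "size": "small", "sales": 100.0},
    {"industry": "A", "size": "small", "sales": 120.0},
    {"industry": "A", "size": "small", "sales": None},
    {"industry": "A", "size": "medium", "sales": 300.0},
    {"industry": "A", "size": "medium", "sales": None},
    {"industry": "A", "size": "large", "sales": 900.0},
    {"industry": "A", "size": "large", "sales": None},
]

for r in impute_in_stages(records, "sales", stages):
    print(r)
```

In this example the small unit is imputed within its own cell, the medium unit only after merging, and the large unit falls through to the industry-level catch-all.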


Challenges and solutions

F. Challenges with no solutions

- Analysis of improvements in the E&I was slow as it took several hours to run E&I and write back to the main data storage area to view data in a cube

- Attempting to replicate published results as closely as possible created a dilemma: when to stop trying?

- What was the “right” answer?


SEVANI

Provided a standardised and automated method to report on estimates of variances due to sampling as well as non-response and imputation

Challenges:

- Can produce output for only one variable at a time

- SEVANI required a lot of parameters to set up

- MEP is unit-based so can’t easily output SEVANI results

Solution:

- Use of a macro to identify variable names

- Created SAS code to set up parameters

- Output SEVANI results outside MEP


Next steps

Educate the users of the new system on MEP

Identify potential areas to make improvements in the editing and imputation system

Create a new MEP collection for Charities data to include its own editing and imputation system
