
Eurostat

Secondary data: collection and use

Presented by

Arnout van Delden, Methodologist

Statistics Netherlands

Secondary data

Secondary sources:

- Statistical sources: survey data from other organizations
- Administrative sources
  - Public administrative sources: trade register, tax data, medical register, base register
  - Private administrative sources: product price data, call detail records, electricity data
- Organic sources: dwelling prices on internet, GPS data, social media messages

Secondary sources

Registers:

- Base registers
- Statistical registers
- Specific registers

Official statistics: past, present, future

- Post-World War II
- Identifiers
- Concepts: variable, units, time
- Population registers
- Administrative census: Denmark (1981), Finland (1991), Netherlands (2001)

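Use (EU/EFTA Survey 2010)

Types of use: frame, observations, auxiliary data, model parameters, data quality

Data sources used per statistic (BR = Business Register, SBS = Structural Business Statistics, STS = Short-Term Statistics):

Statistic | Admin data only | Admin and survey data | Survey data only | Not specified | Non-response | Total
--- | --- | --- | --- | --- | --- | ---
BR | 12.0 | 16.0 | 2.0 | | | 30
SBS | 10.5 | 11.5 | 4.7 | 0.7 | 2.7 | 30
STS | 4.0 | 11.0 | 14.0 | 0.0 | 1.0 | 30
Prodcom | 0.0 | 10.0 | 13.0 | 1.0 | 2.0 | 26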
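As a small arithmetic check on the table, the sketch below computes, per statistic, the share of the row total in which administrative data are used (alone or combined with survey data). It assumes the three BR values belong to the first three columns, because they already add up to the total of 30; the table itself does not state this.

```python
# Columns: admin only, admin and survey, survey only, not specified, non-response, total.
# Assumption: the three BR values fill the first three columns (12.0 + 16.0 + 2.0 = 30).
TABLE = {
    "BR": (12.0, 16.0, 2.0, 0.0, 0.0, 30),
    "SBS": (10.5, 11.5, 4.7, 0.7, 2.7, 30),
    "STS": (4.0, 11.0, 14.0, 0.0, 1.0, 30),
    "Prodcom": (0.0, 10.0, 13.0, 1.0, 2.0, 26),
}


def admin_share(row):
    """Share of the row total that uses administrative data, alone or with survey data."""
    admin_only, admin_and_survey, *_, total = row
    return (admin_only + admin_and_survey) / total


if __name__ == "__main__":
    for statistic, row in TABLE.items():
        print(f"{statistic}: {admin_share(row):.2f}")
```

This gives roughly 0.93 for BR, 0.73 for SBS, 0.50 for STS and 0.38 for Prodcom, i.e. administrative data are used most for the business register and least for Prodcom.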

In sum

• Many types of data sources
• Long history
• Potentially very useful

Collection: existence and access

Existence

• Data Protection Act (DPA)
• Organisations register the data they hold under the DPA


Access

Element | Explanation
--- | ---
Legislation | National Statistics Act
Public approval | Informed consent
Identification codes | Base registers (business, dwellings, …)
Reliable data | Obliged to report errors; multiple users
Cooperation | Contacts with administrative authorities

In sum

• Explore potential data sources
• Access: legal use and public consent

Proper use

Exploration phase: source and metadata

Processing phase: data useful?

[Figure: two scatterplots of turnover from the sample survey (x-axis, × 1000 EUR) against turnover from VAT data (y-axis, × 1000 EUR). March '04: fitted line y = 1.1625x. Dec '04: fitted line y = 0.9743x.]
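The fitted lines on this slide (y = 1.1625x and y = 0.9743x) have no intercept, which suggests a least-squares line through the origin. A minimal sketch of that fit, assuming paired turnover values for the same units from both sources; the example data are hypothetical. A slope close to 1 means that, on average, VAT turnover and survey turnover agree.

```python
def slope_through_origin(x, y):
    """Least-squares slope b of the line y = b * x (no intercept).

    Minimising sum((y_i - b * x_i) ** 2) gives b = sum(x_i * y_i) / sum(x_i ** 2).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical turnover (x 1000 EUR) for the same three units from both sources.
    survey_turnover = [100.0, 200.0, 400.0]
    vat_turnover = [110.0, 190.0, 420.0]
    b = slope_through_origin(survey_turnover, vat_turnover)
    print(f"VAT turnover is about {b:.4f} times survey turnover")
```

Fitting without an intercept matches the idea that a unit with zero survey turnover should also have zero VAT turnover; large residuals for individual units are then candidates for closer inspection.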

Data patterns

Unit | Period | Value | Unit | Period | Value
--- | --- | --- | --- | --- | ---
2022253 | Q1 | 3000 | 222201 | Q1 | 2000
2022253 | Q2 | 3000 | 222201 | Q2 | 2500
2022253 | Q3 | 3000 | 222201 | Q3 | 0
2022253 | Q4 | 4561 | 333301 | Q4 | 2200
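Patterns like these can be screened for automatically before the data are used. The sketch below flags three patterns suggested by the example: identical values over the first quarters followed by a jump in Q4, zero values, and units that do not report in every quarter (222201 has no Q4 value while 333301 appears only in Q4, which may point to a change of unit identifier). The 30% jump threshold and the flag names are illustrative assumptions, not rules from the source.

```python
from statistics import mean

QUARTERS = ["Q1", "Q2", "Q3", "Q4"]


def flag_patterns(values_by_quarter, jump_ratio=0.3):
    """Return illustrative flags for one unit's quarterly values.

    values_by_quarter: {"Q1": value, ...}; missing quarters are simply absent.
    jump_ratio: hypothetical threshold for a Q4 deviation from the Q1-Q3 mean.
    """
    flags = []
    present = [q for q in QUARTERS if q in values_by_quarter]
    values = [values_by_quarter[q] for q in present]
    if len(present) < len(QUARTERS):
        flags.append("incomplete")  # unit does not report in every quarter
    if any(v == 0 for v in values):
        flags.append("zero")  # zero may be a true value or a missing report
    if len(values) >= 3 and len(set(values[:3])) == 1:
        flags.append("constant")  # first three reported values identical
    if len(present) == len(QUARTERS):
        base = mean([values_by_quarter[q] for q in QUARTERS[:3]])
        if base > 0 and abs(values_by_quarter["Q4"] - base) / base > jump_ratio:
            flags.append("q4_jump")  # Q4 deviates strongly from the Q1-Q3 mean
    return flags


if __name__ == "__main__":
    data = {
        "2022253": {"Q1": 3000, "Q2": 3000, "Q3": 3000, "Q4": 4561},
        "222201": {"Q1": 2000, "Q2": 2500, "Q3": 0},
        "333301": {"Q4": 2200},
    }
    for unit, series in data.items():
        print(unit, flag_patterns(series))
    # 2022253 ['constant', 'q4_jump']
    # 222201 ['incomplete', 'zero']
    # 333301 ['incomplete']
```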

Issues to consider

Dimension | Issues | Methods
--- | --- | ---
Time | Reporting delays | Nowcasting, imputation
Time | Reporting period ≠ statistical period | Harmonisation (time series)
Representation | Administrative units | Linkage
Representation | Coverage errors | Business register
Measurement | Data patterns | Model/time series
Measurement | Corrections | Updates
Measurement | Different meaning | Analyse
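For the mismatch between reporting period and statistical period, one simple harmonisation option is to spread each reported value over the calendar quarters it covers, in proportion to the number of days. This pro-rata approach is an assumption chosen for illustration: it ignores seasonality within the reporting period and is not necessarily the method used in practice.

```python
from collections import defaultdict
from datetime import date, timedelta


def quarter_of(day):
    """Calendar quarter label such as '2004Q1'."""
    return f"{day.year}Q{(day.month - 1) // 3 + 1}"


def calendarise(start, end, value):
    """Spread a value reported for start..end (inclusive) over calendar quarters, pro rata by days."""
    if end < start:
        raise ValueError("end must not precede start")
    n_days = (end - start).days + 1
    per_day = value / n_days
    shares = defaultdict(float)
    day = start
    while day <= end:
        shares[quarter_of(day)] += per_day
        day += timedelta(days=1)
    return dict(shares)


if __name__ == "__main__":
    # Hypothetical VAT return covering February to April 2004 (90 days; 2004 is a leap year).
    print(calendarise(date(2004, 2, 1), date(2004, 4, 30), 900.0))
    # {'2004Q1': 600.0, '2004Q2': 300.0}
```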

Administrative data:

• Many merits
• Explore
• More than adding up
• Access

Access

Set of base registers:
• data are re-used
• users report errors
• one contact person in the NSI
• large user dependency

Properties of administrative data

1. Collected externally
2. Administrative goal
3. Different objectives
4. Subject to changes

2 Can I use a specific data source?

What steps are needed?

• Existence
• Access
• Fitness for use
• Fall-back scenarios
• Processing

Processing: data integration

[Diagram: the register serves as the frame; tax units are linked through legal units to statistical units, and survey data are observed on statistical units; the links are numbered 1 to 4.]

• Linkage
• Micro-integration
• Imputation/weighting
• Macro-integration
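The linkage step can be illustrated with a sketch that aggregates VAT turnover from tax units to statistical units through two link tables, as a business register might provide them (tax unit to legal unit, legal unit to statistical unit). The tables and identifiers are hypothetical, and the sketch assumes many-to-one links; in practice links can be many-to-many and then need apportioning, which is part of micro-integration. Unlinked tax units are returned separately rather than dropped, because they point to coverage problems.

```python
from collections import defaultdict


def to_statistical_units(vat_turnover, tax_to_legal, legal_to_stat):
    """Aggregate turnover from tax units to statistical units via two link tables.

    Assumes many-to-one links (hypothetical tables); tax units that cannot be
    linked are returned separately as a signal of coverage problems.
    """
    totals = defaultdict(float)
    unlinked = {}
    for tax_unit, turnover in vat_turnover.items():
        legal_unit = tax_to_legal.get(tax_unit)
        stat_unit = legal_to_stat.get(legal_unit) if legal_unit is not None else None
        if stat_unit is None:
            unlinked[tax_unit] = turnover
        else:
            totals[stat_unit] += turnover
    return dict(totals), unlinked


if __name__ == "__main__":
    vat = {"T1": 100.0, "T2": 50.0, "T3": 20.0, "T4": 5.0}
    tax_to_legal = {"T1": "L1", "T2": "L1", "T3": "L2"}
    legal_to_stat = {"L1": "S1", "L2": "S2"}
    print(to_statistical_units(vat, tax_to_legal, legal_to_stat))
    # ({'S1': 150.0, 'S2': 20.0}, {'T4': 5.0})
```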

Fall-back scenarios

Example: quarterly turnover from survey and administrative data
– Risk: only data from months 1 and 2 available
– Model: missing units predicted from respondents
– Indicator: how many and which units to call
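The model and indicator in this example can be sketched as follows, assuming that units with complete data (months 1 to 3) are used to estimate a ratio of month-3 turnover to month-1-plus-2 turnover, which is then applied to units that reported only months 1 and 2. The "units to call" rule, which calls the largest imputed units until the imputed share of the quarterly total falls below a threshold, is a hypothetical example of such an indicator, not the method from the source.

```python
def ratio_impute(complete, partial):
    """Predict month-3 turnover for units that reported only months 1 and 2.

    complete: {unit: (m1, m2, m3)} for units with all three months.
    partial:  {unit: (m1, m2)} for units missing month 3.
    The ratio sum(m3) / sum(m1 + m2) over complete units is applied to each partial unit.
    """
    numerator = sum(m3 for _, _, m3 in complete.values())
    denominator = sum(m1 + m2 for m1, m2, _ in complete.values())
    if denominator == 0:
        raise ValueError("complete units must have non-zero turnover in months 1 and 2")
    ratio = numerator / denominator
    return {unit: ratio * (m1 + m2) for unit, (m1, m2) in partial.items()}


def units_to_call(predicted, quarter_total, max_share=0.05):
    """Hypothetical indicator: call the largest imputed units until the imputed
    share of the quarterly total is at most max_share."""
    remaining = sum(predicted.values())
    calls = []
    for unit, value in sorted(predicted.items(), key=lambda kv: kv[1], reverse=True):
        if remaining / quarter_total <= max_share:
            break
        calls.append(unit)
        remaining -= value
    return calls


if __name__ == "__main__":
    complete = {"A": (100.0, 100.0, 100.0), "B": (50.0, 50.0, 50.0)}
    partial = {"C": (40.0, 40.0), "D": (10.0, 10.0)}
    predicted = ratio_impute(complete, partial)
    total = sum(sum(m) for m in complete.values()) + sum(
        m1 + m2 + predicted[u] for u, (m1, m2) in partial.items()
    )
    print(predicted, total, units_to_call(predicted, total))
```

With these hypothetical numbers the imputed values make up about 8% of the quarter, so the indicator suggests calling only unit C, after which the imputed share drops below 5%.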

Fall-back scenarios

• Risk analyses
• Strategy for a fall-back scenario:
– Obtain missing data elsewhere?
– Model-based approach
– Inform users
– Postpone publication

Processing: robust estimation

• Medical expenses (volume, prices)
• Coding system for medical treatments
• First coding in 2008
• Coding slightly revised in 2009
• New coding system in 2010
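When a coding system changes, as in 2009 and 2010 here, one common way to keep series comparable is to convert amounts through a weighted correspondence table between old and new codes. The slide does not say which method was used; the codes and weights below are hypothetical.

```python
from collections import defaultdict


def convert_codes(amounts_old, correspondence):
    """Re-express amounts recorded under old codes in terms of new codes.

    correspondence maps each old code to {new_code: weight}; the weights of one
    old code must sum to 1 so that the total amount is preserved.
    """
    converted = defaultdict(float)
    for old_code, amount in amounts_old.items():
        weights = correspondence[old_code]
        if abs(sum(weights.values()) - 1.0) > 1e-9:
            raise ValueError(f"weights for {old_code} do not sum to 1")
        for new_code, weight in weights.items():
            converted[new_code] += amount * weight
    return dict(converted)


if __name__ == "__main__":
    expenses_2009 = {"T1": 100.0, "T2": 50.0}  # hypothetical treatment codes
    table = {"T1": {"N1": 0.6, "N2": 0.4}, "T2": {"N2": 1.0}}  # hypothetical weights
    print(convert_codes(expenses_2009, table))
```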

Fitness for use

Dimension | Description
--- | ---
Technical checks | Technical usability of the file and data
Accuracy | 1) Closeness to true values, 2) correctness, reliability
Completeness | Extent to which the data describe the corresponding set of real-world objects and variables
Time-related dimension | Time- and/or stability-related
Integrability | Capable of undergoing integration or of being integrated
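The completeness dimension can be made measurable with simple indicators. The sketch below is one possible operationalisation, assuming a frame of unit identifiers (for example from the business register) and a source keyed by the same identifiers; the record layout and indicator names are illustrative, not taken from the source.

```python
def coverage_indicators(frame_units, source_records, variable):
    """Illustrative completeness indicators of a source against a frame.

    frame_units: iterable of unit identifiers (e.g. from the business register).
    source_records: {unit_id: {variable_name: value or None}} (hypothetical layout).
    """
    frame = set(frame_units)
    source = set(source_records)
    if not frame or not source:
        raise ValueError("frame and source must be non-empty")
    found = frame & source
    filled = sum(1 for u in found if source_records[u].get(variable) is not None)
    return {
        # share of frame units present in the source
        "unit_coverage": len(found) / len(frame),
        # share of source units that are not in the frame
        "over_coverage": len(source - frame) / len(source),
        # share of linked units with a non-missing value for the variable
        "item_completeness": filled / len(found) if found else 0.0,
    }


if __name__ == "__main__":
    frame = ["A", "B", "C", "D"]
    source = {
        "A": {"turnover": 10.0},
        "B": {"turnover": None},
        "C": {"turnover": 5.0},
        "X": {"turnover": 1.0},
    }
    print(coverage_indicators(frame, source, "turnover"))
```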

Data use

Type of use | Example | Source type
--- | --- | ---
Population frame | Chamber of Commerce data for the Business Register | Base register
Source for observations | VAT data for quarterly turnover estimates | Public administrative source
Auxiliary data | Internet data to verify the NACE code of enterprises | Organic source
Estimation of model parameters | Energy supplier data for average energy consumption for the CPI | Private administrative source
Audit quality of statistical data | Social security data to assess the quality of employment position derived from sample data | Public administrative source

Concluding remarks

• Merits
– Reduced response burden
– Detailed data
– Longitudinal data

• Consequences
– Relations with the administrative data holder
– Prone to changes
