Eurostat
Secondary data: collection and use
Presented by
Arnout van Delden, Methodologist
Statistics Netherlands
Secondary data
Secondary sources
- Statistical sources: survey data from other organizations
- Administrative sources
  - Public administrative sources: trade register, tax data, medical register, base register
  - Private administrative sources: product price data, call detail records, electricity data
- Organic sources: dwelling prices on internet, GPS data, social media messages
Secondary sources
Registers
Base registers
Statistical registers
specific
PAST PRESENT FUTURE
Official Statistics
• Post-World War II
• Identifiers
• Concepts: variable, units, time
• Population registers
• Administrative census
– Denmark (1981), Finland (1991), Netherlands (2001)
Use (EU/EFTA Survey 2010)
Types of use: frame, observations, auxiliary data, model parameters, data quality
Columns: admin data only; admin and survey data; survey data only; not specified; non response; Total
BR 12,0 16,0 2,0 30
SBS 10,5 11,5 4,7 0,7 2,7 30
STS 4,0 11,0 14,0 0,0 1,0 30
Prodcom 0,0 10,0 13,0 1,0 2,0 26
In sum
Many types of data sources Long history Potentially very useful
Collection: Existence, Access
Existence
• Data protection act
• Organisation registers data under DPA
Access
Element | Explanation
Legislation | National Statistics Act
Public approval | Informed consent
Identification codes | Base registers (business, dwellings, …)
Reliable data | Obliged to report errors; multi users
Cooperation | Contacts with administration authorities
In Sum
• Explore potential data sources
• Access: legal uses and public consent
Proper use
Exploration phase
Source
Meta
Processing phase: data useful?
[Figure: two scatter plots of turnover from VAT data (omzet BTW, x 1000 EUR) against turnover from the sample survey (Omzet KS, x 1000 EUR). Panels: March '04 and Dec '04. Fitted lines: y = 1,1625x and y = 0,9743x.]
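The fitted lines on this slide have the form y = bx, relating turnover from VAT data to turnover from the sample survey. A minimal sketch of how such a slope could be computed is given below, assuming an ordinary least-squares fit without intercept; the slides do not state which fitting method was used. The function name and the turnover values are hypothetical and not taken from the slides.

```python
def slope_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical turnover values in x 1000 EUR (invented for illustration).
survey_turnover = [100.0, 200.0, 400.0, 800.0]
vat_turnover = [110.0, 190.0, 420.0, 780.0]

b = slope_through_origin(survey_turnover, vat_turnover)
print(f"fitted slope: {b:.4f}")
```

A slope close to 1 would suggest that, on aggregate, the VAT data and the survey measure turnover at a similar level; the slope alone says nothing about the scatter of individual units.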
Data patterns
Unit Period Value Unit Period Value
2022253 Q1 3000 222201 Q1 2000
2022253 Q2 3000 222201 Q2 2500
2022253 Q3 3000 222201 Q3 0
2022253 Q4 4561 333301 Q4 2200
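The two example units above can be screened automatically. The sketch below uses the values from the table and flags three patterns: identical values in Q1 to Q3 followed by a different Q4, a reported value of zero, and quarters without a report for a given unit identifier. The flag names and the choice of patterns are assumptions made for illustration; the slide does not say how the patterns should be interpreted.

```python
from collections import defaultdict

QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

# (unit, period, value) records copied from the table above.
records = [
    ("2022253", "Q1", 3000), ("2022253", "Q2", 3000),
    ("2022253", "Q3", 3000), ("2022253", "Q4", 4561),
    ("222201", "Q1", 2000), ("222201", "Q2", 2500),
    ("222201", "Q3", 0), ("333301", "Q4", 2200),
]


def flag_patterns(records):
    """Return a dict unit -> list of (hypothetical) pattern flags."""
    by_unit = defaultdict(dict)
    for unit, period, value in records:
        by_unit[unit][period] = value

    flags = {}
    for unit, values in by_unit.items():
        unit_flags = []
        q123 = [values.get(q) for q in QUARTERS[:3]]
        q4 = values.get("Q4")
        # Same value in Q1-Q3 and a different, reported value in Q4.
        if None not in q123 and len(set(q123)) == 1 and q4 not in (None, q123[0]):
            unit_flags.append("repeated_q1_q3_then_q4_differs")
        # A reported value of exactly zero.
        if any(v == 0 for v in values.values()):
            unit_flags.append("zero_value")
        # Quarters for which this identifier has no report.
        missing = [q for q in QUARTERS if q not in values]
        if missing:
            unit_flags.append("missing:" + ",".join(missing))
        flags[unit] = unit_flags
    return flags


for unit, unit_flags in flag_patterns(records).items():
    print(unit, unit_flags)
```

Flags of this kind only point to records that deserve a closer look; whether a pattern reflects an estimate by the reporter, a change of identifier, or a genuine value has to be settled with knowledge of the source.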
Issues to consider
Dimension | Issues | Methods
Time | Reporting delays | Now casting, imputation
Time | Reporting period < > statistical period | Harmonisation (time series)
Representation | Administrative units | Linkage
Representation | Coverage errors | Business register
Measurement | Data patterns | Model/time series
Measurement | Corrections | Updates
Measurement | Different meaning | Analyse
Administrative data:
Many merits
Explore
More than adding up
Access
Set of base registers
• data re-used
• report errors
• 1 contact person in NSI
• large dependency users
Properties of Administrative data
1 Collected externally
2 Administrative goal
3 Different objectives
4 Subject to changes
2 Can I use a specific data source?
What ‘steps’ are needed?
• Existence
• Access
• Fitness for use
• Fall back scenarios
• Processing
Processing: data integration
Register FRAME
Tax Unit, Tax Unit, Legal Unit, Statistical Unit
3 2 1
Survey 4 Statistical Unit
• Linkage
• Micro-integration
• Imputation/weighting
• Macro-integration
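The linkage step can be illustrated with a small sketch. It assumes that tax units are linked to legal units and legal units to statistical units through two lookup tables, and that values are summed per statistical unit; this chain is one reading of the diagram, not a description of the production system. All identifiers, values and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical link tables and VAT values, invented for illustration.
tax_to_legal = {"T1": "L1", "T2": "L1", "T3": "L2"}
legal_to_statistical = {"L1": "S1", "L2": "S2"}
vat_turnover = {"T1": 120.0, "T2": 80.0, "T3": 50.0, "T9": 10.0}


def aggregate_to_statistical_units(values, tax_to_legal, legal_to_statistical):
    """Sum tax-unit values per statistical unit; collect units that cannot be linked."""
    totals = defaultdict(float)
    unlinked = []
    for tax_unit, value in values.items():
        legal_unit = tax_to_legal.get(tax_unit)
        statistical_unit = legal_to_statistical.get(legal_unit)
        if statistical_unit is None:
            # No path from this tax unit to a statistical unit.
            unlinked.append(tax_unit)
            continue
        totals[statistical_unit] += value
    return dict(totals), unlinked


totals, unlinked = aggregate_to_statistical_units(
    vat_turnover, tax_to_legal, legal_to_statistical
)
print(totals)
print("unlinked tax units:", unlinked)
```

Keeping the unlinked units in a separate list makes linkage gaps visible, so they can be handled in the later steps rather than silently dropped.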
Fall back scenarios
Quarterly turnover from survey and admin data
– Risk: only data from months 1 and 2
– Model: missing units predicted from respondents
– Indicator: how many and which units to call
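A minimal sketch of such a fall-back model follows. It assumes a simple ratio approach: the growth of the responding units relative to the previous quarter is applied to the previous-quarter values of the missing units. The slides do not specify the model, so this is only one possible choice. The imputed share and the call list are hypothetical indicators, and all names and values are invented for illustration.

```python
def predict_missing(previous, current):
    """Predict turnover of missing units from the growth of respondents.

    previous: dict unit -> previous-quarter turnover (all units)
    current:  dict unit -> current-quarter turnover (respondents only)
    """
    respondents = [u for u in previous if u in current]
    missing = [u for u in previous if u not in current]

    prev_resp = sum(previous[u] for u in respondents)
    if prev_resp == 0:
        raise ValueError("respondents have no previous-quarter turnover")
    cur_resp = sum(current[u] for u in respondents)
    growth = cur_resp / prev_resp

    predicted = {u: previous[u] * growth for u in missing}
    total = cur_resp + sum(predicted.values())
    # Indicator: share of the estimated total that rests on predictions.
    imputed_share = sum(predicted.values()) / total if total else 0.0
    # Units to call first: those with the largest predicted turnover.
    call_list = sorted(predicted, key=predicted.get, reverse=True)
    return predicted, imputed_share, call_list


previous = {"A": 100.0, "B": 200.0, "C": 300.0, "D": 50.0}
current = {"A": 110.0, "B": 220.0}

predicted, imputed_share, call_list = predict_missing(previous, current)
print(predicted)
print(f"imputed share: {imputed_share:.2f}")
print("call first:", call_list)
```

A high imputed share signals that the estimate leans heavily on the model, which is when calling the largest missing units is most worthwhile.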
Fall back scenarios
• Risk analyses
• Strategy fall back scenario
– Obtain missing data elsewhere?
– Model-based approach
– Inform users
– Postpone publication
Processing: robust estimation
• Medical expenses (volume, prices)
• Coding system for medical treatments
• First coding in 2008
• Coding slightly revised 2009
• New coding system 2010
Fitness for use
Dimension | Description
Technical checks | Technical usability of file and data
Accuracy | 1) Closeness to true values, 2) Correctness, reliability
Completeness | Describes the corresponding set of real-world objects and variables
Time-related dimension | Time and/or stability related
Integrability | Capable of undergoing integration or of being integrated
Data
Use
Type of use | Example | Source type
Population frame | Chamber of Commerce data for Business Register | Base register
Source for observations | VAT data for quarterly turnover estimates | Public admin source
Auxiliary data | Internet data to verify the NACE code of enterprises | Organic source
Estimation of model parameters | Energy supplier data for average energy consumption for CPI | Private admin source
Audit quality of statistical data | Social security data to assess quality of employment position based on sampling data | Public admin source
Concluding remarks
• Merits
– Reduction of response burden
– Detailed & longitudinal data
• Consequences
– Relations with administrative data holder
– Prone to changes