IMPUTING MISSING ADMINISTRATIVE DATA FOR SHORT-TERM ENTERPRISE STATISTICS
Pieter Vlag – Statistics Netherlands
Joint work with
DESTATIS, Statistics Estonia, Statistics Finland, ISTAT,
Statistics Lithuania, ONS
Imputing missing admin data for STS-estimates 2
Outline of the presentation
• Scope of the project - use of admin data for STS
• Two situations:
a. VAT fairly complete and representative - VAT representative
b. VAT not complete and not-representative - VAT not representative
• VAT representative
a. imputing missing values
• Imputing missing values
a. methods for imputations
b. which units to impute
• Conclusions and implications for other projects
Scope of the project
Final situation (after a year):
- all admin data are available for NSIs
- data cover the population
Monthly and quarterly estimates: part of the admin data are ‘missing’
[Diagram: L.E. (survey) + admin data in both situations, with part of the admin data marked “Missing” for the monthly and quarterly estimates]
Assumption: if admin data are complete, they can be used for statistics
Challenge: how to estimate for ‘missing’ admin data in the case of monthly and quarterly estimates
Scope: turnover (VAT-registration), wages+employees (“social security data”)
Additional Value of ESSnet AdminData
• VAT = Value Added Tax
• The European Union value added tax (EU VAT) is a value added tax encompassing member states in the European Union VAT area. Joining in this is compulsory for member states of the European Union.
• Each Member State's national VAT legislation must comply with the provisions of EU VAT law as set out in Directive 2006/112/EC.
TRANSLATION TO STATISTICS
• INPUT: Available VAT-information quite similar in Europe!
• OUTPUT: obligations also similar in Europe (STS, SBS, ESR regulations)
• CONCLUSIONS ESSNET: methodological challenges in use of admin data are identical -> solutions may differ, but only to a limited extent
Two situations
Situation A:
L.E. (100 % sample) + VAT almost complete
GENERAL SITUATION FOR Q; t+45 days
established techniques
• Level estimates
• Imputation of missing data (with available VAT)
Situation B:
L.E. (100 % sample) + VAT not available or very limited
GENERAL SITUATION FOR M; t+30 days
experimental methods; NOT DISCUSSED FURTHER
SITUATION A. or B. FOR OTHER ESTIMATES (Q-flash; M-T+45/50d) DIFFERS PER COUNTRY
SITUATION A: Admin data coverage almost complete
ESTIMATION ONLY BASED ON ADMIN DATA
SITUATION B: Admin data coverage incomplete
ADMIN DATA = AUXILIARY INFORMATION
[Diagram: 100 % sample + admin data in the final situation; at the time of the STS estimate part of the admin data is missing. In situation B, sample and VAT are combined in the estimation.]
QUALITY STS-ESTIMATES: revision compared to final estimate
average bias: $B = \frac{1}{T} \sum_{t=1}^{T} e_t$
average error: $E = \frac{1}{T} \sum_{t=1}^{T} |e_t|$
Methods Situation A: methodology
[Diagram labels: L.E., SME, VAT T-x]
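The two quality measures named above can be sketched in a few lines. This is a minimal illustration under two assumptions that the slide does not spell out: the revision e_t is taken as the first STS estimate minus the final estimate for period t, and the average error is taken over absolute revisions. The function names and the example figures are hypothetical.

```python
def revisions(first_estimates, final_estimates):
    """Revision e_t per period: first estimate minus final estimate (assumed definition)."""
    return [first - final for first, final in zip(first_estimates, final_estimates)]


def average_bias(e):
    """Mean of the signed revisions; positive and negative revisions cancel out."""
    return sum(e) / len(e)


def average_error(e):
    """Mean of the absolute revisions (assumed reading of 'average error')."""
    return sum(abs(x) for x in e) / len(e)


# Hypothetical growth rates in percentage points for four periods.
first = [2.0, 1.5, -0.5, 3.0]
final = [2.5, 1.0, -0.5, 2.5]
e = revisions(first, final)

print("average bias:", average_bias(e))
print("average error:", average_error(e))
```

A bias close to zero combined with a larger average error would indicate revisions that are not systematic but still sizeable.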
Methods for imputations
• Analysed several production systems:
i.e. DE, F, “Nordic countries”, NL, I
• Imputation of “missing VAT” based on:
Ot/Ot-1, Ot/Ot-12 of available VAT, or similar approaches
• Stratification levels for calculating stratum imputations differ, from NACE 2-digit x 2 size classes to NACE 4-digit x 9 size classes
KEY QUESTION: Do these different approaches lead to different output, given that the methods are generally applied when coverage of L.E. survey + available VAT exceeds 90 % of the target variable?
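A stratum-level ratio imputation of the kind described above might look as follows. This is only a sketch, not any country's production system: the record layout (`stratum`, `prev`, `curr`) and the function names are assumptions, and `prev` stands for the reference period, which could be t-1 or t-12 depending on the rule.

```python
from collections import defaultdict


def stratum_ratios(units):
    """Ratio O_t / O_ref per stratum, using only units with available VAT."""
    num = defaultdict(float)
    den = defaultdict(float)
    for u in units:
        if u["curr"] is not None:
            num[u["stratum"]] += u["curr"]
            den[u["stratum"]] += u["prev"]
    return {s: num[s] / den[s] for s in num if den[s] > 0}


def impute_missing(units):
    """Fill missing current values with previous value times the stratum ratio."""
    ratios = stratum_ratios(units)
    result = []
    for u in units:
        value = u["curr"]
        imputed = False
        if value is None and u["stratum"] in ratios:
            value = u["prev"] * ratios[u["stratum"]]
            imputed = True
        result.append({**u, "curr": value, "imputed": imputed})
    return result


# Hypothetical stratum: NACE 47, small size class.
units = [
    {"id": 1, "stratum": ("47", "small"), "prev": 100.0, "curr": 110.0},
    {"id": 2, "stratum": ("47", "small"), "prev": 200.0, "curr": 220.0},
    {"id": 3, "stratum": ("47", "small"), "prev": 50.0, "curr": None},
]
completed = impute_missing(units)
print(completed[2])
```

Changing the stratification level only changes the `stratum` key, which makes it straightforward to compare coarse and fine stratifications on the same data.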
Methods for imputations: testing of different methodologies (example Estonia)
Conclusion: imputation methods provide similar results if the population is fixed and VAT covers > 80 % of the population
[Figure: Turnover growth rate, NACE 47. x-axis: month, year 2011 (1 to 8); y-axis: growth rate (1.0 to 1.4). Series: imputations based on Ot/Ot-12 and Ot/Ot-1 at NACE2 and STS stratification level, for the 1st and 2nd transmission, compared with the survey growth rate.]
Comparing imputations with realisations (approach Statistics Finland)
• Five imputation rules for current period at micro-level
• Imputation rules are automatically evaluated and compared by calculating maximum proportional forecast errors using data for the five latest months. The selection rules are:
• An imputation rule with a maximum proportional forecast error < 20% and the same direction of change as in the last two months is automatically admissible;
• The model with the smallest maximum error is considered best
Main difference with other detected practices:
• No assumption that available VAT = representative
• Not all missing data imputed (in practice 20 - 50 %)
Imputation rules:
• Mean annual change
• Geometric mean of monthly changes
• Previous turnover
• Mean turnover
• Turnover of comparison month
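The selection logic described for Statistics Finland can be sketched roughly as below. The three rule implementations are hypothetical readings of the rule names, not the actual definitions used in Finland, and the direction-of-change condition is left out because the slide does not define it precisely. Only the maximum proportional forecast error over the five latest months and the 20 % threshold are taken from the text.

```python
from statistics import geometric_mean, mean


def previous_turnover(history):
    """Forecast: the last observed turnover."""
    return history[-1]


def mean_turnover(history):
    """Forecast: the mean of all observed turnover values."""
    return mean(history)


def geometric_mean_change(history):
    """Forecast: last turnover times the geometric mean of monthly changes."""
    ratios = [b / a for a, b in zip(history, history[1:])]
    return history[-1] * geometric_mean(ratios)


RULES = {
    "previous_turnover": previous_turnover,
    "mean_turnover": mean_turnover,
    "geometric_mean_change": geometric_mean_change,
}


def max_proportional_error(rule, series, n_eval=5):
    """Largest |forecast - actual| / actual over the n_eval latest months."""
    errors = []
    for k in range(len(series) - n_eval, len(series)):
        forecast = rule(series[:k])
        errors.append(abs(forecast - series[k]) / series[k])
    return max(errors)


def select_rule(series, threshold=0.20):
    """Best rule by smallest maximum error; None if no rule stays under the threshold."""
    scores = {name: max_proportional_error(rule, series) for name, rule in RULES.items()}
    best = min(scores, key=scores.get)
    if scores[best] < threshold:
        return best, scores[best]
    return None, scores[best]


steady = [100, 102, 104, 106, 108, 110, 112, 114]
erratic = [100, 200, 50, 300, 20, 400, 10, 500]
print(select_rule(steady))
print(select_rule(erratic))
```

Returning `None` when no rule is admissible mirrors the point that not all missing data are imputed in this approach.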
Comparing imputations with realisations (more precise conclusions)
Explanations:
- Outlier effect on calculated Ot/Ot-1 or Ot/Ot-12 values
- Late VAT-reporters are likely a selective group in countries with automatic fining systems for late VAT-reporting.
Impact of selectivity on output is generally negligible due to the high coverage of available data
Which units to impute
PROVISIONAL POPULATION (at STS-estimate) versus FINAL POPULATION (when all data complete): A. ACTIVE, B. INACTIVE
I. ACTIVE: reporter active
   A. ACTIVE: (a); B. INACTIVE: x
II. ASSUMED ACTIVE: (late) reporting expected, IMPUTED VALUE
   A. ACTIVE: correctly assumed active (b); B. INACTIVE: incorrectly assumed active (c)
III. ASSUMED INACTIVE: no (late) reporting expected, NO IMPUTED VALUE
   A. ACTIVE: incorrectly assumed inactive (c); B. INACTIVE: correctly assumed inactive (a)
ESTIMATE = I + II + III
REVISED ESTIMATE = A.
REVISION DUE TO: (a) revised VAT; (b) imputation technique; (c) uncertainty provisional population
Impact on results: example Italy
Columns:
a = Early reporters
b = Units with imputed missing values, with later reporting: imputed values at t
c = Units with imputed missing values, with later reporting: reported values at t+12
d = Units with imputed missing values, without later reporting: imputed values at t
e = Units without imputed values at t but reporting at t+12
f = (b+d) = Total imputed value
g = (c+e) = Total reported value
h = (a+f) = Imputed value
i = (a+g) = Reported values
b versus c: imputation technique; d versus e: uncert. provisional population

Nace Division     a     b     c     d     e     f     g     h      i
10             98.2   1.2   1.2   0.4   0.7   1.6   1.8  99.8  100.0
15             98.4   1.0   1.0   0.5   0.7   1.5   1.6  99.8  100.0
25             98.6   0.9   0.9   0.3   0.4   1.3   1.4  99.9  100.0
28             98.5   1.0   1.0   0.3   0.5   1.3   1.5  99.8  100.0
30             98.1   1.2   1.2   0.5   0.7   1.7   1.9  99.8  100.0
41             97.7   1.4   1.4   0.8   1.0   2.2   2.3  99.9  100.0
47             98.0   1.2   1.2   0.6   0.8   1.8   2.0  99.8  100.0
64             98.2   1.2   1.2   0.3   0.6   1.5   1.8  99.7  100.0
71             98.3   1.1   1.1   0.3   0.6   1.5   1.7  99.8  100.0
81             96.3   1.8   1.8   0.7   2.0   2.5   3.7  98.8  100.0
Conclusion: the effect on the revision caused by uncertainty about which units to impute is larger than the effect of the imputation technique itself
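The conclusion can be checked against the Italian figures with a short calculation. The sketch below assumes one particular reading of the columns: the imputation-technique effect is taken as c minus b (reported minus imputed value for units that did report later), and the provisional-population effect as e minus d (late reporters that were not imputed, minus imputed units that never reported). The function name is hypothetical, and because the published figures are rounded to one decimal the results are indicative only.

```python
# (NACE division, b, c, d, e) copied from the Italy table.
ROWS = [
    (10, 1.2, 1.2, 0.4, 0.7),
    (15, 1.0, 1.0, 0.5, 0.7),
    (25, 0.9, 0.9, 0.3, 0.4),
    (28, 1.0, 1.0, 0.3, 0.5),
    (30, 1.2, 1.2, 0.5, 0.7),
    (41, 1.4, 1.4, 0.8, 1.0),
    (47, 1.2, 1.2, 0.6, 0.8),
    (64, 1.2, 1.2, 0.3, 0.6),
    (71, 1.1, 1.1, 0.3, 0.6),
    (81, 1.8, 1.8, 0.7, 2.0),
]


def decompose(b, c, d, e):
    """Split the revision into a technique part and a population part (assumed reading)."""
    technique = round(c - b, 1)   # imputed units that reported later
    population = round(e - d, 1)  # wrongly imputed vs wrongly not imputed units
    return technique, population


for nace, b, c, d, e in ROWS:
    technique, population = decompose(b, c, d, e)
    print(f"NACE {nace}: technique {technique:+.1f}, population {population:+.1f}")
```

Under this reading the technique effect is zero at one-decimal precision in every division, while the population effect ranges from 0.1 to 1.3.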
Conclusions
• When using admin data for STS, missing data are imputed
• Most widely used imputation rules are: Ot/Ot-1 or Ot/Ot-12
• Given the large coverage of available data, the exact imputation technique chosen has only limited impact on the outcome, despite the indication that the main assumption of the techniques used, “available VAT = representative”, might not be 100 % correct.
• More important than the imputation technique is the estimate for the provisional population