Antonio Bernardi - Fulvia Cerroni - Viviana De Giorgi (Istat) An application to the Tax Authority Source (Sector Studies) Session: Administrative data 10 July 2008 A methodological process for assessing variables coming from administrative sources


Antonio Bernardi - Fulvia Cerroni - Viviana De Giorgi (Istat)

An application to the Tax Authority Source (Sector Studies)

Session: Administrative data

10 July 2008

A methodological process for assessing variables coming from administrative sources


Agenda


Part 1 - Scheme for assessing administrative sources for statistical use

Part 2 - The process for assessing variables: the theory

Part 3 - An application to the Tax Authority Source - Sector Studies (SS)



Background and motivations


• use of administrative archives in place of statistical surveys

• much more information on small and medium-sized enterprises (SMEs)

• reducing the statistical burden

• development of a general scheme for validating administrative data as statistical ones

• focus on the process of assessing quantitative variables with benchmark

• Sector Studies (SS) compared with the statistical survey on SMEs as a benchmark source


Scheme for assessing administrative sources 1/2


GENERAL ANALYSIS OF THE SOURCE

SELECTION OF THE ARCHIVE’S PARTS AND/OR VARIABLES TO BE ASSESSED

ANALYSIS OF VARIABLES

OUTPUT: SOURCE’S ASSESSMENT FOR STATISTICAL USE

Part 1


Scheme for assessing administrative sources 2/2


Part 1

Preliminary judgement on an administrative archive

Is it possible to identify a well-defined universe?    yes/no
Reference population for coverage                       yes (specify)
Mean coverage level                                     (specify percentage)
Coverage level (by existing disaggregation)             between … and … (specify)
Are there any benchmark variables?                      yes (specify)/no
Can data be imported in a SAS format?                   yes/no
Data delivery timeliness                                (specify)
Is a formal request needed for data release?            yes/no
Variables’ classifications                              specify existing problems
Judgement                                               we can/cannot go on processing the source



Scheme for assessing quantitative variables having a benchmark


Part 2

[Flowchart. The Part 1 scheme (general analysis of the source, selection of the archive’s parts and/or variables to be assessed, analysis of variables, output: source’s assessment for statistical use) is shown again, with the "analysis of variables" step expanded as follows.]

INPUT: DATA (ARCHIVES)

QUALITATIVE ASSESSMENT

• Is comparison of definitions effective? Y: go to the quantitative assessment. N: forcing the definition

• Is numerical evaluation effective? Y: go to the quantitative assessment. N: looking for a new benchmark, or END

QUANTITATIVE ASSESSMENT

• Outlier detection

• Standard validation

• Practical validation: frequency validation, by group average validation, microdata validation

OUTPUT: VARIABLE’S ASSESSMENT FOR STATISTICAL USE


Qualitative and quantitative assessment of a variable 1/2


1. Outlier detection: irregular values/outliers

Irregular values:

• legal and economic constraints are taken into account

• no systematic scheme exists for them

Outliers: 2 out of 3 criteria must be satisfied

i. statistical/probabilistic (Bienaymé–Chebyshev inequality)

ii. computational/explorative (k-means clustering method)

iii. deterministic (relative differences within the threshold values of 5%, 2% or 1%)

• no systematic scheme exists for them
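The 2-out-of-3 rule can be sketched as a simple vote among three flags. The slides do not say what each criterion is computed on, so everything below is an assumption made for illustration: the criteria are applied to the deviation between the source value and its benchmark, the Chebyshev rule uses k = 3 standard deviations, the clustering step is a two-cluster 1-D k-means on absolute deviations, and the deterministic rule uses the 5% threshold. All function names are hypothetical.

```python
import numpy as np

def chebyshev_flag(dev, k=3.0):
    """Criterion i (assumed form): deviation more than k standard deviations
    from the mean. By the Bienayme-Chebyshev inequality at most 1/k**2 of any
    distribution can lie that far out. k=3 is an illustrative choice."""
    dev = np.asarray(dev, dtype=float)
    sd = dev.std()
    if sd == 0.0:
        return np.zeros(dev.shape, dtype=bool)
    return np.abs(dev - dev.mean()) > k * sd

def kmeans_flag(dev, n_iter=100):
    """Criterion ii (assumed form): two-cluster 1-D k-means on |deviation|;
    members of the cluster with the larger centre are flagged."""
    x = np.abs(np.asarray(dev, dtype=float))
    centres = np.array([x.min(), x.max()])
    far = np.zeros(x.shape, dtype=bool)
    for _ in range(n_iter):
        far = np.abs(x - centres[1]) < np.abs(x - centres[0])
        if not far.any():  # all values identical: nothing to flag
            break
        new = np.array([x[~far].mean(), x[far].mean()])
        if np.allclose(new, centres):
            break
        centres = new
    return far

def relative_flag(source, benchmark, threshold=0.05):
    """Criterion iii (assumed form): relative difference above the threshold.
    Assumes strictly positive benchmark values."""
    source = np.asarray(source, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    return np.abs(source - benchmark) / np.abs(benchmark) > threshold

def outliers_two_of_three(source, benchmark, k=3.0, threshold=0.05):
    """A record is an outlier when at least two of the three flags are raised."""
    source = np.asarray(source, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    dev = source - benchmark
    votes = (chebyshev_flag(dev, k).astype(int)
             + kmeans_flag(dev).astype(int)
             + relative_flag(source, benchmark, threshold).astype(int))
    return votes >= 2

# Illustrative data only: 20 records, one gross error and two small gaps.
benchmark = np.arange(100.0, 2100.0, 100.0)
source = benchmark.copy()
source[0] += 10.0    # 10% gap on a small value: only the deterministic flag fires
source[3] += 2.0     # 0.5% gap: no flag fires
source[7] *= 2.0     # gross error: all three flags fire
print(np.flatnonzero(outliers_two_of_three(source, benchmark)))
```

Requiring two votes keeps a record such as the first one, which exceeds the 5% threshold but is small in absolute terms, from being discarded on a single criterion.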

Part 2


Quantitative assessment of a variable 2/2

2. Standard validation: for both the source variable and its benchmark

• calculation of the main descriptive statistics (mean, standard deviation, median, skewness, kurtosis), checking whether the distance between the two variables decreases from the raw to the trimmed distribution

• through the kernel histogram, check whether the series have the same graphical shape and whether the distribution of the deviations is symmetric, leptokurtic and has zero mean

3. Practical validation: useful for specific surveys and studies, to check the level of concordance between the variable and its benchmark

• Frequency validation: concordance by class frequencies, measured by the simple index of dissimilarity, Cohen’s kappa coefficient, the relative weight of the frequencies on the main diagonal, and a goodness-of-fit test of a log-linear model

• By group validation: per-group concordance, checked through the linearity of the groups’ means

• Micro-data validation: robust point-to-point correspondence through regression techniques
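The frequency-validation measures can be illustrated on a cross-tabulation of the source classes against the benchmark classes. The sketch below is not the authors' procedure: the class boundaries, the function names and the example table are assumptions, and the log-linear model test is left out. It computes the share of records on the main diagonal, the share on the two adjacent diagonals, the simple index of dissimilarity between the two marginal distributions, and Cohen's kappa.

```python
import numpy as np

def concordance_table(source, benchmark, edges):
    """Cross-tabulate classes: rows = source class, columns = benchmark class.
    `edges` are increasing class boundaries (hypothetical, chosen by the analyst)."""
    rows = np.digitize(source, edges)
    cols = np.digitize(benchmark, edges)
    k = len(edges) + 1
    table = np.zeros((k, k), dtype=int)
    np.add.at(table, (rows, cols), 1)  # count each (row, col) pair
    return table

def frequency_measures(table):
    """Concordance measures from a square contingency table."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    row, col = p.sum(axis=1), p.sum(axis=0)
    diagonal = np.trace(p)
    adjacent = np.trace(p, offset=1) + np.trace(p, offset=-1)
    expected = float(row @ col)  # agreement expected under independence
    return {
        "diagonal_share": diagonal,
        "adjacent_share": adjacent,
        "dissimilarity": 0.5 * np.abs(row - col).sum(),
        "kappa": (diagonal - expected) / (1.0 - expected),
    }

# Illustrative 3x3 table of 100 records (not data from the study).
example = np.array([[40, 5, 0],
                    [3, 30, 2],
                    [0, 4, 16]])
for name, value in frequency_measures(example).items():
    print(f"{name}: {value:.3f}")
```

A high diagonal share combined with a kappa well above zero indicates that the two sources classify the same units in the same way far more often than independence would predict.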

Part 2



Part 3

Assessing the source: The accounting table of Sector Studies

Preliminary judgement on the accounting table of Sector Studies

Is it possible to identify a well-defined universe?    yes
Reference population for coverage                       Italian Business Register (ASIA)
Mean coverage level                                     79.4%
Coverage level (by existing disaggregation)             between 65% and 90%
Are there any benchmark variables?                      yes (SME survey)
Can data be imported in a SAS format?                   yes
Data delivery timeliness                                15-month time lag
Is a formal request needed for data release?            yes
Variables’ classifications                              some differences exist but they can be overcome
Judgement                                               the accounting table can be processed through the procedure for assessing variables


Part 3 Qualitative assessment

First hypothesis: assess each cost variable of Sector Studies with its own SME survey benchmark

Results: the comparison of definitions is not effective for the individual variables. Even after forcing the definition, the numerical evaluation is not effective: an appropriate combination of variables, with a new benchmark, should be considered

Second hypothesis: assess total cost of Sector Studies with the total cost of SME survey

Total cost of SS = Total cost of SME survey

Assessing the variable: the total cost 1/5


Assessing the variable: the total cost 2/5

Part 3 Quantitative assessment

Outlier detection and standard validation

Percentiles
Variable              Min   1st   5th   10th   25th   Median   75th    90th    95th    99th     Max
Total cost of SS        1     2     6     10     36      192    957   2,538   3,473   4,700   6,566
Total cost of SMEs      1     2     5     10     36      192    960   2,532   3,488   4,699   6,660

Variable                          N of records    Mean    St. dev.   Coeff. of var.   Skewness   Kurtosis
Total cost of SS (raw data)             29,464   762.0    1,153.01            151.3       1.99       3.88
Total cost of SS (trimmed data)         29,123   749.1    1,132.77           151.22       1.94       3.12
Total cost of SME survey                29,123   750.6    1,135.54           151.28       1.94       3.13
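The standard-validation check, whether the distance from the benchmark shrinks when moving from the raw to the trimmed data, can be reproduced from the published means and standard deviations alone. The sketch below uses only those figures; the helper names are hypothetical, and the percentage gap relative to the benchmark is one assumed way of measuring the "distance".

```python
def coeff_of_variation(mean, std):
    """Coefficient of variation in percent."""
    return 100.0 * std / mean

def relative_gap(value, benchmark):
    """Signed percentage gap of a statistic from its benchmark counterpart."""
    return 100.0 * (value - benchmark) / benchmark

# (mean, standard deviation) as reported in the table of descriptive statistics.
stats = {
    "SS raw": (762.0, 1153.01),
    "SS trimmed": (749.1, 1132.77),
    "SME survey": (750.6, 1135.54),
}

bench_mean, bench_std = stats["SME survey"]
for label in ("SS raw", "SS trimmed"):
    mean, std = stats[label]
    print(label,
          f"CV={coeff_of_variation(mean, std):.2f}",
          f"mean gap={relative_gap(mean, bench_mean):+.2f}%",
          f"std gap={relative_gap(std, bench_std):+.2f}%")
```

With these figures the gap in the mean falls from about 1.5% on the raw data to about 0.2% in absolute value on the trimmed data, and the gap in the standard deviation behaves the same way, which is the pattern the standard validation looks for.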


Part 3

Fig 1. Distribution of the deviations of SS from SME survey values

Assessing the variable: the total cost 3/5


Part 3

Practical validation

Frequency validation: the two sources are not independent. The percentage of frequencies on the main diagonal (79.8%) plus the percentage on the adjacent diagonals reaches 95.8%

By group validation

Assessing the variable: the total cost 4/5


Part 3

Micro-data validation

Correlation coefficient (Pearson): 0.99837

Linear regression: TC(SS) = α + β × TC(SMEs)

α ≈ 0, β ≈ 1, R² = 0.9967

Point-to-point correspondence (through the robust regression method): 87.8%

Conclusion

Judgement on the total cost: the variable is reliable at an individual level

Assessing the variable: the total cost 5/5
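The regression part of the micro-data validation can be sketched with an ordinary least-squares fit of the source on the benchmark. This is a minimal illustration on made-up data, not the authors' computation: the function name is hypothetical, and the robust regression step behind the point-to-point percentage is not reproduced because the slides do not specify it. The sketch also shows that, for a simple linear regression, R² equals the squared Pearson coefficient, which is consistent with the reported pair 0.99837 and 0.9967.

```python
import numpy as np

def microdata_fit(source, benchmark):
    """OLS fit source = alpha + beta * benchmark, with Pearson r and R^2."""
    x = np.asarray(benchmark, dtype=float)
    y = np.asarray(source, dtype=float)
    beta, alpha = np.polyfit(x, y, 1)        # highest degree first
    r = np.corrcoef(x, y)[0, 1]
    resid = y - (alpha + beta * x)
    r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return alpha, beta, r, r2

# Illustrative data only: the source equals the benchmark up to a small
# alternating error, so alpha is close to 0 and beta close to 1.
benchmark = 10.0 * np.arange(1, 101)
source = benchmark + 0.5 * (-1.0) ** np.arange(1, 101)

alpha, beta, r, r2 = microdata_fit(source, benchmark)
print(f"alpha={alpha:.3f} beta={beta:.5f} r={r:.5f} R2={r2:.5f}")

# Consistency of the reported figures: 0.99837 squared rounds to 0.9967.
print(round(0.99837 ** 2, 4))
```

An intercept near 0 together with a slope near 1 means the source reproduces the benchmark record by record, not just on average, which is what supports the judgement at the individual level.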


Summary of the overall process


Part 3

Pre-analysis

Qualitative validation (definitions)
Analysis variable       Total costs, SS
Benchmark variable      Total costs, SMEs

Basic validation

                          Records compared   % of total
Original data                       29,149        100.0
Extreme data                           201          0.7
Reduced (trimmed) data              28,948         99.3

Characteristic value   Analysis variable   Benchmark variable   % change between characteristic values
Mean                               785.6                787.9   -0.3
Median                             207.0                207.0   -
Std                              1,187.5              1,191.4   -0.3
CV                                 151.2                151.2   -
1st percentile                       1.0                  1.0   -
5th percentile                       5.0                  5.0   -
10th percentile                     10.0                 10.0   -
25th percentile                     38.0                 38.0   -
75th percentile                  1,023.0              1,026.0   -0.3
90th percentile                  2,609.0              2,615.0   -0.2
95th percentile                  3,595.0              3,599.0   -0.1
99th percentile                  4,734.0              4,765.0   -0.7

Distributional aspects   Analysis variable    Benchmark variable
Skewness                 Positive (2.3)       Positive (2.3)
Kurtosis                 Leptokurtic (8.8)    Leptokurtic (8.7)

Judgement on the basic validation: the validation process of the variable can continue.

Functional validation

Validation at the level of data summarized in frequencies
% of data on the main diagonal         82.0
% of data in the lower triangle        10.4
% of data in the upper triangle         7.6
Base model IM: deviance/df            193.5
QIM model: deviance/df                 27.7

Validation at the level of data summarized in group means
Linear trend of the means

Point-level validation
Correlation index (Pearson)         0.99871
% of robust data                       88.3
% of non-robust data                   11.7

Overall judgement on the variable: the variable can be considered very reliable from a statistical point of view.


Thank you for your attention

For further information:

Antonio Bernardi: [email protected]

Fulvia Cerroni: [email protected]

Viviana De Giorgi: [email protected]