

Objective: building a longitudinal micro-database on Italian formal educational paths and qualifications, using administrative microdata.

Useful to: update the Educational Attainment of Population and support the production of socio-economic statistics.

From 2011 onwards, the BIT is fed with information on:

• educational attainment in the reference year (ISCED 2011 - first digit);

• the course of study attended (type of course, school/university);

• the qualifications acquired from 2011 onwards (attainment date, school/university, type of qualification coded in detail).

E.g. extraction of individual_code 52881082 from Table 1 and Table 2

• Accurate analysis of the available administrative sources, in close cooperation with the data holders (mainly the Ministry for Education, University and Research):

  to evaluate data availability and usability;

  to acquire fundamental information on data and metadata;

  to define actions to improve data quality and data availability from data holders of statistically relevant and usable administrative datasets.

• Loading, storage and integration of the administrative microdata in the Integrated System of Microdata (SIM):

  to standardize and streamline processes;

  to comply with legislation on data privacy.

• Contextual assignment to each statistical unit (individuals, economic units, places) of an identification code (SIM-ID code), unique across administrative datasets and across time.
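The idea of an identification code that stays the same across datasets and over time can be illustrated with a minimal Python sketch. It is not the SIM procedure: the class `IdRegistry`, the code format and the assumption that every source carries a common key for each unit are all hypothetical, and a real integration step would normally need record linkage instead of exact matching.

```python
import itertools


class IdRegistry:
    """Toy registry that hands out one persistent code per statistical unit.

    Hypothetical illustration only: it assumes each source provides a shared
    key for the unit, which is matched exactly after light normalisation.
    """

    def __init__(self):
        self._codes = {}
        self._counter = itertools.count(1)

    def assign(self, unit_type, source_key):
        # Normalise the key so the same unit is recognised in every dataset.
        key = (unit_type, source_key.strip().upper())
        if key not in self._codes:
            # Made-up code format: first letter of the unit type plus a serial.
            prefix = unit_type[0].upper()
            self._codes[key] = f"{prefix}{next(self._counter):08d}"
        return self._codes[key]


registry = IdRegistry()
code_in_pupils_register = registry.assign("individual", " abc123 ")
code_in_university_register = registry.assign("individual", "ABC123")
code_of_another_person = registry.assign("individual", "XYZ789")
```

Because the registry keeps every code it has issued, the same unit receives the same code whenever it reappears, whether in another dataset or in a later year.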

Phase 2:

A specific database on Education and Qualifications named BIT (‘Base integrata su Istruzione e Titoli di studio’) integrates all relevant educational datasets. These administrative datasets record individual enrolments and qualifications and provide microdata from Primary education to PhD courses.

• STRUCTURE: two tables:

  Table 1 – partitioned by year, contains the individual status in education in each year;

  Table 2 – contains all information recorded in the administrative datasets about the qualifications acquired.

• POPULATION: microdata refer to people who are in a formal educational path in Italy, from 2011 onwards.

The acquisition of administrative data for the construction of the BIT is a complex process that evolves over time, making it possible to improve data quality, especially data completeness. The presence of the SIM codes for integration and the overall good quality already make it possible to support statistical processes and to develop longitudinal studies. The first results are promising in view of the construction of the Education Register.

INPUT DATA QUALITY

• Availability is improving over time.

• Punctuality needs to be improved: it results in a BIT timeliness of 16 months.

• No significant number of missing values for relevant variables in each dataset.

• Good integrability: no relevant missing values for identification variables in each dataset.
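A check of the kind behind the missing-value statements above can be sketched in Python as follows. The function name `missing_share` and the toy records are hypothetical and are not taken from any real input dataset; only the variable names echo the columns of Table 1.

```python
def missing_share(records, variables):
    """Return, for each variable, the share of records where it is missing."""
    if not records:
        return {name: 0.0 for name in variables}
    shares = {}
    for name in variables:
        # A value counts as missing when it is absent, None or an empty string.
        missing = sum(1 for rec in records if rec.get(name) in (None, ""))
        shares[name] = missing / len(records)
    return shares


# Made-up toy records, used only to show the calculation.
toy_dataset = [
    {"INDIVIDUAL_CODE": "A1", "INSTITUTE_CODE": "X01", "COURSE_YEAR": 1},
    {"INDIVIDUAL_CODE": "A2", "INSTITUTE_CODE": "", "COURSE_YEAR": 2},
    {"INDIVIDUAL_CODE": "A3", "INSTITUTE_CODE": "X02", "COURSE_YEAR": None},
    {"INDIVIDUAL_CODE": "A4", "INSTITUTE_CODE": "X02", "COURSE_YEAR": 3},
]
shares = missing_share(
    toy_dataset, ["INDIVIDUAL_CODE", "INSTITUTE_CODE", "COURSE_YEAR"]
)
```

Running the same function on identification variables and on the other relevant variables gives two separate indicators, one for integrability and one for completeness.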

BIT DATA QUALITY - Coverage

Figure (bar chart): Enrolled in 2013-14, coverage. Benchmark macro data vs BIT, on an axis from 0 to 3.000.000, for each level: PhD; Master's degree; A.F.A.M. second level; Bachelor's degree; A.F.A.M. first level; Short degree (o.p.); Academic Diploma (o.p.); ITS; IFTS; Upper secondary edu.; IFP; Lower secondary edu.; Primary school.

Algorithms are implemented to derive from administrative data the statistical variable Educational Attainment, consistent with the first digit of ISCED 2011.

The level of education, as an ordinal variable, is used both to recover missing data and to check consistency over time.
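How an ordinal level of education can serve both purposes is sketched below in Python. This is not the BIT algorithm: the function name, the carry-forward rule for missing years and the assumption that a larger code means a higher level (as with '07' and '13' in the example tables) are all illustrative choices.

```python
from __future__ import annotations

from typing import Optional


def recover_and_check(
    levels: dict[int, Optional[int]],
) -> tuple[dict[int, Optional[int]], list[int]]:
    """Fill missing yearly attainment codes and flag years where the level drops.

    Assumption: codes are ordinal, so a person's level can only stay the same
    or increase from one year to the next.
    """
    filled: dict[int, Optional[int]] = {}
    inconsistent: list[int] = []
    highest_so_far: Optional[int] = None
    for year in sorted(levels):
        level = levels[year]
        if level is None:
            # Recover the missing value by carrying the last known level forward.
            level = highest_so_far
        elif highest_so_far is not None and level < highest_so_far:
            # A decrease over time is flagged as an inconsistency.
            inconsistent.append(year)
        filled[year] = level
        if level is not None and (highest_so_far is None or level > highest_so_far):
            highest_so_far = level
    return filled, inconsistent


# Toy history: the 2012 value is missing and the 2015 value is lower than 2014.
history = {2011: 7, 2012: None, 2013: 7, 2014: 13, 2015: 7}
filled_history, flagged_years = recover_and_check(history)
```

The carried-forward value is only a lower bound on the true level, so in practice it would be confirmed against the qualification records where they exist.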

The Census Classification has been implemented in the BIT. University qualifications have been coded at the finest level of detail by an automated record linkage procedure and residual clerical review.

Sources                                                       Year of availability
Results for Primary and Secondary School Qualifications      2013
National Register of University Students (ANS)               2013
National Register of Pupils                                   2014
Bolzano Register of Pupils                                    2015
PhD degrees and PhD enrolled students (I° year)               2016
Enrolled and Diplomas of Higher Technical education (ITS)     2016

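Level of education        Benchmark macro data         BIT
Primary school                       2.799.553   2.803.761
Lower secondary edu.                 1.743.587   1.729.879
IFP                                    328.174           0
Upper secondary edu.                 2.647.057   2.670.734
IFTS                                     2.496           0
ITS                                      4.163       3.985
Academic Diploma (o.p.)                  5.190           0
Short degree (o.p.)                         78         155
A.F.A.M. first level                    38.784           0
Bachelor's degree                    1.022.273   1.109.480
A.F.A.M. second level                   12.921           0
Master's degree                        651.942     674.679
PhD                                     33.507      11.275
Tot.                                 9.289.625   9.003.948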
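The coverage comparison can be reproduced with a short Python sketch. The figures are those listed above; pairing each row of figures with a level of education, from Primary school up to PhD, is an assumption based on the order in which the values appear, and the names `coverage_ratio` and `undercoverage` are introduced here only for illustration.

```python
# (benchmark macro data, BIT) for those enrolled in 2013-14.
# Assumption: each pair of figures belongs to the level named next to it.
ENROLLED_2013_14 = {
    "Primary school": (2_799_553, 2_803_761),
    "Lower secondary edu.": (1_743_587, 1_729_879),
    "IFP": (328_174, 0),
    "Upper secondary edu.": (2_647_057, 2_670_734),
    "IFTS": (2_496, 0),
    "ITS": (4_163, 3_985),
    "Academic Diploma (o.p.)": (5_190, 0),
    "Short degree (o.p.)": (78, 155),
    "A.F.A.M. first level": (38_784, 0),
    "Bachelor's degree": (1_022_273, 1_109_480),
    "A.F.A.M. second level": (12_921, 0),
    "Master's degree": (651_942, 674_679),
    "PhD": (33_507, 11_275),
}


def coverage_ratio(benchmark, bit):
    """BIT count divided by the benchmark count."""
    return bit / benchmark if benchmark else float("nan")


def undercoverage(data):
    """Levels where the BIT falls short of the benchmark, largest gap first."""
    gaps = {
        level: bench - bit for level, (bench, bit) in data.items() if bench > bit
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


ratios = {
    level: coverage_ratio(bench, bit)
    for level, (bench, bit) in ENROLLED_2013_14.items()
}
gaps = undercoverage(ENROLLED_2013_14)
bit_total = sum(bit for _, bit in ENROLLED_2013_14.values())
```

Ranking by absolute gap rather than by ratio keeps the focus on the levels that remove the most individuals from the database, which is how undercoverage is discussed in the results.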

Table 1

YEAR | INDIVIDUAL_CODE | EDUCATIONAL_ATTAINMENT | QUALIFICATION_VALIDITY_DATE | STATUS_CODE | INSTITUTE_CODE | COURSE_YEAR | CONSISTENCY | COURSE_CODE | CLASS_CODE | STATUS_SOURCES | STATUS_EVENT_DESCRIPTION | …
2011 | 52881082 | 07 | 01/08/2011 | 3 | PDPS01301A | 5 | 0 | | | Pupils_Register2010/2011 | Upper secondary school - Frequency | …
2012 | 52881082 | 07 | 01/01/2012 | MT | 19 | 1 | 0 | 134462 | 2027 | Univ_Enrollements2011/2012 | Bachelor’s - Admission | …
2013 | 52881082 | 07 | 01/01/2013 | MT | 19 | 2 | 0 | 134462 | 2027 | Univ_Enrollements2012/2013 | Bachelor’s - Annual enrollment | …
2014 | 52881082 | 13 | 25/09/2014 | MT | 19 | 3 | 0 | 134462 | 2027 | Univ_Enrollements2013/2014 | Bachelor’s - Annual enrollment | …
2015 | 52881082 | 13 | 01/01/2015 | MS | 19 | 1 | 0 | 155523 | 3072 | Univ_Enrollements2013/2015 | Master's - Admission | …

Table 2

Field | First qualification | Second qualification
INDIVIDUAL_CODE | 52881082 | 52881082
EDU_ATTAINMENT | 07 | 13
QUALIFICATION_DATE | 01/08/2011 | 25/09/2014
QUALIFICATION_SOURCE | SCHOOL_RESULTS 2010/2011 | UNIVERSITY_GRADUATED2014
INSTITUTE_CODE | PDPS01301A | 19
INSTITUTE_DESCRIPTION | SCIENTIFIC HIGH SCHOOL | UNIVERSITY OF PADOVA
QAL_CODE | PS00 | MT
QUAL_DESCRIPTION | SCIENTIFIC HIGH SCHOOL DIPLOMA | Bachelor’s degree – CHEMISTRY
QUAL_VOTE | 97/100 | 98/110
UNI_COURSE_CODE | | 134462
UNI_CLASS_CODE | | 2027
LEVEL4_DESCRIPTION | Diploma of upper secondary school specializing in scientific studies | Chemistry
LEVEL4_CODE | 40501 | 71002001
LEVEL3_DESCRIPTION | Diploma of upper secondary school specializing in scientific studies | Chemistry; industrial chemistry
… | … | …
LEVEL2_DESCRIPTION | Diploma of upper secondary school specializing in scientific studies | Pharmaceutical-chemical field
… | … | …
LEVEL1_DESCRIPTION | Diploma of upper secondary education (4-5-years) | Bachelor's degree
LEVEL1_CODE | 40000 | 71000000

EDUCATIONAL_ATTAINMENT ‘07’ = Diploma of upper secondary education

EDUCATIONAL_ATTAINMENT ‘13’ = Bachelor’s degree
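The extraction of individual_code 52881082 from the two tables can be mimicked with a small Python sketch. Only a few columns are kept, the in-memory lists stand in for the real database tables, and the helper names `extract` and `attainment_from_qualifications` are hypothetical; dates are read as day/month/year.

```python
from datetime import date

# Subset of Table 1 columns for the example individual.
TABLE_1 = [
    {"YEAR": 2011, "INDIVIDUAL_CODE": "52881082", "EDUCATIONAL_ATTAINMENT": "07"},
    {"YEAR": 2012, "INDIVIDUAL_CODE": "52881082", "EDUCATIONAL_ATTAINMENT": "07"},
    {"YEAR": 2013, "INDIVIDUAL_CODE": "52881082", "EDUCATIONAL_ATTAINMENT": "07"},
    {"YEAR": 2014, "INDIVIDUAL_CODE": "52881082", "EDUCATIONAL_ATTAINMENT": "13"},
    {"YEAR": 2015, "INDIVIDUAL_CODE": "52881082", "EDUCATIONAL_ATTAINMENT": "13"},
]

# Subset of Table 2 columns for the same individual.
TABLE_2 = [
    {"INDIVIDUAL_CODE": "52881082", "EDU_ATTAINMENT": "07",
     "QUALIFICATION_DATE": date(2011, 8, 1)},
    {"INDIVIDUAL_CODE": "52881082", "EDU_ATTAINMENT": "13",
     "QUALIFICATION_DATE": date(2014, 9, 25)},
]


def extract(individual_code):
    """Return the yearly statuses and the qualifications of one individual."""
    statuses = sorted(
        (row for row in TABLE_1 if row["INDIVIDUAL_CODE"] == individual_code),
        key=lambda row: row["YEAR"],
    )
    qualifications = sorted(
        (row for row in TABLE_2 if row["INDIVIDUAL_CODE"] == individual_code),
        key=lambda row: row["QUALIFICATION_DATE"],
    )
    return statuses, qualifications


def attainment_from_qualifications(qualifications, year):
    """Attainment code of the latest qualification acquired by the end of `year`."""
    held = [q for q in qualifications if q["QUALIFICATION_DATE"].year <= year]
    return held[-1]["EDU_ATTAINMENT"] if held else None


statuses, qualifications = extract("52881082")
derived = {
    row["YEAR"]: attainment_from_qualifications(qualifications, row["YEAR"])
    for row in statuses
}
```

For this individual the attainment derived from Table 2 agrees with the yearly value stored in Table 1, with the change from '07' to '13' falling in 2014, the year of the Bachelor's degree.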

The linking procedures have coded university qualifications at the most detailed level of the Census Classification in 94% of cases.

Collecting and integrating administrative microdata on Education and Qualifications

Francesca Cuppone, Grazia Di Bella, Maria Carla Runci. NTTS 2017

Phase 1:

METHODS

RESULTS

CONCLUSIONS

Acknowledgments The authors would like to thank the Statistical Office of the Ministry for Education, University and Research for the fruitful cooperation.


The BIT can update the Population Educational Attainment at a reference time, providing microdata on individuals acquiring new qualifications. Each year, nearly 2 million people increase their level of education.

The BIT is currently built for the years 2011 to 2014 (2015 in progress).

It has already been used for data dissemination of 2014 Enterprises’ employment by highest level of education.

An automated procedure to apply the Quality Report Card to the input datasets is under construction.

Vocational training qualifications managed by the Regions (IFP) represent the largest undercoverage. Academic degrees of artistic and musical advanced training (AFAM), I and II level, follow.