IT Directors’ Group Meeting, 19-20 October 2010
Sharing data validation tools in the ESS
Christine WIRTZ – Head of Unit B3
Georges PONGAS – Unit B3
Daniel SURANYI – Unit B5
Item 3.3.b of the agenda
Background
ITDG 2009:
– Eurostat presented new ESS vision with a specific view on IT architecture and IT tools
Harmonising statistical production processes welcomed, BUT
– considered very ambitious
– should be medium to longer term perspective
Sharing of IT tools
– Implicit and crucial aspect of future infrastructure of the ESS
– Virtual sharing OR sharing of real software could be envisaged
– Challenges: IT standards, interdependence of actors
– Linked to SDMX framework
Data validation services: an appropriate and logical step
Data validation in the ESS
Data validation takes place:
– In Member States – before transmission
– In Eurostat – before further dissemination and processing
Several steps in validation:
– Format validation
– Codes validation
– Data validation
• 1st level: basic checks – existence of mandatory fields, range checks, consistency of information inside the file
• 2nd level: consistency with historical data / data from other sources (other countries, other statistics)
• 3rd level: expert validation / in-depth analysis
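The first two validation levels can be illustrated with a minimal Python sketch. It is not taken from eVE or EBB: the field names, the value range and the 50% tolerance are assumptions chosen only to show what a basic check and a comparison with historical data might look like.

```python
# Hypothetical record layout and thresholds, for illustration only.
MANDATORY_FIELDS = ("country", "period", "value")
VALUE_RANGE = (0.0, 1_000_000.0)


def first_level_checks(record: dict) -> list[str]:
    """1st level: mandatory fields are present and the value is in range."""
    errors = []
    for name in MANDATORY_FIELDS:
        if record.get(name) in (None, ""):
            errors.append(f"missing mandatory field: {name}")
    value = record.get("value")
    if isinstance(value, (int, float)):
        low, high = VALUE_RANGE
        if not low <= value <= high:
            errors.append(f"value {value} outside range [{low}, {high}]")
    return errors


def second_level_check(value: float, previous: float, tolerance: float = 0.5) -> list[str]:
    """2nd level: flag a large relative change against the previous period."""
    if previous == 0:
        return []  # no relative change can be computed
    change = abs(value - previous) / abs(previous)
    if change > tolerance:
        return [f"value changed by {change:.0%} against previous period"]
    return []
```

The third level (expert validation) is deliberately left out, since it relies on human judgement rather than on automated checks.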
Data validation tools developed by Eurostat
eVE = eDAMIS Validation Engine
– Allows for a final check before transmitting data to Eurostat
– Covers format, codes and basic checks
– For files in SDMX-ML format; linked to DSD
EBB = Editing Building Block
– Allows importing of external reference files
– Can be configured for 2nd level validation
– For files with an agreed format applied by all data senders (csv, flr, sdmx-ml, sdmx-edi)
Different ways of coding validation rules
Validation of confidential data currently limited
VIP “Data validation”
VIP on efficiency gains in the validation process
Initial focus on Agriculture Statistics (Animal Production and Farm Structure Survey 2010)
Ultimate aim: improve efficiency in the production chain from MS to Eurostat through improvements in the validation process
Looks at different approaches to achieve efficiency gains:
– Implementing validation tools
– Rebalancing validation tasks – ‘the sooner the better’ approach
– Policy decisions and guidelines on the roles of different actors
EBB = Editing Building Block
Main Functionalities:
– Acceptance of various file formats and number of variables (limited by the DBMS column number capacity)
– Validation programs are parametric
– Not only validation but also variable creation
– Possibility to manipulate incoming datasets
– Information is persistent (data+metadata) and reusable
Functionality in detail
File management:
– Fixed length records
– Variable length records (delimited)
– SDMX-ML
– Gesmes files
– Scripting and web services
– Web version (Dec 2010) and stand-alone version
Validation rules, Computations
Rules are logical expressions followed by:
– The rule name
– The rule severity
– The rule warning message
– A possible modification or creation of data depending on the rule result
Rules can be horizontal or vertical (inter record)
Special computations (outliers)
Output statistics (summary) and details for errors (which error, where in the dataset)
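The rule structure described above can be sketched in Python as follows. This is not the EBB rule syntax, which the slides do not show: the `Rule` class, the severity labels, the convention that a condition returning `True` means "rule satisfied", and the `cattle` field are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict


@dataclass
class Rule:
    name: str
    severity: str                                   # assumed labels, e.g. "error" or "warning"
    message: str
    condition: Callable[[Record], bool]             # logical expression; True = rule satisfied
    fix: Optional[Callable[[Record], None]] = None  # optional data modification on failure


def apply_rules(records, rules):
    """Apply horizontal (within-record) rules; return error details and a summary."""
    details = []
    for index, record in enumerate(records):
        for rule in rules:
            if not rule.condition(record):
                details.append((index, rule.name, rule.severity, rule.message))
                if rule.fix is not None:
                    rule.fix(record)
    summary = {}
    for _, name, _, _ in details:
        summary[name] = summary.get(name, 0) + 1
    return details, summary


def vertical_total_check(records, field, expected_total):
    """Vertical (inter-record) rule: the field summed over all records matches a total."""
    return sum(record[field] for record in records) == expected_total


# Hypothetical usage: one rule that flags negative counts and marks the record.
rules = [
    Rule(
        name="R1",
        severity="error",
        message="cattle count must not be negative",
        condition=lambda r: r["cattle"] >= 0,
        fix=lambda r: r.update(flag="E"),
    )
]
records = [{"cattle": 5}, {"cattle": -2}]
details, summary = apply_rules(records, rules)
```

Here `details` records which rule failed on which record, and `summary` counts failures per rule, mirroring the "output statistics" and "details for errors" items.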
Dataset operations
– Copy file, select part of file
– Split file
– Aggregate
– Rename
– Merge
– Append
– Reorder lines or columns
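As a rough illustration of what such dataset operations do, the sketch below implements a few of them in plain Python on lists of dictionaries. The function names and the row layout are hypothetical and do not reflect how EBB exposes these operations.

```python
from collections import defaultdict


def select(rows, predicate):
    """Select part of a dataset: keep rows for which predicate is true."""
    return [row for row in rows if predicate(row)]


def split(rows, key):
    """Split a dataset into one list of rows per distinct key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def aggregate(rows, key, field):
    """Sum a numeric field per distinct key value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[field]
    return dict(totals)


def merge(left, right, key):
    """Inner join on key; values from the right-hand row win on name clashes."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def append(first, second):
    """Append the rows of one dataset to another."""
    return first + second


def reorder_columns(rows, columns):
    """Return rows with columns in the given order."""
    return [{name: row[name] for name in columns} for row in rows]
```

For example, `aggregate(rows, "country", "value")` would return one total per country for rows carrying those two fields.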
The Architecture
Applied in the domains
– Foreign Trade
– ESSPROS
– AES, CVTS
– BOP
– EHIS
– Transport
eVE = eDAMIS Validation Engine
eDAMIS Validation Engine
Validation at the Single Entry Point
Based on SDMX
No installation or configuration in Member States
eDAMIS Web Forms: Real-Time Validation (in production for some years)
eDAMIS Web Portal: New Validation Engine (available since eDAMIS 3.0, July 2010)
eDAMIS Web Application
– Server side validation for all eWA versions
– Local validation in eWA 3.1 (using rules from the server)
eVE – Data Validation Features
– Same validation rule syntax as Web Forms
– Within one file and reference period
– Different rule sets per reference period possible
– Country specific rules
– Mandatory values, range checks
– Basic expressions
– Validation of confidential datasets (Portal or eWA 3.1)
– Full automatic transmission and validation workflow
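How rule sets might be selected per reference period and per country can be sketched as follows. The `RuleSet` structure, the use of years as reference periods and the rule strings are assumptions for illustration; they are not the eVE rule syntax or configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuleSet:
    rules: tuple[str, ...]             # placeholder rule texts, not eVE syntax
    first_period: int                  # first reference year the set applies to
    last_period: Optional[int] = None  # None = open-ended
    country: Optional[str] = None      # None = applies to all countries


def rules_for(rule_sets, country, period):
    """Collect the rules valid for one country and one reference period."""
    selected = []
    for rule_set in rule_sets:
        in_period = rule_set.first_period <= period and (
            rule_set.last_period is None or period <= rule_set.last_period
        )
        for_country = rule_set.country is None or rule_set.country == country
        if in_period and for_country:
            selected.extend(rule_set.rules)
    return selected


# Hypothetical configuration: a general set, a country-specific set, an expired set.
rule_sets = [
    RuleSet(("value >= 0",), first_period=2005),
    RuleSet(("value <= 500",), first_period=2010, country="LV"),
    RuleSet(("old rule",), first_period=2000, last_period=2004),
]
```

With this configuration, a Latvian file for 2010 would be checked against both the general and the country-specific rule, while files from other countries would only see the general one.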
Workflow eDAMIS Validation Engine
[Workflow diagram. Labelled components: MS, Database, CSV, SDMX Converter, eWA, eWP, Browser, WebService, eDAMIS Server, Validation, Settings, SDMX Registry, DSDs, SDMX, Report, Eurostat Production Unit]
Projects in Member States
Fisheries Pilot (May 2010)
– Workshops in Sweden, Latvia, UK, Romania
– Remote testing in Netherlands (CBS)
– SDMX based collection starts in December 2010
Aviation Pilot (September 2010)
– Workshop with Statistik Austria
Results from both Pilot Projects
– Implementation of SDMX is simpler than expected
– Countries visited appreciated simple usage of eVE
Conclusion
Tools have been developed that could be shared and tested
For SDMX-ML data collections: eVE offers basic validation without further configuration
EBB can be integrated without changing data transmission formats. It allows for more complex validation.
More sophisticated validation requires further multi-disciplinary reflection.
Your feedback on:
How to use these tools ESS-wide?
Suggestions for directions of improvement