IT Directors’ Group Meeting 1 19-20 October 2010 Sharing data validation tools in the ESS Christine WIRTZ – Head of Unit B3 Georges PONGAS – Unit B3 Daniel SURANYI – Unit B5 Item 3.3.b of the agenda




Background

ITDG 2009:
– Eurostat presented new ESS vision with a specific view on IT architecture and IT tools

Harmonising statistical production processes welcomed, BUT
– considered very ambitious
– should be medium to longer term perspective

Sharing of IT tools
– Implicit and crucial aspect of future infrastructure of the ESS
– Virtual sharing OR sharing of real software could be envisaged
– Challenges: IT standards, interdependence of actors
– Linked to SDMX framework

Data validation services: an appropriate and logical step


Data validation in the ESS

Data validation takes place:
– In Member States – before transmission
– In Eurostat – before further dissemination and processing

Several steps in validation:
– Format validation
– Codes validation
– Data validation

• 1st level: basic checks – existence of mandatory fields, range checks, consistency of info inside file

• 2nd level: consistency with historical data / data from other sources (other countries, other statistics)

• 3rd level: expert validation / in-depth analysis
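The first two levels lend themselves to automation. The following is a minimal Python sketch of what such checks could look like, not the implementation of any Eurostat tool. The field names (`country`, `period`, `male`, `female`, `total`), the rules themselves and the 20 % tolerance are hypothetical, chosen only for illustration.

```python
# Hypothetical record layout; none of these field names come from the slides.
MANDATORY = ("country", "period", "total")


def first_level_checks(record):
    """Basic checks on a single record; returns a list of error messages."""
    errors = []
    # Existence of mandatory fields
    for field in MANDATORY:
        if record.get(field) in (None, ""):
            errors.append(f"missing mandatory field: {field}")
    # Range check (assumed rule: counts cannot be negative)
    for field in ("male", "female", "total"):
        value = record.get(field)
        if isinstance(value, (int, float)) and value < 0:
            errors.append(f"{field} out of range: {value}")
    # Consistency of information inside the file (assumed rule: parts sum to total)
    parts = (record.get("male"), record.get("female"), record.get("total"))
    if all(isinstance(v, (int, float)) for v in parts):
        if parts[0] + parts[1] != parts[2]:
            errors.append("male + female does not equal total")
    return errors


def second_level_check(current, previous, tolerance=0.20):
    """Return True when a value moved more than `tolerance` against the previous period."""
    if previous == 0:
        return current != 0
    return abs(current - previous) / abs(previous) > tolerance


good = {"country": "AT", "period": "2010", "male": 40, "female": 60, "total": 100}
bad = {"country": "", "period": "2010", "male": -5, "female": 60, "total": 100}
print(first_level_checks(good))
print(first_level_checks(bad))
```

Third-level (expert) validation is left out on purpose: it relies on subject-matter judgement rather than on mechanical rules.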


Data validation tools developed by Eurostat

eVE = eDAMIS Validation Engine
– Allows for a final check before transmitting data to Eurostat
– Covers format, codes and basic checks
– For files in SDMX-ML format; linked to DSD

EBB = Editing Building Block
– Allows importing of external reference files
– Can be configured for 2nd level validation
– For files with an agreed format applied by all data senders (CSV, FLR, SDMX-ML, SDMX-EDI)

Different ways of coding validation rules

Validation of confidential data currently limited


VIP “Data validation”

VIP on efficiency gains in the validation process

Initial focus on Agriculture Statistics (Animal Production and Farm Structure Survey 2010)

Ultimate aim: improve efficiency in the production chain from MS to Eurostat through improvements in the validation process

Looks at different approaches to achieve efficiency gains:
– Implementing validation tools
– Rebalancing validation tasks – ‘the sooner the better’ approach
– Policy decisions and guidelines on the roles of different actors


EBB = Editing Building Block


Main Functionalities:
– Acceptance of various file formats and number of variables (limited by the DBMS column number capacity)
– Validation programs are parametric
– Not only validation but also variable creation
– Possibility to manipulate incoming datasets
– Information is persistent (data + metadata) and reusable


Functionality in detail

File management:

– Fixed length records
– Variable length records (delimited)
– SDMX-ML
– GESMES files
– Scripting and web services
– Web version (December 2010) and stand-alone version
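Fixed length and delimited records differ only in how a line is cut into fields. A minimal Python sketch, assuming a hypothetical three-field layout; the column positions, field names and the `;` delimiter are illustrative and do not describe an actual EBB file definition:

```python
import csv
import io

# Hypothetical layout: (field name, start column, end column), end exclusive.
LAYOUT = [("country", 0, 2), ("period", 2, 6), ("value", 6, 12)]


def parse_fixed_length(line, layout=LAYOUT):
    """Cut one fixed-length record into named fields by column position."""
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_delimited(text, fieldnames, delimiter=";"):
    """Read variable-length (delimited) records into dictionaries."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames, delimiter=delimiter)
    return [dict(row) for row in reader]


fixed = parse_fixed_length("AT2010   123")
delimited = parse_delimited("AT;2010;123\nSE;2010;45\n", ["country", "period", "value"])
```

Both parsers return the same dictionary shape, so the same validation rules could run on either input once it is loaded.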


Validation rules, Computations

Rules are logical expressions followed by:
– The rule name
– The rule severity
– The rule warning message
– A possible modification or creation of data depending on the rule result

Rules can be horizontal or vertical (inter record)

Special computations (outliers)

Output statistics (summary) and details for errors (what error, where in the dataset)
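To make the anatomy of such a rule concrete, here is a minimal Python sketch. It does not reproduce EBB's rule syntax, which the slides do not show; the `Rule` class, the field names and the severity labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    severity: str                      # hypothetical labels, e.g. "error" or "warning"
    message: str
    check: Callable[[dict], bool]      # the logical expression; True means the record passes
    fix: Optional[Callable[[dict], None]] = None  # optional data modification on failure


def run_horizontal(rules, records):
    """Apply each rule to each record (intra-record); return error details and a summary."""
    details = []
    for line_no, record in enumerate(records, start=1):
        for rule in rules:
            if not rule.check(record):
                # Detail: what error, where in the dataset
                details.append((line_no, rule.name, rule.severity, rule.message))
                if rule.fix is not None:
                    rule.fix(record)
    # Summary: number of failures per rule
    summary = {}
    for _, name, _, _ in details:
        summary[name] = summary.get(name, 0) + 1
    return details, summary


def vertical_total_check(records, total):
    """A vertical (inter-record) rule: the values of all records must add up to `total`."""
    return sum(r["value"] for r in records) == total


rules = [
    Rule("R1_non_negative", "error", "value must not be negative",
         check=lambda r: r["value"] >= 0),
    Rule("R2_flag_present", "warning", "flag missing, set to 'P'",
         check=lambda r: bool(r.get("flag")),
         fix=lambda r: r.update(flag="P")),
]
records = [{"value": 10, "flag": "F"}, {"value": -3, "flag": ""}]
details, summary = run_horizontal(rules, records)
```

Horizontal rules see one record at a time, while the vertical rule needs the whole dataset, which is why it is written as a separate function here.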


Dataset operations

– Copy file, select part of file
– Split file
– Aggregate
– Rename
– Merge
– Append
– Reorder lines or columns
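A minimal Python sketch on plain lists of dictionaries shows what several of these operations mean in practice. The data and field names are made up, and EBB's own commands, which the slides do not show, are not reproduced here.

```python
from collections import defaultdict

rows = [
    {"country": "AT", "year": 2009, "value": 5},
    {"country": "AT", "year": 2010, "value": 7},
    {"country": "SE", "year": 2010, "value": 4},
]

# Select part of a file: keep only the 2010 records
selected = [r for r in rows if r["year"] == 2010]

# Split: one dataset per country
split = defaultdict(list)
for r in rows:
    split[r["country"]].append(r)

# Aggregate: sum of value per country
totals = defaultdict(int)
for r in rows:
    totals[r["country"]] += r["value"]

# Rename a column
renamed = [{("obs_value" if k == "value" else k): v for k, v in r.items()} for r in rows]

# Append: stack a second dataset with the same columns under the first
appended = rows + [{"country": "LV", "year": 2010, "value": 2}]

# Reorder lines: sort by year, then country
reordered = sorted(rows, key=lambda r: (r["year"], r["country"]))
```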


The Architecture


Applied in the domains

– Foreign Trade
– ESSPROS
– AES, CVTS
– BOP
– EHIS
– Transport


eVE = eDAMIS Validation Engine


– Validation at the Single Entry Point
– Based on SDMX
– No installation or configuration in Member States

eDAMIS Web Forms: Real-Time Validation (in production for some years)

eDAMIS Web Portal: New Validation Engine (available since eDAMIS 3.0, July 2010)

eDAMIS Web Application
– Server side validation for all eWA versions
– Local validation in eWA 3.1 (using rules from the server)


eVE – Data Validation Features

– Same Validation Rule Syntax as Web Forms
– Within one file and reference period
– Different rule sets per reference period possible
– Country specific rules

– Mandatory values, Range checks
– Basic expressions

– Validation of confidential datasets (Portal or eWA 3.1)
– Full automatic transmission and validation workflow
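One way to picture rule sets that vary by reference period and by country is a lookup keyed on both. The Python sketch below is purely illustrative: the slides do not describe how eVE stores or selects its rules, so the `RULE_SETS` structure, the rule identifiers and the "applies from this period onwards" convention are all assumptions.

```python
# Hypothetical rule store: keys are (country, first reference period the set applies to).
# Country None means "all countries". Rule identifiers are placeholders.
RULE_SETS = {
    (None, "2009"): ["mandatory_values", "range_checks"],
    (None, "2010"): ["mandatory_values", "range_checks", "basic_expressions"],
    ("SE", "2010"): ["se_specific_range"],
}


def rules_for(country, period):
    """Return the general rules for the period plus any country-specific ones."""
    def latest(key_country):
        # Pick the most recent rule set whose start period is not after `period`.
        # Four-digit year strings compare correctly as text.
        starts = [p for (c, p) in RULE_SETS if c == key_country and p <= period]
        return RULE_SETS[(key_country, max(starts))] if starts else []
    return latest(None) + latest(country)


print(rules_for("SE", "2010"))
print(rules_for("AT", "2009"))
```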


Workflow eDAMIS Validation Engine

[Workflow diagram. Labels recovered from the figure: MS, Database, eWA, eWP, Browser, SDMX Converter, CSV, SDMX, WebService, eDAMIS Server, Validation, Settings, SDMX Registry, DSDs, Report, Eurostat Production Unit]


Projects in Member States

Fisheries Pilot (May 2010)
– Workshops in Sweden, Latvia, UK, Romania
– Remote testing in the Netherlands (CBS)
– SDMX based collection starts in December 2010

Aviation Pilot (September 2010)
– Workshop with Statistik Austria

Results from both Pilot Projects
– Implementation of SDMX is simpler than expected
– Countries visited appreciated the simple usage of eVE


Conclusion

Tools have been developed that could be shared and tested

For SDMX-ML data collections: eVE offers basic validation without further configuration

EBB can be integrated without changing data transmission formats. It allows for more complex validation.

More sophisticated validation requires further multi-disciplinary reflection.


Your feedback on:

– How to use these tools ESS-wide?
– Suggestions for directions of improvement