13 & 14 October 2008 IT Directors Group Item 3.3.2 of the Agenda CVD Antonio Consoli - Eurostat B1

13 & 14 October 2008 IT Directors Group

Item 3.3.2 of the Agenda

CVD Antonio Consoli - Eurostat B1

Content

1. Intro

2. CVD Architecture

3. CVD Components

4. Production systems

5. Overview 2011

Intro

CVD = Statistical Business Process Model

(in French = Cycle de Vie des Données)

The new CVD project was launched in 2004 with 2 main objectives:

– Rationalisation of existing IT systems
– Harmonisation of IT architecture

The gradual implementation has now started

Why?

Avoid stovepipe processing, allowing synergy / economies of scale
Simplify the statistical process, increase the level of automation and integration of systems
Free resources (money + human) from IT to core business
Ease mobility and backing-up of staff
Achieve higher quality of common components
Simplify data exchange and improve the quality of data exchanged with Member States

How?

Put together statisticians and IT experts
Gradual implementation through use of opportunities
Follow the CVD architecture / guidelines
(Re)-use (generic) software (e.g. BBs)
Harmonise and rationalise systems
Create economies of scale
Make maintenance of organisation software simpler

Functional view

[Diagram. Labelled elements: TDS; CVD Services; SEP; MH; CVD Manager; NUI; Reference; BBs; Statistical application]

Statistical Business Process Model view

[Diagram. Process stages: COLLECT, VALIDATE, ANALYSE, DISSEMINATE, together with METHODOLOGY + QUALITY ASSURANCE + METADATA MANAGEMENT. Labelled elements: eDAMIS (Reception, Storage); Data files; Metadata Handler; Production system; Manager; BB; Data in production; Reference data; Data Explorer; Internet Portal; ASSIST (User support)]

Statistical Business Process Model (data processing)

1 Collect
– 1.1 Set up collection
– 1.2 Run collection
– 1.3 Load data
– 1.4 Cooperate with providers
2 Validate
– 2.1 Edit
– 2.2 Detect & treat outliers
– 2.3 Impute
– 2.4 Derive new variables
– 2.5 Integrate and load data
3 Analyse
– 3.1 Acquire domain intelligence
– 3.2 Produce statistics or indicators
– 3.3 Check quality
– 3.4 Interpret and explain
– 3.5 Prepare tables for dissemination
4 Disseminate
– 4.1 Produce products
– 4.2 Manage customer queries
5 Metadata management

CVD components & Statistical Business Process

+

Components especially designed for the sub-process

Components designed for other sub-processes that could also be used for this sub-process if the functionalities are appropriate

Other uses may be possible in specific cases

Relation table of Statistical Business Process Model & CVD components

Component concept

CVD is not a monolithic system to be used in all statistical domains but it is composed of:
– Limited set of production systems
– Set of generic specialised components, which can be used by the production systems
– Guidelines for implementation, development, data exchange

Components

Generic tools
– eDAMIS
– BBs (Building Blocks)
– MH (Metadata Handler)
– EUROBASE (common reference database)
– Data Explorer

Production systems
– CVD systems: GSAST, NAPS, COMEXT
– Existing systems: SAM, Eurocube, FAME

eDAMIS

Supports the transmission of statistical data between Member States and Eurostat
Provides acknowledgements of data arrival
Automatic reminders for late data
Ensures secure and well monitored transmission of data through SEP
Delivery of data to production environments
User access management
Links to structural metadata
Automatic generation of SDMX-ML messages for online data transmission
Handles standardised messages
Basic validation in both interactive and batch mode & format conversion, to converge in future:
– On eDAMIS data files: field, some intra-record and limited inter-record checks
– Validations in Web Forms which apply directly on cells

Dataset occurrences received through SEP for EU 27:
2005        26%
2006        32%
2007        38%
2008 (Q1)   43%

eDAMIS - Plan

Status: v. 2.6 in production. 43% of data sets received in Eurostat through it.
September 2008, v. 2.7:
– Performance improvements
– Validation and conversion of SDMX-ML (as currently for GESMES)
– Workflow manager can receive signals from other applications
March 2009, v. 3:
– Dataset inventory, validation engine, further integration of Stadium db
2010:
– Pull approach fully supported
– ECAS sign-in for internal and external users, link to CVD-MH

Editing BB (EBB)

Executes editing rules optionally with reference data (lookup tables)
Intra-cell, intra-record (horizontal) and inter-record (vertical) rules
Reports on the rules execution
Allows interactive review of messages
Can be provided to MS for editing at source as it is:
Generic: we estimate that the system would be sufficient for most of the statistical applications in any statistical office.
Portable:
– Generic package or specific application of the software can be shared among interested parties.
– It can be downloaded or distributed on a CD-ROM.
– Written in Java so it can run on PC, Mac or any Java compatible environment.
– Individual parameters can be customised to best suit specific needs:
  • Update / change of edit rules
  • Update / change of classifications
  • Update / change of other execution parameters
– The parameter updates / changes can also be downloaded.
Possible to make and distribute a non-modifiable version for a specific purpose.
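
The three rule scopes named on this slide (intra-cell, intra-record, inter-record) can be illustrated with a small Python sketch. This is not the EBB engine or its rule language, which the slides do not show; the record layout, the lookup table `VALID_COUNTRIES` and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical reference data (lookup table) of accepted country codes.
VALID_COUNTRIES = {"BE", "FR", "LU"}


@dataclass
class Message:
    rule_type: str  # "intra-cell", "intra-record" or "inter-record"
    row: int        # index of the offending record
    text: str       # explanation intended for interactive review


def check_intra_cell(records):
    """Each cell on its own: the country code must exist in the lookup table."""
    return [
        Message("intra-cell", i, f"unknown country code {rec['country']!r}")
        for i, rec in enumerate(records)
        if rec["country"] not in VALID_COUNTRIES
    ]


def check_intra_record(records):
    """Horizontal rule: within one record, total must equal male + female."""
    return [
        Message("intra-record", i, "total != male + female")
        for i, rec in enumerate(records)
        if rec["total"] != rec["male"] + rec["female"]
    ]


def check_inter_record(records):
    """Vertical rule: the (country, year) key must not repeat across records."""
    first_seen = {}
    messages = []
    for i, rec in enumerate(records):
        key = (rec["country"], rec["year"])
        if key in first_seen:
            messages.append(
                Message("inter-record", i, f"duplicate of record {first_seen[key]}")
            )
        else:
            first_seen[key] = i
    return messages


def run_edits(records):
    """Run all rules and return the execution report as a list of messages."""
    return (
        check_intra_cell(records)
        + check_intra_record(records)
        + check_inter_record(records)
    )
```

Calling `run_edits` on a list of dictionaries with the keys `country`, `year`, `male`, `female` and `total` returns one `Message` per violated rule, which corresponds loosely to the "reports on the rules execution" item above.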

EBB - Plan

Status: in production for AES, CVTS, External Trade
2008: ICT Household survey, CIS
2009: ESSPROS, Energy, SBS, BOP, Migration and new GSAST domains

Derivation BB (DBB)

Derives new variables optionally with reference data (lookup tables)
Intra-cell, intra-record (horizontal) and inter-record (vertical) derivations
Reports execution
Allows interactive review of messages
Uses the same engine (subset) as editing BB

Status: under development
End 2008: First version ready

Outliers Detection BB (ODBB)

Basic and statistical methods to identify outliers
Methods:
– Hidiroglou-Berthelot and σ-gap
– Top and bottom: number or percentiles and conditions
Reports on the execution
In future multidimensional distance measures

Status: in use for Urban Audit
2009: First implementations in agriculture and health statistics
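
As an illustration of the first method named above, the following Python sketch follows the usual textbook formulation of the Hidiroglou-Berthelot ratio edit for a variable observed in two periods. It is not the ODBB implementation; the function name and the default values of the parameters `u`, `a` and `c` are assumptions made for this example.

```python
import statistics


def hidiroglou_berthelot(previous, current, u=0.5, a=0.05, c=4.0):
    """Return the indices of units flagged as outliers.

    previous, current: strictly positive values of the same units in two periods.
    u: weight given to unit size (0 = ignore size, 1 = full weight).
    a: guards against a near-zero spread of the effects.
    c: width of the acceptance interval.
    """
    if len(previous) != len(current) or len(previous) < 2:
        raise ValueError("need two series of equal length with at least 2 units")
    if any(x <= 0 for x in previous) or any(x <= 0 for x in current):
        raise ValueError("values must be strictly positive")

    # Period-to-period ratio of each unit and the median ratio.
    ratios = [cur / prev for prev, cur in zip(previous, current)]
    r_med = statistics.median(ratios)

    # Symmetric transformation of the ratios, centred on zero.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]

    # Effects: transformed ratios weighted by unit size.
    e = [s_i * max(prev, cur) ** u for s_i, prev, cur in zip(s, previous, current)]

    # Acceptance interval built from the quartiles of the effects.
    e_q1, e_med, e_q3 = statistics.quantiles(e, n=4, method="inclusive")
    d_q1 = max(e_med - e_q1, abs(a * e_med))
    d_q3 = max(e_q3 - e_med, abs(a * e_med))
    lower, upper = e_med - c * d_q1, e_med + c * d_q3

    return [i for i, e_i in enumerate(e) if e_i < lower or e_i > upper]
```

For example, if seven units grow by a few percent between the two periods and one unit grows tenfold, only the index of that unit is returned.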

Disclosure Control BB (DCBB)

Performs confidentiality verification of tables
Applies various masking techniques assuring confidentiality of published statistics
Based on CBS (Statistics Netherlands) μ-argus and τ-argus

Status: Partially tested for SBS
End 2008: Link to GSAST
End 2009: SBS

Economic Indices BB (EIBB)

Calculates indices used in economy

– Weighted arithmetic mean

– Weighted geometric mean

– Weighted harmonic mean

– Laspeyres

– Paasche

– Lowe

– Edgeworth

– Bowley

– Fisher

– Laspeyres (Geometric)

– Paasche (Geometric)

– Törnqvist-Theil

– Laspeyres (harmonic)

– Paasche (harmonic)

– Chain index

– EKS(-S)

Status: ready for implementation, waiting for first requests
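
Several of the indices listed on this slide have compact standard definitions over base-period prices and quantities (p0, q0) and current-period prices and quantities (p1, q1). The Python sketch below shows five of them in their textbook bilateral form; it is not the EIBB code, the function names are hypothetical, and "Edgeworth" is read here as the Marshall-Edgeworth index.

```python
import math


def _value(prices, quantities):
    """Total value of a basket: sum of price * quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, q0, p1, q1):
    """Price index using base-period quantities as weights."""
    return _value(p1, q0) / _value(p0, q0)


def paasche(p0, q0, p1, q1):
    """Price index using current-period quantities as weights."""
    return _value(p1, q1) / _value(p0, q1)


def fisher(p0, q0, p1, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p0, q0, p1, q1) * paasche(p0, q0, p1, q1))


def marshall_edgeworth(p0, q0, p1, q1):
    """Price index weighted by the sum of base and current quantities."""
    q_sum = [a + b for a, b in zip(q0, q1)]
    return _value(p1, q_sum) / _value(p0, q_sum)


def tornqvist(p0, q0, p1, q1):
    """Geometric mean of price relatives weighted by average value shares."""
    v0, v1 = _value(p0, q0), _value(p1, q1)
    log_index = 0.0
    for a, b, c, d in zip(p0, q0, p1, q1):
        share = 0.5 * (a * b / v0 + c * d / v1)
        log_index += share * math.log(c / a)
    return math.exp(log_index)
```

All five functions take the same four sequences, so the results can be compared directly on one data set; the Fisher index always lies between the Laspeyres and Paasche values.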

Imputation BB (IBB)

To be defined. Note: possibly based on the BANFF software; any system would have to be very similar to BANFF
Implementation of various mathematical imputation methods
Last BB to be developed
Scope not yet established

Status: BB survey confirmed need for it
Plan: start analysis in 2009
End 2010: alpha version

Seasonal adjustment BB (SABB)

Methodology draws on Demetra+, which is under development
– based on X-13 and Tramo-Seats core engines
– to be developed: diagnostics, reconciliation, aggregation, etc.
– organisations involved: ESTAT, Banque Nationale de Belgique, US Census Bureau, Banco de España

SABB
– methodological and IT architecture specifications based on Demetra+ towards the end of 2008, followed by development

ASSIST BB

User support tool
Parallel to e-mail system (with attachments)
Service request
Request follow-up
Searchable, central public knowledge database
Decentralised help centres / persons
Sub-systems by subject matter, geography or any other classification
Access management (to appropriate parts of the system by administrative privileges or subject matter)

Status: implemented for External Trade
2008:
– Implementation B6
– Implementation in MS

MH - Metadata Handler

Integrated environment enabling the management of structural and reference metadata in Eurostat
Covers:
– Structural metadata: data and metadata structure definitions, code lists, classifications…
– Reference metadata: SDDS and ESMS metadata, quality reports…
Provides:
– Human user interfaces for viewing, creating and modifying metadata
– Interfaces to other applications so that other applications can upload and retrieve metadata
– Export and import of metadata
– Common user access control for all metadata operations
Enables:
– Coherent, reusable metadata across domains and through different stages of the data life cycle

Status: v1 under development (extension of the SDMX registry).
End 2008: v1 in production, with two human interfaces plus Web Services for applications. Link to GSAST.
End 2009: v2 first release: partial integration of EMIS and/or RAMON and CODED.
June 2010: v2 second release: full integration of horizontal metadata management

EMIS (Eurostat metadata information system)

Supports the preparation and administration of reference metadata

Status: v 2.1 in production, manages SDDS files
Mid 2009: v 3 in production, management of ESMS (Euro SDMX Metadata Structure)

MANAGER

The process monitoring and / or scheduling tool related to the production system. The scope can vary depending on the production system:

COMEXT is a tightly integrated system with minimal human intervention that does not require an external process management tool. It launches processing steps automatically and includes all the status information, based on a design of the particular production process.

NAPS: there is no a priori defined process, so a generic process scheduler cannot be applied. A monitoring and reporting tool is foreseen.

GSAST: process management is native to SAS Enterprise Guide. The next version will have the possibility of conditional launching of process steps based on the result of predecessors.

2009: Automatic process scheduling in GSAST (New EG in SAS V9.2)
2011: Monitoring and reporting tool for NAPS
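
The idea of launching a process step only when its predecessors have succeeded can be sketched in a few lines of Python. This is a generic illustration, not SAS Enterprise Guide behaviour or any CVD Manager code; the function name `run_steps`, the tuple layout and the status labels are assumptions.

```python
def run_steps(steps):
    """Run steps in order; launch a step only if all its predecessors succeeded.

    `steps` is a list of (name, predecessor_names, action) tuples, already
    sorted so that every predecessor appears before the steps that need it.
    Returns a dict mapping each step name to "ok", "failed" or "skipped".
    """
    status = {}
    for name, predecessors, action in steps:
        # Conditional launch: skip when any predecessor did not finish "ok".
        if any(status.get(p) != "ok" for p in predecessors):
            status[name] = "skipped"
            continue
        try:
            action()
        except Exception:
            status[name] = "failed"
        else:
            status[name] = "ok"
    return status
```

The returned dictionary doubles as a simple monitoring report: a failed step and everything downstream of it are visible at a glance, while independent branches still run.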

EUROBASE

New reference environment, to replace NewCronos.

Status:
– under development and parallel running.

2009:
– Java version of the user interface
– replacing NewCronos

Data Explorer

To provide access to the statistical reference databases of Eurostat.

embargo
single tool for all data and metadata
based on Comext API and DB
based on the principles of graphical tools
highly interactive operation
metadata is presented to the user
shows relation of different types of metadata
can be used inside Eurostat

Status: under development and testing. v2.1.4 ‘Try This’ is available for user testing on the Internet.
September 2008: v2.2 full operational deployment.
2009: v3 integration of Table, Graph, Maps interface (TGM)

GSAST

Primary target - treating micro-data and operations of micro and macro-data from surveys

Based on SAS base, BI server and Enterprise Guide

For unique or unusual processing requirements

GSAST - Plan

Domains covered, by year of entry (each domain continues in all later years):
Current: AES, CVTS3
2008: ICT Household, CIS
2009: STI, LFS, LACOST, EHIS
2010: SES, INFOSOC, COINS, CRIME, New F4
2011: EU-SILC, LCI, CHOMM, JVS, HSAW, Public Health, Strikes, HBS

COMEXT

Tightly integrated production and dissemination environment with a wide range of generic statistical analysis functions and a powerful metadata management

Accommodates very large data volumes
Methodologically coherent approach assuring maximum of security and data integrity
Assures timely production

Status: in production for external trade statistics, part of food safety. Dissemination data base, embargo, versioning and extraction and statistical calculation facility behind Data Explorer

December 2009: energy statistics, Esspros

NAPS

NAPS = National Accounts Production System
System to target sub annual (seasonal time series) oriented data.
To allow easy and direct interaction with atomic data on cell or time series level: flexible approach, users can define their own calculations using high level MDT language and Oracle statistical functionalities.

Status: MDT in production for BOP in unit C4
2009:
– Detailed analysis of domains to migrate
– Details of system designed
2010:
– Pilot domain migrated
– Start migration of dir. C applications

Other current production systems: SAM, Eurocube, FAME

SAM
The simplest and most straightforward tool.
Low-complexity, self-contained Microsoft Windows based tool (Visual Basic on top of Oracle) designed specifically for applications that should preferably be self-sufficient.

Eurocube
It has a similar functionality to SAM. However, it targets more complex multi-dimensional applications requiring assistance from IT experts.
It is based on Oracle Express – Oracle OLAP technology.

FAME
FAME is a specialised database system with a wide range of functions for time series storage and treatment.
A number of complex and mission critical statistical production systems have been developed using the FAME development language.

Status
– Look for possibility of using BBs, started work on SAM link with EBB.
– Support and maintain while studying future migrations.

Special applications

Current plan covers the majority of present production systems but special applications can exist outside CVD, such as:

LUCAS (statistics on land cover and land use): Uses special software for image processing and spatial analysis. Terabytes of data.

Euro Business Register: First application based on registers in Eurostat.

GISCO (Geographical Reference Database for European Commission): Uses special software for geographic information systems.

Practical steps in implementation of CVD

Looking for opportunities - gradual migration to CVD

Proliferation of BBs in existing applications

Communicating the CVD in Eurostat

The CVD beyond Eurostat

Looking for opportunities - gradual migration to CVD

When the need arises in a production unit to migrate or implement a new application the procedure will be the following:

– The production unit expresses the need to its regular contact in unit B1 during the IT Masterplan (Schéma Directeur) exercise;
– If needed, the request is discussed in CAB/ITSC;
– A solution and plan are proposed to the production unit.

The choice of the production system is made taking into account:
– Technical aspects linked to the data production;
– Ease of implementing a solution;
– Time constraints;
– Human resources:
  • Availability of developers;
  • Tools already used in unit, directorate;
– How it fits with other ongoing and planned projects;
– Compatibility with CVD strategy.

Proliferation of BBs in existing applications

For all new requests, whenever possible (i.e. when the desired functionality exists), available BBs are interfaced with the specific statistical production applications to be used in data processing.

Communicating the CVD

Objectives
At technical level
– Appropriate the CVD approach
– Understand the CVD architecture
At managerial level
– Get support in the implementation process

Means
At technical level
– CVD seminar
– Ad-hoc lunchtime presentations
At managerial level
– ITSC (IT steering committee)
– HUM (Head of Unit meeting)

The CVD beyond Eurostat

Rationalisation of statistical information systems is an objective of many NSIs
– One session on architecture in the UNECE, Eurostat, OECD, MSIS meeting
– One session on rationalisation in last ITDG
– Implementations in Ireland, New Zealand and Latvia
– Developments in many countries

Opportunity to share components as a next step
– SDMX – open (community) source tools
– MSIS TF on tool sharing
– Standardisation initiative in the ESS

Overview 2011

Single components are used for:
– Data exchange: eDAMIS
– Management of metadata: MH
– Specialised statistical processing components: BBs
– Reference: EUROBASE
– Dissemination: Comext and Data Explorer

CVD production systems cover at least:
– Applications in current dir. F and microdata treatment: GSAST
– National accounts applications: NAPS
– External trade and energy statistics: COMEXT

Current systems are maintained and evolving:
– Linked to BBs: SAM and Eurocube
– Based on the results of current migration plans: FAME