Best practice case: Comparing the implementations of the Irish CDM and the Dutch DSC
ESSnet on microdata linking and data warehousing in statistical production
Harry Goossens – Statistics Netherlands
Head Data Service Centre / ESSnet Coordinator



ESSnet Data Warehousing 2

The CSO Corporate Data Model (CDM)

Underlying principle: 4 datastores

1. INPUT - raw data
2. CLEAN UNIT - cleaned data
3. AGGREGATE - aggregated data
4. DISSEMINATION - published data

The CDM was seen as roughly equivalent to an active data warehouse (DWH)


The CSO Corporate Data Model (CDM)

Main characteristics:

- All (statistical) processes must use the 4 datastores
- Processing systems interact on the data stores
- At some moments: snapshots, which build the next data store
- It is possible to work further on the same (snapshotted) data store
- Simultaneous updating of / on data is mainly an organisational issue
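The snapshot mechanism described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the CSO implementation: the class `CorporateDataModel`, the method `snapshot_into_next`, the `clean` function and the in-memory lists are all hypothetical names introduced here. It shows a frozen copy of one store being used to build the next store, while work continues on the original.

```python
from copy import deepcopy
from enum import IntEnum


class Store(IntEnum):
    """The four CDM data stores, in processing order."""
    INPUT = 1
    CLEAN_UNIT = 2
    AGGREGATE = 3
    DISSEMINATION = 4


class CorporateDataModel:
    """Hypothetical in-memory stand-in for the four data stores."""

    def __init__(self):
        self.stores = {store: [] for store in Store}

    def snapshot_into_next(self, store, transform):
        """Freeze a copy of `store` and use it to build the next store.

        The source store is left untouched, so work on it can continue
        after the snapshot has been taken.
        """
        if store is Store.DISSEMINATION:
            raise ValueError("DISSEMINATION is the last store")
        frozen = deepcopy(self.stores[store])
        next_store = Store(store + 1)
        self.stores[next_store] = transform(frozen)
        return next_store


def clean(records):
    """Illustrative cleaning step: drop missing values, cast to int."""
    return [{**r, "turnover": int(r["turnover"])}
            for r in records if r["turnover"] is not None]


cdm = CorporateDataModel()
cdm.stores[Store.INPUT] = [{"unit": "A", "turnover": "100"},
                           {"unit": "B", "turnover": None}]

cdm.snapshot_into_next(Store.INPUT, clean)

# Work continues on the INPUT store without affecting CLEAN UNIT.
cdm.stores[Store.INPUT].append({"unit": "C", "turnover": "50"})
```

Because the next store is built from a copy, later changes to the source store only reach it at the next snapshot. Coordinating who may update a store, and when, is left outside the code, in line with the remark that this is mainly an organisational issue.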


The CSO Corporate Data Model

[Diagram: the four data stores (INPUT, CLEANED DATASETS, AGGREGATE DATASETS, DISSEMINATION) with 2 operational implementations: the Data Management Store (surveys) and the Administrative Data Centre (admin data).]


Data Management Store (DMS)

- First implementation of CDM
- Only survey data
- Data tables are created and populated through the DMS applications.
- Metadata must be entered as the data tables are created.
- Metadata capturing = minimal bottleneck
- BR outside DMS (stand alone)
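The rule that metadata must be entered as the data tables are created can be illustrated with a small Python sketch. It is not the DMS application itself: `DataManagementStore`, `TableMetadata`, `create_table` and the example table name are hypothetical, and the required metadata is deliberately kept minimal (a description and an owner).

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Minimal metadata captured at table creation (assumed fields)."""
    description: str
    owner: str


@dataclass
class DataManagementStore:
    """Hypothetical store that refuses tables without metadata."""
    tables: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def create_table(self, name, columns, metadata):
        # Metadata is mandatory at creation time, not an afterthought.
        if not isinstance(metadata, TableMetadata):
            raise ValueError("metadata is required to create a table")
        if not metadata.description.strip():
            raise ValueError("metadata needs a non-empty description")
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        self.tables[name] = {"columns": list(columns), "rows": []}
        self.metadata[name] = metadata

    def insert(self, name, row):
        table = self.tables[name]
        missing = set(table["columns"]) - set(row)
        if missing:
            raise ValueError(f"row is missing columns: {sorted(missing)}")
        table["rows"].append(row)


dms = DataManagementStore()
dms.create_table(
    "retail_survey",
    ["enterprise_id", "turnover"],
    TableMetadata(description="Monthly retail survey returns", owner="STS"),
)
dms.insert("retail_survey", {"enterprise_id": 1, "turnover": 250})
```

Keeping the check inside `create_table` means a table can never exist in the store without at least a basic description.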


CDM – Data Management Store

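[Diagram: the DMS, mainly for surveys. Data collection activities feed the INPUT store; the DMS spans INPUT, CLEANED DATASETS, AGGREGATE DATASETS and DISSEMINATION, with an APP layer (incl. I/O interfaces) and a DMS meta layer (basic descriptions). Inside are the SHARED INPUT, SHARED CLEANED UNIT and AGGREGATE STORE, connected by SNAPSHOTS. Processing systems SYS 1, SYS 2 ... SYS n and BI interact with the stores.]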


Administrative Data Centre (ADC)

- Developed for organisational reasons
- Only admin data
- A catalyst to exploit administrative data for statistical purposes
- Interface with public authorities on admin data flows to CSO
- Clearing house inside CSO for admin data
- Data governance with respect to admin data


Administrative Data Centre (ADC)

- Has an analysis layer
- R&D on available data
- To develop new datasets
- Without specific needs / demands from statistics


CDM – Administrative Data Centre

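[Diagram: the ADC, only for admin data. SOURCES enter through the ADC Front Door (LEAN INTERFACE) and ETL; the ADC spans INPUT, CLEANED DATASETS, AGGREGATE DATASETS and DISSEMINATION, with an ADC meta layer, and delivers Data Products. Data collection activities, processing systems SYS 1, SYS 2 ... SYS n and BI are shown alongside.]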


Corporate Data Model CSO - Ireland

[Diagram: the complete Corporate Data Model, combining the DMS and the ADC across INPUT, CLEANED DATASETS, AGGREGATE DATASETS and DISSEMINATION. DMS side: data collection activities, APP layer (incl. I/O interfaces), DMS meta layer (basic descriptions), SHARED INPUT, SHARED CLEANED UNIT, AGGREGATE STORE, SNAPSHOTS. ADC side: SOURCES, ADC Front Door (LEAN INTERFACE), ETL, ADC meta layer, Data Products. Processing systems SYS 1, SYS 2 ... SYS n and BI interact with both.]


The CBS Data Service Centre (DSC)

The concept:

No data without metadata

Dedicated metadata model as basis

Strict distinction between:

Statistical data (facts & figures)

Conceptual metadata (definitions, description of quality, process activities etc.)

Steady states explicitly designed for re-use.

All metadata (of steady states) are generally accessible and are standardised as much as possible


The CBS Data Service Centre (DSC)

What is it?

Fundamental cornerstone of the CBS Business Architecture

Central ‘vault’ with Steady States, linking:

statistical data (facts & figures)

conceptual metadata (description)

technical metadata (user’s guide)

Documentation

Implementation of the Dutch metadata model


The CBS Data Service Centre (DSC)

What does it offer?

Generic services:

Metadata coordination

Centralised data distribution

Authorisation management

Automatic process interfacing (in development)

Archiving of statistical datasets


The CBS Data Service Centre (DSC)

Why do we do it?

- Data-sharing / re-using data: intermediary, archive and distribution, CBS data-vault. Maximally efficient use of data and metadata
- Process guarantee / security: safety net in case of calamity, static ‘frozen’ data
- Process standardization: transparency & efficiency
- Coordination of metadata & classifications: one single source with elements for the statistical process
- Process chain support: Steady States as data hubs
- Generic process for data linking: DSC structure enables linking datasets with equal object type
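The last point, linking datasets with equal object type, can be sketched as follows. This is an assumption-laden illustration, not the DSC linking service: the `Dataset` class, its `object_type` and `key` fields, the `link` function and the example data are hypothetical. The sketch only shows the idea that a generic linking routine can check the object type (for example person or enterprise) before matching records on their key.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical dataset carrying its object type as metadata."""
    name: str
    object_type: str  # e.g. "person" or "enterprise"
    key: str          # name of the identifying variable
    rows: list


def link(left, right):
    """Link two datasets on their keys, only if object types are equal."""
    if left.object_type != right.object_type:
        raise ValueError(
            f"cannot link {left.object_type!r} with {right.object_type!r}"
        )
    right_by_key = {row[right.key]: row for row in right.rows}
    linked = []
    for row in left.rows:
        match = right_by_key.get(row[left.key])
        if match is not None:
            # Values from the left dataset win on overlapping variables.
            linked.append({**match, **row})
    return linked


income = Dataset("income", "person", "person_id", [
    {"person_id": 1, "income": 30000},
    {"person_id": 2, "income": 45000},
])
education = Dataset("education", "person", "person_id", [
    {"person_id": 2, "level": "tertiary"},
    {"person_id": 3, "level": "secondary"},
])
turnover = Dataset("turnover", "enterprise", "enterprise_id", [
    {"enterprise_id": 2, "turnover": 900},
])

linked = link(income, education)
```

Here `link(income, turnover)` would raise an error even though both keys contain the value 2, because a person and an enterprise are different object types.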


CBS Business Architecture: Layers

[Diagram: the CBS Business Architecture in layers, running from Source / Respondent to Client / Customer.
Strategy: policy, budget setting, catalogue of data sources.
Design: determine statistical information requirements; design of statistical product, process model, rules and data sources. Yields conceptual output metadata & product quality, process metadata (workflow & process quality, rules), and conceptual base/input metadata & product quality.
Chain management: planning, monitoring, adjusting and adapting; schedules; (intermediary) products and quality to be achieved; catalogue of achieved (intermediary) products and quality; progress, audit trail, reports.
Statistics Production (statistical process information model): preparation, dispatch; make enquiries, standardisation & verification; linking, deriving & editing; aggregation, estimation & integration; disclosure control, making data publishable; making statistics available.
Steady States: Inputbase, Microbase, Statbase, Outputbase.
DSC: Data Storage and Metadata Catalogue.]


CBS Business Architecture: Steady States

[Diagram: the same CBS Business Architecture, with the Steady States highlighted between the production steps: Pre-Inputbase, Inputbase, Microbase, Statbase, Outputbase, Post-Outputbase.]


DSC: What are Steady States?

A steady state is a dataset together with information for its correct interpretation.

- Rectangular
- Rows represent units (micro) or classes of units (macro)
- Columns represent variables
- Heading: population, time
- Dataset design is like a template of a table: only borders and heading
- 1 Dataset design, n Datasets

Data Service Centre - DSC
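The relation between a dataset design and its datasets can be made concrete with a short Python sketch. The class names `DatasetDesign` and `Dataset`, their fields and the example values are hypothetical and chosen only for illustration. The design holds the template (unit type and variables, i.e. the borders and column headings), while each dataset adds the heading (population, time) and the rows.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetDesign:
    """Template of a table: which units the rows are, which columns exist."""
    unit_type: str
    variables: tuple


@dataclass
class Dataset:
    """One filled-in instance of a design, for a population and a period."""
    design: DatasetDesign
    population: str
    period: str
    rows: list = field(default_factory=list)

    def add_row(self, *values):
        # Every row must fit the rectangular shape given by the design.
        if len(values) != len(self.design.variables):
            raise ValueError("row does not match the dataset design")
        self.rows.append(dict(zip(self.design.variables, values)))


# 1 dataset design ...
design = DatasetDesign(unit_type="enterprise",
                       variables=("enterprise_id", "turnover"))

# ... n datasets that share it.
retail_2011 = Dataset(design, population="Retail trade", period="2011")
retail_2012 = Dataset(design, population="Retail trade", period="2012")

retail_2011.add_row(1, 250)
retail_2012.add_row(1, 270)
```

Because both datasets refer to the same design object, a process written against the design can read either of them without changes.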


DSC: Why Steady States?

- Reduce storage: store once, re-use many times
- Secure the statistical process: each steady state is a guaranteed fall-back point
- Improve consistency: every following process uses the same dataset
- Improve flexibility: enables independent, generic process design


Conclusions

Both CSO & CBS
- use the same basic principle of 4 (static) stages/bases
- had the same 'drivers' to start DWH:
  - re-use of data
  - decoupling input from output (= getting rid of stove pipes)

CSO
- strong focus on practical results, (successful) quick wins
- 2 different implementations of the CDM
- organisational driver for ADC

CBS
- strong focus on metadata model
- DSC = essential element of the business architecture
- 1 implementation supporting all processes


Conclusions

Regarding the DWH ESSnet

- S-DWH architecture covers both best practices
- ESSnet indicated the right issues to focus on:
  - metadata
  - role/position of the BR
- Strong desire for knowledge exchange, learning from other NSIs
- CSO = very helpful best practice case
- CSO acknowledges importance of ESSnet, wants to stay closely involved