Best practice case:
Comparing the implementations of the Irish CDM and the Dutch DSC
ESSnet on microdata linking and data warehousing in statistical production
Harry Goossens – Statistics Netherlands, Head Data Service Centre / ESSnet, [email protected]
ESSnet Data Warehousing 2
The CSO Corporate Data Model (CDM)
Underlying principle: 4 datastores
1. INPUT - raw data
2. CLEAN UNIT - cleaned data
3. AGGREGATE - aggregated data
4. DISSEMINATION - published data
The CDM was seen as approximately an active data warehouse (DWH).
The CSO Corporate Data Model (CDM)
Main characteristics:
- All (statistical) processes must use the 4 datastores.
- Processing systems interact with the datastores.
- At defined moments, snapshots are taken, which build the next datastore.
- It is possible to continue working on the same (snapshotted) datastore.
- Simultaneous updating of data is mainly an organisational issue.
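The snapshot mechanism described above can be illustrated with a minimal Python sketch. It assumes an in-memory representation in which each of the four stores holds a list of records; the names `Stage`, `DataStore`, `CorporateDataModel` and `promote` are hypothetical and are not taken from any CSO system. A snapshot freezes a copy of one store and builds the next store from it, while work can continue on the original store.

```python
import copy
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable


class Stage(IntEnum):
    """The four CDM datastores, in processing order."""
    INPUT = 1
    CLEAN_UNIT = 2
    AGGREGATE = 3
    DISSEMINATION = 4


Records = list[dict]


@dataclass
class DataStore:
    stage: Stage
    records: Records = field(default_factory=list)          # working data, still editable
    snapshots: list[Records] = field(default_factory=list)  # frozen copies

    def snapshot(self) -> Records:
        """Freeze a deep copy of the current records and keep it."""
        frozen = copy.deepcopy(self.records)
        self.snapshots.append(frozen)
        return frozen


class CorporateDataModel:
    def __init__(self) -> None:
        self.stores = {stage: DataStore(stage) for stage in Stage}

    def promote(self, stage: Stage, transform: Callable[[Records], Records]) -> None:
        """Snapshot `stage` and build the next datastore from that snapshot."""
        if stage == Stage.DISSEMINATION:
            raise ValueError("DISSEMINATION is the last datastore")
        frozen = self.stores[stage].snapshot()
        next_stage = Stage(stage + 1)
        # Transform a copy so the stored snapshot itself is never modified.
        self.stores[next_stage].records = transform(copy.deepcopy(frozen))


cdm = CorporateDataModel()
cdm.stores[Stage.INPUT].records = [
    {"unit": 1, "turnover": 10},
    {"unit": 2, "turnover": None},  # missing value, dropped during cleaning
    {"unit": 3, "turnover": 5},
]
cdm.promote(Stage.INPUT, lambda rs: [r for r in rs if r["turnover"] is not None])
cdm.promote(Stage.CLEAN_UNIT, lambda rs: [{"total_turnover": sum(r["turnover"] for r in rs)}])

# Work continues on INPUT; the snapshot that built CLEAN_UNIT is unaffected.
cdm.stores[Stage.INPUT].records.append({"unit": 4, "turnover": 7})
print(cdm.stores[Stage.AGGREGATE].records)        # [{'total_turnover': 15}]
print(len(cdm.stores[Stage.INPUT].snapshots[0]))  # 3
```

Deep-copying at snapshot time mirrors the point that processing can continue on a snapshotted store without disturbing what was passed on; how the CSO actually persists snapshots is not described in these slides.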
The CSO Corporate Data Model
[Figure: the four CDM stages (INPUT, CLEANED DATASETS, AGGREGATE DATASETS, DISSEMINATION) with 2 operational implementations: the Data Management Store (surveys) and the Administrative Data Centre (admin data).]
Data Management Store (DMS)
- First implementation of the CDM
- Only survey data
- Data tables are created and populated through the DMS applications.
- Metadata must be entered as the data tables are created.
- Metadata capturing is minimal.
- Bottleneck: the Business Register (BR) is outside the DMS (stand-alone).
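The rule that metadata is entered when a table is created can be sketched in Python as follows. The classes `TableMetadata` and `DataManagementStore`, the table name and its columns are all hypothetical; the sketch assumes that a basic description plus column definitions is mandatory at creation time and, as an extra assumption, that rows may only use described columns.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Minimal ('basic') description captured when a table is created."""
    name: str
    description: str
    columns: dict[str, str]  # column name -> short definition


@dataclass
class DataManagementStore:
    metadata: dict[str, TableMetadata] = field(default_factory=dict)
    tables: dict[str, list[dict]] = field(default_factory=dict)

    def create_table(self, meta: TableMetadata) -> None:
        # Metadata is mandatory at creation time: no description, no table.
        if not meta.description or not meta.columns:
            raise ValueError(f"table {meta.name!r} needs a description and column definitions")
        if meta.name in self.tables:
            raise ValueError(f"table {meta.name!r} already exists")
        self.metadata[meta.name] = meta
        self.tables[meta.name] = []

    def insert(self, table: str, row: dict) -> None:
        # Only tables created through the DMS can be populated,
        # and only with the columns described in their metadata.
        if table not in self.tables:
            raise KeyError(f"unknown table {table!r}")
        unknown = set(row) - set(self.metadata[table].columns)
        if unknown:
            raise ValueError(f"undescribed columns: {sorted(unknown)}")
        self.tables[table].append(row)


dms = DataManagementStore()
dms.create_table(TableMetadata(
    name="retail_survey_input",
    description="Raw responses to a monthly retail survey",
    columns={"unit_id": "enterprise identifier", "turnover": "monthly turnover in EUR"},
))
dms.insert("retail_survey_input", {"unit_id": "E001", "turnover": 1200})
print(len(dms.tables["retail_survey_input"]))  # 1
```

Because the description is captured in the same call that creates the table, metadata capture cannot be skipped, though it stays minimal, which matches the limitation noted on the slide.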
CDM – Data Management Store
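[Figure: the DMS implementation of the CDM (mainly surveys). Components shown: data collection activities; the four stages INPUT, CLEANED DATASETS, AGGREGATE DATASETS and DISSEMINATION; the shared stores SHARED INPUT, SHARED CLEANED UNIT and AGGREGATE STORE with SNAPSHOTS; an APP layer including I/O interfaces; the DMS meta layer with basic descriptions; BI; and processing systems SYS 1, SYS 2, ..., SYS n.]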
Administrative Data Centre (ADC)
- Developed for organisational reasons
- Only admin data
- A catalyst to exploit administrative data for statistical purposes
- Interface with public authorities on admin data flows to the CSO
- Clearing house inside the CSO for admin data
- Data governance with respect to admin data
Administrative Data Centre (ADC)
- Has an analysis layer: R&D on the available data, to develop new datasets without specific needs or demands from statistics
CDM – Administrative Data Centre
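[Figure: the ADC implementation of the CDM (only admin data). Components shown: SOURCES entering through the ADC Front Door (a lean interface) and ETL; the four stages INPUT, CLEANED DATASETS, AGGREGATE DATASETS and DISSEMINATION; the ADC meta layer; BI; processing systems SYS 1, SYS 2, ..., SYS n; data collection activities; and Data Products.]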
Corporate Data Model CSO - Ireland
[Figure: the combined CSO Corporate Data Model, showing the DMS (APP layer with I/O interfaces, DMS meta layer with basic descriptions, SHARED INPUT, SHARED CLEANED UNIT, AGGREGATE STORE, SNAPSHOTS) and the ADC (SOURCES, ADC Front Door with lean interface, ETL, ADC meta layer, Data Products) on the same four stages INPUT, CLEANED DATASETS, AGGREGATE DATASETS and DISSEMINATION, together with data collection activities, BI and processing systems SYS 1, SYS 2, ..., SYS n.]
The CBS Data Service Centre (DSC)
The concept:
No data without metadata
Dedicated metadata model as basis
Strict distinction between:
Statistical data (facts & figures)
Conceptual metadata (definitions, descriptions of quality, process activities, etc.)
Steady states explicitly designed for re-use.
All metadata (of steady states) are generally accessible and are standardised as much as possible
The CBS Data Service Centre (DSC)
What is it?
Fundamental cornerstone of the CBS Business Architecture
Central ‘vault’ with Steady States, linking:
statistical data (facts & figures)
conceptual metadata (description)
technical metadata (user’s guide)
Documentation
Implementation of the Dutch metadata model
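The 'vault' idea, in which a Steady State links statistical data to conceptual metadata (description) and technical metadata (user's guide) and is refused without them, can be sketched in Python. This is a hypothetical illustration, not the DSC implementation: `ConceptualMetadata`, `TechnicalMetadata`, `SteadyState`, `DataVault` and their fields are assumptions, and the consistency check between the two kinds of metadata is an added example rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualMetadata:
    """What the data mean: definitions and a quality description."""
    population: str
    period: str
    variable_definitions: dict[str, str]
    quality_note: str


@dataclass(frozen=True)
class TechnicalMetadata:
    """How to use the data: the 'user's guide'."""
    file_format: str
    column_types: dict[str, str]


@dataclass(frozen=True)
class SteadyState:
    name: str
    data: tuple[tuple, ...]  # facts & figures only
    conceptual: ConceptualMetadata
    technical: TechnicalMetadata
    documentation: str = ""


class DataVault:
    """Central store that refuses data without metadata."""

    def __init__(self) -> None:
        self._states: dict[str, SteadyState] = {}

    def deposit(self, state: SteadyState) -> None:
        if state.conceptual is None or state.technical is None:
            raise ValueError("no data without metadata")
        if set(state.conceptual.variable_definitions) != set(state.technical.column_types):
            raise ValueError("conceptual and technical metadata describe different variables")
        if state.name in self._states:
            raise ValueError(f"steady state {state.name!r} already deposited")
        self._states[state.name] = state

    def catalogue(self) -> dict[str, ConceptualMetadata]:
        # The metadata of all steady states is generally accessible.
        return {name: s.conceptual for name, s in self._states.items()}

    def get(self, name: str) -> SteadyState:
        return self._states[name]


vault = DataVault()
vault.deposit(SteadyState(
    name="employment_micro_2010",
    data=(("P1", 38), ("P2", 40)),
    conceptual=ConceptualMetadata(
        population="employed persons",
        period="2010",
        variable_definitions={"person_id": "person identifier", "hours": "usual weekly working hours"},
        quality_note="illustrative example only",
    ),
    technical=TechnicalMetadata(file_format="csv", column_types={"person_id": "str", "hours": "int"}),
))
print(sorted(vault.catalogue()))  # ['employment_micro_2010']
```

Keeping data and conceptual metadata in separate objects reflects the strict distinction between facts & figures and their description; the catalogue exposes only the metadata.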
The CBS Data Service Centre (DSC)
What does it offer?
Generic services:
Metadata coordination
Centralised data distribution
Authorisation management
Automatic process interfacing (in development)
Archiving of statistical datasets
The CBS Data Service Centre (DSC)
Why do we do it?
Data sharing / re-using data: intermediary, archive and distribution; the CBS data vault. Maximally efficient use of data and metadata
Process guarantee / security: safety net in case of calamity; static, ‘frozen’ data
Process standardization: transparency & efficiency
Coordination of metadata & classifications: one single source with elements for the statistical process
Process chain support: Steady States as data hubs
Generic process for data linking: the DSC structure enables linking datasets with the same object type
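Linking datasets that share an object type can be sketched in Python as follows. This is a minimal illustration and not the DSC's actual linking process: `UnitDataset`, `link`, the object types and identifier names are hypothetical, and the sketch assumes each dataset declares its object type and an identifier that is unique per row, and uses a plain inner join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitDataset:
    name: str
    object_type: str  # e.g. "person", "enterprise"
    key: str          # identifier variable for that object type
    rows: tuple[dict, ...]


def link(left: UnitDataset, right: UnitDataset, name: str) -> UnitDataset:
    """Inner-join two datasets on their identifiers, only if they describe the same object type."""
    if left.object_type != right.object_type:
        raise ValueError(
            f"cannot link {left.object_type!r} data with {right.object_type!r} data"
        )
    # Assumes identifiers are unique within `right`.
    index = {row[right.key]: row for row in right.rows}
    linked = []
    for row in left.rows:
        match = index.get(row[left.key])
        if match is not None:
            linked.append({**match, **row})  # left values win on name clashes
    return UnitDataset(name, left.object_type, left.key, tuple(linked))


jobs = UnitDataset("jobs", "person", "person_id",
                   ({"person_id": "P1", "hours": 38}, {"person_id": "P2", "hours": 20}))
income = UnitDataset("income", "person", "person_id", ({"person_id": "P1", "wage": 3000},))
firms = UnitDataset("firms", "enterprise", "firm_id", ({"firm_id": "F1", "size": 10},))

linked = link(jobs, income, "jobs_income")
print(linked.rows)  # ({'person_id': 'P1', 'wage': 3000, 'hours': 38},)
# link(jobs, firms, "invalid") would raise ValueError: different object types.
```

Because the object type is part of every dataset's description, the same generic `link` function can serve any pair of compatible datasets instead of a dedicated routine per statistic.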
CBS Business Architecture: Layers
[Figure: CBS Business Architecture layers. Strategy (policy, budget setting, catalogue of data sources); Design (determine statistical information requirements, design process model, design statistical product, design of rules, design data sources); Chain management (planning, monitoring, adjusting and adapting; schedules; (intermediary) products and quality to be achieved; catalogue of achieved (intermediary) products and quality; progress, audit trail, reports); Statistics production, from Source/Respondent to Client/Customer (preparation and dispatch; making enquiries, standardisation & verification; linking, deriving & editing; aggregation, estimation & integration; disclosure control, making data publishable; making statistics available); Steady States (Inputbase, Microbase, Statbase, Outputbase); DSC Data Storage; DSC Metadata Catalogue (conceptual base/input and output metadata & product quality; process metadata on workflow, rules and process quality).]
CBS Business Architecture: Steady States
[Figure: the same CBS Business Architecture with the Steady States highlighted along the statistical production chain: Pre-Inputbase, Inputbase, Microbase, Statbase, Outputbase and Post-Outputbase.]
DSC: What are Steady States?
A steady state is a dataset together with information for its correct interpretation.
- Rectangular:
  - rows represent units (micro) or classes of units (macro)
  - columns represent variables
  - heading: population, time
- A dataset design is like a template of a table: only the borders and the heading.
- 1 dataset design, n datasets
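The relation "1 dataset design, n datasets" can be sketched in Python, with the design acting as the table template (borders and heading) and each dataset filling it with rows. `DatasetDesign`, `Dataset` and all example values are hypothetical; the sketch assumes that the design fixes the population and the variables while each dataset supplies its own period, although a real design could vary other parts of the heading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDesign:
    """Template of a table: only the borders (variables) and the heading."""
    name: str
    unit: str                   # what a row represents, e.g. "enterprise"
    level: str                  # "micro" (units) or "macro" (classes of units)
    variables: tuple[str, ...]  # the columns
    population: str             # heading: population


@dataclass(frozen=True)
class Dataset:
    design: DatasetDesign
    period: str                 # heading: time, filled in per dataset
    rows: tuple[tuple, ...]

    def __post_init__(self) -> None:
        # Every row must fill exactly the columns defined by the design.
        width = len(self.design.variables)
        for row in self.rows:
            if len(row) != width:
                raise ValueError(f"row {row!r} does not match {width} variables")


design = DatasetDesign(
    name="enterprise_turnover",
    unit="enterprise",
    level="micro",
    variables=("enterprise_id", "turnover"),
    population="enterprises in retail trade",
)

# One dataset design, n datasets: here one per reporting period.
datasets = [
    Dataset(design, "2010Q1", (("E1", 100), ("E2", 250))),
    Dataset(design, "2010Q2", (("E1", 110), ("E2", 240))),
]
print([d.period for d in datasets])  # ['2010Q1', '2010Q2']
```

Making both classes immutable (`frozen=True`) reflects the idea of a steady state as a fixed fallback point that later processes can re-use without changing it.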
DSC: Why Steady States?
- Reduce storage: store once, re-use many times
- Secure the statistical process: each steady state is a guaranteed fallback point
- Improve consistency: every following process uses the same dataset
- Improve flexibility: enables independent, generic process design
Conclusions
Both CSO & CBS:
- use the same basic principle of 4 (static) stages/bases
- had the same 'drivers' to start a DWH:
  - re-use of data
  - decoupling input from output (= getting rid of stovepipes)
CSO:
- strong focus on practical results, (successful) quick wins
- 2 different implementations of the CDM
- organisational driver for the ADC
CBS:
- strong focus on the metadata model
- DSC = essential element of the business architecture
- 1 implementation supporting all processes
Conclusions
Regarding the DWH ESSnet
- The S-DWH architecture covers both best practices.
- The ESSnet identified the right issues to focus on:
  - metadata
  - role/position of the BR
- Strong desire for knowledge exchange and learning from other NSIs
- CSO = very helpful best practice case
- CSO acknowledges the importance of the ESSnet and wants to stay closely involved.