1
CORE
Bringing the GSBPM to life!
J. Linnerud & J.-P. Kent
2
Main points
1. An ideal development process for a statistical system
2. Why this ideal usually is not met
3. How CORE aims at supporting this ideal development process
3
Statistics: How?
• Specify a statistic
• Design a process that will produce this statistic
• Build a system that will execute this process
4
What is the product?
• Define a statistic
  – What does it say?
    • Measures, dimensions, explanations…
  – What does it look like?
    • Tables, press release, analytic paper…
  – What is the input?
    • Population, variables, data sources…
  – What is the relation between input and output?
    • Methods to apply
5
Statistics: How?
• Specify a statistic
• Design a process that will produce this statistic
• Build a system that will execute this process
6
How to produce the statistic
• Model the data– Input, output, intermediary results
• Specify process steps to apply the chosen statistical methods
• Integrate these steps in a process flow
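The three design activities above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not part of the CORA or CORE deliverables: the `Record` type, the two step functions and `run_flow` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


# Data model (hypothetical): one observation with a single measure.
@dataclass(frozen=True)
class Record:
    unit_id: str
    value: float


# A process step maps a list of records to a new list of records.
Step = Callable[[List[Record]], List[Record]]


def drop_negative(records: List[Record]) -> List[Record]:
    """Editing step: discard records with a negative value."""
    return [r for r in records if r.value >= 0]


def round_values(records: List[Record]) -> List[Record]:
    """Processing step: round every value to whole units."""
    return [Record(r.unit_id, float(round(r.value))) for r in records]


def run_flow(steps: Iterable[Step], records: List[Record]) -> List[Record]:
    """Process flow: apply the steps in order, feeding each output to the next."""
    for step in steps:
        records = step(records)
    return records


raw = [Record("a", 10.4), Record("b", -3.0), Record("c", 7.6)]
result = run_flow([drop_negative, round_values], raw)
```

The data model, the steps and the flow are kept as three separate pieces, so any one of them can change without touching the other two.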
7
Statistics: How?
• Specify a statistic
• Design a process that will produce this statistic
• Build a system that will execute this process
8
Let the machine do it
• Implement the data models
• Implement the process steps
• Implement the process flow
9
Why is this approach good? (1)
• Variability vs. stability
  – Statistical products are specific
    • There is a great variety of products
    • A given product will vary in time
  – Statistical processes are generic
    • The same method can be applied to many products
    • Process steps implementing methods can be reused
    • A significant change in the product can be implemented with some simple changes in some process steps
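The reuse argument can be illustrated with one generic step applied to two unrelated products. The sketch below is an assumption-laden example: `aggregate`, the field names and the sample figures are all hypothetical and do not come from the presentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def aggregate(rows: Iterable[Mapping[str, object]], by: str, measure: str) -> Dict[object, float]:
    """Generic step: sum `measure` over the groups defined by `by`."""
    totals: Dict[object, float] = defaultdict(float)
    for row in rows:
        totals[row[by]] += float(row[measure])
    return dict(totals)


# Product 1 (hypothetical): tonnes transported per month.
transport = [
    {"month": "2011-01", "tonnes": 12.0},
    {"month": "2011-01", "tonnes": 8.0},
    {"month": "2011-02", "tonnes": 5.0},
]
tonnes_per_month = aggregate(transport, by="month", measure="tonnes")

# Product 2 (hypothetical): turnover per region. Same step, different parameters.
turnover = [
    {"region": "north", "euro": 100.0},
    {"region": "south", "euro": 40.0},
    {"region": "north", "euro": 60.0},
]
euro_per_region = aggregate(turnover, by="region", measure="euro")
```

Only the parameters differ between the two products; the step itself is written once, which is the sense in which a process step is generic.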
10
Why is this approach good? (2)
• It allows a clean specification of the product
  – In terms of what it is
  – In terms of what is used
  – In terms of what the relation is between input and output
11
Why is this approach good? (3)
• It separates product design from IT
  – The product is defined in terms of what it is (and not how it is produced)
  – The process is defined in terms of what it does (and not how it is implemented)
  – Only the system is defined in technical terms
12
Why is this approach good? (4)
• It supports optimisation of process development
  – Possibility of developing standardised, re-usable process steps
  – Generic process steps are not defined for an actual statistic, but for use in different statistics
13
Main points
1. An ideal development process for a statistical system
2. Why this ideal usually is not met
3. How CORE aims at supporting this ideal development process
14
The usual approach
• Statisticians present a project in which product and process are combined
• IT people specify and build a system that creates the product by performing the process
15
Why is the usual approach inefficient?
• Complexity
  • Process & product are tightly coupled
• Rigidity
  • Maintenance is labour-intensive
• Specificity
  • It is not easy to devise a generic solution when developing for a specific product
16
Main points
1. An ideal development process for a statistical system
2. Why this ideal usually is not met
3. How CORE aims at supporting this ideal development process
17
Promoting the better approach
1. The CORA and CORE projects (Jenny)
2. Bringing the results into practice (Jean-Pierre)
18
CORA ESSnet
• COmmon Reference Architecture (CORA)
Financed by Eurostat under the 2009 Statistical Work Programme
Countries involved: IT (coordinator), CH, DK, LV, NL, NO, SE
Duration: October 2009 - October 2010
19
CORA deliverables
• Questionnaire
• Set of Requirements
• State of the Art
• Definition of the Layered Model
• Technical Annex
• Instruction Manual
• Commercial and Legal Foundations for the Exchange of Software between Statistical Offices
• Requirements Checklist for CORA Tools
• Recommendations for CORA Tools
20
After CORA … CORE!
COmmon Reference Environment (CORE)
Financed by Eurostat under the 2010 Statistical Work Programme
Countries involved: IT (coordinator), FR, NL, NO, PT, SE
Duration: December 2010 - January 2012
21
CORE Workpackages
• Design of the information model according to GSBPM and alignment with NSI's information models
• Generic interface design for interconnecting GSBPM sub-processes
• Research workflow solutions for process management
• Implementation library for generic interface and production chain for .NET
• Implementation library for generic interface and production chain for Java
22
Practical usage of CORA / CORE
• Modeling a process in terms of services (CORA)
• Classifying services (CORA)
• Making services platform-independent (CORE)
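[Figure: the CORA space grid. One axis lists the GSBPM phases: 1 Specify Needs, 2 Design, 3 Build, 4 Collect, 5 Process, 6 Analyse, 7 Disseminate, 8 Archive, 9 Evaluate. The other axis lists the levels: Figures, Time series, Statistic, Population, Unit, Variable, Value.]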
An example process
• A transport statistic
  – Input:
    • Loading reports
    • Unloading reports
      – Date, time, place, type and quantity of goods, type of vehicle
  – Output:
    • Monthly transport data
      – Same data also used for time series
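To make the example concrete, the sketch below models a loading report with the fields named on the slide and derives monthly totals from it. The class and function names, the choice of grouping by month and goods type, and the sample values are assumptions made for illustration; the presentation does not specify them.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LoadingReport:
    timestamp: datetime  # date and time of loading
    place: str
    goods_type: str
    quantity: float      # unit of measurement is an assumption (e.g. tonnes)
    vehicle_type: str


def monthly_transport(reports: List[LoadingReport]) -> Dict[Tuple[str, str], float]:
    """Sum the quantity of goods per (month, goods type)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for r in reports:
        month = r.timestamp.strftime("%Y-%m")
        totals[(month, r.goods_type)] += r.quantity
    return dict(totals)


# Hypothetical sample data.
reports = [
    LoadingReport(datetime(2011, 3, 1, 8, 30), "Rotterdam", "grain", 20.0, "truck"),
    LoadingReport(datetime(2011, 3, 15, 14, 0), "Oslo", "grain", 5.0, "truck"),
    LoadingReport(datetime(2011, 4, 2, 9, 0), "Rotterdam", "steel", 12.5, "train"),
]
monthly = monthly_transport(reports)
```

Because the result is keyed by month, the same output could feed both a monthly publication and a time series, in line with the slide's remark that the same data serves both.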
Modeling approach
• Use the CORA space grid
[Figure: the CORA space grid, shown with the labels Microdata and Macrodata.]
Modeling approach
• Use the CORA space grid
• Display statistical services in the appropriate cells
[Figure: the CORA space grid with the services Aggregate and Macroediting displayed in cells.]
Modeling approach
• Use the CORA space grid
• Display statistical services in the appropriate cells
• Join services with arrows to show the dependencies
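The modeling approach can be mimicked in code by recording, for each service, its cell in the grid and the services it depends on. In the sketch below the service names are taken from the slides, but their placement in cells and the arrows between them are assumptions made for illustration. The execution order is derived with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Assumed placement: service -> (GSBPM phase, level in the grid).
placement = {
    "Microediting": ("5 Process", "Unit"),
    "Aggregate": ("5 Process", "Statistic"),
    "Macroediting": ("6 Analyse", "Statistic"),
}

# Assumed arrows: each service maps to the set of services it depends on.
depends_on = {
    "Microediting": set(),
    "Aggregate": {"Microediting"},
    "Macroediting": {"Aggregate"},
}

# A valid execution order lists every service after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())

# Group the services by cell, as they would appear in the grid.
cells = {}
for service, cell in placement.items():
    cells.setdefault(cell, []).append(service)
```

Keeping the placement and the dependencies in separate structures mirrors the slides, where the cells classify the services and the arrows alone define the flow.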
[Figure: the CORA space grid with the services Aggregate and Macroediting.]
[Figure: the complete example process. Elements shown: Monthly Transport Publication, Confidentiality control, Select period data, Integrate data, Archive Publication data, Archive TimeSeries data, Supply period data, Aggregate, Archive Statistic data, Macroediting, Microediting, Archive Unit data, Compute distance, Combine, Archive obs. vars., Download, Outlier detection, Error detection, Correct outliers, Correct variables. Two elements are marked "?".]
32
A traditional service
[Figure: Tool X, with Script (X), Input (X) and Output (X); both input and output have a Model (X).]
33
A CORA service
[Figure: a CORA layer with Script (CORA), Input (CORA) and Output (CORA), each of the latter two with a Model (CORA), plus Logging. Convertors (CV) connect the CORA layer to Tool X (Model (X), Script (X), Input (X), Output (X)) and to Tool Y (Model (Y), Script (Y), Input (Y), Output (Y)).]
CV = Convertor
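The idea of wrapping a tool between convertors can be sketched as follows. This is a hypothetical illustration and not the CORE implementation library: the tool, its "name;value" text format, the common row format and all function names are invented for the example.

```python
import logging
from typing import Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cora_service")

# Stand-in for the common (CORA) model: a list of {"name": ..., "value": ...} rows.
CommonRows = List[Dict[str, object]]


def tool_x_script(text: str) -> str:
    """Hypothetical Tool X: works on its own 'name;value' lines and doubles each value."""
    out = []
    for line in text.splitlines():
        name, value = line.split(";")
        out.append(f"{name};{float(value) * 2}")
    return "\n".join(out)


def to_tool_x(rows: CommonRows) -> str:
    """Convertor: common model -> Tool X input format."""
    return "\n".join(f"{r['name']};{r['value']}" for r in rows)


def from_tool_x(text: str) -> CommonRows:
    """Convertor: Tool X output format -> common model."""
    rows: CommonRows = []
    for line in text.splitlines():
        name, value = line.split(";")
        rows.append({"name": name, "value": float(value)})
    return rows


def cora_service(rows: CommonRows) -> CommonRows:
    """Wrapped service: convert in, run the tool, convert out, and log the call."""
    log.info("service called with %d rows", len(rows))
    result = from_tool_x(tool_x_script(to_tool_x(rows)))
    log.info("service returned %d rows", len(result))
    return result


doubled = cora_service([{"name": "a", "value": 1.5}, {"name": "b", "value": 2.0}])
```

Callers see only the common format, so replacing Tool X with another tool would mean writing a new pair of convertors while `cora_service` callers stay unchanged.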