SDMX Global Conference 2013, Paris: “Global SDMX Implementation: Modernising Official Statistics”

SDMX STATISTICAL CAPACITY BUILDING

GUIDELINES FOR DESIGNING DATA STRUCTURE DEFINITIONS (DSDs)


Overview

• Design principles

• Exchange context

• Design process

• Data structuring approaches

• DSD analysis: STES as example


Design Principles

Structural

• Parsimony

• Simplicity

• Purity

• Unambiguousness

• Exhaustiveness

• Orthogonality

Other

• Re-use of existing artefacts

• Flexibility and future needs

• Fitness for use throughout the statistical business process

• User-friendliness


Data Exchange Context

• Single- or multi-domain

• Single- or multi-purpose

• Type of data

• Human or machine as recipient

• Level of data exchange

• Role in data exchange

• Process pattern

• Phase of statistical process


Design Process

1. Specify context

2. Identify relevant existing DSDs (if available, go to step 3; if not available, go to step 4.3)

3. Check DSD suitability (partly suitable: step 4.1; suitable: step 4.2; not suitable: step 4.3)

4.1. Define modified DSDs

4.2. Use suitable DSDs

4.3. Define new DSDs

5. Define supporting artefacts
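The decision flow on this slide can be sketched in a few lines of Python. This is only an illustration: the pairing of the branch labels ("partly suitable", "suitable", "not suitable") with steps 4.1, 4.2 and 4.3 is an assumed reading of the diagram, and the names `Suitability` and `design_path` are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class Suitability(Enum):
    SUITABLE = "suitable"
    PARTLY_SUITABLE = "partly suitable"
    NOT_SUITABLE = "not suitable"


# Assumed mapping of the diagram's branch labels to the numbered steps.
STEP_FOR_SUITABILITY = {
    Suitability.PARTLY_SUITABLE: "4.1. Define modified DSDs",
    Suitability.SUITABLE: "4.2. Use suitable DSDs",
    Suitability.NOT_SUITABLE: "4.3. Define new DSDs",
}


def design_path(existing_available: bool,
                suitability: Optional[Suitability] = None) -> List[str]:
    """Return the sequence of steps followed for one design case."""
    path = ["1. Specify context", "2. Identify relevant existing DSDs"]
    if existing_available:
        if suitability is None:
            raise ValueError("suitability is required when existing DSDs are available")
        path.append("3. Check DSD suitability")
        path.append(STEP_FOR_SUITABILITY[suitability])
    else:
        # Nothing to reuse: go straight to defining new DSDs.
        path.append("4.3. Define new DSDs")
    path.append("5. Define supporting artefacts")
    return path
```

For example, `design_path(True, Suitability.PARTLY_SUITABLE)` passes through the suitability check and then through step 4.1, while `design_path(False)` skips the check entirely.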


Design Process

Define new DSDs

4.3.1. Specify concepts

4.3.2. Specify code lists

4.3.3. Specify data formats

4.3.4. Assemble DSDs


Design Process

Specify concepts

4.3.1.1. Decide structuring approach

4.3.1.2. Identify relevant existing concepts (if available, go to 4.3.1.3; if not available, go to 4.3.1.4.2)

4.3.1.3. Check concept suitability (if suitable, go to 4.3.1.4.1; if not suitable, go to 4.3.1.4.2)

4.3.1.4.1. Use suitable concepts

4.3.1.4.2. Define new concepts

4.3.1.5. Define concept roles

4.3.1.6. Define groups

4.3.1.7. Define attribute attachment levels

The diagram also shows two feedback loops labelled “revise”.



Design Process

Specify code lists

4.3.2.1. Identify relevant existing code lists (if available, go to 4.3.2.2; if not available, go to 4.3.2.3.3)

4.3.2.2. Check code list suitability (suitable: 4.3.2.3.1; partly suitable: 4.3.2.3.2; not suitable: 4.3.2.3.3)

4.3.2.3.1. Use suitable code lists

4.3.2.3.2. Define modified code lists

4.3.2.3.3. Define new code lists
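The slides do not say how suitability is judged, so the sketch below assumes one simple criterion purely for illustration: an existing code list is suitable when it covers every code the new DSD needs, partly suitable when it covers some of them, and not suitable otherwise. The function name and the sample codes are hypothetical.

```python
from typing import Iterable

# Steps as numbered on the slide.
ACTION_FOR_RESULT = {
    "suitable": "4.3.2.3.1. Use suitable code lists",
    "partly suitable": "4.3.2.3.2. Define modified code lists",
    "not suitable": "4.3.2.3.3. Define new code lists",
}


def check_code_list_suitability(required_codes: Iterable[str],
                                existing_codes: Iterable[str]) -> str:
    """Classify an existing code list by how well it covers the required codes.

    Coverage is an assumed criterion; a real assessment would also look at
    definitions, hierarchy and maintenance of the code list.
    """
    required, existing = set(required_codes), set(existing_codes)
    if required <= existing:
        return "suitable"
    if required & existing:
        return "partly suitable"
    return "not suitable"


# Hypothetical frequency codes used only to exercise the function.
existing_freq_codes = ["A", "Q", "M"]
result = check_code_list_suitability(["M", "W"], existing_freq_codes)
next_action = ACTION_FOR_RESULT[result]
```

Here the required code `W` is missing from the existing list, so the result is "partly suitable" and the next action is to define a modified code list.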


Design Process

Iterative

4.3.1. Specify concepts

4.3.2. Specify code lists

4.3.3. Specify data formats

4.3.4. Assemble DSDs


Data structuring approaches

Two design choices:

• Number and content of dimensions

• Number of DSDs

The two are not completely independent: fewer concepts and dimensions in the key mean a larger number of DSDs.


Number and content of dimensions

Data characteristics (C1, C2, C3, C4, …): Sex, Age, Sector, Employment status, …

• Pure concepts: 1 characteristic = 1 concept (Sex; Age; Sector; …)

• Composite concepts: several characteristics = 1 concept (e.g. Sex and Age)

Wider use of composite concepts means a lower number of dimensions.
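A small Python sketch may make the trade-off concrete. The code lists below are hypothetical (they are not taken from any official SDMX code list): merging Sex and Age into one composite concept removes a dimension from the key, but the composite code list grows to the product of the two original lists.

```python
from itertools import product

# Illustrative code lists (hypothetical codes).
CL_SEX = ["F", "M", "T"]            # female, male, total
CL_AGE = ["Y15-24", "Y25-64", "T"]  # two age bands and a total

# Pure approach: one dimension per characteristic.
pure_dimensions = {"SEX": CL_SEX, "AGE": CL_AGE}

# Composite approach: one dimension whose codes combine both characteristics.
CL_SEX_AGE = [f"{sex}_{age}" for sex, age in product(CL_SEX, CL_AGE)]
composite_dimensions = {"SEX_AGE": CL_SEX_AGE}


def summarize(dimensions):
    """Return (number of dimensions, length of the longest code list)."""
    return len(dimensions), max(len(codes) for codes in dimensions.values())


pure_summary = summarize(pure_dimensions)            # 2 dimensions, 3 codes each
composite_summary = summarize(composite_dimensions)  # 1 dimension, 9 codes
```

The pure version has two short code lists; the composite version has a single dimension with nine codes.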


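Pure vs. composite concepts

[Figure: the key Dim1.Dim2.….DimN with one code list per dimension (Codelist1 with codes 1 … K1, Codelist2 with codes 1 … K2, …, CodelistN with codes 1 … KN). Horizontal complexity runs along the key; vertical complexity runs down the code lists. The two extremes are labelled “Many pure” and “Few mixed”.]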


Many pure concepts

● Clean data structure

● Flexible in terms of mappings to other data structures: may be mapped to any mixed representation

● Flexible in terms of defining queries (for a skilled user)

● Short and simple code lists

● Long observation keys

● Difficult for end users to handle (long codes; many dimensions), but more flexible for skilled users

● Special values (not applicable; total) widely used

● Creates sparseness

● Needs many constraints (due to sparseness)

Some of the critical points may be overcome through a different strategy in choosing the number of DSDs. More DSDs reduce sparseness and the need for constraints, and would result in shorter keys.
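To illustrate why many pure dimensions create sparseness and call for constraints, here is a minimal sketch with hypothetical dimensions and codes. The code `_Z` stands for "not applicable" (an assumed convention for this example), and `is_valid` plays the role of a constraint.

```python
from itertools import product

# Hypothetical pure dimensions; "_Z" stands for "not applicable".
dimensions = {
    "INDICATOR": ["EMPLOYMENT", "PRODUCTION"],
    "SEX": ["F", "M", "_Z"],
    "SECTOR": ["INDUSTRY", "SERVICES", "_Z"],
}


def is_valid(key):
    """Constraint sketch: employment is broken down by sex only,
    production by sector only."""
    indicator, sex, sector = key
    if indicator == "EMPLOYMENT":
        return sex != "_Z" and sector == "_Z"
    return sex == "_Z" and sector != "_Z"


# Every combination the key structure allows in theory.
all_keys = list(product(*dimensions.values()))
# The combinations that can actually carry data.
valid_keys = [key for key in all_keys if is_valid(key)]

# Share of the theoretical key space that can never hold data.
sparseness = 1 - len(valid_keys) / len(all_keys)
```

In this toy case only 4 of the 18 possible keys are meaningful. Splitting the data into one DSD per indicator would drop the inapplicable dimension from each key and remove the need for the constraint.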


Strategies

• Starting point: all pure concepts

• Too many? Then there is a trade-off between:

  • Composite concepts

  • Many different DSDs


Composite concepts

SDMX technical notes, annex 6 (343): “Avoid composite dimensions”

But in particular contexts they may be useful, e.g. to disseminate a few key economic indicators (multi-domain).


Number of DSDs

One DSD or many DSDs?

A possible approach: master and satellite artefacts (derived via constraints)


Master DSD matrix

Rows: data exchange scenarios. Columns: concepts (SC1 … SCm).

Scenario   SC1   SC2   SC3   ……   SCm
# 1        X     X     X     X    X
# 2        X     O     X     X    X
# 3        O     X     O     O    O
……         ….    ….    …     …    …
# n        X     O     O     X    X
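One way to work with such a matrix is sketched below in Python. It assumes that an "X" means the scenario uses the concept and an "O" means it does not (the slide does not spell this out), and it uses made-up scenarios rather than the values shown above. Scenarios that share the same set of concepts are grouped together as candidates for a common satellite artefact.

```python
# Hypothetical master DSD matrix: scenario -> use of each concept.
# True corresponds to "X", False to "O" (an assumed reading of the slide).
CONCEPTS = ["SC1", "SC2", "SC3", "SC4"]
matrix = {
    "scenario_1": [True, True, True, True],
    "scenario_2": [True, False, True, True],
    "scenario_3": [False, True, False, False],
    "scenario_4": [True, False, True, True],
}


def concepts_used(row):
    """Return the concepts flagged as used in one row of the matrix."""
    return tuple(c for c, used in zip(CONCEPTS, row) if used)


# The master DSD needs every concept used by at least one scenario.
master_concepts = [
    c for i, c in enumerate(CONCEPTS) if any(row[i] for row in matrix.values())
]

# Scenarios with identical concept sets can share one satellite artefact.
groups = {}
for scenario, row in matrix.items():
    groups.setdefault(concepts_used(row), []).append(scenario)
```

With these inputs the four scenarios collapse into three groups, because `scenario_2` and `scenario_4` use the same concepts.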

Page 19: SDMX STATISTICAL CAPACITY BUILDING GUIDELINES FOR ... 2013 Session 8.5 - Guidelines for designing... · SDMX Global Conference 2013, Paris “ Global SDMX Implementation : Modernising

Master and satellite DSDs

Multiple satellite DSDs

[Figure: DSD1, DSD2, …, DSDn are derived from the Master DSD via constraints: multiple satellite DSDs (unique key structures).]


Master and satellite DSDs

One DSD, multiple data flows

[Figure: Dataflow 1, Dataflow 2, …, Dataflow n are derived from the Master DSD via constraints: one DSD.]


Master and satellite DSDs

Multiple satellite DSDs

A slightly different approach:

[Figure: DSD1, DSD2, …, DSDn are derived from the Master DSD by dropping the dimensions that are not relevant: multiple satellite DSDs (multiple key structures).]
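The difference between the two derivations can be shown with a short Python sketch. Everything here is hypothetical (dimension names, codes, helper functions), and the constraint is simplified to pinning a dimension to a single code. A satellite derived via constraints keeps the master key structure, while a satellite derived by dropping dimensions gets a shorter key of its own.

```python
from typing import Dict, List

# Hypothetical master key structure.
MASTER_DIMENSIONS = ["FREQ", "REF_AREA", "SEX", "SECTOR"]


def satellite_by_constraint(fixed: Dict[str, str]) -> dict:
    """Keep the master key structure; pin some dimensions to a single code."""
    return {"dimensions": list(MASTER_DIMENSIONS), "fixed": dict(fixed)}


def satellite_by_dropping(irrelevant: List[str]) -> dict:
    """Build a new key structure without the irrelevant dimensions."""
    return {
        "dimensions": [d for d in MASTER_DIMENSIONS if d not in irrelevant],
        "fixed": {},
    }


def make_key(structure: dict, values: Dict[str, str]) -> str:
    """Assemble a dot-separated key in the structure's dimension order."""
    parts = []
    for dim in structure["dimensions"]:
        code = structure["fixed"].get(dim, values.get(dim))
        if code is None:
            raise ValueError(f"no value for dimension {dim}")
        parts.append(code)
    return ".".join(parts)


values = {"FREQ": "M", "REF_AREA": "FR", "SECTOR": "C"}
constrained_key = make_key(satellite_by_constraint({"SEX": "_Z"}), values)
dropped_key = make_key(satellite_by_dropping(["SEX"]), values)
```

The constrained satellite yields the key `M.FR._Z.C`, which is still a valid key of the master structure; the satellite with the dropped dimension yields `M.FR.C`, which is shorter but no longer matches the master key structure.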


Example: Short-term Economic Statistics

DSD: Dimensions and attributes

Dimensions

CONCEPT          DESCRIPTION                                  CODE LIST ID
SUBJECT          Subject matter                               CL_SUBJECT
MEASURE          Quantitative variable value                  CL_MEASURE
FREQ             Periodicity                                  CL_FREQ
REFERENCE_AREA   “Reference area” and/or “Counterpart area”   CL_AREA
ADJUSTMENT       Seasonal adjustment                          CL_ADJUSTMENT
UNIT             Generic list with code values                CL_UNIT
TIME_PERIOD      Defines the observation period

Attributes

CONCEPT      DESCRIPTION                                             CODE LIST ID
UNIT_MULT    Indicating the magnitude in the units of measurements   CL_UNIT_MULT
OBS_STATUS   The observation status                                  CL_OBS_STATUS
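As a rough illustration, the structure in the table can be written down as plain Python data and used to assemble a series key. The concept and code list identifiers come from the table; the `Component` class, the `series_key` helper, the dimension order in the key (taken from the table order) and most of the sample codes are assumptions. Only `IXNB` and `2010100` appear in the code lists shown in this presentation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Component:
    concept: str
    code_list: Optional[str]  # None when no code list is given


# Key dimensions, in the order they appear in the table (assumed key order).
DIMENSIONS = [
    Component("SUBJECT", "CL_SUBJECT"),
    Component("MEASURE", "CL_MEASURE"),
    Component("FREQ", "CL_FREQ"),
    Component("REFERENCE_AREA", "CL_AREA"),
    Component("ADJUSTMENT", "CL_ADJUSTMENT"),
    Component("UNIT", "CL_UNIT"),
]
TIME_DIMENSION = Component("TIME_PERIOD", None)
ATTRIBUTES = [
    Component("UNIT_MULT", "CL_UNIT_MULT"),
    Component("OBS_STATUS", "CL_OBS_STATUS"),
]


def series_key(values: Dict[str, str]) -> str:
    """Join one code per dimension into a dot-separated series key."""
    missing = [d.concept for d in DIMENSIONS if d.concept not in values]
    if missing:
        raise ValueError(f"missing dimension values: {missing}")
    return ".".join(values[d.concept] for d in DIMENSIONS)


example_key = series_key({
    "SUBJECT": "PROD",        # hypothetical subject code
    "MEASURE": "IXNB",
    "FREQ": "M",              # hypothetical frequency code
    "REFERENCE_AREA": "FR",   # hypothetical area code
    "ADJUSTMENT": "SA",       # hypothetical adjustment code
    "UNIT": "2010100",
})
```

The resulting key has six positions, one per dimension; the time period is carried separately for each observation.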


DSD analysis

Design principles

• Reuse of existing code lists and future needs: adjustment, frequency, reference area, subject matter.

• Parsimony, simplicity, density: the DSD is not redundant and has a small number of dimensions. The DSD provides data for most of the cells.

• Purity: the code list CL_UNIT is not pure, but this adds to simplicity.


DSD analysis

Design principles

• Unambiguousness and orthogonality: the code list CL_MEASURE seems to be ambiguous, and CL_UNIT and CL_MEASURE overlap.

• Exhaustiveness: it is possible to identify all data in the flow.


DSD analysis

Dimensions

• The DSD includes the dimension MEASURE to differentiate the indicators expressed as an index number from the rest.

• This item was added to the DSD as an independent dimension although, by its nature, it could be incorporated into the UNIT dimension (CL_UNIT).

• The code list of the UNIT dimension includes codes of different natures:

  • Physical units of measure

  • Monetary units

  • Several base periods for index numbers


DSD analysis

Code lists

CL_MEASURE

Description: A summary (means, mode, total, index, etc.) of the individual quantitative variable values for the statistical units in a specific group (study domains).

Code   Description
ST     Number, rate, value
IXNB   Index

CL_UNIT

Description: Generic list with code values (including currency, base period, measures)

Code      Description
1995100   1995=100
2000100   2000=100
2003100   2003=100
2005100   2005=100
2008100   2008=100
2010100   2010=100
AUD       Australian Dollar
BPA       Barrels per day
BPM       Barrels per month
BRL       Brazilian Real
CAD       Canadian Dollar
CHF       Swiss Franc
CLP       Chilean Peso
CNY       Yuan Renminbi
CZK       Czech Koruna
DKK       Danish Krone
DW        Dwellings
EUR       Euro
GBP       Pound Sterling
GWH       Gigawatt hour
HUF       Forint
IDR       Rupiah
ILS       New Israeli Sheqel
INR       Indian Rupee
ISK       Iceland Krona
JB        Jobs


DSD analysis

Suggestions

• Eliminate the MEASURE dimension.

• Add the code IXNB = Index number to CL_UNIT, so that indicators expressed as indices can be identified.

• Eliminate the codes for base period from CL_UNIT.

• Create a new concept to specify the base period, with its own code list / format.
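The suggested restructuring can be sketched in Python on an excerpt of CL_UNIT. This is an illustration only: the concept name `BASE_PER`, the choice of the four-digit year as the base period code, and both helper functions are assumptions, not part of the presentation.

```python
import re

# Excerpt of CL_UNIT as shown in the code list slide.
CL_UNIT = {
    "1995100": "1995=100",
    "2010100": "2010=100",
    "AUD": "Australian Dollar",
    "BPA": "Barrels per day",
    "EUR": "Euro",
    "JB": "Jobs",
}

# Base period entries are recognised by descriptions such as "2010=100".
BASE_PERIOD_PATTERN = re.compile(r"^(\d{4})=100$")


def split_unit_code_list(cl_unit):
    """Return (revised CL_UNIT, new base period code list)."""
    revised_unit, base_periods = {}, {}
    for code, description in cl_unit.items():
        match = BASE_PERIOD_PATTERN.match(description)
        if match:
            base_periods[match.group(1)] = description
        else:
            revised_unit[code] = description
    # A single code now identifies every indicator expressed as an index.
    revised_unit["IXNB"] = "Index number"
    return revised_unit, base_periods


def migrate(old_unit_code, cl_unit):
    """Map an old UNIT code to the new UNIT code and base period."""
    match = BASE_PERIOD_PATTERN.match(cl_unit[old_unit_code])
    if match:
        return {"UNIT": "IXNB", "BASE_PER": match.group(1)}
    return {"UNIT": old_unit_code, "BASE_PER": None}


revised_unit, base_periods = split_unit_code_list(CL_UNIT)
```

After the split, CL_UNIT contains only units plus the single index code, and the base periods live in their own list. An old code such as `2010100` maps to UNIT `IXNB` with base period `2010`, so the MEASURE dimension is no longer needed to tell indices apart from other indicators.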