milo-collins
Business needs and context for DDI and SDMX
ESS DDI/SDMX Workshop, 2013.06.05
Overview
I. Why are common standards important?
II. DDI and SDMX: The relationship so far…
III. Better to have one standard than two?
IV. Will everyone need to learn two standards?
V. ABS approach, so far…
VI. Discussion
VII. Optional bonus material
    I. The challenge
    II. Application
    III. Use cases
Standard – ISO definition
• …
• provide rules, guidelines or characteristics
• for common and repeated use
• for activities [eg production of official statistics] or their results [eg statistical products and services]
• aimed at the achievement of the optimum degree of order in a given context
Standards promote interoperability
• “interoperability”
– ability of diverse systems and organizations to work together (inter-operate).
• different subject matter domains and functional teams within an agency
• specific collaborations between agencies, and/or
• as an industry (ESS vision, HLG vision)
• Levels of interoperability
– Technical
– Semantic
– Organisational (eg business process alignment)
– Legal
Efficiencies
• Interoperability leads to economies of scale
– supports shared development, deployment and evolution of the processes, methods, IT components and information which represent the “means of production”
• possible vendor interest
• In addition, standards reduce
– non-productive decision-making processes
• which add cost and time to projects
– unnecessary diversity
• which adds cost (eg training, maintenance, lost opportunities) longer term
ESS.VIP programme
Transformation programme for the modernisation of the production systems in the European Statistical System (ESS) through:
• moving towards more common solutions and shared services and environment
• economies of scale and efficiency gains, sharing costs
ESS.VIP business and information principles
• Maximum reuse of existing process components and segments
• Metadata driven processes allowing adaptation and extension to other contexts
• New business process built as a sequence of modular process steps / services
• Information objects structured according to available information models and stored in corporate registries/repositories in view of reuse
• Adherence to industry and open standards as available (e.g. Plug & Play)
Metadata Driven Business Processes
• systematic and consistent use of metadata to determine the inputs, outputs and behaviour of a statistical business process
• Characteristics
1. Metadata is used systematically
• Metadata is used in a planned and managed way across the organisation.
2. Metadata is used consistently
• Authoritative ‘single source of truth’ metadata is used throughout the end-to-end lifecycle of an activity and/or across activities.
3. Metadata is used actively
• Metadata is used to guide definition and automate execution of statistical processes
• Metadata is structured so as to be machine-consumable
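As a rough illustration of metadata being used actively, the sketch below shows a process step whose input, output and behaviour are read from a machine-consumable description rather than hard-coded. All names here (STEP_METADATA, OPERATIONS, run_step, the variable names) are hypothetical and are not taken from GSBPM, GSIM, DDI or SDMX.

```python
# Hypothetical, machine-consumable description of one process step.
STEP_METADATA = {
    "input_variable": "turnover",
    "output_variable": "turnover_edited",
    "operation": "cap",  # the behaviour is selected by metadata
    "parameters": {"upper": 1000},
}

# Available behaviours; the metadata picks one of them by name.
OPERATIONS = {
    "cap": lambda value, upper: min(value, upper),
    "scale": lambda value, factor: value * factor,
}


def run_step(records, metadata):
    """Apply the operation named in the metadata to every record."""
    operation = OPERATIONS[metadata["operation"]]
    source = metadata["input_variable"]
    target = metadata["output_variable"]
    return [
        {**record, target: operation(record[source], **metadata["parameters"])}
        for record in records
    ]


records = [{"unit": "A", "turnover": 250}, {"unit": "B", "turnover": 4000}]
edited = run_step(records, STEP_METADATA)
```

Changing the step from capping to scaling would then mean editing the metadata, not the code, which is the sense in which the process is "driven" by metadata.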
Metadata Driven Business Processes
• Operational benefits include
• Reduced time and cost of statistics production
• Improved quality of statistical products
• Increased agility in meeting new demands for statistical products and services.
• Increased agility in harnessing new sources of statistical data.
• Strategic benefits
• Provides a basis for designing and sharing components which can be configured flexibly, using agreed business objects, to meet diverse needs and operate in diverse environments
• This is particularly relevant when defining the information interfaces and business behaviours of such components
• Supports standards based industrialisation / modernisation
IATA : International Air Transport Association
• Founded 1945
• 2004 : Simplifying the business
• 5 initiatives to save $6.5 billion per year
• Includes Bar Coded Boarding Pass
Information Models Standards
Objectives:
• To ensure that ESS.VIP have access to a set of agreed-upon standards supporting the modernisation of statistical production processes.
• To increase coherence between standards, at the same time ensuring that these are consistent with best practices and recommendations from the international community.
• To define information models that can be used across the ESS to model structural metadata for micro-data and aggregated data.
• To set up guidelines for designing and documenting business processes.
• To provide support mechanisms (e.g., capacity-building and training).
The journey begins….
• Importance and value of agreeing on common standards for metadata to be used structurally in the statistical production process has been recognised since at least 2009.
• Many existing standards were identified which could provide useful support for specific purposes
• Two existing standards were identified as providing the broadest (not necessarily comprehensive) support
– DDI and SDMX
Characterizing the Standards: DDI
DDI Lifecycle can provide a very detailed set of metadata, covering:
– The study or series of studies
– Many aspects of data collection, including surveys and processing of microdata
– The structure of data files, including hierarchical files and those with complex relationships
– The lifecycle events and archiving of data files and their metadata
– The tabulation and processing of data into tables (Ncubes)
It allows for a link between microdata variables and the resulting aggregates
Characterizing the Standards: SDMX
• Describes the structure of aggregate/dimensional data (“structural metadata”)
• Provides formats for the dimensional data
• Provides a model of data reporting and dissemination
• Provides a way of describing and formatting stand-alone metadata sets (“reference metadata”)
• Provides standard registry interfaces, providing a catalogue of resources
• Provides guidelines for deploying standard web services for SDMX resources
• Provides a way of describing statistical processes
The SDMX-DDI approach
• Informal meetings (2010-2013) between members of SDMX and DDI communities
• Initiative of the SDMX Secretariat through its Technical Working Group
• Approach to using SDMX and DDI interchangeably
• Now, we are at the stage where implementations are being investigated and prototyped
– Not “if”, but “how”
An initial broad overlay on GSBPM (2010)
[Figure: DDI and SDMX overlaid on the GSBPM]
GSBPM, DDI and SDMX: towards a complete system?
[Figure: DDI and SDMX overlaid on the GSBPM]
DDI offers a very rich model for the documentation of micro-data
SDMX offers a very integrated exchange platform for statistical outputs (IT architectures, tools, web services)
DDI and SDMX
The combined use of both standards could allow a higher level of integration of the complete production process
But: The devil is in the detail!
Dealing with the devil….
• Need to consider other context for a business process
– eg are you collecting, processing and analysing macrodata or microdata?
• Both standards have the capacity to be “stretched” to support many things.
– eg SDMX Reference Metadata can carry any information
• How to decide what is appropriate?
– Neither standard was originally designed to support all needs for structured metadata associated with all phases of the statistical production process
– It would be useful to have common agreement on business definitions and purposes for this metadata so “business fit” (and integration) can be considered, not just technical feasibility.
Common Generic Industrialised Statistics
[Figure: grid with a conceptual row and a practical row. Conceptual: GSBPM (Business Concepts), GSIM (Information Concepts). Practical: Methods (Statistical HowTo), Technology (Production HowTo).]
GSIM is complementary to GSBPM
A model is needed to describe information objects and flows within the statistical business process
What is GSIM?
A reference framework of information objects setting out definitions and (commonly agreed) attributes and relationships
Provides:
• Information model for “business objects” at the conceptual (and, to some degree, logical) level
• Common (reference) semantics
Does not provide
• Physical representation for information objects
Relationship of GSIM with SDMX/DDI
• Alignment was “designed in” where relevant
• In adoption/implementation, complementary (with synergies) but no formal dependency.
[Figure: GSIM overview diagram with four groups of information objects: Business, Production, Concepts, Structures. Objects shown include Statistical Need, Business Case, Statistical Program Design, Statistical Program, Statistical Activity (Acquisition Activity, Production Activity, Dissemination Activity), Process Step, Process Input, Process Output, Data Channel, Data Resource, Data Set, Data Structure, Population, Concept, Unit, Classification and Variable, connected by labelled relationships such as “has”, “includes”, “uses”, “specifies”, “describes”, “defines”, “comprises” and “initiates”.]
GSIM Timeline
• GSIM V1.0 was released in December 2012.
• The most detailed documentation of GSIM is UML in Enterprise Architect
– More than 100 information objects
• Higher level views and a glossary are also provided.
• The next level of detail regarding correspondences between GSIM Information Objects and constructs in DDI & SDMX was completed in May.
– Included general identification of “gaps”, “overlaps”, strong alignments and partial alignments.
• Important not to see standards as completely “fixed” in current form
– A further level of detail will be required to arrive at concrete recommendations for representing GSIM objects using SDMX and DDI
Analysis of use cases
The SDMX TWG has been defining a set of relevant use cases where the two standards could be compared and, if possible, used together:
1. Survey data collection
2. Administrative and register data
3. Combined use of DDI and SDMX
4. Micro-data access and on-demand tabulation of micro-data
5. Metadata and quality reporting
Better to have one standard rather than two?
• Theoretically, under ideal circumstances, perhaps, but in practice…
– How many standards are there associated with the components to build a car or a house?
– The press chose to have standards for Images, News, Events, Sports
• What is often more important in practice is to define a “standard” means to harness multiple standards to support the interoperability needs of a particular industry.
– More on “how” later
– It is better to have a suite of standards, each of which supports particular needs well, rather than a single standard which provides mediocre support for some needs?
– Commonly, only some of the relevant underlying standards are “indigenous” to the industry
• This is partly because many interoperability needs are broader than a single industry.
Considerations
• Both DDI and SDMX have constituencies beyond official statistics, eg
– Research Institutes, Data Archives for DDI
– Central banks for SDMX
• This is positive in several regards
– Broader interoperability, broader economies of scale
– Shared cost of maintaining and supporting the standard
• If proposed further evolutions of DDI and SDMX would add complexity without value for other constituencies, or would contradict their business model, these may be resisted.
– (DDI/SDMX interoperability interests some other constituencies)
How about….
• The official statistics industry develops and maintains an independent representation standard based on SDMX, DDI and on good models from statistical agencies?
Downsides include….
• Standards are
– slow and costly to design and agree
– costly to document and to support through tools and expertise
– costly to maintain in terms of improving fitness for purpose and evolving as business needs change
• If official statistics adopt similar but increasingly divergent standards to SDMX and DDI, external interoperability and economies of scale will decrease over time– implications include decrease in possible vendor/market interest
How about….
• ….we use SDMX for everything
• Given one aim is structured metadata to drive business processes then a lot of required metadata is not structurally defined in the SDMX Information Model
– Extending SDMX to model this structurally would make the standard much larger
• Central Banks (and others) do not seem enthusiastic about this
– Reference metadata, however, underpinned by metadata structure definitions, can carry any information
• Reference metadata, however, is not necessarily intended (and sometimes not readily modelled) for structural use
Considerations
• Structural use of reference metadata in a standard manner requires common semantics
– eg, one quality declaration held as reference metadata is not necessarily comparable with another unless both are structured according to common semantics (eg ESMS)
– There would need to be a large exercise of defining common semantics for SDMX reference metadata to be used for structural purposes
• Where semantics are already defined in DDI, would we choose something different?
– If Yes, why invest in defining and maintaining different semantics?
– If No, we are, in effect, representing DDI in SDMX.
» Why not also support DDI syntax, with an option to use DDI tools?
» The option of using SDMX reference metadata for this purpose may, however, be useful in other circumstances.
Concept of profiles on standards
• Many industries and developments have faced similar challenges.
• “Application profiles” refer to a way of applying one or more standards in a particular context, eg
– an industry and/or environment, or
– an initiative, or
– IT applications
• Examples
– The W3C standard for representing dates and times is a profile on ISO 8601
– INSPIRE, FGDC, ANZLIC etc use profiles on common ISO standards for the semantics and representation of geospatial metadata
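To make the idea of a profile concrete, the sketch below checks strings against the small set of date and time formats listed in the W3C profile of ISO 8601 (year, year-month, full date, and full date with time and a mandatory time zone designator). The regular expression is written for this illustration and is only a syntactic check (it does not validate value ranges such as month 13); the function name conforms_to_w3c_profile is hypothetical.

```python
import re

# Formats allowed by the W3C profile:
#   YYYY | YYYY-MM | YYYY-MM-DD | YYYY-MM-DDThh:mm[:ss[.s]]TZD
# where TZD is "Z" or an offset such as +02:00.
W3C_DATETIME = re.compile(
    r"\d{4}"
    r"(-\d{2}"
    r"(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
    r")?"
    r")?"
)


def conforms_to_w3c_profile(text):
    """Return True if the whole string uses one of the profile's formats."""
    return W3C_DATETIME.fullmatch(text) is not None


# Forms allowed by the profile.
in_profile = ["2013", "2013-06", "2013-06-05", "2013-06-05T09:30:00+02:00"]
# ISO 8601 forms (basic format, week date) that the profile leaves out.
outside_profile = ["20130605", "2013-W23-3"]

results = {text: conforms_to_w3c_profile(text) for text in in_profile + outside_profile}
```

The profile does not compete with the underlying standard; it narrows the choices so that applications in one context only need to handle a few forms.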
Possible application to official statistics
• An overall profile setting out how the official statistics industry proposes DDI and SDMX be used to support structured metadata needs of statistical business processes could be a target.
– The overall profile could be built up progressively as practical business needs related to particular business functions / sub-processes within the GSBPM are agreed
• Helps minimise the risk of unnecessary differences in the way semantically equivalent metadata (eg classifications) are represented for different business operations
• Specific IT applications can define which (usually small) subset of the overall profile they support
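One way to picture an overall profile with applications supporting subsets of it is sketched below. The mapping of metadata types to standards is purely illustrative (it is an assumption made for the example, not a recommendation from these slides), and the names OVERALL_PROFILE and check_application are hypothetical.

```python
# Illustrative only: which standard the (hypothetical) industry profile
# uses to represent each type of metadata.
OVERALL_PROFILE = {
    "classification": "SDMX",
    "data structure": "SDMX",
    "questionnaire": "DDI",
    "variable": "DDI",
}


def check_application(declared):
    """Return the declared entries that conflict with, or fall outside, the profile."""
    problems = {}
    for metadata_type, standard in declared.items():
        agreed = OVERALL_PROFILE.get(metadata_type)
        if agreed is None:
            problems[metadata_type] = "not covered by the profile"
        elif agreed != standard:
            problems[metadata_type] = f"profile uses {agreed}, application uses {standard}"
    return problems


# A collection tool that supports only a small subset of the profile.
collection_tool = {"questionnaire": "DDI", "classification": "SDMX"}
# A tool that represents classifications differently from the profile.
divergent_tool = {"classification": "DDI"}

collection_problems = check_application(collection_tool)
divergent_problems = check_application(divergent_tool)
```

A check of this kind is what helps catch semantically equivalent metadata being represented differently by different business operations.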
Business Perspective
• Most business staff should not need a detailed knowledge of SDMX and DDI.
– They should understand aspects of a common “business level” information model for the statistical information objects which are relevant to their work
– Somewhat similarly
• most users of the web don’t have a detailed knowledge of HTML
– They do, however, experience impacts if developers of browsers and web pages get it wrong (or right)
• Most of those putting the INSPIRE directive into effect don’t need a detailed knowledge of ISO 19115.
Developer Perspective
Ideally, most
• developers of new IT components to support business processes,
• staff responsible for selecting & configuring IT components to support specific statistical business processes
won’t need a detailed knowledge of SDMX and DDI.
• They may need to understand aspects of any agreed “industry profile” that are relevant to their work.
This would seem to require
• A core team who, collectively, understand
– the standards and what they support
– “as is” business operations and “to be” target of transformation
– the perspective and needs of business staff and developers.
• Effective ongoing engagement with business and developer communities to ensure proposed approaches will meet their needs
• requires an appropriate level of common language underpinned by common understanding
– GSBPM and GSIM as common points of reference can support some, but not all, of this communication
• An iterative, agile approach, coherent but not monolithic, across the range of metadata requirements
Possible Dynamics
• Different aspects of exploring and proposing “industry application” (eg regarding different types of metadata and regarding support for different business functions) could be led by teams in different agencies
– This is more efficient and effective than seeking to determine everything from first principles in a single committee?
• There would, however, need to be active, practically oriented review by many agencies to ensure all business considerations, including local considerations, were supported to the extent which was practical.
• There would also need to be checking for coherence and consistency across different “packages” of the work.
[Figure: Conceptual model, information standards and MRR]
• Conceptual Model: “Business Objects”
• “Canonical” Information Standards: “Profiles” on DDI, SDMX etc
– Information standards are used to represent (“instantiate”) business objects
• MRR (Metadata Registry/Repository)
– Repository (Logical): “Objects live here”
– Objects are defined and retrieved based on standards
– Registry, with Registry Information Model (RIM)
– Allows discovery of objects and their address
– Registry lets you find/reference specific (instances of) business objects
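The split between registry and repository can be sketched as follows. This is a minimal illustration under stated assumptions, not the ABS design: the class names, the RegistryEntry fields and the "repo://" address scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    identifier: str
    object_type: str
    address: str  # where the object lives in the repository


class Repository:
    """Logical store: 'objects live here'."""

    def __init__(self):
        self._objects = {}

    def put(self, address, representation):
        self._objects[address] = representation

    def get(self, address):
        return self._objects[address]


class Registry:
    """Index that allows discovery of objects and their address."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.identifier] = entry

    def find(self, object_type):
        return [e for e in self._entries.values() if e.object_type == object_type]


repository = Repository()
registry = Registry()

# Store an object, then register where it can be found.
repository.put("repo://classifications/sex", {"M": "Male", "F": "Female"})
registry.register(RegistryEntry("CL_SEX", "Classification", "repo://classifications/sex"))

# Discover via the registry, then retrieve from the repository.
entry = registry.find("Classification")[0]
sex_classification = repository.get(entry.address)
```

The registry never holds the object itself, only enough information to find and reference it, so the stored representation can change without breaking discovery.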
2011 – Defining the ABS Transitional Metadata Model (ATMM)
ATMM “Technical”
ATMM “Conceptual”
ATMM “Alignment with standards”
2012
ATMM “Technical”
ATMM “Conceptual”
ATMM “Alignment with standards”
GSIM(Under development)
2013
ATMM
GSIM@ABS
InfoStandards @ ABS
Only a small core currently. Growing daily, prioritised by need.
GSIM V1.0
• GSIM/SDMX GSIM/DDI relationships (UNECE)
• Early work on model based, GSIM aligned, DDI (DDI Alliance)
Future?
ATMM
GSIM@ABS
InfoStandards @ ABS
Only a small core currently. Growing daily, prioritised by need.
GSIM V1.0
• GSIM/SDMX GSIM/DDI relationships (UNECE)
• Early work on model based, GSIM aligned, DDI (DDI Alliance)
BEANS!
The challenge
Is not about which flavor of XML we use (XML doesn’t really matter)
It’s about data and metadata!
– If I want to use DDI to describe my data, and you want to use SDMX, how can we ensure that we are getting the same data and metadata?
The challenge (2)
• If I am using SDMX, but I am sent DDI, a simple transformation must give me the same payload of data and metadata
• Vice versa for DDI users who are sent SDMX
• Conventions will need to be established regarding identifiers and the way the unit record files are structured
• There will need to be agreed models for each business case
Combined DDI-SDMX approaches
• Mixing the two standards within an implementation, allowing for the expression of the same metadata in both standards, so that the information could be transformed from one format to the other.
• This way, it would become possible to select either DDI or SDMX for a particular operation, similar to what we discussed above regarding application functionality.
• Metadata stored and indexed in such a fashion that it can be expressed either as SDMX or DDI on an as-needed basis.
• Metadata Repository and Registry project at ABS.
• The actual format used for metadata storage may be neither SDMX nor DDI, so long as it can be expressed using both standards.
• GSIM to be implemented through a combination of SDMX and DDI?
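The idea of neutral storage that can be expressed in either standard, with the same payload surviving a transformation, can be sketched as below. The two renderings are deliberately simplified stand-ins written for this example: they are not schema-valid SDMX-ML or DDI XML, and every function and key name is hypothetical.

```python
# Neutral storage: neither SDMX nor DDI.
NEUTRAL = {"id": "CL_SEX", "codes": {"M": "Male", "F": "Female"}}


def to_sdmx_like(neutral):
    """Simplified stand-in for an SDMX rendering (not real SDMX-ML)."""
    return {
        "Codelist": {
            "id": neutral["id"],
            "Code": [{"id": c, "Name": n} for c, n in neutral["codes"].items()],
        }
    }


def to_ddi_like(neutral):
    """Simplified stand-in for a DDI rendering (not real DDI XML)."""
    return {
        "CodeList": {
            "ID": neutral["id"],
            "Codes": [{"Value": c, "Label": n} for c, n in neutral["codes"].items()],
        }
    }


def from_sdmx_like(doc):
    codelist = doc["Codelist"]
    return {"id": codelist["id"], "codes": {c["id"]: c["Name"] for c in codelist["Code"]}}


def from_ddi_like(doc):
    codelist = doc["CodeList"]
    return {"id": codelist["ID"], "codes": {c["Value"]: c["Label"] for c in codelist["Codes"]}}


def sdmx_like_to_ddi_like(doc):
    """Transform one rendering into the other via the neutral form."""
    return to_ddi_like(from_sdmx_like(doc))


# A receiver who prefers the DDI-like form is sent the SDMX-like form.
received = to_sdmx_like(NEUTRAL)
payload = from_ddi_like(sdmx_like_to_ddi_like(received))
```

The comparison that matters is on the payload, not on the syntax: after the transformation the receiver should hold the same identifiers, codes and labels as the sender.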
Generic Statistical Information Model (GSIM)
Common Generic Industrialised Statistics
[Figure: grid with a conceptual row and a practical row. Conceptual: GSBPM (Business Concepts), GSIM (Information Concepts). Practical: Methods (Statistical HowTo), Technology (Production HowTo).]
[Figure: Generalised Statistical Production System. Conceptual level: GSBPM and GSIM, with SDMX, DDI, ISO 11179 etc. alongside. Practical level: Methods and Technology. A Service, defined by methods and business need, has Service Inputs and Service Outputs and enables the business process; arrows between the levels are labelled “informs”.]
Expanding on the diagram
Standards Based, e.g. DDI, SDMX
Survey
A Survey is targeted at a specific Population and comprises Questions
Questions may be linked to a Variable which specifies
- conceptual meaning (Concept)
- valid set of responses that are allowed (Category Scheme and contained Category)
Output from the Survey is a Unit Record Data Set
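The relationships described above can be sketched as a small object model. The class and field names follow the terms on the slide, but the structure is an assumption made for illustration (for example, Concept is reduced to a string), not a rendering of the DDI or GSIM models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    code: str
    label: str


@dataclass
class CategoryScheme:
    name: str
    categories: List[Category]


@dataclass
class Variable:
    name: str
    concept: str  # conceptual meaning
    category_scheme: CategoryScheme  # valid set of responses

    def is_valid(self, response):
        return response in {c.code for c in self.category_scheme.categories}


@dataclass
class Question:
    text: str
    variable: Optional[Variable] = None  # "may be linked" to a Variable


@dataclass
class Survey:
    population: str
    questions: List[Question] = field(default_factory=list)


sex = Variable(
    "sex",
    "Sex of respondent",
    CategoryScheme("CS_SEX", [Category("M", "Male"), Category("F", "Female")]),
)
survey = Survey(
    population="Adult residents",
    questions=[Question("What is your sex?", sex), Question("Any other comments?")],
)

# Unit record data set produced by the survey, checked against the Variable.
unit_records = [{"sex": "F"}, {"sex": "X"}]
invalid = [r for r in unit_records if not sex.is_valid(r["sex"])]
```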
The Proposed Approach
• The full set of information includes:
– The unit record data
– Structural information about the variables and representations
– Additional information about how the data has been generated/collected/processed
• In DDI, this set of information can be expressed as a DDI instance and a data file
– Both the structural and processing metadata can be expressed as a single DDI instance
[Figure: labels shown are Output Tables, Concepts, Metadata Set, Unit Record Data, DDI Instance, ASCII Data File, SDMX DataSet, SDMX Structural Metadata and SDMX Metadata Report]