Date: 03/09/2015
SC17DI06692
D4.3 Report on implementation of a Metadata Management pilot for DG COMP
Reference data governance and management at DG COMP
Document Metadata
Property | Value
Date | 2014-06-07
Status | Accepted
Version | 1.00
Authors | Stijn Goedertier – PwC EU Services; Gerben Hoogeboom – PwC EU Services; Philippe Lamote – PwC EU Services; Nikolaos Loutas – PwC EU Services; Brecht Wyns – PwC EU Services
Reviewed by | Pieter Breyne – PwC EU Services
Approved by | Jesper Abrahamsen – European Commission, DG COMP; Julian-Daniel Jimenez-Krause – EC, DG COMP; Athanasios Karalopoulos – European Commission, DG DIGIT; Vassilios Peristeras – European Commission, DG DIGIT
This study was prepared for the ISA Programme by:
PwC EU Services
Disclaimer:
The views expressed in this report are purely those of the authors and may not, in
any circumstances, be interpreted as stating an official position of the European
Commission.
The European Commission does not guarantee the accuracy of the information
included in this study, nor does it accept any responsibility for any use thereof.
Reference herein to any specific products, specifications, process, or service by
trade name, trademark, manufacturer, or otherwise, does not necessarily constitute
or imply its endorsement, recommendation, or favouring by the European
Commission.
All care has been taken by the author to ensure that s/he has obtained, where
necessary, permission to use any parts of manuscripts including illustrations, maps,
and graphs, on which intellectual property rights already exist from the titular
holder(s) of such rights or from her/his or their legal representative.
“PwC” is the brand under which member firms of PricewaterhouseCoopers
International Limited (PwCIL) operate and provide services. Together, these firms
form the PwC network. Each firm in the network is a separate and independent
legal entity and does not act as agent of PwCIL or any other member firm.
Document History
Version | Date | Description | Action
0.01 | 2013-12-17 | Template & Table of Contents | Creation
0.02 | 2014-01-08 | Desk research based on previous communication and business case | Update
0.03 | 2014-01-16 | Updates based on meeting with DG COMP of the 15 January 2014 | Update
0.04 - 0.06 | 2014-01-21 | Internal review | Update
0.07 | 2014-01-30 | Updates based on received input from the Conference Call of 29 January 2014 | Update
0.08 | 2014-02-11 | Updates based on received input from the Conference Call of 29 January 2014 | Update
0.09 - 0.10 | 2014-02-11 | Internal review | Update
0.11 | 2014-02 | New structure based on the document, domain model & schema documentation updated. | Update
0.12 | 2014-03-10 | Internal review | Review
0.13 | 2014-03-10 | Updates throughout the document: stakeholder requirements, best practices and SKOS description | Update
0.14 | 2014-03-11 | | Update
0.15 | 2014-03-12 | | Update
0.16 | 2014-04-13 | | Review
0.17 | 2014-03-13 | | Update
0.18 – 0.22 | 2014-03-18 – 2014-03-26 | Restructuring; updates throughout the document | Update
0.23 | 2014-03-31 | Updates on the governance model | Update
0.24 | 2014-04-02 | Updates on the domain model | Update
0.25 – 0.27 | 2014-04-02 – 2014-04-03 | Updates on the description of GENIS Reference Data Component | Update
0.28 | 2014-04-08 | Updates in structure of section 4: requirements and specifications for reference data tools | Update
0.29 | 2014-04-08 | Updates in section 3 | Update
0.30 | 2014-04-08 | Internal review | Review
0.31 | 2014-04-08 | Updates in section 3 and 4 | Update
0.32 | 2014-04-08 | Updated sections 2 and 3 | Update
0.33 | 2014-04-15 | Updated governance | Update
0.34 – 0.35 | 2014-04-16 | Updated tools | Update
0.36 | 2014-04-17 | Internal review | Review
0.37 | 2014-04-18 | Comments processed | Update
0.38 | 2014-04-18 | Delivered for review | Review
0.39 | 2014-04-18 | Updated change man. and tools | Update
0.40 – 0.41 | 2014-04-18 | Updated chapters 4 and 5 | Update
0.42 | 2014-04-30 | General Review by Jesper Abrahamsen | Review & Update
0.43 | 2014-05-06 | Elaboration of GENIS recommendations and End-2-End example approach | Review & Update
0.44 – 0.45 | 2014-05-07 | Internal review | Review
0.46 | 2014-05-08 | Delivered for acceptance | Delivered
0.47 | 2014-05-28 | Delivered for acceptance | Delivered
0.47 | 2014-06-04 | Review by Athanasios Karalopoulos | Review
0.48 | 2014-06-05 | Addressing review comments by Athanasios Karalopoulos | Update
0.49 | 2014-06-06 | Delivered for acceptance | Delivered
0.50 | 2014-06-07 | Delivered for acceptance | Delivered
1.00 | 2014-06-07 | Accepted | Accepted
Contents
EXECUTIVE SUMMARY
1. INTRODUCTION
    1.1. CONTEXT: STATE-AID CONTROL
    1.2. DEFINITION: REFERENCE DATA
    1.3. BUSINESS NEED
    1.4. EXPECTED BENEFITS
    1.5. APPROACH
    1.6. STAKEHOLDERS AND ROLES
    1.7. GLOSSARY
2. REQUIREMENTS AND SPECIFICATIONS FOR REFERENCE DATA GOVERNANCE
    2.1. STAKEHOLDER REQUESTS AND NEEDS
    2.2. EXISTING SOLUTIONS FOR REFERENCE DATA GOVERNANCE
        2.2.1. ISA Committee and ISA Coordination Group
        2.2.2. Inter-Institutional Metadata Maintenance Committee (IMMC)
        2.2.3. ISO 11179-6 Metadata Registration
        2.2.4. Data Management Body of Knowledge (DM-BOK)
    2.3. SPECIFICATION OF METADATA GOVERNANCE
        2.3.1. Scope
        2.3.2. Organisational structure
        2.3.3. Decisions
        2.3.4. Authoritative source
        2.3.5. Licensing framework
        2.3.6. Enforcement
        2.3.7. Continuous improvement
3. REQUIREMENTS AND SPECIFICATIONS FOR REFERENCE DATA MANAGEMENT
    3.1. STAKEHOLDER REQUESTS AND NEEDS
    3.2. EXISTING METHODOLOGIES FOR REFERENCE DATA MANAGEMENT
        3.2.1. Data Management Body of Knowledge (DM-BOK)
        3.2.2. ISO 11179-6 Metadata Registration
        3.2.3. ISO 19135:2005 Geographic information -- Procedures for item registration
        3.2.4. Information Technology Infrastructure Library (ITIL)
        3.2.5. Good practices from the Publications Office: integrating Reference Data Management in the Software Development Lifecycle
    3.3. SPECIFICATION FOR METADATA MANAGEMENT
        3.3.1. Design structural metadata
        3.3.2. Manage change of structural metadata
        3.3.3. Harmonise structural metadata
        3.3.4. Release structural metadata
        3.3.5. Deploy structural metadata
        3.3.6. Retire structural metadata
4. REQUIREMENTS FOR AND ASSESSMENT OF EXISTING REFERENCE DATA TOOLS
    4.1. STAKEHOLDER REQUESTS AND NEEDS
    4.2. EXISTING STANDARDS FOR REFERENCE DATA MANAGEMENT
        4.2.1. Representation: Simple Knowledge Organisation System (SKOS)
        4.2.2. Representation: GeneriCode
        4.2.3. Representation: Using HTTP URIs to identify concept schemes and concepts
        4.2.4. Description: Asset Description Metadata Schema (ADMS)
    4.3. EXISTING TOOLS FOR REFERENCE DATA MANAGEMENT
        4.3.1. Publication: Joinup
        4.3.2. Publication: Metadata Registry of the Publications Office (MDR)
        4.3.3. Editor / Propagation: GENIS Reference Data Component (GENIS RDC)
        4.3.4. Editor: VocBench
        4.3.5. Editor: PoolParty: Thesaurus Management
        4.3.6. Editor: Silk workbench (link discovery)
        4.3.7. Workflow Management tool: Activiti
        4.3.8. Change management: Atlassian JIRA
        4.3.9. Deployment: Mule
        4.3.10. Editor / Deployment: Jena
    4.4. DOMAIN MODEL
    4.5. DATA FLOW DIAGRAM
    4.6. HIGH-LEVEL USE CASES
        4.6.1. Use Case 0 – Edit an authentic source of reference data
        4.6.2. Use Case 1 – Detect reference data changes
        4.6.3. Use Case 2 – Manage reference data changes
        4.6.4. Use Case 3 – Deploy reference data changes
    4.7. ASSESSMENT OF PROPOSED TOOLING FOR REFERENCE DATA MANAGEMENT
    4.8. RECOMMENDATIONS FOR THE GENIS RDC – E2E IMPLEMENTATION EXAMPLE
5. CONCLUSIONS
6. ACKNOWLEDGEMENTS
BIBLIOGRAPHY
ANNEX I STATE-AID REFERENCE DATA SETS
ANNEX II METADATA REGISTRY OF THE PUBLICATIONS OFFICE (MDR)
List of Tables
Table 1 - Stakeholders
Table 2 - Glossary
Table 3 – Stakeholder requests: reference data management
Table 4 – Stakeholder requests and needs: reference data management
Table 5 – Reference data tools
Table 6 – State Aid reference data
List of Figures
Figure 1 – Overview of systems involved in State-aid control
Figure 2: organisation structures
Figure 3 – Illustration: objectives of State-aid control as defined in Commission Regulation (EC) No 794/2004
Figure 4 – UML Static Diagram: Domain Model for reference data (based on SKOS-XL)
Figure 5- Simplified DFD for the flow of data between authentic source and GENIS
Figure 6 High-level use cases for metadata management
Figure 7 - Overview (functional blocks)
Figure 8: Overview (example implementation)
Figure 9 – Schematic overview of how the Publications Office edits an XML file and generates all distributions of Named Authority Lists (NALs)
EXECUTIVE SUMMARY
This report is commissioned by the Interoperability Solutions for European Public
Administrations (ISA) Programme of the European Commission, in the context of its
Action 1.1 on semantic interoperability. It involves the tailoring of a methodology
for the management and governance of reference data, based on the proposed
methodology in D4.2 ‘Methodology and tools for Metadata Governance and
Management for EU Institutions’, for the State-aid information systems (register of
planned State-aid) of DG COMP in which the Commission exchanges information
both internally (with DG AGRI, DG MARE and Eurostat) and with European public
administrations in all Member States. It also assesses the extent to which the
Generic Interoperable Notification Services (GENIS) Reference Data Component
(RDC) can support the reference data governance and management processes.
During the development of this pilot, the following approach was taken:
• Elicit and validate the specific requirements for reference data
management and governance for DG COMP in the context of State-aid
control;
• Identify existing solutions for managing and governing reference data
based on input from the Publications Office and deliverable D4.1 ‘Metadata
management requirements and existing solutions in EU Institutions and
Member States’;
• Specify a solution for the management and governance of reference data,
consistent with D4.2 ‘Methodology and tools for Metadata Governance and
Management for EU Institutions’ and based on standards, and
demonstrate its applicability and feasibility; and
• Assess the coverage of the identified requirements and propose an
approach with existing tools, including the GENIS reference data
component, thereby identifying gaps and assessing usefulness and
fitness-for-purpose.
Governance and Management
Chapters 2 and 3 look at the requirements and specifications for reference data
governance and reference data management respectively.
In terms of governance we have derived several models from existing solutions:
• At the local level, we have identified a governance structure consisting
of a steering committee, a working group and stakeholder involvement.
• At the inter-institutional level, the IMMC can serve as inspiration.
• At the trans-European level, Comitology procedures need to be taken into
account.
We have determined that both reference data specifications under metadata
governance and related documentation should have an authoritative source.
The use of persistent HTTP Uniform Resource Identifiers (URIs) for reference
data releases can make it easier to manage an authoritative source.
In terms of data management we have identified best practices from DM-BOK,
Publications Office and ITIL and found that these existing management practices
can be well applied to manage structural metadata as described in chapter 3.3.
Tools
The focus of Chapter 4 is the requirements for, and an assessment of, existing tools
for reference data. It lists the main use cases and ends by briefly identifying
which functionality could be covered by which tool, and by presenting an
example of a possible overall approach to demonstrate how the tools can work together.
Having identified governance and management procedures, standards and best
practices, we have made an inventory of existing tools to support them. We
conclude that GENIS RDC is a well-placed tool that can be used for editing and
propagating data, and perhaps play a part in change management, and that
many tools are available that could complement GENIS RDC in order to fulfil the
needs and requirements listed in this document. The tools can be categorised as
follows:
• Editing: GENIS RDC or VocBench could be used as editing and workflow
tools for managing thesauri, authority lists and glossaries based on SKOS
RDF.
• Change management: Use a dedicated component to manage and track
changes. Alternatively, GENIS could be expanded to include more change
management.
• Deployment: GENIS RDC is already used for deployment and its
functionality could be expanded as explained in Section 4.8.
• Publication: As publication source, the Joinup platform of the ISA
programme can be used. The structural metadata can be represented using
SKOS RDF and described using the Asset Description Metadata Schema
(ADMS RDF).
• Harmonisation: For reference data mappings or interlinking two data
sources.
Each tool is tailored to fit the needs and requirements of the domain (e.g. editing,
change management, deployment, publication). In this context tools need to be
integrated so that automated exchange can be facilitated (e.g. change
management and workflow tools cover the entire process and need to keep track of
what is happening in editing tools etc.).
The following is therefore recommended:
• Consider using the tools mentioned in the categorisation above, as they fulfil
the requirements and are widely used within the EC;
• Consider using a standard representation format such as SKOS-XL;
• Consider providing an import and export feature for reference data in
SKOS-XL format;
• Consider attributing persistent HTTP URIs; and
• Consider using integration tools such as the Mule ESB in combination
with a workflow automation tool such as Activiti.
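To illustrate the recommendations on SKOS-XL export and persistent HTTP URIs, the following Python sketch serialises a single reference data value as Turtle text. It is a minimal sketch under stated assumptions: the base URI `http://example.org/state-aid/`, the URI pattern, the scheme name `aid-instrument` and the code `GRANT` are hypothetical placeholders and not DG COMP identifiers. Only the SKOS and SKOS-XL namespaces and terms come from the W3C specifications.

```python
from typing import Dict

SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
BASE = "http://example.org/state-aid/"  # hypothetical persistent URI base


def concept_uri(scheme: str, code: str) -> str:
    """Build a stable HTTP URI from a scheme name and a code (assumed pattern)."""
    return f"{BASE}{scheme}/{code}"


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_skosxl_turtle(scheme: str, code: str, labels: Dict[str, str]) -> str:
    """Serialise one concept with SKOS-XL preferred labels (one per language)."""
    if not labels:
        raise ValueError("at least one label is required")
    concept = concept_uri(scheme, code)
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix skosxl: <{SKOSXL}> .",
        "",
        f"<{concept}> a skos:Concept ;",
        f"    skos:inScheme <{BASE}{scheme}> ;",
        f'    skos:notation "{escape(code)}" ;',
    ]
    langs = sorted(labels)
    for i, lang in enumerate(langs):
        # The last predicate closes the statement with "." instead of ";".
        end = " ." if i == len(langs) - 1 else " ;"
        lines.append(f"    skosxl:prefLabel <{concept}/label/{lang}>{end}")
    for lang in langs:
        # In SKOS-XL each label is a resource of its own with a literal form.
        lines.append("")
        lines.append(f"<{concept}/label/{lang}> a skosxl:Label ;")
        lines.append(f'    skosxl:literalForm "{escape(labels[lang])}"@{lang} .')
    return "\n".join(lines)


print(to_skosxl_turtle("aid-instrument", "GRANT", {"en": "Grant", "fr": "Subvention"}))
```

Because every label is a resource with its own URI, a label can later carry its own metadata (for example validity dates) without changing the URI of the concept it names.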
1. INTRODUCTION
This report is commissioned by the Interoperability Solutions for European Public
Administrations (ISA) Programme of the European Commission, in the context of its
Action 1.1 on semantic interoperability. It involves the tailoring of a methodology
for the management and governance of reference data for the State-aid information
systems (register of planned State-aid) of DG COMP in which the Commission
exchanges information both internally (with DG AGRI, DG MARE and Eurostat) and
with European public administrations in all Member States. It also assesses the
extent to which the Generic Interoperable Notification Services (GENIS) Reference
Data Component (RDC) can support the reference data governance and
management processes.
1.1. Context: State-aid control
DG COMP, jointly with DG MARE and DG AGRI, supports the following two State-
aid control processes:
• State-aid notification process: Member States are obliged to inform
the European Commission in detail of their intention to spend public money
in undertakings (State aid). The legal basis for this is Commission Regulation
(EC) No 794/2004 of 21 April 2004 implementing Council Regulation (EC)
No 659/1999 laying down detailed rules for the application of Article 93 of
the EC Treaty, including Regulations amending Regulation 794/2004, and
Commission Regulation (EC) No 800/2008.
• State-aid monitoring and reporting process: Member States are obliged
to report to the Commission on actual expenditures on current State-aid
measures. The legal basis for this is Article 21 of Council Regulation (EC)
659/1999 with regard to schemes and Article 6 of Commission Regulation
(EC) 794/2004 with respect to the remainder of existing aid, be it ad hoc
or any other kind.
The State-aid control processes are supported by the State-aid control information
systems of DG COMP, which are depicted in Figure 1 and include:
• GENIS (SANI-II): The Generic Interoperable Notification Services (GENIS)
Information System is used to manage and support the exchange of
information between Member States and the Commission within the State-aid
notification process, in which Member States notify the European
Commission of planned State-aid. GENIS is also known as State Aid
Notification Interactive (SANI-2) and is the successor to the existing SANI.
• CMS: The Case Management System (CMS) receives the notification and is
used by Commission staff to investigate whether the State-aid can be
approved.
• SARI: Once a notification is approved, the State Aid Reporting Interactive
(SARI) is used by Member States to supply the European Commission with the
requested information on State aid issued to beneficiaries.
• Statistical reporting: DG COMP collects statistics on State-aid for
Eurostat [1]. For this it has created its own data warehouse.
[1] Eurostat is the statistical office of the European Union. It provides the European
Union with statistics at European level that enable comparisons between countries and regions.
GENIS has a component-based architecture, consisting of several building blocks,
including the GENIS Reference Data Component. DG COMP intends to ensure
that this component will be used to manage change to the reference data of GENIS,
CMS, and SARI by Q1 2014.
Figure 1 – Overview of systems involved in State-aid control
1.2. Definition: reference data
In this report the following definition for reference data is used:
Reference data consists of small, discrete sets of values that are not updated as
part of business transactions but are usually used to impose consistent classification.
Reference data normally has a low update frequency. Reference data is relevant
across more than one business system belonging to different organisations and
sectors.
Reference data is an umbrella term for several artefacts that are used in information
systems and information exchange. The following types of reference data
were identified by the ADMS Working Group [2]:
• Code list: complete set of data element values of a coded simple data
element [ISO 9735-1:2002, 4.14];
• Taxonomy: scheme of categories and subcategories that can be used to
sort and otherwise organize items of knowledge or information [ISO/DIS
25964-2];
• Thesaurus: controlled and structured vocabulary in which concepts are
represented by terms, organized so that relationships between concepts are
made explicit, and preferred terms are accompanied by lead-in entries for
synonyms or quasi-synonyms [ISO 25964-1:2011];
• Name Authority List: controlled vocabulary for use in naming particular
entities consistently [ISO/DIS 25964-2].
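To make the definition more concrete, a code list can be modelled as a small, versioned set of allowed values that transactional records are checked against but never modify. The Python sketch below is illustrative only: the class names, the `aid-objective` list and its codes are hypothetical and are not taken from the State-aid reference data sets in Annex I. The optional `broader` field hints at how the same structure extends towards a taxonomy.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Code:
    """One allowed value in a code list."""
    value: str                     # the code exchanged between systems
    label: str                     # human-readable name
    broader: Optional[str] = None  # parent code; set only for taxonomy-style lists


class CodeList:
    """A small, versioned set of allowed values (hypothetical structure)."""

    def __init__(self, name: str, version: str, codes: Iterable[Code]) -> None:
        self.name = name
        self.version = version
        self._codes: Dict[str, Code] = {}
        for code in codes:
            if code.value in self._codes:
                raise ValueError(f"duplicate code: {code.value}")
            self._codes[code.value] = code
        for code in self._codes.values():
            if code.broader is not None and code.broader not in self._codes:
                raise ValueError(f"unknown parent code: {code.broader}")

    def is_valid(self, value: str) -> bool:
        """Check a transactional value against the list without changing the list."""
        return value in self._codes

    def ancestors(self, value: str) -> List[str]:
        """Return the chain of broader codes, nearest parent first."""
        chain: List[str] = []
        current = self._codes[value].broader
        # Stop at the top of the hierarchy, or if a cycle is encountered.
        while current is not None and current not in chain:
            chain.append(current)
            current = self._codes[current].broader
        return chain


objectives = CodeList(
    "aid-objective",
    "1.0",
    [
        Code("ENV", "Environmental protection"),
        Code("ENV-EE", "Energy efficiency", broader="ENV"),
    ],
)
print(objectives.is_valid("ENV-EE"))   # True
print(objectives.is_valid("UNKNOWN"))  # False
print(objectives.ancestors("ENV-EE"))  # ['ENV']
```

Keeping the version on the list rather than on individual records reflects the low update frequency in the definition: a new release replaces the whole set of values.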
[2] ADMS Asset Types, https://joinup.ec.europa.eu/svn/adms/ADMS_v1.00/ADMS_SKOS_v1.00.html
[Figure 1, diagram content: GENIS / SANI-II (create notification); Case Management System, CMS (COMP, MARE, AGRI; receive notification); SARI (DG COMP; web-based user interface; Member States provide information after approval of notification); Reporting for Eurostat (publication). The label "Reference Data" appears three times in the diagram.]
Annex I contains an overview of the Reference Data that is managed by DG COMP
in the context of State-aid control.
1.3. Business need
A business case developed in the context of Action 1.1 of the ISA Programme
[European Commission, ISA Programme, 2013] identifies the following problem and
proposes the following solutions:
• Problem: The business case reveals that uncoordinated use of reference
data may lead to failures in transaction handling between applications.
Moreover, the lack of common reference data makes integrating data
from different sources more cumbersome and has a negative impact on data
quality.
• Solutions: The business case proposes a threefold solution:
o Metadata governance: well-defined roles and responsibilities,
cohesive policies and principles, and decision-making processes that
define, govern and regulate the lifecycle of metadata;
o Metadata management: the good practice of putting in place
people, processes, and systems to plan, perform, evaluate, and
improve the lifecycle of metadata;
o Metadata tools: tools that help to automate certain tasks in the
metadata management process.
DG COMP has a solid appreciation of the importance of reference data. High-level
management buy-in makes adoption of the appropriate methodologies easier. The
GENIS Reference Data Component, if supported by a well-thought-out management
and governance methodology, has the potential to improve the sharing and
integration of information and contribute directly to the realisation of GENIS.
Annex I contains an overview of the Reference Data that is managed by DG COMP.
The challenge is that comparable reference data is maintained separately in many
databases across the Commission, so that the same attribute can have a different
set of allowable data values (value domains) in different databases. The first
challenge is thus to map these values to an authentic source, that is, the source
that is the prime owner of that data’s creation in the Commission. The most typical
example is the list of codes used to refer to Member States, which the Publications
Office publishes in line with the process of accession of a state to the
European Union.
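The mapping step described above can be sketched as a lookup from each system's local value to the code in the authentic source, with unmapped values reported rather than guessed. This is a minimal sketch under stated assumptions: the system names, local values and mapping tables are invented for illustration, and the two-letter target codes merely stand in for whatever the authentic country list prescribes.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical extract of the authentic source (illustrative values only).
AUTHENTIC_CODES = {"BE", "DE", "FR"}

# Hypothetical per-system mappings from local values to authentic codes.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "system_a": {"Belgium": "BE", "Germany": "DE", "France": "FR"},
    "system_b": {"B": "BE", "D": "DE", "F": "FR"},
}


def harmonise(system: str, values: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Map local values to authentic codes; collect values that cannot be mapped."""
    mapping = MAPPINGS[system]
    mapped: List[str] = []
    unmapped: List[str] = []
    for value in values:
        target = mapping.get(value)
        # Accept a mapping only if its target really exists in the authentic source.
        if target in AUTHENTIC_CODES:
            mapped.append(target)
        else:
            unmapped.append(value)
    return mapped, unmapped


print(harmonise("system_a", ["Belgium", "France"]))  # (['BE', 'FR'], [])
print(harmonise("system_b", ["B", "D", "X"]))        # (['BE', 'DE'], ['X'])
```

Returning the unmapped values separately keeps the decision on how to treat them with the owner of the authentic source instead of burying it in each consuming system.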
1.4. Expected benefits
The beneficiaries of the pilot (see Section 1.6) anticipate that better
management and governance of reference data will yield several benefits,
including:
• Better implementation of State-aid control policy;
• Improved coordination of the development and maintenance of reference
data in the domain of State-aid control;
• Increased consumer reliance on the data, and fewer errors and
inconsistencies in the data flows between State-aid control information
systems; and
• More efficient collaboration through a common understanding of
operational terminology, reinforced by multilingual concepts and by keeping
track of temporal aspects.
1.5. Approach
The approach followed in this study is split into the following four phases:
1. Elicit and validate the specific requirements for reference data
management and governance for DG COMP in the context of State-aid
control;
2. Identify existing solutions for managing and governing reference data
based on input from the Publications Office and D4.1;
3. Specify a solution for the management and governance of reference
data (based on D4.2 and providing input to D4.2) and demonstrate its
applicability and feasibility; and
4. Assess the coverage of the identified requirements and the proposed
approach by existing tools, including the GENIS reference data
component, identify gaps, and assess use, usefulness, and
fitness-for-purpose.
The report is structured in three parts with requirements and specifications for
governance, management, and tools.
1.6. Stakeholders and roles
The table below lists the stakeholders involved in this study.
Table 1 - Stakeholders
Term | Beneficiary | System owner | Approving authority | Sponsor
Member States X
DG COMP X X X
DG AGRI X
DG MARE X
EC ISA Programme X X
To facilitate communication and collaboration with the different
stakeholders, several meetings and workshops were organised:
• W1: Wednesday 15 January 2014
• W2: Wednesday 22 January 2014
• W3: Wednesday 29 January 2014
• W4: Wednesday 26 February 2014
• W5: Friday 21 March 2014
In these workshops we involved, as necessary, the following:
• the IT Unit of DG COMP: Mr J. Jimenez Krause (IT Project Manager), Mr J.
Abrahamsen (IT Project Officer - Database Administrator) and Mr Manuel
Perez Espin (Head of Unit - Information technology)
• the ISA unit in DIGIT: Mr A. Karalopoulos (Programme Manager), Ms S.
Wigard (Programme Manager) and Mr V. Peristeras (Programme Manager -
EU policies)
The workshops were supplemented where necessary by direct communications with
an official responsible for the development of GENIS: Mr R. Atienza (IT Project
Officer).
1.7. Glossary
The table below provides common definitions used throughout the study.
Table 2 - Glossary
Term Description
ADMS A common metadata vocabulary to describe standards, so-called
interoperability assets, on the Web.
Code list Complete set of data element values of a coded simple data element [ISO 9735-1:2002, 4.14].
Data model
A data model is a collection of entities, their properties and the
relationships among them, which aims at formally representing a
domain, a concept or a real-world thing.
DG AGRI Directorate-General for Agriculture and Rural Development
DG COMP Directorate-General for Competition
DG MARE Directorate-General for Maritime Affairs and Fisheries
Interoperability
According to the ISA Decision, interoperability means the ability of disparate and diverse organisations to interact towards mutually
beneficial and agreed common goals, involving the sharing of information and knowledge between the organisations, through the business processes they support, by means of the exchange of data between their respective ICT systems.
Metadata
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about
information. [National Information Standards Organization, 2004]
Metadata alignment
Metadata alignment is the harmonisation of structural metadata either by forging a wide consensus on the use of a common specification for structural metadata or through the creation of mappings between terms of two or more specifications.
Metadata
governance
Metadata governance comprises well-defined roles and responsibilities, cohesive policies and principles, and decision-making processes that define, govern and regulate metadata.
Metadata
management
Metadata management is defined as the good practice of putting in place people, processes, and systems to plan, perform, evaluate, and improve the lifecycle of metadata.
Name Authority List
Controlled vocabulary for use in naming particular entities consistently [ISO/DIS 25964-2].
Reference data
Reference data consists of small, discrete sets of values that are not updated as
part of business transactions but are usually used to impose consistent classification. Reference data normally has a low update frequency. Reference data is relevant across more than one business system belonging to different organisations and sectors.
RFC Request for Change: a form used to record details of a request for a change, which is sent as an input to change management by the change requestor.
SKOS Simple Knowledge Organization System – RDF Vocabulary for the representation of key reference data such as code lists, and taxonomies.
Structural
metadata Data model or reference data
Taxonomy Scheme of categories and subcategories that can be used to sort and otherwise organize items of knowledge or information [ISO/DIS
25964-2].
Thesaurus
Controlled and structured vocabulary in which concepts are represented by terms, organized so that relationships between
concepts are made explicit, and preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms [ISO 25964-1:2011].
GENIS
The Generic Interoperable Notification Services (GENIS) Information
System is used to manage and support the exchange of information between Member States and the Commission within the State Aid Notification Process, where Member States notify the European Commission of planned State-aid. GENIS is also known as the State Aid Notification Interactive (SANI-2), and is the successor to the existing SANI.
RDC Reference data component belonging to GENIS for the automated
deployment of reference data
SARI The State Aid Reporting Interactive (SARI) is used by Member States to supply the European Commission with the requested information on state aid issued to beneficiaries.
CMS The Case Management System (CMS) receives the notification and is used by Commission staff to investigate whether the State-aid can be approved.
2. REQUIREMENTS AND SPECIFICATIONS FOR REFERENCE DATA
GOVERNANCE
This section elicits the stakeholder requests and needs and formulates the
specifications for a reference data governance framework for the State-aid
information systems of DG COMP. We defined metadata governance as the set of
roles and responsibilities, cohesive policies and principles, and decision-making
processes that define, govern and regulate the lifecycle of metadata.
2.1. Stakeholder requests and needs
The table below lists the stakeholder requests and needs for reference data
governance.
Table 3 – Stakeholder requests and needs: reference data governance
ID Request or need
Organisational Structure
G1 Formal organisational structure (including ownership) for
context-neutral reference data
Authentic reference data is context-neutral, i.e. not defined in the
context of a single system. There must be a formal organisational
structure for the governance of each set of authentic, context-neutral
reference data with formally defined roles including ownership. The
owner should be committed to sustain the reference data specification
using an open change management process.
G2 Formal organisational structure (including ownership) for
system-based reference data
There must be a formal organisational structure for each information
system that uses reference data with formally defined roles including
ownership.
G3 Foster the reuse of existing standards
The reference data management and governance structure should foster
the reuse of existing standards.
G4 Involve direct stakeholders in the governance process
The solution should foresee the involvement of direct stakeholders in the
metadata governance process to ensure that the interests of the
stakeholders are taken into account.
[Note: The specification of this will be closely linked to ISO 11179-6:2005
and OPOCE best practices]
G5 Involve operational staff in functional meetings
The solution should provide for inviting representatives from the operational
level to participate in functional-level meetings.
[Note: The specification of this will be closely linked to ISO 11179-6:2005
and OPOCE best practices]
Scope Criteria
G6 Intra- and inter-institutional governance
The mechanism for governance should encompass both intra- and inter-
institutional data exchange:
Inter-institutional information exchange: when EU institutions
exchange structured information on a recurring basis
Intra-institutional information exchange: in areas where changing
structural reference data would have a high impact on operational
systems.
G7 Reusability of proposed solution
Although the reference data solution is developed mainly for the State-
aid domain, its processes should be generic for the purpose of being
reused in other domains and by other EU institutions.
Decision mechanism
G8 Decision mandate
The governance mechanism should clearly state the mandate of the
governance body with regard to taking decisions on:
Changes to reference data;
Intellectual property rights linked to reference data; and
Enforcement, i.e. implementation of reference data specifications in
systems.
G9 Documentation
Decision-making processes, which depend on the context in which a decision
is required, should be developed, documented and
shared with all relevant stakeholders.
G10 Time constraints
Decision processes should be linked to time constraints which are
dependent on the nature of the decision to be taken.
G11 Basis for decision making
The decision making processes should describe how agreements are
reached – e.g. via a qualified majority or via consensus building.
Enforcement Process
G12 Legal enforcement
In the context of State-aid control, the information that must be
exchanged between Member States and the European Commission is
specified in EU legislation, including the use of reference data.
G13 Reuse under an open licence
The reference data should be reusable under an open, widely permissive
licence.
Process for Continuous Improvement
G14 Quality Assurance
The reference data management and governance methodology
should take quality of its processes (cf. 2.3 and 3.3) into account
as an intrinsic aspect and not regard it as an after-thought.
G15 Risk mitigation
Risks related to the propagation of changes to reference data into
operational systems should be mitigated by governance processes.
Overall, the governance structure should promote the sharing and reuse of
reference data sets.
2.2. Existing solutions for reference data governance
This section contains an overview of existing reference data governance solutions.
These solutions could be taken as a reference for best practices or even adopted
where possible.
2.2.1. ISA Committee and ISA Coordination Group
The European Commission is assisted in the implementation of the Interoperability
Solutions for European Public Administrations (ISA) Programme by the ISA
Committee, which represents the Member States. Furthermore, the ISA
Coordination Group, nominated by the ISA Committee, ensures continuity and
consistency at working level. Expert groups provide guidance on specific Work
Programme actions. In the past, the ISA Coordination Group has endorsed
structural metadata such as the Core Vocabularies³. This governance body may be
useful for taking high-level decisions on voluntary, trans-European harmonisation
initiatives on structural metadata. However, the ISA Committee and the ISA
Coordination Group do not have a mandate to take decisions in the context of
reference data for State-aid control.
2.2.2. Inter-Institutional Metadata Maintenance Committee (IMMC)
The Inter-Institutional Metadata Maintenance Committee (IMMC) is responsible for
the decisions related to key reference data and data models used in the legal
decision-making process of EU institutions and the EU Open Data Portal (ODP). A
thorough description of the governance methodology of the IMMC is included in
deliverables D4.1 and D4.2. Whereas the governance methodology applied by the
IMMC meets most requirements for inter-institutional governance, the current
scope of the IMMC does not cover reference data in the domain of State-aid control.
It also does not provide a solution for local governance.
2.2.3. ISO11179-6 Metadata Registration
A general standard for the registration of metadata items is ISO/IEC 11179. As part
of the six-part standard, ISO/IEC 11179-6:2005⁴ specifies the procedure by
which Administered Items required in various application areas could be registered
and assigned an internationally unique identifier. This procedure includes
organisations such as the Registration Authority, the Responsible Organisation, and
the Submitting Organisation. It also includes roles such as the Registrar, Steward,
and Submitter. This standard was a source of inspiration for the IMMC and its
Metadata Registry.
3 Joinup (30 May 2012), ISA Member State representatives endorse key specifications for e-Government
interoperability, https://joinup.ec.europa.eu/node/48837
4 ISO/IEC 11179-6:2005. Information technology -- Metadata registries (MDR) -- Part 6: Registration.
http://www.iso.org/iso/catalogue_detail.htm?csnumber=35348
2.2.4. Data Management Body of Knowledge (DM-BOK)
The Data Management Body of Knowledge (DM-BOK) is a general methodology for
data management. The DM-BOK devotes an entire chapter to Reference and Master
Data Management. In terms of governance, it defines a number of Reference Data
Management processes. In terms of governance structure, it defines a number of
operational roles including the Data Architect, Business Analyst, Data Steward, and
Application Architect as responsible roles. It assigns all decision-making power to
the role of a Data Governance Council.
2.3. Specification of metadata governance
This section contains a proposed specification of metadata governance that is
tailored to the State-aid control information systems operated by DG COMP.
2.3.1. Scope
The domain of the governance is in the first place limited to State-aid control.
However, some reference data, for example country codes, is not sector-specific
but cross-sectoral.
Another aspect of scope is the level of governance. For DG COMP, metadata
governance should take place at three levels:
Local: part of the reference data is system-specific, i.e. specific to the
State-aid control information systems of DG COMP. For such reference data,
governance and management should take place at local (intra-
organisational) level only.
Inter-institutional: another part of the reference data can potentially be
used or is already used in the context of other information systems. For such
reference data, governance and management should take place both at the
inter-institutional and local levels.
Trans-European: reference data that can be used in the context of
information systems between Member States and the EU institutions, bodies
and agencies. In such cases Comitology may be needed. Comitology
procedures are relevant when the EC has been granted power to create and
implement rules. This is further explained in Section 2.3.2.3.
Figure 2: organisation structures
[Figure placeholder. Panels: Local; Inter-institutional (coordination EU institutions); Between Member States (coordination EU). Labels shown: OP, IMMC, ISA Committee, ?; DG1, DG2, DG3, DG4, DG…; MS1, MS2, MS3, MS4, …]
Setting up metadata governance structures at these levels may seem heavy and
may require considerable coordination costs. However, experience from practice
seems to indicate that this is needed. The more complex the sharing of reference
data becomes, the more need there will be for formalised procedures. For
instance, at inter-institutional and trans-European level it will be necessary to
describe change and release management in the rules of procedure, whereas at
local level this may be less formal, depending on communications. Without formal
metadata governance and management, many coordination problems may occur. In
many cases, the benefits of proper metadata governance and management for
information exchanges outweigh the costs of fixing interoperability conflicts in
production systems.
There must be a clear set of scope criteria that determine whether a reference data
specification should be placed under local, inter-institutional or trans-European
governance. Placing a specification under a wider level of governance requires
considerable coordination effort. On the other hand, it increases reuse and hence
maximises the benefits of interoperability through the use of common reference
data.
We propose that a metadata specification (including reference data) is placed under
trans-European governance when the following criterion is met:
The reference data is within scope of Council Regulation (EC) No 659/1999
and therefore directly related to the domain of State-aid control (i.e. the
notification or reporting process).
It is proposed that a metadata specification (including reference data) is placed
under inter-institutional governance when the following criteria are met:
Inter-institutional information exchange: when public administrations
exchange information on a recurring basis in which the metadata
specification is used as a common information exchange specification. For
example, the Named Authority List (NAL) on currencies of the Publications
Office could fit this criterion;
Large degree of similarity: when EU institutions use structural metadata
in existing information systems with a large degree of similarity. For
example, nearly all information systems of EU institutions use reference data
about the Member States of the European Union;
Commitment of maintenance: the publisher is committed to sustain the
specification using an open change management process. For example, the
Publications Office has a strong commitment of maintaining the Named
Authority Lists;
Commitment of use: there are at least two public administrations that
have a strong commitment to use the metadata specification. For example,
the Nomenclature of Territorial Units for Statistics (NUTS) is used by many EU
institutions.
We propose that a metadata specification is placed under local governance when
the cost of coordination outweighs the benefits of interoperability:
High impact of changes: in areas where changing structural reference
data would have a high impact on operational systems. For example, in a
content management system where updating reference data impacts many
other systems that provision the content management system, the impact of
changes to reference data may outweigh the benefits of increased
interoperability due to the use of common reference data placed under
common governance.
No renewal of legacy applications: in areas where there is no renewal of
legacy systems, the cost of orchestrating integration outweighs pragmatic
local management. For example, in cases where structural metadata has
been hard-coded in software, implementing each change of reference data
triggers a software development lifecycle, which may be undesirable for
legacy applications.
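The scope criteria above can be read as a small decision procedure. The following is a minimal sketch in Python under stated assumptions: the class and field names are hypothetical, the order in which the criteria are checked is our own choice, and we assume that all four inter-institutional criteria must hold together. None of this is prescribed by the report.

```python
from dataclasses import dataclass
from enum import Enum


class GovernanceLevel(Enum):
    LOCAL = "local"
    INTER_INSTITUTIONAL = "inter-institutional"
    TRANS_EUROPEAN = "trans-European"


@dataclass
class ScopeAssessment:
    """Answers to the scope criteria for one metadata specification (hypothetical structure)."""
    in_scope_of_regulation_659_1999: bool = False
    recurring_information_exchange: bool = False
    large_degree_of_similarity: bool = False
    commitment_of_maintenance: bool = False
    committed_administrations: int = 0
    high_impact_of_changes: bool = False
    no_renewal_of_legacy_applications: bool = False


def propose_governance_level(a: ScopeAssessment) -> GovernanceLevel:
    # Trans-European criterion: directly related to State-aid control.
    if a.in_scope_of_regulation_659_1999:
        return GovernanceLevel.TRANS_EUROPEAN
    # Local criteria: coordination cost outweighs the interoperability benefit.
    # Assumption: either local criterion takes precedence over wider governance.
    if a.high_impact_of_changes or a.no_renewal_of_legacy_applications:
        return GovernanceLevel.LOCAL
    # Inter-institutional criteria. Assumption: all four must be met.
    meets_all = (
        a.recurring_information_exchange
        and a.large_degree_of_similarity
        and a.commitment_of_maintenance
        and a.committed_administrations >= 2
    )
    if meets_all:
        return GovernanceLevel.INTER_INSTITUTIONAL
    # Default when no wider level is justified.
    return GovernanceLevel.LOCAL


# Illustrative input only: a list that is exchanged, similar across systems,
# maintained by its publisher and used by three administrations.
currencies = ScopeAssessment(
    recurring_information_exchange=True,
    large_degree_of_similarity=True,
    commitment_of_maintenance=True,
    committed_administrations=3,
)
print(propose_governance_level(currencies).value)
```

In practice the governance body takes this decision; a function like this would at most document the reasoning behind it.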
2.3.2. Organisational structure
The following sections describe a governance mechanism for State-aid control at
local, inter-institutional, and trans-European levels.
2.3.2.1.Local governance structure
For the local governance, this report proposes to reuse an existing governance
structure used by DG COMP in the governance of the State-aid control information
systems. For the local governance the structure could be as follows:
A steering committee (SC) that decides on strategic matters such as the
continuity and direction of the State-aid system, establishes policies, and
deals with issues related to the data model, such as copyrights and business
relations. The steering committee provides the strategic direction for the
work and participates in the maintenance of the structural metadata, ensuring
alignment with European policies and guidelines.
A working group (WG): the WG brings together a group of experts with
knowledge of reference data. The WG is responsible for developing,
maintaining and publishing the reference data:
o The working group will consider proposals either by the group itself
or by users
o Proposals that are supported by the working group are sent to the
steering committee.
o The working group will provide advice on the validity of
proposals; this advice is taken into account in the SC’s decisions.
Stakeholders: all involved stakeholders perform day-to-day operations.
This is the level where the structural metadata is actually reused and
implemented in production systems. Feedback on the suitability of the
structural metadata in the different application scenarios is communicated
from this level to the functional level, in order to ensure that the structural
metadata is fit for purpose.
It is recommended that representatives from the stakeholders are invited to
participate in the WGs. This will ensure that feedback from the stakeholders is
fed into the structural metadata lifecycle, fostering the alignment of the
structural metadata with the requirements and needs of the users.
On a local scale this distinction might only exist in terms of roles.
At least the following roles should be implemented:
Content expertise: knowledge about the semantics of the data for which
the metadata is used and the applications in which the data is used
Information management expertise: knowledge about theory and
practice of metadata
Technical expertise: knowledge about the technical approaches to be used
for the technical implementation in the environment in which the metadata
is used.
Documentation and publication expertise: knowledge about the
documentation rules and publication processes used in the environment in
which the metadata is used.
An example of a decision to be made at local level could be the implementation
of the background link type. It is used only by DG COMP in the State-aid system, so
decisions on its management can be made at local level according to the governance
structure and specified roles.
2.3.2.2.Inter-institutional governance structure
For the inter-institutional governance, this report proposes to adopt the governance
model of the Inter-Institutional Metadata Maintenance Committee (IMMC)⁵, or even to
expand the mandate of this governance body to also include reference data in the
domain of State-aid control.
For example, the Countries Named Authority List (NAL) is already governed by the IMMC
at an inter-institutional level. The NAL is a controlled vocabulary listing countries
with their authority code and label(s). The Countries NAL is part of the Core
Metadata (CM) used in the data exchange between the institutions involved in the
legal decision-making process and the Publications Office of the EU. It is
maintained by the Publications Office of the EU in its Metadata Registry (MDR).
5 Annex 2 à la note CD(2011)53 http://publications.europa.eu/mdr/resource/core-metadata/IMMC_reu3_adoption_anx3.pdf_A-2011-764293.pdf
2.3.2.3.Trans-European governance structure
In the context of State-aid control, Article 27 of Council Regulation (EC) No
659/1999 gave the European Commission the power to adopt implementing
provisions on State-aid control. These include the specification of reference data,
as can be derived from the text below:
The Commission, acting in accordance with the procedure laid down in
Article 29, shall have the power to adopt implementing provisions
concerning the form, content and other details of notifications,[…]
In cases where the European Commission is given this power, a governance
mechanism is put in place that must follow Comitology procedures. This means that a
committee composed of representatives of the Member States and chaired by
the Commission is set up. The primary role of such a committee is to provide an
opinion on the draft measures that the Commission intends to adopt. A committee
has one of two functions, advisory or examination:
Advisory: the Commission shall take the utmost account of the committee’s
opinion.
Examination: implementing acts cannot be adopted by the Commission if
they are not in accordance with the opinion of the committee, except in very
exceptional circumstances, where they may apply for a limited period of
time.
In the context of State-aid control, the Comitology procedure resulted in
Commission Regulation (EC) No 794/2004 of 21 April 2004. One example of a
reference data specification that was designed through this process is the set of
values of the “objective” of State-aid control.
Figure 3 – Illustration: objectives of State-aid control as defined in Commission Regulation
(EC) No 794/2004
2.3.3. Decisions
The three aforementioned governance structures should take, among others, the
following decisions:
Whether a metadata specification must be placed under local or inter-
institutional governance;
How to change and improve the metadata management process;
Whether a change request to a metadata specification must be accepted or
rejected (based on an impact analysis, cost-benefit analysis, and risk analysis);
Whether an accepted change request will be released immediately or in a
scheduled release;
Where to store a metadata specification and with which access restrictions
(define roles and responsibilities);
Whether a metadata specification can be published under an open licence;
Whether a metadata specification can be supplemented with official
mappings;
Which policy is followed to encourage or mandate the reuse of the reference
data specification;
Which method is used for documenting reference data;
Whether a metadata specification should be deprecated; and
Which standards and tools to use in the metadata management process.
In the three aforementioned governance bodies, all decisions should be taken by
consensus and should be formally logged. Where not time-constrained, members
of the governance structure should have sufficient time (e.g. two weeks) to review
proposed decisions. Where more time is needed to evaluate a proposal, it is good
practice to have decision takers request additional time. Giving too much time for
review by default would slow down the decision-making process.
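As an illustration of how a logged decision with a default review period and an optional extension could be recorded, consider the following minimal Python sketch. The class and field names are hypothetical. We also assume, for simplicity, that consensus means every member has recorded an approval; a real governance body may define consensus differently in its rules of procedure.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import FrozenSet, Set


@dataclass
class ProposedDecision:
    """One entry in a decision log (hypothetical structure)."""
    subject: str
    proposed_on: date
    members: FrozenSet[str]
    # Default review period of two weeks, as suggested in the text.
    review_period: timedelta = timedelta(days=14)
    extension: timedelta = timedelta(0)
    approvals: Set[str] = field(default_factory=set)

    @property
    def review_deadline(self) -> date:
        # Deadline = proposal date + default period + any requested extension.
        return self.proposed_on + self.review_period + self.extension

    def request_additional_time(self, days: int) -> None:
        # Decision takers ask for more time instead of a long default period.
        self.extension += timedelta(days=days)

    def approve(self, member: str) -> None:
        if member not in self.members:
            raise ValueError(f"{member} is not a member of the governance body")
        self.approvals.add(member)

    def has_consensus(self) -> bool:
        # Assumption: consensus is modelled as approval by all members.
        return self.approvals == set(self.members)


decision = ProposedDecision(
    subject="Deprecate a code in an internal code list",  # illustrative subject
    proposed_on=date(2014, 3, 3),
    members=frozenset({"member A", "member B"}),
)
decision.request_additional_time(7)
decision.approve("member A")
print(decision.review_deadline, decision.has_consensus())
```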
2.3.4. Authoritative source
Both reference data specifications under metadata governance and related
documentation should have one owner, called the authoritative source.
It is recommended to select an authoritative source which provides support for
versioning, and thus keeps track of all previous releases of structural metadata.
The latter is especially important when working with historical datasets, where it
may be needed to refer back to previous releases of reference data. In Section 3,
we specify how the authoritative source relates to the release management
process, where a new version of reference data is released. In Section 3.3.6, we
identify a number of tools that could support the management of the authoritative
source.
The use of persistent Uniform Resource Identifiers (HTTP URIs) for reference
data releases can make it easier to manage an authoritative source. URIs are
increasingly used for data integration according to the design principle of “Linked
Data”. Linked Data is a way of identifying, linking and accessing information on the
Web according to the four design principles put forward by Tim Berners-Lee⁶:
Use URIs as names for things;
Use HTTP URIs so that people can look up those names;
When someone looks up a URI, provide useful information, using the
standards (RDF*, SPARQL); and
Include links to other URIs, so that they can discover more things.
Even when the underlying technology changes, persistent HTTP URIs allow both
identifying and obtaining reference data sets via a mechanism of URI forwarding /
redirection. A prerequisite for this is that the URIs are well managed. A proposal
for the governance and management of persistent URIs for EU institutions is
included in deliverable ‘D3.2 Common approach for the management of persistent
URIs by EU institutions’.
6 http://www.w3.org/DesignIssues/LinkedData.html
Example: For the ADMS reference data of the ISA Programme, a file server on
Joinup is used as the authoritative source. However, the purl.org service is used to
maintain persistent (permanent) HTTP URIs. This means that links to the persistent
URI do not break when the location of the file changes: only the redirect target
has to be updated, so the authoritative source remains reachable through the same
URI.
http://purl.org/adms/assettype/1.0 redirects to
https://joinup.ec.europa.eu/svn/adms/ADMS_v1.00/ADMS_SKOS_v1.00.rdf
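The redirection mechanism in this example can be sketched as a simple lookup table. The Python snippet below is an offline illustration only: it does not call purl.org or Joinup, the function name resolve is our own, and the new location under example.org is hypothetical.

```python
from typing import Dict

# Redirect table: persistent URI -> current location of the file.
# The first entry mirrors the example in the text.
redirects: Dict[str, str] = {
    "http://purl.org/adms/assettype/1.0":
        "https://joinup.ec.europa.eu/svn/adms/ADMS_v1.00/ADMS_SKOS_v1.00.rdf",
}


def resolve(uri: str, table: Dict[str, str], max_hops: int = 5) -> str:
    """Follow redirects until a URI with no further redirect is reached."""
    current = uri
    for _ in range(max_hops + 1):
        target = table.get(current)
        if target is None:
            return current
        current = target
    raise ValueError("too many redirects")


persistent = "http://purl.org/adms/assettype/1.0"
print(resolve(persistent, redirects))

# If the file moves, only the redirect target changes (hypothetical new location);
# users of the persistent URI do not have to change anything.
redirects[persistent] = "https://example.org/new-location/ADMS_SKOS_v1.00.rdf"
print(resolve(persistent, redirects))
```

The design point is that consumers store only the persistent URI, while the publisher maintains the table.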
An example of combining the best practice of persistent URIs with keeping preceding
versions available is provided by the Metadata Registry (MDR) of the Publications
Office of the European Union. The URIs of the Named Authority Lists (NAL) in the
MDR refer to the latest version of the structural metadata. For accessing preceding
versions, the URIs include version numbers. This not only ensures consistency
and continuity but also supports release management as described in Section 3.3.4.
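The combination of an unversioned URI for the latest release and versioned URIs for preceding releases can be sketched as follows. This is a minimal Python illustration under our own assumptions: the class name, the URI pattern under example.org and the sample codes are hypothetical and do not reproduce the actual URI scheme of the MDR.

```python
from typing import Dict, Iterable, Tuple


class AuthoritativeSource:
    """Keeps every release of a reference data set addressable by URI (sketch)."""

    def __init__(self, base_uri: str) -> None:
        self.base_uri = base_uri.rstrip("/")
        self._by_uri: Dict[str, Tuple[str, ...]] = {}

    def release(self, name: str, version: str, codes: Iterable[str]) -> str:
        content = tuple(codes)
        versioned_uri = f"{self.base_uri}/{name}/{version}"
        if versioned_uri in self._by_uri:
            # A published release is never overwritten.
            raise ValueError(f"version {version} of {name} already released")
        self._by_uri[versioned_uri] = content
        # The unversioned URI always points to the most recent release.
        self._by_uri[f"{self.base_uri}/{name}"] = content
        return versioned_uri

    def lookup(self, uri: str) -> Tuple[str, ...]:
        return self._by_uri[uri]


source = AuthoritativeSource("http://example.org/authority")
v1_uri = source.release("country", "1.0", ["BEL", "FRA"])
source.release("country", "1.1", ["BEL", "FRA", "HRV"])

print(source.lookup("http://example.org/authority/country"))  # latest release
print(source.lookup(v1_uri))                                  # preceding release
```

Historical datasets can then keep referring to the versioned URI of the release they were created with.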
2.3.5. Licensing framework
Under both local and inter-institutional governance, it is important, and a legal
obligation under the PSI Directive, for public administrations to make their data,
which includes reference data, available under an open licence upon a so-called
“request for reuse” by any third party.
The European Commission is following a policy whereby it actively encourages the
publication of government data. Different licensing options can be considered for
the reference data of DG COMP. These include among others:
The ISA Open Metadata Licence v1.1⁷: this is a permissive licence that
grants the rights of use (both for commercial and non-commercial purposes),
the creation of derivative works, and redistribution. Almost the only
restriction that it applies is the obligation to cite the source (attribution);
The European Commission Legal Notice⁸: this notice authorises reuse
provided that the source is acknowledged (attribution); additional reuse
conditions (other restrictions) can be added to this by the publisher;
The European Union Public Licence (EUPL)⁹: this is a software licence
that grants the rights of use (both for commercial and non-commercial
purposes), the creation of derivative works, and redistribution.
In addition to attribution, the licence also requires derivative works to
be shared under similar licensing conditions (share-alike).
DG COMP should set up the appropriate licensing framework, guaranteeing that it
also owns the intellectual property rights before granting any rights to third-parties.
Intellectual property rights are usually acquired by the European Union through
employment or procurement contracts. In case of Member State working groups,
contributor agreements may be needed, such as the ISA Contributor Agreement. In
7 https://joinup.ec.europa.eu/community/semic/document/joinup-semantic-asset-licensing-framework
8 http://ec.europa.eu/ipg/basics/legal/notice_copyright/index_en.htm
9 https://joinup.ec.europa.eu/software/page/eupl
cases when external standards, maintained by standardisation bodies, are reused,
the licensing and reuse conditions of these standards have to be considered and
respected.
2.3.6. Enforcement
Metadata governance should help decide which policy should be followed to
encourage or mandate the reuse of the reference data specification. According to
deliverable ‘D4.1 Metadata management requirements and existing solutions in EU
Institutions and Member States’, the following options are possible:
Legal requirement: implementation is enforced by law; it is an official
requirement;
Comply-or-explain: implementation is not enforced by law, but public
administrations have to comply with the use of a particular specification or
standard for metadata, or if they do not comply, explain publicly why they
do not;
Oversight board: implementation is encouraged via project review
committees; or
Voluntary: implementation is encouraged via information campaigns.
In the context of State-aid control, enforcement is often a matter of a legal
obligation. For example, Commission Regulation (EC) No 794/2004 specifies the
reference data in the forms that Member States have to fill in for State-aid
notification.
2.3.7. Continuous improvement
Metadata governance should facilitate the continuous improvement (implementation of
feedback) of the metadata management process and governance rules. To ensure
a process for continuous improvement, all decisions taken should be systematically
documented and made accessible for consultation by the various stakeholders
involved. For example, the IMMC does so via the publicly available MDR, where a
user can find not only the structural metadata currently in application, but also
previous versions. In doing so, metadata governance should weigh the benefits of
increased interoperability and data quality against the increased coordination costs.
The following metrics and key-performance indicators should be monitored:
The number of change requests;
The number of releases;
The lead time between receipt of a change request and the closing of the
change management process for this request;
The number of full-time equivalents needed to operate the metadata
governance and management;
The number of systems that have implemented the metadata specifications.
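Some of these indicators can be derived directly from a change request log. The following Python sketch shows one possible calculation of the number of change requests and the average lead time. The record structure and the sample dates are hypothetical and serve only to illustrate the arithmetic.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import List, Optional


@dataclass
class ChangeRequest:
    """One logged change request (hypothetical structure)."""
    identifier: str
    received: date
    closed: Optional[date] = None  # None while the request is still open


def number_of_change_requests(requests: List[ChangeRequest]) -> int:
    return len(requests)


def average_lead_time_days(requests: List[ChangeRequest]) -> Optional[float]:
    """Average days between receipt and closing, over closed requests only."""
    lead_times = [(r.closed - r.received).days for r in requests if r.closed]
    return mean(lead_times) if lead_times else None


# Illustrative log entries, not real data.
log = [
    ChangeRequest("RFC-1", date(2014, 1, 6), date(2014, 1, 20)),   # 14 days
    ChangeRequest("RFC-2", date(2014, 2, 3), date(2014, 2, 13)),   # 10 days
    ChangeRequest("RFC-3", date(2014, 3, 3)),                      # still open
]
print(number_of_change_requests(log))
print(average_lead_time_days(log))
```

Open requests are excluded from the lead time here; whether to include them, for example by measuring up to the reporting date, is a choice for the governance body.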
3. REQUIREMENTS AND SPECIFICATIONS FOR REFERENCE DATA
MANAGEMENT
This chapter provides an overview of the requirements and specifications for
reference data management for DG COMP.
3.1. Stakeholder requests and needs
The table below contains an overview of requirements for reference data
management gathered from DG COMP.
Table 4 – Stakeholder requests and needs: reference data management
ID Requests and needs
Design reference data
M1
Design reference data
The management processes set up for the State Aid Notification
System should support the design and development of reference data
sets which have to be used by the member states.
M2
Integration with external sources
The State Aid Notification System uses reference data from external
authoritative sources. Integrating these external reference data sets
with the internal system is a key requirement.
M3
Quality control
The proposed solution should put in place processes for controlling the
quality of the reference data. The quality control processes also apply
when updating reference data sets.
Manage reference data changes
M4
Detect changes in external reference data
The proposed solution for reference data management should support
detecting changes to reference data which is managed by an external
organisation and published on an external authoritative source.
M5
Impact assessment of changes in external reference data
When a new version of externally managed reference data is released,
the proposed solution should support assessing the impact of these
changes on the reference data which is used in the State Aid
Notification System. This process should support deciding whether the
reference data of the State Aid Notification System should be modified
as a consequence of a change in external reference data.
M6
Manage changes to internal reference data
The management processes should describe how changes to internally
managed reference data sets should be handled.
Implement reference data changes
M7
Impact assessment of an implementation
The change management processes should include an impact
assessment of implementing changes into the reference data
repository of the State Aid Notification System on the users of this
system.
M8
Update reference data lists
The management methodology should describe how changes to the
reference data are updated in reference data sets.
M9
Standardised formats for interoperability
In order to foster interoperability with a wide range of (legacy)
systems, the reference data should be made available in a
standardised format. Support for XML is an important requirement.
M10
Propagation
Changes in reference data should be propagated to the Case
Management Systems. While this is a request from DG COMP, it is
considered out of scope for this study.
Share and reuse reference data
M11
List reference data for reuse on an open platform
Reference data which is managed by external organisations and which
should be reused by Member States when exchanging information with
the European Commission should be listed on an open platform which
includes URIs to the versions that need to be reused.
M12
Share reference data in an authoritative source
Reference data which is internally managed by
DG COMP should be published and documented on an authoritative
source in machine- and human-readable formats.
M13
For the purpose of supporting interoperability, the documentation of
metadata should provide all the necessary elements (e.g. guidelines,
tutorials, tools) for stakeholders to easily incorporate the reference
data with their systems and internal management and governance
structure.
M14
Versioning and backward compatibility
Preceding versions of the reference data should be kept available at
the authoritative source.
Harmonise reference data
M15
Selection
When alternative reference data sets are available to be reused in the
State Aid Notification System, the management processes should
propose a methodology for comparing and selecting one data set.
M16
Mapping
The proposed solution for reference data management should describe
how similar data sets can be mapped in their context, while keeping
track of such branches.
3.2. Existing methodologies for reference data management
This section contains an overview of existing reference data management
methodologies. These solutions should be taken as a reference for best practices or
even adopted where possible.
3.2.1. Data Management Body of Knowledge (DM-BOK)
The Data Management Association’s guide to the Data Management Body of
Knowledge recommends that changes to controlled vocabularies and their reference
data sets are conducted by a change request process:
1. Create and receive a change request
2. Identify the related stakeholders and understand their interests.
3. Identify and evaluate the impacts of the proposed change.
4. Decide to accept or reject the change, or recommend a decision to
management or governance.
5. Review and approve or deny the recommendation.
6. Communicate the decision to stakeholders prior to making the change.
7. Update the data.
8. Inform stakeholders that the change has been made.
3.2.2. ISO 11179-6 Metadata Registration
The ISO/IEC 11179 standard (footnote 10) provides guidelines for several topics related to
Metadata Registries (MDR):
Part 1 introduces a framework containing fundamental ideas of data
elements, value domains, data element concepts, conceptual domains, and
classification schemes;
Part 2 provides a conceptual model for managing classification schemes;
Part 3 specifies a registry meta-model and basic attributes;
Part 4 provides guidelines for formulating unambiguous data definitions;
Part 5 introduces naming and identification principles;
Part 6 provides instructions on how registration applicants could register a
data item with a central Registration Authority, including allocating unique
identifiers for each data item.
Besides data elements, ISO/IEC 11179-6 addresses data element concepts,
conceptual domains and value domains as defined in ISO/IEC 11179-3. The
standard provides guidelines for representing these data types in a metadata
registry that documents the common administration and identification, naming and
definition details together with their administered item-specific details. These
guidelines include:
10 http://metadata-standards.org/11179/
a proposed structure for an International Registration Data Identifier
(IRDI);
tables that summarize the requirements for the inclusion of metadata
attributes in an MDR;
suggested roles and responsibilities for managing an MDR; and
a suggested set of operations for functional operating procedures.
3.2.3. ISO 19135:2005 Geographic information -- Procedures for item
registration
“ISO 19135:2005 [footnote 11] specifies procedures to be followed in establishing, maintaining
and publishing registers of unique, unambiguous and permanent identifiers, and
meanings that are assigned to items of geographic information” [International
Organisation for Standardisation, 2005]. The standard specifies which information
is necessary to uniquely identify, define, manage and register items in a registry.
3.2.4. Information Technology Infrastructure Library (ITIL)
The Information Technology Infrastructure Library (ITIL) is a systematic approach
to the delivery of quality IT services. It provides a basic vocabulary and a number
of processes that are relevant in managing the lifecycle of IT services such as
change management, release management, and service validation and testing.
3.2.5. Good practices from the Publications Office: integrating
Reference Data Management in the Software Development
Lifecycle
In order to foster the reuse of reference data sets, it is crucial to ensure the
reference data release cycle is aligned with the internal software development
lifecycle (SDLC) of its users. For the purpose of integrating reference data
management in the SDLC, the Publications Office of the EU identified several best
practices:
Impact Analysis
In its change management process, the Publications Office carries out an
impact assessment of each change to the Named
Authority Lists (NALs) on the production systems that use them. These
systems are related to the legislative process of the European Union. The
impact analysis can lead to three levels:
o Minor change (minor impact);
o Major change (major impact); and
o Structural change (structural impact).
Align Release Cycles
The Publications Office aims to improve the alignment of the NAL release
cycle with the Software Development Lifecycle of the applications that reuse
11 http://www.iso.org/iso/catalogue_detail.htm?csnumber=32553
the NALs. This entails categorizing releases as minor, major or structural. In
the future, releases will be scheduled periodically. The periodicity would then
depend on the category of the release: minor releases could be launched
every 2 months and major releases every 3 months. Structural changes,
which include publishing new code lists for example, would be released on
an ad hoc basis.
Get Commitment of External Bodies
For metadata which is under the governance of external or inter-institutional
bodies, such as the IMMC, it is hard to get a general agreement on
implementation planning. Therefore, the Publications Office aims to get
external parties committed to release new versions following a regular
schedule. External releases would not necessarily be published
simultaneously with internal releases, but re-users could adapt their internal
software releases based on the committed release schedule.
Use standards
ISO/IEC 11179-6 contains a number of suggested operating procedures, roles,
and responsibilities for metadata management. This also includes the role of
a Metadata Steward (called the domain expert at the Publications Office),
who, among other things, helps with the impact assessment. The use of other
standards such as DM-BOK or the ISO 19109 standard on geographic
information may also be very relevant.
Publish Release Notes
Together with each version release, the Publications Office publishes a
release note which justifies and explains the new release.
Publish Difference Lists
When publishing new versions of authority tables, a machine-readable
difference table listing all the changes compared to the previous version is
released. Originally, differences were represented both in XML and SKOS.
Based on feedback received from the users of the metadata, the Publications
Office recently opted to represent the changes in an Excel spreadsheet.
Difference lists are especially valuable to users of legacy systems, in which
reference data sets are often hard coded. Implementing changes in such
systems entails significant software development efforts, which can be
optimized by using difference lists.
Versioning
A good practice in versioning is to keep previous versions of metadata
available on the authoritative source. Combined with good URI management
this allows users to minimize the risk for their operations by referring to
specific versions of the metadata since changes are not de facto
incorporated in their processes or IT systems.
Standardised testing
Since the Publications Office might not be able to assess the impact of a
change in reference data on the operational system of its users, automated
propagation of reference data to those systems would bring significant risks.
Therefore, the Publications Office proposes to run standard test sets at the
user side on new versions of reference data before implementing releases in
production systems. An alternative strategy would be to define
classes of impact and to assess in advance how the different types of
change map onto these classes. This builds up prior
knowledge of what can and cannot be automated.
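The difference lists described under "Publish Difference Lists" can be sketched as a comparison of two versions of a code list. This is a minimal illustration, assuming each version is a simple mapping from code to label; the function name `difference_list` and the sample codes are hypothetical and do not reflect the actual format used by the Publications Office.

```python
def difference_list(previous, current):
    """Compare two versions of a code list given as {code: label} dictionaries."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    relabelled = sorted(
        code for code in set(previous) & set(current)
        if previous[code] != current[code]
    )
    return {"added": added, "removed": removed, "relabelled": relabelled}


# Hypothetical sample versions of a small code list.
version_1 = {"A": "Alpha", "B": "Beta", "C": "Gamma"}
version_2 = {"A": "Alpha", "B": "Beta (revised)", "D": "Delta"}

changes = difference_list(version_1, version_2)
```

A user of a legacy system with hard-coded reference data could review only the three lists in `changes` instead of comparing the full versions by hand.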
3.3. Specification for metadata management
This section describes the high-level administrative processes that are included in
the life cycle of reference data management. The administrative processes will be
described using the Business Process Modelling Notation (BPMN) [OMG, 2011].
Although there are different levels of metadata governance, the processes
described below are generic and should therefore be applicable to all.
3.3.1. Design structural metadata
Structural metadata design entails the processes of agreeing on the syntax and the
semantics, and encoding the reference data in different formats. This phase is out
of scope of this work.
3.3.2. Manage change of structural metadata
Goal: Managing changes that impact the reference data through a centralised
process in order to ensure that the internal IT infrastructure and services as well as
the systems of users remain aligned to business requirements.
Actors and roles:
The environment of the Reference Data Management Component contains two main
levels.
The first level represents the owner of the reference data. This is the
governing body that creates and maintains the reference data set. It
includes roles such as a Reference Data Working Group (RD-WG), a Review
Group (RG) and a Steering Committee. DG COMP or the Publications Office
of the EU, who both own and manage reference data, would be part of this
governance level.
The second level represents the users that reuse the reference data. These
users include systems within or outside of the governing level, such as the
GENIS system, the Case Management Systems of SANI or any DG reusing
reference data owned by DG COMP.
The manage reference data changes process is carried out at the governing level.
Therefore, this process does not include the change management at the user side,
which is described in section 3.3.5 on implementing reference data changes in
operational systems. The majority of tasks within this process are carried out by the
Reference Data Working Group (RD-WG) and reviewed by the Review Group (RG).
At least the following roles should be implemented:
Content expertise: knowledge about the semantics of the data for which
the metadata is used and the applications in which the data is used
Information management expertise: knowledge about theory and
practice of change management, e.g. impact on environment.
Technical expertise: knowledge about the technical approaches to be used
and the impact on systems.
Documentation and publication expertise: knowledge about the
documentation rules and publication processes used in the environment in
which the metadata is used.
Tasks:
Record a Request for Change (RFC)
The creation of an RFC can be triggered by different sources, such as
incoming user feedback, the outcome of periodic reviews, legal obligations
and the release of a new version of a reused standard. All RFCs are stored,
tracked and maintained in a ticketing system.
Validate an RFC
The editor of the working group checks if the RFC is provided in the correct
format and if it contains all the relevant information for carrying out the
assessment phase.
Assess and Evaluate the RFC
Since not all RFCs should lead to a change, objective criteria should be set
up for assessing and evaluating change requests. Such criteria could include
an impact analysis carried out by the owner(s) of the reference data. See
3.2.5 for good practice on impact analysis. The outcome of the assessment
should be a categorisation of the requested change, which influences the
further management of the RFC. The categorisation is carried out by the RD-
WG based on the risk related to the change, which can be minor, major or
structural.
Approve or reject a change request
Based on the assessment of the RFC, the change is accepted or rejected by
the Review Group. The stakeholders are informed about the decision taken.
Plan updates
After an RFC is approved, the implementation and harmonization of the
change should be planned. The working group decides on the timing of the
update and on whether the change will be implemented on its own or
grouped with other changes in a release.
Coordinate Change Implementation into the Component
Before being implemented into the production environment of the Reference
Data Management Component, all changes should be tested in an isolated
testing environment. Moreover, service desks and other related stakeholders
should be provided with the necessary documentation regarding the change
in order to support the implementation.
Review and Close Change
The review and closing stage includes validating if the implemented change
addresses the original RFC and if stakeholders are satisfied.
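The sequence of tasks above can be read as a small state machine for a Request for Change. The sketch below is only one possible reading, not a prescribed implementation: the status names, the `RequestForChange` class and its `advance` method are hypothetical, while the three categories (minor, major, structural) come from the assessment task described above.

```python
from enum import Enum


class Status(Enum):
    RECORDED = "recorded"
    VALIDATED = "validated"
    ASSESSED = "assessed"
    APPROVED = "approved"
    REJECTED = "rejected"
    PLANNED = "planned"
    IMPLEMENTED = "implemented"
    CLOSED = "closed"


# Allowed transitions, following the order of the tasks listed above.
TRANSITIONS = {
    Status.RECORDED: {Status.VALIDATED},
    Status.VALIDATED: {Status.ASSESSED},
    Status.ASSESSED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.PLANNED},
    Status.PLANNED: {Status.IMPLEMENTED},
    Status.IMPLEMENTED: {Status.CLOSED},
    Status.REJECTED: set(),
    Status.CLOSED: set(),
}


class RequestForChange:
    def __init__(self, title):
        self.title = title
        self.status = Status.RECORDED
        self.category = None  # "minor", "major" or "structural", set at assessment

    def advance(self, new_status, category=None):
        """Move the RFC to new_status if the process allows that step."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        if new_status is Status.ASSESSED:
            if category not in ("minor", "major", "structural"):
                raise ValueError("assessment must categorise the change")
            self.category = category
        self.status = new_status


rfc = RequestForChange("Add a new code to a code list")
rfc.advance(Status.VALIDATED)
rfc.advance(Status.ASSESSED, category="minor")
rfc.advance(Status.APPROVED)
```

Encoding the allowed steps in one table makes it impossible, for example, to close an RFC that has not been implemented, which mirrors the role of the ticketing system in tracking each request.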
Environments and testing:
Typically changes are not implemented immediately into the production
environment. Structural changes to reference data and/or data usually need to be
implemented in supporting software/tools of all stakeholders involved. Such
changes should first be developed in a separate environment in order to guarantee
the continuation and quality of operational systems. Changes to reference data that
impact software or the data itself should be made on a development environment
first. After initial system tests by the development team, the changes can be
applied to a separate environment called Integration/Acceptance. This allows
users, such as domain experts, to test the change from a user perspective. If all is well,
the changes can then be rolled out to the production environment. Therefore, at
least the following technical environments should be available:
Development: all changes are developed on this environment.
Integration Testing: after development, the applied change needs to be
tested in an integrated (not isolated) environment, mimicking as close as
possible the real context
Acceptance: this is a separate environment to allow users to accept the
committed changes
Production: the live environment
If the changes are made at a local level, the tests as described could be enough.
However, in an inter-organisational environment where multiple stakeholders are
involved, additional testing is needed. For instance, where there are
multiple sources, a central processing environment and multiple consumers, an
integration (chain) test is in order. In this test, all involved parties assure
themselves that the processing of reference data and the changes made, across
multiple systems from different owners, work as described in the documentation.
Only after a successful integration test in a test environment will the actual rollout in
the production chain take place.
The process as described above applies mainly for structural changes in reference
data that have an impact on the operational software. In other cases a complete
DTAP (Development, Testing, Acceptance, Production) environment is not a necessity. For instance, if a new version of a data model
is created in an editing tool, the model itself should then be tested. It is not
necessary to have multiple instances of the editing tool, because the tool itself is
not being tested nor is it part of the production environment (data value chain).
Decisions:
Is the change to be discussed at a local, inter-institutional or trans-European
level?
Is the change valid: does it fit, is it cost-effective, are the risks manageable?
Decide on follow-up of declined changes
Determine if the change is urgent
Determine when the change will be formalized (for which release)
3.3.3. Harmonise structural metadata
Goal:
The harmonisation of structural metadata used for information exchange either
through the creation of mappings between terms of two or more specifications for
structural metadata or by forging a wide consensus on the use of a common
specification.
The use of a common specification is more likely in a local and sometimes an
inter-institutional environment. The use of a mapping is more likely where there is a wide
variety of stakeholders, such as in a trans-European environment where Member States are involved.
Metadata alignment can offer a real added value to European institutions and public
administrations of the Member States:
Increase quality and value of the data: the use of common controlled
structural metadata or the use of agreed mappings reduces the
heterogeneity of the dataset and increases the reusability of data in other
contexts, hence the value;
Provide richer and more expressive context to their data;
Increase visibility and discoverability;
Increase reuse potential;
Promote the reuse of information from other authoritative sources.
Actors and roles:
At least the following roles should be implemented:
Content expertise: knowledge about the semantics of the data for which
the metadata is used and the applications in which the data is used
Information management expertise: knowledge about theory and
practice of harmonisation, e.g. mapping across multiple systems.
Technical expertise: knowledge about the technical approaches to be used
for the technical implementation in the environment in which the metadata
is used.
Documentation and publication expertise: knowledge about the
documentation rules and publication processes used in the environment in
which the metadata is used.
Tasks:
Below is a description of the tasks to be performed in the situation that a mapping
would be needed.
1. Identify and analyse related metadata specifications: the working group
has an operational responsibility to manage metadata. This includes
maintaining metadata as is and accepting or declining new proposals based on
metadata harmonization criteria;
2. Propose mappings: the working group should then make a proposal for
mappings. The Steering Committee decides whether the proposal is
approved or not. If it is approved, the working group can continue with the
next step;
3. Create and execute the mapping (add the mapping to the controlled
vocabularies file);
4. Test the mapping;
5. Publish the metadata alignment.
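Steps 3 and 4 of the mapping procedure above can be sketched as follows. This is a minimal illustration, assuming a mapping is stored as a table from terms of a source specification to terms of a target specification; all codes and the helper `check_mapping` are hypothetical.

```python
# Hypothetical mapping from terms in a source code list to terms in a target code list.
mapping = {
    "SRC-01": "TGT-A",
    "SRC-02": "TGT-B",
    "SRC-03": "TGT-B",  # two source terms may map to the same target term
}

source_terms = {"SRC-01", "SRC-02", "SRC-03", "SRC-04"}
target_terms = {"TGT-A", "TGT-B", "TGT-C"}


def check_mapping(mapping, source_terms, target_terms):
    """Return source terms without a mapping and mappings to unknown target terms."""
    unmapped = sorted(source_terms - set(mapping))
    invalid = sorted(src for src, tgt in mapping.items() if tgt not in target_terms)
    return unmapped, invalid


unmapped, invalid = check_mapping(mapping, source_terms, target_terms)
```

In this sample, `SRC-04` has no mapping yet, so the test step would flag it before the alignment is published.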
Below is a description of the tasks to be performed in the situation that a common
reference model would be used.
1. Identify and analyse related metadata specifications: the working group
has an operational responsibility to manage metadata. This includes
maintaining metadata as is and accepting or declining new proposals based on
metadata harmonization criteria;
2. Propose reference model: the working group should agree on a common
model to use and define the parties that are involved;
3. Standardize the common model and determine who will manage the
model;
4. Test the model for all users;
5. Publish the metadata alignment.
Decisions:
Decide on the use of common specification or mappings
In case of mappings, decide whether a metadata specification can be
supplemented with official mappings
Decide on management and responsibility of harmonized data
3.3.4. Release structural metadata
Goal:
Efficiently deploying changes into the Reference Data Management Component
while protecting the live environment through planning, testing, building and
implementing a grouped set of changes.
Actors and roles:
Similar to the change management process, the manage reference data release
process is carried out at the governance level of the reference data. The tasks are
carried out by the Reference Data Working Group and the steering committee.
Tasks:
In release management of reference data and the tools used to support it, it is good
practice to agree on a number of releases per year. Where reference data is
used in only one system with a high frequency of changes, this is not so
much a necessity, but in an inter-institutional or trans-European environment it is,
because of the impact releases have on the environment. A distinction can be made
between minor and major releases. In release management there are two options
for deploying:
Immediate implementation: a change is accepted and scheduled for release.
This is most likely in a local environment with a high frequency of changes
Pooled releases: changes are pooled into periodic releases, either minor or
major release depending on the impact of the changes individually and as a
group.
In this document we will describe the process for pooled releases.
Pool a set of changes into a release
The ITIL framework and identified good practices from the Publications
Office indicated that changes could be pooled into periodic releases. By
doing so, users of the reference data can easily align their Software
Development Lifecycles to changes in reference data (see 3.3.4). The
periodicity and impact of releases depend on the release type: minor, major
or structural. For example: small changes which entail low risks for
operational systems can be assigned to minor releases, which are launched
more often than major releases that bear more risk.
Testing
Testing, user acceptance and quality assurance considerations have to be
taken into account before a release is deployed into the production
environment of the Reference Data Management Component. Before the
rollout into production is allowed, the user or business owner should sign off
on the release.
Version a release
As indicated in section 3.1 of this study, a key stakeholder requirement is to
keep preceding versions of the reference data available in order to assure
backward compatibility. It should be possible to access information that was
exchanged in the past with the applicable version of the related reference
data. In order to satisfy this requirement, versions have to be managed in a
clear and structured way and preceding versions should be kept available in
the reference data repository. All versions and changes should be well
documented (3.3.4.1). A study conducted in light of the ISA programme on
metadata management in European Institutions and Member States
[European Commission, ISA Programme, 2014] identified several aspects of
versioning which should be taken into account:
o Numbering
Several options are identified for version numbering. A first approach,
which is applied by the Inter-institutional Metadata Management
Committee (IMMC), is to identify versions based on release dates
combined with a sequence number, e.g. 20140101-0, 20140101-1,
etc. The sequence number is mostly used for immediate bug fixes. A
second option for assigning version numbers is a multi-level
approach, which is for example applied by KOOP, a governmental
organisation from the Netherlands. In a three-level approach (e.g.
X.Y.Z), Z could be altered in case of bug fixes, Y in case of minor
updates and X in case of major updates. The level could also reflect the
extent to which changes belong to a class of release cycles, e.g. Z for
automated changes without risk;
o Backward compatibility
Backward compatibility means that new versions of the structural
metadata should be compatible with preceding versions. Updates to
data models should impact the day-to-day operations of its users as
little as possible. Therefore, backward version compatibility should be
taken into account in the update procedure. All updates that are not
backwards compatible should be clearly documented in the release
notes, and should also be accompanied by guidelines to the users on
how to deal with these changes in their production systems;
o Tool support
Deliverable D4.1 (European Commission, ISA Programme, 2014)
listed Apache Subversion (SVN) as a tool for version management.
Other versioning tools include the Concurrent Version System (CVS)
or Git; and
o Authoritative Source
It is recommended to select an authoritative source which provides
support for versioning systems. For fostering interoperability, it is
crucial that persistent Uniform Resource Identifiers are managed
properly. An example of combining best practices of persistent URIs
and keeping preceding versions available is provided by the Metadata
Registry (MDR) of the Publications Office of the European Union. The
URIs of the Named Authority Lists (NAL) in the MDR refer to the
latest version of the structural metadata. For accessing preceding
versions, the URIs include version numbers.
Publish release notes
Release notes describe the general information of a release: the date of
publication, the version number, the URI, the expiration date of the version,
etc. Moreover, they should include a list of changes compared to the
previous version, preferably in machine-readable format.
Implement release
This phase entails the release in the Reference Data Management
Component.
Communicate to stakeholders
Once the release has been rolled out the users and other stakeholders
should be notified.
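The two numbering approaches described under "Version a release" can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and resetting the lower levels to zero when a higher level is increased is a common convention assumed here, not something the approaches described above prescribe.

```python
from datetime import date


def date_based_version(release_date, sequence=0):
    """Release date plus sequence number, e.g. 20140101-0."""
    return f"{release_date:%Y%m%d}-{sequence}"


def bump(version, change_type):
    """Increase an X.Y.Z version: X for major, Y for minor, Z for bug fixes."""
    major, minor, fix = (int(part) for part in version.split("."))
    if change_type == "major":
        return f"{major + 1}.0.0"  # assumed convention: lower levels reset to 0
    if change_type == "minor":
        return f"{major}.{minor + 1}.0"
    if change_type == "bugfix":
        return f"{major}.{minor}.{fix + 1}"
    raise ValueError(f"unknown change type: {change_type}")
```

For example, `date_based_version(date(2014, 1, 1), 1)` yields `20140101-1`, matching the date-plus-sequence pattern, and `bump("1.2.3", "minor")` yields `1.3.0`.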
Decisions:
Decide on the number and type of releases per year
Agree on the exact content and plan for each release
Determine the release schedule
Actions to be taken if a release is cancelled
Whether a metadata specification can be published under an open licence
3.3.4.1. Document reference data
Goal:
Reference data is data used to classify or categorize other data. Business rules
usually dictate that reference data values conform to one of several allowed values.
The set of allowable data values is a value domain. These business rules and
domains should be well documented for successful interoperability.
Actors and roles:
At least the following roles should be implemented:
Documentation and publication expertise: knowledge about the
documentation rules and publication processes used in the environment in
which the metadata is used.
Content expertise: knowledge about the semantics of the data for which
the metadata is used and the applications in which the data is used
Tasks:
Documenting reference data may include adding descriptive reference data, such as
those defined in the Asset Description Metadata Schema (ADMS, see 4.2.4):
The meaning and purpose of each reference data value domain
The reference tables and databases where the reference data appears
The source of the data in each table
The version currently available
When the data was last updated
How the data in each table is maintained
Who is accountable for the quality of the data and metadata
Successful organizations first understand the needs for reference data. Then they
trace the lineage of this data to identify the original and interim source databases,
files, applications, organizations, and even the individual roles that create and
maintain the data. Understand both the up-stream sources and the down-stream
needs to capture quality data at its source.
Decisions:
The method used for documenting reference data
How documenting itself will be managed, and whether it is part of the rules of procedure
Who determines and changes the business rules
Which policy is followed to encourage or mandate the reuse of the reference
data specification
3.3.5. Deploy structural metadata
Goal:
Efficiently deploying changes into the operational systems of users while protecting
the live environment of their system through planning, testing, building and
implementing a grouped set of changes.
Actors and roles:
The implementation of reference data changes in operational systems is carried out
by the users of the reference data. Here, the reference data management lifecycle
has a touch point with the software development lifecycle (SDLC).
Tasks:
There are two ways of implementing reference data changes: either automatically (for
instance GENIS RDC) or through integration with the normal system development lifecycle.
A description of the latter, as it would be in an inter-institutional environment, is
given below.
Detect a change
Users should be subscribed to reference data sets for which they want to
receive notifications of upcoming and rolled out changes. Notifying users of
changes is the responsibility of the governance level, subscribing is the
responsibility of the users.
Log a system change request
Changes to reference data which have an impact on the operational systems
of users should lead to the creation of a change request in their internal IT
system, which then triggers the internal change management processes.
Analyse the impact of a change on a system
The impact assessment which is part of the Manage Reference Data Changes
process is carried out on the level of the reference data owner, thus it does
not take into account the specific characteristics of the users’ internal IT
system. Therefore, it is necessary for users to carry out an impact analysis
before implementing changes to their systems.
Test and Propagate a change to the system
A rolled out change could be grouped with other, internal changes in order
to match the Software Development Lifecycle or software release schedule
of the user. The propagation of changes should include a testing phase in an
isolated environment before releasing them into production.
Log the change
All changes should be logged and documented for several purposes, such as
assuring the possibility to restore a system to a previous state and creating
an audit trail.
Decisions:
Where to store a metadata specification and with which access restrictions
(define roles and responsibilities)
Whether a metadata specification can be supplemented with official
mappings
3.3.6. Retire structural metadata
Goal:
Changes to internal or external reference data sets may be minor or major. For
example, country code lists go through minor revisions as geopolitical space
changes. When the Soviet Union broke into many independent states, the term for
Soviet Union was deprecated with an end of life date, and new terms were added for
the new countries.
Sometimes terms and codes are retired. The codes still appear in the context of
transactional data, so the codes may not disappear due to referential integrity. The
codes found in a data warehouse also represent historical truth. Code tables,
therefore, require effective date and expiration date columns, and application logic
must refer to the currently valid codes when establishing new foreign key
relationships.12
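The rule about effective and expiration dates can be illustrated with a minimal Python sketch. The row layout, the field names `valid_from` and `valid_to`, the helper names and the sample codes and dates are assumptions made for illustration; they are not taken from the DAMA guide or from any DG COMP system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CodeRow:
    code: str
    label: str
    valid_from: date            # effective date
    valid_to: Optional[date]    # expiration date; None means not retired


# Hypothetical rows: the retired code stays in the table so that
# historical transactions can still resolve it.
CODE_TABLE = [
    CodeRow("X1", "Retired term", date(2000, 1, 1), date(2009, 12, 31)),
    CodeRow("X2", "Successor term", date(2010, 1, 1), None),
]


def is_valid_on(row: CodeRow, day: date) -> bool:
    """True if the code is valid on the given day (end date inclusive)."""
    return row.valid_from <= day and (row.valid_to is None or day <= row.valid_to)


def currently_valid(table: List[CodeRow], today: date) -> List[str]:
    """Codes that may be used when creating new foreign key relationships."""
    return [row.code for row in table if is_valid_on(row, today)]


def lookup(table: List[CodeRow], code: str) -> Optional[CodeRow]:
    """Resolve any code, retired or not, e.g. for historical reporting."""
    for row in table:
        if row.code == code:
            return row
    return None
```

New records would only be offered the codes returned by `currently_valid`, while `lookup` keeps retired codes resolvable for existing transactional data.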
Actors and roles:
A proper impact analysis of data deprecation is essential to ensure the continuity
and quality of data and systems. The involvement of all consumers is key. At least
the following roles should be implemented:
Content expertise: knowledge about the semantics of the data for which
the metadata is used and the applications in which the data is used
Information management expertise: knowledge about theory and
practice of harmonisation, e.g. mapping across multiple systems.
Technical expertise: knowledge about the technical approaches to be used
for the technical implementation in the environment in which the metadata
is used.
Documentation and publication expertise: knowledge about the
documentation rules and publication processes used in the environment in
which the metadata is used
12 Source: DAMA guide
Tasks:
• Assess the impact of deprecation
• Review for approval
• Approach all consumers of the data
• Clearly mark reference data as deprecated
• Ensure backwards compatibility
Decisions:
• Whether a metadata specification should be deprecated
• How to approach all consumers
• How to ensure backwards compatibility
4. REQUIREMENTS FOR AND ASSESSMENT OF EXISTING REFERENCE DATA
TOOLS
In this chapter, we identify and assess the coverage of the identified requirements
and proposed approach by existing tools, including the GENIS reference data
component, and assess their use, usefulness, and fitness-for-purpose.
4.1. Stakeholder requests and needs
Below is a list of stakeholder requests and needs for tools. The requirements for
these tools are closely related to the governance and management model as
discussed in chapter 3.3. Basically, tools are needed for the following:
Reference data editor: edit, harmonize, map and document reference
data;
Tools for managing reference data changes: managing changes and
releases of reference data;
Tools for reference data propagation: implementing and retiring
reference data; and
Tools for reference data publication.
Table 5 – Reference data tools
ID Requests and needs
Reference data editor
T1
Feature list
DG COMP needs a tool that is capable of editing reference data and
supporting the design of reference data in the context of one or more
information systems. The tool should support tasks in the following
processes:
Design reference data;
Manage reference data changes.
The tool should have the following features:
Import reference data from an external source and detect
changes;
Create, read, update, or delete a concept scheme;
Create, read, update, or delete concepts in a concept scheme;
Add multilingual labels to a concept scheme;
Define the order of concepts in a concept scheme;
Version concept schemes;
Version concepts;
Version the labels of concepts;
Export one or more versions of a concept scheme.
Tools for managing reference data changes (ticketing/workflow)
T2
Feature list
DG COMP needs a tool that is capable of:
Keeping a log of change requests;
Keeping track of impact analyses;
Keeping a log of decisions on change requests;
Creating release notes;
Linking change requests to release notes and vice-versa; and
Linking change requests to versions and vice-versa.
Tools for reference data propagation
T3
Feature list
The tool should:
Deploy versioned reference data-as-a-service to an information
system;
Deliver services while disconnected (local cache); and
Provision all versions (full versioning of temporal changes and
language versions).
Tools for publishing a release
T4
Feature list
The tool should provide:
Read-access over HTTP/s;
Write-access over WebDAV or Subversion.
Tools for reference data harmonisation
T5
Feature list
The tool should provide:
Mapping: a means of mapping concepts in different concept
schemes;
Link discovery: a means of discovering relationships between
data items within different Linked Data sources
4.2. Existing standards for reference data management
This section lists a number of metadata standards that should be supported by
metadata tools:
Standard representations (exchange formats) for reference data such as
SKOS, and GeneriCode.
Standards for documenting metadata specifications such as ADMS.
4.2.1. Representation: Simple Knowledge Organisation System
(SKOS)
SKOS13, the Simple Knowledge Organisation System, is a common data model for
sharing controlled vocabularies such as code lists, thesauri, and taxonomies via the
Web in a machine-readable format. In the Core Vocabularies14 specifications of the
13 http://www.w3.org/2004/02/skos/vocabs
14 https://joinup.ec.europa.eu/system/files/project/Core_Vocabularies-Business_Location_Person-Specification-v1.00_0.pdf
ISA Programme, SKOS is the recommended vocabulary for the representation of
code lists. The Publications Office already uses SKOS as the official format of
EuroVoc, the EU’s multilingual thesaurus, and the Named Authority Lists.
SKOS provides a standard way to represent knowledge organization systems using
the Resource Description Framework15 (RDF). Encoding this information in RDF
allows it to be passed between computer applications in an interoperable way.
Using RDF also allows knowledge organization systems to be used in distributed,
decentralised metadata applications. Decentralised metadata is becoming a typical
scenario, where service providers want to add value to metadata harvested from
multiple sources.
SKOS represents the terms in a controlled vocabulary as instances of the class
skos:Concept. SKOS also defines properties for multi-lingual labels
(skos:prefLabel), associated codes (skos:notation), and definitions
(skos:definition). The publication of controlled vocabularies represented in SKOS on
the Web brings the following advantages:
1. De-referencing: the principles of Linked Data require each term in the
controlled vocabulary to be identified by a corresponding term URI based on
the HTTP protocol. The term “Taxonomy” in the “Asset Type” scheme has for
example the following term URI:
<http://purl.org/adms/assettype/Taxonomy>. This means that anyone who
encounters such a URI can look up its meaning by entering the URI in the
address bar of a browser. This is called de-referencing: the URI is an
actual, resolvable reference and not merely an opaque pointer. This is a
simple yet powerful feature of the Web.
2. Machine-readability: In the example of “Taxonomy”, the user can use the
term URI to retrieve both a machine-readable and human-readable file
containing definitions, labels, and related concepts for this term expressed
in SKOS. SKOS is a W3C Recommendation and commonly used
representation format for controlled vocabularies. Well-known thesauri such
as EuroVoc have been defined using an ontology that extends SKOS.
3. Multilingualism: SKOS allows labels and definitions in multiple
languages to be associated with any concept. This means that we can associate
the labels “taxonomie”@FR, “Taxonomie”@DE, or “taxonomia”@PT with the concept
identified by the URI http://purl.org/adms/assettype/Taxonomy to
include the French, German, and Portuguese labels.
4. Metadata alignment: SKOS provides mapping properties like
skos:closeMatch, skos:exactMatch, skos:broadMatch, skos:narrowMatch and
skos:relatedMatch. These properties are used to state mapping alignment
links between SKOS concepts in different concept schemes, where the links
are inherent in the meaning of the linked concepts.
a. The properties skos:broadMatch and skos:narrowMatch are used to
state a hierarchical mapping link between two concepts.
15 http://www.w3.org/RDF/
b. The property skos:relatedMatch is used to state an associative
mapping link between two concepts.
c. The property skos:closeMatch is used to link two concepts that are
sufficiently similar that they can be used interchangeably in some
information retrieval applications. In order to avoid possibilities of
"compound errors" when combining mappings across more than two
concept schemes, skos:closeMatch is not declared to be a
transitive property.
d. The property skos:exactMatch is used to link two concepts, indicating
a high degree of confidence that the concepts can be used
interchangeably across a wide range of information retrieval
applications. skos:exactMatch is a transitive property, and is a
sub-property of skos:closeMatch.
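The practical difference between the transitive skos:exactMatch and the non-transitive skos:closeMatch can be sketched in Python as follows. This is an illustration only: the mapping pairs and the identifiers `a:Taxonomy`, `b:Taxonomy` and `c:Taxonomy` are hypothetical, and a real implementation would more likely rely on an RDF library and a reasoner.

```python
from collections import defaultdict

# Hypothetical mapping links between three concept schemes a, b and c.
MAPPINGS = [("a:Taxonomy", "b:Taxonomy"), ("b:Taxonomy", "c:Taxonomy")]


def exact_match_groups(pairs):
    """Group concepts connected by skos:exactMatch links.

    exactMatch is symmetric and transitive, so every concept reachable
    through a chain of links ends up in the same group.
    """
    neighbours = defaultdict(set)
    for left, right in pairs:
        neighbours[left].add(right)
        neighbours[right].add(left)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


def close_matches(pairs, concept):
    """Direct skos:closeMatch partners only; chains are not followed."""
    return {
        right if left == concept else left
        for left, right in pairs
        if concept in (left, right)
    }
```

Read as exactMatch links, `MAPPINGS` places all three concepts in one group; read as closeMatch links, `a:Taxonomy` is only linked to `b:Taxonomy`, which is how compound errors across more than two schemes are avoided.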
SKOS is an extensible vocabulary. One popular extension is SKOS-XL (SKOS
eXtension for Labels), which adds support for describing and linking labels.
4.2.2. Representation: GeneriCode
The OASIS Code List Representation format, GeneriCode16, is a single model and
XML format (with a W3C XML Schema) that can encode a broad range of code list
information. The XML format is designed to support interchange or distribution of
machine-readable code list information between systems.
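As a rough illustration of how such a code list could be consumed, the Python sketch below reads the rows of a small GeneriCode-style document with the standard library. The sample document is a simplified fragment written for this example: it omits elements that the OASIS schema requires and would not validate as-is, and the list name, column identifiers and function name are assumptions.

```python
import xml.etree.ElementTree as ET

GC_NS = "http://docs.oasis-open.org/codelist/ns/genericode/1.0/"

# Simplified, hypothetical fragment (not schema-valid).
SAMPLE = """\
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <Identification>
    <ShortName>ExampleCountries</ShortName>
    <Version>1.0</Version>
  </Identification>
  <ColumnSet>
    <Column Id="code" Use="required"><ShortName>Code</ShortName></Column>
    <Column Id="name" Use="optional"><ShortName>Name</ShortName></Column>
  </ColumnSet>
  <SimpleCodeList>
    <Row>
      <Value ColumnRef="code"><SimpleValue>BE</SimpleValue></Value>
      <Value ColumnRef="name"><SimpleValue>Belgium</SimpleValue></Value>
    </Row>
    <Row>
      <Value ColumnRef="code"><SimpleValue>FR</SimpleValue></Value>
      <Value ColumnRef="name"><SimpleValue>France</SimpleValue></Value>
    </Row>
  </SimpleCodeList>
</gc:CodeList>
"""


def read_rows(xml_text):
    """Return each Row as a dict mapping column id to its simple value."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{GC_NS}}}CodeList":
        raise ValueError("not a GeneriCode CodeList document")
    rows = []
    for row in root.findall("./SimpleCodeList/Row"):
        rows.append(
            {value.get("ColumnRef"): value.findtext("SimpleValue")
             for value in row.findall("Value")}
        )
    return rows
```

Calling `read_rows(SAMPLE)` yields one dictionary per code, keyed by the column identifiers declared in the column set.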
4.2.3. Representation: Using HTTP URIs to identify concept schemes
and concepts
In order to facilitate its sharing and reuse across systems and organisations,
structural metadata needs to have persistent unique identifiers. As we are
experiencing the era of the Web of Data, it is recommended that such identifiers
come in the form of HTTP URIs. The ISA Programme as well as W3C have created
good practices and guidelines for the design and management of well-formed,
persistent URIs [European Commission - ISA Programme, 2012], e.g. see ISA’s 10
Rules for Persistent URIs17.
4.2.4. Description: Asset Description Metadata Schema (ADMS)
The Asset Description Metadata Schema (ADMS) is a common vocabulary for
descriptive metadata, used to describe interoperability solutions. ADMS is currently
a W3C Working Group Note18.
ADMS is intended as a model that facilitates federation and co-operation. Like
DCAT, ADMS has the concepts of a repository, assets within the repository that are
often conceptual in nature, and accessible realizations of those assets, known as
distributions. ADMS is an RDF vocabulary with an RDF schema available at its
namespace http://www.w3.org/ns/adms . The original ADMS specification published
16 http://docs.oasis-open.org/codelist/ns/genericode/1.0/
17 https://joinup.ec.europa.eu/community/semic/document/10-rules-persistent-uris/
18 http://www.w3.org/TR/vocab-adms/
by the European Commission [ADMS1] includes an XML schema that also defines all
the controlled vocabularies and cardinality constraints associated with the original
document.
ADMS allows users to:
• “describe semantic assets in a common way so that they can be seamlessly
cross-queried and discovered by ICT developers from a single access point,
such as Joinup;
• search, identify, retrieve, compare semantic assets to be reused avoiding
duplication and expensive design work through a single point of access;
• keep their own system for documenting and storing semantic assets;
• improve indexing and visibility of their own assets;
• Link semantic assets to one another in cross-border and cross-sector
settings.”
When reference data is stored, in whatever manner, extra descriptive
metadata can be very useful for re-usability, transparency, etc. Descriptive
metadata about reference data sets may document:
The meaning and purpose of each reference data value domain.
The reference tables and databases where the reference data appears.
The source of the data in each table.
The version currently available.
The last modification date.
The way the data is maintained.
The person accountable for the quality of the data and metadata.
The main limitation of ADMS is that it perceives structural metadata as a black-box.
This means that it can be used for describing a data model or a reference dataset
as a whole, but it cannot be used for describing particular elements within that data
model or reference dataset – or at least this is not its purpose. In such cases, the
use of other standards is recommended, such as the ISO 11179 standard on metadata
registries.
4.3. Existing tools for reference data management
4.3.1. Publication: Joinup
In this context, the main value of Joinup is as an online collaborative platform. The
Joinup platform was developed by the ISA programme of the European
Commission for releasing and documenting specifications for structural metadata
such as ontologies, data models, code lists, XML schemas, reference data, etc.
Publishing reference data on Joinup allows users to easily find the data, download it
and provide feedback.
Joinup offers the following features that support the release and publication of
structural metadata:
WebDAV;
Subversion;
Release management; and
ADMS editor and ADMS-conformant publication.
An example of a structural metadata specification that uses Joinup as repository is
OSLO – Open Standards for Local Authorities. The OSLO project has created the
following permanent (persistent) URIs using the purl.org service:
http://purl.org/oslo
>> redirects to >> https://joinup.ec.europa.eu/node/66650
http://purl.org/oslo/ns/vocabulary
>> redirects to >> https://joinup.ec.europa.eu/svn/adms/CESAR/V-ICT-OR_OSLO/OSLO_v1.00_XML_Schemas.zip
These permanent URIs can be configured to forward requests to any location. This
gives the OSLO project the flexibility to refer to its specifications using the
permanent URIs. Currently, requests are forwarded to Joinup. The specifications
themselves are stored in a Subversion repository, which is also accessible through
HTTP. Using the Joinup ADMS editor, a description of the structural metadata was
made. The descriptive metadata is available in both human-readable form (HTML)
and machine-readable form (RDF/XML):
https://joinup.ec.europa.eu/node/66650
4.3.2. Publication: Metadata Registry of the Publications Office
(MDR)
The Metadata Registry (MDR) of the Publications Office19 of the EU is the
authoritative source for definition data – metadata elements, named authority lists,
schemas, etc. – and authority data used for exchanging data between institutions
involved in the legal decision making process. Many of the definition data sets
contained in the MDR are governed by the Inter-Institutional Metadata Maintenance
Committee (IMMC).
The Publications Office uses a tool chain and some scripts to edit the Named
Authority Lists. For each NAL, the Publications Office publishes a set of distributions
which can be downloaded from the MDR website. Each set is composed of a
SKOS, XML, XSD and HTML version.
A publication package is also available as a zip file. It contains the distributions of
the changed NALs (XML, SKOS, ATTO-XML20), a comparison file that identifies the
differences between the previous and the current version, and the release notes
listing the changes to the NALs included in the publication.
4.3.3. Editor / Propagation: GENIS Reference Data Component
(GENIS RDC)
In the context of the Generic Interoperable Notification Services (GENIS) project,
funded under Action 1.11 of the ISA programme, a GENIS Reference Data
Component (GENIS RDC) was built. The GENIS RDC has the following features:
19 http://publications.europa.eu/mdr/
20 http://publications.europa.eu/mdr/authority/
Import reference data from a file;
Create, read, update, delete reference data using the Web-based graphical
user interface;
Export reference data to a file;
Deploy reference data as a service to clients.
The GENIS RDC distinguishes the following concepts:
Project: Reference data is categorised in projects. The SANI2 project for
example contains the reference data which is linked to the State Aid
Notification Infrastructure.
Group: projects have zero, one or more groups. Groups represent concept
schemes, for example a country code list.
Reference data entity: each group consists of reference data entities. By
defining different start and end dates to a reference data entity in different
projects, each system will be able to access the version which is relevant.
For example, the HR system might need Serbia as part of its reference data
while DG COMP might not yet need it in its system.
Representation: reference data entities can have one or more
representations (e.g. alpha-3 and alpha-2 codes for countries).
Ordering: groups can have one or more orderings for the reference data
entities included in it.
The Component supports versioning of the reference data on group and on
reference data entity levels. Clients can consume reference data according to a
timestamp. By doing so, the Component can serve reference data as it was
available at any point in time in the past. For example, when a user fills in a
notification form, the form component stores the form together with the codes of
the reference data items. When the form is opened at a later stage, the form will
appear with the reference data labels that were available at the time when the form
was submitted.
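The idea of consuming reference data according to a timestamp can be sketched as follows in Python. This is not the GENIS RDC interface, which is not described here; the data layout, the function name `label_as_of` and the sample code, labels and dates are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical label history per code: (valid-from timestamp, label),
# kept sorted by timestamp.
LABEL_HISTORY = {
    "C1": [
        (datetime(2010, 1, 1), "Old label"),
        (datetime(2014, 6, 1), "New label"),
    ],
}


def label_as_of(history, code, moment):
    """Return the label that was in force for `code` at `moment`."""
    versions = history[code]
    starts = [start for start, _ in versions]
    # Index of the last version that started at or before `moment`.
    index = bisect_right(starts, moment) - 1
    if index < 0:
        return None  # the code did not exist yet at that moment
    return versions[index][1]
```

A form submitted on 1 March 2012 would store the code `C1` and its submission timestamp; reopening it later with `label_as_of(LABEL_HISTORY, "C1", datetime(2012, 3, 1))` shows the label that applied at submission time rather than the current one.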
According to a presentation on the GENIS RDC delivered by DG COMP, the software
supports several main features for managing reference data, metadata and
enterprise master data:
Multi-tenancy: The software is designed in a way that allows it to run as a
single instance on a server, while serving multiple client organisations. The
Component categorises reference data in projects, to which users and
managers are assigned;
Graph Data: The domain model of the reference data in the tool includes
Project, Group, Reference Data Item, Representation, Order and Tag
entities. Ownership, lifecycle, protection and data segments are defined for
each entity;
Versioning: Versioning is carried out at the Group Entity and at the
Reference Data Entity level. These entities can get project-specific start and
end dates assigned;
Data staging: Various import and export capabilities like XML and CSV are
supported. Import and export are script-driven, so they can be adapted to the
specifications of different systems.
Decoupling: Clients of the Reference Data Management Component
operate entirely on locally cached data. By decoupling the tool from its
clients, an outage of the Component does not lead to an interruption in the
client’s system;
Multilingualism: each reference data item can have labels in any language;
Deployment: The Component can be deployed as a service, integrated into
an application, as a standalone application or as a proxy;
Notification: Users of the Reference Data Management Component are
notified of imports, manual data changes and changes to cross-referenced
data.
The governance process currently adopted by DG COMP for the Reference Data
Management Component involves three roles:
Administrator: Users in the Administrator role can create and delete
projects, groups, project managers, normal users…;
Project Manager: the Project Manager can create Standard Users and give them
access to specific projects; and
Standard user: a Standard User is assigned to one or more projects and, when
logged in, has access to those projects.
The Reference Data Component by DG COMP is intended solely for reference data
and allows for a clear distinction between this and the business logic which will be
in the application layer. To this extent, the building block can be reused by other
systems as a plugin, via web-services, via API, or using a dedicated client. It is
currently designed for use within DG COMP and would need further work before it
can be made available as a generic solution for interoperability.
4.3.4. Editor: VocBench
VocBench21 is a web-based editing and workflow tool for managing thesauri,
authority lists and glossaries based on SKOS and RDF. The tool was developed by
the Food and Agriculture Organization (FAO) of the United Nations. VocBench
supports collaborative editing, multilingual terminologies and administration
functions that allow assigning roles for maintenance, validation and quality
assurance.
The Publications Office of the European Union uses VocBench to manage its
EuroVoc thesaurus.
4.3.5. Editor: PoolParty: Thesaurus Management
PoolParty Thesaurus Server22 is a software tool for creating and maintaining
taxonomies, thesauri, ontologies and knowledge graphs. The tool manages
21 http://aims.fao.org/tools/vocbench-2; http://vocbench.uniroma2.it/
22 http://www.poolparty.biz/portfolio-item/poolparty-thesaurus-server/
metadata based on standards like RDF and SKOS. Designing code lists can be done
via the graphical interface or by importing existing lists in formats like XML, Excel,
etc. Moreover, the tool carries out automatic quality checks based on SKOS.
For system integration purposes, PoolParty provides an API which is based on the
SPARQL standard, an RDF database query language.
4.3.6. Editor: Silk workbench (link discovery)
Silk Workbench23 is a web application which guides the user through the process of
creating a link specification for interlinking two data sources.
The Silk Workbench provides the following components:
• Workspace Browser enables the user to browse the projects in the
workspace. Linking Tasks can be loaded from a project and committed back
to it later.
• Linkage Rule Editor: a graphical editor which enables the user to easily
create and edit link specifications. The widget shows the current link
specification in a tree view while allowing editing using drag-and-drop.
• Evaluation allows the user to execute the current Link Specification. The
links are displayed while they are generated on-the-fly. For generated links
whose correctness is not specified by the reference link set, the user may
confirm or decline their correctness. The user may also request a detailed
summary of how the similarity score of a specific link is composed.
4.3.7. Workflow Management tool: Activiti
Activiti24 is an open source tool that aims at serving the Business Process
Management (BPM) needs of both business people and IT developers. The
tool supports designing and graphically authoring workflow processes (e.g. in
BPMN) and provides features for task management, such as creating, assigning or
temporarily delegating tasks to users.
It can run in embedded, standalone or client/server mode. Its engine is written in
Java, which means it can call out to native Java code. This makes it a suitable choice
for a dedicated workflow component in an (existing) Java platform.
4.3.8. Change management: Atlassian JIRA
Atlassian JIRA25 is an online ticket tracking system that supports organising and
following up on issues, assigning work packages and monitoring team activity. JIRA
can be used for following up on change requests and to support the development
and maintenance of reference data.
23 https://www.assembla.com/spaces/silk/wiki/Silk_Workbench
24 http://activiti.org/userguide/index.html#N10007
25 https://www.atlassian.com/software/jira
4.3.9. Deployment: Mule
Enterprise Service Buses (ESBs) are universal connectors: they transform, route
and augment messages securely and can notify subscribed clients. For this reason,
they are widely used for integration purposes. Mule is an open source Java ESB.
4.3.10. Editor / Deployment: Jena
Apache Jena is a free and open source Java framework for building semantic web
applications. It is composed of different APIs for interacting with RDF data. These APIs
allow Jena to span from core RDF processing to inferring knowledge and
establishing Ontologies.
4.4. Domain model
Figure 4 contains a domain model that provides a logical metadata model that we
will use for describing the reference data management processes. The domain
model is a conformant subset of the SKOS-XL standard.
The domain model consists of the following classes [Miles & Bechhofer, 2009]:
Concept Scheme: A SKOS concept scheme can be viewed as an
aggregation of one or more SKOS concepts. Semantic relationships (links)
between those concepts may also be viewed as part of a concept scheme.
This definition is, however, meant to be suggestive rather than restrictive,
and there is some flexibility in the formal data model stated below. The
notion of a concept scheme is useful when dealing with data from an
unknown source, and when dealing with data that describes two or more
different knowledge organization systems.
Concept: A SKOS concept can be viewed as an idea or notion; a unit of
thought. However, what constitutes a unit of thought is subjective, and this
definition is meant to be suggestive, rather than restrictive. The notion of a
SKOS concept is useful when describing the conceptual or intellectual
structure of a knowledge organization system, and when referring to specific
ideas or meanings established within a KOS.
Label: A lexical label is a string of UNICODE characters, such as "romantic
love" or "れんあい", in a given natural language, such as English or Japanese
(written here in hiragana). The Simple Knowledge Organization System
provides some basic vocabulary for associating lexical labels with resources
of any type. In particular, SKOS enables a distinction to be made between
the preferred, alternative and "hidden" lexical labels for any given resource.
The preferred and alternative labels are useful when generating or creating
human-readable representations of a knowledge organization system. These
labels provide the strongest clues as to the meaning of a SKOS concept. The
hidden labels are useful when a user is interacting with a knowledge
organization system via a text-based search function. The user may, for
example, enter mis-spelled words when trying to find a relevant concept. If
the mis-spelled query can be matched against a hidden label, the user will
be able to find the relevant concept, but the hidden label won't otherwise be
visible to the user (so further mistakes aren't encouraged).
Ordered Collection: SKOS concept collections are labelled and/or ordered
groups of SKOS concepts. Collections are useful where a group of concepts
shares something in common, and it is convenient to group them under a
common label, or where some concepts can be placed in a meaningful order.
The domain model consists of the following relationships:
Broader and narrower: SKOS semantic relations are links between SKOS
concepts, where the link is inherent in the meaning of the linked concepts.
The Simple Knowledge Organization System distinguishes between two basic
categories of semantic relation: hierarchical and associative. A hierarchical
link between two concepts indicates that one is in some way more general
("broader") than the other ("narrower"). An associative link between two
concepts indicates that the two are inherently "related", but that one is not
in any way more general than the other.
prefLabel: the preferred label (as an entity);
memberList: skos:memberList is a functional property, i.e., it does not
have more than one value. This is intended to capture within the SKOS data
model that it doesn't make sense for an ordered collection to have more
than one member list.
The domain model consists of the following attributes:
URI: identify concepts in a unique way;
prefLabel: multilingual label attributed to a concept. Per language, only one
preferred label can be defined;
Notation: lexical code used to uniquely identify a concept within a concept
scheme; and
Definition: skos:definition provides a plain text definition of classes.
[Figure 4: UML class diagram]
ConceptScheme: URI[1], name[0..*]
Concept: URI[1], notation[1], prefLabel[0..*], altLabel[0..*], definition[0..*],
example[0..*], validFrom[1], validTil[1]
OrderedCollection: URI[1], name[0..*]
Label: URI[1], literalForm[0..*], validFrom[1], validTil[1]
Associations: hasTopLevelConcept, inScheme, broader, narrower, prefLabel,
memberList <<Ordered>>
Figure 4 – UML Static Diagram: Domain Model for reference data (based on SKOS-XL)
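To make the domain model more concrete, the following Python sketch expresses part of it as data classes. It is an interpretation, not a specification: the `language` field on Label, the `version` field on ConceptScheme, the use of `None` for an open-ended `valid_til`, and the method names are assumptions. The sketch covers only a subset of the attributes in Figure 4 and ignores validity periods when checking preferred labels.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Label:
    uri: str
    literal_form: str
    language: str                      # assumption: not shown in Figure 4
    valid_from: date
    valid_til: Optional[date] = None   # assumption: None means still valid


@dataclass
class Concept:
    uri: str
    notation: str
    valid_from: date
    valid_til: Optional[date] = None
    pref_labels: List[Label] = field(default_factory=list)
    broader: List["Concept"] = field(default_factory=list)

    def add_pref_label(self, label: Label) -> None:
        """Allow at most one preferred label per language."""
        if any(existing.language == label.language for existing in self.pref_labels):
            raise ValueError(f"preferred label already set for {label.language!r}")
        self.pref_labels.append(label)


@dataclass
class ConceptScheme:
    uri: str
    version: str                       # assumption: version kept on the scheme
    concepts: Dict[str, Concept] = field(default_factory=dict)

    def add(self, concept: Concept) -> None:
        """A notation identifies a concept uniquely within the scheme."""
        if concept.notation in self.concepts:
            raise ValueError(f"duplicate notation {concept.notation!r}")
        self.concepts[concept.notation] = concept
```

Keying the scheme's concepts by notation reflects the attribute description above, in which the notation uniquely identifies a concept within a concept scheme.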
The domain model also needs version control. It is important that a scheme is
versioned so that its relevance and value are known. Versions are updated according
to the principles of release management. Minor changes, for instance changes in
examples, should not be given a new version; major changes are changes in the
ConceptScheme itself and, as these have an impact on the environment, they should
be versioned. We therefore recommend that ConceptScheme be versioned, while
Concept and Label are not.
4.5. Data flow diagram
It is understood that the Reference Data Building Block could be designed to operate
within an environment that starts with an external authentic source (e.g. the
Publications Office for country codes) and ends with that data being used as
reference in operational databases such as the GENIS one. The authoritative source
can be managed via Joinup, changes are managed and logged with the aid of the
aforementioned tools, GENIS propagates the data and, finally, the changes are
followed through in the operational systems. The figure below illustrates this
understanding, which is further relied on in the use cases below.
Figure 5 – Simplified DFD for the flow of data between authentic source and GENIS
4.6. High-level use cases
This section lists a number of high-level use cases that need to be supported by a
tool (or a combination of tools) to support the reference data management
lifecycle.
Figure 6 High-level use cases for metadata management
4.6.1. Use Case 0 – Edit an authentic source of reference data
DG COMP needs to manage the relationship between its reference data and the
authentic source. Although DG COMP itself may own reference data which is an
authentic source, the creation of reference data is out of scope for this study.
4.6.2. Use Case 1 – Detect reference data changes
The system needs to be able to detect reference data changes that happen at an
authentic source and notify the actor.
ID: Detect reference data changes
Goal: Detect and identify the changes to an authentic source of reference data
Preconditions:
- Authentic reference data linked to a Concept Scheme is available in an
authentic source and accessible via an HTTP request over a (persistent) HTTP
URI.
- Context-specific reference data is available in the tool and associated with
the authentic reference data.
Success End Condition: The tool produces a list of HTTP URIs for Concepts per
ConceptScheme for which a change has occurred.
Failed End Condition: Authentic reference data is either unavailable or incorrect
Primary Actor: Editor
Secondary Actors: Authoritative source – submit feedback
Priority:
Performance:
Frequency: Ad-hoc
Trigger: Periodic check or at request by the reference data editor.
Other:
Description: Basic flow
Step 1: The Editor creates a local, context-specific Concept Scheme, for
example, a list of country codes.
Step 2: The Editor populates the context-specific Concept Scheme from an
authentic source. This is done by configuring a persistent URI for the
associated authentic source of reference data. For example, the countries NAL
from the MDR:
http://publications.europa.eu/mdr/resource/authority/country/skos/countries-skos.rdf
Step 3: The Editor requests the system to detect changes between the local,
context-specific Concept Scheme and the authentic source of reference data.
OR
The system periodically (e.g. every day) triggers the detection of changes.
Step 4: The system retrieves the latest version of the authentic source of
reference data. The system produces a list of differences between the local and
the authentic reference data.
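The comparison in the last step of this use case can be sketched as follows.
This is a minimal illustration and not the behaviour of any existing tool: it
assumes that both the local and the authentic reference data are available as
SKOS in RDF/XML with skos:Concept elements directly under rdf:RDF, it compares
only notation and prefLabel, and the example.org URIs and the helper names
(load_concepts, detect_changes) are hypothetical. Retrieving the authentic file
over HTTP is left out.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_concepts(rdf_xml):
    """Map each concept URI to its notation and its prefLabel per language."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for node in root.findall("skos:Concept", NS):
        labels = {
            label.get(XML_LANG): label.text
            for label in node.findall("skos:prefLabel", NS)
        }
        concepts[node.get(RDF_ABOUT)] = {
            "notation": node.findtext("skos:notation", default=None, namespaces=NS),
            "prefLabel": labels,
        }
    return concepts


def detect_changes(local, authentic):
    """List the concept URIs that were added, removed or modified at the source."""
    shared = local.keys() & authentic.keys()
    return {
        "added": sorted(authentic.keys() - local.keys()),
        "removed": sorted(local.keys() - authentic.keys()),
        "modified": sorted(uri for uri in shared if local[uri] != authentic[uri]),
    }


def wrap(body):
    # Hypothetical helper that builds a small RDF/XML document for the example.
    return '<rdf:RDF xmlns:rdf="%s" xmlns:skos="%s">%s</rdf:RDF>' % (
        NS["rdf"], NS["skos"], body)


def concept(uri, notation, label):
    # Hypothetical helper that builds one skos:Concept element.
    return (
        '<skos:Concept rdf:about="%s"><skos:notation>%s</skos:notation>'
        '<skos:prefLabel xml:lang="en">%s</skos:prefLabel></skos:Concept>'
        % (uri, notation, label)
    )


LOCAL = wrap(
    concept("http://example.org/c/A", "A", "Alpha")
    + concept("http://example.org/c/B", "B", "Beta")
)
AUTHENTIC = wrap(
    concept("http://example.org/c/A", "A", "Alpha")
    + concept("http://example.org/c/B", "B", "Beta (renamed)")
    + concept("http://example.org/c/C", "C", "Gamma")
)

changes = detect_changes(load_concepts(LOCAL), load_concepts(AUTHENTIC))
```

The resulting dictionary corresponds to the list of HTTP URIs for which a change
has occurred; a real authority file may use a different RDF layout and would
need a proper RDF parser.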
4.6.3. Use Case 2 – Manage reference data changes
DG COMP needs to manage how a validated change at an authentic source can
enter the production environment of a particular information system at DG COMP.
It is assumed here that reference data is trusted and propagated to the various
information systems (e.g. GENIS, SARI and eventually other DG COMP systems
outside of the State-aid domain) once it is in the Reference Data Building Block.
ID: Manage reference data changes
Goal: Changes to reference data are made in a controlled environment ensuring
continuity and quality
Scope and Level:
Preconditions: The system has produced a list of differences between the local
and the authentic reference data.
Success End Condition: Each change on the difference list has been fully
treated:
- it has either been applied to local reference data; or
- the difference has been discarded.
Each difference is automatically logged in a ticketing system.
Failed End Condition: Change has been denied due to semantic or syntactic errors
Primary Actor: Editor
Secondary Actors: Stakeholders – submit feedback
Priority:
Performance:
Frequency: Either:
- Ad hoc: when receiving feedback from users and/or when (new) legal obligations
arise; or
- Periodic: when changes are pooled for a planned release.
Trigger: Request for change
Other:
Description: Basic flow
Step 1: The editor has received a list of differences between the local and the
authentic reference data. This list needs to be logged as a change request.
Step 2: The editor assesses the impact of the differences between the lists on
the local dataset.
Step 3: The change is approved after a check on design rules, semantics and
syntax.
Step 4: The editor adds or edits, for instance, a country code in the Concept
Scheme and defines it.
Step 5: The editor then plans for the new version to be released to all
consumers, and the stakeholders involved are informed.
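The success condition of this use case (every difference is either applied or
discarded, and every decision is logged) can be sketched as follows. This is an
illustration under stated assumptions: the Difference class, the passes_checks
rule and the ticket_log list are hypothetical, and the list merely stands in for
a ticketing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Difference:
    concept_uri: str
    new_value: dict
    decision: Optional[str] = None  # "applied" or "discarded" once treated


def passes_checks(diff):
    """Hypothetical stand-in for the check on design rules, semantics and syntax."""
    has_http_uri = diff.concept_uri.startswith(("http://", "https://"))
    has_notation = bool(diff.new_value.get("notation"))
    return has_http_uri and has_notation


def treat_differences(differences, local_data, ticket_log):
    """Apply or discard each difference and log the decision for each one."""
    for diff in differences:
        if passes_checks(diff):
            local_data[diff.concept_uri] = diff.new_value
            diff.decision = "applied"
        else:
            diff.decision = "discarded"
        # Assumption: one log entry per difference, kept in a plain list.
        ticket_log.append({"concept": diff.concept_uri, "decision": diff.decision})
    # The use case succeeds only when no difference is left untreated.
    return all(diff.decision is not None for diff in differences)
```

In practice the approval in the basic flow is a human decision taken by the
editor; the automated check here only marks where that decision would be
recorded.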
4.6.4. Use Case 3 – Deploy reference data changes
DG COMP needs to manage the deployment / propagation of reference data
changes to its information systems.
ID: Deploy reference data changes
Goal: Propagating all changes to consumer systems in order to establish a new
stable build. All system changes go through the process of testing, acceptance
and production.
Scope and Level:
Preconditions:
- Local reference data has been configured (Concept Scheme).
- Stakeholders are informed and involved in upcoming change.
Success End Condition: Local reference data is made available – as a service –
to a client application.
Failed End Condition: Service cannot be consumed by stakeholders
Primary Actor: Software Developer
Secondary Actors: Stakeholders – testing and consuming
Priority:
Performance:
Frequency: Either:
- Ad hoc: when receiving feedback from users and/or when (new) legal obligations
arise; or
- Periodic: when changes are pooled for a planned release.
Trigger: Planned release
Other:
Description: Basic flow
Step 1: Import the new version of the reference data, for example, the countries
NAL to the test environment
Step 2: Version the concept schemes so that a new list is created and validity
(timestamp) of the data can be entered.
Step 3: Check if the local reference data passes validity rules.
Step 4: If the validity rule passes export the NAL to a test environment of an
exemplary consumer for acceptance.
Step 5: If the NAL passes testing and acceptance repeat the steps above and
deploy reference data as a service to other information systems
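The validity check and the hand-over to acceptance testing in this flow can be
sketched as follows. The rules shown are assumptions derived from the domain
model (an HTTP URI, a unique notation per scheme, one preferred label per
language, validFrom not later than validTil); the report does not prescribe a
specific rule set, and the function names and dictionary layout are
hypothetical.

```python
def validate_scheme(concepts):
    """Return a list of rule violations; an empty list means the data is valid."""
    errors = []
    seen_notations = set()
    for concept in concepts:
        uri = concept.get("uri", "")
        notation = concept.get("notation")
        if not uri.startswith(("http://", "https://")):
            errors.append(f"{uri or '<missing>'}: URI is not an HTTP URI")
        if not notation:
            errors.append(f"{uri}: notation is missing")
        elif notation in seen_notations:
            errors.append(f"{uri}: notation '{notation}' is not unique")
        else:
            seen_notations.add(notation)
        languages = [lang for lang, _ in concept.get("prefLabels", [])]
        if len(languages) != len(set(languages)):
            errors.append(f"{uri}: more than one prefLabel for a language")
        valid_from = concept.get("validFrom")
        valid_til = concept.get("validTil")
        # ISO 8601 date strings compare correctly as plain strings.
        if valid_from and valid_til and valid_from > valid_til:
            errors.append(f"{uri}: validFrom is later than validTil")
    return errors


def promote_to_acceptance(concepts, version):
    """Release a new scheme version to acceptance only if all rules pass."""
    errors = validate_scheme(concepts)
    status = "rejected" if errors else "ready-for-acceptance"
    return {"version": version, "status": status, "errors": errors}
```

Only a result with the status "ready-for-acceptance" would be exported to the
test environment of a consumer; a rejected result goes back to the editor.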
4.7. Assessment of proposed tooling for reference data management
This section proposes a set of tools for managing reference data. It is indicated
which steps of the reference data process are supported by which tools. All
requirements can be supported by existing tools described in Section 4.3, some of
which are already being used within the EC.
Requirement JIRA GENIS RDC VocBench Joinup Silk
T1 Edit
Import reference data from external source x x x
CRUD ConceptScheme x x
Multilingualism x x
Order of concepts x x
Versioning x x
Export x x
T2 Changes
Log changes x
Keeping track of impact analysis x
Log decisions x
Create release notice x
Linking change requests to release notes x
Linking change requests to versions x
T3 Propagate
Deploy as a service x
Deliver services while disconnected x
Provision all versions x
T4 Publication
Read-access over HTTP x
Write-access over WebDAV or Subversion x
T5 Harmonisation
Mappings x
Link discovery x
4.8. Recommendations for the GENIS RDC – E2E implementation example
Based on the inventory of existing requirements, needs and tools, it can be
concluded that GENIS RDC could fit as a deployment tool. Other standard tools
are already available for editing, change management and publication. We
therefore make the following recommendations to demonstrate how the pieces can
fit together:
Consider using existing editors such as VocBench: Investigate the
possibility of using VocBench as an editor for the reference data and focus
future development effort for the GENIS Reference Data Component (RDC)
on its deployment features only; “reference data as a service”;
Consider using a standard representation format such as SKOS-XL:
Align the versioning of the reference data with good practices from the
Publications Office and standards such as SKOS(-XL);
Provide an import and export feature for reference data in SKOS-XL format;
Consider attributing persistent HTTP URIs: Include URIs for concept
schemes and concepts in the reference data and align them with the
(informal) rules for persistent URIs of the URI Task Force of the European
Commission (cf. SEMIC Deliverable D3.2, see footnote 26);
Consider integration with a workflow automation tool like Activiti
(i.e. integration with the management aspects of reference data); and
Consider integration with an ESB like Mule ESB or Mule AnyPoint for
connecting to various stakeholders with specific interface requirements,
and/or cloud deployment in case even broader access is desirable.
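To make the SKOS-XL export recommendation more concrete, the sketch below
serialises one concept and its preferred labels as SKOS-XL in Turtle syntax. It
is a minimal illustration, not the export format of GENIS RDC or VocBench: the
URI pattern (scheme URI, then notation, then /label/ and the language) and the
function names are assumptions, and the persistent URI rules referred to above
would determine the real pattern.

```python
import json

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .\n"
)


def quote(text):
    # JSON string escaping is used here as an approximation of Turtle escaping.
    return json.dumps(text, ensure_ascii=False)


def concept_to_skosxl_turtle(scheme_uri, notation, pref_labels):
    """Serialise a concept with one skosxl:Label resource per language."""
    if not pref_labels:
        raise ValueError("at least one preferred label is required")
    concept_uri = f"{scheme_uri}/{notation}"  # assumption: hypothetical URI pattern
    label_uris = {lang: f"{concept_uri}/label/{lang}" for lang in sorted(pref_labels)}
    label_refs = " ,\n        ".join(f"<{uri}>" for uri in label_uris.values())
    out = [
        f"<{concept_uri}> a skos:Concept ;",
        f"    skos:inScheme <{scheme_uri}> ;",
        f"    skos:notation {quote(notation)} ;",
        f"    skosxl:prefLabel {label_refs} .",
    ]
    for lang, uri in label_uris.items():
        out.append(f"<{uri}> a skosxl:Label ;")
        out.append(f"    skosxl:literalForm {quote(pref_labels[lang])}@{lang} .")
    return PREFIXES + "\n".join(out) + "\n"
```

Calling concept_to_skosxl_turtle("http://example.org/scheme/country", "XA",
{"en": "Example"}) returns a small Turtle document in which the label is a
resource of its own, which is what distinguishes SKOS-XL from plain SKOS labels.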
Possible integration solutions
Figures 7 and 8 give a graphical overview of how the tools mentioned above can
fit together and how GENIS could fit into such an overall approach.
Figure 7 shows the functional responsibilities of the different blocks and
how they collaborate;
Figure 8 shows how they can be mapped to components/tools like GENIS.
With the above recommendations, GENIS could become an integral part of an
overall semantic platform/approach within the Commission.
26 D3.2 Common approach for the management of URIs by EU institutions
https://webgate.ec.europa.eu/CITnet/confluence/x/8AHgDw
Figure 7 - Overview (functional blocks)
Figure 8: Overview (example implementation)
The figures and their lanes are explained below:
Governance Lane:
Governance is extensively covered in this document. It is the only ‘non-tangible’
lane in this overview (hence the chalked line). Yet governance drives the other
lanes, which do have a counterpart in software. The products of governance are
policies and principles, which should be implemented in the other lanes.
Management Lane:
Reference data management component
This component logically follows on from governance. In other words, the
principles and procedures are translated into the management of reference data.
This management takes the form of the different workflow processes also defined
in this document. A workflow tool coordinates the processes that need to be
carried out and involves all stakeholders. For example, it defines a workflow
for creating a new controlled vocabulary, or for adding elements to existing
vocabularies. It typically also keeps versions of these workflows and an audit
trail for Business Intelligence reporting. Ideally, it also allows for call-outs
from within the workflow tool to different parts of the overall architecture, in
order to realise an integrated approach.
Reference data editor Component
This component is the place where CRUD (Create/Read/Update/Delete) operations
on reference data are executed once a decision is made in the Workflow
Component. At its backend, it will have to interface with a variety of existing
data stores (NAs) and middleware like Apache Jena or Semantic Turkey to cover
RDF/OWL/SKOS functionality. Because Jena, for example, cannot by itself
accommodate every backend, an ESB can bridge the gap.
Consumption / deployment Lane
This lane makes sure that a variety of customers can access the semantic content
they need and integrate it into their own content. The ESB can be used again to
address the differences between the storage format (SQL/native RDF/…) and the
format clients want their data in (XML, JSON, native RDF/SKOS …).
To ensure a separation of concerns and to decouple the front-end from the
backend, it is advisable to place an intermediary layer between them. It is
called a Façade because that is its main purpose. A decision to be made is how
accessible this layer should be. For example, if public access is desirable, an
always-on off-site cloud solution like Mule AnyPoint can offer the same
flexibility as a local Mule ESB while adding cloud-hosting features.
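The role of the Façade, serving the same reference data in the format each
client asks for, can be illustrated with the short sketch below. It does not
describe Mule ESB or the GENIS External Service Layer; the function name, the
dictionary layout of a concept and the XML element names are assumptions made
for the example.

```python
import json
import xml.etree.ElementTree as ET


def render_concept(concept, fmt):
    """Return one concept in the representation requested by the client."""
    if fmt == "json":
        return json.dumps(concept, sort_keys=True)
    if fmt == "xml":
        # Assumption: a simple, hypothetical XML layout for illustration.
        root = ET.Element("concept", {"uri": concept["uri"]})
        ET.SubElement(root, "notation").text = concept["notation"]
        for lang, text in sorted(concept["prefLabel"].items()):
            ET.SubElement(root, "prefLabel", {"lang": lang}).text = text
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

The storage side stays hidden behind this single entry point, so a new client
format can be added without touching the backend.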
For an integrated approach, each of the blocks can be mapped either to custom
development or to the configuration/extension of existing tools like the ones
mentioned earlier in section 4.3 of this document. Looking at the studied
requirements, the latter approach seems to fit:
The reference data editor role could be assumed by either GENIS RDC or
VocBench. GENIS RDC is Java-based and can be reused by other systems as
a plugin, via web services, via an API, or using a dedicated client. VocBench
is a Java-based open-source tool, which means it can work together
seamlessly with a workflow engine like Activiti.
Cf. recommendations for RDC: for deployment needs, GENIS could be used,
as it is Java-based as well (so it can be made interoperable with all other
aspects of the setup) and already features an External Service Layer to
accommodate the needs of various clients. For parts it does not cover yet,
the ESB can be implemented. This is why the recommendations to further
develop GENIS focus on these two aspects of GENIS.
5. CONCLUSIONS
This report elaborates on the tailoring of a methodology for the management and
governance of reference data for the State-aid information of DG COMP in which
the Commission exchanges information both internally (with DG AGRI, DG MARE
and Eurostat) and with European public administrations in all Member States.
The following approach was followed:
Stakeholder requests and needs were identified;
A solution for the governance and management of reference data was specified;
It was assessed whether existing tools, including GENIS RDC as a main
component, meet the identified requests and needs; and
Recommendations for further development of GENIS RDC were given.
Solution for governance and management:
There are many existing standards and methodologies for metadata governance and
metadata management. In terms of governance, we have derived the following
models from existing solutions:
For the local level, we have identified a governance structure consisting
of a steering committee, a working group and stakeholder involvement.
For the inter-institutional level, the IMMC can serve as inspiration.
On a trans-European level, comitology procedures need to be taken into
account.
We have determined that both reference data specifications under metadata
governance and related documentation should have an authoritative source.
The use of persistent Uniform Resource Identifiers (HTTP URIs) for reference
data releases can make it easier to manage an authoritative source.
In terms of data management, we have identified best practices from the
DAMA-DMBOK, the Publications Office and ITIL, and found that these existing
management practices can be applied well to manage structural metadata as
described in chapter 3.3.
Support by existing tools and recommendations
It is concluded that GENIS RDC is a well-placed tool that can be used for
editing and propagating data, and can perhaps play a part in change management.
Many tools are available that could complement GENIS RDC, such as VocBench, in
order to fulfil the needs and requirements listed in this document. In
Section 4.8 we formulated the following recommendations:
Consider using the tools mentioned in the categorisation, as they fulfil the
requirements and are also widely used within the EC;
Consider using a standard representation format such as SKOS-XL;
Consider providing an import and export feature for reference data in
SKOS-XL format;
Consider attributing persistent HTTP URIs; and
Consider the use of integration tools such as Mule ESB, combined with a
workflow automation tool such as Activiti.
6. ACKNOWLEDGEMENTS
Specific acknowledgement is due to:
Person Organisation
Jesper Abrahamsen European Commission, DG COMP
Julian-Daniel Jimenez-Krause European Commission, DG COMP
Manuel Perez-Espin European Commission, DG COMP
Roberto Atienza European Commission, DG COMP
Carsten Schott European Commission, DG COMP (external consultant)
ANNEX I STATE-AID REFERENCE DATA SETS
The table below lists the reference data relevant to State-aid control that is
maintained by DG COMP. It is understood that reference data is best kept and
maintained at source, in a federated model where consumers are assured of the
quality of the data because it is maintained directly by the business owner
(e.g. the Publications Office may be responsible for some data core to the
business of the Commission, while other DGs similarly provision other reference
data).
Table 6 – State Aid reference data
TABLE NAME AUTHENTIC SOURCE
ACCELERATED PROCEDURE TYPE
AGRI DESCRIPTIO OTHER
AGRI DESCRIPTION SUB-TYPE
AGRI DESCRIPTION TYPE
BENEFICIARY NUMBER
BENEFICIARY SIZE
CARTOUCH DESCRIPTORS
CASE BACKGROUND LINK TYPE
CASE CATEGORY
CASE CRITERIA
CASE PLANNING STEPS
CASE TYPE
CLASSIFICATION
CLASSIFICATION PLAN
COMPLAINANT TYPE
COMPLAINT TYPE
COMPLAINTS – MEANS OF CLOSURE
COMPLAINTS – REASON FOR CLOSURE
COMPLAINTS – REASON FOR NON CLOSURE
COUNTRY
CR EU COURT
CR RECENT EVENT
CR STATUS
CURRENCY
DECISION TYPE
DECISIONAL PROCEDURE TYPE
DG
EMPOWERMENT
GBER BENEFICIARY
INTERNAL QUALIFIER
LEGAL BASIS
LANGUAGE
MC CONSDITION STATUS
MC STATUS
NACE CODE
OBJECTIVE Commission Regulation (EC) No 794/2004 of 21 April 2004
PRIMARY LAW
PRIORITY
PROCEDURE KEY STEP
PROCEDURE NEXT STEP
PROCEDURE TYPE
REGION
REGIONAL AID
RETENTION LIST
SECONDARY EMPOWERMENT
SECONDARY LAW 2
SECONDARY LAW 3
STATE AID INSTRUMENT
SUB DOMAINS
TYPE_OF_AID
UNIT
WORKLOAD
ACCELERATED PROCEDURE TYPE
ANNEX II METADATA REGISTRY OF THE PUBLICATIONS OFFICE (MDR)
The Metadata Registry (MDR) of the Publications Office [27] of the EU is the
authoritative source for definition data – metadata elements, named authority lists,
schemas, etc. – and authority data used for exchanging data between institutions
involved in the legal decision-making process. Many of the definition data sets
contained in the MDR are governed by the Inter-Institutional Metadata Maintenance
Committee (IMMC).
The Publications Office uses a tool chain and some scripts to edit the Named
Authority Lists (NALs). For each NAL, it publishes a set of distributions that
can be downloaded from the MDR website. Each set comprises a SKOS, an XML, an XSD
and an HTML version.
A publication package is also available as a zip file. It contains the distributions
of the changed NALs (XML, SKOS, ATTO-XML [28]), a comparison file that identifies
the differences between the previous and the current version, and the release notes
listing the changes to the NALs included in the publication.
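The contents of such a package can be illustrated with a minimal sketch in Python.
It is not the Publications Office's tooling: the function names, the folder layout
and the file names (comparison.html, release-notes.txt) are assumptions made for
illustration only.

```python
import io
import zipfile


def build_publication_package(distributions, comparison_report, release_notes):
    """Return the bytes of a zip archive for one publication.

    distributions: {nal_name: {file_name: content}} for the changed NALs only.
    comparison_report, release_notes: text or bytes (hypothetical formats).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # One folder per changed NAL, holding its distribution files.
        for nal_name, files in sorted(distributions.items()):
            for file_name, content in sorted(files.items()):
                archive.writestr(f"{nal_name}/{file_name}", content)
        # File names below are assumed, not taken from the MDR.
        archive.writestr("comparison.html", comparison_report)
        archive.writestr("release-notes.txt", release_notes)
    return buffer.getvalue()


def list_package(package_bytes):
    """List the entries of a package produced by build_publication_package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return sorted(archive.namelist())
```

Building the archive in memory keeps the sketch free of side effects; a real
release would more likely write the file to disk and commit it alongside the
versioned sources.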
The architecture:
In the past, the publication process was time-consuming and error-prone.
Moreover, the technologies involved in this process were not portable and were
complex to maintain.
The PO decided to improve this process by implementing a cross-platform,
licence-free and easily maintainable solution.
The solution combines three components: JIRA, to manage the validation workflow;
a transformation tool (XML technologies and the Perl programming language), to
transform the files; and a version control system (SVN), to maintain the current
and historical versions of the files.
The validation workflow:
The Publications Office uses JIRA to manage the validation workflow. Three roles
are defined in the workflow:
NAL operator: in charge of maintaining the Named Authority Lists. They can
open a ticket in JIRA in order to update, create or delete an item in the list.
NAL technician: responsible for running the script that transforms the files;
they also produce the diff report and the publication package for the release.
NAL authority: in charge of validating the contents before the release.
The workflow is summarised below:
1. The NAL operator receives an external request to create or update a NAL;
27 http://publications.europa.eu/mdr/
28 http://publications.europa.eu/mdr/authority/
2. The NAL operator creates a ticket in JIRA; a notification is sent to the NAL
technician and the NAL authority;
3. The NAL operator checks out the Excel file from the SVN repository;
4. The NAL operator updates the Excel file and checks it back in to the SVN
repository;
5. The NAL operator changes the ticket status in JIRA; the NAL technician and the
NAL authority are notified;
6. The NAL technician launches the transformation process with the tool that
generates the XML, SKOS and HTML files;
7. The NAL technician also launches the diff report, which compares the current
XML version with the new one; the report is generated in Excel and HTML;
8. The NAL technician updates the ticket status in JIRA; the NAL operator and the
NAL authority are notified, and the report is sent to both of them;
9. The NAL operator checks the report and validates it. If they detect an error,
the process restarts from step 3;
10. The NAL operator updates the ticket status in JIRA; the NAL technician and the
NAL authority are notified;
11. The NAL authority also checks the diff report and gives the final validation.
If they detect an error, the process restarts from step 3;
12. The NAL authority updates the ticket status; the NAL operator and the NAL
technician are notified;
13. The NAL technician prepares the release notes and the release package, and
sends them to the technical team in charge of the deployment;
14. The NAL technician closes the ticket; the NAL operator and the NAL authority
are notified.
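The steps above can be read as a small state machine in which each step belongs to
one role and the two validation steps (9 and 11) may send the ticket back to step 3.
The following Python sketch illustrates that reading only. It does not use or
reproduce the JIRA configuration; the class, constant and role names are
hypothetical, and notifications are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# (step number, acting role, short description), following the numbered list.
STEPS = [
    (1, "operator", "receive request to create or update a NAL"),
    (2, "operator", "create JIRA ticket"),
    (3, "operator", "check out Excel file from SVN"),
    (4, "operator", "update Excel file and check it in"),
    (5, "operator", "change ticket status"),
    (6, "technician", "run transformation"),
    (7, "technician", "run diff report"),
    (8, "technician", "update ticket status and send report"),
    (9, "operator", "check and validate report"),
    (10, "operator", "update ticket status"),
    (11, "authority", "check report and give final validation"),
    (12, "authority", "update ticket status"),
    (13, "technician", "prepare release notes and package"),
    (14, "technician", "close ticket"),
]
REVIEW_STEPS = {9, 11}  # steps at which an error can be detected
REWORK_STEP = 3         # the process restarts here after an error


@dataclass
class Ticket:
    step: Optional[int] = 1  # None once the ticket is closed
    history: List[int] = field(default_factory=list)

    def advance(self, role: str, error_found: bool = False) -> None:
        """Perform the current step as `role` and move to the next one."""
        if self.step is None:
            raise RuntimeError("ticket is already closed")
        number, expected_role, _action = STEPS[self.step - 1]
        if role != expected_role:
            raise PermissionError(
                f"step {number} belongs to the NAL {expected_role}"
            )
        if error_found and number not in REVIEW_STEPS:
            raise ValueError("errors can only be reported at a validation step")
        self.history.append(number)
        if error_found:
            self.step = REWORK_STEP
        elif number == len(STEPS):
            self.step = None
        else:
            self.step = number + 1
```

For example, calling `advance("operator", error_found=True)` while the ticket is at
step 9 returns it to step 3, after which steps 3 to 9 are performed again.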
The execution workflow:
The execution workflow covers the technical steps that transform, compare and
package the NAL publication. It is summarised in the following schema:
Figure 9 – Schematic overview of how the Publications Office edits an XML file and generates
all distributions of Named Authority Lists (NALs)
[Figure: the transformation engine (execution workflow in XML, XSLT files, Perl)
takes an XLS input file and produces XML, XSD, SKOS and HTML output files.]
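The transform and compare steps can be illustrated with a short sketch. The
Publications Office uses XSLT and Perl; the example below substitutes the Python
standard library purely for illustration. The element names (nal, item, label),
the row format standing in for the Excel sheet, and the function names are all
assumptions and do not reflect the actual MDR schemas.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(nal_name, rows):
    """Transform spreadsheet-like rows into a (hypothetical) NAL XML document.

    rows: list of {"code": ..., "label": ...} dicts, standing in for the
    rows of the Excel source file.
    """
    root = ET.Element("nal", {"name": nal_name})
    for row in rows:
        item = ET.SubElement(root, "item", {"code": row["code"]})
        ET.SubElement(item, "label").text = row["label"]
    return ET.tostring(root, encoding="unicode")


def index_by_code(xml_text):
    """Map each item code to its label."""
    root = ET.fromstring(xml_text)
    return {
        item.get("code"): item.findtext("label") for item in root.iter("item")
    }


def diff_report(current_xml, new_xml):
    """Compare the current XML version with the new one, item by item."""
    old, new = index_by_code(current_xml), index_by_code(new_xml)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            code for code in old.keys() & new.keys() if old[code] != new[code]
        ),
    }
```

Comparing by item code rather than line by line keeps the report stable when items
are merely reordered, which is one plausible reason to diff the XML rather than the
spreadsheet itself.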