
Date: 03/09/2015

SC17DI06692

D4.3 Report on implementation of a Metadata Management pilot for DG COMP


Reference data governance and management at DG COMP

03/09/2015 Page i

Document Metadata

Property Value

Date 2014-06-07

Status Accepted

Version 1.00

Authors

Stijn Goedertier – PwC EU Services

Gerben Hoogeboom – PwC EU Services

Philippe Lamote – PwC EU Services

Nikolaos Loutas – PwC EU Services

Brecht Wyns – PwC EU Services

Reviewed by Pieter Breyne – PwC EU Services

Approved by

Jesper Abrahamsen – European Commission, DG COMP

Julian-Daniel Jimenez-Krause – EC, DG COMP

Athanasios Karalopoulos – European Commission, DG DIGIT

Vassilios Peristeras – European Commission, DG DIGIT

This study was prepared for the ISA Programme by:

PwC EU Services

Disclaimer:

The views expressed in this report are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission.

The European Commission does not guarantee the accuracy of the information included in this study, nor does it accept any responsibility for any use thereof.

Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission.

All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

“PwC” is the brand under which member firms of PricewaterhouseCoopers International Limited (PwCIL) operate and provide services. Together, these firms form the PwC network. Each firm in the network is a separate and independent legal entity and does not act as agent of PwCIL or any other member firm.


Document History

Version | Date | Description | Action

0.01 | 2013-12-17 | Template & Table of Contents | Creation

0.02 | 2014-01-08 | Desk research based on previous communication and business case | Update

0.03 | 2014-01-16 | Updates based on meeting with DG COMP of the 15 January 2014 | Update

0.04 – 0.06 | 2014-01-21 | Internal review | Update

0.07 | 2014-01-30 | Updates based on received input from the Conference Call of 29 January 2014 | Update

0.08 | 2014-02-11 | Updates based on received input from the Conference Call of 29 January 2014 | Update

0.09 – 0.10 | 2014-02-11 | Internal review | Update

0.11 | 2014-02 | New structure based on the document, domain model & schema documentation updated. | Update

0.12 | 2014-03-10 | Internal review | Review

0.13 | 2014-03-10 | Updates throughout the document: stakeholder requirements, best practices and SKOS description | Update

0.14 | 2014-03-11 | | Update

0.15 | 2014-03-12 | | Update

0.16 | 2014-04-13 | | Review

0.17 | 2014-03-13 | | Update

0.18 – 0.22 | 2014-03-18 – 2014-03-26 | Restructuring; updates throughout the document | Update

0.23 | 2014-03-31 | Updates on the governance model | Update

0.24 | 2014-04-02 | Updates on the domain model | Update

0.25 – 0.27 | 2014-04-02 – 2014-04-03 | Updates on the description of GENIS Reference Data Component | Update

0.28 | 2014-04-08 | Updates in structure of section 4: requirements and specifications for reference data tools | Update

0.29 | 2014-04-08 | Updates in section 3 | Update


0.30 | 2014-04-08 | Internal review | Review

0.31 | 2014-04-08 | Updates in sections 3 and 4 | Update

0.32 | 2014-04-08 | Updated sections 2 and 3 | Update

0.33 | 2014-04-15 | Updated governance | Update

0.34 – 0.35 | 2014-04-16 | Updated tools | Update

0.36 | 2014-04-17 | Internal review | Review

0.37 | 2014-04-18 | Comments processed | Update

0.38 | 2014-04-18 | Delivered for review | Review

0.39 | 2014-04-18 | Updated change management and tools | Update

0.40 – 0.41 | 2014-04-18 | Updated chapters 4 and 5 | Update

0.42 | 2014-04-30 | General review by Jesper Abrahamsen | Review & Update

0.43 | 2014-05-06 | Elaboration of GENIS recommendations and end-to-end example approach | Review & Update

0.44 – 0.45 | 2014-05-07 | Internal review | Review

0.46 | 2014-05-08 | Delivered for acceptance | Delivered

0.47 | 2014-05-28 | Delivered for acceptance | Delivered

0.47 | 2014-06-04 | Review by Athanasios Karalopoulos | Review

0.48 | 2014-06-05 | Addressing review comments by Athanasios Karalopoulos | Update

0.49 | 2014-06-06 | Delivered for acceptance | Delivered

0.50 | 2014-06-07 | Delivered for acceptance | Delivered

1.00 | 2014-06-07 | Accepted | Accepted


Contents

EXECUTIVE SUMMARY .... 1
1. INTRODUCTION .... 3
1.1. CONTEXT: STATE-AID CONTROL .... 3
1.2. DEFINITION: REFERENCE DATA .... 4
1.3. BUSINESS NEED .... 5
1.4. EXPECTED BENEFITS .... 5
1.5. APPROACH .... 6
1.6. STAKEHOLDERS AND ROLES .... 6
1.7. GLOSSARY .... 7

2. REQUIREMENTS AND SPECIFICATIONS FOR REFERENCE DATA GOVERNANCE .... 9
2.1. STAKEHOLDER REQUESTS AND NEEDS .... 9
2.2. EXISTING SOLUTIONS FOR REFERENCE DATA GOVERNANCE .... 12
2.2.1. ISA Committee and ISA Coordination Group .... 12
2.2.2. Inter-Institutional Metadata Maintenance Committee (IMMC) .... 12
2.2.3. ISO 11179-6 Metadata Registration .... 12
2.2.4. Data Management Body of Knowledge (DM-BOK) .... 13
2.3. SPECIFICATION OF METADATA GOVERNANCE .... 13
2.3.1. Scope .... 13
2.3.2. Organisational structure .... 15
2.3.3. Decisions .... 18
2.3.4. Authoritative source .... 19
2.3.5. Licensing framework .... 20
2.3.6. Enforcement .... 21
2.3.7. Continuous improvement .... 21

3. REQUIREMENTS AND SPECIFICATIONS FOR REFERENCE DATA MANAGEMENT .... 22
3.1. STAKEHOLDER REQUESTS AND NEEDS .... 22
3.2. EXISTING METHODOLOGIES FOR REFERENCE DATA MANAGEMENT .... 24
3.2.1. Data Management Body of Knowledge (DM-BOK) .... 24
3.2.2. ISO 11179-6 Metadata Registration .... 24
3.2.3. ISO 19135:2005 Geographic information -- Procedures for item registration .... 25
3.2.4. Information Technology Infrastructure Library (ITIL) .... 25
3.2.5. Good practices from the Publications Office: integrating Reference Data Management in the Software Development Lifecycle .... 25
3.3. SPECIFICATION FOR METADATA MANAGEMENT .... 27
3.3.1. Design structural metadata .... 27
3.3.2. Manage change of structural metadata .... 27
3.3.3. Harmonise structural metadata .... 30
3.3.4. Release structural metadata .... 32
3.3.5. Deploy structural metadata .... 35
3.3.6. Retire structural metadata .... 36

4. REQUIREMENTS FOR AND ASSESSMENT OF EXISTING REFERENCE DATA TOOLS .... 38
4.1. STAKEHOLDER REQUESTS AND NEEDS .... 38
4.2. EXISTING STANDARDS FOR REFERENCE DATA MANAGEMENT .... 39
4.2.1. Representation: Simple Knowledge Organisation System (SKOS) .... 39
4.2.2. Representation: GeneriCode .... 41
4.2.3. Representation: Using HTTP URIs to identify concept schemes and concepts .... 41
4.2.4. Description: Asset Description Metadata Schema (ADMS) .... 41
4.3. EXISTING TOOLS FOR REFERENCE DATA MANAGEMENT .... 42
4.3.1. Publication: Joinup .... 42
4.3.2. Publication: Metadata Registry of the Publications Office (MDR) .... 43
4.3.3. Editor / Propagation: GENIS Reference Data Component (GENIS RDC) .... 43
4.3.4. Editor: VocBench .... 45
4.3.5. Editor: PoolParty: Thesaurus Management .... 45
4.3.6. Editor: Silk workbench (link discovery) .... 46
4.3.7. Workflow Management tool: Activiti .... 46
4.3.8. Change management: Atlassian JIRA .... 46
4.3.9. Deployment: Mule .... 47
4.3.10. Editor / Deployment: Jena .... 47
4.4. DOMAIN MODEL .... 47
4.5. DATA FLOW DIAGRAM .... 49
4.6. HIGH-LEVEL USE CASES .... 49
4.6.1. Use Case 0 – Edit an authentic source of reference data .... 50
4.6.2. Use Case 1 – Detect reference data changes .... 50
4.6.3. Use Case 2 – Manage reference data changes .... 51
4.6.4. Use Case 3 – Deploy reference data changes .... 52
4.7. ASSESSMENT OF PROPOSED TOOLING FOR REFERENCE DATA MANAGEMENT .... 53
4.8. RECOMMENDATIONS FOR THE GENIS RDC – E2E IMPLEMENTATION EXAMPLE .... 54

5. CONCLUSIONS .... 58
6. ACKNOWLEDGEMENTS .... 59
BIBLIOGRAPHY .... 60
ANNEX I STATE-AID REFERENCE DATA SETS .... 67
ANNEX II METADATA REGISTRY OF THE PUBLICATIONS OFFICE (MDR) .... 69

List of Tables

Table 1 – Stakeholders .... 6
Table 2 – Glossary .... 7
Table 3 – Stakeholder requests: reference data management .... 9
Table 4 – Stakeholder requests and needs: reference data management .... 22
Table 5 – Reference data tools .... 38
Table 6 – State Aid reference data .... 67

List of Figures

Figure 1 – Overview of systems involved in State-aid control .... 4
Figure 2 – Organisation structures .... 13
Figure 3 – Illustration: objectives of State-aid control as defined in Commission Regulation (EC) No 794/2004 .... 18
Figure 4 – UML Static Diagram: Domain Model for reference data (based on SKOS-XL) .... 49
Figure 5 – Simplified DFD for the flow of data between authentic source and GENIS .... 49
Figure 6 – High-level use cases for metadata management .... 50
Figure 7 – Overview (functional blocks) .... 56
Figure 8 – Overview (example implementation) .... 56
Figure 9 – Schematic overview of how the Publications Office edits an XML file and generates all distributions of Named Authority Lists (NALs) .... 70


EXECUTIVE SUMMARY

This report is commissioned by the Interoperability Solutions for European Public Administrations (ISA) Programme of the European Commission, in the context of its Action 1.1 on semantic interoperability. It involves the tailoring of a methodology for the management and governance of reference data, based on the proposed methodology in D4.2 ‘Methodology and tools for Metadata Governance and Management for EU Institutions’, for the State-aid information systems (register of planned State-aid) of DG COMP in which the Commission exchanges information both internally (with DG AGRI, DG MARE and Eurostat) and with European public administrations in all Member States. It also assesses the extent to which the Generic Interoperable Notification Services (GENIS) Reference Data Component (RDC) can support the reference data governance and management processes.

The pilot followed this approach:

• Elicit and validate the specific requirements for reference data management and governance for DG COMP in the context of State-aid control;

• Identify existing solutions for managing and governing reference data, based on input from the Publications Office and deliverable D4.1 ‘Metadata management requirements and existing solutions in EU Institutions and Member States’;

• Specify a solution for the management and governance of reference data that is based on standards and consistent with D4.2 ‘Methodology and tools for Metadata Governance and Management for EU Institutions’, and demonstrate its applicability and feasibility; and

• Assess the extent to which existing tools, including the GENIS Reference Data Component, cover the identified requirements and the proposed approach, thereby identifying gaps and assessing usefulness and fitness for purpose.

Governance and Management

Chapters 2 and 3 cover the requirements and specifications for reference data governance and reference data management, respectively.

In terms of governance, we have derived several models from existing solutions:

• At the local level, we have identified a governance structure consisting of a steering committee, a working group and stakeholder involvement.

• At the inter-institutional level, the Inter-Institutional Metadata Maintenance Committee (IMMC) can serve as inspiration.

• At the trans-European level, comitology procedures need to be taken into account.

We have determined that both the reference data specifications under metadata governance and their related documentation should have an authoritative source. Using persistent HTTP Uniform Resource Identifiers (HTTP URIs) for reference data releases can make it easier to manage such an authoritative source.
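A minimal sketch of one way to apply persistent HTTP URIs, written in Python under stated assumptions: the base URI `http://example.org/ref`, the path layout and the `ReleaseRegistry` class are hypothetical and do not reflect any EU URI policy. The sketch separates a version-independent URI for the concept scheme, one URI per immutable release, and release-independent concept URIs, and it refuses to let a published release URI identify different content.

```python
import hashlib
from urllib.parse import quote

# Hypothetical base URI: a real deployment would use a stable domain
# controlled by the owner of the authoritative source.
BASE = "http://example.org/ref"


def scheme_uri(scheme_id: str) -> str:
    """Version-independent URI that denotes the concept scheme as a whole."""
    return f"{BASE}/{quote(scheme_id)}"


def release_uri(scheme_id: str, version: str) -> str:
    """URI of one immutable release of the concept scheme."""
    return f"{scheme_uri(scheme_id)}/release/{quote(version)}"


def concept_uri(scheme_id: str, code: str) -> str:
    """Concept URI kept stable across releases, so references to it stay valid."""
    return f"{scheme_uri(scheme_id)}/concept/{quote(code)}"


class ReleaseRegistry:
    """Hypothetical register that keeps release URIs persistent."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}  # release URI -> SHA-256 of its content

    def publish(self, scheme_id: str, version: str, content: bytes) -> str:
        uri = release_uri(scheme_id, version)
        digest = hashlib.sha256(content).hexdigest()
        known = self._digests.get(uri)
        if known is not None and known != digest:
            # Persistence: a published URI must never be reused for other content.
            raise ValueError(f"{uri} already identifies different content")
        self._digests[uri] = digest
        return uri
```

In this sketch, republishing identical content under the same version is harmless, while an attempt to rebind a release URI to changed content raises an error; consumers can therefore pin a release URI and still follow concept URIs across releases.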

In terms of data management, we have identified best practices from DM-BOK, the Publications Office and ITIL, and found that these existing practices apply well to the management of structural metadata, as described in Section 3.3.


Tools

Chapter 4 sets out the requirements for reference data tools and assesses existing ones. It lists the main use cases and ends by briefly identifying which functionality could be covered by which tool, and by presenting an example of a possible overall approach that demonstrates how the tools can work together.

Having identified governance and management procedures, standards and best practices, we made an inventory of existing tools that can support them. We conclude that GENIS RDC is well placed to be used for editing and propagating reference data and could play a part in change management, and that many available tools could complement GENIS RDC to fulfil the needs and requirements listed in this document. The tools can be categorised as follows:

• Editing: GENIS RDC or VocBench could be used as editing and workflow tools for managing thesauri, authority lists and glossaries based on SKOS RDF.

• Change management: a dedicated component could be used to manage and track changes. Alternatively, GENIS RDC could be extended with more change management functionality.

• Deployment: GENIS RDC is already used for deployment, and its functionality could be expanded as explained in Section 4.8.

• Publication: the Joinup platform of the ISA Programme can be used as the publication source. The structural metadata can be represented using SKOS RDF and described using the Asset Description Metadata Schema (ADMS RDF).

• Harmonisation: tools for mapping reference data or interlinking two data sources.

Each tool is tailored to the needs and requirements of its domain (e.g. editing, change management, deployment, publication). The tools therefore need to be integrated to allow automated exchange between them; for example, change management and workflow tools cover the entire process and need to keep track of what happens in the editing tools.
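To illustrate the kind of information a change management or workflow tool would need to receive from an editing tool, the following Python sketch compares two releases of a code list and reports added, removed and relabelled codes. Representing a release as a plain mapping from code to label, the `ChangeSet` and `diff_releases` names, and the aid-instrument codes used in the test are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    """Differences between two releases of one code list."""
    added: dict[str, str]                   # code -> label in the new release
    removed: dict[str, str]                 # code -> label in the old release
    relabelled: dict[str, tuple[str, str]]  # code -> (old label, new label)

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.relabelled)


def diff_releases(old: dict[str, str], new: dict[str, str]) -> ChangeSet:
    """Compare two releases, each given as a mapping from code to label."""
    added = {code: new[code] for code in new.keys() - old.keys()}
    removed = {code: old[code] for code in old.keys() - new.keys()}
    relabelled = {
        code: (old[code], new[code])
        for code in old.keys() & new.keys()
        if old[code] != new[code]
    }
    return ChangeSet(added, removed, relabelled)
```

A change set like this could, for example, be attached to a change request, so that reviewers approve concrete additions, removals and relabellings rather than a whole new file.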

The following is therefore recommended:

• Consider using the tools mentioned in the categorisation above, as they fulfil the requirements and are widely used within the EC;

• Consider using a standard representation format such as SKOS-XL;

• Consider providing an import and export feature for reference data in SKOS-XL format;

• Consider assigning persistent HTTP URIs to reference data; and

• Consider using an integration tool such as the Mule ESB, combined with a workflow automation tool such as Activiti.
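As an illustration of an SKOS-XL export, the following Python sketch serialises a concept scheme as Turtle using only the standard library. The scheme and concept URIs, the `/label/{lang}` pattern for label resources and the input structure are hypothetical; a production exporter would more likely rely on an RDF library. Each preferred label becomes a `skosxl:Label` resource with its own URI, linked from the concept via `skosxl:prefLabel` and carrying the text in `skosxl:literalForm`.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"


def turtle_literal(text: str, lang: str) -> str:
    """Language-tagged Turtle string literal with the required escapes."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def to_skosxl_turtle(scheme: str, concepts: dict[str, dict[str, str]]) -> str:
    """Serialise {concept URI: {language: preferred label}} as SKOS-XL Turtle."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix skosxl: <{SKOSXL}> .",
        "",
        f"<{scheme}> a skos:ConceptScheme .",
    ]
    for concept, labels in concepts.items():
        lines += ["", f"<{concept}> a skos:Concept ;", f"    skos:inScheme <{scheme}> ."]
        for lang, text in sorted(labels.items()):
            label = f"{concept}/label/{lang}"  # hypothetical label URI pattern
            lines.append(f"<{concept}> skosxl:prefLabel <{label}> .")
            lines.append(f"<{label}> a skosxl:Label ;")
            lines.append(f"    skosxl:literalForm {turtle_literal(text, lang)} .")
    return "\n".join(lines) + "\n"
```

Giving each label its own URI is what distinguishes SKOS-XL from plain SKOS labels: the label itself can then be described further, for instance with its own change history, which an import feature could read back.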


1. INTRODUCTION

This report is commissioned by the Interoperability Solutions for European Public Administrations (ISA) Programme of the European Commission, in the context of its Action 1.1 on semantic interoperability. It involves the tailoring of a methodology for the management and governance of reference data for the State-aid information systems (register of planned State-aid) of DG COMP in which the Commission exchanges information both internally (with DG AGRI, DG MARE and Eurostat) and with European public administrations in all Member States. It also assesses the extent to which the Generic Interoperable Notification Services (GENIS) Reference Data Component (RDC) can support the reference data governance and management processes.

1.1. Context: State-aid control

DG COMP – jointly with DG MARE and DG AGRI – supports the following two State-aid control processes:

• State-aid notification process: Member States are obliged to inform the European Commission in detail of their intention to spend public money on undertakings (State aid). The legal basis for this is Commission Regulation (EC) No 794/2004 of 21 April 2004 implementing Council Regulation (EC) No 659/1999 laying down detailed rules for the application of Article 93 of the EC Treaty, including Regulations amending Regulation 794/2004, and Commission Regulation (EC) No 800/2008.

• State-aid monitoring and reporting process: Member States are obliged to report to the Commission on actual expenditure on current State-aid measures. The legal basis for this is Article 21 of Council Regulation (EC) No 659/1999 with regard to schemes, and Article 6 of Commission Regulation (EC) No 794/2004 with respect to all other existing aid, whether ad hoc or of any other kind.

The State-aid control processes are supported by the following State-aid control information systems of DG COMP, depicted in Figure 1:

• GENIS (SANI-II): the Generic Interoperable Notification Services (GENIS) information system is used to manage and support the exchange of information between Member States and the Commission within the State-aid notification process, in which Member States notify the European Commission of planned State aid. GENIS is also known as SANI-II (State Aid Notification Interactive) and is the successor to the existing SANI.

• CMS: the Case Management System (CMS) receives the notification and is used by Commission staff to investigate whether the State aid can be approved.

• SARI: once the aid is approved, Member States use the State Aid Reporting Interactive (SARI) to supply the European Commission with the requested information on State aid granted to beneficiaries.

• Statistical reporting: DG COMP collects statistics on State aid for Eurostat¹. For this purpose, it has created its own data warehouse.

¹ Eurostat is the statistical office of the European Union; it provides the European Union with statistics at European level that enable comparisons between countries and regions.

GENIS has a component-based architecture consisting of several building blocks, including the GENIS Reference Data Component. DG COMP intends this component to be used to manage changes to the reference data of GENIS, CMS and SARI by Q1 2014.

Figure 1 – Overview of systems involved in State-aid control

1.2. Definition: reference data

In this report, the following definition of reference data is used:

Reference data are small, discrete sets of values that are not updated as part of business transactions but are usually used to impose consistent classification. Reference data normally has a low update frequency. Reference data is relevant across more than one business system belonging to different organisations and sectors.

Reference data is an umbrella term for several kinds of artefacts used in information systems and information exchange. The ADMS Working Group² identified the following types of reference data:

• Code list: complete set of data element values of a coded simple data element [ISO 9735-1:2002, 4.14];

• Taxonomy: scheme of categories and subcategories that can be used to sort and otherwise organize items of knowledge or information [ISO/DIS 25964-2];

• Thesaurus: controlled and structured vocabulary in which concepts are represented by terms, organized so that relationships between concepts are made explicit, and preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms [ISO 25964-1:2011];

• Name Authority List: controlled vocabulary for use in naming particular entities consistently [ISO/DIS 25964-2].
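The lead-in entries that distinguish a thesaurus from a plain code list can be sketched as a lookup from any accepted term to a concept and its preferred term. This is a minimal Python illustration, not a SKOS implementation; the `Thesaurus` class, the case-insensitive matching rule and the example terms are assumptions. In SKOS terms, the preferred term corresponds roughly to `skos:prefLabel` and the lead-in terms to `skos:altLabel`.

```python
from typing import Optional


class Thesaurus:
    """Maps preferred terms and lead-in terms (synonyms) to concepts."""

    def __init__(self) -> None:
        self._preferred: dict[str, str] = {}  # concept id -> preferred term
        self._entries: dict[str, str] = {}    # normalised term -> concept id

    @staticmethod
    def _key(term: str) -> str:
        # Assumption: matching ignores case and surrounding whitespace.
        return term.strip().casefold()

    def add_concept(self, concept_id: str, preferred: str, lead_in: tuple[str, ...] = ()) -> None:
        terms = (preferred, *lead_in)
        # Check every term first, so a rejected concept is never partially added.
        for term in terms:
            owner = self._entries.get(self._key(term))
            if owner is not None and owner != concept_id:
                raise ValueError(f"{term!r} already leads to {owner!r}")
        self._preferred[concept_id] = preferred
        for term in terms:
            self._entries[self._key(term)] = concept_id

    def resolve(self, term: str) -> Optional[tuple[str, str]]:
        """Return (concept id, preferred term) for a known term, else None."""
        concept_id = self._entries.get(self._key(term))
        if concept_id is None:
            return None
        return concept_id, self._preferred[concept_id]
```

Rejecting a term that already leads to another concept keeps each lead-in entry unambiguous, which is what allows a synonym typed by a user to be replaced reliably by the preferred term.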

2 ADMS Asset Types, https://joinup.ec.europa.eu/svn/adms/ADMS_v1.00/ADMS_SKOS_v1.00.html

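[Figure 1 – Overview of systems involved in State-aid control: GENIS (SANI-II), used to create notifications; the Case Management System (CMS), used by DG COMP, DG MARE and DG AGRI to receive notifications; SARI (DG COMP), a web-based user interface through which Member States provide information after approval of a notification; and reporting for Eurostat (publication). Each of these systems relies on reference data.]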


Annex I contains an overview of the reference data that is managed by DG COMP in the context of State-aid control.

1.3. Business need

A business case developed in the context of Action 1.1 of the ISA Programme [European Commission, ISA Programme, 2013] identifies the following problem and proposes a solution:

• Problem: the business case reveals that uncoordinated use of reference data may lead to failures in transaction handling between applications. Moreover, the lack of common reference data makes integrating data from different sources more cumbersome and has a negative impact on data quality.

• Solution: the business case proposes a threefold solution:

o Metadata governance: well-defined roles and responsibilities, cohesive policies and principles, and decision-making processes that define, govern and regulate the lifecycle of metadata;

o Metadata management: the good practice of putting in place people, processes, and systems to plan, perform, evaluate, and improve the lifecycle of metadata;

o Metadata tools: tools that help to automate certain tasks in the metadata management process.

DG COMP has a solid appreciation of the importance of reference data, and high-level management buy-in makes the adoption of appropriate methodologies easier. If supported by a well-thought-out management and governance methodology, the GENIS Reference Data Component has the potential to improve the sharing and integration of information and to contribute directly to the realisation of GENIS.

Annex I contains an overview of the reference data managed by DG COMP. The challenge is that the same reference data is maintained repeatedly across the myriad databases of the Commission, and the same attribute then has a different set of allowable values (value domain) in each database. The first challenge is thus to map these values to an authentic source, i.e. the source that is the prime owner of the creation of that data in the Commission. The most typical example is the list of codes used to refer to Member States, which is published by the Publications Office in line with the process of accession of a state to the European Union.
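The mapping step described above can be sketched as a crosswalk from a local value domain to the authentic list. In this Python illustration the `map_to_authentic` function, the crosswalk table and the codes are hypothetical (for instance, a local database storing "GR" where the authentic list uses "EL"); values that cannot be mapped are returned separately so that they can be reviewed rather than silently dropped.

```python
from typing import Iterable, Mapping


def map_to_authentic(
    local_values: Iterable[str],
    crosswalk: Mapping[str, str],
    authentic_codes: set[str],
) -> tuple[dict[str, str], list[str]]:
    """Map the values of a local value domain onto the authentic code list.

    Local values that already equal an authentic code need no crosswalk entry.
    Returns (local value -> authentic code, unmapped local values).
    """
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    seen: set[str] = set()
    for value in local_values:
        if value in seen:
            continue  # report each distinct local value only once
        seen.add(value)
        target = crosswalk.get(value, value)
        if target in authentic_codes:
            mapped[value] = target
        else:
            unmapped.append(value)
    return mapped, unmapped
```

Keeping the crosswalk as explicit data, rather than embedding the translations in each application, means that the mapping itself can be reviewed and versioned like any other reference data.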

1.4. Expected benefits

The beneficiaries of the pilot (see Section 1.6) anticipate that better management and governance of reference data will yield several benefits, including:

• Better implementation of State-aid control policy;

• Improved coordination of the development and maintenance of reference data in the domain of State-aid control;

• Increased reliability of the data for its consumers, and fewer errors and inconsistencies in the data flows between State-aid control information systems; and

• More efficient collaboration through a common understanding of operational terminology, reinforced by multilingual concepts and by keeping track of temporal aspects.

1.5. Approach

The approach followed in this study consists of the following four phases:

1. Elicit and validate the specific requirements for reference data management and governance for DG COMP in the context of State-aid control;
2. Identify existing solutions for managing and governing reference data, based on input from the Publications Office and D4.1;
3. Specify a solution for the management and governance of reference data (based on D4.2 and providing input to D4.2) and demonstrate its applicability and feasibility; and
4. Assess the coverage of the identified requirements and the proposed approach by existing tools, including the GENIS reference data component; identify gaps; and assess use, usefulness, and fitness-for-purpose.

The report is structured in three parts, with requirements and specifications for governance, management, and tools.

1.6. Stakeholders and roles

The table below lists the stakeholders involved in this study.

Table 1 - Stakeholders

Term  Beneficiary  System owner  Approving authority  Sponsor
Member States  X
DG COMP  X X X
DG AGRI  X
DG MARE  X
EC ISA Programme  X X

To facilitate communication and collaboration with the different stakeholders, several meetings and workshops were organised:

• W1: Tuesday 15 January 2014
• W2: Wednesday 22 January 2014
• W3: Wednesday 29 January 2014
• W4: Wednesday 26 February 2014
• W5: Friday 21 March 2014

In these workshops we involved, as necessary, the following:

• the IT Unit of DG COMP: Mr J. Jimenez Krause (IT Project Manager), Mr J. Abrahamsen (IT Project Officer - Database Administrator) and Mr Manuel Perez Espin (Head of Unit - Information technology);
• the ISA unit in DIGIT: Mr A. Karalopoulos (Programme Manager), Ms S. Wigard (Programme Manager) and Mr V. Peristeras (Programme Manager - EU policies).

The workshops were supplemented where necessary by direct communication with an official responsible for the development of GENIS: Mr R. Atienza (IT Project Officer).

1.7. Glossary

The table below provides common definitions used throughout the study.

Table 2 - Glossary

ADMS: A common metadata vocabulary to describe standards, so-called interoperability assets, on the Web.

Code list: Complete set of data element values of a coded simple data element [ISO 9735-1:2002, 4.14].

Data model: A collection of entities, their properties and the relationships among them, which aims at formally representing a domain, a concept or a real-world thing.

DG AGRI: Directorate-General for Agriculture and Rural Development

DG COMP: Directorate-General for Competition

DG MARE: Directorate-General for Maritime Affairs and Fisheries

Interoperability: According to the ISA Decision, interoperability means the ability of disparate and diverse organisations to interact towards mutually beneficial and agreed common goals, involving the sharing of information and knowledge between the organisations, through the business processes they support, by means of the exchange of data between their respective ICT systems.

Metadata: Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information [National Information Standards Organization, 2004].

Metadata alignment: The harmonisation of structural metadata, either by forging a wide consensus on the use of a common specification for structural metadata or through the creation of mappings between terms of two or more specifications.

Metadata governance: Well-defined roles and responsibilities, cohesive policies and principles, and decision-making processes that define, govern and regulate metadata.

Metadata management: The good practice of putting in place people, processes, and systems to plan, perform, evaluate, and improve the lifecycle of metadata.

Name Authority List: Controlled vocabulary for use in naming particular entities consistently [ISO/DIS 25964-2].

Reference data: Small, discrete sets of values that are not updated as part of business transactions but are usually used to impose consistent classification. Reference data normally has a low update frequency and is relevant across more than one business system belonging to different organisations and sectors.

RFC: Request For Change, a form used to record the details of a request for a change; it is sent by the change requestor as an input to change management.

SKOS: Simple Knowledge Organization System, an RDF vocabulary for the representation of key reference data such as code lists and taxonomies.

Structural metadata: Data model or reference data.

Taxonomy: Scheme of categories and subcategories that can be used to sort and otherwise organize items of knowledge or information [ISO/DIS 25964-2].

Thesaurus: Controlled and structured vocabulary in which concepts are represented by terms, organized so that relationships between concepts are made explicit, and preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms [ISO 25964-1:2011].

GENIS: The Generic Interoperable Notification Services (GENIS) information system is used to manage and support the exchange of information between Member States and the Commission within the State-aid notification process, in which Member States notify the European Commission of planned State aid. GENIS is also known as the State Aid Notification Interactive (SANI-2) and is the successor to the existing SANI.

RDC: Reference data component belonging to GENIS, for the automated deployment of reference data.

SARI: The State Aid Reporting Interactive (SARI) is used by Member States to supply the European Commission with the requested information on State aid issued to beneficiaries.

CMS: The Case Management System (CMS) receives the notification and is used by Commission staff to investigate whether the State aid can be approved.
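The SKOS entry in the glossary can be illustrated with a minimal sketch that serialises one code-list entry as SKOS statements in N-Triples, using only the Python standard library. The SKOS and RDF namespace URIs are the standard W3C ones; the scheme and concept URIs under `http://example.org/` are hypothetical placeholders, and the literal escaping is deliberately simplified (backslash and double quote only).

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str, lang: str) -> str:
    """Language-tagged N-Triples literal (escapes only backslash and quote)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(scheme_uri: str, code: str, labels: dict[str, str]) -> list[str]:
    """Describe one code-list entry as a skos:Concept with multilingual labels."""
    concept = f"<{scheme_uri}/{code}>"
    triples = [
        f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{concept} <{SKOS}inScheme> <{scheme_uri}> .",
        f'{concept} <{SKOS}notation> "{code}" .',
    ]
    # One skos:prefLabel per language, in a stable (sorted) order.
    for lang, label in sorted(labels.items()):
        triples.append(f"{concept} <{SKOS}prefLabel> {literal(label, lang)} .")
    return triples


for line in concept_triples(
    "http://example.org/authority/country",  # hypothetical scheme URI
    "BEL",
    {"en": "Belgium", "fr": "Belgique", "nl": "België"},
):
    print(line)
```

In practice an RDF library would normally be used to produce such output; the hand-written serialisation only shows which statements a multilingual code-list entry typically carries.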


2. REQUIREMENTS AND SPECIFICATIONS FOR REFERENCE DATA GOVERNANCE

This section elicits the stakeholder requests and needs and formulates the specifications for a reference data governance framework for the State-aid information systems of DG COMP. We define metadata governance as the set of roles and responsibilities, cohesive policies and principles, and decision-making processes that define, govern and regulate the lifecycle of metadata.

2.1. Stakeholder requests and needs

The table below lists the stakeholder requests and needs for reference data governance.

Table 3 – Stakeholder requests: reference data governance

ID Request or need

Organisational Structure

G1 Formal organisational structure (including ownership) for context-neutral reference data
Authentic reference data is context-neutral, i.e. not defined in the context of a single system. There must be a formal organisational structure for the governance of each set of authentic, context-neutral reference data, with formally defined roles including ownership. The owner should be committed to sustaining the reference data specification using an open change management process.

G2 Formal organisational structure (including ownership) for system-based reference data
There must be a formal organisational structure for each information system that uses reference data, with formally defined roles including ownership.

G3 Foster the reuse of existing standards
The reference data management and governance structure should foster the reuse of existing standards.

G4 Involve direct stakeholders in the governance process
The solution should foresee the involvement of direct stakeholders in the metadata governance process to ensure that their interests are taken into account.
[Note: The specification of this will be closely linked to ISO 11179-6:2005 and OPOCE best practices]

G5 Involve operational staff in functional meetings
The solution should foresee inviting representatives from the operational level to participate in functional-level meetings.
[Note: The specification of this will be closely linked to ISO 11179-6:2005 and OPOCE best practices]


Scope Criteria

G6 Intra- and inter-institutional governance
The governance mechanism should encompass both intra- and inter-institutional data exchange:
• Inter-institutional information exchange: when EU institutions exchange structured information on a recurring basis;
• Intra-institutional information exchange: in areas where changing structural reference data would have a high impact on operational systems.

G7 Reusability of proposed solution
Although the reference data solution is developed mainly for the State-aid domain, its processes should be generic so that they can be reused in other domains and by other EU institutions.

Decision mechanism

G8 Decision mandate
The governance mechanism should clearly state the mandate of the governance body with regard to taking decisions on:
• Changes to reference data;
• Intellectual property rights linked to reference data; and
• Enforcement, i.e. implementation of reference data specifications in systems.

G9 Documentation
Specific decision-making processes, which depend on the context in which a decision is required, should be developed, documented and shared with all relevant stakeholders.

G10 Time constraints
Decision processes should be subject to time constraints that depend on the nature of the decision to be taken.

G11 Basis for decision making
The decision-making processes should describe how agreements are reached, e.g. via a qualified majority or via consensus building.

Enforcement Process

G12 Legal enforcement
In the context of State-aid control, the information that must be exchanged between Member States and the European Commission, including the use of reference data, is specified in EU legislation.

G13 Reuse under an open licence
The reference data should be reusable under an open, widely permissive licence.

Process for Continuous Improvement

G14 Quality Assurance
The reference data management and governance methodology should treat the quality of its processes (cf. Sections 2.3 and 3.3) as an intrinsic aspect rather than as an afterthought.

G15 Risk mitigation
Risks related to the propagation of changes to reference data into operational systems should be mitigated by governance processes.

Overall, the governance structure should promote the sharing and reuse of reference data sets.


2.2. Existing solutions for reference data governance

This section gives an overview of existing reference data governance solutions. These solutions could serve as a reference for best practices, or could even be adopted where possible.

2.2.1. ISA Committee and ISA Coordination Group

The European Commission is assisted in the implementation of the Interoperability Solutions for European Public Administrations (ISA) Programme by the ISA Committee, which represents the Member States. Furthermore, the ISA Coordination Group, nominated by the ISA Committee, ensures continuity and consistency at working level. Expert groups provide guidance on specific Work Programme actions. In the past, the ISA Coordination Group has endorsed structural metadata such as the Core Vocabularies[3]. This governance body may be useful for taking high-level decisions on voluntary, trans-European harmonisation initiatives on structural metadata. However, the ISA Committee and the ISA Coordination Group do not have a mandate to take decisions on reference data for State-aid control.

2.2.2. Inter-Institutional Metadata Maintenance Committee (IMMC)

The Inter-Institutional Metadata Maintenance Committee (IMMC) is responsible for the decisions related to key reference data and data models used in the legal decision-making process of the EU institutions and in the EU Open Data Portal (ODP). A thorough description of the IMMC's governance methodology is included in deliverables D4.1 and D4.2. Although the governance methodology applied by the IMMC meets most requirements for inter-institutional governance, the current scope of the IMMC does not cover reference data in the domain of State-aid control, nor does it provide a solution for local governance.

2.2.3. ISO/IEC 11179-6 Metadata Registration

A general standard for the registration of metadata items is ISO/IEC 11179. Part 6 of this six-part standard, ISO/IEC 11179-6:2005[4], specifies the procedure by which Administered Items required in various application areas can be registered and assigned an internationally unique identifier. The procedure involves organisations such as the Registration Authority, the Responsible Organisation, and the Submitting Organisation, as well as roles such as the Registrar, Steward, and Submitter. This standard was a source of inspiration for the IMMC and its Metadata Registry.

[3] Joinup (30 May 2012), ISA Member State representatives endorse key specifications for e-Government interoperability, https://joinup.ec.europa.eu/node/48837

[4] ISO/IEC 11179-6:2005. Information technology -- Metadata registries (MDR) -- Part 6: Registration. http://www.iso.org/iso/catalogue_detail.htm?csnumber=35348


2.2.4. Data Management Body of Knowledge (DM-BOK)

The Data Management Body of Knowledge (DM-BOK) is a general methodology for data management. The DM-BOK devotes an entire chapter to Reference and Master Data Management. In terms of governance, it defines a number of Reference Data Management processes. In terms of governance structure, it defines a number of operational roles, including the Data Architect, Business Analyst, Data Steward, and Application Architect, as responsible roles. It attributes all decision power to the Data Governance Council.

2.3. Specification of metadata governance

This section contains a proposed specification of metadata governance that is tailored to the State-aid control information systems operated by DG COMP.

2.3.1. Scope

The domain of governance is, in the first place, limited to State-aid control. However, some reference data, for example country codes, is not sector-specific but cross-sectoral.

Another aspect of scope is the level of governance. For DG COMP, metadata governance should take place at three levels:

• Local: part of the reference data is system-specific, i.e. specific to the State-aid control information systems of DG COMP. For such reference data, governance and management should take place at the local (intra-organisational) level only.
• Inter-institutional: another part of the reference data can potentially be used, or is already used, in the context of other information systems. For such reference data, governance and management should take place at both the inter-institutional and the local level.
• Trans-European: reference data that can be used in the context of information systems shared between Member States and the EU institutions, bodies and agencies. In such cases, Comitology may be needed. Comitology procedures are relevant when the EC has been granted the power to create and implement rules. This is further explained in Section 2.3.2.3.

Figure 2: Organisation structures. The diagram shows three levels (local; inter-institutional coordination between EU institutions; coordination between the EU and the Member States) with the bodies OP, IMMC and ISA Committee (marked "?"), Directorates-General DG1 to DG4, and Member States MS1 to MS4.

Setting up metadata governance structures at these levels may seem heavy and may entail considerable coordination costs; however, practical experience seems to indicate that it is needed. The more complex the sharing of reference data becomes, the greater the need for formalised procedures. For instance, at the inter-institutional and trans-European levels it will be necessary to describe change and release management in the rules of procedure, whereas at the local level this may be less formal, depending on communications. Without formal metadata governance and management, many coordination problems may occur. In many cases, the cost of proper metadata governance and management for information exchanges is lower than the cost of fixing interoperability conflicts in production systems.

There must be a clear set of scope criteria that determine whether a reference data specification should be placed under local, inter-institutional or trans-European governance, as the two latter levels require considerable coordination effort. On the other hand, they increase reuse and hence maximise the benefits of interoperability through the use of common reference data.

We propose that a metadata specification (including reference data) is placed under trans-European governance when the following criterion is met:

• The reference data is within the scope of Council Regulation (EC) No 659/1999 and therefore directly related to the domain of State-aid control (i.e. the notification or reporting process).

We propose that a metadata specification (including reference data) is placed under inter-institutional governance when the following criteria are met:

• Inter-institutional information exchange: public administrations exchange information on a recurring basis, using the metadata specification as a common information exchange specification. For example, the Named Authority List (NAL) on currencies of the Publications Office could fit this criterion;
• Large degree of similarity: EU institutions use the structural metadata in existing information systems with a large degree of similarity. For example, nearly all information systems of EU institutions use reference data about the Member States of the European Union;
• Commitment of maintenance: the publisher is committed to sustaining the specification using an open change management process. For example, the Publications Office is strongly committed to maintaining the Named Authority Lists;
• Commitment of use: at least two public administrations are strongly committed to using the metadata specification. For example, the Nomenclature of Territorial Units for Statistics (NUTS) is used by many EU institutions.

We propose that a metadata specification is placed under local governance when the cost of coordination outweighs the benefits of interoperability:

• High impact of changes: in areas where changing structural reference data would have a high impact on operational systems. For example, in a content management system where updating reference data affects many other systems that provision the content management system, the impact of changes to reference data may outweigh the benefits of the increased interoperability gained from using common reference data under common governance.
• No renewal of legacy applications: in areas where legacy systems are not being renewed, the cost of orchestrating integration outweighs the benefits, and pragmatic local management is preferable. For example, where structural metadata has been hard-coded in software, implementing each change of reference data triggers a software development lifecycle, which may be undesirable for legacy applications.
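The scope criteria above can be expressed as a small decision function. This is a hedged sketch, not a rule stated in the study: it assumes that the four inter-institutional criteria are cumulative, that the local criteria take precedence over them, and that anything meeting no criterion defaults to local governance. The field names of `SpecificationProfile` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpecificationProfile:
    """Hypothetical checklist for one reference data specification."""
    in_scope_of_reg_659_1999: bool   # trans-European criterion
    recurring_exchange: bool         # inter-institutional information exchange
    large_similarity: bool           # similar use across institutions
    publisher_committed: bool        # commitment of maintenance
    committed_administrations: int   # commitment of use
    high_impact_of_changes: bool     # local criterion
    legacy_without_renewal: bool     # local criterion


def governance_level(p: SpecificationProfile) -> str:
    """Return the proposed governance level for a specification."""
    if p.in_scope_of_reg_659_1999:
        return "trans-European"
    # Assumption: the local criteria signal that coordination costs outweigh
    # the benefits, so they take precedence over the inter-institutional ones.
    if p.high_impact_of_changes or p.legacy_without_renewal:
        return "local"
    # Assumption: the four inter-institutional criteria are cumulative.
    if (p.recurring_exchange and p.large_similarity
            and p.publisher_committed and p.committed_administrations >= 2):
        return "inter-institutional"
    # Assumption: anything else stays under local governance.
    return "local"
```

Writing the criteria down in this explicit form makes the order in which they are evaluated visible, which is exactly the point a governance body would need to agree on.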

2.3.2. Organisational structure

The following sections describe a governance mechanism for State-aid control at the local, inter-institutional, and trans-European levels.

2.3.2.1. Local governance structure

For local governance, this report proposes to reuse an existing governance structure used by DG COMP for the State-aid control information systems. The local governance structure could be as follows:

• A steering committee (SC) that decides on strategic matters such as the continuity and direction of the State-aid system, establishes policies, and deals with issues related to the data model, such as copyrights and business relations. The steering committee provides the strategic direction for the work and participates in the maintenance of the structural metadata, ensuring its alignment with European policies and guidelines.
• A working group (WG): the WG brings together a group of experts with knowledge of reference data. The WG is responsible for developing, maintaining and publishing the reference data:
o The working group considers proposals made either by the group itself or by users;
o Proposals that are supported by the working group are sent to the steering committee;

o The working group advises on the validity of proposals, and the steering committee takes this advice into account in its decisions.

• Stakeholders: all involved stakeholders perform day-to-day operations. This is the level where the structural metadata is actually reused and implemented in production systems. Feedback on the suitability of the structural metadata in the different application scenarios is communicated from this level to the functional level, to ensure that the structural metadata is fit for purpose.

It is recommended that representatives of the stakeholders are invited to participate in the WG. This ensures that stakeholder feedback is fed into the structural metadata lifecycle, fostering the alignment of the structural metadata with the requirements and needs of the users.

At the local scale, this distinction may exist only in terms of roles. At least the following roles should be implemented:

• Content expertise: knowledge about the semantics of the data for which the metadata is used and about the applications in which the data is used;
• Information management expertise: knowledge about the theory and practice of metadata;
• Technical expertise: knowledge about the technical approaches to be used for the implementation in the environment in which the metadata is used;
• Documentation and publication expertise: knowledge about the documentation rules and publication processes used in the environment in which the metadata is used.

An example of a decision to be made at the local level could be the implementation of the background link type. It is used only by DG COMP in the State-aid system, so decisions on its management can be made at the local level according to the governance structure and roles specified above.

2.3.2.2. Inter-institutional governance structure

For inter-institutional governance, this report proposes to adopt the governance model of the Inter-institutional Metadata Maintenance Committee (IMMC)[5], or even to expand the mandate of this governance body to also include reference data in the domain of State-aid control.

For example, the Countries Named Authority List is already governed by the IMMC at the inter-institutional level. This NAL is a controlled vocabulary listing countries with their authority code and label(s). The Countries NAL is part of the Core Metadata (CM) used in the data exchange between the institutions involved in the legal decision-making process and the Publications Office of the EU. The NAL is governed by the IMMC and maintained by the Publications Office of the EU in its Metadata Registry (MDR).

[5] Annex 2 à la note CD(2011)53 http://publications.europa.eu/mdr/resource/core-metadata/IMMC_reu3_adoption_anx3.pdf_A-2011-764293.pdf


2.3.2.3. Trans-European governance structure

In the context of State-aid control, Article 27 of Council Regulation (EC) No 659/1999 gave the European Commission the power to adopt implementing provisions on State-aid control. These include the specification of reference data, as can be derived from the text below:

"The Commission, acting in accordance with the procedure laid down in Article 29, shall have the power to adopt implementing provisions concerning the form, content and other details of notifications, […]"

In cases where the European Commission is given this power, the governance mechanism must follow Comitology procedures: a committee composed of representatives of the Member States and chaired by the Commission is set up. The primary role of these committees is to provide an opinion on the draft measures that the Commission intends to adopt. A committee has one of two functions, advisory or examination:

• Advisory: the Commission shall take the utmost account of the committee's opinion.
• Examination: implementing acts cannot be adopted by the Commission if they are not in accordance with the opinion of the committee, except in very exceptional circumstances, where they may apply for a limited period of time.

In the context of State-aid control, the Comitology procedure resulted in Commission Regulation (EC) No 794/2004 of 21 April 2004. One example of a reference data specification designed through this process is the set of values for the "objective" of State-aid control.


Figure 3 – Illustration: objectives of State-aid control as defined in Commission Regulation (EC) No 794/2004

2.3.3. Decisions

The three aforementioned governance structures should take, among others, the following decisions:

• Whether a metadata specification must be placed under local or inter-institutional governance;
• How to change and improve the metadata management process;
• Whether a change request to a metadata specification must be accepted or rejected (based on an impact analysis, a cost-benefit analysis, and a risk analysis);
• Whether an accepted change request will be released immediately or in a scheduled release;
• Where to store a metadata specification and with which access restrictions (defining roles and responsibilities);
• Whether a metadata specification can be published under an open licence;
• Whether a metadata specification can be supplemented with official mappings;
• Which policy is followed to encourage or mandate the reuse of the reference data specification;
• Which method is used for documenting reference data;
• Whether a metadata specification should be deprecated; and
• Which standards and tools to use in the metadata management process.
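Several of these decisions (support by the working group, acceptance or rejection by the steering committee, immediate or scheduled release) form a lifecycle for each change request. The sketch below models that lifecycle as a small state machine that refuses undefined transitions and logs every step. The status names, the `ChangeRequest` class and the RFC identifier are hypothetical illustrations, not part of any existing DG COMP tool.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    SUPPORTED_BY_WG = "supported by working group"
    ACCEPTED = "accepted by steering committee"
    REJECTED = "rejected"
    SCHEDULED = "scheduled for release"
    RELEASED = "released"


# Allowed transitions; any transition not listed here is refused.
TRANSITIONS = {
    Status.SUBMITTED: {Status.SUPPORTED_BY_WG, Status.REJECTED},
    Status.SUPPORTED_BY_WG: {Status.ACCEPTED, Status.REJECTED},
    Status.ACCEPTED: {Status.RELEASED, Status.SCHEDULED},  # immediate or scheduled
    Status.SCHEDULED: {Status.RELEASED},
    Status.REJECTED: set(),
    Status.RELEASED: set(),
}


class ChangeRequest:
    def __init__(self, identifier: str, summary: str) -> None:
        self.identifier = identifier
        self.summary = summary
        self.status = Status.SUBMITTED
        self.history = [Status.SUBMITTED]  # formal log of every step

    def move_to(self, new_status: Status) -> None:
        """Advance the request, rejecting transitions the process does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.identifier}: cannot go from "
                             f"{self.status.value} to {new_status.value}")
        self.status = new_status
        self.history.append(new_status)


# Usage: a hypothetical RFC bundled into the next scheduled release.
rfc = ChangeRequest("RFC-001", "Add a value to a code list")
rfc.move_to(Status.SUPPORTED_BY_WG)
rfc.move_to(Status.ACCEPTED)
rfc.move_to(Status.SCHEDULED)
rfc.move_to(Status.RELEASED)
```

Keeping the allowed transitions in one table makes the governance rules explicit and easy to review when the process itself is changed.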

In all three governance bodies, decisions should be taken by consensus and should be formally logged. Where decisions are not time-constrained, members of the governance structure should have sufficient time (e.g. two weeks) to review proposed decisions. Where more time is needed to evaluate a proposal, it is good practice for decision takers to request additional time explicitly; granting too much review time by default would slow down the decision-making process.
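The review rule above (a default period of, for example, two weeks, extended only on explicit request, with every step logged) can be sketched as follows. Consensus is modelled here, as an assumption, as "no member objects before the deadline"; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

DEFAULT_REVIEW_PERIOD = timedelta(weeks=2)  # the "e.g. two weeks" from the text


@dataclass
class ProposedDecision:
    subject: str
    proposed_on: date
    members: frozenset[str]
    review_period: timedelta = DEFAULT_REVIEW_PERIOD
    objections: set[str] = field(default_factory=set)
    log: list[str] = field(default_factory=list)

    def deadline(self) -> date:
        return self.proposed_on + self.review_period

    def _check_member(self, member: str) -> None:
        if member not in self.members:
            raise ValueError(f"{member} is not a member of the governance body")

    def request_extension(self, member: str, extra: timedelta) -> None:
        """A member who needs more time asks for it explicitly."""
        self._check_member(member)
        self.review_period += extra
        self.log.append(f"{member} requested {extra.days} extra days; "
                        f"new deadline {self.deadline().isoformat()}")

    def raise_objection(self, member: str) -> None:
        self._check_member(member)
        self.objections.add(member)
        self.log.append(f"{member} objected to {self.subject}")

    def close(self, today: date) -> str:
        """Assumed consensus rule: adopted if nobody objected by the deadline."""
        if today < self.deadline():
            raise ValueError("review period still open")
        outcome = "adopted" if not self.objections else "not adopted"
        self.log.append(f"{today.isoformat()}: {self.subject} {outcome}")
        return outcome
```

The default period is short, and any extension leaves a trace in the log, which reflects the trade-off described above between thorough review and decision speed.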

2.3.4. Authoritative source

Both the reference data specifications under metadata governance and their related documentation should have a single owner, called the authoritative source.

It is recommended to select an authoritative source that supports versioning and thus keeps track of all previous releases of structural metadata. This is especially important when working with historical datasets, where it may be necessary to refer back to previous releases of reference data. In Section 3, we specify how the authoritative source relates to the release management process, in which a new version of reference data is released. In Section 3.3.6, we identify a number of tools that could support the management of the authoritative source.
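The versioning requirement above can be illustrated with a minimal sketch of an authoritative source that never overwrites a release and can answer which release was in force on a given day, which is what interpreting a historical record requires. The `ReleaseHistory` class, the version labels and the abbreviated code-list excerpt are hypothetical; the dates are the accession dates of Bulgaria and Romania (2007) and Croatia (2013).

```python
import bisect
from datetime import date


class ReleaseHistory:
    """Keeps every release of one reference data set; nothing is overwritten."""

    def __init__(self) -> None:
        self._dates: list[date] = []
        self._releases: list[tuple[str, frozenset[str]]] = []

    def publish(self, released_on: date, version: str, values: set[str]) -> None:
        if self._dates and released_on <= self._dates[-1]:
            raise ValueError("releases must be published in chronological order")
        self._dates.append(released_on)
        self._releases.append((version, frozenset(values)))

    def latest(self) -> tuple[str, frozenset[str]]:
        return self._releases[-1]

    def as_of(self, day: date) -> tuple[str, frozenset[str]]:
        """Release in force on a given day, e.g. to interpret a historical record."""
        index = bisect.bisect_right(self._dates, day)
        if index == 0:
            raise LookupError(f"no release in force on {day.isoformat()}")
        return self._releases[index - 1]


# Hypothetical, heavily abbreviated excerpt of a Member State code list.
history = ReleaseHistory()
history.publish(date(2007, 1, 1), "v1", {"BEL", "BGR", "ROU"})
history.publish(date(2013, 7, 1), "v2", {"BEL", "BGR", "ROU", "HRV"})
```

A release published on a given day is treated as in force from that day onwards, which is why `bisect_right` is used rather than `bisect_left`.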

The use of persistent HTTP Uniform Resource Identifiers (URIs) for reference data releases can make it easier to manage an authoritative source. URIs are increasingly used for data integration according to the design principles of "Linked Data". Linked Data is a way of identifying, linking and accessing information on the Web according to the four design principles put forward by Tim Berners-Lee[6]:

• Use URIs as names for things;
• Use HTTP URIs so that people can look up those names;
• When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL); and
• Include links to other URIs, so that they can discover more things.

Even when the underlying technology changes, persistent HTTP URIs allow reference data sets to be both identified and obtained via a mechanism of URI forwarding (redirection). A prerequisite for this is that the URIs are well managed. A proposal for the governance and management of persistent URIs for EU institutions is included in deliverable ‘D3.2 Common approach for the management of persistent URIs by EU institutions’.

6 http://www.w3.org/DesignIssues/LinkedData.html


Example: For the ADMS reference data of the ISA Programme, a file server on Joinup is used as the authoritative source, while the purl.org service is used to maintain persistent (permanent) HTTP URIs. Users always cite the same persistent URI, which purl.org redirects to the current location of the file. As a result, links do not break when the file is moved, and the authoritative source remains reachable through an unchanging end point.

http://purl.org/adms/assettype/1.0 redirects to

https://joinup.ec.europa.eu/svn/adms/ADMS_v1.00/ADMS_SKOS_v1.00.rdf
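The redirection in this example can be pictured as a lookup table held by the persistent-URI service. The minimal Python sketch below assumes an in-memory table and is not how purl.org is actually implemented; the `move_authoritative_source` helper and the choice of HTTP status 302 are illustrative assumptions. It shows why the persistent URI survives a change of hosting technology: only the table entry changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redirect:
    status: int    # HTTP status code returned to the client
    location: str  # value of the Location header ("" when nothing is found)


# Persistent URI -> current location of the authoritative file.
# The entry reproduces the ADMS example; the table itself is illustrative.
REDIRECTS = {
    "http://purl.org/adms/assettype/1.0":
        "https://joinup.ec.europa.eu/svn/adms/ADMS_v1.00/ADMS_SKOS_v1.00.rdf",
}


def resolve(persistent_uri: str) -> Redirect:
    """Answer a request for a persistent URI with a redirect to the current file."""
    target = REDIRECTS.get(persistent_uri)
    if target is None:
        return Redirect(404, "")
    # 302 Found: clients keep citing the persistent URI instead of the target
    return Redirect(302, target)


def move_authoritative_source(persistent_uri: str, new_location: str) -> None:
    """When the hosting technology changes, only the table entry is updated."""
    if persistent_uri not in REDIRECTS:
        raise KeyError(persistent_uri)
    REDIRECTS[persistent_uri] = new_location
```

A temporary redirect (302) rather than a permanent one (301) is used here because a permanent redirect invites clients to replace their stored reference with the target location, which would defeat the purpose of a persistent URI.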

The Metadata Registry (MDR) of the Publications Office of the European Union provides an example of combining persistent URIs with keeping preceding versions available. The URIs of the Named Authority Lists (NALs) in the MDR refer to the latest version of the structural metadata, while the URIs for accessing preceding versions include version numbers. This not only ensures consistency and continuity but also supports release management, as described in Section 3.3.4.
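The MDR pattern, in which an unversioned URI points to the latest release and a versioned URI points to one fixed release, can be sketched as follows. The base URI, the `/<version>` suffix and the `VersionedCodeList` class are hypothetical and do not reproduce the MDR's actual URI scheme.

```python
from __future__ import annotations


class VersionedCodeList:
    """Keeps every release of a code list available, as an authoritative source should."""

    def __init__(self, base_uri: str):
        self.base_uri = base_uri.rstrip("/")
        self._releases: dict[str, dict[str, str]] = {}  # version -> {code: label}
        self._order: list[str] = []  # versions in release order

    def release(self, version: str, codes: dict[str, str]) -> str:
        """Publish a new version and return its version-specific URI."""
        if version in self._releases:
            raise ValueError(f"version {version} already exists; releases are immutable")
        self._releases[version] = dict(codes)
        self._order.append(version)
        return f"{self.base_uri}/{version}"

    def resolve(self, uri: str) -> dict[str, str]:
        """Unversioned URI -> latest release; versioned URI -> that exact release."""
        if uri == self.base_uri:
            if not self._order:
                raise KeyError(f"{uri} has no releases yet")
            return dict(self._releases[self._order[-1]])
        prefix = self.base_uri + "/"
        version = uri[len(prefix):] if uri.startswith(prefix) else None
        if version not in self._releases:
            raise KeyError(uri)
        return dict(self._releases[version])
```

Because `release` refuses to overwrite an existing version, a versioned URI always returns the same content, which is what allows users to pin their systems to a specific release.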

2.3.5. Licensing framework

Under both local and inter-institutional governance, it is important, and a legal obligation under the PSI Directive, for public administrations to make their data, including reference data, available under an open licence upon a so-called “request for reuse” by any third party.

The European Commission follows a policy of actively encouraging the publication of government data. Different licensing options can be considered for the reference data of DG COMP. These include, among others:

The ISA Open Metadata Licence v1.1 [7]: this is a permissive licence that grants the rights of use (for both commercial and non-commercial purposes), the creation of derivative works, and redistribution. Virtually the only restriction it imposes is to cite the source (attribution);

The European Commission Legal Notice [8]: this notice authorises reuse provided that the source is acknowledged (attribution); the publisher can add further reuse conditions (other restrictions);

The European Union Public Licence (EUPL) [9]: this is a copyleft software licence that grants the rights of use (for both commercial and non-commercial purposes), the creation of derivative works, and redistribution. In addition to requiring attribution, the licence requires derivative works to be shared under similar licensing conditions (share-alike).

DG COMP should set up the appropriate licensing framework, ensuring that it owns the intellectual property rights before granting any rights to third parties. Intellectual property rights are usually acquired by the European Union through employment or procurement contracts. In the case of Member State working groups, contributor agreements, such as the ISA Contributor Agreement, may be needed. In

7 https://joinup.ec.europa.eu/community/semic/document/joinup-semantic-asset-licensing-framework

8 http://ec.europa.eu/ipg/basics/legal/notice_copyright/index_en.htm

9 https://joinup.ec.europa.eu/software/page/eupl


cases when external standards, maintained by standardisation bodies, are reused,

the licensing and reuse conditions of these standards have to be considered and

respected.

2.3.6. Enforcement

Metadata governance should help decide which policy should be followed to

encourage or mandate the reuse of the reference data specification. According to

deliverable ‘D4.1 Metadata management requirements and existing solutions in EU

Institutions and Member States’, the following options are possible:

Legal requirement: implementation is enforced by law; it is an official

requirement;

Comply-or-explain: implementation is not enforced by law, but public

administrations have to comply with the use of a particular specification or

standard for metadata, or if they do not comply, explain publicly why they

do not;

Oversight board: implementation is encouraged via project review

committees; or

Voluntary: implementation is encouraged via information campaigns.

In the context of State-aid control, enforcement is often a matter of a legal

obligation. For example, Commission Regulation (EC) No 794/2004 specifies the

reference data in the forms that Member States have to fill in for State-aid

notification.

2.3.7. Continuous improvement

Metadata governance should facilitate the continuous improvement (implementation of feedback) of the metadata management process and governance rules. To support continuous improvement, all decisions taken should be systematically documented and made accessible for consultation by the various stakeholders involved. For example, the IMMC does so via the publicly available MDR, where users can find not only the current structural metadata but also previous versions. In doing so, metadata governance should weigh the benefits of increased interoperability and data quality against the increased coordination costs.

The following metrics and key performance indicators should be monitored:

The number of change requests;

The number of releases;

The lead time between receipt of a change request and the closing of the

change management process for this request;

The number of full-time equivalents needed to operate the metadata

governance and management;

The number of systems that have implemented the metadata specifications.
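Several of these indicators can be derived directly from the change request log. The sketch below assumes a hypothetical log record with a receipt date, an optional closing date and an optional release identifier. The number of full-time equivalents is omitted because it comes from staffing data, and the number of implementing systems is passed in from deployment records; releases are counted only if they shipped at least one logged change.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class ChangeRequest:
    rfc_id: str
    received: date
    closed: Optional[date] = None   # None while the change management process is still open
    release: Optional[str] = None   # release in which the change was published, if any


def kpis(requests: list[ChangeRequest], systems_implementing: int) -> dict[str, object]:
    """Derive the log-based indicators listed above from a change request log."""
    lead_times = [(r.closed - r.received).days for r in requests if r.closed is not None]
    return {
        "change_requests": len(requests),
        # Distinct releases that shipped at least one logged change
        "releases": len({r.release for r in requests if r.release is not None}),
        # Lead time between receipt of a request and the closing of its process
        "mean_lead_time_days": mean(lead_times) if lead_times else None,
        # Counted from deployment records rather than from the RFC log
        "systems_implementing": systems_implementing,
    }
```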


3. REQUIREMENTS AND SPECIFICATIONS FOR REFERENCE DATA MANAGEMENT

This chapter provides an overview of the requirements and specifications for reference data management for DG COMP.

3.1. Stakeholder requests and needs

The table below contains an overview of requirements for reference data

management gathered from DG COMP.

Table 4 – Stakeholder requests and needs: reference data management

ID Requests and needs

Design reference data

M1

Design reference data

The management processes set up for the State Aid Notification System should support the design and development of reference data sets that have to be used by the Member States.

M2

Integration with external sources

The State Aid Notification System uses reference data from external

authoritative sources. Integrating these external reference data sets

with the internal system is a key requirement.

M3

Quality control

The proposed solution should put in place processes for controlling the

quality of the reference data. The quality control processes also apply

when updating reference data sets.

Manage reference data changes

M4

Detect changes in external reference data

The proposed solution for reference data management should support

detecting changes to reference data which is managed by an external

organisation and published on an external authoritative source.

M5

Impact assessment of changes in external reference data

When a new version of externally managed reference data is released,

the proposed solution should support assessing the impact of these

changes on the reference data which is used in the State Aid

Notification System. This process should support deciding whether the

reference data of the State Aid Notification System should be modified

as a consequence of a change in external reference data.

M6

Manage changes to internal reference data

The management processes should describe how changes to internally

managed reference data sets should be handled.

Implement reference data changes

M7

Impact assessment of an implementation


The change management processes should include an impact

assessment of implementing changes into the reference data

repository of the State Aid Notification System on the users of this

system.

M8

Update reference data lists

The management methodology should describe how changes to the reference data are applied to the reference data sets.

M9

Standardised formats for interoperability

In order to foster interoperability with a wide range of (legacy)

systems, the reference data should be made available in a

standardised format. Support for XML is an important requirement.

M10

Propagation

Changes in reference data should be propagated to the Case

Management Systems. While this is a request from DG COMP, it is

considered out of scope for this study.

Share and reuse reference data

M11

List reference data for reuse on an open platform

Reference data which is managed by external organisations and which should be reused by Member States when exchanging information with the European Commission should be listed on an open platform which includes URIs to the versions that need to be reused.

M12

Share reference data in an authoritative source

Reference data which is internally managed by

DG COMP should be published and documented on an authoritative

source in machine- and human-readable formats.

M13

For the purpose of supporting interoperability, the documentation of metadata should provide all the necessary elements (e.g. guidelines, tutorials, tools) for stakeholders to easily incorporate the reference data into their systems and their internal management and governance structures.

M14

Versioning and backward compatibility

Preceding versions of the reference data should be kept available at

the authoritative source.

Harmonise reference data

M15

Selection

When alternative reference data sets are available to be reused in the

State Aid Notification System, the management processes should

propose a methodology for comparing and selecting one data set.

M16

Mapping

The proposed solution for reference data management should describe how similar data sets can be mapped in their context, while keeping track of such branches.


3.2. Existing methodologies for reference data management

This section contains an overview of existing reference data management

methodologies. These solutions should be taken as a reference for best practices or

even adopted where possible.

3.2.1. Data Management Body of Knowledge (DM-BOK)

The Data Management Association’s guide to the Data Management Body of Knowledge recommends that changes to controlled vocabularies and their reference data sets be conducted through a change request process:

1. Create and receive a change request.

2. Identify the related stakeholders and understand their interests.

3. Identify and evaluate the impacts of the proposed change.

4. Decide to accept or reject the change, or recommend a decision to

management or governance.

5. Review and approve or deny the recommendation.

6. Communicate the decision to stakeholders prior to making the change.

7. Update the data.

8. Inform stakeholders that the change has been made.
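Because the eight steps form a strict sequence, a ticketing tool can enforce their order. The sketch below is a hypothetical illustration rather than part of DM-BOK: it rejects out-of-order steps, and a request denied at step 5 ends once the decision has been communicated (step 6), without updating the data.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    CREATE_AND_RECEIVE = 1
    IDENTIFY_STAKEHOLDERS = 2
    EVALUATE_IMPACT = 3
    DECIDE_OR_RECOMMEND = 4
    REVIEW_AND_APPROVE = 5
    COMMUNICATE_DECISION = 6
    UPDATE_DATA = 7
    INFORM_STAKEHOLDERS = 8


class ChangeRequestProcess:
    """Tracks one change request through the eight steps, in order."""

    def __init__(self, request_id: str):
        self.request_id = request_id
        self.completed: list[Step] = []
        self.denied = False

    def next_step(self) -> Optional[Step]:
        if not self.completed:
            return Step.CREATE_AND_RECEIVE
        last = self.completed[-1]
        # A denied request stops once the decision has been communicated
        if last is Step.INFORM_STAKEHOLDERS or (self.denied and last is Step.COMMUNICATE_DECISION):
            return None
        return Step(last + 1)

    def complete(self, step: Step, approved: bool = True) -> None:
        expected = self.next_step()
        if expected is None:
            raise ValueError(f"{self.request_id}: process already finished")
        if step is not expected:
            raise ValueError(f"{self.request_id}: expected {expected.name}, got {step.name}")
        if step is Step.REVIEW_AND_APPROVE:
            self.denied = not approved
        self.completed.append(step)
```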

3.2.2. ISO 11179-6 Metadata Registration

The ISO/IEC 11179 [10] standard provides guidelines for several topics related to Metadata Registries (MDR):

Part 1 introduces a framework containing fundamental ideas of data

elements, value domains, data element concepts, conceptual domains, and

classification schemes;

Part 2 provides a conceptual model for managing classification schemes;

Part 3 specifies a registry meta-model and basic attributes;

Part 4 provides guidelines for formulating unambiguous data definitions;

Part 5 introduces naming and identification principles;

Part 6 provides instructions on how registration applicants could register a

data item with a central Registration Authority, including allocating unique

identifiers for each data item.

Besides data elements, ISO/IEC 11179-6 addresses data element concepts,

conceptual domains and value domains as defined in ISO/IEC 11179-3. The

standard provides guidelines for representing these data types in a metadata

registry that documents the common administration and identification, naming and

definition details together with their administered item-specific details. These

guidelines include:

10 http://metadata-standards.org/11179/


a proposed structure for an International Registration Data Identifier

(IRDI);

tables that summarize the requirements for the inclusion of metadata

attributes in an MDR;

suggested roles and responsibilities for managing an MDR; and

a suggested set of operations for functional operating procedures.
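The three-part structure of an IRDI (registration authority identifier, data identifier and version identifier) can be sketched as below. This is a simplified illustration: the registration authority identifier is treated as an opaque string, and the `:` separator, zero-padded data identifiers and integer versions are assumptions of this sketch; the standard itself should be consulted for the actual encoding.

```python
from __future__ import annotations

from dataclasses import dataclass

SEPARATOR = ":"  # hypothetical separator; the actual encoding must follow the standard


@dataclass(frozen=True)
class IRDI:
    registration_authority: str  # Registration Authority Identifier (RAI)
    data_identifier: str         # Data Identifier (DI), unique within the authority
    version: str                 # Version Identifier (VI)

    def __str__(self) -> str:
        return SEPARATOR.join((self.registration_authority, self.data_identifier, self.version))

    @classmethod
    def parse(cls, text: str) -> IRDI:
        parts = text.split(SEPARATOR)
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"not a three-part identifier: {text!r}")
        return cls(*parts)


class RegistrationAuthority:
    """Allocates data identifiers that are never reused."""

    def __init__(self, rai: str):
        if SEPARATOR in rai:
            raise ValueError("authority identifier must not contain the separator")
        self.rai = rai
        self._next = 1

    def register(self) -> IRDI:
        irdi = IRDI(self.rai, f"{self._next:06d}", "1")
        self._next += 1
        return irdi

    def new_version(self, irdi: IRDI) -> IRDI:
        # Same data identifier, incremented version (integer versions assumed)
        return IRDI(irdi.registration_authority, irdi.data_identifier, str(int(irdi.version) + 1))
```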

3.2.3. ISO 19135:2005 Geographic information -- Procedures for item registration

“ISO 19135:2005 specifies procedures to be followed in establishing, maintaining and publishing registers of unique, unambiguous and permanent identifiers, and meanings that are assigned to items of geographic information” [11] [International Organisation for Standardisation, 2005]. The standard specifies which information is necessary to uniquely identify, define, manage and register items in a registry.
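The central requirement, identifiers that are unique, unambiguous and permanent, can be sketched as a register that never deletes items or reuses identifiers; superseded and retired items remain resolvable with a changed status. The three statuses below are a simplified, hypothetical subset chosen for illustration; ISO 19135 defines its own item statuses and registration roles.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    VALID = "valid"
    SUPERSEDED = "superseded"
    RETIRED = "retired"


@dataclass
class Item:
    definition: str
    status: Status = Status.VALID
    superseded_by: Optional[int] = None


class Register:
    """Identifiers are permanent: items are never deleted and ids are never reused."""

    def __init__(self):
        self._items: dict[int, Item] = {}
        self._next_id = 1

    def add(self, definition: str) -> int:
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = Item(definition)
        return item_id

    def supersede(self, item_id: int, new_definition: str) -> int:
        item = self._require_valid(item_id)
        new_id = self.add(new_definition)
        item.status, item.superseded_by = Status.SUPERSEDED, new_id
        return new_id

    def retire(self, item_id: int) -> None:
        self._require_valid(item_id).status = Status.RETIRED

    def lookup(self, item_id: int) -> Item:
        # Every identifier ever issued remains resolvable, whatever its status
        return self._items[item_id]

    def _require_valid(self, item_id: int) -> Item:
        item = self._items[item_id]
        if item.status is not Status.VALID:
            raise ValueError(f"item {item_id} is {item.status.value}")
        return item
```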

3.2.4. Information Technology Infrastructure Library (ITIL)

The Information Technology Infrastructure Library (ITIL) is a systematic approach

to the delivery of quality IT services. It provides a basic vocabulary and a number

of processes that are relevant in managing the lifecycle of IT services such as

change management, release management, and service validation and testing.

3.2.5. Good practices from the Publications Office: integrating Reference Data Management in the Software Development Lifecycle

In order to foster the reuse of reference data sets, it is crucial to ensure the

reference data release cycle is aligned with the internal software development

lifecycle (SDLC) of its users. For the purpose of integrating reference data

management in the SDLC, the Publications Office of the EU identified several best

practices:

Impact Analysis

In its change management process, the Publications Office carries out an impact assessment to determine the impact of a change to a Named Authority List (NAL) on the production systems that use it. These systems are related to the legislative process of the European Union. The impact analysis classifies a change into one of three levels:

o Minor change (minor impact);

o Major change (major impact); and

o Structural change (structural impact).

Align Release Cycles

The Publications Office aims to improve the alignment of the NAL release

cycle with the Software Development Lifecycle of the applications that reuse

11 http://www.iso.org/iso/catalogue_detail.htm?csnumber=32553


the NALs. This entails categorizing releases as minor, major or structural. In

the future, releases will be scheduled periodically. The periodicity would then

depend on the category of the release: minor releases could be launched

every 2 months and major releases every 3 months. Structural changes,

which include publishing new code lists for example, would be released on

an ad hoc basis.

Get Commitment of External Bodies

For metadata which is under the governance of external or inter-institutional

bodies, such as the IMMC, it is hard to get a general agreement on

implementation planning. Therefore, the Publications Office aims to get

external parties committed to release new versions following a regular

schedule. External releases would not necessarily be published

simultaneously with internal releases, but re-users could adapt their internal

software releases based on the committed release schedule.

Use standards

ISO/IEC 11179-6 contains a number of suggested operating procedures, roles, and responsibilities for metadata management. These include the role of a Metadata Steward (called the domain expert at the Publications Office), who, among other tasks, helps with the impact assessment. Other standards, such as DM-BOK or the ISO 19109 standard on geographic information, may also be very relevant.

Publish Release Notes

Together with each version release, the Publications Office publishes a

release note which justifies and explains the new release.

Publish Difference Lists

When publishing new versions of authority tables, a machine-readable

difference table listing all the changes compared to the previous version is

released. Originally, differences were represented both in XML and SKOS.

Based on feedback received from the users of the metadata, the Publications

Office recently opted to represent the changes in an Excel spreadsheet.

Difference lists are especially valuable to users of legacy systems, in which reference data sets are often hard-coded. Implementing changes in such systems entails significant software development effort, which difference lists can help reduce.

Versioning

A good practice in versioning is to keep previous versions of metadata available on the authoritative source. Combined with good URI management, this allows users to minimize the risk to their operations by referring to specific versions of the metadata, so that changes are not automatically incorporated into their processes or IT systems.

Standardised testing

Since the Publications Office might not be able to assess the impact of a

change in reference data on the operational system of its users, automated

propagation of reference data to those systems would bring significant risks.

Therefore, the Publications Office proposes to run standard test sets at the

user side on new versions of reference data before implementing releases in

production systems. An alternative strategy is to define classes of impact and to assess in advance the impact of different types of


change, mapped against these classes. This builds up prior knowledge of which changes can be propagated automatically and which cannot.
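The difference lists described under “Publish Difference Lists” can be produced by comparing two versions of a code list. The sketch below assumes that each version is a simple mapping from code to label (real NALs carry more properties) and writes CSV, which spreadsheet applications such as Excel can open, as a stand-in for the Publications Office's spreadsheet format. The sample codes in the test are illustrative only.

```python
from __future__ import annotations

import csv
import io


def difference_list(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str, str, str]]:
    """Return (code, change, old label, new label) rows, sorted by code."""
    rows = []
    for code in sorted(old.keys() | new.keys()):
        if code not in old:
            rows.append((code, "added", "", new[code]))
        elif code not in new:
            rows.append((code, "removed", old[code], ""))
        elif old[code] != new[code]:
            rows.append((code, "label changed", old[code], new[code]))
        # unchanged codes are left out of the difference list
    return rows


def to_csv(rows: list[tuple[str, str, str, str]]) -> str:
    """Serialise the difference list with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["code", "change", "old_label", "new_label"])
    writer.writerows(rows)
    return buf.getvalue()
```

Legacy systems with hard-coded values can then apply only the listed rows instead of reloading and re-checking the whole list.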

3.3. Specification for metadata management

This section describes the high-level administrative processes that are included in the life cycle of reference data management. The administrative processes are described using the Business Process Model and Notation (BPMN) [OMG, 2011]. Although there are different levels of metadata governance, the processes described below are generic and should therefore be applicable to all of them.

3.3.1. Design structural metadata

Structural metadata design entails agreeing on the syntax and the semantics of the reference data and encoding it in different formats. This phase is out of the scope of this work.

3.3.2. Manage change of structural metadata

Goal: Managing changes that impact the reference data through a centralised process, in order to ensure that the internal IT infrastructure and services, as well as the systems of users, remain aligned with business requirements.

Actors and roles:

The environment of the Reference Data Management Component contains two main

levels.

The first level represents the owner of the reference data. This is the governing body that creates and maintains the reference data set. It includes roles such as a Reference Data Working Group (RD-WG), a Review Group (RG) and a Steering Committee. Both DG COMP and the Publications Office of the EU own and manage reference data and would therefore be part of this governance level.

The second level represents the users that reuse the reference data. These

users include systems within or outside of the governing level, such as the

GENIS system, the Case Management Systems of SANI or any DG reusing

reference data owned by DG COMP.

The manage reference data changes process is carried out at the governing level. Therefore, this process does not include change management at the user side, which is described in Section 3.3.4 on implementing reference data changes in operational systems. Most tasks within this process are carried out by the Reference Data Working Group (RD-WG) and reviewed by the Review Group (RG).

At least the following roles should be implemented:

Content expertise: knowledge about the semantics of the data for which

the metadata is used and the applications in which the data is used


Information management expertise: knowledge about theory and

practice of change management, e.g. impact on environment.

Technical expertise: knowledge about the technical approaches to be used

and the impact on systems.

Documentation and publication expertise: knowledge about the

documentation rules and publication processes used in the environment in

which the metadata is used.

Tasks:

Record a Request for Change (RFC)

The creation of an RFC can be triggered by different sources, such as incoming user feedback, the outcome of periodic reviews, legal obligations and the release of a new version of a reused standard. All RFCs are stored, tracked and maintained in a ticketing system.

Validate an RFC

The editor of the working group checks if the RFC is provided in the correct

format and if it contains all the relevant information for carrying out the

assessment phase.

Assess and Evaluate the RFC

Since not all RFCs should lead to a change, objective criteria should be set up for assessing and evaluating change requests. Such criteria could include an impact analysis carried out by the owner(s) of the reference data; see Section 3.2.5 for good practice on impact analysis. The outcome of the assessment should be a categorisation of the requested change as minor, major or structural, which determines how the RFC is further managed. The categorisation is carried out by the RD-WG based on the risk related to the change.

Approve or reject a change request

Based on the assessment of the RFC, the change is accepted or rejected by

the Review Group. The stakeholders are informed about the decision taken.

Plan updates

After an RFC is approved, the implementation and harmonization of the change should be planned. The working group decides on the timing of the update, and on whether the change will be implemented on its own or grouped with other changes in a release.

Coordinate Change Implementation into the Component

Before being implemented into the production environment of the Reference

Data Management Component, all changes should be tested in an isolated

testing environment. Moreover, service desks and other related stakeholders

should be provided with the necessary documentation regarding the change

in order to support the implementation.

Review and Close Change

The review and closing stage includes validating if the implemented change

addresses the original RFC and if stakeholders are satisfied.
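The first two tasks, recording and validating an RFC, can be sketched as follows. The required fields and the `RFC` record are hypothetical, since the report does not prescribe an RFC template; the accepted triggers are the sources listed under “Record a Request for Change”, and the counter stands in for the ticketing system's identifier sequence.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from itertools import count

# Hypothetical minimum content of an RFC; the report does not prescribe a template.
REQUIRED_FIELDS = ("title", "requester", "reference_data_set", "description", "trigger")
# Triggers named in the report
TRIGGERS = {"user feedback", "periodic review", "legal obligation", "new version of a reused standard"}

_rfc_numbers = count(1)  # stands in for the ticketing system's id sequence


@dataclass
class RFC:
    content: dict[str, str]
    received: date
    rfc_id: str = field(default_factory=lambda: f"RFC-{next(_rfc_numbers):04d}")
    status: str = "recorded"


def validate(rfc: RFC) -> list[str]:
    """Editor's check: is the RFC complete enough for the assessment phase?"""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not rfc.content.get(name, "").strip()]
    trigger = rfc.content.get("trigger", "")
    if trigger and trigger not in TRIGGERS:
        problems.append(f"unknown trigger: {trigger}")
    rfc.status = "validated" if not problems else "returned to requester"
    return problems
```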


Environments and testing:

Typically, changes are not implemented immediately in the production environment. Structural changes to reference data and/or data usually need to be implemented in the supporting software and tools of all stakeholders involved. Such changes should first be developed in a separate environment in order to guarantee the continuity and quality of operational systems. Changes to reference data that impact software or the data itself should be made in a development environment first. After initial system tests by the development team, the changes can be applied to a separate environment called Integration/Acceptance. This allows users, such as domain experts, to test the change from a user perspective. If all is well, the changes can then be rolled out to the production environment. Therefore, at least the following technical environments should be available:

Development: all changes are developed in this environment.

Integration Testing: after development, the applied change needs to be tested in an integrated (not isolated) environment that mimics the real context as closely as possible.

Acceptance: this is a separate environment that allows users to accept the committed changes.

Production: the live environment.
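The promotion rule described above, under which a change moves to the next environment only after it has passed the tests of the current one, can be sketched as a small state machine. The `Release` class and its test sign-off are hypothetical; in practice this gate would live in the deployment tooling.

```python
from __future__ import annotations

from enum import IntEnum


class Env(IntEnum):
    DEVELOPMENT = 1
    INTEGRATION_TESTING = 2
    ACCEPTANCE = 3
    PRODUCTION = 4


class Release:
    """A set of reference data changes moving through the environments in order."""

    def __init__(self, version: str):
        self.version = version
        self.env = Env.DEVELOPMENT
        self.passed: set[Env] = set()  # environments whose tests this release passed

    def record_tests_passed(self) -> None:
        self.passed.add(self.env)

    def promote(self) -> Env:
        if self.env is Env.PRODUCTION:
            raise ValueError("already in production")
        if self.env not in self.passed:
            raise RuntimeError(f"tests not passed on {self.env.name}")
        # Environments cannot be skipped: always move exactly one step forward
        self.env = Env(self.env + 1)
        return self.env
```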

If the changes are made at a local level, the tests described above could be sufficient. However, in an inter-organisational environment where multiple stakeholders are involved, additional testing is needed. For instance, where there are multiple sources, a central processing environment and multiple consumers, an integration (chain) test is needed. In this test, all involved parties verify that the processing of reference data and the changes made, across multiple systems from different owners, work as described in the documentation. Only after a successful integration test on a test environment will the actual rollout in the production chain take place.
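In such a chain test, and in the standard test sets recommended by the Publications Office in Section 3.2.5, each consuming party could run automated checks before a new version goes live. The checks below (codes used in production still exist, codes follow the format the consuming system expects, labels are not empty) and the two-letter code pattern are hypothetical examples; each party would define its own tests against its own systems.

```python
from __future__ import annotations

import re

# Hypothetical format expected by the consuming system (two upper-case letters)
CODE_PATTERN = re.compile(r"[A-Z]{2}")


def run_standard_tests(new_version: dict[str, str], codes_in_use: set[str]) -> list[str]:
    """Return a list of failures; an empty list means the version may go live."""
    failures = [f"code {code} used in production is missing"
                for code in sorted(codes_in_use - set(new_version))]
    for code, label in sorted(new_version.items()):
        if not CODE_PATTERN.fullmatch(code):
            failures.append(f"code {code!r} does not match the expected format")
        if not label.strip():
            failures.append(f"code {code} has an empty label")
    return failures


def may_deploy(new_version: dict[str, str], codes_in_use: set[str]) -> bool:
    return not run_standard_tests(new_version, codes_in_use)
```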

The process described above applies mainly to structural changes in reference data that have an impact on operational software. In other cases, a complete DTAP (Development, Testing, Acceptance, Production) environment is not necessary. For instance, if a new version of a data model is created in an editing tool, the model itself should be tested. It is not necessary to have multiple instances of the editing tool, because the tool itself is not being tested, nor is it part of the production environment (data value chain).

Decisions:

Is the change to be discussed at a local, inter-institutional or trans-European level?

Is the change valid: does it fit, is it cost-effective, are the risks manageable?

Decide on the follow-up of declined changes.

Determine whether the change is urgent.

Determine when the change will be formalized (in which release).


3.3.3. Harmonise structural metadata

Goal:

The harmonisation of structural metadata used for information exchange either

through the creation of mappings between terms of two or more specifications for

structural metadata or by forging a wide consensus on the use of a common

specification.

The use of a common specification is more likely in a local and sometimes an inter-institutional environment. The use of a mapping is more likely where there is a wide variety of stakeholders, such as in a trans-European environment involving the Member States.

Metadata alignment can offer a real added value to European institutions and public

administrations of the Member States:

Increase quality and value of the data: the use of common controlled

structural metadata or the use of agreed mappings reduces the

heterogeneity of the dataset and increases the reusability of data in other

contexts, hence the value;

Provide richer and more expressive context to their data;

Increase visibility and discoverability;

Increase reuse potential;

Promote the reuse of information from other authoritative sources.

Actors and roles:

At least the following roles should be implemented:

Content expertise: knowledge about the semantics of the data for which

the metadata is used and the applications in which the data is used

Information management expertise: knowledge about theory and

practice of harmonisation, e.g. mapping across multiple systems.

Technical expertise: knowledge about the technical approaches to be used

for the technical implementation in the environment in which the metadata

is used.

Documentation and publication expertise: knowledge about the

documentation rules and publication processes used in the environment in

which the metadata is used.


Tasks:

Below is a description of the tasks to be performed when a mapping is needed.

1. Identify and analyse related metadata specifications: the working group has an operational responsibility to manage metadata. This includes maintaining the existing metadata and accepting or declining new proposals based on metadata harmonization criteria;

2. Propose mappings: the working group should then make a proposal for

mappings. The Steering committee decides whether the proposal is

approved or not. If approved the working group can continue with the

next step;

3. Create and execute the mapping (add the mapping to the controlled vocabularies file);

4. Test the mapping;

5. Publish the metadata alignment.
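Steps 3 to 5 can be sketched with the SKOS mapping properties (`skos:exactMatch`, `skos:closeMatch`, `skos:broadMatch`, `skos:narrowMatch`, `skos:relatedMatch`), one common way to express alignments between controlled vocabularies. The use of SKOS, the namespaces and the codes in the test are assumptions for illustration; the report does not prescribe a mapping format.

```python
from __future__ import annotations

# SKOS mapping properties (W3C SKOS Reference)
SKOS_MAPPING_PROPERTIES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def check_mapping(mapping: list[tuple[str, str, str]],
                  source_codes: set[str], target_codes: set[str]) -> list[str]:
    """Step 4: every source code is mapped, and every (source, property, target) triple is well-formed."""
    mapped = {s for s, _, _ in mapping}
    problems = [f"unmapped source code: {c}" for c in sorted(source_codes - mapped)]
    for s, prop, t in mapping:
        if prop not in SKOS_MAPPING_PROPERTIES:
            problems.append(f"{s}: {prop} is not a SKOS mapping property")
        if s not in source_codes:
            problems.append(f"unknown source code: {s}")
        if t not in target_codes:
            problems.append(f"unknown target code: {t}")
    return problems


def to_turtle(mapping: list[tuple[str, str, str]], source_ns: str, target_ns: str) -> str:
    """Step 5: serialise the alignment as SKOS triples in Turtle for publication."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    lines += [f"<{source_ns}{s}> skos:{prop} <{target_ns}{t}> ." for s, prop, t in mapping]
    return "\n".join(lines) + "\n"
```

In SKOS, `skos:exactMatch` is transitive, so it is best reserved for codes that can be used interchangeably; `skos:closeMatch` is the more cautious choice when meanings only overlap.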

Below is a description of the tasks to be performed when a common reference model is used.

1. Identify and analyse related metadata specifications: the working group has an operational responsibility to manage metadata. This includes maintaining the existing metadata and accepting or declining new proposals based on metadata harmonization criteria;

2. Propose reference model: the working group should agree on a common

model to use and define the parties that are involved;

3. Standardize the common model and determine who will manage the

model;

4. Test the model for all users;

5. Publish the metadata alignment.

Decisions:

Decide whether to use a common specification or mappings

In case of mappings, decide whether a metadata specification can be supplemented with official mappings

Decide who manages and is responsible for the harmonized data


3.3.4. Release structural metadata

Goal:

Efficiently deploying changes into the Reference Data Management Component

while protecting the live environment through planning, testing, building and

implementing a grouped set of changes.

Actors and roles:

As with the change management process, the reference data release process is carried out at the governance level of the reference data. The tasks are carried out by the Reference Data Working Group and the steering committee.

Tasks:

When managing releases of reference data and of the tools used to support it, it is good practice to agree on a number of releases per year. This is less necessary where reference data is used in only one system with frequent changes. It is essential, however, in an inter-institutional or trans-European environment because of the impact releases have on that environment. A distinction can be made between minor and major releases. There are two options for deploying changes:

Immediate implementation: each accepted change is scheduled for release individually. This is most likely in a local environment with a high frequency of changes.

Pooled releases: changes are pooled into periodic releases, either minor or major, depending on the impact of the changes individually and as a group.

This document describes the process for pooled releases.

Pool a set of changes into a release

The ITIL framework and good practices identified at the Publications Office indicate that changes can be pooled into periodic releases. This allows users of the reference data to align their Software Development Lifecycles with changes in reference data (see 3.3.5). The periodicity and impact of releases depend on the release type: minor, major or structural. For example, small changes that entail low risks for operational systems can be assigned to minor releases. Minor releases are launched more often than major releases, which bear more risk.
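The pooling rule can be sketched as follows, assuming that each change has already received an impact rating during impact analysis. The ratings ("low", "high", "model"), their mapping to release types and the threshold at which many low-risk changes together justify a major release are all hypothetical values chosen for illustration. The report does not prescribe them.

```python
from enum import IntEnum


class ReleaseType(IntEnum):
    # Ordered from least to most disruptive, so max() picks the riskiest type.
    MINOR = 1
    MAJOR = 2
    STRUCTURAL = 3


# Hypothetical mapping from an assessed impact rating to a release type.
IMPACT_TO_RELEASE = {
    "low": ReleaseType.MINOR,
    "high": ReleaseType.MAJOR,
    "model": ReleaseType.STRUCTURAL,  # changes to the structure of the data model
}

# Hypothetical rule: this many low-impact changes together count as a major release.
GROUP_THRESHOLD = 20


def classify_release(change_impacts):
    """Release type for a pool of changes, judged individually and as a group."""
    if not change_impacts:
        raise ValueError("a release must contain at least one change")
    individual = max(IMPACT_TO_RELEASE[impact] for impact in change_impacts)
    if individual == ReleaseType.MINOR and len(change_impacts) >= GROUP_THRESHOLD:
        return ReleaseType.MAJOR
    return individual


print(classify_release(["low", "low"]).name)   # MINOR
print(classify_release(["low", "high"]).name)  # MAJOR
```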

Testing

Testing, user acceptance and quality assurance considerations have to be

taken into account before a release is deployed into the production

environment of the Reference Data Management Component. Before the

rollout into production is allowed, the user or business owner should sign off

on the release.

Version a release

As indicated in section 3.1 of this study, a key stakeholder requirement is to keep preceding versions of the reference data available in order to ensure backward compatibility. It should be possible to access information that was exchanged in the past with the applicable version of the related reference data. To satisfy this requirement, versions have to be managed in a clear and structured way, and preceding versions should be kept available in the reference data repository. All versions and changes should be well documented (see section 3.3.4.1). An ISA programme study on metadata management in European Institutions and Member States [European Commission, ISA Programme, 2014] identified several aspects of versioning that should be taken into account:

o Numbering

Several options have been identified for version numbering. A first approach, applied by the Inter-institutional Metadata Maintenance Committee (IMMC), is to identify versions by release date combined with a sequence number, e.g. 20140101-0, 20140101-1, etc. The sequence number is mostly used for immediate bug fixes. A second option is a multi-level approach, applied for example by KOOP, a governmental organisation from the Netherlands. In a three-level approach (e.g. X.Y.Z), Z could be incremented for bug fixes, Y for minor updates and X for major updates. The levels could also reflect which class of release cycle a change belongs to, e.g. Z for automated changes without risk.

o Backward compatibility

Backward compatibility means that new versions of the structural metadata should be compatible with preceding versions. Updates to data models should impact the day-to-day operations of their users as little as possible. Therefore, backward compatibility should be taken into account in the update procedure. All updates that are not backward compatible should be clearly documented in the release notes. They should also be accompanied by guidelines for users on how to deal with these changes in their production systems;

o Tool support

Deliverable D4.1 (European Commission, ISA Programme, 2014) listed Apache Subversion (SVN) as a tool for version management. Other versioning tools include the Concurrent Versions System (CVS) and Git; and

o Authoritative Source

It is recommended to select an authoritative source that supports versioning. To foster interoperability, it is crucial that persistent Uniform Resource Identifiers (URIs) are managed properly. The Metadata Registry (MDR) of the Publications Office of the European Union combines best practices for persistent URIs with keeping preceding versions available. The URIs of the Named Authority Lists (NAL) in the MDR refer to the latest version of the structural metadata. To access preceding versions, URIs that include version numbers are used.
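The two numbering schemes described under "Numbering" can be compared in a short sketch. It assumes that dated versions are written exactly as YYYYMMDD-N, as in the IMMC examples. It also assumes that in the X.Y.Z scheme the lower levels are reset to zero when a higher level is incremented. That reset is a common convention (as in Semantic Versioning), not something the report specifies.

```python
import re
from datetime import date


def parse_dated_version(version):
    """Parse an IMMC-style version such as '20140101-1' into a sortable key."""
    match = re.fullmatch(r"(\d{4})(\d{2})(\d{2})-(\d+)", version)
    if match is None:
        raise ValueError(f"not a dated version: {version!r}")
    year, month, day, sequence = map(int, match.groups())
    # Tuples compare by release date first, then by bug-fix sequence number.
    return (date(year, month, day), sequence)


def bump_version(version, change):
    """Increment an X.Y.Z version: 'major' -> X, 'minor' -> Y, 'fix' -> Z."""
    x, y, z = (int(part) for part in version.split("."))
    if change == "major":
        return f"{x + 1}.0.0"
    if change == "minor":
        return f"{x}.{y + 1}.0"
    if change == "fix":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change!r}")


releases = ["20140201-0", "20140101-1", "20140101-0"]
print(sorted(releases, key=parse_dated_version))
# ['20140101-0', '20140101-1', '20140201-0']
print(bump_version("1.4.2", "minor"))  # 1.5.0
```

Parsing matters because plain string sorting breaks once a sequence number reaches 10: "20140101-10" sorts before "20140101-2".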

Publish release notes

Release notes describe the general information about a release: the date of publication, the version number, the URI, the expiration date of the version, etc. They should also include a list of changes compared to the previous version, preferably in a machine-readable format.
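A machine-readable list of changes can be derived by comparing two versions of a code list. The sketch below assumes that each version is available as a simple mapping from code to label, and it emits the release notes as JSON. The field names, the example codes and the choice of JSON are illustrative assumptions. They are not a format defined by the report or by the Publications Office.

```python
import json


def diff_code_lists(previous, current):
    """List codes added, removed and relabelled between two versions (code -> label)."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = sorted(code for code in previous.keys() & current.keys()
                      if previous[code] != current[code])
    return {"added": added, "removed": removed, "modified": modified}


def release_notes(version, published, uri, expires, previous, current):
    """Assemble release notes with general information and a list of changes."""
    notes = {
        "version": version,
        "published": published,
        "uri": uri,            # hypothetical URI of the released version
        "expires": expires,    # None if no expiration date has been set
        "changes": diff_code_lists(previous, current),
    }
    return json.dumps(notes, indent=2, sort_keys=True)


old = {"BE": "Belgium", "NL": "Netherlands", "YU": "Yugoslavia"}
new = {"BE": "Belgium", "NL": "The Netherlands", "RS": "Serbia"}
print(release_notes("20150101-0", "2015-01-01",
                    "http://example.org/authority/country/20150101-0", None, old, new))
```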

Implement release

This phase entails rolling out the release in the Reference Data Management Component.

Communicate to stakeholders

Once the release has been rolled out, the users and other stakeholders should be notified.

Decisions:

Decide on the number and type of releases per year

Agree on the exact content and plan for each release

Determine the release schedule

Decide on the actions to be taken if a release is cancelled

Decide whether a metadata specification can be published under an open licence

3.3.4.1. Document reference data

Goal:

Reference data is data used to classify or categorize other data. Business rules

usually dictate that reference data values conform to one of several allowed values.

The set of allowable data values is a value domain. These business rules and

domains should be well documented for successful interoperability.

Actors and roles:

At least the following roles should be implemented:

Documentation and publication expertise: knowledge about the

documentation rules and publication processes used in the environment in

which the metadata is used.

Content expertise: knowledge about the semantics of the data for which the metadata is used and the applications in which the data is used.

Tasks:

Documenting reference data may include adding descriptive metadata, such as the properties defined in the Asset Description Metadata Schema (ADMS, see 4.2.4):

The meaning and purpose of each reference data value domain

The reference tables and databases where the reference data appears

The source of the data in each table

The version currently available

When the data was last updated

How the data in each table is maintained

Who is accountable for the quality of the data and metadata
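The documentation items listed above can be captured as a simple record. This also makes it easy to see which items are still missing. This is a minimal sketch. The field names and example values are hypothetical and follow the list above rather than the actual ADMS property names described in section 4.2.4.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Dict, List, Optional


@dataclass
class ReferenceDataDocumentation:
    """Descriptive metadata for one reference data value domain (illustrative)."""
    value_domain: str
    purpose: str = ""                                       # meaning and purpose
    appears_in: List[str] = field(default_factory=list)     # tables and databases
    sources: Dict[str, str] = field(default_factory=dict)   # table -> source of its data
    current_version: str = ""
    last_updated: Optional[date] = None
    maintenance: str = ""     # how the data in each table is maintained
    accountable: str = ""     # who is accountable for data and metadata quality

    def missing(self):
        """Names of documentation items that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


doc = ReferenceDataDocumentation(
    value_domain="Country",
    purpose="Countries involved in a notification",   # hypothetical example values
    appears_in=["NOTIFICATION_COUNTRY"],
)
print(doc.missing())
# ['sources', 'current_version', 'last_updated', 'maintenance', 'accountable']
```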

Successful organizations first understand the needs for reference data. They then trace the lineage of this data to identify the original and interim source databases, files, applications, organizations, and even the individual roles that create and maintain the data. By understanding both the upstream sources and the downstream needs, they can capture quality data at its source.


Decisions:

The method used for documenting reference data, how documentation itself will be managed, and whether it is part of the rules of procedure

Who determines and changes the business rules

Which policy is followed to encourage or mandate the reuse of the reference data specification

3.3.5. Deploy structural metadata

Goal:

Efficiently deploying changes into the operational systems of users while protecting

the live environment of their system through planning, testing, building and

implementing a grouped set of changes.

Actors and roles:

The implementation of reference data changes in operational systems is carried out by the users of the reference data. Here, the reference data management lifecycle has a touch point with the software development lifecycle (SDLC).

Tasks:

There are two ways of implementing reference data: automatically (for instance, with the GENIS RDC) or by integrating it into the normal system development lifecycle. The latter is described below as it would apply in an inter-institutional environment.

Detect a change

Users should subscribe to the reference data sets for which they want to receive notifications of upcoming and rolled-out changes. Notifying users of changes is the responsibility of the governance level; subscribing is the responsibility of the users.

Log a system change request

Changes to reference data which have an impact on the operational systems

of users should lead to the creation of a change request in their internal IT

system, which then triggers the internal change management processes.

Analyse the impact of a change on a system

The impact assessment that is part of the Manage Reference Data Changes process is carried out at the level of the reference data owner. It therefore does not take into account the specific characteristics of the users’ internal IT systems. Users thus need to carry out their own impact analysis before implementing changes to their systems.

Test and Propagate a change to the system

A rolled-out change can be grouped with other internal changes to match the Software Development Lifecycle or software release schedule of the user. The propagation of changes should include a testing phase in an isolated environment before the changes are released into production.


Log the change

All changes should be logged and documented for several purposes, such as making it possible to restore a system to a previous state and creating an audit trail.
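The user-side impact analysis described above can be sketched as a scan of local records for codes that a release removes or relabels. The scan produces the summary that would seed a system change request. The record layout, the field name and the change-request fields are hypothetical. A real analysis would also cover application code, reports and interfaces, not only stored data.

```python
def impacted_records(records, code_field, removed, modified):
    """Group local record ids by how a reference data release affects them."""
    impact = {"removed": [], "modified": []}
    for record in records:
        code = record.get(code_field)
        if code in removed:
            impact["removed"].append(record["id"])
        elif code in modified:
            impact["modified"].append(record["id"])
    return impact


def change_request(release_version, impact):
    """Summary for an internal change request, or None if nothing is affected."""
    if not impact["removed"] and not impact["modified"]:
        return None
    return {
        "title": f"Adopt reference data release {release_version}",
        "records_using_removed_codes": len(impact["removed"]),
        "records_using_modified_codes": len(impact["modified"]),
    }


local_cases = [
    {"id": 1, "country": "BE"},
    {"id": 2, "country": "YU"},
    {"id": 3, "country": "NL"},
]
impact = impacted_records(local_cases, "country", removed={"YU"}, modified={"NL"})
print(impact)                          # {'removed': [2], 'modified': [3]}
print(change_request("2.0.0", impact))
```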

Decisions:

Where to store a metadata specification and with which access restrictions

(define roles and responsibilities)

Whether a metadata specification can be supplemented with official

mappings

3.3.6. Retire structural metadata

Goal:

Changes to internal or external reference data sets may be minor or major. For example, country code lists go through minor revisions as the geopolitical landscape changes. When the Soviet Union broke up into many independent states, the term for the Soviet Union was deprecated with an end-of-life date, and new terms were added for the new countries.

Sometimes terms and codes are retired. Retired codes still appear in transactional data, so they cannot simply be removed without breaking referential integrity. The codes found in a data warehouse also represent historical truth. Code tables therefore require effective date and expiration date columns, and application logic must refer to the currently valid codes when establishing new foreign key relationships.12
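The effective and expiration date columns described above can be sketched as follows. The sketch assumes a half-open validity period: a code is valid from its effective date up to, but not including, its expiration date. The dates are illustrative and not taken from ISO 3166. New foreign keys may only use codes that are valid today, while historical records can still resolve retired codes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeRow:
    code: str
    label: str
    effective: date
    expires: Optional[date] = None  # None: the code has not been retired

    def valid_on(self, day):
        # Half-open period: valid from 'effective' up to, not including, 'expires'.
        return self.effective <= day and (self.expires is None or day < self.expires)


def valid_codes(table, day):
    """Codes that application logic may use for new foreign keys on 'day'."""
    return {row.code for row in table if row.valid_on(day)}


def label_for(table, code, day):
    """Resolve a code as it was on 'day'; retired rows stay in the table for this."""
    for row in table:
        if row.code == code and row.valid_on(day):
            return row.label
    raise KeyError(f"{code!r} was not valid on {day}")


# Illustrative dates only.
countries = [
    CodeRow("SU", "Soviet Union", date(1974, 1, 1), expires=date(1992, 1, 1)),
    CodeRow("RU", "Russian Federation", date(1992, 1, 1)),
    CodeRow("UA", "Ukraine", date(1992, 1, 1)),
]
print(sorted(valid_codes(countries, date(2015, 1, 1))))   # ['RU', 'UA']
print(label_for(countries, "SU", date(1990, 6, 1)))        # Soviet Union
```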

Actors and roles:

A proper impact analysis of data deprecation is essential to ensure the continuity

and quality of data and systems. The involvement of all consumers is key. At least

the following roles should be implemented:

Content expertise: knowledge about the semantics of the data for which the metadata is used and the applications in which the data is used.

Information management expertise: knowledge about theory and

practice of harmonisation, e.g. mapping across multiple systems.

Technical expertise: knowledge about the technical approaches to be used

for the technical implementation in the environment in which the metadata

is used.

Documentation and publication expertise: knowledge about the documentation rules and publication processes used in the environment in which the metadata is used.

12 Source: DAMA guide


Tasks:

• Assess the impact of deprecation

• Review for approval

• Approach all consumers of the data

• Clearly mark reference data as deprecated

• Ensure backwards compatibility

Decisions:

• Whether a metadata specification should be deprecated

• How to approach all consumers

• How to ensure backwards compatibility


4. REQUIREMENTS FOR AND ASSESSMENT OF EXISTING REFERENCE DATA TOOLS

In this chapter, we assess the extent to which existing tools, including the GENIS Reference Data Component, cover the identified requirements and the proposed approach. We also evaluate their use, usefulness, and fitness for purpose.

4.1. Stakeholder requests and needs

Below is a list of stakeholder requests and needs for tools. The requirements for these tools are closely related to the governance and management model discussed in section 3.3. In summary, tools are needed for the following:

Reference data editor: edit, harmonize, map and document reference data;

Tools for managing reference data changes: managing changes and releases of reference data;

Tools for reference data propagation: implementing and retiring reference data; and

Tools for reference data publication.

Table 5 – Reference data tools

ID Requests and needs

Reference data editor

T1

Feature list

DG COMP needs a tool that is capable of editing reference data and supporting the design of reference data in the context of one or more information systems. The tool should support tasks in the following processes:

Design reference data;

Manage reference data changes.

The tool should have the following features:

Import reference data from an external source and detect

changes;

Create, read, update, or delete a concept scheme;

Create, read, update, or delete concepts in a concept scheme;

Add multilingual labels to a concept scheme;

Provide a means to define the order of concepts in a concept scheme;

Version concept schemes;

Version concepts;

Version the labels of concepts;

Export one or more versions of a concept scheme.

Tools for managing reference data changes (ticketing/workflow)

T2

Feature list

DG COMP needs a tool that is capable of:


Keeping a log of change requests;

Keeping track of impact analyses;

Keeping a log of decisions on change requests;

Creating release notes;

Linking change requests to release notes and vice-versa; and

Linking change requests to versions and vice-versa.

Tools for reference data propagation

T3

Feature list

The tool should allow users to:

Deploy versioned reference data-as-a-service to an information

system;

Deliver services while disconnected (local cache); and

Provision all versions (full versioning of temporal changes and

language versions).

Tools for publishing a release

T4

Feature list

The tool should provide:

Read access over HTTP(S);

Write access over WebDAV or Subversion.

Tools for reference data harmonisation

T5

Feature list

The tool should provide:

Mapping: a means of mapping concepts in different concept

schemes;

Link discovery: a means of discovering relationships between data items within different Linked Data sources.

4.2. Existing standards for reference data management

This section lists a number of metadata standards that should be supported by

metadata tools:

Standard representations (exchange formats) for reference data, such as SKOS and GeneriCode; and

Standards for documenting metadata specifications, such as ADMS.

4.2.1. Representation: Simple Knowledge Organisation System (SKOS)

SKOS13, the Simple Knowledge Organisation System, is a common data model for sharing controlled vocabularies such as code lists, thesauri, and taxonomies via the Web in a machine-readable format. In the Core Vocabularies14 specifications of the ISA Programme, SKOS is the recommended vocabulary for the representation of code lists. The Publications Office already uses SKOS as the official format of EuroVoc, the EU’s multilingual thesaurus, and of the Named Authority Lists.

13 http://www.w3.org/2004/02/skos/vocabs

14 https://joinup.ec.europa.eu/system/files/project/Core_Vocabularies-Business_Location_Person-Specification-v1.00_0.pdf

SKOS provides a standard way to represent knowledge organization systems using

the Resource Description Framework15 (RDF). Encoding this information in RDF

allows it to be passed between computer applications in an interoperable way.

Using RDF also allows knowledge organization systems to be used in distributed,

decentralised metadata applications. Decentralised metadata is becoming a typical

scenario, where service providers want to add value to metadata harvested from

multiple sources.

SKOS represents the terms in a controlled vocabulary as instances of the class skos:Concept. SKOS also defines properties for multilingual labels (skos:prefLabel), associated codes (skos:notation), and definitions (skos:definition). Publishing controlled vocabularies represented in SKOS on the Web brings the following advantages:

1. De-referencing: the principles of Linked Data require each term in the controlled vocabulary to be identified by a corresponding HTTP-based term URI. For example, the term “Taxonomy” in the “Asset Type” scheme has the following term URI: <http://purl.org/adms/assettype/Taxonomy>. Anyone who encounters such a URI can look up its meaning by entering it in the address bar of a browser. This is called de-referencing: the URI is not only an identifier but also a working reference that can be resolved to a description of the term. This is a simple yet powerful feature of the Web.

2. Machine-readability: in the example of “Taxonomy”, the user can use the term URI to retrieve both machine-readable and human-readable representations of the term. These contain its definitions, labels, and related concepts expressed in SKOS. SKOS is a W3C Recommendation and a commonly used representation format for controlled vocabularies. Well-known thesauri such as EuroVoc have been defined using an ontology that extends SKOS.

3. Multilingualism: SKOS makes it possible to associate labels and definitions in multiple languages with any concept. For example, the labels “taxonomie”@FR, “Taxonomie”@DE, and “taxonomia”@PT can be associated with the concept identified by the URI http://purl.org/adms/assettype/Taxonomy to provide its French, German, and Portuguese labels.

4. Metadata alignment: SKOS provides mapping properties like

skos:closeMatch, skos:exactMatch, skos:broadMatch, skos:narrowMatch and

skos:relatedMatch. These properties are used to state mapping alignment

links between SKOS concepts in different concept schemes, where the links

are inherent in the meaning of the linked concepts.

a. The properties skos:broadMatch and skos:narrowMatch are used to state a hierarchical mapping link between two concepts.

b. The property skos:relatedMatch is used to state an associative mapping link between two concepts.

c. The property skos:closeMatch is used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications. In order to avoid possibilities of "compound errors" when combining mappings across more than two concept schemes, skos:closeMatch is not declared to be a transitive property.

d. The property skos:exactMatch is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications. skos:exactMatch is a transitive property, and is a sub-property of skos:closeMatch.

15 http://www.w3.org/RDF/
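The sketch below makes the SKOS properties concrete. It writes one concept as Turtle, a textual RDF syntax, using the “Taxonomy” term URI and the labels from the examples above. It is deliberately hand-rolled to stay dependency-free. The concept scheme URI, the notation "TAX" and the exactMatch target are hypothetical, and the string escaping only covers backslashes, quotes and newlines. In practice an RDF library would normally handle serialisation.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def literal(text, lang=None):
    """Turtle string literal with minimal escaping and an optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def concept_to_turtle(uri, scheme, notation, labels, exact_matches=()):
    """Serialise one skos:Concept; labels maps language tag -> preferred label."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        "",
        f"<{uri}> a skos:Concept ;",
        f"    skos:inScheme <{scheme}> ;",
        f"    skos:notation {literal(notation)} ;",
    ]
    for lang, label in sorted(labels.items()):
        lines.append(f"    skos:prefLabel {literal(label, lang)} ;")
    for target in exact_matches:
        lines.append(f"    skos:exactMatch <{target}> ;")
    lines[-1] = lines[-1][:-1] + "."  # the last statement ends with '.', not ';'
    return "\n".join(lines)


print(concept_to_turtle(
    "http://purl.org/adms/assettype/Taxonomy",
    "http://example.org/scheme/assettype",                       # hypothetical
    "TAX",                                                        # hypothetical
    {"fr": "taxonomie", "de": "Taxonomie", "pt": "taxonomia"},
    exact_matches=["http://example.org/other-scheme/taxonomy"],  # hypothetical
))
```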

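SKOS is an extensible vocabulary. One popular extension is SKOS-XL (SKOS eXtension for Labels). It allows labels to be described as resources in their own right, so that additional information can be attached to the labels themselves.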

4.2.2. Representation: GeneriCode

The OASIS Code List Representation format, GeneriCode16, is a single model and

XML format (with a W3C XML Schema) that can encode a broad range of code list

information. The XML format is designed to support interchange or distribution of

machine-readable code list information between systems.
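A minimal GeneriCode-style document can be produced and read back with the Python standard library, as sketched below. The element names follow the general layout of GeneriCode 1.0: identification, a column set with a key, and a simple code list of rows. However, the output has not been validated against the OASIS schema and may omit elements the schema requires, such as canonical URIs. The short name, version and codes are illustrative.

```python
import xml.etree.ElementTree as ET

GC_NS = "http://docs.oasis-open.org/codelist/ns/genericode/1.0/"
ET.register_namespace("gc", GC_NS)


def to_genericode(short_name, version, rows):
    """Build a simplified GeneriCode-style code list with 'code' and 'name' columns."""
    root = ET.Element(f"{{{GC_NS}}}CodeList")
    ident = ET.SubElement(root, "Identification")
    ET.SubElement(ident, "ShortName").text = short_name
    ET.SubElement(ident, "Version").text = version
    column_set = ET.SubElement(root, "ColumnSet")
    for column_id in ("code", "name"):
        column = ET.SubElement(column_set, "Column", Id=column_id, Use="required")
        ET.SubElement(column, "ShortName").text = column_id
        ET.SubElement(column, "Data", Type="normalizedString")
    key = ET.SubElement(column_set, "Key", Id="codeKey")
    ET.SubElement(key, "ShortName").text = "codeKey"
    ET.SubElement(key, "ColumnRef", Ref="code")
    code_list = ET.SubElement(root, "SimpleCodeList")
    for code, name in rows:
        row = ET.SubElement(code_list, "Row")
        for column_id, value in (("code", code), ("name", name)):
            cell = ET.SubElement(row, "Value", ColumnRef=column_id)
            ET.SubElement(cell, "SimpleValue").text = value
    return ET.tostring(root, encoding="unicode")


def read_codes(xml_text):
    """Read the code -> name pairs back from the SimpleCodeList rows."""
    root = ET.fromstring(xml_text)
    codes = {}
    for row in root.iter("Row"):
        cells = {cell.get("ColumnRef"): cell.findtext("SimpleValue")
                 for cell in row.iter("Value")}
        codes[cells["code"]] = cells["name"]
    return codes


document = to_genericode("Countries", "1.0", [("BE", "Belgium"), ("NL", "Netherlands")])
print(read_codes(document))  # {'BE': 'Belgium', 'NL': 'Netherlands'}
```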

4.2.3. Representation: Using HTTP URIs to identify concept schemes and concepts

To facilitate its sharing and reuse across systems and organisations, structural metadata needs persistent unique identifiers. In the era of the Web of Data, it is recommended that such identifiers take the form of HTTP URIs. The ISA Programme and the W3C have created good practices and guidelines for the design and management of well-formed, persistent URIs [European Commission - ISA Programme, 2012], e.g. ISA’s 10 Rules for Persistent URIs17.
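The MDR practice described in section 3.3.4 can be sketched as a small URI-minting and redirect rule. Under that practice, unversioned URIs refer to the latest version, while URIs that include a version number refer to a preceding one. The base URI and the path pattern {scheme}/{version}/{code} are hypothetical and do not reproduce the actual MDR URI design.

```python
BASE = "http://data.example.org/authority"  # hypothetical base URI


def concept_uri(scheme, code, version=None):
    """Unversioned URIs denote the latest version; versioned URIs never change."""
    if version is None:
        return f"{BASE}/{scheme}/{code}"
    return f"{BASE}/{scheme}/{version}/{code}"


def resolve(uri, latest_versions):
    """Target a persistent-URI service could redirect a concept URI to."""
    prefix = BASE + "/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a concept URI under {BASE}: {uri}")
    parts = uri[len(prefix):].split("/")
    if len(parts) == 2:   # {scheme}/{code}: follow the current release
        scheme, code = parts
        return concept_uri(scheme, code, latest_versions[scheme])
    if len(parts) == 3:   # {scheme}/{version}/{code}: already pinned
        return uri
    raise ValueError(f"unexpected URI shape: {uri}")


latest = {"country": "20150101-0"}
print(resolve(concept_uri("country", "BE"), latest))
# http://data.example.org/authority/country/20150101-0/BE
pinned = concept_uri("country", "BE", "20140101-1")
print(resolve(pinned, latest) == pinned)  # True
```

Records that must remain reproducible, such as exchanged documents, would store the pinned URI. Applications that always need current data would use the unversioned one.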

4.2.4. Description: Asset Description Metadata Schema (ADMS)

The Asset Description Metadata Schema (ADMS) is a common vocabulary for

descriptive metadata, used to describe interoperability solutions. ADMS is currently

a W3C Working Group Note18.

ADMS is intended as a model that facilitates federation and co-operation. Like

DCAT, ADMS has the concepts of a repository, assets within the repository that are

often conceptual in nature, and accessible realizations of those assets, known as

distributions. ADMS is an RDF vocabulary with an RDF schema available at its namespace http://www.w3.org/ns/adms. The original ADMS specification published by the European Commission [ADMS1] includes an XML schema that also defines all the controlled vocabularies and cardinality constraints associated with the original document.

16 http://docs.oasis-open.org/codelist/ns/genericode/1.0/

17 https://joinup.ec.europa.eu/community/semic/document/10-rules-persistent-uris/

18 http://www.w3.org/TR/vocab-adms/

ADMS allows users to:

• “describe semantic assets in a common way so that they can be seamlessly

cross-queried and discovered by ICT developers from a single access point,

such as Joinup;

• search, identify, retrieve, compare semantic assets to be reused avoiding

duplication and expensive design work through a single point of access;

• keep their own system for documenting and storing semantic assets;

• improve indexing and visibility of their own assets;

• Link semantic assets to one another in cross-border and cross-sector

settings.”

Regardless of how reference data is stored, additional descriptive metadata can greatly improve its reusability and transparency. Descriptive metadata about reference data sets may document:

The meaning and purpose of each reference data value domain.

The reference tables and databases where the reference data appears.

The source of the data in each table.

The version currently available.

The last modification date.

The way the data is maintained.

The person accountable for the quality of the data and metadata.

The main limitation of ADMS is that it treats structural metadata as a black box. It can be used to describe a data model or a reference dataset as a whole, but not particular elements within that data model or reference dataset, or at least this is not its purpose. In such cases, the use of other standards is recommended, such as the ISO/IEC 11179 standard on metadata registries.

4.3. Existing tools for reference data management

4.3.1. Publication: Joinup

In this context, the main value of Joinup is as an online collaborative platform. The

Joinup platform was developed by the ISA programme of the European

Commission for releasing and documenting specifications for structural metadata

such as ontologies, data models, code lists, XML schemas, reference data, etc.

Publishing reference data on Joinup allows users to easily find the data, download it

and provide feedback.

Joinup offers the following features that support the release and publication of

structural metadata:

WebDAV;

Subversion;

Release management; and

ADMS editor and ADMS-conformant publication.


An example of a structural metadata specification that uses Joinup as repository is

OSLO – Open Standards for Local Authorities. The OSLO project has created the

following permanent (persistent) URIs using the purl.org service:

http://purl.org/oslo

>> redirects to >> https://joinup.ec.europa.eu/node/66650

http://purl.org/oslo/ns/vocabulary

>> redirects to >> https://joinup.ec.europa.eu/svn/adms/CESAR/V-ICT-OR_OSLO/OSLO_v1.00_XML_Schemas.zip

These permanent URIs can be configured to forward requests to any location. This gives the OSLO project the flexibility to refer to its specifications using the permanent URLs. Currently, requests are forwarded to Joinup. The specifications themselves are stored in a Subversion repository, which is also accessible through HTTP. The Joinup ADMS editor was used to create a description of the structural metadata. This descriptive metadata is available in both human-readable form (HTML) and machine-readable form (RDF/XML):

https://joinup.ec.europa.eu/node/66650

4.3.2. Publication: Metadata Registry of the Publications Office (MDR)

The Metadata Registry (MDR) of the Publications Office19 of the EU is the

authoritative source for definition data – metadata elements, named authority lists,

schemas, etc. – and authority data used for exchanging data between institutions

involved in the legal decision making process. Many of the definition data sets

contained in the MDR are governed by the Inter-Institutional Metadata Maintenance

Committee (IMMC).

The Publications Office uses a tool chain and some scripts to edit the Named Authority Lists. For each NAL, the Publications Office publishes a set of distributions that can be downloaded from the MDR website. Each set comprises a SKOS, XML, XSD and HTML version.

A publication package is also available as a zip file. It contains the distributions of the changed NALs (XML, SKOS, ATTO-XML20), a comparison file that identifies the differences between the previous and the current version, and the release notes listing the changes to the NALs included in the publication.

4.3.3. Editor / Propagation: GENIS Reference Data Component (GENIS RDC)

In the context of the Generic Interoperable Notification Services (GENIS) project,

funded under Action 1.11 of the ISA programme, a GENIS Reference Data

Component (GENIS RDC) was built. The GENIS RDC has the following features:

19 http://publications.europa.eu/mdr/

20 http://publications.europa.eu/mdr/authority/

Import reference data from a file;

Create, read, update, delete reference data using the Web-based graphical

user interface;

Export reference data to a file;

Deploy reference data as a service to clients.

The GENIS RDC is organised around the following concepts:

Project: Reference data is categorised in projects. The SANI2 project for

example contains the reference data which is linked to the State Aid

Notification Infrastructure.

Group: projects have zero, one or more groups. Groups represent concept

schemes, for example a country code list.

Reference data entity: each group consists of reference data entities. By assigning different start and end dates to a reference data entity in different projects, each system can access the version that is relevant to it. For example, the HR system might need Serbia as part of its reference data while DG COMP might not yet need it in its system.

Representation: reference data entities can have one or more

representations (e.g. alpha-3 and alpha-2 codes for countries).

Ordering: groups can have one or more orderings for the reference data entities they include.

The Component supports versioning of the reference data at the group and reference data entity levels. Clients can consume reference data according to a timestamp. This allows the Component to serve reference data as it was available at any point in time in the past. For example, when a user fills in a notification form, the form component stores the form together with the codes of the reference data items. When the form is opened at a later stage, it appears with the reference data labels that were available at the time the form was submitted.
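The behaviour described above can be illustrated with a small in-memory model. Project-specific validity periods decide which entities a system sees, and a label history lets a form be shown with the labels that applied when it was submitted. This is a hypothetical sketch of the idea, not the GENIS RDC data model or API. The "HR" project, the dates and the "Kingdom of Belgium" label are invented for the example, and only one language is modelled.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple

Period = Tuple[date, Optional[date]]  # (start, end); end None means open-ended


@dataclass
class ReferenceDataEntity:
    code: str
    label_history: List[Tuple[date, str]]                      # (valid from, label)
    validity: Dict[str, Period] = field(default_factory=dict)  # project -> period

    def valid_for(self, project, at):
        period = self.validity.get(project)
        if period is None:
            return False  # the entity is not (yet) used in this project
        start, end = period
        return start <= at and (end is None or at < end)

    def label_at(self, at):
        """Most recent label that had taken effect at the given moment."""
        earlier = [(since, label) for since, label in self.label_history if since <= at]
        return max(earlier)[1] if earlier else self.code


def group_as_of(entities, project, at):
    """Code -> label of a group, as the given project saw it at time 'at'."""
    return {e.code: e.label_at(at) for e in entities if e.valid_for(project, at)}


countries = [
    ReferenceDataEntity("BE", [(date(2000, 1, 1), "Belgium"),
                               (date(2014, 1, 1), "Kingdom of Belgium")],
                        {"SANI2": (date(2000, 1, 1), None), "HR": (date(2000, 1, 1), None)}),
    ReferenceDataEntity("RS", [(date(2006, 6, 1), "Serbia")],
                        {"HR": (date(2006, 6, 1), None)}),  # not (yet) needed in SANI2
]
form_submitted = date(2013, 6, 1)
print(group_as_of(countries, "SANI2", form_submitted))  # {'BE': 'Belgium'}
print(group_as_of(countries, "HR", date(2015, 1, 1)))
```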

According to a presentation on the GENIS RDC delivered by DG COMP, the software supports the following main features for managing reference data, metadata and enterprise master data:

Multi-tenancy: the software is designed to run as a single instance on a server while serving multiple client organisations. The Component categorises reference data in projects, to which users and managers are assigned;
Graph data: the domain model of the reference data in the tool includes Project, Group, Reference Data Item, Representation, Order and Tag entities. Ownership, lifecycle, protection and data segments are defined for each entity;
Versioning: versioning is carried out at the Group entity and Reference Data Entity level. These entities can be assigned project-specific start and end dates;
Data staging: various import and export formats, such as XML and CSV, are supported. Import and export are script-driven, so they can be adapted to the specifications of different systems;


Decoupling: clients of the Reference Data Management Component operate entirely on locally cached data. Because the tool is decoupled from its clients, an outage of the Component does not interrupt the client's system;
Multilingualism: each reference data item can have labels in any language;
Deployment: the Component can be deployed as a service, integrated into an application, as a standalone application or as a proxy; and
Notification: users of the Reference Data Management Component are notified of imports, manual data changes and changes to cross-referenced data.
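The decoupling principle can be illustrated with a client that keeps a local copy of the reference data and continues to serve it when the Component is unreachable. This is a sketch under assumptions: `fetch` stands for whatever call retrieves data from the Component (web service, API or plugin), and `CachedReferenceDataClient` and `fake_fetch` are hypothetical names.

```python
from typing import Callable, Dict, Optional

Snapshot = Dict[str, str]  # code -> label

class CachedReferenceDataClient:
    """Client that serves reference data from a local copy (hypothetical class)."""

    def __init__(self, fetch: Callable[[str], Snapshot]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Snapshot] = {}

    def refresh(self, group: str) -> bool:
        """Try to update the local copy; return False if the Component is unreachable."""
        try:
            self._cache[group] = dict(self._fetch(group))
            return True
        except ConnectionError:
            # Outage: keep serving the previously cached snapshot
            return False

    def label(self, group: str, code: str) -> Optional[str]:
        # Lookups only read the local copy, never the Component itself
        return self._cache.get(group, {}).get(code)

# Usage with a simulated outage
service = {"up": True}

def fake_fetch(group: str) -> Snapshot:
    if not service["up"]:
        raise ConnectionError("reference data service unreachable")
    return {"BE": "Belgium", "RS": "Serbia"}

client = CachedReferenceDataClient(fake_fetch)
client.refresh("countries")
service["up"] = False
print(client.refresh("countries"))      # False
print(client.label("countries", "BE"))  # Belgium
```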

The governance process currently adopted by DG COMP for the Reference Data Management Component involves three roles:

Administrator: users in the Administrator role can create and delete projects, groups, project managers, standard users, etc.;
Project Manager: the Project Manager can create Standard Users and give them access to specific projects; and
Standard User: a Standard User is assigned to one or more projects and, when logged in, has access to those projects.
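The three roles can be expressed as a small permission table. The permission names, the `User` class and the `can` function are hypothetical, and the sketch assumes that Project Managers and Standard Users act only within the projects they are assigned to, and that Standard Users only read data; the description above does not spell this out.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set

# Hypothetical permission names derived from the role descriptions above
PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "administrator": frozenset({
        "create_project", "delete_project", "create_group", "delete_group",
        "create_project_manager", "delete_project_manager",
        "create_user", "delete_user", "read_data",
    }),
    "project_manager": frozenset({"create_user", "grant_project_access", "read_data"}),
    "standard_user": frozenset({"read_data"}),
}

@dataclass
class User:
    name: str
    role: str
    projects: Set[str] = field(default_factory=set)

def can(user: User, action: str, project: str) -> bool:
    """Check whether `user` may perform `action` in `project`."""
    if action not in PERMISSIONS.get(user.role, frozenset()):
        return False
    # Assumption: administrators act on all projects, other roles only on assigned ones
    return user.role == "administrator" or project in user.projects

manager = User("manager1", "project_manager", {"SANI2"})
print(can(manager, "grant_project_access", "SANI2"))  # True
print(can(manager, "grant_project_access", "HR"))     # False
```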

The Reference Data Component by DG COMP is intended solely for reference data and allows for a clear distinction between reference data and the business logic, which resides in the application layer. To this end, the building block can be reused by other systems as a plugin, via web services, via an API, or using a dedicated client. It is currently designed for use within DG COMP and would need further work before it can be made available as a generic solution for interoperability.

4.3.4. Editor: VocBench

VocBench²¹ is a web-based editing and workflow tool for managing thesauri, authority lists and glossaries based on SKOS and RDF. The tool was developed by the Food and Agriculture Organization of the United Nations (FAO). VocBench supports collaborative editing, multilingual terminologies and administration functions that allow assigning roles for maintenance, validation and quality assurance.

The Publications Office of the European Union uses VocBench to manage the EuroVoc thesaurus.

4.3.5. Editor: PoolParty: Thesaurus Management

PoolParty Thesaurus Server²² is a software tool for creating and maintaining taxonomies, thesauri, ontologies and knowledge graphs. The tool manages metadata based on standards like RDF and SKOS. Code lists can be designed via the graphical interface or by importing existing lists in formats such as XML or Excel. Moreover, the tool carries out automatic quality checks based on SKOS.

For system integration purposes, PoolParty provides an API based on SPARQL, the standard query language for RDF.

21 http://aims.fao.org/tools/vocbench-2; http://vocbench.uniroma2.it/
22 http://www.poolparty.biz/portfolio-item/poolparty-thesaurus-server/
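To illustrate what SPARQL-based integration involves, the sketch below builds (but does not send) a SPARQL 1.1 Protocol query-via-GET request that lists the concepts of a SKOS concept scheme with their English preferred labels. The endpoint URL and scheme URI are hypothetical placeholders; the actual endpoint path depends on the PoolParty installation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Lists the concepts of one scheme with their English preferred labels
QUERY_TEMPLATE = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept skos:inScheme <{scheme}> ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}}
ORDER BY ?label"""

def build_concept_query(endpoint: str, scheme_uri: str) -> Request:
    """Build (without sending) a SPARQL 1.1 Protocol query-via-GET request."""
    query = QUERY_TEMPLATE.format(scheme=scheme_uri)
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_concept_query("https://thesaurus.example.org/sparql",  # hypothetical
                              "http://example.org/scheme/countries")   # hypothetical
print(request.get_method())  # GET
# urllib.request.urlopen(request) would send it and return JSON results
```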

4.3.6. Editor: Silk Workbench (link discovery)

Silk Workbench²³ is a web application that guides the user through the process of creating a link specification for interlinking two data sources.

The Silk Workbench provides the following components:

• Workspace Browser: enables the user to browse the projects in the workspace. Linking tasks can be loaded from a project and committed back to it later.
• Linkage Rule Editor: a graphical editor that enables the user to create and edit link specifications. The widget shows the current link specification in a tree view and allows editing using drag-and-drop.
• Evaluation: allows the user to execute the current link specification. The links are displayed as they are generated on the fly. For generated links whose correctness is not specified by the reference link set, the user may confirm or decline their correctness. The user may request detailed summaries of how the similarity score of specific links is composed.
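Silk link specifications combine similarity metrics, thresholds and aggregations; the sketch below reproduces only the basic idea with a single metric. It uses Python's `difflib.SequenceMatcher` ratio as a stand-in for Silk's own string metrics (such as Levenshtein distance), with a hypothetical threshold of 0.85 and made-up URIs. The returned scores play the role of the similarity scores that the user confirms or declines in the Evaluation view.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

def normalise(label: str) -> str:
    """Lower-case and collapse whitespace before comparing labels."""
    return " ".join(label.lower().split())

def discover_links(source: Dict[str, str], target: Dict[str, str],
                   threshold: float = 0.85) -> List[Tuple[str, str, float]]:
    """Propose (source URI, target URI, score) links whose labels are similar enough."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalise(s_label), normalise(t_label)).ratio()
            if score >= threshold:
                links.append((s_uri, t_uri, round(score, 2)))
    # Highest scores first, so the user can confirm or decline each candidate
    return sorted(links, key=lambda link: link[2], reverse=True)

local = {"urn:local:1": "Czech Republic", "urn:local:2": "Belgium"}
authority = {"urn:mdr:CZE": "Czech republic", "urn:mdr:BEL": "Belgium",
             "urn:mdr:BGR": "Bulgaria"}
for link in discover_links(local, authority):
    print(link)
```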

4.3.7. Workflow Management tool: Activiti

Activiti²⁴ is an open source tool that aims to serve the Business Process Management (BPM) needs of both business people and IT developers. The tool supports designing and graphically authoring workflow processes (e.g. in BPMN), and provides task management features such as creating, assigning or temporarily delegating tasks to users.

It can run in embedded, standalone or client/server mode. Its engine is written in Java, so it can call out to native Java code, which makes it a good choice for a dedicated workflow component in an (existing) Java platform.

4.3.8. Change management: Atlassian JIRA

Atlassian JIRA²⁵ is an online issue tracking system that supports organising and following up on issues, assigning work packages and monitoring team activity. JIRA can be used to follow up on change requests and to support the development and maintenance of reference data.

23 https://www.assembla.com/spaces/silk/wiki/Silk_Workbench
24 http://activiti.org/userguide/index.html#N10007
25 https://www.atlassian.com/software/jira

4.3.9. Deployment: Mule

Enterprise Service Buses (ESBs) are universal connectors: they transform, route and augment messages securely and can notify subscribed clients. For this reason, they are widely used for integration purposes. Mule is an open source ESB written in Java.

4.3.10. Editor / Deployment: Jena

Apache Jena is a free and open source Java framework for building Semantic Web applications. It is composed of different APIs for interacting with RDF data. These APIs span from core RDF processing to inferring knowledge and building ontologies.

4.4. Domain model

Figure 4 shows the domain model, a logical metadata model that we use to describe the reference data management processes. The domain model is a conformant subset of the SKOS-XL standard.

The domain model consists of the following classes [Miles & Bechhofer, 2009]:

Concept Scheme: a SKOS concept scheme can be viewed as an aggregation of one or more SKOS concepts. Semantic relationships (links) between those concepts may also be viewed as part of a concept scheme. This definition is, however, meant to be suggestive rather than restrictive, and there is some flexibility in the formal data model of the SKOS Reference. The notion of a concept scheme is useful when dealing with data from an unknown source, and when dealing with data that describes two or more different knowledge organization systems.

Concept: a SKOS concept can be viewed as an idea or notion; a unit of thought. However, what constitutes a unit of thought is subjective, and this definition is meant to be suggestive, rather than restrictive. The notion of a SKOS concept is useful when describing the conceptual or intellectual structure of a knowledge organization system (KOS), and when referring to specific ideas or meanings established within a KOS.

Label: a lexical label is a string of Unicode characters, such as "romantic love" or "れんあい", in a given natural language, such as English or Japanese (written here in hiragana). The Simple Knowledge Organization System provides some basic vocabulary for associating lexical labels with resources of any type. In particular, SKOS enables a distinction to be made between the preferred, alternative and "hidden" lexical labels for any given resource. The preferred and alternative labels are useful when generating or creating human-readable representations of a knowledge organization system. These labels provide the strongest clues as to the meaning of a SKOS concept. The hidden labels are useful when a user is interacting with a knowledge organization system via a text-based search function. The user may, for example, enter mis-spelled words when trying to find a relevant concept. If the mis-spelled query can be matched against a hidden label, the user will be able to find the relevant concept, but the hidden label won't otherwise be visible to the user (so further mistakes aren't encouraged).


Ordered Collection: SKOS concept collections are labelled and/or ordered groups of SKOS concepts. Collections are useful where a group of concepts shares something in common, and it is convenient to group them under a common label, or where some concepts can be placed in a meaningful order.

The domain model consists of the following relationships:

Broader and narrower: SKOS semantic relations are links between SKOS concepts, where the link is inherent in the meaning of the linked concepts. The Simple Knowledge Organization System distinguishes between two basic categories of semantic relation: hierarchical and associative. A hierarchical link between two concepts indicates that one is in some way more general ("broader") than the other ("narrower"). An associative link between two concepts indicates that the two are inherently "related", but that one is not in any way more general than the other.
prefLabel: the preferred label (as an entity);
memberList: skos:memberList is a functional property, i.e., it does not have more than one value. This is intended to capture within the SKOS data model that it doesn't make sense for an ordered collection to have more than one member list.

The domain model consists of the following attributes:

URI: identifies concepts in a unique way;
prefLabel: multilingual label attributed to a concept. Per language, only one preferred label can be defined;
Notation: lexical code used to uniquely identify a concept within a concept scheme; and
Definition: skos:definition provides a plain text definition of a concept.

Figure 4 – UML Static Diagram: Domain Model for reference data (based on SKOS-XL). Classes recovered from the diagram: ConceptScheme (URI[1], name[0..*]); Concept (URI[1], notation[1], prefLabel[0..*], altLabel[0..*], definition[0..*], example[0..*], validFrom[1], validTil[1]); OrderedCollection (URI[1], name[0..*]); Label (URI[1], literalForm[0..*], validFrom[1], validTil[1]). Associations: hasTopLevelConcept, inScheme, broader, narrower, prefLabel, memberList <<Ordered>>.
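The main classes of Figure 4 can be mirrored in a few data classes. The sketch below is illustrative only (the Python names and example URIs are not part of SKOS or SKOS-XL, and treating a missing validTil as "still valid" is an assumption); it enforces two constraints stated in this section: at most one preferred label per language, and a notation that is unique within its concept scheme.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

@dataclass
class Label:  # corresponds to skosxl:Label
    uri: str
    literal_form: str
    language: str
    valid_from: date
    valid_til: Optional[date] = None  # assumption: None means "still valid"

@dataclass
class Concept:
    uri: str
    notation: str
    pref_labels: Dict[str, Label] = field(default_factory=dict)  # language -> label
    alt_labels: List[Label] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)  # URIs of broader concepts

    def add_pref_label(self, label: Label) -> None:
        # Per language, only one preferred label may be defined
        if label.language in self.pref_labels:
            raise ValueError(f"{self.uri} already has a prefLabel in '{label.language}'")
        self.pref_labels[label.language] = label

@dataclass
class ConceptScheme:
    uri: str
    concepts: Dict[str, Concept] = field(default_factory=dict)  # notation -> concept

    def add_concept(self, concept: Concept) -> None:
        # A notation identifies a concept uniquely within its scheme
        if concept.notation in self.concepts:
            raise ValueError(f"notation '{concept.notation}' already used in {self.uri}")
        self.concepts[concept.notation] = concept

scheme = ConceptScheme("http://example.org/scheme/country")
bel = Concept("http://example.org/country/BEL", "BEL")
bel.add_pref_label(Label("http://example.org/label/BEL-en", "Belgium", "en", date(2000, 1, 1)))
bel.add_pref_label(Label("http://example.org/label/BEL-fr", "Belgique", "fr", date(2000, 1, 1)))
scheme.add_concept(bel)
```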

The domain model, too, needs version control. A concept scheme should be versioned so that its relevance and value are known. Versions are updated according to the principles of release management. Minor changes, for instance changes to examples, should not result in a new version; major changes are changes to the ConceptScheme, which affect the environment that uses it and should therefore result in a new version. We therefore recommend versioning the ConceptScheme, but not individual Concepts and Labels.
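The release rule can be sketched as a function that inspects the fields touched by a set of changes and decides whether the ConceptScheme gets a new version. The representation of a change as a field name, the integer version numbers and the choice to treat only `example` edits as minor are assumptions drawn from the single example given above.

```python
from typing import Iterable, Set, Tuple

MINOR_FIELDS: Set[str] = {"example"}  # assumption: only example edits count as minor

def next_scheme_version(current: int, changed_fields: Iterable[str]) -> Tuple[int, bool]:
    """Return (version, is_new_version) for a ConceptScheme after a set of changes."""
    fields = set(changed_fields)
    # Any change outside the minor fields is a major change: bump the scheme version
    if fields and not fields <= MINOR_FIELDS:
        return current + 1, True
    return current, False

print(next_scheme_version(3, ["example"]))               # (3, False)
print(next_scheme_version(3, ["prefLabel", "example"]))  # (4, True)
```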

4.5. Data flow diagram

It is understood that the Reference Data Building Block could be designed to operate within an environment that starts with an external authentic source (e.g. the Publications Office for country codes) and ends with that data being used as reference data in operational databases such as that of GENIS. The authoritative source can be managed via Joinup; changes are managed and logged with the aid of the aforementioned tools; GENIS propagates the data; and finally the changes are followed through in the operational systems. Figure 5 illustrates this understanding, on which the use cases below build.

Figure 5 – Simplified DFD for the flow of data between the authentic source and GENIS

4.6. High-level use cases

This section lists high-level use cases that a tool (or a combination of tools) needs to support across the reference data management lifecycle.


Figure 6 – High-level use cases for metadata management

4.6.1. Use Case 0 – Edit an authentic source of reference data

DG COMP needs to manage the relationship between its reference data and the authentic source. Although DG COMP may itself own reference data that constitutes an authentic source, the creation of reference data is out of scope for this study.

4.6.2. Use Case 1 – Detect reference data changes

The system needs to be able to detect reference data changes that happen at an authentic source and notify the actor.

ID: Detect reference data changes
Goal: Detect and identify the changes to an authentic source of reference data
Preconditions: Authentic reference data linked to a Concept Scheme is available in an authentic source and accessible via an HTTP request over a (persistent) HTTP URI. Context-specific reference data is available in the tool and associated with the authentic reference data.
Success End Condition: The tool produces a list of HTTP URIs of Concepts, per ConceptScheme, for which a change has occurred.
Failed End Condition: Authentic reference data is either unavailable or incorrect
Primary Actor: Editor
Secondary Actors: Authoritative source – submit feedback
Priority:
Performance:
Frequency: Ad hoc
Trigger: Periodic check or at the request of the reference data editor.
Other Description:
Basic flow:
1. The Editor creates a local, context-specific Concept Scheme, for example, a list of country codes.
2. The Editor populates the context-specific Concept Scheme from an authentic source. This is done by configuring a persistent URI for the associated authentic source of reference data. For example, the countries NAL from the MDR: http://publications.europa.eu/mdr/resource/authority/country/skos/countries-skos.rdf
3. The Editor requests the system to detect changes between the local, context-specific Concept Scheme and the authentic source of reference data; or the system periodically (e.g. every day) triggers the detection of changes.
4. The system retrieves the latest version of the authentic source of reference data. The system produces a list of differences between the local and the authentic reference data.
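Step 4 (producing the list of differences) could be sketched as follows, assuming both the local and the authentic concept schemes have already been loaded into dictionaries that map concept HTTP URIs to their preferred labels per language. Retrieving and parsing the authentic source (e.g. the countries NAL in SKOS/RDF) is left out; the function name and data layout are hypothetical, and the MDR-style URIs and labels are illustrative only.

```python
from typing import Dict, List

LabelsByLang = Dict[str, str]  # language tag -> preferred label

def diff_schemes(local: Dict[str, LabelsByLang],
                 authentic: Dict[str, LabelsByLang]) -> Dict[str, List[str]]:
    """Concept URIs added, removed or changed at the authentic source."""
    added = sorted(authentic.keys() - local.keys())
    removed = sorted(local.keys() - authentic.keys())
    changed = sorted(uri for uri in local.keys() & authentic.keys()
                     if local[uri] != authentic[uri])
    return {"added": added, "removed": removed, "changed": changed}

C = "http://publications.europa.eu/resource/authority/country/"  # MDR-style, illustrative
local = {C + "BEL": {"en": "Belgium"}, C + "SCG": {"en": "Serbia and Montenegro"}}
authentic = {C + "BEL": {"en": "Belgium", "fr": "Belgique"},
             C + "SRB": {"en": "Serbia"}, C + "MNE": {"en": "Montenegro"}}
print(diff_schemes(local, authentic))
```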

4.6.3. Use Case 2 – Manage reference data changes

DG COMP needs to manage how a validated change at an authentic source enters the production environment of a particular information system at DG COMP. It is assumed here that reference data is trusted and propagated to the various information systems (e.g. GENIS, SARI and eventually other DG COMP systems outside the State-aid domain) once it is in the Reference Data Building Block.

ID: Manage reference data changes
Goal: Changes to reference data are made in a controlled environment, ensuring continuity and quality
Scope and Level:
Preconditions: The system has produced a list of differences between the local and the authentic reference data.
Success End Condition: Each change on the difference list has been fully treated:
- it has either been applied to the local reference data; or
- the difference has been discarded.
Each difference is automatically logged in a ticketing system.
Failed End Condition: Change has been denied due to semantic or syntactic errors
Primary Actor: Editor
Secondary Actors: Stakeholders – submit feedback
Priority:
Performance:
Frequency: Either:
Ad hoc: when receiving feedback from users and/or when (new) legal obligations arise; or
Periodic: when changes are pooled for a planned release.
Trigger: Request for change
Other Description:
Basic flow:
1. The editor has received a list of differences between the local and the authentic reference data. This list needs to be logged as a change request.
2. The editor assesses the impact of the differences on the local dataset.
3. The change is approved after a check on design rules, semantics and syntax.
4. The editor adds or edits, for instance, a country code in the Concept Scheme and defines it.
5. The editor then plans the release of the new version to all consumers, and the involved stakeholders are informed.
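The success end condition (every difference either applied or discarded, and each one logged) can be expressed as a small processing loop. The `decide` callback stands for the editor's assessment in steps 2 and 3, and the returned list of `Ticket` objects stands in for a ticketing system such as JIRA; all names are hypothetical and no JIRA API is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Ticket:
    concept_uri: str
    kind: str      # "added", "removed" or "changed"
    decision: str  # "applied" or "discarded"

def process_differences(diff: Dict[str, List[str]],
                        decide: Callable[[str, str], bool],
                        apply: Callable[[str, str], None]) -> List[Ticket]:
    """Treat every difference and log each one, whatever the decision."""
    tickets: List[Ticket] = []
    for kind, uris in diff.items():
        for uri in uris:
            if decide(kind, uri):
                apply(kind, uri)
                decision = "applied"
            else:
                decision = "discarded"
            tickets.append(Ticket(uri, kind, decision))
    return tickets

# Usage: this editor keeps removed codes (e.g. for historic forms) and applies the rest
diff = {"added": ["c:SRB", "c:MNE"], "removed": ["c:SCG"], "changed": []}
applied: List[str] = []
tickets = process_differences(diff,
                              decide=lambda kind, uri: kind != "removed",
                              apply=lambda kind, uri: applied.append(uri))
for ticket in tickets:
    print(ticket)
```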

4.6.4. Use Case 3 – Deploy reference data changes

DG COMP needs to manage the deployment / propagation of reference data changes to its information systems.

ID: Deploy reference data changes
Goal: Propagate all changes to consumer systems in order to establish a new stable build. All system changes go through testing, acceptance and production.
Scope and Level:
Preconditions: Local reference data (Concept Scheme) has been configured. Stakeholders are informed about and involved in the upcoming change.
Success End Condition: Local reference data is made available, as a service, to a client application.
Failed End Condition: Service cannot be consumed by stakeholders
Primary Actor: Software Developer
Secondary Actors: Stakeholders – testing and consuming
Priority:
Performance:
Frequency: Either:
Ad hoc: when receiving feedback from users and/or when (new) legal obligations arise; or
Periodic: when changes are pooled for a planned release.
Trigger: Planned release
Other Description:
Basic flow:
1. Import the new version of the reference data, for example the countries NAL, into the test environment.
2. Version the concept schemes so that a new list is created and the validity (timestamp) of the data can be entered.
3. Check whether the local reference data passes the validity rules.
4. If the validity rules pass, export the NAL to the test environment of an exemplary consumer for acceptance.
5. If the NAL passes testing and acceptance, repeat the steps above and deploy the reference data as a service to other information systems.
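Steps 2 and 3 can be sketched as creating a new, timestamped version of a concept scheme and running validity rules on it before export. The two rules shown (every concept has a preferred label; notations are unique), the short concept identifiers and all function and class names are assumptions for illustration; the actual rule set would be defined in the governance process.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple

ConceptRow = Tuple[str, str, str]  # (URI, notation, preferred label)

@dataclass(frozen=True)
class SchemeVersion:
    scheme_uri: str
    version: int
    valid_from: datetime  # validity timestamp entered in step 2
    concepts: Tuple[ConceptRow, ...]

def new_version(previous: SchemeVersion, concepts: Iterable[ConceptRow],
                valid_from: datetime) -> SchemeVersion:
    """Step 2: create a new list with its own version number and validity timestamp."""
    return SchemeVersion(previous.scheme_uri, previous.version + 1, valid_from, tuple(concepts))

def validate(version: SchemeVersion) -> List[str]:
    """Step 3: apply illustrative validity rules; an empty list means the version passes."""
    errors: List[str] = []
    seen = set()
    for uri, notation, label in version.concepts:
        if not label:
            errors.append(f"{uri}: missing preferred label")
        if notation in seen:
            errors.append(f"{uri}: duplicate notation {notation}")
        seen.add(notation)
    return errors

v1 = SchemeVersion("http://example.org/scheme/country", 1, datetime(2014, 1, 1),
                   (("c:BEL", "BEL", "Belgium"),))
v2 = new_version(v1, [("c:BEL", "BEL", "Belgium"), ("c:SRB", "SRB", "Serbia")],
                 datetime(2015, 1, 1))
if not validate(v2):
    print(f"version {v2.version} can be exported for acceptance testing")
```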

4.7. Assessment of proposed tooling for reference data management

This section proposes a set of tools for managing reference data and indicates which steps of the reference data process each tool supports. All requirements can be supported by existing tools described in Section 4.3, some of which are already in use within the EC.

Requirement JIRA GENIS RDC VocBench Joinup Silk

T1 Edit
Import reference data from external source: x x x
CRUD ConceptScheme: x x
Multilingualism: x x
Order of concepts: x x
Versioning: x x
Export: x x

T2 Changes
Log changes: x
Keeping track of impact analysis: x
Log decisions: x
Create release notice: x
Linking change requests to release notes: x
Linking change requests to versions: x

T3 Propagate
Deploy as a service: x
Deliver services while disconnected: x
Provision all versions: x

T4 Publication
Read-access over HTTP: x
Write-access over WebDAV or Subversion: x

T5 Harmonisation
Mappings: x
Link discovery: x

4.8. Recommendations for the GENIS RDC – E2E implementation example

Based on the inventory of existing requirements, needs and tools, it can be concluded that GENIS RDC could fit as a deployment tool. Other standard tools are already available for editing, change management and publication. We therefore give the following recommendations to demonstrate how the pieces can be fitted together:

Consider using existing editors such as VocBench: investigate the possibility of using VocBench as an editor for the reference data and focus future development effort for the GENIS Reference Data Component (RDC) on its deployment features only, i.e. “reference data as a service”;


Consider using a standard representation format such as SKOS-XL:
Align the versioning of the reference data with good practices from the Publications Office and standards such as SKOS(-XL);
Provide an import and export feature for reference data in SKOS-XL format; and
Consider attributing persistent HTTP URIs: include URIs for concept schemes and concepts in the reference data and align them with the (informal) rules for persistent URIs of the URI Task Force of the European Commission (cf. SEMIC Deliverable D3.2²⁶);
Consider integration with a workflow automation tool like Activiti (i.e. integration with the management aspects of reference data);
Consider integration with an ESB like Mule ESB or Mule Anypoint for connecting to various stakeholders with specific interface requirements, and/or cloud deployment in case even broader access is desirable.
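A SKOS-XL export with persistent HTTP URIs could look roughly like the sketch below, which writes Turtle by hand. The base URI and URI patterns are hypothetical and would have to follow the URI Task Force rules; a production export would more likely use an RDF library such as Apache Jena than string formatting. The `skos:` and `skosxl:` namespaces are the W3C ones.

```python
from typing import Dict, List

BASE = "http://example.org/comp"  # hypothetical base; real URIs follow the URI Task Force rules

PREFIXES = ("@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
            "@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .\n")

def literal(text: str, lang: str) -> str:
    """Turtle string literal with a language tag; escapes backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

def export_skosxl(scheme: str, concepts: Dict[str, Dict[str, str]]) -> str:
    """Serialise {notation: {lang: label}} as SKOS-XL Turtle with minted HTTP URIs.

    Assumes notations are URI-safe (e.g. alphanumeric codes)."""
    scheme_uri = f"{BASE}/{scheme}"
    lines: List[str] = [PREFIXES, f"<{scheme_uri}> a skos:ConceptScheme ."]
    for notation, labels in concepts.items():
        concept_uri = f"{scheme_uri}/{notation}"
        label_uris = {lang: f"{concept_uri}/label/{lang}" for lang in labels}
        props = [f"skos:inScheme <{scheme_uri}>", f'skos:notation "{notation}"']
        if label_uris:
            props.append("skosxl:prefLabel " + ", ".join(f"<{u}>" for u in label_uris.values()))
        lines.append(f"<{concept_uri}> a skos:Concept ;\n    " + " ;\n    ".join(props) + " .")
        # In SKOS-XL each label is a resource of its own, carrying the literal form
        for lang, text in labels.items():
            lines.append(f"<{label_uris[lang]}> a skosxl:Label ; "
                         f"skosxl:literalForm {literal(text, lang)} .")
    return "\n".join(lines)

print(export_skosxl("country", {"BEL": {"en": "Belgium", "fr": "Belgique"}}))
```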

Possible integration solutions

Figures 7 and 8 give a graphical overview of how the tools mentioned above can fit together, as well as how GENIS could fit into such an overall approach:

Figure 7 shows the functional responsibilities of the different blocks and how they collaborate;
Figure 8 shows how they can be mapped to components/tools like GENIS.

With the above recommendations, GENIS could become an integral part of an overall semantic platform/approach within the Commission.

26 D3.2 Common approach for the management of URIs by EU institutions, https://webgate.ec.europa.eu/CITnet/confluence/x/8AHgDw


Figure 7 – Overview (functional blocks)

Figure 8 – Overview (example implementation)

The lanes of the figures are explained below.

Governance Lane:

Governance is extensively covered in this document. It is the only ‘non-tangible’ lane in this overview (hence the chalked line). Yet governance drives the other lanes, which do have a counterpart in software. The products of governance are policies and principles, which should be implemented in the other lanes.


Management Lane:

Reference data management component

This component logically follows from governance: the principles and procedures are translated into the management of reference data. The management of reference data takes the form of the different workflow processes also defined in this document. A workflow tool coordinates the processes that need to be carried out and involves all stakeholders. For example, it defines a workflow for creating a new controlled vocabulary, or for adding elements to existing vocabularies. It typically also keeps versions of these workflows and an audit trail for Business Intelligence reporting. Ideally, it also allows call-outs to different parts of the overall architecture from within the workflow tool, in order to realise an integrated approach.
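The coordinating role of the workflow tool can be illustrated with a tiny state machine for a change request that records an audit trail. This is not Activiti or BPMN; the states, transitions and class name are hypothetical and merely mirror the steps of the use cases in Section 4.6.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple

# Allowed transitions of a change request (hypothetical workflow)
TRANSITIONS: Dict[str, Set[str]] = {
    "logged": {"impact_assessed"},
    "impact_assessed": {"approved", "rejected"},
    "approved": {"edited"},
    "edited": {"released"},
    "rejected": set(),
    "released": set(),
}

class ChangeRequest:
    def __init__(self, summary: str) -> None:
        self.summary = summary
        self.state = "logged"
        # Audit trail entries: (when, who, from state, to state)
        self.audit: List[Tuple[datetime, str, str, str]] = []

    def move(self, new_state: str, actor: str) -> None:
        """Advance the workflow, refusing transitions the process does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.audit.append((datetime.now(timezone.utc), actor, self.state, new_state))
        self.state = new_state

cr = ChangeRequest("Add SRB and MNE to the country list")
for step in ("impact_assessed", "approved", "edited", "released"):
    cr.move(step, actor="editor")
print(cr.state, len(cr.audit))  # released 4
```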

Reference data editor Component

This component is where CRUD (Create/Read/Update/Delete) operations on reference data (RD) are executed once a decision has been made in the Workflow Component. At its back end, it will have to interface with a variety of existing data stores (NAs) and middleware like Apache Jena or Semantic Turkey to cover RDF/OWL/SKOS functionality. As Jena, for example, cannot accommodate every back end by itself, an ESB can fill this gap.

Consumption / deployment Lane

This lane makes sure that a variety of customers can access the semantic content they need and integrate it into their own content. The ESB can again be used to bridge the differences between the storage format (SQL, native RDF, …) and the format clients want their data in (XML, JSON, native RDF/SKOS, …).

To ensure a separation of concerns and decouple the front end from the back end, it is advisable to apply a man-in-the-middle approach. This intermediary is called a Façade, as presenting a single front to clients is its main purpose. A decision to be made is how accessible it should be. For example, if public access is desirable, an always-on, off-site cloud solution like Mule Anypoint can offer the same flexibility as a local Mule ESB while adding cloud-hosting features.
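A minimal sketch of the Façade idea: clients ask for reference data in the format they need, and the façade converts from a single internal representation. The function name, the supported formats and the JSON and XML element names are hypothetical; in the architecture described here, such conversions could equally be configured in the ESB.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict

def render(group: str, items: Dict[str, str], fmt: str) -> str:
    """Return the same reference data as JSON or XML, depending on the client's needs."""
    if fmt == "json":
        return json.dumps({"group": group,
                           "items": [{"code": code, "label": label}
                                     for code, label in items.items()]})
    if fmt == "xml":
        root = ET.Element("group", name=group)
        for code, label in items.items():
            ET.SubElement(root, "item", code=code).text = label
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")

countries = {"BE": "Belgium", "RS": "Serbia"}
print(render("countries", countries, "xml"))
print(render("countries", countries, "json"))
```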

For an integrated approach, each of the blocks can be mapped either to custom development or to the configuration/extension of existing tools like the ones mentioned in Section 4.3 of this document. Looking at the studied requirements, the latter approach seems to fit:

The reference data editor role could be assumed by either GENIS RDC or VocBench. GENIS RDC is Java-based and can be reused by other systems as a plugin, via web services, via an API, or using a dedicated client. VocBench is a Java-based open source tool, which means it can work together with a Java workflow engine like Activiti.

Cf. the recommendations for the RDC: for deployment needs, GENIS could be used, as it is Java-based as well (so it can be made interoperable with all other aspects of the setup) and already features an External Service Layer to accommodate the needs of various clients. For the parts it does not cover yet, the ESB can be implemented. This is why the recommendations to further develop GENIS focus on these two aspects.


5. CONCLUSIONS

This report elaborates on tailoring a methodology for the management and governance of reference data for the State-aid information systems of DG COMP, through which the Commission exchanges information both internally (with DG AGRI, DG MARE and Eurostat) and with European public administrations in all Member States.

The following approach was followed:

Stakeholder requests and needs were identified;
A solution for the governance and management of reference data was specified;
It was assessed whether existing tools, including GENIS RDC as a main component, meet the identified requests and needs; and
Recommendations for further development of GENIS RDC were given.

Solution for governance and management:

There are many existing standards and methodologies for metadata governance and metadata management. In terms of governance, we have derived the following models from existing solutions:

For the local level, we have identified a governance structure composed of a steering committee, a working group and stakeholder involvement.
For the inter-institutional level, IMMC can serve as inspiration.
At the trans-European level, comitology procedures need to be taken into account.

We have determined that both reference data specifications under metadata governance and the related documentation should have an authoritative source. The use of persistent HTTP Uniform Resource Identifiers (URIs) for reference data releases can make it easier to manage an authoritative source.

In terms of data management, we have identified best practices from DM-BOK, the Publications Office and ITIL, and found that these existing management practices can be applied well to manage structural metadata, as described in Section 3.3.

Support by existing tools and recommendations

It is concluded that GENIS RDC is a well-placed tool that can be used for editing and propagating data, and could perhaps play a part in change management, and that many available tools, such as VocBench, could complement GENIS RDC in order to fulfil the needs and requirements listed in this document. In Section 4.8 we formulated the following recommendations:

Consider using the tools mentioned in the categorisation, as they fulfil the requirements and are also widely used within the EC;
Consider using a standard representation format such as SKOS-XL;
Consider providing an import and export feature for reference data in SKOS-XL format;
Consider attributing persistent HTTP URIs; and
Also consider the use of integration tools such as the Mule ESB, combined with a workflow automation tool such as Activiti.


6. ACKNOWLEDGEMENTS

Specific acknowledgement is due to:

Person – Organisation
Jesper Abrahamsen – European Commission, DG COMP
Julian-Daniel Jimenez-Krause – European Commission, DG COMP
Manuel Perez-Espin – European Commission, DG COMP
Roberto Atienza – European Commission, DG COMP
Carsten Schott – European Commission, DG COMP (external consultant)


BIBLIOGRAPHY

(2012). Open Data White Paper - Unleashing the Potential. Norwich: The Stationery Office.

Official Journal of the European Union. (2013, June 27). Retrieved December 02, 2013, from EUROPA - European Union website, the official EU website: http://eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2013:175:SOM:EN:HTML

Bechhofer, S., & Miles, A. (2009). SKOS Simple Knowledge Organization System Reference. W3C.

Berners-Lee, T. (2006, July 27). Linked Data. Retrieved December 02, 2013, from World Wide Web Consortium (W3C): http://www.w3.org/DesignIssues/LinkedData.html

Bizer, Heath, & Berners-Lee. (2009). Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 1-22.

Chen, W.-J., Baldwin, J., Dunn, T., Grasselt, M., Shabbar, H., Mandelstein, D., et al. (2013). A Practical Guide to Managing Reference Data with IBM InfoSphere Master Data Management Reference Data Management Hub. International Business Machines Corporation.

CIEC. (2013). Information note. Strasbourg: CIEC.

Coates, A., & Watts, M. (2007). Code List Representation (Genericode) Version 1.0. OASIS.

CooP. (2014). Final Report of Work Package 5: Specifications of Common Data Formats and Semantics.

Council of the European Union. (2009). Council Decision 2009/316/JHA of 6 April 2009 on the establishment of the European Criminal Records Information System (ECRIS) in application of Article 11 of Framework Decision 2009/315/JHA. Official Journal L 093, 33-48.

De Leenheer, P., de Moor, A., & Christiaens, S. (2010). Business Semantics Management at the Flemish Public Administration.

Dekkers, M., & Goedertier, S. (2013). Metadata for Public Sector Administration. NISO/DCMI.

Digitaliseringsstyrelsen. (2012, May 30). About OIOXML. Retrieved November 22, 2013, from Digitaliseringsstyrelsen: http://www.digst.dk/Servicemenu/English/IT-Architecture-and-Standards/Standardisation/Standardisation-creating-digital-Denmark/About-OIOXML


Directorate General: Energy & Transport. (2011, January 17). Tachonet Project: XML Messaging Reference Guide.

ECN. (2013, October 21). European Competition Network. Retrieved December 24, 2013, from European Commission: competition: http://ec.europa.eu/competition/ecn/

ECN. (n.d.). Joint Statement of the Council and the Commission on the Functioning of the Network of Competition Authorities. Retrieved December 23, 2013, from European Commission: competition: http://ec.europa.eu/competition/ecn/joint_statement_en.pdf

e-CODEX. (2012). e-Justice Communication via Online Data Exchange. European Commission.

EESSI. (n.d.). Electronic Exchange of Social Security Information. Retrieved December 24, 2013, from European Commission: Employment, Social Affairs & Inclusion: http://ec.europa.eu/social/main.jsp?catId=869

e-SENS. (2013, August 27). Electronic Simple European Networked Services - D6.1 Executable ICT Baseline Architecture.

ETSI. (2011). Electronic Signatures and Infrastructures (ESI); Associated Signature Containers (ASiC). Sophia Antipolis: European Telecommunications Standards Institute.

EUCARIS. (2013). EUCARIS - Technology. Retrieved December 24, 2013, from EUCARIS: https://www.eucaris.net/technology

EUCARIS. (2013). Use of EUCARIS. Retrieved December 23, 2013, from European Car and Driving License Information System: https://www.eucaris.net/use-of-eucaris

European Commission. (2010). Commission Regulation (EU) No 1213/2010 of 16 December 2010 establishing common rules concerning the interconnection of national electronic registers on road transport undertakings (Text with EEA relevance). Official Journal of the European Union, 21-29.

European Commission - ISA Programme. (2012). D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. Retrieved from https://joinup.ec.europa.eu/community/semic/document/10-rules-persistent-uris/

European Commission - ISA Programme. (2012). D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. Brussels.

European Commission - ISA Programme. (2013, June 03). CAMSS - 05 - Detailed CAMSS Criteria. Retrieved November 27, 2013, from Joinup: https://joinup.ec.europa.eu/community/camss/wiki/camss-05-detailed-camss-criteria


European Commission - ISA Programme. (2013). Draft Report on reaching semantic agreements with CISE. Brussels.

European Commission - ISA Programme. (2013). Process and methodology for developing semantic agreements.

European Commission - ISA Programme. (2014). D4.2. Methodology and tools for Metadata Governance and Management for EU Institutions and Member States. Brussels.

European Commission - Mobility and Transport. (n.d.). European Register of Road Transport Undertakings (ERRU). Retrieved December 23, 2013, from European Commission - Mobility and Transport: http://ec.europa.eu/transport/modes/road/access/erru_en.htm

European Commission. (2010). Evaluation of the 2004 action plan for electronic public procurement. Brussels: European Commission.

European Commission. (2011). Commission Decision of 12 December 2011 on the reuse of Commission documents (2011/833/EU). Official Journal of the European Union, 39-42.

European Commission. (2011, December 12). Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions. Open data - An engine for innovation, growth and transparent governance. Brussels, Belgium.

European Commission. (2013). INSPIRE Directive. Retrieved December 10, 2013, from European Commission: http://inspire.jrc.ec.europa.eu/

European Commission. (2013). ISA Open Metadata Licence v1.1. Brussels.

European Commission. (2013). Official documents - Employment, Social Affairs & Inclusion - European Commission. Retrieved December 17, 2013, from European Commission: http://ec.europa.eu/social/main.jsp?catId=868&langId=en

European Commission, ISA Programme. (2013). D4.1 – Metadata management requirements and existing solutions in EU Institutions and Member States. Brussels: European Commission.

European Commission, ISA Programme. (2013). D6.1.2 – Report for documenting and reusing data models and reference data Business Case. Brussels.

European Commission, ISA Programme. (2014). D4.1 – Metadata management requirements and existing solutions in EU Institutions and Member States. Brussels: European Commission.

European Community. (2007). European Union Public Licence v.1.1. Retrieved February 11, 2014, from European Union Public Licence (EUPL v.1.1.): https://joinup.ec.europa.eu/system/files/EN/EUPL%20v.1.1%20-%20Licence.pdf

European Union. (2011). Regulation no. 182/2011 laying down the rules and general principles concerning mechanisms for control by Member States of the Commission's exercise of implementing powers.

Fabian Büttner, U. B. (2013). Model-driven Standardization of Public Authority Data Interchange.

General Secretariat of the Council. (2010). ECRIS Technical Specifications - Inception Report. Brussels: European Commission – DG Justice.

Government On-Line Metadata Working Group. (2006). Records Management Application Profile. Canada: Government of Canada.

Graux, H. (2009). Study on electronic documents and electronic delivery for the purpose of the implementation of Art. 8 of the Services Directive. Brussels: Timelex.

IBM. (2013). Reference Data Management: IBM Redbooks Solution Guide. New York: International Business Machines Corporation.

IDABC - CAMSS. (2012, June 4). CAMSS Assessment Criteria. Retrieved November 27, 2013, from IDABC - CAMSS: https://webgate.ec.europa.eu/fpfis/mwikis/idabc-camss/index.php/CAMSS_Assessment_Criteria

Interactive Instruments. (2011). Beyond service interfaces - OGC encoding standards in INSPIRE: GML and SLD/SE. Bonn, Germany. Retrieved December 10, 2013, from http://inspire.jrc.ec.europa.eu/events/conferences/inspire_2011/presentations/workshops/274/Beyond_service_interfaces_OGC_workshop.pdf

International Organisation for Standardisation. (2005). ISO 19135:2005 Geographic information -- Procedures for item registration. Geneva.

International Organization for Standardization. (2005). ISO/IEC 11179-6:2005 - Metadata registries, part 6: Registration.

International Organization for Standardization. (2009). ISO/IEC 11179-1:2004 - Metadata registries.

Interoperability solutions for European public administrations (ISA). (2011, May 5). eGovernment Core Vocabularies: The SEMIC.EU approach. Brussels, Belgium: European Commission.


ISA. (2010). European Interoperability Framework (EIF) for European public services. Brussels: European Commission.

ISA. (2012). How Linked Data is transforming eGovernment. European Commission.

ISA. (2013, January 04). ICCS-CIEC Civil Status Forms. Retrieved December 09, 2013, from Joinup: https://joinup.ec.europa.eu/catalogue/repository/iccs-ciec-civil-status-forms

ISA Programme. (2011). Towards Open Government Metadata. Brussels.

ISA Programme of the European Commission. (2012). Metadata Management Survey Results.

ISA Programme of the European Commission. (2013). Towards harmonised governance and management of data models and reference data - Business case. Brussels.

Kurt Salmon. (2013). Assessment of TESs supporting EU policies.

LeBlanc, P., & Smith, B. L. (2002). A Workshop on Managing Horizontal Issues. Retrieved December 24, 2013, from Managing Horizontal Issues: http://www.thinkwell.ca/groupwork/managingHorizontalIssues/documents/MHIWkshopOutlinev2.pdf

Miles, A., & Bechhofer, S. (2009, August 18). SKOS Simple Knowledge Organization System eXtension for Labels (SKOS-XL) Namespace Document - HTML Variant. Retrieved April 3, 2014, from World Wide Web Consortium (W3C): http://www.w3.org/TR/skos-reference/skos-xl.html

Mosley, M., Brackett, M., Earley, S., & Henderson, D. (2009). The DAMA Guide to The Data Management Body of Knowledge (DAMA-DMBOK Guide). New Jersey: Technics Publications, LLC.

National Information Standards Organization. (2004). Understanding Metadata.

NIEM. (2013). NIEM Tools Catalog. Retrieved November 25, 2013, from NIEM | National Information Exchange Model: https://www.niem.gov/tools-catalog/Pages/tools.aspx

OASIS. (2006, December 12). Universal Business Language v2.0. Retrieved November 29, 2013, from OASIS | Advancing open standards for the information society: http://docs.oasis-open.org/ubl/os-UBL-2.0/UBL-2.0.html

OASIS. (2013). OASIS Universal Business Language (UBL) TC. Retrieved November 29, 2013, from OASIS | Advancing open standards for the information society: https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl

Object Management Group, Inc. (2012, October). Unified Modeling Language™ (UML®). Retrieved November 20, 2013, from Object Management Group: http://www.omg.org/spec/UML/


OHIM. (2013). Community Trademark Registration Process. Retrieved December 15, 2013, from https://oami.europa.eu/ohimportal/en/registration-process

OMG. (2011, January). Business Process Modelling Notation 2.0 (BPMN). Retrieved from omg.org: http://www.omg.org/spec/BPMN/

Open Knowledge Definition. (n.d.). Open Definition. Retrieved November 20, 2013, from Open Definition: http://opendefinition.org/

Open Knowledge Foundation. (2013). Open Definition. Retrieved December 02, 2013, from Open Definition: http://opendefinition.org/

PEPPOL. (2013). Virtual Company Dossier. Retrieved November 22, 2013, from PEPPOL | Pan-European Public Procurement Online: http://www.peppol.eu/peppol_components/virtual-company-dossier

Portal Administración Electrónica. (2013). Technical standards for interoperability. Retrieved December 09, 2013, from Portal Administración Electrónica: http://administracionelectronica.gob.es/pae_Home/pae_Organizacion/pae_DGMAPIAE.html?idioma=en

Publications Office of the European Union. (2011). Proposal for metadata governance on interinstitutional level.

Roy, D. (n.d.). National Information Exchange Model (NIEM): Technical Introduction to NIEM.

Spanish Ministry of Finance and Public Administration. (n.d.). Decision of 19 February 2013 of the secretary of state for public administration approving the technical interoperability standard for the reuse of information resources.

SPOCS. (2012). eDocuments - Specification. Retrieved December 16, 2013, from eDocuments: http://joinup.ec.europa.eu/site/spocs/eDocuments/specification.html

Uhrowczik, P. (1973). Data dictionary/directories. IBM Systems Journal, 332-350.

UN/CEFACT. (2004). Standard Business Document Header - Technical Specification. European Commission.

UN/CEFACT. (2008). UML Profile for Core Components (UPCC).

UN/CEFACT. (2009). Core Components Technical Specification - Version 3.0.

UN/CEFACT. (2009). XML Naming and Design Rules Technical Specification - Version 3.0.

UN/CEFACT. (2012). Core Components Business Document Assembly - Technical Specification - Version 1.0.


UN/CEFACT. (2012, June 27). Core Components Business Document Assembly Technical Specification.

United Nations - Centre for Trade Facilitation and Electronic Business. (2009). Core Components Technical Specification - Version 3.0.

W3C. (2013). Linked data. Retrieved December 02, 2013, from World Wide Web Consortium (W3C): http://www.w3.org/standards/semanticweb/data


ANNEX I STATE-AID REFERENCE DATA SETS

The table below lists the reference data sets relevant to State-aid control that are maintained by DG COMP. There is a shared understanding that reference data is best kept and maintained at source, following a federated model: consumers can be assured of the quality of the data because it is maintained directly by its business owner (e.g. the Publications Office may be responsible for some data that is core to the business of the Commission, while other DGs similarly provide other reference data).

Table 6 – State Aid reference data

TABLE NAME                          AUTHENTIC SOURCE
ACCELERATED PROCEDURE TYPE
AGRI DESCRIPTION OTHER
AGRI DESCRIPTION SUB-TYPE
AGRI DESCRIPTION TYPE
BENEFICIARY NUMBER
BENEFICIARY SIZE
CARTOUCH DESCRIPTORS
CASE BACKGROUND LINK TYPE
CASE CATEGORY
CASE CRITERIA
CASE PLANNING STEPS
CASE TYPE
CLASSIFICATION
CLASSIFICATION PLAN
COMPLAINANT TYPE
COMPLAINT TYPE
COMPLAINTS – MEANS OF CLOSURE
COMPLAINTS – REASON FOR CLOSURE
COMPLAINTS – REASON FOR NON CLOSURE
COUNTRY
CR EU COURT
CR RECENT EVENT
CR STATUS
CURRENCY
DECISION TYPE
DECISIONAL PROCEDURE TYPE
DG
EMPOWERMENT
GBER BENEFICIARY
INTERNAL QUALIFIER
LEGAL BASIS
LANGUAGE
MC CONDITION STATUS
MC STATUS
NACE CODE
OBJECTIVE                           Commission Regulation (EC) No 794/2004 of 21 April 2004
PRIMARY LAW
PRIORITY
PROCEDURE KEY STEP
PROCEDURE NEXT STEP
PROCEDURE TYPE
REGION
REGIONAL AID
RETENTION LIST
SECONDARY EMPOWERMENT
SECONDARY LAW 2
SECONDARY LAW 3
STATE AID INSTRUMENT
SUB DOMAINS
TYPE_OF_AID
UNIT
WORKLOAD
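The federated model described above can be sketched as a registry that routes each look-up to the owner of the table instead of keeping a central copy. The sketch is illustrative only: the class name, owner names and sample codes are hypothetical, and the providers are plain Python functions standing in for whatever services the owners would use to publish their data.

```python
from typing import Callable, Dict, Optional

# A provider answers look-ups for the tables it owns: (table, code) -> label, or None.
Provider = Callable[[str, str], Optional[str]]


class FederatedRegistry:
    """Route reference-data look-ups to the authentic source of each table."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}  # owner name -> provider
        self._owner_of: Dict[str, str] = {}        # table name -> owner name

    def register_owner(self, owner: str, provider: Provider) -> None:
        self._providers[owner] = provider

    def assign(self, table: str, owner: str) -> None:
        """Record which registered owner is the authentic source for a table."""
        if owner not in self._providers:
            raise KeyError(f"unknown owner: {owner}")
        self._owner_of[table] = owner

    def owner(self, table: str) -> str:
        return self._owner_of[table]

    def lookup(self, table: str, code: str) -> Optional[str]:
        # No local copy is kept: every look-up is answered by the table's owner.
        return self._providers[self.owner(table)](table, code)
```

Registering, say, the Publications Office as owner of COUNTRY and DG COMP as owner of CASE TYPE (a hypothetical assignment, not taken from Table 6) makes `lookup("COUNTRY", code)` go to the former and `lookup("CASE TYPE", code)` to the latter, so a correction made by an owner is visible to every consumer at once.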


ANNEX II METADATA REGISTRY OF THE PUBLICATIONS OFFICE (MDR)

The Metadata Registry (MDR) of the Publications Office of the EU[27] is the authoritative source for definition data (metadata elements, named authority lists, schemas, etc.) and for authority data used to exchange data between the institutions involved in the legal decision-making process. Many of the definition data sets contained in the MDR are governed by the Inter-Institutional Metadata Maintenance Committee (IMMC).

The Publications Office uses a tool chain and a set of scripts to edit the Named Authority Lists (NALs). For each NAL, it publishes a set of distributions that can be downloaded from the MDR website; each set consists of a SKOS, an XML, an XSD and an HTML version.

A publication package is also available as a ZIP file. It contains the distributions of the changed NALs (XML, SKOS, ATTO-XML[28]), a comparison file that identifies the differences between the previous and the current version, and release notes listing the changes to the NALs included in the publication.
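The comparison file can be thought of as a diff between two versions of a NAL, from which the release notes follow. The sketch below is a simplification that assumes each version is reduced to a mapping from code to label; the actual comparison at the Publications Office works on the XML distributions, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def diff_nal(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, list]:
    """Compare two versions of a NAL given as {code: label} mappings."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed: List[Tuple[str, str, str]] = sorted(
        (code, previous[code], current[code])
        for code in set(previous) & set(current)
        if previous[code] != current[code]
    )
    return {"added": added, "removed": removed, "changed": changed}


def release_notes(nal: str, diff: Dict[str, list]) -> List[str]:
    """Turn a diff into one human-readable release-note line per change."""
    notes = [f"{nal}: added {code}" for code in diff["added"]]
    notes += [f"{nal}: removed {code}" for code in diff["removed"]]
    notes += [f"{nal}: changed {code} ('{old}' -> '{new}')"
              for code, old, new in diff["changed"]]
    return notes
```

Keying the comparison on the code rather than on the label is what lets a relabelled entry show up as a change instead of as a deletion plus an addition.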

The architecture:

In the past, the publication process was time-consuming and error-prone. Moreover, the technologies involved were not portable and were complex to maintain.

The Publications Office (PO) decided to improve this process by implementing a cross-platform, licence-free and easily maintainable solution.

The implemented solution combines JIRA, which manages the validation workflow; a transformation tool that converts the files (based on XML technologies and the Perl programming language); and a version control system (SVN) that maintains the current and historical versions of the files.

The validation workflow:

The Publications Office uses JIRA to manage the validation workflow. Three roles are defined in the workflow:

- NAL operator: in charge of maintaining the Named Authority Lists. They can open a ticket in JIRA to create, update or delete an item in a list.
- NAL technician: responsible for executing the transformation scripts; they also produce the diff report and the publication package for the release.
- NAL authority: in charge of validating the contents before the release.

The workflow is summarised below:

1. The NAL operator receives an external request to create or update a NAL;
2. The NAL operator creates a ticket in JIRA; a notification is sent to the NAL technician and the NAL authority;
3. The NAL operator checks out the Excel file from the SVN repository;
4. The NAL operator updates the Excel file and checks it back into the SVN repository;
5. The NAL operator changes the ticket status in JIRA; the NAL technician and the NAL authority are notified;
6. The NAL technician launches the transformation process, which generates the XML, SKOS and HTML files;
7. The NAL technician also launches the diff report, which compares the current and the new XML versions; the report is generated in Excel and HTML;
8. The NAL technician updates the ticket status in JIRA; the NAL operator and the NAL authority are notified, and the report is sent to both of them;
9. The NAL operator checks and validates the report. If an error is detected, the process restarts from step 3;
10. The NAL operator updates the ticket status in JIRA; the NAL technician and the NAL authority are notified;
11. The NAL authority also checks the diff report and gives the final validation. If an error is detected, the process restarts from step 3;
12. The NAL authority updates the ticket status; the NAL operator and the NAL technician are notified;
13. The NAL technician prepares the release notes and the release package, and sends them to the technical team in charge of the deployment;
14. The NAL technician closes the ticket; the NAL operator and the NAL authority are notified.

27 http://publications.europa.eu/mdr/

28 http://publications.europa.eu/mdr/authority/
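The fourteen steps above can be read as a small state machine over the JIRA ticket. The sketch below assumes hypothetical status and action names (JIRA workflows are configured per project, and the statuses actually used by the Publications Office are not given here) and folds steps 13 and 14 into one `release` action. Each transition checks that the acting role is the one the workflow assigns, then notifies the two other roles.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class Status(Enum):
    OPEN = auto()                 # step 2: ticket created; Excel file edited in steps 3-4
    EDITED = auto()               # step 5: Excel file checked in to SVN
    TRANSFORMED = auto()          # step 8: distributions and diff report generated
    OPERATOR_VALIDATED = auto()   # step 10: operator accepted the diff report
    AUTHORITY_VALIDATED = auto()  # step 12: authority gave the final validation
    CLOSED = auto()               # steps 13-14: release package sent, ticket closed


ROLES = ("operator", "technician", "authority")

# (current status, action) -> (role allowed to perform it, next status)
TRANSITIONS: Dict[Tuple[Status, str], Tuple[str, Status]] = {
    (Status.OPEN, "submit"): ("operator", Status.EDITED),
    (Status.EDITED, "transform"): ("technician", Status.TRANSFORMED),
    (Status.TRANSFORMED, "approve"): ("operator", Status.OPERATOR_VALIDATED),
    (Status.TRANSFORMED, "reject"): ("operator", Status.OPEN),           # step 9: back to step 3
    (Status.OPERATOR_VALIDATED, "approve"): ("authority", Status.AUTHORITY_VALIDATED),
    (Status.OPERATOR_VALIDATED, "reject"): ("authority", Status.OPEN),  # step 11: back to step 3
    (Status.AUTHORITY_VALIDATED, "release"): ("technician", Status.CLOSED),
}


class NalTicket:
    def __init__(self) -> None:
        self.status = Status.OPEN
        self.notifications: List[Tuple[str, Status]] = []  # (recipient role, new status)
        self._notify_others("operator")  # step 2: the operator opened the ticket

    def _notify_others(self, actor: str) -> None:
        for role in ROLES:
            if role != actor:
                self.notifications.append((role, self.status))

    def act(self, role: str, action: str) -> Status:
        key = (self.status, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} is not allowed in status {self.status.name}")
        allowed_role, next_status = TRANSITIONS[key]
        if role != allowed_role:
            raise PermissionError(f"{role} may not {action} in status {self.status.name}")
        self.status = next_status
        self._notify_others(role)
        return next_status
```

Keeping the transitions in a table rather than in branching code makes the role assignments easy to review against the written procedure.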

The execution workflow:

The execution workflow covers the technical steps needed to transform, compare and package the NAL publication. It is summarised in the following schema:

Figure 9 – Schematic overview of how the Publications Office edits an XML file and generates all distributions of Named Authority Lists (NALs)

[Figure: Input file (XLS) → Transform → Output files (XML, XSD, SKOS, HTML); Transformation engine (Execution workflow in XML, XSLT files, PERL)]
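The transformation step of the execution workflow can be sketched as follows. The Publications Office implements it with XML technologies, XSLT and Perl; this sketch instead uses Python's standard library and assumes that the Excel input has been exported to CSV with `code` and `label` columns. The XML element names are hypothetical and do not reproduce the MDR schema, and XSD and SKOS generation are omitted.

```python
import csv
import html
import io
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_nal(csv_text: str) -> List[Dict[str, str]]:
    """Read a NAL exported as CSV with 'code' and 'label' columns (assumed layout)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_xml(nal_id: str, rows: List[Dict[str, str]]) -> str:
    """Build the XML distribution; element names here are hypothetical."""
    root = ET.Element("table", {"id": nal_id})
    for row in rows:
        record = ET.SubElement(root, "record", {"code": row["code"]})
        ET.SubElement(record, "label").text = row["label"]
    return ET.tostring(root, encoding="unicode")


def to_html(nal_id: str, rows: List[Dict[str, str]]) -> str:
    """Build a simple HTML table, escaping every value."""
    cells = "".join(
        f"<tr><td>{html.escape(r['code'])}</td><td>{html.escape(r['label'])}</td></tr>"
        for r in rows
    )
    return (f"<table><caption>{html.escape(nal_id)}</caption>"
            f"<tr><th>Code</th><th>Label</th></tr>{cells}</table>")


def transform(nal_id: str, csv_text: str) -> Dict[str, str]:
    """Produce all sketched distributions from one tabular input."""
    rows = read_nal(csv_text)
    return {"xml": to_xml(nal_id, rows), "html": to_html(nal_id, rows)}
```

Generating every distribution from the single edited input, as in the schema, ensures that the XML and HTML versions cannot drift apart.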