EUROSTAT
Business Case
VIP BIGDATA
Date: 30/04/2015
Version: 1.3
PM² Simplified Version Template V.0.4 (January 2015)
Document Control Information
Settings Value
Document Title: Business Case
Project Title: BIGDATA
Document Author: Albrecht Wirthmann
Project ID (from PMR-site): 617
Project Owner: Mariana Kotzeva
Project Manager: Michail Skaliotis
Doc. Version: 1.3
Sensitivity: Limited
Project Type: Critical
Approval Date: 20 May 2015
Table of Contents
1. Purpose
2. Action Required
3. Current Situation and Mandate for Change
3.1. Problem statement
3.2. Mandate (legal base)
4. Objectives and Deliverables
4.1. Scope
4.2. Aims and Objectives
4.3. Deliverables and Key Milestones
4.4. Indicators
4.5. What the project does not include
5. Impact Assessment
5.1. Stakeholder Analysis
5.2. Project Environment
5.3. Cost-Benefit Analysis
5.4. Risk Analysis
6. Approach
6.1. Methodology
6.2. General Description
6.3. Resources and Lead Times
6.4. Project Funding
7. Project Organisation
7.1. Project Manager
7.2. Reporting Structure
Annex 1 – Stakeholder Analysis
Annex 2 – Risk Analysis
Business case BIGD Date: 30/04/2015 Version: 1.3 4 / 32
1. PURPOSE
The purpose of this document is to outline the scope, approach, objectives, preliminary
resource implications and impact on stakeholders of the BIG DATA project (BIGD) for
endorsement by the ESSC. The BIGD project is part of the Vision 2020 implementation
portfolio, responding to the pressing need to harness new data sources to deliver better
statistical products and services in response to users' needs. The project will implement the
short and medium term objectives of the Big Data Action Plan and Roadmap, which was
endorsed by the ESSC at its meeting in Riga on 26 Sep 2014. The proposals are based on
stakeholder analysis and an initial feasibility and cost-benefit study.
The aim of the BIGD project is to provide the ESS with necessary capabilities, operational
experiences, methodological and ethical guidelines, and testing infrastructures required for
a stepwise integration of big data sources into the production of official statistics. The
project envisages: (i) short-term actions aimed at building capacity in the ESS to harness big
data sources and delivering first results on the use of big data as an auxiliary source for the
production of official statistics; (ii) medium-term actions to create the legal, technical and
statistical infrastructure for systematic use of big data sources in different domains of official
statistics. The corresponding long term vision (beyond 2020) of the Big Data Action Plan and
Roadmap is to effectively achieve a full integration of big data sources into the regular
production of statistics and the statistical information architecture in a multisource
framework. Therefore, actions in the project have been designed to support the long term
vision using an agile and gradual approach. This would ensure that lessons learned from
initial experimental actions are used to enhance methods and tools for the use of new
sources in statistical production.
2. ACTION REQUIRED
The ESSC is asked to provide feedback and approve the approach outlined so that the
project may proceed to the detailed planning phase and implementation.
3. CURRENT SITUATION AND MANDATE FOR CHANGE
3.1. PROBLEM STATEMENT
As stated in the Scheveningen Memorandum¹, recent innovations in information
and communication technologies have led to an increasing degree of
digitisation of economies and societies, which offers new opportunities for the
compilation of statistics, while the use of big data for statistical purposes challenges
and urges the European Statistical System to effectively address a variety of issues.
Harnessing new data sources potentially provides scope to increase the quality
and the variety of statistical products, enabling the ESS to better respond to
fast-growing and increasingly differentiated user needs.
3.2. MANDATE (LEGAL BASE)
In the Scheveningen Memorandum, the ESSC requested Eurostat and the NSIs to
elaborate an ESS action plan and roadmap in order to follow up the implementation
of the memorandum. At its meeting in Riga on 26 Sep 2014, the ESSC endorsed the
Big Data Action Plan and Roadmap 1.0 (BDAR) and agreed to proceed to the creation of a
concrete project that would integrate it into the ESS Vision 2020 portfolio. In
addition, the ESSC agreed that the ESS Task Force on Big Data for official statistics
would coordinate the work on the implementation of the BDAR, which is the purpose
of the BIGD project.
Following the endorsement of the ESS Vision 2020 by the ESSC in May 2014, in
February 2015 the ESSC approved a portfolio of projects proposed by the VIG based
on a prioritisation methodology which also involved the VIN group in the assessment.
As a result of this process it was decided that a Big Data project should be part of the
portfolio of Vision implementation projects, reflecting the strategic importance of
using new data sources to increase the efficiency of statistical production to better
respond to user needs.
1 Adopted by the ESSC of 27 September 2013:
http://ec.europa.eu/eurostat/documents/42577/43315/Scheveningen-memorandum-27-09-13
4. OBJECTIVES AND DELIVERABLES
4.1. SCOPE
The BIGD project is linked to a number of key areas of the ESS Vision 2020. It
represents an immediate response to the demand of the ESS Vision 2020 for
harnessing new data sources in statistical production, moving to a multi-source
environment for the delivery of statistical products to users. The structure of the
BDAR reflects the methodological and organisational challenges pointed out in the
ESS Vision 2020 document. Inclusion of big data in the production of official statistics will
necessitate the adoption of new methods for data analysis and processing as well as
enhancing the IT infrastructure of the ESS members. Organisational and management
issues relate to the legal framework, privacy, and the knowledge and capacities for handling a
dynamically evolving data and metadata ecosystem.
The scope of the project covers in general the short and medium term goals that
have been identified in the BDAR. In particular it covers the execution of pilots for
exploring the potential of selected big data sources for the production of official
statistics and the application of results to specific statistical domains in response to
users' needs. Depending on the big data source different statistical domains will be
affected by the project, e.g. exploration of mobile communication data could affect
tourism, population, migration, regional or transport statistics.
In addition to statistical domains and data sources, the project includes
accompanying actions on identified horizontal topics to enable integration of big data
sources into official statistics. These are:
– Methodological frameworks,
– Quality frameworks,
– Metadata frameworks,
– IT infrastructures,
– Communication,
– Legal frameworks,
– Ethical frameworks,
– Skills and training,
– Experience sharing.
Common access to data sources, development of methods and use of applications
will create new opportunities for rethinking the current collaboration model for
European statistics towards common architecture, IT infrastructure and regulatory
solutions.
4.2. AIMS AND OBJECTIVES
The overall purpose of the BIGD project is to enable the ESS to gradually integrate big
data sources into the production of European and national statistics. This complex goal
is managed using an agile framework in which lessons learned are used in the
subsequent phases of the project, and gradual implementation of results aims at
frequent delivery of products throughout the project.
Actions and deliverables have been defined to reach this goal and are broken down into
short and medium term goals in the BDAR. The BDAR constitutes a high-level
description of where we would like to be, in terms of:
– Long-term vision (beyond 2020)
– Medium-term aims (by 2020)
– Short-term objectives (by the end of 2016)
The long term vision is not part of this project. The related goals are
enumerated in the BDAR.
The aims relevant for the BIGD project at medium term include the finalisation of big
data pilots and early implementation of statistics based on selected big data sources,
the design of IT infrastructures that can process big data sources for producing
statistics, the implementation of small scale computation environments and the
development of partnerships to create more data computation centres tailored to
project needs, the development of methodological and quality frameworks, the
implementation of professional training to acquire necessary skills for statisticians,
the establishment of partnerships with stakeholders (data providers, academia, etc.)
and the implementation of a communication strategy towards important
stakeholders with the aim of ensuring use of big data sources by official statistics.
Actions for achieving the short term objectives are related to analysing and preparing
the conditions for big data usage in official statistics and starting concrete pilots to
gain experience, supported by two waves of ESSnet projects to leverage ESS
members' experience of big data use and catalyse resources for their application to
specific business domains. Actions include identification and analysis of big data
sources, exploring potentials of partnerships with data providers, design and
experimentation of small scale computation environments and data centres tailored
to project needs, identification of skills and elaboration of training programs,
identification of research needs, integration of big data strategy into overall strategy
at European level, analysis and elaboration of ethical guidelines, analysis of legal
environment, and elaboration of a communication strategy.
4.3. DELIVERABLES AND KEY MILESTONES
The big data environment is characterised by very rapid development, mainly driven
by technological advances. It is therefore necessary to review the business case at
regular intervals and adjust (if necessary) the actions to the technological, economic
or societal developments in order to assure achievement of the overall aim.
The BIGD project contains actions related to horizontal areas as well as actions
related to the execution of pilots. The description of these actions and the related
deliverables are therefore grouped according to the following topics:
– Policy
– Communication
– Big data sources
– Methods
– Quality
– IT infrastructure
– Skills
– Experience sharing
– Legislation
– Applications / Pilots
– Governance
i) Policy
A strategy for big data in official statistics should be embedded into overall
government strategies at national as well as at EU level. The above mentioned
Communication of the Commission calls for the definition of a government strategy
on big data.
Objective | Deliverable | Timing | Actor | Relational aspects
Definition of a strategy for official statistics related to big data | Strategy document | 10/14 – 12/15 | Coordinated by Eurostat with input from task force | Integration into overall government strategy on big data at European Commission and national level.
ii) Principles of Official statistics and big data: Communication strategy
A number of big data sources contain sensitive information. Use of these sources for
official statistics purposes may induce negative perceptions with the general public
and other stakeholders. That could endanger the successful execution of pilots and
have negative consequences on the long-term goal of integration of big data in
official statistics production. It is therefore of utmost importance to define aims,
procedures and outcomes of big data usage according to the UN fundamental
principles of Official Statistics and the European Statistics Code of Practice with a
focus on ethical principles, such as privacy. Based on the results of this "ethical"
review a communication strategy should be developed and implemented that should
guide the execution of the pilots and would support later integration of big data
sources into official statistics.
Objective | Deliverable | Timing | How to achieve? | Relational aspects
Definition of principles and ethical guidelines for big data utilisation | Document with ethical guidelines | 1/16 – 04/17 | Call for tender to ensure specific expertise; review within the pilots | Activities at UN and ESS levels
Definition of communication guidelines for big data projects | Document with communication guidelines | 1/16 – 12/16 | Call for tender to ensure specific expertise, follow-up for each pilot | Review and input by ESS Big Data Task Force
iii) Big Data Sources
The number of big data sources is growing rapidly. The variety and size of big data
sources determine to a great extent the potential of big data for producing statistics.
In order to take informed decisions on actions and pilots it is essential to work on an
inventory and taxonomy of big data sources. The HLG on modernisation of official
statistics has started work on this subject that will be further developed and can be
reused for the aims of the BIGD project.
Objective | Deliverable | Timing | Actor | Relational aspects
Inventory of big data sources and definition of taxonomy | Database with inventory; taxonomy related to work at international level | From 9/14 – 12/15 | UN organisations | Activities at UN and national levels
Development of metadata framework for big data processing | Meta-, paradata framework for big data | 1/15 – 12/17 | UN organisation and further refinement with ESS big data pilots | Based on existing ESS and UN frameworks
Partnerships among stakeholders | Community development and exchange of best practice | 1/16 – end of project | Eurostat and ESS TF big data | Stakeholders of the pilots should participate
iv) Applications / Pilots
A number of projects and activities related to big data and statistics have been
carried out at national, European and international level. Building on these
experiences and based on certain priority criteria elaborated by the ESS TF – Big Data
and discussions with key stakeholders, 6 pilot projects are planned to be carried out
by two consecutive ESSnets. The business case of the two ESSnets will also be
discussed at the ESSC Meeting on 20-21 May 2015. Due to the amendment of the
Financing Decision 2015 of Eurostat, the first ESSnet agreement could already be
signed in 2015 and the second one would be launched in 2016. These pilot projects are of
critical importance for the success of the BIGD. While the full-fledged integration of
big data sources into the statistical production remains a far reaching long term
objective, we do expect that outcomes of the pilot projects will pave the way for
earlier integration of these sources in the statistical production for specific statistical
domains. Special attention to this issue will be drawn when elaborating the detailed
specifications of the ESSnets that will also reflect business priorities based on users'
needs.
Objective | Deliverable | Timing | Actor | Relational aspects
Pilots for generating statistics from big data sources at ESS level (ESSnet I + II) | 6 Pilot projects, phased approach | 1/16 – 12/17, 1/18 – 12/19 | ESSNET; Framework partnership agreements | Eurostat and NSIs; build on work by UN, NSIs, Eurostat and Commission
v) Methods
The use of big data sources requires application of new methods in data analysis,
processing and statistical inference. At the same time the methods are dependent on
the data sources, e.g. if they contain structured data or textual information. Actions
related to methodology should aim at developing a common toolbox of methods that
would become available throughout the ESS and fit for use for different statistical
domains.
Objective | Deliverable | Timing | Actor | Relational aspects
Inventory of methods | Document with statistical methodologies used on big data projects | 10/14 – 12/15 | UN organisation | Activities at national and UN levels; should be further developed through the pilots
Toolbox of big data methods | Methodological guide | 1/16 – 12/17; 1/18 – 12/19 | UN organisation and further development in pilots | Developed as part of the pilots and consolidated as horizontal activity
vi) Quality
The provision of high quality information is one of the cornerstones of official
statistics. Statistical information should be fit for use. Quality profiles differ
depending on the product type according to the statistical information infrastructure,
i.e. indicators, accounting systems, and data. Statistical information derived from big
data sources should be described according to defined quality elements to be able to
evaluate their overall quality. Previous European work on the quality of statistics
derived from administrative data sources and preliminary work on quality
frameworks for big data sources by the HLG will provide a good starting point. The
final aim of the related actions is to be equipped with a quality framework that would
be adjusted to big data sources and that would allow describing quality of derived
statistics according to their intended use.
Objective | Deliverable | Timing | Actor | Relational aspects
Review of quality framework | Quality framework adjusted to big data sources | 1/15 – 12/17 | UN organisations and further development within ESSNET on pilots; consolidation by ESS Big Data Task Force | UN framework should be developed through the pilots; ESS, UN level
vii) IT infrastructure
The inherent characteristics of big data, including their volume, variety and velocity
have implications on IT systems and infrastructures. In order to utilise the potential
of big data it will be necessary to analyse requirements related to big data
processing, including security and confidentiality issues, and design IT infrastructures
to be implemented as part of new workflows of statistical data production. This will
also require creating small scale data computation facilities and creating partnerships
with external stakeholders to develop data analytics solutions tailored to the specific
needs of the project. Based on these results the future IT infrastructure(s) would be
determined by the business model(s) implemented to produce statistics from big
data.
Objective | Deliverable | Timing | Actor | Relational aspects
Inventory, definition of requirements and specification of future IT infrastructure | Documents with specifications | 1/16 – 12/17, 1/18 – 12/19 | ESSNET | Developed as part of the pilots and consolidated as horizontal activity; NSIs, Eurostat/DG DIGIT
Design and experimentation of small scale computation environments and data centres tailored to pilots' needs | Hard- and software infrastructure capable of managing, analysing and processing of selected big data sources for the pilots (see point iv applications / pilots) | 1/16 – 12/17, 1/18 – 12/19 | ESSNET | During runtime of the ESSnets I+II
Planning and implementation of IT infrastructure | IT infrastructure suitable for big data processing | 2018-2021 | TF big data, DG DIGIT, supported by specific expertise | Corresponds to medium term objective; depends on finalisation of previous action; NSIs, Eurostat/DG DIGIT
viii) Skills
Accessing, managing, processing and analysing big data require specific new
skills or skill combinations that are currently not present in official statistics. These
are closely related to the notion of the “data scientist”. A definition, an inventory and a
strategy for acquiring these skills for the European Statistical System will be essential
for the success of the action plan.
Objective | Deliverable | Timing | Actor | Relational aspects
Inventory of big data skills and identification of required skills for big data for official statistics | Document with identification of skills, definition of training needs, elaboration of curricula and definition of a strategy for imparting skills related to big data for official statistics. | 9/14 – 12/15 | TF big data | Inventory of UNECE; skills should be reviewed in pilots; ESS, UN level
Elaboration of big data courses for official statistics according to different target groups (managers, statistical officers, …) | Courses in ESTP, EMOS, … | 1/15 – 12/17; Regular Review 2018 - 20 | TF big data, call for tender | Pilots should be used to focus training; ESS level
Review of HR strategy | Document defining big data HR strategy | 1/2016 – 12/2016 | TF big data | Depends on output of skills inventory
ix) Experience sharing
An important element is to share experience on projects, applications, pilots and big
data sources within the ESS. One example, which is already implemented in the
frame of the big data project run by the UNECE, is the sandbox environment that
helps to get familiar with big data processing.
Exchange of information between stakeholders at all levels should be done via face-
to-face and virtual meetings, as well as electronic communication platforms and
written reporting. Annual workshops are planned to discuss progress and results of
the different actions with internal and external stakeholders.
Objective | Deliverable | Timing | Actor | Relational aspects
Elaboration of measures for sharing experiences, e.g. workshops, sand box, competitions, … | Document describing measures for later implementation | 10/14 – 12/15 | TF big data | UN, NSIs, Eurostat
Implementation of measures | Workshops, sand box, competition | From 6/16 | ESSNET, calls for tender, TF big data | ESS level
x) Legislation
Legislation plays a crucial role in determining the framework conditions for accessing,
processing and disseminating statistics derived from big data sources. On the one
hand legislation refers to laws regulating activities of statistical bodies¹. The statistical
legislative framework should be reviewed and enhanced in cases where current
legislation would prevent or limit use of big data sources, e.g. by determining use of
surveys or by limiting access to big data sources.
1 Regulation 223/2009 of the European Parliament and of the Council of 11 March 2009 on European statistics,
OJ L 87/164, 31/03/2009, p. 164–173
On the other hand it refers to
protection of personal information and privacy of natural and legal persons and to
intellectual property rights¹. In general, directives define minimum requirements that
can be further refined at national level while regulations are directly applicable at
national level. Depending on the type of legislation, this might have consequences as
regards harmonisation of big data processing within the ESS.
Objective | Deliverable | Timing | Actor | Relational aspects
Assessment of current legislation as regards big data usage (statistical, data protection and privacy, other legislation related to big data sources) | Document with analysis, assessment and proposals for enhancements of current legislation | 1/16 – 12/16 | ESSNET, call for tender | ESS
xi) Governance and Coordination
The BIGD project should be guided by a clear governance structure that ensures
availability of information at all necessary levels while providing adequate
operational flexibility for review and adaptation of related actions. The operational
management of the action plan should assure coordinated output of the various
actions, the monitoring of the time table, regular reviews and revisions if necessary.
The operational management should also report on progress at regular time intervals
to various addressees according to the agreed governance structure.
Objective | Deliverable | Timing | Actor | Relational aspects
Elaboration and agreement on governance structure | Mandate of ESS TF Big Data and Big Data contact group with governance structure | 9/14 – 6/15 | TF big data, ESSC | ESS
1 Directive 95/46/EC of the European Parliament and of the Council of 24/10/1995 on the protection of
individuals with regard to the processing of personal data and on the free movement of such data, OJ L 281,
21/11/1995, p. 31–50. A new legal framework on data protection is currently in legislative procedure, see
COM(2012) 11 final.
Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing
of personal data and the protection of privacy in the electronic communications sector (Directive on privacy
and electronic communications), OJ L 201, 31/07/2002, p. 37–47.
Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of
databases, OJ L 077, 27/03/1996, p. 20–28.
4.4. INDICATORS
The success of the project can be measured by the degree of achievement of the previously formulated short, medium and long term goals. The indicators are:
Short term:
– Identification and analysis of output portfolio of big data sources (short to medium term activity)
– A number of successful pilot projects on big data applications in official statistics launched and delivering first results which can be considered for adoption in the context of statistical production.
– The requirements in terms of skills needed for the exploration of big data in official statistics and ESS professional training programmes are established.
– Set up of IT infrastructure solutions to deliver data computation capacity tailored to project needs and data access solutions, including through partnerships.
– Communication with the general public on planned and ongoing big data activities of the ESS, with users/policy makers on their needs, and with big data owners on data aspirations.
– Information is exchanged with stakeholders within the statistical system and the research community, e.g. through a European conference on big data in Official Statistics.
Medium term:
– Pilot results provide input for the industrialised implementation of statistics based on big data.
– IT infrastructures are designed that can process big data sources for generating official statistics.
– Methodological and quality frameworks are developed (and/or reviewed) for integrating big data sources in official statistics in a production setting.
– Partnerships are in place on big data and Official Statistics.
– Ethical guidelines are produced and agreed within the ESS.
– Implementation of communication strategy ensures positive attitude of general public towards use of big data sources in official statistics.
4.5. WHAT THE PROJECT DOES NOT INCLUDE
The BIGD project is focussing on the short and medium term objectives of the BDAR
and does not contain implementation of statistical processes based on big data
sources at ESS level. However, depending on the speed of progress related to the
pilots, it might be the case that some objectives would be achieved earlier. For example, work
on methodological and quality frameworks could advance more quickly for some statistical
domains. This could lead to implementation actions during the runtime of this
project. In this case, actions related to achieving this objective could start earlier than
foreseen and be included into the overall scope of the BIGD project. Enabling
decisions will be informed by revisions of the business case.
5. IMPACT ASSESSMENT
5.1. STAKEHOLDER ANALYSIS
Organisations outside the ESS, such as the OECD, the UN Statistics Division, the
World Bank, and the UNECE, are involved in further developing the subject. The big
data project of the UNECE HLG¹ is an international collaboration project on the role
of big data in the modernisation of statistical production². At its 45th session in 2014,
the Statistical Commission of the UN supported the creation of a global working
group (GWG) on Big Data for Official Statistics. The GWG established a number of
task teams working on different topics around the use of new data sources for official
statistics and in particular for producing statistics related to the Sustainable
Development Goals. Outputs of these initiatives and projects will be integrated into
the work of the ESS Big Data TF to avoid duplication of work and to ensure
harmonisation at international level.
The European Commission has launched a policy initiative aiming at tapping the full
potential of big data for the European economy, society and public services. The
communication of the European Commission “Towards a thriving data-driven
economy” sketches the features of the future data-driven economy and sets out
fields of activities to support the transition to this future economy. As a first step, the
Big Data Value contractual public-private partnership³ was launched on 13 Oct 2014.
It brings together the European Commission and representatives from industry and academia,
who agreed on a strategic research and innovation agenda for the period 2016 -
2020. Main elements of the agenda are creating lighthouse projects and innovation
spaces following agreed technical and non-technical priorities. Eurostat and other
Commission services are initiating a coordination group at the level of the European
Commission to ensure this aim.
1 High-Level Group for the Modernisation of Statistical Production and Services:
http://www1.unece.org/stat/platform/display/hlgbas/High-
Level+Group+for+the+Modernisation+of+Statistical+Production+and+Services
2 See: http://www1.unece.org/stat/platform/display/bigdata/Big+Data+in+Official+Statistics
3 See: http://www.bigdatavalue.eu/
The European Central Bank and other central banks have also demonstrated their
interest in exploring the use of big data for economic and financial statistics and have
recently launched a number of initiatives and workshops in this regard. The scope of
synergies and collaboration between the ESS and ESCB should therefore be explored.
Universities and other academic stakeholders are very active in research on
information and communication technologies, especially in relation to big data
sources. They have a strong interest in both big data sources and in data coming from
official statistics. At the same time they are indispensable partners for developing
new methodologies to analyse big data and as educators and trainers of statisticians.
The exploration and development of processes for integrating big data sources into
official statistics requires close involvement from academia.
Businesses play an important role in the big data ecosystem as owners of big data
sources, as developers of innovative services or as users of statistical data derived
from big data sources. Cooperation between public administration and private
entities, which could result in creating public private partnerships, will be essential to
tap the full potential of big data for official statistics.
Public authorities other than statistical offices could as well be owners or users of big
data sources. In addition, they could be responsible for certain aspects related to big
data, such as legal settings or data protection.
Statistical offices have been experiencing a decreasing willingness of citizens to
respond to surveys. However, a steeply growing volume of personal and behavioural
data is collected in exchange for free digital services or loyalty programmes. There is an
increasing awareness of possible misuse of personal data collected via digital
channels. Even the perception of possible misuse of data could provoke very negative
public opinion that could endanger the utilisation of big data for official statistics.
Advocating professional independence and upholding the principles of the European
Statistics Code of Practice are of utmost importance to achieve the goals of the
project.
5.2. PROJECT ENVIRONMENT
Eurostat, other DGs of the European Commission (for example CNECT, JRC, DIGIT),
the ESS and its international partners are already very active in the field of big data.
As indicated in the previous chapter, results are already available from multiple
activities on which the BIGD project could build; a prominent example
is the various deliverables of the UNECE HLG project on big data1. Outputs of the
Global working group on big data and official statistics of the UNSD are another
example.
It is important for the ESS to maintain and strengthen the good collaborative working
arrangements with other bodies and institutions which are active in the field of big
data and official statistics, and develop further synergies with other institutional and
private partners. There are several areas of work in which the ESS can contribute to and
benefit from international collaboration (in particular with the UN), such as
methodology, quality, metadata, IT, business frameworks and global inventories.
5.3. COST-BENEFIT ANALYSIS
It must be acknowledged that while the costs might be rather well-defined for each
of the ESSnets itself, there will be no benefit unless the results are implemented –
and the implementation costs are not yet known.
The expected benefits from the integration of big data sources into official statistics
can be summarised as follows:
– Better response to user needs through availability of various new data sources;
– Acquisition of new competences to enlarge portfolio of official statistics and ensure role as centres of competence towards users;
– Increased efficiency (if a statistical product is possible to produce at lower cost using big data sources);
– Big data techniques may make some data processing less expensive than traditional techniques. Examples are trade and patents data that are housed in huge relational databases but which easily scale to big data repositories;
– Wider product range (if “completely new statistics” based on big data sources are used);
– Increased quality (if a statistical product could be improved [timeliness, completeness, relevance, accuracy, ...] using big data sources);
– Reduction of burden on respondents;
– Faster adaptability (if the phenomenon that official statistics tries to capture “moves”, the big data source may possibly “move with it”, including new, relevant variables as part of an expanding business);
– Provision of big data based official statistics, produced in compliance with sound statistical disclosure control (SDC) principles, may reduce the general public use of “non-compliant”, “alternative” statistics produced by other actors.
1 See: http://www1.unece.org/stat/platform/display/bigdata/Big+Data+Projects
Do nothing Scenario
Data is becoming a new asset for the economy. In the near future, we will experience
an increasing amount of new services derived from data (in particular in an Internet
of Things environment). Big data (or simply data of any kind) will play a central role in
this new development. With an ever increasing number of data sources, private
businesses will soon be able to produce statistics that will be in competition with
official statistics. In the beginning, the quality of these statistics will be lower or at
least questionable, but these new producers will improve over time. In addition,
their statistical data could be provided in a timelier manner than official statistics.
Examples are the Billion Prices Project of MIT1, various activities of global players
like Google, or small and medium-sized enterprises like Positium2, which have
specialised in the exploitation of specific big data sources such as mobile communication
data. With the increasing availability of registers as open data, statistical frames will no
longer be a monopoly of statistical offices but will be affordable to private
enterprises, too. Updates of these frames would be done via internet sources.
If these new developments are ignored, official statistics would lose relevance in future
and risk being marginalised, similarly to what happened to the geographical offices
with Google or TomTom investing heavily in satellite images, aerial photographs
and topographic maps.
This scenario, i.e. 'do nothing', is also in contradiction with the Scheveningen
Memorandum (commitment of the ESS to explore the potential of Big Data for
official statistics), as well as with the Fundamental Principles of Official Statistics (in
particular Principle 1) and the European Statistics Code of Practice (in particular
principles of relevance, cost effectiveness, and timeliness and punctuality). It
therefore becomes obvious that the search for an alternative scenario is not simply
an option for the ESS but it becomes a responsibility.
1 See: http://bpp.mit.edu/
2 Positium participated in the Eurostat project "Feasibility study on the use of mobile positioning data for
tourism statistics", Eurostat Contract No 30501.2012.001- 2012.452,
http://ec.europa.eu/eurostat/web/tourism/methodology/projects-and-studies
Alternative Scenario
In line with its mission, the European Statistics Code of Practice and the Fundamental
Principles of Official Statistics, as well as the expectations of society towards the role of
official statistics, the ESS should investigate the potential of certain big data sources for
producing official statistics in a coordinated, cost-effective and collaborative way. It is
of utmost importance to adhere to high quality standards (one of the unique
comparative advantages of official statistics versus other data providers) and to produce
statistics in a transparent and scientifically robust manner in order to create trust
among the users of 'big data-derived' official statistics. In addition, it is important
that statistical offices develop the analytic capabilities and competences
needed for the new data ecosystem. To be recognised as key actors in this dynamic
knowledge era, statistical offices also have to communicate effectively the distinct
values of official statistics (why official statistics matter) and position themselves as
independent stakeholders bound only by the principles of the Code of Practice.
The BIGD project adheres to this scenario, as being the only approach which ensures
that NSIs and official statistics remain relevant and continue to fulfil their role in the
future.
5.4. RISK ANALYSIS
There are numerous risks when trying to use new data sources, in this case big data
sources, for the purpose of creating official statistics.
A fundamental overarching risk is that, if no action is taken in this field, official
statistics become irrelevant in a fast-changing technological environment where new
data sources are made available to different users for the production of statistical
information. This motivates the need to invest in ways to efficiently harness the potential
of big data sources in ESS statistical production and to identify ways to use these
new data sources in the delivery of statistical products and services to users.
An exhaustive list of specific risks with proposed actions for treatment is contained in Annex 2. Out of this exhaustive list we identified the following high level risks:
– Access to data sources
– Negative public opinion on the use of big data sources by official statistics
– Duplication of work among stakeholders
– Lack of experts / skills for big data usage
– Changes in EU legislation, specifically data protection regulation
The risk of not getting affordable access to (some) relevant big data sources has
become very prominent in the first attempts of the NSIs and Eurostat to explore big
data sources for official statistics. Strategies for mitigating this risk include (i)
selecting those data sources which do not carry this risk, such as internet websites, or
(ii) in cases of 'difficult to access data sources', ensuring that those countries which
participate in pilots have resolved the issue of 'access' beforehand (e.g. some NSIs
have already managed to access mobile communication data by enforcing statistical
law or by developing partnerships with network operators).
A number of big data sources are sensitive in that they relate to
personal information or allow individuals to be identified indirectly, e.g. movement
patterns from mobile communication data could be used to identify individuals. Big
data sources could also be used for optimising decisions related to government
actions, e.g. placing speed cameras according to information on traffic. Negative
public opinion on the use of specific big data sources could inhibit the use of those data
sources for official statistics. Mitigation measures include the definition of ethical
guidelines for big data usage, extending existing frameworks, starting from the
European Statistics Code of Practice. Work on ethical guidelines should also include
an assessment of sensitivity of big data sources. Ethical guidelines have to be
communicated to the public to be effective. Therefore, it will be necessary to define
and execute an appropriate communication strategy accompanying the pilots and a
possible large scale implementation.
A number of stakeholders already have or are planning to start activities on exploring
the potential of big data sources for the benefit of their business. A number of NSIs
have started pilots on using big data sources for official statistics. International
organisations, such as the UNECE and the UNSD have started big data initiatives. In
order to avoid duplication it is therefore necessary to closely collaborate with the
different stakeholders. The aim should be that each organisation should concentrate
on its strengths, which are related to its mission. Existing experience at national level
is necessary to assess the probability of success of pilots at European level. Activities of
UN organisations related to defining frameworks, such as on quality, could be
integrated into work at the ESS level.
The use of big data sources represents a methodological challenge, changing the
paradigm of official statistics from design-based approaches to model-based
algorithms. Staff of NSIs have to be able to combine these new methodological
approaches with the quality requirements of official statistics. Currently, there are
only a few staff members within the ESS who fulfil these requirements. In order
to cope with this shortage, it will be necessary to train existing staff and to involve
graduates from universities who are able to bring in skills related to new
methodologies. These skills have to be taken up and combined with official statistics
requirements within a project setup.
Currently, the regulation for data protection is being discussed between the
European Parliament, the Council and the European Commission. The final
outcome is still open. Depending on the decisions taken, there might be consequences
for the use of big data for the purpose of official statistics. The influence of the statistical
system at this final stage is very limited. Therefore, actions of the ESS in this regard
depend on the final text of the Data Protection Regulation, which will be known once
the Regulation is adopted by the Council and the European Parliament.
6. APPROACH
6.1. METHODOLOGY
This project will follow the PM² Methodology for all project activities.
6.2. GENERAL DESCRIPTION
The BDAR distinguishes between pilots and actions related to horizontal topics. The
issue of integrating big data sources into official statistics could be tackled by
investigating the potential of data sources for producing statistics that are
relevant for different domains. The alternative approach would be to start from
statistical domains and try to identify big data sources that could contribute to
enhancing or replacing current data sources. Experience at national level shows that
access to data sources is a serious issue that could delay the execution of a pilot
project considerably. Therefore, the BDAR suggests concentrating on data sources
first and exploring their potential for different statistical domains.
The exploration of specific data sources should be done via pilots that are conducted
by a consortium of NSIs, supported by scientific advice, if necessary. In parallel with
the exploration of data sources, the pilots should contribute to solving issues related
to horizontal topics such as quality, methodology, legislation or IT infrastructure.
Some topics, such as quality, would start with a preliminary framework, which would
be further elaborated within the pilots and, at the finalisation stage of the pilots, be
consolidated at general level in order to arrive at a general quality framework for big
data processing for the purpose of official statistics.
Other topics, such as policy, legislation, ethical guidelines and communication would
be treated as separate actions as they require a more in depth investigation or act as
an enabler for the pilots.
The pilots should be conducted by ESSnets. This approach ensures the involvement of the
NSIs, the proper financing of the actions and later acceptance of the proposed
solutions by the members of the ESS.
Consolidation of the work performed within the pilots could be coordinated by the
ESS Big Data TF in order to ensure general applicability.
Work related to some horizontal topics should be carried out through procurement
procedures, as these topics either require specific expertise or are more of a supporting and
administrative nature. These include actions related to big data ethics, advocacy and
communication, analysis related to legislation, and tasks of administrative support for
meetings and/or workshops.
6.3. RESOURCES AND LEAD TIMES
The project will be conducted within the current structure of the Eurostat TF Big Data
and the ESS TF Big Data. The overall project is embedded into the ESS Vision 2020
portfolio. The BIGD business case is derived from the BDAR, which is a joint product
of the Eurostat and the ESS Big Data TFs. It is assumed that both task forces continue
working on the BIGD project. With the adoption of the BDAR by the ESSC in
September 2014, preparations have already started for the implementation of the
BDAR. Implementation of the BIGD project can therefore start immediately after the
approval of the business case by the ESSC.
As noted above, the BIGD project will be implemented using resources from the
Eurostat TF Big Data and from the ESS TF Big Data. Additional resources will be
necessary for implementing the planned big data pilots. These resources will be
mobilised by creating two successive ESSnets. There is a separate business case for the
ESSnets, which contains more detailed information on timelines and deliverables.
Most of the work of the BIGD project will be performed via the ESSnets. Support
actions related to ethics, communication, legal analysis and experience sharing will
be supplied using procurement procedures.
Figure 1: Roadmap BIGD project
6.4. PROJECT FUNDING
The total estimated cost of the project is 4.8 million euro for Eurostat and 0.4 million
euro for the NSIs.
7. PROJECT ORGANISATION
7.1. PROJECT MANAGER
The project manager for this project is Michail Skaliotis, head of the Eurostat Task
Force on Big Data. The project owner is Mariana Kotzeva, Eurostat’s Deputy Director
General.
7.2. REPORTING STRUCTURE
The ESS TF on Big Data will perform operational project management and will serve
as project steering group. It will ensure coordinated output of the various actions, the
monitoring of the timetable, and regular reviews and revisions if necessary. The ESS TF
Big Data will also report on progress at regular time intervals to DIME and inform
other relevant Directors’ Groups as appropriate. It will also contribute to the VIG
reports on the overall progress of the ESS Vision 2020 implementation portfolio.
7.3. PROJECT TEAM
As stated in the ESS Big Data Action plan and Roadmap 1.0, embarking on the use of
big data for official statistics is a nontrivial activity, taking place in a dynamic
environment. External events, as well as findings made along the way during the
implementation of the Action Plan will most likely trigger the inclusion of new actions
and the refocusing of existing ones.
For this reason, the ESSnet consortia should be rather large, and include any ESS
members which could conceivably be involved in the implementation of the BDAR
(this could, for instance, include comparatively minor activities, such as trying out the
national feasibility of implementing methods developed by other members of the
ESSnets). Any other national authorities1 which could possibly be involved should
also be considered.
1 List of National Statistical Institutes and other national authorities responsible for the development,
production and dissemination of European statistics as designated by Member States
http://epp.eurostat.ec.europa.eu/portal/page/portal/ess_eurostat/introduction
7.4. DISSEMINATION AND SUSTAINABILITY PLAN
The CROS portal will be used for publishing final technical deliverables as soon as
they have been approved. Workshops and/or webinars should be foreseen for the
presentation of results and lessons learned, and presence at relevant European
events with an Official Statistics focus should also be considered.
Given the experimental nature of the BIGD project, it would be premature to require
any commitment from the ESS to implement the project results at this stage. This
would rather be the topic of a subsequent version of the ESS Big Data Action plan and
Roadmap.
It is clear already at this stage that non-negligible resource investment across the ESS
(under the BIG DATA umbrella) would be necessary if the results of the BIGD project
are to be implemented in the sense that big data sources are integrated in the official
statistics production across the ESS.
ANNEX 1 – STAKEHOLDER ANALYSIS
Table 1: Stakeholder Identification
Stakeholder External / Internal to Eurostat Stakeholder Function
Users Supplier Other
Commission Policy DGs (CNECT, MOVE, ENTR, CLIMA, …) European Commission X
DG DIGIT European Commission X
DG JRC European Commission X
National Statistical Offices ESS X
ESS Vision VIPs ESS X
ECB and national central banks ESCB X X
UNECE, UNSD International Organisations X X
OECD International Organisations X X
Eurostat Production Units Eurostat X X
Legal unit Eurostat X
Enterprise Architects Eurostat X
Training Eurostat X
Human resources Eurostat X
LISO Eurostat X
Data protection authorities European and national authorities X
Public data suppliers European and national authorities X
Private Businesses Private X X X
Technology Businesses Private X
Open software community Private, not for profit X
Universities, Academia Public / private X X
Table 2: Stakeholder needs and possible roles
Stakeholder | Need | Possible Role
Commission Policy DGs (CNECT, MOVE, ENTR, CLIMA, …) | Need for more timely and flexible statistical data for policy definition, monitoring and evaluation | Expressing user needs
DG DIGIT | Need for planning of IT infrastructure suitable for big data processing, including hardware and software requirements | Use of IT specifications for planning
DG JRC | Development of new analytical methodologies and skills; access to data sources and statistical data for scientific analysis | Contribute scientific expertise on data analysis and IT infrastructures; facilitate access to certain big data sources
National Statistical Offices | Improvement of statistical data production; improvement related to various quality elements, e.g. timeliness, relevance, accuracy, coherence, comparability, etc. | Coordination in developing new production processes for deriving statistical data from big data sources; facilitate access to big data sources
ESS Vision VIPs | Create synergies between different projects, avoid duplications or contradictions | Coordination at level of portfolio management and project level
ECB and national central banks | More timely and flexible statistical data; access to data sources, analytical skills, methodologies, quality and metadata frameworks | Collaboration in exploring big data sources for official statistics; facilitate access to certain big data sources
UNECE, UNSD | More timely and flexible statistical data, mainly for monitoring Global Development Goals; access to data sources, analytical skills, methodologies, quality and metadata frameworks | Collaboration in exploring big data sources for official statistics; facilitate access to certain big data sources
OECD | More timely and flexible statistical data | Expression of user needs
Eurostat Production Units | Support in producing statistical data from new data sources | Collaboration in exploring big data sources for official statistics
Eurostat Legal unit | Assurance that the integration of big data sources for statistical purposes conforms with the legal framework | Collaboration in analysis of current framework and formulation of actions
Enterprise Architects | Integration of production process based on big data sources into future enterprise architecture | Collaboration in analysis of requirements and definition of new elements of enterprise architecture related to integration of big data sources into production of European statistics
Training | Analysis of training needs and provision of training to build new skills | Collaboration in defining new trainings and strategy for acquiring skills
Human resources | Availability of staff with appropriate skills profile | Collaboration in defining skills profile and strategies for assuring availability of staff
Eurostat’s Local Informatics Security Officer (LISO) | Assurance of data security, privacy and protection | Include LISO in related aspects of big data processing
Data protection authorities | Assurance of data security, privacy and protection; conformance of methods with data protection and privacy laws | Collaboration when formulating guidelines
Public authorities | Use of more timely statistical data meeting quality standards of statistical offices; limiting burden; use of data following ethical principles and legal conditions; maximise use of data | Partnership in data usage and supply
Private Businesses | Use of more timely statistical data meeting quality standards of statistical offices; reducing or limiting burden; use of data following ethical principles and legal conditions; sell data or statistics to statistical offices | Partnership as data supplier and user; statistical offices as trusted third party providing statistics
Technology Businesses | Development of services related to big data processing | Provision of those services
Open software community | Development of new software related to big data management, processing, analysis | Collaboration in software development; provision of new software; enhancement of software
Universities, Academia | Use of micro/statistical data for scientific analysis; research in new data analysis methodologies | Development of new data analysis, processing, storage, etc.
ANNEX 2 – RISK ANALYSIS
Experience has shown that unless data sources are chosen with care, and data access is secured prior to the launch of the activity, there is a clear risk that a big data pilot cannot be carried out, or that its scope could be severely reduced. Moreover, even if data access is guaranteed, the data may prove to have a structure that could not conceivably yield any useful improvement of official statistics.
Table 3: Risk Analysis
Nr | Risk Name | Prob. (1-5) | Impact (1-5) | Mitigation / Measure
1 | Important big data sources not accessible (e.g. mobile phone data) | 4 | 5 | Requiring that any proposal involving a pilot includes a commitment by the data owner to make the data accessible to the ESSnet (and an assessment of the sustainability of the data source); Conducting a feasibility study prior to launching any major pilot; Consideration of alternative data sources
2 | Negative public opinion | 2 | 5 | Definition of ethical guidelines; Definition and execution of communication strategy; Assessment of potential risks before engaging in data processing
3 | Data security breaches | 2 | 5 | Prior privacy impact assessment and implementation of preventive measures; Threshold assessment; Risk identification; Risk mitigation; Definition of an action plan in case of breaches; Application of established security standards; Monitoring of data processing steps and data traffic (auditable steps)
4 | Data confidentiality breaches | 1 | 5 | Prior privacy impact assessment and implementation of preventive measures (see data security breaches); Application of manuals and standards for protection of confidential data; Agree on applicable standards for confidentiality protection before starting pilot; Apply agreed rules and verify application
5 | Unnecessary duplication / repetition of work done by other entities | 3 | 1 | Close collaboration and communication with stakeholders; Clarify expectations in ToRs; Frequent review of progress; Collect references to build on previous work before starting activity
6 | Not enough resources in NSIs | 2 | 4 | Clarify expectations in ToRs; Verify resource allocation in proposal; Monitor resources during project
8 | Not enough involvement by Member States | 2 | 4 | Communication at different levels of the ESS; Only start project with sufficient support
10 | Lack of availability of experts for the project | 3 | 4 | Ensure participation of NSIs with relevant experience; Ensure inclusion of scientific community
11 | Lack of / suboptimal coordination between work packages, specifically between pilots and horizontal topics | 1 | 3 | Clarify expectations in ToRs of ESSnets; Verify coordination measures in proposal; Assure communication during ESSnet runtime
12 | Unfavourable changes in EU data protection legislation | 2 | 5 | Monitor legislative developments; Conduct impact analysis
14 | Different national (technical, economic, societal, …) conditions, impact of languages | 5 | 1 | Analyse conditions before or during pilot execution and consider results for implementation planning; Include national modifications; Foresee monitoring of national situations in project
15 | No post project implementation | 1 | 3 | Prepare decisions well in advance in consultation with Member States
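The probability and impact ratings in Table 3 can be combined into a single ordering. The business case does not state how the ratings were combined, so the sketch below assumes a common convention, exposure = probability x impact; the function names and the shortened risk labels are illustrative only. Under this assumed rule, risk 5 (duplication of work) scores low even though section 5.4 lists it among the high level risks, which suggests the authors also applied judgement beyond the two ratings.

```python
# Risk register transcribed from Table 3: (number, short name, probability 1-5, impact 1-5).
# Numbers 7, 9 and 13 do not appear in the source table and are not filled in here.
RISKS = [
    (1, "Important big data sources not accessible", 4, 5),
    (2, "Negative public opinion", 2, 5),
    (3, "Data security breaches", 2, 5),
    (4, "Data confidentiality breaches", 1, 5),
    (5, "Unnecessary duplication of work", 3, 1),
    (6, "Not enough resources in NSIs", 2, 4),
    (8, "Not enough involvement by Member States", 2, 4),
    (10, "Lack of availability of experts", 3, 4),
    (11, "Suboptimal coordination between work packages", 1, 3),
    (12, "Unfavourable changes in EU data protection legislation", 2, 5),
    (14, "Different national conditions", 5, 1),
    (15, "No post project implementation", 1, 3),
]


def exposure(prob, impact):
    """Assumed scoring rule (not taken from the business case): probability x impact."""
    return prob * impact


def rank_risks(risks):
    """Sort by descending exposure; ties keep ascending risk number."""
    return sorted(risks, key=lambda r: (-exposure(r[2], r[3]), r[0]))


if __name__ == "__main__":
    for nr, name, prob, impact in rank_risks(RISKS):
        print(f"{exposure(prob, impact):>2}  risk {nr:>2}: {name}")
```

With these ratings, data access (risk 1) ranks first with an exposure of 20, followed by availability of experts (risk 10) with 12.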