D1.2 Final DITAS architecture and validation approach
Project Acronym: DITAS
Project Title: Data-intensive applications Improvement by moving daTA and computation in mixed cloud/fog environmentS
Project Number: 731945
Instrument: Collaborative Project
Start Date: 01/01/2017
Duration: 36 months
Thematic Priority: ICT-06-2016 Cloud Computing
Website: http://www.ditas-project.eu
Dissemination level: Public
Work Package: WP1 Requirements, Architecture and Validation Approach
Due Date: M24
Submission Date: 17/01/2019
Version: 1.0
Status: Final for submission
Author(s): Maya Anderson (IBM), Ety Khaitzin (IBM), Aitor Fernández (IDEKO), Borja Tornos (IDEKO), José Antonio Sánchez Murillo (Atos), Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Grigor Pavlov (CS), Sergey Miroshnikov (CS), Frank Pallas (TU-Berlin), Max-R. Ulbricht (TU-Berlin), Sebastian Werner (TU-Berlin), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Pierluigi Plebani (POLIMI), Ana Belén González Méndez (ATOS), José Antonio Sánchez (ATOS), David García Pérez (ATOS), Ilio Catallo (OSR), Andrea Micheletti (OSR)
Reviewer(s): Cinzia Cappiello (POLIMI), David García Pérez (ATOS)
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731945
© Main editor and other members of the DITAS consortium
Version History
Version | Date | Comments, Changes, Status | Authors, contributors, reviewers
0.1 | 08/10/2018 | Initial version | Maya Anderson (IBM)
0.2 | 09/11/2018 | Added state of the art for goal modelling | Mattia Salnitri (POLIMI)
0.3 | 13/12/2018 | Filled in table components DS4M, DURE, DUE@VDM, DUE@VDC, DU, SLA Manager, Deployment Engine | Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), José Antonio Sánchez (ATOS)
0.4 | 14/12/2018 | Added the V&V section | Borja Tornos (IDEKO), Aitor Fernández (IDEKO)
0.5 | 19/12/2018 | Added market analysis, SLA state of the art | Ana Belén González Méndez (ATOS), David García Pérez (ATOS)
0.6 | 20/12/2018 | Added requirements annex, components annex, and questionnaire annex | Maya Anderson (IBM), Ety Khaitzin (IBM), José Antonio Sánchez Murillo (Atos), Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Grigor Pavlov (CS), Sergey Miroshnikov (CS), Frank Pallas (TU-Berlin), Max-R. Ulbricht (TU-Berlin), Sebastian Werner (TU-Berlin), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Pierluigi Plebani (POLIMI), José Antonio Sánchez (ATOS), David García Pérez (ATOS), Ilio Catallo (OSR)
0.7 | 21/12/2018 | Update to the architecture section, executive summary and conclusions | Maya Anderson (IBM), Ety Khaitzin (IBM), José Antonio Sánchez Murillo (Atos), Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Grigor Pavlov (CS), Sergey Miroshnikov (CS), Frank Pallas (TU-Berlin), Max-R. Ulbricht (TU-Berlin), Sebastian Werner (TU-Berlin), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Pierluigi Plebani (POLIMI), José Antonio Sánchez (ATOS), David García Pérez (ATOS), Ilio Catallo (OSR)
0.8 | 22/12/2018 | Reviewed version | Cinzia Cappiello (POLIMI), David García Pérez (ATOS)
0.9 | 08/01/2019 | Clean Word version | David García Pérez (ATOS), Maya Anderson (IBM)
0.9 | 15/01/2019 | Update to SotA, architecture and requirements annex | Maya Anderson (IBM), Ety Khaitzin (IBM), Aitor Fernández (IDEKO), Borja Tornos (IDEKO), José Antonio Sánchez Murillo (Atos), Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Grigor Pavlov (CS), Sergey Miroshnikov (CS), Frank Pallas (TU-Berlin), Max-R. Ulbricht (TU-Berlin), Sebastian Werner (TU-Berlin), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Pierluigi Plebani (POLIMI), Ana Belén González Méndez (ATOS), José Antonio Sánchez (ATOS), David García Pérez (ATOS), Ilio Catallo (OSR), Andrea Micheletti (OSR)
1.0 | 17/01/2019 | Fixes to text around all sections. Quality check. Document ready for submission. | Maya Anderson (IBM), Pierluigi Plebani (POLIMI), David García Pérez (ATOS), Maria Teresa García González (ATOS)
Contents
Version History ................................................................................................................. 2
List of Figures ................................................................................................................... 6
List of tables ..................................................................................................................... 7
Executive Summary........................................................................................................ 8
1 Introduction ........................................................................................................... 10
1.1 Structure of the Document .......................................................................... 10
1.2 Glossary of Acronyms ................................................................................... 11
2 Update to the State of the Art ............................................................................ 13
2.1 Data Delivery in Fog Computing................................................................. 13
2.2 Data Management in Fog Computing ...................................................... 15
2.2.1 Data utility ............................................................................................... 15
2.2.2 Security and privacy mechanisms ....................................................... 16
2.3 Data as a Service .......................................................................................... 17
2.3.1 Interface description language ........................................................... 17
2.3.2 Goal models for data and computation movement ....................... 18
2.3.3 Service-Level Agreement ....................................................................... 19
3 Update to Market Analysis .................................................................................. 21
3.1 Market Overview ........................................................................................... 21
3.1.1 Market Segmentation ............................................................................ 22
3.2 Applications with a Fog Computing approach ........................................ 22
3.2.1 Connected Vehicles .............................................................................. 23
3.2.2 Smart Cities.............................................................................................. 23
3.2.3 Connected Healthcare ........................................................................ 23
3.2.4 Smart Manufacturing ............................................................................. 24
3.3 Use Cases Market Study ............................................................................... 24
3.3.1 e-Health ................................................................................................... 24
3.3.2 Industry 4.0............................................................................................... 30
3.4 Market context questionnaire ..................................................................... 38
3.4.1 Characterization of interviewees ......................................................... 38
3.4.2 Summary of Questionnaires and interviews conducted .................. 39
4 Update to the Business and Technical Requirements ..................................... 43
5 DITAS Architecture ................................................................................................ 45
5.1 DITAS roles ....................................................................................................... 46
5.2 DITAS-SDK Architecture ................................................................................. 47
5.3 Execution Environment Architecture .......................................................... 50
5.4 VDC Architecture .......................................................................................... 51
5.4.1 Common Accessibility Framework ...................................................... 52
5.4.2 Data Processing ..................................................................................... 53
5.4.3 Data Access Layer ................................................................................. 53
5.4.4 Other VDC Components ...................................................................... 54
5.5 VDM Architecture .......................................................................................... 54
5.6 VDC and VDM integration ........................................................................... 56
6 Detailed Technical Verification and Validation Approach ........................... 58
6.1 Requirements traceability ............................................................................ 58
6.1.1 Requirements as user stories ................................................................. 59
6.1.2 Acceptance criteria .............................................................................. 59
6.2 Verification methodology ............................................................................ 60
6.2.1 Unit tests ................................................................................................... 60
6.2.2 API validation test .................................................................................. 61
6.2.3 Integration Tests ...................................................................................... 62
6.3 Validation methodology .............................................................................. 62
6.3.1 Component level requirements validation ........................................ 62
6.3.2 Framework level validation ................................................................... 63
6.3.3 Validation against project objectives ................................................. 65
7 Conclusions ............................................................................................................ 66
8 References ............................................................................................................. 67
ANNEX 1: DITAS Business and Technical Requirements .......................................... 73
WP1 – Requirement, Architecture and Validation Approach ........................... 73
Technical Requirements ...................................................................................... 73
WP2 - Enhanced data management ................................................................... 77
WP3 - Data virtualization.......................................................................................... 82
Business Requirements .......................................................................................... 82
Technical Requirements ...................................................................................... 84
WP4 - Execution environment ................................................................................. 94
Business Requirements .......................................................................................... 94
Technical Requirements ...................................................................................... 95
WP5 - Real world case studies and integration .................................................. 103
IDEKO Use Case Requirements ......................................................................... 103
IDEKO Use Case Application level Requirements .......................................... 107
OSR Use Case Requirements ............................................................................. 108
OSR Use Case Application level Requirements .............................................. 111
Objective to WP Traceability Matrix .................................................................... 115
ANNEX 2: DITAS Components ................................................................................... 118
Virtual Data Container........................................................................................... 118
Virtual Data Manager ............................................................................................ 123
DITAS SDK ................................................................................................................. 124
ANNEX 3: DITAS Technical Questionnaire ............................................................... 127
ANNEX 4: DITAS market context questionnaire ...................................................... 131
List of Figures
Figure 1: Size of Fog computing market opportunity by vertical market, 2019 and
2022 ................................................................................................................................ 21
Figure 2: Fog Market Segmentation .......................................................................... 22
Figure 3: Disruptive technologies in Healthcare ...................................................... 26
Figure 4. Healthcare supply chain ............................................................................. 27
Figure 5. HIE system ...................................................................................................... 28
Figure 6. Global Industry 4.0 Market .......................................................................... 30
Figure 7. Growth in revenue attributable to Industry 4.0 per industry sector ....... 31
Figure 8. Annual investments in Industry 4.0 per industrial sectors ........................ 32
Figure 9. Nine Technologies transforming Industrial production ............................ 32
Figure 10. The new Industry 4.0 stakeholders ecosystem ........................................ 34
Figure 11. Mindsphere by Siemens............................................................................. 34
Figure 12. Architecture using Azure IoT ..................................................................... 35
Figure 13. Google Cloud IoT Edge workflow ............................................................ 36
Figure 14. AWS IoT architecture .................................................................................. 36
Figure 15. IBM Watson Architecture ........................................................................... 37
Figure 16. Characterization of the organizations .................................................... 39
Figure 17. Interviewees’ roles ...................................................................................... 39
Figure 18. Difficulties managing data ....................................................................... 41
Figure 19. The conceptualization of Virtual Data Container ................................. 46
Figure 20. VDC Blueprint Lifecycle ............................................................................. 48
Figure 21. DITAS SDK Architecture .............................................................................. 49
Figure 22. DITAS SDK Resolution Engine Architecture and component interaction
........................................................................................................................................ 50
Figure 23. DITAS Execution Environment for several deployments of the same
blueprint ........................................................................................................................ 51
Figure 24. High-level view of the VDC ....................................................................... 52
Figure 25. High-level view of the DAL ........................................................................ 54
Figure 26. High-level view of the VDM ....................................................................... 55
Figure 27. High-level view of the VDM and VDM integration ................................ 57
Figure 28: Requirements Traceability Matrix for WP2 ............................................... 58
Figure 29: Measurements criteria vs WP .................................................................... 59
Figure 30: Software verification tests ......................................................................... 60
Figure 31: Component validation flow...................................................................... 63
Figure 32: Business requirements for the Industry 4.0 use case .............................. 64
Figure 33: Technical requirements for the Industry 4.0 use case ........................... 64
Figure 34: Validation against use cases flow............................................................ 64
Figure 35: Objective 1 fulfillment ................................................................................ 65
Figure 36: Validation against project objectives ..................................................... 65
Figure 37. Average rank per requirement for technical questionnaire .............. 129
Figure 38. Average rank per parameter for technical questionnaire ................ 130
List of tables
Table 1. Acronyms ........................................................................................................ 12
Table 2: Classification of Industry 4.0 Stakeholders ................................................. 38
Table 3: Fields to be fulfilled by the requirements of DITAS. ................................... 44
Executive Summary
This final DITAS architecture document includes an update to the market analysis, an update to the business requirements, the detailed project architecture, and a detailed plan for verification and validation.
As described in the initial architecture document with market analysis, state of the art refresh and validation approach, D1.1 (D1.1, 2017), DITAS aims to address the complexity of developing and deploying data-intensive applications for future computing platforms that span the Cloud and the Edge. DITAS provides a data access abstraction for the application designer, the application developer, and the application operator, so that each can focus on their own objectives without stepping beyond their skill set and expertise.
The DITAS consortium has performed an updated market analysis that is less general than the one presented in deliverable D1.1 (D1.1, 2017). In this document we present a more focused market analysis, looking especially at the markets that could become natural users of a future DITAS platform. In addition, with the help of our use case partners, we provide a more in-depth study of DITAS in both the Industry 4.0 and e-Health business scenarios.
Regarding the State of the Art, now that the first release of DITAS has been launched and evaluated, it is clearer where DITAS pushes the envelope. We believe that DITAS advances the State of the Art in three areas, all related to the data lifecycle in a Fog environment: Data Delivery, Data Management, and Data as a Service. Section 2 of the document offers a more in-depth review of these aspects.
In Section 4 of this updated version of the document, we have revised the requirements collected in the first period and added new ones, in line with the updated version of the architecture. A new technical questionnaire was distributed to external entities, asking people to rank the basic project requirements as well as the parameters that drive the data and computation movement process. We also put emphasis on the traceability of the requirements, extending the requirements table to provide information about how to test and fulfil each requirement.
While implementing the first prototype of the DITAS platform, we evolved the platform architecture: a more advanced blueprint lifecycle was developed, and a Data Access Layer (DAL) was added to the VDC in order to accommodate the privacy-sensitive flows of the e-Health use case.
In the Architecture section, we define the actors involved in the architecture,
then give an overview of the components for designing, deploying and manag-
ing Virtual Data Containers (VDC), divided into two parts: the DITAS-SDK
concerning the definition and the retrieval of a VDC, and the DITAS Execution
Environment (DITAS-EE) that manages the execution of the VDC as well as the
data and computation movements.
In order to validate the DITAS framework, we define in this document a detailed technical verification and validation approach, describing the different types of tests the consortium applies during development. This process is grounded in the requirements, and this section also describes how we track the project requirements using different traceability matrices per Work Package. Furthermore, we define a process to ensure the fulfilment of the project objectives described in the DoA (DoA, 2016), assigning each objective to the Work Packages and components in charge of fulfilling it. With all
this, we ensure that the development team has covered every need of the project and that the final product and each of its components are stable and consistent.
1 Introduction
This final DITAS architecture document includes an update to the market analysis, an update to the business requirements, the detailed project architecture, and a detailed plan for verification and validation.
As described in the initial architecture document with market analysis, state of the art refresh and validation approach, D1.1 (D1.1, 2017), DITAS aims to address the complexity of developing and deploying data-intensive applications for future computing platforms that span the Cloud and the Edge. DITAS provides a data access abstraction for the application designer, the application developer, and the application operator, so that each can focus on their own objectives without stepping beyond their skill set and expertise.
The overall objective of this document is to identify the requirements of the whole
project, and of its specific components, to outline the system architecture and a
common vision of the project feature set and functionality, and to define the
technical verification and validation approach. This is done in four steps.
First, we present a summary of the state of the art analysis of the technologies
that are used, and we focus on the main DITAS innovations. In addition, we in-
vestigate the state of the market in the area of fog computing and the two use
cases: e-Health and Industry 4.0, and the relevant trends. The state of the art and market trends help to estimate technology-related risks and to identify the main innovation domains DITAS can exploit.
Second, we detail both the business and the technical requirements for DITAS
components and for the project architecture; the requirements help ensure that
DITAS addresses both functionality and quality needs of the potential customers.
These requirements capture functional aspects as well as non-functional ones, including performance, security, privacy, interoperability, availability, reliability, maintainability, evolvability and extensibility. Special attention is given to the traceability of the requirements.
Third, we outline the overall DITAS architecture, its main components and flows.
We describe the roles in DITAS and describe the architecture using these roles,
which allow separation of concerns. The initial architecture has been revised and
elaborated since D1.1 (D1.1, 2017) based on conclusions from building the first
DITAS prototype.
Fourth, we describe the methodology with which we will analyze the case studies and perform the technical verification and validation of the DITAS components and of the DITAS platform as a whole.
1.1 Structure of the Document
This document is arranged similarly to the deliverable D1.1 (D1.1, 2017): following the first introductory sections, Section 2 includes the update to the state of the art. Section 3 presents the results of the market analysis, which reviews the current state of practice regarding tools and methods used in industry to manage data in Fog Computing, in particular in e-Health and Industry 4.0. This review also provides the necessary basis for Section 5 for understanding how the architecture of the DITAS framework is shaped in order to increase the possibilities of adoption by industrial players. Section 4 and Annex 1 (in more detail) introduce the business and technical requirements collected through a survey of organizations that could end up using DITAS technologies (the questionnaires can be
found in Annexes 3 and 4). Section 5 describes the architecture and Annex 2 lists its
various components with their relationships to tasks in work packages. Section 6
defines the approach to technical verification and validation of the DITAS archi-
tecture. Section 7 concludes with the summary of the document and the next
steps of the DITAS project following the final architecture document delivery.
1.2 Glossary of Acronyms
All deliverables include a glossary of the acronyms used within the document.
Acronym Definition
AI Artificial Intelligence
AM Additive Manufacturing
API Application Programming Interface
CAF Common Accessibility Framework
CAGR Compound Annual Growth Rate
CAM Connected Asset Manager
CI Continuous Integration
CPU Central Processing Unit
D Deliverable
DAL Data Access Layer
DBMS Database Management System
DIA Data Intensive Application
DNS Domain Name System
DoA Description of Action
DS4M Decision System for Movement
DUE Data Utility Evaluator
DUR Data Utility Resolution
DURE Data Utility Resolution Engine
EC European Commission
EE Execution Environment
EHR Electronic Health Record
GB Gigabyte
GDP Gross Domestic Product
GDPR General Data Protection Regulation
GPU Graphics Processing Unit
HIE Health Information Exchange
HIPAA Health Insurance Portability and Accountability Act
HMI Human-Machine Interface
HTTP Hypertext Transfer Protocol
IACS Industrial Automation and Control Systems
ICT Information and Communications Technology
IIoT Industrial Internet of Things
IoT Internet of Things
JDBC Java Database Connectivity
JSON JavaScript Object Notation
KDM Knowledge Discovery Meta-Model
MEC Mobile Edge Computing
OAS OpenAPI specification
PaaS Platform as a Service
PLC Programmable Logic Controllers
QoS Quality of Service
REST Representational State Transfer
RTM Requirements Traceability Matrix
SDK Software Development Kit
SLA Service Level Agreement
SOA Service-Oriented Architecture
SQL Structured Query Language
VDC Virtual Data Container
VDM Virtual Data Manager
Table 1. Acronyms
2 Update to the State of the Art
In this updated version of the document we take a different approach to updating the State of the Art: we focus on the main topic in which we think DITAS is innovative, namely the data lifecycle in an Edge, Cloud or Fog environment. The section is divided into three main aspects of the data lifecycle: Delivery, Management, and the usage of Data as a Service.
2.1 Data Delivery in Fog Computing
One of the main advantages of adopting Fog Computing (Bonomi, Milito, Zhu, & Addepalli, 2012; Byers, 2017; Varshney & Simmhan, 2017) concerns the improvement of data delivery through an active role of the edge side. In fact, Fog computing advocates a prominent use of computation on the edge devices, i.e., where the data are generated (FOG - Fog Computing and Networking Architecture Framework, 2018). This reduces the amount of data to be sent to the cloud resources, so that less data is stored in the cloud, or the computation can be finalized there to return a result to the final user with a lower response time. Although much effort has been devoted by the community to optimizing computation and data delivery from the edge to the cloud (Mouradian, et al., 2018), one of the goals of DITAS in improving data-intensive applications is to investigate how data delivery in the other direction (from the cloud to the edge) can be improved as well (Bermbach, et al., 2017).
In particular, Information Logistics has been considered in DITAS to properly or-
ganize the data delivery to the final users. According to the classification pro-
posed in (Michelberger, Andris, Girit, & Mutschler, 2013), we are interested in user-
oriented Information Logistics: i.e., the delivery of information at the right time,
the right place, and with the right quality and format to the user (D’Andria, et al.,
2015). As a consequence, user requirements can be defined in terms of func-
tional aspects, i.e., content, and non-functional ones, i.e., time, location, repre-
sentation, and quality.
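To make this classification concrete, the following minimal Python sketch models such a user-oriented delivery requirement, with content as the functional aspect and time, location, representation, and quality as the non-functional ones. The class and field names are illustrative only and do not correspond to any DITAS schema.

from dataclasses import dataclass

@dataclass
class DeliveryRequirement:
    # Functional aspect: what information the user needs.
    content: str
    # Non-functional aspects, as classified above.
    time: str            # when it must be delivered, e.g. "< 200 ms"
    location: str        # where it must be available, e.g. "factory edge gateway"
    representation: str  # required format, e.g. "JSON"
    quality: dict        # e.g. {"accuracy": 0.95, "completeness": 0.90}

req = DeliveryRequirement(
    content="machine vibration readings",
    time="< 200 ms",
    location="factory edge gateway",
    representation="JSON",
    quality={"accuracy": 0.95, "completeness": 0.90},
)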
Based on these assumptions, in the context of DITAS data delivery has been con-
sidered in a service-oriented architecture (SOA) where at the provider’s side
data could be stored in different formats on the cloud or on the premises of the
provider (the edge). Data can be organized in databases (relational or schema-less) or generated on the fly and transmitted through streams (Qin, et al., 2016). Furthermore, as the data provider can offer the owned data either as they are or after processing, this computation can be distributed among the nodes belonging to the provider and to the consumer (Verma, Yadav, Motwani, Raw, & Singh, 2016).
In this context, data movement holds a crucial role, as methods and techniques to move the data from the provider to the consumer in order to satisfy the consumer's needs in terms of functional and non-functional properties are not fully studied in the literature. In fact, most of the existing work considers the data flow in controlled environments where fog nodes are devices with specialized elements (e.g., GPU, CPU, and RAM) (Dey & Mukherjee, 2018) and computation and data are properly distributed to reduce latency (Verma, Yadav, Motwani, Raw, & Singh, 2016), energy consumption (Duy La, Ngo, Dinh, Quek, & Shin, 2018), resource utilization (Lai, Song, Hwang, & Lai, 2016), or data size (Al-Doghman, Chaczko, & Jiang, 2017). The goal of DITAS is to focus on a broader environment in which providers and consumers belong to different organizations and no
control over the network is possible (Salman, Elhajj, Chehab, & Kayssi, 2018). In this context, the literature is limited and only a few approaches to distributing the computation have been proposed (Pham & Huh, 2017; Vidyasankar, 2018).
Security and privacy are particularly relevant for a cloud-native platform such as DITAS.
New regulations, such as the European General Data Protection Regulation
(GDPR), specify new and challenging data governance requirements for data-
intensive platforms and applications. (Bertino & Ferrari, 2018) and (Colombo &
Ferrari, 2018) broadly describe the current research in the field of Big Data secu-
rity and privacy. Specifically, when providing access to the data, the regulations
require taking into account new concepts such as the consent given by the
individual who provided the data, known as the data subject, and the usage of
the data, known as data usage purpose.
Existing access control tools either use compliance checks that do not com-
pletely match the new and complex requirements that GDPR introduces or are
limited in their scalability. Most of the existing solutions apply coarse-grained protection, controlling access at the level of a whole data object. Tools that provide fine-grained compliance at the granularity of specific cells do so either by making a decision for each row separately (Thi, Si, & Dang, 2018), and are thus limited in their scalability in the data lake, or by creating static views (Martínez, Fouche, Gérard, & J., 2018) for each possible scenario, a solution that does not work for a wide set of request attributes with multiple possible values.
To address the described issues, as part of our work on DITAS we have developed a technique for efficient privacy policy enforcement that takes data subject consent into account while allowing analytics on large-scale data. In (Khaitzin, Shlomo, & Anderson, 2018) we give an overview of the technique: we add a pre-computation phase in which we compile the policies and parts of the supplementary data (e.g., consents, profiles), keeping only the parts relevant to the policy decisions. We thus obtain a compiled representation that can be used efficiently at query time. The result is stored as close to the data as possible, in an accessible form. We use this technique in the Privacy Enforcement Engine.
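The following Python sketch illustrates the two-phase idea at toy scale; the data structures and function names are our own illustration, not the actual Privacy Enforcement Engine interface.

# Pre-computation phase: compile consents into a compact lookup keyed by
# (purpose, data category), keeping only what policy decisions need.
def compile_policies(consents):
    """consents: iterable of (subject_id, purpose, category) triples."""
    compiled = {}
    for subject_id, purpose, category in consents:
        compiled.setdefault((purpose, category), set()).add(subject_id)
    return compiled

# Query-time phase: a cheap membership test per row replaces a full policy
# evaluation, which keeps analytics on large-scale data feasible.
def allowed(compiled, subject_id, purpose, category):
    return subject_id in compiled.get((purpose, category), set())

compiled = compile_policies([
    ("patient-17", "research", "blood-pressure"),
    ("patient-42", "research", "blood-pressure"),
])
assert allowed(compiled, "patient-17", "research", "blood-pressure")
assert not allowed(compiled, "patient-99", "research", "blood-pressure")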
With the enactment of the GDPR, formal audits and vendor certifications such as EuroCloud [1] or Cloud Security Alliance [2] have become relevant for cloud service providers. Research in this area, especially with regard to improving the transparency and accountability of cloud providers, has become more critical. However, such approaches are not yet suited to the advent of Fog and Edge Computing (Bermbach, et al., 2017): these dynamic environments are not as easily audited, especially when devices move.
Therefore, other technologies are needed to establish trust and transparency in
these environments, including trusted computing (Sadeghi & Stüble, 2004), “real-time auditing” (Ko, Lee, & Pearson, 2011; Ullah, Ahmed, & Ylitalo, 2013; Doelitzscher, et al., 2012), and monitoring approaches such as (Alcaraz Calero & Aguado, 2015), (Sharma, Chatterjee, & Sharma, 2013), or the one established in DITAS.

[1] EuroCloud StarAudit, https://staraudit.org/
[2] Cloud Security Alliance, Security, Trust & Assurance Registry Certification, https://cloudsecurityalliance.org/star/certification/
Further techniques such as multi-party computation (Furukawa, Lindell, Nof, &
Weinstein, 2017), secret splitting (Shamir, 1979), and property-preserving encryption (Pallas & Grambow, 2018) are also possible ways to aid GDPR compliance computationally.
Additionally, concepts like “Sticky Policies” (Pearson & Mont, 2011), “Distributed
Usage Control” (Pretschner, Hilty, & Basin, 2006) also fit the DITAS context. Furthermore, advanced consent control strategies such as the one presented in (Ulbricht & Pallas, 2018) also offer ways to support GDPR-compliant applications.
Besides these techniques, practical implementations such as the apps presented by Lodge et al. (Lodge, Crabtree, & Brown, 2018), as well as the edge access control management presented by Werner et al. (Werner, Pallas, & Bermbach, 2017), show potential for the security and privacy practices in DITAS.
All of these approaches introduce tradeoffs that have to be analyzed and evaluated; for example, (Pallas & Grambow, 2018) showed the performance penalties of privacy-preserving databases. A platform such as DITAS needs to offer means to select the appropriate tradeoff for each use case and guidance for a data administrator to select the best-fitting technology.
2.2 Data Management in Fog Computing
Data management in DITAS aims to suggest and provide to the application de-
signer the most suitable data set considering the application and user require-
ments. Requirements are related to data utility and security and privacy aspects.
The following sections discuss the existing contributions in data utility and security
research areas and highlight the innovative aspects of the DITAS approach.
2.2.1 Data utility
In the first period of DITAS the concept of Data Utility has been introduced and
defined as “the relevance of a data set for the usage context” (Cappiello,
Pernici, Plebani, & Vitali, 2017) where the context includes the application re-
quirements and the resources used to host the data source. Such a definition was
proposed considering previous literature contributions that were using the term
Data Utility. In fact, the concept of Data Utility has been used in several contexts.
For the general IT context, Data Utility has been defined by (Kock, 2007) consid-
ering both the relevance of a piece of information to the context and the capa-
bility of such piece of information to reduce uncertainty. In the business scenario
Data Utility has been instead defined as the business value attributed to data
within specific usage contexts (Syed & Syed, 2008). A more complex definition
has been provided in the statistics context by (Hundepool, et al., 2012): Data
Utility is “a summary term describing the value of a given data release as an an-
alytical resource. This comprises the data's analytical completeness and its ana-
lytical validity”. All these definitions agree on the fact that the utility of a data set
depends on the context in which data are used.
An important characteristic of the Data Utility concerns its variation with respect
to the specific goal of the data analysis. As an example, Data Utility is often an-
alyzed for data mining applications (Lin, Wu, & Tseng, 2015) and defined consid-
ering the different data mining techniques. In this research area the tradeoff be-
tween data utility and data privacy is often considered. In fact, in order to
guarantee data privacy, data anonymization techniques have to be applied:
hiding data values influences the effectiveness of data mining algorithms. (Han,
J., J., H., & J., 2017) proposes an anonymization method that is able to guarantee
higher utility, i.e., better classification accuracy. A method able to accomplish a
good balance between privacy and utility in the context of association rules was
proposed in (Kalyani, V. P. Chandra Sekhara Rao, & Janakiramaiah, 2017). More-
over, Data Utility might be influenced by the quality of service and the quality of
data. For instance, the relation between Data Utility and quality of service has
been investigated in (Wang, Zhu, Bao, & Liu, 2016), which discusses Data Utility
with a focus on energy efficiency of mobile devices in a mobile cloud-oriented
environment. The issue of energy efficiency for discovering interrelations be-
tween the evaluation of the data value and the effectiveness of run-time adap-
tation strategies has been discussed in (Ho & Pernici, 2015). Similarly, the influence
of data quality on Data Utility is considered in (Moody & Walsh, 1999), where relevant quality dimensions (e.g., accuracy and completeness) are considered in relation to Data Utility. Note that data quality (and thus data utility) assessment depends on the type of data and on the type of application. The relationship between data quality and data mining algorithms has been analyzed by (Blake & Mangiameli, 2011), while (Even, Shankaranarayanan, & Berger, 2010) focused on the impact of the four main data quality dimensions (accuracy, completeness, consistency and timeliness) on clustering algorithms. The study highlights that consistency, completeness and accuracy issues are the ones that negatively impact the results.
Finally, Data Utility has also been analyzed with respect to the relation between IT and business, which has paved the way for associating Data Utility with business processes. In this context, Data Utility is defined as a measurement of the gain obtained by using a dataset inside an organization (Even, Shankaranarayanan, & Berger, Inequality in the utility of customer data: implications for data management and usage, 2010). Moreover, (Giorgini, Mylopoulos, Nicchiarelli, & Sebastiani, 2003) discusses the information quality requirements needed to obtain reliable results from the execution of business processes.
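As a purely illustrative example of this context dependence (not the DITAS data utility definition, which is given in the WP2 deliverables), a utility score can be sketched in Python as a context-weighted combination of quality dimensions:

def data_utility(quality, context_weights):
    """Context-dependent utility as a weighted average of quality scores.

    quality:         {dimension: score in [0, 1]}
    context_weights: {dimension: weight}, reflecting the usage context
    """
    total = sum(context_weights.values())
    return sum(context_weights[d] * quality.get(d, 0.0)
               for d in context_weights) / total

quality = {"accuracy": 0.9, "completeness": 0.7,
           "consistency": 0.95, "timeliness": 0.4}
# The same data set scores differently in a real-time monitoring context
# (timeliness matters) and in an offline analytics context (completeness).
print(data_utility(quality, {"timeliness": 3, "accuracy": 1}))    # 0.525
print(data_utility(quality, {"completeness": 3, "accuracy": 1}))  # 0.75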
2.2.2 Security and privacy mechanisms
As described above, most security and privacy mechanisms introduce tradeoffs regarding system performance in general and non-functional requirements like GDPR compliance or transparency concerning data processing and the flow of data/information. To handle these tradeoffs properly and to make informed decisions regarding the development and/or integration of specific privacy and security mechanisms into new systems, knowledge about the systematic quantification of the risks of possible data breaches or unintentional data leakage is needed. Since the usage of Likert scales (e.g., from “very low” to “very high”) to rate risk in common security and privacy impact assessments is far too imprecise, this issue has recently gained attention in several research communities.
New approaches to measuring the value of privacy and the efficacy of privacy-enhancing technologies (PETs) (Halunen & Karinsalo, 2017), and a valuable “Systematization of Knowledge” regarding technical privacy metrics (Wagner & Eckhoff, 2018), lead to new ways of quantifying privacy risks and therefore to better privacy impact assessments (Wagner & Boiten, Privacy Risk Assessment: From Art to Science, 2018). The results of these assessments can be
used to make better decisions for or against the integration of a specific privacy
mechanism in order to get the right balance between system performance and
security as well as privacy requirements.
In DITAS we will evaluate these new approaches in order to gain insights and inspiration for the development of metrics to rank the different privacy and security mechanisms that can be added to the VDC during the deployment phase. To make an informed decision about the choice of a specific blueprint, a ranking mechanism should be used to determine which of the available blueprints best fits the requirements of the respective application designer.
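A minimal sketch of such a ranking, assuming every blueprint advertises normalized scores for a set of metrics and the application designer supplies weights; the metric and blueprint names below are placeholders, not DITAS identifiers.

def rank_blueprints(blueprints, weights):
    """Return blueprint names ordered by weighted score, best fit first.

    blueprints: {name: {metric: score in [0, 1]}}
    weights:    {metric: relative importance}
    """
    def score(metrics):
        return sum(w * metrics.get(m, 0.0) for m, w in weights.items())
    return sorted(blueprints, key=lambda name: score(blueprints[name]),
                  reverse=True)

candidates = {
    "bp-anonymized": {"privacy": 0.9, "performance": 0.5},
    "bp-plaintext":  {"privacy": 0.2, "performance": 0.9},
}
# A privacy-sensitive application designer weights privacy heavily.
print(rank_blueprints(candidates, {"privacy": 0.7, "performance": 0.3}))
# -> ['bp-anonymized', 'bp-plaintext']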
2.3 Data as a Service
2.3.1 Interface description language
According to the DITAS architecture, the Virtual Data Container interacts with
the data-intensive applications through the Common Accessibility Framework
API, the programming model of which is REST-oriented. The data administrator is
in charge of designing the API as well as making it publicly available, via the
abstract VDC blueprint. In fact, the EXPOSED API section of the blueprint includes
all the information about the methods through which the administrator exposes, totally or partially, the data stored in the sources that he/she controls. EXPOSED API is a technical section that enables the application developer to understand how the VDC methods work and therefore to conclude whether the VDC fits his/her DIA from a development point of view (D3.2, 2018).
As a result, there was a need to provide a structured description of the CAF RESTful API, using a specification that allows both humans and computers to discover and understand the capabilities of each VDC method. Some of the most popular standards in that direction are the following (Petychakis, et al., 2014):

- The OpenAPI specification (originally known as the Swagger specification), which offers a large ecosystem of API tooling, has great support in almost every modern programming language, and allows developers to test the APIs immediately through easy deployment of server instances.
- API Blueprints, where an API description can be used in the Apiary platform to create automated mock servers, validators, etc.
- The Hydra specification, which is currently under heavy development and tries to enrich current web APIs with tools and techniques from the semantic web area.
- RAML (RESTful API Modeling Language), which provides a structured, unambiguous format for describing a RESTful API, allowing developers to describe the API: the endpoints, the HTTP methods to be used for each one, any parameters and their format, what can be expected by way of a response, etc. (Tsouroplis, et al., 2015).

Since a critical business requirement in DITAS concerning the VDC is to have an open API, so that big vendors as well as new providers are able to publish their services and components, we decided to describe the CAF API based on the OpenAPI specification (OAS). In fact, we share the objective of the OpenAPI Initiative: creating an open description format for API services that is vendor-neutral, portable and open, accelerating the vision of a truly connected world (Lucky, Cremaschi, Lodigiani, Menolascina, & De Paoli, 2014). Furthermore, the OAS project has the largest and most active developer community on GitHub (Surwase, 2016).
However, we propose an extension of the OAS, in order to address major require-
ments of the project, thus supporting the data movement techniques that we
introduce. Existing suggestions to extend the OAS aim at enhancing actual API
descriptions by creating a simple description format to annotate properties at
semantic level to support semi-automatic composition (Lucky, Cremaschi,
Lodigiani, Menolascina, & De Paoli, 2014), or implementing the FAIR principles
(Findable, Accessible, Interoperable, Reusable), introducing additional
metadata elements beyond those included in the OAS (Zaveri, et al., 2017).
In DITAS, the structure of the abstract blueprint is method-oriented, meaning that each exposed VDC method is semantically described by separate tags and has its own guaranteed levels of data quality, security and privacy. Moreover, the rules in the form of goal trees that are used to construct the SLA contract differentiate between the methods. Consequently, the extensions of the OAS that we suggest are mainly applied to the so-called Operation Object, which in our case corresponds to the VDC method. Indicatively, through the definition of the extended operation, the data administrator must specify, among other things, the data sources that the VDC method accesses as well as the schema of the data included in the response payload. This kind of information is necessary for the platform in order to decide which portion of data to move, given the specific method selected by the application designer.
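For illustration, a fragment of such an extended Operation Object could look as follows, written here as a Python dict for readability. The x-data-sources key is a placeholder of our own; the actual extension fields are specified in D3.2 (D3.2, 2018).

# Hypothetical extended OpenAPI Operation Object for one VDC method.
get_blood_pressure_avg = {
    "summary": "Average blood pressure of a patient over a period",
    "operationId": "getBloodPressureAvg",
    # Placeholder extension: the data sources this VDC method accesses.
    "x-data-sources": ["patients-db", "measurements-stream"],
    "responses": {
        "200": {
            "description": "Aggregated measurement",
            "content": {
                "application/json": {
                    # Schema of the response payload: the platform needs it
                    # to decide which portion of the data to move.
                    "schema": {
                        "type": "object",
                        "properties": {
                            "patientId": {"type": "string"},
                            "avgSystolic": {"type": "number"},
                            "avgDiastolic": {"type": "number"},
                        },
                    },
                },
            },
        },
    },
}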
2.3.2 Goal models for data and computation movement
In DITAS the decision of where and when to move data and computation is based on an ad-hoc extension of a goal-based modelling language. Goal models represent sets of objectives (i.e., goals) organized in a tree structure where the root is the main objective and the leaves are the refined sub-objectives. In particular, we use the goal model structure to specify the requirements of application designers; we then enriched the language to support the decision of which data movement or computation movement to enact in case the requirements of the user are no longer satisfied. For more information on this approach please refer to Deliverable 2.2 (D2.2, 2018).
A great variety of analysis techniques have been proposed for analyzing goal models for this purpose (Horkoff & Yu, Interactive goal model analysis for early requirements engineering, 2016). Satisfaction analyses propagate the satisfaction or denial of goals forward and backward in the goal tree structure. Forward propagation (Letier & Van Lamsweerde, 2004) (top-down) can be used to check alternatives: if a certain goal is (not) satisfied, which sub-goals are (not) satisfied. Backward propagation (Sebastiani, Giorgini, & Mylopoulos, 2004) (bottom-up) can be used to understand the consequences of a satisfied or denied goal. Some satisfaction analyses mark each goal with a label representing its level of satisfaction, for example: satisfied, partially satisfied, denied or unknown (Giorgini, Mylopoulos, Nicchiarelli, & Sebastiani, 2003; Chung, Nixon, Yu, & Mylopoulos, 2012).
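As a simplified illustration, satisfaction can be propagated through an AND/OR goal tree as in the following Python sketch, which uses plain boolean labels instead of the four-level labels mentioned above.

def satisfied(goal, leaf_labels):
    """goal is ("leaf", name), ("and", [subgoals]) or ("or", [subgoals])."""
    kind, payload = goal
    if kind == "leaf":
        return leaf_labels[payload]
    results = [satisfied(sub, leaf_labels) for sub in payload]
    return all(results) if kind == "and" else any(results)

# Root goal: data delivered on time AND (source on edge OR source on cloud).
tree = ("and", [
    ("leaf", "data delivered on time"),
    ("or", [("leaf", "source on edge"), ("leaf", "source on cloud")]),
])
print(satisfied(tree, {"data delivered on time": True,
                       "source on edge": False,
                       "source on cloud": True}))  # True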
Other research work uses goal models and their analysis for business intelligence
(Horkoff, et al., 2014; Amyot, et al., 2010). Goal models are enriched with metrics
that indicate values associated with the achievement of goals. For example, the
goal Sell trips of a travel agency may be associated with the metric Number of
trips sold, which indicates a close relationship between the satisfaction of the
goal and the number of trips sold by the travel agency.
Goal-based modelling languages often include contribution links that represent
positive or negative consequences (Horkoff & Yu, 2016). A contribution link that
connects two goals specifies that the achievement of one goal contributes positively to the achievement of the other. For example, in a travel company, the goal Advertise campaign performed can be connected to the goal Trips sold with a positive contribution link, since the achievement of the former will help the achievement of the latter. Such contribution links can be used to identify conflicts between goals and, along with metrics, to choose the best set of goals to achieve (Amyot, et al., 2010; Horkoff, et al., 2012).
Some goal-based analyses are used to assess alternatives for decision making (Letier & Van Lamsweerde, 2004). This includes the design of data-intensive applications, where it is essential to define the objective(s) of an application in order to select the best data sources and the metrics to monitor. In the context of fog computing, such information and decisions can also be used at runtime, in order to select the best data movement action to adopt when one or more metrics reach critical values. For example, in the case of applications that use data streams, the quality of the stream should be maximized. This can be monitored using a metric that measures the throughput of the connection. Whenever the value of the metric drops below a certain threshold, a data movement technique that consists in moving the source nearer to the application (for example from the cloud to the edge) may be adopted. Such a reparation action will bring the metric back within the desired range.
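The decision logic of this stream example can be sketched as follows. This is illustrative only: in DITAS the actual decision is taken by the DS4M (Decision System for Movement), and the threshold and action names are placeholders.

THROUGHPUT_THRESHOLD_MBPS = 10.0  # placeholder threshold from the goal model

def choose_movement_action(throughput_mbps, source_location):
    """React to a monitored throughput metric crossing its threshold."""
    if throughput_mbps >= THROUGHPUT_THRESHOLD_MBPS:
        return None  # goal satisfied, nothing to enact
    if source_location == "cloud":
        # Move the source nearer to the application, as in the example above.
        return "move-data-to-edge"
    return "no-movement-available"  # already at the edge; other tactics apply

print(choose_movement_action(4.2, "cloud"))  # -> move-data-to-edge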
2.3.3 Service-Level Agreement
To cover the Service-Level Agreement (SLA) part of DITAS we need to cover three DITAS components: the SLA Manager, the Computation Movement Enactor (not yet developed) and the Data Movement Enactor (already covered by other parts of this document). The SLA Manager in DITAS is responsible only for monitoring SLA violations; the actions to resolve them are taken care of by the two previously mentioned Enactors.
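A minimal sketch of this separation of concerns, with hypothetical interfaces: the SLA Manager only detects which agreed terms are violated and notifies the Enactors, which decide on the corrective actions.

def check_sla(observed, agreed):
    """Return the violated terms.

    observed: {term: measured value}
    agreed:   {term: (comparator, threshold)}, e.g. {"availability": (">=", 0.99)}
    """
    ops = {">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b}
    return [term for term, (op, limit) in agreed.items()
            if not ops[op](observed.get(term, 0), limit)]

def notify_enactors(violations, enactors):
    # The SLA Manager stops here; resolving the violation is up to the
    # Data/Computation Movement Enactors.
    for term in violations:
        for enactor in enactors:
            enactor(term)

agreed = {"availability": (">=", 0.99), "response_time_ms": ("<=", 200)}
observed = {"availability": 0.97, "response_time_ms": 150}
notify_enactors(check_sla(observed, agreed), [print])  # prints: availability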
At the beginning of the project, the main idea was to continue using the WS-Agreement specification (Open Grid Forum, 2014), which partners of the project had used before; however, due to the technical constraints of building an SLA Manager lightweight enough to run on Edge nodes, it was decided to build the component from scratch, on top of the goal-based model specified in the previous section.
In recent years several works have focused on Service-Level Agreement systems for Fog environments, especially concerning computation and the limited number of resources at the Edge. Katsalis et al. (Katsalis, Papaioannou, Nikaein, & Tassiulas, 2016) study optimization techniques for the deployment of virtual machines in a mobile-cloud environment where the number of resources is limited; the main objective is to select the best deployment strategy to achieve the requirements of the applications, considering aspects such as the network. Taleb et al. (Taleb, Dutta, Ksentini, Iqbal, & Flinck, 2017) employ the concept of Mobile Edge Computing (MEC) to enable an application to achieve its QoS, allowing the application to access any data anywhere with reduced latency, everything controlled by a complete SLA and monitoring system. Yin et al. (Yin, Cheng, Cai, & Cao, 2017) present work similar to that of Katsalis, where it is necessary to manage limited resources in Cloud-Edge environments, but they also consider that the applications execute time-sensitive jobs.
The DITAS SLA Manager focuses more on data quality aspects, which we think is part of the uniqueness of DITAS with respect to other Cloud-Edge environments. In the literature we found little work related to this topic.
3 Update to Market Analysis
3.1 Market Overview
Nowadays we are experiencing the tremendous success of the IoT paradigm, resulting in approximately 20 billion devices constantly producing data that must eventually be processed and stored. Gartner [3] predicts that there will be 20.4 billion IoT devices installed by the end of 2020, not including computers and smartphones, and within three years up to 50 billion devices are expected at the edge of the network. Most IoT deployments face challenges related to latency, network bandwidth, reliability and security that cannot be addressed in traditional cloud models; because of that, it is easier to process the data where they are produced, namely at the edge of the network.
The rapid adoption of IoT is seen by companies as an opportunity for data-driven businesses, and a combination of Cloud and Edge Computing is becoming the accepted architecture to approach the challenges related to integrating data from multiple sources and processing data in motion and at rest. The Fog Computing market opportunity is expected to reach $18bn by 2022 [4], growing from $1.032bn in 2018; the markets with the most potential are energy/utilities and transportation, followed by the Healthcare and industrial markets. However, each market will adopt new Edge solutions, standards and products at its own pace, facing its own barriers.
Figure 1: Size of Fog computing market opportunity by vertical market, 2019 and 2022
[3] https://www.gartner.com/en
[4] https://www.openfogconsortium.org/wp-content/uploads/451-Research-report-on-5-year-Market-Sizing-of-Fog-Oct-2017.pdf
3.1.1 Market Segmentation
The Fog Computing market is segmented between Solutions and Applications. Solutions include the hardware, software/applications and services around this technology:

Hardware: edge devices with the capacity to participate in a Fog system (connectivity, application software, computing hardware, etc.).

Software/applications: software and applications that give devices the capacity to connect and communicate with other devices securely, IoT applications to reliably integrate IoT sensors and the cloud, applications to distribute the data flow between Cloud and Edge, etc.

Services: new business models are arising, such as Fog-as-a-Service, where the vendor leases an outcome (hardware/software/services) to an end customer.

Figure 2: Fog Market Segmentation

The Fog market solutions segment is split into hardware components (51%), application software (Fog-enabled analytics, 19.9%) and Fog services (15.7%) [5]. 451 Research forecasts that the hardware percentage will decrease over time as different Fog services and application software emerge.

On the other hand, several applications in vertical markets that use Cloud and IoT technologies and need real-time data can benefit from the elastic resources at the edge that Fog computing offers. The next section explores these applications and how they take advantage of Fog computing.
3.2 Applications with a Fog Computing approach
The ideal applications for a Fog computing approach are those that require intelligence near the edge, run in dispersed areas with poor connectivity, create an amount of data too large to stream to the cloud, or manage many connected sensors.

According to the OpenFog Consortium [6], there are four application areas in different sectors in which Fog Computing brings multiple benefits for processing data in real time.

Next, we briefly describe these applications, which can potentially use Fog Computing.
5 https://www.openfogconsortium.org/wp-content/uploads/451-Research-report-on-5-year-Market-Sizing-of-Fog-Oct-2017.pdf
6 https://www.openfogconsortium.org
Figure 2: Fog Market Segmentation
3.2.1 Connected Vehicles
The transportation sector has the potential to reach $3.2B by 20227, making it the second-largest potential market for Fog Computing. Transportation applications have key characteristics suited to Fog Computing, such as mobility, intermittent connectivity and the need for real-time responses, and the most emerging application in this sector is autonomous/connected cars.
According to Forbes8, 20M self-driving cars are expected on the roads, and by 2030 it is estimated that one in four cars will be autonomous. This means a considerable amount of data to process. Intel9 predicts that by 2020 each autonomous car will generate more than 4,000 GB per day. But not all of this data needs to be processed in the Cloud, so Fog Computing can provide a solution with a set of network resources upon which connected cars can run their computation and storage needs. A Fog architecture will improve efficiency, performance, bandwidth, speed and reliability for connected cars.
3.2.2 Smart Cities
According to a recent report, the global smart city market is expected to reach $2,700.1 billion globally by 2024, growing at a CAGR of around 16% between 2018 and 202410. Smart cities powered by IoT technology promise to transform the way we live, but IoT must manage a vast number of connected sensors in a reliable and timely way, particularly for critical functions. Moreover, municipal networks manage sensitive citizen and traffic data as well as critical data for emergency response.
Fog Computing has become the solution to improve the reliability of delay- and data-intensive applications developed for smart cities.
3.2.3 Connected Healthcare
The healthcare industry is one of the three largest potential opportunities for Fog Computing. Connected Healthcare enables patient engagement and reduces healthcare system costs while improving healthcare services, among other benefits, and the boom of sensors, smart health devices, health apps and IoT technologies provides the basis for it.
Fog Computing enables IoT platforms to monitor patient health variables in real time, allowing fast responses to patients' needs.
The biggest challenge to face in the healthcare environment is the management of sensitive data and the exchange of health records with a maximum level of security.
7 https://www.openfogconsortium.org/wp-content/uploads/451-Research-report-on-5-year-Market-Sizing-of-Fog-Oct-2017.pdf
8 https://www.forbes.com/sites/oliviergarret/2017/03/03/10-million-self-driving-cars-will-hit-the-road-by-2020-heres-how-to-profit/#641f424b7e50
9 https://www.intel.com/content/www/us/en/automotive/autonomous-vehicles.html
10 https://globenewswire.com/news-release/2018/08/23/1555932/0/en/Global-Smart-City-Market-Will-Reach-USD-2-700-1-Billion-By-2024-Zion-Market-Research.html
3.2.4 Smart Manufacturing
The Industrial Internet of Things (IIoT) has emerged as a technology to help increase productivity and performance in manufacturing. The management of data at rest and in flight is crucial for this kind of industry, for instance to carry out predictive maintenance or stock control. A large number of sensors are located in a plant and have to be monitored in real time and connected with other departments of the organization or with other plants. A mix of Cloud computing, for the storage of historical data, and Fog computing is the technology combination needed to integrate IIoT platforms in the manufacturing industry.
3.3 Use Cases Market Study
The DITAS project validates its solutions in two use cases with data-intensive application needs: e-Health and Smart Manufacturing. The purpose of testing project results in real scenarios is threefold: a) to validate the value of the DITAS outcomes in the real world, b) to extract knowledge from the validation in these use cases and provide the partners' use cases with new business opportunities, and c) to promote the sustainability of the outcomes after the project.
In the next sections, a market analysis of the vertical sectors to which the use cases belong has been carried out. This study aims, in each case, at analyzing the market context, identifying stakeholders, challenges and concerns, and surveying existing comparable solutions or potential competitors.
3.3.1 e-Health
The digital transformation of all industries is based on harnessing data, either historical or in real time. The adoption of IT technologies in healthcare systems is following the same pattern as in other sectors; although the drivers for adopting new technologies differ for each industry, all of them identify the advantages of fully exploiting data with new technologies.
With respect to the digital transformation of healthcare, the European Commission carried out a public consultation (European Commission, Public Consultation on Health and Care in the Digital Single Market, 2017), finishing in October 2017, investigating the need for policy measures promoting digital innovation for better healthcare in Europe. The main results of this public consultation show that over 93% of the respondents believe that “Citizens should be able to manage their own health data”. Furthermore, 83% of all respondents either agree or strongly agree with the statement that “Sharing of health data could be beneficial to improve treatment, diagnosis and prevention of diseases across the EU”. The overwhelming majority of all respondents (73.6%) identify improved possibilities for medical research as a reason for supporting cross-border transfer of medical data. Risks of privacy breaches and of cybersecurity incidents are at the top of the list of major barriers identified to the cross-border transfer of medical data (European Commission, Synopsis Report - Consultation: Transformation Health and Care in the Digital Single Market, 2018).
The global eHealth market is projected to reach US$132.35 billion by 2023, from $47.60 billion in 2018, at a CAGR of 22.7%, according to ReportsnReports11.
11 https://www.reportsnreports.com/reports/1385867-ehealth-market-by-product-ehr-pacs-vna-ris-lis-cvis-telehealth-erx-hie-patient-portal-medical-apps-services-remote-patient-monitoring-diagnostic-services-end-user-hospitals-home-healthcare-payers-st-to-2023.html
Regarding the EU, the value of the European data economy was €300 billion in 2016; if the right legislative and policy measures are put in place, this value could grow to up to €739 billion by 2020, 4% of the EU's GDP (European Commission, Final results of the European Data Market study measuring the size and trends of the EU data economy, 2017; European Commission, Data in the EU: Commission steps up efforts to increase availability and boost healthcare data sharing, 2018). The primary reasons boosting digitization are:
• Improvement of the patient experience: with digitalized services, patients will have full access to their health information and, using new technologies such as IoT wearables, will be monitored and receive personalized care. This will lead to a move towards a more proactive and prescriptive care and a patient-centric care approach.
• Expenditure reduction and service improvement in health systems and organizations: health systems want to improve the quality of their healthcare services while reducing costs, and digitization of the healthcare sector is a step towards this goal. This technological approach to healthcare will be needed to face the boom of chronic diseases and the growing geriatric population in the European Union.
• Appearance of new technologies: the Internet of Things (IoT) for healthcare is one of the major drivers of the e-Health market, and new technologies such as Cloud computing, Big Data and mobile wearables enable more efficient and rapid ways of delivering healthcare.
In the latest report published by Atos, Look Out 2020+ Industry Trends Healthcare12, experts predict that the healthcare market will be a data-intensive field, much more so than other vertical sectors, and they agree that the technologies needed to succeed are Cloud Computing, AI and IoT.
But the use of new technologies also brings multiple dangers, regulatory obligations and privacy concerns, especially with the sensitive data managed in healthcare environments. Data breaches can create high risks for patients, as well as penalty fees under the new General Data Protection Regulation (GDPR). A data breach can cost millions (the average response and remediation could be up to $3.8M), according to the Ponemon Institute study13.
3.3.1.1 Disruptive technologies for the future of e-Healthcare
Atos has predicted14 the 10 disruptive technologies that will impact the future of healthcare.
12 https://atos.net/content/mini-sites/look-out-2020/healthcare/
13 https://www.ponemon.org/news-2/23
14 https://atos.net/content/mini-sites/look-out-2020/
Figure 3 depicts the 10 disruptive technologies envisioned, classified according to their integration status in e-Healthcare systems. Some of them, such as AI, Robotics or Augmented Reality, are being adopted, while Hybrid Cloud is currently considered mainstream with a high impact.
Figure 3: Disruptive technologies in Healthcare
Cloud services can offer security and privacy controls for health systems and data, and cloud-based healthcare IT systems can solve issues regarding interoperability and integration. Moreover, cloud services enable rapid development for mobile and IoT, and cloud computing can support AI applications.
3.3.1.2 Healthcare market stakeholders
The healthcare supply chain comprises providers (hospitals, clinics and physicians), payers (insurance companies, governments and regulatory bodies), distributors, manufacturers, and patients (see Figure 4).
The digitization of the healthcare market has led to communication and data sharing among the stakeholders and between each stage of the supply chain. For healthcare systems this is highly valuable since it allows, for instance, monitoring patients in real time so that providers can respond to their needs faster.
All the stakeholders in the supply chain agree that sharing data has a great impact on factors such as better and more efficient healthcare services, improved quality of patient care, lower costs and increased revenues.
Figure 4: Healthcare supply chain15
A preliminary description of the stakeholders involved in the Healthcare supply
chain is:
Providers (hospitals and healthcare systems): the EHR systems market is growing very fast and is expected to reach $5.20B by 2021, from $3.92B in 2016, at a CAGR of 5.8% during the forecast period16.
Contrary to other ICT-based clinical systems, EHR adoption is increasingly accepted by hospitals and clinicians. EHRs store patient data such as radiology images, medications, historic health reports, etc. This information is used by the hospital staff, but other stakeholders can use these data to improve diagnosis, for clinical trials, for predictive medicine, etc.
Payers (government, policy makers): public health administrations and regulatory bodies develop policies, invest in infrastructures, and at the same time use ICT-based healthcare solutions.
Payers (researchers, pharmaceutical industry, etc.): data from different sources and from a vast number of patients will improve clinical trials for the pharmaceutical industry and increase the medical information available to researchers.
Patients: patients are the final beneficiaries of sharing data and of the insights derived from processing and analyzing them. A high demand for health monitoring applications is expected, along with increasing patient empowerment through the use of their own HHRs.
Distributors: Service providers play different roles in the Healthcare system. Some
of the solutions or services offered by them are:
• Secure and reliable services to store and share EHRs
• Mobile solutions to monitor and record patient information in real time
• Telemedicine services
• Digital e-Health Platforms with integrated Healthcare services
• Interoperability solutions to share information
15 https://rctom.hbs.org/submission/impact-of-digitalization-on-healthcare/
16 https://www.marketsandmarkets.com/Market-Reports/ambulatory-ehr-market-235617627.html
• Applications focused on different diseases and pathologies, for diagnosis or prevention
Manufacturers: they develop and deliver medical devices, smart sensors, monitoring devices, robots for healthcare, etc.
3.3.1.3 Challenges and concerns of Healthcare data sharing
Sharing patient health data can help the different stakeholders of the healthcare supply chain. For instance, hospitals can reduce readmission costs or avoid medical errors, and public health agencies can improve population health and provide new services for patient care.
However, healthcare stakeholders must consider regulations such as HIPAA and ensure data privacy and safety. These privacy issues are great concerns not only for patients but also for providers: both know the benefits of sharing clinical data but have many doubts about data privacy. Thus, one of the greatest challenges in the adoption of e-Health is sharing clinical data among stakeholders more effectively.
Currently, Health Information Exchange (HIE) technology provides the solution for sharing clinical data within organizations and the healthcare community, together with the option of an integrated platform17 that uses open standards and cloud-based architectures to integrate different applications and hospital departments. Moreover, open standards facilitate connections with other hospital networks.
Figure 5: HIE system18
HIE systems help to share EHRs and other patient data within hospitals, with other health organizations, etc. These are records of the patient's clinical data gathered while in hospital: imaging tests, blood tests, medication, etc. But
17 https://www.optum.com/content/dam/optum3/optum/en/resources/white-papers/Sharing_Clinical_Data_White_Paper.pdf
18 http://www.atlantiscgpr.com/?page_id=20
patients have begun to collect data about their health and diseases with wearable devices, sensors and apps. This information offers new opportunities for researchers who want to gather and analyze it to carry out clinical trials or research. While EHRs offer organized and structured health data, the data collected from patients are dispersed and difficult to share among health stakeholders.
Nowadays, healthcare digitization is far from having tools to share all the information available across the entire supply chain and to create new business models for selling or buying data for medical purposes to the pharmaceutical industry or researchers. That is the next step to fully exploit health data.
Another aspect to consider concerning healthcare data sharing is the fact that the European Commission, in its mid-term review on the implementation of the Digital Single Market strategy, set out the intention to take further action in the area (among others) of “citizens' secure access to and sharing of health data across borders” (European Commission, Communication on enabling the digital transformation of health and care in the Digital Single Market; empowering citizens and building a healthier society, 2018). In particular, the Commission will support the eHealth Digital Service Infrastructure (European Commission, eHDSI Mission, 2018) to enable new services for people, such as the exchange of electronic health records using the specifications of the European electronic health record exchange format, and the use of the data for public health and research.
3.3.1.3.1 EHR and health services providers
Healthcare players are working on developing health information sharing solutions, most of them focused on security and privacy for EHR sharing. Other solutions focus on providing different health services with different business models.
Some existing solutions are:
▪ 4medica19: 4medica aims at securely exchanging health information in real time.
▪ NextGen Healthcare20: several software solutions aimed at solving interoperability, health information exchange, analytics, etc. in connected health environments.
▪ Greenway Health LLC21: Greenway Health LLC develops different services and solutions for the health community, from EHR sharing to patient engagement tools.
▪ Siemens Medical Solutions22: Siemens is a leader in integrated health solutions and services and healthcare IT.
▪ GE Healthcare23: General Electric offers a large portfolio of hardware and software solutions and services focused on connected health, for patients and practitioners.
19 https://www.4medica.com/
20 https://www.nextgen.com/
21 https://www.greenwayhealth.com/
22 https://www.healthcare.siemens.com/
23 https://www.gehealthcare.com/
▪ Allscripts Healthcare Solutions24: solutions for hospitals and health systems, and an innovative solution called ePrescribe aimed at reducing medical errors with an easy-to-use platform.
In contrast to the existing solutions in the market for sharing healthcare information, the DITAS project outcomes related to the e-Health use case will deliver a set of norms and rules associated with the privacy and security management of sensitive data, and data-intensive applications working in this domain will use the project results to take advantage of the data and computation movement strategies.
3.3.2 Industry 4.0
The global Industry 4.0 market size is expected to grow to $310B by 2023, at a 37% CAGR from 201825, and according to the “Industry 4.0 & Smart Manufacturing 2018-2023” report26, different key factors are responsible for this rapid growth: the need for connected supply chains, data-based manufacturing processes, and the increasing availability of emerging technologies and solutions such as Blockchain in manufacturing, Artificial Intelligence, Robotics, IIoT, condition monitoring and cybersecurity. Additionally, by the year 2022, 64% of manufacturers predict that their factories will be totally connected through the IIoT27.
Figure 6. Global Industry 4.0 Market28
According to BCG29, the impact of Industry 4.0 can be analyzed in the four areas in which the adoption of smart manufacturing is expected to bring the most benefit:
24 https://www.allscripts.com/
25 https://iot-analytics.com/industry-4-0-and-smart-manufacturing/
26 https://iot-analytics.com/product/industry-4-0-smart-manufacturing-market-report-2018-2023/
27 https://www.zebra.com/us/en/about-zebra/newsroom/press-releases/2017/zebra-study-reveals-one-half-of-manufacturers-globally-to-adopt-.html
28 Source: IoT Analytics-November 2018-Industry 4.0 Market Report 2018-2023
29 https://www.zvw.de/media.media.72e472fb-1698-4a15-8858-344351c8902f.original.pdf
Productivity: the integration of data-driven manufacturing processes will increase productivity; for instance, predictive maintenance will avoid downtime and will lead to better decision-making.
Revenue growth: the possibility of combining real-time information about planning, production, warehousing and transportation will lead to process optimization and increased revenues. By 2020, Industry 4.0 is expected to bring an average cost reduction of 3.6% p.a. across process industries globally, totaling $421 billion30.
Additionally, recent studies find that about 50% of the companies surveyed expect double-digit growth in revenues in the next five years, attributed directly to Industry 4.031, led by the manufacturing and engineering sectors.
Figure 7. Growth in revenue attributable to Industry 4.0 per industry sector32
Employment: the demand for employees in the manufacturing sector is expected to grow, although new skills will be required, and low-skilled laborers who carry out repetitive tasks will be displaced by workers with IT competencies such as software development, connectivity, etc.
Investment: adapting existing manufacturing processes to Industry 4.0 will require manufacturers to invest in new devices, software applications, and different services such as cloud services.
European industry is expected to invest €140bn annually in Industry 4.0 until 2020, according to PwC31, with manufacturing and engineering being the sectors with the highest expected annual investment.
30 https://www.pwc.com/gx/en/industries/industries-4.0/landing-page/industry-4.0-building-your-digital-enterprise-april-2016.pdf
31 https://www.pwc.com/gx/en/industries/industries-4.0/landing-page/industry-4.0-building-your-digital-enterprise-april-2016.pdf
32 https://www.pwc.nl/en/assets/documents/pwc-industrie-4-0.pdf
Figure 8. Annual investments in Industry 4.0 per industrial sectors33
3.3.2.1 Building blocks of Industry 4.0
Everything in Industry 4.0 is about data: gathering and analyzing data across machines to increase manufacturing efficiency, reduce costs and increase productivity and benefits, with a key role played by IoT. But it is not only IoT: cloud computing, big data, data analysis at the edge of the network (edge computing), data exchange, mobile, programmable logic controllers (PLCs), HMI and SCADA systems, and sensors and actuators are all key elements in creating smart factories.
Nine technology trends have been identified as forming the building blocks of Industry 4.034 (see Figure 9).
Figure 9. Nine Technologies transforming Industrial production35
Big Data Analytics: the collection and evaluation of data from different sources will be crucial for real-time decision-making.
33 https://www.pwc.nl/en/assets/documents/pwc-industrie-4-0.pdf
34 https://www.bcg.com/capabilities/operations/embracing-industry-4.0-rediscovering-growth.aspx
35 https://www.zvw.de/media.media.72e472fb-1698-4a15-8858-344351c8902f.original.pdf
Autonomous Robots: autonomous and collaborative robots are being used in smart manufacturing, working safely with humans and interacting among themselves as well.
Simulation: 3D simulation, used in prototyping during product development, will become widely used to optimize production and improve quality.
Horizontal and vertical system integration: Industry 4.0 will solve the problem of connecting the entire manufacturing supply chain and all the departments of the organization.
Cybersecurity: process data are crucial for companies, and the protection of their information systems and manufacturing processes is critical, so reliable communications and secure access to machines will be needed.
The Industrial Internet of Things: the explosion of IoT will make it possible to incorporate embedded computing into sensors and actuators and to connect them using standard technologies. This will lead to a smart manufacturing in which sensors connect with each other and with centralized controllers in real time.
Additive Manufacturing: Industry 4.0 will allow Additive Manufacturing (AM) to be more widely used, offering many production and construction advantages, such as high performance, complexity, etc.
Augmented Reality: it is expected that Augmented Reality will be able to provide workers with real-time information and help them make decisions in manufacturing processes.
The Cloud: smart manufacturing is more and more about harnessing data, which is why all the technologies involved in the growth of the Industry 4.0 market manage and process data to help improve the entire manufacturing process. To process and analyze such amounts of data, cloud-based software is required, and cloud computing is a crucial resource for smart manufacturing, also offering a platform for open-source collaboration.
At the same time, Edge Computing, mixed with Cloud computing, will achieve reaction times of just milliseconds.
3.3.2.2 Industry 4.0 Stakeholders ecosystem and solutions
The manufacturing sector used to be a market led by “Product/Control solution providers”, represented by strongly positioned industrial automation corporations such as Siemens, ABB, Rockwell, Yokogawa, Schneider, etc., which offered proprietary solutions. The increasing need for IT and connectivity solutions to implement Industry 4.0 has brought in other types of smart manufacturing stakeholders: “IT solution providers” and “Connectivity solution providers”.
IT solution providers deliver solutions for control, monitoring and data processing; examples of these companies are Microsoft, SAS, Oracle, IBM, Intel, etc. On the other hand, connectivity solution providers facilitate the implementation of technologies with connectivity demands; some of these companies are Cisco and Huawei.
The picture below showcases several brands that are currently investing in Industry 4.0.
Figure 10. The new Industry 4.0 stakeholders ecosystem36
Product and Control solution providers
As said before, large industrial automation corporations have been the leaders in the manufacturing sector for many years, offering proprietary solutions to control industrial processes. But the evolution of the sector has led these companies to take a step further in their offering portfolios and to develop solutions able to integrate the new technologies involved in the digitalization of the sector. An example of this is Siemens, which positions itself as an Industry 4.0 leader, updating existing products or developing new ones with the ambition of helping industrial companies take advantage of the digitalization of the manufacturing sector.
MindSphere37 is the cloud-based IoT platform and IoT operating system developed by Siemens to satisfy the technological needs of Industry 4.0. This open platform offers Platform as a Service (PaaS) with extensive options for data exchange using Siemens APIs and native cloud accessibility, as well as connectivity options to support different IoT-ready assets (see Figure 11).
Figure 11. Mindsphere by Siemens38
36 https://dzone.com/articles/industry-40-the-top-9-trends-for-2018
37 https://www.siemens.com/content/dam/webassetpool/mam/tag-siemens-com/smdb/corporate-core/software/mindsphere/mindsphere-brochure.pdf
38 https://www.siemens.com/content/dam/webassetpool/mam/tag-siemens-com/smdb/corporate-core/software/mindsphere/mindsphere-brochure.pdf
IT solution providers
IT solution providers have leveraged the opportunity to be part of the manufacturing revolution and to bridge the gap between automation and IT.
Some of the most prominent solutions are:
• Microsoft Azure IoT Suite Connected Factory
Microsoft has developed the Azure IoT Suite Connected Factory solution39, a cloud-based platform to manage industrial IoT devices in real time, introducing AI and other advanced solutions such as Microsoft HoloLens40, which enables interacting with holograms and visualizing relevant data.
A typical architecture using Azure IoT includes Azure IoT Edge for real-time data ingestion and processing, with the possibility to adapt to several open-source and standard protocols from different manufacturers and vendors.
Figure 12. Architecture using Azure IoT41
• Google Cloud IoT Edge
Cloud IoT Edge enables cloud-integrated edge computing, extending the machine learning and data processing capabilities provided by Google Cloud to edge devices42.
This solution is being used in smart manufacturing to act on sensors or predict outcomes in real time.
39 https://azure.microsoft.com/es-mx/blog/azure-iot-suite-connected-factory-now-available/
40 https://www.microsoft.com/en-us/hololens
41 https://blogs.msdn.microsoft.com/msind/2018/04/27/iiot-smart-factories-ai-azure-iot-edge/
42 https://cloud.google.com/iot-edge/
Figure 13. Google Cloud IoT Edge workflow43
• Amazon Web Services IoT Platform
Amazon has developed the AWS IoT Platform44, a cloud-based platform that enables easy interaction of devices with other devices and with cloud applications. This platform also allows using and integrating other AWS services to create complete solutions for Industry 4.0.
Figure 14. AWS IoT architecture45
• IBM Watson IoT
IBM has developed the IBM Watson IoT Platform46, a cloud-hosted platform to manage device data and machines with Blockchain and AI services.
43 https://cloud.google.com/solutions/iot/
44 https://docs.aws.amazon.com/es_es/aws-technical-content/latest/aws-overview/internet-of-things-services.html#aws-iot-platform
45 https://cloudacademy.com/blog/aws-iot-internet-of-things/
46 https://www.ibm.com/us-en/marketplace/internet-of-things-cloud
Figure 15. IBM Watson Architecture
Connectivity solution providers
Providing different types of connectivity in a factory is of key importance. Connectivity solution providers offer wireless solutions or integration with the different Industrial Automation and Control Systems (IACS) protocols existing and widely implemented in the sector, such as Profinet, CC-Link or EtherNet/IP.
Cisco has developed several solutions for its Connected Factory portfolio, such as Connected Asset Manager (CAM) for IoT Intelligence, a visualization tool to manage data, and Industrial Network Director, which gives factories full control of the plant network47.
Another telco provider, Huawei, is developing software and hardware solutions to equip smart devices; for example, Huawei's LiteOS IoT operating system is embedded in smart devices used in manufacturing, simplifying cloud interconnections, and two wireless access methods, eLTE and NB-IoT, enable communications within the plant48.
In summary, the Industry 4.0 stakeholder ecosystem described above can be classified as follows:
Stakeholders: Product and Control solution providers
Companies: Siemens, ABB, Rockwell Automation, Honeywell, Schneider, Bosch, etc.
Solutions: MindSphere, Agility 4.0, EcoStruxure, Bosch IoT Suite, etc.
Stakeholders: IT solution providers
Companies: Amazon, Microsoft, Oracle, IBM, Intel, etc.
Solutions: Azure IoT, AWS IoT, Oracle IoT Cloud, Google Cloud IoT Edge, IBM Watson IoT, etc.
Stakeholders: Connectivity solution providers
Companies: Cisco, Huawei, etc.
Solutions: CAM, LiteOS IoT, etc.
Table 2: Classification of Industry 4.0 Stakeholders
47 https://www.cisco.com/c/en/us/solutions/internet-of-things/manufacturing-digital-transformation.html
48 https://www.huawei.com/en/about-huawei/publications/communicate/84/iot-makes-manufacturing-smart
Industrial IoT (IIoT) platforms are the core of Industry 4.0, and in the competitive IIoT platform market, IoT platform vendors are already integrating Blockchain, while new solutions come with augmented vision, machine vision, or digital twin capabilities. In the end, cognitive capabilities and artificial intelligence will become the differentiating factor among IoT platform solutions. But while most vendors provide hardware that supports IoT solutions, the true differentiator is the edge software needed for Industrial IoT projects in which a lot of data must be processed at or near the edge, so IoT vendors are moving their focus to the edge.
The IIoT ecosystem is quickly evolving and is mainly led by major vendors, but new start-ups and solutions are emerging with the capacity to close gaps and complement competencies. This is an opportunity for solutions like DITAS, which can be integrated in an IoT platform as a differentiating element in sectors in which cloud and edge environments play a key role.
3.4 Market context questionnaire
To collect business requirements and validate market context assumptions, the DITAS consortium has conducted a market context questionnaire and interviews among stakeholders.
This process has been carried out by the consortium partners through different means: a) contacting stakeholders and sending the questionnaire template by email, b) call interviews with stakeholders, and c) personal interviews with stakeholders at different events (conferences, workshops, etc.).
The survey was conducted anonymously and did not require the interviewees to submit any personal data.
3.4.1 Characterization of interviewees
The selection of the sample aims at covering the broadest range of stakeholder profiles identified in the DITAS project and has included 12 individuals: members of the DITAS consortium (Atos, IBM) who are not directly involved in the project activities, and companies/institutions external to the consortium.
The organizations selected are companies working in the manufacturing sector (2), IT providers (5, most of them Cloud providers), telecommunication providers (1), software development companies (3) and academia (1); by size, 5 large companies, 4 SMEs, 2 VSEs, and 1 N/A (see Figure 16).
Different roles were covered by the interviewees involved in the survey, from technical positions (software architects, product developers, CTOs, researchers) to project managers, business developers and sales roles (see Figure 17).
Figure 16. Characterization of the organizations
Figure 17. Interviewees’ roles
3.4.2 Summary of Questionnaires and interviews conducted
Below is a summary of the market-oriented questions included in the survey and of the interviewees' responses.
Question 1: Does your organization require Cloud/Edge services to develop its
business?
The adoption of Cloud services is going mainstream, and most companies make use of these services to develop their business. Among the companies involved in the survey, all of them, except the Cloud providers for obvious reasons, use different Cloud providers such as Google Cloud, AWS, IBM Cloud, and Vodafone Cloud. For Edge computing they mainly use Microsoft Azure and AWS Greengrass.
However, it is significant that the manufacturing companies surveyed are still reluctant to use Cloud services for their processes and instead rely on their own services and developments.
Question 2: Does your organization use data from external sources for its commercial offering?
75% of the interviewees do not use data from external sources for their commercial offering.
The remaining 25% of the respondents do use data from external sources, since their portfolio offering integrates data analysis; this is the case of the Telco companies, for instance.
Question 3: Does your organization sell or buy data?
67% of the interviewees manage data from others, compared to 17% that manage their own data; 8% sell and buy data and the remaining 8% did not answer.
Only the Telco companies surveyed sell and buy data.
This aspect of the data is important to know, since the business models identified for the DITAS results are based on managing data from data owners and consumers.
Question 4: Does your company have difficulties managing those data?
SMEs and VSEs have expressed concerns about managing data. Among the difficulties surveyed (data acquisition, storage, visualization, dismissal, analysis, security and privacy, and movement), 36% agree that the most important aspect for them is data security and privacy, followed by data visualization (21%) and data acquisition (14%).
Figure 18. Difficulties managing data
For manufacturing industry companies, the most important concerns are data acquisition and data visualization, while for the rest of the company profiles it varies among the concerns previously described.
Question 5: What problems can DITAS solve for you that your company is already
solving with other workflows/tools/platforms?
Different points of view about what DITAS can solve for their businesses were found depending on the company profile. Some of the answers to this question were:
• For IT companies:
a) DITAS can act as an extra security layer that ensures, in addition to the integrity and confidentiality of the data, GDPR compliance
b) DITAS can provide data redundancy and orchestrate data movement from one cloud provider to another, hiding at the same time the complexity of the security restrictions that each cloud vendor has (API keys)
• For Software development companies:
a) DITAS can help with data movement based on the application's logic
b) DITAS can provide a data security framework
• For Telco companies:
a) Performance-wise, DITAS could be used to move data from one warehouse to another
• For Manufacturing industry companies:
a) DITAS can automatically manage the Fog/Cloud communication
Question 6: Would your organization consider using a solution such as DITAS in
your workflow?
All the interviewees agree that they would consider using DITAS in their workflow.
Question 7: How much would your organization be willing to pay for such services?
This was an open question, and most interviewees agree that it will depend on:
• the functionalities or services DITAS provides
• the business models
• the client and the specific case, which must be considered first
Question 8: Is there any other workflow/tools/platform or solutions/service that
solves the problem better/cheaper than DITAS?
Some interviewees have mentioned solutions such as Docker49.
Question 9: Do you think that the Open Source approach of DITAS could be a
barrier for your organization?
We can confirm that the Open Source approach of the DITAS results is not a problem for 99% of the companies surveyed.
For manufacturing industry companies, it is very important that, besides the Open Source approach, the solution complies with industry standards.
49 https://www.docker.com/
4 Update to the Business and Technical Requirements
In the first version of this document, we elicited and analysed an initial set of business and technical requirements, derived from two sources: (i) questionnaires that were circulated to external entities and (ii) use case analysis. In this updated version, we revised the requirements listed in D1.1 (D1.1, 2017) and added new ones, based on the final picture of the project architecture. In that direction, we interviewed people with relevant expertise, asking them to rank nine general requirements that we have identified, as well as some parameters related to data and computation movement. The technical questionnaire that was passed to the experts can be found in Annex 3. In total, we have collected 21 answers from the respondents so far, and we will try to collect more questionnaires in the upcoming months.
Furthermore, in this document we focus on the traceability of the requirements. To enhance that process, we extended the table presented in D1.1 (D1.1, 2017) that describes each of the requirements by adding two more fields:
• Component that fulfils it
• Test case / Acceptance criteria
These fields will enable the consortium to better track the requirements and thus to ensure that they have been addressed. The extended table is depicted below, whereas the complete list of the requirements can be found in Annex 1. As in D1.1 (D1.1, 2017), we prioritise the requirements using the MoSCoW method (IIBA, 2009).
ID:
• For WPs 1-4: B (for business requirement) or T (for technical requirement) + WP number of origin.counter (e.g. the first business requirement of WP2 is B2.1, etc.)
• For WP5:
o For the Industry 4.0 use case: EU1.F (for DITAS Framework level requirements) or EU1.UC (for Use Case level requirements) + counter, e.g. EU1.F1, EU1.F2, …, EU1.UC1, EU1.UC2, …
o For the e-Health use case: EU2.F (for DITAS Framework level requirements) or EU2.UC (for Use Case level requirements) + counter, e.g. EU2.F1, EU2.F2, …, EU2.UC1, EU2.UC2, …
Requirement Type: determines whether the requirement is
• Functional (specific technical implementation requirements), or
• Non-Functional (general abstracted architectural or conceptual requirements)
Source: identifies the source of the requirement
• Questionnaire
• DITAS Analysis
Priority: based on the MoSCoW method
• M - Must have this requirement to meet the needs
• S - Should have this requirement if possible, but project success does not rely on it
• C - Could have this requirement if it does not affect anything else in the project
• W - Would like to have this requirement later, but delivery won't be this time
Category: illustrates the category of the requirement
• Extensibility (Scalability, Expandability, Portability)
• Security (Privacy, Integrity, Non-Repudiation)
• Interoperability (Reusability, Connectivity, Adaptation)
• Performance (Availability, Reliability)
• Maintainability (Evaluability, Evolvability)
• Other category
Component that fulfils it: indicates the specific component that fulfils the requirement.
Description: contains the specification of the requirement (description of the purpose and goals to be fulfilled), written in a preferably concise, yet clear way. At this point one should be very specific as to the goal of this requirement and the envisioned benefit.
Rationale: describes the need that the specific requirement covers.
Dependencies: contains a list of possible interdependencies between the requirements.
Test case / Acceptance criteria: describes the way to test the requirement, to ensure that it is developed and thus fulfilled. This information will be the basis for the Verification & Validation procedures.
Time-frame: provides an estimation of the time frame to have this requirement fulfilled
• Report period 1
• Report period 2
Comments: extra comments that could be used to further describe the specific requirement.
Table 3: Fields to be fulfilled by the requirements of DITAS.
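To make the template concrete, a requirement record following these fields could look as follows in JSON. This is purely illustrative: the identifier, component name, and acceptance criterion are invented for the example and do not correspond to an actual entry in Annex 1.

{
  "id": "T2.1",
  "requirement_type": "Functional",
  "source": "DITAS Analysis",
  "priority": "M",
  "category": "Performance (Availability, Reliability)",
  "component_that_fulfils_it": "SLA Manager",
  "description": "The platform monitors the QoS metrics agreed for a deployed VDC.",
  "rationale": "Violations of the agreed data utility must be detected so that data and computation movement can be enacted.",
  "dependencies": ["T2.2"],
  "test_case_acceptance_criteria": "Inject a response-time violation and verify that a corrective action is reported by the monitoring tools.",
  "time_frame": "Report period 2",
  "comments": ""
}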
5 DITAS Architecture
The DITAS architecture has been designed taking into account the main design
principles guiding service-oriented systems, fog computing environments, and
content delivery networks.
Generally speaking, the services considered in DITAS are Data Services, i.e., components able to provide data that the owner wants to make available, in read-only mode, to data consumers. These data live in the resources managed by the data owner and can be stored in databases or offered as streams.
Depending on the customer's needs, more or less complex data processing can be done prior to data access. Indeed, the proposed service could offer the data as they are, or analyses on data sets.
In adopting a service-oriented architecture, the visibility principle has been followed to make the provided functionalities accessible through publicly available APIs. Conversely, the implementation details must be hidden from the service consumer. In DITAS, the APIs concern the ability to access data that could be stored in databases, generated by sensors, or returned by data processing methods.
Due to the heterogeneity of data sources, service provisioning requires dealing with different devices that can be located both on the edge and in the cloud, i.e., a Fog environment. For this reason, the DITAS architecture must support data provisioning, especially when processing is required, with a deployment able to balance the scalability and security offered by cloud resources against the reduced latency offered by edge resources.
At run time, the DITAS solution relies on data movement that, in contrast to what usually happens in content delivery networks, does not occur only from the cloud to the edge but also vice versa, from the edge to the cloud.
To achieve these objectives, DITAS introduces the Virtual Data Container (VDC), whose role is to embed in a single logical unit the components that constitute the original data-intensive application, along with specific modules that offer a way to access data that is agnostic with respect to the specific underlying technology (see Figure 19). In addition, a data utility enforcement module is included to check whether the quality of the data offered by the service, as well as the quality of the service itself, is met. In case these qualities are not in line with the customer's expectations, the DITAS architecture is able to recover the situation by enacting data and computation movement strategies.
Data movement concerns the need to move the original data set from the location in which the owner of the data has decided to store it to other places managed by the DITAS platform or by the consumer, in order to reduce latency while preserving security during transmission as well as privacy. Similarly, computation movement enables the possibility of moving a VDC among the resources made available by the data owner, the DITAS platform, and the customer.
As computation movement, and especially data movement, could have an impact on privacy and security, the VDC also has to verify that data are always stored in the places and in the formats that satisfy the consent of usage agreed between the data owner and the data customer.
Having provided an informal overview of the DITAS architecture, the next paragraphs propose a more detailed and precise view. In particular, we first define the actors involved in the architecture; then, a complete overview of the components able to design, deploy and manage VDCs is given according to the two main components that constitute the DITAS architecture:
• the DITAS-SDK, concerning the definition and the retrieval of a VDC;
• the DITAS Execution Environment (DITAS-EE), which manages the execution of the VDC as well as the data and computation movements.
Figure 19. The conceptualization of Virtual Data Container
5.1 DITAS roles
The data administrator is the owner of data sources and has complete knowledge of them. The data administrator takes advantage of DITAS to enable the provisioning of some of the internal data that s/he would like to make accessible to other subjects. Depending on the subject and the consent of usage, the visibility of these data can be partial or total. With DITAS, the data administrator can simplify the process of making her/his data available as, through the VDC, the DITAS platform is able to optimize the data provisioning by means of data and computation movement. In fact, the data administrator only has the task of defining the exposed API, i.e., the Common Access Framework (CAF), reflecting the methods to access the data.
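For illustration only, an exposed CAF method could be described along the following lines in JSON; the method name, path and field names below are invented for this sketch and are not taken from an actual DITAS blueprint.

{
  "exposed_api": [
    {
      "method_name": "getBloodTests",
      "path": "GET /patients/{patientId}/blood-tests",
      "description": "Returns the blood-test values of a patient",
      "response_format": "JSON"
    }
  ]
}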
The application developer is the actor in charge of creating the VDC. Based on the data sources made available by the data administrator, s/he is responsible for defining the code able to expose the API defined by the data administrator. Depending on the case, the data processing developed can be a simple connection to the provided data sources or complex data analytics. As a result, the application developer is able to provide a complete specification of a VDC. It is worth noting that in several cases the same actor will hold both the data administrator and the application developer roles.
The application designer represents the service consumer, and her/his goal is twofold. On the one hand, the goal is to select the most suitable VDC with respect to her/his requirements. For this reason, the DITAS platform has to provide a matchmaker able to compare the application requirements with the capabilities offered by a VDC. This matchmaking is mainly driven by the data utility, which encompasses quality of service, quality of data, and reputation aspects. On the other hand, she/he has to check whether the VDC is really providing what has been promised, from both a functional and a non-functional perspective.
The DITAS operator is responsible for the run-time platform; this includes the responsibility for keeping the applications running. The operator has no specific application or data knowledge, but rather depends on the monitoring tools to verify that all the applications are running properly, to monitor the corrective actions the DITAS platform is taking, and to provide feedback at design time by suggesting refinements of the data utility specification.
5.2 DITAS-SDK Architecture
The major goal of the DITAS SDK is to support the definition and the matchmaking of the VDC Blueprint. All the components created in the context of the DITAS SDK support the full lifecycle of the VDC Blueprint.
The VDC Blueprint is a structured document (in our implementation we use JSON for this purpose) created to capture all the properties of the VDC, with a twofold goal:
● to support the application designer when looking for a dataset that could be interesting for his/her purposes;
● to support the DITAS-EE in properly deploying all the components composing the VDC needed to expose the data.
The VDC Blueprint consists of 5 distinct sections (described in deliverable D3.2 (D3.2, 2018)) created to describe different aspects of the VDC instances; a minimal skeleton is sketched after this list:
● Internal Structure: high-level textual description of the VDC to characterize it as a product, focusing on business characteristics.
● Data Management: specifies the attributes of the methods offered by the VDC and, for each method, the guaranteed levels of data quality, security and privacy. This is the set of information defined by the data administrator to inform the DITAS platform about where the data sets to be exposed reside.
● Abstract Properties: contains all the rules, in the form of goal trees, to be used by the SLA Manager in order to define the SLA contract which will hold during the VDC usage between the data administrator and the data designer/data developer.
● CookBook Appendix: describes the deployment information needed to properly host the VDC in the DITAS-EE. This information also contains the details to create not only the VDC but also the VDM, which will be in charge of managing the VDC instances created from the same VDC Blueprint.
● Exposed API: technical section enabling the application developer to fully understand how the methods exposed by the VDC work.
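A minimal skeleton of a VDC Blueprint, reduced to its five top-level sections, could therefore look as follows. The key spellings and placeholder values are illustrative only; the normative schema is the one defined in D3.2 (D3.2, 2018).

{
  "INTERNAL_STRUCTURE": "business-level description of the VDC as a product",
  "DATA_MANAGEMENT": "methods with guaranteed data quality, security and privacy levels",
  "ABSTRACT_PROPERTIES": "goal trees used by the SLA Manager to derive the SLA",
  "COOKBOOK_APPENDIX": "deployment recipes for the VDC and the VDM",
  "EXPOSED_API": "technical description of the methods exposed by the VDC"
}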
Each of these sections addresses different DITAS roles and components. The DITAS SDK is designed to handle all the operations needed for creating, storing and delivering the Blueprints to the Application Designer (Figure 21). In fact, the VDC Blueprint cannot be considered a document created in a single step; rather, depending on the interaction among the actors, the sections are incrementally defined. In more detail, Figure 20 shows the VDC Blueprint lifecycle, where three versions of the VDC Blueprint are included:
● the Abstract VDC Blueprint;
● the Intermediate VDC Blueprint;
● the Concrete VDC Blueprint.
Figure 20. VDC Blueprint Lifecycle
In the first step, which is composed of several activities (see Figure 21), Data Administrators create the Abstract VDC Blueprint; the Blueprint Validator component then validates this document so that it can be stored in the Blueprint Repository through the Blueprint Repository Engine, the component responsible for carrying out all the CRUD operations on the Blueprint Repository. Application Designers should then be able to select the appropriate blueprint based on the requirements of the application being developed.
Figure 21. DITAS SDK Architecture
To select the most appropriate Blueprint, the Resolution Engine component is introduced. This component takes as input the Application Requirements file that the Application Designer produces and filters the Abstract Blueprints accordingly. The Application Requirements file is also a JSON-formatted file that contains all the requirements of the Application Designer: information about the content that the VDC should deliver, the QoS that the VDC is committed to deliver, the data quality of the sources, and the privacy and security features of the VDC. A sketch of such a file is given below.
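As a sketch, and with property names invented for this example (the actual schema is the one used by the project implementation), an Application Requirements file could look like this:

{
  "content": { "keywords": ["blood tests", "patients"] },
  "qos": { "availability_percent": 99, "max_response_time_ms": 200 },
  "data_quality": { "min_accuracy": 0.9, "min_completeness": 0.8 },
  "privacy_security": { "encryption_in_transit": true, "gdpr_compliant": true }
}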
The Resolution Engine consists of three subcomponents (Figure 22). The first is the content-based search, which filters the Blueprints based on the content they deliver. The second is the DURE (Data Utility Resolution Engine), which is responsible for filtering and ranking the Blueprints remaining from the previous step based on their QoS and data quality features. Finally, the Resolution Engine communicates with the third component, the Privacy and Security Evaluator, which is responsible for filtering and ranking the Blueprints based on the privacy and security requirements of the Application Designer.
Figure 22. DITAS SDK Resolution Engine Architecture and component interaction
It is important to mention that the Abstract Blueprints that fulfil all the requirements are altered by the DURE, which inserts the non-functional constraints that will be used for establishing the SLA if the Blueprint is selected by the Application Designer; a sketch of such constraints follows below. This additional information transforms the Abstract VDC Blueprints into the so-called Intermediate VDC Blueprints, which are returned to the Application Designer as possible candidates for the application being designed. Once the Application Designer has selected one of the returned candidate blueprints, the Intermediate VDC Blueprint is sent to the Deployment Engine in order to create the Concrete Blueprint.
To do so, the Deployment Engine considers the list of resources, included in the Application Requirements document, that the Data Administrator, the Application User and the Data Provider may want to provide for running the VDC. These resources are computation instances and storage space that can be allocated in a public cloud, or machines and disks that are already available at the edge. It is worth noticing that we assume these resources are properly configured to be made accessible by the Deployment Engine, as well as by the DITAS-EE, in order to deploy and execute portions of the data storage, data computation, and data movement actions.
As a last step, the Concrete VDC Blueprint is passed to the DITAS-EE components, providing visibility of the available resources that can be used to optimize the VDC execution.
5.3 Execution Environment Architecture
The Execution Environment (EE) is the second main element of the DITAS platform. It provides support for executing and managing the VDC lifecycle once the VDC has been deployed by the Deployment Engine, as agreed by the Application Designer and the Data Administrator exploiting the SDK facilities.
In fact, the DITAS-SDK and the DITAS-EE are tightly connected through the Deployment Engine, which is responsible for building and configuring the Execution Environment across cloud and edge devices, running on top of a Kubernetes infrastructure, based on the blueprint resolution selected by the Data Administrator.
The EE also provides the foundational capabilities required by the VDCs’ security- and privacy-related components in matters of Identity and Access Management.
The Execution Environment takes decisions about what, where, when, and how to move data or computation resources. To this aim, it is composed of two main elements:
● Virtual Data Container (VDC), created from what has been defined in a
Concrete VDC Blueprint and deployed to serve a specific application by
providing data exposed by the data administrator.
● Virtual Data Manager (VDM), which is in charge of executing, monitoring
and moving either data or computation within the environment.
When more than one Concrete VDC Blueprint is generated from the same Abstract VDC Blueprint, many VDCs are created, all supervised by a single VDM in the same Kubernetes environment and all accessing what is logically the same data source (see Figure 23). Logically here means that the data could be replicated or moved within the environment to improve the performance of the application, while preserving the privacy of the data.
Figure 23. DITAS Execution Environment for several deployments of the same blueprint
5.4 VDC Architecture
The VDC provides an abstraction layer that takes care of retrieving, processing and delivering data with the proper quality level, while putting special emphasis on data security, performance, privacy, and data protection. The VDC, acting as a middleware, lets the application designer simply define the requirements on the needed data, expressed as data utility, and takes responsibility for providing this data in a timely, secure and accurate manner, hiding the complexity of the underlying infrastructure. The infrastructure could consist of different platforms, storage systems, and network capabilities. The VDC Blueprint describes the VDC thoroughly: it includes, among other things, information about its business characteristics, the data sources the VDC connects to, how to deploy it, and the API that the data administrator exposes to the data consumers.
From the technical point of view, the Virtual Data Container is, by definition, programming-platform and language agnostic, in order to facilitate the life of the application developer, who is in charge of creating the VDC. Indeed, the developer has the flexibility to implement the VDC with the platform and the language with which he/she is familiar. For instance, in the context of the DITAS project,
as required by the two use cases considered, one of the implemented VDCs uses the Spark platform, relying on the Spark SQL module for structured data processing, while another uses the Node-RED platform, whose lightweight runtime is built on Node.js. In this way, it is possible to evaluate the DITAS approach in different situations: in the former case, the VDC has to deal with data analytics over a heterogeneous data set, whereas in the latter case data offered as streams are collected and processed.
Moreover, the VDC is architecture agnostic and is therefore able to run at the edge of the network on low-cost hardware, such as a Raspberry Pi, as well as on more powerful cloud resources. This VDC principle is of high importance, particularly when enacting computation movement strategies, which make it possible to move a VDC between the heterogeneous resources that compose a Fog environment.
On a high level, a Virtual Data Container consists of three different layers: the Common Accessibility Framework (CAF), the Data Processing layer and the Data Access Layer (DAL), as depicted in Figure 24.
Figure 24. High-level view of the VDC
5.4.1 Common Accessibility Framework
The role of the CAF is to ensure that VDCs serve their data in a unified and pre-defined manner. It is the interface between a VDC and the data-intensive application, meaning that the latter knows only the CAF, which hides all the complexity behind the VDC. The data administrator publishes the CAF API, which contains a set of well-described methods through which he/she makes available some of the data included in the data sources to which the VDC is connected.
From the implementation point of view, the programming model that the CAF follows is REST-oriented, and the adopted common communication protocol is HTTP. The API is described according to the OpenAPI specification, extended with features that enable the DITAS platform to implement data movement techniques. The complete definition of the CAF API is included in the Abstract VDC Blueprint under the Exposed API section.
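As an illustration, the following is a minimal sketch of what a CAF method could look like when exposed as a REST endpoint; it assumes Flask as the web framework, and the path and response format are hypothetical, not taken from an actual DITAS blueprint.
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical CAF method. In DITAS, such a method would be declared in the
# Exposed API section of the Abstract VDC Blueprint using the (extended)
# OpenAPI specification; path and payload here are illustrative only.
@app.route("/patients/<patient_id>/bloodTests", methods=["GET"])
def get_blood_tests(patient_id):
    # The data-intensive application sees only this REST interface; the
    # processing and data access behind it are hidden by the VDC.
    return jsonify({"patientId": patient_id, "bloodTests": []})

if __name__ == "__main__":
    app.run(port=8080)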
5.4.2 Data Processing
Data processing layer contains all the computation, data transformation and
composition that the VDC implements in order to provide the data to the con-
sumers in the content and format that the exposed API promises. Regardless of
the adopted programming language, the VDC is able to perform a set of pro-
cessing techniques to the data coming from the sources. Depending on the busi-
ness logic of each one exposed VDC method, the range of this processing may
vary between just fetching raw data from a single database, on the one hand,
and querying multiple sources, applying analytics on the data and compressing
it before serving the response to the client, on the other hand. The code that
implements the data transformation layer is included in the concrete VDC blue-
print.
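The following sketch illustrates the upper end of that processing range; the two in-memory lists stand in for real data sources, and all names are hypothetical.
import gzip
import json

# Hypothetical processing method: query two sources, compose the results and
# compress the response before serving it, as described above.
def process(readings_source, labels_source):
    readings = [r for r in readings_source if r["value"] is not None]
    labels = {l["sensorId"]: l["name"] for l in labels_source}
    # Transform the data into the content and format promised by the CAF API.
    combined = [{"sensor": labels.get(r["sensorId"], "unknown"), "value": r["value"]}
                for r in readings]
    # Compress before serving the response to the client.
    return gzip.compress(json.dumps(combined).encode("utf-8"))

readings = [{"sensorId": 1, "value": 21.5}, {"sensorId": 2, "value": None}]
labels = [{"sensorId": 1, "name": "spindle-temperature"}]
print(len(process(readings, labels)), "bytes")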
5.4.3 Data Access Layer
The third element of a VDC is the Data Access Layer (DAL), which has the fundamental role of exposing the data provided by the Data Administrator to the DITAS-EE infrastructure without violating any privacy or security constraints. In fact, the DAL includes the Privacy Enforcement Layer, the component in charge of rewriting the SQL query that must be executed to satisfy a call coming from the Processing Layer into a query that avoids returning data that cannot be seen externally. This filtering is affected mainly by the location of the VDC: since it is possible to move the computation, i.e., the processing and the CAF layers, the data that can be transmitted may change accordingly. For this reason, an important assumption about the DAL is that this layer is deployed in the same place where the data is stored, i.e., it is invariant under computation movement.
Focusing on data movement, if the strategy is to duplicate the data source somewhere else (e.g., on the premises of the consumer), the DAL first ensures that only the data that can be stored at that location is replicated. Secondly, a new instance of the DAL is instantiated at that location to perform access control.
In more detail, the Privacy Enforcement Engine acts as a proxy before the query is executed over the data. It rewrites the query so that it returns only data compliant with the privacy policies, evaluated together with user identity information. To this end, the original query is augmented with filters based on the policies and on additional attributes of the request or the data, such as the data subject’s consent.
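A minimal sketch of this rewriting idea follows; the policy format and filter expressions are hypothetical, and a real enforcement engine would of course parse the SQL rather than concatenate strings.
# Hypothetical sketch of privacy-driven query rewriting; the policy format is
# illustrative. It assumes the original query has no WHERE clause.
def rewrite_query(query, user_role, policies):
    filters = []
    for policy in policies:
        if user_role in policy["allowed_roles"]:
            filters.extend(policy["filters"])
    if not filters:
        return None  # no policy allows this access: return no data at all
    # Augment the original query with policy- and consent-based filters.
    return query + " WHERE " + " AND ".join(filters)

policies = [{"allowed_roles": ["researcher"],
             "filters": ["consent = 'given'", "pseudonymized = 1"]}]
print(rewrite_query("SELECT age, diagnosis FROM patients", "researcher", policies))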
The communication protocol between the DAL and the rest of the VDC is gRPC: on the one hand, it is generic enough and supports both the request-response model and streaming well; on the other hand, it is much more efficient than plain REST over HTTP.
Figure 25. High-level view of the DAL
5.4.4 Other VDC Components
VDC Request Monitor - It monitors the incoming and outgoing requests of the VDC by intercepting the HTTP traffic. The requests are evaluated, enriched with blueprint metadata and stored in a monitoring database; further information, such as response times and error codes, is measured and reported (a minimal sketch of such a monitoring record is given after this list).
Throughput Agent - It monitors the data traffic between the VDC components and the data source. Measurements are aggregated, enriched with data from other monitoring components and then also stored in Elasticsearch.
Logging Agent - It monitors the logs of the different DITAS components, as well as offering an interface for the VDC and the other VDC components to report additional information to Elasticsearch.
Data Utility Evaluator (DUE@VDC) - At runtime, it is responsible for evaluating the Data Utility for the VDC and for providing information to the SLA Manager to trigger data and computation movement in case of a data utility requirement violation.
SLA Manager - It checks that the Quality of Service constraints defined in the Abstract Blueprint for the different data sources are met during the execution of the VDC, sending a violation message to the VDM if they are not.
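As an illustration of the monitoring flow described above, the following sketch shows the kind of record the VDC Request Monitor could store; the index name, field names and endpoint are hypothetical, and the elasticsearch Python client is assumed.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

# Hypothetical monitoring record: an intercepted HTTP request enriched with
# blueprint metadata; index, fields and endpoint are illustrative only.
es = Elasticsearch("http://elasticsearch:9200")

record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "blueprintId": "bp-001",              # enrichment with blueprint metadata
    "method": "GET /patients/42/bloodTests",
    "responseTime_ms": 87,                # measured further information
    "statusCode": 200,
}
es.index(index="vdc-requests", document=record)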
5.5 VDM Architecture
The role of the VDM is to coordinate the several VDCs that can be instantiated from the same VDC Blueprint. In fact, for each user requesting a VDC Blueprint, a VDC instance is generated. As a result, many VDC instances will be created, each of them accessing the same data source. In this configuration, coordination is required: each VDC operates independently from the others, and if the decisions on data movement were left to the VDCs, a decision taken by one VDC could negatively affect another. For instance, assume that there are two VDCs, one deployed on the cloud and the other on the edge; as both access the same data source, the former may prefer to have the data on the cloud, while the latter prefers the edge. Assuming that a duplication of the data is not possible, this situation could result in a continuous data movement between the two nodes, as each VDC tries to optimize its local behavior. To avoid this situation, a VDM is required.
To allow proper management of this type of conflict, the VDM is equipped with components able to monitor all the controlled VDCs and to decide on and enact data and computation movement actions (see Figure 26). In more detail:
Figure 26. High-level view of the VDM
• DUE@VDM - It aggregates the data utility values calculated at runtime by the different DUE@VDCs. Such data utility values refer to the data returned by the methods offered by the VDC. Aggregating the utility values helps in understanding the level of data utility provided by the different methods at run time.
• Decision System 4 Data Movement - This VDM component plays an important role in the DITAS platform, since it decides when and where to move data sources and VDCs in the fog infrastructure managed by DITAS. The Decision System for Data and Computation Movement (DS4M) considers the requirements of application designers and their violations: when a requirement is violated, the DS4M enacts the movement that will have the highest positive impact on the violated requirement and, therefore, the highest probability of restoring the satisfaction of that requirement. In case the data sources of the VDM are shared among multiple applications, the DS4M considers the requirements of all applications and enacts the best data movement based on the entire set of requirements (a minimal sketch of this decision rule is given after this list).
• Data Movement Enactor - It enacts the data movement actions across locations where the data can be conveniently used, by consuming the API of the storage layer. It copies and keeps data synchronized between edge and cloud servers/instances, making the data available closer to where it is needed.
• Computation Movement Enactor - It moves computation units between the available computational resources to optimize data access times. Using the facilities made available by Kubernetes, the orchestrator adopted in the project, containers fully or partially implementing the processing layer of a VDC can be moved among the different locations in which the processing is allowed to be performed without violating any privacy constraints.
• Data Analytics - It aggregates additional information generated by the operation of different DITAS components, such as the Data/Computation Movement Enactors, the Decision System for Data and Computation Movement, the SLA Manager, the Throughput Agent and the Logging Agent. It provides an interface to query the various data sources that comprise this information and performs additional processing and refining where necessary. Its queries integrate key QoS metrics used in the operation of other components, such as the SLA Manager and the Decision System for Data and Computation Movement. The Data Analytics API is placed inside a separate container in a Kubernetes environment, with its own endpoint and internal DNS, which allows communication with other DITAS components. Data Analytics provides an API that translates requests into Elasticsearch queries and outputs the results in a format suitable for use by other DITAS modules.
• Log data analysis service - It provides log analysis for the data administrator to get insights about their own data sources. This module also offers the DITAS operator the possibility to analyze the state of the whole platform in terms of, for instance, violations, bottlenecks and deadlocks, regardless of any specific VDM or VDCs.
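The following is a minimal sketch of the DS4M decision rule described in the list above: on a requirement violation, the movement with the highest estimated positive impact across all applications sharing the data sources is enacted. The impact scores are hypothetical.
# Hypothetical sketch of the DS4M decision rule; impact scores are illustrative.
def choose_movement(violated_requirement, movement_actions, applications):
    def total_impact(action):
        # Sum the estimated impact over every application sharing the data
        # sources, so that helping one VDC does not penalize the others.
        return sum(app["impact"][action][violated_requirement]
                   for app in applications)
    return max(movement_actions, key=total_impact)

apps = [{"impact": {"move_to_edge": {"latency": 3}, "move_to_cloud": {"latency": -1}}},
        {"impact": {"move_to_edge": {"latency": -1}, "move_to_cloud": {"latency": 2}}}]
print(choose_movement("latency", ["move_to_edge", "move_to_cloud"], apps))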
5.6 VDC and VDM integration
The connection between the VDM and the VDC is represented in Figure 27. Generally speaking, the VDM is interested in the data about the behavior of the VDCs. As already mentioned above, the relevant events occurring during the execution of the VDC are stored in the Elasticsearch module. Such low-level events (e.g., applications are accessing some data) are then processed by the Data Analytics module in order to generate high-level events (e.g., accessed data are returned with a given quality) that can be used by the SLA Manager to raise possible violations (e.g., the returned data do not have sufficient quality).
It is worth noting that the VDM is also indirectly connected to the SDK. In fact, the DUE@VDM is able to decide, based on the behavior of the running VDCs, whether the promised data utility values stored in the Abstract Blueprint need a revision. For instance, if the level of accuracy of the data set agreed for a given VDC is very often violated, it is reasonable for the data administrator to have a tool able to raise this issue and to suggest the proper value to be stored. In this case, in accordance with the data administrator, the Abstract VDC Blueprint can be revised with the new values, which will be used for the subsequent instantiations.
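A minimal sketch of such a suggestion rule follows; the violation threshold and the percentile used are hypothetical choices, made only to illustrate the revision mechanism.
# Hypothetical sketch: if the agreed accuracy is violated too often, suggest a
# value the running VDCs actually sustain (here, the 5th percentile of the
# observed accuracies). Threshold and percentile are illustrative.
def suggest_revision(agreed_accuracy, observed, max_violation_rate=0.1):
    violations = sum(1 for a in observed if a < agreed_accuracy)
    if violations / len(observed) <= max_violation_rate:
        return agreed_accuracy  # the promise is being kept; no revision needed
    return sorted(observed)[int(0.05 * len(observed))]

print(suggest_revision(0.95, [0.90, 0.91, 0.92, 0.93, 0.94, 0.96]))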
Figure 27. High-level view of the VDC and VDM integration
6 Detailed Technical Verification and Validation Approach
The following is the detailed technical verification and validation approach that the consortium members will follow during the project. The verification and validation results will be delivered in D5.4 (Ditas consortium: D5.4 Final case studies validation report) in M36.
6.1 Requirements traceability
Tracking requirements is an important aspect of the software development process, as it ensures that all of the requirements have been correctly considered and updated during each stage of the project. It is a decisive part, as it guarantees that the development team has covered every need and that no functionality is missed out or left untouched, thus giving stability and consistency to the final product and to each of its components.
The most common way of ensuring proper and full traceability is using a Requirements Traceability Matrix (RTM). For the DITAS project, we are using a Google Docs-based spreadsheet as Requirements Traceability Matrix, one for each Work Package and one for each use case. The following figure shows an example of the Requirements Traceability Matrix for WP2.
Figure 28: Requirements Traceability Matrix for WP2
This Requirements Traceability Matrix gives us a quick view and total traceability of each requirement, the component that has to fulfil it, and the test case we have to run in order to validate it.
It is important to point out that the requirements (described in the Description column of the RTM) can have two different sources: internal requirements, which are elicited by the consortium, and external requirements, elicited through external methods like business questionnaires, which are explained in the Update to the Market Analysis (Section 3) and the Update to the Business and Technical Requirements (Section 4) of this deliverable.
Moreover, we have added a specific tab linking requirements and components to the main project objectives described in the DoA (DoA, 2016). The tab Measurements criteria vs WP relates every project objective in the DoA to the work packages and components in charge of fulfilling it. The following figure is an excerpt from that tab.
Figure 29: Measurements criteria vs WP
6.1.1 Requirements as user stories
In Agile development, user stories are a brief and simple definition of a feature, written from the perspective of the person who desires the new capability, usually a user or a customer of the system. They have the following form:
As a <type of user>, I want <some goal> so that <some reason>.
For example, a user story for searching on a website can be as follows:
As a website user, I want to be able to search on the webpage, so that I can find necessary information.
For the project, the consortium members were encouraged to use user stories to define the requirements in a user-friendly way; they are written in the “Description” column of the Requirements Traceability Matrix.
6.1.2 Acceptance criteria
For the DITAS project, as with the user stories, the consortium members were encouraged to use acceptance criteria in order to describe the way to test and fulfil the requirements, that is, to specify the conditions under which a user story (a requirement) is fulfilled. The acceptance criteria sentences are written in the “Test case / Acceptance criteria” column of the Requirements Traceability Matrix.
For example, following the example of the previous section, the acceptance criteria for searching on a website can be as follows.
Given that I’m in a role of registered or guest user
When I open the “Products” page
Then the system shows me the list of all products
And the system shows the “Search” section in the right top corner of the screen
When I fill in the “Search” field with the name of an existing item in the product list
And I click the “Apply” button OR press the Enter key on keyboard
Then the system shows the matching products in the Search Results section
The acceptance criteria will drive the verification of the requirements, as they enable the verification of individual requirements. Moreover, when possible, acceptance criteria will be automated in the Continuous Delivery pipelines, as sketched below. When not possible, manual inspection will be used to track the fulfilment of the requirements.
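As an illustration, the following sketch automates an acceptance criterion in the style of the website-search example above, to be run with a test runner such as pytest; the URL, parameters and response format are hypothetical and not taken from an actual DITAS component.
import requests

# Hypothetical automated acceptance criterion; URL and fields are illustrative.
def test_search_returns_matching_products():
    # When the "Search" field is filled with the name of an existing item...
    response = requests.get("http://staging.example/products",
                            params={"search": "gear"})
    assert response.status_code == 200
    # ...then the system shows the matching products.
    assert all("gear" in p["name"].lower() for p in response.json())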
6.1.3 Working methodology
As stated in Section 6.1 (Requirements traceability), each Work Package has its own Requirements Traceability Matrix. The initial requirements of the matrix are the ones described in the Annex 2 - DITAS Business and Technical Requirements section of D1.1 (D1.1, 2017). Hereafter, the way of updating the Requirements Traceability Matrix for each role is as follows:
• All WP partners are in charge of updating, removing or adding the internal requirements. The internal requirements should be written using the user stories style described in Section 6.1.1.
• WP leaders are in charge of updating the external requirements of the WP (taken from the questionnaires) when available and of removing the outdated ones. The external requirements, as well as the internal requirements, should be written using the user stories style described in Section 6.1.1.
• WP leaders are also responsible for updating the Measurements criteria vs WP tab, which is used in the “Validation against project objectives” section.
6.2 Verification methodology
Software verification is required to ensure the software quality and is a key phase
in the software development life cycle. For ease of understanding we can sum-
marize it with the question, “Am I building the right product?”; that is to say, “does
the software satisfies its specification”?.
The software verification for the DITAS project is focused on the development of the components and is performed via several tests that are explained in the following subsections. Most of these tests run in the Jenkins pipeline of the DITAS Continuous Integration (CI) system. More details about the CI system can be found in D5.2 (D5.2, 2018).
The flow for the software verification is as follows:
Figure 30: Software verification tests
6.2.1 Unit tests
A unit test is a way of testing a unit, the smallest piece of code that can be logically isolated in a system, ensuring that the units are individually and independently scrutinized for proper operation. These tests are performed in the “Build - test” stage of the Jenkins pipeline.
The following example shows the unit test stage of the VDC Repository Engine, a Java-based component. It uses Maven as a build automation tool for building the artifact and running all the unit test classes.
stage('Build - test') {
    agent {
        dockerfile {
            filename 'Dockerfile.build'
        }
    }
    steps {
        // Build and store artifact
        sh 'mvn -B -DskipTests clean package'
        archiveArtifacts 'target/*.jar'
        // Run unit tests
        sh 'mvn test'
    }
}
If any of the unit tests fails, the Jenkins pipeline stops and the developer gets instantly notified via email, so that they can fix the code errors as soon as possible.
6.2.2 API validation test
All the APIs defined by the components of the project are described via OpenAPI. Thus, each API has a YAML file with a complete description of the exposed resources, parameters and expected responses. This enables automatic testing of the APIs.
In order to check that an API is implemented according to its definition file, Dredd (an HTTP API testing framework, https://github.com/apiaryio/dredd), a language-agnostic command-line tool, is used. More details about this can be found in (D5.2, 2018); to sum up, Dredd reads the API description from the definition file and step by step validates whether the API implementation replies with responses as they are described in the documentation. Using this tool forces the developer to keep the API documentation up to date, thereby also ensuring up-to-date API documentation for the users.
The following example shows an API validation test performed for the VDC Repository Engine, using Dredd as explained above. The VDC Repository Engine container runs on the staging machine (31.171.247.162) and serves on port 50009. The validation of the API definition happens against this endpoint.
stage('API validation') {
    agent any
    steps {
        sh 'dredd VDC_Repository_Engine_Swagger_v2.yaml http://31.171.247.162:50009'
    }
}
If the API validation fails, the Jenkins pipeline stops and the developer gets instantly notified via email, so that they can fix the API definition or implementation as soon as possible.
6.2.3 Integration Tests
Within the integration tests, multiple, bigger units are tested in interaction to ensure the consistency and interoperability of the integrated components. This level of testing exposes faults in the interaction between integrated units.
To plan the integration tests of each component, the component diagrams (either SDK or EE) were used as a starting point to identify the dependencies between components. These component diagrams were defined in Section 3 of D4.2 (Ditas consortium: D4.2 Execution environment Prototype - First release, 2018), together with the diagram-based approach introduced in D5.2. A more detailed definition of the integration tests will be given in D5.4.
The following example shows the integration test stage of the VDC Repository Engine component which, being Java-based, uses the Maven Failsafe plugin to fire the corresponding integration tests.
stage('Integration tests') {
    agent any
    steps {
        sh 'mvn verify'
    }
}
If the integration tests fail, the Jenkins pipeline stops and the developer gets instantly notified via email, so that they can fix the dependency problems as soon as possible. The failed component is not deployed downstream, so it will never reach the Staging or Production servers.
6.3 Validation methodology
The validation methodology for the DITAS project, which leads us to meet the project requirements in the most efficient and effective way, comprises three different levels:
● Component-level requirements validation: how we validate the DITAS framework components using their requirements.
● Framework validation (against use cases): how the use cases (Industry 4.0 and eHealth) help to validate the DITAS project.
● Validation against project objectives: how we validate the project objectives described in the DoA.
6.3.1 Component level requirements validation
As stated in the previous sections, the RTM of each Work Package contains the requirements, written in the “Description” column. Each requirement is fulfilled by one component, which is declared in the “Component that fulfils it” column of the same matrix. Finally, this component is tested using a concrete acceptance test. If the test passes, the component is validated. We can summarize the component validation with the following figure.
Figure 31: Component validation flow
Each acceptance test provides enough information to run the test. Depending on the nature of the test, it will be run automatically or manually. All the information regarding acceptance tests will be detailed in Deliverable D5.4.
6.3.2 Framework level validation
The framework-level validation for the DITAS project is divided into two different types of validation tests: the system tests and the user acceptance tests.
6.3.2.1 System Tests
System tests are crucial for validating the behavior of the components when critical actions are required from the framework. IDEKO, as the validation leader, has compiled a list of these critical actions, such as publishing a blueprint into the repository, searching for a blueprint, blueprint ranking, etc. The system tests ensure that there are no inconsistencies between the units that are integrated together. Typically, system tests are run automatically after the integration tests, but we prefer to test the critical actions separately, with one test for each critical action; in this way, we have total control over the tests. It is important to point out that some of the tests will be done manually and others automatically. The full list of system tests will be detailed in D5.4.
For example, the testing of the critical action blueprint ranking is done manually. We first open the Blueprint Repository web interface, type a query and check whether the results are correctly ranked (by comparing the blueprints). If everything works, we have verified that the VDC Repository Engine, the Blueprint Repository database, the Blueprint Repository index in Elasticsearch and the Data Utility Resolution Engine are working correctly, as these are the elements that take part in this critical action. The tracking of the system tests and their results is also done using a traceability matrix. Depending on the nature of a test, it may be automated into the CI cycle using tools like Selenium, as in the sketch below.
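The following is a minimal sketch of what such an automated check could look like for the blueprint-ranking action, assuming a local Firefox WebDriver; the URL, element locators and ranking attribute are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Hypothetical automation of the blueprint-ranking system test; URL, locators
# and the ranking attribute are illustrative only.
driver = webdriver.Firefox()
try:
    driver.get("http://staging.example/blueprint-repository")
    query = driver.find_element(By.NAME, "query")
    query.send_keys("machine temperature data", Keys.ENTER)
    results = driver.find_elements(By.CLASS_NAME, "blueprint-result")
    scores = [float(r.get_attribute("data-rank-score")) for r in results]
    # The returned blueprints must be ordered by decreasing ranking score.
    assert scores == sorted(scores, reverse=True)
finally:
    driver.quit()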
6.3.2.2 User acceptance tests
User acceptance testing, as the name suggests, is the process of verifying that a solution works for the user. For the DITAS project, the users are the use cases, which are in charge of developing an application that will use the DITAS framework. Each use case defines a set of business requirements, that is, what they need from the software. These requirements cover different categories, such as performance or availability. Furthermore, the use cases define test cases or acceptance criteria in order to validate these requirements. If the acceptance criteria are met, we consider the user acceptance tests successfully passed, as the requirements are met.
The following image shows some business requirements requested by the Indus-
try 4.0 Use Case, which is driven by IDEKO.
Figure 32: Business requirements for the Industry 4.0 use case
Along with the business requirements, the use cases also provide the technical requirements they need for their application. The next figure shows some of the technical requirements for the IDEKO use case.
Figure 33: Technical requirements for the Industry 4.0 use case
We can summarize the user acceptance tests with the next figure: the application of each use case (which uses the DITAS framework) has some requirements. These requirements are fulfilled if the corresponding acceptance tests are accomplished; therefore, the application using DITAS meets the user requirements and the user acceptance tests are passed.
Figure 34: Validation against use cases flow
6.3.3 Validation against project objectives
Validating the project objectives is a critical aspect of any project. To ensure this validation, and in order to track and fulfil the project objectives, we are using the matrix introduced some sections above, titled Measurements criteria vs WP. In this matrix we have the project objectives described in the DoA (Section 1.1) (DoA, 2016), and we link each of them to its related work package. The objectives are also linked to the specific components that meet the requirements, plus an explanation of how we cover the criteria.
The next figure shows the tracking of the fulfillment of Objective 1, “Improvement of productivity when developing and deploying data-intensive applications”.
Figure 35: Objective 1 fulfillment
In summary, in this sheet we have the objectives described in the DoA, together with the related Work Package and the specific components in charge of satisfying them. These components have requirements, so by validating these requirements we are validating the project objectives. We can summarize this validation with Figure 36.
Figure 36: Validation against project objectives
It is important to point out that some of these objectives have quantitative metrics (e.g., 3.3, a reduction of 10% of the time needed for the transition between different deployments using the DITAS framework compared to traditional approaches) that will be evaluated by the end users in test scenarios with and without DITAS. The details of these tests will be given in D5.4.
7 Conclusions
This document includes an update to the market analysis, an update to the business requirements, the detailed project architecture and a detailed plan for verification and validation, all of which have been updated based on the conclusions from the first phase of the project.
In this document we have updated the market analysis with more focus on fog computing and on two of the possible markets for DITAS: the e-Health and Industry 4.0 markets. What follows from this analysis is that DITAS has opportunities in these markets. The updated state of the art is now focused on the innovations of DITAS in the data lifecycle scenarios in a Fog environment: Data Delivery, Data Management and Data as a Service. It shows that DITAS is indeed innovative, considering the current state of the art.
The requirements have also been revised and expanded based on new questionnaire feedback and the updated version of the architecture, with an emphasis on the traceability of the requirements.
We have also updated the architecture with more details based on the conclu-
sions from building the first prototype implementing the core functionalities of the
DITAS platform for milestone MS3. A more advanced blueprint lifecycle, with an
intermediate VDC Blueprint in addition to the abstract and concrete blueprints,
was developed. Moreover, a new DAL layer was added to the VDC for address-
ing privacy concerns in computation movement.
In addition, a more detailed technical verification and validation approach is
described, with an emphasis on requirements traceability.
The next milestone is a mature and final release implementing all modules con-
stituting the DITAS platform according to the architecture described in this doc-
ument - deliverables D2.3, D3.3, D4.3 and D5.3 in M30.
8 References
Alcaraz Calero, J. M., & Aguado, J. G. (2015). MonPaaS: An Adaptive Monitor-
ing Platform as a Service for Cloud Computing Infrastructures and
Services. IEEE Transactions on Services Computing, 8(1), 65-78.
Al-Doghman, F., Chaczko, Z., & Jiang, J. (2017). A Review of Aggregation
Algorithms for the Internet of Things. 25th International Conference on
Systems Engineering (ICSEng), (pp. 480-487). doi:10.1109/ICSEng.2017.43
Amyot, D., Ghanavati, S., Horkoff, J., Mussbacher, G., Peyton, L., & Yu, E. (2010).
Evaluating goal models within the goal-oriented requirement language.
International Journal of Intelligent Systems, 25(8), 841-877.
Bermbach, D., Pallas, F., García Pérez, D., Plebani, P., Anderson, M., Kat, R., & Tai,
S. (2017). A research perspective on Fog computing. International
Conference on Service-Oriented Computing. Springer.
Bertino, E., & Ferrari, E. (2018). Big Data Security and Privacy. In S. Flesca, S. Greco,
E. Masciari, & S. D, A Comprehensive Guide Through the Italian Database
Research Over the Last 25 Years. Studies in Big Data (Vol. 31). Springer.
Blake, R., & Mangiameli, P. (2011). The Effects and Interactions of Data Quality
and Problem Complexity on Classification. Journal Data and Information
Quality, 2(2).
Bonomi, F., Milito, R., Zhu, J., & Addepalli, S. (2012). Fog computing and its role in
the internet of things. Proceedings of the First Edition of the MCC Workshop
on Mobile Cloud Computing1, (pp. 13-16).
Byers, C. (2017, August). Architectural Imperatives for Fog Computing: Use Cases,
Requirements, and Architectural Techniques for Fog-Enabled IoT
Networks. IEEE Communications Magazine, 55(8), 14-20.
doi:10.1109/MCOM.2017.1600885
Cappiello, C., Pernici, B., Plebani, P., & Vitali, M. (2017). Utility-Driven Data
Management for Data-Intensive Applications in Fog Environments. ER
Workshops, (pp. 216-226).
Chung, L., Nixon, B., Yu, E., & Mylopoulos, J. (2012). Formal reasoning techniques
for goal models. Springer Science & Business Media.
Colombo, P., & Ferrari, E. (2018). Privacy Aware Access Control for Big Data: A
Research Roadmap. In Big Data Research, 2(4), 145-154.
D’Andria, F., Field, D., Aliki, K., Kousiouris, G., Garcia-Perez, D., Pernici, B., &
Plebani, P. (2015). Data Movement in the Internet of Things Domain.
European Conference on Service-Oriented and Cloud Computing.
Service Oriented and Cloud Computing. Lecture Notes in Computer
Science. 9306, pp. 243-252. Springer, Cham.
D1.1. (2017). D1.1 Initial architecture document with market analysis, SotA refresh
and validation approach. DITAS Consortium.
D2.2. (2018). D2.2 DITAS Data Management - Second Release. DITAS Consortium.
D3.2. (2018). D3.2 Data Virtualization SDK prototype (initial version). DITAS
Consortium.
D5.2. (2018). D5.2 Integration of DITAS and case studies validation report. DITAS
consortium.
Dey, S., & Mukherjee, A. (2018). Implementing Deep Learning and Inferencing on
Fog and Edge Computing Systems. IEEE International Conference on
Pervasive Computing and Communications Workshops (PerCom
Workshops), (pp. 818-823).
DoA. (2016). Description of Action. Research and Innovation Action. N. 731945, DITAS. European Commission.
Doelitzscher, F., Fischer, C., Moskal, D., Reich, C., Knahl, M., & Clarke, N. (2012).
Validating Cloud Infrastructure Changes by Cloud Audits. IEEE Eighth
World Congress on Services (pp. 377-384). HONOLULU: IEEE.
Duy La, Q., Ngo, M. V., Dinh, T. Q., Quek, T. Q., & Shin, H. (2018). Enabling
intelligence in fog computing to achieve energy and latency reduction.
Digital Communications and Networks.
doi:https://doi.org/10.1016/j.dcan.2018.10.008
European Commission. (2018, January 04). eHDSI Mission. Retrieved January 10,
2019, from
https://ec.europa.eu/cefdigital/wiki/display/EHOPERATIONS/eHDSI+Missi
on
European Commission. (2018). Synopsis Report - Consultation: Transformation of Health and Care in the Digital Single Market. European Commission.
European Commission. (2017, May 2). Final results of the European Data Market study measuring the size and trends of the EU data economy. Retrieved
January 10, 2019, from https://ec.europa.eu/digital-single-
market/en/news/final-results-european-data-market-study-measuring-
size-and-trends-eu-data-economy
European Commission. (2017, October 12). Public Consultation on Health and
Care in the Digital Single Market. Retrieved 01 10, 2019, from
https://ec.europa.eu/digital-single-market/en/news/public-consultation-
health-and-care-digital-single-market
European Commission. (2018, April 25). Communication on enabling the digital
transformation of health and care in the Digital Single Market;
empowering citizens and building a healthier society. Retrieved January
10, 2019, from https://ec.europa.eu/digital-single-
market/en/news/communication-enabling-digital-transformation-health-
and-care-digital-single-market-empowering
European Commission. (2018, April 25). Data in the EU: Commission steps up
efforts to increase availability and boost healthcare data sharing.
(European Commission) Retrieved January 10, 2019, from
http://europa.eu/rapid/press-release_IP-18-3364_en.htm
Even, A., Shankaranarayanan, G., & Berger, P. (2010). Inequality in the utility of
customer data: implications for data management and usage. J.
Database Mark. Custom. Strat. Manag., 17(1), 19-35.
FOG - Fog Computing and Networking Architecture Framework. (2018, 06 14).
IEEE 1934-2018 - IEEE Standard for Adoption of OpenFog Reference
Architecture for Fog Computing. Retrieved from IEEE Standard
Association: https://standards.ieee.org/standard/1934-2018.html
Furukawa, J., Lindell, Y., Nof, A., & Weinstein, O. (2017). High-throughput secure three-party computation for malicious adversaries and an honest
majority. Annual International Conference on the Theory and Applications
of Cryptographic Techniques. Springer.
Giorgini, P., Mylopoulos, J., Nicchiarelli, E., & Sebastiani, R. (2003). Formal
reasoning techniques for goal models. Journal Data Semantics, 1(1), 1-20.
Halunen, K., & Karinsalo, A. (2017). Measuring the value of privacy and the
efficacy of PETs. In Proceedings of the 11th European Conference on
Software Architecture: Companion Proceedings (ECSA '17) (pp. 132-135).
New York: ACM.
Han, J., et al. (2017). An Anonymization Method to Improve Data
Utility for Classification. Proceedings of International Symposium on
Cyberspace Safety and Security, (pp. 57-71).
Ho, T., & Pernici, B. (2015). A data-value-driven adaptation framework for energy efficiency for data intensive applications in clouds. 2015 IEEE Conference on Technologies for Sustainability (SusTech), (pp. 47–52).
Horkoff, J., & Yu, E. (2016). Interactive goal model analysis for early requirements engineering. Requirements Engineering, 21(1), 29-61.
Horkoff, J., Barone, D., Jiang, L., Yu, E., Amyot, D., Borgida, A., & Mylopoulos, J.
(2014). Strategic business modeling: representation and reasoning.
Software & Systems Modeling, 13(3), 1015-1041.
Horkoff, J., Borgida, A., Mylopoulos, J., Barone, D., Jiang, L., Yu, E., & Amyot, D.
(2012). Making data meaningful: The business intelligence model and its
formal semantics in description logics. Proc. of On the Move to Meaningful
Internet Systems, (pp. 700-717).
Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E., Spicer,
K., & de Wolf, P. (2012). Statistical Disclosure Control. New York: Wiley.
IIBA. (2009). MoSCoW Analysis. A Guide to the Business Analysis Body of
Knowledge. International Institute of Business Analysis.
Kalyani, G., V. P. Chandra Sekhara Rao, M., & Janakiramaiah, B. (2017). Particle
Swarm Intelligence and Impact Factor-Based Privacy Preserving
Association Rule Mining for Balancing Data Utility and Knowledge Privacy.
Arabian Journal for Science and Engineering, 43.
Katsalis, K., Papaioannou, T., Nikaein, N., & Tassiulas, L. (2016). SLA-driven VM
Scheduling in Mobile Edge Computing. IEEE 9th International Conference
on Cloud Computing.
Khaitzin, E., Shlomo, R., & Anderson, M. (2018). Privacy Enforcement at a Large
Scale for GDPR Compliance. Proceedings of the 11th ACM International
Systems and Storage Conference (SYSTOR '18) (pp. 124-124). New York:
ACM.
Ko, R., Lee, B., & Pearson, S. (2011). Towards Achieving Accountability, Auditability and Trust in Cloud Computing. In A. Abraham, J. Mauri, J. Buford, J. Suzuki, & S. Thampi, Advances in Computing and Communications, CCIS 193 (pp. 432-444).
Kock, N. (2007). Encyclopedia of E-collaboration. Hershey: Information Science Reference - Imprint of: IGI Publishing.
Lai, C., Song, D., Hwang, R., & Lai, Y. (2016). A QoS-aware streaming service over
fog computing infrastructures. Digital Media Industry & Academic Forum
(DMIAF), (pp. 94-98). doi:10.1109/DMIAF.2016.7574909
Letier, E., & Van Lamsweerde, A. (2004). Reasoning about partial goal satisfaction
for requirements and design engineering. ACM SIGSOFT Soft. Eng. Notes,
29, 53-62.
Lin, Y., Wu, C., & Tseng, V. (2015). Mining high utility itemsets in big data. In T. Cao,
E. Lim, Z. Zhou, T. Ho, D. Cheung, & H. Motoda, PAKDD 2015. Lecture Notes
in Computer Science (Vol. 9078, pp. 649-661). Springer.
Lodge, T., Crabtree, A., & Brown, A. (2018). Developing GDPR Compliant Apps
for the Edge. ESORICS International Workshop on Data Privacy
Management, Cryptocurrencies and Blockchain Technology (pp. 313-
328). Springer.
Lucky, M., Cremaschi, M., Lodigiani, B., Menolascina, A., & De Paoli, F. (2014).
Enriching API Descriptions by Adding API Profiles Through Semantic
Annotation. Collaborative Systems for Smart Networked Environments.
PRO-VE 2014. IFIP Advances in Information and Communication
Technology. Springer.
Martínez, S., Fouche, A., Gérard, S., & J., C. (2018). Automatic Generation of
Security Compliant (Virtual) Model Views. In Conceptual Modeling. ER
2018. Lecture Notes in Computer Science (Vol. 11157). Springer.
Michelberger, B., Andris, R., Girit, H., & Mutschler, B. (2013). A Literature Survey on
Information Logistics. BIS 2013. Lecture Notes in Business Information
Processing. 157. Berlin, Heidelberg: Springer.
Moody, D., & Walsh, P. (1999). Measuring the value of information: an asset
valuation approach. European Conference on Information Systems.
Mouradian, C., Naboulsi, D., Yangui, S., Glitho, R. H., Morrow, M. J., & Polakos, P.
A. (2018). A Comprehensive Survey on Fog Computing: State-of-the-Art
and Research Challenges. IEEE Communications Surveys & Tutorials, 20(1),
416-464. doi:10.1109/COMST.2017.2771153
Open Grid Forum. (2014). Web Services Agreement Specification (WS-
Agreement). Open Grid Forum.
Pallas, F., & Grambow, M. (2018). Three Tales of Disillusion: Benchmarking Property Preserving Encryption Schemes. International Conference on Trust and Privacy in Digital Business. Springer.
Pearson, S., & Mont, M. (2011). Sticky Policies: An Approach for Managing Privacy
across Multiple Parties. IEEE Computer, 44(9), 61-68.
Petychakis, M., Alvertis, I., Biliri, E., Tsouroplis, R., Lampathaki, F., & Askounis, D.
(2014). Enterprise Collaboration Framework for Managing, Advancing and
Unifying the Functionality of Multiple Cloud-Based Services with the Help
of a Graph API. Collaborative Systems for Smart Networked Environments.
PRO-VE 2014. IFIP Advances in Information and Communication
Technology. Springer.
Pham, V., & Huh, E. (2017). A Fog/Cloud based data delivery model for publish-
subscribe systems. 2017 International Conference on Information
Networking (ICOIN), (pp. 477-479). Da Nang.
doi:10.1109/ICOIN.2017.7899539
Pretschner, A., Hilty, M., & Basin, D. (2006). Distributed Usage Control.
Communications of the ACM, 49(9), 39-44.
Qin, Y., Sheng, Q. Z., Falkner, N. J., Dustdar, S., Wang, H., & Vasilakos, A. V. (2016).
When things matter: A survey on data-centric internet of things. Journal of
Network and Computer Applications, 64, 137-153.
doi:https://doi.org/10.1016/j.jnca.2015.12.016
Sadeghi, A.-R., & Stüble, C. (2004). Property-based Attestation for Computing Platforms: Caring about properties, not mechanisms. New Security Paradigms Workshop.
Salman, O., Elhajj, I., Chehab, A., & Kayssi, A. (2018). IoT survey: An SDN and fog
computing perspective. Computer Networks, 143, 221-246.
doi:https://doi.org/10.1016/j.comnet.2018.07.020.
Sebastiani, R., Giorgini, P., & Mylopoulos, J. (2004). Simple and minimum-cost
satisfiability for goal models. Proc. of Int. Conference on Advanced
Information Systems Engineering (pp. 20-35). Springer.
Shamir, A. (1979). How to share a secret. Communications of the ACM, 612-613.
Sharma, Chatterjee, S., & Sharma, D. (2013). CloudView: Enabling tenants to
monitor and control their cloud instantiations. IFIP/IEEE International
Symposium on Integrated Network Management, (pp. 443-449). GHENT.
Surwase, V. (2016). REST API Modeling Languages -A Developer’s Perspective.
IJSTE - International Journal of Science Technology & Engineering, 2(10).
Syed, M., & Syed, S. (2008). Handbook of Research on Modern Systems Analysis
and Design Technologies and Applications. Hershey: Information Science
Reference - Imprint of: IGI Publishing.
Taleb, T., Dutta, S., Ksentini, A., Iqbal, M., & Flinck, H. (2017). Mobile Edge
Computing Potential in Making Cities Smarter. IEEE Communications
Magazine (pp. 38-44). IEEE.
Thi, Q., Si, T., & Dang, T. (2018). Fine Grained Attribute Based Access Control
Model for Privacy Protection. In T. Dang, R. Wagner, J. Küng, N. Thoai, M.
Takizawa, & E. Neuhold, Future Data and Security Engineering, FDSE 2016.
Lecture Notes in Computer Science (Vol. 10018). Cham: Springer.
Tsouroplis, R., Petychakis, M., Alvertis, I., Biliri, E., Lampathaki, F., & Askounis, D.
(2015). Internet-Based Enterprise Innovation Through a Community-Based
API Builder to Manage APIs. Current Trends in Web Engineering. ICWE 2015.
Lecture Notes in Computer Science. 9396. Springer.
Ulbricht, M.-R., & Pallas, F. (2018). YaPPL - A Lightweight Privacy Preference Language for Legally Sufficient and Automated Consent Provision in IoT Scenarios. Data Privacy Management, Cryptocurrencies and Blockchain Technology (pp. 329-344). Springer.
Ullah, K. W., Ahmed, A. S., & Ylitalo, J. (2013). Towards Building an Automated
Security Compliance Tool for the Cloud. 12th IEEE International
Conference on Trust, Security and Privacy in Computing and
Communications (pp. 1587-1593). Melbourne: VIC.
Varshney, P., & Simmhan, Y. (2017). Demystifying Fog Computing: Characterizing
Architectures, Applications and Abstractions. IEEE 1st International
Conference on Fog and Edge Computing (ICFEC).
Verma, S., Yadav, A. K., Motwani, D., Raw, R. S., & Singh, H. K. (2016). An efficient
data replication and load balancing technique for fog computing
environment. 2016 3rd International Conference on Computing for
Sustainable Global Development (INDIACom), (pp. 2888-2895). New Delhi.
Vidyasankar, K. (2018). Distributing Computations in Fog Architectures.
Proceedings of the 2018 Workshop on Theory and Practice for Integrated
Cloud, Fog and Edge Computing Paradigms (TOPIC '18) (pp. 3-8). New
York: ACM. doi:https://doi.org/10.1145/3229774.3229775
Wagner, I., & Boiten, E. (2018). Privacy Risk Assessment: From Art to Science.
Metrics.
Wagner, I., & Eckhoff, D. (2018). Technical Privacy Metrics: A Systematic Survey.
ACM Computer Survey, 51(3), 57:1-57:38.
doi:https://doi.org/10.1145/3168389
Wang, J., Zhu, X., Bao, W., & Liu, L. (2016). A utility-aware approach to redundant
data up-load in cooperative mobile cloud. 9th IEEE International
Conference on Cloud Computing, (pp. 384-391). San Francisco.
Werner, S., Pallas, F., & Bermbach, D. (2017). Designing Suitable Access Control
for Web-Connected Smart Home Platforms. International Conference on
Service-Oriented Computing. Springer.
Yin, B., Cheng, Y., Cai, L., & Cao, X. (2017). Online SLA-aware Multi-Resource
Allocation for Deadline Sensitive Jobs in Edge-Clouds. GLOBECOM 2017 -
2017 IEEE Global Communications Conference.
Zaveri, A., Dastgheib, S., Wu, C., Whetzel, T., Verborgh, R., Avillach, P., . . .
Dumontier, M. (2017). smartAPI: Towards a More Intelligent Network of
Web APIs. In: Blomqvist E., Maynard D., Gangemi A., Hoekstra. The
Semantic Web. ESWC 2017. Lecture Notes in Computer Science. 10250.
Springer.
ANNEX 1: DITAS Business and Technical Requirements
WP1 – Requirement, Architecture and Validation Approach
Technical Requirements
ID T1.1
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|Performance
Component that fulfils it VDC
Description Reduce and process the data on the Edge/IoT side
before they reach a central location such as the
Cloud
Rationale Processing and reducing data before they are stored in a central location allows the distribution of tasks and reduces the resources needed for the storage and transmission of data.
Test case / Acceptance
criteria
In the e-Health use case:
As a DPO, I want data to be pseudonymized upon transfer to the hospital group cloud, so that the hospital is compliant with the GDPR
Time Frame Report period 2
ID T1.2
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Interoperability
Component that fulfils it VDC
Description Harmonize the data coming from different data
sources (data heterogeneity)
Rationale There is an increasing need to develop data-inten-
sive applications able to manage data coming
from heterogeneous sources
Test case / Acceptance
criteria
Develop a method that consumes data from two
data sources
Write a simple blueprint for that method
Deploy a VDC from that blueprint
Using a tool like Postman call that method
Check that indeed data from the two data
sources is returned
Time Frame Report period 1
ID T1.3
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Security
Component that fulfils it DAL, Policy Enforcement Engine
Description Respect privacy requirements (such as GDPR com-
pliance) in data movement transactions and data
access
Rationale The GDPR imposes strict limitations on how to manage personal data. The DITAS platform must respect such limitations in order to assure its customers that the service offered is compliant and that they will not run into the severe consequences specified by the law.
Test case / Acceptance
criteria
In the e-Health use case:
A researcher accessing the data cannot access personal patient data
A researcher accessing the data in the public cloud, after the data has been moved from the private cloud to the public cloud, gets the data encrypted
Time Frame Report period 2
ID T1.4
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Interoperability
Component that fulfils it CAF
Description Simplify the exposed data to third party users/clients
Rationale DITAS aims at enabling data-intensive application
developers to focus only on the business logic of the
application
Test case / Acceptance
criteria
Using Node-RED implement a method that re-
trieves and combines data from two data sources
Measure the time expended implementing that
method
Implement a script that consumes that method
Measure the time expended developing that
script
The difference between the two computed times
is (roughly) the time saved by the developer
Time Frame Report period 2
ID T1.5
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Interoperability
Component that fulfils it CAF, DAL
Description Simplify the data access from Fog and Cloud
Rationale The simplification of the usage of the DITAS platform will help the application developers and therefore decrease its already steep learning curve.
Test case / Acceptance
criteria
Using Node-RED implement a method that re-
trieves and combines data from two data sources:
one at the edge and another one in the cloud
Measure the time expended implementing that
method
Implement a script that consumes that method
Measure the time expended developing that
script
The difference between the two computed times
is (roughly) the time saved by the developer
Time Frame Report period 2
ID T1.6
Requirement Type Non-functional
Source Questionnaire
Priority | Category Should|Interoperability
Component that fulfils it Abstract Blueprint, SLA Manager
Description Have a flexible agreement between data provider
and consumer (e.g. latency < 100 ms while availa-
bility > 99.999%)
Rationale DITAS provides agreements about the quality of data between the data provider and the data user; the SLA system in place will verify that those agreements are met. This is performed via constant monitoring of the agreement.
Test case / Acceptance
criteria
Given a blueprint with QoS constraints and an ab-
stract properties goal tree, verify upon a VDC de-
ployment that an SLA is created in the SLA Manager
for every method defined in the blueprint
Time Frame Report period 1
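One possible automation of the criterion above is sketched below; the blueprint layout (exposed_api.methods) and the SLA Manager /agreements endpoint are assumptions for the sketch, not the component's documented API.

import requests

def test_sla_created_per_method(blueprint, sla_manager_url):
    # Assumed blueprint layout: the exposed API section lists the VDC methods.
    methods = {m["name"] for m in blueprint["exposed_api"]["methods"]}
    # Assumed SLA Manager endpoint returning the list of active agreements.
    agreements = requests.get(f"{sla_manager_url}/agreements", timeout=10).json()
    covered = {agreement["method"] for agreement in agreements}
    missing = methods - covered
    assert not missing, f"no SLA agreement created for methods: {missing}"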
ID T1.7
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Interoperability
Component that fulfils it Concrete Blueprint, Application Requirements
Description Have the possibility to express data quality con-
straints
Rationale Data-intensive applications need not only to access data, but to access data whose quality reflects their requirements
Test case / Acceptance
criteria
As a DITAS application designer, I want to specify quality metrics on the data the application will use, for the selection of the blueprint.
Time Frame Report period 2
ID T1.8
Requirement Type Non-functional
Source Questionnaire
Priority | Category Should|Security
Component that fulfils it VDC Request Monitor
Description Have the possibility to monitor how data is provi-
sioned or consumed
Rationale Data handled by DITAS can contain sensitive information, which requires keeping track of access to, and movement of, that data (auditing)
Test case / Acceptance
criteria
As a DITAS operator, I want to get metrics on requests coming to and going out from a VDC, so that I can monitor how data is provisioned or consumed
Time Frame Report period 2
ID T1.9
Requirement Type Non-functional
Source Questionnaire
Priority | Category Should|Security
Component that fulfils it Monitoring System
Description Keep track of the data transformations occurring during data movement
Rationale As data can be sensitive, it is imperative that changes to the data are tracked. With this tracking, an audit of data changes is possible
Test case / Acceptance
criteria
As a DITAS operator, in the e-Health use case, when data is moved, I can look at the records of the transformations
Time Frame Report period 2
WP2 - Enhanced data management
ID T2.1
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Performance
Component that fulfils it Abstract Blueprint
Description Metadata describing the data sources must be
available. This enables the computation of the data
utility matching application developer require-
ments and data source capabilities.
Rationale Metadata describing the content of a data source and its non-functional properties are essential for computing both the Potential Data Utility and the Data Utility of a data source according to the application developer's needs.
Test case / Acceptance
criteria
When the data administrator creates the blueprint, then the data source must be characterised with metadata about its general characteristics, QoS, QoD, deployment process, and API access.
Time Frame Report period 1
ID T2.2
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Abstract Blueprint
Description A sample set might be obtainable from the data
source for evaluating the quality of the matching
between a data source and application require-
ments.
Rationale Sample data gathered from a data source according to the application developer's needs are used to validate the proposed match between the application developer requirements and the data source features at design time.
Test case / Acceptance
criteria
When the data administrator creates the blueprint, then the datasource section of the blueprint must contain a reference to a file holding a representative sample of the data source's dataset.
Time Frame Report period 2
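To make T2.1 and T2.2 concrete, the data-source section of a blueprint could look roughly as follows; every key and value here is an illustrative assumption, not the normative blueprint schema.

# Hypothetical data-source characterisation covering T2.1 (general
# characteristics, QoS, QoD, deployment, API access) and T2.2 (sample file).
datasource_section = {
    "general": {"name": "blood-tests", "type": "relational", "owner": "OSR"},
    "qos": {"availability": 99.9, "latency_ms": 100},    # quality of service
    "qod": {"accuracy": 0.98, "completeness": 0.80},     # quality of data
    "deployment": {"image": "registry.example/vdc:1.0",
                   "orchestrator": "kubernetes"},
    "api_access": {"protocol": "REST", "spec": "openapi-3.0"},
    # T2.2: reference to a representative sample of the data source's dataset.
    "sample_reference": "https://example.org/samples/blood-tests-sample.csv",
}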
ID T2.3
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability/Maintainability
Component that fulfils it Abstract Blueprint, DUE
Description Metadata describing the potential data utility of a data source, and the data utility of the matching between application requirements and data source, are saved as metadata
Rationale Data Utility metadata can be used for selecting the proper data source for an application developer request, and for monitoring the trends in data utility for a running application in order to detect possible issues and react to inefficiencies
Test case / Acceptance
criteria
When the data administrator submits the blueprint,
then the data quality of the data source will be an-
alysed and saved in the abstract blueprint
Time Frame Report period 2
ID T2.4
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Abstract Blueprint
Description Functional dependencies and constraints on data
must be specified.
Rationale This requirement makes it possible to calculate the data quality.
Test case / Acceptance
criteria
When the data administrator creates the blueprint,
then the datasource must be characterised with
such rules, if they exist.
Time Frame Report period 2
ID T2.5
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Abstract Blueprint
Description Metadata describing the input required by each method must be available. Such information comprises the input required for each method to be available and the type of data (e.g., plain data, encrypted, pseudonymized, anonymised).
Rationale Metadata on the input of methods will be used to understand which portions of the data are needed by the VDC and can therefore be moved.
Test case / Acceptance
criteria
When the data administrator creates the blueprint, then, for each method, the input (in terms of columns for each data source used) and the type of data must be specified
Time Frame Report period 2
ID T2.6
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Abstract Blueprint, Concrete Blueprint
Description The list of computational and storage resources that are made available by the data administrator and the application designer must be available.
Rationale The list of resources will be used for the data and
computation movement.
Test case / Acceptance
criteria
When the data administrator creates the blueprint,
then the list of resources that he shares with the plat-
form must be specified.
When the application designer selects a blueprint,
then he must specify the list of resources that he
shares with the platform.
Time Frame Report period 2
ID T2.7
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it DAL
Description Metadata describing the amount of resources used
by a datasource (in terms of CPUs, space, memory,
etc) must be available.
Rationale Such information will be used to understand if a datasource can be moved to a new node.
Test case / Acceptance
criteria
When the DS4M calls the DAL, then the DAL will provide the information on the resources used by the data sources.
Time Frame Report period 2
ID T2.8
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Concrete Blueprint
Description The method selected by the application designer should be recorded and known after the deployment of the VDC.
Rationale In order to perform a computation movement, the DS4M will compare the resources used by the method with the target resource where the VDC will be moved. In order to do this, the DS4M needs to know which method, among the ones offered by the VDC, is being used by the application.
Test case / Acceptance
criteria
When the VDC is deployed, then the information on
the method that has been selected by the user
must be injected in the blueprint.
Time Frame Report period 2
ID T2.9
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Data Analytics
Description Metadata describing the amount of resources used
by a node inside the DITAS cluster (in terms of CPUs,
space, memory, etc) must be available.
Rationale Such information will be used to understand if a datasource can be moved to a new node.
Test case / Acceptance
criteria
When the DS4M calls the data analytics, then the data analytics will provide the information on the resources used by the nodes.
Time Frame Report period 2
ID T2.10
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it VDC Request Monitor
Description Metadata regarding incoming and outgoing re-
quests as well as regarding available encryption of
these requests must be available.
Rationale Such information will be used to make decisions about data/computation movement and for reporting to the auditing and compliance framework.
Test case / Acceptance
criteria
When a request to a VDC is made, then information about the request time, method, and status code becomes available in the monitoring database. Furthermore, even for an SSL request, a minimal set of monitoring information is collected in the monitoring database.
Time Frame Report period 2
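A sketch of the per-request record that would satisfy this criterion is shown below; the field names are assumptions chosen to mirror the criterion, not the VDC Request Monitor's actual schema.

from datetime import datetime, timezone

def build_request_record(method, status_code, encrypted):
    # One monitoring-database entry per request made to the VDC.
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # request time
        "method": method,                                     # VDC method invoked
        "status_code": status_code,
        # For SSL requests only this minimal set of fields would be collected.
        "encrypted": encrypted,
    }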
ID T2.11
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it Abstract Blueprint
Description Metadata about privacy guarantees of a VDC
must be available.
Rationale Policy enforcement engine should be deployed
and configured for the VDCs using such privacy
guarantees.
Test case / Acceptance
criteria
When the data administrator creates the blueprint,
then, for each method, he can define the privacy
guarantees in the abstract blueprint.
Time Frame Report period 1
WP3 - Data virtualization
Business Requirements
ID B3.1
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|Performance
Component that fulfils it Abstract Blueprint
Description A variety of different “modes” could be described in the VDC Blueprint.
Rationale Achieve simpler and more economical manage-
ment of the resources.
Test case / Acceptance
criteria
The Data Administrator will be able to describe the different modes in the Abstract Blueprint.
Time Frame Report period 1
ID B3.2
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|Performance
Component that fulfils it Abstract Blueprint
Description The VDC Blueprint must be able to describe and handle different performance factors, such as information about the energy consumption and energy efficiency of the components and architecture.
Rationale The VDC Blueprint should consider and handle dif-
ferent performance factors.
Test case / Acceptance
criteria
The Data Administrator will be able to describe the different energy modes of their services in the Abstract Blueprint.
Time Frame Report period 1
ID B3.3
Requirement Type Functional
Source Questionnaire
Priority | Category Must|OpenSource
Component that fulfils it Abstract Blueprint
Description Should have an open API in order for big vendors and also new providers to be able to publish their services and components.
Rationale The VDC Blueprint should be easy to understand
and use.
Test case / Acceptance
criteria
Extend the OpenAPI specification, which is a homogeneous, standardised solution for describing REST services.
Time Frame Report period 1
ID B3.4
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|OpenSource
Component that fulfils it Abstract Blueprint
Description Documentation that describes each method and each attribute must be included.
Rationale The VDC Blueprint should be easy to understand
and use.
Test case / Acceptance
criteria
Use of GitHub for version control and documenting
all the components of Abstract Blueprint.
Time Frame Report period 1
ID B3.5
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|OpenSource/Extensibility
Component that fulfils it Abstract Blueprint
Description Be open to being used for alternative solutions that may arise in the future.
Rationale The VDC Blueprint should be easy to understand
and use.
Test case / Acceptance
criteria
The blueprint schema is highly extensible and is also able to encapsulate another blueprint inside it
Time Frame Report period 1
ID B3.6
Requirement Type Non-functional
Source Questionnaire
Priority | Category Should|Extensibility
Component that fulfils it Abstract Blueprint
Description Be open to be used for alternative architectures
and dynamic systems.
Rationale The VDC Blueprint should be architectural agnostic.
Test case / Acceptance
criteria
VDC is able to be deployed in different systems with
different computational capabilities
(Embedded Devices, X64 machines)
Time Frame Report period 1
ID B3.7
Requirement Type Non-functional
Source Questionnaire
Priority | Category Could|Extensibility
Component that fulfils it Abstract Blueprint
Description Be able to use/reuse “on the fly” different VDC Blue-
prints from different repositories.
Rationale The VDC Blueprint should be architectural agnostic.
Test case / Acceptance
criteria
The blueprint schema is highly extensible and is also able to encapsulate another blueprint inside it
Time Frame Report period 1
Technical Requirements
ID T3.1
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Abstract Blueprint
Description The VDC must be data-source independent.
Rationale The structure of the metadata format, as well as the way the metadata are saved, must be orchestrated in a way that every future candidate File System will be supported as a data source.
Test case / Acceptance
criteria
Abstract Blueprint contains information about the
data sources, whose data the VDC exposes, where
the data sources might be of different types, e.g.
parquet files consumed using S3 API, DBMS tables
consumed using JDBC, that enable the VDC to con-
nect to these various data sources.
Time Frame Report period 1
ID T3.2
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Maintainability
Component that fulfils it PSE
Description Catch errors and security incompatibilities/vulnera-
bilities before the production/deployment.
Rationale The VDC should be evaluated for any security risk before the actual deployment in order to avoid any breach of the exposed data.
Test case / Acceptance
criteria
Security and Privacy Meta-Model and Privacy Se-
curity Evaluator Service allow for pre-deployment fil-
tering.
Time Frame Report period 1
ID T3.3
Requirement Type Functional
Source DITAS Analysis
Priority | Category Should|Maintainability/Security
Component that fulfils it Abstract Blueprint
Description VDC Schema should be able to describe not only
capabilities of the VDC but also describe the pro-
cesses for deployment.
Rationale The structure of the VDC Schema must not only be declarative but also imperative, providing the essentials to the DITAS Platform in order to understand what should happen step by step.
Test case / Acceptance
criteria
The Abstract Blueprint contains information about the configuration and orchestration for deploying the VDC
Time Frame Report period 1
ID T3.4
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Availability
Component that fulfils it Deployment Engine
Description Candidate resolutions should be automatically tested, reviewed, built and deployed after the deployment of the VDC
Rationale The processes for the deployment phase should be
transparent to the end user.
Test case / Acceptance
criteria
• Define two blueprints with one method each, one with a right implementation of the method and a second one with a wrong implementation of the same.
• Deploy the first blueprint: no warning message should be received from the Deployment Engine.
• Deploy the second one: the Deployment Engine must warn about the non-availability of the method and stop the deployment.
Time Frame Report period 2
ID T3.5
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Performance
Component that fulfils it Abstract Blueprint
Description Take advantage of pre-built community-based
blueprints (parts or components)
Rationale The expandability and re-usability of VDC components and blueprints are a key factor for the simplicity of the VDC file.
Test case / Acceptance
criteria
The blueprint schema is highly extensible and is also able to encapsulate another blueprint inside it
Time Frame Report period 1
ID T3.6
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Maintainability/ Extensibility
Component that fulfils it Abstract Blueprint
Description Implicitly support disaster recovery and business continuity characteristics in the VDC Blueprint.
Rationale The DITAS Platform should provide business continu-
ity to the application service model.
Test case / Acceptance
criteria
Using the Blueprint Schema data administrators can
describe the services and provide information
about how the services can handle disaster recov-
ery
Time Frame Report period 1
ID T3.7
Requirement Type Functional
Source DITAS Analysis
Priority | Category Should|Maintainability
Component that fulfils it Abstract Blueprint
Description Monitor the infrastructure upgrade offerings of the
same provider and blueprint and seamlessly up-
date the VDC without expensive rewrites.
Rationale Upgrades of the same blueprint should not stop the
runtime process.
Test case / Acceptance
criteria
The Abstract Blueprint consists of multiple sections that can easily change without affecting the running instances of a VDC Blueprint
Time Frame Report period 1
ID T3.8
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Structural
Component that fulfils it Abstract Blueprint
Description A multi-modal language should specify the differ-
ent components.
Rationale Several modes (components) will create a single artifact. This could align with the ISO/IEC 19506 standard, called Knowledge Discovery Meta-Model (KDM), which addresses existing software systems by ensuring interoperability and exchange of data between tools provided by different vendors.
Test case / Acceptance
criteria
Develop two types of VDCs, using different pro-
gramming languages
Time Frame Report period 1
ID T3.9
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Structural
Component that fulfils it Abstract Blueprint
Description The notation language should follow a semi-structured format.
Rationale The notation language should be fast, accurate and user friendly, and have a semi-structured format in order to be able to describe different aspects of heterogeneous data sources.
Test case / Acceptance
criteria
The Abstract Blueprint Schema is created using JSON, a semi-structured notation language.
Time Frame Report period 1
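Since the Abstract Blueprint is plain JSON, any language with a JSON parser can consume it; the snippet below illustrates this in Python, with assumed (heavily abridged) section names.

import json

# Assumed, abridged blueprint: only the section names are illustrative.
raw = '{"general": {"name": "demo-vdc"}, "exposed_api": {"methods": []}}'
blueprint = json.loads(raw)
print(blueprint["general"]["name"])  # -> demo-vdc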
ID T3.10
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Extensibility
Component that fulfils it Abstract Blueprint, RE
Description The parsing mechanism/resolution should be able
to handle multiple different resolution tasks and be
able to scale.
Rationale VDC Resolution process should be distributed into
services that assess the different blueprint sections.
In this way, if a section of the blueprint is changed, this will not affect the complete resolution process.
Test case / Acceptance
criteria
The Abstract Blueprint consists of multiple sections that are accessed by different distributed resolution services
Time Frame Report period 1
ID T3.11
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Structural/ Optimization
Component that fulfils it Abstract Blueprint
Description The notation language should be able to be parsed
efficiently and fast.
Rationale The notation language should be fast, accurate
and user friendly.
Test case / Acceptance
criteria
The Abstract Blueprint Schema is created using JSON, a semi-structured notation language
Time Frame Report period 1
ID T3.12
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Structural
Component that fulfils it Abstract Blueprint
Description The notation language must be human readable,
easy to script and to understand.
Rationale The notation language should hide the complexity
of the architecture and components and provide
also a nice overview of the specific component.
Test case / Acceptance
criteria
The Abstract Blueprint Schema is created using JSON, a semi-structured notation language
Time Frame Report period 1
ID T3.13
Requirement Type Functional
Source DITAS Analysis
Priority | Category Would|Compatibility
Component that fulfils it Abstract Blueprint
Description The VDC blueprint should be open enough for new structural changes and be able to handle encrypted entries in addition to plain text.
Rationale The expandability of the VDC description Schema is
crucial.
Test case / Acceptance
criteria
The blueprint schema is highly extensible and described in a modularized way that allows the creation of new sections.
Time Frame Report period 1
ID T3.14
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Compatibility
Component that fulfils it Abstract Blueprint
Description The notation language should be able to be parsed
by multiple different programming languages.
Rationale It should be able to be programmatically understood and used by multiple programming-language families.
Test case / Acceptance
criteria
The Abstract Blueprint Schema is created using JSON, a semi-structured notation language
Time Frame Report period 1
ID T3.15
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Extensibility/ Interoperability
Component that fulfils it CAF, DAL
Description Make the access to data in the Cloud, edge and
fog transparent to the application, overcoming lim-
itations and notions such as running location, loca-
tion of data, bandwidth.
Rationale Application developers should be able to run their application without having to consider where the DITAS platform decides to put the computation or the data, and the developer shouldn't have to rewrite it when the fog topology changes.
Test case / Acceptance
criteria
Application can access data that is deployed in the
cloud and in the fog. It continues to get data ex-
posed by the VDC even after either data or com-
putation movement have occurred.
Time Frame Report period 2
ID T3.16
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DUR
Description The Data Utility Refiner has to weigh the goal model specified in the application requirements according to the type of application that will use the data.
Rationale The importance of each non-functional require-
ment varies based on the application that will use
the data source. E.g., in streaming applications, la-
tency is generally more important than reliability.
Test case / Acceptance
criteria
When the DURE invokes the DUR, then the DUR
should weigh the goal model defined by the appli-
cation developer according to the application
type.
Time Frame Report period 2
ID T3.17
Requirement Type Functional
Source DITAS Analysis
Priority | Category Should|Extensibility
Component that fulfils it DUR
Description To address new application types, the Data Utility Refiner should easily allow new algorithms and weighting schemes to be introduced.
Rationale As new types of applications are introduced (e.g., big data and machine learning techniques), existing weighting schemes may not be adequate for them. Hence the need for new schemes.
Test case / Acceptance
criteria
When a new application type is introduced, then
the DUR should allow the definition of a new
weighting scheme.
Time Frame Report period 2
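The extensibility T3.17 asks for could be realised with a simple registry of weighting schemes, so that a new application type only registers its own scheme without touching the DUR core; the sketch below is one possible shape, with all names assumed.

from typing import Callable, Dict

# A weighting scheme maps raw goal weights to application-type-specific ones.
WeightingScheme = Callable[[Dict[str, float]], Dict[str, float]]
_SCHEMES: Dict[str, WeightingScheme] = {}

def register_scheme(app_type: str, scheme: WeightingScheme) -> None:
    # Adding support for a new application type is a single registration call.
    _SCHEMES[app_type] = scheme

def weigh_goal_model(app_type: str, goals: Dict[str, float]) -> Dict[str, float]:
    return _SCHEMES[app_type](goals)

# Example: a "streaming" application type doubles the weight of latency.
register_scheme("streaming",
                lambda g: {**g, "latency": g.get("latency", 1.0) * 2})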
ID T3.18
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DUE@VDM
Description The Data Utility Evaluator has to update potential
data utility values based on the actual usage of the
exposed data source
Rationale As data utility, and in particular data quality attributes, change based on the characteristics of the data that are requested (e.g., only a subset of all the available data is requested), their assessment has to be made per request.
Test case / Acceptance
criteria
When the DURE invokes the DUE, then the DUE updates the potential data utility values if the request considers a subset of the attributes provided by a specific method.
Time Frame Report period 2
ID T3.19
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it PSE
Description The Privacy and Security Evaluator must determine
if and how well privacy and security attributes fit the
application developer requirements
Rationale Data sources that do not fulfill requirements on pri-
vacy and security should be discarded. Also, data
sources that offer significantly better security and/or
privacy than others should be favored.
Test case / Acceptance
criteria
When the DURE invokes the PSE, then the PSE must
filter out and reasonably rank blueprints according
to the specified security and privacy requirements
Time Frame Report period 2
ID T3.20
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Application Requirements
Description Application requirements must specify constraints
on non-functional requirements with a goal model.
Rationale In order to properly filter and rank the blueprints, specific application requirements should be taken into consideration in the resolution process.
Test case / Acceptance
criteria
When the application developer specifies the ap-
plication requirement, then the requirements must
be expressed with a goal model.
Time Frame Report period 2
ID T3.21
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it RE
Description The Resolution Engine must provide to the Data Util-
ity Resolution Engine the non-functional require-
ments as specified by the application developer.
Rationale It is important for the individual resolution services to
be able to communicate and send and retrieve the
appropriate data essential for each process.
Test case / Acceptance
criteria
Test the interoperability and communication pay-
load of the services. When the RE invokes the DURE,
then the RE must provide the non-functional re-
quirements as specified by the application devel-
oper
Time Frame Report period 2
ID T3.22
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it RE
Description The Resolution Engine must provide to the Data Util-
ity Resolution Engine non-functional attributes re-
lated to each blueprint that needs to be ranked
and filtered.
Rationale It is important for the individual resolution services to
be able to communicate and send and retrieve the
appropriate data essential for each process.
Test case / Acceptance
criteria
Test the interoperability and communication payload of the services. When the RE invokes the DURE, then the RE must provide a list of blueprints to filter and rank, together with their non-functional attributes.
Time Frame Report period 2
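An assumed shape of the RE-to-DURE invocation covering T3.21 and T3.22 is sketched below: the developer's non-functional requirements plus the candidate blueprints, each carrying the non-functional attributes to filter and rank on. The field names are illustrative only.

# Hypothetical payload sent by the Resolution Engine to the DURE.
dure_request = {
    "application_requirements": {   # T3.21: the developer's goal model
        "goals": [{"metric": "latency_ms", "operator": "<", "value": 100}],
    },
    "candidate_blueprints": [       # T3.22: blueprints with NF attributes
        {"id": "bp-1", "attributes": {"latency_ms": 80, "availability": 99.99}},
        {"id": "bp-2", "attributes": {"latency_ms": 150, "availability": 99.9}},
    ],
}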
WP4 - Execution environment
Business Requirements
ID B4.1
Requirement Type Functional
Source Questionnaire
Priority | Category Would|Maintainability
Component that fulfils it Deployment Engine
Description The DITAS Platform should be able to rebuild or
move the entire production infrastructure from bare
metal.
Rationale DITAS aims to be a set of tools that are able to be
installed in different Cloud and Edge providers. It
needs to support several of those providers in a very
abstract way.
Test case / Acceptance
criteria
Test that the Deployment Engine is able to deploy to at least two different Cloud providers. This will demonstrate that the architecture is modular enough not to be Cloud-provider dependent.
Time Frame Report period 2
ID B4.2
Requirement Type Functional
Source Questionnaire
Priority | Category Would|Extensibility
Component that fulfils it All
Description The DITAS Platform components should have a source control repository.
Rationale There should be a single point of search for the components' source code, in case it is needed for testing and upgrading.
Test case / Acceptance
criteria
Verify that all DITAS components are hosted in a
source control version repository.
Time Frame Report period 2
ID B4.3
Requirement Type Non-functional
Source Questionnaire
Priority | Category Could|Performance
Component that fulfils it Deployment Engine
Description The DITAS Platform should be able to deploy the
components on time.
Rationale The deployment delay should not last long, as that may disturb the end user.
Test case / Acceptance
criteria
In collaboration with the two use cases of the project, verify that the DITAS Deployment Engine is able to deploy the DITAS platform in a reasonable time; also, verify that adaptation actions are not delayed due to Deployment Engine performance.
Time Frame Report period 2
Technical Requirements
ID T4.1
Requirement Type Non-functional
Source Questionnaire
Priority | Category Would|Maintainability
Component that fulfils it Deployment Engine
Description The DITAS Platform should be able to back up data in minutes.
Rationale Upgrades of the same blueprint should not stop the
runtime process.
Test case / Acceptance
criteria
Test that the deployment engine is able to perform a backup of running VDCs and VDMs.
Time Frame Report period 2
ID T4.2
Requirement Type Non-functional
Source Questionnaire
Priority | Category Would|Performance
Component that fulfils it Deployment Engine
Description Run the VDCs in isolated, independent environments at once for benchmarking.
Rationale VDCs should be optimized and checked for their performance and metrics.
Test case / Acceptance
criteria
Test that the Deployment Engine is able to create a VDC/VDM deployment independent of other instances running at the same time.
Time Frame Report period 2
ID T4.3
Requirement Type Non-functional
Source Questionnaire
Priority | Category Would|Performance
Component that fulfils it Deployment Engine and VDC/VDM components
Description Run VDC “equivalent” for architecture backup sce-
narios or for fault tolerance at architectural level.
Rationale VDCs should be optimized and checked for their performance and metrics.
Test case / Acceptance
criteria
Test that both VDM and Deployment Engine are
able to keep a synchronous copy of the VDC. The
VDC components should also be aware of this to
allow fault tolerance.
Time Frame Report period 2
ID T4.4
Requirement Type Functional
Source DITAS Analysis
Priority | Category Should|Extensibility
Component that fulfils it VDC Components
Description The DITAS SLA Manager must be able to run on Edge
and on Cloud independently.
Rationale A DITAS application can run both on Edge and on Cloud, and this component should be able to run at both levels, even with limited functionality.
Test case / Acceptance
criteria
When the VDC is moved to an edge device, then the limited resources must not negatively influence the performance of the VDC components.
Time Frame Report period 1
ID T4.5
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it SLA Manager
Description The DITAS SLA Manager will offer an API for configu-
ration and QoS definition.
Rationale The Decision System for Data Movement needs an
API to configure it for a given application.
Test case / Acceptance
criteria
When the VDC is deployed, then the SLA Manager
will receive in input the blueprint of the VDC.
Time Frame Report period 1
ID T4.6
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it SLA Manager
Description The SLA Manager will notify, via a queue system, any violations of the rules that trigger a data movement action.
Rationale Since several subsystems of DITAS will need to react to this situation, notification via a queue subsystem looks like the best option for scalability.
Test case / Acceptance
criteria
Test that SLA Manager is able to notify violations to
the queue subsystem.
Time Frame Report period 1
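A sketch of how such a notification could be published is given below, assuming a RabbitMQ-style queue reached through the pika client; the queue name and the message shape are illustrative assumptions.

import json
import pika

def notify_violation(rule, observed, threshold):
    # Publish one violation message on the (assumed) SLA violations queue.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="sla.violations", durable=True)
    message = {"rule": rule, "observed": observed, "threshold": threshold}
    channel.basic_publish(exchange="", routing_key="sla.violations",
                          body=json.dumps(message))
    connection.close()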
ID T4.7
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it SLA Manager
Description The SLA Manager will notify, via a queue system, any violations of the rules that trigger a data movement action.
Rationale Since several subsystems of DITAS will need to react to this situation, notification via a queue subsystem looks like the best option for scalability.
Test case / Acceptance
criteria
Test that SLA Manager is able to notify violations to
the queue subsystem.
Time Frame Report period 2
ID T4.8
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Computation movement enactor
Description Be able to create applications both at the Edge and in the Cloud.
Rationale To move computation tasks between the Edge and the Cloud, or vice versa, it is necessary to be able to create application containers in both environments.
Test case / Acceptance
criteria
Test that the Computation Movement Enactor is
able to create containers both at Edge and
Cloud.
Time Frame Report period 1
ID T4.9
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Computation movement enactor
Description Be able to add Spark nodes.
Rationale To move Spark computation between the Edge and the Cloud, or vice versa, it is necessary to be able to create Spark computation nodes both on the Edge and in the Cloud and assign applications to them, so that, step by step, the Spark scheduler can create resources there.
Test case / Acceptance
criteria
Test that it is possible to federate Spark nodes between Edge and Cloud and that deployments of Spark nodes can be created in each location.
Time Frame Report period 1
ID T4.10
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Data movement enactor
Description The Data Movement Enactor will offer an API to allow the decision system to instruct it on which data movement to execute, the target and the source, and the transformation to be applied to the data.
Rationale The decision system, once it has decided where to move data, will communicate its decision to the data movement enactor.
Test case / Acceptance
criteria
When the DS4M decides that a data movement has to be enacted, then it will call the DME using the API that it exposes, specifying where to move the datasource and which transformation to use.
Time Frame Report period 2
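A possible shape of the DS4M-to-DME call described above is sketched below; the endpoint and the JSON fields are illustrative assumptions, not the component's real API.

import requests

# Hypothetical movement request: source, target, and the transformation
# to apply to the data while moving it, as T4.10 requires.
movement_request = {
    "source": "mysql://private-cloud/ehr",
    "target": "s3://public-cloud/ehr-cache",
    "transformation": "pseudonymize",
}
requests.post("http://dme.example:8080/movements", json=movement_request, timeout=10)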
ID T4.11
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Computational movement enactor
Description The Computation Movement Enactor will offer an API to allow the decision system to instruct it on which computation movement to execute: the target and the source of the movement.
Rationale The decision system, once it has decided where to move computation, will communicate its decision to the computation movement enactor.
Test case / Acceptance
criteria
When the DS4M decides that a computation move-
ment has to be enacted, then it will call the CME
using the API that it exposes, specifying where to
move the VDC.
Time Frame Report period 2
ID T4.12
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Should|Security
Component that fulfils it All components
Description The API of the Decision System for data and com-
putation Movement should be available only to au-
thorized users
Rationale This is to prevent external (malicious) users from sending false violations to the system to trigger unwanted data/computation movements.
Test case / Acceptance
criteria
When a user that is not authenticated tries to call
the DS4M, then the DS4M will ignore the call.
Time Frame Report period 2
ID T4.13
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|
Component that fulfils it Data analytics
Description Metadata describing the resources that are made available by the data administrator and the application designer must be available. Such metadata must describe the available resources (in terms of space, memory, CPUs, etc.), their location, and the type of data that can be stored.
Rationale This information makes it possible to perform a computation movement and to understand if a resource can execute the method of the VDC that is moved.
Test case / Acceptance
criteria
When the DS4M receives a violation from the SLA manager, it needs to know how many resources a method of a VDC is using; it therefore calls the data analytics, which returns the amount of resources (in terms of CPU, RAM, storage space) used by the method.
Time Frame Report period 2
ID T4.14
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it VDC-Throughput-Agent
Description Metadata describing the behavior of the VDC as
well as the movement of data/computation must
be available.
Rationale The aggregated data is sent to the monitoring database used by the auditing and compliance framework.
Test case / Acceptance
criteria
When a VDC component performs network operations, the aggregated usage is stored in the monitoring database.
Time Frame Report period 2
ID T4.15
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it VDC-Logging-Agent
Description Metrics and measurements regarding various sys-
tem qualities must be available to make decisions
about violations of requirements and SLAs.
Rationale Data is used to trigger data/computation move-
ment and as part of the auditing and compliance
framework.
Test case / Acceptance
criteria
When a VDC component requests monitoring data to be stored, the data will eventually be stored in the monitoring database.
Time Frame Report period 2
ID T4.16
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it Policy Enforcement Engine, DAL
Description The VDC should expose data that is compliant with
privacy policies, if privacy attributes are defined for
the VDC.
Rationale
Test case / Acceptance
criteria
VDC with privacy attributes exposes only data that
is compliant with privacy policies.
Time Frame Report period 2
ID T4.17
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DURE
Description The metrics and the goal model should be well
formed in the concrete blueprint.
Rationale The creation of the blueprint is central for the communication between modules. In particular, the parts that define the goal model and the data quality metrics are needed by the DS4M and the SLA manager to check the violations and decide where and when to move data and computation.
Test case / Acceptance
criteria
Sections about the data utility and the abstract
properties of the concrete blueprint should be well
formed.
Time Frame Report period 2
ID T4.18
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DS4M
Description The DS4M must provide an interface to collect the
violations detected by the SLA manager.
Rationale The DS4M will expose an API interface, used by the SLA manager, to collect the violations of user requirements.
Test case / Acceptance
criteria
When a violation of a requirement happens, the
SLA manager will be able to correctly send the vio-
lation data to the DS4M.
Time Frame Report period 2
ID T4.19
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DS4M
Description The DS4M will create a well formed JSON file with
the data and computation movement to be en-
acted and deliver it respectively to the data move-
ment enactor and computation movement enac-
tor.
Rationale The definition of how and where to move data and
computation is stored in two JSON files that are sent
respectively to the data movement enactor and
the computation movement enactor. Such files must be well formed in order to allow the movement components to understand where to move the resources.
Test case / Acceptance
criteria
Test that the DS4M is able to create both files according to the spec. Test that the computation movement enactor and the data movement enactor are able to understand them and act on them.
Time Frame Report period 2
WP5 - Real world case studies and integration
IDEKO Use Case Requirements
ID EU1.F1
Requirement Type Functional
Priority | Category Must|N/A
Description As a data owner, I want the framework to provide data source description mechanisms, so that I can describe and publish my data sources to application developers.
Test case / Acceptance
criteria
Check the completion of the Blueprint Repository
Time Frame Report period 2
ID EU1.F2
Requirement Type Functional
Priority | Category Must|Flexibility
Description As data owner, I want the framework to provide a
mechanism to offer rich data-methods to develop-
ers. So that they don't have to build standard que-
ries as if they were connecting to the datasource
directly.
Test case / Acceptance
criteria
Check if the VDC architecture provides a flexible
mechanism to manage queries and data.
Time Frame Report period 2
ID EU1.F3
Requirement Type Functional
Priority | Category Must|Flexibility
Description As data owner, I want the rich method to be able
to query multiple sources before getting back the
data to the user. So that the user doesn't have to
call several methods to get data back from differ-
ent sources.
Test case / Acceptance
criteria
Check if the VDC allows querying several sources
from a single method.
Time Frame Report period 2
ID EU1.F4
Requirement Type Functional
Priority | Category Must|Flexibility
Description As data owner, I want the rich methods to be able
to perform data transformation. So that I can pro-
vide users with computed data instead on simple
raw-data.
Test case / Acceptance
criteria
Check if the VDC allows data transformation when
defining the methods.
Time Frame Report period 2
ID EU1.F5
Requirement Type Non-functional
Priority | Category Must|Security
Description As a data owner, I want to be able to publish my data sources without exposing anything more than necessary, so that my internal infrastructure stays safe.
Test case / Acceptance
criteria
Check if the data access interface, provides a
mechanism to avoid external access to local net-
work.
Time Frame Report period 2
ID EU1.F6
Requirement Type Functional
Priority | Category Must|Flexibility
Description As a data owner, I want to be able to query different types of data sources, so that I can publish the different types of data sources I own.
Test case / Acceptance
criteria
Check if the VDC allows querying different data
sources.
Time Frame Report period 2
ID EU1.F7
Requirement Type Non-functional
Priority | Category Must|Affordability
Description As an application designer, I want the framework to be deployable on the company's existing resources, so that I don't have to purchase additional resources or pay a fee to DITAS.
Test case / Acceptance
criteria
Check project business models and architecture.
Time Frame Report period 2
ID EU1.F8
Requirement Type Non-functional
Priority | Category Must|Performance
Description As application developer, I want the framework to
be smart enough to perform computation (data
transformation, etc.) in the agreed time (SLA) over-
coming incidents. So that the developer has only to
focus on business logic.
Test case / Acceptance
criteria
Check whether the framework applies mechanisms
to avoid possible SLA breaches regarding compu-
tation.
Time Frame Report period 2
ID EU1.F9
Requirement Type Non-functional
Priority | Category Must|Performance
Description As application developer, I want the framework to
provide the needed data under the parameters
defined in the SLA. So that I don't have to implement any extra mech-
anism to fulfil the application data needs.
Test case / Acceptance
criteria
Check whether the framework applies mechanisms
to avoid possible SLA breaches regarding data
needs.
Time Frame Report period 2
ID EU1.F10
Requirement Type Functional
Priority | Category Must|Availability
Description As application designer, I want the application to
be tolerant to server failures. So that the developer
has only to focus on business logic.
Test case / Acceptance
criteria
Simulate a server failure by shutting down a server. Check if Kubernetes takes over the situation by deploying new services.
Time Frame Report period 2
ID EU1.F11
Requirement Type Functional
Priority | Category Must|Simplicity
Description As a developer, I want the framework to ease data
gathering from data sources. So that I don't have to
manage all the connections and queries manually.
Test case / Acceptance
criteria
Check if the VDC allows querying several sources
from a single method.
Time Frame Report period 2
ID EU1.F12
Requirement Type Non-functional
Priority | Category Must|Interoperability
Description As a developer, I want the framework to provide a
simple interface to get data from the data sources.
So that I can use standard mechanisms to get the
data.
Test case / Acceptance
criteria
Check if the CAF is based on some of the widely
used communication mechanism, like REST.
Time Frame Report period 2
ID EU1.F13
Requirement Type Functional
Priority | Category Must|Simplicity
Description As a developer, I want the framework to enable
multi-source querying in a single call. So that I don't
have to make sequential calls to get back the
needed data.
Test case / Acceptance
criteria
Check if the VDC retrieves data from different data
sources from the same method.
Time Frame Report period 2
ID EU1.F14
Requirement Type Functional
Priority | Category Must|Flexibility
Description As a developer, I want the framework to enable
data transformation mechanisms to get trans-
formed data in a single call. So that I don't have to
perform data transformation on application side.
Test case / Acceptance
criteria
Check if the VDC allows transforming data from
within the method itself.
Time Frame Report period 2
IDEKO Use Case Application level Requirements
ID EU1.UC1
Requirement Type Functional
Priority | Category Must|Performance
Description As application designer, I want the framework to be
able to fallback to a different datasource when the
primary one fails. So that the error probability of a
data call decreases.
Test case / Acceptance
criteria
Check if the VDC allows the data administrator to
create fallback mechanisms.
Time Frame Report period 2
ID EU1.UC2
Requirement Type Functional
Priority | Category Must|Performance
Description As application developer, I want the framework to
serve streaming data for the machines within 2 sec-
onds since the data is created. So that I can per-
form analytics in a very short window.
Test case / Acceptance
criteria
Consume a method that provides a stream.
Check the timestamp of the data collected.
Compare it with the timestamp in the database (see the sketch below).
Time Frame Report period 2
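The two-second check could be performed along these lines; the record format, in particular an ISO-8601 creation timestamp with time zone set by the machine, is an assumption of the sketch.

from datetime import datetime, timezone

def within_two_seconds(record):
    # Assumption: each streamed record carries its creation time, with offset.
    created = datetime.fromisoformat(record["created_at"])
    received = datetime.now(timezone.utc)  # arrival time at the client
    return (received - created).total_seconds() <= 2.0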
ID EU1.UC3
Requirement Type Functional
Priority | Category Must|Performance
Description As an application developer, I want a VDC method to provide machine diagnostic results in near real time, so that I can act accordingly.
Test case / Acceptance
criteria
Use the machine simulator to introduce an anomaly.
Record the timestamp when the anomaly is introduced.
Check when the application detects the anomaly and record the timestamp.
Check if the difference is less than two seconds.
Time Frame Report period 2
OSR Use Case Requirements
ID EU2.F1
Requirement Type Functional
Priority | Category Must|Accuracy
Description As a researcher, I want to obtain accurate data
so that I can correctly perform my study.
Test case / Acceptance
criteria
Perform multiple queries and test if the average accuracy is ≥ 98%.
Time Frame Report period 2
ID EU2.F2
Requirement Type Non-functional
Priority | Category Must|Completeness
Description As a researcher, I want to obtain complete data so
that I can correctly perform my study.
Test case / Acceptance
criteria
Perform multiple queries and test if the average
completeness is ≥ 80%.
Time Frame Report period 2
ID EU2.F3
Requirement Type Non-functional
Priority | Category Must|Scalability
Description As head of research, I want to interact with a scalable system that accommodates all requests coming from my researchers, so that they can conduct different research studies at once.
Test case / Acceptance
criteria
Test if limit = 99% and gain = 1.5x
Time Frame Report period 2
ID EU2.F4
Requirement Type Non-functional
Priority | Category Must|Security
Description As a researcher, I want to obtain accurate data
so that I can correctly perform my study.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F5
Requirement Type Non-functional
Priority | Category Must|Security
Description As a researcher, I want to access minimized data so
that I know I am compliant with GDPR
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F6
Requirement Type Non-functional
Priority | Category Must|Security
Description As an internal researcher, I want to access pseu-
donymized data so that I know I am compliant with
GDPR.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F7
Requirement Type Non-functional
Priority | Category Must|Security
Description As DPO, I want data to be encrypted when trans-
ferred to a co-holder so that I know the hospital is
compliant with GDPR.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F8
Requirement Type Non-Functional
Priority | Category Must|Availability
Description As a doctor, I want data to be readily accessible so
that I can access them when I need them the most
during an emergency.
Test case / Acceptance
criteria
Check if availability is ≥ 99.9999%
Time Frame Report period 2
ID EU2.F9
Requirement Type Non-Functional
Priority | Category Must|Performance
Description As a doctor, I want fast processing of data so that I
do not slow down operations during an emergency.
Test case / Acceptance
criteria
Check if processing time is < 1s
Time Frame Report period 2
ID EU2.F10
Requirement Type Non-Functional
Priority | Category Must|Security
Description As a doctor, I want data to be re-identified upon
retrieval so that I can access all information availa-
ble related to the emergency at hand.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F11
Requirement Type Non-Functional
Priority | Category Must|Security
Description As DPO, I want data to be pseudonymized upon transfer to the hospital group cloud, so that the hospital is compliant with GDPR.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F12
Requirement Type Non-Functional
Priority | Category Must|Security
Description As DPO, I want data to be encrypted upon transfer to the hospital group cloud, so that the hospital is compliant with GDPR.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
OSR Use Case Application level Requirements
ID EU2.UC1
Requirement Type Functional
Priority | Category Must|Functional
Description As a user, I want to log into the system so that I can
start using the platform.
Test case / Acceptance
criteria
Test if login is successful for existing accounts.
Test if login fails for non-existing accounts.
Time Frame Report period 2
ID EU2.UC2
Requirement Type Functional
Priority | Category Would|Security
Description As a user, I want, upon login, to be able to ask for a password recovery email, so that I can access the system again if I do not remember my credentials.
Test case / Acceptance
criteria
Test if a recovery email is successfully sent to the
mailbox of an existing account.
Test if the user is prompted with an error if the mail-
box is not associated with an existing account.
Time Frame Report period 2
ID EU2.UC3
Requirement Type Functional
Priority | Category Must|Functional
Description As a user, I want to log into the system so that I can
start using the platform.
Test case / Acceptance
criteria
Test if login is successful for existing accounts.
Test if login fails for non-existing accounts.
Time Frame Report period 2
ID EU2.UC4
Requirement Type Functional
Priority | Category Should|Functional
Description As a user, I want to see my profile so that I can re-
view my personal information.
Test case / Acceptance
criteria
Test that each personal information is present in the
profile page.
Time Frame Report period 2
ID: EU2.UC5
Requirement Type: Functional
Priority | Category: Could | Security
Description: As a user, I want to change my password so that I can strengthen the security of my account.
Test case / Acceptance criteria: Test that, after changing the password, it is not possible to log in with the old one. Test that, after changing the password, it is possible to log in with the new password.
Time Frame: Report period 2
ID: EU2.UC6
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a researcher, I want to query the system for the average values extracted from available blood tests (by component name and age range) so that I can start my research.
Test case / Acceptance criteria: Search for an existing component (e.g., Fibrinogen). Search for an existing component in a loose age range (e.g., 0-110). Search for an existing component (e.g., Fibrinogen) in a wrong age range (e.g., 120-200). Search for a non-existing component.
Time Frame: Report period 2
ID: EU2.UC7
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a researcher, I want to query the system for gender, age, BMI, and cholesterol of patients that had (resp. did not have) a stroke so that I can start my research.
Test case / Acceptance criteria: Search for patients that did not have a stroke. Search for patients that had a stroke.
Time Frame: Report period 2
ID: EU2.UC8
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a researcher, I want to download the result set of a query so that I can process it off-line.
Test case / Acceptance criteria: Test if a given data view result can be downloaded via a link.
Time Frame: Report period 2
ID: EU2.UC9
Requirement Type: Functional
Priority | Category: Should | Functional
Description: As a researcher, I want to bookmark the query of a data view so that I can easily execute it again.
Test case / Acceptance criteria: Formulate a query. Bookmark it. Check that it appears in the list of favorite queries.
Time Frame: Report period 2
ID: EU2.UC10
Requirement Type: Functional
Priority | Category: Should | Functional
Description: As a researcher, I want to remove a query from my bookmarks so that I can stop tracking outdated queries.
Test case / Acceptance criteria: Go to the list of favorite queries. Remove one and check that it no longer appears.
Time Frame: Report period 2
ID: EU2.UC11
Requirement Type: Functional
Priority | Category: Would | Maintainability
Description: As a researcher, I want to receive a notification email when new data related to my favorite queries are available, so that I am informed of the availability of novel data.
Test case / Acceptance criteria: Trigger the event signaling the presence of new data. Check for the presence of the email.
Time Frame: Report period 2
ID: EU2.UC12
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a user, I want to log into the system so that I can start using the platform.
Test case / Acceptance criteria: Test if login is successful for existing accounts. Test if login fails for non-existing accounts.
Time Frame: Report period 2
ID: EU2.UC13
Requirement Type: Functional
Priority | Category: Must | Security
Description: As a user, I want to log out of the system so that I can safely close the working session.
Test case / Acceptance criteria: Test if logout is successful (i.e., if it is not possible to access restricted areas after logout).
Time Frame: Report period 2
ID: EU2.UC14
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a doctor, I want to retrieve all the information available in the EHR for a specified patient (by SSN and time period) so that I can go through his past exams to better address the emergency.
Test case / Acceptance criteria: Search for an existing patient. Search for an existing patient in a given time period. Search for a non-existing patient.
Time Frame: Report period 2
ID: EU2.UC15
Requirement Type: Functional
Priority | Category: Could | Functional
Description: As a doctor, I want to select two blood tests of a given patient and compare them visually so that I can better understand what changed from one to the other.
Test case / Acceptance criteria: Select two exams which differ in some known fields. Check if the differences are highlighted.
Time Frame: Report period 2
ID: EU2.UC16
Requirement Type: Functional
Priority | Category: Would | Functional
Description: As a doctor, I want to select two medical images for the same type of exam, of a given patient, and compare them visually so that I can better understand what changed from one to the other.
Test case / Acceptance criteria: Select two images with known differences. Check if they are visually highlighted.
Time Frame: Report period 2
Objective to WP Traceability Matrix
Objective | Criteria | WPs | Specific Components
Objective 1. Improvement of productivity when developing and deploying data-intensive applications
1.1 | Adoption of the framework for the development of 4 applications relevant in the adopted case studies. | WP5 | All the applications to be developed by the end users.
1.2 | Ability to be connected with the main SBC devices (i.e., Raspberry Pi, Odroid), as well as the main operating systems (i.e., Android OS, Linux, Windows). | WP3 | check if WP4 is also needed
1.3 | Integration with 3 different data stores, coming from different worlds (SQL, NoSQL data stores, CEP systems), using the common definition of VDC that will provide the necessary abstraction. | WP3 | DAL
Objective 2. Enhancing data management in mixed cloud/fog environments by adding data and computation movement
2.1 | Definition and implementation of 5 diverse modes to transform/transmit data for the movement. | WP4 | Data Movement Enactor
2.2 | Definition and implementation of 5 diverse modes to deploy/reconfigure tasks in mixed federated cloud/fog environments. | WP3, WP4 | Deployment Engine
2.3 | Improvement of more than 10% of the observed latency when using a diverse mode for data movement. | WP4 | DS4M, Data Movement Enactor
2.4 | Improvement of more than 10% of the observed latency when using a different deployment configuration for more than 5 of the use case functionalities. | WP3, WP4 | DS4M, Deployment Engine
2.5 | Ability to extract the same information in less than 10% of the observed latency using less than 90% of the data for more than 2 of the use case functionalities. | WP3, WP4 | DS4M, Deployment Engine
Objective 3. Definition of strategies for improving the execution of data-intensive applications
3.1 | Definition of a set of 20 indicators able to measure non-functional aspects such as performance, security, data quality, and so on. | WP2, WP3 | Blueprint, goal-based model
3.2 | Ability to estimate the effects of the enactment of data and computation movement techniques with an error lower than 15% with respect to the real effects measured a posteriori. | WP4 | DS4M
3.3 | Reduction of 10% of the time needed for the transition between different deployments using the DITAS framework, compared to traditional approaches. | WP3, WP4 | Deployment Engine
Objective 4. Enabling the execution of data-intensive applications in a mixed cloud/fog environment
4.1 | Ability to run all the 4 different applications designed according to the DITAS framework. | WP5 | All components
4.2 | Reduction of 10% of the response time of applications running on the DITAS framework with respect to executions that do not exploit the facilities provided by the project. | WP5 | All components
4.3 | Ability to limit the overhead of the monitoring system for an application to less than 10% of the resource usage. | WP4 | Data Analytics + monitoring
Objective 5. Maximise the impact on business
5.1 | Creation of visibility, engaging and nurturing actions with targeted audiences including the scientific community, other fellow research projects and initiatives, and the open source and industry communities. | WP6, rest of WPs | Website, all components
5.2 | Demonstrate suitability and value with the case studies and leverage their promotion as demonstrators to support impact creation. | WP6, WP5, rest of WPs | Case study applications, all components
5.3 | Impact created with knowledge transfer; individual exploitation, innovation and/or commercialization from every partner. | WP6, rest of WPs | No components; every partner is involved.
5.4 | Establishment of a DITAS sustainability body with the involvement of as many organizations as possible. | WP6, rest of WPs | No components; every partner is involved.
ANNEX 2: DITAS Components
Virtual Data Container
Component name: Data utility evaluator (DUE@VDC)
Description: The data utility metrics defined in the abstract blueprint describe the utility of the data sources as measured when the data was first included in DITAS. However, the content of data sources changes over time and, therefore, the data utility metrics must be updated. The DUE@VDC collects the data utility of the results of the methods provided by the VDC and sends the results to the DUE@VDM.
Inputs: output of methods | Input mechanism: REST
Outputs: data utility metrics | Output mechanism: REST
Implementation language (if code): Python
Requirements: T1.7 | Storage: N/A
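As an illustration of this reporting flow, the following Go sketch posts a set of utility metrics to the DUE@VDM over REST (the actual component is written in Python); the payload shape and the endpoint URL are assumptions made for the example, not the real DITAS interface.

package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
)

// UtilityReport is an illustrative payload: utility metrics measured on
// the results of one VDC method, forwarded to the DUE@VDM over REST.
type UtilityReport struct {
	Method  string             `json:"method"`
	Metrics map[string]float64 `json:"metrics"` // e.g. completeness, timeliness
}

func main() {
	report := UtilityReport{
		Method:  "getBloodTests",
		Metrics: map[string]float64{"completeness": 0.97, "timeliness": 0.85},
	}
	body, err := json.Marshal(report)
	if err != nil {
		log.Fatal(err)
	}
	// The endpoint below is hypothetical; the real DUE@VDM REST path is
	// defined by the component's API, not by this sketch.
	resp, err := http.Post("http://due-vdm:8080/utility", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("DUE@VDM responded with", resp.Status)
}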
Component name: SLA Manager
Description: Every method defined in the abstract blueprint will provide a series of Service Level Indicators (SLIs), for example, the average response time for queries, the availability of the underlying data source, or the ratio of errors found during those same queries. These SLIs are used to form Service Level Objectives (SLOs), that is, combinations of guarantees about SLI values that should be fulfilled. For example, an SLO might state that a particular method will provide results in less than 1 second on average and that it will produce at most 1 error for every 1000 queries. From these SLOs, which are defined in the blueprint itself, the SLA Manager composes a Service Level Agreement (SLA) per method. The SLA is a guarantee that all previously defined SLOs will be maintained during the execution of the VDC. To validate it, the SLA Manager relies on the Data Analytics component, asking for SLI values and checking them against their SLOs. In case some of them are not fulfilled, it informs the Decision System for Data Movement of a violation, passing the broken SLO and the SLI values that produced the violation.
Inputs: Blueprint, SLAs | Input mechanism: File, REST
Outputs: JSON | Output mechanism: REST
Implementation language (if code): Go
Requirements: T1.7 | Storage: MongoDB
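To make the check performed by the SLA Manager concrete, the following Go sketch compares observed SLI values against the thresholds of their SLOs and collects the resulting violations, as they would be reported to the Decision System for Data Movement. The types, field names, and thresholds are illustrative and do not reflect the actual DITAS blueprint schema.

package main

import "fmt"

// SLO is a guarantee over a single Service Level Indicator (SLI).
type SLO struct {
	Metric    string  // e.g. "avg_response_time_ms"
	Threshold float64 // maximum acceptable value
}

// Violation reports an SLO that was not fulfilled, together with the
// observed SLI value that broke it.
type Violation struct {
	SLO      SLO
	Observed float64
}

// checkSLOs compares observed SLI values (e.g. fetched from the Data
// Analytics component) against the SLOs of one blueprint method.
func checkSLOs(slos []SLO, observed map[string]float64) []Violation {
	var violations []Violation
	for _, slo := range slos {
		if value, ok := observed[slo.Metric]; ok && value > slo.Threshold {
			violations = append(violations, Violation{SLO: slo, Observed: value})
		}
	}
	return violations
}

func main() {
	slos := []SLO{
		{Metric: "avg_response_time_ms", Threshold: 1000},
		{Metric: "error_rate", Threshold: 0.001},
	}
	observed := map[string]float64{"avg_response_time_ms": 1350, "error_rate": 0.0004}
	for _, v := range checkSLOs(slos, observed) {
		fmt.Printf("violation: %s = %.4f (limit %.4f)\n", v.SLO.Metric, v.Observed, v.SLO.Threshold)
	}
}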
Component name: Computation Movement Enactor
Description: When some QoS constraints are not fulfilled and data movement is not the best option, it might be necessary to move computation units across resources. For example, if the response time of a query is high due to the latency between the VDC and the final user, data movement might not be enough to achieve the desired goal, because the request must travel to a different cluster than the one holding the data, potentially far away from the final user. The DS4M component takes this into account and informs the Computation Movement Enactor about the need to move computation units between clusters. Once the order is received, the Computation Movement Enactor executes the actual movement of the computation units. Implementation will start in the second period.
Inputs: Blueprint, JSON | Input mechanism: File, REST
Outputs: JSON | Output mechanism: REST
Implementation language (if code): Go
Requirements: | Storage:
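A minimal Go sketch of the interface between the DS4M and the enactor is shown below: a REST endpoint receives a movement order and would then trigger the redeployment. The JSON payload and the endpoint path are hypothetical, chosen only for illustration.

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// MovementOrder is a hypothetical payload for the order sent by the DS4M;
// the real JSON schema is defined by the DITAS blueprint and APIs.
type MovementOrder struct {
	Unit          string `json:"unit"`          // computation unit to move
	SourceCluster string `json:"sourceCluster"` // cluster currently running it
	TargetCluster string `json:"targetCluster"` // cluster closer to the final user
}

func main() {
	http.HandleFunc("/move", func(w http.ResponseWriter, r *http.Request) {
		var order MovementOrder
		if err := json.NewDecoder(r.Body).Decode(&order); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// A real enactor would redeploy the unit on the target cluster and
		// tear it down on the source cluster once traffic has drained.
		log.Printf("moving %s from %s to %s", order.Unit, order.SourceCluster, order.TargetCluster)
		w.WriteHeader(http.StatusAccepted)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}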
Component name: Data Analytics
Description: Aggregates additional information generated by the operation of the different DITAS components. It provides an interface to query the various data sources that comprise this information and performs additional processing and refining where necessary. Its queries integrate key QoS metrics used in the operation of other components such as the SLA Manager and the Decision System for Data Movement.
Inputs: REST query | Input mechanism: REST
Outputs: REST query answer | Output mechanism: REST
Implementation language (if code): Python, Swagger
Requirements: Docker container for the API | Storage: 3 GB for ELK + 10 GB for the Elasticsearch DB
Component name: Data Movement Enactor
Description: It enacts data movement actions across locations by consuming the API of the storage layer. It copies and keeps data synchronized between edge and cloud servers/instances inside a VDC, making the data available closer to where it is needed.
Inputs: REST query | Input mechanism: REST
Outputs: REST query answer | Output mechanism: REST
Implementation language (if code): N/A
Requirements: | Storage:
Component name: Log Analysis Service
Description: Components and data sources running in different clusters log their activity, providing information about how well they are executing their tasks and how their internals are working. Analyzing these logs is a basic activity that system administrators perform for both debugging and preventing problems, but the amount of data in them is usually huge, which makes it impossible for a single operator to spot problems just by looking at them. That is why tools like Logstash were developed: they enable operators to define rules for aggregating log entries and to produce summaries of information which are easier to inspect in order to find current or future problems. The Log Analysis Service will use one of those tools to aggregate log information that can be used in automatic or supervised processes to ensure the resources available to VDCs and users are working as expected.
Inputs: JSON | Input mechanism: REST
Outputs: JSON | Output mechanism: REST
Implementation language (if code):
Requirements: | Storage:
Component name: Request Monitor
Description: The VDC Request Monitor is one of the monitoring sidecars used to observe the behavior of a VDC within the DITAS project. The agent acts as an ingress controller to the VDC and observes all incoming and outgoing HTTP/HTTPS traffic. It can be instructed to add SSL encryption between the VDC and a client using either the Let's Encrypt protocol or a self-signed certificate. It also records metadata about the request traffic and reports it to Elasticsearch for later analysis. The monitor also adds OpenTracing headers to each incoming request, thereby enabling the use of tracing systems like Zipkin.
Inputs: HTTP request | Input mechanism: HTTP
Outputs: HTTP response, traffic metadata, OpenTracing headers | Output mechanism: HTTP
Implementation language (if code): Go
Requirements: Elasticsearch | Storage: local files, Elasticsearch
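The following Go sketch shows the general shape of such a sidecar: an HTTP middleware that injects a Zipkin-style B3 trace identifier into each incoming request and records basic request metadata. It is a simplification of the idea, not the actual Request Monitor; in particular, it only logs the metadata instead of shipping it to Elasticsearch.

package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
	"time"
)

// monitor wraps the VDC handler: it injects a Zipkin-style B3 trace id
// into each incoming request and records basic request metadata.
func monitor(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := make([]byte, 8)
		_, _ = rand.Read(id)
		r.Header.Set("X-B3-TraceId", hex.EncodeToString(id))
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("method=%s path=%s duration=%s", r.Method, r.URL.Path, time.Since(start))
	})
}

func main() {
	// Stand-in for the actual VDC behind the sidecar.
	vdc := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("VDC method result"))
	})
	log.Fatal(http.ListenAndServe(":8080", monitor(vdc)))
}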
Component name: Throughput Agent
Description: The VDC Throughput Agent is one of the monitoring sidecars used to observe the behavior of VDCs within the DITAS project. The agent observes all incoming and outgoing requests of a VDC by observing the underlying socket layer. The data is aggregated over time and sent to the monitoring database. This agent acts as a passive monitoring sidecar to the VDC and is therefore independent of the concrete VDC implementation.
Inputs: Blueprint | Input mechanism:
Outputs: Network usage statistics | Output mechanism: Elasticsearch
Implementation language (if code): Go
Requirements: Linux sockets, Elasticsearch | Storage: Elasticsearch
Component name: Logging Agent
Description: The VDC Logging Agent is a small software service that enables a VDC to transmit metrics and instrumentation information to the DITAS platform. The agent offers an interface to each VDC which enables access to the monitoring and tracing databases without requiring the VDC to include specific dependencies for these services. The agent is intended to run in the same container as the VDC and is compiled with static libraries, allowing it to be deployed in any Unix-like environment.
Inputs: REST and local filesystem | Input mechanism: REST
Outputs: Aggregated log data | Output mechanism: Elasticsearch
Implementation language (if code): Go
Requirements: | Storage: Elasticsearch
Component name: DAL
Description: Part of the VDC, in charge of simplifying the connection between the VDC data processing layer and the data sources. The DAL is always deployed in the same security and privacy realm as the data source made available by the data administrator, and it provides the required connectivity between the data source and the VDC processing layer while enforcing privacy policies.
Inputs: protobuf of data query | Input mechanism: gRPC
Outputs: protobuf of data content | Output mechanism: gRPC
Implementation language (if code): Scala
Requirements: computation resources for Spark near the data sources; privacy enforcement engine | Storage:
Component name: Privacy Enforcement Engine
Description: It acts as a proxy before executing queries over the data in the data sources. It rewrites each query so that it accesses only data compliant with the privacy policies. The Enforcement Engine is a sidecar of the VDC; data access policies, access purposes, and data subject consents are defined in the Data Policy and Consent Manager (DPCM). A VDC implemented with Apache Spark uses the Enforcement Engine to transform Spark SQL queries into queries that return only compliant data.
Inputs: SQL query, access purpose | Input mechanism: REST API
Outputs: Rewritten SQL query, additional data sources with privacy information | Output mechanism: REST API response
Implementation language (if code): Scala
Requirements: policies and consents should be provided by the DPCM (external component) | Storage: Intermediate representation saved near the data as additional tables/files in the data store
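To illustrate the idea of query rewriting, the following Go sketch appends a consent predicate to a SELECT statement so that only rows of consenting data subjects are returned. It is a deliberately naive, string-based toy example; the actual engine is written in Scala, operates on Spark SQL, and uses the policies stored in the DPCM, and the table and column names below are assumptions.

package main

import (
	"fmt"
	"strings"
)

// rewrite appends a compliance predicate to a SELECT query so that only
// rows for which the data subject consented to the given access purpose
// are returned. Table and column names are hypothetical.
func rewrite(query, purpose string) string {
	predicate := fmt.Sprintf(
		"subject_id IN (SELECT subject_id FROM consents WHERE purpose = '%s')", purpose)
	if strings.Contains(strings.ToUpper(query), " WHERE ") {
		return query + " AND " + predicate
	}
	return query + " WHERE " + predicate
}

func main() {
	fmt.Println(rewrite("SELECT ssn, cholesterol FROM blood_tests", "Research"))
}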
Virtual Data Manager
Component name: Data utility evaluator (DUE@VDM)
Description: The DUE@VDM collects the information retrieved by all the DUE@VDC instances linked to the VDM and aggregates at runtime the quality of data of the overall VDC, in order to update the related abstract blueprint for future usage. Such an update is optional and not automated; its usage depends on the business plan and the ownership of the data sources.
Inputs: Data from all the DUE@VDC instances | Input mechanism: REST
Outputs: updated abstract blueprint | Output mechanism: blueprint repository call
Implementation language (if code): Python
Requirements: T1.7, T3.18 | Storage: N/A
Component name: Decision System for Data and Computation Movement (DS4M)
Description: DITAS guarantees the satisfaction of the application designer's requirements by moving data and tasks. The DS4M is the reasoning system that decides the best data or computation movement to enact.
Inputs: Violations of user requirements specified in the concrete blueprint | Input mechanism: REST
Outputs: data or computation movement | Output mechanism: REST
Implementation language (if code): Java
Requirements: T4.18, T4.19 | Storage:
DITAS SDK
Component name: Data utility evaluator (DUE)
Description: Provides information about the data utility according to the user request at a given time. Data utility is dynamic and can change over time according to the platform and to the application requirements. It integrates the Potential Data Utility Service (PDUS) and the Sample Data Generator defined in D1.1.
Inputs: blueprint | Input mechanism: REST
Outputs: blueprint | Output mechanism: REST
Implementation language (if code): Python
Requirements: T1.7 | Storage: N/A
Component name: Data Utility Resolution Engine (DURE)
Description: In order to help the application designer select the best blueprint, DITAS orders blueprints based on how well each one fits the requirements he/she specified. The DURE plays a central role in this functionality, since it filters and ranks a list of blueprints based on the application requirements.
Inputs: list of blueprints and application requirements | Input mechanism: REST
Outputs: ordered list of blueprints | Output mechanism: REST
Implementation language (if code): Node.js
Requirements: T4.17 | Storage: N/A
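The following Go sketch illustrates the filter-and-rank idea: blueprints that miss the hard requirements are discarded and the remaining ones are ordered by a toy utility score. The actual DURE is implemented in Node.js and its scoring over QoS, QoD, and security requirements is considerably more elaborate; all fields and thresholds below are assumptions made for the example.

package main

import (
	"fmt"
	"sort"
)

// Blueprint carries only the subset of fields this toy ranking needs;
// the full abstract blueprint schema is defined elsewhere in DITAS.
type Blueprint struct {
	ID           string
	Availability float64 // promised availability, e.g. 0.9999
	Latency      float64 // promised latency in ms
}

// filterAndRank drops blueprints that miss the hard requirements and
// orders the rest by a simple utility score (higher is better).
func filterAndRank(bps []Blueprint, minAvail, maxLatency float64) []Blueprint {
	var fit []Blueprint
	for _, b := range bps {
		if b.Availability >= minAvail && b.Latency <= maxLatency {
			fit = append(fit, b)
		}
	}
	sort.Slice(fit, func(i, j int) bool {
		return fit[i].Availability/fit[i].Latency > fit[j].Availability/fit[j].Latency
	})
	return fit
}

func main() {
	bps := []Blueprint{{"bp-1", 0.999, 80}, {"bp-2", 0.9999, 120}, {"bp-3", 0.99, 40}}
	fmt.Println(filterAndRank(bps, 0.995, 150))
}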
Component name: VDC Blueprint Repository
Description: This is a repository where all the abstract VDC blueprints are stored.
Inputs: Interacts with other components or DITAS roles through the Repository Engine REST interface | Input mechanism: N/A
Outputs: N/A | Output mechanism: N/A
Implementation language (if code):
Requirements: N/A | Storage: Document-oriented database (MongoDB)
Component name: VDC Blueprint Repository Engine
Description: The Repository Engine provides CRUD operations on the VDC Blueprint Repository. For instance, it enables the data administrator to store, update, or delete his/her abstract VDC blueprints, and the Resolution Engine to retrieve blueprints from the Repository.
Inputs: Receives HTTP requests from other components or DITAS roles (e.g., Resolution Engine or data administrator) | Input mechanism: REST
Outputs: Depending on the HTTP request method | Output mechanism: REST
Implementation language (if code): Java
Requirements: N/A | Storage: N/A
Component name: VDC Blueprint Validator
Description: This is a subcomponent of the Repository Engine whose goal is to ensure that inserted or updated abstract blueprints are valid, according to the current abstract VDC blueprint schema and other logic requirements, before they end up in the Repository. For invalid blueprints, the Validator provides descriptive and helpful error messages to the data administrator.
Inputs: N/A | Input mechanism: N/A
Outputs: N/A | Output mechanism: N/A
Implementation language (if code): Java
Requirements: Abstract VDC blueprint schema | Storage: N/A
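A minimal sketch of such a validation step is shown below in Go (the actual Validator is written in Java): it checks that a blueprint document is well-formed JSON and that a set of required top-level sections is present, returning a descriptive message for each problem found. The section names are only plausible examples, not the authoritative abstract VDC blueprint schema.

package main

import (
	"encoding/json"
	"fmt"
)

// validate checks that an abstract blueprint document is well-formed JSON
// and contains a set of required top-level sections, returning one
// descriptive message per problem found. Section names are illustrative.
func validate(raw []byte) []string {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(raw, &doc); err != nil {
		return []string{"blueprint is not valid JSON: " + err.Error()}
	}
	var errs []string
	for _, section := range []string{"INTERNAL_STRUCTURE", "DATA_MANAGEMENT", "EXPOSED_API"} {
		if _, ok := doc[section]; !ok {
			errs = append(errs, "missing required section: "+section)
		}
	}
	return errs
}

func main() {
	fmt.Println(validate([]byte(`{"INTERNAL_STRUCTURE": {}}`)))
}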
Component name: Resolution Engine
Description: The Resolution Engine is the component responsible for filtering and ranking the abstract VDC blueprints in the Blueprint Repository based on the user requirements. It enables the application designer to find the most appropriate blueprint based on the needs of the application in terms of content, QoS, QoD, security features, and privacy regulations.
Inputs: User requirements JSON file, DURE output | Input mechanism: REST
Outputs: Best candidate VDC blueprint | Output mechanism: REST
Implementation language (if code): Java, Spring Framework
Requirements: DURE output | Storage: Elasticsearch database
Component name: Privacy Security Evaluator
Description: The PSE is the component responsible for filtering and ranking the abstract blueprints based on the security and privacy requirements of the user. It is part of the Resolution Engine pipeline and depends on the DURE. It is designed as a microservice and is therefore independently scalable and manageable from the DURE and the Resolution Engine.
Inputs: DURE request (blueprint, subset of user requirements) | Input mechanism: REST
Outputs: Ranked list of blueprints | Output mechanism: REST
Implementation language (if code): Java, Spring Framework
Requirements: DURE | Storage: N/A
ANNEX 3: DITAS Technical Questionnaire
Thank you for filling in the DITAS technical questionnaire.
Nowadays, there is an increasing need to develop Data-Intensive Applications (DIAs) able to manage ever-growing amounts of data coming from distributed and heterogeneous sources, such as IoT sensors, devices, or mobile applications, in an effective, quick, and secure manner.
The goal of DITAS is to propose a cloud platform supporting information logistics for DIAs, where data processing does not occur only on cloud resources but also on devices at the edge of the network. To achieve this goal, the DITAS project is implementing data and computation movement strategies. According to these strategies, the DITAS platform decides where, when, and how to save data (on the cloud or on the edge of the network) and where, when, and how to compute part of the tasks composing the application. It thereby creates a synergy between traditional and cloud approaches, in order to find a good balance between reliability, security, sustainability, and cost.
The purpose of this questionnaire is twofold: firstly, to rank the following nine basic requirements that we have identified, in order to direct the design and the implementation of the DITAS architecture and components accordingly; secondly, to rank the following nine parameters, so that we have a better understanding of which issues to take into consideration, and with which priority, while developing the aforementioned data and computation movement strategies.
Please rank the following requirements (1 corresponds to the most important, 9 to the least important):
Requirement | Rank
R1: Reduce and process the data on the Edge/IoT side before they reach a central location such as the Cloud
R2: Harmonize the data coming from different data sources (data heterogeneity)
R3: Respect privacy requirements (such as GDPR compliance) in data movement transactions and data access
R4: Simplify the exposed data to third-party users/clients
R5: Simplify the data access from Fog and Cloud
R6: Have a flexible agreement between data provider and consumer (e.g., latency < 100 ms while availability > 99.999%)
R7: Have the possibility to express data quality constraints in the SLA
R8: Have the possibility to monitor how data is provisioned or consumed
R9: Keep track of the data transformations occurring during the data movement
Please rank the following parameters (1 corresponds to the most important, 9 to the least important) to drive the data and computation movement:
Parameter | Rank
P1: Data Quality
P2: Reputation of the Data Source
P3: Availability
P4: Latency, Response Time, Throughput
P5: Encryption
P6: Purpose Control
P7: Data Access Policy
P8: Authentication
P9: Access Monitoring
Based on the answers that we have collected, the following charts depict the
average rank per requirement and per parameter respectively:
Figure 37. Average rank per requirement for technical questionnaire
Figure 38. Average rank per parameter for technical questionnaire
ANNEX 4: DITAS market context questionnaire
The questionnaire template used to carry out the survey and interviews for the DITAS market context analysis is included below.
Introduction to the interview
Thank you for participating in this survey. The purpose of this interview is to ask you several questions that will help us get to know our potential stakeholders and their needs better.
The interview should not last more than 30-40 minutes and will consist of three
steps:
Firstly, we need to know about your profile and some aspects of your company. Note that we are not going to use your professional or company data; we only need to categorize the respondents and their companies, so the survey is anonymous.
Secondly, you will find some market-oriented questions that we would like you to answer based on your own vision, the context of your company, and your needs and experience.
Finally, some aspects of the different business scenarios identified for the DITAS solution need to be validated, so we have prepared some questions to learn your opinion and vision about them.
Your feedback is very valuable to us, so we are looking forward to hearing from you; feel free to comment on anything you consider relevant. We stress that the survey is completely anonymous, and you will not be required to submit any personal data about yourself.
Organization information
1. Number of employees of your organization (please select one choice):
< 10
>10 & < 50
> 50 & < 250
> 250
2. Annual revenues of your organization
< 10 M€
> 10 M€ & < 50 M€
> 50 M€
3. What are your target market(s) for your product(s) or project(s)? (you can
choose more than one)
Cloud
IoT
Fog/Edge
Telco
IT
Other (please identify): ___________________________________________
4. What are your target sector(s) for your product(s) or project(s)?
Home
Smart Cities
Smart Health
Smart Transport and Connected Transportation
Smart Buildings and Hospitality
Smart Industry and Manufacturing
Connected Home
Agriculture
Infrastructure
Logistics
Retail
Consumer
Media
Internet or Mobile
Security & Defense
Entertainment
Finance
Banking
Other (please identify): ____________________________________________
5. Does your company develop data-intensive applications?
YES (explain and quantify)
NO
6. Does your company develop applications in Cloud / Edge context?
YES (explain and quantify)
NO
7. Can you describe what your company’s business is specifically in
Cloud/Edge/IoT contexts?
__________________________________________________________________________
Positioning and Background information
1. What is your position within your organization?
CEO
CTO
Business developer
Product developer
Sales Force
Project manager
Other (please identify): _________________________________________
2. Are you familiar with the development of Cloud, Mobile, IoT, Fog/Edge
Applications?
YES (rate your experience from 1 to 5)
NO
3. Do you know about the market trends in Cloud, Mobile, IoT, Fog/Edge
Applications?
YES (rate your experience from 1 to 5)
NO
Market-oriented questions
1. Does your organization require Cloud/Edge services to develop its business?
NO
YES (please select):
AWS Greengrass
Microsoft Azure
Other (please identify): _____________________________________
2. Does your organization use data from external sources for its commercial offering?
YES
NO
3. Does your organization sell or buy data?
Sell data
Buy data
Both
Manage data from others
4. Does your company have difficulties managing those data?
NO
YES (please select, you can choose more than one):
Data logistics (detail the specific field)
Data Definition
Data Acquisition (data sources, data format, transmission protocols, others)
Data Storage (where to store, DFS with IoT, security/privacy, others)
Data Movement (movement, compression, anonymizing, encryption/securing, others)
Data Consumption (data analytics / operational processes, Lambda/Kappa architecture, others)
Data Dismissal (deletion, freezing, others)
Data management
Data analysis
Data visualization
Data security/privacy/trust
Data storage
Data warehousing
Data quality
Data analytics
Data governance
Data architecture
5. What problems can DITAS solve for you that your company is already
solving with other workflows/tools/platforms?
__________________________________________________________________________
6. Would your organization consider using a solution such as DITAS in your
workflow?
YES
NO (explain why): _____________________________________________
7. How much would your organization be willing to pay for such services?
__________________________________________________________________________
8. Is there any other workflow/tool/platform or solution/service that solves the problem better or cheaper than DITAS?
________________________________________________________________________
9. Do you think that the Open Source approach of DITAS could be a barrier
for your organization?
YES (please, explain why): ______________________________________
NO