D1.2 Final DITAS architecture and validation approach
Project Acronym: DITAS
Project Title: Data-intensive applications Improvement by moving daTA and computation in mixed cloud/fog environmentS
Project Number: 731945
Instrument: Collaborative Project
Start Date: 01/01/2017
Duration: 36 months
Thematic Priority: ICT-06-2016 Cloud Computing
Website: http://www.ditas-project.eu
Dissemination level: Public
Work Package: WP1 Requirements, Architecture and Validation Approach
Due Date: M24
Submission Date: 17/01/2019
Version: 1.0
Status: Final for submission
Author(s): Maya Anderson (IBM), Ety Khaitzin (IBM), Aitor Fernández (IDEKO), Borja Tornos (IDEKO), José Antonio Sánchez Murillo (Atos), Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Grigor Pavlov (CS), Sergey Miroshnikov (CS), Frank Pallas (TU-Berlin), Max-R. Ulbricht (TU-Berlin), Sebastian Werner (TU-Berlin), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Pierluigi Plebani (POLIMI), Ana Belén González Méndez (ATOS), José Antonio Sánchez (ATOS), David García Pérez (ATOS), Ilio Catallo (OSR), Andrea Micheletti (OSR)
Reviewer(s): Cinzia Cappiello (POLIMI), David García Pérez (ATOS)
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731945
© Main editor and other members of the DITAS consortium
Version History
Version | Date | Comments, Changes, Status | Authors, contributors, reviewers
0.1 | 08/10/2018 | Initial version | Maya Anderson (IBM)
0.2 | 09/11/2018 | Added state of the art for goal modelling | Mattia Salnitri (POLIMI)
0.3 | 13/12/2018 | Filled in table components DS4M, DURE, DUE@VDM, DUE@VDC, DU, SLA Manager, Deployment Engine | Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), José Antonio Sánchez (ATOS)
0.4 | 14/12/2018 | Added the V&V section | Borja Tornos (IDEKO), Aitor Fernández (IDEKO)
0.5 | 19/12/2018 | Added market analysis, SLA state of the art | Ana Belén González Méndez (ATOS), David García Pérez (ATOS)
0.6 | 20/12/2018 | Added requirements annex, components annex, and questionnaire annex | Maya Anderson (IBM), Ety Khaitzin (IBM), José Antonio Sánchez Murillo (Atos), Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Grigor Pavlov (CS), Sergey Miroshnikov (CS), Frank Pallas (TU-Berlin), Max-R. Ulbricht (TU-Berlin), Sebastian Werner (TU-Berlin), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Pierluigi Plebani (POLIMI), José Antonio Sánchez (ATOS), David García Pérez (ATOS), Ilio Catallo (OSR)
0.7 | 21/12/2018 | Update to the architecture section, executive summary and conclusions | Maya Anderson (IBM), Ety Khaitzin (IBM), José Antonio Sánchez Murillo (Atos), Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Grigor Pavlov (CS), Sergey Miroshnikov (CS), Frank Pallas (TU-Berlin), Max-R. Ulbricht (TU-Berlin), Sebastian Werner (TU-Berlin), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Pierluigi Plebani (POLIMI), José Antonio Sánchez (ATOS), David García Pérez (ATOS), Ilio Catallo (OSR)
0.8 | 22/12/2018 | Reviewed version | Cinzia Cappiello (POLIMI), David García Pérez (ATOS)
0.9 | 08/01/2019 | Clean Word version | David García Pérez (ATOS), Maya Anderson (IBM)
0.9 | 15/01/2019 | Update to SotA, architecture and requirements annex | Maya Anderson (IBM), Ety Khaitzin (IBM), Aitor Fernández (IDEKO), Borja Tornos (IDEKO), José Antonio Sánchez Murillo (Atos), Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Grigor Pavlov (CS), Sergey Miroshnikov (CS), Frank Pallas (TU-Berlin), Max-R. Ulbricht (TU-Berlin), Sebastian Werner (TU-Berlin), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Pierluigi Plebani (POLIMI), Ana Belén González Méndez (ATOS), José Antonio Sánchez (ATOS), David García Pérez (ATOS), Ilio Catallo (OSR), Andrea Micheletti (OSR)
1.0 | 17/01/2019 | Fixes to text around all sections. Quality check. Document ready for submission. | Maya Anderson (IBM), Pierluigi Plebani (POLIMI), David García Pérez (ATOS), Maria Teresa García González (ATOS)
Contents
Version History ................................................................................................................. 2
List of Figures ................................................................................................................... 6
List of tables ..................................................................................................................... 7
Executive Summary........................................................................................................ 8
1 Introduction ........................................................................................................... 10
1.1 Structure of the Document .......................................................................... 10
1.2 Glossary of Acronyms ................................................................................... 11
2 Update to the State of the Art ............................................................................ 13
2.1 Data Delivery in Fog Computing................................................................. 13
2.2 Data Management in Fog Computing ...................................................... 15
2.2.1 Data utility ............................................................................................... 15
2.2.2 Security and privacy mechanisms ....................................................... 16
2.3 Data as a Service .......................................................................................... 17
2.3.1 Interface description language ........................................................... 17
2.3.2 Goal models for data and computation movement ....................... 18
2.3.3 Service-Level Agreement ....................................................................... 19
3 Update to Market Analysis .................................................................................. 21
3.1 Market Overview ........................................................................................... 21
3.1.1 Market Segmentation ............................................................................ 22
3.2 Applications with a Fog Computing approach ........................................ 22
3.2.1 Connected Vehicles .............................................................................. 23
3.2.2 Smart Cities.............................................................................................. 23
3.2.3 Connected Healthcare ........................................................................ 23
3.2.4 Smart Manufacturing ............................................................................. 24
3.3 Use Cases Market Study ............................................................................... 24
3.3.1 e-Health ................................................................................................... 24
3.3.2 Industry 4.0............................................................................................... 30
3.4 Market context questionnaire ..................................................................... 38
3.4.1 Characterization of interviewees ......................................................... 38
3.4.2 Summary of Questionnaires and interviews conducted .................. 39
4 Update to the Business and Technical Requirements ..................................... 43
5 DITAS Architecture ................................................................................................ 45
5.1 DITAS roles ....................................................................................................... 46
5.2 DITAS-SDK Architecture ................................................................................. 47
5.3 Execution Environment Architecture .......................................................... 50
5.4 VDC Architecture .......................................................................................... 51
5.4.1 Common Accessibility Framework ...................................................... 52
5.4.2 Data Processing ..................................................................................... 53
5.4.3 Data Access Layer ................................................................................. 53
5.4.4 Other VDC Components ...................................................................... 54
5.5 VDM Architecture .......................................................................................... 54
5.6 VDC and VDM integration ........................................................................... 56
6 Detailed Technical Verification and Validation Approach ........................... 58
6.1 Requirements traceability ............................................................................ 58
6.1.1 Requirements as user stories ................................................................. 59
6.1.2 Acceptance criteria .............................................................................. 59
6.2 Verification methodology ............................................................................ 60
6.2.1 Unit tests ................................................................................................... 60
6.2.2 API validation test .................................................................................. 61
6.2.3 Integration Tests ...................................................................................... 62
6.3 Validation methodology .............................................................................. 62
6.3.1 Component level requirements validation ........................................ 62
6.3.2 Framework level validation ................................................................... 63
6.3.3 Validation against project objectives ................................................. 65
7 Conclusions ............................................................................................................ 66
8 References ............................................................................................................. 67
ANNEX 1: DITAS Business and Technical Requirements .......................................... 73
WP1 – Requirement, Architecture and Validation Approach ........................... 73
Technical Requirements ...................................................................................... 73
WP2 - Enhanced data management ................................................................... 77
WP3 - Data virtualization.......................................................................................... 82
Business Requirements .......................................................................................... 82
Technical Requirements ...................................................................................... 84
WP4 - Execution environment ................................................................................. 94
Business Requirements .......................................................................................... 94
Technical Requirements ...................................................................................... 95
WP5 - Real world case studies and integration .................................................. 103
IDEKO Use Case Requirements ......................................................................... 103
IDEKO Use Case Application level Requirements .......................................... 107
OSR Use Case Requirements ............................................................................. 108
OSR Use Case Application level Requirements .............................................. 111
Objective to WP Traceability Matrix .................................................................... 115
ANNEX 2: DITAS Components ................................................................................... 118
Virtual Data Container........................................................................................... 118
Virtual Data Manager ............................................................................................ 123
DITAS SDK ................................................................................................................. 124
ANNEX 3: DITAS Technical Questionnaire ............................................................... 127
ANNEX 4: DITAS market context questionnaire ...................................................... 131
List of Figures
Figure 1: Size of Fog computing market opportunity by vertical market, 2019 and
2022 ................................................................................................................................ 21
Figure 2: Fog Market Segmentation .......................................................................... 22
Figure 3: Disruptive technologies in Healthcare ...................................................... 26
Figure 4. Healthcare supply chain ............................................................................. 27
Figure 5. HIE system ...................................................................................................... 28
Figure 6. Global Industry 4.0 Market .......................................................................... 30
Figure 7. Growth in revenue attributable to Industry 4.0 per industry sector ....... 31
Figure 8. Annual investments in Industry 4.0 per industrial sectors ........................ 32
Figure 9. Nine Technologies transforming Industrial production ............................ 32
Figure 10. The new Industry 4.0 stakeholders ecosystem ........................................ 34
Figure 11. Mindsphere by Siemens............................................................................. 34
Figure 12. Architecture using Azure IoT ..................................................................... 35
Figure 13. Google Cloud IoT Edge workflow ............................................................ 36
Figure 14. AWS IoT architecture .................................................................................. 36
Figure 15. IBM Watson Architecture ........................................................................... 37
Figure 16. Characterization of the organizations .................................................... 39
Figure 17. Interviewees’ roles ...................................................................................... 39
Figure 18. Difficulties managing data ....................................................................... 41
Figure 19. The conceptualization of Virtual Data Container ................................. 46
Figure 20. VDC Blueprint Lifecycle ............................................................................. 48
Figure 21. DITAS SDK Architecture .............................................................................. 49
Figure 22. DITAS SDK Resolution Engine Architecture and component interaction
........................................................................................................................................ 50
Figure 23. DITAS Execution Environment for several deployments of the same
blueprint ........................................................................................................................ 51
Figure 24. High-level view of the VDC ....................................................................... 52
Figure 25. High-level view of the DAL ........................................................................ 54
Figure 26. High-level view of the VDM ....................................................................... 55
Figure 27. High-level view of the VDM and VDM integration ................................ 57
Figure 28: Requirements Traceability Matrix for WP2 ............................................... 58
Figure 29: Measurements criteria vs WP .................................................................... 59
Figure 30: Software verification tests ......................................................................... 60
Figure 31: Component validation flow...................................................................... 63
Figure 32: Business requirements for the Industry 4.0 use case .............................. 64
Figure 33: Technical requirements for the Industry 4.0 use case ........................... 64
Figure 34: Validation against use cases flow............................................................ 64
Figure 35: Objective 1 fulfillment ................................................................................ 65
Figure 36: Validation against project objectives ..................................................... 65
Figure 37. Average rank per requirement for technical questionnaire .............. 129
Figure 38. Average rank per parameter for technical questionnaire ................ 130
List of tables
Table 1. Acronyms ........................................................................................................ 12
Table 2: Classification of Industry 4.0 Stakeholders ................................................. 38
Table 3: Fields to be fulfilled by the requirements of DITAS. ................................... 44
Executive Summary
This final DITAS architecture document includes an update to the market analysis, an update to the business requirements, the detailed project architecture, and a detailed plan for verification and validation.
As described in the initial architecture document with market analysis, state of the art refresh and validation approach, D1.1 (D1.1, 2017), DITAS aims to address the complexity of developing and deploying data-intensive applications for future computing platforms that span the Cloud and the Edge. DITAS provides a data access abstraction for the application designer, the application developer, and the application operator, so that each can focus on their own objectives without stepping beyond their skill set and expertise.
The DITAS consortium has performed an updated market analysis that is less general than the one presented in deliverable D1.1 (D1.1, 2017). In this document we present a more focused market analysis, looking especially at the markets that could become natural users of a future DITAS platform. In addition, with the help of our use case partners, we provide a more in-depth study of DITAS in both the Industry 4.0 and e-Health business scenarios.
Regarding the State of the Art, now that the first release of DITAS has been launched and evaluated, it is clearer where DITAS pushes the envelope. We believe that DITAS advances the State of the Art in three areas, all related to the data lifecycle in a Fog environment: Data Delivery, Data Management, and Data as a Service. Section 2 of the document offers a more in-depth review of these aspects.
In Section 4 of this updated version of the document, we have revised the requirements collected in the first period and added new ones, in line with the updated version of the architecture. A new technical questionnaire was distributed to external entities, asking people to rank the basic project requirements as well as the parameters that drive the data and computation movement process. We also put emphasis on the traceability of the requirements, extending the requirements table to provide information about how to test and fulfil each requirement.
While implementing the first prototype of the DITAS platform, we evolved the platform architecture: a more advanced blueprint lifecycle was developed, and a Data Access Layer (DAL) was added to the VDC in order to accommodate the privacy-sensitive flows of the e-Health use case.
In the Architecture section, we define the actors involved in the architecture,
then give an overview of the components for designing, deploying and manag-
ing Virtual Data Containers (VDC), divided into two parts: the DITAS-SDK
concerning the definition and the retrieval of a VDC, and the DITAS Execution
Environment (DITAS-EE) that manages the execution of the VDC as well as the
data and computation movements.
In order to validate the DITAS framework, we define in this document a detailed technical verification and validation approach, describing the different types of tests the consortium applies during development. This process is grounded in the requirements, and this section also describes how we track the project requirements using different traceability matrices per Work Package. Furthermore, we define a process to ensure the fulfilment of the project objectives described in the DoA (DoA, 2016), assigning each objective to the Work Packages and components in charge of fulfilling it. With all
this, we ensure that the development team has covered every need of the project and that the final product and each of its components are stable and consistent.
1 Introduction
This final DITAS architecture document includes an update to the market analysis, an update to the business requirements, the detailed project architecture, and a detailed plan for verification and validation.
As described in the initial architecture document with market analysis, state of the art refresh and validation approach, D1.1 (D1.1, 2017), DITAS aims to address the complexity of developing and deploying data-intensive applications for future computing platforms that span the Cloud and the Edge. DITAS provides a data access abstraction for the application designer, the application developer, and the application operator, so that each can focus on their own objectives without stepping beyond their skill set and expertise.
The overall objective of this document is to identify the requirements of the whole
project, and of its specific components, to outline the system architecture and a
common vision of the project feature set and functionality, and to define the
technical verification and validation approach. This is done in four steps.
First, we present a summary of the state of the art analysis of the technologies
that are used, and we focus on the main DITAS innovations. In addition, we in-
vestigate the state of the market in the area of fog computing and the two use
cases: e-Health and Industry 4.0, and the relevant trends. The state of the art and market trends help to estimate technology-related risks and to identify the main innovation domains DITAS can exploit.
Second, we detail both the business and the technical requirements for DITAS
components and for the project architecture; the requirements help ensure that
DITAS addresses both functionality and quality needs of the potential customers.
These requirements capture functional aspects as well as non-functional ones, including performance, security, privacy, interoperability, availability, reliability, maintainability, evolvability and extensibility. Special attention is given to the traceability of the requirements.
Third, we outline the overall DITAS architecture, its main components and flows.
We describe the roles in DITAS and describe the architecture using these roles,
which allow separation of concerns. The initial architecture has been revised and
elaborated since D1.1 (D1.1, 2017) based on conclusions from building the first
DITAS prototype.
Fourth, we describe the methodology with which we will analyze the case studies and perform the technical verification and validation of the DITAS components and of the DITAS platform as a whole.
1.1 Structure of the Document
This document is arranged similarly to the deliverable D1.1 (D1.1, 2017): following the first introductory sections, Section 2 includes the update to the state of the art. Section 3 presents the results of the market analysis, which reviews the current state of practice regarding tools and methods used in industry to manage data in Fog Computing, in particular in e-Health and Industry 4.0. This review also provides the necessary basis for Section 5 for understanding how the architecture of the DITAS framework is shaped in order to increase the possibilities of adoption by industrial players. Section 4 and Annex 1 (in more detail) introduce the business and technical requirements collected through a survey of organizations that could end up using DITAS technologies (the questionnaires can be
found in Annexes 3 and 4). Section 5 describes the architecture and Annex 2 lists its
various components with their relationships to tasks in work packages. Section 6
defines the approach to technical verification and validation of the DITAS archi-
tecture. Section 7 concludes with the summary of the document and the next
steps of the DITAS project following the final architecture document delivery.
1.2 Glossary of Acronyms
All deliverables include a glossary of the acronyms used within the document.
Acronym Definition
AI Artificial Intelligence
AM Additive Manufacturing
API Application Programming Interface
CAF Common Accessibility Framework
CAGR Compound Annual Growth Rate
CAM Connected Asset Manager
CI Continuous Integration
CPU Central Processing Unit
D Deliverable
DAL Data Access Layer
DBMS Database Management System
DIA Data Intensive Application
DNS Domain Name System
DoA Description of Action
DS4M Decision System for Movement
DUE Data Utility Evaluator
DUR Data Utility Resolution
DURE Data Utility Resolution Engine
EC European Commission
EE Execution Environment
EHR Electronic Health Record
GB Gigabyte
GDP Gross Domestic Product
GDPR General Data Protection Regulation
GPU Graphics Processing Unit
HIE Health Information Exchange
HIPAA Health Insurance Portability and Accountability Act
HMI Human-Machine Interface
HTTP Hypertext Transfer Protocol
IACS Industrial Automation and Control Systems
ICT Information and Communications Technology
IIoT Industrial Internet of Things
IoT Internet of Things
JDBC Java Database Connectivity
JSON JavaScript Object Notation
KDM Knowledge Discovery Meta-Model
MEC Mobile Edge Computing
OAS OpenAPI specification
PaaS Platform as a Service
PLC Programmable Logic Controllers
QoS Quality of Service
REST Representational State Transfer
RTM Requirements Traceability Matrix
SDK Software Development Kit
SLA Service Level Agreement
SOA Service-Oriented Architecture
SQL Structured Query Language
VDC Virtual Data Container
VDM Virtual Data Manager
Table 1. Acronyms
2 Update to the State of the Art
In this updated version of the document we take a different approach to updating the State of the Art: we focus on the main topic in which we think DITAS is innovative, namely the data lifecycle in an Edge, Cloud or Fog environment. The section is divided into three main aspects of the data lifecycle: Delivery, Management, and the usage of Data as a Service.
2.1 Data Delivery in Fog Computing
One of the main advantages of adopting Fog Computing (Bonomi, Milito, Zhu, & Addepalli, 2012; Byers, 2017; Varshney & Simmhan, 2017) concerns the improvement of data delivery through an active role of the edge side. In fact, Fog computing advocates a prominent use of computation on the edge devices, i.e., where the data are generated (FOG - Fog Computing and Networking Architecture Framework, 2018). This reduces the amount of data to be sent to the cloud resources, so that less data is stored in the cloud, or the computation can be finalized there to return a result to the final user with a lower response time. Although much effort has been devoted by the community to optimizing computation and data delivery from the edge to the cloud (Mouradian, et al., 2018), one of the goals of DITAS in improving data-intensive applications is to investigate how data delivery in the other direction (from the cloud to the edge) can be improved as well (Bermbach, et al., 2017).
In particular, Information Logistics has been considered in DITAS to properly or-
ganize the data delivery to the final users. According to the classification pro-
posed in (Michelberger, Andris, Girit, & Mutschler, 2013), we are interested in user-
oriented Information Logistics: i.e., the delivery of information at the right time,
the right place, and with the right quality and format to the user (D’Andria, et al.,
2015). As a consequence, user requirements can be defined in terms of func-
tional aspects, i.e., content, and non-functional ones, i.e., time, location, repre-
sentation, and quality.
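To make this classification concrete, the following minimal Python sketch models such a user-oriented delivery requirement, with content as the functional aspect and time, location, representation, and quality as the non-functional ones. The class and field names are illustrative only and do not correspond to any DITAS schema.

from dataclasses import dataclass

@dataclass
class DeliveryRequirement:
    # Functional aspect: what information the user needs.
    content: str
    # Non-functional aspects, as classified above.
    time: str            # when it must be delivered, e.g. "< 200 ms"
    location: str        # where it must be available, e.g. "factory edge gateway"
    representation: str  # required format, e.g. "JSON"
    quality: dict        # e.g. {"accuracy": 0.95, "completeness": 0.90}

req = DeliveryRequirement(
    content="machine vibration readings",
    time="< 200 ms",
    location="factory edge gateway",
    representation="JSON",
    quality={"accuracy": 0.95, "completeness": 0.90},
)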
Based on these assumptions, in the context of DITAS data delivery has been con-
sidered in a service-oriented architecture (SOA) where at the provider’s side
data could be stored in different formats on the cloud or on the premises of the
provider (the edge). Data can be organized in databases (relational or schema-less) or generated on the fly and transmitted through streams (Qin, et al., 2016). Furthermore, as the data provider can offer the owned data either as they are or after processing, this computation can be distributed among the nodes belonging to the provider and to the consumer (Verma, Yadav, Motwani, Raw, & Singh, 2016).
In this context, data movement holds a crucial role, as methods and techniques to move the data from the provider to the consumer in order to satisfy the consumer's needs in terms of functional and non-functional properties are not fully studied in the literature. In fact, most of the existing work considers the data flow in controlled environments where fog nodes are devices with specialized elements (e.g., GPU, CPU, and RAM) (Dey & Mukherjee, 2018) and computation and data are properly distributed to reduce latency (Verma, Yadav, Motwani, Raw, & Singh, 2016), energy consumption (Duy La, Ngo, Dinh, Quek, & Shin, 2018), resource utilization (Lai, Song, Hwang, & Lai, 2016), or data size (Al-Doghman, Chaczko, & Jiang, 2017). The goal of DITAS is to focus on a broader environment in which providers and consumers belong to different organizations and no
control over the network is possible (Salman, Elhajj, Chehab, & Kayssi, 2018). In this context, the literature is limited and only a few approaches to distributing the computation have been proposed (Pham & Huh, 2017; Vidyasankar, 2018).
Security and privacy are particularly relevant for a cloud-native platform such as DITAS.
New regulations, such as the European General Data Protection Regulation
(GDPR), specify new and challenging data governance requirements for data-
intensive platforms and applications. (Bertino & Ferrari, 2018) and (Colombo &
Ferrari, 2018) broadly describe the current research in the field of Big Data secu-
rity and privacy. Specifically, when providing access to the data, the regulations
require taking into account new concepts such as the consent given by the
individual who provided the data, known as the data subject, and the usage of
the data, known as data usage purpose.
Existing access control tools either use compliance checks that do not com-
pletely match the new and complex requirements that GDPR introduces or are
limited in their scalability. Most of the existing solutions apply coarse-grained protection, controlling access at the level of a whole data object. Tools that provide fine-grained compliance at the granularity of specific cells do so either by making a decision for each row separately (Thi, Si, & Dang, 2018), and are thus limited in their scalability in the data lake, or by creating static views (Martínez, Fouche, Gérard, & J., 2018) for each possible scenario, a solution that does not work for a wide set of request attributes with multiple possible values.
To address the described issues, as part of our work on DITAS we have developed a technique for efficient privacy policy enforcement that takes data subject consent into account while allowing analytics on large-scale data. In (Khaitzin, Shlomo, & Anderson, 2018) we give an overview of the technique: we add a pre-computation phase in which we compile the policies and parts of the supplementary data (e.g., consents, profiles), keeping only the parts relevant to the policy decisions. We thus obtain a compiled representation that can be used efficiently at query time. The result is stored as close to the data as possible, in an accessible form. We use this technique in the Privacy Enforcement Engine.
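The following Python sketch illustrates the two-phase idea at toy scale; the data structures and function names are our own illustration, not the actual Privacy Enforcement Engine interface.

# Pre-computation phase: compile consents into a compact lookup keyed by
# (purpose, data category), keeping only what policy decisions need.
def compile_policies(consents):
    """consents: iterable of (subject_id, purpose, category) triples."""
    compiled = {}
    for subject_id, purpose, category in consents:
        compiled.setdefault((purpose, category), set()).add(subject_id)
    return compiled

# Query-time phase: a cheap membership test per row replaces a full policy
# evaluation, which keeps analytics on large-scale data feasible.
def allowed(compiled, subject_id, purpose, category):
    return subject_id in compiled.get((purpose, category), set())

compiled = compile_policies([
    ("patient-17", "research", "blood-pressure"),
    ("patient-42", "research", "blood-pressure"),
])
assert allowed(compiled, "patient-17", "research", "blood-pressure")
assert not allowed(compiled, "patient-99", "research", "blood-pressure")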
With the enactment of the GDPR, formal audits and vendor certifications such as EuroCloud [1] or Cloud Security Alliance [2] have become relevant for cloud service providers. Research in this area, especially with regard to improving the transparency and accountability of cloud providers, has become more critical. However, such approaches are not yet suited to the advent of Fog and Edge Computing (Bermbach, et al., 2017): these dynamic environments are not as easily audited, especially when devices move.
Therefore, other technologies are needed to establish trust and transparency in
these environments, including trusted computing (Sadeghi & Stüble, 2004), “real-time auditing” (Ko, Lee, & Pearson, 2011; Ullah, Ahmed, & Ylitalo, 2013; Doelitzscher, et al., 2012), and monitoring approaches such as (Alcaraz Calero & Aguado, 2015), (Sharma, Chatterjee, & Sharma, 2013), or the one established in DITAS.

[1] EuroCloud StarAudit, https://staraudit.org/
[2] Cloud Security Alliance, Security, Trust & Assurance Registry Certification, https://cloudsecurityalliance.org/star/certification/
Further techniques such as multi-party computation (Furukawa, Lindell, Nof, &
Weinstein, 2017), secret splitting (Shamir, 1979), and property-preserving encryption (Pallas & Grambow, 2018) are also possible ways to aid GDPR compliance computationally.
Additionally, concepts like “Sticky Policies” (Pearson & Mont, 2011), “Distributed
Usage Control” (Pretschner, Hilty, & Basin, 2006) also fit the DITAS context. Furthermore, advanced consent control strategies such as the one presented in (Ulbricht & Pallas, 2018) also offer ways to support GDPR-compliant applications.
Besides these techniques, practical implementations such as the apps presented by Lodge et al. (Lodge, Crabtree, & Brown, 2018), as well as the edge access control management presented by Werner et al. (Werner, Pallas, & Bermbach, 2017), show potential for the security and privacy practices in DITAS.
All of these approaches introduce tradeoffs that have to be analyzed and evaluated; for example, (Pallas & Grambow, 2018) showed the performance penalties of privacy-preserving databases. A platform such as DITAS needs to offer means to select the appropriate tradeoff for each use case and guidance for a data administrator to select the best-fitting technology.
2.2 Data Management in Fog Computing
Data management in DITAS aims to suggest and provide to the application de-
signer the most suitable data set considering the application and user require-
ments. Requirements are related to data utility and security and privacy aspects.
The following sections discuss the existing contributions in data utility and security
research areas and highlight the innovative aspects of the DITAS approach.
2.2.1 Data utility
In the first period of DITAS the concept of Data Utility has been introduced and
defined as “the relevance of a data set for the usage context” (Cappiello,
Pernici, Plebani, & Vitali, 2017) where the context includes the application re-
quirements and the resources used to host the data source. Such a definition was
proposed considering previous literature contributions that were using the term
Data Utility. In fact, the concept of Data Utility has been used in several contexts.
For the general IT context, Data Utility has been defined by (Kock, 2007) consid-
ering both the relevance of a piece of information to the context and the capa-
bility of such piece of information to reduce uncertainty. In the business scenario
Data Utility has been instead defined as the business value attributed to data
within specific usage contexts (Syed & Syed, 2008). A more complex definition
has been provided in the statistics context by (Hundepool, et al., 2012): Data
Utility is “a summary term describing the value of a given data release as an an-
alytical resource. This comprises the data's analytical completeness and its ana-
lytical validity”. All these definitions agree on the fact that the utility of a data set
depends on the context in which data are used.
An important characteristic of the Data Utility concerns its variation with respect
to the specific goal of the data analysis. As an example, Data Utility is often an-
alyzed for data mining applications (Lin, Wu, & Tseng, 2015) and defined consid-
ering the different data mining techniques. In this research area the tradeoff be-
tween data utility and data privacy is often considered. In fact, in order to
guarantee data privacy, data anonymization techniques have to be applied:
hiding data values influences the effectiveness of data mining algorithms. (Han,
J., J., H., & J., 2017) proposes an anonymization method that is able to guarantee
higher utility, i.e., better classification accuracy. A method able to accomplish a
good balance between privacy and utility in the context of association rules was
proposed in (Kalyani, V. P. Chandra Sekhara Rao, & Janakiramaiah, 2017). More-
over, Data Utility might be influenced by the quality of service and the quality of
data. For instance, the relation between Data Utility and quality of service has
been investigated in (Wang, Zhu, Bao, & Liu, 2016), which discusses Data Utility
with a focus on energy efficiency of mobile devices in a mobile cloud-oriented
environment. The issue of energy efficiency for discovering interrelations be-
tween the evaluation of the data value and the effectiveness of run-time adap-
tation strategies has been discussed in (Ho & Pernici, 2015). Similarly, the influence
of data quality on Data Utility is considered in (Moody & Walsh, 1999), where relevant quality dimensions (e.g., accuracy and completeness) are considered in relation to Data Utility. Note that data quality (and thus data utility) assessment depends on the type of data and on the type of application. The relationship between data quality and data mining algorithms has been analyzed by (Blake & Mangiameli, 2011), while (Even, Shankaranarayanan, & Berger, 2010) focused on the impact of the four main data quality dimensions (accuracy, completeness, consistency and timeliness) on clustering algorithms. The study highlights that consistency, completeness and accuracy issues are the ones that negatively impact the results.
Finally, Data Utility has also been analyzed with respect to the relation between IT and business, which has paved the way for associating Data Utility with business processes. In this context, Data Utility is defined as a measurement of the gain obtained by using a dataset inside an organization (Even, Shankaranarayanan, & Berger, Inequality in the utility of customer data: implications for data management and usage, 2010). Moreover, (Giorgini, Mylopoulos, Nicchiarelli, & Sebastiani, 2003) discusses the information quality requirements needed to obtain reliable results from the execution of business processes.
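As a purely illustrative example of this context dependence (not the DITAS data utility definition, which is given in the WP2 deliverables), a utility score can be sketched in Python as a context-weighted combination of quality dimensions:

def data_utility(quality, context_weights):
    """Context-dependent utility as a weighted average of quality scores.

    quality:         {dimension: score in [0, 1]}
    context_weights: {dimension: weight}, reflecting the usage context
    """
    total = sum(context_weights.values())
    return sum(context_weights[d] * quality.get(d, 0.0)
               for d in context_weights) / total

quality = {"accuracy": 0.9, "completeness": 0.7,
           "consistency": 0.95, "timeliness": 0.4}
# The same data set scores differently in a real-time monitoring context
# (timeliness matters) and in an offline analytics context (completeness).
print(data_utility(quality, {"timeliness": 3, "accuracy": 1}))    # 0.525
print(data_utility(quality, {"completeness": 3, "accuracy": 1}))  # 0.75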
2.2.2 Security and privacy mechanisms
As described above, most security and privacy mechanisms introduce tradeoffs regarding system performance in general and non-functional requirements like GDPR compliance or transparency concerning data processing and the flow of data/information. To handle these tradeoffs properly and to make informed decisions regarding the development and/or integration of specific privacy and security mechanisms into new systems, knowledge about the systematic quantification of the risks of possible data breaches or unintentional data leakage is needed. Since the usage of Likert scales (e.g., from “very low” to “very high”) to rate risk in common security and privacy impact assessments is far too imprecise, this issue has recently gained attention in several research communities.
New approaches to measuring the value of privacy and the efficacy of privacy-enhancing technologies (PETs) (Halunen & Karinsalo, 2017), and a valuable “Systematization of Knowledge” regarding technical privacy metrics (Wagner & Eckhoff, 2018), lead to new ways of quantifying privacy risks and therefore to better privacy impact assessments (Wagner & Boiten, Privacy Risk Assessment: From Art to Science, 2018). The results of these assessments can be
used to make better decisions for or against the integration of a specific privacy
mechanism in order to get the right balance between system performance and
security as well as privacy requirements.
In DITAS we will evaluate these new approaches in order to gain insights and inspiration for the development of metrics to rank the different privacy and security mechanisms that can be added to the VDC during the deployment phase. To make an informed decision about the choice of a specific blueprint, a ranking mechanism should be used to determine which of the available blueprints best fits the requirements of the respective application designer.
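A minimal sketch of such a ranking, assuming every blueprint advertises normalized scores for a set of metrics and the application designer supplies weights; the metric and blueprint names below are placeholders, not DITAS identifiers.

def rank_blueprints(blueprints, weights):
    """Return blueprint names ordered by weighted score, best fit first.

    blueprints: {name: {metric: score in [0, 1]}}
    weights:    {metric: relative importance}
    """
    def score(metrics):
        return sum(w * metrics.get(m, 0.0) for m, w in weights.items())
    return sorted(blueprints, key=lambda name: score(blueprints[name]),
                  reverse=True)

candidates = {
    "bp-anonymized": {"privacy": 0.9, "performance": 0.5},
    "bp-plaintext":  {"privacy": 0.2, "performance": 0.9},
}
# A privacy-sensitive application designer weights privacy heavily.
print(rank_blueprints(candidates, {"privacy": 0.7, "performance": 0.3}))
# -> ['bp-anonymized', 'bp-plaintext']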
2.3 Data as a Service
2.3.1 Interface description language
According to the DITAS architecture, the Virtual Data Container interacts with
the data-intensive applications through the Common Accessibility Framework
API, the programming model of which is REST-oriented. The data administrator is
in charge of designing the API as well as making it publicly available, via the
abstract VDC blueprint. In fact, the EXPOSED API section of the blueprint includes
all the information about the methods through which the administrator exposes, totally or partially, the data stored in the sources that he/she controls. EXPOSED API is a technical section that enables the application developer to understand how the VDC methods work and therefore to conclude whether the VDC fits his/her DIA from a development point of view (D3.2, 2018).
As a result, there was a need to provide a structured description of the CAF RESTful API, using a specification that allows both humans and computers to discover and understand the capabilities of each VDC method. Some of the most popular standards in that direction are the following (Petychakis, et al., 2014):

- The OpenAPI specification (originally known as the Swagger specification), which offers a large ecosystem of API tooling, has great support in almost every modern programming language, and allows developers to test the APIs immediately through easy deployment of server instances.
- API Blueprints, where an API description can be used in the Apiary platform to create automated mock servers, validators, etc.
- The Hydra specification, which is currently under heavy development and tries to enrich current web APIs with tools and techniques from the semantic web area.
- RAML (RESTful API Modeling Language), which provides a structured, unambiguous format for describing a RESTful API, allowing developers to describe the API: the endpoints, the HTTP methods to be used for each one, any parameters and their format, what can be expected by way of a response, etc. (Tsouroplis, et al., 2015).

Since a critical business requirement in DITAS concerning the VDC is to have an open API, so that big vendors as well as new providers are able to publish their services and components, we decided to describe the CAF API based on the OpenAPI specification (OAS). In fact, we share the objective of the OpenAPI Initiative: creating an open description format for API services that is vendor-neutral, portable and open, accelerating the vision of a truly connected world (Lucky, Cremaschi, Lodigiani, Menolascina, & De Paoli, 2014). Furthermore, the OAS project has the largest and most active developer community on GitHub (Surwase, 2016).
However, we propose an extension of the OAS, in order to address major require-
ments of the project, thus supporting the data movement techniques that we
introduce. Existing suggestions to extend the OAS aim at enhancing actual API
descriptions by creating a simple description format to annotate properties at
semantic level to support semi-automatic composition (Lucky, Cremaschi,
Lodigiani, Menolascina, & De Paoli, 2014), or implementing the FAIR principles
(Findable, Accessible, Interoperable, Reusable), introducing additional
metadata elements beyond those included in the OAS (Zaveri, et al., 2017).
In DITAS, the structure of the abstract blueprint is method-oriented, meaning that each exposed VDC method is semantically described by separate tags and has its own guaranteed levels of data quality, security and privacy. Moreover, the rules in the form of goal trees that are used to construct the SLA contract differentiate between the methods. Consequently, the extensions of the OAS that we suggest are mainly applied to the so-called Operation Object, which in our case corresponds to the VDC method. Indicatively, through the definition of the extended operation, the data administrator must specify, among other things, the data sources that the VDC method accesses as well as the schema of the data included in the response payload. This kind of information is necessary for the platform in order to decide which portion of data to move, given the specific method selected by the application designer.
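For illustration, a fragment of such an extended Operation Object could look as follows, written here as a Python dict for readability. The x-data-sources key is a placeholder of our own; the actual extension fields are specified in D3.2 (D3.2, 2018).

# Hypothetical extended OpenAPI Operation Object for one VDC method.
get_blood_pressure_avg = {
    "summary": "Average blood pressure of a patient over a period",
    "operationId": "getBloodPressureAvg",
    # Placeholder extension: the data sources this VDC method accesses.
    "x-data-sources": ["patients-db", "measurements-stream"],
    "responses": {
        "200": {
            "description": "Aggregated measurement",
            "content": {
                "application/json": {
                    # Schema of the response payload: the platform needs it
                    # to decide which portion of the data to move.
                    "schema": {
                        "type": "object",
                        "properties": {
                            "patientId": {"type": "string"},
                            "avgSystolic": {"type": "number"},
                            "avgDiastolic": {"type": "number"},
                        },
                    },
                },
            },
        },
    },
}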
2.3.2 Goal models for data and computation movement
In DITAS the decision of where and when to move data and computation is based on an ad-hoc extension of a goal-based modelling language. Goal models represent sets of objectives (i.e., goals) organized in a tree structure where the root is the main objective and the leaves are the refined sub-objectives. In particular, we use the goal model structure to specify the requirements of application designers; we then enriched the language to support the decision of which data movement or computation movement to enact in case the requirements of the user are no longer satisfied. For more information on this approach please refer to Deliverable 2.2 (D2.2, 2018).
A great variety of analysis techniques have been proposed for analyzing goal models for this purpose (Horkoff & Yu, Interactive goal model analysis for early requirements engineering, 2016). Satisfaction analyses propagate the satisfaction or denial of goals forward and backward in the goal tree structure. Forward propagation (Letier & Van Lamsweerde, 2004) (top-down) can be used to check alternatives: if a certain goal is (not) satisfied, which sub-goals are (not) satisfied. Backward propagation (Sebastiani, Giorgini, & Mylopoulos, 2004) (bottom-up) can be used to understand the consequences of a satisfied or denied goal. Some satisfaction analyses mark each goal with a label representing its level of satisfaction, for example: satisfied, partially satisfied, denied or unknown (Giorgini, Mylopoulos, Nicchiarelli, & Sebastiani, 2003; Chung, Nixon, Yu, & Mylopoulos, 2012).
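As a simplified illustration, satisfaction can be propagated through an AND/OR goal tree as in the following Python sketch, which uses plain boolean labels instead of the four-level labels mentioned above.

def satisfied(goal, leaf_labels):
    """goal is ("leaf", name), ("and", [subgoals]) or ("or", [subgoals])."""
    kind, payload = goal
    if kind == "leaf":
        return leaf_labels[payload]
    results = [satisfied(sub, leaf_labels) for sub in payload]
    return all(results) if kind == "and" else any(results)

# Root goal: data delivered on time AND (source on edge OR source on cloud).
tree = ("and", [
    ("leaf", "data delivered on time"),
    ("or", [("leaf", "source on edge"), ("leaf", "source on cloud")]),
])
print(satisfied(tree, {"data delivered on time": True,
                       "source on edge": False,
                       "source on cloud": True}))  # True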
Other research work uses goal models and their analysis for business intelligence
(Horkoff, et al., 2014; Amyot, et al., 2010). Goal models are enriched with metrics
that indicate values associated with the achievement of goals. For example, the
goal Sell trips of a travel agency may be associated with the metric Number of
trips sold, which indicates a close relationship between the satisfaction of the
goal and the number of trips sold by the travel agency.
Goal-based modelling languages often include contribution links that represent
positive or negative consequences (Horkoff & Yu, 2016). A contribution link that
connects two goals specifies that the achievement of one goal contributes positively to the achievement of the other. For example, in a travel company, the goal Advertise campaign performed can be connected to the goal Trips sold with a positive contribution link, since the achievement of the former will help the achievement of the latter. Such contribution links can be used to identify conflicts between goals and, along with metrics, to choose the best set of goals to achieve (Amyot, et al., 2010; Horkoff, et al., 2012).
Some goal-based analyses are used to assess alternatives for decision making (Letier & Van Lamsweerde, 2004). This includes the design of data-intensive applications, where it is essential to define the objective(s) of an application in order to select the best data sources and the metrics to monitor. In the context of fog computing, such information and decisions can also be used at runtime, in order to select the best data movement action to adopt when one or more metrics reach critical values. For example, in the case of applications that use data streams, the quality of the stream should be maximized. This can be monitored using a metric that measures the throughput of the connection. Whenever the value of the metric drops below a certain threshold, a data movement technique that consists in moving the source nearer to the application (for example from the cloud to the edge) may be adopted. Such a reparation action will bring the metric back within the desired range.
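The decision logic of this stream example can be sketched as follows. This is illustrative only: in DITAS the actual decision is taken by the DS4M (Decision System for Movement), and the threshold and action names are placeholders.

THROUGHPUT_THRESHOLD_MBPS = 10.0  # placeholder threshold from the goal model

def choose_movement_action(throughput_mbps, source_location):
    """React to a monitored throughput metric crossing its threshold."""
    if throughput_mbps >= THROUGHPUT_THRESHOLD_MBPS:
        return None  # goal satisfied, nothing to enact
    if source_location == "cloud":
        # Move the source nearer to the application, as in the example above.
        return "move-data-to-edge"
    return "no-movement-available"  # already at the edge; other tactics apply

print(choose_movement_action(4.2, "cloud"))  # -> move-data-to-edge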
2.3.3 Service-Level Agreement
To cover the Service-Level Agreement (SLA) part of DITAS we need to cover three DITAS components: the SLA Manager, the Computation Movement Enactor (not yet developed) and the Data Movement Enactor (already covered by other parts of this document). The SLA Manager in DITAS is responsible only for monitoring SLA violations; the actions to resolve them are taken care of by the two previously mentioned Enactors.
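A minimal sketch of this separation of concerns, with hypothetical interfaces: the SLA Manager only detects which agreed terms are violated and notifies the Enactors, which decide on the corrective actions.

def check_sla(observed, agreed):
    """Return the violated terms.

    observed: {term: measured value}
    agreed:   {term: (comparator, threshold)}, e.g. {"availability": (">=", 0.99)}
    """
    ops = {">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b}
    return [term for term, (op, limit) in agreed.items()
            if not ops[op](observed.get(term, 0), limit)]

def notify_enactors(violations, enactors):
    # The SLA Manager stops here; resolving the violation is up to the
    # Data/Computation Movement Enactors.
    for term in violations:
        for enactor in enactors:
            enactor(term)

agreed = {"availability": (">=", 0.99), "response_time_ms": ("<=", 200)}
observed = {"availability": 0.97, "response_time_ms": 150}
notify_enactors(check_sla(observed, agreed), [print])  # prints: availability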
At the beginning of the project, the main idea was to continue using the WS-Agreement specification (Open Grid Forum, 2014), which partners of the project had used before; however, due to the technical constraints of building an SLA Manager lightweight enough to run on Edge nodes, it was decided to build the component from scratch, on top of the goal-based model specified in the previous section.
In recent years several works have focused on Service-Level Agreement systems for Fog environments, especially concerning computation and the limited number of resources at the Edge. Katsalis et al. (Katsalis, Papaioannou, Nikaein, & Tassiulas, 2016) study optimization techniques for the deployment of virtual machines in a mobile-cloud environment where the number of resources is limited; the main objective is to select the best deployment strategy to achieve the requirements of the applications, considering aspects such as the network. Taleb et al. (Taleb, Dutta, Ksentini, Iqbal, & Flinck, 2017) employ the concept of Mobile Edge Computing (MEC) to enable an application to achieve its QoS, allowing the application to access any data anywhere with reduced latency, everything controlled by a complete SLA and monitoring system. Yin et al. (Yin, Cheng, Cai, & Cao, 2017) present work similar to that of Katsalis, where it is necessary to manage limited resources in Cloud-Edge environments, but they also consider that the applications execute time-sensitive jobs.
The DITAS SLA Manager focuses more on data quality aspects, which we think is part of the uniqueness of DITAS with respect to other Cloud-Edge environments. In the literature we found little work related to this topic.
3 Update to Market Analysis
3.1 Market Overview
Nowadays we are experiencing the tremendous success of the IoT paradigm, resulting in approximately 20 billion devices constantly producing data that must eventually be processed and stored. Gartner [3] predicts that there will be 20.4 billion IoT devices installed by the end of 2020, not including computers and smartphones, and within three years up to 50 billion devices are expected at the edge of the network. Most IoT deployments face challenges related to latency, network bandwidth, reliability and security that cannot be addressed in traditional cloud models; because of that, it is easier to process the data where they are produced, namely at the edge of the network.
The rapid adoption of IoT is seen by companies as an opportunity for data-driven businesses, and a combination of Cloud and Edge Computing is becoming the accepted architecture to approach the challenges related to integrating data from multiple sources and processing data in motion and at rest. The Fog Computing market opportunity is expected to reach $18bn by 2022 [4], growing from $1.032bn in 2018; the markets with the most potential are energy/utilities and transportation, followed by the Healthcare and industrial markets. However, each market will adopt new Edge solutions, standards and products at its own pace, facing its own barriers.
Figure 1: Size of Fog computing market opportunity by vertical market, 2019 and 2022
[3] https://www.gartner.com/en
[4] https://www.openfogconsortium.org/wp-content/uploads/451-Research-report-on-5-year-Market-Sizing-of-Fog-Oct-2017.pdf
3.1.1 Market Segmentation
The Fog Computing market is segmented between Solutions and Applications. Solutions include the hardware, software/applications and services around this technology:

Hardware: edge devices with the capacity to participate in a Fog system (connectivity, application software, computing hardware, etc.).

Software/applications: software and applications that give devices the capacity to connect and communicate with other devices securely, IoT applications to reliably integrate IoT sensors and the cloud, applications to distribute the data flow between Cloud and Edge, etc.

Services: new business models are arising, such as Fog-as-a-Service, where the vendor leases an outcome (hardware/software/services) to an end customer.

Figure 2: Fog Market Segmentation

The Fog market solutions segment is split into hardware components (51%), application software (Fog-enabled analytics, 19.9%) and Fog services (15.7%) [5]. 451 Research forecasts that the hardware percentage will decrease over time as different Fog services and application software emerge.

On the other hand, several applications in vertical markets that use Cloud and IoT technologies and need real-time data can benefit from the elastic resources at the edge that Fog computing offers. The next section explores these applications and how they take advantage of Fog computing.
3.2 Applications with a Fog Computing approach
The ideal applications for a Fog computing approach are those that require intelligence near the edge, run in dispersed areas with poor connectivity, create an amount of data too large to stream to the cloud, or manage many connected sensors.

According to the OpenFog Consortium [6], there are four application areas in different sectors in which Fog Computing brings multiple benefits for processing data in real time.

Next, we briefly describe these applications, which can potentially use Fog Computing.
5 https://www.openfogconsortium.org/wp-content/uploads/451-Research-report-on-5-year-Market-Sizing-of-Fog-Oct-2017.pdf
6 https://www.openfogconsortium.org
Figure 2: Fog Market Segmentation
3.2.1 Connected Vehicles
The transportation sector has the potential to reach $3.2B by 20227, making it the second-largest potential market for Fog Computing. Transportation applications have key characteristics suited to Fog Computing, such as mobility, intermittent connectivity and the need for real-time responses, and the most emerging application in this sector is autonomous/connected cars.
According to Forbes8, 20M self-driving cars are expected on the roads, and by 2030 it is estimated that one in four cars will be autonomous. This means a considerable amount of data to process. Intel9 predicts that by 2020 each autonomous car will generate more than 4,000 GB per day. But not all of this data needs to be processed in the Cloud, so Fog Computing can provide a solution with a set of network resources upon which connected cars can run their computation and storage needs. A Fog architecture will improve efficiency, performance, bandwidth, speed and reliability for connected cars.
3.2.2 Smart Cities
According to a recent report, the global smart city market is expected to reach $2,700.1 billion globally by 2024, growing at a CAGR of around 16% between 2018 and 202410. Smart cities powered by IoT technology promise to transform the way we live, but IoT must manage a vast number of connected sensors in a reliable and timely way, particularly for critical functions. Moreover, municipal networks manage sensitive citizen and traffic data as well as critical data for emergency response.
Fog Computing has become the solution to improve the reliability of delay- and data-intensive applications developed for smart cities.
3.2.3 Connected Healthcare
The healthcare industry is one of the three largest potential opportunities for Fog Computing. Connected Healthcare enables patient engagement and reduces healthcare system costs while improving healthcare services, among other benefits, and the boom of sensors, smart health devices, health apps and IoT technologies provides the basis for it.
Fog Computing enables IoT platforms to monitor patient health variables in real time, allowing fast responses to patients' needs.
The biggest challenge to face in the healthcare environment is the management of sensitive data and the exchange of health records with a maximum level of security.
7 https://www.openfogconsortium.org/wp-content/uploads/451-Research-report-on-5-year-Market-Sizing-of-Fog-Oct-2017.pdf
8 https://www.forbes.com/sites/oliviergarret/2017/03/03/10-million-self-driving-cars-will-hit-the-road-by-2020-heres-how-to-profit/#641f424b7e50
9 https://www.intel.com/content/www/us/en/automotive/autonomous-vehicles.html
10 https://globenewswire.com/news-release/2018/08/23/1555932/0/en/Global-Smart-City-Market-Will-Reach-USD-2-700-1-Billion-By-2024-Zion-Market-Research.html
3.2.4 Smart Manufacturing
The Industrial Internet of Things (IIoT) has emerged as a technology to help increase productivity and performance in manufacturing. The management of data at rest and in flight is crucial for this kind of industry, for instance to carry out predictive maintenance or stock control. A large number of sensors are located in a plant and have to be monitored in real time and connected with other departments of the organization or with other plants. A mix of Cloud computing, for the storage of historical data, and Fog computing is the technology combination needed to integrate IIoT platforms in the manufacturing industry.
3.3 Use Cases Market Study
The DITAS project validates its solutions in two use cases with data-intensive application needs: e-Health and Smart Manufacturing. The purpose of testing project results in real scenarios is threefold: a) to validate the value of the DITAS outcomes in the real world, b) to extract knowledge from the validation in these use cases and provide the partners' use cases with new business opportunities, and c) to promote the sustainability of the outcomes after the project.
In the next sections, a market analysis of the vertical sectors to which the use cases belong has been carried out. This study aims, in each case, at analyzing the market context, identifying stakeholders, challenges and concerns, and surveying existing comparable solutions or potential competitors.
3.3.1 e-Health
The digital transformation of all industries is based on harnessing data, either historical or in real time. The adoption of IT technologies in healthcare systems is following the same pattern as in other sectors; although the drivers for adopting new technologies differ for each industry, all of them identify the advantages of fully exploiting data with new technologies.
With respect to the digital transformation of healthcare, the European Commission carried out a public consultation (European Commission, Public Consultation on Health and Care in the Digital Single Market, 2017), finishing in October 2017, investigating the need for policy measures promoting digital innovation for better healthcare in Europe. The main results of this public consultation show that over 93% of the respondents believe that “Citizens should be able to manage their own health data”. Furthermore, 83% of all respondents either agree or strongly agree with the statement that “Sharing of health data could be beneficial to improve treatment, diagnosis and prevention of diseases across the EU”. The overwhelming majority of all respondents (73.6%) identify improved possibilities for medical research as a reason for supporting cross-border transfer of medical data. Risks of privacy breaches and of cybersecurity incidents are at the top of the list of major barriers identified to the cross-border transfer of medical data (European Commission, Synopsis Report - Consultation: Transformation Health and Care in the Digital Single Market, 2018).
The global eHealth market is projected to reach US$132.35 billion by 2023, from $47.60 billion in 2018, at a CAGR of 22.7%, according to ReportsnReports11.
11 https://www.reportsnreports.com/reports/1385867-ehealth-market-by-product-ehr-pacs-vna-ris-lis-cvis-telehealth-erx-hie-patient-portal-medical-apps-services-remote-patient-monitoring-diagnostic-services-end-user-hospitals-home-healthcare-payers-st-to-2023.html
Regarding the EU, the value of the European data economy was €300 billion in 2016; if the right legislative and policy measures are put in place, this value could grow to up to €739 billion by 2020, 4% of the EU's GDP (European Commission, Final results of the European Data Market study measuring the size and trends of the EU data economy, 2017; European Commission, Data in the EU: Commission steps up efforts to increase availability and boost healthcare data sharing, 2018). The primary reasons boosting digitization are:
• Improvement of the patient experience: with digitalized services, patients will have full access to their health information and, using new technologies such as IoT wearables, will be monitored and receive personalized care. This will lead to a move towards a more proactive and prescriptive care and a patient-centric care approach.
• Expenditure reduction and service improvement in health systems and organizations: health systems want to improve the quality of their healthcare services while reducing costs, and digitization of the healthcare sector is a step towards this goal. This technological approach to healthcare will be needed to face the boom of chronic diseases and the growing geriatric population in the European Union.
• Appearance of new technologies: the Internet of Things (IoT) for healthcare is one of the major drivers of the e-Health market, and new technologies such as Cloud computing, Big Data and mobile wearables enable more efficient and rapid ways of delivering healthcare.
In the latest report published by Atos, Look Out 2020+ Industry Trends Healthcare12, experts predict that the healthcare market will be a data-intensive field, much more so than other vertical sectors, and they agree that the technologies needed to succeed are Cloud Computing, AI and IoT.
But the use of new technologies also brings multiple dangers, regulatory obligations and privacy concerns, especially with the sensitive data managed in healthcare environments. Data breaches can create high risks for patients, as well as penalty fees under the new General Data Protection Regulation (GDPR). A data breach can cost millions (the average response and remediation could be up to $3.8M), according to the Ponemon Institute study13.
3.3.1.1 Disruptive technologies for the future of e-Healthcare
Atos has predicted14 the 10 disruptive technologies that will impact the future of healthcare.
12 https://atos.net/content/mini-sites/look-out-2020/healthcare/
13 https://www.ponemon.org/news-2/23
14 https://atos.net/content/mini-sites/look-out-2020/
Figure 3 depicts the 10 disruptive technologies envisioned, classified according to their integration status in e-Healthcare systems. Some of them, such as AI, Robotics or Augmented Reality, are being adopted, while Hybrid Cloud is currently considered mainstream with a high impact.
Figure 3: Disruptive technologies in Healthcare
Cloud services can offer security and privacy controls for health systems and data, and cloud-based healthcare IT systems can solve issues regarding interoperability and integration. Moreover, cloud services enable rapid development for mobile and IoT, and cloud computing can support AI applications.
3.3.1.2 Healthcare market stakeholders
The healthcare supply chain comprises providers (hospitals, clinics and physicians), payers (insurance companies, governments and regulatory bodies), distributors, manufacturers, and patients (see Figure 4).
The digitization of the healthcare market has led to communication and data sharing among the stakeholders and between each stage of the supply chain. For healthcare systems this is highly valuable since it allows, for instance, monitoring patients in real time so that providers can respond to their needs faster.
All the stakeholders in the supply chain agree that sharing data has a great impact on factors such as better and more efficient healthcare services, improved quality of patient care, lower costs and increased revenues.
Figure 4: Healthcare supply chain15
A preliminary description of the stakeholders involved in the Healthcare supply
chain is:
Providers (hospitals and healthcare systems): the EHR systems market is growing very fast and is expected to reach $5.20B by 2021, from $3.92B in 2016, at a CAGR of 5.8% during the forecast period16.
Contrary to other ICT-based clinical systems, EHR adoption is increasingly accepted by hospitals and clinicians. EHRs store patient data such as radiology images, medications, historic health reports, etc. This information is used by the hospital staff, but other stakeholders can use these data to improve diagnosis, for clinical trials, for predictive medicine, etc.
Payers (government, policy makers): public health administrations and regulatory bodies develop policies, invest in infrastructures, and at the same time use ICT-based healthcare solutions.
Payers (researchers, pharmaceutical industry, etc.): data from different sources and from a vast number of patients will improve clinical trials for the pharmaceutical industry and increase the medical information available to researchers.
Patients: patients are the final beneficiaries of sharing data and of the insights derived from processing and analyzing them. A high demand for health monitoring applications is expected, along with increasing patient empowerment through the use of their own HHRs.
Distributors: Service providers play different roles in the Healthcare system. Some
of the solutions or services offered by them are:
• Secure and reliable services to store and share EHRs
• Mobile solutions to monitor and record patient information in real time
• Telemedicine services
• Digital e-Health Platforms with integrated Healthcare services
• Interoperability solutions to share information
15 https://rctom.hbs.org/submission/impact-of-digitalization-on-healthcare/
16 https://www.marketsandmarkets.com/Market-Reports/ambulatory-ehr-market-235617627.html
• Applications focused on different diseases and pathologies, for diagnosis or prevention
Manufacturers: they develop and deliver medical devices, smart sensors, monitoring devices, robots for healthcare, etc.
3.3.1.3 Challenges and concerns of Healthcare data sharing
Sharing patient health data can help the different stakeholders of the healthcare supply chain. For instance, hospitals can reduce readmission costs or avoid medical errors, and public health agencies can improve population health and provide new services for patient care.
However, healthcare stakeholders must consider regulations such as HIPAA and ensure data privacy and safety. These privacy issues are great concerns not only for patients but also for providers: both know the benefits of sharing clinical data but have many doubts about data privacy. Thus, one of the greatest challenges in the adoption of e-Health is sharing clinical data among stakeholders more effectively.
Currently, Health Information Exchange (HIE) technology provides the solution for sharing clinical data within organizations and the healthcare community, together with the option of an integrated platform17 that uses open standards and cloud-based architectures to integrate different applications and hospital departments. Moreover, open standards facilitate connections with other hospital networks.
Figure 5: HIE system18
HIE systems help to share EHRs and other patient data within hospitals, with other health organizations, etc. These are records of the patient's clinical data gathered while in hospital: imaging tests, blood tests, medication, etc. But
17 https://www.optum.com/content/dam/optum3/optum/en/resources/white-papers/Sharing_Clinical_Data_White_Paper.pdf
18 http://www.atlantiscgpr.com/?page_id=20
patients have begun to collect data about their health and diseases with wearable devices, sensors and apps. This information offers new opportunities for researchers who want to gather and analyze it to carry out clinical trials or research. While EHRs offer organized and structured health data, the data collected from patients are dispersed and difficult to share among health stakeholders.
Nowadays, healthcare digitization is far from having tools to share all the information available across the entire supply chain and to create new business models for selling or buying data for medical purposes to the pharmaceutical industry or researchers. That is the next step to fully exploit health data.
Another aspect to consider concerning healthcare data sharing is the fact that the European Commission, in its mid-term review on the implementation of the Digital Single Market strategy, set out the intention to take further action in the area (among others) of “citizens' secure access to and sharing of health data across borders” (European Commission, Communication on enabling the digital transformation of health and care in the Digital Single Market; empowering citizens and building a healthier society, 2018). In particular, the Commission will support the eHealth Digital Service Infrastructure (European Commission, eHDSI Mission, 2018) to enable new services for people, such as the exchange of electronic health records using the specifications of the European electronic health record exchange format, and the use of the data for public health and research.
3.3.1.3.1 EHR and health services providers
Healthcare players are working on developing health information sharing solutions, most of them focused on security and privacy for EHR sharing. Other solutions focus on providing different health services with different business models.
Some existing solutions are:
▪ 4medica19: 4medica aims at securely exchanging health information in real time.
▪ NextGen Healthcare20: several software solutions aimed at solving interoperability, health information exchange, analytics, etc. in connected health environments.
▪ Greenway Health LLC21: Greenway Health LLC develops different services and solutions for the health community, from EHR sharing to patient engagement tools.
▪ Siemens Medical Solutions22: Siemens is a leader in integrated health solutions and services and healthcare IT.
▪ GE Healthcare23: General Electric offers a large portfolio of hardware and software solutions and services focused on connected health, for patients and practitioners.
19 https://www.4medica.com/
20 https://www.nextgen.com/
21 https://www.greenwayhealth.com/
22 https://www.healthcare.siemens.com/
23 https://www.gehealthcare.com/
▪ Allscripts Healthcare Solutions24: solutions for hospitals and health systems, and an innovative solution called ePrescribe aimed at reducing medical errors with an easy-to-use platform.
In contrast to the existing solutions in the market for sharing healthcare information, the DITAS project outcomes related to the e-Health use case will deliver a set of norms and rules associated with the privacy and security management of sensitive data, and data-intensive applications working in this domain will use the project results to take advantage of the data and computation movement strategies.
3.3.2 Industry 4.0
The global Industry 4.0 market size is expected to grow to $310B by 2023, at a 37% CAGR from 201825, and according to the “Industry 4.0 & Smart Manufacturing 2018-2023” report26, different key factors are responsible for this rapid growth: the need for connected supply chains, data-based manufacturing processes, and the increasing availability of emerging technologies and solutions such as Blockchain in manufacturing, Artificial Intelligence, Robotics, IIoT, condition monitoring and cybersecurity. Additionally, by the year 2022, 64% of manufacturers predict that their factories will be totally connected through the IIoT27.
Figure 6. Global Industry 4.0 Market28
According to BCG29, the impact of Industry 4.0 can be analyzed in the four areas in which the adoption of smart manufacturing is expected to bring the most benefit:
24 https://www.allscripts.com/
25 https://iot-analytics.com/industry-4-0-and-smart-manufacturing/
26 https://iot-analytics.com/product/industry-4-0-smart-manufacturing-market-report-2018-2023/
27 https://www.zebra.com/us/en/about-zebra/newsroom/press-releases/2017/zebra-study-reveals-one-half-of-manufacturers-globally-to-adopt-.html
28 Source: IoT Analytics-November 2018-Industry 4.0 Market Report 2018-2023
29 https://www.zvw.de/media.media.72e472fb-1698-4a15-8858-344351c8902f.original.pdf
Productivity: the integration of data-driven manufacturing processes will increase productivity; for instance, predictive maintenance will avoid downtime and will lead to better decision-making.
Revenue growth: the possibility of combining real-time information about planning, production, warehousing and transportation will lead to process optimization and increased revenues. By 2020, Industry 4.0 is expected to bring an average cost reduction of 3.6% p.a. across process industries globally, totaling $421 billion30.
Additionally, recent studies find that about 50% of the companies surveyed expect double-digit growth in revenues in the next five years, attributed directly to Industry 4.031, led by the manufacturing and engineering sectors.
Figure 7. Growth in revenue attributable to Industry 4.0 per industry sector32
Employment: the demand for employees in the manufacturing sector is expected to grow, although new skills will be required, and low-skilled laborers who carry out repetitive tasks will be displaced by workers with IT competencies such as software development, connectivity, etc.
Investment: adapting existing manufacturing processes to Industry 4.0 will require manufacturers to invest in new devices, software applications, and different services such as cloud services.
European industry is expected to invest €140bn annually in Industry 4.0 until 2020, according to PwC31, with manufacturing and engineering being the sectors with the highest expected annual investment.
30 https://www.pwc.com/gx/en/industries/industries-4.0/landing-page/industry-4.0-building-your-digital-enterprise-april-2016.pdf
31 https://www.pwc.com/gx/en/industries/industries-4.0/landing-page/industry-4.0-building-your-digital-enterprise-april-2016.pdf
32 https://www.pwc.nl/en/assets/documents/pwc-industrie-4-0.pdf
Figure 8. Annual investments in Industry 4.0 per industrial sectors33
3.3.2.1 Building blocks of Industry 4.0
Everything in Industry 4.0 is about data: gathering and analyzing data across machines to increase manufacturing efficiency, reduce costs and increase productivity and benefits, with a key role played by IoT. But it is not only IoT: cloud computing, big data, data analysis at the edge of the network (edge computing), data exchange, mobile, programmable logic controllers (PLCs), HMI and SCADA systems, and sensors and actuators are all key elements in creating smart factories.
Nine technology trends have been identified as forming the building blocks of Industry 4.034 (see Figure 9).
Figure 9. Nine Technologies transforming Industrial production35
Big Data Analytics: the collection and evaluation of data from different sources will be crucial for real-time decision-making.
33 https://www.pwc.nl/en/assets/documents/pwc-industrie-4-0.pdf
34 https://www.bcg.com/capabilities/operations/embracing-industry-4.0-rediscovering-growth.aspx
35 https://www.zvw.de/media.media.72e472fb-1698-4a15-8858-344351c8902f.original.pdf
Autonomous Robots: autonomous and collaborative robots are being used in smart manufacturing, working safely with humans and interacting among themselves as well.
Simulation: 3D simulation, used in prototyping during product development, will become widely used to optimize production and improve quality.
Horizontal and vertical system integration: Industry 4.0 will solve the problem of connecting the entire manufacturing supply chain and all the departments of the organization.
Cybersecurity: process data are crucial for companies, and the protection of their information systems and manufacturing processes is critical, so reliable communications and secure access to machines will be needed.
The Industrial Internet of Things: the explosion of IoT will make it possible to incorporate embedded computing into sensors and actuators and to connect them using standard technologies. This will lead to a smart manufacturing in which sensors connect with each other and with centralized controllers in real time.
Additive Manufacturing: Industry 4.0 will allow Additive Manufacturing (AM) to be more widely used, offering many production and construction advantages, such as high performance, complexity, etc.
Augmented Reality: it is expected that Augmented Reality will be able to provide workers with real-time information and help them make decisions in manufacturing processes.
The Cloud: smart manufacturing is more and more about harnessing data, which is why all the technologies involved in the growth of the Industry 4.0 market manage and process data to help improve the entire manufacturing process. To process and analyze such amounts of data, cloud-based software is required, and cloud computing is a crucial resource for smart manufacturing, also offering a platform for open-source collaboration.
At the same time, Edge Computing, mixed with Cloud computing, will achieve reaction times of just milliseconds.
3.3.2.2 Industry 4.0 Stakeholders ecosystem and solutions
The manufacturing sector used to be a market led by “Product/Control solution providers”, represented by strongly positioned industrial automation corporations such as Siemens, ABB, Rockwell, Yokogawa, Schneider, etc., which offered proprietary solutions. The increasing need for IT and connectivity solutions to implement Industry 4.0 has brought in other types of smart manufacturing stakeholders: “IT solution providers” and “Connectivity solution providers”.
IT solution providers deliver solutions for control, monitoring and data processing; examples of these companies are Microsoft, SAS, Oracle, IBM, Intel, etc. On the other hand, connectivity solution providers facilitate the implementation of technologies with connectivity demands; some of these companies are Cisco and Huawei.
The picture below showcases several brands that are currently investing in Industry 4.0.
Figure 10. The new Industry 4.0 stakeholders ecosystem36
Product and Control solution providers
As said before, large industrial automation corporations have been the leaders in the manufacturing sector for many years, offering proprietary solutions to control industrial processes. But the evolution of the sector has led these companies to take a step further in their offering portfolios and to develop solutions able to integrate the new technologies involved in the digitalization of the sector. An example of this is Siemens, which positions itself as an Industry 4.0 leader, updating existing products or developing new ones with the ambition of helping industrial companies take advantage of the digitalization of the manufacturing sector.
MindSphere37 is the cloud-based IoT platform and IoT operating system developed by Siemens to satisfy the technological needs of Industry 4.0. This open platform offers Platform as a Service (PaaS) with extensive options for data exchange using Siemens APIs and native cloud accessibility, as well as connectivity options to support different IoT-ready assets (see Figure 11).
Figure 11. Mindsphere by Siemens38
36 https://dzone.com/articles/industry-40-the-top-9-trends-for-2018
37 https://www.siemens.com/content/dam/webassetpool/mam/tag-siemens-com/smdb/corporate-core/software/mindsphere/mindsphere-brochure.pdf
38 https://www.siemens.com/content/dam/webassetpool/mam/tag-siemens-com/smdb/corporate-core/software/mindsphere/mindsphere-brochure.pdf
IT solution providers
IT solution providers have leveraged the opportunity to be part of the manufacturing revolution and to bridge the gap between automation and IT.
Some of the most prominent solutions are:
• Microsoft Azure IoT Suite Connected Factory
Microsoft has developed the Azure IoT Suite Connected Factory solution39, a cloud-based platform to manage industrial IoT devices in real time, introducing AI and other advanced solutions such as Microsoft HoloLens40, which enables interacting with holograms and visualizing relevant data.
A typical architecture using Azure IoT includes Azure IoT Edge for real-time data ingestion and processing, with the possibility to adapt to several open-source and standard protocols from different manufacturers and vendors.
Figure 12. Architecture using Azure IoT41
• Google Cloud IoT Edge
Cloud IoT Edge enables cloud-integrated edge computing, extending the machine learning and data processing capabilities provided by Google Cloud to edge devices42.
This solution is being used in smart manufacturing to act on sensors or predict outcomes in real time.
39 https://azure.microsoft.com/es-mx/blog/azure-iot-suite-connected-factory-now-available/
40 https://www.microsoft.com/en-us/hololens
41 https://blogs.msdn.microsoft.com/msind/2018/04/27/iiot-smart-factories-ai-azure-iot-edge/
42 https://cloud.google.com/iot-edge/
Figure 13. Google Cloud IoT Edge workflow43
• Amazon Web Services IoT Platform
Amazon has developed the AWS IoT Platform44, a cloud-based platform that enables easy interaction of devices with other devices and with cloud applications. This platform also allows using and integrating other AWS services to create complete solutions for Industry 4.0.
Figure 14. AWS IoT architecture45
• IBM Watson IoT
IBM has developed the IBM Watson IoT Platform46, a cloud-hosted platform to manage device data and machines with Blockchain and AI services.
43 https://cloud.google.com/solutions/iot/
44 https://docs.aws.amazon.com/es_es/aws-technical-content/latest/aws-overview/internet-of-things-services.html#aws-iot-platform
45 https://cloudacademy.com/blog/aws-iot-internet-of-things/
46 https://www.ibm.com/us-en/marketplace/internet-of-things-cloud
Figure 15. IBM Watson Architecture
Connectivity solution providers
Providing different types of connectivity in a factory is of key importance. Connectivity solution providers offer wireless solutions or integration with the different Industrial Automation and Control Systems (IACS) protocols existing and widely implemented in the sector, such as Profinet, CC-Link or EtherNet/IP.
Cisco has developed several solutions for its Connected Factory portfolio, such as Connected Asset Manager (CAM) for IoT Intelligence, a visualization tool to manage data, and Industrial Network Director, which gives factories full control of the plant network47.
Another telco provider, Huawei, is developing software and hardware solutions to equip smart devices; for example, Huawei's LiteOS IoT operating system is embedded in smart devices used in manufacturing, simplifying cloud interconnections, and two wireless access methods, eLTE and NB-IoT, enable communications within the plant48.
In summary, the Industry 4.0 stakeholder ecosystem described above can be classified as follows:
Stakeholders: Product and Control solution providers
Companies: Siemens, ABB, Rockwell Automation, Honeywell, Schneider, Bosch, etc.
Solutions: MindSphere, Agility 4.0, EcoStruxure, Bosch IoT Suite, etc.
Stakeholders: IT solution providers
Companies: Amazon, Microsoft, Oracle, IBM, Intel, etc.
Solutions: Azure IoT, AWS IoT, Oracle IoT Cloud, Google Cloud IoT Edge, IBM Watson IoT, etc.
Stakeholders: Connectivity solution providers
Companies: Cisco, Huawei, etc.
Solutions: CAM, LiteOS IoT, etc.
Table 2: Classification of Industry 4.0 Stakeholders
47 https://www.cisco.com/c/en/us/solutions/internet-of-things/manufacturing-digital-transformation.html
48 https://www.huawei.com/en/about-huawei/publications/communicate/84/iot-makes-manufacturing-smart
Industrial IoT (IIoT) platforms are the core of Industry 4.0, and in the competitive IIoT platform market, IoT platform vendors are already integrating Blockchain, while new solutions come with augmented vision, machine vision, or digital twin capabilities. In the end, cognitive capabilities and artificial intelligence will become the differentiating factor among IoT platform solutions. But while most vendors provide hardware that supports IoT solutions, the true differentiator is the edge software needed for Industrial IoT projects in which a lot of data must be processed at or near the edge, so IoT vendors are moving their focus to the edge.
The IIoT ecosystem is quickly evolving and is mainly led by major vendors, but new start-ups and solutions are emerging with the capacity to close gaps and complement competencies. This is an opportunity for solutions like DITAS, which can be integrated in an IoT platform as a differentiating element in sectors in which cloud and edge environments play a key role.
3.4 Market context questionnaire
To collect business requirements and validate market context assumptions, the DITAS consortium has conducted a market context questionnaire and interviews among stakeholders.
This process has been carried out by the consortium partners through different means: a) contacting stakeholders and sending the questionnaire template by email, b) call interviews with stakeholders, and c) personal interviews with stakeholders at different events (conferences, workshops, etc.).
The survey was conducted anonymously and did not require the interviewees to submit any personal data.
3.4.1 Characterization of interviewees
The selection of the sample aims at covering the broadest range of stakeholder profiles identified in the DITAS project and has included 12 individuals: members of the DITAS consortium (Atos, IBM) who are not directly involved in the project activities, and companies/institutions external to the consortium.
The organizations selected are companies working in the manufacturing sector (2), IT providers (5, most of them Cloud providers), telecommunication providers (1), software development companies (3) and academia (1); by size, 5 large companies, 4 SMEs, 2 VSEs, and 1 N/A (see Figure 16).
Different roles were covered by the interviewees involved in the survey, from technical positions (software architects, product developers, CTOs, researchers) to project managers, business developers and sales roles (see Figure 17).
Figure 16. Characterization of the organizations
Figure 17. Interviewees’ roles
3.4.2 Summary of Questionnaires and interviews conducted
Below is a summary of the market-oriented questions included in the survey and of the interviewees' responses.
Question 1: Does your organization require Cloud/Edge services to develop its
business?
The adoption of Cloud services is going mainstream, and most companies make use of these services to develop their business. Among the companies involved in the survey, all of them, except the Cloud providers for obvious reasons, use different Cloud providers such as Google Cloud, AWS, IBM Cloud, and Vodafone Cloud. For Edge computing they mainly use Microsoft Azure and AWS Greengrass.
However, it is significant that the manufacturing companies surveyed are still reluctant to use Cloud services for their processes and instead rely on their own services and developments.
Question 2: Does your organization use data from external sources for its commercial offering?
75% of the interviewees do not use data from external sources for their commercial offering.
The remaining 25% of the respondents do use data from external sources, since their portfolio offering integrates data analysis; this is the case of the Telco companies, for instance.
Question 3: Does your organization sell or buy data?
67% of the interviewees manage data from others, compared to 17% that manage their own data; 8% sell and buy data and the remaining 8% did not answer.
Only the Telco companies surveyed sell and buy data.
This aspect of the data is important to know, since the business models identified for the DITAS results are based on managing data from data owners and consumers.
Question 4: Does your company have difficulties managing those data?
SMEs and VSEs have expressed concerns about managing data. Among the difficulties surveyed (data acquisition, storage, visualization, dismissal, analysis, security and privacy, and movement), 36% agree that the most important aspect for them is data security and privacy, followed by data visualization (21%) and data acquisition (14%).
Figure 18. Difficulties managing data
For manufacturing industry companies, the most important concerns are data acquisition and data visualization, while for the rest of the company profiles it varies among the concerns previously described.
Question 5: What problems can DITAS solve for you that your company is already
solving with other workflows/tools/platforms?
Different points of view about what DITAS can solve for their businesses were found depending on the company profile. Some of the answers to this question were:
• For IT companies:
a) DITAS can act as an extra security layer that ensures, in addition to the integrity and confidentiality of the data, GDPR compliance
b) DITAS can provide data redundancy and orchestrate data movement from one cloud provider to another, hiding at the same time the complexity of the security restrictions that each cloud vendor has (API keys)
• For Software development companies:
a) DITAS can help with data movement based on the application's logic
b) DITAS can provide a data security framework
• For Telco companies:
a) Performance-wise, DITAS could be used to move data from one warehouse to another
• For Manufacturing industry companies:
a) DITAS can automatically manage the Fog/Cloud communication
Question 6: Would your organization consider using a solution such as DITAS in
your workflow?
All the interviewees agree that they would consider using DITAS in their workflow.
Question 7: How much would your organization be willing to pay for such services?
This was an open question, and most interviewees agree that it will depend on:
• the functionalities or services DITAS provides
• the business models
• the client and the specific case, which must be considered first
Question 8: Is there any other workflow/tools/platform or solutions/service that
solves the problem better/cheaper than DITAS?
Some interviewees have mentioned solutions such as Docker49.
Question 9: Do you think that the Open Source approach of DITAS could be a
barrier for your organization?
We can confirm that the Open Source approach of the DITAS results is not a problem for 99% of the companies surveyed.
For manufacturing industry companies, it is very important that, besides the Open Source approach, the solution complies with industry standards.
49 https://www.docker.com/
4 Update to the Business and Technical Requirements
In the first version of this document, we elicited and analysed an initial set of business and technical requirements, derived from two sources: (i) questionnaires that were circulated to external entities and (ii) use case analysis. In this updated version, we revised the requirements listed in D1.1 (D1.1, 2017) and added new ones, based on the final picture of the project architecture. In that direction, we interviewed people with relevant expertise, asking them to rank nine general requirements that we have identified, as well as some parameters related to data and computation movement. The technical questionnaire that was passed to the experts can be found in Annex 3. In total, we have collected 21 answers from the respondents so far, and we will try to collect more questionnaires in the upcoming months.
Furthermore, in this document we focus on the traceability of the requirements. To enhance that process, we extended the table presented in D1.1 (D1.1, 2017) that describes each of the requirements by adding two more fields:
• Component that fulfils it
• Test case / Acceptance criteria
These fields will enable the consortium to better track the requirements and thus to ensure that they have been addressed. The extended table is depicted below, whereas the complete list of the requirements can be found in Annex 1. As in D1.1 (D1.1, 2017), we prioritise the requirements using the MoSCoW method (IIBA, 2009).
ID:
• For WPs 1-4: B (for business requirement) or T (for technical requirement) + WP number of origin.counter (e.g. the first business requirement of WP2 is B2.1, etc.)
• For WP5:
o For the Industry 4.0 use case: EU1.F (for DITAS Framework level requirements) or EU1.UC (for Use Case level requirements) + counter, e.g. EU1.F1, EU1.F2, …, EU1.UC1, EU1.UC2, …
o For the e-Health use case: EU2.F (for DITAS Framework level requirements) or EU2.UC (for Use Case level requirements) + counter, e.g. EU2.F1, EU2.F2, …, EU2.UC1, EU2.UC2, …
Requirement Type: determines whether the requirement is
• Functional (specific technical implementation requirements), or
• Non-Functional (general abstracted architectural or conceptual requirements)
Source: identifies the source of the requirement
• Questionnaire
• DITAS Analysis
Priority: based on the MoSCoW method
• M - Must have this requirement to meet the needs
• S - Should have this requirement if possible, but project success does not rely on it
• C - Could have this requirement if it does not affect anything else in the project
• W - Would like to have this requirement later, but delivery won't be this time
Category: illustrates the category of the requirement
• Extensibility (Scalability, Expandability, Portability)
• Security (Privacy, Integrity, Non-Repudiation)
• Interoperability (Reusability, Connectivity, Adaptation)
• Performance (Availability, Reliability)
• Maintainability (Evaluability, Evolvability)
• Other category
Component that fulfils it: indicates the specific component that fulfils the requirement.
Description: contains the specification of the requirement (description of the purpose and goals to be fulfilled), written in a preferably concise, yet clear way. At this point one should be very specific as to the goal of this requirement and the envisioned benefit.
Rationale: describes the need that the specific requirement covers.
Dependencies: contains a list of possible interdependencies between the requirements.
Test case / Acceptance criteria: describes the way to test the requirement, to ensure that it is developed and thus fulfilled. This information will be the basis for the Verification & Validation procedures.
Time-frame: provides an estimation of the time frame to have this requirement fulfilled
• Report period 1
• Report period 2
Comments: extra comments that could be used to further describe the specific requirement.
Table 3: Fields to be fulfilled by the requirements of DITAS.
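To make the template concrete, a requirement record following these fields could look as follows in JSON. This is purely illustrative: the identifier, component name, and acceptance criterion are invented for the example and do not correspond to an actual entry in Annex 1.

{
  "id": "T2.1",
  "requirement_type": "Functional",
  "source": "DITAS Analysis",
  "priority": "M",
  "category": "Performance (Availability, Reliability)",
  "component_that_fulfils_it": "SLA Manager",
  "description": "The platform monitors the QoS metrics agreed for a deployed VDC.",
  "rationale": "Violations of the agreed data utility must be detected so that data and computation movement can be enacted.",
  "dependencies": ["T2.2"],
  "test_case_acceptance_criteria": "Inject a response-time violation and verify that a corrective action is reported by the monitoring tools.",
  "time_frame": "Report period 2",
  "comments": ""
}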
5 DITAS Architecture
The DITAS architecture has been designed taking into account the main design
principles guiding service-oriented systems, fog computing environments, and
content delivery networks.
Generally speaking, the services considered in DITAS are Data Services, i.e., components able to provide data that the owner wants to make available, in read-only mode, to data consumers. These data live in the resources managed by the data owner and can be stored in databases or offered as streams.
Depending on the customer's needs, more or less complex data processing can be done prior to data access. Indeed, the proposed service could offer the data as they are, or analyses on data sets.
In adopting a service-oriented architecture, the visibility principle has been followed to make the provided functionalities accessible through publicly available APIs. Conversely, the implementation details must be hidden from the service consumer. In DITAS, the APIs concern the ability to access data that could be stored in databases, generated by sensors, or returned by data processing methods.
Due to the heterogeneity of data sources, service provisioning requires dealing with different devices that can be located both on the edge and in the cloud, i.e., a Fog environment. For this reason, the DITAS architecture must support data provisioning, especially when processing is required, with a deployment able to balance the scalability and security offered by cloud resources against the reduced latency offered by edge resources.
At run time, the DITAS solution relies on data movement that, in contrast to what usually happens in content delivery networks, does not occur only from the cloud to the edge but also vice versa, from the edge to the cloud.
To achieve these objectives, DITAS introduces the Virtual Data Container (VDC), whose role is to embed in a single logical unit the components that constitute the original data-intensive application, along with specific modules that offer a way to access data that is agnostic with respect to the specific underlying technology (see Figure 19). In addition, a data utility enforcement module is included to check whether the quality of the data offered by the service, as well as the quality of the service itself, is met. In case these qualities are not in line with the customer's expectations, the DITAS architecture is able to recover the situation by enacting data and computation movement strategies.
Data movement concerns the need to move the original data set from the location in which the owner of the data has decided to store it to other places managed by the DITAS platform or by the consumer, in order to reduce latency while preserving security during transmission as well as privacy. Similarly, computation movement enables the possibility of moving a VDC among the resources made available by the data owner, the DITAS platform, and the customer.
As computation movement, and especially data movement, could have an impact on privacy and security, the VDC also has to verify that data are always stored in the places and in the formats that satisfy the consent of usage agreed between the data owner and the data customer.
Having provided an informal overview of the DITAS architecture, the next paragraphs propose a more detailed and precise view. In particular, we first define the actors involved in the architecture; then, a complete overview of the components able to design, deploy and manage VDCs is given according to the two main components that constitute the DITAS architecture:
• the DITAS-SDK, concerning the definition and the retrieval of a VDC;
• the DITAS Execution Environment (DITAS-EE), which manages the execution of the VDC as well as the data and computation movements.
Figure 19. The conceptualization of Virtual Data Container
5.1 DITAS roles
The data administrator is the owner of data sources and has complete knowledge of them. The data administrator takes advantage of DITAS to enable the provisioning of some of the internal data that s/he would like to make accessible to other subjects. Depending on the subject and the consent of usage, the visibility of these data can be partial or total. With DITAS, the data administrator can simplify the process of making her/his data available as, through the VDC, the DITAS platform is able to optimize the data provisioning by means of data and computation movement. In fact, the data administrator only has the task of defining the exposed API, i.e., the Common Access Framework (CAF), reflecting the methods to access the data.
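For illustration only, an exposed CAF method could be described along the following lines in JSON; the method name, path and field names below are invented for this sketch and are not taken from an actual DITAS blueprint.

{
  "exposed_api": [
    {
      "method_name": "getBloodTests",
      "path": "GET /patients/{patientId}/blood-tests",
      "description": "Returns the blood-test values of a patient",
      "response_format": "JSON"
    }
  ]
}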
The application developer is the actor in charge of creating the VDC. Based on the data sources made available by the data administrator, s/he is responsible for defining the code able to expose the API defined by the data administrator. Depending on the case, the data processing developed can be a simple connection to the provided data sources or complex data analytics. As a result, the application developer is able to provide a complete specification of a VDC. It is worth noting that in several cases the same actor will hold both the data administrator and the application developer roles.
The application designer represents the service consumer, and her/his goal is twofold. On the one hand, the goal is to select the most suitable VDC with respect to her/his requirements. For this reason, the DITAS platform has to provide a matchmaker able to compare the application requirements with the capabilities offered by a VDC. This matchmaking is mainly driven by the data utility, which encompasses quality of service, quality of data, and reputation aspects. On the other hand, she/he has to check whether the VDC is really providing what has been promised, from both a functional and a non-functional perspective.
The DITAS operator is responsible for the run-time platform; this includes the responsibility for keeping the applications running. The operator has no specific application or data knowledge, but rather depends on the monitoring tools to verify that all the applications are running properly, to monitor the corrective actions the DITAS platform is taking, and to provide feedback at design time by suggesting refinements of the data utility specification.
5.2 DITAS-SDK Architecture
The major goal of the DITAS SDK is to support the definition and the matchmaking of the VDC Blueprint. All the components created in the context of the DITAS SDK support the full lifecycle of the VDC Blueprint.
The VDC Blueprint is a structured document (in our implementation we use JSON for this purpose) created to capture all the properties of the VDC, with a twofold goal:
● to support the application designer when looking for a dataset that could be interesting for his/her purposes;
● to support the DITAS-EE in properly deploying all the components composing the VDC needed to expose the data.
The VDC Blueprint consists of 5 distinct sections (described in deliverable D3.2 (D3.2, 2018)) created to describe different aspects of the VDC instances; a minimal skeleton is sketched after this list:
● Internal Structure: high-level textual description of the VDC to characterize it as a product, focusing on business characteristics.
● Data Management: specifies the attributes of the methods offered by the VDC and, for each method, the guaranteed levels of data quality, security and privacy. This is the set of information defined by the data administrator to inform the DITAS platform about where the data sets to be exposed reside.
● Abstract Properties: contains all the rules, in the form of goal trees, to be used by the SLA Manager in order to define the SLA contract which will hold during the VDC usage between the data administrator and the data designer/data developer.
● CookBook Appendix: describes the deployment information needed to properly host the VDC in the DITAS-EE. This information also contains the details to create not only the VDC but also the VDM, which will be in charge of managing the VDC instances created from the same VDC Blueprint.
● Exposed API: technical section enabling the application developer to fully understand how the methods exposed by the VDC work.
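A minimal skeleton of a VDC Blueprint, reduced to its five top-level sections, could therefore look as follows. The key spellings and placeholder values are illustrative only; the normative schema is the one defined in D3.2 (D3.2, 2018).

{
  "INTERNAL_STRUCTURE": "business-level description of the VDC as a product",
  "DATA_MANAGEMENT": "methods with guaranteed data quality, security and privacy levels",
  "ABSTRACT_PROPERTIES": "goal trees used by the SLA Manager to derive the SLA",
  "COOKBOOK_APPENDIX": "deployment recipes for the VDC and the VDM",
  "EXPOSED_API": "technical description of the methods exposed by the VDC"
}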
Each of these sections addresses different DITAS roles and components. The DITAS SDK is designed to handle all the operations needed for creating, storing and delivering the Blueprints to the Application Designer (Figure 21). In fact, the VDC Blueprint cannot be considered a document created in a single step; rather, depending on the interaction among the actors, the sections are incrementally defined. In more detail, Figure 20 shows the VDC Blueprint lifecycle, where three versions of the VDC Blueprint are included:
● the Abstract VDC Blueprint;
● the Intermediate VDC Blueprint;
● the Concrete VDC Blueprint.
Figure 20. VDC Blueprint Lifecycle
In the first step, which is composed of several activities (see Figure 21), Data Administrators create the Abstract VDC Blueprint; the Blueprint Validator component then validates this document so that it can be stored in the Blueprint Repository through the Blueprint Repository Engine, the component responsible for carrying out all the CRUD operations on the Blueprint Repository. Application Designers should then be able to select the appropriate blueprint based on the requirements of the application being developed.
Figure 21. DITAS SDK Architecture
To select the most appropriate Blueprint, the Resolution Engine component is introduced. This component takes as input the Application Requirements file that the Application Designer produces and filters the Abstract Blueprints accordingly. The Application Requirements file is also a JSON-formatted file that contains all the requirements of the Application Designer: information about the content that the VDC should deliver, the QoS that the VDC is committed to deliver, the data quality of the sources, and the privacy and security features of the VDC. A sketch of such a file is given below.
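As a sketch, and with property names invented for this example (the actual schema is the one used by the project implementation), an Application Requirements file could look like this:

{
  "content": { "keywords": ["blood tests", "patients"] },
  "qos": { "availability_percent": 99, "max_response_time_ms": 200 },
  "data_quality": { "min_accuracy": 0.9, "min_completeness": 0.8 },
  "privacy_security": { "encryption_in_transit": true, "gdpr_compliant": true }
}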
The Resolution Engine consists of three subcomponents (Figure 22). The first is the content-based search, which filters the Blueprints based on the content they deliver. The second is the DURE (Data Utility Resolution Engine), which is responsible for filtering and ranking the Blueprints remaining from the previous step based on their QoS and data quality features. Finally, the Resolution Engine communicates with the third component, the Privacy and Security Evaluator, which is responsible for filtering and ranking the Blueprints based on the privacy and security requirements of the Application Designer.
Figure 22. DITAS SDK Resolution Engine Architecture and component interaction
It is important to mention that the Abstract Blueprints that fulfil all the requirements are altered by the DURE, which inserts the non-functional constraints that will be used for establishing the SLA if the Blueprint is selected by the Application Designer; a sketch of such constraints follows below. This additional information transforms the Abstract VDC Blueprints into the so-called Intermediate VDC Blueprints, which are returned to the Application Designer as possible candidates for the application being designed. Once the Application Designer has selected one of the returned candidate blueprints, the Intermediate VDC Blueprint is sent to the Deployment Engine in order to create the Concrete Blueprint.
To do so, the Deployment Engine considers the list of resources, included in the Application Requirements document, that the Data Administrator, the Application User and the Data Provider may want to provide for running the VDC. These resources are computation instances and storage space that can be allocated in a public cloud, or machines and disks that are already available at the edge. It is worth noticing that we assume these resources are properly configured to be made accessible by the Deployment Engine, as well as by the DITAS-EE, in order to deploy and execute portions of the data storage, data computation, and data movement actions.
As a last step, the Concrete VDC Blueprint is passed to the DITAS-EE components, providing visibility of the available resources that can be used to optimize the VDC execution.
5.3 Execution Environment Architecture
The Execution Environment (EE) is the second main element of the DITAS platform. It provides support for executing and managing the VDC lifecycle once the VDC has been deployed by the Deployment Engine, as agreed by the Application Designer and the Data Administrator exploiting the SDK facilities.
In fact, the DITAS-SDK and the DITAS-EE are tightly connected through the Deployment Engine, which is responsible for building and configuring the Execution Environment across cloud and edge devices, running on top of a Kubernetes infrastructure, based on the blueprint resolution selected by the Data Administrator.
The EE also provides the foundational capabilities required by the VDCs’ security- and privacy-related components in matters of Identity and Access Management.
The Execution Environment takes decisions about what, where, when, and how to move data or computation resources. To this aim, it is composed of two main elements:
● Virtual Data Container (VDC), created from what has been defined in a
Concrete VDC Blueprint and deployed to serve a specific application by
providing data exposed by the data administrator.
● Virtual Data Manager (VDM), which is in charge of executing, monitoring
and moving either data or computation within the environment.
When more than one Concrete VDC Blueprint is generated from the same Abstract VDC Blueprint, many VDCs are created, all supervised by a single VDM in the same Kubernetes environment and all accessing what is logically the same data source (see Figure 23). Logically here means that the data could be replicated or moved within the environment to improve the performance of the application, while preserving the privacy of the data.
Figure 23. DITAS Execution Environment for several deployments of the same blueprint
5.4 VDC Architecture
The VDC provides an abstraction layer that takes care of retrieving, processing and delivering data with the proper quality level, while putting special emphasis on data security, performance, privacy, and data protection. The VDC, acting as a middleware, lets the application designer simply define the requirements on the needed data, expressed as data utility, and takes responsibility for providing this data in a timely, secure and accurate manner, hiding the complexity of the underlying infrastructure. The infrastructure could consist of different platforms, storage systems, and network capabilities. The VDC Blueprint describes the VDC thoroughly: it includes, among other things, information about its business characteristics, the data sources the VDC connects to, how to deploy it, and the API that the data administrator exposes to the data consumers.
From the technical point of view, the Virtual Data Container is, by definition, programming-platform and language agnostic, in order to facilitate the life of the application developer, who is in charge of creating the VDC. Indeed, the developer has the flexibility to implement the VDC with the platform and the language with which he/she is familiar. For instance, in the context of the DITAS project,
as required by the two use cases considered, one of the implemented VDCs uses the Spark platform, relying on the Spark SQL module for structured data processing, while another uses the Node-RED platform, whose lightweight runtime is built on Node.js. In this way, it is possible to evaluate the DITAS approach in different situations: in the former case, the VDC has to deal with data analytics over a heterogeneous data set, whereas in the latter case data offered as streams are collected and processed.
Moreover, the VDC is architecture agnostic and is therefore able to run at the edge of the network on low-cost hardware, such as a Raspberry Pi, as well as on more powerful cloud resources. This VDC principle is of high importance, particularly when enacting computation movement strategies, which make it possible to move a VDC between the heterogeneous resources that compose a Fog environment.
On a high level, a Virtual Data Container consists of three different layers: the Common Accessibility Framework (CAF), the Data Processing layer and the Data Access Layer (DAL), as depicted in Figure 24.
Figure 24. High-level view of the VDC
5.4.1 Common Accessibility Framework
The role of the CAF is to ensure that VDCs serve their data in a unified and pre-defined manner. It is the interface between a VDC and the data-intensive application, meaning that the latter knows only the CAF, which hides all the complexity behind the VDC. The data administrator publishes the CAF API, which contains a set of well-described methods through which he/she makes available some of the data included in the data sources to which the VDC is connected.
From the implementation point of view, the programming model that the CAF follows is REST-oriented, and the adopted common communication protocol is HTTP. The API is described according to the OpenAPI specification, extended with features that enable the DITAS platform to implement data movement techniques. The complete definition of the CAF API is included in the Abstract VDC Blueprint under the Exposed API section.
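As an illustration, the following is a minimal sketch of what a CAF method could look like when exposed as a REST endpoint; it assumes Flask as the web framework, and the path and response format are hypothetical, not taken from an actual DITAS blueprint.
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical CAF method. In DITAS, such a method would be declared in the
# Exposed API section of the Abstract VDC Blueprint using the (extended)
# OpenAPI specification; path and payload here are illustrative only.
@app.route("/patients/<patient_id>/bloodTests", methods=["GET"])
def get_blood_tests(patient_id):
    # The data-intensive application sees only this REST interface; the
    # processing and data access behind it are hidden by the VDC.
    return jsonify({"patientId": patient_id, "bloodTests": []})

if __name__ == "__main__":
    app.run(port=8080)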
5.4.2 Data Processing
Data processing layer contains all the computation, data transformation and
composition that the VDC implements in order to provide the data to the con-
sumers in the content and format that the exposed API promises. Regardless of
the adopted programming language, the VDC is able to perform a set of pro-
cessing techniques to the data coming from the sources. Depending on the busi-
ness logic of each one exposed VDC method, the range of this processing may
vary between just fetching raw data from a single database, on the one hand,
and querying multiple sources, applying analytics on the data and compressing
it before serving the response to the client, on the other hand. The code that
implements the data transformation layer is included in the concrete VDC blue-
print.
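The following sketch illustrates the upper end of that processing range; the two in-memory lists stand in for real data sources, and all names are hypothetical.
import gzip
import json

# Hypothetical processing method: query two sources, compose the results and
# compress the response before serving it, as described above.
def process(readings_source, labels_source):
    readings = [r for r in readings_source if r["value"] is not None]
    labels = {l["sensorId"]: l["name"] for l in labels_source}
    # Transform the data into the content and format promised by the CAF API.
    combined = [{"sensor": labels.get(r["sensorId"], "unknown"), "value": r["value"]}
                for r in readings]
    # Compress before serving the response to the client.
    return gzip.compress(json.dumps(combined).encode("utf-8"))

readings = [{"sensorId": 1, "value": 21.5}, {"sensorId": 2, "value": None}]
labels = [{"sensorId": 1, "name": "spindle-temperature"}]
print(len(process(readings, labels)), "bytes")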
5.4.3 Data Access Layer
The third element of a VDC is the Data Access Layer (DAL), which has the fundamental role of exposing the data provided by the Data Administrator to the DITAS-EE infrastructure without violating any privacy or security constraints. In fact, the DAL includes the Privacy Enforcement Layer, the component in charge of rewriting the SQL query that must be executed to satisfy a call coming from the Processing Layer into a query that avoids returning data that cannot be seen externally. This filtering is affected mainly by the location of the VDC: since it is possible to move the computation, i.e., the processing and the CAF layers, the data that can be transmitted may change accordingly. For this reason, an important assumption about the DAL is that this layer is deployed in the same place where the data is stored, i.e., it is invariant under computation movement.
Focusing on data movement, if the strategy is to duplicate the data source somewhere else (e.g., on the premises of the consumer), the DAL first ensures that only the data that can be stored at that location is replicated. Secondly, a new instance of the DAL is instantiated at that location to perform access control.
In more detail, the Privacy Enforcement Engine acts as a proxy before the query is executed over the data. It rewrites the query so that it returns only data compliant with the privacy policies, evaluated together with user identity information. To this end, the original query is augmented with filters based on the policies and on additional attributes of the request or the data, such as the data subject’s consent.
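A minimal sketch of this rewriting idea follows; the policy format and filter expressions are hypothetical, and a real enforcement engine would of course parse the SQL rather than concatenate strings.
# Hypothetical sketch of privacy-driven query rewriting; the policy format is
# illustrative. It assumes the original query has no WHERE clause.
def rewrite_query(query, user_role, policies):
    filters = []
    for policy in policies:
        if user_role in policy["allowed_roles"]:
            filters.extend(policy["filters"])
    if not filters:
        return None  # no policy allows this access: return no data at all
    # Augment the original query with policy- and consent-based filters.
    return query + " WHERE " + " AND ".join(filters)

policies = [{"allowed_roles": ["researcher"],
             "filters": ["consent = 'given'", "pseudonymized = 1"]}]
print(rewrite_query("SELECT age, diagnosis FROM patients", "researcher", policies))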
The communication protocol between the DAL and the rest of the VDC is gRPC: on the one hand, it is generic enough and supports both the request-response model and streaming well; on the other hand, it is much more efficient than plain REST over HTTP.
Figure 25. High-level view of the DAL
5.4.4 Other VDC Components
VDC Request Monitor - It monitors the incoming and outgoing requests of the VDC by intercepting the HTTP traffic. The requests are evaluated, enriched with blueprint metadata and stored in a monitoring database; further information, such as response times and error codes, is measured and reported (a minimal sketch of such a monitoring record is given after this list).
Throughput Agent - It monitors the data traffic between the VDC components and the data source. Measurements are aggregated, enriched with data from other monitoring components and then also stored in Elasticsearch.
Logging Agent - It monitors the logs of the different DITAS components, as well as offering an interface for the VDC and the other VDC components to report additional information to Elasticsearch.
Data Utility Evaluator (DUE@VDC) - At runtime, it is responsible for evaluating the Data Utility for the VDC and for providing information to the SLA Manager to trigger data and computation movement in case of a data utility requirement violation.
SLA Manager - It checks that the Quality of Service constraints defined in the Abstract Blueprint for the different data sources are met during the execution of the VDC, sending a violation message to the VDM if they are not.
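As an illustration of the monitoring flow described above, the following sketch shows the kind of record the VDC Request Monitor could store; the index name, field names and endpoint are hypothetical, and the elasticsearch Python client is assumed.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

# Hypothetical monitoring record: an intercepted HTTP request enriched with
# blueprint metadata; index, fields and endpoint are illustrative only.
es = Elasticsearch("http://elasticsearch:9200")

record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "blueprintId": "bp-001",              # enrichment with blueprint metadata
    "method": "GET /patients/42/bloodTests",
    "responseTime_ms": 87,                # measured further information
    "statusCode": 200,
}
es.index(index="vdc-requests", document=record)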
5.5 VDM Architecture
The role of the VDM is to coordinate the several VDCs that can be instantiated from the same VDC Blueprint. In fact, for each user requesting a VDC Blueprint, a VDC instance is generated. As a result, many VDC instances will be created, each of them accessing the same data source. In this configuration, coordination is required: each VDC operates independently from the others, and if the decisions on data movement were left to the VDCs, a decision taken by one VDC could negatively affect another. For instance, assume that there are two VDCs, one deployed on the cloud and the other on the edge; as both access the same data source, the former may prefer to have the data on the cloud, while the latter prefers the edge. Assuming that a duplication of the data is not possible, this situation could result in a continuous data movement between the two nodes, as each VDC tries to optimize its local behavior. To avoid this situation, a VDM is required.
To allow proper management of this type of conflict, the VDM is equipped with components able to monitor all the controlled VDCs and to decide on and enact data and computation movement actions (see Figure 26). In more detail:
Figure 26. High-level view of the VDM
• DUE@VDM - It aggregates the data utility values calculated at runtime by the different DUE@VDCs. Such data utility values refer to the data returned by the methods offered by the VDC. Aggregating the utility values helps in understanding the level of data utility provided by the different methods at run time.
• Decision System 4 Data Movement - This VDM component plays an important role in the DITAS platform, since it decides when and where to move data sources and VDCs in the fog infrastructure managed by DITAS. The Decision System for Data and Computation Movement (DS4M) considers the requirements of application designers and their violations: when a requirement is violated, the DS4M enacts the movement that will have the highest positive impact on the violated requirement and, therefore, the highest probability of restoring the satisfaction of that requirement. In case the data sources of the VDM are shared among multiple applications, the DS4M considers the requirements of all applications and enacts the best data movement based on the entire set of requirements (a minimal sketch of this decision rule is given after this list).
• Data Movement Enactor - It enacts the data movement actions across locations where the data can be conveniently used, by consuming the API of the storage layer. It copies and keeps data synchronized between edge and cloud servers/instances, making the data available closer to where it is needed.
• Computation Movement Enactor - It moves computation units between the available computational resources to optimize data access times. Using the facilities made available by Kubernetes, the orchestrator adopted in the project, containers fully or partially implementing the processing layer of a VDC can be moved among the different locations in which the processing is allowed to be performed without violating any privacy constraints.
• Data Analytics - It aggregates additional information generated by the operation of different DITAS components, such as the Data/Computation Movement Enactors, the Decision System for Data and Computation Movement, the SLA Manager, the Throughput Agent and the Logging Agent. It provides an interface to query the various data sources that comprise this information and performs additional processing and refining where necessary. Its queries integrate key QoS metrics used in the operation of other components, such as the SLA Manager and the Decision System for Data and Computation Movement. The Data Analytics API is placed inside a separate container in a Kubernetes environment, with its own endpoint and internal DNS, which allows communication with other DITAS components. Data Analytics provides an API that translates requests into Elasticsearch queries and outputs the results in a format suitable for use by other DITAS modules.
• Log data analysis service - It provides log analysis for the data administrator to get insights about their own data sources. This module also offers the DITAS operator the possibility to analyze the state of the whole platform in terms of, for instance, violations, bottlenecks and deadlocks, regardless of any specific VDM or VDCs.
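The following is a minimal sketch of the DS4M decision rule described in the list above: on a requirement violation, the movement with the highest estimated positive impact across all applications sharing the data sources is enacted. The impact scores are hypothetical.
# Hypothetical sketch of the DS4M decision rule; impact scores are illustrative.
def choose_movement(violated_requirement, movement_actions, applications):
    def total_impact(action):
        # Sum the estimated impact over every application sharing the data
        # sources, so that helping one VDC does not penalize the others.
        return sum(app["impact"][action][violated_requirement]
                   for app in applications)
    return max(movement_actions, key=total_impact)

apps = [{"impact": {"move_to_edge": {"latency": 3}, "move_to_cloud": {"latency": -1}}},
        {"impact": {"move_to_edge": {"latency": -1}, "move_to_cloud": {"latency": 2}}}]
print(choose_movement("latency", ["move_to_edge", "move_to_cloud"], apps))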
5.6 VDC and VDM integration
The connection between the VDM and the VDC is represented in Figure 27. Generally speaking, the VDM is interested in the data about the behavior of the VDCs. As already mentioned above, the relevant events occurring during the execution of the VDC are stored in the Elasticsearch module. Such low-level events (e.g., applications are accessing some data) are then processed by the Data Analytics module in order to generate high-level events (e.g., accessed data are returned with a given quality) that can be used by the SLA Manager to raise possible violations (e.g., the returned data do not have sufficient quality).
It is worth noting that the VDM is also indirectly connected to the SDK. In fact, the DUE@VDM is able to decide, based on the behavior of the running VDCs, whether the promised data utility values stored in the Abstract Blueprint need a revision. For instance, if the level of accuracy of the data set agreed for a given VDC is very often violated, it is reasonable for the data administrator to have a tool able to raise this issue and to suggest the proper value to be stored. In this case, in accordance with the data administrator, the Abstract VDC Blueprint can be revised with the new values, which will be used for the subsequent instantiations.
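A minimal sketch of such a suggestion rule follows; the violation threshold and the percentile used are hypothetical choices, made only to illustrate the revision mechanism.
# Hypothetical sketch: if the agreed accuracy is violated too often, suggest a
# value the running VDCs actually sustain (here, the 5th percentile of the
# observed accuracies). Threshold and percentile are illustrative.
def suggest_revision(agreed_accuracy, observed, max_violation_rate=0.1):
    violations = sum(1 for a in observed if a < agreed_accuracy)
    if violations / len(observed) <= max_violation_rate:
        return agreed_accuracy  # the promise is being kept; no revision needed
    return sorted(observed)[int(0.05 * len(observed))]

print(suggest_revision(0.95, [0.90, 0.91, 0.92, 0.93, 0.94, 0.96]))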
Figure 27. High-level view of the VDC and VDM integration
6 Detailed Technical Verification and Validation Approach
The following is the detailed technical verification and validation approach that the consortium members will follow during the project. The verification and validation results will be delivered in D5.4 (Ditas consortium: D5.4 Final case studies validation report) in M36.
6.1 Requirements traceability
Tracking requirements is an important aspect of the software development process, as it ensures that all of the requirements have been correctly considered and updated during each stage of the project. It is a decisive part, as it guarantees that the development team has covered every need and that no functionality is missed out or left untouched, thus giving stability and consistency to the final product and to each of its components.
The most common way of ensuring proper and full traceability is using a Requirements Traceability Matrix (RTM). For the DITAS project, we are using a Google Docs-based spreadsheet as Requirements Traceability Matrix, one for each Work Package and one for each use case. The following figure shows an example of the Requirements Traceability Matrix for WP2.
Figure 28: Requirements Traceability Matrix for WP2
This Requirements Traceability Matrix gives us a quick view and total traceability of each requirement, the component that has to fulfil it, and the test case we have to run in order to validate it.
It is important to point out that the requirements (described in the Description column of the RTM) can have two different sources: internal requirements, which are elicited by the consortium, and external requirements, elicited through external methods like business questionnaires, which are explained in the Update to the Market Analysis (Section 3) and the Update to the Business and Technical Requirements (Section 4) of this deliverable.
Moreover, we have added a specific tab linking requirements and components to the main project objectives described in the DoA (DoA, 2016). The tab Measurements criteria vs WP relates every project objective in the DoA to the work packages and components in charge of fulfilling it. The following figure is an excerpt from that tab.
Figure 29: Measurements criteria vs WP
6.1.1 Requirements as user stories
In Agile development, user stories are a brief and simple definition of a feature, written from the perspective of the person who desires the new capability, usually a user or a customer of the system. They have the following form:
As a <type of user>, I want <some goal> so that <some reason>.
For example, a user story for searching on a website can be as follows:
As a website user, I want to be able to search on the webpage, so that I can find necessary information.
For the project, the consortium members were encouraged to use user stories to define the requirements in a user-friendly way; they are written in the “Description” column of the Requirements Traceability Matrix.
6.1.2 Acceptance criteria
For the DITAS project, as with the user stories, the consortium members were encouraged to use acceptance criteria in order to describe the way to test and fulfil the requirements, that is, to specify the conditions under which a user story (a requirement) is fulfilled. The acceptance criteria sentences are written in the “Test case / Acceptance criteria” column of the Requirements Traceability Matrix.
For example, following the example of the previous section, the acceptance criteria for searching on a website can be as follows.
Given that I’m in a role of registered or guest user
When I open the “Products” page
Then the system shows me the list of all products
And the system shows the “Search” section in the right top corner of the screen
When I fill in the “Search” field with the name of an existing item in the product list
And I click the “Apply” button OR press the Enter key on keyboard
Then the system shows the matching products in the Search Results section
The acceptance criteria will drive the verification of the requirements, as they enable the verification of individual requirements. Moreover, when possible, acceptance criteria will be automated in the Continuous Delivery pipelines, as sketched below. When not possible, manual inspection will be used to track the fulfilment of the requirements.
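As an illustration, the following sketch automates an acceptance criterion in the style of the website-search example above, to be run with a test runner such as pytest; the URL, parameters and response format are hypothetical and not taken from an actual DITAS component.
import requests

# Hypothetical automated acceptance criterion; URL and fields are illustrative.
def test_search_returns_matching_products():
    # When the "Search" field is filled with the name of an existing item...
    response = requests.get("http://staging.example/products",
                            params={"search": "gear"})
    assert response.status_code == 200
    # ...then the system shows the matching products.
    assert all("gear" in p["name"].lower() for p in response.json())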
6.1.3 Working methodology
As stated in Section 6.1 (Requirements traceability), each Work Package has its own Requirements Traceability Matrix. The initial requirements of the matrix are the ones described in the Annex 2 - DITAS Business and Technical Requirements section of D1.1 (D1.1, 2017). Hereafter, the way of updating the Requirements Traceability Matrix for each role is as follows:
• All WP partners are in charge of updating, removing or adding the internal requirements. The internal requirements should be written using the user stories style described in Section 6.1.1.
• WP leaders are in charge of updating the external requirements of the WP (taken from the questionnaires) when available and of removing the outdated ones. The external requirements, as well as the internal requirements, should be written using the user stories style described in Section 6.1.1.
• WP leaders are also responsible for updating the Measurements criteria vs WP tab, which is used in the “Validation against project objectives” section.
6.2 Verification methodology
Software verification is required to ensure the software quality and is a key phase
in the software development life cycle. For ease of understanding we can sum-
marize it with the question, “Am I building the right product?”; that is to say, “does
the software satisfies its specification”?.
The software verification for the DITAS project is focused on the development of the components and is performed via several tests that are explained in the following subsections. Most of these tests run in the Jenkins pipeline of the DITAS Continuous Integration (CI) system. More details about the CI system can be found in D5.2 (D5.2, 2018).
The flow for the software verification is as follows:
Figure 30: Software verification tests
6.2.1 Unit tests
A unit test is a way of testing a unit, the smallest piece of code that can be logically isolated in a system, ensuring that the units are individually and independently scrutinized for proper operation. These tests are performed in the “Build - test” stage of the Jenkins pipeline.
The following example shows the unit test stage of the VDC Repository Engine, a Java-based component. It uses Maven as a build automation tool for building the artifact and running all the unit test classes.
stage('Build - test') {
    agent {
        dockerfile {
            filename 'Dockerfile.build'
        }
    }
    steps {
        // Build and store artifact
        sh 'mvn -B -DskipTests clean package'
        archiveArtifacts 'target/*.jar'
        // Run unit tests
        sh 'mvn test'
    }
}
If any of the unit tests fails, the Jenkins pipeline stops and the developer gets instantly notified via email, so that they can fix the code errors as soon as possible.
6.2.2 API validation test
All the APIs defined by the components of the project are described via OpenAPI. Thus, each API has a YAML file with a complete description of the exposed resources, parameters and expected responses. This enables automatic testing of the APIs.
In order to check that an API is implemented according to its definition file, Dredd (an HTTP API testing framework, https://github.com/apiaryio/dredd), a language-agnostic command-line tool, is used. More details about this can be found in (D5.2, 2018); to sum up, Dredd reads the API description from the definition file and step by step validates whether the API implementation replies with responses as they are described in the documentation. Using this tool forces the developer to keep the API documentation up to date, thereby also ensuring up-to-date API documentation for the users.
The following example shows an API validation test performed for the VDC Repository Engine, using Dredd as explained above. The VDC Repository Engine container runs on the staging machine (31.171.247.162) and serves on port 50009. The validation of the API definition happens against this endpoint.
stage('API validation') {
    agent any
    steps {
        sh 'dredd VDC_Repository_Engine_Swagger_v2.yaml http://31.171.247.162:50009'
    }
}
If the API validation fails, the Jenkins pipeline stops and the developer gets instantly notified via email, so that they can fix the API definition or implementation as soon as possible.
6.2.3 Integration Tests
Within the integration tests, multiple, bigger units are tested in interaction to ensure the consistency and interoperability of the integrated components. This level of testing exposes faults in the interaction between integrated units.
To plan the integration tests of each component, the component diagrams (either SDK or EE) were used as a starting point to identify the dependencies between components. These component diagrams were defined in Section 3 of D4.2 (Ditas consortium: D4.2 Execution environment Prototype - First release, 2018), together with the diagram-based approach introduced in D5.2. A more detailed definition of the integration tests will be given in D5.4.
The following example shows the integration test stage of the VDC Repository Engine component which, being Java-based, uses the Maven Failsafe plugin to fire the corresponding integration tests.
stage('Integration tests') {
    agent any
    steps {
        sh 'mvn verify'
    }
}
If the integration tests fail, the Jenkins pipeline stops and the developer gets instantly notified via email, so that they can fix the dependency problems as soon as possible. The failed component is not deployed downstream, so it will never reach the Staging or Production servers.
6.3 Validation methodology
The validation methodology for the DITAS project, which leads us to meet the project requirements in the most efficient and effective way, comprises three different levels:
● Component-level requirements validation: how we validate the DITAS framework components using their requirements.
● Framework validation (against use cases): how the use cases (Industry 4.0 and eHealth) help to validate the DITAS project.
● Validation against project objectives: how we validate the project objectives described in the DoA.
6.3.1 Component level requirements validation
As stated in the previous sections, the RTM of each Work Package contains the requirements, written in the “Description” column. Each requirement is fulfilled by one component, which is declared in the “Component that fulfils it” column of the same matrix. Finally, this component is tested using a concrete acceptance test. If the test passes, the component is validated. We can summarize the component validation with the following figure.
Figure 31: Component validation flow
Each acceptance test provides enough information to run the test. Depending on the nature of the test, it will be run automatically or manually. All the information regarding acceptance tests will be detailed in Deliverable D5.4.
6.3.2 Framework level validation
The framework-level validation for the DITAS project is divided into two different types of validation tests: the system tests and the user acceptance tests.
6.3.2.1 System Tests
System tests are crucial for validating the behavior of the components when critical actions are required from the framework. IDEKO, as the validation leader, has compiled a list of these critical actions, such as publishing a blueprint into the repository, searching for a blueprint, blueprint ranking, etc. The system tests ensure that there are no inconsistencies between the units that are integrated together. Typically, system tests are run automatically after the integration tests, but we prefer to test the critical actions separately, with one test for each critical action; in this way, we have total control over the tests. It is important to point out that some of the tests will be done manually and others automatically. The full list of system tests will be detailed in D5.4.
For example, the testing of the critical action blueprint ranking is done manually. We first open the Blueprint Repository web interface, type a query and check whether the results are correctly ranked (by comparing the blueprints). If everything works, we have verified that the VDC Repository Engine, the Blueprint Repository database, the Blueprint Repository index in Elasticsearch and the Data Utility Resolution Engine are working correctly, as these are the elements that take part in this critical action. The tracking of the system tests and their results is also done using a traceability matrix. Depending on the nature of a test, it may be automated into the CI cycle using tools like Selenium, as in the sketch below.
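The following is a minimal sketch of what such an automated check could look like for the blueprint-ranking action, assuming a local Firefox WebDriver; the URL, element locators and ranking attribute are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Hypothetical automation of the blueprint-ranking system test; URL, locators
# and the ranking attribute are illustrative only.
driver = webdriver.Firefox()
try:
    driver.get("http://staging.example/blueprint-repository")
    query = driver.find_element(By.NAME, "query")
    query.send_keys("machine temperature data", Keys.ENTER)
    results = driver.find_elements(By.CLASS_NAME, "blueprint-result")
    scores = [float(r.get_attribute("data-rank-score")) for r in results]
    # The returned blueprints must be ordered by decreasing ranking score.
    assert scores == sorted(scores, reverse=True)
finally:
    driver.quit()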
6.3.2.2 User acceptance tests
User acceptance testing, as the name suggests, is the process of verifying that a solution works for the user. For the DITAS project, the users are the use cases, which are in charge of developing an application that will use the DITAS framework. Each use case defines a set of business requirements, that is, what they need from the software. These requirements cover different categories, such as performance or availability. Furthermore, the use cases define test cases or acceptance criteria in order to validate these requirements. If the acceptance criteria are met, we consider the user acceptance tests successfully passed, as the requirements are met.
The following image shows some business requirements requested by the Indus-
try 4.0 Use Case, which is driven by IDEKO.
Figure 32: Business requirements for the Industry 4.0 use case
Along with the business requirements, the use cases also provide the technical requirements they need for their application. The next figure shows some of the technical requirements for the IDEKO use case.
Figure 33: Technical requirements for the Industry 4.0 use case
We can summarize the user acceptance tests with the next figure: the application of each use case (which uses the DITAS framework) has some requirements. These requirements are fulfilled if the corresponding acceptance tests are accomplished; therefore, the application using DITAS meets the user requirements and the user acceptance tests are passed.
Figure 34: Validation against use cases flow
6.3.3 Validation against project objectives
Validating the project objectives is a critical aspect of any project. To ensure this validation, and in order to track and fulfil the project objectives, we are using the matrix introduced some sections above, titled Measurements criteria vs WP. In this matrix we have the project objectives described in the DoA (Section 1.1) (DoA, 2016), and we link each of them to its related work package. The objectives are also linked to the specific components that meet the requirements, plus an explanation of how we cover the criteria.
The next figure shows the tracking of the fulfillment of Objective 1, “Improvement of productivity when developing and deploying data-intensive applications”.
Figure 35: Objective 1 fulfillment
In summary, in this sheet we have the objectives described in the DoA, together with the related Work Package and the specific components in charge of satisfying them. These components have requirements, so by validating these requirements we are validating the project objectives. We can summarize this validation with Figure 36.
Figure 36: Validation against project objectives
It is important to point out that some of these objectives have quantitative metrics (e.g., 3.3, a reduction of 10% of the time needed for the transition between different deployments using the DITAS framework compared to traditional approaches) that will be evaluated by the end users in test scenarios with and without DITAS. The details of these tests will be given in D5.4.
7 Conclusions
This document includes an update to the market analysis, an update to the business requirements, the detailed project architecture and a detailed plan for verification and validation, all of which have been updated based on the conclusions from the first phase of the project.
In this document we have updated the market analysis with more focus on fog computing and on two of the possible markets for DITAS: the e-Health and Industry 4.0 markets. What follows from this analysis is that DITAS has opportunities in these markets. The updated state of the art is now focused on the innovations of DITAS in the data lifecycle scenarios in a Fog environment: Data Delivery, Data Management and Data as a Service. It shows that DITAS is indeed innovative, considering the current state of the art.
The requirements have also been revised and expanded based on new questionnaire feedback and the updated version of the architecture, with an emphasis on the traceability of the requirements.
We have also updated the architecture with more details based on the conclu-
sions from building the first prototype implementing the core functionalities of the
DITAS platform for milestone MS3. A more advanced blueprint lifecycle, with an
intermediate VDC Blueprint in addition to the abstract and concrete blueprints,
was developed. Moreover, a new DAL layer was added to the VDC for address-
ing privacy concerns in computation movement.
In addition, a more detailed technical verification and validation approach is
described, with an emphasis on requirements traceability.
The next milestone is a mature and final release implementing all modules con-
stituting the DITAS platform according to the architecture described in this doc-
ument - deliverables D2.3, D3.3, D4.3 and D5.3 in M30.
8 References
Alcaraz Calero, J. M., & Aguado, J. G. (2015). MonPaaS: An Adaptive Monitor-
ing Platform as a Service for Cloud Computing Infrastructures and
Services. IEEE Transactions on Services Computing, 8(1), 65-78.
Al-Doghman, F., Chaczko, Z., & Jiang, J. (2017). A Review of Aggregation
Algorithms for the Internet of Things. 25th International Conference on
Systems Engineering (ICSEng), (pp. 480-487). doi:10.1109/ICSEng.2017.43
Amyot, D., Ghanavati, S., Horkoff, J., Mussbacher, G., Peyton, L., & Yu, E. (2010).
Evaluating goal models within the goal-oriented requirement language.
International Journal of Intelligent Systems, 25(8), 841-877.
Bermbach, D., Pallas, F., García Pérez, D., Plebani, P., Anderson, M., Kat, R., & Tai,
S. (2017). A research perspective on Fog computing. International
Conference on Service-Oriented Computing. Springer.
Bertino, E., & Ferrari, E. (2018). Big Data Security and Privacy. In S. Flesca, S. Greco,
E. Masciari, & S. D, A Comprehensive Guide Through the Italian Database
Research Over the Last 25 Years. Studies in Big Data (Vol. 31). Springer.
Blake, R., & Mangiameli, P. (2011). The Effects and Interactions of Data Quality
and Problem Complexity on Classification. Journal Data and Information
Quality, 2(2).
Bonomi, F., Milito, R., Zhu, J., & Addepalli, S. (2012). Fog computing and its role in
the internet of things. Proceedings of the First Edition of the MCC Workshop
on Mobile Cloud Computing1, (pp. 13-16).
Byers, C. (2017, August). Architectural Imperatives for Fog Computing: Use Cases,
Requirements, and Architectural Techniques for Fog-Enabled IoT
Networks. IEEE Communications Magazine, 55(8), 14-20.
doi:10.1109/MCOM.2017.1600885
Cappiello, C., Pernici, B., Plebani, P., & Vitali, M. (2017). Utility-Driven Data
Management for Data-Intensive Applications in Fog Environments. ER
Workshops, (pp. 216-226).
Chung, L., Nixon, B., Yu, E., & Mylopoulos, J. (2012). Formal reasoning techniques
for goal models. Springer Science & Business Media.
Colombo, P., & Ferrari, E. (2018). Privacy Aware Access Control for Big Data: A
Research Roadmap. In Big Data Research, 2(4), 145-154.
D’Andria, F., Field, D., Aliki, K., Kousiouris, G., Garcia-Perez, D., Pernici, B., &
Plebani, P. (2015). Data Movement in the Internet of Things Domain.
European Conference on Service-Oriented and Cloud Computing.
Service Oriented and Cloud Computing. Lecture Notes in Computer
Science. 9306, pp. 243-252. Springer, Cham.
D1.1. (2017). D1.1 Initial architecture document with market analysis, SotA refresh
and validation approach. DITAS Consortium.
D2.2. (2018). D2.2 DITAS Data Management - Second Release. DITAS Consortium.
D3.2. (2018). D3.2 Data Virtualization SDK prototype (initial version). DITAS
Consortium.
D5.2. (2018). D5.2 Integration of DITAS and case studies validation report. DITAS
consortium.
Dey, S., & Mukherjee, A. (2018). Implementing Deep Learning and Inferencing on
Fog and Edge Computing Systems. IEEE International Conference on
Pervasive Computing and Communications Workshops (PerCom
Workshops), (pp. 818-823).
DoA. (2016). Description of Action. Research and Innovation Action. N. 731945, DITAS. European Commission.
Doelitzscher, F., Fischer, C., Moskal, D., Reich, C., Knahl, M., & Clarke, N. (2012).
Validating Cloud Infrastructure Changes by Cloud Audits. IEEE Eighth
World Congress on Services (pp. 377-384). HONOLULU: IEEE.
Duy La, Q., Ngo, M. V., Dinh, T. Q., Quek, T. Q., & Shin, H. (2018). Enabling
intelligence in fog computing to achieve energy and latency reduction.
Digital Communications and Networks.
doi:https://doi.org/10.1016/j.dcan.2018.10.008
European Commission. (2018, January 04). eHDSI Mission. Retrieved January 10,
2019, from
https://ec.europa.eu/cefdigital/wiki/display/EHOPERATIONS/eHDSI+Missi
on
European Commission. (2018). Synopsis Report - Consultation: Transformation of Health and Care in the Digital Single Market. European Commission.
European Commission. (2017, May 2). Final results of the European Data Market study measuring the size and trends of the EU data economy. Retrieved
January 10, 2019, from https://ec.europa.eu/digital-single-
market/en/news/final-results-european-data-market-study-measuring-
size-and-trends-eu-data-economy
European Commission. (2017, October 12). Public Consultation on Health and
Care in the Digital Single Market. Retrieved 01 10, 2019, from
https://ec.europa.eu/digital-single-market/en/news/public-consultation-
health-and-care-digital-single-market
European Commission. (2018, April 25). Communication on enabling the digital
transformation of health and care in the Digital Single Market;
empowering citizens and building a healthier society. Retrieved January
10, 2019, from https://ec.europa.eu/digital-single-
market/en/news/communication-enabling-digital-transformation-health-
and-care-digital-single-market-empowering
European Commission. (2018, April 25). Data in the EU: Commission steps up
efforts to increase availability and boost healthcare data sharing.
(European Commission) Retrieved January 10, 2019, from
http://europa.eu/rapid/press-release_IP-18-3364_en.htm
Even, A., Shankaranarayanan, G., & Berger, P. (2010). Inequality in the utility of
customer data: implications for data management and usage. J.
Database Mark. Custom. Strat. Manag., 17(1), 19-35.
FOG - Fog Computing and Networking Architecture Framework. (2018, 06 14).
IEEE 1934-2018 - IEEE Standard for Adoption of OpenFog Reference
Architecture for Fog Computing. Retrieved from IEEE Standard
Association: https://standards.ieee.org/standard/1934-2018.html
Furukawa, J., Lindell, Y., Nof, A., & Weinstein, O. (2017). High-throughput secure three-party computation for malicious adversaries and an honest
majority. Annual International Conference on the Theory and Applications
of Cryptographic Techniques. Springer.
Giorgini, P., Mylopoulos, J., Nicchiarelli, E., & Sebastiani, R. (2003). Formal
reasoning techniques for goal models. Journal Data Semantics, 1(1), 1-20.
Halunen, K., & Karinsalo, A. (2017). Measuring the value of privacy and the
efficacy of PETs. In Proceedings of the 11th European Conference on
Software Architecture: Companion Proceedings (ECSA '17) (pp. 132-135).
New York: ACM.
Han, J., et al. (2017). An Anonymization Method to Improve Data
Utility for Classification. Proceedings of International Symposium on
Cyberspace Safety and Security, (pp. 57-71).
Ho, T., & Pernici, B. (2015). A data-value-driven adaptation framework for energy efficiency for data intensive applications in clouds. 2015 IEEE Conference on Technologies for Sustainability (SusTech), (pp. 47–52).
Horkoff, J., & Yu, E. (2016). Interactive goal model analysis for early requirements engineering. Requirements Engineering, 21(1), 29-61.
Horkoff, J., Barone, D., Jiang, L., Yu, E., Amyot, D., Borgida, A., & Mylopoulos, J.
(2014). Strategic business modeling: representation and reasoning.
Software & Systems Modeling, 13(3), 1015-1041.
Horkoff, J., Borgida, A., Mylopoulos, J., Barone, D., Jiang, L., Yu, E., & Amyot, D.
(2012). Making data meaningful: The business intelligence model and its
formal semantics in description logics. Proc. of On the Move to Meaningful
Internet Systems, (pp. 700-717).
Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E., Spicer,
K., & de Wolf, P. (2012). Statistical Disclosure Control. New York: Wiley.
IIBA. (2009). MoSCoW Analysis. A Guide to the Business Analysis Body of
Knowledge. International Institute of Business Analysis.
Kalyani, G., V. P. Chandra Sekhara Rao, M., & Janakiramaiah, B. (2017). Particle
Swarm Intelligence and Impact Factor-Based Privacy Preserving
Association Rule Mining for Balancing Data Utility and Knowledge Privacy.
Arabian Journal for Science and Engineering, 43.
Katsalis, K., Papaioannou, T., Nikaein, N., & Tassiulas, L. (2016). SLA-driven VM
Scheduling in Mobile Edge Computing. IEEE 9th International Conference
on Cloud Computing.
Khaitzin, E., Shlomo, R., & Anderson, M. (2018). Privacy Enforcement at a Large
Scale for GDPR Compliance. Proceedings of the 11th ACM International
Systems and Storage Conference (SYSTOR '18) (pp. 124-124). New York:
ACM.
Ko, R., Lee, B., & Pearson, S. (2011). Towards Achieving Accountability, Auditability and Trust in Cloud Computing. In A. Abraham, J. Mauri, J. Buford, J. Suzuki, & S. Thampi, Advances in Computing and Communications, CCIS 193 (pp. 432-444).
Kock, N. (2007). Encyclopedia of E-collaboration. Hershey: Information Science Reference - Imprint of: IGI Publishing.
Lai, C., Song, D., Hwang, R., & Lai, Y. (2016). A QoS-aware streaming service over
fog computing infrastructures. Digital Media Industry & Academic Forum
(DMIAF), (pp. 94-98). doi:10.1109/DMIAF.2016.7574909
Letier, E., & Van Lamsweerde, A. (2004). Reasoning about partial goal satisfaction
for requirements and design engineering. ACM SIGSOFT Soft. Eng. Notes,
29, 53-62.
Lin, Y., Wu, C., & Tseng, V. (2015). Mining high utility itemsets in big data. In T. Cao,
E. Lim, Z. Zhou, T. Ho, D. Cheung, & H. Motoda, PAKDD 2015. Lecture Notes
in Computer Science (Vol. 9078, pp. 649-661). Springer.
Lodge, T., Crabtree, A., & Brown, A. (2018). Developing GDPR Compliant Apps
for the Edge. ESORICS International Workshop on Data Privacy
Management, Cryptocurrencies and Blockchain Technology (pp. 313-
328). Springer.
Lucky, M., Cremaschi, M., Lodigiani, B., Menolascina, A., & De Paoli, F. (2014).
Enriching API Descriptions by Adding API Profiles Through Semantic
Annotation. Collaborative Systems for Smart Networked Environments.
PRO-VE 2014. IFIP Advances in Information and Communication
Technology. Springer.
Martínez, S., Fouche, A., Gérard, S., & J., C. (2018). Automatic Generation of
Security Compliant (Virtual) Model Views. In Conceptual Modeling. ER
2018. Lecture Notes in Computer Science (Vol. 11157). Springer.
Michelberger, B., Andris, R., Girit, H., & Mutschler, B. (2013). A Literature Survey on
Information Logistics. BIS 2013. Lecture Notes in Business Information
Processing. 157. Berlin, Heidelberg: Springer.
Moody, D., & Walsh, P. (1999). Measuring the value of information: an asset
valuation approach. European Conference on Information Systems.
Mouradian, C., Naboulsi, D., Yangui, S., Glitho, R. H., Morrow, M. J., & Polakos, P.
A. (2018). A Comprehensive Survey on Fog Computing: State-of-the-Art
and Research Challenges. IEEE Communications Surveys & Tutorials, 20(1),
416-464. doi:10.1109/COMST.2017.2771153
Open Grid Forum. (2014). Web Services Agreement Specification (WS-
Agreement). Open Grid Forum.
Pallas, F., & Grambow, M. (2018). Three Tales of Disillusion: Benchmarking Property Preserving Encryption Schemes. International Conference on Trust and Privacy in Digital Business. Springer.
Pearson, S., & Mont, M. (2011). Sticky Policies: An Approach for Managing Privacy
across Multiple Parties. IEEE Computer, 44(9), 61-68.
Petychakis, M., Alvertis, I., Biliri, E., Tsouroplis, R., Lampathaki, F., & Askounis, D.
(2014). Enterprise Collaboration Framework for Managing, Advancing and
Unifying the Functionality of Multiple Cloud-Based Services with the Help
of a Graph API. Collaborative Systems for Smart Networked Environments.
PRO-VE 2014. IFIP Advances in Information and Communication
Technology. Springer.
Pham, V., & Huh, E. (2017). A Fog/Cloud based data delivery model for publish-
subscribe systems. 2017 International Conference on Information
Networking (ICOIN), (pp. 477-479). Da Nang.
doi:10.1109/ICOIN.2017.7899539
Pretschner, A., Hilty, M., & Basin, D. (2006). Distributed Usage Control.
Communications of the ACM, 49(9), 39-44.
Qin, Y., Sheng, Q. Z., Falkner, N. J., Dustdar, S., Wang, H., & Vasilakos, A. V. (2016).
When things matter: A survey on data-centric internet of things. Journal of
Network and Computer Applications, 64, 137-153.
doi:https://doi.org/10.1016/j.jnca.2015.12.016
Sadeghi, A.-R., & Stüble, C. (2004). Property-based Attestation for Computing Platforms: Caring about properties, not mechanisms. New Security Paradigms Workshop.
Salman, O., Elhajj, I., Chehab, A., & Kayssi, A. (2018). IoT survey: An SDN and fog
computing perspective. Computer Networks, 143, 221-246.
doi:https://doi.org/10.1016/j.comnet.2018.07.020.
Sebastiani, R., Giorgini, P., & Mylopoulos, J. (2004). Simple and minimum-cost
satisfiability for goal models. Proc. of Int. Conference on Advanced
Information Systems Engineering (pp. 20-35). Springer.
Shamir, A. (1979). How to share a secret. Communications of the ACM, 612-613.
Sharma, Chatterjee, S., & Sharma, D. (2013). CloudView: Enabling tenants to
monitor and control their cloud instantiations. IFIP/IEEE International
Symposium on Integrated Network Management, (pp. 443-449). GHENT.
Surwase, V. (2016). REST API Modeling Languages -A Developer’s Perspective.
IJSTE - International Journal of Science Technology & Engineering, 2(10).
Syed, M., & Syed, S. (2008). Handbook of Research on Modern Systems Analysis
and Design Technologies and Applications. Hershey: Information Science
Reference - Imprint of: IGI Publishing.
Taleb, T., Dutta, S., Ksentini, A., Iqbal, M., & Flinck, H. (2017). Mobile Edge
Computing Potential in Making Cities Smarter. IEEE Communications
Magazine (pp. 38-44). IEEE.
Thi, Q., Si, T., & Dang, T. (2018). Fine Grained Attribute Based Access Control
Model for Privacy Protection. In T. Dang, R. Wagner, J. Küng, N. Thoai, M.
Takizawa, & E. Neuhold, Future Data and Security Engineering, FDSE 2016.
Lecture Notes in Computer Science (Vol. 10018). Cham: Springer.
Tsouroplis, R., Petychakis, M., Alvertis, I., Biliri, E., Lampathaki, F., & Askounis, D.
(2015). Internet-Based Enterprise Innovation Through a Community-Based
API Builder to Manage APIs. Current Trends in Web Engineering. ICWE 2015.
Lecture Notes in Computer Science. 9396. Springer.
Ulbricht, M.-R., & Pallas, F. (2018). YaPPL - A Lightweight Privacy Preference Language for Legally Sufficient and Automated Consent Provision in IoT Scenarios. Data Privacy Management, Cryptocurrencies and Blockchain Technology (pp. 329-344). Springer.
Ullah, K. W., Ahmed, A. S., & Ylitalo, J. (2013). Towards Building an Automated
Security Compliance Tool for the Cloud. 12th IEEE International
Conference on Trust, Security and Privacy in Computing and
Communications (pp. 1587-1593). Melbourne: VIC.
Varshney, P., & Simmhan, Y. (2017). Demystifying Fog Computing: Characterizing
Architectures, Applications and Abstractions. IEEE 1st International
Conference on Fog and Edge Computing (ICFEC).
Verma, S., Yadav, A. K., Motwani, D., Raw, R. S., & Singh, H. K. (2016). An efficient
data replication and load balancing technique for fog computing
environment. 2016 3rd International Conference on Computing for
Sustainable Global Development (INDIACom), (pp. 2888-2895). New Delhi.
Vidyasankar, K. (2018). Distributing Computations in Fog Architectures.
Proceedings of the 2018 Workshop on Theory and Practice for Integrated
Cloud, Fog and Edge Computing Paradigms (TOPIC '18) (pp. 3-8). New
York: ACM. doi:https://doi.org/10.1145/3229774.3229775
Wagner, I., & Boiten, E. (2018). Privacy Risk Assessment: From Art to Science.
Metrics.
Wagner, I., & Eckhoff, D. (2018). Technical Privacy Metrics: A Systematic Survey.
ACM Computer Survey, 51(3), 57:1-57:38.
doi:https://doi.org/10.1145/3168389
Wang, J., Zhu, X., Bao, W., & Liu, L. (2016). A utility-aware approach to redundant
data up-load in cooperative mobile cloud. 9th IEEE International
Conference on Cloud Computing, (pp. 384-391). San Francisco.
Werner, S., Pallas, F., & Bermbach, D. (2017). Designing Suitable Access Control
for Web-Connected Smart Home Platforms. International Conference on
Service-Oriented Computing. Springer.
Yin, B., Cheng, Y., Cai, L., & Cao, X. (2017). Online SLA-aware Multi-Resource
Allocation for Deadline Sensitive Jobs in Edge-Clouds. GLOBECOM 2017 -
2017 IEEE Global Communications Conference.
Zaveri, A., Dastgheib, S., Wu, C., Whetzel, T., Verborgh, R., Avillach, P., . . .
Dumontier, M. (2017). smartAPI: Towards a More Intelligent Network of
Web APIs. In: Blomqvist E., Maynard D., Gangemi A., Hoekstra. The
Semantic Web. ESWC 2017. Lecture Notes in Computer Science. 10250.
Springer.
ANNEX 1: DITAS Business and Technical Requirements
WP1 – Requirement, Architecture and Validation Approach
Technical Requirements
ID T1.1
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|Performance
Component that fulfils it VDC
Description Reduce and process the data on the Edge/IoT side
before they reach a central location such as the
Cloud
Rationale Processing and reducing data before they are stored in a central location allows the distribution of tasks and reduces the resources needed for the storage and transmission of data.
Test case / Acceptance
criteria
In the e-Health use case:
As a DPO, I want data to be pseudonymized upon transfer to the hospital group cloud, so that the hospital is compliant with the GDPR
Time Frame Report period 2
ID T1.2
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Interoperability
Component that fulfils it VDC
Description Harmonize the data coming from different data
sources (data heterogeneity)
Rationale There is an increasing need to develop data-inten-
sive applications able to manage data coming
from heterogeneous sources
Test case / Acceptance
criteria
Develop a method that consumes data from two
data sources
Write a simple blueprint for that method
Deploy a VDC from that blueprint
Using a tool like Postman call that method
Check that indeed data from the two data
sources is returned
Time Frame Report period 1
ID T1.3
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Security
Component that fulfils it DAL, Policy Enforcement Engine
Description Respect privacy requirements (such as GDPR com-
pliance) in data movement transactions and data
access
Rationale The GDPR imposes strict limitations on how to manage personal data. The DITAS platform must respect such limitations in order to assure its customers that the service offered is compliant and that they will not run into the severe consequences specified by the law.
Test case / Acceptance
criteria
In the e-Health use case:
A researcher accessing the data cannot access personal patient data
A researcher accessing the data in the public cloud, after the data has been moved from the private cloud to the public cloud, gets the data encrypted
Time Frame Report period 2
ID T1.4
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Interoperability
Component that fulfils it CAF
Description Simplify the exposed data to third party users/clients
Rationale DITAS aims at enabling data-intensive application
developers to focus only on the business logic of the
application
Test case / Acceptance
criteria
Using Node-RED implement a method that re-
trieves and combines data from two data sources
Measure the time expended implementing that
method
Implement a script that consumes that method
Measure the time expended developing that
script
The difference between the two computed times
is (roughly) the time saved by the developer
Time Frame Report period 2
ID T1.5
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Interoperability
Component that fulfils it CAF, DAL
Description Simplify the data access from Fog and Cloud
Rationale The simplification of the usage of the DITAS platform will help the application developers and therefore decrease its already steep learning curve.
Test case / Acceptance
criteria
Using Node-RED implement a method that re-
trieves and combines data from two data sources:
one at the edge and another one in the cloud
Measure the time expended implementing that
method
Implement a script that consumes that method
Measure the time expended developing that
script
The difference between the two computed times
is (roughly) the time saved by the developer
Time Frame Report period 2
ID T1.6
Requirement Type Non-functional
Source Questionnaire
Priority | Category Should|Interoperability
Component that fulfils it Abstract Blueprint, SLA Manager
Description Have a flexible agreement between data provider
and consumer (e.g. latency < 100 ms while availa-
bility > 99.999%)
Rationale DITAS provides agreements about the quality of data between the data provider and the data user; the SLA system in place will verify that those agreements are met. This is performed via constant monitoring of the agreement.
Test case / Acceptance
criteria
Given a blueprint with QoS constraints and an ab-
stract properties goal tree, verify upon a VDC de-
ployment that an SLA is created in the SLA Manager
for every method defined in the blueprint
Time Frame Report period 1
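One possible automation of the criterion above is sketched below; the blueprint layout (exposed_api.methods) and the SLA Manager /agreements endpoint are assumptions for the sketch, not the component's documented API.

import requests

def test_sla_created_per_method(blueprint, sla_manager_url):
    # Assumed blueprint layout: the exposed API section lists the VDC methods.
    methods = {m["name"] for m in blueprint["exposed_api"]["methods"]}
    # Assumed SLA Manager endpoint returning the list of active agreements.
    agreements = requests.get(f"{sla_manager_url}/agreements", timeout=10).json()
    covered = {agreement["method"] for agreement in agreements}
    missing = methods - covered
    assert not missing, f"no SLA agreement created for methods: {missing}"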
ID T1.7
Requirement Type Functional
Source Questionnaire
Priority | Category Must|Interoperability
Component that fulfils it Concrete Blueprint, Application Requirements
Description Have the possibility to express data quality con-
straints
Rationale Data-intensive applications need not only to access data, but to access data whose quality reflects their requirements
Test case / Acceptance
criteria
As a DITAS application designer, I want to specify quality metrics on the data the application will use, for the selection of the blueprint.
Time Frame Report period 2
ID T1.8
Requirement Type Non-functional
Source Questionnaire
Priority | Category Should|Security
Component that fulfils it VDC Request Monitor
Description Have the possibility to monitor how data is provi-
sioned or consumed
Rationale Data handled by DITAS can contain sensitive information, which requires keeping track of access to, and movement of, that data (auditing)
Test case / Acceptance
criteria
As a DITAS operator, I want to get metrics on requests coming to and going out from a VDC, so that I can monitor how data is provisioned or consumed
Time Frame Report period 2
ID T1.9
Requirement Type Non-functional
Source Questionnaire
Priority | Category Should|Security
Component that fulfils it Monitoring System
Description Keep track of the data transformations occurring during data movement
Rationale As data can be sensitive, it is imperative that changes to the data are tracked. With this tracking, an audit of data changes is possible
Test case / Acceptance
criteria
As a DITAS operator, in the e-Health use case, when data is moved, I can look at the records of the transformations
Time Frame Report period 2
WP2 - Enhanced data management
ID T2.1
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Performance
Component that fulfils it Abstract Blueprint
Description Metadata describing the data sources must be
available. This enables the computation of the data
utility matching application developer require-
ments and data source capabilities.
Rationale Metadata describing the content of a data source and its non-functional properties are essential for computing both the Potential Data Utility and the Data Utility of a data source according to the application developer's needs.
Test case / Acceptance
criteria
When the data administrator creates the blueprint, then the data source must be characterised with metadata about its general characteristics, QoS, QoD, deployment process, and API access.
Time Frame Report period 1
ID T2.2
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Abstract Blueprint
Description A sample set might be obtainable from the data
source for evaluating the quality of the matching
between a data source and application require-
ments.
Rationale Sample data gathered from a data source according to the application developer's needs are used to validate the proposed match between the application developer requirements and the data source features at design time.
Test case / Acceptance
criteria
When the data administrator creates the blueprint, then the datasource section of the blueprint must contain a reference to a file holding a representative sample of the data source's dataset.
Time Frame Report period 2
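To make T2.1 and T2.2 concrete, the data-source section of a blueprint could look roughly as follows; every key and value here is an illustrative assumption, not the normative blueprint schema.

# Hypothetical data-source characterisation covering T2.1 (general
# characteristics, QoS, QoD, deployment, API access) and T2.2 (sample file).
datasource_section = {
    "general": {"name": "blood-tests", "type": "relational", "owner": "OSR"},
    "qos": {"availability": 99.9, "latency_ms": 100},    # quality of service
    "qod": {"accuracy": 0.98, "completeness": 0.80},     # quality of data
    "deployment": {"image": "registry.example/vdc:1.0",
                   "orchestrator": "kubernetes"},
    "api_access": {"protocol": "REST", "spec": "openapi-3.0"},
    # T2.2: reference to a representative sample of the data source's dataset.
    "sample_reference": "https://example.org/samples/blood-tests-sample.csv",
}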
ID T2.3
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability/Maintainability
Component that fulfils it Abstract Blueprint, DUE
Description Metadata describing the potential data utility of a data source, and the data utility of the matching between application requirements and data source, are saved as metadata
Rationale Data Utility metadata can be used for selecting the proper data source for an application developer request, and for monitoring the trends in data utility for a running application in order to detect possible issues and react to inefficiencies
Test case / Acceptance
criteria
When the data administrator submits the blueprint,
then the data quality of the data source will be an-
alysed and saved in the abstract blueprint
Time Frame Report period 2
ID T2.4
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Abstract Blueprint
Description Functional dependencies and constraints on data
must be specified.
Rationale This requirement makes it possible to calculate the data quality.
Test case / Acceptance
criteria
When the data administrator creates the blueprint,
then the datasource must be characterised with
such rules, if they exist.
Time Frame Report period 2
ID T2.5
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Abstract Blueprint
Description Metadata describing the input required by each method must be available. Such information comprises the input required for each method to be available and the type of data (e.g., plain data, encrypted, pseudonymized, anonymised).
Rationale Metadata on the input of methods will be used to understand which portions of the data are needed by the VDC and can therefore be moved.
Test case / Acceptance
criteria
When the data administrator creates the blueprint, then, for each method, the input (in terms of columns for each data source used) and the type of data must be specified
Time Frame Report period 2
ID T2.6
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Abstract Blueprint, Concrete Blueprint
Description The list of computational and storage resources that are made available by the data administrator and the application designer must be available.
Rationale The list of resources will be used for the data and
computation movement.
Test case / Acceptance
criteria
When the data administrator creates the blueprint,
then the list of resources that he shares with the plat-
form must be specified.
When the application designer selects a blueprint,
then he must specify the list of resources that he
shares with the platform.
Time Frame Report period 2
ID T2.7
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it DAL
Description Metadata describing the amount of resources used
by a datasource (in terms of CPUs, space, memory,
etc) must be available.
Rationale Such information will be used to understand if a datasource can be moved to a new node.
Test case / Acceptance
criteria
When the DS4M calls the DAL, then the DAL will provide the information on the resources used by the data sources.
Time Frame Report period 2
ID T2.8
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Concrete Blueprint
Description The method selected by the application designer should be recorded and known after the deployment of the VDC.
Rationale In order to perform a computation movement, the DS4M will compare the resources used by the method with the target resource where the VDC will be moved. In order to do this, the DS4M needs to know which method, among the ones offered by the VDC, is being used by the application.
Test case / Acceptance
criteria
When the VDC is deployed, then the information on
the method that has been selected by the user
must be injected in the blueprint.
Time Frame Report period 2
ID T2.9
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Data Analytics
Description Metadata describing the amount of resources used
by a node inside the DITAS cluster (in terms of CPUs,
space, memory, etc) must be available.
Rationale Such information will be used to understand if a datasource can be moved to a new node.
Test case / Acceptance
criteria
When the DS4M calls the data analytics, then the data analytics will provide the information on the resources used by the nodes.
Time Frame Report period 2
ID T2.10
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it VDC Request Monitor
Description Metadata regarding incoming and outgoing re-
quests as well as regarding available encryption of
these requests must be available.
Rationale Such information will be used to make decisions about data/computation movement and for reporting to the auditing and compliance framework.
Test case / Acceptance
criteria
When a request to a VDC is made, then information about the request time, method, and status code becomes available in the monitoring database. Furthermore, even for an SSL request, a minimal set of monitoring information is collected in the monitoring database.
Time Frame Report period 2
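A sketch of the per-request record that would satisfy this criterion is shown below; the field names are assumptions chosen to mirror the criterion, not the VDC Request Monitor's actual schema.

from datetime import datetime, timezone

def build_request_record(method, status_code, encrypted):
    # One monitoring-database entry per request made to the VDC.
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # request time
        "method": method,                                     # VDC method invoked
        "status_code": status_code,
        # For SSL requests only this minimal set of fields would be collected.
        "encrypted": encrypted,
    }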
ID T2.11
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it Abstract Blueprint
Description Metadata about privacy guarantees of a VDC
must be available.
Rationale Policy enforcement engine should be deployed
and configured for the VDCs using such privacy
guarantees.
Test case / Acceptance
criteria
When the data administrator creates the blueprint,
then, for each method, he can define the privacy
guarantees in the abstract blueprint.
Time Frame Report period 1
WP3 - Data virtualization
Business Requirements
ID B3.1
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|Performance
Component that fulfils it Abstract Blueprint
Description A variety of different “modes” could be described in the VDC Blueprint.
Rationale Achieve simpler and more economical manage-
ment of the resources.
Test case / Acceptance
criteria
The Data Administrator will be able to describe the different modes in the Abstract Blueprint.
Time Frame Report period 1
ID B3.2
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|Performance
Component that fulfils it Abstract Blueprint
Description The VDC Blueprint must be able to describe and handle different performance factors, such as information about the energy consumption and energy efficiency of the components and architecture.
Rationale The VDC Blueprint should consider and handle dif-
ferent performance factors.
Test case / Acceptance
criteria
The Data Administrator will be able to describe the different energy modes of their services in the Abstract Blueprint.
Time Frame Report period 1
ID B3.3
Requirement Type Functional
Source Questionnaire
Priority | Category Must|OpenSource
Component that fulfils it Abstract Blueprint
Description Should have an open API in order for big vendors and also new providers to be able to publish their services and components.
Rationale The VDC Blueprint should be easy to understand
and use.
Test case / Acceptance
criteria
Extend the OpenAPI specification, which is a homogeneous, standardised solution for describing REST services.
Time Frame Report period 1
ID B3.4
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|OpenSource
Component that fulfils it Abstract Blueprint
Description Documentation that describes each method and each attribute must be included.
Rationale The VDC Blueprint should be easy to understand
and use.
Test case / Acceptance
criteria
Use of GitHub for version control and documenting
all the components of Abstract Blueprint.
Time Frame Report period 1
ID B3.5
Requirement Type Non-functional
Source Questionnaire
Priority | Category Must|OpenSource/Extensibility
Component that fulfils it Abstract Blueprint
Description Be open to being used for alternative solutions that may arise in the future.
Rationale The VDC Blueprint should be easy to understand
and use.
Test case / Acceptance
criteria
The blueprint schema is highly extensible and is also able to encapsulate another blueprint inside it
Time Frame Report period 1
ID B3.6
Requirement Type Non-functional
Source Questionnaire
Priority | Category Should|Extensibility
Component that fulfils it Abstract Blueprint
Description Be open to be used for alternative architectures
and dynamic systems.
Rationale The VDC Blueprint should be architectural agnostic.
Test case / Acceptance
criteria
VDC is able to be deployed in different systems with
different computational capabilities
(Embedded Devices, X64 machines)
Time Frame Report period 1
ID B3.7
Requirement Type Non-functional
Source Questionnaire
Priority | Category Could|Extensibility
Component that fulfils it Abstract Blueprint
Description Be able to use/reuse “on the fly” different VDC Blue-
prints from different repositories.
Rationale The VDC Blueprint should be architectural agnostic.
Test case / Acceptance
criteria
The blueprint schema is highly extensible and is also able to encapsulate another blueprint inside it
Time Frame Report period 1
Technical Requirements
ID T3.1
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Abstract Blueprint
Description The VDC must be data-source independent.
Rationale The structure of the metadata format, as well as the way the metadata are saved, must be orchestrated in a way that every future candidate File System will be supported as a data source.
Test case / Acceptance
criteria
Abstract Blueprint contains information about the
data sources, whose data the VDC exposes, where
the data sources might be of different types, e.g.
parquet files consumed using S3 API, DBMS tables
consumed using JDBC, that enable the VDC to con-
nect to these various data sources.
Time Frame Report period 1
ID T3.2
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Maintainability
Component that fulfils it PSE
Description Catch errors and security incompatibilities/vulnera-
bilities before the production/deployment.
Rationale The VDC should be evaluated for any security risk before the actual deployment in order to avoid any breach of the exposed data.
Test case / Acceptance
criteria
Security and Privacy Meta-Model and Privacy Se-
curity Evaluator Service allow for pre-deployment fil-
tering.
Time Frame Report period 1
ID T3.3
Requirement Type Functional
Source DITAS Analysis
Priority | Category Should|Maintainability/Security
Component that fulfils it Abstract Blueprint
Description VDC Schema should be able to describe not only
capabilities of the VDC but also describe the pro-
cesses for deployment.
Rationale The structure of the VDC Schema must not only be declarative but also imperative, providing the essentials to the DITAS Platform in order to understand what should happen step by step.
Test case / Acceptance
criteria
The Abstract Blueprint contains information about the configuration and orchestration for deploying the VDC
Time Frame Report period 1
ID T3.4
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Availability
Component that fulfils it Deployment Engine
Description Candidate resolutions should be automatically tested, reviewed, built and deployed after the deployment of the VDC
Rationale The processes for the deployment phase should be
transparent to the end user.
Test case / Acceptance
criteria
• Define two blueprints with one method each, one with a right implementation of the method and a second one with a wrong implementation of the same.
• Deploy the first blueprint: no warning message should be received from the Deployment Engine.
• Deploy the second one: the Deployment Engine must warn about the non-availability of the method and stop the deployment.
Time Frame Report period 2
ID T3.5
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Performance
Component that fulfils it Abstract Blueprint
Description Take advantage of pre-built community-based
blueprints (parts or components)
Rationale The expandability and re-usability of VDC components and blueprints are a key factor for the simplicity of the VDC file.
Test case / Acceptance
criteria
The blueprint schema is highly extensible and is also able to encapsulate another blueprint inside it
Time Frame Report period 1
ID T3.6
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Maintainability/ Extensibility
Component that fulfils it Abstract Blueprint
Description Implicitly support disaster recovery and business continuity characteristics in the VDC Blueprint.
Rationale The DITAS Platform should provide business continu-
ity to the application service model.
Test case / Acceptance
criteria
Using the Blueprint Schema data administrators can
describe the services and provide information
about how the services can handle disaster recov-
ery
Time Frame Report period 1
ID T3.7
Requirement Type Functional
Source DITAS Analysis
Priority | Category Should|Maintainability
Component that fulfils it Abstract Blueprint
Description Monitor the infrastructure upgrade offerings of the
same provider and blueprint and seamlessly up-
date the VDC without expensive rewrites.
Rationale Upgrades of the same blueprint should not stop the
runtime process.
Test case / Acceptance
criteria
The Abstract Blueprint consists of multiple sections that can easily change without affecting the running instances of a VDC Blueprint
Time Frame Report period 1
ID T3.8
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Structural
Component that fulfils it Abstract Blueprint
Description A multi-modal language should specify the differ-
ent components.
Rationale Several modes (components) will create a single artifact. This could align with the ISO/IEC 19506 standard, called Knowledge Discovery Meta-Model (KDM), which addresses existing software systems by ensuring interoperability and exchange of data between tools provided by different vendors.
Test case / Acceptance
criteria
Develop two types of VDCs, using different pro-
gramming languages
Time Frame Report period 1
ID T3.9
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Structural
Component that fulfils it Abstract Blueprint
Description The notation language should follow a semi-structured format.
Rationale The notation language should be fast, accurate and user friendly, and have a semi-structured format in order to be able to describe different aspects of heterogeneous data sources.
Test case / Acceptance
criteria
The Abstract Blueprint Schema is created using JSON, a semi-structured notation language.
Time Frame Report period 1
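Since the Abstract Blueprint is plain JSON, any language with a JSON parser can consume it; the snippet below illustrates this in Python, with assumed (heavily abridged) section names.

import json

# Assumed, abridged blueprint: only the section names are illustrative.
raw = '{"general": {"name": "demo-vdc"}, "exposed_api": {"methods": []}}'
blueprint = json.loads(raw)
print(blueprint["general"]["name"])  # -> demo-vdc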
ID T3.10
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Extensibility
Component that fulfils it Abstract Blueprint, RE
Description The parsing mechanism/resolution should be able
to handle multiple different resolution tasks and be
able to scale.
Rationale VDC Resolution process should be distributed into
services that assess the different blueprint sections.
In this way, if a section of the blueprint is changed, this will not affect the complete resolution process.
Test case / Acceptance
criteria
The Abstract Blueprint consists of multiple sections that are accessed by different distributed resolution services
Time Frame Report period 1
ID T3.11
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Structural/ Optimization
Component that fulfils it Abstract Blueprint
Description The notation language should be able to be parsed
efficiently and fast.
Rationale The notation language should be fast, accurate
and user friendly.
Test case / Acceptance
criteria
The Abstract Blueprint Schema is created using JSON, a semi-structured notation language
Time Frame Report period 1
ID T3.12
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Structural
Component that fulfils it Abstract Blueprint
Description The notation language must be human readable,
easy to script and to understand.
Rationale The notation language should hide the complexity
of the architecture and components and provide
also a nice overview of the specific component.
Test case / Acceptance
criteria
The Abstract Blueprint Schema is created using JSON, a semi-structured notation language
Time Frame Report period 1
ID T3.13
Requirement Type Functional
Source DITAS Analysis
Priority | Category Would|Compatibility
Component that fulfils it Abstract Blueprint
Description The VDC blueprint should be open enough for new structural changes and be able to handle encrypted entries in addition to plain text.
Rationale The expandability of the VDC description Schema is
crucial.
Test case / Acceptance
criteria
The blueprint schema is highly extensible and described in a modularized way that allows the creation of new sections.
Time Frame Report period 1
ID T3.14
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Compatibility
Component that fulfils it Abstract Blueprint
Description The notation language should be able to be parsed
by multiple different programming languages.
Rationale It should be able to be programmatically understood and used by multiple programming-language families.
Test case / Acceptance
criteria
The Abstract Blueprint Schema is created using JSON, a semi-structured notation language
Time Frame Report period 1
ID T3.15
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Extensibility/ Interoperability
Component that fulfils it CAF, DAL
Description Make the access to data in the Cloud, edge and
fog transparent to the application, overcoming lim-
itations and notions such as running location, loca-
tion of data, bandwidth.
Rationale Application developers should be able to run their application without having to consider where the DITAS platform decides to put the computation or the data, and the developer shouldn't have to rewrite it when the fog topology changes.
Test case / Acceptance
criteria
Application can access data that is deployed in the
cloud and in the fog. It continues to get data ex-
posed by the VDC even after either data or com-
putation movement have occurred.
Time Frame Report period 2
ID T3.16
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DUR
Description The Data Utility Refiner has to weigh the goal model specified in the application requirements according to the type of application that will use the data.
Rationale The importance of each non-functional require-
ment varies based on the application that will use
the data source. E.g., in streaming applications, la-
tency is generally more important than reliability.
Test case / Acceptance
criteria
When the DURE invokes the DUR, then the DUR
should weigh the goal model defined by the appli-
cation developer according to the application
type.
Time Frame Report period 2
ID T3.17
Requirement Type Functional
Source DITAS Analysis
Priority | Category Should|Extensibility
Component that fulfils it DUR
Description To address new application types, the Data Utility Refiner should easily allow new algorithms and weighting schemes to be introduced.
Rationale As new types of applications are introduced (e.g., big data and machine learning techniques), existing weighting schemes may not be adequate for them. Hence the need for new schemes.
Test case / Acceptance
criteria
When a new application type is introduced, then
the DUR should allow the definition of a new
weighting scheme.
Time Frame Report period 2
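The extensibility T3.17 asks for could be realised with a simple registry of weighting schemes, so that a new application type only registers its own scheme without touching the DUR core; the sketch below is one possible shape, with all names assumed.

from typing import Callable, Dict

# A weighting scheme maps raw goal weights to application-type-specific ones.
WeightingScheme = Callable[[Dict[str, float]], Dict[str, float]]
_SCHEMES: Dict[str, WeightingScheme] = {}

def register_scheme(app_type: str, scheme: WeightingScheme) -> None:
    # Adding support for a new application type is a single registration call.
    _SCHEMES[app_type] = scheme

def weigh_goal_model(app_type: str, goals: Dict[str, float]) -> Dict[str, float]:
    return _SCHEMES[app_type](goals)

# Example: a "streaming" application type doubles the weight of latency.
register_scheme("streaming",
                lambda g: {**g, "latency": g.get("latency", 1.0) * 2})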
ID T3.18
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DUE@VDM
Description The Data Utility Evaluator has to update potential
data utility values based on the actual usage of the
exposed data source
Rationale As data utility, and in particular data quality attributes, change based on the characteristics of the data that are requested (e.g., only a subset of all the available data is requested), their assessment has to be made per request.
Test case / Acceptance
criteria
When the DURE invokes the DUE, then the DUE updates the potential data utility values if the request considers a subset of the attributes provided by a specific method.
Time Frame Report period 2
ID T3.19
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it PSE
Description The Privacy and Security Evaluator must determine
if and how well privacy and security attributes fit the
application developer requirements
Rationale Data sources that do not fulfill requirements on pri-
vacy and security should be discarded. Also, data
sources that offer significantly better security and/or
privacy than others should be favored.
Test case / Acceptance
criteria
When the DURE invokes the PSE, then the PSE must
filter out and reasonably rank blueprints according
to the specified security and privacy requirements
Time Frame Report period 2
ID T3.20
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Functional
Component that fulfils it Application Requirements
Description Application requirements must specify constraints
on non-functional requirements with a goal model.
Rationale In order to properly filter and rank the blueprints, specific application requirements should be taken into consideration in the resolution process.
Test case / Acceptance
criteria
When the application developer specifies the ap-
plication requirement, then the requirements must
be expressed with a goal model.
Time Frame Report period 2
ID T3.21
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it RE
Description The Resolution Engine must provide to the Data Util-
ity Resolution Engine the non-functional require-
ments as specified by the application developer.
Rationale It is important for the individual resolution services to
be able to communicate and send and retrieve the
appropriate data essential for each process.
Test case / Acceptance
criteria
Test the interoperability and communication pay-
load of the services. When the RE invokes the DURE,
then the RE must provide the non-functional re-
quirements as specified by the application devel-
oper
Time Frame Report period 2
ID T3.22
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it RE
Description The Resolution Engine must provide to the Data Util-
ity Resolution Engine non-functional attributes re-
lated to each blueprint that needs to be ranked
and filtered.
Rationale It is important for the individual resolution services to
be able to communicate and send and retrieve the
appropriate data essential for each process.
Test case / Acceptance
criteria
Test the interoperability and communication payload of the services. When the RE invokes the DURE, then the RE must provide a list of blueprints to filter and rank, together with their non-functional attributes.
Time Frame Report period 2
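An assumed shape of the RE-to-DURE invocation covering T3.21 and T3.22 is sketched below: the developer's non-functional requirements plus the candidate blueprints, each carrying the non-functional attributes to filter and rank on. The field names are illustrative only.

# Hypothetical payload sent by the Resolution Engine to the DURE.
dure_request = {
    "application_requirements": {   # T3.21: the developer's goal model
        "goals": [{"metric": "latency_ms", "operator": "<", "value": 100}],
    },
    "candidate_blueprints": [       # T3.22: blueprints with NF attributes
        {"id": "bp-1", "attributes": {"latency_ms": 80, "availability": 99.99}},
        {"id": "bp-2", "attributes": {"latency_ms": 150, "availability": 99.9}},
    ],
}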
WP4 - Execution environment
Business Requirements
ID B4.1
Requirement Type Functional
Source Questionnaire
Priority | Category Would|Maintainability
Component that fulfils it Deployment Engine
Description The DITAS Platform should be able to rebuild or
move the entire production infrastructure from bare
metal.
Rationale DITAS aims to be a set of tools that are able to be
installed in different Cloud and Edge providers. It
needs to support several of those providers in a very
abstract way.
Test case / Acceptance
criteria
Test that the Deployment Engine is able to deploy to at least two different Cloud providers. This will demonstrate that the architecture is modular enough not to be Cloud-provider dependent.
Time Frame Report period 2
ID B4.2
Requirement Type Functional
Source Questionnaire
Priority | Category Would|Extensibility
Component that fulfils it All
Description The DITAS Platform components should have a source control repository.
Rationale There should be a single point of search for the components' source code, in case it is needed for testing and upgrading.
Test case / Acceptance
criteria
Verify that all DITAS components are hosted in a
source control version repository.
Time Frame Report period 2
ID B4.3
Requirement Type Non-functional
Source Questionnaire
Priority | Category Could|Performance
Component that fulfils it Deployment Engine
Description The DITAS Platform should be able to deploy the
components on time.
Rationale The deployment delay should not last long, as that may disturb the end user.
Test case / Acceptance
criteria
In collaboration with the two use cases of the project, verify that the DITAS Deployment Engine is able to deploy the DITAS platform in a reasonable time; also, verify that adaptation actions are not delayed due to Deployment Engine performance.
Time Frame Report period 2
Technical Requirements
ID T4.1
Requirement Type Non-functional
Source Questionnaire
Priority | Category Would|Maintainability
Component that fulfils it Deployment Engine
Description The DITAS Platform should be able to back up data in minutes.
Rationale Upgrades of the same blueprint should not stop the
runtime process.
Test case / Acceptance
criteria
Test that the deployment engine is able to perform a backup of running VDCs and VDMs.
Time Frame Report period 2
ID T4.2
Requirement Type Non-functional
Source Questionnaire
Priority | Category Would|Performance
Component that fulfils it Deployment Engine
Description Run the VDCs in isolated, independent environments at once for benchmarking.
Rationale VDCs should be optimized and checked for their performance and metrics.
Test case / Acceptance
criteria
Test that the Deployment Engine is able to create a VDC/VDM deployment independent of other instances running at the same time.
Time Frame Report period 2
ID T4.3
Requirement Type Non-functional
Source Questionnaire
Priority | Category Would|Performance
Component that fulfils it Deployment Engine and VDC/VDM components
Description Run VDC “equivalent” for architecture backup sce-
narios or for fault tolerance at architectural level.
Rationale VDCs should be optimized and checked for their performance and metrics.
Test case / Acceptance
criteria
Test that both VDM and Deployment Engine are
able to keep a synchronous copy of the VDC. The
VDC components should also be aware of this to
allow fault tolerance.
Time Frame Report period 2
ID T4.4
Requirement Type Functional
Source DITAS Analysis
Priority | Category Should|Extensibility
Component that fulfils it VDC Components
Description The DITAS SLA Manager must be able to run on Edge
and on Cloud independently.
Rationale A DITAS application can run both on Edge and on Cloud, and this component should be able to run at both levels, even with limited functionality.
Test case / Acceptance
criteria
When the VDC is moved to an edge device, then the limited resources must not negatively influence the performance of the VDC components.
Time Frame Report period 1
ID T4.5
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it SLA Manager
Description The DITAS SLA Manager will offer an API for configu-
ration and QoS definition.
Rationale The Decision System for Data Movement needs an
API to configure it for a given application.
Test case / Acceptance
criteria
When the VDC is deployed, then the SLA Manager
will receive in input the blueprint of the VDC.
Time Frame Report period 1
ID T4.6
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it SLA Manager
Description The SLA Manager will notify, via a queue system, any violations of the rules that trigger a data movement action.
Rationale Since several subsystems of DITAS will need to react to this situation, notification via a queue subsystem looks like the best option for scalability.
Test case / Acceptance
criteria
Test that SLA Manager is able to notify violations to
the queue subsystem.
Time Frame Report period 1
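A sketch of how such a notification could be published is given below, assuming a RabbitMQ-style queue reached through the pika client; the queue name and the message shape are illustrative assumptions.

import json
import pika

def notify_violation(rule, observed, threshold):
    # Publish one violation message on the (assumed) SLA violations queue.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="sla.violations", durable=True)
    message = {"rule": rule, "observed": observed, "threshold": threshold}
    channel.basic_publish(exchange="", routing_key="sla.violations",
                          body=json.dumps(message))
    connection.close()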
ID T4.7
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it SLA Manager
Description The SLA Manager will notify, via a queue system, any violations of the rules that trigger a data movement action.
Rationale Since several subsystems of DITAS will need to react to this situation, notification via a queue subsystem looks like the best option for scalability.
Test case / Acceptance
criteria
Test that SLA Manager is able to notify violations to
the queue subsystem.
Time Frame Report period 2
ID T4.8
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Computation movement enactor
Description Be able to create applications both at the Edge and in the Cloud.
Rationale To move computation tasks between the Edge and the Cloud, or vice versa, it is necessary to be able to create application containers in both environments.
Test case / Acceptance
criteria
Test that the Computation Movement Enactor is
able to create containers both at Edge and
Cloud.
Time Frame Report period 1
ID T4.9
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Computation movement enactor
Description Be able to add Spark nodes.
Rationale To move Spark computation between the Edge and the Cloud, or vice versa, it is necessary to be able to create Spark computation nodes both on the Edge and in the Cloud and assign applications to them, so that, step by step, the Spark scheduler can create resources there.
Test case / Acceptance
criteria
Test that it is possible to federate Spark nodes between Edge and Cloud and that deployments of Spark nodes can be created in each location.
Time Frame Report period 1
ID T4.10
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Data movement enactor
Description The Data Movement Enactor will offer an API to allow the decision system to instruct it on which data movement to execute, the target and the source, and the transformation to be applied to the data.
Rationale The decision system, once it has decided where to move data, will communicate its decision to the data movement enactor.
Test case / Acceptance
criteria
When the DS4M decides that a data movement has to be enacted, then it will call the DME using the API that it exposes, specifying where to move the datasource and which transformation to use.
Time Frame Report period 2
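A possible shape of the DS4M-to-DME call described above is sketched below; the endpoint and the JSON fields are illustrative assumptions, not the component's real API.

import requests

# Hypothetical movement request: source, target, and the transformation
# to apply to the data while moving it, as T4.10 requires.
movement_request = {
    "source": "mysql://private-cloud/ehr",
    "target": "s3://public-cloud/ehr-cache",
    "transformation": "pseudonymize",
}
requests.post("http://dme.example:8080/movements", json=movement_request, timeout=10)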
ID T4.11
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it Computational movement enactor
Description The Computation Movement Enactor will offer an API to allow the decision system to instruct it on which computation movement to execute: the target and the source of the movement.
Rationale The decision system, once it has decided where to move computation, will communicate its decision to the computation movement enactor.
Test case / Acceptance
criteria
When the DS4M decides that a computation move-
ment has to be enacted, then it will call the CME
using the API that it exposes, specifying where to
move the VDC.
Time Frame Report period 2
ID T4.12
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Should|Security
Component that fulfils it All components
Description The API of the Decision System for data and com-
putation Movement should be available only to au-
thorized users
Rationale This is to prevent external (malicious) users from sending false violations to the system to trigger unwanted data/computation movements.
Test case / Acceptance
criteria
When a user that is not authenticated tries to call
the DS4M, then the DS4M will ignore the call.
Time Frame Report period 2
ID T4.13
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|
Component that fulfils it Data analytics
Description Metadata describing the resources that are made available by the data administrator and the application designer must be available. Such metadata must describe the available resources (in terms of space, memory, CPUs, etc.), their location, and the type of data that can be stored.
Rationale This information makes it possible to perform a computation movement and to understand if a resource can execute the method of the VDC that is moved.
Test case / Acceptance
criteria
When the DS4M receives a violation from the SLA manager, it needs to know how many resources a method of a VDC is using; it therefore calls the data analytics, which returns the amount of resources (in terms of CPU, RAM, storage space) used by the method.
Time Frame Report period 2
ID T4.14
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it VDC-Throughput-Agent
Description Metadata describing the behavior of the VDC as
well as the movement of data/computation must
be available.
Rationale The aggregated data is sent to the monitoring database used by the auditing and compliance framework.
Test case / Acceptance
criteria
When a VDC component performs network operations, the aggregated usage is stored in the monitoring database.
Time Frame Report period 2
ID T4.15
Requirement Type Non-functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it VDC-Logging-Agent
Description Metrics and measurements regarding various sys-
tem qualities must be available to make decisions
about violations of requirements and SLAs.
Rationale Data is used to trigger data/computation move-
ment and as part of the auditing and compliance
framework.
Test case / Acceptance
criteria
When a VDC component requests monitoring data to be stored, the data will eventually be stored in the monitoring database.
Time Frame Report period 2
ID T4.16
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Security
Component that fulfils it Policy Enforcement Engine, DAL
Description The VDC should expose data that is compliant with
privacy policies, if privacy attributes are defined for
the VDC.
Rationale
Test case / Acceptance
criteria
VDC with privacy attributes exposes only data that
is compliant with privacy policies.
Time Frame Report period 2
ID T4.17
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DURE
Description The metrics and the goal model should be well
formed in the concrete blueprint.
Rationale The creation of the blueprint is central for the communication between modules. In particular, the parts that define the goal model and the data quality metrics are needed by the DS4M and the SLA manager to check the violations and decide where and when to move data and computation.
Test case / Acceptance
criteria
Sections about the data utility and the abstract
properties of the concrete blueprint should be well
formed.
Time Frame Report period 2
ID T4.18
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DS4M
Description The DS4M must provide an interface to collect the
violations detected by the SLA manager.
Rationale The DS4M will expose an API interface, used by the SLA manager, to collect the violations of user requirements.
Test case / Acceptance
criteria
When a violation of a requirement happens, the
SLA manager will be able to correctly send the vio-
lation data to the DS4M.
Time Frame Report period 2
ID T4.19
Requirement Type Functional
Source DITAS Analysis
Priority | Category Must|Interoperability
Component that fulfils it DS4M
Description The DS4M will create a well formed JSON file with
the data and computation movement to be en-
acted and deliver it respectively to the data move-
ment enactor and computation movement enac-
tor.
Rationale The definition of how and where to move data and
computation is stored in two JSON files that are sent
respectively to the data movement enactor and
the computation movement enactor. Such files must be well formed in order to allow the movement components to understand where to move the resources.
Test case / Acceptance
criteria
Test that the DS4M is able to create both files according to the spec. Test that the computation movement enactor and the data movement enactor are able to understand them and act on them.
Time Frame Report period 2
WP5 - Real world case studies and integration
IDEKO Use Case Requirements
ID EU1.F1
Requirement Type Functional
Priority | Category Must|N/A
Description As a data owner, I want the framework to provide data source description mechanisms, so that I can describe and publish my data sources to application developers.
Test case / Acceptance
criteria
Check the completion of the Blueprint Repository
Time Frame Report period 2
ID EU1.F2
Requirement Type Functional
Priority | Category Must|Flexibility
Description As data owner, I want the framework to provide a
mechanism to offer rich data-methods to develop-
ers. So that they don't have to build standard que-
ries as if they were connecting to the datasource
directly.
Test case / Acceptance
criteria
Check if the VDC architecture provides a flexible
mechanism to manage queries and data.
Time Frame Report period 2
ID EU1.F3
Requirement Type Functional
Priority | Category Must|Flexibility
Description As data owner, I want the rich method to be able
to query multiple sources before getting back the
data to the user. So that the user doesn't have to
call several methods to get data back from differ-
ent sources.
Test case / Acceptance
criteria
Check if the VDC allows querying several sources
from a single method.
Time Frame Report period 2
ID EU1.F4
Requirement Type Functional
Priority | Category Must|Flexibility
Description As data owner, I want the rich methods to be able
to perform data transformation. So that I can pro-
vide users with computed data instead on simple
raw-data.
Test case / Acceptance
criteria
Check if the VDC allows data transformation when
defining the methods.
Time Frame Report period 2
ID EU1.F5
Requirement Type Non-functional
Priority | Category Must|Security
Description As a data owner, I want to be able to publish my data sources without exposing anything more than necessary, so that my internal infrastructure stays safe.
Test case / Acceptance
criteria
Check if the data access interface, provides a
mechanism to avoid external access to local net-
work.
Time Frame Report period 2
ID EU1.F6
Requirement Type Functional
Priority | Category Must|Flexibility
Description As a data owner, I want to be able to query different types of data sources, so that I can publish the different types of data sources I own.
Test case / Acceptance
criteria
Check if the VDC allows querying different data
sources.
Time Frame Report period 2
ID EU1.F7
Requirement Type Non-functional
Priority | Category Must|Affordability
Description As an application designer, I want the framework to be deployable on the company's existing resources, so that I don't have to purchase additional resources or pay a fee to DITAS.
Test case / Acceptance
criteria
Check project business models and architecture.
Time Frame Report period 2
ID EU1.F8
Requirement Type Non-functional
Priority | Category Must|Performance
Description As application developer, I want the framework to
be smart enough to perform computation (data
transformation, etc.) in the agreed time (SLA) over-
coming incidents. So that the developer has only to
focus on business logic.
Test case / Acceptance
criteria
Check whether the framework applies mechanisms
to avoid possible SLA breaches regarding compu-
tation.
Time Frame Report period 2
ID EU1.F9
Requirement Type Non-functional
Priority | Category Must|Performance
Description As application developer, I want the framework to
provide the needed data under the parameters
defined in the SLA. So that I don't have to implement any extra mech-
anism to fulfil the application data needs.
Test case / Acceptance
criteria
Check whether the framework applies mechanisms
to avoid possible SLA breaches regarding data
needs.
Time Frame Report period 2
ID EU1.F10
Requirement Type Functional
Priority | Category Must|Availability
Description As application designer, I want the application to
be tolerant to server failures. So that the developer
has only to focus on business logic.
Test case / Acceptance
criteria
Simulate a server failure by shutting down a server. Check if Kubernetes takes over the situation by deploying new services.
Time Frame Report period 2
ID EU1.F11
Requirement Type Functional
Priority | Category Must|Simplicity
Description As a developer, I want the framework to ease data
gathering from data sources. So that I don't have to
manage all the connections and queries manually.
Test case / Acceptance
criteria
Check if the VDC allows querying several sources
from a single method.
Time Frame Report period 2
ID EU1.F12
Requirement Type Non-functional
Priority | Category Must|Interoperability
Description As a developer, I want the framework to provide a
simple interface to get data from the data sources.
So that I can use standard mechanisms to get the
data.
Test case / Acceptance
criteria
Check if the CAF is based on some of the widely
used communication mechanism, like REST.
Time Frame Report period 2
ID EU1.F13
Requirement Type Functional
Priority | Category Must|Simplicity
Description As a developer, I want the framework to enable
multi-source querying in a single call. So that I don't
have to make sequential calls to get back the
needed data.
Test case / Acceptance
criteria
Check if the VDC retrieves data from different data
sources from the same method.
Time Frame Report period 2
ID EU1.F14
Requirement Type Functional
Priority | Category Must|Flexibility
Description As a developer, I want the framework to enable
data transformation mechanisms to get trans-
formed data in a single call. So that I don't have to
perform data transformation on application side.
Test case / Acceptance
criteria
Check if the VDC allows transforming data from
within the method itself.
Time Frame Report period 2
IDEKO Use Case Application level Requirements
ID EU1.UC1
Requirement Type Functional
Priority | Category Must|Performance
Description As application designer, I want the framework to be
able to fallback to a different datasource when the
primary one fails. So that the error probability of a
data call decreases.
Test case / Acceptance
criteria
Check if the VDC allows the data administrator to
create fallback mechanisms.
Time Frame Report period 2
ID EU1.UC2
Requirement Type Functional
Priority | Category Must|Performance
Description As application developer, I want the framework to
serve streaming data for the machines within 2 sec-
onds since the data is created. So that I can per-
form analytics in a very short window.
Test case / Acceptance
criteria
Consume a method that provides a stream.
Check the timestamp of the data collected.
Compare it with the timestamp in the database (see the sketch below).
Time Frame Report period 2
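The two-second check could be performed along these lines; the record format, in particular an ISO-8601 creation timestamp with time zone set by the machine, is an assumption of the sketch.

from datetime import datetime, timezone

def within_two_seconds(record):
    # Assumption: each streamed record carries its creation time, with offset.
    created = datetime.fromisoformat(record["created_at"])
    received = datetime.now(timezone.utc)  # arrival time at the client
    return (received - created).total_seconds() <= 2.0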
ID EU1.UC3
Requirement Type Functional
Priority | Category Must|Performance
Description As an application developer, I want a VDC method to provide machine diagnostic results in near real time, so that I can act accordingly.
Test case / Acceptance
criteria
Use the machine simulator to introduce an anomaly.
Record the timestamp when the anomaly is introduced.
Check when the application detects the anomaly and record the timestamp.
Check if the difference is less than two seconds.
Time Frame Report period 2
OSR Use Case Requirements
ID EU2.F1
Requirement Type Functional
Priority | Category Must|Accuracy
Description As a researcher, I want to obtain accurate data
so that I can correctly perform my study.
Test case / Acceptance
criteria
Perform multiple queries and test if the average accuracy is ≥ 98%.
Time Frame Report period 2
ID EU2.F2
Requirement Type Non-functional
Priority | Category Must|Completeness
Description As a researcher, I want to obtain complete data so
that I can correctly perform my study.
Test case / Acceptance
criteria
Perform multiple queries and test if the average
completeness is ≥ 80%.
Time Frame Report period 2
ID EU2.F3
Requirement Type Non-functional
Priority | Category Must|Scalability
Description As head of research, I want to interact with a scalable system that accommodates all requests coming from my researchers, so that they can conduct different research studies at once.
Test case / Acceptance
criteria
Test if limit = 99% and gain = 1.5x
Time Frame Report period 2
ID EU2.F4
Requirement Type Non-functional
Priority | Category Must|Security
Description As a researcher, I want to obtain accurate data
so that I can correctly perform my study.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F5
Requirement Type Non-functional
Priority | Category Must|Security
Description As a researcher, I want to access minimized data so
that I know I am compliant with GDPR
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F6
Requirement Type Non-functional
Priority | Category Must|Security
Description As an internal researcher, I want to access pseu-
donymized data so that I know I am compliant with
GDPR.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F7
Requirement Type Non-functional
Priority | Category Must|Security
Description As DPO, I want data to be encrypted when trans-
ferred to a co-holder so that I know the hospital is
compliant with GDPR.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F8
Requirement Type Non-Functional
Priority | Category Must|Availability
Description As a doctor, I want data to be readily accessible so
that I can access them when I need them the most
during an emergency.
Test case / Acceptance
criteria
Check if availability is ≥ 99.9999%
Time Frame Report period 2
ID EU2.F9
Requirement Type Non-Functional
Priority | Category Must|Performance
Description As a doctor, I want fast processing of data so that I
do not slow down operations during an emergency.
Test case / Acceptance
criteria
Check if processing time is < 1s
Time Frame Report period 2
ID EU2.F10
Requirement Type Non-Functional
Priority | Category Must|Security
Description As a doctor, I want data to be re-identified upon
retrieval so that I can access all information availa-
ble related to the emergency at hand.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F11
Requirement Type Non-Functional
Priority | Category Must|Security
Description As DPO, I want data to be pseudonymized upon transfer to the hospital group cloud, so that the hospital is compliant with GDPR.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
ID EU2.F12
Requirement Type Non-Functional
Priority | Category Must|Security
Description As DPO, I want data to be encrypted upon transfer to the hospital group cloud, so that the hospital is compliant with GDPR.
Test case / Acceptance
criteria
Manual inspection.
Time Frame Report period 2
OSR Use Case Application level Requirements
ID EU2.UC1
Requirement Type Functional
Priority | Category Must|Functional
Description As a user, I want to log into the system so that I can
start using the platform.
Test case / Acceptance
criteria
Test if login is successful for existing accounts.
Test if login fails for non-existing accounts.
Time Frame Report period 2
ID EU2.UC2
Requirement Type Functional
Priority | Category Would|Security
Description As a user, I want, upon login, to be able to ask for a password recovery email, so that I can access the system again if I do not remember my credentials.
Test case / Acceptance
criteria
Test if a recovery email is successfully sent to the
mailbox of an existing account.
Test if the user is prompted with an error if the mail-
box is not associated with an existing account.
Time Frame Report period 2
ID EU2.UC3
Requirement Type Functional
Priority | Category Must|Functional
Description As a user, I want to log into the system so that I can
start using the platform.
Test case / Acceptance
criteria
Test if login is successful for existing accounts.
Test if login fails for non-existing accounts.
Time Frame Report period 2
ID EU2.UC4
Requirement Type Functional
Priority | Category Should|Functional
Description As a user, I want to see my profile so that I can re-
view my personal information.
Test case / Acceptance
criteria
Test that each personal information is present in the
profile page.
Time Frame Report period 2
ID: EU2.UC5
Requirement Type: Functional
Priority | Category: Could | Security
Description: As a user, I want to change my password so that I can strengthen the security of my account.
Test case / Acceptance criteria: Test that, after changing the password, it is not possible to log in with the old one. Test that, after changing the password, it is possible to log in with the new password.
Time Frame: Report period 2
ID: EU2.UC6
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a researcher, I want to query the system for the average values extracted from available blood tests (by component name and age range) so that I can start my research.
Test case / Acceptance criteria: Search for an existing component (e.g., Fibrinogen). Search for an existing component in a loose age range (e.g., 0-110). Search for an existing component (e.g., Fibrinogen) in a wrong age range (e.g., 120-200). Search for a non-existing component.
Time Frame: Report period 2
ID: EU2.UC7
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a researcher, I want to query the system for gender, age, BMI, and cholesterol of patients that had (resp. did not have) a stroke so that I can start my research.
Test case / Acceptance criteria: Search for patients that did not have a stroke. Search for patients that had a stroke.
Time Frame: Report period 2
ID: EU2.UC8
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a researcher, I want to download the result set of a query so that I can process it off-line.
Test case / Acceptance criteria: Test if a given data view result can be downloaded via a link.
Time Frame: Report period 2
ID: EU2.UC9
Requirement Type: Functional
Priority | Category: Should | Functional
Description: As a researcher, I want to bookmark the query of a data view so that I can easily execute it again.
Test case / Acceptance criteria: Formulate a query. Bookmark it. Check that it appears in the list of favorite queries.
Time Frame: Report period 2
ID: EU2.UC10
Requirement Type: Functional
Priority | Category: Should | Functional
Description: As a researcher, I want to remove a query from my bookmarks so that I can stop tracking outdated queries.
Test case / Acceptance criteria: Go to the list of favorite queries. Remove one and check that it no longer appears.
Time Frame: Report period 2
ID: EU2.UC11
Requirement Type: Functional
Priority | Category: Would | Maintainability
Description: As a researcher, I want to receive a notification email when new data related to my favorite queries are available, so that I am informed of the availability of novel data.
Test case / Acceptance criteria: Trigger the event signaling the presence of new data. Check for the presence of the email.
Time Frame: Report period 2
ID: EU2.UC12
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a user, I want to log into the system so that I can start using the platform.
Test case / Acceptance criteria: Test if login is successful for existing accounts. Test if login fails for non-existing accounts.
Time Frame: Report period 2
ID: EU2.UC13
Requirement Type: Functional
Priority | Category: Must | Security
Description: As a user, I want to log out of the system so that I can safely close the working session.
Test case / Acceptance criteria: Test if logout is successful (i.e., if it is not possible to access restricted areas after logout).
Time Frame: Report period 2
ID: EU2.UC14
Requirement Type: Functional
Priority | Category: Must | Functional
Description: As a doctor, I want to retrieve all the information available in the EHR for a specified patient (by SSN and time period) so that I can go through his past exams to better address the emergency.
Test case / Acceptance criteria: Search for an existing patient. Search for an existing patient in a given time period. Search for a non-existing patient.
Time Frame: Report period 2
ID: EU2.UC15
Requirement Type: Functional
Priority | Category: Could | Functional
Description: As a doctor, I want to select two blood tests of a given patient and compare them visually so that I can better understand what changed from one to the other.
Test case / Acceptance criteria: Select two exams which differ in some known fields. Check if the differences are highlighted.
Time Frame: Report period 2
ID: EU2.UC16
Requirement Type: Functional
Priority | Category: Would | Functional
Description: As a doctor, I want to select two medical images for the same type of exam, of a given patient, and compare them visually so that I can better understand what changed from one to the other.
Test case / Acceptance criteria: Select two images with known differences. Check if they are visually highlighted.
Time Frame: Report period 2
Objective to WP Traceability Matrix
Objective | Criteria | WPs | Specific Components
Objective 1. Improvement of productivity when developing and deploying data-intensive applications
1.1 | Adoption of the framework for the development of 4 applications relevant in the adopted case studies. | WP5 | All the applications to be developed by the end users.
1.2 | Ability to be connected with the main SBC devices (i.e., Raspberry Pi, Odroid), as well as the main operating systems (i.e., Android OS, Linux, Windows). | WP3 | check if WP4 is also needed
1.3 | Integration with 3 different data stores, coming from different worlds (SQL, NoSQL data stores, CEP systems), using the common definition of VDC that will provide the necessary abstraction. | WP3 | DAL
Objective 2. Enhancing data management in mixed cloud/fog environments by adding data and computation movement
2.1 | Definition and implementation of 5 diverse modes to transform/transmit data for the movement. | WP4 | Data Movement Enactor
2.2 | Definition and implementation of 5 diverse modes to deploy/reconfigure tasks in mixed federated cloud/fog environments. | WP3, WP4 | Deployment Engine
2.3 | Improvement of more than 10% of the observed latency when using a diverse mode for data movement. | WP4 | DS4M, Data Movement Enactor
2.4 | Improvement of more than 10% of the observed latency when using a different deployment configuration for more than 5 of the use case functionalities. | WP3, WP4 | DS4M, Deployment Engine
2.5 | Ability to extract the same information in less than 10% of the observed latency using less than 90% of the data for more than 2 of the use case functionalities. | WP3, WP4 | DS4M, Deployment Engine
Objective 3. Definition of strategies for improving the execution of data-intensive applications
3.1 | Definition of a set of 20 indicators able to measure non-functional aspects such as performance, security, data quality, and so on. | WP2, WP3 | Blueprint, goal-based model
3.2 | Ability to estimate the effects of the enactment of data and computation movement techniques with an error lower than 15% with respect to the real effects measured a posteriori. | WP4 | DS4M
3.3 | Reduction of 10% of the time needed for the transition between different deployments using the DITAS framework, compared to traditional approaches. | WP3, WP4 | Deployment Engine
Objective 4. Enabling the execution of data-intensive applications in a mixed cloud/fog environment
4.1 | Ability to run all the 4 different applications designed according to the DITAS framework. | WP5 | All components
4.2 | Reduction of 10% of the response time of applications running on the DITAS framework with respect to executions that do not exploit the facilities provided by the project. | WP5 | All components
4.3 | Ability to limit the overhead of the monitoring system for an application to less than 10% of the resource usage. | WP4 | Data Analytics + monitoring
Objective 5. Maximise the impact on business
5.1 | Creation of visibility, engaging and nurturing actions with targeted audiences including the scientific community, other fellow research projects and initiatives, and the open source and industry communities. | WP6, rest of WPs | Website, all components
5.2 | Demonstrate suitability and value with the case studies and leverage their promotion as demonstrators to support impact creation. | WP6, WP5, rest of WPs | Case study applications, all components
5.3 | Impact created with knowledge transfer; individual exploitation, innovation and/or commercialization from every partner. | WP6, rest of WPs | No components; every partner is involved.
5.4 | Establishment of a DITAS sustainability body with the involvement of as many organizations as possible. | WP6, rest of WPs | No components; every partner is involved.
ANNEX 2: DITAS Components
Virtual Data Container
Component name: Data utility evaluator (DUE@VDC)
Description: The data utility metrics defined in the abstract blueprint describe the utility of the data sources as measured when the data was first included in DITAS. However, the content of data sources changes over time and, therefore, the data utility metrics must be updated. The DUE@VDC collects the data utility of the results of the methods provided by the VDC and sends the results to the DUE@VDM.
Inputs: output of methods | Input mechanism: REST
Outputs: data utility metrics | Output mechanism: REST
Implementation language (if code): Python
Requirements: T1.7 | Storage: N/A
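As an illustration of this reporting flow, the following Go sketch posts a set of utility metrics to the DUE@VDM over REST (the actual component is written in Python); the payload shape and the endpoint URL are assumptions made for the example, not the real DITAS interface.

package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
)

// UtilityReport is an illustrative payload: utility metrics measured on
// the results of one VDC method, forwarded to the DUE@VDM over REST.
type UtilityReport struct {
	Method  string             `json:"method"`
	Metrics map[string]float64 `json:"metrics"` // e.g. completeness, timeliness
}

func main() {
	report := UtilityReport{
		Method:  "getBloodTests",
		Metrics: map[string]float64{"completeness": 0.97, "timeliness": 0.85},
	}
	body, err := json.Marshal(report)
	if err != nil {
		log.Fatal(err)
	}
	// The endpoint below is hypothetical; the real DUE@VDM REST path is
	// defined by the component's API, not by this sketch.
	resp, err := http.Post("http://due-vdm:8080/utility", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("DUE@VDM responded with", resp.Status)
}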
Component name: SLA Manager
Description: Every method defined in the abstract blueprint will provide a series of Service Level Indicators (SLIs), for example, the average response time for queries, the availability of the underlying data source, or the ratio of errors found during those same queries. These SLIs are used to form Service Level Objectives (SLOs), that is, combinations of guarantees about SLI values that should be fulfilled. For example, an SLO might state that a particular method will provide results in less than 1 second on average and that it will produce at most 1 error for every 1000 queries. From these SLOs, which are defined in the blueprint itself, the SLA Manager composes a Service Level Agreement (SLA) per method. The SLA is a guarantee that all previously defined SLOs will be maintained during the execution of the VDC. To validate it, the SLA Manager relies on the Data Analytics component, asking for SLI values and checking them against their SLOs. In case some of them are not fulfilled, it informs the Decision System for Data Movement of a violation, passing the broken SLO and the SLI values that produced the violation.
Inputs: Blueprint, SLAs | Input mechanism: File, REST
Outputs: JSON | Output mechanism: REST
Implementation language (if code): Go
Requirements: T1.7 | Storage: MongoDB
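To make the check performed by the SLA Manager concrete, the following Go sketch compares observed SLI values against the thresholds of their SLOs and collects the resulting violations, as they would be reported to the Decision System for Data Movement. The types, field names, and thresholds are illustrative and do not reflect the actual DITAS blueprint schema.

package main

import "fmt"

// SLO is a guarantee over a single Service Level Indicator (SLI).
type SLO struct {
	Metric    string  // e.g. "avg_response_time_ms"
	Threshold float64 // maximum acceptable value
}

// Violation reports an SLO that was not fulfilled, together with the
// observed SLI value that broke it.
type Violation struct {
	SLO      SLO
	Observed float64
}

// checkSLOs compares observed SLI values (e.g. fetched from the Data
// Analytics component) against the SLOs of one blueprint method.
func checkSLOs(slos []SLO, observed map[string]float64) []Violation {
	var violations []Violation
	for _, slo := range slos {
		if value, ok := observed[slo.Metric]; ok && value > slo.Threshold {
			violations = append(violations, Violation{SLO: slo, Observed: value})
		}
	}
	return violations
}

func main() {
	slos := []SLO{
		{Metric: "avg_response_time_ms", Threshold: 1000},
		{Metric: "error_rate", Threshold: 0.001},
	}
	observed := map[string]float64{"avg_response_time_ms": 1350, "error_rate": 0.0004}
	for _, v := range checkSLOs(slos, observed) {
		fmt.Printf("violation: %s = %.4f (limit %.4f)\n", v.SLO.Metric, v.Observed, v.SLO.Threshold)
	}
}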
Component name: Computation Movement Enactor
Description: When some QoS constraints are not fulfilled and data movement is not the best option, it might be necessary to move computation units across resources. For example, if the response time of a query is high due to the latency between the VDC and the final user, data movement might not be enough to achieve the desired goal, because the request must travel to a different cluster than the one holding the data, potentially far away from the final user. The DS4M component takes this into account and informs the Computation Movement Enactor about the need to move computation units between clusters. Once the order is received, the Computation Movement Enactor executes the actual movement of the computation units. Implementation will start in the second period.
Inputs: Blueprint, JSON | Input mechanism: File, REST
Outputs: JSON | Output mechanism: REST
Implementation language (if code): Go
Requirements: | Storage:
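A minimal Go sketch of the interface between the DS4M and the enactor is shown below: a REST endpoint receives a movement order and would then trigger the redeployment. The JSON payload and the endpoint path are hypothetical, chosen only for illustration.

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// MovementOrder is a hypothetical payload for the order sent by the DS4M;
// the real JSON schema is defined by the DITAS blueprint and APIs.
type MovementOrder struct {
	Unit          string `json:"unit"`          // computation unit to move
	SourceCluster string `json:"sourceCluster"` // cluster currently running it
	TargetCluster string `json:"targetCluster"` // cluster closer to the final user
}

func main() {
	http.HandleFunc("/move", func(w http.ResponseWriter, r *http.Request) {
		var order MovementOrder
		if err := json.NewDecoder(r.Body).Decode(&order); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// A real enactor would redeploy the unit on the target cluster and
		// tear it down on the source cluster once traffic has drained.
		log.Printf("moving %s from %s to %s", order.Unit, order.SourceCluster, order.TargetCluster)
		w.WriteHeader(http.StatusAccepted)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}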
Component name: Data Analytics
Description: Aggregates additional information generated by the operation of the different DITAS components. It provides an interface to query the various data sources that comprise this information and performs additional processing and refining where necessary. Its queries integrate key QoS metrics used in the operation of other components such as the SLA Manager and the Decision System for Data Movement.
Inputs: REST query | Input mechanism: REST
Outputs: REST query answer | Output mechanism: REST
Implementation language (if code): Python, Swagger
Requirements: Docker container for the API | Storage: 3 GB for ELK + 10 GB for the Elasticsearch DB
Component name: Data Movement Enactor
Description: It enacts data movement actions across locations by consuming the API of the storage layer. It copies and keeps data synchronized between edge and cloud servers/instances inside a VDC, making the data available closer to where it is needed.
Inputs: REST query | Input mechanism: REST
Outputs: REST query answer | Output mechanism: REST
Implementation language (if code): N/A
Requirements: | Storage:
Component name: Log Analysis Service
Description: Components and data sources running in different clusters log their activity, providing information about how well they are executing their tasks and how their internals are working. Analyzing these logs is a basic activity that system administrators perform for both debugging and preventing problems, but the amount of data in them is usually huge, which makes it impossible for a single operator to spot problems just by looking at them. That is why tools like Logstash were developed: they enable operators to define rules for aggregating log entries and to produce summaries of information which are easier to inspect in order to find current or future problems. The Log Analysis Service will use one of those tools to aggregate log information that can be used in automatic or supervised processes to ensure the resources available to VDCs and users are working as expected.
Inputs: JSON | Input mechanism: REST
Outputs: JSON | Output mechanism: REST
Implementation language (if code):
Requirements: | Storage:
Component name: Request Monitor
Description: The VDC Request Monitor is one of the monitoring sidecars used to observe the behavior of a VDC within the DITAS project. The agent acts as an ingress controller to the VDC and observes all incoming and outgoing HTTP/HTTPS traffic. It can be instructed to add SSL encryption between the VDC and a client using either the Let's Encrypt protocol or a self-signed certificate. It also records metadata about the request traffic and reports it to Elasticsearch for later analysis. The monitor also adds OpenTracing headers to each incoming request, thereby enabling the use of tracing systems like Zipkin.
Inputs: HTTP request | Input mechanism: HTTP
Outputs: HTTP response, traffic metadata, OpenTracing headers | Output mechanism: HTTP
Implementation language (if code): Go
Requirements: Elasticsearch | Storage: local files, Elasticsearch
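The following Go sketch shows the general shape of such a sidecar: an HTTP middleware that injects a Zipkin-style B3 trace identifier into each incoming request and records basic request metadata. It is a simplification of the idea, not the actual Request Monitor; in particular, it only logs the metadata instead of shipping it to Elasticsearch.

package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
	"time"
)

// monitor wraps the VDC handler: it injects a Zipkin-style B3 trace id
// into each incoming request and records basic request metadata.
func monitor(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := make([]byte, 8)
		_, _ = rand.Read(id)
		r.Header.Set("X-B3-TraceId", hex.EncodeToString(id))
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("method=%s path=%s duration=%s", r.Method, r.URL.Path, time.Since(start))
	})
}

func main() {
	// Stand-in for the actual VDC behind the sidecar.
	vdc := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("VDC method result"))
	})
	log.Fatal(http.ListenAndServe(":8080", monitor(vdc)))
}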
Component name: Throughput Agent
Description: The VDC Throughput Agent is one of the monitoring sidecars used to observe the behavior of VDCs within the DITAS project. The agent observes all incoming and outgoing requests of a VDC by observing the underlying socket layer. The data is aggregated over time and sent to the monitoring database. This agent acts as a passive monitoring sidecar to the VDC and is therefore independent of the concrete VDC implementation.
Inputs: Blueprint | Input mechanism:
Outputs: Network usage statistics | Output mechanism: Elasticsearch
Implementation language (if code): Go
Requirements: Linux sockets, Elasticsearch | Storage: Elasticsearch
Component name: Logging Agent
Description: The VDC Logging Agent is a small software service that enables a VDC to transmit metrics and instrumentation information to the DITAS platform. The agent offers an interface to each VDC which enables access to the monitoring and tracing databases without requiring the VDC to include specific dependencies for these services. The agent is intended to run in the same container as the VDC and is compiled with static libraries, allowing it to be deployed in any Unix-like environment.
Inputs: REST and local filesystem | Input mechanism: REST
Outputs: Aggregated log data | Output mechanism: Elasticsearch
Implementation language (if code): Go
Requirements: | Storage: Elasticsearch
Component name: DAL
Description: Part of the VDC, in charge of simplifying the connection between the VDC data processing layer and the data sources. The DAL is always deployed in the same security and privacy realm as the data source made available by the data administrator, and it provides the required connectivity between the data source and the VDC processing layer while enforcing privacy policies.
Inputs: protobuf of data query | Input mechanism: gRPC
Outputs: protobuf of data content | Output mechanism: gRPC
Implementation language (if code): Scala
Requirements: computation resources for Spark near the data sources; privacy enforcement engine | Storage:
Component name: Privacy Enforcement Engine
Description: It acts as a proxy before executing queries over the data in the data sources. It rewrites each query so that it accesses only data compliant with the privacy policies. The Enforcement Engine is a sidecar of the VDC; data access policies, access purposes, and data subject consents are defined in the Data Policy and Consent Manager (DPCM). A VDC implemented with Apache Spark uses the Enforcement Engine to transform Spark SQL queries into queries that return only compliant data.
Inputs: SQL query, access purpose | Input mechanism: REST API
Outputs: Rewritten SQL query, additional data sources with privacy information | Output mechanism: REST API response
Implementation language (if code): Scala
Requirements: policies and consents should be provided by the DPCM (external component) | Storage: Intermediate representation saved near the data as additional tables/files in the data store
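To illustrate the idea of query rewriting, the following Go sketch appends a consent predicate to a SELECT statement so that only rows of consenting data subjects are returned. It is a deliberately naive, string-based toy example; the actual engine is written in Scala, operates on Spark SQL, and uses the policies stored in the DPCM, and the table and column names below are assumptions.

package main

import (
	"fmt"
	"strings"
)

// rewrite appends a compliance predicate to a SELECT query so that only
// rows for which the data subject consented to the given access purpose
// are returned. Table and column names are hypothetical.
func rewrite(query, purpose string) string {
	predicate := fmt.Sprintf(
		"subject_id IN (SELECT subject_id FROM consents WHERE purpose = '%s')", purpose)
	if strings.Contains(strings.ToUpper(query), " WHERE ") {
		return query + " AND " + predicate
	}
	return query + " WHERE " + predicate
}

func main() {
	fmt.Println(rewrite("SELECT ssn, cholesterol FROM blood_tests", "Research"))
}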
Virtual Data Manager
Component name: Data utility evaluator (DUE@VDM)
Description: The DUE@VDM collects the information retrieved by all the DUE@VDC instances linked to the VDM and aggregates at runtime the quality of data of the overall VDC, in order to update the related abstract blueprint for future usage. Such an update is optional and not automated; its usage depends on the business plan and the ownership of the data sources.
Inputs: Data from all the DUE@VDC instances | Input mechanism: REST
Outputs: updated abstract blueprint | Output mechanism: blueprint repository call
Implementation language (if code): Python
Requirements: T1.7, T3.18 | Storage: N/A
Component name: Decision System for Data and Computation Movement (DS4M)
Description: DITAS guarantees the satisfaction of the application designer's requirements by moving data and tasks. The DS4M is the reasoning system that decides the best data or computation movement to enact.
Inputs: Violations of user requirements specified in the concrete blueprint | Input mechanism: REST
Outputs: data or computation movement | Output mechanism: REST
Implementation language (if code): Java
Requirements: T4.18, T4.19 | Storage:
DITAS SDK
Component name: Data utility evaluator (DUE)
Description: Provides information about the data utility according to the user request at a given time. Data utility is dynamic and can change over time according to the platform and to the application requirements. It integrates the Potential Data Utility Service (PDUS) and the Sample Data Generator defined in D1.1.
Inputs: blueprint | Input mechanism: REST
Outputs: blueprint | Output mechanism: REST
Implementation language (if code): Python
Requirements: T1.7 | Storage: N/A
Component name: Data Utility Resolution Engine (DURE)
Description: In order to help the application designer select the best blueprint, DITAS orders blueprints based on how well each one fits the requirements he/she specified. The DURE plays a central role in this functionality, since it filters and ranks a list of blueprints based on the application requirements.
Inputs: list of blueprints and application requirements | Input mechanism: REST
Outputs: ordered list of blueprints | Output mechanism: REST
Implementation language (if code): Node.js
Requirements: T4.17 | Storage: N/A
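The following Go sketch illustrates the filter-and-rank idea: blueprints that miss the hard requirements are discarded and the remaining ones are ordered by a toy utility score. The actual DURE is implemented in Node.js and its scoring over QoS, QoD, and security requirements is considerably more elaborate; all fields and thresholds below are assumptions made for the example.

package main

import (
	"fmt"
	"sort"
)

// Blueprint carries only the subset of fields this toy ranking needs;
// the full abstract blueprint schema is defined elsewhere in DITAS.
type Blueprint struct {
	ID           string
	Availability float64 // promised availability, e.g. 0.9999
	Latency      float64 // promised latency in ms
}

// filterAndRank drops blueprints that miss the hard requirements and
// orders the rest by a simple utility score (higher is better).
func filterAndRank(bps []Blueprint, minAvail, maxLatency float64) []Blueprint {
	var fit []Blueprint
	for _, b := range bps {
		if b.Availability >= minAvail && b.Latency <= maxLatency {
			fit = append(fit, b)
		}
	}
	sort.Slice(fit, func(i, j int) bool {
		return fit[i].Availability/fit[i].Latency > fit[j].Availability/fit[j].Latency
	})
	return fit
}

func main() {
	bps := []Blueprint{{"bp-1", 0.999, 80}, {"bp-2", 0.9999, 120}, {"bp-3", 0.99, 40}}
	fmt.Println(filterAndRank(bps, 0.995, 150))
}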
Component name: VDC Blueprint Repository
Description: This is a repository where all the abstract VDC blueprints are stored.
Inputs: Interacts with other components or DITAS roles through the Repository Engine REST interface | Input mechanism: N/A
Outputs: N/A | Output mechanism: N/A
Implementation language (if code):
Requirements: N/A | Storage: Document-oriented database (MongoDB)
Component name: VDC Blueprint Repository Engine
Description: The Repository Engine provides CRUD operations on the VDC Blueprint Repository. For instance, it enables the data administrator to store, update, or delete his/her abstract VDC blueprints, and the Resolution Engine to retrieve blueprints from the Repository.
Inputs: Receives HTTP requests from other components or DITAS roles (e.g., Resolution Engine or data administrator) | Input mechanism: REST
Outputs: Depending on the HTTP request method | Output mechanism: REST
Implementation language (if code): Java
Requirements: N/A | Storage: N/A
Component name: VDC Blueprint Validator
Description: This is a subcomponent of the Repository Engine whose goal is to ensure that inserted or updated abstract blueprints are valid, according to the current abstract VDC blueprint schema and other logic requirements, before they end up in the Repository. For invalid blueprints, the Validator provides descriptive and helpful error messages to the data administrator.
Inputs: N/A | Input mechanism: N/A
Outputs: N/A | Output mechanism: N/A
Implementation language (if code): Java
Requirements: Abstract VDC blueprint schema | Storage: N/A
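A minimal sketch of such a validation step is shown below in Go (the actual Validator is written in Java): it checks that a blueprint document is well-formed JSON and that a set of required top-level sections is present, returning a descriptive message for each problem found. The section names are only plausible examples, not the authoritative abstract VDC blueprint schema.

package main

import (
	"encoding/json"
	"fmt"
)

// validate checks that an abstract blueprint document is well-formed JSON
// and contains a set of required top-level sections, returning one
// descriptive message per problem found. Section names are illustrative.
func validate(raw []byte) []string {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(raw, &doc); err != nil {
		return []string{"blueprint is not valid JSON: " + err.Error()}
	}
	var errs []string
	for _, section := range []string{"INTERNAL_STRUCTURE", "DATA_MANAGEMENT", "EXPOSED_API"} {
		if _, ok := doc[section]; !ok {
			errs = append(errs, "missing required section: "+section)
		}
	}
	return errs
}

func main() {
	fmt.Println(validate([]byte(`{"INTERNAL_STRUCTURE": {}}`)))
}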
Component name: Resolution Engine
Description: The Resolution Engine is the component responsible for filtering and ranking the abstract VDC blueprints in the Blueprint Repository based on the user requirements. It enables the application designer to find the most appropriate blueprint based on the needs of the application in terms of content, QoS, QoD, security features, and privacy regulations.
Inputs: User requirements JSON file, DURE output | Input mechanism: REST
Outputs: Best candidate VDC blueprint | Output mechanism: REST
Implementation language (if code): Java, Spring Framework
Requirements: DURE output | Storage: Elasticsearch database
Component name: Privacy Security Evaluator
Description: The PSE is the component responsible for filtering and ranking the abstract blueprints based on the security and privacy requirements of the user. It is part of the Resolution Engine pipeline and depends on the DURE. It is designed as a microservice and is therefore independently scalable and manageable from the DURE and the Resolution Engine.
Inputs: DURE request (blueprint, subset of user requirements) | Input mechanism: REST
Outputs: Ranked list of blueprints | Output mechanism: REST
Implementation language (if code): Java, Spring Framework
Requirements: DURE | Storage: N/A
ANNEX 3: DITAS Technical Questionnaire
Thank you for filling in the DITAS technical questionnaire.
Nowadays, there is an increasing need to develop Data-Intensive Applications (DIAs) able to manage ever-growing amounts of data coming from distributed and heterogeneous sources, such as IoT sensors, devices, or mobile applications, in an effective, quick, and secure manner.
The goal of DITAS is to propose a cloud platform supporting information logistics for DIAs, where data processing does not occur only on cloud resources but also on devices at the edge of the network. To achieve this goal, the DITAS project is implementing data and computation movement strategies. According to these strategies, the DITAS platform decides where, when, and how to save data (on the cloud or on the edge of the network) and where, when, and how to compute part of the tasks composing the application. It thereby creates a synergy between traditional and cloud approaches, in order to find a good balance between reliability, security, sustainability, and cost.
The purpose of this questionnaire is twofold: firstly, to rank the following nine basic requirements that we have identified, in order to direct the design and the implementation of the DITAS architecture and components accordingly; secondly, to rank the following nine parameters, so that we have a better understanding of which issues to take into consideration, and with which priority, while developing the aforementioned data and computation movement strategies.
Please rank the following requirements (1 corresponds to the most important, 9 to the least important):
Requirement | Rank
R1: Reduce and process the data on the Edge/IoT side before they reach a central location such as the Cloud
R2: Harmonize the data coming from different data sources (data heterogeneity)
R3: Respect privacy requirements (such as GDPR compliance) in data movement transactions and data access
R4: Simplify the exposed data to third-party users/clients
R5: Simplify the data access from Fog and Cloud
R6: Have a flexible agreement between data provider and consumer (e.g., latency < 100 ms while availability > 99.999%)
R7: Have the possibility to express data quality constraints in the SLA
R8: Have the possibility to monitor how data is provisioned or consumed
R9: Keep track of the data transformations occurring during the data movement
Please rank the following parameters (1 corresponds to the most important, 9 to the least important) to drive the data and computation movement:
Parameter | Rank
P1: Data Quality
P2: Reputation of the Data Source
P3: Availability
P4: Latency, Response Time, Throughput
P5: Encryption
P6: Purpose Control
P7: Data Access Policy
P8: Authentication
P9: Access Monitoring
Based on the answers that we have collected, the following charts depict the
average rank per requirement and per parameter respectively:
Figure 37. Average rank per requirement for technical questionnaire
Figure 38. Average rank per parameter for technical questionnaire
ANNEX 4: DITAS market context questionnaire
The questionnaire template used to carry out the survey and interviews for the DITAS market context analysis is included below.
Introduction to the interview
Thank you for participating in this survey. The purpose of this interview is to ask you several questions that will help us get to know our potential stakeholders and their needs better.
The interview should not last more than 30-40 minutes and will consist of three
steps:
Firstly, we need to know about your profile and some aspects of your company. Note that we are not going to use your professional or company data; we only need to categorize the respondents and their companies, so the survey is anonymous.
Secondly, you will find some market-oriented questions that we would like you to answer based on your own vision, the context of your company, and your needs and experience.
Finally, some aspects of the different business scenarios identified for the DITAS solution need to be validated, so we have prepared some questions to learn your opinion and vision about them.
Your feedback is very valuable to us, so we are looking forward to hearing from you; feel free to comment on anything you consider relevant. We stress that the survey is completely anonymous, and you will not be required to submit any personal data about yourself.
Organization information
1. Number of employees of your organization (please select one choice):
< 10
>10 & < 50
> 50 & < 250
> 250
2. Annual revenues of your organization
< 10 M€
> 10 M€ & < 50 M€
> 50 M€
3. What are your target market(s) for your product(s) or project(s)? (you can
choose more than one)
Cloud
IoT
Fog/Edge
Telco
IT
Other (please identify): ___________________________________________
4. What are your target sector(s) for your product(s) or project(s)?
Home
Smart Cities
Smart Health
Smart Transport and Connected Transportation
Smart Buildings and Hospitality
Smart Industry and Manufacturing
Connected Home
Agriculture
Infrastructure
Logistics
Retail
Consumer
Media
Internet or Mobile
Security & Defense
Entertainment
Finance
Banking
Other (please identify): ____________________________________________
5. Does your company develop data-intensive applications?
YES (explain and quantify)
NO
6. Does your company develop applications in Cloud / Edge context?
YES (explain and quantify)
NO
7. Can you describe what your company’s business is specifically in
Cloud/Edge/IoT contexts?
__________________________________________________________________________
Positioning and Background information
1. What is your position within your organization?
CEO
CTO
Business developer
Product developer
Sales Force
Project manager
Other (please identify): _________________________________________
2. Are you familiar with the development of Cloud, Mobile, IoT, Fog/Edge
Applications?
YES (rate your experience from 1 to 5)
NO
3. Do you know about the market trends in Cloud, Mobile, IoT, Fog/Edge
Applications?
YES (rate your experience from 1 to 5)
NO
Market-oriented questions
1. Does your organization require Cloud/Edge services to develop its business?
NO
YES (please select):
AWS Greengrass
Microsoft Azure
Other (please identify): _____________________________________
2. Does your organization use data from external sources for its commercial offering?
YES
NO
3. Does your organization sell or buy data?
Sell data
Buy data
Both
Manage data from others
4. Does your company have difficulties managing those data?
NO
YES (please select, you can choose more than one):
Data logistics (detail the specific field)
Data Definition
Data Acquisition (data sources, data format, transmission protocols, others)
Data Storage (where to store, DFS with IoT, security/privacy, others)
Data Movement (movement, compression, anonymizing, encryption/securing, others)
Data Consumption (data analytics / operational processes, Lambda/Kappa architecture, others)
Data Dismissal (deletion, freezing, others)
Data management
Data analysis
Data visualization
Data security/privacy/trust
Data storage
Data warehousing
Data quality
Data analytics
Data governance
Data architecture
5. What problems can DITAS solve for you that your company is already
solving with other workflows/tools/platforms?
__________________________________________________________________________
6. Would your organization consider using a solution such as DITAS in your
workflow?
YES
NO (explain why): _____________________________________________
7. How much would your organization be willing to pay for such services?
__________________________________________________________________________
8. Is there any other workflow/tool/platform or solution/service that solves the problem better or cheaper than DITAS?
________________________________________________________________________
9. Do you think that the Open Source approach of DITAS could be a barrier
for your organization?
YES (please, explain why): ______________________________________
NO