D3.3 Data Virtualization SDK prototype

Project Acronym: DITAS
Project Title: Data-intensive applications Improvement by moving daTA and computation in mixed cloud/fog environmentS
Project Number: 731945
Instrument: Collaborative Project
Start Date: 01/01/2017
Duration: 36 months
Thematic Priority: ICT-06-2016 Cloud Computing
Website: http://www.ditas-project.eu
Dissemination level: Public
Work Package: WP3 Data virtualization
Due Date: M30
Submission Date: 02/07/2019
Version: 1.0
Status: Final for submission
Author(s): Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS),
Vrettos Moulos (ICCS), Jose Antonio Sanchez (ATOS),
Frank Pallas (TUB), Sebastian Werner (TUB), Maya
Anderson (IBM), Mattia Salnitri (POLIMI), Giovanni
Meroni (POLIMI), Monica Vitali (POLIMI)
Reviewer(s) Monica Vitali (POLIMI), Maya Anderson (IBM)
This project has received funding by the European Union’s Horizon
2020 research and innovation programme under grant agreement
No. 731945
© Main editor and other members of the DITAS consortium
Version History

| Version | Date | Comments, Changes, Status | Authors, contributors, reviewers |
|---|---|---|---|
| 0.1 | 01/05/2019 | ToC creation | Alexandros Psychas (ICCS) |
| 0.2 | 20/05/2019 | Blueprint sections 1, 5 | Achilleas Marinakis (ICCS) |
| 0.3 | 31/05/2019 | Component Architecture, DAL, Application Profiling | Alexandros Psychas (ICCS), Achilleas Marinakis (ICCS), Vrettos Moulos (ICCS), Jose Antonio Sanchez (ATOS), Frank Pallas (TUB), Sebastian Werner (TUB), Maya Anderson (IBM), Mattia Salnitri (POLIMI), Giovanni Meroni (POLIMI), Monica Vitali (POLIMI) |
| 0.7 | 7/06/2019 | Internal review version | Alexandros Psychas, Achilleas Marinakis (ICCS) |
| 0.91 | 21/06/2019 | Internal review comments | Monica Vitali (POLIMI), Maya Anderson (IBM) |
| 0.92 | 24/06/2019 | Addressed internal review comments | Alexandros Psychas, Achilleas Marinakis (ICCS) |
| 0.93 | 28/06/2019 | Final version for quality check | Alexandros Psychas, Achilleas Marinakis (ICCS) |
| 1.0 | 02/07/2019 | Quality check. Document ready for submission. | Enric Pages, María Teresa García (ATOS) |
Contents Version History 2
Contents 3
List of Figures 4
List of tables 4
Executive Summary 5
1 Introduction 6
1.1 Glossary of Acronyms 8
2 Final Abstract VDC Blueprint Schema 9
2.1 Internal Structure (Blueprint Section 1) 9
2.2 Data Management (Blueprint Section 2) 20
2.3 Abstract Properties (Blueprint Section 3) 22
2.5 Cookbook Appendix (Blueprint Section 4) 23
2.6 Exposed API (Blueprint Section 5) 32
3 Final Component Architecture and specification 35
3.1 VDC Blueprint Repository Engine 36
3.1.1 VDC Blueprint Validator 36
3.2 VDC Blueprint Resolution Engine 37
3.2.1 Content Based Resolution 38
Component API 39
3.2.2 Data Utility Resolution Engine 39
Component API 41
3.2.3 Privacy Security Evaluator 41
Component API 43
3.2.4 Recommendation Component 44
Component API 46
4 Data Access Layer (DAL) 47
Component API 51
5 Application Profiling and Deployment Strategies 54
6 DITAS SDK 56
7 Conclusions 58
8 References 59
Appendix 60
List of Figures

Figure 1: Abstract VDC Blueprint lifecycle ... 7
Figure 2: Resolution Engine Architecture ... 35
Figure 3: Creation & Storage phase of the Blueprint Lifecycle ... 36
Figure 4: Content Based Search Sequence Diagram ... 38
Figure 5: Data Utility Resolution Sequence Diagram ... 40
Figure 6: Simplified Architecture ... 42
Figure 7: Filtering process of PSE Sequence Diagram ... 43
Figure 8: Recommendation Component Sequence Diagram ... 45
Figure 9: Initialization of DAL Data Movement Sequence Diagram ... 48
Figure 10: Finalization of DAL Data Movement Sequence Diagram ... 48
Figure 11: Data transformation Sequence Diagram ... 49
Figure 12: DAL Interconnection with CAF and Privacy Enforcement Engine ... 50

List of Tables

Table 1. Acronyms ... 8
Executive Summary

This document describes the technical details of the components created in the context of WP3 and the DITAS SDK. More specifically, it covers the changes that took place in the development and implementation of the components responsible for the lifecycle of the VDC, from creation to final deployment. Regarding the VDC Blueprint, it documents all the changes made to the schema, both to better describe the structure of the VDC and to address the requirements of the components that consume it. The changes in functionality and the new features of the components are also documented. In the context of WP3, the Data Access Layer (DAL) is fully described and analyzed. This component was developed and implemented to help the Data Administrator dictate the policies, purposes and security measures needed to expose the data. Finally, the DITAS SDK section describes the services, UIs and guidelines that aid each stakeholder (Data Administrator, Application Developer, Application Designer and DITAS Operator) involved in the creation, storage, deployment, selection and operation of the VDC.
1 Introduction

One of the main missions of WP3 is to produce the DITAS-SDK, which contains all the information, services, guidelines, and tools needed to support the data-intensive Application Designer in creating a complete solution. More specifically, the DITAS-SDK aims at improving the productivity of the Application Designer in developing and deploying a data-intensive application. Another important task of the SDK is to help the Data Administrator enhance data management in cloud and fog environments. To reach these objectives, the DITAS-SDK components were created to support the full lifecycle of the VDC.
The VDC provides an abstraction layer that takes care of retrieving, processing and delivering data with the proper quality level, while putting special emphasis on data security, performance, privacy, and data protection. Acting as a middleware, the VDC takes responsibility for providing this data timely, securely and accurately, hiding the complexity of the underlying infrastructure from the Application Designer, who only has to define the requirements of the application in order to find the most appropriate VDC. The infrastructure may consist of different platforms, storage systems, and network capabilities. The VDC Blueprint describes the VDC thoroughly, since it includes, among others, information about its business characteristics, the data sources that the VDC connects to, how to deploy it, and the API that the data administrator exposes to the data consumers. Consequently, the VDC lifecycle is essentially the lifecycle of the Abstract VDC Blueprint, which is the definition and description of the VDC.
The Abstract Blueprint lifecycle covers all the phases and components involved from the creation, through the discovery, to the deployment of a VDC Blueprint. This lifecycle was established and described in D3.2, Section 1 [2]. The main developments in this process concerned the individual components, without affecting the general architecture.
Figure 1: Abstract VDC Blueprint lifecycle
Although the VDC is responsible for the complete orchestration and management of the data, a layer is needed that manages which data are exposed and under which privacy and security parameters. In the context of WP3, the Data Access Layer (DAL) was created, which has the fundamental role of exposing the data provided by the Data Administrator to the DITAS-EE infrastructure without violating any privacy and security constraints. More specifically, the DAL contains the Privacy Enforcement Layer, the component in charge of modifying the SQL query generated by a data consumer who wants to access the stored data. The query is executed in order to satisfy the call coming from the Processing Layer. However, different consumers might have different rights on the data they are allowed to see. To achieve this customized access to the data, the DAL transforms the original query into an SQL query that applies filters to avoid returning data that cannot be seen externally. This filtering is also affected by the location of the VDC (e.g., some data cannot be accessed from outside a safe location). Since the DAL is a fairly new component, its full architecture, functionalities, and API description are presented in this document.
The rest of the document is structured as follows. Section 2 describes the blueprint schema in detail, along with the changes that took place in the course of the project, which components interact with the blueprint, and how the requirements are addressed. Section 3 describes the final architecture of the components involved in the Blueprint lifecycle; most importantly, it documents the updates and changes in the components as well as the final API. Section 4 contains the architecture and component specifications of the Data Access Layer (DAL). Section 5 presents the work done on application profiling and deployment strategies in the context of WP3. Section 6 introduces the DITAS SDK. It is important to mention that this deliverable describes the main characteristics of the SDK and the idea behind its creation and implementation; the complete SDK is a wiki page that is continuously updated with new information and functionalities as the project evolves, making it accessible to a wider audience through the DITAS web page. Finally, the conclusions and future steps are presented in Section 7 of this deliverable.
1.1 Glossary of Acronyms
Acronym Definition
API Application Programming Interface
CAF Common Accessibility Framework
CLI Command-Line Interface
CRUD Create Read Update Delete
D Deliverable
DAL Data Access Layer
DB Data Base
DME Data Movement Enactor
DS4M Decision System for Data and Computation Movement
DUE Data Utility Evaluator
DUR Data Utility Refinement
DURE Data Utility Resolution Engine
EE Execution Environment
GUI Graphical User Interface
HTTP Hypertext Transfer Protocol
IAM Identity Access Management
JSON JavaScript Object Notation
N/A Not Applicable
PSE Privacy & Security Evaluator
PSES Privacy & Security Evaluator Service
REST Representational State Transfer
SDK Software Development Kit
SLA Service-Level Agreement
SQL Structured Query Language
UI User Interface
URI Uniform Resource Identifier
URL Uniform Resource Locator
UUID Universally Unique IDentifier
VDC Virtual Data Container
WP Work Package

Table 1. Acronyms
2 Final Abstract VDC Blueprint Schema

An abstract VDC Blueprint captures all the properties of a VDC and is developed according to the abstract VDC Blueprint schema. The latter is a general schema, through which all the abstract VDC Blueprints are created. It follows the JSON semi-structured format in order to address requirements T3.9, T3.11 and T3.12, which require the notation language to be accurate, user friendly, human readable and efficiently parsable. The Blueprint consists of five distinct sections, each of which is used by different DITAS roles or components. For instance, the Content Based Resolution component of the DITAS architecture uses section 1 to filter the blueprints based on the content they provide, whereas the Data Utility Resolution Engine component uses section 2 to match the application requirements with the data administrator capabilities. The following tables describe each field of the blueprint schema (Requirement B3.4), focusing on the updates compared to D3.2. The traceability of each field to the requirements is also presented. The complete list of the requirements can be found in Annex 1 of D1.2 [4].
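To make the five-section structure concrete, the sketch below shows a minimal skeleton of an abstract blueprint as a Python dictionary. The EXPOSED_API key name appears in this document (EXPOSED_API.paths); the other section key names and all values are assumptions for illustration only.

```python
import json

# Hypothetical skeleton of an abstract VDC Blueprint with its five sections.
# Only EXPOSED_API is a key name confirmed by this document; the rest are
# illustrative placeholders.
abstract_blueprint = {
    "INTERNAL_STRUCTURE": {        # Section 1: Overview, Data_Sources, Flow, ...
        "Overview": {"name": "example-vdc", "description": "...", "tags": []},
    },
    "DATA_MANAGEMENT": [],         # Section 2: per-method data utility/security/privacy
    "ABSTRACT_PROPERTIES": {},     # Section 3: empty in the abstract blueprint
    "COOKBOOK_APPENDIX": {},       # Section 4: deployment information
    "EXPOSED_API": {"paths": {}},  # Section 5: the API exposed to data consumers
}

# Being plain JSON, the blueprint can be serialized and parsed efficiently.
roundtripped = json.loads(json.dumps(abstract_blueprint))
```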
2.1 Internal Structure (Blueprint Section 1)

While the scope of this blueprint section remains the same with respect to the version described in D3.2 Section 3.1 [2], the table that analyzes each field of the blueprint has been enriched to include additional information about who (Role/Component column) and when and why (Phase/Process column) each field is used.
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Overview | Object | Information about the content of the data served by the VDC | Resolution Engine | Blueprint Selection Phase |
| Overview.name | String | The name of the VDC Blueprint | N/A | N/A |
| Overview.description | String | Textual description of the VDC Blueprint | Resolution Engine | Blueprint Selection Phase |
| Overview.tags | Array | Each element of this array contains keywords that describe the exposed data of each VDC method | Resolution Engine | Blueprint Selection Phase |
| Overview.tags.method_id | String | The id (operationId) of the method (as indicated in the EXPOSED_API.paths field) | Resolution Engine | Blueprint Selection Phase |
| Overview.tags.tags | Array | Keywords that describe the exposed data of this specific VDC method | Resolution Engine | Blueprint Selection Phase |
With respect to the version described in D3.2 Section 3.1 [2], the Overview field contains the same information.
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Data_Sources | Array | Data sources used by this VDC | Deployment Engine | VDC deployment and movement |
| Data_Sources.items.properties.id | String | A unique identifier of this data source | Deployment Engine | VDC deployment and movement |
| Data_Sources.items.properties.description | String | Description | Deployment Engine | VDC deployment and movement |
| Data_Sources.items.properties.location | Enum | Cloud/Edge | Deployment Engine | VDC deployment and movement |
| Data_Sources.items.properties.class | Enum | "relational database", "object storage", "time-series database", "api", "data stream" | Deployment Engine | VDC deployment and movement |
| Data_Sources.items.properties.type | Enum | "MySQL", "Minio", "InfluxDB", "rest", "other" | Deployment Engine | VDC deployment and movement |
| Data_Sources.items.properties.parameters | Object | Connection parameters | Deployment Engine | VDC deployment and movement |
| Data_Sources.items.properties.schema | Object | Schema | Deployment Engine | VDC deployment and movement |
Additional properties, such as the data source id, have been added to the Data_Sources field for use by the deployment engine when configuring the VDC and by the DS4M. This addresses requirement T2.1, which mandates that metadata describing the data sources be available.
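An individual data source entry might look as follows; the field names come from the table above, while the identifier and connection values are invented for illustration.

```python
# Illustrative Data_Sources entry. Field names follow the table above;
# all concrete values are hypothetical.
data_source = {
    "id": "patient-db",                     # unique identifier, also used by the DS4M
    "description": "Relational store with patient records",
    "location": "cloud",                    # enum: cloud / edge
    "class": "relational database",
    "type": "MySQL",
    "parameters": {"host": "db.example.org", "port": 3306},  # connection parameters
    "schema": {},                           # schema of the data source
}

# The class must be one of the enum values listed in the table.
allowed_classes = {
    "relational database", "object storage",
    "time-series database", "api", "data stream",
}
```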
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Methods_Input | Object | This field contains the part of the data source that each method needs to be executed | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods | Array | The list of methods | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.method_id | String | The id (operationId) of the method (as indicated in the EXPOSED_API.paths field) | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.dataSources | Array | The list of data sources required by the method | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.dataSources.dataSource_id | String | The id of the data source (as indicated in the Data_Sources field) | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.dataSources.dataSource_type | String | The type of the data source (relational/not_relational/object) | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.dataSources.database | Array | The list of databases required by a method in a data source | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.dataSources.database.database_id | String | The id of the database | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.dataSources.database.tables | Array | The list of tables/collections required by a method in a data source | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.dataSources.database.tables.table_id | String | The id of the table/collection | DS4M | Data and computation movement, Monitoring |
| Methods_Input.Methods.dataSources.database.tables.columns | Array | The IDs of the columns/fields to be moved | DS4M | Data and computation movement, Monitoring |
The Methods_Input field describes which parts of each data source are used by each method. It was inserted in the blueprint to allow the Decision System for Data and Computation Movement (DS4M) to move only the portion of data that is actually used [3]. This new section reduces the amount of data to be transferred and stored when data sources are replicated (Objective 2.5) and allows the data utility to be correctly estimated for each method (Requirement T3.18).
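A sketch of a Methods_Input fragment is shown below, together with the kind of traversal the DS4M could perform to collect the exact columns a method needs. The method id, database, table and column names are all hypothetical.

```python
# Hypothetical Methods_Input fragment: one method and the portion of one
# data source it reads. Identifiers are illustrative.
methods_input = {
    "Methods": [
        {
            "method_id": "getPatientBiographicalData",
            "dataSources": [
                {
                    "dataSource_id": "patient-db",
                    "dataSource_type": "relational",
                    "database": [
                        {
                            "database_id": "hospital",
                            "tables": [
                                {"table_id": "patients",
                                 "columns": ["name", "birthDate"]},
                            ],
                        }
                    ],
                }
            ],
        }
    ]
}

# Collect every column any method requires, i.e. the minimal portion of data
# that would need to be moved when the data source is replicated.
columns_to_move = {
    column
    for method in methods_input["Methods"]
    for source in method["dataSources"]
    for database in source["database"]
    for table in database["tables"]
    for column in table["columns"]
}
```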
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Flow | Object | The data flow that implements the VDC | N/A | N/A |
| Flow.platform | Enum | Spark or Node-RED | N/A | N/A |
| Flow.parameters | Object | Platform details (for Spark) | N/A | N/A |
| Flow.source_code | Any JSON structure | The flow JSON file (for Node-RED) | N/A | N/A |
With respect to the version described in D3.2 Section 3.1 [2], the Flow field contains the same information.
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| DAL_Images | Object | Set of Docker images to include in the DALs associated to this VDC. The key is a unique DAL identifier while the values are the image information | Deployment Engine | VDC Deployment and movement |
| DAL_Images.[dal_id].original_ip | String | IP where the original DAL has been deployed | Deployment Engine | VDC Deployment and movement |
| DAL_Images.[dal_id].images | Object | Set of images to deploy in this DAL implementation. The key is a unique identifier and the values are the image information | N/A | N/A |
| DAL_Images.[dal_id].images.[image_id].image | String | The Docker image name in standard format [repository]/[group]/<image_name>:[version] | Deployment Engine | VDC Deployment and movement |
| DAL_Images.[dal_id].images.[image_id].internal_port | Int | The port on which the software of the image will be listening, if any. This port won't be exposed outside, but it will receive data through redirection. | Deployment Engine | VDC Deployment and movement |
| DAL_Images.[dal_id].images.[image_id].external_port | Int | The port on which the image will be accessible. It will redirect any request to this port to the one specified in internal_port | Deployment Engine | VDC Deployment and movement |
| DAL_Images.[dal_id].images.[image_id].environment | Object | Environment variables to pass to the image in key-value format. | Deployment Engine | VDC Deployment and movement |
The DAL_Images field is new in this version of the schema.
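A DAL_Images fragment following this structure might look as below; the DAL identifier, image name, ports and environment values are made up for illustration.

```python
# Hypothetical DAL_Images fragment. Keys follow the structure above;
# all concrete values (identifiers, IP, image name, ports) are illustrative.
dal_images = {
    "main-dal": {                         # unique DAL identifier
        "original_ip": "192.0.2.10",      # where the original DAL has been deployed
        "images": {
            "privacy-enforcement": {      # unique image identifier
                "image": "registry.example.org/ditas/privacy-enforcement:1.0",
                "internal_port": 8080,    # port the software listens on (not exposed)
                "external_port": 30080,   # reachable port, redirected to internal_port
                "environment": {"LOG_LEVEL": "info"},
            }
        },
    }
}
```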
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| VDC_Images | Object | Set of Docker images to include in the VDC. The key is a unique image identifier while the values are the image information | Deployment Engine | VDC Deployment and movement |
| VDC_Images.[image_id].image | String | The Docker image name in standard format [repository]/[group]/<image_name>:[version] | Deployment Engine | VDC Deployment and movement |
| VDC_Images.[image_id].internal_port | Int | The port on which the software of the image will be listening, if any. This port will not be exposed outside, but it will receive data through redirection. | Deployment Engine | VDC Deployment and movement |
| VDC_Images.[image_id].external_port | Int | The port on which the image will be accessible. It will redirect any request to this port to the one specified in internal_port | Deployment Engine | VDC Deployment and movement |
| VDC_Images.[image_id].environment | Object | Environment variables to pass to the image in key-value format. | Deployment Engine | VDC Deployment and movement |
The VDC_Images field is new in this version of the schema.
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Identity_Access_Management | Object | Information about identity and access management of this VDC | Access Management, Request Monitor | Blueprint Selection Phase |
| Identity_Access_Management.jwks_uri | String | The JWKS URL for getting verification keys | Application, Request Monitor, DAL, any component that needs to validate a token | Blueprint Selection Phase |
| Identity_Access_Management.iam_endpoint | String | The endpoint of the IAM server | Application, Request Monitor, DAL, any component that needs to validate a token | Blueprint Selection Phase |
| Identity_Access_Management.roles | List of Strings | A set of roles that a user might have. | Request Monitor, PSES | Blueprint Selection Phase |
| Identity_Access_Management.provider | List of Objects | A list of identity providers that can be used. Can be empty if only the DITAS internal one is used. | N/A | Blueprint Selection Phase |
| Identity_Access_Management.provider[i].name | String | Name of the provider | Request Monitor | Blueprint Selection Phase |
| Identity_Access_Management.provider[i].type | String | Type of the provider to use. Only OAuth supported for now. | Request Monitor | Blueprint Selection Phase |
| Identity_Access_Management.provider[i].uri | String | Address of the provider. | Request Monitor | Blueprint Selection Phase |
| Identity_Access_Management.provider[i].portal | | Login portal for that provider. | Request Monitor | Blueprint Selection Phase |
The Identity_Access_Management field is new in this version of the schema. It describes how identity access is managed. It was inserted in the blueprint for two main purposes:
● Giving App Developers all the information necessary to authenticate against the VDC
● Enabling pre-filtering of Blueprints that a Developer has no access to
All technical details about identity access management are stored in the cookbook section of the blueprint under the same field name, as that information is not relevant to an application developer but to the DITAS runtime.
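A filled-in Identity_Access_Management fragment could look as follows; the endpoints, roles and provider entries are placeholders, not actual DITAS deployments.

```python
# Hypothetical Identity_Access_Management fragment. All URLs and role names
# are illustrative placeholders.
identity_access_management = {
    "jwks_uri": "https://iam.example.org/certs",     # JWKS URL for verification keys
    "iam_endpoint": "https://iam.example.org/auth",  # endpoint of the IAM server
    "roles": ["doctor", "nurse", "researcher"],      # roles a user might have
    "provider": [
        {
            "name": "hospital-idp",
            "type": "oauth",                          # only OAuth supported for now
            "uri": "https://idp.example.org",
            "portal": "https://idp.example.org/login",
        }
    ],
}
```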
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Testing_Output_Data | Array | Sample dataset per VDC method | DUE | Blueprint Selection Phase |
| Testing_Output_Data.method_id | String | The id of this exposed VDC method | DUE | Blueprint Selection Phase |
| Testing_Output_Data.attributes | Array | The attributes of the output data returned by the method that are required by the end user | DUE | Blueprint Selection Phase |
| Testing_Output_Data.zip_data | String | The URI to the zip sample data for this exposed VDC method | DUE | Blueprint Selection Phase |
| Testing_Output_Data.history_time | Integer | The time interval, expressed in seconds before the current time, to compute the availability of the method. | SLA Manager | Monitoring |
| Testing_Output_Data.history_invocations | Integer | The maximum number of invocations of the method, to compute its availability | SLA Manager | Monitoring |
With respect to the version described in D3.2 Section 3.1 [2], the
Testing_Output_Data field contains almost the same information.
The attributes, history_time and history_invocations subfields are obtained from
the application requirements. Therefore, they are present only in the
intermediate and concrete blueprint. The attributes subfield is used by the Data
Utility Evaluator (DUE) to calculate the data quality only for those output
attributes that are relevant for the application designer [1]. Similarly, the
history_time and history_invocations subfields are used by the SLA manager to
compute the availability for that method based on, respectively, the indicated
time interval and number of service invocations.
The zip_data subfield, instead, may be used by the data administrator in order to
provide a reference to a file that contains a representative sample of the dataset
that a specific VDC method exposes (Requirement T2.2), and by the DUE to
correctly estimate the data utility for each method (Requirement T3.18).
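As a concrete sketch, a Testing_Output_Data entry in an intermediate blueprint might look as below. The method id, sample URI and all values are hypothetical; in an abstract blueprint the attributes, history_time and history_invocations subfields would still be absent, as explained above.

```python
# Hypothetical Testing_Output_Data entry as it might appear in an
# intermediate or concrete blueprint (all values illustrative).
testing_output_data = [
    {
        "method_id": "getPatientBiographicalData",
        # Provided by the data administrator: representative sample dataset.
        "zip_data": "https://data.example.org/samples/biographical-sample.zip",
        # Added at resolution time from the application requirements:
        "attributes": ["name", "birthDate"],  # output attributes relevant to the designer
        "history_time": 3600,        # look back one hour when computing availability
        "history_invocations": 100,  # cap on invocations considered
    }
]
```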
2.2 Data Management (Blueprint Section 2)

The Data Management section of the Blueprint specifies, for each method, the
guaranteed levels of data quality, security and privacy. Such information will be
used (i) for filtering the blueprints that do not fit the Application Designer
requirements; (ii) for the data and computation movement; (iii) as specifications
of metric thresholds agreed with the application developer. For further details,
please see D3.2 Section 3.2 Table 6 and Table 7 [2].
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| method_id | String | The id of this exposed VDC method | DURE, SLA Manager, DS4M | Blueprint Selection Phase, Data and computation movement, Monitoring |
| attributes | Object | Data utility, security and privacy attributes for this exposed VDC method | DURE, SLA Manager, DS4M | Blueprint Selection Phase, Data and computation movement, Monitoring |
| attributes.dataUtility | Array | A list with all the metrics related to data quality. The JSON schema for each metric is presented in the next table | DURE, SLA Manager, DS4M | Blueprint Selection Phase, Data and computation movement, Monitoring |
| attributes.security | Array | A list with all the properties related to security. The JSON schema for each property is identical to the schema used for the data utility metrics | DURE, SLA Manager, DS4M, PSE | Blueprint Selection Phase, Data and computation movement, Monitoring |
| attributes.privacy | Array | A list with all the properties related to privacy. The JSON schema for each property is identical to the schema used for the data utility metrics | DURE, SLA Manager, DS4M, PSE | Blueprint Selection Phase, Data and computation movement, Monitoring |
With respect to the contents described in D3.2 Section 3.2 [2], the data structure
was slightly changed. In particular, properties associated to a metric are no
longer defined as JSON objects. Instead, they are defined as JSON properties.
Changes in the JSON schema are highlighted in bold.
{
  "type": "object",
  "properties": {
    "id": {
      "description": "id of the metric",
      "type": "string"
    },
    "name": {
      "description": "name of the metric",
      "type": "string"
    },
    "type": {
      "description": "type of the metric",
      "type": "string"
    },
    "properties": {
      "description": "properties related to the metric",
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "properties": {
          "unit": {
            "description": "unit of measure of the property",
            "type": "string"
          },
          "maximum": {
            "description": "upper limit of the offered property",
            "type": "number"
          },
          "minimum": {
            "description": "lower limit of the offered property",
            "type": "number"
          },
          "value": {
            "description": "value of the property",
            "anyOf": [
              { "type": "string" },
              { "type": "object" },
              { "type": "array" }
            ]
          }
        }
      }
    }
  }
}
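Under this schema, a metric offered by the data administrator might look as below; the metric id, name and property values are illustrative. A minimal hand-rolled structural check (a sketch, not a full JSON Schema validator) confirms that properties are now plain JSON properties keyed by name rather than objects in a list:

```python
# Illustrative metric instance matching the schema above; all values are
# hypothetical.
metric = {
    "id": "availability-1",
    "name": "Availability",
    "type": "DataUtility",
    "properties": {
        "availability": {"unit": "percentage", "minimum": 90,
                         "maximum": 99, "value": 95},
    },
}

def matches_metric_schema(m):
    """Minimal structural check mirroring the JSON schema above."""
    # id, name and type must be strings.
    if not all(isinstance(m.get(k), str) for k in ("id", "name", "type")):
        return False
    # properties is an object whose values are property objects.
    props = m.get("properties", {})
    if not isinstance(props, dict):
        return False
    allowed = {"unit", "maximum", "minimum", "value"}
    return all(isinstance(p, dict) and set(p) <= allowed for p in props.values())
```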
2.3 Abstract Properties (Blueprint Section 3)

This section contains the goal model that specifies the non-functional application requirements that the blueprint is expected to fulfill once the concrete blueprint is instantiated. This goal model is used by the SLA Manager to detect violations, and by the DS4M to identify the best data and computation movement actions. For further details on how the goal model is encoded, please see D3.2 Section 3.3 [2].
This section remains empty in the abstract blueprint. Once VDC Blueprint resolution takes place, an intermediate blueprint is generated. In particular, the Data Utility Resolution Engine, which is further discussed in section 3.2.2, inserts in this section a subset of the goal model taken from the application designer requirements.
2.5 Cookbook Appendix (Blueprint Section 4)
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Deployment | Object | Information about the available infrastructures and the software running on them | N/A | N/A |
| Deployment.id | String | Identifier of the deployment | N/A | N/A |
| Deployment.name | String | Human-friendly name of the deployment | N/A | N/A |
| Deployment.infrastructures | Object | Set of clusters deployed with this blueprint. The key is the infrastructure identifier and the value is the cluster nodes information | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.id | String | Identifier of the cluster infrastructure | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.name | String | Human-readable name of the cluster infrastructure | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.type | String | Type of the cluster. For example: cloudsigma, aws or baremetal | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes | Object | Nodes present in the cluster, indexed by node role. The value is a list of node information | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes.<role>.hostname | String | Hostname of the node | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes.<role>.role | String | Role of the node in the cluster. In case of Kubernetes it will be master or slave | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes.<role>.ip | String | External IP of the node | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes.<role>.drive_size | Int | Size of the boot drive in bytes | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes.<role>.data_drives | Array | List of data drives attached to the node | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes.<role>.data_drives[i].name | String | Name of the data drive | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes.<role>.data_drives[i].size | String | Size of the data drive | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.nodes.<role>.extra_properties | Object | Arbitrary properties associated with this node in key-value format, where keys and values are strings. It can be used to set labels, for example. | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.status | String | Status of the cluster. A healthy cluster should be in "running" status. | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.vdcs | Object | Object containing information about the VDCs deployed in the cluster. The key is the VDC identifier, while the value is the information relative to that particular VDC | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.vdcs.<vdc_id>.ports | Object | Information about which port is assigned to each image running in a VDC. The key is the image identifier in Docker format and the value is the port at which it can be reached on any node of the cluster. | N/A | N/A |
| Deployment.infrastructures.<infrastructure_id>.extra_properties | Object | Arbitrary properties associated with this cluster in key-value format, where keys and values are strings. It can be used to set labels, for example. | N/A | N/A |
| Deployment.extra_properties | Object | Arbitrary properties associated with this multi-cluster deployment in key-value format, where keys and values are strings. It can be used to set labels, for example. | N/A | N/A |
| Deployment.status | String | General status of the whole multi-cluster deployment. A healthy deployment should be in the "running" state | N/A | N/A |
The Deployment field has been added to this version of the blueprint to provide
information to the different components running in the VDCs about the clusters
available to them. This field will be automatically generated by the Deployment
Engine once the clusters are initialized and it will be part of the concrete
blueprint.
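For illustration, a minimal Deployment field conforming to the table above might look as follows; all identifiers and values are invented for this example:

```json
{
  "Deployment": {
    "id": "dep-001",
    "name": "demo deployment",
    "status": "running",
    "extra_properties": {},
    "infrastructures": {
      "infra-1": {
        "id": "infra-1",
        "name": "cloudsigma-cluster",
        "type": "cloudsigma",
        "status": "running",
        "nodes": {
          "master": [
            {
              "hostname": "demo-master-1",
              "role": "master",
              "ip": "198.51.100.10",
              "drive_size": 53687091200,
              "data_drives": [ { "name": "data-1", "size": "10240" } ],
              "extra_properties": {}
            }
          ]
        },
        "vdcs": {
          "vdc-1": { "ports": { "ditas/vdc-image:latest": 30001 } }
        },
        "extra_properties": { "default": "true" }
      }
    }
  }
}
```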
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Resources | Object | Set of resources available to deploy VDCs. These resources are machines with attached disks, grouped as clusters (infrastructures) that will form different Kubernetes clusters. | Deployment Engine | Deployment |
| Resources.description | string | Optional description for the whole resource set | Deployment Engine | Deployment |
| Resources.infrastructures | array | List of available clusters to create or use | Deployment Engine | Deployment |
| Resources.infrastructures[i].description | string | Optional description for the cluster | Deployment Engine | Deployment |
| Resources.infrastructures[i].name | string | Unique name for the cluster. It will be used to form the machines' hostnames if the cluster needs to be created | Deployment Engine | Deployment |
| Resources.infrastructures[i].type | string | Type of infrastructure. A "Cloud" value means that the resources are not initialized, so the deployment engine needs to create them as Virtual Machines and initialize the Kubernetes cluster over them. "Edge" means that the machines are already configured as a cluster and the data in the "resources" section is just informative. | Deployment Engine | Deployment |
| Resources.infrastructures[i].provider | object | Information about the cloud or edge provider | Deployment Engine | Deployment |
| Resources.infrastructures[i].provider.api_endpoint | string | Endpoint to use in case of a cloud provider | Deployment Engine | Deployment |
| Resources.infrastructures[i].provider.api_type | string | The type of provider, such as AWS, GCP, Cloudsigma, etc. | Deployment Engine | Deployment |
| Resources.infrastructures[i].provider.credentials | Object | A key-value map with the credentials to access the cloud provider and be able to create the cluster. In case of an "Edge" cluster type, the existing k8s cluster credentials must be provided here. | Deployment Engine | Deployment |
| Resources.infrastructures[i].provider.secret_id | string | If the deployment engine is configured to use a vault, the credentials can be provided as a link to the vault to retrieve them | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources | array | List of Virtual Machines to instantiate, in case of a "Cloud" deployment, or to use in case of an "Edge" one. | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].cores | int | Number of cores of the VM | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].cpu | int | CPU speed in MHz | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].disk | int | Boot disk size in MB | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].image_id | string | Identifier of the boot image to use | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].ip | string | IP to assign to the VM. If not present, a random one will be chosen. The provider must have enough free public IPs for all of the machines, since they need to have a fixed IP | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].name | string | Unique name of the machine. Along with the infrastructure name, it will be used to compose the machine hostname | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].ram | int | RAM size in MB | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].role | string | Role in the Kubernetes cluster. It can be "master" or "worker" | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].type | string | Optionally, the type of machine (e.g. n1-small) can be provided here instead of the individual RAM and CPU features | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].drives | array | Set of data drives attached to the machine | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].drives[i].name | string | Unique name for the data drive | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].drives[i].size | int | Size in MB of the data drive | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].drives[i].type | string | Type of the data drive (HDD or SSD) | Deployment Engine | Deployment |
| Resources.infrastructures[i].resources[i].extra_properties | string | Key-value map to provide arbitrary tags to the machine. It can be used to provide information to the deployment engine about the features installed or configured in the boot disk, or to mark a particular node with a particular tag. | Deployment Engine | Deployment |
| Resources.infrastructures[i].extra_properties | object | A key-value map that can be used to provide information to the deployment engine or to components running inside a VDC. This is a place to put arbitrary tags that can be useful, such as describing whether a cluster must be treated as the default one to deploy VDCs, whether it is a trustable or untrustable zone, etc. | Deployment Engine | Deployment |
Previously, the whole Cookbook_Appendix section consisted solely of the
information in this Resources field. Now this field represents the available
resources to initialize, while the Deployment field represents these same
resources once they have been initialized.
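For illustration, a minimal Resources field conforming to the table above might look as follows; the provider, endpoint and machine values are invented for this example:

```json
{
  "Resources": {
    "description": "Example resource set (illustrative values)",
    "infrastructures": [
      {
        "name": "demo-cloud",
        "description": "Cluster to be created on a cloud provider",
        "type": "Cloud",
        "provider": {
          "api_endpoint": "https://api.example-cloud.test",
          "api_type": "Cloudsigma",
          "credentials": { "username": "<user>", "password": "<secret>" }
        },
        "resources": [
          {
            "name": "master1",
            "role": "master",
            "cores": 2,
            "cpu": 2000,
            "ram": 4096,
            "disk": 20480,
            "image_id": "ubuntu-18.04",
            "drives": [ { "name": "data1", "size": 10240, "type": "SSD" } ],
            "extra_properties": {}
          }
        ],
        "extra_properties": { "default": "true" }
      }
    ]
  }
}
```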
The following is a continuation of the Identity_Access_Management field,
included in the Internal Structure section of the blueprint. The Cookbook
Appendix field is concerned with information relevant to the runtime of a VDC.
| Field | Type (JSON Format) | Description | Role/Component | Phase/Process |
|---|---|---|---|---|
| Identity_Access_Management | Object | Information about identity and access management of this VDC | Access Management, Request Monitor | Usage Phase |
| Identity_Access_Management.validation_keys | List of Strings | Set of public keys that can be used to validate a token if the JWKS is unreachable (optional) | Application, Request Monitor, DAL, any component that needs to validate a token | Usage Phase |
| Identity_Access_Management.mapping | List of Objects | Set of automatic role mappings for a provider (optional). In case more than one provider is used in the configuration, a component can use these rules to automatically associate roles based on the origin of a token. | Request Monitor | Usage Phase |
| Identity_Access_Management.mapping[i].provider | String | Provider name; needs to match one from the provider list | Request Monitor | Usage Phase |
| Identity_Access_Management.mapping[i].roles | List of Strings | Roles that this mapping can apply | Request Monitor | Usage Phase |
| Identity_Access_Management.mapping[i].role_map | List of Objects | Rules that are evaluated for each token | Request Monitor | Usage Phase |
| Identity_Access_Management.mapping[i].role_map[j].matcher | String | Anko Script Rule1 | Request Monitor | Usage Phase |
This field is new in this version of the schema; it describes how identity access
is managed and configured for this VDC. It was inserted in the blueprint for two
main purposes:
● Allowing the Request Monitor to automatically enforce access control
● Enabling pre-filtering of Blueprints that a Developer has no access to
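To make the mapping rules above concrete, the sketch below shows how a component such as the Request Monitor might apply them to an incoming token. This is a simplified Python illustration: the real DITAS matchers are Anko scripts, and all field names and values here are invented.

```python
# Simplified sketch of automatic role mapping based on token origin.
# Real matchers are Anko scripts; here a matcher is a plain predicate.

def map_roles(token, mappings):
    """Return the roles granted to a token by the matching provider rules."""
    granted = []
    for mapping in mappings:
        if mapping["provider"] != token.get("iss"):
            continue  # this mapping only applies to tokens from its provider
        for rule in mapping["role_map"]:
            if rule["matcher"](token):  # stand-in for an Anko script rule
                granted.extend(mapping["roles"])
    return granted

mappings = [{
    "provider": "https://idp.example.test",
    "roles": ["doctor"],
    "role_map": [{"matcher": lambda t: "medical" in t.get("groups", [])}],
}]

token = {"iss": "https://idp.example.test", "groups": ["medical"]}
print(map_roles(token, mappings))  # -> ['doctor']
```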
2.6 Exposed API (Blueprint Section 5)

According to the DITAS architecture, the VDC interacts with the applications
through the Common Accessibility Framework API, whose programming model is
REST-oriented. The data administrator is in charge of designing the API as well
as making it publicly available. Therefore, this section of the abstract blueprint
includes all the information about the methods through which the administrator
exposes, totally or partially, the data stored in the sources that he/she controls
[4]. The CAF RESTful API of the VDC is written according to the OpenAPI
Specification (originally known as the Swagger specification), so that big
vendors as well as new providers are able to publish their services and
components (Requirement B3.3). The OAS was extended with regard to the
operation object, which in the context of DITAS corresponds to the exposed
VDC method. The JSON schema of the latter is depicted in the table below,
whereas the complete schema of the EXPOSED_API section of the blueprint is
presented in the appendix.
1 https://github.com/mattn/anko
method

| Field | Type (JSON Format) | Description | Comments |
|---|---|---|---|
| summary | String | A short summary of what the operation does | mandatory field |
| operationId | String | Unique string used to identify the method | the same id is used to identify each exposed VDC method throughout the whole abstract VDC blueprint; mandatory field |
| parameters | Array | A list of the input parameters for this method | optional field |
| responses | Object | The list of possible responses as they are returned from calling this method | mandatory field |
| responses.200 (or 201).description | String | A short description of the response | mandatory field |
| responses.200 (or 201).content.application/json.schema | Object | The schema of the data included in the response payload | mandatory field, to enable the developer to conclude whether the method fits his/her application |
| x-data-sources | Array | An array that contains all the identifiers of the data sources (as indicated in the INTERNAL_STRUCTURE.Data_Sources field) that are accessed by the method | mandatory field, to enable the developer to conclude whether the method fits his/her application |
| x-iam-roles | Array | A list indicating that a client needs to have one of these roles to be able to call the method successfully | optional field |
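For illustration, a hypothetical exposed VDC method written against this extended operation object could look as follows; the path, identifiers, data source and role names are invented for this example:

```json
{
  "paths": {
    "/patients/{id}/bloodTests": {
      "get": {
        "summary": "Retrieve the blood test values of a patient",
        "operationId": "getBloodTests",
        "parameters": [
          { "name": "id", "in": "path", "required": true, "schema": { "type": "string" } }
        ],
        "responses": {
          "200": {
            "description": "The requested blood test values",
            "content": {
              "application/json": {
                "schema": { "type": "array", "items": { "type": "object" } }
              }
            }
          }
        },
        "x-data-sources": [ "patientsDB" ],
        "x-iam-roles": [ "doctor" ]
      }
    }
  }
}
```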
3 Final Component Architecture and specification

The VDC Blueprint (as described in the previous section) has an intricate schema
that contains a lot of information, in order to be able to describe all the
functionalities and features of a VDC. Application Designers are called to select
the most suitable Blueprint, and by extension the VDC, that fulfils their
Application Requirements. This task requires the Application Designer not only to
define in a structured manner the requirements of the application he/she
develops, but also to discover, in a database full of intricate blueprints, the one
that suits the application. In order to aid Application Designers in the process of
finding the most suitable blueprint, the Resolution Engine was created (Figure 2).
Figure 2: Resolution Engine Architecture
As depicted in the image above (also described in D3.2 Section 4.2 [2]), the
resolution process consists of different interconnected components aimed at
filtering and ranking the Blueprints according to the Application Requirements.
Different aspects and features of the Blueprint feed the Resolution Engine.
● Content Based Resolution: The scope of this component is to find the
most appropriate blueprints based on the type of data the VDC delivers.
This process receives as input free text provided by the user and delivers
to the follow-up component a list of blueprints that match the content,
based on the user's input.
● Data Utility Resolution Engine (DURE): Filtering based on the type of data
is a crucial step, but it is not enough to fulfill the needs of an application.
The quality of the data is an equally important factor, which is taken care
of by this component.
● Privacy and security Evaluator (PSE): This component is responsible for
filtering and ranking security and privacy aspects of the blueprint.
● Recommendation: Given that the requirements of the application are
matched with the appropriate blueprints, it is important to be able to
recommend and rank the best candidate blueprints in order to give the
Application Developer a more sophisticated and personalized solution.
● Repository Engine: This component handles all the CRUD operations for
the blueprint repository. It is connected with the resolution process in order
to retrieve the blueprints for evaluation.
In the following sections, these components are analyzed. More specifically, we
will focus on the changes made throughout the second period of the project,
on the motivations behind these changes, and on how they address the
requirements of the project.
3.1 VDC Blueprint Repository Engine

The Data Administrator interacts with the Repository Engine in order to perform
CRUD operations on his/her abstract VDC Blueprint(s). Using the interface, he/she
submits a blueprint that, after being evaluated by the Blueprint Validator and
found valid, is stored in the Blueprint Repository. After the Blueprint is stored, the
Repository Engine sends back a unique blueprint id, through which the
administrator is able to read, update, or delete his/her blueprint. Other DITAS roles
or components, such as the Resolution Engine, use the id to retrieve the Blueprint
from the Repository. The Creation & Storage phase of the Blueprint lifecycle is
presented in the figure below (being part of the whole lifecycle depicted in
Figure 1):
Figure 3: Creation & Storage phase of the Blueprint Lifecycle
With respect to the version described in D3.2 Section 4.1 [2], the component was
slightly changed, mainly to adapt to the final abstract VDC Blueprint schema
that is analyzed in this deliverable. The complete API of the VDC Blueprint
Repository Engine, written according to the Swagger specification, can be found
here:
https://github.com/DITAS-Project/VDC-Blueprint-Repository-Engine/blob/master/VDC_Blueprint_Repository_Engine_Swagger_v3.yaml
3.1.1 VDC Blueprint Validator
The goal of this subcomponent is to ensure that inserted or updated blueprints
are valid before they are stored in the Blueprint Repository and, in case of bad
POST or PATCH requests, to provide descriptive and helpful error messages that
assist the data administrator in creating valid blueprints.
The validator takes as input an abstract VDC Blueprint and validates it against
all defined limitations and requirements. A draft v4 JSON Schema is used to
describe the format of a blueprint, along with other grammar standards and
specifications. The validator also checks logical requirements, such as the
following:
● each exposed VDC method must have a unique (operation) id
○ every "method_id" that is used throughout the blueprint must also
be defined as an "operationId" in the EXPOSED_API section
● each data source must have a unique id
○ every data source id that is used throughout the blueprint must also
be declared in the INTERNAL_STRUCTURE.Data_Sources section
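These cross-reference checks can be sketched as follows. The blueprint layout here is heavily simplified, and the helper fields `referenced_method_ids` and `referenced_source_ids` are hypothetical stand-ins for the validator's actual traversal of the blueprint:

```python
# Simplified sketch of the validator's logical checks; the real validator also
# applies a draft-04 JSON Schema before running these cross-reference rules.

def check_blueprint(blueprint):
    errors = []
    exposed = blueprint["EXPOSED_API"]["operationIds"]
    # 1. every exposed VDC method must have a unique operationId
    if len(exposed) != len(set(exposed)):
        errors.append("duplicate operationId in EXPOSED_API")
    # 2. every method_id used elsewhere must be defined in EXPOSED_API
    for method_id in blueprint.get("referenced_method_ids", []):
        if method_id not in exposed:
            errors.append(f"method_id '{method_id}' not defined in EXPOSED_API")
    # 3. every referenced data source id must be declared in INTERNAL_STRUCTURE
    declared = set(blueprint["INTERNAL_STRUCTURE"]["Data_Sources"])
    for source_id in blueprint.get("referenced_source_ids", []):
        if source_id not in declared:
            errors.append(f"data source '{source_id}' not declared")
    return errors

bp = {
    "EXPOSED_API": {"operationIds": ["getBloodTests"]},
    "INTERNAL_STRUCTURE": {"Data_Sources": ["patientsDB"]},
    "referenced_method_ids": ["getBloodTests", "getPatients"],
    "referenced_source_ids": ["patientsDB"],
}
print(check_blueprint(bp))  # -> ["method_id 'getPatients' not defined in EXPOSED_API"]
```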
3.2 VDC Blueprint Resolution Engine

All of the components of the Resolution Engine (architecture, functionalities and
updates) are described in the subsections below. All these components are
connected: from a single input, the resolution process produces a list of ranked
Blueprints for the Application Designer to select from.
Component API
All the individual component services are described in their dedicated sections
below. These services are initiated through a single API.
Table x: Resolution Engine method documentation

/searchBlueprintByReq

Purpose
Input: JSON file with the Application Requirements:
https://github.com/DITAS-Project/VDC-Resolution-Engine/blob/master/src/main/resources/user_reqs.json
Candidates: Array of Blueprints
Output: ResultSet (Array of Blueprints and scores):
schema:
  type: array
  items:
    type: object
    properties:
      blueprint:
        type: object
      score:
        type: number
      methodNames:
        type: array
        items:
          type: string
3.2.1 Content Based Resolution
Content Based Resolution is a component created to filter the VDC blueprints
based on the content they provide (requirement T3.10). To achieve this goal,
elasticSearch2, one of the leading solutions for content-based search, was
integrated into the component.
Figure 4: Content Based Search Sequence Diagram
As depicted in Figure 4, the Content Based Resolution takes as input the
application requirements, more specifically the functional requirements
expressed by the application designer. The component then forms the
appropriate queries to retrieve the most suitable Blueprints from the
elasticSearch DB. The elasticSearch DB contains only snippets of the VDC
Blueprints; more specifically, it contains all the fields that describe the content of
a blueprint. In this way, both the search and the retrieval of the Blueprints can
be done much faster. After the successful retrieval of the blueprint IDs, the
Content Resolution contacts the Repository Engine to get the full Abstract
Blueprints from the Blueprint Repository. All these blueprints, as well as the
Application Requirements, are then reformed to ensure interoperability between
the components (requirements T3.21, T3.22) and are sent to DURE for further
filtering and ranking.
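For illustration, the kind of query body the component might build from the designer's free text could be sketched as follows; the indexed field names are assumptions for this example, not the actual index mapping:

```python
# Hedged sketch: turn a free-text functional requirement into an
# Elasticsearch-style query body restricted to the content-describing fields
# of the indexed blueprint snippets.

def build_content_query(free_text, size=10):
    return {
        "size": size,  # retrieve only the top hits, as described above
        "query": {
            "multi_match": {
                "query": free_text,
                # only fields that describe the blueprint content are indexed
                "fields": ["description", "tags", "methods.description"],
            }
        },
    }

body = build_content_query("blood test from italy")
print(body["query"]["multi_match"]["query"])  # -> blood test from italy
```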
As far as the architecture and interoperability with other components are
concerned, the Content Based Resolution remained the same, with small fixes in
the communication with the other resolution components. The main focus of this
component's evolution was on the technologies used. More specifically, in the
second iteration of this component, elasticSearch was upgraded to version 7.
The reason for this upgrade was the new features of elasticSearch that were
needed for the Resolution Process in terms of functionality and performance.
Features such as the high-level REST client and the script score query helped in
query building and in the overall interoperability of the components involved in
the resolution process. In addition, the ranking features of the new elasticSearch
made the filtering and ranking of the blueprints much more efficient and easier
to produce. Finally, the faster retrieval of top hits boosted the performance and
made the component even faster. It is important to mention that all these
updates are important not only to the Content Based Resolution, but also for the
recommendation system implemented in the resolution process.
As far as the component API is concerned, the service URL, the inputs, and the
outputs remain the same.
2 https://www.elastic.co/
Component API
/searchBlueprintByReq_ESresponse

Purpose: This method searches specific fields of the blueprint: the tags and the
description of the blueprint.
Input: Free text, e.g. "blood test from italy"
Candidates: Array of Blueprints
Output: ResultSet (Array of blueprint UUIDs and relevance scores):
[
  { "_index": "vdc_search", "_id": "VyOReGAB1xWEy8e1njck", "_score": 2.463948 },
  { "_index": "vdc_search", "_id": "OiOReGAB1xWEy8e1LDdq", "_score": 1.7260926 },
  { "_index": "vdc_search", "_id": "ViOReGAB1xWEy8e1lDfm", "_score": 1.1507283 }
]
3.2.2 Data Utility Resolution Engine
The Data Utility Resolution Engine (DURE) is used in the blueprint selection phase
of DITAS. This component is responsible for filtering out blueprints that do not fulfill
non-functional application requirements. In addition, the DURE ranks blueprints
based on how well they fulfill non-functional requirements.
Figure 5: Data Utility Resolution Sequence Diagram
Starting from the implementation described in D3.2 Section 4.2.2 [2], we further
improved it by dynamically updating the data utility of each blueprint before
the assessment takes place.
To this aim, application requirements are first passed to the Data Utility
Refinement (DUR) module, which is in charge of rebalancing weights in the goal
model based on the type of application developed by the application
developer.
In addition, for each blueprint, the DURE computes its data utility by taking into
account only the columns that are relevant for the output desired by the
application developer. To do so, the DURE specifies in the
INTERNAL_STRUCTURE.Testing_Output_Data section of the blueprint which
attributes are relevant for the desired output. Then, it invokes the Data Utility
Evaluator (DUE) module, which computes the new data utility values and
updates the DATA_MANAGEMENT section of the blueprint.
Once the data utility has been computed, the DURE assigns a score in the 0-1
range to the blueprint based on how well it fulfills the non-functional application
requirements. To compute the score, the DURE relies on the internal Ranker
component. The Ranker first transforms the goal tree into an expression whose
factors are represented by the requirements (we refer to D3.2 Section 4.2.2 for
the details on how the expression is produced). Then, for each requirement, the
Ranker estimates how well the blueprint fulfills it. This is done by the Ranker itself
for requirements related to data utility, whereas the assessment of security and
privacy requirements is performed by invoking the Privacy and Security
Evaluator (PSE) module.
Once the assessment is done, if the rank assigned to the blueprint is 0, the
blueprint is discarded. Otherwise, the goal model specified in the application
requirements is customized for that specific blueprint and then inserted into the
ABSTRACT_PROPERTIES section of the blueprint. To do so, the DURE relies on the
internal Pruner component. The Pruner customizes the goal tree by pruning
leaves associated with the requirements that cannot be fulfilled by the blueprint.
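The Ranker/Pruner interplay described above can be sketched as follows. The goal-tree layout and the weighted-sum expression are simplified stand-ins for the actual implementation described in D3.2:

```python
# Simplified sketch: a goal tree whose leaves carry weights and per-requirement
# satisfaction scores is collapsed into a single 0-1 rank; leaves whose
# requirement cannot be fulfilled (score 0) are pruned.

def rank(node, scores):
    if "requirement" in node:  # leaf: weighted requirement satisfaction
        return node["weight"] * scores.get(node["requirement"], 0.0)
    return sum(rank(child, scores) for child in node["children"])

def prune(node, scores):
    if "requirement" in node:
        return node if scores.get(node["requirement"], 0.0) > 0 else None
    children = [c for c in (prune(ch, scores) for ch in node["children"]) if c]
    return {"children": children} if children else None

tree = {"children": [
    {"requirement": "accuracy", "weight": 0.6},
    {"requirement": "completeness", "weight": 0.4},
]}
scores = {"accuracy": 0.9, "completeness": 0.0}
print(round(rank(tree, scores), 2))       # -> 0.54
print(len(prune(tree, scores)["children"]))  # -> 1 (completeness leaf pruned)
```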
Component API
POST /v2/filterBlueprints

Purpose: This method filters and ranks blueprints according to non-functional
requirements.
Input:
  ApplicationRequirements: JSON document specifying the application requirements
  Candidates: Array of (Blueprint, MethodNames) pairs
    Blueprint: JSON document describing the blueprint (see Chapter 3 of this document)
    MethodNames: Array of strings indicating the names of the methods in the blueprint that fulfill the functional requirements
Output:
  ResultSet: Array of (Blueprint, Score, MethodNames) tuples
    Blueprint: JSON document describing the blueprint (see Chapter 3 of this document), updated with the goal model and the non-functional application requirements
    Score: Double indicating the rank (in the 0-1 range) assigned to the blueprint
    MethodNames: Array of strings indicating the names of the methods in the blueprint that fulfill the functional requirements
3.2.3 Privacy Security Evaluator
The Privacy Security Evaluator Service (PSES) is used in the blueprint selection
phase of DITAS. Specifically, the PSES addresses DITAS requirement T3.19 by
determining if and how well privacy and security attributes fit the application
designer's requirements.
The PSES, therefore, is responsible for filtering and ranking the security- and
privacy-related properties (see section 2.2) of a Blueprint. The PSES is mainly used
by the DURE during Blueprint selection.
The PSES is built as a stateless microservice, which allows it to be easily scaled
and deployed. It is built on top of Java Spring Boot and provides a REST API to
perform the filtering and ranking process. We investigated both serverless
function-as-a-service approaches and a containerized microservice for the
PSES. Serverless approaches would offer high scalability with an attractive cost
model during low usage of the service [5][6]. However, for the current DITAS
provider model, the self-managed approach is more sensible for now, mainly
because DITAS does not come with the infrastructure required for a serverless
approach, such as a serverless framework (e.g., Fission3, KNative4) and
supporting services (e.g., scalable data storage, scalable messaging). Adding
such an infrastructure for only one component is not cost-effective, but it could
be beneficial for future versions of DITAS.
The service consists of the following components (Figure 6): a REST controller
handles parsing incoming requests as well as all result representations. An
evaluator service, in turn, uses the filter services in combination with the ranking
service to generate the final result.
The filter services apply several field filters to each blueprint metric to remove
blueprints that do not match the required security or privacy properties. Lastly,
the Ranking Service can use a Ranking Strategy to order the remaining blueprint
properties. We implemented multiple strategies that can be changed at
runtime, depending on the needs of the DITAS administrator (e.g., enforcement
of minimum security standards). The overall process can be seen in Figure 7.
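A minimal sketch of this pipeline (field filters followed by a pluggable ranking strategy) could look as follows; the property names and the margin-based strategy are invented for illustration:

```python
# Illustrative sketch of the PSES pipeline: field filters drop blueprint metrics
# that violate the requirement, then a pluggable strategy orders the rest.

def field_filter(requirement, metric):
    """Keep a metric only if every required property is met (>= semantics)."""
    required = {p["name"]: p["value"] for p in requirement["properties"]}
    offered = {p["name"]: p["value"] for p in metric["properties"]}
    return all(offered.get(name, 0) >= value for name, value in required.items())

def rank_by_margin(requirement, metric):
    """One possible strategy: score by the total margin above the requirement."""
    required = {p["name"]: p["value"] for p in requirement["properties"]}
    offered = {p["name"]: p["value"] for p in metric["properties"]}
    return sum(offered.get(n, 0) - v for n, v in required.items())

def evaluate(requirement, metrics, strategy=rank_by_margin):
    kept = [m for m in metrics if field_filter(requirement, m)]
    return sorted(kept, key=lambda m: strategy(requirement, m), reverse=True)

req = {"properties": [{"name": "keyLength", "value": 128}]}
metrics = [
    {"id": "blue1", "properties": [{"name": "keyLength", "value": 256}]},
    {"id": "blue2", "properties": [{"name": "keyLength", "value": 64}]},
]
print([m["id"] for m in evaluate(req, metrics)])  # -> ['blue1']
```

Passing a different `strategy` function mirrors the runtime-switchable Ranking Strategy described above.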
Figure 6: Simplified Architecture
3 https://fission.io/
4 https://cloud.google.com/knative/
Figure 7: Filtering process of PSE Sequence Diagram
Component API
POST /v1/filter
Purpose: This method filters and ranks one or more security- or privacy-related
blueprint properties against the specified user requirements.
Input: User Requirement (Object) and blueprintMetrics (Array)
{
  "requirement": {
    "id": "1",
    "name": "<any>",
    "type": "<any>",
    "properties": [
      { "name": "<any>", "unit": "<any>", "value": "<any>" },
      ...
    ]
  },
  "blueprintMetrics": [
    {
      "type": "object",
      "description": "<any>",
      "properties": {
        "id": "blue1",
        "name": "<any>",
        "type": "<any>",
        "properties": [
          { "name": "<any>", "unit": "<any>", "value": "<any>" },
          ...
        ]
      }
    },
    ...
  ]
}
Output: Result
[
  {
    "blueprint": { "id": "blue1", ... },
    "score": <num>
  },
  ...
]
3.2.4 Recommendation Component
The Recommendation Component is the final step of the resolution process. It
takes as input the list of ranked and filtered Blueprints, as well as the Application
Requirements, from the previous steps and reforms them (Requirements T3.21,
T3.22) in order to produce the complex queries that yield the user-based score
(recommendation) of the blueprints.
Figure 8: Recommendation Component Sequence Diagram
The architecture, as well as the technologies and features, of this component
was described in the previous deliverable (D3.2 Section 4.2 [2]). As described in
Section 3.2.1, the main focus of the development was the incorporation of
ElasticSearch 7 as well as the interoperability with the other resolution
components. Figure 8 depicts the sequence in which the recommendation
system produces the final blueprint score. For every Blueprint in the list of
candidates, the recommendation module queries the purchase repository to
find the purchase history of every blueprint and the application requirements of
the users that bought it. After correlating the stored application requirements
with the current ones, it produces a score that strongly takes this correlation into
account. In this way, it produces a recommendation that depends on what the
users needed from the specific Blueprint and how well this Blueprint fulfilled the
needs of the application. By correlating the different requirements, the system
gives strong weight to the scores of the users that had similar application
requirements. This allows the recommendation system to produce more
user-centric recommendations, rather than the technical filtering and ranking
that the other components produce.
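The idea of weighting past users' ratings by requirement similarity can be sketched as follows; the Jaccard similarity and the data layout are illustrative stand-ins for the actual correlation used by the component:

```python
# Simplified sketch of the recommendation idea: past buyers' requirements are
# compared with the current user's, and their ratings are weighted by that
# similarity to produce a user-centric score.

def similarity(reqs_a, reqs_b):
    """Jaccard similarity over requirement names, as a stand-in measure."""
    a, b = set(reqs_a), set(reqs_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend_score(current_reqs, purchase_history):
    """Average of past ratings, weighted by requirement similarity."""
    weights = [(similarity(current_reqs, p["requirements"]), p["rating"])
               for p in purchase_history]
    total = sum(w for w, _ in weights)
    return sum(w * r for w, r in weights) / total if total else 0.0

history = [
    {"requirements": ["accuracy", "latency"], "rating": 5.0},
    {"requirements": ["volume"], "rating": 1.0},  # dissimilar user, ignored
]
score = recommend_score(["accuracy", "latency"], history)
print(round(score, 2))  # -> 5.0
```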
Component API
POST /rateBlueprint
Purpose: This method compares the user requirements of other users that
acquired and used each proposed blueprint with the user requirements of the
current user, and rates the blueprints according to their similarity rating in
combination with their user rating.
Input: The proposed blueprint list from DURE as a JSON array, and a JSON object
representing the user requirements of the current user.
{
  "requirements": {
    "id": "1",
    "name": "<any>",
    "type": "<any>",
    "properties": [
      { "name": "<any>", "unit": "<any>", "value": "<any>" },
      ...
    ]
  },
  "blueprintList": [
    {
      "blueprint": { "id": "blue1", ... },
      "score": <num>
    },
    ...
  ]
}
Output:
[
  {
    "blueprint": { "id": "blue1", ... },
    "score": <num>,
    "rating": <num>
  },
  ...
]
4 Data Access Layer (DAL)
As already described in the Architecture deliverable D1.2 [4], the DAL is an element
of a VDC whose role is to expose the data provided by the Data Administrator
of the DITAS-EE infrastructure without violating privacy and security constraints.
The DAL includes the Privacy Enforcement Layer, the component in charge of
rewriting the SQL that must be executed to satisfy a call coming from the
Processing Layer into SQL that avoids returning data that must not be seen
externally for a given purpose. This filtering is affected mainly by the location of
the VDC and the purpose of the access. Since it is possible to move the
computation, i.e., the processing and the CAF layer, and this movement can
affect which data may be transmitted, an important assumption about the DAL
is that this layer is deployed in the same place where the data is stored, i.e., it is
invariant to computation movement. The DAL is always deployed in the same
security and privacy realm as the data source made available by the data
administrator, and it is in charge of providing the required connectivity between
the data source and the VDC processing while enforcing the privacy policies.
The DAL is also used in data movement by the Data Movement Enactor (DME).
When the movement strategy is to duplicate the data source somewhere else
(e.g., on the premises of the consumer), the DAL first ensures that only the data
that may be stored at that location are replicated. Secondly, a new instance of
the DAL is instantiated at the new location to perform access control after the
data is moved. The data movement process is initiated by the DME, which uses
the DAL API to move the data from the original data source to the target data
source. If the original data source can still change, data movement is performed
continuously, step by step, with part of the data moved at each step. The data
to be moved during a single step is described by an SQL query. As data
movement is a continuous process, the DME (or another component) contacts
the DAL repeatedly to keep the movement going. Both the DAL at the original
location and the DAL at the movement target have to expose the same API to
the processing layer of the VDC, but they might have to comply with different
restrictions based on their privacy zone. For example, data might be moved from
the private hospital cloud to the public cloud, and the same VDC for the
researcher application should be able to retrieve data from either. However,
whereas in the private cloud the data might have been stored in plaintext, in the
public cloud it might be stored encrypted. The DAL should then be able to
operate on plaintext data in the private cloud and on encrypted data in the
public cloud. To this end, the DAL contains both a flow for accessing plaintext
data and a flow for accessing encrypted data, choosing the flow based on the
concrete blueprint from which it is created. In the above data movement
example, when running in the private cloud before
the movement, the DAL would use the plaintext mode, and when running in the
public cloud after the movement, it would use the encrypted mode.
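The choice between the two access flows can be sketched as a simple dispatch on the privacy zone recorded in the concrete blueprint. The field name `privacy_zone` is a hypothetical stand-in for whatever the blueprint actually records.

```python
# Minimal sketch: the DAL picks its data-access flow from the privacy
# zone of the concrete blueprint it was created from. The field name
# "privacy_zone" is an assumption for illustration.

def select_access_flow(blueprint):
    zone = blueprint.get("privacy_zone", "private")
    return "encrypted" if zone == "public" else "plaintext"

print(select_access_flow({"privacy_zone": "public"}))   # encrypted
print(select_access_flow({"privacy_zone": "private"}))  # plaintext
```

Because the flow is fixed at blueprint level, the same DAL image can serve both zones without runtime negotiation.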
Figure 9: Initialization of DAL Data Movement Sequence Diagram
Figure 10: Finalization of DAL Data Movement Sequence Diagram
The Privacy Enforcement Engine acts as a proxy before the query is executed over
the data. It rewrites the query so that it returns only data compliant with the
privacy policies, evaluated together with user identity information. To this end, the
original query is augmented with filters based on the policies and on additional
attributes of the request or the data, such as the data subject's consent.
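A minimal sketch of this kind of query augmentation follows, assuming illustrative column names (`consent`, `anonymized`) and a purpose-based rule; the real Privacy Enforcement Engine operates on its actual policy model.

```python
# Sketch of policy-driven query augmentation; the column names
# ("consent", "anonymized") and the purpose rule are illustrative.

def augment_query(sql, purpose):
    filters = ["consent = 1"]             # require data-subject consent
    if purpose == "research":
        filters.append("anonymized = 1")  # stricter rule for research use
    clause = " AND ".join(filters)
    # Append to an existing WHERE clause, or add one.
    joiner = "AND" if " where " in sql.lower() else "WHERE"
    return f"{sql} {joiner} {clause}"

print(augment_query("SELECT name FROM patients", "research"))
```

The caller still issues its original query; the extra predicates are invisible to the Processing Layer, which only ever sees policy-compliant rows.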
Figure 11: Data transformation Sequence Diagram
In addition, the Privacy Enforcement Engine creates encryption properties that
the DAL later uses to activate decryption when it reads data frames upon
application data access, and encryption when it writes data frames during data
movement.
Figure 12: DAL Interconnection with CAF and Privacy Enforcement Engine
The protocol of communication between the DAL and the rest of the VDC is
gRPC since, on the one hand, it is generic enough and supports both the
request-response model and streaming well and, on the other hand, it can be
more efficient than plain REST over HTTP. The interface of the DAL to the
processing layer is described by a protobuf definition, from which both server
and client code are generated. This helps maintain consistency between the
DAL API and the data-processing DAL client.
The DAL component indirectly addresses requirement T3.15: computation can
be moved to a different network, and the VDC is still able to access the data
stores.
Component API
service QueryService {
rpc query (QueryRequest) returns (QueryReply) {}
}
Purpose This method runs the supplied query on the data sources
managed by this DAL.
Input message QueryRequest {
DalMessageProperties dalMessageProperties = 1;
DalPrivacyProperties dalPrivacyProperties = 2;
string query = 3;
repeated string queryParameters = 4;
}
DAL message properties include properties common to all
DAL messages.
DAL privacy properties include the properties on which the
Policy Enforcement Engine bases its policy decisions, such as
whether the data is in the private or the public zone.
The query field contains the query for fetching the data from the data sources.
Output message QueryReply {
    repeated string values = 1;
}
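Under the generated-stub workflow described above, a client call could look like the following sketch. `FakeQueryServiceStub` stands in for the stub protoc would generate, so the flow runs without the actual generated modules; the field layout follows the QueryRequest message, with the privacy properties collapsed to a single illustrative field.

```python
# Sketch of a QueryService.query call. FakeQueryServiceStub stands in
# for the client stub that protoc would generate from the DAL protobuf;
# all names here are illustrative, not the generated API.

class QueryRequest:
    """Mirrors the fields of the QueryRequest protobuf message;
    the privacy properties are collapsed to one illustrative field."""
    def __init__(self, query, query_parameters, privacy_zone):
        self.query = query
        self.query_parameters = query_parameters
        self.privacy_zone = privacy_zone

class FakeQueryServiceStub:
    def query(self, request):
        # A real stub serializes the request and performs the gRPC call;
        # here we echo a canned QueryReply-like dict.
        return {"values": [f"row for: {request.query}"]}

stub = FakeQueryServiceStub()
request = QueryRequest(
    query="SELECT * FROM patients WHERE id = ?",
    query_parameters=["42"],
    privacy_zone="private",
)
reply = stub.query(request)
print(reply["values"][0])
```

Because both sides are generated from the same protobuf, a change to the message definition propagates to server and client alike, which is the consistency benefit noted above.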
service DataMovementService {
rpc startDataMovement (StartDataMovementRequest) returns
(StartDataMovementReply) {}
rpc finishDataMovement (FinishDataMovementRequest) returns
(FinishDataMovementReply) {}
}
This DAL service enacts part of the operations needed for data movement. Its
methods are called by the Data Movement Enactor to start and to finish data
movement. When data movement is started, the source DAL creates a Parquet
file with all the data that needs to move. When data movement is finished, the
target DAL reads the Parquet file and persists the data at the data sources.
startDataMovement()
Purpose This method is called by the data movement enactor to start data
movement.
Input message StartDataMovementRequest {
DalMessageProperties dalMessageProperties = 1;
DalPrivacyProperties sourcePrivacyProperties = 2;
DalPrivacyProperties destinationPrivacyProperties = 3;
string query = 4;
repeated string queryParameters = 5;
string sharedVolumePath = 6;
}
The source and destination privacy properties specify whether the
source and target data sources are in the private or the public zone;
the Policy Enforcement Engine bases its policy decisions on this
information.
The query specifies the query to run on the data source in order to
extract the data to be moved.
The shared volume path identifies the volume shared between the
source and the target DAL for exchanging the data to be moved.
Output message StartDataMovementReply {
}
finishDataMovement()
Purpose This method is called by the Data Movement Enactor to finish data
movement.
Input
message FinishDataMovementRequest {
DalMessageProperties dalMessageProperties = 1;
DalPrivacyProperties sourcePrivacyProperties = 2;
DalPrivacyProperties destinationPrivacyProperties = 3;
string query = 4;
repeated string queryParameters = 5;
string sharedVolumePath = 6;
string targetDatasource = 7;
}
The source and destination privacy properties specify whether the
source and target data sources are in the private or the public zone;
the Policy Enforcement Engine bases its policy decisions on this
information.
The query specifies the query to run on the data frame in the file
shared between the source and target DALs.
The shared volume path identifies the volume shared between the
source and the target DAL for exchanging the data to be moved. The
target datasource specifies which datasource will accept the persisted data.
Output message FinishDataMovementReply { }
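The start/finish protocol can be sketched as the following loop, with in-memory stand-ins for the two DALs and the shared Parquet file; batch size and the method names on the fakes are illustrative, not the gRPC API itself.

```python
# Sketch of the continuous movement loop: the DME repeatedly asks the
# source DAL to start a step and the target DAL to finish it until no
# data is left. FakeDal models both DALs in memory; the returned batch
# stands in for the Parquet file on the shared volume.

class FakeDal:
    def __init__(self, rows):
        self.rows = list(rows)   # data still at this DAL's data source
        self.stored = []         # data persisted at the target

    def start_data_movement(self, query, batch):
        # Source side: extract the next batch onto the shared volume.
        shared, self.rows = self.rows[:batch], self.rows[batch:]
        return shared

    def finish_data_movement(self, shared):
        # Target side: persist the batch into its data source.
        self.stored.extend(shared)

source, target = FakeDal(range(5)), FakeDal([])
while True:
    shared = source.start_data_movement("SELECT ...", batch=2)
    if not shared:
        break
    target.finish_data_movement(shared)
print(target.stored)
```

Moving in bounded steps is what lets the DME keep up with a source that is still changing, as the text above describes.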
5 Application Profiling and Deployment Strategies
The decision about the deployment of the VDC and of all the components
related to the management of the access to the data sources is not trivial. When
a blueprint is selected by an application developer, and throughout its lifecycle,
deployment decisions should be based on knowledge about the typical usage
of the VDC by the application requiring access to the data source. This
knowledge makes it possible to exploit relevant information such as the
frequency of the requests to access the data source, the portion of the data
source usually accessed by the application, and the typical violations of the
Data Utility expressed by the consumer.
In order to support decisions about the VDC deployment, relevant information
is collected by the Application Profiling activity.
Application profiling aims at gathering relevant information collected from
different repositories (e.g., the concrete blueprint, the monitoring data, and the
analytics) and making it available as an overview of the VDC instance behavior.
This information is useful to describe the requirements of the application using
the data through the VDC, as well as the typical interaction between the
application and the VDC in accessing these data.
The information collected in the Application Profile can be used in two different
phases:
● at deployment time: the application profile provides valuable input for the
deployment decisions.
● at run time: the application profile supports the Decision System for Data
and Computation Movement (DS4M) when selecting a movement action
for satisfying the application requirements.
The application profile is created indirectly by the interaction between the DITAS
platform and the Application Owner. It can be considered virtual metadata,
since it is generated by gathering together data already produced by other
components of the DITAS architecture.
More in detail, the Application Profile is composed of:
● a task description. When expressing the application requirements, the
application designer describes the application requiring access to the
data. This description, used by the DURE and by the DUR, is important in
the deployment phase to select the Data Utility metrics relevant for the
application purposes, and is stored as information that can also be used
at run time.
● the application requirements. Together with a general classification of the
task, the application developer expresses the functional and
non-functional requirements of the application. While the functional
requirements are used at deployment time to filter the VDC blueprints that
fit the application request, the non-functional requirements are also used
at run time to validate the proper management of the application.
For each metric composing the application requirements, a value
constraint is expressed representing the desired upper or lower limit for that
metric. This information is stored in the profile.
● the application SLA established at deployment time. Similarly to the
application requirements, the SLA contains the upper or lower limit for the
Data Utility dimensions, but this value represents the agreement between
the application designer's requirements and the data administrator's
capabilities and might differ from the initial requirements. This information
is also stored in the application profile.
● the application execution logs. At run time, the monitoring component
and the analytics collect data about the requests of the application to
the platform and their outcome in terms of Data Utility. The execution logs
are relevant to discover typical issues related to the application requests
that persist over time, and can be exploited to improve requirement
satisfaction by suggesting a new data source or by modifying the
application deployment.
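The four ingredients above can be pictured as a single aggregated structure. This is an illustrative sketch, not the DITAS data model; field names and the upper-limit reading of the comparison are assumptions.

```python
# Illustrative aggregation of the Application Profile's four parts;
# field names and upper-limit metric semantics are assumptions.

from dataclasses import dataclass, field

@dataclass
class ApplicationProfile:
    task_description: str
    requirements: dict                 # metric -> requested upper limit
    sla: dict                          # metric -> agreed upper limit
    execution_logs: list = field(default_factory=list)

    def weaker_than_requested(self):
        """Metrics whose agreed SLA limit is weaker (higher) than the
        limit originally requested, assuming upper-limit metrics."""
        return [m for m, limit in self.requirements.items()
                if self.sla.get(m, limit) > limit]

profile = ApplicationProfile(
    task_description="researcher analytics over patient data",
    requirements={"latency_ms": 100},
    sla={"latency_ms": 150},
)
print(profile.weaker_than_requested())
```

Keeping requirements and the negotiated SLA side by side makes it easy to spot, at run time, where the agreement already departed from the designer's original intent.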
By analyzing the information collected by the Application Profiling, it is possible
to gain insight into the typical behavior that might be expected from the system,
focusing on two main aspects:
● Relevant data: not all the data provided by the data source are used by
the application. When deciding which data to move from the edge to
the cloud and vice versa, the knowledge of the frequently accessed
data should be taken into account. This knowledge is relevant to improve
the performance of data retrieval. As an example, when data movement
to a different location is needed, we can copy to the new source the
data that are most likely to be used in the near future instead of moving
the whole data source.
● Data and computation resources reliability: in a fog environment,
connections between the cloud and the edge are often unreliable.
Some resources can be offline at some point in time, making
communication between the cloud and the edge impossible.
By observing the typical behavior of the resources in terms of connectivity
and reliability, we can use this information to prevent connectivity issues
when deciding where to place the data and the computation.
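As an illustration of the "relevant data" analysis, the following sketch ranks tables by access frequency from hypothetical execution-log entries; the log format is an assumption for illustration.

```python
# Hedged sketch: deriving "relevant data" from execution logs by access
# frequency, to choose what to replicate on data movement. The log
# entry format ("table" key) is an illustrative assumption.

from collections import Counter

def hot_tables(access_log, top_n=2):
    """Return the most frequently accessed tables, the best candidates
    for replication closer to the application."""
    counts = Counter(entry["table"] for entry in access_log)
    return [table for table, _ in counts.most_common(top_n)]

log = [{"table": "visits"}, {"table": "patients"},
       {"table": "visits"}, {"table": "billing"}, {"table": "visits"}]
print(hot_tables(log))
```

Copying only such hot tables, rather than the whole data source, is the optimization the bullet above describes.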
6 DITAS SDK
The major goal of the DITAS SDK is to manage the life-cycle of the VDC, which is
directly connected to the life-cycle of the VDC Blueprint (i.e., the descriptor of
the VDC), whose goal is manifold:
● to describe the characteristics of the exposed data sources;
● to support the application designer when looking for a dataset that
could be interesting for his/her purposes;
● to support the DITAS execution environment to properly deploy all the
components composing the VDC needed to expose the data.
Different roles are involved in the management of the VDC:
● The data administrator is the owner of the data sources and has complete
knowledge of them. The data administrator takes advantage of DITAS to
enable the provisioning of internal data that s/he would like to make
accessible to other subjects. Depending on the subject and the consent
of usage, the visibility of these data can be partial or total. With DITAS, the
data administrator can simplify the process of making her/his data
available, as, through the VDC, the DITAS platform is able to optimize the
data provisioning by means of data and computation movement. In fact,
the data administrator only has to define the exposed API, i.e., the
Common Access Framework (CAF), reflecting the methods to access the
data.
● The application developer is the actor in charge of creating the VDC.
Based on the data sources made available by the data administrator,
s/he is responsible for writing the code that exposes the API defined by
the data administrator. Depending on the case, the data processing
developed can be a simple connection to the provided data sources or
complex data analytics. As a result, the application developer is able to
provide a complete specification of a VDC. It is worth noting that in
several cases the same actor will hold both the data administrator and
the application developer roles.
● The application designer represents the service consumer, and her/his
goal is twofold. On the one hand, the goal is to select the most suitable
VDC with respect to her/his requirements. For this reason, the DITAS
platform has to provide a matchmaker able to compare the application
requirements with the capabilities offered by a VDC. This matchmaking is
mainly driven by the data utility, which encompasses quality of service,
quality of data, and reputation aspects. On the other hand, s/he has to
check whether the VDC is really providing what has been promised, from
both a functional and a non-functional perspective.
● The DITAS operator is responsible for the run-time platform; this includes the
responsibility for keeping the applications running. The system operator
has no specific application or data knowledge, but rather depends on
the monitoring tools to verify that all the applications are running properly,
to monitor the corrective actions the DITAS platform is taking, and to
provide feedback at design-time by suggesting refinements of the data
utility specification.
For each of these roles, DITAS provides a dedicated SDK5, which simplifies the life
of the actors involved in the management of the related VDC. Depending on
the role, the SDK is provided in different flavors, e.g., CLI, GUI, or web applications.
Details on the SDK offered for each of the roles follow:
● SDK for Data Administrator6
● SDK for VDC Developer (Application Developer)7
● SDK for Application Designer8
● SDK for DITAS Operator9
5 https://www.ditas-project.eu/wiki/ditas-sdk/
6 https://www.ditas-project.eu/wiki/guide-for-data-administrator/
7 https://www.ditas-project.eu/wiki/guide-for-vdc-developer/
8 https://www.ditas-project.eu/wiki/guide-for-application-designer/
9 https://www.ditas-project.eu/wiki/guide-for-ditas-operator/
7 Conclusions
Virtualizing the data sources and creating an end-to-end system that provides
all the functionalities needed to create, discover, deploy, and monitor a VDC
requires several components. Creating all these components, as well as ensuring
the interoperability between them, is one of the main focuses of the
development in this work package. Although creating the components is
essential to the project, creating an SDK that helps all the relevant parties
reproduce and run the system is also of great importance. This SDK contains
all the services, guidelines and UI documentation essential to the usability of the
DITAS platform. Taking into consideration all the established requirements, as
well as the new and reshaped ones that emerged throughout the project, the
components were extended or reworked in order to fulfill them. In addition, in
the context of this document, the DAL, a component that was introduced later
in the course of the project, is fully described, with all the functionalities that it
provides. As far as the SDK is concerned, since the project is evolving rapidly
and new functionalities or changes to existing ones are being made, a number
of dedicated wiki pages, which can be easily updated and are publicly
accessible, were established to document all the relevant information about
the SDK.
8 References
[1] Deliverable D2.2 of DITAS project: “DITAS Data Management – second
release”. © DITAS Consortium, 2018.
[2] Deliverable D3.2 of DITAS Project: “Data Virtualization SDK prototype (initial
version)”. © DITAS Consortium, 2018.
[3] Deliverable D4.2 of DITAS Project: “Execution environment prototype (first
release)”. © DITAS Consortium, 2018.
[4] Deliverable D1.2 of DITAS Project: “Final DITAS architecture and validation
approach”. © DITAS Consortium, 2019.
[5] Werner, Sebastian, Jörn Kuhlenkamp, Markus Klems, Johannes Müller, and
Stefan Tai. "Serverless Big Data Processing using Matrix Multiplication as
Example." In 2018 IEEE International Conference on Big Data (Big Data),
pp. 358-365. IEEE, 2018.
[6] Kuhlenkamp, Jörn, and Sebastian Werner. "Benchmarking FaaS Platforms:
Call for Community Participation." In 2018 IEEE/ACM International
Conference on Utility and Cloud Computing Companion (UCC
Companion), pp. 189-194. IEEE, 2018.
Appendix
Final Abstract VDC Blueprint Schema
{
"type":"object",
"description":"This is a VDC Blueprint which consists of five
sections",
"properties":{
"INTERNAL_STRUCTURE":{
"type":"object",
"description":"General information about the VDC
Blueprint",
"properties":{
"Overview":{
"type":"object",
"properties":{
"name":{
"type":"string",
"description":"This field should contain the
name of the VDC Blueprint"
},
"description":{
"type":"string",
"description":"This field should contain a
short description of the VDC Blueprint"
},
"tags":{
"type":"array",
"description":"Each element of this array
should contain some keywords that describe the functionality of each
one exposed VDC method",
"items":{
"type":"object",
"properties":{
"method_id":{
"type":"string",
"description":"The id (operationId) of
the method (as indicated in the EXPOSED_API.paths field)"
},
"tags":{
"type":"array",
"items":{
"type":"string"
},
"minItems":1,
"uniqueItems":true
}
},
"additionalProperties":false,
"mandatory":[
"method_id",
"tags"
]
},
"minItems":1,
"uniqueItems":true
}
},
"additionalProperties":false,
"required":[
"name",
"description",
"tags"
]
},
"Data_Sources":{
"type":"array",
"items":{
"type":"object",
"properties":{
"id":{
"type":"string",
"description":"A unique identifier"
},
"description":{
"type":"string"
},
"location":{
"enum":[
"cloud",
"edge"
]
},
"class":{
"enum":[
"relational database",
"object storage",
"time-series database",
"api",
"data stream"
]
},
"type":{
"enum":[
"MySQL",
"Minio",
"InfluxDB",
"rest",
"other"
]
},
"parameters":{
"type":"object",
"description":"Connection parameters"
},
"schema":{
"type":"object"
}
},
"required":[
"id"
]
},
"minItems":1,
"uniqueItems":true
},
"Methods_Input":{
"type":"object",
"description":"This filed contains the part of the
data source that each method needs to be executed",
"properties":{
"Methods":{
"type":"array",
"description":"The list of methods",
"items":{
"type":"object",
"properties":{
"method_id":{
"type":"string",
"description":"The id (operationId) of
the method (as indicated in the EXPOSED_API.paths field)"
},
"dataSources":{
"type":"array",
"description":"The list of data
sources required by the method",
"items":{
"type":"object",
"properties":{
"dataSource_id":{
"type":"string",
"description":"The id of the
data sources (as indicated in the Data_Sources field)"
},
"dataSource_type":{
"type":"string",
"description":"The type of
the data sources (relational/not_relational/object)",
},
"database":{
"type":"array",
"description":"the list of
databases required by a method in a data source",
"items":{
"type":"object",
"properties":{
"database_id":{
"type":"string",
"description":"The
id of the database"
},
"tables":{
"type":"array",
"description":"the
list of tables/collections required by a method in a data source",
"items":{
"type":"object",
"properties":{
"table_id":{
"type":"string",
"description":"The id of the tables/collection "
},
"columns":{
"type":"array",
"items":{
"type":"object",
"properties":{
"column_id":{
"type":"string",
"description":"The id of the column/field"
},
"computeDataUtility":{
"type":"boolean",
"description":"True if it is required for data utility computation"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
},
"Flow":{
"type":"object",
"description":"The data flow that implements the
VDC",
"properties":{
"platform":{
"enum":[
"Spark",
"Node-RED"
]
},
"parameters":{
"type":"object"
},
"source_code":{
}
}
},
"DAL_Images":{
"description":"Docker images that must be deployed in
the DAL indexed by DAL name. It will be used to compose the service
name and the DNS entry that other images in the cluster can access
to.",
"type":"object",
"additionalProperties":{
"description":"Information about the DAL including
its original location",
"type":"object",
"required":[
"original_ip"
],
"properties":{
"original_ip":{
"description":"IP of the original DAL's
location",
"type":"string"
},
"images":{
"description":"Set of images to deploy
indexed by the image identifier",
"type":"object",
"additionalProperties":{
"description":"ImageInfo is the
information about an image that will be deployed by the deployment
engine",
"type":"object",
"required":[
"image"
],
"properties":{
"external_port":{
"description":"Port in which this
image must be exposed. It must be unique across all images in all
the ImageSets defined in this blueprint. Due to limitations in k8s,
the port range must be bewteen 30000 and 32767",
"type":"integer",
"format":"int64"
},
"image":{
"description":"Image is the image
name in the standard format [group]/<image_name>:[release]",
"type":"string"
},
"internal_port":{
"description":"Port in which the
docker image is listening internally. Two images inside the same
ImageSet can't have the same internal port.",
"type":"integer",
"format":"int64"
}
}
}
}
}
}
},
"VDC_Images":{
"$ref":"#/properties/INTERNAL_STRUCTURE/properties/DAL_Images/additi
onalProperties/properties/images"
},
"Identity_Access_Management":{
"type":"object",
"properties":{
"jwks_uri":{
"type":"string"
},
"iam_endpoint":{
"type":"string"
},
"roles":{
"type":"array",
"items":{
"type":"string"
},
"minItems":1
},
"provider":{
"type":"array",
"items":{
"type":"object",
"properties":{
"name":{
"type":"string"
},
"type":{
"type":"string"
},
"uri":{
"type":"string"
},
"loginPortal":{
"type":"string"
}
},
"required":[
"name",
"uri"
]
},
"minItems":1
}
},
"required":[
"jwks_uri",
"iam_endpoint"
]
},
"Testing_Output_Data":{
"type":"array",
"items":{
"type":"object",
"properties":{
"method_id":{
"type":"string",
"description":"The id (operationId) of the
method (as indicated in the EXPOSED_API.paths field)"
},
"zip_data":{
"type":"string",
"description":"The URI to the zip testing
output data for each one exposed VDC method"
}
},
"additionalProperties":false,
"required":[
"method_id",
"zip_data"
]
},
"minItems":1,
"uniqueItems":true
}
},
"additionalProperties":false,
"required":[
"Overview",
"Data_Sources"
]
},
"DATA_MANAGEMENT":{
"description":"list of methods",
"type":"array",
"items":{
"type":"object",
"properties":{
"method_id":{
"description":"The id (operationId) of the method
(as indicated in the EXPOSED_API.paths field)",
"type":"string"
},
"attributes":{
"type":"object",
"description":"goal trees",
"properties":{
"dataUtility":{
"type":"array",
"items":{
"type":"object",
"description":"definition of the metric",
"properties":{
"id":{
"description":"id of the metric",
"type":"string"
},
"name":{
"description":"name of the metric",
"type":"string"
},
"type":{
"description":"type of the metric",
"type":"string"
},
"properties":{
"type":"object",
"description":"properties related
to the metric",
"additionalProperties":{
"type":"object",
"description":"properties
related to the metric",
"properties":{
"unit":{
"description":"unit of
measure of the property",
"type":"string"
},
"maximum":{
"description":"lower limit
of the offered property",
"type":"number"
},
"minimum":{
"description":"upper limit
of the offered property",
"type":"number"
},
"value":{
"description":"value of
the property",
"type":[
"string",
"number",
"array",
"boolean"
]
}
}
}
}
}
}
},
"security":{
"$ref":"#/properties/DATA_MANAGEMENT/items/properties/attributes/pro
perties/dataUtility"
},
"privacy":{
"$ref":"#/properties/DATA_MANAGEMENT/items/properties/attributes/pro
perties/dataUtility"
}
}
}
},
"required":[
"method_id",
"attributes"
]
}
},
"ABSTRACT_PROPERTIES":{
},
"COOKBOOK_APPENDIX":{
"description":"CookbookAppendix is the definition of the
Cookbook Appendix section in the blueprint",
"type":"object",
"required":[
"Resources",
"Deployment"
],
"properties":{
"Identity_Access_Management":{
"type":"object",
"properties":{
"validation_keys":{
"type":"array",
"items":{
"type":"object"
}
},
"mapping":{
"type":"array",
"items":{
"oneOf":[
{
"type":"object",
"properties":{
"provider":{
"type":"string"
},
"roles":{
"type":"array",
"items":{
"type":"string"
}
},
"role_map":{
"type":"array",
"items":{
"type":"object",
"properties":{
"matcher":{
"type":"string"
},
"roles":{
"type":"array",
"items":{
"type":"string"
}
},
"priority":{
"type":"number"
}
}
}
},
"mapping_url":{
"enum":[
""
]
}
},
"required":[
"role_map"
]
},
{
"type":"object",
"properties":{
"provider":{
"type":"string"
},
"roles":{
"type":"array",
"items":{
"type":"string"
}
},
"mapping_url":{
"type":"string"
},
"role_map":{
"enum":[
""
]
}
},
"required":[
"mapping_url"
]
}
]
}
}
},
"required":[
"mapping"
]
},
"Deployment":{
"description":"DeploymentInfo contains information of
a deployment than may compromise several clusters",
"type":"object",
"required":[
"id"
],
"properties":{
"extra_properties":{
"type":"object",
"title":"ExtraPropertiesType represents extra
properties to define for resources, infrastructures or deployments.
This properties are provisioner or deployment specific and they
should document them when they expect any.",
"additionalProperties":{
"type":"string"
}
},
"id":{
"description":"Unique ID for the deployment",
"type":"string",
"uniqueItems":true
},
"infrastructures":{
"description":"Lisf of infrastructures, each
one representing a different cluster.",
"type":"object",
"additionalProperties":{
"type":"object",
"title":"InfrastructureDeploymentInfo
contains information about a cluster of nodes that has been
instantiated or were already existing.",
"required":[
"id",
"type",
"provider",
"Nodes"
],
"properties":{
"Nodes":{
"description":"Set of nodes in the
infrastructure indexed by role",
"type":"object",
"additionalProperties":{
"type":"array",
"items":{
"description":"NodeInfo is the
information of a virtual machine that has been instantiated or a
physical one that was pre-existing",
"type":"object",
"required":[
"ip",
"drive_size"
],
"properties":{
"cores":{
"description":"Number of
cores.",
"type":"integer",
"format":"int64"
},
"cpu":{
"description":"CPU speed
in Mhz.",
"type":"integer",
"format":"int64"
},
"data_drives":{
"description":"Data drives
information",
"type":"array",
"items":{
"description":"DriveInfo is the information of a drive that has been
instantiated",
"type":"object",
"required":[
"name",
"size"
],
"properties":{
"name":{
"description":"Name of the data drive",
"type":"string",
"uniqueItems":true
},
"size":{
"description":"Size of the disk in bytes",
"type":"integer",
"format":"int64"
}
}
}
},
"drive_size":{
"description":"Size of the
boot disk in bytes",
"type":"integer",
"format":"int64",
"uniqueItems":true
},
"extra_properties":{
"type":"object",
"title":"ExtraPropertiesType represents extra properties to define
for resources, infrastructures or deployments. This properties are
provisioner or deployment specific and they should document them
when they expect any.",
"additionalProperties":{
"type":"string"
}
},
"hostname":{
"description":"Hostname of
the node.\nrequiered:true",
"type":"string",
"uniqueItems":true
},
"ip":{
"description":"IP assigned
to this node.",
"type":"string",
"uniqueItems":true
},
"ram":{
"description":"RAM
quantity in bytes.",
"type":"integer",
"format":"int64"
},
"role":{
"description":"Role of the
node. Master or slave in case of Kubernetes.",
"type":"string",
"example":"master"
}
}
}
}
},
"VDM":{
"description":"Set weather the VDM is
running in this cluster or not",
"type":"boolean"
},
"extra_properties":{
"type":"object",
"title":"ExtraPropertiesType
represents extra properties to define for resources, infrastructures
or deployments. This properties are provisioner or deployment
specific and they should document them when they expect any.",
"additionalProperties":{
"type":"string"
}
},
"id":{
"description":"Unique infrastructure
ID on the deployment",
"type":"string",
"uniqueItems":true
},
"name":{
"description":"Name of the
infrastructure",
"type":"string"
},
"provider":{
"description":"CloudProviderInfo
contains information about a cloud provider",
"type":"object",
"required":[
"api_endpoint"
],
"properties":{
"api_endpoint":{
"description":"Endpoint to use
for this infrastructure",
"type":"string"
},
"api_type":{
"description":"Type of the
infrastructure. i.e AWS, Cloudsigma, GCP or Edge",
"type":"string"
},
"credentials":{
"description":"Credentials to
access the cloud provider. Either this or secret_id is mandatory.
Each cloud provider should define the format of this element.",
"type":"object",
"additionalProperties":{
"type":"string"
}
},
"secret_id":{
"description":"Secret identifier
to use to log in to the infrastructure manager.",
"type":"string"
}
}
},
"status":{
"description":"Status of the
infrastructure",
"type":"string"
},
"type":{
"description":"Type of the
infrastructure: cloud or edge",
"type":"string",
"pattern":"cloud|edge"
},
"vdcs":{
"description":"Configuration of VDCs
running in the cluster, indexed by VDC identifier.",
"type":"object",
"additionalProperties":{
"description":"VDCInfo contains
information about related to a VDC running in a kubernetes cluster",
"type":"object",
"properties":{
"Ports":{
"type":"object",
"additionalProperties":{
"type":"integer",
"format":"int64"
}
}
}
}
}
}
}
},
"name":{
"description":"Name of the deployment",
"type":"string"
},
"status":{
"description":"Global status of the
deployment",
"type":"string"
}
}
},
"Resources":{
"description":"Deployment is a set of infrastructures
that need to be instantiated or configurated to form clusters",
"type":"object",
"required":[
"name",
"infrastructures"
],
"properties":{
"description":{
"description":"Optional description",
"type":"string"
},
"infrastructures":{
"description":"List of infrastructures to
deploy for this hybrid deployment",
"type":"array",
"items":{
"description":"InfrastructureType is a set
of resources that need to be created or configured to form a
cluster",
"type":"object",
"required":[
"name",
"resources"
],
"properties":{
"description":{
"description":"Optional description
for the infrastructure",
"type":"string"
},
"extra_properties":{
"type":"object",
"title":"ExtraPropertiesType
represents extra properties to define for resources, infrastructures
or deployments. This properties are provisioner or deployment
specific and they should document them when they expect any.",
"additionalProperties":{
"type":"string"
}
},
"name":{
"description":"Unique name for the
infrastructure",
"type":"string",
"uniqueItems":true
},
"provider":{
"description":"CloudProviderInfo
contains information about a cloud provider",
"type":"object",
"required":[
"api_endpoint"
],
"properties":{
"api_endpoint":{
"description":"Endpoint to use
for this infrastructure",
"type":"string"
},
"api_type":{
"description":"Type of the
infrastructure. i.e AWS, Cloudsigma, GCP or Edge",
"type":"string"
},
"credentials":{
"description":"Credentials to
access the cloud provider. Either this or secret_id is mandatory.
Each cloud provider should define the format of this element.",
"type":"object",
"additionalProperties":{
"type":"string"
}
},
"secret_id":{
"description":"Secret identifier
to use to log in to the infrastructure manager.",
"type":"string"
}
}
},
"resources":{
"description":"List of resources to
deploy",
"type":"array",
"items":{
"type":"object",
"title":"ResourceType has
information about a node that needs to be created by a deployer.",
"required":[
"name",
"disk",
"image_id"
],
"properties":{
"cores":{
"description":"Number of
cores. Ignored if type is provided",
"type":"integer",
"format":"int64"
},
"cpu":{
"description":"CPU speed in
Mhz. Ignored if type is provided",
"type":"integer",
"format":"int64"
},
"disk":{
"description":"Boot disk size
in Mb",
"type":"integer",
"format":"int64"
},
"drives":{
"description":"List of data
drives to attach to this VM",
"type":"array",
"items":{
"description":"Drive holds
information about a data drive attached to a node",
"type":"object",
"required":[
"name",
"size"
],
"properties":{
"name":{
"description":"Unique name for the drive",
"type":"string"
},
"size":{
"description":"Size
of the disk in Mb",
"type":"integer",
"format":"int64"
},
"type":{
"description":"Type
of the drive. It can be \"SSD\" or \"HDD\"",
"type":"string",
"pattern":"SSD|HDD",
"example":"SSD"
}
}
}
},
"extra_properties":{
"type":"object",
"title":"ExtraPropertiesType
represents extra properties to define for resources, infrastructures
or deployments. This properties are provisioner or deployment
specific and they should document them when they expect any.",
"additionalProperties":{
"type":"string"
}
},
"image_id":{
"description":"Boot image ID
to use",
"type":"string"
},
"ip":{
"description":"IP to assign
this VM. In case it's not specified, the first available one will be
used.",
"type":"string"
},
"name":{
"description":"Suffix for the
hostname. The real hostname will be formed of the infrastructure
name + resource name",
"type":"string",
"uniqueItems":true
},
"ram":{
"description":"RAM quantity
in Mb. Ignored if type is provided",
"type":"integer",
"format":"int64"
},
"role":{
"description":"Role that this
VM plays. In case of a Kubernetes deployment at least one \"master\"
is needed.",
"type":"string"
},
"type":{
"description":"Type of the VM
to create i.e. n1-small",
"type":"string",
"example":"n1-small"
}
}
}
},
"type":{
"description":"Type of the
infrastructure: Cloud or Edge: Cloud infrastructures mean that the
resources will be VMs that need to be instantiated. Edge means that
the infrastructure is already in place and its information will be
added to the database but no further work will be done by a
deployer.",
"type":"string"
}
}
}
},
"name":{
"description":"Name for this deployment",
"type":"string",
"uniqueItems":true
}
}
}
}
},
"EXPOSED_API":{
"title":"CAF API",
"type":"object",
"description":"The CAF RESTful API of the VDC, written
according to the current version (3.0.1) of the OpenAPI
Specification (OAS), but also adapted to DITAS requirements",
"properties":{
"paths":{
"type":"object",
"patternProperties":{
"^/":{
"type":"object",
"patternProperties":{
"^get$":{
"allOf":[
{
"$ref":"#/properties/EXPOSED_API/definitions/method"
},
{
"properties":{
"parameters":{
}
}
}
]
},
"^post$":{
"allOf":[
{
"$ref":"#/properties/EXPOSED_API/definitions/method"
},
{
"properties":{
"requestBody":{
"type":"object",
"properties":{
"content":{
"$ref":"#/properties/EXPOSED_API/definitions/content"
}
}
}
},
"required":[
"requestBody"
]
}
]
}
}
}
}
}
},
"definitions":{
"method":{
"title":"An Exposed VDC Method",
"type":"object",
"description":"Corresponds to the Operation Object
defined in the OpenAPI Specification (OAS) version 3.0.1",
"properties":{
"summary":{
},
"operationId":{
},
"responses":{
"type":"object",
"patternProperties":{
"^200$|^201$":{
"type":"object",
"properties":{
"content":{
"$ref":"#/properties/EXPOSED_API/definitions/content"
}
},
"required":[
"content"
]
}
}
},
"x-data-sources":{
"type":"array",
"description":"An array that contains all the
identifiers of the data sources (as indicated in the
© Main editor and other members of the DITAS consortium
84 D3.3 Data Virtualization SDK prototype
INTERNAL_STRUCTURE.Data_Sources field) that are accessed by the
method",
"items":{
"type":"string"
},
"minItems":1,
"uniqueItems":true
},
"x-iam-roles":{
"type":"array",
"items":{
"type":"string"
}
}
},
"required":[
"summary",
"operationId",
"responses",
"x-data-sources"
]
},
"content":{
"type":"object",
"patternProperties":{
"^application/json$":{
"type":"object",
"properties":{
"schema":{
"type":"object"
}
},
"required":[
"schema"
]
}
}
}
}
}
},
"additionalProperties":false,
"required":[
"INTERNAL_STRUCTURE",
"DATA_MANAGEMENT",
"ABSTRACT_PROPERTIES",
"COOKBOOK_APPENDIX",
"EXPOSED_API"
]
}
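For readability, the following is a small, hand-written fragment showing what a Deployment object conforming to the schema above could look like. All identifiers, addresses, endpoints and sizes are invented for illustration and do not come from a real DITAS deployment.

```json
{
  "Deployment": {
    "id": "deployment-001",
    "name": "example-deployment",
    "status": "running",
    "infrastructures": {
      "infra-1": {
        "id": "infra-1",
        "name": "cloud-cluster",
        "type": "cloud",
        "provider": {
          "api_endpoint": "https://provider.example.com/api",
          "api_type": "AWS",
          "secret_id": "example-secret"
        },
        "VDM": true,
        "Nodes": {
          "master": [
            {
              "ip": "10.0.0.1",
              "drive_size": 10737418240,
              "role": "master",
              "cores": 2,
              "ram": 4294967296
            }
          ]
        },
        "vdcs": {
          "vdc-1": {
            "Ports": {
              "caf": 8080
            }
          }
        }
      }
    }
  }
}
```

Note that the example respects the required fields of the schema: the deployment carries an "id", each infrastructure carries "id", "type", "provider" and "Nodes", the provider carries an "api_endpoint", and each node carries "ip" and "drive_size".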