
D2.4 EUXDAT e-Infrastructure Definition v2


This document is issued within the frame and for the purpose of the EUXDAT project. This project has received funding from the European Union’s Horizon2020 Framework Programme under Grant Agreement No. 777549. The opinions expressed and arguments employed herein do not necessarily reflect the official views of the European Commission.

This document and its content are the property of the EUXDAT Consortium. All rights relevant to this document are determined by the applicable laws. Access to this document does not grant any right or license on the document or its contents. This document or its contents are not to be used or treated in any manner inconsistent with the rights or interests of the EUXDAT Consortium or the Partners detriment and are not to be disclosed externally without prior written consent from the EUXDAT Partners.

Each EUXDAT Partner may use this document in conformity with the EUXDAT Consortium Grant Agreement provisions.

(*) Dissemination level.-PU: Public, fully open, e.g. web; CO: Confidential, restricted under conditions set out in Model Grant Agreement; CI: Classified, Int = Internal Working Document, information as referred to in Commission Decision 2001/844/EC.

D2.4 EUXDAT e-Infrastructure Definition v2

Keywords:

Data Analytics, Big Data, e-Infrastructure, Architecture, Design, EUXDAT

Document Identification

Status Final Due Date 31/01/2019

Version 1.0 Submission Date 29/04/2019

Related WP WP2 Document Reference D2.4

Related Deliverable(s) D2.1, D2.2, D2.3 Dissemination Level (*) PU

Lead Participant F. Javier Nieto (ATOSES) Lead Author F. Javier Nieto (ATOSES)

Contributors ATOSES, USTUTT, ATOSFR

Reviewers Marcela Doubkova (PESSL), Fabien Castel (ATOSFR)


Document name: D2.4 EUXDAT e-Infrastructure Definition v2 Page: 2 of 47

Reference: D2.4 Dissemination: PU Version: 1.0 Status: Final

Document Information

List of Contributors

Name Partner

F. Javier Nieto ATOSES

Spiros Michalakopoulos ATOSES

Nico Struckmann USTUTT

Fabien Castel ATOSFR

Document History

Version Date Change editors Changes

0.1 09/01/2019 F. J. Nieto (ATOSES) Table of Contents

0.2 14/01/2019 F. J. Nieto (ATOSES) ToC update and section 2

0.3 25/01/2019 F. J. Nieto (ATOSES) Sections 2 and 3

0.4 08/02/2019 F. J. Nieto (ATOSES), F. Castel (ATOSFR), S. Michalakopoulos (ATOSES) Updates to section 3, contributions to sections 4 and 5

0.5 22/02/2019 F. J. Nieto (ATOSES), N. Struckmann (USTUTT), S. Michalakopoulos (ATOSES) Add section 6 contributions. Update content in sections 3 to 5.

0.6 27/02/2019 F. J. Nieto (ATOSES) Minor updates and section 7.

0.7 17/04/2019 F. J. Nieto (ATOSES) Changes according to review comments

0.8 29/04/2019 F. J. Nieto (ATOSES) Final version for quality review

0.9 30/04/2019 ATOS ES Quality review

U 30/04/2019 FINAL VERSION TO BE SUBMITTED

Quality Control

Role Who (Partner short name) Approval Date

Deliverable leader F. Javier Nieto (ATOSES) 29/04/2019


Technical manager Fabien Castel (ATOSFR) 29/04/2019

Quality manager Susana Palomares (ATOSES) 29/04/2019

Project Manager F. Javier Nieto (ATOSES) 24/09/2019


Table of Contents

Document Information
Table of Contents
List of Tables
List of Figures
List of Acronyms
1. Executive Summary
2. Introduction
  2.1 Relation to other project work
  2.2 Structure of the document
3. EUXDAT Features
  3.1 Requirements Analysis
  3.2 Main EUXDAT Features
    Support for Several Data Formats
    Algorithms and Applications Management
    Data Management and Processing
    Security and Users Management
    Visualization and Interaction Capabilities
    Management of Computing and Storage Resources
    Extreme Data Analytics as a Service
  3.3 Mapping Requirements and Features
4. EUXDAT Architecture
  4.1 High Level Architecture
  4.2 Main Actors
  4.3 Main Components
  4.4 High Level Interactions
    Running a data analysis in EUXDAT
    Register User
    Visualize analysis in custom GUI
  4.5 Development Priorities and Roadmap
5. Detailed Design of Main Components
  5.1 EUXDAT Portal
  5.2 Identity and Authorization Manager
  5.3 Data and Algorithms Catalogue
  5.4 Data and Algorithms Repository
  5.5 Data Manager
  5.6 SLA Manager
  5.7 Orchestrator
  5.8 Monitoring
  5.9 Billing & Accounting
6. EUXDAT Deployment
  6.1 Deployment Infrastructure
    Deployment
    Stages
    API Development
    Services
  6.2 Components Deployment
7. Conclusions
8. References


List of Tables

Table 1: Requirements Traceability Matrix


List of Figures

Figure 1: EUXDAT High Level Architecture
Figure 2: Running Data Analysis Sequence Diagram
Figure 3: User Registration Sequence Diagram
Figure 4: Visualize Analysis in Custom GUI Sequence Diagram
Figure 5: EUXDAT Portal High Level Architecture
Figure 6: I&A Manager High Level Architecture
Figure 7: D&A Catalogue High Level Architecture
Figure 8: D&A Repository High Level Architecture
Figure 9: Data Manager High Level Architecture
Figure 10: SLA Manager High Level Architecture
Figure 11: Orchestrator High Level Architecture
Figure 12: Monitoring High Level Architecture
Figure 13: Billing & Accounting High-level Architecture
Figure 14: EUXDAT Deployment
Figure 15: EUXDAT Deployment Stages
Figure 16: EUXDAT API Development
Figure 17: EUXDAT Deployment Services


List of Acronyms

Abbreviation / acronym

Description

AMQP Advanced Message Queuing Protocol

API Application Programming Interface

ASTER Advanced Spaceborne Thermal Emission and Reflection Radiometer

AWS Amazon Web Services

CD Continuous Delivery

CEP Complex Event Processing

CI Continuous Integration

CoAP Constrained Application Protocol

C-SAR CARIS Spatial Archive

DEM Digital Elevation Model

DIAS Data and Information Access Services

Dx.y Deliverable number y belonging to WP x

EBDVF European Big Data Value Forum

EC European Commission

ECMWF European Centre for Medium-Range Weather Forecasts

EDI Electronic Data Interchange

EO Earth Observation

FTP File Transfer Protocol

GDPR General Data Protection Regulation

GRD Surfer Grid File

GUI Graphical User Interface

HPC High Performance Computing

HTTPS Hypertext Transfer Protocol Secure

IoT Internet of Things

JPEG Joint Photographic Experts Group


JSON JavaScript Object Notation

JWE JSON Web Encryption

JWT JSON Web Token

KPI Key Performance Indicator

L1C Level-1C

LAI Leaf Area Index

LDAP Lightweight Directory Access Protocol

LPIS Land Parcel Identification System

MODIS Moderate Resolution Imaging Spectroradiometer

MQTT Message Queuing Telemetry Transport

MSI Mass Spectrometry Imaging

NDVI Normalized Difference Vegetation Index

OEM Object Exchange Model

OTM Open Transport Map

PDF Portable Document Format

QoS Quality of Service

Q&A Questions and Answers

REST Representational State Transfer

RGB Red, Green, Blue

RPAS Remotely Piloted Aircraft Systems

SLA Service Level Agreement

SLC SLiCe format

SOAP Simple Object Access Protocol

SSH Secure SHell

TIF Tagged Image File Format

TOSCA Topology and Orchestration Specification for Cloud Applications

UAV Unmanned Aerial Vehicle

UML Unified Modelling Language

VIIRS Visible Infrared Imaging Radiometer Suite

VM Virtual Machine


WMS Web Map Service

WP Work Package

XML eXtensible Markup Language


1. Executive Summary

This document provides an update of the high-level view of the EUXDAT e-Infrastructure. First, taking into account the original requirements and their update, it revises the description of the features that the project team has identified for implementation. With those features in mind, the document proposes minor updates to the high-level architecture, together with the updated list of users that will make use of EUXDAT. The high-level interactions among the components are updated, and some new ones are proposed. The document also updates the internal design expected for each high-level component (although in some cases there are no changes) and proposes the design of a new high-level component. Finally, it updates the approach for deploying the e-Infrastructure components, in order to have an operational version of EUXDAT.


2. Introduction

During the first iteration of the project, the consortium proposed a first version of the features to be implemented and of the high-level architecture that would enable their implementation. Deliverable D2.2 [6] reported the analysis of the original requirements and the features they made necessary. It also proposed a high-level architecture with a set of high-level components, a set of actors and an initial roadmap for the implementation. D2.2 [6] further identified some crucial features and defined sequence diagrams showing how the high-level components should interact. Additionally, it proposed a high-level design of the components, as well as a deployment structure, so that the EUXDAT e-Infrastructure would be operational.

Now that the requirements originally defined in D2.1 [1] have been updated in D2.3 [7], and in light of the outcomes of the first version of the e-Infrastructure, the consortium has updated the architecture so that it includes any missing features and fits better with the current and future implementation.

This document reports on the analysis of the latest version of the requirements and the update of the features and, in line with them, updates the high-level architecture, the proposed components and their high-level design. It also updates and completes the definition of the interactions among components, and it updates the deployment of the EUXDAT e-Infrastructure.

2.1 Relation to other project work

As reported in D2.2 [6], the architecture definition is a central activity and is therefore related to the other WPs in the project:

• In WP2 (requirements gathering), as it provides the requirements from which the features to be implemented are extracted;

• In WP3 (detailed design and implementation of components), as the architecture determines the high-level components to detail and the features to be implemented;

• In WP4 (infrastructure platform), as in WP3, the architecture sets the direction with respect to high-level components and features;

• In WP5 (Integration and e-Infrastructure Provision to Pilots), integration activities are influenced by the architecture, as are the pilots' implementation and their usage of EUXDAT features.

2.2 Structure of the document

As in D2.2 [6], this document is structured in five major chapters:

• Chapter 3 presents the features that have been identified to be provided by the EUXDAT e-Infrastructure, based on the requirements collected in deliverable D2.1 [1] and the updates introduced in D2.3 [7];

• Chapter 4 updates the high-level architecture for EUXDAT, the actors and the interactions among components;

• Chapter 5 updates the high-level design of the components defined for the high-level architecture;

• Chapter 6 updates the strategy for deploying the different parts of the EUXDAT e-Infrastructure;

• Chapter 7, finally, presents the summary and conclusions for the current version of the EUXDAT architecture.


3. EUXDAT Features

3.1 Requirements Analysis

The architecture defined in D2.2 [6] was based on the requirements collected in D2.1 [1]. Those requirements have since been updated, as reported in D2.3 [7], so the original analysis must be revisited and the architecture updated accordingly. The requirements for data and data processing changed as follows:

• New requirement to support soil moisture data from PESSL's instrumentation;

• Several requirements about the availability and resolution of Sentinel datasets and time series were merged;

• New requirement for enabling statistics on multi-temporal data for a given field, i.e. monthly averaging of spatial datasets (pilot 1).

Also, some new functional and technical requirements were proposed for the platform (in some cases directly related to the pilots as well):

• Support for structured, semi-structured and unstructured data;

• Provision of RESTful interfaces for accessing the processing capabilities of the EUXDAT platform;

• Use of containerization solutions for the implementation and deployment of processing algorithms;

• Provision of a Data and Processes Catalogue and Marketplace;

• Data ingestion and caching in the platform;

• EUXDAT shall provide an orchestration mechanism that allows sending tasks to the underlying infrastructure in a way that is transparent to EUXDAT users;

• EUXDAT shall provide a web development frontend that helps developers and data processing experts prepare, test and deploy their algorithms in the platform, as well as publish them as new services;

• EUXDAT General Frontend;

• EUXDAT Pilot Application Frontend.

Some of these requirements were proposed when the pilots were defined in more detail and when the consortium discussed how to serve functionality to stakeholders. Other requirements were proposed after discussions with those stakeholders, once it was understood how they would be willing to use EUXDAT. In any case, most of the new functional requirements represent functionality that the consortium was already expecting to provide, so the list of proposed features has not changed much. Others are directly linked to requirements already collected, focused on the data formats to be supported and on data analytics.
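As an illustration of the multi-temporal statistics requirement (monthly averaging of spatial datasets for a given field), the following minimal sketch groups co-registered rasters by calendar month and averages them. It assumes that the rasters are already clipped to the field and share one grid, and that missing pixels are encoded as NaN; the function name and input layout are hypothetical and not taken from the EUXDAT implementation.

```python
from collections import defaultdict
from datetime import date

import numpy as np


def monthly_mean(rasters):
    """Average co-registered rasters per calendar month.

    `rasters` is an iterable of (acquisition_date, 2-D array) pairs on the
    same grid. NaN marks missing pixels (assumed convention) and is ignored.
    """
    groups = defaultdict(list)
    for acquired, grid in rasters:
        groups[(acquired.year, acquired.month)].append(np.asarray(grid, dtype=float))
    # One averaged raster per (year, month), in chronological order.
    return {
        month: np.nanmean(np.stack(grids), axis=0)
        for month, grids in sorted(groups.items())
    }


series = [
    (date(2019, 1, 5), [[1.0, 2.0]]),
    (date(2019, 1, 20), [[3.0, float("nan")]]),
    (date(2019, 2, 1), [[5.0, 5.0]]),
]
averages = monthly_mean(series)
```

Here `averages` holds one raster per month; the January raster is computed only from the valid pixels of the two January acquisitions.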


The following subsections report the updates with respect to the original features defined in D2.2 [6]. For further details, refer to that report.

3.2 Main EUXDAT Features

Support for Several Data Formats

According to the original requirements, EUXDAT will support structured data, semi-structured data and unstructured data from different data sources.

EUXDAT already supports several data formats, which cover the three kinds of data mentioned above: Copernicus images [4], JSON responses from REST interfaces, maps with different layers, etc.

According to the initial requirements, EUXDAT was expected to support the following data:

• Sensor data;

• Drone data;

• Remote-Sensing/Geospatial data;

• Land Use and Administrative data (see [3]);

• Meteorological data.

All of those remain valid for the current list of features, and there is a new requirement about accessing PESSL data. PESSL provides sensor data, which is accessed through the standardized interfaces of its platform (i.e. based on REST). Therefore, we consider that the original definition of supported formats is still valid; EUXDAT only needs to implement a concrete connector for the PESSL platform. More details about the data types, storage and shapes can be found in D2.2 [6].
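A connector of this kind could look roughly like the sketch below. The interface of the PESSL platform is not described in this document, so the base URL, the resource path, the bearer-token authentication and the JSON field names (`readings`, `timestamp`, `value`) are all hypothetical placeholders. The sketch only shows the general shape: fetch JSON over REST and flatten it into records.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical base URL; the real PESSL endpoint is not given in this document.
BASE_URL = "https://pessl.example.org/api"


def http_get_json(url, token):
    """Fetch a JSON document over HTTPS using a bearer token (assumed scheme)."""
    request = Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def soil_moisture_records(station_id, token, fetch=http_get_json):
    """Return soil moisture readings of one station as flat records.

    `fetch` is injectable so the connector can be exercised without network
    access; the payload layout used here is an assumption.
    """
    payload = fetch(f"{BASE_URL}/stations/{station_id}/soil-moisture", token)
    return [
        {
            "station": station_id,
            "time": item["timestamp"],
            "soil_moisture": float(item["value"]),
        }
        for item in payload["readings"]
    ]
```

Keeping the transport function separate from the mapping step means that only the mapping has to change if the actual payload differs from the assumed one.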

Algorithms and Applications Management

Data analytic algorithms

The D2.2 report already listed several algorithms needed for analysing large datasets in the context of the different pilots, in order to implement all the scenarios identified per pilot (calculation of multi- and hyperspectral indices, atmospheric corrections, data merging, field categorization, etc.). These algorithms are being implemented using several approaches. In some cases, the code is written directly by developers (e.g. in Python); in other cases, developers rely on libraries suited to the purpose (e.g. GRASS). In either case, both the libraries and the code need to be made more parallelizable, since the initial implementations do not take parallelism into account and are therefore limited in scalability.


This will now be supported by Notebooks, which developers can use to implement algorithms and test them directly from the EUXDAT e-Infrastructure. The Notebooks will be connected with the EUXDAT backend in order to execute the algorithms, depending on their complexity, and developers will be able to modify their code in situ. Later on, they will be able to move the new code directly to the repository (so it can be integrated with other code or published as a new service). Finally, as originally proposed and as requested in the latest requirements, the analytic functions and applications will be exposed through a marketplace, according to the owners’ conditions.
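To illustrate what making an index calculation more parallelizable can mean in practice, the sketch below computes NDVI (one of the spectral indices listed in the acronyms) block by block, so that blocks can be processed concurrently. It is a simplified example and not the project's implementation: the tiling by rows, the thread pool and the function names are assumptions, and on an HPC system the same decomposition would more likely be mapped to processes or jobs.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); zero-denominator pixels become NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


def ndvi_tiled(nir, red, tiles=4, workers=4):
    """Split the scene into row blocks, compute NDVI per block concurrently,
    and stack the results back in the original order."""
    nir_blocks = np.array_split(np.asarray(nir, dtype=float), tiles, axis=0)
    red_blocks = np.array_split(np.asarray(red, dtype=float), tiles, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(ndvi, nir_blocks, red_blocks))
    return np.vstack(results)
```

Because each pixel depends only on its own band values, the blocks are independent and the tiled result is identical to the single-pass result.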

Algorithms managed as containerized applications

As already discussed in D2.2 [6], EUXDAT will use containers for packaging the applications and some algorithms, making them easier to deploy and execute on different platforms. This is especially relevant for the parts to be executed in Cloud environments, where containers are commonly used. D2.2 [6] already listed a set of functionalities related to the usage of containers (a registry of images, REST APIs to manage containers, etc.). Many of these functionalities are already available through a Kubernetes connector. This solution is valid for the Cloud part, but not for the HPC side. Therefore, for those cases in which the source code is stored in the EUXDAT repository, Continuous Integration and Continuous Delivery mechanisms can be used to generate compiled code for the HPC centres involved in EUXDAT.
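As a rough illustration of how one containerized algorithm run could be described for the Cloud side, the sketch below builds a Kubernetes `batch/v1` Job manifest as a plain Python dictionary. How the EUXDAT Kubernetes connector actually submits work is not specified here, so the use of a Job, the image name, the arguments and the resource requests are assumptions chosen for the example.

```python
import json


def job_manifest(name, image, args, cpu="1", memory="2Gi"):
    """Build a Kubernetes batch/v1 Job manifest for one algorithm run."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed run automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "args": list(args),
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ],
                }
            },
        },
    }


# Hypothetical image taken from a hypothetical registry of EUXDAT images.
manifest = job_manifest(
    "ndvi-run", "registry.example.org/euxdat/ndvi:1.0", ["--field", "42"]
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with standard Kubernetes tooling, since manifests are accepted in JSON as well as YAML.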

Data Management and Processing

In D2.2 [6], we already presented several functionalities related to data management from the remote and local perspectives. First, we mentioned the need to provide a data catalogue, where stakeholders can find all the data they are looking for. This point has been confirmed by a new requirement that requests such a catalogue. It will therefore be possible to access valuable metadata and the location where the data is actually stored (including Copernicus data through the Mundi DIAS).

As requested, EUXDAT will be able to download large amounts of data but, in order to avoid repeated downloads, a cache mechanism will be provided. Since the tool used for data management allows policies to be defined, the configuration can be tuned to determine which datasets to keep locally and which ones to download from the remote source on demand. This feature was already proposed, although there is now a concrete requirement for it.

Locally, EUXDAT will provide private workspaces, where users will be able to read and write data (with some space limitations). In those cases in which datasets need to be used by the pilots (e.g. UAV data), EUXDAT will guarantee local storage for that data.
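The caching idea can be sketched as a simple policy: keep the most recently used datasets within a storage budget, and never evict datasets that must stay local (such as pilot UAV data). The code below is an illustrative model only; it does not use or reproduce the policy mechanism of the data management tool, and the class name, the byte budget and the dataset identifiers are assumptions.

```python
from collections import OrderedDict


class DatasetCache:
    """Keep recently used datasets locally, within a size budget.

    Pinned datasets (e.g. pilot UAV data) are never evicted.
    """

    def __init__(self, capacity_bytes, pinned=()):
        self.capacity = capacity_bytes
        self.pinned = set(pinned)
        self.entries = OrderedDict()  # dataset id -> size, least recent first

    def used(self):
        return sum(self.entries.values())

    def access(self, dataset_id, size_bytes):
        """Return True on a cache hit, False if a download is needed."""
        if dataset_id in self.entries:
            self.entries.move_to_end(dataset_id)  # mark as most recently used
            return True
        self.entries[dataset_id] = size_bytes
        self._evict()
        return False

    def _evict(self):
        # Drop least recently used, unpinned datasets until the budget is met.
        for candidate in list(self.entries):
            if self.used() <= self.capacity:
                break
            if candidate not in self.pinned:
                del self.entries[candidate]
```

With a budget of 100 units and a pinned 60-unit UAV dataset, for example, only one 30-unit remote scene fits alongside it, so requesting a second scene evicts the first.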

Security and Users Management


User Management

Regarding user management, there are no major changes with respect to D2.2. The main features to provide are Single Sign-On across the different tools available in the project and the secure storage of data related to users’ accounts. Although no sensitive data is expected to be stored, we will include data protection forms in order to comply with the GDPR. The main change is that LDAP is no longer considered as the solution to be used.

Security

As mentioned in D2.2 [6], security will be based on JWT and JWE (that is, on security tokens). There are no major changes with respect to the usage of secure interfaces. As for secure data transfer, the selected tool (Rucio) is able to move data in encrypted form.
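For readers unfamiliar with token-based security, the sketch below shows how a signed JWT can be issued and verified using only the Python standard library. The signing algorithm used by EUXDAT is not stated in this document; HS256 with a shared secret is assumed here purely for illustration, and JWE (token encryption) is not covered. A production system would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """Create a compact JWT: base64url(header).base64url(claims).signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if the signature is valid and the token is not expired."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

A service receiving such a token only needs the secret to check who the caller is, which is what makes tokens suitable for Single Sign-On across several tools.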

Privacy

In this respect, EUXDAT maintains what was described in D2.2 [6]. EUXDAT will allow the storage of private data, so that only authorized people can access certain datasets, and this will also be controlled from the catalogue.

Visualization and Interaction Capabilities

The original set of features already included functionalities related to how data is visualized in EUXDAT. Basically, this consists of providing GUIs with maps on which different layers can be drawn, representing the outcomes of the algorithms executed. Document D2.2 [6] gives more details about the solutions that can be used.

What is new is the set of different GUIs that will be available through the EUXDAT frontend. As required, EUXDAT will support two kinds of interfaces: a generic one that can be used for running almost any algorithm or application, and customized interfaces for the pilots/scenarios and other services to be provided through EUXDAT. In the first case, the interface will be built dynamically depending on the required inputs and will show results in a generic way. The custom interfaces will also be linked to the frontend, but they will be fully tailored to the inputs and outputs involved, increasing usability.
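The idea of building the generic interface dynamically from the required inputs can be sketched as a mapping from an algorithm's declared inputs to form widgets. The input description format, the type names and the widget names below are invented for this example and do not describe the actual EUXDAT frontend.

```python
def build_form(inputs):
    """Map an algorithm's declared inputs to generic form widgets.

    `inputs` is a list of dicts with "name" and "type" keys and an optional
    "default"; type and widget names are illustrative only.
    """
    widgets = {
        "number": "numeric-field",
        "date": "date-picker",
        "bbox": "map-selector",
        "string": "text-field",
    }
    form = []
    for spec in inputs:
        form.append(
            {
                "label": spec["name"].replace("_", " ").capitalize(),
                "widget": widgets.get(spec["type"], "text-field"),
                "required": "default" not in spec,  # inputs without default must be filled
                "value": spec.get("default"),
            }
        )
    return form


form = build_form(
    [
        {"name": "start_date", "type": "date"},
        {"name": "area", "type": "bbox"},
        {"name": "buffer_m", "type": "number", "default": 10},
    ]
)
```

A custom pilot interface would replace this generic mapping with a hand-designed layout, while still sending the same input values to the backend.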

Management of Computing and Storage Resources

The features proposed for the management of computing and storage resources were described in sufficient detail in D2.2 [6]. Even though there are new requirements related to this topic, the set of described features already included the requested capabilities. We keep the need to orchestrate resources and select the most appropriate ones depending on the tasks to be executed, thanks to task profiles. The selected solution makes use of TOSCA [5] for defining the workflows. Also, aspects such as monitoring and SLAs are still important, since they verify that everything is working as expected (and SLAs may be important for certain business models). The only new feature is the need to provide an accounting and billing mechanism. To charge users for their usage of applications and/or resources, we need to retrieve monitoring information that tells us which resources and applications were used. Once we have this information, it will be possible to generate bills for the users and to proceed with any payment that has to be made. Payments might depend on different factors but, mainly, on the business models to apply, which will be clarified in the context of WP7.
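The step from usage information to bills can be sketched as below. This is a minimal illustration, assuming a flat price per unit of each resource; the record fields, resource names and prices are hypothetical, since the actual pricing depends on the business models to be defined in WP7.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    user: str
    resource: str    # hypothetical resource name, e.g. "cpu_core_hours"
    quantity: float  # amount consumed, in the unit implied by the resource name


def generate_bills(records, unit_prices):
    """Aggregate usage per user and resource and price it with flat unit prices."""
    lines = defaultdict(lambda: defaultdict(float))
    for rec in records:
        if rec.resource not in unit_prices:
            raise KeyError(f"no price defined for resource: {rec.resource}")
        lines[rec.user][rec.resource] += rec.quantity * unit_prices[rec.resource]
    return {
        user: {"lines": dict(per_resource), "total": round(sum(per_resource.values()), 2)}
        for user, per_resource in lines.items()
    }
```

For example, with a price of 0.5 per core hour, a user who consumed 10 core hours would get one bill line of 5.0 for that resource.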

Extreme Data Analytics as a Service

This feature is one of the key points of EUXDAT and it is closely related to visualization. D2.2 [6] already described how we expected to provide very large data analytics as a service. The idea is to provide the generic interface proposed in the visualization section, in such a way that it is possible to run certain data analyses with just a few input parameters. Additionally, custom interfaces will make things easier for end users. As discussed in the project meetings, EUXDAT will also make it possible to run these applications/algorithms through a web service interface, as they will be exposed as geo-services (following OGC standards). It is important to highlight that EUXDAT will provide as many tools as possible adapted to their usage in the context of the e-Infrastructure, such as GRASS, Orfeo Toolbox, etc. Tools, libraries, algorithms and applications will be packaged in such a way that they can be used easily. Of course, the documentation about EUXDAT should be very complete, and there will be mechanisms to support the users, such as training tools, community tools (to ask questions, share experiences, etc.) and examples.

3.3 Mapping Requirements and Features

This section describes how the requirements captured in EUXDAT D2.1 Description of Proposed Pilots and Requirements [1] map to the main EUXDAT features described in Section 3.2 of this deliverable. The legend is as follows: green marks a full match, yellow a partial match and white no match.

Table 1: Requirements Traceability Matrix

Requirement ID | Requirement Name | Data Formats Support | Algorithms and Applications Mgt. | Data Mgt. and Processing | Security and Users Mgt. | Visualization and Interaction | Management of Resources | Extreme Data Analytics as a Service

EUXDAT-REQ-Pilots-DATA-001 | Level-1C multi-spectral imaging products from the Sentinel-2
EUXDAT-REQ-Pilots-DATA-002 | UAV-enabled hyperspectral imagery
EUXDAT-REQ-Pilots-DATA-003 | Climate data
EUXDAT-REQ-Pilots-DATA-004 | Dynamic cropland mask, crop type map and LAI from Sen2-Agri system
EUXDAT-REQ-Pilots-DATA-005 | Copernicus European Digital Elevation Model (EU-DEM), version 1.1
EUXDAT-REQ-Pilots-DATA-006 | Land use map
EUXDAT-REQ-Pilots-DATA-007 | Soil map
EUXDAT-REQ-Pilots-DATA-008 | Soil moisture data from Pessl's instrumentation
EUXDAT-REQ-Pilots-DATA-009 | Open Land Use Map
EUXDAT-REQ-Pilots-DATA-010 | Land Parcel Identification System (LPIS)
EUXDAT-REQ-Pilots-DATA-011 | Hydrology for EU
EUXDAT-REQ-Pilots-DATA-012 | Actual weather
EUXDAT-REQ-Pilots-DATA-013 | Historic weather
EUXDAT-REQ-Pilot-001 | Atmospheric correction of Multispectral Sentinel bands
EUXDAT-REQ-Pilot-002 | Enable calculation of spectral indices from the 12 Sentinel multispectral bands
EUXDAT-REQ-Pilot-003 | Calculation of Hyperspectral indices relevant for stress and disease
EUXDAT-REQ-Pilot-004 | Availability of Sentinel-2 data at field scale/for a given polygon for given time period
EUXDAT-REQ-Pilot-005 | 2D visualization of time-series over selected pixels, provision of interfaces, toolkits
EUXDAT-REQ-Pilot-006 | Installation of Sen2Agri system and provision of Dynamic cropland mask, crop type map and LAI
EUXDAT-REQ-Pilot-007 | Enable statistics on multi-temporal data for given field, i.e. monthly averaging of spatial datasets.
EUXDAT-REQ-Pilot-008 | Collecting machinery tracking data
EUXDAT-REQ-Pilot-009 | Collecting of agro-meteorological data
EUXDAT-REQ-Pilot-010 | Calculation of yield productivity zones
EUXDAT-REQ-Pilot-011 | Zone related morphometric statistic
EUXDAT-REQ-Pilot-012 | Water influence to weather conditions
EUXDAT-REQ-Pilot-013 | 3D visualization
EUXDAT-REQ-PLATF-001 | Support for various HPC and Cloud providers
EUXDAT-REQ-PLATF-002 | Monitor HPC and Cloud resources
EUXDAT-REQ-PLATF-003 | Applications monitoring and profiling
EUXDAT-REQ-PLATF-004 | Adequate operation of the platform
EUXDAT-REQ-PLATF-005 | Optimize data movement
EUXDAT-REQ-PLATF-006 | Support security and privacy in data management
EUXDAT-REQ-PLATF-007 | Automated deployment and execution of applications
EUXDAT-REQ-PLATF-008 | API access to pilots' data and services
EUXDAT-REQ-PLATF-009 | User management
EUXDAT-REQ-PLATF-010 | Access sensor observations
EUXDAT-REQ-PLATF-011 | Support information modelling
EUXDAT-REQ-PLATF-012 | Support integration of meta-information
EUXDAT-REQ-PLATF-013 | Compliance with INSPIRE specifications
EUXDAT-REQ-PLATF-014 | Compliance with GEO/GEOSS specifications
EUXDAT-REQ-PLATF-015 | Integrate Web map services
EUXDAT-REQ-PLATF-016 | Multiple Data Centers in the Cloud
EUXDAT-REQ-PLATF-017 | Cloud Data Storage
EUXDAT-REQ-PLATF-018 | Dependability
EUXDAT-REQ-PLATF-019 | Big Data Management
EUXDAT-REQ-PLATF-020 | Identity Management & Access control
EUXDAT-REQ-PLATF-021 | Scalability – Users growth
EUXDAT-REQ-PLATF-022 | Scalability – Data growth and complex analytics
EUXDAT-REQ-PLATF-023 | Data decentralization
EUXDAT-REQ-PLATF-024 | Parallel data stream processing
EUXDAT-REQ-PLATF-025 | Reduction in energy consumption by improved processing algorithms
EUXDAT-REQ-PLATF-026 | Use of efficient hybrid architectures
EUXDAT-REQ-PLATF-027 | Visualization of large amounts of data
EUXDAT-REQ-PLATF-028 | Support of different formats for visualization
EUXDAT-REQ-PLATF-029 | Provide rich user interfaces for the interactive visualization
EUXDAT-REQ-PLATF-030 | Render high resolution data in N arbitrary dimensions
EUXDAT-REQ-PLATF-031 | Personalised end-user-centric reusable data visualisation
EUXDAT-REQ-PLATF-032 | Detection of abnormal sensor measurements
EUXDAT-REQ-PLATF-033 | Use of high performance computing techniques to the processing of extremely huge amounts of data
EUXDAT-REQ-PLATF-034 | Heterogeneous data aggregation and normalization
EUXDAT-REQ-PLATF-035 | Verification of data integrity and veracity
EUXDAT-REQ-PLATF-036 | Support for structured, semi-structured and un-structured data
EUXDAT-REQ-PLATF-037 | Provision of RESTful interfaces for accessing processing capabilities of EUXDAT platform
EUXDAT-REQ-PLATF-038 | Use of containerization solutions for implementation and deployment of processing algorithms
EUXDAT-REQ-PLATF-039 | Provision of Data and Processes Catalogue and Marketplace
EUXDAT-REQ-PLATF-040 | Data ingestion and caching in the platform
EUXDAT-REQ-PLATF-041 | EUXDAT shall provide an orchestration mechanism that will allow sending tasks to the underlying infrastructure in a transparent way to EUXDAT users
EUXDAT-REQ-PLATF-042 | EUXDAT shall provide a web development frontend which will facilitate developers and data processing expert users preparing, testing and deploying their algorithms in the platform, as well as publishing them as new services.
EUXDAT-REQ-PLATF-043 | EUXDAT General Frontend
EUXDAT-REQ-PLATF-044 | EUXDAT Pilot Application Frontend

4. EUXDAT Architecture

This section introduces the high-level architecture of the EUXDAT e-Infrastructure, based on the defined features. It also describes the interactions among high-level components in order to implement some of the most important features.

4.1 High Level Architecture

In D2.2 [6], we defined a high-level architecture representing the components that would implement the identified features, extracted from the requirements. That architecture, although still valid from the technical perspective, required a minor modification to cover all the functionalities we need EUXDAT to provide.

Figure 1: EUXDAT High Level Architecture

The only difference with respect to the previous version is the addition of a new high-level component, ‘Billing & Accounting’: if we want EUXDAT to be sustainable, such a feature will need to be in place at some point.

As explained in D2.2 [6], the different colours indicate the nature of the components. The green one is used for the component related to the web interfaces, blue is for security-related components, yellow for data-related components and red for resources management related components.

4.2 Main Actors

The list of main actors for the e-Infrastructure has changed little since the definition in D2.2 [6]. These are the roles that we envisage for the EUXDAT e-Infrastructure:

• Administrator: It remains as defined originally. It has access to all the features and is in charge of the right operation of the e-Infrastructure, managing users’ accounts and configurations.

• Application Service Providers (ASPs): This role represents those who provide content to the e-Infrastructure, meaning that they will upload and publish applications, algorithms and data. They will also be able to access certain monitoring information related to their creations. ASPs could potentially come from many different domains, such as Agriculture, Precision Farming, Telemetry Services Providers, Robotics, etc.

• Developer: Developers are similar to ASPs, in the sense that they provide content to EUXDAT, but with the difference that they use EUXDAT tools (such as the Notebooks) in order to develop, store, test and publish their creations.

• End User: As defined originally, this kind of user only navigates through the data and runs applications, without access to publication mechanisms.

4.3 Main Components

The main components identified are the following:

• EUXDAT Portal: It represents the main interface of EUXDAT, where it is possible to access all the features through different GUIs, which connect to the backend;

• Identity and Authorization Manager: It manages users’ accounts, controlling credentials, access policies, etc.;

• Data & Algorithms Catalogue: It represents a record of the applications, datasets and tools that can be used in EUXDAT;

• Data & Algorithms Repository: This component deals with the storage of the main elements involved in EUXDAT: datasets, code, maps, etc., to be used by the users;

• Data Manager: This component takes care of moving/copying data as required, through the adequate APIs and connectors, supporting a good set of heterogeneous data sources;

• SLA Manager: It determines the quality attributes to consider and it is continuously checking that the agreements are fulfilled, by retrieving the corresponding monitoring information;

• Orchestrator: It carries out the management of HPC and Cloud resources, by selecting the optimal combination and running the tasks defined in the workflows;

• Monitoring: This component collects information about the resources available and about the applications executed;

• Billing & Accounting: It gathers information about resources/software usage, translates this information into users’ cost and generates invoices whenever possible.

4.4 High Level Interactions

In D2.2 [6], we already defined how the components of the high-level architecture interact in order to implement some of the features. We defined how this works for ‘Moving large data in EUXDAT’, ‘Defining a new data analysis in EUXDAT’ and ‘Running a data analysis in EUXDAT’. While the minor changes in the high-level architecture also required changing the diagram defined for the last of these features, the other two remain valid as originally defined. We have also added two new diagrams.

Running a data analysis in EUXDAT

In D2.2 [6], the sequence diagram for running a data analysis was presented. Here we extend it to include the Billing & Accounting component introduced in section 4.1.

Figure 2: Running Data Analysis Sequence Diagram

For the textual explanation of Figure 2, please refer to D2.2 [6]. Here we have added a further step after the execution has terminated and the Monitoring component has communicated this to the Orchestrator. The Monitoring component will send the usage information to the Accounting component for billing purposes.

Register User

When a User, who has not yet signed up, wants to access the e-Infrastructure, they need to follow the registration procedure:

1. Access the I&A Manager via the EUXDAT Portal and fill in the registration form;

2. The system will send an automated email to the address provided in the form;

3. The prospective new user will have a time limit to respond to this email and acknowledge their desire to register with the platform by clicking on a link. Once they have clicked on the link, the system will store this data in the user base.
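The registration steps above can be sketched as a small in-memory service. This is only an illustration under assumptions: the class and method names are hypothetical, the 24-hour time limit is an arbitrary example value (the deliverable does not state the limit), and sending the email itself is left out.

```python
import secrets
import time
from typing import Dict, Optional, Set, Tuple


class RegistrationService:
    """Illustrative stand-in for the registration logic of the I&A Manager."""

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl_seconds = ttl_seconds  # assumed time limit for confirmation
        self.pending: Dict[str, Tuple[str, float]] = {}  # token -> (email, expiry)
        self.users: Set[str] = set()  # stands in for the user base

    def request_registration(self, email: str, now: Optional[float] = None) -> str:
        """Steps 1 and 2: record the form data and return the token for the emailed link."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self.pending[token] = (email, now + self.ttl_seconds)
        return token

    def confirm(self, token: str, now: Optional[float] = None) -> bool:
        """Step 3: store the user only if the link is known and was used in time."""
        now = time.time() if now is None else now
        entry = self.pending.pop(token, None)  # each link can be used only once
        if entry is None:
            return False
        email, expires_at = entry
        if now > expires_at:
            return False
        self.users.add(email)
        return True
```

Using an unguessable random token in the link means that only someone with access to the mailbox can complete the registration.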

Figure 3: User Registration Sequence Diagram

Visualize analysis in custom GUI

In section 4.4.1, the sequence diagram for running a data analysis is depicted and explained. We left out the visualization of the data by the user, which we discuss here. Figure 4 shows the sequence diagram for a specific case in which a user wants to execute an analysis on a specific area of a map provided by Mundi. The user draws a polygon on a map and the custom GUI sends the coordinates to Mundi. Mundi finds the dates and times for which the selected area has data and presents the user with a list of timestamps to choose from. The user selects one and sends it to the system. The “Run Data Analysis” box represents the sequence diagram in Figure 2, i.e. all the EUXDAT components in that figure and their interactions. The end result of the analysis is a list of coordinates and other parameter values in JSON format, which is sent to MapServer; MapServer renders a user-friendly map that is displayed to the user. The Custom GUI can offer alternative views of the same data: the user selects an alternative view and it is rendered in the Custom GUI, via MapServer.
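As an illustration of how a JSON list of coordinates and parameter values could be prepared for a map renderer, the sketch below converts such a list into a GeoJSON FeatureCollection. The input field names (`lon`, `lat`, and any extra keys) and the choice of GeoJSON are assumptions; the deliverable does not specify the exact JSON layout exchanged with the map server.

```python
import json


def to_feature_collection(results):
    """Convert analysis results (dicts with 'lon', 'lat' and other values) to GeoJSON."""
    features = []
    for item in results:
        # Everything that is not a coordinate is kept as a feature property.
        properties = {k: v for k, v in item.items() if k not in ("lon", "lat")}
        features.append(
            {
                "type": "Feature",
                # GeoJSON positions are ordered longitude first, then latitude.
                "geometry": {"type": "Point", "coordinates": [item["lon"], item["lat"]]},
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [{"lon": 14.42, "lat": 50.09, "ndvi": 0.61}]
    print(json.dumps(to_feature_collection(sample), indent=2))
```

An alternative view of the same data would then only require a different styling of the same features, not a new analysis run.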

Figure 4: Visualize Analysis in Custom GUI Sequence Diagram

4.5 Development Priorities and Roadmap

The EUXDAT e-Infrastructure has already released its first version (in M12), in which some of the expected features were implemented. Still, many features remain to be implemented, and many of them are in progress. Therefore, we have updated the roadmap initially defined in D2.2 [6]. Initially, the consortium planned several features to be implemented in v1 of the e-Infrastructure. In the end, these are the features available:

• Initial version of the Orchestrator, able to run tasks in the project infrastructures;

• Initial version of the repository for code, datasets and images;

• Python notebooks able to launch data analyses;

• Set up the I&A Manager, so it will be possible to manage users;

• TOSCA templates/examples for launching Python code in Kubernetes Pods via Cloudify.

The following features are in progress (some of them belonging to the plans for v1 and others to the plans for v2):

• Deploy a marketplace, which can be used for publishing applications;

• Improved version of the Orchestrator, using Cloud + HPC infrastructures and with a simple algorithm for providers selection;

• Enable more monitoring metrics, being able to retrieve information for creating application profiles (i.e. resources used);

• First version of the Data Manager, able to move data using several infrastructures and protocols (at least GridFTP or similar);

• Complete D&A Catalogue, for the datasets and applications/algorithms;

• TOSCA templates/examples for Kubernetes Services via Cloudify;

• Availability of a catalogue for datasets, so it will be possible to publish and retrieve metadata (Open Micka);

• Initial version of API documentation and developer documentation for ASPs.

Therefore, in addition to the features in progress, the new plans for v2 of the EUXDAT e-Infrastructure are:

• EUXDAT Portal, improving the tool for launching data analytics, with users management, with custom interfaces for scenarios and including a marketplace and the link with the D&A Catalogue for searching and accessing information;

• Set up the Monitoring infrastructure and take some simple metrics from the resource providers;

• Complete D&A Catalogue, for the datasets and applications/algorithms.

Finally, in the case of release v3 (M32), the proposed features for implementation are:

• Final version of the EUXDAT Portal, with the complete version of the tool for launching data analytics and integrating community tools (i.e. forums) and monitoring interfaces;

• Populated D&A Catalogue and Repository;

• Improved Orchestrator, able to generate profiles and to use them to allocate resources;

• Complete Data Manager, including the datasets evaluation mechanism for improving data movement;

• Complete list of monitoring probes (i.e. add metrics from the applications running);

• Complete API documentation and developer documentation for ASPs.

5. Detailed Design of Main Components

This section provides a deeper view of the high-level components identified, giving an idea of their internal composition. It is not the purpose of this document to enter into the details of the implementation of each high-level component, since that will be defined in WP3 and WP4, which will later also implement the components. It is important to highlight that the proposed diagrams include not only the subcomponents that we identify, but also how these relate to other high-level components, in order to specify which parts are expected to interact.

5.1 EUXDAT Portal

As described in D2.2 [6], the EUXDAT Portal is the main entry point, acting as a one-stop shop for the e-Infrastructure. The EUXDAT Frontend is in charge of linking to the different interfaces that provide the features that EUXDAT makes available to the stakeholders. As in the original version, several components remain the same in the picture: EUXDAT Frontend, Monitoring Interface, Data Browser, Marketplace, Users Manager, Support Forums and Data Analytics Launcher. D2.2 provides more details about these pieces, which were identified in the first iteration.

Figure 5: EUXDAT Portal High Level Architecture

The new pieces that have been identified are the “Developers’ Notebooks” and the “Custom GUIs”, since we have realized that they are necessary for implementing certain capabilities. In the first case, since there are developers who implement their algorithms and want to test them through a friendly interface, EUXDAT will allow them to launch their code through the Orchestrator while, at the same time, they will be able to save their code in the repository, so that it can be reused or an application can be created directly from it. In the second case, some pilots and scenarios requested customized interfaces, since these would fit better with the application to run. Therefore, we plan to implement a mechanism in which the custom interfaces are linked from the EUXDAT Frontend, so they can work like the Data Analytics Launcher component (the difference being that the latter is generic).

5.2 Identity and Authorization Manager

This high-level component was also defined in D2.2 [6], as part of the original high-level architecture. It is in charge of managing users’ accounts and all the information associated with them. As explained originally, other components contact this one in order to check the validity of the provided credentials and to grant access to certain features according to the credentials used.

The only change made to this component is the removal of LDAP as the repository for storing the credentials. The implementation of the component did not require LDAP and, therefore, that part has been removed so that this architecture is coherent with the current implementation.

Figure 6: I&A Manager High Level Architecture

Even without LDAP, this component keeps enabling the single sign-on feature, and no further changes are envisaged.

5.3 Data and Algorithms Catalogue

As described in D2.2 [6], the Data & Algorithms Catalogue is in charge of organizing and maintaining the catalogue of applications/algorithms and datasets, together with their relevant metadata. This component is composed of two main modules: the Data Catalogue, implemented with Open Micka (ref: http://micka.bnhelp.cz/), and the Marketplace, implemented with Zen Cart (ref: https://www.zen-cart.com/). Open Micka is an open source web application with a focus on the management of geospatial metadata. Zen Cart is a widely used open source marketplace with an active developer and user community. More details about Open Micka and Zen Cart are presented in D3.1 Detailed Specification of the End Users’ Platform v1 [8].

Figure 7: D&A Catalogue High Level Architecture

The high-level architecture of the D&A Catalogue is affected neither by the specific technologies chosen nor by the modifications to the overall system mentioned in other parts of this document. Figure 7 is thus included exactly as it was presented in D2.2.

5.4 Data and Algorithms Repository

Originally (as described in D2.2 [6]), the Data and Algorithms Repository had two main parts for storage: the datasets to be used and the code implementing the tools and applications (which also included container images). Now, we have added a separate part: a repository for maps to be visualized. Although it could be considered part of the datasets, we have set up a dedicated repository for storing the map images and the different layers that can be shown with them, modifying the architecture of the component as a result.

Figure 8: D&A Repository High Level Architecture

This change has been introduced because, technically speaking, it is more efficient to keep this component as support for visualization. Once the results of an application are ready, this piece (the ‘Maps Repository’) retrieves the maps and layers to be visualized, acting as an intermediary between the frontend and the storage. In the case of EUXDAT, this is implemented by using MapServer (together with other solutions such as Mapnik). The rest of the component remains the same, so we keep a repository for data (in general) and another one for the source code, linked to the CI and CD mechanisms and to the registry of containers.

5.5 Data Manager

The Data Manager was already defined in D2.2 [6] with enough detail. In this case, we have added a modification related to the connection to the D&A Catalogue. Initially, we expected the ‘Data Mover’ component to be accessing the D&A Catalogue in order to retrieve metadata and find the location of a concrete dataset to be moved. In reality, the implementation of components such as the ‘Data Mover’ and the ‘Data Storage Connector’ can be done with Rucio, a rather new software package which is able to move very large amounts of data and that supports a lot of storage solutions.

Figure 9: Data Manager High Level Architecture

The main issue is that, even though Rucio makes it possible to define policies, perform several storage operations (even in distributed environments) and connect to multiple storage systems (Amazon S3, Google, GridFTP, etc.), the location (and other information) of the source dataset must be provided beforehand. Therefore, we have added the ‘Catalogue Connector’, a client that connects to the D&A Catalogue to retrieve the needed metadata when the Orchestrator is going to request the ‘Data Mover’ to perform any operation. This component supplies the missing functionality, so the implementation works as expected.
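The role of the ‘Catalogue Connector’ can be sketched as a lookup step that runs before any transfer is requested. The classes below are illustrative stand-ins with hypothetical names and fields: the catalogue is an in-memory dictionary rather than Open Micka, and `DataMover` only records the request rather than calling Rucio.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DatasetMetadata:
    dataset_id: str
    location: str    # e.g. a storage URL; the format is an assumption
    size_bytes: int


class CatalogueConnector:
    """Stand-in client that resolves dataset metadata from the D&A Catalogue."""

    def __init__(self, records: Dict[str, DatasetMetadata]):
        self._records = records

    def lookup(self, dataset_id: str) -> DatasetMetadata:
        try:
            return self._records[dataset_id]
        except KeyError:
            raise LookupError(f"dataset not in catalogue: {dataset_id}") from None


class DataMover:
    """Stand-in for the transfer tool; it needs the source location up front."""

    def __init__(self) -> None:
        self.transfers: List[Tuple[str, str]] = []

    def move(self, source: str, destination: str) -> None:
        self.transfers.append((source, destination))


def request_transfer(dataset_id, destination, catalogue, mover):
    """Resolve the dataset location first, then ask the mover to transfer it."""
    metadata = catalogue.lookup(dataset_id)
    mover.move(metadata.location, destination)
    return metadata
```

Keeping the lookup in a separate connector means the mover never needs to know how the catalogue is implemented.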

5.6 SLA Manager

The SLA Manager component, in charge of the Service Level Agreement negotiation and honouring of the contract agreed, was presented in D2.2 [6]. The high-level architecture diagram is repeated here for completeness.

Figure 10: SLA Manager High Level Architecture

As explained originally, it has an interfacing component for the negotiation and access part (SLA Negotiator), a solution for storing SLAs (SLAs Repository) and a component for monitoring that agreements are fulfilled (SLAs Monitor).
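The check performed by the SLAs Monitor can be sketched as a comparison of agreed terms against the latest monitoring values. The term structure, metric names and the rule that a missing metric counts as a violation are assumptions made for illustration only.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SLATerm:
    metric: str       # hypothetical metric name, e.g. "availability_pct"
    comparator: str   # "<=" or ">="
    threshold: float


_COMPARATORS = {"<=": operator.le, ">=": operator.ge}


def find_violations(terms: List[SLATerm], metrics: Dict[str, float]) -> List[SLATerm]:
    """Return the agreed terms that the current monitoring values do not satisfy."""
    violations = []
    for term in terms:
        value = metrics.get(term.metric)
        # A metric that cannot be observed is treated as a violation (assumption).
        if value is None or not _COMPARATORS[term.comparator](value, term.threshold):
            violations.append(term)
    return violations
```

Such a check would be run periodically against the values retrieved from the Monitoring component.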

5.7 Orchestrator

The Orchestrator, sitting at the heart of the EUXDAT e-Infrastructure, was also introduced in D2.2 [6]. Its high-level architecture has not been affected by the other modifications and extensions introduced in this document. Figure 11 thus depicts the Orchestrator’s internal sub-components and their interactions with each other and with other EUXDAT components, and is repeated as is from D2.2.

Figure 11: Orchestrator High Level Architecture

As defined originally, it has an interfacing component (Orchestrator Interface), a component for connecting with different HPC and Cloud solutions (Infrastructure Connectors), another component keeping profiles of the applications and libraries to run (Profiles Manager), a component able to connect to several monitoring solutions (Monitoring Connector) and a central component which manages the workflow to be run, executing all the required tasks (Orchestration Engine).
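How application profiles and monitoring information could feed the choice of an infrastructure can be sketched as below. The selection rule used here (filter providers that satisfy the profile, then take the cheapest) is an assumption for illustration; the deliverable only states that a simple provider selection algorithm is planned. All field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TaskProfile:
    cores: int
    memory_gb: float
    needs_mpi: bool


@dataclass(frozen=True)
class Provider:
    name: str
    kind: str                  # "hpc" or "cloud"
    free_cores: int            # would come from the Monitoring Connector
    free_memory_gb: float
    supports_mpi: bool
    cost_per_core_hour: float


def select_provider(profile: TaskProfile, providers: List[Provider]) -> Optional[Provider]:
    """Pick the cheapest provider that can satisfy the task profile, if any."""
    candidates = [
        p
        for p in providers
        if p.free_cores >= profile.cores
        and p.free_memory_gb >= profile.memory_gb
        and (p.supports_mpi or not profile.needs_mpi)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.cost_per_core_hour)
```

Other criteria, such as data locality or expected queue time, could replace the cost key without changing the filtering step.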

5.8 Monitoring

The Monitoring component is responsible for providing:

• the Orchestrator with information regarding the current status of the system’s various resources, their CPU usage and availability, and other metrics;

• the Portal with a visualization of these metrics, for administrators and users;

• the SLA Manager with information regarding specific agreements; and

• Billing & Accounting with data that allows this component to accurately calculate usage of resources.

Figure 12: Monitoring High Level Architecture

The high-level Monitoring architecture introduced in D2.2 [6] has been augmented with the Billing & Accounting component and is depicted in Figure 12. The accompanying text in D2.2 describes this component in more detail and still applies. The only clarification needed here is that Grafana will be used for visualization and that Prometheus has been selected as the Monitoring Collector. With regard to providing accurate usage information, it is worth noting that Prometheus (like every other monitoring system we have considered) is not recommended for fine-grained accounting. A more sophisticated sub-module will therefore possibly be required, either sitting within the Monitoring Interface or acting as an intermediate box that converts monitoring metrics into accounting-ready data.
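To illustrate what such an intermediate conversion step might look like, the following minimal Python sketch turns a series of sampled CPU usage values (as a monitoring collector could provide them) into a single accounting-ready figure in core-seconds, using trapezoidal integration over the samples. The names UsageRecord and to_usage_record, the sample layout and the choice of core-seconds as the unit are assumptions made for illustration only; they are not part of the EUXDAT design. Because the result is derived from periodic samples, it remains an approximation of the real usage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical accounting-ready record (not an EUXDAT data model)."""
    user: str
    resource: str
    cpu_core_seconds: float


def to_usage_record(user: str, resource: str,
                    samples: List[Tuple[float, float]]) -> UsageRecord:
    """Integrate sampled CPU usage over time.

    `samples` is a list of (timestamp_in_seconds, cores_in_use) pairs,
    sorted by timestamp. Usage between two samples is approximated by the
    trapezoidal rule, so the result is an estimate, not an exact figure.
    """
    total = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        total += (v0 + v1) / 2.0 * (t1 - t0)
    return UsageRecord(user=user, resource=resource, cpu_core_seconds=total)
```

For example, three samples taken one minute apart, `[(0, 2.0), (60, 2.0), (120, 4.0)]`, yield 2 × 60 + 3 × 60 = 300 core-seconds. The coarser the sampling interval, the larger the possible error, which is one reason why a dedicated accounting sub-module may be preferable to relying on monitoring data alone.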

5.9 Billing & Accounting

The Billing & Accounting component is responsible for aggregating the usage of resources and services (data, algorithms, and computing resources) and handling customer payments. Figure 13 depicts the high-level architecture of the Billing & Accounting component. It is made up of the following modules:

• Accounting Interface: this is the point of access for the users. It may be incorporated into the EUXDAT Portal or be a separate view within the portal (or an iframe). The user will be able to visualize their usage of the various resources and any possible recurrent charges or services contracted from the platform.

• Accounting Aggregator: this module gathers the usage data from the Monitoring component (pertaining to computing resources) and from the Portal (pertaining to algorithms and datasets). The Aggregator interacts with the Accounting Engine, which performs the actual calculations and communicates the results to the User Interface and the Billing module.

• Accounting Engine: the core engine of the Billing & Accounting component. This module will keep the Accounting service running, monitor its modules and perform most of the calculations. The Engine will hold the tables of charges and be responsible for balancing the books of the EUXDAT platform as a whole.

• Billing: the actual billing will be done through the Zen Cart eCommerce platform, the chosen technology for the EUXDAT Marketplace. Zen Cart already has payment gateways built in for PayPal, LinkPoint, YourPay and others, and can connect fairly easily to live payment gateway services for credit card payments, etc.
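The division of work between the Accounting Aggregator and the Accounting Engine described above can be sketched as follows. This is a minimal illustration in Python, not the EUXDAT implementation: the function names, the record layout and the prices in the table of charges are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Hypothetical table of charges (price per unit); values are illustrative.
CHARGES: Dict[str, Decimal] = {
    "cpu_core_hour": Decimal("0.05"),
    "dataset_access": Decimal("1.00"),
    "algorithm_run": Decimal("0.50"),
}


def aggregate(records: Iterable[Tuple[str, str, float]]) -> Dict[Tuple[str, str], Decimal]:
    """Aggregator role: sum raw usage records per (user, item).

    Each record is a (user, item, quantity) tuple, e.g. computing usage
    coming from Monitoring or dataset/algorithm usage coming from the Portal.
    """
    totals: Dict[Tuple[str, str], Decimal] = defaultdict(Decimal)
    for user, item, quantity in records:
        totals[(user, item)] += Decimal(str(quantity))
    return dict(totals)


def compute_charges(totals: Dict[Tuple[str, str], Decimal],
                    charges: Dict[str, Decimal] = CHARGES) -> Dict[str, Decimal]:
    """Engine role: apply the table of charges to the aggregated usage."""
    bill: Dict[str, Decimal] = defaultdict(Decimal)
    for (user, item), quantity in totals.items():
        bill[user] += quantity * charges[item]
    return dict(bill)
```

With the illustrative prices above, a user with 15 core-hours and 2 dataset accesses would be charged 15 × 0.05 + 2 × 1.00 = 2.75. Decimal arithmetic is used instead of floating point so that monetary amounts do not accumulate rounding errors. The resulting per-user totals are what would be handed over to the Billing module.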

Figure 13: Billing & Accounting High-level Architecture

6. EUXDAT Deployment

This section describes the EUXDAT e-Infrastructure from an operations point of view: how the building blocks of the platform (the components) will be deployed, and how an operational production environment can be obtained alongside a flexible development environment.

6.1 Deployment Infrastructure

Deployment

As already defined in D2.2 [6], the deployment infrastructure is divided into three major parts: the Portal environment, the Cloud backend and the HPC/HPDA backend. The Portal environment hosts all Portal-related components and global services, e.g. Monitoring. All components intended to steer workflows through the EUXDAT platform reside at this level. The actual computation is carried out in a Cloud environment, in an HPC/HPDA environment, or in several of them, depending on the workload definition and on the Orchestrator’s decision, which takes into account the current status, the application profile, the load and the SLAs.
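The kind of decision described above can be illustrated with a deliberately simplified Python sketch. It is not the Orchestrator's actual algorithm, which this document does not specify: the Backend class, the choose_backend function and the use of a maximum acceptable load as a stand-in for an SLA-derived constraint are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Backend:
    """Hypothetical view of a computation backend as seen by the Orchestrator."""
    name: str        # e.g. "cloud", "hpc" or "hpda"
    available: bool  # current status, as reported by Monitoring
    load: float      # 0.0 (idle) to 1.0 (saturated)


def choose_backend(backends: Iterable[Backend],
                   allowed: Set[str],
                   max_load: float = 1.0) -> Optional[Backend]:
    """Pick the least-loaded backend that satisfies all constraints.

    `allowed` stands for the backends permitted by the application profile;
    `max_load` is an illustrative stand-in for an SLA-derived constraint.
    Returns None when no backend qualifies.
    """
    candidates = [
        b for b in backends
        if b.available and b.name in allowed and b.load <= max_load
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.load)
```

A real placement decision would likely weigh more factors (data locality, queue waiting times, cost), but the sketch shows how status, profile, load and SLA information can be combined into a single choice.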

Figure 14: EUXDAT Deployment

Besides the Cloud and HPC systems, a third system for big data workloads, an HPDA system, is being introduced as an additional computation backend. It is physically located in the HPC environment and uses the same storage, but has a dedicated frontend. It is not explicitly illustrated in the figure above because its setup is identical to that of the HPC system.

Stages

In addition to the general deployment setup, D2.2 [6] also introduced the platform stages considered necessary: a development environment, an integration environment and a production environment. These deployment stages concern only the EUXDAT components. They do not concern the computation environments, which can be considered to be in production.

Figure 15: EUXDAT Deployment Stages

As defined originally, there are three stages. The first is the development environment, where developers deploy and test their components during the development phases; in this environment, simple test data sets serve as input. As soon as the developer considers a component stable and working, it is promoted to the integration stage, where its interaction with other components of the platform is tested with real data. Components that pass QA successfully are promoted to the production environment. All three stages use the same Cloud and HPC backend.

API Development

The latest change to the development workflow, made in order to fully utilize the three stages for API development as well, concerns the remote server to be queried by the PyNotebooks. This is a new feature, and it also affects the deployment of the e-Infrastructure. An environment variable, $REMOTE_HOST, is foreseen to control which stage (development, integration or production) the RESTful API queries are sent to.
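As a minimal sketch of how a PyNotebook might honour this variable, the following Python helper builds the URL of an API call from $REMOTE_HOST. Only the variable name comes from this document. That it holds a base URL (rather than, say, a stage name), the fallback host, the api_url helper and the example path are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical fallback used when REMOTE_HOST is not set (illustrative only).
DEFAULT_HOST = "https://dev.euxdat.example"


def api_url(path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build the URL of a RESTful API call for the stage selected via REMOTE_HOST.

    `env` defaults to the process environment; passing a mapping explicitly
    makes the helper easy to test.
    """
    env = os.environ if env is None else env
    host = env.get("REMOTE_HOST", DEFAULT_HOST)
    # Normalize slashes so that host and path join with exactly one "/".
    return host.rstrip("/") + "/" + path.lstrip("/")
```

With this approach the notebook code itself stays identical across stages; only the environment in which it runs changes. For instance, setting REMOTE_HOST to the integration host before starting the notebook server would redirect every query built with api_url to the integration stage.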

Figure 16: EUXDAT API Development

Services

For the deployment, we continue to propose the solution described in D2.2 [6]. It comprises a Git code review tool (Gerrit), a code repository (GitLab) and a continuous integration tool (Jenkins). In combination, these tools provide the services required for the automated management of commits and their deployment onto the corresponding stage.

Figure 17: EUXDAT Deployment Services

The process is the same as described in D2.2. First, an update is committed to Gerrit, where the code review takes place. Then, the code is merged and stored in the GitLab repository. Jenkins is triggered each time a commit reaches the repository, and the code is deployed on the corresponding stage (development, integration or production). Jenkins can also run regression test suites automatically after a component has been deployed; these tests provide logs for investigation in case of issues. The latest enhancements couple the development and integration stages with Kubernetes, so that Docker containers hosting EUXDAT Portal components can be deployed, and use Jenkins to steer deployments on the integration stage. Additionally, Jenkins is intended to take over the build process for the (central) components as well as for application code to be executed on backend resources. The output of Jenkins will be Docker containers for the cloud hosting and computation environment and native binaries for HPC/HPDA.

6.2 Components Deployment

The deployment of the components described in this document follows the solution proposed in D2.2 [6]. Every component is treated as a Docker microservice and deployed in VMs, with Kubernetes used to manage the containers. We still need to determine the best way to do the deployment, taking into account how each component works, by analysing the components’ behaviour and performance. Ideally, each central component (which might require more resources than others) will be deployed in its own VM, but some components can be deployed together in order to save resources and minimize communication during their interaction. This will have a direct impact on the operational cost of the e-Infrastructure.

As exemplified in D2.2, the SLA Manager, Orchestrator and Monitoring could be deployed together or, at least, on the same physical machine, since they will interact closely and the SLA Manager and Orchestrator are not expected to be active continuously. The D&A Repository and D&A Catalogue could also be located together in a VM, provided that it offers enough storage capacity for the repository, since the D&A Catalogue reflects what is contained in the repository and the two collaborate closely. The Data Manager (based on Rucio) could be deployed next to them as well, due to its relationship with the repository. The I&A Manager has been deployed alone, although it might be deployed with another component. For instance, it could be deployed together with the EUXDAT Portal (taking into account the users’ management feature), although this component may need scalability capabilities in case many users try to access the Portal at the same time (taking into account that it will have a web server and other tools).

We still need to finalize the selection of the tools to be used for each component, and we also need to understand how the selected ones perform and scale up or down. The current configuration may therefore not be optimal, and the deployment will be analysed again once there is a new version of the detailed design and its implementation.

In addition to the components, the applications that process data on the computation backends require a different approach, due to the nature of the different environments. While Docker containers can be deployed in the cloud, the HPC and HPDA systems require the compilation of native binaries. While their components are under development, developers steer the deployment manually with the help of Kubernetes. Jenkins also uses Kubernetes to update components on the development and integration stages each time a new stable version becomes available and the containers and/or binaries have been built.

7. Conclusions

Based on the previous high-level architecture, on the updated requirements and on the experience collected during the implementation of the first version of the EUXDAT e-Infrastructure, this document presents an update of the EUXDAT high-level architecture and features.

The document analyses the changes identified in the collected requirements, highlighting whether new features had to be added. Again, this top-down approach has shown that the features identified in the first version are still valid, and only a few features were added. Most of the new requirements were already covered by the originally proposed features, showing that the first design was already very complete.

The high-level architecture has been modified by adding a new module for accounting and billing, which shows that the original architecture was flexible enough to include new features as needed. The proposed changes also required modifications to one of the diagrams defining the interactions among components. In addition, new interaction diagrams were proposed, so that it is clearer how the components behave for different features.

With the whole picture now much clearer, it has also been possible to update the high-level design of the components, introducing the required changes and taking into account the tools already selected for implementing them. The designs are now much more in line with the implementation and with the real possibilities of EUXDAT.

The deployment solution has also been updated, especially taking into account the approach proposed in the context of WP3. In any case, the designs and the deployment are only the basis for further development in the context of WP3 and WP4.

Finally, there is still room for improvement and adaptation, depending on the next implementation and the arrival of new requirements (if any). The architecture has already demonstrated its flexibility and completeness, so it will be possible to perform any required adaptation without major changes.

8. References

[1] EUXDAT; “D2.1 Description of Proposed Pilots and Requirements”; Jedlička, Karel et al; 2018.

[2] European e-Infrastructure for Extreme Data Analytics in Sustainable Development (EUXDAT). Grant Agreement. Nieto, Francisco Javier. 2017.

[3] SDI4Apps; Open Land Use Map; http://sdi4apps.eu/open_land_use/; retrieved 2019-02-25

[4] Copernicus; Copernicus Data Access; http://copernicus.eu/data-access; retrieved 2019-02-25

[5] OASIS; TOSCA Simple Profile in YAML Version 1.1; http://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.1/TOSCA-Simple-Profile-YAML-v1.1.html; 30th January 2018; retrieved 2018-05-25

[6] EUXDAT; “D2.2 EUXDAT e-Infrastructure Definition”; Nieto, F. Javier et al; 2018.

[7] EUXDAT; “D2.3 Updated Report on e-Infrastructure Requirements v1”; Jedlička, Karel et al; 2018.

[8] EUXDAT; “D3.1 Detailed Specification of the End Users’ Platform v1”; Castel, Fabien et al; 2018.