This document is issued within the frame and for the purpose of the EUXDAT project. This project has received funding from the European Union’s Horizon2020 Framework Programme under Grant Agreement No. 777549. The opinions expressed and arguments employed herein do not necessarily reflect the official views of the European Commission.
This document and its content are the property of the EUXDAT Consortium. All rights relevant to this document are determined by the applicable laws. Access to this document does not grant any right or license on the document or its contents. This document or its contents are not to be used or treated in any manner inconsistent with the rights or interests of the EUXDAT Consortium or to the Partners' detriment, and are not to be disclosed externally without prior written consent from the EUXDAT Partners.
Each EUXDAT Partner may use this document in conformity with the EUXDAT Consortium Grant Agreement provisions.
(*) Dissemination level.-PU: Public, fully open, e.g. web; CO: Confidential, restricted under conditions set out in Model Grant Agreement; CI: Classified, Int = Internal Working Document, information as referred to in Commission Decision 2001/844/EC.
D2.4 EUXDAT e-Infrastructure Definition v2
Keywords:
Data Analytics, Big Data, e-Infrastructure, Architecture, Design, EUXDAT
Document Identification
Status Final Due Date 31/01/2019
Version 1.0 Submission Date 29/04/2019
Related WP WP2 Document Reference D2.4
Related Deliverable(s)
D2.1, D2.2, D2.3 Dissemination Level (*) PU
Lead Participant F. Javier Nieto (ATOSES) Lead Author F. Javier Nieto (ATOSES)
Contributors ATOSES, USTUTT, ATOSFR
Reviewers Marcela Doubkova (PESSL)
Fabien Castel (ATOSFR)
Document name: D2.4 EUXDAT e-Infrastructure Definition v2 Page: 2 of 47
Reference: D2.4 Dissemination: PU Version: 1.0 Status: Final
Document Information
List of Contributors
Name Partner
F. Javier Nieto ATOSES
Spiros Michalakopoulos ATOSES
Nico Struckmann USTUTT
Fabien Castel ATOSFR
Document History
Version Date Change editors Changes
0.1 09/01/2019 F. J. Nieto (ATOSES) Table of Contents
0.2 14/01/2019 F. J. Nieto (ATOSES) ToC update and section 2
0.3 25/01/2019 F. J. Nieto (ATOSES) Sections 2 and 3
0.4 08/02/2019 F. J. Nieto (ATOSES), F. Castel (ATOSFR), S. Michalakopoulos (ATOSES)
Updates to section 3, contributions to sections 4 and 5
0.5 22/02/2019 F. J. Nieto (ATOSES), N. Struckmann (USTUTT), S. Michalakopoulos (ATOSES)
Add section 6 contributions. Update content in sections 3 to 5.
0.6 27/02/2019 F. J. Nieto (ATOSES) Minor updates and section 7.
0.7 17/04/2019 F. J. Nieto (ATOSES) Changes according to review comments
0.8 29/04/2019 F. J. Nieto (ATOSES) Final version for quality review
0.9 30/04/2019 ATOS ES Quality review
1.0 30/04/2019 FINAL VERSION TO BE SUBMITTED
Quality Control Role Who (Partner short name) Approval Date
Deliverable leader F. Javier Nieto (ATOSES) 29/04/2019
Technical manager Fabien Castel (ATOSFR) 29/04/2019
Quality manager Susana Palomares (ATOSES) 29/04/2019
Project Manager F. Javier Nieto (ATOSES) 24/09/2019
Table of Contents
Document Information ............................................................................................................................ 2
Table of Contents .................................................................................................................................... 4
List of Tables ........................................................................................................................................... 6
List of Figures ......................................................................................................................................... 7
List of Acronyms ..................................................................................................................................... 8
1. Executive Summary ....................................................................................................................... 11
2. Introduction .................................................................................................................................... 12
2.1 Relation to other project work ................................................................................................ 12
2.2 Structure of the document ...................................................................................................... 12
3. EUXDAT Features ......................................................................................................................... 14
3.1 Requirements Analysis ........................................................................................................... 14
3.2 Main EUXDAT Features ....................................................................................................... 15
Support for Several Data Formats ..................................................................................... 15
Algorithms and Applications Management ....................................................................... 15
Data Management and Processing ..................................................................................... 16
Security and Users Management ....................................................................................... 16
Visualization and Interaction Capabilities ......................................................................... 17
Management of Computing and Storage Resources .......................................................... 17
Extreme Data Analytics as a Service ................................................................................. 18
3.3 Mapping Requirements and Features ..................................................................................... 18
4. EUXDAT Architecture .................................................................................................................. 26
4.1 High Level Architecture ......................................................................................................... 26
4.2 Main Actors ............................................................................................................................ 27
4.3 Main Components .................................................................................................................. 27
4.4 High Level Interactions .......................................................................................................... 28
Running a data analysis in EUXDAT ................................................................................ 28
Register User ..................................................................................................................... 29
Visualize analysis in custom GUI ...................................................................................... 29
4.5 Development Priorities and Roadmap .................................................................................... 30
5. Detailed Design of Main Components ........................................................................................... 32
5.1 EUXDAT Portal ..................................................................................................................... 32
5.2 Identity and Authorization Manager ...................................................................................... 33
5.3 Data and Algorithms Catalogue ............................................................................................. 33
5.4 Data and Algorithms Repository ............................................................................................ 34
5.5 Data Manager ......................................................................................................................... 35
5.6 SLA Manager ......................................................................................................................... 36
5.7 Orchestrator ............................................................................................................................ 37
5.8 Monitoring ............................................................................................................................. 38
5.9 Billing & Accounting ............................................................................................................. 39
6. EUXDAT Deployment................................................................................................................... 41
6.1 Deployment Infrastructure ..................................................................................................... 41
Deployment ........................................................................................................................ 41
Stages ................................................................................................................................. 42
API Development .............................................................................................................. 43
Services .............................................................................................................................. 43
6.2 Components Deployment ....................................................................................................... 44
7. Conclusions .................................................................................................................................... 46
8. References ...................................................................................................................................... 47
List of Tables
Table 1: Requirements Traceability Matrix ____________________ 19
List of Figures
Figure 1: EUXDAT High Level Architecture ____________________ 26
Figure 2: Running Data Analysis Sequence Diagram ____________________ 28
Figure 3: User Registration Sequence Diagram ____________________ 29
Figure 4: Visualize Analysis in Custom GUI Sequence Diagram ____________________ 30
Figure 5: EUXDAT Portal High Level Architecture ____________________ 32
Figure 6: I&A Manager High Level Architecture ____________________ 33
Figure 7: D&A Catalogue High Level Architecture ____________________ 34
Figure 8: D&A Repository High Level Architecture ____________________ 35
Figure 9: Data Manager High Level Architecture ____________________ 36
Figure 10: SLA Manager High Level Architecture ____________________ 37
Figure 11: Orchestrator High Level Architecture ____________________ 38
Figure 12: Monitoring High Level Architecture ____________________ 39
Figure 13: Billing & Accounting High-level Architecture ____________________ 40
Figure 14: EUXDAT Deployment ____________________ 41
Figure 15: EUXDAT Deployment Stages ____________________ 42
Figure 16: EUXDAT API Development ____________________ 43
Figure 17: EUXDAT Deployment Services ____________________ 44
List of Acronyms
Abbreviation / acronym Description
AMQP Advanced Message Queuing Protocol
API Application Programming Interface
ASTER Advanced Spaceborne Thermal Emission and Reflection Radiometer
AWS Amazon Web Services
CD Continuous Delivery
CEP Complex Event Processing
CI Continuous Integration
CoAP Constrained Application Protocol
C-SAR CARIS Spatial Archive
DEM Digital Elevation Model
DIAS Data and Information Access Services
Dx.y Deliverable number y belonging to WP x
EBDVF European Big Data Value Forum
EC European Commission
ECMWF European Centre for Medium-Range Weather Forecasts
EDI Electronic Data Interchange
EO Earth Observation
FTP File Transfer Protocol
GDPR General Data Protection Regulation
GRD Surfer Grid File
GUI Graphical User Interface
HPC High Performance Computing
HTTPS Hypertext Transfer Protocol Secure
IoT Internet of Things
JPEG Joint Photographic Experts Group
JSON JavaScript Object Notation
JWE JSON Web Encryption
JWT JSON Web Token
KPI Key Performance Indicator
L1C Level-1C
LAI Leaf Area Index
LDAP Lightweight Directory Access Protocol
LPIS Land Parcel Identification System
MODIS Moderate Resolution Imaging Spectroradiometer
MQTT Message Queuing Telemetry Transport
MSI Mass Spectrometry Imaging
NDVI Normalized Difference Vegetation Index
OEM Object Exchange Model
OTM Open Transport Map
PDF Portable Document Format
QoS Quality of Service
Q&A Questions and Answers
REST Representational State Transfer
RGB Red, Green, Blue
RPAS Remotely Piloted Aircraft Systems
SLA Service Level Agreement
SLC SLiCe format
SOAP Simple Object Access Protocol
SSH Secure SHell
TIF Tagged Image File Format
TOSCA Topology and Orchestration Specification for Cloud Applications
UAV Unmanned Aerial Vehicle
UML Unified Modelling Language
VIIRS Visible Infrared Imaging Radiometer Suite
VM Virtual Machine
WMS Web Map Service
WP Work Package
XML eXtensible Markup Language
1. Executive Summary
This document updates the high-level view of the EUXDAT e-Infrastructure. First, taking into account the original requirements and their update, it revises the description of the features that the project team has identified for implementation. With these features in mind, the document proposes minor updates to the high-level architecture, together with the updated list of users of EUXDAT. The high-level interactions among the components are updated, and some new ones are proposed. The document also updates the description of the internal design expected for each high-level component (although in some cases there are no changes) and proposes the design of a new high-level component. Finally, the document updates the approach for deploying the e-Infrastructure components, in order to obtain an operational version of EUXDAT.
2. Introduction
During the first iteration of the project, the consortium proposed a first version of the features to be implemented and of the high-level architecture enabling their implementation. Deliverable D2.2 [6] reported the analysis of the original requirements and of the features they called for. It also proposed a high-level architecture with a set of high-level components, a set of actors and an initial implementation roadmap. D2.2 [6] also identified some crucial features and defined sequence diagrams showing how the high-level components should interact. Additionally, it proposed a high-level design of the components, as well as a deployment structure, so that the EUXDAT e-Infrastructure could become operational. Now that the requirements originally defined in D2.1 [1] have been updated in D2.3 [7], and taking into account the outcomes of the first version of the e-Infrastructure, the consortium has updated the architecture so that it includes any missing features and fits better with the current and future implementation. This document reports on the analysis of the latest version of the requirements and the resulting update of the features, and updates accordingly the high-level architecture, the proposed components and their high-level design. It also updates and completes the definition of the interactions among components, and it updates the deployment of the EUXDAT e-Infrastructure.
2.1 Relation to other project work
As reported in D2.2 [6], the architecture definition is related to all the other WPs in the project, since it is a central activity:
• In WP2 (requirements gathering), as it provides the requirements from which the features to be implemented are extracted;
• In WP3 (detailed design and implementation of components), as the architecture determines the high-level components to be detailed and the features to be implemented;
• In WP4 (infrastructure platform), as in WP3, the architecture guides the work on high-level components and features;
• In WP5 (Integration and e-Infrastructure Provision to Pilots), the integration activities are influenced by the architecture, as are the pilots' implementation and their usage of EUXDAT features.
2.2 Structure of the document
As in D2.2 [6], this document is structured in five major chapters. Chapter 3 presents the features that have been identified to be provided by the EUXDAT e-Infrastructure, based on the requirements collected in deliverable D2.1 [1] and the updates introduced in D2.3 [7].
Chapter 4 updates the high-level architecture for EUXDAT, the actors and the interactions among components. Chapter 5 updates the high-level design of the components defined for the high-level architecture. Chapter 6 updates the strategy for deploying the different parts of the EUXDAT e-Infrastructure. Chapter 7, finally, presents the summary and conclusions for the current version of the EUXDAT architecture.
3. EUXDAT Features
3.1 Requirements Analysis
The architecture defined in D2.2 [6] was based on the requirements collected in D2.1 [1]. These requirements have since been updated, as reported in D2.3 [7]. Therefore, the original analysis must be revised so that the architecture can be updated accordingly. There were some changes regarding the requirements for data and data processing:
• New requirement to support soil moisture data from PESSL's instrumentation;
• Several requirements about the availability and resolution of Sentinel datasets and time series were merged;
• New requirement for enabling statistics on multi-temporal data for a given field, e.g. monthly averaging of spatial datasets (pilot 1).
Also, some new functional and technical requirements were proposed for the platform (in some cases, also directly related to the pilots):
• Support for structured, semi-structured and unstructured data;
• Provision of RESTful interfaces for accessing the processing capabilities of the EUXDAT platform;
• Use of containerization solutions for the implementation and deployment of processing algorithms;
• Provision of a Data and Processes Catalogue and Marketplace;
• Data ingestion and caching in the platform;
• EUXDAT shall provide an orchestration mechanism that allows sending tasks to the underlying infrastructure transparently to EUXDAT users;
• EUXDAT shall provide a web development frontend that helps developers and data processing experts prepare, test and deploy their algorithms in the platform, as well as publish them as new services;
• EUXDAT General Frontend;
• EUXDAT Pilot Application Frontend.
Some of these requirements were proposed when the pilots were defined in more detail and when the consortium discussed how to serve functionality to stakeholders. Other requirements were proposed after discussions with these stakeholders, once the consortium understood how they would be willing to use EUXDAT. In any case, most of the new functional requirements represent functionalities that the consortium was already expecting to provide, so the list of proposed features has not changed significantly. Others are directly linked to requirements already collected, focused on the data formats to be supported and on data analytics.
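The new requirement on multi-temporal statistics (e.g. monthly averaging of spatial datasets for pilot 1) can be illustrated with a minimal sketch. It assumes that each observation is a (date, value) pair already reduced to one value per field, for instance a field-mean NDVI per Sentinel-2 acquisition; the function name and input layout are hypothetical and not taken from the EUXDAT codebase.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_field_averages(observations):
    """Average per-field observations by calendar month (hypothetical helper).

    `observations` is an iterable of (date, value) pairs for a single field,
    e.g. field-mean NDVI values from successive acquisitions. Values set to
    None (such as cloud-masked acquisitions) are skipped.
    Returns a dict mapping (year, month) to the mean value, ordered by month.
    """
    buckets = defaultdict(list)
    for day, value in observations:
        if value is None:
            continue  # acquisition unusable, e.g. fully covered by clouds
        buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}
```

For example, passing two April values (0.42 and 0.50), one cloud-masked April value and one May value (0.61) yields one entry per month, with April averaged over the two valid acquisitions only.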
The following subsections report the updates with respect to the original features defined in D2.2 [6], to which the reader is referred for further details.
3.2 Main EUXDAT Features
Support for Several Data Formats
According to the original requirements, EUXDAT will support structured data, semi-structured data and unstructured data from different data sources.
EUXDAT is already able to support several data formats covering the three kinds of data mentioned above: Copernicus images [4], JSON responses from REST interfaces, maps with different layers, etc.
According to the initial requirements, EUXDAT was expected to support the following data:
• Sensor data;
• Drone data;
• Remote-Sensing/Geospatial data;
• Land Use and Administrative data (see [3]);
• Meteorological data.
All of these remain valid for the current list of features, and there is a new requirement about accessing PESSL data. PESSL actually provides sensor data, but these data are accessed through standardized (REST-based) interfaces of the PESSL platform. Therefore, we consider that the original definition of supported formats is still valid: EUXDAT only needs to implement a dedicated connector for the PESSL platform. More details about the data types, storage and shapes can be found in D2.2 [6].
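A minimal sketch of what such a connector could look like follows, assuming a generic connector interface that converts platform-specific JSON into uniform records. The class names, the injected `fetch_json` callable and the JSON layout (`station`, `readings`, `timestamp`, `soil_moisture`) are hypothetical; the real PESSL API and its authentication scheme are not described here.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass(frozen=True)
class Observation:
    """Uniform record shared by all data connectors (hypothetical schema)."""
    source: str
    station_id: str
    timestamp: datetime
    variable: str
    value: float


class DataConnector(ABC):
    """Base class: each external data platform gets its own subclass."""

    @abstractmethod
    def observations(self, station_id: str) -> List[Observation]:
        ...


class PesslConnector(DataConnector):
    """Hypothetical connector for a REST platform returning JSON sensor data.

    The HTTP call is injected as `fetch_json`, so the parsing logic can be
    tested offline; in production it would wrap an authenticated HTTPS request.
    """

    def __init__(self, fetch_json: Callable[[str], str]):
        self._fetch_json = fetch_json

    def observations(self, station_id: str) -> List[Observation]:
        payload = json.loads(self._fetch_json(station_id))
        return [
            Observation(
                source="pessl",
                station_id=payload["station"],
                timestamp=datetime.fromisoformat(reading["timestamp"]),
                variable="soil_moisture",
                value=float(reading["soil_moisture"]),
            )
            for reading in payload["readings"]
            if reading.get("soil_moisture") is not None  # skip missing readings
        ]
```

Keeping the transport separate from the parsing means that adding another REST-based source only requires a new subclass, while the rest of the platform keeps consuming the same `Observation` records.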
Algorithms and Applications Management
Data analytic algorithms
D2.2 [6] already listed several algorithms needed for analysing large datasets in the context of the different pilots, in order to implement all the scenarios identified per pilot (calculation of multi- and hyperspectral indices, atmospheric corrections, data merging, field categorization, etc.). These algorithms are being implemented using several approaches. In some cases, the code is written directly by developers (e.g. in Python) and, in other cases, developers use libraries suited to the purpose (e.g. GRASS). In either case, both the libraries and the code need to be made more parallelizable, since the initial implementations do not take this into account and their scalability is limited.
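As an illustration of what making such code parallelizable can mean, the sketch below splits a scene into independent strips and computes NDVI, (NIR - Red) / (NIR + Red), for each strip through a pluggable `concurrent.futures` executor. The tiling scheme and function names are assumptions for illustration; band data are plain Python lists to keep the example dependency-free, whereas real implementations would rely on array libraries or GRASS modules.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Optional

Tile = List[List[float]]  # rows of pixel values for one band


def ndvi_tile(red: Tile, nir: Tile) -> Tile:
    """NDVI = (NIR - Red) / (NIR + Red), pixel by pixel; 0.0 where both bands are 0."""
    return [
        [(n - r) / (n + r) if (n + r) != 0 else 0.0 for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red, nir)
    ]


def split_rows(band: Tile, rows_per_tile: int) -> List[Tile]:
    """Split a band into horizontal strips that can be processed independently."""
    return [band[i:i + rows_per_tile] for i in range(0, len(band), rows_per_tile)]


def parallel_ndvi(red: Tile, nir: Tile, rows_per_tile: int = 256,
                  executor: Optional[Executor] = None) -> Tile:
    """Compute NDVI strip by strip on the given executor and reassemble the scene."""
    red_tiles = split_rows(red, rows_per_tile)
    nir_tiles = split_rows(nir, rows_per_tile)
    own_executor = executor is None
    pool = executor or ThreadPoolExecutor()
    try:
        # map() yields results in input order, so strips are reassembled in place.
        results = pool.map(ndvi_tile, red_tiles, nir_tiles)
        return [row for tile in results for row in tile]
    finally:
        if own_executor:
            pool.shutdown()
```

The executor is a parameter so the same per-strip function can run on a thread pool, a `ProcessPoolExecutor` (needed for real speed-up of CPU-bound pure-Python code) or, by analogy, on distributed Cloud or HPC resources, because no strip depends on another.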
Algorithm development will now be supported through Notebooks, which developers can use to implement algorithms and test them directly from the EUXDAT e-Infrastructure. The Notebooks will be connected to the EUXDAT backend in order to execute the algorithms, depending on their complexity, and developers will be able to modify their code in situ. Later on, they will be able to move the new code directly to the repository (so it can be integrated with other code or published as a new service). Finally, as originally proposed and as requested in the latest requirements, the analytic functions and applications will be exposed through a marketplace, according to the owners' conditions.
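From a Notebook, handing an algorithm to the backend would typically be a REST call, in line with the requirement for RESTful access to processing capabilities. The sketch below only builds such a request with the standard library and does not send it; the base URL, the `/jobs` path, the payload fields and the bearer-token header are hypothetical placeholders, not the actual EUXDAT API.

```python
import json
import urllib.request


def build_job_request(base_url: str, token: str, algorithm: str,
                      inputs: dict, target: str = "auto") -> urllib.request.Request:
    """Build (but do not send) a POST request that submits an algorithm run.

    All field names are illustrative. `target` hints where to run the job
    ("cloud", "hpc", or "auto" to let the orchestrator decide).
    """
    body = json.dumps({"algorithm": algorithm, "inputs": inputs, "target": target})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/jobs",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # security token, see Security below
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` is left out so that the sketch runs offline; a Notebook helper would wrap both steps and poll the job status afterwards.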
Algorithms managed as containerized applications
As already discussed in D2.2 [6], EUXDAT will use containers for packaging the applications and some algorithms, making it easier to deploy and execute them on different platforms. This will be especially relevant for the parts to be executed in Cloud environments, where containers are commonly used. D2.2 [6] already listed a set of functionalities related to the use of containers (an image registry, REST APIs to manage containers, etc.). Many of these functionalities are already available through a Kubernetes connector. Although this solution is valid for the Cloud side, it is not for the HPC side. Therefore, when the source code is stored in the EUXDAT repository, it will be possible to use Continuous Integration and Continuous Delivery mechanisms to generate compiled code that can be used in the HPC centres involved in EUXDAT.
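As a sketch of how a containerized algorithm could be handed to Kubernetes, the function below assembles a standard `batch/v1` Job manifest as a Python dict. The image name, label, resource defaults and argument convention are hypothetical; the interface of the actual EUXDAT Kubernetes connector is not described here.

```python
from typing import Dict, List, Optional


def algorithm_job_manifest(name: str, image: str, args: List[str],
                           cpu: str = "1", memory: str = "2Gi",
                           env: Optional[Dict[str, str]] = None) -> dict:
    """Return a Kubernetes batch/v1 Job that runs one algorithm container to completion."""
    container = {
        "name": name,
        "image": image,  # image pulled from the container registry
        "args": args,
        "resources": {"requests": {"cpu": cpu, "memory": memory}},
        "env": [{"name": k, "value": v} for k, v in (env or {}).items()],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {"app": "euxdat-algorithm"}},
        "spec": {
            "backoffLimit": 2,  # retry a failed run at most twice
            "template": {
                "spec": {
                    "containers": [container],
                    "restartPolicy": "Never",  # Job pods must use Never or OnFailure
                }
            },
        },
    }
```

Serialized with `json.dumps`, such a manifest can be submitted to the Kubernetes API (for example via `kubectl apply -f -`, which accepts JSON as well as YAML). A Job, rather than a Deployment, fits batch analytics because it runs to completion instead of being kept alive.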
Data Management and Processing
In D2.2 [6], we already presented several functionalities related to data management from the remote and local perspectives. First, we mentioned the need to provide a data catalogue where stakeholders can find all the data they are looking for. This has been confirmed by a new requirement that explicitly requests such a catalogue. It will therefore be possible to access valuable metadata and the location where each dataset is actually stored (including Copernicus data through the Mundi DIAS). As requested, EUXDAT will be able to download large amounts of data but, to avoid repeated downloads, a cache mechanism will be provided. Since the tool used for data management allows the definition of policies, an optimal configuration will be possible, determining which datasets to keep locally and which ones to fetch from the remote source only when needed. This feature was already proposed, although there is now a concrete requirement for it. Locally, EUXDAT will provide private workspaces where users will be able to read and write data (with some space limitations). Where datasets need to be used by the pilots (e.g. UAV data), EUXDAT will guarantee local storage for such information.
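The caching behaviour can be sketched as a local cache with a size budget, least-recently-used eviction, and a set of pinned datasets that a policy keeps locally (e.g. pilot UAV data). This is a simplified stand-in written for illustration only: it does not reproduce Rucio's replication rules or API, and the `download` callable is hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class DatasetCache:
    """Local dataset cache with a byte budget, LRU eviction and pinned datasets."""

    def __init__(self, capacity_bytes: int, download: Callable[[str], bytes],
                 pinned: Iterable[str] = ()):
        self.capacity = capacity_bytes
        self._download = download   # fetches a dataset from the remote source
        self._pinned = set(pinned)  # policy: these datasets are never evicted
        self._items: "OrderedDict[str, bytes]" = OrderedDict()  # oldest use first
        self.downloads = 0          # number of remote fetches performed

    def _used(self) -> int:
        return sum(len(data) for data in self._items.values())

    def get(self, dataset_id: str) -> bytes:
        if dataset_id in self._items:
            self._items.move_to_end(dataset_id)  # mark as most recently used
            return self._items[dataset_id]
        data = self._download(dataset_id)
        self.downloads += 1
        self._items[dataset_id] = data
        self._evict()
        return data

    def _evict(self) -> None:
        # Drop least recently used, unpinned datasets until within budget.
        for key in list(self._items):
            if self._used() <= self.capacity:
                break
            if key not in self._pinned:
                del self._items[key]

    def __contains__(self, dataset_id: str) -> bool:
        return dataset_id in self._items
```

Repeated requests for a cached dataset do not trigger a new download, while pinned datasets stay local regardless of how recently they were used.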
Security and Users Management
User Management
Regarding user management, there are no major changes with respect to D2.2 [6]. The main feature to provide here is Single Sign-On for the different tools available in the project, together with the secure storage of data related to users' accounts. Although no sensitive data is expected to be stored, we will include data protection forms as a way to comply with the GDPR. The main change is that LDAP is no longer considered as the solution to be used.
Security
As mentioned in D2.2 [6], security will be based on JWT and JWE (that is, on security tokens). There are no major changes regarding the use of secure interfaces. As for secure data movement, the selected tool (Rucio) is able to move data in encrypted form.
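To illustrate the token mechanism, the sketch below shows the structure of an HS256-signed JWT (RFC 7519 / RFC 7515): base64url-encoded header and claims, plus an HMAC-SHA256 signature over both, with an expiry check on verification. It uses only the standard library. The shared-secret HS256 choice is an assumption (an identity provider might instead sign with asymmetric keys), JWE encryption is not covered, and a production deployment would rely on a vetted JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used by JWS (RFC 7515)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _hs256(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Create a compact JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url_encode(json.dumps(p, separators=(",", ":")).encode("utf-8"))
             for p in (header, claims)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url_encode(_hs256(secret, signing_input))


def verify_jwt(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature is valid and `exp` has not passed."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never accept "none" or a swapped alg
    expected = _hs256(secret, header_b64 + "." + payload_b64)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Because every service can verify the signature locally, a token issued once at login can be checked by each component without a round trip to the identity manager, which is what makes tokens convenient for Single Sign-On across tools.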
Privacy
In this respect, EUXDAT maintains what was described in D2.2 [6]. EUXDAT will allow the storage of private data, so that only authorized people can access certain datasets, and this will also be controlled from the catalogue mentioned above.
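A minimal sketch of how the catalogue could hide private datasets from unauthorized users when answering a metadata search is shown below. The entry fields (`owner`, `public`, `authorized`) and the keyword search are assumptions for illustration, not the catalogue's actual schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata record for one dataset (hypothetical fields)."""
    dataset_id: str
    title: str
    location: str                             # where the data is actually stored
    owner: str
    public: bool = False
    authorized: FrozenSet[str] = frozenset()  # users granted access by the owner

    def visible_to(self, user: str) -> bool:
        return self.public or user == self.owner or user in self.authorized


def search(entries: List[CatalogueEntry], user: str, keyword: str) -> List[CatalogueEntry]:
    """Keyword search over titles, returning only entries the user may access."""
    keyword = keyword.lower()
    return [e for e in entries if keyword in e.title.lower() and e.visible_to(user)]
```

Filtering at query time means unauthorized users do not even learn that a private dataset exists, which is a stricter choice than listing it and refusing the download later.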
Visualization and Interaction Capabilities
The original set of features already included functionalities related to data visualization in EUXDAT. Basically, this consists of providing GUIs with maps on which different layers can be drawn, representing the outcomes of the algorithms executed. D2.2 [6] describes in more detail the solutions that can be used. What is new are the different GUIs that will be available through the EUXDAT frontend. As required, EUXDAT will support two kinds of interfaces: a generic one that can be used for running almost any algorithm/application, and customized interfaces for the pilots/scenarios and other services provided through EUXDAT. The generic interface will be built dynamically from the required inputs and will show results in a generic way, whereas the custom interfaces, while also integrated in the frontend, will be fully tailored to the inputs and outputs involved, increasing usability.
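Map layers of this kind are commonly served through an OGC Web Map Service (WMS). As an illustration, the function below builds a WMS 1.3.0 GetMap request URL for an algorithm output layer; the server URL and layer name are hypothetical. Note that in WMS 1.3.0 the bounding box for EPSG:4326 is given in latitude/longitude axis order.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url: str, layer: str, bbox_latlon: tuple,
                   width: int = 512, height: int = 512,
                   image_format: str = "image/png") -> str:
    """Build a WMS 1.3.0 GetMap URL for one layer in EPSG:4326.

    `bbox_latlon` is (min_lat, min_lon, max_lat, max_lon), the axis order
    WMS 1.3.0 uses for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value selects the server's default style
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox_latlon),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
        "TRANSPARENT": "TRUE",  # lets result layers be stacked over a base map
    }
    return base_url + "?" + urlencode(params)
```

A map client in the GUI would request one such image per result layer and overlay them, so that each algorithm output can be toggled independently.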
Management of Computing and Storage Resources
The features proposed for the management of computing and storage resources were described in sufficient detail in D2.2 [6]. Even though some new requirements relate to this topic, the set of described features already included the requested capabilities. We maintain the need to orchestrate resources and to select the most appropriate ones depending on the tasks to be executed, based on task profiles. The selected solution uses TOSCA [5] to define the workflows. Aspects such as monitoring and SLAs also remain important, since they ensure that everything is working as expected (and SLAs may be important for certain business models). The only new feature is the need for an accounting and billing mechanism. To charge users for their usage of applications and/or resources, we need to retrieve monitoring information describing the resources and applications used. Once all this information is available, it will be possible to generate bills for the users and to process any payments due. Payments might depend on different factors but, mainly, on the business models to be applied, which will be clarified in the context of WP7.
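The accounting step can be sketched as aggregating usage records retrieved from monitoring and pricing them with a tariff. The record fields, resource names and per-unit prices are hypothetical placeholders (the actual business models are left to WP7); `Decimal` is used to avoid floating-point rounding errors in monetary amounts.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """One usage measurement reported by monitoring (hypothetical fields)."""
    user: str
    resource: str      # e.g. "hpc_core_hours", "cloud_vm_hours", "app:ndvi"
    quantity: Decimal


def build_bills(records: Iterable[UsageRecord],
                tariff: Dict[str, Decimal]) -> Dict[str, Tuple[List[tuple], Decimal]]:
    """Return, per user, the invoice lines (resource, quantity, amount) and the total.

    A resource missing from the tariff raises KeyError, so unpriced usage is
    never silently billed at zero.
    """
    usage: Dict[str, Dict[str, Decimal]] = defaultdict(lambda: defaultdict(Decimal))
    for record in records:
        usage[record.user][record.resource] += record.quantity
    bills = {}
    for user, per_resource in usage.items():
        lines = []
        for resource, quantity in sorted(per_resource.items()):
            amount = (quantity * tariff[resource]).quantize(Decimal("0.01"),
                                                            rounding=ROUND_HALF_UP)
            lines.append((resource, quantity, amount))
        bills[user] = (lines, sum((line[2] for line in lines), Decimal("0")))
    return bills
```

Aggregating per resource before pricing keeps one invoice line per resource and applies rounding once per line rather than once per monitoring sample.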
Extreme Data Analytics as a Service
This feature is one of the key points of EUXDAT and is closely related to visualization. D2.2 [6] already described how we expected to provide very large-scale data analytics as a service. The idea is to provide the generic interface proposed in the visualization section, so that certain data analyses can be run with just a few input parameters. Additionally, custom interfaces will make things easier for end users. Also, as discussed in the project meetings, EUXDAT will provide a mechanism to run these applications/algorithms through a web service interface, as they will be exposed as geo-services (following OGC standards). It is important to highlight that EUXDAT will provide as many tools adapted for use within the e-Infrastructure as possible, such as GRASS, Orfeo Toolbox, etc. Tools, libraries, algorithms and applications will be packaged so that they can be used easily. Of course, the EUXDAT documentation should be comprehensive, and there will be mechanisms to support users, such as training tools, community tools (to ask questions, share experiences, etc.), examples, etc.
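Running an analysis "with a few input parameters" through a generic interface implies deriving the form from the algorithm's declared inputs and validating what the user submits. The sketch below shows one way to do this; the descriptor format (name, kind, required, default), the supported kinds and the widget names are assumptions for illustration, and the same descriptor could equally drive a web-service interface.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class InputSpec:
    """One declared input of an algorithm (hypothetical descriptor)."""
    name: str
    kind: str                      # "int", "float", "str" or "date"
    required: bool = True
    default: Optional[Any] = None


# Widget to render for each input kind, and converter applied to submitted text.
_WIDGETS = {"int": "number", "float": "number", "str": "text", "date": "date"}
_CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "str": str,
    "date": lambda text: date.fromisoformat(text).isoformat(),  # validates ISO dates
}


def form_fields(specs: List[InputSpec]) -> List[Dict[str, Any]]:
    """Describe one form field per declared input, for the frontend to render."""
    return [{"name": s.name, "widget": _WIDGETS[s.kind], "required": s.required,
             "default": s.default} for s in specs]


def parse_inputs(specs: List[InputSpec], raw: Dict[str, str]) -> Dict[str, Any]:
    """Validate submitted form values and convert them to the declared kinds."""
    values: Dict[str, Any] = {}
    errors: List[str] = []
    for s in specs:
        text = raw.get(s.name, "")
        if text == "":
            if s.required and s.default is None:
                errors.append(f"{s.name}: required")
            else:
                values[s.name] = s.default
            continue
        try:
            values[s.name] = _CONVERTERS[s.kind](text)
        except ValueError:
            errors.append(f"{s.name}: expected {s.kind}")
    if errors:
        raise ValueError("; ".join(errors))  # report all problems at once
    return values
```

Because the form and the validation come from the same descriptor, publishing a new algorithm with its input metadata would be enough to make it usable from the generic interface, without writing GUI code.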
3.3 Mapping Requirements and Features
This section describes how the requirements captured in EUXDAT D2.1 Description of Proposed Pilots and Requirements [1] map to the main EUXDAT features described in Section 3.2 of this deliverable. The legend is as follows: green marks a full match, yellow a partial match and white no match.
Table 1: Requirements Traceability Matrix
Requirement ID | Requirement Name | Data Formats Support | Algorithms and Applications Mgt. | Data Mgt. and Processing | Security and Users Mgt. | Visualization and Interaction | Management of Resources | Extreme Data Analytics as a Service
EUXDAT-REQ-Pilots-DATA-001 | Level-1C multi-spectral imaging products from the Sentinel-2
EUXDAT-REQ-Pilots-DATA-002 | UAV-enabled hyperspectral imagery
EUXDAT-REQ-Pilots-DATA-003 | Climate data
EUXDAT-REQ-Pilots-DATA-004 | Dynamic cropland mask, crop type map and LAI from Sen2-Agri system
EUXDAT-REQ-Pilots-DATA-005 | Copernicus European Digital Elevation Model (EU-DEM), version 1.1
EUXDAT-REQ-Pilots-DATA-006 | Land use map
EUXDAT-REQ-Pilots-DATA-007 | Soil map
EUXDAT-REQ-Pilots-DATA-008 | Soil moisture data from Pessl's instrumentation
EUXDAT-REQ-Pilots-DATA-009 | Open Land Use Map
EUXDAT-REQ-Pilots-DATA-010 | Land Parcel Identification System (LPIS)
EUXDAT-REQ-Pilots-DATA-011 | Hydrology for EU
EUXDAT-REQ-Pilots-DATA-012 | Actual weather
EUXDAT-REQ-Pilots-DATA-013 | Historic weather
EUXDAT-REQ-Pilot-001 | Atmospheric correction of Multispectral Sentinel bands
EUXDAT-REQ-Pilot-002 | Enable calculation of spectral indices from the 12 Sentinel multispectral bands
EUXDAT-REQ-Pilot-003 | Calculation of Hyperspectral indices relevant for stress and disease
EUXDAT-REQ-Pilot-004 | Availability of Sentinel-2 data at field scale/for a given polygon for given time period
EUXDAT-REQ-Pilot-005 | 2D visualization of time-series over selected pixels, provision of interfaces, toolkits
EUXDAT-REQ-Pilot-006 | Installation of Sen2Agri system and provision of Dynamic cropland mask, crop type map and LAI
EUXDAT-REQ-Pilot-007 | Enable statistics on multi-temporal data for given field, i.e. monthly averaging of spatial datasets
EUXDAT-REQ-Pilot-008 | Collecting machinery tracking data
EUXDAT-REQ-Pilot-009 | Collecting of agro-meteorological data
EUXDAT-REQ-Pilot-010 | Calculation of yield productivity zones
EUXDAT-REQ-Pilot-011 | Zone related morphometric statistic
EUXDAT-REQ-Pilot-012 | Water influence to weather conditions
EUXDAT-REQ-Pilot-013 | 3D visualization
EUXDAT-REQ-PLATF-001 | Support for various HPC and Cloud providers
EUXDAT-REQ-PLATF-002 | Monitor HPC and Cloud resources
EUXDAT-REQ-PLATF-003 | Applications monitoring and profiling
EUXDAT-REQ-PLATF-004 | Adequate operation of the platform
EUXDAT-REQ-PLATF-005 | Optimize data movement
EUXDAT-REQ-PLATF-006 | Support security and privacy in data management
EUXDAT-REQ-PLATF-007 | Automated deployment and execution of applications
EUXDAT-REQ-PLATF-008 | API access to pilots' data and services
EUXDAT-REQ-PLATF-009 | User management
EUXDAT-REQ-PLATF-010 | Access sensor observations
EUXDAT-REQ-PLATF-011 | Support information modelling
EUXDAT-REQ-PLATF-012 | Support integration of meta-information
EUXDAT-REQ-PLATF-013 | Compliance with INSPIRE specifications
EUXDAT-REQ-PLATF-014 | Compliance with GEO/GEOSS specifications
EUXDAT-REQ-PLATF-015 | Integrate Web map services
EUXDAT-REQ-PLATF-016 | Multiple Data Centers in the Cloud
EUXDAT-REQ-PLATF-017 | Cloud Data Storage
EUXDAT-REQ-PLATF-018 | Dependability
EUXDAT-REQ-PLATF-019 | Big Data Management
EUXDAT-REQ-PLATF-020 | Identity Management & Access control
EUXDAT-REQ-PLATF-021 | Scalability – Users growth
EUXDAT-REQ-PLATF-022 | Scalability – Data growth and complex analytics
EUXDAT-REQ-PLATF-023 | Data decentralization
EUXDAT-REQ-PLATF-024 | Parallel data stream processing
EUXDAT-REQ-PLATF-025 | Reduction in energy consumption by improved processing algorithms
EUXDAT-REQ-PLATF-026 | Use of efficient hybrid architectures
EUXDAT-REQ-PLATF-027 | Visualization of large amounts of data
EUXDAT-REQ-PLATF-028 | Support of different formats for visualization
EUXDAT-REQ-PLATF-029 | Provide rich user interfaces for the interactive visualization
EUXDAT-REQ-PLATF-030 | Render high resolution data in N arbitrary dimensions
EUXDAT-REQ-PLATF-031 | Personalised end-user-centric reusable data visualisation
EUXDAT-REQ-PLATF-032 | Detection of abnormal sensor measurements
EUXDAT-REQ-PLATF-033 | Use of high performance computing techniques to the processing of extremely huge amounts of data
EUXDAT-REQ-PLATF-034 | Heterogeneous data aggregation and normalization
EUXDAT-REQ-PLATF-035 | Verification of data integrity and veracity
EUXDAT-REQ-PLATF-036 | Support for structured, semi-structured and un-structured data
EUXDAT-REQ-PLATF-037 | Provision of RESTful interfaces for accessing processing capabilities of EUXDAT platform
EUXDAT-REQ-PLATF-038 | Use of containerization solutions for implementation and deployment of processing algorithms
EUXDAT-REQ-PLATF-039 | Provision of Data and Processes Catalogue and Marketplace
EUXDAT-REQ-PLATF-040 | Data ingestion and caching in the platform
EUXDAT-REQ-PLATF-041 | EUXDAT shall provide an orchestration mechanism that will allow sending tasks to the underlying infrastructure in a transparent way to EUXDAT users
EUXDAT-REQ-PLATF-042 | EUXDAT shall provide a web development frontend which will facilitate developers and data processing expert users preparing, testing and deploying their algorithms in the platform, as well as publishing them as new services.
EUXDAT-REQ-PLATF-043 | EUXDAT General Frontend
EUXDAT-REQ-PLATF-044 | EUXDAT Pilot Application Frontend
Reference: D2.4 Dissemination: PU Version: 1.0 Status: Final
4. EUXDAT Architecture
This section introduces the high-level architecture of the EUXDAT e-Infrastructure, based on the defined features. It also describes the interactions among high-level components in order to implement some of the most important features.
4.1 High Level Architecture
In D2.2 [6], we already defined a high-level architecture representing the components that would implement the identified features, extracted from the requirements. Although this architecture is still valid from the technical perspective, it required a minor modification in order to cover all the functionalities EUXDAT needs to provide.
Figure 1: EUXDAT High Level Architecture
The only difference with respect to the previous version is the addition of a new high-level component, ‘Billing & Accounting’, since, if EUXDAT is to be sustainable, such a feature will need to be in place at some point.
As explained in D2.2 [6], the different colours indicate the nature of the components: green is used for the components related to the web interfaces, blue for security-related components, yellow for data-related components and red for resource management components.
4.2 Main Actors
The list of main actors for the e-Infrastructure has not changed significantly since the definition in D2.2 [6]. These are the roles that we envisage for the EUXDAT e-Infrastructure:
• Administrator: It remains as originally defined. It has access to all the features and is in charge of the correct operation of the e-Infrastructure, managing users’ accounts and configurations.
• Application Service Providers (ASPs): This role represents those who provide content to the e-Infrastructure, meaning that they upload and publish applications, algorithms and data. They will also be able to access certain monitoring information related to their creations. ASPs could potentially come from many different domains, such as Agriculture, Precision Farming, Telemetry Services Providers, Robotics, etc.
• Developer: Developers are similar to ASPs, in the sense that they provide content to EUXDAT, but with the difference that they use EUXDAT tools (such as the Notebooks) in order to develop, store, test and publish their creations.
• End User: As originally defined, this kind of user only navigates through the data and runs applications, without access to publication mechanisms.
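The role descriptions above translate naturally into a role-to-permission mapping. The sketch below is purely illustrative: the permission names are hypothetical, and it assumes (beyond what the text states) that ASPs and Developers can also do everything an End User can, and that Developers also see monitoring information about their creations.

```python
from enum import Enum, auto


class Permission(Enum):
    BROWSE_DATA = auto()
    RUN_APPLICATIONS = auto()
    PUBLISH_CONTENT = auto()       # upload/publish applications, algorithms, data
    VIEW_OWN_MONITORING = auto()   # monitoring info about one's own creations
    USE_NOTEBOOKS = auto()         # EUXDAT development tools
    MANAGE_USERS = auto()
    CONFIGURE_PLATFORM = auto()


END_USER = frozenset({Permission.BROWSE_DATA, Permission.RUN_APPLICATIONS})
# Assumption: ASPs and Developers can also do what End Users do.
ASP = END_USER | {Permission.PUBLISH_CONTENT, Permission.VIEW_OWN_MONITORING}
DEVELOPER = ASP | {Permission.USE_NOTEBOOKS}

ROLE_PERMISSIONS = {
    "end_user": END_USER,
    "asp": ASP,
    "developer": DEVELOPER,
    "administrator": frozenset(Permission),  # access to all features
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```

Keeping the mapping declarative makes it easy for the I&A Manager (or any component consulting it) to evolve roles without touching the checking logic.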
4.3 Main Components
The main components identified are the following:
• EUXDAT Portal: It represents the main interface of EUXDAT, where it is possible to access all the features through different GUIs, which connect to the backend;
• Identity and Authorization Manager: It manages users’ accounts, controlling credentials, access policies, etc.;
• Data & Algorithms Catalogue: It represents a record of the applications, datasets and tools that can be used in EUXDAT;
• Data & Algorithms Repository: This component deals with the storage of the main elements involved in EUXDAT: datasets, code, maps, etc., to be used by the users;
• Data Manager: This component takes care of moving/copying data as required, through the adequate APIs and connectors, supporting a good set of heterogeneous data sources;
• SLA Manager: It determines the quality attributes to consider and it is continuously checking that the agreements are fulfilled, by retrieving the corresponding monitoring information;
• Orchestrator: It carries out the management of HPC and Cloud resources, by selecting the optimal combination and running the tasks defined in the workflows;
• Monitoring: This component collects information about the resources available and about the applications executed;
• Billing & Accounting: It gathers information about resources/software usage, translates this information into users’ cost and generates invoices whenever possible.
4.4 High Level Interactions
In D2.2 [6], we already defined how the components of the high-level architecture interact in order to implement some of the features: ‘Moving large data in EUXDAT’, ‘Defining a new data analysis in EUXDAT’ and ‘Running a data analysis in EUXDAT’. While the minor changes to the high-level architecture also required updating the diagram for the last of these, the other two remain valid as originally defined. We have also added two new diagrams.
Running a data analysis in EUXDAT
In D2.2 [6], the sequence diagram for running a data analysis was presented. Here we extend it to include the Billing & Accounting component introduced in section 4.1.
Figure 2: Running Data Analysis Sequence Diagram
For the textual explanation of Figure 2, please refer to D2.2 [6]. Here we have added a further step: after the execution has terminated and the Monitoring component has communicated this to the Orchestrator, the Monitoring component sends the usage information to the Billing & Accounting component for billing purposes.
Register User
When a user who has not yet signed up wants to access the e-Infrastructure, they need to follow the registration procedure:
1. Access the I&A Manager via the EUXDAT Portal and fill in the registration form;
2. The system sends an automated email to the address provided in the form;
3. The prospective new user has a time limit to respond to this email and acknowledge their desire to register with the platform by clicking on a link. Once they have clicked on the link, the system stores their data in the user base.
Figure 3: User Registration Sequence Diagram
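The time-limited confirmation in the registration procedure can be sketched as a single-use token with an expiry. This is a minimal illustration, not the I&A Manager's implementation: the class name, the in-memory storage and the 24-hour window are hypothetical assumptions, and sending the email itself is left out.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Assumed time limit for confirming the email; the real value is a policy choice.
CONFIRMATION_WINDOW = timedelta(hours=24)


class PendingRegistrations:
    """Hypothetical store for registrations awaiting email confirmation."""

    def __init__(self) -> None:
        self._pending: dict[str, tuple[str, datetime]] = {}  # token -> (email, expiry)
        self.users: set[str] = set()

    def register(self, email: str, now: datetime) -> str:
        """Steps 1-2: keep the form data and return a token for the emailed link."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = (email, now + CONFIRMATION_WINDOW)
        return token

    def confirm(self, token: str, now: datetime) -> bool:
        """Step 3: the user clicked the link; accept it only within the time limit."""
        entry = self._pending.pop(token, None)  # each token can be used only once
        if entry is None:
            return False
        email, expiry = entry
        if now > expiry:
            return False
        self.users.add(email)
        return True


now = datetime(2019, 4, 29, 9, 0, tzinfo=timezone.utc)
reg = PendingRegistrations()
t = reg.register("a@example.org", now)
ok = reg.confirm(t, now + timedelta(hours=1))      # confirmed in time
t2 = reg.register("b@example.org", now)
late = reg.confirm(t2, now + timedelta(days=2))    # link expired
```

Passing the current time explicitly keeps the expiry logic easy to test; `secrets.token_urlsafe` produces a hard-to-guess value suitable for embedding in a link.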
Visualize analysis in custom GUI
In section 4.4.1, the sequence diagram for running a data analysis is depicted and explained. It leaves out the visualization of the data by the user, which we discuss here. Figure 4 shows the sequence diagram for a specific case in which a user wants to execute an analysis on a specific area of a map provided by Mundi. The user draws a polygon on the map and the Custom GUI sends its coordinates to Mundi. Mundi finds the dates and times for which data are available for the selected area and presents the user with a list of timestamps to choose from. The user selects one and sends it to the system. The “Run Data Analysis” box represents the sequence diagram in Figure 2, i.e. all the EUXDAT components in that figure and their interactions. The end result of the analysis is a list of coordinates and other parameter values in JSON format, which is sent to MapServer, which renders a user-friendly map that is displayed to the user. The Custom GUI can offer alternative views of the same data: the user can select an alternative view and have it rendered in the Custom GUI via MapServer.
Figure 4: Visualize Analysis in Custom GUI Sequence Diagram
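The two JSON payloads in this flow (the polygon drawn by the user and the analysis result handed to the map renderer) could plausibly be expressed as GeoJSON. The sketch below assumes GeoJSON (RFC 7946) as the encoding, which the text does not state; the coordinates and the `ndvi` property are hypothetical, and the Mundi lookup of available timestamps is not modelled.

```python
import json


def polygon_to_geojson(vertices):
    """Turn the (lon, lat) vertices drawn by the user into a GeoJSON Polygon.

    GeoJSON requires the linear ring to be closed (first position == last).
    """
    ring = [list(v) for v in vertices]
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return {"type": "Polygon", "coordinates": [ring]}


def results_to_feature_collection(results):
    """Wrap analysis output, given as (lon, lat, {parameter: value}) tuples,
    into a FeatureCollection that a map renderer can consume."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        }
        for lon, lat, props in results
    ]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical field outline and result, for illustration only.
area = polygon_to_geojson([(14.0, 49.0), (14.1, 49.0), (14.1, 49.1), (14.0, 49.1)])
payload = json.dumps(results_to_feature_collection([(14.05, 49.05, {"ndvi": 0.71})]))
```

Using a standard geometry encoding for both directions means the Custom GUI, the analysis and the rendering side can exchange data without bespoke parsers.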
4.5 Development Priorities and Roadmap
The EUXDAT e-Infrastructure has already released its first version (in M12), in which some of the expected features were implemented. Still, many features remain to be implemented and many of them are in progress. Therefore, we have updated the roadmap initially defined in D2.2 [6]. The consortium initially planned several features for v1 of the e-Infrastructure; in the end, these are the features available:
• Initial version of the Orchestrator, able to run tasks in the project infrastructures;
• Initial version of the repository for code, datasets and images;
• Python notebooks able to launch data analyses;
• Set up of the I&A Manager, so it is possible to manage users;
• TOSCA templates/examples for launching Python code in Kubernetes Pods via Cloudify.
The following features are in progress (some of them belonging to the plans for v1 and others to the plans for v2):
• Deploy a marketplace, which can be used for publishing applications;
• Improved version of the Orchestrator, using Cloud + HPC infrastructures and with a simple algorithm for provider selection;
• Enable more monitoring metrics, making it possible to retrieve information for creating application profiles (i.e. resources used);
• First version of the Data Manager, able to move data using several infrastructures and protocols (at least GridFTP or similar);
• Complete D&A Catalogue, for the datasets and applications/algorithms;
• TOSCA templates/examples for Kubernetes Services via Cloudify;
• Availability of a catalogue for datasets, so it will be possible to publish and retrieve metadata (Open Micka);
• Initial version of API documentation and developer documentation for ASPs.
Therefore, in addition to the features in progress, the new plans for v2 of the EUXDAT e-Infrastructure are:
• EUXDAT Portal, improving the tool for launching data analytics, with users management, with custom interfaces for scenarios and including a marketplace and the link with the D&A Catalogue for searching and accessing information;
• Set up the Monitoring infrastructure and take some simple metrics from the resource providers;
• Complete D&A Catalogue, for the datasets and applications/algorithms.
Finally, for release v3 (M32), the proposed features for implementation are:
• Final version of the EUXDAT Portal, with the complete version of the tool for launching data analytics and integrating community tools (e.g. forums) and monitoring interfaces;
• Populated D&A Catalogue and Repository;
• Improved Orchestrator, able to generate profiles and to use them to allocate resources;
• Complete Data Manager, including the datasets evaluation mechanism for improving data movement;
• Complete list of monitoring probes (e.g. adding metrics from the running applications);
• Complete API documentation and developer documentation for ASPs.
5. Detailed Design of Main Components
This section provides a deeper view of the high-level components identified, giving an idea of their internal composition. It is not the purpose of this document to go into the implementation details of each high-level component, since these will be defined in WP3 and WP4, which will later also implement the components. It is important to highlight that the proposed diagrams include not only the subcomponents that we identify, but also how these relate to other high-level components, in order to specify which parts are expected to interact.
5.1 EUXDAT Portal
As described in D2.2 [6], the EUXDAT Portal is the main entry point, acting as one-stop-shop for the e-Infrastructure. The EUXDAT Frontend is in charge of linking to the different interfaces that provide those features that EUXDAT makes available for the stakeholders. As in the original version, there are several components that remain the same in the picture: EUXDAT Frontend, Monitoring Interface, Data Browser, Marketplace, Users Manager, Support Forums and Data Analytics Launcher. D2.2 provides more details about these pieces that were identified in the first iteration.
Figure 5: EUXDAT Portal High Level Architecture
The new pieces that have been identified are the “Developers’ Notebooks” and the “Custom GUIs”, since we realized that they would be necessary for implementing certain capabilities. In the first case, since some developers implement their own algorithms and want to test them through a friendly interface, EUXDAT will allow them to launch their code through the Orchestrator while, at the same time, saving the implemented code in the repository, so it can be reused or turned directly into an application. In the second case, some pilots and scenarios requested customized interfaces, since these fit better with the applications to be run. Therefore, we plan to implement a mechanism by which the custom interfaces are linked from the EUXDAT Frontend, so they can work like the Data Analytics Launcher component (the difference being that the latter is generic).
5.2 Identity and Authorization Manager
This high-level component was also defined in D2.2 [6], as part of the original high-level architecture. It is in charge of managing users’ accounts and all the information associated with them. As explained originally, other components contact this one in order to check the validity of the provided credentials and to grant access to certain features according to the credentials used.
The only change to this component is the removal of LDAP as the repository for storing credentials. The implementation of the component did not require LDAP and, therefore, that part has been removed, so that this architecture is coherent with the current implementation.
Figure 6: I&A Manager High Level Architecture
Even without LDAP, this component still enables the single sign-on feature, and no further changes are envisaged.
5.3 Data and Algorithms Catalogue
As described in D2.2 [6], the Data & Algorithms Catalogue is in charge of organizing and maintaining the catalogue of applications/algorithms and datasets, together with their relevant metadata. This component is composed of two main modules: the Data Catalogue, implemented with Open Micka (ref: http://micka.bnhelp.cz/), and the Marketplace, implemented with Zen Cart (ref: https://www.zen-cart.com/). Open Micka is an open source web application focused on the management of geospatial metadata. Zen Cart is a widely used open source marketplace with an active developer and user community. More details about Open Micka and Zen Cart are presented in D3.1 Detailed Specification of the End Users’ Platform v1 [8].
Figure 7: D&A Catalogue High Level Architecture
The high-level architecture of the D&A Catalogue is affected neither by the specific technologies chosen nor by the modifications to the overall system mentioned in other parts of this document. Figure 7 is thus included exactly as it was presented in D2.2.
5.4 Data and Algorithms Repository
Originally (as described in D2.2 [6]), the Data and Algorithms Repository had two main parts for storage: datasets to be used, and code implementing the tools and applications (which also included container images). Now, we have added another separate part: a repository for maps to be visualized. Although it could be considered part of the datasets, we have set up a dedicated repository for storing the map images and the different layers that can be shown with them, modifying the architecture of the component as a result.
Figure 8: D&A Repository High Level Architecture
This change has been introduced because, technically speaking, it is more efficient to keep this component as support for the visualization. Once the results of an application are ready, this piece (the ‘Maps Repository’) retrieves the maps and layers to be visualized, acting as an intermediary between the frontend and the storage. In EUXDAT, this is implemented with MapServer (together with other solutions such as Mapnik). The rest of the component remains the same: we keep a repository for data (in general) and another for the source code, linked to the CI and CD mechanisms and to the container registry.
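As an illustration of how the frontend could ask the Maps Repository for a rendered layer, the sketch below builds an OGC WMS 1.3.0 GetMap URL for a MapServer CGI endpoint. It assumes the maps are published through MapServer's WMS interface; the endpoint, the mapfile path (passed via MapServer's `map` parameter) and the layer names are hypothetical.

```python
from urllib.parse import urlencode


def build_getmap_url(mapserver_cgi, mapfile, layers, bbox, width, height,
                     fmt="image/png"):
    """Build a WMS 1.3.0 GetMap request for a MapServer CGI endpoint.

    With CRS=EPSG:4326, WMS 1.3.0 uses latitude/longitude axis order, so
    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "map": mapfile,  # MapServer CGI parameter selecting the mapfile
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(c) for c in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{mapserver_cgi}?{urlencode(params)}"


# Hypothetical endpoint, mapfile and layers, for illustration only.
url = build_getmap_url(
    "https://maps.euxdat.example.org/cgi-bin/mapserv",
    "/maps/euxdat.map",
    ["ndvi", "parcels"],
    (49.0, 14.0, 49.1, 14.1),
    800,
    600,
)
```

Because the map server only needs the layer names and the extent, the frontend stays decoupled from where and how the result rasters are stored.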
5.5 Data Manager
The Data Manager was already defined in D2.2 [6] in sufficient detail. Here, we have introduced a modification related to the connection to the D&A Catalogue. Initially, we expected the ‘Data Mover’ component to access the D&A Catalogue in order to retrieve metadata and find the location of a specific dataset to be moved. In practice, components such as the ‘Data Mover’ and the ‘Data Storage Connector’ can be implemented with Rucio, a relatively new software package that is able to move very large amounts of data and supports many storage solutions.
Figure 9: Data Manager High Level Architecture
The main issue is that, even though Rucio makes it possible to define policies, perform several storage operations (even in distributed environments) and connect to multiple storage systems (Amazon S3, Google, GridFTP, etc.), the location (and other information) of the source dataset must already be provided. Therefore, we have added the ‘Catalogue Connector’, a client that connects to the D&A Catalogue to retrieve the required metadata whenever the Orchestrator is about to request the ‘Data Mover’ to perform an operation. This component supplies the missing functionality, so the implementation behaves as expected.
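The role of the ‘Catalogue Connector’ can be sketched as a small client that resolves a dataset identifier into the source location the ‘Data Mover’ needs, before any transfer is requested. Everything here is hypothetical: a plain dictionary stands in for the D&A Catalogue query, and the record fields and transfer-request keys are illustrative, not Rucio's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    identifier: str
    location: str      # where the source dataset currently lives
    protocol: str      # transfer protocol the source supports
    size_bytes: int


class CatalogueConnector:
    """Hypothetical client resolving dataset identifiers via the D&A Catalogue.

    A plain dict stands in for the catalogue query so the sketch runs offline.
    """

    def __init__(self, catalogue: dict[str, DatasetRecord]) -> None:
        self._catalogue = catalogue

    def resolve(self, identifier: str) -> DatasetRecord:
        record = self._catalogue.get(identifier)
        if record is None:
            raise KeyError(f"dataset {identifier!r} not found in the catalogue")
        return record


def build_transfer_request(connector: CatalogueConnector, identifier: str,
                           destination: str) -> dict:
    """Assemble what the Orchestrator could pass to the Data Mover once the
    source location has been resolved (field names are illustrative)."""
    source = connector.resolve(identifier)
    return {
        "source": source.location,
        "protocol": source.protocol,
        "destination": destination,
        "expected_bytes": source.size_bytes,
    }


catalogue = {
    "s2-l1c-tile-33UVQ": DatasetRecord(
        "s2-l1c-tile-33UVQ", "gsiftp://storage.example.org/s2/33UVQ",
        "gridftp", 850_000_000,
    ),
}
request = build_transfer_request(CatalogueConnector(catalogue), "s2-l1c-tile-33UVQ",
                                 "s3://hpc-scratch/euxdat/33UVQ")
```

Separating lookup from transfer keeps the mover itself catalogue-agnostic: it only ever receives fully resolved source and destination locations.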
5.6 SLA Manager
The SLA Manager component, in charge of the Service Level Agreement negotiation and honouring of the contract agreed, was presented in D2.2 [6]. The high-level architecture diagram is repeated here for completeness.
Figure 10: SLA Manager High Level Architecture
As explained originally, it has an interfacing component for the negotiation and access part (SLA Negotiator), a solution for storing SLAs (SLAs Repository) and a component for monitoring that agreements are fulfilled (SLAs Monitor).
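The core check performed by the SLAs Monitor, comparing agreed guarantees against the latest monitoring values, can be sketched as follows. The guarantee structure, the metric names and the thresholds are hypothetical assumptions; a real agreement model would also cover evaluation windows and penalties.

```python
import operator
from dataclasses import dataclass

_COMPARATORS = {"<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Guarantee:
    metric: str        # name of a monitored metric
    comparator: str    # "<=" or ">="
    threshold: float


def find_violations(guarantees, metrics):
    """Return (guarantee, observed_value) pairs whose condition does not hold.

    Metrics with no value yet are skipped; a real monitor might flag them.
    """
    violations = []
    for g in guarantees:
        value = metrics.get(g.metric)
        if value is None:
            continue
        if not _COMPARATORS[g.comparator](value, g.threshold):
            violations.append((g, value))
    return violations


# Hypothetical agreement terms and monitoring values.
agreement = [
    Guarantee("availability_pct", ">=", 99.0),
    Guarantee("response_time_s", "<=", 2.0),
]
observed = {"availability_pct": 99.5, "response_time_s": 3.1}
violations = find_violations(agreement, observed)
```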
5.7 Orchestrator
The Orchestrator, at the heart of the EUXDAT e-Infrastructure, was also introduced in D2.2 [6]. Its high-level architecture has not been affected by the other modifications and extensions introduced in this document. Figure 11 thus depicts the Orchestrator’s internal sub-components, their interactions with each other and with other EUXDAT components, and is repeated as-is from D2.2.
Figure 11: Orchestrator High Level Architecture
As defined originally, it has an interfacing component (Orchestrator Interface), a component for connecting with different HPC and Cloud solutions (Infrastructure Connectors), another component keeping profiles of the applications and libraries to run (Profiles Manager), a component able to connect to several monitoring solutions (Monitoring Connector) and a central component which manages the workflow to be run, executing all the required tasks (Orchestration Engine).
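The roadmap mentions a simple algorithm for provider selection. One possible rule, combining the Profiles Manager's view of an application with the resource availability reported by monitoring, is sketched below: keep providers with enough free cores, prefer HPC when the profile asks for it, and pick the cheapest estimated run. This is a hypothetical heuristic; provider names, prices and profile fields are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    name: str
    kind: str                  # "cloud" or "hpc"
    free_cores: int            # as reported by the Monitoring component
    cost_per_core_hour: float


@dataclass(frozen=True)
class AppProfile:
    cores: int                 # as recorded by the Profiles Manager
    est_hours: float
    prefers_hpc: bool = False


def select_provider(providers, profile) -> Optional[Provider]:
    """Pick the cheapest provider able to host the job (hypothetical rule)."""
    candidates = [p for p in providers if p.free_cores >= profile.cores]
    if profile.prefers_hpc:
        # Fall back to any capable provider if no HPC system has room.
        candidates = [p for p in candidates if p.kind == "hpc"] or candidates
    if not candidates:
        return None
    return min(candidates,
               key=lambda p: p.cost_per_core_hour * profile.cores * profile.est_hours)


providers = [
    Provider("cloud-a", "cloud", 64, 0.05),
    Provider("hpc-b", "hpc", 2048, 0.03),
    Provider("cloud-c", "cloud", 16, 0.02),
]
big_job = select_provider(providers, AppProfile(cores=32, est_hours=2.0))
small_job = select_provider(providers, AppProfile(cores=8, est_hours=1.0))
```

In this example the 32-core job excludes `cloud-c` (too few free cores) and lands on `hpc-b`, while the small job goes to the cheapest option, `cloud-c`.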
5.8 Monitoring
The Monitoring component is responsible for providing:
• the Orchestrator with information regarding the current status of the system’s various resources, their CPU usage and availability, and other metrics;
• the Portal with a visualization of these metrics, for administrators and users;
• the SLA Manager with information regarding specific agreements; and
• Billing & Accounting with data that will allow this component to accurately calculate resource usage.
Figure 12: Monitoring High Level Architecture
The high-level Monitoring architecture introduced in D2.2 [6] has been augmented with the Billing & Accounting component and is depicted in Figure 12. The accompanying text in D2.2 provides further detail about this component and still stands. The only clarification needed here is that Grafana will indeed be used for visualization purposes, and Prometheus has been selected as the Monitoring Collector. It is worth mentioning that, with regard to accurately providing usage information, Prometheus (like any monitoring system we have considered) is not recommended for fine-grained accounting. Thus, a more sophisticated sub-module will possibly be required, either within the Monitoring Interface or as an intermediate component that converts monitoring metrics into accounting-ready data.
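A converter of the kind described above could, for example, turn sampled resource usage into billable units. The sketch assumes the input is a Prometheus range-query series (a list of `[unix_timestamp, "value"]` pairs, values encoded as strings) of cores in use, and integrates it into core-hours with the trapezoidal rule. The metric being sampled is an assumption, and the result is only as accurate as the scrape interval, which is precisely why such data is approximate rather than fine-grained.

```python
def core_hours_from_range(values):
    """Integrate a sampled 'cores in use' series into core-hours.

    values: list of [unix_timestamp, "value_as_string"] pairs, the shape of a
    Prometheus range query result. Uses the trapezoidal rule between samples;
    gaps in scraping are interpolated linearly.
    """
    samples = [(float(ts), float(v)) for ts, v in values]
    total_core_seconds = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        total_core_seconds += (v0 + v1) / 2.0 * (t1 - t0)
    return total_core_seconds / 3600.0


# Hypothetical series: 4 cores for 30 minutes, then ramping up to 8 cores.
series = [[0, "4"], [1800, "4"], [3600, "8"]]
core_hours = core_hours_from_range(series)
```

For the example series, the two intervals contribute 7200 and 10800 core-seconds, i.e. 5.0 core-hours in total.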
5.9 Billing & Accounting
The Billing & Accounting component is responsible for aggregating the usage of resources and services (data, algorithms, and computing resources) and handling customer payments. Figure 13 depicts the high-level architecture of the Billing & Accounting component. It is made up of the following modules:
• Accounting Interface: this is the point of access for the users. It may be incorporated into the EUXDAT Portal or be a separate view within the portal (or an iframe). The user will be able to visualize their usage of the various resources and any possible recurrent charges or services contracted from the platform.
• Accounting Aggregator: this module gathers the usage data from the Monitoring component (pertaining to computing resources) and from the Portal (pertaining to algorithms and datasets). The Aggregator interacts with the Accounting Engine which does the actual calculations and communicates this information to the User Interface and the Billing module.
• Accounting Engine: the core engine of the Billing & Accounting component, this module will keep the Accounting service running, monitor its modules and perform most of the calculations. The Engine will hold the tables of charges and be responsible for balancing the books of the EUXDAT platform as a whole.
• Billing: the actual billing will be done through the Zen Cart eCommerce platform, the chosen technology for the EUXDAT Marketplace. Zen Cart already has payment gateways built in for PayPal, LinkPoint, YourPay and others, and can connect fairly easily to live payment gateway services for credit card payments, etc.
Figure 13: Billing & Accounting High-level Architecture
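The Accounting Engine's use of a table of charges can be sketched as turning aggregated usage into priced invoice lines, which the billing module would then present for payment. The charge items, prices and currency are hypothetical; `Decimal` is used so that monetary amounts are rounded exactly rather than with binary floating point.

```python
from decimal import ROUND_HALF_UP, Decimal

# Hypothetical table of charges (price per unit, e.g. EUR).
CHARGES = {
    "core_hour": Decimal("0.05"),
    "gb_month_storage": Decimal("0.02"),
    "algorithm_run": Decimal("1.50"),
}
CENT = Decimal("0.01")


def build_invoice_lines(usage):
    """Turn {charge_item: quantity} into (item, quantity, unit_price, amount) lines."""
    lines = []
    for item, quantity in usage.items():
        unit_price = CHARGES[item]  # unknown items raise KeyError
        amount = (unit_price * Decimal(str(quantity))).quantize(CENT, rounding=ROUND_HALF_UP)
        lines.append((item, quantity, unit_price, amount))
    return lines


def invoice_total(lines):
    """Sum the amounts of all invoice lines."""
    return sum((line[3] for line in lines), Decimal("0.00"))


# Hypothetical aggregated usage for one user and billing period.
lines = build_invoice_lines({"core_hour": 5.0, "algorithm_run": 2})
total = invoice_total(lines)
```

Here 5 core-hours at 0.05 and two algorithm runs at 1.50 give a total of 3.25.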
6. EUXDAT Deployment
This section describes the EUXDAT e-Infrastructure from an operations point of view: how the building blocks of the platform (the components) will be deployed, and how it will be possible to obtain an operative production environment alongside a flexible development environment.
6.1 Deployment Infrastructure
Deployment
As already defined in D2.2 [6], the deployment infrastructure is divided into three major parts: the Portal environment, the Cloud backend and the HPC/HPDA backend. All portal-related components and global services (e.g. Monitoring) run in the Portal environment; this is the level at which all components intended to steer workflows through the EUXDAT platform reside. The actual computation is carried out in a Cloud or HPC/HPDA environment, or in all of them, depending on the workload definition and on the Orchestrator's decision, which takes into account the current status, the application profile, the load and the SLAs.
Figure 14: EUXDAT Deployment
Besides the Cloud and HPC systems, a third system for big data workloads, an HPDA system, is being introduced as an additional computation backend. It is physically located in the HPC environment and uses the same storage, but has a dedicated frontend. It is not explicitly illustrated in the figure above, as its setup is identical to that of the HPC system.
Stages
In addition to the general deployment setup, D2.2 [6] also introduced several platform stages considered necessary: a development environment, an integration environment and a production environment. These deployment stages concern only the EUXDAT components, not the computation environments, which can be considered to be in production.
Figure 15: EUXDAT Deployment Stages
As originally defined, there are three stages. The first one is the development environment, where developers deploy and test their components during the development phases; in this environment, simple test data sets serve as input for processing. As soon as the developer considers a component stable and working, it is promoted to the integration stage, where its interaction with the other platform components is tested with real data. Components that pass QA successfully are promoted to the production environment. All three stages use the same Cloud and HPC backends.
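The promotion rules above amount to a small state machine. The sketch below encodes them, assuming two hypothetical flags: one set when the developer considers a component stable, and one set when QA succeeds on the integration stage. A component advances one stage at a time and can never skip the integration stage.

```python
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = 1   # simple test data sets
    INTEGRATION = 2   # interaction with other components, real data
    PRODUCTION = 3


def next_stage(current: Stage, developer_marked_stable: bool, qa_passed: bool) -> Stage:
    """Return the stage a component should be in after evaluating its (hypothetical) checks."""
    if current is Stage.DEVELOPMENT and developer_marked_stable:
        return Stage.INTEGRATION
    if current is Stage.INTEGRATION and qa_passed:
        return Stage.PRODUCTION
    # In every other case the component stays where it is.
    return current
```

Keeping the rules in one function makes it explicit that passing QA has no effect while a component is still in development.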
API Development
The latest changes to the development workflow, aimed at fully exploiting the three stages for API development as well, concern the remote server queried by the PyNotebooks. This new feature also affects the deployment of the e-Infrastructure. An environment variable, $REMOTE_HOST, is foreseen to control the stage (development, integration or production) to which the RESTful API queries are sent.
Figure 16: EUXDAT API Development
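A minimal sketch of how a PyNotebook could honour $REMOTE_HOST, assuming the variable holds the base URL of the RESTful API on the selected stage. The fallback URL and the endpoint path used below are hypothetical; this document does not define them.

```python
import os
from urllib.parse import urljoin

# Hypothetical fallback used when $REMOTE_HOST is unset; the actual default,
# if any, is not defined in this document.
DEFAULT_REMOTE_HOST = "http://localhost:8080"


def api_url(endpoint: str) -> str:
    """Build the URL of an API endpoint on the stage selected through $REMOTE_HOST."""
    base = os.environ.get("REMOTE_HOST", DEFAULT_REMOTE_HOST)
    if not base.endswith("/"):
        base += "/"  # keep any base path when joining the relative endpoint
    return urljoin(base, endpoint.lstrip("/"))
```

Switching a notebook from one stage to another then only requires changing the environment variable: with $REMOTE_HOST set to a (hypothetical) `https://integration.example.org/api`, `api_url("/datasets")` yields `https://integration.example.org/api/datasets`, while the notebook code itself stays unchanged.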
Services
For the deployment, we continue to propose the solution described in D2.2 [6]: a Git code review tool (Gerrit), a code repository (GitLab) and a continuous integration tool (Jenkins). In combination, these tools provide the services required to automatically manage commits and their deployment onto the corresponding stage.
Figure 17: EUXDAT Deployment Services
The process is the same as described in D2.2 [6]. First, an update is committed to Gerrit, where the code review takes place. Then, the code is merged and stored in the GitLab repository. Jenkins is triggered each time a commit reaches the repository, and the code is deployed on the corresponding stage (development, integration or production). Furthermore, Jenkins enables automated testing by executing regression test suites automatically after each component deployment. These tests produce logs that can be investigated in case of issues. The latest enhancements foresee coupling the development and integration stages with Kubernetes, so that Docker containers hosting EUXDAT Portal components can be deployed on them, and using Jenkins to steer deployments on the integration stage. Additionally, Jenkins is intended to take over the build process of the (central) components as well as of the application code to be executed on backend resources. The output of Jenkins will be Docker containers for the Cloud hosting and computation environment, and native binaries for the HPC/HPDA systems.
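The commit-to-deployment flow described above can be summarised as a small simulation. It assumes a hypothetical `Commit` record carrying the component name, its target environment, the stage to deploy to and the outcome of the Gerrit review. It does not call Gerrit, GitLab, Jenkins or Kubernetes; it only encodes the order of the steps and the rule that Jenkins produces Docker images for the Cloud and native binaries for HPC/HPDA. The regression suite is passed in as a callable because the actual test suites are not described here.

```python
from dataclasses import dataclass, field
from typing import Callable

# Artifact produced by the build step for each target environment.
ARTIFACT_BY_TARGET = {
    "cloud": "docker-image",
    "hpc": "native-binary",
    "hpda": "native-binary",
}


@dataclass
class Commit:
    component: str
    target: str            # "cloud", "hpc" or "hpda" (hypothetical field)
    stage: str             # "development", "integration" or "production"
    review_approved: bool  # outcome of the Gerrit code review


@dataclass
class PipelineResult:
    steps: list[str] = field(default_factory=list)
    deployed: bool = False


def run_pipeline(commit: Commit, regression_tests: Callable[[Commit], bool]) -> PipelineResult:
    """Simulate review -> merge -> build -> deploy -> regression tests for one commit."""
    result = PipelineResult()
    if not commit.review_approved:
        result.steps.append("rejected in Gerrit review")
        return result
    result.steps.append("merged into GitLab")
    artifact = ARTIFACT_BY_TARGET[commit.target]
    result.steps.append(f"Jenkins built {artifact} for {commit.component}")
    result.steps.append(f"deployed on {commit.stage}")
    result.deployed = True
    # Regression suites run automatically after deployment; failures leave logs to investigate.
    passed = regression_tests(commit)
    result.steps.append("regression tests passed" if passed else "regression tests failed, see logs")
    return result
```

Modelling the regression suite as an injected callable keeps the sketch independent of any particular test framework, mirroring the fact that each component may bring its own tests.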
6.2 Components Deployment
The deployment of the components described in this document follows the solution proposed in D2.2 [6]. Every component will be deployed as a Docker microservice running in VMs, with Kubernetes used to manage the containers. The best way to perform the deployment still has to be determined by analysing how each component works, including its behaviour and performance. Ideally, each central component (which might require more resources than others) would be deployed in its own VM, but some components can be deployed together in order to save resources and minimize communication overhead during their interaction. This choice has a direct impact on the operational cost of the e-Infrastructure. As exemplified in D2.2 [6], the SLA Manager, the Orchestrator and Monitoring could be deployed together or, at least, on the same physical machine, since they interact closely and the SLA Manager and the Orchestrator are not expected to be active continuously. The D&A Repository and the D&A Catalogue could likewise share a VM with enough storage capacity for the repository, since the D&A Catalogue reflects the contents of the repository and the two collaborate closely. The Data Manager (based on Rucio) could be deployed next to them as well, due to its relationship with the repository. The I&A Manager has been deployed alone, although it might be co-located with another component. For instance, it could be deployed together with the EUXDAT Portal (given the Portal's user management feature), although the Portal may need to scale if many users try to access it at the same time (since it will include a web server and other tools). The selection of the tools to be used for each component still has to be finalized, and we also need to understand how the selected tools perform and scale up/down. Consequently, the current configuration may not be optimal, and the deployment will be analysed again once a new version of the detailed design and its implementation is available. Unlike the platform components, the applications that process data on the computation backends require a different approach, due to the nature of the different environments: while Docker containers can be deployed in the Cloud, the HPC and HPDA systems require the compilation of native binaries. During development, developers steer the deployment of their components manually with the help of Kubernetes.
Jenkins also uses Kubernetes to update components on the development and integration stages each time a new stable version becomes available and the corresponding containers and/or binaries have been built.
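As a minimal sketch, the co-location options discussed in this section can be captured as a declarative placement plan and checked mechanically before VMs are provisioned. The VM names, the separate Portal VM and the helper functions are hypothetical; the groupings follow the options listed above (SLA Manager, Orchestrator and Monitoring together; D&A Repository, D&A Catalogue and Data Manager together; I&A Manager alone) and are not a final EUXDAT configuration.

```python
# Hypothetical placement plan: VM name -> components hosted on it.
PLACEMENT: dict[str, list[str]] = {
    "vm-control": ["SLA Manager", "Orchestrator", "Monitoring"],
    "vm-data": ["D&A Repository", "D&A Catalogue", "Data Manager"],
    "vm-iam": ["I&A Manager"],
    "vm-portal": ["EUXDAT Portal"],
}


def validate_placement(placement: dict[str, list[str]], components: set[str]) -> None:
    """Raise ValueError unless every component is placed on exactly one VM."""
    seen: dict[str, str] = {}
    for vm, hosted in placement.items():
        for comp in hosted:
            if comp in seen:
                raise ValueError(f"{comp} placed on both {seen[comp]} and {vm}")
            seen[comp] = vm
    missing = components - set(seen)
    if missing:
        raise ValueError(f"unplaced components: {sorted(missing)}")


def colocated(placement: dict[str, list[str]], a: str, b: str) -> bool:
    """Return True if components a and b share a VM in the given plan."""
    return any(a in hosted and b in hosted for hosted in placement.values())
```

Keeping the plan as data makes it easy to revisit once the performance and scaling behaviour of the selected tools is known, for example moving the I&A Manager next to the Portal only requires editing the dictionary.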
7. Conclusions
Based on the previous high-level architecture, on the updated requirements and on the experience collected during the implementation of the first version of the EUXDAT e-Infrastructure, this document presents an update of the EUXDAT high-level architecture and features. The document analyses the changes identified in the collected requirements, highlighting whether new features had to be added. Again, this top-down approach has shown that the features identified in the first version are still valid, and only a few new features had to be included. Most of the new requirements were already covered by the originally proposed features, showing that the first design was already very complete. The high-level architecture has been modified by adding a new module for accounting and billing, which shows that the original architecture, while very complete, was flexible enough to include new features as needed. The proposed changes also required modifications to one of the diagrams defining the interactions among components. In addition, new interaction diagrams were proposed to make it clearer how the components behave for different features. With this clearer overall picture, it has also been possible to update the high-level design of the components, introducing the required changes and taking into account the tools already selected for implementing them. The designs are now much more in line with the implementation and with the real possibilities of EUXDAT. The deployment solution has also been updated, especially taking into account the approach proposed in the context of WP3. In any case, the designs and the deployment are only the basis for further development in WP3 and WP4. Finally, there is still room for improvement and adaptation depending on the next implementation and the arrival of new requirements (if any).
The architecture has already demonstrated its flexibility and completeness, so it will be possible to perform any required adaptation without major changes.
8. References
[1] EUXDAT; “D2.1 Description of Proposed Pilots and Requirements”; Jedlička, Karel et al; 2018.
[2] EUXDAT; “European e-Infrastructure for Extreme Data Analytics in Sustainable Development (EUXDAT) Grant Agreement”; Nieto, Francisco Javier; 2017.
[3] SDI4Apps; Open Land Use Map; http://sdi4apps.eu/open_land_use/; retrieved 2019-02-25
[4] Copernicus; Copernicus Data Access; http://copernicus.eu/data-access; retrieved 2019-02-25
[5] OASIS; TOSCA Simple Profile in YAML Version 1.1; http://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.1/TOSCA-Simple-Profile-YAML-v1.1.html; 2018-01-30; retrieved 2018-05-25
[6] EUXDAT; “D2.2 EUXDAT e-Infrastructure Definition”; Nieto, F. Javier et al; 2018.
[7] EUXDAT; “D2.3 Updated Report on e-Infrastructure Requirements v1”; Jedlička, Karel et al; 2018.
[8] EUXDAT; “D3.1 Detailed Specification of the End Users’ Platform v1”; Castel, Fabien et al; 2018.