Efficient and effective: can we combine both to realize high-value, open, scalable, multi-disciplinary data and compute infrastructures? RIA-653549. Davide Salomoni, INDIGO-DataCloud Project Coordinator, [email protected]. FAIR data management, RDA National Event, Firenze, 14-15 November 2016.


Page 1

Efficient and effective: can we combine both to realize high-value, open, scalable, multi-disciplinary data and compute infrastructures?

RIA-653549

Davide Salomoni
INDIGO-DataCloud Project Coordinator

[email protected]

FAIR data management, RDA National Event
Firenze, 14-15 November 2016

Page 2

Efficient and Effective

14-15/11/2016 Efficient and Effective: INDIGO-DataCloud 2

Page 3

Something is still missing in the Cloud world…

Source: http://goo.gl/wT8XEq

was

Page 4

What are the main missing points?

• Open interoperation / federation across (proprietary) Cloud solutions at
  • IaaS,
  • PaaS,
  • and SaaS levels
• Managing multitenancy
  • At large scale…
  • … and in heterogeneous environments
• Dynamic and seamless elasticity
  • For both private and public cloud…
  • … and for complex or infrequent requirements
• Data management in a Cloud environment
  • Due to technical…
  • … as well as to legal problems

Filling these gaps should lead to:

• Interoperable PaaS/SaaS services addressing both public and private Cloud infrastructures.
• Migration of legacy applications to the Cloud.
• Increased focus on user-oriented, high-value solutions.

Source: https://goo.gl/cWZhKN

Page 5

INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation)

• An H2020 project approved in the EINFRA-1-2014 call
  • 11.1M€, 30 months (from April 2015 to September 2017)
• Who: 26 European partners in 11 European countries
  • Coordination by INFN (Italian National Institute for Nuclear Physics)
  • Including developers of distributed software, industrial partners, research institutes, universities, e-infrastructures
• What: develop an open source Cloud platform for computing and data (“DataCloud”), tailored to science.
• Where: deployable on hybrid (public or private) Cloud infrastructures
• For: multi-disciplinary scientific communities
  • E.g. structural biology, earth science, physics, bioinformatics, cultural heritage, astrophysics, life science, climatology.
• Why: to answer the technological needs of scientists seeking to easily and efficiently exploit distributed compute and data resources.

Page 6

INDIGO-DataCloud Positioning

• INDIGO aims to:
  1. Develop open, interoperable solutions for scientific data.
  2. Support open science organizing the European data space.
  3. Enable collaborations across diverse scientific communities worldwide.
• INDIGO offers its architecture, analysis, expertise and software components as a concrete step toward the definition and implementation of a European Open Science Cloud and Data Infrastructure.

[Diagram: Scientific Users adopt and use INDIGO Advanced Components and Solutions, deployed on publicly funded e-infrastructures (EGI, EUDAT, GEANT, PRACE, RI, etc.) and on private or commercial Clouds (public, PCP-based, etc.), exploiting Datasets and Resources to produce Scientific Results.]

Page 7

The INDIGO Foundations

• Put Users First
• Exploit Software Development Know-how
• Fill Technology Gaps
• Validate through Concrete Use Cases
• Extend and Reuse Open Source Software
• Be Multidisciplinary, Standards-based

Page 8

Put Users First

• Requirements come from research communities
  • “The proposal is oriented to support the use of different e-infrastructures by a wide-range of scientific communities, and aims to address a wide range of challenging requirements posed by leading-edge research activities” (From the DoW)
• We gathered use cases from many scientific communities.
  • LifeWatch, EuroBioImaging, INSTRUCT, LBT, CTA, WeNMR, ENES, eCulture Science Gateway, ELIXIR, EMSO, DARIAH, WLCG.
• We grouped ~100 distinct requirements into 3 categories (Computational requirements, Storage requirements, Requirements on infrastructures) and associated each one with a ranking (mandatory / convenient / optional).

From Deliverable D2.1
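The grouping of requirements by category and ranking can be pictured as a small data structure. The Python sketch below is illustrative only: the `Requirement` class, the helper functions, the community names and the example entries are all hypothetical and are not taken from Deliverable D2.1. Only the three categories and the three ranking levels come from the slide.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # The three categories named on the slide.
    COMPUTATIONAL = "Computational requirements"
    STORAGE = "Storage requirements"
    INFRASTRUCTURE = "Requirements on infrastructures"


class Ranking(Enum):
    # The three ranking levels named on the slide.
    MANDATORY = "mandatory"
    CONVENIENT = "convenient"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record for one gathered requirement."""
    community: str
    description: str
    category: Category
    ranking: Ranking


# Invented example entries, for illustration only.
REQUIREMENTS = [
    Requirement("Community A", "Run containerized applications",
                Category.COMPUTATIONAL, Ranking.MANDATORY),
    Requirement("Community A", "Access to accelerators",
                Category.COMPUTATIONAL, Ranking.OPTIONAL),
    Requirement("Community B", "Federated access to distributed storage",
                Category.STORAGE, Ranking.MANDATORY),
    Requirement("Community C", "Single sign-on across sites",
                Category.INFRASTRUCTURE, Ranking.CONVENIENT),
]


def count_by_category(requirements):
    """Count how many requirements fall into each category."""
    return Counter(r.category for r in requirements)


def mandatory(requirements):
    """Return only the requirements ranked as mandatory."""
    return [r for r in requirements if r.ranking is Ranking.MANDATORY]


if __name__ == "__main__":
    for category, count in count_by_category(REQUIREMENTS).items():
        print(f"{category.value}: {count}")
    print("Mandatory:", [r.description for r in mandatory(REQUIREMENTS)])
```

Keeping the ranking as an explicit field makes it straightforward to filter for the mandatory items first when prioritizing development work.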

Page 9

Translating requirements into concrete solutions: from the architecture…

This is the INDIGO-DataCloud General Architecture*

*: see details in http://arxiv.org/abs/1603.09536 or in https://www.indigo-datacloud.eu/documents-deliverables

Page 10

… to the implementation…

This is our software improvement cycle and the integration / release / software quality processes

Page 11

… to INDIGO Releases…

Releasing software components implementing the INDIGO architecture and providing concrete solutions to the requirements of scientific communities is the primary goal of the project.

See https://www.indigo-datacloud.eu/communication-kit

Page 12

… and results.

Excerpt from an INDIGO Report detailing how scientific communities are implementing their own requirements into applications using INDIGO-DataCloud components.

From Deliverable D2.10

Page 13

Putting everything together:

Four main “solution blocks”:
• Data Center Solutions
• Data / Storage Solutions
• Automated Solutions
• User-Oriented Solutions

And “common solutions”:
• Authentication and Authorization

Page 14

Index of Services in INDIGO MidnightBlue

Page 15

INDIGO Components and Patches Already Merged in Upstream Open Source Projects

• OpenStack (https://www.openstack.org)
  • Nova Docker
  • Heat
  • OpenID-Connect for Keystone
  • Pre-emptible instances support (under discussion)
• OpenNebula (http://opennebula.org)
  • OneDock
• Infrastructure Manager (http://www.grycap.upv.es/im/index.php)
• Clues (http://www.grycap.upv.es/clues/eng/index.php)
• Onedata (https://onedata.org)
• TOSCA adaptor for JSAGA (http://software.in2p3.fr/jsaga/dev/)
• OCCI implementation for OpenStack (https://github.com/openstack/ooi)
• Extended AWS support for rOCCI in OpenNebula. Python and Java libraries for OCCI support.
• CDMI and QoS extensions for dCache (https://www.dcache.org)
• Workflow interface extensions for Ophidia (http://ophidia.cmcc.it)
• OpenID Connect Java implementation for dCache (https://www.dcache.org)
• MitreID (https://mitreid.org/) and OpenID Connect (http://openid.net/connect/) libraries

Page 16

On Data Ingestion and Data Management

• Constantly align the vision of the different research communities with current recommendations, in particular of the Research Data Alliance (RDA).

• The exploitation of INDIGO-DataCloud solutions requires a careful consideration of data management issues along the full data life cycle to prepare proper Data Management Plans (DMP).

• Further work is needed to inform the different research communities of current recommendations on data management and of the need to take them carefully into account, and to detail their data management needs as requirements for software developers.

• Most of the initial requirements have already been satisfied in the INDIGO MidnightBlue release! However, more work is needed in many areas.

6 INDIGO-related proposals submitted to the RDA open call for collaboration

Page 17

Data Life Cycle

• Plan
  • Tool
  • Deploy and Store
• Collect
  • Store raw data
  • Manage raw data
• Curate
  • Filtering
  • Conversions
• Analyze
  • Get derived values
  • Monitor
  • Run models, etc.
• Publish
  • Findable
  • Accessible
  • Interoperable
  • Reusable
• Preserve
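The six stages named on this slide (Plan, Collect, Curate, Analyze, Publish, Preserve) can be sketched as an ordered sequence, with the four FAIR properties acting as a check before publication. The Python below is a minimal illustration under stated assumptions: the `Stage` enum, `next_stage` and `missing_fair_criteria` are hypothetical names, a strictly linear progression is assumed for simplicity, and none of this is INDIGO-DataCloud code.

```python
from enum import Enum


class Stage(Enum):
    # Stage names as listed on the slide; the numeric order is an assumption.
    PLAN = 1
    COLLECT = 2
    CURATE = 3
    ANALYZE = 4
    PUBLISH = 5
    PRESERVE = 6


# The four FAIR properties listed under the Publish stage.
FAIR_CRITERIA = ("findable", "accessible", "interoperable", "reusable")


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(stage)
    return stages[index + 1] if index + 1 < len(stages) else None


def missing_fair_criteria(checks):
    """Return the FAIR criteria not marked True in `checks`.

    `checks` is a hypothetical dict such as {"findable": True, ...}.
    """
    return [c for c in FAIR_CRITERIA if not checks.get(c, False)]


if __name__ == "__main__":
    stage = Stage.PLAN
    while stage is not None:
        print(stage.name.capitalize())
        stage = next_stage(stage)
    print(missing_fair_criteria({"findable": True, "accessible": True}))
```

In practice a life cycle is rarely strictly linear (analysis often feeds back into curation), so the linear `next_stage` helper is a simplification.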

Page 18

Ingested Data in the Life Cycle scheme (vastly simplified)

For more details: D2.11, https://owncloud.indigo-datacloud.eu/index.php/s/lLNAczJNBNLmLLG

Page 19

Conclusions

• It is often complicated to combine efficient and effective solutions when trying to exploit distributed data/compute resources.
• INDIGO-DataCloud has defined and developed a comprehensive open architecture to handle distributed data and workloads, extending open source products.
• It has already released a novel and rich set of components that multiple research communities are adopting for the deployment of scientific applications on hybrid Grid/Cloud infrastructures.
• INDIGO-DataCloud will now focus on consolidating its software, adding requested new features, deploying it in production e-infrastructures and addressing exploitation through concrete links to commercial companies, to other projects or organizations and to current / upcoming EU calls.
• You are all welcome to contribute and share your views and requirements!

Page 20

Thank you

https://www.indigo-datacloud.eu

Better Software for Better Science.

@indigodatacloud www.indigo-datacloud.eu https://www.facebook.com/indigodatacloud/