Efficient and effective: can we combine both to realize high-value, open, scalable, multi-disciplinary data and compute infrastructures?
RIA-653549
Davide Salomoni, INDIGO-DataCloud Project Coordinator
FAIR data management, RDA National Event, Firenze, 14-15 November 2016
Efficient and Effective
14-15/11/2016 Efficient and Effective: INDIGO-DataCloud 2
Something is still missing in the Cloud world…
Source: http://goo.gl/wT8XEq
What are the main missing points?
• Open interoperation / federation across (proprietary) Cloud solutions at
  • IaaS,
  • PaaS,
  • and SaaS levels
• Managing multitenancy
  • At large scale…
  • … and in heterogeneous environments
• Dynamic and seamless elasticity
  • For both private and public cloud…
  • … and for complex or infrequent requirements
• Data management in a Cloud environment
  • Due to technical…
  • … as well as to legal problems
Filling these gaps should lead to:
• Interoperable PaaS/SaaS services addressing both public and private Cloud infrastructures.
• Migration of legacy applications to the Cloud.
• Increased focus on user-oriented, high-value solutions.
Source: https://goo.gl/cWZhKN
INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation)
• An H2020 project approved in the EINFRA-1-2014 call
  • 11.1M€, 30 months (from April 2015 to September 2017)
• Who: 26 European partners in 11 European countries
  • Coordination by INFN (Italian National Institute for Nuclear Physics)
  • Including developers of distributed software, industrial partners, research institutes, universities, e-infrastructures
• What: develop an open source Cloud platform for computing and data (“DataCloud”), tailored to science.
• Where: deployable on hybrid (public or private) Cloud infrastructures
• For: multi-disciplinary scientific communities
  • E.g. structural biology, earth science, physics, bioinformatics, cultural heritage, astrophysics, life science, climatology.
• Why: to answer the technological needs of scientists seeking to easily and efficiently exploit distributed compute and data resources.
INDIGO-DataCloud Positioning
• INDIGO aims to:
1. Develop open, interoperable solutions for scientific data.
2. Support open science organizing the European data space.
3. Enable collaborations across diverse scientific communities worldwide.
• INDIGO offers its architecture, analysis, expertise and software components as a concrete step toward the definition and implementation of a European Open Science Cloud and Data Infrastructure.
[Diagram: Scientific Users adopt and use INDIGO Advanced Components and Solutions, deployed on publicly funded e-infrastructures (EGI, EUDAT, GEANT, PRACE, RI, etc.) and on private or commercial Clouds (public, PCP-based, etc.), exploiting Datasets and Resources to produce Scientific Results.]
The INDIGO Foundations
• Put Users First
• Exploit Software Development Know-how
• Fill Technology Gaps
• Validate through Concrete Use Cases
• Extend and Reuse Open Source Software
• Be Multidisciplinary, Standards-based
Put Users First
• Requirements come from research communities
  • “The proposal is oriented to support the use of different e-infrastructures by a wide-range of scientific communities, and aims to address a wide range of challenging requirements posed by leading-edge research activities” (From the DoW)
• We gathered use cases from many scientific communities.
• LifeWatch, EuroBioImaging, INSTRUCT, LBT, CTA, WeNMR, ENES, eCulture Science Gateway, ELIXIR, EMSO, DARIAH, WLCG.
• We grouped ~100 distinct requirements into 3 categories: Computational requirements, Storage requirements, Requirements on infrastructures, and associated each one with a ranking (mandatory / convenient / optional).
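As an illustration only, the grouping described above (three categories, each requirement ranked mandatory / convenient / optional) could be recorded with a structure like the following. This is a minimal sketch: the class names, field names and the sample entries are hypothetical and are not taken from Deliverable D2.1.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    COMPUTATIONAL = "Computational requirements"
    STORAGE = "Storage requirements"
    INFRASTRUCTURE = "Requirements on infrastructures"


class Ranking(Enum):
    MANDATORY = "mandatory"
    CONVENIENT = "convenient"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class Requirement:
    community: str      # placeholder name, not a real community
    description: str    # hypothetical text, not quoted from D2.1
    category: Category
    ranking: Ranking


def mandatory_by_category(requirements):
    """Count the mandatory requirements in each category."""
    return Counter(
        r.category for r in requirements if r.ranking is Ranking.MANDATORY
    )


# Hypothetical entries, for illustration only.
EXAMPLES = [
    Requirement("Community A", "Run containerized applications",
                Category.COMPUTATIONAL, Ranking.MANDATORY),
    Requirement("Community B", "Access data through a POSIX-like interface",
                Category.STORAGE, Ranking.CONVENIENT),
    Requirement("Community C", "Federated authentication",
                Category.INFRASTRUCTURE, Ranking.MANDATORY),
]

if __name__ == "__main__":
    for category, count in mandatory_by_category(EXAMPLES).items():
        print(f"{category.value}: {count} mandatory")
```

Keeping category and ranking as closed enumerations makes it easy to check that every gathered requirement falls into exactly one of the three categories and carries exactly one ranking.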
From Deliverable D2.1
Translating requirements into concrete solutions:
From the architecture…
This is the INDIGO-DataCloud General Architecture*
*: see details in http://arxiv.org/abs/1603.09536 or in https://www.indigo-datacloud.eu/documents-deliverables
… to the implementation…
This is our software improvement cycle and the integration / release / software quality processes
… to INDIGO Releases…
Releasing software components implementing the INDIGO architecture and providing concrete solutions to the requirements of scientific communities is the primary goal of the project.
See https://www.indigo-datacloud.eu/communication-kit
… and results.
Excerpt from an INDIGO Report detailing how scientific communities are implementing their own requirements into applications using INDIGO-DataCloud components.
From Deliverable D2.10
Four main “solution blocks”:
• Data Center Solutions
• Data / Storage Solutions
• Automated Solutions
• User-Oriented Solutions
And “common solutions”:
• Authentication and Authorization
Putting everything together:
Index of Services in INDIGO MidnightBlue
INDIGO Components and Patches Already Merged in Upstream Open Source Projects
• OpenStack (https://www.openstack.org)
  • Nova Docker
  • Heat
  • OpenID-Connect for Keystone
  • Pre-emptible instances support (under discussion)
• OpenNebula (http://opennebula.org)
  • OneDock
• Infrastructure Manager (http://www.grycap.upv.es/im/index.php)
• Clues (http://www.grycap.upv.es/clues/eng/index.php)
• Onedata (https://onedata.org)
• TOSCA adaptor for JSAGA (http://software.in2p3.fr/jsaga/dev/)
• OCCI implementation for OpenStack (https://github.com/openstack/ooi)
• Extended AWS support for rOCCI in OpenNebula. Python and Java libraries for OCCI support.
• CDMI and QoS extensions for dCache (https://www.dcache.org)
• Workflow interface extensions for Ophidia (http://ophidia.cmcc.it)
• OpenID Connect Java implementation for dCache (https://www.dcache.org)
• MitreID (https://mitreid.org/) and OpenID Connect (http://openid.net/connect/) libraries
On Data Ingestion and Data Management
• Constantly align the vision of the different research communities with current recommendations, in particular of the Research Data Alliance (RDA).
• The exploitation of INDIGO-DataCloud solutions requires a careful consideration of data management issues along the full data life cycle to prepare proper Data Management Plans (DMP).
• Further work is certainly needed to inform the different research communities of current recommendations on data management and of the need to take them carefully into account, and to detail those data management needs as requirements for software developers.
• Most of the initial requirements have already been satisfied in the INDIGO MidnightBlue release! However, more work is needed in many areas.
6 INDIGO-related proposals submitted to the RDA open call for collaboration
Data Life Cycle
Plan
• Tool
• Deploy and Store
Collect
• Store raw data
• Manage raw data
Curate
• Filtering
• Conversions
Analyze
• Get derived values
• Monitor
• Run models, etc.
Publish
• Findable
• Accessible
• Interoperable
• Reusable
Preserve
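The life cycle stages listed above can be sketched as an ordered enumeration with the activities attached to each stage. This is a rough illustration, not an INDIGO component: the names `Stage`, `ACTIVITIES` and `next_stage` are hypothetical, the grouping of activities under stages is read from the slide, and the strictly linear ordering is an assumption (a real life cycle may loop back).

```python
from enum import Enum


class Stage(Enum):
    # Order follows the slide; a strictly linear flow is an assumption.
    PLAN = 1
    COLLECT = 2
    CURATE = 3
    ANALYZE = 4
    PUBLISH = 5
    PRESERVE = 6


# Activities per stage, as read from the slide.
ACTIVITIES = {
    Stage.PLAN: ["Tool", "Deploy and Store"],
    Stage.COLLECT: ["Store raw data", "Manage raw data"],
    Stage.CURATE: ["Filtering", "Conversions"],
    Stage.ANALYZE: ["Get derived values", "Monitor", "Run models"],
    Stage.PUBLISH: ["Findable", "Accessible", "Interoperable", "Reusable"],
    Stage.PRESERVE: [],  # no activities are listed for this stage
}


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    stages = list(Stage)  # Enum members iterate in definition order
    position = stages.index(stage)
    if position + 1 < len(stages):
        return stages[position + 1]
    return None


if __name__ == "__main__":
    stage = Stage.PLAN
    while stage is not None:
        print(stage.name.title(), "->", ", ".join(ACTIVITIES[stage]) or "(none listed)")
        stage = next_stage(stage)
```

A Data Management Plan could use such a table as a checklist, confirming that each stage has an owner and a tool before data ingestion starts.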
Ingested Data in the Life Cycle scheme (vastly simplified)
For more details: D2.11, https://owncloud.indigo-datacloud.eu/index.php/s/lLNAczJNBNLmLLG
Conclusions
• It is often complicated to combine efficient and effective solutions when trying to exploit distributed data/compute resources.
• INDIGO-DataCloud has defined and developed a comprehensive open architecture to handle distributed data and workloads, extending open source products.
• It has already released a novel and rich set of components that multiple research communities are adopting for the deployment of scientific applications on hybrid Grid/Cloud infrastructures.
• INDIGO-DataCloud will now focus on consolidating its software, adding requested new features, deploying it in production e-infrastructures and addressing exploitation through concrete links to commercial companies, to other projects or organizations and to current / upcoming EU calls.
• You are all welcome to contribute and share your views and requirements!
Thank you
https://www.indigo-datacloud.eu
Better Software for Better Science.
@indigodatacloud www.indigo-datacloud.eu https://www.facebook.com/indigodatacloud/