www.d4science.org D4SCIENCE DATA INFRASTRUCTURE: Facilitator for a FAIR data management. Pasquale Pagano, CNR – ISTI (Pisa, Italy)

D4Science Data infrastructure: a facilitator for a FAIR data management


Page 1: D4Science Data infrastructure: a facilitator for a FAIR data management

www.d4science.org

D4SCIENCE DATA INFRASTRUCTURE Facilitator for a FAIR data management

Pasquale Pagano, CNR – ISTI (Pisa, Italy)

Page 2: D4Science Data infrastructure: a facilitator for a FAIR data management


Outline

Context

Requirements

Virtual Research Environments

Dealing with complexity

FAIR principles

Conclusions

Page 3: D4Science Data infrastructure: a facilitator for a FAIR data management


D4Science is a hybrid data infrastructure: technologies integrated to provide elastic access to and usage of data and data-management capabilities

• +55 VREs hosted
• +2500 scientists in 44 countries
• +50 data providers
• +25,000 derivative data/month
• over a billion quality records
• +20,000 temporal datasets
• +50,000 spatial datasets
• 99.7% service availability

Humanities and Cultural Heritage

Social Mining

Environmental Studies

Biological and Ecological Studies

Page 4: D4Science Data infrastructure: a facilitator for a FAIR data management

Communities’ needs

Not individual researchers but groups of researchers, dynamically aggregated to address research questions/problems. These groups:

• are multidisciplinary and involve members belonging to diverse organisations

• cannot rely on costly environments managed by dedicated organisations

• need to access data and services that are spread among many providers

• build and operate their own supporting environments

• wish to effectively inject open science into daily tasks

The cost and time required to implement this approach largely exceed the available capacities.

Page 5: D4Science Data infrastructure: a facilitator for a FAIR data management


Requirements for IT systems

Support collaborative research and experimentation

Implement Reproducibility-Repeatability-Reusability

Allow sharing data and findings

Grant open access to produced scientific knowledge and data

Provide simplified access to existing computing and storage resources

Ensure low operational and maintenance costs

Manage heterogeneous data access policies


Page 6: D4Science Data infrastructure: a facilitator for a FAIR data management


Virtual Research Environment

An operational environment where a set of resources (data, services, computational and storage resources) is assigned to a group of users via interfaces for a limited timeframe

L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12

Created on demand

Regulated by tailored policies

No cost for the resource providers

Open to host and operate custom software


Page 7: D4Science Data infrastructure: a facilitator for a FAIR data management


D4Science Geospatial Interpolation

In situ observations from Copernicus Marine Environment Monitoring Service

Interpolation service: SeaDataNet Data-Interpolating Variational Analysis service (DIVA)

Estimates global, uniform distributions of environmental parameters from scattered observations

Exploit the global estimate and run niche modelling to calculate a species distribution
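To make the idea of estimating a uniform field from scattered observations concrete, here is a minimal sketch. It deliberately swaps in inverse-distance weighting, a much simpler technique than the variational analysis DIVA performs, and it is not how the D4Science service is implemented. The function names `idw` and `grid`, the planar distance, and the sample observations are all assumptions made for illustration.

```python
import math


def idw(observations, lon, lat, power=2.0):
    """Inverse-distance-weighted estimate at (lon, lat).

    observations: list of (lon, lat, value) tuples (hypothetical layout).
    """
    num = den = 0.0
    for olon, olat, value in observations:
        d = math.hypot(lon - olon, lat - olat)  # planar distance, for brevity
        if d == 0.0:
            return value  # the point coincides with an observation
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def grid(observations, west, south, east, north, step):
    """Estimate the parameter on a regular lon/lat grid, row by row."""
    rows = []
    lat = south
    while lat <= north + 1e-9:
        row = []
        lon = west
        while lon <= east + 1e-9:
            row.append(idw(observations, lon, lat))
            lon += step
        rows.append(row)
        lat += step
    return rows


# Two made-up in situ observations: (lon, lat, value).
obs = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
field = grid(obs, west=0.0, south=0.0, east=2.0, north=1.0, step=1.0)
```

The resulting regular grid is the kind of uniform input a niche-modelling step could then consume in place of the scattered points.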

Page 8: D4Science Data infrastructure: a facilitator for a FAIR data management

The SeaDataNet-D4Science Connector Architecture

[Architecture diagram. Recoverable labels: User, Other user, Input, Data preparation, Comp. parameters, NetCDF file, WPS, REST, VRE, Workspace, Out. file, Provenance Metadata (Prov-O), Sharing, Publication, Geospatial data infra., WMS, WCS, GeoTiff, NetCDF, OPeNDAP, OGC Standards, Visualisation]

Page 9: D4Science Data infrastructure: a facilitator for a FAIR data management

Findable

• F1. globally unique and eternally persistent identifier
• F2. rich metadata
• F3. indexed in a searchable resource
• F4. metadata specify the data identifier

Accessible

• A1. retrievable by their identifier using a standardized protocol
• A1.1. the protocol is open, free, and universally implementable
• A1.2. the protocol allows for an authentication and authorization procedure
• A2. metadata are accessible, even when the data are no longer available

Interoperable

• I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
• I2. (meta)data use vocabularies that follow FAIR principles
• I3. (meta)data include qualified references to other (meta)data

Re-usable

• R1. (meta)data have a plurality of accurate and relevant attributes
• R1.1. (meta)data are released with a clear and accessible data usage license
• R1.2. (meta)data are associated with their provenance
• R1.3. (meta)data meet domain-relevant community standards

Page 10: D4Science Data infrastructure: a facilitator for a FAIR data management


D4Science: Findability

Findability is enabled

• by extending the concept of resources to datasets, methods/algorithms, research objects, and services

• by assigning to each of the D4Science managed resources
  • a unique identifier
  • rich and extensible metadata (including attribution, provenance and licence information)

• by publishing resources in tailored and global catalogues that support keyword, faceted and temporal/geospatial discovery
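The three points above can be sketched as a tiny in-memory catalogue. This is a minimal illustration, not the D4Science catalogue API: the `Resource` class, its field names, the `urn:uuid:` identifier scheme, and the `search` function are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from uuid import uuid4


@dataclass
class Resource:
    """A catalogue entry; every field name here is hypothetical."""
    title: str
    kind: str                                  # "dataset", "algorithm", "service", ...
    keywords: set[str]
    licence: str
    creator: str
    start: date                                # temporal coverage
    end: date
    bbox: tuple[float, float, float, float]    # (west, south, east, north)
    identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid4()}")


def overlaps(a, b):
    """True when two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalogue, keyword=None, kind=None, during=None, within=None):
    """Combine keyword, facet (kind), temporal and geospatial filters."""
    hits = []
    for r in catalogue:
        if keyword is not None and keyword.lower() not in {k.lower() for k in r.keywords}:
            continue
        if kind is not None and r.kind != kind:
            continue
        if during is not None and not (r.start <= during <= r.end):
            continue
        if within is not None and not overlaps(r.bbox, within):
            continue
        hits.append(r)
    return hits


# Made-up entries, for illustration only.
catalogue = [
    Resource("Sea surface temperature grid", "dataset", {"temperature", "ocean"},
             "CC-BY-4.0", "Example Lab", date(2015, 1, 1), date(2015, 12, 31),
             (-10.0, 30.0, 40.0, 46.0)),
    Resource("Niche modelling routine", "algorithm", {"species", "ocean"},
             "EUPL-1.2", "Example Lab", date(2010, 1, 1), date(2020, 12, 31),
             (-180.0, -90.0, 180.0, 90.0)),
]

found = search(catalogue, keyword="ocean", kind="dataset",
               during=date(2015, 6, 1), within=(0.0, 35.0, 20.0, 45.0))
```

Because datasets and algorithms share one record type, a single query mechanism covers every kind of resource, which is the point of extending the concept of "resource" beyond datasets.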

Page 11: D4Science Data infrastructure: a facilitator for a FAIR data management


D4Science: Accessibility

Accessibility is obtained

• by making shared and published resources available through multiple protocols in order to maximise the set of potential exploitation cases

• by also providing transparent Authentication and Authorization, whenever the published resource requires it

• by enabling policy enforcement
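A minimal sketch of how these three points might fit together: a resource is retrievable by identifier over more than one protocol, and a policy is checked before any access URL is handed out. The `Policy` class, the `REGISTRY` contents, the `resolve` function, and the example URLs are hypothetical and do not describe the actual D4Science services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical access policy attached to a published resource."""
    public: bool
    allowed_groups: frozenset = frozenset()


# Hypothetical registry: identifier -> (policy, {protocol name: access URL}).
REGISTRY = {
    "urn:example:dataset-1": (
        Policy(public=True),
        {"https": "https://data.example.org/dataset-1",
         "opendap": "https://data.example.org/opendap/dataset-1"},
    ),
    "urn:example:dataset-2": (
        Policy(public=False, allowed_groups=frozenset({"marine-vre"})),
        {"https": "https://data.example.org/dataset-2"},
    ),
}


def resolve(identifier, protocol, user_groups=frozenset()):
    """Return the access URL, enforcing the resource policy first."""
    policy, endpoints = REGISTRY[identifier]
    if not policy.public and not (policy.allowed_groups & set(user_groups)):
        raise PermissionError(f"access to {identifier} denied by policy")
    if protocol not in endpoints:
        raise LookupError(f"{identifier} is not exposed via {protocol}")
    return endpoints[protocol]
```

For example, `resolve("urn:example:dataset-1", "opendap")` succeeds for anyone, while `resolve("urn:example:dataset-2", "https")` raises `PermissionError` unless the caller belongs to the `marine-vre` group.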

Page 12: D4Science Data infrastructure: a facilitator for a FAIR data management


D4Science: Interoperability

Interoperability is facilitated

• by automatically enriching the resources with metadata in multiple formats
  • including ISO 19115, Darwin Core, Dublin Core, DCAT and application profiles

• by promoting exploitation of ontologies and controlled vocabularies
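As an illustration of exposing one resource's metadata in more than one format, the sketch below serialises a single record both as simple Dublin Core XML and as a DCAT dataset in JSON-LD. The internal `record` layout and the two function names are assumptions; only the namespace URIs and term names come from the Dublin Core and DCAT vocabularies. The mapping is simplified (the creator is a plain literal, for instance) and is not a complete application profile.

```python
import json
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

record = {  # hypothetical internal metadata record
    "identifier": "urn:example:dataset-1",
    "title": "Sea surface temperature grid",
    "creator": "Example Lab",
    "licence": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["temperature", "ocean"],
}


def to_dublin_core(rec):
    """Serialise the record as simple Dublin Core elements in XML."""
    ET.register_namespace("dc", DC)
    root = ET.Element("metadata")
    for element, value in [("identifier", rec["identifier"]),
                           ("title", rec["title"]),
                           ("creator", rec["creator"]),
                           ("rights", rec["licence"])]:
        ET.SubElement(root, f"{{{DC}}}{element}").text = value
    for kw in rec["keywords"]:
        ET.SubElement(root, f"{{{DC}}}subject").text = kw
    return ET.tostring(root, encoding="unicode")


def to_dcat(rec):
    """Serialise the same record as a DCAT dataset in JSON-LD."""
    return json.dumps({
        "@context": {"dcat": "http://www.w3.org/ns/dcat#",
                     "dct": "http://purl.org/dc/terms/"},
        "@id": rec["identifier"],
        "@type": "dcat:Dataset",
        "dct:title": rec["title"],
        "dct:creator": rec["creator"],          # plain literal, for brevity
        "dct:license": {"@id": rec["licence"]},
        "dcat:keyword": rec["keywords"],
    }, indent=2)


print(to_dublin_core(record))
print(to_dcat(record))
```

Keeping one internal record and generating each format from it is one way to keep the different serialisations consistent with each other.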

Page 13: D4Science Data infrastructure: a facilitator for a FAIR data management


D4Science: Reusability

Reusability is promoted

• by systematically endowing shared and published resources with
  • a clear licence governing their use/re-use
  • citation and attribution statements

• by systematically generating provenance metadata

• by allowing, by design, the execution of the experiment in the same technical and contextual environment
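To show what systematically generated provenance metadata could look like, here is a minimal sketch that builds a small JSON-LD graph with PROV-O terms for one computation. The `provenance` function, its parameters, and the `urn:example:` identifiers are hypothetical; the `prov:` terms are taken from the W3C PROV-O vocabulary. It is not the record format D4Science actually emits.

```python
import json
from datetime import datetime, timezone

PROV_CONTEXT = {"prov": "http://www.w3.org/ns/prov#"}


def provenance(output_id, input_ids, activity_id, agent_id, started, ended):
    """Build a small JSON-LD graph linking output, inputs, run and user."""
    return {
        "@context": PROV_CONTEXT,
        "@graph": [
            {"@id": activity_id, "@type": "prov:Activity",
             "prov:startedAtTime": started.isoformat(),
             "prov:endedAtTime": ended.isoformat(),
             "prov:used": [{"@id": i} for i in input_ids],
             "prov:wasAssociatedWith": {"@id": agent_id}},
            {"@id": output_id, "@type": "prov:Entity",
             "prov:wasGeneratedBy": {"@id": activity_id},
             "prov:wasDerivedFrom": [{"@id": i} for i in input_ids],
             "prov:wasAttributedTo": {"@id": agent_id}},
            {"@id": agent_id, "@type": "prov:Agent"},
        ],
    }


# Made-up identifiers and timestamps, for illustration only.
doc = provenance(
    output_id="urn:example:output-1",
    input_ids=["urn:example:input-1"],
    activity_id="urn:example:run-1",
    agent_id="urn:example:user-1",
    started=datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc),
    ended=datetime(2017, 1, 1, 12, 5, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=2))
```

A record like this ties each output to the inputs it used, the run that produced it, and the user responsible, which is the information someone needs to repeat the experiment or attribute the result.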

Page 14: D4Science Data infrastructure: a facilitator for a FAIR data management


D4Science enacts FAIR because …

• Embrace as-a-Service approach
• Exploit communication standards
• Hide complexity of computational capabilities
• Enable Access via VRE governed by tailored policies
• Facilitate provenance and attribution management
• Implement economy-of-scale and costs reduction
• Promote collaboration and sharing
• Enable Re-usability