Enabling better science Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group Paolo Manghi [email protected] Institute of Information Science & Technologies “A. Faedo” National Research Council, Pisa, Italy

Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)


Page 1: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group

Paolo Manghi [email protected]

Institute of Information Science & Technologies “A. Faedo”, National Research Council, Pisa, Italy

Page 2: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

The research group

• http://nemis.isti.cnr.it/groups/infrascience
• 27 senior and junior researchers, sysadmins, and PhD students (computer science & information engineering)

• Data interoperability
• Digital Library Foundations & Management Systems
• Enabling middleware for service-oriented infrastructures
• Enhanced publication & compound object models
• Virtual Research Environments
• De-duplication of information objects

By Donatella Castelli and Alessia Bardi, September 2015, bardi@isti.cnr.it

Page 3: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

The research group


Page 4: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Modern Scholarly Communication: Research Infrastructures go beyond literature

• Data-driven: Jim Gray’s fourth paradigm
• Dataset publishing: data repositories
• Scientific process publishing
• Web-driven: immediate sharing and access to digital knowledge
• Literature publishing: institutional and thematic repositories, publisher journal repositories

[Diagram labels: software, experiment, service, Research Infra, MarketPlace]

Page 5: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Modern Scholarly Communication: Publishing beyond literature

• Comprehensive scientific reward by citation of any research outcome
• Improved understanding of research outcomes
• Better research review process [repeatability, replicability, and reproducibility of experiments - Goble, 2009]
• Effective dissemination and re-use of valuable research assets
• Lower costs of science

[Diagram: Publication; Data (input/output to scientific process); Scientific process (methodological processes, executable workflows, pieces of software)]

Page 6: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Funders and projects
• Funders are crucial to the development of science
• Public funders push for Open Access mandates
• Funders require methodologies to monitor the impact (ROI) of the projects they fund and their adherence to mandates
• Projects require the same tools to showcase their output

[Diagram: Publication, Data, Project, Funding, Scientific process]

Page 7: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Scientific communication workflows for research products

• Publications: well established (PIDs, metadata, deposition, peer review, citation, dissemination)
• Research data: available for given communities (PIDs, metadata, deposition?, peer review?, citation?, dissemination?)
• Scientific process: almost nonexistent and not supported by scientific reward mechanisms

[Diagram: Deposition, Peer review, Dissemination]

Page 8: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Sharing/publishing research data: current solutions
• Part of article: subset of research data embedded as figures or tables
  • Additional material possibly submitted to the journal along with the publication full-text
• Independent from article: data deposited and described at dedicated locations
  • Data centres, discipline databases, thematic data repositories
• Linked to article: data deposited and described at dedicated locations with links from/to article full-texts
  • Discipline-specific, data papers, deposition guidelines
  • e.g. DRYAD, PANGAEA, GigaDB
  • Enhanced publications, research objects

Page 9: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Sharing/publishing scientific process: current solutions
• Part of article: a piece of code is included in the paper or as additional material possibly submitted to the journal along with the publication full-text
• Independent from article: software, VMs, workflows, services, e-notebooks are deposited and described at dedicated locations
  • Software repositories, e.g. GitHub
  • Workflow repositories, e.g. myexperiment.org
• Linked to article: software deposited and described at dedicated locations with links from/to article full-texts
  • Software papers, but no real deposition guidelines
  • Enhanced publications, research objects

Page 10: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Enabling Better Science: Market-place services in OpenAIRE and RDA

Page 11: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

The OpenAIRE infrastructure

Page 12: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

• Establishment of an interoperable network of publication repositories
• Deposition, discovery, linking and monitoring of research products (articles, datasets, software) produced under national and EC funding
• Monitoring compliance with EC OA mandates for publications
• Supporting decision makers with statistics

OpenAIRE: Open Access Infrastructure for Research in Europe
The point of reference for Open Access in Europe


Page 13: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

The OpenAIRE infrastructure: human network and e-infrastructure

• NOADs: National Open Access Desks
  • Monitor and foster the adoption of Open Access policies at the local level
  • Support researchers in the implementation of the Data Pilot
• e-infrastructure for monitoring the impact of OA mandates and research projects
  • OpenAIRE guidelines for metadata exchange
  • Zenodo repository for the deposition of research products


Page 14: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

[Diagram: OpenAIRE services and data flows]
• End-user services: get support (NOADs), search & browse, linked content, statistics, feedback, claim/deposit of publications & data, research impact (citations, usage statistics, and more)
• Data sources: data repositories/aggregators and data journals (metadata on data); institutional & thematic publication repositories/aggregators and Open Access journals/publishers (metadata and PDFs, usage data); institutional CRIS systems; national funding and EC funding; CERN/OpenAIRE “catch-all” repository; OpenDOAR; re3data
• Guidelines: for data interoperability and for use of services
• Aggregation: validation, cleaning & transformation, de-duplication, enrichment by metadata and text mining
• Access: APIs
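The aggregation stages named on this slide (validation, cleaning & transformation, de-duplication) can be pictured as a chain of small functions. The sketch below is only an illustration under strong assumptions: the `Record` fields and the matching rule (same title and year) are hypothetical and are not the OpenAIRE data model or its actual de-duplication logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified metadata record (not the OpenAIRE data model)."""
    title: str
    year: int
    source: str


def validate(records):
    # Keep only records with a non-empty title and a plausible year.
    return [r for r in records if r.title.strip() and 1900 <= r.year <= 2100]


def clean(records):
    # Normalise whitespace in titles so equivalent records compare equal.
    return [Record(" ".join(r.title.split()), r.year, r.source) for r in records]


def deduplicate(records):
    # Assumption: records with the same case-insensitive title and year are
    # duplicates; the first one seen is kept.
    seen = {}
    for r in records:
        seen.setdefault((r.title.lower(), r.year), r)
    return list(seen.values())


def aggregate(records):
    # Run the stages in the order shown on the slide.
    return deduplicate(clean(validate(records)))
```

Given two spellings of the same title from two repositories, `aggregate` would keep a single record, which is the effect the "(de-duplicated)" counts elsewhere in the deck refer to.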

Page 15: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

OpenAIRE data model: view from the moon

Page 16: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

The OpenAIRE e-infrastructure: view from the moon

www.d-net.research-infrastructures.eu

Page 17: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

OpenAIRE e-infrastructure: hardware numbers
• Production: 44 CPU cores, 84 GB of RAM, 3,998 GB allocated disk
• Mining cluster: 14 servers, 98 CPU cores, 514 GB of RAM, 18,458 GB allocated disk
• Data provision cluster: 15 servers, 90 CPU cores, 236 GB of RAM, 12,300 GB allocated disk

Page 18: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

OpenAIRE information space numbers (September 2015)
http://www.openaire.eu

• 12M publications (de-duplicated)
• 200,000 publication-project links from 5 funders
• 9,000 datasets linked to publications or projects
• 34,000 organizations (de-duplicated)
• Collected from:
  • 600+ “direct” data providers
  • 5,000+ “indirect” data providers (inherited from aggregators)
  • End-users…

Page 19: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

OpenAIRE information space numbers (September 2015)


Page 20: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

OpenAIRE’s Zenodo: http://www.zenodo.org
• Zenodo repository (production)
• Deposition of publications, datasets, software
• DOI minting and metadata curation
• Community support
• Much more…
• FREE
• Numbers
  • Publications: 16,240
  • Datasets: 1,477
  • Software: 4,456
  • Other products: 1,400+

Page 21: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

OpenAIRE partners and liaisons

SHARE


Page 22: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Other OA initiatives: international collaborations

• RDA-ANDS (Australia)
• SHARE (United States)
• La Referencia (South America)
• CAS (China)


Page 23: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Sharing research products and context to enable better science in OpenAIRE

[Diagram: scope of OpenAIRE across project phases]
• Entities: Publications, EC funding, National funding, Research Data, Research Initiatives, Scientific Process
• Project phases: OpenAIRE (2009 - 2012), OpenAIRE Plus (2011 - 2014), OpenAIRE2020 (2015 - 2018)
• Labels: EC Open Access mandate monitoring; national funders; links between publications and data; European Grid Infra (EGI); links among publications, data and process


Page 24: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Publications in OpenAIRE

• Publications acquisition policy
  • Open Access publications
  • Publications linked to a project whose funder is supported by OpenAIRE
• Publications are collected from literature repositories and “claimed” by registered end-users
  • Metadata and full text (when Open Access or agreed with publishers)


Page 25: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Funders and projects in OpenAIRE
• Collects projects from the following funder sources
  • European Commission: FP7 and H2020
  • Wellcome Trust
  • FCT (Portugal)
  • NHMRC (Australia)
  • ARC (Australia)
  • Science Foundation Ireland (Ireland)
  • On the way: Croatian, Dutch, and American (NSF)


Page 26: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Enabling better science: publications and funders
• OpenAIRE guidelines for literature repositories
  • “How to describe” publications
  • “How to describe” projects
  • “How to put publications in context” with projects
  • Cooperation with SHARE (US), JISC (UK), La Referencia (South America)
• OpenAIRE services
  • Offering access to project information by funder
  • Inferring links between articles and projects of any funder
  • Monitoring ROI/Open Access of any funder by project (and more)
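As a rough illustration of what "monitoring Open Access by project" could compute, the sketch below derives the share of Open Access publications per project from publication-project links. The input shape (pairs of project identifier and an Open Access flag) and the function name are assumptions made for this example, not part of any OpenAIRE API.

```python
from collections import defaultdict


def open_access_share(publications):
    """Return {project_id: fraction of its publications that are Open Access}.

    `publications` is an iterable of (project_id, is_open_access) pairs;
    this flat shape is an assumption made for the example.
    """
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for project_id, is_open_access in publications:
        totals[project_id] += 1
        if is_open_access:
            open_counts[project_id] += 1
    # Every project in `totals` has at least one publication, so no division by zero.
    return {p: open_counts[p] / totals[p] for p in totals}
```

A funder-level figure could be obtained the same way by grouping on a funder identifier instead of a project identifier.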


Page 27: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Enabling better science: publications and funders
• Literature Broker Service for Institutional Repositories (delivery: 2016)
  • Serving repository managers
  • Subscriptions based on configurable criteria of publication-repository “closeness”
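The slide does not say which "closeness" criteria the broker will support, so the sketch below invents two purely for illustration: an author affiliated with the repository's institution, and a project shared between publication and repository. All field names and criteria names are hypothetical.

```python
def closeness_criteria(publication, repository):
    """Return the names of the (hypothetical) criteria tying a publication to a repository."""
    matched = []
    if repository["institution"] in publication["affiliations"]:
        matched.append("author_affiliation")
    if set(repository["projects"]) & set(publication["projects"]):
        matched.append("shared_project")
    return matched


def notifications(publications, repository, subscribed_criteria):
    # A repository manager is notified about a publication when at least one
    # of the criteria they subscribed to matches.
    hits = []
    for pub in publications:
        matched = [c for c in closeness_criteria(pub, repository)
                   if c in subscribed_criteria]
        if matched:
            hits.append((pub["id"], matched))
    return hits
```

Making `subscribed_criteria` an argument mirrors the "configurable" part of the slide: the same publication stream yields different notifications depending on what each repository manager opted into.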


Page 28: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Research Data in OpenAIRE


• OpenAIRE research data acquisition policy
  • Must be linked to OpenAIRE publications or to projects
  • No datasets identified by accession numbers
    • Dealt with as “external links”
• Datasets are collected from data archives and “claimed” by registered end-users

Page 29: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Enabling better science: research data


• OpenAIRE guidelines for data archives
  • “How to describe” datasets (inspired by DataCite)
  • “How to put datasets in context” with projects
• OpenAIRE services
  • Inference of links to datasets from article full-text
  • Extraction of dataset-publication links from data archives (e.g. PANGAEA, DataCite)
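One simple way to picture "inference of links to datasets from article full-text" is to scan the text for DOIs and keep those already known to identify datasets. This is a minimal sketch under that assumption; the slides do not describe the actual text-mining method, and the regular expression is a commonly used approximation of DOI syntax rather than a complete grammar. The identifiers in the usage note are made up.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit registrant code,
# "/", then a suffix running up to whitespace, a quote, or an angle bracket.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(full_text):
    """Return the distinct DOIs mentioned in `full_text`, in order of appearance."""
    found = []
    for match in DOI_PATTERN.findall(full_text):
        doi = match.rstrip(".,;)").lower()  # drop trailing sentence punctuation
        if doi not in found:
            found.append(doi)
    return found


def candidate_dataset_links(article_doi, full_text, known_dataset_dois):
    # Only DOIs already known to identify datasets become candidate links.
    return [(article_doi, d) for d in extract_dois(full_text)
            if d in known_dataset_dois]
```

For example, calling `candidate_dataset_links("10.1000/article", text, {"10.1594/pangaea.123456"})` on a text that mentions `doi:10.1594/PANGAEA.123456` would propose one article-dataset link. DOIs are lower-cased because they are case-insensitive.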

Page 30: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Research initiatives
• OpenAIRE opens to “research initiatives” willing to
  • Monitor the productivity of the community in terms of publications and datasets
  • Support the discovery of research made by peers in the same community

[Diagram: Publication, Data, Project, Funding, Research Initiative]


Page 31: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Enabling better science: Research initiatives
• OpenAIRE research initiatives
  • European Grid Infrastructure (EGI), concepts: EGI Virtual Organizations and EGI disciplines
• OpenAIRE services
  • Inference of links to research initiatives from article full-texts
  • Monitoring ROI and Open Access w.r.t. relevant “concepts” of a research activity


Page 32: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Enabling better science: Scientific processes
To be defined
• Scientific process acquisition policies
  • e.g. software, process (e.g. Taverna workflows), methods (e.g. e-notebooks)
• Collection strategies
  • From “process repositories”? E.g. myexperiment.org, GitHub
• Guidelines for “process repository managers”
• OpenAIRE services
  • Monitoring ROI of projects in terms of processes!
  • Inference/extraction of article/process links?


Page 33: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Research Data Alliance (RDA)

Page 34: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Research Data Alliance (RDA): Publishing Data Services Working Group

• Forum funded by the Commission to propel the discussion among researchers and practitioners in the field of research data management
• Identification of common, cross-discipline problems, yielding best practices and recommendations
• Organized in Interest Groups and Working Groups
• Focus:
  • Publishing Data Interest Group: umbrella of WGs focusing on enabling a stronger research data publishing infrastructure
  • Publishing Data Services Working Group: focusing on article-dataset interlinking

Page 35: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Publishing Data Services Working Group: Data-article links
• Benefits of creating context by establishing data-article links
  • Increasing visibility and discoverability
  • Stimulating reuse and repeatability
• Key to making it worthwhile:
  • Infrastructural approach: linking needs to be done collectively, at community (and cross-community) level, sharing procedures, policies and technologies
• Issues
  • No common framework for interlinking datasets and published articles
  • Initiatives live in isolation and cannot be combined

Page 36: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Enabling better science: giving open access to article-dataset links

• Creating “an open, freely accessible, web based service that enables its users to identify datasets that are associated with a given article, and vice versa”

• The Service will serve as a flexible sandbox
• Major scholarly communication stakeholders involved at different levels
  • Feed authoritative links to the Service
  • Access links from the Service
  • Feed back requirements, preferences, recommendations, obstacles to refine/enhance the Service
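The core behaviour quoted on this slide, finding the datasets associated with a given article "and vice versa", amounts to a link index that can be queried from both sides. The class below is a minimal in-memory sketch of that idea; its name and methods are hypothetical and do not describe the Service's real interface.

```python
from collections import defaultdict


class LinkIndex:
    """In-memory index of article-dataset links, queryable from either side."""

    def __init__(self):
        self._datasets_by_article = defaultdict(set)
        self._articles_by_dataset = defaultdict(set)

    def add_link(self, article_id, dataset_id):
        # Store the link in both directions so each lookup is a single dict access.
        self._datasets_by_article[article_id].add(dataset_id)
        self._articles_by_dataset[dataset_id].add(article_id)

    def datasets_for(self, article_id):
        return sorted(self._datasets_by_article.get(article_id, ()))

    def articles_for(self, dataset_id):
        return sorted(self._articles_by_dataset.get(dataset_id, ()))
```

In these terms, stakeholders "feeding" links corresponds to `add_link`, and "accessing" links corresponds to the two lookup methods.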

Page 37: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

PDS-WG Stakeholders

Claire Austin; David Arctur; Amir Aryani (ANDS); Geoff Bilder (CrossRef); Timea Biro; Adrian Burton (ANDS, co-chair); Ian Bruno (CCDC); Sarah Callaghan; David Carlson; Jamus Collier (PANGAEA); Suenje Dallmeier-Thiessen; Tim DiLauro; Ingrid Dillo; Rorie Edmunds; Janine Felden; Carol Goble; Jeffrey Grethe

Harkan Grudd; Siddeswara Guru; Laure Haak (ORCID); John Helly; Francisco Hernandez; Simon Hodson; Richard Kidd (RSc); Hylke Koers (Elsevier, co-chair); Paolo Manghi (OpenAIRE); Haralambos Marmanis; Caroline Martin; Jo McEntyre (EMBL - EBI); Yolanda Meleco; Sheila Morrissey; Lyubomir Penev; Mohan Ramamurthy

Howard Ratner; Nigel Robinson (Thomson Reuters); Sergio Ruiz (DataCite); Uwe Schindler (PANGAEA); Johanna Schwarz (Springer); Martina Stockhause; Carly Strasser; Eefke Smit (STM); Jonathan Tedds; Joachim Wackerow; Juanle Wang; Hua Xu; Eva Zanzerkia

Page 38: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

The Data-Literature Service
• A one-for-all service model infrastructure for research data publishing
  • Increase interoperability
  • Decrease systemic inefficiencies
  • Power new tools and functionalities to the benefit of researchers

Page 39: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Benefits

• For data repositories and journal publishers
  • Linking becomes more scalable and cheaper, ensuring more visibility for data sources (and their “customers”)
• For research institutes, bibliographic providers, and funding bodies
  • Enables bibliographic services and productivity assessment tools that track datasets and journal publications within a common framework
• For researchers
  • Sharing and accessing relevant articles and data becomes easier, more efficient and more accurate, thereby increasing scientific reward and enhancing scientific practices

Page 40: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

System development and operation: OpenAIRE and PANGAEA

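[Diagram: Data-Literature Service architecture]
• Data Sources (examples: pairs of DOIs, DataCite records, PANGAEA records)
• Processing: links collection, harmonizing, PID resolution, de-duplicating
• Information Space, based on the Core Data Model
• Access: Web Portal, OAI-PMH, Search APIs
• Other labels: OAI-PMH, intersection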
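The "harmonizing" and "de-duplicating" steps in this diagram can be illustrated on the simplest input the slide mentions, pairs of DOIs. The sketch below normalises common DOI spellings and merges identical links asserted by different providers. It is an assumption-laden illustration: the function names, the triple format, and the provider labels are hypothetical, and real PID resolution is not attempted.

```python
def normalise_doi(raw):
    """Reduce common DOI spellings to a bare, lower-case DOI."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def harmonise_links(raw_links):
    """Map (article, dataset, provider) triples to de-duplicated links.

    Returns {(article_doi, dataset_doi): set of providers asserting the link}.
    """
    merged = {}
    for article, dataset, provider in raw_links:
        key = (normalise_doi(article), normalise_doi(dataset))
        merged.setdefault(key, set()).add(provider)
    return merged
```

Keeping the set of providers per link preserves provenance after merging, so a link confirmed by several sources stays distinguishable from one asserted by a single source.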

Page 41: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Information Space: Core Data Model Schema

Page 42: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

The Service (BETA)
http://dliservice.research-infrastructures.eu

Powered by:
• OpenAIRE D-NET software
• PANGAEA search engine

Page 43: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Some numbers
• Close to 1 million links and 2 million objects

Page 44: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Providers

Page 45: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

What’s intrinsically wrong?

Page 46: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Research products publishing workflow

[Diagram: publishing workflow]
• Digital research products (articles, data, scientific process)
• Repositories for literature, research data, scientific process
• Research e-infrastructure
• Market-place services

By Donatella Castelli and Alessia Bardi, Massimiliano Assante, September 2015, bardi@isti.cnr.it

Page 47: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Research products publishing workflow

• Deposition: de-contextualisation, staticity, extra cost
• Quality assessment: inefficient peer review, no repeatability
• Dissemination: fragmentation in thematic or typology silos, lack of semantic linking
• Reuse of products: lack of context, no replication

Page 48: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Research activities
• A research activity can be understood as a course of actions that follows a scientific method and aims to prove an initial thesis in order to bring novelty to a research field
• Every research activity builds upon and produces a wide array of research products

[Diagram over time: Publication, Data, Project, Funding, Research Initiative, Scientific process]

Page 49: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Time for a change in scholarly communication: Publishing Research Activities
• De facto, the literature publishing workflow has been adapted and adopted for other research products
  • “Elsewhere” and “on date” philosophy
• On the contrary, modern research conducted with the support of Research e-Infrastructures is
  • Strongly contextualized, intrinsically dynamic
• Research products should be published “in place”, “during”, and together with research activities
• Research e-Infrastructures should evolve to support marketplace-like functionality

Page 50: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Science 2.0 Repositories

• Creation of research products in the Research e-Infrastructure (Re-I) is intercepted and the new products are published
• Notify peers about research activities and published products via research social networks
• Foster continuous open peer review


Page 51: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

SciRepo publishing model benefits

Current Model Cons
• Deposition: de-contextualisation, staticity, extra cost
• Quality Assessment: ineffective peer review
• Dissemination: fragmentation in thematic or typology silos, lack of semantic linking

SciRepo Model Pros
• Deposition: in context, products remain “alive”, no extra cost, alternative products
• Quality Assessment: continuous and in context, self-assessment
• Dissemination: unified, automatic and complete


Page 52: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

An example of SciRepo Research Activity web page


Page 53: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

SciRepo in the real world
http://www.i-marine.eu

• The iMarine Research Infrastructure features part of the SciRepo social functionalities:
  • Applications running in the RI generate products that can be shared
  • Notifications of new research products via News Feed
  • Research products accessible in the context of the application that generated them
• More will be realized for the new BlueBridge project


Page 54: Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

Thank you

Suggested reading:
• Bardi A., Manghi P. A Framework Supporting the Shift from Traditional Digital Publications to Enhanced Publications (2015). doi:10.1045/january2015-bardi
• Manghi P., Bolikowski L., Manola N., Schirrwagen J., Smith T. OpenAIREplus: the European Scholarly Communication Data Infrastructure (2012). doi:10.1045/september2012-manghi
• Assante M., Candela L., Castelli D., Manghi P., Pagano P. Science 2.0 Repositories: Time for a Change in Scholarly Communication (2015). doi:10.1045/january2015-assante

Contacts: [email protected]

[email protected]@isti.cnr.it