Enabling better science
Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group
Paolo [email protected]
Institute of Information Science & Technologies “A. Faedo”, National Research Council, Pisa, Italy
The research group
• http://nemis.isti.cnr.it/groups/infrascience
• 27 senior and junior researchers, sysadmins, and PhD students (computer science & information engineering)
Data interoperability
Digital Library Foundations & Management Systems
Enabling middleware for service-oriented infrastructures
Enhanced publication & compound object models
Virtual Research Environments
De-duplication of information objects
By Donatella Castelli and Alessia Bardi, September 2015, bardi@isti.cnr.it
Modern Scholarly Communication: Research Infrastructures go beyond literature
• Data-driven: Jim Gray’s fourth paradigm
• Web-driven: immediate sharing and access to digital knowledge
• Literature publishing: institutional and thematic repositories, publisher journal repositories
• Dataset publishing: data repositories
• Scientific process publishing
[Figure labels: Research Infra, MarketPlace, software, experiment, service]
Modern Scholarly Communication: Publishing beyond literature
• Comprehensive scientific reward by citation of any research outcome
• Improved understanding of research outcome
• Better research review process (repeatability, replicability, and reproducibility of experiments; Goble, 2009)
• Effective dissemination and re-use of valuable research assets
• Lower costs of science
[Figure labels]
• Publication
• Data: input/output to scientific process
• Scientific process: methodological processes, executable workflows, pieces of software
Funders and projects
• Funders are crucial to the development of science
• If public funders, they push for Open Access mandates
• Funders require methodologies to monitor impact (ROI) and adherence to mandates of projects they fund
• Projects require the same tools to show off their production
[Figure labels: Publication, Data, Project, Funding, Scientific process]
Scientific communication workflows for research products
• Publications: well established (PIDs, metadata, deposition, peer-review, citation, dissemination)
• Research data: available for given communities (PIDs, metadata, deposition?, peer-review?, citation?, dissemination?)
• Scientific process: almost nonexistent and not supported by scientific reward mechanisms
[Figure labels: Deposition, Peer-review, Dissemination]
Sharing/publishing research data: current solutions
• Part of article: subset of research data embedded as figures or tables
  • Additional material possibly submitted to the journal along with the publication full-text
• Independent from article: data deposited and described at dedicated locations
  • Data centres, discipline databases, thematic data repositories
• Linked to article: data deposited and described at dedicated locations with link from/to article full-texts
  • Discipline-specific, data papers, deposition guidelines
  • e.g. DRYAD, PANGAEA, GigaDB
  • Enhanced publications, research objects
Sharing/publishing scientific process: current solutions
• Part of article: a piece of code is included in the paper or as additional material possibly submitted to the journal along with the publication full-text
• Independent from article: software, VMs, workflows, services, e-notebooks are deposited and described at dedicated locations
  • Software repositories, e.g. GitHub
  • Workflow repositories, e.g. myexperiment.org
• Linked to article: software deposited and described at dedicated locations with link from/to article full-texts
  • Software papers, but no real deposition guidelines
  • Enhanced publications, research objects
Enabling Better Science: Market-place services in OpenAIRE and RDA
The OpenAIRE infrastructure
• Establishment of an interoperable network of publication repositories
• Deposition, discovery, linking and monitoring of research products (articles, datasets, software) produced under National and EC funding
• Monitoring the compliance with EC OA mandates for publications
• Support decision makers with statistics
OPEN ACCESS INFRASTRUCTURE FOR RESEARCH IN EUROPE
The point of reference for Open Access in Europe
The OpenAIRE infrastructure: human network and e-infrastructure
• NOADs: National Open Access Desks
• Monitor and foster the adoption of Open Access policies at the local level
• Support researchers in the implementation of the Data Pilot
• e-infrastructure for monitoring impact of OA mandates and research projects
• OpenAIRE guidelines for metadata exchange
• Zenodo Repository for the deposition of research products
[Figure: OpenAIRE services and data flows. Recoverable labels:]
• Portal functions: Get support (NOADs), Search & Browse, Linked Content, Statistics, Claim/deposit, Feedback
• Research impact: publications & data, citations, usage statistics
• Data repositories/aggregators, Data Journals: metadata on data
• Publication repositories/aggregators (Institutional & Thematic), Open Access Journals/Publishers: usage data, metadata and PDFs
• National funding, EC funding
• Institutional CRIS systems
• CERN/OpenAIRE “catch-all” repository
• OpenDOAR, re3data
• Guidelines for use services; Guidelines for data interoperability
• Validation, Cleaning & Transformation, De-duplication, Enrichment by metadata and text mining
• APIs
OpenAIRE data model: view from the moon
The OpenAIRE e-infrastructure: view from the moon
www.d-net.research-infrastructures.eu
OpenAIRE e-infrastructure: hardware numbers
Production
• 44 CPU cores
• 84 GB of RAM
• 3,998 GB allocated disk
Mining Cluster
• 14 servers
• 98 CPU cores
• 514 GB RAM
• 18,458 GB allocated disk
Data provision cluster
• 15 servers
• 90 CPU cores
• 236 GB RAM
• 12,300 GB allocated disk
OpenAIRE information space numbers (September 2015)
http://www.openaire.eu
• 12M publications (de-duplicated)
• 200,000 publication-project links from 5 funders
• 9,000 datasets linked to publications or projects
• 34,000 organizations (de-duplicated)
• Collected from:
  • 600+ “direct” data providers
  • 5,000+ “indirect” data providers (inherited from aggregators)
  • End-users…
OpenAIRE’s Zenodo
http://www.zenodo.org
• Zenodo repository (production)
  • Deposition of publications, datasets, software
  • DOI minting and metadata curation
  • Community support
  • Much more…
  • FREE
• Numbers
  • Publications: 16,240
  • Datasets: 1,477
  • Software: 4,456
  • Other products: 1,400+
OpenAIRE partners and liaisons
SHARE
Other OA initiatives: international collaborations
RDA-ANDS (Australia)
SHARE (United States)
La Referencia (South America)
CAS (China)
Sharing research products and context to enable better science in OpenAIRE
[Figure: OpenAIRE project timeline and scope. Recoverable labels:]
• Content and context: Publications, EC funding, National funding, Research Data, Research Initiatives, Scientific Process
• OpenAIRE (2009 - 2012)
• OpenAIRE Plus (2011 - 2014)
• OpenAIRE2020 (2015 - 2018)
• EC Open Access mandate monitoring
• National funders
• Links between publications and data
• European Grid Infra (EGI)
• Links among publications, data and process
Publications in OpenAIRE
• Publications acquisition policy
  • Open Access publications
  • Publications linked to a project whose funder is supported by OpenAIRE
• Publications are collected from literature repositories and “claimed” by registered end-users
• Metadata and full text (when Open Access or agreed with publishers)
Funders and projects in OpenAIRE
• Collects projects from the following funder sources
  • European Commission: FP7 and H2020
  • Wellcome Trust
  • FCT (Portugal)
  • NHMRC (Australia)
  • ARC (Australia)
  • Science Foundation Ireland (Ireland)
  • On the way: Croatian, Dutch, and American (NSF)
Enabling better science: publications and funders
• OpenAIRE guidelines for literature repositories
  • “How to describe” publications
  • “How to describe” projects
  • “How to put publications in context” with projects
  • Cooperation with SHARE (US), JISC (UK), La Referencia (South America)
• OpenAIRE Services
  • Offering access to project information by funder
  • Inferring links between articles and projects of any funders
  • Monitoring ROI/Open Access of any funders by project (and more)
Enabling better science: publications and funders
• Literature Broker Service for Institutional Repositories (delivery: 2016)
  • Serving repository managers
  • Subscriptions based on configurable criteria of publication-repository “closeness”
Research Data in OpenAIRE
• OpenAIRE research data acquisition policy
  • Must be linked to OpenAIRE publications or to projects
  • No datasets identified by accession numbers
    • Dealt with as “external links”
• Datasets are collected from data archives and “claimed” by registered end-users
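The acquisition policy above can be read as a simple filter over candidate dataset records. The following is a minimal sketch under stated assumptions, not OpenAIRE's implementation: the `Dataset` record, its field names, the `"doi"`/`"accession"` labels and the three outcome strings are all hypothetical, and the choice to route every accession-number dataset to "external link" regardless of context is one possible reading of the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical, simplified dataset record (not the OpenAIRE data model)."""
    identifier: str
    id_type: str  # assumed labels for this sketch: "doi" or "accession"
    linked_publications: list = field(default_factory=list)
    linked_projects: list = field(default_factory=list)


def classify(dataset: Dataset) -> str:
    """Return "accept", "reject" or "external_link" under the policy above."""
    if dataset.id_type == "accession":
        # Datasets identified by accession numbers are not collected as
        # datasets; the slide says they are dealt with as external links.
        return "external_link"
    # Everything else must be linked to a publication or to a project.
    has_context = bool(dataset.linked_publications or dataset.linked_projects)
    return "accept" if has_context else "reject"
```

For example, a DOI-identified dataset with at least one linked publication would be accepted, while the same dataset with no links would be rejected.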
Enabling better science: research data
• OpenAIRE guidelines for data archives
  • “How to describe” datasets (inspired by DataCite)
  • “How to put datasets in context” with projects
• OpenAIRE services
  • Inference of links to datasets from article full-text
  • Extraction of dataset-publication links from data archives (e.g. PANGAEA, DataCite)
Research initiatives
• OpenAIRE opens to “research initiatives” willing to
  • Monitor the productivity of the community in terms of publications and datasets
  • Support the discovery of research made by peers in the same community
[Figure labels: Publication, Data, Project, Funding, Research Initiative]
Enabling better science: Research initiatives
• OpenAIRE research initiatives
  • European Grid Infrastructure (EGI), concepts: EGI Virtual Organizations and EGI disciplines
• OpenAIRE services
  • Inference of links to research initiatives from article full-texts
  • Monitoring ROI and Open Access w.r.t. relevant “concepts” of a research activity
Enabling better science: Scientific processes
To be defined
• Scientific process acquisition policies
  • e.g. software, process (e.g. Taverna workflows), methods (e.g. e-notebooks)
• Collection strategies
  • From “process repositories”? E.g. myexperiment.org, GitHub
• Guidelines for “process repository managers”
• OpenAIRE services
  • Monitoring ROI of projects in terms of processes!
  • Inference/extraction of article/process links?
Research Data Alliance (RDA)
Research Data Alliance (RDA): Publishing Data Services Working Group
• Forum funded by the Commission to propel the discussion among researchers and practitioners in the field of research data management
• Identification of common, cross-discipline problems to yield best practices and recommendations
• Organized in Interest Groups and Working Groups
• Focus:
  • Publishing Data Interest Group: umbrella of WGs focusing on enabling a stronger research data publishing infrastructure
  • Publishing Data Services Working Group: focusing on article-dataset interlinking
Publishing Data Services Working Group: Data-article links
• Benefits of creating context by establishing data-article links
  • Increasing visibility and discoverability
  • Stimulating reuse and repeatability
• Key to make it worth it:
  • Infrastructural approach: linking needs to be done collectively, at community (and cross-community) level, sharing procedures, policies and technologies
• Issues
  • No common framework for interlinking datasets and published articles
  • Initiatives live in isolation and cannot be combined
Enabling better science: giving open access to article-dataset links
• Creating “an open, freely accessible, web based service that enables its users to identify datasets that are associated with a given article, and vice versa”
• The Service will serve as a flexible sandbox
• Major scholarly communication stakeholders involved at different levels
  • Feed authoritative links to the Service
  • Access links from the Service
  • Feed back requirements, preferences, recommendations, obstacles to refine/enhance the Service
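The core lookup the Service is meant to offer (datasets for a given article, and vice versa) can be illustrated with a small in-memory index. This is only a sketch of the idea: the `LinkIndex` class, its method names and the identifiers used are hypothetical and do not describe the Service's actual API or storage.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical bidirectional index of article-dataset links."""

    def __init__(self):
        self._datasets_by_article = defaultdict(set)
        self._articles_by_dataset = defaultdict(set)

    def add_link(self, article_id: str, dataset_id: str) -> None:
        # Store the link in both directions so either side can be queried.
        self._datasets_by_article[article_id].add(dataset_id)
        self._articles_by_dataset[dataset_id].add(article_id)

    def datasets_for(self, article_id: str) -> list:
        # .get() avoids creating empty entries for unknown identifiers.
        return sorted(self._datasets_by_article.get(article_id, ()))

    def articles_for(self, dataset_id: str) -> list:
        return sorted(self._articles_by_dataset.get(dataset_id, ()))
```

A provider feeding links would call `add_link` once per article-dataset pair; a consumer would call `datasets_for` or `articles_for` with a persistent identifier.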
PDS-WG Stakeholders
Claire Austin; David Arctur; Amir Aryani (ANDS); Geoff Bilder (CrossRef); Timea Biro; Adrian Burton (ANDS, co-chair); Ian Bruno (CCDC); Sarah Callaghan; David Carlson; Jamus Collier (PANGAEA); Suenje Dallmeier-Thiessen; Tim DiLauro; Ingrid Dillo; Rorie Edmunds; Janine Felden; Carol Goble; Jeffrey Grethe; Harkan Grudd; Siddeswara Guru; Laure Haak (ORCID); John Helly; Francisco Hernandez; Simon Hodson; Richard Kidd (RSc); Hylke Koers (Elsevier, co-chair); Paolo Manghi (OpenAIRE); Haralambos Marmanis; Caroline Martin; Jo McEntyre (EMBL-EBI); Yolanda Meleco; Sheila Morrissey; Lyubomir Penev; Mohan Ramamurthy; Howard Ratner; Nigel Robinson (Thomson Reuters); Sergio Ruiz (DataCite); Uwe Schindler (PANGAEA); Johanna Schwarz (Springer); Martina Stockhause; Carly Strasser; Eefke Smit (STM); Jonathan Tedds; Joachim Wackerow; Juanle Wang; Hua Xu; Eva Zanzerkia
The Data-Literature Service
• A one-for-all service model infrastructure for research data publishing
  • Increase interoperability
  • Decrease systemic inefficiencies
  • Power new tools and functionalities to the benefit of researchers
Benefits
• For data repositories and journal publishers
  • Linking becomes more scalable and cheaper, ensuring more visibility for data sources (and their “customers”)
• For research institutes, bibliographic providers, and funding bodies
  • Enables bibliographic services and productivity assessment tools that track datasets and journal publications within a common framework
• For researchers
  • Sharing and accessing relevant articles and data becomes easier, more efficient and more accurate, thereby increasing scientific reward and enhancing scientific practices
System development and operation: OpenAIRE and PANGAEA
[Figure: Data-Literature Service architecture. Recoverable labels:]
• Data Sources (examples: pairs of DOIs, DataCite records, PANGAEA records)
• Links collection, …, Harmonizing, PID resolution, De-duplicating
• Information Space, Core Data Model
• Web Portal, Search APIs, OAI-PMH, intersection
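The harmonizing and de-duplicating steps named in the figure can be sketched for the simplest input the slide mentions, pairs of DOIs. The code below is an illustration under assumptions, not the Service's pipeline: the function names and the list of URL prefixes are hypothetical, and PID resolution is left out because it would require network access. It relies on the fact that DOI names are case-insensitive, so lower-casing is a safe normalization.

```python
# Prefixes commonly found in front of a bare DOI (assumed list for this sketch).
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Harmonize one DOI string: trim, lower-case, drop a resolver prefix."""
    doi = raw.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def harmonize_links(raw_links):
    """Normalize (article, dataset) DOI pairs and drop duplicates.

    The first occurrence of each normalized pair is kept, in input order.
    """
    seen = set()
    result = []
    for article, dataset in raw_links:
        pair = (normalize_doi(article), normalize_doi(dataset))
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result
```

Two providers reporting the same link in different notations, for instance one with a `doi:` prefix and one with a resolver URL, would collapse to a single pair after this step.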
Information Space: Core Data Model Schema
The Service (BETA)
http://dliservice.research-infrastructures.eu
Powered by:
• OpenAIRE D-NET software
• PANGAEA search engine
Some numbers
• Close to 1 million links and 2 million objects
Providers
What’s intrinsically wrong?
Research products publishing workflow
Digital research products (articles, data, scientific process)
Repositories for literature, research data, scientific process
Research e-infrastructure
Market-place services
By Donatella Castelli and Alessia Bardi, Massimiliano Assante, September 2015, bardi@isti.cnr.it
Research products publishing workflow
• Deposition: de-contextualisation, staticity, extra cost
• Quality Assessment: inefficient peer-review (no repeatability)
• Dissemination: fragmentation in thematic or typology silos, lack of semantic linking
• Reuse of products: lack of context (no replication)
Research activities
• A research activity can be understood as a course of actions, following a scientific method, that leads to proving an initial thesis in order to bring novelty to a research field
• Every research activity builds upon and produces a wide array of research products
[Figure labels: Publication, Data, Project, Funding, Research Initiative, Scientific process, Time]
Time for a change in scholarly communication: Publishing Research Activities
• De facto the literature publishing workflow has been adapted and adopted for other research products
  • “Elsewhere” and “on date” philosophy
• On the contrary, modern research conducted with the support of Research e-Infrastructures is
  • Strongly contextualized, intrinsically dynamic
• Research products should be published “in place”, “during” and together with research activities
• Research e-Infrastructures should evolve to support marketplace-like functionality
Science 2.0 Repositories
• Creation of research products in the Research e-Infrastructure (Re-I) is intercepted and the new products are published
• Notify peers about research activities and published products via research social networks
• Foster continuous open peer review
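The "intercept, publish, notify" idea above resembles a publish-subscribe pattern. The sketch below illustrates that pattern only; the `ResearchActivityFeed` class, its methods and the dictionary used as a product record are hypothetical and are not taken from SciRepo or any Research e-Infrastructure.

```python
class ResearchActivityFeed:
    """Hypothetical feed: publishes intercepted products and notifies peers."""

    def __init__(self):
        self._subscribers = []
        self.published = []

    def subscribe(self, callback) -> None:
        # A subscriber is any callable that accepts one product record.
        self._subscribers.append(callback)

    def intercept(self, product: dict) -> None:
        # Called when an application in the infrastructure creates a product:
        # the product is published first, then every subscriber is notified.
        self.published.append(product)
        for notify in self._subscribers:
            notify(product)
```

A peer interested in an activity could subscribe with a function that appends to a news feed; each product created during the activity would then reach that feed without a separate deposition step.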
SciRepo publishing model benefits
Stage | Current Model Cons | SciRepo Model Pros
Deposition | De-contextualisation; staticity; extra cost | In context; products remain “alive”; no extra cost; alternative products
Quality Assessment | Ineffective peer-review | Continuous and in context; self-assessment
Dissemination | Fragmentation in thematic or typology silos; lack of semantic linking | Unified; automatic and complete
An example of SciRepo Research Activity web page
SciRepo in the real world
http://www.i-marine.eu
• The iMarine Research Infrastructure features part of the SciRepo social functionalities:
  • Applications running in the RI generate products that can be shared
  • Notifications of new research products via News Feed
  • Research products accessible in the context of the application that generated them
• More will be realized for the new BlueBridge project
Thank you
Suggested reading:
• Bardi A., Manghi P. A Framework Supporting the Shift from Traditional Digital Publications to Enhanced Publications (2015). doi:10.1045/january2015-bardi
• Manghi P., Bolikowski L., Manola N., Schirrwagen J., Smith T. OpenAIREplus: the European Scholarly Communication Data Infrastructure (2012). doi:10.1045/september2012-manghi
• Assante M., Candela L., Castelli D., Manghi P., Pagano P. Science 2.0 Repositories: Time for a Change in Scholarly Communication (2015). doi:10.1045/january2015-assante
Contacts:[email protected]
[email protected]@isti.cnr.it