The Rocky Road To Reuse: Encouraging infrastructures to promote data integration and reuse

Anita de Waard
VP Research Data Collaborations, Elsevier RDMS
[email protected]

NFAIS Annual Conference, Philadelphia, PA
February 21, 2016



Page 3: The Rocky Road to Reuse


Collect and Capture: Sharing Protocols

www.hivebench.com

Page 4: The Rocky Road to Reuse


Save and store: Data Rescue Award

https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences

Page 5: The Rocky Road to Reuse

Collaborate & Analyse: SoftwareX

http://www.journals.elsevier.com/softwarex/

Page 6: The Rocky Road to Reuse

Publish Reproducible Formats

The first Reproducibility Paper was published recently: http://www.sciencedirect.com/science/article/pii/S0306437915301113

It is linked to this paper: http://www.sciencedirect.com/science/article/pii/S0306437915000472

The data is hosted here: https://data.mendeley.com/datasets/xz6gv65m6d/6

To reproduce the experiment, the journal requires source code for the software components, together with installation scripts, and we suggest that authors host their code on GitHub (see the software publication project). In addition to the source code, we recommend that authors submit a virtual machine in which all appropriate software components are already installed, so that the experiment can be reproduced on a wide variety of platforms. Authors are to submit their experiments using either ReproZip or Docker.

Page 7: The Rocky Road to Reuse

Manage, Store: Mendeley Data

https://data.mendeley.com/datasets/xz6gv65m6d/6

• Linked to published papers – or not
• Linked to GitHub – or not
• Versioning and provenance

Page 8: The Rocky Road to Reuse

Share and Publish, Today:

• Supplementary data at PANGAEA
• Bidirectional links between PANGAEA & ScienceDirect
• Data visualized next to the article

http://www.elsevier.com/databaselinking

Page 9: The Rocky Road to Reuse

Share and Publish, Tomorrow:

• ICSU/WDS/RDA Publishing Data Services Working Group
• Currently creating a linked-data model for exposing DOI-to-DOI links outside the publisher’s firewall
• Merged with a National Data Service pilot with the same goal
• Collaboration between CrossRef, DataCite, Europe PubMed Central, ANDS, Thomson Reuters, Elsevier
• About to deliver: http://dliservice.research-infrastructures.eu/#/api

Objective: move from a plethora of (mostly) bilateral arrangements between the different players… to… a one-for-all cross-referencing service for articles and data
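The idea of a single cross-referencing service for articles and data can be illustrated with a minimal sketch. It is not the working group's model or the API linked above: the `Link` and `LinkIndex` names, the `asserted_by` field, and the example DOIs are all hypothetical. It only shows how links contributed by several parties could be pooled and queried in both directions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    """One article-to-dataset link, as asserted by a single contributor."""
    article_doi: str
    dataset_doi: str
    asserted_by: str  # hypothetical provenance field (publisher, repository, ...)


class LinkIndex:
    """Toy shared index that answers lookups from either end of a link."""

    def __init__(self) -> None:
        self._by_article = defaultdict(set)
        self._by_dataset = defaultdict(set)

    def add(self, link: Link) -> None:
        # DOIs are case-insensitive, so keys are lowercased before storing.
        self._by_article[link.article_doi.lower()].add(link)
        self._by_dataset[link.dataset_doi.lower()].add(link)

    def datasets_for(self, article_doi: str) -> List[str]:
        links = self._by_article.get(article_doi.lower(), ())
        return sorted({link.dataset_doi for link in links})

    def articles_for(self, dataset_doi: str) -> List[str]:
        links = self._by_dataset.get(dataset_doi.lower(), ())
        return sorted({link.article_doi for link in links})


# Placeholder DOIs, made up for illustration only.
index = LinkIndex()
index.add(Link("10.1234/example-article", "10.5678/example-dataset", "publisher"))
index.add(Link("10.1234/example-article", "10.5678/example-dataset", "repository"))
index.add(Link("10.1234/other-article", "10.5678/example-dataset", "repository"))

print(index.datasets_for("10.1234/EXAMPLE-ARTICLE"))
print(index.articles_for("10.5678/example-dataset"))
```

Because every contributor writes to the same index, a repository can find the articles citing its dataset without a separate bilateral agreement with each publisher.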

Page 10: The Rocky Road to Reuse

Share and Publish, Current Status:

[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal and Paper, connected by arrows numbered 1 to 4 as in the steps below]

1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes,) dataset gets posted to repository
4. Researcher reports (post-hoc) to Institution and Funder

Page 11: The Rocky Road to Reuse

Share and Publish, Issues:

[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal and Paper, connected by arrows numbered 1 to 4 and annotated with the issues below]

i. Too much work for researchers
ii. Data posting not mandatory
iii. No link between data and paper
iv. Funders/Institutions informed as an afterthought

Page 12: The Rocky Road to Reuse

Share and Publish, Proposal:

[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal and Paper, connected by arrows numbered 1 to 4 as in the steps below]

1. Researcher creates datasets and posts to repository (under embargo)
2. Funder is automatically notified of dataset publication
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
   - NB: this also allows release of unused data, for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting

i. Less Work!
ii. More Data Stored!
iii. Better Linking!
iv. Better Tracking!
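The four proposed steps can be sketched as a small state model. This is an illustration under stated assumptions, not a description of any existing system: the `Dataset` and `Workflow` classes, their fields, and the example titles and identifiers are hypothetical, and a plain list of messages stands in for real notifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    title: str
    embargoed: bool = True              # step 1: posted under embargo
    linked_paper: Optional[str] = None  # filled in when the paper appears


@dataclass
class Workflow:
    """Toy model of the proposal; `notifications` stands in for real messaging."""
    notifications: List[str] = field(default_factory=list)

    def post_dataset(self, title: str) -> Dataset:
        dataset = Dataset(title)
        # Step 2: the funder is told automatically, with no extra researcher effort.
        self.notifications.append(f"funder: dataset posted: {title}")
        return dataset

    def publish_paper(self, dataset: Dataset, paper_id: str) -> None:
        # Step 3: publication lifts the embargo and records the data-paper link.
        dataset.embargoed = False
        dataset.linked_paper = paper_id
        # Step 4: funder and institution both receive a report.
        for recipient in ("funder", "institution"):
            self.notifications.append(
                f"{recipient}: paper {paper_id} published; "
                f"embargo lifted on {dataset.title}"
            )


# Hypothetical example values.
flow = Workflow()
data = flow.post_dataset("field measurements")
flow.publish_paper(data, "example-paper-1")

print(data)
for message in flow.notifications:
    print(message)
```

The point of the design is that reporting becomes a side effect of posting and publishing, which is what the "Less Work!" and "Better Tracking!" labels on the slide describe.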

Page 13: The Rocky Road to Reuse


Cite:

https://www.elsevier.com/connect/data-citation-is-becoming-real-with-force11-and-elsevier

Page 14: The Rocky Road to Reuse


Discover:

Page 15: The Rocky Road to Reuse

DESIRE: Networks of Discovery

[Diagram: data search pipeline; labels recovered from the figure:]

• Data (three sources shown, each labelled): Federated; Poor API; Rich API; FTP & Index
• Enrichment: Manual; Automated
• (User) Intent
• Ranking; Filtering (how to mix federated & indexed, rich & poor)
• Search
• Rendering
• Search all data; Faceted query/Results refinement; Store & Use results
• General UI; Domain UI
• Filtering
• Feeding user signals back into Search ranking
• Evaluation

Birds of a Feather on Data Search: https://rd-alliance.org/bof-data-search.html

Page 16: The Rocky Road to Reuse

Research Data Life Cycle

Source: JISC: How and why you should manage your research data: a guide for researchers, Caroline Ingram, Published: 7 January 2016

[Diagram: research data life cycle, annotated with the following labels:]

• Electronic Lab Notebooks
• Software Publication
• Data repositories
• DataSearch
• Data Linking and Publishing
• Data Citation

Page 17: The Rocky Road to Reuse

A Maslow Hierarchy for Research Data:

https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data

Page 18: The Rocky Road to Reuse

Networks of Collaboration:

Force11:
- Multi-stakeholder, member-driven organisation
- Unites scholars, tool developers, librarians, publishers, funding agencies etc.
- E.g. Software citation group, akin to Data Citation Group
- Will present at Force16 in Portland, OR, April 17-19, 2016

National Data Service:
- Multi-stakeholder group, based around supercomputing centres
- Aims to be a ‘connective tissue’ between data creation, curation, storage etc. projects
- Inviting Pilots: two or more partners who have not worked together, interested in collaborating on a data-centric project to solve a real-world need; can include software sharing
- E.g. Datasearch, Data Linking systems

RDA:
- Co-leading Data publishing, linking group
- Co-leading Cost Recovery group
- Active in Chemistry, Earth Science groups
- Starting BoF Data Search

Page 19: The Rocky Road to Reuse

• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data

Anita de Waard, [email protected]

Thank you! Questions?