The Open Data Pilot: practical implementation

Sarah Jones

Digital Curation Centre, University of Glasgow

sarah.jones@glasgow.ac.uk

Twitter: @sjDCC

How to make data openly available

OPEN RESEARCH DATA

Image CC-BY-NC-SA by Tom Magllery www.flickr.com/photos/lwr/13442910354

How can researchers make data open?

1. Choose the dataset(s) to share

• What can be made open? This step may need to be revisited if problems are encountered later.

2. Apply an open licence

• Determine what IP exists. Apply a suitable licence e.g. CC-BY

3. Make the data available

• Provide the data in a suitable format. Use repositories.

4. Make it discoverable

• Post on the web, get a unique ID, register in catalogues…

https://okfn.org

www.dcc.ac.uk/resources/how-guides/license-research-data

Licensing research data openly

This DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement a data licence

CREATIVE COMMONS LIMITATIONS

• NC (Non-Commercial): What counts as commercial?

• SA (Share Alike): Reduces interoperability

• ND (No Derivatives): Severely restricts use

Licences carrying these clauses are not open licences

Horizon 2020 Open Access guidelines point to: CC-BY or CC0

EUDAT licensing tool

Researchers can answer a series of questions to determine which licence(s) are appropriate to use

http://ufal.github.io/lindat-license-selector

Metadata standards

• Metadata is basic descriptive information to help others identify and understand the structure of the data e.g. title, author...

• Documentation provides the wider context e.g. the methodology / workflow, software and any information needed to understand the data

• Relevant standards should be used for interoperability

www.dcc.ac.uk/resources/metadata-standards
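
As a rough illustration of the basic descriptive metadata mentioned above, the sketch below builds a small record and serialises it to JSON. The field names loosely follow Dublin Core element names (title, creator, date, description, rights); that choice, the helper name build_record and all the values are assumptions made for the example. A real project should use whichever standard is relevant to its discipline.

```python
import json


def build_record(title, creators, date, description, rights):
    """Return a minimal descriptive metadata record as a dict.

    Field names loosely follow Dublin Core elements; this is an
    illustrative assumption, not a prescribed schema.
    """
    if not title or not creators:
        raise ValueError("title and at least one creator are required")
    return {
        "title": title,
        "creator": list(creators),
        "date": date,
        "description": description,
        "rights": rights,
    }


# Hypothetical example values.
record = build_record(
    title="Example survey dataset",
    creators=["A. Researcher"],
    date="2015-01-01",
    description="Anonymised responses from an example survey.",
    rights="CC-BY",
)

# Serialise the record so it can be stored alongside the data files.
metadata_json = json.dumps(record, indent=2, sort_keys=True)
print(metadata_json)
```

Keeping a record like this next to the data files means the description travels with the dataset when it is deposited or shared.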

Data file formats

If researchers want their data to be re-used and sustainable in the long term, they should opt for open, non-proprietary formats.

Type             Recommended                     Avoid for data sharing

Tabular data     CSV, TSV, SPSS portable         Excel

Text             Plain text, HTML, RTF;          Word
                 PDF/A only if layout matters

Media            Container: MP4, Ogg;            Quicktime, H264
                 Codec: Theora, Dirac, FLAC

Images           TIFF, JPEG2000, PNG             GIF, JPG

Structured data  XML, RDF                        RDBMS

Further examples:

www.data-archive.ac.uk/create-manage/format/formats-table
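
The format recommendations above could be turned into a quick pre-deposit check along the following lines. This is only a sketch: the mapping from format names to file extensions is an assumption made for the example, the function name classify is hypothetical, and formats that cannot be judged from an extension alone (PDF/A, codecs, databases) are left out.

```python
from pathlib import Path

# Assumed extension mapping for the formats named in the table above.
RECOMMENDED = {
    ".csv", ".tsv", ".por",          # tabular data (SPSS portable = .por)
    ".txt", ".html", ".rtf",         # text
    ".mp4", ".ogg", ".flac",         # media
    ".tif", ".tiff", ".jp2", ".png", # images
    ".xml", ".rdf",                  # structured data
}
AVOID = {
    ".xls", ".xlsx",                 # Excel
    ".doc", ".docx",                 # Word
    ".mov",                          # Quicktime
    ".gif", ".jpg", ".jpeg",         # images
}


def classify(filename):
    """Return 'recommended', 'avoid' or 'unknown' for a file name."""
    suffix = Path(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in AVOID:
        return "avoid"
    return "unknown"


# Hypothetical file names.
for name in ["results.xlsx", "results.csv", "interview.docx", "notes.md"]:
    print(name, "->", classify(name))
```

Files reported as "avoid" are candidates for conversion before sharing; "unknown" simply means the table gives no guidance for that extension.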

Data repositories

http://service.re3data.org/search

Zenodo

• OpenAIRE-CERN joint effort

• Multidisciplinary repository

• Multiple data types

– Publications

– Long tail of research data

• Citable data (DOI)

• Links funding, publications, data & software

www.zenodo.org

Plan for sharing from the outset

Many decisions taken early on in the project will affect whether the data can be made openly available.

Researchers should:

• Ensure consent agreements also include permission to archive and share data for reuse by others

• Seek permissions for more than just the primary project purpose if signing licences to reuse third-party data. It may not be possible to share derivative data if it includes somebody else’s IP

• Explore the potential for openness when drafting agreements with commercial partners

REVIEWING DATA MANAGEMENT PLANS

What to look for in Data Management Plans

Image CC-BY-NC-SA by Ralf Appelt www.flickr.com/photos/adesigna/4090782772

Horizon 2020 templates

Annex 1 (by month 6)

The DMP should address the points below on a dataset by dataset basis:

• Data set reference and name

• Data set description

• Standards and metadata

• Data sharing

• Archiving and preservation (including storage and backup)

Annex 2 (mid-term & final review)

Scientific research data should be easily:

1. Discoverable

2. Accessible

3. Assessable and intelligible

4. Useable beyond the original purpose for which it was collected

5. Interoperable to specific quality standards

http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

Common themes to cover

• Data Description

• Standards and Metadata (discoverable / usable / interoperable)

• Data Sharing (as open as possible, as closed as necessary)

• Archiving and preservation
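
A reviewer could run a first-pass completeness check against the four themes above before reading a plan in detail. The sketch below assumes the DMP has already been split into a mapping of section heading to section text; that structure, the function name missing_themes and the draft content are all hypothetical. It only reports whether a theme has any text at all, not whether the text is adequate.

```python
# The four common themes listed above.
REQUIRED_THEMES = (
    "Data Description",
    "Standards and Metadata",
    "Data Sharing",
    "Archiving and preservation",
)


def missing_themes(dmp_sections):
    """Return the required themes that are absent or empty.

    dmp_sections maps a section heading to its text (an assumed
    structure for this example).
    """
    present = {
        heading.strip().lower()
        for heading, text in dmp_sections.items()
        if text and text.strip()
    }
    return [t for t in REQUIRED_THEMES if t.lower() not in present]


# Hypothetical draft plan: one theme filled in, one left empty.
draft = {
    "Data Description": "Interview transcripts in plain text.",
    "Data Sharing": "",
}
print(missing_themes(draft))
```

A check like this only flags gaps; judging whether the approach is reasonable still needs a human reader.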

Key things to check

• Is the plan appropriate?

– adopting relevant standards

– practices in line with norms for that field

– use of support services e.g. university storage, subject repositories…

• Does it seem feasible to implement?

• Has sufficient information been provided?

• Has advice been sought where needed?

• Are restrictions and costs properly justified?

Main judgement to make:

Has the researcher taken time to reflect on what to do?

There are no absolute right answers. You just want to be reassured that due consideration has been given and the approach seems reasonable.

Data Description

• Is it clear what data will be collected?

• Are appropriate file formats proposed?

• Has the reuse or integration of existing data been considered? (if appropriate)

• If third-party data will be reused, has sharing been considered in the licence agreements?

Standards and Metadata

• Will enough contextual information and structured metadata be provided to allow others to find, understand and reuse the data?

• Will the data be documented during the research? Has time been allocated to this?

• Will formal standards be used? (where available)

• Is information being captured & shared on the associated software and tools needed for reuse and reproducibility?

Data Sharing

• Is it clear which data will be shared and with whom?

– Are opportunities to share data openly maximised? e.g. by seeking consent to share, anonymising data…

– If data can’t be shared, are the reasons why explained?

• Will the data be easily accessible and openly licensed?

• If an embargo period is planned, is that in line with norms for that discipline?

• Will persistent IDs be assigned for discovery and citation?

Archiving and Preservation (incl. storage)

• Will the research data be deposited in a suitable community database, repository or archive?

• Are there any costs associated with preservation, and if so, how will these be covered?

• Will the data be stored and backed up appropriately during the research project? e.g. on managed university filestores rather than external hard drives

Reviewing DMPs

Useful guidelines

• ESRC guidance for peer-reviewers www.esrc.ac.uk/_images/Data-Management-Plan-Guidance-for-peer-reviewers_tcm8-15569.pdf

• MRC guidelines www.mrc.ac.uk/documents/pdf/data-management-plans-guidance-for-reviewers

• Johns Hopkins grant reviewers cribsheet https://dmp.data.jhu.edu/resources/grant-reviewers-guide

• How to assess DMPs: forthcoming guide

DCC support on Data Management Plans

• Checklist on what to include

• How-to guide on developing a plan

• Webinars and training materials

• DMPonline tool

• Example DMPs

www.dcc.ac.uk/resources/data-management-plans

DMPonline

A web-based tool to help researchers write DMPs

Includes a template for Horizon 2020

https://dmponline.dcc.ac.uk

Example data management plans

• Technical appendix submitted to AHRC by Bristol Uni

http://data.blogs.ilrt.org/files/2014/02/data.bris-AHRC-example-Technical-Plan.pdf

• Rural Economy & Land Use (RELU) programme examples

http://relu.data-archive.ac.uk/data-sharing/planning/examples

• UCSD example DMPs (20+ scientific plans for NSF)

http://libraries.ucsd.edu/services/data-curation/data-management/dmp-samples.html

• LSHTM guide and worked example for Wellcome Trust

www.lshtm.ac.uk/research/researchdataman/plan/wellcometrust_dmp.pdf

• Further examples: www.dcc.ac.uk/resources/data-management-plans/guidance-examples

Thanks for listening

DMP guidance, tools & resources:

www.dcc.ac.uk/resources/data-management-plans

Follow us on Twitter:

@digitalcuration and #ukdcc #DMPonline

Exercise: reviewing DMPs

In pairs or small groups:

1. Read through the example DMP or one you brought along (5 mins)

2. Discuss what you think about the example DMP (10 mins)

— Did you get a clear sense of what data will be created?

— Were particular standards and file formats named and explained?

— Is there enough information about how the data will be made available?

— Will the data be deposited in a repository for preservation?

3. Report back the main points from your discussion (5 mins)
