The Open Data Pilot: practical implementation
Sarah Jones
Digital Curation Centre, University of Glasgow
Twitter: @sjDCC
How to make data openly available
OPEN RESEARCH DATA
Image CC-BY-NC-SA by Tom Magliery www.flickr.com/photos/lwr/13442910354
How can researchers make data open?
1. Choose the dataset(s) to share
• What can be made open? This step may need to be revisited if
problems are encountered later.
2. Apply an open licence
• Determine what IP exists. Apply a suitable licence, e.g. CC-BY
3. Make the data available
• Provide the data in a suitable format. Use repositories.
4. Make it discoverable
• Post on the web, get a unique ID, register in catalogues…
https://okfn.org
www.dcc.ac.uk/resources/how-guides/license-research-data
Licensing research data openly
This DCC guide outlines the pros and cons
of each approach and gives practical
advice on how to implement a data licence
CREATIVE COMMONS LIMITATIONS
NC (Non-Commercial): What counts as commercial?
SA (Share Alike): Reduces interoperability
ND (No Derivatives): Severely restricts use
Licences containing these clauses are not open licences
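As a rough illustration of the point above, the sketch below splits a Creative Commons licence code into its parts and reports any of the three clauses flagged on this slide. The helper names (`restrictive_clauses`, `is_open`) are hypothetical, and the rule it applies is simply the one stated on the slide, not an official test of openness.

```python
# Clause codes and comments are taken from the slide above.
# The function names are hypothetical helpers for illustration only.
RESTRICTIVE_CLAUSES = {
    "NC": "Non-Commercial: what counts as commercial?",
    "SA": "Share Alike: reduces interoperability",
    "ND": "No Derivatives: severely restricts use",
}


def restrictive_clauses(licence_code: str) -> list[str]:
    """Return the flagged clause codes found in a code such as 'CC-BY-NC-SA'."""
    parts = licence_code.upper().split("-")
    return [part for part in parts if part in RESTRICTIVE_CLAUSES]


def is_open(licence_code: str) -> bool:
    """Treat a licence as open only if it carries none of the flagged clauses."""
    return not restrictive_clauses(licence_code)


if __name__ == "__main__":
    for code in ("CC-BY", "CC-BY-NC-SA", "CC-BY-ND"):
        found = restrictive_clauses(code)
        status = "open" if not found else "not open (" + ", ".join(found) + ")"
        print(code, "->", status)
```

Note that the image credits in this deck use CC-BY-NC-SA, which this sketch would report as not open because of the NC and SA clauses.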
Horizon 2020 Open Access
guidelines point to:
CC-BY or CC0
EUDAT licensing tool
Researchers can answer a series of questions to determine
which licence(s) are appropriate to use
http://ufal.github.io/lindat-license-selector
Metadata standards
• Metadata is basic descriptive information to help others identify and
understand the structure of the data e.g. title, author...
• Documentation provides the wider context e.g. the methodology /
workflow, software and any information needed to understand the data
• Relevant standards should be used for interoperability
www.dcc.ac.uk/resources/metadata-standards
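A minimal sketch of what a basic descriptive record might look like, and how to check that the core fields are filled in. The field names and values are illustrative assumptions only; a real project should take them from the relevant metadata standard for its discipline.

```python
import json

# Assumption: "title" and "author" are treated as the minimum, following the
# examples on the slide. These names are not taken from any formal standard.
REQUIRED_FIELDS = ("title", "author")


def missing_fields(record, required=REQUIRED_FIELDS):
    """List the required descriptive fields that are absent or empty."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Placeholder record: every value here is made up for illustration.
record = {
    "title": "Example survey responses",
    "author": "A. Researcher",
    "methodology": "See the accompanying documentation file",
    "software": "Any tools needed to read or process the data",
}

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print("Missing fields:", missing_fields(record))
```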
Data file formats
If researchers want their data to be re-used and sustainable in the
long-term, they should opt for open, non-proprietary formats.
Type | Recommended | Avoid for data sharing
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS
Further examples:
www.data-archive.ac.uk/create-manage/format/formats-table
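The recommendations above could be applied to a folder of files with a small check like the one below. This is a sketch only: the mapping from file extensions to the formats named on the slide is an assumption, and an extension cannot reveal details such as the codec inside a media container or whether a PDF is PDF/A, so those cases are reported as "not covered".

```python
from pathlib import Path

# Assumption: these extensions stand in for the formats named in the table.
RECOMMENDED = {
    ".csv", ".tsv", ".por",          # tabular (SPSS portable assumed as .por)
    ".txt", ".html", ".rtf",         # text
    ".mp4", ".ogg",                  # media containers
    ".tif", ".tiff", ".jp2", ".png", # images
    ".xml", ".rdf",                  # structured data
}
AVOID = {
    ".xls", ".xlsx",                 # Excel
    ".doc", ".docx",                 # Word
    ".mov",                          # Quicktime
    ".gif", ".jpg", ".jpeg",         # images
}


def classify(filename: str) -> str:
    """Classify a file by extension according to the table above."""
    suffix = Path(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in AVOID:
        return "avoid for data sharing"
    return "not covered"


if __name__ == "__main__":
    for name in ("results.csv", "survey.xlsx", "report.pdf"):
        print(name, "->", classify(name))
```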
Data repositories
http://service.re3data.org/search
Zenodo
• OpenAIRE-CERN joint effort
• Multidisciplinary repository
• Multiple data types
– Publications
– Long tail of research data
• Citable data (DOI)
• Links funding, publications,
data & software
www.zenodo.org
Plan for sharing from the outset
Many decisions taken early on in the project will affect
whether the data can be made openly available.
Researchers should:
• Ensure consent agreements also include permission to archive and
share data for reuse by others
• Seek permissions for more than just the primary project purpose if
signing licences to reuse third-party data. It may not be possible to
share derivative data if it includes somebody else’s IP
• Explore the potential for openness when drafting agreements with
commercial partners
REVIEWING DATA MANAGEMENT PLANS
What to look for in Data Management Plans
Image CC-BY-NC-SA by Ralf Appelt www.flickr.com/photos/adesigna/4090782772
Horizon 2020 templates
Annex 1 (by month 6)
The DMP should address the points
below on a dataset by dataset basis:
• Data set reference and name
• Data set description
• Standards and metadata
• Data sharing
• Archiving and preservation
(including storage and backup)
Annex 2 (mid-term & final review)
Scientific research data should be easily:
1. Discoverable
2. Accessible
3. Assessable and intelligible
4. Useable beyond the original purpose
for which it was collected
5. Interoperable to specific quality
standards
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
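One way to picture the "dataset by dataset" requirement is as a record with one entry per template point, which makes gaps easy to spot. The sketch below is an illustration under that assumption; the class and function names are hypothetical and are not part of the Horizon 2020 template or of DMPonline.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetPlan:
    """One entry per dataset, with a field for each of the five template points."""
    reference_and_name: str = ""
    description: str = ""
    standards_and_metadata: str = ""
    data_sharing: str = ""
    archiving_and_preservation: str = ""  # including storage and backup


def unaddressed_points(plan: DatasetPlan) -> list[str]:
    """Return the template points that have been left empty for this dataset."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


if __name__ == "__main__":
    # Placeholder content, made up for illustration.
    plan = DatasetPlan(
        reference_and_name="Example dataset A",
        description="Interview transcripts collected during the project",
    )
    print("Still to address:", unaddressed_points(plan))
```

An empty field here only signals that nothing has been written yet; whether what is written is adequate remains a judgement for the reviewer.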
Common themes to cover
• Data Description
• Standards and Metadata (discoverable / usable / interoperable)
• Data Sharing (as open as possible, as closed as necessary)
• Archiving and preservation
Key things to check
• Is the plan appropriate?
– adopting relevant standards
– practices in line with norms for that field
– use of support services e.g. university storage, subject repositories…
• Does it seem feasible to implement?
• Has sufficient information been provided?
• Has advice been sought where needed?
• Are restrictions and costs properly justified?
Main judgement to make:
Has the researcher taken time to
reflect on what to do?
There are no absolute right answers. You just want
to be reassured that due consideration has been
given and the approach seems reasonable.
Data Description
• Is it clear what data will be collected?
• Are appropriate file formats proposed?
• Has the reuse or integration of existing data been
considered? (if appropriate)
• If third-party data will be reused, has sharing been
considered in the licence agreements?
Standards and Metadata
• Will enough contextual information and structured
metadata be provided to allow others to find,
understand and reuse the data?
• Will the data be documented during the research? Has
time been allocated to this?
• Will formal standards be used? (where available)
• Is information being captured & shared on the associated
software and tools needed for reuse and reproducibility?
Data Sharing
• Is it clear which data will be shared and with whom?
– Are opportunities to share data openly maximised? e.g. by seeking
consent to share, anonymising data…
– If data can’t be shared, are the reasons why explained?
• Will the data be easily accessible and openly licensed?
• If an embargo period is planned, is that in line with
norms for that discipline?
• Will persistent IDs be assigned for discovery and citation?
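On persistent IDs: a DOI is one common kind, and it can be turned into a resolvable link by prefixing it with https://doi.org/. The sketch below shows this with a loose shape check. The regular expression is a simplifying assumption rather than the full DOI syntax, and the DOI in the example is a made-up placeholder.

```python
import re

# Assumption: a loose pattern ("10.", a numeric prefix, a slash, a suffix).
# It is not a complete validation of DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a link that resolves through doi.org."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError("does not look like a DOI: " + repr(doi))
    return "https://doi.org/" + doi


if __name__ == "__main__":
    # Placeholder DOI for illustration; it does not point to a real dataset.
    print(doi_to_url("10.1234/example.dataset"))
```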
Archiving and Preservation (incl. storage)
• Will the research data be deposited in a suitable
community database, repository or archive?
• Are there any costs associated with preservation, and if
so, how will these be covered?
• Will the data be stored and backed up appropriately
during the research project? e.g. on managed university
filestores rather than external hard drives
Reviewing DMPs
Useful guidelines
• ESRC guidance for peer-reviewers www.esrc.ac.uk/_images/Data-Management-Plan-Guidance-for-peer-reviewers_tcm8-15569.pdf
• MRC guidelines www.mrc.ac.uk/documents/pdf/data-management-plans-guidance-for-reviewers
• Johns Hopkins grant reviewers cribsheet https://dmp.data.jhu.edu/resources/grant-reviewers-guide
How to assess DMPs (forthcoming guide)
DCC support on Data Management Plans
• Checklist on what to include
• How-to guide on developing a plan
• Webinars and training materials
• DMPonline tool
• Example DMPs
www.dcc.ac.uk/resources/data-management-plans
DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020
https://dmponline.dcc.ac.uk
Example data management plans
• Technical appendix submitted to AHRC by Bristol Uni
http://data.blogs.ilrt.org/files/2014/02/data.bris-AHRC-example-Technical-Plan.pdf
• Rural Economy & Land Use (RELU) programme examples
http://relu.data-archive.ac.uk/data-sharing/planning/examples
• UCSD example DMPs (20+ scientific plans for NSF)
http://libraries.ucsd.edu/services/data-curation/data-management/dmp-samples.html
• LSHTM guide and worked example for Wellcome Trust
www.lshtm.ac.uk/research/researchdataman/plan/wellcometrust_dmp.pdf
• Further examples: www.dcc.ac.uk/resources/data-management-plans/guidance-examples
Thanks for listening
DMP guidance, tools & resources:
www.dcc.ac.uk/resources/data-management-plans
Follow us on Twitter:
@digitalcuration and #ukdcc #DMPonline
Exercise: reviewing DMPs
In pairs or small groups:
1. Read through the example DMP or one you brought along (5 mins)
2. Discuss what you think about the example DMP (10 mins)
— Did you get a clear sense of what data will be created?
— Were particular standards and file formats named and explained?
— Is there enough information about how the data will be made available?
— Will the data be deposited in a repository for preservation?
3. Report back the main points from your discussion (5 mins)