dans.knaw.nl
DANS is an institute of KNAW and NWO
Open Research Data in H2020
Marjan Grootveld OpenAIRE webinar, 26 October 2016
Who we are
Open Access Infrastructure for Research in Europe www.openaire.eu
DANS: Data Archiving and Networked Services
• Institute of the Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
• First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive 1989
• Mission: promote and provide permanent access to digital research information
DataverseNL for short- and mid-term storage
EASY: certified long-term Electronic Archiving System for self-deposit
NARCIS: Gateway to scholarly information in the Netherlands
Research data in context
Contents
• Brief recap from recent OpenAIRE-EUDAT webinars
• The updated Guidelines for FAIR Data Management:
  • F, A, I, R
  • Costs, data security, ethical aspects, other RDM procedures
• Recommendations
• Links to EC and OpenAIRE information
Recent webinars
Introductory RDM webinar, Tony Ross-Hellauer & Sarah Jones, 26 May:
• Reasons to manage data
• How to manage and share data (+ how to respond to concerns about sharing)
• EUDAT & OpenAIRE services
Q&A document: https://b2drop.eudat.eu/s/0H6qRgwdwkAVFvD#pdfviewer
“How to write a DMP”, Sarah Jones & Marjan Grootveld, 7/14 July:
• What is a Data Management Plan and why write one?
• Example DMPs in different domains, with lots of links!
• Lessons and guidance (e.g. storing ≠ archiving; how to find a repository; file-naming conventions)
All recordings and slides are on https://eudat.eu/events/webinars
https://www.eudat.eu Research Data Services, Expertise & Technology
Recap: why manage data?
NON PECUNIAE INVESTIGATIONIS CURATORE SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS
(Not for the research funder, but for life we make data management plans)
• Make your research easier
• Stop yourself drowning in irrelevant stuff
• Save data for later
• Avoid accusations of fraud or bad science
• Write a data paper, connect your nano publications
• Share your data for re-use & get them validated in real life
• Get credit for it
Horizon 2020 infographic
Horizon 2020: Open Research Data Pilot
The use of a Data Management Plan (DMP) is required for projects participating in the Open Research Data Pilot, detailing what data the project will generate, whether and how they will be exploited or made accessible for verification and re-use, and how they will be curated and preserved.
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Guidelines on FAIR DM v.3
Structure of the Guidelines:
1. Background: extension of the pilot
2. DMP general definition
3. Proposal, submission and evaluation
4. RDM plans during the project life cycle
5. Support
6. Annex 1: the DMP template
   1. Data summary
   2. FAIR data
   3. Allocation of resources
   4. Data security
   5. Ethical aspects
   6. Other issues
   7. Summary table “FAIR DM at a glance”
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
What’s new?
• You should develop a DMP for your project.
• There is a single DMP template from start to finish.
• The DMP template is inspired by the FAIR principles: research data should be findable, accessible, interoperable and re-usable (without suggesting any specific technology, standard, or implementation solution).
Also explicit in the new guidelines:
• From 1 January 2017 the pilot will cover all thematic areas of Horizon 2020.
• Costs related to open access to research data are eligible for reimbursement during the duration of the project under the conditions defined in the Grant Agreement.
Good things that remain
Whether a (proposed) project participates in the ORD pilot or chooses to opt out does not affect the evaluation of that project: proposals will not be penalised for opting out.
Participating in the ORD pilot does not necessarily mean opening up all your research data: as open as possible, as closed as necessary.
The DMP is a living document. You are not required to provide detailed answers to all the questions in the first version of the DMP (due M6).
Deposit in a research data repository:
a. the data needed to validate the results presented in scientific publications, including the metadata;
b. any other data, including the metadata, as specified in the DMP;
c. plus, for a and b, the documentation and the tools that are needed to validate the results, e.g. specialised software or software code, algorithms and analysis protocols (when possible, these instruments themselves).
DMPonline
A web-based tool to help researchers write DMPs
Guidance from EUDAT and OpenAIRE being added
https://dmponline.dcc.ac.uk
Choose your funder to get their specific template
Choose any additional optional guidance
§2 Making data FAIR
Findable: assign persistent IDs, provide rich metadata, register in a searchable resource, ...
Accessible: retrievable by their ID using a standard protocol, metadata remain accessible even if data aren’t, ...
Interoperable: use formal, broadly applicable languages, use standard vocabularies, qualified references, ...
Reusable: rich, accurate metadata, clear licences, provenance, use of community standards, ...
www.force11.org/group/fairgroup/fairprinciples and http://www.nature.com/articles/sdata201618
EC in the Guidelines: “This template is not intended as a strict technical implementation of the FAIR principles, it is rather inspired by FAIR as a general concept.”
EC Infographic: http://ec.europa.eu/research/images/infographics/policy/open-data-2016-w920.png
Some F questions
2.1 Making data findable, including provisions for metadata
• Use metadata and specify standards for metadata creation (if any). If there are no standards in your discipline describe what type of metadata will be created and how.
• Search keywords
• Persistent and unique identifiers such as DOI
• File and folder naming conventions: see OpenAIRE-EUDAT July webinar
• Versioning of the datasets and clear version numbers
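A naming convention with clear version numbers can be enforced with a small helper. The sketch below is only an illustration: the pattern (project, description, ISO date, zero-padded version) and the function name `dataset_filename` are assumptions, not the convention presented in the July webinar.

```python
import re
from datetime import date


def dataset_filename(project: str, description: str, created: date,
                     major: int, minor: int, extension: str) -> str:
    """Build a name such as 'myproj_soil-samples_2016-10-26_v01-02.csv'.

    The pattern itself is a hypothetical convention chosen for this example.
    """
    def slug(text: str) -> str:
        # Lower-case the text and collapse every run of other characters
        # into a single hyphen, so names stay portable across systems.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return (f"{slug(project)}_{slug(description)}_{created.isoformat()}"
            f"_v{major:02d}-{minor:02d}.{ext}")


print(dataset_filename("MyProj", "Soil samples (raw)", date(2016, 10, 26), 1, 2, ".CSV"))
```

Zero-padded version numbers and ISO dates keep files in the right order when a folder is sorted by name.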
Metadata and documentation
• Metadata and documentation are needed to find and understand research data.
• Think about what others would need in order to find, evaluate, understand, and reuse your data.
• Get others to check the metadata to improve quality.
• Use standards to enable interoperability.
http://rd-alliance.github.io/metadata-directory
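Checking metadata for gaps can be partly automated before a colleague reviews it. The sketch below assumes a hypothetical set of required fields and a made-up record. It does not implement any particular metadata standard, so pick the real field list from the standard used in your discipline.

```python
# Hypothetical minimum set of fields; replace with your discipline's standard.
REQUIRED_FIELDS = ("identifier", "title", "creators", "keywords", "licence", "version")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Soil samples, raw measurements",
    "creators": ["A. Researcher"],
    "keywords": ["soil", "field survey"],
    "licence": "",                         # still to be decided
    "version": "1.0",
}

print(missing_fields(record))  # ['licence']
```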
Some A questions
2.2 Making data openly accessible:
• Explain which data can’t be shared openly, if any
• Specify how access will be provided in case of restrictions, e.g. through a data committee, a license, or arranged with the repository.
• Will methods or software tools needed to access the data (if any) be included or documented?
• Deposit the data and associated metadata, documentation and code preferably in certified repositories which support Open Access.
Data Seal of Approval, ICSU World Data System, nestor seal, ISO 16363
Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
Zenodo: http://www.zenodo.org
Re3data.org: http://www.re3data.org
File format considerations
There is no clear-cut definition of “sustainable file format”. Each archive has its own expertise, related to its designated community. Examples:
http://dans.knaw.nl/en/deposit/information-about-depositing-data?set_language=en
http://researchdata.4tu.nl/en/publishing-research/data-description-and-formats/
Columns: 4TU.ResearchData (Level 1 | Level 2 or 3) and DANS (Preferred | Accepted)
audio: .wav | .ra, .mp3, .wma | .wav, .flac | .aiff, .mp3, .aac
chemistry: NMR, ChemDoodle, … | .pdb, .xyz
databases: delimited flat file w/DDL | .mdb, .dbf, .acdb | .sql, .siard, .csv | .mdb, .dbf, .hdf5 …
video: .mp1, .mp2, .mp4, .mov … | .mpg2, .mpg4, .avi, .mov | .mkv
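A format list like this can be turned into a quick pre-deposit check. The sketch below copies only the DANS cells of the audio and databases rows, the two rows that have all four cells in the table. The function name `dans_status` is an assumption, and the lists are examples from the slide, not a complete or current archive policy.

```python
from pathlib import Path

# DANS "Preferred" and "Accepted" cells for the two complete rows of the table.
# The databases "accepted" list ends with "…" on the slide, so it is incomplete.
DANS_FORMATS = {
    "audio": {
        "preferred": {".wav", ".flac"},
        "accepted": {".aiff", ".mp3", ".aac"},
    },
    "databases": {
        "preferred": {".sql", ".siard", ".csv"},
        "accepted": {".mdb", ".dbf", ".hdf5"},
    },
}


def dans_status(category: str, filename: str) -> str:
    """Return 'preferred', 'accepted' or 'not listed' for a file name."""
    extension = Path(filename).suffix.lower()
    for status, extensions in DANS_FORMATS.get(category, {}).items():
        if extension in extensions:
            return status
    return "not listed"


print(dans_status("audio", "interview.WAV"))
print(dans_status("databases", "survey.mdb"))
```

A result of "not listed" only means the extension is absent from this excerpt, so check the archive's own page before converting files.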
Interoperability
A440, which has a frequency of 440 Hz, is the
musical note A above middle C and serves as a
general tuning standard for musical pitch. Prior
to the standardization on 440 Hz, many countries
and organizations followed the Austrian
government's 1885 recommendation of 435 Hz. In
the period instrument movement, a consensus has
arisen around a modern baroque pitch of 415 Hz (A♭ of A440),
baroque for some special church
music (Chorton pitch) at 466 Hz (A♯ of A440), and
classical pitch at 430 Hz.
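The 415 Hz and 466 Hz figures are one semitone below and above A440. A minimal sketch, assuming 12-tone equal temperament (the text does not name a tuning system), where each semitone multiplies the frequency by 2^(1/12):

```python
def equal_tempered(reference_hz: float, semitones: int) -> float:
    """Frequency `semitones` steps from `reference_hz` in 12-tone equal temperament."""
    return reference_hz * 2 ** (semitones / 12)


a_flat = equal_tempered(440.0, -1)   # one semitone below A440
a_sharp = equal_tempered(440.0, +1)  # one semitone above A440

print(round(a_flat), round(a_sharp))  # 415 466
```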
In the aftermath of the French Revolution (1789),
the traditional units of measure used in the
Ancien Régime were replaced. The livre monetary
unit was replaced by the decimal franc, and a new
unit of length was introduced which became known
as the metre. The metre gained adoption in
continental Europe during the mid nineteenth
century, particularly in scientific usage, and was
officially established as an international
measurement unit by the Metre Convention of 1875.
Before clocks were invented, people kept time using different instruments to observe the Sun’s zenith at noon. Towns and cities set clocks based on sunsets and sunrises. Time calculation became a serious problem for people travelling by train, sometimes hundreds of miles in a day. UTC is the World's Time Standard.
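The same problem shows up in research data when timestamps come from different local clocks. The sketch below uses two hypothetical stations with fixed UTC offsets (an assumption, and daylight-saving rules are ignored) and shows that converting to UTC makes the times comparable.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets for two stations; real local times also follow
# daylight-saving rules, which this sketch ignores.
STATION_A = timezone(timedelta(hours=0))
STATION_B = timezone(timedelta(hours=1))

departure = datetime(2016, 12, 1, 9, 0, tzinfo=STATION_A)  # 09:00 on A's clock
arrival = datetime(2016, 12, 1, 14, 0, tzinfo=STATION_B)   # 14:00 on B's clock


def to_utc_iso(moment: datetime) -> str:
    """Express an offset-aware timestamp in UTC, formatted as ISO 8601."""
    return moment.astimezone(timezone.utc).isoformat()


travel_time = arrival - departure  # the offsets are taken into account

print(to_utc_iso(departure), to_utc_iso(arrival), travel_time)
```

Reading the two station clocks naively suggests a five-hour journey, while the UTC values show it took four.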
Medical classification is the process of transforming descriptions of medical diagnoses and procedures into universal medical code numbers. SNOMED Clinical Terms (SNOMED CT) is intended to provide a set of concepts and relationships that offers a common reference point for comparison and aggregation of data about the health care process. SNOMED CT is designed to be managed by computer.
Some I questions
2.3 Making data interoperable
• Specify what data and metadata vocabularies, standards or methodologies you will follow to facilitate interoperability.
• Will you use a standard vocabulary to allow inter-disciplinary interoperability, or provide a mapping from your vocabulary to more commonly used ontologies?
Some R questions
2.4 Increase data re-use (through clarifying licences)
• License the data to permit the widest reuse possible
• Specify a data embargo, if this is needed
• How long will the data remain reusable?
• Describe data quality assurance processes
Re-use over time
Licensing research data and software
The EUDAT licensing wizard helps you pick a licence for data & software: http://ufal.github.io/public-license-selector/
You should also license Open Access data, or waive rights.
Horizon 2020 Open Access guidelines point to: [two licence logos, not preserved in this transcript]
Keep everything? For always?
When regenerating data is cheaper than archiving, don’t archive. Select what data you’ll need and want to retain.
10 years is often stated in data policies and academic codes, but data can be valuable for ages, in climatology, sociology, health sciences, astronomy, linguistics, … Look beyond minimal retention periods where relevant.
“The lifetime of software is generally not as long as that of data” (Daniel Katz et al., http://bit.ly/2eScCKp)
RDNL Selection criteria: http://www.researchdata.nl/en/services/data-management/selecting-research-data/
DCC How-to guide: http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
§3 Allocation of resources
• What are the costs for making data FAIR in your project?
• Resources for long term preservation
Check the UK Data Service Costing model.
Rule of thumb: 5% of the project budget is spent on RDM. The High Level Expert Group on the European Open Science Cloud recommends that “well budgeted data stewardship plans should be made mandatory and we expect that on average about 5% of research expenditure should be spent on properly managing and stewarding data”.
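As a first estimate, the rule of thumb can be applied directly to a budget. This is a minimal sketch: the function name `rdm_budget` and the example budget are illustrative assumptions, and 5% is an average, not a requirement for any single project.

```python
def rdm_budget(project_budget: float, share: float = 0.05) -> float:
    """Estimate the research data management budget as a share of the total.

    The default of 5% is the rule of thumb quoted above.
    """
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return project_budget * share


# Hypothetical project with a total budget of 2,000,000 euro.
estimate = rdm_budget(2_000_000)
print(f"Estimated RDM budget: {estimate:,.0f} euro")
```

A costing tool such as the UK Data Service model can then refine the estimate activity by activity.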
UKDS model: http://www.data-archive.ac.uk/create-manage/planning-for-sharing/costing
HLEG report: http://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf#view=fit&pagemode=none p. 19
§4-6
Data security
• Provisions for data recovery, secure storage, transfer of sensitive data?
• Safely stored in certified repositories for long term preservation and curation?
Ethical aspects
• Any ethical or legal issues that can impact data sharing?
• Informed consent for data sharing and long term preservation included in questionnaires dealing with personal data?
Other issues
• Which other national/funder/sectorial/departmental procedures for data management do you use (if any)?
Closing remarks
Image “Fishbone” CC BY-NC-ND 2.0 by https://www.flickr.com/photos/mrjnl/
Recommendations
• Think about the desired end result and plan for this.
• Involve all work packages and partners to get a coherent plan.
• “Sharing” means “outside the consortium”.
• Approach the DMP in whatever way best fits your project:
  • The EC template is intended as a service, not an obligation. Read the background information and the guidance, and use it as a checklist.
  • More than one dataset? Describe generically what is possible and dataset-specific what is necessary.
  • Focus effort on datasets you’ll create rather than reuse.
The EC Open Research Data pilot
Key sources of information
• Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020
  http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
• Guidelines on Data Management in Horizon 2020
  http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• Annotated model grant agreement, clause 29.3
  http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/amga/h2020-amga_en.pdf
• New infographic summarising key policy points
  http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf
• Open Access and Data Management
  http://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm
OpenAIRE support materials
• Briefing papers, factsheets, webinars, workshops, FAQs
• Information on:
  • Open Research Data Pilot
  • Creating a data management plan
  • Selecting a data repository
  • Personal data
https://www.openaire.eu/opendatapilot
https://www.openaire.eu/support
Thank you!
Acknowledgements: Thanks to Sarah Jones (DCC), OpenAIRE and EUDAT for slides.
http://dans.knaw.nl/