

The Horizon 2020 Open Data Pilot

Sarah Jones

Digital Curation Centre, Glasgow

[email protected]

Twitter: @sjDCC

Fot-Net Data Stakeholder Meeting on Open Data and Data Re-use in Horizon 2020, 10th March 2015, ERTICO, Brussels

Funded by:

What is the Digital Curation Centre?

“a centre of expertise in digital information curation with a focus on building capacity, capability and skills for research data management across the UK's higher education research community”

www.dcc.ac.uk

Benefits and drivers

WHY SHARE DATA (OPENLY)?

Image CC-BY-NC-SA by Wonderwebby www.flickr.com/photos/wonderwebby/2723279491

It’s part of good research practice

Science as an open enterprise

https://royalsociety.org/policy/projects/science-public-enterprise/Report

“Much of the remarkable growth of scientific understanding in recent centuries is due to open practices; open communication and deliberation sit at the heart of scientific practice.”

The Royal Society report calls for ‘intelligent openness’ whereby data are accessible, intelligible, assessable and usable.

Faster scientific breakthroughs

www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0

“It was unbelievable. It’s not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.”

Dr John Trojanowski, University of Pennsylvania

Increased use and economic benefit

The case of NASA Landsat satellite imagery of the Earth’s surface:

UP TO 2008

Sold through the US Geological Survey for US$600 per scene

Sales of 19,000 scenes per year

Annual revenue of $11.4 million

SINCE 2009

Freely available over the internet

Google Earth now uses the images

Transmission of 2,100,000 scenes per year

Estimated to have created value for the environmental management industry of $935 million, with direct benefit of more than $100 million per year to the US economy

Has stimulated the development of applications from a large number of companies worldwide

http://earthobservatory.nasa.gov/IOTD/view.php?id=83394&src=ve
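
The Landsat figures quoted above can be checked with simple arithmetic. The sketch below uses only the numbers on the slide; the variable names are illustrative, and the growth factor is a derived comparison rather than a figure from the source.

```python
# Figures quoted on the slide (paid access up to 2008, free access since 2009).
price_per_scene_usd = 600
scenes_sold_per_year = 19_000
scenes_transmitted_per_year = 2_100_000

# Annual revenue under the paid model: price per scene times scenes sold.
annual_revenue_usd = price_per_scene_usd * scenes_sold_per_year

# How many times more scenes are distributed per year under free access.
usage_growth_factor = scenes_transmitted_per_year / scenes_sold_per_year

print(f"Annual revenue up to 2008: ${annual_revenue_usd:,}")
print(f"Growth in scenes distributed per year: {usage_growth_factor:.1f}x")
```

The revenue works out to the $11.4 million stated on the slide, while distribution grew by roughly two orders of magnitude once the charge was removed.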

HORIZON 2020 OPEN DATA PILOT

Image CC-BY-NC-SA by Tom Magllery www.flickr.com/photos/lwr/13442910354

Why open access and open data?

“The European Commission’s vision is that information already paid for by the public purse should not be paid for again each time it is accessed or used, and that it should benefit European companies and citizens to the full.”

http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf

H2020 open data pilot

• Seven areas are participating in the pilot, which correspond to about €3 billion or 20% of the overall Horizon 2020 budget in 2014 and 2015.

• Projects in other areas can opt in on a voluntary basis

Guidelines on Data Management in Horizon 2020

http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

• Participants can opt out at proposal stage or during the lifetime of the project

• Reasons for exemption to be explained in the DMP

Which data does the pilot apply to?

Data, including associated metadata, needed to validate the results in scientific publications

Other curated and/or raw data, including associated metadata, as specified in the DMP

Doesn’t apply to all data (researchers to define as appropriate)

Don’t have to share data if inappropriate – exemptions apply

Key requirements of the open data pilot

1. Deposit in a research data repository

2. Make it possible for third parties to access, mine, exploit, reproduce and disseminate data – free of charge for any user

3. Provide information on the tools and instruments needed to validate the results (or better still provide the tools)

Image CC-BY-NC-SA by adesigna www.flickr.com/photos/adesigna/4090782772

Data Management Plans

Projects participating in the pilot will be required to develop a Data Management Plan (DMP), in which they will specify what data will be open.

• What types of data will the project generate/collect?

• What standards will be used?

• How will this data be shared/made available? If not, why?

• How will this data be curated and preserved?

Note that the Commission does NOT require applicants to submit a DMP at the proposal stage. DMPs are a deliverable for those participating in the pilot.
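
The four DMP questions above map naturally onto one record per dataset. The following is a minimal sketch in Python, assuming a hypothetical `DatasetPlan` structure; the field names and example values are illustrative only and are not taken from the Commission's guidelines or from any DMP tool.

```python
from dataclasses import dataclass


@dataclass
class DatasetPlan:
    """One dataset entry in a DMP (hypothetical schema, for illustration)."""
    data_types: str         # What types of data will the project generate/collect?
    standards: str          # What standards will be used?
    shared: bool            # Will the data be shared/made available?
    sharing_or_reason: str  # How it will be shared or, if not shared, why not
    preservation: str       # How will the data be curated and preserved?


def unanswered_questions(plan: DatasetPlan) -> list[str]:
    """Return the names of text fields that have been left blank."""
    return [
        name
        for name, value in vars(plan).items()
        if isinstance(value, str) and not value.strip()
    ]


# Hypothetical example: a dataset that is not shared, with the reason recorded.
example = DatasetPlan(
    data_types="Survey responses (CSV)",
    standards="",
    shared=False,
    sharing_or_reason="Contains personal data; consent does not cover release",
    preservation="Deposit in an institutional repository",
)
print(unanswered_questions(example))
```

Keeping the reason for not sharing in the same record as the sharing plan reflects the requirement that exemptions be explained in the DMP itself.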

Good practice, tools, infrastructure & services

SUPPORT FOR IMPLEMENTATION

Data sharing: degrees of openness

Open: content that can be freely used, modified and shared by anyone for any purpose

Five star open data (http://5stardata.info):

1. online under an open licence

2. structured data

3. non-proprietary formats

4. use URIs to denote things

5. link data to provide context

Restricted: limits on who can use the data, how or for what purpose

- Charges for use

- Data sharing agreements

- Restrictive licences

- Peer-to-peer exchange

- …

Closed:

- Unable to share

- Under embargo
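
The five-star scheme listed above is cumulative: each star assumes the lower ones have already been met. A minimal sketch of that logic in Python follows; the `Dataset` class and its field names are assumptions made for illustration and do not come from 5stardata.info.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Properties of a published dataset (hypothetical field names)."""
    online_under_open_licence: bool  # star 1
    structured: bool                 # star 2
    non_proprietary_format: bool     # star 3
    uses_uris: bool                  # star 4
    linked_to_other_data: bool       # star 5


def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first criterion that is not met."""
    criteria = [
        ds.online_under_open_licence,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Hypothetical example: a CSV file published online under an open licence.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))
```

A dataset that is structured but not published under an open licence scores zero in this sketch, since the first star is a precondition for all the others.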

How to make data open?

1. Choose your dataset(s): What can you make open? You may need to revisit this step if you encounter problems later.

2. Apply an open licence: Determine what IP exists. Apply a suitable licence e.g. CC-BY or CC0.

3. Make the data available: Provide the data in a suitable format. Use repositories.

4. Make it discoverable: Post on the web, register in catalogues…

https://okfn.org

www.dcc.ac.uk/resources/how-guides/license-research-data

Data licensing

This DCC how-to guide outlines pros and cons of each approach and gives practical advice on how to implement your licence.

• Do you own the rights or have permission to redistribute?

• Do you need to place restrictions on who can use the data or how?

EUDAT licensing wizard

http://ufal.github.io/lindat-license-selector

Search or browse through a list of possible licences, or answer questions to determine which is most suitable.

Metadata standards

• Good metadata is key for research data access and re-use

• Many disciplines have formalised community metadata standards

• Use relevant standards for interoperability

www.dcc.ac.uk/resources/metadata-standards

Data catalogues

Institutional services e.g. DataFinder at the University of Oxford

National services e.g. Research Data Australia and RDDS pilot in the UK

Data centres and community initiatives e.g. FOT Data Catalogue, B2FIND etc

Joining up data catalogues

Data repositories

http://databib.org

http://service.re3data.org/search

Zenodo

• Joint effort by OpenAIRE-CERN

• Multidisciplinary repository

• Multiple data types

– Publications

– Long tail of research data

• Citable data (DOI)

• Links funding, publications, data & software

www.zenodo.org

• Does your publisher or funder suggest a repository?

• Are there data centres or community databases for your field?

• Does your university offer support for long-term preservation?

EUDAT services

EUDAT offers a pan-European solution, providing a generic set of services to ensure a minimum level of interoperability

Building common data services in close collaboration with 25+ communities

www.eudat.eu

EUDAT B2 service suite

Covering both access and deposit, from informal data sharing to long-term archiving, and addressing identification, discoverability and computability of both long-tail and big data, EUDAT’s services will address the full lifecycle of research data

Institutional RDM support services

Diagram courtesy of Sally Rumsey, University of Oxford

University of Edinburgh Research Data Management Roadmap

www.ed.ac.uk/schools-departments/information-services/about/strategy-planning/rdm-roadmap

Research Data Oxford

http://researchdata.ox.ac.uk

Support on Data Management Plans

• Checklist on what to include

• How-to guide on developing a plan

• Guidance on assessing plans (forthcoming)

• Webinars and training materials

• DMPonline tool

• Example DMPs

www.dcc.ac.uk/resources/data-management-plans

DMPonline

• Presents requirements from funders

• Guidance from funder, uni, discipline…

• Example answers

• Ability to share plans with collaborators

• Export into a variety of formats

• …

https://dmponline.dcc.ac.uk

Thanks for listening

DCC guidance, tools & case studies:

www.dcc.ac.uk/resources

Follow us on Twitter:

@digitalcuration and #ukdcc