Realising the value of open data: some disciplinary perspectives


DESCRIPTION

Presentation for the CIRCE workshop on ISS data preservation and use. Presents findings from the RECODE project on the value of making data open from the perspective of different research disciplines.



Susan Reilly, LIBER Projects Manager, susan.reilly@kb.nl

@skreilly

Overview

• Introduction: Policy RECommendations for Open access to research Data in Europe (RECODE)

• The open research data agenda

• Case studies: drivers and barriers

• The way forward

Project RECODE

The project will leverage existing networks, communities and projects to address challenges within the open access, data dissemination and preservation sector, and will produce policy recommendations for open access to research data based on existing good practice.

Project RECODE Objectives

• Reduce stakeholder fragmentation
• Identify stakeholder values and interrelationships
• Identify gaps, tensions and good practices
• Produce a set of guidelines for the sharing of scientific data
• Engage stakeholders
• Use 5 case studies from different disciplines


Clear benefits of open data


But if we really want researchers to open their data, maybe we should move from the general to the specific.

Because there are barriers too…

• Cultural differences

• Definition of research data

• Lack of skills/education

• Poorly defined roles and responsibilities

• Lack of infrastructure

• Lack of career incentives

5 case studies

• Particle physics

• Clinical science

• Human physiology

• Environmental science

• Archeology and related disciplines

Particle Physics

• Practice
  – Large-scale collaboration
  – Numerical data, complex analysis software and hardware
  – Long time scales
  – Grid analysis

• Motivation
  – Access for comparison, error testing, less duplication of effort

Particle physics

• Barriers
  – Size of data
  – Relevance
  – Cost of openness
  – Complexity
  – Need for context (metadata)
  – Culture of collaboration + competition

Health Science

• Practices
  – Interdisciplinary
  – Different data types and sources
  – Many stakeholders (commercial, government, practice)

• Motivations
  – Faster advancement, more reliable results, access to negative results, less duplication, understanding the genome

Health Science

• Barriers
  – Anonymisation
  – Commercial interests (competition)
  – Variety of formats
  – Quality metadata

Archeology

• Practice
  – Highly individual, fieldwork
  – Many data formats
  – Lack of standardisation in language, terminology and measurement

• Motivations
  – Not replicable, cumulative knowledge, creating narrative

Archeology

• Barriers
  – Legacy data
  – Not digital
  – Context is key: metadata, interoperability
  – Unclear research parameters
  – Specific skill sets needed (e.g. coding)
  – Cost

How do we define open access to research data?

• We can define ‘open access’ (see the Berlin Declaration): a license to copy, use, distribute and display material, subject to proper attribution of authorship, with the material deposited in an appropriate standard format in an online repository that enables unrestricted distribution, interoperability and long-term archiving.

• But how do we define research data? Is it the data underlying publications, or all experimental data? Each discipline needs to define which data should be made open.

The entire data lifecycle must be addressed

• Open access to data extends across the whole lifecycle of knowledge production, covering ethical concerns about data collection, the characteristics of data collection, data analysis, data management, access to findings and the status of findings.

• Although some developments are shared across research practices, each discipline adapts them to its own context.

Stakeholder fragmentation

• What is the real cost of open data?
• Stakeholders include universities, publishers, public and private research organisations, software developers, libraries, funding bodies and repositories, within national, regional and global science ecosystems

• High interdependency, but a lack of clarity around roles and responsibilities

Infrastructure & technologies

• Interoperability

• Scalability

• Data quality

• Automatically executable policies


Legal and ethical issues

• Intellectual property: the Database Directive, copyright agreements with publishers; can we (libraries/repositories) change the format of data?

• Data protection: the right to be forgotten

A word on the long tail of research data…

• Data that does not fall within the scope of discipline/government repositories

• https://rd-alliance.org/groups/long-tail-research-data-ig/wiki/objectives-interest-group.html

Thank you from the RECODE partners!
