
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona


Page 1: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

RDA 9th Plenary Meeting, Barcelona, Spain

Friday 7 April, 14.00 – 16.00

OPEN RESEARCH DATA:

A GAP BETWEEN PRACTICE AND POLICY?

LAUNCH EVENT

Agenda

14.00 – Welcome by Wouter Haak, Elsevier

14.10 – Presentation of the report
Stephane Berghmans, Elsevier
Andrew Plume, Elsevier
Clifford Tatum, CWTS

14.40 – Panel discussion moderated by Jean-Claude Burgelman, European Commission

Panel members
Paolo Budroni, University of Vienna
Helena Cousijn, Elsevier
Mark Hahnel, Figshare
Ignasi Labastida, University of Barcelona
Ingeborg Meijer, CWTS

15.40 – Summary & conclusions by Jean-Claude Burgelman

16.00 – Drinks

Page 2: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Page 3: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Open Data

Collaboration

Reproducibility

Data

Analysis

Transparency

Page 4: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

1. How are researchers actually sharing data?

2. Do researchers themselves actually want to share data and/or reuse shared data?

3. Why might researchers be reticent to share their own data openly?

4. What are the effects of new data-sharing practices and infrastructures on knowledge production processes and outcomes?

Research Questions – the researcher’s perspective?

Page 5: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Complementary methods approach

• Case Studies

• Global Survey

• Quantitative (Bibliometrics)

Page 6: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Page 7: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Insights from bibliometric data: Articles and their citations in data journals

Page 8: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Insights from bibliometric data: Citations to data journals in different fields of science

Page 9: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Insights from bibliometric data: Analysis of acknowledgment sections

• 1.51 million research articles & review articles in 2014

• 0.93 million with funding info

• Search terms: data AND provide OR share

• 29,737 articles (3.2%)
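The filtering step on this slide can be sketched roughly as follows. This is a minimal illustration, not the method used in the report: it assumes the query groups as data AND (provide OR share), that word stems such as "providing" and "shared" count as matches, and that the 3.2% is taken relative to the rounded 0.93 million articles with funding info. The function names are hypothetical.

```python
import re

# Assumption: the query groups as data AND (provide OR share);
# the slide does not show how the terms are grouped.
DATA_TERM = re.compile(r"\bdata\b", re.IGNORECASE)
# Assumption: stems are matched, so "providing" and "shared" count.
SHARING_TERM = re.compile(r"\b(provid\w*|shar\w*)\b", re.IGNORECASE)


def mentions_data_sharing(acknowledgment: str) -> bool:
    """Return True if the text mentions data together with providing or sharing."""
    return bool(DATA_TERM.search(acknowledgment)) and bool(
        SHARING_TERM.search(acknowledgment)
    )


def percentage(matched: int, total: int) -> float:
    """Share of matched articles in percent, rounded to one decimal place."""
    return round(100 * matched / total, 1)


# 29,737 matches out of roughly 0.93 million articles with funding info.
share_with_funding_info = percentage(29_737, 930_000)
```

With the rounded denominator from the slide, `share_with_funding_info` comes out at 3.2, which suggests the percentage is relative to the articles with funding info rather than to all 1.51 million articles.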

Page 10: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Insights from bibliometric data: Key Findings

1. The introduction of data journals is a recent development. Data journals are still a small-scale phenomenon, but their popularity is growing quite rapidly, as shown by strong growth in citations over time.

2. Open data is largely driven by disciplinary culture given the significant differences between scientific fields in the adoption of data journals.

3. The lack of consistency in reporting data sharing in the acknowledgment section of scientific articles highlights a lack of reporting standards.

Page 11: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Insights from large-scale global survey

• How and why are researchers sharing data?

• Why are researchers reticent to share their own data openly?

• What is the role of research data management in data sharing?

• How do researchers perceive reusability?

Page 12: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

A third of respondents do not publish research data

Q: Have you published the research data that you used or created as part of your last research project in any of the following ways?

Page 13: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

The benefits of sharing research data are clear…

Q: To better understand your attitudes towards research data access, please think about the research data that typically is not published (e.g. not summary charts, tables or images), and indicate how much you agree or disagree with the following statements.

Response scale: Strongly agree/Agree; Neither agree nor disagree/Don’t know; Strongly disagree/Disagree

Page 14: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

…but obstacles remain

Q: To better understand your attitudes towards research data access, please think about the research data that typically is not published (e.g. not summary charts, tables or images), and indicate how much you agree or disagree with the following statements.

Response scale: Strongly agree/Agree; Neither agree nor disagree/Don’t know; Strongly disagree/Disagree

Page 15: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Whose data is it anyway?

Q: Who do you believe ‘owns’ the research data that you have made or will make available to others as part of your last research project?

Page 16: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Who is responsible for acting on data management plans?

Q: [Respondents indicated they are mandated to archive their research data and are provided with a research data management plan to follow.] Who is responsible for the execution of this research data management plan? Who is responsible for monitoring compliance with this research data management plan?

Page 17: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Insights from large-scale global survey

Key finding 1

Dissemination of data is primarily contained within the current publishing system, even though one third of the researchers do not publish their data at all.

Key finding 2

Data management requires significant effort, and training and resources are required. Open data mandates from funders or publishers are not perceived as a driving force to improving data management training or planning.

Key finding 3

Research data is perceived as personally owned and decisions on sharing are driven by researchers, not by institutes or funders. It is important to be aware that the concept of open data speaks directly to basic questions of ownership, responsibility, and control.

Key finding 4

Researchers have little awareness of reuse licenses and proper attribution, thereby making it less rewarding to make data reusable.

Page 18: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Insights from case studies

• Open Data generally operationalized as the sharing and reuse of data

• Open Data is not yet very common among scholars (Borgman, 2012)

• recent study (Costas, et al. 2013)

– data repositories as the basis for analyzing data sharing

– scarcity of data available in repositories

– wide variety of policies and associated infrastructures

• often overlooked: data practices in fields with a tradition of data sharing

– would not count as open data in the political sense

– but provides a close look at data practices in research at the grassroots level

– involves reconceptualizing the ‘open’ in open data to include sharing and reuse that occurs in closed contexts

Borgman, Christine L. 2012. “The Conundrum of Sharing Research Data.” JASIST 63 (6): 1059–78. doi:10.1002/asi.22634.

Costas, Rodrigo, Ingeborg Meijer, Zohreh Zahedi, and Paul Wouters. 2013. “The Value of Research Data: Metrics for Datasets from a Cultural and Technical Point of View.” A Knowledge Exchange Report.

Page 19: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Case studies – analytical dimensions

Six dimensions adapted from Leonelli (2013):

1. data situated

2. pragmatics of sharing/reuse

3. incentives for sharing/reuse

4. governance/accountability

5. commodification

6. globalization

Leonelli, Sabina. 2013. “Why the Current Insistence on Open Access to Scientific Data? Big Data, Knowledge Production, and the Political Economy of Contemporary Biology.” Bulletin of Science, Technology & Society 33 (1-2): 6–11.

Page 20: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Case selection

• Soil Mapping

• Human Genetics

• Digital Humanities

➡ 12 interviews (4 per case)

➡ ATLAS.ti coding for dimensions

Actors of interest

• Data producer

• Repository manager

• Data users

• Journal publisher

• Research funder

• Article author

• Metrics researcher

• Software developer

• Database developer

Page 21: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Case selection

Soil Mapping

• international center dedicated to gathering information on world soil

• for decades, outside scientists’ willingness to share their data with the center has meant they have accumulated a variety of data pertaining to soil properties of particular regions

Human Genetics

• research center organized into several co-located biomedical genetics labs

• centralized bioinformatics group provides data processing and analysis expertise to multiple labs in the research center, coordinating their activities with several projects

Digital Humanities

• many digital humanities research projects in the Netherlands are linked through a national level network

• focused on researchers whose work straddles the traditional humanities and computational sciences

Page 22: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Data situated

• Data is quite often described as digital, structured, and in relation to databases. Observations or source materials become data upon deposit in a database, which renders data accessible for sharing and further processing.

So: sharing/reuse is embedded in the concept of data, with a clear database orientation for both data analysis and sharing.

• Soil mapping:

– I would define it as systematized observations … and what I mean by that is enough to know how that data came to be. Otherwise I don’t think you can really use it, you might say, “We have observations,” you don’t really have data… That’s why we speak of a database. It’s got structure. You know what every field is and what it stands for. That’s what I would call data, yes.

Page 23: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Pragmatics of data sharing

• In most cases the data undergoes sequential analyses in a semi-automated bundle of routines referred to as the ‘pipeline’. The database is thus integral to data analysis routines and to sharing among collaborators who participate in different stages of analysis.

Layers of metadata; pipeline

Local reuse; bounded sharing

• Human genetics

– Just, yes, lots and lots of very small, sequential steps to come to an end product […] a list of variants, that is, annotated variants, that’s what this pipeline does.

– we do is that we store all of the variants in a big database […] it will only answer in frequencies, […] you cannot do any queries on the individual level, because asking, […] I could identify a person; but by just asking frequency information, I still don’t know anything except whether or not a variant is rare or frequent in a population.
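The frequency-only access described in the second quote can be illustrated with a small sketch. This is a hypothetical toy, not the research center's actual database: the class name, method names, and variant labels are invented for illustration, and it assumes each individual is represented as a set of variant identifiers.

```python
from collections import Counter
from typing import Iterable


class VariantFrequencyStore:
    """Hypothetical store that answers only with population-level frequencies."""

    def __init__(self) -> None:
        self._counts = Counter()   # variant id -> number of carriers
        self._n_individuals = 0    # size of the population seen so far

    def add_individual(self, variants: Iterable[str]) -> None:
        # Only aggregate counts are kept; the per-person variant list is
        # discarded, so no query can return an individual record.
        self._counts.update(set(variants))
        self._n_individuals += 1

    def frequency(self, variant: str) -> float:
        """Fraction of individuals carrying the variant (0.0 if the store is empty)."""
        if self._n_individuals == 0:
            return 0.0
        return self._counts[variant] / self._n_individuals


store = VariantFrequencyStore()
store.add_individual({"variant_a", "variant_b"})
store.add_individual({"variant_a"})
rare_or_frequent = store.frequency("variant_b")
```

The design choice mirrored here is that the interface exposes a single aggregate query and nothing else. The sketch does not address re-identification risks that can remain in aggregate data, for example in very small populations.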

Page 24: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Incentives for sharing/reuse

• The common themes are resisting openness and bounded sharing, characterized by asymmetrical incentives, collaborative modes of sharing, and evolving practices associated with new forms of collaboration.

Tensions in the distribution of labor and publications.

While sharing data is valued, the career benefits in doing so are uncertain.

• Human genetics

– Everyone always thinks it’s a good idea, but when you say “Okay, now, come send your data, we’ll put it in this database.” Then people always have concerns... you always end up with long, long discussions why they can’t share it.

• Digital humanities

– There is a natural selection to the kind of students we get in literary studies. Occasionally, there are students, I’ve got one of them now who says, “I want to do maths. I want to go to mathematical studies as well and learn statistics, so that I can do this kind of research.” That’s great.

Page 25: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Governance and accountability

• Publisher mandates matter, funder mandates don't

– Human genetics: “funding agencies, they now start to impose this, but they do not control whether you’ve really... they do not check whether you’ve done it, right? So, there is still not a penalty for this.”

– Soil mapping: “From the perspective of their own accountability as a center, a lack of consistent data citing practices means that accrediting committees are unable to evaluate the number of times the center’s data has been reused in publications”.

• Security of data & privacy is leading

– Human genetics: sharing of genetic data must comply with strict privacy measures.

• Cross-disciplinary practices

– Digital humanities: .. the transfer of practices between disciplines and the utilization of resources common with collaborators rather than following typical repository-oriented resources associated with the broader open data movement.

• Training related to open data was generally understood as beneficial and/or desired, but largely missing.

Page 26: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Globalization

• Negotiating terms of exchange

• Privacy and security:

– Soil mapping: strict privacy laws that prevent inclusion of geographical coordinate points (France) and restrictions on the scale of data that can be shown (China)

• Financial

– Soil mapping: diverging expectations over whether monetary exchange should occur (Netherlands and United Kingdom)

• Bureaucracy

– Soil mapping: bureaucratic practices that prolong and may prevent access to data (India)

• Cultural objects

– Digital humanities: I was just in Japan which has a completely different idea about for instance museums as treasure holds. They protect the treasures of culture and they would never consider opening that up just freely for the public... There's one university library in Nagasaki that is digitizing their own photo albums, but that was mainly it.

Page 27: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Commodification

• licensing and commercialization

– commercial funding

– commercialization of tools

– societal relevance, though often commercialization is still a ‘bad word’

• Digital humanities

– “A small company, a consultancy company who works on projects for publishers wanted to know if they could use our corpus, because they were trying to predict a best-seller. So, now we’re working on a new project in which we try to develop a scouting tool for publishers.”

Page 28: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Key findings, Case studies (1)

1. Consider open data as a situated activity

o All three cases reveal ways in which the pragmatics of data sharing and reuse are embedded both in conceptions of data and in normal data processing work.

o Observations or source materials become data upon deposit in a database, which renders data accessible for sharing and further processing.

Reflection on survey: Note that ‘data’ in the survey is primarily defined as observations/results/source materials, rather than in relation to databases.

2. Freeing-up data for reuse and sharing is hindered by national and regional differences with respect to data privacy and licensing.

o The case study material illustrates potential globalization challenges regarding ‘late stage’ data sharing and reuse practices.

o Friction from national differences was evident, including cultural, bureaucratic, and financial assumptions.

Reflection on survey: privacy issues, proprietary aspects, and ethics seem to be a common barrier.

Page 29: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Key findings, Case studies (2)

3. Data is only integrally configured for sharing and reuse in collaborative research projects, where incentives for sharing are embedded in the research design itself.

Reflection on survey: Collaborative research can be used as a driver for data sharing, also in non-data-intensive research fields.

4. Training related to open data was generally understood as beneficial and/or desired, but largely missing.

Reflection on survey: Training on open data handling is a big issue as well, as is the question of who should be responsible for it. The researcher?

Implication: The key findings raise questions about the efficacy of policy that prescribes open data practices as an activity apart from situated contexts.

Page 30: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Page 31: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Intensive data-sharing

Restricted data-sharing

Open Data Scenarios

Page 32: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Challenges Opportunities

Page 33: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Suggested questions for the panel

When and why should a researcher choose to publish data in data journals? Is it, for example, dependent on or independent of other publications?

How would you address the tension of researchers who want to share but are afraid of losing control over their data?

How can you make researchers see the benefits of Open Data before they see the problems?

How would you (re)formulate open data policy to enable bottom-up implementation?

What will be the tipping point(s) for Open Data?

What are concrete implementation steps of Open Data for the researchers, for institutions and for funders?

Page 34: Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona

Project Team

Stephane Berghmans
Helena Cousijn
Gemma Deakin
Ingeborg Meijer
Adrian Mulligan
Andrew Plume
Alex Rushforth
Sarah de Rijcke
Clifford Tatum
Stacey Tobin
Thed van Leeuwen
Ludo Waltman

Thank You