RDA 9th Plenary Meeting, Barcelona, Spain
Friday 7 April, 14.00 – 16.00
OPEN RESEARCH DATA:
A GAP BETWEEN PRACTICE AND POLICY?
LAUNCH EVENT
Agenda
14.00 – Welcome by Wouter Haak, Elsevier
14.10 – Presentation of the report: Stephane Berghmans, Elsevier; Andrew Plume, Elsevier; Clifford Tatum, CWTS
14.40 – Panel discussion moderated by Jean-Claude Burgelman, European Commission
Panel members: Paolo Budroni, University of Vienna; Helena Cousijn, Elsevier; Mark Hahnel, Figshare; Ignasi Labastida, University of Barcelona; Ingeborg Meijer, CWTS
15.40 – Summary & conclusions by Jean-Claude Burgelman
16.00 – Drinks
Open Data · Collaboration · Reproducibility · Data · Analysis · Transparency
Research Questions – the researcher’s perspective?
1. How are researchers actually sharing data?
2. Do researchers themselves actually want to share data and/or reuse shared data?
3. Why might researchers be reticent to share their own data openly?
4. What are the effects of new data-sharing practices and infrastructures on knowledge production processes and outcomes?
Complementary methods approach
• Case Studies
• Global Survey
• Quantitative (Bibliometrics)
Insights from bibliometric data: Articles and their citations in data journals
Insights from bibliometric data: Citations to data journals in different fields of science
Insights from bibliometric data: Analysis of acknowledgment sections
• 1.51 million research articles & review articles in 2014
• 0.93 million with funding info
• Search terms in acknowledgments: “data” AND “provide” OR “share”
• 29,737 articles (3.2%)
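As a rough illustration of the acknowledgment search on this slide, the sketch below flags texts that contain "data" together with a form of "provide" or "share", and then recomputes the 3.2% share from the slide's figures. The function name, the prefix matching, and the reading of the query as data AND (provide OR share) are assumptions for illustration, not the report's actual method.

```python
import re


def mentions_data_sharing(acknowledgment: str) -> bool:
    """Return True if the text contains 'data' plus a form of 'provide' or 'share'.

    Assumed grouping of the slide's query: data AND (provide OR share).
    Prefix matching is crude and will also match unrelated words such as 'sharp'.
    """
    words = set(re.findall(r"[a-z]+", acknowledgment.lower()))
    has_data = "data" in words
    has_verb = any(w.startswith(("provid", "shar")) for w in words)
    return has_data and has_verb


# Figures taken from the slide; 0.93 million is used as rounded there.
matched_articles = 29_737
articles_with_funding_info = 930_000
share_pct = round(100 * matched_articles / articles_with_funding_info, 1)
```

With these numbers, the 3.2% figure appears to be relative to the 0.93 million articles with funding information rather than to all 1.51 million articles.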
Insights from bibliometric data: Key Findings
1. The introduction of data journals is a recent development. Data journals are still a small-scale phenomenon, but their popularity is growing quite rapidly and it is detectable in strong growth of citations over time.
2. Open data is largely driven by disciplinary culture given the significant differences between scientific fields in the adoption of data journals.
3. The lack of consistency in reporting data sharing in the acknowledgment section of scientific articles highlights a lack of reporting standards.
Insights from large-scale global survey
• How and why are researchers sharing data?
• Why are researchers reticent to share their own data openly?
• What is the role of research data management in data sharing?
• How do researchers perceive reusability?
A third of respondents do not publish research data
Q: Have you published the research data that you used or created as part of your last research project in any of the following ways?
The benefits of sharing research data are clear…
Q: To better understand your attitudes towards research data access, please think about the research data that typically is not published (e.g. not summary charts, tables or images), and indicate how much you agree or disagree with the following statements.
Response scale: Strongly agree/Agree · Neither agree nor disagree/Don’t know · Strongly disagree/Disagree
…but obstacles remain
Q: To better understand your attitudes towards research data access, please think about the research data that typically is not published (e.g. not summary charts, tables or images), and indicate how much you agree or disagree with the following statements.
Response scale: Strongly agree/Agree · Neither agree nor disagree/Don’t know · Strongly disagree/Disagree
Whose data is it anyway?
Q: Who do you believe ‘owns’ the research data that you have made or will make available to others as part of your last research project?
Who is responsible for acting on data management plans?
Q: [Respondents indicated they are mandated to archive their research data and are provided with a research data management plan to follow.] Who is responsible for the execution of this research data management plan? Who is responsible for monitoring compliance with this research data management plan?
Insights from large-scale global survey
Key finding 1
Dissemination of data is primarily contained within the current publishing system, even though one third of the researchers do not publish their data at all.
Key finding 2
Data management requires significant effort, and training and resources are required. Open data mandates from funders or publishers are not perceived as a driving force to improving data management training or planning.
Key finding 3
Research data is perceived as personally owned and decisions on sharing are driven by researchers, not by institutes or funders. It is important to be aware that the concept of open data speaks directly to basic questions of ownership, responsibility, and control.
Key finding 4
Researchers have little awareness of reuse licenses and proper attribution, thereby making it less rewarding to make data reusable.
Insights from case studies
• Open Data generally operationalized as the sharing and reuse of data
• Open Data is not yet very common among scholars (Borgman, 2012)
• Recent study (Costas et al. 2013)
– data repositories as the basis for analyzing data sharing
– scarcity of data available in repositories
– wide variety of policies and associated infrastructures
• Often overlooked: data practices in fields with a tradition of data sharing
– would not count as open data in the political sense
– but provides a close look at data practices in research at the grassroots level
– involves reconceptualizing the ‘open’ in open data to include sharing and reuse that occurs in closed contexts
Borgman, Christine L. 2012. “The Conundrum of Sharing Research Data.” JASIST 63 (6): 1059–78. doi:10.1002/asi.22634.
Costas, Rodrigo, Ingeborg Meijer, Zohreh Zahedi, and Paul Wouters. 2013. “The Value of Research Data: Metrics for Datasets from a Cultural and Technical Point of View.” A Knowledge Exchange Report.
Case studies – analytical dimensions
Six dimensions adapted from Leonelli (2013):
1. data situated
2. pragmatics of sharing/reuse
3. incentives for sharing/reuse
4. governance/accountability
5. commodification
6. globalization
Leonelli, Sabina. 2013. “Why the Current Insistence on Open Access to Scientific Data? Big Data, Knowledge Production,
and the Political Economy of Contemporary Biology.” Bulletin of Science, Technology & Society 33 (1-2): 6–11.
Case selection
• Soil Mapping
• Human Genetics
• Digital Humanities
➡ 12 interviews (4 per case)
➡ ATLAS.ti coding for dimensions
Actors of interest
• Data producer
• Repository manager
• Data users
• Journal publisher
• Research funder
• Article author
• Metrics researcher
• Software developer
• Database developer
Case selection
Soil Mapping
• international center dedicated to gathering information on world soil
• for decades, outside scientists’ willingness to share their data with the center has meant that it has accumulated a variety of data pertaining to soil properties of particular regions
Human Genetics
• research center organized into several co-located biomedical genetics labs
• centralized bioinformatics group provides data processing and analysis expertise to multiple labs in the research center, coordinating their activities with several projects
Digital Humanities
• many digital humanities research projects in the Netherlands are linked through a national-level network
• focused on researchers whose work straddles the traditional humanities and computational sciences
Data situated
• Data is quite often described as digital, structured, and in relation
to databases. Observations or source materials become data upon
deposit in a database, which renders data as accessible for sharing
and further processing
So: sharing/reuse embedded in the concept of data
clear database orientation for both data analysis and sharing
• Soil mapping:
– I would define it as systematized observations … and what I mean by that is enough
to know how that data came to be. Otherwise I don’t think you can really use it, you
might say, “We have observations,” you don’t really have data… That’s why we
speak of a database. It’s got structure. You know what every field is and what it
stands for. That’s what I would call data, yes.
Pragmatics of data sharing
• In most cases the data undergoes sequential analyses in a semi-automated bundle of routines referred to as the ‘pipeline’. The database is thus integral to data analysis routines and to sharing among collaborators who participate in different stages of analysis
Layers of metadata; pipeline
Local reuse; bounded sharing
• Human genetics
– Just, yes, lots and lots of very small, sequential steps to come to an end product […] a list
of variants, that is, annotated variants, that’s what this pipeline does.
– [What] we do is that we store all of the variants in a big database […] it will only answer in
frequencies, […] you cannot do any queries on the individual level, because asking, […] I
could identify a person; but by just asking frequency information, I still don’t know
anything except whether or not a variant is rare or frequent in a population.
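The frequency-only access described in this quote can be sketched as follows. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, the store counts carriers per variant rather than alleles, and nothing here reflects the center's actual database or its privacy safeguards.

```python
class VariantFrequencyStore:
    """Hypothetical store that answers only aggregate frequency queries."""

    def __init__(self):
        # Internal state: which individuals carry which variant.
        # No public method returns individual-level records.
        self._carriers = {}
        self._individuals = set()

    def add(self, individual_id, variants):
        """Register one individual's annotated variants."""
        self._individuals.add(individual_id)
        for variant in variants:
            self._carriers.setdefault(variant, set()).add(individual_id)

    def frequency(self, variant):
        """Return the fraction of stored individuals carrying the variant."""
        if not self._individuals:
            return 0.0
        return len(self._carriers.get(variant, ())) / len(self._individuals)


# Usage: callers learn whether a variant is rare or frequent, nothing more.
store = VariantFrequencyStore()
store.add("person-1", ["var-A", "var-B"])
store.add("person-2", ["var-A"])
freq_a = store.frequency("var-A")
freq_b = store.frequency("var-B")
```

A production system would need further protections, for example against queries on very small cohorts, which this sketch does not attempt.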
Incentives for sharing/reuse
• The common themes are resisting openness and bounded sharing, characterized by asymmetrical incentives, collaborative modes of sharing, and evolving practices associated with new forms of collaboration.
Tensions in the distribution of labor and publications.
While sharing data is valued, the career benefits in doing so are
uncertain.
• Human genetics
– Everyone always thinks it’s a good idea, but when you say “Okay, now, come send your
data, we’ll put it in this database.” Then people always have concerns. […] you always end
up with long, long discussions why they can’t share it.
• Digital humanities
– There is a natural selection to the kind of students we get in literary studies. Occasionally,
there are students, I’ve got one of them now who says, “I want to do maths. I want to go to
mathematical studies as well and learn statistics, so that I can do this kind of research.”
That’s great.
Governance and accountability
• Publisher mandates matter, funder mandates don't
– Human genetics: “funding agencies, they now start to impose this, but they do not control whether
you’ve really... they do not check whether you’ve done it, right? So, there is still not a penalty for this.”
– Soil mapping: “From the perspective of their own accountability as a center, a lack of consistent data
citing practices means that accrediting committees are unable to evaluate the number of times the
center’s data has been reused in publications”.
• Security of data & privacy is leading
– Human genetics: sharing of genetic data must comply with strict privacy measures.
• Cross-disciplinary practices
– Digital humanities: […] the transfer of practices between disciplines and the utilization of resources
common with collaborators rather than following typical repository-oriented resources associated with
the broader open data movement.
• Training related to open data was generally understood as beneficial and/or
desired, but largely missing.
Globalization
• Negotiating terms of exchange
• Privacy and security:
– Soil mapping: strict privacy laws that prevent inclusion of geographical coordinate points
(France) and restrictions on the scale of data that can be shown (China)
• Financial
– Soil mapping: diverging expectations over whether monetary exchange should occur
(Netherlands and United Kingdom)
• Bureaucracy
– Soil mapping: bureaucratic practices that prolong and may prevent access to data (India)
• Cultural objects
– Digital humanities: I was just in Japan which has a completely different idea about for
instance museums as treasure holds. They protect the treasures of culture and they would
never consider opening that up just freely for the public... There's one university library in
Nagasaki that is digitizing their own photo albums, but that was mainly it.
Commodification
• licensing and commercialization
– commercial funding
– commercialization of tools
– societal relevance, though often commercialization is still a ‘bad word’
• Digital humanities
– “A small company, a consultancy company who works on projects for publishers
wanted to know if they could use our corpus, because they were trying to predict a
best-seller. So, now we’re working on a new project in which we try to develop a
scouting tool for publishers.”
Key findings, Case studies (1)
1. Consider open data as a situated activity
o All three cases reveal ways in which the pragmatics of data sharing and reuse are
embedded both in conceptions of data and in normal data processing work.
o Observations or source materials become data upon deposit in a database, which renders
data accessible for sharing and further processing.
Reflection on survey: Note that ‘data’ in the survey is primarily defined as
observations/results/source materials, rather than in relation to databases
2. Freeing up data for reuse and sharing is hindered by national and
regional differences with respect to data privacy and licensing.
o The case study material illustrates potential globalization challenges regarding ‘late stage’
data sharing and reuse practices.
o Friction from national differences was evident, including cultural, bureaucratic, and
financial assumptions.
Reflection on survey: privacy issues, proprietary aspects, and ethics seem a common
barrier
Key findings, Case studies (2)
3. Data is only integrally configured for sharing and reuse in
collaborative research projects, where incentives for sharing are
embedded in the research design itself.
Reflection on survey: Collaborative research can be used as a driver
for data sharing in non-data-intensive research fields as well
4. Training related to open data was generally understood as beneficial
and/or desired, but largely missing.
Reflection on survey: Training on open data handling is also a big issue,
as is the question of who should be responsible for it. The
researcher?
Implication: The key findings raise questions about the efficacy of policy
that prescribes open data practices as an activity apart from situated
contexts.
Open Data Scenarios
[Figure labels: Intensive data-sharing; Restricted data-sharing; Challenges; Opportunities]
Suggested questions for the panel
• When and why should a researcher choose to publish data in data journals? Is it, for example, dependent on or independent of other publications?
• How would you address the tension of researchers wanting to share but being afraid of losing control over their data?
• How can you make researchers see the benefits of Open Data before they see the problems?
• How would you (re)formulate open data policy to enable bottom-up implementation?
• What will be the tipping point(s) for Open Data?
• What are concrete implementation steps of Open Data for researchers, for institutions, and for funders?
Project Team
Stephane Berghmans, Helena Cousijn, Gemma Deakin, Ingeborg Meijer, Adrian Mulligan, Andrew Plume, Alex Rushforth, Sarah de Rijcke, Clifford Tatum, Stacey Tobin, Thed van Leeuwen, Ludo Waltman
Thank You