Presentation given at the European Research Council workshop on research data management and sharing in Brussels on 18th-19th September 2014. The presentation covers the benefits and drivers for RDM, points to relevant tools and resources and closes with some open questions for discussion.
Managing and sharing data
Sarah Jones
DCC, University of Glasgow
Twitter: @sjDCC
ERC Workshop on Research Data Management and Sharing, 18-19 September 2014, Brussels
Funded by:
European Research Council policy
Commitment to open science from the start:
"it is the firm intention of the ERC Scientific Council to issue specific guidelines for the mandatory deposit in open access repositories of research results – that is, publications, data and primary materials – obtained thanks to ERC grants, as soon as pertinent repositories become operational."
Statement on Open Access, December 2006
Image CC BY-SA 3.0 by Greg Emmerich www.flickr.com/photos/gemmerich/6365692655
Why make data available?
Sharing leads to breakthroughs
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. It's not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
... increases the speed of discovery
Returns for institutions
“If an institution spent A$10 million on data, what would be the return? The answer is: more publications; an increased citation count; more grants; greater profile; and more collaboration.”
Dr Ross Wilkinson, ANDS
www.ariadne.ac.uk/issue72/oar-2013-rpt
Researchers get a citation boost
“Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression.”
Piwowar, H., Day, R. and Fridsma, D. (2007) Sharing detailed research data is associated with increased citation rate. DOI: 10.1371/journal.pone.0000308
But, there are also barriers...
Who owns the data?
• Researchers?
• University?
• Commercial partners?
• Funders?
• …
People are often misinformed about who owns the data. It is particularly hard to determine in international projects or ones with industry.
Restrictions on sharing
• Patentable data
• Commercial sensitivities
• Personal, identifiable data
• Lack of consent
• …
There are legitimate reasons to agree embargo periods, impose conditions, or to share only some of the data. However, these are often given as reasons not to share data at all.
www.dcc.ac.uk/sites/default/files/documents/events/workshops/IHW-2013/UKDA-barriers-to-data-sharing.pdf
And opportunity costs
By Emilio Bruna
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690
For his most recent paper:
1. Double checking the main dataset and reformatting to submit to Dryad: 5 hours
2. Creating complementary file and preparing metadata: 3 hours
3. Submission of these two files and the metadata to Dryad: 45 minutes
4. Preparing a map of the locations: 1 hour
5. Submission of map to Figshare: 15 minutes
6. Cleaning up and documenting the code, uploading it to GitHub: 25 hours
7. Cost of archiving in Dryad: US$90
8. Page Charges: $600
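The items above add up to the totals in the blog post's title (35 hours and $690). A minimal sketch of that arithmetic follows; the dictionary keys are shorthand labels invented for this example, and it assumes the page charges are in US dollars like the Dryad fee.

```python
# Time items in hours, transcribed from items 1-6 of the list above.
time_hours = {
    "check and reformat main dataset for Dryad": 5,
    "complementary file and metadata": 3,
    "submission to Dryad": 45 / 60,           # 45 minutes
    "map of locations": 1,
    "submission of map to Figshare": 15 / 60,  # 15 minutes
    "clean, document and upload code to GitHub": 25,
}

# Direct costs from items 7-8 (assumption: both amounts are US dollars).
direct_costs_usd = {
    "Dryad archiving": 90,
    "page charges": 600,
}

total_hours = sum(time_hours.values())
total_cost_usd = sum(direct_costs_usd.values())

print(f"Total time: {total_hours:g} hours")
print(f"Total direct cost: US${total_cost_usd}")
```

Cleaning up and documenting the code accounts for 25 of the 35 hours, which is why the conclusions that follow stress building good habits early.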
What needs to change?
Conclusions from Emilio Bruna:
• Develop a better system of incentives from the community for archiving data and code
• Teach our students how to do this NOW - it’s much easier if you develop good habits early
• Minimise the actual and opportunity costs
We need to stop telling people “You should” and get better at telling people “Here’s how”
What is involved in data curation
• Data Management Planning
• Data creation
• Annotating / documenting data
• Analysis, use, versioning
• Storage and backup
• Publishing papers and data
• Preparing for deposit
• Archiving and sharing
• Licensing
• Citing
• …
Stages: Plan, Create, Document, Use, Publish, Share
Data Management Plans
Brief plans to determine how data will be created, managed and shared. DMPs usually cover:
1. Description of data to be collected / created
2. Standards and methodologies for data collection & management
3. Any issues or restrictions due to ethics and Intellectual Property
4. Plans for data sharing and access
5. Strategy for long-term preservation
DMPs are often submitted as part of grant applications, but are useful whenever you’re creating data.
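The five elements listed above can be treated as a simple checklist. The sketch below is only an illustration: the class and field names are hypothetical, and it does not reflect the format used by DMPonline or by any funder template. It records a draft plan and reports which sections are still empty.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical container mirroring the five common DMP elements."""
    data_description: str = ""        # 1. data to be collected / created
    standards_and_methods: str = ""   # 2. standards and methodologies
    ethics_and_ip: str = ""           # 3. ethics and Intellectual Property
    sharing_and_access: str = ""      # 4. plans for data sharing and access
    long_term_preservation: str = ""  # 5. long-term preservation strategy


def missing_sections(plan: DataManagementPlan) -> list[str]:
    """Return the names of sections that have not been filled in yet."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Example draft with only two sections written so far (illustrative text).
draft = DataManagementPlan(
    data_description="Survey responses (CSV) and interview audio",
    sharing_and_access="Deposit anonymised data in a repository after publication",
)

print(missing_sections(draft))
```

A check like this could be run whenever a plan is revised, which fits the point that DMPs are useful throughout a project and not only at the application stage.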
Help with DMPs
https://dmponline.dcc.ac.uk
A web-based tool to help researchers write data management plans
www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf
Framework for creating a DMP
A list of common elements explaining why they are important and giving example answers
www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
Example plans
www.dcc.ac.uk/resources/data-management-plans/guidance-examples
Managing and sharing data: a best practice guide
http://data-archive.ac.uk/media/2894/managingsharing.pdf
Training materials
FOSTER project
• Open science training
• Courses across EU
• Portal to OA materials
• Guidance on Horizon 2020
www.fosteropenscience.eu
MANTRA
• Free online training course
• Aimed at PhD students
• Case studies, quizzes etc
• Data handling tutorials
– R
– SPSS
– ArcGIS
– NVivo
http://datalib.edina.ac.uk/mantra
DCC tools catalogue
A catalogue of RDM tools for different audiences. Tools for researchers focus on data handling, managing workflows, citation and impact.
www.dcc.ac.uk/resources/external/tools-services
Tools to help with RDM activities
impactstory.org
owncloud.org
thedata.org
www.datacite.org
dataup.cdlib.org
www.myexperiment.org
www.taverna.org.uk
www.labtrove.org
Categories: Documentation & metadata, Workflow management, Storage & collaboration, Citation & impact
Metadata standards catalogue
Use standards wherever possible for interoperability
www.dcc.ac.uk/resources/metadata-standards
Data repositories
http://databib.org
http://service.re3data.org/search
1. How do you foster open science?
• Make it feasible to comply – provide tools and infrastructure
• Train people early in their careers
• Incentivise openness
• Listen to researchers and learn from their experience about what doesn’t work
• Follow up on any demands made in policies
2. Who is responsible for providing infrastructure and support?
Funders
Discipline
Institution
Third-party services
National provider
Data centres e.g. via NERC
Institutional support for discipline-specific tools e.g. Monash MeRC partnership on tools like OMERO
National brokerage of deals with third-party providers e.g. Jisc Janet deals with Arkivum
And what about co-ordination?
3. Who should pay?
Funding Research Data Management: “A conversation with the funders”
The DCC held a special event on this topic in the UK, but there’s still a long way to go
www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf-special-event-funding-research-data-management
Thanks – any questions?
DCC guidance, tools and case studies: www.dcc.ac.uk/resources
Follow us on Twitter: @digitalcuration and #ukdcc