RESEARCHER DEVELOPMENT PROGRAMME
Research Data Management, March 2017
Bill Worthington, Research and Scholarly Communications Manager, [email protected]
Learning Aims and Outcomes
Aim: to develop your awareness of research data management and what it means to you
By the end of this session you will:
• Know what RDM is
• Have an awareness of the requirements of the funding bodies
• Understand the risks and benefits
• Have an awareness of good practice
• Have an awareness of UH facilities for RDM
• Be able to find relevant guides and the support you need to manage your data effectively
Context What is Research Data Management (RDM)?
• the business of managing and safeguarding your working data
• the selection, sharing, and preservation of the data which supports the verification and reproduction of your work
• (and a professional skill, closely allied to the Open Access agenda, that contributes to the advancement of knowledge for the benefit of all)
Context Origins of RDM
• data loss, controversy, and a new emphasis on return on investment in research
http://bit.ly/2cXgEQV
http://bit.ly/2cchsQX
http://bit.ly/2cXimlj
Context Research Data Management
Research Data Management is a new skill, increasingly required of all professional researchers
UK, EU, US funders expect an increased return on investment from research and research data is seen as an under-exploited resource.
The message from all major funding bodies: look after data well and share it appropriately.
Latest: 28 July 2016, Concordat on Open Research Data launched by HEFCE, RCUK, Universities UK and the Wellcome Trust: http://www.rcuk.ac.uk/media/news/160728/
Context Research Data Management – Really?
All researchers – do they mean me?
If you (or your Supervisor/Principal Investigator) take money from the public purse, then yes, this does mean you!
And even unfunded research generates information, often in electronic form, that would be costly or otherwise damaging to replace if lost or mislaid.
From the Digital Curation Centre: Summary of UK research funders’ expectations for the content of data management and sharing plans
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
So what? Institutional and personal cost of bad RDM
- the cost of re-creating lost data
- reputational and actual financial cost to the University
- reputational and career cost to you
http://research-data-toolkit.herts.ac.uk/2013/06/data-loss-in-the-ddud/
http://www.bbc.co.uk/blogs/researchanddevelopment/2009/12/loss-is-not-where-you-find-it.shtml
http://www.bbc.co.uk/search?q=fined+data+loss&filter=news
http://www.techworld.com/security/uks-13-most-infamous-data-breaches-2016-3604586/2/
So what? Benefits of good RDM
- wider, safer access to your working data
- facilitates collaboration and group working
- it will save you time and effort in the long run
- someone else will do some of it for you
- facilitates bidding in the new open access, open data culture
- your data will attract citations in its own right
- career and reputation enhanced
(RDM needy? Unattributed image, citation lost!)
Activity What is data?
What is your definition of research data?
What data do you have?
(2 minutes each)
What is data? Everything
• Tables/Spreadsheets/Databases
• Code
• Plots
• Transcripts
• Audio-visual
• Images/Photos
but also… desktop documents, notebooks, the backs of envelopes: unstructured, badly organised, often hiding key metadata about other data
What is data? Perspectives and definitions
From the concordat: Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical). These might be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, modelling, interview or other methods, or information derived from existing evidence.
Career - any data that would require time and effort to replicate and might just be useful to return to
Policy - UH UPRs: Personal or Confidential Information (PCI) covers person-identifiable information and other confidential and commercially sensitive information; this includes valuable data such as research data
Personal - your own PCI: http://www.staffnet.herts.ac.uk/documents/ict/ict-systems-training/Managing_Personal_and_Confidential_Information.pdf
http://www.rcuk.ac.uk/documents/documents/concordatonopenresearchdata-pdf/
Data Management Planning Make a plan: a Data Management Plan (DMP)
A DMP is a living document about the stewardship of your working data, and the curation (or disposal) of that which needs to be preserved (or destroyed) at the conclusion of the work
The extent of your plan depends on your context:
• informal, for the benefit of the well organised individual
• structured generic format acceptable to RCUK, EU as a part of funding applications
• externally specified to satisfy rigorous demands, for example, a Standard Operating Procedure (SOP) in clinical work
Data Management Planning DMPonline from the Digital Curation Centre
https://dmponline.dcc.ac.uk
• templates for all major UK funding bodies
• UH template
• DMPs are working, evolving documents
• Professionally compiled output in a variety of formats
http://research-data-toolkit.herts.ac.uk/2012/01/data-management-planning-using-dmponline/
http://research-data-toolkit.herts.ac.uk/wp-content/uploads/2012/01/rtdk-herts-mock-dmp.pdf
http://www.herts.ac.uk/rdm/planning/data-management-plans/uh-dmp-template
Data Management Planning Elements of a Data Management Plan
Introduction and context
Data types, formats, standards, and capture methods
Legal and ethical issues
Short-term storage and data management
Access, data sharing and re-use
Deposit and long-term preservation
Good practice File management strategy
A good file management strategy will pay off at some stage:
• Adopt a file naming practice, and stick to it
• X: > Research > Group name > ProjectName > 20170307-keyworded-filename.docx
• Use dates in filenames – easy to sort and search
• Consider printing to PDF from proprietary formats for future discoverability
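The dated naming pattern in the example path can also be generated automatically, so that every file follows the same convention. This is a minimal Python sketch, assuming the `YYYYMMDD-keyword-keyword.ext` pattern shown on the slide; the helper name `dated_filename` and the example keywords are hypothetical.

```python
import re
from datetime import date


def dated_filename(keywords, ext, on=None):
    """Build a name such as 20170307-keyworded-filename.docx (hypothetical helper)."""
    on = on or date.today()
    # Lower-case each keyword and collapse anything that is not a letter or digit into a hyphen
    slugs = [re.sub(r"[^a-z0-9]+", "-", k.lower()).strip("-") for k in keywords]
    # Put the date first as YYYYMMDD so alphabetical order is also chronological order
    return f"{on:%Y%m%d}-{'-'.join(s for s in slugs if s)}.{ext.lstrip('.')}"


names = [
    dated_filename(["interview", "transcript"], "docx", date(2017, 3, 7)),
    dated_filename(["Survey results"], ".xlsx", date(2016, 11, 21)),
    dated_filename(["interview", "notes"], "docx", date(2017, 1, 15)),
]

for name in sorted(names):
    print(name)
```

Sorting the three names lists them oldest first (20161121…, 20170115…, 20170307…), which is why the date goes at the front rather than the end of the filename.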
Good practice Metadata is key
What contextual details are needed to make this data useful in the future?
• who is in this photograph?
• who was the photographer?
• when was it taken?
• where was it taken?
• why was it taken?
• how was it taken?
Without contextual metadata, the data itself will very quickly become seriously devalued (even useless), and it will never be discoverable or useful to anyone else
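One lightweight way to capture these answers is a small metadata record saved alongside each file. The Python sketch below assumes a JSON "sidecar" record; the field names, the helper `metadata_record` and the example values are all hypothetical, not a UH or funder standard.

```python
import json

# Hypothetical field names, one for each question on the slide
REQUIRED_FIELDS = {
    "subject": "who is in this photograph?",
    "creator": "who was the photographer?",
    "date_taken": "when was it taken?",
    "location": "where was it taken?",
    "purpose": "why was it taken?",
    "method": "how was it taken?",
}


def metadata_record(data_file, **fields):
    """Return a JSON record to save next to data_file (e.g. as data_file + '.json')."""
    missing = [question for name, question in REQUIRED_FIELDS.items() if not fields.get(name)]
    if missing:
        # Refuse to create a record that leaves any of the questions unanswered
        raise ValueError("missing context: " + "; ".join(missing))
    return json.dumps({"file": data_file, **fields}, indent=2)


# Illustrative values only
record = metadata_record(
    "20170307-site-visit.jpg",
    subject="Participants P01 and P02",
    creator="Project research assistant",
    date_taken="2017-03-07",
    location="Field site A",
    purpose="Record equipment layout before the trial",
    method="Smartphone camera, no flash",
)
print(record)
```

Raising an error for unanswered questions is a deliberate choice: it is easier to supply context when the data is created than to reconstruct it years later.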
Good practice Plan to share – ethical approval
There is a pervasive myth that you cannot share sensitive data acquired during your research. This is not true, IF you plan ahead and design your research appropriately.
At the outset:
• seek permission for reuse in research
• get permission to publish anonymised data
• think about how to retain the impact of the data whilst making it anonymous
• if open access is not appropriate consider data deposit in a controlled access environment with a dataset metadata record in an open repository
• include your DMP in the ethical approval process
Practical steps to publish anonymised data, from a postgraduate researcher: https://github.com/peterhcharlton/RRest/files/386203/20160728.Data.Dialogue.pdf (slide 12 onward)
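A common first step towards anonymised text data is replacing participants' names with codes. The Python sketch below illustrates only that step, under stated assumptions: the names are known in advance and appear spelled exactly as listed. It is pseudonymisation, not full anonymisation; nicknames, misspellings and indirect identifiers (job titles, places, rare events) are not handled, and the name-to-code key still re-identifies people. The helper `pseudonymise` and the sample text are hypothetical.

```python
import re


def pseudonymise(texts, names):
    """Replace each listed name with a participant code (P01, P02, ...).

    Returns the redacted texts and the name-to-code key. The key re-identifies
    people, so store it separately and securely (or destroy it); never share it.
    """
    codes = {name: f"P{i:02d}" for i, name in enumerate(names, start=1)}
    # Try longer names first so "Ann Smith" would be matched before "Ann";
    # \b stops a name matching inside a longer word (e.g. "Ann" in "Annual")
    alternatives = sorted(codes, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in alternatives) + r")\b")
    redacted = [pattern.sub(lambda m: codes[m.group(0)], t) for t in texts]
    return redacted, codes


transcripts = [
    "Ann Smith said Raj Patel ran the Annual clinic.",
    "Raj Patel disagreed with Ann Smith.",
]
redacted, code_key = pseudonymise(transcripts, ["Ann Smith", "Raj Patel"])
for line in redacted:
    print(line)
```

The redacted lines read "P01 said P02 ran the Annual clinic." and "P02 disagreed with P01."; the `code_key` dictionary is the part that must never be deposited with the data.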
Activity Where is your data?
What media do you use to store, transport or share your working data?
(2 minutes each)
Common data storage practice, expedient… but risky: USB sticks and unregulated cloud storage
• USB sticks: fragile, easy to lose
• unregulated cloud storage: unfavourable terms & conditions
If you use these for data transport or backup, encrypt PCI and delete it when it is no longer needed
Common data storage practice, expedient… but risky: hard drive death and laptop theft
1 in 20 hard drives fail within 18 months, 1 in 8 within 4 years
http://www.extremetech.com/computing/170748-how-long-do-hard-drives-actually-live-for
7% of laptops are stolen or lost. This rate is higher in education and research.
http://www.intel.com/content/dam/doc/white-paper/enterprise-security-the-billion-dollar-lost-laptop-problem-paper.pdf
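To see why these rates matter for a research group rather than one person, take the figure of 1 in 8 drives failing within 4 years, a hypothetical group of 10 laptops, and the simplifying assumption that drives fail independently:

```latex
P(\text{at least one drive fails within 4 years})
  = 1 - \left(1 - \tfrac{1}{8}\right)^{10}
  = 1 - \left(\tfrac{7}{8}\right)^{10}
  \approx 1 - 0.263
  = 0.737
```

Under these assumptions there is roughly a three-in-four chance that someone in the group loses a drive, and any data held only on it, within four years.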
Good practice One message above all - Use networked storage
• enterprise storage: tiny chance of data loss, > 10000x safer than any local device
• U:drive and X:drive - secure, available anywhere in the world, 24/7, more than enough space for most people
• R:drive – terabyte-scale research group storage also available
• (90 day limit on restoration after accidental user deletion)
Good practice OneDrive within Office 365 now available to all @UH
• Effectively limitless storage via university Office365 account - www.office.com
• No VPN needed, excellent web GUI, supports sharing (needs care)
• Some limitations for use – slower from inside UH, limited protection against user error
http://www.staffnet.herts.ac.uk/documents/ict/ict-systems-training/File_storage_-_staff_guide.pdf
Good practice if in doubt consult the File Storage Guide
http://www.staffnet.herts.ac.uk/documents/ict/ict-systems-training/File_storage_-_staff_guide.pdf
Good practice Why don’t we do it?
• “it is too fiddly, time consuming, or otherwise bothersome to do it right”
• “there isn’t enough space; I can’t work on it at home; my data will disappear into a central system that I have no control over; they will lose it, or give it away”
• “it belongs to me, I can look after it best”
None of this is true (anymore).
Good practice additional tools: Document management
An enterprise document management system is available for project work
• use when a high standard of project or data governance is required
• versioning, file level audited access control, retention and disposal policies
• project based structure designed by UH researchers
• Drag and drop via web GUI, or mounted drive
• Available to all projects for group work
Good practice additional tools: use encryption
When working away from your ‘home’ environment, or when transporting or transferring data off-site, keep your data secure in an encrypted folder
VeraCrypt works on most operating systems. https://veracrypt.codeplex.com/wikipage?title=Downloads
• Cross-platform, open-source encryption that works with Windows, Mac and Linux
• Pack your files into an encrypted volume
• Send by email, shared drive, or cloud storage
• Password access
Good practice Use ExchangeFile for data transfer, delivery to collaborators
• approved alternative to unregulated file sharing systems such as Dropbox, Gdrive, YouSendIt, and MailBigFile
• handles files too big for email
• recorded transfer
• auto-disposal
https://www.exchangefile.herts.ac.uk
Good practice A bit of light relief
https://www.youtube.com/embed/N2zK3sAtr-4
A data management horror story by Karen Hanson, Alisa Surkis and Karen Yacobucci. This is what shouldn't happen when a researcher makes a data sharing request!
Open Data Extend publication to include data
[Diagram: Working Data, Publication, Preservation. Labels: All Data; Paper; Journal (Gold OA); Open Repository (Green OA); UHRA; Datasets; Supporting Data; National/Subject/Archive]
A selection of data, methods, algorithms, results, plots and conclusions is included in research papers. Open Access to these research outputs is now required practice.
There is a corresponding demand for Open Data to support all published work.
RCUK Guidelines > expectations for researchers > … All papers must include … and, if applicable, a statement on how the underlying research materials – such as data, samples or models – can be accessed (since 2013): http://www.rcuk.ac.uk/documents/documents/rcukopenaccesspolicy-pdf/
[Diagram labels: Data Statement; Author’s Manuscripts]
Open Data Long-term preservation: what to keep?
• Everything? The long-term cost of preservation prevents this
• Selection is important. Focus on the data that is required to validate your research
• Attach metadata descriptions and, if appropriate, access mechanisms; use future-proofed file formats
• Deposit as required by your funding body in an appropriate national or international subject based repository
• Otherwise deposit in University of Hertfordshire Research Archive (UHRA) (which also holds the Author Accepted Versions of our research papers)
Open Data Good example: Digital Humanities
https://doi.org/10.5281/zenodo.31026
This data has been prepared and described so as to make it FAIR –
Findable, Accessible, Interoperable, Re-usable
https://www.force11.org/group/fairgroup/fairprinciples
RDM resources and local help
also
http://staffnet.herts.ac.uk/staffnet-tools/how-to-guides/computing-guidelines.htm
http://www.staffnet.herts.ac.uk/staffnet-tools/how-to-guides/file-management-and-storage.htm
http://staffnet.herts.ac.uk/documents/ict/ict-systems-training/Managing_Personal_and_Confidential_Information.pdf
http://www.herts.ac.uk/rdm - compiled from two JISC projects at UH (to be updated in 2017)
http://www.herts.ac.uk/research-data-toolkit - an RDM blog of the conduct of our JISC projects
[email protected] - Research and Scholarly Communications team
http://www.dcc.ac.uk - the Digital Curation Centre, extensive world renowned resource
http://datalib.edina.ac.uk/mantra/ - brilliant online course in RDM
http://www.tubechop.com/watch/421197 - RDM explained by leading advocates
Closing Anecdotes ClimateGate
The ClimateGate controversy, in which the University of East Anglia (and the global effort on climate change) suffered huge reputational damage, started with an incident of data loss (theft) and was exacerbated by a reluctance to publish the underlying data.
http://www.theguardian.com/environment/blog/2012/apr/24/uea-climate-change-email-publicity
http://www.theguardian.com/environment/2010/jul/07/climate-emails-question-answer