Research Data Management

Gareth Cole, Data Curation Officer. 14th January 2015

Today’s Session

• Introductions

• Data Storage

• Data Sharing

• Open Access

Introductions

• Who are we?

• Who are you and why are you here?

Why Manage Data?

Short-Term:

• Increase efficiency and save time

• Simplify your life

• Meet funder and institutional requirements

Long-term:

• Preserve your data

• Easier sharing and collaboration

• Raise your visibility and research profile

Roles and Responsibilities in RDM

PGR (postgraduate research) Policy

• On open access to research papers and research data management (RDM): http://hdl.handle.net/10036/4279

RDM:

• Encourage good practice in RDM.

• Compliance with funder policy.

• Data should be made available on open access when legally, commercially and ethically appropriate.

Guidance

• Annual data review with supervisor (checklist): “The lead PGR Supervisor and the PGR student should discuss and review research data management annually.”

• Responsibilities: “Responsibility for ongoing, day-to-day management of their research data lies with PGR students. Where the PGR is part of a project, data management policy will be set and monitored by the Principal Investigator (PI) and the PGR will be expected to comply with project guidelines. The lead PGR Supervisor is responsible for advising the PGR student on good practice in research data management.”

• Online guidance, training workshops and face-to-face advice service.

RCUK Common Principles on Data Policy

The Common Principles are available on the RCUK website:

• Open data

• Accessible data

• Discoverable data

• Legal, ethical and commercial constraints should be taken into account

• Privileged use of the data is allowed

• Data use should be acknowledged

• Public funds can be used to support the management and sharing of publicly funded research data

ESRC Policy

• Data management plan

• Period of privileged use: data must be made available or archived within three months of the end of the award.

• Data sharing/archiving: via the UK Data Service (UKDS). “Where research data are considered confidential or contain sensitive personal data, award holders must seek to secure consent for data sharing or alternatively anonymise the data in order to make sharing possible.”

• Monitoring: the final payment of a grant may be withheld if data have not been offered for deposit to the required standard.

• Costs: the ESRC will review any costs associated with implementing the data plan.

Useful Links

• University of Exeter PGR policy

• Draft supervisor checklist

• RCUK Common Principles on Data Policy

• ESRC research data policy

• DCC summary of funders’ RDM policies

Storing your Data

• Data storage

• Back-up

• File naming

• Versioning

• Data security, encryption and destruction

• Where to find further information

Which is the final version?

Data Storage 1

Where will you be working: at home; in the office; both; fieldwork?

Will you be working collaboratively?

• U: drive: 20 GB allowance

• ExeHub: 1 TB through OneDrive but…

• Cloud storage (not for sensitive or confidential data)

• Computer hard drive

• External hard drives and USB sticks

• Hard copy of documents

Data Storage 2

Remember: file formats and physical storage media become obsolete.

• All digital media are fallible: optical (CD, DVD) and magnetic media (hard drives, tapes) degrade

• Never assume the format will be around forever

Storage strategy best practice:

• At least two storage formats

• Prefer open or standard formats, e.g. OpenDocument Format (ODF), comma-separated values (CSV)

• Some proprietary data formats, such as MS Excel, are likely to remain accessible for a reasonable, but not unlimited, time

• Maintain an original copy, an external local copy and an external remote copy

• Copy data files to new media two to five years after they were first created

• Check the integrity of stored data files regularly (using checksums, e.g. FastSum)
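The regular integrity check can be scripted. The sketch below is a minimal illustration, assuming Python 3 and SHA-256 as the checksum (FastSum is a separate tool and this does not reproduce it): record a checksum for every file once, then re-run the verification after each copy to new media or on a schedule. The function names and the manifest name `checksums.sha256` are hypothetical choices.

```python
import hashlib
from pathlib import Path


def sha256_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="checksums.sha256"):
    """Record a checksum for every file under folder (except the manifest itself)."""
    folder = Path(folder)
    lines = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.name != manifest_name:
            rel = path.relative_to(folder).as_posix()
            lines.append(f"{sha256_checksum(path)}  {rel}")
    (folder / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(folder, manifest_name="checksums.sha256"):
    """Return the relative paths that are missing or whose checksum has changed."""
    folder = Path(folder)
    changed = []
    for line in (folder / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel = line.split("  ", 1)
        target = folder / rel
        if not target.is_file() or sha256_checksum(target) != expected:
            changed.append(rel)
    return changed
```

An empty list from `verify_manifest` means every file still matches the checksum recorded when the manifest was written.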

Non-Digital Storage

Always follow the procedures stated in your ethical approval

Confidential items, e.g. signed consent forms, interview notes

• Store securely, behind a lock

• Separate from data files

Printed materials, photographs:

• Degradation from sunlight and acid (sweat on skin, in paper)

• Use high-quality media for long-term storage/preservation, e.g. acid-free paper and boxes, rust-proof paperclips (no staples)

Why Back-up?

Back-ups are additional copies that can be used to restore originals.

• Protect against: software failure, hardware failure, malicious attack, natural disasters, e.g. the University of Southampton fire

• It’s not backed up unless it’s backed up with a strategy

Backing up need not be expensive:

• 1 TB external drive = around £50

Back-Up Strategy

Know your institutional and personal back-up strategy:

• What’s backed-up? - all, some data?

• Where? - original copy, external local and remote copies

• What media? - CD, DVD, external hard drive, tape, etc.

• How often? – assess frequency and automate the process

• For how long is it kept?

• Verify and recover - never assume, regularly test a restore
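The “verify” step can be partly automated. A minimal sketch, assuming Python 3.8+ and a backup destination that appears as an ordinary folder (e.g. an external drive); `backup_and_verify` is a hypothetical helper, not a University tool. It only checks that the copy matches the original; a full restore test would also copy the backup to a scratch location and open the files.

```python
import filecmp
import shutil
from pathlib import Path


def backup_and_verify(source, backup_root):
    """Copy the folder `source` into `backup_root`, then compare every file.

    Returns the relative paths that are missing from, or differ in, the backup.
    An empty list means the copy matched the original when it was checked.
    """
    source = Path(source)
    target = Path(backup_root) / source.name
    # dirs_exist_ok (Python 3.8+) lets the same backup location be refreshed.
    shutil.copytree(source, target, dirs_exist_ok=True)
    problems = []
    for original in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = original.relative_to(source)
        copy = target / rel
        # shallow=False compares file contents, not just size and timestamps.
        if not copy.is_file() or not filecmp.cmp(original, copy, shallow=False):
            problems.append(rel.as_posix())
    return problems
```

Running this from a scheduler covers the “automate the process” point; note that files deleted from `source` are not removed from the backup.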

Make sure you know which version is the most up to date...

File Naming

File name = principal identifier of file

Easy to: identify, locate, retrieve, access

Provides context e.g.:

• version number e.g. FoodInterview_1.1

• date e.g. HealthTest_2011-04-06

• content description e.g. BGHSurveyProcedures

• creator name e.g. CommsPlanGJC

File Naming: Best Practice

• Brief and relevant

• No special characters, dots or spaces

• For separation use underscores (_) or hyphens (-)

• Name independent of location

• Date: YYYY_MM_DD
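A naming system is easier to keep consistent if the names are generated rather than typed. A minimal sketch in Python, assuming the pattern Description_YYYY_MM_DD[_Creator].ext (one possible system, not a University standard); `clean_part` and `make_filename` are hypothetical helpers. Note that the regular expression keeps only ASCII letters and digits, so accented characters are dropped.

```python
import re
from datetime import date


def clean_part(text: str) -> str:
    """Keep only letters and digits, joining words in CamelCase (no spaces, dots or special characters)."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "".join(w[0].upper() + w[1:] for w in words)


def make_filename(description: str, when: date, creator: str = "", ext: str = "csv") -> str:
    """Build Description_YYYY_MM_DD[_Creator].ext, separated by underscores."""
    parts = [clean_part(description), when.strftime("%Y_%m_%d")]
    if creator:
        parts.append(clean_part(creator))
    return "_".join(parts) + "." + ext.lstrip(".")
```

For example, `make_filename("health test", date(2011, 4, 6))` returns `"HealthTest_2011_04_06.csv"`, and `make_filename("Comms plan!", date(2015, 1, 14), creator="GJC", ext=".docx")` returns `"CommsPlan_2015_01_14_GJC.docx"`.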

Have a System!

• Consistent and logical naming system

• Develop a system with colleagues for shared data

Version Control Tools/Strategies

• Record file status/versions

• Record relationships between files

e.g. data file and documentation; similar data files

• Keep track of file locations

e.g. laptop vs. PC

Version Control: Single User

• File naming; unique file name with date or version number

• Version control table or file history alongside data file
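For a single user, both strategies can be combined: bump a version number in the file name and record each change in a small table kept next to the data. A minimal sketch, assuming version numbers written as `_v01`, `_v02`, … before the extension (this keeps the “no dots” rule, unlike an `_1.1` suffix); the helpers and the table columns are hypothetical.

```python
import csv
import re
from datetime import date
from pathlib import Path

# Matches a stem such as "FoodInterview_v01" (assumed naming convention).
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version_name(filename):
    """FoodInterview_v01.docx -> FoodInterview_v02.docx; adds _v01 if there is no version."""
    path = Path(filename)
    match = VERSION_RE.match(path.stem)
    if match:
        width = len(match["num"])  # keep zero-padding, e.g. 01 -> 02
        num = int(match["num"]) + 1
        stem = f"{match['stem']}_v{num:0{width}d}"
    else:
        stem = f"{path.stem}_v01"
    return stem + path.suffix


def log_version(table, filename, note, when=None):
    """Append one row (file, date, change) to a version-control table kept alongside the data."""
    table = Path(table)
    new_file = not table.exists()
    with open(table, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["file", "date", "change"])
        writer.writerow([filename, (when or date.today()).isoformat(), note])
```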

Version Control: Multiple Users

• Control rights to file editing: read/write permissions

e.g. Microsoft Office

• Versioning/file sharing software:

e.g. Google Drive, Amazon S3

• Merging of multiple entries/edits

Version Control: Multiple Locations

• Synchronise files

e.g. MS SyncToy software, Dropbox

• Use remote desktop

Encryption: Personal or Sensitive Data

Encrypt anything you would not send on a postcard

• for moving files e.g. transcripts

• for storing files e.g. shared areas, mobile devices

Free software that is easy to use:

• 7-Zip – this is what the University recommends:

http://as.exeter.ac.uk/it/infosec/encryptfiles/

Data Destruction

When you delete data and documentation from a hard drive, they are probably not gone:

• Files need to be overwritten to ensure they are irretrievably deleted:
  - BCWipe: uses ‘military-grade procedures to surgically remove all traces of any file’

• If in doubt, physically destroy the drive using an approved secure destruction facility

• Physically destroy portable media, as you would shred paper
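The idea of overwriting before deleting can be illustrated in a few lines of Python. Treat this as a sketch of the principle only, not a replacement for BCWipe or physical destruction. On SSDs and USB sticks (wear levelling), on journaling or copy-on-write file systems, and in synced or backed-up folders, old copies of the data can survive elsewhere. `overwrite_and_delete` is a hypothetical helper.

```python
import os
from pathlib import Path


def overwrite_and_delete(path, passes=1):
    """Overwrite a file's bytes in place with random data, then delete it.

    Illustration only: this does not guarantee removal on flash media,
    journaling/copy-on-write file systems, backups or cloud-synced folders.
    """
    path = Path(path)
    size = path.stat().st_size
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                block = min(remaining, 1 << 20)  # write in 1 MB blocks
                f.write(os.urandom(block))
                remaining -= block
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the overwrite to the disk
    path.unlink()
```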

Data Security

Protect data from unauthorised access, use, change, disclosure and destruction.

Personal data need more protection: always keep them separate.

Control access to computers:

• passwords

• anti-virus and firewall protection, power surge protection

• networked vs non-networked PCs

• all devices: desktops, laptops, memory sticks, mobile devices

• all locations: work, home, travel

• restrict access to sensitive materials, e.g. consent forms, patient records

Proper disposal of equipment (and data):

• even reformatting the hard drive is not sufficient

Control physical access to buildings, rooms, cabinets.

Data Sharing

• Two stages of your project when you may share data:
  - “Live” sharing during the project
  - Making your “completed” data available at the end of your project

Why Should you Share your Data?

• Benefits of sharing “live” data:
  - Increased collaboration opportunities with colleagues
  - Increased exposure of your current work
  - Increased efficiency across research group

• Benefits of sharing “completed” data:
  - Increased citation counts
  - Increased exposure for your work
  - Increased chance of collaboration in the future
  - Allows others to build on your research

• Policy:
  - RCUK Common Principles on Data Policy
  - University Policy

Group Exercise One

• Thinking of the data you have shared:

• What are the pros and cons of the different methods you have used?

• What issues did you face when sharing your data?

• Why haven’t you shared data?

• Feedback to the group.

Sharing your Data – During your Research

With your supervisor; with project colleagues; with external interested parties:

• Cloud storage (Dropbox, Google Drive, OneDrive etc.): not recommended for sensitive or personal data

• Email: issues with large and/or sensitive data; potential version control problems

• USB sticks: easily lost; can transfer viruses

• External hard drives: less suitable if your collaborator is at a different institution

• Websites: lack of permanency; need an internet connection; you may not have access rights to the site

• FTP: not secure; data can be intercepted

• Hard copy documents: one of a kind

How to Share your Data – at the End of your Research

• Archives and repositories:
  - Discipline-specific archives, e.g. Archaeology Data Service, UK Data Service, Wellcome Trust list of data repositories
  - (Inter)national archives, e.g. UKDA

• University repository: Open Research Exeter (ORE)
  - Link your data with your thesis/research papers

• Websites:
  - Link from your University personal webspace to data in a repository
  - Link from academic network sites, e.g. Academia.edu, ResearchGate.net

Issues in Data Sharing

• Ethics and the Data Protection Act

• Copyright and legal issues

• File size

• Metadata

• Discoverability of the data

• Re-use of data

• Documentation of data

• File format – open or proprietary

• What to share

• Quality control and versioning

Ethical and DPA Issues

• Not all data can be shared. You must ensure that you don’t share data you are not allowed to:
  - Abide by your ethical approval
  - Abide by the Data Protection Act

• Are you sharing this data securely?

• Have you got consent to share the data?

• Use cloud storage wisely: not for sensitive data

• Getting consent: advice from the UKDA

• Ethics advice from: College Ethics Officers (Exeter username and password needed)

• DPA advice: recordsmanagement@exeter.ac.uk

Copyright and Legal Issues

• You must abide by any contract you or your project group have signed:

• This may state that you are not allowed to share the data or it may include the conditions of sharing

• You must be aware of who owns the copyright for the data you are sharing:

• You may not be allowed to share it

• You must get permission from the copyright owner before sharing data

• This also applies to data in your thesis

• Advice from JISC Digital Media on using images

File Size

• Large files cannot be emailed

• Some files may not fit onto USB sticks

• How do you know if a file has been received?

• Use the University’s File Drop Box (up to 600MB)

• Large files can take a long time to upload to Cloud Storage

File Format

• Is the file format you are using widely used? If not, can you migrate it to a more widely used format, e.g. .xlsx (Excel) or .pdf?

• Is the format you are using an “open” format or a “proprietary” one? Open formats can be more easily accessed by other researchers, e.g. SPSS files can be saved as .csv files, and Word files can be saved in OpenDocument format (.odt rather than .docx).

• Make sure you don’t lose important information when migrating formats

• See advice from the UKDA


Example: Format Conversion

Converting a spreadsheet from MS Excel format to tab-delimited text format can result in loss of annotation.
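One way to avoid this loss is to move the annotations into a plain-text file that travels with the open-format data. A minimal sketch in Python, assuming the spreadsheet contents have already been read into lists (reading .xlsx directly would need a third-party library such as openpyxl); `export_open_format` and the `_notes.txt` suffix are hypothetical.

```python
import csv
from pathlib import Path


def export_open_format(rows, header, notes, out_stem):
    """Write data to CSV and keep the annotations in a plain-text sidecar file.

    rows: list of row lists; header: column names;
    notes: dict mapping a column name to the annotation it carried in the spreadsheet.
    """
    out_stem = Path(out_stem)
    csv_path = out_stem.with_suffix(".csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    notes_path = out_stem.parent / (out_stem.name + "_notes.txt")
    with open(notes_path, "w", encoding="utf-8") as f:
        for column in header:
            f.write(f"{column}: {notes.get(column, 'no annotation recorded')}\n")
    return csv_path, notes_path
```

Listing every column in the notes file, even those without annotations, makes it obvious when something was not documented.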

Metadata

• Record the metadata as you collect/create your data

• Have you provided information about the data with the data you share? It is needed for discoverability, reuse, reproducibility, verification etc. For example:
  - Author
  - Title
  - Date of creation
  - Publisher
  - Abstract
  - Description of the data

• University web page
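Metadata is easiest to keep when it is written at the same time as the data. A minimal sketch in Python that saves the fields listed above as a JSON file next to the data file and refuses to save an incomplete record. The field names and the `_metadata.json` suffix are hypothetical; they loosely follow the slide and are not a formal schema such as DataCite or Dublin Core.

```python
import json
from pathlib import Path

# Fields taken from the list above (hypothetical key names).
REQUIRED = ("author", "title", "date_created", "publisher", "abstract", "description")


def write_metadata(data_file, **fields):
    """Save a minimal metadata record next to a data file as JSON."""
    missing = [k for k in REQUIRED if not fields.get(k)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    data_file = Path(data_file)
    record = {"file": data_file.name, **fields}
    out = data_file.with_name(data_file.stem + "_metadata.json")
    # default=str lets date objects be stored as ISO strings.
    out.write_text(json.dumps(record, indent=2, default=str), encoding="utf-8")
    return out
```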

Example of Metadata Record in Institutional Repository

Supporting Documentation

• Have you provided enough information for another researcher to be able to understand, retrieve, validate and re-use the data?
  - Where was the data created?
  - How was the data created?
  - What hardware and software were used?
  - What methodologies were used?
  - What assumptions did you make in your experiments?
  - Why are there anomalies in your data?

• Along with the metadata, the documentation should enable the data to be understood and reusable independently of any other publications, data etc.

• Advice from the UKDA

What to Share?

• You don’t need to share all your “live” data; share only data that is helpful and useful to the recipient

• What to archive?
  - Consider policy/legal requirements
  - In collaboration with your supervisor or PI, develop a set of criteria:
    - Only the data supporting your publications?
    - Data that can reproduce your results?
    - Data that can validate your results?
    - How unique or significant is your data?

• University web pages

Data Re-use

• Data citation is becoming more common: get credit for all your research

• If others use your data it can increase your citation rates

• Sharing can mean that your data is re-used in areas you didn’t think it could be, e.g. ships’ logs are being used by climate scientists

• Prof. Tim Naylor on data sharing: ‘I have examples of people who could have simply lifted the data, gone away and done something with it and given me a citation for it; but actually they have come to me and said, “OK, I’ve got this data, which is yours, we’re interested in it, but we need your expertise to interpret it” and then I get a co-authorship out of it as well.’

Open Access to Completed Data

• You will be required to make your data available on Open Access where appropriate

• RCUK Common Principles on Data

• Wellcome Trust Policy Statement

• UoE policies

• Link publications and supporting data

• RCUK requires a statement in published research papers saying where the supporting data can be accessed

• Archives and repositories

Ethical and Legal Issues in Data Sharing

• Ethical arguments for archiving data

• Duty of confidentiality/DPA

• Options for sharing personal/sensitive data

Ethical Arguments For Archiving Data

• Store and protect data securely

• Not burden over-researched, vulnerable groups

• Make best use of hard-to-obtain data (e.g. elites, socially excluded)

• Extend voices of participants

• Provide greater research transparency

• Enable fullest ethical use of rich data

Duty of Confidentiality

• UK: Duty of confidentiality exists in common law

• Public interest can override duty of confidentiality; best practice is to avoid vague or general promises in consent forms

• If participant consents to share data, then sharing does not breach confidentiality

Data Protection Act, 1998

• Personal data:
  - Relate to a living individual
  - The individual can be identified from those data, or from those data and other information
  - Include any expression of opinion about the individual

• Fair processing:
  - Open and transparent
  - Justified and reasonable; not kept longer than necessary
  - Processed in accordance with the rights of data subjects, e.g. the right to be informed about how data will be used, stored, processed, transferred and destroyed, and the right to access information and data held

• Security:
  - Protect against unauthorised access, data loss and damage to data
  - Not transferred abroad without adequate protection

• Only disclosed if consent has been given to do so (except where there is a legal duty)

Sensitive Personal Data

• Sensitive personal data refers to an individual's race or ethnic origin, political opinions, religious beliefs, trade union membership, physical or mental health, sex life, criminal proceedings or convictions.

• It can only be processed for research purposes if:
  - explicit consent (ideally in writing) has been obtained; or
  - it is medical research by a health professional or equivalent with a duty of confidentiality; or
  - it is analysis of racial/ethnic origins for the purpose of equal opportunities monitoring; or
  - it is in the substantial public interest and won’t cause substantial damage and distress.

Sharing Personal Data

In groups discuss:

• Can we ensure the fair processing of personal data if the data is to be shared at the end of a project? How?

• Can we ensure the security of personal data whilst allowing data to be shared at the end of a project? If so, how?

5 minutes. Feedback to whole group.

Options for Sharing Confidential Data

• Obtain informed consent

• Protect identities, e.g. by anonymisation or by not collecting personal data. If data are anonymised (personal identifiers removed), DP laws will not apply, as the data no longer constitute ‘personal data’

• Restrict/regulate access where needed (all or part of data)

• Store personal or sensitive data securely

Informed Consent

• What is “informed” consent?
  - Purpose of the research
  - What is involved in participation
  - Benefits and risks
  - Mechanism of withdrawal
  - Data uses: primary research, storing, processing, re-use, sharing, archiving
  - Strategies to ensure confidentiality of data where this is relevant: anonymisation, access restrictions

• Informed consent for unknown future uses: a great deal of information can be provided:
  - Who can access the data
  - Purposes: research or teaching or both/other
  - Confidentiality protections, undertakings of future users
  - General consent

Consent through the Research Lifecycle

• Plan early in research

• Consider jointly and in dialogue with participants

1. Engagement in the research process, e.g. decide who approves final versions of transcripts

2. Dissemination in presentations, publications, the web: decide who approves research outputs

3. Data sharing and archiving: consider future uses of data

Consent Form

• Meets requirements of Data Protection laws

• Simple

• Avoids excessive warnings

• Complete for all purposes: use, publishing, sharing

UK Data Archive model consent form

When to Ask for Consent

One-off consent

• Pros: simple; least hassle for participant

• Cons: research outputs are not known in advance; participants will not know all the content they will contribute

Process consent

• Pros: most complete for assuring active consent

• Cons: might not get the consent needed before losing contact; repetitive, can annoy participant

Do Participants Consent to Share Data?

• Foot and mouth disease in N. Cumbria (2001-2003)
  - Sensitive community information
  - 40/54 interviews; 42/54 diaries; audio restricted
  - Deposited in UKDS

• Medical research and biobank models
  - Enduring, broad, open consent
  - No time limits; no recontact required
  - Wales Cancer Bank: 99% consent rate for 2500+ patients: samples of tumour, normal tissue and blood, plus anonymised data sets for researchers

Anonymisation

A person’s identity can be disclosed through:

• Direct identifiers, e.g. name, address, postcode, telephone number, voice, picture. These are often not essential research information (they are for administrative use).

• Indirect identifiers, e.g. occupation, geography, unique or exceptional values or characteristics. These can disclose identity in combination with other information.

Key points for Anonymising

• Never disclose personal data unless there is consent for disclosure

• Reasonable/appropriate level of anonymity:
  - Maintain maximum meaningful information for context; do not over-anonymise
  - Where possible, replace rather than remove

• Re-users of data have the same legal and ethical obligation not to disclose confidential information as primary users

Anonymising Quantitative Data

• Remove direct identifiers, e.g. names, address, institution, photo

• Reduce the precision/detail of a variable through aggregation, e.g. birth year vs. date of birth, area rather than village

• Generalise the meaning of a detailed text variable, e.g. occupational expertise

• Restrict the upper and lower ranges of a variable to hide outliers, e.g. income, age

• Combine variables, e.g. a rural/urban variable for place variables
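The steps above can be applied record by record. A minimal sketch in Python, assuming survey records stored as dictionaries with hypothetical field names; the identifier list and the income and age limits are illustrative only, since suitable limits depend on the distribution of your data. Generalising free-text variables (e.g. occupation) usually needs a hand-built coding scheme and is not shown.

```python
# Hypothetical field names for direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "institution", "photo"}


def clamp(value, low, high):
    """Restrict a value to [low, high] (bottom- and top-coding)."""
    return max(low, min(value, high))


def anonymise_record(record, place_type, income_range=(0, 100_000), age_range=(16, 80)):
    """Return a copy of one survey record with simple disclosure controls applied.

    place_type maps a place name to a broad category such as "rural" or "urban".
    """
    # 1. Remove direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # 2. Aggregate: keep the birth year instead of the full date of birth.
    if "date_of_birth" in out:
        out["birth_year"] = out.pop("date_of_birth").year
    # 3. Restrict ranges so extreme values cannot single someone out.
    if "income" in out:
        out["income"] = clamp(out["income"], *income_range)
    if "age" in out:
        out["age"] = clamp(out["age"], *age_range)
    # 4. Combine/generalise: replace a named place with a broad category.
    if "place" in out:
        out["place"] = place_type.get(out["place"], "unknown")
    return out
```

The original record is left unchanged, so the identifiable version can be stored securely and separately from the anonymised one.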

Geo-Referenced Data

• Spatial references (point coordinates, small areas) may disclose the position of individuals, organisations, businesses

• Removing spatial references prevents disclosure, but all geographical and related information is also lost

Possible solutions

• Reduce precision:
  - Replace a point coordinate with a larger, non-disclosing geographical area, e.g. km² area, postcode district, road
  - Replace a point coordinate with a meaningful variable typifying the geographical position, or summary statistics of location, e.g. catchment area, poverty index, population density

• Keep spatial references and impose access restrictions on data
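Reducing precision to a km² area can be done by snapping each point to the grid cell that contains it. A minimal sketch in Python, assuming projected coordinates in metres (e.g. British National Grid eastings and northings). The 1 km cell size and the threshold of 5 are illustrative; dropping sparsely populated cells from the summary counts is a common extra safeguard known as small-cell suppression.

```python
from collections import Counter


def to_grid_square(easting, northing, cell_m=1000):
    """Replace a point (in metres) with the south-west corner of its grid cell."""
    return (int(easting // cell_m) * cell_m, int(northing // cell_m) * cell_m)


def cell_counts(points, cell_m=1000, min_count=5):
    """Count points per grid cell and drop cells with fewer than min_count points."""
    counts = Counter(to_grid_square(e, n, cell_m) for e, n in points)
    return {cell: c for cell, c in counts.items() if c >= min_count}
```

For example, `to_grid_square(292384, 92813)` returns `(292000, 92000)`: the exact location is replaced by the 1 km square containing it.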

Anonymising Qualitative Data

• Don’t collect disclosive data unless necessary.

• Edit at the time of transcription, except in longitudinal studies: anonymise when data collection is complete (to preserve linkages).

• Avoid blanking out; use pseudonyms or replacements.

• Identify replacements, e.g. with [brackets].

• Avoid over-anonymising: removing/aggregating information in text can distort data, making them unusable, unreliable or misleading.

• Consistency within research team and throughout project.

• Keep an anonymisation log of all replacements, aggregations or removals made, and keep it separate from the anonymised data files.

Anonymising Qualitative Data

Example: Anonymisation log for interview transcripts

Interview / Page   Original      Changed to
Int1 p1            Spain         European country
Int1 p1            E-print Ltd   Printing company
Int1 p2            20th June     June
Int1 p2            Amy           Moira
Int2 p1            Francis       my friend
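Replacements and the log can be produced together so that every change is recorded. A minimal sketch in Python, assuming a hand-made dictionary of terms to replace for each transcript; `pseudonymise` and `write_log` are hypothetical helpers. Automated replacement misses misspellings, nicknames and indirect identifiers, and the `\b` word boundaries only work for terms that start and end with a letter or digit, so a manual read-through is still needed.

```python
import csv
import re


def pseudonymise(text, replacements, doc_id):
    """Replace identifying terms with bracketed substitutes and log each change.

    replacements maps original -> substitute, e.g. {"Amy": "Moira"}.
    Returns (new_text, log_rows); each row is (doc_id, original, substitute, count).
    """
    log = []
    # Longest terms first, so a long name is replaced before any shorter term inside it.
    for original in sorted(replacements, key=len, reverse=True):
        substitute = f"[{replacements[original]}]"
        # \b word boundaries avoid replacing "Amy" inside e.g. "Amyas".
        pattern = re.compile(r"\b" + re.escape(original) + r"\b")
        text, count = pattern.subn(lambda _m, s=substitute: s, text)
        if count:
            log.append((doc_id, original, replacements[original], count))
    return text, log


def write_log(path, rows):
    """Save the anonymisation log; store it apart from the anonymised files."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["document", "original", "changed_to", "occurrences"])
        writer.writerows(rows)
```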

Access Control

• Essential when anonymisation is ineffective or damaging to quality, e.g. visual or audio data

• Gradation of access controls:
  - Open Access
  - Metadata only: contact details for requesting data reuse
  - End User Licence
  - Dark Archive
  - Embargo for a given time period

• Multiple access controls can apply to different data types within one study

Useful Links

• Getting consent: advice from the UKDA

• Ethics advice from: College Ethics Officers (Exeter username and password needed)

• DPA advice: recordsmanagement@exeter.ac.uk

Using Secondary Data

• Data review

• Copyright and data sharing

• Data citation

Data Review

• Good practice to demonstrate that no suitable data are available for re-use before collecting new data; data review (e.g. via UKDS)

• You must abide by any contract you or your project group have signed about using secondary data:

• This may state that you are not allowed to share data or include the conditions of sharing

Copyright

• Find out who owns the copyright for the data you are using/sharing:

• Copyright permissions must be sought and granted prior to data sharing/archiving.

• This also applies to data in your thesis/research papers, e.g. a PhD student’s copyright case study.

• Copyright holders give permission to data archives to preserve data and make them accessible to users.

• Data archives publish data – they hold no copyright.

Data Citation 1

• Data citation:
  - Is good research practice
  - Acknowledges the author's sources
  - Makes identifying data easier
  - Promotes the reproduction of research results
  - Makes it easier to find data
  - Allows the impact of data to be tracked
  - Provides a structure which recognises and rewards data creators

Data Citation 2

• Include enough information so that the exact version of the data being cited can be located.

• Include a Digital Object Identifier (DOI)

• Each dataset used must have a separate citation

• Example:

University of Essex. Institute for Social and Economic Research and National Centre for Social Research, Understanding Society: Wave 1, 2009-2010 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], November 2011. SN: 6614, http://dx.doi.org/10.5255/UKDA-SN-6614-2
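When citing several datasets, assembling each citation from its parts keeps them consistent. A minimal sketch in Python that follows the pattern of the UK Data Archive example above; `format_dataset_citation` is a hypothetical helper, and the default resolver `https://doi.org/` is the current form of DOI links (older `http://dx.doi.org/` links, as in the example, still resolve).

```python
def format_dataset_citation(author, title, edition, distributor, release,
                            study_number, doi, resolver="https://doi.org/"):
    """Assemble a dataset citation in the pattern of the UK Data Archive example."""
    if not doi.startswith("10."):
        raise ValueError("expected a DOI such as 10.5255/UKDA-SN-6614-2")
    link = resolver.rstrip("/") + "/" + doi
    return (f"{author}, {title} [computer file]. {edition}. "
            f"{distributor} [distributor], {release}. SN: {study_number}, {link}")
```

Passing the fields of the example above, with `resolver="http://dx.doi.org/"`, reproduces that citation exactly.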

Useful Links

• Advice from JISC Digital Media on using images

• UKDA advice on copyright

• ESRC: Data Citation: What you need to know

• How to cite census data

Data Management Plans (DMP)

“Plans typically state what data will be created and how, and outline the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied.”

Digital Curation Centre website

Why write a DMP?

• Many funders now require a DMP as part of the application process

• Helps the project address data management issues

• Makes the project members think about relevant issues

Data Management Planning

Bids to most major funders now require a DMP outlining:

• Roles and responsibilities

• What data will be created and how

• Data formats

• Documentation of data

• Storage and back-up

• Data sharing

• Long-term preservation and access...

Guidance is available on the University web pages on data management plans.

Any Questions?

Contact us:

openaccess@exeter.ac.uk