
©David Castillo Dominici, FreeDigitalPhotos.net

Data Management

Graça Gabriel

“Data that is loved tends to survive.” Kurt Bollacker

Department of Engineering, Library and Information Service

Attribution-NonCommercial-ShareAlike

What is data?

©EpicGraphic.com

[Figure: Presentation, Information, Data, Knowledge]

What is data?

The Royal Society. (2012). Science as an open enterprise.

“‘research data’ are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated.” (OECD, 2007, p. 13)

OECD. (2007). OECD Principles and guidelines for access to research data from public funding. Available at www.oecd.org/sti/sci-tech/38500813.pdf (retrieved 18 October 2013).

What is research data?

Digital universe

EMC. (2012). The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the Far East. Available at http://www.emc.com/leadership/digital-universe/iview/executive-summary-a-universe-of.htm (retrieved 14 January 2014).

• Video;
• Audio;
• Databases;
• Still images;
• Spreadsheets;
• Text documents;
• Instrument measurements;
• Experimental observations;
• Quantitative/qualitative data;
• Slides, artefacts, specimens, samples;
• Survey results & interview transcripts;
• Simulation data, models & software;
• Sketches, diaries, lab notebooks;
• …

Types/formats of research data

©David Castillo Dominici, FreeDigitalPhotos.net

©Supertrooper, FreeDigitalPhotos.net

©Salvatore Vuono, FreeDigitalPhotos.net

©Mentor Graphics

©zirconicusso, FreeDigitalPhotos.net

©NOAA

©Evgeni Dinev, FreeDigitalPhotos.net

Data types

The Royal Society. (2012). Science as an open enterprise.

©Stuart Miles, FreeDigitalPhotos.net

Do you know what your funders expect of your research?

What plans have you made for your research data?

What type of note-taking have you designed?

Early planning 1. Funding bid requirements

Become familiar with what funders expect in terms of:

•Managing generated data (how you will document and maintain the research you produce);

•Publishing results (how/where to publish);

•Sharing outputs (open access types);

•Depositing and preserving outputs (how you will ensure your data is accessible in the long term, such as depositing papers in a repository or using a recommended data centre for safekeeping).

Help provided by:

Department/group computing officer(s)

Cambridge Research Office

DSpace@Cambridge support staff

Librarians

Early planning 2. Data planning

Plan ahead for your data management needs:

•Type of data created

• Consider what data will be created (e.g. interview data and transcripts, experimental measurements, high resolution imaging…);

• Consider how data will be created/captured (e.g. recorded, printed, made available on a website/intranet);

• Consider the equipment/software required (find out whether funding is available if new software is needed).

Early planning 2. Data planning

Plan ahead for your data management needs:

•Choose what data format(s)

• What discipline-specific norms already exist;

• What software/formats you or colleagues have used in past projects, and which will be easiest to share with others (e.g. Microsoft Excel for recording data, SPSS for analysis);

• What formats will be easiest to annotate with metadata;

• What formats are at risk of obsolescence;

• What software is compatible with hardware you already have.

Early planning 2. Data planning

Plan ahead for your data management needs:

•Volume of data created

• Consider where data is going to be stored;

• Consider whether the scale of the data poses challenges for sharing/transferring it.

•Plan how to sort and analyse data;

•Investigate intellectual property rights (IPR) concerning your research and its dissemination, future related research projects, and associated profit or credit.

Early planning 2. Data planning

Plan ahead for your data management needs:

•Investigate data protection and ethics

Under the Data Protection Act 1998 (which governs the processing of personal data), personal data must be handled according to eight data protection principles. It must be:

• processed fairly and lawfully;
• obtained for specified and lawful purposes;
• adequate, relevant and not excessive;
• accurate and, where necessary, kept up to date;
• not kept for longer than necessary;
• processed in accordance with the subject's rights;
• kept secure;
• not transferred abroad without adequate protection.

“Plagiarism is defined as submitting as one's own work, irrespective of intent to deceive, that which derives in part or in its entirety from the work of others without due acknowledgement. It is both poor scholarship and a breach of academic integrity.” (Cambridge University, 2011)


© Thomas Hawk via Flickr

Early planning 3. A note on plagiarism

Early planning 4. Note-taking

Design a reading grid to take notes of the main ideas/data/research (including specific citations that you may want to use later on).

•Quivy and Campenhoudt

Quivy, R.; Campenhoudt, L. (2008). Manual de investigação em ciências sociais [Handbook of research in the social sciences] (5th ed.). Lisbon: Gradiva.

Main ideas/content | Evaluation of ideas/content

1. e.g. Theory A considers… (pages x-x) | e.g. Different theories; do further research on those supporting theory x and theory y;

2. e.g. Theory B considers…

3. e.g. Theory C…

Early planning 4. Note-taking

• The Cornell Method

Pauk, W. (1993). How to study in college (5th ed.). Boston: Houghton Mifflin Co.

Major themes | Detailed points

1st main point, e.g. There are several types of theories | More detailed information, e.g. Theory A explains…; Theory B explains…; Theory C explains…

2nd main point, e.g. Why do some believe in theory A? | e.g. Reason 1…; Reason 2…

Critical evaluation: e.g. Neither theory A nor theory B explains the occurrence of xxx.

Early planning

Further information

Cambridge University Intellectual Property Rights Regulations

DSpace@Cambridge IPR page

JISC Legal IPR page

DPA 1998: advice for Cambridge staff and students

University page about the Data Protection Act 1998

UK Data Archive Duty of confidentiality

The Information Commissioner's Office Guide to data protection

JISC Legal Guide to data protection

Contact the Protection Office: [email protected]

University self-taught courses: Data Protection Training for Academic Staff; Data Protection Training for Administrators

LEKO via Jalopnik, ThePimp.Blog

How do you organise your files?

How do you name your files?

Do you create metadata to help describe your data?

Do you manage your emails?

How do you organise your bibliographic references?

Do you have remote access to your data?

• Adhere to existing procedures (within your research group or Department, or those preferred by your supervisor);

• Use folders and subfolders:
  • Name folders appropriately (e.g. after the areas of work, not after individual researchers or students);
  • Be consistent with a naming scheme;
  • Structure folders hierarchically (a limited number of folders for the broader topics, with more specific folders within these);
  • Separate ongoing and completed work;

• Be consistent with filenames:
  • Choose a standard vocabulary;
  • Use a revision numbering system (e.g. xxxx_v01.doc; 1930film0001.tif) and specify the number of digits to use (standard: eight-character limit);

Organize your data 1. Naming and organizing files

• Be consistent with filenames:
  • Decide on the use of dates so that documents are displayed chronologically;
  • Include a version control table for important documents;
  • Avoid characters such as / : * ? < > | (because they are reserved by the operating system) and spaces (use hyphens or underscores, particularly for files destined for the Web);

• When drafts are circulating, decide how to identify individuals (e.g. xxxx_gdcf2_v01.doc);

• Mark the final document as “Final” and prevent further changes.

• Review records (assess materials regularly or at the end of a project to ensure files aren’t kept needlessly);

• Back up your files/data/favourites.
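The naming rules above can be combined into a small helper. This is a minimal sketch, and the scheme itself is an assumption for illustration: `make_filename` is a hypothetical function, dates are written in ISO 8601 form (YYYY-MM-DD, which sorts chronologically as plain text), initials identify who edited a circulating draft, and versions always use two digits. Spaces and reserved characters are replaced with hyphens.

```python
import re
from datetime import date

# Characters reserved by common operating systems (plus backslash and quote),
# and any whitespace, following the "avoid / : * ? < > | and spaces" rule.
RESERVED = re.compile(r'[\\/:*?"<>|\s]+')

def make_filename(topic: str, ext: str, version: int,
                  on: date, initials: str = "") -> str:
    """Build a name like '2014-02-17_survey-results_gg_v03.xlsx' (hypothetical scheme)."""
    slug = RESERVED.sub("-", topic.strip()).strip("-").lower()
    parts = [on.isoformat(), slug]          # ISO date first, so names sort by date
    if initials:                            # who edited this circulating draft
        parts.append(initials.lower())
    parts.append(f"v{version:02d}")         # fixed number of digits: v01 ... v99
    return "_".join(parts) + "." + ext.lstrip(".")

names = [
    make_filename("Survey results", "xlsx", 3, date(2014, 2, 17), "gg"),
    make_filename("Survey results", "xlsx", 1, date(2014, 1, 6)),
]
print(sorted(names))
# ['2014-01-06_survey-results_v01.xlsx', '2014-02-17_survey-results_gg_v03.xlsx']
```

Putting the date first is a design choice: any file browser that sorts by name then also shows documents in chronological order.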

Organize your data 1. Naming and organizing files

• Use metadata (data about data, usually embedded in the data files/documents themselves) to add information to your documents (e.g. use Microsoft Office’s “Document properties”);

• Create both study-level information about the research and data creation, and descriptions and annotations at the variable, data item or data file level;

• Provide searchable information to help you/others find information.

Organize your data 2. Documentation and metadata

• Standard metadata fields:
  • Title (name of the dataset or research project);
  • Creator (organization or people who created the data);
  • Identifier (number used to identify the data);
  • Subject(s) (keywords describing the subject or content of the data);
  • Funders;
  • Rights (known intellectual property rights held for the data);
  • Access information (where/how data can be accessed by others);
  • Significant dates (project start and end dates; release date; data lifespan; update schedule);
  • Methodology (how the data was generated);
  • Code lists (explanation of codes or abbreviations used);
  • Versions (date/time stamp for each file);
  • List of file names (list of all data files associated with the project).
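As an illustration only, the standard fields above could be kept in a machine-readable file stored next to the data. This sketch assumes a JSON "sidecar" file and field names of our own choosing; it does not follow a formal schema such as the Data Documentation Initiative, and every value shown is a placeholder. The `missing_fields` check is a simple way to spot fields still left empty.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class DatasetMetadata:
    # Study-level fields mirroring the list above (names are illustrative).
    title: str
    creator: list[str]
    identifier: str
    subjects: list[str]
    funders: list[str] = field(default_factory=list)
    rights: str = ""
    access: str = ""
    dates: dict[str, str] = field(default_factory=dict)
    methodology: str = ""
    code_list: dict[str, str] = field(default_factory=dict)
    files: dict[str, str] = field(default_factory=dict)  # file name -> version stamp

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, as a basic completeness check."""
        return [name for name, value in asdict(self).items() if not value]

record = DatasetMetadata(
    title="Example survey dataset",
    creator=["Example Research Group"],
    identifier="example-0001",
    subjects=["survey", "data management"],
    dates={"project_start": "2014-01-06", "project_end": "2014-12-19"},
    code_list={"NA": "not answered"},
    files={"2014-02-17_survey-results_v03.xlsx": "v03"},
)
print(json.dumps(asdict(record), indent=2))  # what would be saved as the sidecar file
print(record.missing_fields())               # ['funders', 'rights', 'access', 'methodology']
```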

Organize your data 2. Documentation and metadata


Further information

Data Documentation Initiative

UK Data Archive: Documenting your data

MIT Libraries Documentation and metadata

Library of Congress Authorities

JISC Digital Media Approaches to describing images

Help provided by DSpace@Cambridge: [email protected]

Organize your data 3. Use RSS feeds

• Gather information from the web (news websites, blogs, etc.) in a feed reader (e.g. feedly, Digg Reader, NewsBlur, Netvibes);

©Vector, www.youtoart.com

• Set up journal alerts or citation alerts (from databases).
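Feed readers work by regularly fetching and parsing feeds published by websites and journals. To show roughly what happens behind the scenes, this sketch parses an RSS 2.0 document with Python's standard `xml.etree.ElementTree`. The feed text is an inline placeholder rather than a real journal feed, and Atom feeds (a different format with its own XML namespace) would need separate handling.

```python
import xml.etree.ElementTree as ET

# Placeholder RSS 2.0 feed; a real reader would download this from a feed URL.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Journal: latest articles</title>
    <item>
      <title>Article one</title>
      <link>https://example.org/articles/1</link>
      <pubDate>Mon, 06 Jan 2014 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Article two</title>
      <link>https://example.org/articles/2</link>
      <pubDate>Fri, 17 Jan 2014 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 channel."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title", default=""), item.findtext("link", default=""))
            for item in root.iter("item")]

for title, link in read_items(FEED):
    print(f"{title}: {link}")
```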

Organize your data 4. Manage your email

• Structure your folders by subject, activity or project;

• Set up a separate folder for personal emails (create filters so they go directly there);

• Archive old emails (even if only in an “Archive” folder);

• Delete useless emails and block junk email;

• Limit the use of attachments (use alternative ‘data sharing’ options) but, if important, save them.

• Try applications to help you manage your email (see “7 great services for taking back control of your inbox”)

• Keep track of every bibliographic reference used/seen;

• Use reference management software;

• Back up your bibliographic data.

Organize your data 5. Managing references

Further information

University Library webpage about Mendeley, Zotero and EndNote

Organize your data 6. Remote access

©winnond, FreeDigitalPhotos.net

• Use a single technology/method of remote access

or

• Decide on clear rules for managing your remote access technologies

• Designate one device as your “master” storage location;

• Transfer the latest versions of your files to your master device ASAP, every time that you do work away from your master storage location;

• Back up your important files regularly.
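Part of the "master" storage routine above can be automated. This is a minimal sketch, assuming both the away-from-base copy and the master copy are reachable as ordinary folders (for example a laptop folder and a departmental network drive). `push_newer_files` is a hypothetical helper: it copies a file only when the master has no copy or an older one, judged by modification time, and never deletes anything. Dedicated tools (rsync, a version control system such as Git, or a sync service) handle conflicts and deletions far more carefully.

```python
import shutil
import tempfile
from pathlib import Path

def push_newer_files(working: Path, master: Path) -> list[Path]:
    """Copy files from `working` into `master` when the master copy is missing or older.

    Returns the relative paths that were copied. Nothing is ever deleted.
    """
    copied = []
    for src in sorted(working.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(working)
        dst = master / rel
        if not dst.exists() or src.stat().st_mtime_ns > dst.stat().st_mtime_ns:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also keeps the original modification time
            copied.append(rel)
    return copied

if __name__ == "__main__":
    # Temporary folders stand in for the laptop and the master location.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        laptop, master = Path(a), Path(b)
        (laptop / "notes").mkdir()
        (laptop / "notes" / "draft_v01.txt").write_text("first draft")
        print(push_newer_files(laptop, master))  # first run copies notes/draft_v01.txt
        print(push_newer_files(laptop, master))  # second run copies nothing: master is up to date
```

Because `shutil.copy2` preserves modification times, a file that has not changed since the last run is not copied again.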

• Departmental/college Virtual Private Network (VPN): see the University Computing Service info sheet

• Desktop Services Account: see the University Computing Service Introduction to Desktop Services

• Research group’s CamTools site (Moodle in the future): see the CamTools site; CamTools Helpdesk: [email protected]

• Online services that provide storage (e.g. Dropbox)

• Online/desktop programs to store files and keep track of the changes made to documents (e.g. Git)

Organize your data 6. Remote access

• Key printed data should be kept in a secure location where access can be restricted to authorised personnel or in locked cupboards;

• Keep your sensitive electronic data password-protected or encrypted, or set privileged levels of access (including for backups);

• Do not use printouts with sensitive data as scrap paper. Choose secure methods of disposal (such as shredding);

• Computer terminals should not be left unattended and should be logged off at the end of each session;

• Protect your computer with anti-virus, firewall and anti-keylogging software;

Organize your data 7. Keep your data safe

• Choose strong passwords (use a mix of upper and lower case letters and digits/punctuation characters);
• If you store passwords on a computer system, encrypt the file;
• Never give your password to other people;
• Change passwords frequently.
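The character-mix rule above can be checked mechanically, and random passwords that satisfy it can be generated with Python's `secrets` module, which is designed for security-sensitive randomness. This is a sketch under stated assumptions: the 12-character minimum (`MIN_LENGTH`) is our own illustrative choice rather than a University requirement, and passing the check is necessary but not sufficient, since it does not detect common or reused passwords.

```python
import secrets
import string

MIN_LENGTH = 12  # illustrative assumption, not a University rule

def meets_policy(password: str, min_length: int = MIN_LENGTH) -> bool:
    """True if the password mixes upper case, lower case, and digits/punctuation."""
    return (
        len(password) >= min_length
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() or c in string.punctuation for c in password)
    )

def generate_password(length: int = 16) -> str:
    """Draw characters with `secrets` until the rule above is met."""
    if length < MIN_LENGTH:
        raise ValueError(f"length must be at least {MIN_LENGTH}")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if meets_policy(candidate):
            return candidate

print(meets_policy("password"))            # False: too short, no upper case, no digits
print(meets_policy("Cam-Data-2014-Plan"))  # True
print(len(generate_password()))            # 16
```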

Organize your data 7. Keep your data safe

Further information

University Computing Service Password? What password?

CUED Departmental policy on data protection

• Store crucial data in more than one secure location:
  • Networked drives;
  • Personal computers/laptops;
  • External storage devices (CDs, DVDs, USB flash drives);
  • Remote or online storage systems (Dropbox, Mozy, A-Drive, etc.).
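Keeping crucial data in more than one location only helps if each copy is intact. This is a minimal sketch, assuming the backup destinations are mounted as ordinary folders; `back_up` is a hypothetical helper. It copies one file into each destination and compares SHA-256 checksums (computed with Python's `hashlib`) to confirm that every copy matches the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(src: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `src` into each destination folder; report whether each copy verified."""
    expected = sha256(src)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(src, folder / src.name))
        results[folder] = sha256(copy) == expected
    return results

if __name__ == "__main__":
    # Temporary sub-folders stand in for e.g. a networked drive and a USB drive.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "results_v01.csv"
        data.write_text("id,score\n1,42\n")
        report = back_up(data, [root / "network-drive", root / "usb-drive"])
        print(all(report.values()))  # True
```

Re-running `sha256` on the stored copies later is a cheap way to detect silent corruption, for example on ageing CDs or DVDs.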

Further information

See: http://datalib.edina.ac.uk/mantra/

Jones, S. (2011). How to Develop a Data Management and Sharing Plan. Edinburgh: Digital Curation Centre. Available at http://www.dcc.ac.uk/resources/how-guides/develop-data-plan (retrieved 17 February 2014).

Further information

©Pixabay.com

How do you decide what to keep/delete?

Where/how are you going to preserve your data?

Preserving your data 1. Information in the cloud

EMC. (2012). The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the Far East. Available at http://www.emc.com/leadership/digital-universe/iview/executive-summary-a-universe-of.htm (retrieved 14 January 2014).

Preserving your data 2. What to keep/delete

• Does your funder or university require you to keep data and/or make it available for a certain period of time?

• Is the data a vital record of a project/organisation/consortium and therefore needs to be retained indefinitely?

• Do you have the legal and intellectual property rights to keep and re-use the data? If not, can these be negotiated?

• Does sufficient metadata exist to allow data to be found wherever it is stored?

• If you need to pay to keep the data, can you afford it?

• Only store what you need to keep! Storage costs money and/or effort, and storing massive amounts of data requires a well-thought-out plan for organising it so that information is easy to find;

Further information

The University Computing Service provides up to 500 MB of centralised file storage space through the public workstation facility (PWF), which also allows you to store and access files online.

Some colleges/departments/research groups provide networked storage (ask your local computing officers for details).

Digital Curation Centre The value of digital curation

UK Data Archive FAQ

Engineering Research Information Management Project (ERIM)

National Preservation Office Caring for CDs and DVDs

Wikipedia List of backup software

Wikipedia Comparison of online back-up services

Preserving your data 3. Storage

Preserving your data 4. Long-term storage

• Digital repositories

Provide online archival storage – usually open access – and care for digital materials, ensuring that they remain readable for as long as the repository survives. e.g. DSpace@Cambridge

• Archive/data centre

Ensure data safe-keeping in the long term: datasets are fully documented with all bibliographical details and users of the data are aware of the need to acknowledge the data sources in publications.

e.g. Archaeology Data Service

Digital Curation Centre. (cop. 2004-2014). DCC curation lifecycle model [image]. Available at http://www.dcc.ac.uk/resources/curation-lifecycle-model (retrieved 17 February 2014).

Summary

©SOMMAI, FreeDigitalPhotos.net

Should you share your data/research?

Are there impediments to sharing data/research?

Do you have/need a marketing plan to publicise your research?

• Scientific integrity - publishing your data and citing its location in published research papers can allow others to replicate, validate, or correct your results, thereby improving the scientific record.

• Funding mandates - UK research councils are increasingly mandating data sharing so as to avoid duplication of effort and save costs.

• Raise/Increase the impact of your research - those who make use of your data and cite it in their own research will help to increase your impact within your field and beyond it.

• Preserve your data for future use – you will benefit from being able to identify, retrieve and understand the data yourself after you have lost familiarity with it, perhaps several years later.

Market your data 1. Reasons to share

• Teaching purposes - your data may be ideal for others to learn how to collect and analyse similar types of data themselves.

• Making publicly funded research available publicly - there is a growing movement for making publicly funded research available to the public, as indicated for example, in the Organisation for Economic Co-operation and Development (OECD) Principles and Guidelines for Access to Research Data from Public Funding.

• Increase transparency through creating, disseminating and curating knowledge.

• Increase collaboration - the use of archived data by other researchers may lead to collaboration with the data owner and to co-authorship of publications based on re-use of the data.

Market your data 1. Reasons to share

• If your data has financial value or is the basis for potentially valuable patents that could be exploited by the University, it may be unwise to share it, even with a data licence or terms and conditions attached.

• If the data contains sensitive, personal information about human subjects, sharing it, even with other researchers, may violate the Data Protection Act, ethics codes, or your own written consent forms. (Often there are ways to anonymise the data by removing the personally identifying information, making it shareable as a public use dataset.)

• If parts of the data are owned by others, such as commercial entities or authors, then even if you have derived wholly new data from the original sources you may not have the rights to share the data with others.
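What counts as identifying depends heavily on the dataset, and ethics or data protection advice should decide it. Purely as an illustration of the anonymisation point above, this sketch drops direct identifiers (assumed here to be `name` and `email` columns) and replaces each participant ID with a keyed hash (HMAC-SHA256 from Python's standard library). Strictly, this is pseudonymisation rather than full anonymisation: anyone holding the secret key can re-link records, and indirect identifiers (such as an age band combined with a postcode) may still single people out.

```python
import csv
import hashlib
import hmac
import io
import secrets

# Assumption: these column names hold direct identifiers in this example dataset.
DIRECT_IDENTIFIERS = {"name", "email"}

def pseudonymise(rows: list[dict[str, str]], key: bytes) -> list[dict[str, str]]:
    """Drop direct identifiers and replace 'participant_id' with an HMAC-SHA256 tag."""
    out = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        tag = hmac.new(key, row["participant_id"].encode("utf-8"), hashlib.sha256)
        clean["participant_id"] = tag.hexdigest()[:12]  # shortened for readability
        out.append(clean)
    return out

raw = io.StringIO(
    "participant_id,name,email,age_band,answer\n"
    "P001,Ann Example,ann@example.org,30-39,yes\n"
    "P002,Bob Example,bob@example.org,40-49,no\n"
)
key = secrets.token_bytes(32)  # store separately and securely, or destroy to prevent re-linking
shared = pseudonymise(list(csv.DictReader(raw)), key)
print(list(shared[0]))  # ['participant_id', 'age_band', 'answer']
```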

Market your data 2. Reasons not to share

• Publish in Open Access journals or deposit a copy into DSpace@Cambridge;

• Enhance your online presence through social media (Facebook, Twitter, or start and maintain a blog);

• Use author identifiers (ResearcherID from Web of Science; Scopus Author ID; ORCID);

• Share research on “academic” platforms (LinkedIn, Academia.edu, ResearchGate, Microsoft Academic Search, Mendeley);

• Keep track of metrics (e.g. number of citations);

Market your data 3. How do you market?


Further information

Digital Curation Centre Overview of major funders’ data policies

SHERPA JULIET searchable international database of funders' open access and archiving requirements

Times Higher Education supplement "Research intelligence - Request hits a raw spot" (15 July 2010)

DSpace@Cambridge

DOAJ – Directory of Open Access Journals (with information on OA journal preservation program and OA quality standards)

OAD – Open Access Directory

Summary

Guidance Leaflet by DICE, SHARD and PrePARe projects.

Thank you

Graça Gabriel

[email protected]

Department of Engineering, Library and Information Service

[email protected]

Telephone: +44 1223 332626