Data Management for Undergraduate Researchers Office of Undergraduate Research Seminar and Workshop Series Rebekah Cummings, Research Data Management Librarian J. Willard Marriott Library, University of Utah February 23, 2016

Data Management for Undergraduate Researchers (updated - 02/2016)


Page 1: Data Management for Undergraduate Researchers (updated - 02/2016)

Data Management for Undergraduate Researchers

Office of Undergraduate Research Seminar and Workshop Series

Rebekah Cummings, Research Data Management Librarian

J. Willard Marriott Library, University of Utah

February 23, 2016

Page 2: Data Management for Undergraduate Researchers (updated - 02/2016)

• Introductions

• What are data?

• Why manage data?

• Data Management Plans

• Data Organization

• Metadata

• Storage and Archiving

• Questions

Page 3: Data Management for Undergraduate Researchers (updated - 02/2016)

Name

Major

Research Project

Page 4: Data Management for Undergraduate Researchers (updated - 02/2016)

What is data management?

The process of controlling the information (read: data) generated during a research project.

https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html

Page 5: Data Management for Undergraduate Researchers (updated - 02/2016)

What are data?

“The recorded factual material commonly accepted in the research community as necessary to validate research findings.”

- U.S. OMB Circular A-110

Page 6: Data Management for Undergraduate Researchers (updated - 02/2016)

Data are diverse

Page 7: Data Management for Undergraduate Researchers (updated - 02/2016)

Data are messy

Page 8: Data Management for Undergraduate Researchers (updated - 02/2016)

Why manage data?

• Save time and increase efficiency

• Meet grant requirements

• Promote reproducible research

• Enable new discoveries from your data

• Make the results of publicly funded research publicly available

Page 9: Data Management for Undergraduate Researchers (updated - 02/2016)

We are trying to avoid this scenario…

Page 10: Data Management for Undergraduate Researchers (updated - 02/2016)

Two bears data management problems

1. Didn’t know where he stored the data

2. Saved one copy of the data on a USB drive

3. Data was in a format that could only be read by outdated, proprietary software

4. No codebook to explain the variable names

5. Variable names were not descriptive

6. No contact information for the co-author Sam Lee

Page 11: Data Management for Undergraduate Researchers (updated - 02/2016)

Scenario

You develop a research project during your undergraduate experience. You write up the results, which are accepted by a reputable journal. People start citing your work! Three years later someone accuses you of falsifying your work.

Scenario adapted from MANTRA training module

Page 12: Data Management for Undergraduate Researchers (updated - 02/2016)

•Would you be able to prove you did the work as you described in the article?

•What would you need to prove you hadn’t falsified the data?

•What should you have done throughout your research study to be able to prove you did the work as described?

Page 13: Data Management for Undergraduate Researchers (updated - 02/2016)

Data Management Plans

• What data are generated by your research?

• What is your plan for managing the data?

• How will your data be shared?

Page 14: Data Management for Undergraduate Researchers (updated - 02/2016)

Elements of a DMP

• Types of data, including file formats

• Data description

• Data storage

• Data sharing, including confidentiality or security restrictions

• Data archiving and responsibility

• Data management costs

Page 15: Data Management for Undergraduate Researchers (updated - 02/2016)

DMPTool – CDL

Page 16: Data Management for Undergraduate Researchers (updated - 02/2016)

Data organization

Page 17: Data Management for Undergraduate Researchers (updated - 02/2016)

File naming

Page 18: Data Management for Undergraduate Researchers (updated - 02/2016)

MyData.xls

MeetingNotes.doc

Presentation.ppt

Assignment1.pdf

Page 19: Data Management for Undergraduate Researchers (updated - 02/2016)

File naming best practices

1. Be descriptive not generic

2. Appropriate length (about 25 chars or less)

3. Be consistent

4. Think critically about your file names

Page 20: Data Management for Undergraduate Researchers (updated - 02/2016)

File naming best practices

• Files should include only letters, numbers, and underscores/dashes.

• No special characters

• No spaces; use dashes, underscores, or camel case (like-this, like_this, or likeThis)

• Avoid case dependency. Assume this, THIS, and tHiS are the same.

• Have a strategy for version control.

• Don’t overwrite file extensions
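The naming rules above can be checked automatically. The following is a minimal sketch, not an official tool: the function names, the regular expression, and the 25-character limit are assumptions drawn from the bullets on these slides.

```python
import re

# Assumed rule set: letters, digits, underscores, and dashes, followed by one extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")
MAX_STEM_LENGTH = 25  # assumption taken from "about 25 chars or less"


def check_file_name(name):
    """Return a list of problems found in a file name (empty list means OK)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if not SAFE_NAME.match(name):
        problems.append("has special characters or no extension")
    stem = name.rsplit(".", 1)[0]
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("is longer than about 25 characters")
    return problems


def case_collisions(names):
    """Return pairs of names that differ only by upper/lower case."""
    seen = {}
    clashes = []
    for name in names:
        key = name.lower()
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes
```

Running `check_file_name` over a folder listing before sharing it is one way to catch names that may break on another operating system.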

Page 21: Data Management for Undergraduate Researchers (updated - 02/2016)

One potential strategy

Page 22: Data Management for Undergraduate Researchers (updated - 02/2016)

Version Control - Numbering

Use leading zeros for scalability

With leading zeros: 001, 002, 003, 009, 010, 099

Without leading zeros: 1, 10, 2, 3, 9, 99

Bonus Tip: Use ordinal numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)
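A short sketch of why leading zeros matter: file browsers usually sort names as text, so unpadded numbers fall out of order. The helper name `versioned_name` and the three-digit width are illustrative assumptions.

```python
def versioned_name(stem, number, width=3, ext="csv"):
    """Build a file name with a zero-padded version number (hypothetical helper)."""
    return f"{stem}_v{number:0{width}d}.{ext}"


numbers = [1, 2, 3, 9, 10, 99]

# Sorting as text, the way most file listings do.
unpadded = sorted(str(n) for n in numbers)
padded = sorted(f"{n:03d}" for n in numbers)

print(unpadded)  # text order no longer matches numeric order
print(padded)    # text order matches numeric order
```

Three digits leaves room for up to 999 versions; pick a width that fits the largest number you expect.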

Page 23: Data Management for Undergraduate Researchers (updated - 02/2016)

Version Control - Dates

If using dates use YYYYMMDD

June2015 = BAD!

06-18-2015 = BAD!

20150618 = GREAT!

2015-06-18 = This is fine too
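The reason YYYYMMDD works is that sorting the names as text also sorts them by date. A minimal sketch using Python's standard `datetime` module; the sample dates and the helper name `date_stamp` are made up for illustration.

```python
from datetime import date


def date_stamp(d, dashes=False):
    """Format a date as YYYYMMDD, or YYYY-MM-DD when dashes=True."""
    return d.isoformat() if dashes else d.strftime("%Y%m%d")


dates = [date(2015, 6, 18), date(2014, 12, 1), date(2015, 1, 5)]

# Year-first stamps sort chronologically even when treated as plain text.
year_first = sorted(date_stamp(d) for d in dates)

# Month-first stamps sort by month, which scrambles the years.
month_first = sorted(d.strftime("%m-%d-%Y") for d in dates)
```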

Page 24: Data Management for Undergraduate Researchers (updated - 02/2016)

From a DMP…

“Each file name, for all types of data, will contain the project acronym PUCCUK; a reference to the file content (survey, interview, media) and the date of an event (such as the date of an interview).”
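A convention like the one quoted from this DMP is easiest to follow when a small helper builds the names. This is a sketch only: the DMP excerpt does not state the order of the parts or the separator, so the underscore layout and the function name `dmp_file_name` are assumptions.

```python
from datetime import date

PROJECT = "PUCCUK"  # project acronym from the DMP excerpt
CONTENT_TYPES = {"survey", "interview", "media"}  # content references named in the excerpt


def dmp_file_name(content, event_date, ext):
    """Build PROJECT_content_YYYYMMDD.ext (assumed layout, not taken from the DMP)."""
    if content not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content}")
    return f"{PROJECT}_{content}_{event_date:%Y%m%d}.{ext}"
```

For example, `dmp_file_name("interview", date(2015, 6, 18), "txt")` produces a name that carries all three elements the plan promises.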

Page 25: Data Management for Undergraduate Researchers (updated - 02/2016)

•PLPP_EvaluationData_Workshop2_2014.xlsx

•MyData.xlsx

•publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx

Who filed better?

Page 26: Data Management for Undergraduate Researchers (updated - 02/2016)

Who filed better?

• July 24 2014_SoilSamples%_v6

• 20140724_NSF_SoilSamples_Cummings

• SoilSamples_FINAL

Page 27: Data Management for Undergraduate Researchers (updated - 02/2016)

Structuring folders and files

• Consider all the types of files you will handle during the course of your project.

• Develop a nested folder structure that makes sense for your project and your team’s retrieval needs.

• Name folders clearly, without special characters (avoid redundancy)

• Use a standard folder structure for each project or subproject (including making folders for files not yet created)

• Create a reference document (README file) that notes the purpose of the different folders.

University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
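One way to keep a standard structure across projects is to create it with a script, including folders for files that do not exist yet and a README that records each folder's purpose. The folder names below are a hypothetical layout, not one prescribed by the slides; adapt them to your project.

```python
from pathlib import Path

# Hypothetical standard layout: folder path -> purpose recorded in the README.
FOLDERS = {
    "data/raw": "Original files, never edited",
    "data/processed": "Cleaned files derived from data/raw",
    "docs": "Codebook, consent forms, protocols",
    "analysis": "Scripts and notebooks",
    "outputs": "Figures, tables, drafts",
}


def create_project(root):
    """Create the nested folders and a README.txt describing them."""
    root = Path(root)
    lines = [f"Project: {root.name}", "", "Folder purposes:"]
    for folder, purpose in FOLDERS.items():
        (root / folder).mkdir(parents=True, exist_ok=True)
        lines.append(f"- {folder}: {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `create_project("soil_study")` builds the same skeleton every time, so teammates know where to look.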

Page 28: Data Management for Undergraduate Researchers (updated - 02/2016)

File organization exercise

Page 29: Data Management for Undergraduate Researchers (updated - 02/2016)

Describing data

Page 30: Data Management for Undergraduate Researchers (updated - 02/2016)

Research Documentation

• Grant proposals and related reports

• Applications and approvals (e.g. IRB)

• Codebooks, data dictionaries

• Consent forms

• Surveys, questionnaires, interview protocols

• Transcripts, hard copies of audio and video files

• Any software or code you used (no matter how insignificant or buggy)

Page 31: Data Management for Undergraduate Researchers (updated - 02/2016)

Three levels of documentation

• Project level – what the study set out to do, research questions, methods, sampling frames, instruments, protocols, members of the research team

• File or database level – How all the files relate to one another. A README file is a classic way of capturing this information.

• Variable or item level – Full label explaining the meaning of each variable.

http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/

Page 32: Data Management for Undergraduate Researchers (updated - 02/2016)

IJ?

XVAR?

FNAME?

Page 33: Data Management for Undergraduate Researchers (updated - 02/2016)

What goes in a codebook?

• Variable name

• Variable meaning

• Variable data types

• Precision of data

• Units

• Known issues with the data

• Relationships to other variables

• Null values

• Anything else someone needs to better understand the data
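A codebook can be as simple as a CSV file with one row per variable. The sketch below assumes one column per item in the list above; the field names and the two sample variables are invented for illustration.

```python
import csv
import io

# Assumed column set, one per codebook item listed on the slide.
FIELDS = ["name", "meaning", "data_type", "precision", "units",
          "null_value", "known_issues", "related_variables"]

# Hypothetical entries for a soil study.
codebook = [
    {"name": "soil_ph", "meaning": "Acidity of the soil sample",
     "data_type": "float", "precision": "0.1", "units": "pH",
     "null_value": "-999", "known_issues": "Probe recalibrated mid-study",
     "related_variables": "sample_date"},
    {"name": "sample_date", "meaning": "Date the sample was collected",
     "data_type": "date (YYYYMMDD)", "precision": "day", "units": "",
     "null_value": "", "known_issues": "", "related_variables": "soil_ph"},
]


def codebook_to_csv(entries):
    """Serialize codebook entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(columns, entries):
    """List dataset columns that have no codebook entry."""
    documented = {entry["name"] for entry in entries}
    return [column for column in columns if column not in documented]
```

Checking a dataset's column names with `undocumented` flags cryptic variables before anyone else has to guess what they mean.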

Page 34: Data Management for Undergraduate Researchers (updated - 02/2016)

http://www.icpsr.umich.edu/files/deposit/Guide-to-Codebooks_v1.pdf

Page 35: Data Management for Undergraduate Researchers (updated - 02/2016)

Metadata

Unstructured Data

There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells.

Structured Data

Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.

Author: Gary Bradshaw

Date: 1982

Publisher: University of Nebraska Medical Center

Subject: Kidney -- Cytology
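Structured metadata is useful because software can read each field separately. As a sketch, the record above can be stored as a Python dictionary and saved as JSON; the lowercase field names and the helper `find_by_field` are illustrative choices, not a formal metadata standard.

```python
import json

# The structured record from the slide, one key per labeled field.
record = {
    "title": "Growth of rodent kidney cells in serum media and the effect "
             "of viral transformations on growth.",
    "author": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
    "subject": "Kidney -- Cytology",
}


def to_json(rec):
    """Serialize a metadata record as indented JSON text."""
    return json.dumps(rec, indent=2)


def find_by_field(records, field, text):
    """Return records whose given field contains the text (case-insensitive)."""
    return [r for r in records if text.lower() in r.get(field, "").lower()]
```

A search such as `find_by_field([record], "subject", "kidney")` is straightforward with the structured form and unreliable with the free-text sentence.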

Page 37: Data Management for Undergraduate Researchers (updated - 02/2016)

LOCKSS (Lots of Copies Keep Stuff Safe)

Page 38: Data Management for Undergraduate Researchers (updated - 02/2016)

Options for data storage

• Personal computers or laptops

• Networked drives

• External storage devices

Page 39: Data Management for Undergraduate Researchers (updated - 02/2016)

3-2-1 Backup Rule

Have 3 copies of your data

On 2 different media

In more than 1 physical location
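Making copies is only half the job; each copy should also be checked against the original. The sketch below copies a file to several destination folders and compares SHA-256 checksums. The function names and destination folders are hypothetical, and the script cannot confirm that the destinations really sit on different media or in different locations; that part is up to you.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

With the original plus two verified copies (for example an external drive and a synced cloud folder), you have the three copies the rule asks for.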

Page 40: Data Management for Undergraduate Researchers (updated - 02/2016)

Ubox – box.utah.edu

Page 41: Data Management for Undergraduate Researchers (updated - 02/2016)

Language from a DMP

“All data files will be stored on the University server that is backed up nightly. The University's computing network is protected from viruses by a firewall and anti-virus software. Digital recordings will be copied to the server each day after interviews.

Signed consent forms will be stored in a locked cabinet in the office. Interview recordings and transcripts, which may contain personal information, will be password protected at file-level and stored on the server.

Original versions of the files will always be kept on the server. If copies of files are held on a laptop and edits made, their file names will be changed.”

Page 42: Data Management for Undergraduate Researchers (updated - 02/2016)

Thinking long-term

Page 43: Data Management for Undergraduate Researchers (updated - 02/2016)

Archiving options

• Domain-specific repository

• General Purpose Data Repository

• Institutional repository

Page 44: Data Management for Undergraduate Researchers (updated - 02/2016)

When you archive…

• Save the data in both its proprietary and non-proprietary format (e.g. Excel and CSV; Microsoft Word and ASCII)

• Consider any restrictions on your data (copyright, patent, privacy, etc.)

• When possible/mandated/desired, share your data online with a persistent identifier (DOI or ARK)

• Include a data citation and state how you want to get credit for your data

• Link your data to your publications as often as possible

Page 45: Data Management for Undergraduate Researchers (updated - 02/2016)

Major takeaways

• Data management starts at the beginning of a project

• Document your data so that someone else could understand it

• Have more than one copy of your data

• Consider archiving options when you are done with your project

Page 46: Data Management for Undergraduate Researchers (updated - 02/2016)

Questions?

Rebekah [email protected]

(801) 581-7701

Marriott Library, 1705Y

…or ask now!