Data Management … a “nuts-and-bolts” part of Responsible Conduct of Research March 21, 2015 Enid Karr, Sr. Bibliographer for Biology, Earth & Environmental Sciences, Environmental Studies [email protected] Sally Wyman, Collection Development Librarian, Sr. Bibliographer for Chemistry, Physics, Environmental Studies [email protected] Barbara Mento, Data/GIS Librarian, Sr. Bibliographer for Computer Science, Economics, Mathematics [email protected]

Data Management … a “nuts-and-bolts” part of

Responsible Conduct of Research

March 21, 2015

Enid Karr, Sr. Bibliographer for Biology, Earth & Environmental Sciences, Environmental Studies

[email protected]

Sally Wyman, Collection Development Librarian, Sr. Bibliographer for Chemistry, Physics, Environmental Studies

[email protected]

Barbara Mento, Data/GIS Librarian, Sr. Bibliographer for Computer Science, Economics, Mathematics

[email protected]

First – Why?

Fits into “responsible conduct of research”

Reduces the risk of data loss for you and the University

Facilitates fulfillment of requests from others to see your data

Shared data (“open access”) is associated with a higher citation rate!

More (Really Good) Reasons:

Increasingly, grants require a “Data Management Plan”

NSF

NIH

All larger agencies, coming soon, per the White House Directive on Open Data (Feb. 22, 2013)

More scholarly journal policies (Nature, Science, PNAS, PLoS…) require that data be:

Clearly documented … available for sharing … detailed enough to permit replication of analysis

New “data journals” are starting to appear, including Nature’s Scientific Data, which publishes data sets

A “Typical” Data Management Plan

1-2 pages describing the project and how data will be:

Collected (including formats, size, etc.) … Secured … Analyzed … Shared … Preserved

Details about access/sharing

Potential audience(s) for the data

How access will be provided and how others will find it: “Access” (freely-available) vs. “Sharing” (by request)

Stipulations for privacy, confidentiality, IP or other rights

Allowed re-use of the data, derivative products

Metadata standards to be used

How long data will be retained -- archiving, long-term preservation and format migration

From the NSF FAQ on Data Management Plans:

“DMP” covers recorded factual material commonly accepted in the [specific] scientific community as necessary to validate research findings. May include, but is not limited to:

Data

Publications

Samples

Physical collections

Software and models

But not: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. (Office of Management and Budget (OMB) Circular A-110)

Boston College Libraries

Data Management Plan Research Guide

Guidance on content

Templates/examples

Additional resources

To arrange a consultation with a subject specialist

http://libguides.bc.edu/dataplan

Data Management in Action

Some “best practices” while collecting or generating your data

Storage

Documentation

Loss Prevention

Security

Handling … Storing … and Backing Up Your Data

Data Storage Elements to Consider:

File Formats and Naming

Directory Structure

Version Control

Assign Responsibility

Document your practices

Think about all of this EARLY

File Formats

Whenever possible, save your data using open standards. Avoid proprietary formats. Some examples:

TXT, PDF/PDF Archival, not Word (doc, docx)

ASCII, not Excel (xls, xlsx)

MPEG-4, not Quicktime (qtff)

TIFF or JPEG2000, not GIF or JPG

XML or RDF, not RDBMS

Ideally, save files in both original format AND one of the preferred ones listed above.

Why Use Open File Formats?

No restrictions on their use

Open source code makes future migration easier

Proprietary formats are offered by companies that may go out of business, taking the knowledge of the code with them

Facilitates sharing

Organization

File Naming Conventions/Best Practices

Consistent, descriptive, UNIQUE … avoid spaces and special characters

Use brief names

Can contain:

Project acronyms

Researchers’ initials

File type information

Version number

Date

File Status

IUS_v02_092011_final.csv

Internet Usage Study version 2, Sept 2011, final draft, in csv format
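
One way to keep names consistent is to generate and check them with a small script. The sketch below is illustrative only: the pattern (project acronym, two-digit version, MMYYYY date, status, extension) is inferred from the single example above, and the function names `build_name` and `parse_name` are hypothetical.

```python
import re
from datetime import date

# Assumed pattern, inferred from the example IUS_v02_092011_final.csv:
# <project>_v<two-digit version>_<MMYYYY>_<status>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)"
    r"_v(?P<version>\d{2})"
    r"_(?P<month>\d{2})(?P<year>\d{4})"
    r"_(?P<status>[A-Za-z]+)"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


def build_name(project, version, when, status, ext):
    """Assemble a file name with no spaces or special characters."""
    return f"{project}_v{version:02d}_{when:%m%Y}_{status}.{ext}"


def parse_name(filename):
    """Split a file name into its parts, or raise ValueError if it does not fit."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    return match.groupdict()


if __name__ == "__main__":
    name = build_name("IUS", 2, date(2011, 9, 1), "final", "csv")
    print(name)
    print(parse_name(name))
```

Running a check like `parse_name` over a folder is a quick way to spot files that drifted from the group's convention.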

Organization

Directory Structure

Use folders!

Possible ways to organize:

By types of data

IR, NMR, etc.

By experiment

By collection method

Choose the option that works best for your research group … it should be understandable to others
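
A folder layout can be created up front so that everyone in the group starts from the same structure. This is a minimal sketch, assuming a layout of one folder per experiment with one subfolder per data type; the function name `create_project_tree` and the folder names are made up for illustration.

```python
from pathlib import Path


def create_project_tree(root, experiments, data_types):
    """Create <root>/<experiment>/<data type> folders and return their paths."""
    root = Path(root)
    created = []
    for experiment in experiments:
        for data_type in data_types:
            folder = root / experiment / data_type
            # exist_ok=True makes the call safe to repeat on an existing tree.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


if __name__ == "__main__":
    # Hypothetical project: two experiments, each with IR and NMR data.
    for folder in create_project_tree(
        "example_project", ["experiment_01", "experiment_02"], ["IR", "NMR"]
    ):
        print(folder)
```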

Version Control

Keep an archival (unmodified) version, and updated versions (clearly labelled)

Use whole numbers (1, 2, 3) for major changes and decimals for minor changes (V1.1, V1.2 …)

Version control software can help, and some software has this built-in… especially instrument software
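
The numbering rule above can be written down as a short function. This is only a sketch of one possible convention: the name `next_version` is hypothetical, and the part after the decimal point is treated as a simple counter of minor changes.

```python
def next_version(current, major_change):
    """Return the next version label, e.g. V1.1 -> V1.2 (minor) or V2 (major)."""
    label = current.lstrip("Vv")
    major_text, _, minor_text = label.partition(".")
    major = int(major_text)
    minor = int(minor_text or 0)
    if major_change:
        # A major change moves to the next whole number and drops the decimal.
        return f"V{major + 1}"
    # A minor change keeps the whole number and increments the minor counter.
    return f"V{major}.{minor + 1}"


if __name__ == "__main__":
    print(next_version("V1.1", major_change=False))
    print(next_version("V1.2", major_change=True))
```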

Data Entry and Quality Control

Whatever you use, be consistent

Define abbreviations in readme.txt file or in a “codebook”

Record dates for best sorting (YYYYMMDD)

Check periodically for data corruption/integrity using checksum, for example

Flag problematic data

Handling of null values: problematic when moving data across software platforms

Consider using blanks: they are treated as missing values by R, Python (e.g. pandas), and Excel

Don’t use text (as in, “no data”) in a data column formatted for numbers

Avoid manual data entry whenever possible

Consider making your raw data files “read only”
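
Two of the practices above, checksums and read-only raw files, can be sketched with the Python standard library. The example assumes SHA-256 as the checksum (the slides do not name an algorithm), and the function names are hypothetical.

```python
import hashlib
import os
import stat


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Compare the file's current checksum with one recorded earlier."""
    return sha256_of(path) == recorded_checksum


def make_read_only(path):
    """Remove write permission for owner, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

A typical routine would record `sha256_of(path)` in the readme.txt when the raw file is first stored, then call `is_unchanged` during periodic checks.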

Data Documentation (“Metadata”)

What is metadata?

Benefits of good documentation

What elements should be documented?

For help, contact your subject specialist:

www.bc.edu/libraries/help/askalib.html

ISO suggested Minimum Data Elements

• Title
• Creator (Principal Investigators)
• Date Created (also versions)
• Instrument and model
• Format (and software required)
• Subject
• Unique Identifier
• Description of the specific data resource
• Coverage of the data (spatial or temporal)
• Publishing Organization
• Type of Resource
• Rights
• Funding or Grant
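
A metadata record can be checked against the minimum elements before a data set is deposited. The sketch below assumes the record is a plain dictionary keyed by shortened element names; that structure, the key names, and the function name `missing_elements` are illustrative choices, not part of any standard.

```python
# Element names shortened from the list above (an assumption for this sketch).
MINIMUM_ELEMENTS = [
    "Title",
    "Creator",
    "Date Created",
    "Instrument and model",
    "Format",
    "Subject",
    "Unique Identifier",
    "Description",
    "Coverage",
    "Publishing Organization",
    "Type of Resource",
    "Rights",
    "Funding or Grant",
]


def missing_elements(record):
    """Return the minimum elements that are absent or blank in the record."""
    return [
        name
        for name in MINIMUM_ELEMENTS
        if not str(record.get(name, "")).strip()
    ]


if __name__ == "__main__":
    # Hypothetical, incomplete record.
    draft = {"Title": "Internet Usage Study", "Creator": "  "}
    print(missing_elements(draft))
```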

Why Metadata?

It helps others discover your research when you share your data.

This “data about your data” captures the most critical information about a particular project. Capture it early on… you think you will remember, but …

Metadata may be required for journal publication/data deposit.

Metadata Standards

These vary …

by discipline

by type of data

by repository

for example: GenBank

We can help.

Sample GenBank Record – example of a standard

Data Documentation – What do you do with it once you have it?

Record it in a readme.txt file

In some fields, “codebooks” are used to record methodology and other data management notes (e.g. IRB compliance statements, etc.)

Consider including a “data dictionary”

Included with deposited data, these files facilitate “discovery” of your data on the Web
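
A readme.txt with an embedded data dictionary can be generated from a few fields so that it is written early and kept next to the data. This is a sketch under assumptions: the layout of the file, the function name `write_readme`, and the sample values are all hypothetical.

```python
from pathlib import Path


def write_readme(folder, metadata, data_dictionary):
    """Write readme.txt with 'key: value' metadata and a data dictionary section."""
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    lines.append("")
    lines.append("Data dictionary")
    for column, description in data_dictionary.items():
        lines.append(f"  {column}: {description}")
    readme = Path(folder) / "readme.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    path = write_readme(
        ".",
        {"Title": "Internet Usage Study", "Date Created": "20110915"},
        {"resp_id": "Respondent identifier", "hrs_wk": "Hours online per week"},
    )
    print(path.read_text(encoding="utf-8"))
```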

Data Loss Prevention

Regular back-ups protect against data loss

Your backup strategy will depend on your needs:

Back up all versions of the files or certain ones?

How often will you back up files?

Have at least two backup locations

internal (your computer)

external (e.g. the BC Research Data Archive or departmental servers)

Assign responsibility for backing-up
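
The "at least two locations" rule can be enforced in a small backup script that also verifies each copy. This is a minimal sketch: the function name `back_up` is hypothetical, the destinations are ordinary folders standing in for real internal and external storage, and whole files are read into memory, which is only reasonable for small files.

```python
import hashlib
import shutil
from pathlib import Path


def _checksum(path):
    """SHA-256 of the whole file (read in one go; fine for small files)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    destinations = list(destinations)
    if len(destinations) < 2:
        raise ValueError("use at least two backup locations")
    source = Path(source)
    expected = _checksum(source)
    copies = []
    for destination in destinations:
        destination = Path(destination)
        destination.mkdir(parents=True, exist_ok=True)
        target = destination / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if _checksum(target) != expected:
            raise OSError(f"copy at {target} does not match the source")
        copies.append(target)
    return copies
```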

Physical Storage Options

Local: convenient but less secure (especially external media)

• On your own computer’s hard drive
• External media (hard drive, CD/DVD, flash drive)

Centralized: more secure, with automatic back-up … and more space

• ITS
• Departmental server, local network access

Remote: permanent, someone else takes responsibility for future migration

• Disciplinary repositories, e.g. GenBank, Cambridge Structural Database
• Secure cloud options are in use at other institutions

Data Storage

ITS offers a remote, automated backup of faculty and staff computers using a product called Connected Backup by Autonomy. Users can recover files from any location using a web browser.

http://www.bc.edu/offices/help/essentials/backup/ironmtn.html

Research Services provides secure archive space for research data that is backed up nightly.

http://www.bc.edu/offices/researchservices/dataresources/archive.html

Your department may provide its own storage options.

Funding Long-term Data Storage

Who will pay for this? NSF DMP guidelines encourage inclusion of cost information … and grants may pay.

How much of your data will you save? Raw data (untouched) always …

In general, data must be stored for three years (contact Dr. Stephen Erickson at the Boston College Office for Research Integrity and Compliance for more information).

Data Security

For additional assistance with security planning, consult the Computer Policy & Security Office of the IT Assurance Department.

Director: David Escalante

www.bc.edu/offices/its/depts/assurance/policysecurity.html

Data Access and Sharing

Options include:

Personal website

Journal “supplementary materials” (ACS, etc.)

Institutional repository, e.g. eScholarship@bc

Disciplinary (or multidisciplinary) repository

Or a combination: a journal-designated repository (Nature example)

eScholarship@bc

• A repository for BC data sets and publications

• A portal for pointing to your data wherever it is stored (at BC or beyond)

Data Sharing Options Beyond BC

Subject-based archives – ask your subject librarian

Directories of data repositories:

DataBib (Beta) http://databib.org/index.php#

Simmons Data Repositories Listing http://oad.simmons.edu/oadwiki/Data_repositories

Examples of Repositories

Biomedicine:

GenBank -- sequence data

RCSB Protein Data Bank -- biomolecule crystal structure coordinates, etc.

Chemistry:

Cambridge Structural Database (CSD)

PubChem (Part of NCBI Entrez, covering biological activities of small molecules)

Multidisciplinary: FigShare.com (Open, Free)

DMPs: Data Sharing … also Archiving

What does the Data Sharing Policy mean?

Example (NSF): “plans for archiving data, samples, and other research products, and for preservation of access to them.”

Archiving data means preserving the data not just in the original format but also in a platform-independent format, using a standard that ensures the data can be re-used in the future.

Metadata is vital to ensure data is findable.

Ethics and Privacy

Sensitive data should be redacted before depositing in a public archive or repository.

Access to data may be embargoed (access limited for a time) for confidentiality, legal, patentability or other reasons.

Dark archives ensure permanent protection of confidentiality.

Where human subjects/privacy is involved, BC’s Institutional Review Board (IRB) must approve. http://www.bc.edu/research/oric/human.html

Data Ownership

You may have copyright or ownership concerns when planning to share your data.

For assistance and more information, please contact the Boston College Office for Research Integrity and Compliance:

http://www.bc.edu/content/bc/research/oric/compliance.html

Intellectual Property/Technology Transfer Concerns

Funders/journals expect that you will share your data within a reasonable amount of time …

However, they also recognize the need to protect intellectual property rights and potential commercial value

The DMP should describe your plans to protect those rights

Contact the Boston College Office for Technology Transfer and Licensing as part of your DMP writing process

Research Output

Data Citations

Why should I cite data?

Ensures that original producers of the data (you!) are credited in citation indexes*

Allows researchers to locate research data used in an article

May be required by the archive that stored the data you have repurposed

*Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308

Citing Data Sets

Essential citation elements; style will vary:

• author or creator

• title or description

• year of publication

• publisher and/or the database/archive from which it was retrieved

• the URL or DOI if the data set is online

Mackey, R.A., Mackey, E.F., and O’Brien, B.A. (1990). Lasting relationships research data archive (eScholarship version) [Data file]. Boston College School of Social Work. http://hdl.handle.net/2345/2228

National Center for Biotechnology Information. PubChem Compound Database; CID=5934766, http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5934766 (accessed Feb. 22, 2011).
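
The essential elements can be assembled into a citation string programmatically. The sketch below follows the layout of the first example above; as the slide notes, style will vary, so the arrangement, the "[Data file]" label, and the function name `format_data_citation` are assumptions for illustration.

```python
def format_data_citation(authors, year, title, publisher, locator):
    """Build a data citation from the essential elements; all are required."""
    elements = {
        "authors": authors,
        "year": year,
        "title": title,
        "publisher": publisher,
        "locator": locator,  # URL or DOI
    }
    missing = [name for name, value in elements.items() if not str(value).strip()]
    if missing:
        raise ValueError("missing citation elements: " + ", ".join(missing))
    return f"{authors} ({year}). {title} [Data file]. {publisher}. {locator}"


if __name__ == "__main__":
    print(
        format_data_citation(
            "Mackey, R.A., Mackey, E.F., and O'Brien, B.A.",
            1990,
            "Lasting relationships research data archive (eScholarship version)",
            "Boston College School of Social Work",
            "http://hdl.handle.net/2345/2228",
        )
    )
```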

Additional Support

The Libraries

The Data Management LibGuide libguides.bc.edu/dataplan

Subject Specialists www.bc.edu/libraries/help/askalib.html

[email protected]

The Office for Sponsored Programs Research http://www.bc.edu/research/osp.html

ITS/Research Services http://www.bc.edu/offices/researchservices/

Office for Research Integrity and Compliance http://www.bc.edu/research/oric/compliance.html

The Office for Technology Transfer and Licensing http://www.bc.edu/research/ottl/

Some Useful Links

Data Management and Sharing Snafu in 3 Short Acts (NYU Health Sciences Library) https://www.youtube.com/watch?v=N2zK3sAtr-4

DataONE Best Practices https://www.dataone.org/all-best-practices-download-pdf

DCC (Digital Curation Centre) Disciplinary Metadata Standards http://www.dcc.ac.uk//resources/metadata-standards

DCC (Digital Curation Centre) Metadata Standards – Physical Sciences http://www.dcc.ac.uk/resources/subject-areas/physical-science

Guide to Writing “Readme” Style Metadata (Cornell) http://data.research.cornell.edu/content/readme

Questions?