It’s 2015. Do You Know Where Your Data Are? Professional Development Seminar Demography 590 Penn State University 22 October 2015 This presentation is licensed CC BY 4.0.

Demography pro sem



Patricia Hswe | University Libraries Co-department Head, Publishing and Curation Services

Digital Content Strategist and Head, ScholarSphere User Services

http://www.libraries.psu.edu/psul/pubcur.html

[email protected] | 867-3702

Data accountability—or lack thereof—keeps making the news.

This is . . . data?

I’m confused by Brian Moore via Flickr CC BY-SA

1108845-godzilla_facepalm_godzilla_facepalm_face_palm_epic_fail_demotivational_poster_1245384435_super by Patty Marvel via Flickr CC BY-NC-ND

What we’ll talk about

• What’s the future of your data?

• Tips, tools, resources for managing data

• DMPs – What are they?

• Discussion: questions, comments, concerns?

WHAT’S THE FUTURE OF YOUR DATA?

“The Availability of Research Data Declines Rapidly with Article Age.” (Title of a 2014 article by Vines et al.)

“The major cause of the reduced data availability for older papers was the rapid increase in the proportion of data sets reported as either lost or on inaccessible storage media.”

Forty years of removable storage by David Smith via Flickr CC BY

“The odds that we were able to find an apparently working e-mail address (either in the paper or by searching online) for any of the contacted authors did decrease by about 7% per year.”

e-mail symbol by Micky Aldridge via Flickr CC BY

“Unfortunately, many of these missing data sets could be retrieved only with considerable effort by the authors, and others are completely lost to science.”

• The implications are apparent.

• What can researchers begin doing differently?

MANAGE YOUR RESEARCH DATA NOW

Be proactive!

NIH Data Sharing Policy (required for proposed projects > $500K)

• When will you make the data available?

• What file formats will you use for your data, and why?

• What transformations will be necessary to prepare data for preservation/data sharing?

• What metadata/documentation will be submitted alongside the data?

• Will a data-sharing agreement be required? What will the agreement state?

• What are your plans for providing access to your data?

• Which archive/repository/central database have you identified as a place to deposit data?

Quick tips and best practices

• Lifecycle mindset for research and data

• File-naming conventions

• Standards for description

• File formats

• Storage

Tool library by takomabibelot via Flickr CC BY

From DataONE Best Practices: https://www.dataone.org/best-practices

Reflect on the “during” & end of research data at the beginning

File-naming conventions

• Consistency

– Patterns

• Descriptiveness

– Keywords

– “Aboutness” / content

• Versions

– Which versions need to be saved, tracked?

• Major components (will depend on type of research)

– Project name

– Content of the file

– Date

– Version number

– Location

– Instrument name / number

1108845-godzilla_facepalm_godzilla_facepalm_face_palm_epic_fail_demotivational_poster_1245384435_super - NOT A USEFUL FILE NAME!
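The components listed above can be combined mechanically so that every file in a project follows the same pattern. The following is a minimal sketch, not a prescribed standard: the function name `build_filename`, the component order, the underscore/hyphen separators, and the sample project are all assumptions made for illustration.

```python
import re
from datetime import date


def build_filename(project, content, when, version, ext,
                   location=None, instrument=None):
    """Join the major components into one consistent, sortable file name.

    Hypothetical helper: the order and separators are illustrative choices.
    """
    def slug(text):
        # Lowercase, and turn any run of non-alphanumerics into one hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [slug(project), slug(content)]
    if location:
        parts.append(slug(location))
    if instrument:
        parts.append(slug(instrument))
    parts.append(when.strftime("%Y%m%d"))   # ISO-style date sorts chronologically
    parts.append(f"v{version:02d}")         # zero-padded version sorts correctly
    return "_".join(parts) + "." + ext.lstrip(".")


# Made-up example values
print(build_filename("Fertility Survey", "interview transcripts",
                     date(2015, 10, 22), 3, "csv"))
# prints: fertility-survey_interview-transcripts_20151022_v03.csv
```

Putting the date in year-month-day order and zero-padding the version number keeps files in a sensible order when a folder is sorted by name.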

Data description for access/use

• What standards does your discipline use to describe information?

– Darwin Core

– DDI (Data Documentation Initiative)

• README.TXT

• Consult librarians to assist with describing/documenting

Old Standard Fireworks Poster by Epic Fireworks via Flickr CC BY
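A README file is easier to keep consistent across data sets when it starts from a fixed template. The sketch below is one possible starting point, not a disciplinary standard: the field names in `README_FIELDS` and the helper `render_readme` are assumptions for illustration, and a librarian or a standard such as DDI may call for different fields.

```python
# Hypothetical field list; adapt it to your discipline's conventions.
README_FIELDS = [
    "Title",
    "Creator",
    "Contact",
    "Date collected",
    "Description",
    "File list",
    "Variables and units",
    "License / terms of use",
]


def render_readme(values):
    """Return README text with one 'Field: value' line per template field.

    Fields that were not supplied are marked TODO so gaps stay visible.
    """
    lines = [f"{field}: {values.get(field, 'TODO')}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


# Made-up example values
text = render_readme({"Title": "Household survey, wave 1",
                      "Creator": "A. Researcher"})
print(text)
```

The returned text can be saved as README.TXT next to the data files it describes.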

File formats – be intentional about them

• Open rather than proprietary

– Interoperable, usable across platforms

• What’s commonly used in your community / discipline?

• Formats for use vs. formats for archiving

– PNG or JPG vs. TIFF

– Word vs. PDF

Storage – spread / repeat / copy

• Distribution and redundancy

– Keep the same files in more than one place

– Local options: internal (computer, laptop) hard drive; external hard drive; college/department servers

– Campus enterprise services: Box, Tivoli Storage Manager, High Performance Computing (may cost)

– Cloud services: Dropbox, Box, Spideroak, Amazon Web Services

• At least 3 copies

• Have master files from which copies get made
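Redundancy only helps if the copies still match the master file. One common way to check is to compare checksums. The sketch below is illustrative only: the helper names `sha256_of` and `check_copies` are hypothetical, and it assumes the master file counts as one of the three copies.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(master, copies, minimum=3):
    """Compare each copy with the master file.

    Returns (report, enough): report maps each copy path to "ok",
    "differs", or "missing". Assumption: the master itself counts
    as one of the `minimum` copies.
    """
    master_hash = sha256_of(master)
    report = {}
    for copy in map(Path, copies):
        if not copy.is_file():
            report[str(copy)] = "missing"
        elif sha256_of(copy) == master_hash:
            report[str(copy)] = "ok"
        else:
            report[str(copy)] = "differs"
    good = 1 + sum(status == "ok" for status in report.values())
    return report, good >= minimum


# Demo with throwaway files; real copies would sit on separate devices or services.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    master = root / "master.csv"
    master.write_text("id,age\n1,34\n")
    backup_a = root / "backup_a.csv"
    backup_a.write_text("id,age\n1,34\n")
    backup_b = root / "backup_b.csv"
    backup_b.write_text("id,age\n1,35\n")  # a silently altered copy
    report, enough = check_copies(master, [backup_a, backup_b])
    print(sorted(report.values()), enough)
```

In the demo, one backup matches and one differs, so only two good copies exist and the three-copy check fails.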

DATA MANAGEMENT PLANS

What funding agencies expect

NIH Data Sharing Policy (required for proposed projects > $500K)

• When will you make the data available?

• What file formats will you use for your data, and why?

• What transformations will be necessary to prepare data for preservation/data sharing?

• What metadata/documentation will be submitted alongside the data?

• Will a data-sharing agreement be required? What will the agreement state?

• What are your plans for providing access to your data?

• Which archive/repository/central database have you identified as a place to deposit data?

Each funding agency seemingly has its own DMP requirements

But commonalities exist:

• Expected data?

• Data retention?

• Data formats?

• Dissemination of data?

• Data preservation?

• Access to data?

• Whose responsibility in the project?

Snowflake-017 by yellowcloud via Flickr CC BY

Restricted data and DMPs

• Security measures to protect data?

• How will data be anonymized? Deidentified?

• Consent forms? Will possibility of sharing be addressed in consent forms?

• Policy for sharing parts of the data? Conditions of use?

• Embargoes?

• Where will data be kept? For how long?

Restricted data guidance

• “Restricted Use Data Management at ICPSR”

• “Managing sensitive research data” – U. Bristol, U.K.

• Review what our institution states in Research Administration Guidelines / Policies.

• Evaluate for sensitivity.

• Comply, if relevant – e.g., HIPAA, FERPA.

• Enable restricted use / access, if possible.

DEMOS OF TOOLS/RESOURCES/SERVICES

Tools / Resources / Services

• Training

– MANTRA: http://datalib.edina.ac.uk/mantra/

– Penn State’s DMP Tutorial: https://www.e-education.psu.edu/dmpt/

• Resources

– DMPTool: https://dmp.cdlib.org/

– re3data - data repository index: http://www.re3data.org/

– PSU resources: Penn State boilerplate language and Penn State DMP local guidance

• Services

– ScholarSphere: https://scholarsphere.psu.edu/

– ScholarSphere sandbox environment: https://scholarsphere-demo.dlt.psu.edu/

– Libraries also consult, teach, review DMPs

Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, Yolanda Gil, Paul Groth, Margaret Hedstrom, David W. Hogg, Vinay Kashyap, Ashish Mahabal, Aneta Siemiginowska, Aleksandra Slavkovic. 2014.

“Ten Simple Rules for the Care and Feeding of Scientific Data.”

PLoS Comput Biol 10 (4): e1003542. doi:10.1371/journal.pcbi.1003542.

A few of the rules

• Practice science with a certain level of reuse in mind

• Publish workflow as context

• Link your data to your publications

• Publish your code

• Say how you want to be credited for your data

• Foster and use data repositories as much as possible

Reuse by GotCredit via Flickr CC BY

So, plan for the future of your data.

Questions? Comments? Feedback? Words of wisdom?

Keep in touch: Patricia Hswe | [email protected]

future soon by krupp via Flickr