It’s 2015.
Do You Know Where Your Data Are?
Professional Development Seminar
Demography 590
Penn State University
22 October 2015
This presentation is licensed CC BY 4.0.
Patricia Hswe | University Libraries Co-department Head, Publishing and Curation Services
Digital Content Strategist and Head, ScholarSphere User Services
http://www.libraries.psu.edu/psul/pubcur.html
[email protected] | 867-3702
This is . . . data?
I’m confused by Brian Moore via Flickr CC BY-SA
1108845-godzilla_facepalm_godzilla_facepalm_face_palm_epic_fail_demotivational_poster_1245384435_super by Patty Marvel via Flickr CC BY-NC-ND
What we’ll talk about
• What’s the future of your data?
• Tips, tools, resources for managing data
• DMPs – What are they?
• Discussion: questions, comments, concerns?
WHAT’S THE FUTURE OF YOUR DATA?
“The Availability of Research Data Declines Rapidly with Article Age.” (Title of a 2014 article by Vines et al.)
“The major cause of the reduced data availability for older papers was the rapid increase in the proportion of data sets reported as either lost or on inaccessible storage media.”
Forty years of removable storage by David Smith via Flickr CC BY
“The odds that we were able to find an apparently working e-mail address (either in the paper or by searching online) for any of the contacted authors did decrease by about 7% per year.”
e-mail symbol by Micky Aldridge via Flickr CC BY
“Unfortunately, many of these missing data sets could be retrieved only with considerable effort by the authors, and others are completely lost to science.”
• The implications are apparent.
• What can researchers begin doing differently?
NIH Data Sharing Policy (required for proposed projects > $500K)
• When will you make the data available?
• What file formats will you use for your data, and why?
• What transformations will be necessary to prepare data for preservation/data sharing?
• What metadata/documentation will be submitted alongside the data?
• Will a data-sharing agreement be required? What will the agreement state?
• What are your plans for providing access to your data?
• Which archive/repository/central database have you identified as a place to deposit data?
Quick tips and best practices
• Lifecycle mindset for research and data
• File-naming conventions
• Standards for description
• File formats
• Storage
Tool library by takomabibelot via Flickr CC BY
From DataONE Best Practices
https://www.dataone.org/best-practices
Reflect on the “during” & end of research data at the beginning
File-naming conventions
• Consistency
– Patterns
• Descriptiveness
– Keywords
– “Aboutness” / content
• Versions
– Which versions need to be saved, tracked?
• Major components (will depend on type of research)
– Project name
– Content of the file
– Date
– Version number
– Location
– Instrument name / number
1108845-godzilla_facepalm_godzilla_facepalm_face_palm_epic_fail_demotivational_poster_1245384435_super - NOT A USEFUL FILE NAME!
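The components listed above can be combined into a consistent pattern. What follows is only a minimal sketch: the function name `build_file_name`, the ordering of the components, the separators, and the sample values are all assumptions made for illustration, not a convention prescribed in this presentation. Adapt the pattern to whatever your discipline or project already uses.

```python
import re
from datetime import date


def build_file_name(project, content, when, version,
                    location=None, instrument=None, extension="csv"):
    """Assemble a descriptive file name from the major components.

    The order (project, content, location, instrument, date, version)
    is an illustrative choice, not a standard.
    """
    def slug(text):
        # Lowercase and replace runs of other characters with a hyphen,
        # so names stay readable and safe across platforms.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [slug(project), slug(content)]
    if location:
        parts.append(slug(location))
    if instrument:
        parts.append(slug(instrument))
    parts.append(when.strftime("%Y%m%d"))   # YYYYMMDD sorts chronologically
    parts.append(f"v{version:02d}")         # zero-padded version number
    return "_".join(parts) + "." + extension


# Hypothetical example values
name = build_file_name("Demography 590", "Household Survey",
                       date(2015, 10, 22), 3, location="Centre County")
print(name)  # demography-590_household-survey_centre-county_20151022_v03.csv
```

Generating names from one function, rather than typing them by hand, is one way to keep the pattern consistent across a project.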
Data description for access/use
• What standards does your discipline use to describe information?
– Darwin Core
– DDI (Data Documentation Initiative)
• README.TXT
• Consult librarians to assist with describing/documenting
Old Standard Fireworks Poster by Epic Fireworks via Flickr CC BY
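Where no disciplinary standard applies, a plain README.TXT is a reasonable starting point. The sketch below generates a skeleton to fill in. The field names in `README_FIELDS` and the function `render_readme` are assumptions chosen for illustration; they are not drawn from Darwin Core, DDI, or any other standard.

```python
# Hypothetical set of fields; adjust to what your project needs to record.
README_FIELDS = [
    "Title",
    "Creator",
    "Contact",
    "Date collected",
    "Description",
    "File list",
    "File formats",
    "Methods",
    "License / conditions of use",
]


def render_readme(values):
    """Return README text, marking any field not supplied as TODO."""
    lines = [f"{field}: {values.get(field, 'TODO')}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


# Hypothetical example values
text = render_readme({
    "Title": "Household survey, wave 1",
    "File formats": "CSV (data), PDF (codebook)",
})
print(text)
```

Writing the result to a README.TXT file that travels with the data keeps the description and the files together.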
File formats – be intentional about them
• Open rather than proprietary
– Interoperable, usable across platforms
• What’s commonly used in your community / discipline?
• Formats for use vs. formats for archiving
– PNG or JPG vs. TIFF
– Word vs. PDF
Storage – spread / repeat / copy
• Distribution and redundancy
– Keep the same files in more than one place
– Local options: internal (computer, laptop) hard drive; external hard drive; college/department servers
– Campus enterprise services: Box, Tivoli Storage Manager, High Performance Computing (may cost)
– Cloud services: Dropbox, Box, Spideroak, Amazon Web Services
• At least 3 copies
• Have master files from which copies get made
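Keeping several copies helps only if the copies still match the master file. One common way to check is to compare checksums. The sketch below uses SHA-256 from Python's standard library; the function names `sha256_of` and `check_copies` and the file paths in the demonstration are hypothetical, and the temporary directory stands in for real storage locations.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(master, copies):
    """Compare each copy against the master; report ok / differs / missing."""
    expected = sha256_of(master)
    report = {}
    for copy in copies:
        copy = Path(copy)
        if not copy.exists():
            report[str(copy)] = "missing"
        elif sha256_of(copy) == expected:
            report[str(copy)] = "ok"
        else:
            report[str(copy)] = "differs"
    return report


# Demonstration with throwaway files in a temporary directory
with tempfile.TemporaryDirectory() as folder:
    master = Path(folder) / "master.csv"
    good = Path(folder) / "external_drive_copy.csv"
    bad = Path(folder) / "cloud_copy.csv"
    master.write_text("id,value\n1,10\n")
    good.write_text("id,value\n1,10\n")
    bad.write_text("id,value\n1,11\n")   # simulated corruption or edit
    demo_report = check_copies(master, [good, bad, Path(folder) / "lost.csv"])
    for location, status in demo_report.items():
        print(status, location)
```

Running a check like this on a schedule is one way to notice a damaged or outdated copy while a good one still exists.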
NIH Data Sharing Policy (required for proposed projects > $500K)
• When will you make the data available?
• What file formats will you use for your data, and why?
• What transformations will be necessary to prepare data for preservation/data sharing?
• What metadata/documentation will be submitted alongside the data?
• Will a data-sharing agreement be required? What will the agreement state?
• What are your plans for providing access to your data?
• Which archive/repository/central database have you identified as a place to deposit data?
Each funding agency seemingly has its own DMP requirements
But commonalities exist:
• Expected data?
• Data retention?
• Data formats?
• Dissemination of data?
• Data preservation?
• Access to data?
• Whose responsibility in the project?
Snowflake-017 by yellowcloud via Flickr CC BY
Restricted data and DMPs
• Security measures to protect data?
• How will data be anonymized? Deidentified?
• Consent forms? Will possibility of sharing be addressed in consent forms?
• Policy for sharing parts of the data? Conditions of use?
• Embargoes?
• Where will data be kept? For how long?
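As one small illustration of the de-identification question, the sketch below drops direct identifiers and replaces a participant ID with a keyed hash (HMAC-SHA256 from Python's standard library). This is pseudonymization, not full anonymization: the remaining fields may still allow re-identification, and the technique does not by itself establish compliance with any policy or regulation. The function names, field names, and sample record are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Replace an identifier with a keyed hash, truncated for readability.

    The same identifier and key always give the same pseudonym, so records
    can still be linked; keep the key separate from the shared data.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(records, id_field, drop_fields, secret_key):
    """Drop the listed fields and pseudonymize the ID field in each record."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in drop_fields}
        row[id_field] = pseudonymize(str(record[id_field]), secret_key)
        cleaned.append(row)
    return cleaned


# Hypothetical example record and key
key = b"replace-with-a-long-random-secret"
records = [{"participant_id": "A-001", "name": "Pat Example",
            "email": "pat@example.org", "age_group": "30-39"}]
shared = deidentify(records, "participant_id", {"name", "email"}, key)
print(shared)
```

Whether this level of protection is adequate is a question for the institutional review process and the policies cited in this section.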
Restricted data guidance
• “Restricted Use Data Management at ICPSR”
• “Managing sensitive research data” – U. Bristol, U.K.
• Review what our institution states in Research Administration Guidelines / Policies.
• Evaluate for sensitivity.
• Comply, if relevant – e.g., HIPAA, FERPA.
• Enable restricted use / access, if possible.
Tools / Resources / Services
• Training
– MANTRA: http://datalib.edina.ac.uk/mantra/
– Penn State’s DMP Tutorial: https://www.e-education.psu.edu/dmpt/
• Resources
– DMPTool: https://dmp.cdlib.org/
– re3data - data repository index: http://www.re3data.org/
– PSU resources: Penn State boilerplate language and Penn State DMP local guidance
• Services
– ScholarSphere: https://scholarsphere.psu.edu/
• Sandbox environment: https://scholarsphere-demo.dlt.psu.edu/
– Libraries also consult, teach, review DMPs
Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, Yolanda Gil, Paul Groth, Margaret Hedstrom, David W. Hogg, Vinay Kashyap, Ashish Mahabal, Aneta Siemiginowska, Aleksandra Slavkovic. 2014.
“Ten Simple Rules for the Care and Feeding of Scientific Data.”
PLoS Comput Biol 10 (4): e1003542. doi:10.1371/journal.pcbi.1003542.
A few of the rules
• Practice science with a certain level of reuse in mind
• Publish workflow as context
• Link your data to your publications
• Publish your code
• Say how you want to be credited for your data
• Foster and use data repositories as much as possible.
Reuse by GotCredit via Flickr CC BY
So, plan for the future of your data.
Questions? Comments? Feedback? Words of wisdom?
Keep in touch: Patricia Hswe | [email protected]
future soon by krupp via Flickr