Data Storage & Preservation
Luke Bluma | Brianna Marshall | Elliott Shuppy
IGERT workshop | November 2014
STORAGE
Outline
• Problem with Storage
• Storage vs Backup
• Storage Types
• UW-Madison Options
• Personal Options
• Best Practices
• Use Cases
• Key Takeaways
The Problem with Storage
• It’s everywhere!
• All the options seem similar
but slightly different
• Every use case is a little different
Storage vs Backup
Storage
Your working files. The files you access regularly and
change frequently. You need to store data safely and
securely but you also need to have access to it. In general,
losing your storage means losing current versions of the
data.
Storage vs Backup
Backup
A frequent and regular process of copying your data to a
secure place that is separate from where you keep your
storage. Backup can be overlooked because you don’t
really need it until you lose data, but when you need to
restore a file it can be the most important process you have
in place.
Rule of 3
• Keep THREE copies of your data
– TWO onsite
– ONE offsite
• Example:
– One: Network Drive
– Two: External Hard Drive
– Three: Cloud Storage
• This ensures that your storage and backup are not all in the same place – that’s too risky!
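The rule is simple enough to check mechanically. The sketch below is only an illustration: the `DataCopy` class and `follows_rule_of_three` function are hypothetical names, and it assumes the network drive and external hard drive in the example count as onsite and the cloud storage as offsite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of a dataset; `offsite` is True if it lives away from your workplace."""
    label: str
    offsite: bool


def follows_rule_of_three(copies):
    """Return True if there are at least three copies: two onsite and one offsite."""
    onsite = sum(1 for c in copies if not c.offsite)
    offsite = sum(1 for c in copies if c.offsite)
    return len(copies) >= 3 and onsite >= 2 and offsite >= 1


# The example from the slide (the onsite/offsite labels are an assumption).
example = [
    DataCopy("Network Drive", offsite=False),
    DataCopy("External Hard Drive", offsite=False),
    DataCopy("Cloud Storage", offsite=True),
]
print(follows_rule_of_three(example))
```

Three copies that all sit in the same building would fail this check, which is exactly the risk the rule is meant to avoid.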
Storage Types
• Local storage
– Hard drive, external hard
drive, thumb drive, etc.
• Network storage
– Private cloud, public cloud,
etc.
• Private Cloud = network
storage run by UW
• Public Cloud = network
storage run by vendor
UW Data - Storage Options
• Local Storage/Backup Options
– External Hard Drive (TechStore)
• Local IT Options
– Services available depend on your local IT department
• DoIT Options
– Storage: File and Block Storage
– Backup: Bucky Backup Lite
• Cloud Options
– UW’s Box Account
UW Data – DoIT Options
• Storage: File and Block Storage
– File: easy to access, manage and share with other UW folks
– Block: additional raw storage available over the network for your server
• Backup: Bucky Backup Lite
– Client runs on your computer or server and does incremental backups nightly
– You can manage the retention policy and version control
• Cloud Storage
– UW’s Box Account
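To make "incremental" concrete: an incremental backup copies only what is new or has changed since the last run, rather than everything. The sketch below illustrates that idea with the Python standard library. It is not how Bucky Backup Lite works internally; the `incremental_backup` function and its change test (missing, different size, or older backup copy) are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def incremental_backup(source, destination):
    """Copy files from source to destination only if they are new or changed.

    A file counts as changed when the backup copy is missing, has a different
    size, or is older than the source. Returns the relative paths copied.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        if dst_file.exists():
            src_stat, dst_stat = src_file.stat(), dst_file.stat()
            unchanged = (src_stat.st_size == dst_stat.st_size
                         and src_stat.st_mtime <= dst_stat.st_mtime)
            if unchanged:
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

Running it twice in a row on an untouched folder should copy nothing the second time. Note that this sketch keeps no old versions and never deletes anything, so it does not cover retention policy or version control.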
Personal Data - Storage Options
• Personal Data
– Your personal UW data: UW’s Box Account
– Your personal data: thumb drive, external hard drive, or cloud options like Box, CrashPlan, Dropbox, etc.
• CrashPlan discount: 30% off at http://go.wisc.edu/crashplan
Evaluating Cloud Services
• Lots of options out there – and not all are
created equal
• Read the Terms of Service!
• Servers get hacked regularly. Whatever you’re storing, you don’t want your provider to be able to read it.
• Data encryption is your friend.
Storage & Backup Best Practices
• Think about and plan your data management
strategy before storing data
• If the data has ANY value to you, back it up
• If you have questions, ask for help! Local IT,
RDS, peers, friends, etc.
• Network storage is great, but think about
having a plan in place if you need to access
the data and the network is down
Storage & Backup Best Practices
• Put in the appropriate security measures
• Version control can be important especially when sharing data – plan ahead
• Document who has access to the data and audit that on a regular basis
• Test your backups – make sure they are working and you can actually restore a file
• If you use cloud storage, think about an exit strategy
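One way to test a backup is to restore a file and confirm it is byte-for-byte identical to the original. The sketch below compares SHA-256 checksums using Python's standard `hashlib` module; the function names `sha256_of` and `restore_matches_original` are hypothetical, and how you restore the file depends on your backup tool.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_original(original, restored):
    """Return True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

If the working file has changed since the backup ran, a mismatch is expected, so compare against a file you know was unchanged or record its checksum at backup time.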
Use Case 1 – Starting Fresh
• If you have a local IT person, contact them
first to talk about services available
• Contact RDS about a data management plan
• If local IT doesn’t have service offerings,
contact DoIT
• If all else fails – at least plan out your data
management strategy (storage, backup, etc.)
before starting to collect/use data
Use Case 2 – Leaving UW
• UW Data
– If you have a local IT person, contact them
– If someone will be taking over your work, give them access to a shared space like Box
– If you are using DoIT services, make sure someone else still on campus has access to the data
– If you don’t have local IT and aren’t using shared services, but think the data is valuable to UW, contact RDS
• Personal Data
– If you are using UW Box, transfer the data to a personal Box/Dropbox/cloud account
– Or purchase an external hard drive and transfer the data that way
Key Takeaways
• Figure out your storage requirements
– High security? Remote access? Ease of use? Scalability?
• Ask around – people are happy to help!
– Local IT, Peers, Friends, Family, etc.
• Rule of 3
– 2 onsite, 1 offsite – better safe than sorry
• Test it!
– Make sure it works as advertised and do some disaster testing
PRESERVATION
Storage & Backup vs. Preservation
Storage & Backup = short-term
– Working copies
– Expected to change
Preservation = long-term
– Usually the final, “fixed” version(s)
Thinking Long-Term
• The data you’ve carefully stored is only useful if it’s readable and understandable
• Many factors affect this:
– Media
• What software did you use to create the data? Does hardware exist to access it?
– Metadata
• How much contextual information accompanies your data? Can you understand it? Can a stranger understand it?
– Organization
• Is it all jumbled together? Or have you organized it meaningfully? Do you know where your data is?
Thinking Long-Term
• None of the concepts discussed during this
workshop exist in a vacuum
• Some aspects of preservation feel out of our control, or like too much work
• The truth? It is confusing to plan ahead for
our data in a landscape of quickly changing
services…
• … but it’s worth it.
Time to Ponder
• Can you still access your data from…
– 20 years ago?
– 10 years ago?
– 5 years ago?
– 1 year ago?
Let’s talk about the data you’ve kept and lost.
Unreadable Data
CULPRITS
• Obsolete media
• Obsolete software &
file formats
• Obsolete hardware
CC image by Flickr user wlef70
Unreadable Data: Solutions
Now
- Start researching. (Google!) Odds are someone else
has faced the same issue.
- Digital forensics tools such as BitCurator can provide
guidance: http://www.bitcurator.net/
- Don’t assume your data is gone for good.
- Contact me to brainstorm.
Unreadable Data: Solutions
Moving forward
• Today’s popular software can become obsolete through business deals, new versions, or a gradual decline in user base. (Consider WordPerfect.)
• Anticipate an average media lifespan of 3-5 years. Migrate your files every few years, if not more frequently!
• Some file formats are less susceptible to obsolescence than others
– Open, non-proprietary formats (pick TXT over DOCX, CSV over XLSX, TIF over JPG)
– Wide adoption
– History of backward compatibility
– Metadata support in open format (XML)
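A first step toward migration is simply finding the files worth converting. The sketch below walks a folder and lists files whose extensions appear in the slide's examples, along with the suggested alternative. The `OPEN_ALTERNATIVES` mapping and the `migration_candidates` function are hypothetical names; the sketch only lists candidates and does not convert anything, since conversion tools vary and can lose formatting.

```python
from pathlib import Path

# Suggested alternatives, taken from the examples on the slide.
OPEN_ALTERNATIVES = {".docx": ".txt", ".xlsx": ".csv", ".jpg": ".tif"}


def migration_candidates(folder):
    """Return (relative path, suggested extension) pairs for files to consider migrating."""
    folder = Path(folder)
    found = []
    for path in sorted(folder.rglob("*")):
        suggestion = OPEN_ALTERNATIVES.get(path.suffix.lower())
        if path.is_file() and suggestion:
            found.append((path.relative_to(folder).as_posix(), suggestion))
    return found
```

Extend the mapping with whatever formats matter in your own discipline.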
Lost Data
Now
• Do a data inventory. List all the places where your data lives (both physical and digital)
• Plan for consolidating – follow the rule of 3, not the rule of 17
Moving forward
• Too many copies can be a headache: hard to keep track of versions and know what is where. It makes sense to start a data inventory to track your data, especially at the beginning of a big project with many people and moving parts.
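A data inventory for digital files can be started with a short script. The sketch below lists every file under a set of folders and records its size, modification time, and SHA-256 checksum, then saves the result as a CSV. The function names and column names are assumptions for illustration; physical media still need to be listed by hand.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def build_inventory(locations):
    """Return one row (a dict) per file found under the given folders."""
    rows = []
    for location in locations:
        root = Path(location)
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "location": str(root),
                "file": path.relative_to(root).as_posix(),
                "bytes": info.st_size,
                "modified_utc": modified.isoformat(timespec="seconds"),
                # Reads the whole file into memory; fine for a sketch, slow for huge files.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return rows


def write_inventory(rows, csv_path):
    """Save the inventory as a CSV file that any spreadsheet tool can open."""
    fields = ["location", "file", "bytes", "modified_utc", "sha256"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Rows that share a checksum are identical copies, which helps when deciding which of the 17 copies to keep.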
Decontextualized Data
Coded SPSS survey responses (useless without the original questionnaires)
Decontextualized Data: Solutions
Now
• Write contextual information in the form of a readme file and/or scan written notes.
• Publish it as an additional bitstream alongside your datasets.
• Accept that some old data will never have necessary contextual information. Is it worth it to preserve it?
Moving forward
• Take the time to create metadata.
• At the very least, create a readme file. (Good example located here: http://hdl.handle.net/2022/17155)
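If a blank page is the obstacle, a template helps. The sketch below writes a plain-text `README.txt` skeleton into a data folder and marks unfilled fields as TODO. The field list in `README_FIELDS` and the `write_readme` function are hypothetical; adapt the fields to your discipline or to your repository's requirements.

```python
from pathlib import Path

# Hypothetical set of fields; not an official standard.
README_FIELDS = [
    "Title",
    "Creator",
    "Contact",
    "Date collected",
    "Description",
    "File list",
    "Variable and code definitions",
    "Software needed",
]


def write_readme(folder, details):
    """Write README.txt in `folder`, leaving 'TODO' for any field not supplied."""
    lines = [f"{field}: {details.get(field, 'TODO')}" for field in README_FIELDS]
    readme = Path(folder) / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

For coded survey data, the "Variable and code definitions" field is where you would explain what each code means or point to the original questionnaire.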
Repositories
Disciplinary repositories provide a good home
for data, often with the requirement that you
share it openly.
DataONE: https://www.dataone.org/
Dryad: http://datadryad.org/
Knowledge Network for Biocomplexity:
https://knb.ecoinformatics.org/
Databib & re3data
These two registries of data repositories plan to merge their projects into one service by the end of 2015.
Institutional Help with Preservation
• The institutional repository (IR) is not yet up to the task of managing data… but that’s in the works.
• UW Libraries is a member of the Digital
Preservation Network
• Several distributed, “dark archive”
preservation systems being explored
• And of course, RDS can help!
Final Thoughts
• Preservation = thinking about how your data organization, metadata, and storage affect your ability to access your data years from now.
• Prioritize your most important research. You
might not be able to preserve everything.
• It takes active researcher participation.
• Any plan is better than no plan at all. Start today.
Ask for help.
Contact Us
• Research Data Services (RDS)
– http://researchdata.wisc.edu/help/about-us/
• DoIT Storage and Backup
Questions?