Data Storage & Preservation Luke Bluma | Brianna Marshall | Elliott Shuppy IGERT workshop | November 2014


Page 1: Data Storage & Preservation

Data Storage & Preservation

Luke Bluma | Brianna Marshall | Elliott Shuppy

IGERT workshop | November 2014

Page 2: Data Storage & Preservation

STORAGE

Page 3: Data Storage & Preservation

Outline

• Problem with Storage

• Storage vs Backup

• Storage Types

• UW-Madison Options

• Personal Options

• Best Practices

• Use Cases

• Key Takeaways

Page 4: Data Storage & Preservation

The Problem with Storage

• It’s everywhere!

• All the options seem similar but slightly different

• Every use case is a little different

Page 5: Data Storage & Preservation

Storage vs Backup

Storage

Your working files. The files you access regularly and change frequently. You need to store data safely and securely but you also need to have access to it. In general, losing your storage means losing current versions of the data.

Page 6: Data Storage & Preservation

Storage vs Backup

Backup

A frequent and regular process of copying your data to a secure place that is separate from where you keep your storage. Backup can be overlooked because you don’t really need it until you lose data, but when you need to restore a file it can be the most important process you have in place.

Page 7: Data Storage & Preservation

Rule of 3

• Keep THREE copies of your data

– TWO onsite

– ONE offsite

• Example:

– One: Network Drive

– Two: External Hard Drive

– Three: Cloud Storage

• This ensures that your storage and backup are not all in the same place – that’s too risky!
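A minimal sketch of checking a storage plan against the Rule of 3. The `Copy` record and the `satisfies_rule_of_three` helper are hypothetical names used only for illustration, and treating the rule as "at least" three copies is an assumption. The locations mirror the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical record for illustration)."""
    location: str   # e.g. "Network Drive"
    offsite: bool   # True if the copy lives away from where you work


def satisfies_rule_of_three(copies):
    """Return True if there are 3+ copies, with 2+ onsite and 1+ offsite."""
    onsite = sum(1 for c in copies if not c.offsite)
    offsite = sum(1 for c in copies if c.offsite)
    return len(copies) >= 3 and onsite >= 2 and offsite >= 1


plan = [
    Copy("Network Drive", offsite=False),
    Copy("External Hard Drive", offsite=False),
    Copy("Cloud Storage", offsite=True),
]
print(satisfies_rule_of_three(plan))
```

Dropping the cloud copy from `plan` makes the check fail, which is the "all in the same place" risk the slide warns about.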

Page 8: Data Storage & Preservation

Storage Types

• Local storage

– Hard drive, external hard

drive, thumb drive, etc.

• Network storage

– Private cloud, public cloud,

etc.

• Private Cloud = network

storage run by UW

• Public Cloud = network

storage run by vendor

Page 9: Data Storage & Preservation

UW Data - Storage Options

• Local Storage/Backup Options

– External Hard Drive (TechStore)

• Local IT Options

– Services available depend on your local IT department

• DoIT Options

– Storage: File and Block Storage

– Backup: Bucky Backup Lite

• Cloud Options

– UW’s Box Account

Page 10: Data Storage & Preservation

UW Data – DoIT Options

• Storage: File and Block Storage

– File: easy to access, manage and share with other UW folks

– Block: additional raw storage available over the network for your server

• Backup: Bucky Backup Lite

– Client runs on your computer or server and does incremental backups nightly

– You can manage the retention policy and version control

• Cloud Storage – UW’s Box Account

Page 11: Data Storage & Preservation

Personal Data - Storage Options

• Personal Data

– Your personal UW data: UW’s Box Account

– Your personal data: thumb drive, external hard drive, or cloud options like Box, CrashPlan, Dropbox, etc.

• CrashPlan discount (30% off): http://go.wisc.edu/crashplan

Page 12: Data Storage & Preservation

Evaluating Cloud Services

• Lots of options out there – and not all are

created equal

• Read the Terms of Service!

• Servers get hacked all the time. Whatever

you’re storing, you don’t want your

provider to have access to it.

• Data encryption is your friend.

Page 13: Data Storage & Preservation

Storage & Backup Best Practices

• Think about and plan your data management

strategy before storing data

• If the data has ANY value to you, back it up

• If you have questions, ask for help! Local IT,

RDS, peers, friends, etc.

• Network storage is great, but think about

having a plan in place if you need to access

the data and the network is down

Page 14: Data Storage & Preservation

Storage & Backup Best Practices

• Put in the appropriate security measures

• Version control can be important especially when sharing data – plan ahead

• Document who has access to the data and audit that on a regular basis

• Test your backups – make sure they are working and you can actually restore a file

• If you use cloud storage, think about an exit strategy
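One way to act on "test your backups" is to restore a file and compare its checksum with the original. The sketch below is one possible approach, not a feature of any particular backup service; the helper names `sha256_of` and `restore_matches` are hypothetical, and the demo uses throwaway files in place of a real original and its restored copy.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original, restored):
    """True if the restored file has the same contents as the original."""
    return sha256_of(original) == sha256_of(restored)


# Demo: temporary files stand in for an original and a restored copy.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.csv"
    restored = Path(tmp) / "restored.csv"
    original.write_text("id,value\n1,42\n")
    restored.write_text("id,value\n1,42\n")
    restore_ok = restore_matches(original, restored)

print(restore_ok)
```

In practice you would point `restore_matches` at a working file and at the copy you just pulled back from backup. A mismatch means either the file changed after the backup ran or the restore is not working.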

Page 15: Data Storage & Preservation

Use Case 1 – Starting Fresh

• If you have a local IT person, contact them

first to talk about services available

• Contact RDS about a data management plan

• If local IT doesn’t have service offerings,

contact DoIT

• If all else fails – at least plan out your data

management strategy (storage, backup, etc.)

before starting to collect/use data

Page 16: Data Storage & Preservation

Use Case 2 – Leaving UW

• UW Data

– If you have a local IT person, contact them

– If someone will be taking over your work, give them access to a shared space like Box

– If you are using DoIT services, make sure someone else still on campus has access to the data

– If you don’t have local IT and aren’t using shared services, but think the data is valuable to UW, contact RDS

• Personal Data

– If you are using UW Box, then transfer the data over to a personal Box/Dropbox/Cloud account

– Purchase an external hard drive and transfer data over that way

Page 17: Data Storage & Preservation

Key Takeaways

• Figure out your storage requirements

– High security? Remote access? Ease of use? Scalability?

• Ask around – people are happy to help!

– Local IT, Peers, Friends, Family, etc.

• Rule of 3

– 2 onsite, 1 offsite – better safe than sorry

• Test it!

– Make sure it works as advertised and do some disaster testing

Page 18: Data Storage & Preservation

PRESERVATION

Page 19: Data Storage & Preservation

Storage & Backup

vs. Preservation

Storage & Backup = short-term

– Working copies

– Expected to change

Preservation = long-term

– Usually the final, “fixed” version(s)

Page 20: Data Storage & Preservation

Thinking Long-Term

• The data you’ve carefully stored is only useful if it’s readable and understandable

• Many factors affect this:

– Media

• What software did you use to create the data? Does hardware exist to access it?

– Metadata

• How much contextual information accompanies your data? Can you understand it? Can a stranger understand it?

– Organization

• Is it all jumbled together? Or have you organized it meaningfully? Do you know where your data is?

Page 21: Data Storage & Preservation

Thinking Long-Term

• None of the concepts discussed during this

workshop exist in a vacuum

• Some aspects of preservation feel out of our control, or like too much work

• The truth? It is confusing to plan ahead for

our data in a landscape of quickly changing

services…

• … but it’s worth it.

Page 22: Data Storage & Preservation

Time to Ponder

• Can you still access your data from…

– 20 years ago?

– 10 years ago?

– 5 years ago?

– 1 year ago?

Let’s talk about the data you’ve kept and lost.

Page 23: Data Storage & Preservation

Unreadable Data

CULPRITS

• Obsolete media

• Obsolete software &

file formats

• Obsolete hardware

CC image by Flickr user wlef70

Page 24: Data Storage & Preservation

Unreadable Data: Solutions

Now

- Start researching. (Google!) Odds are someone else

has faced the same issue.

- Digital forensics tools such as BitCurator can provide

guidance: http://www.bitcurator.net/

- Don’t assume your data is gone for good.

- Contact me to brainstorm.

Page 25: Data Storage & Preservation

Unreadable Data: Solutions

Moving forward

• Today’s popular software can become obsolete through business deals, new versions, or a gradual decline in user base. (Consider WordPerfect.)

• Anticipate average lifespan of media to be 3-5 years. Migrate your files every few years, if not more frequently!

• Some file formats are less susceptible to obsolescence than others

– Open, non-proprietary formats (pick TXT over DOCX, CSV over XLSX, TIF over JPG)

– Wide adoption

– History of backward compatibility

– Metadata support in open format (XML)
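A small sketch of how the format preferences above could be used to flag files worth migrating. The mapping only restates the three pairs named on the slide. The `migration_candidates` helper is a hypothetical name, and the sketch suggests a target format without converting anything (conversion can lose formatting, so check the result by hand).

```python
from pathlib import Path

# Pairs taken from the slide: the value is the format to prefer.
PREFERRED_FORMAT = {
    ".docx": ".txt",
    ".xlsx": ".csv",
    ".jpg": ".tif",
}


def migration_candidates(filenames):
    """Map each flagged file name to the format the slide suggests instead."""
    suggestions = {}
    for name in filenames:
        suffix = Path(name).suffix.lower()  # treat .XLSX and .xlsx alike
        if suffix in PREFERRED_FORMAT:
            suggestions[name] = PREFERRED_FORMAT[suffix]
    return suggestions


print(migration_candidates(["notes.docx", "results.XLSX", "readme.txt"]))
```

Files already in a preferred format, such as `readme.txt` here, are left out of the result.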

Page 26: Data Storage & Preservation

Lost Data

Now

• Do a data inventory. List all the places where your data lives (both physical and digital)

• Plan for consolidating – follow the rule of 3, not the rule of 17

Moving forward

• Too many copies can be a headache: hard to keep track of versions and know what is where. It makes sense to start a data inventory to track your data, especially at the beginning of a big project with many people and moving parts.
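A data inventory for the digital side can start as a simple file listing. The sketch below walks one or more folders and writes a CSV with one row per file. The `build_inventory` name and the column choices are assumptions for illustration, and physical media would still need to be listed by hand.

```python
import csv
import os
from datetime import datetime, timezone


def build_inventory(roots, output_csv):
    """Walk each root folder and write one CSV row per file found.

    Returns the number of files listed.
    """
    count = 0
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["root", "relative_path", "size_bytes", "modified_utc"])
        for root in roots:
            for folder, _dirs, files in os.walk(root):
                for name in sorted(files):
                    full_path = os.path.join(folder, name)
                    info = os.stat(full_path)
                    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                    writer.writerow([
                        root,
                        os.path.relpath(full_path, root),
                        info.st_size,
                        modified.isoformat(timespec="seconds"),
                    ])
                    count += 1
    return count
```

Write the output CSV somewhere outside the folders being walked, otherwise the inventory will list itself. Running it once per storage location (network drive, external drive, synced cloud folder) gives a rough picture of what is where.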

Page 27: Data Storage & Preservation

Decontextualized Data

Coded SPSS

survey

responses

(Useless without

the original

questionnaires)

Page 28: Data Storage & Preservation

Decontextualized Data: Solutions

Now

• Write contextual information in the form of a readme file and/or scan written notes.

• Publish it as an additional bitstream alongside your datasets.

• Accept that some old data will never have necessary contextual information. Is it worth it to preserve it?

Moving forward

• Take the time to create metadata.

• At the very least, create a readme file. (Good example located here: http://hdl.handle.net/2022/17155)
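As a starting point, a readme can be generated from a short template so that every dataset gets the same prompts. The sketch below is a hypothetical example: the field list is an assumption about what context is typically worth recording, not a required standard, and `readme_template` is an illustrative name.

```python
from datetime import date


def readme_template(title, creators, created=None):
    """Return plain-text readme content with prompts left to fill in."""
    created = created or date.today()
    lines = [
        f"Title: {title}",
        f"Creator(s): {', '.join(creators)}",
        f"Date created: {created.isoformat()}",
        "Description of the data:",
        "Files included and what each contains:",
        "How the data were collected or generated:",
        "Variable names, codes, and units:",
        "Software needed to open the files:",
    ]
    return "\n".join(lines) + "\n"


text = readme_template("Survey responses", ["A. Researcher"], date(2014, 11, 1))
print(text)
```

Saving `text` as a plain `.txt` file next to the data keeps the context in an open format. For coded survey data, the "Variable names, codes, and units" prompt is where a pointer to the original questionnaire belongs.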

Page 29: Data Storage & Preservation

Repositories

Disciplinary repositories provide a good home

for data, often with the requirement that you

share it openly.

DataONE: https://www.dataone.org/

Dryad: http://datadryad.org/

Knowledge Network for Biocomplexity:

https://knb.ecoinformatics.org/

Page 30: Data Storage & Preservation

Databib & re3data

Plan to merge their two projects into one service by the end of 2015.

Page 31: Data Storage & Preservation

Institutional Help with Preservation

• The IR is not yet up to the task of managing data… but that’s in the works.

• UW Libraries is a member of the Digital

Preservation Network

• Several distributed, “dark archive”

preservation systems being explored

• And of course, RDS can help!

Page 32: Data Storage & Preservation

Final Thoughts

• Preservation = thinking about how your data organization, metadata, and storage impact your ability to access your data years from now.

• Prioritize your most important research. You

might not be able to preserve everything.

• It takes active researcher participation.

• Any plan is better than no plan at all. Start today.

Ask for help.

Page 33: Data Storage & Preservation

Contact Us

• Research Data Services (RDS)

– http://researchdata.wisc.edu/help/about-us/

• DoIT Storage and Backup

[email protected]

Page 34: Data Storage & Preservation

Questions?