
Disk Image!...and then what? Strategies for sustainable long-term storage and access


Page 1: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Disk image! … and then what?

Strategies for sustainable long-term storage and access

Page 2: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Document stuff about your stuff

Metadata

Page 3: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Metadata = documentation

Physical components of a work
Technical components of a work
Source code
Dependencies
Artist’s intent
Significant properties
The usual aspects of conservation documentation: who, what, when, where, why, how.

Page 4: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Example Environment: Windows XP operating system

Simple! Straightforward!

Page 5: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Metadata ≠ documentation

Documentation:
Scattered
Free-form
Human-readable

Metadata:
Unified
Standards-based
Machine-actionable
Interoperable

Page 6: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

PREMIS: PREservation Metadata: Implementation Strategies

Library of Congress metadata standard

http://www.loc.gov/standards/premis/

Includes elements for:
Significant properties
Hardware and software dependencies
Object and environment characteristics
Actions performed on objects
And more!

Page 7: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Metadata guidelines

Bare minimum:
Record all the documentation.
Inventory everything, including the documentation.
Keep the inventory, metadata, etc. backed up, including one copy that is NOT co-located with the objects.

Best practice:
Use standards-based, machine-actionable metadata.
Consider PREMIS (PREservation Metadata: Implementation Strategies).
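As a rough illustration of the step from free-form documentation toward machine-actionable metadata, here is a minimal Python sketch of an inventory record serialized to JSON. The field names, the identifier, and the dependency and file names are assumptions made up for this example; they are not PREMIS elements, and a real implementation would map them to the PREMIS data dictionary.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class InventoryRecord:
    """One inventory entry. Field names are illustrative, not PREMIS elements."""
    identifier: str
    description: str
    environment: str  # e.g. "Windows XP operating system"
    dependencies: list = field(default_factory=list)
    significant_properties: list = field(default_factory=list)
    documentation_files: list = field(default_factory=list)


def to_json(records):
    """Serialize records to JSON so the inventory can be backed up and parsed."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


# All values below are hypothetical placeholders.
record = InventoryRecord(
    identifier="work-001-disk-image",
    description="Disk image of the computer that runs the work",
    environment="Windows XP operating system",
    dependencies=["example-player-plugin"],
    documentation_files=["conservation-report.pdf"],
)
inventory_json = to_json([record])
```

Because the result is plain JSON, the inventory itself is a file that can be copied to a location that is not co-located with the objects.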

Page 8: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Interlude

NDSA Levels of Digital Preservation

http://www.digitalpreservation.gov/ndsa/activities/levels.html

Four levels ranging from minimum accepted practice to best practice

Page 9: Disk Image!...and then what?  Strategies for sustainable long-term storage and access
Page 10: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Stuff’s gotta go somewhere

Storage

Page 11: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

What is a repository?

Page 12: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

PHYSICAL storage media

“All digital is physical. They aren't literal clouds, folks.” - Stephanie Gowler, Project Conservator at Northwestern University

Page 13: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Don’t panic!

Page 14: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Common default storage habits

Leave it sitting on the computer you used for disk imaging.

Put it on an external hard drive and store it in your desk drawer.

Transfer it to your repository and forget about it.

ALL OF THESE ARE BAD OPTIONS!

Page 15: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Physical storage media options

Media type | Long-term sustainability | Cost
-----------|--------------------------|-----
Optical media (CDs, DVDs) | Terrible! Very short shelf life, high rate of data loss | Dirt cheap
Removable/offline storage media (external hard drives, tape) | Pretty good, low rate of data loss | Fairly cheap
Online local disks (internal computer hard drives) | Pretty good, low rate of data loss | Moderate cost
Redundant disk arrays (RAID) | Very good, super low rate of data loss | Getting pricey
Local network servers (NAS, SAN) | Great! Super low rate of data loss and experts are managing it | Expensive
Cloud servers | Great (probably)! Super low rate of data loss (probably) and experts are managing it (probably) | Expensive, but varies based on type

Page 16: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Storage guidelines

Bare minimum:
Two complete, identical copies on different types of storage media

Ex: one copy on a local desktop computer, one copy on tape backups

Page 17: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Location, location, location

Best practice:

Three complete, identical copies stored on different types of storage media, in different geographic locations.

Ex: one copy on a local RAID array in NYC, one copy in cloud server storage based in Utah, one copy on LTO tapes in a vault in Michigan.
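The copy-count rules above can be written down as a small check. This is only a sketch under stated assumptions: the `Copy` class and `check_storage_plan` function are hypothetical names, and "different types of storage media" is interpreted here as every copy sitting on a distinct media type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """Where one complete copy lives (hypothetical structure for this sketch)."""
    media_type: str
    location: str


def check_storage_plan(copies, min_copies=3, require_distinct_locations=True):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if len(copies) < min_copies:
        problems.append(f"only {len(copies)} copies; need at least {min_copies}")
    # Assumption: each copy should be on its own type of storage media.
    if len({c.media_type for c in copies}) < len(copies):
        problems.append("some copies share a type of storage media")
    if require_distinct_locations and len({c.location for c in copies}) < len(copies):
        problems.append("some copies share a geographic location")
    return problems


# The best-practice example from the slide.
best_practice = [
    Copy("RAID array", "NYC"),
    Copy("cloud server storage", "Utah"),
    Copy("LTO tape", "Michigan"),
]
print(check_storage_plan(best_practice))
```

Calling it with `min_copies=2, require_distinct_locations=False` checks the bare-minimum rule instead.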


Page 18: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Some thoughts on backup

Continuous backup means file corruption gets duplicated.

What’s the retention period?

Know the policies around this. Your IT staff are focused on business continuity, NOT long-term preservation. That’s your job.

Page 19: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Replacement

Roughly every five years

BEFORE failure occurs!
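One way to act before failure is to keep the in-service date of each storage device in the inventory and flag anything approaching the five-year mark. The sketch below assumes a simple dictionary of device names and dates; the names, dates, and the `media_due_for_replacement` function are hypothetical.

```python
from datetime import date


def media_due_for_replacement(media, today, max_age_years=5):
    """Return the names of devices that have been in service max_age_years or longer."""
    due = []
    for name, in_service in media.items():
        age_years = (today - in_service).days / 365.25
        if age_years >= max_age_years:
            due.append(name)
    return sorted(due)


# Hypothetical devices and dates, for illustration only.
media = {
    "external-drive-A": date(2015, 3, 1),
    "raid-array-1": date(2019, 6, 1),
}
due = media_due_for_replacement(media, today=date(2021, 1, 1))
```

Lowering `max_age_years` (for example to 4) gives some lead time to buy and verify the replacement before the old device reaches five years.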

Page 20: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Who’s in charge of this stuff, anyway?

Management

Page 21: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Why manage?

To ensure that files don’t get corrupted, lost, damaged, or otherwise altered.

To ensure that files can still be used.

Page 22: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Management components

Metadata: Do you know what you have?

File fixity: Do you actually have what you think you have?

Information security: What’s happening to what you have?

Preventing obsolescence: Can you use what you have?

Page 23: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

File fixity

Checksums vs cryptographic hash algorithms
MD5 and SHA-1
Verification is key!

Page 24: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

File fixity guidelines

Bare minimum:
Create checksums of everything on ingest. (See: practical session)

Best practice:
VERIFY checksums periodically, and always when performing these tasks:
o Ingesting into a repository.
o Transferring files from one storage medium to another.
Replace damaged files with copies from another location/storage medium.
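The create, verify, and replace cycle described above might look like the following in Python, using the standard `hashlib` module. This is a sketch, not the workflow from the practical session: the function names are hypothetical, and SHA-256 is an assumed default (pass `"md5"` or `"sha1"` to use the algorithms named on the slide).

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_manifest(directory, algorithm="sha256"):
    """On ingest: map each file's relative path to its checksum."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): file_checksum(p, algorithm)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def verify_manifest(directory, manifest, algorithm="sha256"):
    """Periodically: return files that are missing or whose checksum changed."""
    root = Path(directory)
    problems = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif file_checksum(target, algorithm) != expected:
            problems[relative_path] = "checksum mismatch"
    return problems


def repair_from_copy(primary_dir, backup_dir, manifest, algorithm="sha256"):
    """Replace damaged files, but only with backup copies that still verify."""
    repaired = []
    for relative_path in verify_manifest(primary_dir, manifest, algorithm):
        source = Path(backup_dir) / relative_path
        if source.is_file() and file_checksum(source, algorithm) == manifest[relative_path]:
            destination = Path(primary_dir) / relative_path
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, destination)
            repaired.append(relative_path)
    return repaired
```

Checking the backup copy before copying it matters: a damaged backup should never overwrite anything. The manifest itself should be stored with the rest of the metadata.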

Page 25: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Information security for non-IT folks

Manage access – particularly delete permissions
Log access and actions
Audit the logs periodically (or automate)

Page 26: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Information security guidelines

Bare minimum:
Know and document who has access to files.
Try to restrict access (particularly write/delete) to as few people as possible, and require multi-person authorization to delete.

Best practice:
Maintain logs of all actions performed on files.
Audit those logs regularly to ensure no unintended actions were performed.
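An automated log audit can be as simple as scanning for write or delete actions by anyone not on the documented access list. The sketch below assumes a log already parsed into entries with a timestamp, user, action, and path; that structure, the user names, and the `audit_log` function are hypothetical, and real logs will need their own parsing.

```python
from collections import namedtuple

# Hypothetical log entry shape; real systems will differ.
LogEntry = namedtuple("LogEntry", "timestamp user action path")


def audit_log(entries, authorized_users, sensitive_actions=("write", "delete")):
    """Return entries where someone outside the access list wrote or deleted."""
    findings = []
    for entry in entries:
        if entry.action in sensitive_actions and entry.user not in authorized_users:
            findings.append(entry)
    return findings


# Made-up entries for illustration.
entries = [
    LogEntry("2020-01-01T10:00", "alice", "read", "work-001.img"),
    LogEntry("2020-01-02T11:30", "bob", "delete", "work-001.img"),
    LogEntry("2020-01-03T09:15", "alice", "write", "work-002.img"),
]
findings = audit_log(entries, authorized_users={"alice"})
```

This only catches actions by the wrong people. Unintended actions by authorized users still need a human to review the log.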

Page 27: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Preventing obsolescence

Bare minimum:
Document information such as file formats, dependencies, etc. (See: metadata)

Best practice:
Review format and dependency information regularly to identify high-risk objects.
Perform migration, emulation tests, etc. as obsolescence issues arise.
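If format and dependency information is recorded as machine-actionable metadata, the regular review can be partly automated. The following sketch assumes an inventory keyed by object identifier and watch lists maintained by the institution; the identifiers, the format names `format-x` and `format-y`, and the function name are hypothetical.

```python
def high_risk_objects(inventory, at_risk_formats, at_risk_dependencies):
    """Return {identifier: [reasons]} for objects matching either watch list."""
    flagged = {}
    for identifier, info in inventory.items():
        reasons = []
        if info["format"] in at_risk_formats:
            reasons.append(f"format: {info['format']}")
        for dependency in info.get("dependencies", []):
            if dependency in at_risk_dependencies:
                reasons.append(f"dependency: {dependency}")
        if reasons:
            flagged[identifier] = reasons
    return flagged


# Hypothetical inventory and watch lists, for illustration only.
inventory = {
    "work-001": {"format": "format-x", "dependencies": ["Windows XP operating system"]},
    "work-002": {"format": "format-y", "dependencies": []},
}
flagged = high_risk_objects(
    inventory,
    at_risk_formats={"format-x"},
    at_risk_dependencies={"Windows XP operating system"},
)
```

The flagged list is only a starting point for deciding which objects need migration or emulation tests first.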

Page 28: Disk Image!...and then what?  Strategies for sustainable long-term storage and access

Still awake? Questions?

Thank you!