helen-bailey
Disk image! … and then what?
Strategies for sustainable long-term storage and access
Document stuff about your stuff
Metadata
Metadata = documentation
Physical components of a work
Technical components of a work
Source code
Dependencies
Artist’s intent
Significant properties
The usual aspects of conservation documentation: who, what, when, where, why, how.
Example Environment: Windows XP operating system
Simple! Straightforward!
Metadata ≠ documentation
Documentation: scattered, free-form, human-readable
Metadata: unified, standards-based, machine-actionable, interoperable
PREMIS (PREservation Metadata: Implementation Strategies)
Library of Congress metadata standard
http://www.loc.gov/standards/premis/
Includes elements for:
Significant properties
Hardware and software dependencies
Object and environment characteristics
Actions performed on objects
And more!
Metadata guidelines
Bare minimum:
Record all the documentation.
Inventory everything, including the documentation.
Keep the inventory, metadata, etc. backed up, including one copy that is NOT co-located with the objects.
Best practice:
Use standards-based, machine-actionable metadata.
Consider PREMIS (PREservation Metadata: Implementation Strategies).
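To make "machine-actionable" a bit more concrete, here is a minimal sketch of the Windows XP example recorded as structured data instead of free text. Everything in it is an assumption for illustration: the field names, the identifier, and the sample values are made up, and they are NOT PREMIS element names. A real implementation would map fields like these onto the PREMIS data dictionary.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ObjectRecord:
    """Hypothetical inventory record. Field names are illustrative, not PREMIS."""
    identifier: str
    file_format: str
    environment: str                                  # e.g. the operating system the work needs
    dependencies: list = field(default_factory=list)
    significant_properties: list = field(default_factory=list)


def to_json(record):
    """Serialize one record so other tools can read it without a human in the loop."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def needs_environment(records, environment):
    """Return the identifiers of all records that depend on a given environment."""
    return [r.identifier for r in records if r.environment == environment]


# Made-up sample data, for illustration only.
records = [
    ObjectRecord(
        identifier="work-001-disk-image",
        file_format="raw disk image",
        environment="Windows XP",
        dependencies=["example media player"],
        significant_properties=["original screen resolution"],
    ),
    ObjectRecord(
        identifier="work-002-video",
        file_format="video file",
        environment="any",
    ),
]
```

Because the environment is a field rather than a sentence buried in a report, a question like "which works need Windows XP?" becomes a one-line query: `needs_environment(records, "Windows XP")`.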
Interlude
NDSA Levels of Digital Preservation
http://www.digitalpreservation.gov/ndsa/activities/levels.html
Four levels ranging from minimum accepted practice to best practice
Stuff’s gotta go somewhere
Storage
What is a repository?
PHYSICAL storage media
“All digital is physical. They aren't literal clouds, folks.” - Stephanie Gowler, Project Conservator at Northwestern University
Don’t panic!
Common default storage habits
Leave it sitting on the computer you used for disk imaging.
Put it back on an external hard drive and store it in your desk drawer.
Transfer it to your repository and forget about it.
ALL OF THESE ARE BAD OPTIONS!
Physical storage media options
Media type | Long-term sustainability | Cost
Optical media (CDs, DVDs) | Terrible! Very short shelf life, high rate of data loss | Dirt cheap
Removable/offline storage media (external hard drives, tape) | Pretty good, low rate of data loss | Fairly cheap
Online local disks (internal computer hard drives) | Pretty good, low rate of data loss | Moderate cost
Redundant disk arrays (RAID) | Very good, super low rate of data loss | Getting pricey
Local network servers (NAS, SAN) | Great! Super low rate of data loss and experts are managing it | Expensive
Cloud servers | Great (probably)! Super low rate of data loss (probably) and experts are managing it (probably) | Expensive, but varies based on type
Storage guidelines
Bare minimum:
Two complete, identical copies on different types of storage media
Ex: one copy on a local desktop computer, one copy on tape backups
Location, location, location
Best practice:
Three complete, identical copies stored on different types of storage media, in different geographic locations.
Ex: one copy on a local RAID array in NYC, one copy in cloud server storage based in Utah, one copy on LTO tapes in a vault in Michigan.
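The two storage rules are simple enough to check mechanically. The sketch below is one possible reading, not an official test: it assumes "different types of storage media" and "different geographic locations" mean that no two copies share a media type or a location. The `Copy` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One complete copy of the collection (hypothetical structure)."""
    media_type: str
    location: str


def meets_bare_minimum(copies):
    """At least two copies, each on a different type of storage media."""
    media_types = {c.media_type for c in copies}
    return len(copies) >= 2 and len(media_types) == len(copies)


def meets_best_practice(copies):
    """At least three copies, all on different media types AND in different places."""
    media_types = {c.media_type for c in copies}
    locations = {c.location for c in copies}
    return (
        len(copies) >= 3
        and len(media_types) == len(copies)
        and len(locations) == len(copies)
    )


# The best-practice example from the slide.
plan = [
    Copy("RAID array", "NYC"),
    Copy("cloud server storage", "Utah"),
    Copy("LTO tape", "Michigan"),
]
```

With this plan, `meets_best_practice(plan)` is true, while a single external hard drive in a desk drawer fails even the bare minimum.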
Some thoughts on backup
Continuous backup means file corruption gets duplicated.
What’s the retention period?
Know the policies around this. Your IT staff are focused on business continuity, NOT long-term preservation. That’s your job.
Replacement
Replace storage media roughly every five years, BEFORE failure occurs!
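Replacing media before it fails means knowing how old it is. A minimal sketch, assuming you record the date each drive or tape went into service: the five-year figure comes from the slide, while the one-year warning window, the function names, and the sample drive names are assumptions for illustration.

```python
from datetime import date


def age_in_years(in_service_since, today):
    """Approximate age of a storage medium in years."""
    return (today - in_service_since).days / 365.25


def due_for_replacement(media, today, max_years=5.0, lead_years=1.0):
    """Return names of media that are within `lead_years` of the replacement age.

    `media` maps a name to the date the medium went into service.
    Flagging early (an assumed one-year lead) leaves time to buy and migrate.
    """
    threshold = max_years - lead_years
    return sorted(
        name
        for name, since in media.items()
        if age_in_years(since, today) >= threshold
    )


# Made-up inventory of storage media, for illustration only.
media = {
    "ext-drive-A": date(2019, 1, 1),
    "ext-drive-B": date(2023, 6, 1),
}
```

Calling `due_for_replacement(media, date(2024, 1, 1))` flags only the older drive.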
Who’s in charge of this stuff, anyway?
Management
Why manage?
To ensure that files don’t get corrupted, lost, damaged, or otherwise altered.
To ensure that files can still be used.
Management components
Metadata: Do you know what you have?
File fixity: Do you actually have what you think you have?
Information security: What’s happening to what you have?
Preventing obsolescence: Can you use what you have?
File fixity
Checksums vs cryptographic hash algorithms
MD5 and SHA-1
Verification is key!
File fixity guidelines
Bare minimum:
Create checksums of everything on ingest. (See: practical session)
Best practice:
VERIFY checksums periodically, and always when performing these tasks:
Ingesting into a repository.
Transferring files from one storage medium to another.
Replace damaged files with copies from another location/storage medium.
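The create-then-verify cycle can be sketched in a few lines using Python's standard `hashlib` module. This is a minimal illustration, not the tool from the practical session: the function names and the manifest layout (relative path mapped to hex digest) are assumptions, and SHA-1 is used only because the slide names it alongside MD5.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_manifest(root, algorithm="sha1"):
    """On ingest: record a checksum for every file under `root`."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_checksum(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest, algorithm="sha1"):
    """Later: recompute checksums and report files that are missing or changed."""
    root = Path(root)
    problems = {}
    for relative_path, expected in manifest.items():
        path = root / relative_path
        if not path.is_file():
            problems[relative_path] = "missing"
        elif file_checksum(path, algorithm) != expected:
            problems[relative_path] = "checksum mismatch"
    return problems
```

An empty result from `verify_manifest` means every file still matches what was recorded on ingest. Anything it reports is a candidate for replacement from another copy. Keep the manifest itself backed up along with the rest of the metadata.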
Information security for non-IT folks
Manage access – particularly delete permissions
Log access and actions
Audit the logs periodically (or automate)
Information security guidelines
Bare minimum:
Know and document who has access to files.
Try to restrict access (particularly write/delete) to as few people as possible, and require multi-person authorization to delete.
Best practice:
Maintain logs of all actions performed on files.
Audit those logs regularly to ensure no unintended actions were performed.
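Auditing a log is a good candidate for automation. The sketch below assumes a very simple, hypothetical log format (who did what to which file, and who else signed off) and flags two things from the guidelines above: modifications by people who should not have write/delete access, and deletions that only one person was involved in. The names, fields, and sample entries are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One logged action (hypothetical format)."""
    timestamp: str            # ISO 8601 string
    user: str
    action: str               # "read", "write", or "delete"
    path: str
    approved_by: tuple = ()   # other people who authorized the action


def audit(entries, authorized_writers, min_people_for_delete=2):
    """Return (path, reason) pairs for actions that need a human to look at them."""
    findings = []
    for entry in entries:
        if entry.action in {"write", "delete"} and entry.user not in authorized_writers:
            findings.append((entry.path, f"{entry.user} is not authorized to {entry.action}"))
        elif entry.action == "delete":
            # Count the person who deleted plus everyone who signed off.
            people = {entry.user} | set(entry.approved_by)
            if len(people) < min_people_for_delete:
                findings.append((entry.path, "delete without multi-person authorization"))
    return findings


# Made-up log entries, for illustration only.
log = [
    LogEntry("2024-01-05T10:00:00", "alice", "read", "work-001.img"),
    LogEntry("2024-01-05T10:05:00", "bob", "delete", "work-002.img"),
    LogEntry("2024-01-06T09:00:00", "alice", "delete", "work-003.img"),
    LogEntry("2024-01-06T09:30:00", "alice", "delete", "work-004.img", ("carol",)),
]
```

Running `audit(log, authorized_writers={"alice"})` reports the deletion by bob and the solo deletion by alice, and passes the one carol signed off on.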
Preventing obsolescence
Bare minimum:
Document information such as file formats, dependencies, etc. (See: metadata)
Best practice:
Review format and dependency information regularly to identify high-risk objects.
Perform migration, emulation tests, etc. as obsolescence issues arise.
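If formats and dependencies are documented as data, the regular review can start with a script. The sketch below is an assumption-heavy illustration: the inventory layout, the function name, and especially the at-risk lists are hypothetical. Deciding what counts as high risk is a judgment your institution has to make and maintain.

```python
def high_risk_objects(inventory, at_risk_formats, at_risk_dependencies):
    """Return {identifier: [reasons]} for objects that match the at-risk lists.

    `inventory` maps an identifier to a dict with a "format" string and an
    optional "dependencies" list (a hypothetical layout).
    """
    flagged = {}
    for identifier, info in inventory.items():
        reasons = []
        if info["format"] in at_risk_formats:
            reasons.append("at-risk format: " + info["format"])
        risky = sorted(set(info.get("dependencies", [])) & set(at_risk_dependencies))
        if risky:
            reasons.append("at-risk dependencies: " + ", ".join(risky))
        if reasons:
            flagged[identifier] = reasons
    return flagged


# Made-up inventory and at-risk lists, for illustration only.
inventory = {
    "work-001": {"format": "raw disk image", "dependencies": ["Windows XP"]},
    "work-002": {"format": "example legacy format"},
    "work-003": {"format": "raw disk image"},
}
at_risk_formats = {"example legacy format"}
at_risk_dependencies = {"Windows XP"}
```

Objects flagged by `high_risk_objects(inventory, at_risk_formats, at_risk_dependencies)` are the ones to look at first for migration or emulation tests.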
Still awake? Questions?
Thank you!
Image credits
Windows XP (slide 4): 2K Networking, Inc.
NDSA levels (slide 9): NDSA
Vault (slide 11): The Mark Consulting
Panicked cat (slide 13): Run Salt Run
Hard drive (slide 12): x2element
Computer (slide 16): Johan Larsson via Compfight cc
Wire tower (slide 16): tanakawho via Compfight cc
Hard drive (slide 19): Tech-addict
File not found (slide 21): Ragha’s Siebel Blog
No change (slide 23): Return on Focus
Delete (slide 25): The Ramblin Professor
Awake cat (slide 28): Desktop Nexus