Upload
vothien
View
223
Download
4
Embed Size (px)
Citation preview
CUWL
June 3, 2016
Digital Preservation Outreach and Education (DPOE)
Managing Digital Content over Time:
Store Module
Modules
DPOE Baseline Modules: Intro, version 2.0, Nov 2011
Select - what portion of that content will be preserved?
Identify - what digital content do you have?
Store - what issues are there for long term storage?
Protect - what steps are needed to protect your digital content?
Manage - what provisions are needed for long-term management?
Provide - what considerations are there for long-term access?
2
Key Decision Points
• How are you going to organize it?
• What are you going to store it on?
• Where are you going to store it?
• How many copies do you need?
4
File Naming
• Why is this important?
– To prevent accidental
overwriting
– To help you find it again
Train Wreck Image ID: WHi-2011
• Do t use spe ial hara ters i your file/folder titles ^ <>|?\ / : @ * &) Just because you CAN does ’t ea you SHOULD
5
File Naming
• Keep folder / document titles short and descriptive
• Date your documents consistently
– yyyymmdd_brieftitle.xxx
• Clearly label drafts and revisions
6
Examples
• Good
20070113_ExecutiveBoardMinutes.doc
2006_AnnualBudget_FINAL.doc
• Not so good: April Minutes.doc 04 minutes.doc Board minutes.doc Draftminutes.doc
7
File Management
• Store similar digital items together – Co-locate in a central location
• Do t ury ite s i ultiple le els
• Get rid of easy-to-purge items – Rescued or recovered documents
– Empty file folders
– ~.tmp files
8
File Management
• Make decisions about what NOT to keep – File backups/copies/drafts
– Supplementary files that provide no additional long-term
value
– Corrupted files
– Certain file formats
• Leave breadcrumbs
• Deter i e hat you do t k o
9
Remove Empty Directories
The application searches and deletes empty
directories recursively below a given start
folder and shows the result in a well arranged
tree
http://sourceforge.net/projects/rem-empty-
dir/files/latest/download?source=files
12
Remove Duplicate Files
• Auslogics Duplicate File Finder
http://www.auslogics.com/en/software/duplicate-file-finder/
• Similar Images
http://similarimages.en.softonic.com/
• VisiPics
http://www.visipics.info/index.php?title=Main_Page
13
Rename Your Files
• Advanced Renamer https://www.advancedrenamer.com
Advanced Renamer is a free program for renaming multiple
files and folders at once.
You can construct new file names by adding, removing,
replacing, changing case, or giving the file a brand new name
based on known information about the file.
19
Do u e t Your De isio s….
Create a documents standardizing your:
• File naming conventions
• Folder organization
• Acceptable formats
21
Well-managed Collections
Well-managed status makes preservation easier
Sample characteristics of well-managed:
• Basic information about each deposit
• Minimal metadata for objects
• Common (or normalized) file formats
• Controlled and known storage of content
• Multiple copies in at least 2 locations
22
Purpose of Metadata
Information that helps to....
- find,
- use,
- describe,
- manage, and
- understand
... Your digital content
23
Importance of Metadata
• How do you know what an object is?
− Metadata uniquely identifies digital objects
• How do you use content in the future?
– Metadata makes digital objects understandable
• How do you know an object is authentic?
– Metadata allows objects to be traced over time
Metadata enables long-term preservation
24
Metadata uniquely identifies digital
objects
Unique Identifiers can also store descriptive data
EX: MFM121587
25
How do you know an object is
authentic?
One-Way Encryption b43efderwkl3jh7834
In 2004,
One-Way Encryption 845kjsnlkdrkjhndgiu5
But in 2010
Different hash means the file has changed
27
Preservation Metadata Content (what), Fixity (unchanged), Provenance (life story),
Reference (this thing), Context (relationships)
Administrative
(manage) Structural
(understand, use)
Descriptive
(find, use)
Object-level Metadata
Diagram courtesy DPM Workshops
28
Archival Metadata s Goals
Content: preserve the substance
Fixity: demonstrate content is unchanged
Reference: identify as this content and no other
Provenance: trace to its origin (or to deposit)
Context: preserve linkages with other objects
Original source: Preserving Digital Information Report, 1996
29
Key Decision Points
• How are you going to organize it?
• What are you going to store it on?
• Where are you going to store it?
• How many copies do you need?
30
What Drives Storage Decisions?
• Immediate Costs
– Quantity (size and number of files)
– Number of copies
– Media (life span, availability, $$)
31
What Drives Storage Decisions?
• Other resources
– Expertise (skills required to manage)
– Services (local vs. hosted)
– Partners (achieving geographic distribution)
• Institutional constraints (e.g., legal restrictions)
32
Archival Storage
Archival storage manages content as Digital
Objects
Digital content (files + metadata = object)
•May include any type of content
– e.g., images, text, sound, video, maps
•Requires some identification and description
– Captured as metadata
33
Archival Storage ≠ Ba kup
Backups keep your computer
working and files safe Archival Storage keeps the
content accessible for future
computers and users
34
What akes it Ar hi al storage?
• Your storage system may include any type of
content
– e.g., images, text, sound, video, maps
• Requires some identification and description
– Captured as metadata
• Reliable, long-term bit preservation
• Needs at least two copies in at least two places
35
Storage Options
• Content (objects) are kept on storage media
– online, near-line, offline, cloud
• Factors for choosing options include
– Cost (available resources for preservation)
– Quantity (size and number of files)
– Expertise (skills required to manage)
– Partners (achieving geographic distribution)
– Services (outsourcing)
36
Key Decision Points
• How are you going to organize it?
• What are you going to store it on?
• Where are you going to store it?
• How many copies do you need?
40
Storage options
Locally • You will manage everything in-house
Storage partners • DuraCloud
• Other institutions
Large commercial options • Google Drive
• Amazon Simple Storage Service (S3)
41
The ostly Good…..
Responsibilities and costs are transferred to the
other entity • Installation / replacement / upgrades of hardware and
software
• Backup and recovery of data are part of the package
• No local physical presence (valuable space)
• No local environmental requirements (power or cooling costs)
42
The (potentially) Bad
There are pote tial disad a tages ho e er….. • Can records be managed correctly throughout their entire
lifecycle?
• Can it support Open Records requests?
• Security concerns
• Do you know where your data is?
• Accessibility – ore poi ts of failure he the data is remote
• Costs for accessing data can be high
43
Resources
State of Wisconsin Public Records Board has created two
documents which can be found at:
http://publicrecordsboard.wi.gov/docs_all.asp?locid=165
• Public Records Board Guidance on the Use of Contractors
for Records Management Services
• Use of Contractors for Records Management Services
(Both docs are in the Reference Materials section)
44
Repository Selection
If you decide to use (build, join, buy) a repository
•Range of types to consider:
– general (any content) to specialized (format-
specific)
– open source to proprietary
– easy to advanced installation and management
•Each option has pros and cons
•No system is fully compliant to standards
Select best option for your content – for now
45
Key Decision Points
• How are you going to organize it?
• What are you going to store it on?
• Where are you going to store it?
• How many copies do you need?
46
Number of Copies
How many copies are enough for you?
• Minimum: two (2) copies in two location
• Additional copies in additional locations
mitigate risk
Examples of storage factors:
• Video files are too large to store 6 copies
• Possible legal restrictions (e.g., storage locations)
• Types of media used for storing the content 47
Store – To Do List
Digital preservation requires an organization to:
• Develop a storage management policy
– E.g., number of copies, locations, fixity means
– Technical team roster and stakeholder list
• If you opt to manage your own content
• Determine functional requirements for storage system
• Monitor copies of content for errors/change
• Plan for storage system replacement
48
Store – To Do List
•If you decide to let someone else manage your
content • Specify storage service or partner agreements
49