Government Information Preservation Working Group
Highlights of Digital Preservation Survey
December 16th 2003
Oliver Slattery
Information Access Division
National Institute of Standards and Technology
Need for Digital Preservation:
• Crucial, critical, essential, important.
• Legally required.
• Principal role of agency/central to agency mission.
• Over periods of 30 years to hundreds of years.
• Archive distribution and central requirements of data assets.
• Important for department to provide secure, accessible, archival information on QC testing and other technical work.
• Continuity of operations.
• The need to stay current.
• Records are ‘permanent’.
Challenges in the next 5 years:
Obstacles
• Large/increasing volumes of data
• Multiple formats / format compatibility
• Quality/capacity of media
• Storage space
• Getting customers to use latest media
• Upgrading infrastructure/equipment – procurement (cost and time)
• Ensuring authenticity
Specific challenges/tasks
• Websites (archiving of)
• Preservation with online/on demand access
• Coordinating/integrating preservation procedures
• Migration of current archive
• Ensuring authenticity
Other concerns
• Management/record keeping
• Defining digital preservation
• Test capabilities/equipment (procurement – cost and time)
• Uniformity among suppliers of digital documents
• Same document through every phase of life cycle
Current strategy and its limitations:
• Control formats – Limit: must be done at creation.
• Use tapes to store and distribute data – Limit: tapes are expensive, will soon no longer be made, and are susceptible to errors.
• DLT, CD/DVD-ROM, PDF/TIFF – Limit: size, cost, compatibility.
• Networked computer disk drives and backup magnetic media. Systems include Access databases and a laboratory test database called Testream (SQL) – Limit: the Access portion is not secure or traceable. Backup may be insufficient. No assurance of data accessibility if formats change.
• Coordinate the preservation of born-digital items – Limit: resources.
• Currently migrating from analog to digital. Still acquire in analog, but send out to customers in digital. Moving towards full digital acquisition – Limit: storage space and budget. Process is slow.
• From archive to CD/DVD for distribution. ‘Deep archive’ facilities for long-term storage – Limit: large data sets are too big for current archive media capacities.
• HD media (tapes) such as DLT and SDLT, etc.; servers/LAN; some web-based access – Limit: network throughput is small and nearing its limit. Automation is not available for HD preservation work.
Research we want to see:
Information Quality and Access
• Authentication
• Accuracy of rendering.
• Universal media.
• One size fits all.
• Safeguards to ensure authenticity and version control of archived docs
• PDF for archiving
• Universal access tool.
• Practices and procedures. Digital is easy to change but hard to detect changes!
• Standards analysis and development.
Reliability
• Media durability
• Physical testing and artificial aging of digital media to predict durability.
• Preservation media.
• Testing and evaluation of media. Important to share results.
• Large capacity, reliable archive media.
• Development of media analysis tools.
• Detect changes of error rates in media.
• Classical issues such as video archiving, microfilm preservation issues, environmental studies.
New/Alternative Technologies
• Fiber channel hard drives
• Blu-ray discs
• Solid state storage
• Universal media.
• Keeping an eye on future technology: hardware, software, formats.
• Large capacity, reliable archive media.
Procedures/Best Practices
• Methods for migration of legacy information.
• Safeguards to ensure authenticity and version control of archived docs
• Practices and procedures. Digital is easy to change but hard to detect changes!
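One common way to make changes detectable is to record a cryptographic digest of each item when it is archived and to recompute it later. The sketch below is only an illustration of that idea: the choice of SHA-256, the function names and the sample byte strings are assumptions made here, not something the survey respondents specified.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest of the data with the digest recorded at archive time."""
    return sha256_digest(data) == recorded_digest


# Hypothetical document contents, for illustration only.
original = b"QC test report, revision 1"
recorded = sha256_digest(original)  # stored alongside the archived item

altered = b"QC test report, revision 2"

print(is_unchanged(original, recorded))  # True
print(is_unchanged(altered, recorded))   # False
```

A digest only shows that something changed, not what changed or who changed it, so it complements version control and access procedures rather than replacing them.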
Formats
• PDF for archiving
• Preservation media.
• Universal access tool.
• Preservation format.
• Format interconversion.
Types of data:
• Data files
• Microfilm
• Multimedia/web
• Imagery (Scanned, digital)
• Documents (mixed/compound, digital)
• Software
• Video
• Laboratory results (from equipment)
• Records
• Graphics/Drawings
• Support data
• Binary
• Binary – seismic
• Binary – well logs
• Text
• Audio
Bold = multiple hits
Capture and Collection
Absolute Maximum = 50
Very important = 5 points
Quite important = 4 points
Somewhat important = 3 points
Not especially important = 2 points
Not at all important = 1
[Bar chart: score out of 50 for each digital preservation issue: Accessing Old Data, Digitizing, Digital Conversion, Data Formats, Amount of Data]
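The scoring scheme above amounts to a weighted sum: the number of hits at each importance level multiplied by that level's points. The sketch below shows the arithmetic. The function name and the example hit counts are hypothetical and are not taken from the survey results.

```python
# Points per importance level, as given in the scoring legend.
POINTS = {
    "Very": 5,
    "Quite": 4,
    "Somewhat": 3,
    "Not especially": 2,
    "Not at all": 1,
}


def issue_score(hits):
    """Return the weighted score for one issue, given its hits per importance level."""
    unknown = set(hits) - set(POINTS)
    if unknown:
        raise ValueError(f"unknown importance level(s): {sorted(unknown)}")
    return sum(POINTS[level] * count for level, count in hits.items())


# Hypothetical hit counts for illustration only (not survey data).
example_hits = {"Very": 6, "Quite": 3, "Somewhat": 1}
print(issue_score(example_hits))  # 6*5 + 3*4 + 1*3 = 45
```

With ten hits all at "Very", the function returns 50, which matches the absolute maximum stated in the legend.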
Capture and Collection
The maximum number of hits per level of importance is 10.
The minimum number of hits per importance level is 0.
Amount of Data: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Capture and Collection
Data Formats: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Capture and Collection
Digital Format Conversion: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Capture and Collection
Digitization of Analog Data: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Capture and Collection
Accessing Old Data: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Storage Media
[Bar chart: score out of 50 for each digital preservation issue: Performance, Media Standards, Media Compatibility, Multiple Media Types, Media Capacity, Media Longevity, Media Technology]
Storage Media
Media Technology: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Storage Media
Media Longevity: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Storage Media
Media Capacity: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Storage Media
Multiple Media Formats: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Storage Media
Media Compatibility: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Storage Media
Media Standards: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Storage Media
Media/System Performance: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Data and Storage Management
[Bar chart: score out of 50 for each digital preservation issue: Software Tools, Data Management, Infrastructure, Data Format Stds., Migration Program]
Data and Storage Management
Data Migration Program: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Data and Storage Management
Data Format Standards: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Data and Storage Management
Infrastructure Management: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Data and Storage Management
Data Management: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Data and Storage Management
Software Tools: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Access and Distribution
[Bar chart: score out of 50 for each digital preservation issue: VPN Access, Internet Access, Times, Security]
Access and Distribution
Security, Multi-level Access: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Access and Distribution
Access and Retrieval Times: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Access and Distribution
Need to Provide Internet Access: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Access and Distribution
Virtual Private Network Access: Importance
[Bar chart: hits out of 10 at each importance level (Not at all, Not especially, Somewhat, Quite, Very)]
Thanks
Thanks to all who replied.
Survey creation:
Jerry McFaul, Ollie Slattery, Victor McCrary, Fred Byers, Xiao Tang, Rich Vining.