27 March 2006, Digital Preservation in State Government, Wilmington, NC
NCSU Libraries
North Carolina Geospatial Data Archiving Project
Workflow, Tools, and Resources
Jim Tuttle, Geospatial Data Librarian
Project Overview
• Partnership between university library (NCSU) and state agency (NCCGIA)
• One of eight projects in the first NDIIPP funding round: "Building a Network of Partners"
• Focus on state and local geospatial content in North Carolina
• Objective: engage existing state/federal geospatial data infrastructures in preservation
Content Complexity
• Multi file objects
• Spatial databases
• Ancillary data files
• Time-versioning
• Diverse data sources/metadata practices
Workflow Overview
• Acquisition
• Format Migration
• Submission Information Package (SIP) Creation
• Ingest Metadata
Acquisition: Workflow
• Collection creation/declaration
• File Manifest
• Metadata Seed File
• Transfer data to processing machine
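The file manifest step can be sketched in Python as below. This is a minimal illustration, not the project's actual script: the function names `md5_of` and `build_manifest` and the manifest fields (`path`, `bytes`, `md5`) are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """List every file under root with its relative path, size, and MD5.

    The field names are hypothetical; a real manifest would follow
    whatever layout the collection declaration calls for.
    """
    root = Path(root)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "md5": md5_of(path),
        })
    return rows
```

Recording the checksum at acquisition time lets the same manifest be re-verified after the data is transferred to the processing machine.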
Acquisition: Tools and Resources
• PHP/PostgreSQL form
• Python automation scripting
• Threat analysis
  – ClamAV
  – Unix ‘file’ utility
    jjtuttle@dli:~/$ file putty
    putty: MS-DOS executable (EXE), OS/2 or MS Windows
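The kind of check the ‘file’ example performs can be approximated in pure Python by inspecting leading bytes. The sketch below is only an illustration of the idea: it knows two signatures, whereas the real ‘file’ utility consults a large magic database, and the names `EXECUTABLE_SIGNATURES`, `sniff_executable`, and `flag_suspect_files` are made up for this example.

```python
from pathlib import Path

# A deliberately tiny, illustrative set of leading-byte signatures.
EXECUTABLE_SIGNATURES = {
    b"MZ": "DOS/Windows executable",
    b"\x7fELF": "ELF executable",
}


def sniff_executable(path):
    """Return a label if the file starts with a known signature, else None."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for signature, label in EXECUTABLE_SIGNATURES.items():
        if head.startswith(signature):
            return label
    return None


def flag_suspect_files(root):
    """Map relative path -> label for every file that looks executable."""
    root = Path(root)
    flagged = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            label = sniff_executable(path)
            if label is not None:
                flagged[path.relative_to(root).as_posix()] = label
    return flagged
```

Checking content rather than file extension matters here because an executable delivered under an unexpected name would otherwise pass unnoticed into a data collection.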
Acquisition: Tools and Resources (continued)
• MD5 checksum
  – jjtuttle@dli:~/$ md5sum O-view.vsd
  – 69b3e2f6cff1537bd607f5522d0c5c4d O-view.vsd
• JHOVE
• Format registries
  – PRONOM (UK National Archives), GDFR (Harvard/Mellon), Fred (LC)
Format Migration: Workflow
• On-receipt migration of selected formats
• Object-level metadata creation/augmentation
Format Migration: Tools and Resources
• Python batch process wrappers
• ArcCatalog metadata templates
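A batch process wrapper of the kind listed above might look like the following sketch. The slides do not say which converter or target format the project used; the GDAL `ogr2ogr` command, the shapefile-to-GML pairing, and the function names are all assumptions chosen purely for illustration.

```python
import subprocess
from pathlib import Path


def plan_migrations(root, out_dir, src_suffix=".shp", dst_suffix=".gml", driver="GML"):
    """Build one converter command per matching source file.

    Assumption: ogr2ogr is the converter, invoked as
    `ogr2ogr -f <driver> <destination> <source>`.
    """
    root, out_dir = Path(root), Path(out_dir)
    plan = []
    for src in sorted(root.rglob("*" + src_suffix)):
        dst = out_dir / src.relative_to(root).with_suffix(dst_suffix)
        plan.append(["ogr2ogr", "-f", driver, str(dst), str(src)])
    return plan


def run_migrations(plan):
    """Run each planned command and collect failures instead of stopping.

    Requires the converter to be installed; a missing executable raises
    FileNotFoundError from subprocess.run.
    """
    failures = []
    for command in plan:
        # command[3] is the destination path; make sure its folder exists.
        Path(command[3]).parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((command, result.stderr.strip()))
    return failures
```

Separating planning from execution lets the planned commands be reviewed or logged before any file is converted.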
SIP Item Creation: Workflow
• Submission Information Package grouping
  – Ontology logic based on defined multi-file complex format components and directory structure
• Repository-agnostic item grouping
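As a rough illustration of grouping by format components and directory structure, the sketch below gathers the sidecar files of an ESRI shapefile into one item. The component list, the choice to name each item after its `.shp` file, and the function names are assumptions for this example; the project's own ontology covers more formats than this.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Common shapefile component suffixes; the longest is listed first so that
# "roads.shp.xml" is not mistaken for a file ending in ".xml" alone.
SHAPEFILE_PARTS = (".shp.xml", ".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx")


def split_component(name):
    """Return (base name, component suffix), or None if not a known component."""
    lowered = name.lower()
    for suffix in SHAPEFILE_PARTS:
        if lowered.endswith(suffix):
            return name[: -len(suffix)], suffix
    return None


def group_items(relative_paths):
    """Group relative file paths into items keyed by a representative path.

    Components sharing a directory and base name form one item; any other
    file becomes a single-file item.
    """
    groups = defaultdict(list)
    for rel in sorted(relative_paths):
        path = PurePosixPath(rel)
        parts = split_component(path.name)
        if parts is None:
            key = rel
        else:
            key = (path.parent / (parts[0] + ".shp")).as_posix()
        groups[key].append(rel)
    return dict(groups)
```

The result is a plain dictionary of file lists, so it carries no assumption about the target repository.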
SIP Item Creation: Tools and Resources
• Python scripts highly dependent on:
  – Explicit understanding of ontological relationships of complex format components
  – Logical directory structure as dictated by data-producer software
• Spreadsheet illustrating item assignment for manual review
• Automated revision of assignment based on spreadsheet modifications
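The spreadsheet review loop can be sketched with CSV as a stand-in for the spreadsheet. The two-column layout (`item`, `file`) and the function names are assumptions for this illustration; the idea is that a reviewer edits the `item` column and the script rebuilds the grouping from whatever the sheet then says.

```python
import csv


def write_assignments(groups, handle):
    """Write one row per file, showing which item it was assigned to."""
    writer = csv.writer(handle)
    writer.writerow(["item", "file"])
    for item in sorted(groups):
        for member in groups[item]:
            writer.writerow([item, member])


def read_assignments(handle):
    """Rebuild the item grouping from a (possibly hand-edited) sheet."""
    groups = {}
    for row in csv.DictReader(handle):
        groups.setdefault(row["item"], []).append(row["file"])
    return groups
```

For example, changing the `item` value on a stray documentation file's row to the name of a neighboring dataset would fold that file into the dataset's item on the next read.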
Ingest Metadata: Workflow
• Extraction of elements from multiple sources
• Crosswalk metadata to archive ingest record (DSpace Qualified Dublin Core), METS, and external Workflow Management Database
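A crosswalk of this kind can be sketched as a lookup table from source element paths to target fields. The example below assumes the source record is FGDC CSDGM XML (the form ArcCatalog commonly produces) and maps four elements to DSpace-style qualified Dublin Core names; the specific mapping and the function name `crosswalk` are illustrative assumptions, not the project's actual crosswalk.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: FGDC element path (relative to <metadata>) -> DC field.
CROSSWALK = {
    "idinfo/citation/citeinfo/title": "dc.title",
    "idinfo/citation/citeinfo/origin": "dc.contributor.author",
    "idinfo/citation/citeinfo/pubdate": "dc.date.issued",
    "idinfo/descript/abstract": "dc.description.abstract",
}


def crosswalk(fgdc_xml):
    """Return a list of (dc_field, value) pairs extracted from an FGDC record.

    Values are copied as-is; no date normalization is attempted.
    """
    root = ET.fromstring(fgdc_xml)
    record = []
    for source_path, target in CROSSWALK.items():
        for element in root.findall(source_path):
            text = (element.text or "").strip()
            if text:
                record.append((target, text))
    return record
```

Keeping the mapping in a table rather than in code makes it easier to maintain parallel crosswalks to METS or to a workflow database.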
Ingest Metadata: Tools and Resources
• Python XML libraries
• XSL/XSLT
• NOID (Nice Opaque Identifier) persistent identifiers
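NOID identifiers can carry a trailing check character that catches transcription errors. The sketch below follows the check-character scheme as described in the NOID documentation (position-weighted sum over a 29-character alphabet); it is an independent Python illustration, not the NOID tool itself, and the function name is an assumption.

```python
# NOID's "betanumeric" alphabet: digits plus consonants, minus 'l' and 'y'.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def noid_check_char(identifier):
    """Compute the check character for an identifier string.

    Each character's alphabet index is multiplied by its 1-based position;
    characters outside the alphabet (such as "/") count as zero. The sum
    modulo 29 selects the check character.
    """
    total = 0
    for position, char in enumerate(identifier, start=1):
        value = ALPHABET.find(char)  # -1 when the character is not in the alphabet
        total += position * max(value, 0)
    return ALPHABET[total % len(ALPHABET)]
```

For the identifier `13030/xf93gt2` the weighted sum is 891, and 891 mod 29 is 21, which selects `q`, giving the full identifier `13030/xf93gt2q`.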
Conclusion
• Plenty of free, open source tools
• The robustness of an ingest process must be inversely proportional to the demands placed on data producers in preparation for ingest: the less producers are asked to prepare, the more the ingest process must handle
• Finding the balance between cost-saving automation and the accuracy and flexibility of human intervention is difficult
For More Information
• NCGDAP – North Carolina Geospatial Data Archiving Project http://www.lib.ncsu.edu/ncgdap/
• NDIIPP – National Digital Information Infrastructure and Preservation Program http://www.digitalpreservation.gov/
• ClamAV http://www.clamav.net/
• Unix File utility: ‘man file’
• JHOVE – JSTOR/Harvard Object Validation Environment http://hul.harvard.edu/jhove/
• PRONOM Format Registry http://www.nationalarchives.gov.uk/pronom/
• GDFR – Global Digital Format Registry (in planning) http://hul.harvard.edu/gdfr/
• Fred Format Registry (proof-of-concept) http://tom.library.upenn.edu/cgi-bin/fred?cmd=Default&sid=ca21d10e67b269a75a98fe369d2ab670
• XSLT – eXtensible Stylesheet Language Transformations http://www.w3.org/TR/xslt
• NOID – Nice Opaque IDentifier http://www.cdlib.org/inside/diglib/ark/
Jim Tuttle, Geospatial Data Librarian
jim_tuttle at ncsu dot edu