North Carolina Geospatial Data Archiving Project: Workflow, Tools, and Resources

Jim Tuttle, Geospatial Data Librarian, NCSU Libraries

27 March 2006, Digital Preservation in State Government – Wilmington, NC

Project Overview

• Partnership between a university library (NCSU) and a state agency (NCCGIA, the North Carolina Center for Geographic Information and Analysis)

• One of eight projects in the first NDIIPP funding round: "Building a Network of Partners"

• Focus on state and local geospatial content in North Carolina

• Objective: engage existing state/federal geospatial data infrastructures in preservation

Content Complexity

• Multi-file objects

• Spatial databases

• Ancillary data files

• Time-versioning

• Diverse data sources/metadata practices

Workflow Overview

• Acquisition

• Format Migration

• Submission Information Package (SIP) Creation

• Ingest Metadata
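
The four stages above can be read as a linear pipeline in which each stage consumes and enriches the output of the previous one. A minimal Python sketch of that architecture, assuming each stage is a plain function that takes and returns a shared context dictionary; the stage functions and the `log` key are hypothetical placeholders, not the project's actual scripts:

```python
from typing import Any, Callable, Dict, List

# Hypothetical shared state handed from stage to stage.
Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def _record(ctx: Context, stage_name: str) -> Context:
    # Placeholder stages only log that they ran; real stages would do the work.
    ctx.setdefault("log", []).append(stage_name)
    return ctx


def acquisition(ctx: Context) -> Context:
    return _record(ctx, "acquisition")


def format_migration(ctx: Context) -> Context:
    return _record(ctx, "format_migration")


def sip_creation(ctx: Context) -> Context:
    return _record(ctx, "sip_creation")


def ingest_metadata(ctx: Context) -> Context:
    return _record(ctx, "ingest_metadata")


PIPELINE: List[Stage] = [acquisition, format_migration, sip_creation, ingest_metadata]


def run_pipeline(ctx: Context, stages: List[Stage] = PIPELINE) -> Context:
    """Run each stage in order, passing the (possibly enriched) context along."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Keeping each stage a separate function makes it possible to re-run a batch from the stage that failed instead of from acquisition.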

Acquisition: Workflow

• Collection creation/declaration

• File Manifest

• Metadata Seed File

• Transfer data to processing machine
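
A file manifest records exactly what was received before any processing begins. A minimal Python sketch, assuming a manifest is simply a sorted CSV of relative paths and byte sizes; the column names and function names are illustrative, not the project's manifest format:

```python
import csv
import os
from pathlib import Path
from typing import List, Tuple


def build_manifest(root: str) -> List[Tuple[str, int]]:
    """Return (relative POSIX path, size in bytes) for every file under root, sorted."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            rel = full.relative_to(root).as_posix()
            entries.append((rel, full.stat().st_size))
    return sorted(entries)


def write_manifest(entries: List[Tuple[str, int]], out_path: str) -> None:
    """Write the manifest as CSV with a header row (hypothetical layout)."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "bytes"])
        writer.writerows(entries)
```

Relative paths keep the manifest valid after the data is transferred to the processing machine.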

Acquisition: Tools and Resources

• PHP/PostgreSQL form

• Python automation scripting

• Threat analysis
  – ClamAV
  – Unix ‘file’ utility, e.g.:

    jjtuttle@dli:~/$ file putty
    putty: MS-DOS executable (EXE), OS/2 or MS Windows
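
The `putty` example shows why identification should rely on a file's contents rather than its name or extension. A minimal Python sketch of leading-byte ("magic number") sniffing in the spirit of `file`; the signature table is a small illustrative subset chosen here, not the `file` magic database, and it is no substitute for ClamAV:

```python
from typing import Optional

# Illustrative subset of leading-byte signatures (assumption, not exhaustive).
SIGNATURES = [
    (b"MZ", "MS-DOS/Windows executable"),
    (b"\x00\x00\x27\x0a", "ESRI shapefile main/index file"),  # file code 9994, big-endian
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF", "PDF document"),
]

# Types a geospatial data producer would not normally deliver (assumption).
SUSPICIOUS = {"MS-DOS/Windows executable"}


def sniff(path: str) -> Optional[str]:
    """Return a label for the first matching signature, or None if unknown."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None


def is_suspicious(path: str) -> bool:
    """Flag files whose contents identify them as a suspicious type."""
    return sniff(path) in SUSPICIOUS
```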

Acquisition: Tools and Resources

• MD5 checksum, e.g.:

    jjtuttle@dli:~/$ md5sum O-view.vsd
    69b3e2f6cff1537bd607f5522d0c5c4d  O-view.vsd

• JHOVE

• Format registries
  – PRONOM (UK National Archives)
  – GDFR (Harvard/Mellon)
  – Fred (LC)
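
The `md5sum` step can also be scripted so that digests are recorded at acquisition and re-checked later. A minimal Python sketch using the standard `hashlib` module; the function names are hypothetical, and the file is read in chunks so large rasters need not fit in memory:

```python
import hashlib
import os


def md5_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5sum_line(path: str) -> str:
    """Same layout as md5sum output: digest, two spaces, file name (base name here)."""
    return f"{md5_file(path)}  {os.path.basename(path)}"


def verify_fixity(path: str, expected_hex: str) -> bool:
    """Compare a freshly computed digest with one recorded at acquisition."""
    return md5_file(path) == expected_hex.strip().lower()
```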

Format Migration: Workflow

• On-receipt migration of selected formats

• Object-level metadata creation/augmentation

Format Migration: Tools and Resources

• Python batch process wrappers

• ArcCatalog metadata templates
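
A batch wrapper applies on-receipt migration only to the selected formats and keeps going when a single file fails. A minimal Python sketch, assuming converters are registered per file extension and each converter receives a source path plus a mirrored target path (it may change the target's suffix); the actual conversion tools are outside this sketch, and all names are hypothetical:

```python
import logging
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# A converter writes a migrated copy of `source` at (or near) `target`.
Converter = Callable[[Path, Path], None]


def batch_migrate(
    src_root: str, dest_root: str, converters: Dict[str, Converter]
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Convert every file whose lowercase suffix has a registered converter."""
    src, dest = Path(src_root), Path(dest_root)
    done: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        convert = converters.get(path.suffix.lower())
        if convert is None:
            continue  # format not selected for on-receipt migration
        target = dest / path.relative_to(src)  # mirror the source directory layout
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert(path, target)
            done.append(path)
        except Exception as exc:  # record the failure for manual review, then continue
            logging.warning("migration failed for %s: %s", path, exc)
            failed.append((path, str(exc)))
    return done, failed
```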

SIP Item Creation: Workflow

• Submission Information Package grouping
  – Ontology logic based on defined multi-file complex format components and directory structure

• Repository-agnostic item grouping
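
Grouping by format components and directory can be illustrated with the ESRI shapefile, whose parts share a base name. A minimal Python sketch, assuming a hypothetical component list for that one format (a real ontology would cover more formats and sidecar files); the result is plain lists of paths, so it stays independent of any particular repository:

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed component suffixes for one complex format (ESRI shapefile).
# The compound ".shp.xml" metadata sidecar is listed first so it is stripped whole.
SHAPEFILE_PARTS = (".shp.xml", ".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg")


def item_key(rel_path: str) -> str:
    """Directory plus base name with any known component suffix removed."""
    lower = rel_path.lower()
    for suffix in SHAPEFILE_PARTS:
        if lower.endswith(suffix):
            return rel_path[: -len(suffix)]
    return rel_path  # not a known component: the file is its own item


def group_items(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each item key to the sorted list of files that make up the item."""
    items: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        items[item_key(p)].append(p)
    return {key: sorted(files) for key, files in sorted(items.items())}
```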

SIP Item Creation: Tools and Resources

• Python scripts highly dependent on:
  – Explicit understanding of ontological relationships of complex format components
  – Logical directory structure as dictated by data-producer software

• Spreadsheet illustrating item assignment for manual review

• Automated revision of assignment based on spreadsheet modifications
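
The spreadsheet review loop amounts to exporting one row per file, letting a person edit the item column, and rebuilding the grouping from the edited file. A minimal Python sketch, assuming CSV as the spreadsheet interchange format and hypothetical column names `file` and `item`:

```python
import csv
from typing import Dict, List


def export_assignments(items: Dict[str, List[str]], csv_path: str) -> None:
    """Write one row per file so a reviewer can reassign it in a spreadsheet."""
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "item"])
        for item, files in items.items():
            for name in files:
                writer.writerow([name, item])


def import_assignments(csv_path: str) -> Dict[str, List[str]]:
    """Rebuild item groupings from the (possibly edited) spreadsheet."""
    items: Dict[str, List[str]] = {}
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            items.setdefault(row["item"].strip(), []).append(row["file"])
    return items
```

Because the revised grouping is read back from the file itself, a reviewer's edits are applied without anyone touching the scripts.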

Ingest Metadata: Workflow

• Extraction of elements from multiple sources

• Crosswalk metadata to archive ingest record (DSpace Qualified Dublin Core), METS, and external Workflow Management Database
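
A crosswalk maps elements from the producer's metadata to the archive's ingest record. A minimal Python sketch, assuming FGDC CSDGM input and DSpace's `dublin_core.xml` layout of `dcvalue` elements; the particular element-to-qualifier mapping is an illustrative assumption, not the project's crosswalk, and the METS and workflow database outputs are not covered:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed mapping: FGDC CSDGM path (relative to <metadata>) -> DC element, qualifier.
CROSSWALK: List[Tuple[str, str, str]] = [
    ("idinfo/citation/citeinfo/title", "title", "none"),
    ("idinfo/citation/citeinfo/origin", "contributor", "author"),
    ("idinfo/citation/citeinfo/pubdate", "date", "issued"),
    ("idinfo/descript/abstract", "description", "abstract"),
    ("idinfo/keywords/theme/themekey", "subject", "none"),
]


def fgdc_to_dspace_dc(fgdc_xml: str) -> str:
    """Return a DSpace-style <dublin_core> document built from FGDC metadata."""
    src = ET.fromstring(fgdc_xml)  # root element is expected to be <metadata>
    out = ET.Element("dublin_core")
    for path, element, qualifier in CROSSWALK:
        for node in src.findall(path):  # repeatable elements yield several values
            text = (node.text or "").strip()
            if text:
                dc = ET.SubElement(out, "dcvalue", element=element, qualifier=qualifier)
                dc.text = text
    return ET.tostring(out, encoding="unicode")
```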

Ingest Metadata: Tools and Resources

• Python XML libraries

• XSL/XSLT

• NOID (Nice Opaque Identifier) Persistent Identifier
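
NOID identifiers are opaque strings drawn from a reduced "betanumeric" alphabet and can end in a check character. A minimal Python sketch of minting and checking such identifiers, following the NOID Check Digit Algorithm (NCDA) as it is commonly described (sum of position times character ordinal, modulo 29); this is not the NOID tool itself, it should be verified against the reference implementation before use, and `mint` and the `99999/` prefix are hypothetical:

```python
import secrets

# Betanumeric alphabet: digits plus consonants, omitting 'l' and 'y' (29 characters).
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def ncda_check_char(identifier: str) -> str:
    """Sum position * ordinal (ordinal 0 for characters outside the alphabet), mod 29."""
    total = 0
    for position, ch in enumerate(identifier, start=1):
        ordinal = XDIGITS.find(ch)
        total += position * (ordinal if ordinal >= 0 else 0)
    return XDIGITS[total % len(XDIGITS)]


def append_check_char(identifier: str) -> str:
    return identifier + ncda_check_char(identifier)


def is_valid(identifier_with_check: str) -> bool:
    """True if the final character matches the NCDA check character of the rest."""
    if len(identifier_with_check) < 2:
        return False
    body, check = identifier_with_check[:-1], identifier_with_check[-1]
    return ncda_check_char(body) == check


def mint(prefix: str, length: int) -> str:
    """Hypothetical random minter: prefix + random betanumerics + check character."""
    body = "".join(secrets.choice(XDIGITS) for _ in range(length))
    return append_check_char(prefix + body)
```

Omitting vowels keeps minted strings from spelling words, and a prime modulus lets the check character catch any single-character substitution within the alphabet.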

Conclusion

• Plenty of free, open-source tools are available

• The robustness of an ingest process must be inversely proportional to the demands placed on data producers in preparation for ingest

• Finding the balance between cost-saving automation and the accuracy and flexibility of human intervention is difficult

For More Information

• NCGDAP – North Carolina Geospatial Data Archiving Project http://www.lib.ncsu.edu/ncgdap/

• NDIIPP – National Digital Information Infrastructure and Preservation Program http://www.digitalpreservation.gov/

• ClamAV http://www.clamav.net/

• Unix ‘file’ utility: ‘man file’

• JHOVE – JSTOR/Harvard Object Validation Environment http://hul.harvard.edu/jhove/

• PRONOM Format Registry http://www.nationalarchives.gov.uk/pronom/

• GDFR – Global Digital Format Registry (in planning) http://hul.harvard.edu/gdfr/

• Fred Format Registry (proof-of-concept) http://tom.library.upenn.edu/cgi-bin/fred?cmd=Default&sid=ca21d10e67b269a75a98fe369d2ab670

• XSLT – eXtensible Stylesheet Language Transformations http://www.w3.org/TR/xslt

• NOID – Nice Opaque IDentifier http://www.cdlib.org/inside/diglib/ark/

Jim Tuttle, Geospatial Data Librarian

jim_tuttle at ncsu dot edu