Creating Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards
Ruth Duerr, NSIDC
MiQun Yang, THG
Azhar Sikander, NSIDC
Choonghwan Lee, THG
Motivation
Technologies change regularly, organizations come and go, but data
must survive
But preserving data takes more than just preserving the bits: all the components
of an AIP are critical
Project Goals
• Prototype development of Archive Information Packages for HDF data:
  - For entire data sets
  - For individual “granules”
• Test usability of digital library standards with geospatial data
Metadata Standards - METS
• Metadata Encoding and Transmission Standard
• An initiative of the Digital Library Federation
• Provides the means to convey the metadata necessary for:
  - management of digital objects within a repository
  - exchange of objects between repositories (or between repositories and their users)
• Designed to facilitate:
  - shared development of information management tools/services
  - interoperable exchange of digital materials
METS - A very brief overview
• METS header: describes the METS document itself, e.g., creator or editor
• Descriptive metadata: describes the object using some external standard, e.g., MARC, FGDC, Dublin Core
• Administrative metadata: describes object creation, storage, intellectual property rights, source info, provenance, etc., e.g., PREMIS
• File section: provides an inventory of all of the files that are part of the object described
• Structural map: a physical or logical map of the organization of the materials described
• Structural links: allows specification of hyperlinks between parts of the map (mostly useful when preserving websites)
• Behavior section: used to associate executable code with parts of the content
ISO 19115 Geographic Information - Metadata
• Purpose
  - Characterize geographic data properly
  - Facilitate organization and management of metadata for geographic data
  - Enable users to efficiently use such data
  - Facilitate discovery, retrieval, and reuse
  - Enable data assessment
ISO 19115 entities
• Identification
• Constraints
• Data Quality
• Maintenance Information
• Spatial Representation
• Reference System
• Content Information
• Portrayal Catalogue Reference
• Distribution
• Metadata Extension Information
• Application Schema Information
Metadata Standards - PREMIS
• Provide a core preservation metadata set with broad applicability across the digital preservation community
• Developed by an OCLC and RLG sponsored international working group
  - Representatives from libraries, museums, archives, government, and the private sector
• Maintained by the Library of Congress
• Based on the OAIS reference model
Current Program Plan
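[Figure: program plan flow diagram. NSIDC/ECS HDF4 data is converted by H4toH5 to NetCDF4/HDF5 data (CDM/NetCDF4). NSIDC/ECS metadata is mapped by "ECS to METS (Data Set)" and "ECS to METS (Granule)" into METS with ISO 19115. The converted data and the METS metadata make up the HDF5-AIP.]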
HDF5 AIP Components
Data file (HDF5): NSIDC/ECS HDF4 data converted by H4toH5 to NetCDF4/HDF5 data
Metadata file (METS):
Primary Schema               Extension Schema
<mets>
|---<dmdSec>---------------- <ISO 19115>
|---<amdSec>
|     |--<techMD>
|     |--<rightsMD>          PREMIS
|     |--<sourceMD>
|----<fileGrp>
|----<structMap>
http://www.hdfgroup.uiuc.edu/papers/papers/AIP/HDF5_AIP_White_Paper.pdf
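The METS metadata file layout above can be sketched in code. The following is a minimal illustration only, not the project's implementation: the function name `build_mets_skeleton`, the helper names, the section IDs, the `USE`/`TYPE` attribute values, and the choice of `MDTYPE="OTHER"` with `OTHERMDTYPE="ISO19115"` are all assumptions made for the example. The `<xmlData>` elements are left empty where the ISO 19115 and PREMIS content would go.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Return the namespace-qualified METS tag name."""
    return f"{{{METS_NS}}}{tag}"


def add_md_section(parent, tag, section_id, mdtype, other=None):
    """Add a metadata section with an empty mdWrap/xmlData placeholder."""
    section = ET.SubElement(parent, q(tag), ID=section_id)
    attrs = {"MDTYPE": mdtype}
    if other:
        attrs["OTHERMDTYPE"] = other
    wrap = ET.SubElement(section, q("mdWrap"), attrs)
    ET.SubElement(wrap, q("xmlData"))  # extension-schema content goes here
    return section


def build_mets_skeleton(hdf5_name):
    """Build the skeleton from the slide: dmdSec, amdSec, fileGrp, structMap.

    All IDs and attribute values are hypothetical placeholders.
    """
    mets = ET.Element(q("mets"))

    # Descriptive metadata: ISO 19115 as the extension schema.
    add_md_section(mets, "dmdSec", "DMD1", "OTHER", other="ISO19115")

    # Administrative metadata: PREMIS as the extension schema.
    amd = ET.SubElement(mets, q("amdSec"), ID="AMD1")
    for tag in ("techMD", "rightsMD", "sourceMD"):
        add_md_section(amd, tag, f"{tag}1", "PREMIS")

    # File inventory: one HDF5 data file.
    file_sec = ET.SubElement(mets, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"), USE="data")
    entry = ET.SubElement(group, q("file"), ID="FILE1")
    ET.SubElement(
        entry, q("FLocat"),
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": hdf5_name},
    )

    # Structural map tying the file to its metadata sections.
    struct_map = ET.SubElement(mets, q("structMap"), TYPE="physical")
    div = ET.SubElement(struct_map, q("div"), DMDID="DMD1", ADMID="AMD1")
    ET.SubElement(div, q("fptr"), FILEID="FILE1")
    return mets


if __name__ == "__main__":
    tree = build_mets_skeleton("granule.h5")
    print(ET.tostring(tree, encoding="unicode"))
```

In METS the `<fileGrp>` element sits inside a `<fileSec>` wrapper, which the sketch adds even though the slide shows only `<fileGrp>`.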
HDF5 File Level Archive Information Packages
Metadata file (METS): the same primary schema (<mets> with <dmdSec>, <amdSec>, <fileGrp>, <structMap>) and extension schemas (ISO 19115, PREMIS) as under HDF5 AIP Components
Data Set Level Archive Information Package
[Figure: multiple HDF-AIPs and contextual information items grouped into one data set level AIP]
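One way to picture the grouping in the figure is as a simple container type. This is a hedged sketch of the idea only: the class names `FileLevelAIP`, `ContextualItem`, and `DataSetAIP`, their fields, and the `manifest` method are hypothetical and do not come from the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileLevelAIP:
    """One granule: an HDF5 data file plus its METS metadata file."""
    data_file: str
    metadata_file: str


@dataclass
class ContextualItem:
    """A piece of contextual information, e.g. a document or web page."""
    title: str
    location: str


@dataclass
class DataSetAIP:
    """A data set level AIP: file-level AIPs plus contextual information."""
    name: str
    granules: List[FileLevelAIP] = field(default_factory=list)
    context: List[ContextualItem] = field(default_factory=list)

    def manifest(self) -> List[str]:
        """List every file or location the package refers to, in insertion order."""
        paths: List[str] = []
        for granule in self.granules:
            paths.extend([granule.data_file, granule.metadata_file])
        paths.extend(item.location for item in self.context)
        return paths


if __name__ == "__main__":
    package = DataSetAIP(name="example data set")
    package.granules.append(FileLevelAIP("a.h5", "a.mets.xml"))
    package.context.append(ContextualItem("User guide", "guide.pdf"))
    print(package.manifest())
```

Because much contextual material applies to several data sets, a real design might reference shared contextual items rather than copy them into each package; the sketch does not address that.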
File Level AIP Activity Status
• Development of a map from NSIDC/ECS metadata to METS/PREMIS/ISO 19115 completed
• Implementation underway
• Issues
  - Auxiliary file handling: own AIP or not?
    o E.g., browse files, processing history, PGEs
    o Granules vs. files
  - Schema redundancy
Data Set AIP Activities Status
• Contextual information availability assessed for MODIS data
  - Currently GCSRLTA information requirements are being met
  - Much of the information is available via a variety of websites, many of which are dynamically updated
  - Format of the material varies widely
  - Some material should be considered geographic data sets in their own right
  - Much of the material applies to multiple data sets