PAWN: A Novel Ingestion Workflow Technology for Digital Preservation

  • PAWN: A Novel Ingestion Workflow Technology for Digital Preservation

    Mike Smorul, Joseph JaJa, Yang Wang, Mike McGann, and Fritz McCall

  • Overall Principles

    Consistent with the Open Archival Information System (OAIS) model.

    Distributed, secure ingestion.

    Use of web/grid technologies; platform independent.

    Minimal client-side requirements.

    Ease of integration with archival storage or data grid systems.

  • Producer

    Provides data to an Archive based on a prior agreement.

    Consists of a management/metadata server and an ingestion client.

    Provides initial arrangement, context, and metadata.

  • Archive - receiving

    Receives data from a Producer.

    Validates bitstreams and metadata, and sends acknowledgement to the Producer.

    Arranges data into collections and specifies preservation policy.

    Publishes bitstreams into a digital archive.

  • Archive - Long-term preservation

    Implemented using grid technologies.

    Uses the existing prototype NARA/UMD/SDSC site.

    Automated replication and integrity checking.

    Enforces access control and preservation policy.

  • Ingestion Workflow

    Negotiate Submission Agreement.

    Workflow initialization and Submission Information Package (SIP) creation.

    Transfer of SIPs to archive.

    Validation of SIP transfer.

    Organization of data into collections and transfer into persistent archive.

  • Submission Agreement

    Based on data appraisal and record schedule, including format and metadata.

    Create a machine-actionable set of rules describing items.

    The final Submission Agreement is composed of:

        A METS document for application defaults.

        A METS Constraint document to limit the METS form to the submission parameters.

  • METS Overview

    Provides a framework for linking the structural organization of objects with metadata.

    Using XML namespaces, metadata from various XML schemas can be attached to objects, e.g., Dublin Core, FGDC, etc.

    Extensible for more complex metadata.

    http://www.loc.gov/standards/mets/

  • Sample METS Document

    [Figure: sample METS document, annotated with its metadata linking and structural organization sections]
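
    The figure itself did not survive in this transcript. As a rough illustration of the two areas it labels (metadata linking and structural organization), the following Python sketch builds a minimal METS document with the standard library. The element and attribute names come from the METS schema; the IDs, title, label, and file URL are hypothetical placeholders, and the result is a simplified sketch, not the sample shown on the slide.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialized XML uses readable names.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def q(ns, tag):
    """Return a namespace-qualified tag in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def build_sample_mets():
    mets = ET.Element(q(METS_NS, "mets"))

    # Metadata linking: a descriptive metadata section wrapping Dublin Core.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="dmd1")
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    ET.SubElement(xml_data, q(DC_NS, "title")).text = "Sample image"  # hypothetical

    # File section: one file with its location (hypothetical URL).
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    f = ET.SubElement(grp, q(METS_NS, "file"), ID="file1", MIMETYPE="image/jpeg")
    ET.SubElement(
        f,
        q(METS_NS, "FLocat"),
        {"LOCTYPE": "URL", q(XLINK_NS, "href"): "file:///data/sample.jpg"},
    )

    # Structural organization: a div that points at the file (FILEID)
    # and at its descriptive metadata (DMDID).
    smap = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(smap, q(METS_NS, "div"), LABEL="Photographs", DMDID="dmd1")
    ET.SubElement(div, q(METS_NS, "fptr"), FILEID="file1")
    return mets


if __name__ == "__main__":
    print(ET.tostring(build_sample_mets(), encoding="unicode"))
```

    The div in the structural map links to the file through FILEID and to the Dublin Core record through DMDID, which is the linking role the overview slide describes.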

  • Why METS Constraints?

    METS does not provide a way to create machine-interpretable rules describing a collection, e.g., allowing only JPEG files in certain structural areas.

    METS profiles allow for developer-interpretable rules, not machine-interpretable ones.

  • METS Constraints

    Allows structural, metadata, and file constraints.

    Structural constraints: restrict child divs and restrict pointers to div, file, and other METS documents.

    File constraints: restrict files by MIME type or validation tests.

    Metadata constraints: restrict the allowed metadata schemas.

  • METS Constraints - Template

  • METS Constraints - Rules
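
    The template and rule slides were figures and their content is not in this transcript. To make the idea of a machine-interpretable file constraint concrete, here is a minimal Python sketch of the "allow only JPEG files in certain structural areas" case. The rule format (a dict keyed by div TYPE), the sample document, and the function name are all assumptions made for illustration; they are not the PAWN METS Constraint document format.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical rule set: div TYPE -> MIME types allowed under that div.
ALLOWED_MIME_BY_DIV_TYPE = {"images": {"image/jpeg"}}

# Simplified, hypothetical METS fragment (file locations omitted).
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="f1" MIMETYPE="image/jpeg"/>
    <mets:file ID="f2" MIMETYPE="application/pdf"/>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="images">
      <mets:fptr FILEID="f1"/>
      <mets:fptr FILEID="f2"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def find_violations(mets_root, rules):
    """Return (div TYPE, file ID, MIME type) for each file that breaks a rule."""
    mime_by_id = {
        f.get("ID"): f.get("MIMETYPE")
        for f in mets_root.iterfind(".//mets:file", NS)
    }
    violations = []
    for div in mets_root.iterfind(".//mets:div", NS):
        allowed = rules.get(div.get("TYPE"))
        if allowed is None:
            continue  # no rule for this structural area
        for fptr in div.findall("mets:fptr", NS):
            file_id = fptr.get("FILEID")
            mime = mime_by_id.get(file_id)
            if mime not in allowed:
                violations.append((div.get("TYPE"), file_id, mime))
    return violations


if __name__ == "__main__":
    print(find_violations(ET.fromstring(SAMPLE), ALLOWED_MIME_BY_DIV_TYPE))
```

    In this sketch the PDF referenced from the "images" div is reported as a violation while the JPEG passes, which is the kind of check a receiving server could run without human interpretation.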

  • Initialize Ingestion Workflow

    Instantiate the Producer management server to track registered objects.

    Establish a working trust relationship with the Archive.

    Issue clients.

  • Create SIP

    Each client registers objects stored locally with the producer management server.

    Register file types, validation tests, etc.

    The client follows the rules in the Submission Agreement.

    Producer-wide agents can arrange registered objects to give a broader context.

  • SIP Example

    METS handles all areas of a SIP except the Physical Object and Descriptive Information.

    Descriptive Information can be embedded into METS as a third-party XML schema.

    [Figure: OAIS Information Package. Content Information (Physical Object, Representation Information); Preservation Description Information (Provenance, Fixity, Reference, Context); Packaging Information; Descriptive Information.]

  • Mapping SIP metadata to METS

    Packaging Information:

        The SIP only exists in its entirety during transit.

        METS FLocat sections allow mapping of metadata to the physical object at various stages in transit.

    Content Information:

        The Physical Object is encoded in an HTTP/tar stream.

        Representation Information points to validation services at an archive rather than to a viewer. Tests are assumed to be representative of viewers.

  • Mapping SIP metadata to METS (cont.)

    Preservation Description Information:

        Provenance: stacked file location tags.

        Context: provided by the structural map section.

        Reference: can be embedded in various descriptive metadata sections (Dublin Core, etc.).

        Fixity: provided by checksums in each file.
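
    One way to read "stacked file location tags" is that each stage of transit appends another FLocat element to the METS file element, so the ordered list of locations doubles as a provenance trail. The Python sketch below illustrates that reading; it is an interpretation, not the PAWN code. The helper names and URLs are hypothetical, while FLocat, LOCTYPE, xlink:href, CHECKSUM, and CHECKSUMTYPE are standard METS names.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def record_location(file_el, href):
    """Append a new FLocat to a METS file element; earlier ones are kept."""
    flocat = ET.SubElement(file_el, f"{{{METS_NS}}}FLocat")
    flocat.set("LOCTYPE", "URL")
    flocat.set(f"{{{XLINK_NS}}}href", href)
    return flocat


def location_history(file_el):
    """Return the recorded locations, oldest first."""
    return [
        loc.get(f"{{{XLINK_NS}}}href")
        for loc in file_el.findall(f"{{{METS_NS}}}FLocat")
    ]


if __name__ == "__main__":
    # Hypothetical file entry; the fixity value is carried as an attribute.
    file_el = ET.Element(
        f"{{{METS_NS}}}file",
        ID="file1",
        CHECKSUMTYPE="SHA-256",
        CHECKSUM="(digest of the file goes here)",
    )
    record_location(file_el, "file:///producer/client/scan001.jpg")
    record_location(file_el, "https://receiving.example.org/incoming/scan001.jpg")
    print(location_history(file_el))
```

    Because nothing is overwritten, the receiving server and the final archive can each add their own location without losing where the object came from.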

  • Client Interface

  • Transfer SIP to archive

    Retrieve the previously registered SIP from the producer management server.

    Authenticate to the archive.

    Update provenance information in the METS document with the file structure of the SIP.

    Transfer the METS document describing the SIP and the container for the SIP physical objects.

    The archive acknowledges transfer completion to the producer management server.
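
    The slides say the physical objects travel in a container encoded as an HTTP/tar stream, alongside the METS document. A minimal sketch of building such a container in memory with Python's standard tarfile module follows. The function names, the "objects/" path prefix, and the sample file names are assumptions; the actual PAWN container layout and transfer protocol are not described in this transcript.

```python
import io
import tarfile


def build_object_container(objects: dict[str, bytes]) -> bytes:
    """Pack the SIP's physical objects into an uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(objects.items()):
            info = tarfile.TarInfo(name=f"objects/{name}")  # hypothetical layout
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_container(container: bytes) -> list[str]:
    """Return the member names of a tar container, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(container), mode="r") as tar:
        return tar.getnames()


if __name__ == "__main__":
    container = build_object_container(
        {"scan002.jpg": b"second image bytes", "scan001.jpg": b"first image bytes"}
    )
    print(list_container(container))
```

    The resulting bytes could then be sent as the body of an HTTP request to the receiving server, with the METS document transferred separately so the archive can check the container against it.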

  • Validation of SIP transfer

    Check the incoming SIP against the constraint documents.

    Ensure object integrity by verifying checksums/cryptographic digests.

    Validate bitstreams against the tests described in the METS document.

    Update the METS document with validation results and the movement of objects on the receiving server.
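
    The integrity step above can be sketched as a comparison between digests recorded by the producer and digests recomputed on the receiving server. The Python example below uses SHA-256 from the standard hashlib module; the choice of algorithm, the function names, and the result labels ("ok", "mismatch", "missing") are assumptions for illustration, since the slides do not name the digest PAWN uses.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(received: dict, expected: dict) -> dict:
    """Compare received objects against the digests recorded for them.

    received maps object name -> bytes as they arrived;
    expected maps object name -> hex digest recorded by the producer.
    """
    results = {}
    for name, digest in expected.items():
        data = received.get(name)
        if data is None:
            results[name] = "missing"
        elif sha256_hex(data) == digest.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    original = {"scan001.jpg": b"first image bytes", "scan002.jpg": b"second image bytes"}
    expected = {name: sha256_hex(data) for name, data in original.items()}
    # Simulate one object being altered in transit.
    arrived = {"scan001.jpg": b"first image bytes", "scan002.jpg": b"corrupted"}
    print(verify_fixity(arrived, expected))
```

    The per-object results are the kind of validation outcome that could be written back into the METS document before the accept/reject message is returned.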

  • Final transfer to archive

    Transfer objects to the digital archive.

    Update provenance information in the METS document with the handle to the object in the archive.

    Transfer the METS document into the archive.

    Return accept/reject messages to the producer metadata server.

  • Component Overview

  • Producer Components

    Database to track registered objects.

    Certificate Authority management.

    Web service for archive security check.

    Management server supplies web service interfaces to ingestion clients and management operations.

    Clients are designed to be standalone, with security certificates issued by the producer.

  • Archive Components

    Receiving servers validate connecting clients and validate SIPs.

    Validation services are simple web service calls.

    Abstract I/O layer into the digital archive.

  • Recap

    Implemented using web technologies.

    Architecture independent.

    OAIS compliant.

    XML-based metadata.

    METS-based SIPs.

    Add-on constraints describing the Submission Agreement.

    FGDC - Federal Geographic Data Committee