Producer-Archive Workflow Network (PAWN)
Goals
• Consistent with the Open Archival Information System (OAIS) model
• Use of web/grid technologies and platform independence
• Ease of integration with the current pilot system based on data grids
• XML representation of metadata and bitstreams
• Accountability of transfer and guarantee of data integrity
Project Members
Joseph JaJa
Mike Smorul
Yang Wang
Mike McGann
Fritz McCall
Chris Wambler
Gary Jackson
Tim Norris
[Figure: PAWN architecture diagram. Labels: Producer data suppliers; Producer Management Interface; Archive Management Interface; METS document registration/retrieval; SIP transfer; CRL check; Bitstream Validation Service; Success/Failure notification of ingestion; Archive Data Grid]
Producer Components and Archive Components
• Database to track registered objects
• Certificate Authority management
• Management server supplies web service interfaces to ingestion clients and management operations
• Clients are designed to be standalone, with security certificates issued by the producer
• Receiving servers validate connecting clients and validate SIPs
• Validation services are simple web service calls
• Abstract I/O layer into the digital archive
• All components are scalable using standard load balancing techniques
Secure Distributed Ingestion
• Distributed security management through multiple Certificate Authorities (CAs)
• Compatible with existing producer CAs
• SSL encrypted and authenticated connections
• Automatic Certificate Revocation List (CRL) checking
• Scalable using standard load balancing technology
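The poster does not say how the receiving servers are implemented. As a minimal sketch of the ideas in this list (authenticated connections, producer CAs, CRL checking), the following uses Python's standard `ssl` module. The function name `make_receiving_context` and all of its parameters are hypothetical and are not part of PAWN.

```python
import ssl

def make_receiving_context(ca_bundle=None, crl_file=None,
                           server_cert=None, server_key=None):
    """Build a TLS context for a hypothetical receiving server.

    All file paths are assumptions; they are optional so the
    configuration can be inspected without real certificates.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Connecting clients must present a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Check the client (leaf) certificate against the loaded CRLs.
    ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    if ca_bundle:
        # One or more producer CA certificates in PEM format.
        ctx.load_verify_locations(cafile=ca_bundle)
    if crl_file:
        # CRLs are loaded through the same call as CA certificates.
        ctx.load_verify_locations(cafile=crl_file)
    if server_cert:
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    return ctx
```

With `VERIFY_CRL_CHECK_LEAF` set, a handshake fails unless a CRL for the issuing CA has been loaded, so a real deployment would need to refresh the CRL file regularly.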
[Figure: four Producer nodes and the Archive Data Grid]
Ingestion Workflow
1. Negotiate Submission Agreement.
   • Create an XML document describing the expected file formats, metadata, and layout of the submission
2. Workflow Initialization and Submission Information Packet (SIP) creation.
   • Trust relationship between Archive and Producer is established
   • Clients are issued and register data
3. Transfer of SIPs to archive.
   • A Submission Information Packet is created on a client
   • Client contacts the archive and transfers the SIP
4. Validation of SIP transfer.
   • Metadata and bitstreams are checked for integrity against checksums
   • All items are also checked against the requirements document
   • Bitstreams are validated against the tests specified in the requirements document
5. Organization of data and transfer into persistent archive.
   • Metadata may be transformed into an optimal object format depending on digital archive requirements
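The integrity check in step 4 can be illustrated with a short sketch. It assumes a SIP is represented as a mapping from bitstream names to bytes, plus a manifest of expected checksums. The names `checksum` and `validate_sip`, the data layout, and the choice of SHA-256 are all assumptions made for illustration; the poster does not name a checksum algorithm.

```python
import hashlib

def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of a bitstream."""
    return hashlib.new(algorithm, data).hexdigest()

def validate_sip(bitstreams: dict, manifest: dict,
                 algorithm: str = "sha256") -> list:
    """Compare received bitstreams with the expected checksums.

    Returns a list of problem descriptions; an empty list means
    every registered item arrived intact and nothing extra was sent.
    """
    problems = []
    for name, expected in manifest.items():
        if name not in bitstreams:
            problems.append(f"missing: {name}")
        elif checksum(bitstreams[name], algorithm) != expected.lower():
            problems.append(f"checksum mismatch: {name}")
    for name in bitstreams:
        if name not in manifest:
            problems.append(f"unregistered: {name}")
    return problems
```

A receiving server could use the returned list to build the success or failure notification sent back to the producer.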
Defining an Information Packet
PAWN uses the Metadata Encoding and Transmission Standard (METS) schema to describe the contents and metadata of a Submission Information Packet (SIP). Each client generates a SIP containing a METS XML document and the bitstreams to transfer to an archive.
PAWN uses a template document based on METS combined with a set of rules that allow PAWN to enforce restrictions on how a SIP should arrive at an archive. These restrictions allow for the following types of control:
Structural
Limitations on the hierarchical ordering of a document can be enforced
Format
Formats can be defined in several ways, including required validation tests defined by an archive, or simpler MIME types
Metadata
Metadata can be restricted by schema to certain structural areas
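As an illustration of the simplest form of format control (a MIME type restriction), the sketch below scans the file entries of a METS document and reports any whose `MIMETYPE` attribute is not on an allowed list. The sample document is a trimmed fragment rather than a schema-valid METS file, and the function `disallowed_files` and the allowed-type set are hypothetical; PAWN's actual template and rule format are not shown in the source.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Trimmed METS fragment for illustration only (not schema-complete).
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="f1" MIMETYPE="image/tiff"/>
      <mets:file ID="f2" MIMETYPE="application/x-msdownload"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

def disallowed_files(mets_xml: str, allowed_mime_types: set) -> list:
    """Return (file ID, MIME type) pairs that break the format rule."""
    root = ET.fromstring(mets_xml)
    bad = []
    for file_el in root.iterfind(".//mets:file", METS_NS):
        mime = file_el.get("MIMETYPE")
        if mime not in allowed_mime_types:
            bad.append((file_el.get("ID"), mime))
    return bad
```

Structural and metadata restrictions would need richer rules, for example checks on the METS structural map or on which metadata sections may appear where.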
PAWN Client
Multiple PAWN clients run at each producer, and each client can independently register and transfer holdings to an archive.
Each client performs two functions: registering its holdings with a producer management server, and later transferring those holdings to an archive. During registration, a client notifies the management server about the holdings it wants to transfer to an archive and supplies locally harvested metadata. After registration, the client creates a SIP and transfers it to the archive.
The two-step transfer process allows oversight at the producer. Between registration and submission of data, producer-wide context may be attached to holdings.
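The register-then-transfer sequence can be modeled as a small state machine. This is only a sketch under stated assumptions: the classes `Holding` and `ManagementServer`, the function `transfer`, and the state names are hypothetical and do not reflect PAWN's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Holding:
    name: str
    metadata: dict = field(default_factory=dict)  # locally harvested
    state: str = "new"  # new -> registered -> transferred

class ManagementServer:
    """Hypothetical producer management server tracking registrations."""

    def __init__(self):
        self.registry = {}

    def register(self, holding: Holding) -> None:
        holding.state = "registered"
        self.registry[holding.name] = holding

    def attach_context(self, name: str, **context) -> None:
        # Producer-wide context added between registration and transfer.
        self.registry[name].metadata.update(context)

def transfer(holding: Holding) -> dict:
    """Build a SIP-like record; only registered holdings may be sent."""
    if holding.state != "registered":
        raise ValueError(f"{holding.name} is not registered")
    holding.state = "transferred"
    return {"name": holding.name, "metadata": dict(holding.metadata)}
```

Refusing to transfer an unregistered holding is what gives the producer its point of oversight in this model.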