

Producer-Archive Workflow Network (PAWN)

Goals

• Consistent with the Open Archival Information System (OAIS) model
• Use of web/grid technologies and platform independence
• Ease of integration with the current pilot system based on data grids
• XML representation of metadata and bitstreams
• Accountability of transfer and guarantee of data integrity

Project Members

Joseph JaJa

Mike Smorul

Yang Wang

Mike McGann

Fritz McCall

Chris Wambler

Gary Jackson

Tim Norris

[Figure: PAWN architecture. Labels: CRL check; success/failure notification of ingestion; METS document registration/retrieval; Producer Management Interface; Archive Management Interface; producer data suppliers; SIP transfer; Bitstream Validation Service; Archive Data Grid.]

Producer Components and Archive Components

• Database to track registered objects
• Certificate Authority management
• Management server supplies web service interfaces to ingestion clients and management operations.
• Clients are designed to be standalone, with security certificates issued by the producer.
• Receiving servers validate connecting clients and validate SIPs.
• Validation Services are simple web service calls.
• Abstract I/O layer into the digital archive.
• All components are scalable using standard load balancing techniques.

Secure Distributed Ingestion

• Distributed security management through multiple Certificate Authorities (CAs)
• Compatible with existing producer CAs
• SSL encrypted and authenticated connections
• Automatic Certificate Revocation List (CRL) checking
• Scalable using standard load balancing technology
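The connection requirements above (authenticated SSL connections with CRL checking) can be sketched with Python's standard `ssl` module. This is only an illustration, not PAWN's implementation: the function name, its parameters, and the idea of one PEM file per producer CA are assumptions made for the example.

```python
import ssl
from typing import Iterable, Optional


def make_receiving_server_context(
    server_cert: Optional[str] = None,
    server_key: Optional[str] = None,
    trusted_ca_files: Iterable[str] = (),
) -> ssl.SSLContext:
    """Build a TLS context that demands client certificates and checks CRLs.

    All names here are hypothetical; the file layout is an assumption.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Require every connecting client to present a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Check the client (leaf) certificate against the loaded CRLs.
    ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    # Each PEM file may carry a producer CA certificate and its CRL.
    for ca_file in trusted_ca_files:
        ctx.load_verify_locations(cafile=ca_file)
    # The server's own identity, if one was supplied.
    if server_cert is not None:
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    return ctx
```

With `VERIFY_CRL_CHECK_LEAF` set, a handshake fails when no CRL is available for the client certificate's issuer, so each trusted CA needs a current CRL loaded alongside it.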

[Figure: four Producer nodes and the Archive Data Grid.]

Ingestion Workflow

1. Negotiate Submission Agreement.
• Create an XML document describing the expected file formats, metadata, and layout of the submission

2. Workflow Initialization and Submission Information Packet (SIP) creation.
• Trust relationship between Archive and Producer is established
• Clients are issued certificates and register their data

3. Transfer of SIPs to archive.
• A Submission Information Packet is created on a client.
• Client contacts the archive and transfers the SIP

4. Validation of SIP transfer
• Metadata and bitstreams are checked for integrity against checksums
• All items are also checked against the requirements document
• Bitstreams are validated against the tests specified in the requirements document.

5. Organization of data and transfer into persistent archive.
• Metadata may be transformed into an optimal object format depending on digital archive requirements
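The integrity check in step 4 can be sketched as follows. The poster does not say which checksum algorithm PAWN uses or how the expected checksums are delivered, so SHA-256 and the `manifest` dictionary (relative file name to hex digest) are assumptions for illustration only.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large bitstreams are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_integrity_failures(sip_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the names of manifest entries that are missing or do not match.

    `manifest` is a hypothetical mapping of relative file name to expected
    SHA-256 hex digest; an empty result means the transfer verified cleanly.
    """
    failures = []
    for relative_name, expected in sorted(manifest.items()):
        candidate = sip_dir / relative_name
        if not candidate.is_file() or sha256_of(candidate) != expected.lower():
            failures.append(relative_name)
    return failures
```

A receiving server could report the returned list in its failure notification, or proceed to the format and requirements checks when the list is empty.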

Defining an Information Packet

PAWN uses the Metadata Encoding and Transmission Standard (METS) schema to describe the contents and metadata of a Submission Information Packet (SIP). Each client generates a SIP containing a METS XML document and bitstreams to transfer to an archive.

PAWN uses a template document based on METS combined with a set of rules that allow PAWN to enforce restrictions on how a SIP should arrive at an archive. These restrictions allow for the following types of control:

Structural

Limitations on the hierarchical ordering of the document can be enforced.

Format

Formats can be defined in a few ways, including required validation tests as defined by an archive, or simpler MIME types.

Metadata

Metadata can be restricted by schema to certain structural areas.
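As an illustration of the Format control, the sketch below reads a small METS document and reports files whose MIME type is not on an allowed list. The METS element and attribute names (`file`, `MIMETYPE`, `FLocat`, `structMap`) come from the METS schema; the sample document, the function name, and the representation of a rule as a plain set of MIME types are assumptions, since the poster does not describe PAWN's rule format.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical SIP description with one acceptable and one unacceptable file.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                      xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="f1" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="scans/page1.tif"/>
      </mets:file>
      <mets:file ID="f2" MIMETYPE="application/x-msdownload">
        <mets:FLocat LOCTYPE="URL" xlink:href="tools/setup.exe"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="collection">
      <mets:fptr FILEID="f1"/>
      <mets:fptr FILEID="f2"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def files_with_disallowed_format(mets_xml: str, allowed: Set[str]) -> List[Tuple[str, str]]:
    """Return (file ID, MIME type) pairs for files outside the allowed set."""
    root = ET.fromstring(mets_xml)
    rejected = []
    for file_el in root.iterfind(".//mets:file", METS_NS):
        mime = file_el.get("MIMETYPE", "")
        if mime not in allowed:
            rejected.append((file_el.get("ID", ""), mime))
    return rejected


rejected = files_with_disallowed_format(SAMPLE_METS, {"image/tiff", "text/xml"})
```

Here `rejected` holds only the entry for `f2`. A structural or metadata rule could be checked in the same style by walking `structMap` rather than `fileSec`.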

PAWN Client

Multiple PAWN clients run at each producer, and each client can independently register and transfer holdings to an archive.

Each client performs two functions: registering its holdings with a producer management server, and later transferring those holdings to an archive. During registration, a client notifies the management server about the holdings it wants to transfer to an archive and supplies metadata that it has harvested locally. After registration, the client creates a SIP and transfers it to an archive.

The two-step transfer process allows oversight at the producer. Between registration and submission of data, producer-wide context may be attached to holdings.
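The two-step process can be modeled with a small in-memory sketch: holdings are registered first, the producer may attach context while they wait, and a SIP is built only from registered holdings. Every class, method, and field name below is hypothetical; the real management server is a web service whose interface is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Holding:
    name: str
    harvested_metadata: Dict[str, str]
    producer_context: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False


class ManagementServer:
    """Hypothetical in-memory stand-in for the producer management server."""

    def __init__(self) -> None:
        self.registry: Dict[str, Holding] = {}

    def register(self, holding: Holding) -> None:
        # Step 1: the client announces a holding and its local metadata.
        self.registry[holding.name] = holding

    def attach_context(self, name: str, key: str, value: str) -> None:
        # Producer oversight happens between registration and submission.
        holding = self.registry[name]
        if holding.submitted:
            raise ValueError("context must be attached before submission")
        holding.producer_context[key] = value


def build_sip(server: ManagementServer, names: List[str]) -> List[Dict[str, str]]:
    """Step 2: merge local and producer-wide metadata for registered holdings."""
    sip = []
    for name in names:
        holding = server.registry[name]  # raises KeyError if never registered
        sip.append({"name": name, **holding.harvested_metadata, **holding.producer_context})
        holding.submitted = True
    return sip
```

Keeping registration separate from SIP creation is what gives the producer a window to review holdings and add collection-level context before anything leaves the site.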