
Persistent Digital Archives and Library System (PeDALS)

Dennis Bitterlich, Electronic Records Archivist

What is PeDALS?

• A grant-funded, multi-state project financed by the Library of Congress, through its National Digital Information Infrastructure and Preservation Program (NDIIPP), and the Institute of Museum and Library Services

• Includes five state partners: Arizona, Florida, New York, South Carolina and Wisconsin, with Arizona as the lead partner

• Project will run 18 months, until the middle of 2009; if successful, the Wisconsin Historical Society (WHS) intends to continue participation beyond this period

• At the end of the project each partner will have a functioning electronic records repository

Why is PeDALS Needed?

• An increasing number of state government records of long-term value are created in electronic-only format

• Due to the large and increasing volume of electronic records in varied formats, traditional appraisal and acquisition practices are no longer effective; an automated, rules-based system like PeDALS is one possible response to this new reality

• PeDALS is not an electronic records management system, but rather a way to acquire electronic records already scheduled for transfer

• PeDALS is both a learning opportunity and a chance to implement a functioning system

Goals of the Project

• Develop a methodology to support an automated, integrated workflow to process collections of electronic records

• Implement an inexpensive storage system that can preserve the integrity and authenticity of electronic records over time

• Remove barriers to adoption by keeping costs of the system as low as possible

• Work with the Wisconsin Document Depository Program to develop ways to integrate state agency publications in digital format into PeDALS processes; since 2005 the Depository has worked to preserve e-publications acquired from state websites

PeDALS Open Archival Information System (OAIS) Network Architecture

Submission Information Package (SIP) and Archival Information Package (AIP)

• SIP: Agency records with associated metadata are transferred to the PeDALS system

• Initial checks for authenticity, integrity, restrictions, and any viruses or malware

• AIP: Rules-based software will transform records into a format for long-term storage
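The integrity check on an incoming SIP can be illustrated with a checksum comparison. This is a minimal sketch in Python, not the PeDALS implementation; it assumes the agency supplies a SHA-256 digest in the SIP metadata, and the dictionary layout and function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the record content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def check_integrity(content: bytes, expected_digest: str) -> bool:
    """True when the received bytes match the digest supplied with the SIP."""
    return sha256_hex(content) == expected_digest.lower()


# Hypothetical SIP: record bytes plus metadata supplied by the agency.
record = b"Minutes of the board meeting"
sip = {"content": record, "metadata": {"sha256": sha256_hex(record)}}

# An unchanged record passes; a record altered in transit fails.
intact = check_integrity(sip["content"], sip["metadata"]["sha256"])
tampered = check_integrity(sip["content"] + b"!", sip["metadata"]["sha256"])
```

A failed comparison would normally stop ingest for that record so the agency can resend it.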

Lots of Copies Keep Stuff Safe (LOCKSS)

http://www.lockss.org/lockss/Home

• Records are transferred into LOCKSS servers for long-term preservation

• LOCKSS is a data storage system that scans for and repairs file corruption and other data integrity problems

• Hardened firewalls and geographic distribution provide added security
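The idea of keeping several copies and repairing a damaged one can be sketched as a majority vote over checksums. This is a simplified illustration only and is not the LOCKSS protocol itself; the site names and the `audit_and_repair` function are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 digest used to compare copies without comparing full content."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Overwrite copies that disagree with the majority; return their names."""
    digests = {name: digest(data) for name, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority among copies; manual review needed")
    good = next(data for name, data in replicas.items() if digests[name] == winner)
    repaired = [name for name in replicas if digests[name] != winner]
    for name in repaired:
        replicas[name] = good
    return repaired


# Three hypothetical storage sites; the copy at site_b has been corrupted.
replicas = {
    "site_a": b"record v1",
    "site_b": b"recorX v1",
    "site_c": b"record v1",
}
repaired = audit_and_repair(replicas)
```

With only two copies a disagreement cannot be resolved by voting, which is one reason for holding more copies.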

Dissemination Information Package (DIP)

• Web server will provide Internet access to records through a web-based search interface

• Access to records restricted by statute or otherwise will be blocked during restriction period

• Records scheduled for transfer, but not access, are held in the electronic archive, but no user copy is sent to the web server until public access is allowed
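The rule that restricted records stay in the archive and reach the web server only after the restriction ends can be sketched as a date filter. This is a minimal sketch under stated assumptions: the `Record` type and its fields are hypothetical, and `restricted_until` is taken to be the first day on which public access is allowed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    identifier: str
    restricted_until: Optional[date]  # None means the record is unrestricted


def is_public(record: Record, today: date) -> bool:
    """A record is public if unrestricted or its restriction has ended."""
    return record.restricted_until is None or today >= record.restricted_until


def select_for_web_server(records: list[Record], today: date) -> list[str]:
    """Identifiers of records whose user copies may be sent to the web server."""
    return [r.identifier for r in records if is_public(r, today)]


archive = [
    Record("open-1", None),
    Record("closed-1", date(2030, 1, 1)),
]
during_restriction = select_for_web_server(archive, date(2009, 6, 30))
after_restriction = select_for_web_server(archive, date(2030, 1, 1))
```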

Microsoft BizTalk Overview

BizTalk is a middleware application that, at its core, is an XML message queue. It will:

Receive objects → Convert and perform logic on objects → Send objects

All steps are completed by BizTalk using XML
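The receive, convert, send flow can be illustrated outside BizTalk with a few lines of Python. This sketch does not use any BizTalk API; the element names (`record`, `ArchiveRecord`) and the in-memory `outbox` list are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def receive(raw_xml: str) -> ET.Element:
    """Receive: parse the incoming XML message."""
    return ET.fromstring(raw_xml)


def transform(message: ET.Element) -> ET.Element:
    """Convert: map the incoming fields onto the target message layout."""
    out = ET.Element("ArchiveRecord")
    ET.SubElement(out, "Title").text = message.findtext("title", default="").strip()
    ET.SubElement(out, "Series").text = message.findtext("series", default="").strip()
    return out


def send(message: ET.Element, outbox: list) -> None:
    """Send: serialize the message and hand it to the destination."""
    outbox.append(ET.tostring(message, encoding="unicode"))


outbox = []
incoming = "<record><title> Annual Report </title><series>1234</series></record>"
send(transform(receive(incoming)), outbox)
```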

BizTalk Pipelines

Pipelines

• Connections between systems

– Connect BizTalk to databases

– Connect BizTalk to web

– Connect BizTalk to file servers

– Connect BizTalk to programs

BizTalk Business Rules

Business rules

– BizTalk speak for high-level processes that determine what orchestrations will be performed

– If a record series is confidential or restricted, then go to the orchestration that populates restrictions
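The confidential-or-restricted rule above can be sketched as a small table of conditions, each paired with the orchestration it triggers. This is plain Python for illustration, not BizTalk rule syntax; the field names and the orchestration name `populate_restrictions` are hypothetical.

```python
# Each rule pairs a condition on the record series with an orchestration name.
RULES = [
    (
        lambda series: bool(series.get("confidential") or series.get("restricted")),
        "populate_restrictions",
    ),
]


def orchestrations_for(series: dict) -> list[str]:
    """Return the orchestrations whose rule condition holds for this series."""
    return [name for condition, name in RULES if condition(series)]


restricted_series = {"series_id": "1234", "restricted": True}
open_series = {"series_id": "5678"}

restricted_steps = orchestrations_for(restricted_series)
open_steps = orchestrations_for(open_series)
```

Keeping rules as data means a new rule is one more entry in the table rather than a change to the processing logic.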

BizTalk Orchestrations

Orchestrations

– BizTalk speak for the logic to process objects

– Build in logic to calculate length of restrictions and database fields to populate
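The restriction calculation can be sketched as follows. This is an illustration in Python rather than a BizTalk orchestration; it assumes a restriction is expressed as a whole number of years from the record's creation date, and the database field names are hypothetical.

```python
from datetime import date


def restriction_end(created: date, years: int) -> date:
    """First day of public access: the creation date plus the restriction length."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 1 March.
        return created.replace(year=created.year + years, month=3, day=1)


def restriction_fields(created: date, years: int) -> dict:
    """Values for the (hypothetical) restriction columns in the database."""
    return {
        "restricted": "Y",
        "restriction_start": created.isoformat(),
        "restriction_end": restriction_end(created, years).isoformat(),
    }


fields = restriction_fields(date(2008, 6, 1), 50)
leap_day_end = restriction_end(date(2008, 2, 29), 75)
```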

Initial BizTalk Development Goals & Objectives

1 – Write ARCAT BizTalk Code pipeline

– Series already cataloged

– Reduced duplication of work & manual data entry

– Pipeline will work for CGI/BIN Web Service

– Copy programming code to create next pipelines

2 – Write Web Services BizTalk Code pipeline

– Copied from CGI/BIN ARCAT Service pipeline

– Generic HTTP pipeline to Agencies Web Pages

– Can use for PeDALS “Drop Box”

Initial BizTalk Development Goals & Objectives

3 – Write DHS BizTalk Code pipeline

– Code copied from prior pipelines

– Connect to a database

– Solve issues related to external networks

4 – Write DWD BizTalk Code pipeline

– Connect to a file server

– Issues related to external networks should be solved, but may be different for file server connection

Initial BizTalk Development Goals & Objectives

5 – Write an orchestration that calls JHOVE, MetaExtractor, or C# code in BizTalk to wrap records with preservation metadata

– Once we can receive records through pipelines

– Create logic to perform in BizTalk

– Wrap records in XML with preservation metadata

– First, execute a third-party open source program such as JHOVE or MetaExtractor

– Second, write code to interact with software programming languages such as C#
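Wrapping a record in XML with preservation metadata can be sketched as below. This is a stand-in written in Python: it does not call JHOVE or MetaExtractor and does not reproduce their output, and the element names are hypothetical. The metadata here is limited to what the standard library can derive (file name, size, checksum, and a format guessed from the file extension).

```python
import base64
import hashlib
import mimetypes
import xml.etree.ElementTree as ET


def wrap_record(filename: str, content: bytes) -> ET.Element:
    """Return an XML wrapper holding the record and basic preservation metadata."""
    wrapper = ET.Element("record")
    meta = ET.SubElement(wrapper, "preservationMetadata")
    ET.SubElement(meta, "fileName").text = filename
    ET.SubElement(meta, "sizeBytes").text = str(len(content))
    ET.SubElement(meta, "sha256").text = hashlib.sha256(content).hexdigest()
    # Format is guessed from the extension only; a tool such as JHOVE
    # would inspect the file content itself.
    guessed, _ = mimetypes.guess_type(filename)
    ET.SubElement(meta, "format").text = guessed or "application/octet-stream"
    # Base64 keeps arbitrary bytes safe inside an XML text node.
    body = ET.SubElement(wrapper, "content", encoding="base64")
    body.text = base64.b64encode(content).decode("ascii")
    return wrapper


wrapped = wrap_record("report.pdf", b"hello")
xml_text = ET.tostring(wrapped, encoding="unicode")
```

The original bytes can be recovered by base64-decoding the `content` element and checking them against the stored checksum.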

Measurement of Success

1 – Ability to extract MARC records from ARCAT and insert into database

2 – Ability to create external web services pipeline to transfer records to WHS

3 – Ability to create external file pipeline to DHS Quest Archives Manager to transfer records to WHS

4 – Ability to create external file pipeline to DWD to transfer records to WHS

5 – Ability to wrap electronic records with preservation metadata inside of BizTalk

Process to Write Code

Iterative Process to:

1) Write BizTalk programming code

2) Test BizTalk programming code

3) Revise BizTalk programming code

4) Retest BizTalk programming code

Questions?

Dennis Bitterlich, Electronic Records Archivist

[email protected]