Towards smart storage for repository preservation services Steve Hitchcock, David Tarrant, Adrian Brown 1 , Ben O’Steen 2 , Neil Jefferies 2 and Leslie Carr Preserv 2 Project School of Electronics and Computer Science, University of Southampton 1 The National Archives, Kew 2 Oxford University Library Services @iPRES 2008: The Fifth International Conference on Preservation of Digital Objects, London, 29-30 September 2008


Page 1: Towards  smart storage  for repository preservation services

Towards smart storage for repository preservation services

Steve Hitchcock, David Tarrant, Adrian Brown (1), Ben O’Steen (2), Neil Jefferies (2) and Leslie Carr

Preserv 2 Project

School of Electronics and Computer Science, University of Southampton

(1) The National Archives, Kew

(2) Oxford University Library Services

@iPRES 2008: The Fifth International Conference on Preservation of Digital Objects, London, 29-30 September 2008

Page 2: Towards  smart storage  for repository preservation services

Three-stage strategy for keeping your data safe

• Ability to move data freely, easily and instantly – OAI, ORE, Atom

• Reliable, trusted large-scale storage – Open Storage

• Risk profiling: invoke a range of selectable services – Smart storage

Page 3: Towards  smart storage  for repository preservation services

About institutional repositories

• Set up by institutions of higher education and research to manage and disseminate their digital intellectual outputs.

• IRs are a special type of Web site, typically based on some repository software that presents a database of records pointing to the objects deposited.

• The Preserv 2 project is investigating the provision of preservation services for IRs.

IRs in flux

• Uncertainty in terms of target content (published papers, theses, research data, teaching materials), policy, rights, even locus of content and responsibility for long-term management.

• OAI-ORE (Object Reuse and Exchange) effectively frees the data from being captive to repository software.

• Commercial repository services, from software-specific services to digital library services or more general 'cloud' or network storage services.

Photo: Flickr/cpikas

Page 4: Towards  smart storage  for repository preservation services

IRs are

• Open source repository software

• Open access content

• Open archives, using OAI-PMH to share data with, e.g., discovery services

• Open repositories: using OAI-ORE enables the easy movement of data between different types of repository software

Photo: Flickr/Rightee

Page 5: Towards  smart storage  for repository preservation services

A new ‘open’

How open storage supports preservation services

• Open storage, large-scale storage devices based on open source software

• Open storage averts the need for a repository layer to access first-class objects – these are objects that can be addressed directly

– In turn, these digital objects can be distributed and/or replicated over many open storage platforms.

– In turn, able to select storage with built-in preservation support

– Resilient storage platforms may be viable for preservation services aimed at multiple repositories

• E.g. Sun Microsystems STK5800 (codenamed Honeycomb)

• Google Repository

Page 6: Towards  smart storage  for repository preservation services

Smart storage

• Smart storage combines an underlying passive storage approach with the intelligence provided through services.

• The key to realising smart storage is to enable the services to communicate and share information with the digital content sources they may be acting on. This is done through machine-level application programming interfaces (APIs) and protocols.

Page 7: Towards  smart storage  for repository preservation services

APIs, interfaces and the Web architecture

• Major services on the Web deploy their own simple, but different, APIs, e.g.

– Google Maps

– Within the repository community, SWORD (Simple Web-service Offering Repository Deposit)

– Open storage platforms such as Sun's STK5800 and the Amazon Simple Storage Service (S3)

• To take advantage of open storage, repositories have to be able to talk to these services through their APIs.
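One way to picture a repository talking to several storage services through their APIs is a thin common interface with one adapter per platform. The sketch below is only an illustration under that assumption: `StorageBackend`, `InMemoryBackend` and `replicate` are hypothetical names, and the in-memory class stands in for a real adapter that would call a platform's own API.

```python
from typing import Iterable, Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal interface a repository could code against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in for a real platform adapter; keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def replicate(key: str, data: bytes, backends: Iterable[StorageBackend]) -> int:
    """Write the same object to every backend and return how many copies were made."""
    copies = 0
    for backend in backends:
        backend.put(key, data)
        copies += 1
    return copies
```

With this shape, adding a new storage platform would mean writing one more adapter class rather than changing the repository code that calls `put` and `get`.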

Page 8: Towards  smart storage  for repository preservation services

Smart storage example: format services

• Preservation methods affecting formats can be classified in three stages (‘seamless flow’):

– Format identification and characterization (which format?)

– Preservation planning and technology watch (format risk and implications)

– Preservation action, migration, etc. (what to do with the format)

• Format-based services tend to be ad hoc processes for which some tools are available

– E.g. PRONOM-DROID from The National Archives (UK)

– PRONOM is an online registry of technical information, such as file format signatures

– DROID is a downloadable file format identification tool that applies these signatures

• These and other tools could be used in a more coordinated manner.
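The idea of applying file format signatures can be illustrated with a very small sketch. It is not DROID and does not use PRONOM data: the signature table below is a hand-picked assumption containing a few well-known leading byte sequences, and `identify_format` is a hypothetical helper name.

```python
# Illustrative signature table: (label, byte offset, magic bytes).
# These entries are an assumption for this sketch, not PRONOM records.
SIGNATURES = [
    ("PDF", 0, b"%PDF-"),
    ("PNG", 0, b"\x89PNG\r\n\x1a\n"),
    ("GIF", 0, b"GIF8"),
    ("ZIP container", 0, b"PK\x03\x04"),
]


def identify_format(data: bytes) -> str:
    """Return the label of the first signature that matches the byte stream."""
    for label, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"
```

A real identification tool applies far richer signatures (multiple byte sequences, variable offsets, version detection); the sketch only shows the basic lookup step.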

Page 9: Towards  smart storage  for repository preservation services

Smart storage DROID: concept

Page 10: Towards  smart storage  for repository preservation services

Smart storage DROID: scheduling/history

• Scheduling interface controls when a DROID classification needs to be performed.

• Preserv 2 has developed a scheduling service that uses the Darwin Calendar Server and iCalendar format.

• Provides a powerful scheduling service with many clients already available - Apple iCal, Mozilla Sunbird, and others - that can read and interpret the files so that past and future events can be reviewed.
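To make the iCalendar point concrete, the sketch below builds a minimal calendar file containing one scan event that a calendar client could read. It is an assumption about what such an event might look like, not the project's actual scheduling service: `droid_scan_event`, the product identifier and the event summary are all hypothetical.

```python
from datetime import datetime, timezone


def droid_scan_event(uid: str, start: datetime, target_url: str) -> str:
    """Build a minimal iCalendar document holding one scan event.

    `start` should be timezone-aware; it is written out in UTC.
    """
    stamp = start.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//smart-storage-sketch//EN",  # hypothetical product id
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp}",
        f"DTSTART:{stamp}",
        "SUMMARY:DROID format classification",  # hypothetical summary text
        f"URL:{target_url}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # The iCalendar format uses CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

Because the event is plain iCalendar text, past and future scan events stored this way could be reviewed in any client that reads the format.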

Page 11: Towards  smart storage  for repository preservation services

Smart storage DROID: OAI-PMH interface

• An OAI-PMH interface to open storage discovers the latest objects to have been deposited and which are ready for format classification.

• Could also be performed by simpler RSS or Atom-based methods.

• The interface has since been expanded to allow export of OAI-ORE resource maps in both RDF and Atom formats.
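Discovering recently deposited objects over OAI-PMH can be sketched as a selective `ListIdentifiers` request followed by parsing of the returned headers. The verb, the `from` argument, the `oai_dc` metadata prefix and the XML namespace are part of the OAI-PMH protocol; the function names and the base URL used with them are assumptions for illustration, and no network call is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url: str, since: str) -> str:
    """Build a ListIdentifiers request for records changed since a date."""
    query = urlencode({
        "verb": "ListIdentifiers",
        "metadataPrefix": "oai_dc",
        "from": since,
    })
    return f"{base_url}?{query}"


def parse_identifiers(response_xml: str) -> list[tuple[str, str]]:
    """Extract (identifier, datestamp) pairs from a ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    return [
        (
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
        )
        for header in root.iterfind(".//oai:header", OAI_NS)
    ]
```

A harvester would fetch the URL, pass the response body to `parse_identifiers`, and hand each new identifier on for format classification.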

Page 12: Towards  smart storage  for repository preservation services

Smart storage DROID: implementation

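[Figure: architecture of the smart storage DROID implementation. Components shown: calendar clients (e.g. iCal, Outlook, Sunbird); scheduler; calendar server; DROID; DROID-OAI harvester; messaging; history; web server (HTTP), which stores results of DROID events; open storage with OAI-PMH; repository (Atom?). Labelled interactions: schedule event (url, date); is event done?; get results of event. Legend: user interface; machine interface, API; implemented; to be implemented.]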

Page 13: Towards  smart storage  for repository preservation services

• Risk profiling

• The scheduler will invoke actions based on the results of scanning by DROID allied to decision-making tools that use intelligence from planning and technology watch tools, such as

– PRONOM,

– Plato preservation planning tool from the EC-funded Planets project,

– and others.
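The step from a format scan to an invoked action could look something like the rule below. This is purely a sketch: the risk scores, the threshold and the action names are invented for illustration, and in the approach described above such intelligence would come from planning and technology watch tools rather than a hard-coded table.

```python
# Hypothetical risk scores per format label (0 = low risk, 1 = high risk).
# Invented values; real scores would be supplied by a registry or planning tool.
FORMAT_RISK = {
    "PDF": 0.2,
    "PNG": 0.1,
    "WordPerfect 5.1": 0.8,
}


def choose_action(format_label: str, threshold: float = 0.5) -> str:
    """Map an identified format to a preservation action name."""
    risk = FORMAT_RISK.get(format_label)
    if risk is None:
        return "review"   # unknown format: refer to a person
    if risk >= threshold:
        return "migrate"  # high risk: schedule a migration
    return "monitor"      # low risk: keep watching the format
```

A scheduler could call `choose_action` for each scan result and create a follow-up event only when the answer is something other than "monitor".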

Photo: Flickr/yourbartender

Page 14: Towards  smart storage  for repository preservation services

Summary: smart storage in the storage scheme

Binary stream

File system: need to store multiple streams with permissions

Content addressable: adds content validation and object identifiers, metadata required to locate an object

Open: adds error correction and recovery, places processing close to storage, solves some bandwidth problems

Smart: opens up the close-to-storage approach for application development, transition to 'cloud' storage

How smart storage addresses current storage issues – see full paper
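The "content addressable" level in the scheme above, where storage adds content validation and object identifiers, can be illustrated with a toy store that derives each identifier from a hash of the content. This is a minimal in-memory sketch, assuming SHA-256 as the hash; `ContentAddressableStore` is a hypothetical class, not the interface of any particular storage product.

```python
import hashlib


class ContentAddressableStore:
    """Toy store whose object identifiers are SHA-256 digests of the content."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store the bytes and return the identifier derived from them."""
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        """Return the bytes, re-hashing them first to validate the content."""
        data = self._objects[object_id]
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError(f"content of {object_id} failed validation")
        return data
```

Because the identifier is computed from the content, storing the same bytes twice yields the same identifier, and any later corruption shows up as a validation failure on read.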

Page 15: Towards  smart storage  for repository preservation services

Storage can become smarter

• Openness, in its various forms, the ability to move data freely and easily, needs to be supplemented by decision-making that can be automated based on the supplied intelligence and information.

• In this way, open storage can become ‘smarter’.

http://preserv.eprints.org/

Thanks to