FEDORA: Selecting and Implementing an Open Source Digital Repository
Corey Keith
[email protected]

Introduction

History
FEDORA Overview
Object Oriented Principles
LC’s Requirements
LC’s Architecture
Review

Pop Quiz

XML
OAIS
METS
FEDORA
DSPACE

FEDORA History

Continuing Research Project – Cornell, 1997
Prototype Application – University of Virginia
Fedora 1.0 – Open Source Release, 2002
Fedora 1.2 – Tomorrow!

Options, options, options

Very few tools directly compete with each other

Many tools can be used to accomplish similar behavior

Many tools fulfill parts of the functionality needed for a repository

Roll your own solution

Why Fedora?

Repository Architects & Developers Excited

Object oriented approach to digital objects

Open Source Project
– Funded development (and support)
Java Based
– Multiple HW Platforms
Flexible
Integrates well with existing systems
– CGI Scripts
– Web Services

Leaves most decisions to implementers

Extensible

Again, no product can do it all
– Imaging, Audio, Transformations, Courseware
Easy to add new functionality to objects
Embraces web services
Open APIs
– Access
– Management

Digital Object

What is the definition of a digital object?
– Documents, such as articles, preprints, working papers, technical reports, conference papers
– Books
– Theses
– Data sets
– Computer programs
– Visualizations, simulations, and other models
– Multimedia publications
– Administrative records
– Published books
– Bibliographic datasets
– Images
– Audio files
– Video files
– Reformatted digital library collections
– Learning objects
– Web pages

List taken from the dspace.org website

Repository Architecture

Objects
Behavior Definitions
Behavior Mechanisms
API
– Management
– Access

Object Oriented

A software design method that models the characteristics of abstract or real objects using classes and objects.
Proven Techniques for Software Development
– Requirements gathering
– Use Cases
  • Developers speak to librarians and other stakeholders
Facilitates reuse of functionality
Design Patterns
Not hacking Perl scripts to make an institutional repository

Object Oriented

Data
– Metadata
  • MODS – Descriptive
  • METS – Structural
  • MIX, etc. – Technical
– Bit streams
  • Actual Files – JPG, TIF, WAV, MP3, TEI, EAD
Methods (Behaviors)
– Do stuff with the data

Object Oriented Concepts

Classes
– Objects of the same type belong to a class
Interfaces
– A contract defining behaviors a class of objects will implement
Encapsulation
– Behaviors operate on the data in an object
Reflection
– Discover what interfaces and behaviors an object implements
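The four concepts above can be seen together in a few lines of Python. This is only an illustrative sketch: `Describable`, `Photograph`, and `list_behaviors` are hypothetical names invented for the example and are not part of FEDORA.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Describable(Protocol):
    """Interface: a contract naming a behavior that objects will implement."""

    def get_title(self) -> str: ...


class Photograph:
    """Class: every photograph object shares this structure and behavior."""

    def __init__(self, title: str, pixels: bytes) -> None:
        self._title = title    # data held inside the object
        self._pixels = pixels

    def get_title(self) -> str:
        # Encapsulation: the behavior operates on the object's own data.
        return self._title

    def get_size(self) -> int:
        return len(self._pixels)


def list_behaviors(obj: object) -> list:
    """Reflection: discover the public behaviors an object implements."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )


photo = Photograph("Harbor at dawn", b"\x00\x01\x02")
print(isinstance(photo, Describable))  # does the object honor the interface?
print(list_behaviors(photo))           # which behaviors does it offer?
```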

Image Objects

Two File Image Object
– Data
  • Hi Resolution Version: tif
  • Low Resolution Version: jpg
MrSID File Image Object
– Data
  • MrSID File

Basic Image Interface

getHighResolutionTIF
getLowResolutionJPG

Basic Image Interface Implementations

Two File Image Object
– getHighResolutionTIF
  • returns high resolution TIF
– getLowResolutionJPG
  • returns low resolution JPG
MrSID Image Object
– getHighResolutionTIF
  • processes the MrSID file to return a high resolution TIF file of the image
– getLowResolutionJPG
  • processes the MrSID file to return a low resolution JPG of the image
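A minimal Python sketch of the two implementations above, assuming the interface maps onto an abstract base class. The class names and `convert_mrsid` are hypothetical; in particular, `convert_mrsid` only tags the bytes and stands in for whatever real MrSID decoder an implementer would call.

```python
from abc import ABC, abstractmethod


class BasicImage(ABC):
    """The Basic Image Interface: two behaviors every image object offers."""

    @abstractmethod
    def get_high_resolution_tif(self) -> bytes: ...

    @abstractmethod
    def get_low_resolution_jpg(self) -> bytes: ...


class TwoFileImage(BasicImage):
    """Stores both versions, so each behavior simply returns a stored file."""

    def __init__(self, tif: bytes, jpg: bytes) -> None:
        self._tif = tif
        self._jpg = jpg

    def get_high_resolution_tif(self) -> bytes:
        return self._tif

    def get_low_resolution_jpg(self) -> bytes:
        return self._jpg


def convert_mrsid(sid: bytes, target: str) -> bytes:
    """Hypothetical stand-in for a MrSID decoder; it only labels the bytes."""
    return target.encode("ascii") + b":" + sid


class MrSidImage(BasicImage):
    """Stores one MrSID file and derives both versions on request."""

    def __init__(self, sid: bytes) -> None:
        self._sid = sid

    def get_high_resolution_tif(self) -> bytes:
        return convert_mrsid(self._sid, "TIF")

    def get_low_resolution_jpg(self) -> bytes:
        return convert_mrsid(self._sid, "JPG")


def thumbnail(image: BasicImage) -> bytes:
    # The caller depends only on the interface, not on how files are stored.
    return image.get_low_resolution_jpg()


print(thumbnail(TwoFileImage(tif=b"tif-bytes", jpg=b"jpg-bytes")))
print(thumbnail(MrSidImage(sid=b"sid-bytes")))
```

The caller in `thumbnail` never learns which storage strategy is behind the object, which is the point of subscribing both object types to one interface.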

Sheet Music Object

Data
– MODS Metadata
– Images of the pages (Image Objects)
– TEI encoded text of the lyrics (TEI Objects)
Behaviors
– getPageImage(Pagenumber)
  • Invokes getLowResolutionJPG on the page’s image object to return the image!
– getMODS
– getLyrics
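The delegation described above, where the sheet music object answers getPageImage by calling a behavior on one of its image objects, might look like the following sketch. `PageImage` and `SheetMusic` are hypothetical names, and numbering pages from 1 is an assumption made for the example.

```python
class PageImage:
    """Minimal image object exposing one behavior (hypothetical stand-in)."""

    def __init__(self, jpg: bytes) -> None:
        self._jpg = jpg

    def get_low_resolution_jpg(self) -> bytes:
        return self._jpg


class SheetMusic:
    """Composite object: MODS metadata, page image objects, TEI lyrics."""

    def __init__(self, mods: str, pages: list, lyrics_tei: str) -> None:
        self._mods = mods
        self._pages = list(pages)
        self._lyrics_tei = lyrics_tei

    def get_page_image(self, page_number: int) -> bytes:
        # Assumption: pages are numbered from 1, as a reader would count them.
        if not 1 <= page_number <= len(self._pages):
            raise IndexError(f"no page {page_number}")
        # Delegate to the image object's own behavior.
        return self._pages[page_number - 1].get_low_resolution_jpg()

    def get_mods(self) -> str:
        return self._mods

    def get_lyrics(self) -> str:
        return self._lyrics_tei


score = SheetMusic(
    mods="<mods/>",
    pages=[PageImage(b"page-1-jpg"), PageImage(b"page-2-jpg")],
    lyrics_tei="<TEI/>",
)
print(score.get_page_image(2))
```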

FEDORA’s Interface Implementation

[Diagram: three FEDORA object types and their components]
Behavior Definition Object: Persistent ID (PID), Behavior Definition Metadata, System Metadata, Datastreams
Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams
Data Object: Persistent ID (PID), Disseminators, Datastreams, System Metadata

Graphics taken from presentations available at www.fedora.info

What is FEDORA?

“Plumbing”
Manages associations between objects and their interfaces
Invokes behaviors from an interface to which an object subscribes
Manages or references files
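To make the "plumbing" role concrete, here is a toy registry in Python that records which interface an object subscribes to and routes a behavior call to the code that implements it. Everything in it (the `Repository` class, its method names, the `demo:1` PID) is hypothetical and is not FEDORA's actual API; it only illustrates the association-and-dispatch idea.

```python
from typing import Callable, Dict, List, Set

Datastreams = Dict[str, bytes]
Behavior = Callable[[Datastreams], bytes]


class Repository:
    """Toy registry; every name here is hypothetical, not FEDORA's API."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Set[str]] = {}
        self._mechanisms: Dict[str, Dict[str, Behavior]] = {}
        self._objects: Dict[str, dict] = {}

    def add_definition(self, name: str, behaviors: Set[str]) -> None:
        # A behavior definition is just a named set of behavior names.
        self._definitions[name] = set(behaviors)

    def add_mechanism(self, name: str, definition: str,
                      behaviors: Dict[str, Behavior]) -> None:
        # A mechanism must implement everything its definition promises.
        missing = self._definitions[definition].difference(behaviors)
        if missing:
            raise ValueError(f"{name} does not implement {sorted(missing)}")
        self._mechanisms[name] = dict(behaviors)

    def ingest(self, pid: str, datastreams: Datastreams,
               disseminators: Dict[str, str]) -> None:
        # disseminators maps a definition name to the mechanism that serves it.
        self._objects[pid] = {
            "datastreams": dict(datastreams),
            "disseminators": dict(disseminators),
        }

    def behaviors_of(self, pid: str) -> Dict[str, List[str]]:
        subscribed = self._objects[pid]["disseminators"]
        return {d: sorted(self._definitions[d]) for d in subscribed}

    def invoke(self, pid: str, definition: str, behavior: str) -> bytes:
        obj = self._objects[pid]
        mechanism = self._mechanisms[obj["disseminators"][definition]]
        return mechanism[behavior](obj["datastreams"])


repo = Repository()
repo.add_definition("BasicImage", {"getLowResolutionJPG"})
repo.add_mechanism("TwoFileImage", "BasicImage",
                   {"getLowResolutionJPG": lambda streams: streams["JPG"]})
repo.ingest("demo:1", {"JPG": b"jpg-bytes", "TIF": b"tif-bytes"},
            {"BasicImage": "TwoFileImage"})
print(repo.invoke("demo:1", "BasicImage", "getLowResolutionJPG"))
```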

What FEDORA Currently Does Not Do

“Digital Library in a Box”
– Requires integration and custom development
Prescribe the right way to do things
– Implementers are free to choose
– Best practices still being fleshed out

LC’s Requirements

Complex Digital Objects
– Structurally
  • METS structMap
– Rich descriptive metadata
  • Exploiting MODS features
    – relatedItem

Choosing Repository Software

Fedora provides a foundation to build on
LC member of initial deployment team
No other software is like FEDORA
– Except general purpose programming languages

How LC is implementing FEDORA

Types of Digital Objects
– Sheet Music
– Scores
– Sound Recordings
– Compact Discs
– Manuscripts
– Photographs
– Websites
– “Collections”
Less emphasis
– Intellectual output of university’s research faculty

METS Profiles

Correlates well with classes of objects
Articulates
– Structure of an object
– Metadata requirements
METS documents conforming to profiles are ingested into repository
– Atomization
– Behavior association

Architecture

Fedora (Repository)
Cocoon (Application Layer)

[Diagram: user → web browser → Cocoon → Fedora Service APIs → Fedora Repository System]

SIP vs AIP

Complex digital objects are atomized into small reusable objects upon ingest to FEDORA
– Sheet Music METS Profile (SIP)
  • Sheet music object (AIP)
    – Structural metadata encoded in METS
    – Descriptive metadata encoded in MODS
  • Image objects for each page (AIP)
    – TIF and JPG Files
    – Technical metadata encoded in MIX
  • TEI object for the lyrics (AIP)
    – TEI File
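The atomization step can be sketched as a function that takes one submitted package and returns several smaller archival objects. The dictionary layout of the SIP, the `ArchivalObject` class, and the `demo:` PID scheme are all assumptions made for this example; a real ingest would parse a METS document that conforms to the profile.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ArchivalObject:
    pid: str
    kind: str
    datastreams: dict
    parts: list = field(default_factory=list)  # PIDs of the child objects


def atomize(sip: dict, namespace: str = "demo") -> list:
    """Split one sheet music submission into separately stored objects."""
    pids = (f"{namespace}:{n}" for n in itertools.count(1))

    # The parent keeps the structural (METS) and descriptive (MODS) metadata.
    parent = ArchivalObject(
        next(pids), "sheet-music",
        {"METS": sip["structmap"], "MODS": sip["mods"]},
    )
    # One image object per page, each with its own technical metadata (MIX).
    children = [
        ArchivalObject(
            next(pids), "image",
            {"TIF": page["tif"], "JPG": page["jpg"], "MIX": page["mix"]},
        )
        for page in sip["pages"]
    ]
    # One TEI object for the lyrics.
    children.append(
        ArchivalObject(next(pids), "tei", {"TEI": sip["lyrics_tei"]})
    )
    parent.parts = [child.pid for child in children]
    return [parent, *children]


sip = {
    "structmap": "<structMap/>",
    "mods": "<mods/>",
    "pages": [
        {"tif": b"p1-tif", "jpg": b"p1-jpg", "mix": "<mix/>"},
        {"tif": b"p2-tif", "jpg": b"p2-jpg", "mix": "<mix/>"},
    ],
    "lyrics_tei": "<TEI/>",
}
for obj in atomize(sip):
    print(obj.pid, obj.kind, sorted(obj.datastreams))
```

Because each page becomes its own object, the same image object could in principle be referenced from more than one parent, which is what makes the pieces reusable.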

Why this Architecture?

Clean Separation of Concerns
– Logic: Makes it go!
– Content: From FEDORA
– Style: Web Designers
Object not bound to display
– Repository is for preservation of metadata and files, not markup (HTML)
– Markup accomplished in Cocoon layer
Leverage use of METS structural metadata
Performance: Cocoon Caching

User Interface Development

Web Designers
– Relate to objects and behaviors
– Can develop in HTML for display
– XSLT
  • Uses XML from repository to drive display

Other Pieces of the Repository Puzzle

Other open source tools
– Cocoon
  • XML Publishing Framework
– Lucene
  • Text Indexing and Search API
Someone has to write software!
– Java to build Lucene indexes
– XSP searching
– More XSLT than you want to see

Digital Object Production

How are we building these digital objects?
– MySQL
– Cocoon
– XSLT
– Homegrown Java
  • Technical metadata extraction

Cocoon

XML Publishing Framework (Toolbox)
– Generate
  • From files (or URLs)
  • From databases
  • From code (XSP, JSP, PHP)
– Transform
  • XSLT
– Serialize
  • XML, HTML, PDF, SVG, MIDI?
– Caching
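The generate, transform, serialize, and cache stages can be imitated in plain Python to show how they chain together. This is not Cocoon's API: the function names are hypothetical, `transform` is a hand-written stand-in for an XSLT stylesheet, and the `<object><title>` input shape is an assumption made for the example.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


def generate(source: str) -> ET.Element:
    """Generate: turn a source (here, an XML string) into an XML tree."""
    return ET.fromstring(source)


def transform(doc: ET.Element) -> ET.Element:
    """Transform: stand-in for an XSLT step mapping <title> to an HTML <h1>."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    heading = ET.SubElement(body, "h1")
    heading.text = doc.findtext("title", default="")
    return html


def serialize(doc: ET.Element) -> str:
    """Serialize: write the resulting tree out as markup text."""
    return ET.tostring(doc, encoding="unicode")


@lru_cache(maxsize=128)
def render(source: str) -> str:
    """Caching: identical input is rendered once and then served from memory."""
    return serialize(transform(generate(source)))


record = "<object><title>Harbor at dawn</title></object>"
print(render(record))
print(render(record))  # second call is answered from the cache
```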

XSLT

Philosophy
– Get data into XML as early in the workflow as possible
Flexibility
– Easy to change logic in XSLT
– No need to recompile
Performance Issues

Resources Needed for FEDORA (Cheap)

Hardware Requirements
– Minimal for experimentation
  • Installs on Windows PC
  • Packaged to get up and running quickly
  • Demo set of objects
– Scales with hardware in a production environment

Resources Needed for FEDORA (Expensive)

1 or More Developers
– 1: Kick the tires
– More than 1: Real production
Application Architects
Requirement Analysts
Subject Matter Experts
– Articulate requirements
  • Object Structure
  • Descriptive Metadata

Summary

Five Questions
– Who
– What
– When
– Why
– Where

Who

Institutions with resources to do software development

Unique requirements for digital library software
– Preexisting tools do not fit the need

Need for integration of existing systems into one management infrastructure

What

Digital Library Plumbing
Very general purpose
– Use it to build almost any digital library application

When

December 10th – Version 1.2

Why

Robust Set of tools to build YOUR repository

High level of user support from the FEDORA development team

Smart people working on hard problems

Where

www.fedora.info

Questions