Introducing “Pergamos”
Libraries Computer Center
Department of Informatics & Telecommunications
University of Athens

A FEDORA-based Digital Library System utilizing Digital Object Prototypes

Kostas Saidis [email protected]

European FEDORA User Meeting
Copenhagen, 28 September 2005

European FEDORA User Meeting 2005, Copenhagen, September 28

Outline
- Motivation – The University of Athens (UoA) DL
- Digital Objects (DOs)
  - DO Storage (FEDORA)
  - DO Manipulation (DL Application Logic)
- Digital Object Prototypes
  - Automatic DO Type Conformance
  - Scope of Prototypes & Collection Management
  - Implementation Details
- A Preview of Pergamos
- Discussion

The UoA DL Project
- Over 1 million objects originating from 8 disparate collections
  - Folklore notebooks, Ancient papyri, UoA Historical Archive, Byzantine music manuscripts, Theatrical photos & brochures, Informatics research papers and dissertations, Medical images, Press articles
- Heterogeneous material, in terms of content type, metadata, structure, user requirements
- Mostly digitized material, requiring detailed cataloging

UoA DL Project Metadata
- Build a Web-based DL System to handle all material
- Centralized DL approach due to
  - Existing hardware infrastructure
  - Funding restrictions
  - Administration simplicity
- FEDORA is our DO Repository

UoA DL Project Metadata Contd.
- Small Team
  - 2.5 developers, 1 librarian, 1 manager
  - Requirements, Specifications, Development, Digitization & Cataloging Management …
  - … while everyday tasks keep running!
- Cataloging Personnel
  - Scholars & Experts in each collection’s domain (not librarians)
- Strict Schedule
  - First Collection deadline: early 2006
  - Project deadline: end of 2006

Motivation
- Simplify & speed up the cataloging process
  - Provide effective Web-based cataloging interfaces
  - Automate content ingestion
- Decrease development time
  - Avoid custom coding for each content variation
  - Elaborate on reusable and configurable DL modules
  - Provide the means to treat content variations in a unified manner

Digital Objects
- A Digital Object is a human-generated artifact consisting of the digital content and related information

FEDORA
- FEDORA Digital Object Model
  - Content Models, Datastreams, Behavior Definitions, Mechanisms & Disseminators
- FEDORA is a DO Repository
  - Focus on how each DO part is encoded & stored
  - Handles effectively issues related to storage, preservation & versioning, searching & indexing, interoperability

Traditional 2-tier Approach

DL Application Logic
- Cataloging, Workflows, Collection Building & Management, User Interfaces, etc.
- DL Modules manipulate DOs at a higher level of abstraction
  - Focus on the overall behavior of the DO (what are the DO parts and how do they behave)
- DOs reflect the underlying “real world” objects – they behave according to their nature, their essence, their type

DO Typing information
- Do we effectively capture, express and utilize the nature (type) of DOs?

An example – Theatrical Collection
- Albums containing photos of National Theater Performances
- What is a Photo DO?
  - A digital image stored in various formats (e.g. high quality, www quality, thumbnail) accompanied by the metadata required for describing the picture
- What is an Album DO?
  - A container of Photo DOs accompanied by theatrical play metadata

A 2nd example – Historical Archive
- University’s Senate Session Proceedings > Folders > Sessions > Items
- What is an Item DO?
  - A digital image (capturing 1 or 2 pages) stored in various formats (e.g. high quality, www quality, thumbnail)
- What is a Session DO?
  - A container of Item DOs + metadata
- What is a Folder DO?
  - A container of Session DOs + metadata
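
Both examples share one shape: leaf DOs carry an image in several renditions, and container DOs carry child DOs plus metadata. A minimal Python sketch of that shape follows. The class, its field names, the pids and the file names are all hypothetical illustrations, not Pergamos code.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical in-memory DO: a type name, metadata, files, children."""
    pid: str
    do_type: str
    metadata: dict = field(default_factory=dict)
    files: dict = field(default_factory=dict)      # rendition name -> file name
    children: list = field(default_factory=list)   # contained DOs

    def count_leaves(self) -> int:
        """Count the DOs in this subtree that have no children."""
        if not self.children:
            return 1
        return sum(child.count_leaves() for child in self.children)


# Folder > Session > Item, mirroring the Historical Archive example
item = DigitalObject("demo:3", "item",
                     files={"high": "p1.tiff", "www": "p1.jpg", "thumb": "p1_t.jpg"})
session = DigitalObject("demo:2", "session", {"title": "A session"}, children=[item])
folder = DigitalObject("demo:1", "folder", {"title": "A folder"}, children=[session])
```

An Album of Photos would reuse the same class with different type names, which is the kind of uniform treatment the talk argues for.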

DO Typing Information
- FEDORA Content Models express DO Typing information
- Content Models are metadata attributes (e.g. “photo”, “album”) that we use as a guide
- Humans interpret Content Models, not the DL System
- Manual resolution of DO Typing issues

Problems
- Catalogers carry out manual XML editing at a low level of abstraction, with overly technical, complex & over-detailed semantics
- Developers generate ad-hoc, custom & non-reusable implementations of DO types’ variations of behavior
- DL modules exhibit limited evolution and configuration capabilities

DO Typing Information
- The DL System should resolve DO Typing issues automatically (in a manner transparent to the DL Application Logic)

Automatic DO Type Conformance
- The designer specifies the various DO types…
- … and the DL System makes DOs conform to these type specifications automatically
- How?

By drawing on the notions of OO

The OO Viewpoint
- In the OO model an object is itself aware of its “nature” and behaves accordingly
- Objects are conceived as instances of a type, automatically conforming to the type’s definitions & specifications
- OO types are separate entities (named either classes or prototypes)

Digital Object Prototypes
- A DO Prototype is a DO Type Specification, a separate entity that defines the DO’s:
  - Constitutional parts – metadata sets, files, structure, etc.
  - Private behaviors – DO internal operations such as serializations, validations, assignment of default values, content conversions, etc.
  - Public behaviors (behavior schemes) – the DO external interface, consisting of high level operations such as Detail view, Browse View, Edit View, etc.
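
One way to picture the three parts of a prototype is as a plain data structure that bundles the constitutional parts with two tables of behaviors. The sketch below assumes a Python representation. The `Prototype` class, the behavior names and the default language value are made up for illustration, and the talk states that real prototypes are specified in XML.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Prototype:
    """Hypothetical DO type specification."""
    name: str
    md_sets: Dict[str, List[str]]     # constitutional parts: MD set -> field names
    file_formats: List[str]           # constitutional parts: expected renditions
    private_behaviors: Dict[str, Callable] = field(default_factory=dict)
    public_behaviors: Dict[str, Callable] = field(default_factory=dict)


def assign_defaults(do: dict) -> None:
    """Private behavior: fill in a missing default value (value is an assumption)."""
    do["DC"].setdefault("language", "el")


def detail_view(do: dict) -> str:
    """Public behavior: a one-line rendering of the DO."""
    return f"{do['type']}: {do['DC'].get('title', '(untitled)')}"


photo = Prototype(
    name="photo",
    md_sets={"DC": ["title", "language"]},
    file_formats=["high", "www", "thumb"],
    private_behaviors={"assignDefaults": assign_defaults},
    public_behaviors={"detailView": detail_view},
)
```

Keeping the two behavior tables separate mirrors the slide's split: DL modules would only ever look inside `public_behaviors`.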

OO Encapsulation

Photo Prototype & Instances

DO Prototypes & Instances
- The designer carries out the definition of DO Prototypes – the DL System handles the rest
- DO Prototypes represent the realization of the Content Model notion in an OO fashion:
  - The process of generating a DO from a Prototype is called instantiation
  - The resulting object is an instance of the prototype
  - A DO instance automatically conforms to the Prototype’s specifications
- Stored DOs vs DO instances
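
Instantiation and conformance can be sketched in a few lines. The example assumes a prototype is a dictionary of fields with defaults plus a list of mandatory fields. That layout, the field names and both function names are assumptions made for the illustration.

```python
import copy

# Hypothetical prototype: field -> default value, plus the mandatory fields
PHOTO_PROTOTYPE = {
    "type": "photo",
    "fields": {"title": None, "photographer": "unknown"},
    "mandatory": ["title"],
}


def instantiate(prototype: dict, stored: dict = None) -> dict:
    """Build a DO instance: prototype defaults overlaid with stored values."""
    instance = {"type": prototype["type"],
                "fields": copy.deepcopy(prototype["fields"])}
    for name, value in (stored or {}).items():
        if name not in prototype["fields"]:
            raise KeyError(f"field {name!r} is not part of type {prototype['type']!r}")
        instance["fields"][name] = value
    return instance


def conforms(prototype: dict, instance: dict) -> bool:
    """True when every mandatory field has a value."""
    return all(instance["fields"].get(name) is not None
               for name in prototype["mandatory"])
```

Calling `instantiate(PHOTO_PROTOTYPE)` with no stored data gives an empty instance that already has the right fields and defaults, which is one reading of the "Stored DOs vs DO instances" distinction.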

3-tier DL Architecture

Digital Object Dictionary
- The runtime environment in which DO instances and Prototypes operate:
  - Instantiation of DOs based on the prototype specifications (private behaviors: load & parse XML, assign default values, etc.)
  - Exposure of the public DO behaviors in a high level, uniform API (for use by DL Modules)
  - Serialization of the DO instance back to FEDORA (private behaviors: serialize data structures in XML, perform validations, etc.)

Expression of DL Application Logic
A DL Module performs the following steps:

1. Acquire the DO Instance
     do = dictionary.acquireObject("type")
     do = dictionary.acquireObject("uoadl:1024")

2. Perform operations upon it
     do.getMDSet("DC").getField("title")
     dictionary.executeBehavior(do, "editView")

3. Store the DO in the repository
     dictionary.saveObject(do)

Cleaner, simpler, more effective
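
The three steps can be tried end to end with a toy stand-in for the DO Dictionary. The sketch below is not the Pergamos API: the method names are Python-style renderings of the calls above, a plain dict takes the place of FEDORA, JSON takes the place of XML serialization, and the `demo:` pids are invented.

```python
import json


class Dictionary:
    """Hypothetical stand-in for the DO Dictionary, backed by a dict instead of FEDORA."""

    def __init__(self, prototypes, repository):
        self.prototypes = prototypes   # type name -> {"defaults": ..., "behaviors": ...}
        self.repository = repository   # pid -> serialized DO (JSON here, XML in the talk)
        self._next_id = 1

    def acquire_object(self, key):
        """Load a stored DO by pid, or instantiate an empty one by type name."""
        if key in self.repository:
            return json.loads(self.repository[key])
        proto = self.prototypes[key]
        pid = f"demo:{self._next_id}"
        self._next_id += 1
        return {"pid": pid, "type": key, "DC": dict(proto["defaults"])}

    def execute_behavior(self, do, name):
        """Look the behavior up in the DO's prototype and run it on the DO."""
        return self.prototypes[do["type"]]["behaviors"][name](do)

    def save_object(self, do):
        """Serialize the instance back to the repository."""
        self.repository[do["pid"]] = json.dumps(do)


prototypes = {"photo": {"defaults": {"title": ""},
                        "behaviors": {"detailView": lambda do: f"Photo: {do['DC']['title']}"}}}
dictionary = Dictionary(prototypes, repository={})

do = dictionary.acquire_object("photo")                 # 1. acquire (new, empty instance)
do["DC"]["title"] = "Antigone, act 1"                   # 2. operate on it
view = dictionary.execute_behavior(do, "detailView")
dictionary.save_object(do)                              # 3. store
```

The calling code never touches the serialized form, which stays behind `acquire_object` and `save_object`.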

3-tier DL Architecture
[Figure, built up over four slides. Vertical label: Separation of Concerns. Tiers: Storage; DO Typing & Instantiation; Composition of DO behaviors]

If it sounds like Greek…

Pergamos

Scope of Prototypes
- Should we have global DO Types?
- Collection-pertinent types: A DO Prototype is defined in the context of a Collection
  - Support fine grained definition of collection specific kinds of material
- Hierarchical naming scheme for types
  - Theatrical Collection Photo: dl.theatre.photo
  - Medical Collection Photo: dl.medical.photo
  - Stored in the “contentModel” metadata attribute
- Avoid type collisions
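
A short sketch shows why the hierarchical names avoid collisions: two "photo" types live under different collection prefixes and resolve to different prototypes. The two type names come from the slide. The registry layout, the field lists and the helper functions are assumptions.

```python
# Hypothetical registry keyed by the full, collection-scoped type name
PROTOTYPES = {
    "dl.theatre.photo": {"fields": ["title", "play", "photographer"]},
    "dl.medical.photo": {"fields": ["title", "modality"]},
}


def split_type_name(content_model: str):
    """Split 'dl.theatre.photo' into (collection path, local type name)."""
    collection, _, local = content_model.rpartition(".")
    if not collection:
        raise ValueError(f"not a scoped type name: {content_model!r}")
    return collection, local


def lookup(content_model: str) -> dict:
    """Resolve a contentModel value to its collection-scoped prototype."""
    return PROTOTYPES[content_model]
```

Both types share the local name `photo`, yet `lookup` never confuses them because the key is the full path.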

Album Prototype & Instances

Collection Management
- DL = Hierarchy of DO instances
  - Collections are also DOs
  - The DL itself is a DO, representing the “super-collection” (the collection of all the collections)
- Easily add new collections & sub-collections
- All content is modeled in a unified manner & can be characterized
- Allow the DL designer to work out the details of each collection independently, yet in a uniform manner
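
If the DL, its collections and their content are all DOs, one traversal routine serves every level. The sketch below assumes each DO is stored as a type name plus a list of contained pids. Every pid and the `collection` type name are invented for the example.

```python
# pid -> (type name, pids of contained DOs); every entry here is made up
OBJECTS = {
    "dl":         ("dl", ["dl.theatre", "dl.archive"]),
    "dl.theatre": ("collection", ["album:1"]),
    "dl.archive": ("collection", []),
    "album:1":    ("dl.theatre.album", ["photo:1", "photo:2"]),
    "photo:1":    ("dl.theatre.photo", []),
    "photo:2":    ("dl.theatre.photo", []),
}


def walk(pid, depth=0):
    """Yield (depth, pid, type) for a DO and everything it contains, depth-first."""
    do_type, children = OBJECTS[pid]
    yield depth, pid, do_type
    for child in children:
        yield from walk(child, depth + 1)


def add_collection(parent_pid, pid):
    """Adding a collection amounts to adding one more DO under its parent."""
    OBJECTS[pid] = ("collection", [])
    OBJECTS[parent_pid][1].append(pid)
```

Calling `add_collection("dl", "dl.medical")` makes the new collection visible to `walk("dl")` without any collection-specific code.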

DL as a Hierarchy of DO instances

Implementation details
- DO Prototypes are
  - Specified in XML form
  - Stored in the “TEMPLATE” datastream of the appropriate Collection DO
  - Loaded, parsed & interpreted by the DO Dictionary in its bootstrap procedure
  - Transparent to FEDORA
- DO Instances are supplied with the “CONTAINER” datastream, containing the pids of the DOs they “contain”
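
The talk does not show the XML schema used in the TEMPLATE datastream, so the element and attribute names below are invented. With that caveat, a bootstrap step that loads and parses prototype definitions might look roughly like this, using Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEMPLATE datastream content; the real Pergamos schema is not shown in the talk.
TEMPLATE_XML = """
<prototypes collection="dl.theatre">
  <prototype name="photo">
    <mdset id="DC">
      <field name="title" mandatory="true"/>
      <field name="subject" multivalue="true"/>
    </mdset>
    <children/>
  </prototype>
  <prototype name="album">
    <mdset id="DC">
      <field name="title" mandatory="true"/>
    </mdset>
    <children>
      <allow type="photo"/>
    </children>
  </prototype>
</prototypes>
"""


def load_prototypes(xml_text: str) -> dict:
    """Parse prototype definitions into {qualified type name: spec}."""
    root = ET.fromstring(xml_text.strip())
    scope = root.get("collection")
    specs = {}
    for proto in root.findall("prototype"):
        fields = {
            f.get("name"): {
                "mandatory": f.get("mandatory") == "true",
                "multivalue": f.get("multivalue") == "true",
            }
            for f in proto.findall("mdset/field")
        }
        allowed = [a.get("type") for a in proto.findall("children/allow")]
        specs[f"{scope}.{proto.get('name')}"] = {"fields": fields, "children": allowed}
    return specs
```

Because the definitions travel as an ordinary datastream, the repository only stores and returns them, which fits the slide's "Transparent to FEDORA" point.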

DO Prototypes in detail
- MD Sets
  - Specification of each individual field (label, description, multi-value, mandatory, UI characteristics)
  - Serialization information (how to store it in FEDORA)
  - Field mappings (under development)
- Files: Automatic conversions (tiff -> jpeg + thumb)
- Batch Import: automatically create DOs from zip bundles
- Structure: allowed children types
- Browsers: browse field
- Indices: e.g. subject catalog
- Behavior schemes: atomic DO elements
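
Per-field specifications are what let validation be generic instead of hand-coded for each collection. The sketch below checks a metadata set against field specs carrying the `mandatory` and `multi-value` flags named above. The spec layout, the two sample fields and the message wording are assumptions.

```python
# Hypothetical field specifications for one MD set
FIELD_SPECS = {
    "title":   {"label": "Title",   "mandatory": True,  "multivalue": False},
    "subject": {"label": "Subject", "mandatory": False, "multivalue": True},
}


def validate(md_set: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, spec in FIELD_SPECS.items():
        value = md_set.get(name)
        if spec["mandatory"] and value in (None, "", []):
            problems.append(f"{spec['label']} is mandatory")
        if isinstance(value, list) and not spec["multivalue"]:
            problems.append(f"{spec['label']} accepts a single value")
    for name in md_set:
        if name not in FIELD_SPECS:
            problems.append(f"unknown field: {name}")
    return problems
```

The same specs could plausibly drive a cataloging form as well, since each field already carries a label.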

Discussion

Pergamos
- Historical Archive (production)
- Folklore Notebooks (testing)
- Theatrical Collection, Medical Images & Byzantine music manuscripts (finalization of requirements & specifications)
- Undergoing development … the remaining collections are coming next
- Historical Archive will be published in early 2006…
  - … with a multi-lingual UI, hopefully!

Public DO Behaviors

FEDORA Behaviors                              | Behavior Schemes
----------------------------------------------+-------------------------------------------------------
Are defined in each DO separately             | Are defined once and in one place (in the Prototype)
Operate on the datastreams                    | Operate on the atomic elements of a DO
Invoked directly on the DO                    | Invoked as in OO Dynamic Method Dispatch
Require the a priori existence of datastreams | Instantiation (empty DO)
Generic                                       | Targeted on UI issues
Exposed as Web services                       | Web services will be of use after the DL has been built
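
"Invoked as in OO Dynamic Method Dispatch" can be illustrated with a lookup keyed on the DO's type: the caller names a scheme, and the type decides which implementation runs. The scheme table and the rendering strings below are invented. Only the idea of defining schemes once per prototype comes from the slides.

```python
# Hypothetical behavior schemes, defined once per type rather than per DO
BEHAVIOR_SCHEMES = {
    "dl.theatre.photo": {
        "browseView": lambda do: f"[thumb] {do['title']}",
        "detailView": lambda do: f"[www image] {do['title']}",
    },
    "dl.theatre.album": {
        "browseView": lambda do: f"{do['title']} ({len(do['children'])} photos)",
    },
}


def execute_behavior(do: dict, scheme: str) -> str:
    """Dispatch on the DO's type; the caller never names a concrete implementation."""
    try:
        handler = BEHAVIOR_SCHEMES[do["type"]][scheme]
    except KeyError:
        raise LookupError(f"type {do['type']!r} has no behavior scheme {scheme!r}") from None
    return handler(do)
```

The same `execute_behavior(do, "browseView")` call renders a photo and an album differently, and a scheme missing from a type fails loudly.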

Future Work
- Fully implement the OO paradigm
  - OO Inheritance for DO Prototypes (e.g. the Notebook type derives from the Book type)
  - OO Polymorphism for DO instances (e.g. the DO “uoadl:1234” is both a Notebook & a Book)
- Supply general purpose linking capabilities that exceed structural relations (FEDORA Metadata for Object-to-Object Relationships?)
- Deliver on schedule…
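
Inheritance and polymorphism are listed as future work, so nothing below describes existing Pergamos behavior. As a thought experiment only, a parent link per prototype would be enough to derive both. The field names and the `parent` key are assumptions.

```python
# Hypothetical prototypes with a parent link (Notebook derives from Book)
PROTOTYPES = {
    "book":     {"parent": None,   "fields": ["title", "author"]},
    "notebook": {"parent": "book", "fields": ["collector", "region"]},
}


def type_chain(type_name: str) -> list:
    """Return the type and its ancestors, most specific first."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PROTOTYPES[type_name]["parent"]
    return chain


def all_fields(type_name: str) -> list:
    """Inheritance: a derived type sees its ancestors' fields plus its own."""
    fields = []
    for name in reversed(type_chain(type_name)):
        fields.extend(PROTOTYPES[name]["fields"])
    return fields


def is_a(do: dict, type_name: str) -> bool:
    """Polymorphism: a Notebook DO can also be treated as a Book."""
    return type_name in type_chain(do["type"])
```

For a DO such as `{"pid": "uoadl:1234", "type": "notebook"}`, `is_a` is true for both `"notebook"` and `"book"`.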

Conclusions
- If in doubt, use FEDORA
  - Flexible & Extensible (they mean it)
  - 1 year of Pergamos development, 2 months of testing & 3 months of production use (Historical Archive) with no serious problems
  - Though, Sandy & Carl, I’d be grateful for some minutes of your time!!!
- DO Prototypes: a realization of Content Models in OO terms, implemented on top of FDOM to handle DO Typing issues automatically
- Detailed report on Pergamos to appear…

Thank You
- Questions? Comments?
- For details: "On the Effective Manipulation of Digital Objects: A Prototype-based Instantiation Approach", Kostas Saidis, George Pyrounakis, Mara Nikolaidou, Proc. 9th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2005, Vienna, Austria, September 2005
- email: [email protected]