Introduction
History
FEDORA Overview
Object Oriented Principles
LC’s Requirements
LC’s Architecture
Review
FEDORA History
Continuing Research Project – Cornell 1997
Prototype Application – University of Virginia
Fedora 1.0 – Open Source Release 2002
Fedora 1.2 – Tomorrow!
Options, options, options
Very few tools directly compete with each other
Many tools can be used to accomplish similar behavior
Many tools fulfill parts of the functionality needed for a repository
Roll your own solution
Why Fedora?
Repository Architects & Developers Excited
Object oriented approach to digital objects
Open Source Project
– Funded development (and support)
Java Based
– Multiple HW Platforms
Flexible
Integrates well with existing systems
– CGI Scripts
– Web Services
Leaves most decisions to implementers
Extensible
Again, no product can do it all
– Imaging, Audio, Transformations, Courseware
Easy to add new functionality to objects
Embraces web services
Open APIs
– Access
– Management
Digital Object
What is the definition of a digital object?
– Documents, such as articles, preprints, working papers, technical reports, conference papers
– Books
– Theses
– Data sets
– Computer programs
– Visualizations, simulations, and other models
– Multimedia publications
– Administrative records
– Published books
– Bibliographic datasets
– Images
– Audio files
– Video files
– Reformatted digital library collections
– Learning objects
– Web pages
(list taken from the dspace.org website)
Object Oriented
A software design method that models the characteristics of abstract or real objects using classes and objects.
Proven Techniques for Software Development
– Requirements gathering
– Use Cases
• Developers speak to librarians and other stakeholders
Facilitates reuse of functionality
Design Patterns
Not hacking Perl scripts to make an institutional repository
Object Oriented
Data
– Metadata
• MODS – Descriptive
• METS – Structural
• MIX, etc. – Technical
– Bit streams
• Actual Files – JPG, TIF, WAV, MP3, TEI, EAD
Methods (Behaviors)
– Do stuff with the data
Object Oriented Concepts
Classes
– Objects of the same type belong to a class
Interfaces
– A contract defining behaviors a class of objects will implement
Encapsulation
– Behaviors operate on the data in an object
Reflection
– Discover what interfaces and behaviors an object implements
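
The four concepts above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the interface names `Describable` and `Playable`, the classes `SoundRecording` and `Photograph`, and the helper `implemented_interfaces` are hypothetical and are not part of FEDORA.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class Describable(Protocol):
    """Interface: a contract for objects that can return metadata."""

    def get_metadata(self) -> dict: ...


@runtime_checkable
class Playable(Protocol):
    """Interface: a contract for objects that can return audio."""

    def get_audio(self) -> bytes: ...


class SoundRecording:
    """Class: all sound recording objects share this structure."""

    def __init__(self, title: str, audio: bytes) -> None:
        # Encapsulation: the data lives inside the object and is
        # reached only through the behaviors below.
        self._title = title
        self._audio = audio

    def get_metadata(self) -> dict:
        return {"title": self._title}

    def get_audio(self) -> bytes:
        return self._audio


class Photograph:
    """A second class that implements only one of the interfaces."""

    def __init__(self, title: str) -> None:
        self._title = title

    def get_metadata(self) -> dict:
        return {"title": self._title}


def implemented_interfaces(obj: object) -> List[str]:
    """Reflection: ask an object which interfaces it satisfies."""
    candidates = (Describable, Playable)
    return [i.__name__ for i in candidates if isinstance(obj, i)]
```

Calling `implemented_interfaces` on a `SoundRecording` reports both interfaces, while a `Photograph` reports only `Describable`; a client can therefore decide at run time which behaviors it is allowed to request.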
Image Objects
Two File Image Object
– Data
• High Resolution Version: tif
• Low Resolution Version: jpg
MrSID File Image Object
– Data
• MrSID File
Basic Image Interface Implementations
Two File Image Object
– getHighResolutionTIF
• returns high resolution TIF
– getLowResolutionJPG
• returns low resolution JPG
MrSID Image Object
– getHighResolutionTIF
• processes the MrSID file to return a high resolution TIF file of the image
– getLowResolutionJPG
• processes the MrSID file to return a low resolution JPG of the image
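
A minimal Python sketch of the two implementations, assuming the interface is modeled as an abstract base class. The `decode` callable and the `fake_decode` stand-in are hypothetical; a real deployment would call an actual MrSID decoder, which is not shown here.

```python
from abc import ABC, abstractmethod
from typing import Callable


class BasicImage(ABC):
    """The shared contract: the two behaviors named on the slide."""

    @abstractmethod
    def get_high_resolution_tif(self) -> bytes: ...

    @abstractmethod
    def get_low_resolution_jpg(self) -> bytes: ...


class TwoFileImage(BasicImage):
    """Stores both renditions and simply returns them."""

    def __init__(self, tif: bytes, jpg: bytes) -> None:
        self._tif = tif
        self._jpg = jpg

    def get_high_resolution_tif(self) -> bytes:
        return self._tif

    def get_low_resolution_jpg(self) -> bytes:
        return self._jpg


class MrSidImage(BasicImage):
    """Stores one MrSID file and derives each rendition on request."""

    def __init__(self, sid: bytes, decode: Callable[[bytes, str], bytes]) -> None:
        self._sid = sid
        self._decode = decode  # hypothetical converter supplied by the caller

    def get_high_resolution_tif(self) -> bytes:
        return self._decode(self._sid, "tif")

    def get_low_resolution_jpg(self) -> bytes:
        return self._decode(self._sid, "jpg")


def fake_decode(sid: bytes, fmt: str) -> bytes:
    """Stand-in for a real MrSID decoder; it only tags the bytes."""
    return fmt.encode() + b":" + sid
```

Code that asks for `get_low_resolution_jpg()` does not need to know which kind of image object it holds, which is the point of the shared interface.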
Sheet Music Object
Data
– MODS Metadata
– Images of the pages (Image Objects)
– TEI encoded text of the lyrics (TEI Objects)
Behaviors
– getPageImage(Pagenumber)
• Invoke the getLowResolutionJPG to return the image!
– getMODS
– getLyrics
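
The sheet music object can be sketched as a composition of smaller objects. This is a hedged illustration only: the `PageImage` and `SheetMusic` classes, the 1-based page numbering, and the use of plain strings for MODS and TEI are assumptions, not the LC implementation.

```python
from typing import List


class PageImage:
    """Minimal image object exposing the behavior the slide relies on."""

    def __init__(self, jpg: bytes) -> None:
        self._jpg = jpg

    def get_low_resolution_jpg(self) -> bytes:
        return self._jpg


class SheetMusic:
    """Aggregates MODS metadata, page image objects, and TEI lyrics."""

    def __init__(self, mods: str, pages: List[PageImage], lyrics_tei: str) -> None:
        self._mods = mods
        self._pages = pages
        self._lyrics_tei = lyrics_tei

    def get_page_image(self, page_number: int) -> bytes:
        # Assumption: pages are numbered from 1, as on printed music.
        if not 1 <= page_number <= len(self._pages):
            raise IndexError(f"no page {page_number}")
        # Delegate to the image object's own behavior.
        return self._pages[page_number - 1].get_low_resolution_jpg()

    def get_mods(self) -> str:
        return self._mods

    def get_lyrics(self) -> str:
        return self._lyrics_tei
```

Because `get_page_image` delegates to each page's own behavior, the sheet music object works unchanged whether a page is stored as two files or as a single MrSID file.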
FEDORA’s Interface Implementation
[Figure: three object types and their components.
Data Object: Persistent ID (PID), Disseminators, Datastreams, System Metadata.
Behavior Definition Object: Persistent ID (PID), Behavior Definition Metadata, System Metadata, Datastreams.
Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams.]
graphics taken from presentations available at www.fedora.info
What is FEDORA?
“Plumbing”
Manages associations between objects and their interfaces
Invokes behaviors from an interface to which an object subscribes
Manages or references files
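
The "plumbing" role can be pictured with a toy in-memory registry. This is a sketch under stated assumptions and does not reproduce FEDORA's actual APIs: the class `ToyRepository`, its method names, and the identifier `demo:1` are all hypothetical.

```python
from typing import Callable, Dict, List, Set

Datastreams = Dict[str, bytes]
Mechanism = Dict[str, Callable[[Datastreams], bytes]]


class ToyRepository:
    """Tracks which interfaces each object subscribes to and dispatches calls."""

    def __init__(self) -> None:
        self._datastreams: Dict[str, Datastreams] = {}
        self._bindings: Dict[str, Dict[str, Mechanism]] = {}

    def ingest(self, pid: str, datastreams: Datastreams) -> None:
        # Manage (here: copy) the files that belong to the object.
        self._datastreams[pid] = dict(datastreams)
        self._bindings[pid] = {}

    def subscribe(self, pid: str, interface: str,
                  methods: Set[str], mechanism: Mechanism) -> None:
        # The mechanism must implement every method the interface defines.
        missing = methods - set(mechanism)
        if missing:
            raise ValueError(f"mechanism lacks: {sorted(missing)}")
        self._bindings[pid][interface] = mechanism

    def list_behaviors(self, pid: str) -> Dict[str, List[str]]:
        # Report the interfaces and methods an object currently offers.
        return {name: sorted(mech) for name, mech in self._bindings[pid].items()}

    def invoke(self, pid: str, interface: str, method: str) -> bytes:
        # Look up the bound mechanism and run it on the object's data.
        mechanism = self._bindings[pid][interface]
        return mechanism[method](self._datastreams[pid])
```

A usage example: after `repo.ingest("demo:1", {"TIF": b"T", "JPG": b"J"})` and a `subscribe` call that maps `getLowResolutionJPG` to `lambda ds: ds["JPG"]`, the call `repo.invoke("demo:1", "BasicImage", "getLowResolutionJPG")` returns the stored JPG bytes.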
What does FEDORA currently not do?
“Digital Library in a Box”
– Requires integration and custom development
Prescribe the right way to do things
– Implementers are free to choose
– Best practices still being fleshed out
LC’s Requirements
Complex Digital Objects
– Structurally
• METS structMap
– Rich descriptive metadata
• Exploiting MODS features
– relatedItem
Choosing Repository Software
Fedora provides a foundation to build on
LC member of initial deployment team
No other software is like FEDORA
– Except general purpose programming languages
How LC is implementing FEDORA
Types of Digital Objects
– Sheet Music
– Scores
– Sound Recordings
– Compact Discs
– Manuscripts
– Photographs
– Websites
– “Collections”
Less emphasis
– Intellectual output of a university’s research faculty
METS Profiles
Correlates well with classes of objects
Articulates
– Structure of an object
– Metadata requirements
METS documents conforming to profiles are ingested into repository
– Atomization
– Behavior association
Architecture
Fedora (Repository)
Cocoon (Application Layer)
[Figure: architecture diagram showing the user, the web browser, Cocoon, the Fedora Service APIs, and the Fedora Repository System]
SIP vs AIP
Complex digital objects are atomized into small reusable objects upon ingest to FEDORA
– Sheet Music METS Profile (SIP)
• Sheet music object (AIP)
– Structural metadata encoded in METS
– Descriptive metadata encoded in MODS
• Image objects for each page (AIP)
– TIF and JPG Files
– Technical metadata encoded in MIX
• TEI object for the lyrics (AIP)
– TEI File
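
The atomization step can be sketched as a function that splits one submission package into several archival objects. The dictionary layout, the `atomize` function, and the identifier scheme (`-pageN`, `-lyrics` suffixes) are assumptions for illustration; the actual LC ingest code is not described in these slides.

```python
from typing import Dict, List


def atomize(sip: Dict) -> List[Dict]:
    """Split a sheet music SIP into a parent AIP, page AIPs, and a TEI AIP."""
    base = sip["id"]  # hypothetical identifier scheme
    parent = {
        "pid": base,
        "type": "sheet-music",
        "mets": sip["mets"],  # structural metadata
        "mods": sip["mods"],  # descriptive metadata
        "parts": [],          # identifiers of the atomized children
    }
    aips = [parent]

    # One image object per page, each with its files and MIX metadata.
    for number, page in enumerate(sip["pages"], start=1):
        pid = f"{base}-page{number}"
        aips.append({
            "pid": pid,
            "type": "image",
            "tif": page["tif"],
            "jpg": page["jpg"],
            "mix": page["mix"],
        })
        parent["parts"].append(pid)

    # One TEI object for the lyrics.
    lyrics_pid = f"{base}-lyrics"
    aips.append({"pid": lyrics_pid, "type": "tei", "tei": sip["lyrics_tei"]})
    parent["parts"].append(lyrics_pid)
    return aips
```

A two-page submission therefore yields four archival objects: the sheet music object, two image objects, and one TEI object, each of which can be reused independently.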
Why this Architecture?
Clean Separation of Concerns
– Logic: Makes it go!
– Content: From FEDORA
– Style: Web Designers
Object not bound to display
– Repository is for preservation of metadata and files, not markup (HTML)
– Markup accomplished in Cocoon layer
Leverage use of METS structural metadata
Performance: Cocoon Caching
User Interface Development
Web Designers
– Relate to objects and behaviors
– Can develop in HTML for display
– XSLT
• Uses XML from repository to drive display
Other Pieces of the Repository Puzzle
Other open source tools
– Cocoon
• XML Publishing Framework
– Lucene
• Text Indexing and Search API
Someone has to write software!
– Java to build Lucene indexes
– XSP searching
– More XSLT than you want to see
Digital Object Production
How are we building these digital objects?
– MySQL
– Cocoon
– XSLT
– Homegrown Java
• Technical metadata extraction
Cocoon
XML Publishing Framework (Toolbox)
– Generate
• From files (or URLs)
• From databases
• From code (XSP, JSP, PHP)
– Transform
• XSLT
– Serialize
• XML, HTML, PDF, SVG, MIDI?
– Caching
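
The generate, transform, serialize, and cache stages can be imitated in a few lines of Python. This is not Cocoon and uses no XSLT: a plain Python function stands in for the stylesheet, `functools.lru_cache` stands in for Cocoon's caching, and the `<song>` input format is invented for the example.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical source document; in the architecture above it would
# come from the repository rather than from a string constant.
SOURCE = "<song><title>Example Title</title><page n='1'/><page n='2'/></song>"


def generate(xml_text: str) -> ET.Element:
    """Generate: turn a source (here a string) into an XML tree."""
    return ET.fromstring(xml_text)


def transform(song: ET.Element) -> ET.Element:
    """Transform: build an HTML tree from the XML (stand-in for XSLT)."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = song.findtext("title")
    ET.SubElement(body, "p").text = f"{len(song.findall('page'))} pages"
    return html


def serialize(tree: ET.Element) -> str:
    """Serialize: write the tree out as markup text."""
    return ET.tostring(tree, encoding="unicode")


@lru_cache(maxsize=128)
def render(xml_text: str) -> str:
    """Run the whole pipeline; repeated requests are served from the cache."""
    return serialize(transform(generate(xml_text)))
```

Calling `render(SOURCE)` returns `<html><body><h1>Example Title</h1><p>2 pages</p></body></html>`, and a second call with the same input is answered from the cache without re-running the three stages.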
XSLT
Philosophy
– Get data into XML as early in the workflow as possible
Flexibility
– Easy to change logic in XSLT
– No need to recompile
Performance Issues
Resources Needed for FEDORA (Cheap)
Hardware Requirements
– Minimal for experimentation
• Installs on Windows PC
• Packaged to get up and running quickly
• Demo set of objects
– Scales with hardware in a production environment
Resources Needed for FEDORA (Expensive)
1 or More Developers
– 1: Kick the tires
– More than 1: Real production
Application Architects
Requirement Analysts
Subject Matter Experts
– Articulate requirements
• Object Structure
• Descriptive Metadata
Who
Institutions with resources to do software development
Unique requirements for digital library software
– Preexisting tools do not fit the need
Need for integration of existing systems into one management infrastructure
What
Digital Library Plumbing
Very general purpose
– Use it to build almost any digital library application
Why
Robust set of tools to build YOUR repository
High level of user support from the FEDORA development team
Smart people working on hard problems