Fedora TM and Repository Implementation at UVa Leslie Johnston, UVa Library DASER Summit November 22, 2003

Fedora™ and Repository Implementation at UVa

Leslie Johnston, UVa Library
DASER Summit
November 22, 2003

Fedora™ History
• Research (1997-present):
– DARPA and NSF-funded research project at Cornell University Digital Library Research Group.
– Reference implementation developed at Cornell.
• First Application (1999-2001):
– University of Virginia Library Digital Library Research and Development prototype.
– Scale/stress testing for 10,000,000 objects.
• Open Source Software (2002-present):
– Andrew W. Mellon Foundation granted Virginia and Cornell $1 million to develop a production-quality Fedora system.
– Fedora 1.0 released in May 2003.

What is Fedora™?
• Fedora is a Digital Asset Management architecture, upon which many types of Digital Library systems might be built.

• Fedora is based on object models that represent data objects (units of content) or collections of data objects.

• The objects contain linkages between datastreams (internally managed or external media files), metadata (inline or external), and behaviors that are themselves code objects and link to disseminators (processes, mechanisms, and external software). A data object subscribes to a pair of behavior objects.

• Object models can be thought of as containers that give a useful shape to information poured into them; if the information fits the container, it can immediately be used in predefined ways.

Fedora™ Data Object Components
• Datastreams – represent content and metadata.
• PID – persistent identifier, unique to the Repository.
• System Metadata – metadata that the Repository keeps.
• Disseminators – bindings to objects that can deliver software processes that can be used with the datastreams.

Fedora™ Data Objects

[Diagram: the generic structure of a data object beside an example object, PID = uva-lib:100]
• Persistent ID (PID) – digital object identifier. Example: PID = uva-lib:100.
• Default Disseminator and Extensions – service view: methods for disseminating content. Example: Default Disseminator, Image Disseminator.
• System Metadata – internal view: key metadata necessary to manage the object.
• Datastreams (items) – content view: set of data and metadata items. Example: Image (mrsid), DC (xml), Thumbnail (jpeg).
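The four parts of a data object can be pictured as a small record type. The sketch below is only an illustration of the slide's example object (uva-lib:100); the class names, field names, and methods are hypothetical and are not Fedora's actual storage format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datastream:
    """One content or metadata item held by a data object."""
    label: str
    fmt: str  # format label as written on the slide, e.g. "mrsid"


@dataclass
class DataObject:
    """Hypothetical container mirroring the parts named on the slide."""
    pid: str                                                 # digital object identifier
    system_metadata: Dict[str, str] = field(default_factory=dict)   # internal view
    datastreams: List[Datastream] = field(default_factory=list)     # content view
    disseminators: List[str] = field(default_factory=list)          # service view

    def content_view(self) -> List[str]:
        """Labels of the data and metadata items in the object."""
        return [ds.label for ds in self.datastreams]

    def service_view(self) -> List[str]:
        """Names of the disseminators the object offers."""
        return list(self.disseminators)


# The example object from the slide, expressed with the hypothetical types.
example = DataObject(
    pid="uva-lib:100",
    datastreams=[
        Datastream("Image", "mrsid"),
        Datastream("DC", "xml"),
        Datastream("Thumbnail", "jpeg"),
    ],
    disseminators=["Default Disseminator", "Image Disseminator"],
)
```

Calling `example.content_view()` lists the three items, and `example.service_view()` lists the two disseminators.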

[Diagram: the three Fedora object types and the relationships among them]
• Behavior Definition Object: Persistent ID (PID), Behavior Definition Metadata, System Metadata, Datastreams.
• Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams; linked to a Web Service.
• Data Object: Persistent ID (PID), Disseminators, Datastreams, System Metadata.
• Relationships shown: behavior contract, behavior subscription, data contract.
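One way to read the three relationships in the diagram is sketched below. The pairing of each relationship with specific objects is an assumption on my part (the mechanism honours the definition's behavior contract, and the data object meets the mechanism's data contract), and every class, field, and PID other than uva-lib:100 is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BehaviorDefinition:
    """Names the abstract methods a disseminator promises."""
    pid: str
    methods: List[str]


@dataclass
class BehaviorMechanism:
    """Binds those methods to concrete code (a web service in Fedora)."""
    pid: str
    bdef_pid: str
    required_datastreams: List[str]
    implementations: Dict[str, Callable[[Dict[str, str]], str]]

    def honours(self, bdef: BehaviorDefinition) -> bool:
        # Assumed "behavior contract": every defined method is implemented.
        return self.bdef_pid == bdef.pid and all(
            name in self.implementations for name in bdef.methods
        )


@dataclass
class DataObject:
    pid: str
    datastreams: Dict[str, str]  # label -> content location

    def disseminate(self, bdef: BehaviorDefinition,
                    bmech: BehaviorMechanism, method: str) -> str:
        if not bmech.honours(bdef):
            raise ValueError("behavior contract not satisfied")
        # Assumed "data contract": the object supplies what the mechanism needs.
        missing = [d for d in bmech.required_datastreams
                   if d not in self.datastreams]
        if missing:
            raise ValueError(f"data contract not satisfied: {missing}")
        if method not in bdef.methods:
            raise KeyError(method)
        return bmech.implementations[method](self.datastreams)


bdef = BehaviorDefinition("demo:bdef-image", ["get_thumbnail"])
bmech = BehaviorMechanism(
    "demo:bmech-image", "demo:bdef-image", ["Thumbnail"],
    {"get_thumbnail": lambda streams: streams["Thumbnail"]},
)
obj = DataObject("uva-lib:100", {"Thumbnail": "thumb.jpg", "DC": "dc.xml"})
result = obj.disseminate(bdef, bmech, "get_thumbnail")
```

The subscription itself is modelled here simply by passing the definition and mechanism pair to `disseminate`; a fuller model would store that pair on the object as a disseminator.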

Fedora™ Service Interfaces
• Management Service (API-M)
– Ingest - XML-encoded object submission
– Create - interactive object creation via API requests
– Maintain - interactive object modification via API requests
– Validate - application of integrity rules to objects
– Identify - generate unique object identifiers
– Security - authentication and access control
– Preserve - automatic content versioning and audit trail
– Export - XML-encoded object formats
• Access Service (API-A and API-A-LITE)
– Search - search repository for objects
– Object Reflection - what disseminations can the object provide?
– Object Dissemination - request a view of the object’s content
• OAI-PMH Provider Service
– OAI-DC records
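As an illustration of the OAI-PMH provider service, the sketch below builds a harvesting request URL. The `verb` and `metadataPrefix` parameters and the `oai_dc` prefix come from the OAI-PMH protocol itself; the base URL and the helper name are placeholders, not the address or API of any real repository.

```python
from urllib.parse import urlencode


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH GET request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL; a real provider publishes its own.
BASE_URL = "http://repository.example.org/oai"

# Ask the provider for all records in unqualified Dublin Core (OAI-DC).
list_records_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# Ask the provider to describe itself.
identify_url = oai_request_url(BASE_URL, "Identify")
```

A harvester would issue an HTTP GET on these URLs and parse the XML response.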

Fedora™ Distribution Package
• Open Source (Mozilla Public License)
• 100% Java (Sun Java J2SDK1.4)
• Supporting Technologies
– Apache Tomcat 4.1 and Apache Axis (SOAP)
– Xerces 2-2.0.2 for XML parsing and validation
– Saxon 6.5 for XSLT transformation
– Schematron 1.5 for validation
– MySQL and Mckoi relational database
– Oracle 9i support
• Deployment Platforms
– Windows 2000, NT, XP
– Solaris
– Linux

What Fedora™ Is Not
• Fedora is not finished – the development process is only halfway complete.
– Version 1.2 releases on December 10, 2003.
– The scheduled date for implementation of all features outlined in the grant-funded project is early 2005.

• Fedora is the underlying architecture for a digital repository, not a complete management, indexing, discovery, and delivery application.

• Fedora by itself is not the UVa Library's Digital Library system - Fedora is the "plumbing" for our first phase production Central Digital Repository.

Process for Repository Development

• Fedora developers met with content and format specialists, application developers, and user service librarians to understand what media files we have and how our users expect to find them and use them.

• Priorities were set for phased development and content migration by format type:
– First Phase: Electronic Texts, EAD, and Images
– Second Phase: Datasets and GIS
– Third Phase: Digital Audio and Video

Process for Repository Development

• Specifications were set for:
– Datastreams (formats, variation in deliverables [EAD vs. TEI vs. Ebooks, page images vs. documentary images])
– Metadata
– Discovery functionality and interface (simple and advanced searching, metadata vs. full-text searching, presentation of result sets, etc.)
– Delivery (must support static and on-the-fly file delivery, and varied end user download and printing requirements)

Repository Prototype

• A prototype discovery interface was released for review by Library staff during summer 2003.

• Almost 150 comments on functionality, user interface, and proposed additional features were collected.

• The comments were collated into categories which were prioritized by Library department heads, user services staff, and developers for implementation into a first release, scheduled for early 2004.

Proposed Searching Services

[Diagram: search interfaces, indexes, and repository objects]
• Search interfaces: Discovery Search Interface; Full-text Search Interface; Art and Arch. Search Interface; Finding Aids Search Interface.
• Indexes and catalogs: OPAC; Digital Discovery Index; Modern English Index; Art and Architecture Index; Finding Aids Index.
• Repository objects, each with PID, Disseminators, System Metadata, Desc Metadata, and Admin Metadata: one with a TEI File, one with a GDMS File, and one with an EAD File.

Issues - Standards
• Collate, standardize, and document in-house production standards.
– Slide and photograph scanning; Book page scanning; and Full-text markup
<http://www.lib.virginia.edu/digital/reports/best_practices.html>
• Develop UVa DescMeta XML element set, and document minimum metadata elements and best use practices.
<http://www.lib.virginia.edu/digital/reports/metadata.html>
<http://www.lib.virginia.edu/digital/reports/DLMRPGroupReport.htm>
• Develop the General Descriptive Modeling Scheme (GDMS) XML encoding standard to describe complex, structured collections.
<http://www.lib.virginia.edu/digital/resndev/gdms.html>
• Recommend the in-house standards for faculty with digitization projects through our consulting services.
– Born digital faculty projects are selected for collection by the Library, assuring a smoother collection process.

Issues – Authoring Tools
• User Collection Tool
– Web-based database for the organization and annotation of personal media collections.
<http://iris.lib.virginia.edu/dmmc/collectiontool/>
• GDMS Tool
– XML authoring tool to create documents using a locally defined XML encoding standard to represent structured collections of images and metadata.
<http://www.lib.virginia.edu/digital/resndev/gdms.html>

• A Data Workbench is planned to create relationships between objects and prepare files for ingest into the Repository.

• A Scholarly Object Workbench is planned for faculty to use in creating their research and instructional resources in formats that can be more easily collected by the Library.

Upcoming – Modeling Virginia

• Collaboration between Systems Engineering, Environmental Sciences, and the Library.

• Weather datasets, traffic datasets, and the 2000 census.
– Proof-of-concept – Hampton Roads area.
– Applying for funding for the entirety of Virginia.

• Will drive the development of object models and disseminators for discovery and download of variables across datasets with DDI codebooks.

Upcoming – Aggregation Objects

• On-the-fly collection objects where the content data stream contains rules, formatted as XQuery or XPath statements, rather than explicit collection relationships.

• Child objects of the collection are assembled at dissemination time.

• Disseminators can include such functions as building a full-text index, rendering a search page, etc.
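The idea of a rule-driven collection can be sketched as follows: the aggregation object stores an XPath expression instead of a list of members, and the members are resolved each time a dissemination is requested. The registry document, its element and attribute names, and the class are all hypothetical, and Python's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of repository objects, invented for illustration.
REGISTRY_XML = """
<registry>
  <object pid="demo:1" type="image" collection="architecture"/>
  <object pid="demo:2" type="text" collection="architecture"/>
  <object pid="demo:3" type="image" collection="finding-aids"/>
</registry>
"""


class AggregationObject:
    """A collection whose content is a rule rather than explicit members."""

    def __init__(self, pid: str, rule: str) -> None:
        self.pid = pid
        self.rule = rule  # XPath expression held in place of a member list

    def members(self, registry: ET.Element) -> list:
        """Assemble the child PIDs at dissemination time."""
        return [element.get("pid") for element in registry.findall(self.rule)]


registry = ET.fromstring(REGISTRY_XML)
images = AggregationObject("demo:images", ".//object[@type='image']")
before = images.members(registry)

# A newly ingested object joins the collection without editing the collection.
ET.SubElement(registry, "object",
              {"pid": "demo:4", "type": "image", "collection": "architecture"})
after = images.members(registry)
```

Because the rule is evaluated on every request, `after` includes the new object even though the aggregation object itself was never modified.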

Upcoming – Fedora™ 1.2
• Open Fedora APIs
– Repository as web services (REST and SOAP bindings); WSDL interface defs
• Flexible Digital Object Model
– Content View: objects as bundle of items (content and metadata)
– Service View: objects as a set of service methods (“behaviors”)
– Extensible functionality by associating services with objects
• Repository System
– Core Services: Management, Access/Search, OAI-PMH
– Storage: XML object store; relational db object cache; relational db object registry
– Mediation - auto-dispatching to distributed web services for content transformation
– Auto-Indexing - system metadata and DC record of each object
– HTTP Basic Authentication and Access Control
– Built-in disseminator services: XSLT x-form, image manipulation, xml-to-PDF
• Content Versioning
– Automatic version control (saves version of content/metadata when modified)
– Enables date-time stamped API requests (see object as it looked at a point in time)
• Clients
– Fedora Administrator: GUI client to create/maintain objects
– Default Web browser interface: search; access objects via default disseminator
– Command line utilities (batch load, ingest, purge, others)
– Migration Utility - mass export/ingest
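The content versioning behaviour described above (every modification saves a version, and a request can carry a date-time to see the object as it looked then) can be sketched with a sorted list of timestamps. This is a minimal illustration under my own assumptions; the class and method names are hypothetical and do not reflect how Fedora stores versions.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


class VersionedDatastream:
    """Keeps every saved version so earlier states remain retrievable."""

    def __init__(self) -> None:
        self._timestamps: List[datetime] = []
        self._contents: List[str] = []

    def modify(self, content: str, when: datetime) -> None:
        """Save a new version; versions must arrive in chronological order."""
        if self._timestamps and when <= self._timestamps[-1]:
            raise ValueError("versions must be added in chronological order")
        self._timestamps.append(when)
        self._contents.append(content)

    def as_of(self, when: Optional[datetime] = None) -> str:
        """Return the version in effect at `when` (latest if omitted)."""
        if when is None:
            index = len(self._contents)
        else:
            # Count the versions saved at or before the requested time.
            index = bisect_right(self._timestamps, when)
        if index == 0:
            raise LookupError("no version existed at that time")
        return self._contents[index - 1]


dc = VersionedDatastream()
dc.modify("<title>Draft</title>", datetime(2003, 5, 1))
dc.modify("<title>Final</title>", datetime(2003, 11, 22))

summer_view = dc.as_of(datetime(2003, 8, 1))   # state between the two edits
current_view = dc.as_of()                      # latest state
```

Keeping the timestamps sorted lets a date-time stamped request be answered with a binary search rather than a scan of every version.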

Fedora™ December 2003-January 2005
• Fedora Object XML (FOXML)
– Internal storage format; direct expression of Fedora object model
– Better support for relationships (“kinship” metadata)
– Better support for audit trail (event history)
– Format identifiers for dynamic service binding
• Shibboleth authentication
• Policy Enforcement
– XACML expression language
– Fedora policy enforcement module
• Web interface for easy content submission
• Batch object modification utility
• Administrative Reporting
• Object Event History (ABC/RDF disseminations)
• Better support for “collections”
• New ingest and export formats (METS 1.3, DIDL)

Contact Information

www.fedora.info

www.lib.virginia.edu/digital/