DEF System Architecture
XML Web Services

Fedora and the Zebra Search Engine in an OAI Eprints Application

by Gert Schmeltz Pedersen, DTV
gsp@dtv.dk - +45 4525 7244


Contents

1. XML Web Services and the 3-tier Architecture
2. The DEF Eprints Service
3. DEF-XWS Eprints
4. Generic Search Service
5. Repository Federation


DEF-XWS project suite

"XML Web Services and the 3-tier architecture"

a project suite within the programme area System Architecture at Denmark's Electronic Research Library (DEFF) (http://defxws.cvt.dk)

a collaboration with The Royal Library, The State and University Library, Aarhus Business School Library, and others.

Gain hands-on experience with web services.

Gain hands-on experience with Fedora.

Use Fedora to implement a web service version of DEF Eprints (international eprints metadata harvested from Open Archives, a DEF project carried out at DTV).

Add full text indexing and retrieval.


[Figure: DEFF 3-tier Service Oriented Architecture. Labels in the diagram: Web browser; Central Portal; Local Portal; Common service; Local service; Database.]


The DEF Eprints Service

Architecture of the DEF Eprints Service Provider

[Figure: architecture diagram. Labels in the diagram: Open Archives Initiative Data Providers; OAI-PMH (Protocol for Metadata Harvesting); OAIHarvester; OAIManager; Librarian; MySQL; Full set; Sub set; EXPORT; Eprint Service Provider; Zebra server (two instances); Z39.50; Web UI w/ Z39.50 (two instances); DEF Portal User; InfoNet User.]
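The harvesting step named in the diagram can be illustrated with a minimal Python sketch of an OAI-PMH `ListRecords` exchange. It is not the OAIHarvester code from the project. The base URL, the sample response, and the function names are assumptions made for illustration; only the protocol elements (`verb`, `metadataPrefix`, `resumptionToken`, and the OAI and Dublin Core namespaces) come from the OAI-PMH 2.0 protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        title = record.findtext(".//" + DC_NS + "title")
        records.append((identifier, title))
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    token = token.strip() if token else ""
    return records, (token or None)


# Hypothetical response from a data provider, shortened to one record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would fetch `list_records_url(...)`, pass the response to `parse_list_records`, store the records, and repeat with the returned token until no token comes back.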


DEF-XWS Eprints

[Figure: the DEF Eprints Service Provider architecture extended with a Fedora server. Labels in the diagram: Open Archives Initiative Data Providers; OAI-PMH; OAIHarvester; OAIManager; Librarian; MySQL; Full set; Sub set; EXPORT (three instances); Eprint Service Provider; Zebra server; Z39.50; Web UI w/ Z39.50; DEF Portal User; InfoNet User; Batch ingest; Fedora server; Zebra server; Full text retrieval; SOAP/REST; Web UI w/ SOAP (java); Web UI w/ REST (php); AppXYZ w/ SOAP (perl); DEF-XWS Eprints User; AppXYZ User.]


DEF-XWS Eprints

ZebraForFedora, a module for Fedora (http://www.indexdata.dk/zebra)

Purpose: to obtain powerful text index and search functionality and performance.

The original text index and search functionality in Fedora is simple SQL on a table, where DC element texts are stored in fields.
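To make the contrast concrete, here is a minimal Python sketch of field search as "simple SQL on a table". The table name, the column names, and the sample rows are assumptions for illustration; Fedora's actual schema is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per object, one column per DC element.
conn.execute(
    "CREATE TABLE dc_fields ("
    "pid TEXT PRIMARY KEY, title TEXT, creator TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO dc_fields VALUES (?, ?, ?, ?)",
    [
        ("demo:1", "Fedora repository architecture", "A. Author",
         "Notes on digital objects"),
        ("demo:2", "Searching with Zebra", "B. Writer",
         "Index and retrieval of eprints"),
    ],
)

ALLOWED_FIELDS = {"title", "creator", "description"}


def field_search(conn, field, term):
    """Return the pids whose DC field contains the term as a substring."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    # The column name is checked against a whitelist; the term is bound
    # as a parameter.
    sql = f"SELECT pid FROM dc_fields WHERE {field} LIKE ? ORDER BY pid"
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]
```

A `LIKE '%term%'` query of this kind is a substring scan: it has no relevance ranking and no notion of words, which is the sort of limitation a dedicated text engine is meant to lift.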

ZebraForFedora is a set of Java classes that is deployed over existing Fedora and Zebra installations by running an Ant target.

In the Fedora configuration file:

<module role="fedora.server.search.FieldSearch" class="dk.defxws.eprints.fedora.server.search.FieldSearchZebraModule">
<comment>Instead of fedora.server.search.FieldSearchSQLModule</comment>
<datastore id="zebra">
<comment>Zebra server</comment>
<param name="host" value="defxws.cvt.dk"/>
<param name="port" value="9395"/>
</datastore>
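As an illustration of what the `datastore` entry carries, the following Python sketch reads the host and port from that element. It parses only the `<datastore>` element copied from the excerpt above, since the excerpt does not show the rest of the file. The function names are hypothetical, and this is not how ZebraForFedora itself reads its configuration.

```python
import xml.etree.ElementTree as ET

# The <datastore> element as shown in the configuration excerpt.
DATASTORE_XML = """<datastore id="zebra">
  <comment>Zebra server</comment>
  <param name="host" value="defxws.cvt.dk"/>
  <param name="port" value="9395"/>
</datastore>"""


def datastore_params(xml_text):
    """Return the name/value pairs of the <param> children as a dict."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.findall("param")}


def zebra_address(xml_text):
    """Return (host, port) for the Zebra server named in the datastore."""
    params = datastore_params(xml_text)
    return params["host"], int(params["port"])
```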


DEF-XWS Eprints


Purpose achieved:

- hands-on experience with Fedora
- hands-on experience with web services
- DEF-XWS Eprints available from web services:
  http://defxws.cvt.dk:8082/fedora/access/soap?wsdl
  http://defxws.cvt.dk:8082/fedora/accessDEF-XWS/soap?wsdl
- ready for 3-layered system architecture
- applications combining many web services

Lesson: Do not override field search, provide generic search service instead ...
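A client that wants to combine several such services typically starts from the WSDL. The Python sketch below lists the operations declared in a WSDL 1.1 document. The sample WSDL and its operation names are invented for illustration and are not the contents of the DEF-XWS or Fedora WSDL files.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"  # WSDL 1.1 namespace


def list_operations(wsdl_text):
    """Map each portType name to the list of its operation names."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall(WSDL_NS + "portType"):
        names = [op.get("name") for op in port_type.findall(WSDL_NS + "operation")]
        operations[port_type.get("name")] = names
    return operations


# Hypothetical, heavily shortened WSDL document.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="ExampleSearch">
  <portType name="ExampleSearchPortType">
    <operation name="search"/>
    <operation name="getRecord"/>
  </portType>
</definitions>"""
```

In practice the WSDL text would be fetched from a `...soap?wsdl` address such as the ones listed above and passed to `list_operations`.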


Generic Search Service

• Core Fedora Repository Service
• New services are deployed as web applications (.war files), with a configuration file.
• The Generic Search Service shall be a webapp, configurable to use an existing Fedora repository and an existing installation of an indexing and searching engine, like Zebra, Lucene, and others.
• Functionality to be decided by a working group of Fedora users and developers.

[Figure labels: Generic; Lucene; Zebra ...]
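One way to read "configurable to use an existing engine" is a common interface with one adapter per engine, chosen by a configuration value. The Python sketch below shows that idea only. The class names, the `engine` configuration key, and the in-memory stand-in engine are all assumptions; real adapters for Zebra or Lucene are not shown, and the actual functionality was still to be decided by the working group.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Interface that every engine adapter would implement."""

    @abstractmethod
    def index(self, pid, text):
        """Add the text of one repository object to the index."""

    @abstractmethod
    def search(self, query):
        """Return the sorted pids of objects matching all query words."""


class InMemoryBackend(SearchBackend):
    """Stand-in engine for the sketch: a tiny inverted index."""

    def __init__(self):
        self._postings = {}

    def index(self, pid, text):
        for word in text.lower().split():
            self._postings.setdefault(word, set()).add(pid)

    def search(self, query):
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)


# Registry of available adapters; a Zebra or Lucene adapter would be added here.
BACKENDS = {"memory": InMemoryBackend}


def backend_from_config(config):
    """Instantiate the backend named by the hypothetical 'engine' key."""
    name = config["engine"]
    if name not in BACKENDS:
        raise ValueError(f"unknown search engine: {name!r}")
    return BACKENDS[name]()
```

With this shape, swapping engines is a change to the configuration file rather than to the web application.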


Generic Search Service

• Preliminary analysis of what others have already done: the approaches taken and the issues met in the following areas:

a. What kinds of search engines?
b. How is indexing done and how is it kept up to date?
c. Configuration options? How can you specify which datastreams/disseminations to index?
d. What interfaces for doing searches?
e. How do you deal with security in terms of the service interacting with Fedora?
f. What are the problems with current approaches?
g. What would be desirable in a generic search service that would be delivered with Fedora?

• Gathering of requirements and issues for moving towards a reference implementation

- ZebraForFedora may serve as a reference implementation

• From a broader perspective, how to deal with search for federations of repositories

- P2P search in the EU project Alvis may be relevant

• Things that the Fedora Dev Team might need to do for new services in the Framework:

- a notification/messaging module in the core Fedora repository service, so that other services can find out when objects are added or changed

- how the services run securely with Fedora; a Basic Auth approach is used now
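The notification idea in the last bullet group can be sketched as a simple publish/subscribe arrangement: the repository announces "added" and "modified" events, and a search service updates its index when it hears them. The Python below is a toy model under that assumption. The class names, event names, and callback signature are hypothetical and do not describe an existing Fedora API.

```python
class Repository:
    """Toy repository that notifies listeners when objects change."""

    def __init__(self):
        self._objects = {}
        self._listeners = []

    def subscribe(self, listener):
        """Register a callable taking (event, pid, text)."""
        self._listeners.append(listener)

    def ingest(self, pid, text):
        """Store an object and announce whether it was added or modified."""
        event = "modified" if pid in self._objects else "added"
        self._objects[pid] = text
        for listener in self._listeners:
            listener(event, pid, text)


class SearchIndex:
    """Toy search service that stays current by listening to events."""

    def __init__(self):
        self._docs = {}

    def on_event(self, event, pid, text):
        # Both "added" and "modified" replace the stored words for the pid.
        self._docs[pid] = set(text.lower().split())

    def search(self, word):
        word = word.lower()
        return sorted(pid for pid, words in self._docs.items() if word in words)
```

The point of the arrangement is that the index never has to poll the repository or re-harvest everything to discover changes.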


Repository Federation

Idea under elaboration:

Fedora as Superpeer in an ALVIS peer-to-peer system


DEF-XWS

Thank you!


future

DEF-XWS Pilot Web Service-Oriented Architecture

Graphics from Web Services: A Manager's Guide, by Anne Thomas Manes, Addison-Wesley, 2003

[Figure: DEF-XWS Pilot. Labels in the diagram: Java Eprint WS; php Test UI; Java Test UI; SOAP (Simple Object Access Protocol) or REST (Representational State Transfer); WSDL (Web Services Description Language), http://host/fedora/ws/soap?wsdl; DEF-XWS Eprints.]
