A Web Services Approach for Search and Retrieve: The Next Generation Z39.50. Access 2004, October 13-16, 2004, Halifax, Nova Scotia. William E. Moen <[email protected]>, School of Library and Information Sciences, Texas Center for Digital Knowledge, University of North Texas, Denton, TX 72603


A Web Services Approach for Search and Retrieve The Next Generation Z39.50

Access 2004, October 13-16, 2004, Halifax, Nova Scotia

William E. Moen <[email protected]>

School of Library and Information Sciences
Texas Center for Digital Knowledge

University of North Texas
Denton, TX 72603


Moen Access 2004 -- October 13–16, 2004 -- Halifax, Nova Scotia 2

Overview

• Quick description of SRW
• Brief background: historical, political, conceptual
• Non-technical (almost) introduction to SRW
• Common Query Language (CQL), briefly
• Concluding thoughts


What is SRW?

• Search and Retrieve Web Service (SRW)
• An XML-based protocol for searching, retrieving, and other information retrieval transactions
• Cast in the standards/technologies for web services
  • XML
  • SOAP
  • HTTP
• Brings the concepts and experience of Z39.50 into the web environment using web technologies


Why SRW?

• Genesis: several years of soul searching by Z39.50 developers and implementors
• The “web” had become the common implementation environment
• Z39.50 was not perceived as web friendly
• Pivotal moments:
  • December 2000 ZIG meeting
  • July 2001 meeting


Turning point: December 2000

• “Z39.50 Future” discussion
• Perceptions of Z39.50
  • broken
  • heavy-weight
  • difficult and complex
  • old technology
  • not web friendly
• Several options presented
  • Rewrite the protocol from the ground up
  • Rewrite as an XML protocol
  • Separate the Z39.50 protocol from its use of BER as a wire protocol
  • Simplify the protocol specifications to focus on core features
• Recognition of the intellectual contribution of Z39.50


Taking action: June 2001

• Invitational meeting to discuss moving Z39.50 to an XML-based protocol
• Goal
  • Lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50, discarding those aspects no longer useful or meaningful.
• Objective
  • Define specifications for a new web service definition based on Z39.50 together with web technologies
  • Separate the Z39.50 abstract and associated semantic model from its specific encoding and wire protocol (i.e., ASN.1/BER and TCP/IP)
• Initially called Z39.50 Next Generation (ZNG)
  • Intended as proof-of-concept
  • Defining only those protocol specifications that would actually be implemented by participants


ZING – Z39.50 International Next Generation

• Make intellectual/semantic content of Z39.50 more broadly available
• Make Z39.50 more attractive by lowering barriers to implementation
  • Use of XML: to represent and encode data
  • Use of HTTP: for transport
  • Use of SOAP: for interaction between client and server
• Several ZING initiatives: ZOOM, ez39.50, ZeeRex, SRW/U

For more information, visit the ZING website: http://www.loc.gov/z3950/agency/zing/


SRW/U, SRW, SRU

• SRW/U: Search and Retrieve for the Web
  • General designation for this initiative
• SRW: Search and Retrieve Web Service
  • XML messages
  • Simple Object Access Protocol (SOAP)
  • HTTP POST
• SRU: Search and Retrieve URL Service
  • Request parameters included in URL syntax
  • HTTP GET
• Development
  • Version 1.0: November 2001
  • Version 1.1: February 2004

For more information, visit the SRW website: http://www.loc.gov/srw


Networked information retrieval

• What’s needed:
  • Identifying a target to search
  • A vocabulary for expressing search requests, search criteria, retrieval requests, etc.
  • Methods to encode the requests and responses from the target
  • Methods to transport the requests and responses across a network
• In other words, a protocol and supporting specifications


Abstract Model of IR


Abstract model of Z39.50


Z39.50 classic & SRW


SRW Overview

• Builds on Z39.50 concepts and web technologies
• Web technologies: XML, SOAP, HTTP
• Uses new, human-readable query language
• Combines several Z39.50 features into several “operation types”
  • searchRetrieve operation
  • scan operation
  • explain operation


searchRetrieve operation

• The core of the protocol
• Expresses the search and additional criteria
• Records are returned in XML
• Request parameters
  • version
  • query
  • Optional parameters
    • sortKeys
    • recordPacking
    • recordSchema
    • recordXPath
    • stylesheet
• Response parameters
  • version
  • numberOfRecords
  • Optional parameters
    • resultSetId
    • resultSetIdleTime
    • records
    • diagnostics


SRW & XML

• XML as foundation for protocol
• Provides syntax for intelligent markup
• Defines or references XML schemas
• Example XML schema for SRW specifications
  • searchRetrieveRequest
  • searchRetrieveResponse


searchRetrieveRequest example

• XML document is sent to the server
• Using SOAP to wrap the request
• Sent as an HTTP POST

<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title all "Squirrel Hungry"</query>
  <maximumRecords>1</maximumRecords>
  <startRecord>1</startRecord>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>
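The wrapping step can be sketched in Python with the standard library. This is a minimal illustration, not a reference client. The SOAP 1.1 envelope namespace is standard. The SRW namespace URI and the helper name `build_soap_request` are assumptions added here and should be checked against the published SRW schema.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SRW_NS = "http://www.loc.gov/zing/srw/"  # assumed SRW namespace; verify against the SRW schema


def build_soap_request(query, start_record=1, maximum_records=1, record_schema="dc"):
    """Return a SOAP envelope (as a string) wrapping a searchRetrieveRequest."""
    ET.register_namespace("SOAP", SOAP_NS)
    ET.register_namespace("srw", SRW_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # Same parameters as the slide example, in the same order.
    for name, value in (
        ("version", "1.1"),
        ("query", query),
        ("maximumRecords", maximum_records),
        ("startRecord", start_record),
        ("recordSchema", record_schema),
    ):
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


soap_message = build_soap_request('dc.title all "Squirrel Hungry"')
print(soap_message)
```

The resulting string would be sent as the body of an HTTP POST to the server's SRW endpoint. The endpoint address and any required HTTP headers depend on the server and are not shown.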


searchRetrieveResponse

• Records returned in response
• All records in XML syntax
• According to one or more XML schemas (semantics)
  • Dublin Core
  • Onix
  • MODS
  • MarcXML


searchRetrieveResponse example

<searchRetrieveResponse>
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
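A client would typically pull the hit count and the record fields out of such a response. The sketch below parses a response shaped like the slide example. The slide omits namespace declarations, so the namespace URIs and the `srw:` prefix in the sample are assumptions added to make the XML well-formed, and `summarize_response` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for this sketch; the slide example omits them.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SAMPLE_RESPONSE = """\
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/"
                            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>10</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>info:srw/schema/1/dc-v1.1</srw:recordSchema>
      <srw:recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""


def summarize_response(xml_text):
    """Return (total hit count, list of titles in the returned records)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    titles = [t.text for t in root.iterfind(".//srw:recordData//dc:title", NS)]
    return total, titles


total, titles = summarize_response(SAMPLE_RESPONSE)
print(total, titles)
```

Note that numberOfRecords reports the size of the whole result set (10 here), while the response carries only the records requested (one here).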


searchRetrieve example

• Retrieval results
• XML view
• Screen shot

<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title computer</query>
  <startRecord>1</startRecord>
  <maximumRecords>10</maximumRecords>
  <recordPacking>xml</recordPacking>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>


SRW results


SRU briefly

• Protocol requests can be carried via HTTP GET
• searchRetrieveRequest parameters expressed in standard URL syntax
• baseURL and search part separated by question mark “?”
• Response is XML document containing records
• The searchRetrieveRequest in SRU:

  http://alcme.oclc.org/srw/search/SOAR?operation=searchRetrieve&version=1.1&query=dc.title=%22computer%22&recordSchema=DC&startRecord=1&maximumRecords=10&recordPacking=xml

• Eric Lease Morgan’s Journal Locator
• Use of “extra data parameters” allows implementers to add additional functionality
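Building an SRU request is ordinary URL construction, as the following Python sketch suggests. The base URL is the one shown on the slide and may no longer resolve, and `build_sru_url` is a hypothetical helper. The sketch percent-encodes the "=" inside the query value, whereas the slide leaves it literal. Both forms should decode to the same query string.

```python
from urllib.parse import urlencode, urlsplit, parse_qs, quote

BASE_URL = "http://alcme.oclc.org/srw/search/SOAR"  # from the slide; may no longer resolve


def build_sru_url(base_url, query, start_record=1, maximum_records=10,
                  record_schema="DC", record_packing="xml"):
    """Join the base URL and the search part with '?', as SRU describes."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "recordSchema": record_schema,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordPacking": record_packing,
    }
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_sru_url(BASE_URL, 'dc.title="computer"')
print(url)

# Round trip: decode the search part back into parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```

Fetching the URL with any HTTP client would return the searchRetrieveResponse as an XML document, with no SOAP envelope involved.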


search/Retrieve query

SRW query consists of one or more query statements linked by Boolean operators

Five categories of query statements:
1. single search clause
2. two or more search clauses linked by Boolean
3. search clauses and result sets linked by Boolean
4. two or more result sets linked by Boolean
5. single result set

Expressed in the Common Query Language (CQL)


Common Query Language (CQL)

• A formal language for representing queries to information retrieval systems
  • Simple free text
  • Complex Boolean, proximity
• Human-readable
• Search clause
  • Always includes a term
    • simple terms consist of one or more words
  • May include index name
    • To limit search to a particular field/element
    • Index name includes base name and may include prefix
      • title, subject
      • dc.title, dc.subject
• Several index sets have been defined
  • dc
  • bath
  • cql
• Context sets in SRW define the available indexes for a particular application and additional query specifications (e.g., relation operators)
• Legend of the Five Rings Database


Other components of CQL

• Relation
  • <, >, <=, >=, =, <>
  • exact: used for string matching
  • all: when the term is a list of words, indicates that all words must be found
  • any: when the term is a list of words, indicates that any of the words must be found
• Boolean operators: and, or, not
• Proximity (prox operator)
  • relation (<, >, <=, >=, =, <>)
  • distance (integer)
  • unit (word, sentence, paragraph, element)
  • ordering (ordered or unordered)
• Masking rules and special characters
  • single asterisk (*) to mask zero or more characters
  • single question mark (?) to mask a single character
  • caret/hat (^) to indicate anchoring, left or right
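One way to picture the masking rules is to translate a masked term into a regular expression. The Python sketch below is an illustration under simplifying assumptions, not how any particular SRW server evaluates terms: masks are limited to word characters, matching is case-insensitive, an unanchored term must match on word boundaries, and `cql_mask_to_regex` is a hypothetical helper.

```python
import re


def cql_mask_to_regex(term):
    """Compile a CQL-style masked term into a regex (simplified illustration)."""
    anchored_left = term.startswith("^")
    anchored_right = len(term) > 1 and term.endswith("^")
    core = term[1:] if anchored_left else term
    core = core[:-1] if anchored_right else core

    parts = []
    for ch in core:
        if ch == "*":
            parts.append(r"\w*")   # zero or more characters (word characters assumed)
        elif ch == "?":
            parts.append(r"\w")    # exactly one character
        else:
            parts.append(re.escape(ch))

    # ^ anchors to the start or end of the field; otherwise match on a word boundary.
    prefix = "^" if anchored_left else r"\b"
    suffix = "$" if anchored_right else r"\b"
    return re.compile(prefix + "".join(parts) + suffix, re.IGNORECASE)


title = "The Complete Dinosaur"
print(bool(cql_mask_to_regex("dino*").search(title)))      # mask at end of a word
print(bool(cql_mask_to_regex("^the").search(title)))       # anchored to the left
print(bool(cql_mask_to_regex("complete^").search(title)))  # anchored right: not the last word
```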


CQL examples

• Simple queries:
  • dinosaur
  • "the complete dinosaur"
• Boolean
  • dinosaur and bird or dinobird
  • "feathered dinosaur" and (yixian or jehol)
• Proximity
  • foo prox bar
  • foo prox/>/4/word/ordered bar
• Indexes
  • title = dinosaur
  • bath.title="the complete dinosaur"
  • srw.serverChoice=dinosaur
• Relations
  • year > 1998
  • title all "complete dinosaur"
  • title any "dinosaur bird reptile"
  • title exact "the complete dinosaur"
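The difference between the all, any, and exact relations in the examples above can be shown with a toy matcher in Python. This is only a sketch of the semantics described on the slides: it assumes whitespace tokenization and case-insensitive comparison, neither of which CQL mandates, and `matches` is a hypothetical helper rather than part of any SRW toolkit.

```python
def matches(field_value, relation, term):
    """Evaluate a single CQL-style relation against one field value (toy semantics)."""
    field_words = field_value.lower().split()
    term_words = term.lower().split()
    if relation == "all":
        # Every word of the term must appear in the field.
        return all(word in field_words for word in term_words)
    if relation == "any":
        # At least one word of the term must appear in the field.
        return any(word in field_words for word in term_words)
    if relation == "exact":
        # The whole field must equal the term as a string.
        return field_value.lower() == term.lower()
    raise ValueError(f"unsupported relation: {relation}")


title = "The Complete Dinosaur"
print(matches(title, "all", "complete dinosaur"))
print(matches(title, "any", "dinosaur bird reptile"))
print(matches(title, "exact", "the complete dinosaur"))
print(matches(title, "exact", "complete dinosaur"))
```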


SRW & classic Z39.50

SRW
• No explicit concept of connection, session, or state
• Result sets named by server
• Single record syntax (XML), multiple schemas
• String (i.e., human-readable) queries: CQL
• Named indexes

Classic Z39.50
• Stateful
• Result sets named by client
• Multiple record syntaxes
• No human-readable query language: Type 1 query using attribute sets
• Use attribute to identify access point

Z39.50 Concepts Retained
• Result sets
• Abstract access points
• Abstract record schemas
• Explain
• Diagnostics


What problems does SRW solve?

• Addresses need for standards-based searching in the networked environment
• Shows the vitality of the Z39.50 concepts and implements those in a web services & URL access context
• Offers database providers a web-friendly method for offering standards-based searching of resources
• Provides low barrier to entry solution using commonly available technologies
• XML format of records provides for more reuse, and more interesting use, of resources


Possible implementation venues

• Gateways to existing Z39.50 servers
• Lightweight SRW/U servers to specialized databases
• Standard search interface for OAI service providers and institutional repositories
• Cost-effective search access to commercial databases (e.g., citation, full-text)
• Metasearching
• Beyond libraries to many other information communities


References

• Z39.50 International Next Generation (ZING): http://www.loc.gov/z3950/agency/zing/
• Search and Retrieve for the Web (SRW/U): http://www.loc.gov/srw
• A Gentle Introduction to SRW: http://www.loc.gov/z3950/agency/zing/srw/introduction.html
• A Gentle Introduction to CQL: http://zing.z3950.org/cql/intro.html
• An Introduction to the Search/Retrieve URL Service (SRU), by Eric Lease Morgan, in Ariadne (July 2004): http://www.ariadne.ac.uk/issue40/morgan/
• Search and Retrieval in The European Library: A New Approach, by van Veen and Oldroyd, in D-Lib (February 2004): http://www.dlib.org/dlib/february04/vanveen/02vanveen.html