SRW/U for DSpace Ralph LeVan Research Scientist



What is SRW/U

• A Pair of HTTP-based Text Query Protocols

– SRW: Search and Retrieve Web Service

– SRU: Search and Retrieve URL Service

• An alternative to Z39.50

The Weaknesses of Classic Z39.50

• Not popular with the Web community

– Connection-based Sessions

– Binary Encoding

– Transmitted directly over TCP/IP

• Complicated

The Strengths of Classic Z39.50

• Result Sets (a.k.a. Statefulness)

• Abstraction

– Abstract Access Points (Attribute Sets)

– Abstract Record Schemas

• Explain

SRW: Search and Retrieve Web Service

• SOAP (Simple Object Access Protocol) Based

– HTTP

– XML

• Records Described in WSDL (Web Services Description Language)

• 3 Services: SearchRetrieve, Scan and Explain

SRW: The Basics

• Only one database per request

• String (not structure) based queries

• Index Sets, not Attribute Sets

• One Record Syntax (XML)

The Explain Request

• An empty request

– E.g. http://alcme.oclc.org/srw/search/SOAR

The Explain Response

• A description of the database

• A list of the supported indexes

• A list of the supported record schemas

The SearchRetrieve Request

• String CQL query

• Integer startRecord

• Integer maximumRecords

• String recordSchema

http://alcme.oclc.org/srw/search/SOAR?query=dog
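The request parameters above map onto URL query parameters in SRU. The following is a minimal sketch of assembling such a URL, assuming the lower-camel-case parameter names (query, startRecord, maximumRecords, recordSchema). SruUrlBuilder is a hypothetical helper written for illustration, not part of the SRW distribution. Some servers may also expect operation and version parameters, which are omitted here to match the example URL above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper (illustration only) that assembles an SRU searchRetrieve URL. */
final class SruUrlBuilder {
    private SruUrlBuilder() {}

    /**
     * @param baseUrl        database URL, e.g. http://alcme.oclc.org/srw/search/SOAR
     * @param cqlQuery       CQL query string (required)
     * @param startRecord    1-based position of the first record, or 0 to omit
     * @param maximumRecords maximum number of records to return, or 0 to omit
     * @param recordSchema   record schema name, or null to omit
     */
    static String searchRetrieve(String baseUrl, String cqlQuery,
                                 int startRecord, int maximumRecords, String recordSchema) {
        StringBuilder url = new StringBuilder(baseUrl);
        url.append("?query=").append(encode(cqlQuery));
        if (startRecord > 0) url.append("&startRecord=").append(startRecord);
        if (maximumRecords > 0) url.append("&maximumRecords=").append(maximumRecords);
        if (recordSchema != null) url.append("&recordSchema=").append(encode(recordSchema));
        return url.toString();
    }

    /** Form-encodes a value; spaces become '+', quotes and '=' become %XX escapes. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Encoding the query matters as soon as it contains quotes, spaces or '=', as most CQL queries with an index do.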

The SearchRetrieve Response

• ResultSetReference

– String resultSetName

– Integer resultSetTimeToLive

• Integer numberOfRecords

• Records

• Diagnostics

CQL: Common Query Language

• Loosely based on CCL Search

• Boolean & Proximity Operators

• Index Sets & Indexes

• Truncation Characters ‘*’, ‘#’ & ‘?’

• Example:

dc.title="harry potter" or bib1.isbn=123-456-78x
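Queries like the example can be assembled programmatically. The following is a rough sketch, assuming that a term needs double quotes when it contains whitespace or one of the characters " ( ) = < > / and that an embedded double quote is escaped with a backslash. CqlClauses is a hypothetical helper, not part of any SRW library. It does not handle terms that happen to be reserved words such as and, or and not.

```java
/** Hypothetical helper (illustration only) for assembling simple CQL clauses. */
final class CqlClauses {
    private CqlClauses() {}

    /** Builds index=term, quoting the term when necessary. */
    static String clause(String index, String term) {
        return index + "=" + quoteIfNeeded(term);
    }

    /** Joins two clauses with the boolean operator "or". */
    static String or(String left, String right) {
        return left + " or " + right;
    }

    /** Joins two clauses with the boolean operator "and". */
    static String and(String left, String right) {
        return left + " and " + right;
    }

    /** Wraps the term in double quotes if it is empty or contains special characters. */
    static String quoteIfNeeded(String term) {
        boolean needsQuotes = term.isEmpty();
        for (int i = 0; i < term.length() && !needsQuotes; i++) {
            char c = term.charAt(i);
            needsQuotes = Character.isWhitespace(c) || "\"()=<>/".indexOf(c) >= 0;
        }
        if (!needsQuotes) {
            return term;
        }
        // Only embedded double quotes are escaped; wildcards are left as the caller wrote them.
        return "\"" + term.replace("\"", "\\\"") + "\"";
    }
}
```

For instance, or(clause("dc.title", "harry potter"), clause("bib1.isbn", "123-456-78x")) rebuilds the example query on this slide.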

The Scan Request

• String CQL scanClause

• Integer maximumTerms

• Integer responsePosition

http://alcme.oclc.org/srw/search/SOAR?operation=scan&scanClause=dog&maximumTerms=3&responsePosition=3

The Scan Response

• Terms

– A term for searching

– Possibly a term for displaying

– The number of records retrieved by the term

• Diagnostics
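A scan response can be read with the standard Java DOM classes. The following sketch assumes each term element carries value, numberOfRecords and, optionally, displayTerm children. Those element names are an assumption to check against the server's actual output, and ScanTerms is a hypothetical class written for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader (illustration only) for the terms in a scan response. */
final class ScanTerms {
    /** One scanned term: what to search, what to display, and how many records it retrieves. */
    static final class Term {
        final String value;
        final String displayTerm; // null when the server sent no display form
        final int numberOfRecords;

        Term(String value, String displayTerm, int numberOfRecords) {
            this.value = value;
            this.displayTerm = displayTerm;
            this.numberOfRecords = numberOfRecords;
        }
    }

    static List<Term> parse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // lets us match elements regardless of prefix
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("response is not well-formed XML", e);
        }
        NodeList nodes = doc.getElementsByTagNameNS("*", "term");
        List<Term> terms = new ArrayList<Term>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element term = (Element) nodes.item(i);
            String count = childText(term, "numberOfRecords");
            terms.add(new Term(childText(term, "value"), childText(term, "displayTerm"),
                    count == null ? 0 : Integer.parseInt(count.trim())));
        }
        return terms;
    }

    /** Returns the text of the first descendant with the given local name, or null. */
    private static String childText(Element parent, String localName) {
        NodeList matches = parent.getElementsByTagNameNS("*", localName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }
}
```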

Using SRU

• Send the URL and get the response

    // needs java.io.BufferedReader, java.io.InputStreamReader and java.net.URL
    BufferedReader in = new BufferedReader(
        new InputStreamReader(
            new URL("http://alcme.oclc.org/srw/SOAR?query=dog").openStream()));
    String inputLine = null, response;
    StringBuffer content = new StringBuffer();
    // read the response line by line (line breaks are dropped)
    while ((inputLine = in.readLine()) != null)
        content.append(inputLine);
    in.close();
    response = content.toString();

Using SRU

• Parse the response using String methods

    // 17 is the length of "<numberOfRecords>"
    int i = response.indexOf("<numberOfRecords>"),
        j = response.indexOf("</numberOfRecords>"),
        count = Integer.parseInt(response.substring(i + 17, j));

Using SRU

• Parse the response using DOM classes

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(new StringReader(record)));
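Once the document is parsed, values can be looked up by element name instead of by string offsets. The following sketch reads numberOfRecords that way. It assumes a namespace-aware parser, so that a prefixed element such as srw:numberOfRecords is matched too. SruResponseReader is a hypothetical class, not part of the SRW distribution.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader (illustration only) that pulls the hit count out of a response. */
final class SruResponseReader {
    private SruResponseReader() {}

    static int numberOfRecords(String responseXml) {
        Document document;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match by local name, whatever the prefix
            DocumentBuilder builder = factory.newDocumentBuilder();
            document = builder.parse(new InputSource(new StringReader(responseXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("response is not well-formed XML", e);
        }
        // "*" matches any namespace; the first match is taken as the overall hit count
        NodeList counts = document.getElementsByTagNameNS("*", "numberOfRecords");
        if (counts.getLength() == 0) {
            throw new IllegalArgumentException("no numberOfRecords element in response");
        }
        return Integer.parseInt(counts.item(0).getTextContent().trim());
    }
}
```

Unlike the indexOf approach, this still works when the server writes the element with a namespace prefix.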

Using SRW

• Get WSDL from server or LOC

http://alcme.oclc.org/srw/search/SOAR?wsdl

or

http://www.loc.gov/z3950/agency/zing/srw/srw-sample-service.wsdl

Using SRW

• Convert WSDL to code

java org.apache.axis.wsdl.WSDL2Java --server-side --skeletonDeploy true srw-sample-service.wsdl

Using SRW

• Write Client

    // the SRW* and SearchRetrieve* classes are the ones generated by WSDL2Java
    SRWSampleServiceLocator service = new SRWSampleServiceLocator();
    URL url = new URL("http://alcme.oclc.org/srw/search/SOAR");
    SRWPort port = service.getSRW(url);
    SearchRetrieveRequestType request = new SearchRetrieveRequestType();
    request.setQuery("dog");
    SearchRetrieveResponseType response = port.searchRetrieveOperation(request);
    int postings = response.getNumberOfRecords();

DSpace Implementation

• Reads list of Lucene indexes from SRWDatabase.props

• Converts CQL queries to Lucene queries

• Gets Dublin Core record from database
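To give a feel for the CQL-to-Lucene step, here is a deliberately small sketch that maps the index name of a single index=term clause onto a Lucene field name. It is not the actual DSpace code. The Properties key format (the CQL index name as the key, the Lucene field as the value) is a hypothetical stand-in for whatever SRWDatabase.props really contains, and boolean and proximity operators are not handled.

```java
import java.util.Properties;

/** Hypothetical translator (illustration only) for one CQL index=term clause. */
final class CqlToLucene {
    private CqlToLucene() {}

    /**
     * @param cqlClause a single clause such as dc.title="harry potter", or a bare term
     * @param indexMap  assumed mapping from CQL index name to Lucene field name
     * @return the clause in Lucene's field:term query syntax
     */
    static String translateClause(String cqlClause, Properties indexMap) {
        int eq = cqlClause.indexOf('=');
        if (eq < 0) {
            // No index given: pass the bare term through to Lucene's default field.
            return cqlClause.trim();
        }
        String index = cqlClause.substring(0, eq).trim();
        String term = cqlClause.substring(eq + 1).trim();
        String field = indexMap.getProperty(index);
        if (field == null) {
            throw new IllegalArgumentException("unsupported index: " + index);
        }
        // A quoted CQL phrase is also a valid quoted phrase in Lucene query syntax.
        return field + ":" + term;
    }
}
```

With dc.title mapped to title, the clause dc.title="harry potter" becomes title:"harry potter".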

Installation

• Get the SRW.war file from http://www.oclc.org/research/software/srw

• Start Tomcat (to unpack the .war file)

• Edit the SRWServer.props configuration file

• Copy the SRWDatabase.props file to your DSpace/config directory

• Restart Tomcat

• Browse to http://yourserver/SRW/search/DSpace

SRWServer.props

# parameters for the SRW Servlet

SRW.Home=d:/Apache Tomcat 4.1/webapps/SRW/

default.database=DSpace

resultSetIdleTime=300

db.DSpace.class=ORG.oclc.os.SRW.SRWLuceneDatabase

db.DSpace.home=d:/dspace/dspace-1.1/

db.DSpace.configuration=config/SRWDatabase.props
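The file above is an ordinary Java properties file, so its keys can be read with java.util.Properties. The following sketch looks up the settings for a named database. SrwServerConfig is a hypothetical class, not the servlet's own code, and resolving the configuration path relative to the database home is an assumption about how the two values combine.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical reader (illustration only) for the keys shown in SRWServer.props. */
final class SrwServerConfig {
    private final Properties props = new Properties();

    /** @param propsText the contents of SRWServer.props */
    SrwServerConfig(String propsText) {
        try {
            props.load(new StringReader(propsText));
        } catch (IOException e) {
            throw new IllegalArgumentException("could not read properties", e);
        }
    }

    String defaultDatabase() {
        return props.getProperty("default.database");
    }

    /** Name of the class that implements the given database. */
    String databaseClass(String database) {
        return props.getProperty("db." + database + ".class");
    }

    /** Assumption: the configuration file is resolved relative to the database home. */
    String configurationPath(String database) {
        return props.getProperty("db." + database + ".home")
                + props.getProperty("db." + database + ".configuration");
    }

    /** Idle time for result sets, or 0 when the key is absent. */
    int resultSetIdleTime() {
        return Integer.parseInt(props.getProperty("resultSetIdleTime", "0").trim());
    }
}
```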

Examples

• http://alcme.oclc.org/srw/search/GSAFD?

• http://alcme.oclc.org/srw/search/SOAR?

• http://alcme.oclc.org/srw/search/NDL?

Links

• http://www.loc.gov/srw

• http://www.loc.gov/z3950/srutest.html

• http://www.oclc.org/research/software/srw

• http://staff.oclc.org/~levan/docs/SRWforDSpace.ppt

Questions & Answers