

A centre of expertise in digital information management www.ukoln.ac.uk

UKOLN is supported by:

Is Metasearching Really Better Searching?

STM Innovations Seminar, London, Friday 2 December 2005

Pete Johnston

Research Officer, UKOLN, University of Bath

www.bath.ac.uk


Is Metasearching Better Searching?

• What is metasearch?
• Making metasearch work
  – The NISO Metasearch Initiative
• Metasearch today
  – Metasearch and Google
  – Metasearch and "social bookmarking"


What is metasearch?


What is metasearch?

“Metasearch, parallel search, federated search, broadcast search, cross-database search, search portal are a familiar part of the information community's vocabulary. They speak to the need for search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at one time.”

NISO MetaSearch Initiative, http://www.niso.org/committees/MS_initiative.html


The search problem

• User wants to find, access, and use items made available by multiple content providers

• Content providers make their collections available through their own separate “presentation services”

• User interacts with multiple services in succession, e.g.
  – Query Resource Discovery Network (RDN) for Web resources
  – Query Zetoc for journal articles
  – etc.


The search problem

[Diagram: multiple separate Web Sites]


The search problem

• User has to
  – Discover different services
  – Manage different authentication/access requirements
  – Use different user interfaces for search
  – Interpret different result sets
    • different metadata
  – Manipulate different result sets
    • human-readable (HTML)
    • but difficult to merge, reuse
• May still not have access to (appropriate copy of) resource


The metasearch solution

• The provision of "metasearch" services that
  – enable user to search across the metadata databases of multiple content providers from a single interface
  – manage multiple result sets and present to user
  – manage authentication/access
  – (etc!)
• Seamless (to the user) discovery of and access to heterogeneous, distributed resources!


Approaches to metasearch (1): cross-searching

• Metasearch service accepts user query
• Sends query to multiple content provider search targets
• Receives responses from targets
• Presents result sets to user
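The four cross-searching steps can be sketched as a parallel fan-out. This is a minimal sketch, not any particular product's design: search targets are modelled as hypothetical Python functions standing in for real Z39.50 or SRU clients, and the names `cross_search` and `SearchTarget` and the five-second timeout are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

# Hypothetical model of a search target: a function taking a query string and
# returning a list of record titles. A real target would wrap a Z39.50 or SRU client.
SearchTarget = Callable[[str], List[str]]


def cross_search(query: str, targets: Dict[str, SearchTarget],
                 timeout: float = 5.0) -> Dict[str, List[str]]:
    """Send one query to every target in parallel and return the result
    sets that arrive within `timeout` seconds, keyed by target name."""
    results: Dict[str, List[str]] = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(targets)))
    futures = {pool.submit(search, query): name for name, search in targets.items()}
    done, _pending = wait(futures, timeout=timeout)
    for future in done:
        name = futures[future]
        try:
            results[name] = future.result()
        except Exception:
            # A failing target yields an empty result set instead of
            # aborting the whole metasearch.
            results[name] = []
    # Return without waiting for slow targets; their threads finish in the
    # background and late results are discarded (cancel_futures needs 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)
    return dict(sorted(results.items()))


if __name__ == "__main__":
    print(cross_search("metasearch", {
        "journals": lambda q: [f"Article on {q}"],
        "web": lambda q: [f"Web resource on {q}"],
    }))
```

The per-target timeout is one way to limit the performance cost of cross-searching: a slow target's results are simply left out of what is presented to the user.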


Metasearch: Cross-search

[Diagram: a metasearch Web Site sends queries to multiple Search Targets using Z39.50, SRW, SRU, etc]


Approaches to metasearch (2): harvesting

• Metasearch service periodically gathers metadata records from content provider repositories into local database
• Metasearch service accepts user query
• Executes query on local database
• Presents result sets to user
• Some harvesting services may also harvest/index copy of resource
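The harvesting model separates a periodic gather step from the query step. A minimal sketch, assuming records arrive as hypothetical (identifier, title) pairs and that a simple in-memory word index is an acceptable stand-in for the local database; `LocalIndex` is an illustrative name, not part of any standard.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (identifier, title) pairs gathered from repositories.
Record = Tuple[str, str]


class LocalIndex:
    """A minimal local database for a harvesting metasearch service: records
    are gathered ahead of time, and user queries run only against this copy."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def harvest(self, batch: Iterable[Record]) -> None:
        """Periodic gather step: add new records or replace changed ones."""
        for identifier, title in batch:
            old_title = self.records.get(identifier)
            if old_title is not None:
                # Drop postings for the superseded version of the record.
                for word in old_title.lower().split():
                    self.postings[word].discard(identifier)
            self.records[identifier] = title
            for word in title.lower().split():
                self.postings[word].add(identifier)

    def search(self, query: str) -> List[str]:
        """Query step: return identifiers of records containing every query word."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```

Because queries never leave the local index, they are fast, but a record changed at the repository is only seen after the next `harvest` call.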


Metasearch: Harvester

[Diagram: a metasearch Web Site harvests metadata from multiple Repositories using OAI-PMH]
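OAI-PMH is the harvesting protocol named in the diagram. The sketch below assumes the `oai_dc` (simple Dublin Core) metadata format and a hypothetical repository base URL; it builds `ListRecords` requests and reads identifiers, titles and the `resumptionToken` that signals whether more records remain. Fetching over HTTP and handling OAI-PMH error responses are left out, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request. When resuming, the
    resumptionToken is the only argument sent alongside the verb."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return (identifier, dc:title) pairs plus the resumptionToken,
    or None for the token when the list is complete."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier", default="")
        title = rec.findtext(f".//{DC}title", default="")  # "" for deleted records
        records.append((identifier, title))
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token
```

A harvester would call `list_records_url`, fetch and parse each page, and repeat with the returned token until it comes back as `None`.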


Cross-searching & harvesting

• Metasearch service may use both in combination!

• Cross-search
  – Latest results returned
  – Content provider controls searches available
  – May slow overall performance
• Harvesting
  – Better performance for user query
  – Options for normalisation etc by harvester
  – Only as up-to-date as last harvest
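A service that uses both approaches has to merge the two result sets. A minimal sketch, assuming both sources return records as hypothetical dictionaries sharing an `id` key (in practice providers may not share identifiers, which makes de-duplication much harder); it prefers the live cross-search copy because the harvested one is only as up-to-date as the last harvest.

```python
from typing import Dict, List

# Hypothetical record shape, e.g. {"id": "oai:repo:1", "title": "..."}.
Record = Dict[str, str]


def merge_results(harvested: List[Record], live: List[Record]) -> List[Record]:
    """Combine local (harvested) results with live cross-search results.
    Where both contain the same identifier, the live copy replaces the
    harvested one but keeps the harvested record's position in the list."""
    merged: Dict[str, Record] = {}
    for record in harvested:
        merged[record["id"]] = record
    for record in live:
        merged[record["id"]] = record  # fresher copy overrides the stale one
    return list(merged.values())
```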


A hospitable climate for metasearch?

• Metasearch service depends on access to metadata
• Web Services
  – Standards for providing machine interfaces to applications on Web
  – Based on HTTP and XML
  – SOAP (messaging protocol), WSDL (service description), WS-* (!!)
  – WS not just for search!
  – Service-oriented approaches, modular applications
  – Google and Amazon provide Web Services
• "Web 2.0"
  – "The Web as platform"
  – Recombining data and services from multiple sources


The problems with metasearch

• User requires/expects resources from increasing range of content providers

• What if content provider doesn't implement standard search/harvest interface?

• Some proprietary APIs, "XML Gateways"
  – Scalability
• Some "screen-scraping"
  – Parsing of HTML pages to obtain metadata
  – Rights issues
  – Scalability, volatility
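Screen-scraping means parsing presentation HTML for metadata. A sketch using Python's standard `html.parser`, assuming (purely as an illustration) that the provider embeds metadata in `<meta name=... content=...>` elements such as `DC.title`. Many result pages carry no such elements and have to be parsed by their layout instead, and any layout change can break a scraper, which is the volatility noted above. `MetaTagScraper` and `scrape_metadata` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagScraper(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


def scrape_metadata(html: str) -> Dict[str, str]:
    parser = MetaTagScraper()
    parser.feed(html)
    parser.close()
    return parser.meta
```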


The problems with metasearch

• Metasearch services work, but…
• For service provider
  – complex, laborious
  – fragile, susceptible to change by content provider
  – duplication of effort by service providers
• For content provider
  – concerns over efficiency
  – concerns over access management
  – rights, branding, results presentation/ranking


Making metasearch work


Making metasearch work

• Effective metasearch requires agreements between content providers and service providers
  – Transport protocol(s)
  – Query language(s)
    • syntax and semantics
  – Metadata schemas
    • syntax and semantics
  – Metadata quality
    • presence of values, formats of literals etc
  – Intellectual property rights issues
    • how metadata records and resources are presented, used
  – Authorisation / authentication
  – Disclosure / discovery of collections and services

Andy Powell, "Metasearching: an overview", Presentation to BCS EPSG Seminar, July 2004
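Agreement on metadata schemas and quality is what lets a service map each provider's records onto one shared form. A minimal crosswalk sketch: the provider names, their field codes and the rule of reducing dates to a four-digit year are all hypothetical, chosen only to show schema mapping and normalisation of literals side by side.

```python
import re
from typing import Dict

# Hypothetical per-provider mappings from local field names to a shared schema.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "providerA": {"ti": "title", "au": "creator", "py": "date"},
    "providerB": {"Title": "title", "Author": "creator", "PubDate": "date"},
}


def to_common(provider: str, record: Dict[str, str]) -> Dict[str, str]:
    """Map one provider's record onto the shared schema. Unmapped fields are
    dropped; dates are normalised to a four-digit year, and a date with no
    recognisable year is treated as absent."""
    mapping = CROSSWALKS[provider]
    common = {mapping[k]: v.strip() for k, v in record.items() if k in mapping}
    if "date" in common:
        match = re.search(r"\b(\d{4})\b", common["date"])
        if match:
            common["date"] = match.group(1)
        else:
            del common["date"]
    return common
```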


The NISO Metasearch Initiative

• Response to concerns of librarians, systems vendors, content providers

• Aims to enable
  – metasearch service providers to offer more effective and responsive services
  – content providers to deliver enhanced content and protect their intellectual property
  – libraries to deliver services that distinguish their services from Google and other free web services

NISO MetaSearch Initiative, http://www.niso.org/committees/MS_initiative.html


Task Group 1: Access Management

• Conducted survey of authentication methods in use
• Developed use cases for authentication in metasearch context
• Ranked methods by ability to satisfy needs of use cases
• Recommends either:
  – IP-Authentication with a Proxy Server, or
  – Username/Password authentication
• Liaison with Shibboleth community


Task Group 2: Collection Description

• Metasearch service needs information about targets available for search/harvest
  – Discover collections of potential interest
  – Obtain sufficient information to identify a collection
  – Select one or more collections from amongst a number of discovered collections
  – Discover the services that provide access to the collection
  – Select a service with which to interact
  – Interact with service

[Slide annotations grouping the steps above: Collection description; Service description]
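The discovery and selection steps can be read as a lookup over collection and service descriptions. A sketch under stated assumptions: `CollectionDescription` and `ServiceDescription` below are hypothetical simplifications, not the NISO or ZeeRex schemas, and a service counts as "usable" when the client speaks its protocol and can meet its authentication requirement.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ServiceDescription:
    # Hypothetical subset of what a service description records.
    protocol: str          # e.g. "SRU" or "Z39.50"
    access_point: str      # where the service is reached
    requires_auth: bool    # whether the client must authenticate


@dataclass
class CollectionDescription:
    # Hypothetical subset of a collection-level description.
    title: str
    subjects: List[str]
    services: List[ServiceDescription] = field(default_factory=list)


def select_targets(registry: List[CollectionDescription], subject: str,
                   supported_protocols: Set[str], can_authenticate: bool
                   ) -> List[Tuple[str, str]]:
    """Return (collection title, access point) for each collection about
    `subject` that offers at least one service this client can use."""
    targets: List[Tuple[str, str]] = []
    for collection in registry:                    # discover collections
        if subject not in collection.subjects:     # select those of interest
            continue
        for service in collection.services:        # discover services
            usable = (service.protocol in supported_protocols
                      and (can_authenticate or not service.requires_auth))
            if usable:                             # select a service to interact with
                targets.append((collection.title, service.access_point))
                break
    return targets
```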


[Diagram: Metasearch 1 and Metasearch 2, each with its own Collection/Service Knowledge Base (1 and 2), and a Shared Collection/Service Registry]


Task Group 2: Collection Description

• Collection Description Specification
  – Metadata schema for collection-level description
  – Closely aligned with DCMI Collection Description Application Profile
  – Title, Subject, Size, Language, Item Type, Owner, Collector, Audience, Rights etc
  – Whole/Part relationships
  – Collection/Catalogue relationships
  – Collection/Service relationships


Task Group 2: Collection Description

• Information Retrieval Service Description Specification
  – Describe those digital services that provide access to collections
  – ZeeRex
    • Indicates protocol used
    • Describes access point(s) for service
    • Describes authentication/authorization requirements
    • Lists operations/queries supported


Task Group 3: Search/Retrieve

• Result Set Metadata
  – Metadata schema to describe result set and record within result set
  – To support ranking, branding etc
• Citation Metadata
  – Metadata schema for citation components (based on subset of OpenURL)
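Since the citation metadata is based on a subset of OpenURL, a citation can be carried as an OpenURL 1.0 (Z39.88-2004) key/encoded-value query string. This sketch uses the standard KEV journal format directly rather than the NISO citation schema itself; the resolver base URL is a placeholder and `openurl_for_article` is an illustrative name.

```python
from typing import Dict
from urllib.parse import urlencode

# Journal-article citation components accepted from the caller.
ARTICLE_KEYS = ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date")


def openurl_for_article(resolver_base: str, citation: Dict[str, str]) -> str:
    """Encode journal-article citation components as an OpenURL 1.0
    (Z39.88-2004) key/encoded-value (KEV) query string."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for key in ARTICLE_KEYS:
        if citation.get(key):  # skip missing or empty components
            params["rft." + key] = citation[key]
    return resolver_base + "?" + urlencode(params)
```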


Task Group 3: Search/Retrieve

• NISO XML Gateway
  – Based on SRU ("non-conformant subset")
  – Query encoded in URI, transmitted in HTTP GET, response as XML document
  – Three levels of implementation
    • Level 0: Any query grammar
    • Level 1: Provide description record for database
    • Level 3: Support CQL
  – Liaison with A9 OpenSearch
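The XML Gateway pattern (query encoded in a URI, sent by HTTP GET, XML document returned) looks like this for a plain SRU 1.1 `searchRetrieve` request carrying a CQL query. Because the gateway is described as a non-conformant subset of SRU, its exact parameters may differ; the base URL, the `dc` record schema default and the function name are assumptions.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, max_records: int = 10,
                   record_schema: str = "dc") -> str:
    """Build an SRU 1.1 searchRetrieve request. The whole query travels in
    the URI, so the request can be sent with an ordinary HTTP GET."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(sru_search_url("http://sru.example.org/db", 'dc.title = "metasearch"'))
```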


Metasearch today


Metasearch and Google

• Google
  – Harvests full-text of Web pages by following links
  – Makes indexes available for search
  – Result ranking based on number of links to page
• Index coverage limited to "visible Web"
  – Problems with
    • Authentication controls
    • Non-persistent URIs
    • Non-textual resources
• Even if indexed, low ranking if few links
• No fielded searching
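Link-based ranking explains why a resource that is indexed but rarely linked can still be hard to find. A deliberately crude sketch that orders matching pages by inbound link count; this is not Google's algorithm (PageRank, for instance, also weights each link by the rank of the linking page), and the link graph and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rank_by_inbound_links(links: Dict[str, List[str]], matches: List[str]) -> List[str]:
    """Order query matches by the number of distinct other pages linking to them.

    `links` maps each page to the pages it links to (a hypothetical link graph).
    Ties are broken alphabetically so the output order is deterministic.
    """
    inbound = Counter(
        target
        for source, targets in links.items()
        for target in set(targets)
        if target != source  # ignore self-links
    )
    return sorted(matches, key=lambda page: (-inbound[page], page))
```

A page with no inbound links, however relevant, sorts to the bottom of the list.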


Metasearch and Google

• "Success is as much about what you don’t search as what you do"
• Selection is important
• Relevance of results not determined only by links, citations
• e.g. often useful/vital to select/filter by audience, purpose of resource

Roy Tennant, "Is Metasearch Dead?", http://www.niso.org/news/events_workshops/OpenURL-05-Agen-FINAL.html
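Selecting by metadata before ranking is the kind of fielded filtering that link-based ranking alone does not provide. A minimal sketch, assuming records are hypothetical dictionaries with an optional `audience` field and that a relevance score per identifier is already available from some other source:

```python
from typing import Dict, List

# Hypothetical record shape, e.g. {"id": "1", "audience": "researcher"}.
Record = Dict[str, str]


def select_then_rank(records: List[Record], audience: str,
                     score: Dict[str, float]) -> List[str]:
    """Keep only records whose audience field matches, then order the
    survivors by descending score (unscored records count as 0.0)."""
    selected = [r for r in records if r.get("audience") == audience]
    selected.sort(key=lambda r: score.get(r["id"], 0.0), reverse=True)
    return [r["id"] for r in selected]
```

Records with no audience value drop out of the selection, so the usefulness of such filters depends directly on metadata quality.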


Metasearch and Google

• Google interest in indexing "hidden Web"
  – Collaborations with repository providers, OCLC etc
  – Google Scholar
• Google interest in metadata-based approach?
  – Google Base
• Google and Metasearch as complementary approaches to discovery


Metasearch and "Social bookmarking"

del.icio.us, http://del.icio.us/


Metasearch and "Social bookmarking"

Connotea, http://www.connotea.org/

(Annotation: bibliographic metadata added to item by Connotea)


Metasearch and "Social Bookmarking"

• Simple user-generated metadata
  – Typically description plus "tags"
  – Capture user perceptions of resources
  – Some services adding richer metadata
• Social: merging of personal collections
  – Bookmarking services as discovery services
• Connotea as "community-driven recommendation system" (Lund et al)
• Metadata available via RSS or simple API
  – Can metasearch services use/integrate metadata from bookmarking services?
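One way a metasearch service might use bookmarking metadata is to read a service's feed and merge bookmarked links into a tag index. This sketch assumes an RSS 2.0 feed with one `<category>` element per tag; actual bookmarking feeds may use other RSS flavours (for example RSS 1.0 with Dublin Core elements for tags), so the element names, like the `Bookmark` class and function names, are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bookmark:
    title: str
    link: str
    tags: List[str]


def bookmarks_from_rss(rss_text: str) -> List[Bookmark]:
    """Read title, link and tags from each <item> of an RSS 2.0 feed,
    treating every non-empty <category> child as one user-assigned tag."""
    root = ET.fromstring(rss_text)
    return [
        Bookmark(
            title=item.findtext("title", default=""),
            link=item.findtext("link", default=""),
            tags=[c.text.strip() for c in item.findall("category") if c.text],
        )
        for item in root.iter("item")
    ]


def links_by_tag(bookmarks: List[Bookmark]) -> Dict[str, List[str]]:
    """Merge many users' bookmarks into one tag -> links index."""
    index: Dict[str, List[str]] = {}
    for bookmark in bookmarks:
        for tag in bookmark.tags:
            index.setdefault(tag, []).append(bookmark.link)
    return index
```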


Is Metasearching Better Searching?

• Technical components for metasearch available
• User expectations of coverage mean metasearch is a cross-domain problem
• However, quality of metasearch dependent on
  – metadata quality
  – metadata consistency
  – …across multiple providers
• Metasearch can complement other approaches
• Metasearch as "enabler"
  – supporting construction of many different services
