Discovery Platforms: Technologies, Tools and Issues. Saiful Amin, 39th Five Laws Lecture (2011)

Discovery platforms: Technology, tools and issues


Evolution of discovery tools and platforms.


Page 1: Discovery platforms: Technology, tools and issues

TECHNOLOGIES, TOOLS AND ISSUES

Discovery Platforms

Saiful Amin 39th Five Laws Lecture (2011)

Page 2: Discovery platforms: Technology, tools and issues

Evolution of Discovery Tools

Printed catalogues

Traditional (Web)OPAC

Integrated OPAC portals

Federated search services

Discovery interfaces

Web-scale discovery services

Integrated discovery platform

Page 3: Discovery platforms: Technology, tools and issues

Printed catalogues

Author browse

Title browse

Series browse

Call Number browse

Subject browse

Shelf list (inventory)

Page 4: Discovery platforms: Technology, tools and issues

Traditional (Web)OPAC

ILS Database (Bibs)

(Web)Server Application

Page 5: Discovery platforms: Technology, tools and issues
Page 6: Discovery platforms: Technology, tools and issues

Traditional (Web)OPAC

Pros

Keyword search!
  Author, title, subject
  ISBN/LCCN search
  Boolean queries
  Proximity search

Browse index
  Authority headings
  Title, Call Number

Real-time item status!
  Copies & availability info
  Link to URL (tag 856)

Cons

Uses database queries
  'LIKE' statements
  Exact/partial match

Limited use of search algorithm
  No relevance ranking

Only physical collection and e-books
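The cons listed for the traditional OPAC follow from how it searches: a keyword becomes a SQL pattern match. A minimal sketch, assuming a hypothetical `bibs` table in an in-memory SQLite database (the table, columns and sample titles are invented for illustration; real ILS schemas differ):

```python
import sqlite3

# Hypothetical, simplified bibliographic table; real ILS schemas differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibs (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO bibs (title, author) VALUES (?, ?)",
    [
        ("Fishing for beginners", "Rao, A."),
        ("A history of the fish trade, with notes on fish markets", "Sen, B."),
        ("The fisherman's almanac", "Das, C."),
        ("Library classification", "Ranganathan, S. R."),
    ],
)


def opac_keyword_search(keyword):
    """Partial match with LIKE: a row either matches or it does not."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT title FROM bibs WHERE title LIKE ? ORDER BY id", (pattern,)
    ).fetchall()
    return [title for (title,) in rows]


hits = opac_keyword_search("fish")      # three titles, in record-id order
misses = opac_keyword_search("fishes")  # no stemming, so nothing matches
```

The three hits come back in record order, not by relevance: the title that mentions "fish" twice is not ranked above the others, and the plural "fishes" finds nothing at all.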

Page 7: Discovery platforms: Technology, tools and issues

Integrated OPAC Portal

ILS Database (Bibs)

Web Server Application

ILS Database (Patrons)

Website content

Enrichment Services (Web services)

Page 8: Discovery platforms: Technology, tools and issues
Page 9: Discovery platforms: Technology, tools and issues

Integrated OPAC Portal

Pros

All WebOPAC features
  Keyword search
  Headings browse
  Availability info

Library website integration

Patron empowerment
  Circ/Account details
  Online renewal
  Online hold placement
  SDI services
  New arrivals

OPAC enrichment
  Book cover/reviews
  Thesaurus integration

Cons

Uses database queries
  'LIKE' statements
  Exact/partial match

Limited use of search algorithms
  No relevance ranking

Still limited to only physical collection & e-books

Page 10: Discovery platforms: Technology, tools and issues

Federated Search Service

Web Server Application

Library Catalog

Digital Repository

ProQuest, EBSCO, Science Direct, PubMed, Emerald

Full-text links

dbWiz, 360 Search, Pazpar2, Research Pro

Page 11: Discovery platforms: Technology, tools and issues

Federated Search Service

Muse Content Architecture

http://www.museglobal.com/technology/contentIntegration.html

Supports 6300+ databases!

Page 12: Discovery platforms: Technology, tools and issues
Page 13: Discovery platforms: Technology, tools and issues

Federated Search Service

Pros

Single search broadcast

Real-time search results

Based on standards
  Z39.50, SRU/W
  MARC, ISO2709, XML

Supports large set of databases
  7000+ in "360 Search"
  6300+ in Muse platform

Merging and sorting

No local index (maintenance free!)

Cons

Not all databases are standards compliant
  Requires custom search scripts
  Requires metadata crosswalk

Network intensive
  Performance issues

Mostly available as hosted service
  Annual subscription
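The broadcast, merge and sort steps of a federated search can be sketched as follows. This is an illustration only: the three targets are hypothetical in-memory lists standing in for remote databases, where a real connector would speak Z39.50 or SRU, or run a custom search script for a non-compliant source.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote targets (invented sample records).
CATALOG = [{"title": "Fish biology", "year": 2004}]
REPOSITORY = [{"title": "Fish farming thesis", "year": 2010}]
VENDOR_DB = [
    {"title": "Fish biology", "year": 2004},
    {"title": "Fish stocks survey", "year": 2008},
]


def make_target(name, records):
    """Wrap a record list as a searchable target that tags hits with its name."""
    def search(query):
        q = query.lower()
        return [dict(r, source=name) for r in records if q in r["title"].lower()]
    return search


TARGETS = [
    make_target("catalog", CATALOG),
    make_target("repository", REPOSITORY),
    make_target("vendor", VENDOR_DB),
]


def federated_search(query):
    # Broadcast: send the same query to every target in parallel.
    with ThreadPoolExecutor(max_workers=len(TARGETS)) as pool:
        result_sets = list(pool.map(lambda target: target(query), TARGETS))
    # Merge: drop duplicates by (title, year), keeping the first copy seen.
    merged = {}
    for results in result_sets:
        for rec in results:
            merged.setdefault((rec["title"].lower(), rec["year"]), rec)
    # Sort: newest first, then by title.
    return sorted(merged.values(), key=lambda r: (-r["year"], r["title"]))


results = federated_search("fish")
```

Nothing is indexed locally, which is why the approach is maintenance free, but every query waits for the slowest target, which is where the network and performance issues come from.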

Page 14: Discovery platforms: Technology, tools and issues
Page 15: Discovery platforms: Technology, tools and issues

Discovery Interface

Central Index (Solr/Lucene)

Web Server Application

ILS Database

MARC Bib data

Availability/Holds

Digital Repository

DC XML data

Full-text link

Enrichment Services (Web services)
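The architecture above feeds MARC records from the ILS and Dublin Core records from the repository into one central index. A rough sketch of that idea in plain Python, not the Solr/Lucene API: the record shapes are simplified assumptions (MARC 245$a for title, 100$a for author, 650$a for subject), and the sample records and function names are invented.

```python
from collections import defaultdict

# Simplified, hypothetical record shapes; real MARC and DC XML carry far more.
marc_records = [
    {"id": "bib-1", "245a": "Colon classification",
     "100a": "Ranganathan, S. R.", "650a": ["Classification"]},
]
dc_records = [
    {"id": "repo-7", "title": "Classification of fish species",
     "creator": "Sen, B.", "subject": ["Fishes"]},
]


def from_marc(rec):
    """Map MARC-style fields onto the common index schema."""
    return {"id": rec["id"], "title": rec["245a"],
            "author": rec["100a"], "subjects": rec["650a"]}


def from_dc(rec):
    """Map Dublin Core-style fields onto the same schema."""
    return {"id": rec["id"], "title": rec["title"],
            "author": rec["creator"], "subjects": rec["subject"]}


def build_index(docs):
    """Inverted index: each lower-cased title word points to document ids."""
    index = defaultdict(set)
    for doc in docs:
        for word in doc["title"].lower().split():
            index[word].add(doc["id"])
    return index


docs = [from_marc(r) for r in marc_records] + [from_dc(r) for r in dc_records]
index = build_index(docs)
hits = sorted(index["classification"])  # ids from both sources
```

Because both sources are normalised before indexing, one lookup returns catalog and repository items together. A real Solr/Lucene index adds text analysis, scoring and faceting on top of this basic structure.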

Page 16: Discovery platforms: Technology, tools and issues
Page 17: Discovery platforms: Technology, tools and issues

Discovery Interface

Word stemming
  'fishing', 'fished', 'fish', 'fisher' => 'fish'

Fuzzy search
  insertion: cot => coat
  deletion: coat => cot
  substitution: coat => cost
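Insertion, deletion and substitution are the operations counted by Levenshtein edit distance. A minimal sketch of fuzzy matching on that basis; the function names and the one-edit threshold are illustrative assumptions, not the settings of any particular search engine.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query, vocabulary, max_edits=1):
    """Return the vocabulary terms within max_edits of the query."""
    return [term for term in vocabulary if edit_distance(query, term) <= max_edits]


matches = fuzzy_match("coat", ["cot", "cost", "coat", "boats"])
```

With a threshold of one edit, "coat" matches "cot", "cost" and itself, while "boats" needs two edits and is left out.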

Auto-suggest
  N-gram, Edge N-gram analysis
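Edge n-grams are the prefixes of a term. Indexing them lets a partly typed word find the full term, which is the basis of type-ahead suggestions. A small sketch in plain Python; the minimum and maximum gram lengths and the sample terms are assumptions chosen for illustration.

```python
from collections import defaultdict


def edge_ngrams(term, min_len=2, max_len=10):
    """Prefixes of the term, from min_len up to max_len characters."""
    term = term.lower()
    return [term[:n] for n in range(min_len, min(len(term), max_len) + 1)]


def build_suggest_index(terms):
    """Map every edge n-gram to the terms that start with it."""
    index = defaultdict(list)
    for term in terms:
        for gram in edge_ngrams(term):
            index[gram].append(term)
    return index


def suggest(index, typed):
    """Look up the text typed so far; no match yields an empty list."""
    return index.get(typed.lower(), [])


index = build_suggest_index(["Discovery", "Digital library", "Dissertations"])
suggestions = suggest(index, "Dis")
```

The prefixes are computed once at index time, so each keystroke costs a single dictionary lookup.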

Phrase query

'Did you mean?'
  Spell checker

Relevance ranking
  TF-IDF / Term Vector
  Term weights
  Lucene scores
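As a rough illustration of TF-IDF ranking, the sketch below scores each document by how often it contains the query term (term frequency), weighted by how rare the term is across the collection (inverse document frequency). This is a textbook variant with an assumed `+ 1.0` smoothing term, not Lucene's actual scoring formula, and the three sample documents are invented.

```python
import math
from collections import Counter

# Invented sample documents.
docs = {
    "d1": "fish fish fish market",
    "d2": "fish and chips",
    "d3": "library catalogue design",
}


def tf_idf_ranking(query, docs):
    """Return ids of matching documents, best score first."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized.values() if term in other)
            if df == 0:
                continue
            tf = counts[term] / len(tokens)       # term frequency
            idf = math.log(n_docs / df) + 1.0     # smoothed inverse doc frequency
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


ranking = tf_idf_ranking("fish", docs)
```

Document `d1` ranks above `d2` because "fish" makes up a larger share of its text, and `d3` is dropped because it never mentions the term. A LIKE query would have returned `d1` and `d2` as equals.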

Faceted browsing
  Who are the main authors and their count?
  What are the main subjects and their count?
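Facets answer exactly these questions by counting field values across the current result set. A minimal sketch, assuming results are simple dictionaries with hypothetical `author` and `subjects` fields (the sample records are invented):

```python
from collections import Counter

# Invented result set for one search.
results = [
    {"title": "Prolegomena to library classification",
     "author": "Ranganathan, S. R.",
     "subjects": ["Classification", "Library science"]},
    {"title": "The five laws of library science",
     "author": "Ranganathan, S. R.",
     "subjects": ["Library science"]},
    {"title": "Cataloguing practice",
     "author": "Sen, B.",
     "subjects": ["Cataloguing", "Library science"]},
]


def facet_counts(results, field):
    """Count each value of a field across the results, most frequent first."""
    counts = Counter()
    for rec in results:
        values = rec.get(field, [])
        if isinstance(values, str):  # single-valued field such as author
            values = [values]
        counts.update(values)
    return counts.most_common()


author_facets = facet_counts(results, "author")
subject_facets = facet_counts(results, "subjects")
```

Each (value, count) pair can be shown as a clickable filter beside the result list; selecting one narrows the results to records carrying that value.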

Page 18: Discovery platforms: Technology, tools and issues

Discovery Interface

Pros

Google-like search box

Advanced features
  Fuzzy searching
  Relevance ranking
  Word stemming algorithms
  Social tagging/reviews
  "Did you mean?" feature
  Auto-suggest (type ahead)
  Faceted browsing

Availability/Hold requests

Metadata enrichment
  Linking Amazon/Google/Wikipedia

Digital repository integration

Cons

Searches only locally hosted collections

Page 19: Discovery platforms: Technology, tools and issues

Can we combine the two?

Modern discovery interface

Local collections + Remote databases

Unified search result

Page 20: Discovery platforms: Technology, tools and issues

Web-scale Discovery Services

Central Index

Library Catalog

MARC data

Availability Full-text link

EBSCO

ProQuest

ABI Inform

PubMed

Science Direct

Lexis-Nexis

Web Server Application

Full-text and metadata

…

Digital Repository

DC data

Page 21: Discovery platforms: Technology, tools and issues

Web-scale Discovery Services

Page 22: Discovery platforms: Technology, tools and issues

Web-scale Discovery Services

Summon Service

Content types include:

Library catalog records
E-journal articles
Institutional repositories
Newspaper articles
E-books
Dissertations
Conference proceedings
Grey literature
Cited references
Reports
Digital library
Databases
and more.

Page 23: Discovery platforms: Technology, tools and issues
Page 24: Discovery platforms: Technology, tools and issues

Web-scale Discovery Services

Pros

Google-like single search box

Pre-indexed licensed content

Inclusion of local collection
  OAI-PMH, MARC updates

Advanced features
  Relevance ranking
  "Did you mean?"
  Auto-suggest (type ahead)
  Faceted navigation

Availability/Full-text links

Mobile friendly

Web-service APIs

Easier off-campus access

No installation/maintenance

Cons

Supports limited number of databases (1000-1500)

Requires huge investment to maintain centralized index
  Publisher partnerships (Licensing/legal issues)
  Regular pre-publication indexing

Mostly hosted-only service
  Content bias? (ranking)
  Vendor lock-in?
  Annual subscription
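Local collections usually reach a central index through harvesting, for example OAI-PMH. As an illustration, the sketch below builds an OAI-PMH `ListRecords` request URL. The verb and the `metadataPrefix`, `from` and `set` arguments come from the OAI-PMH protocol; the repository address and the helper name are hypothetical.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc",
                         from_date=None, set_spec=None):
    """Build a ListRecords request; from_date enables incremental harvests."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed since this date
    if set_spec:
        params["set"] = set_spec    # restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only to show the URL shape.
url = oai_list_records_url(
    "https://repository.example.org/oai", from_date="2011-01-01"
)
```

Harvesting with a `from` date keeps the central index current without re-fetching the whole repository each time.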

Page 25: Discovery platforms: Technology, tools and issues

Can we have best of both worlds?

Web Server Application

Digital Repository

ILS database (Bibs)

Remote databases (eight shown)

Modern discovery interface

Local collections + Remote databases

Unified search result

Supports large number of databases

Based on open standards (extensible)

Can be maintained locally (No subscription!)

Page 26: Discovery platforms: Technology, tools and issues

Integrated Discovery Platform

http://www.indexdata.com/masterkey

Semi-commercial

Supports 1000+ databases

Page 27: Discovery platforms: Technology, tools and issues
Page 28: Discovery platforms: Technology, tools and issues

Integrated Discovery Platform

Pazpar2 Architecture

https://www.indexdata.com/pazpar2

Open source (GPL)

Build your own connector!

Page 29: Discovery platforms: Technology, tools and issues

Conclusion

Each platform has its own goals:

A pure library catalog can provide expressive search (high precision)

Federated search improves content coverage in a single search

Discovery interfaces are designed to improve the user experience for local collections

Web-scale discovery provides a unified search experience for local and remote collections (still well short in content coverage)

An integrated platform provides extensibility (but requires significant effort in development and maintenance)

One size does not fit all. No single system is perfect.

As content becomes more open, the focus of discovery solutions should be on open platforms that are extensible as well as affordable.

Page 30: Discovery platforms: Technology, tools and issues

Questions and Discussions